# **Stars Classification**

## Overview

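This notebook trains a logistic regression model to predict a star's spectral class from its physical properties. The seven spectral classes are collapsed into two groups (class M versus all other classes), so the task is a binary classification.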

## Dataset

The dataset describes each star with several features, including:

- Absolute Temperature (in K)
- Relative Luminosity (L/Lo)
- Relative Radius (R/Ro)
- Absolute Magnitude (Mv)
- Star Color (White, Red, Blue, Yellow, Yellow-Orange, etc.)
- Spectral Class (O, B, A, F, G, K, M)
- Star Type **(Red Dwarf, Brown Dwarf, White Dwarf, Main Sequence, SuperGiants, HyperGiants)**

Luminosity and radius are expressed relative to the Sun, using these reference values:

- Lo = 3.828 x 10^26 Watts (Avg Luminosity of Sun)
- Ro = 6.9551 x 10^8 m (Avg Radius of Sun)
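
Because luminosity and radius are stored relative to the Sun, the reference values above are enough to recover absolute units. The snippet below is a minimal sketch of that conversion; the function name and the example star are hypothetical and do not come from the dataset.

```python
# Solar reference values quoted in the dataset description
L_SUN_WATTS = 3.828e26   # Lo, average luminosity of the Sun
R_SUN_METERS = 6.9551e8  # Ro, average radius of the Sun


def to_absolute_units(rel_luminosity, rel_radius):
    """Convert solar-relative luminosity and radius to watts and meters."""
    return rel_luminosity * L_SUN_WATTS, rel_radius * R_SUN_METERS


# Hypothetical star: 0.0024 times the Sun's luminosity, 0.17 times its radius
luminosity_w, radius_m = to_absolute_units(0.0024, 0.17)
print(f"{luminosity_w:.3e} W, {radius_m:.3e} m")
```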

## Data Preprocessing / EDA

In [None]:
# Import necessary libraries
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, accuracy_score, confusion_matrix
from sklearn.preprocessing import StandardScaler

In [None]:
# Import data
star = pd.read_csv('Stars.csv')

In [None]:
# Displays the first few rows of the DataFrame.
star.head()

In [None]:
# Overview of the DataFrame, including the data types and non-null counts.
star.info()

In [None]:
# Provides descriptive statistics of the DataFrame
star.describe()

## Feature Selection

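Feature selection is the step in the machine learning pipeline that identifies the relevant features (independent variables) used to train the model. In this notebook it also covers encoding the categorical columns as numbers and separating the target (Spectral Class) from the features.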
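
One simple way to judge relevance is to rank numeric features by the absolute value of their correlation with the target. The sketch below illustrates the idea on a small made-up table; the column names and values are hypothetical, and this notebook itself does not perform such a ranking.

```python
import pandas as pd

# Hypothetical toy data: one informative column and one uninformative column
toy = pd.DataFrame({
    "temperature_k": [3000, 3200, 2900, 9000, 15000, 25000],
    "noise": [1, 2, 1, 2, 1, 2],
    "is_class_m": [1, 1, 1, 0, 0, 0],
})

# Correlation of every feature column with the binary target
correlations = toy.drop(columns="is_class_m").corrwith(toy["is_class_m"])

# Rank by strength of the linear relationship, ignoring its sign
ranking = correlations.abs().sort_values(ascending=False)
print(ranking)
```

Correlation only captures linear relationships, so a low score is a hint rather than proof that a feature is useless.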

In [None]:
# Count the occurrences of each category in the 'Spectral Class' column
star[['Spectral Class']].value_counts()

In [None]:
# Binary-encode 'Spectral Class': class M becomes 0 and every other class becomes 1
# This collapses the seven spectral classes into two groups
pd.set_option('future.no_silent_downcasting', True)
star['Spectral Class'] = star['Spectral Class'].replace(
    {'M': 0, 'A': 1, 'B': 1, 'F': 1, 'O': 1, 'K': 1, 'G': 1}
).astype(int)

In [None]:
# Count the occurrences of each category in the 'Star type' column
star[['Star type']].value_counts()

In [None]:
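# Encode the most common 'Star color' labels as integers.
# Keys with a trailing space ('White ', 'Blue ') catch labels that carry stray whitespace;
# labels that are not listed keep their original string value.
star.replace({'Star color': {
    'Red': 0,
    'Yellow': 1,
    'White': 2,
    'White ': 2,
    'Blue ': 3,
    'Blue': 3}}, inplace=True)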
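
The mapping above needs a separate key for every spelling of a color, such as 'Blue' and 'Blue '. An alternative is to normalize the labels before encoding them. The following is a minimal sketch on a made-up Series, not the notebook's own method; note that `Series.map` turns unlisted labels into NaN, whereas `replace` leaves them unchanged.

```python
import pandas as pd

# Hypothetical sample of raw color labels with inconsistent spacing and case
colors = pd.Series(["Red", "Blue ", "White", "white", "Blue", "Yellow-Orange"])

color_codes = {"red": 0, "yellow": 1, "white": 2, "blue": 3}

# Strip surrounding whitespace and lower-case before looking up the code
normalized = colors.str.strip().str.lower()
encoded = normalized.map(color_codes)

# Labels without a code become NaN, which makes gaps in the mapping visible
unmapped = normalized[encoded.isna()].unique()
print(encoded.tolist())
print("Unmapped labels:", list(unmapped))
```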

In [None]:
# Count the occurrences of each category in the 'Star color' column
star[['Star color']].value_counts()

In [None]:
# One-hot encoding categorical variables
# This step is necessary because machine learning models like Logistic Regression require numeric input
star_encoded = pd.get_dummies(star, columns=['Star category', 'Star color'])
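
To make the effect of one-hot encoding concrete, the sketch below applies `pd.get_dummies` to a three-row table. The column names and values are hypothetical; `dtype=int` is passed so that the indicator columns show 0/1 instead of booleans.

```python
import pandas as pd

# Hypothetical three-star table with one categorical column
toy = pd.DataFrame({
    "Temperature (K)": [3068, 25000, 9700],
    "Star color": ["Red", "Blue", "White"],
})

# Each distinct color becomes its own 0/1 indicator column
encoded = pd.get_dummies(toy, columns=["Star color"], dtype=int)
print(encoded)
```

The original categorical column is dropped, and one column per category is appended with the prefix `Star color_`.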

In [None]:
# Separate features and target variable (Spectral Class)
features = star_encoded.drop(columns=['Spectral Class'])
target = star['Spectral Class']

In [None]:
# Plot the distribution of the encoded target to check the class balance (0 = class M, 1 = all other classes)
plt.figure(figsize=(8, 6))
sns.countplot(x=target, hue=target, palette="Set2", legend=False)
plt.title('Distribution of Spectral Classes')
plt.xlabel('Spectral Class')
plt.ylabel('Count')
plt.show()

## Train the Model

In [None]:
# Split the data into training and testing sets
# 30% of the data is used for testing, while the remaining 70% is used for training
X_train, X_test, y_train, y_test = train_test_split(features, target, test_size=0.3, random_state=42)
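
When the two classes are not equally frequent, a purely random split can leave the test set with a different class mix than the training set. Passing `stratify` to `train_test_split` keeps the proportions roughly equal in both parts. The sketch below uses made-up arrays and is an optional variation, not what the cell above does.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 20 samples, 14 of class 0 and 6 of class 1
X = np.arange(40).reshape(20, 2)
y = np.array([0] * 14 + [1] * 6)

# stratify=y preserves the 70/30 class ratio in both the train and test parts
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

print("Train class counts:", np.bincount(y_tr))
print("Test class counts:", np.bincount(y_te))
```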

In [None]:
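# Standardize the features so that each has a mean of 0 and a standard deviation of 1.
# The scaler is fit on the training set only and then applied to the test set,
# so no test-set statistics leak into training.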
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
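
The sketch below checks on a tiny made-up array what `StandardScaler` does: the training columns end up with mean 0 and standard deviation 1, and the test rows are transformed with the training mean and standard deviation rather than their own. The numbers are hypothetical.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical features on very different scales (e.g. temperature and radius)
train = np.array([[3000.0, 0.1], [6000.0, 1.0], [30000.0, 10.0]])
test = np.array([[5000.0, 0.5]])

demo_scaler = StandardScaler()
train_scaled = demo_scaler.fit_transform(train)  # learns mean and std from train
test_scaled = demo_scaler.transform(test)        # reuses the training statistics

print("Train means:", train_scaled.mean(axis=0))
print("Train stds:", train_scaled.std(axis=0))

# The same transformation written out by hand
manual = (test - train.mean(axis=0)) / train.std(axis=0)
print("Matches manual formula:", np.allclose(test_scaled, manual))
```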

In [None]:
# Create a logistic regression classifier; max_iter is raised above the default of 100
model = LogisticRegression(max_iter=3000)

# Fit the model on the scaled training data
model.fit(X_train_scaled, y_train)
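
Scaling and fitting can also be bundled into a single scikit-learn pipeline, so the scaler is always fit on whatever data is passed to `fit` and reapplied automatically at prediction time. The following is a sketch on synthetic data, not the notebook's own setup; the cluster centers and sample sizes are assumptions chosen only for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic one-feature data: cool stars labeled 1, hot stars labeled 0
rng = np.random.default_rng(0)
X_demo = np.vstack([
    rng.normal(3000, 200, size=(20, 1)),
    rng.normal(10000, 1000, size=(20, 1)),
])
y_demo = np.array([1] * 20 + [0] * 20)

# The pipeline standardizes the input, then fits the classifier
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=3000))
clf.fit(X_demo, y_demo)

# Raw, unscaled values can be passed in; the pipeline scales them internally
preds = clf.predict(np.array([[3100.0], [12000.0]]))
print(preds)
```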

In [None]:
# Make predictions
y_pred = model.predict(X_test_scaled)

In [None]:
# Display the predicted labels for the test set
y_pred

## Evaluate the Model

In [None]:
# Compute the overall accuracy on the test set
accuracy = accuracy_score(y_test, y_pred)
print(f"Model Accuracy: {accuracy * 100:.2f}%")

In [None]:
# Report per-class precision, recall, and F1-score
print(classification_report(y_test, y_pred))

In [None]:
# Visualize the confusion matrix
cm = confusion_matrix(y_test, y_pred)
plt.figure(figsize=(10, 7))
sns.heatmap(cm, annot=True, fmt="d", cmap="Blues")
plt.title("Confusion Matrix Heatmap")
plt.xlabel("Predicted")
plt.ylabel("True")
plt.show()
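
For a binary problem, the four cells of the confusion matrix are enough to recompute accuracy, precision, and recall by hand. The sketch below does this on made-up labels and compares the results with scikit-learn's own metric functions; the label arrays are hypothetical and unrelated to the model above.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Hypothetical true and predicted labels for eight stars
y_true = np.array([0, 0, 0, 1, 1, 1, 1, 0])
y_hat = np.array([0, 1, 0, 1, 1, 0, 1, 0])

# Rows are true classes, columns are predicted classes
demo_cm = confusion_matrix(y_true, y_hat)
tn, fp, fn, tp = demo_cm.ravel()

accuracy_manual = (tp + tn) / demo_cm.sum()
precision_manual = tp / (tp + fp)  # of the predicted 1s, how many are correct
recall_manual = tp / (tp + fn)     # of the true 1s, how many were found

print(f"accuracy={accuracy_manual:.2f}, "
      f"precision={precision_manual:.2f}, recall={recall_manual:.2f}")
```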
