# Simple Logistic Regression on StarDataset

In this notebook, I run a simple logistic regression on the Star Dataset.

The notebook is only meant to give you some ideas on how to play with this dataset.

We will use a few simple features to predict whether a star is a giant or a dwarf.

Previous Notebook:
[Preprocessing the StarDataset](https://www.kaggle.com/vinesmsuic/preprocessing-the-stardataset)

Dataset Link:
[Star Dataset: Stellar Classification [Beginner]](https://www.kaggle.com/vinesmsuic/star-categorization-giants-and-dwarfs)

In [None]:
# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 5GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns # seaborn is a package built on top of matplotlib.
sns.set() # apply seaborn's default theme to all matplotlib plots

import statsmodels.api as sm

In [None]:
raw_df = pd.read_csv("../input/star-categorization-giants-and-dwarfs/Star39552_balanced.csv")
raw_df

* Vmag - Visual Apparent Magnitude of the Star
* Plx - Parallax of the Star, which is inversely related to its distance from the Earth
* e_Plx - Standard Error of Plx (drop the row if e_Plx is too high!)
* B-V - B-V color index. (A hot star has a B-V color index close to 0 or negative, while a cool star has a B-V color index close to 2.0. Other stars are somewhere in between.)
* SpType - Stellar classification. (The Roman numeral is the luminosity class. Classes more luminous than IV, i.e. I, II, and III, are giants; otherwise the star is a dwarf.)
* Amag - Absolute Magnitude of the Star.
* TargetClass - Whether the Star is Dwarf (0) or Giant (1)
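
For reference, absolute magnitude is tied to apparent magnitude and parallax through the distance modulus. Below is a minimal sketch of that textbook relation, assuming the parallax is given in arcseconds. The function name `absolute_magnitude` is hypothetical, and the preprocessing notebook remains the authority on how the `Amag` column of this dataset was actually computed.

```python
import numpy as np

def absolute_magnitude(vmag, plx_arcsec):
    """Distance-modulus relation M = m + 5 * (log10(p) + 1), with p in arcseconds."""
    return vmag + 5.0 * (np.log10(plx_arcsec) + 1.0)

# A star at 10 parsecs has a parallax of 0.1 arcsec, so M equals m by definition.
example_amag = absolute_magnitude(5.0, 0.1)
print(example_amag)
```

Any constant offset in `Amag` (for example from a different parallax unit) would not change the classifier later on, because the features are standardized before training.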


# Select Features

In the previous notebook I already picked Amag and B-V as the selected features, so let's drop the other columns.

In [None]:
df = raw_df[['B-V', 'Amag', 'TargetClass']]
df

In [None]:
df.describe(include='all')

In [None]:
# Select Target
y = df['TargetClass']

# Select Features
x = df[['B-V','Amag']]

In [None]:
# Splitting the data into train dataset and test dataset

from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(x,y, test_size = 0.25, random_state=0)
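
An optional variation on the split above: passing `stratify` to `train_test_split` keeps the class ratio the same in the train and test partitions. The sketch below uses hypothetical toy data (`toy`, `x_toy`, `y_toy` are not part of the notebook) so that it can run on its own.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data standing in for the real B-V / Amag columns.
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "B-V": rng.normal(0.7, 0.3, size=200),
    "Amag": rng.normal(16.0, 2.0, size=200),
    "TargetClass": np.repeat([0, 1], 100),
})

x_toy = toy[["B-V", "Amag"]]
y_toy = toy["TargetClass"]

# stratify=y_toy keeps the 50/50 class ratio in both partitions.
x_tr, x_te, y_tr, y_te = train_test_split(
    x_toy, y_toy, test_size=0.25, random_state=0, stratify=y_toy
)

print(len(x_tr), len(x_te))
print(y_te.value_counts().to_dict())
```

With a balanced dataset like this one, a plain random split is usually close to balanced already, so stratifying mostly guards against unlucky splits.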

In [None]:
# Feature standardization (zero mean, unit variance)

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()

In [None]:
x_train = scaler.fit_transform(x_train)
x_test = scaler.transform(x_test)
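
Note that the scaler is fitted on the training data only, and the test data is transformed with the training mean and standard deviation. A small sketch of what that means, using hypothetical toy arrays (`train_toy`, `test_toy`) rather than the real features:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy values in the order [B-V, Amag].
train_toy = np.array([[0.2, 14.0], [0.6, 16.0], [1.0, 18.0]])
test_toy = np.array([[0.6, 17.0]])

toy_scaler = StandardScaler()
train_scaled = toy_scaler.fit_transform(train_toy)  # learns mean and std from train
test_scaled = toy_scaler.transform(test_toy)        # reuses the train statistics

print(toy_scaler.mean_)
print(train_scaled.mean(axis=0), train_scaled.std(axis=0))
print(test_scaled)
```

After scaling, each training column has mean 0 and standard deviation 1, while the test values are simply expressed relative to the training statistics.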

Use a logistic regression model from scikit-learn.

In [None]:
from sklearn.linear_model import LogisticRegression

star_predictor = LogisticRegression(random_state=0)

In [None]:
# Start model training
star_predictor.fit(x_train, y_train)

In [None]:
print('the score on train dataset is') 
print(star_predictor.score(x_train, y_train))

In [None]:
y_pred = star_predictor.predict(x_test)

In [None]:
# Confusion Matrix

from sklearn.metrics import confusion_matrix

cm = confusion_matrix(y_test, y_pred)
print("Confusion Matrix : \n", cm)
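
For a binary problem, scikit-learn lays out the confusion matrix as `[[TN, FP], [FN, TP]]`, with rows for the true class and columns for the predicted class. A minimal sketch of how to unpack it and derive precision and recall, using hypothetical toy labels (0 = Dwarf, 1 = Giant):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical labels: 0 = Dwarf, 1 = Giant.
y_true_toy = np.array([0, 0, 0, 1, 1, 1, 1, 0])
y_pred_toy = np.array([0, 1, 0, 1, 1, 0, 1, 0])

cm_toy = confusion_matrix(y_true_toy, y_pred_toy)
tn, fp, fn, tp = cm_toy.ravel()  # row-major order: TN, FP, FN, TP

precision = tp / (tp + fp)  # share of predicted giants that really are giants
recall = tp / (tp + fn)     # share of real giants that were found
print(cm_toy)
print(precision, recall)
```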



In [None]:
# Model evaluation

from sklearn.metrics import accuracy_score

print("Accuracy : ", accuracy_score(y_test, y_pred))

In [None]:
star_predictor.coef_

In [None]:
star_predictor.intercept_
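
The coefficients and intercept define the model completely: the predicted probability of class 1 is the sigmoid of the linear score `w . x + b`. A minimal sketch that checks this against `predict_proba`, using hypothetical toy data (`x_toy`, `y_toy`, `toy_model`) that is assumed to be standardized already:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical, already-standardized toy features and labels.
rng = np.random.default_rng(0)
x_toy = rng.normal(size=(100, 2))
y_toy = (x_toy[:, 0] - x_toy[:, 1] > 0).astype(int)

toy_model = LogisticRegression(random_state=0).fit(x_toy, y_toy)

# Linear score z = w . x + b, then the sigmoid maps it to P(class 1).
z = x_toy @ toy_model.coef_.ravel() + toy_model.intercept_[0]
manual_proba = 1.0 / (1.0 + np.exp(-z))

sklearn_proba = toy_model.predict_proba(x_toy)[:, 1]
print(np.allclose(manual_proba, sklearn_proba))
```

Because the features are standardized, the magnitudes of the two coefficients can be compared directly to see which feature pulls harder on the decision.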

In [None]:
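# Make a prediction

# The model was trained on standardized features, so scale new inputs with the fitted scaler first
feature = pd.DataFrame([[1.130, 15.792525]], columns=['B-V', 'Amag'])

star_predictor.predict(scaler.transform(feature))  # Target Class is 0 -> Dwarf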

In [None]:
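# Make a prediction

# The model was trained on standardized features, so scale new inputs with the fitted scaler first
feature = pd.DataFrame([[0.227, 17.159748]], columns=['B-V', 'Amag'])

star_predictor.predict(scaler.transform(feature))  # Target Class is 1 -> Giant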
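
Since `star_predictor` was fitted on standardized features, any new input has to go through the same `scaler` before prediction. One way to make that automatic is to bundle both steps into a scikit-learn pipeline. The sketch below is only an illustration: the toy data, its arbitrary cluster centres, and the names `x_raw`, `y_raw`, `star_pipeline`, and `new_star` are all hypothetical; in this notebook you would fit the pipeline on the unscaled `x` and `y` instead.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data with arbitrary, well-separated clusters (not real stars).
rng = np.random.default_rng(0)
x_raw = pd.DataFrame({
    "B-V": np.concatenate([rng.normal(0.4, 0.1, 50), rng.normal(1.2, 0.1, 50)]),
    "Amag": np.concatenate([rng.normal(18.0, 0.5, 50), rng.normal(15.0, 0.5, 50)]),
})
y_raw = np.repeat([0, 1], 50)

# The pipeline scales the input and then applies the classifier in one call.
star_pipeline = make_pipeline(StandardScaler(), LogisticRegression(random_state=0))
star_pipeline.fit(x_raw, y_raw)

# Raw, unscaled values can now be passed directly.
new_star = pd.DataFrame([[1.2, 15.0]], columns=["B-V", "Amag"])
pipeline_pred = star_pipeline.predict(new_star)
print(pipeline_pred)
```

With this design, the scaling statistics travel with the model, so there is no separate `scaler` object to remember at prediction time.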

That's it! Feel free to play with the dataset and try to improve the score.