# Logistic Regression

Logistic regression is a widely used machine learning algorithm for binary classification. It passes a linear combination of the input variables through the logistic (sigmoid) function to obtain a probability between 0 and 1, and that probability is thresholded to predict one of two classes.
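To make the role of the logistic function concrete, here is a minimal sketch in plain Python. The helper names `sigmoid` and `predict_class` and the 0.5 threshold are illustrative assumptions, not part of the notebook or of scikit-learn.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function: maps any real number into the interval [0, 1]."""
    # Two branches keep math.exp from overflowing for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_class(z: float, threshold: float = 0.5) -> int:
    """Turn a linear score z into a 0/1 label (illustrative threshold)."""
    return int(sigmoid(z) > threshold)


# z stands for the linear combination of inputs, w . x + b
for z in (-2.0, 0.0, 2.0):
    print(z, round(sigmoid(z), 3), predict_class(z))
```

Negative scores map to probabilities below 0.5 and positive scores to probabilities above it, which is what makes the linear score usable as a classifier.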

### Import Libraries

In [1]:
import pandas as pd

### Data

In [2]:
df = pd.read_csv("preprocessed_dataset.csv")

In [3]:
df

Unnamed: 0.1,Unnamed: 0,PayloadMass,Flights,GridFins,Reused,Legs,Block,ReusedCount,Class,Orbit_ES-L1,...,Serial_B1048,Serial_B1049,Serial_B1050,Serial_B1051,Serial_B1054,Serial_B1056,Serial_B1058,Serial_B1059,Serial_B1060,Serial_B1062
0,0,6104.959412,1,0,0,0,1.0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,1,525.000000,1,0,0,0,1.0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,2,677.000000,1,0,0,0,1.0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,3,500.000000,1,0,0,0,1.0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,4,3170.000000,1,0,0,0,1.0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
85,85,15400.000000,2,1,1,1,5.0,2,1,0,...,0,0,0,0,0,0,0,0,1,0
86,86,15400.000000,3,1,1,1,5.0,2,1,0,...,0,0,0,0,0,0,1,0,0,0
87,87,15400.000000,6,1,1,1,5.0,5,1,0,...,0,0,0,1,0,0,0,0,0,0
88,88,15400.000000,3,1,1,1,5.0,2,1,0,...,0,0,0,0,0,0,0,0,1,0


In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 90 entries, 0 to 89
Data columns (total 89 columns):
 #   Column                               Non-Null Count  Dtype  
---  ------                               --------------  -----  
 0   Unnamed: 0                           90 non-null     int64  
 1   PayloadMass                          90 non-null     float64
 2   Flights                              90 non-null     int64  
 3   GridFins                             90 non-null     int64  
 4   Reused                               90 non-null     int64  
 5   Legs                                 90 non-null     int64  
 6   Block                                90 non-null     float64
 7   ReusedCount                          90 non-null     int64  
 8   Class                                90 non-null     int64  
 9   Orbit_ES-L1                          90 non-null     int64  
 10  Orbit_GEO                            90 non-null     int64  
 11  Orbit_GTO                         

In [5]:
df.head(5)

Unnamed: 0.1,Unnamed: 0,PayloadMass,Flights,GridFins,Reused,Legs,Block,ReusedCount,Class,Orbit_ES-L1,...,Serial_B1048,Serial_B1049,Serial_B1050,Serial_B1051,Serial_B1054,Serial_B1056,Serial_B1058,Serial_B1059,Serial_B1060,Serial_B1062
0,0,6104.959412,1,0,0,0,1.0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,1,525.0,1,0,0,0,1.0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,2,677.0,1,0,0,0,1.0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,3,500.0,1,0,0,0,1.0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,4,3170.0,1,0,0,0,1.0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


## Building Logistic Regression Model

### Define x and y

In [6]:
# Features: every column except the target
x = df.drop("Class", axis=1)
# Target: the binary Class column
y = df["Class"]
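The `df.info()` output lists an `Unnamed: 0` column, which is the name pandas gives to an unlabeled CSV column (typically a row index saved along with the data). Because `x` keeps every column except `Class`, that column is used as a feature. Whether to drop it is a judgment call, and the results in this notebook were produced with it included. The sketch below shows one way to exclude such columns, using a tiny hypothetical frame (`demo`) rather than the real dataset.

```python
import pandas as pd

# Hypothetical stand-in that mimics the layout of the notebook's DataFrame
demo = pd.DataFrame(
    {
        "Unnamed: 0": [0, 1, 2, 3],
        "PayloadMass": [6104.96, 525.0, 677.0, 15400.0],
        "Flights": [1, 1, 1, 3],
        "Class": [0, 0, 0, 1],
    }
)

# Collect leftover index columns by name, then drop them together with the target
index_cols = [c for c in demo.columns if c.startswith("Unnamed")]
x_demo = demo.drop(columns=["Class", *index_cols])
y_demo = demo["Class"]

print(list(x_demo.columns))
```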

### Train Test Split

In [7]:
from sklearn.model_selection import train_test_split

In [8]:
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=101)

### Training

In [9]:
from sklearn.linear_model import LogisticRegression

In [10]:
logmodel = LogisticRegression()

In [11]:
logmodel.fit(x_train,y_train)   

LogisticRegression()

### Predicting

In [12]:
predictions = logmodel.predict(x_test)

In [13]:
predictions

array([0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1], dtype=int64)

In [14]:
y_test

50    0
6     1
51    0
54    1
53    1
69    1
32    1
31    1
21    1
88    1
43    1
47    0
3     0
1     0
74    0
16    1
45    0
25    1
Name: Class, dtype: int64

## Evaluation

### Confusion Matrix

In [15]:
from sklearn.metrics import confusion_matrix

In [16]:
confusion_matrix(y_test , predictions)

array([[ 7,  0],
       [ 1, 10]], dtype=int64)

### Accuracy Score

In [17]:
from sklearn.metrics import accuracy_score

In [18]:
# normalize=False returns the number of correct predictions rather than the fraction
accuracy_score(y_test, predictions, normalize=False)

17

In [19]:
# normalize=True (the default) returns the fraction of correct predictions
accuracy_score(y_test, predictions, normalize=True)

0.9444444444444444

### Classification Report

In [20]:
from sklearn.metrics import classification_report

In [21]:
print(classification_report(y_test , predictions))

              precision    recall  f1-score   support

           0       0.88      1.00      0.93         7
           1       1.00      0.91      0.95        11

    accuracy                           0.94        18
   macro avg       0.94      0.95      0.94        18
weighted avg       0.95      0.94      0.94        18
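The figures in this report can be recomputed by hand from the confusion matrix shown earlier, where rows are the true classes and columns are the predicted classes. The following sketch does that arithmetic in plain Python; the variable names are illustrative.

```python
# Confusion matrix from the evaluation above: rows = true class, columns = predicted class
cm = [[7, 0],
      [1, 10]]

tn, fp = cm[0]  # true class 0: predicted 0, predicted 1
fn, tp = cm[1]  # true class 1: predicted 0, predicted 1
total = tn + fp + fn + tp

accuracy = (tn + tp) / total   # 17 correct out of 18
precision_0 = tn / (tn + fn)   # of everything predicted 0, how much was truly 0
recall_0 = tn / (tn + fp)      # of everything truly 0, how much was predicted 0
precision_1 = tp / (tp + fp)
recall_1 = tp / (tp + fn)
f1_0 = 2 * precision_0 * recall_0 / (precision_0 + recall_0)
f1_1 = 2 * precision_1 * recall_1 / (precision_1 + recall_1)

print(f"accuracy={accuracy:.4f}")
print(f"class 0: precision={precision_0:.3f} recall={recall_0:.3f} f1={f1_0:.3f}")
print(f"class 1: precision={precision_1:.3f} recall={recall_1:.3f} f1={f1_1:.3f}")
```

The single misclassified sample (a true class 1 predicted as 0) is what lowers precision for class 0 and recall for class 1.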



## Hyperparameter Tuning

In [22]:
from sklearn.model_selection import GridSearchCV

In [23]:
logmodel_1 = LogisticRegression()

In [24]:
parameters = {'penalty':['l1','l2'] , "C":[0.01,0.1,1,10,100] , 'solver':['lbfgs','liblinear','sag','saga']}
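Not every solver in this grid supports every penalty: `lbfgs` and `sag` handle only `l2`, while `liblinear` and `saga` handle both `l1` and `l2`. A full cross product therefore includes combinations that cannot be fitted. One possible alternative, sketched below, is to build a list of per-solver grids, a form that `GridSearchCV` also accepts as `param_grid`. The names `SUPPORTED_PENALTIES` and `valid_grid` are illustrative assumptions.

```python
C_VALUES = [0.01, 0.1, 1, 10, 100]

# Penalties each solver supports, restricted to the two penalties used in the notebook
SUPPORTED_PENALTIES = {
    "lbfgs": ["l2"],
    "liblinear": ["l1", "l2"],
    "sag": ["l2"],
    "saga": ["l1", "l2"],
}

# One sub-grid per solver, so only supported combinations are ever tried
valid_grid = [
    {"solver": [solver], "penalty": penalties, "C": C_VALUES}
    for solver, penalties in SUPPORTED_PENALTIES.items()
]

n_valid = sum(len(g["penalty"]) * len(g["C"]) for g in valid_grid)
n_original = 2 * len(C_VALUES) * len(SUPPORTED_PENALTIES)  # full cross product

print("valid combinations:", n_valid)
print("unsupported combinations in the full grid:", n_original - n_valid)
```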

In [25]:
logmodel_cv = GridSearchCV(logmodel_1,parameters)

In [26]:
logmodel_cv.fit(x_train, y_train)

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(

GridSearchCV(estimator=LogisticRegression(),
             param_grid={'C': [0.01, 0.1, 1, 10, 100], 'penalty': ['l1', 'l2'],
                         'solver': ['lbfgs', 'liblinear', 'sag', 'saga']})
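The convergence warning above suggests raising `max_iter` or scaling the data. A minimal sketch of both, assuming synthetic stand-in data (`X_demo` and `y_demo` are hypothetical, not the notebook's dataset): the scaler and the model are wrapped in a pipeline so that scaling is refitted inside each cross-validation fold, and grid keys take the `logisticregression__` prefix that `make_pipeline` assigns to the step.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: one large-scale feature (similar in range to PayloadMass) and one 0/1 flag
rng = np.random.default_rng(0)
X_demo = np.column_stack(
    [rng.uniform(500, 15000, size=80), rng.integers(0, 2, size=80)]
)
y_demo = (X_demo[:, 0] > 6000).astype(int)

# Standardize features, then fit logistic regression with a higher iteration cap
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Parameters of a pipeline step are addressed as <step name>__<parameter>
grid = {"logisticregression__C": [0.01, 0.1, 1, 10, 100]}

search = GridSearchCV(pipe, grid, cv=5)
search.fit(X_demo, y_demo)
print(search.best_params_)
```

The fitted `search.best_estimator_` is already refitted on all the data passed to `fit`, so it can be used for prediction directly.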

In [27]:
print("tuned hyperparameters (best parameters):", logmodel_cv.best_params_)

tuned hyperparameters (best parameters): {'C': 100, 'penalty': 'l1', 'solver': 'liblinear'}


In [28]:
# Rebuild the model with the best parameters reported by the grid search
logmodel_1 = LogisticRegression(C=100, penalty='l1', solver='liblinear')

In [29]:
logmodel_1.fit(x_train, y_train)

LogisticRegression(C=100, penalty='l1', solver='liblinear')

In [30]:
predictions_1 = logmodel_1.predict(x_test)

In [31]:
confusion_matrix(y_test,predictions_1)

array([[ 7,  0],
       [ 0, 11]], dtype=int64)

In [32]:
accuracy_score(y_test,predictions_1, normalize=False )

18

In [33]:
accuracy_score(y_test,predictions_1, normalize=True )

1.0

In [34]:
print(classification_report(y_test,predictions_1))

              precision    recall  f1-score   support

           0       1.00      1.00      1.00         7
           1       1.00      1.00      1.00        11

    accuracy                           1.00        18
   macro avg       1.00      1.00      1.00        18
weighted avg       1.00      1.00      1.00        18



## Hyperparameter Tuning: Restricted Grid

In [35]:
logmodel_2 = LogisticRegression()

In [36]:
parameters_1 = {'penalty':['l2'] , "C":[0.01,0.1,1] , 'solver':['lbfgs']}

In [37]:
logmodel_2_cv = GridSearchCV(logmodel_2 , parameters_1)

In [38]:
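# Caution: this search is fitted on the test split, so the test data influences model selection.
# Fitting on x_train, y_train is the usual practice. The outputs below come from the call as written.
logmodel_2_cv.fit(x_test, y_test)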

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(

GridSearchCV(estimator=LogisticRegression(),
             param_grid={'C': [0.01, 0.1, 1], 'penalty': ['l2'],
                         'solver': ['lbfgs']})

In [39]:
print("tuned hyperparameters (best parameters):", logmodel_2_cv.best_params_)

tuned hyperparameters (best parameters): {'C': 1, 'penalty': 'l2', 'solver': 'lbfgs'}


In [40]:
logmodel_2 = LogisticRegression(C= 1, penalty= 'l2', solver= 'lbfgs')

In [41]:
logmodel_2.fit(x_train, y_train)

LogisticRegression(C=1)

In [42]:
predictions_2 = logmodel_2.predict(x_test)

In [43]:
confusion_matrix(y_test,predictions_2)

array([[ 7,  0],
       [ 1, 10]], dtype=int64)

In [44]:
accuracy_score(y_test,predictions_2, normalize = False)

17

In [45]:
accuracy_score(y_test,predictions_2, normalize = True)

0.9444444444444444

In [46]:
print(classification_report(y_test,predictions_2))

              precision    recall  f1-score   support

           0       0.88      1.00      0.93         7
           1       1.00      0.91      0.95        11

    accuracy                           0.94        18
   macro avg       0.94      0.95      0.94        18
weighted avg       0.95      0.94      0.94        18

