![Logistic Regression](https://i.ibb.co/Ptg3Czv/kaggle-ml-part1.png)

- **ML Part 1 - Logistic Regression**
- **ML Part 2** - K-Nearest Neighbors (KNN)
- **ML Part 3** - Support Vector Machine (SVM)
- **ML Part 4** - Artificial Neural Network (NN) 
- **ML Part 5** - Classification and Regression Tree (CART)
- **ML Part 6** - Random Forests
- **ML Part 7** - Gradient Boosting Machines (GBM)
- **ML Part 8** - XGBoost
- **ML Part 9** - LightGBM
- **ML Part 10** - CatBoost


Just as linear regression is the natural starting point for regression, logistic regression is the natural starting point for classification algorithms. Despite its name, it is a classification model, not a regression model. It uses the logistic function to model a binary output. The output of logistic regression is a probability p (0 ≤ p ≤ 1), which can be turned into a binary prediction of 0 or 1 with a threshold (output = 0 if p < 0.5, otherwise output = 1).

## Basic Theory
Logistic regression behaves much like linear regression. It also computes a linear output, then passes that output through a logistic function. The most commonly used logistic function is the sigmoid. Below you can see that the value z is the same as the linear regression output in Equation (1).

![](https://i.ibb.co/X7NGG5W/Ek-A-klama-2020-08-27-113356.jpg)

Here $h_\theta(x)$ corresponds to $P(y = 1 \mid x)$, that is, the probability that the output is 1 given the input x. $P(y = 0 \mid x)$ is therefore $1 - h_\theta(x)$.
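A minimal numeric sketch of the two steps above (linear output z, then the sigmoid) may help. The parameter vector `theta` and the input `x` are hypothetical values chosen for illustration, with `x[0] = 1` acting as the bias term.

```python
import numpy as np

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical parameters and one input; x[0] = 1 plays the role of the bias term
theta = np.array([-1.0, 2.0, 0.5])
x = np.array([1.0, 0.8, -0.4])

z = theta @ x            # same linear output as linear regression
p_y1 = sigmoid(z)        # h_theta(x) = P(y = 1 | x)
p_y0 = 1.0 - p_y1        # P(y = 0 | x)
print(f"z = {z:.2f}, P(y=1|x) = {p_y1:.3f}, P(y=0|x) = {p_y0:.3f}")
```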

When z is 0, g(z) is 0.5. When z is positive, $h_\theta(x)$ is greater than 0.5 and the predicted output is 1. Similarly, when z is negative, the prediction is 0. Because the classifier is built on a linear equation, its decision boundary is also linear: it is the hyperplane z = 0, which divides the input space into two regions so that all points in the same region receive the same label.
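Since g(z) ≥ 0.5 exactly when z ≥ 0, thresholding the probability at 0.5 is the same as checking which side of the line z = 0 a point falls on. The sketch below checks this on a few hypothetical 2-D points with hand-picked weights (the boundary here is x1 + x2 = 3).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical weights: the decision boundary is the line -3 + x1 + x2 = 0
theta0, theta = -3.0, np.array([1.0, 1.0])
X = np.array([[0.5, 1.0], [2.0, 2.5], [1.5, 1.5], [3.0, 0.1]])

z = theta0 + X @ theta
labels_from_prob = (sigmoid(z) >= 0.5).astype(int)   # threshold the probability
labels_from_sign = (z >= 0).astype(int)              # check the side of the boundary

print(z)
print(labels_from_prob, labels_from_sign)
```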

The figure below shows the shape of the sigmoid function.

![](https://i.ibb.co/mtbyBzZ/Ek-A-klama-2020-08-27-113557.jpg)

## Loss Function

We cannot use Mean Squared Error (MSE) as the loss function (as in linear regression), because the model ends with a nonlinear sigmoid function. Combined with the sigmoid, MSE becomes non-convex and can have local minima, which hampers the Gradient Descent algorithm.
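The non-convexity can be seen numerically on a toy problem. This sketch assumes a single training example with x = 1 and y = 1, so the loss depends only on one weight w, and uses second differences on a grid as a stand-in for curvature.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy problem: one example with x = 1, y = 1; the only parameter is the weight w
w = np.linspace(-8, 8, 2001)
mse = (sigmoid(w) - 1.0) ** 2          # squared error through the sigmoid
ce = -np.log(sigmoid(w))               # cross-entropy for y = 1

# Second differences approximate each loss's curvature along w
mse_curv = np.diff(mse, n=2)
ce_curv = np.diff(ce, n=2)

print("MSE has negative curvature somewhere:", bool((mse_curv < 0).any()))
# Small tolerance for floating-point rounding
print("Cross-entropy curvature is never negative:", bool((ce_curv >= -1e-12).all()))
```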

So here we use Cross-Entropy as the loss function. It is defined by two equations, one for y = 1 and one for y = 0. The basic logic is that when a prediction is completely wrong (for example, y' = 1 while y = 0), the cost is -log(0), which is infinite.

![](https://i.ibb.co/MDRJZKw/Ek-A-klama-2020-08-27-113827.jpg)

In the equation above, m is the size of the training data, y' is the predicted output and y is the actual output.
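A small sketch of the averaged cross-entropy above. The clipping constant `eps` is an assumption added here so that a confidently wrong prediction produces a very large but finite cost instead of `-log(0)`; the label and probability arrays are made-up examples.

```python
import numpy as np

def cross_entropy(y_true, y_prob, eps=1e-12):
    """Average binary cross-entropy: J = -(1/m) * sum(y*log(y') + (1-y)*log(1-y'))."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip so that a confident wrong prediction gives a huge but finite cost
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1 - eps)
    m = y_true.shape[0]
    return -(1.0 / m) * np.sum(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))

y = np.array([1, 0, 1, 0])
good = np.array([0.9, 0.1, 0.8, 0.2])
bad = np.array([0.1, 0.9, 0.2, 1.0])   # the last prediction is confidently wrong

print(cross_entropy(y, good))
print(cross_entropy(y, bad))
```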


## Advantages
- Easy, fast and simple method of classification.
- The θ parameters describe the direction and strength of each independent variable's effect on the dependent variable.
- It can also be used for multi-class classification.
- Its loss function is always convex.
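One common way to extend logistic regression to more than two classes is the multinomial (softmax) form: one linear output per class, normalized so the class probabilities sum to 1. One-vs-rest is another option. The weights `W` and bias `b` below are hypothetical values for three classes and two features.

```python
import numpy as np

def softmax(Z):
    """Row-wise softmax; subtracting the row maximum keeps exp() from overflowing."""
    Z = Z - Z.max(axis=1, keepdims=True)
    expZ = np.exp(Z)
    return expZ / expZ.sum(axis=1, keepdims=True)

# Hypothetical weights: 2 features x 3 classes, plus one bias per class
W = np.array([[1.0, -1.0, 0.0],
              [0.5, 0.5, -1.0]])
b = np.array([0.0, 0.1, -0.1])
X = np.array([[2.0, 0.0], [-1.0, 1.0]])

probs = softmax(X @ W + b)       # one probability per class for each row
print(probs.round(3))
print(probs.argmax(axis=1))      # predicted class = most probable one
```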


## Disadvantages
- It cannot be applied to nonlinear classification problems.
- Appropriate feature selection is required.
- Good signal-to-noise ratio is expected.
- Collinearity and outliers deteriorate the accuracy of the LR model.
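XOR is the classic example of the first limitation. In this sketch, a brute-force search over a grid of linear boundaries on the raw features never classifies all four XOR points correctly. Adding a hand-crafted product feature `x1 * x2`, which is an illustrative choice and not something logistic regression does on its own, makes a linear boundary sufficient. This also illustrates why feature selection and engineering matter.

```python
import itertools
import numpy as np

# XOR: the positive class is where exactly one input is 1
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0])

def accuracy(w, b, features):
    """Accuracy of the linear rule 'predict 1 if features @ w + b > 0'."""
    return np.mean((features @ w + b > 0).astype(int) == y)

# Try a grid of linear boundaries on the raw features
grid = np.arange(-5, 5.5, 0.5)
best = max(accuracy(np.array([w1, w2]), b, X)
           for w1, w2, b in itertools.product(grid, grid, grid))
print("best raw-feature accuracy:", best)

# With the extra feature x1*x2, XOR becomes linearly separable
X_aug = np.column_stack([X, X[:, 0] * X[:, 1]])
w_aug, b_aug = np.array([10.0, 10.0, -20.0]), -5.0
print("augmented accuracy:", accuracy(w_aug, b_aug, X_aug))
```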


## Hyperparameters
Logistic regression hyperparameters are similar to those of linear regression. The learning rate (α) and the regularization parameter (λ) must be set correctly to achieve high accuracy.
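A sketch of where α and λ enter gradient descent, assuming the common L2 penalty (λ / 2m)·‖w‖² with an unpenalized bias. The synthetic data is hypothetical. Larger λ should shrink the learned weights.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost_and_grads(w, b, X, y, lam):
    """L2-regularized cross-entropy and its gradients; the bias b is not penalized."""
    m = X.shape[0]
    p = sigmoid(X @ w + b)
    cost = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)) + (lam / (2 * m)) * np.sum(w ** 2)
    dw = X.T @ (p - y) / m + (lam / m) * w
    db = np.mean(p - y)
    return cost, dw, db

def train(X, y, alpha, lam, n_iter=3000):
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(n_iter):
        _, dw, db = cost_and_grads(w, b, X, y, lam)
        w -= alpha * dw          # alpha: learning rate (step size)
        b -= alpha * db
    return w, b

# Hypothetical synthetic data with two informative features and some label noise
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=200) > 0).astype(float)

norms = []
for lam in (0.0, 10.0, 100.0):
    w, b = train(X, y, alpha=0.1, lam=lam)
    norms.append(np.linalg.norm(w))
    print(f"lambda={lam:>5}: ||w|| = {norms[-1]:.3f}")
```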


## Comparison with Other Models



![Logistic Regression vs SVM](https://i.ibb.co/MDTY8GG/lg-vs-svm.png)
- While SVM can handle nonlinear solutions, logistic regression can only process linear solutions.
- Linear SVM handles outliers better because it finds the maximum-margin solution.
- The hinge loss in SVM outperforms the log loss in LR.
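To make the last point concrete, both losses can be written as functions of the margin t = y·f(x) with labels y in {-1, +1}. Hinge loss is exactly zero once a point is beyond the margin (t ≥ 1), while log loss stays positive for every point. The margin values below are arbitrary examples.

```python
import numpy as np

# Margins t = y * f(x), with labels y in {-1, +1}
margins = np.array([-2.0, 0.0, 0.5, 1.0, 3.0])
hinge = np.maximum(0.0, 1.0 - margins)     # SVM hinge loss
log_loss = np.log1p(np.exp(-margins))      # logistic (log) loss

for t, h, l in zip(margins, hinge, log_loss):
    print(f"margin={t:>4}: hinge={h:.3f}  log={l:.3f}")
```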

![Logistic Regression vs Decision Trees](https://i.ibb.co/3czHSLR/lg-vs-Decision-Trees.png)
- Decision tree handles collinearity better than LR.
- Decision trees cannot derive the statistical significance of individual features, but LR can (through its coefficients).
- Decision trees are better for categorical values than LR.

![Logistic Regression vs Neural Network(NN)](https://i.ibb.co/PTCJCx8/lr-vs-Neural-Network.png)
- NN can support nonlinear solutions that LR cannot.
- LR has a convex loss function, so gradient descent will not get stuck in a local minimum, whereas an NN can.
- LR can perform better than an NN when training data is scarce and the number of features is large, while an NN needs a large amount of training data.

![Logistic Regression vs Naive Bayes](https://i.ibb.co/Z8cgqwd/lg-vs-naive-bayes.png)
- Naive Bayes is a generative model, while LR is a discriminative model.
- Naive Bayes works well with small data sets, whereas regularized LR can achieve similar performance.
- Since Naive Bayes assumes all features are independent, LR handles collinearity better than Naive Bayes.

![Logistic Regression vs KNN](https://i.ibb.co/tXjJMmM/lr-vs-knn.png)
- KNN is a nonparametric model, whereas LR is a parametric model.
- KNN is relatively slower than Logistic Regression.
- KNN supports nonlinear solutions, whereas LR only supports linear solutions.
- LR can provide a confidence level (a probability) for its prediction, whereas KNN can only output class labels.

## Coding Time
![](https://ac-cdn.azureedge.net/infusionnewssiteimages/agingcare/21e637ea-aa74-4ae2-b278-181d2cded7a3.jpg)

### Import the necessary packages

In [None]:
import numpy as np
import pandas as pd

import seaborn as sns
import plotly.express as px
import matplotlib.pyplot as plt
%matplotlib inline


import sklearn
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split, GridSearchCV, cross_val_score
from sklearn.metrics import f1_score, recall_score, precision_score, confusion_matrix
from sklearn.metrics import r2_score, roc_auc_score, roc_curve, classification_report

### Import and read dataset

In [None]:
input_ = "../input/heart-failure-clinical-data/heart_failure_clinical_records_dataset.csv"
data = pd.read_csv(input_)
df = data.copy()

data.head(10)

In [None]:
df.describe()

### Data Visualization

In [None]:
fig = px.histogram(data, "age", title="Age Distribution", width=750)
fig.show()

In [None]:
fig = px.histogram(data, "time", title="Time Distribution", width=750)
fig.show()

In [None]:
fig = px.histogram(data, "creatinine_phosphokinase", title="Creatinine Phosphokinase Distribution", width=750)
fig.show()

In [None]:
fig = px.histogram(data, "ejection_fraction", title="Ejection Fraction Distribution", width=750)
fig.show()

In [None]:
fig = px.histogram(data, "platelets", title="Platelets Distribution", width=750)
fig.show()

In [None]:
fig = px.histogram(data, "serum_creatinine", title="Serum Creatinine Distribution", width=750)
fig.show()

In [None]:
fig = px.histogram(data, "serum_sodium", title="Serum Sodium Distribution", width=750)
fig.show()

In [None]:
anaemia_dis = data["anaemia"].value_counts().rename_axis("anaemia").reset_index(name="count")
fig = px.bar(anaemia_dis, x="anaemia", y="count", title="Anaemia Distribution",
             width=750, labels={"anaemia": "Anaemia", "count": "Count"})
fig.show()

In [None]:
diabetes_dis = data["diabetes"].value_counts().rename_axis("diabetes").reset_index(name="count")
fig = px.bar(diabetes_dis, x="diabetes", y="count", title="Diabetes Distribution",
             width=750, labels={"diabetes": "Diabetes", "count": "Count"})
fig.show()

In [None]:
hbp_dis = data["high_blood_pressure"].value_counts().rename_axis("high_blood_pressure").reset_index(name="count")
fig = px.bar(hbp_dis, x="high_blood_pressure", y="count", title="High Blood Pressure Distribution",
             width=750, labels={"high_blood_pressure": "High Blood Pressure", "count": "Count"})
fig.show()

In [None]:
sex_dis = data["sex"].value_counts().rename_axis("sex").reset_index(name="count")
fig = px.bar(sex_dis, x="sex", y="count", title="Sex Distribution",
             width=750, labels={"sex": "Sex", "count": "Count"})
fig.show()

In [None]:
smoking_dis = data["smoking"].value_counts().rename_axis("smoking").reset_index(name="count")
fig = px.bar(smoking_dis, x="smoking", y="count", title="Smoking Distribution",
             width=750, labels={"smoking": "Smoking", "count": "Count"})
fig.show()

In [None]:
death_dis = data["DEATH_EVENT"].value_counts().rename_axis("DEATH_EVENT").reset_index(name="count")
fig = px.bar(death_dis, x="DEATH_EVENT", y="count", title="DEATH EVENT Distribution",
             width=750, labels={"DEATH_EVENT": "DEATH_EVENT", "count": "Count"})
fig.show()

In [None]:
fig = px.pie(data, values='DEATH_EVENT',names='sex', title='GENDER',
      width=680, height=480)
fig.show()

In [None]:
f, ax = plt.subplots(figsize=(14,14))
sns.heatmap(data.corr(), annot=True, linewidths=.5, fmt=".1f", ax=ax)
plt.show()

In [None]:
sns.pairplot(data[['age', 'creatinine_phosphokinase',
       'ejection_fraction', 'platelets',
       'serum_creatinine', 'serum_sodium','time',
       'DEATH_EVENT']], hue="DEATH_EVENT")

## Model Construction (from scratch)

In [None]:
inp_data = data.drop(columns=["DEATH_EVENT"])
out_data = data[['DEATH_EVENT']]

scaler = StandardScaler()
inp_data = scaler.fit_transform(inp_data)

X_train, X_test, y_train, y_test = train_test_split(inp_data, out_data, test_size=0.2, random_state=42)
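The cell above fits `StandardScaler` on all rows before splitting, so the test rows also influence the scaling statistics. A common alternative is to split first, fit the scaler on the training split only (`scaler.fit_transform(X_train)`), and apply it to the test split with `scaler.transform(X_test)`. The NumPy sketch below shows the same idea on hypothetical random data; like `StandardScaler`, it uses the population standard deviation.

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(loc=50.0, scale=10.0, size=(100, 3))   # hypothetical raw features

# Learn the scaling statistics from the training rows only ...
X_train, X_test = X[:80], X[80:]
mean = X_train.mean(axis=0)
std = X_train.std(axis=0)          # population std (ddof=0)

# ... then apply the same transform to both splits
X_train_scaled = (X_train - mean) / std
X_test_scaled = (X_test - mean) / std
print(X_train_scaled.mean(axis=0).round(6), X_train_scaled.std(axis=0).round(6))
```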

In [None]:
print("X_train Shape : ", X_train.shape)
print("X_test Shape  : ", X_test.shape)
print("y_train Shape : ", y_train.shape)
print("y_test Shape  : ", y_test.shape)

In [None]:
def weightInitialization(n_features):
    w = np.zeros((1, n_features))
    b = 0
    return w,b

In [None]:
def sigmoid_activation(result):
    final_result = 1/(1 + np.exp(-result))
    return final_result
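`np.exp(-result)` overflows and emits a RuntimeWarning when `result` is a large negative number. A common workaround, shown here as a sketch and not as a replacement required by the notebook, only ever exponentiates non-positive values.

```python
import numpy as np

def stable_sigmoid(z):
    """Sigmoid that never passes a large positive number to exp()."""
    z = np.asarray(z, dtype=float)
    e = np.exp(-np.abs(z))               # always in (0, 1], so it cannot overflow
    # z >= 0: 1 / (1 + e^-z);  z < 0: e^z / (1 + e^z), both written with e = e^-|z|
    return np.where(z >= 0, 1.0 / (1.0 + e), e / (1.0 + e))

print(stable_sigmoid(np.array([-1000.0, -5.0, 0.0, 5.0, 1000.0])))
```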

Cost Formula
![Cost Formula](https://miro.medium.com/max/2908/1*dEZxrHeNGlhfNt-JyRLpig.png)

In [None]:
def model_optimize(w, b, X, Y):
    m = X.shape[0]
    
    # Prediction
    final_result = sigmoid_activation(np.dot(w,X.T) + b)
    cost = (-1/m) * np.sum(Y.T * np.log(final_result) + (1 - Y.T) * np.log(1 - final_result))
    
    # Gradient Calculation
    dw = (1/m)*(np.dot(X.T, (final_result-Y.T).T)) # see the gradient formulas in the image below
    db = (1/m)*(np.sum(final_result-Y.T))
    
    grads = {
        "dw": dw,
        "db": db
    }
    
    return grads, cost

![](https://i.ibb.co/ZV334Mn/20.png)
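One way to check the gradient formulas is to compare them with central finite differences of the cost. This sketch reproduces the shapes used in `model_optimize` (w is (1, n), X is (m, n), Y is (m, 1)) on small random data. The helper names `cost_fn` and `analytic_grads` are introduced here for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost_fn(w, b, X, Y):
    """Cross-entropy with the same shapes as model_optimize."""
    m = X.shape[0]
    A = sigmoid(np.dot(w, X.T) + b)                       # (1, m)
    return (-1 / m) * np.sum(Y.T * np.log(A) + (1 - Y.T) * np.log(1 - A))

def analytic_grads(w, b, X, Y):
    m = X.shape[0]
    A = sigmoid(np.dot(w, X.T) + b)
    dw = (1 / m) * np.dot(X.T, (A - Y.T).T)               # (n, 1)
    db = (1 / m) * np.sum(A - Y.T)
    return dw, db

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 4))
Y = (rng.random((50, 1)) > 0.5).astype(float)
w, b = rng.normal(size=(1, 4)) * 0.1, 0.2
dw, db = analytic_grads(w, b, X, Y)

# Central finite differences, one weight at a time
eps = 1e-6
num_dw = np.zeros_like(dw)
for j in range(w.shape[1]):
    w_plus, w_minus = w.copy(), w.copy()
    w_plus[0, j] += eps
    w_minus[0, j] -= eps
    num_dw[j, 0] = (cost_fn(w_plus, b, X, Y) - cost_fn(w_minus, b, X, Y)) / (2 * eps)
num_db = (cost_fn(w, b + eps, X, Y) - cost_fn(w, b - eps, X, Y)) / (2 * eps)

print("max |dw - numerical|:", np.max(np.abs(dw - num_dw)))
print("|db - numerical|:", abs(db - num_db))
```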

In [None]:
def model_predict(w, b, X, Y, learning_rate, no_iterations):
    costs = []
    for i in range(no_iterations):
        grads, cost = model_optimize(w, b, X, Y)
        dw = grads['dw']
        db = grads['db']
        
        w = w - (learning_rate * dw.T) # look up (photo)
        b = b - (learning_rate * db)
        
        if (i % 100 == 0):
            costs.append(cost)
            
    # final parameters
    coeff = {"w":w, "b":b}
    gradient = {"dw":dw, "db":db}
    
    return coeff, gradient, costs

In [None]:
def predict(final_pred, m):
    y_pred = np.zeros((1,m))
    for i in range(final_pred.shape[1]):
        if final_pred[0][i] > 0.5:
            y_pred[0][i] = 1
    return y_pred
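The loop in `predict` can also be written as a single vectorized comparison. The sketch below copies the loop version under the name `predict_loop` and checks that both give identical labels on random probabilities.

```python
import numpy as np

def predict_loop(final_pred, m):
    # Same logic as predict() above
    y_pred = np.zeros((1, m))
    for i in range(final_pred.shape[1]):
        if final_pred[0][i] > 0.5:
            y_pred[0][i] = 1
    return y_pred

def predict_vectorized(final_pred):
    # One boolean comparison over the whole (1, m) array, cast to float labels
    return (final_pred > 0.5).astype(float)

probs = np.random.default_rng(7).random((1, 1000))
print(np.array_equal(predict_loop(probs, probs.shape[1]), predict_vectorized(probs)))
```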

In [None]:
# Get number of features
n_features = X_train.shape[1]
print('Number of Features: {}'.format(n_features))

w, b = weightInitialization(n_features)
# Gradient Descent
coeff, gradient, costs = model_predict(w, b, X_train, y_train.values.reshape(-1,1), learning_rate=0.0001,no_iterations=4500)
# Final Prediction
w = coeff['w']
b = coeff['b']
print('Optimized weights: {}'.format(w))
print('Optimized intercept: {}'.format(b))

final_train_pred = sigmoid_activation(np.dot(w,X_train.T)+b)
final_test_pred = sigmoid_activation(np.dot(w,X_test.T)+b)

print("="*60)

y_train_pred = predict(final_train_pred, X_train.shape[0])
print('Training Accuracy             : {:.4f}'.format(accuracy_score(y_train, y_train_pred.ravel())))

y_test_pred = predict(final_test_pred, X_test.shape[0])
print('Test Accuracy                 : {:.4f}'.format(accuracy_score(y_test, y_test_pred.ravel())))

# scikit-learn metrics take (y_true, y_pred); swapping them exchanges precision and recall
print('Logistic Regression f1-score  : {:.4f}'.format(f1_score(y_test, y_test_pred.ravel())))
print('Logistic Regression precision : {:.4f}'.format(precision_score(y_test, y_test_pred.ravel())))
print('Logistic Regression recall    : {:.4f}'.format(recall_score(y_test, y_test_pred.ravel())))
print("\n", classification_report(y_test, y_test_pred.ravel()))
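The cell above trains with `learning_rate=0.0001` for 4500 iterations. This sketch, on hypothetical standardized data roughly the size of the training split, compares how far plain batch gradient descent lowers the cost with different learning rates. With all-zero initial weights, the cost starts at log 2 ≈ 0.693.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gd_cost_history(X, y, learning_rate, no_iterations):
    """Batch gradient descent on the cross-entropy; returns the cost at each iteration."""
    m, n = X.shape
    w, b = np.zeros(n), 0.0
    history = []
    for _ in range(no_iterations):
        p = sigmoid(X @ w + b)
        history.append(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))
        w -= learning_rate * (X.T @ (p - y) / m)
        b -= learning_rate * np.mean(p - y)
    return np.array(history)

# Hypothetical standardized data: 240 rows, 12 features, a linear signal plus noise
rng = np.random.default_rng(3)
X = rng.normal(size=(240, 12))
y = (X[:, 0] - X[:, 1] + rng.normal(size=240) > 0).astype(float)

final_costs = {}
for lr in (0.0001, 0.01, 0.1):
    h = gd_cost_history(X, y, learning_rate=lr, no_iterations=4500)
    final_costs[lr] = h[-1]
    print(f"learning_rate={lr}: cost {h[0]:.4f} -> {h[-1]:.4f}")
```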

In [None]:
cf_matrix = confusion_matrix(y_test, y_test_pred.ravel())
sns.heatmap((cf_matrix / np.sum(cf_matrix)*100), annot = True, fmt=".2f", cmap="Blues")
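scikit-learn's `confusion_matrix(y_true, y_pred)` puts actual classes on the rows and predicted classes on the columns, so for binary labels the layout is [[TN, FP], [FN, TP]]. This from-scratch sketch on made-up labels shows that layout, how precision and recall are read from it, and that swapping the two arguments transposes the matrix, which exchanges FP and FN.

```python
import numpy as np

def binary_confusion(y_true, y_pred):
    """Rows = actual class, columns = predicted class (scikit-learn's layout)."""
    y_true = np.asarray(y_true).ravel()
    y_pred = np.asarray(y_pred).ravel()
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    tp = np.sum((y_true == 1) & (y_pred == 1))
    return np.array([[tn, fp], [fn, tp]])

y_true = [0, 0, 0, 1, 1, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0, 1, 1, 0]
cm = binary_confusion(y_true, y_pred)
(tn, fp), (fn, tp) = cm
precision = tp / (tp + fp)   # share of predicted positives that are real positives
recall = tp / (tp + fn)      # share of real positives that were found
print(cm)
print(precision, recall)

# Swapping the arguments transposes the matrix, so FP and FN trade places
print(np.array_equal(binary_confusion(y_pred, y_true), cm.T))
```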

## Reporting

I evaluated the results with a confusion matrix on the 60-sample test set. The results are as follows:

**Correctly predicted -> 81.67% (49 of 60 test predictions are correct)**
- True Negative -> 56.67% -> Those who were predicted not to die and who did not die
- True Positive -> 25.00% -> Those who were predicted to die and who did die

**Wrongly predicted -> 18.33% (11 of 60 test predictions are wrong)**
- False Positive -> 16.67% -> Those who were predicted to die but who did not die
- False Negative -> 1.67% -> Those who were predicted not to die but who did die

**Not dead**
- 203 -> Those who did not die in the full data set (299 records)
- 35 -> Those predicted not to die in the test set (TN + FN)

**The dead**
- 96 -> Those who died in the full data set
- 25 -> Those predicted to die in the test set (TP + FP)
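The test-set counts follow from the heatmap percentages once the test-set size is known. `train_test_split` rounds the test size up, so `test_size=0.2` on 299 rows gives 60 test samples. The percentages in this sketch are copied from the report; everything else is plain arithmetic.

```python
import math

n_samples = 299
n_test = math.ceil(0.2 * n_samples)   # the test size is rounded up: 59.8 -> 60

# Percentages read off the normalized confusion-matrix heatmap
percent = {"TN": 56.67, "TP": 25.00, "FP": 16.67, "FN": 1.67}
counts = {k: round(v / 100 * n_test) for k, v in percent.items()}

correct = counts["TN"] + counts["TP"]
print(n_test, counts, correct)
```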

## After the SMOTE process (shortcut with scikit-learn's LogisticRegression)

In [None]:
from imblearn.over_sampling import SMOTE
from sklearn.linear_model import LogisticRegression

# Split before oversampling so that synthetic samples are built from training rows only
X_train, X_test, y_train, y_test = train_test_split(inp_data, out_data.values.ravel(), test_size=0.2, random_state=42)

sms = SMOTE(random_state=12345)
X_train, y_train = sms.fit_resample(X_train, y_train)   # fit_resample replaces the older fit_sample

print("X_train Shape : ", X_train.shape)
print("X_test Shape  : ", X_test.shape)
print("y_train Shape : ", y_train.shape)
print("y_test Shape  : ", y_test.shape)

In [None]:
logreg = LogisticRegression()
logreg.fit(X_train, y_train)

y_pred = logreg.predict(X_test)

print('Accuracy of logistic regression classifier on test set: {}'.format(logreg.score(X_test, y_test)))
print('Logistic Regression f1-score  : {:.4f}'.format(f1_score(y_test, y_pred)))
print('Logistic Regression precision : {:.4f}'.format(precision_score(y_test, y_pred)))
print('Logistic Regression recall    : {:.4f}'.format(recall_score(y_test, y_pred)))
print("\n", classification_report(y_test, y_pred))
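To show what SMOTE does to the minority class, here is a simplified sketch. It is a hypothetical helper, not imbalanced-learn's implementation: each synthetic point is placed at a random position on the segment between a minority sample and one of its k nearest minority neighbours (k = 5 matches imbalanced-learn's default `k_neighbors`). The minority data is random and purely illustrative.

```python
import numpy as np

def smote_sketch(X_minority, n_new, k=5, seed=0):
    """Simplified SMOTE-style oversampling: x_new = x_i + u * (x_nn - x_i), u ~ U(0, 1)."""
    rng = np.random.default_rng(seed)
    # Pairwise Euclidean distances inside the minority class
    d = np.linalg.norm(X_minority[:, None, :] - X_minority[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)                 # a point is not its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :k]   # indices of the k nearest neighbours
    new_points = []
    for _ in range(n_new):
        i = rng.integers(len(X_minority))       # random minority sample
        j = neighbours[i, rng.integers(k)]      # one of its neighbours
        u = rng.random()                        # position along the segment
        new_points.append(X_minority[i] + u * (X_minority[j] - X_minority[i]))
    return np.array(new_points)

X_min = np.random.default_rng(1).normal(size=(20, 3))
synthetic = smote_sketch(X_min, n_new=15)
print(synthetic.shape)
```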

In [None]:
cf_matrix = confusion_matrix(y_test, y_pred)
sns.heatmap((cf_matrix / np.sum(cf_matrix)*100), annot = True, fmt=".2f", cmap="Blues")

In [None]:
logit_roc_auc = roc_auc_score(y_test, logreg.predict_proba(X_test)[:, 1])   # AUC from probabilities, matching the plotted curve
fpr, tpr, thresholds = roc_curve(y_test, logreg.predict_proba(X_test)[:,1])
plt.figure(figsize=(10,6))
plt.plot(fpr, tpr, label='Logistic Regression (area = %0.2f)' % logit_roc_auc)
plt.plot([0, 1], [0, 1],'r--')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC Curve')
plt.legend(loc="lower right")
plt.savefig('Log_ROC')
plt.show()
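ROC AUC equals the probability that a randomly chosen positive receives a higher score than a randomly chosen negative, with ties counted as one half. This sketch computes it directly from that definition on hypothetical scores. It shows why the AUC should be computed from predicted probabilities: thresholding at 0.5 first discards ranking information and can lower the value.

```python
import numpy as np

def auc_rank(y_true, scores):
    """ROC AUC as P(score of a random positive > score of a random negative); ties count 1/2."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    diff = pos[:, None] - neg[None, :]          # every positive-negative pair
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size

y = [0, 0, 1, 1]
proba = [0.1, 0.6, 0.7, 0.8]                    # hypothetical predicted probabilities
labels = (np.array(proba) > 0.5).astype(int)    # hard predictions at threshold 0.5

print(auc_rank(y, proba))    # 1.0
print(auc_rank(y, labels))   # 0.75
```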