# Logistic Regression

**Objective:** To build a logistic regression model for classification using scikit-learn.

**Secondary Objectives:**
* To study the sigmoid function and maximum likelihood estimation.
* To study the confusion matrix and the ROC curve.


Logistic regression is a statistical method for predicting binary classes. The output variable has two possible outcomes, 0 or 1. **For example:** whether an email is spam or not, whether a transaction is fraudulent or not, whether a tumor is benign or malignant.


Logistic regression is mainly used for problems that require a probability estimate as output, which is then mapped to a 0/1 class. It is closely related to linear regression, but the output variable is categorical rather than continuous. When the output has more than two unordered categories, such as married/unmarried/divorced, the problem is handled by multinomial logistic regression.

**Linear Regression Equation:**
![Linear regression equation](https://res.cloudinary.com/dyd911kmh/image/upload/f_auto,q_auto:best/v1534281880/image1_ga8gze.png)

Where y is the dependent variable and x1, x2, ..., xn are the explanatory variables.


**Sigmoid Function:**
The sigmoid function, also called the logistic function, gives an 'S'-shaped curve that maps any real-valued number to a value between 0 and 1. As the input goes to positive infinity, the predicted y approaches 1, and as the input goes to negative infinity, the predicted y approaches 0. If the output of the sigmoid function is more than 0.5, we can classify the outcome as 1 or YES, and if it is less than 0.5, we can classify it as 0 or NO.

![Sigmoid function](http://res.cloudinary.com/dyd911kmh/image/upload/f_auto,q_auto:best/v1534281880/image2_kwxquj.png)

![Sigmoid curve](http://res.cloudinary.com/dyd911kmh/image/upload/f_auto,q_auto:best/v1534281070/sigmoid2_lv4c7i.png)
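
As a minimal sketch of the behaviour described above, the sigmoid can be written in a few lines of NumPy. The function name `sigmoid` and the sample inputs are chosen here for illustration, and sending a value of exactly 0.5 to class 1 is a convention assumed in this example only.

```python
import numpy as np

def sigmoid(z):
    """Map any real number (or array of numbers) into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative inputs ranging from strongly negative to strongly positive
z = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
probs = sigmoid(z)

# Apply the 0.5 threshold; ties go to class 1 (assumption for this sketch)
labels = (probs >= 0.5).astype(int)

print(probs)
print(labels)
```

Large negative inputs give values close to 0, large positive inputs give values close to 1, and an input of 0 gives exactly 0.5.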

**Applying Sigmoid function on linear equation:**
![Sigmoid function applied to the linear equation](http://res.cloudinary.com/dyd911kmh/image/upload/f_auto,q_auto:best/v1534281880/image3_qldafx.png)
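
For reference, the three steps can be written out in LaTeX. This is the standard textbook form, assumed to match the images above; the coefficient symbols $\beta_0, \dots, \beta_n$ are notation introduced here.

```latex
\begin{align}
y &= \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n
  && \text{(linear equation)} \\
\sigma(y) &= \frac{1}{1 + e^{-y}}
  && \text{(sigmoid function)} \\
p &= \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \dots + \beta_n x_n)}}
  && \text{(sigmoid applied to the linear equation)} \\
\log\frac{p}{1 - p} &= \beta_0 + \beta_1 x_1 + \dots + \beta_n x_n
  && \text{(equivalent log-odds form)}
\end{align}
```

The last line shows that the model is linear in the log-odds of the positive class, which is why the coefficients are usually interpreted on that scale.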

**Maximum Likelihood Estimation:** Maximizing the likelihood function determines the parameters that are most likely to produce the observed data. For a normal distribution, for example, MLE estimates the mean and variance. For logistic regression, the parameters being estimated are the coefficients of the linear equation, chosen so that the predicted probabilities match the observed 0/1 outcomes as closely as possible.
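
To make the idea concrete, here is a rough sketch of maximum likelihood estimation for a one-feature logistic model using plain gradient ascent on the log-likelihood. The data is synthetic and the true parameters (`true_b0`, `true_b1`), learning rate and iteration count are all made up for illustration. Libraries such as statsmodels and scikit-learn use more sophisticated optimizers.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (illustrative only): one feature, known "true" parameters
x = rng.normal(size=500)
true_b0, true_b1 = -1.0, 2.0
p_true = 1.0 / (1.0 + np.exp(-(true_b0 + true_b1 * x)))
y = rng.binomial(1, p_true)

# Design matrix with a column of ones for the intercept
X = np.column_stack([np.ones_like(x), x])

def log_likelihood(beta):
    """Bernoulli log-likelihood: sum of y*z - log(1 + e^z), with z = X @ beta."""
    z = X @ beta
    return np.sum(y * z - np.logaddexp(0.0, z))

beta = np.zeros(2)
ll_start = log_likelihood(beta)

# Gradient ascent: the gradient of the log-likelihood is X^T (y - p)
learning_rate = 0.5
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    gradient = X.T @ (y - p) / len(y)
    beta = beta + learning_rate * gradient

ll_end = log_likelihood(beta)
print("estimated intercept and slope:", beta)
print("log-likelihood at start and end:", ll_start, ll_end)
```

The estimates should land near the values used to generate the data, and the log-likelihood at the end is higher than at the all-zero starting point.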


**Types of Logistic Regression:**

**Binary Logistic Regression:** The target variable has only two possible outcomes such as Spam or Not Spam, Cancer or No Cancer.

**Multinomial Logistic Regression:** The target variable has three or more nominal categories such as predicting the type of Wine.

**Ordinal Logistic Regression:** The target variable has three or more ordinal categories such as restaurant or product rating from 1 to 5.
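
As an illustration of the multinomial case, the sketch below fits scikit-learn's `LogisticRegression` on the wine dataset bundled with scikit-learn (`load_wine`, three wine classes). This dataset is not used elsewhere in this notebook; it is picked here only because it matches the wine example above. The scaling step and `max_iter=1000` are choices made for this sketch to help the solver converge.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Bundled example dataset with three target classes
wine = load_wine()
X_wine, y_wine = wine.data, wine.target

# Standardize the features, then fit a logistic regression on all three classes
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clf.fit(X_wine, y_wine)

# One probability per class for each observation; each row sums to 1
proba = clf.predict_proba(X_wine[:5])
print(proba.round(3))
print(clf.predict(X_wine[:5]))
```

Instead of a single probability, the model returns one probability per class and predicts the class with the highest one.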


Here I use an insurance dataset for logistic regression classification. It records whether a customer bought insurance, and the model predicts this on the basis of age.

In [None]:
import numpy as np
import pandas as pd 
import matplotlib.pyplot as plt
df = pd.read_csv('../input/insurance/insurance_data.csv')
df.head()

In [None]:
df.plot.scatter(x='age',y='bought_insurance')

**Model Building using statsmodels and scikit-learn:**

In [None]:
import statsmodels.api as sm
y = df['bought_insurance']
X = sm.add_constant(df[['age']])
mod = sm.Logit(y,X)
result= mod.fit()
print(result.summary())
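
The `coef` column of the summary holds the intercept (`const`) and the slope for `age`. As a hedged illustration of how such coefficients are read, the sketch below uses hypothetical values, not the ones fitted above, to compute the odds ratio per year of age and the age at which the predicted probability crosses 0.5.

```python
import numpy as np

# Hypothetical coefficients for illustration, NOT taken from the fitted model
intercept = -4.0   # would correspond to the `const` row
slope = 0.1        # would correspond to the `age` row

def predict_probability(age):
    """Probability of buying insurance under the hypothetical coefficients."""
    return 1.0 / (1.0 + np.exp(-(intercept + slope * age)))

# Each extra year of age multiplies the odds of buying by exp(slope)
odds_ratio = np.exp(slope)

# The probability is exactly 0.5 where intercept + slope * age = 0
boundary_age = -intercept / slope

print(round(float(odds_ratio), 3))
print(boundary_age)
print(predict_probability(np.array([25, 40, 55])))
```

With the real model, the same calculation can be done by reading the two values from `result.params`.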

In [None]:
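from sklearn.linear_model import LogisticRegression
from sklearn import metrics
# X still contains the `const` column added for statsmodels; scikit-learn
# also fits its own intercept by default, so that column is redundant here.
logreg = LogisticRegression()
logreg.fit(X, y)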

In [None]:
y_pred = logreg.predict(X)
# The score is computed on the same data the model was fitted on (no held-out test set)
print('Accuracy of logistic regression classifier on training set: {:.2f}'.format(logreg.score(X, y)))
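
The accuracy above is measured on the same rows the model was fitted on. A common alternative is to hold out part of the data for testing. The sketch below shows that pattern with `train_test_split`. Because the insurance CSV is not available here, it uses a synthetic stand-in with the same column names; the values, the 25% test size and the random seeds are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the insurance data (values are made up)
rng = np.random.default_rng(42)
age = rng.integers(18, 66, size=200)
p_buy = 1.0 / (1.0 + np.exp(-(age - 40) / 5.0))
demo = pd.DataFrame({"age": age, "bought_insurance": rng.binomial(1, p_buy)})

# Hold out 25% of the rows; the model never sees them during fitting
X_train, X_test, y_train, y_test = train_test_split(
    demo[["age"]], demo["bought_insurance"], test_size=0.25, random_state=0
)

model = LogisticRegression()
model.fit(X_train, y_train)

train_acc = model.score(X_train, y_train)
test_acc = model.score(X_test, y_test)
print("Train accuracy: {:.2f}".format(train_acc))
print("Test accuracy:  {:.2f}".format(test_acc))
```

The test accuracy is usually the more honest estimate of how the model behaves on new customers.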

**The confusion matrix:** 
It shows the ways in which a classification model is confused when it makes predictions. It helps us measure the types of error the model makes when classifying observations into different classes.

**Key Parts Of Confusion Matrix:**

**True Positive (TP):** The cases in which we predicted "YES" and the actual class was also "YES".

**True Negative (TN):** The cases in which we predicted "NO" and the actual class was also "NO".

**False Positive (FP):** The cases in which we predicted "YES" but the actual class was "NO".

**False Negative (FN):** The cases in which we predicted "NO" but the actual class was "YES".


**Key Learning Metrics From Confusion Matrix:**

![Metrics derived from the confusion matrix](https://www.researchgate.net/profile/Ibrahim_Gad3/post/What_is_the_best_metric_precision_recall_f1_and_accuracy_to_evaluate_the_machine_learning_model_for_imbalanced_data/attachment/5f0d92695e3fff000177fe28/AS%3A913156859777024%401594724969138/download/accuracy.png)
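
The usual metrics can be computed directly from the four counts. The sketch below uses the standard definitions of accuracy, precision, recall and F1 on a small made-up pair of label arrays (`y_true`, `y_hat`), which are not taken from the insurance data.

```python
import numpy as np

# Made-up labels for illustration: 1 = "YES", 0 = "NO"
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_hat = np.array([1, 0, 0, 1, 0, 1, 1, 0])

# Count the four cells of the confusion matrix
tp = int(np.sum((y_true == 1) & (y_hat == 1)))
tn = int(np.sum((y_true == 0) & (y_hat == 0)))
fp = int(np.sum((y_true == 0) & (y_hat == 1)))
fn = int(np.sum((y_true == 1) & (y_hat == 0)))

accuracy = (tp + tn) / (tp + tn + fp + fn)   # share of all predictions that are correct
precision = tp / (tp + fp)                   # share of predicted "YES" that are correct
recall = tp / (tp + fn)                      # share of actual "YES" that are found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print(tp, tn, fp, fn)
print(accuracy, precision, recall, f1)
```

These are the same quantities that `classification_report` prints per class.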

In [None]:
from sklearn.metrics import confusion_matrix
# Use a separate name so the imported function is not overwritten
cm = confusion_matrix(y, y_pred)
print(cm)

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

# Rows are actual classes, columns are predicted classes
sns.heatmap(pd.DataFrame(cm), annot=True, cmap="YlGnBu", fmt='g')
plt.title('Confusion matrix', y=1.1)
plt.ylabel('Actual')
plt.xlabel('Predicted')
plt.tight_layout()
plt.show()

In [None]:
from sklearn.metrics import classification_report
print(classification_report(y, y_pred))

One more useful tool to evaluate and compare predictive models is the ROC curve.

**ROC Curve:**

In statistics, a Receiver Operating Characteristic (ROC) curve is a graphical plot that illustrates the performance of a binary classifier. The curve is created by plotting the true positive rate (sensitivity) against the false positive rate (1 - specificity) at various threshold settings. A model that predicts at chance has a ROC curve that lies along the diagonal from (0, 0) to (1, 1); such a model does not discriminate. The further the curve lies above the diagonal, the better the model is at discriminating between positives and negatives in general.

Where,
* **Specificity or True Negative Rate** = TN/(TN+FP)
* **Sensitivity or True Positive Rate** = TP/(TP+FN)
* So, **False Positive Rate** = 1 - Specificity = FP/(FP+TN)
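
To show how one point of the curve arises from these formulas, the sketch below computes the false positive rate and true positive rate by hand at a few thresholds. The labels, scores and the helper name `roc_point` are made up for illustration; `roc_curve` in scikit-learn does the equivalent sweep over thresholds.

```python
import numpy as np

# Made-up true labels and predicted probabilities for four observations
y_true = np.array([0, 0, 1, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8])

def roc_point(threshold):
    """Return (FPR, TPR) when scores >= threshold are classified as positive."""
    y_hat = (scores >= threshold).astype(int)
    tp = np.sum((y_true == 1) & (y_hat == 1))
    fn = np.sum((y_true == 1) & (y_hat == 0))
    fp = np.sum((y_true == 0) & (y_hat == 1))
    tn = np.sum((y_true == 0) & (y_hat == 0))
    tpr = tp / (tp + fn)   # sensitivity
    fpr = fp / (fp + tn)   # 1 - specificity
    return float(fpr), float(tpr)

# Lowering the threshold moves the point from (0, 0) towards (1, 1)
for threshold in [0.9, 0.5, 0.3, 0.0]:
    print(threshold, roc_point(threshold))
```

Each threshold gives one (FPR, TPR) pair, and joining the pairs in order traces the ROC curve.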

In [None]:
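from sklearn.metrics import roc_auc_score
from sklearn.metrics import roc_curve

# Predicted probability of the positive class (column 1)
y_pred_proba = logreg.predict_proba(X)[:, 1]
fpr, tpr, _ = roc_curve(y, y_pred_proba)
auc = roc_auc_score(y, y_pred_proba)
plt.plot(fpr, tpr, label="logistic regression, auc={:.2f}".format(auc))
plt.plot([0, 1], [0, 1], linestyle='--', label="chance")  # diagonal reference line
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.legend(loc=4)
plt.show()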

**Conclusion:** Thus, we have built a classification model with 89% accuracy, measured on the same data used for fitting. The area under the ROC curve is 0.89.