# LOGISTIC REGRESSION

Logistic regression falls under supervised machine learning. It is a classification algorithm.

![logistic.jpg](attachment:logistic.jpg)

Logistic regression is used when the dependent variable (the target) is categorical.

For example:

Predicting whether an email is spam (1) or ham (0)<br>
Predicting whether a tumor is malignant (1) or not (0)

Logistic regression predicts a probability by plugging a linear combination of the given features into the logistic function (also known as the inverse logit or sigmoid), given as:

![1_Z-XP2vn7ZmUFv70TVMzYhA.webp](attachment:1_Z-XP2vn7ZmUFv70TVMzYhA.webp)



![1_xcPQJSUEaeItrNwDUPqEaQ.webp](attachment:1_xcPQJSUEaeItrNwDUPqEaQ.webp)
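
As a rough illustration of the logistic function, the sketch below computes a probability from a linear combination of features. The names `sigmoid` and `predict_probability` and the example weights are assumptions made up for illustration; they do not come from a fitted model.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real z to a value between 0 and 1."""
    # Two branches keep math.exp from overflowing for large |z|
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_probability(features, weights, bias):
    """Plug the linear combination of the features into the logistic function."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)

# Hypothetical weights and bias, chosen only to show the computation
weights = [0.8, -0.5]
bias = 0.1
p = predict_probability([1.0, 2.0], weights, bias)
print(p)
```

Whatever the inputs, the result stays between 0 and 1, which is what lets it be read as a probability.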

## Maths behind Logistic Regression

We could start by assuming that p(x) is a linear function of x. The problem is that p is a probability and must lie between 0 and 1, whereas a linear function is unbounded. Modeling log p(x) as a linear function does not fully solve this, because log p(x) is unbounded in only one direction. The logit transformation, log(p(x)/(1-p(x))), maps the interval (0,1) onto the whole real line, so we set it equal to a linear function of x:
![1_wximNUnd8_VnQ8crCeoqLQ.webp](attachment:1_wximNUnd8_VnQ8crCeoqLQ.webp)

After solving for p(x):
![1_QtzqCwQF5y5by4lqFDmAzQ.webp](attachment:1_QtzqCwQF5y5by4lqFDmAzQ.webp)
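
In case the figures do not render, the algebra can be written out for a single feature as follows. The symbols $\beta_0$ (intercept) and $\beta_1$ (slope) are assumed notation and may differ from the symbols used in the figures.

```latex
\begin{aligned}
\log\frac{p(x)}{1-p(x)} &= \beta_0 + \beta_1 x \\
\frac{p(x)}{1-p(x)} &= e^{\beta_0 + \beta_1 x} \\
p(x) &= \frac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}}
      = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}}
\end{aligned}
```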


To use logistic regression as a linear classifier, we choose a threshold, e.g. 0.5. The misclassification rate is minimized if we predict y=1 when p ≥ 0.5 and y=0 when p < 0.5. Here, 1 and 0 are the classes.
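
The thresholding rule can be sketched in a few lines. The function name `classify` and the probabilities are made up for illustration. Note that p ≥ 0.5 holds exactly when the linear combination inside the logistic function is ≥ 0, which is why the decision boundary is linear.

```python
def classify(probabilities, threshold=0.5):
    """Return 1 where p >= threshold, otherwise 0."""
    return [1 if p >= threshold else 0 for p in probabilities]

# Made-up probabilities for illustration
probs = [0.10, 0.50, 0.73, 0.49]
labels = classify(probs)
print(labels)  # [0, 1, 1, 0]
```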

Since logistic regression predicts probabilities, we can fit it by maximum likelihood. For each training data point x, the observed class is y. The probability of that class is p if y=1 and 1-p if y=0. The likelihood can then be written as:

![1_T4r2zQuToM_S2PP3RiQHaA.webp](attachment:1_T4r2zQuToM_S2PP3RiQHaA.webp)

The multiplication can be transformed into a sum by taking the log:


![1_Vyi1H-a15bL2tDqC_rhYKA.webp](attachment:1_Vyi1H-a15bL2tDqC_rhYKA.webp)
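
To make the log-likelihood concrete, here is a minimal sketch that evaluates it for a few points, assuming every predicted probability lies strictly between 0 and 1. The function name and the numbers are illustrative only.

```python
import math

def log_likelihood(y_true, p_pred):
    """Sum of y*log(p) + (1 - y)*log(1 - p) over all data points."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return total

# Made-up labels and two sets of predicted probabilities
y_true = [1, 0, 1]
p_good = [0.9, 0.1, 0.8]   # agrees with the labels
p_bad = [0.4, 0.6, 0.3]    # disagrees with the labels

ll_good = log_likelihood(y_true, p_good)
ll_bad = log_likelihood(y_true, p_bad)
print(ll_good, ll_bad)
```

Predictions that agree with the observed classes give a higher (less negative) log-likelihood, which is why maximizing it fits the model.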

Substituting the expression for p(x) gives:
![1_c8SqPFvEtN0tmgovdIPWAA.webp](attachment:1_c8SqPFvEtN0tmgovdIPWAA.webp)

The next step is to maximize the log-likelihood above with respect to the model parameters. There is no closed-form solution, so the maximum is found numerically.
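
One simple way to carry out this maximization is gradient ascent, sketched below for a single feature. This is an illustrative assumption, not the method scikit-learn uses internally; the function `fit_logistic`, the learning rate, the iteration count, and the toy data are all made up for the example.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(xs, ys, learning_rate=0.1, n_iters=5000):
    """Fit p(x) = sigmoid(b0 + b1*x) by gradient ascent on the log-likelihood."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(n_iters):
        # Gradient of the log-likelihood: sum of (y - p) times each input
        g0 = sum(y - sigmoid(b0 + b1 * x) for x, y in zip(xs, ys))
        g1 = sum((y - sigmoid(b0 + b1 * x)) * x for x, y in zip(xs, ys))
        # Step uphill, using the average gradient to keep the step size stable
        b0 += learning_rate * g0 / n
        b1 += learning_rate * g1 / n
    return b0, b1

# Toy data with overlapping classes, made up for illustration
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [0, 0, 1, 0, 1, 0, 1, 1]
b0, b1 = fit_logistic(xs, ys)
print(b0, b1)
```

The classes overlap on purpose: with perfectly separable data the likelihood has no finite maximum and the coefficients would keep growing.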

## Heart Disease Prediction Using Logistic Regression

In [1]:
#imports
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

In [2]:
#reading dataset
df = pd.read_csv("dataset/heart.csv")
df.tail()

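|      | age | sex | cp | trestbps | chol | fbs | restecg | thalach | exang | oldpeak | slope | ca | thal | target |
|------|-----|-----|----|----------|------|-----|---------|---------|-------|---------|-------|----|------|--------|
| 1020 | 59  | 1   | 1  | 140      | 221  | 0   | 1       | 164     | 1     | 0.0     | 2     | 0  | 2    | 1      |
| 1021 | 60  | 1   | 0  | 125      | 258  | 0   | 0       | 141     | 1     | 2.8     | 1     | 1  | 3    | 0      |
| 1022 | 47  | 1   | 0  | 110      | 275  | 0   | 0       | 118     | 1     | 1.0     | 1     | 1  | 2    | 0      |
| 1023 | 50  | 0   | 0  | 110      | 254  | 0   | 0       | 159     | 0     | 0.0     | 2     | 0  | 2    | 1      |
| 1024 | 54  | 1   | 0  | 120      | 188  | 0   | 1       | 113     | 0     | 1.4     | 1     | 1  | 3    | 0      |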


In [3]:
#separating x features and y output
x = df[['age','sex','cp','trestbps','chol','fbs','restecg','thalach','exang','oldpeak','slope','ca','thal']].values
y = df['target'].values  # 1-D array, the shape scikit-learn expects for y

In [4]:
#preparing training and testing data sets
x_train, x_test, y_train, y_test = train_test_split(x,y,test_size = 0.3, random_state = 0)

In [5]:
#Applying logistic regression
model = LogisticRegression(C = 0.1, penalty = 'l1', solver = 'liblinear')
model.fit(x_train,y_train)

LogisticRegression(C=0.1, penalty='l1', solver='liblinear')

In [6]:
model.score(x_train,y_train) 

0.8423988842398884

In [7]:
#Reporting accuracy
print("Training Accuracy: %.2f" %model.score(x_train,y_train))
print("Testing Accuracy: %.2f" %model.score(x_test,y_test))

Training Accuracy: 0.84
Testing Accuracy: 0.86
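
For a classifier, `model.score` reports mean accuracy, i.e. the fraction of predictions that match the true labels. The sketch below shows that computation by hand; the function name `accuracy` and the labels are made up and are not taken from the heart dataset.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

# Made-up labels for illustration
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 0]
print("Accuracy: %.2f" % accuracy(y_true, y_pred))  # Accuracy: 0.80
```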
