# Logistic Regression

In statistics, the logistic model (or logit model) is used to model the probability that a certain class or event occurs, such as pass/fail, win/lose, alive/dead, or healthy/sick. It can be extended to several classes of events, such as determining whether an image contains a cat, a dog, a lion, and so on. Each class is then assigned a probability between 0 and 1, and the probabilities sum to one.
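As a minimal sketch of the multi-class case, the softmax function (the usual choice in multinomial logistic regression) turns raw class scores into probabilities that sum to one. The scores and the cat/dog/lion ordering below are hypothetical values chosen for illustration.

```python
import numpy as np

def softmax(scores):
    """Convert raw class scores into probabilities that sum to one."""
    shifted = scores - np.max(scores)  # subtract the max for numerical stability
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum()

# Hypothetical raw scores for the classes cat, dog, lion
class_scores = np.array([2.0, 1.0, 0.1])
class_probs = softmax(class_scores)

# Each probability lies between 0 and 1, and together they sum to one
print(class_probs, class_probs.sum())
```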

Logistic regression is a statistical model that in its basic form uses a logistic function to model a binary dependent variable, although many more complex extensions exist. In regression analysis, logistic regression (or logit regression) is the estimation of the parameters of a logistic model (a form of binary regression).

Mathematically, a binary logistic model has a dependent variable with two possible values, such as pass/fail, represented by an indicator variable whose two values are labeled "0" and "1". In the logistic model, the log-odds (the logarithm of the odds) for the value labeled "1" is a linear combination of one or more independent variables ("predictors"); each independent variable can be a binary variable (two classes, coded by an indicator variable) or a continuous variable (any real value).

The corresponding probability of the value labeled "1" can vary between 0 (certainly the value "0") and 1 (certainly the value "1"), hence the labeling. The function that converts log-odds to probability is the logistic function, hence the name. The unit of measurement for the log-odds scale is called a logit, from logistic unit, hence the alternative names.

Analogous models that use a different sigmoid function instead of the logistic function, such as the probit model, can also be used. The defining characteristic of the logistic model is that increasing one of the independent variables multiplicatively scales the odds of the given outcome at a constant rate, with each independent variable having its own parameter; for a binary dependent variable this generalizes the odds ratio.
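The relationship between log-odds, probability, and the constant odds ratio described above can be sketched in a few lines. The `intercept` and `coef` values are hypothetical and are not fitted to any data.

```python
import numpy as np

def logistic(log_odds):
    """Map log-odds (logits) to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-log_odds))

# Hypothetical parameters for a model with a single predictor
intercept, coef = -1.0, 0.8

def odds(x):
    """Odds p / (1 - p) of the outcome labeled "1" at predictor value x."""
    p = logistic(intercept + coef * x)
    return p / (1.0 - p)

# Raising x by one unit multiplies the odds by exp(coef), wherever x starts
ratio_low = odds(1.0) / odds(0.0)
ratio_high = odds(5.0) / odds(4.0)
print(ratio_low, ratio_high, np.exp(coef))
```

Both ratios match `exp(coef)`, which is why an exponentiated coefficient is read as an odds ratio.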

## Libraries and Utilities

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import confusion_matrix
import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

In [None]:
data = pd.read_csv("/kaggle/input/biomechanical-features-of-orthopedic-patients/column_2C_weka.csv")
data.head()

In [None]:
data.describe().T

In [None]:
data.isnull().sum()

In [None]:
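f, ax = plt.subplots(figsize=(6, 6))
# numeric_only=True skips the string "class" column, which pandas 2.0+ would otherwise reject
sns.heatmap(data.corr(numeric_only=True), annot=True, linewidths=0.5, fmt=".2f", ax=ax)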
plt.xticks(rotation=90)
plt.yticks(rotation=0)
plt.title('Correlation Map')
plt.show()

In [None]:
data["class"] = ["a" if each == "Abnormal" else "n" for each in data["class"]]
data.head()

In [None]:
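# The first six columns are the biomechanical features
x = data.iloc[:, 0:6].values
# The last column is the class label ("a" or "n")
y = data.iloc[:, -1].values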

## Train Test Split

In [None]:
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.13, random_state=0)

## Standard Scaler

In [None]:
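sc = StandardScaler()
# Learn the mean and standard deviation from the training set only
X_train = sc.fit_transform(x_train)
# Reuse the training statistics on the test set
X_test = sc.transform(x_test)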
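As a rough illustration of what the scaler does, the sketch below reproduces `transform` by hand on a small, made-up array. The `toy_*` names are hypothetical stand-ins for `x_train` and `x_test`.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data standing in for x_train / x_test
toy_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]])
toy_test = np.array([[2.5, 25.0], [5.0, 50.0]])

toy_sc = StandardScaler()
toy_train_scaled = toy_sc.fit_transform(toy_train)
toy_test_scaled = toy_sc.transform(toy_test)

# Same result by hand: the statistics come from the training rows only
train_mean = toy_train.mean(axis=0)
train_std = toy_train.std(axis=0)  # population std (ddof=0), as StandardScaler uses
manual_test_scaled = (toy_test - train_mean) / train_std

print(np.allclose(toy_test_scaled, manual_test_scaled))
```

Fitting on the training split alone keeps information about the test rows out of the preprocessing step.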

## Accuracy

In [None]:
lr = LogisticRegression()
lr.fit(X_train,y_train)
lr_score = lr.score(X_test,y_test)
print("Logistic Regression Accuracy: %.2f" % lr_score)
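For a classifier, `score` reports mean accuracy, that is, the share of rows whose predicted label equals the true label. The sketch below checks this on synthetic data from `make_classification` (an assumption made to keep the example self-contained, not the orthopedic dataset) and also shows `predict_proba`, which exposes the class probabilities behind each prediction.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Hypothetical synthetic data, not the orthopedic dataset
X_toy, y_toy = make_classification(n_samples=200, n_features=6, random_state=0)

toy_lr = LogisticRegression()
toy_lr.fit(X_toy, y_toy)

# score() is the mean accuracy: the share of rows where prediction == label
toy_score = toy_lr.score(X_toy, y_toy)
manual_accuracy = np.mean(toy_lr.predict(X_toy) == y_toy)

# predict_proba() returns one probability per class for every row
toy_proba = toy_lr.predict_proba(X_toy)
print(toy_score, manual_accuracy)
print(toy_proba[:3])
```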

## Prediction

In [None]:
y_pred = lr.predict(X_test)

In [None]:
print(y_pred)

In [None]:
print(y_test)

In [None]:
# Put predictions and true labels side by side and flag the rows where they agree
pr = pd.DataFrame({"y_pred": y_pred, "y_test": y_test})
pr["prediction"] = np.where(pr["y_pred"] == pr["y_test"], "True", "False")
pr.head(15)

## Confusion Matrix

In machine learning, and specifically in statistical classification, a confusion matrix (also known as an error matrix) is a table layout that visualizes the performance of an algorithm, typically a supervised learning one (in unsupervised learning it is usually called a matching matrix). Conventions for the axes vary; in scikit-learn's `confusion_matrix`, which is used here, each row represents the instances of an actual class and each column represents the instances of a predicted class. The name stems from the fact that the matrix makes it easy to see whether the system is confusing two classes (i.e. commonly mislabeling one as another).
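A small example makes the orientation concrete. The labels below are made up for illustration; `"a"` stands for abnormal and `"n"` for normal, as in this notebook.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical labels: "a" = abnormal, "n" = normal
toy_true = ["a", "a", "a", "a", "n", "n"]
toy_pred = ["a", "a", "n", "n", "n", "a"]

# Rows follow the true labels, columns follow the predictions
toy_cm = confusion_matrix(toy_true, toy_pred, labels=["a", "n"])
print(toy_cm)
# [[2 2]
#  [1 1]]
```

The top-right cell counts the two rows that are truly `"a"` but were predicted as `"n"`; the bottom-left cell counts the one row that is truly `"n"` but was predicted as `"a"`.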

In [None]:
cm = confusion_matrix(y_test, y_pred, labels=["a", "n"])
df = pd.DataFrame(columns=["a", "n"], index=["a", "n"], data=cm)
f,ax = plt.subplots(figsize=(3,3))
sns.heatmap(df, annot=True, linewidths=0.12,cmap="Blues",linecolor="gray", fmt= '.0f',ax=ax)
plt.xlabel("Predicted Label")
plt.xticks(size = 14)
plt.yticks(size = 14, rotation = 0)
plt.ylabel("True Label")
plt.title("Confusion Matrix", size = 16)
plt.show()
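# Rows are true labels and columns are predictions; "a" (Abnormal) is treated as the positive class
print("True Positive:", cm[0, 0])
print("True Negative:", cm[1, 1])
print("False Negative:", cm[0, 1])  # actual "a", predicted "n"
print("False Positive:", cm[1, 0])  # actual "n", predicted "a"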