Logistic regression is a machine learning classification algorithm used to predict the probability of a categorical dependent variable. In binary logistic regression, the dependent variable is coded as 1 (yes, success, etc.) or 0 (no, failure, etc.). In other words, the model predicts P(Y=1) as a function of X.
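
To make "P(Y=1) as a function of X" concrete, here is a minimal sketch of the logistic (sigmoid) function applied to a linear score. The helper names and the coefficients are hypothetical and chosen only for illustration. They are not fitted to the dataset used in this notebook.

```python
import math

def sigmoid(z: float) -> float:
    """Map a real-valued score z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(x, weights, bias):
    """P(Y=1 | x) for a single sample under a logistic model."""
    # Linear score: bias + w1*x1 + w2*x2 + ...
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return sigmoid(z)

# Hypothetical coefficients, for illustration only
p = predict_proba(x=[2.0, -1.0], weights=[0.5, 0.25], bias=-0.75)
print(round(p, 3))  # score is exactly 0, so this prints 0.5
```

A score of 0 corresponds to a probability of 0.5. Positive scores push the prediction toward class 1 and negative scores toward class 0.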

# Logistic Regression Assumptions
* Binary logistic regression requires the dependent variable to be binary.
* For a binary regression, the factor level 1 of the dependent variable should represent the desired outcome.
* Only the meaningful variables should be included.
* The independent variables should be independent of each other. That is, the model should have little or no multicollinearity.
* The independent variables are linearly related to the log odds.
* Logistic regression requires quite large sample sizes.
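
One way to check the multicollinearity assumption is the variance inflation factor (VIF), which can be read off the diagonal of the inverse correlation matrix of the features. The sketch below is an illustration on synthetic data, not on this notebook's dataset. The helper `vif_table` and the column names are hypothetical.

```python
import numpy as np
import pandas as pd

def vif_table(features: pd.DataFrame) -> pd.Series:
    """Return the VIF of each column (diagonal of the inverse correlation matrix)."""
    corr = features.corr().to_numpy()
    vif = np.diag(np.linalg.inv(corr))
    return pd.Series(vif, index=features.columns, name="VIF")

# Synthetic stand-in data: "c" is almost exactly a + b, "d" is unrelated
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
d = rng.normal(size=200)
demo = pd.DataFrame({
    "a": a,
    "b": b,
    "c": a + b + rng.normal(scale=0.05, size=200),
    "d": d,
})

vifs = vif_table(demo)
print(vifs.round(1))
```

A common rule of thumb treats a VIF above roughly 5 to 10 as a sign that a feature is largely explained by the others. In this demo the collinear columns get very large values, while the unrelated column stays close to 1.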

Keeping the above assumptions in mind, let’s look at our dataset.

In [None]:
import pandas as pd
import numpy as np
from sklearn import preprocessing
import matplotlib.pyplot as plt 
plt.rc("font", size=14)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import accuracy_score, mean_squared_error, confusion_matrix, classification_report
import seaborn as sns
# A single call is enough; a later sns.set() would override an earlier style
sns.set(style="whitegrid", color_codes=True)

In [None]:
data = pd.read_csv("/kaggle/input/biomechanical-features-of-orthopedic-patients/column_2C_weka.csv")
data

The data set has 310 rows and 7 different columns. These columns are:
* pelvic incidence
* pelvic tilt
* lumbar lordosis angle
* sacral slope
* pelvic radius
* grade of spondylolisthesis
* class

Let's get information about the values in the dataset.

In [None]:
data.info()

In [None]:
data["class"].unique()

In [None]:
data.isnull().any()

In [None]:
data.describe().T

So far we have looked at summary statistics, but data analysis is more than numbers. Now let's use some plots to understand the dataset better.

# Data Visualization

In [None]:
data["class"].value_counts()

In [None]:
sns.countplot(y="class", data=data, palette="hls")
plt.show()

In [None]:
def localSubplot(data,feature):
    fig, ax = plt.subplots(nrows=1,ncols=2,figsize=(12,6))
    
    data[feature].value_counts().plot(kind="bar",ax=ax[0])
    data[feature].value_counts().plot.pie(autopct="%1.1f%%",ax=ax[1])
    
    plt.tight_layout()
    plt.show()

In [None]:
localSubplot(data=data,feature="class")

In [None]:
data.hist(bins=10, density=True, figsize= (12,8))
plt.show()

In [None]:
fig,ax = plt.subplots(nrows=2, ncols=3, figsize=(12,8))
ax = ax.flatten()
col_names = data.drop('class',axis=1).columns.values

# sns.distplot is deprecated; histplot with kde=True draws the same kind of plot
for i, col_name in enumerate(col_names):
    sns.histplot(data[col_name], kde=True, ax=ax[i])

In [None]:
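# Drop the non-numeric target so corr() only sees the numeric feature columns
corr = data.drop(columns="class").corr()
fig, ax = plt.subplots(1, 1, figsize=(12, 8))
sns.heatmap(corr, annot=True, linewidths=5, ax=ax);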

# Logistic Regression - Model Tuning

## Method 1

In [None]:
data["class"] = [ 1 if each == "Abnormal" else 0 for each in data["class"]]

In [None]:
X = data.drop(columns="class")
y = data["class"]  # 1-D target; a one-column DataFrame triggers a scikit-learn warning on fit

X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.3, random_state=42)
lg = LogisticRegression().fit(X_train,y_train)

In [None]:
print("Accuracy: ",lg.score( X_test,y_test)*100)

## Method 2

In [None]:
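# Report on the held-out test set only; predicting on the full X would mix in training rows
y_pred = lg.predict(X_test)
print(classification_report(y_test, y_pred))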

In [None]:
y_pred = lg.predict(X_test)
print("Accuracy: ",accuracy_score(y_test,y_pred)*100)
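
The section title mentions tuning, and `GridSearchCV` is imported at the top of the notebook but never used. As a rough sketch of what tuning could look like, the block below searches over the regularization strength `C` with 5-fold cross-validation. It runs on synthetic data from `make_classification` as a stand-in for `X` and `y`. The scaling step, the `C` grid, and `max_iter=1000` are assumptions rather than choices made by the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the notebook's X and y (310 rows, 6 features)
X_demo, y_demo = make_classification(
    n_samples=310, n_features=6, n_informative=4, random_state=42
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42, stratify=y_demo
)

# Scale inside the pipeline so each CV fold is scaled using only its own training part
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
param_grid = {"logisticregression__C": [0.01, 0.1, 1, 10, 100]}

search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X_tr, y_tr)

print("Best params:", search.best_params_)
print("CV accuracy:", round(search.best_score_, 3))
print("Test accuracy:", round(search.score(X_te, y_te), 3))
```

To try this on the real data, the synthetic arrays could be replaced with `X` and `y` from the cells above. The test set should be kept out of the search so the final accuracy stays an honest estimate.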