## Logistic Regression

Logistic regression is usually used when the dependent variable is categorical, for example a yes/no outcome.

In [None]:
# import everything we need first
import numpy as np
import pandas as pd
import sklearn
import matplotlib.pyplot as plt

We are going to import some dummy values from a sample CSV file, `Logistic_Reg_Infected.csv`.
For better understanding, we can see the independent variables (x) as the `age` and `sleep` columns recorded for each child, and the dependent variable (y) as whether the child is infected by some disease:

- 0 means not infected
- 1 means infected

As the dependent variable only has two possible values, 0 or 1, we can use a binary logistic model to predict it.
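
To make the idea of a binary logistic model more concrete, here is a minimal sketch of the logistic (sigmoid) function that such a model applies to a linear combination of the inputs. The coefficients `b0` and `b1` and the example ages are made-up values for illustration only; they are not fitted from the CSV.

```python
import numpy as np

def sigmoid(z):
    """Map any real number to the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# hypothetical coefficients, chosen only for illustration
b0, b1 = -4.0, 0.5

ages = np.array([2.0, 8.0, 14.0])
probs = sigmoid(b0 + b1 * ages)        # estimated P(y = 1) for each age
labels = (probs >= 0.5).astype(int)    # threshold at 0.5 to get a class

print(probs)
print(labels)
```

The model outputs a probability between 0 and 1, and the class is obtained by comparing that probability with a threshold (0.5 here).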

In [None]:
# read in data from the file
df = pd.read_csv('Logistic_Reg_Infected.csv').dropna()
df.head()  # show the first five rows

In [None]:
df.tail()  # show the last five rows

In [None]:
df.describe()

## Visualize the data

In [None]:
df_0 = df[df['infected'] == 0]
df_1 = df[df['infected'] == 1]

In [None]:
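# scatter plot of the two features, with one marker style per class
# a logistic regression model separates the classes with a straight line (its decision boundary);
# no model is fitted yet, so only the raw points are plotted here
fig, ax = plt.subplots(figsize=(10, 5))
ax.scatter(df_0['age'], df_0['sleep'], label='not infected (0)')  # class = 0, circle points
ax.scatter(df_1['age'], df_1['sleep'], marker='s', label='infected (1)')  # class = 1, square points
ax.set_xlabel('age')
ax.set_ylabel('sleep')
ax.legend()
# x-axis limits, useful for drawing a boundary line across the plot once a model is fitted
x_min, x_max = ax.get_xlim()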

## Model training

We are going to use the same technique to build a logistic regression model: split the data into training and test sets, then fit the model on the training set.


In [None]:
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

logReg = LogisticRegression(solver = 'lbfgs')

x = df[['age', 'sleep']]
y = df['infected']    # Classification : infected = 0 or 1



In [None]:
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state = 0)
x_train.head()
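
When `test_size` is not given, `train_test_split` holds out 25% of the rows for testing. The sketch below makes that explicit and also passes `stratify`, so both splits keep roughly the same share of each class. The `demo` DataFrame is synthetic stand-in data (an assumption for illustration, not the notebook's CSV), and the `xd_`/`yd_` names are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# synthetic stand-in data, not the notebook's CSV
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    'age': rng.uniform(1, 15, size=100),
    'sleep': rng.uniform(0, 1, size=100),
})
demo['infected'] = (demo['age'] > 8).astype(int)

# explicit 75/25 split that preserves the class proportions
xd_train, xd_test, yd_train, yd_test = train_test_split(
    demo[['age', 'sleep']], demo['infected'],
    test_size=0.25, stratify=demo['infected'], random_state=0)

print(xd_train.shape, xd_test.shape)
```

Fixing `random_state` makes the split reproducible, which is why the notebook sets it to 0.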

In [None]:
logReg.fit(x_train, y_train)
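
After fitting, the learned parameters are available as `intercept_` and `coef_`. With two features, the decision boundary is the line where `b0 + b1*age + b2*sleep = 0`, that is, where the predicted probability is exactly 0.5. The sketch below illustrates this on synthetic stand-in data (an assumption, not the notebook's CSV); `clf`, `demo`, and the coefficients used to generate the labels are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# synthetic stand-in data, not the notebook's CSV
rng = np.random.default_rng(1)
demo = pd.DataFrame({
    'age': rng.uniform(1, 15, size=200),
    'sleep': rng.uniform(0, 1, size=200),
})
score = 0.4 * demo['age'] - 3.0 * demo['sleep'] - 1.5 + rng.normal(0, 1, size=200)
demo['infected'] = (score > 0).astype(int)

clf = LogisticRegression(solver='lbfgs').fit(demo[['age', 'sleep']], demo['infected'])
b0 = clf.intercept_[0]
b1, b2 = clf.coef_[0]

# solve b0 + b1*age + b2*sleep = 0 for sleep at two ages
age_line = np.array([1.0, 15.0])
sleep_line = -(b0 + b1 * age_line) / b2

# points on the boundary get a predicted probability of 0.5
boundary = pd.DataFrame({'age': age_line, 'sleep': sleep_line})
boundary_proba = clf.predict_proba(boundary)[:, 1]
print(boundary_proba)
```

Passing `age_line` and `sleep_line` to `ax.plot` on a scatter plot of the two features would draw this boundary as a straight line.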

## Evaluate Model with Accuracy Score

We can use our model to predict values for the test set and compare them with the true labels.

In [None]:
from sklearn.metrics import accuracy_score

y_pred = logReg.predict(x_test)
print(y_test)
print(y_pred)

In [None]:
accuracy_score(y_test, y_pred)

In [None]:
from sklearn.metrics import confusion_matrix
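# rows are true classes, columns are predicted classes, both in the order given by labels (1, then 0)
confusion_matrix(y_test, y_pred, labels=[1, 0])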
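
As a small worked example of how to read this matrix, the sketch below uses hand-written labels (hypothetical values, not the notebook's test set). With `labels=[1, 0]`, the first row holds the truly infected samples and the first column holds the samples predicted as infected.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# hypothetical labels, written by hand for illustration
y_true_demo = [1, 0, 1, 1, 0, 0]
y_pred_demo = [1, 0, 0, 1, 1, 0]

cm = confusion_matrix(y_true_demo, y_pred_demo, labels=[1, 0])
tp, fn = cm[0]   # true class 1: predicted 1 (true positive), predicted 0 (false negative)
fp, tn = cm[1]   # true class 0: predicted 1 (false positive), predicted 0 (true negative)

# accuracy is the share of samples on the diagonal
accuracy = (tp + tn) / cm.sum()
print(cm)
print(accuracy, accuracy_score(y_true_demo, y_pred_demo))
```

The diagonal counts the correct predictions, so accuracy computed from the matrix matches `accuracy_score`.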

In [None]:
logReg.predict(x_test)  # prediction value of x_test

## Predict
Predict the class for some new values of `age` and `sleep`.

In [None]:
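# predict y when age = 7.5 and sleep = 0.65
# (a DataFrame with the training column names avoids scikit-learn's feature-name warning)
logReg.predict(pd.DataFrame([[7.5, 0.65]], columns=['age', 'sleep']))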

In [None]:
# predict y when age = 10 and sleep = 0.74
print(logReg.predict(pd.DataFrame([[10, 0.74]], columns=['age', 'sleep'])))

## Probability

In logistic regression, we can also retrieve the predicted probability of each classification outcome.

In [None]:
# one row per test sample: [P(infected = 0), P(infected = 1)], columns ordered as in logReg.classes_
logReg.predict_proba(x_test)

In [None]:
# probability of class 0 versus class 1 when age = 10 and sleep = 0.74
logReg.predict_proba(pd.DataFrame([[10, 0.74]], columns=['age', 'sleep']))
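
The sketch below shows how `predict_proba` relates to `predict` for a binary model: each row sums to 1, the columns follow `classes_`, the predicted class is the one with the larger probability, and the class-1 probability is the sigmoid of `decision_function`. It uses synthetic stand-in data (an assumption, not the notebook's CSV); `clf` and the `_demo` names are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# synthetic stand-in data, not the notebook's CSV
rng = np.random.default_rng(2)
X_demo = pd.DataFrame({
    'age': rng.uniform(1, 15, size=200),
    'sleep': rng.uniform(0, 1, size=200),
})
noise = rng.normal(0, 1, size=200)
y_demo = (0.4 * X_demo['age'] - 3.0 * X_demo['sleep'] - 1.5 + noise > 0).astype(int)

clf = LogisticRegression(solver='lbfgs').fit(X_demo, y_demo)
proba = clf.predict_proba(X_demo)

print(clf.classes_)       # column order of predict_proba
print(proba[:5])          # each row: [P(class 0), P(class 1)]

# the predicted class is the one with the larger probability
pred_from_proba = clf.classes_[proba.argmax(axis=1)]

# the class-1 probability is the sigmoid of the linear decision function
sigmoid_of_decision = 1.0 / (1.0 + np.exp(-clf.decision_function(X_demo)))
```

Working with probabilities rather than hard labels makes it possible to choose a threshold other than 0.5 when one kind of error is costlier than the other.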