# Worked Examples and Exercises from Chapter 6 of *Essential Math for Data Science*

## Logistic Regression

People say it's AI when you're pitching to clients, ML when you're hiring, and logistic regression when you're actually doing it.

1. Perform a logistic regression on the data from [https://bit.ly/3imidqa](https://bit.ly/3imidqa) using three-fold cross-validation and accuracy as your metric.


In [1]:
# load libraries
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

# import our data
df = pd.read_csv("https://raw.githubusercontent.com/thomasnield/machine-learning-demo-data/master/classification/light_dark_font_training_set.csv")

# let's take a look

df.head()

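|   | RED | GREEN | BLUE | LIGHT_OR_DARK_FONT_IND |
|---|-----|-------|------|------------------------|
| 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 0 | 128 | 0 |
| 2 | 0 | 0 | 139 | 0 |
| 3 | 0 | 0 | 205 | 0 |
| 4 | 0 | 0 | 238 | 0 |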


In [9]:

# split the columns into features (RED, GREEN, BLUE) and the label (last column)

X = df.values[:, :-1]
Y = df.values[:, -1]

# Cross-validation!
kfold = KFold(n_splits = 3, random_state = 5112023, shuffle = True)

# define the model (penalty = None turns off regularization); cross_val_score fits it on each fold
model = LogisticRegression(penalty = None)

results = cross_val_score(model, X, Y, cv = kfold)

print("Accuracy Mean: %.3f (stdev=%.3f)" % (results.mean(), results.std()))


Accuracy Mean: 1.000 (stdev=0.000)
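To make the three-fold split less of a black box, here is a minimal pure-Python sketch of how shuffled row indices can be cut into three test folds. The helper name `kfold_indices` and the seed are hypothetical, and the shuffle will not reproduce the exact rows that scikit-learn's `KFold` picks; only the fold sizes and the "every row is tested exactly once" property carry over.

```python
import random

def kfold_indices(n_samples, n_splits, seed):
    """Hypothetical helper: shuffle row indices and cut them into n_splits test folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)

    # the first (n_samples % n_splits) folds receive one extra row
    base, extra = divmod(n_samples, n_splits)
    folds, start = [], 0
    for i in range(n_splits):
        size = base + (1 if i < extra else 0)
        folds.append(indices[start:start + size])
        start += size
    return folds

# 1345 rows, as in the training set above
folds = kfold_indices(n_samples=1345, n_splits=3, seed=0)
fold_sizes = [len(fold) for fold in folds]
print(fold_sizes)  # [449, 448, 448]

# every row lands in exactly one test fold
all_rows = sorted(i for fold in folds for i in fold)
print(all_rows == list(range(1345)))  # True
```

Each fold serves once as the test set while the other two folds train the model, and the three accuracy scores are then averaged.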


2. Produce a confusion matrix comparing the predictions and the actual data.

In [7]:
# split our prior data in train-test
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size = 0.2, random_state = 5112023)

# fit the model
model.fit(X_train, Y_train)

# predict labels for the held-out test rows
prediction = model.predict(X_test)

# create a simple confusion matrix
matrix = confusion_matrix(y_true = Y_test, y_pred = prediction)
print(matrix)


[[104   0]
 [  0 165]]
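As a quick sanity check on how to read this output, the sketch below recomputes a few metrics from the printed matrix by hand. It assumes scikit-learn's convention that rows are actual classes and columns are predicted classes, so the layout is `[[TN, FP], [FN, TP]]`; the variable names are illustrative only.

```python
# confusion matrix copied from the output above
# rows = actual class (0, 1), columns = predicted class (0, 1)
matrix = [[104, 0],
          [0, 165]]

tn, fp = matrix[0]
fn, tp = matrix[1]

total = tn + fp + fn + tp          # number of test rows
accuracy = (tp + tn) / total       # share of correct predictions
precision = tp / (tp + fp)         # predicted 1s that really are 1
recall = tp / (tp + fn)            # actual 1s that were found

print(total, accuracy, precision, recall)  # 269 1.0 1.0 1.0
```

The 269 test rows are 20% of the 1345 rows in the dataset, and the zero off-diagonal entries mean no test row was misclassified.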


3. Pick a background color and see whether our model classifies it as light (1) or dark (0).


In [37]:
# independent variables: every column except the last (RED, GREEN, BLUE)
inputs = df.iloc[:, :-1]

# dependent variable: the last column
output = df.iloc[:, -1]

# fit an unregularized model on the full dataset
fit = LogisticRegression(penalty = None).fit(inputs, output)

0       0
1       0
2       0
3       0
4       0
       ..
1340    1
1341    0
1342    0
1343    1
1344    0
Name: LIGHT_OR_DARK_FONT_IND, Length: 1345, dtype: int64


In [41]:
# classify a new RGB background color

def predict_light_dark(RED, GREEN, BLUE):
    # arguments follow the training column order: RED, GREEN, BLUE
    prediction = fit.predict([[RED, GREEN, BLUE]])
    probabilities = fit.predict_proba([[RED, GREEN, BLUE]])
    if prediction[0] == 1:
        return "This color is light: {0}".format(probabilities)
    else:
        return "This color is dark: {0}".format(probabilities)

# test the function on a near-black color
print(predict_light_dark(3, 3, 3))



This color is dark: [[1. 0.]]


In [43]:
# test the function on yellow
print(predict_light_dark(223, 221, 3))


This color is light: [[0. 1.]]
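The probabilities print as exactly `0.` and `1.` because logistic regression passes a weighted sum of the inputs through the logistic function, which saturates quickly once that sum is far from zero. The sketch below illustrates this with HYPOTHETICAL coefficients; they are made-up values for illustration, not the ones the fitted model learned (those live in `fit.intercept_` and `fit.coef_`).

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

# HYPOTHETICAL values chosen for illustration, not the fitted model's coefficients
intercept = -40.0
coef = {"RED": 0.1, "GREEN": 0.2, "BLUE": 0.05}

def prob_light(red, green, blue):
    # linear score, then squashed into a probability between 0 and 1
    z = intercept + coef["RED"] * red + coef["GREEN"] * green + coef["BLUE"] * blue
    return sigmoid(z)

p_near_black = prob_light(3, 3, 3)
p_yellow = prob_light(223, 221, 3)
print(p_near_black)  # vanishingly small, would display as 0. after rounding
print(p_yellow)      # extremely close to 1
```

With an unregularized model on data it classifies perfectly, large coefficients and saturated probabilities like these are a plausible outcome, which is likely why `predict_proba` shows such extreme values here.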


In [44]:
# with 100% cross-validated accuracy and an error-free confusion matrix, logistic regression separates light and dark colors cleanly on this dataset!