# Classification Model
Here we load reflectance data measured with an Arduino and train logistic-regression classifiers, first multi-class and then binary, to see whether we can distinguish the simulated jaundice condition.

In [22]:
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.linear_model import LogisticRegression

In [23]:
import os
from google.colab import drive

drive.mount('/content/MyDrive', force_remount=True)
os.chdir('/content/MyDrive/MyDrive/Courses/SignalProcessingLab/')

Mounted at /content/MyDrive


In [4]:
# Read reflectance data in csv format

fp = 'lab5  - reflectance.csv'
df = pd.read_csv(fp)

In [24]:
df

|   | class | Ailsa | Leela | Maha | Megan | Victoria | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | baseline | 0.143659 | 0.181369 | 0.144332 | 0.080135 | 0.079461 | 0.079012 | 0.068462 | 0.040404 | 0.154658 | 0.137374 | 0.121661 |
| 1 | nail_down | 0.592368 | 0.544108 | 0.633446 | 0.18743 | 0.310213 | 0.245118 | 0.167228 | 0.100561 | 0.488215 | 0.432548 | 0.425589 |
| 2 | nail_up | 0.44624 | 0.205836 | 0.135129 | 0.118743 | 0.15174 | 0.063749 | 0.095847 | 0.104377 | 0.235241 | 0.185859 | 0.144781 |
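Because the CSV lives on Google Drive, the later cells cannot be rerun without it. As a minimal sketch, the table displayed above can be rebuilt in memory from the printed values. This assumes the CSV has exactly these columns, that the printed values (rounded to six decimals) are close enough for experimentation, and that the numeric subject headers `1` to `6` are read as strings, as `pd.read_csv` does for header names.

```python
import pandas as pd

# Subject columns as they appear in the displayed table; "1"-"6" are kept as
# strings because CSV header names are read as text.
subjects = ["Ailsa", "Leela", "Maha", "Megan", "Victoria", "1", "2", "3", "4", "5", "6"]

# Values copied from the displayed DataFrame (rounded to six decimals there).
rows = {
    "baseline":  [0.143659, 0.181369, 0.144332, 0.080135, 0.079461, 0.079012,
                  0.068462, 0.040404, 0.154658, 0.137374, 0.121661],
    "nail_down": [0.592368, 0.544108, 0.633446, 0.18743, 0.310213, 0.245118,
                  0.167228, 0.100561, 0.488215, 0.432548, 0.425589],
    "nail_up":   [0.44624, 0.205836, 0.135129, 0.118743, 0.15174, 0.063749,
                  0.095847, 0.104377, 0.235241, 0.185859, 0.144781],
}

# One row per condition ("class") and one column per subject, the same
# layout that pd.read_csv(fp) produced above.
df = pd.DataFrame(
    [[cls, *vals] for cls, vals in rows.items()],
    columns=["class", *subjects],
)
print(df.shape)  # (3, 12)
```

With `df` defined this way, the transpose, melt, and cross-validation cells that follow can be run unchanged.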


In [6]:
# Transpose so that each row is one subject and each condition is a column
df_t = df.set_index("class").T.reset_index()
df_t = df_t.rename(columns={"index":"Subject"})

# Melt to long format: one row per (subject, condition) pair
df_melt = df_t.melt(
    id_vars=['Subject'],
    value_vars=['baseline', 'nail_down', 'nail_up'],
    var_name='Condition',
    value_name='Measurement'
)

In [7]:
df_melt

|   | Subject | Condition | Measurement |
|---|---|---|---|
| 0 | Ailsa | baseline | 0.143659 |
| 1 | Leela | baseline | 0.181369 |
| 2 | Maha | baseline | 0.144332 |
| 3 | Megan | baseline | 0.080135 |
| 4 | Victoria | baseline | 0.079461 |
| 5 | 1 | baseline | 0.079012 |
| 6 | 2 | baseline | 0.068462 |
| 7 | 3 | baseline | 0.040404 |
| 8 | 4 | baseline | 0.154658 |
| 9 | 5 | baseline | 0.137374 |
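The transpose-then-melt reshape is easier to follow on a tiny table. The sketch below applies the same two steps to a hypothetical two-subject table (`S1` and `S2` are made-up names) with the same layout as `df`. On the real data the result has 33 rows (11 subjects × 3 conditions), of which only the first few are displayed above.

```python
import pandas as pd

# Hypothetical toy table in the same wide layout as df:
# one row per condition, one column per subject.
wide = pd.DataFrame({
    "class": ["baseline", "nail_down", "nail_up"],
    "S1": [0.10, 0.50, 0.30],
    "S2": [0.20, 0.60, 0.40],
})

# Step 1: transpose so subjects become rows and conditions become columns.
per_subject = (
    wide.set_index("class").T
    .reset_index()
    .rename(columns={"index": "Subject"})
)

# Step 2: melt so every (subject, condition) pair is its own row.
long = per_subject.melt(
    id_vars=["Subject"],
    value_vars=["baseline", "nail_down", "nail_up"],
    var_name="Condition",
    value_name="Measurement",
)
print(long)
```

`melt` stacks the `value_vars` columns in the order given, so all baseline rows come first, then nail_down, then nail_up, which matches the ordering of `df_melt`.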


In [21]:
# Feature matrix: the single reflectance measurement (double brackets keep it 2-D, as scikit-learn expects)
X = df_melt[['Measurement']]

# Encode y values
y = df_melt['Condition']
le = LabelEncoder()
y_encoded = le.fit_transform(y)
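`LabelEncoder` assigns integer codes in sorted order of the unique labels, not in the order they first appear. A short check on a few example labels shows the mapping this notebook relies on: `baseline` becomes 0, `nail_down` 1, and `nail_up` 2.

```python
from sklearn.preprocessing import LabelEncoder

# Labels deliberately given out of alphabetical order.
le = LabelEncoder()
codes = le.fit_transform(["nail_up", "baseline", "nail_down", "baseline"])

# classes_ holds the sorted unique labels; a label's code is its position there.
print(le.classes_.tolist())  # ['baseline', 'nail_down', 'nail_up']
print(codes.tolist())        # [2, 0, 1, 0]

# inverse_transform maps codes back to the original label strings.
print(le.inverse_transform([0, 2]).tolist())  # ['baseline', 'nail_up']
```

Because `baseline` sorts first, it is always code 0 here, which is what makes the later `y_encoded > 0` test pick out the two yellow-light conditions.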

In [25]:
# Logistic regression evaluated with stratified 5-fold cross-validation
# (stratification preserves the class proportions in each fold)
clf = LogisticRegression()
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(clf, X, y_encoded, cv=cv)

# Print the accuracy for each fold and the mean across folds
print("Cross Validation accuracy scores:", scores)
print("Mean accuracy:", scores.mean())

Cross Validation accuracy scores: [0.28571429 0.71428571 0.57142857 0.66666667 0.5       ]
Mean accuracy: 0.5476190476190477
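The fold scores above come in steps of 1/7 and 1/6 (for example 0.2857 = 2/7 and 0.5 = 3/6) because the 33 samples (11 subjects × 3 conditions) cannot be split evenly into five folds: three test folds hold 7 samples and two hold 6. The sketch below rebuilds only the labels, in `df_melt`'s order, and uses a zero placeholder feature, since `split` only needs the number of rows and the labels. It prints the size and per-class makeup of each test fold for the same `StratifiedKFold` settings.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Labels in df_melt order: 11 baseline, 11 nail_down, 11 nail_up,
# encoded as LabelEncoder does (baseline=0, nail_down=1, nail_up=2).
y = np.repeat([0, 1, 2], 11)
X_placeholder = np.zeros((len(y), 1))  # stands in for the Measurement column

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
fold_sizes = []
for i, (train_idx, test_idx) in enumerate(cv.split(X_placeholder, y)):
    fold_sizes.append(len(test_idx))
    counts = np.bincount(y[test_idx], minlength=3)  # test samples per class
    print(f"fold {i}: {len(test_idx)} test samples, per-class counts {counts.tolist()}")
```

With so few samples per fold, a single misclassified point moves a fold's accuracy by 14 to 17 percentage points, which helps explain the wide spread between folds.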


Since we are more interested in detecting when yellow light is present than in whether the fingernail faces up or down, I merged the nail-up and nail-down cases, simplifying the problem into a binary classification: baseline versus yellow light.
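The binary target in the next cell is built from the encoded integers, which works because `LabelEncoder` maps `baseline` to 0. An equivalent sketch that defines the target directly from the label names, so it does not depend on the encoder's ordering, looks like this; the short label list is hypothetical and only mimics the format of `df_melt['Condition']`.

```python
import pandas as pd

# Hypothetical condition labels in the same format as df_melt['Condition'].
conditions = pd.Series(["baseline", "nail_down", "nail_up", "baseline", "nail_up"])

# 1 = yellow light (nail_up or nail_down), 0 = baseline, decided by label name.
y_binary_by_name = (conditions != "baseline").astype(int).to_numpy()
print(y_binary_by_name.tolist())  # [0, 1, 1, 0, 1]
```

On the real data, `(df_melt['Condition'] != 'baseline').astype(int)` should give the same vector as `(y_encoded > 0).astype(int)`.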

In [26]:
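# LabelEncoder sorts labels alphabetically, so baseline -> 0 and nail_down/nail_up -> 1/2;
# any code > 0 therefore marks a yellow-light sample
y_binary = (y_encoded > 0).astype(int)
scores_binary = cross_val_score(clf, X, y_binary, cv=cv)
print("Cross Validation accuracy scores:", scores_binary)
print("Mean accuracy:", scores_binary.mean())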


Cross Validation accuracy scores: [0.57142857 0.71428571 0.71428571 0.66666667 0.66666667]
Mean accuracy: 0.6666666666666666


Mean cross-validated accuracy rose from about 0.55 with three classes to about 0.67 once the problem was reduced to binary classification.
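One caveat when comparing these two numbers is that the two problems have different chance levels. After merging, 22 of the 33 samples are yellow-light and 11 are baseline, so a classifier that always predicts "yellow light" already scores 22/33 ≈ 0.67. A quick check is scikit-learn's `DummyClassifier` with `strategy="most_frequent"` on the same stratified folds. It ignores the features, so the sketch below needs only the labels (rebuilt as 11 subjects × 3 conditions) and a placeholder feature column. Its binary fold scores come out as 4/7, 5/7, 5/7, 4/6, 4/6, the same values reported above for logistic regression, while the three-class majority baseline sits around 0.3. This suggests comparing each model against its own chance level before reading the accuracy gain as better discrimination.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Labels in df_melt order (baseline=0, nail_down=1, nail_up=2).
y_encoded = np.repeat([0, 1, 2], 11)
y_binary = (y_encoded > 0).astype(int)        # 22 yellow-light vs 11 baseline
X_placeholder = np.zeros((len(y_encoded), 1))  # DummyClassifier ignores features

# Same cross-validation settings as the notebook.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
majority = DummyClassifier(strategy="most_frequent")

dummy_binary = cross_val_score(majority, X_placeholder, y_binary, cv=cv)
dummy_multi = cross_val_score(majority, X_placeholder, y_encoded, cv=cv)

print("Majority-class binary accuracy per fold:", dummy_binary)
print("Majority-class binary mean:", dummy_binary.mean())
print("Majority-class 3-class mean:", dummy_multi.mean())
```

Passing `scoring="balanced_accuracy"` to `cross_val_score` is another option: it averages recall over classes, so the majority-class predictor scores 0.5 on the binary problem regardless of the 2:1 imbalance.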