<h1>Logistic Regression</h1>
<h3>Goal:</h3>
<p>Perform classification on the Iris dataset using scikit-learn's logistic regression.</p>

In [1]:
import pandas as pd
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LogisticRegression

In [2]:
# Load the iris data
iris = datasets.load_iris()

# Separate features and labels
features = iris['data']
labels = iris['target']

<br>
<h3>Quick Note:</h3>
<p>We need to shuffle our dataset because it comes ordered by label; a positional split on the unshuffled rows would train on some classes and leave others almost entirely to the test set.</p>
<p>pandas has a convenient built-in method for shuffling rows: <code>DataFrame.sample(frac=1)</code>.</p>
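<p>To see why the ordering matters, here is a minimal sketch of the same 70/30 positional split applied to unshuffled labels. It assumes the label layout of scikit-learn's Iris data (50 samples per class, stored in class order) and rebuilds that layout with NumPy; the names <code>ordered_labels</code>, <code>train_counts</code> and <code>test_counts</code> are illustrative only.</p>

```python
import numpy as np

# Assumption: mimic the Iris label layout (50 samples per class, sorted by class)
ordered_labels = np.repeat([0, 1, 2], 50)

# Same 70/30 positional split used in this notebook, but without shuffling
split = int(ordered_labels.shape[0] * 0.7)
train_counts = np.bincount(ordered_labels[:split], minlength=3)
test_counts = np.bincount(ordered_labels[split:], minlength=3)

print(train_counts.tolist())  # [50, 50, 5]
print(test_counts.tolist())   # [0, 0, 45]
```

<p>Without shuffling, the training set would contain only five samples of class 2 and the test set would contain nothing but class 2.</p>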


In [3]:
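# Put features and labels in one DataFrame so they are shuffled together
feature_names = [f'feature_{i}' for i in range(4)]
df = pd.DataFrame(data=features, columns=feature_names)
df['labels'] = labels

# Shuffle all rows. No random_state is set, so the row order
# (and the class counts printed below) changes between runs.
df = df.sample(frac=1)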
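<p>If a repeatable shuffle is wanted, <code>sample</code> accepts a <code>random_state</code> argument. The sketch below uses a hypothetical six-row frame (<code>toy</code>) in place of the Iris data, and the seed 42 is an arbitrary choice, not something this notebook uses.</p>

```python
import pandas as pd

# Hypothetical toy frame standing in for the Iris DataFrame
toy = pd.DataFrame({'feature_0': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
                    'labels': [0, 0, 1, 1, 2, 2]})

# frac=1 samples every row without replacement, i.e. a full shuffle;
# random_state (an arbitrary seed chosen here) fixes the permutation
shuffled = toy.sample(frac=1, random_state=42)
shuffled_again = toy.sample(frac=1, random_state=42)

# Optionally discard the old row labels to get a clean 0..n-1 index
shuffled_clean = shuffled.reset_index(drop=True)

print(shuffled.equals(shuffled_again))            # True: same seed, same order
print(sorted(shuffled.index) == list(toy.index))  # True: every row appears exactly once
```

<p>Because the split in this notebook uses <code>iloc</code>, which is positional, resetting the index is optional here.</p>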

In [4]:
# Create the split for train, test data
data_shape = labels.shape[0]
split = int(data_shape*0.7)

# Split train and test
x_train = df[feature_names].iloc[:split]
y_train = df['labels'].iloc[:split]
x_test = df[feature_names].iloc[split:]
y_test = df['labels'].iloc[split:]

# Print the per-class counts (classes 0, 1, 2) to check the class distribution in each split
print(y_train[y_train==0].count(), y_train[y_train==1].count(), y_train[y_train==2].count())
print(y_test[y_test==0].count(), y_test[y_test==1].count(), y_test[y_test==2].count())

# Fit the classifier; fit_intercept=False means no bias term is learned
model = LogisticRegression(random_state=0, fit_intercept=False).fit(x_train, y_train)

36 33 36
14 17 14
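<p>The counts above are close to balanced but not exactly so, since a random shuffle gives no guarantee. As an alternative (not what this notebook does), scikit-learn's <code>train_test_split</code> can shuffle and split in one call, and its <code>stratify</code> argument keeps the class proportions identical in both parts. A minimal sketch, where <code>random_state=0</code> and the variable names are assumptions:</p>

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()

# stratify keeps the 1:1:1 class ratio in both splits;
# random_state=0 is an arbitrary seed for reproducibility
x_tr, x_te, y_tr, y_te = train_test_split(
    iris['data'], iris['target'],
    test_size=0.3, stratify=iris['target'], random_state=0)

print(np.bincount(y_tr).tolist())  # [35, 35, 35]
print(np.bincount(y_te).tolist())  # [15, 15, 15]
```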


<br>
<h3>We confirm the accuracy of the model via a confusion matrix</h3>
<p>Iris is a small, low-dimensional dataset, so logistic regression should be more than enough to handle it and we expect good results.</p>

In [5]:
from sklearn.metrics import confusion_matrix

# Make predictions on the held-out test set
prediction = model.predict(x_test)

# Rows are true classes, columns are predicted classes
confusion_matrix(y_test, prediction)

array([[14,  0,  0],
       [ 0, 15,  2],
       [ 0,  3, 11]], dtype=int64)
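<p>The confusion matrix can be turned into summary numbers by hand. The sketch below copies the matrix from the output above and derives accuracy, per-class recall and per-class precision with NumPy; the variable names are illustrative, and it relies on scikit-learn's convention that rows are true classes and columns are predicted classes.</p>

```python
import numpy as np

# Confusion matrix copied from the output above
# (rows: true class, columns: predicted class)
cm = np.array([[14, 0, 0],
               [0, 15, 2],
               [0, 3, 11]])

accuracy = np.trace(cm) / cm.sum()        # correct predictions / all predictions
recall = np.diag(cm) / cm.sum(axis=1)     # correct / row total, per true class
precision = np.diag(cm) / cm.sum(axis=0)  # correct / column total, per predicted class

print(round(float(accuracy), 3))    # 0.889
print(recall.round(3).tolist())     # [1.0, 0.882, 0.786]
print(precision.round(3).tolist())  # [1.0, 0.833, 0.846]
```

<p>That is 40 of 45 test samples classified correctly. Class 0 is classified perfectly, and all five errors are confusions between classes 1 and 2.</p>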