## Logistic Regression

In [None]:
# Importing necessary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn import preprocessing

# Loading the Iris dataset
iris = load_iris()
data = pd.DataFrame(iris.data, columns=iris.feature_names)
data['Species'] = iris.target
data.head()

,sepal length (cm),sepal width (cm),petal length (cm),petal width (cm),Species
0,5.1,3.5,1.4,0.2,0
1,4.9,3.0,1.4,0.2,0
2,4.7,3.2,1.3,0.2,0
3,4.6,3.1,1.5,0.2,0
4,5.0,3.6,1.4,0.2,0


In [None]:
# Select feature columns (all except 'Species')
X = data.iloc[:,[0,1,2,3]].values

# Select the target column ('Species')
y = data.iloc[:,4].values

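# X already has shape (n_samples, 4), so no reshape is needed
# Standardise each feature to zero mean and unit variance
# Note: scaling before the split lets test-set statistics influence the features;
# fitting a StandardScaler on X_train only would avoid this
X = preprocessing.scale(X)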

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X,
                y, test_size=0.25, random_state=0)

In [None]:
# Fit the logistic regression model to the training data
log_reg = LogisticRegression()
log_reg.fit(X_train, y_train)

# Make predictions on the test data (predict returns a 1-D array of class labels,
# which is the shape the scikit-learn metric functions expect)
y_pred = log_reg.predict(X_test)

### Measuring Model Performance


To evaluate a model, consider both its overall performance and its performance on each individual class. The Iris dataset is balanced, with 50 instances of each class, so accuracy is an appropriate overall metric. Accuracy is the proportion of predictions that are correct, and the classifier's score() method returns it directly for a given set of features and labels. This gives a quick assessment of how well the model performs across all classes combined.
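As a minimal sketch of what accuracy measures, the snippet below computes it by hand. The labels are hypothetical (they are not the Iris test set), and `y_true` and `y_hat` are illustrative names.

```python
import numpy as np

# Hypothetical true and predicted labels, for illustration only
y_true = np.array([0, 0, 1, 1, 2, 2, 2, 1])
y_hat = np.array([0, 0, 1, 2, 2, 2, 2, 1])

# Accuracy is the fraction of positions where prediction and truth agree
accuracy = float(np.mean(y_true == y_hat))
print('Accuracy:', accuracy)
```

Seven of the eight hypothetical predictions match, so the result is 7/8.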

In [None]:
# Use the score method to calculate the accuracy of the logistic regression model.
score = log_reg.score(X_test, y_test)

print('Accuracy: {}'.format(score))


Accuracy: 0.9736842105263158


You can also check how the model performed on each class. In the confusion matrix below, rows are the true species and columns are the predicted species. The only error is a single Versicolor sample predicted as Virginica, so Versicolor is the only species with a missed instance.

In [None]:
from sklearn.metrics import confusion_matrix

# Create a list of class names from the Iris dataset
classes = list(iris.target_names)

# Generate the confusion matrix using the true labels (y_test) and predicted labels (y_pred)
conf_mat = confusion_matrix(y_test, y_pred)

# Convert the confusion matrix into a DataFrame for better visual representation
cm_df = pd.DataFrame(conf_mat, columns=classes, index=classes)
cm_df

,setosa,versicolor,virginica
setosa,13,0,0
versicolor,0,15,1
virginica,0,0,9


In addition to the confusion matrix, we can also evaluate the model using metrics such as the F1 score, precision, and recall. The F1 score provides a single measure of a model's accuracy by considering both precision (the proportion of true positive predictions out of all positive predictions) and recall (the proportion of true positives out of all actual positive instances).

For instance, if a model predicts 100 instances of a certain class but only 70 of those are correct, the precision is 0.70 (70%). If there are actually 120 instances of that class, the recall is 70/120, or approximately 0.58 (58%). The F1 score is the harmonic mean of these two values, which comes to about 0.64 when computed from the unrounded figures. Because it sits well below 1, this F1 score shows that the model both assigns some instances to the class wrongly and misses some that belong to it.
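The arithmetic in this example can be checked with a few lines of Python. The counts are the hypothetical ones from the paragraph above, and the variable names are illustrative.

```python
# Hypothetical counts from the worked example (not from the Iris model)
predicted_positive = 100   # instances the model assigned to the class
true_positive = 70         # of those, how many were correct
actual_positive = 120      # instances that really belong to the class

precision = true_positive / predicted_positive
recall = true_positive / actual_positive

# F1 is the harmonic mean of precision and recall
f1 = 2 * precision * recall / (precision + recall)

print(f'precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}')
```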

Moreover, examining the F1 scores for each individual class helps us identify specific classes that the model finds challenging to classify accurately. For example, if the F1 score for the 'Iris-setosa' class is 0.90, but for 'Iris-virginica' it is only 0.60, this suggests that the model is much better at identifying 'Iris-setosa' compared to 'Iris-virginica'. By analysing these scores, we can gain insights into which classes require more attention and possibly improve the model’s performance in those areas.

In [None]:
from sklearn.metrics import f1_score, precision_score, recall_score

# Calculate the micro-averaged F1 score (for single-label multi-class data this equals accuracy)
av_f1 = f1_score(y_test, y_pred, average='micro')
print(av_f1)

# Calculate the F1 score for each individual class
f = f1_score(y_test, y_pred, average=None)

# Find the class with the lowest F1 score (np.argmin returns the index of the first minimum)
difficult_class = classes[int(np.argmin(f))]

# Print the most challenging class based on the lowest F1 score
print('Most challenging class:', difficult_class)

0.9736842105263158
Most challenging class: virginica


The micro-averaged F1 score of approximately 0.97 is identical to the accuracy reported earlier. Micro-averaging pools every prediction before computing precision and recall, and for a single-label multi-class problem that reduces to the proportion of correct predictions. It confirms that the model performs well overall but says little about the balance between classes. The per-class scores are more informative: Iris-virginica has the lowest F1 score, so it is the class the model handles least well on this test set.
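To make the difference between the averages concrete, the sketch below recomputes the per-class scores directly from the confusion matrix shown earlier. The matrix values are copied from that output. The variable names and the macro average are additions for illustration and do not appear in the original notebook.

```python
import numpy as np

# Confusion matrix copied from the earlier output
# (rows: true class, columns: predicted class)
names = ['setosa', 'versicolor', 'virginica']
cm = np.array([[13, 0, 0],
               [0, 15, 1],
               [0, 0, 9]])

tp = np.diag(cm)                  # correct predictions per class
precision = tp / cm.sum(axis=0)   # column sums: how often each class was predicted
recall = tp / cm.sum(axis=1)      # row sums: how many samples each class really has
f1 = 2 * precision * recall / (precision + recall)

micro_f1 = tp.sum() / cm.sum()    # pooled over all predictions, same as accuracy
macro_f1 = f1.mean()              # unweighted mean of the per-class F1 scores

for name, p, r, f in zip(names, precision, recall, f1):
    print(f'{name}: precision={p:.3f} recall={r:.3f} f1={f:.3f}')
print('micro F1:', micro_f1)
print('macro F1:', macro_f1)
```

The macro average weights every class equally, so it reacts more strongly than the micro average when one class scores poorly.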

In [None]:
# Precision and Recall for virginica
prec = precision_score(y_test == classes.index('virginica'), y_pred == classes.index('virginica'))
rec = recall_score(y_test == classes.index('virginica'), y_pred == classes.index('virginica'))

print('Precision:', prec)
print('Recall:', rec)


Precision: 0.9
Recall: 1.0


The output shows a precision of 0.9 for the Iris-virginica class: when the model predicts Iris-virginica, it is correct 90% of the time. The confusion matrix accounts for the remaining 10%, since one of the ten Iris-virginica predictions was actually a Versicolor sample. The recall is 1.0, so the model found every actual Iris-virginica instance in the test set and produced no false negatives. Overall, the model identifies Iris-virginica reliably, with a slight tendency to label samples of other species as Iris-virginica.