Description
Describe the bug
sklearn.metrics.classification_report may report flipped values for precision and recall.
Steps/Code to Reproduce
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
from sklearn.neighbors import KNeighborsClassifier
from sklearn import datasets
def calc_precision_recall(conf_matrix, class_labels):
    # For each class
    for i in range(len(class_labels)):
        # Calculate true positives
        true_positives = conf_matrix[i, i]
        # False positives
        false_positives = conf_matrix[i, :].sum() - true_positives
        # False negatives
        false_negatives = 0
        for j in range(len(class_labels)):
            false_negatives += conf_matrix[j, i]
        false_negatives -= true_positives
        # And finally true negatives
        true_negatives = conf_matrix.sum() - false_positives - false_negatives - true_positives
        # Print calculated values
        print(
            "Class label", class_labels[i],
            "T_positive", true_positives,
            "F_positive", false_positives,
            "T_negative", true_negatives,
            "F_negative", false_negatives,
            "\nSensitivity/recall", true_positives / (true_positives + false_negatives),
            "Specificity", true_negatives / (true_negatives + false_positives),
            "Precision", true_positives / (true_positives + false_positives), "\n"
        )
    return
# import some data to play with
iris = datasets.load_iris()
X = iris.data[:, 0:3]  # we only take the first three features
y = iris.target
# The random_state parameter is just a random seed used to make these specific results reproducible.
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=27)
# Instantiate a K-Nearest Neighbors Classifier:
KNN_model = KNeighborsClassifier(n_neighbors=2)
# Fit the classifier:
KNN_model.fit(X_train, y_train)
# Predict and store the prediction:
KNN_prediction = KNN_model.predict(X_test)
# Generate the confusion matrix
conf_matrix = confusion_matrix(KNN_prediction, y_test)
# Print the classification report
print(classification_report(KNN_prediction, y_test))
# Dummy class labels for the three iris classes
class_labels = [0, 1, 2]
# Use our own function to calculate precision and recall from the confusion matrix
calc_precision_recall(conf_matrix, class_labels)
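As an optional cross-check (a minimal sketch, assuming y_test holds the true labels and KNN_prediction the predictions), the per-class scores can also be computed directly with precision_score and recall_score:

from sklearn.metrics import precision_score, recall_score
# average=None returns one score per class, in sorted label order
print("Per-class precision:", precision_score(y_test, KNN_prediction, average=None))
print("Per-class recall:", recall_score(y_test, KNN_prediction, average=None))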
Expected Results
My function returns the following for each class:
Class label 0 T_positive 7 F_positive 0 T_negative 23 F_negative 0
Sensitivity/recall 1.0 Specificity 1.0 Precision 1.0
Class label 1 T_positive 11 F_positive 1 T_negative 18 F_negative 0
Sensitivity/recall 1.0 Specificity 0.9473684210526315 Precision 0.9166666666666666
Class label 2 T_positive 11 F_positive 0 T_negative 18 F_negative 1
Sensitivity/recall 0.9166666666666666 Specificity 1.0 Precision 1.0
That is, the per-class precision and recall I expect are:

   precision  recall
0       1.00    1.00
1       0.92    1.00
2       1.00    0.92
My function assumes the confusion matrix is structured with the actual (true) classes along the columns and the predicted classes down the rows. This is the same structure as the one used on Wikipedia and the one referenced in the documentation for the confusion_matrix function.
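For reference, here is a toy sketch (with made-up labels, not the iris data above) showing how the argument order passed to confusion_matrix changes the matrix that is returned:

from sklearn.metrics import confusion_matrix
# Toy labels: one class-1 sample is predicted as class 0
y_true_toy = [0, 0, 1, 1]
y_pred_toy = [0, 0, 0, 1]
print(confusion_matrix(y_true_toy, y_pred_toy))
# [[2 0]
#  [1 1]]
# Swapping the two arguments transposes the matrix
print(confusion_matrix(y_pred_toy, y_true_toy))
# [[2 1]
#  [0 1]]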
Actual Results
In contrast, these are the results reported by sklearn.metrics.classification_report:
precision recall f1-score support
0 1.00 1.00 1.00 7
1 1.00 0.92 0.96 12
2 0.92 1.00 0.96 11
Versions
System:
python: 3.8.1 (default, Jan 8 2020, 22:29:32) [GCC 7.3.0]
executable: /home/will/anaconda3/envs/ElStatLearn/bin/python
machine: Linux-4.15.0-91-generic-x86_64-with-glibc2.10
Python dependencies:
pip: 20.0.2
setuptools: 38.2.5
sklearn: 0.22.1
numpy: 1.18.1
scipy: 1.4.1
Cython: None
pandas: 1.0.1
matplotlib: 3.1.3
joblib: 0.14.1
Built with OpenMP: True