https://archive.ics.uci.edu/ml/datasets/sms+spam+collection
Text classification with NLTK and scikit-learn: classifying SMS messages as spam or not spam (ham).
0 represents the ham class and 1 represents the spam class.
Precision - Precision is the ratio of correctly predicted positive observations to the total predicted positive observations.
Precision = TP / (TP + FP)
Precision for ham = 0.96
Recall (Sensitivity) - Recall is the ratio of correctly predicted positive observations to all observations that actually belong to the positive class.
Recall = TP / (TP + FN)
F1 score - F1 Score is the harmonic mean of Precision and Recall, so it takes both false positives and false negatives into account. Intuitively it is not as easy to understand as accuracy, but F1 is usually more useful than accuracy, especially if you have an uneven class distribution. Accuracy works best if false positives and false negatives have similar cost. If the costs of false positives and false negatives are very different, it is better to look at both Precision and Recall.
F1 Score = 2*(Recall * Precision) / (Recall + Precision)
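As a quick check, the three formulas above can be evaluated in plain Python on the counts this document reports for the ham class (TP = 1201, FP = 53, FN = 9). This is only an illustrative sketch: the function names are made up for this example and are not taken from the original project code.

```python
def precision(tp, fp):
    """Share of messages predicted positive that really are positive."""
    return tp / (tp + fp)


def recall(tp, fn):
    """Share of actual positives that the model found."""
    return tp / (tp + fn)


def f1_score(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * (r * p) / (r + p)


# Counts reported in this document, with ham as the positive class.
tp, fp, fn = 1201, 53, 9

p = precision(tp, fp)
r = recall(tp, fn)
f1 = f1_score(p, r)

print(f"precision = {p:.4f}")   # 0.9577, which rounds to the 0.96 quoted above
print(f"recall    = {r:.4f}")   # 0.9926
print(f"F1        = {f1:.4f}")  # 0.9748
```

Because F1 is a harmonic mean, it sits between precision and recall but is pulled toward the lower of the two.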
With ham (class 0) taken as the positive class, the confusion matrix counts are:
TP = 1201 (SMS that were actually ham and were marked ham)
FP = 53 (SMS that were marked ham but were actually spam)
FN = 9 (the few SMS that were actually ham but were marked spam)
TN = 130 (SMS that were marked spam and were actually spam)
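The four counts can be tallied directly from the true and predicted labels. The sketch below assumes the 0 = ham, 1 = spam encoding and treats ham as the positive class, matching the definitions above. The helper `confusion_counts` and the toy label lists are hypothetical and invented for illustration; they are not the project's data.

```python
def confusion_counts(y_true, y_pred, positive=0):
    """Return (TP, FP, FN, TN), treating `positive` as the positive class."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1   # marked ham, actually ham
            else:
                fp += 1   # marked ham, actually spam
        else:
            if actual == positive:
                fn += 1   # marked spam, actually ham
            else:
                tn += 1   # marked spam, actually spam
    return tp, fp, fn, tn


# Toy labels, invented for illustration: 0 = ham, 1 = spam.
y_true = [0, 0, 0, 0, 1, 1, 0, 1]
y_pred = [0, 0, 1, 0, 0, 1, 0, 1]

tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive=0)
print(tp, fp, fn, tn)  # 4 1 1 2
```

In a scikit-learn workflow, `sklearn.metrics.confusion_matrix(y_true, y_pred)` produces the same table, with rows for the actual class and columns for the predicted class. With ham as the positive class, its entry [0][0] corresponds to TP, [0][1] to FN, [1][0] to FP and [1][1] to TN.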
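Accuracy = (TP + TN) / (TP + FP + FN + TN) = 0.9597701149425 (as reported). Note that the four counts listed above give (1201 + 130) / 1393 ≈ 0.9555, so the reported value does not follow exactly from those counts.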
Accuracy is the most intuitive performance measure: it is simply the ratio of correctly predicted observations to the total number of observations. One may think that a model with high accuracy is automatically the best one. Accuracy is a great measure, but only when the dataset is symmetric, that is, when false positives and false negatives occur in roughly equal numbers.
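Here the two error counts differ (53 false positives versus 9 false negatives) and ham messages far outnumber spam (1210 versus 183), so we need the F1 Score to evaluate the classification model.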
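To see why accuracy alone can mislead on this data, consider a hypothetical baseline that labels every message as ham. The class totals below are derived from the counts reported above (1201 + 9 = 1210 ham, 53 + 130 = 183 spam). The baseline itself, the helper name `f1_from_counts`, and the convention of returning 0.0 when there are no true positives are assumptions introduced for this illustration, not part of the original evaluation.

```python
def f1_from_counts(tp, fp, fn):
    """F1 from raw counts; taken to be 0.0 here when there are no true positives."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * (recall * precision) / (recall + precision)


n_ham, n_spam = 1201 + 9, 53 + 130   # 1210 ham, 183 spam
total = n_ham + n_spam               # 1393 messages

# Hypothetical baseline: predict "ham" for every message.
baseline_accuracy = n_ham / total

# Scored with spam as the positive class, the baseline finds no spam at all:
# TP = 0, FP = 0, FN = 183.
baseline_spam_f1 = f1_from_counts(tp=0, fp=0, fn=n_spam)

# The reported model, also scored with spam as the positive class:
# TP = 130 (spam marked spam), FP = 9 (ham marked spam), FN = 53 (spam marked ham).
model_spam_f1 = f1_from_counts(tp=130, fp=9, fn=53)

print(f"baseline accuracy = {baseline_accuracy:.4f}")  # 0.8686
print(f"baseline spam F1  = {baseline_spam_f1:.4f}")   # 0.0000
print(f"model spam F1     = {model_spam_f1:.4f}")      # 0.8075
```

The baseline reaches roughly 87% accuracy without detecting a single spam message, while its F1 score for the spam class is 0. That gap is what accuracy hides on an imbalanced dataset.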