## Accuracy
$$ Accuracy = n_{correct} / n_{total} = (tp+tn)/(tp+tn+fp+fn) $$

* Limitation: accuracy is misleading when the classes are highly imbalanced.

* e.g. if 99% of the data belongs to class 1 and the model assigns every data point to class 1, accuracy is still 99%, even though the model never identifies a single sample from the other class.

In [1]:
# Assumes prediction and label are 1-D NumPy arrays of the same size containing 0/1 (or False/True).
def accuracy(prediction, label):
    # Fraction of samples whose predicted class matches the true class.
    return sum(prediction==label)/len(label)
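
The imbalance problem can be made concrete with a small sketch. The data below is hypothetical (990 samples of class 1, 10 of class 0), and `always_one` stands in for a degenerate model that ignores its input; the names are illustrative only.

```python
import numpy as np

def accuracy(prediction, label):
    # Fraction of positions where prediction and label agree.
    return np.mean(prediction == label)

# Hypothetical, highly imbalanced labels: 99% class 1, 1% class 0.
label = np.array([1] * 990 + [0] * 10)

# A "classifier" that always predicts class 1.
always_one = np.ones_like(label)

majority_accuracy = accuracy(always_one, label)

# Share of the true class-0 samples that were actually predicted as class 0.
minority_recall = np.sum((always_one == 0) & (label == 0)) / np.sum(label == 0)

print(majority_accuracy, minority_recall)
```

Accuracy looks excellent here while the minority class is never detected, which is why the per-class metrics in the next sections are needed.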

## Precision-Recall Curve

* Precision and recall are always defined with respect to one class. For example, suppose we have two classes, P (positive) and N (negative).

* The precision for class P answers "of the samples you predicted as positive, how many were predicted correctly?":
$$ Precision = tp/(tp+fp) $$

* The recall for class P answers "of the samples whose true label is positive, how many did you predict correctly?":
$$ Recall = tp/(tp+fn) $$

* The Precision-Recall curve plots precision against recall as the decision threshold varies; it is best suited to naturally highly skewed data (e.g. credit fraud data)


In [3]:
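# Assumes prediction and label are 1-D NumPy arrays of the same size containing 0/1 (or False/True).
def precision(prediction, label):
    # Numerator: correctly predicted positives (tp); denominator: all predicted positives (tp+fp).
    return sum(prediction[label==1])/sum(prediction)

def recall(prediction, label):
    # Numerator: tp as above; denominator: all truly positive samples (tp+fn).
    return sum(prediction[label==1])/sum(label)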
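
A Precision-Recall curve is obtained by computing precision and recall at several decision thresholds. The sketch below assumes the model outputs a real-valued score per sample; `scores`, `thresholds`, and the helper `precision_recall_points` are hypothetical names, and returning a precision of 1.0 when nothing is predicted positive is a convention chosen here, not something the notes specify.

```python
import numpy as np

def precision_recall_points(scores, label, thresholds):
    """Return (precision, recall) arrays with one entry per threshold."""
    precisions, recalls = [], []
    for t in thresholds:
        # A sample is predicted positive when its score reaches the threshold.
        prediction = (scores >= t).astype(int)
        tp = np.sum((prediction == 1) & (label == 1))
        fp = np.sum((prediction == 1) & (label == 0))
        fn = np.sum((prediction == 0) & (label == 1))
        # Assumed convention: precision is 1.0 if no sample is predicted positive.
        precisions.append(tp / (tp + fp) if (tp + fp) > 0 else 1.0)
        recalls.append(tp / (tp + fn) if (tp + fn) > 0 else 0.0)
    return np.array(precisions), np.array(recalls)

# Hypothetical scores and true labels for six samples.
scores = np.array([0.9, 0.8, 0.7, 0.4, 0.3, 0.1])
label = np.array([1, 1, 0, 1, 0, 0])
thresholds = np.array([0.2, 0.5, 0.85])

pr_precision, pr_recall = precision_recall_points(scores, label, thresholds)
print(pr_precision, pr_recall)
```

Raising the threshold trades recall for precision in this example; plotting `pr_recall` on the x-axis against `pr_precision` on the y-axis gives the curve.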

## F1 Score
* Just like precision/recall, F1 score is also specific to one class

$$ F1 = 2 \cdot recall \cdot precision / (recall + precision) $$

In [4]:
def f1_score(prediction, label):
    # Harmonic mean of precision and recall, each computed once.
    p, r = precision(prediction, label), recall(prediction, label)
    return 2*p*r/(p+r)
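
As a quick check of the formula, here is a worked example from raw counts. The counts are made up for illustration, and `f1_from_counts` is a hypothetical helper, not part of the notebook.

```python
def f1_from_counts(tp, fp, fn):
    # Precision and recall from confusion-matrix counts for one class.
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F1 is the harmonic mean of the two.
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts: precision = 8/10 = 0.8, recall = 8/16 = 0.5.
tp, fp, fn = 8, 2, 8
f1 = f1_from_counts(tp, fp, fn)

# The arithmetic mean of the same two values, for comparison.
arithmetic_mean = (0.8 + 0.5) / 2

print(f1, arithmetic_mean)
```

Because F1 is a harmonic mean, it sits below the arithmetic mean whenever precision and recall differ, so a model cannot score well by maximizing only one of them.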

## ROC Curve
* The ROC curve plots the true positive rate against the false positive rate as the decision threshold varies.
* When the ratio of positive to negative data changes, the ROC curve remains stable. This is useful because in practice the class distribution often differs from one dataset to the next, which makes the ROC curve a more stable way to compare models across datasets.
* The ROC curve works well for roughly balanced two-class data.
* The Precision-Recall curve, by contrast, is better for examining naturally highly skewed data (e.g. credit fraud data)
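
The stability claim can be illustrated at a single threshold. The sketch below uses hypothetical scores and builds a skewed copy of the data by repeating every negative sample 10 times; `rates` is an illustrative helper. The true positive rate and false positive rate (the two ROC axes) are each computed within one class, so they should not move, while precision mixes both classes and should drop.

```python
import numpy as np

def rates(scores, label, threshold):
    # Boolean prediction: True when the score reaches the threshold.
    prediction = scores >= threshold
    tp = np.sum(prediction & (label == 1))
    fp = np.sum(prediction & (label == 0))
    fn = np.sum(~prediction & (label == 1))
    tn = np.sum(~prediction & (label == 0))
    tpr = tp / (tp + fn)        # true positive rate (same as recall)
    fpr = fp / (fp + tn)        # false positive rate
    precision = tp / (tp + fp)
    return tpr, fpr, precision

# Hypothetical scores and labels: 3 positives, 3 negatives.
scores = np.array([0.9, 0.8, 0.7, 0.4, 0.3, 0.1])
label = np.array([1, 1, 0, 1, 0, 0])

# Skewed copy: keep the positives, repeat each negative 10 times.
neg = label == 0
skewed_scores = np.concatenate([scores[~neg], np.repeat(scores[neg], 10)])
skewed_label = np.concatenate([label[~neg], np.repeat(label[neg], 10)])

balanced = rates(scores, label, 0.5)
skewed = rates(skewed_scores, skewed_label, 0.5)
print(balanced)
print(skewed)
```

Repeating this over all thresholds would give identical ROC curves for both datasets but visibly different Precision-Recall curves.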

## AUC
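* AUC is the area under the ROC curve. It ranges from 0 to 1; random guessing gives about 0.5, so useful models usually fall between 0.5 and 1.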
* The larger the AUC, the higher the probability that a positive sample is scored above a negative sample, and the better the classification model.
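
AUC can be read as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. A minimal sketch of that reading, assuming real-valued scores; `auc_pairwise` and the data are hypothetical, and counting a tie as half a correct ordering is a common convention adopted here.

```python
import numpy as np

def auc_pairwise(scores, label):
    pos = scores[label == 1]
    neg = scores[label == 0]
    # Compare every positive score with every negative score via broadcasting.
    diff = pos[:, None] - neg[None, :]
    # Correctly ordered pairs count 1, ties count 0.5 (assumed convention).
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (len(pos) * len(neg))

# Hypothetical scores and labels: 3 positives, 3 negatives.
scores = np.array([0.9, 0.8, 0.7, 0.4, 0.3, 0.1])
label = np.array([1, 1, 0, 1, 0, 0])

auc = auc_pairwise(scores, label)          # 8 of the 9 pairs are ordered correctly
auc_reversed = auc_pairwise(-scores, label)  # flipping the scores flips the ranking
print(auc, auc_reversed)
```

The pairwise comparison is quadratic in the number of samples, so it is meant as an illustration of the definition rather than an efficient implementation.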