# Evaluation: Precision and Recall

by 

[__Michael Granitzer__ (michael.granitzer@uni-passau.de)](http://www.mendeley.com/profiles/michael-granitzer/)


__License__

This work is licensed under a [Creative Commons Attribution 3.0 Unported License](http://creativecommons.org/licenses/by/3.0/).




## Introduction

In this tutorial we introduce precision and recall as evaluation measures for classifiers.

## Single Class Contingency Table

Let $f_c(x)\in\{0,1\}$ be a classifier that assigns an instance $x$ to a class $c$ if $f_c(x)=1$. Furthermore, let $g(x)=1$ ($g(x)=0$) if $x$ belongs (does not belong) to class $c$.

Then the contingency table $T_c$, which counts the true positives ($TP_c$), false negatives ($FN_c$), false positives ($FP_c$) and true negatives ($TN_c$), is given as

|$c$      | $f_c(x)=1$    | $f_c(x)=0$    |
|:------------:|--------------|-----------|
|$g(x)=1$    | $TP_c$     | $FN_c$    | 
|$g(x)=0$    | $FP_c$     | $TN_c$    | 

We can calculate precision as

$$
\pi_c = \frac{TP_c}{TP_c+FP_c}
$$

and recall as

$$
\rho_c = \frac{TP_c}{TP_c+FN_c}
$$

Combining both, we obtain the $F_1$ score as the harmonic mean of precision and recall:

$$
F_1=\frac{2}{\frac{1}{\pi_c}+\frac{1}{\rho_c}}
$$
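
Multiplying numerator and denominator by $\pi_c\rho_c$ gives the equivalent form $F_1 = \frac{2\pi_c\rho_c}{\pi_c+\rho_c} = \frac{2TP_c}{2TP_c+FP_c+FN_c}$. The following minimal sketch checks this numerically. The helper name `f1_from_counts` and the counts are illustrative assumptions, and returning 0 when there are no true positives is a convention chosen here, not part of the definition.

```python
def f1_from_counts(tp, fp, fn):
    """Hypothetical helper: F1 score computed from contingency counts."""
    if tp == 0:
        # precision or recall is 0 (or undefined); we return 0 by convention
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # harmonic mean of precision and recall
    return 2 / (1 / precision + 1 / recall)

# illustrative counts: TP=2, FP=1, FN=2, i.e. precision 2/3 and recall 1/2
tp, fp, fn = 2, 1, 2
f1 = f1_from_counts(tp, fp, fn)
# the count-based form 2TP / (2TP + FP + FN) gives the same value, 4/7
f1_counts = 2 * tp / (2 * tp + fp + fn)
print("F1 ", f1, ", count-based form ", f1_counts)
```

Because the harmonic mean is dominated by the smaller of the two values, a classifier cannot reach a high $F_1$ score by maximizing only precision or only recall.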

In [20]:
import numpy as np
# f holds the classifier decisions: f[i]=1 if the i-th instance is assigned to the class and f[i]=0 otherwise
f=np.array([0,0,0,1,1,1])
# g holds the ground truth in the same format
g=np.array([0,0,0,1,1,1])
# now let's compute precision and recall from the contingency counts
TP_c = np.logical_and(f==1,g==1).sum()
FP_c = np.logical_and(f==1,g==0).sum()
FN_c = np.logical_and(f==0,g==1).sum()
print("Precision ", TP_c/(TP_c+FP_c), ", Recall ", TP_c/(TP_c+FN_c))

Precision  1.0 , Recall  1.0


In [59]:
# now let's put this into functions, show some results and cover edge cases
def binary_contingency_table(g, f):
    # g: ground truth, f: classifier decisions; returns the counts (TP, FP, FN)
    TP_c = np.logical_and(f==1,g==1).sum()
    FP_c = np.logical_and(f==1,g==0).sum()
    FN_c = np.logical_and(f==0,g==1).sum()
    return (TP_c, FP_c, FN_c)

def binary_precision_recall(g, f):
    TP_c, FP_c, FN_c = binary_contingency_table(g, f)
    # edge case: without true positives we return 0 for both measures (this also avoids 0/0)
    if TP_c == 0:
        return (.0, .0)
    else:
        return (TP_c/(TP_c + FP_c),
                TP_c/(TP_c + FN_c))

In [60]:
# fully correct
binary_precision_recall (np.array([0,0,0,1,1,1]), np.array([0,0,0,1,1,1]))

(1.0, 1.0)

In [61]:
# trivial acceptor: every instance is assigned to the class (first argument is g, second is f)
binary_precision_recall (np.array([0,0,0,1,1,1]), np.array([1,1,1,1,1,1]))

(0.5, 1.0)

In [62]:
# trivial rejector: no instance is assigned to the class (first argument is g, second is f)
binary_precision_recall (np.array([0,0,0,1,1,1]), np.array([0,0,0,0,0,0]))

(0.0, 0.0)

In [47]:
# we can also use sklearn metrics package
from sklearn.metrics import precision_recall_fscore_support

In [53]:
precision_recall_fscore_support(np.array([0,1,1,1,0,1]), 
                                np.array([0,0,0,1,1,1]), 
                                average='binary')

(0.66666666666666663, 0.5, 0.57142857142857151, None)

Note that for `sklearn` we had to select `average='binary'` to indicate that we only want to evaluate the positive class, i.e. class 1. With another setting such as `average='macro'`, sklearn creates two contingency tables, one for class 0 and one for class 1, and averages the precision and recall obtained for the two classes (see the multi-class section below for details).

In [52]:
precision_recall_fscore_support(np.array([0,1,1,1,0,1]), 
                                np.array([0,0,0,1,1,1]), 
                                average='macro')



(0.5, 0.5, 0.4857142857142857, None)
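
To make the difference between the two settings concrete, the sketch below builds one contingency table per class for the label vectors passed to sklearn above and averages the per-class values by hand. It uses plain NumPy only. The helper `per_class_precision_recall` is a hypothetical name introduced for this illustration and is not part of sklearn.

```python
import numpy as np

g = np.array([0, 1, 1, 1, 0, 1])  # ground truth (first argument of the sklearn calls)
f = np.array([0, 0, 0, 1, 1, 1])  # predictions (second argument of the sklearn calls)

def per_class_precision_recall(g, f, c):
    """Hypothetical helper: precision and recall when class c is treated as positive."""
    tp = np.sum((f == c) & (g == c))
    fp = np.sum((f == c) & (g != c))
    fn = np.sum((f != c) & (g == c))
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    return precision, recall

# one contingency table per class
scores = {c: per_class_precision_recall(g, f, c) for c in (0, 1)}
for c, (p, r) in scores.items():
    print("class", c, "precision", round(p, 3), "recall", round(r, 3))

# macro-average: unweighted mean of the per-class values
macro_precision = np.mean([p for p, _ in scores.values()])
macro_recall = np.mean([r for _, r in scores.values()])
print("macro precision", round(macro_precision, 3), "macro recall", round(macro_recall, 3))
```

Class 1 alone yields precision 2/3 and recall 1/2, which is what `average='binary'` reports. Class 0 yields precision 1/3 and recall 1/2, so the macro-averaged precision drops to 1/2.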

## Multi-Class Classification

How do we calculate precision and recall when we have more classes?

Let $C$ be a set of $k$ classes, i.e. $C=\{1, 2, \ldots, k\}$ and $C\subseteq\mathbb{N}$. Let $f(x)\in\{1, 2, \ldots, k\}$ be a classifier that assigns an instance $x$ to class $c \in C$ if $f(x)=c$. Furthermore, let $g(x)=c$ if $x$ belongs to class $c$.
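
Treating each class $c$ in turn as the positive class yields one contingency table $T_c$ per class, which can be combined in two common ways. The superscripts $M$ (macro) and $\mu$ (micro) are notation introduced here for illustration. The macro-average computes the measure per class and then averages over classes:

$$
\pi^{M} = \frac{1}{|C|}\sum_{c \in C}\pi_c, \qquad \rho^{M} = \frac{1}{|C|}\sum_{c \in C}\rho_c
$$

The micro-average first sums the contingency tables over all classes and then computes the measure once:

$$
\pi^{\mu} = \frac{\sum_{c \in C} TP_c}{\sum_{c \in C} (TP_c+FP_c)}, \qquad \rho^{\mu} = \frac{\sum_{c \in C} TP_c}{\sum_{c \in C} (TP_c+FN_c)}
$$

The macro-average gives every class the same weight, whereas the micro-average gives every decision the same weight, so frequent classes dominate the micro-averaged result.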


In [67]:
import numpy as np
# f[i] is the class the classifier assigns to the i-th instance
f=np.array([1,2,2,1,2,3])
# g[i] is the true class of the i-th instance
g=np.array([1,2,3,1,2,3])
# classes
c = np.unique(g)
# macro-average: compute precision and recall per class, then average over classes
p_, r_ = .0, .0
for i in c:
    p,r = binary_precision_recall(g==i,f==i)
    p_ += p
    r_ += r
    print ("class ",i, " precision ", p, " recall ", r)
print ("Macroaverage Precision ",p_/len(c), ", Macroaverage Recall ",r_/len(c))
# micro-average: sum the contingency tables over classes, then compute precision and recall
tp, fp, fn = 0, 0, 0
for i in c:
    TP_i, FP_i, FN_i = binary_contingency_table(g==i,f==i)
    print("class ",i, " tp=", TP_i, " fp=", FP_i, " fn=", FN_i)
    tp += TP_i
    fp += FP_i
    fn += FN_i
print ("Microaverage precision ", tp/(tp+fp), ", Microaverage recall ", tp/(tp+fn))

class  1  precision  1.0  recall  1.0
class  2  precision  0.666666666667  recall  1.0
class  3  precision  1.0  recall  0.5
Macroaverage Precision  0.888888888889 , Macroaverage Recall  0.833333333333
class  1  tp= 2  fp= 0  fn= 0
class  2  tp= 2  fp= 1  fn= 0
class  3  tp= 1  fp= 0  fn= 1
Microaverage precision  0.833333333333 , Microaverage recall  0.833333333333


In [72]:
g, f = np.array([1,2,3,1,2,3]), np.array([1,2,2,1,2,3])
print("Macro:",precision_recall_fscore_support(g,f, average='macro'))
print("Micro:",precision_recall_fscore_support(g,f, average='micro'))

Macro: (0.88888888888888884, 0.83333333333333337, 0.8222222222222223, None)
Micro: (0.83333333333333337, 0.83333333333333337, 0.83333333333333337, None)
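
Micro-averaged precision and recall coincide in the output above. When every instance has exactly one true and one predicted class, each misclassified instance counts once as a false positive (for the predicted class) and once as a false negative (for the true class), so both measures equal the accuracy. The following sketch checks this with plain NumPy on the same label vectors. It is an illustration under that single-label assumption, not a general proof.

```python
import numpy as np

g = np.array([1, 2, 3, 1, 2, 3])  # ground truth
f = np.array([1, 2, 2, 1, 2, 3])  # predictions

# sum the per-class contingency counts over all classes
tp = fp = fn = 0
for c in np.unique(g):
    tp += np.sum((f == c) & (g == c))
    fp += np.sum((f == c) & (g != c))
    fn += np.sum((f != c) & (g == c))

micro_precision = tp / (tp + fp)
micro_recall = tp / (tp + fn)
accuracy = np.mean(f == g)  # fraction of correctly classified instances
print("micro precision", micro_precision, "micro recall", micro_recall, "accuracy", accuracy)
```

All three values are 5/6 here: five of the six instances are classified correctly, and the single error contributes one false positive to class 2 and one false negative to class 3.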


## Multi-Class, Multi-Label


In [86]:
from sklearn.metrics import precision_recall_fscore_support
import scipy.sparse as sp
classes = np.array([1,2,3])
# ground truth and prediction as lists of instances, with a tuple of assigned classes for every instance
# (plain lists, since the tuples have different lengths and do not form a rectangular array)
g, f = [(1,2),(1,2), (2, ), (3,)], [(1,),(2,), (2, ), (3,)]
sm_g = sp.dok_matrix((len(g),len(classes)))
for instance_ix, classes_ix in enumerate(g):
    for clasz in classes_ix:
        sm_g[instance_ix, clasz-1]=1
sm_f = sp.dok_matrix((len(f),len(classes)))       
for instance_ix, classes_ix in enumerate(f):
    for clasz in classes_ix:
        sm_f[instance_ix, clasz-1]=1
    
print("Macro:",precision_recall_fscore_support(sm_g,sm_f, average='macro'))
print("Micro:",precision_recall_fscore_support(sm_g,sm_f, average='micro'))

Macro: (1.0, 0.72222222222222221, 0.8222222222222223, None)
Micro: (1.0, 0.66666666666666663, 0.80000000000000004, None)
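
In the multi-label case every instance carries a set of classes, so ground truth and predictions become binary indicator matrices with one row per instance and one column per class. Each column is then evaluated like a binary problem. The sketch below recomputes the contingency counts with dense NumPy arrays. The helper `to_indicator` is a hypothetical name introduced for this illustration.

```python
import numpy as np

classes = [1, 2, 3]
g = [(1, 2), (1, 2), (2,), (3,)]  # true label sets, one tuple per instance
f = [(1,), (2,), (2,), (3,)]      # predicted label sets, one tuple per instance

def to_indicator(label_sets, classes):
    """Hypothetical helper: entry [i, j] is 1 if instance i carries classes[j]."""
    m = np.zeros((len(label_sets), len(classes)), dtype=int)
    for row, labels in enumerate(label_sets):
        for label in labels:
            m[row, classes.index(label)] = 1
    return m

G, F = to_indicator(g, classes), to_indicator(f, classes)

# per-class contingency counts: sum over instances (axis 0) for every column
tp = np.sum((F == 1) & (G == 1), axis=0)
fp = np.sum((F == 1) & (G == 0), axis=0)
fn = np.sum((F == 0) & (G == 1), axis=0)
print("tp", tp, "fp", fp, "fn", fn)

# every class has at least one true instance here, so tp + fn is never zero
macro_recall = np.mean(tp / (tp + fn))
micro_precision = tp.sum() / (tp.sum() + fp.sum())
micro_recall = tp.sum() / (tp.sum() + fn.sum())
print("macro recall", round(macro_recall, 3),
      "micro precision", round(micro_precision, 3),
      "micro recall", round(micro_recall, 3))
```

The classifier never predicts a wrong label but misses two of the six true labels, which gives a precision of 1 and a micro-averaged recall of 4/6.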



TODOs:
- Show that accuracy equals micro-averaged precision and micro-averaged recall in the binary case
- Show that macro-average and micro-average are the same given a data set with equally distributed classes