In [57]:
import pandas as pd
import sklearn.metrics  # import the submodule explicitly so sklearn.metrics.* resolves below

|               | pred dog   | pred cat   |
|:------------  |-----------:|-----------:|
| actual dog    |         46 |         7  |
| actual cat    |         13 |         34 |


Given the following confusion matrix, evaluate (by hand) the model's performance.
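Accuracy:
(TP + TN) / total = (46 + 34) / (46 + 7 + 13 + 34) = 0.80

Precision (dog as the positive class):
TP / (TP + FP) = 46 / (46 + 13) ≈ 0.78

Recall (dog as the positive class):
TP / (TP + FN) = 46 / (46 + 7) ≈ 0.87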

In the context of this problem, what is a false positive?  
You get to choose what constitutes positive and negative. If I use dog as the positive class, then a false positive is when I predict dog but it's actually a cat.

In the context of this problem, what is a false negative?   
A false negative is if I predict cat but it's actually a dog.  
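
As a quick check on the hand calculations, the same three metrics can be computed directly from the four cells of the confusion matrix. This is a minimal sketch that assumes dog is the positive class; the variable names are illustrative only.

```python
# Counts read off the confusion matrix, with "dog" as the positive class.
tp = 46  # actual dog, predicted dog
fn = 7   # actual dog, predicted cat
fp = 13  # actual cat, predicted dog
tn = 34  # actual cat, predicted cat

accuracy = (tp + tn) / (tp + tn + fp + fn)  # share of all predictions that are correct
precision = tp / (tp + fp)                  # share of "dog" predictions that are correct
recall = tp / (tp + fn)                     # share of actual dogs that were found

print(round(accuracy, 2), round(precision, 2), round(recall, 2))  # 0.8 0.78 0.87
```

Choosing cat as the positive class instead would swap the roles of the false positives and false negatives, so precision and recall would change while accuracy stays the same.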

How would you describe this model?   
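With dog as the positive class, the model is 80% accurate and finds most dogs (recall 0.87), but it is more prone to calling a cat a dog (13 false positives) than to missing a dog (7 false negatives). Whether that is good enough depends on which of those errors is more costly for what you're trying to accomplish.  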

In [83]:
df = pd.read_csv('c3.csv')
df['baseline']='No Defect'
df.head()

Unnamed: 0,actual,model1,model2,model3,baseline
0,No Defect,No Defect,Defect,No Defect,No Defect
1,No Defect,No Defect,Defect,Defect,No Defect
2,No Defect,No Defect,Defect,No Defect,No Defect
3,No Defect,Defect,Defect,Defect,No Defect
4,No Defect,No Defect,Defect,No Defect,No Defect


In [84]:
# Share of ducks without a defect that each model labels correctly (specificity).
subset = df[df.actual == 'No Defect']
mod1 = (subset.model1 == subset.actual).mean()
mod2 = (subset.model2 == subset.actual).mean()
mod3 = (subset.model3 == subset.actual).mean()
base = (subset.baseline == subset.actual).mean()

round(mod1,2),round(mod2,2),round(mod3,2),round(base,2)

(0.99, 0.56, 0.53, 1.0)

An internal team wants to investigate the cause of the manufacturing defects. They tell you that they want to identify as many of the ducks that have a defect as possible. Which evaluation metric would be appropriate here? Which model would be the best fit for this use case?

We would want to use recall, which is the share of actual defects that the model correctly identifies. Model 3 scores best, with a recall of 81%.
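
Since the goal is to catch as many defective ducks as possible, the quantity of interest is the share of actual defects that a model flags, which is recall. The sketch below shows that calculation with pandas on a small made-up DataFrame; the data, the `model_a` column, and the `recall` helper are hypothetical and are not taken from `c3.csv`.

```python
import pandas as pd

# Hypothetical toy data for illustration only (not from c3.csv).
toy = pd.DataFrame({
    'actual':  ['Defect', 'Defect', 'Defect', 'Defect', 'No Defect', 'No Defect'],
    'model_a': ['Defect', 'Defect', 'Defect', 'No Defect', 'Defect', 'No Defect'],
})

def recall(frame, pred_col, positive='Defect'):
    """Share of actual positives that the model also predicts as positive."""
    actual_pos = frame[frame.actual == positive]
    return (actual_pos[pred_col] == actual_pos.actual).mean()

recall_a = recall(toy, 'model_a')
print(recall_a)  # 3 of the 4 actual defects are caught: 0.75
```

Filtering on the actual column first is what makes this recall: the denominator is the number of true defects, not the number of defect predictions.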



In [85]:
subset = df[df.actual == 'Defect']
m1 = (subset.actual == subset.model1).mean()
m2 = (subset.actual == subset.model2).mean()
m3 = (subset.actual == subset.model3).mean()
m1,m2,m3

(0.5, 0.5625, 0.8125)

Recently several stories in the local news have come out highlighting customers who received a rubber duck with a defect, and portraying C3 in a bad light. The PR team has decided to launch a program that gives customers with a defective duck a vacation to Hawaii. They need you to predict which ducks will have defects, but tell you they really don't want to accidentally give out a vacation package when the duck really doesn't have a defect. Which evaluation metric would be appropriate here? Which model would be the best fit for this use case?

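We want to predict whether the duck has a defect, so Defect is our positive class. The outcome the PR team wants to avoid is a false positive (we say it has a defect, but it doesn't), because that gives away a vacation unnecessarily. A false negative (we say no defect, but it has one) is less costly here. Therefore we want to use precision: of the ducks we predict as defective, how many really are. The values computed on the rows where actual == 'Defect' are recall, so Model 3's 81% does not settle this question; the models should be compared on precision instead.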
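
The cost to avoid in this scenario is awarding a vacation for a duck without a defect, i.e. a false positive, which is what precision penalizes. Precision is computed over the rows a model predicts as positive rather than the rows that are actually positive. The following is a minimal sketch on made-up data; the DataFrame, the `model_a` column, and the `precision` helper are hypothetical and do not reproduce results from `c3.csv`.

```python
import pandas as pd

# Hypothetical toy data for illustration only (not from c3.csv).
toy = pd.DataFrame({
    'actual':  ['Defect', 'Defect', 'No Defect', 'No Defect', 'No Defect', 'Defect'],
    'model_a': ['Defect', 'No Defect', 'Defect', 'No Defect', 'No Defect', 'Defect'],
})

def precision(frame, pred_col, positive='Defect'):
    """Share of positive predictions that are actually positive."""
    predicted_pos = frame[frame[pred_col] == positive]
    return (predicted_pos[pred_col] == predicted_pos.actual).mean()

precision_a = precision(toy, 'model_a')
print(round(precision_a, 2))  # 2 of the 3 "Defect" predictions are right: 0.67
```

Note that a model that never predicts the positive class (such as an all-'No Defect' baseline) has no positive predictions to average over, so its precision is undefined rather than zero.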

In [44]:
subset = df[df['actual']=='Defect']
r1 = (subset.actual==subset.model1).mean()
r2 = (subset.actual==subset.model2).mean()
r3 = (subset.actual==subset.model3).mean()
b = (subset.actual==subset.baseline).mean()
round(r1,2),round(r2,2),round(r3,2),round(b,2)

(0.5, 0.56, 0.81, 0.0)

In [87]:
df = pd.read_csv('gives_you_paws.csv')
df.head()

Unnamed: 0,actual,model1,model2,model3,model4
0,cat,cat,dog,cat,dog
1,dog,dog,cat,cat,dog
2,dog,cat,cat,cat,dog
3,dog,dog,dog,cat,dog
4,cat,cat,cat,dog,dog


Given this dataset, use pandas to create a baseline model (i.e. a model that just predicts the most common class) and answer the following questions:

In [88]:
df.actual.value_counts()

dog    3254
cat    1746
Name: actual, dtype: int64

In [89]:
df['baseline'] = 'dog'

In terms of accuracy, how do the various models compare to the baseline model? Are any of the models better than the baseline?

In [90]:
m1 = (df.model1 == df.actual).mean()
m2 = (df.model2 == df.actual).mean()
m3 = (df.model3 == df.actual).mean()
m4 = (df.model4 == df.actual).mean()
b = (df.baseline == df.actual).mean()
round(m1,2),round(m2,2),round(m3,2),round(m4,2),round(b,2)

(0.81, 0.63, 0.51, 0.74, 0.65)

Model 1 (81%) and Model 4 (74%) both beat the baseline accuracy of 65%. Model 2 (63%) and Model 3 (51%) do not.
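
The baseline does not have to be hard-coded: the most common class can be looked up from the data and the accuracy comparison run in one pass. This is a sketch on a small made-up DataFrame; the data and the `model_a` column are hypothetical, not taken from `gives_you_paws.csv`.

```python
import pandas as pd

# Hypothetical toy data for illustration only.
toy = pd.DataFrame({
    'actual':  ['dog', 'dog', 'dog', 'cat', 'cat'],
    'model_a': ['dog', 'dog', 'dog', 'cat', 'dog'],
})

# The baseline always predicts whichever class occurs most often in `actual`.
most_common = toy.actual.value_counts().idxmax()
toy['baseline'] = most_common

# Accuracy is the share of rows where the prediction matches the actual label.
accuracies = {col: (toy[col] == toy.actual).mean() for col in ['model_a', 'baseline']}
print(most_common, accuracies)
```

On this toy data the baseline is 'dog' with an accuracy of 0.6, and `model_a` reaches 0.8. A model is only worth considering on accuracy grounds if it clears the baseline figure.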

Suppose you are working on a team that solely deals with dog pictures. Which of these models would you recommend?


In [91]:
subset = df[df['actual']=='dog']
d1 = (subset.actual==subset.model1).mean()
d2 = (subset.actual==subset.model2).mean()
d3 = (subset.actual==subset.model3).mean()
d4 = (subset.actual==subset.model4).mean()
d1,d2,d3,d4



(0.803318992009834,
 0.49078057775046097,
 0.5086047940995697,
 0.9557467732022127)

Model 4 has the highest recall on actual dog pictures (about 96%), so it is the recommendation for the dog team.

Suppose you are working on a team that solely deals with cat pictures. Which of these models would you recommend?

In [92]:
subset = df[df['actual']=='cat']
c1 = (subset.actual==subset.model1).mean()
c2 = (subset.actual==subset.model2).mean()
c3 = (subset.actual==subset.model3).mean()
c4 = (subset.actual==subset.model4).mean()
c1,c2,c3,c4

(0.8150057273768614,
 0.8906071019473081,
 0.5114547537227949,
 0.34536082474226804)

Model 2 has the highest recall on actual cat pictures (about 89%), so it is the recommendation for the cat team.
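
The two subset calculations above (one per class) can also be done in a single step by grouping the "prediction is correct" flag by the actual label, which gives recall for every class at once. The sketch below uses made-up data; the DataFrame and the `model_a` column are hypothetical.

```python
import pandas as pd

# Hypothetical toy data for illustration only.
toy = pd.DataFrame({
    'actual':  ['dog', 'dog', 'dog', 'dog', 'cat', 'cat'],
    'model_a': ['dog', 'dog', 'dog', 'cat', 'cat', 'dog'],
})

# True where the prediction matches; averaging within each actual class gives per-class recall.
correct = toy.model_a == toy.actual
recall_by_class = correct.groupby(toy.actual).mean()
print(recall_by_class)
```

Here the result is 0.5 for cat and 0.75 for dog. These per-class values correspond to the recall column that `classification_report` prints for each label.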

In [93]:
print(sklearn.metrics.accuracy_score(df.actual,df.model1))
print(sklearn.metrics.accuracy_score(df.actual,df.model2))
print(sklearn.metrics.accuracy_score(df.actual,df.model3))
print(sklearn.metrics.accuracy_score(df.actual,df.model4))

0.8074
0.6304
0.5096
0.7426


In [96]:
print(sklearn.metrics.classification_report(df.actual,df.model1))

              precision    recall  f1-score   support

         cat       0.69      0.82      0.75      1746
         dog       0.89      0.80      0.84      3254

    accuracy                           0.81      5000
   macro avg       0.79      0.81      0.80      5000
weighted avg       0.82      0.81      0.81      5000



In [97]:
print(sklearn.metrics.classification_report(df.actual,df.model2))

              precision    recall  f1-score   support

         cat       0.48      0.89      0.63      1746
         dog       0.89      0.49      0.63      3254

    accuracy                           0.63      5000
   macro avg       0.69      0.69      0.63      5000
weighted avg       0.75      0.63      0.63      5000



In [98]:
print(sklearn.metrics.classification_report(df.actual,df.model3))

              precision    recall  f1-score   support

         cat       0.36      0.51      0.42      1746
         dog       0.66      0.51      0.57      3254

    accuracy                           0.51      5000
   macro avg       0.51      0.51      0.50      5000
weighted avg       0.55      0.51      0.52      5000



In [99]:
print(sklearn.metrics.classification_report(df.actual,df.model4))

              precision    recall  f1-score   support

         cat       0.81      0.35      0.48      1746
         dog       0.73      0.96      0.83      3254

    accuracy                           0.74      5000
   macro avg       0.77      0.65      0.66      5000
weighted avg       0.76      0.74      0.71      5000

