### Given the following confusion matrix, evaluate (by hand) the model's performance.


|               | actual cat | actual dog |
|:------------  |-----------:|-----------:|
| predicted cat |         34 |          7 |
| predicted dog |         13 |         46 |


- In the context of this problem, what is a false positive?
- In the context of this problem, what is a false negative?
- How would you describe this model?

In [1]:
import pandas as pd

In [2]:
# In the context of this problem, what is a false positive?
# A false positive (false alarm), with cat as the positive class:
# the model predicted cat but the animal was actually a dog. Count: 7

In [3]:
# In the context of this problem, what is a false negative?
# A false negative (miss), with cat as the positive class:
# the model predicted dog but the animal was actually a cat. Count: 13

In [4]:
# How would you describe this model?
TP = 34
TN = 46
FP = 7
FN = 13
accuracy = (TP + TN)/(TP + TN + FP + FN) * 100
accuracy
print(f"The model is {accuracy}% accurate.")

The model is 80.0% accurate.
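
Accuracy alone does not fully describe the model. As a minimal sketch, assuming cat is the positive class (the same convention as the false positive and false negative answers above), precision, recall, and specificity follow from the same four counts:

```python
# Counts from the confusion matrix, with "cat" as the positive class
TP, FP = 34, 7    # predicted cat: actually cat / actually dog
FN, TN = 13, 46   # predicted dog: actually cat / actually dog

precision = TP / (TP + FP)    # of the predicted cats, how many were cats
recall = TP / (TP + FN)       # of the actual cats, how many were found
specificity = TN / (TN + FP)  # of the actual dogs, how many were labeled dog

print(f"precision:   {precision:.2%}")
print(f"recall:      {recall:.2%}")
print(f"specificity: {specificity:.2%}")
```

Under that assumption the model is more trustworthy when it says "cat" (about 83% precision) than it is at finding every cat (about 72% recall).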


### You are working as a data scientist for Codeup Cody Creator (C3 for short), a rubber-duck manufacturing plant.

Unfortunately, some of the rubber ducks that are produced will have defects. Your team has built several models that try to predict those defects, and the data from their predictions can be found here.

Use the predictions dataset and pandas to help answer the following questions:

- An internal team wants to investigate the cause of the manufacturing defects. They tell you that they want to identify as many of the ducks that have a defect as possible. Which evaluation metric would be appropriate here? Which model would be the best fit for this use case?
- Recently several stories in the local news have come out highlighting customers who received a rubber duck with a defect, and portraying C3 in a bad light. The PR team has decided to launch a program that gives customers with a defective duck a vacation to Hawaii. They need you to predict which ducks will have defects, but tell you they really don't want to accidentally give out a vacation package when the duck really doesn't have a defect. Which evaluation metric would be appropriate here? Which model would be the best fit for this use case?

In [5]:
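# Which evaluation metric would be appropriate here?
# The team wants to find as many defective ducks as possible, so a missed defect
# (false negative) is the costly error. With Defect as the positive class, recall
# (sensitivity) is the appropriate metric.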

In [6]:
r_duck = pd.read_csv('untidy_data/c3.csv')
r_duck.head()

Unnamed: 0,actual,model1,model2,model3
0,No Defect,No Defect,Defect,No Defect
1,No Defect,No Defect,Defect,Defect
2,No Defect,No Defect,Defect,No Defect
3,No Defect,Defect,Defect,Defect
4,No Defect,No Defect,Defect,No Defect


In [7]:
# Value counts to see how many defects and non defects there are
r_duck.actual.value_counts()

No Defect    184
Defect        16
Name: actual, dtype: int64

In [8]:
# There are far more non-defects (184) than defects (16), so the baseline prediction is "No Defect".
# Treat "Defect" as the positive class. To catch as many defects as possible we want high
# recall: TP out of all actual positives (TP + FN).
actual_v_m1 = pd.crosstab(r_duck.model1, r_duck.actual)
actual_v_m1

actual,Defect,No Defect
model1,Unnamed: 1_level_1,Unnamed: 2_level_1
Defect,8,2
No Defect,8,182


In [15]:
# Model_1 accuracy, specificity, and recall (Defect is the positive class)
TP = 8
TN = 182
FP = 2
FN = 8
accuracy = (TP + TN)/(TP + TN + FP + FN) * 100
accuracy_2 = (r_duck.actual == r_duck.model1).mean() * 100
print(f"The model is {accuracy}% accurate.")
print(f"The model is {round(accuracy_2,2)}% accurate with the DF calculation.")
# % of predicting TN out of all actual Negatives
specificity = TN/(TN + FP) * 100
print(f"The model's specificity is {round(specificity,2)}%.")
# % of predicting TP out of all actual Positives
recall = TP/(TP + FN) * 100
print(f"The model's recall is {round(recall,2)}%.")

The model is 95.0% accurate.
The model is 95.0% accurate with the DF calculation.
The model's specificity is 98.91%.
The model's recall is 50.0%.


In [16]:
actual_v_m2 = pd.crosstab(r_duck.model2, r_duck.actual)
actual_v_m2

actual,Defect,No Defect
model2,Unnamed: 1_level_1,Unnamed: 2_level_1
Defect,9,81
No Defect,7,103


In [17]:
# Model_2 accuracy, specificity, and recall (Defect is the positive class)
TP = 9
TN = 103
FP = 81
FN = 7
accuracy = (TP + TN)/(TP + TN + FP + FN) * 100
accuracy_2 = (r_duck.actual == r_duck.model2).mean() * 100
print(f"The model is {round(accuracy,2)}% accurate.")
print(f"The model is {round(accuracy_2,2)}% accurate with the DF calculation.")
# % of predicting TN out of all actual Negatives
specificity = TN/(TN + FP) * 100
print(f"The model's specificity is {round(specificity,2)}%.")
# % of predicting TP out of all actual Positives
recall = TP/(TP + FN) * 100
print(f"The model's recall is {round(recall,2)}%.")

The model is 56.0% accurate.
The model is 56.0% accurate with the DF calculation.
The model's specificity is 55.98%.
The model's recall is 56.25%.


In [18]:
actual_v_m3 = pd.crosstab(r_duck.model3, r_duck.actual)
actual_v_m3

actual,Defect,No Defect
model3,Unnamed: 1_level_1,Unnamed: 2_level_1
Defect,13,86
No Defect,3,98


In [19]:
# Model_3 accuracy, specificity, and recall (Defect is the positive class)
TP = 13
TN = 98
FP = 86
FN = 3
accuracy = (TP + TN)/(TP + TN + FP + FN) * 100
accuracy_2 = (r_duck.actual == r_duck.model3).mean() * 100
print(f"The model is {round(accuracy,2)}% accurate.")
print(f"The model is {round(accuracy_2,2)}% accurate with the DF calculation.")
# % of predicting TN out of all actual Negatives
specificity = TN/(TN + FP) * 100
print(f"The model's specificity is {round(specificity,2)}%.")
# % of predicting TP out of all actual Positives
recall = TP/(TP + FN) * 100
print(f"The model's recall is {round(recall,2)}%.")

The model is 55.5% accurate.
The model is 55.5% accurate with the DF calculation.
The model's specificity is 53.26%.
The model's recall is 81.25%.


### Model 3 has the highest recall, catching 13 of the 16 actual defects (81.25%), so it is the best fit for identifying as many defective ducks as possible. Model 1 has the highest specificity (98.91%) but finds only 8 of the 16 defects (50%).
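
Because the team wants to catch as many defective ducks as possible, recall with Defect as the positive class is a natural way to rank the models. A minimal sketch, using only the true positive and false negative counts read from the three crosstabs above (the dictionary layout is just for illustration):

```python
# (TP, FN) for each model, read from the crosstabs with "Defect" as the positive class
counts = {"model1": (8, 8), "model2": (9, 7), "model3": (13, 3)}

# Recall = TP / (TP + FN): the share of actual defects that the model caught
recall = {name: tp / (tp + fn) for name, (tp, fn) in counts.items()}
best = max(recall, key=recall.get)

for name, value in recall.items():
    print(f"{name}: recall = {value:.2%}")
print(f"Highest recall: {best}")
```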

Recently several stories in the local news have come out highlighting customers who received a rubber duck with a defect, and portraying C3 in a bad light. The PR team has decided to launch a program that gives customers with a defective duck a vacation to Hawaii. They need you to predict which ducks will have defects, but tell you they really don't want to accidentally give out a vacation package when the duck really doesn't have a defect. Which evaluation metric would be appropriate here? Which model would be the best fit for this use case?

In [28]:
# They want us to predict defects but don't want to accidentally give out a vacation
# when there is no defect, so a false positive is the costly error. Precision, the
# percentage of positive predictions that are correct, is the appropriate metric.
# model_1
TP = 8
TN = 182
FP = 2
FN = 8

precision = TP/(TP + FP) * 100
# DF calculation: keep only the rows predicted as Defect, then check how many are correct
predicted_defect = r_duck[r_duck.model1 == 'Defect']
model1_precision = (predicted_defect.model1 == predicted_defect.actual).mean() * 100
print(f"The model's precision is {round(precision,2)}%.")
print(f"The model's precision is {round(model1_precision,2)}% using the DF calculation.")

The model's precision is 80.0%.
The model's precision is 80.0% using the DF calculation.


In [29]:
# model_2
TP = 9
TN = 103
FP = 81
FN = 7
precision = TP/(TP + FP) * 100
# DF calculation: keep only the rows predicted as Defect, then check how many are correct
predicted_defect = r_duck[r_duck.model2 == 'Defect']
model2_precision = (predicted_defect.model2 == predicted_defect.actual).mean() * 100
print(f"The model's precision is {round(precision,2)}%.")
print(f"The model's precision is {round(model2_precision,2)}% using the DF calculation.")

The model's precision is 10.0%.
The model's precision is 10.0% using the DF calculation.


In [30]:
# model_3
TP = 13
TN = 98
FP = 86
FN = 3
precision = TP/(TP + FP) * 100
# DF calculation: keep only the rows predicted as Defect, then check how many are correct
predicted_defect = r_duck[r_duck.model3 == 'Defect']
model3_precision = (predicted_defect.model3 == predicted_defect.actual).mean() * 100
print(f"The model's precision is {round(precision,2)}%.")
print(f"The model's precision is {round(model3_precision,2)}% using the DF calculation.")

The model's precision is 13.13%.
The model's precision is 13.13% using the DF calculation.


### Of the three models, Model 1 has the highest precision (positive predictive value) at 80.00%, so it is the best fit for minimizing false positives.
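
Precision can also be computed directly from the rows with a boolean mask. The sketch below is illustrative only: since the CSV is not reproduced here, it rebuilds a stand-in frame `df` from the model 1 crosstab counts rather than reading the real file.

```python
import pandas as pd

# Stand-in for the real data: (actual, model1) pairs rebuilt from the model 1 crosstab
rows = (
    [("Defect", "Defect")] * 8              # true positives
    + [("No Defect", "Defect")] * 2         # false positives
    + [("Defect", "No Defect")] * 8         # false negatives
    + [("No Defect", "No Defect")] * 182    # true negatives
)
df = pd.DataFrame(rows, columns=["actual", "model1"])

# Precision: among the rows predicted "Defect", the share that really are defects
predicted_defect = df[df.model1 == "Defect"]
precision = (predicted_defect.actual == "Defect").mean()
print(f"model1 precision: {precision:.2%}")
```

Filtering on the prediction column first is what separates precision from accuracy: comparing the two full columns without the mask would measure accuracy instead.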

### 3. You are working as a data scientist for Gives You Paws ™, a subscription-based service that shows you cute pictures of dogs or cats (or both for an additional fee).

- At Gives You Paws, anyone can upload pictures of their cats or dogs. The photos are then put through a two-step process. First, an automated algorithm tags pictures as either a cat or a dog (Phase I). Next, the photos that have been initially identified are put through another round of review, possibly with some human oversight, before being presented to the users (Phase II).

- Several models have already been developed with the data, and you can find their results here.

- Given this dataset, use pandas to create a baseline model (i.e. a model that just predicts the most common class) and answer the following questions:

In [31]:
paws = pd.read_csv('untidy_data/gives_you_paws.csv')
paws.head()

Unnamed: 0,actual,model1,model2,model3,model4
0,cat,cat,dog,cat,dog
1,dog,dog,cat,cat,dog
2,dog,cat,cat,cat,dog
3,dog,dog,dog,cat,dog
4,cat,cat,cat,dog,dog


In [32]:
# There are more dogs than cats, so the baseline model always predicts dog
paws.actual.value_counts()

dog    3254
cat    1746
Name: actual, dtype: int64

In [None]:
# Based on the value counts, predicting dog is the baseline.
# Always predicting dog is correct for 3254 of 5000 photos, a baseline accuracy of 65.08%.
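
The baseline can also be built and scored with pandas. This is a minimal sketch that rebuilds the `actual` column from the value counts above instead of reading the CSV; the names `most_common` and `baseline` are illustrative, not part of the dataset.

```python
import pandas as pd

# Rebuild the `actual` column from the value counts shown above
actual = pd.Series(["dog"] * 3254 + ["cat"] * 1746, name="actual")

# The baseline always predicts the most common class
most_common = actual.mode()[0]
baseline = pd.Series(most_common, index=actual.index)

# Accuracy is the share of rows where the prediction matches the actual label
baseline_accuracy = (baseline == actual).mean()
print(f"Baseline predicts '{most_common}' with {baseline_accuracy:.2%} accuracy")
```

On the real frame, the same idea could be written as `paws['baseline'] = paws.actual.value_counts().idxmax()`, followed by the same comparison against `paws.actual`.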

### A. In terms of accuracy, how do the various models compare to the baseline model? Are any of the models better than the baseline?

In [38]:
# Model_1
a_v_m1 = pd.crosstab(paws.model1, paws.actual)
a_v_m1

actual,cat,dog
model1,Unnamed: 1_level_1,Unnamed: 2_level_1
cat,1423,640
dog,323,2614


In [34]:
# In terms of accuracy, how do the various models compare to the baseline model? Are any
# of the models better than the baseline?
# Model1 (cat is the positive class)
TP = 1423
TN = 2614
FP = 640
FN = 323

accuracy = (TP + TN)/(TP + TN + FP + FN) * 100
# DF calculation: share of rows where the prediction matches the actual label
model1_accuracy = (paws.model1 == paws.actual).mean() * 100
print(f"The model is {round(accuracy,2)}% accurate.")
print(f"The model is {round(model1_accuracy,2)}% accurate using the DF calculation.")

The model is 80.74% accurate.
The model is 80.74% accurate using the DF calculation.


In [37]:
# Model2
a_v_m2 = pd.crosstab(paws.model2, paws.actual)
a_v_m2

actual,cat,dog
model2,Unnamed: 1_level_1,Unnamed: 2_level_1
cat,1555,1657
dog,191,1597


In [40]:
# Model2 (cat is the positive class)
TP = 1555
TN = 1597
FP = 1657
FN = 191

accuracy = (TP + TN)/(TP + TN + FP + FN) * 100
# DF calculation: share of rows where the prediction matches the actual label
model2_accuracy = (paws.model2 == paws.actual).mean() * 100
print(f"The model is {round(accuracy,2)}% accurate.")
print(f"The model is {round(model2_accuracy,2)}% accurate using the DF calculation.")

The model is 63.04% accurate.
The model is 63.04% accurate using the DF calculation.


In [39]:
#Model3
a_v_m3 = pd.crosstab(paws.model3, paws.actual)
a_v_m3

actual,cat,dog
model3,Unnamed: 1_level_1,Unnamed: 2_level_1
cat,893,1599
dog,853,1655


In [41]:
# Model3 (cat is the positive class)
TP = 893
TN = 1655
FP = 1599
FN = 853

accuracy = (TP + TN)/(TP + TN + FP + FN) * 100
# DF calculation: share of rows where the prediction matches the actual label
model3_accuracy = (paws.model3 == paws.actual).mean() * 100
print(f"The model is {round(accuracy,2)}% accurate.")
print(f"The model is {round(model3_accuracy,2)}% accurate using the DF calculation.")

The model is 50.96% accurate.
The model is 50.96% accurate using the DF calculation.


In [42]:
# Model4
a_v_m4 = pd.crosstab(paws.model4, paws.actual)
a_v_m4

actual,cat,dog
model4,Unnamed: 1_level_1,Unnamed: 2_level_1
cat,603,144
dog,1143,3110


In [43]:
# Model4 (cat is the positive class)
TP = 603
TN = 3110
FP = 144
FN = 1143

accuracy = (TP + TN)/(TP + TN + FP + FN) * 100
# DF calculation: share of rows where the prediction matches the actual label
model4_accuracy = (paws.model4 == paws.actual).mean() * 100
print(f"The model is {round(accuracy,2)}% accurate.")
print(f"The model is {round(model4_accuracy,2)}% accurate using the DF calculation.")

The model is 74.26% accurate.
The model is 74.26% accurate using the DF calculation.


In [None]:
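# Based on the accuracy calculation, model 1 is the most accurate with 80.74%.
# The baseline (always predicting dog) is correct for 3254 of 5000 photos, or 65.08%.
# Models 1 and 4 (74.26%) beat the baseline; models 2 and 3 do not.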
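
The comparison against the baseline can be summarized in one place. A minimal sketch, assuming the correct-prediction counts are taken from the diagonal of each crosstab above and the baseline count from the `dog` value count:

```python
# Correct predictions per model out of 5000 photos
total = 5000
correct = {
    "baseline": 3254,        # always predicting dog
    "model1": 1423 + 2614,   # correct cats + correct dogs
    "model2": 1555 + 1597,
    "model3": 893 + 1655,
    "model4": 603 + 3110,
}

accuracy = {name: n / total for name, n in correct.items()}
beats_baseline = [
    name for name, acc in accuracy.items() if acc > accuracy["baseline"]
]

for name, acc in accuracy.items():
    print(f"{name}: {acc:.2%}")
print(f"Better than baseline: {beats_baseline}")
```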
