### Evaluation Exercises:
    start: Wednesday, July 6th, 2022

----

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from env import user, host, password, get_connection

### Exercise #2:

Given the following confusion matrix, evaluate (by hand) the model's performance.

|               | pred dog   | pred cat   |
|:------------  |-----------:|-----------:|
| actual dog    |         46 |         7  |
| actual cat    |         13 |         34 |


In the context of this problem, what is a false positive? (Here "dog", the more frequent class in the table, is treated as the negative class, so "cat" is the positive class.)

0 = dog

1 = cat

**False Positive: predicted cat == actual dog** (7 in the table)

* True Positive = predicted cat == actual cat (correct; 34)
* True Negative = predicted dog == actual dog (correct; 46)
* False Positive = predicted cat == actual dog (incorrect; 7)
* False Negative = predicted dog == actual cat (incorrect; 13)

In the context of this problem, what is a false negative?

**False Negative: predicted dog == actual cat** (13 in the table)

How would you describe this model?

(Ambiguous question; need to clarify with Ravinder.) Potential answer: describe it by its accuracy, i.e. how often the predicted outcome matches the actual outcome across all observations. Here that is (46 + 34) / 100 = 0.80, so the model is correct 80% of the time.
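To double-check the by-hand numbers, here is a minimal Python sketch that hard-codes the four cells of the confusion matrix above and, as in the breakdown, treats "cat" as the positive class. The variable names are illustrative, not from the original notebook.

```python
# Cells of the Exercise 2 confusion matrix, with "cat" as the positive class
tp = 34  # predicted cat, actual cat
tn = 46  # predicted dog, actual dog
fp = 7   # predicted cat, actual dog
fn = 13  # predicted dog, actual cat

total = tp + tn + fp + fn      # 100 observations in the table

accuracy = (tp + tn) / total   # share of all predictions that are correct
precision = tp / (tp + fp)     # of the photos predicted "cat", the share that are cats
recall = tp / (tp + fn)        # of the actual cats, the share the model found

print(f"accuracy:  {accuracy:.3f}")
print(f"precision: {precision:.3f}")
print(f"recall:    {recall:.3f}")
```

With these counts the model is right 80% of the time overall, but it finds only 34 of the 47 actual cats (recall of about 0.723), so accuracy alone does not fully describe it.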


----
### Exercise #3: 

You are working as a data scientist for Codeup Cody Creator (C3 for short), a rubber-duck manufacturing plant.

Unfortunately, some of the rubber ducks that are produced will have defects. Your team has built several models that try to predict those defects, and the data from their predictions can be found here.

Use the predictions dataset and pandas to help answer the following questions:

An internal team wants to investigate the cause of the manufacturing defects. They tell you that they want to identify as many of the ducks that have a defect as possible. Which evaluation metric would be appropriate here? Which model would be the best fit for this use case?

----

**goal:** "Identify as many item defects as possible."

* False Positive: predicted defect == actual no defect / outcome: potential revenue loss from unnecessary item replacements triggered by inaccurate defect predictions
* False Negative: predicted no defect == actual defect / outcome: trust and credibility risk to the company; customers may be reluctant to shop with us again and may tell their friends and family.

**Since one of the company's goals is to identify as many defective ducks as possible (i.e., to minimize false negative predictions), we want to evaluate the models on recall.**

<u>**breakdown:**</u>

0 = "no defect" (negative)

1 = "defect" (positive)

* True Positive: predicted defect == **defect**
* True Negative: predicted no defect == **no defect**
* False Positive: predicted defect == *no defect*
* False Negative: predicted no defect == *defect*
-----
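In terms of the breakdown above, recall is the share of actual defects that a model flags, so false negatives are exactly what pull it down. As a worked example, the model 3 crosstab further down has 13 true positives and 3 false negatives:

$$
\text{recall} = \frac{TP}{TP + FN} = \frac{13}{13 + 3} = 0.8125
$$

The baseline never predicts "Defect", so it has TP = 0 and a recall of 0 despite its high accuracy.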

In [2]:
# Exercise 3A: evaluate the models on RECALL

In [3]:
duck_df = pd.read_csv("c3.csv")
duck_df.head()

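|   | actual    | model1    | model2 | model3    |
|--:|:----------|:----------|:-------|:----------|
| 0 | No Defect | No Defect | Defect | No Defect |
| 1 | No Defect | No Defect | Defect | Defect    |
| 2 | No Defect | No Defect | Defect | No Defect |
| 3 | No Defect | Defect    | Defect | Defect    |
| 4 | No Defect | No Defect | Defect | No Defect |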


In [4]:
# checking the value frequency

duck_df.actual.value_counts()

No Defect    184
Defect        16
Name: actual, dtype: int64

In [5]:
# baseline: always predict the most frequent class ("No Defect")
duck_df["baseline_prediction"] = "No Defect"
duck_df.head()

|   | actual    | model1    | model2 | model3    | baseline_prediction |
|--:|:----------|:----------|:-------|:----------|:--------------------|
| 0 | No Defect | No Defect | Defect | No Defect | No Defect           |
| 1 | No Defect | No Defect | Defect | Defect    | No Defect           |
| 2 | No Defect | No Defect | Defect | No Defect | No Defect           |
| 3 | No Defect | Defect    | Defect | Defect    | No Defect           |
| 4 | No Defect | No Defect | Defect | No Defect | No Defect           |


In [6]:
duck_df.shape

(200, 5)

In [22]:
# initial model accuracy

print(f"Baseline accuracy: {(duck_df.baseline_prediction == duck_df.actual).mean()}")
print(f"Model 01 accuracy: {(duck_df.model1 == duck_df.actual).mean()}")
print(f"Model 02 accuracy: {(duck_df.model2 == duck_df.actual).mean()}")
print(f"Model 03 accuracy: {(duck_df.model3 == duck_df.actual).mean()}")

Baseline accuracy: 0.92
Model 01 accuracy: 0.95
Model 02 accuracy: 0.56
Model 03 accuracy: 0.555
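The baseline's 0.92 accuracy comes purely from class imbalance: always predicting the majority class scores exactly that class's share of the data. A minimal pandas sketch, assuming we rebuild the `actual` column from the value counts shown earlier (184 "No Defect", 16 "Defect") instead of reading `c3.csv`:

```python
import pandas as pd

# Rebuild the label column from the value counts shown earlier (row order does not matter here)
actual = pd.Series(["No Defect"] * 184 + ["Defect"] * 16, name="actual")

# Majority-class baseline: always predict the most frequent label
majority_label = actual.value_counts().idxmax()
baseline_accuracy = (actual == majority_label).mean()

# Share of the majority class in the data
majority_share = actual.value_counts(normalize=True).max()

print(majority_label, baseline_accuracy, majority_share)
```

Both numbers equal 184 / 200 = 0.92, matching the printed baseline accuracy, even though this baseline catches none of the 16 defects.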


In [7]:
# comparing across baseline_predictions

pd.crosstab(duck_df.baseline_prediction, duck_df.actual, margins=True)

# 16 total False Negatives

| baseline_prediction \ actual | Defect | No Defect | All |
|:-----------------------------|-------:|----------:|----:|
| No Defect                    |     16 |       184 | 200 |
| All                          |     16 |       184 | 200 |


In [8]:
# comparing across model1

pd.crosstab(duck_df.model1, duck_df.actual, margins=True)

# 8 total False Negatives

| model1 \ actual | Defect | No Defect | All |
|:----------------|-------:|----------:|----:|
| Defect          |      8 |         2 |  10 |
| No Defect       |      8 |       182 | 190 |
| All             |     16 |       184 | 200 |


In [9]:
# comparing across model2

pd.crosstab(duck_df.model2, duck_df.actual, margins=True)

# 7 total False Negatives

| model2 \ actual | Defect | No Defect | All |
|:----------------|-------:|----------:|----:|
| Defect          |      9 |        81 |  90 |
| No Defect       |      7 |       103 | 110 |
| All             |     16 |       184 | 200 |


In [10]:
# comparing across model3

pd.crosstab(duck_df.model3, duck_df.actual, margins=True)

# 3 total False Negatives

| model3 \ actual | Defect | No Defect | All |
|:----------------|-------:|----------:|----:|
| Defect          |     13 |        86 |  99 |
| No Defect       |      3 |        98 | 101 |
| All             |     16 |       184 | 200 |
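Recall can also be read straight off a crosstab like the ones above, without subsetting: divide the "Defect"/"Defect" cell by the column total of actual defects in the "All" row. A sketch, assuming we rebuild model3's predictions from its crosstab counts (13 TP, 86 FP, 3 FN, 98 TN) instead of loading `c3.csv`; `recall_from_crosstab` is a hypothetical helper name:

```python
import pandas as pd

def recall_from_crosstab(predicted: pd.Series, actual: pd.Series, positive: str) -> float:
    """Recall = TP / (TP + FN), read from a crosstab with margins (hypothetical helper)."""
    ct = pd.crosstab(predicted, actual, margins=True)
    if positive not in ct.index:
        # The model never predicts the positive class (e.g. the baseline), so TP = 0
        return 0.0
    true_positives = ct.loc[positive, positive]  # predicted positive and actually positive
    actual_positives = ct.loc["All", positive]   # column total: TP + FN
    return true_positives / actual_positives

# Rebuild model3's predictions from the crosstab counts above (13 TP, 86 FP, 3 FN, 98 TN)
actual = pd.Series(["Defect"] * 13 + ["No Defect"] * 86 + ["Defect"] * 3 + ["No Defect"] * 98, name="actual")
model3 = pd.Series(["Defect"] * 13 + ["Defect"] * 86 + ["No Defect"] * 3 + ["No Defect"] * 98, name="model3")

print(recall_from_crosstab(model3, actual, "Defect"))
```

This gives 13 / 16 = 0.8125 for model3, the same value the subset approach below produces.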


In [12]:
# RECALL - only looking at data where actual = positives (e.g., "Defect")

subset = duck_df[duck_df["actual"] == "Defect"]
subset.head()

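|    | actual | model1    | model2    | model3 | baseline_prediction |
|---:|:-------|:----------|:----------|:-------|:--------------------|
| 13 | Defect | No Defect | Defect    | Defect | No Defect           |
| 30 | Defect | Defect    | No Defect | Defect | No Defect           |
| 65 | Defect | Defect    | Defect    | Defect | No Defect           |
| 70 | Defect | Defect    | Defect    | Defect | No Defect           |
| 74 | Defect | No Defect | No Defect | Defect | No Defect           |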


In [13]:
# within the actual-"Defect" subset, "accuracy" is recall: TP / (TP + FN)
print(f"Baseline recall: {(subset.baseline_prediction == subset.actual).mean()}")
print(f"Model 01 recall: {(subset.model1 == subset.actual).mean()}")
print(f"Model 02 recall: {(subset.model2 == subset.actual).mean()}")
print(f"Model 03 recall: {(subset.model3 == subset.actual).mean()}")

Baseline recall: 0.0
Model 01 recall: 0.5
Model 02 recall: 0.5625
Model 03 recall: 0.8125


**Conclusion:** based on our model predictions and the actual results, Model 3 has the fewest false negatives (3) and the most true positives (13 of the 16 actual defects), giving it the highest recall (0.8125). This makes Model 3 the most effective model for the goal of identifying as many defective ducks as possible.


----


**3B. Recently, several stories in the local news have come out highlighting customers who received a rubber duck with a defect, and portraying C3 in a bad light.**

The PR team has decided to launch a program that gives customers with a defective duck a vacation to Hawaii.

They need you to predict which ducks will have defects, **but tell you they really don't want to accidentally give out a vacation package when the duck really doesn't have a defect.** Which evaluation metric would be appropriate here? Which model would be the best fit for this use case?


In [None]:
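# In this scenario we are focusing on "True Positives" and "False Positives",
# i.e. PRECISION = TP / (TP + FP): of the ducks predicted "Defect", how many really are defective?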
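Precision, TP / (TP + FP), is the metric that penalizes handing out a vacation for a duck without a defect. A minimal sketch that plugs in the "Defect"-row counts from the crosstabs above; the baseline is left out because it never predicts "Defect", so its precision is undefined.

```python
# (true positives, false positives) from the "Defect" row of each crosstab above
defect_row_counts = {
    "model1": (8, 2),
    "model2": (9, 81),
    "model3": (13, 86),
}

# Precision: of the ducks predicted "Defect", the share that really are defective
precision = {
    model: tp / (tp + fp)
    for model, (tp, fp) in defect_row_counts.items()
}

# Print models from highest to lowest precision
for model, value in sorted(precision.items(), key=lambda item: item[1], reverse=True):
    print(f"{model}: {value:.3f}")
```

On these counts model1 has by far the highest precision (8 of its 10 "Defect" predictions are correct), which makes it look like the better fit for the vacation program, at the cost of the lowest recall of the three models (0.5).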


### Exercise #4:

You are working as a data scientist for Gives You Paws ™, a subscription-based service that shows you cute pictures of dogs or cats (or both for an additional fee).

At Gives You Paws, anyone can upload pictures of their cats or dogs. The photos are then put through a two-step process.

**Phase I:**

First, an automated algorithm tags pictures as either a cat or a dog (Phase I).



**Phase II:**

Next, the photos that have been initially identified are put through another round of review, possibly with some human oversight, before being presented to the users (Phase II).

Several models have already been developed with the data, and you can find their results here.

----
Given this dataset, use pandas to create a baseline model **(i.e. a model that just predicts the most common class)** and answer the following questions:

In terms of accuracy, how do the various models compare to the baseline model? 

Are any of the models better than the baseline?

Suppose you are working on a team that solely deals with dog pictures. Which of these models would you recommend for Phase I? For Phase II?

Suppose you are working on a team that solely deals with cat pictures. Which of these models would you recommend for Phase I? For Phase II?

In [14]:
# bringing in the dataset

gives_paws = pd.read_csv("gives_you_paws.csv")
gives_paws.head()

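|   | actual | model1 | model2 | model3 | model4 |
|--:|:-------|:-------|:-------|:-------|:-------|
| 0 | cat    | cat    | dog    | cat    | dog    |
| 1 | dog    | dog    | cat    | cat    | dog    |
| 2 | dog    | cat    | cat    | cat    | dog    |
| 3 | dog    | dog    | dog    | cat    | dog    |
| 4 | cat    | cat    | cat    | dog    | dog    |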


In [15]:
# let's check the frequency of most common class

gives_paws.actual.value_counts()

# "dog" is the most frequently uploaded class

dog    3254
cat    1746
Name: actual, dtype: int64

In [16]:
# baseline: always predict the most frequent class in the actual results ("dog")

gives_paws["baseline_prediction"] = "dog"
gives_paws.head()

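|   | actual | model1 | model2 | model3 | model4 | baseline_prediction |
|--:|:-------|:-------|:-------|:-------|:-------|:--------------------|
| 0 | cat    | cat    | dog    | cat    | dog    | dog                 |
| 1 | dog    | dog    | cat    | cat    | dog    | dog                 |
| 2 | dog    | cat    | cat    | cat    | dog    | dog                 |
| 3 | dog    | dog    | dog    | cat    | dog    | dog                 |
| 4 | cat    | cat    | cat    | dog    | dog    | dog                 |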


**<u>breakdown:</u>**

0 = dog (negative)

1 = cat (positive)

- True Positive: predict cat == actual cat
- True Negative: predict dog == actual dog
- False Positive: predict cat == actual dog
- False Negative: predict dog == actual cat

In [17]:
# initial model accuracy

print(f"Baseline accuracy: {(gives_paws.baseline_prediction == gives_paws.actual).mean()}")
print(f"Model 01 accuracy: {(gives_paws.model1 == gives_paws.actual).mean()}")
print(f"Model 02 accuracy: {(gives_paws.model2 == gives_paws.actual).mean()}")
print(f"Model 03 accuracy: {(gives_paws.model3 == gives_paws.actual).mean()}")

Baseline accuracy: 0.6508
Model 01 accuracy: 0.8074
Model 02 accuracy: 0.6304
Model 03 accuracy: 0.5096
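The accuracy cell above prints only model1 through model3, although the dataset also has a `model4` column. Looping over every prediction column avoids skipping one, and `value_counts().idxmax()` picks the baseline class instead of hard-coding "dog". A sketch, assuming the column names shown by `gives_paws.head()`; it runs here on the five rows displayed above, so its numbers are illustrative only and are not the full-dataset results.

```python
import pandas as pd

# The five rows shown by gives_paws.head() above (illustration only)
df = pd.DataFrame({
    "actual": ["cat", "dog", "dog", "dog", "cat"],
    "model1": ["cat", "dog", "cat", "dog", "cat"],
    "model2": ["dog", "cat", "cat", "dog", "cat"],
    "model3": ["cat", "cat", "cat", "cat", "dog"],
    "model4": ["dog", "dog", "dog", "dog", "dog"],
})

# Baseline: always predict the most common class in `actual`
df["baseline_prediction"] = df.actual.value_counts().idxmax()

# Accuracy for every prediction column, so no model (e.g. model4) is left out
prediction_cols = [col for col in df.columns if col != "actual"]
accuracy = {col: (df[col] == df.actual).mean() for col in prediction_cols}

for col, value in accuracy.items():
    print(f"{col} accuracy: {value:.3f}")
```

On the full `gives_paws` DataFrame the same two lines (baseline plus the dictionary comprehension) would report model4 alongside the others.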


In [18]:
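# Suppose you are working on a team that solely deals with dog pictures.
# Which of these models would you recommend for Phase I? For Phase II?

# Phase I: recall for "dog" (let as few dog photos as possible slip through the first pass)
# Phase II: precision for "dog"? (make sure the photos shown to users really are dogs)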
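For the dog team, "dog" is the positive class, which flips the cat-positive breakdown above. A minimal sketch of per-model recall (Phase I) and precision (Phase II) with the positive class as a parameter; `class_metrics` is a hypothetical helper, and it is demonstrated on the five rows shown by `gives_paws.head()`, so the numbers are illustrative only.

```python
import pandas as pd

def class_metrics(df: pd.DataFrame, positive: str, prediction_cols: list[str]) -> pd.DataFrame:
    """Recall and precision of each prediction column for one positive class (hypothetical helper)."""
    rows = {}
    actual_pos = df["actual"] == positive
    for col in prediction_cols:
        predicted_pos = df[col] == positive
        tp = (predicted_pos & actual_pos).sum()
        rows[col] = {
            # Phase I: of the actual positives, how many did the model catch?
            "recall": tp / actual_pos.sum(),
            # Phase II: of the predicted positives, how many are right? (NaN if none predicted)
            "precision": tp / predicted_pos.sum() if predicted_pos.sum() else float("nan"),
        }
    # One row per model, columns "recall" and "precision"
    return pd.DataFrame(rows).T

# The five rows shown by gives_paws.head() above (illustration only)
sample = pd.DataFrame({
    "actual": ["cat", "dog", "dog", "dog", "cat"],
    "model1": ["cat", "dog", "cat", "dog", "cat"],
    "model2": ["dog", "cat", "cat", "dog", "cat"],
    "model3": ["cat", "cat", "cat", "cat", "dog"],
    "model4": ["dog", "dog", "dog", "dog", "dog"],
})

models = ["model1", "model2", "model3", "model4"]
print(class_metrics(sample, "dog", models))  # dog team
print(class_metrics(sample, "cat", models))  # cat team
```

On this tiny sample, model4 predicts "dog" for every row shown, so it reaches perfect dog recall while its dog precision is just the share of dogs in the sample; the real comparison needs the full `gives_paws` DataFrame.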

In [19]:
# RECALL for the "cat" class: keep only rows where actual == "cat" (the positive class)
subset = gives_paws[gives_paws["actual"] == "cat"]
subset.head()

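|    | actual | model1 | model2 | model3 | model4 | baseline_prediction |
|---:|:-------|:-------|:-------|:-------|:-------|:--------------------|
|  0 | cat    | cat    | dog    | cat    | dog    | dog                 |
|  4 | cat    | cat    | cat    | dog    | dog    | dog                 |
|  6 | cat    | cat    | cat    | cat    | dog    | dog                 |
|  7 | cat    | dog    | cat    | cat    | dog    | dog                 |
| 11 | cat    | cat    | dog    | cat    | cat    | dog                 |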


In [20]:
# within the actual-"cat" subset, "accuracy" is recall for the "cat" class
print(f"Baseline recall: {(subset.baseline_prediction == subset.actual).mean().round(3)}")
print(f"Model 01 recall: {(subset.model1 == subset.actual).mean().round(3)}")
print(f"Model 02 recall: {(subset.model2 == subset.actual).mean().round(3)}")
print(f"Model 03 recall: {(subset.model3 == subset.actual).mean().round(3)}")

Baseline recall: 0.0
Model 01 recall: 0.815
Model 02 recall: 0.891
Model 03 recall: 0.511


**Conclusion:**