## Cost Benefit Questions
1. How would you rephrase the business problem if your model were optimising toward _precision_? That is, how might the model behave differently, and what effect would that have?
2. How would you rephrase the business problem if your model were optimising toward _recall_?
3. What would the ideal model look like in this case?

Answers: ?
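
To make the trade-off behind questions 1 and 2 concrete, the sketch below scores a handful of made-up labels and probabilities at several thresholds. The data and the helper `precision_recall_at` are hypothetical and are not taken from the flight data set; treat this as an illustration, not as the answer.

```python
import numpy as np
from sklearn import metrics

# Hypothetical labels (1 = delayed) and predicted probabilities, for illustration only.
y_true = np.array([0, 0, 0, 0, 1, 0, 1, 1, 0, 1])
y_prob = np.array([0.05, 0.10, 0.20, 0.30, 0.35, 0.45, 0.55, 0.60, 0.70, 0.90])


def precision_recall_at(threshold):
    """Return (precision, recall) when rows with probability >= threshold are flagged."""
    y_pred = (y_prob >= threshold).astype(int)
    precision = metrics.precision_score(y_true, y_pred, zero_division=0)
    recall = metrics.recall_score(y_true, y_pred)
    return precision, recall


# A higher threshold flags fewer flights as delayed.
for threshold in (0.3, 0.5, 0.8):
    p, r = precision_recall_at(threshold)
    print(f"threshold={threshold:.1f}  precision={p:.2f}  recall={r:.2f}")
```

On this toy data, raising the threshold increases precision and lowers recall, which is the tension the two questions ask you to phrase in business terms.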

### Visualising models over variables

In [None]:
import pandas as pd
import sklearn.linear_model as lm

%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
df = pd.read_csv("../../../../data/flight_delays.csv")
df = df.loc[df.DEP_DEL15.notnull()].copy()

In [None]:
df.head()

In [None]:
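# Rows with a missing DEP_DEL15 were already dropped when the data was loaded.
# One-hot encode the carrier and the day of week.
df = df.join(pd.get_dummies(df["CARRIER"], prefix = "carrier"))
df = df.join(pd.get_dummies(df["DAY_OF_WEEK"], prefix = "dow"))
model = lm.LogisticRegression()
# Start with the day-of-week dummy columns.
features = [i for i in df.columns if "dow_" in i]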

In [None]:
df.shape

In [None]:
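features += ["CRS_DEP_TIME"]
# features[1:] skips the first day-of-week dummy, which then acts as the reference level.
model.fit(df[features[1:]], df["DEP_DEL15"])

# Column 1 of predict_proba is the probability of the positive class (delayed).
df["probability"] = model.predict_proba(df[features[1:]]).T[1]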

In [None]:
ax = plt.subplot(111)
# One colour per day-of-week dummy column.
colors = ["blue", "green", "red", "purple", "orange", "brown", "pink"]
dow_features = [f for f in features if "dow_" in f]
for f, c in zip(dow_features, colors):
    df[df[f] == 1].plot(x = "CRS_DEP_TIME",
                        y = "probability",
                        kind = "scatter",
                        color = c,
                        label = f,
                        ax = ax)
ax.set(title = "Probability of Delay\n Based on Day of Week and Time of Day")
plt.show()

### Other Answers: Visualising Airline performance over time; Visualising the inverse

In [None]:
features = [i for i in df.columns if "carrier_" in i]
features += ["CRS_DEP_TIME"]
# <Code Here>

In [None]:
# <code Here>
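
One possible way to approach the carrier version of the plot is sketched below on synthetic data. The carriers, the delay pattern, and the names `demo`, `carrier_cols` and `carrier_model` are assumptions made for illustration; with the real data you would use `df` and its `carrier_` columns instead.

```python
import numpy as np
import pandas as pd
import sklearn.linear_model as lm
import matplotlib.pyplot as plt

# Synthetic stand-in for the flight data; carriers and delay rates are made up.
rng = np.random.default_rng(0)
n = 600
demo = pd.DataFrame({
    "CARRIER": rng.choice(["AA", "DL", "UA"], size=n),
    "CRS_DEP_TIME": rng.integers(500, 2300, size=n),
})
# Assumed pattern: later scheduled departures are more likely to be delayed.
demo["DEP_DEL15"] = (rng.random(n) < demo["CRS_DEP_TIME"] / 4000).astype(int)

# One-hot encode the carrier, as the notebook does for the real data.
dummies = pd.get_dummies(demo["CARRIER"], prefix="carrier").astype(int)
demo = demo.join(dummies)
carrier_cols = list(dummies.columns)

# Skip the first dummy so it acts as the reference level.
model_features = carrier_cols[1:] + ["CRS_DEP_TIME"]
carrier_model = lm.LogisticRegression(max_iter=1000)
carrier_model.fit(demo[model_features], demo["DEP_DEL15"])
demo["probability"] = carrier_model.predict_proba(demo[model_features])[:, 1]

# One scatter series per carrier, including the reference carrier.
fig, ax = plt.subplots()
for col in carrier_cols:
    subset = demo[demo[col] == 1]
    ax.scatter(subset["CRS_DEP_TIME"], subset["probability"], label=col, s=10)
ax.set(title="Probability of Delay\n Based on Carrier and Time of Day",
       xlabel="CRS_DEP_TIME",
       ylabel="probability")
ax.legend()
plt.show()
```

For the inverse view (probability of departing on time), plotting `1 - probability` on the y-axis would be one option.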

## Visualising Performance Against Baseline

### Visualising AUC and comparing Models

In [None]:
from sklearn import dummy, metrics

In [None]:
# Baseline: a dummy classifier that ignores the feature values.
model0 = dummy.DummyClassifier()
model0.fit(df[features[1:]], df["DEP_DEL15"])
df["probability_0"] = model0.predict_proba(df[features[1:]]).T[1]

# Logistic regression on the same features.
model1 = lm.LogisticRegression()
model1.fit(df[features[1:]], df["DEP_DEL15"])
df["probability_1"] = model1.predict_proba(df[features[1:]]).T[1]

In [None]:
df.shape

In [None]:
ax = plt.subplot(111)
# roc_curve returns (false positive rates, true positive rates, thresholds).
fpr, tpr, _ = metrics.roc_curve(df.DEP_DEL15, df.probability_0)
ax.plot(fpr, tpr, label = "Dummy baseline")
fpr, tpr, _ = metrics.roc_curve(df.DEP_DEL15, df.probability_1)
ax.plot(fpr, tpr, label = "Logistic regression")

ax.set(title = "ROC Curves for Predicting delayed = 1",
       xlabel = "FPR",
       ylabel = "TPR",
       xlim = (0, 1),
       ylim = (0, 1))
ax.legend()
plt.show()
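
The plot can be summarised with a single number per model by computing the area under each ROC curve. The sketch below does this with `metrics.roc_auc_score` on synthetic data; the arrays `X` and `y` and the names `baseline` and `logistic` are illustrative assumptions, and `strategy="prior"` is set explicitly because the default strategy of `DummyClassifier` has differed between scikit-learn versions.

```python
import numpy as np
import sklearn.linear_model as lm
from sklearn import dummy, metrics

# Synthetic data: the first feature carries signal, the second is noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
y = (X[:, 0] + rng.normal(scale=1.0, size=500) > 0).astype(int)

# The baseline predicts the class prior for every row.
baseline = dummy.DummyClassifier(strategy="prior").fit(X, y)
logistic = lm.LogisticRegression().fit(X, y)

auc_baseline = metrics.roc_auc_score(y, baseline.predict_proba(X)[:, 1])
auc_logistic = metrics.roc_auc_score(y, logistic.predict_proba(X)[:, 1])

print(f"baseline AUC: {auc_baseline:.2f}")
print(f"logistic AUC: {auc_logistic:.2f}")
```

A constant-score baseline gives an AUC of 0.5, so any useful model should land clearly above that. Note that both models are scored here on the data they were trained on, as in the cells above; a held-out set would give a fairer comparison.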

### Visualising Precision / Recall (with cleaner code)

In [None]:
# <Code Here>

In [None]:
# <Code Here>
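
One way to fill in these cells is sketched below using `metrics.precision_recall_curve`. It runs on synthetic data so that it stands alone; `X`, `y` and `pr_model` are hypothetical names, and with the flight data you would pass `df.DEP_DEL15` and the fitted model's probabilities instead.

```python
import numpy as np
import matplotlib.pyplot as plt
import sklearn.linear_model as lm
from sklearn import metrics

# Synthetic, mildly imbalanced data for illustration only.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 2))
y = (X[:, 0] + rng.normal(size=500) > 0.5).astype(int)

pr_model = lm.LogisticRegression().fit(X, y)
scores = pr_model.predict_proba(X)[:, 1]

# precision and recall each have one more entry than thresholds.
precision, recall, thresholds = metrics.precision_recall_curve(y, scores)
average_precision = metrics.average_precision_score(y, scores)

fig, ax = plt.subplots()
ax.plot(recall, precision,
        label=f"Logistic regression (AP = {average_precision:.2f})")
# Flagging every row as positive gives precision equal to the positive rate.
ax.axhline(y.mean(), linestyle="--", color="grey", label="Positive rate")
ax.set(title="Precision / Recall for Predicting delayed = 1",
       xlabel="Recall",
       ylabel="Precision",
       xlim=(0, 1),
       ylim=(0, 1.05))
ax.legend()
plt.show()
```

Unpacking the returned tuple into named arrays, rather than indexing `vals[0]` and `vals[1]`, is what keeps the plotting code readable.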