**_Section 9.0:_** Load packages

In [None]:
import pandas as pd
import numpy as np

# Statsmodels logistic regression is sm.Logit
import statsmodels.api as sm

import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, roc_auc_score, accuracy_score
from sklearn.model_selection import GridSearchCV, cross_val_score

%matplotlib inline

### _Section 9.1_
#### Guided Practice: Logit Function and Odds
```diff
+ In this section you will write logit and sigmoid functions from scratch (see the slides for the formulas) and use them to convert an example set of odds into probabilities. The second part plots the functions to illustrate their shape.
```

In [None]:
def logit_func(odds):
    # takes a float (odds) and returns the log odds (logit)
    return None

def sigmoid_func(logit):
    # takes a float (logit) and returns the probability
    return None

odds_set = [
    5./1,
    20./1,
    1.1/1,
    1.8/1,
    1.6/1
]
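
As one possible reference for the exercise above, here is a minimal sketch of the two functions. It assumes the slide formulas are the standard ones, logit = ln(odds) and sigmoid(x) = 1 / (1 + e^(-x)). The names `logit_from_odds`, `probability_from_logit`, and `example_odds` are illustrative and not part of the notebook.

```python
import math

def logit_from_odds(odds):
    # Log odds (logit) of a positive odds value.
    return math.log(odds)

def probability_from_logit(logit):
    # Inverse logit (sigmoid): maps any real number into (0, 1).
    return 1.0 / (1.0 + math.exp(-logit))

# Same values as odds_set in the cell above, repeated so the sketch runs alone.
example_odds = [5.0, 20.0, 1.1, 1.8, 1.6]

for odds in example_odds:
    logit = logit_from_odds(odds)
    probability = probability_from_logit(logit)
    # Chaining the two functions is equivalent to odds / (1 + odds).
    print(odds, round(logit, 4), round(probability, 4))
```

Odds of 1 correspond to a logit of 0 and a probability of 0.5, which is why the sigmoid plot crosses the grey reference lines at that point.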

#### Plotting functions

In [None]:
# plot sigmoid function (after defining it above)
plt.axhline(0.5, color='grey')
plt.axvline(0, color='grey')
xs = np.linspace(-6, 6, 121)  # symmetric about 0 and dense enough for a smooth curve
plt.plot(xs, [sigmoid_func(x) for x in xs])
plt.xlabel('x')
plt.ylabel('probability')
plt.title('inverse logit/sigmoidal function')

plt.show()

### _Section 9.2_
#### Implement logistic regression with college admissions data
```diff
+ In this section you will fit an existing sklearn model, LogisticRegression, and then examine its output.
```

In [None]:
# Read in the data
df = pd.read_csv('./dataset/collegeadmissions.csv')

df.head()

In [None]:
df = df.join(pd.get_dummies(df['rank']))

df.head()

In [None]:
lm = LogisticRegression()
X = df[['gre', 'gpa', 1, 2, 3,]]
y = df['admit']

lm.fit(X, y)

predictions = lm.predict_proba(X)[:,1]

print(lm.coef_)
print(lm.intercept_)
print(df.admit.mean())
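
When examining `lm.coef_` and `lm.intercept_`, it can help to remember that the coefficients live on the log-odds scale. The sketch below shows how to read them as odds ratios and how they combine into a predicted probability. The intercept and coefficient values are hypothetical placeholders chosen for illustration, not values fitted from the admissions data, and the rank dummies are left out to keep it short.

```python
import math

# Hypothetical values for illustration only (not fitted from collegeadmissions.csv).
intercept = -3.0
coefficients = {"gre": 0.002, "gpa": 0.8}

# exp(coefficient) is the multiplicative change in the odds of admission
# for a one-unit increase in that feature, holding the others fixed.
odds_ratios = {name: math.exp(beta) for name, beta in coefficients.items()}

def predicted_probability(features):
    # Linear combination on the log-odds scale, then the sigmoid.
    log_odds = intercept + sum(coefficients[name] * value
                               for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-log_odds))

applicant = {"gre": 600, "gpa": 3.5}  # hypothetical applicant
print(odds_ratios)
print(predicted_probability(applicant))
```

Comparing such probabilities with `df.admit.mean()` gives a quick sanity check: a model that always predicted the base rate would be the baseline to beat.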

### _Section 9.3_
#### Classification metrics: Confusion matrices, etc.
```diff
+ In this section you will compute classification metrics and present the results as an ROC curve, a common format that will be useful when working through the Titanic problem.
```

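The ROC curve below is traced out by sweeping the classification threshold. At the bottom left, a threshold so high that almost nothing is predicted positive gives a false positive rate (x-axis) near 0 and a true positive rate (y-axis) near 0 as well. At the top right, a threshold so low that almost everything is predicted positive pushes both rates toward 1.

This chart will help you compare models and determine where the decision threshold should sit for the data.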
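
To make the threshold sweep concrete, here is a small sketch that computes one (FPR, TPR) point per threshold by hand. The labels and scores are made-up toy values, and `roc_point` is an illustrative helper rather than an sklearn function.

```python
# Toy labels and predicted scores (made up for illustration).
toy_labels = [0, 0, 1, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8, 0.65, 0.9]

def roc_point(labels, scores, threshold):
    # Scores at or above the threshold count as positive predictions.
    true_pos = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    false_pos = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    positives = sum(labels)
    negatives = len(labels) - positives
    return false_pos / negatives, true_pos / positives

# Sweep from a very strict threshold to a very permissive one.
for threshold in [1.01, 0.7, 0.5, 0.3, 0.0]:
    fpr_value, tpr_value = roc_point(toy_labels, toy_scores, threshold)
    print(threshold, round(fpr_value, 3), round(tpr_value, 3))
```

The strictest threshold lands at (0, 0) and the most permissive at (1, 1); the points in between are what the plotted curve connects.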

In [None]:
fpr, tpr, _ = roc_curve(y, predictions)

threshold = 0.5
predicted_classes = (predictions > threshold).astype(int)

print('accuracy: ' + str(accuracy_score(y, predicted_classes)))

plt.plot(fpr, tpr)
plt.xlabel('false positive rate')
plt.ylabel('true positive rate')
plt.title('varied thresholds')

plt.show()
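
The accuracy printed above is one summary of the confusion matrix at a single threshold. As a rough sketch of what sits behind it, the following counts the four cells by hand on made-up toy data; `confusion_counts` is an illustrative helper, not an sklearn function.

```python
def confusion_counts(actual, predicted):
    # Returns (true positives, false positives, true negatives, false negatives).
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            tp += 1
        elif a == 0 and p == 1:
            fp += 1
        elif a == 0 and p == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

# Toy labels and probabilities (made up for illustration).
toy_actual = [1, 0, 1, 1, 0, 0, 1, 0]
toy_probs = [0.9, 0.2, 0.4, 0.7, 0.6, 0.1, 0.8, 0.3]
toy_threshold = 0.5
toy_predicted = [int(p > toy_threshold) for p in toy_probs]

tp, fp, tn, fn = confusion_counts(toy_actual, toy_predicted)
toy_accuracy = (tp + tn) / len(toy_actual)
toy_tpr = tp / (tp + fn)  # true positive rate (recall)
toy_fpr = fp / (fp + tn)  # false positive rate
print(tp, fp, tn, fn, toy_accuracy, toy_tpr, toy_fpr)
```

Changing `toy_threshold` moves examples between cells, which is exactly how each point on the ROC curve is generated.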

Finally, you can use the `roc_auc_score` function to calculate the area under the ROC curve (AUC).

In [None]:
# AUC depends on how examples are ranked, so pass predicted probabilities rather than hard class labels
roc_auc_score(y, predictions)

In [None]:
# Returns (fpr, tpr, thresholds); probabilities give one point per distinct score
roc_curve(y, predictions)
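
One way to build intuition for AUC: it equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as half. The sketch below computes that directly on made-up toy data; `auc_by_pairs` is an illustrative helper, not the sklearn implementation. It also shows that thresholded 0/1 labels generally give a different value than the underlying scores, because thresholding discards ranking information.

```python
def auc_by_pairs(labels, scores):
    # Fraction of (positive, negative) pairs ranked correctly; ties count as 0.5.
    pos_scores = [s for y, s in zip(labels, scores) if y == 1]
    neg_scores = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Toy labels and scores (made up for illustration).
pair_labels = [0, 0, 0, 1, 1, 1]
pair_scores = [0.1, 0.3, 0.45, 0.4, 0.6, 0.9]
pair_hard = [int(s > 0.5) for s in pair_scores]  # thresholded class labels

auc_from_scores = auc_by_pairs(pair_labels, pair_scores)
auc_from_hard = auc_by_pairs(pair_labels, pair_hard)
print(round(auc_from_scores, 3), round(auc_from_hard, 3))
```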

### _Section 9.4_
#### Independent Practice: Evaluating Logistic Regression with Alternative Metrics
#### "A _Titanic_ Problem"
```diff
+ This section provides a more in-depth opportunity to apply classification metrics to a richer data set. The emphasis is on comparing alternative metrics and selecting the most appropriate one.
```

**Goals**

1. Spend a few minutes determining which data would be most important to use in the prediction problem. You may need to create new features from the data available. Consider using a feature selection aid in sklearn. At a minimum, identify one or two strong features that would be useful to include in the model.
2. Spend 1-2 minutes considering which _metric_ makes the most sense to optimize. Accuracy? FPR or TPR? AUC? Given the business problem (understanding survival rate aboard the Titanic), why should you use this metric?
3. Build a tuned logistic regression model. Be prepared to explain your design (including regularization), metric, and feature set in predicting survival, using whatever tools are necessary (such as a fit chart).
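
For goal 3, one possible starting point is a cross-validated grid search over the regularization strength `C`, scored by whichever metric you chose in goal 2. The sketch below uses `roc_auc` as an example metric and a synthetic data set from `make_classification` as a stand-in, since the Titanic features are not prepared here. The grid values and the `_demo` names are assumptions for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for a prepared feature matrix and survival labels (assumption).
X_demo, y_demo = make_classification(n_samples=300, n_features=5, random_state=0)

# Smaller C means stronger regularization in sklearn's LogisticRegression.
param_grid_demo = {"C": [0.01, 0.1, 1.0, 10.0]}

search_demo = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid_demo,
    scoring="roc_auc",  # swap in the metric you decided to optimize
    cv=5,
)
search_demo.fit(X_demo, y_demo)

print(search_demo.best_params_)
print(round(search_demo.best_score_, 3))
```

Swapping `scoring` for another metric (for example `"accuracy"` or `"recall"`) and comparing the selected `C` is a quick way to see how the choice of metric shapes the tuned model.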