**Run the following two cells before you begin.**

In [10]:
%autosave 10

Autosaving every 10 seconds


In [11]:
import pandas as pd
import numpy as np

______________________________________________________________________
**First, import your data set and define the sigmoid function.**
<details>
    <summary>Hint:</summary>
    The definition of the sigmoid is $f(X) = \frac{1}{1 + e^{-X}}$.
</details>

In [12]:
# Import the data set
df = pd.read_csv('cleaned_data.csv')

In [13]:
# Define the sigmoid function
sigmoid = lambda X : (1 / (1 + np.exp(-X)))
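
A quick sanity check can confirm that the sigmoid behaves as expected. The sketch below restates the same formula as a named function and evaluates it on three made-up inputs (the `values` array is an assumption for illustration, not part of the data set): the output at 0 should be 0.5, and the outputs at -x and x should sum to 1.

```python
import numpy as np

# Same formula as the lambda above, written as a named function
def sigmoid(X):
    return 1 / (1 + np.exp(-X))

# Made-up inputs for illustration only
values = np.array([-5.0, 0.0, 5.0])
probs = sigmoid(values)

# sigmoid(0) is 0.5, and sigmoid(-x) + sigmoid(x) equals 1
print(probs)
print(probs[0] + probs[2])
```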

**Now, create a train/test split (80/20) with `PAY_1` and `LIMIT_BAL` as features and `default payment next month` as the response variable. Use a random state of 24.**

In [14]:
# Create a train/test split
from sklearn.model_selection import train_test_split

X1_train, X1_test, X2_train, X2_test, y_train, y_test = train_test_split(
    df['PAY_1'].values.reshape(-1,1),
    df['LIMIT_BAL'].values.reshape(-1,1),
    df['default payment next month'].values,
    test_size=0.2, random_state=24)

X_train = np.block([X1_train, X2_train])
X_test = np.block([X1_test, X2_test])
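
The `np.block` calls above join two single-column arrays into one two-column feature matrix. A minimal sketch of that step, using made-up numbers in place of the real `PAY_1` and `LIMIT_BAL` columns (the arrays `pay_1` and `limit_bal` are assumptions for illustration):

```python
import numpy as np

# Made-up stand-ins for the PAY_1 and LIMIT_BAL columns (not real data)
pay_1 = np.array([2, -1, 0, 1]).reshape(-1, 1)
limit_bal = np.array([20000, 120000, 90000, 50000]).reshape(-1, 1)

# With a flat list, np.block joins the arrays side by side (along the last axis)
features_block = np.block([pay_1, limit_bal])

# np.hstack produces the same matrix for 2-D inputs
features_hstack = np.hstack([pay_1, limit_bal])

print(features_block.shape)
print(np.array_equal(features_block, features_hstack))
```

Each row of the result holds one sample, with the first feature in column 0 and the second in column 1, which is the layout scikit-learn estimators expect.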

______________________________________________________________________
**Next, import `LogisticRegression` and instantiate it with the default options, except set the solver to `'liblinear'`.**

In [15]:
from sklearn.linear_model import LogisticRegression
my_lr = LogisticRegression(solver='liblinear')

______________________________________________________________________
**Now, train on the training data and obtain predicted classes, as well as class probabilities, using the testing data.**

In [16]:
# Fit the logistic regression model on training data
my_lr.fit(X_train, y_train)

LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=100,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=None, solver='liblinear', tol=0.0001, verbose=0,
                   warm_start=False)

In [17]:
# Make predictions using `.predict()`
y_pred = my_lr.predict(X_test)

In [18]:
# Find class probabilities using `.predict_proba()`
y_pred_proba = my_lr.predict_proba(X_test)
y_pred_proba

array([[0.74826924, 0.25173076],
       [0.584297  , 0.415703  ],
       [0.79604453, 0.20395547],
       ...,
       [0.584297  , 0.415703  ],
       [0.82721498, 0.17278502],
       [0.66393435, 0.33606565]])

______________________________________________________________________
**Then, pull out the coefficients and intercept from the trained model and manually calculate predicted probabilities. You'll need to add a column of 1s to your features to multiply by the intercept.**

In [19]:
# Add column of 1s to features
ones = np.ones((len(X1_test),1))

In [20]:
# Get coefficients and intercepts from trained model
theta_1 = my_lr.coef_[0][0]
theta_2 = my_lr.coef_[0][1]
theta_0 = my_lr.intercept_[0]

In [21]:
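# Manually calculate predicted probabilities
# Linear combination: intercept + coefficient * feature, for each feature
linear_combination = ones*theta_0 + X1_test*theta_1 + X2_test*theta_2
# The sigmoid maps the linear combination to the positive-class probability
pos_proba_manual = sigmoid(linear_combination)
neg_proba_manual = 1 - pos_proba_manual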

np.block([neg_proba_manual, pos_proba_manual])

array([[0.74826924, 0.25173076],
       [0.584297  , 0.415703  ],
       [0.79604453, 0.20395547],
       ...,
       [0.584297  , 0.415703  ],
       [0.82721498, 0.17278502],
       [0.66393435, 0.33606565]])
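
The same calculation can be written as a single matrix product by prepending the column of 1s to the feature matrix. The sketch below is one way to do that, under stated assumptions: the `theta` values and the `features` rows are hypothetical numbers chosen for illustration, not the fitted model's parameters or the real test data.

```python
import numpy as np

def sigmoid(X):
    return 1 / (1 + np.exp(-X))

# Hypothetical parameters: [intercept, coefficient 1, coefficient 2]
theta = np.array([-0.5, 0.8, -0.00001])

# Hypothetical feature rows: [PAY_1-like value, LIMIT_BAL-like value]
features = np.array([[2.0, 20000.0],
                     [-1.0, 120000.0],
                     [0.0, 90000.0]])

# Prepend a column of 1s so the intercept is part of the matrix product
ones = np.ones((features.shape[0], 1))
design = np.hstack([ones, features])

# Positive-class probability for each row, then both classes side by side
pos_proba = sigmoid(design @ theta)
proba = np.column_stack([1 - pos_proba, pos_proba])

# Term-by-term version, for comparison with the matrix form
explicit = sigmoid(theta[0] + features[:, 0] * theta[1] + features[:, 1] * theta[2])
print(np.allclose(pos_proba, explicit))
```

The matrix form scales to any number of features without adding a new term for each one.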

______________________________________________________________________
**Next, using a threshold of `0.5`, manually calculate predicted classes. Compare this to the class predictions output by scikit-learn.**

In [22]:
# Manually calculate predicted classes
y_pred_manual = [1 if x>=0.5 else 0 for x in pos_proba_manual]

In [23]:
# Compare to scikit-learn's predicted classes
compare = y_pred == y_pred_manual

print('Matching predicted values = {} & Total predicted values = {}'
      .format(compare.sum(),len(y_pred)))

Matching predicted values = 5333 & Total predicted values = 5333
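
Thresholding can also be done without a Python loop. A minimal sketch, assuming a made-up `(n, 1)` array of positive-class probabilities shaped like `pos_proba_manual` (the numbers are for illustration only):

```python
import numpy as np

# Made-up positive-class probabilities, shaped (n, 1)
pos_proba = np.array([[0.25], [0.5], [0.73], [0.49]])

# Boolean comparison, flattened to 1-D and cast to 0/1 integers
pred_vectorized = (pos_proba >= 0.5).ravel().astype(int)

# The list-comprehension form, for comparison
pred_loop = [1 if x >= 0.5 else 0 for x in pos_proba]

print(pred_vectorized)
print(np.array_equal(pred_vectorized, pred_loop))
```

The vectorized result is a 1-D NumPy array, so it compares element by element with the output of `.predict()` without relying on a list-to-array conversion.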


______________________________________________________________________
**Finally, calculate ROC AUC using both scikit-learn's predicted probabilities and your manually calculated probabilities, and compare the two.**

In [24]:
# Use scikit-learn's predicted probabilities to calculate ROC AUC
from sklearn import metrics

roc_auc_pred_proba = metrics.roc_auc_score(y_test, y_pred_proba[:,1])
roc_auc_pred_proba

0.627207450280691

In [25]:
# Use manually calculated predicted probabilities to calculate ROC AUC
roc_auc_pred_proba_manual = metrics.roc_auc_score(y_test, pos_proba_manual)
roc_auc_pred_proba_manual

0.627207450280691
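
For intuition about what the score measures, ROC AUC equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as half. The sketch below computes it that way with NumPy alone. The helper name `pairwise_auc` and the four-sample example are assumptions for illustration; this is not how scikit-learn computes the metric internally, and the pairwise matrix grows with the number of positives times the number of negatives.

```python
import numpy as np

def pairwise_auc(y_true, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as 0.5."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores).ravel()
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    # Difference between every positive score and every negative score
    diff = pos[:, None] - neg[None, :]
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return wins / (len(pos) * len(neg))

# Made-up labels and scores for illustration only
y_true = np.array([0, 0, 1, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8])

auc = pairwise_auc(y_true, scores)
print(auc)
```

Because the score depends only on how samples are ranked, any two sets of probabilities that order the samples identically give the same ROC AUC.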