**Run the following two cells before you begin.**

In [1]:
%autosave 10

Autosaving every 10 seconds


In [2]:
import pandas as pd
import numpy as np

______________________________________________________________________
**First, import your data set and define the sigmoid function.**
<details>
    <summary>Hint:</summary>
    The definition of the sigmoid is $f(X) = \frac{1}{1 + e^{-X}}$.
</details>

In [3]:
# Import the data set
df = pd.read_csv('cleaned_data.csv')

In [4]:
# Define the sigmoid function
def sigmoid(X):
    Y = 1 / (1 + np.exp(-X))
    return Y
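A quick sanity check can confirm that the function behaves like a sigmoid. The sketch below redefines `sigmoid` so it runs on its own; the input values in `z` are hypothetical and not taken from the data set.

```python
import numpy as np

def sigmoid(X):
    """Logistic sigmoid, same definition as in the cell above."""
    return 1 / (1 + np.exp(-X))

# Hypothetical check values (not from cleaned_data.csv)
z = np.array([-5.0, 0.0, 5.0])
p = sigmoid(z)

# The sigmoid is increasing, equals 0.5 at zero,
# and satisfies sigmoid(-z) == 1 - sigmoid(z)
print(p)
is_symmetric = np.allclose(sigmoid(-z), 1 - sigmoid(z))
print(is_symmetric)
```

If the midpoint or the symmetry check fails, the sign in the exponent is the first thing to inspect.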

**Now, create a train/test split (80/20) with `PAY_1` and `LIMIT_BAL` as features and `default payment next month` as the response variable. Use a random state of 24.**

In [5]:
# Create a train/test split
from sklearn.model_selection import train_test_split

X = df[['PAY_1', 'LIMIT_BAL']]
y = df['default payment next month']
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=24)

______________________________________________________________________
**Next, import `LogisticRegression` and instantiate it with the default options, except set the solver to `'liblinear'`.**

In [6]:
from sklearn.linear_model import LogisticRegression
my_lr = LogisticRegression(solver = 'liblinear')

______________________________________________________________________
**Now, train on the training data and obtain predicted classes, as well as class probabilities, using the testing data.**

In [7]:
# Fit the logistic regression model on training data
model = my_lr.fit(X_train, y_train)

In [8]:
# Make predictions using `.predict()`
y_pred = model.predict(X_test)

In [9]:
# Find class probabilities using `.predict_proba()`
y_pred_proba = model.predict_proba(X_test)
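It may help to see how the output of `.predict_proba()` is laid out before using it later. The sketch below uses synthetic data (the names `X_demo`, `y_demo`, and `demo_lr` are hypothetical, not part of the exercise) to show that the result has one column per class, ordered as in `classes_`, and that each row sums to 1.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic two-feature data, for illustration only
rng = np.random.default_rng(24)
X_demo = rng.normal(size=(200, 2))
y_demo = (X_demo[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)

demo_lr = LogisticRegression(solver='liblinear').fit(X_demo, y_demo)
demo_proba = demo_lr.predict_proba(X_demo)

print(demo_lr.classes_)   # column order of predict_proba
print(demo_proba.shape)   # (n_samples, n_classes)

# Each row holds P(class 0) and P(class 1), so rows sum to 1
rows_sum_to_one = np.allclose(demo_proba.sum(axis=1), 1.0)
print(rows_sum_to_one)
```

For a binary problem with labels 0 and 1, column index 1 therefore holds the probability of the positive class.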

______________________________________________________________________
**Then, pull out the coefficients and intercept from the trained model and manually calculate predicted probabilities. You'll need to add a column of 1s to your features, to multiply by the intercept.**

In [10]:
# Add column of 1s to features (copy first so the assignment does not
# trigger pandas' SettingWithCopyWarning on the slice from the split)
X_test = X_test.copy()
X_test['Intercept'] = 1
X_test.head()

       PAY_1  LIMIT_BAL  Intercept
14306      2     160000          1
2978       1      50000          1
16641     -1     200000          1
18580      3     200000          1
131        1      50000          1


In [11]:
# Get coefficients and intercepts from trained model
coef1 = model.coef_[0][0]
coef2 = model.coef_[0][1]
intercept = model.intercept_[0]  # intercept_ is a length-1 array; take the scalar

In [12]:
# Manually calculate predicted probabilities
X_manual = intercept * X_test['Intercept'] + X_test['PAY_1'] * coef1 + X_test['LIMIT_BAL'] * coef2
manual_predict = sigmoid(X_manual)
manual_predict[0:5]

14306    0.251731
2978     0.415703
16641    0.203955
18580    0.203955
131      0.415703
dtype: float64
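The term-by-term sum in the cell above is equivalent to a single matrix product once the column of 1s is in place. Here is a minimal sketch of that equivalence; the coefficients, intercept, and feature values are hypothetical stand-ins, not the fitted values from this notebook.

```python
import numpy as np

def sigmoid(X):
    return 1 / (1 + np.exp(-X))

# Hypothetical parameters and features, for illustration only
coefs = np.array([0.8, -0.002])
intercept = -0.5
features = np.array([[2.0, 160.0],
                     [1.0, 50.0],
                     [-1.0, 200.0]])

# Term-by-term linear combination, as in the cell above
z_terms = intercept * 1 + features[:, 0] * coefs[0] + features[:, 1] * coefs[1]

# Same calculation as one matrix product: prepend a column of 1s so the
# intercept is treated like any other coefficient
design = np.column_stack([np.ones(len(features)), features])
theta = np.concatenate([[intercept], coefs])
z_matrix = design @ theta

proba = sigmoid(z_matrix)
print(np.allclose(z_terms, z_matrix))
print(proba)
```

The matrix form scales to any number of features without writing out each term.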

______________________________________________________________________
**Next, using a threshold of `0.5`, manually calculate predicted classes. Compare this to the class predictions output by scikit-learn.**

In [27]:
# Manually calculate predicted classes
y_pred_manual = manual_predict >= 0.5
y_pred_manual[0:5]

14306    False
2978     False
16641    False
18580    False
131      False
dtype: bool

In [14]:
# Check shapes: manual class predictions vs. scikit-learn's predicted probabilities
print(y_pred_manual.shape)
print(y_pred_proba.shape)

(5333,)
(5333, 2)
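Matching shapes do not show that the predictions themselves agree. One way to compare values directly is sketched below on synthetic data; `X_demo`, `y_demo`, and `demo_lr` are hypothetical names. On the notebook's own variables, the analogous check would presumably be `np.array_equal(y_pred_manual.values, y_pred)`.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic data, for illustration only
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(300, 2))
y_demo = (X_demo[:, 0] - X_demo[:, 1] + rng.normal(size=300) > 0).astype(int)

demo_lr = LogisticRegression(solver='liblinear').fit(X_demo, y_demo)

# Manual probabilities from the fitted intercept and coefficients
z = demo_lr.intercept_[0] + X_demo @ demo_lr.coef_[0]
manual_proba = 1 / (1 + np.exp(-z))
manual_classes = (manual_proba >= 0.5).astype(int)

# Compare against scikit-learn's own outputs
probas_match = np.allclose(manual_proba, demo_lr.predict_proba(X_demo)[:, 1])
classes_match = np.array_equal(manual_classes, demo_lr.predict(X_demo))
print(probas_match, classes_match)
```

`np.allclose` is used for the probabilities because the two calculations can differ by floating-point rounding, while the class labels are compared exactly.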


______________________________________________________________________
**Finally, calculate ROC AUC using both scikit-learn's predicted probabilities and your manually calculated probabilities, and compare the results.**

In [22]:
# Use scikit-learn's predicted probabilities to calculate ROC AUC
from sklearn.metrics import roc_auc_score
roc_auc_score(y_test, y_pred_proba[:,1])

0.627207450280691

In [26]:
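# Use manually calculated predicted probabilities to calculate ROC AUC.
# Pass the probabilities (manual_predict), not the thresholded classes:
# roc_auc_score(y_test, y_pred_manual) returns 0.5 here.
roc_auc_score(y_test, manual_predict)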
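ROC AUC depends on how the scores rank positives against negatives, so thresholded classes that are all identical carry no ranking information and yield 0.5. The sketch below illustrates this with a small pairwise implementation; `pairwise_auc` is a hypothetical helper written for this example (not a scikit-learn function), and the labels and scores are made up.

```python
import numpy as np

def pairwise_auc(y_true, scores):
    """ROC AUC as the fraction of (positive, negative) pairs in which the
    positive has the higher score, counting ties as one half."""
    y_true = np.asarray(y_true).astype(bool)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true]
    neg = scores[~y_true]
    diff = pos[:, None] - neg[None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size

# Made-up labels and probabilities, all below the 0.5 threshold
y_true = np.array([0, 0, 1, 0, 1, 1])
proba = np.array([0.10, 0.30, 0.35, 0.40, 0.45, 0.20])
classes = proba >= 0.5   # every prediction is False

auc_proba = pairwise_auc(y_true, proba)      # uses the ranking of the scores
auc_classes = pairwise_auc(y_true, classes)  # all ties, so 0.5
print(auc_proba, auc_classes)
```

For binary labels this pairwise definition should agree with `roc_auc_score`, which is why passing probabilities rather than classes matters when the two are compared.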

In [24]:
y_test.shape

(5333,)

In [25]:
y_pred_manual.shape

(5333,)