### Logistic Regression Example 4.1
To use sklearn's cross-validation tools, we switch from statsmodels to sklearn.

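The logistic regression model for the **Default** data set will now be evaluated with $k$-fold cross-validation. We use the **cross\_val\_score()** function from **sklearn.model\_selection**, which fits the model $k$ times, each time holding out one fold, and returns the classification accuracy on each held-out fold. One minus the mean accuracy is the cross-validation estimate of the test error rate. We choose $k = 5$ and use the downsampled (balanced) version of the training data.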
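What `cross_val_score(model, X, y, cv=5)` does can be sketched as an explicit loop. This is a minimal sketch, not the library's internals: it relies on sklearn's documented behaviour that an integer `cv` with a classifier means `StratifiedKFold(n_splits=k)` without shuffling, and it uses hypothetical synthetic data (`X`, `y`) in place of **Default.csv** so it runs on its own.

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Hypothetical synthetic stand-in for the downsampled data:
# 333 "No" (0) followed by 333 "Yes" (1), one standardized predictor.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 333)
X = (rng.normal(size=y.size) + 1.5 * y).reshape(-1, 1)

model = LogisticRegression()
k = 5

# For a classifier and an integer cv, cross_val_score splits the data
# with StratifiedKFold(n_splits=k), without shuffling.
manual_scores = []
for train_idx, test_idx in StratifiedKFold(n_splits=k).split(X, y):
    fold_model = clone(model)                   # fresh, unfitted copy per fold
    fold_model.fit(X[train_idx], y[train_idx])  # fit on the other k-1 folds
    # .score() is classification accuracy on the held-out fold
    manual_scores.append(fold_model.score(X[test_idx], y[test_idx]))
manual_scores = np.array(manual_scores)

auto_scores = cross_val_score(model, X, y, cv=k)
print(manual_scores)
print(np.allclose(manual_scores, auto_scores))  # same folds, same model
```

Because the fold assignment and the solver are both deterministic here, the explicit loop and `cross_val_score` should produce the same per-fold accuracies.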

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Load data
df = pd.read_csv('./data/Default.csv', sep=';')

# Add a numerical (0/1) column for default
df = df.join(pd.get_dummies(df['default'],
                            prefix='default',
                            drop_first=True,
                            dtype=int))

# Set random seed for reproducibility
np.random.seed(1)
# Index of Yes:
i_yes = df.loc[df['default_Yes'] == 1, :].index

# Random set of No, the same size as the Yes set (333 observations):
i_no = df.loc[df['default_Yes'] == 0, :].index
i_no = np.random.choice(i_no, replace=False, size=333)

# Build the balanced, downsampled data set (select by index label)
i_ds = np.concatenate((i_no, i_yes))
x_ds = df.loc[i_ds, ['balance']]
y_ds = df.loc[i_ds, 'default_Yes']

model = LogisticRegression()

# Calculate 5-fold cross-validation scores (accuracy on each held-out fold):
scores = cross_val_score(model, x_ds, y_ds, cv=5)
print(scores)
print(np.mean(scores))
```

```
[0.93283582 0.84962406 0.85714286 0.90225564 0.87218045]
0.8828077656828638
```
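Since `cross_val_score` returns accuracies for a classifier by default, the estimated error rate follows by subtraction: a mean accuracy of about 0.883 corresponds to an estimated test error rate of about 0.117. A minimal sketch of that arithmetic, with the five fold accuracies copied by hand from the printed output rather than recomputed from the data:

```python
import numpy as np

# Fold accuracies copied from the printed output above
scores = np.array([0.93283582, 0.84962406, 0.85714286, 0.90225564, 0.87218045])

accuracy = scores.mean()    # CV estimate of test accuracy
error_rate = 1 - accuracy   # CV estimate of test error rate
spread = scores.std()       # fold-to-fold variability of the accuracy
print(f"accuracy = {accuracy:.4f}, error rate = {error_rate:.4f}, std = {spread:.4f}")
```

The spread across folds (roughly 0.03 here) is a reminder that each single fold, with only about 133 observations, gives a noisy accuracy; averaging over the $k = 5$ folds smooths this out.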
