Now that you have two new regression methods at your fingertips, it's time to give them a spin. In fact, for this challenge, let's put them together! Pick a dataset of your choice with a binary outcome and the potential for at least 15 features. If you're drawing a blank, the crime rates in 2013 dataset has a lot of variables that could be made into a modelable binary outcome.

Engineer your features, then create three models. Each model will be run on a training set and a test set (or multiple test sets, if you take a folds approach). The models should be:

Vanilla logistic regression
Ridge logistic regression
Lasso logistic regression

If you're stuck on how to begin combining your two new modeling skills, here's a hint: the scikit-learn LogisticRegression class has a "penalty" argument that accepts 'l1' or 'l2' as a value.
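As a rough sketch of that hint, the three models could be set up along the lines below. The synthetic data from `make_classification`, the `_demo` variable names, and the `C` values are assumptions for illustration only, not part of the challenge. `C` is the inverse of the regularization strength, so a very large `C` approximates an unpenalized fit, and `solver='liblinear'` is used because it supports both penalties.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data (an assumption; substitute your own features and outcome)
X_demo, y_demo = make_classification(
    n_samples=500, n_features=15, n_informative=5, random_state=0
)

# Illustrative settings: a large C means an almost negligible penalty
models = {
    'vanilla': LogisticRegression(penalty='l2', C=1e5, solver='liblinear'),
    'ridge': LogisticRegression(penalty='l2', C=0.1, solver='liblinear'),
    'lasso': LogisticRegression(penalty='l1', C=0.1, solver='liblinear'),
}

for name, model in models.items():
    model.fit(X_demo, y_demo)
    print(name, 'training accuracy:', round(model.score(X_demo, y_demo), 4))
```

The only differences between the three entries are the penalty type and the value of `C`.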

In your report, evaluate all three models and decide on your best. Be clear about the decisions you made that led to these models (feature selection, regularization parameter selection, model evaluation criteria) and why you think that particular model is the best of the three. Also reflect on the strengths and limitations of regression as a modeling approach. Were there things you couldn't do but you wish you could have done?
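One possible way to approach the regularization parameter selection mentioned above is to compare cross-validated scores over a small grid of `C` values. This is only a sketch: the synthetic data, the candidate grid, and the choice of five folds are all assumptions, and accuracy may not be the right criterion for every dataset.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data (an assumption; substitute your own features and outcome)
X_demo, y_demo = make_classification(
    n_samples=500, n_features=15, n_informative=5, random_state=0
)

# Hypothetical grid of inverse regularization strengths
candidate_C = [0.01, 0.1, 1, 10, 100]
cv_means = {}

for C in candidate_C:
    model = LogisticRegression(penalty='l1', C=C, solver='liblinear')
    # 5-fold cross-validation; the default scoring for a classifier is accuracy
    scores = cross_val_score(model, X_demo, y_demo, cv=5)
    cv_means[C] = scores.mean()
    print('C =', C, '| mean CV accuracy:', round(cv_means[C], 4))

best_C = max(cv_means, key=cv_means.get)
print('Best C by mean CV accuracy:', best_C)
```

The same loop works for the ridge model by switching to `penalty='l2'`.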

In [25]:
import pandas as pd
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score
from sklearn.model_selection import cross_val_score
from sklearn.metrics import mean_squared_error as mse
%matplotlib inline

In [26]:
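# Load the data and separate the features from the outcome column
distress = pd.read_csv('Financial Distress.csv')
x = distress.drop('Financial Distress', axis=1)
# Binarize the outcome: values above -0.50 become 0, everything else becomes 1
distress['Financial Distress'] = np.where(distress['Financial Distress'] > -0.50, 0, 1)
y = distress['Financial Distress']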
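Since `.score()` on a classifier reports accuracy, it can help to check how the binarized outcome splits between the two classes before modeling. The sketch below applies the same `np.where` rule to a handful of made-up values (they are not taken from 'Financial Distress.csv') and computes the accuracy of always predicting the most common class, which is a useful floor to compare models against.

```python
import numpy as np
import pandas as pd

# Toy continuous scores (made-up values for illustration only)
scores = pd.Series([1.2, 0.3, -0.7, 0.9, -1.5, 0.1, 2.0, -0.2, 0.6, 0.4])

# Same rule as above: values above -0.50 become 0, the rest become 1
labels = pd.Series(np.where(scores > -0.50, 0, 1))

# Share of each class in the binarized outcome
class_share = labels.value_counts(normalize=True)
print(class_share)

# Accuracy obtained by always predicting the most common class
baseline_accuracy = class_share.max()
print('Majority-class baseline accuracy:', baseline_accuracy)
```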

# Vanilla Logistic Regression

In [39]:
from sklearn.linear_model import LogisticRegression

#initiate our model
logr = LogisticRegression(C=1e5)

#fit the model
fit_log = logr.fit(x,y)

# Display results
# For a classifier, .score() returns mean accuracy, not R-squared
print('Accuracy:',round(logr.score(x, y),4))

# Hold out 10% of the rows for testing, then refit on the training rows only
X_train , X_test, y_train, y_test = train_test_split(x, y, test_size = 0.1, random_state=1)
trained = logr.fit(X_train, y_train)
print('\nTraining Score:', round(trained.score(X_train, y_train),4))
print('Testing Score:', round(trained.score(X_test, y_test),4))

# Predict on the full dataset; with 0/1 labels, MSE is the misclassification rate
y_pred = logr.predict(x)
print('\nMSE:', round(mse(y,y_pred),4))
print('rMSE:',round(mse(y,y_pred)**.5,4))



Accuracy: 0.963

Training Score: 0.9646
Testing Score: 0.9484

MSE: 0.037
rMSE: 0.1925
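For 0/1 labels, the squared error of a prediction is 1 when it is wrong and 0 when it is right, so the MSE is the misclassification rate, that is, one minus the accuracy. That matches the output above (0.963 + 0.037 = 1). A small check on made-up labels (the arrays below are illustrative, not from the dataset):

```python
import numpy as np
from sklearn.metrics import accuracy_score, mean_squared_error

# Made-up true labels and predictions; two of the eight predictions are wrong
y_true = np.array([0, 0, 1, 1, 0, 1, 0, 0])
y_hat = np.array([0, 1, 1, 0, 0, 1, 0, 0])

acc = accuracy_score(y_true, y_hat)
err = mean_squared_error(y_true, y_hat)

print('accuracy:', acc)
print('MSE:', err)
print('accuracy + MSE:', acc + err)
```

In other words, the MSE and rMSE lines carry no information beyond what the accuracy already reports.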




# Logistic Ridge Regression

In [41]:
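#initiate our model
# penalty='l2' is already the default, so with the same C this matches the vanilla model
logreg = LogisticRegression(penalty='l2',C=1e5)

#fit the model
logreg = logreg.fit(x,y)

# Display results
# For a classifier, .score() returns mean accuracy, not R-squared
print('Accuracy:',round(logreg.score(x, y),4))

# Hold out 10% of the rows for testing, then refit on the training rows only
X_train , X_test, y_train, y_test = train_test_split(x, y, test_size = 0.1, random_state=1)
trained = logreg.fit(X_train, y_train)
print('\nTraining Score:', round(trained.score(X_train, y_train),4))
print('Testing Score:', round(trained.score(X_test, y_test),4))

# Predict on the full dataset; with 0/1 labels, MSE is the misclassification rate
y_pred = logreg.predict(x)
print('\nMSE:', round(mse(y,y_pred),4))
print('rMSE:',round(mse(y,y_pred)**.5,4))



Accuracy: 0.963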

Training Score: 0.9646
Testing Score: 0.9484

MSE: 0.037
rMSE: 0.1925




# Logistic Lasso Regression

In [42]:
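#initiate our model
# liblinear supports the L1 penalty; the default solver in recent scikit-learn versions does not
loglass = LogisticRegression(penalty='l1',C=1e5,solver='liblinear')

#fit the model
loglass = loglass.fit(x,y)

# Display results
# For a classifier, .score() returns mean accuracy, not R-squared
print('Accuracy:',round(loglass.score(x, y),4))

# Hold out 10% of the rows for testing, then refit on the training rows only
X_train , X_test, y_train, y_test = train_test_split(x, y, test_size = 0.1, random_state=1)
trained = loglass.fit(X_train, y_train)
print('\nTraining Score:', round(trained.score(X_train, y_train),4))
print('Testing Score:', round(trained.score(X_test, y_test),4))

# Predict on the full dataset; with 0/1 labels, MSE is the misclassification rate
y_pred = loglass.predict(x)
print('\nMSE:', round(mse(y,y_pred),4))
print('rMSE:',round(mse(y,y_pred)**.5,4))



Accuracy: 0.9687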

Training Score: 0.97
Testing Score: 0.9429

MSE: 0.0327
rMSE: 0.1808




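There doesn't seem to be any significant difference between these methods for this dataset. What gives? Part of the answer is in the settings: LogisticRegression uses penalty='l2' by default, so the vanilla and ridge models above are the same model, which is why their outputs match exactly. C is also the inverse of the regularization strength, so C=1e5 applies almost no penalty in any of the three fits.

Logistic lasso regression edges out the other two on the full-data and training scores (0.9687 and 0.97, versus 0.963 and 0.9646), but its testing score is slightly lower (0.9429 versus 0.9484).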
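One way to probe how much the penalty matters is to vary `C` and watch how many lasso coefficients stay nonzero. The sketch below does this on synthetic data; the data, the three `C` values, and the standardization step are assumptions for illustration, so the numbers it prints say nothing about the financial distress dataset. Features are standardized first because the penalty acts on coefficient size, which depends on feature scale.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (an assumption; substitute your own features and outcome)
X_demo, y_demo = make_classification(
    n_samples=500, n_features=15, n_informative=5, random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.1, random_state=1
)

# Fit the scaler on the training rows only, then apply it to both splits
scaler = StandardScaler().fit(X_tr)
X_tr_s, X_te_s = scaler.transform(X_tr), scaler.transform(X_te)

nonzero_counts = {}
for C in [1e5, 1, 0.01]:
    lasso = LogisticRegression(penalty='l1', C=C, solver='liblinear')
    lasso.fit(X_tr_s, y_tr)
    # Count the coefficients the L1 penalty has not shrunk to exactly zero
    nonzero_counts[C] = int(np.sum(lasso.coef_ != 0))
    print('C =', C,
          '| nonzero coefficients:', nonzero_counts[C],
          '| test accuracy:', round(lasso.score(X_te_s, y_te), 4))
```

Smaller values of `C` mean a stronger penalty, which is where the lasso and ridge models would be expected to start diverging from the vanilla fit.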