## Train/Test split

In [0]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.75, test_size=0.25, random_state=42)
X_train.shape, X_test.shape, y_train.shape, y_test.shape

## Train/Validate/Test split

In [0]:
train = homes[homes['Yr_Sold'] < 2009]
val = homes[homes['Yr_Sold'] == 2009]
test = homes[homes['Yr_Sold'] == 2010]

In [0]:
X_train = train[['Exter_Cond', 'Sale_Condition']]
X_test = test[['Exter_Cond', 'Sale_Condition']]
X_val = val[['Exter_Cond', 'Sale_Condition']]

y_train = train['SalePrice']
y_test = test['SalePrice']
y_val = val['SalePrice']

## Baselines

In [0]:
guess = y_train.mean()

## One-hot encoding
One-hot encoding turns nominal categorical data into features with numerical values without mathematically implying any ordinal relationship between the classes.

In [0]:
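import category_encoders as ce
encoder = ce.OneHotEncoder(use_cat_names=True)
X_train = encoder.fit_transform(X_train)  # learn the categories from the training set only
X_val = encoder.transform(X_val)          # reuse the same columns for validation and test
X_test = encoder.transform(X_test)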
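
To see what one-hot encoding produces, here is a minimal sketch on made-up values. It uses `pandas.get_dummies` instead of `category_encoders` (a swapped-in tool, chosen only because it needs no extra install), and the toy categories are assumptions, not values taken from the homes data.

```python
import pandas as pd

# Toy data: hypothetical category values, for illustration only
toy = pd.DataFrame({'Exter_Cond': ['Good', 'Fair', 'Good', 'Typical']})

# One indicator column per category; no ordering is implied between them
encoded = pd.get_dummies(toy, columns=['Exter_Cond'], dtype=int)
print(encoded)
```

Each row has exactly one 1 across the new columns, so no category counts as "larger" than another.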

## Linear regression
In statistics, linear regression is a linear approach to modeling the relationship between a scalar response (or dependent variable) and one or more explanatory variables (or independent variables). The case of one explanatory variable is called simple linear regression.

In [0]:
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

### Re-arrange X features matrices
features = ['Average Recent Growth in Personal Incomes',
            'US Military Fatalities per Million']
X_train = train[features]
X_test = test[features]
print(f'Linear Regression, dependent on: {features}')

### Fit the model
model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_train)
mae = mean_absolute_error(y_train, y_pred)
print(f'Train Error: {mae:.2f} percentage points')

### Apply the model to new data
y_pred = model.predict(X_test)
mae = mean_absolute_error(y_test, y_pred)
print(f'Test Error: {mae:.2f} percentage points')
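
As a sanity check on what fitting means, the following sketch fits a simple linear regression to synthetic data generated from a known line. The data and the slope and intercept values are made up for illustration; with no noise, the model should recover them almost exactly.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

# Synthetic data (made up): y = 3*x + 2 with no noise
X = np.arange(10, dtype=float).reshape(-1, 1)
y = 3 * X.ravel() + 2

model = LinearRegression()
model.fit(X, y)

slope, intercept = model.coef_[0], model.intercept_
mae = mean_absolute_error(y, model.predict(X))
print(f'slope={slope:.2f}, intercept={intercept:.2f}, MAE={mae:.2f}')
```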

## Ridge regression model
Ridge regression is a technique for fitting multiple regression models when the predictors suffer from multicollinearity.
By adding a penalty on the size of the coefficients, ridge regression introduces a small amount of bias into the estimates but reduces their variance. The aim is estimates that are more reliable on new data.

In [0]:
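from sklearn.linear_model import RidgeCV
# Candidate penalty strengths; these values are only an example grid
alphas = [0.01, 0.1, 1.0, 10.0, 100.0]
# normalize=True was removed in scikit-learn 1.2; scale features beforehand if needed
ridge = RidgeCV(alphas=alphas)
ridge.fit(anscombe[['x']], anscombe['y'])
ridge.alpha_  # the alpha selected by cross-validation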
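
The effect of the penalty can be sketched on synthetic data: as `alpha` grows, the fitted coefficient shrinks toward zero. The data, the true slope of 4, and the alpha values below are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic data (made up): true slope of 4 plus some noise
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 1))
y = 4 * X.ravel() + rng.normal(scale=0.5, size=50)

# Fit the same model with increasingly strong penalties
coefs = {}
for alpha in [0.1, 10.0, 1000.0]:
    coefs[alpha] = Ridge(alpha=alpha).fit(X, y).coef_[0]
    print(f'alpha={alpha:>7}: coefficient={coefs[alpha]:.3f}')
```

A larger `alpha` means more bias and less variance. `RidgeCV` chooses among the candidate alphas by cross-validation.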

## Logistic regression model
In statistics, the logistic model is used to model the probability of a certain class or event existing such as pass/fail, win/lose, alive/dead or healthy/sick. This can be extended to model several classes of events such as determining whether an image contains a cat, dog, lion, etc.

In [0]:
from sklearn.linear_model import LogisticRegression

log_reg = LogisticRegression(solver='lbfgs')
log_reg.fit(X_train_imputed, y_train)
print('Validation Accuracy', log_reg.score(X_val_imputed, y_val))

# The predictions look like this
log_reg.predict(X_val_imputed)

# What's the math?
#log_reg.coef_
print(features, log_reg.coef_)
#pd.Series(linear_reg.coef_, features)
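
To make "the math" concrete for the binary case, this sketch rebuilds the predicted probabilities by hand: a linear combination of the features passed through the sigmoid function. The data is synthetic and the variable names are illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic binary classification data (made up)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

clf = LogisticRegression(solver='lbfgs').fit(X, y)

# Linear part: one coefficient per feature, plus an intercept
z = X @ clf.coef_[0] + clf.intercept_[0]

# The sigmoid squashes the linear score into a probability between 0 and 1
manual = 1 / (1 + np.exp(-z))
proba = clf.predict_proba(X)[:, 1]

print(np.allclose(manual, proba))
```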

## MAE (Mean Absolute Error)
Mean Absolute Error (MAE) measures the average magnitude of the differences between two continuous variables, such as predicted and actual values. Individual errors can be positive or negative, so we take their absolute values before averaging; otherwise they would cancel out. Think of driving 5 miles to one place and 5 miles back (-5 miles): the net displacement is 0, but what you want is the total distance traveled.

In [0]:
from sklearn.metrics import mean_absolute_error
errors = y_train - guess  # baseline errors: actual price minus the mean guess
mae = errors.abs().mean()  # MAE; a separate name avoids overwriting the imported function
mae

In [0]:
print(f'If we guessed ${guess:,.2f} for every home,')
print(f'we would be off by ${mae:,.2f} on average.')
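
The car analogy can be worked through in a few lines. This is a minimal sketch using only NumPy; the two error values are the -5 and 5 miles from the analogy.

```python
import numpy as np

# Two errors of equal size and opposite sign
errors = np.array([-5.0, 5.0])

mean_error = errors.mean()                # signed errors cancel out
mean_abs_error = np.abs(errors).mean()    # absolute values do not

print(mean_error, mean_abs_error)
```

The signed mean is 0, which hides the errors entirely, while the MAE is 5.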

## R2
R-squared (R2) is a statistical measure that represents the proportion of the variance for a dependent variable that's explained by an independent variable or variables in a regression model.

In [0]:
train_r2s.append(train_r2)
test_r2s.append(test_r2)
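
As a sketch of where the number comes from, R2 can be computed by hand as one minus the ratio of the residual sum of squares to the total sum of squares, then compared with `sklearn.metrics.r2_score`. The values below are made up for illustration.

```python
import numpy as np
from sklearn.metrics import r2_score

# Toy values (made up)
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

ss_res = np.sum((y_true - y_pred) ** 2)          # variation left unexplained
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total variation around the mean
r2_manual = 1 - ss_res / ss_tot

# Always predicting the mean explains none of the variance
baseline_pred = np.full_like(y_true, y_true.mean())
r2_baseline = r2_score(y_true, baseline_pred)

print(r2_manual, r2_score(y_true, y_pred), r2_baseline)
```

A mean baseline scores 0, so R2 can be read as the improvement over always guessing the mean.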

## Simple Imputer

In [0]:
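from sklearn.impute import SimpleImputer
imputer = SimpleImputer(strategy='mean')
X_train_imputed = imputer.fit_transform(X_train_encoded)  # learn column means from train
X_val_imputed = imputer.transform(X_val_encoded)          # fill validation gaps with the train means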
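
The fit-on-train, transform-on-validation pattern can be checked on a toy example. In this sketch the arrays are made up; the point is that the missing validation value is filled with the mean learned from the training data, not the validation mean.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Toy single-column data with missing values (made up)
X_train_toy = np.array([[1.0], [np.nan], [3.0]])
X_val_toy = np.array([[np.nan], [10.0]])

imputer = SimpleImputer(strategy='mean')
train_filled = imputer.fit_transform(X_train_toy)  # learns the train mean: (1 + 3) / 2
val_filled = imputer.transform(X_val_toy)          # reuses the train mean for validation

print(train_filled.ravel(), val_filled.ravel())
```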