### Hi, welcome to my project! Today we will build four regression models (linear, ridge, lasso and elastic net), compute their error metrics for our case study, and choose the best one.

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

%matplotlib inline

### Let's read our CSV file, Ames_Housing_Sales:

In [None]:
data = pd.read_csv('../input/regression/Ames_Housing_Sales.csv', sep=',')

In [None]:
data.shape

In [None]:
data.dtypes.value_counts()

Let's find out how many unique values are in each object type column:

In [None]:
# np.object was removed in NumPy 1.24; the builtin object is the equivalent dtype
categorical = data.dtypes[data.dtypes == object]
categorical.index

In [None]:
n = 0
for x in categorical.index:
    n_unique = len(data[x].unique())  # distinct values in this column
    print(x, n_unique)
    n += n_unique
n

The total above, 258, is the sum of the unique values across all object columns, so one-hot encoding them should produce 258 features.
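
To make that counting argument concrete, here is a minimal sketch on a made-up frame. The column names and values are illustrative assumptions, not rows from the Ames file. It checks that get_dummies creates one column per unique value of each encoded column.

```python
import pandas as pd

# Hypothetical toy data, used only to illustrate the column count
toy = pd.DataFrame({
    "Street": ["Pave", "Grvl", "Pave", "Pave"],       # 2 unique values
    "LotShape": ["Reg", "IR1", "IR2", "Reg"],         # 3 unique values
    "LotArea": [8450.0, 9600.0, 11250.0, 9550.0],     # numeric, left as-is
})
cat_cols = ["Street", "LotShape"]

# Sum of unique values over the categorical columns: 2 + 3
expected_dummies = sum(toy[c].nunique() for c in cat_cols)

encoded = pd.get_dummies(toy, columns=cat_cols)

# One numeric column plus one dummy column per unique category
print(expected_dummies, encoded.shape[1])
```

Note that nunique() ignores missing values, and so does get_dummies by default, whereas unique() would count NaN as a value.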

# Encoding of categorical variables:

Let's create a list of the categorical columns and one-hot encode them. The pandas one-hot encoder (get_dummies) works well with data that is defined as categorical.

In [None]:
# np.object was removed in NumPy 1.24; use the builtin object instead
categorical_var = data.dtypes[data.dtypes == object]
categorical_var = categorical_var.index.tolist()  # list of categorical fields

In [None]:
categorical_var

Encoding using get_dummies function:

In [None]:
data=pd.get_dummies(data, columns=categorical_var)

In [None]:
data.shape

As we can see above, the number of features has increased considerably because of the encoding.

In [None]:
data.dtypes.value_counts()

This confirms that encoding the categorical variables produced 258 features, as predicted earlier.
Next, split the data into training and test sets:

In [None]:
from sklearn.model_selection import train_test_split

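train, test = train_test_split(data, test_size=0.3, random_state=42)
# Work on independent copies so the column assignments further down
# do not trigger pandas' SettingWithCopyWarning
train, test = train.copy(), test.copy()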
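
As a quick sanity check of what test_size=0.3 does, here is a small sketch on a made-up 10-row frame (the names toy, toy_train and toy_test are assumptions for illustration only). It shows the 70/30 row split and that no row ends up in both sets.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical 10-row frame, only to show the resulting sizes
toy = pd.DataFrame({"x": np.arange(10), "y": np.arange(10) * 2.0})

toy_train, toy_test = train_test_split(toy, test_size=0.3, random_state=42)

print(toy_train.shape, toy_test.shape)  # (7, 2) (3, 2)

# The two index sets do not overlap
print(set(toy_train.index).isdisjoint(toy_test.index))  # True
```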

Several columns hold skewed features, and a log transformation can be applied to reduce their skew. Note that our label "SalePrice" is among the float columns too; we keep it in the list only to inspect its skew, and then exclude it from the transformation.

In [None]:
# Create a list of float columns to check for skewing
# (np.float was removed in NumPy 1.24; the builtin float matches float64)
mask = data.dtypes == float
float_cols = data.columns[mask]
float_cols

In [None]:
len(float_cols)

Let's print the skew of each feature in the training dataset:

In [None]:
train[float_cols].skew().sort_values(ascending=False)

In [None]:
skew_limit = 0.75
skew_vals = train[float_cols].skew()

skew_cols = (skew_vals
             .sort_values(ascending=False)
             .to_frame()
             .rename(columns={0:'Skew'})
             .query('abs(Skew) > {0}'.format(skew_limit)))

skew_cols

#### Apply the log transformation to the features whose absolute skew is greater than 0.75, excluding the label:

In [None]:
for col in skew_cols.index.tolist():
    if col == "SalePrice":
        continue  # the label is left untransformed
    # log1p computes log(1 + x), so zero values map to zero
    train[col] = np.log1p(train[col])
    test[col]  = np.log1p(test[col])
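
To see why the log transformation helps, here is a minimal sketch on synthetic right-skewed data. The lognormal sample is an assumption chosen for illustration, not a column from our dataset. It compares the skew before and after np.log1p.

```python
import numpy as np
import pandas as pd

# Synthetic right-skewed sample (assumed lognormal, for illustration only)
rng = np.random.default_rng(0)
values = pd.Series(rng.lognormal(mean=0.0, sigma=1.0, size=1000))

skew_before = values.skew()
skew_after = np.log1p(values).skew()

# The transformed sample is much closer to symmetric
print(round(skew_before, 2), round(skew_after, 2))
```

The transformation does not guarantee a skew below our 0.75 limit; it only pulls in the long right tail.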

Define X_train, X_test, y_train and y_test:

In [None]:
feature_cols = [x for x in train.columns if x != 'SalePrice']
X_train = train[feature_cols]
y_train = train['SalePrice']

X_test = test[feature_cols]
y_test = test['SalePrice']

To obtain an error metric for each model, we declare a function "rmse" that takes the actual and predicted values and returns the root-mean-squared error, using sklearn's mean_squared_error.

In [None]:
from sklearn.metrics import mean_squared_error

def rmse(y_true, y_predicted):
    # Square root of the mean squared error, in the same units as the label
    return np.sqrt(mean_squared_error(y_true, y_predicted))
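
Here is a small worked example of the metric, with made-up prices so the arithmetic is easy to follow: every prediction is off by 10, so the mean squared error is 100 and the RMSE is 10. The arrays are illustrative assumptions.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

def rmse(y_true, y_predicted):
    return np.sqrt(mean_squared_error(y_true, y_predicted))

# Made-up values: each prediction misses by exactly 10
y_true = np.array([100.0, 200.0, 300.0, 400.0])
y_pred = np.array([110.0, 190.0, 310.0, 390.0])

# The same quantity computed by hand with NumPy
manual = np.sqrt(np.mean((y_true - y_pred) ** 2))

print(rmse(y_true, y_pred), manual)  # 10.0 10.0
```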

# Building our regression models: 
### We will start with a basic linear regression model and compute its root-mean-squared error.

In [None]:
from sklearn.linear_model import LinearRegression

linearr = LinearRegression().fit(X_train, y_train)
LinearRegression_rmse = rmse(y_test, linearr.predict(X_test))
print(LinearRegression_rmse)

Keep this RMSE in mind as a baseline: next we will build regularized models with tuned hyperparameters to try to reduce the error.

Plot the predicted vs. actual sale price for this model:

In [None]:
f = plt.figure(figsize=(6,6))
ax = plt.axes()

ax.plot(y_test, linearr.predict(X_test), marker='o', ls='', ms=3.0)

lim = (0, y_test.max())

ax.set(xlabel='Actual Price', 
       ylabel='Predicted Price', 
       xlim=lim,
       ylim=lim,
       title='Linear Regression Results');

### Now let's fit a linear regression with L2 regularization (Ridge), using cross-validation to find the best alpha value:


In [None]:
# Mute the sklearn warning about regularization
import warnings
warnings.filterwarnings('ignore', module='sklearn')

In [None]:
from sklearn.linear_model import RidgeCV

alphas = [0.005, 0.05, 0.1, 0.3, 1, 3, 5, 10, 15, 30, 80]

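# Note: this rebinds the name RidgeCV from the class to the fitted model;
# later cells rely on that name
RidgeCV = RidgeCV(alphas=alphas, cv=4).fit(X_train, y_train)
RidgeCV_rmse = rmse(y_test, RidgeCV.predict(X_test))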

In [None]:
print(RidgeCV.alpha_,RidgeCV_rmse)
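
For readers who want to try the alpha search in isolation, here is a minimal sketch on synthetic data from make_regression. The data and the variable name ridge_model are assumptions for illustration. It also stores the fitted model under its own name, which leaves the RidgeCV class usable afterwards.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import RidgeCV

# Synthetic regression problem, for illustration only
X, y = make_regression(n_samples=200, n_features=20, noise=10.0, random_state=0)

alphas = [0.005, 0.05, 0.1, 0.3, 1, 3, 5, 10, 15, 30, 80]

# 4-fold cross-validation picks the alpha with the best average score
ridge_model = RidgeCV(alphas=alphas, cv=4).fit(X, y)

# The selected value is always one of the candidates we supplied
print(ridge_model.alpha_)
```

If the selected alpha sits at either end of the list, that is usually a hint to widen the grid in that direction.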

### Build the same linear regression, now with L1 regularization (Lasso) and cross-validation:

In [None]:
from sklearn.linear_model import LassoCV

alphas2 = np.array([1e-5, 5e-5, 0.0001, 0.0005])

# max_iter must be an integer; recent scikit-learn versions reject floats such as 5e4
lassoCV = LassoCV(alphas=alphas2, max_iter=50000, cv=3).fit(X_train, y_train)

lassoCV_rmse = rmse(y_test, lassoCV.predict(X_test))

In [None]:
print(lassoCV.alpha_, lassoCV_rmse)  # Lasso is slower

We can determine how many of these coefficients remain non-zero. First, the total number of coefficients:

In [None]:
len(lassoCV.coef_) # Remember that number of coefficients is the same as number of features

L1 regularization (Lasso) selectively shrinks coefficients, some of them all the way to zero, thus performing feature elimination. Here is how many of our coefficients are non-zero after Lasso:

In [None]:
len(lassoCV.coef_.nonzero()[0])  

Our model has 294 coefficients, 22 of which are zero; in other words, Lasso eliminated those 22 features.
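
The feature-elimination effect is easier to see on a problem where we know the answer. Here is a sketch on synthetic data in which only the first two of ten features influence the target. The data, the alpha value and the variable names are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic data: only features 0 and 1 drive y, the other eight are noise
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=500)

# A fairly strong L1 penalty, picked by hand for this demo
model = Lasso(alpha=0.5).fit(X, y)

n_total = len(model.coef_)
n_nonzero = np.count_nonzero(model.coef_)

# The irrelevant features end up with coefficients of exactly zero
print(n_total, n_nonzero)
print(model.coef_)
```

The two surviving coefficients are also shrunk toward zero relative to 3 and -2, which is the price Lasso pays for the sparsity.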

### Now try Elastic Net, with the same alphas as in Lasso and l1_ratios between 0.1 and 0.9:

In [None]:
from sklearn.linear_model import ElasticNetCV

l1_ratios = np.linspace(0.1, 0.9, 9)

# max_iter must be an integer; recent scikit-learn versions reject floats such as 1e4
elasticNetCV = ElasticNetCV(alphas=alphas2, 
                            l1_ratio=l1_ratios,
                            max_iter=10000).fit(X_train, y_train)
elasticNetCV_rmse = rmse(y_test, elasticNetCV.predict(X_test))

In [None]:
print(elasticNetCV.alpha_, elasticNetCV.l1_ratio_, elasticNetCV_rmse)
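
To clarify what l1_ratio controls, here is a small sketch of the penalty term that scikit-learn documents for ElasticNet: alpha * l1_ratio * ||w||_1 + 0.5 * alpha * (1 - l1_ratio) * ||w||_2^2. The helper elastic_net_penalty and the example coefficients are hypothetical, written only for this illustration.

```python
import numpy as np

def elastic_net_penalty(coef, alpha, l1_ratio):
    """Penalty term of the elastic net objective (hypothetical helper)."""
    coef = np.asarray(coef, dtype=float)
    l1 = np.abs(coef).sum()       # L1 norm of the coefficients
    l2 = np.square(coef).sum()    # squared L2 norm of the coefficients
    return alpha * l1_ratio * l1 + 0.5 * alpha * (1.0 - l1_ratio) * l2

coef = [1.0, -2.0, 0.0]  # made-up coefficients: L1 norm 3, squared L2 norm 5

# l1_ratio=1 is a pure L1 (Lasso-style) penalty, l1_ratio=0 a pure L2 one,
# and values in between blend the two
for ratio in (1.0, 0.5, 0.0):
    print(ratio, elastic_net_penalty(coef, alpha=0.1, l1_ratio=ratio))
```

So the l1_ratios grid from 0.1 to 0.9 searches between a mostly Ridge-like and a mostly Lasso-like penalty.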

# Comparing the RMSE of every model in a table:

In [None]:
new_df = [['Linear', LinearRegression_rmse],
          ['Ridge', RidgeCV_rmse],
          ['Lasso', lassoCV_rmse],
          ['ElasticNet', elasticNetCV_rmse]]
table_rmse = pd.DataFrame(new_df, columns=['Model', 'RMSE']).set_index('Model')
table_rmse

**From the table above we can conclude that Ridge regression is the best model for our case study, because it has the lowest RMSE.**
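
Rather than reading the winner off the table by eye, the lowest RMSE can also be picked programmatically. Here is a sketch with placeholder model names and placeholder numbers; they are not results from this notebook.

```python
import pandas as pd

# Placeholder names and values, for illustration only
rows = [["model_a", 4.0], ["model_b", 3.0], ["model_c", 3.5]]
table = pd.DataFrame(rows, columns=["Model", "RMSE"]).set_index("Model")

# Sort from best (lowest error) to worst and take the top entry
ranked = table.sort_values("RMSE")
best_model = table["RMSE"].idxmin()

print(ranked)
print(best_model)  # model_b
```

The same two calls should work unchanged on a frame shaped like table_rmse.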

# Plotting actual vs predicted house prices for the three regularized models:

In [None]:
lim = (0, y_test.max())

fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(20, 6))
fig.suptitle('Regression Results:')
ax1.plot(y_test, RidgeCV.predict(X_test), marker='o', ls='', ms=3.0, color='green')
ax1.set(xlabel='Actual Price', ylabel='Predicted Price', xlim=lim, ylim=lim, title='RidgeCV Results')
ax2.plot(y_test, lassoCV.predict(X_test), marker='o', ls='', ms=3.0, color='red')
ax2.set(xlabel='Actual Price', ylabel='Predicted Price', xlim=lim, ylim=lim, title='LassoCV Results')
ax3.plot(y_test, elasticNetCV.predict(X_test), marker='o', ls='', ms=3.0, color='blue')
ax3.set(xlabel='Actual Price', ylabel='Predicted Price', xlim=lim, ylim=lim, title='ElasticNetCV Results')
