### Fine-Tune the Model
With a shortlist of promising models in hand, there are several ways to fine-tune them:
1. Grid Search
2. Randomized Search
3. Ensemble Methods

In [12]:
# load data
import pickle
train_pickle_path = 'datasets/train.pickle'

with open(train_pickle_path, 'rb') as f:
    data = pickle.load(f)
    
housing_labels = data[0]
housing_prepared = data[1]

#### 1. Grid Search

In [8]:
from sklearn.model_selection import GridSearchCV   # try the values and evaluate with cross-validation
from sklearn.ensemble import RandomForestRegressor

param_grid = [
    {'n_estimators': [3, 10, 30], 'max_features': [2, 4, 6, 8]},
    {'bootstrap': [False], 'n_estimators': [3, 10], 'max_features': [2, 3, 4]},
]
# 3 x 4 = 12 combinations from 1st dict
# 2 x 3 = 6  combinations from 2nd dict

forest_reg = RandomForestRegressor()

grid_search = GridSearchCV(forest_reg, param_grid, cv=5,
                          scoring='neg_mean_squared_error',
                          return_train_score=True)

grid_search.fit(housing_prepared, housing_labels)

GridSearchCV(cv=5, estimator=RandomForestRegressor(),
             param_grid=[{'max_features': [2, 4, 6, 8],
                          'n_estimators': [3, 10, 30]},
                         {'bootstrap': [False], 'max_features': [2, 3, 4],
                          'n_estimators': [3, 10]}],
             return_train_score=True, scoring='neg_mean_squared_error')

In [9]:
grid_search.best_params_   

{'max_features': 8, 'n_estimators': 30}

--> both best values are the largest ones in the grid, so the score may keep improving: search again with higher values
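
A follow-up search could shift the grid upward so that the previous best values sit at the low end of the range rather than at its edge. The sketch below runs on synthetic data from `make_regression` so that it is self-contained; the candidate values and the names `followup_grid` and `followup_search` are assumptions chosen for illustration, not values from the original run.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for housing_prepared / housing_labels (16 features)
X_demo, y_demo = make_regression(n_samples=200, n_features=16,
                                 noise=10.0, random_state=42)

# Start at the previous best values and extend upward (illustrative values).
# 16 is the total number of features, so it is the largest meaningful max_features.
followup_grid = {"n_estimators": [30, 60], "max_features": [8, 12, 16]}

followup_search = GridSearchCV(RandomForestRegressor(random_state=42),
                               followup_grid, cv=3,
                               scoring="neg_mean_squared_error")
followup_search.fit(X_demo, y_demo)
print(followup_search.best_params_)
```

If the best values again land on the upper edge, the same widening step can be repeated, keeping in mind that more trees also mean longer training times.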

In [10]:
grid_search.best_estimator_

RandomForestRegressor(max_features=8, n_estimators=30)

In [11]:
import numpy as np

cvres = grid_search.cv_results_
for mean_score, params in zip(cvres["mean_test_score"], cvres["params"]):
    print(np.sqrt(-mean_score), params)

64123.653653266585 {'max_features': 2, 'n_estimators': 3}
55293.976835407964 {'max_features': 2, 'n_estimators': 10}
52878.50980723279 {'max_features': 2, 'n_estimators': 30}
59301.9935157182 {'max_features': 4, 'n_estimators': 3}
52764.27044418638 {'max_features': 4, 'n_estimators': 10}
50792.03891498849 {'max_features': 4, 'n_estimators': 30}
59445.82603033464 {'max_features': 6, 'n_estimators': 3}
52609.6146075253 {'max_features': 6, 'n_estimators': 10}
50010.53572367903 {'max_features': 6, 'n_estimators': 30}
59019.96772834409 {'max_features': 8, 'n_estimators': 3}
51938.67144870794 {'max_features': 8, 'n_estimators': 10}
49903.53226365907 {'max_features': 8, 'n_estimators': 30}
63174.975693533255 {'bootstrap': False, 'max_features': 2, 'n_estimators': 3}
54339.606052357645 {'bootstrap': False, 'max_features': 2, 'n_estimators': 10}
60067.304555404684 {'bootstrap': False, 'max_features': 3, 'n_estimators': 3}
52234.15507018981 {'bootstrap': False, 'max_features': 3, 'n_estimators':

--> features you are not sure about can also be treated as hyperparameters, so the search decides whether to include them
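
One way to let the search decide on an uncertain feature is to wrap its creation in a transformer with an on/off switch and tune that switch inside a `Pipeline`. The sketch below is a minimal illustration on synthetic data; `RatioAdder`, its `add_ratio` flag, and the step names `adder` and `forest` are hypothetical and not part of the original notebook.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline


class RatioAdder(BaseEstimator, TransformerMixin):
    """Hypothetical transformer: optionally appends column 0 / column 1."""

    def __init__(self, add_ratio=True):
        self.add_ratio = add_ratio

    def fit(self, X, y=None):
        return self  # nothing to learn

    def transform(self, X):
        X = np.asarray(X)
        if not self.add_ratio:
            return X
        return np.c_[X, X[:, 0] / X[:, 1]]


# Synthetic data: strictly positive inputs, so the division is safe
rng = np.random.default_rng(42)
X_demo = rng.uniform(1.0, 10.0, size=(200, 4))
y_demo = 3.0 * X_demo[:, 0] / X_demo[:, 1] + rng.normal(0.0, 0.1, size=200)

pipeline = Pipeline([
    ("adder", RatioAdder()),
    ("forest", RandomForestRegressor(n_estimators=20, random_state=42)),
])

# "<step>__<param>" addresses a parameter of a pipeline step
param_grid = {"adder__add_ratio": [True, False],
              "forest__max_features": [2, 4]}

feature_search = GridSearchCV(pipeline, param_grid, cv=3,
                              scoring="neg_mean_squared_error")
feature_search.fit(X_demo, y_demo)
print(feature_search.best_params_)
```

The value of `adder__add_ratio` in `best_params_` then reports whether the extra feature helped under cross-validation.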

#### 2. Randomized Search
Preferable when the search space is large: instead of trying every combination, it evaluates a fixed number of random hyperparameter combinations, one per iteration.
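
A minimal sketch of a randomized search with scikit-learn's `RandomizedSearchCV`, using synthetic data so that it runs on its own. The distributions, the iteration count, and the name `rnd_search` are assumptions for illustration.

```python
from scipy.stats import randint
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in for the prepared housing data
X_demo, y_demo = make_regression(n_samples=200, n_features=8,
                                 noise=10.0, random_state=42)

# Distributions to sample from; randint's upper bound is exclusive
param_distribs = {
    "n_estimators": randint(low=1, high=50),   # 1..49
    "max_features": randint(low=1, high=9),    # 1..8
}

rnd_search = RandomizedSearchCV(RandomForestRegressor(random_state=42),
                                param_distributions=param_distribs,
                                n_iter=5, cv=3,
                                scoring="neg_mean_squared_error",
                                random_state=42)
rnd_search.fit(X_demo, y_demo)
print(rnd_search.best_params_)
```

The `n_iter` argument sets the compute budget directly, which is harder to control with a grid whose size grows with every added value.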
#### 3. Ensemble Methods
Combining the best-performing models into an "ensemble" will often perform better than the best individual model.

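As a sketch of the idea, scikit-learn's `VotingRegressor` averages the predictions of several regressors. The three member models and the synthetic data below are assumptions for illustration; the notebook does not say which models were shortlisted.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor, VotingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the prepared housing data
X_demo, y_demo = make_regression(n_samples=200, n_features=8,
                                 noise=10.0, random_state=42)

voting_reg = VotingRegressor([
    ("lin", LinearRegression()),
    ("tree", DecisionTreeRegressor(random_state=42)),
    ("forest", RandomForestRegressor(n_estimators=30, random_state=42)),
])
voting_reg.fit(X_demo, y_demo)

# The ensemble prediction is the unweighted mean of its fitted members
ensemble_pred = voting_reg.predict(X_demo[:5])
individual_preds = np.array([est.predict(X_demo[:5])
                             for est in voting_reg.estimators_])
print(ensemble_pred)
print(individual_preds.mean(axis=0))
```

Whether the ensemble actually beats its best member should be checked with cross-validation, in the same way as for the individual models.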
### Analyze the Best Model
- Check the importance of the attributes to decide on which features to keep
- Check the specific errors the model is making, understand why and fix the problem

In [13]:
feature_importances = grid_search.best_estimator_.feature_importances_
feature_importances

array([6.91634296e-02, 6.95757669e-02, 4.37219642e-02, 1.57916331e-02,
       1.43938346e-02, 1.45351234e-02, 1.44777900e-02, 3.57193769e-01,
       3.96387068e-02, 1.11862204e-01, 8.23834481e-02, 6.27440862e-03,
       1.56613554e-01, 5.81584731e-05, 1.30351637e-03, 3.01269287e-03])
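
The raw array is easier to read once each score is paired with its attribute and sorted. The sketch below reuses the importances printed above; the names `feature_0` to `feature_15` are placeholders (an assumption) and should be replaced with the real column names, in the same order as the columns of the prepared data.

```python
import numpy as np

# Importances copied from the output above
feature_importances = np.array([
    6.91634296e-02, 6.95757669e-02, 4.37219642e-02, 1.57916331e-02,
    1.43938346e-02, 1.45351234e-02, 1.44777900e-02, 3.57193769e-01,
    3.96387068e-02, 1.11862204e-01, 8.23834481e-02, 6.27440862e-03,
    1.56613554e-01, 5.81584731e-05, 1.30351637e-03, 3.01269287e-03,
])

# Placeholder names (assumption): substitute the real attribute names
attributes = [f"feature_{i}" for i in range(len(feature_importances))]

# Sort (importance, name) pairs from most to least important
ranked = sorted(zip(feature_importances, attributes), reverse=True)
for importance, name in ranked:
    print(f"{importance:.4f}  {name}")
```

Here the feature at index 7 accounts for roughly 36% of the total importance, while the one at index 13 contributes almost nothing and is a candidate for dropping.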

### Evaluate the Model on Test Set

In [14]:
test_pickle_path = 'datasets/test.pickle'

with open(test_pickle_path, 'rb') as f:
    data = pickle.load(f)
    
y_test = data[0]
X_test_prepared = data[1]

In [17]:
from sklearn.metrics import mean_squared_error

final_model = grid_search.best_estimator_

final_predictions = final_model.predict(X_test_prepared)
final_mse = mean_squared_error(y_test, final_predictions)
final_rmse = np.sqrt(final_mse)
final_rmse  # an estimate of the generalization error (out-of-sample error)

47809.56699185216

In [18]:
from scipy import stats
confidence = 0.95
squared_errors = (final_predictions - y_test) ** 2
# 95% confidence interval for the generalization error
np.sqrt(stats.t.interval(confidence, len(squared_errors) - 1,
                         loc=squared_errors.mean(),
                         scale=stats.sem(squared_errors)))

array([45836.86842477, 49704.03288373])
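
To make the interval less of a black box, the sketch below computes it by hand as the mean squared error plus or minus the critical t-value times the standard error, then takes the square root to return to RMSE units. It uses synthetic squared errors (an assumption) so that it runs without the test set.

```python
import numpy as np
from scipy import stats

# Synthetic squared errors standing in for (final_predictions - y_test) ** 2
rng = np.random.default_rng(42)
squared_errors = rng.normal(loc=0.0, scale=1.0, size=500) ** 2

confidence = 0.95
n = len(squared_errors)
mean = squared_errors.mean()
sem = stats.sem(squared_errors)                       # sample std / sqrt(n)
t_crit = stats.t.ppf((1 + confidence) / 2, df=n - 1)  # two-sided critical value

# Interval on the mean squared error, converted to RMSE units
manual_interval = np.sqrt([mean - t_crit * sem, mean + t_crit * sem])

# Same computation through scipy's helper, as used in the cell above
scipy_interval = np.sqrt(stats.t.interval(confidence, n - 1,
                                          loc=mean, scale=sem))
print(manual_interval, scipy_interval)
```

Both routes give the same bounds; the helper is simply shorter to write.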

### Launch, Monitor and Maintain the System
__Launch__: deploy the model to the production environment\
__Monitor__: evaluate the live model's performance, with or without human raters, whichever fits the task\
__Maintain__: evaluate and update the datasets and retrain the model on a regular basis
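
When true labels eventually become available for live predictions, monitoring can be as simple as comparing the live RMSE against the test-set RMSE. The helper names `rmse` and `needs_retraining` and the 10% tolerance below are hypothetical choices for illustration, not part of the original notebook.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error between two equally sized sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))


def needs_retraining(y_true, y_pred, baseline_rmse, tolerance=0.10):
    """True when live RMSE exceeds the baseline by more than `tolerance`."""
    return rmse(y_true, y_pred) > baseline_rmse * (1.0 + tolerance)


# Toy check: errors of 3 and 4 give an RMSE of sqrt(12.5)
live_rmse = rmse([0.0, 0.0], [3.0, 4.0])
print(live_rmse, needs_retraining([0.0, 0.0], [3.0, 4.0], baseline_rmse=3.0))
```

In practice the baseline would be the test-set RMSE measured before launch, and the check would run on each new batch of labelled data.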