**Use of Grid Search with Random Forest on the Boston Housing data**

**AWS Configuration required**

Fit2ec2 uses the credentials stored in ~/.aws/credentials.

Make sure you have installed aws-cli and that it is configured:

https://docs.aws.amazon.com/cli/latest/userguide/install-cliv2.html
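
Before creating any instance, it can help to confirm that a credentials file is actually in place. The sketch below is not part of fit2ec2: `has_aws_profile` is a hypothetical helper that only checks whether a profile with an access key pair exists in the standard AWS CLI credentials file. It does not validate the keys, and it ignores other credential sources such as environment variables.

```python
import configparser
from pathlib import Path


def has_aws_profile(path=Path.home() / ".aws" / "credentials", profile="default"):
    """Return True if `profile` in the credentials file defines an access key pair."""
    parser = configparser.ConfigParser()
    # ConfigParser.read silently skips files that do not exist.
    parser.read(path)
    if not parser.has_section(profile):
        return False
    section = parser[profile]
    return "aws_access_key_id" in section and "aws_secret_access_key" in section


print(has_aws_profile())
```

If this prints `False`, running `aws configure` is the usual way to create the file.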

In [3]:
### Issue Jupyter Notebook 100
import ipyparams
print(ipyparams.notebook_name)


example.ipynb


In [18]:
# Import the usual suspects
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV
import numpy as np  # needed for np.log when building the target
import pandas as pd

In [21]:
%load_ext autoreload
%autoreload 2
%reload_ext autoreload

# fit2ec2
import fit2ec2 as fe

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


**Load and Split data**

In [6]:
data = pd.read_csv('data/data.csv')

y = np.log(data.SalePrice)
X = data.drop(['SalePrice', 'Id'], axis=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42)

print(X_train.shape)

(978, 246)


**Initialize the Random Forest and the Grid Search**

In [7]:
# Initialize Random Forest
rf = RandomForestRegressor()

In [8]:
# Choose parameters for Grid Search
rf_params = {
    'n_estimators': range(150, 300, 10),
    'max_depth': range(3, 15, 1),
    'min_samples_leaf': [0.04, 0.06, 0.08],
    'max_features': [0.2, 0.4, 0.6, 0.8],
}

In [9]:
# Don't forget the n_jobs parameter on a multi-CPU machine!
grid = GridSearchCV(estimator=rf,
                    param_grid=rf_params,
                    cv=2,
                    scoring='neg_mean_squared_error',
                    verbose=1,
                    n_jobs=-1)
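
To see why a large instance is worth it here, the size of the search can be worked out from the grid itself. This is a small stand-alone sketch: it repeats the parameter grid so that it runs on its own, and `cv_folds` is just a local name for the `cv=2` setting.

```python
# Same grid as rf_params in the notebook, repeated so the snippet is self-contained.
rf_params = {
    'n_estimators': range(150, 300, 10),
    'max_depth': range(3, 15, 1),
    'min_samples_leaf': [0.04, 0.06, 0.08],
    'max_features': [0.2, 0.4, 0.6, 0.8],
}
cv_folds = 2  # matches cv=2 passed to GridSearchCV

# Grid search tries every combination, so the candidate count is the
# product of the number of values per parameter.
n_candidates = 1
for values in rf_params.values():
    n_candidates *= len(values)

n_fits = n_candidates * cv_folds
print(n_candidates, n_fits)  # 2160 4320
```

That is 2160 parameter combinations and 4320 model fits, not counting the final refit of the best estimator on the full training set.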

**Initialize Fit2ec2**

In [10]:
cmp = fe.Compute()

# Or specify the private key name
# cmpC = fe.Fit2ec2(keyname="mykey")

**Create the ec2 instance**

In [11]:
# Create an EC2 instance of type c5.12xlarge with the default AWS Linux AMI
cmp.create(instanceType="c5.12xlarge")

# Or specify the image id (AMI ID) and the instance type (see https://aws.amazon.com/ec2/instance-types/)
#cmp.create(imageId='ami-06ce3edf0cff21f07', instanceType='t2.micro')

Creating key pair [ec2-keypair-20200509-185456.pem]...
Key pair created!
Creating EC2 instance with type [c5.12xlarge]
Wait for EC2 instance to  be created...
EC2 instance created!
	Id: i-08fb4a5c4adef2cea
	Public IP address: 34.250.100.82
Generating project Python requirements


**Fit the model with the data on the ec2 instance**

In [22]:
grid = cmp.fit(grid, X_train, y_train)
# If you get a NoValidConnectionsError, the EC2 instance is probably still booting: wait about 30 seconds and run this cell again

Saving model.pickle
Saving X.pickle
Saving y.pickle
Transfering [./temp/model.pickle] to ec2
Transfering [./temp/requirements.txt] to ec2
Transfering [./temp/X.pickle] to ec2
Transfering [run.sh] to ec2
Transfering [./temp/y.pickle] to ec2
Transfering [./temp/model.py] to ec2
Executing [chmod 700 run.sh]
Out: 
Error: 
Executing [./run.sh]
Out: Loaded plugins: extras_suggestions, langpacks, priorities, update-motd
Package python3-3.7.6-1.amzn2.0.1.x86_64 already installed and latest version
Nothing to do
Collecting pip
  Using cached pip-20.1-py2.py3-none-any.whl (1.5 MB)
Installing collected packages: pip
  Attempting uninstall: pip
    Found existing installation: pip 20.1
    Uninstalling pip-20.1:
      Successfully uninstalled pip-20.1
Successfully installed pip-20.1
Defaulting to user installation because normal site-packages is not writeable
This package only works within Jupyter/IPython accessed from a browser.
Starting
Loading model...
GridSearchCV(cv=2, error_score=nan,
      

Retrieving [./temp/model.pickle] from ec2
Loading model.pickle
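
The advice about waiting and re-running the fit cell after a NoValidConnectionsError can also be automated. The sketch below is a generic retry helper, not a fit2ec2 feature: `retry` and its parameters are hypothetical names, and by default it retries on any exception, which you may want to narrow.

```python
import time


def retry(action, attempts=5, delay_seconds=30, retry_on=(Exception,)):
    """Call `action` until it succeeds or `attempts` calls have failed."""
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except retry_on as error:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            print(f"Attempt {attempt} failed ({error!r}); retrying in {delay_seconds}s")
            time.sleep(delay_seconds)


# Hypothetical usage with the objects from this notebook:
# grid = retry(lambda: cmp.fit(grid, X_train, y_train))
```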




In [23]:
grid.best_score_

-0.03129856373358496
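
The score is negative because `scoring='neg_mean_squared_error'` reports the negated MSE, and it is measured on the log of `SalePrice`. A quick conversion makes it easier to read. This sketch just hard-codes the value printed above.

```python
import math

best_score = -0.03129856373358496  # grid.best_score_ from the cell above

# The scorer returns the negated MSE, so flip the sign before taking the root.
mse_log = -best_score
rmse_log = math.sqrt(mse_log)

# The target is log(SalePrice), so an error of e in log space corresponds to
# a multiplicative factor of exp(e) on the price itself.
typical_factor = math.exp(rmse_log)
print(round(rmse_log, 4), round(typical_factor, 3))
```

The cross-validated RMSE is about 0.177 in log space, which corresponds to predictions being off by a factor of roughly 1.19 on the original price scale.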

In [24]:
grid.best_estimator_

RandomForestRegressor(bootstrap=True, ccp_alpha=0.0, criterion='mse',
                      max_depth=14, max_features=0.6, max_leaf_nodes=None,
                      max_samples=None, min_impurity_decrease=0.0,
                      min_impurity_split=None, min_samples_leaf=0.04,
                      min_samples_split=2, min_weight_fraction_leaf=0.0,
                      n_estimators=160, n_jobs=None, oob_score=False,
                      random_state=None, verbose=0, warm_start=False)

**Terminate the ec2 instance and remove the private key**

In [25]:
cmp.terminate()

Deleting remote key pair...
Terminate EC2 instance [i-08fb4a5c4adef2cea]...
Delete local key pair file...
Done!
