**Use of Grid Search with Random Forest on the Boston Housing data**

**AWS Configuration required**

fit2ec2 uses the credentials stored in ~/.aws/credentials.

Make sure aws-cli is installed and configured:

https://docs.aws.amazon.com/cli/latest/userguide/install-cliv2.html
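Before launching anything, it can help to confirm that the credentials file actually contains a usable profile. The following is a minimal sketch using only the standard library; the helper name `has_aws_profile` is hypothetical, and it assumes the `default` profile is the one being used, which the notebook does not state.

```python
import configparser


def has_aws_profile(credentials_path, profile="default"):
    """Return True if the credentials file defines an access key pair for `profile`.

    Hypothetical helper for illustration; it is not part of fit2ec2 or aws-cli.
    """
    parser = configparser.ConfigParser()
    # read() silently skips files that do not exist, so a missing file yields no sections
    parser.read(credentials_path)
    if not parser.has_section(profile):
        return False
    section = parser[profile]
    return "aws_access_key_id" in section and "aws_secret_access_key" in section
```

For example, `has_aws_profile(Path.home() / ".aws" / "credentials")` (with `from pathlib import Path`) checks the usual location. Credentials can also come from environment variables or other sources, which this check does not cover.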

In [5]:
%load_ext autoreload
%autoreload 2

In [4]:
# Import the usual suspects
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV
import numpy as np  # needed for np.log on the target below
import pandas as pd
# fit2ec2
import fit2ec2 as fe

**Load and Split data**

In [6]:
data = pd.read_csv('data/data.csv')

y = np.log(data.SalePrice)
X = data.drop(['SalePrice', 'Id'], axis=1)
X_train, X_test, y_train, y_test = train_test_split(
                          X, y, random_state=42, test_size=.33)

print(X_train.shape)

(978, 246)


**Initialize the Random Forest and the Grid Search**

In [7]:
# Initialize Random Forest
rf = RandomForestRegressor()

In [8]:
# Choose parameters for Grid Search
rf_params = {
    'n_estimators': range(150, 300, 10),
    'max_depth': range(3, 15, 1),
    'min_samples_leaf': [0.04, 0.06, 0.08],
    'max_features': [0.2, 0.4, 0.6, 0.8],
}

In [9]:
# Don't forget the n_jobs parameter on a multi-CPU machine!
grid = GridSearchCV(estimator = rf,
                         param_grid = rf_params,
                         cv=2,
                         scoring='neg_mean_squared_error',
                         verbose=1,
                         n_jobs=-1)
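To see why this search is worth sending to a large instance, the size of the grid can be counted directly. The sketch below repeats the parameter grid and the `cv=2` setting from the cells above and uses only the standard library; the variable names `cv_folds`, `n_candidates` and `n_fits` are introduced here for illustration.

```python
from math import prod

# Same grid as in the cell above
rf_params = {
    'n_estimators': range(150, 300, 10),
    'max_depth': range(3, 15, 1),
    'min_samples_leaf': [0.04, 0.06, 0.08],
    'max_features': [0.2, 0.4, 0.6, 0.8],
}
cv_folds = 2  # matches cv=2 in GridSearchCV

# One candidate per combination of parameter values
n_candidates = prod(len(values) for values in rf_params.values())
# Each candidate is fitted once per cross-validation fold
n_fits = n_candidates * cv_folds

print(n_candidates, n_fits)  # 2160 4320
```

With the default `refit=True`, GridSearchCV also fits the best candidate once more on the full training set.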

**Initialize Fit2ec2**

In [16]:
cmp = fe.Compute()

# Or specify the private key name
# cmp = fe.Compute(keyname="mykey")

**Create the ec2 instance**

In [17]:
# Create an EC2 instance with the default AWS Linux AMI, here with type c5.12xlarge
cmp.create(instanceType="c5.12xlarge")

# Or specify the image id (AMI ID) and the instance type (see https://aws.amazon.com/ec2/instance-types/)
#cmp.create(imageId='ami-06ce3edf0cff21f07', instanceType='t2.micro')

Creating key pair [./tmp/ec2-keypair-20200510-003536.pem]
Launching EC2 instance. Type: c5.12xlarge
EC2 instance created:
	Id: i-07de3d7072eaf81dd
	Public IP address: 3.249.194.187


**Fit the model with the data on the ec2 instance**

In [18]:
grid = cmp.fit(grid, X_train, y_train)
# If you get NoValidConnectionsError, wait about 30 seconds for the EC2 instance to finish booting, then run this cell again

Transfering [./tmp/requirements.txt] to ec2
Transfering [./tmp/X.pickle] to ec2
Transfering [./tmp/y.pickle] to ec2
Transfering [run.sh] to ec2
Transfering [./tmp/model.pickle] to ec2
Transfering [./tmp/model.py] to ec2
Executing [chmod 700 run.sh]
Out: 
Error: 
Executing [./run.sh]
Out: Loaded plugins: extras_suggestions, langpacks, priorities, update-motd
Resolving Dependencies
--> Running transaction check
---> Package python3.x86_64 0:3.7.6-1.amzn2.0.1 will be installed
--> Processing Dependency: python3-libs(x86-64) = 3.7.6-1.amzn2.0.1 for package: python3-3.7.6-1.amzn2.0.1.x86_64
--> Processing Dependency: python3-setuptools for package: python3-3.7.6-1.amzn2.0.1.x86_64
--> Processing Dependency: python3-pip for package: python3-3.7.6-1.amzn2.0.1.x86_64
--> Processing Dependency: libpython3.7m.so.1.0()(64bit) for package: python3-3.7.6-1.amzn2.0.1.x86_64
--> Running transaction check
---> Package python3-libs.x86_64 0:3.7.6-1.amzn2.0.1 will be installed
---> Package python3-pip.n

Retrieving [./tmp/model.pickle] from ec2
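The NoValidConnectionsError mentioned in the fit cell typically means the instance is not yet accepting SSH connections. Instead of re-running the cell by hand, the wait can be automated with a small retry helper. This is a generic sketch, not part of fit2ec2: the function `retry` and its parameters are hypothetical, and it assumes that calling `cmp.fit` again after a failed connection is safe.

```python
import time


def retry(func, exceptions, attempts=5, delay=30.0, sleep=time.sleep):
    """Call func() until it succeeds, retrying on the given exception types.

    `sleep` is injectable so the helper can be tested without real waiting.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            sleep(delay)  # give the instance time to finish booting
```

Assuming the error is paramiko's `NoValidConnectionsError` (importable from `paramiko.ssh_exception`), the fit call could then be written as `grid = retry(lambda: cmp.fit(grid, X_train, y_train), (NoValidConnectionsError,))`.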




In [19]:
grid.best_score_

-0.03142525229239425
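Because the scorer is `neg_mean_squared_error`, the best score is the negated mean squared error, and because the target is `np.log(SalePrice)`, that error is in log units. A short sketch of how to turn the number above into something easier to read, using only the standard library (the variable names are illustrative):

```python
import math

best_score = -0.03142525229239425  # grid.best_score_ from the output above

mse = -best_score          # undo the sign flip applied by the scorer
rmse = math.sqrt(mse)      # root mean squared error, in log-price units
factor = math.exp(rmse)    # approximate multiplicative error on the original price scale

print(round(rmse, 3))  # 0.177
print(round(factor, 2))
```

An RMSE of about 0.177 on the log scale corresponds to predictions that are typically off by a factor of a little under 1.2 in price.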

In [20]:
grid.best_estimator_

RandomForestRegressor(bootstrap=True, ccp_alpha=0.0, criterion='mse',
                      max_depth=6, max_features=0.6, max_leaf_nodes=None,
                      max_samples=None, min_impurity_decrease=0.0,
                      min_impurity_split=None, min_samples_leaf=0.04,
                      min_samples_split=2, min_weight_fraction_leaf=0.0,
                      n_estimators=170, n_jobs=None, oob_score=False,
                      random_state=None, verbose=0, warm_start=False)

**Terminate the ec2 instance and remove the private key**

In [21]:
cmp.terminate()

Deleting remote key pair...
Terminating EC2 instance [i-07de3d7072eaf81dd]
Deleting local key pair file
Done
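A running c5.12xlarge is billed until it is terminated, so it is worth making sure `terminate()` runs even if the fit fails. One possible way is a small context manager around the `create` and `terminate` calls used in this notebook. This is a sketch under assumptions: `running_instance` is a hypothetical helper, not part of fit2ec2, and it only relies on the object having `create(**kwargs)` and `terminate()` methods.

```python
from contextlib import contextmanager


@contextmanager
def running_instance(compute, **create_kwargs):
    """Create the instance on entry and always terminate it on exit.

    `compute` is any object exposing create(**kwargs) and terminate(),
    such as the fe.Compute() object used in this notebook.
    """
    compute.create(**create_kwargs)
    try:
        yield compute
    finally:
        # Runs on normal exit and when the body raises
        compute.terminate()
```

Usage would look like `with running_instance(fe.Compute(), instanceType="c5.12xlarge") as cmp: grid = cmp.fit(grid, X_train, y_train)`. If `create` itself fails partway through, `terminate` is not called, so that case still needs a manual check in the AWS console.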
