## Initialization

In [None]:
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
import pandas as pd

In [None]:
col_index = ["Sex","Length","Diameter","Height",
              "Whole weight","Shucked weight",
              "Viscera weight","Shell weight", "Rings"]
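# abalone.data has no header row, so pass header=None and supply the column
# names directly; otherwise the first record would be consumed as the header.
data = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/abalone/abalone.data',
                   header=None, names=col_index)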
data

## Exploration

In [None]:
data.describe(include="all")

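The rings in an abalone's shell are used to estimate its age: more rings means an older abalone. The ring count is therefore the target this model predicts, as a stand-in for age. (According to the UCI dataset description, adding 1.5 to the ring count gives the age in years.)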

In [None]:
data.info()

In [None]:
data.isna().sum()

## Preprocessing

In [None]:
y = what_goes_here?
y

In [None]:
features = what_goes_here?
features

In [None]:
X = what_goes_here?
X
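The three cells above follow a common pattern: pick the target column, list the feature columns, then index the DataFrame with that list. A minimal sketch of that pattern on a made-up toy table follows. The column names `size`, `weight`, and `label` and all values are hypothetical; this is not the abalone data and not necessarily the answer the exercise expects.

```python
import pandas as pd

# Hypothetical toy table used only to illustrate the pattern.
toy = pd.DataFrame({
    "size": [1.0, 2.0, 3.0],
    "weight": [0.5, 0.9, 1.4],
    "label": [4, 7, 9],
})

toy_y = toy["label"]               # target: a single column (a Series)
toy_features = ["size", "weight"]  # names of the input columns
toy_X = toy[toy_features]          # inputs: a DataFrame with only those columns

print(toy_X.shape, toy_y.shape)
```

When choosing features for the abalone data, keep in mind that scikit-learn's random forest needs numeric inputs, so a string column such as `Sex` would have to be encoded or left out.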

In [None]:
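# Hold out 30% of the rows for testing; random_state makes the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=3)
# Train on a plain NumPy array; X_test stays a DataFrame so rows can be looked up with .iloc later.
X_train = X_train.values
print(f"X train set length: {len(X_train)}, y train set length: {len(y_train)}")
print(f"X test set length: {len(X_test)}, y test set length: {len(y_test)}")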
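To see what `test_size` and `random_state` do in the split above, here is a small sketch on synthetic arrays. The `toy_` names and the 10-row array are assumptions made for illustration, not part of the lesson's data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 10 rows, 2 features.
toy_X = np.arange(20).reshape(10, 2)
toy_y = np.arange(10)

# test_size=0.3 holds out 30% of the rows for testing.
toy_X_train, toy_X_test, toy_y_train, toy_y_test = train_test_split(
    toy_X, toy_y, test_size=0.3, random_state=3)

# Repeating the call with the same random_state reproduces the same split.
_, toy_X_test_again, _, _ = train_test_split(
    toy_X, toy_y, test_size=0.3, random_state=3)

print(len(toy_X_train), len(toy_X_test))
print(np.array_equal(toy_X_test, toy_X_test_again))
```

Changing `random_state` would shuffle the rows differently, so metrics computed on the test set can vary slightly from one seed to another.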

## Training

In [None]:
rf_model = RandomForestRegressor(random_state=1)
rf_model.get_params()

In [None]:
rf_model.what_goes_here?(X_train, y_train)

## Testing

In [None]:
y_pred = rf_model.predict(X_test.values)
y_pred

In [None]:
from sklearn.metrics import mean_squared_error

mse = mean_squared_error(y_test, y_pred)
print(f"MSE: {mse}")
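MSE is expressed in squared units of the target (here, squared rings). Taking its square root gives the root mean squared error (RMSE), which is back in rings and so a little easier to read. A minimal sketch with made-up numbers follows; the two lists are hypothetical, not outputs of the model above.

```python
import math

# Hypothetical ring counts and predictions, for illustration only.
toy_actual = [10, 8, 12, 9]
toy_predicted = [11.0, 8.5, 10.0, 9.5]

# MSE: the average of the squared differences.
squared_errors = [(a - p) ** 2 for a, p in zip(toy_actual, toy_predicted)]
toy_mse = sum(squared_errors) / len(squared_errors)

# RMSE: square root of MSE, in the same units as the target.
toy_rmse = math.sqrt(toy_mse)

print(f"MSE: {toy_mse:.3f}, RMSE: {toy_rmse:.3f}")
```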

Since MSE is not as easily interpreted as mean accuracy, let's compare the model's predictions to the actual ring counts on a few test examples.

In [None]:
from random import randint, seed

seed(5)  # fix the seed so a fresh run of the notebook samples the same examples

def sample_predictions(model):
  """Print predictions for 5 random test examples, then the model's test MSE."""
  for _ in range(5):
    random_example_index = randint(0, len(y_test) - 1)
    random_example = X_test.iloc[random_example_index, :]
    random_label = y_test.iloc[random_example_index]
    # predict expects a 2D input, so wrap the single row in a list
    random_prediction = model.predict([random_example.values])
    print(f"Example {random_example_index}: The abalone's actual ring count is {random_label}, predicted ring count is {random_prediction[0]}.")
  # Evaluate on the full test set (as a NumPy array, matching how the model was trained)
  model_predictions = model.predict(X_test.values)
  print(f"MSE for this model is {mean_squared_error(y_test, model_predictions)}")

In [None]:
sample_predictions(rf_model)

## Iteration

Let's try out a few different hyperparameter settings and see which version of the model performs best.

In [None]:
alternative_rf_model_1 = RandomForestRegressor(random_state=1, max_features=what_goes_here?)
alternative_rf_model_1.fit(X_train,y_train)
sample_predictions(alternative_rf_model_1)

In [None]:
alternative_rf_model_2 = RandomForestRegressor(random_state=1, max_features=0.3, n_estimators=200)
alternative_rf_model_2.fit(X_train, y_train)
sample_predictions(alternative_rf_model_2)

In [None]:
alternative_rf_model_3 = RandomForestRegressor(random_state=1, max_features=0.3, n_estimators=500)
alternative_rf_model_3.fit(X_train,y_train)
sample_predictions(alternative_rf_model_3)
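The cells above each train one variant by hand. The same comparison can be written as a loop over candidate settings, keeping whichever gives the lowest MSE. The sketch below assumes synthetic data from `make_regression` in place of the abalone dataset, and the candidate settings are hypothetical values chosen only for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): not the abalone dataset.
X_demo, y_demo = make_regression(n_samples=200, n_features=7,
                                 noise=10.0, random_state=0)
Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=3)

# Hypothetical candidate settings to compare.
candidate_settings = [
    {"max_features": 1.0, "n_estimators": 50},
    {"max_features": 0.3, "n_estimators": 50},
    {"max_features": 0.3, "n_estimators": 100},
]

results = []
for settings in candidate_settings:
    # Same random_state for every candidate so only the settings differ.
    model = RandomForestRegressor(random_state=1, **settings)
    model.fit(Xd_train, yd_train)
    demo_mse = mean_squared_error(yd_test, model.predict(Xd_test))
    results.append((demo_mse, settings))
    print(f"{settings}: MSE = {demo_mse:.2f}")

# Pick the candidate with the lowest MSE.
best_mse, best_settings = min(results, key=lambda r: r[0])
print(f"Best settings: {best_settings}")
```

One caveat on the design: choosing settings by their test-set score tends to make that score look better than it would on new data, so a separate validation split or cross-validation is a common choice for this step.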

## Deployment

After the hyperparameters have been tuned to our satisfaction, we'd proceed with deployment. This lesson omits that code for the sake of space.