# Random Forest Classifier

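In this notebook we train a [random forest classifier](https://en.wikipedia.org/wiki/Random_forest). A random forest is an ensemble of decision trees, each trained on a random sample of the training data, whose predictions are combined into a single output. Because each tree is a sequence of simple threshold rules, random forests are often described as 'explainable' machine learning models, although a forest of several hundred trees is harder to inspect than a single tree.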
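
To make the idea of combining trees concrete, here is a minimal sketch of majority voting over an ensemble. The three `tree_*` functions are hypothetical hand-written stand-ins for trained trees, and the feature names and categories are made up for illustration. Note that scikit-learn's `RandomForestClassifier` averages the trees' predicted class probabilities rather than counting hard votes, so this shows the general idea, not the library's exact procedure.

```python
from collections import Counter

# Hypothetical stand-ins for trained decision trees: each maps a feature
# dict to a predicted category using simple threshold rules.
def tree_a(x):
    return "spam" if x["exclamations"] > 3 else "legitimate"

def tree_b(x):
    return "spam" if x["links"] > 2 else "legitimate"

def tree_c(x):
    return "spam" if x["exclamations"] > 1 and x["links"] > 0 else "legitimate"

def forest_predict(trees, x):
    """Return the category predicted by the largest number of trees."""
    votes = Counter(tree(x) for tree in trees)
    return votes.most_common(1)[0][0]

trees = [tree_a, tree_b, tree_c]
example = {"exclamations": 5, "links": 1}  # made-up feature values
prediction = forest_predict(trees, example)
print(prediction)
```

Here two of the three stand-in trees vote for the same category, so the ensemble returns that category even though one tree disagrees.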

### Loading in training data

In [None]:
import pandas as pd
import numpy as np

import os.path

training_data = pd.read_parquet(os.path.join("data", "training.parquet"))

In [None]:
training_data.sample(10)

### Feature Engineering

In [None]:
import cloudpickle as cp
# Load the saved feature pipeline, closing the file once it has been read.
with open('feature_pipeline.sav', 'rb') as f:
    feature_pipeline = cp.load(f)

In [None]:
# Fit the pipeline on the training text and convert it to feature vectors in one step.
training_vecs = feature_pipeline.fit_transform(training_data["Text"])

### Training a Model

In [None]:
from sklearn.ensemble import RandomForestClassifier
from sklearn import model_selection

In [None]:
rfc = RandomForestClassifier(n_estimators=500, max_depth=5, random_state=404)

In [None]:
rfc.fit(training_vecs, training_data["Category"])

✅ Play around with the parameters of your random forest classifier. What happens to the performance as you increase `max_depth`?
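
One way to approach this exercise is to sweep `max_depth` and compare accuracy on the data the model was fitted on with accuracy on data it has not seen. The sketch below is a hedged illustration: it uses synthetic data from `make_classification` as a stand-in for `training_vecs` and `training_data["Category"]`, and the depth values and the smaller `n_estimators=50` are arbitrary choices made to keep it quick to run.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; in the notebook you would use training_vecs
# and training_data["Category"] instead.
X, y = make_classification(
    n_samples=400, n_features=20, n_informative=5, random_state=404
)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=404
)

results = []
for depth in [1, 3, 5, None]:  # None removes the depth limit
    model = RandomForestClassifier(
        n_estimators=50, max_depth=depth, random_state=404
    )
    model.fit(X_train, y_train)
    # Record accuracy on the fitted data and on the held-out split.
    results.append(
        (depth, model.score(X_train, y_train), model.score(X_val, y_val))
    )

for depth, train_acc, val_acc in results:
    print(f"max_depth={depth}: train={train_acc:.3f}, validation={val_acc:.3f}")
```

Deeper trees typically fit the training data more closely, so a widening gap between the two scores is a common sign of overfitting.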

### Evaluating model performance

In [None]:
# Mean accuracy on the training set; optimistic, because the model has already seen this data.
rfc.score(training_vecs, training_data["Category"])

In [None]:
testing_data = pd.read_parquet(os.path.join("data", "testing.parquet"))
# Vectorize the test text with the already-fitted pipeline (transform only, no refitting).
testing_vecs = feature_pipeline.transform(testing_data["Text"])
rfc.score(testing_vecs, testing_data["Category"])

In [None]:
from mlworkflows import plot

df, chart = plot.confusion_matrix(testing_data["Category"], rfc.predict(testing_vecs))

In [None]:
chart
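
The internals of `plot.confusion_matrix` are not shown in this notebook, so as a rough illustration of what a confusion matrix counts, here is a minimal sketch in plain Python. The labels are made up; in the notebook the inputs would be `testing_data["Category"]` and `rfc.predict(testing_vecs)`.

```python
from collections import Counter

# Made-up labels for illustration only.
y_true = ["spam", "spam", "legitimate", "legitimate", "spam", "legitimate"]
y_pred = ["spam", "legitimate", "legitimate", "spam", "spam", "legitimate"]

# Count how often each (true label, predicted label) pair occurs.
counts = Counter(zip(y_true, y_pred))
labels = sorted(set(y_true) | set(y_pred))

# Rows are true labels, columns are predicted labels.
matrix = [[counts[(t, p)] for p in labels] for t in labels]

print("labels:", labels)
for label, row in zip(labels, matrix):
    print(label, row)
```

Entries on the diagonal are correct predictions; off-diagonal entries show which categories are being confused with which.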

In [None]:
from sklearn.metrics import classification_report
print(classification_report(testing_data["Category"], rfc.predict(testing_vecs)))
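
The report lists precision, recall, and F1 score for each category. As a minimal sketch of how those three numbers are derived for a single class, the hypothetical helper `per_class_metrics` below computes them from made-up label lists; it is an illustration of the definitions, not scikit-learn's implementation.

```python
def per_class_metrics(y_true, y_pred, label):
    """Compute precision, recall, and F1 for one class from label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)  # true positives
    fp = sum(1 for t, p in pairs if t != label and p == label)  # false positives
    fn = sum(1 for t, p in pairs if t == label and p != label)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return precision, recall, f1

# Made-up labels for illustration only.
y_true = ["spam"] * 4 + ["legitimate"] * 4
y_pred = ["spam", "spam", "legitimate", "legitimate",
          "legitimate", "legitimate", "legitimate", "spam"]

precision, recall, f1 = per_class_metrics(y_true, y_pred, "spam")
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Precision answers "of the items predicted as this category, how many were right?", while recall answers "of the items truly in this category, how many were found?"; F1 is their harmonic mean.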

In [None]:
from mlworkflows import util

util.serialize_to(rfc, "model.sav")
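
The implementation of `util.serialize_to` is not shown here. Assuming it writes a pickle-compatible file (an assumption, suggested only by the way `feature_pipeline.sav` is loaded earlier), saving and restoring an object might look like the sketch below. A plain dict stands in for the trained model, and the standard-library `pickle` module is used so the example runs without extra dependencies.

```python
import os
import pickle
import tempfile

# A plain dict stands in for the trained model so the sketch runs anywhere.
stand_in_model = {"n_estimators": 500, "max_depth": 5, "random_state": 404}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model.sav")

    # Write the object to disk.
    with open(path, "wb") as f:
        pickle.dump(stand_in_model, f)

    # Read it back, as a later notebook or service might do.
    with open(path, "rb") as f:
        restored = pickle.load(f)

print(restored == stand_in_model)
```

Only load pickle files from sources you trust, since unpickling can execute arbitrary code.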