A random forest is a supervised machine learning algorithm built from decision trees. It is used for both regression and classification problems and relies on ensemble learning, i.e., combining many models to solve a problem that a single model handles poorly. A random forest typically consists of many decision trees trained through bootstrap aggregating (bagging): each tree is fitted on a random sample of the training data drawn with replacement, and the trees' outputs are then combined, which reduces variance and usually improves accuracy over a single tree.
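
To make bagging concrete, the following is a minimal hand-rolled sketch, not how scikit-learn implements it internally. The synthetic data from `make_regression` and the names `X_demo`, `y_demo`, `bagged_trees`, and `bagged_pred` are assumptions made for illustration only.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

# Synthetic data stands in for a real dataset (assumption for illustration).
X_demo, y_demo = make_regression(n_samples=200, n_features=4, noise=10.0, random_state=0)

rng = np.random.default_rng(0)
n_trees = 20
bagged_trees = []
for _ in range(n_trees):
    # Bootstrap sample: draw as many row indices as there are rows, with replacement.
    idx = rng.integers(0, len(X_demo), size=len(X_demo))
    tree = DecisionTreeRegressor(random_state=0)
    tree.fit(X_demo[idx], y_demo[idx])
    bagged_trees.append(tree)

# Aggregate: average the individual trees' predictions.
bagged_pred = np.mean([t.predict(X_demo) for t in bagged_trees], axis=0)
print(bagged_pred.shape)
```

A random forest can additionally restrict each split to a random subset of the features (the `max_features` parameter in scikit-learn), which further decorrelates the trees.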

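The algorithm determines the outcome from the predictions of the individual decision trees. For regression, the prediction is the mean of the trees' outputs; for classification, the trees' votes (or class probabilities) are combined and the best-supported class is returned. Adding trees generally makes the prediction more stable, although the gain levels off and training takes longer. Compared with a single decision tree, a random forest is less prone to overfitting.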
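
The averaging step can be checked directly on a fitted scikit-learn regressor, whose individual trees are exposed through the `estimators_` attribute. This is a small sketch on synthetic data; `X_toy`, `y_toy`, `forest`, and the other variable names are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic data (assumption for illustration).
X_toy, y_toy = make_regression(n_samples=100, n_features=4, noise=5.0, random_state=0)
forest = RandomForestRegressor(n_estimators=20, random_state=0).fit(X_toy, y_toy)

# One row of predictions per tree, then the mean across trees.
per_tree = np.array([tree.predict(X_toy) for tree in forest.estimators_])
manual_mean = per_tree.mean(axis=0)

# Compare the manual average with the forest's own prediction.
matches = np.allclose(manual_mean, forest.predict(X_toy))
print(matches)
```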

REGRESSION

In [9]:
import pandas as pd
import numpy as np

# Data from https://drive.google.com/file/d/1mVmGNx6cbfvRHC_DvF12ZL3wGLSHD9f_/view
# made available by https://stackabuse.com/random-forest-algorithm-with-python-and-scikit-learn/
dataset = pd.read_csv('petrol_consumption.csv')

#dataset.head()
# Columns 0-3 are the features; column 4 is the target.
X = dataset.iloc[:, 0:4].values
y = dataset.iloc[:, 4].values

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)


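# Feature scaling (optional here: tree-based models do not depend on feature scale).
# The scaler is fitted on the training set only and then applied to the test set.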
from sklearn.preprocessing import StandardScaler

sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

from sklearn.ensemble import RandomForestRegressor

regressor = RandomForestRegressor(n_estimators=20, random_state=0)
regressor.fit(X_train, y_train)
y_pred = regressor.predict(X_test)


from sklearn import metrics

print('Mean Absolute Error:', metrics.mean_absolute_error(y_test, y_pred))
print('Mean Squared Error:', metrics.mean_squared_error(y_test, y_pred))
print('Root Mean Squared Error:', np.sqrt(metrics.mean_squared_error(y_test, y_pred)))

Mean Absolute Error: 51.76500000000001
Mean Squared Error: 4216.166749999999
Root Mean Squared Error: 64.93201637097064


CLASSIFICATION

In [10]:
import pandas as pd
import numpy as np

# See data at: https://archive.ics.uci.edu/ml/machine-learning-databases/00267/
dataset = pd.read_csv("bill_authentication.csv")
#dataset.head()

# Columns 0-3 are the features; column 4 is the class label.
X = dataset.iloc[:, 0:4].values
y = dataset.iloc[:, 4].values

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

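# Feature scaling (optional here: tree-based models do not depend on feature scale).
# The scaler is fitted on the training set only and then applied to the test set.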
from sklearn.preprocessing import StandardScaler

sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

from sklearn.ensemble import RandomForestClassifier

classifier = RandomForestClassifier(n_estimators=20, random_state=0)
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)

from sklearn.metrics import classification_report, confusion_matrix, accuracy_score

print(confusion_matrix(y_test,y_pred))
print(classification_report(y_test,y_pred))
print(accuracy_score(y_test, y_pred))

[[155   2]
 [  1 117]]
              precision    recall  f1-score   support

           0       0.99      0.99      0.99       157
           1       0.98      0.99      0.99       118

    accuracy                           0.99       275
   macro avg       0.99      0.99      0.99       275
weighted avg       0.99      0.99      0.99       275

0.9890909090909091
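
For classification, scikit-learn's `RandomForestClassifier` combines its trees by averaging their class probability estimates and returning the class with the highest mean probability. The sketch below checks this on synthetic data; `X_cls`, `y_cls`, `clf`, and the other names are illustrative assumptions and are unrelated to the banknote dataset above.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic two-class data (assumption for illustration).
X_cls, y_cls = make_classification(
    n_samples=200, n_features=4, n_informative=2, n_redundant=0, random_state=0
)
clf = RandomForestClassifier(n_estimators=20, random_state=0).fit(X_cls, y_cls)

# Average the per-tree class probabilities, then pick the most probable class.
tree_probs = np.array([tree.predict_proba(X_cls) for tree in clf.estimators_])
mean_probs = tree_probs.mean(axis=0)
manual_labels = clf.classes_[np.argmax(mean_probs, axis=1)]

# Compare the manual aggregation with the forest's own outputs.
same_probs = np.allclose(mean_probs, clf.predict_proba(X_cls))
same_labels = np.array_equal(manual_labels, clf.predict(X_cls))
print(same_probs, same_labels)
```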


Useful resources:
- https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html
- https://stackabuse.com/random-forest-algorithm-with-python-and-scikit-learn/
- https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html
- https://www.section.io/engineering-education/introduction-to-random-forest-in-machine-learning/