# Random Forest
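Random Forest is a supervised learning algorithm. As its name suggests, it builds a "forest" and injects randomness into it. The forest is an ensemble of decision trees, usually trained with the "bagging" (bootstrap aggregating) method: each tree is fit on a random sample of the training rows drawn with replacement. The general idea of bagging is that combining several models trained on different samples gives a better overall result than any single model, mainly because averaging reduces variance. On top of bagging, a random forest lets each split consider only a random subset of the features, which makes the individual trees less correlated with one another.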

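In simple terms: a random forest builds multiple decision trees and merges their predictions (combining the trees' votes or class probabilities for classification, averaging their outputs for regression) to get a more accurate and stable prediction than a single tree would give.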
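The two ingredients described above, bootstrap samples of the rows and a random subset of features at each split, can be hand-rolled in a few lines to see how they fit together. This is a minimal sketch under stated assumptions, not scikit-learn's implementation: the helper name `bagged_forest_predict`, the synthetic data from `make_classification`, and the choice of 25 trees are all made up for illustration. It also merges the trees with a hard majority vote, whereas scikit-learn's `RandomForestClassifier` averages the trees' predicted class probabilities.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def bagged_forest_predict(X_train, y_train, X_test, n_trees=25, seed=0):
    """Hypothetical helper: a hand-rolled, random-forest-style classifier (illustration only)."""
    rng = np.random.default_rng(seed)
    n_samples = len(X_train)
    all_votes = []
    for _ in range(n_trees):
        # bagging: draw a bootstrap sample of n_samples rows, with replacement
        rows = rng.integers(0, n_samples, size=n_samples)
        # feature randomness: each split only looks at sqrt(n_features) random features
        tree = DecisionTreeClassifier(max_features="sqrt",
                                      random_state=int(rng.integers(1_000_000)))
        tree.fit(X_train[rows], y_train[rows])
        all_votes.append(tree.predict(X_test))
    votes = np.array(all_votes)  # shape: (n_trees, n_test_samples)
    # merge the trees: majority vote per test sample (labels must be non-negative ints)
    return np.array([np.bincount(column).argmax() for column in votes.T])


# synthetic stand-in data, not the tips dataset
X, y = make_classification(n_samples=500, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)
pred = bagged_forest_predict(X_tr, y_tr, X_te)
print("test accuracy:", (pred == y_te).mean())
```

In practice you would use `RandomForestClassifier` directly, as the rest of this notebook does; its `bootstrap` and `max_features` parameters control the same two sources of randomness.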

```python
# import all the necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.metrics import classification_report, confusion_matrix
# note: root_mean_squared_error requires scikit-learn 1.4 or newer
from sklearn.metrics import mean_squared_error, r2_score, root_mean_squared_error, mean_absolute_error
```

```python
# load the dataset
df = sns.load_dataset('tips')
df.head()
```

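|   | total_bill | tip | sex | smoker | day | time | size |
|---|---|---|---|---|---|---|---|
| 0 | 16.99 | 1.01 | Female | No | Sun | Dinner | 2 |
| 1 | 10.34 | 1.66 | Male | No | Sun | Dinner | 3 |
| 2 | 21.01 | 3.5 | Male | No | Sun | Dinner | 3 |
| 3 | 23.68 | 3.31 | Male | No | Sun | Dinner | 2 |
| 4 | 24.59 | 3.61 | Female | No | Sun | Dinner | 4 |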


```python
# encode the categorical and object columns as integers with LabelEncoder
# (each column's codes follow the alphabetical order of its values)
le = LabelEncoder()
for col in df.columns:
    if df[col].dtype == 'object' or df[col].dtype == 'category':
        df[col] = le.fit_transform(df[col])
df.head()
```


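|   | total_bill | tip | sex | smoker | day | time | size |
|---|---|---|---|---|---|---|---|
| 0 | 16.99 | 1.01 | 0 | 0 | 2 | 0 | 2 |
| 1 | 10.34 | 1.66 | 1 | 0 | 2 | 0 | 3 |
| 2 | 21.01 | 3.5 | 1 | 0 | 2 | 0 | 3 |
| 3 | 23.68 | 3.31 | 1 | 0 | 2 | 0 | 2 |
| 4 | 24.59 | 3.61 | 0 | 0 | 2 | 0 | 4 |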
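Comparing the two tables shows where codes such as `Female → 0` and `Sun → 2` come from: `LabelEncoder` sorts each column's unique values and uses their positions as codes. The sketch below makes that visible on a few made-up values in the style of the `day` column (an assumption for illustration, not the real dataset).

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# a few made-up values in the style of the tips dataset's 'day' column
days = pd.Series(["Sun", "Sat", "Thur", "Fri", "Sun"])

le = LabelEncoder()
codes = le.fit_transform(days)

# classes_ holds the sorted unique values; a value's code is its index in classes_
mapping = {label: code for code, label in enumerate(le.classes_)}
print(mapping)  # {'Fri': 0, 'Sat': 1, 'Sun': 2, 'Thur': 3}
print(codes)    # [2 1 3 0 2]

# inverse_transform maps the codes back to the original strings
decoded = le.inverse_transform(codes)
```

One consequence for the loop above: because a single `le` is refit on every column, after the loop it only remembers the classes of the last encoded column (`time`). Keeping one fitted encoder per column is one way to decode other columns later.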


## Using Random Forest for classification

```python
# separate the data into features (X) and the target label (y)
X = df.drop('sex', axis=1)
y = df.sex

# split the data into training and testing sets (80% / 20%)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# create the model
model = RandomForestClassifier(random_state=42)

# train the model
model.fit(X_train, y_train)

# predict on the test set
y_pred = model.predict(X_test)

# evaluate the classification model
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))
```


```
[[ 6 13]
 [ 7 23]]
              precision    recall  f1-score   support

           0       0.46      0.32      0.38        19
           1       0.64      0.77      0.70        30

    accuracy                           0.59        49
   macro avg       0.55      0.54      0.54        49
weighted avg       0.57      0.59      0.57        49
```
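scikit-learn's `confusion_matrix` puts true classes in rows and predicted classes in columns, so every per-class number in the report can be recomputed from the four counts above (after label encoding, class 0 is Female and class 1 is Male). The sketch below does exactly that with NumPy, and also computes a majority-class baseline, which is not part of the original report.

```python
import numpy as np

# confusion matrix from the output above: rows = true class, columns = predicted class
cm = np.array([[6, 13],
               [7, 23]])

tp = np.diag(cm)                          # correct predictions per class
accuracy = tp.sum() / cm.sum()            # all correct / all predictions
precision = tp / cm.sum(axis=0)           # TP / everything predicted as that class
recall = tp / cm.sum(axis=1)              # TP / everything that truly is that class
f1 = 2 * tp / (cm.sum(axis=0) + cm.sum(axis=1))  # harmonic mean: 2TP / (2TP + FP + FN)

# baseline: always predict the most frequent true class
baseline = cm.sum(axis=1).max() / cm.sum()

print("accuracy :", accuracy)
print("precision:", precision)
print("recall   :", recall)
print("f1       :", f1)
print("baseline :", baseline)
```

The baseline is worth checking: always predicting class 1 would score 30/49 ≈ 0.61, so on this split the forest's accuracy of 0.59 does not beat it, and the low recall for class 0 (6 of 19) shows most class-0 rows are misclassified.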



## Using Random Forest for regression

```python
# separate the data into features (X) and the target (y)
X = df.drop('tip', axis=1)
y = df.tip

# split the data into training and testing sets (80% / 20%)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# create the model
model = RandomForestRegressor(random_state=42)

# fit the model
model.fit(X_train, y_train)

# predict on the test set
y_pred = model.predict(X_test)

# evaluate the regression model
print(mean_squared_error(y_test, y_pred))       # MSE
print(root_mean_squared_error(y_test, y_pred))  # RMSE
print(r2_score(y_test, y_pred))                 # R²
print(mean_absolute_error(y_test, y_pred))      # MAE
```

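| Metric | Value (test set) |
|---|---|
| Mean squared error (MSE) | 0.9625607446938791 |
| Root mean squared error (RMSE) | 0.9811018013916186 |
| R² | 0.2299337514142753 |
| Mean absolute error (MAE) | 0.7750510204081635 |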
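These four metrics are easier to judge once you see how each is computed. The sketch below uses a few made-up values (not the tips data) to compute MSE, RMSE, MAE and R² by hand and check them against scikit-learn. It takes RMSE as `np.sqrt` of the MSE, which gives the same number as `root_mean_squared_error` and also works on scikit-learn versions older than 1.4.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

# made-up example values, for illustration only
y_true = np.array([3.0, 2.5, 4.0, 1.5])
y_hat = np.array([2.5, 3.0, 3.5, 2.0])

errors = y_true - y_hat
mse = np.mean(errors ** 2)          # mean squared error
rmse = np.sqrt(mse)                 # back in the target's units
mae = np.mean(np.abs(errors))       # mean absolute error
# R²: 1 - (residual sum of squares / total sum of squares around the mean)
r2 = 1 - np.sum(errors ** 2) / np.sum((y_true - y_true.mean()) ** 2)

# the hand-computed values agree with scikit-learn's implementations
assert np.isclose(mse, mean_squared_error(y_true, y_hat))
assert np.isclose(mae, mean_absolute_error(y_true, y_hat))
assert np.isclose(r2, r2_score(y_true, y_hat))

print(mse, rmse, mae, r2)
```

Read against the results above: an RMSE of about 0.98 means a typical prediction is off by roughly one unit of `tip` (RMSE penalizes large misses more than MAE's 0.78 does), and an R² of about 0.23 means the model explains only about 23% of the variance in tips on the test set.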
