# **Random Forest Classification**
<hr>

Random Forest Classification is a supervised learning algorithm that uses an ensemble learning method for classification. Ensemble learning is a technique that combines predictions from multiple machine learning models to make a more accurate prediction than a single model.


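- **Advantages:**
    1. Powerful: it can model complex decision boundaries
    2. Accurate: combining many trees usually generalizes better than a single decision tree
    3. Good performance on many problems, including non-linear ones
    4. It is usually robust to outliers, because splits depend on the ordering of feature values rather than on their magnitude
    5. Random Forest works well with both categorical and continuous variables (in scikit-learn, categorical variables must first be encoded as numbers).
    6. Some Random Forest implementations handle missing values automatically; whether preprocessing is needed depends on the library and version.
    7. No feature scaling required
    8. Random Forest is comparatively less impacted by noise.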


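- **Disadvantages:**
    1. Limited interpretability: a forest of many trees is much harder to inspect than a single decision tree
    2. Overfitting can still occur, for example with very deep trees on small or noisy datasets, although averaging usually makes it less severe than for a single tree
    3. We need to choose the number of trees
    4. It requires more computational power and memory than a single tree, because it builds numerous trees and combines their outputs (longer training period)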
    
![](https://www.nrronline.org/articles/2018/13/6/images/NeuralRegenRes_2018_13_6_962_233433_f2.jpg)
 
 
#### **Overview of how it works**

- **Step 1)** Pick K data points at random from the Training set (in a Random Forest, they are usually drawn with replacement).

- **Step 2)** Build the Decision Tree associated with these K data points.

- **Step 3)** Choose the number Ntree of trees you want to build and repeat Steps 1 & 2.

- **Step 4)** For a new data point, make each one of your Ntree trees predict the category to which the data point belongs, and assign the new data point to the category that wins the majority vote.
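
The four steps above can be sketched by hand with scikit-learn's `DecisionTreeClassifier`. This is a minimal illustration on synthetic data, not how `RandomForestClassifier` is implemented internally. The names `X_toy`, `y_toy`, `n_trees`, and `forest_predict` are hypothetical, K is assumed to equal the training-set size with sampling done with replacement, and scikit-learn's own forest averages class probabilities instead of counting hard votes.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic two-class data (an assumption; it stands in for any training set).
X_toy, y_toy = make_classification(n_samples=400, n_features=4, random_state=0)

rng = np.random.default_rng(0)
n_trees = 25        # Ntree in the steps above
k = len(X_toy)      # K: size of each random sample (assumed equal to the training-set size)

trees = []
for _ in range(n_trees):  # Step 3: repeat Steps 1 and 2 Ntree times
    # Step 1: draw K row indices at random, with replacement.
    idx = rng.integers(0, len(X_toy), size=k)
    # Step 2: grow one tree on that sample. max_features="sqrt" additionally
    # restricts each split to a random subset of the features.
    tree = DecisionTreeClassifier(max_features="sqrt",
                                  random_state=int(rng.integers(1_000_000)))
    trees.append(tree.fit(X_toy[idx], y_toy[idx]))

def forest_predict(X_new):
    """Step 4: every tree votes and the most frequent class wins."""
    votes = np.stack([t.predict(X_new) for t in trees]).astype(int)  # (n_trees, n_samples)
    return np.array([np.bincount(col).argmax() for col in votes.T])

train_acc = (forest_predict(X_toy) == y_toy).mean()
print(f"Training accuracy of the hand-built forest: {train_acc:.2f}")
```

In practice `RandomForestClassifier` does all of this in one call, as in the training cell further down.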

## **Importing the libraries**
<hr>

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

## **Importing the dataset**
<hr>

In [2]:
dataset = pd.read_csv('Social_Network_Ads.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values

## **Splitting the dataset into the Training set and Test set**
<hr>

In [3]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)

## **Feature Scaling**
<hr>

In [4]:
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
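
Decision trees split on thresholds, so the scaling step above is not strictly required for a Random Forest; it is still convenient here because the same scaler is reused later for prediction and plotting. The sketch below is one way to check this on synthetic data. The `_demo` names and the column scales are assumptions chosen for illustration, and tiny floating-point differences could in principle change a few predictions, so the code reports the share of identical predictions instead of assuming a perfect match.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic two-feature data (an assumption, not the Social_Network_Ads file).
X_demo, y_demo = make_classification(n_samples=400, n_features=2, n_informative=2,
                                     n_redundant=0, random_state=0)
# Put the two columns on very different scales, roughly like an age and a salary.
X_demo = X_demo * np.array([10.0, 20000.0]) + np.array([40.0, 70000.0])

Xtr_demo, Xte_demo, ytr_demo, yte_demo = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=0)

scaler_demo = StandardScaler().fit(Xtr_demo)

# Same hyperparameters and seed; the only difference is whether inputs are scaled.
rf_raw = RandomForestClassifier(n_estimators=64, random_state=0).fit(Xtr_demo, ytr_demo)
rf_scaled = RandomForestClassifier(n_estimators=64, random_state=0).fit(
    scaler_demo.transform(Xtr_demo), ytr_demo)

agreement = np.mean(
    rf_raw.predict(Xte_demo) == rf_scaled.predict(scaler_demo.transform(Xte_demo)))
print(f"Share of identical test predictions: {agreement:.2f}")
```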

## **Training the Random Forest Classification model on the Training set**
<hr>

In [5]:
from sklearn.ensemble import RandomForestClassifier
classifier = RandomForestClassifier(n_estimators = 64, criterion = 'entropy', random_state = 0)
classifier.fit(X_train, y_train)

RandomForestClassifier(criterion='entropy', n_estimators=64, random_state=0)
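
The value `n_estimators = 64` is a choice, not a rule. One common way to compare candidate values is the out-of-bag (OOB) score, which evaluates each tree on the training rows left out of its bootstrap sample. The sketch below uses synthetic data and a hypothetical list of candidate sizes; it only illustrates the mechanism and says nothing about the best value for `Social_Network_Ads.csv`.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data (an assumption) so that the example is self-contained.
X_syn, y_syn = make_classification(n_samples=400, n_features=2, n_informative=2,
                                   n_redundant=0, random_state=0)

oob_scores = {}
for n in (32, 64, 128, 256):  # hypothetical candidate forest sizes
    forest = RandomForestClassifier(n_estimators=n, criterion="entropy",
                                    oob_score=True, random_state=0)
    forest.fit(X_syn, y_syn)
    # oob_score_ is the accuracy measured on the out-of-bag rows.
    oob_scores[n] = forest.oob_score_
    print(f"n_estimators={n:3d}  OOB accuracy={forest.oob_score_:.3f}")
```

If the OOB accuracy stops improving beyond some size, adding more trees mostly adds training time.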

## **Predicting a new result**
<hr>

In [6]:
print(classifier.predict(sc.transform([[30,87000]])))

[0]


## **Predicting the Test set results**
<hr>

In [7]:
y_pred = classifier.predict(X_test)
print(np.concatenate((y_pred.reshape(len(y_pred),1), y_test.reshape(len(y_test),1)),1)[:20])

[[0 0]
 [0 0]
 [0 0]
 [0 0]
 [0 0]
 [0 0]
 [0 0]
 [1 1]
 [0 0]
 [1 0]
 [0 0]
 [0 0]
 [0 0]
 [0 0]
 [0 0]
 [1 0]
 [1 0]
 [0 0]
 [1 1]
 [0 0]]


## **Making the Confusion Matrix**
<hr>

In [8]:
from sklearn.metrics import confusion_matrix, accuracy_score

cm = confusion_matrix(y_test, y_pred)
print(cm)

[[63  5]
 [ 4 28]]


In [9]:
acc = accuracy_score(y_test, y_pred)
print(f"Accuracy: {acc*100}%")

Accuracy: 91.0%
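
The accuracy above can be recomputed from the confusion matrix, together with precision and recall for class 1. The sketch below hard-codes the matrix printed earlier, under the name `cm_reported`, so that it runs on its own. In scikit-learn's convention the rows are the true classes and the columns are the predicted classes.

```python
import numpy as np

# Confusion matrix copied from the output above: rows = true class, columns = predicted class.
cm_reported = np.array([[63, 5],
                        [4, 28]])
tn, fp, fn, tp = cm_reported.ravel()

accuracy = (tp + tn) / cm_reported.sum()   # (28 + 63) / 100
precision = tp / (tp + fp)                 # 28 / 33
recall = tp / (tp + fn)                    # 28 / 32
f1 = 2 * precision * recall / (precision + recall)

print(f"Accuracy:  {accuracy:.3f}")   # 0.910
print(f"Precision: {precision:.3f}")  # 0.848
print(f"Recall:    {recall:.3f}")     # 0.875
print(f"F1 score:  {f1:.3f}")         # 0.862
```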


## **Visualising the Training set results**
<hr>

In [None]:
from matplotlib.colors import ListedColormap
X_set, y_set = sc.inverse_transform(X_train), y_train
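# Grid over the plot area: 0.25-year steps for age, 250 for salary.
# A 0.25 step on the salary axis would create a grid with hundreds of thousands of rows.
X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 10, stop = X_set[:, 0].max() + 10, step = 0.25),
                     np.arange(start = X_set[:, 1].min() - 1000, stop = X_set[:, 1].max() + 1000, step = 250))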
plt.contourf(X1, X2, classifier.predict(sc.transform(np.array([X1.ravel(), X2.ravel()]).T)).reshape(X1.shape),
             alpha = 0.75, cmap = ListedColormap(('red', 'green')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1], color = ListedColormap(('red', 'green'))(i), label = j)
plt.title('Random Forest Classification (Training set)')
plt.xlabel('Age')
plt.ylabel('Estimated Salary')
plt.legend()
plt.show()

## **Visualising the Test set results**
<hr>

In [None]:
from matplotlib.colors import ListedColormap
X_set, y_set = sc.inverse_transform(X_test), y_test
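# Grid over the plot area: 0.25-year steps for age, 250 for salary.
# A 0.25 step on the salary axis would create a grid with hundreds of thousands of rows.
X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 10, stop = X_set[:, 0].max() + 10, step = 0.25),
                     np.arange(start = X_set[:, 1].min() - 1000, stop = X_set[:, 1].max() + 1000, step = 250))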
plt.contourf(X1, X2, classifier.predict(sc.transform(np.array([X1.ravel(), X2.ravel()]).T)).reshape(X1.shape),
             alpha = 0.75, cmap = ListedColormap(('red', 'green')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1], color = ListedColormap(('red', 'green'))(i), label = j)
plt.title('Random Forest Classification (Test set)')
plt.xlabel('Age')
plt.ylabel('Estimated Salary')
plt.legend()
plt.show()