This code is inspired by another notebook.

It is meant to help beginners understand **Random Forest** regression.

## Importing Libraries

In [None]:
# Pandas is used for data manipulation
import pandas as pd

# Use numpy to convert to arrays
import numpy as np

# Import tools needed for visualization
from sklearn.tree import export_graphviz
import pydot
import matplotlib.pyplot as plt
%matplotlib inline

## Understanding the data

In [None]:
# Read the data into a DataFrame
features = pd.read_csv('../input/tempscsv/temps.csv')

In [None]:
# displaying first 5 rows
features.head(5)

In [None]:
# the shape of our features
features.shape

In [None]:
# column names
features.columns

In [None]:
# checking for null values
features.isnull().sum()

There are no null values

## One-Hot Encoding

One-hot encoding converts each categorical column into a set of binary indicator columns, one per category, so that a model expecting numeric input can use it.

In [None]:
# One-hot encode categorical features
features = pd.get_dummies(features)
features.head(5)

In [None]:
print('Shape of features after one-hot encoding:', features.shape)
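To make the effect of one-hot encoding concrete, here is a minimal sketch on a toy DataFrame. The column names and values are hypothetical and are not taken from `temps.csv`; the point is only that the numeric column is left alone while the text column becomes one indicator column per category.

```python
import pandas as pd

# Hypothetical toy data (illustrative only, not the real dataset)
toy = pd.DataFrame({
    "week": ["Mon", "Tue", "Mon"],
    "temp": [45, 50, 47],
})

# Numeric columns pass through unchanged; text columns are expanded
encoded = pd.get_dummies(toy)

print(list(encoded.columns))
print('Shape before:', toy.shape, 'Shape after:', encoded.shape)
```

The number of rows stays the same; only the number of columns grows, by one per distinct category minus the original column.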

## Features and Labels

In [None]:
# Labels are the values we want to predict
labels = features['actual']

# Remove the labels from the features
features = features.drop('actual', axis = 1)

# Saving feature names for later use
feature_list = list(features.columns)

## Training and Testing Sets

In [None]:
# Using Scikit-learn to split data into training and testing sets
from sklearn.model_selection import train_test_split

# Split the data into training and testing sets
train_features, test_features, train_labels, test_labels = train_test_split(features,
                                                                            labels,
                                                                            test_size = 0.20,
                                                                            random_state = 42)

In [None]:
print('Training Features Shape:', train_features.shape)
print('Training Labels Shape:', train_labels.shape)
print('Testing Features Shape:', test_features.shape)
print('Testing Labels Shape:', test_labels.shape)
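The shapes printed above follow directly from `test_size`. As a small sketch on synthetic arrays (the data here is made up for illustration), a 20% test split of 10 rows gives 8 training rows and 2 testing rows, and each feature row stays paired with its label.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic data: 10 rows, 2 feature columns; row i is [2*i, 2*i + 1]
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=42)

print('Train:', X_train.shape, 'Test:', X_test.shape)

# Features and labels are shuffled together, so the pairing is preserved
print('Rows still aligned:', bool(np.all(X_train[:, 0] == 2 * y_train)))
```

Fixing `random_state` makes the split reproducible across runs.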

## Training the Forest

In [None]:
# Import the model we are using
from sklearn.ensemble import RandomForestRegressor

# Instantiate model 
rf = RandomForestRegressor(n_estimators= 1000, random_state=42)

# Train the model on training data
rf.fit(train_features, train_labels);
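For beginners it may help to see what the fitted forest actually does at prediction time. The sketch below uses synthetic data and a much smaller forest than the one above (both are assumptions made for speed) to show that a `RandomForestRegressor` prediction is the average of the predictions of its individual trees.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic regression data (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = X[:, 0] - X[:, 1] + rng.normal(scale=0.1, size=100)

# A small forest so the example runs quickly
forest = RandomForestRegressor(n_estimators=20, random_state=42)
forest.fit(X, y)

# Ask every tree for its own prediction on the first 5 rows
per_tree = np.array([t.predict(X[:5]) for t in forest.estimators_])

# Averaging the trees by hand matches the forest's own prediction
manual_average = per_tree.mean(axis=0)
forest_prediction = forest.predict(X[:5])
print('Same result:', np.allclose(manual_average, forest_prediction))
```

Each tree is trained on a different bootstrap sample of the rows, so individual trees disagree; averaging them smooths out those differences.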

## Make Predictions on Test Data

In [None]:
# Use the forest's predict method on the test data
predictions = rf.predict(test_features)

# Calculate the absolute errors
errors = abs(predictions - test_labels)

# Print out the mean absolute error (mae)
print('Mean Absolute Error:', round(np.mean(errors), 2), 'degrees.')


In [None]:
# Calculate the absolute percentage error of each prediction
mape = 100 * (errors / test_labels)

# "Accuracy" here is defined as 100 minus the mean absolute percentage error (MAPE)
accuracy = 100 - np.mean(mape)
print('Accuracy:', round(accuracy, 2), '%.')
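The two metrics above can be checked by hand on a few numbers. This is a minimal sketch with made-up temperatures; note that the "accuracy" figure is simply 100 minus the MAPE, not a classification accuracy, and it is undefined if any actual value is zero.

```python
import numpy as np

# Made-up values for illustration
actual = np.array([50.0, 60.0, 80.0])
predicted = np.array([55.0, 57.0, 80.0])

# Absolute errors: 5, 3 and 0 degrees
errors = np.abs(predicted - actual)

# Mean absolute error: (5 + 3 + 0) / 3
mae = errors.mean()

# Percentage errors: 10%, 5% and 0%, so the mean is 5%
mape = 100 * np.mean(errors / actual)

# Accuracy as defined in this notebook
accuracy = 100 - mape

print('MAE:', round(mae, 2), 'degrees.')
print('MAPE:', round(mape, 2), '%.')
print('Accuracy:', round(accuracy, 2), '%.')
```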

## Visualizing a Single Decision Tree

In [None]:
# Pull out one tree from the forest
tree = rf.estimators_[5]

# Export the image to a dot file
export_graphviz(tree, out_file = 'tree.dot', feature_names = feature_list, rounded = True, precision = 1)

# Use dot file to create a graph
(graph, ) = pydot.graph_from_dot_file('tree.dot')

# Write graph to a png file
graph.write_png('tree.png'); 

![Decision Tree](tree.png)

In [None]:
print('The depth of this tree is:', tree.tree_.max_depth)

A smaller tree is easier to read:

![Small Decision Tree](small_tree.PNG)
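The notebook does not show how the smaller tree was produced. One common way, sketched here as an assumption rather than the original method, is to train a second forest with a limited `max_depth` and export one of its trees. The synthetic data, the feature names and the forest settings below are all hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import export_graphviz

# Synthetic data and hypothetical feature names (illustrative only)
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
y = 2 * X[:, 0] + rng.normal(scale=0.1, size=200)
names = ['f0', 'f1', 'f2']

# Limit the depth so every tree stays small enough to read
rf_small = RandomForestRegressor(n_estimators=10, max_depth=3, random_state=42)
rf_small.fit(X, y)

# Pull out one tree and export it as DOT source text
small_tree = rf_small.estimators_[5]
dot_source = export_graphviz(small_tree, out_file=None,
                             feature_names=names, rounded=True, precision=1)

print('The depth of this tree is:', small_tree.tree_.max_depth)
```

With `out_file=None`, `export_graphviz` returns the DOT text as a string instead of writing a file; passing a file name and converting it with `pydot`, as in the cell for the full tree, would produce the PNG.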

**If you find the notebook useful, please consider upvoting.**

**Thank you!**