# **Day 10**

**Random Forest**

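It is a supervised machine learning algorithm based on ensemble learning. A random forest combines multiple models of the same type, namely decision trees, each trained on a random sample of the data. The result is a forest of trees, hence the name Random Forest. For regression, the forest's prediction is the average of the individual trees' predictions.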
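
To make the ensemble idea concrete, here is a minimal sketch on synthetic data (the arrays `X_demo` and `y_demo` are made up for illustration and are not the petrol dataset). It assumes scikit-learn's `RandomForestRegressor` and checks that the forest's prediction matches the mean of the predictions of the trees stored in `estimators_`.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic data (hypothetical, for illustration only): 60 rows, 4 features
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(60, 4))
y_demo = 3.0 * X_demo[:, 0] - 2.0 * X_demo[:, 1] + rng.normal(scale=0.1, size=60)

forest = RandomForestRegressor(n_estimators=25, random_state=0)
forest.fit(X_demo, y_demo)

# Each fitted decision tree is available in forest.estimators_
tree_preds = np.stack([tree.predict(X_demo) for tree in forest.estimators_])

# Average the per-tree predictions and compare with the forest's own output
manual_avg = tree_preds.mean(axis=0)
forest_pred = forest.predict(X_demo)
print(np.allclose(manual_avg, forest_pred))
```

Averaging many trees, each fitted on a different bootstrap sample, is what tends to make the forest less prone to overfitting than a single deep tree.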

**Problem Statement**

To predict gas consumption (in millions of gallons) in 48 US states based on petrol tax (in cents), per capita income (in dollars), paved highways (in miles), and the proportion of the population with a driver's license.

In [None]:
#import libraries
import pandas as pd
import numpy as np

In [None]:
#load the CSV file into the dataset variable and view the first rows
dataset = pd.read_csv('../input/data-science-machine-learning-and-ai-using-python/petrol_consumption.csv')
dataset.head()

In [None]:
#Prepare the dataset for training: X = first four columns (index 0 to 3), Y = fifth column (index 4)
X = dataset.iloc[:, 0:4]  #feature variable
Y = dataset.iloc[:, 4]    #target variable

In [None]:
#split the data into training and test sets (80% / 20%) using sklearn
from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(X,Y, test_size = 0.2, random_state=0)
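
As a quick check of what `test_size=0.2` does, the sketch below splits a placeholder array with 48 rows (the number of states in the problem statement; the values themselves are made up). With scikit-learn's `train_test_split`, this should leave 38 rows for training and 10 for testing, and fixing `random_state` makes the split reproducible.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data with the same shape as the problem: 48 rows, 4 features
X_demo = np.arange(48 * 4).reshape(48, 4)
y_demo = np.arange(48)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0
)
print(X_tr.shape, X_te.shape)

# Same random_state, same split
_, X_te_again, _, _ = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0
)
print(np.array_equal(X_te, X_te_again))
```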

In [None]:
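#Feature scaling: fit the scaler on the training set only, then apply it to the test set
#(tree-based models such as random forests do not strictly need scaling, but it does no harm)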
from sklearn.preprocessing import StandardScaler

sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
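
The scaling step can be sketched by hand for a single feature. This is an illustration with made-up numbers, assuming the usual standardization formula z = (x - mean) / std with the population standard deviation, which is what `StandardScaler` uses. The point is that the mean and standard deviation come from the training values only and are then reused for the test values.

```python
from statistics import fmean, pstdev

# Made-up values for one feature (illustration only)
train = [7.0, 8.0, 9.0, 10.0]
test = [7.5, 9.0]

# Statistics are learned from the training data only
mu = fmean(train)
sigma = pstdev(train)  # population standard deviation (ddof = 0)

def standardize(values, mean, std):
    """Apply z = (x - mean) / std to every value."""
    return [(v - mean) / std for v in values]

train_scaled = standardize(train, mu, sigma)
test_scaled = standardize(test, mu, sigma)  # reuses the training mean and std

print(train_scaled)
print(test_scaled)
```

Reusing the training statistics keeps information about the test set from leaking into the preprocessing step.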

In [None]:
#Let's train the model and predict on the test set
from sklearn.ensemble import RandomForestRegressor

regressor = RandomForestRegressor(n_estimators=200, random_state=0)
regressor.fit(X_train, Y_train)
y_pred = regressor.predict(X_test)

In [None]:
#Evaluating the algorithm
from sklearn import metrics

print('Mean Absolute Error: ', metrics.mean_absolute_error(Y_test, y_pred))
print('Mean Squared Error: ', metrics.mean_squared_error(Y_test, y_pred))
print('Root Mean Squared Error: ', np.sqrt(metrics.mean_squared_error(Y_test, y_pred)))
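
For reference, the three metrics can be computed by hand. The sketch below uses made-up true and predicted values (not the notebook's results) to show what each number measures.

```python
import math

# Made-up values for illustration only
y_true = [3.0, 5.0, 2.0, 7.0]
y_hat = [2.0, 5.0, 4.0, 8.0]

errors = [t - p for t, p in zip(y_true, y_hat)]

# Mean Absolute Error: average size of the errors, in the target's units
mae = sum(abs(e) for e in errors) / len(errors)

# Mean Squared Error: average squared error, penalizes large errors more
mse = sum(e * e for e in errors) / len(errors)

# Root Mean Squared Error: square root of MSE, back in the target's units
rmse = math.sqrt(mse)

print(mae, mse, rmse)
```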

In [None]:
print(y_pred)

In [None]:
Y_test.shape

**Support Vector Machines(SVM)**

It is a supervised machine learning classification algorithm that tries to find the boundary (a hyperplane) separating the classes with the widest possible margin, so that the misclassification error is minimized.

**Problem statement**

To classify recipes as muffins or cupcakes using an SVM

In [None]:
#import libraries: numpy, pandas and sklearn's svm
import pandas as pd
import numpy as np
from sklearn import svm

#libraries for visualization, matplotlib and seaborn as sns
import matplotlib.pyplot as plt
import seaborn as sns
sns.set(font_scale=1.2)

%matplotlib inline

In [None]:
#load the dataset into the recipes variable
recipes = pd.read_csv('../input/data-science-machine-learning-and-ai-using-python/recipes_muffins_cupcakes.csv')
recipes

In [None]:
sns.pairplot(recipes)

In [None]:
#plot the two ingredients, Flour and Sugar, colored by recipe type
#(x and y are passed by keyword; recent seaborn versions no longer accept them positionally)
sns.lmplot(x='Flour', y='Sugar', data=recipes, hue='Type', palette='Set1', fit_reg=False, scatter_kws={'s': 70});

In [None]:
#use Flour and Sugar as model inputs (as a NumPy array) and encode the label: 0 = Muffin, 1 = anything else (Cupcake)
ingredients = recipes[['Flour','Sugar']].to_numpy()
type_label = np.where(recipes['Type'] == 'Muffin',0,1)

In [None]:
#feature names
recipe_features = recipes.columns.values[1:].tolist()
recipe_features

In [None]:
#let's fit a linear SVM to the data
model = svm.SVC(kernel='linear')
model.fit(ingredients, type_label)

In [None]:
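#let's visualize the results
#Get the separating hyperplane: w[0]*x + w[1]*y + intercept = 0,
#rearranged to y = a*x - intercept/w[1] with slope a = -w[0]/w[1]
w = model.coef_[0]
a = -w[0] / w[1]
xx = np.linspace(30, 60)
yy = a * xx - model.intercept_[0] / w[1]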
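
The rearrangement used in this cell can be checked with plain numbers. In the sketch below, `w` and `b` are hypothetical stand-ins for `model.coef_[0]` and `model.intercept_[0]` (not values from the fitted model). Every point on the computed line should give a decision value of zero.

```python
# Hypothetical weights and intercept (not taken from the fitted model)
w = [-0.5, 0.25]
b = 15.0

def decision(x, y):
    """Linear decision function w[0]*x + w[1]*y + b."""
    return w[0] * x + w[1] * y + b

# Solve w[0]*x + w[1]*y + b = 0 for y
slope = -w[0] / w[1]
offset = -b / w[1]

xs = [30.0, 40.0, 50.0, 60.0]
ys = [slope * x + offset for x in xs]

# Points on the boundary have a decision value of 0
values = [decision(x, y) for x, y in zip(xs, ys)]
print(slope, offset, values)
```

Points with a positive decision value fall on one side of the line and points with a negative value on the other, which is how a linear classifier of this form assigns the two classes.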

In [None]:
#compute the lines parallel to the separating hyperplane that pass through the support vectors
b = model.support_vectors_[0]
yy_down = a * xx + (b[1] - a * b[0])
b = model.support_vectors_[-1]
yy_up =  a * xx + (b[1] - a * b[0])
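
The cell above draws the margins through the first and last support vectors. An alternative sketch, assuming the standard SVM formulation in which the margin lines are where the decision function equals +1 and -1, derives the same kind of lines directly from the weights. The values of `w` and `b` below are hypothetical, not taken from the fitted model.

```python
import math

# Hypothetical weights and intercept (not taken from the fitted model)
w = [-0.5, 0.25]
b = 15.0

def decision(x, y):
    """Linear decision function w[0]*x + w[1]*y + b."""
    return w[0] * x + w[1] * y + b

slope = -w[0] / w[1]

# Solve w[0]*x + w[1]*y + b = +1 and = -1 for y
offset_plus = (1.0 - b) / w[1]
offset_minus = (-1.0 - b) / w[1]

x = 45.0
y_plus = slope * x + offset_plus
y_minus = slope * x + offset_minus
print(decision(x, y_plus), decision(x, y_minus))

# Margin width: distance between the two parallel lines, equal to 2 / ||w||
line_distance = abs(offset_plus - offset_minus) / math.sqrt(1.0 + slope ** 2)
margin_width = 2.0 / math.hypot(w[0], w[1])
print(line_distance, margin_width)
```

A smaller weight norm means a wider margin, which is the quantity a linear SVM tries to maximize.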

In [None]:
#let's plot the hyperplane
sns.lmplot(x='Flour', y='Sugar', data=recipes, hue='Type',
           palette='Set1', fit_reg=False,
           scatter_kws={'s': 70});
plt.plot(xx, yy, linewidth=2, color='black');

In [None]:
#let's plot the margins and support vectors
sns.lmplot(x='Flour', y='Sugar', data=recipes, hue='Type',
           palette='Set1', fit_reg=False,
           scatter_kws={'s': 70});
plt.plot(xx, yy, linewidth=2, color='black')
plt.plot(xx, yy_down, 'k--')
plt.plot(xx, yy_up, 'k--')

#circle the support vectors (an explicit edge color keeps the hollow markers visible)
plt.scatter(model.support_vectors_[:, 0], model.support_vectors_[:, 1], s=80, facecolors='none', edgecolors='k')

In [None]:
#let's predict a new case
#define a function that guesses whether a recipe is a muffin or a cupcake
def muffin_or_cupcake(flour, sugar):
    #model.predict returns an array; take its single element (0 = muffin, 1 = cupcake)
    if model.predict([[flour, sugar]])[0] == 0:
        print("You're looking at a muffin recipe")
    else:
        print("You're looking at a cupcake recipe")

In [None]:
#predict for 50 parts of flour and 20 parts of sugar
muffin_or_cupcake(50,20)
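
A variant that returns the label instead of printing it can be easier to reuse and check. The sketch below is self-contained: it fits a linear `SVC` on a tiny made-up dataset (`toy_X` and `toy_y` are hypothetical, not the recipes file), and `classify_recipe` is an illustrative helper name, not part of the notebook.

```python
from sklearn import svm

# Made-up [flour, sugar] pairs (illustration only): 0 = muffin, 1 = cupcake
toy_X = [[55, 10], [50, 12], [52, 8], [35, 25], [38, 28], [40, 24]]
toy_y = [0, 0, 0, 1, 1, 1]

toy_model = svm.SVC(kernel='linear')
toy_model.fit(toy_X, toy_y)

def classify_recipe(model, flour, sugar):
    """Return 'muffin' or 'cupcake' for one recipe (hypothetical helper)."""
    label = model.predict([[flour, sugar]])[0]
    return 'muffin' if label == 0 else 'cupcake'

print(classify_recipe(toy_model, 54, 9))
print(classify_recipe(toy_model, 36, 26))
```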

In [None]:
#let's plot the new point to see on which side of the hyperplane it lies
sns.lmplot(x='Flour', y='Sugar', data=recipes, hue='Type',
           palette='Set1', fit_reg=False,
           scatter_kws={'s': 70});
plt.plot(xx, yy, linewidth=2, color='black')

#yellow circle marks the new recipe: 50 parts flour, 20 parts sugar
plt.plot(50, 20, 'yo', markersize=9)