## Naive Bayes 

Naive Bayes classifiers are a family of classifiers based on Bayes' theorem that assume the features are conditionally independent of one another given the class.

![image.png](attachment:image.png)

More details on http://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.GaussianNB.html#sklearn.naive_bayes.GaussianNB
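To make the independence assumption concrete, here is a minimal from-scratch sketch of a Gaussian Naive Bayes classifier using only NumPy. It is not scikit-learn's implementation: the helper names `fit_gaussian_nb` and `predict_gaussian_nb`, the toy data, and the `1e-9` variance floor are all assumptions made for illustration. Each class score is the log prior plus a sum of per-feature Gaussian log-likelihoods, which is exactly where the "naive" independence shows up.

```python
import numpy as np

def fit_gaussian_nb(X, y):
    """Estimate per-class priors, feature means, and feature variances."""
    classes = np.unique(y)
    priors = np.array([np.mean(y == c) for c in classes])
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    # Small floor keeps every variance strictly positive (assumption of this sketch).
    variances = np.array([X[y == c].var(axis=0) for c in classes]) + 1e-9
    return classes, priors, means, variances

def predict_gaussian_nb(model, X):
    """Pick the class with the largest log posterior for each row of X."""
    classes, priors, means, variances = model
    # diff has shape (n_samples, n_classes, n_features).
    diff = X[:, None, :] - means[None, :, :]
    # Independence assumption: per-feature log-likelihoods are simply summed.
    log_likelihood = -0.5 * np.sum(
        np.log(2 * np.pi * variances)[None, :, :] + diff ** 2 / variances[None, :, :],
        axis=2,
    )
    log_posterior = np.log(priors)[None, :] + log_likelihood
    return classes[np.argmax(log_posterior, axis=1)]

# Toy data: two well-separated Gaussian blobs (synthetic, for illustration only).
rng = np.random.default_rng(0)
X_toy = np.vstack([
    rng.normal(loc=0.0, scale=1.0, size=(100, 2)),
    rng.normal(loc=4.0, scale=1.0, size=(100, 2)),
])
y_toy = np.array([0] * 100 + [1] * 100)

model = fit_gaussian_nb(X_toy, y_toy)
pred = predict_gaussian_nb(model, X_toy)
accuracy = np.mean(pred == y_toy)
print(accuracy)
```

Working in log space avoids numerical underflow when many small probabilities are multiplied together.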


<p>This data frame contains the following columns:
</p>

<dl>
<dt>minority</dt><dd>
<p>Percentage black or Hispanic.
</p>
</dd>
<dt>crime</dt><dd>
<p>Rate of serious crimes per 1000 population.
</p>
</dd>
<dt>poverty</dt><dd>
<p>Percentage poor.
</p>
</dd>
<dt>language</dt><dd>
<p>Percentage having difficulty speaking or writing English.
</p>
</dd>
<dt>highschool</dt><dd>
<p>Percentage age 25 or older who had not finished highschool.
</p>
</dd>
<dt>housing</dt><dd>
<p>Percentage of housing in small, multiunit buildings.
</p>
</dd>
<dt>city</dt><dd><p>A factor with levels: 
<code>city</code>, major city; 
<code>state</code>, state or state-remainder.
</p>
</dd>
<dt>conventional</dt><dd>
<p>Percentage of households counted by conventional personal enumeration.
</p>
</dd>
<dt>undercount</dt><dd>
<p>Preliminary estimate of percentage undercount.
</p>
</dd>
</dl>


In [3]:
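from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
import pandas as pd
import numpy as np

# Load the Ericksen data set (columns described above).
data = pd.read_csv('https://raw.githubusercontent.com/vincentarelbundock/Rdatasets/master/csv/car/Ericksen.csv')

# Features and target. Note that crime is a numeric rate, so GaussianNB
# treats every distinct value as its own class.
X = data[["poverty", "language", "minority", "highschool", "housing"]]
Y = data["crime"]

# Hold out 30% of the rows for testing.
x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.3)

NBClassifier = GaussianNB()
NBClassifier.fit(x_train, y_train)

# Mean accuracy on the held-out rows.
print(NBClassifier.score(x_test, y_test))

# One new observation, in the same column order as X.
x = np.asarray([60, 60, 23, 60, 12]).reshape(1, -1)

y = NBClassifier.predict(x)

print(y)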

0.0
[55]
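The 0.0 accuracy printed above is likely a consequence of the target: `crime` is a numeric rate, and a classifier treats each distinct rate as a separate class, so an exact match on unseen rows is rare. One possible remedy is to bin the rate into a few classes before fitting. The sketch below illustrates the effect on synthetic data (not the Ericksen data); the tercile cut points and the names `rate`, `rate_class`, `raw_score`, and `binned_score` are assumptions chosen for this example.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
features = rng.normal(size=(300, 3))
# Synthetic integer "rate" driven mostly by the first feature, plus noise.
rate = np.round(50 + 20 * features[:, 0] + 5 * rng.normal(size=300)).astype(int)

# Bin the rate into three classes (0 = low, 1 = medium, 2 = high) at its terciles.
cuts = np.quantile(rate, [1 / 3, 2 / 3])
rate_class = np.digitize(rate, cuts)

# Split features and both versions of the target with the same row assignment.
f_train, f_test, r_train, r_test, c_train, c_test = train_test_split(
    features, rate, rate_class, test_size=0.3, random_state=0
)

# Same model, two targets: raw integer rates versus three binned classes.
raw_score = GaussianNB().fit(f_train, r_train).score(f_test, r_test)
binned_score = GaussianNB().fit(f_train, c_train).score(f_test, c_test)

print("raw target accuracy:   ", raw_score)
print("binned target accuracy:", binned_score)
```

If the exact rate matters, a regression model is usually a better fit than a classifier.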


## Random Forest
A random forest is a meta-estimator that fits a number of decision tree classifiers on various sub-samples of the dataset and uses averaging to improve predictive accuracy and control over-fitting.

![image.png](attachment:image.png)
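The averaging step can be checked directly. The sketch below, on synthetic data from `make_classification`, averages the class probabilities of the individual trees stored in `estimators_` and compares the result with the forest's own `predict_proba`. The variable names and the data are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic binary classification problem (illustration only).
X_demo, y_demo = make_classification(n_samples=200, n_features=5, random_state=0)
forest = RandomForestClassifier(n_estimators=20, random_state=0).fit(X_demo, y_demo)

# Average the class probabilities predicted by the individual trees.
tree_probs = np.mean(
    [tree.predict_proba(X_demo) for tree in forest.estimators_], axis=0
)
# The predicted class is the one with the highest averaged probability.
manual_pred = forest.classes_[np.argmax(tree_probs, axis=1)]

print(np.allclose(tree_probs, forest.predict_proba(X_demo)))
print(np.array_equal(manual_pred, forest.predict(X_demo)))
```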

### Parameters:
* n_estimators : integer, optional (default=10; 100 since scikit-learn 0.22)
* max_depth : integer or None, optional (default=None)
* bootstrap : boolean, optional (default=True)
* verbose : int, optional (default=0)

More details on http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html#sklearn.ensemble.RandomForestClassifier
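As a short sketch of how the parameters listed above are passed, the example below fits a forest on synthetic data. The specific values and `random_state=0` are arbitrary choices for illustration, not recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data (illustration only).
X_demo, y_demo = make_classification(n_samples=200, n_features=5, random_state=0)

clf = RandomForestClassifier(
    n_estimators=50,   # number of trees in the forest
    max_depth=3,       # cap on the depth of each tree; None lets trees grow fully
    bootstrap=True,    # draw a bootstrap sample of the rows for each tree
    verbose=0,         # 0 keeps fitting silent
    random_state=0,    # fixed seed, added here only for reproducibility
)
clf.fit(X_demo, y_demo)

# Number of fitted trees and the deepest tree actually grown.
print(len(clf.estimators_))
print(max(tree.get_depth() for tree in clf.estimators_))
```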


In [4]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
import pandas as pd
import numpy as np

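# Load the Titanic data from Rdatasets.
data = pd.read_csv('https://raw.githubusercontent.com/vincentarelbundock/Rdatasets/master/csv/datasets/Titanic.csv')

# Replace the values in every column with integer codes.
data = data.apply(LabelEncoder().fit_transform)

X = data[["Class", "Age", "Sex"]]
Y = data["Survived"]

# oob_score=True also estimates accuracy from the rows each tree did not see.
RFclassifier = RandomForestClassifier(n_estimators=20, oob_score=True)
RFclassifier.fit(X, Y)

print(RFclassifier.feature_importances_)
print(RFclassifier.oob_score_)

# Query in the column order of X (Class, Age, Sex); each value must be a
# label-encoded integer code, not a raw value.
x = np.asarray([1, 40, 1]).reshape(1, -1)

y = RFclassifier.predict(x)

print(y)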

[0.55687749 0.18590849 0.25721402]
0.0
[1]
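Because `data.apply(LabelEncoder().fit_transform)` discards the fitted encoders, there is no direct way to translate a raw query into the same integer codes, or to translate a prediction back into a label. A possible alternative is to keep one encoder per column. The sketch below uses a small hypothetical table rather than the Titanic data; the rows, the `encoders` dictionary, and `raw_query` are assumptions for illustration.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy rows, not the Titanic data.
toy = pd.DataFrame({
    "Class": ["1st", "2nd", "3rd", "1st", "3rd", "2nd"],
    "Age": ["Adult", "Child", "Adult", "Adult", "Child", "Adult"],
    "Sex": ["Female", "Male", "Male", "Male", "Female", "Female"],
    "Survived": ["Yes", "No", "No", "No", "Yes", "Yes"],
})

# Fit and keep one encoder per column so the mapping can be reused later.
encoders = {col: LabelEncoder().fit(toy[col]) for col in toy.columns}
encoded = pd.DataFrame({col: encoders[col].transform(toy[col]) for col in toy.columns})

feature_cols = ["Class", "Age", "Sex"]
clf = RandomForestClassifier(n_estimators=20, random_state=0)
clf.fit(encoded[feature_cols], encoded["Survived"])

# Encode a raw query with the stored encoders, keeping the training column order.
raw_query = {"Class": "1st", "Age": "Adult", "Sex": "Female"}
query = pd.DataFrame(
    {col: encoders[col].transform([raw_query[col]]) for col in feature_cols}
)

# Predict an integer code, then map it back to the original label.
pred_code = clf.predict(query)
pred_label = encoders["Survived"].inverse_transform(pred_code)
print(pred_label)
```

Passing the query as a DataFrame with the training column names also avoids the feature-name warning that recent scikit-learn releases emit for bare arrays.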
