## Support Vector Machine 

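Support vector regression (SVR) fits a function that keeps as many training targets as possible within a margin of width epsilon around its predictions. Errors inside that margin are not penalized, errors outside it are penalized linearly, and a kernel lets the same approach model nonlinear relationships. The example below uses SVR to predict a student's final grade (G3).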

![image.png](attachment:image.png)

More details on http://scikit-learn.org/stable/modules/svm.html#regression


### Parameters
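* C: regularization parameter that sets the penalty for errors outside the epsilon margin; larger values penalize those errors more strongly (default: 1.0)
* epsilon: half-width of the margin within which prediction errors incur no penalty (default: 0.1)
* kernel: kernel function used by the model, one of linear, poly, rbf, sigmoid, precomputed (default: rbf)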
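
To make the role of `epsilon` concrete, here is a minimal sketch of the epsilon-insensitive loss that SVR penalizes, with `C` acting as the weight on the sum of these losses. The helper name `epsilon_insensitive_loss` and the sample values are illustrative assumptions, not part of scikit-learn.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Per-sample loss: zero inside the epsilon margin, linear outside it."""
    return [max(0.0, abs(t - p) - epsilon) for t, p in zip(y_true, y_pred)]


# Made-up targets and predictions, for illustration only.
y_true = [10.0, 10.0, 10.0]
y_pred = [10.05, 10.5, 8.0]

losses = epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1)
print(losses)
```

The first prediction falls inside the margin and costs nothing, while the other two are charged only for the part of the error that exceeds `epsilon`.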

In [1]:
from sklearn.svm import SVR
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from collections import defaultdict
import pandas as pd
import numpy as np

data = pd.read_csv('https://query.data.world/s/rdlu6u4afsg3fszryln46xrw776vxb')

X = data[ ['school', 'sex', 'age', 'address', 'famsize', 'Pstatus', 'Medu', 
           'Fedu', 'Mjob', 'Fjob', 'reason', 'guardian', 'traveltime', 'studytime', 
           'failures', 'schoolsup', 'famsup', 'paid', 'activities', 'nursery', 
           'higher', 'internet', 'romantic', 'famrel', 'freetime', 'goout', 'Dalc', 
           'Walc', 'health', 'absences', 'G1', 'G2']]
Y = data["G3"]

d = defaultdict(LabelEncoder)
X = X.apply(lambda x: d[x.name].fit_transform(x))

x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.3)

SVM = SVR(kernel='linear')
SVM.fit(x_train, y_train)

# coef_ holds the learned feature weights and is only available with kernel='linear'
print('Feature Importance: ', SVM.coef_)
print('Score: ', SVM.score(x_test, y_test))

Feature Importance:  [[ 0.10887157  0.10104368  0.05393444  0.05194819 -0.08258441  0.05632718
   0.02813767 -0.00201498 -0.05168766 -0.01377808 -0.01600327  0.04215425
   0.01967178  0.09496627 -0.26163174  0.22505828 -0.01586272  0.09919966
   0.02218945 -0.09976611 -0.2061171   0.03063396 -0.07324859  0.10980986
   0.02731936 -0.07624109  0.03072932  0.02025018 -0.01355843  0.01918552
  -0.01877061  1.06468683]]
Score:  0.8012824320648346


In [2]:
test = ['GP', 'M', 20, 'U', 'GT3', 'T', 1, 1, 'health', 'services', 'home', 'father',
        1, 2, 0, 'yes', 'yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 2, 3, 4, 2, 1, 1, 2, 12, 15]
x = []
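# d keeps its keys in the order the columns were encoded, so zip pairs each value with its own column's encoder
for t, c in zip(test, d):
    x.append(d[c].transform([t]))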
x = np.asarray(x).reshape(1, -1)

y = SVM.predict(x)

print(y)

[15.47339225]
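
Pairing values with encoders through `zip` depends on both sequences sharing the same order. A possible alternative, sketched here on a tiny made-up frame rather than the student data, looks up each encoder by column name. The frame contents and the names `frame`, `encoders`, and `sample` are illustrative assumptions.

```python
from collections import defaultdict

import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Tiny made-up frame standing in for the student data (hypothetical values).
frame = pd.DataFrame({
    "school": ["GP", "MS", "GP"],
    "sex": ["F", "M", "M"],
    "age": [15, 17, 16],
})

# One LabelEncoder per column, fitted column by column.
encoders = defaultdict(LabelEncoder)
encoded = frame.apply(lambda col: encoders[col.name].fit_transform(col))

# New sample keyed by column name, so the order of the values cannot drift.
sample = {"school": "MS", "sex": "F", "age": 16}
row = [encoders[name].transform([sample[name]])[0] for name in frame.columns]
print(row)
```

Note that `transform` raises an error for a value the encoder never saw during fitting, so a new sample can only use categories present in the training data.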


## Random Forest
A random forest is a meta estimator that fits a number of decision tree classifiers on various sub-samples of the dataset and uses averaging to improve predictive accuracy and control over-fitting.
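
The averaging step can be checked directly. The sketch below, which uses synthetic data from `make_classification` purely for illustration, averages the class probabilities of the individual trees and compares the result with the forest's own `predict_proba`. The variable names are assumptions chosen for this example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data, used only for illustration.
X_demo, y_demo = make_classification(n_samples=200, n_features=6, random_state=0)

forest = RandomForestClassifier(n_estimators=20, random_state=0)
forest.fit(X_demo, y_demo)

# Average the class probabilities predicted by each fitted tree.
tree_mean = np.mean(
    [tree.predict_proba(X_demo) for tree in forest.estimators_], axis=0
)

# Compare the manual average with the forest's own probabilities.
matches = np.allclose(tree_mean, forest.predict_proba(X_demo))
print(matches)
```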

![image.png](attachment:image.png)

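### Parameters
* n_estimators : number of trees in the forest, optional (default=10 in older scikit-learn releases, 100 since version 0.22)
* max_depth : maximum depth of each tree, optional (default=None, meaning trees grow until no further split is possible)
* bootstrap : whether each tree is trained on a bootstrap sample of the data rather than the full dataset, optional (default=True)
* verbose : controls how much progress information is printed while fitting and predicting, optional (default=0)
* oob_score : whether to estimate accuracy from out-of-bag samples, i.e. the rows each tree did not see during training (default=False)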
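
To see why out-of-bag rows exist at all, the following sketch draws a single bootstrap sample with the standard library and counts the rows that were never picked. The sample size and seed are arbitrary assumptions. For large datasets the left-out share approaches (1 - 1/n)^n, roughly 37%.

```python
import random

random.seed(0)
n_samples = 1000

# A bootstrap sample draws n_samples row indices with replacement.
bootstrap = [random.randrange(n_samples) for _ in range(n_samples)]

# Rows never drawn are "out of bag" for the tree trained on this sample.
out_of_bag = set(range(n_samples)) - set(bootstrap)
oob_fraction = len(out_of_bag) / n_samples
print(f"out-of-bag fraction: {oob_fraction:.3f}")
```

With `oob_score=True`, each training row is scored using only the trees for which it was out of bag, which gives an accuracy estimate without a separate test split.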

More details on http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html#sklearn.ensemble.RandomForestClassifier


In [3]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
import pandas as pd
import numpy as np

data = pd.read_csv('http://web.stanford.edu/class/archive/cs/cs109/cs109.1166/stuff/titanic.csv')

le = LabelEncoder()
data['Sex'] = le.fit_transform(data['Sex'])

X = data[["Pclass", "Age","Sex", "Siblings/Spouses Aboard", "Parents/Children Aboard", "Fare"]]
Y = data["Survived"]

RFclassifer = RandomForestClassifier(n_estimators=20, oob_score=True)
RFclassifer.fit(X, Y)

print ('Feature Importances', RFclassifer.feature_importances_)
print ('OOB score', RFclassifer.oob_score_)


Feature Importances [0.09067862 0.2755989  0.27612979 0.0484639  0.03288182 0.27624697]
OOB score 0.8038331454340474


In [4]:
# le.transform returns an array, so take its single element to keep the row flat
test = np.asarray([1, 25, le.transform(['male'])[0], 1, 2, 24]).reshape(1, -1)

y = RFclassifer.predict(test)

print('No' if y == 0 else 'Yes')

No
