# 25. GRADUATE ADMISSIONS: MULTICLASS CLASSIFICATION
---

## 1. Introducing the Data

```python
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split

pd.set_option("display.max_columns", 99)
pd.set_option("display.max_rows", 999)
pd.set_option("display.precision", 3)  # full option name; plain 'precision' is ambiguous in newer pandas

admission = pd.read_csv('data/Admission3_class')

# 80/20 split; "Chance of Admit" holds the class label (1, 2 or 3)
train, test = train_test_split(admission, test_size=0.2, random_state=42)
X_train = train.drop('Chance of Admit', axis=1)
y_train = train['Chance of Admit']
X_test = test.drop('Chance of Admit', axis=1)
y_test = test['Chance of Admit']

print(X_train.shape, X_test.shape)
admission.head()
```

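```
(400, 7) (100, 7)
```

| Unnamed: 0 | GRE Score | TOEFL Score | University Rating | SOP | CGPA | Research | LOR | Chance of Admit |
|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| 0 | 337 | 118 | 4 | 4.5 | 9.65 | 1 | 4.5 | 1 |
| 1 | 324 | 107 | 4 | 4.0 | 8.87 | 1 | 4.5 | 2 |
| 2 | 316 | 104 | 3 | 3.0 | 8.0 | 1 | 3.5 | 2 |
| 3 | 322 | 110 | 3 | 3.5 | 8.67 | 1 | 2.5 | 2 |
| 4 | 314 | 103 | 2 | 2.0 | 8.21 | 0 | 3.0 | 3 |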


```python
y_train.value_counts()
```

```
3    164
2    149
1     87
Name: Chance of Admit, dtype: int64
```
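The three classes are not equally represented (164, 149 and 87 training samples). A plain random split can shift these proportions a little between train and test; `train_test_split` accepts a `stratify` argument that preserves them. The split above does not use it, so this is only an optional variant, sketched on synthetic labels whose class counts are made up for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up, imbalanced 3-class labels (illustrative counts, not the real data)
y = np.array([1] * 110 + [2] * 185 + [3] * 205)
X = np.random.default_rng(0).random((len(y), 7))  # 7 dummy features

# stratify=y keeps each class's share (up to rounding) the same in both splits
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Print the share of each class in the train and test splits
for label in (1, 2, 3):
    print(label, round((y_tr == label).mean(), 3), round((y_te == label).mean(), 3))
```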

```python
from sklearn.preprocessing import MinMaxScaler

# Fit the scaler on the training set only, then reuse its min/max for the test set
scaler = MinMaxScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
X_train[:5]
```

```
array([[0.62      , 0.67857143, 0.5       , 0.625     , 0.65064103,
        1.        , 0.71428571],
       [0.52      , 0.67857143, 0.75      , 0.75      , 0.55769231,
        0.        , 1.        ],
       [0.26      , 0.35714286, 0.5       , 0.625     , 0.54487179,
        0.        , 0.42857143],
       [0.48      , 0.53571429, 0.25      , 0.375     , 0.47115385,
        0.        , 0.71428571],
       [0.36      , 0.5       , 0.5       , 0.625     , 0.45192308,
        1.        , 0.28571429]])
```
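`MinMaxScaler` rescales each feature with x' = (x - min) / (max - min), where min and max are computed from the data passed to `fit`. Since the scaler is fit on the training set only, a test value outside the training range lands outside [0, 1]. A small check of the formula on toy numbers (made up, not taken from the admission data):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy training and test matrices: a GRE-like and a CGPA-like column
X_tr = np.array([[300.0, 7.5], [320.0, 8.5], [340.0, 9.5]])
X_te = np.array([[330.0, 9.9]])

scaler = MinMaxScaler().fit(X_tr)

# Same transform by hand, using the training min and max per column
col_min = X_tr.min(axis=0)
col_max = X_tr.max(axis=0)
manual_te = (X_te - col_min) / (col_max - col_min)

print(scaler.transform(X_te))
print(manual_te)  # the CGPA-like value exceeds 1 because 9.9 > training max 9.5
```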

## 2. Training a Few Classification Models
#### 2.i. SGD Classifier

```python
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import cross_val_score

sgd_clf = SGDClassifier(random_state=42)
scores = cross_val_score(sgd_clf, X_train, y_train, cv=4, scoring="accuracy")
print('Accuracy_Scores:', scores)
print('Avg_Accuracy_Score:', scores.mean())
```

```
Accuracy_Scores: [0.75 0.75 0.67 0.71]
Avg_Accuracy_Score: 0.72
```


The accuracy is worse than what we got with binary classification. `SGDClassifier` accepts multiple classes out of the box, but it does so by combining several binary classifiers in a “one versus all” (OvA) scheme, also known as one-vs-the-rest (OvR): one classifier per class, each separating that class from all the others.
Let's try a one-vs-one (OvO) strategy instead, which trains one classifier per pair of classes, and see what happens.

```python
from sklearn.multiclass import OneVsOneClassifier

ovo_sgd = OneVsOneClassifier(sgd_clf)
scores = cross_val_score(ovo_sgd, X_train, y_train, cv=4, scoring="accuracy")
print('Accuracy_Scores:', scores)
print('Avg_Accuracy_Score:', scores.mean())
```

```
Accuracy_Scores: [0.71 0.8  0.8  0.86]
Avg_Accuracy_Score: 0.7925
```


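The score gets better with the one-vs-one strategy: the average rises from 0.72 to 0.7925, and three of the four folds improve. With only 100 samples per fold, part of this gap may still be noise.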
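To make the two strategies concrete: with N classes, OvA/OvR trains N binary classifiers, while OvO trains N(N-1)/2, each on the samples of just two classes. With the three admission classes both formulas give 3, so this sketch uses four synthetic classes (an illustrative assumption) and counts the fitted binary classifiers through the `estimators_` attribute of scikit-learn's wrappers.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier

# Synthetic 4-class problem with 7 features (not the admission data)
X, y = make_classification(n_samples=400, n_features=7, n_informative=4,
                           n_classes=4, random_state=0)
n_classes = 4

# Plain SGDClassifier: one weight vector per class, i.e. OvA done internally
sgd = SGDClassifier(random_state=42).fit(X, y)
print(sgd.coef_.shape)        # (4, 7)

# Explicit wrappers expose the fitted binary classifiers in estimators_
ovr = OneVsRestClassifier(SGDClassifier(random_state=42)).fit(X, y)
ovo = OneVsOneClassifier(SGDClassifier(random_state=42)).fit(X, y)
print(len(ovr.estimators_))   # N = 4
print(len(ovo.estimators_))   # N * (N - 1) / 2 = 6
```

OvO trains more models, but each one sees a smaller, two-class subset of the data.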
#### 2.ii. KNN Classifier

```python
from sklearn.neighbors import KNeighborsClassifier

knn_clf = KNeighborsClassifier()
scores = cross_val_score(knn_clf, X_train, y_train, cv=4, 
                         n_jobs=-1, scoring="accuracy")
print('Accuracy_Scores:', scores)
print('Avg_Accuracy_Score:', scores.mean())
```

```
Accuracy_Scores: [0.79 0.77 0.79 0.81]
Avg_Accuracy_Score: 0.79
```


```python
ovo_knn = OneVsOneClassifier(knn_clf)
scores = cross_val_score(ovo_knn, X_train, y_train, cv=4, scoring="accuracy")
print('Accuracy_Scores:', scores)
print('Avg_Accuracy_Score:', scores.mean())
```

```
Accuracy_Scores: [0.79 0.77 0.79 0.82]
Avg_Accuracy_Score: 0.7925
```


#### 2.iii. SVM Classifier

```python
from sklearn.svm import SVC

svm_clf = SVC()
scores = cross_val_score(svm_clf, X_train, y_train, cv=4, 
                         n_jobs=-1, scoring="accuracy")
print('Accuracy_Scores:', scores)
print('Avg_Accuracy_Score:', scores.mean())
```

```
Accuracy_Scores: [0.79 0.77 0.82 0.9 ]
Avg_Accuracy_Score: 0.82
```


This score is pretty good, the best so far. One possible reason: under the hood, Scikit-Learn already trains `SVC` with the OvO strategy. Let's force OvA with `OneVsRestClassifier` and see whether the score gets worse.
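`SVC`'s built-in OvO shows up in its `decision_function_shape` parameter: training is one-vs-one regardless, and the default `'ovr'` only reshapes the pairwise scores into one column per class, while `'ovo'` returns one column per pair. With three classes both shapes have three columns, so this sketch again uses four synthetic classes (an illustrative assumption).

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic 4-class data (not the admission data)
X, y = make_classification(n_samples=400, n_features=7, n_informative=4,
                           n_classes=4, random_state=0)

svc_ovr = SVC(decision_function_shape='ovr').fit(X, y)  # default presentation
svc_ovo = SVC(decision_function_shape='ovo').fit(X, y)  # raw pairwise scores

print(svc_ovr.decision_function(X[:5]).shape)  # (5, 4): one score per class
print(svc_ovo.decision_function(X[:5]).shape)  # (5, 6): one score per class pair

# With the default break_ties=False, predictions come from OvO voting either way
print((svc_ovr.predict(X) == svc_ovo.predict(X)).all())
```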

```python
from sklearn.multiclass import OneVsRestClassifier

ova_svm = OneVsRestClassifier(SVC())
scores = cross_val_score(ova_svm, X_train, y_train, cv=4, 
                         n_jobs=-1, scoring="accuracy")
print('Accuracy_Scores:', scores)
print('Avg_Accuracy_Score:', scores.mean())
```

```
Accuracy_Scores: [0.79 0.77 0.81 0.91]
Avg_Accuracy_Score: 0.8200000000000001
```


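Interesting: the OvA version scores the same on average (0.82), so the multiclass strategy does not seem to be what makes the SVM strong here.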
#### 2.iv. Random Forest Classifier

```python
from sklearn.ensemble import RandomForestClassifier

rf_clf = RandomForestClassifier(random_state=42)
scores = cross_val_score(rf_clf, X_train, y_train, cv=4, 
                         n_jobs=-1, scoring="accuracy")
print('Accuracy_Scores:', scores)
print('Avg_Accuracy_Score:', scores.mean())
```

```
Accuracy_Scores: [0.81 0.83 0.8  0.84]
Avg_Accuracy_Score: 0.8200000000000001
```


```python
ovo_rf = OneVsOneClassifier(rf_clf)
scores = cross_val_score(ovo_rf, X_train, y_train, cv=4, 
                         n_jobs=-1, scoring="accuracy")
print('Accuracy_Scores:', scores)
print('Avg_Accuracy_Score:', scores.mean())
```

```
Accuracy_Scores: [0.8  0.83 0.79 0.85]
Avg_Accuracy_Score: 0.8175
```
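One caveat applies to every cross-validation score in this section: `MinMaxScaler` was fit on all of `X_train` before `cross_val_score` split it into folds, so each validation fold already influenced the scaling. For a min/max scaler the effect is likely small, but putting the scaler and the model in a `Pipeline` refits the scaler inside every fold. A minimal sketch on synthetic data (an assumption; on the real data you would pass the unscaled training features and `y_train`):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Synthetic 3-class data standing in for the unscaled admission features
X, y = make_classification(n_samples=400, n_features=7, n_informative=4,
                           n_classes=3, random_state=0)

# The scaler is refit on the training part of every fold, so the
# validation fold never contributes to the min/max used for scaling
pipe = make_pipeline(MinMaxScaler(), SVC())
scores = cross_val_score(pipe, X, y, cv=4, scoring="accuracy")
print('Accuracy_Scores:', scores)
print('Avg_Accuracy_Score:', scores.mean())
```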
