### UPDATED - 18/08  [Dataset Change]

Some Lesser-Known Multiclass Classifiers from scikit-learn

In this notebook, we'll look at some of the lesser-known **inherently multiclass classifiers** from the scikit-learn library:

* Extra Tree Classifier [sklearn.tree module]
* Extra Tree Classifier [sklearn.ensemble module]
* MLP Classifier
* Nearest Centroid
* Quadratic Discriminant Analysis
* Radius Neighbors Classifier
* Ridge Classifier

We'll be using the [Abalone Dataset](https://archive.ics.uci.edu/ml/datasets/abalone) for the multi-class classification.

As always, I'll keep the notebook organized & well commented for easy reading. Please consider upvoting if you find it helpful.


# Libraries

In [None]:
# Generic
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import os, warnings, gc
warnings.filterwarnings("ignore")

# Sklearn Classifier Algorithm
from sklearn.tree import ExtraTreeClassifier
from sklearn.ensemble import ExtraTreesClassifier, BaggingClassifier, RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.neighbors import NearestCentroid, RadiusNeighborsClassifier
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.linear_model import RidgeClassifier

# Sklearn (other)
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score

from tabulate import tabulate


# Data

In [None]:
url = '../input/all-datasets-for-practicing-ml/Class/Class_Abalone.csv'
data = pd.read_csv(url, header='infer')

In [None]:
# Total Records
print("Total Records: ", data.shape[0])

In [None]:
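# Check for an empty dataset and count null/missing values per column
# (data.empty alone only tells us whether the DataFrame has no rows at all)
print("Is Dataset Empty: ", data.empty)
print("Missing Values per Column:\n", data.isnull().sum())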

In [None]:
# Records per Classes
data.Sex.value_counts()

# Data Prep

We'll use `LabelEncoder` to convert the 'Sex' column (our target) to a numerical format for easy ingestion by the algorithms.

In [None]:
# Instantiating Label Encoder
encoder = LabelEncoder()

# Columns List
columns = data.columns

# Encode the column
data['Sex'] = encoder.fit_transform(data['Sex'])

In [None]:
# Inspect
data.head()
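
To read the encoded values back, it helps to know which integer stands for which label. The sketch below uses a small toy series in place of the real 'Sex' column (the values M, F and I are an assumption based on the Abalone dataset description) and prints the mapping that `LabelEncoder` learns.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy stand-in for the 'Sex' column (assumed values: M, F, I)
sex = pd.Series(['M', 'F', 'I', 'M', 'I'])

demo_encoder = LabelEncoder()
encoded = demo_encoder.fit_transform(sex)

# classes_ is sorted, so the label at position i is the one encoded as i
mapping = {label: code for code, label in enumerate(demo_encoder.classes_)}
print("Label -> code:", mapping)

# inverse_transform recovers the original labels from the integer codes
decoded = demo_encoder.inverse_transform(encoded)
print("Decoded:", list(decoded))
```

In the notebook itself, the same mapping is available from `encoder.classes_` after the encoding cell has run.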

# Feature Engineering, Data Split & Feature Scaling

In [None]:
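# Feature & Target Selection
target = 'Sex'
features = [col for col in columns if col != target]   # every column except the target

X = data[features]
y = data[target]   # 1-D Series, the shape scikit-learn classifiers expect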


# Dataset Split
# Training = 90% & Validation = 10%
test_size = 0.1
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=test_size, random_state=0, shuffle=True) 


# Feature Scaling
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_val = sc.transform(X_val)
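
Two details of the cell above are worth a closer look: the scaler is fitted on the training split only, and the split itself is a plain shuffle. The sketch below, on synthetic data (all names here are illustrative, not part of the notebook), shows one possible variant that also passes `stratify` so that each class keeps the same share in both splits.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 300 rows, 4 features, 3 perfectly balanced classes
rng = np.random.default_rng(0)
X_demo = rng.normal(loc=5.0, scale=2.0, size=(300, 4))
y_demo = np.repeat([0, 1, 2], 100)

# stratify=y_demo keeps the class proportions identical in both splits
X_tr, X_va, y_tr, y_va = train_test_split(
    X_demo, y_demo, test_size=0.1, random_state=0, shuffle=True, stratify=y_demo)

# Fit the scaler on the training split only, then reuse its statistics
demo_scaler = StandardScaler()
X_tr_scaled = demo_scaler.fit_transform(X_tr)
X_va_scaled = demo_scaler.transform(X_va)

val_counts = np.bincount(y_va)
train_means = X_tr_scaled.mean(axis=0)
print("Validation class counts:", val_counts)
print("Training column means after scaling:", train_means.round(6))
```

Whether stratification matters for the Abalone data depends on how balanced the three classes are, which the `value_counts()` cell earlier shows.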


# Classification

## Extra Tree Classifier (Tree Module)

An extremely randomized tree classifier.

Extra-trees differ from classic decision trees in the way they are built. When looking for the best split to separate the samples of a node into two groups, random splits are drawn for each of the `max_features` randomly selected features and the best split among those is chosen. When `max_features` is set to 1, this amounts to building a totally random decision tree.

**Note**: Extra-trees should only be used within ensemble methods.

In this notebook, we'll wrap it in another lesser-known ensemble classifier, `BaggingClassifier`.

In [None]:
# Instantiate Extra Tree Classifier
et = ExtraTreeClassifier(random_state=1)

# Bagging Classifier
bgc = BaggingClassifier(et, random_state=1, max_features=8, verbose=0)

# Train 
bgc.fit(X_train, y_train)

# Prediction
y_pred = bgc.predict(X_val)

# Accuracy
print("Extra Tree Classifier(Tree Module) Accuracy: ", '{:.2%}'.format(accuracy_score(y_val, y_pred)))

tab_data = []
tab_data.append(['Extra Tree(Tree)', '{:.2%}'.format(accuracy_score(y_val, y_pred))])
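
To see why a single extra-tree is usually wrapped in an ensemble, we can compare one tree with a bagged set of them. This is a minimal sketch on synthetic data from `make_classification` (an assumption for illustration, not the Abalone data), so the numbers it prints say nothing about the results in this notebook.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import ExtraTreeClassifier

# Synthetic 3-class problem with 8 features
X_demo, y_demo = make_classification(n_samples=600, n_features=8, n_informative=5,
                                     n_classes=3, random_state=1)
X_tr, X_va, y_tr, y_va = train_test_split(X_demo, y_demo, test_size=0.1, random_state=0)

# One extremely randomized tree on its own
single_tree = ExtraTreeClassifier(random_state=1).fit(X_tr, y_tr)

# Ten such trees, each trained on a bootstrap sample, combined by voting
bagged_trees = BaggingClassifier(ExtraTreeClassifier(random_state=1),
                                 n_estimators=10, random_state=1).fit(X_tr, y_tr)

single_acc = single_tree.score(X_va, y_va)
bagged_acc = bagged_trees.score(X_va, y_va)
print("Single extra-tree accuracy: {:.2%}".format(single_acc))
print("Bagged extra-trees accuracy: {:.2%}".format(bagged_acc))
```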

## Extra Tree Classifier (Ensemble Module)

An extra-trees classifier.

This class implements a meta estimator that fits a number of randomized decision trees (a.k.a. extra-trees) on various sub-samples of the dataset and uses averaging to improve the predictive accuracy and control over-fitting.

In [None]:
# Instantiate Classifier
etc = ExtraTreesClassifier(n_estimators=100, max_depth= 5,
                           verbose=0, random_state=1)

# Train
etc.fit(X_train, y_train)

# Prediction
y_pred = etc.predict(X_val)

# Accuracy
print("Extra Tree Classifier(Ensemble Module) Accuracy: ", '{:.2%}'.format(accuracy_score(y_val, y_pred)))
tab_data.append(['Extra Tree(Ensemble)', '{:.2%}'.format(accuracy_score(y_val, y_pred))])
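
The "averaging" in the description can be checked directly: the class probabilities of the ensemble should match the mean of the probabilities from its individual trees. The sketch below does this on synthetic data (the dataset and variable names are illustrative assumptions).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

# Synthetic 3-class problem with 8 features
X_demo, y_demo = make_classification(n_samples=400, n_features=8, n_informative=5,
                                     n_classes=3, random_state=1)

forest = ExtraTreesClassifier(n_estimators=100, max_depth=5, random_state=1)
forest.fit(X_demo, y_demo)

# Average the per-tree class probabilities by hand
tree_probs = np.mean([tree.predict_proba(X_demo) for tree in forest.estimators_], axis=0)
forest_probs = forest.predict_proba(X_demo)

probs_agree = np.allclose(tree_probs, forest_probs)
print("Ensemble probabilities equal the per-tree average:", probs_agree)
```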

## MLP Classifier

Multi-layer Perceptron classifier.

This model optimizes the log-loss function using LBFGS or stochastic gradient descent.

In [None]:
# Instantiate Classifier
mlp = MLPClassifier(random_state=1, max_iter=300,solver='sgd',
                    batch_size=200, learning_rate='adaptive', learning_rate_init=0.001,
                    shuffle=True, verbose=0)

# Train
mlp.fit(X_train, y_train)

# Prediction
y_pred = mlp.predict(X_val)

# Accuracy
print("MLP Classifier Accuracy: ", '{:.2%}'.format(accuracy_score(y_val, y_pred)))
tab_data.append(['MLP', '{:.2%}'.format(accuracy_score(y_val, y_pred))])
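
Since the model is trained by minimizing log-loss, it can be useful to look at that loss rather than only at accuracy. The following sketch, on synthetic data (an assumption for illustration), prints the log-loss of the fitted probabilities and the training loss recorded at the first and last iteration via `loss_curve_`.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import log_loss
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic 3-class problem, scaled as in the notebook
X_demo, y_demo = make_classification(n_samples=600, n_features=8, n_informative=5,
                                     n_classes=3, random_state=1)
X_demo = StandardScaler().fit_transform(X_demo)

mlp_demo = MLPClassifier(solver='sgd', learning_rate='adaptive', learning_rate_init=0.001,
                         max_iter=300, random_state=1)
mlp_demo.fit(X_demo, y_demo)  # may warn if max_iter is reached before convergence

probs = mlp_demo.predict_proba(X_demo)
train_log_loss = log_loss(y_demo, probs)

print("Training log-loss: {:.4f}".format(train_log_loss))
print("Iterations run:", mlp_demo.n_iter_)
print("Loss at first / last iteration: {:.4f} / {:.4f}".format(
    mlp_demo.loss_curve_[0], mlp_demo.loss_curve_[-1]))
```

If the loss is still falling at the last iteration, raising `max_iter` or `learning_rate_init` is a reasonable first thing to try.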

## Nearest Centroid

Nearest centroid classifier.

Each class is represented by its centroid, with test samples classified to the class with the nearest centroid.

In [None]:
# Instantiate Classifier
nc = NearestCentroid()

# Train
nc.fit(X_train, y_train)

# Prediction
y_pred = nc.predict(X_val)

# Accuracy
print("Nearest Centroid Classifier Accuracy: ", '{:.2%}'.format(accuracy_score(y_val, y_pred)))
tab_data.append(['Nearest Centroid', '{:.2%}'.format(accuracy_score(y_val, y_pred))])
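
The rule is simple enough to write out by hand. As a rough sketch on synthetic blobs (not the Abalone data), we compute one mean vector per class, assign each sample to the closest one by Euclidean distance, and check that this agrees with `NearestCentroid` using its default settings.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.neighbors import NearestCentroid

# Synthetic data: 3 clusters in 4 dimensions
X_demo, y_demo = make_blobs(n_samples=300, centers=3, n_features=4, random_state=0)

# One centroid per class: the mean of that class's samples
classes = np.unique(y_demo)
centroids = np.stack([X_demo[y_demo == c].mean(axis=0) for c in classes])

# Euclidean distance from every sample to every centroid, then pick the nearest
distances = np.linalg.norm(X_demo[:, None, :] - centroids[None, :, :], axis=2)
manual_pred = classes[distances.argmin(axis=1)]

# The scikit-learn estimator with default (Euclidean) settings
nc_demo = NearestCentroid().fit(X_demo, y_demo)
sk_pred = nc_demo.predict(X_demo)

centroids_agree = np.allclose(centroids, nc_demo.centroids_)
predictions_agree = np.array_equal(manual_pred, sk_pred)
print("Centroids agree:", centroids_agree)
print("Predictions agree:", predictions_agree)
```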

## Quadratic Discriminant Analysis

Quadratic Discriminant Analysis

A classifier with a quadratic decision boundary, generated by fitting class conditional densities to the data and using Bayes' rule.

The model fits a Gaussian density to each class.

In [None]:
# Instantiate Classifier
qda = QuadraticDiscriminantAnalysis()

# Train
qda.fit(X_train, y_train)

# Prediction
y_pred = qda.predict(X_val)

# Accuracy
print("Quadratic Discriminant Analysis Classifier Accuracy: ", '{:.2%}'.format(accuracy_score(y_val, y_pred)))
tab_data.append(['Quadratic Discriminant Analysis', '{:.2%}'.format(accuracy_score(y_val, y_pred))])
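
To make "a Gaussian density per class" concrete, the sketch below fits QDA on synthetic blobs (an illustrative assumption) with `store_covariance=True` and inspects the per-class parameters: one mean vector, one covariance matrix and one prior for each class.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

# Synthetic data: 3 clusters in 4 dimensions
X_demo, y_demo = make_blobs(n_samples=300, centers=3, n_features=4, random_state=0)

# store_covariance=True keeps the per-class covariance matrices on the model
qda_demo = QuadraticDiscriminantAnalysis(store_covariance=True).fit(X_demo, y_demo)

means_match = []
for k, label in enumerate(qda_demo.classes_):
    class_rows = X_demo[y_demo == label]
    # The fitted mean of class k should be the sample mean of its rows
    means_match.append(np.allclose(qda_demo.means_[k], class_rows.mean(axis=0)))
    print("Class", label,
          "| prior:", round(float(qda_demo.priors_[k]), 3),
          "| covariance shape:", qda_demo.covariance_[k].shape)

cov_shapes = [cov.shape for cov in qda_demo.covariance_]
prior_total = float(qda_demo.priors_.sum())
```

Because every class gets its own covariance matrix, the boundary between two classes is quadratic rather than linear.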

## Radius Neighbours Classifier

Classifier implementing a vote among neighbors within a given radius.

In [None]:
# Instantiate Classifier
rnc = RadiusNeighborsClassifier(radius=2.0)

# Train
rnc.fit(X_train, y_train)

# Prediction
y_pred = rnc.predict(X_val)

# Accuracy
print("Radius Neighbours Classifier Accuracy: ", '{:.2%}'.format(accuracy_score(y_val, y_pred)))
tab_data.append(['Radius Neighbours', '{:.2%}'.format(accuracy_score(y_val, y_pred))])
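
One practical catch with a fixed radius: a validation sample may have no training neighbors inside it at all. With the default settings, prediction then fails. The toy sketch below (hand-made 1-D data, purely illustrative) shows that failure and one possible workaround, the `outlier_label` parameter.

```python
import numpy as np
from sklearn.neighbors import RadiusNeighborsClassifier

# Tiny 1-D training set: three samples of class 0 near zero, one of class 1 at 5.0
X_small = np.array([[0.0], [0.1], [0.2], [5.0]])
y_small = np.array([0, 0, 0, 1])

# First query has no training point within radius 0.5; second is close to 5.0
queries = np.array([[100.0], [5.05]])

# Default: no outlier label, so a query without neighbors raises ValueError
strict = RadiusNeighborsClassifier(radius=0.5).fit(X_small, y_small)
try:
    strict.predict(queries)
    raised = False
except ValueError as err:
    raised = True
    print("Default behaviour:", err)

# Workaround: give neighborless queries the most frequent training label
lenient = RadiusNeighborsClassifier(radius=0.5, outlier_label='most_frequent')
lenient.fit(X_small, y_small)
pred = lenient.predict(queries)
print("With outlier_label='most_frequent':", pred)
```

Increasing `radius` is the other obvious fix, at the cost of a less local vote.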

## Ridge Classifier

Classifier using Ridge regression.

This classifier first converts the target values into {-1, 1} and then treats the problem as a regression task (multi-output regression in the multiclass case).

In [None]:
# Instantiate Classifier
rc = RidgeClassifier(class_weight='balanced', random_state=1)

# Train
rc.fit(X_train, y_train)

# Prediction
y_pred = rc.predict(X_val)

# Accuracy
print("Ridge Classifier Accuracy: ", '{:.2%}'.format(accuracy_score(y_val, y_pred)))
tab_data.append(['Ridge Classifier', '{:.2%}'.format(accuracy_score(y_val, y_pred))])
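
The description above can be reproduced step by step. As a sketch on synthetic blobs (an assumption for illustration), we build the {-1, 1} target matrix with `LabelBinarizer`, fit a plain multi-output `Ridge` regression on it, take the column with the highest prediction as the class, and compare with `RidgeClassifier`.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.linear_model import Ridge, RidgeClassifier
from sklearn.preprocessing import LabelBinarizer

# Synthetic data: 3 clusters in 4 dimensions, labels 0, 1, 2
X_demo, y_demo = make_blobs(n_samples=300, centers=3, n_features=4, random_state=0)

# One column per class: +1 for the sample's own class, -1 everywhere else
Y_signed = LabelBinarizer(pos_label=1, neg_label=-1).fit_transform(y_demo)
target_values = sorted(set(Y_signed.ravel().tolist()))

# Multi-output ridge regression on the signed targets
regressor = Ridge(alpha=1.0).fit(X_demo, Y_signed)

# Predicted class = column with the largest regression output
# (column index equals the label here because the labels are 0, 1, 2)
manual_pred = regressor.predict(X_demo).argmax(axis=1)

# The classifier with the same regularization strength
rc_demo = RidgeClassifier(alpha=1.0).fit(X_demo, y_demo)
sk_pred = rc_demo.predict(X_demo)

predictions_agree = np.array_equal(manual_pred, sk_pred)
print("Signed target matrix shape:", Y_signed.shape)
print("Predictions agree:", predictions_agree)
```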

In [None]:
print(tabulate(tab_data, headers=['Classifiers','Accuracy'], tablefmt='pretty'))

As we can observe, the accuracies of these classifiers are fairly close to each other. The next logical step would be to fine-tune the hyperparameters to increase the accuracy.
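
One common way to do that tuning is a cross-validated grid search. The sketch below is only a starting point: it uses synthetic data and a small, arbitrarily chosen grid for `ExtraTreesClassifier`, so both the grid values and the resulting scores are assumptions rather than recommendations for the Abalone data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic 3-class problem with 8 features
X_demo, y_demo = make_classification(n_samples=300, n_features=8, n_informative=5,
                                     n_classes=3, random_state=1)

# Hypothetical search space; adjust to the classifier being tuned
param_grid = {
    'n_estimators': [50, 100],
    'max_depth': [3, 5, None],
}

# 5-fold cross-validation over every combination in the grid
search = GridSearchCV(ExtraTreesClassifier(random_state=1), param_grid,
                      scoring='accuracy', cv=5)
search.fit(X_demo, y_demo)

print("Best parameters:", search.best_params_)
print("Best cross-validated accuracy: {:.2%}".format(search.best_score_))
```

In the notebook, the search would be fitted on `X_train` and `y_train`, keeping `X_val` aside for the final comparison.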