<center><h1>Classification by local modeling</h1></center>

### Summary:
1. [Introduction](#introduction)

2. [Local Ordinary Least Squares (L-OLS)](#lols)
    
    2.1. [Influence of the number of clusters on model accuracy](#lols-#-clusters)
    
3. [Local Least Squares Support Vector Machine (L-LSSVM)](#l_lssvm)

### 1. Introduction <a class="anchor" id="introduction"></a>

Classic classification by local modeling is a two-step approach for modeling:

1. An unsupervised clustering algorithm is run to find regions in the dataset;
2. For each region, a model is built with the respective data partition.

For inference the procedure is similar:

1. A similarity metric is used to determine the new data point region, e.g. euclidian distance from regions prototypes;
2. The model from that specific region is used to predict the class of the new data point.

There are a lot of clustering algorithms but, for the sake of simplicity, it will be used only K-means.

The class **LocalModel**, implemented below, create an easy way to implement and test local models for classification:

In [7]:
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score

from load_dataset import datasets

from devcode import run_simulation
from devcode.models.lssvm import LSSVM

### 2. Local Ordinary Least Squares (L-OLS) <a class="anchor" id="lols"></a>

#### Description
Example of using local learning method using Ordinary Least Square (OSL) as base classifier.

In [8]:
%%time

# 1. Select the number of clusters (i.e., number of local regions)
n_clusters = 5

linear_clf = LinearRegression()
kmeans     = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)

run_simulation(dataset_name="vc2c", kmeans=kmeans, clf_model=linear_clf)

Start of clusterization: 2022-08-05 12:57:53.535627




Start of local models training: 2022-08-05 12:57:54.093732
Train accuracy: 0.842741935483871
Test accuracy:  0.8225806451612904

CPU times: total: 7.27 s
Wall time: 684 ms


In [9]:
%%time

# 1. Select the number of clusters (i.e., number of local regions)
n_clusters = 5

linear_clf = LSSVM(gamma=1, kernel='rbf', sigma=4)
kmeans     = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)

run_simulation(dataset_name="vc2c", kmeans=kmeans, clf_model=linear_clf)

Start of clusterization: 2022-08-05 12:57:54.281287




Start of local models training: 2022-08-05 12:57:54.845466
Train accuracy: 0.8225806451612904
Test accuracy:  0.8225806451612904

CPU times: total: 7.75 s
Wall time: 690 ms


### 3.  Local Least Squares Support Vector Machine (L-LSSVM)  <a class="anchor" id="l_lssvm"></a>

#### Description
Example of using local learning method using Least Square Support Vector Machine (LSSVM) as base classifier.

In [10]:
%%time
# %autoreload
import numpy as np 

from sklearn.cluster import KMeans
from devcode.models.lssvm import LSSVM
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score

dataset_name = "vc2c"

clf_dict = {
    'linear': LSSVM(gamma=1, kernel='linear'),
    'poly'  : LSSVM(gamma=1, kernel='poly', d=2),
    'rbf'   : LSSVM(gamma=1, kernel='rbf', sigma=1)
}

print(dataset_name)
n_train = datasets[dataset_name]['features'].values.shape[0]

k_values = np.linspace(2, np.ceil(np.sqrt(len(X_train))), num=5, dtype='int').tolist() # 2 to sqrt(N)
    
for n_clusters in k_values:
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)

    for kernel_type, clf in clf_dict.items():
        print(f'Nº of clusters: {n_clusters} | Kernel: {kernel_type}')
        run_simulation(dataset_name=dataset_name, kmeans=kmeans, clf_model=clf)

vc2c


NameError: name 'np' is not defined