# Prototype-based feature maps

### Import related packages and modules

In [1]:
import numpy as np
from sklearn.datasets import make_blobs
from PrototypeBasedFeaturemaps import Featuremap

## Manual mapping (prototypes are chosen by the practitioner)

The mapping functions are implemented in the [PrototypeBasedFeaturemaps](PrototypeBasedFeaturemaps.py) module. Creating a mapping function requires only two parameters: the name of the mapping and the metric used to compute distances.

Supported mapping names: 'Phi_1', 'Phi_M', 'Phi_N', 'Phi_MN'

Supported metrics: 'l1' (the $\ell_1$ norm) and 'l2' (the $\ell_2$ norm)
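To make the two metrics concrete, the following minimal sketch computes $\ell_1$ and $\ell_2$ distances from every sample to every prototype with plain NumPy. The helper `pairwise_distance` is a hypothetical name used only for illustration; it is not part of the `PrototypeBasedFeaturemaps` module, whose internals may differ.

```python
import numpy as np

def pairwise_distance(X, prototypes, metric="l1"):
    """Distance from every row of X to every prototype (hypothetical helper)."""
    # diff has shape (n_samples, n_prototypes, n_features)
    diff = X[:, None, :] - prototypes[None, :, :]
    if metric == "l1":
        return np.abs(diff).sum(axis=2)
    if metric == "l2":
        return np.sqrt((diff ** 2).sum(axis=2))
    raise ValueError(f"unsupported metric: {metric!r}")

X_demo = np.array([[0.0, 0.0], [3.0, 4.0]])
P_demo = np.array([[0.0, 0.0], [3.0, 0.0]])
d_l1 = pairwise_distance(X_demo, P_demo, metric="l1")
d_l2 = pairwise_distance(X_demo, P_demo, metric="l2")
print(d_l1)
print(d_l2)
```

For the point (3, 4) and the prototype at the origin, the $\ell_1$ distance is 7 while the $\ell_2$ distance is 5, so the choice of metric changes the values a feature map would append.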

In [2]:
# create mapping functions
metric = 'l1'
phi_1 = Featuremap(mapping='Phi_1', metric=metric)
phi_M = Featuremap(mapping='Phi_M', metric=metric)
phi_N = Featuremap(mapping='Phi_N', metric=metric)
phi_MN = Featuremap(mapping='Phi_MN', metric=metric)

In [3]:
# generate a dummy dataset
random_state = 42
n_samples = 500
centers = np.array([(-5, -5), (0, -5), (5, -5), (-5, 0)])
X, y = make_blobs(n_samples=n_samples, centers=centers, shuffle=False, random_state=random_state)
# binarize labels: blobs 0 and 2 become class 0, blobs 1 and 3 become class 1
y = y % 2
print(X.shape)

(500, 2)


In [4]:
# select prototypes: one set of blob centers per class
protos_1= centers[[0,2]]
protos_2= centers[[1,3]]
print(protos_1.shape)
print(protos_2.shape)

(2, 2)
(2, 2)


Note that $\phi_1$ and $\phi_N$ use a single prototype set.

In [5]:
# map from n to n+1 dimensions
XD_1=phi_1.map(X,protos_1)
print(XD_1.shape)

# map from n to n+n dimensions
XD_N=phi_N.map(X,protos_1)
print(XD_N.shape)

(500, 3)
(500, 4)
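The output shapes above match one plausible reading of the two single-set maps, sketched below. This is an assumption, not the module's confirmed definition: `sketch_phi_1` appends the $\ell_1$ distance to the nearest prototype, and `sketch_phi_N` appends the per-feature absolute differences to that same prototype. Both function names are hypothetical, and the actual `Featuremap` implementation may differ.

```python
import numpy as np

def nearest_prototype(X, prototypes):
    """Return the index of and l1 distance to the nearest prototype per sample."""
    dist = np.abs(X[:, None, :] - prototypes[None, :, :]).sum(axis=2)
    return dist.argmin(axis=1), dist.min(axis=1)

def sketch_phi_1(X, prototypes):
    # ASSUMPTION: append the distance to the nearest prototype (n -> n+1)
    _, d_min = nearest_prototype(X, prototypes)
    return np.hstack([X, d_min[:, None]])

def sketch_phi_N(X, prototypes):
    # ASSUMPTION: append per-feature absolute differences to the
    # nearest prototype (n -> n+n)
    idx, _ = nearest_prototype(X, prototypes)
    return np.hstack([X, np.abs(X - prototypes[idx])])

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(10, 2))
P_demo = np.array([[-5.0, -5.0], [5.0, -5.0]])
Z1 = sketch_phi_1(X_demo, P_demo)
ZN = sketch_phi_N(X_demo, P_demo)
print(Z1.shape, ZN.shape)
```

Under these assumptions the single extra column of `Z1` equals the row sum of the extra columns of `ZN`, since the $\ell_1$ distance is the sum of the per-feature absolute differences.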


In [6]:
protos=[protos_1,protos_2]

# map from n to n+M dimensions
# M=2
XD_2=phi_M.map(X,protos)
print(XD_2.shape)

# map from n to n + M x n dimensions
XD_MN=phi_MN.map(X,protos)
print(XD_MN.shape)

(500, 4)
(500, 6)
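As a rough illustration of how a list of prototype sets could yield $n + M$ features, the sketch below appends one nearest-prototype distance per set. The function `sketch_phi_M` is hypothetical, and the rule it applies is an assumption that merely agrees with the shapes printed above; the module's own `Phi_M` may be defined differently.

```python
import numpy as np

def sketch_phi_M(X, prototype_sets):
    # ASSUMPTION: one nearest-prototype l1 distance per set (n -> n+M)
    cols = []
    for protos in prototype_sets:
        dist = np.abs(X[:, None, :] - protos[None, :, :]).sum(axis=2)
        cols.append(dist.min(axis=1))
    return np.hstack([X, np.column_stack(cols)])

X_demo = np.array([[0.0, 0.0], [4.0, -5.0]])
sets_demo = [
    np.array([[-5.0, -5.0], [5.0, -5.0]]),  # same layout as protos_1
    np.array([[0.0, -5.0], [-5.0, 0.0]]),   # same layout as protos_2
]
ZM = sketch_phi_M(X_demo, sets_demo)
print(ZM)
```

With two sets, each two-dimensional sample gains two columns, one per set, giving four features in total.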


## Automatic mapping (prototypes are chosen by k-means)

[Numerical_Prototype_Selection](Numerical_Prototype_Selection.py) implements a transformer that follows the scikit-learn transformer interface. It runs k-means clustering and uses the cluster centers as prototypes. When $\phi_1$ or $\phi_N$ is supplied, clustering is applied to the entire dataset; when $\phi_M$ or $\phi_{MN}$ is supplied, clustering is applied to each class separately.

As an example, consider the following binary classification scenario where the feature map is $\phi_M$. With one prototype set per class, the two-dimensional data is mapped to a four-dimensional feature space ($n + M = 2 + 2$).

In [7]:
from Numerical_Prototype_Selection import ProtoTransformer

my_transformer=ProtoTransformer(feature_map=phi_M, n_proto=2)
my_transformer.fit(X,y)
XD=my_transformer.transform(X)
print(XD.shape)

(500, 4)
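The per-class clustering described above can be sketched with scikit-learn's `KMeans`. The function `per_class_prototypes` is a hypothetical helper written for illustration under the assumption that each class gets its own k-means run; it is not the code inside `ProtoTransformer`.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

def per_class_prototypes(X, y, n_proto=2, random_state=0):
    """Run k-means within each class; return one array of centers per class."""
    sets = []
    for label in np.unique(y):
        km = KMeans(n_clusters=n_proto, n_init=10, random_state=random_state)
        km.fit(X[y == label])
        sets.append(km.cluster_centers_)
    return sets

X_demo, y_demo = make_blobs(n_samples=200, centers=4, random_state=42)
y_demo = y_demo % 2  # binarize labels
proto_sets = per_class_prototypes(X_demo, y_demo, n_proto=2)
for p in proto_sets:
    print(p.shape)
```

For $\phi_1$ or $\phi_N$, the analogous sketch would call `KMeans` once on all of `X` and keep a single set of centers.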


The transformer is compatible with scikit-learn utilities such as `Pipeline` and `GridSearchCV`. This makes it possible to tune the number of prototypes jointly with classifier-specific hyperparameters.

In [8]:
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression as LR
from sklearn.model_selection import GridSearchCV
from sklearn.datasets import make_classification

clf = LR(random_state=42, solver='liblinear', dual=False)

pipe_line = Pipeline(steps=[('mapping', my_transformer),
                            ('clf', clf)])

# search jointly over the regularization strength C and n_proto
c_params = [0.1, 1, 10]
n_proto = [2, 3, 4]
grid_search_params = {'clf__C': c_params, 'mapping__n_proto': n_proto}

grid_search = GridSearchCV(estimator=pipe_line, param_grid=grid_search_params, refit=True, cv=2, scoring='accuracy', n_jobs=1)

# replace the blob data with a synthetic problem (make_classification defaults to 20 features)
X, y = make_classification(n_samples=700, random_state=42, n_informative=5)

grid_search.fit(X,y)

print(grid_search.best_score_)
print(grid_search.best_params_)

0.7757142857142857
{'clf__C': 1, 'mapping__n_proto': 2}
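For readers curious about what makes a transformer tunable through `mapping__n_proto`, here is a minimal sketch of a scikit-learn-compatible transformer. `SketchProtoTransformer` is a hypothetical stand-in, not the `ProtoTransformer` used above, and its distance rule is an assumption. The key point is that inheriting from `BaseEstimator` and storing constructor arguments unchanged lets `GridSearchCV` set them via `set_params`.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

class SketchProtoTransformer(BaseEstimator, TransformerMixin):
    """Hypothetical stand-in, not the ProtoTransformer from the module."""

    def __init__(self, n_proto=2, random_state=0):
        # store arguments verbatim so get_params/set_params work
        self.n_proto = n_proto
        self.random_state = random_state

    def fit(self, X, y):
        # one k-means run per class; centers become that class's prototypes
        self.proto_ = [
            KMeans(n_clusters=self.n_proto, n_init=10,
                   random_state=self.random_state).fit(X[y == label]).cluster_centers_
            for label in np.unique(y)
        ]
        return self

    def transform(self, X):
        # ASSUMPTION: append the l1 distance to the nearest prototype of each class
        cols = [np.abs(X[:, None, :] - p[None, :, :]).sum(axis=2).min(axis=1)
                for p in self.proto_]
        return np.hstack([X, np.column_stack(cols)])

X_demo, y_demo = make_classification(n_samples=200, n_informative=5, random_state=42)
pipe = Pipeline([("mapping", SketchProtoTransformer()),
                 ("clf", LogisticRegression(max_iter=1000))])
search = GridSearchCV(pipe, {"mapping__n_proto": [2, 3], "clf__C": [0.1, 1]},
                      cv=2, scoring="accuracy")
search.fit(X_demo, y_demo)
print(search.best_params_)
```

Because the prototypes are learned inside `fit`, each cross-validation fold selects them from its own training split only, which avoids leaking validation data into the feature map.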


In [9]:
# inspect the prototype sets learned by the best pipeline (one set per class)
prototype_sets = grid_search.best_estimator_['mapping'].proto
for p in prototype_sets:
    print(len(p))

2
2
