
# Comparing MATLAB and Python runtimes

One of the experiments in the CLASSIX paper [1] compares several clustering algorithms on datasets from the UCI machine learning repository [2]. The Python code that generated these results, together with all hyperparameters used, is available in the CLASSIX GitHub repository: [https://github.com/nla-group/classix/blob/master/exp/run_real_world.py](https://github.com/nla-group/classix/blob/master/exp/run_real_world.py)


Here we use these datasets to compare the runtimes of the MATLAB and Python implementations of CLASSIX. For MATLAB, we test two variants: (i) a pure MATLAB implementation, and (ii) MATLAB with the MEX file <samp>matxsubmat.c</samp>, which speeds up submatrix-times-vector products.
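
The cells in this notebook time each variant with a single `tic`/`toc` pair, so the small timings in particular can be affected by warm-up and scheduling noise. The following is a minimal sketch of two more robust alternatives: a median over repeated runs, and MATLAB's built-in `timeit`. The workload `work` is a hypothetical stand-in, not a CLASSIX call.

```matlab
% Hypothetical workload standing in for a clustering call.
A = rand(500);
work = @() A*A;

% Single tic/toc measurement, as used in this notebook.
tic; work(); t_single = toc;

% Median over repeated runs smooths out warm-up and scheduling noise.
nrep = 5;
t = zeros(nrep,1);
for k = 1:nrep
    tic; work(); t(k) = toc;
end
fprintf('single: %5.3f s - median of %d runs: %5.3f s\n', t_single, nrep, median(t))

% timeit handles warm-up and repetition automatically.
t_timeit = timeit(work);
fprintf('timeit: %5.3f s\n', t_timeit)
```

To time a real call, `work` could be replaced by a handle that wraps the clustering call with the chosen parameters.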


Note that the code below requires a working MATLAB-Python link.
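
One way to check the link before running the cells is sketched here. It assumes MATLAB R2019b or later (for `pyenv`) and that the `classix` Python package is installed in the interpreter MATLAB uses; the variable name `has_python` is only for illustration.

```matlab
% Query the Python environment that MATLAB is configured to use.
pe = pyenv;
has_python = strlength(pe.Version) > 0;
if has_python
    fprintf('Python %s found at %s\n', pe.Version, pe.Executable);
    % Importing the module checks that the classix package is visible to MATLAB.
    py.importlib.import_module('classix');
else
    warning('No Python interpreter is configured for MATLAB.');
end
```

If the import fails, the `py.classix` calls used throughout this notebook will fail in the same way.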


In [1]:
clear all
addpath ..
%ari = @(a,b) py.sklearn.metrics.adjusted_rand_score(a,b);
ari = @(a,b) rand_index(double(a),double(b),'adjusted');
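
The `ari` handle above relies on the `rand_index` helper. For readers unfamiliar with the adjusted Rand index (ARI), here is a minimal sketch of the standard formula computed from a contingency table. It is an illustration under that assumption, not the implementation of `rand_index`, and the label vectors are made up.

```matlab
% Two labelings of six points; b equals a with the label names permuted.
a = [1 1 2 2 3 3];
b = [2 2 3 3 1 1];

% Contingency table: N(i,j) counts points with i-th label in a and j-th label in b.
[~,~,ia] = unique(a);
[~,~,ib] = unique(b);
N = accumarray([ia(:) ib(:)], 1);

comb2 = @(x) x.*(x-1)/2;            % number of pairs, "x choose 2"
n = numel(a);
sum_ij = sum(comb2(N(:)));          % pairs grouped together in both labelings
sum_a  = sum(comb2(sum(N,2)));      % pairs grouped together in a
sum_b  = sum(comb2(sum(N,1)));      % pairs grouped together in b
expected = sum_a*sum_b/comb2(n);    % expected agreement under random labeling
ari_val = (sum_ij - expected) / (0.5*(sum_a + sum_b) - expected);
fprintf('ARI: %3.2f\n', ari_val)
```

Because the index depends only on which points are grouped together, the permuted labeling `b` scores an ARI of 1 against `a`. This is why MATLAB and Python outputs can be compared even if they number their clusters differently.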

## Banknote dataset

In [2]:
ret = py.classix.loadData('Banknote');
data = double(ret{1});
labels = double(ret{2});
% z-normalization used in https://github.com/nla-group/classix/blob/master/exp/run_real_world.py
data = (data - mean(data))./std(data);

% MATLAB CLASSIX
no_mex = struct('use_mex',0);
tic
[label, explain, out] = classix(data,0.21,41,no_mex);
fprintf('CLASSIX.M       runtime: %5.3f seconds - classes: %d - ARI: %3.2f\n',...
    toc,length(unique(label)),ari(labels,label))

CLASSIX.M       runtime: 0.044 seconds - classes: 2 - ARI: 0.87
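
The z-normalization line in the cell above standardizes each column (feature) using implicit expansion. A small sketch on made-up data shows the effect. Note that MATLAB's `std` divides by n-1 by default, whereas `std(X,1)` divides by n (the default in NumPy's `std`); which convention the Python script uses is not shown in this notebook.

```matlab
X = [1 10; 2 20; 3 30; 4 40];      % toy data: 4 samples, 2 features
Z = (X - mean(X))./std(X);         % same expression as in the cell above
disp(mean(Z))                      % column means are (numerically) zero
disp(std(Z))                       % column sample standard deviations are one

% Variant with the population standard deviation (divide by n).
Zp = (X - mean(X))./std(X,1);
```

The two variants differ only by a constant factor of sqrt(n/(n-1)) per column, which matters little for large n but does rescale distances relative to a fixed radius.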

In [3]:
% MATLAB CLASSIX (MEX)
tic
[label, explain, out] = classix(data,0.21,41);
fprintf('CLASSIX.M (MEX) runtime: %5.3f seconds - classes: %d - ARI: %3.2f\n',...
    toc,length(unique(label)),ari(labels,label))

CLASSIX.M (MEX) runtime: 0.031 seconds - classes: 2 - ARI: 0.87

In [4]:
% Python CLASSIX
tic
clx = py.classix.CLASSIX(radius=0.21, verbose=0, minPts=int32(41));
clx = clx.fit(data);
fprintf('CLASSIX.PY      runtime: %5.3f seconds - classes: %d - ARI: %3.2f\n',...
    toc,length(unique(double(clx.labels_))),ari(labels,clx.labels_))

CLASSIX.PY      runtime: 0.078 seconds - classes: 2 - ARI: 0.87

## Dermatology dataset

In [5]:
ret = py.classix.loadData('Dermatology');
data = double(ret{1});
data = data(:,1:33); % drop the final column, which seems to contain NaNs
labels = double(ret{2});
data = (data - mean(data))./std(data);

% MATLAB CLASSIX
tic
[label, explain, out] = classix(data,0.4,4,no_mex);
fprintf('CLASSIX.M       runtime: %5.3f seconds - classes: %d - ARI: %3.2f\n',...
    toc,length(unique(label)),ari(labels,label))

CLASSIX.M       runtime: 0.019 seconds - classes: 6 - ARI: 0.45

In [6]:
% MATLAB CLASSIX (MEX)
tic
[label, explain, out] = classix(data,0.4,4);
fprintf('CLASSIX.M (MEX) runtime: %5.3f seconds - classes: %d - ARI: %3.2f\n',...
    toc,length(unique(label)),ari(labels,label))

CLASSIX.M (MEX) runtime: 0.017 seconds - classes: 6 - ARI: 0.45

In [7]:
% Python CLASSIX
tic
clx = py.classix.CLASSIX(radius=0.4, verbose=0, minPts=int32(4));
clx = clx.fit(data);
fprintf('CLASSIX.PY      runtime: %5.3f seconds - classes: %d - ARI: %3.2f\n',...
    toc,length(unique(double(clx.labels_))),ari(labels,clx.labels_))

CLASSIX.PY      runtime: 0.047 seconds - classes: 6 - ARI: 0.45

## Ecoli dataset

In [8]:
ret = py.classix.loadData('Ecoli');
data = double(ret{1});
labels = double(ret{2});
data = (data - mean(data))./std(data);

% MATLAB CLASSIX
tic
[label, explain, out] = classix(data,0.3,4,no_mex);
fprintf('CLASSIX.M       runtime: %5.3f seconds - classes: %d - ARI: %3.2f\n',...
    toc,length(unique(label)),ari(labels,label))

CLASSIX.M       runtime: 0.015 seconds - classes: 7 - ARI: 0.56

In [9]:
% MATLAB CLASSIX (MEX)
tic
[label, explain, out] = classix(data,0.3,4);
fprintf('CLASSIX.M (MEX) runtime: %5.3f seconds - classes: %d - ARI: %3.2f\n',...
    toc,length(unique(label)),ari(labels,label))

CLASSIX.M (MEX) runtime: 0.012 seconds - classes: 7 - ARI: 0.56

In [10]:
% Python CLASSIX
tic
clx = py.classix.CLASSIX(radius=0.3, verbose=0, minPts=int32(4));
clx = clx.fit(data);
fprintf('CLASSIX.PY      runtime: %5.3f seconds - classes: %d - ARI: %3.2f\n',...
    toc,length(unique(double(clx.labels_))),ari(labels,clx.labels_))

CLASSIX.PY      runtime: 0.028 seconds - classes: 7 - ARI: 0.56

## Glass dataset

In [11]:
ret = py.classix.loadData('Glass');
data = double(ret{1});
labels = double(ret{2});
data = (data - mean(data))./std(data);

% MATLAB CLASSIX
tic
[label, explain, out] = classix(data,0.725,1,no_mex);
fprintf('CLASSIX.M       runtime: %5.3f seconds - classes: %d - ARI: %3.2f\n',...
    toc,length(unique(label)),ari(labels,label))

CLASSIX.M       runtime: 0.009 seconds - classes: 26 - ARI: 0.23

In [12]:
% MATLAB CLASSIX (MEX)
tic
[label, explain, out] = classix(data,0.725,1);
fprintf('CLASSIX.M (MEX) runtime: %5.3f seconds - classes: %d - ARI: %3.2f\n',...
    toc,length(unique(label)),ari(labels,label))

CLASSIX.M (MEX) runtime: 0.009 seconds - classes: 26 - ARI: 0.23

In [13]:
% Python CLASSIX
tic
clx = py.classix.CLASSIX(radius=0.725, verbose=0, minPts=int32(1));
clx = clx.fit(data);
fprintf('CLASSIX.PY      runtime: %5.3f seconds - classes: %d - ARI: %3.2f\n',...
    toc,length(unique(double(clx.labels_))),ari(labels,clx.labels_))

CLASSIX.PY      runtime: 0.013 seconds - classes: 26 - ARI: 0.23

## Iris dataset

In [14]:
ret = py.classix.loadData('Iris');
data = double(ret{1});
labels = double(ret{2});
data = (data - mean(data))./std(data);

% MATLAB CLASSIX
tic
[label, explain, out] = classix(data,0.225,4,no_mex);
fprintf('CLASSIX.M       runtime: %5.3f seconds - classes: %d - ARI: %3.2f\n',...
    toc,length(unique(label)),ari(labels,label))

CLASSIX.M       runtime: 0.008 seconds - classes: 4 - ARI: 0.56

In [15]:
% MATLAB CLASSIX (MEX)
tic
[label, explain, out] = classix(data,0.225,4);
fprintf('CLASSIX.M (MEX) runtime: %5.3f seconds - classes: %d - ARI: %3.2f\n',...
    toc,length(unique(label)),ari(labels,label))

CLASSIX.M (MEX) runtime: 0.007 seconds - classes: 4 - ARI: 0.56

In [16]:
% Python CLASSIX
tic
clx = py.classix.CLASSIX(radius=0.225, verbose=0, minPts=int32(4));
clx = clx.fit(data);
fprintf('CLASSIX.PY      runtime: %5.3f seconds - classes: %d - ARI: %3.2f\n',...
    toc,length(unique(double(clx.labels_))),ari(labels,clx.labels_))

CLASSIX.PY      runtime: 0.010 seconds - classes: 4 - ARI: 0.56

## Seeds dataset

In [17]:
ret = py.classix.loadData('Seeds');
data = double(ret{1});
labels = double(ret{2});
data = (data - mean(data))./std(data);

% MATLAB CLASSIX
tic
[label, explain, out] = classix(data,0.15,9,no_mex);
fprintf('CLASSIX.M       runtime: %5.3f seconds - classes: %d - ARI: %3.2f\n',...
    toc,length(unique(label)),ari(labels,label))

CLASSIX.M       runtime: 0.013 seconds - classes: 3 - ARI: 0.70

In [18]:
% MATLAB CLASSIX (MEX)
tic
[label, explain, out] = classix(data,0.15,9);
fprintf('CLASSIX.M (MEX) runtime: %5.3f seconds - classes: %d - ARI: %3.2f\n',...
    toc,length(unique(label)),ari(labels,label))

CLASSIX.M (MEX) runtime: 0.010 seconds - classes: 3 - ARI: 0.70

In [19]:
% Python CLASSIX
tic
clx = py.classix.CLASSIX(radius=0.15, verbose=0, minPts=int32(9));
clx = clx.fit(data);
fprintf('CLASSIX.PY      runtime: %5.3f seconds - classes: %d - ARI: %3.2f\n',...
    toc,length(unique(double(clx.labels_))),ari(labels,clx.labels_))

CLASSIX.PY      runtime: 0.029 seconds - classes: 3 - ARI: 0.70

## Wine dataset

In [20]:
ret = py.classix.loadData('Wine');
data = double(ret{1});
labels = double(ret{2});
data = (data - mean(data))./std(data);

% MATLAB CLASSIX
tic
[label, explain, out] = classix(data,0.425,4,no_mex);
fprintf('CLASSIX.M       runtime: %5.3f seconds - classes: %d - ARI: %3.2f\n',...
    toc,length(unique(label)),ari(labels,label))

CLASSIX.M       runtime: 0.011 seconds - classes: 2 - ARI: 0.47

In [21]:
% MATLAB CLASSIX (MEX)
tic
[label, explain, out] = classix(data,0.425,4);
fprintf('CLASSIX.M (MEX) runtime: %5.3f seconds - classes: %d - ARI: %3.2f\n',...
    toc,length(unique(label)),ari(labels,label))

CLASSIX.M (MEX) runtime: 0.010 seconds - classes: 2 - ARI: 0.47

In [22]:
% Python CLASSIX
tic
clx = py.classix.CLASSIX(radius=0.425, verbose=0, minPts=int32(4));
clx = clx.fit(data);
fprintf('CLASSIX.PY      runtime: %5.3f seconds - classes: %d - ARI: %3.2f\n',...
    toc,length(unique(double(clx.labels_))),ari(labels,clx.labels_))

CLASSIX.PY      runtime: 0.020 seconds - classes: 2 - ARI: 0.47

## Phoneme dataset

In [23]:
ret = py.classix.loadData('Phoneme');
data = double(ret{1});
labels = double(ret{2});
data = (data - mean(data))./std(data);

% MATLAB CLASSIX
tic
[label, explain, out] = classix(data,0.445,8,no_mex);
fprintf('CLASSIX.M       runtime: %5.3f seconds - classes: %d - ARI: %3.2f\n',...
    toc,length(unique(label)),ari(labels,label))

CLASSIX.M       runtime: 20.861 seconds - classes: 4 - ARI: 0.76

In [24]:
% MATLAB CLASSIX (MEX)
tic
[label, explain, out] = classix(data,0.445,8);
fprintf('CLASSIX.M (MEX) runtime: %5.3f seconds - classes: %d - ARI: %3.2f\n',...
    toc,length(unique(label)),ari(labels,label))

CLASSIX.M (MEX) runtime: 6.195 seconds - classes: 4 - ARI: 0.76

In [25]:
% Python CLASSIX
tic
clx = py.classix.CLASSIX(radius=0.445, verbose=0, minPts=int32(8));
clx = clx.fit(data);
fprintf('CLASSIX.PY      runtime: %5.3f seconds - classes: %d - ARI: %3.2f\n',...
    toc,length(unique(double(clx.labels_))),ari(labels,clx.labels_))

CLASSIX.PY      runtime: 5.369 seconds - classes: 4 - ARI: 0.76

## VDU Signals dataset

In [26]:
ret = py.classix.loadData('vdu_signals');
data = double(ret);
data = (data - mean(data))./std(data);

% MATLAB CLASSIX
tic
[label, explain, out] = classix(data,0.3,1,no_mex);
fprintf('CLASSIX.M       runtime: %5.3f seconds - classes: %d\n',...
    toc,length(unique(label)))

CLASSIX.M       runtime: 1.028 seconds - classes: 11

In [27]:
% MATLAB CLASSIX (MEX)
tic
[label, explain, out] = classix(data,0.3,1);
fprintf('CLASSIX.M (MEX) runtime: %5.3f seconds - classes: %d\n',...
    toc,length(unique(label)))

CLASSIX.M (MEX) runtime: 1.034 seconds - classes: 11

In [28]:
% Python CLASSIX
tic
clx = py.classix.CLASSIX(radius=0.3, verbose=0, minPts=int32(1));
clx = clx.fit(data);
fprintf('CLASSIX.PY      runtime: %5.3f seconds - classes: %d\n',...
    toc,length(unique(double(clx.labels_))))

CLASSIX.PY      runtime: 2.860 seconds - classes: 9

In [29]:
fprintf('Agreement between M and PY: ARI %3.2f\n',ari(label,clx.labels_))

Agreement between M and PY: ARI 1.00

## Learn more about CLASSIX?

CLASSIX is a fast and memory-efficient clustering algorithm that produces explainable results. If you'd like to learn more, here are some online resources:

-  arXiv paper: [https://arxiv.org/abs/2202.01456](https://arxiv.org/abs/2202.01456)
-  Python code: [https://github.com/nla-group/classix](https://github.com/nla-group/classix)
-  MATLAB code: [https://github.com/nla-group/classix-matlab/](https://github.com/nla-group/classix-matlab/)
-  YouTube video: [https://www.youtube.com/watch?v=K94zgRjFEYo](https://www.youtube.com/watch?v=K94zgRjFEYo)

## References

[1] X. Chen and S. Güttel. "Fast and explainable clustering based on sorting." arXiv: [https://arxiv.org/abs/2202.01456](https://arxiv.org/abs/2202.01456), 2022.


[2] D. Dua and C. Graff. "UCI machine learning repository." URL: [http://archive.ics.uci.edu/ml](http://archive.ics.uci.edu/ml), 2017.

