
kNN using SAX+MINDIST #28

Closed

ManuelMonteiro24 opened this issue Apr 18, 2018 · 22 comments

Comments

@ManuelMonteiro24

When using this class, what are the available values for the "metric" parameter? Only "dtw"? Any recommendation if I wanted to use the Euclidean distance or, for example, the SAX distance when using this classifier on a dataset with a SAX representation?

@rtavenar
Member

Hi @ManuelMonteiro24 ,

You are right that, at the time of your post, kNN in tslearn would only accept DTW as a metric. I've just added L1 and L2 distances (cf. docs and gallery of examples) on GitHub.

They should be on PyPI from version 0.1.11 :)

For the SAX part of your question, I'll try to make an example using sklearn's Pipeline ASAP (for you and for the docs).
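In the meantime, here is a minimal sketch of the intended usage (parameter values are just for illustration; the exact set of accepted metric strings may still change):

import numpy
from tslearn.generators import random_walks
from tslearn.neighbors import KNeighborsTimeSeriesClassifier

# Toy data: 20 random-walk series of length 50, with two arbitrary classes
X = random_walks(n_ts=20, sz=50)
y = numpy.array([0] * 10 + [1] * 10)

for metric in ["dtw", "euclidean"]:  # "euclidean" is one of the new options
    clf = KNeighborsTimeSeriesClassifier(n_neighbors=1, metric=metric)
    clf.fit(X, y)
    print(metric, clf.predict(X[:2]))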

@ManuelMonteiro24
Author

Oh nice! Thank you for the information!

@rtavenar
Member

Now, the kNN example in the gallery should cover both your questions.
Let me know whether this answers all of them.

@rtavenar
Member

@ManuelMonteiro24: could you tell me if the kNN example in the gallery covers your needs? If so, I will close the issue :)

@ManuelMonteiro24
Author

Thanks! I did a quick test and everything seems OK. Just a heads-up: there is a typo on the page http://tslearn.readthedocs.io/en/latest/gen_modules/neighbors/tslearn.neighbors.KNeighborsTimeSeries.html: in the metric parameter options it says "eucidean" instead of "euclidean", I guess?

@ManuelMonteiro24
Author

ManuelMonteiro24 commented May 3, 2018

Just a question about the "nearest neighbors" example on the doc page: in the SAX part of the example you print "Nearest neighbor classification using SAX+L2". What do you mean by SAX+L2?
When you apply the SAX transformation, comparing two time series (in the SAX representation space) requires the MINDIST() function, described in the SAX paper http://www.cs.ucr.edu/~eamonn/SAX.pdf, to get the distance between the two series. Is that the one you are using in that example? If so, why the "+L2" in the print? If not, which one do you use?
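
For reference, the formula from the paper is

MINDIST(Q^, C^) = sqrt(n / w) * sqrt(sum_{i=1..w} dist(q^_i, c^_i)^2)

where n is the original series length, w is the number of SAX segments, and dist() is the breakpoint lookup: dist(r, c) = 0 if |r - c| <= 1, and beta_{max(r,c)-1} - beta_{min(r,c)} otherwise.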

I ran the example too and switched the metric parameter of KNeighborsTimeSeriesClassifier() between euclidean and dtw, and the classification results differ, so I think the SAX classification example is not functioning properly.

@ManuelMonteiro24
Author

Sorry for the late follow-up question.

@rtavenar
Member

rtavenar commented May 4, 2018

OK, I'm re-opening the issue; I agree I should be more careful about that.

I'll check ASAP (but I cannot do it today or next week).

@rtavenar rtavenar reopened this May 4, 2018
@ManuelMonteiro24
Author

OK, thank you. I will try to develop a solution from the MATLAB code that the SAX author provides; if it works, I'll push it.

@ManuelMonteiro24
Author

I have already implemented a solution, but the code is not very pretty. From the experiments I ran, it seems to work properly. One more question: have you already used any cross-validation method on data loaded and classified with tslearn? I wanted to apply it and was wondering if someone had already done or thought of it...
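
Something like this is what I had in mind (just a sketch, assuming tslearn estimators are compatible with sklearn's model selection tools, which I have not verified):

from sklearn.model_selection import cross_val_score

from tslearn.datasets import UCR_UEA_datasets
from tslearn.neighbors import KNeighborsTimeSeriesClassifier

X_train, y_train, _, _ = UCR_UEA_datasets().load_dataset("SyntheticControl")

# 5-fold cross-validation of a DTW-based 1-NN classifier on the training set
clf = KNeighborsTimeSeriesClassifier(n_neighbors=1, metric="dtw")
scores = cross_val_score(clf, X_train, y_train, cv=5)
print(scores.mean())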

@rtavenar rtavenar changed the title Doubt using KNeighborsTimeSeriesClassifier() kNN using SAX+MINDIST Aug 26, 2019
@GillesVandewiele
Contributor

Hi @rtavenar, this seems like an ideal issue for me as well, as it will require interacting with sklearn utilities (a pipeline and a cross-validation object). Is a SAX transformation already available in tslearn (e.g., in the preprocessing module), and the MINDIST function (in the metrics module)? If not, I can look into implementing these as well.

@rtavenar
Member

That would be great!

SAX is already available as a transformer here: https://github.com/rtavenar/tslearn/blob/master/tslearn/piecewise.py#L250

But I don't think MINDIST is defined anywhere.
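
For instance (a quick sketch; the parameter values are just illustrative):

import numpy
from tslearn.generators import random_walks
from tslearn.piecewise import SymbolicAggregateApproximation

numpy.random.seed(0)
X = random_walks(n_ts=5, sz=64)

# 8 segments per series, alphabet of 10 symbols
sax = SymbolicAggregateApproximation(n_segments=8, alphabet_size_avg=10)
X_sax = sax.fit_transform(X)
print(X_sax.shape)  # (5, 8, 1): one symbol index per segment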

@GillesVandewiele
Contributor

Hi @rtavenar

Took me some time, but I managed to take a look at this one. I wrote a script that does kNN (k=1) on SAX representations using the MINDIST function from the original paper. I reproduced their result on the SyntheticControl dataset (accuracy of 0.97 for kNN+SAX+MINDIST vs. 0.88 for kNN with Euclidean distance on the raw time series). It is pasted below. I had to add BaseEstimator to the classes in preprocessing (TimeSeriesScalerMeanVariance & TimeSeriesScalerMinMax) to make it work.

Shall I create a PR with this as a new example?

# -*- coding: utf-8 -*-
"""
1-NN with SAX + MINDIST
=======================

This example presents a comparison performs kNN with k=1 on SAX 
transformations of the SyntheticControl dataset. MINDIST from the original
paper is used as a distance metric.
"""

# Author: Gilles Vandewiele
# License: BSD 3 clause

import numpy

from scipy.stats import norm

from sklearn.pipeline import Pipeline
from sklearn.metrics import pairwise_distances, accuracy_score, \
    confusion_matrix
from sklearn.preprocessing import FunctionTransformer
from sklearn.neighbors import KNeighborsClassifier

from tslearn.datasets import UCR_UEA_datasets
from tslearn.preprocessing import TimeSeriesScalerMeanVariance
from tslearn.piecewise import SymbolicAggregateApproximation

numpy.random.seed(0)
# Load the SyntheticControl dataset from the UCR/UEA archive
data_loader = UCR_UEA_datasets()
dataset = 'SyntheticControl'
X_train, y_train, X_test, y_test = data_loader.load_dataset(dataset)

def generate_lookup_table(a):
    # Breakpoints beta_1, ..., beta_{a-1} that split N(0, 1) into a
    # equiprobable regions (Table 3 in the SAX paper)
    return [norm.ppf(i * (1. / a)) for i in range(a)][1:]

def calc_distances(X, y=None):
    X = X.reshape((X.shape[0], X.shape[1]))
    cardinality = int(numpy.max(X)) + 1  # alphabet size (symbols are 0 .. a-1)
    n = X_train.shape[1]  # original time series length
    w = X.shape[1]        # number of SAX segments
    table = generate_lookup_table(cardinality)

    def point_dist(i, j):
        # Symbol-to-symbol distance from the paper: 0 for equal or
        # adjacent symbols, otherwise the gap between the breakpoints
        i = int(i)
        j = int(j)
        if abs(i - j) <= 1:
            return 0
        else:
            return table[max(i, j) - 1] - table[min(i, j)]

    def sax_mindist(x, y):
        # MINDIST(x, y) = sqrt(n / w) * sqrt(sum_i dist(x_i, y_i)^2)
        # (the paper's scaling factor is sqrt(n / w); a constant factor
        # does not affect the 1-NN ranking anyway)
        point_dists = [point_dist(x[i], y[i]) ** 2 for i in range(w)]
        return numpy.sqrt(n / w) * numpy.sqrt(numpy.sum(point_dists))

    return pairwise_distances(X, metric=sax_mindist)

# KNN cannot simply be appended to this pipeline: the last step turns its
# input X into a distance matrix of shape (X.shape[0], X.shape[0]), which
# is not what KNeighborsClassifier expects for X_test at predict time.
pipe = Pipeline([
    (
        'norm',
        TimeSeriesScalerMeanVariance()
    ),
    (
        'transform',
        SymbolicAggregateApproximation(n_segments=16, alphabet_size_avg=10)
    ),
    (
        'distance',
        FunctionTransformer(calc_distances, validate=False, pass_y=False)
    )
])

all_X = numpy.vstack((X_train, X_test))
distances = pipe.transform(all_X)

# We only need the distances to the time series from the training set,
# both for the training set and for the test set.
X_train_dist = distances[:len(X_train), :len(X_train)]
X_test_dist = distances[len(X_train):, :len(X_train)]

knn = KNeighborsClassifier(n_neighbors=1, metric='precomputed')
knn.fit(X_train_dist, y_train)
predictions = knn.predict(X_test_dist)
print('Accuracy score on test set with KNN on SAX using MINDIST')
print(confusion_matrix(y_test, predictions))
print(accuracy_score(y_test, predictions))

knn = KNeighborsClassifier(n_neighbors=1, metric='euclidean')
knn.fit(X_train.reshape((X_train.shape[0], X_train.shape[1])), y_train)
predictions = knn.predict(X_test.reshape((X_test.shape[0], X_test.shape[1])))
print('Accuracy score on test set with KNN on raw ts using euclidean dist')
print(confusion_matrix(y_test, predictions))
print(accuracy_score(y_test, predictions))

@rtavenar
Member

Hi @GillesVandewiele

This is great! It would definitely deserve a PR, and we could discuss there whether MINDIST should be added as a new metric in tslearn and, if so, how we could do that.

@GillesVandewiele
Contributor

I have been thinking about it as well, but the problem is that my code assumes a SAX transformation has already been applied to the input data. Perhaps we could add this as a private method and create an extra time-series metric "sax" for the TimeSeriesKNN of tslearn?

@rtavenar
Member

Wait, I had not looked at that recently. Is your distance doing something different from this one? Sorry, all this is a bit old to me.

@GillesVandewiele
Contributor

Oh haha, I did not notice that one either... It seems that one indeed does the same thing. I'll look into integrating it later instead.

@GillesVandewiele
Contributor

Hi Romain,

Just to make sure: do you agree that having SAX+MINDIST as a new metric would be a nice addition to tslearn? It could be integrated into the TimeSeriesKNeighborsClassifier/Regressor. I will probably have time this weekend to implement that.

If you don't think it is an interesting addition, you can close this issue, as MINDIST and SAX were in here all along ;)

@rtavenar
Member

rtavenar commented Sep 27, 2019 via email

@GillesVandewiele
Contributor

I will give it a look this weekend and create a PR for it :) (if I find the time at least ;) )

@GillesVandewiele
Contributor

I guess this one can now be closed @rtavenar :)

@rtavenar
Member

Yep, this was implemented in #152 and is now merged into the dev branch. It will be available from version 0.4.
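
Usage should look something like this (a sketch based on my reading of #152; exact parameter names may differ in the released version):

from tslearn.neighbors import KNeighborsTimeSeriesClassifier

# 1-NN with SAX + MINDIST; metric_params is forwarded to the underlying
# SymbolicAggregateApproximation transformer
clf = KNeighborsTimeSeriesClassifier(
    n_neighbors=1,
    metric="sax",
    metric_params={"n_segments": 16, "alphabet_size_avg": 10},
)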
