
kNN using SAX+MINDIST #28

Closed

ManuelMonteiro24 opened this issue Apr 18, 2018 · 22 comments

Comments

@ManuelMonteiro24

When using this class, what are the available values for the "metric" parameter? Only "dtw"? Any recommendation if I wanted to use the Euclidean distance or, for example, the SAX distance when using this classifier on a dataset with a SAX representation?

@rtavenar
Member

Hi @ManuelMonteiro24 ,

You are right that, at the time of your post, kNN in tslearn would only accept DTW as a metric. I've just added L1 and L2 distances (cf. docs and gallery of examples) on GitHub.

They should be on PyPI from version 0.1.11 :)

For the SAX part of your question, I'll try to make an example using sklearn's Pipeline ASAP (for you and for the docs).
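In the meantime, here is a minimal sketch of the intended usage (parameter values are just for illustration; the exact set of accepted metric strings may still change):

import numpy
from tslearn.generators import random_walks
from tslearn.neighbors import KNeighborsTimeSeriesClassifier

# Toy data: 20 random-walk series of length 50, with two arbitrary classes
X = random_walks(n_ts=20, sz=50)
y = numpy.array([0] * 10 + [1] * 10)

for metric in ["dtw", "euclidean"]:  # "euclidean" is one of the new options
    clf = KNeighborsTimeSeriesClassifier(n_neighbors=1, metric=metric)
    clf.fit(X, y)
    print(metric, clf.predict(X[:2]))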

@ManuelMonteiro24
Author

Oh nice! Thank you for the information!

@rtavenar
Member

Now, the kNN example in the gallery should cover both your questions.
Let me know whether this answers all of them.

@rtavenar
Member

@ManuelMonteiro24: could you tell me if the kNN example in the gallery covers your needs? If so, I will close the issue :)

@ManuelMonteiro24
Author

Thanks! I did a quick test and everything seems OK. Just a heads-up: there is a typo on the page http://tslearn.readthedocs.io/en/latest/gen_modules/neighbors/tslearn.neighbors.KNeighborsTimeSeries.html: in the metric parameter options it says "eucidean" instead of "euclidean", I guess?

@ManuelMonteiro24
Author

ManuelMonteiro24 commented May 3, 2018

Just a question about the "nearest neighbors" example on the doc page: in the SAX part of the example you print "Nearest neighbor classification using SAX+L2". What do you mean by SAX+L2?
When you apply the SAX transformation, comparing two time series (in the SAX representation space) requires the MINDIST() function, described in the SAX paper http://www.cs.ucr.edu/~eamonn/SAX.pdf, to get the distance between the two series. Is that the one you are using in that example? If so, why the "+L2" in the print? If not, which one do you use?
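
For reference, the formula from the paper is

MINDIST(Q^, C^) = sqrt(n / w) * sqrt(sum_{i=1..w} dist(q^_i, c^_i)^2)

where n is the original series length, w is the number of SAX segments, and dist() is the breakpoint lookup: dist(r, c) = 0 if |r - c| <= 1, and beta_{max(r,c)-1} - beta_{min(r,c)} otherwise.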

I ran the example too and switched the metric parameter of KNeighborsTimeSeriesClassifier() between euclidean and dtw, and the classification results differ, so I think the SAX classification example is not functioning properly.

@ManuelMonteiro24
Author

Sorry for the late follow-up question.

@rtavenar
Member

rtavenar commented May 4, 2018

OK, I'm re-opening the issue; I agree I should be more careful about that.

I'll check ASAP (but I cannot do it today or next week).

@rtavenar rtavenar reopened this May 4, 2018
@ManuelMonteiro24
Author

OK, thank you. I will try to develop a solution from the MATLAB code that the SAX author provides; if it works, I'll push it.

@ManuelMonteiro24
Author

I have already implemented a solution, but the code is not very pretty. From the experiments I ran, it seems to work properly. One more question: have you already used any cross-validation method on data loaded and classified with tslearn? I wanted to apply it and was wondering if someone had already done or thought of it...
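
Something like this is what I had in mind (just a sketch, assuming tslearn estimators are compatible with sklearn's model selection tools, which I have not verified):

from sklearn.model_selection import cross_val_score

from tslearn.datasets import UCR_UEA_datasets
from tslearn.neighbors import KNeighborsTimeSeriesClassifier

X_train, y_train, _, _ = UCR_UEA_datasets().load_dataset("SyntheticControl")

# 5-fold cross-validation of a DTW-based 1-NN classifier on the training set
clf = KNeighborsTimeSeriesClassifier(n_neighbors=1, metric="dtw")
scores = cross_val_score(clf, X_train, y_train, cv=5)
print(scores.mean())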

@rtavenar rtavenar changed the title Doubt using KNeighborsTimeSeriesClassifier() kNN using SAX+MINDIST Aug 26, 2019
@GillesVandewiele
Contributor

Hi @rtavenar, this seems like an ideal issue for me as well, as it will require interacting with sklearn utilities (a pipeline and a cross-validation object). Is a SAX transformation already available in tslearn (e.g., in the preprocessing module), and the MINDIST function (in the metrics module)? If not, I can look into implementing these as well.

@rtavenar
Member

That would be great!

SAX is already available as a transformer here: https://github.com/rtavenar/tslearn/blob/master/tslearn/piecewise.py#L250

But I don't think MINDIST is defined anywhere.
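
For instance (a quick sketch; the parameter values are just illustrative):

import numpy
from tslearn.generators import random_walks
from tslearn.piecewise import SymbolicAggregateApproximation

numpy.random.seed(0)
X = random_walks(n_ts=5, sz=64)

# 8 segments per series, alphabet of 10 symbols
sax = SymbolicAggregateApproximation(n_segments=8, alphabet_size_avg=10)
X_sax = sax.fit_transform(X)
print(X_sax.shape)  # (5, 8, 1): one symbol index per segment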

@GillesVandewiele
Contributor

Hi @rtavenar

Took me some time, but I managed to take a look at this one. I wrote a script that does kNN (k=1) on SAX representations using the MINDIST function from the original paper. I reproduced their result on the SyntheticControl dataset (accuracy of 0.97 for kNN+SAX+MINDIST vs. 0.88 for kNN with Euclidean distance on the raw time series). It is pasted below. I had to add BaseEstimator to the classes in preprocessing (TimeSeriesScalerMeanVariance & TimeSeriesScalerMinMax) to make it work.

Shall I create a PR with this as a new example?

# -*- coding: utf-8 -*-
"""
1-NN with SAX + MINDIST
=======================

This example presents a comparison performs kNN with k=1 on SAX 
transformations of the SyntheticControl dataset. MINDIST from the original
paper is used as a distance metric.
"""

# Author: Gilles Vandewiele
# License: BSD 3 clause

import numpy

from scipy.stats import norm

from sklearn.pipeline import Pipeline
from sklearn.metrics import pairwise_distances, accuracy_score, \
    confusion_matrix
from sklearn.preprocessing import FunctionTransformer
from sklearn.neighbors import KNeighborsClassifier

from tslearn.datasets import UCR_UEA_datasets
from tslearn.preprocessing import TimeSeriesScalerMeanVariance
from tslearn.piecewise import SymbolicAggregateApproximation

numpy.random.seed(0)
# Load the SyntheticControl dataset from the UCR/UEA archive
data_loader = UCR_UEA_datasets()
dataset = 'SyntheticControl'
X_train, y_train, X_test, y_test = data_loader.load_dataset(dataset)

def generate_lookup_table(a):
    # Breakpoints beta_1, ..., beta_{a-1} that split N(0, 1) into a
    # equiprobable regions (Table 3 in the SAX paper)
    return [norm.ppf(i * (1. / a)) for i in range(a)][1:]

def calc_distances(X, y=None):
    X = X.reshape((X.shape[0], X.shape[1]))
    cardinality = int(numpy.max(X)) + 1  # alphabet size (symbols are 0 .. a-1)
    n = X_train.shape[1]  # original time series length
    w = X.shape[1]        # number of SAX segments
    table = generate_lookup_table(cardinality)

    def point_dist(i, j):
        # Symbol-to-symbol distance from the paper: 0 for equal or
        # adjacent symbols, otherwise the gap between the breakpoints
        i = int(i)
        j = int(j)
        if abs(i - j) <= 1:
            return 0
        else:
            return table[max(i, j) - 1] - table[min(i, j)]

    def sax_mindist(x, y):
        # MINDIST(x, y) = sqrt(n / w) * sqrt(sum_i dist(x_i, y_i)^2)
        # (the paper's scaling factor is sqrt(n / w); a constant factor
        # does not affect the 1-NN ranking anyway)
        point_dists = [point_dist(x[i], y[i]) ** 2 for i in range(w)]
        return numpy.sqrt(n / w) * numpy.sqrt(numpy.sum(point_dists))

    return pairwise_distances(X, metric=sax_mindist)

# KNN cannot simply be appended to this pipeline: the last step turns its
# input X into a distance matrix of shape (X.shape[0], X.shape[0]), which
# is not what KNeighborsClassifier expects for X_test at predict time.
pipe = Pipeline([
    (
        'norm',
        TimeSeriesScalerMeanVariance()
    ),
    (
        'transform',
        SymbolicAggregateApproximation(n_segments=16, alphabet_size_avg=10)
    ),
    (
        'distance',
        FunctionTransformer(calc_distances, validate=False, pass_y=False)
    )
])

all_X = numpy.vstack((X_train, X_test))
distances = pipe.transform(all_X)

# We only need the distances to the time series from the training set,
# both for the training set and for the test set.
X_train_dist = distances[:len(X_train), :len(X_train)]
X_test_dist = distances[len(X_train):, :len(X_train)]

knn = KNeighborsClassifier(n_neighbors=1, metric='precomputed')
knn.fit(X_train_dist, y_train)
predictions = knn.predict(X_test_dist)
print('Accuracy score on test set with KNN on SAX using MINDIST')
print(confusion_matrix(y_test, predictions))
print(accuracy_score(y_test, predictions))

knn = KNeighborsClassifier(n_neighbors=1, metric='euclidean')
knn.fit(X_train.reshape((X_train.shape[0], X_train.shape[1])), y_train)
predictions = knn.predict(X_test.reshape((X_test.shape[0], X_test.shape[1])))
print('Accuracy score on test set with KNN on raw ts using euclidean dist')
print(confusion_matrix(y_test, predictions))
print(accuracy_score(y_test, predictions))

@rtavenar
Member

Hi @GillesVandewiele

This is great! It would definitely deserve a PR, and we could discuss there whether MINDIST should be added as a new metric in tslearn and, if so, how we could do that.

@GillesVandewiele
Contributor

I have been thinking about it as well, but the problem is that my code assumes a SAX transformation has already been applied to the input data. Perhaps we could add this as a private method and create an extra time-series metric "sax" for the TimeSeriesKNN of tslearn?

@rtavenar
Member

Wait, I had not looked at that recently. Is your distance doing something different from this one? Sorry, all this is a bit old to me.

@GillesVandewiele
Contributor

Oh haha, I did not notice that one either... It seems that one indeed does the same thing. I'll look into integrating it later instead.

@GillesVandewiele
Contributor

Hi Romain,

Just to make sure: do you agree that having SAX+MINDIST as a new metric would be a nice addition to tslearn? It could be integrated into the TimeSeriesKNeighborsClassifier/Regressor. I will probably have time this weekend to implement that.

If you don't think it is an interesting addition, you can close this issue, as MINDIST and SAX were in here all along ;)

@rtavenar
Member

rtavenar commented Sep 27, 2019 via email

@GillesVandewiele
Contributor

I will give it a look this weekend and create a PR for it :) (if I find the time at least ;) )

@GillesVandewiele
Contributor

I guess this one can now be closed @rtavenar :)

@rtavenar
Member

Yep, this was implemented in #152 and is now merged into the dev branch. It will be available from version 0.4.
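
Usage should look something like this (a sketch based on my reading of #152; exact parameter names may differ in the released version):

from tslearn.neighbors import KNeighborsTimeSeriesClassifier

# 1-NN with SAX + MINDIST; metric_params is forwarded to the underlying
# SymbolicAggregateApproximation transformer
clf = KNeighborsTimeSeriesClassifier(
    n_neighbors=1,
    metric="sax",
    metric_params={"n_segments": 16, "alphabet_size_avg": 10},
)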
