kNN using SAX+MINDIST #28
Comments
Hi @ManuelMonteiro24, You are right that, at the time of your post, kNN in tslearn would only accept DTW as a metric. I've just added L1 and L2 distances (cf. docs and gallery of examples) on GitHub. They should be on PyPI from version 0.1.11 :) For the SAX part of your question, I'll try to put together an example.
Oh nice! Thank you for the information!
Now, the knn example in the gallery should cover both your questions.
@ManuelMonteiro24: could you tell me if the knn example in the gallery covers your needs? If so, I will close the issue :)
Thanks! I did a quick test and everything seems OK. Just a warning: there is a typing error on the page http://tslearn.readthedocs.io/en/latest/gen_modules/neighbors/tslearn.neighbors.KNeighborsTimeSeries.html: in the metric parameter options, "eucidean" is written instead of "euclidean", I guess?
Just a question on the "nearest neighbors" example on the doc page: in the SAX part of the example you print "Nearest neighbor classification using SAX+L2". What do you mean by SAX+L2? I ran the example too and changed the metric parameter in KNeighborsTimeSeriesClassifier() between euclidean and dtw, and the classification values differ, so I think the SAX() classification example is not functioning properly.
Sorry for the late doubt.
OK, I will re-open the issue; I agree I should be more careful about that. I'll check ASAP (but cannot do it today or next week).
OK, thank you. I will try to develop a solution from the code that the SAX author has in MATLAB; if it works, I'll push it.
I was able to implement a solution, but the code is not very pretty. From the experiments I have done, it seems to function properly. I had a question: have you already used any cross-validation method on data retrieved and classified with tslearn? I wanted to apply it and was wondering if someone had already done or thought of it...
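One common way to combine cross-validation with a custom time-series distance is to precompute the full pairwise distance matrix once and slice it per fold with scikit-learn's `metric='precomputed'` mode. Below is a minimal sketch of that pattern; the toy data and Euclidean metric are stand-ins for a SAX representation and MINDIST, and all names are illustrative:

```python
import numpy as np
from sklearn.metrics import pairwise_distances
from sklearn.model_selection import KFold
from sklearn.neighbors import KNeighborsClassifier

# Toy stand-in for a SAX-transformed dataset (illustrative only)
rng = np.random.RandomState(0)
X = rng.randn(60, 16)
y = rng.randint(0, 2, size=60)

# Precompute all pairwise distances once, then slice the square matrix
# per fold: rows = samples to score, columns = training samples.
D = pairwise_distances(X, metric='euclidean')

accuracies = []
for train, test in KFold(n_splits=3, shuffle=True, random_state=0).split(X):
    knn = KNeighborsClassifier(n_neighbors=1, metric='precomputed')
    knn.fit(D[np.ix_(train, train)], y[train])
    accuracies.append(knn.score(D[np.ix_(test, train)], y[test]))

print(accuracies)
```

The advantage of this layout is that the (possibly expensive) distance computation runs only once, while each fold just indexes into the cached matrix.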
Hi @rtavenar, this seems like an ideal issue for me as well, as it will require interacting with sklearn utilities (a pipeline and a cross-validation object). Is a SAX transformation already available in tslearn (e.g. in the preprocessing module), and the MINDIST function (in the metrics module)? If not, I can look into implementing these as well.
That would be great! SAX is already available as a transformer here: https://github.com/rtavenar/tslearn/blob/master/tslearn/piecewise.py#L250 But I don't think MINDIST is defined anywhere.
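For reference, the SAX discretization that the transformer performs is simple to sketch with the standard library alone: z-normalize, average over segments (PAA), then discretize with Gaussian breakpoints that split N(0, 1) into equiprobable regions, as in Lin et al.'s original paper. The function name and example series below are illustrative, not tslearn's API:

```python
import statistics

def sax_word(series, n_segments, alphabet_size):
    """Sketch of SAX: z-normalize, PAA, then discretize with
    equiprobable Gaussian breakpoints."""
    mu = statistics.fmean(series)
    sd = statistics.pstdev(series) or 1.0  # guard against constant series
    z = [(v - mu) / sd for v in series]
    # Piecewise Aggregate Approximation: mean of each segment
    seg = len(z) // n_segments
    paa = [statistics.fmean(z[i * seg:(i + 1) * seg])
           for i in range(n_segments)]
    # Breakpoints splitting N(0, 1) into alphabet_size equal-mass regions
    nd = statistics.NormalDist()
    bps = [nd.inv_cdf(i / alphabet_size) for i in range(1, alphabet_size)]
    # Symbol index = number of breakpoints below the segment mean
    return [sum(v > b for b in bps) for v in paa]

word = sax_word([0.0, 0.1, 0.2, 1.5, 1.6, 1.4, -1.0, -1.2], 4, 4)
print(word)  # → [1, 2, 3, 0]
```

`statistics.NormalDist` requires Python 3.8+; tslearn's own transformer additionally handles multivariate series and ragged segment boundaries.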
Hi @rtavenar Took me some time, but I managed to take a look at this one. I wrote a script that does kNN (k=1) on SAX representations using the MINDIST function of the original paper. I reproduced their result on the SyntheticControl dataset (accuracy of 0.97 for kNN+SAX+MINDIST vs 0.88 for kNN with Euclidean distance on the raw time series). It is pasted below. I had to add BaseEstimator to the classes in preprocessing (TimeSeriesScalerMeanVariance & TimeSeriesScalerMinMax) to make it work. Shall I create a PR with this as a new example?

```python
# -*- coding: utf-8 -*-
"""
1-NN with SAX + MINDIST
=======================
This example performs kNN with k=1 on SAX transformations of the
SyntheticControl dataset. MINDIST from the original paper is used as a
distance metric.
"""
# Author: Gilles Vandewiele
# License: BSD 3 clause
import numpy

from tslearn.datasets import UCR_UEA_datasets
from tslearn.preprocessing import TimeSeriesScalerMeanVariance
from tslearn.piecewise import SymbolicAggregateApproximation

from sklearn.pipeline import Pipeline
from sklearn.metrics import pairwise_distances, accuracy_score, \
    confusion_matrix
from sklearn.preprocessing import FunctionTransformer
from sklearn.neighbors import KNeighborsClassifier

from scipy.stats import norm

numpy.random.seed(0)

# Load the SyntheticControl dataset
data_loader = UCR_UEA_datasets()
dataset = 'SyntheticControl'
X_train, y_train, X_test, y_test = data_loader.load_dataset(dataset)


def generate_lookup_table(a):
    """Breakpoints that split N(0, 1) into `a` equiprobable regions."""
    return [norm.ppf(i * (1. / a)) for i in range(a)][1:]


def calc_distances(X, y=None):
    X = X.reshape((X.shape[0], X.shape[1]))
    cardinality = numpy.max(X) + 1
    n = X_train.shape[1]  # length of the original time series
    w = X.shape[1]        # number of SAX segments
    table = generate_lookup_table(cardinality)

    def point_dist(i, j):
        # Symbols at most one breakpoint apart contribute zero
        i, j = int(i), int(j)
        if abs(i - j) <= 1:
            return 0
        return table[max(i, j) - 1] - table[min(i, j)]

    def sax_mindist(x, y):
        point_dists = [point_dist(x[i], y[i]) ** 2 for i in range(w)]
        # sqrt(n / w) scaling as in the original SAX paper
        return numpy.sqrt(n / w) * numpy.sqrt(numpy.sum(point_dists))

    return pairwise_distances(X, metric=sax_mindist)


# Currently, kNN cannot be appended to the pipeline, because the pipeline
# would transform X_test to a distance matrix of shape
# (X_test.shape[0], X_test.shape[0])
pipe = Pipeline([
    ('norm', TimeSeriesScalerMeanVariance()),
    ('transform',
     SymbolicAggregateApproximation(n_segments=16, alphabet_size_avg=10)),
    ('distance',
     FunctionTransformer(calc_distances, validate=False, pass_y=False))
])

all_X = numpy.vstack((X_train, X_test))
distances = pipe.transform(all_X)

# We only need the distances to the time series from the training set,
# both for the training and the test set.
X_train_dist = distances[:len(X_train), :len(X_train)]
X_test_dist = distances[len(X_train):, :len(X_train)]

knn = KNeighborsClassifier(n_neighbors=1, metric='precomputed')
knn.fit(X_train_dist, y_train)
predictions = knn.predict(X_test_dist)
print('Accuracy score on test set with kNN on SAX using MINDIST')
print(confusion_matrix(y_test, predictions))
print(accuracy_score(y_test, predictions))

knn = KNeighborsClassifier(n_neighbors=1, metric='euclidean')
knn.fit(X_train.reshape((X_train.shape[0], X_train.shape[1])), y_train)
predictions = knn.predict(X_test.reshape((X_test.shape[0], X_test.shape[1])))
print('Accuracy score on test set with kNN on raw ts using euclidean dist')
print(confusion_matrix(y_test, predictions))
print(accuracy_score(y_test, predictions))
```
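The distance computation in the script above can also be isolated into a small standalone function. Below is a pure-Python sketch of MINDIST between two SAX symbol words (integer indices in 0..alphabet_size-1), using the same equiprobable Gaussian breakpoints; the function name and example words are illustrative:

```python
import math
from statistics import NormalDist

def mindist(word_a, word_b, n, alphabet_size):
    """MINDIST between two SAX words of length w, for original series of
    length n. Symbols whose indices differ by at most 1 contribute zero,
    by construction of the equiprobable Gaussian breakpoints; the
    sqrt(n / w) factor makes it a lower bound on the Euclidean distance
    between the original series."""
    w = len(word_a)
    bps = [NormalDist().inv_cdf(i / alphabet_size)
           for i in range(1, alphabet_size)]

    def cell(i, j):
        if abs(i - j) <= 1:
            return 0.0
        return bps[max(i, j) - 1] - bps[min(i, j)]

    s = sum(cell(a, b) ** 2 for a, b in zip(word_a, word_b))
    return math.sqrt(n / w) * math.sqrt(s)

d = mindist([0, 3, 1, 2], [3, 0, 1, 2], n=128, alphabet_size=4)
print(round(d, 4))
```

Isolating it this way makes the lower-bounding behavior easy to unit-test (zero on identical words, symmetric) independently of any dataset.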
This is great! It would definitely deserve a PR, and we could discuss there whether MINDIST should be added as a new metric in tslearn and, if so, how we could do that.
I have been thinking about it as well, but the problem is that my code assumes that a SAX transformation has already been applied to the input data. We could perhaps add this as a private method and create an extra time-series metric.
Wait, I did not check that recently. Is your distance doing something different from this one? Sorry, all this is a bit old to me.
Oh haha, I did not notice that one either... It seems that one indeed does the same thing. I'll look into integrating that one later instead.
Hi Romain, Just to make sure, do you agree that having sax+mindist as a new metric would be a nice-to-have in tslearn? It can be integrated in the TimeSeriesKNeighborsClassifier/Regressor. I will probably have time this weekend to implement that. If you don't think it is an interesting addition, you can close this issue, as MINDIST and SAX were in here all along ;)
Hi,
I do think it would be an interesting addition (gallery example + additional metric in kNN estimators).
Not sure how to deal with it though, but if you have ideas, I'll be happy to read them.
Best,
Romain
I will give it a look this weekend and create a PR for it :) (if I find the time, at least ;) )
I guess this one can now be closed, @rtavenar :)
Yep, this was implemented in #152 and is now merged into the dev branch. It will be available from version 0.4.
When using this class, what values are available for the "metric" parameter? Only "dtw"? Any recommendation if I wanted to use the Euclidean distance or, for example, the SAX distance, when using this classifier on a dataset with a SAX representation?