Shapelet Transform #135
Comments
Hi, I don't think that the shapelet transform algorithm is suited for your dataset. From what I understood, you have a single time series (with the values corresponding to variable
I would need more information about your data because, from what I understood, the shapelet transform algorithm is not suited to the data snippet that you gave.
Thanks for your brief answer. I have one huge time series that I have sliced into a number of parts (data snippets), and every snippet is similar to the one above. What I would like to do is shapelet discovery first and then a shapelet transform, in order to detect anomalies in the time series data. That is why I was using the pyts library. As you said earlier, should I use every time snippet in the repository? Thanks.
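The slicing step described above can be sketched as follows. This is only an illustration with NumPy: the helper name `slice_into_snippets`, the synthetic sine series, and the snippet length of 100 are assumptions, not taken from the original data.

```python
import numpy as np


def slice_into_snippets(series, snippet_length):
    """Split a 1-D series into non-overlapping snippets of equal length.

    Trailing values that do not fill a whole snippet are dropped.
    (Hypothetical helper, not part of pyts.)
    """
    series = np.asarray(series, dtype=float)
    n_snippets = series.size // snippet_length
    if n_snippets == 0:
        raise ValueError("series is shorter than snippet_length")
    return series[:n_snippets * snippet_length].reshape(n_snippets, snippet_length)


# Synthetic stand-in for the long time series.
long_series = np.sin(np.linspace(0, 20 * np.pi, 1050))
X = slice_into_snippets(long_series, snippet_length=100)
print(X.shape)  # (10, 100): one row per snippet
```

The resulting 2-D array has one snippet per row, which is the layout (n_samples, n_timestamps) that pyts transformers expect as input.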
In this case, what you could do:
Here is a minimal working example:
import numpy as np
from pyts.datasets import load_gunpoint
from pyts.transformation import ShapeletTransform
# Load a dataset (only the training series are used here)
X, _, _, _ = load_gunpoint(return_X_y=True)
# Compute fake labels.
# This trick is needed because the current implementation requires labels
# to select the most discriminative shapelets, but it's irrelevant in your use case.
n_snippets = X.shape[0]
y = np.r_[np.zeros(n_snippets // 2), np.ones(n_snippets - n_snippets // 2)]
# Compute all the shapelets of length 9.
# Here we set 'n_shapelets' to a very high integer so that no selection is performed.
# This number must be higher than X.shape[0] * (X.shape[1] - window_size + 1) to avoid selection.
clf = ShapeletTransform(n_shapelets=int(1e9), window_sizes=[9])
X_new = clf.fit_transform(X, y)
# Compute statistics to identify anomalies.
# Here we find the shapelet with the highest mean distance to all the snippets.
# X_new has shape (n_snippets, n_shapelets), so we average over the snippets (axis=0).
idx = X_new.mean(axis=0).argmax()
print(idx)
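To make the statistic in the example above more concrete, here is a NumPy-only sketch of a snippet-to-shapelet distance matrix. It is not the pyts implementation: the distance used here (minimum Euclidean distance over all windows of a snippet) is an assumption, and pyts may compute distances differently. The helper names and the synthetic data with an injected bump are also made up for illustration.

```python
import numpy as np


def all_windows(X, window_size):
    """Return every subsequence of length `window_size` from every row of X.

    Output shape: (n_snippets * (n_timestamps - window_size + 1), window_size).
    """
    windows = np.lib.stride_tricks.sliding_window_view(X, window_size, axis=1)
    return windows.reshape(-1, window_size)


def distance_matrix(X, shapelets):
    """Distance of each snippet (rows) to each shapelet (columns).

    Assumed distance: minimum Euclidean distance over all windows of the snippet.
    """
    window_size = shapelets.shape[1]
    # windows has shape (n_snippets, n_windows, window_size)
    windows = np.lib.stride_tricks.sliding_window_view(X, window_size, axis=1)
    diffs = windows[:, :, None, :] - shapelets[None, None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))  # (n_snippets, n_windows, n_shapelets)
    return dists.min(axis=1)                    # (n_snippets, n_shapelets)


rng = np.random.default_rng(0)
X = rng.normal(size=(6, 30))   # 6 synthetic snippets of 30 points
X[2, 10:19] += 8.0             # inject an unusual bump into snippet 2

window_size = 9
shapelets = all_windows(X, window_size)
D = distance_matrix(X, shapelets)
print(shapelets.shape, D.shape)  # (132, 9) (6, 132)

# Mean distance of each candidate shapelet to all snippets (average over rows).
mean_per_shapelet = D.mean(axis=0)
most_unusual = int(mean_per_shapelet.argmax())
snippet_of_origin = most_unusual // (X.shape[1] - window_size + 1)
print(most_unusual, snippet_of_origin)
```

With one column per candidate shapelet, averaging over the rows gives one score per shapelet, and the highest score points back to the snippet that contains the injected bump.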
Hi , |
The shapelets are saved in the fitted transformer; the example below reads them from clf.shapelets_. Here is a very minimal working example (following the previous one):
import matplotlib.pyplot as plt
plt.plot(clf.shapelets_[idx], 'o-')
plt.show()
Hi, thanks for your quick reply. Here is a glimpse of my code for the shapelets. I used pretty much your implementation: I created the matrix by breaking my time series down into snippets with the split function, and then built the y variable. I have also seen an example of the shapelet transform and its plot with matplotlib, which is quite interesting. Thanks
|
Your code looks fine to me. |
I mean, if you could help me with the visualization of the shapelets, as in the pyts example shown in the link above, that would be great.
You should not use the first n shapelets ( |
Thanks for your comment
|
I think that it is relevant, but as I said, there is a big issue with this code. Here, you pick the first 2 shapelets ( |
ok ,
|
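You don't want to use
clf = ShapeletTransform(n_shapelets=int(1e9), window_sizes=[9])
X_new = clf.fit_transform(X, y)
# Sort the shapelets by decreasing mean distance to all the snippets.
# X_new has shape (n_snippets, n_shapelets), so we average over the snippets (axis=0).
indices = np.argsort(X_new.mean(axis=0))[::-1]
for i, index in enumerate(indices[:4]):
    # Each entry of indices_ is (sample index, start, end) of a shapelet
    idx, start, end = clf.indices_[index]
    plt.plot(X[idx], color='C{}'.format(i), label='Sample {}'.format(idx))
    plt.plot(np.arange(start, end), X[idx, start:end], lw=5, color='C{}'.format(i))
plt.xlabel('Time', fontsize=12)
plt.title('Shapelets', fontsize=14)
plt.legend(loc='best', fontsize=8)
plt.show()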
OK! Can we use a window size other than 9? Could you please tell me how to decide that?
Sorry for not answering this point! You can use any window size (as long as it is between 1 and the length of the snippet). You can try out several values; I would guess that the window size should depend on your use case (in terms of actual time: how many seconds, minutes, hours, days, weeks, months, or years).
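One way to turn a duration in actual time into a window size is sketched below. The helper `window_size_from_duration` is hypothetical, and the sampling period of 0.1 s and the snippet length of 600 samples are assumed values for illustration only.

```python
def window_size_from_duration(duration_s, sampling_period_s, snippet_length):
    """Convert a duration in seconds into a window size in samples.

    Hypothetical helper: the result is clamped to the valid range,
    i.e. at least 1 sample and at most the snippet length.
    """
    size = round(duration_s / sampling_period_s)
    return max(1, min(size, snippet_length))


# Assumed values: one sample every 0.1 s, snippets of 600 samples.
for duration in (0.5, 2.0, 10.0, 120.0):
    print(duration, window_size_from_duration(duration, 0.1, 600))
```

Each candidate value could then be tried in turn through the `window_sizes` parameter to see which time scale highlights the patterns of interest.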
OK, understood.
So is this the only way to get shapelets in learning? What if I want the shapelets with the maximum and the minimum mean distance from the clf?
To learn shapelets, you need labels (i.e., each data snippet is labeled). This is supervised learning. From what I understood, you don't have labels (because you only have one time series), and you want to do anomaly detection with unsupervised learning. So the learning shapelet approach is not relevant in your case (unless you have labels for your data snippets). |
Resolved |
Hello everyone,
I have a dataset like the one below, where Q0 is the feature value and TS is the timestamp, and I would like to apply the shapelet transform to this CSV file. I have written code for this, but it throws the following error:
ValueError: could not convert string to float: '2018-03-02 00:58:19.202450'
Q0 TS
0.012364804744720459, 2018-03-02 00:44:51.303082
0.012344598770141602, 2018-03-02 00:44:51.375207
0.012604951858520508, 2018-03-02 00:44:51.475198
0.012307226657867432, 2018-03-02 00:44:51.575189
0.012397348880767822, 2018-03-02 00:44:51.675180
0.013141036033630371, 2018-03-02 00:44:51.775171
0.012811839580535889, 2018-03-02 00:44:51.875162
0.012950420379638672, 2018-03-02 00:44:51.975153
0.013257980346679688, 2018-03-02 00:44:52.075144
########################################
Code:
from sklearn.linear_model import LogisticRegression
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from pyts.datasets import load_gunpoint
from pyts.transformation import ShapeletTransform
from datetime import time
# Toy dataset
data=pd.read_csv('dataset11.csv')
pf=data.head(10)
y=data[['Q0']]
X=data[['TS']]
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.2,random_state=10)
print(X_train)
# as columns.
dataframe = pd.DataFrame(
    pf,columns=['TS', 'Q0'])
# Changing the datatype of Date, from
# Object to datetime64
#dataframe["Sample2"] = Sample2.time.strptime("%T")
# Setting the Date as index
dataframe = dataframe.set_index("TS")
dataframe
# setting figure size to 12, 10
plt.figure(figsize=(12, 6))
# Labelling the axes and setting
# a title
plt.xlabel("Time")
plt.ylabel("Values")
plt.title("Vibration")
# plotting the "A" column alone
plt.plot(dataframe["Q0"])
plt.legend(loc='best', fontsize=8)
plt.show()
st = ShapeletTransform(window_sizes='auto', sort=True)
X_new = st.fit_transform(X_train, y_train)
print(X_new)
# Visualize the four most discriminative shapelets
plt.figure(figsize=(6, 4))
for i, index in enumerate(st.indices_[:4]):
    idx, start, end = index
    plt.plot(X_train[idx], color='C{}'.format(i),
             label='Sample {}'.format(idx))
    plt.plot(np.arange(start, end), X_train[idx, start:end],
             lw=5, color='C{}'.format(i))
plt.xlabel('Time', fontsize=12)
plt.title('The four most discriminative shapelets', fontsize=14)
plt.legend(loc='best', fontsize=8)
plt.show()
######################################
Can anyone help me run this code and visualize the shapelet transform?
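The ValueError comes from passing the TS column (timestamp strings) as the input X, while the Q0 values are passed as y. A transformer such as ShapeletTransform expects a numeric array of shape (n_samples, n_timestamps), with one label per sample. A minimal sketch of one way to prepare the data is shown below, using only the standard library and NumPy; the inline rows are copied from the excerpt above, and the snippet length of 3 is a hypothetical value chosen only to keep the example small.

```python
import csv
import io
from datetime import datetime

import numpy as np

# A few rows in the same layout as the CSV excerpt in the question.
raw = """Q0,TS
0.012364804744720459, 2018-03-02 00:44:51.303082
0.012344598770141602, 2018-03-02 00:44:51.375207
0.012604951858520508, 2018-03-02 00:44:51.475198
0.012307226657867432, 2018-03-02 00:44:51.575189
0.012397348880767822, 2018-03-02 00:44:51.675180
0.013141036033630371, 2018-03-02 00:44:51.775171
0.012811839580535889, 2018-03-02 00:44:51.875162
0.012950420379638672, 2018-03-02 00:44:51.975153
0.013257980346679688, 2018-03-02 00:44:52.075144
"""

rows = list(csv.DictReader(io.StringIO(raw), skipinitialspace=True))

# The measurements (Q0) are the time series; they must be numeric.
values = np.array([float(row["Q0"]) for row in rows])

# The timestamps only give the ordering and the sampling period;
# they are not passed to the transformer.
timestamps = [datetime.fromisoformat(row["TS"]) for row in rows]
period = (timestamps[-1] - timestamps[0]).total_seconds() / (len(timestamps) - 1)

# Slice the single series into equal-length snippets (hypothetical length).
snippet_length = 3
n_snippets = values.size // snippet_length
X = values[:n_snippets * snippet_length].reshape(n_snippets, snippet_length)

print(X.shape)           # (3, 3): numeric input, one snippet per row
print(round(period, 4))  # average sampling period in seconds
```

An array built this way contains only floats, so it no longer triggers the string-to-float conversion error; labels for the snippets still have to be supplied separately.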
shapelet.txt