Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dealing with different l_s values #27

Merged
merged 2 commits into from Nov 23, 2019
Merged

Conversation

diogodcarvalho
Copy link
Contributor

Hello @khundman,

First of all congrats for the very nice work you have done with telemanom. After trying it out I noticed there was an issue when I changed the l_s values. The tests to check if the anomalies were detected in the expected range started giving unexpected results (although I could see the anomaly was actually being detected) and also the colored plot boxes seemed to be shifted in the jupyter notebook. I think you had already mentioned these problems of non adaptability of the code to l_s in issue #5

After going through the code what I understood is that the indices you made available in the .csv file don't represent the position of the anomaly in the initial array saved in the .npy file, but the position in the array y_test that is created after you pre-process the data for training (which indices are shifted l_s to the left as the first points are used for the prediction). Correct me if I am wrong on this one, but from what I experimented it seems to be this (hit my head against the walls a few times to understand what was the problem :) )

To solve this issue what I propose and have implemented is:

  1. Shift the indices in the .csv file to +250 so that they represent the index in the original array, this way we don't have to hard-code the value 250 somewhere in the code + anyone who wants to use their own data can create a labeled_anomalies.csv file like the one you have can do it without having to take into consideration this +250 thing. A problem that this +250 generated is that some indices went over the maximum length of the array. This happens for A-8/9, D-1/4/7/9/12, E-3, F-2/8 and T-2, in these cases I just set the index to the array length - 1. I think leaving them bigger than the maximum length wouldn't hurt the code anywhere, but it might generate questions from other people and you had already raised this question in issue Labeled anomalous sequences extend past end of sequences #6 also.

  2. Changed just a few lines in the code to take into consideration the chosen (variable) l_s value. This changes only happen right before returning E_seq back to the main file and on the plotting functions.

I re-run the code and everything seems to be running smoothly. In this commit I also included the results with this new indexing as the old results had to be removed because they contained the 'wrong' indexing according to the new rules.

I thought this was the best alternative to change as less code as possible, avoid hard coding values plus make it as general and user friendly as possible. Hope it helps !

@khundman khundman merged commit 4057b4d into khundman:master Nov 23, 2019
@khundman
Copy link
Owner

Thanks @diogodcarvalho, this makes sense to me and I really appreciate the work! I went ahead and merged and I'm also going to push a new version shortly that incorporates your changes as well as some other significant updates.

@khundman
Copy link
Owner

In my latest commit you can see your changes reflected in labeled_anomalies.csv as well as here

# additional shift is applied to indices so that they represent the
# position in the original data array, obtained from the .npy files,
# and not the position on y_test (See PR #27).
self.E_seq = [(e_seq[0] + self.config.l_s,
e_seq[1] + self.config.l_s) for e_seq in self.E_seq]
and here
'x0': r[0] - self.config.l_s,
'y0': _min,
'x1': r[1] - self.config.l_s,

@diogodcarvalho
Copy link
Contributor Author

Glad to help. I will then try out the new version.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants