Can you share the code for preprocessing and explain the meaning of each index of the data? #3

Closed
NoPainNoCode opened this issue Jul 13, 2020 · 1 comment

NoPainNoCode commented Jul 13, 2020

  1. Can you share the code used to preprocess the npz files in the dataset folder?
  2. Can you also explain in detail the meaning of each index of the data shown [below]?

[below]
import numpy as np

data = np.load('./machine_temp.npz', mmap_mode='r', allow_pickle=True)
for i, k in enumerate(data.files):
    print("i:{}, k:{}".format(i, k))

==========result==========
i:0, k:t
i:1, k:t_unit
i:2, k:readings
i:3, k:idx_anomaly
i:4, k:idx_split
i:5, k:training
i:6, k:test
i:7, k:train_m
i:8, k:train_std
i:9, k:t_train
i:10, k:t_test
i:11, k:idx_anomaly_test
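Listing only the key names says little about what each entry holds. A slightly more informative check is to record each array's shape and dtype next to its key. The sketch below uses only `np.load` and the `files` attribute of the returned archive; the helper name `describe_npz` is hypothetical, and `allow_pickle=True` simply mirrors the snippet above in case some entries are object arrays.

```python
import numpy as np


def describe_npz(path):
    """Return {key: (shape, dtype)} for every array in an .npz archive (hypothetical helper)."""
    summary = {}
    # np.load returns an NpzFile for .npz archives; using it as a context
    # manager closes the underlying file once we are done reading.
    with np.load(path, allow_pickle=True) as data:
        for key in data.files:
            arr = data[key]  # arrays are loaded lazily, one key at a time
            summary[key] = (arr.shape, arr.dtype)
    return summary
```

For example, `describe_npz('./machine_temp.npz')` would show which entries are full-length series (such as `readings`) and which are short index arrays or scalars (such as `idx_split` or `t_unit`).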

lin-shuyu (Owner) commented

Hi NoPainNoCode,

Thanks for your question!

I've added a demo IPython notebook in the datasets/ folder. Please have a look there for the detailed pre-processing procedure. In summary, the only preprocessing was standardisation: we subtracted the mean and divided by the standard deviation of the original time series.
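Standardisation of this kind is the z-score transform z = (x - mean) / std. A minimal NumPy sketch follows; it is not the notebook's code, and the helper name `standardise` and its `ref` parameter (the segment the statistics are computed from, defaulting to the whole series) are hypothetical.

```python
import numpy as np


def standardise(readings, ref=None):
    """Z-score a 1-D series: (x - mean) / std (hypothetical helper).

    `ref` is the segment whose mean and std are used; it defaults to the
    whole series. The mean and std are returned as well, so the transform
    can be inverted (x = z * std + mean) or reapplied to other data.
    Note: a constant reference segment (std == 0) would divide by zero.
    """
    x = np.asarray(readings, dtype=float)
    ref = x if ref is None else np.asarray(ref, dtype=float)
    m, s = ref.mean(), ref.std()  # np.std defaults to ddof=0 (population std)
    return (x - m) / s, m, s
```

Returning the statistics alongside the series matters for anomaly detection: model outputs in normalised units can be mapped back to the original scale with `z * s + m`. Whether the notebook uses `ddof=0` is not stated in this thread.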

As for the meaning of each key in the loaded data, here is an explanation:

  1. t - timestamp for each reading in the time series.
  2. t_unit - unit for the interval between two consecutive timestamps.
  3. readings - the original time series values; the same as the time series loaded from the original .csv file.
  4. idx_anomaly - indices where the anomalies occur, computed from the anomaly timestamps in the original .csv file.
  5. idx_split - the start and end indices of the section used as the training set. We took a section of the original time series in which no anomalies occur as the training set.
  6. training - normalised time series for the training set.
  7. test - normalised time series for the test set.
  8. t_train - indices for the training set readings.
  9. t_test - indices for the test set readings.
  10. idx_anomaly_test - indices for the anomalies in the test set.
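To show how these keys could fit together, here is a hypothetical reconstruction of the split from a raw series. It is a sketch under stated assumptions, not the notebook's code: it assumes `idx_split` is a half-open `[start, end)` range, that the test section is given by an explicit `test_range` (defaulting to the whole series), and that `train_m` / `train_std` (listed in the archive but not described above) are the mean and std of the training section. The function name `build_split` is also hypothetical.

```python
import numpy as np


def build_split(readings, idx_anomaly, idx_split, test_range=None):
    """Hypothetical reconstruction of the training/test keys from a raw series.

    Assumptions (not confirmed by the notebook):
      * idx_split = (start, end) is the half-open range of the anomaly-free
        training section;
      * test_range = (start, end) selects the test section, defaulting to the
        whole series;
      * train_m / train_std are the training section's mean and std.
    """
    x = np.asarray(readings, dtype=float)
    idx_anomaly = np.asarray(idx_anomaly, dtype=int)
    tr0, tr1 = idx_split
    te0, te1 = (0, len(x)) if test_range is None else test_range

    # The training section is supposed to contain no anomalies.
    if np.any((idx_anomaly >= tr0) & (idx_anomaly < tr1)):
        raise ValueError("training section contains anomalies")

    train_m = x[tr0:tr1].mean()
    train_std = x[tr0:tr1].std()
    normalised = (x - train_m) / train_std

    # Keep anomalies inside the test range, re-indexed relative to its start.
    in_test = (idx_anomaly >= te0) & (idx_anomaly < te1)
    return {
        "training": normalised[tr0:tr1],
        "test": normalised[te0:te1],
        "train_m": train_m,
        "train_std": train_std,
        "t_train": np.arange(tr0, tr1),
        "t_test": np.arange(te0, te1),
        "idx_anomaly_test": idx_anomaly[in_test] - te0,
    }
```

Making `test_range` explicit keeps the re-indexing of `idx_anomaly_test` visible: if the test set is the whole series, its anomaly indices coincide with `idx_anomaly`; otherwise they shift by the test section's start index.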

Hope this explanation is helpful for you!

Best wishes,
Lin
