window_size and batch_size in config.yaml #45

Closed
kennethweitzel opened this issue Sep 29, 2020 · 1 comment


@kennethweitzel

Hello @khundman,

To my understanding, window_size and batch_size together determine the number of historical error values h (window_size * batch_size = h).

telemanom/config.yaml, lines 7 to 11 at commit 26831a0:

```yaml
# number of values to evaluate in each batch
batch_size: 70
# number of trailing batches to use in error calculation
window_size: 30
```
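
A quick check of the relationship described above, h = window_size * batch_size, using the values from this config. The dict stands in for the parsed YAML file (reading the file itself is omitted), and `history_length` is a hypothetical helper name, not part of telemanom.

```python
# Values copied from telemanom/config.yaml (lines 7-11 at commit 26831a0).
config = {
    "batch_size": 70,   # number of values to evaluate in each batch
    "window_size": 30,  # number of trailing batches to use in error calculation
}


def history_length(cfg: dict) -> int:
    """Hypothetical helper: number of historical error values h."""
    return cfg["window_size"] * cfg["batch_size"]


h = history_length(config)
print(h)  # 2100
```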

I also know that values will be aggregated in windows of one minute and processed in batches of 70 minutes, as stated in your paper:

> Telemetry values are aggregated into one minute windows and evaluated in batches of 70 minutes mimicking the downlink schedule for SMAP and our current system implementation.

I assume that means that one minute contains 30 values (so 1 value per 2 seconds). Is that correct?
The parameter h is then used to calculate the dynamic threshold and evaluate each batch.

Could you explain the reason for h to be divided into 2 separate parameters? Why can't there be a single h parameter of 2100 instead of 30 * 70 (window_size * batch_size) to define each batch? Is there a way to efficiently configure these two parameters for a use case not dealing with SMAP?

Thank you in advance!

@khundman (Owner)

> I assume that means that one minute contains 30 values (so 1 value per 2 seconds). Is that correct?

No, the processed SMAP telemetry used in this experiment contains 1 value per minute per channel. Once 70 minutes pass, this data comes down from the spacecraft in a batch (approx. 70 values per channel).
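
Under the cadence described above (one value per minute per channel, downlinked in batches of about 70), the default 30 × 70 = 2100 trailing values cover roughly 35 hours of telemetry. The sketch below groups a single channel's series into downlink-sized batches. It assumes batches of exactly 70 values (real downlinks are only approximately that size), the series is synthetic placeholder data, and `split_into_batches` is a hypothetical helper, not a telemanom function.

```python
from typing import List, Sequence

MINUTES_PER_VALUE = 1  # processed SMAP telemetry: 1 value per minute per channel
BATCH_SIZE = 70        # values per downlink batch (assumed exact here)
WINDOW_SIZE = 30       # trailing batches of history


def split_into_batches(values: Sequence[float], batch_size: int) -> List[List[float]]:
    """Group a per-channel series into consecutive batches.

    The last batch is shorter if len(values) is not a multiple of batch_size.
    """
    return [list(values[i:i + batch_size]) for i in range(0, len(values), batch_size)]


# Synthetic channel: exactly WINDOW_SIZE batches of one-minute values.
series = [0.0] * (WINDOW_SIZE * BATCH_SIZE)
batches = split_into_batches(series, BATCH_SIZE)
history_minutes = len(series) * MINUTES_PER_VALUE

print(len(batches))          # 30
print(history_minutes / 60)  # 35.0 (hours of telemetry)
```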

> Could you explain the reason for h to be divided into 2 separate parameters?

These two numbers represent different aspects of the problem. As described above, batch_size (70) is the number of values we receive in each batch. This is not a streaming scenario, so we are restricted to processing these batches as they arrive. For a different problem, this value would be set to whatever that problem's batch size/cadence is.

The window_size is a tunable parameter that balances the tradeoff between 1) providing more historical context to the algorithm for evaluating the severity of an anomaly (larger window) and 2) computational and memory efficiency (larger windows cost more).
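
One way to see why the two parameters are kept separate: in a batch setting, batch_size fixes the step (how many new values arrive and are evaluated at a time), while window_size only controls how much trailing context is used when scoring them. The simplified sketch below slides over a per-channel error series one batch at a time and thresholds each new batch against the trailing window_size * batch_size errors. It is an illustration under assumptions, not telemanom's implementation: the mean + z * standard deviation threshold is a placeholder for the paper's nonparametric dynamic threshold, and `evaluate_batches` is a hypothetical name.

```python
import statistics
from typing import List, Sequence


def evaluate_batches(errors: Sequence[float], batch_size: int, window_size: int,
                     z: float = 3.0) -> List[List[int]]:
    """Score a per-channel error series one batch at a time (hypothetical sketch).

    Each step evaluates the newest batch of `batch_size` errors against a
    threshold computed from the trailing h = window_size * batch_size errors
    (the window includes the newest batch). Batches that end before a full
    window is available, and any trailing partial batch, are not evaluated.

    Placeholder threshold: mean + z * population stdev of the window.
    This is NOT telemanom's nonparametric dynamic threshold.
    """
    h = window_size * batch_size
    flagged: List[List[int]] = []
    for end in range(h, len(errors) + 1, batch_size):  # advance by one batch
        window = errors[end - h:end]                     # trailing context (h values)
        threshold = statistics.mean(window) + z * statistics.pstdev(window)
        start = end - batch_size                         # newest batch in the window
        flagged.append([i for i in range(start, end) if errors[i] > threshold])
    return flagged


# Tiny synthetic example: batch_size=5, window_size=4 -> h=20.
errors = [0.1, 0.2] * 20  # 40 placeholder error values
errors[37] = 5.0          # injected spike in the last batch
print(evaluate_batches(errors, batch_size=5, window_size=4))
# [[], [], [], [], [37]]
```

For the SMAP setting this sketch would be called as `evaluate_batches(errors, batch_size=70, window_size=30)`. In this sketch, a single h of 2100 would describe only the window length; the evaluation step would still have to be specified separately, which is the role batch_size plays.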
