
Should the number of smoothed_errors be equal to the total number of observations? (i.e. an anomaly score for each observation) #49

Closed
gorold opened this issue Nov 13, 2020 · 1 comment

gorold commented Nov 13, 2020

```python
# for values at beginning < sequence length, just use avg
if not channel.id == 'C-2':  # anomaly occurs early in window
    self.e_s[:self.config.l_s] = \
        [np.mean(self.e_s[:self.config.l_s * 2])] * self.config.l_s
```

I'm wondering whether the length of self.e_s should be equal to the total number of observations in the sequence. Each self.e_s is missing 260 observations.

The code snippet above does not add the 250 values to the front of the array; it only overwrites the existing first 250 elements. Something like this would add them: `self.e_s = np.insert(self.e_s, 0, [np.mean(self.e_s[:self.config.l_s * 2])] * self.config.l_s)`

However, there are still 10 missing observations in each self.e_s, perhaps due to the n_predictions option?
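
For concreteness, here is a standalone sketch of the padding I have in mind (l_s = 250 assumed here; presumably the trailing gap from n_predictions would need similar treatment):

```python
import numpy as np

l_s = 250                              # sequence length assumed in this sketch
e_s = np.random.rand(1740)             # stand-in for a smoothed error sequence

# np.insert adds l_s new elements at the front (rather than overwriting the
# existing first l_s values), so the padded array lines up with the observations.
pad_value = np.mean(e_s[:l_s * 2])
e_s_padded = np.insert(e_s, 0, [pad_value] * l_s)
print(len(e_s_padded) - len(e_s))      # 250
```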

khundman (Owner) commented

Thanks for the question, @gorold. l_s is the length of the input sequence passed to the LSTM, which uses that sequence to start generating predictions for timesteps > l_s. An input sequence of length l_s is required before any future predictions can be generated, so the resulting (smoothed) error sequences (self.e, self.e_s) have length len(observations) - l_s.
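
As a rough illustration (a schematic sketch, not the repo's actual data shaping code; the names below are made up):

```python
import numpy as np

# Schematic only: each model input is a window of l_s past values used to
# predict the timesteps that follow it, so timesteps 0 .. l_s-1 never receive
# a prediction or an error value.
l_s = 250
obs = np.random.randn(2000)            # hypothetical test channel

windows = np.stack([obs[t - l_s:t] for t in range(l_s, len(obs))])
targets = obs[l_s:]                    # one target per predicted timestep

print(windows.shape, targets.shape)    # (1750, 250) (1750,)
print(len(obs) - len(targets))         # 250 == l_s
```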

The portion of errors.py that you referenced isn't intended to restore the sequence to the original observation length. The conversion of early errors to a mean value accounts for large error spikes from the initial predictions that are not smoothed out by the exponentially-weighted moving average. If you remove the lines you reference above and re-run, you can see this effect.
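
To illustrate the effect (a rough sketch using pandas' ewm, not necessarily the exact smoothing code in errors.py):

```python
import numpy as np
import pandas as pd

l_s = 250
e = np.abs(np.random.normal(0, 0.05, 1500))   # synthetic prediction errors
e[:5] = 5.0                                   # large spikes from the first predictions

# Exponentially-weighted smoothing; the span here is arbitrary for the sketch.
e_s = pd.Series(e).ewm(span=120).mean().values
print(e_s[:5])                                # the early spikes still dominate

# Overwriting the first l_s values with a local mean removes those spikes.
e_s[:l_s] = np.mean(e_s[:l_s * 2])
```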

The additional delta of 10 from the original length of the observations is due to n_predictions. We need 10 future test values to generate the loss that is backpropagated during training (see #24 (comment)) and to evaluate the normalized loss during testing. This isn't a requirement for inference, however, and I'd welcome a PR to address this.
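
Putting both offsets together (l_s = 250 and n_predictions = 10 assumed here, matching the numbers in the question):

```python
import numpy as np

l_s, n_predictions = 250, 10
observations = np.random.randn(2000)    # hypothetical test channel

# l_s timesteps of history are needed before the first prediction, and the
# final n_predictions timesteps are reserved as multi-step targets, so the
# (smoothed) error sequence is shorter by l_s + n_predictions.
n_errors = len(observations) - l_s - n_predictions
print(len(observations) - n_errors)     # 260, the gap noted in the question
```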
