ignore_small_changes on BigQuery #1546

Open
Emmanuel2RPA opened this issue Jun 5, 2024 · 1 comment
@Emmanuel2RPA

Hello, this is a copy of my message on Slack.

Platform: BigQuery

I have a problem with the ignore_small_changes parameter: when I set spike_failure_percent_threshold to 2%, I get a very strange result for the threshold range in my volume anomaly test; it is literally a 2-fold increase over the training_avg value.
After a little research, it seems this is caused by an operator-precedence problem. Let me explain:
The max value for the volume anomaly test is calculated in get_anomaly_scores_query.sql (tell me if I'm saying something wrong), with the following code:

    {%- set max_val -%}
      {{ test_configuration.anomaly_sensitivity }} * training_stddev + training_avg
    {%- endset -%}
    {% if test_configuration.ignore_small_changes.spike_failure_percent_threshold %}
      {%- set spike_avg_threshold -%}
        ((1 + {{ test_configuration.ignore_small_changes.spike_failure_percent_threshold }}/100.0) * training_avg)
      {%- endset -%}
      {%- set max_val -%}
        {{ elementary.arithmetic_max(spike_avg_threshold, max_val) }}
      {%- endset -%}
    {% endif %}

The elementary.arithmetic_max function is then used. This function is relatively simple and returns, as its name suggests, the larger of spike_avg_threshold and max_val, i.e. f(x, y) = 0.5 * (x + y + |x - y|).
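
For reference, assuming the macro simply interpolates its two arguments into that formula as strings, without adding parentheses of its own (which is what the rendered SQL below suggests), it would expand roughly to:

    -- sketch of arithmetic_max(x, y), assumed from the rendered output below
    0.5 * (x + y + abs(x - y))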

If we now look at the final calculation in the SQL query on BigQuery, we obtain the following field:

    (0.5 * (((1 + 100/100.0) * training_avg) + 2 * training_stddev + training_avg + abs(((1 + 100/100.0) * training_avg) - 2 * training_stddev + training_avg))) end as max_metric_value,

Here spike_avg_threshold = ((1 + 100/100.0) * training_avg) = x and max_val = 2 * training_stddev + training_avg = y.
We can now easily see that the calculation goes wrong because of the "2 * training_stddev + training_avg" part: since max_val is interpolated as a bare string, the subtraction inside abs() applies only to 2 * training_stddev, and training_avg is then added back by operator precedence instead of being subtracted as part of y. The expression is therefore wrong and returns something other than the expected result, namely the larger of spike_avg_threshold and max_val.
For example, if I set spike_failure_percent_threshold: 2, I get an acceptance margin multiplied by 2 (a 100% increase) and not just 2%.
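
To make this concrete, here is a quick check with hypothetical values (training_avg = 100, training_stddev = 10, spike_failure_percent_threshold = 2, so x = 102 and y = 120):

    -- hypothetical values: x = spike_avg_threshold = 102, y = max_val = 120
    select
      0.5 * (102 + 120 + abs(102 - 120))           as intended_max, -- = 120, i.e. max(x, y)
      0.5 * (102 + 20 + 100 + abs(102 - 20 + 100)) as actual_max    -- = 202, roughly 2 * training_avg

The broken form gives 202 instead of 120, which matches the 2-fold increase over training_avg I observed.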

Logically, it should "be enough" to add parentheses around this problematic part to resolve the error, but I wanted to explain it to you to make sure I hadn't missed anything or made a mistake.
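
For illustration, one way to do that (a sketch of the idea, not the actual patch) is to wrap max_val in parentheses when it is first set, so it stays a single term wherever it is later interpolated:

    {%- set max_val -%}
      ({{ test_configuration.anomaly_sensitivity }} * training_stddev + training_avg)
    {%- endset -%}

The same would apply to any other expression that is stored as a string and substituted into a larger formula.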

I'm available if my explanation isn't clear.
Good day to you.

Emmanuel2RPA added the Bug (Something isn't working) and Triage 👀 labels on Jun 5, 2024
@haritamar (Collaborator)

Hi @Emmanuel2RPA!
Thanks for opening this and for the detailed explanation. I believe this bug was actually fixed here, though we haven't released it in an official version just yet; we'll do so soon.
There were indeed missing parentheses.
