[ML] Degrees of freedom argument errors while running the "gallery" dataset #20

Closed
dolaru opened this issue Mar 19, 2018 · 1 comment

@dolaru

commented Mar 19, 2018

Spotted in 6.3.0

When the gallery dataset is analysed, several hundred error messages like the following are written to the Elasticsearch logs:

[2018-03-19T17:24:26,319][ERROR][o.e.x.m.j.p.l.CppLogMessageHandler] [gallery1_20180319-1719_000_0] [autodetect/8041] [CStatisticalTests.cc@103] Failed to compute significance Error in function fisher_f_distribution<double>::fisher_f_distribution: Degrees of freedom argument is 0, but must be > 0 ! d1 = 0, d2 = 8, x = 0
[2018-03-19T17:24:26,327][ERROR][o.e.x.m.j.p.l.CppLogMessageHandler] [gallery1_20180319-1719_000_0] [autodetect/8041] [CPeriodicityHypothesisTests.cc@251] Bad input: Error in function boost::math::chi_squared_distribution<double>::chi_squared_distribution: Degrees of freedom argument is 0, but must be > 0 !, df = 0, percentage = 90

Analysis config:

{
        "bucket_span" : "1h",
        "detectors" : [
          {
            "detector_description" : "rare by status over clientip",
            "function" : "rare",
            "by_field_name" : "status",
            "over_field_name" : "clientip"
          },
          {
            "detector_description" : "freq_rare by uri over clientip",
            "function" : "freq_rare",
            "by_field_name" : "uri",
            "over_field_name" : "clientip"
          },
          {
            "detector_description" : "high_count by status over clientip",
            "function" : "high_count",
            "by_field_name" : "status",
            "over_field_name" : "clientip"
          },
          {
            "detector_description" : "high_count by uri over clientip",
            "function" : "high_count",
            "by_field_name" : "uri",
            "over_field_name" : "clientip"
          },
          {
            "detector_description" : "sum(bytes) by method over clientip",
            "function" : "sum",
            "field_name" : "bytes",
            "by_field_name" : "method",
            "over_field_name" : "clientip"
          }
        ],
        "influencers" : [
          "clientip"
        ]
      }

Full log:
elasticsearch.log

@dolaru dolaru added >bug :ml labels Mar 19, 2018

@tveasey tveasey closed this in #23 Mar 26, 2018

tveasey added a commit that referenced this issue Mar 26, 2018

Guard all remaining unguarded calls to create a chi^2 distribution for the case d.f. 0 (#23)

This guards all remaining calls to create a chi^2 distribution in the tests for periodicity to avoid 
creating with zero degrees freedom. Fixes #20.
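
As a rough illustration of the kind of guard this commit message describes (not the actual change in #23), the sketch below refuses to evaluate a chi^2 quantile unless the degrees of freedom are strictly positive. The names `guardedChiSquaredQuantile` and `TQuantile` are hypothetical. The callable stands in for whatever constructs the `boost::math::chi_squared_distribution` in the real code, so that the example needs only the standard library.

```cpp
#include <functional>
#include <optional>

// Hypothetical stand-in for the code that builds the chi^2 distribution
// and evaluates its quantile. It is an assumption for illustration only.
using TQuantile = std::function<double(double df, double percentage)>;

// Return the quantile only when the degrees of freedom are valid.
// The negated comparison also rejects NaN, which "df <= 0.0" would let through.
std::optional<double> guardedChiSquaredQuantile(double df,
                                                double percentage,
                                                const TQuantile& quantile) {
    if (!(df > 0.0)) {
        // Too little data to form the statistic: skip the test instead of
        // constructing a distribution with zero degrees of freedom.
        return std::nullopt;
    }
    return quantile(df, percentage);
}
```

A caller would treat an empty result as "no evidence either way" and carry on, rather than logging an error for every sparse series.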
@dolaru

commented Mar 27, 2018

Reopening as similar error messages are still being spotted after #23 was merged:

2018-03-26T22:54:31.272 : [562][ERROR][o.e.x.m.j.p.l.CppLogMessageHandler] [gallery1_20180326-2250_700-alpha1_1756] [autodetect/14885] [CPeriodicityHypothesisTests.cc@269] Bad input: Error in function fisher_f_distribution::fisher_f_distribution: Degrees of freedom argument is 0, but must be > 0 !, n = 1, percentage = 10

2018-03-26T22:54:31.588 : [563][ERROR][o.e.x.m.j.p.l.CppLogMessageHandler] [gallery1_20180326-2250_700-alpha1_1756] [autodetect/14885] [CPeriodicityHypothesisTests.cc@251] Bad input: Error in function boost::math::chi_squared_distribution::chi_squared_distribution: Degrees of freedom argument is 0, but must be > 0 !, df = 0, percentage = 10

The error count is significantly lower, though: roughly 80 now, down from roughly 800.

@dolaru dolaru reopened this Mar 27, 2018

@dolaru dolaru added the v6.3.0 label Mar 27, 2018

@sophiec20 sophiec20 changed the title Degrees of freedom argument errors while running the "gallery" dataset [ML] Degrees of freedom argument errors while running the "gallery" dataset Mar 28, 2018

tveasey added a commit that referenced this issue Mar 28, 2018

[ML] Fix sparse data edge cases for periodicity testing (#28)
This fixes issue #20. Digging into the root cause, they were all down to very sparse data over the 
window we maintain to test for periodicity. This showed up the need to lower bound the count of 
buckets with periodic repeats when testing for periodic partitions.
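
To see why very sparse data can produce zero degrees of freedom, consider a simplified model (an assumption for illustration, not the ml-cpp implementation) that fits one mean per offset within the period. The residual variance then has N - k degrees of freedom, where N is the number of populated buckets and k is the number of distinct populated offsets. If no offset is observed more than once, N equals k and nothing is left over. The function names and the `minimumRepeats` threshold below are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Count populated buckets whose offset within the period is shared with at
// least one other populated bucket, i.e. buckets that actually repeat.
std::size_t bucketsWithRepeats(const std::vector<bool>& populated, std::size_t period) {
    if (period == 0) {
        return 0;
    }
    std::vector<std::size_t> perOffset(period, 0);
    for (std::size_t i = 0; i < populated.size(); ++i) {
        if (populated[i]) {
            ++perOffset[i % period];
        }
    }
    std::size_t repeats = 0;
    for (std::size_t n : perOffset) {
        if (n > 1) {
            repeats += n;
        }
    }
    return repeats;
}

// Residual degrees of freedom N - k for a one-mean-per-offset model.
std::size_t residualDegreesOfFreedom(const std::vector<bool>& populated, std::size_t period) {
    if (period == 0) {
        return 0;
    }
    std::vector<bool> seen(period, false);
    std::size_t buckets = 0;
    std::size_t offsets = 0;
    for (std::size_t i = 0; i < populated.size(); ++i) {
        if (!populated[i]) {
            continue;
        }
        ++buckets;
        if (!seen[i % period]) {
            seen[i % period] = true;
            ++offsets;
        }
    }
    return buckets - offsets;  // never negative, since offsets <= buckets
}

// Lower bound: only run the periodicity test when enough buckets repeat.
bool enoughDataToTest(const std::vector<bool>& populated,
                      std::size_t period,
                      std::size_t minimumRepeats) {
    return bucketsWithRepeats(populated, period) >= minimumRepeats;
}
```

With a 48-bucket window and a period of 24, two populated buckets at indices 3 and 30 fall on different offsets and leave zero residual degrees of freedom, whereas indices 3 and 27 share an offset and leave one.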

@tveasey tveasey closed this Mar 28, 2018

tveasey added a commit that referenced this issue Mar 28, 2018

Guard all remaining unguarded calls to create a chi^2 distribution for the case d.f. 0 (#23)

This guards all remaining calls to create a chi^2 distribution in the tests for periodicity to avoid 
creating with zero degrees freedom. Fixes #20.

droberts195 pushed a commit that referenced this issue Apr 23, 2018

Guard all remaining unguarded calls to create a chi^2 distribution for the case d.f. 0 (#23)

This guards all remaining calls to create a chi^2 distribution in the tests for periodicity to avoid 
creating with zero degrees freedom. Fixes #20.

droberts195 pushed a commit that referenced this issue Apr 23, 2018

[ML] Fix sparse data edge cases for periodicity testing (#28)
This fixes issue #20. Digging into the root cause, they were all down to very sparse data over the 
window we maintain to test for periodicity. This showed up the need to lower bound the count of 
buckets with periodic repeats when testing for periodic partitions.
