Refactor chi-squared distribution #77

mp4096 · 2017-10-19T13:00:23Z

Ok, this is a larger one:

Mode

Fixed the formula for the mode. The Math.NET implementation has an error for freedom < 2, and its unit tests are faulty as well.
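A minimal Rust sketch of the corrected mode. It assumes the textbook result mode = max(k - 2, 0) for k = freedom > 0. `chi_squared_mode` is a hypothetical free function, not statrs's API.

```rust
/// Hypothetical helper: mode of a chi-squared distribution with `freedom` > 0.
/// For k >= 2 the density peaks at x = k - 2. For 0 < k < 2 the density is
/// decreasing on (0, inf), so the mode sits at the boundary x = 0.
fn chi_squared_mode(freedom: f64) -> f64 {
    assert!(freedom > 0.0, "freedom must be positive");
    // Clamp k - 2 at zero: the mode can never be negative.
    (freedom - 2.0).max(0.0)
}

fn main() {
    for k in [0.5, 1.0, 2.0, 3.0, 10.0] {
        println!("k = {k}: mode = {}", chi_squared_mode(k));
    }
}
```

The clamping matches the "Mode cannot be negative" note on the first commit.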

Pdf and log density

When investigating this unit test:

// TODO: figure out why this test fails:
test::check_continuous_distribution(&try_create(1.0), 0.0, 10.0);

I realised the problem is two-fold. First, the old pdf implementation returned Inf at 0 instead of 0. Second, for freedom == 1.0 the pdf is very steep around 0 and cannot be integrated accurately with the hard-coded step size. So I added a branch to the pdf and the log density (see also Wikipedia) and changed freedom in the test to 1.5.
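To illustrate the second point, the following Rust sketch integrates the k = 1 density with a fixed-step midpoint rule. The midpoint rule and the step size 0.01 are assumptions chosen for illustration. They are not what statrs's `check_continuous_distribution` actually does. For k = 1 the density has the closed form e^(-x/2) / sqrt(2πx), so no gamma function is needed.

```rust
use std::f64::consts::PI;

/// Chi-squared density for k = 1: exp(-x/2) / sqrt(2*pi*x), which diverges like x^(-1/2) at 0.
fn pdf_k1(x: f64) -> f64 {
    if x > 0.0 {
        (-x / 2.0).exp() / (2.0 * PI * x).sqrt()
    } else {
        0.0
    }
}

/// Midpoint rule with a fixed step `h` on [a, b]; evaluation points never hit x = a.
fn midpoint_integral(f: impl Fn(f64) -> f64, a: f64, b: f64, h: f64) -> f64 {
    let n = ((b - a) / h).round() as usize;
    (0..n).map(|i| f(a + (i as f64 + 0.5) * h)).sum::<f64>() * h
}

fn main() {
    // For k = 1 the exact value is P(X <= 10) = erf(sqrt(5)), roughly 0.9984.
    let approx = midpoint_integral(pdf_k1, 0.0, 10.0, 0.01);
    println!("midpoint estimate of P(X <= 10) for k = 1: {approx:.4}");
}
```

The coarse estimate falls short by a couple of percent, and almost all of that shortfall comes from the cells next to x = 0. Near the singularity, the error shrinks only roughly like sqrt(h), so refining the step is an expensive fix. For freedom = 1.5 the singularity is milder, behaving like x^(-1/4).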

Actually, Math.NET has a bit more branching in its chi-squared pdf, but the bulk of it is already covered by statrs's gamma.rs implementation. I'm not sure whether we want to cover the case freedom = Inf. @boxtown What's your opinion on this? If yes, I'll add this branch to pdf() and ln_pdf() and uncomment the unit tests. If no, I'll just delete the commented-out unit tests.

Misc

And of course I added more unit tests, but those are self-explanatory.

codecov · 2017-10-19T13:07:46Z

Codecov Report

Merging #77 into master will increase coverage by 0.42%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master      #77      +/-   ##
==========================================
+ Coverage   92.24%   92.67%   +0.42%     
==========================================
  Files          44       44              
  Lines        7187     7299     +112     
==========================================
+ Hits         6630     6764     +134     
+ Misses        557      535      -22

Impacted Files	Coverage Δ
src/distribution/chi_squared.rs	`93.78% <100%> (+35.75%)`	⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Last update 95fb051...2df72f2.

boxtown · 2017-10-19T17:53:18Z

src/distribution/chi_squared.rs

+        if x > 0.0 {
+            self.g.ln_pdf(x)
+        } else {
+            -f64::INFINITY


Conventionally we've been using f64::NEG_INFINITY; I'm not actually 100% sure that -f64::INFINITY == f64::NEG_INFINITY.

Sorry, I completely missed this one.

BTW, I didn't know that these constants are defined like this 😁 :

pub const INFINITY: f64 = 1.0f64 / 0.0f64;
pub const NEG_INFINITY: f64 = -1.0f64 / 0.0f64;
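The doubt about the two spellings is easy to settle empirically. Under IEEE 754, negation only flips the sign bit, so -f64::INFINITY and f64::NEG_INFINITY are the same value. Here is a small self-contained check:

```rust
fn main() {
    // Negating +inf flips the sign bit and yields exactly -inf.
    assert_eq!(-f64::INFINITY, f64::NEG_INFINITY);
    // Bit-for-bit identical, not merely equal under `==`.
    assert_eq!((-f64::INFINITY).to_bits(), f64::NEG_INFINITY.to_bits());
    // Same values as the constant definitions quoted above.
    assert_eq!(1.0f64 / 0.0f64, f64::INFINITY);
    assert_eq!(-1.0f64 / 0.0f64, f64::NEG_INFINITY);
    println!("-f64::INFINITY == f64::NEG_INFINITY");
}
```

Both spellings are therefore interchangeable. Using f64::NEG_INFINITY only keeps the code consistent with the rest of the crate.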

boxtown · 2017-10-19T17:58:42Z

src/distribution/chi_squared.rs

@@ -310,7 +310,11 @@ impl Continuous<f64, f64> for ChiSquared {
    ///
    /// where `k` is the degrees of freedom and `Γ` is the gamma function
    fn pdf(&self, x: f64) -> f64 {
-        self.g.pdf(x)
+        if x > 0.0 {


I think we can be more specific: x == 0.0 is defined (at least according to Wikipedia) as long as freedom != 1.0, so we could probably do something like if x > 0.0 || (self.freedom != 1.0 && x == 0.0). We may want to consider prec::almost_eq instead of == or != for floats, but I'm just spitballing.

Sorry, I screwed up here. Should've done better research.

It's a little more complicated than that: Wikipedia silently assumes integer degrees of freedom. If we allow freedom to be any positive real, there are 3 cases:

0 < freedom < 2: pdf(0) = +∞, hence 0 is excluded from the support

freedom == 2: pdf(0) = 0.5

2 < freedom: pdf(0) = 0

This is obvious from the pdf formula (see Wikipedia), specifically the x^(k/2 - 1) term; all other factors are finite and non-zero at x = 0 for any freedom.
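The three cases can be spelled out as a limit of the standard chi-squared density, with k = freedom. The exponential factor tends to 1, so only the power of x matters. For k = 2, the constant reduces to 1/(2 Γ(1)) = 1/2.

```latex
f(x; k) = \frac{x^{k/2 - 1}\, e^{-x/2}}{2^{k/2}\, \Gamma(k/2)}, \qquad x > 0,

\lim_{x \to 0^+} f(x; k)
  = \frac{1}{2^{k/2}\, \Gamma(k/2)} \lim_{x \to 0^+} x^{k/2 - 1}
  = \begin{cases}
      +\infty, & 0 < k < 2, \\
      \dfrac{1}{2\, \Gamma(1)} = \dfrac{1}{2}, & k = 2, \\
      0, & k > 2.
    \end{cases}
```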

Anyway, I've tried out scipy's implementation of the chi-squared distribution. Its behaviour is exactly as above (i.e. +∞, 0.5 and 0).

So the question is what to do when 0 < freedom < 2: should we let the pdf function return +∞ (as scipy does), or should we handle it as a separate case?

My personal preference right now is to return +∞ and describe this behaviour in the documentation.

Edit: wording.

Yeah, that's fine by me as long as it's clear in the documentation.

Ok! Just to make sure: You prefer to return +∞ when 0 < freedom < 2 and pdf is evaluated at 0.

Then I'll rewrite the unit tests and update the docs.
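The following minimal, self-contained Rust sketch implements the behaviour agreed on here:

- +∞ at x = 0 for 0 < freedom < 2
- 0.5 at x = 0 for freedom == 2
- 0 at x = 0 for freedom > 2
- 0 outside the support

It assumes freedom > 0. Rust's standard library has no stable gamma function, so a Lanczos approximation of ln Γ stands in for statrs's own gamma code. The names `ln_gamma`, `chi_squared_pdf` and `chi_squared_ln_pdf` are hypothetical, not statrs's API.

```rust
use std::f64::consts::PI;

/// Lanczos approximation (g = 7, n = 9) of ln Γ(x) for x > 0.
/// A stand-in for statrs's own gamma code, not its implementation.
fn ln_gamma(x: f64) -> f64 {
    const G: f64 = 7.0;
    const COEF: [f64; 9] = [
        0.99999999999980993,
        676.5203681218851,
        -1259.1392167224028,
        771.32342877765313,
        -176.61502916214059,
        12.507343278686905,
        -0.13857109526572012,
        9.9843695780195716e-6,
        1.5056327351493116e-7,
    ];
    if x < 0.5 {
        // Reflection formula: Γ(x) Γ(1 - x) = π / sin(πx).
        (PI / (PI * x).sin()).ln() - ln_gamma(1.0 - x)
    } else {
        let x = x - 1.0;
        let t = x + G + 0.5;
        let mut series = COEF[0];
        for (i, c) in COEF.iter().enumerate().skip(1) {
            series += *c / (x + i as f64);
        }
        0.5 * (2.0 * PI).ln() + (x + 0.5) * t.ln() - t + series.ln()
    }
}

/// Chi-squared density with the edge cases agreed in review.
fn chi_squared_pdf(freedom: f64, x: f64) -> f64 {
    if x == 0.0 {
        // At x = 0 only the x^(k/2 - 1) factor decides the value.
        // Exact comparison with 2.0 for simplicity; an approximate check such as
        // the prec::almost_eq mentioned in review is an alternative.
        if freedom < 2.0 {
            f64::INFINITY // documented: diverges for 0 < freedom < 2
        } else if freedom == 2.0 {
            0.5
        } else {
            0.0
        }
    } else {
        // Negative x and x = +inf give ln_pdf = -inf, i.e. pdf = 0.
        chi_squared_ln_pdf(freedom, x).exp()
    }
}

/// Log-density; returns f64::NEG_INFINITY outside the support, per the crate convention.
fn chi_squared_ln_pdf(freedom: f64, x: f64) -> f64 {
    if x < 0.0 || x.is_infinite() {
        return f64::NEG_INFINITY;
    }
    if x == 0.0 {
        // ln(+inf) = +inf, ln(0.5), ln(0) = -inf.
        return chi_squared_pdf(freedom, 0.0).ln();
    }
    let half_k = freedom / 2.0;
    (half_k - 1.0) * x.ln() - x / 2.0 - half_k * 2f64.ln() - ln_gamma(half_k)
}

fn main() {
    for (k, x) in [(1.5, 0.0), (2.0, 0.0), (3.0, 0.0), (2.5, 5.5), (2.5, f64::INFINITY)] {
        println!("pdf(k = {k}, x = {x}) = {}", chi_squared_pdf(k, x));
    }
}
```

The x = 0 cases are handled by explicit branches rather than by evaluating x^(k/2 - 1). As a result, the documented values come out exactly instead of depending on how `powf` or `exp` treat the boundary.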

boxtown · 2017-10-19T18:01:19Z

src/distribution/chi_squared.rs

+        test_almost(2.5, 0.045412171451573920401, 1e-15, |x| x.pdf(5.5));
+        test_almost(2.5, 1.8574923023527248767e-24, 1e-36, |x| x.pdf(110.1));
+        test_case(2.5, 0.0, |x| x.pdf(f64::INFINITY));
+        // test_case(f64::INFINITY, 0.0, |x| x.pdf(0.0));


What does pdf return for freedom == f64::INFINITY normally? I'd like to have this case defined, but I'm not necessarily sure we need an explicit branch in the code.

It returns NaN, presumably as a result of 0 * Inf when computing the pdf.

TBH I don't like the idea of returning 0 instead of NaN because it doesn't make sense to me... Shouldn't a pdf integrate to 1 over its support? How can that hold when pdf() returns 0 everywhere?

That's a good point. I think being explicit about the NaN would be good, e.g. adding a specific branch and noting it in the docstring.
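Here is a sketch of such an explicit branch. The wrapper `pdf_with_explicit_nan` and its closure parameter are hypothetical and are shown only to isolate the infinite-freedom case. The k = 2 closed form e^(-x/2) / 2 serves as a stand-in finite density.

```rust
/// Hypothetical wrapper: makes freedom == +inf an explicit, documented NaN
/// instead of relying on an incidental `0 * inf` inside the density formula.
fn pdf_with_explicit_nan(freedom: f64, x: f64, finite_pdf: impl Fn(f64, f64) -> f64) -> f64 {
    if freedom.is_infinite() {
        // No proper chi-squared density exists in this limit (the mean k diverges).
        f64::NAN
    } else {
        finite_pdf(freedom, x)
    }
}

fn main() {
    // IEEE 754: 0 * inf is NaN, the likely origin of the NaN observed above.
    assert!((0.0_f64 * f64::INFINITY).is_nan());

    // Stand-in finite density: for k = 2 the chi-squared pdf is exp(-x/2) / 2.
    let k2_pdf = |_k: f64, x: f64| 0.5 * (-x / 2.0).exp();
    println!("{}", pdf_with_explicit_nan(2.0, 1.0, k2_pdf));
    println!("{}", pdf_with_explicit_nan(f64::INFINITY, 1.0, k2_pdf));
}
```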

boxtown · 2017-10-30T16:06:50Z

Yup, I think being consistent with scipy is probably a good thing.


On Oct 30, 2017 11:52 AM, "Mikhail Pak" wrote (in src/distribution/chi_squared.rs): "Ok! Just to make sure: You prefer to return +∞ when 0 < freedom < 2 and pdf is evaluated at 0. Then I'll rewrite the unit tests and update the docs."


boxtown · 2018-01-09T01:33:08Z

Hey @mp4096 what's the status on this PR? Do you need any help bringing it across the finish line?

boxtown requested changes Oct 19, 2017


mp4096 added 6 commits December 5, 2017 12:27

fix: formula for the mode of the chi squared distribution

cd367dc

Mode cannot be negative

fix: chi squared distribution pdf and log pdf

82685a2

test: add mean, variance, stddev, skewness, min, max for chi-squared

34da91e

test: add test for the entropy of the chi-squared distribution

6ca79bc

test: add cdf unit test for the chi-squared

0f9859e

fix: use NEG_INFINITY

2df72f2

mp4096 force-pushed the fix-chi-squared branch from 85d6884 to 2df72f2 December 5, 2017 11:27

vks mentioned this pull request May 15, 2021: Tracker: 0.14 release #140 (Closed, 7 tasks)


