Distributions.StudentT.Density returns NaN for large degrees of freedom #44

hickford · 2012-08-21T10:21:49Z

Distributions.StudentT.Density returns NaN for large degrees of freedom. For examples

foreach(int dof in Enumerable.Range(1,500))
{
    var dist = new MathNet.Numerics.Distributions.StudentT(0,1,dof);
    Console.WriteLine("{0} {1}",dist,dist.Density(0));
}

StudentT(Location = 0, Scale = 1, DoF = 1) 0.318309886183791
StudentT(Location = 0, Scale = 1, DoF = 2) 0.353553390593275
StudentT(Location = 0, Scale = 1, DoF = 3) 0.367552596947861
.......
StudentT(Location = 0, Scale = 1, DoF = 337) 0.398646439335889
StudentT(Location = 0, Scale = 1, DoF = 338) 0.398647314279096
StudentT(Location = 0, Scale = 1, DoF = 339) Infinity
StudentT(Location = 0, Scale = 1, DoF = 340) NaN
StudentT(Location = 0, Scale = 1, DoF = 341) NaN

The StudentT distribution is well defined for all degrees of freedom, so this is a bug. However, for large degrees of freedom it converges to a normal distribution, so there's a workaround:

new MathNet.Numerics.Distributions.Normal(0,1).Density(0)

0.398942280401433

http://numerics.mathdotnet.com/api/MathNet.Numerics.Distributions/StudentT.htm

The text was updated successfully, but these errors were encountered:

cdrnet · 2012-08-21T11:05:32Z

Thanks for reporting.

Looks like we should go logarithmic on large DoF. In the meantime, a quick workaround would be to use Math.Exp(dist.DensityLn(0)) instead of dist.Density(0).

manyue · 2012-08-24T17:39:01Z

I changed the computation for the ratios of the Gamma functions. This should fix the problem. (Problem arises because Gamma(339)=infinity and so the output becomes infinity). Try

        int dof=1;
        for(;dof<10000;dof++)
        {
            var dist = new MathNet.Numerics.Distributions.StudentT(0, 1, dof);
            if (dist.Density(0)==Double.PositiveInfinity||dist.Density(0)==Double.NaN)
            break; 
        }
        var distr = new MathNet.Numerics.Distributions.StudentT(0, 1, dof);

        Console.WriteLine("{0} {1}",distr,distr.Density(0));

        Console.WriteLine(dof);

cdrnet · 2012-08-24T19:09:28Z

Thanks a lot for the quick fix. I wonder, how does this compare numeric stability and accuracy-wise with just computing the gamma ratio in logarithmic space, i.e. replace

SpecialFunctions.Gamma((_dof + 1.0) / 2.0) / SpecialFunctions.Gamma(_dof / 2.0)

with

Math.Exp(SpecialFunctions.GammaLn((_dof + 1.0) / 2.0) - SpecialFunctions.GammaLn(_dof / 2.0))

?

Also note that DoF > 0.0, so the special casing for zero-division and zero-gamma in the fix should not actually be needed.

manyue · 2012-08-25T13:38:30Z

Thanks for pointing out. I have looked at it again and the recurrence implementation is not as good as the logarithmic space one, so please ignore it and sorry about that.

I have looked at the values of the density for the above parameters (Location=1 and Scale=1). At Dof=10000, they differ at an order of 1e-13, and at Dof=100000, they differ at about 1e-11. I was thinking about the differences of large numbers when using

Math.Exp(SpecialFunctions.GammaLn((_dof + 1.0) / 2.0) - SpecialFunctions.GammaLn(_dof / 2.0))

Looking at the logarithm of Gamma functions, at 1e10, it is about 220258509288.811, so at this level, you still retain accuracy up to 1e-3 and the value of the density differs from the Gaussian density at the order of 1e-7. So this does not seem to be too much of an issue.

On the other hand, the recurrence implementation has accumulative error and is slower than the computation of gamma ration in logarithmic space. So please ignore the recurrence implementation.

The logarithmic space implementation do start to differ from the Gaussian limit at about Dof=1e11 and at Dof=1e15 gives a result that is rather wrong. While it is rather unlikely that such large Dof will be encountered, it may be a good idea to use a cutoff that is smaller than positive infinity (e.g. 1e10?) where the Gaussian density will be used when Dof is greater than the cut off?

cdrnet · 2012-08-27T19:08:23Z

Thanks for the thorough analysis! Yes, switching to Gaussian density for large DoF would be very reasonable.

ghost assigned cdrnet Aug 21, 2012

manyue added a commit to manyue/mathnet-numerics that referenced this issue Aug 24, 2012

Student T distribution mathnet#44

49159ac

cdrnet added a commit that referenced this issue Aug 29, 2012

Dist: StudentT density more robust for very large degrees of freedom #44

f8792b7

cdrnet closed this as completed Aug 29, 2012

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Distributions.StudentT.Density returns NaN for large degrees of freedom #44

Distributions.StudentT.Density returns NaN for large degrees of freedom #44

hickford commented Aug 21, 2012

cdrnet commented Aug 21, 2012

manyue commented Aug 24, 2012

cdrnet commented Aug 24, 2012

manyue commented Aug 25, 2012

cdrnet commented Aug 27, 2012

Distributions.StudentT.Density returns NaN for large degrees of freedom #44

Distributions.StudentT.Density returns NaN for large degrees of freedom #44

Comments

hickford commented Aug 21, 2012

cdrnet commented Aug 21, 2012

manyue commented Aug 24, 2012

cdrnet commented Aug 24, 2012

manyue commented Aug 25, 2012

cdrnet commented Aug 27, 2012