Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Distributions.StudentT.Density returns NaN for large degrees of freedom #44

Closed
hickford opened this issue Aug 21, 2012 · 5 comments
Closed
Assignees
Labels

Comments

@hickford
Copy link
Contributor

Distributions.StudentT.Density returns NaN for large degrees of freedom. For examples

foreach(int dof in Enumerable.Range(1,500))
{
    var dist = new MathNet.Numerics.Distributions.StudentT(0,1,dof);
    Console.WriteLine("{0} {1}",dist,dist.Density(0));
}

StudentT(Location = 0, Scale = 1, DoF = 1) 0.318309886183791
StudentT(Location = 0, Scale = 1, DoF = 2) 0.353553390593275
StudentT(Location = 0, Scale = 1, DoF = 3) 0.367552596947861
.......
StudentT(Location = 0, Scale = 1, DoF = 337) 0.398646439335889
StudentT(Location = 0, Scale = 1, DoF = 338) 0.398647314279096
StudentT(Location = 0, Scale = 1, DoF = 339) Infinity
StudentT(Location = 0, Scale = 1, DoF = 340) NaN
StudentT(Location = 0, Scale = 1, DoF = 341) NaN

The StudentT distribution is well defined for all degrees of freedom, so this is a bug. However, for large degrees of freedom it converges to a normal distribution, so there's a workaround:

new MathNet.Numerics.Distributions.Normal(0,1).Density(0)

0.398942280401433

http://numerics.mathdotnet.com/api/MathNet.Numerics.Distributions/StudentT.htm

@ghost ghost assigned cdrnet Aug 21, 2012
@cdrnet
Copy link
Member

cdrnet commented Aug 21, 2012

Thanks for reporting.

Looks like we should go logarithmic on large DoF. In the meantime, a quick workaround would be to use Math.Exp(dist.DensityLn(0)) instead of dist.Density(0).

manyue added a commit to manyue/mathnet-numerics that referenced this issue Aug 24, 2012
@manyue
Copy link
Contributor

manyue commented Aug 24, 2012

I changed the computation for the ratios of the Gamma functions. This should fix the problem. (Problem arises because Gamma(339)=infinity and so the output becomes infinity). Try

        int dof=1;
        for(;dof<10000;dof++)
        {
            var dist = new MathNet.Numerics.Distributions.StudentT(0, 1, dof);
            if (dist.Density(0)==Double.PositiveInfinity||dist.Density(0)==Double.NaN)
            break; 
        }
        var distr = new MathNet.Numerics.Distributions.StudentT(0, 1, dof);

        Console.WriteLine("{0} {1}",distr,distr.Density(0));

        Console.WriteLine(dof);

@cdrnet
Copy link
Member

cdrnet commented Aug 24, 2012

Thanks a lot for the quick fix. I wonder, how does this compare numeric stability and accuracy-wise with just computing the gamma ratio in logarithmic space, i.e. replace

SpecialFunctions.Gamma((_dof + 1.0) / 2.0) / SpecialFunctions.Gamma(_dof / 2.0)

with

Math.Exp(SpecialFunctions.GammaLn((_dof + 1.0) / 2.0) - SpecialFunctions.GammaLn(_dof / 2.0))

?

Also note that DoF > 0.0, so the special casing for zero-division and zero-gamma in the fix should not actually be needed.

@manyue
Copy link
Contributor

manyue commented Aug 25, 2012

Thanks for pointing out. I have looked at it again and the recurrence implementation is not as good as the logarithmic space one, so please ignore it and sorry about that.

I have looked at the values of the density for the above parameters (Location=1 and Scale=1). At Dof=10000, they differ at an order of 1e-13, and at Dof=100000, they differ at about 1e-11. I was thinking about the differences of large numbers when using

Math.Exp(SpecialFunctions.GammaLn((_dof + 1.0) / 2.0) - SpecialFunctions.GammaLn(_dof / 2.0))

Looking at the logarithm of Gamma functions, at 1e10, it is about 220258509288.811, so at this level, you still retain accuracy up to 1e-3 and the value of the density differs from the Gaussian density at the order of 1e-7. So this does not seem to be too much of an issue.

On the other hand, the recurrence implementation has accumulative error and is slower than the computation of gamma ration in logarithmic space. So please ignore the recurrence implementation.

The logarithmic space implementation do start to differ from the Gaussian limit at about Dof=1e11 and at Dof=1e15 gives a result that is rather wrong. While it is rather unlikely that such large Dof will be encountered, it may be a good idea to use a cutoff that is smaller than positive infinity (e.g. 1e10?) where the Gaussian density will be used when Dof is greater than the cut off?

@cdrnet
Copy link
Member

cdrnet commented Aug 27, 2012

Thanks for the thorough analysis! Yes, switching to Gaussian density for large DoF would be very reasonable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants