Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Negative Estimates of Second Moments #128

Open
suzannastep opened this issue Mar 1, 2022 · 5 comments · May be fixed by #129
Open

Negative Estimates of Second Moments #128

suzannastep opened this issue Mar 1, 2022 · 5 comments · May be fixed by #129

Comments

@suzannastep
Copy link

The function my_e2truncnorm in R/truncnorm.R will sometimes return negative estimates of the second moment of the truncated standard normal distribution. I have attached two files to reproduce this error, along with the current negative second moments.

Both use mean \approx 54000, sd \approx 2.97. The first file truncates to (0,Inf), and the second file truncates to (-Inf,0). Most of the negative second moments are small (~1e-7), but some are fairly large (~1e+1).

@suzannastep
Copy link
Author

suzannastep commented Mar 1, 2022

Code to load examples:

nsm1 <- data.frame(a = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
b = c(Inf, Inf, Inf, 
Inf, Inf, Inf, Inf, Inf, Inf, Inf, Inf, Inf, Inf, Inf, Inf, Inf, 
Inf, Inf, Inf, Inf, Inf, Inf, Inf, Inf, Inf, Inf, Inf, Inf, Inf, 
Inf, Inf, Inf, Inf, Inf, Inf, Inf, Inf, Inf, Inf, Inf, Inf, Inf, 
Inf, Inf, Inf, Inf, Inf, Inf, Inf),
mean = c(-54788.9937492435, 
-54788.7495783716, -54795.061172919, -54786.56403001, -54787.0710507828, 
-54793.8411648124, -54791.1033322952, -54790.8850621429, -54792.4474633392, 
-54789.104838451, -54790.0716305659, -54791.8116771782, -54792.2405734505, 
-54793.2180670531, -54795.066790929, -54792.5099482951, -54790.3826065817, 
-54791.77833505, -54791.1393162631, -54791.3704836432, -54792.7546212792, 
-54790.3941786588, -54790.5148148196, -54783.961766733, -54787.61580626, 
-54787.9756696761, -54793.3780437052, -54793.8414409467, -54788.5799790508, 
-54796.3350585025, -54787.6400823768, -54790.1239781587, -54790.5671618815, 
-54790.5850620408, -54786.7898117188, -54788.4238087205, -54791.6176927468, 
-54788.288365944, -54794.9353162132, -54788.9025164062, -54794.1065287576, 
-54788.5087379015, -54791.4541413838, -54789.3062131891, -54789.2876504587, 
-54790.4616344339, -54790.0682324217, -54787.6989406089, -54793.0147070028), 
sd = c(2.97496459681126, 2.97496459681126, 2.97496459681126, 
2.97496459681126, 2.97496459681126, 2.97496459681126, 2.97496459681126, 
2.97496459681126, 2.97496459681126, 2.97496459681126, 2.97496459681126, 
2.97496459681126, 2.97496459681126, 2.97496459681126, 2.97496459681126, 
2.97496459681126, 2.97496459681126, 2.97496459681126, 2.97496459681126, 
2.97496459681126, 2.97496459681126, 2.97496459681126, 2.97496459681126, 
2.97496459681126, 2.97496459681126, 2.97496459681126, 2.97496459681126, 
2.97496459681126, 2.97496459681126, 2.97496459681126, 2.97496459681126, 
2.97496459681126, 2.97496459681126, 2.97496459681126, 2.97496459681126, 
2.97496459681126, 2.97496459681126, 2.97496459681126, 2.97496459681126, 
2.97496459681126, 2.97496459681126, 2.97496459681126, 2.97496459681126, 
2.97496459681126, 2.97496459681126, 2.97496459681126, 2.97496459681126, 
2.97496459681126, 2.97496459681126),
second_moments = c(-17.6492352485657, 
-4.76837158203125e-07, -9.5367431640625e-07, -11.7154006958008, 
-9.5367431640625e-07, -4.76837158203125e-07, -12.9443755149841, 
-4.76837158203125e-07, -4.76837158203125e-07, -4.76837158203125e-07, 
-4.76837158203125e-07, -1.43051147460938e-06, -1.43051147460938e-06, 
-17.6404252052307, -9.5367431640625e-07, -4.76837158203125e-07, 
-4.76837158203125e-07, -4.76837158203125e-07, -16.2378377914429, 
-4.76837158203125e-07, -4.76837158203125e-07, -4.76837158203125e-07, 
-4.76837158203125e-07, -4.76837158203125e-07, -4.76837158203125e-07, 
-4.76837158203125e-07, -14.6725268363953, -4.76837158203125e-07, 
-4.76837158203125e-07, -1.43051147460938e-06, -14.370005607605, 
-9.5367431640625e-07, -4.76837158203125e-07, -4.76837158203125e-07, 
-9.5367431640625e-07, -13.1242661476135, -12.6084485054016, -17.5787625312805, 
-4.76837158203125e-07, -4.76837158203125e-07, -1.43051147460938e-06, 
-14.3491959571838, -4.76837158203125e-07, -4.76837158203125e-07, 
-17.066424369812, -14.5285978317261, -4.76837158203125e-07,
-9.5367431640625e-07, -4.76837158203125e-07))
rownames(nsm1) <- c("Row700", "Row702", "Row704", 
"Row705", "Row708", "Row712", "Row715", "Row716", "Row717", "Row719", 
"Row720", "Row721", "Row725", "Row727", "Row728", "Row729", "Row730", 
"Row732", "Row733", "Row736", "Row737", "Row739", "Row740", "Row744", 
"Row745", "Row748", "Row749", "Row751", "Row753", "Row754", "Row760", 
"Row763", "Row764", "Row765", "Row766", "Row767", "Row768", "Row769", 
"Row770", "Row774", "Row775", "Row776", "Row779", "Row781", "Row786", 
"Row789", "Row793", "Row795", "Row799")
nsm2 <- data.frame(a = c(-Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, 
-Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, 
-Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, 
-Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, 
-Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf, -Inf), 
    b = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
mean = c(54793.6976016185, 
    54789.2789114755, 54797.7760543845, 54797.2690336117, 54796.5650050414, 
    54793.2367520993, 54791.8926210553, 54795.2352459435, 54794.2684538286, 
    54793.5968372019, 54792.099510944, 54792.2584346181, 54789.2732934655, 
    54791.8301360994, 54795.6222604304, 54791.377209057, 54792.013046847, 
    54793.9805908811, 54792.3089689607, 54796.7242781345, 54796.3644147184, 
    54790.9620406893, 54793.7143621958, 54794.9249914674, 54788.005025892, 
    54794.9131273272, 54794.0087762809, 54796.7000020177, 54789.1452011907, 
    54794.2161062358, 54792.7223916477, 54796.0517184506, 54789.4047681813, 
    54795.4375679883, 54790.2335556369, 54795.831346493, 54792.0057657992, 
    54785.7805432566, 54795.0338712054, 54791.1719658525, 54795.7868262848, 
    54790.8163913233, 54795.0524339358, 54793.8784499606, 54787.1880161754, 
    54792.0559254679, 54794.4767477245, 54796.6411437856, 54795.1046280564, 
    54791.3253773917),
sd = c(2.97496459681126, 2.97496459681126, 
    2.97496459681126, 2.97496459681126, 2.97496459681126, 2.97496459681126, 
    2.97496459681126, 2.97496459681126, 2.97496459681126, 2.97496459681126, 
    2.97496459681126, 2.97496459681126, 2.97496459681126, 2.97496459681126, 
    2.97496459681126, 2.97496459681126, 2.97496459681126, 2.97496459681126, 
    2.97496459681126, 2.97496459681126, 2.97496459681126, 2.97496459681126, 
    2.97496459681126, 2.97496459681126, 2.97496459681126, 2.97496459681126, 
    2.97496459681126, 2.97496459681126, 2.97496459681126, 2.97496459681126, 
    2.97496459681126, 2.97496459681126, 2.97496459681126, 2.97496459681126, 
    2.97496459681126, 2.97496459681126, 2.97496459681126, 2.97496459681126, 
    2.97496459681126, 2.97496459681126, 2.97496459681126, 2.97496459681126, 
    2.97496459681126, 2.97496459681126, 2.97496459681126, 2.97496459681126, 
    2.97496459681126, 2.97496459681126, 2.97496459681126, 2.97496459681126), 
second_moments = c(-4.76837158203125e-07, -9.5367431640625e-07, 
    -4.76837158203125e-07, -4.76837158203125e-07, -4.76837158203125e-07, 
    -4.76837158203125e-07, -9.5367431640625e-07, -9.5367431640625e-07, 
    -9.5367431640625e-07, -13.3077096939087, -4.76837158203125e-07, 
    -4.76837158203125e-07, -4.76837158203125e-07, -1.43051147460938e-06, 
    -4.76837158203125e-07, -4.76837158203125e-07, -4.76837158203125e-07, 
    -13.4651703834534, -4.76837158203125e-07, -4.76837158203125e-07, 
    -9.5367431640625e-07, -4.76837158203125e-07, -10.5989608764648, 
    -9.5367431640625e-07, -1.43051147460938e-06, -9.5367431640625e-07, 
    -4.76837158203125e-07, -4.76837158203125e-07, -9.5367431640625e-07, 
    -9.5367431640625e-07, -9.5367431640625e-07, -9.5367431640625e-07, 
    -9.5367431640625e-07, -9.5367431640625e-07, -4.76837158203125e-07, 
    -9.5367431640625e-07, -4.76837158203125e-07, -4.76837158203125e-07, 
    -13.0829844474792, -15.7729797363281, -9.5367431640625e-07, 
    -9.5367431640625e-07, -4.76837158203125e-07, -9.5367431640625e-07, 
    -9.5367431640625e-07, -4.76837158203125e-07, -9.5367431640625e-07, 
    -9.5367431640625e-07, -1.43051147460938e-06, -11.2983322143555))
rownames(nsm2) <- c("Row701", "Row704", "Row705", "Row708", 
"Row709", "Row715", "Row717", "Row719", "Row720", "Row723", "Row725", 
"Row726", "Row728", "Row729", "Row731", "Row735", "Row738", "Row741", 
"Row743", "Row745", "Row748", "Row749", "Row750", "Row752", "Row754", 
"Row756", "Row757", "Row760", "Row762", "Row763", "Row768", "Row769", 
"Row770", "Row774", "Row775", "Row776", "Row777", "Row780", "Row781", 
"Row782", "Row783", "Row785", "Row786", "Row789", "Row790", "Row792", 
"Row794", "Row795", "Row796", "Row799")

Code to reproduce error:

library(ashr)
my_e2truncnorm(nsm1$a,nsm1$b,nsm1$mean,nsm1$sd)
my_e2truncnorm(nsm2$a,nsm2$b,nsm2$mean,nsm2$sd)

Using dput seems to have introduced some rounding that changes the output. However, this code will still produce negative second moments, so the gist of this example is unchanged. I also have the original values saved as a RDS file, but I cannot upload to github issues

@stephens999
Copy link
Owner

an even simpler version extracted from your example:
my_e2truncnorm(0,Inf, -54793.7, 3) gives
[1] -13.65669

Interesting it is very sensitive to the mean.
my_e2truncnorm(0,Inf, -55000, 3) gives
[1] 4.768372e-07

@stephens999
Copy link
Owner

this code from https://github.com/cossio/TruncatedNormal.jl/blob/3c16866c3afa3920e787513d492689e9e81192ca/src/tnmom2.jl#L66-L71 may be helpful:

Second moment of the truncated standard normal distribution.
"""
function tnmom2(a::Real, b::Real)
#return tnmom2c(0, a, b)

if !(a ≤ b)
    return oftype(middle(a, b), NaN)
elseif a == b
    return middle(a, b)^2
elseif abs(a) > abs(b)
    return tnmom2(-b, -a)
elseif isinf(a) && isinf(b)
    return one(middle(a, b))
elseif isinf(b)
    return 1 + √(2 / π) * a / erfcx(a / √2)
end

@assert a < b < Inf && abs(a) ≤ abs(b)
@assert a ≤ 0 ≤ b || 0 ≤ a ≤ b

if a ≤ 0 ≤ b
    ea = √(π/2) * erf(a / √2)
    eb = √(π/2) * erf(b / √2)
    fa = ea - a * exp(-a^2 / 2)
    fb = eb - b * exp(-b^2 / 2)
    m2 = (fb - fa) / (eb - ea)
    fb ≥ fa && eb ≥ ea || error("error: a=$a, b=$b")
    0 ≤ m2 ≤ 1 || error("error: a=$a, b=$b")
    return m2
else # 0 ≤ a ≤ b
    exΔ = exp((a - b)middle(a, b))
    ea = √(π/2) * erfcx(a / √2)
    eb = √(π/2) * erfcx(b / √2)
    fa = ea + a
    fb = eb + b
    m2 = (fa - fb * exΔ) / (ea - eb * exΔ)
    a^2 ≤ m2 ≤ b^2 || error("error: a=$a, b=$b")
    return m2
end

end

@pcarbo
Copy link
Collaborator

pcarbo commented Mar 2, 2022

@suzannastep @stephens999 Let me know if I can help with this.

@suzannastep
Copy link
Author

suzannastep commented Mar 3, 2022

There are numerical problems here happening in the (r equivalent of) the line

μ^2 + σ^2 * tnmom2(α, β) + 2μ * σ * tnmean(α, β)

Using the julia code,

tnmom2(0.0,Inf, -54793.7, 3.0)
μ^2 + σ^2 * tnmom2(α, β) = 6.004699137379999e9
2μ * σ * tnmean(α, β) = -6.004699137379999e9

So the final answer is exactly 0.0 . Like exactly, as in

tnmom2(0.0,Inf, -54793.7, 3.0) == 0
> true

Similarly,

tnmom2(0.0,Inf, -55000, 3.0)
μ^2 + σ^2 * tnmom2(α, β) = 6.050000018e9
2μ * σ * tnmean(α, β) = -6.050000018e9
tnmom2(0.0,Inf, -55000, 3.0) == 0
> true

This is a perfect storm for numerical problems-- differences of large numbers that are almost equal. Using the r code, we’re getting catastrophic cancellation. Any issues in computing μ^2 + σ^2 * tnmom2(α, β) or 2μ * σ * tnmean(α, β) will mean that the difference will have huge absolute error. And that's what's happening.

my_e2truncnorm(0,Inf, -54793.7, 3)
μ^2 + σ^2 * tnmom2(α, β) = 6004699123.72331
2μ * σ * tnmean(α, β) = -6004699137.38
[1] -13.65669

In actuality, the rcode reorders the sum slightly, so it's more like

my_e2truncnorm(0,Inf, -54793.7, 3)
μ^2 + 2μ * σ * tnmean(α, β)= -3002349577.69
σ^2 * tnmom2(α, β)  = 3002349564.03331
[1] -13.65669

On the other hand,

my_e2truncnorm(0,Inf, -55000, 3)
μ^2 + 2μ * σ * tnmean(α, β)= -3025000018
σ^2 * tnmom2(α, β)  = 3025000018
[1] 4.768372e-07

It looks like the issue in the rcode is coming from the relative error in σ^2 * tnmom2(α, β), which is ~1e-9. The relative error in the other term is ~1e-16

@suzannastep suzannastep linked a pull request Mar 4, 2022 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants