
get "xxx" in the report for long-running computations #65

Open
amigalemming opened this issue Oct 6, 2014 · 18 comments

@amigalemming

If a computation runs quite long, say, longer than 0.5s, then I get a result like:

OLS regression  544 ms  570 ms  0 s
R² goodness-of-fit 0.999   0.999   xxx

in the HTML report and then all following regression results are "xxx", too. This also implies that the overview diagram is not generated.

@bitc

bitc commented Jun 19, 2016

I have done a little investigating. This bug happens when the "var reports = {{json}};" that is rendered into the HTML ends up containing JavaScript "null" values rather than proper numbers:

var reports = [{"reportAnalysis":{"anMean":{"estUpperBound":3.8590750, ... null, ...

As the browser loads the page, a JavaScript error happens when it tries to run toFixed(3) on (the first of) these null values.

Where do these nulls come from? They appear to be coming from the JSON rendering of Statistics.Resampling.Bootstrap.Estimate values, which have all fields as strict Doubles. From playing around with Aeson, I noticed that it will render Double NaN and Infinity values as JavaScript "null":

Prelude Data.Aeson> encode (1/0 :: Double)
"null"
Prelude Data.Aeson> encode (0/0 :: Double)
"null"
Prelude Data.Aeson> 
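Since JSON has no representation for NaN or Infinity, aeson silently serializes them as null, which is what later trips up toFixed(3) in the report's JavaScript. A minimal standalone sketch of a guard one could apply to fields like estLowerBound before encoding (finiteOrNothing is my name for illustration, not part of criterion or aeson):

```haskell
-- Hypothetical guard: map the non-finite Doubles that aeson would
-- serialize as JavaScript "null" to an explicit Nothing instead,
-- so the report generator can handle them deliberately.
finiteOrNothing :: Double -> Maybe Double
finiteOrNothing x
  | isNaN x || isInfinite x = Nothing
  | otherwise               = Just x

main :: IO ()
main = do
  print (finiteOrNothing (1/0))  -- Nothing
  print (finiteOrNothing (0/0))  -- Nothing
  print (finiteOrNothing 3.14)   -- Just 3.14
```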

I am seeing "null" values end up in this field (among others):

{Report} -> {reportAnalysis :: SampleAnalysis} -> {anRegress :: [Regression]} !! 0 -> {regRSquare :: Bootstrap.Estimate} -> {estLowerBound :: !Double}

Somehow criterion is calculating this estLowerBound to be NaN or Infinity (at least according to the evidence laid out in this comment; I haven't actually inspected the Haskell value directly).

This is as far as I have come with my investigation.

@bitc

bitc commented Jun 19, 2016

More evidence: I've spotted the NaNs in the command-line output (use your browser's search to find "NaN"):

benchmarking fib/1
time                 38.35 ns   (38.28 ns .. 38.46 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 38.38 ns   (38.31 ns .. 38.59 ns)
std dev              372.5 ps   (176.5 ps .. 702.6 ps)

benchmarking fib/5
time                 899.5 ns   (897.6 ns .. 902.0 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 900.1 ns   (898.5 ns .. 902.0 ns)
std dev              5.975 ns   (4.724 ns .. 7.946 ns)

benchmarking fib/9
time                 6.145 μs   (6.137 μs .. 6.157 μs)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 6.160 μs   (6.138 μs .. 6.258 μs)
std dev              135.1 ns   (23.39 ns .. 306.6 ns)
variance introduced by outliers: 24% (moderately inflated)

benchmarking fib/11
time                 16.11 μs   (16.07 μs .. 16.18 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 16.13 μs   (16.09 μs .. 16.29 μs)
std dev              242.6 ns   (96.99 ns .. 482.7 ns)
variance introduced by outliers: 11% (moderately inflated)

benchmarking fib/13
time                 42.28 μs   (42.02 μs .. 42.66 μs)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 42.41 μs   (42.17 μs .. 42.91 μs)
std dev              1.089 μs   (573.8 ns .. 2.028 μs)
variance introduced by outliers: 25% (moderately inflated)

benchmarking fib/15
time                 110.7 μs   (110.3 μs .. 111.1 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 110.5 μs   (110.3 μs .. 110.9 μs)
std dev              903.4 ns   (634.4 ns .. 1.270 μs)

benchmarking fib/20
time                 1.215 ms   (1.210 ms .. 1.223 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 1.217 ms   (1.214 ms .. 1.222 ms)
std dev              13.26 μs   (7.718 μs .. 20.12 μs)

benchmarking fib/25
time                 13.61 ms   (13.53 ms .. 13.69 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 13.55 ms   (13.52 ms .. 13.58 ms)
std dev              83.90 μs   (58.05 μs .. 108.5 μs)

benchmarking fib/30
time                 149.7 ms   (149.1 ms .. 150.6 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 149.9 ms   (149.7 ms .. 150.2 ms)
std dev              337.9 μs   (213.8 μs .. 503.5 μs)
variance introduced by outliers: 12% (moderately inflated)

benchmarking fib/31
time                 242.2 ms   (241.7 ms .. 242.7 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 242.4 ms   (242.2 ms .. 242.6 ms)
std dev              274.6 μs   (118.4 μs .. 342.2 μs)
variance introduced by outliers: 16% (moderately inflated)

benchmarking fib/32
time                 392.0 ms   (388.7 ms .. 394.7 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 392.3 ms   (392.1 ms .. 392.6 ms)
std dev              396.8 μs   (0.0 s .. 419.0 μs)
variance introduced by outliers: 19% (moderately inflated)

benchmarking fib/33
time                 635.0 ms   (632.2 ms .. 639.1 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 634.7 ms   (633.6 ms .. 635.5 ms)
std dev              1.098 ms   (0.0 s .. 1.268 ms)
variance introduced by outliers: 19% (moderately inflated)

benchmarking fib/34
time                 1.028 s    (1.022 s .. NaN s)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 1.028 s    (1.027 s .. 1.029 s)
std dev              1.023 ms   (0.0 s .. 1.083 ms)
variance introduced by outliers: 19% (moderately inflated)

benchmarking fib/35
time                 1.665 s    (1.651 s .. 1.675 s)
                     1.000 R²   (NaN R² .. 1.000 R²)
mean                 1.665 s    (1.664 s .. 1.667 s)
std dev              2.290 ms   (0.0 s .. 2.466 ms)
variance introduced by outliers: 19% (moderately inflated)

benchmarking fib/36
time                 2.697 s    (2.618 s .. 2.770 s)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 2.701 s    (2.692 s .. 2.710 s)
std dev              14.82 ms   (0.0 s .. 15.10 ms)
variance introduced by outliers: 19% (moderately inflated)

benchmarking fib/37
time                 4.426 s    (4.355 s .. 4.581 s)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 4.375 s    (4.354 s .. 4.395 s)
std dev              33.17 ms   (0.0 s .. 33.97 ms)
variance introduced by outliers: 19% (moderately inflated)

benchmarking fib/38
time                 7.314 s    (7.022 s .. 7.496 s)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 7.340 s    (7.297 s .. 7.370 s)
std dev              46.02 ms   (0.0 s .. 52.85 ms)
variance introduced by outliers: 19% (moderately inflated)

@bitc

bitc commented Jun 19, 2016

I currently am unable to dig further into this bug.

Hopefully the info above can point someone who is more familiar with the criterion codebase in the right direction.

@rrnewton
Member

rrnewton commented Aug 16, 2016

@RyanGlScott, we had a problem with this as well, but now I can't remember what it was. Do you?

@RyanGlScott
Member

I think the issue might have been that the FromJSON Measured instance was doing something dodgy with null values behind the scenes, which we fixed with this commit.

Therefore, I'm tempted to claim this bug as fixed.

@ghost

ghost commented Aug 31, 2016

I'm also getting this with criterion-1.1.1.0.

@rrnewton
Member

rrnewton commented Sep 1, 2016

Sadly, 1.1.1.0 doesn't include the linked commit. It sounds like it is time to do a 1.1.1.1 release for bugfixes. @RyanGlScott and @bos - any opinions about where to cut that?

There's currently a "1.2" branch corresponding to the statistics 0.14 release. The master branch also already has a number of commits relative to 1.1.1.0, which you can see here.

Additions like --json actually went in before the bugfix Ryan linked. We could cut a 1.1.1.1 that backports the fix to this issue. Or we could plow ahead and release a 1.1.3.1, which is the version master is already tagged with.

I'm not completely sure if that should be 1.1.X or 1.2, however. We also changed the default report format to json, so this is a breaking change that should require a major version bump. If we do the major version bump and push out 1.2 now, then the statistics-0.14 version can slide down to 1.3.

@rrnewton
Member

rrnewton commented Sep 1, 2016

Could you, @tswilkinson and @bitc, see if you can reproduce the error on master?

@RyanGlScott
Member

Ack, I spoke too soon. I can still reproduce this error with criterion-1.1.4.0.

@RyanGlScott
Member

RyanGlScott commented Nov 16, 2016

I did some digging into this recently, and I think I've narrowed the issue down to a function in statistics. Notice these lines in criterion:

(coeffs,r2) <- liftIO $
               bootstrapRegress gen resamples confInterval olsRegress ps r

I've managed to get coeffs and r2 values that contain NaN (non-deterministically, though, since it depends on a PRNG gen). Here is how I reproduced it in GHCi (using actual values for ps and r that I recorded during a criterion session in which this bug happened):

$ ghci
GHCi, version 8.0.1: http://www.haskell.org/ghc/  :? for help
Loaded GHCi configuration from /home/ryanglscott/.ghci
λ> :m + System.Random.MWC Statistics.Regression
λ> gen <- createSystemRandom
λ> :set -XOverloadedLists
λ> bootstrapRegress gen 1000 0.95 olsRegress [[1.0,2.0,3.0,4.0]] [0.4834418371319771,0.9643802028149366,1.4471413176506758,1.9452479053288698]
([Estimate {estPoint = 0.48681793194264117, estLowerBound = 0.4809383656829592, estUpperBound = 0.4981065876781946, estConfidenceLevel = 0.95},Estimate {estPoint = -6.992014124988053e-3, estLowerBound = NaN, estUpperBound = 2.503471449018241e-3, estConfidenceLevel = 0.95}],Estimate {estPoint = 0.999930103564569, estLowerBound = 0.9998776351901867, estUpperBound = 1.0, estConfidenceLevel = 0.95})

Notice the NaN value in the first field of the pair that bootstrapRegress returns. (You may have to re-run that last line several times before the PRNG gives you a NaN value.)

@Shimuuar, do you have any idea why bootstrapRegress might be giving NaN values? FWIW, this is with statistics-0.13.3.0.

@RyanGlScott
Member

Here's what I believe to be a more deterministic way of reproducing the bug, using initialize instead of createSystemRandom (the latter of which is what criterion actually uses internally):

$ ghci
GHCi, version 8.0.1: http://www.haskell.org/ghc/  :? for help
Loaded GHCi configuration from /home/ryanglscott/.ghci
λ> :m + System.Random.MWC Statistics.Regression Data.Word
λ> import qualified Data.Vector.Unboxed as U
λ> :set -XOverloadedLists
λ> gen <- initialize ([1..1000] :: U.Vector Word32)
λ> bootstrapRegress gen 1000 0.95 olsRegress [[1.0,2.0,3.0,4.0]] [0.4834418371319771,0.9643802028149366,1.4471413176506758,1.9452479053288698]
([Estimate {estPoint = 0.48681793194264117, estLowerBound = 0.4809383656829594, estUpperBound = 0.4911313727498061, estConfidenceLevel = 0.95},Estimate {estPoint = -6.992014124988053e-3, estLowerBound = -4.717844538390807e-2, estUpperBound = 2.5034714490178326e-3, estConfidenceLevel = 0.95}],Estimate {estPoint = 0.999930103564569, estLowerBound = 0.9998776351901867, estUpperBound = NaN, estConfidenceLevel = 0.95})

This time, the NaN is in the estUpperBound of r2 (the second field of the pair).

@Shimuuar
Contributor

I'll look into it.

@Shimuuar
Contributor

My guess is that resampling sometimes gives a sample where every point has the same x, so the linear fit returns NaN. After that, the NaNs propagate.

@Shimuuar
Contributor

I was right about this one. I think the problem appears in long-running computations because they collect fewer samples, so a resample where all values are the same is more likely.
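The failure mode can be sketched with a standalone ordinary-least-squares slope (my own toy function, not statistics' olsRegress): the slope divides by the variance of x, which is zero when a bootstrap resample happens to pick the same point every time, so the 0/0 division yields NaN:

```haskell
-- Toy OLS slope: sxy / sxx, where sxx is (n times) the variance of x.
-- A degenerate resample with all points identical makes both sums
-- zero, and 0/0 evaluates to NaN, which then propagates.
olsSlope :: [(Double, Double)] -> Double
olsSlope pts = sxy / sxx
  where
    n   = fromIntegral (length pts)
    mx  = sum (map fst pts) / n
    my  = sum (map snd pts) / n
    sxy = sum [ (x - mx) * (y - my) | (x, y) <- pts ]
    sxx = sum [ (x - mx) ^ (2 :: Int) | (x, _) <- pts ]

main :: IO ()
main = do
  print (olsSlope [(1,1),(2,2),(3,3)])          -- 1.0
  print (isNaN (olsSlope [(2,5),(2,5),(2,5)]))  -- True
```

With only a handful of samples (as in long-running benchmarks), drawing the same point every time in a resample is not unlikely, matching the behavior reported above.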

@amigalemming
Author

On Thu, 17 Nov 2016, Aleksey Khudyakov wrote:

I was right about this one. I think the problem appears in long-running
computations because they collect fewer samples, so a resample where all
values are the same is more likely.

That matches my experience.

@RyanGlScott
Member

Thanks for looking into this, @Shimuuar!

(For reference, the statistics issue is being tracked in haskell/statistics#111.)

@Shimuuar
Contributor

I think a proper solution will require changes to the linear regression API and, consequently, to bootstrapRegress. Simply returning NaN is a bad idea, as this issue has shown. The lack of a unique solution should be reported in a more obvious way, such as returning Nothing.
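A hedged sketch of what a Maybe-based result could look like (the name safeSlope is mine for illustration, not a proposed statistics API): detect the degenerate case explicitly instead of letting NaN propagate into the report:

```haskell
-- Hypothetical Maybe-returning fit: report "no unique solution"
-- explicitly rather than producing NaN.
safeSlope :: [(Double, Double)] -> Maybe Double
safeSlope pts
  | sxx == 0  = Nothing           -- every x identical: fit not unique
  | otherwise = Just (sxy / sxx)
  where
    n   = fromIntegral (length pts)
    mx  = sum (map fst pts) / n
    my  = sum (map snd pts) / n
    sxy = sum [ (x - mx) * (y - my) | (x, y) <- pts ]
    sxx = sum [ (x - mx) ^ (2 :: Int) | (x, _) <- pts ]
```

In GHCi, safeSlope [(2,5),(2,5),(2,5)] gives Nothing, while a well-posed sample recovers the slope as Just a value, so callers such as bootstrapRegress would be forced to handle the degenerate case.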

@amigalemming
Author

On Thu, 17 Nov 2016, Aleksey Khudyakov wrote:

I think proper solution will require changes to linear regression API
and consequently bootstrapRegress. I think simply returning NaN is bad
idea as this issue shown. Lack of unique solution should be reported in
more obvious way like returning Nothing.

An alternative would be to go the way of LAPACK and pseudo-inverses, that
is, add another criterion: if the solution of the linear regression is not
unique, then choose among all solutions the one with minimal squared value.
With this additional condition, linear regression on a single data point
would yield a constant function.
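The idea above can be sketched as follows (a hypothetical standalone function; a real implementation would go through a pseudo-inverse as in LAPACK): when the slope is not uniquely determined, pick the minimal-norm solution, i.e. slope 0, which makes the fit the constant function through the mean of y:

```haskell
-- Hypothetical minimal-norm fallback: if all x coincide, the
-- least-squares problem has many solutions; choosing slope 0 gives
-- the constant function y = mean of the observed values.
fitLine :: [(Double, Double)] -> (Double, Double)  -- (slope, intercept)
fitLine pts
  | sxx == 0  = (0, my)                            -- degenerate: constant fit
  | otherwise = (sxy / sxx, my - (sxy / sxx) * mx)
  where
    n   = fromIntegral (length pts)
    mx  = sum (map fst pts) / n
    my  = sum (map snd pts) / n
    sxy = sum [ (x - mx) * (y - my) | (x, y) <- pts ]
    sxx = sum [ (x - mx) ^ (2 :: Int) | (x, _) <- pts ]
```

This trades the explicit failure signal of a Maybe-based API for a total function that always returns a finite fit.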
