
get "xxx" in the report for long-running computations #65

Open
amigalemming opened this issue Oct 6, 2014 · 18 comments

@amigalemming

If a computation runs quite long, say, longer than 0.5s, then I get a result like:

OLS regression  544 ms  570 ms  0 s
R² goodness-of-fit 0.999   0.999   xxx

in the HTML report and then all following regression results are "xxx", too. This also implies that the overview diagram is not generated.

@bitc

bitc commented Jun 19, 2016

I have done a little investigating. This bug happens when the "var reports = {{json}};" that is rendered into the HTML ends up containing JavaScript "null" values rather than proper numbers:

var reports = [{"reportAnalysis":{"anMean":{"estUpperBound":3.8590750, ... null, ...

As the browser loads the page, a JavaScript error happens when it tries to run toFixed(3) on (the first of) these null values.

Where do these nulls come from? They appear to be coming from the JSON rendering of Statistics.Resampling.Bootstrap.Estimate values, which have all fields as strict Doubles. From playing around with Aeson, I noticed that it will render Double NaN and Infinity values as JavaScript "null":

Prelude Data.Aeson> encode (1/0 :: Double)
"null"
Prelude Data.Aeson> encode (0/0 :: Double)
"null"
Prelude Data.Aeson> 
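Since JSON has no representation for NaN or Infinity, aeson silently serializes them as null, which is what later trips up toFixed(3) in the report's JavaScript. A minimal standalone sketch of a guard one could apply to fields like estLowerBound before encoding (finiteOrNothing is my name for illustration, not part of criterion or aeson):

```haskell
-- Hypothetical guard: map the non-finite Doubles that aeson would
-- serialize as JavaScript "null" to an explicit Nothing instead,
-- so the report generator can handle them deliberately.
finiteOrNothing :: Double -> Maybe Double
finiteOrNothing x
  | isNaN x || isInfinite x = Nothing
  | otherwise               = Just x

main :: IO ()
main = do
  print (finiteOrNothing (1/0))  -- Nothing
  print (finiteOrNothing (0/0))  -- Nothing
  print (finiteOrNothing 3.14)   -- Just 3.14
```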

I am seeing "null" values end up in this field (among others):

{Report} -> {reportAnalysis :: SampleAnalysis} -> {anRegress :: [Regression]} !! 0 -> {regRSquare :: Bootstrap.Estimate} -> {estLowerBound :: !Double}

Somehow criterion is calculating this estLowerBound to be NaN or Infinity (at least according to the evidence laid out in this comment; I haven't actually inspected the Haskell value directly).

This is as far as I have come with my investigation.

@bitc

bitc commented Jun 19, 2016

More evidence: I've spotted the NaNs in the command-line output (use your browser's search to find "NaN"):

benchmarking fib/1
time                 38.35 ns   (38.28 ns .. 38.46 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 38.38 ns   (38.31 ns .. 38.59 ns)
std dev              372.5 ps   (176.5 ps .. 702.6 ps)

benchmarking fib/5
time                 899.5 ns   (897.6 ns .. 902.0 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 900.1 ns   (898.5 ns .. 902.0 ns)
std dev              5.975 ns   (4.724 ns .. 7.946 ns)

benchmarking fib/9
time                 6.145 μs   (6.137 μs .. 6.157 μs)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 6.160 μs   (6.138 μs .. 6.258 μs)
std dev              135.1 ns   (23.39 ns .. 306.6 ns)
variance introduced by outliers: 24% (moderately inflated)

benchmarking fib/11
time                 16.11 μs   (16.07 μs .. 16.18 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 16.13 μs   (16.09 μs .. 16.29 μs)
std dev              242.6 ns   (96.99 ns .. 482.7 ns)
variance introduced by outliers: 11% (moderately inflated)

benchmarking fib/13
time                 42.28 μs   (42.02 μs .. 42.66 μs)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 42.41 μs   (42.17 μs .. 42.91 μs)
std dev              1.089 μs   (573.8 ns .. 2.028 μs)
variance introduced by outliers: 25% (moderately inflated)

benchmarking fib/15
time                 110.7 μs   (110.3 μs .. 111.1 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 110.5 μs   (110.3 μs .. 110.9 μs)
std dev              903.4 ns   (634.4 ns .. 1.270 μs)

benchmarking fib/20
time                 1.215 ms   (1.210 ms .. 1.223 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 1.217 ms   (1.214 ms .. 1.222 ms)
std dev              13.26 μs   (7.718 μs .. 20.12 μs)

benchmarking fib/25
time                 13.61 ms   (13.53 ms .. 13.69 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 13.55 ms   (13.52 ms .. 13.58 ms)
std dev              83.90 μs   (58.05 μs .. 108.5 μs)

benchmarking fib/30
time                 149.7 ms   (149.1 ms .. 150.6 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 149.9 ms   (149.7 ms .. 150.2 ms)
std dev              337.9 μs   (213.8 μs .. 503.5 μs)
variance introduced by outliers: 12% (moderately inflated)

benchmarking fib/31
time                 242.2 ms   (241.7 ms .. 242.7 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 242.4 ms   (242.2 ms .. 242.6 ms)
std dev              274.6 μs   (118.4 μs .. 342.2 μs)
variance introduced by outliers: 16% (moderately inflated)

benchmarking fib/32
time                 392.0 ms   (388.7 ms .. 394.7 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 392.3 ms   (392.1 ms .. 392.6 ms)
std dev              396.8 μs   (0.0 s .. 419.0 μs)
variance introduced by outliers: 19% (moderately inflated)

benchmarking fib/33
time                 635.0 ms   (632.2 ms .. 639.1 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 634.7 ms   (633.6 ms .. 635.5 ms)
std dev              1.098 ms   (0.0 s .. 1.268 ms)
variance introduced by outliers: 19% (moderately inflated)

benchmarking fib/34
time                 1.028 s    (1.022 s .. NaN s)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 1.028 s    (1.027 s .. 1.029 s)
std dev              1.023 ms   (0.0 s .. 1.083 ms)
variance introduced by outliers: 19% (moderately inflated)

benchmarking fib/35
time                 1.665 s    (1.651 s .. 1.675 s)
                     1.000 R²   (NaN R² .. 1.000 R²)
mean                 1.665 s    (1.664 s .. 1.667 s)
std dev              2.290 ms   (0.0 s .. 2.466 ms)
variance introduced by outliers: 19% (moderately inflated)

benchmarking fib/36
time                 2.697 s    (2.618 s .. 2.770 s)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 2.701 s    (2.692 s .. 2.710 s)
std dev              14.82 ms   (0.0 s .. 15.10 ms)
variance introduced by outliers: 19% (moderately inflated)

benchmarking fib/37
time                 4.426 s    (4.355 s .. 4.581 s)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 4.375 s    (4.354 s .. 4.395 s)
std dev              33.17 ms   (0.0 s .. 33.97 ms)
variance introduced by outliers: 19% (moderately inflated)

benchmarking fib/38
time                 7.314 s    (7.022 s .. 7.496 s)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 7.340 s    (7.297 s .. 7.370 s)
std dev              46.02 ms   (0.0 s .. 52.85 ms)
variance introduced by outliers: 19% (moderately inflated)

@bitc

bitc commented Jun 19, 2016

I currently am unable to dig further into this bug.

Hopefully the info above can point someone who is more familiar with the criterion codebase in the right direction.

@rrnewton
Member

rrnewton commented Aug 16, 2016

@RyanGlScott, we had a problem with this as well, but now I can't remember what it was. Do you?

@RyanGlScott
Member

I think the issue might have been that the FromJSON Measured instance was doing something dodgy with null values behind the scenes, which we fixed with this commit.

Therefore, I'm tempted to claim this bug as fixed.

@ghost

ghost commented Aug 31, 2016

I'm also getting this with criterion-1.1.1.0.

@rrnewton
Member

rrnewton commented Sep 1, 2016

Sadly, 1.1.1.0 doesn't include the linked commit. It sounds like it is time to do a 1.1.1.1 release for bugfixes. @RyanGlScott and @bos - any opinions about where to cut that?

There's currently a "1.2" branch corresponding to the statistics 0.14 release. The master branch also already has a number of commits relative to 1.1.1.0, which you can see here.

Additions like --json actually went in before the bugfix Ryan linked. We could cut a 1.1.1.1 that backports the fix to this issue. Or we could plow ahead and release a 1.1.3.1, which is the version master is already tagged with.

I'm not completely sure if that should be 1.1.X or 1.2, however. We also changed the default report format to json, so this is a breaking change that should require a major version bump. If we do the major version bump and push out 1.2 now, then the statistics-0.14 version can slide down to 1.3.

@rrnewton
Member

rrnewton commented Sep 1, 2016

Could you, @tswilkinson and @bitc, see if you can reproduce the error on master?

@RyanGlScott
Member

Ack, I spoke too soon. I can still reproduce this error with criterion-1.1.4.0.

@RyanGlScott
Member

RyanGlScott commented Nov 16, 2016

I did some digging into this recently, and I think I've narrowed the issue down to a function in statistics. Notice these lines in criterion:

(coeffs,r2) <- liftIO $
               bootstrapRegress gen resamples confInterval olsRegress ps r

I've managed to get coeffs and r2 values that contain NaN (non-deterministically, though, since it depends on a PRNG gen). Here is how I reproduced it in GHCi (using actual values for ps and r that I recorded during a criterion session in which this bug happened):

$ ghci
GHCi, version 8.0.1: http://www.haskell.org/ghc/  :? for help
Loaded GHCi configuration from /home/ryanglscott/.ghci
λ> :m + System.Random.MWC Statistics.Regression
λ> gen <- createSystemRandom
λ> :set -XOverloadedLists
λ> bootstrapRegress gen 1000 0.95 olsRegress [[1.0,2.0,3.0,4.0]] [0.4834418371319771,0.9643802028149366,1.4471413176506758,1.9452479053288698]
([Estimate {estPoint = 0.48681793194264117, estLowerBound = 0.4809383656829592, estUpperBound = 0.4981065876781946, estConfidenceLevel = 0.95},Estimate {estPoint = -6.992014124988053e-3, estLowerBound = NaN, estUpperBound = 2.503471449018241e-3, estConfidenceLevel = 0.95}],Estimate {estPoint = 0.999930103564569, estLowerBound = 0.9998776351901867, estUpperBound = 1.0, estConfidenceLevel = 0.95})

Notice the NaN value in the first field of the pair that bootstrapRegress returns. (You may have to re-run that last line several times before the PRNG gives you a NaN value.)

@Shimuuar, do you have any idea why bootstrapRegress might be giving NaN values? FWIW, this is with statistics-0.13.3.0.

@RyanGlScott
Member

Here's what I believe to be a more deterministic way of reproducing the bug, using initialize instead of createSystemRandom (the latter of which is what criterion actually uses internally):

$ ghci
GHCi, version 8.0.1: http://www.haskell.org/ghc/  :? for help
Loaded GHCi configuration from /home/ryanglscott/.ghci
λ> :m + System.Random.MWC Statistics.Regression Data.Word
λ> import qualified Data.Vector.Unboxed as U
λ> :set -XOverloadedLists
λ> gen <- initialize ([1..1000] :: U.Vector Word32)
λ> bootstrapRegress gen 1000 0.95 olsRegress [[1.0,2.0,3.0,4.0]] [0.4834418371319771,0.9643802028149366,1.4471413176506758,1.9452479053288698]
([Estimate {estPoint = 0.48681793194264117, estLowerBound = 0.4809383656829594, estUpperBound = 0.4911313727498061, estConfidenceLevel = 0.95},Estimate {estPoint = -6.992014124988053e-3, estLowerBound = -4.717844538390807e-2, estUpperBound = 2.5034714490178326e-3, estConfidenceLevel = 0.95}],Estimate {estPoint = 0.999930103564569, estLowerBound = 0.9998776351901867, estUpperBound = NaN, estConfidenceLevel = 0.95})

This time, the NaN is in the estUpperBound of r2 (the second field of the pair).

@Shimuuar
Contributor

I'll look into it.

@Shimuuar
Contributor

My guess is that resampling sometimes gives a sample where every point has the same x, so the linear fit returns NaN. After that, the NaNs propagate.

@Shimuuar
Contributor

I was right about this one. I think the problem appears in long-running computations because they collect fewer samples, so a resample where all values are the same is more likely.
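The failure mode can be sketched with a standalone ordinary-least-squares slope (my own toy function, not statistics' olsRegress): the slope divides by the variance of x, which is zero when a bootstrap resample happens to pick the same point every time, so the 0/0 division yields NaN:

```haskell
-- Toy OLS slope: sxy / sxx, where sxx is (n times) the variance of x.
-- A degenerate resample with all points identical makes both sums
-- zero, and 0/0 evaluates to NaN, which then propagates.
olsSlope :: [(Double, Double)] -> Double
olsSlope pts = sxy / sxx
  where
    n   = fromIntegral (length pts)
    mx  = sum (map fst pts) / n
    my  = sum (map snd pts) / n
    sxy = sum [ (x - mx) * (y - my) | (x, y) <- pts ]
    sxx = sum [ (x - mx) ^ (2 :: Int) | (x, _) <- pts ]

main :: IO ()
main = do
  print (olsSlope [(1,1),(2,2),(3,3)])          -- 1.0
  print (isNaN (olsSlope [(2,5),(2,5),(2,5)]))  -- True
```

With only a handful of samples (as in long-running benchmarks), drawing the same point every time in a resample is not unlikely, matching the behavior reported above.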

@amigalemming
Author

On Thu, 17 Nov 2016, Aleksey Khudyakov wrote:

I was right about this one. I think the problem appears in long-running
computations because they collect fewer samples, so a resample where all
values are the same is more likely.

That matches my experience.

@RyanGlScott
Member

Thanks for looking into this, @Shimuuar!

(For reference, the statistics issue is being tracked in haskell/statistics#111.)

@Shimuuar
Contributor

I think a proper solution will require changes to the linear regression API and, consequently, to bootstrapRegress. Simply returning NaN is a bad idea, as this issue has shown. The lack of a unique solution should be reported in a more obvious way, such as returning Nothing.
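A hedged sketch of what a Maybe-based result could look like (the name safeSlope is mine for illustration, not a proposed statistics API): detect the degenerate case explicitly instead of letting NaN propagate into the report:

```haskell
-- Hypothetical Maybe-returning fit: report "no unique solution"
-- explicitly rather than producing NaN.
safeSlope :: [(Double, Double)] -> Maybe Double
safeSlope pts
  | sxx == 0  = Nothing           -- every x identical: fit not unique
  | otherwise = Just (sxy / sxx)
  where
    n   = fromIntegral (length pts)
    mx  = sum (map fst pts) / n
    my  = sum (map snd pts) / n
    sxy = sum [ (x - mx) * (y - my) | (x, y) <- pts ]
    sxx = sum [ (x - mx) ^ (2 :: Int) | (x, _) <- pts ]
```

In GHCi, safeSlope [(2,5),(2,5),(2,5)] gives Nothing, while a well-posed sample recovers the slope as Just a value, so callers such as bootstrapRegress would be forced to handle the degenerate case.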

@amigalemming
Author

On Thu, 17 Nov 2016, Aleksey Khudyakov wrote:

I think proper solution will require changes to linear regression API
and consequently bootstrapRegress. I think simply returning NaN is bad
idea as this issue shown. Lack of unique solution should be reported in
more obvious way like returning Nothing.

An alternative would be to go the way of LAPACK and pseudo-inverses, that
is, add another criterion: if the solution of the linear regression is not
unique, then choose among all solutions the one with minimal squared value.
With this additional condition, linear regression on a single data point
would yield a constant function.
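The idea above can be sketched as follows (a hypothetical standalone function; a real implementation would go through a pseudo-inverse as in LAPACK): when the slope is not uniquely determined, pick the minimal-norm solution, i.e. slope 0, which makes the fit the constant function through the mean of y:

```haskell
-- Hypothetical minimal-norm fallback: if all x coincide, the
-- least-squares problem has many solutions; choosing slope 0 gives
-- the constant function y = mean of the observed values.
fitLine :: [(Double, Double)] -> (Double, Double)  -- (slope, intercept)
fitLine pts
  | sxx == 0  = (0, my)                            -- degenerate: constant fit
  | otherwise = (sxy / sxx, my - (sxy / sxx) * mx)
  where
    n   = fromIntegral (length pts)
    mx  = sum (map fst pts) / n
    my  = sum (map snd pts) / n
    sxy = sum [ (x - mx) * (y - my) | (x, y) <- pts ]
    sxx = sum [ (x - mx) ^ (2 :: Int) | (x, _) <- pts ]
```

This trades the explicit failure signal of a Maybe-based API for a total function that always returns a finite fit.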
