[DRAFT] Property test for numbers #8

JonathanLorimer · 2019-09-28T15:57:59Z

This is just a draft PR so you can see the diff of what I am doing

sshine

Hi, and thanks for your pull request.

hs-jq is a hobby project where the bar for contributions has not been defined, but is currently low. On the other hand, I also enjoy teaching (and learning in the process), so let me know how much of the feedback below you wish to apply and where you would like to strike the balance and say "LGTM".

By "full syntactic span" I mean that I'd like that this Gen Text produces all possible strings that can be interpreted as valid JSON numbers according to the documentation collected in #5. This includes, but is not limited to, the variations below.

The best way to attack this problem, I think, is by producing sub-generators for each component of a JSON number, and then concatenating them in the following way: Like Gen.element ["", "-"] covers the sets of equivalence classes of JSON numbers that either have or haven't got a sign, the next partitioning you could make is Gen.choice [ hasIntegerPart, hasntIntegerPart ]:

Since a JSON number can't take the form ., it has to be either <something>. or .<something>. For the case hasIntegerPart, the first sub-generator you may want to address is a Gen.choice [ "", fractionPart ], and for the case hasntIntegerPart, you have to assume that it has a fractionPart, and in both cases, continue to the optional exponent part.

You could see the entire set of equivalence classes of jsonNumberGen as a cartesian product of the equivalence classes of its sub-generators. That way the variations listed below can be exhausted without having to think "Did I forget the combination where there isn't an integer part, but there is a fraction part, and the fraction exceeds a double, and it's followed by a negative exponent?" because this is the losing battle of unit testing.

Variations:

Integers with an arbitrary magnitude
```
0
1
1000
734257894365982734695234
```
Integers with arbitrary signs
```
-0
-1
-734257894365982734695234
```

Fractionals with an arbitrary magnitude and an arbitrary fraction

1.0
2.5
5555555555555555555555555.0
0.1
0.14159
0.5555555555555555555555555
5555555555555555555555555.5555555555555555555555555

Fractionals with optional integer/fractional part

0.
.0
1.
.5
.5555555555555555555555555
5555555555555555555555555.
-0.
-.0
-1.
-.5
-.5555555555555555555555555
-5555555555555555555555555.

Numbers with an exponent

...e0
...E0
...e1
...E1
...e10
...E10
...e+0
...E+0
...e-0
...E-0
...e+50
...E+500
...e-5000

sshine · 2019-09-29T20:50:47Z

test/Generators.hs

@@ -16,16 +16,26 @@ import qualified Hedgehog.Range as Range
 -- DISCREPANCY: 'jq' allows both 000 in input and its own number literals,
 -- but JSON spec does not. We'll stay closest to 'jq' by doing this, too.

+integer :: Integer -> Gen Text
+integer limit = Text.pack . show <$>
+      (Gen.integral $ Range.constant 0 limit)


This integer generator uses an explicit Integer -> Gen ... approach to sizing.

This is very much how I'd do it in QuickCheck using sized:

integer :: Int -> Gen Text integer n = Text.pack . show <$> QC.choose (0, n) > sample (sized integer) "0" "0" "2" "6" ...

While Hedgehog also has the ability to explicitly parameterize a generator with a size, it uses a specific type called Size. The size, like in QuickCheck, is on an abstract scale that varies depending on the generator. Unlike QuickCheck, Hedgehog allows scaling an abstract size in a lot of ways so rescaling can be made convenient without making the Size parameter explicit all the time.

For example, for a generic Gen Integer we may want to grow it exponential (proportional to the internal size parameter) to see how it handles many digits. But for the "exponent" part of a Scientific, we probably want to grow it linearBounded (proportional to the internal size parameter) by the total range of Scientific's base10Exponent :: Int.

I'm still learning to make Hedgehog generators, so perhaps you can tell me if this looks about right:

integer :: Gen Integer integer = Gen.integral (Range.exponentialFrom 0 (-128) 128)

-128 and 128 were picked arbitrarily to suggest that the generated numbers should exceed a 32-bit and 64-bit word boundary, but other than that, I had no other justification for them, other than the fact that there needs to be some finite boundary.

I investigated how big numbers jq can handle and placed the results in #4:

It appears that 10^-324 and 10^309 are better bounds.

test/Generators.hs

JonathanLorimer · 2019-09-30T23:39:22Z

@sshine Thank you for all the feedback, your patience is seriously appreciated. I pushed some new changes to the generators so that they should now cover all of the cases you outlined.

Lo and behold the property testing caught a bug in the parser.

  NumLit:                              FAIL
      ✗ NumLit failed after 3 tests and 2 shrinks.

            ┏━━ test/ParserTest.hs ━━━
        147 ┃ hprop_NumLit :: Property
        148 ┃ hprop_NumLit = property $ do
        149 ┃   genNum <- forAll $ jsonNumberGen
            ┃   │ "-.1"
        150 ┃
        151 ┃   let nums = do
        152 ┃         jqNum <- eTm $ parseExpr genNum
        153 ┃         haskNum <- fmap NumLit (readMaybe $ Text.unpack genNum)
        154 ┃         pure (jqNum, haskNum)
        155 ┃
        156 ┃   case nums of
        157 ┃     Nothing -> failure
            ┃     ^^^^^^^^^^^^^^^^^^
        158 ┃     Just (j,h) -> j === h
        159 ┃   where
        160 ┃     eTm = either (const Nothing) Just

the generator does not handle fractional with optional digit part and a sign i.e. -.0

I am happy to go and add this functionality to the parser. Would you like me to do that in this PR or create a new branch/PR?

@JonathanLorimer

@JonathanLorimer shows in #8 that the number parser fails on ".1". This appears to be a limitation of Megaparsec's built-in 'scientific' parser. However, having changed prefix '-' parsing in 'makeExprParser' in d9763d makes it also fail for a different reason. This commit adds two unit tests that fail in each their way, since the prefix '-' change hides the 'number' parser failure.

Beyond the problem with prefix '-', the Hedgehog generator found another problem with '.1' being parsed prematurely as the 'Identity' jq filter. Problem 1: The parser needs to address that '.<anything>' could be 'Identity', or any of the other combinators listed in Parser's 'dotExpr' sub-parser *including* the currently overshadowed '.1' of the 'number' sub-parser. Until this is resolved, I'll leave the outcome of parsing '-.1' up in the air. In summary, this commit highlights the two problems: - '.1' cannot be determined as 'NumLit 0.1' with lookahead 1. - '-.1' should be either 'Neg (NumLit .1)' or 'Numlit (-0.1)'

sshine

Excellent work, and very positive to see that it found a bug I didn't know about.

It's these moments that enforce one's belief in property-based testing.

your patience is seriously appreciated

I'm happy my patience doesn't get interpreted as pedantry. ;-D

The test case can be shrunk further to ".1".

I've pushed 022086f onto master that isolates the "-.1" and ".1" bugs into unit tests because d9763d3, which I pushed 3 days ago, actually introduces another bug wrt. prefix '-'. (whoops!) It was necessary to change the way the parser is exported to expose number :: Parser Scientific directly.

If you could rebase, I hope I didn't cause too many merge conflicts. The commit 7d892ef does move the source files from Data.Aeson.Jq into Jq, sorry for that.

I am happy to go and add this functionality to the parser. Would you like me to do that in this PR or create a new branch/PR?

If you can make the generator cover the last equivalence classes (if I'm not mistaken that they're missing), and you can rebase successfully, I think this draft PR should be discarded in favor of one that has a timestamp in October. Then you are very welcome to resolve the bugs in a separate PR (or two, depending on your approach).

See further comments in the commit messages that reference this PR.

sshine · 2019-10-01T18:36:40Z

test/Generators.hs

+
+expGen :: Gen Text
+expGen = Text.pack . show <$>
+  (Gen.integral (Range.linearFrom 0 (-10^16) (10^16)))


Instead of (-10^16) and (10^16), why not ~~1e-16~~ (-1e16) and 1e16?

Since this is only used in scientificNumberGen, maybe move it there to add clarity that it specifically is the exponent to a Scientific. Also, how about using Range.linearBounded :: Range Int? Scientific's docs say:

-- Note that since we're using an 'Int' to represent the exponent these numbers -- aren't truly arbitrary precision. I intend to change the type of the exponent -- to 'Integer' in a future release.

but we'll just go with the Int assumption for now.

Also, thanks for figuring out how Range.* work! ;-)

This range generates an integer to go in the exponent place base ^ exponent, it doesn't actually generate the exponent itself. That is why 10^16, 10e16 isn't valid haskell.

Additionally, I am using linearFrom because you mentioned that you wanted the exponent value to scale linearly

But for the "exponent" part of a Scientific, we probably want to grow it linearBounded (proportional to the internal size parameter) by the total range of Scientific's base10Exponent :: Int.

test/Generators.hs

sshine · 2019-10-01T20:10:18Z

The test case can be shrunk further to ".1".

I wonder why the generator didn't do that. We can try and look into that later.

JonathanLorimer · 2019-10-01T20:34:54Z

The test case can be shrunk further to ".1".

I wonder why the generator didn't do that. We can try and look into that later.

My guess is that it is because there is a (ostensibly) 50/50 choice between "" and "-" using Gen.element, which is something that I don't think can be shrunk since it isn't sized.

sshine · 2019-10-02T08:08:51Z

According to Gen.element's docs (and incidentally my assumption):

This generator shrinks towards the first generator in the list.

JonathanLorimer · 2019-10-02T21:07:41Z

According to Gen.element's docs (and incidentally my assumption):

This generator shrinks towards the first generator in the list.

Good catch, I moved the empty strings to the beginning of the lists for the sign generators. The assumption is that unsigned is a more minimal case than signed

sshine · 2019-10-03T23:31:38Z

Excellent work.

Do you want to close this PR and reopen a non-draft one?

I'm asking mainly because of Hacktoberfest points.

sshine reviewed Sep 30, 2019

View reviewed changes

sshine reviewed Oct 1, 2019

View reviewed changes

JonathanLorimer added 7 commits October 2, 2019 16:40

failing roundtrip set up

c83c1cd

got hask jq equality working

376fe72

import cleanup

4c3570c

hprop_NumLit cleanup

2a91c31

more imports cleanup

8e8b2e6

added scientific

2adf763

revised generators

c03abb9

JonathanLorimer force-pushed the property-test-for-numbers branch from 40a0b24 to c03abb9 Compare October 2, 2019 20:41

JonathanLorimer added 2 commits October 2, 2019 16:59

updated scientific case

25dc89d

covered all scientific cases

bfc5125

JonathanLorimer closed this Oct 4, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[DRAFT] Property test for numbers #8

[DRAFT] Property test for numbers #8

JonathanLorimer commented Sep 28, 2019

sshine left a comment •

edited

sshine Sep 29, 2019 •

edited

sshine Sep 30, 2019

JonathanLorimer commented Sep 30, 2019

sshine left a comment

sshine Oct 1, 2019 •

edited

JonathanLorimer Oct 2, 2019

sshine commented Oct 1, 2019

JonathanLorimer commented Oct 1, 2019

sshine commented Oct 2, 2019

JonathanLorimer commented Oct 2, 2019

sshine commented Oct 3, 2019

[DRAFT] Property test for numbers #8

[DRAFT] Property test for numbers #8

Conversation

JonathanLorimer commented Sep 28, 2019

sshine left a comment • edited

Choose a reason for hiding this comment

sshine Sep 29, 2019 • edited

Choose a reason for hiding this comment

sshine Sep 30, 2019

Choose a reason for hiding this comment

JonathanLorimer commented Sep 30, 2019

sshine left a comment

Choose a reason for hiding this comment

sshine Oct 1, 2019 • edited

Choose a reason for hiding this comment

JonathanLorimer Oct 2, 2019

Choose a reason for hiding this comment

sshine commented Oct 1, 2019

JonathanLorimer commented Oct 1, 2019

sshine commented Oct 2, 2019

JonathanLorimer commented Oct 2, 2019

sshine commented Oct 3, 2019

sshine left a comment •

edited

sshine Sep 29, 2019 •

edited

sshine Oct 1, 2019 •

edited