Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DRAFT] Property test for numbers #8

Closed

Conversation

JonathanLorimer
Copy link
Contributor

This is just a draft PR so you can see the diff of what I am doing

Copy link
Owner

@sshine sshine left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, and thanks for your pull request.

hs-jq is a hobby project where the bar for contributions has not been defined, but is currently low. On the other hand, I also enjoy teaching (and learning in the process), so let me know how much of the feedback below you wish to apply and where you would like to strike the balance and say "LGTM".


By "full syntactic span" I mean that I'd like that this Gen Text produces all possible strings that can be interpreted as valid JSON numbers according to the documentation collected in #5. This includes, but is not limited to, the variations below.

The best way to attack this problem, I think, is by producing sub-generators for each component of a JSON number, and then concatenating them in the following way: Like Gen.element ["", "-"] covers the sets of equivalence classes of JSON numbers that either have or haven't got a sign, the next partitioning you could make is Gen.choice [ hasIntegerPart, hasntIntegerPart ]:

Since a JSON number can't take the form ., it has to be either <something>. or .<something>. For the case hasIntegerPart, the first sub-generator you may want to address is a Gen.choice [ "", fractionPart ], and for the case hasntIntegerPart, you have to assume that it has a fractionPart, and in both cases, continue to the optional exponent part.

You could see the entire set of equivalence classes of jsonNumberGen as a cartesian product of the equivalence classes of its sub-generators. That way the variations listed below can be exhausted without having to think "Did I forget the combination where there isn't an integer part, but there is a fraction part, and the fraction exceeds a double, and it's followed by a negative exponent?" because this is the losing battle of unit testing.


Variations:

  • Integers with an arbitrary magnitude

    0
    1
    1000
    734257894365982734695234
    
  • Integers with arbitrary signs

    -0
    -1
    -734257894365982734695234
    
  • Fractionals with an arbitrary magnitude and an arbitrary fraction

    1.0
    2.5
    5555555555555555555555555.0
    0.1
    0.14159
    0.5555555555555555555555555
    5555555555555555555555555.5555555555555555555555555
    
  • Fractionals with optional integer/fractional part

    0.
    .0
    1.
    .5
    .5555555555555555555555555
    5555555555555555555555555.
    -0.
    -.0
    -1.
    -.5
    -.5555555555555555555555555
    -5555555555555555555555555.
    
  • Numbers with an exponent

    ...e0
    ...E0
    ...e1
    ...E1
    ...e10
    ...E10
    ...e+0
    ...E+0
    ...e-0
    ...E-0
    ...e+50
    ...E+500
    ...e-5000
    

@@ -16,16 +16,26 @@ import qualified Hedgehog.Range as Range
-- DISCREPANCY: 'jq' allows both 000 in input and its own number literals,
-- but JSON spec does not. We'll stay closest to 'jq' by doing this, too.

integer :: Integer -> Gen Text
integer limit = Text.pack . show <$>
(Gen.integral $ Range.constant 0 limit)
Copy link
Owner

@sshine sshine Sep 29, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This integer generator uses an explicit Integer -> Gen ... approach to sizing.

This is very much how I'd do it in QuickCheck using sized:

integer :: Int -> Gen Text
integer n = Text.pack . show <$> QC.choose (0, n)

> sample (sized integer)
"0"
"0"
"2"
"6"
...

While Hedgehog also has the ability to explicitly parameterize a generator with a size, it uses a specific type called Size. The size, like in QuickCheck, is on an abstract scale that varies depending on the generator. Unlike QuickCheck, Hedgehog allows scaling an abstract size in a lot of ways so rescaling can be made convenient without making the Size parameter explicit all the time.

For example, for a generic Gen Integer we may want to grow it exponential (proportional to the internal size parameter) to see how it handles many digits. But for the "exponent" part of a Scientific, we probably want to grow it linearBounded (proportional to the internal size parameter) by the total range of Scientific's base10Exponent :: Int.

I'm still learning to make Hedgehog generators, so perhaps you can tell me if this looks about right:

integer :: Gen Integer
integer = Gen.integral (Range.exponentialFrom 0 (-128) 128)

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-128 and 128 were picked arbitrarily to suggest that the generated numbers should exceed a 32-bit and 64-bit word boundary, but other than that, I had no other justification for them, other than the fact that there needs to be some finite boundary.

I investigated how big numbers jq can handle and placed the results in #4:

It appears that 10^-324 and 10^309 are better bounds.

test/Generators.hs Outdated Show resolved Hide resolved
@JonathanLorimer
Copy link
Contributor Author

@sshine Thank you for all the feedback, your patience is seriously appreciated. I pushed some new changes to the generators so that they should now cover all of the cases you outlined.

Lo and behold the property testing caught a bug in the parser.

  NumLit:                              FAIL
      ✗ NumLit failed after 3 tests and 2 shrinks.

            ┏━━ test/ParserTest.hs ━━━
        147 ┃ hprop_NumLit :: Property
        148 ┃ hprop_NumLit = property $ do
        149 ┃   genNum <- forAll $ jsonNumberGen
            ┃   │ "-.1"
        150 ┃
        151 ┃   let nums = do
        152 ┃         jqNum <- eTm $ parseExpr genNum
        153 ┃         haskNum <- fmap NumLit (readMaybe $ Text.unpack genNum)
        154 ┃         pure (jqNum, haskNum)
        155 ┃
        156 ┃   case nums of
        157 ┃     Nothing -> failure
            ┃     ^^^^^^^^^^^^^^^^^^
        158 ┃     Just (j,h) -> j === h
        159 ┃   where
        160 ┃     eTm = either (const Nothing) Just

the generator does not handle fractional with optional digit part and a sign i.e. -.0

I am happy to go and add this functionality to the parser. Would you like me to do that in this PR or create a new branch/PR?

sshine added a commit that referenced this pull request Oct 1, 2019
@JonathanLorimer shows in #8 that the number parser fails on ".1". This
appears to be a limitation of Megaparsec's built-in 'scientific' parser.

However, having changed prefix '-' parsing in 'makeExprParser' in d9763d
makes it also fail for a different reason. This commit adds two unit
tests that fail in each their way, since the prefix '-' change hides the
'number' parser failure.
sshine added a commit that referenced this pull request Oct 1, 2019
Beyond the problem with prefix '-', the Hedgehog generator found another
problem with '.1' being parsed prematurely as the 'Identity' jq filter.

Problem 1:

The parser needs to address that '.<anything>' could be 'Identity', or
any of the other combinators listed in Parser's 'dotExpr' sub-parser
*including* the currently overshadowed '.1' of the 'number' sub-parser.

Until this is resolved, I'll leave the outcome of parsing '-.1' up in
the air.

In summary, this commit highlights the two problems:
 - '.1' cannot be determined as 'NumLit 0.1' with lookahead 1.
 - '-.1' should be either 'Neg (NumLit .1)' or 'Numlit (-0.1)'
Copy link
Owner

@sshine sshine left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent work, and very positive to see that it found a bug I didn't know about.

It's these moments that enforce one's belief in property-based testing.

your patience is seriously appreciated

I'm happy my patience doesn't get interpreted as pedantry. ;-D

The test case can be shrunk further to ".1".

I've pushed 022086f onto master that isolates the "-.1" and ".1" bugs into unit tests because d9763d3, which I pushed 3 days ago, actually introduces another bug wrt. prefix '-'. (whoops!) It was necessary to change the way the parser is exported to expose number :: Parser Scientific directly.

If you could rebase, I hope I didn't cause too many merge conflicts. The commit 7d892ef does move the source files from Data.Aeson.Jq into Jq, sorry for that.

I am happy to go and add this functionality to the parser. Would you like me to do that in this PR or create a new branch/PR?

If you can make the generator cover the last equivalence classes (if I'm not mistaken that they're missing), and you can rebase successfully, I think this draft PR should be discarded in favor of one that has a timestamp in October. Then you are very welcome to resolve the bugs in a separate PR (or two, depending on your approach).

See further comments in the commit messages that reference this PR.


expGen :: Gen Text
expGen = Text.pack . show <$>
(Gen.integral (Range.linearFrom 0 (-10^16) (10^16)))
Copy link
Owner

@sshine sshine Oct 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of (-10^16) and (10^16), why not 1e-16 (-1e16) and 1e16?

Since this is only used in scientificNumberGen, maybe move it there to add clarity that it specifically is the exponent to a Scientific. Also, how about using Range.linearBounded :: Range Int? Scientific's docs say:

-- Note that since we're using an 'Int' to represent the exponent these numbers
-- aren't truly arbitrary precision. I intend to change the type of the exponent
-- to 'Integer' in a future release.

but we'll just go with the Int assumption for now.

Also, thanks for figuring out how Range.* work! ;-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This range generates an integer to go in the exponent place base ^ exponent, it doesn't actually generate the exponent itself. That is why 10^16, 10e16 isn't valid haskell.

Additionally, I am using linearFrom because you mentioned that you wanted the exponent value to scale linearly

But for the "exponent" part of a Scientific, we probably want to grow it linearBounded (proportional to the internal size parameter) by the total range of Scientific's base10Exponent :: Int.

test/Generators.hs Show resolved Hide resolved
test/Generators.hs Outdated Show resolved Hide resolved
test/Generators.hs Show resolved Hide resolved
@sshine
Copy link
Owner

sshine commented Oct 1, 2019

The test case can be shrunk further to ".1".

I wonder why the generator didn't do that. We can try and look into that later.

@JonathanLorimer
Copy link
Contributor Author

The test case can be shrunk further to ".1".

I wonder why the generator didn't do that. We can try and look into that later.

My guess is that it is because there is a (ostensibly) 50/50 choice between "" and "-" using Gen.element, which is something that I don't think can be shrunk since it isn't sized.

@sshine
Copy link
Owner

sshine commented Oct 2, 2019

According to Gen.element's docs (and incidentally my assumption):

This generator shrinks towards the first generator in the list.

@JonathanLorimer
Copy link
Contributor Author

According to Gen.element's docs (and incidentally my assumption):

This generator shrinks towards the first generator in the list.

Good catch, I moved the empty strings to the beginning of the lists for the sign generators. The assumption is that unsigned is a more minimal case than signed

@sshine
Copy link
Owner

sshine commented Oct 3, 2019

Excellent work.

Do you want to close this PR and reopen a non-draft one?

I'm asking mainly because of Hacktoberfest points.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants