New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

lib: fix parsing of amounts with a whitespace digits separator #780

Merged
merged 1 commit into from May 20, 2018

Conversation

Projects
None yet
4 participants
@awjchen
Collaborator

awjchen commented May 20, 2018

This PR addresses a problem which was found in issue #749. This PR does not address that issue.

The problem was that we fail to parse the following journal

commodity 1 000  USD

2018-01-01
  (a)   USD1 000

failing with the following error

hledger: test.journal:1:12:
unexpected space
expecting digit

We fail to parse this next journal as well

2018-01-01
  (a)   USD1 000 ; comment
hledger: test2.journal:2:9:
unexpected 'U'
expecting '=', '{', end of input, or newline

It seems that the issue is caused by the use of a space as the digits separator and trailing whitespace after the amount. The current parser consumes the trailing whitespace as a separator and then expects at least another digit, which it does not find. This patch groups together the separator and the following digit group. With this patch, the above two journals parse successfully.

@wafflebot wafflebot bot added the in progress label May 20, 2018

@awjchen awjchen changed the title from lib: fix parsing error of number with whitespace separator to lib: fix parsing of amounts with a whitespace digits separator May 20, 2018

@ehildenb

This comment has been minimized.

Show comment
Hide comment
@ehildenb

ehildenb May 20, 2018

Contributor

Can you add a test or two for this case? Perhaps the above examples.

Contributor

ehildenb commented May 20, 2018

Can you add a test or two for this case? Perhaps the above examples.

@awjchen

This comment has been minimized.

Show comment
Hide comment
@awjchen

awjchen May 20, 2018

Collaborator

Sure, I'd be happy to add tests for these two examples.

Collaborator

awjchen commented May 20, 2018

Sure, I'd be happy to add tests for these two examples.

@simonmichael

This comment has been minimized.

Show comment
Hide comment
@simonmichael

simonmichael May 20, 2018

Owner

I added some functional tests for #749 in tests/journal/amounts-and-commodities.test. You can just uncomment them and make functest or COLUMNS=80 shelltest --execdir --hide-successes -j8 tests/journal/amounts-and-commodities.test [-i...].

Owner

simonmichael commented May 20, 2018

I added some functional tests for #749 in tests/journal/amounts-and-commodities.test. You can just uncomment them and make functest or COLUMNS=80 shelltest --execdir --hide-successes -j8 tests/journal/amounts-and-commodities.test [-i...].

groups <- some digitChar `sepBy1` char firstSep
return (Just firstSep, leadingDigits : groups)
secondGroup <- some digitChar
otherGroups <- many $ try $ char firstSep *> some digitChar

This comment has been minimized.

@ony

ony May 20, 2018

Collaborator

Hmm... Interesting how this is different from previous code that suppose to produce list of one or more elements with non-zero list of digit characters separated by firstSep? Am I missing something?

@ony

ony May 20, 2018

Collaborator

Hmm... Interesting how this is different from previous code that suppose to produce list of one or more elements with non-zero list of digit characters separated by firstSep? Am I missing something?

This comment has been minimized.

@awjchen

awjchen May 20, 2018

Collaborator

Yes, the new code also parses a list of one or more numbers separated by firstSep, where each number is a non-zero list of digit characters. The difference is only that I have added try so that a separator and the following digits are parsed together as an atomic unit. The old code failed to parse the above examples because it interpreted trailing whitespace as a separator. With the new code, whitespace will only be interpreted as a separator if digits follow it.

@awjchen

awjchen May 20, 2018

Collaborator

Yes, the new code also parses a list of one or more numbers separated by firstSep, where each number is a non-zero list of digit characters. The difference is only that I have added try so that a separator and the following digits are parsed together as an atomic unit. The old code failed to parse the above examples because it interpreted trailing whitespace as a separator. With the new code, whitespace will only be interpreted as a separator if digits follow it.

This comment has been minimized.

@ony

ony May 20, 2018

Collaborator

Hmm... Yep. Confirmed with test-case

-- >>> parseTest rawnumberp "1 000 USD"
-- (Just ' ', ["1","000"], Nothing)

(consider adding this doctest)

Apparently sepBy in megaparsec expects that if separator were parsed successfully then next entry is guaranteed to parse correctly.

Though I still cannot understand error message. notFollowedBy $ (whitespaceChar >> digitChar) should say something about unexpected space followed by digit. But not:

### Failure in /home/nikolay/contrib/hledger/hledger-lib/Hledger/Read/Common.hs:757: expression `parseTest rawnumberp "1 000 USD"'
expected: RawNumber (Just ' ') ["1","000"] Nothing
 but got: 1:2:
          unexpected space
          expecting digit
@ony

ony May 20, 2018

Collaborator

Hmm... Yep. Confirmed with test-case

-- >>> parseTest rawnumberp "1 000 USD"
-- (Just ' ', ["1","000"], Nothing)

(consider adding this doctest)

Apparently sepBy in megaparsec expects that if separator were parsed successfully then next entry is guaranteed to parse correctly.

Though I still cannot understand error message. notFollowedBy $ (whitespaceChar >> digitChar) should say something about unexpected space followed by digit. But not:

### Failure in /home/nikolay/contrib/hledger/hledger-lib/Hledger/Read/Common.hs:757: expression `parseTest rawnumberp "1 000 USD"'
expected: RawNumber (Just ' ') ["1","000"] Nothing
 but got: 1:2:
          unexpected space
          expecting digit

This comment has been minimized.

@awjchen

awjchen May 20, 2018

Collaborator

Regarding the error message:

The problem seems to be that notFollowedBy ... is given " 000 USD" as remaining text to parse, which triggers its second alternative. This happens because the first section of rawnumberp doesn't consume the " 000", which is because of back-tracking of try in option (...) . try $ do upon the parse failure at the groups <- some digitChar ... line.

@awjchen

awjchen May 20, 2018

Collaborator

Regarding the error message:

The problem seems to be that notFollowedBy ... is given " 000 USD" as remaining text to parse, which triggers its second alternative. This happens because the first section of rawnumberp doesn't consume the " 000", which is because of back-tracking of try in option (...) . try $ do upon the parse failure at the groups <- some digitChar ... line.

@awjchen

This comment has been minimized.

Show comment
Hide comment
@awjchen

awjchen May 20, 2018

Collaborator

I've uncommented tests 11 and 12 in tests/journal/amounts-and-commodities.test and updated this PR.

Test 10 is also relevant, but as it is, it does not yet pass since we have not fully resolved issue 749.

Collaborator

awjchen commented May 20, 2018

I've uncommented tests 11 and 12 in tests/journal/amounts-and-commodities.test and updated this PR.

Test 10 is also relevant, but as it is, it does not yet pass since we have not fully resolved issue 749.

@ony

ony approved these changes May 20, 2018

P.S. doctests are also useful

@awjchen

This comment has been minimized.

Show comment
Hide comment
@awjchen

awjchen May 20, 2018

Collaborator

Thanks for the suggestions, I'll try to keep doctests in mind when making fixes. Testing is something that I'm looking to get more experience with.

Also, I'm not familliar with how our tests are organized. Are our doctests intended for developers or users, or both? If they are for users of the API, I feel like we should put this test in a functest rather than a doctest, since it tests against an improper implementation of rawnumberp, which is something that shouldn't concern the user.

Collaborator

awjchen commented May 20, 2018

Thanks for the suggestions, I'll try to keep doctests in mind when making fixes. Testing is something that I'm looking to get more experience with.

Also, I'm not familliar with how our tests are organized. Are our doctests intended for developers or users, or both? If they are for users of the API, I feel like we should put this test in a functest rather than a doctest, since it tests against an improper implementation of rawnumberp, which is something that shouldn't concern the user.

@awjchen

This comment has been minimized.

Show comment
Hide comment
@awjchen

awjchen May 20, 2018

Collaborator

Having the approval of @ony (and myself), I will merge this PR.

Collaborator

awjchen commented May 20, 2018

Having the approval of @ony (and myself), I will merge this PR.

@awjchen awjchen merged commit 46aae19 into simonmichael:master May 20, 2018

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details

@wafflebot wafflebot bot removed the in progress label May 20, 2018

@simonmichael

This comment has been minimized.

Show comment
Hide comment
@simonmichael

simonmichael May 20, 2018

Owner

Thanks @awjchen for the fix and @ony for the review. The results look good.

In general, use whichever kind of test works best for you, we don't have a firm policy. But it's a good point about doctests being part of the API documentation. hunit tests run fastest but are a chore to write, shell tests run pretty fast and are easy to write and understand, doctests are easy to write and provide useful documentation but alas run extremely slowly. Please feel free to add notes to https://github.com/simonmichael/hledger/wiki/Tests.

Owner

simonmichael commented May 20, 2018

Thanks @awjchen for the fix and @ony for the review. The results look good.

In general, use whichever kind of test works best for you, we don't have a firm policy. But it's a good point about doctests being part of the API documentation. hunit tests run fastest but are a chore to write, shell tests run pretty fast and are easy to write and understand, doctests are easy to write and provide useful documentation but alas run extremely slowly. Please feel free to add notes to https://github.com/simonmichael/hledger/wiki/Tests.

@simonmichael

This comment has been minimized.

Show comment
Hide comment
@simonmichael

simonmichael May 20, 2018

Owner

easytests are new and might be easier than hunittests.

Owner

simonmichael commented May 20, 2018

easytests are new and might be easier than hunittests.

@ony

This comment has been minimized.

Show comment
Hide comment
@ony

ony May 21, 2018

Collaborator

@awjchen , doctests are simple unit-tests for developers that helps to catch errors that propagates further. Especially useful for parsers. They also help to understand some corner cases of API.

In this particular case issue were isolated because of extra check notFollowedBy in the end. But if that line would be missing you'll get error in a next parser or will get wrong interpretation.

Collaborator

ony commented May 21, 2018

@awjchen , doctests are simple unit-tests for developers that helps to catch errors that propagates further. Especially useful for parsers. They also help to understand some corner cases of API.

In this particular case issue were isolated because of extra check notFollowedBy in the end. But if that line would be missing you'll get error in a next parser or will get wrong interpretation.

@awjchen

This comment has been minimized.

Show comment
Hide comment
@awjchen

awjchen May 21, 2018

Collaborator

Ah, I haven't before considered whether issues were propgated or isolated. Thanks for the tips!

Collaborator

awjchen commented May 21, 2018

Ah, I haven't before considered whether issues were propgated or isolated. Thanks for the tips!

@awjchen awjchen deleted the awjchen:issue749example2 branch May 21, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment