Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A possible bug in Text.Parsec.Token.float #35

Closed
mrkkrp opened this issue Apr 23, 2015 · 17 comments
Closed

A possible bug in Text.Parsec.Token.float #35

mrkkrp opened this issue Apr 23, 2015 · 17 comments

Comments

@mrkkrp
Copy link
Contributor

mrkkrp commented Apr 23, 2015

There may be a bug in Text.Parsec.Token.float. Please see this SO question for comprehensive description:

http://stackoverflow.com/questions/29820870/floating-point-numbers-precision-and-parsec

@mrkkrp
Copy link
Contributor Author

mrkkrp commented Jun 1, 2015

I'm sorry that I have to repeat myself, but this is an obvious bug. Will I it be ignored? If you're not willing to notice or fix it, will you accept pull request that fixes it?

@mrkkrp
Copy link
Contributor Author

mrkkrp commented Jun 9, 2015

I'm always curious when some open source/free software maintainer ignores an obvious bug. I always start to think why would any conscious person behave like this. At any rate, you could just drop a couple of lines, for example:

  • "I don't think this is a bug, because..."
  • "Yes, this is a bug, but I don't have time/desire to fix it, so pull requests are welcome!"
  • "OK, we'll see how to fix that."

or something.

But in this case, it's an obvious bug (so first option is not applicable) and it must be fixed. It can lead to situations when floating point data is read and parsed incorrectly by Parsec and who knows what consequences it can cause in production. So, I urge you, please consider fixing this bug, or at least giving us your opinion here.

If you do not have time to fix this bug, we are here to help you!

@albertnetymk
Copy link

I think it's a bug as well. I find the second answer, the one from Reid Barton, is easier to understand.

@aslatter
Copy link
Collaborator

aslatter commented Jul 5, 2015

Sorry for the radio silence on this ticket - I think it would be reasonable for Parsec to give similar behavior to read - anything else is likely to be surprising. I'm not a floating point expert, but it's not surprising to think that repeated multiplication would result in a loss of precision.

@mrkkrp
Copy link
Contributor Author

mrkkrp commented Jul 5, 2015

@aslatter, OK, I will try to find time and fix it. Also, I will add Bugs.Bug35 module to tests. You're using HUnit for testing. In this particular case QuickCheck would be the best solution, we could test against results of Haskell's own read function for example. However, since you're already using HUnit, I think it's better to add test cases with various tricky numbers. What do you think?

@mrkkrp
Copy link
Contributor Author

mrkkrp commented Jul 5, 2015

Interestingly, float fails on negative numbers too. Is it intended behavior? If it is, it should be mentioned in the documentation. If not, this also should be fixed.

@mrkkrp
Copy link
Contributor Author

mrkkrp commented Jul 5, 2015

I think it's also a bug. read can read negative numbers read "-1.0" :: Double -> 1.0, while float cannot: parse float "" "-1.0" ->

Left (line 1, column 1):
unexpected "-"
expecting float

I'm adding new test cases to test this too.

mrkkrp added a commit to mrkkrp/parsec that referenced this issue Jul 5, 2015
@mrkkrp
Copy link
Contributor Author

mrkkrp commented Jul 5, 2015

Here is new test case, I don't have time right now to proceed to fix it, but this contribution should be useful as well.

@albertnetymk
Copy link

@aslatter I totally with you on that "parsec shall be consistent with read". However, using the following the snippet (more or less a reprint from the above SO question):

import Text.Parsec
import Text.Parsec.String
import Text.Parsec.Language
import qualified Text.Parsec.Token as P

using_read :: Double
using_read = read "4.23808622486133"

float :: Parser Double
float = P.float (P.makeTokenParser emptyDef)

p = parse float "input"
main = do
  print $ parse float "input" "4.23808622486133"
  print $ using_read

We could see that the two are not consistent:

Right 4.2380862248613305
4.23808622486133

@albertnetymk
Copy link

@mrkkrp As the negative floating numbers, I looked up the def in haskell report 2010; the corresponding section doesn't say anything about negative literals, and it refers to section 3.4 for negative number literals.

After reading it, I think it makes sense to only parse number without + or -, both for integet and float. The same goes for all other number parsers. Then, let users to handle signs explicitly; maybe providing one function, sign, like this.

natural is not discussed at all in the report, I propose that we remove it.

@mrkkrp
Copy link
Contributor Author

mrkkrp commented Jul 5, 2015

@albertnetymk, Now I think that float works correctly in that it doesn't parse sign. I will remote that test case. As for integer, we should still parse sign there because it's how it worked for long time. We cannot just break logic of existing code that uses this behavior of integer. Also natural may be used in real applications, so let it exist, even though it's not in Haskell report. My point is that we should consider how Parsec already works in real applications and we should not change it in such radical ways only to achieve conceptual perfection.

@albertnetymk
Copy link

Backward compatibility is a big concern. I am just bringing it up so that this change could be added to match the haskell report in the future backward incompatible release, maybe parsec 4?

@mrkkrp
Copy link
Contributor Author

mrkkrp commented Jul 5, 2015

Well, I think it's up to the maintainer to decide.

mrkkrp added a commit to mrkkrp/parsec that referenced this issue Jul 6, 2015
@mrkkrp
Copy link
Contributor Author

mrkkrp commented Jul 7, 2015

After a little research I conclude that it's impossible to re-create floating point number without losing precision by means of floating point arithmetic, since it has fixed precision and result will be always incorrect. See this for more information:

http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html

The only way to recreate floating point number is by careful manipulations on bit-level. This brings us to the dilemma:

  • should we manually do it;
  • or should we use existing read function?

read function is a good way to go. Even though it's partial, it's easy to make sure that incorrect input is never passed to it. We could parse different parts of floating point number much like how it's currently done, but then form string representation of the number and let it go through read function. If you don't like the fact that read function is partial, there is reads, which can be tuned to do what we need.

This problem is not trivial. Complex C libraries on OS level usually do such sort of thing, and high-level languages use such libraries.

See also this SO questions and answers for more information:

http://stackoverflow.com/questions/85223/how-to-manually-parse-a-floating-point-number-from-a-string

@albertnetymk
Copy link

The first approach could be brittle, for in order to keep parsec consistent with read, it's subject to the change in read. I vote for the second approach, for it's always consistent with read.

@mrkkrp
Copy link
Contributor Author

mrkkrp commented Jul 30, 2015

@albertnetymk, if you're still interested in Parsec, I've created a fork of the project called megaparsec. I've changed many things already, but all that work is preparatory. I've removed many compatibility things and improved documentation (typos, code examples, it's 100% covered now), also I've reworked about 70% of original code-base, mainly for cosmetic reasons. Now I'm simply going to fix all the issues that are hanging here for a start, after that I will build complete QuickCheck test suite. If you're interested please propose changes. I'll probably remove natural parser as you suggested, for example. Feel free to open issues or pull requests.

@albertnetymk
Copy link

@mrkkrp Excellent work. I would watch the new repo.

hdgarrood added a commit to hdgarrood/parsec that referenced this issue May 13, 2016
hdgarrood pushed a commit to hdgarrood/parsec that referenced this issue May 13, 2016
int-index pushed a commit to int-index/parsec that referenced this issue Sep 18, 2020
Suppress hsc2hs-related warning on GHC HEAD
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants