A possible bug in Text.Parsec.Token.float #35

mrkkrp · 2015-04-23T13:07:22Z

There may be a bug in Text.Parsec.Token.float. Please see this SO question for comprehensive description:

http://stackoverflow.com/questions/29820870/floating-point-numbers-precision-and-parsec

The text was updated successfully, but these errors were encountered:

mrkkrp · 2015-06-01T05:32:12Z

I'm sorry that I have to repeat myself, but this is an obvious bug. Will I it be ignored? If you're not willing to notice or fix it, will you accept pull request that fixes it?

mrkkrp · 2015-06-09T18:14:20Z

I'm always curious when some open source/free software maintainer ignores an obvious bug. I always start to think why would any conscious person behave like this. At any rate, you could just drop a couple of lines, for example:

"I don't think this is a bug, because..."
"Yes, this is a bug, but I don't have time/desire to fix it, so pull requests are welcome!"
"OK, we'll see how to fix that."

or something.

But in this case, it's an obvious bug (so first option is not applicable) and it must be fixed. It can lead to situations when floating point data is read and parsed incorrectly by Parsec and who knows what consequences it can cause in production. So, I urge you, please consider fixing this bug, or at least giving us your opinion here.

If you do not have time to fix this bug, we are here to help you!

albertnetymk · 2015-07-04T10:52:47Z

I think it's a bug as well. I find the second answer, the one from Reid Barton, is easier to understand.

aslatter · 2015-07-05T02:28:18Z

Sorry for the radio silence on this ticket - I think it would be reasonable for Parsec to give similar behavior to read - anything else is likely to be surprising. I'm not a floating point expert, but it's not surprising to think that repeated multiplication would result in a loss of precision.

mrkkrp · 2015-07-05T08:44:10Z

@aslatter, OK, I will try to find time and fix it. Also, I will add Bugs.Bug35 module to tests. You're using HUnit for testing. In this particular case QuickCheck would be the best solution, we could test against results of Haskell's own read function for example. However, since you're already using HUnit, I think it's better to add test cases with various tricky numbers. What do you think?

mrkkrp · 2015-07-05T11:05:36Z

Interestingly, float fails on negative numbers too. Is it intended behavior? If it is, it should be mentioned in the documentation. If not, this also should be fixed.

mrkkrp · 2015-07-05T11:13:51Z

I think it's also a bug. read can read negative numbers read "-1.0" :: Double -> 1.0, while float cannot: parse float "" "-1.0" ->

Left (line 1, column 1):
unexpected "-"
expecting float

I'm adding new test cases to test this too.

mrkkrp · 2015-07-05T11:44:36Z

Here is new test case, I don't have time right now to proceed to fix it, but this contribution should be useful as well.

albertnetymk · 2015-07-05T17:23:10Z

@aslatter I totally with you on that "parsec shall be consistent with read". However, using the following the snippet (more or less a reprint from the above SO question):

import Text.Parsec
import Text.Parsec.String
import Text.Parsec.Language
import qualified Text.Parsec.Token as P

using_read :: Double
using_read = read "4.23808622486133"

float :: Parser Double
float = P.float (P.makeTokenParser emptyDef)

p = parse float "input"
main = do
  print $ parse float "input" "4.23808622486133"
  print $ using_read

We could see that the two are not consistent:

Right 4.2380862248613305
4.23808622486133

albertnetymk · 2015-07-05T17:36:34Z

@mrkkrp As the negative floating numbers, I looked up the def in haskell report 2010; the corresponding section doesn't say anything about negative literals, and it refers to section 3.4 for negative number literals.

After reading it, I think it makes sense to only parse number without + or -, both for integet and float. The same goes for all other number parsers. Then, let users to handle signs explicitly; maybe providing one function, sign, like this.

natural is not discussed at all in the report, I propose that we remove it.

mrkkrp · 2015-07-05T17:53:56Z

@albertnetymk, Now I think that float works correctly in that it doesn't parse sign. I will remote that test case. As for integer, we should still parse sign there because it's how it worked for long time. We cannot just break logic of existing code that uses this behavior of integer. Also natural may be used in real applications, so let it exist, even though it's not in Haskell report. My point is that we should consider how Parsec already works in real applications and we should not change it in such radical ways only to achieve conceptual perfection.

albertnetymk · 2015-07-05T18:00:35Z

Backward compatibility is a big concern. I am just bringing it up so that this change could be added to match the haskell report in the future backward incompatible release, maybe parsec 4?

mrkkrp · 2015-07-05T18:04:17Z

Well, I think it's up to the maintainer to decide.

mrkkrp · 2015-07-07T08:17:55Z

After a little research I conclude that it's impossible to re-create floating point number without losing precision by means of floating point arithmetic, since it has fixed precision and result will be always incorrect. See this for more information:

http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html

The only way to recreate floating point number is by careful manipulations on bit-level. This brings us to the dilemma:

should we manually do it;
or should we use existing read function?

read function is a good way to go. Even though it's partial, it's easy to make sure that incorrect input is never passed to it. We could parse different parts of floating point number much like how it's currently done, but then form string representation of the number and let it go through read function. If you don't like the fact that read function is partial, there is reads, which can be tuned to do what we need.

This problem is not trivial. Complex C libraries on OS level usually do such sort of thing, and high-level languages use such libraries.

See also this SO questions and answers for more information:

http://stackoverflow.com/questions/85223/how-to-manually-parse-a-floating-point-number-from-a-string

albertnetymk · 2015-07-07T12:31:42Z

The first approach could be brittle, for in order to keep parsec consistent with read, it's subject to the change in read. I vote for the second approach, for it's always consistent with read.

mrkkrp · 2015-07-30T18:55:18Z

@albertnetymk, if you're still interested in Parsec, I've created a fork of the project called megaparsec. I've changed many things already, but all that work is preparatory. I've removed many compatibility things and improved documentation (typos, code examples, it's 100% covered now), also I've reworked about 70% of original code-base, mainly for cosmetic reasons. Now I'm simply going to fix all the issues that are hanging here for a start, after that I will build complete QuickCheck test suite. If you're interested please propose changes. I'll probably remove natural parser as you suggested, for example. Feel free to open issues or pull requests.

albertnetymk · 2015-07-30T19:22:35Z

@mrkkrp Excellent work. I would watch the new repo.

Suppress hsc2hs-related warning on GHC HEAD

mrkkrp added a commit to mrkkrp/parsec that referenced this issue Jul 5, 2015

added test case for bug haskell#35

8bd789f

mrkkrp mentioned this issue Jul 5, 2015

Test for #35 added + some cosmetic changes #40

Closed

mrkkrp mentioned this issue Jul 5, 2015

integer dosen't fail for leading white spaces #39

Open

mrkkrp added a commit to mrkkrp/parsec that referenced this issue Jul 6, 2015

added failing tests for haskell#35, disabled by default

079a9e5

mrkkrp mentioned this issue Jul 30, 2015

A possible bug in Text.Parsec.Token.float mrkkrp/megaparsec#2

Closed

mrkkrp mentioned this issue Sep 9, 2015

Discussion about new lexer mrkkrp/megaparsec#28

Closed

hdgarrood mentioned this issue May 12, 2016

Floating point literals are not parsed accurately purescript/purescript#2115

Closed

hdgarrood added a commit to hdgarrood/parsec that referenced this issue May 13, 2016

Fix haskell#35, fix parsing of floating point values

108fde6

hdgarrood pushed a commit to hdgarrood/parsec that referenced this issue May 13, 2016

added failing tests for haskell#35, disabled by default

c5efc8e

hdgarrood mentioned this issue May 13, 2016

Fix float parsing #54

Merged

aslatter closed this as completed in #54 May 13, 2016

int-index pushed a commit to int-index/parsec that referenced this issue Sep 18, 2020

Merge pull request haskell#35 from RyanGlScott/master

5e53651

Suppress hsc2hs-related warning on GHC HEAD

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

A possible bug in Text.Parsec.Token.float #35

A possible bug in Text.Parsec.Token.float #35

mrkkrp commented Apr 23, 2015

mrkkrp commented Jun 1, 2015

mrkkrp commented Jun 9, 2015

albertnetymk commented Jul 4, 2015

aslatter commented Jul 5, 2015

mrkkrp commented Jul 5, 2015

mrkkrp commented Jul 5, 2015

mrkkrp commented Jul 5, 2015

mrkkrp commented Jul 5, 2015

albertnetymk commented Jul 5, 2015

albertnetymk commented Jul 5, 2015

mrkkrp commented Jul 5, 2015

albertnetymk commented Jul 5, 2015

mrkkrp commented Jul 5, 2015

mrkkrp commented Jul 7, 2015

albertnetymk commented Jul 7, 2015

mrkkrp commented Jul 30, 2015

albertnetymk commented Jul 30, 2015

A possible bug in Text.Parsec.Token.float #35

A possible bug in Text.Parsec.Token.float #35

Comments

mrkkrp commented Apr 23, 2015

mrkkrp commented Jun 1, 2015

mrkkrp commented Jun 9, 2015

albertnetymk commented Jul 4, 2015

aslatter commented Jul 5, 2015

mrkkrp commented Jul 5, 2015

mrkkrp commented Jul 5, 2015

mrkkrp commented Jul 5, 2015

mrkkrp commented Jul 5, 2015

albertnetymk commented Jul 5, 2015

albertnetymk commented Jul 5, 2015

mrkkrp commented Jul 5, 2015

albertnetymk commented Jul 5, 2015

mrkkrp commented Jul 5, 2015

mrkkrp commented Jul 7, 2015

albertnetymk commented Jul 7, 2015

mrkkrp commented Jul 30, 2015

albertnetymk commented Jul 30, 2015