readInt does not check for overflow. #144
Comments
I also ran into this; this is problematic. My opinion is that we should keep the current behaviour, document it, and add a new function that returns |
I'm using this as a temporary, probably slower workaround now:

```haskell
{-# LANGUAGE ScopedTypeVariables #-}

import           Data.ByteString       (ByteString)
import qualified Data.ByteString.Char8 as BS8

-- | Parses a `ByteString` into an Integral.
--
-- Uses "Data.ByteString.Char8" so it cannot handle multi-byte encodings.
--
-- As opposed to `BS8.readInt`, it has overflow detection.
-- See https://github.com/haskell/bytestring/issues/144
readIntegral :: forall a . (Integral a, Bounded a) => ByteString -> Either String a
readIntegral str = case BS8.readInteger str of
  Nothing -> Left "unexpected non-integral"
  Just (n, rest)
    | not (BS8.null rest)              -> Left "trailing text behind integral"
    | n > fromIntegral (maxBound :: a) -> Left "integral out of range (too large)"
    | n < fromIntegral (minBound :: a) -> Left "integral out of range (too small)"
    | otherwise                        -> Right (fromIntegral n)
```

I found this still faster than going via |
I do not think it is a bug; it sounds more like expected behaviour. For example:

```
> read "123456789012345678901234567890" :: Int
-4362896299872285998
```

One could use

@vdukhovni, what do you think? Cf. haskell-streaming/streaming-bytestring#29 |
I would argue that |
Haskell Report requires |
As you correctly pointed out, While a safer
So doing this correctly, with consistent behaviour between the strict and lazy interfaces, is non-trivial, especially if one also wants to retain most of the performance of the code path that does not check for overflow (in the reasonable expectation that in many cases the data is known to be free of problem inputs). |
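To make the cost concrete, here is a minimal sketch of a per-digit overflow check for a strict `ByteString`. It is not the code from haskell-streaming/streaming-bytestring#31; the name `readIntChecked` is hypothetical and not part of bytestring. The digits are accumulated as a non-positive `Int`, so `minBound` (whose magnitude exceeds `maxBound`) stays representable, and each step compares against a precomputed limit before multiplying, so no wrapped intermediate value is ever produced.

```haskell
import           Data.ByteString       (ByteString)
import qualified Data.ByteString.Char8 as BS8
import           Data.Char             (isDigit, ord)

-- | Hypothetical overflow-checked counterpart of 'BS8.readInt' (the name is
-- illustrative, not part of bytestring). Returns Nothing if there are no
-- digits or if the value does not fit in an Int.
readIntChecked :: ByteString -> Maybe (Int, ByteString)
readIntChecked bs = case BS8.uncons bs of
  Just ('-', t) -> parse True t
  Just ('+', t) -> parse False t
  _             -> parse False bs
  where
    parse :: Bool -> ByteString -> Maybe (Int, ByteString)
    parse neg s
      | BS8.null ds = Nothing
      | otherwise   = do
          acc <- BS8.foldl' step (Just 0) ds
          -- acc lies in [minBound, 0]; negating minBound would overflow.
          n <- if neg then Just acc
               else if acc == minBound then Nothing
               else Just (negate acc)
          Just (n, rest)
      where
        (ds, rest) = BS8.span isDigit s

    -- Accumulate as a non-positive Int so that minBound can be represented.
    step :: Maybe Int -> Char -> Maybe Int
    step Nothing _ = Nothing
    step (Just acc) c
      | acc < limit = Nothing                -- acc * 10 - d would underflow
      | otherwise   = Just (acc * 10 - d)
      where
        d     = ord c - ord '0'
        -- quot rounds towards zero (i.e. up for negative values), giving the
        -- smallest acc for which acc * 10 - d >= minBound.
        limit = (minBound + d) `quot` 10
```

The extra comparison per digit is exactly what an unchecked fast path avoids; a tuned implementation would also avoid the `Maybe` wrapper in the inner loop, which this sketch keeps for readability.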
Returning to your question after handling an edge case in that PR that this helped me realise (thanks!), I should perhaps mention that the original However, since it never did overflow, there's an opportunity here to keep that behaviour and return |
I have opened haskell-streaming/streaming-bytestring#31 which implements overflow checks (for decimal |
Now that haskell-streaming/streaming-bytestring#31 is about to be merged, my take is that yes, |
I wonder whether replacing existing |
Is a My best guess is that the main users affected would be users running CI-tests on edge-cases, who artificially create unexpected inputs, and see how the code behaves. No idea whether either of these is an issue for anyone. [ Has 0.11 been released yet? If not, perhaps do this now, for 0.11? ] |
Following our discussion about branching in #241, I am +1 for porting haskell-streaming/streaming-bytestring#31 to |
Should I wait for more |
👍 from me too. |
I have a question about the desired behaviour of Should the same sort of cut-off be implemented for lazy ByteString? The idea is the same: don't spin forever reading ridiculous quantities from a stream (that may even be unbounded). But it is not as obviously applicable to lazy ByteStrings, where I/O is either absent entirely (just a "rope" in memory) or implicit (interleaved lazy I/O). What should the port do? [ Note: "fails" means "returns (Nothing, original input)". ] |
I've opened #309. Pending feedback, it keeps the "reasonable" bound on the number of leading zeros. This bound can be removed if it is deemed not useful for lazy ByteStrings. |
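For illustration only, a bounded leading-zero skip on a lazy `ByteString` might look like the sketch below. The function name `skipLeadingZeros` and the cap value are assumptions, not taken from #309; the thread elsewhere mentions a bound of about 32k zeros. The point is that only a bounded prefix is ever forced, so the function terminates even on an infinite stream of zeros.

```haskell
import qualified Data.ByteString.Lazy.Char8 as BL8
import           Data.Int                   (Int64)

-- Hypothetical cap on leading zeros; the value is illustrative and need
-- not match the bound actually used in #309.
maxLeadingZeros :: Int64
maxLeadingZeros = 32 * 1024

-- | Skip leading '0' characters, returning how many were skipped and the
-- remaining input, or Nothing once more than 'maxLeadingZeros' are seen.
skipLeadingZeros :: BL8.ByteString -> Maybe (Int64, BL8.ByteString)
skipLeadingZeros s
  | n > maxLeadingZeros = Nothing
  | otherwise           = Just (n, BL8.drop n s)
  where
    -- Count zeros within the first maxLeadingZeros + 1 bytes only, so at
    -- most that many bytes of a (possibly lazy-I/O-backed) stream are forced.
    n = BL8.length (BL8.takeWhile (== '0') (BL8.take (maxLeadingZeros + 1) s))
```

Returning the count lets a caller distinguish an input of only zeros (value 0) from one with no digits at all.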
One more thing to think about is whether there should be a variant for reading unsigned |
I do not have a strong opinion here. On one side, this is an additional divergence from the current behaviour. On the other side, I struggle to imagine a legitimate use case for a 32k-long sequence of zeros representing
I often work with |
And even for
That can be done, but my point is perhaps also that there may be existing users who are currently using One might make the case that the new function needs a new name: |
Dunno, it could potentially be a fixed-width column of giant integers.
Writing
I'm very much against having functions with the same type signature, similar names, but only slightly different semantics. I think it is acceptable to make a function "safer" in a major release. |
I agree that making things safer is generally good, but I'm not a fan of API changes that lead to only runtime errors and not compile-time errors... :-( |
In a complex system, how is one expected to know that all the call sites are dealt with, including call sites in some library one is using? |
One way forward is to rename the new function and leave a deprecated alias in place. That way you get a compile-time warning when using the old name, but we don't end up with the problem of two similar functions:

```haskell
-- new
-- | ... Document overflow checks, and perhaps mention a new safeReadWord...
safeReadInt :: ByteString -> Maybe (Int, ByteString)
safeReadInt = ...

-- old
{-# DEPRECATED readInt "changed to check integer bounds, use safeReadInt instead" #-}
readInt :: ByteString -> Maybe (Int, ByteString)
readInt = safeReadInt
```

Thoughts? Anyone else? |
What next steps would you like to see for this PR? |
Well, this is a balancing act.
That said, I find it acceptable to make
What would happen when
Sorry, got a tough week. I'll review it over the weekend. |
Could somebody in the know summarise the outcome of the rather long PR, or perhaps link me to a changelog that says what came out of this? Thanks! |
the The only downside is that Word values larger than |
@vdukhovni That sounds great and is exactly what I'd expect from that function name, thank you! |
`readInt` will happily consume as many digits as it can, without checking whether the result is actually representable as an `Int`. This seems like a straightforward correctness bug, and there are a few possible alternatives here:

- `error`. The most informative option, but this does not really feel like an exceptional condition, and this choice might break existing code.
- We could also provide an alternative function which returns a structured result.
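As a rough sketch of the "structured result" alternative, assuming a hypothetical result type `ReadIntResult` and function `readIntResult` (neither exists in bytestring), the failure modes can be told apart instead of collapsing into `Nothing`. This version simply delegates to `BS8.readInteger` and range-checks the result, trading speed for clarity.

```haskell
import           Data.ByteString       (ByteString)
import qualified Data.ByteString.Char8 as BS8

-- | Hypothetical structured result; not part of the bytestring API.
data ReadIntResult
  = NoDigits                 -- ^ input does not start with an (optionally signed) number
  | Overflow ByteString      -- ^ number outside Int's range; input after the digits
  | Parsed Int ByteString    -- ^ value and unconsumed input
  deriving (Eq, Show)

readIntResult :: ByteString -> ReadIntResult
readIntResult bs = case BS8.readInteger bs of
  Nothing -> NoDigits
  Just (n, rest)
    | n > toInteger (maxBound :: Int) || n < toInteger (minBound :: Int)
                -> Overflow rest
    | otherwise -> Parsed (fromInteger n) rest
```

Keeping the remaining input in the `Overflow` case lets a caller report the error position or resynchronise on the next field.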
Also, I think readInt and readInteger are significantly more complicated than they need to be, and readInteger should probably work in int-sized chunks for performance.
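One way to read the "int-sized chunks" suggestion is sketched below, for unsigned input only and under a hypothetical name `readNaturalChunked`. Each slice of `chunkDigits` digits is small enough that its value always fits in an `Int`, so the inner loop uses cheap machine arithmetic, and only one `Integer` multiply-add is done per chunk rather than per digit.

```haskell
import           Data.ByteString       (ByteString)
import qualified Data.ByteString.Char8 as BS8
import           Data.Char             (isDigit, ord)

-- | Largest k such that any k-digit decimal number fits in a non-negative
-- Int (18 for a 64-bit Int, 9 for a 32-bit Int).
chunkDigits :: Int
chunkDigits = length (takeWhile (<= toInteger (maxBound :: Int) + 1) (iterate (* 10) 10))

-- | Hypothetical sketch: parse a run of decimal digits into an Integer,
-- converting chunkDigits-sized slices with Int arithmetic (which cannot
-- overflow here) and combining the slices with Integer arithmetic.
readNaturalChunked :: ByteString -> Maybe (Integer, ByteString)
readNaturalChunked bs
  | BS8.null ds = Nothing
  | otherwise   = Just (go 0 ds, rest)
  where
    (ds, rest) = BS8.span isDigit bs

    -- Value of a slice of at most chunkDigits digits.
    chunkValue :: ByteString -> Int
    chunkValue = BS8.foldl' (\a c -> a * 10 + (ord c - ord '0')) 0

    go :: Integer -> ByteString -> Integer
    go acc s
      | BS8.null s = acc
      | otherwise  =
          let (chunk, s') = BS8.splitAt chunkDigits s
              acc'        = acc * 10 ^ BS8.length chunk + toInteger (chunkValue chunk)
          in acc' `seq` go acc' s'
```

Combining chunks left to right is still quadratic in the number of chunks for very long inputs; a divide-and-conquer combination would scale better, but that is beyond this sketch.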