Incorrect result for decodeUtf8 of "\194" ByteString with text 1.0 #61

Closed
snoyberg opened this issue Dec 29, 2013 · 6 comments

snoyberg (Contributor) commented Dec 29, 2013

Consider the following code:

{-# LANGUAGE OverloadedStrings #-}

import qualified Data.Text.IO as TIO
import Data.Text.Encoding (decodeUtf8)

main :: IO ()
main = TIO.putStrLn $ decodeUtf8 "\194"

The behavior with text pre-1.0, and the expected behavior, is that it throws an exception, in particular:

text-bug.hs: Cannot decode byte '\xc2': Data.Text.Encoding.decodeUtf8: Invalid UTF-8 stream

Beginning with release 1.0, this prints out an empty string. This behavior was initially reported as a bug in conduit's text decoding at snoyberg/conduit#127.
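For code that would rather not rely on an exception, the same check can be written with `decodeUtf8'`, which returns an `Either` instead of throwing. The following is only a sketch: `decodeStrict` is a hypothetical helper name, not part of text, and a release affected by this regression would presumably take the `Right` branch for `"\194"` as well.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.ByteString (ByteString)
import Data.Text (Text)
import Data.Text.Encoding (decodeUtf8')

-- | Hypothetical helper (not part of text): decode strictly and turn the
-- decoder's exception into a 'Left' carrying its message.
decodeStrict :: ByteString -> Either String Text
decodeStrict bytes =
  case decodeUtf8' bytes of
    Left err  -> Left (show err)
    Right txt -> Right txt

main :: IO ()
main = do
  -- "\194" (0xC2) opens a two-byte sequence but has no continuation byte,
  -- so a correct decoder should land in the Left branch.
  case decodeStrict "\194" of
    Left msg  -> putStrLn ("rejected: " ++ msg)
    Right txt -> putStrLn ("accepted (unexpected): " ++ show txt)
  -- "\194\169" is the complete two-byte encoding of U+00A9 and should decode.
  print (decodeStrict "\194\169")
```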

bos (Collaborator) commented Dec 30, 2013

This bisects back to aca0971 which was written by @bgamari. Sorry for the regression, I'm looking into it now.

snoyberg (Contributor, Author) commented Dec 30, 2013

Thanks Bryan, let me know if I can be of any assistance.

bos added a commit that referenced this issue Dec 30, 2013
This test currently fails due to gh-61.

--HG--
extra : amend_source : ca66a1e6503a0cb9cf6cf5f2b82f2199133a6512
bos added a commit that referenced this issue Dec 30, 2013

bos (Collaborator) commented Dec 30, 2013

OK, please test the 1.0 branch. It works for me. If it looks good to you, I'll release it.

The fix is in 7c09f3c, and a much improved test case is in 9c11b44. The new test case reveals the regression that was introduced in aca0971, and is fixed by the alleged fix :-)
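As a rough illustration of what testing a decoder against invalid input involves, the sketch below checks a handful of hand-picked ill-formed byte sequences with `decodeUtf8'`. It is not the test case from 9c11b44; the sample list and the names `invalidSamples`, `rejects`, and `rejectsWithPrefix` are assumptions made for this example.

```haskell
import qualified Data.ByteString as B
import Data.Either (isLeft)
import Data.Text.Encoding (decodeUtf8')
import Data.Word (Word8)

-- | Hand-picked ill-formed inputs (illustrative, not exhaustive).
invalidSamples :: [[Word8]]
invalidSamples =
  [ [0xC2]              -- truncated two-byte sequence (the case in this issue)
  , [0xE2, 0x82]        -- truncated three-byte sequence
  , [0xF0, 0x9F, 0x98]  -- truncated four-byte sequence
  , [0x80]              -- stray continuation byte
  , [0xC0, 0x80]        -- overlong encoding of U+0000
  , [0xED, 0xA0, 0x80]  -- encoded surrogate U+D800
  , [0xFF]              -- byte that never occurs in UTF-8
  ]

-- | True when strict decoding reports an error for the given bytes.
rejects :: [Word8] -> Bool
rejects = isLeft . decodeUtf8' . B.pack

-- | Same check with a valid ASCII byte ('a') in front, so the error
-- has to be detected after some well-formed input.
rejectsWithPrefix :: [Word8] -> Bool
rejectsWithPrefix ws = rejects (0x61 : ws)

main :: IO ()
main = mapM_ report invalidSamples
  where
    report ws =
      putStrLn (show ws ++ if rejects ws && rejectsWithPrefix ws
                             then ": rejected"
                             else ": ACCEPTED (decoder bug)")
```

A property-based test would generate such sequences randomly rather than listing them; the fixed list here only keeps the example dependency-free beyond text and bytestring.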

bos closed this in 7c09f3c Dec 30, 2013

snoyberg (Contributor, Author) commented Dec 30, 2013

I can confirm that with the 1.0 branch, the conduit test suite passes, so I believe the issue is resolved. Thank you!

bos (Collaborator) commented Dec 30, 2013

OK, it's up: http://hackage.haskell.org/package/text-1.0.0.1

A couple of last invalid UTF-8 generators added, too, for good measure, in 494d7d9.

AnneTheAgile commented Mar 10, 2014

Cross-reference to the blog post: http://www.serpentine.com/blog/2013/12/30/testing-a-utf-8-decoder-with-vigour/ (Pet peeve: I wish GitHub showed dates, not 'ago' timestamps. 2013-12-30 is not 2 months ago.)
