Incorrect result for decodeUtf8 of "\194" ByteString with text 1.0 #61

Closed
snoyberg opened this Issue Dec 29, 2013 · 6 comments

Contributor

snoyberg commented Dec 29, 2013

Consider the following code:

{-# LANGUAGE OverloadedStrings #-}

import qualified Data.Text.IO as TIO
import Data.Text.Encoding (decodeUtf8)

main :: IO ()
main = TIO.putStrLn $ decodeUtf8 "\194"

With text versions before 1.0, this throws an exception, which is the expected behavior. In particular:

text-bug.hs: Cannot decode byte '\xc2': Data.Text.Encoding.decodeUtf8: Invalid UTF-8 stream

Beginning with release 1.0, this prints out an empty string. This behavior was initially reported as a bug in conduit's text decoding at snoyberg/conduit#127.
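For anyone who wants to check a given text version without relying on an uncaught exception, here is a minimal sketch using `decodeUtf8'`, which returns an `Either` instead of throwing. The helper name `describeDecode` is made up for illustration; on a release without this regression, the first call should report invalid UTF-8.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import Data.Text.Encoding (decodeUtf8')

-- | Describe the outcome of strictly decoding a byte string as UTF-8.
-- (Hypothetical helper, not part of the text package.)
describeDecode :: B.ByteString -> String
describeDecode bytes =
  case decodeUtf8' bytes of
    Left err   -> "invalid UTF-8: " ++ show err
    Right text -> "decoded " ++ show (T.length text) ++ " character(s)"

main :: IO ()
main = do
  -- 0xC2 (decimal 194) opens a two-byte sequence, so on its own it is truncated.
  putStrLn (describeDecode (B.pack [0xC2]))
  -- 0xC2 0xA9 is the complete two-byte encoding of U+00A9.
  putStrLn (describeDecode (B.pack [0xC2, 0xA9]))
```

Building the input with `B.pack` avoids any dependence on how `OverloadedStrings` turns the literal `"\194"` into bytes.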

Collaborator

bos commented Dec 30, 2013

This bisects back to aca0971 which was written by @bgamari. Sorry for the regression, I'm looking into it now.

Contributor

snoyberg commented Dec 30, 2013

Thanks Bryan, let me know if I can be of any assistance.

bos added a commit that referenced this issue Dec 30, 2013

Ensure that t_utf8_err gets fed *only* invalid UTF-8 inputs
This test currently fails due to gh-61.

--HG--
extra : amend_source : ca66a1e6503a0cb9cf6cf5f2b82f2199133a6512

bos added a commit that referenced this issue Dec 30, 2013

Collaborator

bos commented Dec 30, 2013

OK, please test the 1.0 branch. It works for me. If it looks good to you, I'll release it.

The fix is in 7c09f3c, and a much improved test case is in 9c11b44. The new test case reveals the regression that was introduced in aca0971, and is fixed by the alleged fix :-)
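The idea of feeding a decoder only invalid inputs can be sketched without a property-testing library. The following is a rough illustration, not the test case from the commits above: the sample list and the names `invalidSamples` and `accepted` are assumptions chosen for this example, and the list is far from exhaustive.

```haskell
import qualified Data.ByteString as B
import Data.Either (isRight)
import Data.Text.Encoding (decodeUtf8')
import Data.Word (Word8)

-- | Hand-picked byte sequences that are not valid UTF-8 (illustrative only).
invalidSamples :: [[Word8]]
invalidSamples =
  [ [0xC2]                    -- truncated two-byte sequence (the case in this issue)
  , [0xE2, 0x82]              -- truncated three-byte sequence
  , [0x80]                    -- continuation byte with no lead byte
  , [0xC0, 0x80]              -- overlong encoding of U+0000
  , [0xE0, 0x80, 0x80]        -- overlong three-byte encoding
  , [0xED, 0xA0, 0x80]        -- encoded surrogate U+D800
  , [0xF4, 0x90, 0x80, 0x80]  -- code point above U+10FFFF
  , [0xFF]                    -- byte that never appears in UTF-8
  ]

-- | Samples that strict decoding wrongly accepts; empty for a correct decoder.
accepted :: [[Word8]]
accepted = filter (isRight . decodeUtf8' . B.pack) invalidSamples

main :: IO ()
main =
  if null accepted
    then putStrLn "all invalid samples were rejected"
    else putStrLn ("wrongly accepted: " ++ show accepted)
```

A randomized generator covers much more ground than a fixed list like this, but a fixed list is enough to catch a truncated-input regression of the kind reported here.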

@bos bos closed this in 7c09f3c Dec 30, 2013

Contributor

snoyberg commented Dec 30, 2013

I can confirm that with the 1.0 branch, the conduit test suite passes, so I believe the issue is resolved. Thank you!

Collaborator

bos commented Dec 30, 2013

OK, it's up: http://hackage.haskell.org/package/text-1.0.0.1

A couple of last invalid UTF-8 generators added, too, for good measure, in 494d7d9.

AnneTheAgile commented Mar 10, 2014

Cross-reference to the blog post: http://www.serpentine.com/blog/2013/12/30/testing-a-utf-8-decoder-with-vigour/ (Pet peeve: I wish GitHub showed dates, not 'ago' timestamps. 2013-12-30 is not 2 months ago.)
