add C-based isAscii :: Text -> Bool
#497
I am almost certain my FFI code is buggy. I don't understand pinned/unpinned memory well, and I'm unable to test on my machine (build issues).
... and you probably want to add tests.
Fixed my build issues, will do. Thanks a ton for the help with the fiddly bits.
Fixed an error in the C code (end of text data is at |
Something to note: |
isAscii :: Text -> Bool
That looks good to me!
Another pair of eyes (@phadej?) to go over the FFI bits again would be nice.
There are the @since TODOs; I think this can make it into 2.0.2.
Looks good to me. It would be nice to have a test for the edge cases to make sure the offset is handled as expected, i.e. that the check rejects text that stops being ASCII right at the boundary of a slice with a non-zero offset. The code looks correct, but such tests are useful regardless.
OK. I added a single test for that:

    isAscii_border :: IO ()
    isAscii_border = do
      -- ASCII prefix ends at position 3 (from 0)
      let text  = T.pack "123一二三"
          text' = case text of T.Text arr off _len -> T.Text arr (off+1) 3
      assertBool "UTF-8 string with ASCII prefix ending at last position incorrectly detected as ASCII" $ not $ T.isAscii text'

We're really testing the behaviour of
Thanks!
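For readers who want to see the boundary case at the byte level, here is a minimal C sketch. It is not the C code from this PR: `slice_is_ascii` and `sample` are hypothetical names introduced for illustration, and the loop is the simplest possible byte-by-byte check. The sample buffer holds the UTF-8 encoding of "123一二三", so a slice starting at offset 1 with length 3 covers '2', '3' and the first byte of 一 (0xE4), which is exactly the case the test exercises.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (an assumption for illustration, not the library's
   actual C code): returns 1 when every byte in buf[off .. off+len) has its
   high bit clear, and 0 otherwise. */
int slice_is_ascii(const uint8_t *buf, size_t off, size_t len)
{
    const uint8_t *p   = buf + off;
    const uint8_t *end = p + len;   /* one past the last byte of the slice */

    for (; p < end; p++) {
        if (*p & 0x80) {
            return 0;               /* high bit set: not an ASCII byte */
        }
    }
    return 1;
}

/* UTF-8 bytes of "123一二三": three ASCII digits followed by three
   3-byte sequences. */
const uint8_t sample[] = {
    '1', '2', '3',
    0xE4, 0xB8, 0x80,   /* 一 U+4E00 */
    0xE4, 0xBA, 0x8C,   /* 二 U+4E8C */
    0xE4, 0xB8, 0x89    /* 三 U+4E09 */
};
```

With this buffer, `slice_is_ascii(sample, 1, 2)` sees only '2' and '3', while `slice_is_ascii(sample, 1, 3)` also reaches 0xE4. An implementation that computed the end of the slice from the length alone, ignoring the offset, would stop one byte early and miss it.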
Now that `Text`s are represented internally as a UTF-8 bytestring, we can provide a fast `isAscii :: Text -> Bool` that inspects the bytestring directly, rather than `Char` by `Char`. Such a function can come in handy in serialization libraries, and can't easily be written by an end user. Plus, the C snippet already exists, so there's not much to do.
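To illustrate why inspecting the bytes directly can be fast, here is a rough C sketch of a word-at-a-time ASCII check. It is an assumption about how such a check might look, not the snippet referred to above: the name `sketch_is_ascii` and its `(src, off, len)` signature are hypothetical. A byte is ASCII exactly when its high bit is clear, so eight bytes can be tested with a single mask.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch: returns 1 if src[off .. off+len) contains only
   ASCII bytes, 0 otherwise. */
int sketch_is_ascii(const uint8_t *src, size_t off, size_t len)
{
    const uint8_t *p   = src + off;
    const uint8_t *end = p + len;

    /* Bulk loop: copy 8 bytes into a word (memcpy keeps unaligned reads
       well-defined) and test all eight high bits with one mask. */
    while ((size_t)(end - p) >= sizeof(uint64_t)) {
        uint64_t w;
        memcpy(&w, p, sizeof w);
        if (w & UINT64_C(0x8080808080808080)) {
            return 0;
        }
        p += sizeof w;
    }

    /* Tail loop: fewer than 8 bytes remain, check them one by one. */
    for (; p < end; p++) {
        if (*p & 0x80) {
            return 0;
        }
    }
    return 1;
}
```

The mask test does not depend on byte order, since every byte of the mask is 0x80. Decoding UTF-8 into `Char`s one at a time does strictly more work per byte than this, which is the motivation for exposing the check at the byte level.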