Prevent potential segfault in decodeUtf8With #211 #212
Another approach that would work here would be to have an |
Build failures appear to be the same as those on |
Thanks, but of the 3 general directions I considered for fixing this, you picked a 4th, and unfortunately the least desirable one. The 3 general approaches I considered (plus yours) are:
1. Total correctness: support the full range of replacement characters, as advertised by the type signature.
2. Partial correctness: support only BMP replacement characters; throw an exception if an invalid replacement char is returned, i.e. fail non-silently.
3. Total with silent truncation: silently remap non-BMP replacement characters into the BMP range somehow, but don't break the internal invariant of Text.
4. Total incorrectness: support only BMP, but break the internal invariant of Text if non-BMP replacements occur.
Ideally, I'd prefer 1., as it's the most principled one, but it requires either more complexity or overallocation. I don't like 3. because it fails silently and can hide incorrect API usage, but at least it doesn't break Text's internal invariant.
TL;DR: short term, I intended to go with 2. and update the documentation.
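To make approach 2 concrete, here is a minimal sketch of a handler wrapper that fails loudly when a replacement character falls outside the BMP. It is not the code from this PR: `Handler` is a local stand-in that only mirrors the shape of the error-handler argument of decodeUtf8With, and `isBmp` and `bmpOnly` are hypothetical names introduced for illustration.

```haskell
import Control.Exception (ErrorCall (..), throw)
import Data.Word (Word8)

-- Local stand-in (assumption): mirrors the shape of the decode error
-- handler, i.e. description -> offending byte -> optional replacement.
type Handler = String -> Maybe Word8 -> Maybe Char

-- True when the character fits in a single 16-bit code unit.
isBmp :: Char -> Bool
isBmp c = c <= '\xFFFF'

-- Hypothetical wrapper for approach 2: pass BMP replacements through
-- unchanged and throw (rather than truncate) on anything else.
bmpOnly :: Handler -> Handler
bmpOnly handler desc input =
  case handler desc input of
    Just c
      | not (isBmp c) ->
          throw (ErrorCall ("replacement character outside the BMP: " ++ show c))
    result -> result
```

With this wrapper, a handler that returns U+FFFD behaves as before, while one that returns, say, U+1F600 raises an exception as soon as its result is forced.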
I don't have a problem with (2), I considered that approach also. But to be
sure we're on the same page: how does the current approach break the
internal invariants? I'm not intimately familiar with the invariants of
Text, I'd like to know where I screwed up. I thought my implementation was
actually (3).
(In reply to the review by Herbert Valerio Riedel of Fri, Dec 29, 2017, 10:27 AM, quoted in full above.)
It allows invalid UTF-16 code units to be written into the |
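As background for the invariant being discussed, the sketch below shows why a non-BMP character cannot occupy a single UTF-16 code unit. It assumes the UTF-16 internal representation that Text used at the time of this PR; `utf16Units` and `truncateToUnit` are hypothetical helpers written for illustration, not functions from the text library.

```haskell
import Data.Bits (shiftR, (.&.))
import Data.Char (ord)
import Data.Word (Word16)

-- Encode one character as UTF-16: one unit for the BMP, otherwise a
-- high/low surrogate pair.
utf16Units :: Char -> [Word16]
utf16Units c
  | n < 0x10000 = [fromIntegral n]
  | otherwise = [hi, lo]
  where
    n = ord c
    m = n - 0x10000
    hi = fromIntegral (0xD800 + (m `shiftR` 10))
    lo = fromIntegral (0xDC00 + (m .&. 0x3FF))

-- What a writer that reserves only one unit per replacement would store:
-- the low 16 bits of the code point (fromIntegral wraps modulo 2^16).
truncateToUnit :: Char -> Word16
truncateToUnit = fromIntegral . ord
```

For example, `utf16Units '\x1F600'` yields the pair `[0xD83D, 0xDE00]`, whereas `truncateToUnit '\x1D800'` yields `0xD800`, a lone surrogate that is not valid UTF-16 on its own.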
I've pushed an update which implements (2) instead.
What is the status of this?
I considered documenting the corner case in the Haddocks for decodeUtf8With, but thought that describing it might be more confusing than helpful, and I didn't want it to block this PR. If you'd prefer that something be added, I'll be happy to do so.