Skip to content

Conversation

snoyberg
Copy link
Contributor

I considered documenting the corner case in the Haddocks for decodeUtf8With, but thought the corner case may be more confusing than not, and didn't want it to block this PR. If you'd prefer something be added, I'll be happy to do so.

@snoyberg
Copy link
Contributor Author

Another approach that would work here would be to have an unsafeWrite1 or similar function, and throw an exception if the value is above '\ffff'. Since that conditional is already being checked in unsafeWrite, that should be no runtime performance impact for the common case.

@snoyberg
Copy link
Contributor Author

Build failures appear to be the same as those on master.

Copy link
Member

@hvr hvr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, but of the 3 general directions I considered for fixing this, you picked a 4th and unfortunately a least desirable one; basically the 3 general ways I considered (plus yours) are :

  1. Total correctness: Support the full range of replacement characters as advertised by the type-sig
  2. Partial correctness: Support only BMP for replacement characters; throw an exception if an invalid replacement char is returned; i.e. fail non-silently
  3. Total with silent truncation: Silently remap non-BMP replacement characters into the BMP range somehow; but don't break the internal invariant of Text
  4. Total incorrectness: Support only BMP, but break internal invariant of Text if non-BMP replacements occur.

Ideally, I'd prefer 1. as it's the most principled one, but that one requires either more complexity or overallocation ; I don't like 3. because it silently fails and can hide incorrect API usage, but at least it doesn't break Text's internal invariant;

TLDR: short term I intended to go with 2. and update the documentation.

@snoyberg
Copy link
Contributor Author

snoyberg commented Dec 29, 2017 via email

@hvr
Copy link
Member

hvr commented Dec 29, 2017

how does the current approach break the internal invariants?

It allows invalid UTF-16 code units to be written into the Text buffer

@snoyberg
Copy link
Contributor Author

I've pushed an update which implements (2) instead.

@bgamari
Copy link
Contributor

bgamari commented Jan 21, 2018

What is the status of this?

hvr pushed a commit that referenced this pull request Aug 28, 2018
This also also makes the testsuite compatible w/ QC 2.10
and consequently closes #211 and #212
@hvr hvr closed this Aug 28, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants