Prevent potential segfault in decodeUtf8With #211 #212

snoyberg · 2017-12-29T04:19:19Z

I considered documenting the corner case in the Haddocks for decodeUtf8With, but thought the corner case may be more confusing than not, and didn't want it to block this PR. If you'd prefer something be added, I'll be happy to do so.

snoyberg · 2017-12-29T06:15:25Z

Another approach that would work here is to add an unsafeWrite1 or similar function, and throw an exception if the value is above '\xffff'. Since that condition is already being checked in unsafeWrite, there should be no runtime performance impact for the common case.
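For illustration, a minimal sketch of the check described above, using only base. The names toSingleCodeUnit and tryCodeUnit are hypothetical and are not part of the text library; as far as I know the real unsafeWrite writes into a mutable array rather than returning a value, so this only models the range check, not the write itself.

```haskell
import Control.Exception (ErrorCall, evaluate, try)
import Data.Char (ord)
import Data.Word (Word16)

-- Hypothetical stand-in for the proposed unsafeWrite1: produce exactly one
-- UTF-16 code unit, and fail loudly when the character does not fit in one.
toSingleCodeUnit :: Char -> Word16
toSingleCodeUnit c
  | c > '\xFFFF' = error ("replacement character outside the BMP: " ++ show c)
  | otherwise    = fromIntegral (ord c)

-- Helper that forces the result so the exception, if any, is observable.
tryCodeUnit :: Char -> IO (Either ErrorCall Word16)
tryCodeUnit = try . evaluate . toSingleCodeUnit
```

With this shape, tryCodeUnit 'a' returns a Right value, while tryCodeUnit '\x10000' returns a Left carrying the error.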

snoyberg · 2017-12-29T06:26:43Z

Build failures appear to be the same as those on master.

hvr

Thanks, but of the 3 general directions I considered for fixing this, you picked a 4th and unfortunately the least desirable one. The 3 general approaches I considered (plus yours) are:

1. Total correctness: Support the full range of replacement characters as advertised by the type-sig
2. Partial correctness: Support only BMP for replacement characters; throw an exception if an invalid replacement char is returned; i.e. fail non-silently
3. Total with silent truncation: Silently remap non-BMP replacement characters into the BMP range somehow; but don't break the internal invariant of Text
4. Total incorrectness: Support only BMP, but break internal invariant of Text if non-BMP replacements occur.

Ideally, I'd prefer 1., as it's the most principled one, but it requires either more complexity or overallocation. I don't like 3. because it silently fails and can hide incorrect API usage, but at least it doesn't break Text's internal invariant.

TLDR: short term I intended to go with 2. and update the documentation.
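The four options can be contrasted one character at a time. The sketch below models each as a pure function from Char to UTF-16 code units. All function names are hypothetical, the choice of U+FFFD for option 3 is only one possible remapping, and plain truncation for option 4 is an assumption made for illustration rather than a description of the PR's actual code.

```haskell
import Data.Bits (shiftR, (.&.))
import Data.Char (ord)
import Data.Word (Word16)

-- 1. Total correctness: any Char is encoded, using a surrogate pair
--    for code points above U+FFFF.
encodeFull :: Char -> [Word16]
encodeFull c
  | n < 0x10000 = [fromIntegral n]
  | otherwise   = [hi, lo]
  where
    n  = ord c
    m  = n - 0x10000
    hi = fromIntegral (0xD800 + (m `shiftR` 10))
    lo = fromIntegral (0xDC00 + (m .&. 0x3FF))

-- 2. Partial correctness: BMP only, reject anything else explicitly.
encodeBmpOrFail :: Char -> Either String [Word16]
encodeBmpOrFail c
  | c > '\xFFFF' = Left ("non-BMP replacement character: " ++ show c)
  | otherwise    = Right [fromIntegral (ord c)]

-- 3. Total with silent remapping: non-BMP input becomes U+FFFD (assumed choice).
encodeRemap :: Char -> [Word16]
encodeRemap c
  | c > '\xFFFF' = [0xFFFD]
  | otherwise    = [fromIntegral (ord c)]

-- 4. Total incorrectness: keep only the low 16 bits, whatever they are.
encodeTruncate :: Char -> [Word16]
encodeTruncate c = [fromIntegral (ord c)]
```

For U+1F600, encodeFull produces the surrogate pair 0xD83D 0xDE00, encodeBmpOrFail produces a Left, encodeRemap produces 0xFFFD, and encodeTruncate produces 0xF600, which is an unrelated code unit.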

snoyberg · 2017-12-29T10:47:51Z

I don't have a problem with (2); I considered that approach also. But to be sure we're on the same page: how does the current approach break the internal invariants? I'm not intimately familiar with the invariants of Text, and I'd like to know where I screwed up. I thought my implementation was actually (3).

hvr · 2017-12-29T17:29:35Z

> how does the current approach break the internal invariants?

It allows invalid UTF-16 code units to be written into the Text buffer
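To make "invalid UTF-16 code units" concrete, here is a small well-formedness check over a list of code units, written as a sketch with hypothetical names and using only base. It assumes the UTF-16 representation that Text used at the time of this thread: a high surrogate (0xD800 to 0xDBFF) must be immediately followed by a low surrogate (0xDC00 to 0xDFFF), and a low surrogate may not appear on its own.

```haskell
import Data.Word (Word16)

isHigh, isLow :: Word16 -> Bool
isHigh w = w >= 0xD800 && w <= 0xDBFF
isLow  w = w >= 0xDC00 && w <= 0xDFFF

-- True when every surrogate in the sequence is part of a proper
-- high/low pair, i.e. the sequence is well-formed UTF-16.
wellFormedUtf16 :: [Word16] -> Bool
wellFormedUtf16 [] = True
wellFormedUtf16 (w : ws)
  | isHigh w = case ws of
      (l : rest) | isLow l -> wellFormedUtf16 rest
      _                    -> False
  | isLow w   = False
  | otherwise = wellFormedUtf16 ws
```

As an example of how a single-unit write can go wrong: keeping only the low 16 bits of U+1D800 gives 0xD800, a lone high surrogate, and wellFormedUtf16 [0xD800] is False.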

snoyberg · 2017-12-30T19:46:55Z

I've pushed an update which implements (2) instead.

bgamari · 2018-01-21T20:46:12Z

What is the status of this?

snoyberg mentioned this pull request Dec 29, 2017

Restrictive upper bound on QuickCheck 2.10.1 hides potential segfault #211

Closed

1 task

hvr suggested changes Dec 29, 2017

snoyberg force-pushed the master branch from 5d297f2 to 4f1bf5d Compare December 30, 2017 19:46

Prevent potential segfault in decodeUtf8With haskell#211

bed05e4

snoyberg force-pushed the master branch from 4f1bf5d to bed05e4 Compare December 30, 2017 19:49

chshersh mentioned this pull request Feb 6, 2018

Add tests to universum serokell/universum#36

Closed

hvr pushed a commit that referenced this pull request Aug 28, 2018

Extend tutf8_err testcases to cover ab90c65

44ec2ce

This also makes the testsuite compatible w/ QC 2.10 and consequently closes #211 and #212

hvr closed this Aug 28, 2018
