Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

decode: Use assert to ensure unsafety constraints #583

Conversation

chrysn
Copy link
Contributor

@chrysn chrysn commented Jan 31, 2020

The unsafe from_utf8_unchecked call in decude_utf8_lossy relies on bytes to contain only valid bytes.

The from_utf8_lossy function used in there does not guarantee that it never returns a Borrowed() with foreigh data -- it currently does not, but it might (without violating its type constraints, and technically without even changing the documentation) start returning a Borrowed(&'static "�") if the complete string is deemed invalid.

With no guarantees that this does not happen, and an unsafe that relies on them, a strong assert is required, rather than the lax debug_assert that'd only trigger in debug builds.

If it is anticipated that from_utf_lossy would actually ever change in such a way, it should be possible to rewrite the code to check the condition inside the match branch (which would need to clone that slice then, as it couldn't know that there's actually a &'static in there) -- but I consider this unlikely.

The unsafe from_utf8_unchecked function relies on bytes to contain only
valid bytes.

The from_utf8_lossy function does not guarantee that it never returns a
Borrowed() with foreigh data -- it currently does not, but it might
(without violating its type constraints, and technically without even
changing the documentation) start returning a `Borrowed(&'static "�")`
if the complete string is deemed invalid.

With no guarantees that this does not happen, and an unsafe that relies
on them, a strong assert is required, rather than the lax debug_assert
that'd only trigger in debug builds.
@bors-servo
Copy link
Contributor

☔ The latest upstream changes (presumably 3b587ea) made this pull request unmergeable. Please resolve the merge conflicts.

@djc
Copy link
Contributor

djc commented Aug 24, 2020

@JoshMcguigan, @Shnatsel, any opinions on this change as a follow-up to #560? Perhaps an alternative route would be to get the documentation for String::from_utf8_lossy() updated to make a slightly stronger guarantee.

@Shnatsel
Copy link

I don't see how the described change to the behavior could lead to a memory safety issue. AFAIU the only invariant asserted by the unsafe block is that the borrowed string is valid UTF-8, and String::from_utf8_lossy is already guaranteed to return valid UTF-8 regardless.

So the debug assertion is not future-proof, but the unsafe block is.

(That said, the control flow with an early return inside a match statement is a little hard to follow. Getting rid of the early return would improve clarity.)

@Shnatsel
Copy link

There is a copy of this function with better comments: https://github.com/servo/rust-url/pull/560/files#diff-8dc09b48ae911f5a9aec0b3be41f59af

@djc
Copy link
Contributor

djc commented Aug 24, 2020

Yeah, @Shnatsel's assessment makes sense to me: whether Borrowed or Owned, String::from_utf8_lossy() does guarantee that it returns valid UTF-8, and thus using String::from_utf8_unchecked() is safe here.

@djc djc closed this Aug 24, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants