Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Formatting/debug of OS strings #22766

Closed
SimonSapin opened this issue Feb 24, 2015 · 9 comments
Closed

Formatting/debug of OS strings #22766

SimonSapin opened this issue Feb 24, 2015 · 9 comments
Labels
C-feature-accepted Category: A feature request that has been accepted pending implementation. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@SimonSapin
Copy link
Contributor

OsStr implements fmt::Debug:

  • On Unix, using to_lossy_str, then fmt::Debug::fmt on the resulting String, which escapes all control or non-ASCII characters.
  • On Windows, it eventually calls an impl I originally wrote for rust-wft8, which is lossless but only escapes surrogate code points. In particular, it happily prints control characters.

So there are two issues: they should be consistent across platforms, and is it OK for a Debug impl to be lossy?

CC @aturon, @alexcrichton

@SimonSapin
Copy link
Contributor Author

Also, std::path::Display is pointless if impl fmt::Debug for Path does the same.

@alexcrichton
Copy link
Member

they should be consistent across platforms

Yes, I think this was just an oversight in the initial implementation.

is it OK for a Debug impl to be lossy?

Yes.

Also, std::path::Display is pointless if impl fmt::Debug for Path does the same.

They're subtly different in that Debug will place quotes around Path but Display does not. Everything basically uses to_string_lossy but the Debug vs Display distinction is whether the Debug or Display implementation is used on the resulting String

@SimonSapin
Copy link
Contributor Author

Ok. Then impl fmt::Debug for Wtf8 should be changed to use to_string_lossy.

@huonw huonw added the T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. label Jan 8, 2016
@huonw
Copy link
Member

huonw commented Jan 8, 2016

Is all that is required to close this issue just changing

fn fmt(&self, formatter: &mut fmt::Formatter) -> fmt::Result {
fn write_str_escaped(f: &mut fmt::Formatter, s: &str) -> fmt::Result {
use fmt::Write;
for c in s.chars().flat_map(|c| c.escape_default()) {
try!(f.write_char(c))
}
Ok(())
}
try!(formatter.write_str("\""));
let mut pos = 0;
loop {
match self.next_surrogate(pos) {
None => break,
Some((surrogate_pos, surrogate)) => {
try!(write_str_escaped(
formatter,
unsafe { str::from_utf8_unchecked(
&self.bytes[pos .. surrogate_pos]
)},
));
try!(write!(formatter, "\\u{{{:X}}}", surrogate));
pos = surrogate_pos + 3;
}
}
}
try!(write_str_escaped(
formatter,
unsafe { str::from_utf8_unchecked(&self.bytes[pos..]) },
));
formatter.write_str("\"")
}
to something like fmt::Debug::fmt(&self.to_string_lossy(), formatter)?

(I'm tagging as E-easy under that expectation, but it'd be nice to have confirmation.)

@huonw huonw added the E-easy Call for participation: Easy difficulty. Experience needed to fix: Not much. Good first issue. label Jan 8, 2016
@retep998
Copy link
Member

retep998 commented Jan 8, 2016

I'd personally prefer if the debug implementation on Windows was lossless and had escaped lone surrogates since if I do encounter an OsString that had lone surrogates I'd really like to know what they are and Debug is the tool I'd reach for first.

@brson
Copy link
Contributor

brson commented Jun 27, 2016

So perhaps we should take the existing code that escapes the surrogates and wrap that in another call to Debug to escape the rest?

@brson brson added I-needs-decision Issue: In need of a decision. and removed E-easy Call for participation: Easy difficulty. Experience needed to fix: Not much. Good first issue. labels Feb 24, 2017
@Mark-Simulacrum Mark-Simulacrum added the C-feature-request Category: A feature request, i.e: not implemented / a PR. label Jul 22, 2017
@dtolnay dtolnay added C-feature-accepted Category: A feature request that has been accepted pending implementation. and removed C-feature-request Category: A feature request, i.e: not implemented / a PR. I-needs-decision Issue: In need of a decision. labels Nov 17, 2017
@dtolnay
Copy link
Member

dtolnay commented Nov 17, 2017

In my opinion:

  • The Debug representation should be consistent across platforms, especially in whether control characters are escaped.
  • In general I am perfectly okay with Debug representations being lossy, but in the case of OS strings I believe we can come up with a more helpful representation than to_string_lossy. I agree with @retep998 that pretty much any time you are debugging something with lone surrogates, you really want Debug to show you what they are.

I would be prepared to consider a PR with these changes, and I think we can go from there.

bors added a commit that referenced this issue Dec 18, 2017
Add lossless debug implementation for unix OsStrs

Fixes #22766

Invalid utf8 byte sequences are replaced with `\xFF` style escape codes, while valid utf8 goes through the normal `Debug` implementation.

This is necessarily different from the windows Debug implementation, which uses `\u{xxxx}` style escape sequences for unpaired surrogates, but both implementations are consistent in that they are both lossless, and display invalid sequences in the way most similar to existing language syntax.

r? @dtolnay
@retep998
Copy link
Member

Does the Windows implementation currently print lone surrogates? #46798 only changed how OsStr is printed on Unix platforms.

@Diggsey
Copy link
Contributor

Diggsey commented Dec 18, 2017

@retep998 the windows implementation always printed lone surrogates.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-feature-accepted Category: A feature request that has been accepted pending implementation. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

9 participants