Escape fewer Unicode codepoints in `Debug` impl of `str` #34485
rust-highfive assigned brson Jun 26, 2016
r? @brson (rust_highfive has picked a reviewer for you, use r? to override)
Looks like run-pass/ifmt.rs is failing on travis.
Is this changing
It is changing that function. Why is it a breaking change?
It's a stable function and this will break people's code which relies on the current behaviour.
/cc @rust-lang/libs and @rust-lang/lang
On Jun 26, 2016, 18:04 -0400, Oliver Middleton <notifications@github.com>, wrote:
The documentation explicitly states that any character that is not in the printable ASCII range
@ranma42 Agreed. I don't think we can change this. It's very clearly changing the contract of the function.
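To make the contract under discussion concrete, here is a minimal sketch of what `char::escape_default` does in the standard library: printable ASCII passes through, a few characters get backslash escapes, and everything else becomes a `\u{...}` escape. The helper name `escape_default_all` is hypothetical and exists only for this illustration.

```rust
/// Hypothetical helper: applies `char::escape_default` to every character
/// of `s` and collects the result into a `String`.
fn escape_default_all(s: &str) -> String {
    s.chars().flat_map(char::escape_default).collect()
}

fn main() {
    // Printable ASCII is left alone.
    println!("{}", escape_default_all("abc")); // prints abc
    // Non-ASCII characters become \u{...} escapes; newline becomes \n.
    println!("{}", escape_default_all("é\n")); // prints \u{e9}\n
}
```

Code that depends on the output being pure ASCII is the kind of caller that a change to this function would break.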
@ranma42 OK, assume for now that we don't change that function, but only the
Didn't these exact conversations happen before? Was a previous attempt abandoned?
I don't know, I don't remember any.
I can't find it. Might have been
@tbu-, yes, I think that should work. We might also want to expose it as a function on
brson added the T-libs and I-nominated labels Jun 27, 2016
Not sure what I think of this. The medium that
Actually, the output of
I would guess that the reason for this is that the people implementing it didn't need non-ASCII characters, and I mean if you don't need them they're just a nuisance. But if you're implementing a non-English program, then it basically makes the
If you write to a device that doesn't support UTF-8, you should just escape these characters later, when writing to said device -- like the
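The idea of escaping only at the point of writing to an ASCII-only device can be sketched as follows. This is an illustration under stated assumptions, not code from this PR: the function name `write_ascii_only` is hypothetical, and it leaves all ASCII (including control characters) untouched, escaping only non-ASCII characters.

```rust
use std::io::{self, Write};

/// Hypothetical helper: writes `s` to `out`, replacing every non-ASCII
/// character with its `\u{...}` escape and passing ASCII through as-is.
fn write_ascii_only<W: Write>(out: &mut W, s: &str) -> io::Result<()> {
    for c in s.chars() {
        if c.is_ascii() {
            write!(out, "{}", c)?;
        } else {
            // `char::escape_unicode` yields the `\u{NNNN}` form.
            write!(out, "{}", c.escape_unicode())?;
        }
    }
    Ok(())
}

fn main() -> io::Result<()> {
    let mut buf = Vec::new();
    write_ascii_only(&mut buf, "café")?;
    io::stdout().write_all(&buf)?; // prints caf\u{e9}
    Ok(())
}
```

With this split, the string keeps its readable form everywhere else, and only the output path that cannot handle UTF-8 pays for the escaping.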
A possible objection is that
which seems to imply that it should not be exposed to the users, but rather to tools or developers.
@tbu- not really, in that case he is the user of the
@ranma42 It's a runtime error provided by the operating system, encountered while programming.
EDIT: Also, you could probably look into the PEP; they also give a longer motivation in there.
I'd rather they don't do these (mostly useless) format/escape for me (a programmer). They
@tbu- Maybe we can't change
@tbu- It is a runtime error provided by the operating system, encountered by
If this was the reason, we should also implement
That'd be a big breaking change. Not all
@liigo Yes, that would be a major breaking change (it would change the constraints on the
To me, the major advantage of the current implementation of
Of course this does not mean that it should be used for everything. Specifically, I would only use
#34318 shows an example where using
Even though Rust does not (yet) have its own localised error messages, it would not be hard to imagine the same issue affecting other types of output, so it might be a good idea to think of a more general solution to ensure a way forward in this direction.
@ranma42 If you want to see the exact code points, why only make an exception for English? That's very English-centric. :)
EDIT: Imagine the
We should probably provide a function that does the same as
This is not an advantage for non-ASCII text. It just makes unreadable noise (
@bors: r-
tbu- force-pushed the tbu-:pr_unicode_debug_str branch from f9bf85d to 3d09b4a Jul 28, 2016
@alexcrichton It should be fixed now.
alexcrichton added the relnotes label Jul 28, 2016
bors added a commit that referenced this pull request Jul 28, 2016
bors merged commit 3d09b4a into rust-lang:master Jul 28, 2016
This was referenced Jul 28, 2016
SimonSapin added a commit to SimonSapin/rust-std-candidates that referenced this pull request Aug 2, 2016
bluss referenced this pull request Nov 15, 2016
Closed: `fmt::Debug` should not escape printable characters #24588
radix pushed a commit to radix/string-wrapper that referenced this pull request Jan 13, 2017
I’m very late to say this, but this adds 2102 bytes of static data to libcore, whereas previously all large Unicode tables were in the
aturon referenced this pull request Feb 3, 2017
Open: Escaping `char` in libcore adds 2k of static data for no_std cases #39492
@SimonSapin I opened #39492 for this issue, and to propose a general policy.
tbu- referenced this pull request Jun 23, 2017
Closed: Tracking issue for the functions for debug escaping `char_escape_debug` #35068
I’d like to know what the code does precisely (what are
@ariasuni The commit message and PR message give a list of Unicode categories of characters that are escaped, but yes, if it’s not already there, this list should also be in some doc-comment in the code. I agree that I’d prefer to have these tables in
@ariasuni The high-level view is that you have to store
The low-level view of this particular implementation seems to have changed since I implemented it; you can find some notes on the new one in 44bcd26.
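As a rough illustration of why a printability check needs static data at all, one common representation is a sorted table of code point ranges searched with binary search. The sketch below is an assumption made for illustration only: the table is a tiny hand-picked subset of the escaped categories, and neither the table nor the name `in_escaped_subset` reflects what libcore actually ships.

```rust
use std::cmp::Ordering;

/// Illustrative subset only: a few inclusive code point ranges taken from
/// the escaped categories. This is NOT the table used by libcore.
const ESCAPED_RANGES: &[(u32, u32)] = &[
    (0x0000, 0x001f), // Cc: C0 controls
    (0x007f, 0x009f), // Cc: DEL and C1 controls
    (0x00a0, 0x00a0), // Zs: no-break space
    (0x00ad, 0x00ad), // Cf: soft hyphen
    (0x200b, 0x200f), // Cf: zero-width characters and directional marks
    (0x2028, 0x2029), // Zl, Zp: line and paragraph separators
];

/// Hypothetical lookup: returns true if `c` falls in one of the ranges above.
/// The ranges are sorted and disjoint, so binary search applies.
fn in_escaped_subset(c: char) -> bool {
    let cp = c as u32;
    ESCAPED_RANGES
        .binary_search_by(|&(lo, hi)| {
            if hi < cp {
                Ordering::Less
            } else if lo > cp {
                Ordering::Greater
            } else {
                Ordering::Equal
            }
        })
        .is_ok()
}

fn main() {
    for c in ['a', 'é', ' ', '\n', '\u{200b}'] {
        println!("U+{:04X}: escaped = {}", c as u32, in_escaped_subset(c));
    }
}
```

A full table covering every unassigned, private-use, format, and separator code point is much larger than this, which is where the static data discussed in this thread comes from.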
@SimonSapin We can split
@ariasuni even if we did that, that doesn’t solve the situation that a
Sorry, I probably overthought it. The idea is to put the code for Unicode categories elsewhere so that we can use it in
I don’t understand how this would reduce code size; I may be missing something.
If I understand correctly,
tbu- commented Jun 26, 2016
Use the same procedure as Python to determine whether a character is
printable, described in PEP 3138. In particular, this means that the
following character classes are escaped:
`' '` `0x20`
This allows for user-friendly inspection of strings that are not
English (e.g. compare `"\u{e9}\u{e8}\u{ea}"` to `"éèê"`).
Fixes #34318.
CC #34422.
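The effect of the change can be seen by comparing `{:?}` with `str::escape_default` on the example string from the description. PEP 3138 treats the Unicode categories Cc, Cf, Cs, Co, Cn, Zl, Zp and Zs as non-printable, with ASCII space as the exception. The sketch below assumes a toolchain that includes this change and has `str::escape_default` available; the helper names `debug_repr` and `ascii_repr` are hypothetical.

```rust
/// Hypothetical helper: the `Debug` rendering of `s`, as `{:?}` prints it.
fn debug_repr(s: &str) -> String {
    format!("{:?}", s)
}

/// Hypothetical helper: `s` with every character passed through
/// `escape_default`, which escapes everything outside printable ASCII.
fn ascii_repr(s: &str) -> String {
    s.escape_default().to_string()
}

fn main() {
    // Printable non-ASCII letters are shown as-is by `Debug`.
    println!("{}", debug_repr("éèê")); // prints "éèê"
    // `escape_default` keeps the ASCII-only form.
    println!("{}", ascii_repr("éèê")); // prints \u{e9}\u{e8}\u{ea}
    // Format characters (Cf) and control characters (Cc) are still escaped.
    println!("{}", debug_repr("a\u{200b}b\n")); // prints "a\u{200b}b\n"
}
```

Callers that need guaranteed ASCII output can keep using `escape_default`, while `Debug` output stays readable for non-English text.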