Tracking issue for RFC 2457, "Allow non-ASCII identifiers" #55467

Centril · 2018-10-29T10:27:17Z

Manishearth · 2018-10-29T10:31:12Z

The last unresolved question isn't a real unresolved question; it was included in the RFC for completeness but does not block this issue.

Centril · 2018-10-29T10:31:45Z

@joshtriplett Please check that the list of checkboxes above is satisfactory. :)

@Manishearth alright; leave a note under it to that effect?

Manishearth · 2018-10-29T10:32:40Z

The note saying so is already in the unresolved q

8573 · 2018-10-29T18:53:23Z

Is there a better name for the less_used_codepoints lint?

Substituting "rare" or "unusual" for "less used" seems to me a simple, if not necessarily final, improvement, replacing the somewhat awkward "less used" with a single, shorter, more usual synonym.

(Edit: I note that I personally oppose allowing non-ASCII identifiers, but I recognize that the Rust Team favors it, and I have no problem bowing to their decision and chipping in my cents to help.)

Manishearth · 2018-10-29T19:36:40Z

I like "unusual"

Serentty · 2018-10-29T20:03:20Z

I would prefer “rare” as it sounds more objective to me than “unusual”, and perhaps less judgemental as well.

eaglgenes101 · 2018-11-01T02:26:14Z

My first thought was "uncommon", but that's not strong enough of an adjective to get the intended meaning across.

Centril · 2018-11-01T02:29:38Z

I'm partial towards "rare" as well; rare_codepoints is pretty short and sweet.

glaebhoerl · 2018-11-01T08:14:34Z

If we need something even stronger we might try "mythic". 😛

eddyb · 2018-11-03T20:33:33Z

I prefer uncommon_codepoints as being more serious/boring than rare/legendary/mythic/etc.

8573 · 2018-11-04T02:20:15Z

@eddyb—

> I prefer uncommon_codepoints as being more serious/boring than rare/legendary/mythic/etc.

As a native and competent English-user who generally is seen as serious/boring [1], while I agree that "legendary" and "mythic" sound rather fantastical [2], I don't think "rare" does. The distinction I would draw between uncommon_codepoints and rare_codepoints is that "rare" may be seen as stronger than "uncommon", at least in US English (Merriam-Webster and Wiktionary say so, whereas Oxford and Cambridge don't say so explicitly).

[1] (I recognize this may not have been a trait I displayed when I was a member of your channel.)
[2] (similarly to the Unicode term "astral plane", which, possibly relevantly, if I understand correctly, was changed to "supplementary plane")

Kixunil · 2018-11-05T09:31:35Z

I'm much against non-ascii identifiers, but I'm not going to repeat what other folks said. I just noticed that no example of malicious code was provided so far, so I'm providing one:

fn list_items_in_category(category: &str) -> io::Result<String> {
    let cate𝚐ory = sanitize_untrusted_input(category);
    debug!("Listing category {}", cate𝚐ory);
    system(format!("grep '^{},' /my/simple/database | awk -F , '{{ print $2 }}'", category))
}

Who can spot the problem without looking at character codes?

As a side note, I'd like to provide my experience with attempts to localize everything (feel free to skip the rest of my comment if you want to remain technical). I went to a school where they translate literally every technical term. In an attempt to make everything understandable to everyone, they translated even things that are very difficult to translate reasonably.

You'd expect that it was much easier to learn at that school compared to other schools that don't do that, right? Well, life is weird. It was hard to understand, I felt like Alice in Wonderland, and it took me a week to realize that "that weird term I didn't hear before" was the actual thing I wanted to study and the very reason I signed up for that specific school!

Of course, this is not directly an argument against non-ascii identifiers. I just wanted to express my concerns to all those wonderful loving people (seriously) who want everyone to feel great in the Rust community, so that they remain vigilant and avoid accidentally going against their own beliefs.

glaebhoerl · 2018-11-05T09:36:46Z

> while I agree that "legendary" and "mythic" sound rather fantastical [2], I don't think "rare" does.

(Agreed. My comment was entirely in jest (as I hope should've been evident?), and I had no intention of tarnishing "rare" by association, a word which is itself ordinary and common.)

eddyb · 2018-11-05T09:55:19Z

> Who can spot the problem without looking at character codes?

It's trivial on my font. Also, we want to start with lints against this sort of thing.

ketsuban · 2018-11-05T09:55:30Z

> Who can spot the problem without looking at character codes?

Github's monospace font configuration on my machine ended up using a binocular glyph for U+0067 LATIN SMALL LETTER G but a monocular one for U+1D690 MATHEMATICAL MONOSPACE SMALL G, so I spotted it instantly.

That said, the confusable_idents lint will, once implemented, almost certainly flag this code, since MATHEMATICAL MONOSPACE SMALL G → LATIN SMALL LETTER G is listed in confusables.txt.
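To make the idea concrete, here is a minimal sketch of how a confusables check can work in principle. It is not the rustc implementation: `non_ascii_chars`, `toy_skeleton` and `are_confusable` are hypothetical helpers, and the two-entry mapping is a hand-picked stand-in for confusables.txt. A real skeleton computation per UTS #39 also normalizes the string, which is skipped here.

```rust
/// Lists every non-ASCII character in `ident` with its code point (`U+XXXX`).
fn non_ascii_chars(ident: &str) -> Vec<(char, String)> {
    ident
        .chars()
        .filter(|c| !c.is_ascii())
        .map(|c| (c, format!("U+{:04X}", c as u32)))
        .collect()
}

/// Toy "skeleton": replaces a few look-alike characters with an ASCII prototype.
/// ASSUMPTION: this table is a tiny illustrative subset, not confusables.txt.
fn toy_skeleton(ident: &str) -> String {
    ident
        .chars()
        .map(|c| match c {
            '\u{1D690}' => 'g', // MATHEMATICAL MONOSPACE SMALL G
            '\u{0430}' => 'a',  // CYRILLIC SMALL LETTER A
            other => other,
        })
        .collect()
}

/// Two distinct identifiers are treated as confusable when their skeletons match.
fn are_confusable(a: &str, b: &str) -> bool {
    a != b && toy_skeleton(a) == toy_skeleton(b)
}
```

With these helpers, `are_confusable("category", "cate\u{1D690}ory")` is true, and `non_ascii_chars` reports the offending character as U+1D690 without anyone having to rely on their font.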

varkor · 2018-11-05T10:45:32Z

The Oxford English Dictionary has as one definition of "uncommon":

> Of an unusual type or character; exceptional in kind or quality

which seems especially appropriate 😉 (Emphasis mine.)

I feel it's a more suitable choice of words than "rare", which has significantly more meanings than "uncommon", some associated specifically with being good, e.g. (also OED):

> Unusually good, fine, or worthy; of uncommon excellence or merit.

I also think more reserved wording is generally appropriate for compiler naming conventions.

Manishearth · 2018-11-05T15:18:33Z

@Kixunil

> but I'm not going to repeat what other folks said

That's precisely what you're doing -- your "counterexample" is caught by both the less_used_codepoints lint and the confusable_idents lint.

At this point these counterexample discussions have been done to death (and we're leveraging a unicode spec designed to deal with this!) -- please actually check if your "counterexample" isn't something we or the Unicode Consortium have thought of already.

Serentty · 2018-11-09T01:23:02Z

Exactly. People have thought a lot about this, and it's certainly possible to implement a feature that many people will find useful while dealing with complications that might arise. At this point, adding this feature is a given. If you have ideas about how to improve the lints for finding confusable identifiers, by all means share them, but there's no need to simply point out an issue that everyone is already aware of.

Centril · 2018-11-09T22:17:33Z

Re. less_used_codepoints: I propose that we bring the bikeshed to a halt in favor of uncommon_codepoints because "meh".

I was personally in favor of rare_codepoints but uncommon_codepoints is basically the same and also works for me so... 🤷‍♀️

Everyone OK with this?

Manishearth · 2018-11-09T22:22:32Z

works for me. Slight preference for rare but very slight. Both work for me, and I don't think it's really worth bikeshedding this too much :)

Kixunil · 2018-11-15T19:33:43Z

@Manishearth Sorry, I didn't mean to argue that the example is unsolved, I just wanted to provide actual code for those who might have difficulties imagining how to turn this feature into something malicious.

I wonder, though, why the confusables lint is not mandatory (according to the RFC). It sounds to me like making the borrow checker non-mandatory. My understanding is that Rust should be safe by default, where you can opt into unsafety. For me, this means denying confusables in every crate I make, if I want to ensure high quality of my crates.
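For a crate that wants this stricter default, a sketch of the opt-in could look like the following. It assumes the lint names as they ended up in rustc (`confusable_idents`, `uncommon_codepoints`, `mixed_script_confusables`, and the allow-by-default `non_ascii_idents`); the attributes go at the top of the crate root, and `answer` is only a hypothetical placeholder item.

```rust
// Crate root (src/lib.rs). These are crate-level attributes.
#![deny(confusable_idents)] // visually confusable pairs of identifiers
#![deny(uncommon_codepoints)] // code points outside the recommended set
#![deny(mixed_script_confusables)] // scripts used only via confusable characters

// Uncomment to reject non-ASCII identifiers in this crate entirely:
// #![forbid(non_ascii_idents)]

/// Hypothetical placeholder item so the crate is not empty.
pub fn answer() -> u32 {
    42
}
```

Raising the level from warn to deny turns the existing warnings into hard errors for this crate only, without changing the defaults for anyone else.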

Manishearth · 2018-11-15T23:18:34Z

This isn't a matter of safety in the way that Rust describes it.

Making the warnings on by default has been discussed. Please let's not use this tracking issue to relitigate things which have already been decided through a rather long RFC.

rfcbot · 2021-03-30T17:13:58Z

🔔 This is now entering its final comment period, as per the review above. 🔔

ehuss · 2021-04-06T13:52:07Z

A concern has been raised in #83923 that extern blocks are not handled as described in the RFC. I would appreciate it if that were addressed before this is stabilized. I suspect a validation check would be easy to add if that indeed should be rejected.

joshtriplett · 2021-04-06T19:07:36Z

Thanks for calling attention to that, @ehuss!

@rfcbot concern extern-blocks

Manishearth · 2021-04-07T18:05:46Z

@joshtriplett #83936 has merged now

joshtriplett · 2021-04-07T21:43:51Z

@rfcbot resolved extern-blocks

rfcbot · 2021-04-07T21:43:54Z

🔔 This is now entering its final comment period, as per the review above. 🔔

rfcbot · 2021-04-17T21:46:59Z

The final comment period, with a disposition to merge, as per the review above, is now complete.

As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed.

The RFC will be merged soon.

crlf0710 · 2021-04-18T10:25:15Z

Stabilization PR is at #83799.

… r=Manishearth Stablize `non-ascii-idents` This is the stablization PR for RFC 2457. Currently this is waiting on fcp in [tracking issue](rust-lang#55467). r? `@Manishearth`

crlf0710 · 2021-04-19T02:04:26Z

Stabilization PR has landed, closing.

ghost · 2021-06-05T04:58:59Z

Why not support Chinese names?

Manishearth · 2021-06-05T05:10:13Z

@Mr-Zzg They are! See https://play.rust-lang.org/?version=beta&mode=debug&edition=2018&gist=296bbb7bc2d69f8d2c4245b9df93992a

This feature hasn't hit stable yet; it will in the next release.
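As a small illustration (a sketch, not the contents of the playground link above), identifiers written in Chinese compile like any others once the feature is available, which to my knowledge means Rust 1.53 and later. The names `平方` and `数` are made up for this example.

```rust
/// Squares a number. `平方` means "square" and `数` means "number".
fn 平方(数: i32) -> i32 {
    数 * 数
}
```

Calling `平方(7)` yields `49`; the compiler treats the name exactly like an ASCII one.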

vyamkovyi · 2021-12-22T16:02:29Z

https://trojansource.codes/

wooster0 · 2021-12-22T16:12:55Z

@Hexawolf GHSA-rcv6-wg5m-24v6

Manishearth · 2021-12-22T23:10:16Z

We are not affected by the homoglyph attack; please see the mitigations that were implemented as a part of this RFC.

Centril added the T-lang label Oct 29, 2018

Centril mentioned this issue Oct 29, 2018

Allow non-ASCII identifiers rust-lang/rfcs#2457

Merged

This was referenced Oct 29, 2018

XID_Start / XID_Continue might not be quite right #4928

Closed

Ident mangling and unicode. #7539

Closed

New lint: detect homoglyphs rust-lang/rust-clippy#2368

Closed

Tracking issue for non-ASCII identifiers (feature "non_ascii_idents") #28979

Closed

SimonSapin mentioned this issue Oct 29, 2018

Tracking issue for 1.0.0 tracking issues #39954

Closed

4b5ent1 mentioned this issue Nov 4, 2018

暂不支持unicode identifier的编程语言/环境汇总 (Summary of programming languages/environments that do not yet support Unicode identifiers) program-in-chinese/overview#102

Open

crlf0710 mentioned this issue Apr 3, 2021

Stablize non-ascii-idents #83799

Merged

rfcbot added proposed-final-comment-period and removed final-comment-period labels Apr 6, 2021

rfcbot added final-comment-period and removed proposed-final-comment-period labels Apr 7, 2021

rfcbot added the finished-final-comment-period label Apr 17, 2021

rfcbot added to-announce and removed final-comment-period labels Apr 17, 2021

crlf0710 closed this as completed Apr 19, 2021

apiraino removed the to-announce label Apr 22, 2021

PatchMixolydic mentioned this issue May 29, 2021

rustc_lexer's definition of ids are more general than lang ref's spec #85809

Closed

Mingun mentioned this issue Jun 12, 2021

CONTRIBUTING.md is outdated serde-rs/serde#2038

Closed

mrluc mentioned this issue Dec 2, 2021

draft of warning on confusable identifiers elixir-lang/elixir#11429

Closed
