Consider linting against 00B7 aka interpunct aka middle dot #120797

Open
pnkfelix opened this issue Feb 8, 2024 · 10 comments
Labels
A-diagnostics Area: Messages for errors, warnings, and lints disposition-postpone This issue / PR is in PFCP or FCP with a disposition to postpone it. finished-final-comment-period The final comment period is finished for this PR / Issue. T-lang Relevant to the language team, which will review and decide on the PR/issue.

Comments

@pnkfelix
Member

pnkfelix commented Feb 8, 2024

Code

#![allow(dead_code)]
#![deny(uncommon_codepoints)]
const COL·LECCIÓ: () = (); // This is Catalan

// The below is not allowed by the lexer today...
// const ·START: () = ();

// ... but this is allowed today ...
const MID·DLE: () = ();

// ... and this is also allowed today
const END·: () = ();


fn main() {
println!("{}", r#"
COL·LECCIÓ
·START
MID·DLE
END·
"#)
}

Current output

COL·LECCIÓ
·START
MID·DLE
END·

Note, however, that the rendering of the first line is font-dependent in terms of how the columns of a fixed-width font line up: the playpen collapses the L·L into a single glyph that occupies one character width.

Desired output

I'm not certain. I just want to make sure we follow up on PR #120695.

The options I see are either:

  1. Leave things as they are (00B7 is hard-rejected as an initial character, and silently accepted in all other contexts)
  2. Adopt something like what was proposed in PR #120695 ("uncommon_codepoints: lint against 00B7 MIDDLE DOT in final position"): continue hard-rejecting 00B7 as an initial character, lint against its occurrence as a final character, and silently accept it as a "medial" character
  3. Something more aggressive than PR #120695, such as linting against 00B7 in all contexts (except perhaps when it occurs between two L's, to accommodate Catalan, as suggested by Manish here)
  4. Other options? (We probably don't get any benefit from deviating far from Unicode committee recommendations, so we probably do not want to start accepting 00B7 as an initial character)
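
To make the positional distinction in options 1 to 3 concrete, here is a minimal Rust sketch. It is not rustc's implementation: `DotPosition`, `middle_dot_positions`, and `only_between_ls` are hypothetical names introduced only for illustration, and the "between two L's" check is just one possible reading of the Catalan carve-out in option 3.

```rust
// Illustrative sketch only; these names are hypothetical, not rustc internals.
const MIDDLE_DOT: char = '\u{00B7}';

/// Where a U+00B7 occurs inside an identifier.
#[derive(Debug, PartialEq, Eq)]
enum DotPosition {
    Initial, // hard-rejected by the lexer today
    Medial,  // silently accepted under options 1 and 2
    Final,   // accepted today; linted under option 2
}

/// Classifies every U+00B7 in `ident` by its position (counted in chars, not bytes).
fn middle_dot_positions(ident: &str) -> Vec<DotPosition> {
    let chars: Vec<char> = ident.chars().collect();
    let mut found = Vec::new();
    for (i, &c) in chars.iter().enumerate() {
        if c != MIDDLE_DOT {
            continue;
        }
        if i == 0 {
            found.push(DotPosition::Initial);
        } else if i + 1 == chars.len() {
            found.push(DotPosition::Final);
        } else {
            found.push(DotPosition::Medial);
        }
    }
    found
}

/// One possible reading of the carve-out in option 3: returns true when every
/// U+00B7 is flanked by 'l' or 'L' on both sides (vacuously true if there is none).
fn only_between_ls(ident: &str) -> bool {
    let chars: Vec<char> = ident.chars().collect();
    for (i, &c) in chars.iter().enumerate() {
        if c != MIDDLE_DOT {
            continue;
        }
        let flanked = i > 0
            && i + 1 < chars.len()
            && matches!(chars[i - 1], 'l' | 'L')
            && matches!(chars[i + 1], 'l' | 'L');
        if !flanked {
            return false;
        }
    }
    true
}
```

Under this sketch, `MID·DLE` classifies as medial and `END·` as final, while `COL·LECCIÓ` is the only identifier from the example above that passes `only_between_ls`.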

Rationale and extra context

No response

Other cases

No response

Rust Version

Stable channel

Build using the Stable version: 1.76.0

Anything else?

No response

@pnkfelix pnkfelix added A-diagnostics Area: Messages for errors, warnings, and lints T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Feb 8, 2024
@pnkfelix
Member Author

pnkfelix commented Feb 9, 2024

I think this is a T-lang issue to decide, not a T-compiler one.

@rustbot label: +T-lang -T-compiler

@rustbot rustbot added T-lang Relevant to the language team, which will review and decide on the PR/issue. and removed T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Feb 9, 2024
@pnkfelix
Member Author

pnkfelix commented Feb 9, 2024

Also, I think it's one that the Lang team might be able to resolve quickly, so I'm nominating it for discussion at the T-lang triage meeting.

@rustbot label: I-lang-nominated

@rustbot rustbot added the I-lang-nominated The issue / PR has been nominated for discussion during a lang team meeting. label Feb 9, 2024
@Jules-Bertholet
Contributor

While we are at it, I think it could also make sense to lint against U+30FB KATAKANA MIDDLE DOT, as mentioned here. MIDDLE DOT and KATAKANA MIDDLE DOT are the only two XID_Continue characters with Identifier_Type=Inclusion. So maybe uncommon_codepoints should just check that?
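
A check along those lines could be as small as the following Rust sketch. It hard-codes the two code points named above rather than consulting Unicode's Identifier_Type data, so the claim that these are the only two such characters is taken from this comment and is not verified by the code. `INCLUSION_DOTS` and `inclusion_dots` are hypothetical names, not part of rustc or any crate.

```rust
// Illustrative sketch only; the names below are hypothetical.
// U+00B7 MIDDLE DOT and U+30FB KATAKANA MIDDLE DOT.
const INCLUSION_DOTS: [char; 2] = ['\u{00B7}', '\u{30FB}'];

/// Returns the byte offset and character of each middle dot found in `ident`.
fn inclusion_dots(ident: &str) -> Vec<(usize, char)> {
    ident
        .char_indices()
        .filter(|(_, c)| INCLUSION_DOTS.contains(c))
        .collect()
}
```

A lint built on this would report each returned offset; whether it should fire for every position or only for a final one is the open question in this issue.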

@joshtriplett
Member

I'd be extremely hesitant to set any precedent of adding language-specific linting rules (e.g. only allowing this character between two Ls), but the idea of "lint on it at the end but not the middle" seems reasonable.

@tmandry
Member

tmandry commented Feb 16, 2024

I'd be extremely hesitant to set any precedent of adding language-specific linting rules

I think it's fine as long as we're reusing existing lint names. We don't make any guarantees about lints generally. It seems more like a matter of how much code the compiler team wants to maintain.

@joshtriplett
Member

Based on discussion in today's @rust-lang/lang meeting, proposing that we not do this in uncommon_codepoints, and that we consider this again after we have the split-out lint that @Manishearth suggests in point 1 of #120228 (a lint about confusables with operator/punctuation).

@rfcbot postpone

@rfcbot

rfcbot commented Feb 21, 2024

Team member @joshtriplett has proposed to postpone this. The next step is review by the rest of the tagged team members:

No concerns currently listed.

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

cc @rust-lang/lang-advisors: FCP proposed for lang, please feel free to register concerns.
See this document for info about what commands tagged team members can give me.

@rfcbot rfcbot added proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. disposition-postpone This issue / PR is in PFCP or FCP with a disposition to postpone it. labels Feb 21, 2024
@rfcbot rfcbot added final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. and removed proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. labels Mar 6, 2024
@rfcbot

rfcbot commented Mar 6, 2024

🔔 This is now entering its final comment period, as per the review above. 🔔

@rfcbot rfcbot added finished-final-comment-period The final comment period is finished for this PR / Issue. to-announce Announce this issue on triage meeting and removed final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. labels Mar 16, 2024
@rfcbot

rfcbot commented Mar 16, 2024

The final comment period, with a disposition to postpone, as per the review above, is now complete.

As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed.

@traviscross
Contributor

@rustbot labels -I-lang-nominated

This has completed FCP with a disposition to postpone, so let's unnominate.

@rustbot rustbot removed the I-lang-nominated The issue / PR has been nominated for discussion during a lang team meeting. label Mar 19, 2024
@apiraino apiraino removed the to-announce Announce this issue on triage meeting label Apr 4, 2024
8 participants