Variable named ∇x gives "unknown start of token" compiler error #120142

Open
Danvil opened this issue Jan 19, 2024 · 5 comments
Labels
A-unicode Area: Unicode C-enhancement Category: An issue proposing an enhancement or a PR with one. T-lang Relevant to the language team, which will review and decide on the PR/issue.

Comments

@Danvil
Contributor

Danvil commented Jan 19, 2024

I tried this code:

let ∇x = 1;

I expected the code to compile, but instead I get the compiler error message "unknown start of token \u{2207}".

This is surprising as variable names starting with Greek letters are fine:

let Δλ = 1;

I believe the cause is that Rust identifiers need to start with an XID_Start Unicode character; however, the "Nabla" ∇ (U+2207) does not seem to be on that list.

It would be great to have the "Nabla" operator as a valid start character for identifiers, as it is very commonly used in physics and mathematics to denote the derivative of a multi-variable function.

A possible workaround is to use the "Canadian syllabics e" ᐁ (U+1401).
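To illustrate why ∇ is rejected while Δ and ᐁ are accepted, here is a minimal sketch using only the standard library. It assumes that `char::is_alphabetic` is an acceptable rough proxy for XID_Start. That is an approximation: the two properties are not identical, and the compiler uses real XID_Start tables. The function name `roughly_identifier_start` is made up for this example.

```rust
/// Rough stand-in for "may this character start an identifier?".
/// ASSUMPTION: approximates XID_Start with the Alphabetic property,
/// which is close for letters but not identical (for example, some
/// combining marks are Alphabetic without being XID_Start).
fn roughly_identifier_start(c: char) -> bool {
    c == '_' || c.is_alphabetic()
}

/// Returns the first character of `name`, if any, that fails the rough check
/// in start position. `None` means the first character looks acceptable.
fn offending_start(name: &str) -> Option<char> {
    name.chars()
        .next()
        .filter(|&c| !roughly_identifier_start(c))
}
```

With this sketch, `offending_start("∇x")` yields `Some('∇')` because ∇ is a math symbol (general category Sm) rather than a letter, while `offending_start("Δλ")` and `offending_start("ᐁx")` both yield `None`.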

@Danvil Danvil added the C-bug Category: This is a bug. label Jan 19, 2024
@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Jan 19, 2024
@Jules-Bertholet
Contributor

Jules-Bertholet commented Jan 20, 2024

@rustbot rustbot added T-lang Relevant to the language team, which will review and decide on the PR/issue. C-enhancement Category: An issue proposing an enhancement or a PR with one. and removed C-bug Category: This is a bug. labels Jan 20, 2024
@saethlin saethlin removed the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Jan 21, 2024
@Manishearth
Member

Manishearth commented Jan 22, 2024

This would need an RFC to extend the current identifier profile (the default one from UAX 31) to use the mathematical notation profile.

This would add these characters to the identifier profile (with the superscripts and subscripts not being allowed at the beginning of an identifier)

All of these would get linted on by the uncommon_codepoints lint since they have Identifier_Type=Not_NFKC.

(A change I want to make is for uncommon_codepoints to have slightly different lint text based on the category that is triggered: #120228)
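The start/continue distinction described above (math symbols allowed anywhere, superscripts and subscripts allowed only after the first character) could be sketched roughly as follows. This is not rustc's lexer and not the real UAX 31 character list. `MATH_START` and `MATH_CONTINUE_ONLY` are tiny hypothetical subsets chosen for illustration, and the base rules are approximated with standard-library character classes rather than XID_Start/XID_Continue.

```rust
// HYPOTHETICAL subsets for illustration; the real profile lists far more characters.
const MATH_START: &[char] = &['∇', '∂', '∞'];
const MATH_CONTINUE_ONLY: &[char] = &['₀', '₁', '₂', '⁰', '¹', '²'];

// ASSUMPTION: Alphabetic is used as a rough proxy for XID_Start.
fn base_start(c: char) -> bool {
    c == '_' || c.is_alphabetic()
}

// ASSUMPTION: rough proxy for XID_Continue (letters, '_', ASCII digits).
fn base_continue(c: char) -> bool {
    base_start(c) || c.is_ascii_digit()
}

/// Checks a candidate identifier under the sketched extended profile.
fn is_ident_with_math(s: &str) -> bool {
    let mut chars = s.chars();
    let first = match chars.next() {
        Some(c) => c,
        None => return false, // an empty string is not an identifier
    };
    // Superscripts and subscripts are deliberately not accepted here.
    if !(base_start(first) || MATH_START.contains(&first)) {
        return false;
    }
    chars.all(|c| {
        base_continue(c) || MATH_START.contains(&c) || MATH_CONTINUE_ONLY.contains(&c)
    })
}
```

Under these assumptions, `∇x` and `x₁` would be accepted, while `₁x` would be rejected because a subscript appears in start position, and `¬x` would be rejected because ¬ is in neither set.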

@fmease fmease added the A-unicode Area: Unicode label Jan 22, 2024
@bend-n
Contributor

bend-n commented Jan 22, 2024

Hey, if we're getting mathematical characters, can I just say I would really love it if we had
¬
Is there a Unicode profile for these?

Also what about the emoji profile 😀

@Manishearth
Member

There isn't one for the math operators, because those are considered operator-like.

As for emoji, it's unlikely. Rust would have to put together its own set.

Emoji identifiers are a complicated can of worms.

@CraftSpider
Contributor

Personally I'd be interested in math operators at least tokenizing, so they could reach macros as Punct or such, but I figure that's somewhat unlikely to actually happen.
