-
Notifications
You must be signed in to change notification settings - Fork 350
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: 32-bit Unicode name mangling #539
Conversation
| 0, it, r => r | ||
| i+1, it, r => | ||
let c := it.curr | ||
if c.isAlpha || c.isDigit then | ||
mangleAux i it.next (r.push c) | ||
else if c = '_' then | ||
mangleAux i it.next (r ++ "__") | ||
else if c.toNat < 255 then | ||
else if c.toNat < 0x100 then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NB the bound increase, now _xff
can be encoded as well.
src/Lean/Compiler/NameMangling.lean
Outdated
else | ||
let n := c.toNat | ||
let r := r ++ "_u32" | ||
let r := Nat.toDigits 16 n |>.foldl (fun r c => r.push c) r |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it important in the 16-bit case that the encoding be constant-width? If yes, I can add justification to 8 characters here as well. Or alternatively, this last case could cover all Unicode with _u
used always instead of _u
/_u32
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it has to be fixed width because otherwise _uabcd
is confusable with _uabc
+ d
. You can also solve this by using different prefixes for different lengths, though, as with _x
vs _u
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. I'm not sure if we ever unmangle atm, but still it's better to keep the mangling injective.
(We could also use UTF-8 instead of a UTF-16/UTF-32 switch. Not sure which is preferrable.)
let r := r ++ "_U" | ||
let ds := Nat.toDigits 16 n | ||
let r := Nat.repeat (·.push '0') (8 - ds.length) r | ||
let r := ds.foldl (fun r c => r.push c) r |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not actually clear to me why Nat.toDigits
builds a char list instead of a string immediately, but that's tangential to this PR
The command `#conv $tac => $e` will run a conv tactic `tac` on `e` and display the resulting simplified version of `e'`. This is used to implement several other user commands: * `#whnf e` displays the weak head normal form of `e` * `#simp e` displays the simp normal form of `e` * `#norm_num e` evaluates `e` using `norm_num` * `#push_neg e` shows the result of `push_neg` on `e` The `conv` versions of `norm_num` and `push_neg` are also added here, so that the commands can be implemented as macros.
* [x] depends on leanprover#539 * [x] depends on leanprover#552 This implements the `ring` conv tactic, which rewrites a target that is an equality in rings to `True`. I'm not sure if anyone uses this, but it is what it is and it brings down our metrics.
Reported on Zulip.