Consider normalisation to NFKC in latex/Unicode completer #10673
Comments
If Python normalises them in identifiers, +1 to doing it pre-emptively in our completer to avoid the confusion.
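A minimal sketch of what pre-emptive normalisation in a completer could look like. The `LATEX_SYMBOLS` table and the `complete_latex` function are hypothetical stand-ins for illustration, not IPython's actual completer API, and the two entries are an assumption about how \epsilon and \varepsilon are mapped.

```python
import unicodedata

# Hypothetical two-entry symbol table (an assumption for illustration only).
LATEX_SYMBOLS = {
    "\\epsilon": "\u03f5",     # GREEK LUNATE EPSILON SYMBOL
    "\\varepsilon": "\u03b5",  # GREEK SMALL LETTER EPSILON
}


def complete_latex(name, normalise=True):
    """Return the glyph for a LaTeX name, optionally in NFKC form."""
    glyph = LATEX_SYMBOLS[name]
    if normalise:
        # NFKC is the form Python applies to identifiers while parsing.
        glyph = unicodedata.normalize("NFKC", glyph)
    return glyph


# Both names complete to the same code point once normalised.
print(hex(ord(complete_latex("\\epsilon"))))
print(hex(ord(complete_latex("\\varepsilon"))))
```

With normalisation on, the lunate glyph can no longer be produced by the completer, which is the trade-off discussed in this thread.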
Discussing with @takluyver, we might be able to do something with tokenize.
In theory, I think the completer could tokenize the text up to the cursor, work out if it's in a string, and do the normalisation if not. Any non-ASCII character outside a string must be part of an identifier, I think (or a syntax error). It would be fiddly to get right, though.
Yes, and that would still not really help with visually different identifiers in different cells. I believe jedi also has an "am I in a string" utility, so we could try to use that as well.
Carreau commented Jun 24, 2017
Thanks @grumpfou for the report. There is a weird interaction between our latex completer and the normalisation of identifiers.
Even weirder, the above involves three "epsilon-like" glyphs:
This implies you can't assume that the following does not raise:
(works with "varepsilon", but not "epsilon")
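For illustration only (this is not the snippet from the original report), the effect can be reproduced with the standard library. The sketch assumes that \epsilon completes to U+03F5 and \varepsilon to U+03B5.

```python
import unicodedata

lunate_epsilon = "\u03f5"  # assumed completion of \epsilon
small_epsilon = "\u03b5"   # assumed completion of \varepsilon

# The lunate form is not NFKC-stable; it maps to the small letter.
print(unicodedata.normalize("NFKC", lunate_epsilon) == small_epsilon)

namespace = {}
# Bind a variable whose source spelling is the lunate glyph.
exec(lunate_epsilon + " = 1", namespace)

# The parser stored the normalised name, so the spelling that was
# typed is not a key of the namespace.
print(small_epsilon in namespace)
print(lunate_epsilon in namespace)

try:
    namespace[lunate_epsilon]
except KeyError:
    print("KeyError: lookup by the typed spelling fails")
```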
I'm tempted to think that we should insert the NFKC form when completing, so that, at least, the identifier you complete to and the identifier that is actually generated are the same, and to prevent users from entering invalid code that is auto-normalized. Typically you would expect the following to return 14:

Though, as \phi gets normalized to \varphi, you bound the same name twice, which is awfully confusing.
In the meantime, I know that some people use the latex completer to write in docstrings, and I'm unsure whether it's OK to break that.
If we normalize, this also means that some notebooks that have \phi in the code will work with a glyph which is implicitly normalized and can't be typed anymore, which I'm not a fan of.
There is no way in Python, AFAICT, to get a warning, as the normalisation is done during parsing. str.isidentifier does not have a way to tell us that non-NFKC characters are used, but there is a bug for that. There is also no way to compile without normalisation, nor to get a warning if a non-normalized identifier is present; I'm unsure an enhancement request for that would be accepted.
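In the absence of a built-in warning, a tool can compare each name with its own NFKC form. A minimal sketch, assuming the source being checked is complete enough to tokenize; `is_nfkc_identifier` and `find_non_nfkc_names` are hypothetical helper names, not part of Python or IPython.

```python
import io
import tokenize
import unicodedata


def is_nfkc_identifier(name):
    """True if `name` is a valid identifier that is already in NFKC form."""
    return name.isidentifier() and unicodedata.normalize("NFKC", name) == name


def find_non_nfkc_names(source):
    """Return NAME tokens whose source spelling differs from its NFKC form."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return [
        tok.string
        for tok in tokens
        if tok.type == tokenize.NAME and not is_nfkc_identifier(tok.string)
    ]


# The first line uses U+03F5, the second U+03B5.
suspicious = find_non_nfkc_names("\u03f5 = 1\nprint(\u03b5)\n")
print([hex(ord(name)) for name in suspicious])
```

This works on the token text rather than on the AST, because names in a parsed tree have already been normalised.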