In scan for TimeUnit fragment allow two unicode representations of mu #428

Closed

jlapeyre
Contributor

Previously, only one Unicode character for mu was recognized as the literal for microsecond. Now two are recognized. The new one appears frequently in Python and is also used in qss-qasm.

Add tests for the three ways to specify microsecond.

Closes #422

Summary

Recognize two different characters representing mu in the lexer. Write them as Unicode literals to remove ambiguity, since the glyphs are typically indistinguishable.
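As a rough illustration of the idea (not the lexer code from this PR), the two code points can be written as escapes and accepted side by side. The `TIME_UNIT` pattern and the `is_time_unit` helper are hypothetical names invented for this sketch, and the list of unit spellings is an assumption rather than a copy of the grammar.

```python
import re
import unicodedata

# Written as escapes so the two look-alike glyphs cannot be confused in source.
MICRO_SIGN = "\u00b5"  # MICRO SIGN (legacy symbol)
GREEK_MU = "\u03bc"    # GREEK SMALL LETTER MU

# Hypothetical time-unit pattern; the real grammar rule may differ.
TIME_UNIT = re.compile(r"(?:dt|ns|us|[\u00b5\u03bc]s|ms|s)")


def is_time_unit(text: str) -> bool:
    """Return True if text is one of the accepted unit spellings."""
    return TIME_UNIT.fullmatch(text) is not None


# The three ways to specify microsecond: ASCII "us" and the two mu code points.
for spelling in ("us", MICRO_SIGN + "s", GREEK_MU + "s"):
    names = [unicodedata.name(ch) for ch in spelling]
    print(ascii(spelling), is_time_unit(spelling), names)
```

Printing with `ascii()` shows the escapes rather than the glyphs, which makes it clear which of the two characters is being tested.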

… of mu

Previously only one unicode character for mu was recognized as the literal for
microsecond. Now two are recognized. The new one appears frequently in Python.
It also is used in qss-qasm.

Add tests for the three ways to specify microsecond.

Closes openqasm#422
Contributor

@jakelishman jakelishman left a comment

Please can this come with an update to the text of the spec to make the explicit Unicode points clear as well?

@k4rtik
Member

k4rtik commented Jan 6, 2023

According to Wikipedia, there seems to be a difference between \mu and \micro in Unicode. The lexer already supports "us" as an alternative for writing microseconds (if I am reading it correctly).

Another, perhaps worse, solution to this issue is to explicitly state that the character is \micro, i.e., U+00B5. Though this PR's solution does not seem problematic as the conflict between the two characters is going to be restricted to time unit parsing.

Further, the article for micro- states:

Micro (Greek letter μ (U+03BC) or the legacy symbol µ (U+00B5))
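For reference, the relationship between the two code points can be checked with Python's standard `unicodedata` module. This is only a sketch to illustrate the difference discussed above. The helper name `same_after_nfkc` is made up here, and normalization is not what this PR does (it matches both characters explicitly).

```python
import unicodedata

micro_sign = "\u00b5"  # MICRO SIGN, the legacy symbol
greek_mu = "\u03bc"    # GREEK SMALL LETTER MU


def same_after_nfkc(a: str, b: str) -> bool:
    """Compare two strings after NFKC (compatibility) normalization."""
    return unicodedata.normalize("NFKC", a) == unicodedata.normalize("NFKC", b)


# The raw code points differ, so a plain string comparison fails.
print(micro_sign == greek_mu)

# NFKC folds the micro sign into Greek mu; NFC leaves it unchanged.
print(same_after_nfkc(micro_sign, greek_mu))
print(unicodedata.normalize("NFC", micro_sign) == micro_sign)
```

A lexer that compares raw characters therefore has to list both code points, which is the approach taken here.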

@jakelishman
Contributor

From a pragmatic perspective, it'd be good to make sure that people who use TeX-to-Unicode plugins for their text editors have an easy way to insert the correct symbol, since I'm guessing (I don't use one myself) that's the intended use case for this over just using "us". I would imagine that \mu will produce a mu-like letter, and \micro isn't a standard TeX macro, even though the micro codepoint is probably semantically more appropriate.

@jlapeyre
Contributor Author

This should not be merged yet. See #440.

(Is there a don't merge label? I am unable to add labels)

@blakejohnson
Contributor

To be revisited after #477.

@blakejohnson
Contributor

Superseded by #477.

Development

Successfully merging this pull request may close these issues.

Ambiguity with character code for mu in microsecond
4 participants