Clean up chapter 'Unit Expressions' #3013

Merged
merged 7 commits into from
Oct 26, 2021

henrikt-ma
Collaborator

Just some stuff I encountered while preparing #3012.


unit_prefix:
Y | Z | E | P | T | G | M | k | h | da | d | c | m | u | n | p | f | a | z | y
UNIT-PREFIX = "Y" | "Z" | "E" | "P" | "T" | "G" | "M" | "k" | "h" | "da"
Collaborator

I don't see why UNIT-PREFIX must be written in capital letters.

Collaborator

And similarly for UNIT-SYMBOL.

Collaborator Author

It's not that it is strictly necessary. It's just matching the style of how we define similar lexical units for the Modelica language.

For me, the unit syntax becomes easier to read if I can tell by the capitalization which production rules correspond to the lexical units I have in mind.

Of course, none of this really makes complete sense anyway, since the separation into UNIT-PREFIX and UNIT-SYMBOL is just conceptual in the grammar – in reality we all know they will be parsed as one lexical unit which is later split in tool-specific ways based on currently available unit definitions.
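
To make the "one lexical unit, split later" point concrete, here is a minimal Python sketch of how a tool might do the split. The `KNOWN_SYMBOLS` set and the `split_unit` helper are hypothetical illustrations, not something taken from the specification or from any tool; the prefix list is the one from the old `unit_prefix` rule. Letting an unprefixed match win is just a choice made for this sketch.

```python
# Prefixes as listed in the old unit_prefix rule ("da" is tried before "d").
PREFIXES = ["Y", "Z", "E", "P", "T", "G", "M", "k", "h", "da",
            "d", "c", "m", "u", "n", "p", "f", "a", "z", "y"]

# Hypothetical set of unit symbols currently known to a tool.
KNOWN_SYMBOLS = {"m", "s", "g", "N", "Pa", "Ohm", "W"}


def split_unit(token, symbols=KNOWN_SYMBOLS):
    """Split one lexical unit into (prefix, symbol).

    An unprefixed match wins; otherwise each known prefix is tried in order.
    Returns None if the token cannot be interpreted with the given symbols.
    """
    if token in symbols:
        return ("", token)
    for prefix in PREFIXES:
        rest = token[len(prefix):]
        if token.startswith(prefix) and rest in symbols:
            return (prefix, rest)
    return None


print(split_unit("kOhm"))  # prefix "k", symbol "Ohm"
print(split_unit("Pa"))    # no prefix, symbol "Pa"
```

The result depends entirely on which symbols are in scope, which is why the prefix/symbol separation in the grammar is only conceptual.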

Collaborator

Yes, in the other grammar it is used to distinguish lexical units (in all caps) from grammar constructs (in all lower case).
That's important for a number of reasons, including white-space handling. But here there is no need for such a separation, so I don't see that treating UNIT-PREFIX (similar to type-prefix in some weird sense) specially gives us anything, and the downside is that it seems we are shouting.

Collaborator Author

OK, to begin with I just changed UNIT-PREFIX to unit-prefix.

The reason I hesitate more regarding UNIT-SYMBOL is that this one should really have been expressed using the lexing language. Wouldn't it be easiest to just write out the rule for UNIT-SYMBOL, which would also make clear precisely what is the set of legal characters?

Collaborator Author

I think it would be good if we could keep tools from introducing prefixes other than the predefined ones, but I realize it will be hard to verify in practice. I mean, ideally tools would only extend the set of known unit symbols, but how can you tell that "μOhm" is a sign of a tool-defined prefix and not just the tool-defined unit symbol "μOhm" that happens to equal "1uOhm"?

I am also a bit scared of the possibility that different tools allow units to be introduced in ways that cause conflicts when Modelica code is moved from one tool to another. For example, say tool A introduces the symbol "metre" and tool B introduces "etre". What is a valid "milli-etre" in tool B will then be mistaken for a "metre" in tool A. Unfortunately, I guess it's too late for this as it would close the door for units such as the "mile" or "pt". A more realistic alternative would probably be to require additional units to be defined somewhere in a standardized annotation, which would allow a tool to take proper action when having custom definitions of both "metre" and "etre" in scope for the same unit string.
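
The "metre" versus "etre" conflict can be sketched in a few lines of Python. Everything here is hypothetical illustration: the `interpretations` helper and the two tool symbol sets are made up for this example, and only the prefix list comes from the grammar. The idea is that a tool with both definitions in scope could enumerate every reading of a unit string and react when there is more than one.

```python
# Prefixes as listed in the old unit_prefix rule.
PREFIXES = ["Y", "Z", "E", "P", "T", "G", "M", "k", "h", "da",
            "d", "c", "m", "u", "n", "p", "f", "a", "z", "y"]


def interpretations(token, symbols):
    """Return every (prefix, symbol) reading of token for the given symbols."""
    found = []
    if token in symbols:
        found.append(("", token))
    for prefix in PREFIXES:
        rest = token[len(prefix):]
        if token.startswith(prefix) and rest in symbols:
            found.append((prefix, rest))
    return found


# Hypothetical custom unit symbols defined by two different tools.
tool_a = {"metre"}
tool_b = {"etre"}

print(interpretations("metre", tool_a))           # one reading: the symbol "metre"
print(interpretations("metre", tool_b))           # one reading: milli + "etre"
print(interpretations("metre", tool_a | tool_b))  # two readings: ambiguous
```

With a standardized place for custom unit definitions, a tool could run a check like this and report the ambiguity instead of silently picking one reading.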

What I want to say with all of this is really just that I don't think we should formulate the specification as if tools can introduce custom unit prefixes as well as custom unit symbols.

Collaborator Author

@henrikt-ma henrikt-ma Oct 25, 2021

> But a common request would be currency symbols like: €£$ (and the Yen-symbol that I don't have on my keyboard).

Right, these are good examples of potentially useful unit symbols.

I think there are good technical reasons to take the same care with unit symbols as we do with identifiers. That is, be conservative in order to stay away from Unicode canonicalization issues etc., and give an explicit list of allowed characters similar to Q-CHAR.

Considering that just showing the glyph of a Unicode symbol isn't very helpful in all cases, it would probably be best to make a table with all the characters allowed in addition to NON-DIGIT:

unit-char : NON-DIGIT | UNIT-UNICODE-CHAR

Initial content of the UNIT-UNICODE-CHAR would include:

  • °
  • Currency symbols: €, £, $, ¢, ¥, ₽, ₨, … (just to mention a few from a list of about 40 symbols I have here)
  • Anything else?

Some symbols I would like to not see in the UNIT-UNICODE-CHAR because I'd like them to be reserved for future use:

  • Single quote: '
  • Double quote: "
  • Basic arithmetic operators: +, -, *
  • Grouping constructs and structure: [ ] { } < > : ; ,
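
As a rough illustration of the proposal, here is a Python sketch of a check against such a character table. It rests on several assumptions: `UNIT_UNICODE_CHAR` contains only the characters named above (the full list of about 40 symbols is not reproduced), a unit symbol is taken to be a non-empty sequence of unit-char, and the names `is_unit_symbol`, `UNIT_UNICODE_CHAR` and `RESERVED` are made up for this example. NON-DIGIT is the underscore plus the ASCII letters, as in the Modelica lexical grammar.

```python
import string

# NON-DIGIT: underscore plus ASCII letters.
NON_DIGIT = set("_" + string.ascii_letters)

# Hypothetical initial content of UNIT-UNICODE-CHAR (only the characters
# explicitly named in the comment above).
UNIT_UNICODE_CHAR = set("°€£$¢¥₽₨")

# Characters suggested to be kept out of UNIT-UNICODE-CHAR for future use.
RESERVED = set("'\"+-*[]{}<>:;,")


def is_unit_symbol(text):
    """True if text is a non-empty sequence of unit-char characters."""
    allowed = NON_DIGIT | UNIT_UNICODE_CHAR
    return len(text) > 0 and all(ch in allowed for ch in text)


for candidate in ["Ohm", "€", "°C", "m2", "a*b", "μOhm"]:
    print(candidate, is_unit_symbol(candidate))
```

Note that under this particular table "μOhm" would be rejected, since μ is in neither set; whether that is desirable is part of what the table would have to settle.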

Collaborator

I would prefer not to go into such details on these symbols; possibly just saying that only NON-DIGIT is allowed. The reason is that even if we sort of accepted unit="$" I don't think we should encourage it by listing it in the standard. (But the unit-symbol : unit-char { unit-char } sort of makes sense.)

Collaborator Author

I think we should even leave the restriction to NON-DIGIT out of this PR, so I opened #3020 for what isn't merely cleanup.

We now at least have a (lowercase) unit-symbol defined in terms of some undefined unit-char.

Will this do for now?

Collaborator

Yes. I think so.

@henrikt-ma
Collaborator Author

I've now also reformulated the part which was previously stated regarding some base version, which sounded like a trace of how it may have been formulated when unit strings were discussed for their original inclusion in the specification.

Collaborator

@HansOlsson HansOlsson left a comment

Looks good.

@HansOlsson HansOlsson merged commit bff744f into modelica:master Oct 26, 2021
@henrikt-ma henrikt-ma deleted the cleanup/unit-syntax branch October 26, 2021 19:19
@HansOlsson HansOlsson added the M36 For pull requests merged into Modelica 3.6 label Jul 6, 2023