Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issues with parsing single-letter prefixes such as for atto #172

Closed
SimonHeybrock opened this issue Sep 14, 2021 · 7 comments
Closed

Issues with parsing single-letter prefixes such as for atto #172

SimonHeybrock opened this issue Sep 14, 2021 · 7 comments

Comments

@SimonHeybrock
Copy link
Contributor

For certain lower-cases unit prefixes (I have noticed for atto and femto) the string parser also interprets the uppercase letters as this prefix. Examples:

  • Uppercase single-letter prefixes are accepted, Am and Fm work, but I believe they should not?
  • am does not work, gives 0.00029088820866572158rad

Lowercase prefix:

d 0.1m  # correctly parsed, but not formatted?
c cm
m mm
u µm
n nm
p pm
f fm
a 0.00029088820866572158rad  # ???
z 9.99999999999999908e-22m  # correctly parsed, but not formatted?
y 9.99999999999999924e-25m  # correctly parsed, but not formatted?

Uppercased prefixes (should usually not work, unless it matches something else):

D 0.1m  # should not be accepted?
C cm  # should not be accepted?
M Mm  # mega-meter, correct
U µm  # should not be accepted?
N J  # Newton*meter, correct
P 1000000000000000m  # peta-meter, correctly parsed, but not formatted?
F fm  # should not be accepted?
A 1e-18m  # should not be accepted?
Z 1e+21m  # correctly parsed, but not formatted?
Y 9.99999999999999983e+23m  # correctly parsed, but not formatted?

For future reference, here is how the list was produced:

import scipp as sc
prefixes = 'dcmunpfazy'
for prefix in prefixes:
    print(prefix, sc.Unit(f'{prefix}m'))
for prefix in prefixes.upper():
    print(prefix, sc.Unit(f'{prefix}m'))
@phlptp
Copy link
Collaborator

phlptp commented Sep 14, 2021

I went back and forth on "am" being attometer and arcminute. The astronomy field in this case won out.

as far as the capitalization, some strings only use capital case for units so it needs to support that, and if it fails on the first pass it just lowers the case and tries again. Where the prefix is unambiguous both the lower case and upper case were allowed.

A couple of those I will look at fixing though.
Thanks

@SimonHeybrock
Copy link
Contributor Author

I went back and forth on "am" being attometer and arcminute. The astronomy field in this case won out.

Ahh, the curse of compromises... apparently no software project can escape from that. ;)

From my point view

  • SI is to be preferred, symbol for minute is min.
  • If you indeed add string formatting for attometer as am (to be consistent with, say, fm) you would not be able to round-trip because string("attometer") -> unit -> string("am") -> unit would yield arcmin.

@phlptp
Copy link
Collaborator

phlptp commented Sep 16, 2021

What I am playing with a domain mapping, which could be set by default or controlled through the match_flags argument.

This would trigger different string conversions depending on the specific domain and enable standard SI to be the default.

current planned domains include
cooking
astronomy
nuclear
surveying
ucum
strict_si

these are ones where there are some potential unit strings that mean different things in the different domains.

My inclination is default to more relaxed approach so where the capitalization on the prefix is unambiguous it should be allowed but if the strict_si domain is enabled they would not be.

@SimonHeybrock
Copy link
Contributor Author

That sounds like a very promising approach! Is the plan that multiple domains could be enabled at the same time?

@phlptp
Copy link
Collaborator

phlptp commented Sep 16, 2021

Still playing with it as to how it would work. Right now I am leaning towards making strict_si a flag that would disable the flexible capitalization rules for prefixes and a few other things. So that would work with all the others without issue.

The other domains would be mutually exclusive so you couldn't use more than one domain. The mechanism I am using does allow a few exceptions to that so a few particular combination might be allowed, still working out the details.

@phlptp phlptp mentioned this issue Sep 20, 2021
@phlptp
Copy link
Collaborator

phlptp commented Sep 21, 2021

@SimonHeybrock #173 tries to address this and add the domain logic.

@SimonHeybrock
Copy link
Contributor Author

This appears to work now as expected. Thanks a lot!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants