`parse_expression` fails on units with spaces in the name #799

jpeacock29 · 2019-04-23T23:28:07Z

For example,

from pint import UnitRegistry
UnitRegistry().parse_expression('survey mile')
UnitRegistry().parse_expression('fluid ounce')

both raise UndefinedUnitError. Other expressions parse without error, but the unit name with spaces is not recognized and the result is an unusual interpretation:

UnitRegistry().parse_expression('fl. oz.')

returns 1 femtoliter ounce.

hgrecco · 2019-04-24T00:10:58Z

Units with spaces are not supported, as there is no robust way to distinguish the user's intention, and supporting them would make the parser much more complex.

jpeacock29 · 2019-04-24T00:38:59Z

Can you provide a non-robust example? "Fluid ounce" and "survey mile" both appear unambiguous. I'm glad to check out the code, but how would a token with a space in it make the parsing much more difficult? I'm glad to make a pull request.

hgrecco · 2019-04-24T01:11:52Z

A space is currently interpreted as a product. That is why you can write 2 kg instead of 2 * kg. If we allow units with spaces, then when the parser finds a b, it will need to decide whether it means a * b or the unit a b. This could be done by checking whether a, b, or a b (and their expanded versions, in case they are compatible with prefixes) are in the registry. But then if you encounter a b c, you need to check all of the possible groupings. The parsing is clearly more difficult, as it requires looking ahead (or looking behind).

And then there are examples like the one you gave: it might be an unusual interpretation, but it is legal. In these cases we tend to follow a simple rule: do not guess the user's intentions.
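To make the look-ahead cost concrete, here is a minimal sketch of what the parser would have to enumerate. It is not pint code: `candidate_groupings`, `valid_readings`, and the toy set of known names are all hypothetical, and prefix expansion is ignored.

```python
from itertools import combinations


def candidate_groupings(tokens):
    """Yield every way to merge adjacent tokens into space-separated names."""
    n = len(tokens)
    for k in range(n):
        # Choose k cut points among the n - 1 gaps between tokens.
        for cuts in combinations(range(1, n), k):
            bounds = (0, *cuts, n)
            yield [" ".join(tokens[i:j]) for i, j in zip(bounds, bounds[1:])]


def valid_readings(expression, known_names):
    """Return the groupings in which every group is a known name."""
    return [
        grouping
        for grouping in candidate_groupings(expression.split())
        if all(name in known_names for name in grouping)
    ]


# Hypothetical registry in which "a", "b" and "a b" are all defined.
known = {"a", "b", "a b"}
print(valid_readings("a b", known))  # [['a b'], ['a', 'b']]
```

Both readings are valid here, so the parser would have to guess. With n tokens there are 2**(n - 1) groupings to check, which is the growth described above for a b c.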

jpeacock29 · 2019-04-24T16:12:45Z

I see, thank you for explaining. It looks like a space for multiplication is currently handled in string_preprocessor in util.py, along with many other substitutions. Could another preprocessing step handle this case by replacing "fluid ounce" with the canonical form "fluid_ounce"? This would fix the issue before tokenization and would not require any look-ahead in the parser.

hgrecco · 2019-04-24T16:40:45Z

That is completely correct. There have been discussions in the past about making the _subs_re (see dictionary here) pluggable in the registry (maybe not requiring the user to write a regex, just string replacements).

I think that would be a nice addition.

jpeacock29 · 2019-04-24T18:46:10Z

I can probably take a go at this. My thinking is to use UnitRegistry._units to create a dictionary of names with spaces keyed to their canonical forms (e.g., {'fluid ounce': 'fluid_ounce'}) and then use this dictionary to do the replacement in string_preprocessor.
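A rough sketch of that idea, assuming the canonical names are available as a plain list of strings. The list below is a stand-in for whatever UnitRegistry._units holds, and both helper names are hypothetical:

```python
def build_space_replacements(unit_names):
    """Map each spaced spelling to its canonical underscore name."""
    return {name.replace("_", " "): name for name in unit_names if "_" in name}


def replace_spaced_names(text, replacements):
    """Replace spaced unit names with their canonical forms.

    Longer names are tried first so they win over any shorter name
    that happens to be a substring of them.
    """
    for spaced in sorted(replacements, key=len, reverse=True):
        text = text.replace(spaced, replacements[spaced])
    return text


# Stand-in for the registry's unit names (assumption, not read from pint).
unit_names = ["fluid_ounce", "survey_mile", "meter"]
replacements = build_space_replacements(unit_names)
print(replacements)  # {'fluid ounce': 'fluid_ounce', 'survey mile': 'survey_mile'}
print(replace_spaced_names("2 fluid ounce", replacements))  # 2 fluid_ounce
```

Plain `str.replace` also rewrites matches inside longer words, so a real implementation would probably want word-boundary checks.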

However, this wouldn't work for the given example fl. oz., because the periods would not be picked up. I could work around this with a regex like fl\.? oz\.?, although I'm not sure whether this would be too much of a hit on performance. Any thoughts?
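One possible shape for the optional-period regex, as a sketch only. The helper name `abbreviation_pattern` is hypothetical, and mapping "fl oz" to fluid_ounce is an assumption made for illustration:

```python
import re


def abbreviation_pattern(abbreviation):
    """Compile a regex matching an abbreviation with optional trailing periods.

    "fl oz" matches "fl oz", "fl. oz." and "fl oz.", but not "fl ozone".
    """
    parts = [re.escape(part) + r"\.?" for part in abbreviation.split()]
    # Lookarounds stop the match from starting or ending inside a longer word.
    return re.compile(r"(?<![\w.])" + r"\s+".join(parts) + r"(?!\w)")


FL_OZ = abbreviation_pattern("fl oz")

print(FL_OZ.sub("fluid_ounce", "3 fl. oz."))  # 3 fluid_ounce
print(FL_OZ.sub("fluid_ounce", "3 fl oz of water"))  # 3 fluid_ounce of water
print(FL_OZ.sub("fluid_ounce", "fl ozone"))  # fl ozone
```

The pattern is compiled once and reused, so the per-call cost is a single scan of the input string. Whether that is acceptable would still need measuring, for example with `timeit`.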

hgrecco · 2019-04-25T10:12:09Z

I think it is better to create a new attribute for this (maybe UnitRegistry._replacements) and then do as you suggested.
Regarding the performance hit, we should measure that. It might be OK; I have no feeling for that.

jules-ch · 2022-03-31T21:05:47Z

Gonna close this; preprocessors can be used as a workaround.
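For readers landing here, a minimal sketch of such a preprocessor. The function and the mapping are illustrative rather than part of pint. The registration shown in the trailing comment assumes the `preprocessors` argument of `UnitRegistry`, which takes a list of callables mapping a string to a string.

```python
import re

# Illustrative mapping, limited to the two names from this issue.
SPACED_UNITS = {"fluid ounce": "fluid_ounce", "survey mile": "survey_mile"}

_SPACED_RE = re.compile(
    r"\b(" + "|".join(re.escape(name) for name in SPACED_UNITS) + r")\b"
)


def join_spaced_units(expression: str) -> str:
    """Rewrite known spaced unit names to their underscore forms."""
    return _SPACED_RE.sub(lambda match: SPACED_UNITS[match.group(1)], expression)


print(join_spaced_units("3 survey mile"))  # 3 survey_mile

# Assumed registration with pint installed (not executed here):
#   from pint import UnitRegistry
#   ureg = UnitRegistry(preprocessors=[join_spaced_units])
#   ureg.parse_expression("3 survey mile")
```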

hgrecco mentioned this issue Aug 26, 2019

Possibility of support for CF/UDUNITS-style powers of units #851

Closed

jthielen mentioned this issue Aug 29, 2019

Percent (%) sign #857

Closed

aulemahal mentioned this issue Sep 9, 2020

implement check for whitespace in unit parameters Ouranosinc/xclim#543

Closed

jules-ch closed this as completed Mar 31, 2022
