To what extent is pint compatible with UCUM? #1769

Open
PhilippVerpoort opened this issue May 4, 2023 · 14 comments

Comments

@PhilippVerpoort

According to its documentation, UCUM is "a code system intended to include all units of measures being contemporarily used in international science, engineering, and business. The purpose is to facilitate unambiguous electronic communication of quantities together with their units".

I was wondering to what extent the Pint package is compliant with the standard that UCUM is trying to establish or whether there is an aim to do so in the future? Are there other competing standards Pint aims to be compliant with?

@cmungall

cmungall commented Dec 28, 2023

It's not compliant out of the box:

from pint import UnitRegistry
ureg = UnitRegistry()
ureg.parse_expression('Cel')

gives

UndefinedUnitError: 'Cel' is not defined in the unit registry

Same for:

  • mo (surprising given e.g. qudt:MO)
  • others...

Additionally,

  • UCUM allows for both pmol/umol (supported by pint) and pmol.umol-1; the latter is not supported.
  • UCUM annotations such as 'nmol/mmol{Cre}' are not supported
  • Notation such as g/[100]g is supported
  • [in_i] is not supported, but [in] is
  • g/m2 is not supported, you have to write g/m^2 (the latter is not valid UCUM)
  • at least UCUM and pint agree that a means "year" (other systems may use it for the are)

This is by no means exhaustive.
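To probe more codes than the handful listed above, a small helper can collect every code that a parser rejects. This is only a sketch: `unsupported_codes` is a hypothetical name, and the parser is passed in as a callable so that, for instance, `ureg.parse_expression` from the snippet above could be used. It catches any exception because the error type may differ between inputs.

```python
from typing import Callable, Iterable


def unsupported_codes(codes: Iterable[str], parse: Callable[[str], object]) -> list[str]:
    """Return the codes for which `parse` raises an exception.

    `parse` is any callable that takes a unit string, for example
    `ureg.parse_expression` of a pint UnitRegistry (assumption: the caller
    supplies it; this helper does not import pint itself).
    """
    failed = []
    for code in codes:
        try:
            parse(code)
        except Exception:
            # Deliberately broad: undefined units and syntax problems
            # may surface as different exception types.
            failed.append(code)
    return failed
```

Calling it as `unsupported_codes(["Cel", "mo", "g/m2"], ureg.parse_expression)` would then list the codes that need either a new definition or a grammar translation.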

It's easy to extend the unit registry to add terms from UCUM:

ureg = UnitRegistry()
ureg.define('mo = month')

I am not aware of an off-the-shelf library that does that. It should be possible to use systems like QUDT to auto-fill some of these, or perhaps the UCUM XML file (but it is not clear whether the UCUM license would prevent further distribution).
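As a rough sketch of what such an auto-fill step might look like, the helper below turns a mapping of UCUM codes to existing pint unit names into definition lines and feeds them to a registry. The mapping and the function names are hypothetical. The lines use pint's `@alias` directive rather than the `mo = month` form, on the assumption that an alias is preferable for offset units such as degrees Celsius.

```python
from typing import Mapping

# Hypothetical, hand-written mapping: UCUM code -> unit name pint already knows.
UCUM_ALIASES = {"mo": "month", "Cel": "degree_Celsius"}


def alias_definitions(aliases: Mapping[str, str]) -> list[str]:
    """Build one '@alias <pint name> = <UCUM code>' line per mapping entry."""
    return [f"@alias {pint_name} = {ucum_code}" for ucum_code, pint_name in aliases.items()]


def extend_registry(ureg, aliases: Mapping[str, str]):
    """Pass each generated line to `ureg.define` and return the registry.

    `ureg` is expected to be a pint UnitRegistry (or any object with a
    compatible `define` method).
    """
    for line in alias_definitions(aliases):
        ureg.define(line)
    return ureg
```

With pint installed, `extend_registry(UnitRegistry(), UCUM_ALIASES)` should make `Cel` and `mo` resolvable, but a mapping generated from QUDT or the UCUM XML file would need the same license check mentioned above.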

However, this doesn't help with differences in grammar. This would require a dedicated library or shim layer. Surprisingly, there doesn't seem to be anything out there, at least nothing obvious and widely used.

This is a bit of a quandary for those of us working in areas like clinical informatics. UCUM is pretty ubiquitous due to its adoption in FHIR and other systems. At the same time it seems to make a lot of different decisions from systems used in other scientific areas, enshrined in excellent libraries like pint.

  • pyucum - this seems to be just a wrapper for existing validation services

Some useful resources:

@dalito
Contributor

dalito commented Dec 29, 2023

What kind of "compatibility" would ideally be expected from pint? For example:

  • pint can parse all ucum unit codes.
  • pint can output ucum compatible codes.

pint can already parse non-standard notations by using preprocessors (not well documented). I am not sure if output formatting can be customized to the required degree to produce UCUM codes.

UCUM seems to break some of the rules in the BIPM SI brochures, whereas pint tries to follow them (e.g. g/m2 or nmol/mmol{Cre}). Using a preprocessor to parse such codes would allow pint to stay SI-compatible.
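A minimal sketch of such a preprocessor, covering only the three grammar differences mentioned earlier in the thread (annotations in braces, `.` as the multiplication sign, and exponents written directly after the unit symbol). The function name is hypothetical and the regular expressions are a simplification, not the UCUM grammar: bracketed codes such as m[H2O] or [in_i] and the 10*3 power notation are not handled.

```python
import re

# {…} annotations carry no unit meaning in UCUM and are dropped here.
_ANNOTATION = re.compile(r"\{[^{}]*\}")
# A signed integer directly after a letter is treated as an exponent (m2, s-1).
_EXPONENT = re.compile(r"(?<=[A-Za-z])([+-]?\d+)")


def ucum_to_pint_expr(code: str) -> str:
    """Rewrite a simple UCUM code into an expression pint's parser accepts."""
    expr = _ANNOTATION.sub("", code)      # nmol/mmol{Cre} -> nmol/mmol
    expr = _EXPONENT.sub(r"**\1", expr)   # g/m2 -> g/m**2, umol-1 -> umol**-1
    expr = expr.replace(".", "*")         # UCUM uses '.' for multiplication
    return expr
```

For example, `ucum_to_pint_expr("pmol.umol-1")` returns `"pmol*umol**-1"`. The function could be handed to pint via `UnitRegistry(preprocessors=[ucum_to_pint_expr])`, so that the registry's own parser and definitions stay untouched.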

@cmungall

cmungall commented Jan 2, 2024 via email

@dalito
Contributor

dalito commented Jan 2, 2024

@cmungall - I have played a bit with lark over the last few days and used it to write a parser for the full UCUM grammar. Next I will try to create a UCUM preprocessor for pint. The rules of the parser are built mainly from ucum-essence.xml.

@cmungall

cmungall commented Jan 2, 2024

@dalito that would be awesome! I'd be happy to contribute when you have the code up on github.

btw, if you are playing with lark, you may be interested in https://github.com/linkml/semantic-dsl given you already use linkml. I think it would be overkill for a ucum preprocessor but you may be interested in general.

@cmungall

cmungall commented Jan 3, 2024

@hgrecco
Owner

hgrecco commented Jan 3, 2024

Just to add to the discussion: since we created the Parser delegate, the definition strings (as they appear in a file) and the definition objects (used by the registry) are clearly defined. Instead of a preprocessor, you could create a Parser that emits definition objects, which you then add to the registry using the define method. Most definition classes are in pint/facets/plain/definitions.py. The rest are in the definitions.py file of each facet.

@dalito
Contributor

dalito commented Jan 4, 2024

@cmungall I saw the lark grammar when you linked to the uom repo the first time. However, it is not complete and deviates from the UCUM spec, which may be the cause of e.g. units-of-measurement/units-of-measurement#40.

My current code parses 840 of 848 ucum example units; the failures are for units that combine factor and annotation, e.g. /100{cells}. I can share what I have tomorrow.

@dalito
Contributor

dalito commented Jan 4, 2024

@cmungall - I put together what I have in https://github.com/dalito/ucumvert. The grammar is now OK; all example UCUM unit codes are parsed without error. The conversion to pint is still very basic. I have not yet explored the more elegant ways suggested by @hgrecco.

@dalito
Contributor

dalito commented Jan 4, 2024

@hgrecco - UCUM is also a topic in the OPC Unified Architecture of the OPC Foundation. Supporting UCUM and the QUDT ontology in pint may be of wider interest.

@kaiiam

kaiiam commented Jan 8, 2024

Hey all, just a quick note about UOM: we did not intend for it to deviate from the UCUM spec, it is just currently unfinished. That is why it is missing several of the UCUM cases, like annotations (the {}s), and has the other issues in units-of-measurement/units-of-measurement#40. I'm hoping @jamesaoverton and I might get a chance to work on it later, but unfortunately neither of us has scope for it in our regular day jobs.

Contributions to UOM are welcome if anyone (@dalito perhaps?) wants to work on it in the interim before we get a chance to finish it. We'd love to be able to make UOM pint compatible. We are also hoping to eventually publish the UOM work, so anyone who contributes is welcome to participate in that effort as well.

@dalito
Contributor

dalito commented Jan 9, 2024

ucumvert converts now!

To "map" UCUM to pint I created a separate unit definitions file to extend the default pint units with UCUM units. This file is still very incomplete so some conversions are failing (currently 190 of the 848 common UCUM units). Looking forward to your feedback!

@kaiiam I will keep an eye on UOM.

@dalito
Contributor

dalito commented Jan 16, 2024

With a UCUM-aware pint UnitRegistry that has an extra method from_ucum the conversion becomes quite convenient (so far only one way: ucum-to-pint). I published the package on PyPI.

>>> from ucumvert import PintUcumRegistry
>>> ureg = PintUcumRegistry()
>>> ureg.from_ucum("m/s2.kg")
<Quantity(1.0, 'kilogram * meter / second ** 2')>
>>> ureg.from_ucum("m[H2O]{35Cel}")  # UCUM code with annotation
<Quantity(1, 'm_H2O')>
>>> _.to("mbar")
<Quantity(98.0665, 'millibar')>
>>> ureg("degC")   # a standard pint unit
<Quantity(1, 'degree_Celsius')>
>>>

@cmungall

Thanks @dalito!

I suggest to the pint maintainers that this issue could be closed with a PR to the pint docs that includes a link to the ucumvert repo and a few sentences of explanation. Happy to contribute.
