unitText - What resource do we point users to? #33

Open
aurielfournier opened this issue May 22, 2018 · 8 comments

@aurielfournier
Collaborator

When users enter character strings describing the units of their measured variables, what resource should we point them to so that they follow an accepted standard for writing those units (e.g., "ha" vs. "hectares")?

@aurielfournier
Collaborator Author

units package

https://cran.r-project.org/web/packages/units/index.html

@amoeba
Collaborator

amoeba commented May 30, 2018

Had a good question in my dataspice demo at NCEAS today that prompted me to think more about this. Could we maybe even warn the user if their unit isn't in a controlled vocabulary? This pushes dataspice further into the territory of metadata quality which I don't think is a bad idea.

@annakrystalli
Collaborator

I guess it depends on whether we want to conform with the schema.org unit definition property of variableMeasured.

The unitCode property requires adherence to the UN/CEFACT Common Code (3 characters) or a URL. Codes other than the UN/CEFACT Common Code may be used with a prefix followed by a colon.

But:
a) we are not using unitCode property
b) there is no obvious link to definitive information on UN/CEFACT unit definitions

unitText, which we are using, requires text indicating the unit of measurement and is suggested as useful if you cannot provide a standard unit code for unitCode. Given this, I'm not sure which standard we should suggest adhering to in dataspice. UN/CEFACT seems to be geared towards commerce. Perhaps ISO (as per the eml2 unit definition) would be more appropriate for our purposes?

Any opinions?

@annakrystalli
Collaborator

Sorry, correction: I meant that the EML unit definitions are mainly based on SI units, not ISO.

@khondula
Contributor

Some other specifications I've come across for my work with ODM2 units

@cboettig
Member

The units package in R is based on the Unidata units, which include all SI units. I agree that Unidata units are probably a much better choice for us than schema.org's UN/CEFACT codes. The schema.org definition of unitCode does suggest that you can use other unit codes if you provide a prefix, but this seems not ideal to me: (1) I'm not sure what the Unidata prefix would be, and (2) most users would find unidata:m a more confusing unit than m. I think we should stick with unitText, but should we warn if units::as_units() does not recognize the unit?

We could go a step further and use units::as_units() to attempt to convert the provided text into a standard form. For example, the units package recognizes all of these as the same unit:

```r
library(units)
as_units("Meter/sec")
as_units("m/s")
as_units("meter / Second")
as_units("meter * seconds / Second ^ 2")
```
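
The warning idea suggested above could be sketched roughly as follows. This is only a minimal illustration, assuming the units package is installed and that units::as_units() raises an error for text it cannot parse; check_unit_text is a hypothetical helper name, not an existing dataspice or units function, and it expects a character vector with no missing values.

```r
# Hypothetical helper (not part of dataspice or units): flag unitText
# entries that units::as_units() cannot parse.
check_unit_text <- function(unit_text) {
  if (!requireNamespace("units", quietly = TRUE)) {
    stop("The 'units' package is required for this check.")
  }
  # TRUE where as_units() parses the string, FALSE where it errors
  recognized <- vapply(unit_text, function(u) {
    tryCatch({
      units::as_units(u)
      TRUE
    }, error = function(e) FALSE)
  }, logical(1), USE.NAMES = FALSE)
  if (any(!recognized)) {
    warning(
      "Unrecognized unit(s): ",
      paste(unique(unit_text[!recognized]), collapse = ", "),
      call. = FALSE
    )
  }
  invisible(recognized)
}

# Example: the second entry is deliberately not a real unit
check_unit_text(c("m/s", "notaunitxyz"))
```

Returning the logical vector invisibly would let a caller decide whether to stop, warn, or just annotate the affected rows.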

EML units are based on the older STMML standard, which I believe isn't really used by projects other than EML now(?).

@amoeba
Collaborator

amoeba commented May 30, 2018

Oh as_units is super nice. For match failures I bet we could do a string distance comparison to the controlled list and suggest a few candidates too which might be nice.
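
A string distance lookup of this kind might look like the sketch below. It uses base R's utils::adist() (generalized Levenshtein distance); suggest_units is a hypothetical helper name, and the vocab vector is a made-up stand-in for whatever controlled list ends up being used.

```r
# Hypothetical helper: return the n vocabulary entries closest to the
# user's text, by case-insensitive edit distance.
suggest_units <- function(unit_text, vocabulary, n = 3) {
  d <- utils::adist(unit_text, vocabulary, ignore.case = TRUE)
  n <- min(n, length(vocabulary))
  vocabulary[order(d[1, ])[seq_len(n)]]
}

# Illustrative vocabulary only, not a real controlled list
vocab <- c("meter", "second", "hectare", "kilogram", "kelvin")
suggest_units("hectars", vocab, n = 1)
```

Edit distance is only one possible choice here; a real implementation would probably also want a cutoff so that very distant candidates are not suggested.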

@annakrystalli
Collaborator

as_units looks great indeed. Maybe we could integrate using units::as_units() to complete the unitText attributes.csv column in the edit_attributes shiny app? 😃

@amoeba amoeba modified the milestones: v1.0, v1.1 Jul 21, 2020

5 participants