Feature Request: Allow Multiple Unit Systems #134

Closed
billdenney opened this Issue May 21, 2018 · 18 comments

3 participants

billdenney (Contributor) commented May 21, 2018

(I accidentally wrote this in an already-merged pull request, #93, so I'm moving it here.)

Thanks for your work on this package!

I was looking at this package to see if it could handle what I'd proposed previously within the udunits2 package (pacificclimate/Rudunits2#9). I have a very common use case for this feature that I was wanting again today. I'll describe it in a way that will hopefully clarify the issue:

I work with multiple chemicals simultaneously (specifically, hospital laboratory test data). I need to work with molar concentration units (e.g. "mole/L" and the SI units derived from it), mass concentration units (e.g. "mg/dL"), and sometimes historical and nonstandard units (e.g. "uIU/mL" for insulin), for many different chemicals at once. For example, I need to handle glucose (molecular weight = 180.156 g/mole) and insulin (molecular weight = 5733.55 g/mole; 1 pmol/L = 0.143988 µIU/mL) in the same code.

What I'd like to be able to do is standardize all my units simultaneously with code like the following (not tested, typed directly into GitHub).

my_data <- data.frame(
  test_name=c("glucose", "insulin"),
  original_value=c(1, 1),
  original_units=c("mg/dL", "uIU/mL"),
  standard_units=c("mmol/L", "pmol/L"),
  stringsAsFactors=FALSE)
install_conversion_constant("mole", "gram", 180.156, system="glucose")
install_conversion_constant("mole", "gram", 5733.55, system="insulin")
install_conversion_constant("mole", "IU", 143.988, system="insulin")
my_data$original_value <- set_units(my_data$original_value, my_data$original_units, system=my_data$test_name)
my_data$standard_value <- set_units(my_data$original_value, my_data$standard_units, system=my_data$test_name)
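The per-analyte arithmetic behind calls like these is simple once the molar mass is known. What the proposal adds is choosing the right constant for each row. Below is a minimal base-R sketch of that arithmetic, written without the units package. The helper names are hypothetical, and the insulin factor assumes 1 pmol/L = 0.143988 µIU/mL (i.e. 1 µIU/mL ≈ 6.945 pmol/L).

```r
# Hypothetical helpers for one analyte with molar mass `molar_mass` (g/mol).
mg_per_dL_to_mmol_per_L <- function(x, molar_mass) {
  # 1 mg/dL = 10 mg/L, and (mg/L) / (g/mol) = mmol/L
  x * 10 / molar_mass
}
mmol_per_L_to_mg_per_dL <- function(x, molar_mass) {
  x * molar_mass / 10
}

# Assumed insulin relation: 1 pmol/L = 0.143988 uIU/mL
pmol_per_L_to_uIU_per_mL <- function(x) x * 0.143988
uIU_per_mL_to_pmol_per_L <- function(x) x / 0.143988

mg_per_dL_to_mmol_per_L(1, 180.156)  # glucose: about 0.0555 mmol/L
uIU_per_mL_to_pmol_per_L(1)          # insulin: about 6.945 pmol/L
```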

Enchufa2 (Member) commented May 21, 2018

Hi Bill,

Currently, you can do the following:

install_conversion_constant("mole.glucose", "g", 180.156)
install_conversion_constant("mole.insulin", "g", 5733.55)
install_conversion_constant("mole.insulin", "IU", 143.988)
set_units(set_units(1, mole.glucose), g)
set_units(set_units(1, mole.insulin), g)
set_units(set_units(1, mole.insulin), IU)

and the reverse conversions. But surprisingly, things like the following won't work:

set_units(set_units(1, mole.glucose/L), g/dL)

The reason is that the current udunits2 API, on which units relies, is very limited. See issues #71, #84, and #85 for further discussion. The udunits branch tries to solve this by working directly on top of the UDUNITS C API, but it's still experimental.
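A toy model shows what the compound case has to compute, independently of any API details. The conversion factor of a compound unit is the product of its components' factors, each raised to that component's exponent. So the converter must understand the unit's structure, not just look up a single pair of names. The base-R sketch below uses hypothetical names, does no dimension checking, and writes mol_glucose in place of mole.glucose.

```r
# Hypothetical factors to reference units: grams for mass, litres for volume.
to_ref <- c(mol_glucose = 180.156, g = 1, L = 1, dL = 0.1)

# A compound unit is a named vector of exponents, e.g. mol_glucose/L is
# c(mol_glucose = 1, L = -1). Its factor to the reference units is the
# product of each component factor raised to its exponent.
unit_factor <- function(exponents) {
  prod(to_ref[names(exponents)]^exponents)
}

# No check that `from` and `to` have the same dimensions.
convert <- function(x, from, to) {
  x * unit_factor(from) / unit_factor(to)
}

convert(1, c(mol_glucose = 1, L = -1), c(g = 1, dL = -1))  # 18.0156
```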

Another issue is the support for multiple systems of units, which you introduced in pacificclimate/Rudunits2#9. My code above is strictly correct, principled, unit-speaking. Converting moles to grams is not, because something is missing. However, I agree that it's an interesting feature, and that this correctness may be sacrificed a bit for the sake of simplicity (tidyness?) and usefulness. The xptr and xptr_cache branches are refactorisations that have this idea of the multiple systems of units in mind, but they are not functional yet in this sense.

I think @edzer prefers the principled way, although maybe your example convinces him. :) Anyway, let's see where the experiments go. And many thanks for your feedback!

billdenney (Contributor) commented Jun 9, 2018

@Enchufa2, thanks for the detailed response!

I'm not sure that I understand the second sentence of this comment "My code above is strictly correct, principled, unit-speaking. Converting moles to grams is not, because something is missing."

Specifically, while the code you give is correct for glucose-glucose conversions and insulin-insulin conversions, this would now allow for glucose-insulin conversions which should not work in a general unit conversion system. I agree that a general mole-gram conversion is not correct because you need to know what molecule you're working with (ergo the suggestion of multiple systems of units).

I know that I'm biased, but I think the work to implement multiple unit systems is worthwhile to enable many more general use cases than just the units and conversions that can apply to everything. (The bias is probably obvious by my work on pacificclimate/Rudunits2#9.)

Enchufa2 (Member) commented Jun 9, 2018

> I'm not sure that I understand the second sentence of this comment "My code above is strictly correct, principled, unit-speaking. Converting moles to grams is not, because something is missing."

What I meant is that a mole is just an amount of substance. It's not a mass, so strictly speaking you cannot convert moles to grams. What is missing is, of course, the molar mass, which is different for every substance. In my example above, I defined conversions from mole.glucose and mole.insulin to g to mask the real operation, which would be:

library(units)

molar.mass.glucose <- set_units(180.156, g/mole)
molar.mass.insulin <- set_units(5733.55, g/mole)
mole <- set_units(1, mole)
mole * molar.mass.glucose
mole * molar.mass.insulin
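The principled view can be mimicked with a toy model in which every quantity carries exponents over base dimensions. Multiplying an amount (mol) by a molar mass (g/mol) yields a mass, whereas adding a mol quantity to a g quantity fails the dimension check. This is a hypothetical base-R sketch that models only mass and amount of substance. It is not how units or udunits2 represent quantities internally.

```r
# Hypothetical minimal quantity: a value plus exponents over base dimensions.
quantity <- function(value, mass = 0, amount = 0) {
  list(value = value, dims = c(mass = mass, amount = amount))
}

# Multiplication multiplies the values and adds the dimension exponents.
q_mul <- function(a, b) {
  list(value = a$value * b$value, dims = a$dims + b$dims)
}

# Addition is only defined when the dimensions are identical.
q_add <- function(a, b) {
  if (!identical(a$dims, b$dims)) stop("incompatible dimensions")
  list(value = a$value + b$value, dims = a$dims)
}

n <- quantity(1, amount = 1)                           # 1 mol
M_glucose <- quantity(180.156, mass = 1, amount = -1)  # g/mol
m <- q_mul(n, M_glucose)
m$value  # 180.156
m$dims   # mass = 1, amount = 0, i.e. a mass in grams
```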

> Specifically, while the code you give is correct for glucose-glucose conversions and insulin-insulin conversions, this would now allow for glucose-insulin conversions which should not work in a general unit conversion system.

Why not? With the current master branch (udunits branch was merged and pushed to CRAN as v0.6-0):

library(units)

install_conversion_constant("mole_glucose", "g", 180.156)
install_conversion_constant("mole_insulin", "g", 5733.55)

set_units(set_units(1, mole_glucose/L), g/dL)
#> 18.0156 g/dL
set_units(set_units(1, mole_glucose/L), mole_insulin/dL)
#> 0.003142137 mole_insulin/dL

Note however that glucose-insulin conversions wouldn't work with multiple unit systems, because each system would be self-contained and there's no easy way to connect them.
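The glucose-insulin conversion succeeds because of how the constants compose. Once both substance-specific moles are defined in grams, any conversion between them can route through g. Below is a minimal base-R sketch of that composition. The names are hypothetical; it reproduces the 0.003142137 in the output above but is not the units implementation.

```r
# Hypothetical: each substance-specific mole expressed in grams, as with the
# install_conversion_constant() calls above.
grams_per <- c(mole_glucose = 180.156, mole_insulin = 5733.55)

# Converting substance A to substance B goes A -> g -> B.
convert_via_g <- function(x, from, to) x * grams_per[[from]] / grams_per[[to]]

# mole_glucose/L -> mole_insulin/dL: the extra 0.1 turns "per litre" into
# "per decilitre" (1 dL = 0.1 L).
convert_via_g(1, "mole_glucose", "mole_insulin") * 0.1  # about 0.003142137
```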

Still, I also think this feature request is worthwhile, but we need a strong use case.

edzer (Member) commented Jun 9, 2018

And since udunits now does the conversion, we can also add prefixes to user-defined units:

> set_units(set_units(1, mole_glucose/L), mmole_insulin/dL)
3.142137 mmole_insulin/dL
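Prefix support for a user-defined unit amounts to recognising an SI prefix in front of a known symbol and scaling by that prefix. The base-R sketch below shows the lookup. It assumes single-character prefixes only, the helper is hypothetical, and it is not how udunits actually parses unit strings.

```r
# Hypothetical subset of single-character SI prefixes.
si_prefix <- c(p = 1e-12, n = 1e-9, u = 1e-6, m = 1e-3, c = 1e-2, d = 1e-1, k = 1e3)

# Return the scale factor of `symbol` relative to a known base symbol:
# 1 if the symbol itself is known, the prefix value if it is prefix + known.
prefix_factor <- function(symbol, known) {
  if (symbol %in% known) return(1)
  p <- substr(symbol, 1, 1)
  rest <- substr(symbol, 2, nchar(symbol))
  if (p %in% names(si_prefix) && rest %in% known) return(si_prefix[[p]])
  stop("unknown unit: ", symbol)
}

known <- c("mole_glucose", "mole_insulin")
# 0.003142137 mole_insulin/dL expressed in mmole_insulin/dL
0.003142137 / prefix_factor("mmole_insulin", known)  # 3.142137
```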

billdenney (Contributor) commented Jun 9, 2018

To start with: Thank you for implementing the ability to add units in a more native way with the current version! This change will make many of my use cases much simpler!

@Enchufa2:

> Specifically, while the code you give is correct for glucose-glucose conversions and insulin-insulin conversions, this would now allow for glucose-insulin conversions which should not work in a general unit conversion system.
>
> Why not? With the current master branch (udunits branch was merged and pushed to CRAN as v0.6-0)
> ...
> Note however that glucose-insulin conversions wouldn't work with multiple unit systems, because each system would be self-contained and there's no easy way to connect them.

While it now will work, my inclination is that it should not work. (Emphasis added not to look like yelling, just to ensure clarity in what I'm describing.)

A better example would probably be the system of hydrogen, oxygen, and water (H2O). If all were included in a single system, then the following should apply (note that this doesn't work with 0.6.0-- it's just an example for clarity).

library(units)

install_conversion_constant("mole_hydrogen", "g", 1)
install_conversion_constant("mole_oxygen", "g", 16)
install_conversion_constant("mole_water", "g", 18)
install_conversion_constant("mole_hydrogen", "mole_water", 2)
install_conversion_constant("mole_oxygen", "mole_water", 1)

The above leads to a unit system contradiction because the path from 1 mole_hydrogen -> g -> mole_water gives one answer (=1/18 mole_water) while the direct conversion 1 mole_hydrogen -> mole_water gives a different answer (= 1/2 mole_water).
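In graph terms, the contradiction is a loop of conversion factors that does not multiply to 1, so two paths between the same pair of units give different answers. The base-R sketch below compares the two paths. The data layout is hypothetical, and each factor is read as "1 unit of `from` equals `factor` units of `to`". Under that reading, the hydrogen-water edge is entered as 0.5 to match the 1/2 mole_water stated above.

```r
# Hypothetical edge list: 1 unit of `from` = `factor` units of `to`.
edges <- data.frame(
  from   = c("mole_hydrogen", "mole_oxygen", "mole_water", "mole_hydrogen"),
  to     = c("g", "g", "g", "mole_water"),
  factor = c(1, 16, 18, 0.5),
  stringsAsFactors = FALSE
)

# Factor of a direct edge, or the inverse of the reverse edge; NA if neither.
edge_factor <- function(from, to) {
  i <- which(edges$from == from & edges$to == to)
  if (length(i) == 1) return(edges$factor[i])
  j <- which(edges$from == to & edges$to == from)
  if (length(j) == 1) return(1 / edges$factor[j])
  NA_real_
}

via_g  <- edge_factor("mole_hydrogen", "g") * edge_factor("g", "mole_water")
direct <- edge_factor("mole_hydrogen", "mole_water")
c(via_g = via_g, direct = direct)  # 1/18 versus 1/2
isTRUE(all.equal(via_g, direct))   # FALSE: the table is inconsistent
```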

Enchufa2 (Member) commented Jun 9, 2018

But the last two calls fail with

Error in install_conversion_constant("mole_hydrogen", "mole_water", 2) :
  exactly one of (from, to) must be a known unit
Error in install_conversion_constant("mole_oxygen", "mole_water", 1) :
  exactly one of (from, to) must be a known unit

So there's no possible contradiction.
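The rule in that error message is what rules contradictions out. Each accepted call attaches exactly one new unit to an already-known one. No call can therefore close a new loop of conversions, and no two paths can disagree. The base-R sketch below implements that rule alone. The registry is hypothetical, and conversion factors are omitted because only the graph shape matters here.

```r
# Hypothetical registry of known units, starting from one base unit.
registry <- new.env()
registry$known <- "g"

install_constant <- function(from, to) {
  is_known <- c(from, to) %in% registry$known
  if (sum(is_known) != 1) stop("exactly one of (from, to) must be a known unit")
  # The single unknown side becomes a new leaf attached to a known unit.
  registry$known <- c(registry$known, c(from, to)[!is_known])
  invisible(TRUE)
}

install_constant("mole_hydrogen", "g")
install_constant("mole_oxygen", "g")
install_constant("mole_water", "g")
# Both sides are now known, so this edge would close a loop and is refused:
msg <- tryCatch(install_constant("mole_hydrogen", "mole_water"),
                error = conditionMessage)
msg
```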

billdenney (Contributor) commented Jun 9, 2018

That was what I meant with "note that this doesn't work with 0.6.0-- it's just an example for clarity".

All of the conversions I wrote are accurate within their own systems, and as the links between systems. The links between systems (the mole-to-mole conversions) should be outside of units.

Let me ask a different question: if I were to open a pull request (or collaborate on the xptr branch) that enables different systems, including test cases, would it be accepted? If so, from the UI perspective I would need to add a new argument to every function in the package; does sys="default" look good to you for that argument?

edzer (Member) commented Jun 9, 2018

I don't see why you'd need a separate unit system for every molecule, and as long as I don't see the need, I'm hesitant. I know you made a large PR to the original Rudunits2 code base. Could you give a simple example of using that PR that illustrates what you cannot do now in units with @Enchufa2's suggestions above, in order to convince us?

Enchufa2 (Member) commented Jun 9, 2018

> That was what I meant with "note that this doesn't work with 0.6.0-- it's just an example for clarity".

Yep, but I understood you to be saying that glucose-insulin conversions should not work because they could lead to contradictions. That's why I was just pointing out that there can't be any contradiction there.


My two cents: a use case for multiple unit systems that would be stronger (IMHO) than the one presented in this issue would be an API to create a new, empty system of units (different from SI). You could then populate it with your own XML (or programmatically, unit by unit) to provide new base units, etc. This way, you could work with, for instance, the CGS system instead.

But this is quite an undertaking for a small audience (it would require at least a new XML file describing a whole new system to justify such a change and make it usable), and there are more important short-term issues open.
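A new system of units in this sense is largely a different choice of base units. As a rough base-R illustration, CGS can be described by the SI size of each of its base units. Derived units such as the dyne and the erg then follow from their exponents. The structure is hypothetical and is not an API of units or UDUNITS.

```r
# Hypothetical: a system described by the SI value of each base unit.
cgs <- c(length = 0.01, mass = 0.001, time = 1)  # cm in m, g in kg, s in s

# SI factor of a derived unit from its exponents over the base dimensions.
derived_factor <- function(system, exponents) {
  prod(system[names(exponents)]^exponents)
}

dyne <- derived_factor(cgs, c(mass = 1, length = 1, time = -2))  # in newtons
erg  <- derived_factor(cgs, c(mass = 1, length = 2, time = -2))  # in joules
c(dyne = dyne, erg = erg)  # 1e-05 and 1e-07
```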

billdenney (Contributor) commented Jun 9, 2018

My underlying use case is the normalization of data from clinical trials of new medicines. I receive data from many clinical sites (similar to a doctor's office or hospital), with many different units specific to each site's local laboratory. The units are not just a mixture of concentration units (like "mmol/L" and "mg/dL", both for one type of cholesterol) but also a mixture of derived units. For example, I may receive glucose measurements as plain concentrations ("mg/dL" and "mmol/L"), normalized urinary concentrations (including awful units like "mg/mmol Creatinine"), non-normalized urinary concentrations ("mmol/L" and "mg/mL"), integrals of daily concentrations ("hr*mg/mL"), and probably others that I'm not thinking of.

That is just for glucose. Other measurements come in equally convoluted units, and it's common to have about 50-60 different lab measurements per clinical study.

The above is a real data scenario that I've experienced with a clinical study for diabetes.

With one unit system, I think that I have two options:

  1. Each time I am working on a new data set from a clinical study, carefully modify the units so that I now have units named something like "mmol_glucose/L". This is not a good option for me because it would require full parsing of the unit structure, removing SI prefixes (like m=milli, n=nano, etc.), replacing identified units, and then doing the conversion.
  2. Each time I am working on a new data set from a clinical study,
    1. load the units library,
    2. assign any unit conversions needed for the first unit system (e.g. glucose),
    3. subset the data and perform the conversion,
    4. unassign the gram-to-mole conversion for the first unit system and assign it for the second, ensuring that I have removed all extraneous units (I've worked with some lab values that have at least 5 unit sets, like "mg/dL", "mmol/L", "pmol/uL", a historical international unit value "IU/mL", and a substrate conversion rate "kcat/mL"),
    5. assign the unit conversions needed for the second unit system (e.g. cholesterol), and repeat steps 1 to 4.
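The relabelling step in option 1 can be sketched roughly in base R. The approach tags substance-specific symbols with the analyte name and leaves any SI prefix in front of them. This is a hypothetical helper that assumes mol and IU are the only substance-specific symbols. The tagged base names (such as mol_glucose) would still need to be defined for every analyte.

```r
# Hypothetical helper: append the analyte name to substance-specific symbols,
# e.g. "mmol/L" -> "mmol_glucose/L", "uIU/mL" -> "uIU_insulin/mL".
# The trailing \b means "mole" and already-tagged names like "mol_glucose"
# are not matched, because the next character is a word character.
tag_units <- function(units, analyte, symbols = c("mol", "IU")) {
  pattern <- paste0("(", paste(symbols, collapse = "|"), ")\\b")
  vapply(seq_along(units), function(i) {
    gsub(pattern, paste0("\\1_", analyte[i]), units[i], perl = TRUE)
  }, character(1))
}

tag_units(c("mg/dL", "mmol/L", "uIU/mL"), c("glucose", "glucose", "insulin"))
# "mg/dL" "mmol_glucose/L" "uIU_insulin/mL"
```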

With the current library, I would do the following:

library(units)

my_data <-
  data.frame(Analyte=c("glucose", "glucose", "insulin", "insulin", "glucagon", "glucagon"),
             Original_Units=c("mg/dL", "mmol/L", "IU/L", "mg/dL", "mmol/L", "IU/L"),
             Original_Value=c(1, 2, 3, 4, 5, 6),
             New_Units=c("mmol/L", "mmol/L", "mg/dL", "mg/dL", "mmol/L", "mmol/L"),
             stringsAsFactors=FALSE)

install_conversion_constant("IU", "mg", 0.0347) # for insulin
remove_symbolic_unit("mole")
install_conversion_constant("mole", "g", 5733.55) # for insulin

# Convert all the insulin measurements in my_data here

# hope that I've remembered to fully reset the unit system removing everything
# that is insulin-specific
remove_symbolic_unit("mole")
# I forgot to remove_symbolic_unit("IU"), but that's OK because glucose doesn't use it
install_conversion_constant("mole", "g", 180.156) # for glucose

# Convert all the glucose measurements in my_data here

# hope that I've remembered to fully reset the unit system removing everything
# that is glucose-specific
remove_symbolic_unit("mole")
install_conversion_constant("mole", "g", 3485) # for glucagon
# I forgot to remove IU from the table for insulin above, and I didn't notice
# that my data has glucagon with IU in it.

# Convert all the glucagon measurements in my_data here.  Now I have an
# inaccurate value for the glucagon that started as IU/L because it used the
# insulin IU value instead of the glucagon IU value.
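The failure mode in this script is global state that outlives the step it was meant for. One way to contain it is to install a set of definitions only while a single expression is evaluated. The previous state is then restored afterwards, even if the expression fails. The base-R sketch below uses a hypothetical environment standing in for the unit database, and the helper is not part of units.

```r
# Hypothetical stand-in for a global unit database.
conversion_table <- new.env()

# Evaluate `expr` with `defs` (a named numeric vector) installed, then restore
# the table to its previous contents, even if `expr` raises an error.
with_conversions <- function(defs, expr) {
  saved <- as.list(conversion_table, all.names = TRUE)
  on.exit({
    rm(list = ls(conversion_table, all.names = TRUE), envir = conversion_table)
    if (length(saved)) list2env(saved, envir = conversion_table)
  })
  list2env(as.list(defs), envir = conversion_table)
  expr  # lazily evaluated here, after the definitions are installed
}

to_mmol_per_L <- function(mg_per_dL) {
  mg_per_dL * 10 / get("g_per_mol", envir = conversion_table, inherits = FALSE)
}

glucose <- with_conversions(c(g_per_mol = 180.156), to_mmol_per_L(1))
glucose  # about 0.0555
exists("g_per_mol", envir = conversion_table, inherits = FALSE)  # FALSE
```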

With multiple unit systems, I could build a library of systems for use with clinical trial data (enabling reusability) and do something much simpler:

library(units)

my_data <-
  data.frame(Analyte=c("glucose", "glucose", "insulin", "insulin", "glucagon", "glucagon"),
             Original_Units=c("mg/dL", "mmol/L", "IU/L", "mg/dL", "mmol/L", "IU/L"),
             Original_Value=c(1, 2, 3, 4, 5, 6),
             New_Units=c("mmol/L", "mmol/L", "mg/dL", "mg/dL", "mmol/L", "mmol/L"),
             stringsAsFactors=FALSE)
# I'd suggest an optional 'xml' argument pointing to an xml file.  If missing,
# it uses the default; if NULL, it starts with an empty unit system with nothing
# loaded; if a scalar character string, it tries to load that file; otherwise,
# error.
new_unit_system("glucose")
new_unit_system("insulin")
new_unit_system("glucagon")
remove_symbolic_unit("mole", system="glucose")
install_conversion_constant("mole", "g", 180.156, system="glucose")
remove_symbolic_unit("mole", system="insulin")
install_conversion_constant("IU", "mg", 0.0347, system="insulin")
install_conversion_constant("mole", "g", 5733.55, system="insulin")
remove_symbolic_unit("mole", system="glucagon")
install_conversion_constant("mole", "g", 3485, system="glucagon")
install_conversion_constant("IU", "mg", 1, system="glucagon")

# Convert everything at once:
my_data$Original_Value_with_Units <-
  set_units(my_data$Original_Value, my_data$Original_Units, system=my_data$Analyte)
my_data$New_Value_with_Units <-
  set_units(my_data$Original_Value_with_Units, my_data$New_Units, system=my_data$Analyte)

And, more generally, everything between loading the data and assigning the units would be abstracted away by a library of medical unit conversions managing all of the systems for all the different lab tests.
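The proposed system= argument can be approximated in plain R to see how it would behave. This sketch is hypothetical throughout. new_system(), set_constant() and get_constant() are illustrative names, not functions in units, and only the mg/dL to mmol/L case is handled. The point is that a constant missing from one system raises an error instead of silently leaking in from another.

```r
# Hypothetical registry: one environment per named unit system.
unit_systems <- new.env()

new_system <- function(name) {
  assign(name, new.env(), envir = unit_systems)
  invisible(name)
}

set_constant <- function(key, value, system) {
  assign(key, value, envir = get(system, envir = unit_systems, inherits = FALSE))
}

get_constant <- function(key, system) {
  sys <- get(system, envir = unit_systems, inherits = FALSE)
  if (!exists(key, envir = sys, inherits = FALSE))
    stop(key, " is not defined in system '", system, "'")
  get(key, envir = sys, inherits = FALSE)
}

# Vectorised over `system`: each element uses its own system's molar mass.
mg_dL_to_mmol_L <- function(x, system) {
  g_per_mol <- vapply(system, function(s) get_constant("g_per_mol", s), numeric(1))
  unname(x * 10 / g_per_mol)
}

new_system("glucose")
new_system("insulin")
set_constant("g_per_mol", 180.156, "glucose")
set_constant("g_per_mol", 5733.55, "insulin")
set_constant("mg_per_IU", 0.0347, "insulin")

mg_dL_to_mmol_L(c(1, 1), c("glucose", "insulin"))
# An IU factor requested from the glucose system is an error, not insulin's value:
try(get_constant("mg_per_IU", "glucose"))
```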

Enchufa2 (Member) commented Jun 9, 2018

What about the following:

library(units)

my_data <-
  data.frame(Analyte=c("glucose", "glucose", "insulin", "insulin", "glucagon", "glucagon"),
             Original_Units=c("mg/dL", "mmol/L", "IU/L", "mg/dL", "mmol/L", "IU/L"),
             Original_Value=c(1, 2, 3, 4, 5, 6),
             New_Units=c("mmol/L", "mmol/L", "mg/dL", "mg/dL", "mmol/L", "mmol/L"),
             stringsAsFactors=FALSE)

install_conversion_constant("mol_glucose", "g", 180.156)
install_conversion_constant("mol_glucagon", "g", 3482.80)
install_conversion_constant("mol_insulin", "g", 5733.55)
install_conversion_constant("mol_insulin", "IU", 143.988)

within(my_data, New_Value <- sapply(seq_along(Analyte), function(i) {
  original_unit <- gsub("(mol)", paste0("\\1_", Analyte[i]), Original_Units[i])
  original_value <- set_units(Original_Value[i], original_unit, mode="standard")
  new_unit <- gsub("(mol)", paste0("\\1_", Analyte[i]), New_Units[i])
  drop_units(set_units(original_value, new_unit, mode="standard"))
}))
#>    Analyte Original_Units Original_Value New_Units    New_Value
#> 1  glucose          mg/dL              1    mmol/L 5.550745e-02
#> 2  glucose         mmol/L              2    mmol/L 2.000000e+00
#> 3  insulin           IU/L              3     mg/dL 1.194589e+04
#> 4  insulin          mg/dL              4     mg/dL 4.000000e+00
#> 5 glucagon         mmol/L              5    mmol/L 5.000000e+00
#> 6 glucagon           IU/L              6    mmol/L 6.859935e+01
@billdenney

billdenney Jun 10, 2018

Contributor

I don’t know if you were just trying to replicate my code (where I noted there was an intentional bug with IU) or not, but the glucagon conversion is inaccurate because the IU value isn’t interpreted as a glucagon IU but an insulin IU.

In general, this IU mix-up for glucagon is exactly the kind of unintentional error I’m trying to avoid by separating the systems. The error was silent: no warning or error was raised, so the output suggests that the conversion was accurate. Had they been different systems, an error would have been raised showing that no unit “IU” is defined in the glucagon system, and the user would know to correct the code.
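The silent error can be checked by hand. The sketch below redoes the arithmetic for rows 1, 3 and 6 of the table in plain R (no units package), using the factors installed in the example. The variable names are illustrative only. Row 6 reproduces the printed 68.59935 only if the glucagon IU/L value is first converted through the insulin IU definition, which is the problem described above.

```r
# Conversion factors taken from the install_conversion_constant() calls above
mw_glucose  <- 180.156          # g per mol_glucose
mw_insulin  <- 5733.55          # g per mol_insulin
mw_glucagon <- 3482.80          # g per mol_glucagon
iu_per_mol_insulin <- 143.988   # the only IU definition installed (insulin)

# Row 1: 1 mg/dL glucose -> mmol/L
g_per_L_1 <- 1 * 10 / 1000                # 1 mg/dL = 10 mg/L = 0.01 g/L
row1 <- g_per_L_1 / mw_glucose * 1000     # mol/L -> mmol/L

# Row 3: 3 IU/L insulin -> mg/dL
g_per_L_3 <- 3 / iu_per_mol_insulin * mw_insulin
row3 <- g_per_L_3 * 1000 / 10             # g/L -> mg/L -> mg/dL

# Row 6: 6 IU/L "glucagon" -> mmol/L
# The IU is silently read as insulin IU, giving grams of insulin per litre,
# which are then divided by the molar mass of glucagon.
g_per_L_6 <- 6 / iu_per_mol_insulin * mw_insulin
row6 <- g_per_L_6 / mw_glucagon * 1000    # mol/L -> mmol/L

print(c(row1 = row1, row3 = row3, row6 = row6))
```

All three values agree with the New_Value column printed in the example, so the glucagon row is internally consistent arithmetic built on the wrong IU.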

@Enchufa2

Enchufa2 Jun 10, 2018

Member

Yes, sorry, I just replicated your data, so if there's a bug, it's just due to my lack of knowledge about these units.

If IU should apply only to insulin, then the solution is to suffix it with the analyte name too, as with the mole:

library(units)

my_data <-
  data.frame(Analyte=c("glucose", "glucose", "insulin", "insulin", "glucagon", "glucagon"),
             Original_Units=c("mg/dL", "mmol/L", "IU/L", "mg/dL", "mmol/L", "IU/L"),
             Original_Value=c(1, 2, 3, 4, 5, 6),
             New_Units=c("mmol/L", "mmol/L", "mg/dL", "mg/dL", "mmol/L", "mmol/L"),
             stringsAsFactors=FALSE)

install_conversion_constant("mol_glucose", "g", 180.156)
install_conversion_constant("mol_glucagon", "g", 3482.80)
install_conversion_constant("mol_insulin", "g", 5733.55)
install_conversion_constant("mol_insulin", "IU_insulin", 143.988)

pat <- "(mol|IU)"

within(my_data, New_Value <- sapply(seq_along(Analyte), function(i) {
  original_unit <- gsub(pat, paste0("\\1_", Analyte[i]), Original_Units[i])
  original_value <- set_units(Original_Value[i], original_unit, mode="standard")
  new_unit <- gsub(pat, paste0("\\1_", Analyte[i]), New_Units[i])
  drop_units(set_units(original_value, new_unit, mode="standard"))
}))
#> Error: In ‘IU_glucagon/L’, ‘IU_glucagon’ is not recognized by udunits.
#> See a table of valid unit symbols and names with valid_udunits().
#> Add custom user-defined units with install_symbolic_unit().

And, as you can see, the conversion now fails with an error instead of silently. You can make the pattern pat as complex as needed to accommodate more special units. Of course, if there can be IU of glucagon, you just need to define the conversion, as suggested by the error message, and the code above succeeds:

install_conversion_constant("mol_glucagon", "IU_glucagon", 3) # placeholder factor, not a real glucagon IU definition

within(my_data, New_Value <- sapply(seq_along(Analyte), function(i) {
  original_unit <- gsub(pat, paste0("\\1_", Analyte[i]), Original_Units[i])
  original_value <- set_units(Original_Value[i], original_unit, mode="standard")
  new_unit <- gsub(pat, paste0("\\1_", Analyte[i]), New_Units[i])
  drop_units(set_units(original_value, new_unit, mode="standard"))
}))
#>    Analyte Original_Units Original_Value New_Units    New_Value
#> 1  glucose          mg/dL              1    mmol/L 5.550745e-02
#> 2  glucose         mmol/L              2    mmol/L 2.000000e+00
#> 3  insulin           IU/L              3     mg/dL 1.194589e+04
#> 4  insulin          mg/dL              4     mg/dL 4.000000e+00
#> 5 glucagon         mmol/L              5    mmol/L 5.000000e+00
#> 6 glucagon           IU/L              6    mmol/L 2.000000e+03
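To see what the substitution does to the unit strings before set_units() parses them, the pattern can be tried on its own in base R. tag_units() is a hypothetical helper, not part of units; it only wraps the gsub() call used in the loop above.

```r
# Hypothetical helper wrapping the substitution used in the loop:
# append "_<analyte>" to each "mol" or "IU" token in a unit string.
tag_units <- function(unit_str, analyte, pat = "(mol|IU)") {
  gsub(pat, paste0("\\1_", analyte), unit_str)
}

tag_units("mmol/L", "glucose")   # "mmol_glucose/L"
tag_units("mg/dL", "insulin")    # "mg/dL" (no match, left unchanged)
tag_units("IU/L", "glucagon")    # "IU_glucagon/L"

# Row-wise, each unit string gets its own row's analyte suffix
Analyte        <- c("glucose", "insulin", "glucagon")
Original_Units <- c("mg/dL", "IU/L", "IU/L")
tagged <- mapply(tag_units, Original_Units, Analyte, USE.NAMES = FALSE)
tagged  # "mg/dL" "IU_insulin/L" "IU_glucagon/L"
```

The bare pattern also fires inside longer names: tag_units("mole", "glucose") returns "mol_glucosee", which does not match any installed unit. If such spellings can appear in the data, this is where a stricter pat would be needed.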
@Enchufa2

Enchufa2 Jun 10, 2018

Member

Also please note that a conversion such as the one you wrote, i.e.

my_data$New_Value_with_Units <-
  set_units(my_data$Original_Value_with_Units, my_data$New_Units, system=my_data$Analyte)

cannot work that way (and I'd say with confidence that it never will), because units assigns one unit to an entire object (vector, array, matrix) by design. Therefore, you would need to convert the values one by one, just as I did in my example (which, by the way, is shorter and simpler IMHO, even if the line above were valid).
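Because a units object holds a single unit, one middle ground between a fully vectorized call and a strict row-by-row loop is to convert per group of rows that share the same source and target unit. This is a sketch, assuming the units package with set_units(..., mode = "standard") and drop_units() as used in the examples in this thread. It uses only built-in udunits units, so no conversion constants need installing. The data frame and the grouping scheme are illustrative, not a units feature, and analyte-specific units would still need the suffixing shown earlier.

```r
library(units)

# A units vector carries one unit for all of its elements
x <- set_units(c(1, 2, 3), "mg/dL", mode = "standard")
x

# Illustrative data: mixed source units, a single target unit
df <- data.frame(value = c(1, 2, 3, 4),
                 from  = c("mg/dL", "g/L", "mg/dL", "g/L"),
                 to    = "mg/L",
                 stringsAsFactors = FALSE)

# Group rows sharing the same (from, to) pair; within a group, one
# vectorized set_units() call converts all values at once
groups <- split(seq_len(nrow(df)), paste(df$from, df$to))
df$converted <- NA_real_
for (rows in groups) {
  v <- set_units(df$value[rows], df$from[rows[1]], mode = "standard")
  df$converted[rows] <- drop_units(set_units(v, df$to[rows[1]], mode = "standard"))
}
df
```

The number of set_units() calls then scales with the number of distinct unit pairs rather than with the number of rows.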

@billdenney

billdenney Jun 10, 2018

Contributor

I’m coming around to the single-system approach. These examples also make me realize that I could create implicit multiple systems through disjoint conversions, by adding analyte-specific mass and molar units for each system with something like the following (note that this is typed on my phone without testing, so there are probably bugs, but hopefully the intent is clear):

install_symbolic_unit("g_insulin")
install_conversion_constant("mol_insulin", "g_insulin", 5733.55)

My last major question here is: Do I recall correctly that udunits has different concepts for base units (like grams) and dimensionless units (like percent)? If so, which does install_symbolic_unit use? And could the user be given control over that if there are multiple types?

I’m going to bring up vectorized unit storage and operations in another thread.

@Enchufa2

Enchufa2 Jun 10, 2018

Member

Yes, the intent is clear, and yes, you can do that, although IMHO it is not necessary.

Regarding your question, yes, the concepts are different. In fact, the udunits API has two functions to define a new unit: ut_new_base_unit and ut_new_dimensionless_unit.

@billdenney

billdenney Jun 10, 2018

Contributor

Thanks!

Could install_symbolic_unit get a flag to choose whether a base unit or a dimensionless unit is added?

@billdenney

billdenney Jun 10, 2018

Contributor

Thanks for the detailed discussion here! I think that we’ve fully explored the topic and decided that the feature isn’t necessary.

If you think more discussion would help clarify anything, please comment or reopen.
