New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature Request: Allow different units in a vector or matrix #145

Closed
billdenney opened this Issue Jun 10, 2018 · 31 comments

Comments

Projects
None yet
3 participants
@billdenney
Contributor

billdenney commented Jun 10, 2018

In some scenarios, it is helpful to allow different units to be stored in a single vector. Currently, units stores a scalar unit for any object with units (a scalar, vector, matrix, or array).

It would help for some of my applications (some of which are described in #134) to store different units for a vector (or matrix or array).

I know that when calling to the underlying library, there must be a 1:1 operation on units, so they must be subset to each set of unique units. So, additional logic would be required to handle each of the Ops subsetting to the unique pairs of units on each side of the operand.

Would this feature be of interest?

@edzer

This comment has been minimized.

Member

edzer commented Jun 10, 2018

How would this align with the concept of tidy data?

@billdenney

This comment has been minimized.

Contributor

billdenney commented Jun 10, 2018

Without it, I would have to split and combine data provided with multiple units many times to achieve the desired effect.

As with the example in #134, data for clinical trials are provided with one column for the analyte (glucose, insulin, and glucagon in the example), one column for units, and one column for the value. In my experience working with medical data, this is a common structure.

With tidy data and units, all three of these columns are actually one concept, so they should be stored together. With the current structure of units, they must be stored separately.

With the current single-unit storage, the tidy medical data cannot be stored with units, it must be stored as separate vectors (more accurately separate columns in a data.frame), and the concept that a column has associated units is impossible to store in the column itself.

@Enchufa2

This comment has been minimized.

Member

Enchufa2 commented Jun 10, 2018

Question (disclaimer: I don't know anything about clinical trials): is each analyte an experiment? Or each experiment, for one subject, measures several analytes? I mean, something like the following:

subject day insulin glucose glucagon
A 1 43.449701 38.3461320 24.685657
A 2 13.405432 21.3245162 3.935556
B 1 1.768258 41.0023672 34.530623
B 2 24.510127 0.5037938 40.026276
@billdenney

This comment has been minimized.

Contributor

billdenney commented Jun 10, 2018

Each experiment for one subject measures several analytes, but each analyte may be on its own schedule, and some may be taken as unscheduled measurements. The data keys would be subject, date/time, and analyte, and the values would be measurement value and measurement units. This is based on a much more complex data standard required by regulatory angencies like the FDA in the United States, the EMA in Europe, and the PMDA in Japan.

To alter your table, there may have been no glucagon measured on day 2, and subject 2 may have had an unscheduled extra glucose measurement on day 2.2. And, due to a potential safety concern, subject 1 may have had an unrelated lab (for example, “creatinine”) measured at day 1.1 that was never specified in the study protocol.

(I’m on my phone still, so making an example table is challenging. If this is unclear, please let me know, and I’ll generate an example table when I get back to my computer.)

@Enchufa2

This comment has been minimized.

Member

Enchufa2 commented Jun 10, 2018

Understood. I think your description fits into the tidy data concept, but following this structure, a tidy output variable would be a concentration, with the same standard unit (e.g., g/ml) for all the analytes.

In contrast, your work seems to require different output units for different cases, but conceptually that's not a data frame column (which is a vector). Think of a vector as a potential axis in a plot: all quantities must share the same unit. In the same way as a vector can't store heterogeneous types (e.g., character, integer and double), I think that a vector shouldn't store heterogeneous units.

A list is a different matter though. By definition, it's made to store heterogeneous things. So you could define an S3 method such as the following:

set_units.list <- function(x, value, ..., mode=units_options("set_units_mode")) {
  stopifnot(length(x) == length(value))
  lapply(seq_along(x), function(i) set_units(x[[i]], value[[i]], ..., mode))
}

And then you can coerce the Original_Value as.list and convert everything at once (and the output would be a list, which cannot be set as a column of a data frame). But anyway, you need to first process all the reported units to append the analyte when needed, so I see no advantage on splitting the actual conversion in a second step.

@billdenney

This comment has been minimized.

Contributor

billdenney commented Jun 11, 2018

I understand the consideration that storing magnitudes and different units together looks more like a list than a vector. My mindset is different in that I consider the underlying object "a vector of magnitudes and units" (both being plural) rather than what I'm interpreting your definition as "a vector of magnitudes with unit X".

A common question required for my use case is "what is everything that was measured for subject X on day Y?" The people monitoring safety and efficacy in the clinical study (including me) need to see all records across all measurements (including different analytes) which will have different units.

The inability to store the vector in a data.frame is a show-stopper for me. (Not a show-stopper in that I wouldn't use units, but I would not be able to store a units vector with the units included-- I would always have to store them as separate columns or I would need to write a separate package to handle vectors of units.) Were I to make the list you suggest, I would have what in my mind would look like un-tidy data because I would have to store most of the data (like subject, day, and analyte) in one structure and the measurement with units in a separate structure and I would then have to work to ensure that they were kept in sync.

@edzer

This comment has been minimized.

Member

edzer commented Jun 11, 2018

List-columns are perfectly tidy, and the way to go with current units setup. You could give them a class (e.g. heterogeneous_units) with e.g. a method that combines them into a vector with single unit.

@Enchufa2 is visiting us the second half of this week to discuss the r-quantities project, maybe we could set up a short call some time Wed afternoon, Thu or Fri morning? Which time zone are you in?

A case I ran into (mentally) earlier is this: if we'd want units to propagate through linear regression calculations, we'd need a matrix X'X (and its inverse) to have units associated with each element (up to its symmetry). (A potentially simpler alternative would be a wrapper function to lm, which back-fits units to coefficients and residuals.)

@Enchufa2

This comment has been minimized.

Member

Enchufa2 commented Jun 11, 2018

(@billdenney) I understand the consideration that storing magnitudes and different units together looks more like a list than a vector. My mindset is different in that I consider the underlying object "a vector of magnitudes and units" (both being plural) rather than what I'm interpreting your definition as "a vector of magnitudes with unit X".

The thing is that, if we take this worldview, vectors, arrays and matrices cannot be treated as such anymore. Internally, everything must be operated value by value, which is a huge penalty for all possible use cases. With the current worldview, most use cases benefit from the speed of operating with vectors, arrays and matrices internally, and edge use cases, like the one you describe, may found alternate approaches by handling vectors value by value.

(@edzer) You could give them a class (e.g. heterogeneous_units) with e.g. a method that combines them into a vector with single unit.

This is a more interesting approach I think if we can implement such a wrapper around the current worldview. But the latter should still be the base one by default IMHO.

@billdenney

This comment has been minimized.

Contributor

billdenney commented Jun 11, 2018

I'm happy to chat about this! (I just sent an email to start coordinating offline; let me know if you didn't receive it.)

I love the idea of a heterogenous_units class. Initially, it would need relatively few operations-- mainly assignment and subsetting. Then, if assigned or subset to a single unit, it could (automatically?) be promoted to the more usable, current, default class. Longer-term, I could imagine it doing the subsetting for the user to enable the required operations.

Separately, I will very often run into the issue with matrix operations as I commonly run code fitting linear and nonlinear models to data with units. Some of my models also fit heterogeneous units simultaneously (for example, related lab tests). I'd be curious what performance penalty would be incurred by fitting with a units method (would it interfere with the underlying matrix operations that often occur outside of R). I've assumed that I will have to manually manage the units in these situations because it will quickly get complex.

@Enchufa2

This comment has been minimized.

Member

Enchufa2 commented Jun 11, 2018

I didn't receive any email, BTW.

@billdenney

This comment has been minimized.

Contributor

billdenney commented Jun 11, 2018

One more detailed thought on heterogeneous_units: Perhaps the current class name units becomes the parent class to homogeneous_units and heterogeneous_units; the current class becomes homogeneous_units. Functions, when possible, are still defined on the parent class of units to simplify the auto-promotion of heterogeneous_units to homogenous_units.

The initial complexity that comes to mind is that when subset-assigning to homogenous_units they may need to be changed to heterogeneous_units if the new units differ from the old. (I'm sure that there will be more.)

@billdenney

This comment has been minimized.

Contributor

billdenney commented Jun 13, 2018

The chat was great, thank you!

Here are my thoughts as code and with a simple example file:

library(units)
library(dplyr)

invented_data <-
  expand.grid(
    Analyte=c("glucose", "insulin"),
    Units=c("mmol/L", "mg/dL"),
    stringsAsFactors=FALSE) %>%
  mutate(Value=seq_len(n()))

##### Feature 1
# This would make a heterogeneous_units object since value is not a scalar
# (It doesn't work currently with version 0.6.0)
invented_data$Heterogeneous_Units <-
  set_units(x=invented_data$Value, value=invented_data$Units)

#Similarly
invented_data$Heterogeneous_Units <-
  invented_data$Value
units(invented_data$Heterogeneous_Units) <- invented_data$Units

# Similarly, make_units would work similarly

# This is just so that we have an object that will look more like the final
# object.  It is not a feature request.
invented_data$Heterogeneous_Units <-
  lapply(seq_len(nrow(invented_data)),
         FUN=function(idx) {
           current_value <- invented_data$Value[idx]
           units(current_value) <- invented_data$Units[idx]
           current_value
         })

##### Feature 2
# This would make a (homogeneous) units object since value is all the same
# (It doesn't work currently with version 0.6.0)
set_units(x=invented_data$Value[1:2], value=invented_data$Units[1:2])

##### Feature 3
# This would simplify heterogeneous_units to (homogeneous) units.  The
# call to as_units would require that the heterogeneous_units all have the same
# units.  (as_units doesn't work currently with version 0.6.0)
as_units(invented_data$Heterogeneous_Units[1:2])

##### Feature 4
# Optional, but I think beneficial:  Subsetting heterogeneous_units when they
# all have the same units should convert to (homogeneous) units.  That way, a
# common use case for me would work: subset heterogeneous_units to a set of
# (homogeneous) units and then perform operations on the (homogeneous) units.
invented_data$Heterogeneous_Units[1:2]

##### Feature 5
# This would generate a heterogeneous_units object (with a message) if the units
# differ or a (homogeneous) units object if the units are the same.  The latter
# is the current behavior.
foo1 <- 1
units(foo1) <- "g"
foo2 <- 2
units(foo2) <- "mole"
c(foo1, foo2)

##### Feature 6
# The `units` function would return the vector of all units for
# heterogeneous_units.  (It could look like the result of the lapply.)
units(invented_data$Heterogeneous_Units)
lapply(invented_data$Heterogeneous_Units, FUN=units)

##### Feature 7
# drop_units would just get the numeric part out
drop_units(invented_data$Heterogeneous_Units)
sapply(invented_data$Heterogeneous_Units, FUN=drop_units)

##### Feature 8
# Math.units and Opts.units would work as though done with lapply

##### Feature 9
# The following functions would not work with heterogeneous_units (no need to
# make the function)
#
# hist.units
# make_unit_label
# seq.units
# 

##### Feature 10
# I don't know what to do (if anything) with the following functions
# tibble.units
# type_sum.units
# pillar_shaft.units

invented_data.txt

edzer added a commit that referenced this issue Jun 14, 2018

@edzer

This comment has been minimized.

Member

edzer commented Jun 14, 2018

So far, this gives me the following output; comments welcome.

library(units)
# udunits system database from /usr/share/xml/udunits
library(dplyr)

# Attaching package: ‘dplyr’

# The following objects are masked from ‘package:stats’:

#     filter, lag

# The following objects are masked from ‘package:base’:

#     intersect, setdiff, setequal, union


invented_data <-
  expand.grid(
    Analyte=c("glucose", "insulin"),
    Units=c("mmol/L", "mg/dL"),
    stringsAsFactors=FALSE) %>%
  mutate(Value=seq_len(n()))


##### Feature 1
# This would make a heterogeneous_units object since value is not a scalar
# (It doesn't work currently with version 0.6.0)
invented_data$Heterogeneous_Units <-
  mixed_units(invented_data$Value, invented_data$Units)

#Similarly
invented_data$Heterogeneous_Units <- invented_data$Value

mixed_units(invented_data$Heterogeneous_Units) <- invented_data$Units

invented_data$mixed_units = mixed_units(invented_data$Value, invented_data$Units)
invented_data # prints "nicely", using format.mixed_units
#   Analyte  Units Value Heterogeneous_Units mixed_units
# 1 glucose mmol/L     1            1 mmol/L    1 mmol/L
# 2 insulin mmol/L     2            2 mmol/L    2 mmol/L
# 3 glucose  mg/dL     3             3 mg/dL     3 mg/dL
# 4 insulin  mg/dL     4             4 mg/dL     4 mg/dL

##### Feature 2
# This would make a (homogeneous) units object since value is all the same
# (It doesn't work currently with version 0.6.0)
#set_units(x=invented_data$Value[1:2], value=invented_data$Units[1:2])

# EJP: no, use set_units with a single unit to convert to (if needed):
# this was already possible:
set_units(x=invented_data$Value[1:2], value=invented_data$Units[1], mode = "standard")
# Units: mmol/L
# [1] 1 2

# this is now also possible:
try(set_units(invented_data$mixed_units, "mol/L")) # breaks
# Error : cannot convert mg/dL into mol/L
set_units(invented_data$mixed_units[1:2], "mol/L")
# Units: mol/L
# [1] 0.001 0.002

##### Feature 3
# This would simplify heterogeneous_units to (homogeneous) units.  The
# call to as_units would require that the heterogeneous_units all have the same
# units.  (as_units doesn't work currently with version 0.6.0)
as_units(invented_data$Heterogeneous_Units[1:2])
# Units: mmol/L
# [1] 1 2
# EP: OK

##### Feature 4
# Optional, but I think beneficial:  Subsetting heterogeneous_units when they
# all have the same units should convert to (homogeneous) units.  That way, a
# common use case for me would work: subset heterogeneous_units to a set of
# (homogeneous) units and then perform operations on the (homogeneous) units.
invented_data$Heterogeneous_Units[1:2]
# [[1]]
# 1 mmol/L

# [[2]]
# 2 mmol/L

# attr(,"class")
# [1] "mixed_units"
# EP: disagree here, subsetting should not modify type

##### Feature 5
# This would generate a heterogeneous_units object (with a message) if the units
# differ or a (homogeneous) units object if the units are the same.  The latter
# is the current behavior.
foo1 <- 1
units(foo1) <- "g"
foo2 <- 2
units(foo2) <- "mole"
try(c(foo1, foo2))
# Error in c.units(foo1, foo2) : 
#   units are not convertible, and cannot be mixed; try setting units_options(allow_mixed = TRUE)?
c(foo1, foo2, allow_mixed = TRUE)
# [[1]]
# 1 g

# [[2]]
# 2 mol

# attr(,"class")
# [1] "mixed_units"
# or:
units_options(allow_mixed = TRUE)
c(foo1, foo2)
# [[1]]
# 1 g

# [[2]]
# 2 mol

# attr(,"class")
# [1] "mixed_units"

##### Feature 6
# The `units` function would return the vector of all units for
# heterogeneous_units.  (It could look like the result of the lapply.)
# EP: it has to return a classed list:
units(invented_data$mixed_units)
# [[1]]
# $numerator
# [1] "mmol"

# $denominator
# [1] "L"

# attr(,"class")
# [1] "symbolic_units"

# [[2]]
# $numerator
# [1] "mmol"

# $denominator
# [1] "L"

# attr(,"class")
# [1] "symbolic_units"

# [[3]]
# $numerator
# [1] "mg"

# $denominator
# [1] "dL"

# attr(,"class")
# [1] "symbolic_units"

# [[4]]
# $numerator
# [1] "mg"

# $denominator
# [1] "dL"

# attr(,"class")
# [1] "symbolic_units"

# attr(,"class")
# [1] "mixed_symbolic_units"

# so that this works:
as.character(units(invented_data$mixed_units))
# [1] "mmol/L" "mmol/L" "mg/dL"  "mg/dL" 

##### Feature 7
# drop_units would just get the numeric part out
drop_units(invented_data$mixed_units)
# [1] 1 2 3 4

##### Feature 8
# Math.units and Opts.units would work as though done with lapply

# two examples:
try(invented_data$mixed_units == set_units(1, mol/L))
# [1]  TRUE FALSE FALSE FALSE
# Warning message:
# In doTryCatch(return(expr), name, parentenv, handler) :
#   Incompatible methods ("Ops.mixed_units", "Ops.units") for "=="
try(invented_data$mixed_units * set_units(10, mol/L))
# Error in invented_data$mixed_units * set_units(10, mol/L) : 
#   non-numeric argument to binary operator
# In addition: Warning message:
# In doTryCatch(return(expr), name, parentenv, handler) :
#   Incompatible methods ("Ops.mixed_units", "Ops.units") for "*"
invented_data$mixed_units == invented_data$mixed_units
# [1] TRUE TRUE TRUE TRUE
invented_data$mixed_units * invented_data$mixed_units
# [[1]]
# 1 mmol^2/L^2

# [[2]]
# 4 mmol^2/L^2

# [[3]]
# 9 mg^2/dL^2

# [[4]]
# 16 mg^2/dL^2

# attr(,"class")
# [1] "mixed_units"

##### Feature 10
# I don't know what to do (if anything) with the following functions
# tibble.units
# type_sum.units
# pillar_shaft.units

# EP: should print units inside each record, but no unit in column header
@billdenney

This comment has been minimized.

Contributor

billdenney commented Jun 14, 2018

This is wonderful! Thank you!

It's reasonable that subsetting doesn't change type; I'm OK with that. I'm curious about subsetting to a scalar though. Does it seem reasonable to make invented_data$mixed_units[1] return a units value rather than a mixed_units value or should the user always be required to convert it to units explicitly? (I'm fine either way on the answer.)

Could set_units have an argument allow_mixed added so that it could return either type?

If you don't think that set_units should have allow_mixed, it would be helpful if I could do the unit conversion from one mixed units to another like mixed_units(invented_data$mixed_units, rep(c("mol/L", "g/L"), each=2)) analogous to how set_units can do the unit conversion. And, similarly, it would help if I could do it the other way analogous to units: mixed_units(invented_data$mixed_units) <- rep(c("mol/L", "g/L"), each=2)

I like the way you chose to implement c(..., allow_mixed); that is a much better solution than automatically creating mixed units like I'd suggested.

The only thing that I'd hope to change is that invented_data$mixed_units * set_units(10, mol/L) would work (along with division). In my use case, I often need to normalize two types of values with units to the dose of a drug. So, I would have a case very similar to the example here that does not succeed.

edzer added a commit that referenced this issue Jun 15, 2018

@edzer

This comment has been minimized.

Member

edzer commented Jun 15, 2018

Test set:

library(units)
# udunits system database from /usr/share/xml/udunits
units_options(allow_mixed = TRUE)

library(magrittr)
(m = c(set_units(1:3, km), set_units(4:6, g)))
# Mixed units: g (3), km (3) 
# 1 km, 2 km, 3 km, 4 g, 5 g, 6 g 

# select a subset
m[3:4]
# Mixed units: g (1), km (1) 
# 3 km, 4 g 
m[3]
# Mixed units: km (1) 
# 3 km 

# select a single units object:
m[[3]]
# 3 km

(m %>% set_units(c(rep(c("m", "kg"), each = 3))))
# Mixed units: kg (3), m (3) 
# 1000 m, 2000 m, 3000 m, 0.004 kg, 0.005 kg, 0.006 kg 
# alternatively:
units(m) = rep(c("mm", "mg"), each = 3)
m
# Mixed units: mg (3), mm (3) 
# 1e+06 mm, 2e+06 mm, 3e+06 mm, 4000 mg, 5000 mg, 6000 mg 
# does the value get recycled?
m[1:3] %>% set_units("m")
# Mixed units: m (3) 
# 1000 m, 2000 m, 3000 m 

# convert to units:
m[1:3] %>% set_units("m") %>% as_units()
# Units: m
# [1] 1000 2000 3000

# round-trip via units:
m[1:3] %>% set_units("m") %>% as_units() %>% mixed_units()
# Mixed units: m (3) 
# 1000 m, 2000 m, 3000 m 

# Ops using by single unit: needs to be explicitly coerced to mixed_units:
m[1:3] * mixed_units(set_units(1, mm))
# Mixed units: mm^2 (3) 
# 1e+06 mm^2, 2e+06 mm^2, 3e+06 mm^2 
m[1:3] / mixed_units(set_units(1, mm))
# Mixed units: 1 (3) 
# 1e+06 1, 2e+06 1, 3e+06 1 
m[1:3] + mixed_units(set_units(1, mm))
# Mixed units: mm (3) 
# 1000001 mm, 2000001 mm, 3000001 mm 
m[1:3] - mixed_units(set_units(1, mm))
# Mixed units: mm (3) 
# 999999 mm, 1999999 mm, 2999999 mm 
try(m[1:3] ^ mixed_units(set_units(1, mm)))
# Error in Ops.mixed_units(m[1:3], mixed_units(set_units(1, mm))) : 
#   operation ^ not supported

# this breaks -- seems to be an S3 limitation:
try(m[1:3] * set_units(1, mm))
# Error in m[1:3] * set_units(1, mm) : 
#   non-numeric argument to binary operator
# In addition: Warning message:
# In doTryCatch(return(expr), name, parentenv, handler) :
#   Incompatible methods ("Ops.mixed_units", "Ops.units") for "*"
@edzer

This comment has been minimized.

Member

edzer commented Jun 15, 2018

> library(tibble)
> (tb = tibble(a = 1:6, mx_unit = m, units=set_units(6:1, m*mm*kg*km*g/h^2)))
# A tibble: 6 x 3
      a       mx_unit            units
  <int> <mixed_units>   g*kg*km*m*mm/h
1     1      1e+06 mm                6
2     2      2e+06 mm                5
3     3      3e+06 mm                4
4     4       4000 mg                3
5     5       5000 mg                2
6     6       6000 mg                1
> tb %>% data.frame
  a  mx_unit              units
1 1 1e+06 mm 6 g*kg*km*m*mm/h^2
2 2 2e+06 mm 5 g*kg*km*m*mm/h^2
3 3 3e+06 mm 4 g*kg*km*m*mm/h^2
4 4  4000 mg 3 g*kg*km*m*mm/h^2
5 5  5000 mg 2 g*kg*km*m*mm/h^2
6 6  6000 mg 1 g*kg*km*m*mm/h^2
@edzer

This comment has been minimized.

Member

edzer commented Jun 15, 2018

(in 909ae0e)

@billdenney

This comment has been minimized.

Contributor

billdenney commented Jun 15, 2018

These look great! I'll try to test (and more importantly, think in detail about it) in the next couple of days.

@Enchufa2

This comment has been minimized.

Member

Enchufa2 commented Jun 15, 2018

This is really great. BTW, the type_sum slug for the units column is missing an h^-1, right?

@edzer

This comment has been minimized.

Member

edzer commented Jun 15, 2018

Yes, reported here: r-lib/pillar#73

@edzer

This comment has been minimized.

Member

edzer commented Jun 22, 2018

@billdenney did you have a chance to take a look at the mixed units and the new base units?

@billdenney

This comment has been minimized.

Contributor

billdenney commented Jun 22, 2018

Some of the things that you appeared to test in your code above don't appear to work for me. Also, I was surprised that baz was automatically converted to regular units and not mixed. It's not a problem, but a message to the user may help clarify what is happening behind the scenes when this occurs.

library(units)                                                   
#> udunits system database from C:/Users/Bill Denney/Documents/R/win-library/3.4/units/share/udunits
                                                                 
units_options(allow_mixed=TRUE)                                  
foo <- 1:2                                                       
units(foo) <- "ng"                                               
bar <- 2:5                                                       
units(bar) <- "mg"                                               
baz <- c(foo, bar)                                               
class(baz)                                                       
#> [1] "units"
mixed_units(baz)                                                 
#> Mixed units: ng (6) 
#> 1 ng, 2 ng, 2e+06 ng, 3e+06 ng, 4e+06 ng, 5e+06 ng
                                                                 
units(baz) <- rep(c("mg", "lb"), each=3)                         
#> Error: length(x) == 1 is not TRUE
set_units(baz, value=rep(c("mg", "lb"), each=3), mode="standard")
#> Error: length(x) == 1 is not TRUE

Can you help me understand what I'm doing wrong?

edzer added a commit that referenced this issue Jun 23, 2018

@edzer

This comment has been minimized.

Member

edzer commented Jun 23, 2018

So far, units<-.units would not convert to mixed_units; now it does.

@edzer

This comment has been minimized.

Member

edzer commented Jun 28, 2018

OK to merge this into master?

@Enchufa2

This comment has been minimized.

Member

Enchufa2 commented Jun 28, 2018

Sounds good to me. Remember to merge master into mixed first, because it seems that there are some conflicts now.

@billdenney

This comment has been minimized.

Contributor

billdenney commented Jun 28, 2018

Sorry for my delayed responses. This looks great!

As a minor note, is it feasible to give a more informative error message when trying to operate on mixed_units:

library(units)                                                                                      
#> udunits system database from C:/Users/Bill Denney/Documents/R/win-library/3.4/units/share/udunits
#> udunits system database from C:/Users/Bill Denney/Documents/R/win-library/3.4/units/share/udunits
                                                                                                    
units_options(allow_mixed=TRUE)                                                                     
foo <- 1:2                                                                                          
units(foo) <- "ng"                                                                                  
bar <- 2:5                                                                                          
units(bar) <- "mg"                                                                                  
baz <- c(foo, bar)                                                                                  
class(baz)                                                                                          
#> [1] "units"
#> [1] "units"                                                                                      
mixed_units(baz)                                                                                    
#> Mixed units: ng (6) 
#> 1 ng, 2 ng, 2e+06 ng, 3e+06 ng, 4e+06 ng, 5e+06 ng
#> Mixed units: ng (6)                                                                              
#> 1 ng, 2 ng, 2e+06 ng, 3e+06 ng, 4e+06 ng, 5e+06 ng                                               
                                                                                                    
units(baz) <- rep(c("mg", "lb"), each=3)                                                            
#> Error: length(x) == 1 is not TRUE                                                                
set_units(baz, value=rep(c("mg", "lb"), each=3), mode="standard")                                   
#> Mixed units: lb (3), mg (3) 
#> 1e-06 mg, 2e-06 mg, 2 mg, 6.613868e-06 lb, 8.81849e-06 lb, 1.102311e-05 lb
#> Error: length(x) == 1 is not TRUE                                                                
                                                                                                    
baz + foo[1]                                                                                        
#> Warning: Incompatible methods ("Ops.mixed_units", "Ops.units") for "+"
#> Error in baz + foo[1]: non-numeric argument to binary operator
@edzer

This comment has been minimized.

Member

edzer commented Jun 28, 2018

Do note e94aa57; I'm seeing this:

library(units)
# udunits system database from /usr/share/xml/udunits
units_options(allow_mixed=TRUE)
foo <- 1:2
units(foo) <- "ng"
bar <- 2:5
units(bar) <- "mg"
baz <- c(foo, bar)
class(baz)
# [1] "units"
mixed_units(baz)
# Mixed units: ng (6) 
# 1 ng, 2 ng, 2e+06 ng, 3e+06 ng, 4e+06 ng, 5e+06 ng 
units(baz) <- rep(c("mg", "lb"), each=3)
set_units(baz, value=rep(c("mg", "lb"), each=3), mode="standard")
# Mixed units: lb (3), mg (3) 
# 1e-06 mg, 2e-06 mg, 2 mg, 6.613868e-06 lb, 8.81849e-06 lb, 1.102311e-05 lb 
try(baz + foo[1])
# Error in baz + foo[1] : non-numeric argument to binary operator
# In addition: Warning message:
# In doTryCatch(return(expr), name, parentenv, handler) :
#   Incompatible methods ("Ops.mixed_units", "Ops.units") for "+"
baz + mixed_units(foo[1])
# Mixed units: lb (3), mg (3) 
# 2e-06 mg, 3e-06 mg, 2.000001 mg, 6.61387e-06 lb, 8.818493e-06 lb, 1.102312e-05 lb 

I don't think we can cope with this error without changing to S4.

@Enchufa2

This comment has been minimized.

Member

Enchufa2 commented Jun 28, 2018

There is a way to cope with this within S3: adding a parent class to everything, so that "units" are c("something", "units") and "mixed_units" are c("something", "mixed_units"). Then, you just define Ops.something.

@Enchufa2

This comment has been minimized.

Member

Enchufa2 commented Jun 28, 2018

I'm thinking that what probably would make more sense is to have c("units", "same_units") and c("units", "mixed_units"), instead of "units" and "mixed_units". But I can't tell now whether this could break things downstream.

@billdenney

This comment has been minimized.

Contributor

billdenney commented Jun 29, 2018

It looks good to me. Thank you for all your work on it!

@edzer

This comment has been minimized.

Member

edzer commented Jun 29, 2018

Thanks! We'll keep @Enchufa2 's solution in mind.

@edzer edzer closed this Jun 29, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment