Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Combine two or more specs? #65

Closed
krlmlr opened this issue Jun 11, 2022 · 6 comments · Fixed by #105
Closed

Combine two or more specs? #65

krlmlr opened this issue Jun 11, 2022 · 6 comments · Fixed by #105
Labels

Comments

@krlmlr
Copy link
Collaborator

krlmlr commented Jun 11, 2022

Do we need a spec_common() that takes an arbitrary number of spec objects and computes a common spec?

Invariant proposal: spec_guess(x) should be the same as spec_common(!!!map(x, spec_guess)) .

Would that help get a better understanding of #63?

library(tidyverse)
tibblify::guess_spec(list(books = repurrrsive::got_chars[[1]]$books))
#> spec_object(
#>   books = tib_chr_vec("books")
#> )
tibblify::guess_spec(list(books = repurrrsive::got_chars[[10]]$books))
#> spec_object(
#>   books = tib_chr("books")
#> )
tibblify::guess_spec(list(books = repurrrsive::got_chars[[11]]$books))
#> spec_object(
#>   books = tib_unspecified("books")
#> )

# Surprising:
tibblify::guess_spec(map(repurrrsive::got_chars, ~ list(books = .x$books)))
#> spec_df(
#>   books = tib_list("books")
#> )

Created on 2022-06-11 by the reprex package (v2.0.1)

When combining specs, the promotion hierarchy would be clearly unspecified -> chr -> chr_vec.

More broadly: would that enable us to think differently about guessing the spec?

@krlmlr
Copy link
Collaborator Author

krlmlr commented Jun 11, 2022

Implementation sketch:

  • if NULL, return tib_unspecified()
  • if vector, return atomic spec based on type and length (1 or not 1)
  • if named list (including a data frame), iterate over all elements
    • if all results are a vector -- column-major, new tib_col() (tib_col() #66), for now fail
    • if all results are objects -- row-major, tib_row()
    • mixed -- would need to construct an example, probably fail?
  • if unnamed list, iterate over a sample (*) of elements
    • if all results are a vector -- bind, new tib_bind() (tib_bind(): Can't guess_spec() for gh_repos #56), for now fail
    • if all results are objects -- combine with spec_common(), create tib_df()
    • mixed -- would need to construct an example, probably fail?

(*) sample: this also allows speeding up the guessing by limiting the maximum number of items to look at. When later a tibblify error occurs, actionable advice would be to increase that threshold.

Does this make any sense at all?

@mgirlich
Copy link
Owner

This reminds me a little bit of vec_ptype() resp. vec_ptype_common() with the difference that here we differentiate between scalars and vectors.

@mgirlich mgirlich added the spec label Jun 13, 2022
@mgirlich
Copy link
Owner

Invariant proposal: spec_guess(x) should be the same as spec_common(!!!map(x, spec_guess)).

  • To make it work for dataframe and list of objects you should replace map() with (a not yet existing) vec_map().
  • If x is an object this probably does not make sense.
  • I am not sure this will work with a matrix row
library(tibble)

y <- list(
  list(x = 1:2),
  list(x = 3:4)
) %>% str()
#> List of 2
#>  $ :List of 1
#>   ..$ x: int [1:2] 1 2
#>  $ :List of 1
#>   ..$ x: int [1:2] 3 4

tibble(x = matrix(1:4, ncol = 2))
#> # A tibble: 2 × 1
#>   x[,1]  [,2]
#>   <int> <int>
#> 1     1     3
#> 2     2     4

Created on 2022-06-15 by the reprex package (v2.0.1)

To make this work we would have to store the size of the vector in the spec (I was already thinking about this). But when guessing the spec for an object one probably does not want to fix the size for a vector in general. So, this should be an option.

Further, I think a matrix row should only be guessed if there are "enough" objects in the object list (say at least 5). So, it might work but this requires some playing around. What do you think?

@mgirlich
Copy link
Owner

I'll have a go at implementing the tib_*() functions on top of vctrs::new_rcrd() and then let's see where vctrs::vec_ptype2() gets us

@krlmlr
Copy link
Collaborator Author

krlmlr commented Jun 16, 2022

This sounds like a good idea, is it worth starting with a small prototype before changing everything?

@mgirlich
Copy link
Owner

After trying to use vctrs::new_rcrd() for the tib_*() objects I don't think this really makes sense. What should the output of vec_ptype2(tib_scalar("a"), tib_scalar("b")) be? To use it in spec_combine() this should probably be an error which contradicts the point of vec_ptype2().

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants