look_for slow with big files #77
look_for() has been redesigned in labelled 2.6.0. What are you trying to do that is not working with the new version? |
Could you provide a reproducible example? Are all your packages up to date? |
Apologies for the uninformative issue! Let me try again. I am using Windows 10 with: Under both labelled 2.5.0 & 2.7.0 the following minimal example works:
library(tidyverse)
library(labelled)
# create data ----
ex_data <- tibble(
id = 1:10,
ctry = c(rep(1, 5), rep(2, 5)),
cdy = rep(c(1, 3), 5),
text = LETTERS[1:10]
) %>%
mutate(
ctry = factor(ctry, labels = c("US", "UK")),
cdy = haven::labelled(
cdy,
labels = c("MMs" = 1, "Skittles" = 3),
label = "Preferred Candy")
)
# create dictionary ----
dictionary <- labelled::look_for(ex_data, details = TRUE)
# view dictionary ----
dictionary
#>   variable label           class                              type      levels
#> 1 id       <NA>            integer                            integer
#> 2 ctry     <NA>            factor                             integer   US; UK
#> 3 cdy      Preferred Candy haven_labelled, vctrs_vctr, double double
#> 4 text     <NA>            character                          character
#>   value_labels          unique_values n_na na_values na_range
#> 1                       10            0
#> 2                       2             0
#> 3 [1] MMs; [3] Skittles 2             0
#> 4                       10            0
Created on 2020-12-19 by the reprex package (v0.3.0)
When I try to create the dictionary by the same method for a larger data set, it works under 2.5.0, but under 2.7.0 the command never finishes (no error message or warning; R just keeps running). The data that I am using is sadc_2017_national.sav. As the command never finishes, I was not able to reprex this one, but here is the code I was using.
Please let me know if there is anything else I can try on my end to help troubleshoot. Thank you! |
Sorry for not responding earlier. I took a quick look at where the problem occurs.
Your dataset is very big (many variables and many observations). The feature causing the problem is the computation of the "range" of each variable, which is very time-consuming. I need to explore further, maybe adding an option to deactivate that part of the computation. |
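To illustrate why per-variable statistics such as the range become costly on a large file, here is a minimal base-R sketch. The function describe_columns() and its details argument are hypothetical and written only for illustration; this is not how labelled implements look_for(). The point is that the "full" branch has to scan every value of every column, while the "basic" branch only reads column-level metadata.

```r
# Hypothetical helper for illustration only; this is NOT labelled's code.
describe_columns <- function(df, details = c("basic", "full")) {
  details <- match.arg(details)

  # Cheap part: only column-level metadata is read, no scan of the values.
  out <- data.frame(
    pos = seq_along(df),
    variable = names(df),
    col_type = vapply(df, function(x) class(x)[1], character(1), USE.NAMES = FALSE),
    stringsAsFactors = FALSE
  )

  if (details == "full") {
    # Expensive part: each statistic walks through every value of every column.
    out$n_na <- vapply(df, function(x) sum(is.na(x)), integer(1), USE.NAMES = FALSE)
    out$unique_values <- vapply(df, function(x) length(unique(x)), integer(1), USE.NAMES = FALSE)
    out$range <- vapply(df, function(x) {
      if (is.numeric(x) && any(!is.na(x))) {
        paste(range(x, na.rm = TRUE), collapse = " - ")
      } else {
        NA_character_
      }
    }, character(1), USE.NAMES = FALSE)
  }
  out
}

# Toy data standing in for a large survey file.
toy <- data.frame(
  id = 1:10,
  score = c(NA, 2:10),
  text = LETTERS[1:10],
  stringsAsFactors = FALSE
)

basic <- describe_columns(toy)
full <- describe_columns(toy, details = "full")
```

On a data set with many rows and columns, wrapping each call in system.time() would show where the time goes; making the expensive branch opt-in keeps the default call fast.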
See #79 for a proposed evolution of look_for(). Now, by default ( If you want full details (as before), indicate |
@shannonpileggi You can test it with Do not hesitate to provide me feedback |
* by default, computes only basic details with look_for() fix #77
* update examples
* simpler code
@larmarange thank you for taking a look, identifying the problem, and proposing solutions! I did install the dev version and I have a bit of feedback.
I propose that the default version also include the In addition, I also attempted this by specifying the
Am I using the argument as intended? Thank you for all of your work on this package! |
Dear @shannonpileggi it seems that you do not have the dev version installed. Have tried
> library(labelled)
> library(questionr)
> data(fertility)
> look_for(children)
 pos   variable      label                    col_type values
 <chr> <chr>         <chr>                    <chr>    <chr>
 1     id_child      Child Id                 dbl
 2     id_woman      Mother Id                dbl
 3     date_of_birth Date of birth            date
 4     sex           Sex                      dbl+lbl  [1] male
                                                       [2] female
 5     alive         Still alive?             dbl+lbl  [0] no, dead
                                                       [1] yes, alive
 6     age_at_death  Age at death (in months) dbl
> look_for(children, details = "basic")
 pos   variable      label                    col_type values
 <chr> <chr>         <chr>                    <chr>    <chr>
 1     id_child      Child Id                 dbl
 2     id_woman      Mother Id                dbl
 3     date_of_birth Date of birth            date
 4     sex           Sex                      dbl+lbl  [1] male
                                                       [2] female
 5     alive         Still alive?             dbl+lbl  [0] no, dead
                                                       [1] yes, alive
 6     age_at_death  Age at death (in months) dbl
> look_for(children, details = "full")
 pos   variable      label                    col_type values
 <chr> <chr>         <chr>                    <chr>    <chr>
 1     id_child      Child Id                 dbl      range: 1 - 1584
 2     id_woman      Mother Id                dbl      range: 1 - 2000
 3     date_of_birth Date of birth            date     range: 2007-01-03 - 2012-04-15
 4     sex           Sex                      dbl+lbl  [1] male
                                                       [2] female
 5     alive         Still alive?             dbl+lbl  [0] no, dead
                                                       [1] yes, alive
 6     age_at_death  Age at death (in months) dbl      range: 0 - 48
As you can see, value labels (and factor levels) are returned by default. |
library(labelled)
library(questionr)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
data(fertility)
look_for(children) %>% glimpse()
#> Rows: 6
#> Columns: 6
#> $ pos <int> 1, 2, 3, 4, 5, 6
#> $ variable <chr> "id_child", "id_woman", "date_of_birth", "sex", "alive...
#> $ label <chr> "Child Id", "Mother Id", "Date of birth", "Sex", "Stil...
#> $ col_type <chr> "dbl", "dbl", "date", "dbl+lbl", "dbl+lbl", "dbl"
#> $ levels <named list> [NULL, NULL, NULL, NULL, NULL, NULL]
#> $ value_labels <named list> [NULL, NULL, NULL, <1, 2>, <0, 1>, NULL]
look_for(children, details = "none") %>% glimpse()
#> Rows: 6
#> Columns: 3
#> $ pos <int> 1, 2, 3, 4, 5, 6
#> $ variable <chr> "id_child", "id_woman", "date_of_birth", "sex", "alive", "...
#> $ label <chr> "Child Id", "Mother Id", "Date of birth", "Sex", "Still al...
look_for(children, details = "full") %>% glimpse()
#> Rows: 6
#> Columns: 13
#> $ pos <int> 1, 2, 3, 4, 5, 6
#> $ variable <chr> "id_child", "id_woman", "date_of_birth", "sex", "aliv...
#> $ label <chr> "Child Id", "Mother Id", "Date of birth", "Sex", "Sti...
#> $ col_type <chr> "dbl", "dbl", "date", "dbl+lbl", "dbl+lbl", "dbl"
#> $ levels <named list> [NULL, NULL, NULL, NULL, NULL, NULL]
#> $ value_labels <named list> [NULL, NULL, NULL, <1, 2>, <0, 1>, NULL]
#> $ class <named list> ["numeric", "numeric", "Date", <"haven_labelle...
#> $ type <chr> "double", "double", "double", "double", "double", "do...
#> $ na_values <named list> [NULL, NULL, NULL, NULL, NULL, NULL]
#> $ na_range <named list> [NULL, NULL, NULL, NULL, NULL, NULL]
#> $ unique_values <int> 1584, 1090, 1038, 2, 2, 22
#> $ n_na <int> 0, 0, 0, 0, 0, 1442
#> $ range <named list> [<1, 1584>, <1, 2000>, <2007-01-03, 2012-04-15...
Created on 2021-01-16 by the reprex package (v0.3.0) |
Ah, thank you and sorry for the confusion! My output does match yours, now. :)
library(labelled)
library(questionr)
data(fertility)
look_for(children, details = "none")
#>  pos   variable      label
#>  <int> <chr>         <chr>
#>  1     id_child      Child Id
#>  2     id_woman      Mother Id
#>  3     date_of_birth Date of birth
#>  4     sex           Sex
#>  5     alive         Still alive?
#>  6     age_at_death  Age at death (in months)
look_for(children, details = "full")
#>  pos   variable      label                    col_type values
#>  <chr> <chr>         <chr>                    <chr>    <chr>
#>  1     id_child      Child Id                 dbl      range: 1 - 1584
#>  2     id_woman      Mother Id                dbl      range: 1 - 2000
#>  3     date_of_birth Date of birth            date     range: 2007-01-03 - 2012-04~
#>  4     sex           Sex                      dbl+lbl  [1] male
#>                                                        [2] female
#>  5     alive         Still alive?             dbl+lbl  [0] no, dead
#>                                                        [1] yes, alive
#>  6     age_at_death  Age at death (in mont~   dbl      range: 0 - 48
look_for(children, details = "basic")
#>  pos   variable      label                    col_type values
#>  <chr> <chr>         <chr>                    <chr>    <chr>
#>  1     id_child      Child Id                 dbl
#>  2     id_woman      Mother Id                dbl
#>  3     date_of_birth Date of birth            date
#>  4     sex           Sex                      dbl+lbl  [1] male
#>                                                        [2] female
#>  5     alive         Still alive?             dbl+lbl  [0] no, dead
#>                                                        [1] yes, alive
#>  6     age_at_death  Age at death (in months) dbl
Created on 2021-01-17 by the reprex package (v0.3.0)
Some follow-up questions I have are:
I did really like your previous wide output with Thank you for your work on this! |
There is a confusion here between the result returned by However, the tibble returned by
library(labelled)
library(questionr)
library(dplyr)
data(fertility)
look_for(children) %>% as_tibble()
#> # A tibble: 6 x 6
#>     pos variable      label                    col_type levels      value_labels
#>   <int> <chr>         <chr>                    <chr>    <named lis> <named list>
#> 1     1 id_child      Child Id                 dbl      <NULL>      <NULL>
#> 2     2 id_woman      Mother Id                dbl      <NULL>      <NULL>
#> 3     3 date_of_birth Date of birth            date     <NULL>      <NULL>
#> 4     4 sex           Sex                      dbl+lbl  <NULL>      <dbl [2]>
#> 5     5 alive         Still alive?             dbl+lbl  <NULL>      <dbl [2]>
#> 6     6 age_at_death  Age at death (in months) dbl      <NULL>      <NULL>
look_for(children, details = "none") %>% as_tibble()
#> # A tibble: 6 x 3
#>     pos variable      label
#>   <int> <chr>         <chr>
#> 1     1 id_child      Child Id
#> 2     2 id_woman      Mother Id
#> 3     3 date_of_birth Date of birth
#> 4     4 sex           Sex
#> 5     5 alive         Still alive?
#> 6     6 age_at_death  Age at death (in months)
look_for(children, details = "full") %>% as_tibble()
#> # A tibble: 6 x 13
#>     pos variable label col_type levels value_labels class type  na_values
#>   <int> <chr>    <chr> <chr>    <name> <named list> <nam> <chr> <named l>
#> 1     1 id_child Chil~ dbl      <NULL> <NULL>       <chr~ doub~ <NULL>
#> 2     2 id_woman Moth~ dbl      <NULL> <NULL>       <chr~ doub~ <NULL>
#> 3     3 date_of~ Date~ date     <NULL> <NULL>       <chr~ doub~ <NULL>
#> 4     4 sex      Sex   dbl+lbl  <NULL> <dbl [2]>    <chr~ doub~ <NULL>
#> 5     5 alive    Stil~ dbl+lbl  <NULL> <dbl [2]>    <chr~ doub~ <NULL>
#> 6     6 age_at_~ Age ~ dbl      <NULL> <NULL>       <chr~ doub~ <NULL>
#> # ... with 4 more variables: na_range <named list>, unique_values <int>,
#> #   n_na <int>, range <named list>
Created on 2021-01-18 by the reprex package (v0.3.0)
You can use two helper functions on the table returned by More information is available in the dedicated vignette: https://larmarange.github.io/labelled/articles/look_for.html#advanced-usages-of-look-for- |
Ok. Thank you again for your thorough responses. I apologize, I think I am still used to the usage in version 2.5, and you have changed a lot! Apologies for not reading your new vignette more thoroughly. However, after reading through the vignette, it is still not clear to me whether there is an easy way to replicate the functionality in 2.5, where you can see the metadata in wide rather than long format. I would ideally like to see a quick solution to generate the table shown here, with Again, thank you for your prompt responses and discussion on this matter! And my apologies in advance if this is in your documentation and I have yet again managed to miss it. |
You could use
df %>% look_for() %>% convert_list_columns_to_character()
df %>% look_for() %>% convert_list_columns_to_character() %>% View()
NB: reinstall the latest dev version. I just fixed a small bug. |
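For readers wondering what "converting list columns to character" means for a wide table, here is a minimal base-R sketch of the idea. The helper collapse_list_columns() is hypothetical and only approximates the concept; it is not the implementation of convert_list_columns_to_character(), and the "[value] label" format is an assumption modelled on the 2.5-style output shown earlier in this thread.

```r
# Hypothetical helper for illustration only; NOT labelled's implementation.
collapse_list_columns <- function(df, sep = "; ") {
  # Format one cell of a list column as a single string.
  fmt <- function(x) {
    if (is.null(x)) return(NA_character_)
    # Assumed format: named values are shown as "[value] label".
    if (!is.null(names(x))) x <- paste0("[", x, "] ", names(x))
    paste(x, collapse = sep)
  }

  is_list_col <- vapply(df, is.list, logical(1))
  df[is_list_col] <- lapply(df[is_list_col], function(col) {
    vapply(col, fmt, character(1), USE.NAMES = FALSE)
  })
  df
}

# A tiny dictionary with one list column, similar in shape to value_labels.
dict <- data.frame(variable = c("id", "sex"), stringsAsFactors = FALSE)
dict$value_labels <- list(NULL, c(male = 1, female = 2))

wide <- collapse_list_columns(dict)
```

After this step every column is an atomic vector, so the table keeps one row per variable and can be viewed or exported as a flat, wide table.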
I recently updated to labelled 2.7.0 and look_for(data, details = TRUE) hung and never resolved. I reverted back to 2.5.0 to get previous usage. Can you confirm that it is working as intended in 2.7.0?