
Develop shadow_shift method for factors: perhaps add another level (smaller than the smallest) #3

Closed
njtierney opened this issue Dec 14, 2015 · 4 comments


@njtierney
Owner

No description provided.

@njtierney
Owner Author

library(narnia)
library(tidyverse)
#> Loading tidyverse: ggplot2
#> Loading tidyverse: tibble
#> Loading tidyverse: tidyr
#> Loading tidyverse: readr
#> Loading tidyverse: purrr
#> Loading tidyverse: dplyr
#> Conflicts with tidy packages ----------------------------------------------
#> filter(): dplyr, stats
#> lag():    dplyr, stats

brfss %>%
  add_shadow_shift("STOPSMK2") %>%
  select(STOPSMK2, STOPSMK2_shift)
#> # A tibble: 245 x 2
#>    STOPSMK2 STOPSMK2_shift
#>      <fctr>         <fctr>
#>  1       NA        missing
#>  2       NA        missing
#>  3       NA        missing
#>  4       NA        missing
#>  5      Yes            Yes
#>  6       NA        missing
#>  7       NA        missing
#>  8       NA        missing
#>  9      Yes            Yes
#> 10       NA        missing
#> # ... with 235 more rows

Need to think about:

  • visualisations for this
  • a standardised way to code this value in a factor: it should be one smaller than the smallest value
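A minimal base R sketch of the "one smaller than the smallest" idea, assuming the new level is labelled "missing" (as in the output above): place it *before* all existing levels so its integer code is 1 and every present level shifts up by one. The helper name shift_factor_first() is hypothetical and is not part of narnia/naniar.

```r
# Hypothetical helper (NOT a narnia/naniar function): recode NA in a factor
# as an explicit level placed before all existing levels, so its integer
# code is one smaller than the smallest present level.
shift_factor_first <- function(x, label = "missing") {
  stopifnot(is.factor(x), !label %in% levels(x))
  # Rebuild the factor with the new level first; keep ordered-ness if present
  out <- factor(as.character(x),
                levels = c(label, levels(x)),
                ordered = is.ordered(x))
  # Fill the former NAs with the new level
  out[is.na(x)] <- label
  out
}

smk <- factor(c(NA, "Yes", NA, "No"), levels = c("Yes", "No"))
smk_shift <- shift_factor_first(smk)
levels(smk_shift)      # "missing" "Yes" "No"
as.integer(smk_shift)  # 1 2 1 3
```

The trade-off is that every present level's integer code changes (here "Yes" moves from 1 to 2), which matters if anything downstream relies on as.numeric() of the original factor.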

@njtierney
Owner Author

I think I'm OK with this producing values that are larger than those of the "present" values:

library(tidyverse)
#> Loading tidyverse: ggplot2
#> Loading tidyverse: tibble
#> Loading tidyverse: tidyr
#> Loading tidyverse: readr
#> Loading tidyverse: purrr
#> Loading tidyverse: dplyr
#> Conflicts with tidy packages ----------------------------------------------
#> filter(): dplyr, stats
#> lag():    dplyr, stats
library(narnia)

brfss %>%
  add_shadow_shift("STOPSMK2") %>%
  select(STOPSMK2, STOPSMK2_shift) %>%
  mutate(smk_lvl = as.numeric(STOPSMK2),
         smk_lvl_2 = as.numeric(STOPSMK2_shift))
#> # A tibble: 245 x 4
#>    STOPSMK2 STOPSMK2_shift smk_lvl smk_lvl_2
#>      <fctr>         <fctr>   <dbl>     <dbl>
#>  1       NA        missing      NA         3
#>  2       NA        missing      NA         3
#>  3       NA        missing      NA         3
#>  4       NA        missing      NA         3
#>  5      Yes            Yes       1         1
#>  6       NA        missing      NA         3
#>  7       NA        missing      NA         3
#>  8       NA        missing      NA         3
#>  9      Yes            Yes       1         1
#> 10       NA        missing      NA         3
#> # ... with 235 more rows

@njtierney njtierney added this to To Do in CRAN Version 0.1.0 Jun 21, 2017
@njtierney njtierney moved this from To Do to Priority in CRAN Version 0.1.0 Jun 21, 2017
@njtierney
Owner Author

Force the NA category to take a higher value than the present levels for factors.
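For comparison, base R already has a way to put missing values *last*: addNA() turns NA into an extra level at the end of the levels, so missing values get a code one larger than the largest present level. This is a sketch only; relabelling the NA level as "missing" is an assumption chosen to match the output earlier in this thread, not something addNA() does itself.

```r
# Base R: addNA() appends NA as an explicit level at the end of the levels,
# so missing values get a code one larger than the largest present level.
smk <- factor(c(NA, "Yes", NA, "No"), levels = c("Yes", "No"))
smk_shift <- addNA(smk)

# Relabel the NA level so it prints and plots as "missing" (label is an assumption)
levels(smk_shift)[is.na(levels(smk_shift))] <- "missing"

levels(smk_shift)      # "Yes" "No" "missing"
as.integer(smk_shift)  # 3 1 3 2
```

Unlike placing the new level first, this leaves the integer codes of the present levels unchanged.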

@njtierney njtierney added V0.2.0 and removed V0.1.0 labels Aug 3, 2017
@njtierney njtierney removed this from Priority in CRAN Version 0.1.0 Aug 3, 2017
@njtierney njtierney added this to To Do in CRAN Version 0.2.0 Aug 7, 2017
@njtierney njtierney moved this from To Do to Priority in CRAN Version 0.2.0 Jan 8, 2018
@njtierney njtierney moved this from Priority to To Do in CRAN Version 0.2.0 Jan 9, 2018
@njtierney njtierney moved this from To Do to Priority in CRAN Version 0.2.0 Jan 19, 2018
@njtierney njtierney moved this from Priority to In Progress in CRAN Version 0.2.0 Jan 26, 2018
@njtierney
Owner Author

Factors with NA get plotted, so I currently don't see a problem with the behaviour of shadow_shift.factor:

library(naniar)
library(tidyverse)
#> ── Attaching packages ────────────────────────────────────────────── tidyverse 1.2.1 ──
#> ✔ ggplot2 2.2.1     ✔ purrr   0.2.4
#> ✔ tibble  1.4.1     ✔ dplyr   0.7.4
#> ✔ tidyr   0.7.2     ✔ stringr 1.2.0
#> ✔ readr   1.1.1     ✔ forcats 0.2.0
#> ── Conflicts ───────────────────────────────────────────────── tidyverse_conflicts() ──
#> ✖ dplyr::filter() masks stats::filter()
#> ✖ dplyr::lag()    masks stats::lag()

riskfactors %>% 
  cast_shadow(smoke_days) %>%
  add_shadow_shift(smoke_days)
#> # A tibble: 245 x 3
#>    smoke_days smoke_days_NA smoke_days_shift
#>    <fct>      <fct>         <fct>           
#>  1 <NA>       NA            missing         
#>  2 <NA>       NA            missing         
#>  3 <NA>       NA            missing         
#>  4 <NA>       NA            missing         
#>  5 Everyday   !NA           Everyday        
#>  6 <NA>       NA            missing         
#>  7 <NA>       NA            missing         
#>  8 Not@All    !NA           Not@All         
#>  9 Everyday   !NA           Everyday        
#> 10 Not@All    !NA           Not@All         
#> # ... with 235 more rows

riskfactors %>%
  ggplot(aes(x = smoke_days,
             y = bmi)) + 
  geom_point()
#> Warning: Removed 11 rows containing missing values (geom_point).

riskfactors %>%
  ggplot(aes(x = smoke_days,
             y = bmi)) + 
  geom_miss_point()

riskfactors %>%
  ggplot(aes(x = smoke_days,
             y = activity_limited)) + 
  geom_point()

riskfactors %>%
  ggplot(aes(x = smoke_days,
             y = activity_limited)) + 
  geom_miss_point()

I'm closing this for the moment, but I will reopen it if need be (any thoughts @dicook?)

@njtierney njtierney moved this from In Progress to Done in CRAN Version 0.2.0 Jan 26, 2018