Incomplete predictor fields cause NAs in prediction #19

Closed
sebastian-fox opened this issue Jul 9, 2020 · 1 comment

Hi there,

I'm not sure if this is intended behaviour, but when I try to impute a field with fill_NA or fill_NA_N, I only get a prediction for records where all the predictor fields are complete. Here's a reproducible example. I'd expect air_miss$Solar.R_imp[5] not to be NA, but Ozone is also missing in that row, and the result stays NA. naive_fill_NA() does fill it, but your documentation advises against using that function:

library(miceFast)
library(data.table)
library(dplyr)

data(air_miss)

# Keep columns Ozone through Temp and the first 10 rows.
air_miss <- air_miss %>%
  select(Ozone:Temp) %>%
  head(10)

# Impute Solar.R from the other three columns using the lm_bayes model.
air_miss[, Solar.R_imp := fill_NA(.SD,
                                  model = "lm_bayes",
                                  posit_y = "Solar.R",
                                  posit_x = c("Ozone", "Wind", "Temp"))]

print(air_miss)

>     Ozone Solar.R Wind Temp Solar.R_imp
>  1:    41     190  7.4   67      190.00
>  2:    36     118  8.0   72      118.00
>  3:    12     149 12.6   74      149.00
>  4:    18     313 11.5   62      313.00
>  5:    NA      NA 14.3   56          NA
>  6:    28      NA 14.9   66    -1187.08
>  7:    23     299  8.6   65      299.00
>  8:    19      99 13.8   59       99.00
>  9:     8      19 20.1   61       19.00
> 10:    NA     194  8.6   69      194.00

naive_fill_NA(air_miss)

>        Ozone  Solar.R Wind Temp Solar.R_imp
>  1: 41.00000 190.0000  7.4   67    190.0000
>  2: 36.00000 118.0000  8.0   72    118.0000
>  3: 12.00000 149.0000 12.6   74    149.0000
>  4: 18.00000 313.0000 11.5   62    313.0000
>  5: 15.28918 144.9681 14.3   56    312.1653
>  6: 28.00000 501.6784 14.9   66  -1187.0801
>  7: 23.00000 299.0000  8.6   65    299.0000
>  8: 19.00000  99.0000 13.8   59     99.0000
>  9:  8.00000  19.0000 20.1   61     19.0000
> 10: 21.29695 194.0000  8.6   69    194.0000
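
For comparison, a plain regression in base R shows the same pattern: a row gets no prediction when one of its predictors is missing. This is only a minimal sketch. It uses datasets::airquality, whose first 10 rows match the values printed above, and stats::lm() with predict(). It assumes fill_NA handles incomplete predictor rows in a similar way, which I have not confirmed from the miceFast source or documentation.

```r
# Base R only; no miceFast calls are used here.
# The first 10 rows of airquality match the values shown above.
aq <- head(airquality[, c("Ozone", "Solar.R", "Wind", "Temp")], 10)

# lm() drops rows with any missing value when fitting (default na.action).
fit <- lm(Solar.R ~ Ozone + Wind + Temp, data = aq)

# predict() passes NAs through: rows with a missing predictor get NA.
pred <- predict(fit, newdata = aq)

# Rows whose prediction is NA (here, the rows where Ozone is missing).
na_rows <- unname(which(is.na(pred)))
print(na_rows)
```

Row 6 has a complete set of predictors, so it gets a prediction, while row 5 does not because Ozone is NA there.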

Here's my session info:

- Session info -------------------------------------------------------------------------------
 setting  value                       
 version  R version 4.0.2 (2020-06-22)
 os       Windows 10 x64              
 system   x86_64, mingw32             
 ui       RStudio                     
 language (EN)                        
 collate  English_United Kingdom.1252 
 ctype    English_United Kingdom.1252 
 tz       Europe/London               
 date     2020-07-09                  

- Packages -----------------------------------------------------------------------------------
 package     * version date       lib source        
 assertthat    0.2.1   2019-03-21 [1] CRAN (R 4.0.0)
 cli           2.0.2   2020-02-28 [1] CRAN (R 4.0.0)
 codetools     0.2-16  2018-12-24 [1] CRAN (R 4.0.2)
 crayon        1.3.4   2017-09-16 [1] CRAN (R 4.0.0)
 data.table  * 1.12.8  2019-12-09 [1] CRAN (R 4.0.0)
 dplyr       * 1.0.0   2020-05-29 [1] CRAN (R 4.0.0)
 ellipsis      0.3.1   2020-05-15 [1] CRAN (R 4.0.0)
 fansi         0.4.1   2020-01-08 [1] CRAN (R 4.0.0)
 generics      0.0.2   2018-11-29 [1] CRAN (R 4.0.0)
 glue          1.4.1   2020-05-13 [1] CRAN (R 4.0.0)
 lifecycle     0.2.0   2020-03-06 [1] CRAN (R 4.0.0)
 magrittr      1.5     2014-11-22 [1] CRAN (R 4.0.0)
 miceFast    * 0.6.1   2020-07-06 [1] CRAN (R 4.0.2)
 pillar        1.4.4   2020-05-05 [1] CRAN (R 4.0.0)
 pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.0.0)
 purrr         0.3.4   2020-04-17 [1] CRAN (R 4.0.0)
 R6            2.4.1   2019-11-12 [1] CRAN (R 4.0.0)
 Rcpp          1.0.5   2020-07-06 [1] CRAN (R 4.0.2)
 rlang         0.4.6   2020-05-02 [1] CRAN (R 4.0.0)
 rstudioapi    0.11    2020-02-07 [1] CRAN (R 4.0.0)
 sessioninfo   1.1.1   2018-11-05 [1] CRAN (R 4.0.2)
 tibble        3.0.1   2020-04-20 [1] CRAN (R 4.0.0)
 tidyselect    1.1.0   2020-05-11 [1] CRAN (R 4.0.0)
 vctrs         0.3.1   2020-06-05 [1] CRAN (R 4.0.0)
 withr         2.2.0   2020-04-20 [1] CRAN (R 4.0.0)

Any help would be great.
Thank you

@sebastian-fox

Not sure why, but this issue was duplicated when I submitted it.
