create other flavours of missing values #50
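Another name for the function is replace_na_type. This could have the arguments:
replace_na_type(.vars/.cols, # similar to `mutate`
                .predicate,
                .funs)
It might also need to follow the format of purrr::pmap, where you provide a named list, which would contain the variables/columns, and the rules for each of those.
Alternatively, since this isn't really doing any modification in place, but is instead adding things to the shadow dataframe, it might be more sensible to have different verbs for that process:
- add_shadow or
- mutate_na / mutate_na_type or
- replace_shadow / replace_shadow_type / replace_shadow_with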
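To make the named-list idea concrete, here is a minimal base-R sketch. `replace_na_type()` is the hypothetical verb floated above, not an existing naniar function, and the rule format (a `predicate` function plus a `suffix` per column) is an assumption standing in for the `purrr::pmap`-style list. The data are left untouched; only new `_NA` shadow columns are added, labelled `"!NA"`, `"NA"`, or `"NA_<suffix>"`.

```r
# Hypothetical replace_na_type(): not an existing naniar function.
# `rules` is a named list with one entry per column to flag; each entry
# holds a `predicate` (TRUE where the special missing applies) and a
# `suffix` used to build the shadow label "NA_<suffix>".
replace_na_type <- function(data, rules) {
  out <- data
  for (col in names(data)) {
    x <- data[[col]]
    # Genuine NAs become "NA"; everything else starts as "!NA"
    shadow <- ifelse(is.na(x), "NA", "!NA")
    rule <- rules[[col]]  # NULL when no rule was given for this column
    if (!is.null(rule)) {
      # Flag non-missing values that match the column's predicate
      hit <- !is.na(x) & rule$predicate(x)
      shadow[hit] <- paste0("NA_", rule$suffix)
    }
    out[[paste0(col, "_NA")]] <- shadow
  }
  out
}

df <- data.frame(wind = c(-99, 68, 72), temperature = c(45, NA, 25))
out <- replace_na_type(df, list(
  wind = list(predicate = function(x) x == -99, suffix = "TC")
))
out
```

A plain loop keeps the sketch dependency-free; the same rules list could equally be walked with purrr.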
If you haven't already, you should probably have a look at how haven deals
with flavours of missing in order to stay compatible with the tidyverse.
Ross
On 7 Jun 2017 12:39 p.m., "Nicholas Tierney" wrote:
Another name for the function is replace_na_type
This could have the arguments:
replace_na_type(.vars/.cols, # similar to `mutate`
.predicate,
.funs)
It might also need to follow the format of purrr::pmap, where you provide
a named list, which would contain the variables/columns, and the rules for
each of those.
*Alternatively*, since this isn't really doing any modification in place,
but is instead adding things to the shadow dataframe, it might be more
sensible to have different verbs for that process:
- add_shadow or
- mutate_na / mutate_na_type or
- replace_shadow / replace_shadow_type / replace_shadow_with
Great suggestion, thanks @rgayler! :)
OK, so here is how haven's tagged NAs work:
library(haven)
x <- c(1:5, tagged_na("a"), tagged_na("z"), NA)
# Tagged NA's work identically to regular NAs
x
#> [1] 1 2 3 4 5 NA NA NA
is.na(x)
#> [1] FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE
# To see that they're special, you need to use na_tag(),
# is_tagged_na(), or print_tagged_na():
is_tagged_na(x)
#> [1] FALSE FALSE FALSE FALSE FALSE TRUE TRUE FALSE
na_tag(x)
#> [1] NA NA NA NA NA "a" "z" NA
print_tagged_na(x)
#> [1] 1 2 3 4 5 NA(a) NA(z) NA
# You can test for specific tagged NAs with the second argument
is_tagged_na(x, "a")
#> [1] FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE
# Because the support for tagged NAs is somewhat tacked on to R,
# the left-most NA will tend to be preserved in arithmetic operations.
na_tag(tagged_na("a") + tagged_na("z"))
#> [1] "a"
I need to think more carefully about the implementation of this system, and how it fits into shadow values, so I'm going to move this to the next release of Narnia. One of the main goals with Narnia is to clearly expose these sorts of values to the user. I am unsure whether hiding attributes of an NA value is ideal here, although I do see the benefits with other features of R.
Another related thought that I've been wondering about is handling sparse
Just a note to look into
Just adding some of the relevant components of #76 here. Extending the ideas from #76 to shadow values allows us to directly specify the different flavours of missing values, providing the verbs:
This code would then only alter the shadow matrix and leave the data intact, allowing us to leverage other features of the shadow matrix, and possibly add an additional factor level that describes the missingness mechanism (!NA, NA, NA_, NA_).
There needs to be a way to store the "codebook / data dictionary" of missingness mechanisms, so that the user has a way to look up / describe what a value like
Although there are currently ways in haven to store the "different" values of missingness using
One approach I like so far could be something like this:
data %>%
  replace_shadow_where(.funs = ~.x == -99,
                       .why = "weather station too cold",
                       .suffix = "TC")
An additional idea then is to make the
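A rough base-R sketch of the proposed `replace_shadow_where()`, assuming the codebook lives in a data-frame attribute called `"na_codebook"` (an invented name) that maps each shadow label to its `.why`. Base R has no `~.x` lambda, so `.funs` takes an ordinary function here, and the column is passed explicitly as `col`; both are assumptions, not the proposed final interface.

```r
# Hypothetical replace_shadow_where(), sketched in base R. The codebook is
# kept in an attribute named "na_codebook" (an assumed name), a data frame
# mapping each shadow label to the reason it was recorded.
replace_shadow_where <- function(data, col, .funs, .why, .suffix) {
  book <- attr(data, "na_codebook")  # read before modifying columns
  shadow_col <- paste0(col, "_NA")
  x <- data[[col]]
  # Create the shadow column from genuine NAs if it is not there yet
  if (is.null(data[[shadow_col]])) {
    data[[shadow_col]] <- ifelse(is.na(x), "NA", "!NA")
  }
  label <- paste0("NA_", .suffix)
  # Only the shadow column changes; data[[col]] is left as recorded
  data[[shadow_col]][!is.na(x) & .funs(x)] <- label
  # Record what the new label means so users can look it up later
  entry <- data.frame(label = label, why = .why, stringsAsFactors = FALSE)
  attr(data, "na_codebook") <- if (is.null(book)) entry else rbind(book, entry)
  data
}

df <- data.frame(wind = c(-99, 68, 72), temperature = c(45, NA, 25))
out <- replace_shadow_where(df, "wind",
                            .funs = function(x) x == -99,
                            .why = "weather station too cold",
                            .suffix = "TC")
out$wind_NA
attr(out, "na_codebook")
```

Keeping the codebook next to the data means a label like `NA_TC` can always be looked up, rather than relying on the user to remember what each suffix stands for.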
Current progress:
library(naniar)
library(tidyverse)
df <- tribble(
~wind, ~temperature,
-99, 45,
68, NA,
72, 25
)
dfs <- bind_shadow(df)
map(dfs, levels)
dfs_special <- recode_shadow(dfs,
temperature = .where(wind == -99 ~ "bananas"))
dfs_special
#> # A tibble: 3 x 4
#> wind temperature wind_NA temperature_NA
#> <dbl> <dbl> <fct> <fct>
#> 1 -99. 45. !NA NA_bananas
#> 2 68. NA !NA NA
#> 3 72. 25. !NA !NA
map(dfs_special, levels)
#> $wind
#> NULL
#>
#> $temperature
#> NULL
#>
#> $wind_NA
#> [1] "!NA" "NA" "NA_bananas"
#>
#> $temperature_NA
#> [1] "!NA" "NA" "NA_bananas"
Created on 2018-03-16 by the reprex package (v0.2.0).
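In the output above, `wind_NA` also gains the `"NA_bananas"` level even though only `temperature` was recoded. One way to get that behaviour, sketched in base R and assuming shadow columns are plain factors, is to give every shadow column the same level set before recoding. `add_shadow_level()` and `to_shadow()` are hypothetical helpers, not naniar functions.

```r
# Hypothetical helper: give every shadow column one shared set of levels,
# appending `new_level`, so all shadow columns stay comparable.
add_shadow_level <- function(shadow, new_level) {
  lvls <- union(unlist(lapply(shadow, levels)), new_level)
  shadow[] <- lapply(shadow, factor, levels = lvls)
  shadow
}

df <- data.frame(wind = c(-99, 68, 72), temperature = c(45, NA, 25))
# Plain-factor shadow columns built from the genuine NAs
to_shadow <- function(x) {
  factor(ifelse(is.na(x), "NA", "!NA"), levels = c("!NA", "NA"))
}
shadow <- data.frame(wind_NA = to_shadow(df$wind),
                     temperature_NA = to_shadow(df$temperature))

# Add the level everywhere first, then recode temperature where wind == -99
shadow <- add_shadow_level(shadow, "NA_bananas")
shadow$temperature_NA[df$wind == -99] <- "NA_bananas"
lapply(shadow, levels)
```

A shared level set means the shadow columns can be stacked or compared (for example, reshaped into one long column) without factor-level mismatches.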
Next steps:
@njtierney just a thought for ya, but
Thanks @mpadge! That is something I have been thinking about, but at the moment I am sticking with the idea of expanding the dataframe out into the data and its shadows. In the future I am interested in looking at collapsing things back down into the dataframe. On the note of labelled features, here are some packages that work with them (for future reference to myself):
Hola @caitlinhudon, tagging you in here following the discussion on Twitter about your awesome talk.
Here is the section on labelled missing data:
This looks like a nicely scoped out idea, which is great! But I think I want to take my idea for
OK, just dusted off the "special-missing" branch; I get the same output as before:
library(naniar)
library(tidyverse)
df <- tribble(
~wind, ~temperature,
-99, 45,
68, NA,
72, 25
)
dfs <- bind_shadow(df)
map(dfs, levels)
#> $wind
#> NULL
#>
#> $temperature
#> NULL
#>
#> $wind_NA
#> [1] "!NA" "NA"
#>
#> $temperature_NA
#> [1] "!NA" "NA"
map(dfs, class)
#> $wind
#> [1] "numeric"
#>
#> $temperature
#> [1] "numeric"
#>
#> $wind_NA
#> [1] "shadow" "factor"
#>
#> $temperature_NA
#> [1] "shadow" "factor"
is_shadow(dfs)
#> [1] TRUE
are_shadow(dfs)
#> wind temperature wind_NA temperature_NA
#> FALSE FALSE TRUE TRUE
any_shadow(dfs)
#> Error in any_shadow(dfs): could not find function "any_shadow"
class(dfs)
#> [1] "shadow" "tbl_df" "tbl" "data.frame"
dfs_special <- dfs %>%
recode_shadow(temperature = .where(wind == -99 ~ "bananas"))
dfs_special
#> # A tibble: 3 x 4
#> wind temperature wind_NA temperature_NA
#> <dbl> <dbl> <fct> <fct>
#> 1 -99 45 !NA NA_bananas
#> 2 68 NA !NA NA
#> 3 72 25 !NA !NA
map(dfs_special, levels)
#> $wind
#> NULL
#>
#> $temperature
#> NULL
#>
#> $wind_NA
#> [1] "!NA" "NA" "NA_bananas"
#>
#> $temperature_NA
#> [1] "!NA" "NA" "NA_bananas"
Created on 2018-06-14 by the reprex package (v0.2.0).
Current tasks:
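`any_shadow()` was not defined on the branch, hence the error in the reprex. A hypothetical base-R version only needs to check each column for the `"shadow"` class, which is how the shadow columns are classed in the output (`"shadow" "factor"`). `make_shadow()`, `shadow_cols()` and this `any_shadow()` are illustrative sketches; naniar's own implementation may differ.

```r
# Hypothetical helpers: naniar's own implementation may differ.
# A shadow column is assumed to carry the class c("shadow", "factor").
make_shadow <- function(x) {
  structure(factor(ifelse(is.na(x), "NA", "!NA"), levels = c("!NA", "NA")),
            class = c("shadow", "factor"))
}

# Which columns are shadow columns (named logical vector)
shadow_cols <- function(data) vapply(data, inherits, logical(1), what = "shadow")

# TRUE if at least one column is a shadow column
any_shadow <- function(data) any(shadow_cols(data))

dfs <- data.frame(wind = c(-99, 68, 72), temperature = c(45, NA, 25))
dfs$wind_NA <- make_shadow(dfs$wind)
dfs$temperature_NA <- make_shadow(dfs$temperature)
shadow_cols(dfs)
any_shadow(dfs)
any_shadow(dfs[c("wind", "temperature")])
```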
See this SO question for an idea on one practical implementation / statement of need.
Building on issues #25 and #31, and discussions with @rgayler, there needs to be a way to create different flavours of missing values to indicate different mechanisms.
An example of this could be where a weather station records -99 as a missing value, but missing specifically because the weather was so cold that the instruments stopped working.
Currently in R there is only one kind of NA value (ignoring NA_integer_ ... and friends). So there needs to be a way to specify your own missing value, NA_this (or something).
This might be a function like tidyr::replace_na, perhaps instead called replace_na_why or something.
This might look like
This would then create a value NA_TC, which then has a mechanism recorded.
Since R does not treat these as missing, we would incorporate this into the shadow matrix values !NA, NA, and NA_.why.
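A small base-R sketch of the idea. R's typed NAs each carry a type but no reason, which is the gap described above. `replace_na_why()` is the hypothetical name suggested in this issue; here it is assumed to leave the data vector untouched and return a shadow factor with levels `"!NA"`, `"NA"`, and `"NA_<why>"`.

```r
# Base R's typed NAs carry a type but no reason for the missingness
typed_na <- list(NA, NA_integer_, NA_real_, NA_character_)
vapply(typed_na, typeof, character(1))  # "logical" "integer" "double" "character"

# Hypothetical replace_na_why(): the data vector is not modified; a shadow
# factor is returned where values matching `is_special` become "NA_<why>"
replace_na_why <- function(x, is_special, why) {
  label <- paste0("NA_", why)
  shadow <- ifelse(is.na(x), "NA", "!NA")
  shadow[!is.na(x) & is_special(x)] <- label
  factor(shadow, levels = c("!NA", "NA", label))
}

wind <- c(-99, 68, NA, 72)
is.na(wind)  # -99 is not treated as missing by R
replace_na_why(wind, function(x) x == -99, why = "TC")
```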