add .common_na_strings #168
It would be ideal if this used some fancy regex, I think.
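As a rough illustration of the regex idea, here is a minimal sketch that assumes we only want whole-cell matches of "NA"-like spellings, ignoring case and stray spaces. The name `na_like` is hypothetical and not part of naniar. Anchoring with `^` and `$` keeps it from flagging ordinary words that merely contain "na".

```r
# Hypothetical pattern (not part of naniar): matches "NA", "N/A", "n / a",
# " na " and similar spacing/case variants, but not words like "NAME".
na_like <- "^\\s*n\\s*/?\\s*a\\s*$"

x <- c("NA", "N / A", " n/a", "na ", "NAME", "banana", "A")

# ignore.case = TRUE covers both "NA" and "na" with one pattern
grepl(na_like, x, ignore.case = TRUE)
#> [1]  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE
```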
Current list of "NA" values:
Recently I've encountered "?".
"" (the empty string) is one I've encountered a few times when going from Excel to R.
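When the data comes in as a CSV exported from Excel, one option is to declare "" and "?" as missing at import time through the `na.strings` argument of base R's `read.csv()`. This is a sketch only; the CSV text below is made up for illustration.

```r
# Made-up CSV text with "?" and empty cells used as placeholders
csv_text <- "id,score,comment
1,10,ok
2,?,
3,N/A,late
4,7,?"

# Every string listed in na.strings is read in as NA
dat <- read.csv(text = csv_text, na.strings = c("", "NA", "N/A", "?"))
str(dat)
```

Because "?" and "N/A" are converted at import, the `score` column comes back numeric instead of character.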
I'd add:
Also "99". Beware of "66", "77" and "88", which usually mean things like "Didn't want to respond" or "Question skipped".
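Since codes like 66, 77 and 88 can carry meaning, one approach is to record the reason before blanking the values out. Below is a base-R sketch on made-up survey responses; the mapping from code to label (`nonresponse`) is hypothetical and would depend on the survey's codebook.

```r
# Made-up responses; 66/77/88/99 are sentinel codes, not real answers
responses <- c(3, 5, 99, 66, 2, 77, 88, 1)

# Hypothetical codebook: the labels are assumptions for illustration only
nonresponse <- c(`66` = "refused", `77` = "skipped", `88` = "other", `99` = "missing")

# Look up a reason for each response (NA where the value is a real answer)
reason <- unname(nonresponse[as.character(responses)])

# Only now convert all the codes to NA in the analysis column
responses[responses %in% c(66, 77, 88, 99)] <- NA

responses
#> [1]  3  5 NA NA  2 NA NA  1
```

Keeping `reason` alongside the cleaned column means "refused" and "skipped" stay distinguishable after they are both treated as missing.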
I've seen "NR" and "nr" for "not read". (Reply to Jorge Cimentada, 29 May 2018.)
Thank you everyone for your comments, I really appreciate them! I'm also linking the Twitter question, which had some nice examples as well:
Thank you everyone for your comments! Here is a quick demo of `common_na_strings`, `common_na_numbers`, and `miss_scan_count()`:

```r
library(naniar)

common_na_strings
#> [1] "NA" "N A" "N/A" "NA " " NA" "N /A" "N / A"
#> [8] " N / A" "N / A " "na" "n a" "n/a" "na " " na"
#> [15] "n /a" "n / a" " a / a" "n / a " "?" "." "NULL"
#> [22] "null" "" "."

common_na_numbers
#> [1] -9 -99 -999 -9999 9999 66 77 88

# Toy data with sentinel values and "N/A" strings
dat_ms <- tibble::tribble(~x, ~y, ~z,
                          1, "A", -100,
                          3, "N/A", -99,
                          NA, NA, -98,
                          -99, "E", -101,
                          -98, "F", -1)

miss_scan_count(dat_ms, -99)
#> # A tibble: 3 x 2
#> Variable n
#> <chr> <int>
#> 1 x 1
#> 2 y 0
#> 3 z 1

miss_scan_count(dat_ms, c(-99, -98))
#> # A tibble: 3 x 2
#> Variable n
#> <chr> <int>
#> 1 x 2
#> 2 y 0
#> 3 z 2

miss_scan_count(dat_ms, c("-99", "-98", "N/A"))
#> # A tibble: 3 x 2
#> Variable n
#> <chr> <int>
#> 1 x 2
#> 2 y 1
#> 3 z 2

miss_scan_count(dat_ms, common_na_numbers)
#> # A tibble: 3 x 2
#> Variable n
#> <chr> <int>
#> 1 x 2
#> 2 y 0
#> 3 z 2

miss_scan_count(dat_ms, common_na_strings)
#> # A tibble: 3 x 2
#> Variable n
#> <chr> <int>
#> 1 x 4
#> 2 y 4
#> 3 z 5
```

Created on 2018-06-05 by the reprex package (v0.2.0).
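In the output above, `common_na_numbers` counts 2 values in `x` and `z` even though only -99 from those columns is in that list, and `common_na_strings` flags nearly every cell. That looks consistent with the search values being treated as patterns rather than matched exactly (for example, "-9" would also match "-99" and "-98", and "." matches any character), though this is an inference from the counts, not something stated here. For comparison, a minimal base-R sketch of exact matching; `count_exact()` is a hypothetical helper, not a naniar function.

```r
# Hypothetical helper (not part of naniar): count cells in each column whose
# value, compared as text, exactly equals one of `values`.
count_exact <- function(df, values) {
  vapply(df, function(col) sum(as.character(col) %in% as.character(values)),
         integer(1))
}

# Same toy data as the reprex above, built with base R
dat_ms <- data.frame(
  x = c(1, 3, NA, -99, -98),
  y = c("A", "N/A", NA, "E", "F"),
  z = c(-100, -99, -98, -101, -1),
  stringsAsFactors = FALSE
)

# The values printed for common_na_numbers above
nums <- c(-9, -99, -999, -9999, 9999, 66, 77, 88)

count_exact(dat_ms, nums)                        # x = 1, y = 0, z = 1
count_exact(dat_ms, c("-99", "-98", "N/A"))      # x = 2, y = 1, z = 2
```

For exact values such as `c("-99", "-98", "N/A")` both approaches agree; they only diverge when a search value is a prefix or pattern that also fits other cells.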
There are many other ways to represent missing values. I would like to include a larger collection of such values that naniar can search for, and also replace. This would cover things like:
etc.
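For the "replace" half of the idea, here is one way it could look in base R. This is a sketch under assumptions: `replace_values_with_na()` is a hypothetical name, not naniar's API, and it uses exact (not pattern) matching so that a value like "-9" cannot accidentally blank out -99.

```r
# Hypothetical helper (not naniar's API): replace every cell whose value,
# compared as text, exactly matches one of `values` with NA, column by column.
replace_values_with_na <- function(df, values) {
  df[] <- lapply(df, function(col) {
    col[as.character(col) %in% as.character(values)] <- NA
    col
  })
  df
}

dat <- data.frame(
  x = c(1, -99, 3, 88),
  y = c("A", "N/A", "", "NR"),
  stringsAsFactors = FALSE
)

# A small, made-up list mixing strings and a numeric code from this thread
na_values <- c("NA", "N/A", "", "?", "NR", "nr", -99)

replace_values_with_na(dat, na_values)
```

Here 88 is deliberately left out of `na_values`, since codes like 88 may carry meaning and are better handled separately.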