add .common_na_strings #168

Closed
njtierney opened this Issue May 28, 2018 · 10 comments

njtierney commented May 28, 2018

There are many other ways to represent missing values. I would like to include a large vector of these values that naniar can search for and also replace. This would cover things like:

  • "N A "
  • "NA "
  • " NA"
  • "N/A"
  • "N / A"

etc.
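
A minimal base R sketch of the idea, for illustration only. The vector `na_strings` and the helper `replace_na_strings()` are hypothetical names and are not part of naniar:

```r
# Hypothetical lookup vector of strings to treat as missing (not naniar's own).
na_strings <- c("N A ", "NA ", " NA", "N/A", "N / A")

# Hypothetical helper: replace cells that exactly equal one of the
# lookup strings with NA_character_.
replace_na_strings <- function(x, na_strings) {
  x[x %in% na_strings] <- NA_character_
  x
}

x <- c("a", "N/A", " NA", "b", "N / A")
cleaned <- replace_na_strings(x, na_strings)
print(cleaned)
#> [1] "a" NA  NA  "b" NA
```

Exact matching with `%in%` only catches the spellings that are listed, which is why a long list of variants is needed.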

njtierney commented May 28, 2018

I think it would be ideal if this used a regex to cover these variants.
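
One possible shape for such a regex, as a base R sketch. The pattern `na_pattern` and the helper `is_na_like()` are hypothetical and only cover the "NA" and "N/A" family of spellings, not values like "?" or "NULL":

```r
# Hypothetical pattern: optional surrounding whitespace, "n", optional
# whitespace, an optional slash, optional whitespace, then "a".
na_pattern <- "^\\s*n\\s*/?\\s*a\\s*$"

is_na_like <- function(x) {
  # grepl() returns FALSE for NA input, so real NAs are flagged separately.
  is.na(x) | grepl(na_pattern, x, ignore.case = TRUE, perl = TRUE)
}

x <- c("NA", " n / a ", "N/A", "N A", "nah", "banana", NA)
flags <- is_na_like(x)
print(flags)
#> [1]  TRUE  TRUE  TRUE  TRUE FALSE FALSE  TRUE
```

Anchoring with `^` and `$` matters here: without the anchors, words such as "banana" would also match.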

njtierney added a commit that referenced this issue May 28, 2018

njtierney commented May 29, 2018

Current list of "NA" values:

  • "NA"
  • "N A"
  • "N/A"
  • "NA "
  • " NA"
  • "N /A"
  • "N / A"
  • " N / A"
  • "N / A "
  • "na"
  • "n a"
  • "n/a"
  • "na "
  • " na"
  • "n /a"
  • "n / a"
  • " a / a"
  • "n / a "

b-rodrigues commented May 29, 2018

Recently I've encountered "?".

EdwinTh commented May 29, 2018

  • "NULL"
  • "null"
  • "-999"
  • "-"

apear9 commented May 29, 2018

"" is one I've encountered a few times when going from Excel to R.

sinarueeger commented May 29, 2018

I'd add:

  • "." (origin: data extracted from STATA)
  • "-9" (when a variable is supposed to take a small range of values, e.g. [0,1,2])

cimentadaj commented May 29, 2018

Also "99"

Beware of "66", "77" and "88", which usually mean things like "Didn't want to respond" or "Question skipped".
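
A small base R sketch of why numeric codes are best handled per variable. The data frame `survey` and the list `na_codes` are made up for illustration:

```r
# Hypothetical survey data: in `age`, 99 is a real value; in `q1`, 99 means
# missing and 77 means "question skipped".
survey <- data.frame(age = c(34, 99, 51), q1 = c(1, 99, 77))

# Hypothetical per-column missing codes. 77 is deliberately left out so the
# "question skipped" information is not lost.
na_codes <- list(q1 = 99)

# Recode only the listed codes, and only in the listed columns.
for (col in names(na_codes)) {
  survey[[col]][survey[[col]] %in% na_codes[[col]]] <- NA
}
print(survey)
```

Replacing 99 across the whole data frame would have wiped out the real age of 99 as well.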

TonyLadson commented May 30, 2018

njtierney commented Jun 4, 2018

Thank you everyone for your comments, really appreciate them!

I'm also linking in the Twitter question, which had some nice examples as well:

https://twitter.com/nj_tierney/status/1001340686409482240

njtierney modified the milestone: V0.3.0 Jun 5, 2018

njtierney added the V0.3.0 label Jun 5, 2018

njtierney added this to the V0.3.0 milestone Jun 5, 2018

njtierney removed the V0.3.0 label Jun 5, 2018

njtierney closed this in 3ce472e Jun 5, 2018

njtierney commented Jun 5, 2018

Thank you everyone for your comments! Here are common_na_strings and common_na_numbers in action:

library(naniar)

common_na_strings
#>  [1] "NA"     "N A"    "N/A"    "NA "    " NA"    "N /A"   "N / A" 
#>  [8] " N / A" "N / A " "na"     "n a"    "n/a"    "na "    " na"   
#> [15] "n /a"   "n / a"  " a / a" "n / a " "?"      "."      "NULL"  
#> [22] "null"   ""       "."
common_na_numbers
#> [1]    -9   -99  -999 -9999  9999    66    77    88

dat_ms <- tibble::tribble(~x,  ~y,    ~z,
                         1,   "A",   -100,
                         3,   "N/A", -99,
                         NA,  NA,    -98,
                         -99, "E",   -101,
                         -98, "F",   -1)

miss_scan_count(dat_ms, -99)
#> # A tibble: 3 x 2
#>   Variable     n
#>   <chr>    <int>
#> 1 x            1
#> 2 y            0
#> 3 z            1
miss_scan_count(dat_ms, c(-99,-98))
#> # A tibble: 3 x 2
#>   Variable     n
#>   <chr>    <int>
#> 1 x            2
#> 2 y            0
#> 3 z            2
miss_scan_count(dat_ms, c("-99","-98","N/A"))
#> # A tibble: 3 x 2
#>   Variable     n
#>   <chr>    <int>
#> 1 x            2
#> 2 y            1
#> 3 z            2
miss_scan_count(dat_ms, common_na_numbers)
#> # A tibble: 3 x 2
#>   Variable     n
#>   <chr>    <int>
#> 1 x            2
#> 2 y            0
#> 3 z            2
miss_scan_count(dat_ms, common_na_strings)
#> # A tibble: 3 x 2
#>   Variable     n
#>   <chr>    <int>
#> 1 x            4
#> 2 y            4
#> 3 z            5

Created on 2018-06-05 by the reprex package (v0.2.0).
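
A note on the last result above: the counts (4, 4, 5) equal the number of non-missing cells in each column. One plausible explanation, assuming the search terms are treated as regular expressions (an assumption about the implementation, not something confirmed in this thread), is that "." and "" match any non-missing value. The base R sketch below shows the difference between regex, literal, and exact matching on the `y` column:

```r
y <- c("A", "N/A", NA, "E", "F")

# Regex matching: "." matches any single character, so every non-NA value hits.
n_regex <- sum(grepl(".", y))
n_regex
#> [1] 4

# Literal substring matching: no value in y contains an actual dot.
n_fixed <- sum(grepl(".", y, fixed = TRUE))
n_fixed
#> [1] 0

# Exact matching against the whole cell.
n_exact <- sum(y %in% c("N/A", "."))
n_exact
#> [1] 1
```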
