add .common_na_strings #168

Closed
njtierney opened this issue May 28, 2018 · 10 comments

@njtierney (Owner) commented May 28, 2018

There are many other ways to represent missing values. I would like to include a big vector of strings that naniar can search for, and also replace. This would cover things like:

  • "N A "
  • "NA "
  • " NA"
  • "N/A"
  • "N / A"

etc.
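As a rough illustration of "search for, and also replace", here is a base R sketch that assumes exact, whole-cell matching against a vector of NA spellings. The vector is a hand-picked subset of the examples above, and `na_strings` and `df` are illustrative names, not naniar objects.

```r
# Illustrative subset of NA spellings (exact, whole-cell matches only).
na_strings <- c("NA", "N A", "N/A", "NA ", " NA", "N / A")

# Made-up data frame with two disguised missing values in `value`.
df <- data.frame(
  id = 1:4,
  value = c("12", "N/A", " NA", "7"),
  stringsAsFactors = FALSE
)

# Search: count the cells in each column that equal one of the strings.
counts <- sapply(df, function(col) sum(col %in% na_strings))
print(counts)

# Replace: turn those cells into real missing values.
df[] <- lapply(df, function(col) replace(col, col %in% na_strings, NA))
print(df)
```

With exact matching, a variant such as " N A" (extra leading space) is only caught if it is in the vector, which is why the list of spellings grows quickly.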

@njtierney (Owner, Author) commented May 28, 2018

I think it would be ideal if this used some fancy regex.
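One possible shape for that regex, sketched in base R under the assumption that the targets are "na" and "n/a" in any case, with optional spaces around each character. `is_na_string()` is a hypothetical helper, not a naniar function.

```r
# Hypothetical helper (not part of naniar): flag strings that look like
# "NA" or "N/A", ignoring case and any spaces around the letters or slash.
is_na_string <- function(x) {
  # ^\s*    optional leading whitespace
  # n\s*    the letter n, then optional spaces
  # /?\s*   an optional slash, then optional spaces
  # a\s*$   the letter a, then optional trailing whitespace
  grepl("^\\s*n\\s*/?\\s*a\\s*$", x, ignore.case = TRUE)
}

candidates <- c("NA", "N A", "N/A", "NA ", " NA", "N /A", "N / A",
                " N / A", "n / a ", "nan", "banana", "A")
print(data.frame(value = candidates, is_na = is_na_string(candidates)))
```

Because the pattern is anchored at both ends, it accepts the spacing variants but rejects words that merely contain "na", such as "nan" or "banana".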

njtierney added a commit that referenced this issue May 28, 2018

@njtierney (Owner, Author) commented May 29, 2018

Current list of "NA" values:

  • "NA"
  • "N A"
  • "N/A"
  • "NA "
  • " NA"
  • "N /A"
  • "N / A"
  • " N / A"
  • "N / A "
  • "na"
  • "n a"
  • "n/a"
  • "na "
  • " na"
  • "n /a"
  • "n / a"
  • " a / a"
  • "n / a "

@b-rodrigues commented May 29, 2018

Recently I've encountered "?".

@EdwinTh commented May 29, 2018

  • "NULL"
  • "null"
  • "-999"
  • "-"

@apear9 commented May 29, 2018

"" is one I've encountered a few times when going from Excel to R.

@sinarueeger commented May 29, 2018

I'd add:

  • "." (origin: data extracted from Stata)
  • "-9" (when a variable is supposed to take a range of values, e.g. [0,1,2])
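Codes like "", "." and "-9" can often be handled at import time. A minimal base R sketch, assuming the data arrive as CSV: `read.csv()` accepts an `na.strings` vector of cell values to read as NA. The CSV text is made up, and treating "-9" as missing is only appropriate when, as described above, it is a sentinel for that variable.

```r
# Made-up CSV: a blank cell (as from Excel), "." (as from Stata), and -9.
csv <- "id,score\n1,10\n2,\n3,.\n4,-9"

# Cells exactly equal to any of these strings are read as NA.
dat <- read.csv(text = csv, na.strings = c("", ".", "-9", "NA"))
print(dat)
```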

@cimentadaj commented May 29, 2018

Also "99".

Beware of "66", "77" and "88", which are usually codes for things like "Didn't want to respond" or "Question skipped".
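If those codes carry meaning, one option is to record the reason before blanking the value. A base R sketch, assuming a single numeric survey answer; the codebook labels are hypothetical placeholders, so use the ones from your own survey documentation.

```r
# Made-up survey answers; 66/77/88/99 are response codes, not real answers.
answer <- c(1, 77, 3, 99, 66, 2)

# Hypothetical codebook: labels are placeholders, check the survey's own.
codebook <- c("66" = "refused", "77" = "skipped",
              "88" = "don't know", "99" = "missing")

# Keep the reason for each coded value (NA for ordinary answers).
reason <- unname(codebook[as.character(answer)])

# Then replace the codes themselves with NA.
answer[answer %in% as.numeric(names(codebook))] <- NA

print(data.frame(answer, reason))
```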

@TonyLadson commented May 30, 2018

@njtierney (Owner, Author) commented Jun 4, 2018

Thank you everyone for your comments, really appreciate them!

I'm also linking the Twitter question, which had some nice examples as well:

https://twitter.com/nj_tierney/status/1001340686409482240

@njtierney njtierney modified the milestone: V0.3.0 Jun 5, 2018
@njtierney njtierney added the V0.3.0 label Jun 5, 2018
@njtierney njtierney added this to the V0.3.0 milestone Jun 5, 2018
@njtierney njtierney removed the V0.3.0 label Jun 5, 2018
@njtierney njtierney closed this in 3ce472e Jun 5, 2018

@njtierney (Owner, Author) commented Jun 5, 2018

Thank you everyone for your comments! Here are common_na_strings and common_na_numbers in action:

library(naniar)

common_na_strings
#>  [1] "NA"     "N A"    "N/A"    "NA "    " NA"    "N /A"   "N / A" 
#>  [8] " N / A" "N / A " "na"     "n a"    "n/a"    "na "    " na"   
#> [15] "n /a"   "n / a"  " a / a" "n / a " "?"      "."      "NULL"  
#> [22] "null"   ""       "."
common_na_numbers
#> [1]    -9   -99  -999 -9999  9999    66    77    88

dat_ms <- tibble::tribble(~x,  ~y,    ~z,
                         1,   "A",   -100,
                         3,   "N/A", -99,
                         NA,  NA,    -98,
                         -99, "E",   -101,
                         -98, "F",   -1)

miss_scan_count(dat_ms, -99)
#> # A tibble: 3 x 2
#>   Variable     n
#>   <chr>    <int>
#> 1 x            1
#> 2 y            0
#> 3 z            1
miss_scan_count(dat_ms, c(-99,-98))
#> # A tibble: 3 x 2
#>   Variable     n
#>   <chr>    <int>
#> 1 x            2
#> 2 y            0
#> 3 z            2
miss_scan_count(dat_ms, c("-99","-98","N/A"))
#> # A tibble: 3 x 2
#>   Variable     n
#>   <chr>    <int>
#> 1 x            2
#> 2 y            1
#> 3 z            2
miss_scan_count(dat_ms, common_na_numbers)
#> # A tibble: 3 x 2
#>   Variable     n
#>   <chr>    <int>
#> 1 x            2
#> 2 y            0
#> 3 z            2
miss_scan_count(dat_ms, common_na_strings)
#> # A tibble: 3 x 2
#>   Variable     n
#>   <chr>    <int>
#> 1 x            4
#> 2 y            4
#> 3 z            5

Created on 2018-06-05 by the reprex package (v0.2.0).
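The counts for common_na_strings (4, 4 and 5) equal the number of non-missing values in x, y and z, and the common_na_numbers count of 2 for x includes -98, which is not in that list. Both are consistent with the search terms being matched as regular expressions or substrings: "." matches any character, and "-9" is a substring of "-98". That reading is an inference from the output above, not a statement about naniar's internals. The base R sketch below (plain `grepl()` and `%in%`, not naniar code) reproduces the effect on column x.

```r
# Column x from dat_ms, as character (how a text search would see it).
x <- c("1", "3", NA, "-99", "-98")

# Regex: "." matches any single character, so every non-NA value is counted.
n_regex <- sum(grepl(".", x))
print(n_regex)       # 4

# Substring: "-9" is found inside both "-99" and "-98".
n_substring <- sum(grepl("-9", x, fixed = TRUE))
print(n_substring)   # 2

# Exact: only cells equal to a search term are counted.
n_exact <- sum(x %in% c(".", "-9", "-99"))
print(n_exact)       # 1
```

Matching whole values, for example with `%in%`, avoids counting values that merely contain a search term.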

7 participants