Fix default warning #48

Merged
merged 9 commits into from Mar 1, 2019

zkamvar commented Feb 27, 2019

This will fix #47

  • no longer warns if there's no default to replace
  • now has the option to warn if things are caught by the default
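The two behaviours above can be sketched in base R. This is only an illustration, not the code in this PR: `replace_with_default()` is a hypothetical helper, and the rule that values already equal to a target in `to` are left alone is an assumption inferred from the reprex output further down, where only 'howdy' is reported.

```r
# Hypothetical helper, NOT linelist's implementation.
# `from` and `to` are parallel character vectors; a ".default" entry in
# `from` gives the replacement for any value not otherwise listed.
replace_with_default <- function(x, from, to, warn = FALSE) {
  x <- as.character(x)
  is_default <- from %in% ".default"

  # Apply the explicit replacements first.
  keys <- from[!is_default]
  vals <- to[!is_default]
  hit  <- match(x, keys)
  out  <- ifelse(is.na(hit), x, vals[hit])

  # Without a ".default" entry there is nothing to replace,
  # so there is nothing to warn about either.
  if (!any(is_default)) {
    return(out)
  }

  default_value <- to[is_default][1]

  # Assumption: values that are already a valid target are kept as they are.
  unmatched <- !is.na(x) & is.na(hit) & !(x %in% to)
  if (any(unmatched)) {
    if (warn) {
      changed <- unique(x[unmatched])
      warning(
        paste0("'", changed, "' was changed to the default value ('",
               default_value, "')", collapse = "\n"),
        call. = FALSE
      )
    }
    out[unmatched] <- default_value
  }
  out
}

x    <- c("hopsital", "hospital", "howdy", "feild")
from <- c("hopsital", "feild", ".default")
to   <- c("hospital", "field", "unknown")

replace_with_default(x, from, to)
# "hospital" "hospital" "unknown" "field"
```

With `warn = TRUE` the same call also reports that 'howdy' fell through to the default; dropping the `.default` row leaves 'howdy' untouched and stays silent.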

zkamvar marked this pull request as ready for review February 27, 2019 17:03

zkamvar commented Feb 27, 2019

Currently clean_variable_spelling() has a warn = FALSE argument that, when set to TRUE, aggregates the warnings and indicates which columns they belong to. This argument is not yet exposed through clean_variables() or clean_data():

set.seed(2019-02-19)
library("linelist")
toy_data <- messy_data(10)

messy_locations <- c("hopsital", "h\u00f4pital", "hospital", 
                     "m\u00e9dical", "clinic", 
                     "feild", "field", NA, "howdy")

toy_data$location <- factor(sample(messy_locations, 10, replace = TRUE))
toy_data <- tibble::as_tibble(toy_data)

wordlist <- data.frame(
  from  = c("hopsital", "hopital",  "medical", "feild", NA, ".default"),
  to = c("hospital", "hospital", "clinic",  "field", "unknown", "unknown"),
  variable = rep("location", 6),
  stringsAsFactors = FALSE
)

global_words <- data.frame(
  from = "not_a_case",
  to = "not a case",
  variable = ".global",
  stringsAsFactors = FALSE
)

cleaned_data <- clean_data(toy_data)

# clean_variable_spelling has a `warn` argument
clean_variable_spelling(cleaned_data, cbind(wordlist, global_words), warn = TRUE)
#> Warning in clean_variable_spelling(cleaned_data, cbind(wordlist, global_words), : The following warnings were found...
#>   location__:
#>   .... NA was present in the first column of d; replacing with the character 'NA' If you want to indicate missing data, use the '.missing' keyword.
#>   .... 'howdy' was changed to the default value ('unknown')
#> # A tibble: 10 x 9
#>    id    date_of_onset discharge  gender epi_case_defini… messy_dates   lat
#>    <fct> <date>        <date>     <fct>  <fct>            <date>      <dbl>
#>  1 mxtw… 2018-01-02    2018-01-12 female suspected        NA          14.1 
#>  2 fdco… 2018-01-04    2018-01-14 female confirmed        NA          14.4 
#>  3 iwbo… 2018-01-02    2018-01-12 female suspected        NA          16.1 
#>  4 uboe… 2018-01-02    2018-01-12 female suspected        2018-10-18  14.7 
#>  5 dbbc… 2018-01-11    2018-01-21 male   not_a_case       1989-12-24  12.3 
#>  6 zzgy… 2018-01-06    2018-01-16 male   not_a_case       2001-12-01  13.0 
#>  7 nilp… 2018-01-02    2018-01-12 female not_a_case       NA          13.6 
#>  8 rmwa… 2018-01-10    2018-01-20 female suspected        NA           9.27
#>  9 bfzo… 2018-01-11    2018-01-21 male   suspected        1989-12-24  13.0 
#> 10 atwa… 2018-01-06    2018-01-16 female not_a_case       1989-12-24  13.5 
#> # … with 2 more variables: lon <dbl>, location <fct>

# clean_data doesn't have this option yet
clean_data(cleaned_data, wordlists = cbind(wordlist, global_words))
#> # A tibble: 10 x 9
#>    id    date_of_onset discharge  gender epi_case_defini… messy_dates   lat
#>    <fct> <date>        <date>     <fct>  <fct>            <date>      <dbl>
#>  1 mxtw… 2018-01-02    2018-01-12 female suspected        NA          14.1 
#>  2 fdco… 2018-01-04    2018-01-14 female confirmed        NA          14.4 
#>  3 iwbo… 2018-01-02    2018-01-12 female suspected        NA          16.1 
#>  4 uboe… 2018-01-02    2018-01-12 female suspected        2018-10-18  14.7 
#>  5 dbbc… 2018-01-11    2018-01-21 male   not_a_case       1989-12-24  12.3 
#>  6 zzgy… 2018-01-06    2018-01-16 male   not_a_case       2001-12-01  13.0 
#>  7 nilp… 2018-01-02    2018-01-12 female not_a_case       NA          13.6 
#>  8 rmwa… 2018-01-10    2018-01-20 female suspected        NA           9.27
#>  9 bfzo… 2018-01-11    2018-01-21 male   suspected        1989-12-24  13.0 
#> 10 atwa… 2018-01-06    2018-01-16 female not_a_case       1989-12-24  13.5 
#> # … with 2 more variables: lon <dbl>, location <fct>

wordlist$from[is.na(wordlist$from)] <- ".missing"
clean_variable_spelling(cleaned_data, cbind(wordlist, global_words), warn = TRUE)
#> Warning in clean_variable_spelling(cleaned_data, cbind(wordlist, global_words), : The following warnings were found...
#>   location__:
#>   .... 'howdy' was changed to the default value ('unknown')
#> # A tibble: 10 x 9
#>    id    date_of_onset discharge  gender epi_case_defini… messy_dates   lat
#>    <fct> <date>        <date>     <fct>  <fct>            <date>      <dbl>
#>  1 mxtw… 2018-01-02    2018-01-12 female suspected        NA          14.1 
#>  2 fdco… 2018-01-04    2018-01-14 female confirmed        NA          14.4 
#>  3 iwbo… 2018-01-02    2018-01-12 female suspected        NA          16.1 
#>  4 uboe… 2018-01-02    2018-01-12 female suspected        2018-10-18  14.7 
#>  5 dbbc… 2018-01-11    2018-01-21 male   not_a_case       1989-12-24  12.3 
#>  6 zzgy… 2018-01-06    2018-01-16 male   not_a_case       2001-12-01  13.0 
#>  7 nilp… 2018-01-02    2018-01-12 female not_a_case       NA          13.6 
#>  8 rmwa… 2018-01-10    2018-01-20 female suspected        NA           9.27
#>  9 bfzo… 2018-01-11    2018-01-21 male   suspected        1989-12-24  13.0 
#> 10 atwa… 2018-01-06    2018-01-16 female not_a_case       1989-12-24  13.5 
#> # … with 2 more variables: lon <dbl>, location <fct>

Created on 2019-02-27 by the reprex package (v0.2.1)

zkamvar commented Feb 28, 2019

I've added the option warn_spelling = FALSE to clean_variables() and clean_data(), so now you can get these warnings no matter which function you call!
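For anyone curious how a flag like this can be threaded through a wrapper, here is a rough base-R sketch. Both functions are hypothetical stand-ins rather than the linelist source: `clean_column_sketch()` returns its messages instead of warning right away, and `clean_data_sketch()` gathers them per column and raises a single warning only when `warn_spelling = TRUE`. The message layout only loosely imitates the reprex output above.

```r
# Hypothetical per-column cleaner, NOT linelist code. It returns the cleaned
# vector together with any messages so the caller can aggregate them.
clean_column_sketch <- function(x, from, to) {
  x <- as.character(x)
  is_default <- from %in% ".default"
  hit <- match(x, from[!is_default])
  out <- ifelse(is.na(hit), x, to[!is_default][hit])
  msgs <- character(0)
  if (any(is_default)) {
    default_value <- to[is_default][1]
    # Assumption: values that already equal a target are left untouched.
    unmatched <- !is.na(x) & is.na(hit) & !(x %in% to)
    if (any(unmatched)) {
      msgs <- sprintf("'%s' was changed to the default value ('%s')",
                      unique(x[unmatched]), default_value)
      out[unmatched] <- default_value
    }
  }
  list(value = out, messages = msgs)
}

# Hypothetical wrapper standing in for a clean_data()-style entry point.
clean_data_sketch <- function(df, wordlist, warn_spelling = FALSE) {
  collected <- list()
  for (col in unique(wordlist$variable)) {
    if (!col %in% names(df)) next
    rules <- wordlist[wordlist$variable == col, ]
    res <- clean_column_sketch(df[[col]], rules$from, rules$to)
    df[[col]] <- res$value
    if (length(res$messages) > 0) collected[[col]] <- res$messages
  }
  # One aggregated warning, grouped by column, and only on request.
  if (warn_spelling && length(collected) > 0) {
    body <- vapply(names(collected), function(col) {
      paste0("  ", col, ":\n",
             paste0("  .... ", collected[[col]], collapse = "\n"))
    }, character(1))
    warning("The following warnings were found...\n",
            paste(body, collapse = "\n"), call. = FALSE)
  }
  df
}

df <- data.frame(location = c("hopsital", "howdy", "field"),
                 stringsAsFactors = FALSE)
wl <- data.frame(
  from     = c("hopsital", "feild", ".default"),
  to       = c("hospital", "field", "unknown"),
  variable = "location",
  stringsAsFactors = FALSE
)

clean_data_sketch(df, wl)                        # silent by default
# clean_data_sketch(df, wl, warn_spelling = TRUE)  # same result, plus one grouped warning
```

Collecting messages first and warning once keeps the output readable when many columns share a word list, which is the reason for returning a list from the per-column helper in this sketch.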

zkamvar merged commit 6d10845 into master Mar 1, 2019
zkamvar deleted the fix-default-warning branch March 1, 2019 15:39

Successfully merging this pull request may close these issues.

clean_spelling: inacurate warning