Add class #9

Merged
merged 18 commits into from Jan 18, 2019

@zkamvar zkamvar commented Jan 18, 2019

This PR adds a basic data dictionary: the user sets the desired variables with the epivars() function and then maps them to the column names of their data with the as_linelist() function:

# User modifies this -------------------------
epivars("date_release", "age_months", set = TRUE)
ll <- as_linelist(dat,
  date_release = "release",
  age_months   = "months"
)

# This will run with any data set ------------
get_var(ll, "date_release")
get_var(ll, "age_months")

Example workflow with linelist:
library('outbreaks')
library('incidence')
library('linelist')
#> linelist is loaded with the following global variables in `epivars()`:
#> id, date_onset, date_report, gender, age, age_group, geo
# define the important variables
epivars(
  "id",
  "date_onset",
  "gender",
  "age",
  "outcome",
  "date_outcome",
  "date_hospitalisation",
  set = TRUE
)
#>  [1] "id"                   "date_onset"           "date_report"         
#>  [4] "gender"               "age"                  "age_group"           
#>  [7] "geo"                  "outcome"              "date_outcome"        
#> [10] "date_hospitalisation"

# convert the data set to a linelist and define the variables
ll <- as_linelist(outbreaks::fluH7N9_china_2013,
  "outcome"              = "outcome", 
  "date_outcome"         = "date_of_outcome", 
  "date_hospitalisation" = "date_of_hospitalisation",
  "id"                   = "case_id",
  "date_onset"           = "date_of_onset",
  "gender"               = "gender",
  "age"                  = "age"
  )
head(ll)
#>   case_id date_of_onset date_of_hospitalisation date_of_outcome outcome
#> 1       1    2013-02-19                    <NA>      2013-03-04   Death
#> 2       2    2013-02-27              2013-03-03      2013-03-10   Death
#> 3       3    2013-03-09              2013-03-19      2013-04-09   Death
#> 4       4    2013-03-19              2013-03-27            <NA>    <NA>
#> 5       5    2013-03-19              2013-03-30      2013-05-15 Recover
#> 6       6    2013-03-21              2013-03-28      2013-04-26   Death
#>   gender age province
#> 1      m  87 Shanghai
#> 2      m  27 Shanghai
#> 3      f  35    Anhui
#> 4      f  45  Jiangsu
#> 5      f  48  Jiangsu
#> 6      f  32  Jiangsu
head(outbreaks::fluH7N9_china_2013)
#>   case_id date_of_onset date_of_hospitalisation date_of_outcome outcome
#> 1       1    2013-02-19                    <NA>      2013-03-04   Death
#> 2       2    2013-02-27              2013-03-03      2013-03-10   Death
#> 3       3    2013-03-09              2013-03-19      2013-04-09   Death
#> 4       4    2013-03-19              2013-03-27            <NA>    <NA>
#> 5       5    2013-03-19              2013-03-30      2013-05-15 Recover
#> 6       6    2013-03-21              2013-03-28      2013-04-26   Death
#>   gender age province
#> 1      m  87 Shanghai
#> 2      m  27 Shanghai
#> 3      f  35    Anhui
#> 4      f  45  Jiangsu
#> 5      f  48  Jiangsu
#> 6      f  32  Jiangsu

get_meta(ll)
#>                    column               epivar  class hxl
#> 1                 case_id                   id factor    
#> 2           date_of_onset           date_onset   Date    
#> 3 date_of_hospitalisation date_hospitalisation   Date    
#> 4         date_of_outcome         date_outcome   Date    
#> 5                 outcome              outcome factor    
#> 6                  gender               gender factor    
#> 7                     age                  age factor    
#> 8                province                 <NA> factor

head(gender(ll))
#> [1] m m f f f f
#> Levels: f m
head(get_var(ll, "outcome"))
#> [1] Death   Death   Death   <NA>    Recover Death  
#> Levels: Death Recover
ic <- incidence(date_onset(ll), interval = "week", groups = gender(ll))
#> 10 missing observations were removed.
plot(ic)

Created on 2019-01-18 by the reprex package (v0.2.1)

The roadmap I'm thinking about is this:

  • Basic data dictionary functionality
  • A data dictionary that allows you to map standard variable names to columns
  • Integration with #hxl standard for dynamic search of variables
  • Validation of categorical values
  • Automatically generate linelist from an imported dictionary using the openxlsx package, setting defaults for the columns.
  • Import dictionary and validation from file
  • Explore the possibility of adding a global dynamic function environment that updates based on the global variables in use. For example, if the user specifies epivars("something", set = TRUE), they can access that variable with either get_var(x, "something") or get$something(x).
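
To make the last roadmap item more concrete, here is a minimal base R sketch of what a dynamic accessor environment could look like. It is only an illustration under stated assumptions, not the linelist implementation: `make_accessors()`, the `accessors` object, and the `"epivars"` attribute (a named character vector mapping each epivar to a column name) are all hypothetical names invented for this example.

```r
# Hypothetical sketch: build one accessor function per registered variable.
# Assumes the data frame carries an "epivars" attribute that maps
# epivar names to column names (this attribute is made up for the example).
make_accessors <- function(vars) {
  fns <- lapply(vars, function(epivar) {
    force(epivar)  # capture the current name in the closure
    function(x) {
      dict <- attr(x, "epivars")
      x[[dict[[epivar]]]]
    }
  })
  # Store the functions in a new environment, keyed by epivar name
  list2env(stats::setNames(fns, vars))
}

# Toy data standing in for a labelled data set
dat <- data.frame(
  release = as.Date(c("2019-01-01", "2019-01-05")),
  months  = c(3, 14)
)
attr(dat, "epivars") <- c(date_release = "release", age_months = "months")

# Regenerate the accessors whenever the set of variables changes
accessors <- make_accessors(names(attr(dat, "epivars")))
accessors$age_months(dat)
accessors$date_release(dat)
```

The environment is called `accessors` here rather than `get` so that the sketch does not shadow `base::get()`. Rebuilding it each time the global variables change is one simple way to keep the accessors in sync.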

@zkamvar zkamvar merged commit 68389e3 into master Jan 18, 2019
@zkamvar zkamvar deleted the add-class branch February 8, 2019 07:43