Add class #9

zkamvar · 2019-01-18T06:02:01Z

This PR adds a basic data dictionary: the user declares the variables of interest with the epivars() function and then maps them to the column names in their data with the as_linelist() function:

# User modifies this -------------------------
epivars("date_release", "age_months", set = TRUE)
ll <- as_linelist(dat,
  date_release = "release",
  age_months   = "months"
)

# This will run with any data set ------------
get_var(ll, "date_release")
get_var(ll, "age_months")
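
To make the mapping idea concrete, here is a minimal base R sketch of how a data dictionary like this could work. It is not the package's implementation: the names as_linelist_sketch() and get_var_sketch(), and the choice to store the mapping in an attribute, are assumptions made purely for illustration.

```r
# Hypothetical sketch, NOT the linelist package's implementation.
# The epivar -> column mapping is stored as an attribute on the data frame.
as_linelist_sketch <- function(x, ...) {
  mapping <- c(...)  # named character vector: epivar = "column"
  missing_cols <- setdiff(mapping, names(x))
  if (length(missing_cols) > 0) {
    stop("columns not found: ", paste(missing_cols, collapse = ", "))
  }
  attr(x, "epivars") <- mapping
  class(x) <- c("linelist_sketch", class(x))
  x
}

# Look up the column registered for an epivar and return its values.
get_var_sketch <- function(x, epivar) {
  mapping <- attr(x, "epivars")
  if (!epivar %in% names(mapping)) {
    stop("'", epivar, "' is not a defined epivar")
  }
  x[[mapping[[epivar]]]]
}

# Toy data with made-up values
dat <- data.frame(
  release = as.Date(c("2019-01-01", "2019-01-05")),
  months  = c(14, 30)
)
ll <- as_linelist_sketch(dat, date_release = "release", age_months = "months")
get_var_sketch(ll, "age_months")
```

Because downstream code only ever asks for the epivar name, the same analysis script can run on any data set once the mapping is declared.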

Example workflow with linelist

library('outbreaks')
library('incidence')
library('linelist')
#> linelist is loaded with the following global variables in `epivars()`:
#> id, date_onset, date_report, gender, age, age_group, geo
# define the important variables
epivars(
        "id",
        "date_onset",
        "gender",
        "age",
        "outcome", 
        "date_outcome", 
        "date_hospitalisation", 
        set = TRUE)
#>  [1] "id"                   "date_onset"           "date_report"         
#>  [4] "gender"               "age"                  "age_group"           
#>  [7] "geo"                  "outcome"              "date_outcome"        
#> [10] "date_hospitalisation"

# convert the data set to a linelist and define the variables
ll <- as_linelist(outbreaks::fluH7N9_china_2013,
  "outcome"              = "outcome", 
  "date_outcome"         = "date_of_outcome", 
  "date_hospitalisation" = "date_of_hospitalisation",
  "id"                   = "case_id",
  "date_onset"           = "date_of_onset",
  "gender"               = "gender",
  "age"                  = "age"
  )
head(ll)
#>   case_id date_of_onset date_of_hospitalisation date_of_outcome outcome
#> 1       1    2013-02-19                    <NA>      2013-03-04   Death
#> 2       2    2013-02-27              2013-03-03      2013-03-10   Death
#> 3       3    2013-03-09              2013-03-19      2013-04-09   Death
#> 4       4    2013-03-19              2013-03-27            <NA>    <NA>
#> 5       5    2013-03-19              2013-03-30      2013-05-15 Recover
#> 6       6    2013-03-21              2013-03-28      2013-04-26   Death
#>   gender age province
#> 1      m  87 Shanghai
#> 2      m  27 Shanghai
#> 3      f  35    Anhui
#> 4      f  45  Jiangsu
#> 5      f  48  Jiangsu
#> 6      f  32  Jiangsu
head(outbreaks::fluH7N9_china_2013)
#>   case_id date_of_onset date_of_hospitalisation date_of_outcome outcome
#> 1       1    2013-02-19                    <NA>      2013-03-04   Death
#> 2       2    2013-02-27              2013-03-03      2013-03-10   Death
#> 3       3    2013-03-09              2013-03-19      2013-04-09   Death
#> 4       4    2013-03-19              2013-03-27            <NA>    <NA>
#> 5       5    2013-03-19              2013-03-30      2013-05-15 Recover
#> 6       6    2013-03-21              2013-03-28      2013-04-26   Death
#>   gender age province
#> 1      m  87 Shanghai
#> 2      m  27 Shanghai
#> 3      f  35    Anhui
#> 4      f  45  Jiangsu
#> 5      f  48  Jiangsu
#> 6      f  32  Jiangsu

get_meta(ll)
#>                    column               epivar  class hxl
#> 1                 case_id                   id factor    
#> 2           date_of_onset           date_onset   Date    
#> 3 date_of_hospitalisation date_hospitalisation   Date    
#> 4         date_of_outcome         date_outcome   Date    
#> 5                 outcome              outcome factor    
#> 6                  gender               gender factor    
#> 7                     age                  age factor    
#> 8                province                 <NA> factor

head(gender(ll))
#> [1] m m f f f f
#> Levels: f m
head(get_var(ll, "outcome"))
#> [1] Death   Death   Death   <NA>    Recover Death  
#> Levels: Death Recover
ic <- incidence(date_onset(ll), interval = "week", groups = gender(ll))
#> 10 missing observations were removed.
plot(ic)

Created on 2019-01-18 by the reprex package (v0.2.1)

The roadmap I'm thinking about is this:

Basic data dictionary functionality
A data dictionary that allows you to map standard variable names to columns
Integration with the #hxl standard for dynamic search of variables
Validation of categorical values
Automatically generate linelist from an imported dictionary using the openxlsx package, setting defaults for the columns.
Import dictionary and validation from file
Explore the possibility of adding a global dynamic function environment that updates based on the global variables in use. For example, if the user specifies epivars("something", set = TRUE), they can then access that variable with either get_var(x, "something") or get$something(x).
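
The last roadmap item could be prototyped roughly as follows. This is only a sketch under assumptions: make_accessors() and lookup() are hypothetical helpers, and the object is called get_sketch rather than get so that it does not mask base::get().

```r
# Hypothetical sketch of the `get$something(x)` idea; all names are illustrative.
# Build one accessor closure per epivar name and collect them in a named list.
make_accessors <- function(epivar_names, getter) {
  accessors <- lapply(epivar_names, function(nm) {
    force(nm)                  # capture the name for this closure
    function(x) getter(x, nm)
  })
  names(accessors) <- epivar_names
  accessors
}

# Stand-in getter: assumes the epivar -> column mapping lives in an attribute.
lookup <- function(x, epivar) x[[attr(x, "epivars")[[epivar]]]]

# Toy data with made-up values
dat <- data.frame(months = c(14, 30))
attr(dat, "epivars") <- c(something = "months")

# Regenerating this list whenever the global epivars change would keep it in sync.
get_sketch <- make_accessors("something", lookup)
get_sketch$something(dat)
```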

This is a quick generator for messy data validation

The vars element will contain the list of the currently defined epivars and where they fall in the data frame. The meta element will contain a separate long data frame that defines all of the meta data associated with the vars.
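
One way to picture the two elements is sketched below. The names vars and meta come from the description above, but their exact shapes here are assumptions, loosely modelled on the get_meta() output shown earlier; the toy data and mapping are made up.

```r
# Hypothetical sketch; the shapes of `vars` and `meta` are assumptions.
dat <- data.frame(
  case_id       = factor(1:3),
  date_of_onset = as.Date("2013-02-19") + 0:2,
  province      = factor(c("Shanghai", "Anhui", "Jiangsu"))
)
mapping <- c(id = "case_id", date_onset = "date_of_onset")

# vars: each defined epivar and the position of its column in the data frame
vars <- match(mapping, names(dat))
names(vars) <- names(mapping)

# meta: one row per column, recording the epivar (if any) and other metadata
meta <- data.frame(
  column = names(dat),
  epivar = names(mapping)[match(names(dat), mapping)],
  class  = vapply(dat, function(col) class(col)[1], character(1),
                  USE.NAMES = FALSE),
  hxl    = "",
  stringsAsFactors = FALSE
)

vars
meta
```

Keeping vars small and positional makes lookups cheap, while meta can grow extra columns (such as HXL tags) without touching the lookup structure.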

zkamvar added 18 commits November 6, 2018 13:19

use Rproject

5ccb8d0

add ideas for incidence class

c53cf12

begin infrastructure for incidence class

fe03ef5

add vim swap files to gitignore

af5c30c

ignore friggen docs folder

dd82276

add messy data function

a9b4ba4

This is a quick generator for messy data validation

update accessors and class

a74c81b

split epivars into vars and meta

f488b5d

The vars element will contain the list of the currently defined epivars and where they fall in the data frame. The meta element will contain a separate long data frame that defines all of the meta data associated with the vars.

update linelist class

5b7243d

fix processing error in messy_data

5c4d225

add tests

92e7505

add a bit of meat to the vignette

9a2cb85

move knitr to suggests; add Zhian to DESCRIPTION

0bbce1a

add get_meta function and linelist subsettor

5d97d58

add an example to the vignette

d81ea41

Merge branch 'master' into add-class

a904643

Merge branch 'master' into add-class

fe7ccae

Merge branch 'master' into add-class

ddb3672

zkamvar merged commit 68389e3 into master Jan 18, 2019

zkamvar deleted the add-class branch February 8, 2019 07:43

zkamvar mentioned this pull request Feb 8, 2019

Bump version and update NEWS #33

Closed
