
Adding data

Joe Palmer edited this page May 21, 2021 · 3 revisions

The first step in adding a new data source is working out whether it is a 'national' or a 'subnational' (regional) source. A national data source brings together data from multiple countries (such as WHO and JHU). A subnational data source contains subregional data for a single country.

Once you have decided what kind of source you are going to add, set up your class with the function:

make_new_data_source(source = "NameOfSource", type = "subnational")

This will create a new file in the R directory named after your source (e.g. France.R). The file contains boilerplate code for your class, including the required fields, methods and some basic documentation (which will need expanding!).

All you need to do now is fill out the fields and replace the clean methods with your custom cleaning function. Please see the following guides based on data source type which cover things like naming conventions and what each field / method means and does:

Example using make_new_data_source()

> make_new_data_source(source = "ImaginationLand", type = "subnational")
subnational Class created for ImaginationLand at R/ImaginationLand.R
workflow created for ImaginationLand at .github/workflows/ImaginationLand.yaml

> system("cat R/ImaginationLand.R")

#' ImaginationLand Class for downloading, cleaning and processing
#' notification data
#'
#' @description Information for downloading, cleaning
#'  and processing covid-19 region data for ImaginationLand.
#'
#' @concept dataset
#' @family subnational
#' @examples
#' \dontrun{
#' region <- ImaginationLand$new(verbose = TRUE, steps = TRUE, get = TRUE)
#' region$return()
#' }
ImaginationLand <- R6::R6Class("ImaginationLand",
  inherit = DataClass,
  public = list(

    # Core Attributes (amend each parameter for country specific information)
    #' @field origin name of country to fetch data for
    origin = "ImaginationLand",
    #' @field supported_levels List of supported levels.
    supported_levels = list("1"),
    #' @field supported_region_names List of region names in order of level.
    supported_region_names = list("1" = NA),
    #' @field supported_region_codes List of region codes in order of level.
    supported_region_codes = list("1" = NA),
    #' @field common_data_urls List of named links to raw data.
    common_data_urls = list(main = "url"),
    #' @field source_data_cols existing columns within the raw data
    source_data_cols = c("col_1", "col_2", "col_3", "etc."),

    #' @description Data cleaning common across levels
    #'
    clean_common = function() {
      self$data$clean <- self$data$raw[["main"]]
    },

    #' @description Data cleaning specific to level 1
    #'
    clean_level_1 = function() {
      self$data$clean <- self$data$clean
    }
  )
)
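
As a rough illustration of what "replacing the clean methods" might involve, here is a minimal base R sketch of a cleaning step. It is not taken from the package: the helper name `clean_imagination_land`, the toy raw data, and the output column names (`date`, `level_1_region`, `cases_new`) are all assumptions made for this example, so check the guides for the naming conventions your source type actually requires.

```r
# Hypothetical raw data standing in for self$data$raw[["main"]].
# The column names mirror the placeholder source_data_cols in the boilerplate.
raw <- data.frame(
  col_1 = c("2021-05-01", "2021-05-02"),
  col_2 = c("North", "South"),
  col_3 = c("10", "7"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: parse column types and give columns descriptive names.
# The output names are assumptions for this sketch, not confirmed conventions.
clean_imagination_land <- function(raw) {
  data.frame(
    date = as.Date(raw$col_1),           # ISO date strings to Date
    level_1_region = raw$col_2,          # region name kept as character
    cases_new = as.numeric(raw$col_3),   # counts stored as text to numeric
    stringsAsFactors = FALSE
  )
}

clean <- clean_imagination_land(raw)
str(clean)
```

Inside the generated class, the body of the common clean method could then assign the result with something like `self$data$clean <- clean_imagination_land(self$data$raw[["main"]])`, leaving the level-specific method for any further level 1 processing.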