Authenticated internal use #5

Open
timcdlucas opened this issue Mar 1, 2018 · 3 comments
Labels
enhancement New feature or request getPR
Milestone

Comments

@timcdlucas
Contributor

If we want this package to replace many lines of code in our internal work, we need it to get the DHS data (and other private data) via authenticated access.

Ideally, we want to keep this as hidden from external users as possible. So for example, we don't want

d <- getPR('ZAF', username = 'tim', password = 'mystrongpassword')

because the docs on CRAN would just have to say "this is not for you", which is not good software documentation.
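
One possible way to keep credentials out of the exported function signature is to read them from environment variables, so the public docs never need to mention them. This is only a minimal sketch: the variable names `MAP_USERNAME` and `MAP_PASSWORD` and the helper `get_internal_credentials()` are hypothetical and do not exist in the package.

```r
# Hypothetical helper (not part of the package): look up internal
# credentials from environment variables instead of exposing them as
# arguments to getPR(). The variable names are assumptions.
get_internal_credentials <- function() {
  user <- Sys.getenv("MAP_USERNAME", unset = NA)
  pass <- Sys.getenv("MAP_PASSWORD", unset = NA)

  # Treat unset or empty variables as "no internal access"
  if (is.na(user) || is.na(pass) || !nzchar(user) || !nzchar(pass)) {
    return(NULL)
  }
  list(username = user, password = pass)
}
```

External users never set these variables, so for them the helper returns `NULL` and only the public data path would apply.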

@timcdlucas timcdlucas added this to the v0.2 milestone Mar 1, 2018
@timcdlucas timcdlucas added enhancement New feature or request getPR labels Mar 5, 2018
@timcdlucas
Contributor Author

And/or use this package to add DHS data:
https://github.com/OJWatson/rdhs

@OJWatson
Contributor

Hey,

I was sending someone who wanted to get all the data used for the malaria maps to this package and noticed the DHS coordinates were missing and then saw this issue :)

The following gets you very close to what you may want. I've started it in a fork, but there were a couple of dhs_ids I could not match correctly within the DHS surveys; these are commented in the code below.

Most of the function documentation is the same as that for rdhs::set_rdhs_config, which does the auth bits for you.

Let me know what you think/any ideas on the odd dhs_ids

Ta, OJ

#' Add DHS locations to malaria data
#'
#'
#' @inheritParams rdhs::set_rdhs_config
#' @param data Data to add DHS coordinates to
#' @examples 
#' 
#' pf <- malariaAtlas::getPR("all",species = "pf")
#' pf <- fillDHSCoordinates(pf, 
#' email = "rdhs.tester@gmail.com",
#' project = "Testing Malaria Investigations")

fillDHSCoordinates <- function(data,
                                email = NULL, project = NULL, 
                                cache_path = NULL, config_path = NULL, 
                                global = TRUE, verbose_download = FALSE, 
                                verbose_setup = TRUE, data_frame = NULL, 
                                timeout = 30, password_prompt = FALSE, 
                                prompt = TRUE) {
  
  # set up a config for rdhs
  rdhs::set_rdhs_config(email = email, project = project, cache_path = cache_path, config_path = config_path, 
    global = global, verbose_download = verbose_download, verbose_setup = verbose_setup, 
    data_frame = data_frame, timeout = timeout, password_prompt = password_prompt, 
    prompt = prompt)

  # get stems and remove blanks
  dhs_id_stems <- unique(substr(data$dhs_id, 1, 6))
  dhs_id_stems <- dhs_id_stems[nchar(dhs_id_stems)==6]
  
  # then there are some odd dhs ids I noticed
  dhs_id_stems[dhs_id_stems=="MDG201"] <- "MD2011"
  
  # I couldn't find the following ids in the datasets
  # dhs_id_stems[dhs_id_stems=="BI2012"] <- "BU2012"
  # dhs_id_stems[dhs_id_stems=="MZ2014"] <- "MZ2014"
  
  # find the necessary geographic data files from the DHS API
  dats <- rdhs::dhs_datasets(countryIds = unique(substr(dhs_id_stems, 1, 2)),
                             surveyYear = unique(substr(dhs_id_stems, 3, 6)),
                             fileType = "GE")
  dats <- dats[which(substr(dats$SurveyId, 1, 6) %in% dhs_id_stems),]
  
  # download the datasets
  geo <- rdhs::get_datasets(dats)
  no_permission <- "Dataset is not available with your DHS login credentials"
  # logical indexing: geo[-which(...)] would drop everything when nothing matches
  geo <- geo[unlist(geo) != no_permission]
  
  # missing info (can add more depending on factors, e.g. encoding of urban/rural)
  mis_info <- c("dhs_id","site_id", "latitude", "longitude")
  dhs_info <- c("DHSID","DHSCLUST", "LATNUM", "LONGNUM")
  
  # fill in blanks
  for(stem in dhs_id_stems) {
    
    # what file does the stem relate to
    file_name_match <- dats$FileName[which(substr(dats$SurveyId, 1, 6) == stem)]
    file_name <- sub("\\.zip$", "", file_name_match, ignore.case = TRUE)
    
    # did we find that file, and were we allowed to download it
    if (length(file_name) == 1 && file_name %in% names(geo)) {
      
      # read in the data and then fill in blanks
      shp <- readRDS(geo[[file_name]])@data
      matches <- match(shp$DHSID,data$dhs_id)
      
      data[na.omit(matches), mis_info] <- shp[which(!is.na(matches)), dhs_info]
      
    } 
  }
  
  return(data)
  
}
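
Two details of the function may be easier to follow on toy data: filtering a list with negative `which()` indexing versus logical indexing, and the `match()`-based fill. The sketch below uses made-up ids, paths and coordinates (all illustrative assumptions, not real DHS data) and only base R.

```r
# --- Filtering out "no permission" entries ---------------------------------
# Made-up download result in which every dataset was accessible
geo <- list(a = "path/a.rds", b = "path/b.rds")
no_permission <- "Dataset is not available with your DHS login credentials"

# which() returns integer(0) here, and geo[-integer(0)] selects nothing
dropped <- geo[-which(unlist(geo) == no_permission)]
# logical indexing keeps both entries
kept <- geo[unlist(geo) != no_permission]

# --- Filling coordinates via match() ---------------------------------------
# Toy stand-ins for the malaria data and one DHS cluster table
data <- data.frame(
  dhs_id    = c("AO2011MIS00000001", NA, "AO2011MIS00000003"),
  latitude  = c(NA, -8.5, NA),
  longitude = c(NA, 13.2, NA),
  stringsAsFactors = FALSE
)
shp <- data.frame(
  DHSID   = c("AO2011MIS00000003", "AO2011MIS00000001", "AO2011MIS00000099"),
  LATNUM  = c(-12.1, -9.3, -10.0),
  LONGNUM = c(15.4, 14.0, 16.0),
  stringsAsFactors = FALSE
)

# Six-character survey stems, dropping missing ids explicitly
stems <- unique(substr(data$dhs_id, 1, 6))
stems <- stems[!is.na(stems) & nchar(stems) == 6]

# For each cluster in shp, the row of data it belongs to (NA if none)
matches <- match(shp$DHSID, data$dhs_id)
hit <- !is.na(matches)

# Copy coordinates only for clusters that were found in data
data[matches[hit], c("latitude", "longitude")] <-
  shp[hit, c("LATNUM", "LONGNUM")]
```

Rows without a `dhs_id` keep whatever coordinates they already had, and clusters in `shp` that do not appear in `data` are ignored.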

@timcdlucas
Contributor Author

Thank you very much for getting in touch!

I think having this as a separate function like you've got here is the best design.

Given that it's Christmas it might take me a little while to look at this fully. But I'm sure we'll merge it in.

Thanks again!
