You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
What does this package do? (explain in 50 words or less):
It interacts with the Demographic and Health Survey (DHS) Program API (https://api.dhsprogram.com), and provides tools to use the API to ease identifying, downloading, loading and analysing the raw survey data collected by the DHS.
Paste the full DESCRIPTION file inside a code block below:
Package: rdhs
Type: Package
Title: API Client and Dataset Management for the Demographic and Health Survey (DHS) Data
Version: 0.5.0
Authors@R: c(
person("OJ", "Watson", role=c("aut", "cre"),
email="o.watson15@imperial.ac.uk"),
person("Jeff", "Eaton", role="aut"))
Maintainer: OJ Watson <o.watson15@imperial.ac.uk>
URL: https://ojwatson.github.io/rdhs/
BugReports: https://github.com/OJWatson/rdhs/issues
Description: Provides a client for (1) querying the DHS API for survey indicators
and metadata (https://api.dhsprogram.com/#/index.html), (2) identifying surveys
and datasets for analysis, (3) downloading survey datasets from the DHS website,
(4) loading datasets and associate metadata into R, and (5) extracting variables
and combining datasets for pooled analysis.
LazyData: TRUE
Depends: R (>= 3.3.0)
Imports:
R6,
httr,
jsonlite,
foreign,
magrittr,
rappdirs,
digest,
storr,
xml2,
qdapRegex,
rgdal,
getPass,
haven,
iotools
Suggests:
testthat,
knitr,
rmarkdown,
ggplot2,
survey,
data.table,
microbenchmark
License: MIT + file LICENSE
RoxygenNote: 6.0.1
VignetteBuilder: knitr
Language: en-GB
URL for the package (the development repository, not a stylized html page):
Please indicate which category or categories from our package fit policies this package falls under *and why(? (e.g., data retrieval, reproducibility. If you are unsure, we suggest you make a pre-submission inquiry.):
data access, because it interacts with the DHS API
data retrieval, because it helps download and load the raw datasets from the DHS website, after using the API to first identify those survey datasets relevant for your analysis
Who is the target audience and what are scientific applications of this package?
Global Health Researchers and Policy makers. The DHS data has been used in well over 20,000 academic studies (based on google scholar search for "DHS" AND "demographic and health survey") that have helped shape progress towards targets such as the Sustainable Development Goals and inform health policy such as detailing trends in child mortality and characterising the distribution and use of insecticide-treated bed nets in Africa. The package will help assist researchers who use R for these purposes rather than/don't have access to stata/sas (these datasets are the published datasets by the DHS program), as well as serve to simplify commonly required analytical pipelines. The end result aims to increase the end user accessibility to the raw data and create a tool that supports reproducible global health research.
There are a number of other R pacakges that work with DHS data in various ways. A quick search of github for "DHS" and R shows 39 repos, however the majority are small custom scripts.
1 repo looks just at interacting with the DHS API, but it hasn't been added to for almost a year, and the API endpoint functions do not cover all the endpoints available nor allow you to query each endpoint by all the possible query terms. It also requires the user to know query terms rather than having them as arguments.
1 repo also looks at downloading the survey datasets from the website (and it was used initially when designing these fucntions with rdhs). However, it skips over large dataset files, has some bugs depending on the character length of your login credentials, and does not allow you to read in all the datasets available from the website. [ FYI: we don't read in .sas7bdat (we are writing a parser for the oddly formed catalog files provided by the DHS website for these) or hierarchal dataset files as we have a parser for the flat equivalent of hierarchal dataset. In theory each file format should be the same data, so having one parser that works is sufficient, but we have found that the flat and spss data formats have the most complete meta data for the data variable labels).
There are then a few repos that do bespoke pieces of analysis (2 of which are on CRAN) looking at spatial analysis and calculating survey statistics. We are hoping to bring these onboard, either by wrapping them to use the output of our downloaded harmonised datasets, or by writing additional tools for downstream analysis (see TODO.md).
If you made a pre-submission enquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted.
Requirements
Confirm each of the following by checking the box. This package:
does not violate the Terms of Service of any service it interacts with.
has a CRAN and OSI accepted license.
contains a README with instructions for installing the development version.
includes documentation with examples for all functions.
contains a vignette with examples of its essential functions and uses.
has a test suite.
has continuous integration, including reporting of test coverage, using services such as Travis CI, Coveralls and/or CodeCov.
I agree to abide by ROpenSci's Code of Conduct during the review process and in maintaining my package should it be accepted.
The package is novel and will be of interest to the broad readership of the journal.
The manuscript describing the package is no longer than 3000 words.
You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see MEE's Policy on Publishing Code)
(Scope: Do consider MEE's Aims and Scope for your manuscript. We make no gaurantee that your manuscript willl be within MEE scope.)
(Although not required, we strongly recommend having a full manuscript prepared when you submit here.)
(Please do not submit your package separately to Methods in Ecology and Evolution)
Detail
Does R CMD check (or devtools::check()) succeed? Paste and describe any errors or warnings:
If this is a resubmission following rejection, please explain the change in circumstances:
If possible, please provide recommendations of reviewers - those with experience with similar packages and/or likely users of your package - and their GitHub user names:
Summary
It interacts with the Demographic and Health Survey (DHS) Program API (https://api.dhsprogram.com), and provides tools to use the API to ease identifying, downloading, loading and analysing the raw survey data collected by the DHS.
https://github.com/OJWatson/rdhs
Please indicate which category or categories from our package fit policies this package falls under *and why(? (e.g., data retrieval, reproducibility. If you are unsure, we suggest you make a pre-submission inquiry.):
Who is the target audience and what are scientific applications of this package?
Global Health Researchers and Policy makers. The DHS data has been used in well over 20,000 academic studies (based on google scholar search for "DHS" AND "demographic and health survey") that have helped shape progress towards targets such as the Sustainable Development Goals and inform health policy such as detailing trends in child mortality and characterising the distribution and use of insecticide-treated bed nets in Africa. The package will help assist researchers who use R for these purposes rather than/don't have access to stata/sas (these datasets are the published datasets by the DHS program), as well as serve to simplify commonly required analytical pipelines. The end result aims to increase the end user accessibility to the raw data and create a tool that supports reproducible global health research.
yours differ or meet our criteria for best-in-category?
There are a number of other R pacakges that work with DHS data in various ways. A quick search of github for "DHS" and R shows 39 repos, however the majority are small custom scripts.
1 repo looks just at interacting with the DHS API, but it hasn't been added to for almost a year, and the API endpoint functions do not cover all the endpoints available nor allow you to query each endpoint by all the possible query terms. It also requires the user to know query terms rather than having them as arguments.
1 repo also looks at downloading the survey datasets from the website (and it was used initially when designing these fucntions with rdhs). However, it skips over large dataset files, has some bugs depending on the character length of your login credentials, and does not allow you to read in all the datasets available from the website. [ FYI: we don't read in .sas7bdat (we are writing a parser for the oddly formed catalog files provided by the DHS website for these) or hierarchal dataset files as we have a parser for the flat equivalent of hierarchal dataset. In theory each file format should be the same data, so having one parser that works is sufficient, but we have found that the flat and spss data formats have the most complete meta data for the data variable labels).
There are then a few repos that do bespoke pieces of analysis (2 of which are on CRAN) looking at spatial analysis and calculating survey statistics. We are hoping to bring these onboard, either by wrapping them to use the output of our downloaded harmonised datasets, or by writing additional tools for downstream analysis (see TODO.md).
Requirements
Confirm each of the following by checking the box. This package:
Publication options
paper.mdmatching JOSS's requirements with a high-level description in the package root or ininst/.Detail
R CMD check(ordevtools::check()) succeed? Paste and describe any errors or warnings:Yes:
Does the package conform to rOpenSci packaging guidelines? Please describe any exceptions:
If this is a resubmission following rejection, please explain the change in circumstances:
If possible, please provide recommendations of reviewers - those with experience with similar packages and/or likely users of your package - and their GitHub user names: