Cleans and Munges Tuberculosis and Demographic Data for England

tbinenglanddataclean

tbinenglanddataclean is an R package that contains functions and documentation to reproducibly clean and munge the available TB data for England.

Installation

You can install tbinenglanddataclean from GitHub with:

# install.packages("devtools")
devtools::install_github("seabbs/tbinenglanddataclean")

Raw data

This package relies on raw data from several sources:

  1. An extract from the Enhanced Tuberculosis Surveillance System. Access to this data requires an application to Public Health England.
  2. Data on historic TB notifications from Public Health England.
  3. Demographic data for 2000, and for 2001 to 2015, from the Office for National Statistics (ONS). This data can be downloaded freely.
  4. Data on births in the UK, both observed and projected, available from the ONS.
  5. Data on age-specific mortality rates, available from the ONS.
  6. Survey information from the Labour Force Survey, as yearly extracts from 2000 to 2016 for the April to June survey. Only registered users can download this data. Registration is possible for those at UK institutions. Other access arrangements can be made on request.

Cleaning and building the datasets

The included vignette contains the code necessary to build all datasets associated with this package. Each function needs to be pointed at the correct raw data. If you change the default file names or locations, the vignette code will also need updating. Contact me if you have any problems.

Other vignettes explore approaches for estimating demographic parameters from the clean and munged datasets.

Docker

This package was developed in a Docker container based on the tidyverse Docker image. To run the Docker image, use:

docker run -d -p 8787:8787 --name tbinenglanddataclean --mount type=bind,source=$(pwd)/data/tb_data,target=/home/rstudio/tbinenglanddataclean/data/tb_data -e USER=tbinenglanddataclean -e PASSWORD=tbinenglanddataclean seabbs/tbinenglanddataclean

The RStudio client can be found on port 8787 at your local machine's IP address. The default username:password is tbinenglanddataclean:tbinenglanddataclean; set the user with -e USER=username and the password with -e PASSWORD=newpasswordhere. The default is to save the analysis files into the user directory. If running without the accompanying data, remove --mount type=bind,source=$(pwd)/data/tb_data,target=/home/rstudio/tbinenglanddataclean/data/tb_data.
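
The options above can be combined in a small helper. The following is a minimal sketch, not part of the package: build_run_cmd and its three arguments are hypothetical names used for illustration. It prints the docker run command with the chosen username and password, and includes the bind mount only when the local data directory exists. It assumes that paths and credentials contain no spaces.

```shell
#!/bin/sh
# Print a `docker run` command for the image, adding the bind mount only when
# the local data directory exists. build_run_cmd and its arguments are
# hypothetical helpers for illustration, not part of the package.
# Assumes no spaces in the paths or credentials passed in.
build_run_cmd() {
  user="$1"
  password="$2"
  data_dir="$3"
  target="/home/rstudio/tbinenglanddataclean/data/tb_data"

  cmd="docker run -d -p 8787:8787 --name tbinenglanddataclean"
  if [ -n "$data_dir" ] && [ -d "$data_dir" ]; then
    # Data is present locally: mount it into the container.
    cmd="$cmd --mount type=bind,source=$data_dir,target=$target"
  fi
  cmd="$cmd -e USER=$user -e PASSWORD=$password seabbs/tbinenglanddataclean"
  printf '%s\n' "$cmd"
}

# Print the command for inspection; nothing is started at this point.
build_run_cmd "myuser" "mypassword" "$(pwd)/data/tb_data"
```

Printing the command rather than running it directly makes it easy to check which options were included before starting the container, for example by copying the output into the terminal.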

To run a plain R terminal use:

docker run --rm -it --user seabbs tbinenglanddataclean /usr/bin/R

To run a plain bash session:

docker run --rm -it --user seabbs tbinenglanddataclean /bin/bash

To connect as root:

docker exec -ti -u root tbinenglanddataclean bash