Add UK data from gov.uk api #60

Merged: 12 commits, Aug 18, 2020

@kathsherratt (Contributor) commented Aug 12, 2020

  • uk_api.R sources all UK data from the coronavirus.data.gov.uk service.
    • The heavy-lifting code is taken from their developers' guide
      • we should flag this to them and/or add them as an author
    • Data are left in their raw state to highlight the different data reporting (varying dates by test/publish/death) and the range of new variables available (testing, hospital data)
  • get_regional_data.R has a special UK switch so that the raw UK data bypass the standardised formatting pipeline
    • very open to ideas on how best to balance returning all the data against returning standardised, clean data
  • README and DESCRIPTION updated
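The query construction follows the developers' guide: filters are joined with semicolons and the columns are requested as a JSON "structure". The Python sketch below illustrates that pattern. It is not the R code in uk_api.R. The endpoint URL, the `filters`/`structure` parameter format and the metric names reflect the v1 API as documented in 2020 and are assumptions here, and the service may no longer be live. `build_uk_query` is a hypothetical helper.

```python
import json
from urllib.parse import urlencode

# Assumption: base endpoint of the coronavirus.data.gov.uk v1 API as given in
# its developers' guide in 2020; the service may no longer be available.
API_ENDPOINT = "https://api.coronavirus.data.gov.uk/v1/data"


def build_uk_query(area_type, structure, area_name=None):
    """Return a query URL for the API (hypothetical helper).

    area_type: e.g. "nation" or "region" (assumed filter values).
    structure: dict mapping output column names to API metric names.
    area_name: optional filter restricting the query to one area.
    """
    # Filters are "key=value" pairs joined with semicolons.
    filters = [f"areaType={area_type}"]
    if area_name is not None:
        filters.append(f"areaName={area_name}")
    params = {
        "filters": ";".join(filters),
        # The requested columns are sent as compact JSON.
        "structure": json.dumps(structure, separators=(",", ":")),
    }
    # urlencode percent-encodes "=", ";", braces, quotes and colons.
    return f"{API_ENDPOINT}?{urlencode(params)}"
```

For example, `build_uk_query("nation", {"date": "date", "newCasesByPublishDate": "newCasesByPublishDate"}, area_name="england")` builds a request for daily England cases by publish date. A real client would then send a GET request, check for status 200 and read the `data` field of the JSON response.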

This PR probably replaces #51 (which gets hospital data only), but many thanks to @qleclerc for kicking it all off!

- `uk_api.R` sources all UK data from the coronavirus.data.gov.uk service
- Data are left in their raw state to highlight the different data reporting (varying dates by test/publish/death) and the range of new variables available (testing, hospital data)
- `get_regional_data.R` has a special UK switch so that the raw UK data bypass the standardised formatting pipeline
- Updated README and DESCRIPTION
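The sketch below shows the "special UK switch" at this stage of the PR: UK rows skip the shared formatting step. This is a minimal Python illustration, not the package's R code. `get_regional_data`, `standardise`, `fetchers` and `rename` are hypothetical names, and the formatting pipeline is reduced to a column rename.

```python
# Hypothetical sketch of a country switch; the names are illustrative only and
# do not reflect the package's actual R functions or arguments.


def standardise(raw_rows, rename):
    """Stand-in for the shared pipeline: keep only mapped columns, renamed."""
    return [{new: row.get(old) for old, new in rename.items()} for row in raw_rows]


def get_regional_data(country, fetchers, rename):
    """Fetch rows for a country; UK data bypass the standard pipeline."""
    raw_rows = fetchers[country]()
    if country == "uk":
        # Return the raw UK rows untouched so that every API variable survives.
        return raw_rows
    return standardise(raw_rows, rename)
```

The trade-off is the one raised in the description. UK output keeps every variable, but its shape differs from the other countries' output.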
@kathsherratt mentioned this pull request on Aug 13, 2020
@qleclerc

Thanks Kath! I'll rely on this for future data updates on my end instead of my first script in #51. Thanks for setting up the interaction with the API :)

UK API now works properly and is integrated to return data as with other countries in the package.

- Uses its own internal function to interact with the API, so there is no dependence on the PHE R package `ukcovid19`
- Data are returned clean (with standardised variable names, as with other countries) alongside all the raw variables
- The old UK code is still here but is commented out. I suggest we do one of the following:
  - delete it,
  - move it to a back-up folder in case the new code breaks (!), or
  - pull out any useful bits and store them separately, as it has some useful tables of lower-level regional authority codes
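The sketch below shows one way to return clean data "alongside all the raw variables". Standardised columns come first and every remaining raw API column follows. This is a Python illustration under stated assumptions, not the R implementation. The mapping `STANDARD_NAMES` and the helper `add_standard_columns` are hypothetical. Only `cases_new` is a name used in this thread, and the raw metric names are assumed API fields.

```python
# Hypothetical mapping from raw API column names to standardised names.
STANDARD_NAMES = {
    "date": "date",
    "areaName": "region",
    "newCasesBySpecimenDate": "cases_new",
}


def add_standard_columns(rows, mapping=STANDARD_NAMES):
    """Return rows with standardised columns first, followed by all raw columns."""
    out = []
    for row in rows:
        # Standardised columns; a missing raw column becomes None (NA).
        clean = {std: row.get(raw) for raw, std in mapping.items()}
        # Append every raw column whose name is not already a standardised
        # column (so raw "date" is not duplicated).
        merged = {**clean, **{k: v for k, v in row.items() if k not in clean}}
        out.append(merged)
    return out
```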

Outstanding issues (no action required yet):
- Regional deaths data are returned as NA. This reflects the data as they come from the gov.uk dashboard, but this is expected to change when the data are added soon. See [issue here](UKHSA-Internal/coronavirus-dashboard-api-python-sdk#12)
- Similarly, there is a suggestion that England cases may "soon" be available by PublishDate, which would be better because it is consistent with the other UK nations
@kathsherratt (Contributor, Author) commented Aug 17, 2020

This is now good to go, pending review by @seabbs.

Latest commit details:

UK API now works properly and is integrated to return data as with other countries in the package.

  • Uses its own internal function to interact with the API, so there is no dependence on the PHE R package ukcovid19
  • Data are returned clean (with standardised variable names, as with other countries) alongside all the raw variables
  • The old UK code is still here but is commented out. I suggest we do one of the following:
    • delete it,
    • move it to a back-up folder in case the new code breaks (!), or
    • pull out any useful bits and store them separately, as it has some useful tables of lower-level regional authority codes

See also #61 with tasks that this commit completes.

Outstanding issues (no action required yet):

  • England regional deaths data are returned as NA. This reflects the data as they come from the gov.uk dashboard, but this is expected to change when the data are added soon, at which point they will be returned automatically. See issue here
  • England regional cases data may "soon" be available by PublishDate. This will not break the code, but when it comes through we should make a small tweak to use it consistently as the cases_new metric.
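The "small tweak" for cases_new could be a preference order: use cases by publish date when present and fall back to specimen date otherwise. The Python sketch below assumes the API metric names `newCasesByPublishDate` and `newCasesBySpecimenDate`, and `pick_cases_new` is a hypothetical helper. Missing values may arrive as JSON null (None) or as NaN after conversion to a data frame, so the sketch treats both as missing.

```python
import math


def pick_cases_new(row):
    """Prefer cases by publish date, else specimen date (hypothetical rule)."""
    for metric in ("newCasesByPublishDate", "newCasesBySpecimenDate"):
        value = row.get(metric)
        is_nan = isinstance(value, float) and math.isnan(value)
        if value is not None and not is_nan:
            return value
    # Neither metric is available: return None (NA), as with regional deaths.
    return None
```

Because the rule only prefers publish-date cases when they exist, it keeps working before and after the England data change.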

@seabbs (Contributor) left a comment

This looks good. I am not clear what would be lost by dropping the old data estimates. Can we still map the new data easily? If not, is there anything we can do about it? If the old code adds nothing, I suggest we drop it for now (the code will still be in the commit history if we need to go back and get it).

Resolved review threads (outdated): R/uk_api.R (two threads), R/get_regional_data.R
@seabbs (Contributor) commented Aug 18, 2020

It looks like all the UK tests are currently failing. They probably just need to be gone through and have their expectations updated.

@kathsherratt (Contributor, Author) commented Aug 18, 2020

Nothing in particular is lost by removing the old code; if it's in the commit history, that's fine as a back-up.

I am re-writing the tests.

Simplified tests that work with the UK API:
- API query is valid (i.e. returns status code 200)
- data are up to date (less than 7 days old)
- correct number of regions returned at levels 1 and 2
- correct column types
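The four checks can be sketched as plain Python predicates. The package's actual tests are written in R. All function names here are hypothetical, "up to date" is interpreted as "the latest date is less than 7 days before today", and the expected region counts for levels 1 and 2 are left as parameters because the thread does not state them.

```python
from datetime import timedelta


def check_status(status_code):
    """A valid query returns HTTP 200."""
    return status_code == 200


def check_up_to_date(dates, today, max_age_days=7):
    """True if the most recent date is less than max_age_days before today."""
    return (today - max(dates)) < timedelta(days=max_age_days)


def check_region_count(rows, expected, key="region"):
    """True if the rows contain exactly `expected` distinct regions."""
    return len({row[key] for row in rows}) == expected


def check_column_types(rows, expected_types):
    """True if every listed column has the expected Python type in every row."""
    return all(
        isinstance(row[col], typ)
        for row in rows
        for col, typ in expected_types.items()
    )
```

Using `today` as a parameter instead of calling `date.today()` inside the check keeps the freshness test deterministic.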
@kathsherratt mentioned this pull request on Aug 18, 2020
@seabbs (Contributor) commented Aug 18, 2020

Made a few edits (changing a file name and fixing merge issues).

@seabbs merged commit f777221 into master on Aug 18, 2020
@seabbs deleted the uk-api branch on August 18, 2020 at 12:49
Labels: None yet
Projects: None yet
Development: no linked issues
3 participants