This repository has been archived by the owner on May 25, 2022. It is now read-only.

Reference Administrative Names #81

Open
hamishgibbs opened this issue Apr 10, 2020 · 6 comments

@hamishgibbs
Contributor

Currently, many regional case count datasets are being returned from the package without clear reference to an existing geographic dataset. This means that users need to do some name matching before mapping case counts or joining them to other available datasets.

We are considering adding an iso_3166_2 field to all regional case counts to allow quick joins. This would improve the quality of the data being provided to users but involves some more work to manually match administrative names and fix administrative name matching as datasets change.

The current proposal is to create a directory in the raw-data folder containing lookup tables with two fields: name_as_recieved and iso_3166_2. A helper that reads from this directory (hosted on GitHub) and joins ISO codes to administrative names can then be incorporated into the existing functions. We can then write tests to check that names continue to match the lookup tables exactly.
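A minimal sketch of the join and the exact-match check described above, in base R. Everything here is an assumption for illustration: the helper names add_iso_codes and unmatched_names, the region column, and the example rows are hypothetical, and the lookup table is built inline rather than read from the raw-data folder. Only the two lookup field names come from the proposal.

```r
# Hypothetical lookup table. In the proposal this would be a file in the
# raw-data folder; the rows below are illustrative only.
# Field names follow the proposal (spelling as written there).
lookup <- data.frame(
  name_as_recieved = c("Ontario", "Quebec"),
  iso_3166_2 = c("CA-ON", "CA-QC"),
  stringsAsFactors = FALSE
)

# Hypothetical regional case counts as they might arrive from a source.
cases <- data.frame(
  region = c("Ontario", "Quebec", "Repatriated travellers"),
  cases = c(10L, 20L, 1L),
  stringsAsFactors = FALSE
)

# Attach ISO 3166-2 codes by exact name match.
# match() preserves row order; names missing from the lookup get NA.
add_iso_codes <- function(cases, lookup) {
  idx <- match(cases$region, lookup$name_as_recieved)
  cases$iso_3166_2 <- lookup$iso_3166_2[idx]
  cases
}

# Names in the data with no entry in the lookup table.
# A test could assert that this is empty for each country.
unmatched_names <- function(cases, lookup) {
  setdiff(cases$region, lookup$name_as_recieved)
}

coded <- add_iso_codes(cases, lookup)
missing_names <- unmatched_names(cases, lookup)
```

Using match() rather than merge() keeps the original row order and leaves unmatched rows in place with an NA code, so a test can fail loudly on length(missing_names) > 0 without silently dropping data.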

I believe this would improve the usability of the data but would increase the amount of work to create a new function a bit and will also lead to more tests breaking when datasets change.

Would be good to hear how people feel about this addition, especially as we add more case counts for LMIC.

@seabbs @kathsherratt @ffinger

@hamishgibbs
Contributor Author

I have taken a crack at this and it works well for some geographies (Canada, Belgium). get_germany_regional_cases already returns ISO 3166-2 codes. Afghanistan is challenging because the admin boundaries we are receiving do not fully agree with those available from rnaturalearth::ne_states(). This may be because the virus has not reached every province or because boundaries have changed.

@kathsherratt
Collaborator

Making everything mappable seems like a good goal, and the look-up table sounds like a neat way to do this, Hamish! Although I guess it would also mean updating two files rather than one for each new country we add. We might also need to decide what to do when a country has multiple sub-national admin levels, e.g. default to adm1, or always return data at the finest spatial scale possible.

@ffinger
Collaborator

ffinger commented Apr 12, 2020

Seems like a very good goal.
Some caveats we have to think about:

  • some countries use a sanitary/health subdivision different from the administrative one (common in Africa)
  • as @kathsherratt mentioned, in some countries it may be preferable to use adm2 or 3 instead of adm1, because the data is available at that level
  • for some areas, and especially for finer subdivisions (adm2 and beyond) in LMIC, the matching becomes difficult because a clean reference database doesn't exist or there are several competing ones. Areas also frequently change name or get merged/split over time.
  • increased work to make everything mappable

Maybe we could have two stages of development for each country, that we track:

  • basic data extraction, with the names given in the data source
  • data geo-matching done

Also consider using https://github.com/epicentre-msf/hmatch for geomatching in complex situations.

@PaulC91
Collaborator

PaulC91 commented Apr 12, 2020

hmatch is still private, @ffinger. @patrickbarks maybe now is a good time to share it with the world!

For adm1 matching we have found that rnaturalearth::ne_states() is not the most up-to-date and often returns admin 2 level rather than admin 1 (France being an example).

GADM hosts a very comprehensive worldwide administrative areas dataset for levels 1 and 2 (for most countries) that can be accessed directly from R with the raster package:

drc_adm1 <- sf::st_as_sf(raster::getData('GADM', country = "COD", level = 1, path = ".cache"))

But the coding scheme is GID, not ISO, in the form of COD.1_1.

Another option is this admin level 1 dataset from ArcGIS that has ISO 3166-2 codes. It could be used as a default admin 1 dictionary to match against.

@seabbs
Contributor

seabbs commented Apr 12, 2020

Agree with @ffinger that a two stage dev process makes sense.

@hamishgibbs
Contributor Author

I also agree that a two-stage process makes sense: get the function up and running, then try to make the data mappable. It also sounds good to reference data to GADM rather than to rnaturalearth boundaries. And we would definitely like to provide admin codes at the correct spatial level, admin 2 or 3 if available. For example, FIPS codes for US counties, not the ISO code of the state that county is in.
