Reference Administrative Names #81
Comments
I have taken a crack at this and it works well for some geographies (Canada, Belgium).
Making everything mappable seems like a good goal, and the look-up table sounds like a neat way to do this Hamish! Although I guess it would also mean updating two files rather than one for each new country we add. We might also need to decide what to do when a country has multiple sub-national admin levels, e.g. default to adm1, or always return data at the finest spatial scale possible.
Seems like a very good goal.
Maybe we could have two stages of development for each country, that we track:
Also consider using https://github.com/epicentre-msf/hmatch for geomatching in complex situations.
hmatch is still private, @ffinger. @patrickbarks maybe now is a good time to share it with the world! For adm1 matching we have found that GADM hosts a very comprehensive worldwide administrative areas dataset for levels 1 and 2 (for most countries) that can be accessed directly from R with the raster package:
But the coding scheme is Another option is this admin level 1 dataset from ArcGIS that has ISO 3166-2 codes. It could be used as a default admin1 level dictionary to match against.
Agree with @ffinger that a two-stage dev process makes sense.
I also agree that a two-stage process makes sense: getting the function up and running, then trying to make the data mappable. It also sounds good to reference data to GADM, not to
Currently, many regional case count datasets are being returned from the package without clear reference to an existing geographic dataset. This means that users need to do some name matching before mapping case counts or joining them to other available datasets.
We are considering adding an iso_3166_2 field to all regional case counts to allow quick joins. This would improve the quality of the data being provided to users but involves some more work to manually match administrative names and fix administrative name matching as datasets change.
The current proposal is to create a directory in the `raw-data` folder with lookup tables with two fields: `name_as_recieved` and `iso_3166_2`. A function can then be incorporated into existing functions that reads from this directory (hosted on GitHub) and joins ISO codes to administrative names. We can then write tests to check that names continue to match the lookup tables exactly.
I believe this would improve the usability of the data, but it would slightly increase the amount of work needed to create a new function and would also lead to more tests breaking when datasets change.
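As a rough illustration of the lookup-and-join idea, here is a minimal sketch in Python (chosen only for illustration; the package itself is R). Everything in it is an assumption: the function names `read_lookup`, `add_iso_codes` and `unmatched_names`, the `region` column, and the inline CSV standing in for a file in the lookup directory are all hypothetical. The two lookup fields keep the names proposed above, and the Belgian region codes are the standard ISO 3166-2 ones.

```python
import csv
import io

# Hypothetical lookup table. In the proposal this would be a CSV file in the
# lookup directory; it is inlined here so the sketch is self-contained.
LOOKUP_CSV = """name_as_recieved,iso_3166_2
Brussels,BE-BRU
Flanders,BE-VLG
Wallonia,BE-WAL
"""


def read_lookup(text):
    """Parse lookup CSV text into a dict: name as received -> ISO 3166-2 code."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["name_as_recieved"]: row["iso_3166_2"] for row in reader}


def add_iso_codes(rows, lookup, name_field="region"):
    """Return copies of rows with an iso_3166_2 field (None when unmatched)."""
    return [{**row, "iso_3166_2": lookup.get(row[name_field])} for row in rows]


def unmatched_names(rows, lookup, name_field="region"):
    """Names present in the data but missing from the lookup table, sorted."""
    return sorted({row[name_field] for row in rows} - set(lookup))
```

A test of the kind described could then assert that `unmatched_names(rows, lookup)` is empty for each dataset. With made-up rows such as `{"region": "Walonia", "cases": 5}`, the misspelled name would be reported rather than silently dropped, which is what would make tests break when a source renames a region.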
Would be good to hear how people feel about this addition, especially as we add more case counts for LMIC.
@seabbs @kathsherratt @ffinger