Filtering out Herbaria/Museum locations #20
Comments
|
Right, not implemented yet. And correct, it's not easy to solve, which is the reason it hasn't been done yet. Thanks for the example. I think that's a good strategy: look for tightly clustered records that all have the same lat/long data. I started to collect herbaria lat/longs here https://github.com/ropensci/scrubr/tree/master/inst/extdata - it needs to be much more exhaustive though. Do you know of lat/long data for other herbaria/museums? It seems that if metadata is available too, we can check whether the herbarium/museum is mentioned and match on that as well. |
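For illustration only, here is a minimal Python sketch of the "match against known herbarium lat/longs" idea. It is not scrubr code: the function names, the record layout, the 1 km tolerance and the placeholder institution coordinates are all assumptions made up for this example.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def flag_near_institutions(records, institutions, max_km=1.0):
    """Return (record id, institution name) for records within max_km of an institution.

    records: list of dicts with "id", "lat", "lon" keys (hypothetical layout).
    institutions: dict mapping a name to a (lat, lon) tuple (hypothetical layout).
    """
    flagged = []
    for rec in records:
        for name, (ilat, ilon) in institutions.items():
            if haversine_km(rec["lat"], rec["lon"], ilat, ilon) <= max_km:
                flagged.append((rec["id"], name))
                break  # one match is enough to flag the record
    return flagged


# Placeholder data: these coordinates are invented, not real herbarium locations.
institutions = {"Herbarium A": (10.0, 20.0)}
records = [
    {"id": 1, "lat": 10.0, "lon": 20.0},    # exactly on the institution
    {"id": 2, "lat": 10.001, "lon": 20.0},  # roughly 0.1 km away
    {"id": 3, "lat": 11.0, "lon": 20.0},    # roughly 111 km away
]
flagged = flag_near_institutions(records, institutions)
```

A distance tolerance rather than exact equality is a design choice here: it also catches records whose coordinates were rounded differently from the reference list, at the cost of flagging genuine collections made close to the institution.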
|
See also this discussion on the GBIF API users list: http://lists.gbif.org/pipermail/api-users/2014-October/000076.html |
|
Thanks for the pointers. Really useful. I do actually know of a really comprehensive source of lat/longs for botanical gardens (although not museums). If you view the source (Ctrl-U) of the HTML map given by BGCI, all the lat/longs appear to be publicly visible: https://www.bgci.org/map.php |
|
Erika Edwards' 'cleanGbifCoords.1.0.py' script might be worth looking into for this too. Its description reads: "cleanGbifCoords.1.0.py : This is a general-use version of a portion of the script that we developed to clean the Zanne et al. climate data. Specifically, this script will remove political centroids and locations of major herbaria from a collection of latitudes/longitudes. If used, please cite our BCA: Edwards EJ, J de Vos, MJ Donoghue. 2015. Brief Communications Arising: Doubtful pathways to cold tolerance in plants. Nature doi:10.1038/nature14393." |
|
Sorry for the slow reply on the BGCI site. I did actually play with scraping it, but it was a pain, so it dropped down the to-do list. Will have a look at that Python script. |
Filtering out Herbaria/Museum locations :
I see it's on your roadmap/todo but it's not implemented yet, right?
This is a highly visible, important problem for me. I know it's perhaps not easy to solve(?) but just to say (and I'm sure you know) this is a major problem.
e.g. from an analysis of 4,828,341 GBIF records from 89,180 plant species one can see a clearly visible peak in the 'median-latitude-per-species' histogram at around -33.875. Why? Turns out there are 19,773 records across 2,757 species that all have a latitude of exactly "-33.875" and 19,706 (>99.6%) of these records come from the PRECIS database provided by SANBI.
TL;DR: records with a latitude of "-33.875" from the PRECIS database should be viewed with extreme skepticism. I am working on identifying more such institution/database cases, and am happy to supply more data on the (plant-related) cases I find if that'd be helpful. I'm starting from scratch with zero prior knowledge here though. Is there perhaps already a good list of such known cases?
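In case it helps, the kind of check that surfaced the peak described above can be sketched roughly like this in Python. This is a simplified illustration, not the actual analysis: the `(species, latitude, dataset)` tuple layout, the function name, the thresholds and the toy rows are all assumptions invented for the example.

```python
from collections import Counter, defaultdict


def suspicious_latitudes(records, min_records=100, min_species=10, min_share=0.9):
    """Find exact latitude values shared by many species and dominated by one dataset.

    records: iterable of (species, latitude, dataset) tuples (hypothetical layout).
    Latitudes are compared as strings so that only exactly identical values group.
    Returns (latitude, n_records, n_species, top_dataset, share) tuples,
    largest groups first.
    """
    by_lat = defaultdict(list)
    for species, lat, dataset in records:
        by_lat[lat].append((species, dataset))

    flagged = []
    for lat, rows in by_lat.items():
        n_species = len({species for species, _ in rows})
        top_dataset, top_count = Counter(d for _, d in rows).most_common(1)[0]
        share = top_count / len(rows)
        if len(rows) >= min_records and n_species >= min_species and share >= min_share:
            flagged.append((lat, len(rows), n_species, top_dataset, share))
    return sorted(flagged, key=lambda row: row[1], reverse=True)


# Toy rows only: species and dataset names here are placeholders, not real GBIF data.
toy_records = [
    ("Species a", "-33.875", "PRECIS"),
    ("Species b", "-33.875", "PRECIS"),
    ("Species c", "-33.875", "PRECIS"),
    ("Species a", "-33.875", "OTHER"),
    ("Species a", "-29.1", "PRECIS"),
    ("Species d", "51.48", "OTHER"),
]
toy_flagged = suspicious_latitudes(
    toy_records, min_records=3, min_species=2, min_share=0.75
)
```

Grouping on latitude alone mirrors how the peak showed up in the histogram; grouping on the (latitude, longitude) pair instead would be stricter and is probably the better default for a real filter.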