Create a mapping from iDigBio to GrBio #187
Process
Mappings were made using several approaches, applied in order; each step was tried only when the previous ones found no match.
Result
iDigBio entries in total: 1591
The result is simply an annotated version of the iDigBio API response.
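The "in order, and only the next step if the prior one found nothing" logic can be sketched as a small cascade. This is a minimal illustration, not the script that was actually used: the `first_match` helper, the field names `irn` and `code`, and the toy lookup tables are all assumptions made up for the example.

```python
from typing import Callable, List, Optional, Tuple

# A matcher takes an iDigBio entry and returns a GrBio institution key, or None.
Matcher = Callable[[dict], Optional[str]]


def first_match(entry: dict, matchers: List[Tuple[str, Matcher]]) -> Optional[Tuple[str, str]]:
    """Try each matcher in order and return (step_name, key) for the first hit."""
    for step_name, matcher in matchers:
        key = matcher(entry)
        if key is not None:
            return step_name, key
    return None  # no step matched; the entry stays in the unmatched pile


# Hypothetical lookup tables standing in for the real GrBio data.
irn_index = {"1001": "inst-a"}
code_index = {"ABC": "inst-b"}

matchers = [
    ("irn", lambda e: irn_index.get(e.get("irn"))),
    ("code", lambda e: code_index.get(e.get("code"))),
]

print(first_match({"irn": "1001", "code": "ABC"}, matchers))
print(first_match({"code": "ABC"}, matchers))
print(first_match({}, matchers))
```

Recording the step name next to the match makes it easy to produce per-step counts like the ones reported in this comment.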
First we matched using IRN
iDigBio entries with an IRN in total: 563
559 can be matched to an institution by IRN
Matching with other identifiers
iDigBio uses identifiers like
38 can be matched to an institution by an identifier
Matching with a combination of title and code
All entries that can be linked by both code and title (and where the country is US) are considered correct. If more than one GrBio institution matches both code and title, the first is selected (unless there is also a match to a collection by title and code whose parent institution is one of the options). If the title match is fuzzy, it has to match exactly one GrBio institution.
291 iDigBio entries can be matched to a GrBio institution by institution code and title
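A rough sketch of the title and code rule, under several assumptions: the fuzzy comparison here uses `difflib.SequenceMatcher` with an arbitrary 0.9 threshold (the actual fuzzy method is not stated in this thread), the US check is applied to the GrBio record, and the field names `code`, `name`, `country` and `key` are hypothetical. The collection/parent-institution exception is left out.

```python
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase and collapse whitespace so trivial differences do not matter."""
    return " ".join(text.lower().split())


def match_by_code_and_title(entry, institutions, fuzzy_threshold=0.9):
    """Return the matching GrBio institution dict, or None."""
    title = normalize(entry["name"])
    same_code = [
        inst for inst in institutions
        if inst["code"] == entry["code"] and inst["country"] == "US"
    ]
    # Exact title matches: if there are several, the first one is selected.
    exact = [inst for inst in same_code if normalize(inst["name"]) == title]
    if exact:
        return exact[0]
    # Fuzzy title matches: accepted only when exactly one institution qualifies.
    fuzzy = [
        inst for inst in same_code
        if SequenceMatcher(None, normalize(inst["name"]), title).ratio() >= fuzzy_threshold
    ]
    return fuzzy[0] if len(fuzzy) == 1 else None


# Made-up records for illustration only.
inst_a = {"key": "a", "code": "UMZ", "name": "University Museum of Zoology", "country": "US"}
inst_b = {"key": "b", "code": "UMZ", "name": "University Museum of Zoology", "country": "US"}

exact_entry = {"code": "UMZ", "name": "university  museum of zoology"}
fuzzy_entry = {"code": "UMZ", "name": "University Museum of Zoolog"}

print(match_by_code_and_title(exact_entry, [inst_a, inst_b]))
print(match_by_code_and_title(fuzzy_entry, [inst_a]))
print(match_by_code_and_title(fuzzy_entry, [inst_a, inst_b]))
```

The asymmetry is deliberate: an exact code and title match is trusted even when ambiguous, while a fuzzy match is only trusted when it is unambiguous.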
Matching with a combination of city and code
If there is exactly one institution in GrBio with the same code in the same city, then it is considered a match.
203 can be matched to an institution by both code and city
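The "exactly one" condition for city and code might look like the following. The field names and the case-insensitive city comparison are assumptions for the sake of the example.

```python
def match_by_code_and_city(entry, institutions):
    """Return the single GrBio institution sharing code and city, else None."""
    candidates = [
        inst for inst in institutions
        if inst["code"] == entry["code"]
        and inst["city"].casefold() == entry["city"].casefold()
    ]
    # Zero candidates means no match; two or more is ambiguous, so no match either.
    return candidates[0] if len(candidates) == 1 else None


# Made-up records for illustration only.
grbio = [
    {"key": "x", "code": "NHM", "city": "Springfield"},
    {"key": "y", "code": "NHM", "city": "Shelbyville"},
    {"key": "z", "code": "NHM", "city": "Shelbyville"},
]

print(match_by_code_and_city({"code": "NHM", "city": "springfield"}, grbio))
print(match_by_code_and_city({"code": "NHM", "city": "Shelbyville"}, grbio))
```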
Matching by title alone when there is no iDigBio institution code
If an iDigBio entry without a code has a name match with a GrBio institution, they are linked.
93 do not have a code but have a matching title
Matching by title, despite conflicting codes
Looking at them, they all seem reasonable.
34 have matching titles but not codes.
Manual matching
Candidates were suggested by scripts, then picked by hand where they seemed reasonable.
138 have been manually matched.
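As a quick sanity check, the per-step counts reported above can be added up and compared against the 1591 total. The step labels are just shorthand for this example.

```python
total_entries = 1591

# Counts as reported for each step above.
matched_per_step = {
    "irn": 559,
    "other identifiers": 38,
    "title + code": 291,
    "city + code": 203,
    "title only (no code)": 93,
    "title despite conflicting code": 34,
    "manual": 138,
}

matched = sum(matched_per_step.values())
unmatched = total_entries - matched
print(matched, unmatched)  # 1356 235
```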
That leaves 235 missing a match (could be new)
For some of those we can detect candidates that need manual evaluation.
Code candidates
Not matched, but has a code match.
22 of the remaining items have a code match; they look like unlikely matches.
City candidates
Not matched, but has the same city and 2 words in common in their titles.
57 have a 2-word overlap in title and are from the same city.
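The city candidate rule (same city, at least 2 title words in common) could be sketched like this. Whether common words such as "of" were counted is not stated in this thread, so this version counts every word; the field names are hypothetical.

```python
def title_words(title):
    """Split a title into a set of lowercase words."""
    return set(title.lower().split())


def city_candidates(entry, institutions, min_shared_words=2):
    """List GrBio institutions in the same city that share enough title words."""
    words = title_words(entry["name"])
    return [
        inst for inst in institutions
        if inst["city"].casefold() == entry["city"].casefold()
        and len(words & title_words(inst["name"])) >= min_shared_words
    ]


# Made-up records for illustration only.
grbio = [
    {"key": "p", "name": "Natural History Society", "city": "Springfield"},
    {"key": "q", "name": "Art Gallery", "city": "Springfield"},
    {"key": "r", "name": "Natural History Society", "city": "Shelbyville"},
]
entry = {"name": "Springfield Museum of Natural History", "city": "Springfield"}

print([inst["key"] for inst in city_candidates(entry, grbio)])  # ['p']
```

These are only suggestions for a human to review, so returning every candidate rather than picking one fits the manual evaluation step.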
ASU BIOCOLL and the Chicago Academy of Sciences entries (CHAS, CACS, CA and CASM) have all been curated by our editors (who are from these institutions). I suspect our version is more up to date than iDigBio's and I wouldn't want to overwrite them. I suspect that could also be the case for some other mismatching codes; I need to check these.
Hi all, I'm almost finished going thru the 235 unmatched entries, but I do have a question (that might be very simple :) ). In the spreadsheet of the matches that @ManonGros sent me, it seems that each URL I've tried in the GrSciColl_INST_MATCH_URL column gives me a 404 error... is this expected behavior? This happens whether or not I am logged into the Registry. I just wanted to see what info was displayed at the URLs, but no dice with any of the URLs I've tried. All 404. :S
Hi @CatChapman This is my fault, it is because I generated the URLs with an incorrect prefix!
Aha! Thank you. :)
As with IH we will map all entities to both an institution and a collection. The first step is to create a mapping for the institutions. Once that is done we can map/create the collections belonging to those institutions.