GRSciColl - A request for API that links occurrences to a collection based on parent institution (institution code) and catalog number #534

Open
spalp opened this issue Oct 26, 2023 · 14 comments
Assignees
Labels
enhancement, GRSciColl (Issues related to institutions, collections and staff)

Comments

@spalp

spalp commented Oct 26, 2023

Could the following API be modified to link occurrences from a selected dataset to a selected collection based on the institution code rather than the collection code? https://github.com/ManonGros/Small-scripts-using-GBIF-API/blob/master/map_occ_to_grscicoll.ipyn

An example: I have a dataset of Coleoptera species in which all occurrences with institution code 'NMNHS' (https://www.gbif.org/occurrence/search?dataset_key=ee639eb0-bf6f-410b-a903-21665b3bdb85&institution_code=nmnhs) correspond to the collection Coleoptera | Code: BG-NMNHS-ENT (https://registry.gbif.org/collection/84f3bb13-d11c-4926-bced-040e8c38bddc). I cannot link them using the above-mentioned API because the dataset has no collection code, but the institution code is sufficient to link uniquely to the collection.
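For reference, the occurrence search linked above can also be expressed against the public API. This is a minimal sketch that assumes the `datasetKey` and `institutionCode` query parameters of `/v1/occurrence/search`; the helper name `occurrence_search_url` is illustrative only, and the URL is only built, not requested.

```python
from urllib.parse import urlencode

OCCURRENCE_SEARCH = "https://api.gbif.org/v1/occurrence/search"


def occurrence_search_url(dataset_key: str, institution_code: str, limit: int = 20) -> str:
    """Build a search URL for occurrences of one dataset with a given institution code."""
    query = urlencode({
        "datasetKey": dataset_key,
        "institutionCode": institution_code,
        "limit": limit,
    })
    return f"{OCCURRENCE_SEARCH}?{query}"


# Dataset key and institution code taken from the example above.
url = occurrence_search_url("ee639eb0-bf6f-410b-a903-21665b3bdb85", "NMNHS")
print(url)
```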

@ManonGros
Contributor

@marcos-lg we have some cases where the data comes from PLAZI (digitised papers automatically converted into datasets).
In those datasets, the collection codes aren't mentioned because they are obvious to the authors from the context. For example, if they publish a paper about a species of insect, they obviously mean that they looked at specimens from entomological collections.
This means that only institution codes are provided.

Slaza is trying to map those occurrences to collections. She would like to be able to specify that the occurrences from this "entomology" dataset with the institutionCode XXX map to the entomological collection under XXX institution.

Do you think something like that could be implemented?
If there isn't any straightforward way to do this, I think it is ok to leave the issue on the side for now.

ManonGros added the GRSciColl (Issues related to institutions, collections and staff) and enhancement labels on Nov 1, 2023
@marcos-lg
Contributor

It's not possible right now, but I think it could be incorporated into the occurrence mappings that we have now. With these mappings we can map all occurrences from a dataset XX with (optionally) a code Y or identifier Z to a specific collection, but we can't specify the institution code for the collections. I think it shouldn't be complicated to adapt them to cover this case too.

marcos-lg self-assigned this on Nov 1, 2023
@spalp
Author

spalp commented Nov 16, 2023

Thank you. That is great news.
And would it be too much to ask to be able to link occurrences from a dataset to a collection based on the prefix of the catalogue number? For example, within the datasets INSDC Sequences and International Barcode of Life project (iBOL), catalogue numbers in the format SOM ###### [e.g. SOM 154573] are from vouchers stored at the herbarium SOM (now a collection under an institution): https://scientific-collections.gbif.org/collection/cd4845b3-6772-4b77-8b97-fc04eede5f90. However, the datasets provide neither an institution code nor a collection code.

@marcos-lg
Contributor

I deployed a change to occurrence mappings in production that lets us specify the institution code for collections. The new field in the occurrence mapping is called parentCode. This way we can map all the occurrences from dataset X with parentCode Y (the institution code) to collection C. To do so, we add this mapping to collection C:

"datasetKey": X,
"parentCode": "Y"
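As a rough illustration, such a mapping could be submitted with a request like the one sketched here. The thread only confirms the two fields and the `occurrenceMappings` path under the collection; the use of `POST`, HTTP Basic authentication, and the helper name `build_mapping_request` are assumptions, and the credentials are placeholders. The request is built but not sent.

```python
import base64
import json
from urllib.request import Request

# Base path taken from the mapping URLs quoted in this thread.
API = "https://api.gbif.org/v1/grscicoll/collection"


def build_mapping_request(collection_key, dataset_key, parent_code, user, password):
    """Build (but do not send) a request that adds an occurrence mapping to a collection."""
    body = json.dumps({"datasetKey": dataset_key, "parentCode": parent_code}).encode("utf-8")
    # Assumption: the registry accepts HTTP Basic authentication for write operations.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return Request(
        f"{API}/{collection_key}/occurrenceMappings",
        data=body,
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
        method="POST",  # assumption: mappings are created with POST
    )


req = build_mapping_request(
    "84f3bb13-d11c-4926-bced-040e8c38bddc",  # collection C (Coleoptera, BG-NMNHS-ENT)
    "ee639eb0-bf6f-410b-a903-21665b3bdb85",  # dataset X
    "NMNHS",                                 # parentCode Y, i.e. the institution code
    "my_user",
    "my_password",
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# To actually send it: urllib.request.urlopen(req)
```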

@spalp
Author

spalp commented Nov 23, 2023

Thanks, can't wait to see the result. I already added a mapping with the new field here: http://api.gbif.org/v1/grscicoll/collection/84f3bb13-d11c-4926-bced-040e8c38bddc/occurrenceMappings/152.

@ManonGros
Contributor

ManonGros commented Nov 23, 2023

@marcos-lg about @spalp's other comment: I don't think we should use any prefix for mapping.

However, I think we could imagine using the catalogueNumber. This isn't ideal, but at least it would allow us to solve those cases where one or two important records published by a third party need to be linked to GRSciColl.

Ideally:

  • a mapping for a dataset and catalogue number would be prioritised over a collection or institution code link.
  • if possible, and in order to avoid having 20 mappings for 20 records, it would be nice to be able to give a list like "catalogueNumber": ["agagaga20", "SOM2334"], implying an "in" check.

What do you think? Let me know if you think that could be workable. We could also put the idea on hold until we get more users interested.

spalp changed the title from "GRSciColl - A request for API that links occurrences to a collection based on the institution code" to "GRSciColl - A request for API that links occurrences to a collection based on parent institution (institution code) and catalog number" on Nov 23, 2023
@marcos-lg
Contributor

> Thanks, can't wait to see the result. I already added a mapping with the new field here: http://api.gbif.org/v1/grscicoll/collection/84f3bb13-d11c-4926-bced-040e8c38bddc/occurrenceMappings/152.

@spalp keep in mind that this occurrenceMapping will map all the occurrences from that dataset to that collection. If you want it to be only for the ones that have a specific institution code you have to put that institution code in the parentCode field.
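The difference between the two kinds of mapping can be sketched as follows. This is not the actual lookup code: the function name, the dict layout, and the exact (case-sensitive) comparison of codes are assumptions made for illustration.

```python
def mapping_matches(mapping: dict, occurrence: dict) -> bool:
    """Return True if an occurrence falls under an occurrence mapping (illustrative only)."""
    if mapping["datasetKey"] != occurrence.get("datasetKey"):
        return False
    parent_code = mapping.get("parentCode")
    if parent_code is None:
        # No parentCode: every occurrence of the dataset is mapped.
        return True
    # With parentCode: only occurrences carrying that institution code are mapped.
    return parent_code == occurrence.get("institutionCode")


broad = {"datasetKey": "D"}                          # maps the whole dataset
narrow = {"datasetKey": "D", "parentCode": "NMNHS"}  # maps only NMNHS records

occurrences = [
    {"datasetKey": "D", "institutionCode": "NMNHS"},
    {"datasetKey": "D", "institutionCode": "OTHER"},
]

broad_hits = [mapping_matches(broad, o) for o in occurrences]
narrow_hits = [mapping_matches(narrow, o) for o in occurrences]
print(broad_hits)   # [True, True]
print(narrow_hits)  # [True, False]
```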

@marcos-lg
Contributor

> What do you think? Let me know if you think that could be workable. We could also put the idea on hold until we get more users interested.

Yes, I think it could be done with occurrence mappings containing a list of catalogue numbers. But I don't think the lookup has to prioritize these mappings; it would just find any mapping that matches. The result would be the same: the matchType would still be EXPLICIT_MAPPING.

I think the change could be made easily, but it also requires changes in pipelines in order to use the catalog number in the lookup.
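A minimal sketch of the idea discussed here, assuming a hypothetical list-valued `catalogueNumber` field on the mapping (this is the proposal, not an existing feature). The function names and the flattened `collectionKey` entry are illustrative, and the lookup simply returns the first mapping that matches, without any prioritisation.

```python
from typing import Optional


def matches(mapping: dict, occurrence: dict) -> bool:
    """Check the dataset and the optional (hypothetical) catalogue number list."""
    if mapping["datasetKey"] != occurrence.get("datasetKey"):
        return False
    numbers = mapping.get("catalogueNumber")
    # No list means no catalogue-number constraint; otherwise apply an "in" check.
    return numbers is None or occurrence.get("catalogNumber") in numbers


def lookup(occurrence: dict, mappings: list) -> Optional[dict]:
    """Return the first matching mapping as an explicit match, or None."""
    for mapping in mappings:
        if matches(mapping, occurrence):
            return {"collectionKey": mapping["collectionKey"], "matchType": "EXPLICIT_MAPPING"}
    return None


mappings = [{
    "collectionKey": "cd4845b3-6772-4b77-8b97-fc04eede5f90",
    "datasetKey": "D",
    "catalogueNumber": ["agagaga20", "SOM2334"],
}]

hit = lookup({"datasetKey": "D", "catalogNumber": "SOM2334"}, mappings)
miss = lookup({"datasetKey": "D", "catalogNumber": "XYZ"}, mappings)
print(hit)
print(miss)
```

One list per mapping keeps 20 records in a single mapping instead of 20 separate ones, which is the motivation given above.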

@spalp
Author

spalp commented Dec 6, 2023

> http://api.gbif.org/v1/grscicoll/collection/84f3bb13-d11c-4926-bced-040e8c38bddc/occurrenceMappings/152.

Hello, @marcos-lg, I already used the parentCode field in occurrence mapping 152. However, that was before you wrote that the parentCode field had been deployed. Therefore, just in case, I deleted occurrence mapping 152 and created a new mapping (153) a few minutes ago using the parentCode field. However, I still cannot see the parentCode field. The result of the new mapping is here: https://api.gbif.org/v1/grscicoll/collection/84f3bb13-d11c-4926-bced-040e8c38bddc. Does this mean that the mapping is still not using the parentCode field?

@marcos-lg
Contributor

@spalp It should be working. I don't see any errors in the logs. Could you share the request that you send to create the mapping?

@spalp
Author

spalp commented Dec 6, 2023

> @spalp It should be working. I don't see any errors in the logs. Could you share the request that you send to create the mapping?

image

What makes me wonder whether this mapping works is the fact that I do not see the parentCode here:
image

marcos-lg added a commit that referenced this issue Dec 6, 2023
@marcos-lg
Contributor

Sorry @spalp there is a bug and that's why it doesn't work. I'll try to deploy it tomorrow and I'll let you know so you can create the mapping again. Thanks for reporting it!

@marcos-lg
Contributor

@spalp it is now fixed.

@spalp
Author

spalp commented Dec 12, 2023

Thanks, @marcos-lg. I am now able to see the parentCode in the last 4 occurrence mappings here: https://api.gbif.org/v1/grscicoll/collection/84f3bb13-d11c-4926-bced-040e8c38bddc/occurrenceMapping

3 participants