GRSciColl vocabularies to put in vocabulary servers - GRSciColl data schema review #549

Closed
ManonGros opened this issue Feb 5, 2024 · 10 comments
Assignees
Labels
GRSciColl Issues related to institutions, collections and staff


@ManonGros
Contributor

ManonGros commented Feb 5, 2024

The following fields use controlled vocabularies. These vocabularies need to be imported into the vocabulary server.

| Entity type | Field/vocabulary | Note |
| --- | --- | --- |
| institution | type | https://registry.gbif.org/vocabulary/InstitutionType |
| institution | institutionalGovernance | https://registry.gbif.org/vocabulary/InstitutionalGovernance |
| institution | discipline | https://registry.gbif.org/vocabulary/Discipline |
| collection | contentType | https://registry.gbif.org/vocabulary/CollectionContentType |
| collection | preservationType | https://registry.gbif.org/vocabulary/PreservationType |
| collection | accessionStatus | https://registry.gbif.org/vocabulary/AccessionStatus |
| identifiers | identifierTypes | We won't be using the vocabulary for identifier types |

@MortenHofft
Member

MortenHofft commented Feb 19, 2024

This sounds like a breaking change that needs a coordinated deployment with the registry console, portal16, graphql and hosted portals. Once the feature branch is in dev, I can start that work. Please ping me then.

@marcos-lg
Contributor

marcos-lg commented Feb 27, 2024

An important thing to remember: if we change an existing vocabulary that the registry already uses, and those changes modify existing concepts (renaming or deleting a concept, for example), we need to plan the vocabulary release and make the necessary registry changes at the same time, because we might need to migrate concepts to the new version of the vocabulary.

Because of this, the registry should always use the latest release of the vocabs (the latest-release endpoints of the vocab API), so that changes to an unreleased vocab don't affect the registry. By default, the vocab API uses the latest data, not the latest released data.

A further improvement would be to keep a copy of all the vocab versions, so we can query any version we want and switch between versions more easily. But I'm still not sure this is worth doing at this point, since we don't know how often vocab changes will happen.

EDIT: since we'll use only the concept names in the registry and they can't be renamed, the only possible change is deprecating a concept. That won't break the registry, although we might need to migrate the affected values to the replacement concept.
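To illustrate the deprecation case, here is a minimal sketch of how stored concept names might be moved to their replacements. It assumes the deprecated-to-replacement pairs are already available as a plain dictionary; the function names and the example concept names (`OldConcept`, `NewConcept`) are hypothetical and not taken from the registry or the vocabulary server.

```python
def resolve_replacement(name: str, replaced_by: dict[str, str]) -> str:
    """Follow the deprecated -> replacement chain until a non-deprecated name is reached."""
    seen = set()
    while name in replaced_by:
        if name in seen:
            raise ValueError(f"cyclic replacement chain at {name!r}")
        seen.add(name)
        name = replaced_by[name]
    return name


def migrate_values(stored: list[str], replaced_by: dict[str, str]) -> list[str]:
    """Rewrite stored concept names, dropping duplicates while keeping order."""
    migrated: list[str] = []
    for value in stored:
        target = resolve_replacement(value, replaced_by)
        if target not in migrated:
            migrated.append(target)
    return migrated


# Hypothetical data: "OldConcept" was deprecated in favour of "NewConcept".
replaced_by = {"OldConcept": "NewConcept"}
print(migrate_values(["OldConcept", "Unchanged", "NewConcept"], replaced_by))
```

Values that are not deprecated pass through unchanged, which matches the point that a deprecation alone does not break existing records.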

@marcos-lg marcos-lg self-assigned this Mar 7, 2024
marcos-lg added a commit that referenced this issue Mar 8, 2024
@marcos-lg
Contributor

I've adapted the registry to use the vocabularies when they are ready. It's in a separate branch. One thing that we'll need is a mapping from the current values to the new vocabs so we can migrate the values in the DB.

@ManonGros
Contributor Author

The accessionStatus vocabulary for collections is available here: https://registry.gbif.org/vocabulary/AccessionStatus. The mapping is straightforward since it contains the same values.

@ManonGros
Contributor Author

The contentType vocabulary for collections is available here: https://registry.gbif.org/vocabulary/CollectionContentType

For the mapping of old values, the "other" categories will be linked to the new parent concepts. For example, ARCHAEOLOGICAL_OTHER should be mapped to Archaeological, and HUMAN_DERIVED_MOLECULAR_DERIVATIVES will be mapped to MolecularDerivativesHuman.

More generally, @marcos-lg, would it make your life easier if I put all the current enums as alternative labels for their matching concepts? Like this: https://registry.gbif.org/vocabulary/CollectionContentType/concept/MolecularDerivativesHuman/hiddenLabels
I only thought of it now. Let me know if this makes sense and I will do it for all the vocabs.

@marcos-lg
Contributor

That makes sense to me; that way I can automate the migration.
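A rough sketch of what such an automated migration could look like, assuming the hidden labels have already been fetched into a dictionary keyed by concept name. The two pairs below come from the examples in this thread; whether ARCHAEOLOGICAL_OTHER is stored as a hidden label of Archaeological is an assumption, and the function names are hypothetical.

```python
def build_lookup(hidden_labels: dict[str, list[str]]) -> dict[str, str]:
    """Invert {concept: [old enum values]} into {old enum value: concept}."""
    lookup: dict[str, str] = {}
    for concept, labels in hidden_labels.items():
        for label in labels:
            if label in lookup and lookup[label] != concept:
                raise ValueError(f"{label!r} is a hidden label of two concepts")
            lookup[label] = concept
    return lookup


def migrate(old_values: list[str], lookup: dict[str, str]) -> tuple[list[str], list[str]]:
    """Return (migrated concept names, old values with no matching hidden label)."""
    migrated: list[str] = []
    unmapped: list[str] = []
    for value in old_values:
        if value in lookup:
            if lookup[value] not in migrated:
                migrated.append(lookup[value])
        else:
            unmapped.append(value)
    return migrated, unmapped


# Illustrative data only; the real labels live in the vocabulary server.
hidden_labels = {
    "Archaeological": ["ARCHAEOLOGICAL_OTHER"],
    "MolecularDerivativesHuman": ["HUMAN_DERIVED_MOLECULAR_DERIVATIVES"],
}
lookup = build_lookup(hidden_labels)
print(migrate(["ARCHAEOLOGICAL_OTHER", "SOME_UNKNOWN_ENUM"], lookup))
```

Returning the unmapped values separately makes it possible to review them by hand instead of silently dropping them.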

@ManonGros
Contributor Author

Almost all the vocabularies are now in (although I still have to add names and definitions, but all the concepts are in). Five of these have the enums corresponding to their respective concepts as hidden labels.

For the InstitutionalGovernance vocabulary (https://registry.gbif.org/vocabulary/InstitutionalGovernance), the field becomes multi-value, so the mapping cannot be added as hidden labels. Here is what I think should be mapped:

| Enum | Corresponding set of concepts |
| --- | --- |
| ACADEMIC_FEDERAL | Academic, Governmental |
| ACADEMIC_FOR_PROFIT | Academic, ForProfit |
| ACADEMIC_LOCAL | Academic, Local |
| ACADEMIC_NON_PROFIT | Academic, NonProfit |
| ACADEMIC_STATE | Academic, State |
| FEDERAL | Governmental |
| FOR_PROFIT | ForProfit |
| LOCAL | Local |
| NON_PROFIT | NonProfit |
| STATE | State |
| OTHER | leave empty |
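
The mapping above could be expressed as a lookup table along these lines. This is only a sketch: the values are taken from the mapping in this comment, while the constant and function names are hypothetical and the actual migration in the registry may be implemented differently.

```python
# Old single-value enum -> new multi-value list of concept names.
GOVERNANCE_MAPPING: dict[str, list[str]] = {
    "ACADEMIC_FEDERAL": ["Academic", "Governmental"],
    "ACADEMIC_FOR_PROFIT": ["Academic", "ForProfit"],
    "ACADEMIC_LOCAL": ["Academic", "Local"],
    "ACADEMIC_NON_PROFIT": ["Academic", "NonProfit"],
    "ACADEMIC_STATE": ["Academic", "State"],
    "FEDERAL": ["Governmental"],
    "FOR_PROFIT": ["ForProfit"],
    "LOCAL": ["Local"],
    "NON_PROFIT": ["NonProfit"],
    "STATE": ["State"],
    "OTHER": [],  # left empty on purpose
}


def migrate_governance(old_value: str) -> list[str]:
    """Return the concepts for an old enum value; fail loudly on unknown values."""
    if old_value not in GOVERNANCE_MAPPING:
        raise KeyError(f"no mapping defined for {old_value!r}")
    # Return a copy so callers cannot mutate the shared mapping.
    return list(GOVERNANCE_MAPPING[old_value])


print(migrate_governance("ACADEMIC_FEDERAL"))
```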

@ManonGros
Contributor Author

@marcos-lg I am testing the search based on contentType https://api.gbif-dev.org/v1/grscicoll/collection?contentTypes=Archaeological but I get everything at once. Is that a problem with dev?

@marcos-lg
Contributor

The parameter name is singular: contentType. I updated the docs since they were contradictory (not deployed yet), but the parameter has always been singular.
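
For reference, a small sketch that builds the search URL with the singular parameter name. The base URL is the dev endpoint quoted earlier in this thread; the helper name is hypothetical, and the code only constructs the URL without sending a request.

```python
from urllib.parse import urlencode

# Dev endpoint as quoted in the question above.
BASE_URL = "https://api.gbif-dev.org/v1/grscicoll/collection"


def collection_search_url(content_type: str) -> str:
    """Build a collection search URL filtered by a single contentType concept."""
    return f"{BASE_URL}?{urlencode({'contentType': content_type})}"


print(collection_search_url("Archaeological"))
```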

@marcos-lg
Contributor

Deployed to production.


3 participants