Make DwC term otherCatalogNumbers searchable #664

Closed
FedorSteeman opened this issue Feb 10, 2022 · 4 comments
@FedorSteeman

FedorSteeman commented Feb 10, 2022

It's crucial that the DarwinCore term otherCatalogNumbers (http://rs.tdwg.org/dwc/terms/otherCatalogNumbers) is made searchable.

Many institutions around the world are merging, as are their cataloging systems. Such mergers often involve adopting a new institution-wide catalog numbering system.

Legacy catalog numbers ("alternate catalog numbers") are typically mapped to this otherCatalogNumbers field, and many researchers, curators, and collection managers can attest that the "old" numbers are referred to in a large body of older literature.

Attempts at a free-text search on an alternate catalog number have turned out to be problematic due to the stemming algorithm used by your systems. For instance, a search on the old catalog number "ZMUC-R771281" in one of our datasets is fruitless, as you can see here:
https://www.gbif.org/occurrence/search?q=ZMUC-R771281&dataset_key=8c834f97-c5df-4280-9623-86594979f91a

In one concrete case, we are currently trying to persuade our own Botany department to migrate their data to Specify so this data can be shared via GBIF. They're currently limited to sharing data via JSTOR. It will be easier to convince them if we can show them that their legacy catalog numbers will remain useful to search on in this new data integration via GBIF.

@muttcg muttcg transferred this issue from gbif/portal-feedback Feb 10, 2022
@muttcg muttcg assigned muttcg and marcos-lg and unassigned muttcg Feb 10, 2022
@MortenHofft
Member

Multiple values
The documentation says values should be separated by a pipe |. So I suppose it would make sense if we interpreted it as such (an array of values) for search?
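
To make the "array of values" reading concrete, here is a minimal sketch of splitting a pipe-delimited otherCatalogNumbers string. The function name and the second sample value ("OLD-42") are made up for illustration, and trimming whitespace around each value is an assumption, not something confirmed in this thread.

```python
from typing import List, Optional


def split_other_catalog_numbers(raw: Optional[str]) -> List[str]:
    """Split a pipe-delimited otherCatalogNumbers string into individual values.

    Hypothetical helper: whitespace around each value is trimmed and
    empty segments are dropped (both are assumptions).
    """
    if not raw:
        return []
    return [part.strip() for part in raw.split("|") if part.strip()]


# "OLD-42" is an invented second value, used only to show the split.
values = split_other_catalog_numbers("ZMUC-R771281 | OLD-42")
print(values)
```

A record with a single alternate catalog number simply yields a one-element list, so single-valued publishers would be unaffected by this interpretation.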

Response format
But what about the response format then? Should it continue to be a single string or should it be an array of strings? Same question as in #662 (comment)

@FedorSteeman
Author

Multiple values
Although we will try to only have a single alternate catalog number served via this field, I will make sure that any multiple values will be separated by pipes.

Response format
Insofar as we share multiple values at all, which will be rare, an array of values would make the most sense as the response format.

@MortenHofft
Member

Regarding response format please see this comment #662 (comment)

In short: to avoid a breaking change, we should keep the response format a string, even if the field is searchable as individual values.
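
One way to picture this compromise is the sketch below: the stored field stays the original string, while a separate lookup is built from the individual pipe-separated values. This is an illustration under stated assumptions, not the actual implementation. The record IDs, the "OLD-42" value, and the function names are all hypothetical, and exact-match lookup is assumed in place of stemmed free-text search.

```python
from typing import Dict, List, Tuple

# Hypothetical records: record ID -> otherCatalogNumbers exactly as published.
RECORDS: Dict[str, str] = {
    "occ-1": "ZMUC-R771281 | OLD-42",
    "occ-2": "OLD-42",
}


def build_index(records: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each individual catalog number to the IDs of the records carrying it."""
    index: Dict[str, List[str]] = {}
    for record_id, raw in records.items():
        for value in raw.split("|"):
            value = value.strip()
            if value:
                index.setdefault(value, []).append(record_id)
    return index


def search(index: Dict[str, List[str]], value: str) -> List[Tuple[str, str]]:
    """Exact-match lookup; each hit returns the field as the original string."""
    return [(record_id, RECORDS[record_id]) for record_id in index.get(value, [])]


index = build_index(RECORDS)
for record_id, other_catalog_numbers in search(index, "ZMUC-R771281"):
    # The response value is still a single string, not an array.
    print(record_id, repr(other_catalog_numbers))
```

Existing clients that read the field as a string keep working, while a search on a single legacy number still finds records that carry several.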

marcos-lg added a commit that referenced this issue Feb 11, 2022
muttcg added a commit that referenced this issue Feb 14, 2022
…herCatalogNumbers_preparations

#662 #664 #665 #667 multivalue fields
@marcos-lg
Contributor

Deployed to PROD.
