Support multi-value search for existing fields #665
Comments
I believe it would be an appreciated feature if ordering could be retained (when serialized back into a string). Perhaps related: gbif/portal-feedback#3292 (comment), which expresses the desire to keep ordering in the multimedia array.
Deployed to PROD
Hi, I'm really wondering why datasetID and datasetName are multi-value? How can an occurrence be part of 2 datasets? Thanks
The request came from the GBIF node community, where (if I remember correctly) they are using datasetID and datasetName to encode the various projects that a record is associated with, and also when aggregating data from multiple origins into a single dataset for GBIF.
Thanks for the explanation! We need a multi-value projectID field! (Another similar topic concerns projectID in the metadata :-)
Yes. It is indeed projectIDs they're encoding, though, as you can see in this discussion. Projects aren't covered in DwC terms, which is why I think they used the dataset (I guess assuming this is the dataset created by a project), but we do have projects and programmes in the GBIF API. They are used for the projects and programmes the GBIF organisation itself runs, though (BID, BIFA etc.), so it might become overloaded for us to introduce that.
I think the issue is more deeply rooted in the data model structure: the simple Darwin Core Occurrence model denormalizes real-world objects, such as a collection specimen or a monitored organism, into the Occurrence view, where they are in practice identified only by the occurrenceID. That identifier ultimately represents a simple DwC data record and not the real-world entity of actual interest. The "data records" can thus take part in multiple "datasets". There is more than one way to organize data records into sets, including sets of data records (which represent denormalized real-world entities) for different real-world projects. The more technical dataset model used for publishing these data records into GBIF is not the main concern here, but rather how to group records belonging to different "projects". One important rationale is to group records produced or updated under different project funding, similar to how the GBIF BID, BIFA, and CESP projects list the datasets produced with their funding. However, we often see project funding for georeferencing or taxonomic validation, and a desire to "tag" the data records (or, ultimately, the actual real-world collection specimens) that were georeferenced under a specific project's funding, in order to credit the funder and track fulfillment of the promise made to the funder of, e.g., georeferencing 10 000 collection specimens...
The fields listed below are in ES and treated as single-value strings.
Without breaking any public APIs, we can provide better search by treating them as multi-value fields.
For example, consider a record arriving with recordedBy: Morten Hoefft | Tim Robertson. It is not possible today to search for Tim Robertson and discover this record, along with others having this value. See also #178.
Terms that currently support auto-suggest are noted, which may bring additional considerations.
This issue is intended to focus only on existing fields in ES, and those already being added in work in progress, and not to propose additional fields.
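The recordedBy example above can be sketched in a few lines of Python. This is only an illustration of the idea, not the GBIF pipeline or the ES mapping: it assumes values are separated by a vertical bar, and the function names (`split_multi_value`, `join_multi_value`, `find_records`) and the occurrenceID values are hypothetical. It splits the delimited string into an ordered list, matches a search term against any individual value, and joins the list back into a string in the original order.

```python
from typing import Dict, List

DELIMITER = "|"  # assumption: values arrive separated by a vertical bar


def split_multi_value(raw: str, delimiter: str = DELIMITER) -> List[str]:
    """Split a delimited string into trimmed, non-empty values, keeping order."""
    return [part.strip() for part in raw.split(delimiter) if part.strip()]


def join_multi_value(values: List[str], delimiter: str = DELIMITER) -> str:
    """Serialize values back into a single string, preserving their order."""
    return f" {delimiter} ".join(values)


def find_records(records: List[Dict[str, str]], field: str, value: str) -> List[Dict[str, str]]:
    """Return records where any individual value of `field` equals `value`."""
    return [r for r in records if value in split_multi_value(r.get(field, ""))]


# Hypothetical sample records; only the recordedBy value comes from the issue text.
records = [
    {"occurrenceID": "occ-1", "recordedBy": "Morten Hoefft | Tim Robertson"},
    {"occurrenceID": "occ-2", "recordedBy": "Tim Robertson"},
    {"occurrenceID": "occ-3", "recordedBy": "Morten Hoefft"},
]

matches = find_records(records, "recordedBy", "Tim Robertson")
print([r["occurrenceID"] for r in matches])  # ['occ-1', 'occ-2']
```

With single-value treatment, an exact match on Tim Robertson would find only occ-2; splitting first is what lets occ-1 be discovered as well. Keeping the values in a list rather than a set is what allows the original ordering to survive the round trip back to a string.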