
Support multi-value search for existing fields #665

Closed · timrobertson100 opened this issue Feb 10, 2022 · 7 comments

@timrobertson100 (Member) commented Feb 10, 2022

The fields listed below are in ES and treated as single-value strings.
Without breaking any public APIs, we can provide better search by treating them as multi-value fields.

For example, consider a record arriving with recordedBy: Morten Hoefft | Tim Robertson.
It is not possible today to search for Tim Robertson and discover this record, along with others having this value. See also #178.
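
For illustration only, here is a minimal sketch (not the actual pipelines code) of how a pipe-delimited verbatim value could be parsed into an ordered multi-value list before indexing; the MultiValueSplitter class name and the delimiter handling are assumptions:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not the actual pipelines implementation.
public class MultiValueSplitter {

  /**
   * Splits a pipe-delimited Darwin Core value, e.g.
   * "Morten Hoefft | Tim Robertson" -> ["Morten Hoefft", "Tim Robertson"],
   * preserving the original order of the values.
   */
  public static List<String> split(String rawValue) {
    if (rawValue == null || rawValue.trim().isEmpty()) {
      return Collections.emptyList();
    }
    return Arrays.stream(rawValue.split("\\|"))
        .map(String::trim)
        .filter(s -> !s.isEmpty())
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    System.out.println(split("Morten Hoefft | Tim Robertson"));
    // prints: [Morten Hoefft, Tim Robertson]
  }
}
```

Indexing each element as a separate value (ES fields accept arrays natively) would then make an exact match on "Tim Robertson" possible without changing the verbatim value.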

Terms that currently support auto-suggest are noted, which may bring additional considerations.

| Term                | Auto-suggest      |
| ------------------- | ----------------- |
| datasetID           |                   |
| datasetName         | yes (to be added) |
| otherCatalogNumbers | yes (to be added) |
| typeStatus          |                   |
| recordedBy          | yes               |
| identifiedBy        | yes               |
| preparations        |                   |
| samplingProtocol    | yes               |

This issue is intended to focus only on fields that already exist in ES, or are already being added in work in progress, and not to propose additional fields.

@MortenHofft (Member)

I believe it would be an appreciated feature if ordering could be retained (when the values are serialized back into a string).

Perhaps related: gbif/portal-feedback#3292 (comment), about the desire to keep ordering in the multimedia array.
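
As a sketch of how ordering could be retained, assuming the values are kept in a List (not a Set) so insertion order survives, serializing back to the original string is a simple join (hypothetical helper, not the actual implementation):

```java
import java.util.List;

// Hypothetical helper, not the actual implementation.
public class MultiValueJoiner {

  /** Joins the ordered values back into the verbatim pipe-delimited form. */
  public static String join(List<String> values) {
    return String.join(" | ", values);
  }

  public static void main(String[] args) {
    List<String> recordedBy = List.of("Morten Hoefft", "Tim Robertson");
    System.out.println(join(recordedBy)); // prints: Morten Hoefft | Tim Robertson
  }
}
```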

muttcg added a commit that referenced this issue Feb 14, 2022
…herCatalogNumbers_preparations

#662 #664 #665 #667 multivalue fields
marcos-lg added a commit that referenced this issue Feb 16, 2022
* adapted ALA pipelines to new fields added to basic record

* run ALA ITs

* Turn on ITs

* Revert "Turn on ITs"

This reverts commit acc264c.

* turn off tests ALA

Co-authored-by: Nikolay Volik <nvolik@gbif.org>
@marcos-lg (Contributor)

Deployed to PROD

@sylvain-morin (Contributor)

Hi,

I'm really wondering why datasetID and datasetName are multi-value. (I just noticed this change since I'm migrating to the new ALA pipelines.)

How can an occurrence be part of two datasets?

Thanks

@timrobertson100 (Member, Author)

> I'm really wondering why datasetID and datasetName are multi-value?

The request came from the GBIF node community, where (if I remember correctly) they use datasetID and datasetName to encode the various projects a record is associated with, and also when aggregating data with multiple origins into a single dataset for GBIF.

@sylvain-morin (Contributor)

Thanks for the explanation!

We need a multi-value projectID field! (There is another, similar topic about projectID in the metadata. :-)

@timrobertson100 (Member, Author)

Yes.

It is indeed projectIDs they're encoding, as you can see in this discussion.

Projects aren't covered by DwC terms, which is why I think they used the dataset terms (I guess assuming the dataset is the one created by a project), but we do have projects and programmes in the GBIF API. Those are used for the projects and programmes the GBIF organisation itself runs (BID, BIFA, etc.), though, so it might become overloaded for us to introduce that.

@dagendresen

I think the issue is more deeply rooted in the data model structure: the simple Darwin Core Occurrence model denormalizes real-world objects, such as a collection specimen or a monitored organism, into the Occurrence view, where in practice they are identified only by the occurrenceID, which ultimately represents a simple DwC data record and not the real-world entities of actual interest.

The "data records" can thus take part in multiple "datasets". There is more than one way to organize the data records into sets, including sets of data records (which represent denormalized real-world entities) for different real-world projects. The more technical dataset model used for publishing these data records to GBIF is not the main concern here, but rather how to group records belonging to different "projects".

One important rationale is to group records produced or updated with different project funding, similar to how the GBIF BID, BIFA, and CESP projects list the datasets produced with that funding. However, we often see project funding for georeferencing or taxonomic validation, together with a desire to "tag" the data records (or, ultimately, the actual real-world collection specimens) that were georeferenced under a specific project funding, in order to credit the funder and track fulfilment of the promise made to the funder, e.g. georeferencing 10,000 collection specimens.
