Add search support for datasetName and datasetID #662

Closed
MortenHofft opened this issue Feb 8, 2022 · 14 comments
@MortenHofft
Member

MortenHofft commented Feb 8, 2022

Prompted by gbif/portal-feedback#3006 and gbif/portal-feedback#3026

The desire in the above issues is to have a way to search for occurrences within or across datasets using datasetName https://dwc.tdwg.org/terms/#dwc:datasetName and datasetID https://dwc.tdwg.org/terms/#dwc:datasetID.

And to allow multiple values when the publisher delimits them by |.

So if a publisher provides an occurrence with datasetName: projectA|projectB, then it should be possible in the API to find that occurrence by adding a filter like datasetName=projectA (a keyword filter in ES terminology, I guess?).

I do not believe we need a suggest endpoint for this.
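
A minimal sketch of how a client might build such a request, assuming the filter is exposed as a `datasetName` query parameter on the occurrence search endpoint. The helper name `build_search_url` is hypothetical and only illustrates the URL shape; it does not call the API.

```python
from urllib.parse import quote, urlencode

# Occurrence search endpoint, as used in the URLs quoted in this thread.
SEARCH_URL = "https://api.gbif.org/v1/occurrence/search"


def build_search_url(dataset_name, limit=20):
    """Build a search URL that filters on a single datasetName value.

    `build_search_url` is a hypothetical helper, not part of any GBIF client.
    """
    params = [("datasetName", dataset_name), ("limit", limit)]
    # quote_via=quote encodes spaces as %20 rather than '+'.
    return SEARCH_URL + "?" + urlencode(params, quote_via=quote)


print(build_search_url("projectA"))
```

Values with spaces or non-ASCII characters are percent-encoded by `urlencode`, so the caller can pass the name exactly as the publisher wrote it.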

@rondlg

rondlg commented Feb 9, 2022

Marvelous. Thank you.

@MortenHofft
Member Author

I just realised that I haven't addressed these questions:

Current data that use pipes in a different way
The documentation currently does not say anything about using a pipe separator to split multiple values for datasetName.
So we have data that uses |, but where it probably isn't intended as two separate names but as a hierarchy. E.g. Natural History Museum Denmark | Herpetology Collection
https://api.gbif.org/v1/occurrence/search?occurrenceID=5d867879-6e7d-4463-8773-e005a6a9731d

Would it make sense (and be feasible) to evaluate how much data would be misinterpreted this way and try to get it changed at source?

Current response is a string
This issue suggests that we interpret this as an array of keywords for search. Should the response format stay the same as now (a string), or should it be changed to an array of strings? And where do we draw the line for API stability?

@rondlg

rondlg commented Feb 10, 2022

If it's not a heavy lift for you to check for the delimiter, then I'd say go for it. That said, certainly for us, using a pipe to delimit each value would be the preferred way to handle it. We use that for other dwc fields, so it isn't a terrible ask for us to fix at source (if we haven't already)!

Oh, and yes an array of strings please if possible.

@MortenHofft
Member Author

MortenHofft commented Feb 10, 2022

It would be considered a breaking change if we changed the response format from datasetName: "projectA | projectB" to datasetName: ["projectA", "projectB"]. That is, projects that rely on the current format would be likely to fail.

So if we were to return an array in the response, then we would need a new field. Instead we will go for keeping the response format as a string. But it will be indexed as an array and hence be searchable as such. Meaning that you will be able to search for datasetName: projectA and get the occurrence back. Consumers that need the atomised values for display will need to split on | themselves.
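
For consumers that need the individual values, the client-side split could look like the sketch below. The function name `split_dataset_names` is hypothetical, and trimming whitespace around each part is an assumption made here so that both `projectA|projectB` and `projectA | projectB` yield the same list; the thread does not say how the index treats surrounding spaces.

```python
def split_dataset_names(raw):
    """Split a pipe-delimited datasetName string into its individual values.

    Hypothetical helper: whitespace around each value is trimmed and empty
    parts are dropped, which is an assumption rather than documented behaviour.
    """
    if not raw:
        return []
    return [part.strip() for part in raw.split("|") if part.strip()]


# The response still carries a single string; the client atomises it.
record = {"datasetName": "projectA | projectB"}
print(split_dataset_names(record["datasetName"]))
```

A record with no datasetName simply yields an empty list, so display code does not need a separate branch for missing values.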

@rukayaj

rukayaj commented Apr 11, 2022

Anything I can do to help with this?

@MortenHofft
Member Author

It is in UAT https://api.gbif-uat.org/v1/occurrence/search?datasetName=ebcc%20atlas%20of%20european%20breeding%20birds, but it is part of a larger feature release that influences all UIs, and I've been slow at adapting to all the small changes.

@marcos-lg
Contributor

Deployed to PROD.

@rukayaj

rukayaj commented Jun 15, 2022

Thanks for this! I was just wondering if it will get added to the front end gui search options https://www.gbif.org/occurrence/search any time soon?

(Edit: We have someone who wants a DOI for this kind of search)

@MortenHofft
Member Author

MortenHofft commented Jun 16, 2022

It wasn't the plan to add those to the UI (gbif.org). They seem to be very internal and mainly intended for publisher API users. At least that is how we have discussed it so far. They seem to be more confusing than useful for any users but the publisher.
What is the use case please?

@rukayaj

rukayaj commented Jun 16, 2022

We have a user who wants to be able to download all occurrences with a particular datasetName, and then have a DOI to cite in a paper. The records are a small subset of records within this huge dataset https://www.gbif.org/dataset/b124e1e0-4755-430f-9eab-894f25a9b59c, datasetName = NØF-vannfugltellinger

Works fine on the API: https://api.gbif.org/v1/occurrence/search?datasetname=NØF-vannfugltellinger

@rondlg

rondlg commented Jun 16, 2022

Hi, we also have users who want to be able to search on that term. In fact, their group was one of the primary reasons why we sent in the request. That said, the API access is very welcome.

@MortenHofft
Member Author

It should work fine with the download API as well and give you a DOI.
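
A rough sketch of what a download request body for this case might look like. It assumes the occurrence download API accepts an `equals` predicate on this term and that the predicate key is spelled `DATASET_NAME`; neither is confirmed in this thread, so check the download API documentation before relying on it. The helper name, user name and e-mail address are placeholders.

```python
import json


def build_download_request(creator, email, dataset_name):
    """Build a JSON-serialisable body for an occurrence download request.

    Hypothetical helper. The predicate key "DATASET_NAME" is an assumption
    and is not confirmed anywhere in this thread.
    """
    return {
        "creator": creator,
        "notificationAddresses": [email],
        "sendNotification": True,
        "format": "SIMPLE_CSV",
        "predicate": {
            "type": "equals",
            "key": "DATASET_NAME",  # assumed key name
            "value": dataset_name,
        },
    }


# Placeholder credentials; the datasetName value is the one from the use case above.
body = build_download_request("userName", "user@example.org", "NØF-vannfugltellinger")
print(json.dumps(body, ensure_ascii=False, indent=2))
```

The body would be sent as an authenticated POST to the download request endpoint; that step is left out here so the sketch runs without network access or an account.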

@rukayaj

rukayaj commented Mar 6, 2024

I've had another request for the GBIF UI to allow searching by datasetName. Use case:

Colleagues have received funding from a particular funding agency and have published samples associated with these projects in our Collections Management System which are grouped using datasetName. It would then be useful to be able to search out the relevant data in GBIF.

The same data field in the CMS is also used to identify other datasets, e.g. all accessions used for the Arctic collections paper we published last year (which I guess was when we started talking about datasetName).

While it’s great that it’s available via the API, this is not a solution that is easily accessible to many users, and hence it would be very nice to be able to search for it in the standard GBIF UI. In other words, our users are not comfortable using the API for this kind of search and would prefer to have it in the UI with a download option.

@MortenHofft
Member Author

MortenHofft commented Mar 6, 2024

Please see gbif/portal16#1912
