Publishing a new version by a script (command line) #1480
Hello,
Would it be possible to add an option to create a new version after I overwrite the source files with a script, or simply because the database content has changed, perhaps via an API call using curl?
Of course, the HTTP response code would tell my script if something went wrong, and the details would be found in the validation log.
Cheers,
Menashè

Comments
Hello @meliezer, I totally agree with this feature. I list the resource IDs in a file (one ID per row), but I agree that a REST API would be better!
I'm not sure a full REST API for the IPT makes sense, as it would quickly become the GBIF registry, but having some management API for key features like the ones @meliezer and @sylmorin-gbif need would make the tool more useful for their workflows. I suggest we keep the API small initially so that it can be included in a 2.5.1 or 2.5.2 release.
Thank you @timrobertson100! Personally I would like to:
Perhaps this could also be possible?
That relates to #955.
@timrobertson100: did you mean it would be easier to publish datasets directly to the registry using its current API? That is, is there a way to avoid IPT hosting and keep our datasets and their metadata somewhere (e.g. a GitHub repository) that we can tell the GBIF registry to read from? Thanks
If you have a Darwin Core Archive (or just an EML file, for a metadata-only dataset) at a public URL, you can register it directly with GBIF. https://github.com/gbif/registry/tree/dev/registry-examples/src/test/scripts has an example using Bash, which is obviously not production-ready in any way! There are probably 20 or so publishers registering datasets this way. Dataset metadata is read from the EML file within the DwC-A. You need to keep track of which GBIF dataset UUID is assigned to each dataset you register, so you can update it later.
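For anyone who prefers Python over Bash, the flow in those example scripts boils down to two authenticated registry calls. Here is a minimal sketch using the requests library; the organization key, installation key, credentials and archive URL are all placeholders, the field set is an assumption based on the example scripts, and gbif-uat.org is the safer place to try it first:

```python
# Sketch only: create a dataset in the GBIF registry, then attach the
# DwC-A endpoint. All keys, credentials and URLs are placeholders.
import requests

API = "https://api.gbif-uat.org/v1"        # switch to api.gbif.org for production
AUTH = ("your-account", "your-password")   # placeholder credentials

# 1. Create the dataset and keep the UUID that GBIF assigns to it.
#    Minimal field set assumed here; check the registry docs for your case.
dataset = {
    "title": "My dataset",
    "type": "OCCURRENCE",
    "publishingOrganizationKey": "<organization-uuid>",
    "installationKey": "<installation-uuid>",
    "language": "eng",
    "license": "http://creativecommons.org/licenses/by/4.0/legalcode",
}
resp = requests.post(f"{API}/dataset", json=dataset, auth=AUTH)
resp.raise_for_status()
dataset_key = resp.json()  # the new dataset UUID; store it for later updates

# 2. Tell GBIF where the Darwin Core Archive lives.
endpoint = {"type": "DWC_ARCHIVE", "url": "https://example.org/my-dataset.zip"}
resp = requests.post(f"{API}/dataset/{dataset_key}/endpoint", json=endpoint, auth=AUTH)
resp.raise_for_status()
```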
Thanks a lot! Is it possible to somehow search for or identify those publishers?
To sum up, we should:
a) call the dataset-creation request (copied from the scripts @MattBlissett mentioned, thanks!) to create the dataset on GBIF.org and get the GBIF UUID of the dataset
b) call the endpoint-registration request to tell GBIF.org how to access the DwC-A archive on our web server
c) when we update the archive, is one call enough to trigger the update? Since the URL has not changed, we don't have to call the "endpoint" URL again, right? (A sketch follows below.)
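For point c), a hedged sketch of what the update step might look like, assuming the registry's crawl call (POST /dataset/{key}/crawl) is available to the account; the key and credentials are placeholders:

```python
# Sketch only: when the archive is overwritten in place, the registered
# endpoint does not change, so (as I understand it) asking the registry to
# re-crawl the dataset should be enough. Assumes POST /dataset/{key}/crawl.
import requests

API = "https://api.gbif-uat.org/v1"        # placeholder environment
AUTH = ("your-account", "your-password")   # placeholder credentials

def trigger_crawl(dataset_key):
    """Ask the GBIF registry to re-read the archive behind an existing dataset."""
    resp = requests.post(f"{API}/dataset/{dataset_key}/crawl", auth=AUTH)
    resp.raise_for_status()
```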
As written in @mike-podolskiy90's register.sh script.
If our archives are named with stable internal UUIDs, we can even just rely on the GBIF.org API: that's what I do with some scripts that compare what is on GBIF.org with what is on my IPT after an update.
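One possible shape for such a comparison script, using the registry's per-installation dataset listing (GET /installation/{key}/dataset); the installation key is a placeholder:

```python
# Sketch only: page through every dataset the registry knows for an
# installation, so the result can be diffed against the archives on
# your own server.
import requests

API = "https://api.gbif.org/v1"

def registered_datasets(installation_key):
    """Yield (dataset key, title) for all datasets of a GBIF installation."""
    offset = 0
    while True:
        page = requests.get(
            f"{API}/installation/{installation_key}/dataset",
            params={"limit": 100, "offset": offset},
        ).json()
        for d in page["results"]:
            yield d["key"], d["title"]
        if page["endOfRecords"]:
            break
        offset += 100
```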
@timrobertson100 you told me to do this two years ago... but I love the IPT too much to abandon it :-) I guess it's time for me to migrate to this solution: having 10K datasets on my IPT is becoming difficult to handle. I'm just wondering whether the migration will be easy:
- to update the installationKey
- to update the endpoint URL of the archive
Can we update the installationKey of an existing dataset? What about the user to use for this call? Currently I don't care about this, since it's the IPT that makes the registry calls.
If there are no modifications to make, you can simply call the same request (with authentication). To migrate, you would only need to include the changed field (I think; I haven't done this for a while). Updates cause a crawl after 1 minute, in case there are more updates. You should write to helpdesk@gbif.org to get authorization to make these requests. It's usually best to create a new institutional account on gbif.org for this. Create one on gbif-uat.org too, so you can test everything there first.
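A conservative sketch of that migration step: rather than guessing which fields a partial update accepts, fetch the current dataset record, change the installationKey, and PUT the whole object back. Keys and credentials are placeholders; as advised above, try it on gbif-uat.org first:

```python
# Sketch only: point an existing GBIF dataset at a different installation
# via a fetch-modify-PUT round trip.
import requests

API = "https://api.gbif-uat.org/v1"        # placeholder environment
AUTH = ("your-account", "your-password")   # placeholder credentials

def migrate_dataset(dataset_key, new_installation_key):
    """Update the installationKey of an already-registered dataset."""
    dataset = requests.get(f"{API}/dataset/{dataset_key}").json()
    dataset["installationKey"] = new_installation_key
    # Note: the registry may reject or ignore server-managed fields
    # (created, modified, ...); strip them if the PUT is refused.
    resp = requests.put(f"{API}/dataset/{dataset_key}", json=dataset, auth=AUTH)
    resp.raise_for_status()
```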
Thank you @MattBlissett.
Wow... tons of information today.
Hi @MattBlissett,
Here is the result:
So I added them:
Here is the new result:
I don't think having to add the "created" or "modified" dates is normal...
And the result is... 400 BAD REQUEST. Any idea?
@sylvain-morin did you finally end up with a solution you can share?
I did a very simple Python app server to handle our needs at GBIF France. In short:
the POST endpoint will:
It's really basic, but it has been handling 15,000+ datasets for a year (https://www.gbif.org/installation/e44d0fd7-0edf-477f-aa82-50a81836ab46). Our goal was to have a simple tool to handle GBIF publication and updates at the end of our dataset pipeline.
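The app itself isn't shown in the thread, so the following is not GBIF France's actual code, just a rough Flask sketch of the shape such a service can take (route names and storage path are hypothetical):

```python
# Sketch only: accept a DwC-A by POST, store it under a stable name, serve it
# back by GET, and leave the GBIF registration/update step (as in the
# sketches above) to run after storage.
import pathlib
from flask import Flask, request, send_from_directory, abort

app = Flask(__name__)
STORAGE = pathlib.Path("/data/dwca")  # placeholder storage location

@app.post("/dataset/<dataset_id>")
def publish(dataset_id):
    """Store (or overwrite) the uploaded Darwin Core Archive for this dataset."""
    STORAGE.mkdir(parents=True, exist_ok=True)
    request.files["archive"].save(str(STORAGE / f"{dataset_id}.zip"))
    # Here the real service would create or update the GBIF registration.
    return {"status": "stored", "dataset": dataset_id}, 201

@app.get("/dataset/<dataset_id>")
def serve(dataset_id):
    """Serve the archive at the URL registered as the dataset's endpoint."""
    if not (STORAGE / f"{dataset_id}.zip").exists():
        abort(404)
    return send_from_directory(str(STORAGE), f"{dataset_id}.zip")
```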
Oh great! Thanks a lot for the summarized explanations. I understand APT basically replicates IPT behaviour, but in a way that you must create the DwC-A files yourself beforehand, and then use APT to both serve and register them (and their updates). This is great, but I am mostly interested in "serving" datasets from a different place (e.g. an institutional repository, or Zenodo), while using Python/APT only to register them:

```python
def dataset_url(id):
    # your current code:
    # return CONFIG.APT_PUBLIC_URL + "/dataset/" + id
    # use my own function to get dataset URLs from wherever I store them (e.g. database, Excel, ...):
    return get_remote_dataset_url(id)
```

In such a scenario, would it be possible to use APT from a local machine (not accessible to the GBIF registry), so that I only run it when publishing, while Zenodo or my institution's repository takes care of keeping the DwC data source accessible online 24x7? I suppose the main concern would be checking for a valid DwC-A file structure before uploading it to a public URL and registering it, but perhaps python-dwca-reader might do the trick. Of course, creating valid DwC-A files on your own might not be trivial (especially the metadata part)... but that is a different question.
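As a rough idea of that pre-publication check, here is a sketch with python-dwca-reader; the exact attribute names (descriptor.core.type, metadata) are from memory and worth verifying against the library's documentation:

```python
# Sketch only: sanity-check a Darwin Core Archive before uploading it
# anywhere. Opening the archive already verifies that meta.xml and the
# referenced data files are present and parseable.
from dwca.read import DwCAReader

def looks_like_valid_dwca(path):
    """Return True if the archive opens, has metadata, and yields a core row."""
    try:
        with DwCAReader(path) as dwca:
            core_type = dwca.descriptor.core.type  # core row type (assumed attribute)
            has_metadata = dwca.metadata is not None
            first_row = next(iter(dwca), None)     # confirm the core data file parses
            return bool(core_type) and has_metadata and first_row is not None
    except Exception:
        return False
```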
I think I missed the "HTTP installation" role of APT in my previous message. I guess both APT and the IPT are expected to be accessible online, so they constitute a kind of index page for the datasets they serve. In other words, can we just store (and keep updated) both the "installation" and its datasets on a static website?