Orphan datasets from Belgium #4

Open
DimEvil opened this issue Jun 19, 2017 · 8 comments

@DimEvil

DimEvil commented Jun 19, 2017

@kbraak

InboVeg - NICHE-Vlaanderen groundwater related vegetation relevés for Flanders, Belgium 3d1231e8-2554-45e6-b354-e590c56ce9a8
Zomerganzen - Summering geese management and population counts in Flanders, Belgium 2b2bf993-fc91-4d29-ae0b-9940b97e3232

identified as orphan datasets?
http://data.inbo.be/ipt/resource?r=zomerganzen-events
http://data.inbo.be/ipt/resource?r=inboveg-niche-vlaanderen-events

@kbraak
Contributor

kbraak commented Jun 19, 2017

Thank you @DimEvil, these are two false positives caused by an unexplained error in GBIF's crawling service: GBIF hasn't tried to recrawl these two datasets since October 2016. While we investigate this error, you can safely ignore these datasets and continue to review the remainder of potential orphans in Belgium's list. Thanks

@kbraak kbraak self-assigned this Jun 19, 2017
@kbraak kbraak added the bug label Jun 19, 2017
@DimEvil
Author

DimEvil commented Jun 19, 2017

@kbraak
This dataset can also be removed from the orphan list: Waterbirds of the Botanic Garden Meise c0cc29de-f49f-4b66-b4ec-c83afbb7101d. Until Meise succeeds in installing the newest version of the IPT, they switch their IPT on and off whenever they want to publish (because of the security breach some months ago).

@kbraak
Contributor

kbraak commented Jun 19, 2017

Thanks @DimEvil, I can see the Botanical Garden Meise IPT hosts 3 datasets. They need to remain permanently accessible online. Perhaps they would be interested in moving their datasets to a trusted data hosting centre in Belgium?

@DimEvil
Author

DimEvil commented Jun 19, 2017

Hi @kbraak, I will check this with Meise. They should normally have an IPT permanently online.
They turn the IPT on and off because of the earlier security issue and a lack of time to update the IPT.

@kbraak
Contributor

kbraak commented Jul 14, 2017

Thank you @niconoe for the following analysis:

| Dataset | Status |
| --- | --- |
| 3d1231e8-2554-45e6-b354-e590c56ce9a8 | Appears to be back online (endpoint works as of July 14th). |
| 2b2bf993-fc91-4d29-ae0b-9940b97e3232 | Appears to be back online (endpoint works as of July 14th). |
| c0cc29de-f49f-4b66-b4ec-c83afbb7101d | Temporary technical issue: the provider needs to update the IPT and Java, and update the URL; then it will be put online again. |
| f58465c4-27ff-11e2-85e3-00145eb45e9a | False positive: was republished in 2012 on an IPT: http://www.gbif.org/dataset/b76c1a65-b912-4ca6-be7e-50eb365f4a32 |
| f5499142-27ff-11e2-85e3-00145eb45e9a | False positive: was republished in 2012 on an IPT: http://www.gbif.org/dataset/5bba3c0c-4cfe-4e9c-a744-25eeb5adf2fe |
| 860fc602-f762-11e1-a439-00145eb45e9a | According to the dataset page, there never seems to have been any data or (meaningful) metadata in this dataset. It looks like something that was published by mistake. I still contacted the publisher, who is on holiday for now. |
| 85e8c69c-f762-11e1-a439-00145eb45e9a | Weird: the Belgian orphan list shows no URL endpoint for this dataset, while by browsing GBIF pages I can find the following BioCASE installation that seems, at first look, to accept requests. Bug in the orphan search code? |
| 82f258ae-f762-11e1-a439-00145eb45e9a | Metafro-infosys-prelude: seems to have disappeared, but it was on our DIGIR provider, so we should have moved it to the IPT. To be investigated; in any case we can adopt it (again)! |
| 82f603dc-f762-11e1-a439-00145eb45e9a | We were previously hosting datasets on behalf of this institution (BCCM), and they initially planned to take over their responsibilities and host the datasets themselves. As far as I know that never happened, and our privileged contact doesn't work there anymore. We therefore need to contact them and ask whether they still want to provide data and host it themselves. If so, it may also be good to suggest updating the (very old) data. @andrejjh, do you agree with this approach? |
| 82f73af4-f762-11e1-a439-00145eb45e9a | We were previously hosting datasets on behalf of this institution (BCCM), and they initially planned to take over their responsibilities and host the datasets themselves. As far as I know that never happened, and our privileged contact doesn't work there anymore. We therefore need to contact them and ask whether they still want to provide data and host it themselves. If so, it may also be good to suggest updating the (very old) data. @andrejjh, do you agree with this approach? |

Please confirm:

By the way, it turns out GBIF has never crawled 85e8c69c-f762-11e1-a439-00145eb45e9a. I have just triggered a crawl manually.. let's see what happens ;)
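
For anyone who wants to repeat the kind of endpoint check reported in the status list above, here is a minimal sketch in Python. It is only an illustration, not the code behind the orphan list. It assumes each registry record is a dict with an `endpoints` list whose entries carry a `url` field (my understanding of what the GBIF Registry API returns, but please treat the field names as assumptions). The names `classify_endpoints` and `http_probe` are hypothetical, and the probe is passed in as a parameter so the classification can be tried offline.

```python
import urllib.error
import urllib.request
from typing import Callable, Mapping


def http_probe(url: str, timeout: float = 10.0) -> bool:
    """Return True if the URL answers without an HTTP or network error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except (urllib.error.URLError, OSError, ValueError):
        # URLError covers HTTP 4xx/5xx and DNS failures; OSError covers
        # timeouts; ValueError covers malformed URLs.
        return False


def classify_endpoints(dataset: Mapping, probe: Callable[[str], bool]) -> str:
    """Classify one registry record as 'no-endpoint', 'online' or 'offline'.

    Assumption: `dataset["endpoints"]` is a list of dicts with a "url" key.
    """
    urls = [e.get("url") for e in dataset.get("endpoints", []) if e.get("url")]
    if not urls:
        # Matches the case where the orphan list shows no URL endpoint at all.
        return "no-endpoint"
    return "online" if any(probe(u) for u in urls) else "offline"
```

Calling `classify_endpoints(record, http_probe)` would do a live check. A dataset that comes back as "online" or "no-endpoint" is a candidate false positive rather than a confirmed orphan, so it still needs a human look.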

@niconoe

niconoe commented Jul 18, 2017

  • f58465c4-27ff-11e2-85e3-00145eb45e9a should be deleted, because it is a duplicate of b76c1a65-b912-4ca6-be7e-50eb365f4a32: Correct
  • f5499142-27ff-11e2-85e3-00145eb45e9a should be deleted, because it is a duplicate of 5bba3c0c-4cfe-4e9c-a744-25eeb5adf2fe: Correct
  • 860fc602-f762-11e1-a439-00145eb45e9a will potentially be deleted - pending answer from publisher. Very high probability it's indeed to be deleted, let's just wait a bit more to be 100% certain

Also:

  • @andrejjh will very soon republish 82f603dc-f762-11e1-a439-00145eb45e9a and 82f73af4-f762-11e1-a439-00145eb45e9a (plus a new dataset from the same provider) on our IPT.
  • 82f258ae-f762-11e1-a439-00145eb45e9a should be investigated a bit more on our side.

@niconoe

niconoe commented Jul 18, 2017

We just had confirmation that 82f258ae-f762-11e1-a439-00145eb45e9a should also be deleted: it has been replaced by 49c5b4ac-e3bf-401b-94b1-c94a2ad5c8d6 (as a checklist, which was a better match for the data).

So IMHO, the remaining tasks are:

  • republish BCCM datasets: 82f603dc-f762-11e1-a439-00145eb45e9a, 82f73af4-f762-11e1-a439-00145eb45e9a (@andrejjh)
  • If no news in a couple of weeks, delete 860fc602-f762-11e1-a439-00145eb45e9a (@kbraak)

Does that sound good to everyone?

@kbraak
Contributor

kbraak commented Aug 17, 2017

Thanks @andrejjh and @niconoe for your follow ups.

I confirm that the following datasets have been flagged as deleted in the GBIF Registry:

  • f58465c4-27ff-11e2-85e3-00145eb45e9a - Royal Museum of Central Africa - Albertian Rift Birds (ENBI wp13)
  • f5499142-27ff-11e2-85e3-00145eb45e9a - Royal Museum of Central Africa - Albertian Rift Butterflies (ENBI wp13)
  • 82f603dc-f762-11e1-a439-00145eb45e9a - BCCM/IHEM - Biomedical Fungi and Yeasts Collection
  • 82f4c440-f762-11e1-a439-00145eb45e9a - BCCM/LMG - Laboratory of Microbiology Gent Bacteria Collection
  • 82f73af4-f762-11e1-a439-00145eb45e9a - BCCM/MUCL - (Agro)Industrial Fungi and Yeasts Collection
  • 82f258ae-f762-11e1-a439-00145eb45e9a - Royal Museum of Central Africa - Metafro-Infosys - Prelude
  • 860fc602-f762-11e1-a439-00145eb45e9a - Generic Taxonomic Database System on Mysida and Nematoda
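
In case it helps with double-checking the list above, here is a small sketch in Python. It assumes that a record fetched from the registry carries a `deleted` timestamp once it has been flagged, and that records live under `https://api.gbif.org/v1/dataset/`. Both are my assumptions about the API and not something stated in this thread, and the helper names `registry_url` and `not_yet_deleted` are hypothetical.

```python
import uuid
from typing import Iterable, List, Mapping

# Assumed base URL of the registry's dataset API.
API_BASE = "https://api.gbif.org/v1/dataset/"


def registry_url(key: str) -> str:
    """Build the record URL for a dataset key, rejecting malformed UUIDs."""
    # uuid.UUID raises ValueError on a bad key and normalises to lowercase.
    return API_BASE + str(uuid.UUID(key))


def not_yet_deleted(records: Iterable[Mapping]) -> List[str]:
    """Return the keys of records that carry no 'deleted' value.

    Assumption: a flagged record has a non-empty "deleted" field.
    """
    return [r["key"] for r in records if not r.get("deleted")]
```

Fetching each URL from `registry_url` and passing the parsed JSON records to `not_yet_deleted` should give an empty list if every dataset in the list has indeed been flagged.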

@kbraak kbraak added this to the 2018 milestone Oct 27, 2017

3 participants