BigBatch processing: issues to cover #73
A few more points:
One more point that just came up in our discussion here in Oslo
no, we should make them
clean up
find a solution to normalize types; if we do the attributes, we should start using a reference vocabulary there
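The point above can be sketched as a simple lookup-based normalization. This is a hypothetical illustration only: the vocabulary entries, function name, and attribute values are assumptions, not anything defined in this thread.

```python
# Hypothetical reference vocabulary mapping raw attribute values to a
# single canonical form; the entries here are illustrative assumptions.
REFERENCE_VOCABULARY = {
    "subspecies": "subSpecies",
    "sub-species": "subSpecies",
    "var": "variety",
    "var.": "variety",
}

def normalize_value(raw: str) -> str:
    """Return the canonical form of an attribute value, or the
    cleaned-up input if it is not in the vocabulary."""
    key = raw.strip().lower()
    return REFERENCE_VOCABULARY.get(key, key)

print(normalize_value("Var."))        # variety
print(normalize_value("subspecies"))  # subSpecies
```

The design choice here is that anything outside the vocabulary passes through unchanged, so a batch run can log unmapped values for later addition rather than failing on them.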
@myrmoteras that's a lot harder to filter by, and far more prone to incurring incorrect cross-article matches ...
That's what I had understood as well...
@flsimoes @gsautter can you point out a couple of examples, to make sure we are talking about the same thing? In my view, in the taxonomicName attributes, Corvus sp. 1 should be attributed
In taxonomy, this also allows differentiating between sp. 1 and sp. 2 in a genus in the same publication.
Taxonomically speaking, I fully agree with your rationale. In this issue we discussed the same thing.
Well, while I do see the citation part, the implied connection between "Corvus sp. 1 Smith 1900" and "Corvus sp. 1 Jones 1986" just doesn't exist, generating a pattern of homonyms that is way harder to filter downstream than a well-defined single placeholder "undefined" ...
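The filtering argument can be made concrete with a tiny sketch. This is a hypothetical illustration, assuming the name strings below; none of the data is from the actual corpus.

```python
import re

# Illustrative name strings; assumed for this example only.
names = ["Corvus undefined", "Corvus sp. 1", "Corvus sp. 2", "Corvus corax"]

# With a single well-defined placeholder, one equality-style test
# finds every unidentified taxon:
placeholders = [n for n in names if n.endswith(" undefined")]

# With numbered placeholders, a pattern match is needed instead, and
# "Corvus sp. 1" from two different articles would wrongly compare
# as the same name (the homonym problem described above):
numbered = [n for n in names if re.search(r"\bsp\. \d+$", n)]

print(placeholders)  # ['Corvus undefined']
print(numbered)      # ['Corvus sp. 1', 'Corvus sp. 2']
```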
Another point: remove URL prefix from DOIs to simplify comparison. |
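The DOI-prefix point above could be handled with a small normalizer along these lines. A minimal sketch, assuming the function name, the prefix list, and the sample DOI; the thread itself does not specify any of these.

```python
# Common resolver prefixes a DOI string may carry; this list is an
# assumption for illustration, not an exhaustive specification.
DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)

def normalize_doi(doi: str) -> str:
    """Strip a leading resolver URL or 'doi:' scheme and lowercase
    the remainder, so DOIs compare equal regardless of how they
    were entered."""
    doi = doi.strip()
    for prefix in DOI_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    # DOIs are case-insensitive, so lowercase for comparison.
    return doi.lower()

print(normalize_doi("https://doi.org/10.5281/zenodo.12345"))  # 10.5281/zenodo.12345
print(normalize_doi("10.5281/ZENODO.12345"))                  # 10.5281/zenodo.12345
```

Comparing the normalized forms then makes the two spellings above match, which is the simplification the comment asks for.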
Another point:
Yet another point:
@flsimoes @tcatapano please check whether we can go ahead to run the big batch. Please check by end of March 2023.
@myrmoteras: it's not clear to me what the task is. I haven't been involved in this issue, so I don't have any context.
@tcatapano the whole effort is basically a full-corpus cleanup operation to get rid of erroneous annotation types, normalize the values of certain attributes, run now-standard linking jobs and QC on the older portion of our data, etc. The sheer number of documents to process justifies more thorough planning than a 200-document job would require. Meaning to say: if you can think of anything we kind of did wrong in the past, but never got around to cleaning up the existing mess after resolving the issue for prospective documents, cleaning up the earlier documents is something to add to this list ...
Yes, no further comments on my part
I don't think it is worth delaying anything. Go ahead.
All implemented now, just need to (a) figure out how to bundle up all the gizmos and (b) run a few tests afterwards. |
Thanks Guido. Good luck and tell us when you need us to check output
Donat
Intermediate result after some 60 hours: about 12K IMFs processed, another 7K in the feed hopper, basically all IMFs uploaded before Jan 01, 2019 (the first half of 2016 alone is 14K IMFs). Looking good so far; the impact on day-to-day operations is minimal (according to POA), only the export queues are (expectedly) quite full, and there is the current issue with Zenodo, which is why the updates are currently held back from going there (to be addressed at the sprint).
Batch processing of all the treatments should include the following issues: