
Can we temporarily suspend a recipe and make it remove (or hide) the corresponding ZIM files? #524

Closed
Popolechien opened this issue Aug 31, 2020 · 6 comments

Comments

@Popolechien
Contributor

As described in #1230, the Wikibooks, Wikiversity, Wikisource, Wikiquote, and Wikinews recipes all fail to scrape properly and return semi-empty ZIM files. This is not visible in the download window, so people end up downloading useless resources (and then sending half-angry messages).
Until the problem is fixed, would it be possible to either:

  • suspend the corresponding recipes and automatically delete the last available version (so that people stop downloading faulty material and no new ZIM is generated until the problem is fixed), or
  • move the destination folder to /hidden/dev, which in turn would move pre-existing ZIM files to the same folder?
@rgaudin
Member

rgaudin commented Aug 31, 2020

Any recipe can be manually disabled and/or have its warehouse path moved to .dev. If this concerns a large number of recipes, I can script this change.
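
A minimal sketch of what such a script could look like, assuming the Zimfarm exposes a REST API with a PATCH endpoint per schedule (the base URL, endpoint path, payload fields, and auth scheme below are all assumptions, not the documented API):

```python
# Hypothetical mass-disable script. The API base URL, endpoint path,
# payload fields, and auth scheme are assumptions that would need to be
# checked against the real Zimfarm API before use.
import requests

API = "https://api.farm.openzim.org/v1"  # assumed base URL
TOKEN = "..."  # admin access token
RECIPES = ["wikibooks_en_all", "wikiversity_en_all"]  # recipes to suspend

headers = {"Authorization": f"Token {TOKEN}"}
for name in RECIPES:
    resp = requests.patch(
        f"{API}/schedules/{name}",
        headers=headers,
        json={
            "enabled": False,                  # stop new runs of the recipe
            "warehouse_path": "/.hidden/dev",  # hide future ZIMs from the main tree
        },
    )
    resp.raise_for_status()
    print(f"{name}: suspended, warehouse path moved")
```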

The Zimfarm is not able to delete ZIM files and won't be; this separation is a safety net.
Faulty ZIM files should not pass zimcheck and thus should not appear online. I don't remember whether that check is currently disabled, though.
From what I understand, these are not faulty ZIM files (as far as the ZIM format is concerned); a scraper bug creates a ZIM that is different from what's expected, and the Zimfarm has no way of knowing that it is not a proper ZIM.
In the future, when we combine zimcheck and the Zimfarm, we may raise a warning if a ZIM is drastically smaller than previous iterations, but we're not there yet.
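
As an illustration of that idea, the comparison could be as simple as the following sketch (the 50% threshold and the caller-supplied previous size are placeholders, not a decided design):

```python
# Hypothetical size-regression check: flag a freshly built ZIM that is
# much smaller than the previous build of the same recipe. The 0.5
# threshold and the caller-supplied previous size are illustrative only.
import os

def looks_suspiciously_small(zim_path: str, previous_size: int,
                             threshold: float = 0.5) -> bool:
    """Return True if the new ZIM shrank past the threshold."""
    new_size = os.path.getsize(zim_path)
    if previous_size and new_size < previous_size * threshold:
        print(f"WARNING: {zim_path} is {new_size} bytes, under "
              f"{threshold:.0%} of the previous {previous_size} bytes; "
              f"possible scraper failure.")
        return True
    return False
```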

Let me know about the mass disabling; removal would be up to @kelson42.

@Popolechien
Contributor Author

Yeah, the ZIM is not at fault; the scraper is.

> In the future, when we combine zimcheck and the Zimfarm, we may raise a warning if a ZIM is drastically smaller than previous iterations, but we're not there yet.

Sounds like a great idea.

@kelson42 Do you know whether anyone is currently working on this issue? If not, then let's mass-suspend and mass-delete.

@stale

stale bot commented Oct 31, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will now be reviewed manually. Thank you for your contributions.

@stale stale bot added the stale label Oct 31, 2020
@kelson42
Contributor

Similar to #545.

@Popolechien Nobody is working on this for the moment. The priority is rather to avoid releasing bad files to download.kiwix.org.

@stale stale bot removed the stale label Oct 31, 2020
@stale

stale bot commented Jan 2, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will now be reviewed manually. Thank you for your contributions.

@stale stale bot added the stale label Jan 2, 2021
@rgaudin
Member

rgaudin commented Jul 21, 2021

Recipes can be disabled, scraper-wide disabling is tracked in #545, and the rest should be the CMS's job. Closing.

@rgaudin rgaudin closed this as completed Jul 21, 2021