
Can we temporarily suspend a recipe and make it remove (or hide) the corresponding ZIM files? #524

Closed
Popolechien opened this issue Aug 31, 2020 · 6 comments

Comments

@Popolechien
Contributor

As described in #1230, the Wikibooks, Wikiversity, Wikisource, Wikiquote, and Wikinews recipes all fail to scrape properly and return semi-empty ZIM files. This is not visible in the download window, so people end up downloading useless resources (and then sending half-angry messages).
Until the problem is fixed, would it be possible to either:

  • suspend the corresponding recipes and automatically delete the last available version (so that people stop downloading faulty material and no new ZIM is generated until the problem is fixed), or
  • move the destination folder to /hidden/dev, which in turn would move pre-existing ZIM files to the same folder?
@rgaudin
Member

rgaudin commented Aug 31, 2020

Any recipe can be manually disabled and/or have its warehouse path moved to .dev. If this concerns a large number of recipes, I can script this change.
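
A minimal sketch of what such a script could look like, assuming the Zimfarm exposes a REST API with a PATCH endpoint per schedule (the base URL, endpoint path, payload fields, and auth scheme below are all assumptions, not the documented API):

```python
# Hypothetical mass-disable script. The API base URL, endpoint path,
# payload fields, and auth scheme are assumptions that would need to be
# checked against the real Zimfarm API before use.
import requests

API = "https://api.farm.openzim.org/v1"  # assumed base URL
TOKEN = "..."  # admin access token
RECIPES = ["wikibooks_en_all", "wikiversity_en_all"]  # recipes to suspend

headers = {"Authorization": f"Token {TOKEN}"}
for name in RECIPES:
    resp = requests.patch(
        f"{API}/schedules/{name}",
        headers=headers,
        json={
            "enabled": False,                  # stop new runs of the recipe
            "warehouse_path": "/.hidden/dev",  # hide future ZIMs from the main tree
        },
    )
    resp.raise_for_status()
    print(f"{name}: suspended, warehouse path moved")
```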

The Zimfarm is not able to delete ZIM files and won't be; this separation is a safety net.
Faulty ZIM files should not pass zimcheck and thus should not appear online. I don't remember whether that check is currently disabled, though.
From what I understand, these are not faulty ZIM files (as far as the ZIM format is concerned); a scraper bug creates a ZIM that is different from what's expected, and the Zimfarm has no way of knowing that it is not a proper ZIM.
In the future, when we combine zimcheck and the Zimfarm, we may raise a warning if a ZIM is drastically smaller than previous iterations, but we're not there yet.
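
As an illustration of that idea, the comparison could be as simple as the following sketch (the 50% threshold and the caller-supplied previous size are placeholders, not a decided design):

```python
# Hypothetical size-regression check: flag a freshly built ZIM that is
# much smaller than the previous build of the same recipe. The 0.5
# threshold and the caller-supplied previous size are illustrative only.
import os

def looks_suspiciously_small(zim_path: str, previous_size: int,
                             threshold: float = 0.5) -> bool:
    """Return True if the new ZIM shrank past the threshold."""
    new_size = os.path.getsize(zim_path)
    if previous_size and new_size < previous_size * threshold:
        print(f"WARNING: {zim_path} is {new_size} bytes, under "
              f"{threshold:.0%} of the previous {previous_size} bytes; "
              f"possible scraper failure.")
        return True
    return False
```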

Let me know about the mass disabling; removal would be up to @kelson42.

@Popolechien
Contributor Author

Yeah, the ZIM is not at fault; the scraper is.

> In the future, when we combine zimcheck and the Zimfarm, we may raise a warning if a ZIM is drastically smaller than previous iterations, but we're not there yet.

Sounds like a great idea.

@kelson42 Do you know whether anyone is currently working on this issue? If not, then let's mass-suspend and mass-delete.

@stale

stale bot commented Oct 31, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will now be reviewed manually. Thank you for your contributions.

@stale stale bot added the stale label Oct 31, 2020
@kelson42
Contributor

Similar to #545.

@Popolechien Nobody is working on this for the moment. The priority is rather to avoid releasing bad files to download.kiwix.org.

@stale stale bot removed the stale label Oct 31, 2020
@stale

stale bot commented Jan 2, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will now be reviewed manually. Thank you for your contributions.

@stale stale bot added the stale label Jan 2, 2021
@rgaudin
Member

rgaudin commented Jul 21, 2021

Recipes can be disabled, scraper-wide disabling is tracked in #545, and the rest should be the CMS's job. Closing.

@rgaudin rgaudin closed this as completed Jul 21, 2021