Images missing even when under cutoff value #70
Comments
I believe I can confirm the problem: with the latest wiki.openzim.org scrape, I had a few images missing myself.
Please share the ZIM and indications of where to find such images so we can look at what's special about them. Task ID or link as well so we can check the logs.
https://farm.youzim.it/pipeline/d1c2f201514f3da67f887df5 for the task - images are missing on every second page or so.
OK, I've looked into this. It is also related to the image's srcset attribute. What we're seeing here is that the crawler is not making requests for all of the images in the srcset (or those requests fail to complete), so there are missing images. Depending on which one your browser picks (kinda hard to predict, but you can inspect what it's trying to display), you may get one that was crawled or one that wasn't. I've also found that which image gets crawled is sort of random…

I've opened a ticket upstream: webrecorder/browsertrix-crawler#3

@kelson42 should we keep this one open until it gets solved upstream?
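To make the srcset point above concrete, here is a minimal sketch (not part of the scraper or of browsertrix-crawler) that lists every URL an `<img>` tag could cause a browser to request and reports those absent from a set of crawled URLs. The names `ImageCandidateParser` and `missing_candidates` are hypothetical, and the srcset split is deliberately naive: it assumes candidate URLs contain no commas.

```python
from html.parser import HTMLParser


class ImageCandidateParser(HTMLParser):
    """Collect every URL an <img> tag could make the browser request."""

    def __init__(self):
        super().__init__()
        self.candidates = []  # one list of URLs per <img> tag

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attrs = dict(attrs)
        urls = []
        if attrs.get("src"):
            urls.append(attrs["src"])
        # Naive srcset split: assumes candidate URLs contain no commas.
        for entry in (attrs.get("srcset") or "").split(","):
            parts = entry.split()  # "url descriptor" -> [url, descriptor]
            if parts and parts[0] not in urls:
                urls.append(parts[0])
        self.candidates.append(urls)


def missing_candidates(html, crawled_urls):
    """Return candidate image URLs from `html` that are not in `crawled_urls`."""
    parser = ImageCandidateParser()
    parser.feed(html)
    return [u for urls in parser.candidates for u in urls if u not in crawled_urls]
```

For example, with `<img src="a.jpg" srcset="a.jpg 1x, b.jpg 2x">` and only `a.jpg` crawled, `b.jpg` is reported as missing, which is the case where a browser that picks the 2x candidate would show a broken image.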
@rgaudin Yes, please keep this ticket open.
I just zimmed up a WordPress blog with 186 articles (cutoff at 1,000) and about 500 images (https://mesquartierschinois.wordpress.com). Standard, free WordPress, i.e. no funky extensions added.
I would say 10-20% of images are still missing.