Report more clearly in the log when no ZIM is produced on purpose + produce the ZIM even if some error occurred #79
Looking at the scraper logs, the termination of the process seems quite "abrupt".
Oh no, I missed the log lines stating "Finishing ZIM file" + "Finished Zim ifixit_es_all_2022-05.zim in /output".
Looks like the process finished properly; there are just some strange "phantom" log lines about images being scraped, with weird image counts.
Whoa, I looked at another scrape I ran on my side and it looks like I have the same issue. The logs state that the final ZIM file has been produced, but it is not there.
OK, seems similar; let me know if there's anything I can check on my end.
I had a look again at the scraper logs and noticed the following. When the scraper finishes, I usually have these logs:
From my understanding, all logs starting with "T:" are indeed logs from the zimscraperlib / libzim. What I observed (from a few logs, so it may not be a sound generalization) is that:
@rgaudin: any idea what this could mean?
@mgautierfr I wonder if this is somehow linked to openzim/libzim#666
(pull requests with ID #666 should probably be refused anyway ^^) |
In is the python-scraperlib's And this is possible if there is an exception during the This may be the cause if some specific content making
@mgautierfr's right. Because The workaround is ON by default; but it just sets this variable and re-raises the Exception, so it wouldn't prevent your code from seeing it. From the logs, it seems no exception is raised up to the scraper's It seems that
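The "flag and re-raise" workaround described above can be sketched roughly as follows. This is a hypothetical illustration only, assuming names such as `Creator`, `add_item`, `can_finish`, and `finish` that mirror the discussion; it is not the actual zimscraperlib/libzim API:

```python
# Hypothetical sketch of the workaround pattern discussed above:
# on any exception while adding an item, record that the ZIM cannot be
# finalized safely, then re-raise so the caller still sees the error.
# Names (Creator, can_finish, add_item, finish) are assumptions.
class Creator:
    def __init__(self):
        # whether finish() should actually write the ZIM file
        self.can_finish = True

    def add_item(self, item):
        try:
            self._really_add(item)
        except Exception:
            # mark finalization as unsafe, but let the caller
            # observe (and count) the exception itself
            self.can_finish = False
            raise

    def _really_add(self, item):
        # stand-in for the real item-addition logic
        if item is None:
            raise ValueError("bad item")

    def finish(self):
        if not self.can_finish:
            print("Not finishing ZIM: errors occurred while adding items")
            return False
        print("Finishing ZIM file")
        return True
```

Under this pattern, a scraper that swallows the re-raised exceptions would silently end up with `can_finish == False` and no ZIM on disk, which matches the misleading-logs symptom described in this thread.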
Thank you to all of you for those explanations. I confirm that there were some errors during processing, which led to can_finish being set to False. The current logic is to catch those exceptions scraper-side, count how many exceptions occurred, and stop processing / ZIM file creation only if too many errors occur (threshold to be assessed). Maybe this is not the most appropriate technique, but I assumed it was better to have a fresher ZIM with a few missing items due to errors than no new ZIM at all. But I understand this assumption might be wrong. I will add a check of the "can_finish" status and display it clearly in the log, so that we can better understand the situation next time. Current logs are clearly misleading.
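The scraper-side error-counting logic described above can be sketched as below. `MAX_ERRORS` and `scrape_one` are hypothetical names for illustration (the actual threshold is, per the comment, still to be assessed), not the scraper's real identifiers:

```python
import logging

# Sketch of the error-counting approach described above: exceptions for
# individual items are caught and counted, and the whole run is aborted
# only once the count passes a threshold. MAX_ERRORS and scrape_one are
# assumed names, not the actual scraper API.
MAX_ERRORS = 10  # threshold still to be assessed


def scrape_all(items, scrape_one):
    """Process every item, tolerating up to MAX_ERRORS failures.

    Returns the number of failed items; raises once too many fail.
    """
    errors = 0
    for item in items:
        try:
            scrape_one(item)
        except Exception:
            errors += 1
            logging.warning(
                "Failed to process %r (%d errors so far)", item, errors
            )
            if errors > MAX_ERRORS:
                raise RuntimeError("Too many errors; aborting ZIM creation")
    return errors
```

The trade-off named in the comment is visible here: a handful of failures yields a fresher ZIM with a few items missing, while a failure storm aborts the run entirely instead of publishing a badly incomplete ZIM.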
And I opened another issue for those errors, which are indeed not really expected.
https://farm.openzim.org/pipeline/9d1f72ebc98ea2bced85e826