New request: Atheist Republic #761

Open
SecularSekai opened this issue Dec 11, 2023 · 18 comments

@SecularSekai

SecularSekai commented Dec 11, 2023

Please use the following format for a ZIM creation request (and delete unnecessary information)

@Popolechien
Collaborator

@SecularSekai Hey, can you please drop a line at hello@kiwix.org about the permission?

@RavanJAltaie
Contributor

@Popolechien please let us know here once permission is sent so we can start creating the recipe. @SecularSekai

@SecularSekai
Author

@Popolechien @RavanJAltaie Sorry for the wait! I notified the copyright holder, who has informed me that she sent an email to hello@kiwix.org granting permission.
If you need any additional information, please let me know. Thanks, guys!

@Popolechien
Collaborator

@RavanJAltaie We're good to go - fingers crossed.

@RavanJAltaie
Contributor

https://farm.openzim.org/recipes/atheistrepublic_en_all
Recipe created, will update the library link here once ready

@SecularSekai
Author

@RavanJAltaie Fantastic! How can I access it and view the ZIM locally on Kiwix Desktop?

@SecularSekai
Author

@RavanJAltaie @Popolechien Hi, guys! It looks like the recipe failed when I checked the link. What would be the next step to troubleshoot it?

@RavanJAltaie
Contributor

@benoit74 the recipe is failing with error:
File "/usr/bin/zimit", line 566, in <module>
  zimit()
File "/usr/bin/zimit", line 437, in zimit
  raise subprocess.CalledProcessError(crawl.returncode, cmd_args)
subprocess.CalledProcessError: Command '['crawl', '--failOnFailedSeed', '--waitUntil', 'load', '--title', 'Atheist Republic', '--description', 'We are not just atheists, we are atheists who care.', '--depth', '-1', '--timeout', '90', '--scopeType', 'domain', '--lang', 'eng', '--behaviors', 'autoplay,autofetch,siteSpecific', '--behaviorTimeout', '90', '--diskUtilization', '90', '--url', 'https://www.atheistrepublic.com/', '--userAgent', 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Safari/605.1.15 +Zimit contact+zimfarm@kiwix.org', '--cwd', '/output/.tmpmypky3kv', '--statsFilename', '/output/crawl.json']' returned non-zero exit status 9.
Any ideas?

@benoit74
Contributor

It looks like, after some time, the scraper was banned for making too many requests.

However, digging a bit into the logs, it is clear that we spend a lot of time (more than 90%, I would say) trying to archive the store items, which is probably not intentional. I don't think these need to be in the ZIM.
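One way to put a number on an observation like this is to tally crawled URLs by their first path segment. The sketch below does that on a small, purely illustrative list (not taken from the actual crawl logs). Note that it counts pages rather than time spent, so it is only a rough proxy for the "more than 90%" estimate above; the `share_by_section` helper is hypothetical.

```python
from collections import Counter
from urllib.parse import urlparse


def share_by_section(urls: list[str]) -> dict[str, float]:
    """Fraction of URLs per first path segment (e.g. 'store', 'forums')."""
    sections: Counter[str] = Counter()
    for url in urls:
        path = urlparse(url).path.strip("/")
        # The site root maps to ""; otherwise keep only the first segment.
        sections[path.split("/", 1)[0]] += 1
    total = sum(sections.values())
    return {name: count / total for name, count in sections.items()}


# Illustrative input only; these are NOT real entries from the crawl logs.
crawled = [
    "https://www.atheistrepublic.com/",
    "https://www.atheistrepublic.com/store/item-a",
    "https://www.atheistrepublic.com/store/item-b",
    "https://www.atheistrepublic.com/store/item-c",
    "https://www.atheistrepublic.com/some-page",
]

for section, share in sorted(share_by_section(crawled).items(), key=lambda kv: -kv[1]):
    print(f"/{section}: {share:.0%}")
```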

I suggest that we exclude all https://www.atheistrepublic.com/store URLs.
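A minimal sketch of what such an exclusion could look like follows. It assumes, as zimit and browsertrix-crawler do, that exclusions are regular expressions matched against page URLs. The pattern `STORE_EXCLUDE`, the `is_excluded` helper and the item URL are hypothetical; the criteria actually set on the recipe may differ.

```python
import re

# Hypothetical exclusion pattern: the store root and anything below it.
# The trailing group stops it from also matching e.g. "/storeroom".
STORE_EXCLUDE = re.compile(r"^https?://www\.atheistrepublic\.com/store(/|$|\?)")


def is_excluded(url: str) -> bool:
    """Return True if the URL falls under the store section."""
    return STORE_EXCLUDE.match(url) is not None


samples = [
    "https://www.atheistrepublic.com/store",
    "https://www.atheistrepublic.com/store/some-item",  # hypothetical item URL
    "https://www.atheistrepublic.com/storeroom",        # must NOT match
    "https://www.atheistrepublic.com/forums/",
]
for url in samples:
    verdict = "exclude" if is_excluded(url) else "keep"
    print(f"{verdict:8} {url}")
```

Anchoring on the full origin keeps the rule from accidentally catching a `/store` path on another host that is also in scope.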

I also noticed there is a forum hosted both at https://www.atheistrepublic.com/forums/ (the old forum, if I understand correctly) and at https://forum.atheistrepublic.com/ (the new forum). The new forum seems to be based on the Discourse platform, which will probably be difficult to scrape (we might be blocked quite soon).

@SecularSekai do you want the forums to be archived in the ZIM as well, or are they not needed / not mandatory?

@RavanJAltaie
Contributor

@benoit74 what do you think we should do?
Shall we tag this as upstream or reject the issue?

@SecularSekai
Author

@benoit74 Hi! Sorry, I didn't see your comment from February until now. The forums are less important, so they can be omitted from the ZIM if need be.

We really appreciate all the help with this!

@RavanJAltaie
Contributor

@benoit74 shall we try to create it without the forums?

@benoit74
Contributor

Next steps are:

  • Confirm that we can get rid of the store items
  • If we all agree that these are not needed, get rid of them
  • Run the recipe again and see whether the forums are archived correctly
  • If not, get rid of the forum items as well and run the recipe again
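The fallback in the last step amounts to adding a second set of exclusions on top of the store one. A minimal sketch, assuming the crawler accepts a repeatable `--exclude` regex option (browsertrix-crawler, which zimit drives, has such an option, but check the version in use), could assemble the arguments as below. On the Zimfarm the same thing is set through the recipe's configuration rather than by hand. The base arguments are a subset of the command from the error reported earlier in this thread; the three patterns and `build_args` are hypothetical.

```python
import re

# Subset of the crawl arguments from the failing run reported in this issue.
BASE_ARGS = [
    "crawl",
    "--url", "https://www.atheistrepublic.com/",
    "--scopeType", "domain",
    "--depth", "-1",
]

# Hypothetical exclusion regexes for each step of the plan.
STORE = r"^https?://www\.atheistrepublic\.com/store(/|$|\?)"
OLD_FORUM = r"^https?://www\.atheistrepublic\.com/forums(/|$|\?)"
NEW_FORUM = r"^https?://forum\.atheistrepublic\.com(/|$)"


def build_args(exclude_forums: bool) -> list[str]:
    """Return crawl arguments: always drop the store, optionally the forums."""
    patterns = [STORE] + ([OLD_FORUM, NEW_FORUM] if exclude_forums else [])
    args = list(BASE_ARGS)
    for pattern in patterns:
        re.compile(pattern)  # fail early if a pattern is not a valid regex
        args += ["--exclude", pattern]
    return args


print(build_args(exclude_forums=False))  # step 2: store only
print(build_args(exclude_forums=True))   # step 4: store and both forums
```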

Do we all agree that we do not want store content inside the ZIM?

@RavanJAltaie
Copy link
Contributor

@benoit74 I agree on all points.
@SecularSekai do you agree?

@SecularSekai
Author

@RavanJAltaie @benoit74 I agree and appreciate the thorough thought on this. Let's move forward with that strategy and see where we get.
Loss of the forums is not a major problem.

@benoit74
Contributor

benoit74 commented May 1, 2024

@RavanJAltaie do you need help to remove the store items?

@RavanJAltaie
Contributor

@benoit74 yes please, let me know how it should be done and I'll do it.

@benoit74
Contributor

benoit74 commented May 7, 2024

I've updated the recipe's exclude criteria.

This will exclude store items for now. The forum should be archived since it is on the same domain and scopeType is set to domain. Let's see whether it manages to archive the forum properly as well. I've requested the recipe.
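Why the forum should still be in scope can be illustrated with a simplified approximation of the `domain` scope type: a URL is in scope when its host is the seed's domain (with a leading `www.` dropped) or a subdomain of it. This is a sketch of that rule under that assumption, not browsertrix-crawler's actual implementation, and `in_domain_scope` is a hypothetical helper. It also shows why the store needs an explicit exclusion: those pages are in scope by default.

```python
from urllib.parse import urlparse


def in_domain_scope(url: str, seed: str) -> bool:
    """Approximate 'domain' scope: the seed's domain or any of its subdomains."""
    seed_host = urlparse(seed).hostname or ""
    if seed_host.startswith("www."):
        seed_host = seed_host[len("www."):]  # treat www.example.com as example.com
    host = urlparse(url).hostname or ""
    # The leading dot prevents "notexample.com" from matching "example.com".
    return host == seed_host or host.endswith("." + seed_host)


SEED = "https://www.atheistrepublic.com/"
for url in [
    "https://www.atheistrepublic.com/forums/",  # old forum
    "https://forum.atheistrepublic.com/",       # new forum (subdomain)
    "https://www.atheistrepublic.com/store",    # in scope unless excluded
    "https://example.org/",                     # other site, out of scope
]:
    print(in_domain_scope(url, SEED), url)
```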

4 participants