alcohol.stackexchange.com_en is failing #1004

Closed
kelson42 opened this issue May 18, 2024 · 4 comments
Labels
Bug: Something isn't working
Upstream: For tickets which are waiting for an upstream modification (typically scrapper or target website)

Comments

@kelson42
Collaborator

https://farm.openzim.org/pipeline/9a42bb2f-0b19-493b-91ba-63647224e7bd/debug for https://farm.openzim.org/recipes/alcohol.stackexchange.com_en

Starting redis.. PID: 9
[MainThread::2024-05-18 09:27:05,225] INFO:testing S3 Optimization Cache credentials
[MainThread::2024-05-18 09:27:07,060] INFO:Starting scraper with:
  domain: alcohol.stackexchange.com
  lang: ['en'] (['eng'])
  build_dir: /alcohol.stackexchange.com_dyzo8iwi
  output_dir: /output
  using cache: s3.us-west-1.wasabisys.com with bucket: org-kiwix-sotoki
[MainThread::2024-05-18 09:27:07,060] DEBUG:Fetching site details…
[MainThread::2024-05-18 09:27:08,672] ERROR:FAILED. An error occurred: "No site details found for domain='alcohol.stackexchange.com'"
[MainThread::2024-05-18 09:27:08,672] ERROR:"No site details found for domain='alcohol.stackexchange.com'"
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/sotoki-2.1.2-py3.8.egg/sotoki/entrypoint.py", line 348, in main
    sys.exit(scraper.run())
  File "/usr/local/lib/python3.8/site-packages/sotoki-2.1.2-py3.8.egg/sotoki/scraper.py", line 152, in run
    Global.init(get_site(self.domain))
  File "/usr/local/lib/python3.8/site-packages/sotoki-2.1.2-py3.8.egg/sotoki/utils/sites.py", line 68, in get_site
    raise KeyError(f"No site details found for {domain=}")
KeyError: "No site details found for domain='alcohol.stackexchange.com'"
[MainThread::2024-05-18 09:27:08,674] DEBUG:Removing /alcohol.stackexchange.com_dyzo8iwi

@kelson42 kelson42 added the Bug Something isn't working label May 18, 2024
@benoit74
Contributor

This is a recipe configuration issue.

Technically, the domain name to use is still beer.stackexchange.com, even though the site has moved to alcohol.stackexchange.com on the web. The dump is still published under beer.stackexchange.com, and there is no dump for alcohol.stackexchange.com.
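
To illustrate the mismatch, here is a minimal sketch of a lookup that fails for the public domain but succeeds once it is mapped back to the dump domain. This is not sotoki's actual code: `KNOWN_SITES`, `DOMAIN_ALIASES` and `resolve_dump_domain` are hypothetical names, and only the error message is modelled on the traceback above.

```python
# Hypothetical stand-in for the list of sites that have a dump.
# The real scraper fetches its site details elsewhere; this dict is an assumption.
KNOWN_SITES = {
    "beer.stackexchange.com": {"domain": "beer.stackexchange.com"},
}

# Hypothetical alias table: public domain -> domain the dump is published under.
DOMAIN_ALIASES = {
    "alcohol.stackexchange.com": "beer.stackexchange.com",
}


def get_site(domain: str) -> dict:
    """Return site details, raising KeyError when the domain has no entry."""
    try:
        return KNOWN_SITES[domain]
    except KeyError:
        # Same message shape as in the traceback (f-string "=" needs Python 3.8+).
        raise KeyError(f"No site details found for {domain=}") from None


def resolve_dump_domain(domain: str) -> str:
    """Map a renamed public domain to its dump domain, or return it unchanged."""
    return DOMAIN_ALIASES.get(domain, domain)


# The public domain is unknown, but its dump domain resolves fine.
site = get_site(resolve_dump_domain("alcohol.stackexchange.com"))
```

In practice the recipe was simply configured with beer.stackexchange.com; the alias table only shows why the two names are not interchangeable.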

@RavanJAltaie @Popolechien can you confirm that you still want to create new ZIMs under the new domain name, i.e. alcohol.stackexchange.com? That seems fine to me; it just means we must not forget to delete the old ZIM, currently named beer.stackexchange.com_xxx, because the new ZIMs will not automatically supersede it now that the name has changed.
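
A rough sketch of why the old file lingers, assuming supersession is keyed on the ZIM name and that periods are "YYYY-MM" strings. The names, periods and the `latest_per_name` helper below are all hypothetical, not the library's real logic.

```python
from typing import Dict, List, Tuple


def latest_per_name(zims: List[Tuple[str, str]]) -> Dict[str, str]:
    """Keep only the most recent period for each ZIM name.

    Periods are "YYYY-MM" strings, so plain string comparison orders them.
    """
    latest: Dict[str, str] = {}
    for name, period in zims:
        if name not in latest or period > latest[name]:
            latest[name] = period
    return latest


# Hypothetical library content: two old-name builds and one new-name build.
library = [
    ("beer.stackexchange.com_en_all", "2023-11"),
    ("beer.stackexchange.com_en_all", "2024-02"),
    ("alcohol.stackexchange.com_en_all", "2024-05"),
]

kept = latest_per_name(library)
# Both names survive: the new name never replaces the old one,
# so the old ZIM has to be deleted by hand.
```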

Moving this to zim-requests, since the scraper itself is not at fault in this specific case.

@benoit74 benoit74 transferred this issue from openzim/sotoki May 18, 2024
@benoit74 benoit74 changed the title from "No site details found for domain" to "alcohol.stackexchange.com_en is failing" May 18, 2024
@benoit74
Contributor

A new run with beer.stackexchange.com succeeded in finding the site details and the dump, but it revealed a new bug: https://farm.openzim.org/pipeline/f2657003-92e2-44f8-84bf-50e89c8cf3f6

This is clearly a new upstream issue this time.

@benoit74 benoit74 added the Upstream For tickets which are waiting for an upstream modification (typically scrapper or target website) label May 18, 2024
@RavanJAltaie
Contributor

Ok understood @benoit74

@benoit74
Contributor

ZIM produced; the remaining problem was only a memory configuration issue.
