Hellofresh.de Scraping fails #3158

Closed
3 tasks done
FuXXz opened this issue Feb 11, 2024 · 5 comments
Labels
bug (Something isn't working), scraper

Comments

@FuXXz

FuXXz commented Feb 11, 2024

First Check

  • I used the GitHub search to find a similar issue and didn't find it.

  • I have verified that this issue is not related to the underlying library
    hhursev/recipe-scrapers by 1) checking the debugger and confirming that data
    is returned, 2) verifying that there are errors in the log related to
    application-level code, or 3) verifying that the site provides recipe data
    or is otherwise supported by hhursev/recipe-scrapers

  • This issue can be replicated on the demo site (https://demo.mealie.io/)

Please provide 1-5 example URLs that are having errors

https://www.hellofresh.de/recipes/thai-hahnchen-stir-fry-nach-art-pad-kra-pao-6551f7e22d94ad532d60306d

Please provide your logs for the Mealie container (docker logs <container-id> > mealie.logs)

I can't provide logs. I used the online demo, and the scraper only says:
recipe_scrapers was unable to scrape this URL

Deployment

Other

FuXXz added the bug, scraper, and triage labels on Feb 11, 2024
@mbaiti

mbaiti commented Feb 12, 2024

Tried the same on my local mealie container. Here are the logs:

mealie  | INFO:     0.0.0.0:0 - "POST /api/recipes/create-url HTTP/1.1" 500 Internal Server Error
mealie  | ERROR:    Exception in ASGI application
mealie  | Traceback (most recent call last):
mealie  |   File "/opt/pysetup/.venv/lib/python3.10/site-packages/anyio/_core/_sockets.py", line 189, in connect_tcp
mealie  |     addr_obj = ip_address(remote_host)
mealie  |   File "/usr/local/lib/python3.10/ipaddress.py", line 54, in ip_address
mealie  |     raise ValueError(f'{address!r} does not appear to be an IPv4 or IPv6 address')
mealie  | ValueError: 'www.hellofresh.de' does not appear to be an IPv4 or IPv6 address
mealie  | 
mealie  | During handling of the above exception, another exception occurred:
mealie  | 
mealie  | Traceback (most recent call last):
mealie  |   File "/opt/pysetup/.venv/lib/python3.10/site-packages/httpcore/_backends/anyio.py", line 114, in connect_tcp
mealie  |     stream: anyio.abc.ByteStream = await anyio.connect_tcp(
mealie  |   File "/opt/pysetup/.venv/lib/python3.10/site-packages/anyio/_core/_sockets.py", line 192, in connect_tcp
mealie  |     gai_res = await getaddrinfo(
mealie  | asyncio.exceptions.CancelledError
mealie  | 
mealie  | During handling of the above exception, another exception occurred:
mealie  | 
mealie  | Traceback (most recent call last):
mealie  |   File "/opt/pysetup/.venv/lib/python3.10/site-packages/httpcore/_exceptions.py", line 10, in map_exceptions
mealie  |     yield
mealie  |   File "/opt/pysetup/.venv/lib/python3.10/site-packages/httpcore/_backends/anyio.py", line 113, in connect_tcp
mealie  |     with anyio.fail_after(timeout):
mealie  |   File "/opt/pysetup/.venv/lib/python3.10/site-packages/anyio/_core/_tasks.py", line 119, in __exit__
mealie  |     raise TimeoutError
mealie  | TimeoutError
mealie  | 
mealie  | The above exception was the direct cause of the following exception:
mealie  | 
mealie  | Traceback (most recent call last):
mealie  |   File "/opt/pysetup/.venv/lib/python3.10/site-packages/httpx/_transports/default.py", line 67, in map_httpcore_exceptions
mealie  |     yield
mealie  |   File "/opt/pysetup/.venv/lib/python3.10/site-packages/httpx/_transports/default.py", line 371, in handle_async_request
mealie  |     resp = await self._pool.handle_async_request(req)
mealie  |   File "/opt/pysetup/.venv/lib/python3.10/site-packages/httpcore/_async/connection_pool.py", line 268, in handle_async_request
mealie  |     raise exc
mealie  |   File "/opt/pysetup/.venv/lib/python3.10/site-packages/httpcore/_async/connection_pool.py", line 251, in handle_async_request
mealie  |     response = await connection.handle_async_request(request)
mealie  |   File "/opt/pysetup/.venv/lib/python3.10/site-packages/httpcore/_async/connection.py", line 99, in handle_async_request
mealie  |     raise exc
mealie  |   File "/opt/pysetup/.venv/lib/python3.10/site-packages/httpcore/_async/connection.py", line 76, in handle_async_request
mealie  |     stream = await self._connect(request)
mealie  |   File "/opt/pysetup/.venv/lib/python3.10/site-packages/httpcore/_async/connection.py", line 124, in _connect
mealie  |     stream = await self._network_backend.connect_tcp(**kwargs)
mealie  |   File "/opt/pysetup/.venv/lib/python3.10/site-packages/httpcore/_backends/auto.py", line 30, in connect_tcp
mealie  |     return await self._backend.connect_tcp(
mealie  |   File "/opt/pysetup/.venv/lib/python3.10/site-packages/httpcore/_backends/anyio.py", line 112, in connect_tcp
mealie  |     with map_exceptions(exc_map):
mealie  |   File "/usr/local/lib/python3.10/contextlib.py", line 153, in __exit__
mealie  |     self.gen.throw(typ, value, traceback)
mealie  |   File "/opt/pysetup/.venv/lib/python3.10/site-packages/httpcore/_exceptions.py", line 14, in map_exceptions
mealie  |     raise to_exc(exc) from exc
mealie  | httpcore.ConnectTimeout
mealie  | 
mealie  | The above exception was the direct cause of the following exception:
mealie  | 
mealie  | Traceback (most recent call last):
mealie  |   File "/opt/pysetup/.venv/lib/python3.10/site-packages/uvicorn/protocols/http/httptools_impl.py", line 419, in run_asgi
mealie  |     result = await app(  # type: ignore[func-returns-value]
mealie  |   File "/opt/pysetup/.venv/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 84, in __call__
mealie  |     return await self.app(scope, receive, send)
mealie  |   File "/opt/pysetup/.venv/lib/python3.10/site-packages/fastapi/applications.py", line 1106, in __call__
mealie  |     await super().__call__(scope, receive, send)
mealie  |   File "/opt/pysetup/.venv/lib/python3.10/site-packages/starlette/applications.py", line 122, in __call__
mealie  |     await self.middleware_stack(scope, receive, send)
mealie  |   File "/opt/pysetup/.venv/lib/python3.10/site-packages/starlette/middleware/errors.py", line 184, in __call__
mealie  |     raise exc
mealie  |   File "/opt/pysetup/.venv/lib/python3.10/site-packages/starlette/middleware/errors.py", line 162, in __call__
mealie  |     await self.app(scope, receive, _send)
mealie  |   File "/opt/pysetup/.venv/lib/python3.10/site-packages/starlette/middleware/gzip.py", line 24, in __call__
mealie  |     await responder(scope, receive, send)
mealie  |   File "/opt/pysetup/.venv/lib/python3.10/site-packages/starlette/middleware/gzip.py", line 44, in __call__
mealie  |     await self.app(scope, receive, self.send_with_gzip)
mealie  |   File "/opt/pysetup/.venv/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 79, in __call__
mealie  |     raise exc
mealie  |   File "/opt/pysetup/.venv/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 68, in __call__
mealie  |     await self.app(scope, receive, sender)
mealie  |   File "/opt/pysetup/.venv/lib/python3.10/site-packages/fastapi/middleware/asyncexitstack.py", line 20, in __call__
mealie  |     raise e
mealie  |   File "/opt/pysetup/.venv/lib/python3.10/site-packages/fastapi/middleware/asyncexitstack.py", line 17, in __call__
mealie  |     await self.app(scope, receive, send)
mealie  |   File "/opt/pysetup/.venv/lib/python3.10/site-packages/starlette/routing.py", line 718, in __call__
mealie  |     await route.handle(scope, receive, send)
mealie  |   File "/opt/pysetup/.venv/lib/python3.10/site-packages/starlette/routing.py", line 276, in handle
mealie  |     await self.app(scope, receive, send)
mealie  |   File "/opt/pysetup/.venv/lib/python3.10/site-packages/starlette/routing.py", line 66, in app
mealie  |     response = await func(request)
mealie  |   File "/app/mealie/routes/_base/routers.py", line 35, in custom_route_handler
mealie  |     response = await original_route_handler(request)
mealie  |   File "/opt/pysetup/.venv/lib/python3.10/site-packages/fastapi/routing.py", line 274, in app
mealie  |     raw_response = await run_endpoint_function(
mealie  |   File "/opt/pysetup/.venv/lib/python3.10/site-packages/fastapi/routing.py", line 191, in run_endpoint_function
mealie  |     return await dependant.call(**values)
mealie  |   File "/app/mealie/routes/recipe/recipe_crud_routes.py", line 167, in parse_recipe_url
mealie  |     recipe, extras = await create_from_url(req.url)
mealie  |   File "/app/mealie/services/scraper/scraper.py", line 33, in create_from_url
mealie  |     new_recipe, extras = await scraper.scrape(url)
mealie  |   File "/app/mealie/services/scraper/recipe_scraper.py", line 30, in scrape
mealie  |     result = await scraper.parse()
mealie  |   File "/app/mealie/services/scraper/scraper_strategies.py", line 202, in parse
mealie  |     scraped_data = await self.scrape_url()
mealie  |   File "/app/mealie/services/scraper/scraper_strategies.py", line 170, in scrape_url
mealie  |     recipe_html = await self.get_html(self.url)
mealie  |   File "/app/mealie/services/scraper/scraper_strategies.py", line 103, in get_html
mealie  |     return await safe_scrape_html(url)
mealie  |   File "/app/mealie/services/scraper/scraper_strategies.py", line 35, in safe_scrape_html
mealie  |     async with client.stream("GET", url, timeout=SCRAPER_TIMEOUT, headers={"User-Agent": _FIREFOX_UA}) as resp:
mealie  |   File "/usr/local/lib/python3.10/contextlib.py", line 199, in __aenter__
mealie  |     return await anext(self.gen)
mealie  |   File "/opt/pysetup/.venv/lib/python3.10/site-packages/httpx/_client.py", line 1602, in stream
mealie  |     response = await self.send(
mealie  |   File "/opt/pysetup/.venv/lib/python3.10/site-packages/httpx/_client.py", line 1646, in send
mealie  |     response = await self._send_handling_auth(
mealie  |   File "/opt/pysetup/.venv/lib/python3.10/site-packages/httpx/_client.py", line 1674, in _send_handling_auth
mealie  |     response = await self._send_handling_redirects(
mealie  |   File "/opt/pysetup/.venv/lib/python3.10/site-packages/httpx/_client.py", line 1711, in _send_handling_redirects
mealie  |     response = await self._send_single_request(request)
mealie  |   File "/opt/pysetup/.venv/lib/python3.10/site-packages/httpx/_client.py", line 1748, in _send_single_request
mealie  |     response = await transport.handle_async_request(request)
mealie  |   File "/opt/pysetup/.venv/lib/python3.10/site-packages/httpx/_transports/default.py", line 370, in handle_async_request
mealie  |     with map_httpcore_exceptions():
mealie  |   File "/usr/local/lib/python3.10/contextlib.py", line 153, in __exit__
mealie  |     self.gen.throw(typ, value, traceback)
mealie  |   File "/opt/pysetup/.venv/lib/python3.10/site-packages/httpx/_transports/default.py", line 84, in map_httpcore_exceptions
mealie  |     raise mapped_exc(message) from exc
mealie  | httpx.ConnectTimeout
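
Read from the bottom up, the chain above ends in httpx.ConnectTimeout, which wraps httpcore.ConnectTimeout, which in turn wraps a TimeoutError raised while anyio was waiting in getaddrinfo. The ValueError at the top appears to be only anyio checking whether the host is a literal IP address before falling back to a name lookup, so it is probably not the real failure. A minimal sketch of walking such a chain in code follows; it uses stand-in exceptions rather than the real httpx classes, and `exception_chain`, `ConnectTimeout`, and `simulate` are hypothetical names introduced here for illustration.

```python
class ConnectTimeout(Exception):
    """Stand-in for httpx.ConnectTimeout (illustration only)."""


def exception_chain(exc):
    """Return the exceptions from outermost to innermost."""
    chain = []
    seen = set()
    while exc is not None and id(exc) not in seen:
        seen.add(id(exc))
        chain.append(exc)
        # "raise X from Y" sets __cause__; an exception raised while
        # handling another one only sets __context__.
        exc = exc.__cause__ if exc.__cause__ is not None else exc.__context__
    return chain


def simulate():
    """Build a simplified chain shaped like the one in the log above."""
    try:
        try:
            try:
                raise ValueError("host is not an IP address literal")
            except ValueError:
                # Implicit chaining: "During handling of the above exception..."
                raise TimeoutError
        except TimeoutError as exc:
            # Explicit chaining: "The above exception was the direct cause..."
            raise ConnectTimeout() from exc
    except ConnectTimeout as exc:
        return exc


if __name__ == "__main__":
    for err in exception_chain(simulate()):
        print(type(err).__name__)
```

The last entries of the chain are the earliest exceptions, which is usually where to look for the underlying cause.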

@boc-the-git
Collaborator

@mbaiti, is any scraping working for you, i.e. for other websites?

Can you run docker-compose down && docker-compose up -d, then try the import again and see whether you get a different result?
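
Since the log above shows the timeout firing while the lookup was still in getaddrinfo, one extra check is whether name resolution works at all from inside the container. This is a rough sketch using only the standard library; `can_resolve` is a hypothetical helper, not part of Mealie, and port 443 is an assumption.

```python
import socket


def can_resolve(host, port=443):
    """Return True if the host resolves to at least one address."""
    try:
        results = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        # Name resolution failed (unknown host, no DNS server reachable, ...).
        return False
    return len(results) > 0


if __name__ == "__main__":
    for name in ("www.hellofresh.de", "demo.mealie.io"):
        print(name, can_resolve(name))
```

Running this with the Python interpreter inside the Mealie container, for www.hellofresh.de and for a site that scrapes fine, should indicate whether the problem is specific to one host or affects DNS in general.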

@Kuchenpirat
Collaborator

Hey, I just checked on my own instance as well as the demo instance, and I am also unable to scrape from hellofresh.de or hellofresh.com.

I also tried to scrape the sites with our scraper library directly, which also returns no data. When visiting the website or using curl, I can confirm that they are still using the schema, so they might either be blocking the scraper or have updated some parts of their page.
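
The "still using the schema" check can be approximated offline on HTML saved with curl. The sketch below assumes that "the schema" means schema.org Recipe data embedded as JSON-LD in a script tag; `JsonLdCollector` and `has_recipe_schema` are hypothetical helpers written for this illustration and are not part of Mealie or recipe-scrapers.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collect the raw text of every <script type="application/ld+json">."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buf = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buf = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            self.blocks.append("".join(self._buf))


def has_recipe_schema(html):
    """Return True if any JSON-LD block declares an @type of Recipe."""
    parser = JsonLdCollector()
    parser.feed(html)
    parser.close()
    for raw in parser.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue
        items = data if isinstance(data, list) else [data]
        for item in items:
            if not isinstance(item, dict):
                continue
            # Some sites nest their entities under "@graph".
            graph = item.get("@graph")
            nodes = [item] + (graph if isinstance(graph, list) else [])
            for node in nodes:
                if not isinstance(node, dict):
                    continue
                types = node.get("@type")
                types = types if isinstance(types, list) else [types]
                if "Recipe" in types:
                    return True
    return False
```

If this returns True for the saved page while the scraper still returns nothing, that would point toward the request being blocked or the markup having changed in a way the library does not handle, rather than the recipe data being gone.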

In conclusion, it seems that this is not a Mealie issue. To fix this, you would need to create an issue over at https://github.com/hhursev/recipe-scrapers.

Once they have fixed it and released a new version, it will get merged into Mealie.

@FuXXz
Author

FuXXz commented Feb 12, 2024

Can someone please open an issue there? I can't provide any logs because I don't have Mealie installed; it is only of interest to me if HelloFresh works.

@Kuchenpirat
Collaborator

I just retried in the demo and on my dev instance. The issues with hellofresh.de and hellofresh.com seem to have been fixed.

4 participants