
Infinite hang when using wild_mode with scrape_me #759

Closed
2 tasks done
jtarbard opened this issue Apr 9, 2023 · 2 comments


jtarbard commented Apr 9, 2023

Pre-filing checks

  • I have searched for open issues that report the same problem
  • I have checked that the bug affects the latest version of the library (14.36.0)

The URL of the recipe(s) that are not being scraped correctly
https://www.coop.co.uk/recipes/frying-pan-pizza
https://www.coop.co.uk/recipes/garlic-and-rosemary-lamb-with-red-wine-gravy

The results you expect to see

For the first example:
{
  "@context": "http://schema.org",
  "@type": "Recipe",
  "name": "Frying pan pizza",
  "description": "Homemade pizza is easier than you think - impress your friends with our quick method",
  "prepTime": "PT85M",
  "cookTime": "PT5M",
  ...
}

For the second:
{
  "@context": "http://schema.org",
  "@type": "Recipe",
  "name": "Garlic & rosemary lamb with red wine gravy",
  "description": "Nailing a classic dish like this is easier than you think — plus it’s a brilliant way to feed a crowd",
  "prepTime": "PT20M",
  "cookTime": "PT170M",
  ...
}

The results (including any Python error messages) that you are seeing

The program hangs indefinitely when attempting to scrape the site with wild_mode enabled. Code that reproduces the error:

from recipe_scrapers import scrape_me

a = scrape_me("https://www.coop.co.uk/recipes/frying-pan-pizza", wild_mode=True)
jtarbard added the bug label Apr 9, 2023

jtarbard commented Apr 9, 2023

Fixed by passing the timeout option and catching the error. Apologies for the unnecessary issue.
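The workaround described above can be sketched as follows. This assumes that scrape_me forwards a timeout keyword to the underlying requests call (as the reporter's fix implies) and that a stalled connection then surfaces as requests.exceptions.Timeout; check your installed version's signature before relying on it. The scrape_with_timeout helper name is illustrative, not part of the library.

```python
def scrape_with_timeout(url, seconds=10):
    """Attempt a wild-mode scrape; return None instead of hanging on a stall.

    Assumes scrape_me forwards ``timeout`` to requests (reporter's fix).
    """
    import requests
    from recipe_scrapers import scrape_me

    try:
        return scrape_me(url, wild_mode=True, timeout=seconds)
    except requests.exceptions.Timeout:
        # The request stalled past the deadline; signal failure to the caller.
        return None


if __name__ == "__main__":
    recipe = scrape_with_timeout("https://www.coop.co.uk/recipes/frying-pan-pizza")
    print(recipe.title() if recipe else "request timed out")
```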

jtarbard closed this as completed Apr 9, 2023
jayaddison (Collaborator) commented

No problem @jtarbard; glad that you were able to find a way to detect and avoid the stall 👍
