[HelloFresh] Recipe-scraper does not return the correct amount of ingredients for Hellofresh #527

Closed
3 tasks done
bonsdawende opened this issue Apr 13, 2022 · 14 comments

@bonsdawende

bonsdawende commented Apr 13, 2022

Thanks for filing a bug report with us!

If your request is about a website that is not supported, please open a 'new scraper' issue request instead.

To help get the issue fixed, please fill in the information below.

Pre-filing checks

  • I have searched for open issues that report the same problem
  • I have checked that the bug affects the latest version of the library

The URL of the recipe(s) that are not being scraped correctly

The version of Python you're using

Python 3.9.5

The operating system of your environment

Ubuntu 21.04

The results you expect to see

For 2 servings we would have:

  • 1 pièce Gousse d'ail instead of 1/2 pièce Gousse d'ail
  • 1/2 pièce Oignon jaune instead of 1/4 pièce Oignon jaune
  • 1 pièce Poireau instead of 1/2 pièce Poireau
  • ....

The results (including any Python error messages) that you are seeing

"½ pièce Gousse d'ail", '¼ pièce Oignon jaune', '½ pièce Poireau'

Can you write Python and would you like to help fix the scraper yourself? We'd be glad for your assistance! We can provide you with guidance and code review in return. If so, tick any of the relevant boxes below:

  • I'd prefer if the recipe-scrapers team try to fix this
@gloriousDan
Contributor

gloriousDan commented Apr 14, 2022

Looking at the source of the recipe you linked reveals the issue:

The per-serving amounts shown on the HTML page differ from the per-serving amounts in the schema.org JSON embedded in the page.
recipe-scrapers gets its data from the embedded JSON.

This is the embedded JSON, which is why the result is wrong:

  "recipeIngredient": [
    "½ pièce Gousse d'ail",
    "¼ pièce Oignon jaune",
    "½ pièce Poireau",
    "50 g Ricotta",
    "100 g Épinards",
    "1 selon le goût Noix de muscade",
    "20 g Fromage italien râpé",
    "100 g Feuilles de lasagne fraîches",
    "113 g Miettes de saumon fumé à chaud",
    "10 g Noisettes grillées",
    "30 g Tomates semi-séchées",
    "½ cs Huile d'olive",
    "½ cc Vinaigre de vin blanc",
    "15 g Beurre",
    "15 g Farine",
    "275 ml Lait",
    "½ cc Moutarde",
    "½ cs Vinaigre balsamique noir",
    "selon le goût Poivre et sel"
  ],
  "recipeYield": 2,

Is it wrong for all recipes from HelloFresh or only for some?
If it's only this recipe, maybe you can contact HelloFresh to correct it. Otherwise, we could try scraping the correct recipeYield from the HTML.
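
To make the "embedded JSON" concrete, here is a minimal sketch of pulling a schema.org block out of a page using only the standard library. It is not how recipe-scrapers itself is implemented; the `JsonLdExtractor` class and the trimmed `SAMPLE_HTML` page are assumptions made up for illustration.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the parsed contents of <script type="application/ld+json"> tags."""

    def __init__(self):
        super().__init__()
        self._in_json_ld = False
        self._buffer = []
        self.documents = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_json_ld = True
            self._buffer = []

    def handle_data(self, data):
        # Script content may arrive in several chunks, so buffer it.
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self.documents.append(json.loads("".join(self._buffer)))
            self._in_json_ld = False


# Hypothetical, heavily trimmed page; the real markup is much larger.
SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">
{"@type": "Recipe",
 "recipeIngredient": ["½ pièce Gousse d'ail", "¼ pièce Oignon jaune"],
 "recipeYield": 2}
</script>
</head><body></body></html>
"""

parser = JsonLdExtractor()
parser.feed(SAMPLE_HTML)
recipe = parser.documents[0]
print(recipe["recipeYield"], recipe["recipeIngredient"])
```

Whatever the page shows visually, a scraper reading this block will report the yield and amounts exactly as the JSON states them.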

@bonsdawende
Author

@gloriousDan, yes, I tested a few recipes and unfortunately they all have the same issue, so it's not limited to this recipe. I agree it's better to try scraping the correct recipeYield from the HTML.

@hhursev
Owner

hhursev commented Apr 14, 2022

We've got to fix this indeed. From browsing the site, I'm under the impression that this problem only happens if the recipe has "serving amount 1" as an option.
Can you send me a couple more links to other recipes with the problem described? (Ideally, break my hypothesis if you think it's wrong.)

@gloriousDan
Contributor

gloriousDan commented Apr 15, 2022

That's interesting. It seems that all French HelloFresh links (as far as I've checked) have the "serving amount 1" option, while recipes in most other languages (e.g. https://www.hellofresh.com/recipes/uk-balsamic-streak-with-red-cabb-5841a8ad9df18165854cdd72 as an example of an English recipe) only have "serving amount 2" or larger.
French recipes can be accessed from: https://www.hellofresh.fr/recipes
English recipes at: https://www.hellofresh.com/recipes

Maybe we can assume that the ingredient amounts listed in the schema correspond to the lowest selectable serving amount.
That way it hopefully won't break recipes in languages other than French.

@bonsdawende
Author

@gloriousDan: I made the same observation: for recipes in French the minimum portion is one, and in the other languages it is two.

We can make this assumption, but I think it could confuse users, especially if it isn't documented anywhere that for French recipes the 2 servings reported by recipe-scrapers should be disregarded and that the amounts are for the minimum of one serving.

@gloriousDan
Contributor

> We can make this assumption, but I think it could confuse users, especially if it isn't documented anywhere that for French recipes the 2 servings reported by recipe-scrapers should be disregarded and that the amounts are for the minimum of one serving.

What I meant is that, based on this assumption, we could implement something in the HelloFresh scraper that gets the correct serving count from the HTML instead of from the schema.
Users wouldn't notice any difference.

@bonsdawende
Author

@gloriousDan : I agree with you. 👍

@gloriousDan
Contributor

gloriousDan commented Apr 16, 2022

I just raised the issue with hellofresh.fr customer support.

Transcript of customer support chat
Agent: How may I assist you ? Good Afternoon though. :)

Me: I noticed an error on seemingly all French recipes on https://www.hellofresh.fr/recipes. The embedded schema.org JSON lists the wrong serving amount.

As an example, on https://www.hellofresh.com/recipes/lasagne-saumon-epinards-ricotta-6231d16b3073ea4e521968d4 the JSON looks like this:

  "recipeIngredient": [
    "½ pièce Gousse d'ail",
    "¼ pièce Oignon jaune",
    "½ pièce Poireau",
    "50 g Ricotta",
    "100 g Épinards",
    "1 selon le goût Noix de muscade",
    "20 g Fromage italien râpé",
    "100 g Feuilles de lasagne fraîches",
    "113 g Miettes de saumon fumé à chaud",
    "10 g Noisettes grillées",
    "30 g Tomates semi-séchées",
    "½ cs Huile d'olive",
    "½ cc Vinaigre de vin blanc",
    "15 g Beurre",
    "15 g Farine",
    "275 ml Lait",
    "½ cc Moutarde",
    "½ cs Vinaigre balsamique noir",
    "selon le goût Poivre et sel"
  ],
  "recipeYield": 2,

Here the recipeYield should be 1 instead of 2. Can you raise this issue with the dev team?

Agent: Thank you for this information. I'll do my best to pass it on to the team.

The question is whether and when they will fix it. When implementing our fix, we should make sure it doesn't break if HelloFresh later corrects the data on their end.
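
One way to keep such a fix from misfiring later, sketched under the assumption discussed earlier in this thread (the schema amounts match the lowest selectable serving size): derive a correction factor from the two numbers instead of hard-coding a multiplier. The function name `correction_factor` and the serving options passed in are hypothetical; how the options would be read from the HTML is left open here.

```python
from fractions import Fraction


def correction_factor(schema_yield, selectable_servings):
    """Return the factor to multiply the schema.org amounts by.

    Assumption from this thread: the amounts in the embedded JSON match the
    lowest serving size selectable on the page, not the stated recipeYield.
    """
    if not selectable_servings:
        return Fraction(1)  # nothing to compare against; leave amounts alone
    return Fraction(schema_yield, min(selectable_servings))


print(correction_factor(2, [1, 2, 3, 4]))  # 2 (the French case reported here)
print(correction_factor(2, [2, 3, 4]))     # 1 (sites where the minimum is 2)
print(correction_factor(1, [1, 2, 3, 4]))  # 1 (if recipeYield were corrected to 1)
```

If HelloFresh fixed the data by doubling the amounts and leaving recipeYield at 2, this check could not tell the difference, so it is only a partial safeguard.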

@hhursev
Owner

hhursev commented Apr 18, 2022

I say we wait around 4 weeks for HelloFresh to fix it on their end. If the problem persists after 4 weeks, we'll implement the ad-hoc solution in our code.

@bonsdawende
Author

OK @hhursev! Thank you so much for your responsiveness, and @gloriousDan for his as well.
We'll wait to hear back from the HelloFresh team.

@gloriousDan
Contributor

2 months have passed now, and I just had a quick look at https://www.hellofresh.com/recipes/wraps-aux-galettes-quinoa-tomates-6255db99b7800a4ac46cb29e, where the error still persists.
I could look into implementing a fix soon, but if you, @hhursev, already have some ideas, feel free to implement one.

@hhursev
Owner

hhursev commented Jul 9, 2022

The best that comes to my mind is simply detecting whether there's a serving size "1" on the recipe page and, if so, prepending 2 * to all of the ingredients in the list. Does that sound good to you, @gloriousDan?
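
As a rough illustration, the sketch below goes one step further than literally prepending "2 *" and multiplies the leading amount itself, so that "½" becomes "1". This is a swapped-in variant, not the proposal as stated. The helper names are hypothetical, and it only handles a plain integer or a single ¼/½/¾ character at the start of the string; anything else is returned unchanged.

```python
import re
from fractions import Fraction

VULGAR_TO_FRACTION = {"¼": Fraction(1, 4), "½": Fraction(1, 2), "¾": Fraction(3, 4)}
FRACTION_TO_VULGAR = {value: char for char, value in VULGAR_TO_FRACTION.items()}

# Leading amount: an integer or one vulgar-fraction character, followed by whitespace.
LEADING_AMOUNT = re.compile(r"^(\d+|[¼½¾])(?=\s)")


def format_amount(amount):
    """Render a Fraction as '2', '½', '1 ½', or 'n/d' when no glyph is known."""
    whole = amount.numerator // amount.denominator
    rest = amount - whole
    if rest == 0:
        return str(whole)
    vulgar = FRACTION_TO_VULGAR.get(rest)
    if vulgar is None:
        return str(amount)
    return f"{whole} {vulgar}" if whole else vulgar


def scale_ingredient(ingredient, factor):
    """Multiply the leading amount of an ingredient string by `factor`."""
    match = LEADING_AMOUNT.match(ingredient)
    if match is None:
        return ingredient  # no recognisable amount, e.g. "selon le goût ..."
    token = match.group(1)
    if token in VULGAR_TO_FRACTION:
        amount = VULGAR_TO_FRACTION[token]
    else:
        amount = Fraction(token)
    return format_amount(amount * factor) + ingredient[match.end():]


print(scale_ingredient("½ pièce Gousse d'ail", 2))         # 1 pièce Gousse d'ail
print(scale_ingredient("¼ pièce Oignon jaune", 2))         # ½ pièce Oignon jaune
print(scale_ingredient("50 g Ricotta", 2))                 # 100 g Ricotta
print(scale_ingredient("selon le goût Poivre et sel", 2))  # unchanged
```

The first two results match the amounts the original report expected for 2 servings.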

@hhursev hhursev self-assigned this Jul 9, 2022
@gloriousDan
Contributor

From what I've seen so far, I think this should work. I'm not sure it holds for all HelloFresh recipes, though, since I didn't look at too many.
