Support for Nutritional Data #241

Closed
ptindall opened this issue Oct 11, 2020 · 12 comments

@ptindall

Has there ever been any discussion about supporting nutritional data if it is available?

I have seen several websites implement it and would love to have that data as part of the scraping. Here is a snippet of the data available on innit.com as part of the recipe schema.

        "nutrition": {
            "@type": "NutritionInformation",
            "sugarContent": "2 g",
            "proteinContent": "21 g",
            "fiberContent": "7 g",
            "unsaturatedFatContent": "36 g",
            "fatContent": "46 g",
            "cholesterolContent": "570 mg",
            "calories": "550 kcal",
            "carbohydrateContent": "11 g",
            "saturatedFatContent": "10 g",
            "sodiumContent": "1380 mg"
        },

Another from Whole Foods:

  "nutrition": {
    "calories": "350 calories",
    "fatContent": "8 grams",
    "saturatedFatContent": "1 grams",
    "cholesterolContent": "85 milligrams",
    "sodiumContent": "550 milligrams",
    "carbohydrateContent": "53 grams",
    "proteinContent": "18 grams",
    "@type": "NutritionInformation"
  },

And from HEB, we have a table-based structure:

<dd class="fact aps">
    <div class="rules rule3"><img src="/img/common/background-table-1.png"></div>
    <table class="details-single">
        <tbody>
        <tr>
            <td colspan="4" class="strong">Amount Per Serving</td>
        </tr>
        <tr>
            <td colspan="4" class="rules">
                <div class="rules rule1"><img src="/img/common/background-table-3.png"></div>
            </td>
        </tr>
        <tr>
            <td class="strong">Calories</td>
            <td class="strong">80</td>
            <td class="val">Calories From Fat</td>
            <td class="val">30</td>
        </tr>
        </tbody>
    </table>
</dd>
<dd class="dv">
    <div class="rules rule2"><img src="/img/common/background-table-2.png"></div>
    <table class="clearfix">
        <tbody>
        <tr>
            <td colspan="2" class="label percentage">% Daily Value*</td>
        </tr>
        <tr>
            <td colspan="2" class="rules">
                <div class="rules rule1"><img src="/img/common/background-table-3.png"></div>
            </td>
        </tr>
        <tr>
            <td class="label">Total Fat <span class="abs"> 3.5 g</span></td>
            <td class="avg"> 5%</td>
        </tr>
        <tr>
            <td colspan="2" class="rules sub-label">
                <div class="rules rule1"><img src="/img/common/background-table-3.png"></div>
            </td>
        </tr>
        <tr>
            <td class="sub-label">Saturated Fat <span class="abs"> 0.5 g</span></td>
            <td class="avg"> 3%</td>
        </tr>
        <tr>
            <td colspan="2" class="rules sub-label">
                <div class="rules rule1"><img src="/img/common/background-table-3.png"></div>
            </td>
        </tr>
        <tr>
            <td class="sub-label">Trans Fat <span class="abs"> 0.0 g</span></td>
            <td class="avg"></td>
        </tr>
        <tr>
            <td colspan="2" class="rules">
                <div class="rules rule1"><img src="/img/common/background-table-3.png"></div>
            </td>
        </tr>
        <tr>
            <td class="label">Cholesterol<span class="abs"> 0 mg</span></td>
            <td class="avg">0%</td>
        </tr>
        <tr>
            <td colspan="2" class="rules">
                <div class="rules rule1"><img src="/img/common/background-table-3.png"></div>
            </td>
        </tr>
        <tr>
            <td class="label">Sodium<span class="abs"> 200 mg</span></td>
            <td class="avg">8%</td>
        </tr>
        <tr>
            <td colspan="2" class="rules">
                <div class="rules rule1"><img src="/img/common/background-table-3.png"></div>
            </td>
        </tr>
        <tr>
            <td class="label">Total Carbohydrate<span class="abs"> 9 g</span></td>
            <td class="avg">3%</td>
        </tr>
        <tr>
            <td colspan="2" class="rules sub-label">
                <div class="rules rule1"><img src="/img/common/background-table-3.png"></div>
            </td>
        </tr>
        <tr>
            <td class="sub-label">Dietary Fiber<span class="abs"> 2 g</span></td>
            <td class="avg">8%</td>
        </tr>
        <tr>
            <td colspan="2" class="rules sub-label">
                <div class="rules rule1"><img src="/img/common/background-table-3.png"></div>
            </td>
        </tr>
        <tr>
            <td class="sub-label">Sugars<span class="abs"> 5 g</span></td>
        </tr>
        <tr>
            <td colspan="2" class="rules">
                <div class="rules rule1"><img src="/img/common/background-table-3.png"></div>
            </td>
        </tr>
        <tr>
            <td class="label">Protein</td>
            <td class="avg">4 g</td>
        </tr>
        </tbody>
    </table>
    <div class="rules rule3"><img src="/img/common/background-table-1.png"></div>
    <table class="vitamins">
        <tbody>
        <tr>
            <td class="label_1">Vitamin A<span class="abs"> 45%</span></td>
            <td class="label_2">•</td>
            <td class="label_3">Iron<span class="abs"> 6%</span></td>
        </tr>
        <tr>
            <td colspan="3" class="rules">
                <div class="rules rule1"><img src="/img/common/background-table-3.png"></div>
            </td>
        </tr>
        <tr>
            <td class="label_1">Vitamin C<span class="abs"> 50%</span></td>
            <td class="label_2">•</td>
            <td class="label_3">Calcium<span class="abs"> 6%</span></td>
        </tr>
        <tr>
            <td colspan="3" class="rules">
                <div class="rules rule1"><img src="/img/common/background-table-3.png"></div>
            </td>
        </tr>
        </tbody>
    </table>
    <div class="notice"><span>*Percent Daily Values are based on a 2,000 calorie diet. Your daily values may be higher or lower depending on your calorie needs.</span>
    </div>
    <div>&nbsp;</div>
    <div class="notice"><span>Nutrition Facts represent the ingredients displayed and are estimates only. We make no representations or warranties regarding the nutrition information provided. Adding optional ingredients or substituting products could alter the nutritional content of this recipe.</span>
    </div>
    <div>&nbsp;</div>
    <div class="notice"><span>The dietary lifestyle and nutritional information are provided for educational purposes only.  Product formulations may change; therefore, we recommend that you consult the product's label for nutrition information, contact the manufacturer for product related questions, and consult a healthcare provider for nutritional guidance specific to your needs.  Since cooking times can vary, ensure that all recipe ingredients are cooked to a safe internal temperature according to USDA guidelines.</span>
    </div>
</dd>
@bfcarpio
Collaborator

I have no objections. It seems as though the first two are schema formats, so it'd make sense to just expand our schema parser. For something like that table, I'm not aware of a standard. I'm guessing we'd just add a nutrition() function to our object and let the client handle it?
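To illustrate the schema-format case, here is a minimal sketch of pulling the nutrition object out of a schema.org Recipe given as JSON-LD. The helper name extract_nutrition and the sample document are assumptions made for illustration; this is not the library's parser.

```python
import json


def extract_nutrition(json_ld: str) -> dict:
    """Hypothetical helper (not part of recipe-scrapers): return the
    "nutrition" fields of the first Recipe object found in a JSON-LD string."""
    data = json.loads(json_ld)
    # JSON-LD may hold a single object or a list of objects.
    candidates = data if isinstance(data, list) else [data]
    for item in candidates:
        if isinstance(item, dict) and item.get("@type") == "Recipe":
            nutrition = item.get("nutrition") or {}
            # Drop JSON-LD metadata keys such as "@type".
            return {k: v for k, v in nutrition.items() if not k.startswith("@")}
    return {}


# Sample input assembled from the Whole Foods snippet quoted earlier.
sample = """
{
  "@type": "Recipe",
  "name": "Example",
  "nutrition": {
    "calories": "350 calories",
    "proteinContent": "18 grams",
    "@type": "NutritionInformation"
  }
}
"""
result = extract_nutrition(sample)
```

The values are left as the site's raw strings, which matches the "let the client handle it" idea.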

@ptindall
Author

That sounds reasonable. Implementing the standard schema format version first sounds like a great plan. I can try to take this on unless someone else wants to do it since it is the core parser.

@bfcarpio
Collaborator

bfcarpio commented Oct 12, 2020

I don't think anyone has dibs on the code. Just recently, someone made quite a few additions to the core parser. So, if you have the free time and the desire, feel free to do it. @hhursev (I don't speak for him) has always been happy to include quality features in this package, and nutritional data is very much a part of a recipe.

@hhursev
Owner

hhursev commented Oct 12, 2020

I second the nutritional data idea and will help to implement if needed 😉

nutrition() in SchemaOrg is more than welcome. Feel free to add it in AbstractScraper too. I guess in ON_EXCEPTION_RETURN_VALUES we should put an empty dict (I might be wrong, go with whatever you believe is best). This in a separate PR will be accepted instantly from me. (These are the 3 spots you'll need to massage a bit, as far as I recall.)
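As a rough sketch of the empty-dict fallback idea: the name ON_EXCEPTION_RETURN_VALUES comes from this comment, but its contents, the helper call_with_fallback, and the two scraper classes below are hypothetical and only illustrate the behaviour. How the library actually wires this up is not shown in this thread.

```python
# Name taken from the comment above; the contents and usage are assumed.
ON_EXCEPTION_RETURN_VALUES = {"nutrition": {}}


def call_with_fallback(obj, method_name):
    """Hypothetical helper: call obj.<method_name>() and, if it raises,
    return the configured fallback for that method instead."""
    try:
        return getattr(obj, method_name)()
    except Exception:
        if method_name in ON_EXCEPTION_RETURN_VALUES:
            return ON_EXCEPTION_RETURN_VALUES[method_name]
        raise  # no fallback configured, so surface the original error


class BrokenScraper:
    """Stand-in for a scraper whose page has no nutrition data."""

    def nutrition(self):
        raise KeyError("nutrition")


class WorkingScraper:
    """Stand-in for a scraper whose page exposes nutrition data."""

    def nutrition(self):
        return {"calories": "550 kcal"}


missing = call_with_fallback(BrokenScraper(), "nutrition")
present = call_with_fallback(WorkingScraper(), "nutrition")
```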

As for the table format and HTML parsing: add nutrition to the respective scraper class (in a follow-up PR) 👍
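For table markup like the HEB example, per-site parsing might look roughly like the sketch below. It uses only the standard library's html.parser and assumes cells shaped like `<td>Label<span class="abs"> amount</span></td>`. The class name AbsValueParser is made up for illustration, and rows such as Protein, where the amount sits in a separate cell, would need extra handling.

```python
from html.parser import HTMLParser


class AbsValueParser(HTMLParser):
    """Hypothetical parser: collects {label: amount} from cells shaped like
    <td>Label<span class="abs"> amount</span></td>."""

    def __init__(self):
        super().__init__()
        self.facts = {}
        self._in_td = False
        self._in_abs = False
        self._label = []
        self._value = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            # Start a fresh cell.
            self._in_td, self._label, self._value = True, [], []
        elif tag == "span" and self._in_td and dict(attrs).get("class") == "abs":
            self._in_abs = True

    def handle_endtag(self, tag):
        if tag == "span":
            self._in_abs = False
        elif tag == "td" and self._in_td:
            label = "".join(self._label).strip()
            value = "".join(self._value).strip()
            # Keep only cells that carry both a label and an amount.
            if label and value:
                self.facts[label] = value
            self._in_td = False

    def handle_data(self, data):
        if self._in_abs:
            self._value.append(data)
        elif self._in_td:
            self._label.append(data)


# Two rows copied from the HEB snippet quoted earlier, minus the rule rows.
sample = """
<table><tbody>
<tr><td class="label">Total Fat <span class="abs"> 3.5 g</span></td><td class="avg"> 5%</td></tr>
<tr><td class="label">Sodium<span class="abs"> 200 mg</span></td><td class="avg">8%</td></tr>
</tbody></table>
"""
parser = AbsValueParser()
parser.feed(sample)
facts = parser.facts
```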

I won't be able to work on the project this week so we won't duplicate work 😉

@jayaddison
Collaborator

This is a great idea for the library 👍

Although I think you should lead with your preferred design @ptindall, as a user I can offer a preference for the results of the nutrition() call to have standard field names for each nutritional element - .sugar(), .protein(), etc.

To phrase that idea another way: in client code it's always possible to call scrape.total_time() and be sure that a time in minutes will be returned, without any per-site processing of the results. In a similar way, scrape().nutrition().protein() could always return a protein amount (or a NotImplementedError), meaning that applications could handle extracting recipe protein totals from, for example, the HEB and innit.com sites identically.
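One way this could look, as a sketch only: a small wrapper that normalizes raw strings such as "21 g" or "18 grams" into grams. The Nutrition class, its method names, and the unit table are hypothetical and are not the library's API.

```python
import re

# Assumed unit spellings, based on the innit.com and Whole Foods snippets.
_UNIT_TO_GRAMS = {
    "g": 1.0, "gram": 1.0, "grams": 1.0,
    "mg": 0.001, "milligram": 0.001, "milligrams": 0.001,
}


class Nutrition:
    """Hypothetical wrapper around a raw schema.org nutrition dict."""

    def __init__(self, raw: dict):
        self._raw = raw

    def _grams(self, key: str) -> float:
        if key not in self._raw:
            # Mirrors the suggestion above: signal that the data is absent.
            raise NotImplementedError(f"{key} is not available for this recipe")
        match = re.fullmatch(r"\s*([\d.]+)\s*([A-Za-z]+)\s*", self._raw[key])
        if not match or match.group(2).lower() not in _UNIT_TO_GRAMS:
            raise ValueError(f"cannot parse {self._raw[key]!r}")
        return float(match.group(1)) * _UNIT_TO_GRAMS[match.group(2).lower()]

    def protein(self) -> float:
        return self._grams("proteinContent")

    def sodium(self) -> float:
        return self._grams("sodiumContent")


innit = Nutrition({"proteinContent": "21 g", "sodiumContent": "1380 mg"})
whole_foods = Nutrition({"proteinContent": "18 grams"})
```

With this shape, client code calls innit.protein() and whole_foods.protein() the same way, regardless of how each site spells its units.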

@bfcarpio
Collaborator

Something to keep in mind is how we handle units. Obviously, this doesn't need to be in the first iteration of the feature, but some recipes might support both metric and imperial units or only one. Something akin to nutrition(unit="metric"), or a default unit set in scrape_me, might be worth considering.

I'm just brainstorming, so nobody should feel obligated to implement this. Heck, I might do it myself once the initial feature is out 😉
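A unit option along those lines might be sketched as follows. The function convert_mass and its unit parameter are hypothetical, and the sketch assumes amounts have already been normalized to grams.

```python
from typing import Tuple

# One avoirdupois ounce is defined as exactly 28.349523125 grams.
GRAMS_PER_OUNCE = 28.349523125


def convert_mass(grams: float, unit: str = "metric") -> Tuple[float, str]:
    """Hypothetical helper: express a mass given in grams in the requested
    unit system, returning (amount, unit symbol)."""
    if unit == "metric":
        return grams, "g"
    if unit == "imperial":
        return grams / GRAMS_PER_OUNCE, "oz"
    raise ValueError(f"unknown unit system: {unit!r}")


metric = convert_mass(21.0)
imperial = convert_mass(GRAMS_PER_OUNCE, unit="imperial")
```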

@arthur-fontaine

Any update?

@hhursev
Owner

hhursev commented Dec 10, 2020

No update here @arthur-fontaine. I'll ping you when nutrition-related functionality is added. 🙂

@bluhmr bluhmr mentioned this issue Dec 11, 2020
@hhursev
Owner

hhursev commented Dec 17, 2020

@arthur-fontaine, thanks to @bluhmr and @sloanemk, starting from version 11.0.0 we have a .nutrients() method that works just fine if nutrients data is included in the Recipe Schema on the site it is fetched from.

@hhursev hhursev closed this as completed Dec 17, 2020
@ptindall
Author

ptindall commented Dec 17, 2020 via email

@MarcusWolschon

@hhursev
Owner

hhursev commented Jan 12, 2022

It's not removed; rather, not all recipe sites have nutrients data. The following sites should support nutrients with recipe-scrapers:

hellofresh.[com|co.uk|de|fr]
innit.com
purplecarrot.com
springlane.de
woolworths.com.au
woop.co.nz

As well as others. If you run the package with the default configuration, the PLUGINS setting will search for nutrients data on its own and try to fetch nutrients magically. For example, scrape_me('https://www.allrecipes.com/recipe/222282/picnic-marinated-summer-slaw/').nutrients() will work even though allrecipes is not listed above.

However, this package relies on data being available on the site it's scraping from, so there's a high chance that no nutrients data is available to begin with.


That being said, nutrients data is a broader topic. In my opinion, it is worthy of a separate package with its own database (or one using the MyFitnessPal API or something similar): a package that normalizes the quantities and calculates nutrients based on the given ingredients alone. It's something I'd like to tackle, but it's not on my radar for the coming months. The .nutrients() method in this package may not be your thing; it depends on what you intend to do with the data.
