taxonomy: Added unknown Croatian ingredients to the taxonomy (part 9) #9236

Merged
merged 7 commits into main from tax_hr_unknown_ingred_9
Nov 21, 2023

Conversation

benbenben2
Collaborator

What

  • Added Croatian (HR) entries to the taxonomy.

  • It seems to me that we could distinguish between:

    • filling (garniture in FR), which is inside a preparation (as in a doughnut)
    • coating (nappage in FR), which is around a preparation

    I tried to apply changes accordingly.

  • I am also wondering whether we could merge preparation and filling. Can we say that preparations are fillings? As this is not clear to me (there is also the term en:compound), I made all preparations children of en:preparation (although preparation alone does not exist).

Related issue(s) and discussion

  • Fixes #-none-

@benbenben2 benbenben2 requested a review from a team as a code owner November 2, 2023 19:20
@benbenben2 benbenben2 self-assigned this Nov 2, 2023
@github-actions github-actions bot added 🧬 Taxonomies https://wiki.openfoodfacts.org/Global_taxonomies 🥗 Ingredients 🧪 additives 🥜 Allergens labels 🥗🔍 Ingredients analysis https://wiki.openfoodfacts.org/Ingredients_Extraction_and_Analysis labels Nov 2, 2023
@benbenben2 benbenben2 marked this pull request as draft November 3, 2023 17:23
@@ -1851,6 +1855,32 @@ hr:rafinirano, djelomično rafinirano
#fr:Dessucré, partiellement dessucré
#hr:bez šećera, sa smanjenom količinom šećera

en:Homogenization, homogenisation
Contributor

I think this entry should be "Homogenized", not "Homogenization".

@@ -208,6 +209,7 @@ pl:proszek karmelowy, karmel w proszku
pt:caramelo em pó
# ingredient/fr:caramel-en-poudre has 69 products in 5 languages @2019-03-09

<en:preparation
Contributor

Not related to this change, but I think "caramel chocolate preparation" should not be under E150.

@stephanegigandet
Contributor

  • It seems to me that we could distinguish between:

    • filling (garniture in FR), which is inside a preparation (as in a doughnut)
    • coating (nappage in FR), which is around a preparation

    I tried to apply changes accordingly.

  • I am also wondering whether we could merge preparation and filling. Can we say that preparations are fillings? As this is not clear to me (there is also the term en:compound), I made all preparations children of en:preparation (although preparation alone does not exist).

Another option would be to make "preparation", "filling" and "coating" ingredient processing entries instead. That way we don't have to add entries for basically every fruit, vegetable, etc. ("strawberry preparation" and so on), along with their translations in all languages.

We will have to be careful when counting the % of fruits and vegetables (e.g. "strawberry preparation 50% (strawberries, water, sugar)" should not count as 50% fruit), but that can be done easily.
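
As a rough illustration of the counting concern, here is a minimal Python sketch (not the server's actual code, which is not shown in this thread). It assumes a nested ingredient structure like the one returned by ingredient parsing, with percentages expressed relative to the whole product. The function name `fruit_percent`, the set `FRUIT_VEGETABLE_IDS`, and the sub-ingredient percentages are all hypothetical and chosen only for the example.

```python
# Hypothetical set of ingredient ids that count as fruits or vegetables.
FRUIT_VEGETABLE_IDS = {"en:strawberry", "en:apple"}


def fruit_percent(ingredient):
    """Return the share of the product contributed by fruits and vegetables.

    A compound (preparation, filling, ...) contributes only through its
    sub-ingredients, never through its own percentage.
    """
    subs = ingredient.get("ingredients") or []
    if subs:
        # Recurse: only the listed sub-ingredients are counted.
        return sum(fruit_percent(sub) for sub in subs)
    if ingredient.get("id") in FRUIT_VEGETABLE_IDS:
        return ingredient.get("percent_estimate", 0.0)
    return 0.0


# "strawberry preparation 50% (strawberries, water, sugar)"
# The sub-ingredient percentages below are made up for illustration.
preparation = {
    "id": "en:preparation",
    "type": "compound",
    "percent_estimate": 50.0,
    "ingredients": [
        {"id": "en:strawberry", "percent_estimate": 30.0},
        {"id": "en:water", "percent_estimate": 12.0},
        {"id": "en:sugar", "percent_estimate": 8.0},
    ],
}

print(fruit_percent(preparation))  # 30.0, not 50.0
```

With this approach a compound whose sub-ingredients are not listed contributes 0, which underestimates rather than overestimates the fruit content.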

@teolemon teolemon added the 🇭🇷 Croatia https://hr.openfoodfacts.org/ label Nov 6, 2023
@aleene
Contributor

aleene commented Nov 6, 2023

In principle I am not in favour of putting it in processing, as the idea was that a processing step would not change the characteristics of an ingredient. For filling this is not the case (strawberry and strawberry filling are related only through one ingredient).
However, we are talking here about compounds: an X filling can have multiple ingredients. So the question is rather what to do with compounds. As an ingredient this is unimportant; for percentage determination it may matter. For analysis purposes we do not care whether it is a filling or something else. It clutters the ingredients taxonomy and could simply be replaced by the word compound. It belongs more in a list of "ingredients" to be ignored.

@benbenben2
Collaborator Author

Good point from @aleene.
Would it be an option to create a new .txt file for compounds?

"strawberry filling" may be related to more than one ingredient. For example, this "apple filling" (Punjenje od jabuke) contains more than apple: https://hr.openfoodfacts.org/product/3856021206184/pita-of-jabuka-s-budget

@stephanegigandet
Contributor

Some queries that are useful to see what kind of fillings etc. we have in ingredients lists:

https://uk.openfoodfacts.org/ingredients?filter=filling

https://fr.openfoodfacts.org/ingredients?filter=preparation

@stephanegigandet
Contributor

We could have a compound taxonomy that we use for ingredient parsing. Whenever we see "[something] preparation" (or filling, cover etc.), we map it to an ingredient id "preparation". Same for "préparation [something]" in French etc.

Today we have things like:

{
    id: "fr:preparation-a-base-de-poivron",
    ingredients: [],
    percent: 5.1,
    percent_estimate: 5.1,
    percent_max: 5.1,
    percent_min: 5.1,
    text: "préparation à base de poivron"
},

And we would replace them with:

{
    id: "en:preparation",
    type: "compound",
    ingredients: [],
    percent: 5.1,
    percent_estimate: 5.1,
    percent_max: 5.1,
    percent_min: 5.1,
    text: "préparation à base de poivron"
},
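
The mapping proposed above could be sketched roughly as follows. This is only an illustration in Python, not the parser used by the server. The pattern list `COMPOUND_PATTERNS` and the function `parse_compound` are hypothetical names, and the handful of English and French keywords stands in for what would presumably come from the proposed compound taxonomy.

```python
import re

# Hypothetical patterns. In the proposal these would be derived from a
# compound taxonomy rather than hard-coded.
COMPOUND_PATTERNS = [
    # English: "[something] preparation", "[something] filling", ...
    re.compile(r"^.+?\s+(?:preparation|filling|coating)$", re.IGNORECASE),
    # French: "préparation [something]", "préparation à base de [something]"
    re.compile(r"^préparation\s+(?:à base de\s+|de\s+|d')?.+$", re.IGNORECASE),
]


def parse_compound(text, percent=None):
    """Map a compound-like ingredient text to the generic compound entry.

    Returns None when the text does not look like a compound, so the caller
    can fall back to the normal taxonomy lookup.
    """
    cleaned = text.strip()
    for pattern in COMPOUND_PATTERNS:
        if pattern.match(cleaned):
            return {
                "id": "en:preparation",
                "type": "compound",
                "ingredients": [],
                "percent": percent,
                "text": text,
            }
    return None


print(parse_compound("préparation à base de poivron", 5.1))
print(parse_compound("strawberry preparation"))
print(parse_compound("sugar"))
```

Keeping the original `text` field means the specific wording on the label is still available even though the id is generic.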

@benbenben2 benbenben2 marked this pull request as ready for review November 7, 2023 16:47
@teolemon teolemon changed the title taxonomy: tax_hr_unknown_ingred_9 taxonomy: Added unknown Hungarian ingredients to the taxonomy (part 9) Nov 9, 2023
@benbenben2 benbenben2 changed the title taxonomy: Added unknown Hungarian ingredients to the taxonomy (part 9) taxonomy: Added unknown Croatian ingredients to the taxonomy (part 9) Nov 14, 2023
@codecov-commenter

codecov-commenter commented Nov 17, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (8d8be8e) 48.68% compared to head (f872c66) 48.68%.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #9236   +/-   ##
=======================================
  Coverage   48.68%   48.68%           
=======================================
  Files          65       65           
  Lines       20268    20268           
  Branches     4896     4896           
=======================================
  Hits         9867     9867           
  Misses       9141     9141           
  Partials     1260     1260           


@benbenben2
Collaborator Author

This is a good idea.

@stephanegigandet
Contributor

@benbenben2 I think we can merge as-is (maybe change "homogenization" to "homogenized" in the ingredients processing first).

I'm filing a bug for the proposal to deal with compounds.

@stephanegigandet
Contributor

New issue for the parsing of "xyz preparation": #9345

Contributor

@stephanegigandet stephanegigandet left a comment

Thank you!

@stephanegigandet stephanegigandet enabled auto-merge (squash) November 21, 2023 08:17

sonarcloud bot commented Nov 21, 2023

Kudos, SonarCloud Quality Gate passed!

0 Bugs (rating A)
0 Vulnerabilities (rating A)
0 Security Hotspots (rating A)
0 Code Smells (rating A)

No Coverage information
No Duplication information

@stephanegigandet stephanegigandet merged commit 8e88392 into main Nov 21, 2023
13 checks passed
@stephanegigandet stephanegigandet deleted the tax_hr_unknown_ingred_9 branch November 21, 2023 08:54
danwyk pushed a commit to danwyk/openfoodfacts-server that referenced this pull request Nov 21, 2023
Labels
🧪 additives, 🥜 Allergens, 🇭🇷 Croatia (https://hr.openfoodfacts.org/), 🥗🔍 Ingredients analysis (https://wiki.openfoodfacts.org/Ingredients_Extraction_and_Analysis), 🥗 Ingredients, 🧬 Taxonomies (https://wiki.openfoodfacts.org/Global_taxonomies)
5 participants