Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

taxonomy: improve taxonomy for products available in Croatia #8140

Merged
merged 13 commits into from
Mar 1, 2023

Conversation

benbenben2
Copy link
Collaborator

updated taxonomies for products found in Croatia (include Serbian, Bosnian, Slovenian, German, etc. languages)

@benbenben2 benbenben2 requested a review from a team as a code owner February 25, 2023 18:33
@benbenben2 benbenben2 self-assigned this Feb 25, 2023
@github-actions github-actions bot added 🧪 additives 🥜 Allergens categories 📚 Documentation Documentation issues improve the project for everyone. 🥗 Ingredients 🥗🔍 Ingredients analysis https://wiki.openfoodfacts.org/Ingredients_Extraction_and_Analysis 🧬 Taxonomies https://wiki.openfoodfacts.org/Global_taxonomies labels Feb 25, 2023
@codecov-commenter
Copy link

codecov-commenter commented Feb 25, 2023

Codecov Report

Merging #8140 (19f79e8) into main (70a4926) will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##             main    #8140   +/-   ##
=======================================
  Coverage   45.56%   45.56%           
=======================================
  Files         102      102           
  Lines       20279    20279           
  Branches     4632     4632           
=======================================
  Hits         9240     9240           
  Misses       9942     9942           
  Partials     1097     1097           
Impacted Files Coverage Δ
lib/ProductOpener/Ingredients.pm 90.66% <ø> (ø)
tests/unit/additives.t 97.91% <ø> (ø)

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@github-actions github-actions bot removed 🧪 tests 📚 Documentation Documentation issues improve the project for everyone. labels Feb 27, 2023
@benbenben2
Copy link
Collaborator Author

Why did you change this test? @stephanegigandet

  • Long answer:

After adding so many entries in the taxonomies I encountered couple of errors regarding duplicated entries (for example, an entry in additives that was already in minerals).
Then, I worked on solving duplicated entries that were listed in perl unit tests (GitHub workflow).

In the logs there were also duplicates for entries not related to the changes I did (some fr and fi entries, mainly. I remember for vitamin c for these two languages that are in both additives and in vitamins)
And that was a challenge because some of those were also used in unit tests.
I tried to update unit tests according to the changes that I did to remove duplicates.
At some point there were failures on both unit test and perl unit test github workflows, or when it was not on one workflow it was failing on the other one. Since I could not find out a proper way to fix those,(1)(2) I rolled back the changes for entries not related to the changes I did.

Phosphate de calcium was in minerals and ingredients and that is good that you noticed that I forgot these two.

(1) could it be that one of the workflow (unit test) is related to the taxonomies files whereas the other workflow (perl unit test) is related to something else like ingredients.all ?
(2) why the duplicates that I created resulted into workflow failure whereas other duplicates (which appears in the logs at each pull request: checks, pull requests, perl unit tests, raw logs) are not resulting into failure?

- Short answer:

I messed up. I just fixed that.

- Rolled back changes

In any case, I rolled back changes as follow:

  • removed comment In minerals:
    replaced:
 # commented because of duplicate with en:phosphate in ingredients.txt
# synonyms:fr:orthophosphate, phosphate

by:

synonyms:fr:orthophosphate, phosphate
- put back the test as it was before.
  • removed comment In additives.t:
    replaced:
    ingredients_text => "huile de colza, phosphates de calcium, carbonate de calcium, citrates de potassium"
    by:
    ingredients_text => "huile de colza, orthophosphates de calcium, carbonate de calcium, citrates de potassium"

I guess other duplicates entries not related to the changes I did could be addressed in different pull request.
Same goes for schema description.

@sonarcloud
Copy link

sonarcloud bot commented Feb 27, 2023

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

No Coverage information No Coverage information
No Duplication information No Duplication information

@stephanegigandet
Copy link
Contributor

@benbenben2 So for ingredients, it is expected to have duplicates, as some ingredients can be both minerals and additives for instance. When we concatenate them into ingredients.all, there will be duplicates. But we ignore duplicates for ingredients, in the sense that they won't make the tests fail.

@benbenben2
Copy link
Collaborator Author

I don't know, I misunderstood the logs after the failed tests. If you have time you can have a look at it (you can see the logs by scrolling here above, between "Merge branch 'main' into taxonomy_HR_ingredients_20230221" and "a69971f", if you click on the red cross, then "Details" next to "Pull Requests / Perl unit tests (pull_request)") and tell me what was wrong because I am a bit curious and would like to understand. If you don't have time let's forget about it.

Anyway, I applied changes since Yesterday. Let me know if something else need to be changed.

@stephanegigandet
Copy link
Contributor

I don't know, I misunderstood the logs after the failed tests. If you have time you can have a look at it (you can see the logs by scrolling here above, between "Merge branch 'main' into taxonomy_HR_ingredients_20230221" and "a69971f", if you click on the red cross, then "Details" next to "Pull Requests / Perl unit tests (pull_request)") and tell me what was wrong because I am a bit curious and would like to understand. If you don't have time let's forget about it.

For this test, the error that makes the test fail is this one:

Errors in the ingredients_processing taxonomy definition:
ERROR - hr:pržene already is associated to en:fried - hr:pržene cannot be mapped to entry en:roasted
Errors in the ingredients_processing taxonomy definition at /opt/product-opener/lib/ProductOpener/Tags.pm line 924.

The corresponding code in Tags.pm explains a bit what happens:

		if ($errors ne "") {

			print STDERR "Errors in the $tagtype taxonomy definition:\n";
			print STDERR $errors;
			# Disable die for the ingredients taxonomy that is merged with additives, minerals etc.
			# Temporarily (hopefully) disable die for the categories taxonomy, to give time to fix issues
			unless (($tagtype eq "ingredients") or ($tagtype eq "categories")) {
				die("Errors in the $tagtype taxonomy definition");
			}
		}

Copy link
Contributor

@stephanegigandet stephanegigandet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thank you!

@stephanegigandet stephanegigandet merged commit d986c22 into main Mar 1, 2023
@stephanegigandet stephanegigandet deleted the taxonomy_HR_ingredients_20230221 branch March 1, 2023 12:59
@teolemon teolemon added the 🇭🇷 Croatia https://hr.openfoodfacts.org/ label Apr 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🧪 additives 🥜 Allergens categories 🇭🇷 Croatia https://hr.openfoodfacts.org/ 🥗🔍 Ingredients analysis https://wiki.openfoodfacts.org/Ingredients_Extraction_and_Analysis 🥗 Ingredients 🧬 Taxonomies https://wiki.openfoodfacts.org/Global_taxonomies
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants