taxonomy: wikidata housekeeping #7311

Merged
merged 26 commits into openfoodfacts:main
Oct 10, 2022

jayaddison
Contributor

What

During development of RecipeRadar, a recipe search engine (also AGPLv3-licensed), I'm performing some manual data entry/refinement to associate ingredient names with WikiData entities. Where possible I'm linking to entities that are subclasses of food or food ingredient in WikiData -- and ideally where corresponding OpenFoodFacts ingredient IDs exist.

During this process I'd like to provide possible corrections to the WikiData and OpenFoodFacts datasets - similar to the way that open source code bugfixes are upstreamed.

As a result, this pull request contains some WikiData-related housekeeping in the OpenFoodFacts ingredients taxonomy.

It is currently a work-in-progress; I expect it will take a few more days to work through the entire set of ingredient names I have available locally. I'm opening it early to check whether this approach makes sense, and to gather feedback.

One open question: should the taxonomies/ingredients.result.sto and taxonomies/ingredients.result.txt be updated during this pull request?

@@ -44517,7 +44517,7 @@ pl:Burak
ru:Свёкла
sk:cvikly
sv:rödbeta, rödbetor, rödbets
wikidata:en:Q165437
wikidata:en:Q99548274
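
A minimal sketch, not part of this pull request, of how edits like the one in the hunk above could be sanity-checked locally. It assumes that taxonomy entries are separated by blank lines and that Wikidata properties are written as `wikidata:en:Q…`, as in the hunk; the function names are hypothetical. The two IDs in the hunk are presumably the old and new values with the diff markers lost, so the sample below keeps both only to show what a duplicated property would look like. The second entry and its `Q0` ID are placeholders.

```python
import re
from typing import Dict, List

# Assumed property syntax, based only on the lines visible in the hunk above.
WIKIDATA_LINE = re.compile(r"^wikidata:en:\s*(Q\d+)\s*$")


def wikidata_ids_by_entry(taxonomy_text: str) -> Dict[str, List[str]]:
    """Map the first line of each blank-line-separated entry to its Wikidata IDs."""
    result: Dict[str, List[str]] = {}
    for block in re.split(r"\n\s*\n", taxonomy_text.strip()):
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        if not lines:
            continue
        ids = [m.group(1) for line in lines if (m := WIKIDATA_LINE.match(line))]
        result[lines[0]] = ids
    return result


def entries_with_multiple_ids(taxonomy_text: str) -> Dict[str, List[str]]:
    """Return only the entries that carry more than one Wikidata ID."""
    return {
        name: ids
        for name, ids in wikidata_ids_by_entry(taxonomy_text).items()
        if len(ids) > 1
    }


# First entry: lines copied from the hunk. Second entry: hypothetical placeholder.
SAMPLE = """\
pl:Burak
ru:Свёкла
sk:cvikly
sv:rödbeta, rödbetor, rödbets
wikidata:en:Q165437
wikidata:en:Q99548274

en:placeholder ingredient
wikidata:en:Q0
"""

if __name__ == "__main__":
    for name, ids in entries_with_multiple_ids(SAMPLE).items():
        print(f"{name}: {', '.join(ids)}")
```

Keying each entry by its first line keeps the sketch short; a real check would more likely key on the canonical `en:` name and would need to handle comments and parent lines, which this sketch does not.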
Contributor Author

Although I understand that squashing commits is recommended in the contributing guide, I thought that for review purposes it might make sense to keep individual data edits separate (so that they're explainable, reviewable and revertible individually).

Member

I would say it's up to you to group commits sensibly into a single PR, and to avoid a PR that mixes a lot of things.

It will also help PRs get merged faster: if you mix multiple subjects and one of them is controversial or problematic, it will block all the others.

@jayaddison changed the title from "taxonomy: ingredients: wikidata housekeeping" to "taxonomy: ingredients, categories: wikidata housekeeping" on Sep 6, 2022
@jayaddison
Contributor Author

(ah: I didn't realize that scans run on every git push during pull requests; I'll try to keep the number of commit batches pushed to a reasonable level)

@jayaddison
Contributor Author

Sorry, I've already failed at not pushing individual commits; my learned habit is to want to push commits continually. I'll try to avoid doing that again. Something I've seen work in other high-commit-volume repositories is to disable some tests/scans for pull requests that are in the 'draft' status.

@stephanegigandet
Contributor

> One open question: should the taxonomies/ingredients.result.sto and taxonomies/ingredients.result.txt be updated during this pull request?

They should not. The automated tests will build them to run the tests, and we periodically create PRs to just build the new taxonomies.

@stephanegigandet
Contributor

Thanks a lot for explaining what you are doing and for contributing to the taxonomies, @jayaddison!
It may also be useful to join our #taxonomies channel on https://slack.openfoodfacts.org

@stephanegigandet
Contributor

> Something I've seen work in other high-commit-volume repositories is to disable some tests/scans for pull requests that are in the 'draft' status.

Some taxonomy changes, especially for ingredients, will affect tests (e.g. by changing the ingredients analysis, the Nutri-Score, etc.), so it can be a good thing to see the test results for work in progress.

@jayaddison
Contributor Author

Thanks @stephanegigandet! You're welcome - I hope that the changes will be mutually beneficial. After reading some wiki content in #7315 I'm learning a bit more about the taxonomies, and also data quality.

I hope to join the #taxonomies channel within the next few days; I don't know how active I'll be in there, but it'd be good to say hello to everyone.

@jayaddison changed the title from "taxonomy: ingredients, categories: wikidata housekeeping" to "taxonomies: wikidata housekeeping" on Sep 8, 2022
@jayaddison changed the title from "taxonomies: wikidata housekeeping" to "taxonomy: wikidata housekeeping" on Sep 8, 2022
@github-actions bot added the 💥 Merge Conflicts label on Oct 6, 2022
@jayaddison marked this pull request as ready for review on October 7, 2022 14:21
@sonarcloud

sonarcloud bot commented Oct 7, 2022

Kudos, SonarCloud Quality Gate passed!

0 Bugs (rating A)
0 Vulnerabilities (rating A)
0 Security Hotspots (rating A)
0 Code Smells (rating A)

No Coverage information
No Duplication information

@github-actions bot removed the 💥 Merge Conflicts label on Oct 10, 2022
Member

@alexgarel left a comment

Looks good to me, thank you so much @jayaddison :-)

@alexgarel merged commit 212490a into openfoodfacts:main on Oct 10, 2022
@jayaddison
Contributor Author

Thanks @alexgarel! I'll try to repeat this process in future, although I don't know when, yet.

@jayaddison deleted the ingredient-taxonomy/20220906-wikidata-edits branch on October 10, 2022 16:54