taxonomy: wikidata housekeeping #7311
…hat is of subclass 'root vegetable' and has an OFF ingredient ID correspondence
taxonomies/ingredients.txt
@@ -44517,7 +44517,7 @@ pl:Burak
 ru:Свёкла
 sk:cvikly
 sv:rödbeta, rödbetor, rödbets
-wikidata:en:Q165437
+wikidata:en:Q99548274
Although I understand that squashing commits is recommended in the contributing guide, I thought that for review purposes it might make sense to keep individual data edits separate (so that they're explainable, reviewable and revertible individually).
I would say it's up to you to group related commits in a single PR and to avoid PRs that mix a lot of things.
It will also help PRs get merged faster: if you mix multiple subjects and one of them is controversial or problematic, it will block all the others.
(ah: I didn't realize that scans run on every git push during pull requests; I'll try to keep the number of commit batches pushed to a reasonable level)
Sorry, I've already failed at not pushing individual commits; my learned habit is to want to push commits continually. I'll try to avoid doing that again. Something I've seen work in other high-commit-volume repositories is to disable some tests/scans for pull requests that are in the 'draft' status.
They should not. The automated tests will build them to run the tests, and we periodically create PRs to just build the new taxonomies.
Thanks a lot for explaining what you are doing and contributing to the taxonomies @jayaddison !
Some taxonomy changes, especially for ingredients, will have effects on tests (e.g. changing the ingredients analysis, the Nutri-Score etc.), so it can be a good thing to see their result for work in progress.
Thanks @stephanegigandet! You're welcome - I hope that the changes will be mutually beneficial. After reading some wiki content in #7315 I'm learning a bit more about the taxonomies, and also data quality. I hope to join the #taxonomies channel within the next few days; I don't know how active I'll be in there, but it'd be good to say hello to everyone.
…ties merged upstream)
…hat is of subclass 'root vegetable'
… from previous WDQID
…t is of subclass 'seafood'
Kudos, SonarCloud Quality Gate passed!
Looks good to me, thank you so much @jayaddison :-)
Thanks @alexgarel! I'll try to repeat this process in future, although I don't know when, yet.
What
During development of RecipeRadar, a recipe search engine (also AGPLv3-licensed), I'm performing some manual data entry/refinement to associate ingredient names with WikiData entities. Where possible I'm linking to entities that are subclasses of 'food' or 'food ingredient' in WikiData -- and ideally where corresponding OpenFoodFacts ingredient IDs exist.
During this process I'd like to provide possible corrections to the WikiData and OpenFoodFacts datasets - similar to the way that open source code bugfixes are upstreamed.
As a result, this pull request is a work-in-progress for some WikiData-related housekeeping in the OpenFoodFacts ingredients taxonomy.
This is currently a work-in-progress; I'd expect it'll take a few more days to work through the entire set of ingredient names I have available locally. I'm opening this early to see whether this approach seems to make sense, and to gather feedback.
One open question: should the taxonomies/ingredients.result.sto and taxonomies/ingredients.result.txt be updated during this pull request?
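For anyone reviewing the individual data edits, here is a rough sketch of how the WikiData IDs attached to a taxonomy entry could be listed. It assumes property lines have the form `wikidata:en:Q…`, as in the diff shown in this conversation; the function name `wikidata_ids` and the sample entry are illustrative only and are not part of the OpenFoodFacts tooling.

```python
import re

# Assumption: a WikiData property line looks like "wikidata:en:Q165437",
# matching the lines shown in the review diff for taxonomies/ingredients.txt.
WIKIDATA_LINE = re.compile(r"^wikidata:en:(Q\d+)\s*$")


def wikidata_ids(entry_lines):
    """Return the WikiData QIDs found among the lines of one taxonomy entry."""
    ids = []
    for line in entry_lines:
        match = WIKIDATA_LINE.match(line.strip())
        if match:
            ids.append(match.group(1))
    return ids


# Sample lines copied from the diff in this pull request (hypothetical entry slice).
entry = [
    "pl:Burak",
    "ru:Свёкла",
    "sk:cvikly",
    "sv:rödbeta, rödbetor, rödbets",
    "wikidata:en:Q99548274",
]

print(wikidata_ids(entry))
```

Running something like this before and after a commit would give a quick list of which QIDs changed, which may make each edit easier to explain and revert individually.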