Categories - Pâtés et Pates are merged together #458

Open
Tracked by #10272
teolemon opened this issue Sep 8, 2016 · 7 comments
Labels
🐛 bug This is a bug, not a feature request. categories 🧽 Data quality https://wiki.openfoodfacts.org/Quality P3 🧬 Taxonomies https://wiki.openfoodfacts.org/Global_taxonomies

Comments

@teolemon
Member

teolemon commented Sep 8, 2016

What

fr:pâtés
fr:pates
are amalgamated when typing the category from world.off

Part of

@hangy
Member

hangy commented Nov 12, 2016

Should we simply remove the unconditional deaccenting of tags during canonicalization (ProductOpener::Store::unac_string_perl or ProductOpener::Store::get_fileid)? We have had other reports where it produced similar unintended results in German.
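As a rough illustration of the collision described here, the following is a minimal Python sketch of unconditional canonicalization. It is not ProductOpener's Perl code: the function name `canonicalize_tag` is hypothetical, and unaccenting is approximated with Unicode NFD decomposition followed by removal of combining marks.

```python
import unicodedata

def canonicalize_tag(tag: str) -> str:
    """Hypothetical stand-in for unconditional tag canonicalization:
    lowercase, then drop accents via NFD decomposition."""
    decomposed = unicodedata.normalize("NFD", tag.lower())
    # Combining marks (category "Mn") carry the accents, e.g. U+0302 in "â".
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")

# The French pair from this issue collapses to a single key.
print(canonicalize_tag("pâtés"), canonicalize_tag("pates"))  # pates pates
# German umlauts are stripped as well.
print(canonicalize_tag("Döner"))  # doner
```

Because no language is passed in, the function has no way to decide that "ö" should survive for German while "é" may be folded for French.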

@teolemon teolemon added the 🧬 Taxonomies https://wiki.openfoodfacts.org/Global_taxonomies label Dec 11, 2016
@teolemon teolemon added categories 🐛 bug This is a bug, not a feature request. labels Dec 21, 2016
@teolemon teolemon added the P3 label Jan 22, 2017
@teolemon teolemon removed the 🎯 P1 label Feb 16, 2017
@teolemon teolemon added the 🧽 Data quality https://wiki.openfoodfacts.org/Quality label Oct 14, 2017
@hangy
Member

hangy commented Jan 14, 2018

Should we also use Unicode::Casing to handle case mapping correctly across languages (e.g. the Turkish I problem)?
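To show what the Turkish I problem is, here is a small Python sketch. It does not use Unicode::Casing, whose API is not reproduced here. Python's `str.lower()` applies Unicode's default, locale-independent mappings, so `"I"` always becomes `"i"`, whereas Turkish and Azerbaijani pair `I` with dotless `ı` and dotted `İ` with `i`. The function `lower_for_language` is hypothetical.

```python
def lower_for_language(text: str, lang: str) -> str:
    """Hypothetical language-aware lowercasing.

    str.lower() uses Unicode's default, locale-independent case mappings,
    so the Turkish/Azerbaijani dotted and dotless I pairs are handled first.
    """
    if lang in ("tr", "az"):
        # "I" lowercases to dotless "ı"; dotted capital "İ" lowercases to "i".
        text = text.replace("I", "ı").replace("İ", "i")
    return text.lower()

print(lower_for_language("ISPANAK", "en"))  # ispanak
print(lower_for_language("ISPANAK", "tr"))  # ıspanak
```

With the default mapping, `"İ".lower()` also yields `"i"` followed by a combining dot above (U+0307), which is another source of tag mismatches.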

@teolemon
Member Author

@hangy probably. @maddingue, are you knowledgeable about this?

@stephanegigandet
Contributor

For French we should keep the unaccenting; it helps in many cases. A lot of people type "boeuf" (in fact, I have no idea how to type the œ character ;-) ). There are a few conflicts where two words that deaccent to the same string mean different things, but they are very rare. One example is pâte and pâté.
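A minimal Python sketch of both sides of this trade-off, under stated assumptions. The helper `unaccent_fr` and its ligature table are hypothetical. The table is needed because NFD decomposition does not split `œ` (U+0153 has no decomposition mapping), so stripping accents alone would never match "bœuf" with "boeuf". Whether ProductOpener's own unaccenting expands ligatures is not shown here.

```python
import unicodedata

# Hypothetical ligature table: NFD leaves "œ" and "æ" intact,
# so they are expanded explicitly to match what users actually type.
LIGATURES = {"œ": "oe", "Œ": "OE", "æ": "ae", "Æ": "AE"}

def unaccent_fr(text: str) -> str:
    """Hypothetical French-style folding: expand ligatures, strip accents, lowercase."""
    expanded = "".join(LIGATURES.get(ch, ch) for ch in text)
    decomposed = unicodedata.normalize("NFD", expanded)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn").lower()

print(unaccent_fr("bœuf"), unaccent_fr("boeuf"))  # boeuf boeuf  (the helpful merge)
print(unaccent_fr("pâte"), unaccent_fr("pâté"))   # pate pate    (the rare conflict)
```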

@hangy
Member

hangy commented Mar 7, 2019

One problem is that get_fileid has no language or country context. äöü should certainly not be replaced for a de locale. There is just too much potential for conflict, and no one with a German keyboard layout writes "Doener" instead of "Döner".

Unconditionally unaccenting é to e in languages other than French might still cause conflicts. I honestly don't know enough about every language to know how, e.g., a native Hungarian speaker would handle that.
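One way to give a get_fileid-style function the missing language context is to pass the language in and look up per-language options. The sketch below is hypothetical Python, loosely modeled on the `fr => { unaccent => 1, lowercase => 1 }` entry quoted in this thread. The names `LANG_OPTIONS`, `DEFAULT_OPTIONS` and `fileid_for` are illustrative only. The default is deliberately conservative: languages nobody has vetted keep their accents.

```python
import unicodedata

# Hypothetical per-language options; unlisted languages only get lowercasing.
LANG_OPTIONS = {
    "fr": {"unaccent": True, "lowercase": True},
    "de": {"unaccent": False, "lowercase": True},
}
DEFAULT_OPTIONS = {"unaccent": False, "lowercase": True}

def strip_accents(text: str) -> str:
    """Remove combining marks after NFD decomposition."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")

def fileid_for(lang: str, tag: str) -> str:
    """Hypothetical language-aware counterpart of a get_fileid-style function."""
    opts = LANG_OPTIONS.get(lang, DEFAULT_OPTIONS)
    if opts["lowercase"]:
        tag = tag.lower()
    if opts["unaccent"]:
        tag = strip_accents(tag)
    return tag

print(fileid_for("fr", "Pâtés"))    # pates
print(fileid_for("de", "Döner"))    # döner
print(fileid_for("hu", "Pörkölt"))  # pörkölt (no vetted rules, accents kept)
```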

@teolemon
Member Author

teolemon commented Nov 3, 2019

We can close this one, right?

@hangy
Member

hangy commented Nov 3, 2019

> We can close this one, right?

Depends. https://world.openfoodfacts.org/category/fr:p%C3%A2t%C3%A9s and https://world.openfoodfacts.org/category/fr:pates both redirect to https://world.openfoodfacts.org/category/pastas, as unaccenting is intentionally enabled for French:

fr => {
    unaccent  => 1,
    lowercase => 1,
},
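To trace why both URLs end up in the same place, here is a hypothetical Python sketch. It percent-decodes the category segment and then applies options mirroring the `fr` entry above. The dict `OPTIONS` and the functions `normalize` and `category_key` are assumptions for illustration, not ProductOpener's data structures. The sketch only shows that both segments produce the same key; the taxonomy lookup that then redirects that key to the pastas category is not modeled.

```python
import unicodedata
from urllib.parse import unquote

# Mirrors the fr entry above; languages without an entry are left untouched here.
OPTIONS = {"fr": {"unaccent": 1, "lowercase": 1}}

def normalize(tag: str, options: dict) -> str:
    if options.get("lowercase"):
        tag = tag.lower()
    if options.get("unaccent"):
        decomposed = unicodedata.normalize("NFD", tag)
        tag = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return tag

def category_key(path_segment: str) -> str:
    """Percent-decode a 'lang:tag' URL segment, then normalize the tag for that language."""
    lang, _, tag = unquote(path_segment).partition(":")
    return f"{lang}:{normalize(tag, OPTIONS.get(lang, {}))}"

print(unquote("fr:p%C3%A2t%C3%A9s"))       # fr:pâtés
print(category_key("fr:p%C3%A2t%C3%A9s"))  # fr:pates
print(category_key("fr:pates"))            # fr:pates
```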

@hangy hangy assigned stephanegigandet and unassigned hangy Dec 25, 2019
@teolemon teolemon changed the title Pâtés et Pates are merged together Categories - Pâtés et Pates are merged together Oct 11, 2021
Projects
Status: To discuss and validate

3 participants