Dutch/Estonian: dash is not always a separator #6122

Open
aleene opened this issue Nov 28, 2021 · 6 comments
Labels
🐛 bug This is a bug, not a feature request. 🥗🔍 Ingredients analysis https://wiki.openfoodfacts.org/Ingredients_Extraction_and_Analysis

Comments

@aleene
Contributor

aleene commented Nov 28, 2021

Describe the bug

The dash should not always be interpreted as a separator. In Dutch a trailing dash is a way to avoid repetition: it stands in for a word part shared with the next term.

Other examples:

  • mono- en diglyceriden van vetzuren veresterd met mono- en diacetylwijnsteenzuur

To Reproduce

See: https://nl.openfoodfacts.org/product/8718907369589/bloemenhoning-albert-heijn

Expected behavior

For instance, EU- en niet-EU-honing should not be expanded to EU, niet-EU-honing, but should be left untouched. In Dutch this is read as EU-honing, niet-EU-honing.

Additional context

A parse rule could be based on the surroundings of the dash:

  • dash must come directly after a character;
  • dash is followed by a space and then the characters en;
  • (more to follow)
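
The two rules listed above could be sketched as a single regular expression. This is only an illustration in Python, not how the ingredients parser actually works; the names `SUSPENDED_HYPHEN_NL` and `has_suspended_hyphen` are hypothetical, and requiring a word boundary after `en` is an added assumption so that words merely starting with "en" do not match.

```python
import re

# Hypothetical pattern for the two rules:
#   1. the dash comes directly after a letter (lookbehind);
#   2. the dash is followed by a space and then the word "en" (lookahead).
# [^\W\d_] matches any Unicode letter; \b after "en" is an assumption.
SUSPENDED_HYPHEN_NL = re.compile(r"(?<=[^\W\d_])-(?= en\b)")


def has_suspended_hyphen(text: str) -> bool:
    """Return True if the text contains a dash that satisfies both rules."""
    return SUSPENDED_HYPHEN_NL.search(text) is not None


if __name__ == "__main__":
    for sample in ("EU- en niet-EU-honing", "niet-EU-honing"):
        print(sample, "->", has_suspended_hyphen(sample))
```

A parser could use such a check to skip the separator logic for dashes that match. Note that a variant with a space before the dash, such as `EU - en`, does not satisfy the first rule as written and would need separate handling.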

Number of products impacted

Happens quite often.

Part of

@aleene aleene added the 🐛 bug This is a bug, not a feature request. label Nov 28, 2021
@stephanegigandet
Contributor

It's strange, if I remove the extra space in "gemengde EU - en niet-EU-honing" (before the first EU), it's added back.

@aleene
Contributor Author

aleene commented Dec 1, 2021

Another example:

  • gerehydrateerd SOJA- en TARWE-EIWIT: is broken in two

@aleene aleene changed the title Dutch: dash is not always a separator Dutch/Estonian: dash is not always a separator Dec 2, 2021
@aleene
Contributor Author

aleene commented Dec 2, 2021

In Estonian the same language construction is used:

  • lõhna- ja maitseaine

See also https://et.wiktionary.org/wiki/flavoring. Both words mean flavouring of some sort.
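
Since Dutch and Estonian share the construction and differ only in the conjunction, a check for it might be parameterised per language. The following Python sketch is an assumption-laden illustration, not the project's code: the mapping `CONJUNCTIONS` and both function names are hypothetical, and only the conjunctions that appear in this issue (`en`, `ja`) are listed.

```python
import re

# Hypothetical mapping from language code to coordinating conjunctions.
# Only the words seen in this issue are included; real lists would be longer.
CONJUNCTIONS = {"nl": ["en"], "et": ["ja"]}


def suspended_hyphen_pattern(lang: str) -> re.Pattern:
    """Build a pattern for a dash directly after a letter and followed by
    a space plus one of the language's conjunctions."""
    words = "|".join(re.escape(word) for word in CONJUNCTIONS[lang])
    return re.compile(rf"(?<=[^\W\d_])-(?= (?:{words})\b)")


def find_suspended_hyphens(text: str, lang: str) -> list:
    """Return the string offsets of dashes that should not act as separators."""
    return [m.start() for m in suspended_hyphen_pattern(lang).finditer(text)]


if __name__ == "__main__":
    print(find_suspended_hyphens("lõhna- ja maitseaine", "et"))
    print(find_suspended_hyphens("gerehydrateerd SOJA- en TARWE-EIWIT", "nl"))
```

In the second sample only the dash after SOJA is reported; the dash inside TARWE-EIWIT is an ordinary compound hyphen and is left to whatever other rules apply.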

@teolemon teolemon added the 🥗🔍 Ingredients analysis https://wiki.openfoodfacts.org/Ingredients_Extraction_and_Analysis label Dec 28, 2021
@github-actions
Contributor

This issue is stale because it has been open 90 days with no activity.

@github-actions github-actions bot added the ⏰ Stale This issue hasn't seen activity in a while. You can try documenting more to unblock it. label Mar 29, 2022
@aleene
Contributor Author

aleene commented Apr 12, 2022

Another interesting product: https://nl.openfoodfacts.org/product/8712800025665/brownie-mona
In this case an ingredient is not parsed at all.

@alexgarel
Member

Oops, sorry, I inadvertently removed the ingredients on the above product, but restored them afterwards!

@github-actions github-actions bot removed the ⏰ Stale This issue hasn't seen activity in a while. You can try documenting more to unblock it. label Jan 17, 2024
Projects
Status: To discuss and validate
Development

No branches or pull requests

4 participants