Autoreplace " by * for Organic Ingredients in OCR output #1906

Open
teolemon opened this issue Jun 9, 2019 · 5 comments

@teolemon
Member

teolemon commented Jun 9, 2019

Auto-replace " with * when "ingredients issus de l'agriculture biologique" ("ingredients from organic farming") is detected.

Farine de pois chiche (26%), farine de mais, semoule de mais, farine de riz", tomates/basie 8% (farine de mais, tomate", basilic", oignon', sel), huile de tournesol".

teolemon added the 🐛 bug and OCR labels on Jun 9, 2019

@stephanegigandet
Contributor

Which product is this? How common is this pattern?

We currently do not support identifying organic ingredients marked with a *.

@teolemon
Member Author

teolemon commented Jun 9, 2019

https://world.openfoodfacts.org/product/3770008009417/l-apero-boules-tomate-basilic-bio-funky-veggie
It would just be a replacement at OCR time:

if "ingredients issus de l'agriculture biologique" in string:
REPLACE(/word", >> /word*,)
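
The pseudocode above could be sketched in Python roughly as follows. This is only an illustration under assumptions, not the project's implementation: the function name `restore_organic_markers` and both regular expressions are hypothetical, and the footnote phrase is matched loosely so that accented and unaccented spellings both count.

```python
import re

# Assumption: the footnote phrase from the issue, matched loosely so that
# "ingredients"/"ingrédients" and straight or curly apostrophes all match.
ORGANIC_NOTE = re.compile(
    r"ingr[eé]dients\s+issus\s+de\s+l['’]agriculture\s+biologique",
    re.IGNORECASE,
)

# A double quote directly after a letter and not followed by another letter,
# i.e. where a footnote marker would sit at the end of an ingredient name.
MISREAD_MARKER = re.compile(r'(?<=[^\W\d_])"(?![^\W\d_])')


def restore_organic_markers(ocr_text: str) -> str:
    """Replace a trailing '"' with '*' only when the organic footnote is present."""
    if not ORGANIC_NOTE.search(ocr_text):
        return ocr_text  # no footnote detected: leave the OCR text untouched
    return MISREAD_MARKER.sub("*", ocr_text)
```

In this sketch, a " that precedes the footnote sentence itself (rather than following an ingredient name) is left unchanged, because the pattern requires a letter immediately before the quote.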

@aleene
Contributor

aleene commented Jun 9, 2019

Note that not every * indicates organic; I have already identified two other uses.

@stephanegigandet
Contributor

There are also products that use more than one symbol (e.g. organic + fair trade). Sometimes they use a small superscript 1, which can look like a ' from an OCR point of view.

Before doing anything, we should look at many OCR result samples to see how common this is and what exactly we should do in each case.
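
One way such a survey might start is sketched below in Python. Everything here is an assumption for illustration: the candidate symbol list, the function name `count_trailing_symbols`, and the idea that OCR results are available as a list of strings. It counts how many texts contain each candidate symbol directly after a word and before a separator.

```python
import re
from collections import Counter

# Assumption: an illustrative list of symbols that might be footnote markers
# or OCR misreadings of them.
CANDIDATES = "*\"'’¹"

# A candidate symbol right after a word character that is not a digit or
# underscore (in practice, a letter) and right before a separator or end of text.
TRAILING_SYMBOL = re.compile(
    r"(?<=[^\W\d_])([" + re.escape(CANDIDATES) + r"])(?=[\s,;.)]|$)"
)


def count_trailing_symbols(ocr_texts):
    """Count, per symbol, how many OCR texts contain it in trailing position."""
    counts = Counter()
    for text in ocr_texts:
        # A set is used so that each text counts at most once per symbol.
        counts.update(set(TRAILING_SYMBOL.findall(text)))
    return counts
```

Apostrophes inside words such as d'avoine are not counted, since they are followed by a letter rather than a separator.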

stephanegigandet added the needs data analysis label and removed the 🐛 bug label on Jun 13, 2019

@stephanegigandet
Contributor

From @teolemon:

https://fr.openfoodfacts.org/produit/26017341/muesli-chocolat-amarante-aldi-bon-et-bio

flocons d'avoine complet' 39%, amarante soufflée' 12%, chocolat au lait' 12% (sucre de canne', poudre de lait entier', beurre de cacao', pâte de cacao'), sucre de canne', farine d'avoine complet', semoule de maiïs', huile de tournesol', flocons de blé complet' 5%, farine d'épeautre complet', poudre de cacao maigre', flocons de noix de coco', miel', noisettes concassées', sel de mer. 'Ingrédients issus de l'agriculture biologique.
