Improve OCR recognition/processing of ingredients #3643

Open
Tracked by #9096
AcuarioCat opened this issue Jun 16, 2020 · 3 comments
Labels
✨ Feature Features or enhancements to Open Food Facts server ingredient-list-cutting 🥗🔍 Ingredients analysis https://wiki.openfoodfacts.org/Ingredients_Extraction_and_Analysis 🥗 Ingredients OCR ⏰ Stale This issue hasn't seen activity in a while. You can try documenting more to unblock it.

Comments

@AcuarioCat
Contributor

AcuarioCat commented Jun 16, 2020

The OCR is pretty good, but a high proportion of the unknown ingredients result from very minor errors, such as a missing comma between ingredients or commonly confused characters being misread: 'o' and 'a', 'i' and 'l', and 'rn' (R N) and 'm'.

I don't know if there is any processing between the extraction of words from the image and the words being added to the list of ingredients, but if there is, it would be a perfect place to pre-analyse the words.

As the language is generally known, running each word through the ingredients analysis for that language would determine whether it is recognized. If it is not, letter substitutions and attempts with split words (in the case of missing commas) could then be made to see if a match can be found. As the errors are fairly consistent, this could be a fairly rapid process.
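
To make the idea concrete, here is a minimal Python sketch of the lookup, substitute, and split steps described above. It is only an illustration under stated assumptions, not code from the Open Food Facts server: the names `CONFUSIONS`, `variants`, `match_one`, `repair` and the toy `KNOWN` set are all hypothetical, and a real version would check candidates against the per-language ingredients analysis instead of a hard-coded set.

```python
# Hypothetical confusion pairs, taken from the examples in this issue.
CONFUSIONS = [("o", "a"), ("i", "l"), ("rn", "m")]


def variants(text):
    """Yield every string obtained by one confusion substitution at one position."""
    for a, b in CONFUSIONS:
        for src, dst in ((a, b), (b, a)):  # each confusion can go either way
            start = text.find(src)
            while start != -1:
                yield text[:start] + dst + text[start + len(src):]
                start = text.find(src, start + 1)


def match_one(text, known):
    """Return the known ingredient equal to text, or one substitution away, else None."""
    if text in known:
        return text
    for candidate in variants(text):
        if candidate in known:
            return candidate
    return None


def repair(text, known):
    """Return a list of known ingredients recovered from one OCR'd entry, or None."""
    text = " ".join(text.lower().split())  # normalise case and whitespace
    found = match_one(text, known)
    if found is not None:
        return [found]
    # Possible missing comma: try each word boundary as a split point.
    words = text.split(" ")
    for i in range(1, len(words)):
        left = match_one(" ".join(words[:i]), known)
        right = match_one(" ".join(words[i:]), known)
        if left is not None and right is not None:
            return [left, right]
    return None


# Toy vocabulary standing in for the per-language ingredients list (assumption).
KNOWN = {"salt", "sugar", "milk", "corn starch"}

print(repair("sait", KNOWN))        # ['salt']
print(repair("com starch", KNOWN))  # ['corn starch']
print(repair("sugar sait", KNOWN))  # ['sugar', 'salt']
```

The sketch tries only a single substitution per ingredient and a single split per entry, which keeps it fast but misses entries with several errors. It also returns the first match it finds, so an ambiguous entry would need a human check or some ranking before being accepted.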


@AcuarioCat AcuarioCat added ✨ Feature Features or enhancements to Open Food Facts server 🥗 Ingredients OCR ingredient-list-cutting 🥗🔍 Ingredients analysis https://wiki.openfoodfacts.org/Ingredients_Extraction_and_Analysis labels Jun 16, 2020
@aleene
Contributor

aleene commented Jun 16, 2020

I am not sure, but I think the latest spelling improvements are currently being tested in French only. Check out #spelling on Slack.

@github-actions github-actions bot added the ⏰ Stale This issue hasn't seen activity in a while. You can try documenting more to unblock it. label Jan 15, 2021
@teolemon teolemon removed the ⏰ Stale This issue hasn't seen activity in a while. You can try documenting more to unblock it. label Jul 19, 2021
@teolemon
Member

@rbournhonesque is currently reprocessing old OCRs. This should help.

Contributor

This issue has been open 90 days with no activity. Can you give it a little love by linking it to a parent issue, adding relevant labels and projects, creating a mockup if applicable, adding code pointers from https://github.com/openfoodfacts/openfoodfacts-server/blob/main/.github/labeler.yml, giving it a priority, editing the original issue to have a more comprehensive description… Thank you very much for your contribution to 🍊 Open Food Facts

@github-actions github-actions bot added the ⏰ Stale This issue hasn't seen activity in a while. You can try documenting more to unblock it. label Feb 23, 2024
Projects
Status: To discuss and validate
Development

No branches or pull requests

3 participants