Improve OCR recognition/processing of ingredients #3643

AcuarioCat · 2020-06-16T11:22:27Z

The OCR is pretty good, but a high proportion of the unknown ingredients result from very minor errors, such as a missing comma between ingredients or commonly confused characters being misread: 'o' and 'a', 'i' and 'l', and 'rn' (r followed by n) and 'm'.

I don't know if there is any processing between the extraction of words from the image and the addition of those words to the list of ingredients, but if there is, it would be a perfect place to pre-analyse the words.

As the language is generally known, running each word through the ingredients analysis for that language would determine whether it is recognized. If it is not, letter substitutions and attempts at splitting words (in the case of missing commas) could then be made to see whether a match can be found. As the errors are fairly consistent, this could be a fairly rapid process.
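To make the idea concrete, here is a minimal sketch of the lookup, substitute, and split steps described above. It is not the Open Food Facts implementation: the `KNOWN_INGREDIENTS` set, the `CONFUSIONS` table, and every function name are hypothetical, and a real version would query the ingredients taxonomy for the product's language instead of a hard-coded set.

```python
# Hypothetical sample vocabulary; a real system would use the ingredients
# taxonomy for the product's language.
KNOWN_INGREDIENTS = {"salt", "sugar", "milk", "corn starch", "palm oil", "water"}

# Confusion pairs mentioned in this issue, listed in both directions.
CONFUSIONS = [("o", "a"), ("a", "o"), ("i", "l"), ("l", "i"), ("rn", "m"), ("m", "rn")]


def single_substitutions(word):
    """Yield each variant of `word` with one confusion applied at one position."""
    for src, dst in CONFUSIONS:
        start = word.find(src)
        while start != -1:
            yield word[:start] + dst + word[start + len(src):]
            start = word.find(src, start + 1)


def correct_word(word, known):
    """Return a recognized ingredient for `word`, or None if none is found."""
    if word in known:
        return word
    for candidate in single_substitutions(word):
        if candidate in known:
            return candidate
    return None


def split_missing_comma(text, known):
    """Try each space as a missing comma; both halves must be recognized."""
    tokens = text.split()
    for i in range(1, len(tokens)):
        left = correct_word(" ".join(tokens[:i]), known)
        right = correct_word(" ".join(tokens[i:]), known)
        if left and right:
            return [left, right]
    return None


def repair(text, known=KNOWN_INGREDIENTS):
    """Return a list of recognized ingredients, or the input unchanged."""
    text = text.strip().lower()
    fixed = correct_word(text, known)
    if fixed is not None:
        return [fixed]
    parts = split_missing_comma(text, known)
    return parts if parts is not None else [text]
```

With this sample vocabulary, `repair("com starch")` gives `["corn starch"]` and `repair("salt sugor")` gives `["salt", "sugar"]`. The sketch only tries one substitution per word and one missing comma per entry, which keeps the search small; whether that is enough for real OCR output would need checking against actual data.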

Part of

Add more ingredient analysis (tracker) #5706

aleene · 2020-06-16T11:54:35Z

I am not sure, but I think the latest spelling improvements are currently being tested in French only. Check out #spelling on Slack.

teolemon · 2022-10-12T19:06:27Z

@rbournhonesque is currently reprocessing old OCRs. This should help.

github-actions · 2024-02-23T00:07:41Z

This issue has been open 90 days with no activity. Can you give it a little love by linking it to a parent issue, adding relevant labels and projects, creating a mockup if applicable, adding code pointers from https://github.com/openfoodfacts/openfoodfacts-server/blob/main/.github/labeler.yml, giving it a priority, editing the original issue to have a more comprehensive description… Thank you very much for your contribution to 🍊 Open Food Facts

AcuarioCat added ✨ Feature Features or enhancements to Open Food Facts server 🥗 Ingredients OCR ingredient-list-cutting 🥗🔍 Ingredients analysis https://wiki.openfoodfacts.org/Ingredients_Extraction_and_Analysis labels Jun 16, 2020

github-actions bot added the ⏰ Stale This issue hasn't seen activity in a while. You can try documenting more to unblock it. label Jan 15, 2021

teolemon removed the ⏰ Stale This issue hasn't seen activity in a while. You can try documenting more to unblock it. label Jul 19, 2021

teolemon mentioned this issue Oct 12, 2022

Add more ingredient analysis (tracker) #5706

Open

teolemon mentioned this issue Oct 2, 2023

Improve ingredient parsing (tracker) #9096

Open

1 task

github-actions bot added the ⏰ Stale This issue hasn't seen activity in a while. You can try documenting more to unblock it. label Feb 23, 2024
