Improve OCR recognition/processing of ingredients #3643
Labels
✨ Feature
Features or enhancements to Open Food Facts server
ingredient-list-cutting
🥗🔍 Ingredients analysis
https://wiki.openfoodfacts.org/Ingredients_Extraction_and_Analysis
🥗 Ingredients
OCR
The OCR is pretty good, but a high proportion of the unknown ingredients result from very minor errors, such as a missing comma between ingredients or commonly confused characters being misread: 'o' and 'a', 'i' and 'l', and 'rn' (R N) and 'm'.
I don't know whether there is any processing between extracting the words from the image and adding them to the list of ingredients, but if there is, it would be a perfect place to pre-analyse the words.
Since the language is generally known, each word could be run through the ingredients analysis for that language to determine whether it is recognized. If it is not, letter substitutions and attempts at splitting words (in the case of missing commas) could then be tried to see whether a match can be found. As the errors are fairly consistent, this could be a fairly rapid process.
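To make the idea concrete, here is a minimal Python sketch of such a pre-analysis step. It is only an illustration under stated assumptions: `KNOWN_INGREDIENTS` is a hypothetical stand-in for the per-language ingredients lookup (it is not the actual Open Food Facts taxonomy or API), the function names are made up for this example, and only the three confusion pairs mentioned above are tried, one substitution at a time.

```python
# Hypothetical stand-in for the per-language ingredients lookup (assumption,
# not the real Open Food Facts taxonomy).
KNOWN_INGREDIENTS = {"sugar", "salt", "corn starch", "milk", "palm oil", "water"}

# Confusion pairs taken from the issue text; each is tried in both directions.
CONFUSIONS = [("o", "a"), ("i", "l"), ("rn", "m")]


def is_known(term):
    """Return True if `term` is in the (hypothetical) ingredient list."""
    return term.lower() in KNOWN_INGREDIENTS


def single_substitutions(word):
    """Yield every variant of `word` with exactly one confusable sequence swapped."""
    for a, b in CONFUSIONS:
        for src, dst in ((a, b), (b, a)):
            start = word.find(src)
            while start != -1:
                yield word[:start] + dst + word[start + len(src):]
                start = word.find(src, start + 1)


def split_candidates(text):
    """Yield (left, right) pairs made by cutting `text` at each space,
    as if a comma had been dropped at that position."""
    for i, ch in enumerate(text):
        if ch == " ":
            yield text[:i], text[i + 1:]


def repair(term):
    """Return a list of known ingredients recovered from `term`,
    or None if no repair is found."""
    term = term.strip().lower()
    if is_known(term):
        return [term]
    # First try a single character-level correction.
    for variant in single_substitutions(term):
        if is_known(variant):
            return [variant]
    # Then try treating a space as a missing comma, repairing each side.
    for left, right in split_candidates(term):
        left_fix, right_fix = repair(left), repair(right)
        if left_fix and right_fix:
            return left_fix + right_fix
    return None


print(repair("sugor"))       # single substitution: 'o' -> 'a'
print(repair("com starch"))  # single substitution: 'm' -> 'rn'
print(repair("sugar salt"))  # missing comma between two known ingredients
print(repair("xyz"))         # no repair found
```

Because the candidate set is limited to one substitution per attempt plus splits at existing spaces, the number of lookups per unknown term stays small, which fits the expectation that this could be a fairly rapid process. A real implementation would probably need to rank competing matches and avoid "correcting" genuine but unlisted ingredients.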