Expiration date format #22
Comments
Indeed, there is preprocessing going on: first to check that the candidate is really a date, and then to normalize the date format. However only dates of format See https://github.com/openfoodfacts/robotoff/blob/master/robotoff/insights/ocr/expiration_date.py for more info. |
The regex |
Yes indeed, as I said above robotoff matches dates of format |
Sorry, I must've misread. In that case, robotoff shouldn't replace the separators.
I usually do use ISO when editing manually, because it's the only consistent syntax. 😁 But yes, there may be differences. There's an old openfoodfacts-server issue about normalizing the date, but work on it hasn't started yet. |
In https://de.openfoodfacts.org/produkt/4311501619872/harzer-minis-gut-gunstig?rev=20 the expiration date was updated to |
I've fixed the normalization issue by normalizing dates to ISO 8601 in 836b4eb. Regarding the other issue you mention, I think we could change the detection pattern depending on the language detected in the image. If most words returned by the OCR are detected as German -> |
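For readers following along, here is a minimal sketch of what language-aware normalization to ISO 8601 could look like. It is not the code from 836b4eb or from robotoff's `expiration_date.py`: the names `DATE_RE`, `MONTH_FIRST_LANGS` and `normalize_to_iso` are hypothetical, and the regex and the day-first default are assumptions made for illustration.

```python
import re
from datetime import date
from typing import Optional

# Hypothetical pattern (not robotoff's): day and month of 1-2 digits,
# a four-digit year, separated by "/", "." or "-".
DATE_RE = re.compile(r"\b(\d{1,2})[./-](\d{1,2})[./-](\d{4})\b")

# Assumption: only these language tags write the month before the day.
MONTH_FIRST_LANGS = {"en-US"}


def normalize_to_iso(text: str, lang: str = "de") -> Optional[str]:
    """Return the first date found in `text` as YYYY-MM-DD, or None."""
    match = DATE_RE.search(text)
    if match is None:
        return None
    first, second, year = (int(group) for group in match.groups())
    # Pick day/month order from the detected language; default is day-first.
    if lang in MONTH_FIRST_LANGS:
        day, month = second, first
    else:
        day, month = first, second
    try:
        # date() rejects impossible dates such as 31.02.2019.
        return date(year, month, day).isoformat()
    except ValueError:
        return None
```

With this sketch, both `14.06.2019` and `14/06/2019` come out as `2019-06-14`, so the separator found by the OCR no longer matters to consumers of the data.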
Looks good, thank you!
That's probably a good idea. Wikipedia lists several interesting date patterns that could be used for parsing. |
I noticed that in revision 29, robotoff added the expiration date 14/06/2019. In the JSON file for my uploaded picture, the date is written as 14.06.2019 (dd.mm.yyyy). Clearly, there's some kind of processing going on. In my opinion, the date should either be written in a format in the language of the uploaded picture, or normalized to ISO 8601, so that consumers don't need to play a guessing game about which digit is a day and which is a month. I prefer ISO 8601 for all languages.