Load packager codes for Slovenia #8958
I use tabula, a Java tool, to extract tables from the PDFs of government reports here. I do not use the latest version because I do not like the way it has to be called, so I use tabula-1.0.4-SNAPSHOT-jar-with-dependencies.jar. I can try something with it later today, unless someone else gets to it first. |
I downloaded the pdf file and ran:
I got a TSV file, which I uploaded (with a copy of the PDF) to https://opencalaccess.org/OFF/slovenia_packaging/ so you can download it from there. I will look into what fixes the TSV file needs. If this is a workable approach, how should it be invoked? Is it something that is done only once, or will it need to be repeated? Will it need to be done automatically? |
This is great @rkiddy ! |
Is the TSV sufficient? Are you able to process that? Is anything else needed? Are there going to be updates to the PDF that will have to be tracked? Or is this ticket done for now? |
Remaining tasks:
The last two steps are pretty easy, similar to the previous pull request for Croatia. |
The two addresses usually seem to be related to each other, but they are different. Can we ask someone to review the Slovenian documentation here? One address could be a physical location, such as where something gets delivered, and the other could be the address of the managing business, or something like that. If the two addresses represent different kinds of things, which one do we want to keep? There is no way for us to guess. |
The lists usually have the company headquarters address and the manufacturing location address. The codes are per manufacturing location, so that's the one you want. I'm not Slovenian and don't speak the language, but the second address seems to be the actual location. I'm basing this on the fact that companies that appear multiple times have the same first address but a different second address. For example, 'MERCATOR D.O.O.' always has 'DUNAJSKA CESTA 107, 1000 LJUBLJANA' as the first address, but the second varies. The same order seems to apply to the title field. |
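The reasoning above (same first address, varying second address) can be checked mechanically. Here is a minimal Python sketch, assuming the TSV has already been parsed into `(company, first_address, second_address)` tuples; the function names are hypothetical, and the second addresses in the sample rows are made-up placeholders, not real data.

```python
from collections import defaultdict

# Illustrative rows only: (company, first_address, second_address).
# The MERCATOR first address is quoted from the comment above;
# the second addresses are placeholders, not taken from the real list.
rows = [
    ("MERCATOR D.O.O.", "DUNAJSKA CESTA 107, 1000 LJUBLJANA", "PLACEHOLDER STREET 1, TOWN A"),
    ("MERCATOR D.O.O.", "DUNAJSKA CESTA 107, 1000 LJUBLJANA", "PLACEHOLDER STREET 2, TOWN B"),
    ("MERCATOR D.O.O.", "DUNAJSKA CESTA 107, 1000 LJUBLJANA", "PLACEHOLDER STREET 3, TOWN C"),
]


def address_variation(rows):
    """For each company, count the distinct first and second addresses."""
    firsts = defaultdict(set)
    seconds = defaultdict(set)
    for company, first, second in rows:
        firsts[company].add(first)
        seconds[company].add(second)
    return {company: (len(firsts[company]), len(seconds[company])) for company in firsts}


def location_column(rows):
    """Guess which column varies per location: 'first', 'second', or None if undecided."""
    variation = address_variation(rows)
    first_total = sum(f for f, _ in variation.values())
    second_total = sum(s for _, s in variation.values())
    if first_total == second_total:
        return None
    return "second" if second_total > first_total else "first"


print(address_variation(rows))
print(location_column(rows))
```

If the second column varies more within a company than the first, that supports treating it as the manufacturing location. It is only a heuristic, and a review by someone who reads Slovenian would still be more reliable.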
There are some duplicated packaging codes (SI 873, for example). @rkiddy, to explain a bit more how it is working.
and exported, and used in Display.pm) Hence, this issue is about a) updating the packager_codes.sto and geocode_adresses.sto files, and b) updating the Display.pm file.
The first line enters the Docker container and the second line recreates the file (the script runs inside Docker).
See here for example: |
### What

Packaging codes: adds Ireland

### Screenshot

![Screenshot_20240710_173536](https://github.com/openfoodfacts/openfoodfacts-server/assets/110821832/e0eb280e-5018-4daa-be72-cf0e48256762)

### Related issue(s) and discussion

Part of #338

More examples: #8921, #8958, #10264, #10318, #10351, #10388, #10485:

- lib/ProductOpener/Display.pm
  - add description (name, street, city) based on columns in the file or hardcoded
- lib/ProductOpener/PackagerCodes.pm
  - add country and suffix of the code
- scripts/update_packager_codes.pl
  - add code formatting ('country' 'code' 'suffix', for example if code does not already contain 'country' or 'suffix')
  - add the column name for the $code variable
- packager-codes/
  - add the csv file (mind the naming)
- scripts/packager-codes/
  - add your script
- update sto files

```
docker exec -it po_off-backend-1 bash
./scripts/update_packager_codes.pl
```

Based on the experience acquired in previous PR, I did the following changes:

- switch from geocode to nominatim (+ no need of API key, +/- exactly same results)
- reintroduced cache (introduced for Slovenija, #10124, and not used afterward)
- handled whole process without manual intervention (to fetch files, _etc_.), using Excel to dataframe feature from polars and using beautiful soup, not sure that this will be possible to do the same for future countries but at least for that one it was successful.

Fixes: #1572
### What

Packaging codes: adds Luxembourg

### Screenshot

BEFORE -> AFTER

![Screenshot_20240719_172512](https://github.com/user-attachments/assets/7fc6b545-6bfb-4b29-9219-7bcb17bc4827)

### Related issue(s) and discussion

Part of #338

More examples: #8921, #8958, #10264, #10318, #10351, #10388, #10485, #10533:

- lib/ProductOpener/Display.pm
  - add description (name, street, city) based on columns in the file or hardcoded
- lib/ProductOpener/PackagerCodes.pm
  - add country and suffix of the code
- scripts/update_packager_codes.pl
  - add code formatting ('country' 'code' 'suffix', for example if code does not already contain 'country' or 'suffix')
  - add the column name for the $code variable
- packager-codes/
  - add the csv file (mind the naming)
- scripts/packager-codes/
  - add your script
- update sto files

```
docker exec -it po_off-backend-1 bash
./scripts/update_packager_codes.pl
```

Fixes: #331
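To illustrate the "code formatting" step from the checklist, here is a minimal Python sketch of what normalizing to 'country' 'code' 'suffix' could look like. It is not the logic of scripts/update_packager_codes.pl (which is Perl); `format_code` is a hypothetical helper, and the `"ES"` suffix for Slovenia is an assumed value used only for illustration.

```python
def format_code(raw, country, suffix):
    """Return 'COUNTRY CODE SUFFIX', adding the country or suffix only when missing.

    `country` and `suffix` are supplied by the caller; nothing here is looked up
    from ProductOpener itself.
    """
    parts = raw.strip().upper().split()
    # Drop an existing country prefix and/or suffix so they are not doubled.
    if parts and parts[0] == country:
        parts = parts[1:]
    if parts and parts[-1] == suffix:
        parts = parts[:-1]
    if not parts:
        raise ValueError(f"no code found in {raw!r}")
    return " ".join([country, *parts, suffix])


# "SI 873" is the code mentioned in the discussion; "ES" is an assumed suffix.
print(format_code("SI 873", "SI", "ES"))
print(format_code("873", "SI", "ES"))
```

Stripping the prefix and suffix before re-adding them keeps the function idempotent, so running it over a file that is already partly formatted should not produce doubled country codes.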