feat: add slovenian packager codes #10124
Thank you @benbenben2, that's really great!
packager-codes/geocode_addresses.sto
Outdated
@stephanegigandet do we currently store it in the code?
I guess yes, this hasn't been touched in ages.
# SI M-1035 SI
if input_code.endswith('SI'):
    input_code = input_code.replace(' SI', '').strip()
Is there a reason why you did not put the space above (for ES) but put it here?
Maybe you want to support "-" or "_" etc. In that case we could use a regexp with a word boundary:
re.sub(r"\b(SI|ES)$", "", "test SI").strip()
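A minimal sketch of the regexp suggestion above, assuming the country suffix always sits at the very end of the code. The helper name strip_country_suffix is hypothetical and not part of the PR. One caveat: \b treats "_" as a word character, so a suffix joined with an underscore is left untouched by this pattern.

```python
import re

# Hypothetical helper, not taken from the PR.
# \b requires SI/ES to start at a word boundary, so a code that merely
# ends in the letters "SI" (for example "KOSI") is left unchanged.
SUFFIX_RE = re.compile(r"\b(SI|ES)$")


def strip_country_suffix(code: str) -> str:
    """Remove a trailing SI or ES country suffix, then trim whitespace."""
    return SUFFIX_RE.sub("", code).strip()
```

Compared with endswith('SI') followed by replace(' SI', ''), this only removes the suffix at the end of the string and handles both countries with one expression.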
Yes, typo.
I applied the suggestions.
# fetch last occurrence
# words 123A, place, 4567 city name
# Á found in a city name (PROSENJAKOVCI -PÁRTOSFALVA)
pattern = r'(([a-zčćžđšA-ZČĆŽĐŠŽ\s\-\.]+\d+[ABCDEFGIJ]?),(?:[a-zčćžđšA-ZČĆŽĐŠŽ\s\-\.\<\>]+,\s*)?[\<\>]*(\s*\d{4}[a-zčćžđšA-ZČĆŽĐŠŽÁ\s\-\.\<\>]+)$)'
nit: Could you possibly use the verbose flag to make it more readable?
Also, using named groups for the captures would make it easier to understand.
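As an illustration of the verbose flag and named groups, here is a simplified sketch. It is not a drop-in replacement for the pattern in the diff: it drops the optional middle "place" segment and the "<" / ">" characters, and the names ADDRESS_RE, parse_address, street and city are hypothetical.

```python
import re
from typing import Optional

# Simplified, hypothetical variant of the address pattern.
# In re.VERBOSE mode, whitespace and "#" comments outside character
# classes are ignored, so each part can be documented inline.
ADDRESS_RE = re.compile(
    r"""
    (?P<street>
        [a-zčćžđšA-ZČĆŽĐŠ\s.-]+    # street name: letters, spaces, dots, hyphens
        \d+                         # house number
        [ABCDEFGIJ]?                # optional house-number letter
    )
    ,\s*
    (?P<city>
        \d{4}                       # four-digit postal code
        [a-zčćžđšA-ZČĆŽĐŠÁ\s.-]+    # city name
    )
    $
    """,
    re.VERBOSE,
)


def parse_address(line: str) -> Optional[dict]:
    """Return the named groups for a single-line address, or None if no match."""
    match = ADDRESS_RE.search(line)
    return match.groupdict() if match else None
```

With named groups, callers read match["street"] and match["city"] instead of counting parentheses to find the right group index.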
Also, instead of [a-zčćžđšA-ZČĆŽĐŠŽ\s\-\.]
why not use [\w\s\-\.],
which you can even write as [\w\s.-]?
("." does not need to be escaped inside a character class, nor does "-" if it is the last character.)
I tried using \w, but could not achieve the same result.
It may be because \w also matches digits, while I rely on \d{4} to recognize the postal code.
For example, this input:
KIDRIČEVA CESTA 63A, 4220 ŠKOFJA LOKA
FUŽINSKA ULICA 1, 4220 ŠKOFJA LOKA"
Should lead to
FUŽINSKA ULICA 1, 4220 ŠKOFJA LOKA"
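If the difficulty does come from \w matching digits (an assumption, as the comment above only suspects it), one possible middle ground is the class [^\W\d_], which matches any Unicode letter but no digit or underscore. The sketch below is illustrative only; NAME_RE and is_name are hypothetical names.

```python
import re

# [^\W\d_] = "a word character that is neither a digit nor an underscore",
# i.e. any Unicode letter, including Č, Š, Ž and Á, without listing them.
NAME_RE = re.compile(r"(?:[^\W\d_]|[\s.-])+")


def is_name(text: str) -> bool:
    """True if text consists only of letters, whitespace, dots and hyphens."""
    return NAME_RE.fullmatch(text) is not None
```

Because this class excludes digits, a following \d{4} can still mark where the postal code starts, which a plain \w would blur.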
def convert_address_to_lat_lng(address_to_convert: str) -> str:
    # free plan: 1 request per second
    sleep(1)
You are patient! With 1s per request, I would have made a dbm cache to avoid issuing the same requests again ;-)
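A minimal sketch of such a cache using the standard-library dbm module. The function name cached_lookup, the fetch callback and the default cache path are assumptions for illustration; the code actually merged in the PR may differ.

```python
import dbm
from typing import Callable


def cached_lookup(
    address: str,
    fetch: Callable[[str], str],
    cache_path: str = "geocode_cache",  # hypothetical file name
) -> str:
    """Return the stored result for address; call fetch only on a cache miss."""
    # "c" opens the database for reading and writing, creating it if needed.
    with dbm.open(cache_path, "c") as cache:
        key = address.encode("utf-8")
        if key in cache:
            return cache[key].decode("utf-8")
        result = fetch(address)  # the slow, rate-limited call
        cache[key] = result.encode("utf-8")
        return result
```

Since the cache lives on disk, repeated addresses are also skipped on later runs of the script, not only within one run.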
I applied the suggestions.
Just for fun: only 11 addresses are repeated, so it saved 11 seconds. But this is cool, and can be used for other countries with bigger files.
file_name = "slovenian_packaging_raw.csv"
It's always good to put such sections in an if __name__ == "__main__": block.
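For reference, a sketch of that layout. The main function and its body are placeholders; only the file name comes from the diff.

```python
def main() -> str:
    # File name taken from the reviewed diff.
    file_name = "slovenian_packaging_raw.csv"
    # The actual processing steps of the script would go here.
    return file_name


# Runs only when the file is executed as a script, not when it is imported,
# so the helper functions can be imported and tested without side effects.
if __name__ == "__main__":
    main()
```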
I applied suggestions
* feat_add_si_packager_codes * applies suggestions
### What
packaging codes: adds Ireland

### Screenshot
![Screenshot_20240710_173536](https://github.com/openfoodfacts/openfoodfacts-server/assets/110821832/e0eb280e-5018-4daa-be72-cf0e48256762)

### Related issue(s) and discussion
Part of #338

More examples: #8921, #8958, #10264, #10318, #10351, #10388, #10485:
- lib/ProductOpener/Display.pm
  add description (name, street, city) based on columns in the file or hardcoded
- lib/ProductOpener/PackagerCodes.pm
  add country and suffix of the code
- scripts/update_packager_codes.pl
  add code formatting ('country' 'code' 'suffix', for example if code does not already contain 'country' or 'suffix')
  add the column name for the $code variable
- packager-codes/
  add the csv file (mind the naming)
- scripts/packager-codes/
  add your script
- update sto files
```
docker exec -it po_off-backend-1 bash
./scripts/update_packager_codes.pl
```

Based on the experience acquired in previous PR, I did the following changes:
-> switch from geocode to nominatim (+ no need of API key, +/- exactly same results)
-> reintroduced cache (introduced for Slovenija, #10124, and not used afterward)
-> handled whole process without manual intervention (to fetch files, _etc_.), using Excel to dataframe feature from polars and using beautiful soup, not sure that this will be possible to do the same for future countries but at least for that one it was successful.

Fixes: #1572
What
Added Slovenian packager codes.
Instructions to recreate the packager codes are given in the Python file.
Screenshot
Related issue(s) and discussion