Timezone problem #31
Comments
Any update on this issue? Many people using GeoBases are reporting this ;)
From looking at the script that generates ori_por_public.csv: In my opinion, we should forget about the ori_tz_light file and get the timezone using the geolocation information. There are several ways to do this: http://stackoverflow.com/questions/16086962/how-to-get-a-time-zone-from-a-location-using-latitude-and-longitude-coordinates
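As a rough illustration of deriving a timezone from coordinates, here is a minimal sketch that picks the timezone of the nearest known city by great-circle distance. This is only one of the possible approaches. The `CITIES` table, its approximate coordinates, and the function names are assumptions made for this example; they are not taken from GeoBases or from the script that generates ori_por_public.csv.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0

# Hypothetical reference table: (name, lat, lng, timezone), approximate coordinates.
CITIES = [
    ("Madrid", 40.4168, -3.7038, "Europe/Madrid"),
    ("Chicago", 41.8781, -87.6298, "America/Chicago"),
    ("Anchorage", 61.2181, -149.9003, "America/Anchorage"),
]


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometres between two (lat, lng) points in degrees."""
    lat1, lng1, lat2, lng2 = map(radians, (lat1, lng1, lat2, lng2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lng2 - lng1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def nearest_timezone(lat, lng, cities=CITIES):
    """Return (timezone, city_name, distance_km) for the closest city in the table."""
    name, c_lat, c_lng, tz = min(cities, key=lambda c: haversine_km(lat, lng, c[1], c[2]))
    return tz, name, haversine_km(lat, lng, c_lat, c_lng)


if __name__ == "__main__":
    # A point near Seville, used purely as an example query.
    print(nearest_timezone(37.39, -5.98))
```

Returning the distance alongside the timezone matters: a match found hundreds of kilometres away is much less trustworthy than one found a few kilometres away.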
So I partially solved the problem, using ideas from your link. I realized that GeoBases contains the GeoNames cities (with
The script and its result are available at this gist. To reproduce the result, you need GeoBasesDev installed (

git clone https://gist.github.com/7a8a35c691fad5f170d8.git
cd 7a8a35c691fad5f170d8
python fix_tz.py # writes in tz_fixes.csv

The format of the output is:

$ head tz_fixes.csv
NDP,America/USA,America/Chicago,2.57
SPX,America/USA,America/Chicago,3.91
QPF,America/Brazil,Indian/Antananarivo,214.30
SPO,Europe/Spain,Europe/Madrid,7.67
NOH,America/USA,America/Chicago,1.77
HLR,America/USA,America/Chicago,5.83

4 fields:
My recommendation would be to integrate the new timezones as is in

$ cat tz_fixes.csv | awk -F',' '{ if ($4 > 500) {print $0}}'
BAR,America/USA,America/Anchorage,1234.26
QDO,America/Brazil,Africa/Mogadishu,508.17
QDP,America/Brazil,Indian/Antananarivo,1004.01
DCK,America/USA,America/Anchorage,716.94
QLB,America/Brazil,Indian/Antananarivo,695.53
QPU,America/Brazil,Indian/Antananarivo,503.76
QIB,America/Brazil,Indian/Antananarivo,725.24
JON,Pacific/US_Minor_Islands,Pacific/Honolulu,1304.40
QGR,America/Greenland,America/Godthab,598.84
JON,Pacific/US_Minor_Islands,Pacific/Honolulu,1304.40
QGK,America/Brazil,Indian/Mauritius,1529.87
UQE,America/USA,America/Anchorage,543.02
ZQB,America/USA,America/Anchorage,1764.81
QMB,America/Brazil,Indian/Antananarivo,832.53
UNS,America/USA,America/Anchorage,1380.22
KPK,,America/Anchorage,541.99
JTI,America/Brazil,Atlantic/Stanley,827.49
MHL,America/USA,America/Anchorage,649.45
QEL,Australia/Unknown,Pacific/Rarotonga,1654.68

Another thing that worried me when developing the script: I ran it on POR with valid timezones (a very simple change in the script), and found that 1816 of them actually had a match in GeoNames with a different timezone. This could mean that these 1816 have wrong coordinates (I checked a few of them manually, like XAI, which seems to have a reversed longitude but the right timezone). Right now I see no way to fix this automatically.
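The awk filter shown earlier in this comment can also be written in Python, which makes it easier to review suspicious matches before integrating them. This is a sketch only: it assumes the four columns are the IATA code, the current timezone, the proposed timezone, and a distance in kilometres (inferred from the sample rows), and the function name and the default threshold are illustrative.

```python
import csv
import io


def suspicious_fixes(lines, max_km=500.0):
    """Yield (iata, old_tz, new_tz, dist_km) for rows whose distance exceeds max_km."""
    for row in csv.reader(lines):
        if len(row) != 4:
            continue  # skip blank or malformed rows
        iata, old_tz, new_tz, dist = row
        if float(dist) > max_km:
            yield iata, old_tz, new_tz, float(dist)


# A few rows copied from the tz_fixes.csv output quoted in this thread.
SAMPLE = """\
NDP,America/USA,America/Chicago,2.57
QPF,America/Brazil,Indian/Antananarivo,214.30
BAR,America/USA,America/Anchorage,1234.26
KPK,,America/Anchorage,541.99
"""

if __name__ == "__main__":
    for fix in suspicious_fixes(io.StringIO(SAMPLE)):
        print(fix)
```

To run it on the real file, pass an open file object instead of the sample, for example `open('tz_fixes.csv', newline='')`.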
For people wanting a quick fix of broken timezones, until this is properly merged into opentraveldata, here is how to do it with GeoBases. First download this file locally, then:

import os.path as op

from GeoBases import GeoBase

G = GeoBase(data='ori_por', verbose=False)

# tz_fixes.csv is expected next to this script
TZ_FILE = op.join(op.abspath(op.dirname(__file__)), 'tz_fixes.csv')
TZ_FIXES = {}

# Map each IATA code to its fixed timezone (third column of the file)
with open(TZ_FILE) as f:
    for row in f:
        iata, _, tz, _ = row.rstrip().split(',', 3)
        TZ_FIXES[iata] = tz

for por in G:
    iata = G.get(por, 'iata_code')
    if iata in TZ_FIXES:
        # G no longer has the broken timezone
        G.set(por, timezone=TZ_FIXES[iata])
Many thanks, Alex! Please note that the optd project was deprecated a few months ago. The current mainstream project is opentraveldata. Nevertheless, I will try to incorporate your fix, one way or the other, into the data processing code.
Note that the execution of the fix_tz.py script fails, apparently with the MLC record:

Traceback (most recent call last):
  File "fix_tz.py", line 45, in <module>
    main()
  File "fix_tz.py", line 28, in main
    for p, p_tz, p_iata, p_city, p_geocode in pors_with_unk_tz(db_oripor):
  File "fix_tz.py", line 14, in pors_with_unk_tz
    p_city = db_oripor.get(p, 'city_name_list')[0]
  File "/home/build/.local/lib/python2.7/site-packages/GeoBases-4.23.0-py2.7.egg/GeoBases/GeoBaseModule.py", line 623, in get
    raise KeyError("Field '%s' [for key '%s'] not in %s" % (field, key, self._things[key].keys()))
KeyError: "Field 'city_name_list' [for key 'MLC'] not in ['comment', 'city_name_ascii', 'adm3_code', 'adm2_code', 'icao_code', 'adm2_name_ascii', '__gar__', 'alt_name_section', '__dup__', 'country_code', 'adm2_name_utf', 'timezone', 'lng', '__lno__', 'iata_code', 'gmt_offset', 'wiki_link', 'dst_offset', 'date_from', 'date_until', 'city_name_utf', 'raw_offset', 'cc2', 'fcode', 'is_geonames', 'adm1_name_utf', 'adm1_name_ascii', 'gtopo30', 'country_name', 'city_code', 'elevation', 'tvl_por_list@raw', 'tvl_por_list', '__par__', 'moddate', 'lat', 'state_code', 'location_type', '__key__', 'population', 'fclass', 'name', 'alt_name_section@raw', 'page_rank', 'geoname_id', 'adm4_code', 'faa_code', 'valid_id', 'continent_name', 'asciiname', 'adm1_code']"
@alexprengere, could you alter your script so that it generates the optd_por_tz.csv file from the optd_por_best_known_so_far.csv file? The optd_por_tz.csv file currently has only 426 records, which is enough to fix the current wrong time-zones but is not future-proof.
Commit ab45c19 brings in the time-zones as present in the optd_por_tz.csv file, in which the time-zones of a few POR have been fixed.
First, the script is failing because this is not the development version (indeed in legacy version

pip uninstall GeoBases GeoBasesDev # repeat if necessary
pip install GeoBasesDev

Since the use cases are a bit different, I created another gist to generate the

git clone https://gist.github.com/d4ed1527f4c89a697755.git
cd d4ed1527f4c89a697755
wget 'https://raw.githubusercontent.com/opentraveldata/opentraveldata/master/opentraveldata/optd_por_best_known_so_far.csv'
python generate_optd_por_tz.py optd_por_best_known_so_far.csv > optd_por_tz.csv

The output is enclosed in the gist here. Note that in the last version of

python fix_tz.py
STF with tz "" matches tz "Pacific/Port_Moresby" (dist 63.6km, "Stephens Island" -> "Daru")
WDB with tz "America/USA" matches tz "America/Vancouver" (dist 228.8km, "Deep Bay" -> "Terrace")
RNU with tz "Asia/Malaysia" matches tz "Asia/Kuching" (dist 0.0km, "Ranau MY" -> "Ranau")
JUC with tz "America/USA" matches tz "America/Los_Angeles" (dist 0.5km, "Los Angeles" -> "Silver Lake")
WLN with tz "America/USA" matches tz "America/Juneau" (dist 280.3km, "Little Naukati AK US" -> "Juneau")
JSN with tz "America/USA" matches tz "America/Los_Angeles" (dist 1.6km, "Los Angeles" -> "Echo Park")
HKP with tz "America/USA" matches tz "Pacific/Honolulu" (dist 20.1km, "Kaanapali Maui" -> "Wailuku")
JON with tz "Pacific/US_Minor_Islands" matches tz "Pacific/Honolulu" (dist 1304.4km, "Johnston Island" -> "Makakilo City")
PII with tz "America/USA" matches tz "America/Anchorage" (dist 8.9km, "Fairbanks" -> "Fairbanks")
XHG with tz "America/Canada" matches tz "America/Toronto" (dist 0.4km, "Ottawa" -> "Ottawa")
JON with tz "Pacific/US_Minor_Islands" matches tz "Pacific/Honolulu" (dist 1304.4km, "Johnston Island" -> "Makakilo City")
NKV with tz "America/USA" matches tz "America/Juneau" (dist 282.4km, "Nichen Cove" -> "Juneau")
KBK with tz "" matches tz "America/Juneau" (dist 172.6km, "Klag Bay" -> "Juneau")
PKS with tz "Asia/Ventiane" matches tz "Asia/Vientiane" (dist 4.6km, "Paksane" -> "Muang Pakxan")
MNP with tz "" matches tz "Pacific/Port_Moresby" (dist 10.8km, "None" -> "Port Moresby")
UNS with tz "America/USA" matches tz "America/Anchorage" (dist 1380.2km, "Umnak Island" -> "Anchorage")
KPK with tz "" matches tz "America/Anchorage" (dist 542.0km, "Parks Spb" -> "Anchorage")
IAT with tz "" matches tz "America/Los_Angeles" (dist 160.0km, "None" -> "Lompoc")
LAC with tz "" matches tz "Asia/Kuching" (dist 279.5km, "Swallow Reef Airstrip" -> "Victoria")
CBA with tz "America/USA" matches tz "America/Juneau" (dist 79.4km, "Corner Bay" -> "Juneau")
EFO with tz "America/USA" matches tz "America/Chicago" (dist 20.5km, "East Fork" -> "Fort Dodge")
Thanks! Note that I still have the same issue (with the 'MLC' key) with the first gist, and that I checked that GeoBasesDev was the only installed GeoBases version (with the procedure you give, i.e., uninstall any GeoBases instances and re-install GeoBasesDev). For the second gist, I have another error:

BPN^Asia/Makassar
BPN^Asia/Makassar
BPO^Asia/Chongqing
BPO^Asia/Chongqing
Traceback (most recent call last):
  File "generate_optd_por_tz.py", line 38, in <module>
    main(sys.argv[1])
  File "generate_optd_por_tz.py", line 24, in main
    tz = db_oripor.get(iata, 'timezone')
  File "/home/build/.local/lib/python2.7/site-packages/GeoBases-4.23.0-py2.7.egg/GeoBases/GeoBaseModule.py", line 614, in get
    raise KeyError("Thing not found: %s" % str(key))
KeyError: 'Thing not found: BPR'

Nevertheless, I went through each of the POR you mentioned above (e.g., STF, WDB, ..., LAC, CBA, EFO) and fixed the corresponding time-zones:
Hence, the fixes will appear in OpenTravelData only once the GeoNames database dump has been generated and integrated with OpenTravelData, i.e., not before a few days. Hopefully, next week (beginning of June 2015), it should be fine.
I am sorry Denis, but you are still not using the development version. First because I cannot reproduce the error, and second because the traceback is betraying you ;)

Traceback (most recent call last):
...
  File "/home/build/.local/lib/python2.7/site-packages/GeoBases-4.23.0-py2.7.egg ...

The
Here is another list of points where stuff may go wrong:
I just uploaded a new version of GeoBases on PyPI with the latest data (
Here is the complete set of commands for clean usage of the gists. If anything is not clear tell me.

# Manual deletion of obsolete packages
rm -rf /home/build/.local/lib/python2.7/site-packages/GeoBases*
# Virtualenv usage, cache cleaning
rm -rf ~/.GeoBases.d
rm -rf 7a8a35c691fad5f170d8
git clone https://gist.github.com/7a8a35c691fad5f170d8.git
cd 7a8a35c691fad5f170d8
virtualenv --no-site-packages --clear .venv
source .venv/bin/activate
pip install --pre GeoBasesDev
pip install pytz
/usr/bin/env python fix_tz.py

In the messages you should see
The error for the second gist will probably be fixed with the same technique

# First, forget about the previous virtualenv
deactivate
rm -rf d4ed1527f4c89a697755
rm -rf ~/.GeoBases.d
git clone https://gist.github.com/d4ed1527f4c89a697755.git
cd d4ed1527f4c89a697755
virtualenv --no-site-packages --clear .venv
source .venv/bin/activate
pip install --pre GeoBasesDev
pip install pytz
wget 'https://raw.githubusercontent.com/opentraveldata/opentraveldata/master/opentraveldata/optd_por_best_known_so_far.csv'
/usr/bin/env python generate_optd_por_tz.py optd_por_best_known_so_far.csv > optd_por_tz.csv

Anytime you want to run the gists, you should make sure you are using the latest version of
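Since the traceback path is what gave away the installed version earlier in this comment, it can help to check which copy of a package Python actually imports before running the gists. The sketch below uses only the standard library and assumes Python 3 (the thread itself runs Python 2.7); the helper names are made up for this example.

```python
import importlib.util
import re


def module_origin(name):
    """Return the file a top-level module would be loaded from, or None if it is not installed."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


def egg_version(path):
    """Extract the version from a '<Name>-<version>-py<X.Y>.egg' path component, if any."""
    match = re.search(r"/[^/]+?-(\d+(?:\.\d+)*)-py\d+\.\d+\.egg(?:/|$)", path)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Prints None when GeoBases is not importable from the current environment.
    print(module_origin("GeoBases"))
    # The path below is copied from the traceback quoted in this thread.
    print(egg_version(
        "/home/build/.local/lib/python2.7/site-packages/"
        "GeoBases-4.23.0-py2.7.egg/GeoBases/GeoBaseModule.py"
    ))
```

If the reported origin points outside the active virtualenv, an older copy is still being picked up.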
With the latest commit (4d6370bc1c) on OpenTravelData, there should no longer be any wrong time-zones. Could you check? Of course, ideally, a script should be run to make sure we do not introduce new wrong time-zones. However, the AWK script raises a warning when a POR has an unknown time-zone, so the issue can then be fixed manually.
With the latest commit, it is almost perfect ;). Just one remaining typo:

$ python fix_tz.py
PKS with tz "Asia/Ventiane" matches tz "Asia/Vientiane" ...

The correct timezone seems to be
In the future, I will manually check the timezone validity when integrating the latest data in
Awesome! I confirm that the latest data has 0 timezone problem ;).
Thanks!
…try. Close #31. This commit was automatically imported from the repository opentraveldata/opentraveldata: commit f5001ef44a89a9ada156e7e7f5fd77cb631f9990 tree 0821d603c4063c44bff7c5a1f77ba665d4b5df23 parent 93f71726e330ea53c9e430a87cafaa69d4755db7 author Denis Arnaud <denis.arnaud_fedora@m4x.org> 1470656248 +0300 committer Denis Arnaud <denis.arnaud_fedora@m4x.org> 1470656248 +0300 [Country] Removed the line for HI/Hawai, as that latter is not a country. Close #31.
In ori_por_public.csv, there are some timezones which are not listed there:
A good way to test and exclude timezones can be to use the pytz Python package:
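A minimal sketch of the pytz check mentioned above follows. It is not the snippet from the original report: it assumes `pytz` is installed and uses `pytz.all_timezones_set`, falling back to the standard-library `zoneinfo` module otherwise. The function names are illustrative, and the sample values are taken from this thread.

```python
def load_known_timezones():
    """Return the set of timezone names known to pytz (or to zoneinfo as a fallback)."""
    try:
        import pytz
        return set(pytz.all_timezones_set)
    except ImportError:
        from zoneinfo import available_timezones  # Python 3.9+
        return available_timezones()


def invalid_timezones(names, known):
    """Return the sorted, de-duplicated names that are not in the known set."""
    return sorted({name for name in names if name not in known})


if __name__ == "__main__":
    seen = ["America/USA", "Europe/Madrid", "Asia/Ventiane", "", "America/Chicago"]
    print(invalid_timezones(seen, load_known_timezones()))
```

Passing the known set in as a parameter keeps the check independent of whichever timezone database is installed, so the same function can be reused with a pinned list.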