Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Experiment with Postgres to find out how to convert all times to UTC #75

Closed
agstephens opened this issue Mar 11, 2021 · 8 comments
Closed
Assignees

Comments

@agstephens
Copy link
Collaborator

agstephens commented Mar 11, 2021

In the current r2.0 contents of the GLAMOD DB, we have some strange date/times that include a time zone. Where they have been read in by Pandas they were somehow modified. This relates to:

#73

What we need to do with the existing Postgres content is:

  1. Change column type from: timestamp with timezone to timestamp without timezone (https://www.postgresql.org/docs/11/datatype-datetime.html)
  2. Fix the datetime values so that they are correct, i.e.: if there is a timezone value given - then convert it to UTC.

@jhaigh0 Please can you create you own test Postgres DB and create a simple table with a single column date_time - populate it with some values and then do some googling and testing to find out the best way for us to do the datetime conversion so that we end up removing the timezone using a simple column type: timestamp without timezone

Thanks

@jhaigh0
Copy link
Collaborator

jhaigh0 commented Mar 12, 2021

Just going to compile some links here.

  • We might end up using AT TIME ZONE in some way link
  • ALTER COLUMN might just work for us link

@agstephens
Copy link
Collaborator Author

@jhaigh0 looks promising!

@agstephens
Copy link
Collaborator Author

Possible test/solution, try writing a script of SQL commands that you can run with psql ... -f test-script.sql:

  1. Set timezone 'London/Europe'
  2. Create table
  3. Create record with offset
  4. Select all - to view record
  5. Set timezone 'UTC'
  6. Alter column to remove timezone OR do it a different way
  7. Select all - to view the record
  8. Drop the table

@agstephens
Copy link
Collaborator Author

@jhaigh0: thanks for looking into this.

I have just run a "SELECT" command as mentioned in #73. The funny thing is, now we have (re-)set the timezone, the result is actually different. It actually looks like setting UTC fixes the queries anyway:

cdm=> SHOW timezone; 
TimeZone 
----------
 UTC
(1 row)

cdm=> select station_name, date_time from lite_2_0.observations_1755_land_0 WHERE date_time BETWEEN '1755-01-01 06:00:00'::timestamptz AND '1755-01-03 07:00:00+00:00'::timestamptz LIMIT 1;
 station_name |       date_time        
--------------+------------------------
 BASEL        | 1755-01-01 06:29:00+00
(1 row)

Please see the top-panel in #73 to see how this is different to query I ran then (with TimeZone: 'Europe/London'

So, the situation might be as simple as altering the column type. Please can you test that this will work if we alter the column type to remove the time zone. Hopefully it simplifies your test even more. Please document your testing here. Thanks

@jhaigh0
Copy link
Collaborator

jhaigh0 commented Mar 16, 2021

This is what I've found that seems work;

First set up a test table like the real data

abcunit_jh=> set timezone to 'Europe/London';
SET
abcunit_jh=> CREATE TABLE time_data(id serial PRIMARY KEY, dates TIMESTAMPTZ NOT NULL);
CREATE TABLE
abcunit_jh=> INSERT INTO time_data (dates) VALUES('1800-06-22 19:00:00');
INSERT 0 1
abcunit_jh=> SELECT * FROM time_data;
 id |            dates
----+------------------------------
  1 | 1800-06-22 19:00:00-00:01:15
(1 row)

Then remove timezone info

abcunit_jh=> ALTER TABLE time_data ALTER COLUMN dates TYPE timestamp;
ALTER TABLE
abcunit_jh=> SELECT * FROM time_data;
 id |        dates
----+---------------------
  1 | 1800-06-22 19:00:00
(1 row)

so we should just be able to run this alter table command on the columns that have the problem, if our end goal is to remove timezone info and just assume all is in utc.
@agstephens

@agstephens
Copy link
Collaborator Author

Thanks @jhaigh0, that looks great.
It looks like it is actually easier to solve than I thought. I'll have a think about how/whether we need to make a change for r2.0. For r3.0, we'll build it with UTC only.

@agstephens
Copy link
Collaborator Author

At present, the global timezone change to UTC seems to have fixed the issue that was raised by the CDS Team. I have asked them if we should completely remove the timezone from the returned data. I would propose that that would be the best option, for simplicity and future usage.

@agstephens
Copy link
Collaborator Author

No further action needed on this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants