Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RPostgres should respect system timezone rather than default to UTC #452

Open
veer0318 opened this issue Dec 8, 2023 · 9 comments
Open

Comments

@veer0318
Copy link

veer0318 commented Dec 8, 2023

When connecting to a Postgres backend server, RPostgres sets the timezone for the connection to UTC by default (see for example #222). This is odd behaviour, as virtually any other database abstraction system defaults to either not setting the time zone and thus using the Postgres backend server timezone OR defaulting to using the user's system time zone.

The consequence of setting timezone="UTC" by default leads to unexpected behaviour (not wrong behaviour, just unexpected). A better option to me seems to default to the user's system time zone. This should not be hard to implement (although systems might differ in how they store TZ information, e.g. environmental variable TZ or in /etc/timezone.

...unless I'm simply overlooking something and UTC is actually appropriate.

@krlmlr
Copy link
Member

krlmlr commented Dec 9, 2023

Thanks. Does dbConnect(timezone = ) help with your use case?

https://rpostgres.r-dbi.org/reference/postgres

@veer0318
Copy link
Author

Hi! Well, I think that argument is the cause of the problem rather than the solution. Yes, it does solve it if you set in manually to the right timezone. However, one would expect the default to be NULL (i.e. use the Postgres' server timezone) OR default to the user's system timezone (e.g. by reading get.env("TZ") or something like it).

@veer0318
Copy link
Author

Maybe to clarify how this causes issues: for a customer, I built a simple Shiny dashboard which allows the user to create queries to get turnover, stock inventory etc (it's an online retailer with shops in the UK and the Netherlands, HQ in the latter so they expect their results to be in Europe/Amsterdam). The query, very unexpectedly, gave very different numbers of stock and turnover in this Shiny app (using DBI), compared to their Power BI dashboard en pgadmin. It took a very long time to track down to the timezone setting (it was an aggregate query).

AFAIK there's simply no good reason to setting the timezone to UTC for everyone.

@iangow
Copy link

iangow commented Dec 29, 2023

I think that if data are stored in the database as timestamptz, then all is good. If not, it may make sense to cast to timestamptz as part of the query using a known time zone. I think the issues with get.env("TZ") is that the database may be in a different time zone from the user and that may be in a different time zone from that of the data (implicitly, if these are timestamp, not timestamptz).

@krlmlr
Copy link
Member

krlmlr commented Dec 31, 2023

I see an advantage of setting times to UTC: consistent results across systems and platforms. Yes, it's unexpected, but times and time zones are a beast anyway. For me, consistency trumps convenience.

@iangow's proposal ensures that results remain as expected even if the server's or user's time zone changes, which is much less under your control than the DBI connection parameters or the queries your code issues.

I appreciate that this is hard, and I'm sorry you spent so much time tracking this down. Is there something we should do better on the documentation front?

@cswingle
Copy link

To repeat what I said in #229: I think it is the most logical and least surprising to use the timezone defined in the database when making a connection. That's the way every Postgres client I've used appears to work (psql, pgAdmin, Python psycopg2, PHP, Go lib/pq, etc.). Either that or they're all using the local timezone, which matches the database timezone in every database I've managed. In no case have I used a database client that automatically assumes UTC.

That said, it may be that the current UTC default is long standing enough with RPostgres that it would be more surprising to current users to have that behavior change. I just need to always remember to use the timezone argument when connecting or strange things will happen.

@veer0318
Copy link
Author

I totally agree with @cswingle . The current behaviour might be long-standing but is definitely different from virtually any other client. The best thing IMHO would be to leave the time zone by default unset (why would a client need to do this explicitly?) And thereby leave it to the server to define the TZ. In this case, a client still has the option of overwriting TZ if needed (which is likely not often the case, while it is the case in the current way RPostgres is set up.

@cswingley
Copy link

For what it's worth, when querying a Postgres database where the system time zone is not set to UTC, I get equivalent results when retrieving timestamp with time zone data using dbConnect without a timezone argument, with timezone = NULL and with timezone = "US/Alaska" (the database time zone). This is excellent.

Where the default RPostgres behavior of setting the time zone to UTC when connecting causes issues is where internal queries or views depend on being run under the system time zone. In my experience, it's aggregation (timestamp to date) that typically breaks in these queries. Perhaps those queries shouldn't be written in that way, but it is certainly a surprise for users to find that R isn't returning the same answer that they get by running R's show_query() statement on the server using their typical database client (psql, dBeaver, pgAdmin4 in our case).

@krlmlr
Copy link
Member

krlmlr commented Mar 18, 2024

I see that this issue perhaps matters mostly with the aforementioned timestamp-to-date aggregation. Users will be typically more affected by the timezone_out argument, which is timezone by default.

It's also good to hear that timezone = NULL works as expected.

Overall, this seems to be a minor problem, affecting mostly server-side aggregation. Consistency across platforms seems to be provided because timezone = NULL will choose the server timezone, and very few users access more than one database with different settings. Consistency with other database clients (and fewer surprises) might be more important indeed.

If we change the default, we'd need to:

  • give a message if timezone is not provided, mentioning the current default, and that it will change
  • offer a way to opt out of the message with an option
  • change the default with the next major release
  • eventually, turn off the message (five years from now?)

Is it worth it?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants