Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix to_utc_datetime #3899

Open
wants to merge 9 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 4 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
11 changes: 4 additions & 7 deletions lib/plausible/sites.ex
Original file line number Diff line number Diff line change
Expand Up @@ -199,24 +199,21 @@ defmodule Plausible.Sites do
@doc """
Returns the date of the first recorded stat in the timezone configured by the user.
This function does 2 transformations:
UTC %NaiveDateTime{} -> Local %DateTime{} -> Local %Date
Date -> UTC DateTime at 00:00 -> Local %DateTime{} -> Local %Date

## Examples

iex> Plausible.Site.local_start_date(%Plausible.Site{stats_start_date: nil})
nil

iex> utc_start = ~N[2022-09-28 00:00:00]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

site.starts_start_date is a date

iex> tz = "Europe/Helsinki"
iex> site = %Plausible.Site{stats_start_date: utc_start, timezone: tz}
iex> site = %Plausible.Site{stats_start_date: ~D[2022-09-28], timezone: "Europe/Helsinki"}
iex> Plausible.Site.local_start_date(site)
~D[2022-09-28]

iex> utc_start = ~N[2022-09-28 00:00:00]
iex> tz = "America/Los_Angeles"
iex> site = %Plausible.Site{stats_start_date: utc_start, timezone: tz}
iex> site = %Plausible.Site{stats_start_date: ~D[2022-09-28], timezone: "America/Los_Angeles"}
iex> Plausible.Site.local_start_date(site)
~D[2022-09-27]

"""
@spec local_start_date(Site.t()) :: Date.t() | nil
def local_start_date(site) do
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this function is necessary. stats_start_date is already local thanks to

Timezones.to_date_in_timezone(datetime, site.timezone)
and imports are supposed to be UTC #3895 (comment) local time https://3.basecamp.com/5308029/buckets/35871979/card_tables/cards/7202028179#__recording_7227026473 as well. Either way, shifting timezones on a date feels wrong somehow.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I guess it doesn't hurt since there is no data a day before the start date.

Expand Down
49 changes: 37 additions & 12 deletions lib/plausible/timezones.ex
Original file line number Diff line number Diff line change
Expand Up @@ -8,27 +8,52 @@ defmodule Plausible.Timezones do

@spec to_utc_datetime(NaiveDateTime.t(), String.t()) :: DateTime.t()
def to_utc_datetime(naive_date_time, timezone) do
case Timex.to_datetime(naive_date_time, timezone) do
%DateTime{} = tz_dt ->
Timex.Timezone.convert(tz_dt, "UTC")
case DateTime.from_naive(naive_date_time, timezone) do
{:ok, date_time} ->
DateTime.shift_zone!(date_time, "Etc/UTC")

%Timex.AmbiguousDateTime{after: after_dt} ->
Timex.Timezone.convert(after_dt, "UTC")
{:gap, _before_dt, after_dt} ->
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In some cases before dt might be preferred. Like maybe in where: ^to_utc_datetime(naive, site.timezone) <= timestamp so I've added a third argument :up | :down which defaults to :up (current behaviour)

DateTime.shift_zone!(after_dt, "Etc/UTC")

{:error, {:could_not_resolve_timezone, _, _, _}} ->
Timex.Timezone.convert(naive_date_time, "UTC")
{:ambiguous, _first_dt, second_dt} ->
DateTime.shift_zone!(second_dt, "Etc/UTC")

{:error, :time_zone_not_found} ->
DateTime.from_naive!(naive_date_time, "Etc/UTC")
end
end

@spec to_date_in_timezone(Date.t() | NaiveDateTime.t() | DateTime.t(), String.t()) :: Date.t()
@spec to_date_in_timezone(NaiveDateTime.t() | DateTime.t(), String.t()) :: Date.t()
def to_date_in_timezone(dt, timezone) do
to_datetime_in_timezone(dt, timezone) |> Timex.to_date()
dt |> to_datetime_in_timezone(timezone) |> DateTime.to_date()
end

@spec to_datetime_in_timezone(Date.t() | NaiveDateTime.t() | DateTime.t(), String.t()) ::
@doc """
Represents a point in time in a different timezone.
Naive datetimes are assumed to be GMT.

Examples:

iex> to_datetime_in_timezone(~U[2024-03-16 01:50:45.180928Z], "Asia/Kuala_Lumpur")
#DateTime<2024-03-16 09:50:45.180928+08:00 +08 Asia/Kuala_Lumpur>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

$ date
Sat Mar 16 09:50:49 CST 2024
iex(1)> DateTime.utc_now
~U[2024-03-16 01:50:45.180928Z]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously:

iex> Tzdata.TimeZoneDatabase.time_zone_periods_from_wall_datetime(~U[2024-03-16 01:50:45.180928Z], "Asia/Kuala_Lumpur")
{:ok,
 %{
   zone_abbr: "+08",
   utc_offset: 28800,
   std_offset: 0,
   until_wall: :max,
   from_wall: ~N[1982-01-01 00:00:00]
 }}
iex> Tzdata.TimeZoneDatabase.time_zone_periods_from_wall_datetime(~U[2024-03-16 01:50:45.180928Z], "Etc/GMT-8")
{:ok,
 %{
   zone_abbr: "+08",
   utc_offset: 28800,
   std_offset: 0,
   until_wall: :max,
   from_wall: :min
 }}
iex> Plausible.Timezones.to_datetime_in_timezone(~N[2024-03-16 01:50:45.180928], "Etc/GMT-8")
#DateTime<2024-03-15 17:50:45.180928-08:00 -08 Etc/GMT+8>

I'm not sure what's happening here exactly. Why is the new timezone +8 and not -8? It seems to have gone in the other direction. I don't know if that's how GMT+X timezones are supposed to work though.

This PR:

iex> Plausible.Timezones.to_datetime_in_timezone(~N[2024-03-16 01:50:45.180928], "Etc/GMT-8")
#DateTime<2024-03-16 09:50:45.180928+08:00 +08 Etc/GMT-8>

I think this makes more sense since Tzdata shows that Etc/GMT-8 and Asia/Kuala_Lumpur timezones are equivalent for that point in time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah no, it seems to be wrong still... Or maybe Timex is right and everything else is wrong :)


iex> to_datetime_in_timezone(~N[2024-03-16 01:50:45.180928], "Asia/Kuala_Lumpur")
#DateTime<2024-03-16 09:50:45.180928+08:00 +08 Asia/Kuala_Lumpur>

iex> to_datetime_in_timezone(~N[2024-03-16 01:50:45.180928], "Etc/GMT-8")
#DateTime<2024-03-16 09:50:45.180928+08:00 +08 Etc/GMT-8>

"""
@spec to_datetime_in_timezone(NaiveDateTime.t() | DateTime.t(), String.t()) ::
DateTime.t()
def to_datetime_in_timezone(dt, timezone) do
dt |> Timex.to_datetime("UTC") |> Timex.Timezone.convert(timezone)
def to_datetime_in_timezone(%NaiveDateTime{} = naive, timezone) do
naive
|> DateTime.from_naive!("Etc/UTC")
|> to_datetime_in_timezone(timezone)
end

def to_datetime_in_timezone(%DateTime{} = dt, timezone) do
DateTime.shift_zone!(dt, timezone)
end

defp build_option(timezone_code, acc, now) do
Expand Down
18 changes: 10 additions & 8 deletions test/plausible/timezones_test.exs
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,8 @@ defmodule Plausible.TimezonesTest do

import Plausible.Timezones

doctest Plausible.Timezones, import: true

test "options/0 returns a list of timezones" do
options = options()
refute Enum.empty?(options)
Expand All @@ -23,20 +25,20 @@ defmodule Plausible.TimezonesTest do
assert to_utc_datetime(~N[2022-09-11 00:00:00], "Etc/UTC") == ~U[2022-09-11 00:00:00Z]

assert to_utc_datetime(~N[2022-09-11 00:00:00], "America/Santiago") ==
~U[2022-09-11 00:00:00Z]
~U[2022-09-11 04:00:00Z]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, thanks for spotting it @ruslandoga.

On CI the failure was different:

     left:  ~U[2022-09-11 03:00:00Z]
     right: ~U[2022-09-11 00:00:00Z]

As much as I'm in favor of using stdlib over Timex, I need better clarity on what happened and if we're doing the right thing here. Why has the test started failing? Was it updated TZ DB or DST change or something else entirely? Is stdlib using up to date information?

My confidence/knowledge about timezones is lacking unfortunately.

This was untested before 518cdb3.

America/Santiago was used here because it was problematic in a certain case - see 5c8f39a.

I expect more issues the coming days as lots of DSTs happen in March?

Paging @zoldar too

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why has the test started failing?

My guess at the time of this PR was that I started using a new actions@cache key. Now I'm not sure.

I'll need to read up on those PRs and check Timex logic to answer the rest.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On CI the failure was different:

Yeah, on CI it keeps switching between 03:00 and 04:00

I don't know the reason yet. But I'd like to blame tzdata.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was also #3873 done recently (and reverted via #3885)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, saw that one: #3811 (comment)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was it updated TZ DB or DST change or something else entirely?

Seems like a bug in Timex.

master
assert to_utc_datetime(~N[2022-09-11 00:00:00], "America/Santiago") ==
             ~U[2022-09-11 00:00:00Z]

Timex uses Tzdata in a way that doesn't resolve America/Santiago and Plausible falls back to UTC

Plausible.Timezones.to_utc_datetime(~N[2022-09-11 00:00:00], "America/Santiago") # {:error, {:could_not_resolve_timezone, "America/Santiago", 63830073600, :wall}}
  Timex.Protocol.NaiveDateTime.to_datetime(~N[2022-09-11 00:00:00], "America/Santiago") # {:error, {:could_not_resolve_timezone, "America/Santiago", 63830073600, :wall}}
    Timex.Timezone.convert(~N[2022-09-11 00:00:00], "America/Santiago") # {:error, {:could_not_resolve_timezone, "America/Santiago", 63830073600, :wall}}
      Timex.Timezone.get("America/Santiago", ~N[2022-09-11 00:00:00]) # {:error, {:could_not_resolve_timezone, "America/Santiago", 63830073600, :wall}}
        Timex.Timezone.name_of("America/Santiago") # "America/Santiago"
          Tzdata.zone_exists?("America/Santiago") # true
            Tzdata.zone_list() # [...]
        Timex.Timezone.get_info("America/Santiago", ~N[2022-09-11 00:00:00], :wall) # {:error, {:could_not_resolve_timezone, "America/Santiago", 63830073600, :wall}}
          Timex.to_gregorian_seconds(~N[2022-09-11 00:00:00]) # 63830073600 
            Timex.Protocol.NaiveDateTime.to_gregorian_seconds(~N[2022-09-11 00:00:00]) # 63830073600
          Tzdata.zone_exists?("America/Santiago") # true
            Tzdata.zone_list() # [...]
          Timex.Timezone.resolve("America/Santiago", 63830073600, :wall) # {:error, {:could_not_resolve_timezone, "America/Santiago", 63830073600, :wall}}
            Tzdata.periods_for_time("America/Santiago", 63830073600, :wall) # []
              Tzdata.possible_periods_for_zone_and_time("America/Santiago", 63830073600, :wall) # {:ok, []}
              Tzdata.do_consecutive_matching([], match_fn, [], false) # []

Here nesting corresponds to position in the call stack. The calls go from top to bottom. After comment is the return value. The returned values (after #) go in the reverse direction (up).

The important function call that makes everything fail is Timex.Timezone.resolve. It doesn't check for gaps when Tzdata.periods_for_time("America/Santiago", 63830073600, :wall) returns []

fix-timezones
assert to_utc_datetime(~N[2022-09-11 00:00:00], "America/Santiago") ==
             ~U[2022-09-11 04:00:00Z]

DateTime uses Tzdata (Timex just relays to it in https://github.com/bitwalker/timex/blob/7f584cef5794a5eac320ba957d17370849b0b212/lib/timezone/database.ex#L37) differently and resolves the timezone. It checks for gaps when Tzdata.periods_for_time("America/Santiago", 63830073600, :wall) returns []

Plausible.Timezones.to_utc_datetime(~N[2022-09-11 00:00:00], "America/Santiago")
  DateTime.from_naive(~N[2022-09-11 00:00:00], "America/Santiago")
    Calendar.get_time_zone_database() # Timex.Timezone.Database
    DateTime.from_naive(~N[2022-09-11 00:00:00], "America/Santiago", Timex.Timezone.Database) # {:gap, #DateTime<2022-09-10 23:59:59.999999-04:00 -04 America/Santiago>, #DateTime<2022-09-11 01:00:00-03:00 -03 America/Santiago>}
      Tzdata.TimeZoneDatabase.time_zone_periods_from_wall_datetime(~N[2022-09-11 00:00:00], "America/Santiago") # {:gap, {%{zone_abbr: "-04", utc_offset: -14400, std_offset: 0, until_wall: ~N[2022-09-11 00:00:00], from_wall: ~N[2022-04-02 23:00:00]}, ~N[2022-09-11 00:00:00]}, {%{zone_abbr: "-03", utc_offset: -14400, std_offset: 3600, until_wall: ~N[2023-04-02 00:00:00], from_wall: ~N[2022-09-11 01:00:00]}, ~N[2022-09-11 01:00:00]}}
        Tzdata.periods_for_time("America/Santiago", 63830073600, :wall) # []
        Tzdata.TimeZoneDatabase.gap_for_time_zone("America/Santiago", 63830073600) # {:gap, {%{zone_abbr: "-04", utc_offset: -14400, std_offset: 0, until_wall: ~N[2022-09-11 00:00:00], from_wall: ~N[2022-04-02 23:00:00]}, ~N[2022-09-11 00:00:00]}, {%{zone_abbr: "-03", utc_offset: -14400, std_offset: 3600, until_wall: ~N[2023-04-02 00:00:00], from_wall: ~N[2022-09-11 01:00:00]}, ~N[2022-09-11 01:00:00]}}
          Tzdata.periods("America/Santiago") # {:ok, [...]}
      DateTime.from_naive_with_period(~N[2022-09-10 23:59:59.999999], "America/Santiago", %{zone_abbr: "-04", utc_offset: -14400, std_offset: 0, until_wall: ~N[2022-09-11 00:00:00], from_wall: ~N[2022-04-02 23:00:00]}) # #DateTime<2022-09-10 23:59:59.999999-04:00 -04 America/Santiago>
      DateTime.from_naive_with_period(~N[2022-09-11 01:00:00], "America/Santiago", %{zone_abbr: "-03", utc_offset: -14400, std_offset: 3600, until_wall: ~N[2023-04-02 00:00:00], from_wall: ~N[2022-09-11 01:00:00]}) #DateTime<2022-09-11 01:00:00-03:00 -03 America/Santiago>
  DateTime.shift_zone!(#DateTime<2022-09-11 01:00:00-03:00 -03 America/Santiago>, "Etc/UTC", Timex.Timezone.Database)
  # etc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Timex is not checking for gaps when doing timezone conversion. This means all "winter time to summer time" gaps would be unhandled. Timex does handle "summer to winter" (ambiguous) time however.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why has the test started failing?

tzdata's fault ;)

master

iex> Tzdata.tzdata_version()
"2021e" # this version didn't include gap for Santiago on 2022-09-11 (probably cause it's in the future?)
iex> Plausible.Timezones.to_utc_datetime(~N[2022-09-11 00:00:00], "America/Santiago")
~U[2022-09-11 03:00:00Z]
iex> Tzdata.DataLoader.download_new()
{:ok, 451270, "2024a",
 "/var/folders/cr/fjw4jzx53ybbrl325z6nwd6h0000gn/T/tzdata_data/tmp_downloads/451270_80870041/",
 "Thu, 01 Feb 2024 18:40:48 GMT"}
iex> Tzdata.DataBuilder.load_and_save_table()
{:ok, 451270, "2024a"}
iex> Tzdata.EtsHolder.new_release_has_been_downloaded()
:ok
iex> Tzdata.tzdata_version()
"2024a"
iex> Plausible.Timezones.to_utc_datetime(~N[2022-09-11 00:00:00], "America/Santiago")
~U[2022-09-11 00:00:00Z]

So depending on when the tests for timezones run, tzdata might or might have not had enough time to update the database. If the version is 2021e, the tests fail with ~U[2022-09-11 03:00:00Z] != ~U[2022-09-11 00:00:00Z]. It showed up in my ci branch since I started using a new cache key and tzdata wasn't cached.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#3811 is also partly at fault since it started using System.tmp_dir! for storing tzdata stuff which is not cached in CI, that causes the flipping in #3899 (comment)

fix-timezones

iex> Tzdata.tzdata_version()
"2021e"
iex> Plausible.Timezones.to_utc_datetime(~N[2022-09-11 00:00:00], "America/Santiago")
~U[2022-09-11 03:00:00Z]
iex> Tzdata.DataLoader.download_new()
{:ok, 451270, "2024a",
 "/var/folders/cr/fjw4jzx53ybbrl325z6nwd6h0000gn/T/tzdata_data/tmp_downloads/451270_80870041/",
 "Thu, 01 Feb 2024 18:40:48 GMT"}
iex> Tzdata.DataBuilder.load_and_save_table()
{:ok, 451270, "2024a"}
iex> Tzdata.EtsHolder.new_release_has_been_downloaded()
:ok
iex> Tzdata.tzdata_version()
"2024a"
iex> Plausible.Timezones.to_utc_datetime(~N[2022-09-11 00:00:00], "America/Santiago")
~U[2022-09-11 04:00:00Z]

Fixed in #3898


assert to_utc_datetime(~N[2023-10-29 00:00:00], "Atlantic/Azores") == ~U[2023-10-29 01:00:00Z]
end

test "to_date_in_timezone/1" do
assert to_date_in_timezone(~D[2021-01-03], "Etc/UTC") == ~D[2021-01-03]
assert to_date_in_timezone(~U[2015-01-13 13:00:07Z], "Etc/UTC") == ~D[2015-01-13]
assert to_date_in_timezone(~N[2015-01-13 13:00:07], "Etc/UTC") == ~D[2015-01-13]
assert to_date_in_timezone(~N[2015-01-13 19:00:07], "Etc/GMT+12") == ~D[2015-01-14]
assert to_date_in_timezone(~N[2015-01-13 19:00:07], "Etc/GMT+12") == ~D[2015-01-13]
assert to_date_in_timezone(~N[2015-01-13 11:00:07], "Etc/GMT+12") == ~D[2015-01-12]
assert to_date_in_timezone(~N[2015-01-13 19:00:07], "Etc/GMT-12") == ~D[2015-01-14]
end

test "to_datetime_in_timezone/1" do
assert to_datetime_in_timezone(~D[2021-01-03], "Etc/UTC") == ~U[2021-01-03 00:00:00Z]
assert to_datetime_in_timezone(~N[2015-01-13 13:00:07], "Etc/UTC") == ~U[2015-01-13 13:00:07Z]

assert to_datetime_in_timezone(~N[2015-01-13 19:00:07], "Etc/GMT+12") ==
Expand All @@ -45,13 +47,13 @@ defmodule Plausible.TimezonesTest do
second: 7,
calendar: Calendar.ISO,
month: 1,
day: 14,
day: 13,
year: 2015,
minute: 0,
hour: 7,
time_zone: "Etc/GMT-12",
zone_abbr: "+12",
utc_offset: 43_200,
time_zone: "Etc/GMT+12",
zone_abbr: "-12",
utc_offset: -43_200,
std_offset: 0
}

Expand Down