Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wrong epoch timestamp after Daylight Savings Time #3195

Closed
pranavmarla opened this issue Dec 9, 2020 · 10 comments · Fixed by #3524
Closed

Wrong epoch timestamp after Daylight Savings Time #3195

pranavmarla opened this issue Dec 9, 2020 · 10 comments · Fixed by #3524
Labels

Comments

@pranavmarla
Copy link

pranavmarla commented Dec 9, 2020

Describe the bug

Summary

Essentially, when Fluentd is told to parse a UTC timestamp present in incoming logs and convert it to the equivalent epoch timestamp, but Fluentd is installed in a location where the local time zone's offset from UTC changes (eg. due to Daylight Savings Time) then, every time the offset changes, Fluentd will generate the wrong epoch timestamp for all subsequent logs. Currently, the only way to fix this is to restart Fluentd after the offset changes, so that it "sees" the new UTC offset, every time the offset changes.

I believe this is effectively the same issue as #2806, opened back in Jan.

Details

I have Fluentd agents which receive logs, process them (including extracting the preexisting UTC timestamp in the log, generating the equivalent epoch timestamp, and setting that as the official log timestamp) and then forward them on. The host machines (where the Fluentd agents are installed) are located in the Eastern time zone (ET).

On Nov 1, Daylight Savings Time (DST) ended -- i.e. the local time zone on the hosts changed from Eastern Daylight Time (EDT) (UTC - 4) to Eastern Standard Time (EST) (UTC - 5). After that, I noticed that the logs coming from Fluentd suddenly had the wrong timestamp -- specifically, they were 1 hour in the future.

As a short term fix, I was able to resolve this issue by simply restarting all the agents (till then, they had been running continuously since Sep). However, based on my testing, it looks like there is a bug in the way Fluentd generates its timestamp and, when DST begins again on March 14 2021, I expect a similar issue to occur, except that this time the timestamp will be 1 hour in the past -- thus, a long term fix needs to be made to Fluentd's code.

Steps to Reproduce

Summary

Another way of restating the problem is that Fluentd will ONLY generate the correct epoch timestamp when:

  • Fluentd thinks DST is currently in effect on the host, and the preexisting UTC timestamp in the log is also from a date when DST would have been in effect on the host
  • Fluentd thinks DST is not currently in effect on the host, and the preexisting UTC timestamp in the log is also from a date when DST would not have been in effect on the host

Thus, we can clearly demonstrate this bug by:

  • Manually generating two input logs, one containing a pre-Nov UTC timestamp (when DST would have been in effect) and the other containing a post-Nov UTC timestamp (when DST would not have been in effect), to supply to Fluentd for processing.
  • Manually changing the date on the underlying host to a pre-Nov date (when DST would have been in effect) and a post-Nov date (when DST would not have been in effect) -- followed, in each case, by restarting Fluentd afterwards so that it's aware of the new DST/non-DST state of the underlying host.
  • Noticing that, when the two are not in sync, the epoch timestamp generated by Fluentd will be off by 1 hour.

Details

Fluentd Config

To demonstrate this bug, I've created a simple Fluentd config that accepts input logs via HTTP, parses the UTC timestamp contained within, and prints the processed logs to stdout.
Note that, for clarity, I print each log to stdout twice -- once with an epoch timestamp, and once with a regular Fluentd timestamp (i.e. a well-formatted timestamp in the local time zone).

# Input
<source>
  @type http
  port 9880
  body_size_limit 32m

  # Parse preexisting UTC timestamp, present in the log
  <parse>
    @type json
    time_key time
    keep_time_key true
    time_type string
    time_format %Y-%m-%dT%H:%M:%S.%NZ
    utc true
  </parse>
</source>

# Throw away internal Fluentd logs
<label @FLUENT_LOG>
  <match fluent.**>
    @type null
  </match>
</label>

# Outputs
<match **>
  @type copy

  # Print log with epoch timestamp
  <store ignore_error>
    @type stdout
    <format>
      @type out_file
      time_type float
    </format>
  </store>

  # Print log with regular timestamp
  <store ignore_error>
    @type stdout
  </store>
</match>

Test Input

We will create two sample logs to send to Fluentd, that are identical except for the month of their UTC timestamp

  1. DST Log: Log with UTC timestamp set to Sep 8, 2020 at 8 pm (when DST would have been in effect on the host):
    2020-09-08T20:00:00.000000000Z
  2. Non-DST Log: Log with UTC timestamp set to Dec 8, 2020 at 8 pm (when DST would not have been in effect on the host):
    2020-12-08T20:00:00.000000000Z

Expected behaviour

Each UTC timestamp should ALWAYS be converted by Fluentd to the SAME equivalent epoch timestamp, regardless of the local time zone on the host! After all, by definition, epoch timestamps are supposed to be a time zone-agnostic way of referring to a particular point in time.

  1. Sep 8 2020, 8 pm UTC -> 1599595200.00000000
  2. Dec 8 2020, 8 pm UTC -> 1607457600.00000000

Steps

Scenario 1: DST in effect on the host
  1. Ensure Fluentd is not running
> systemctl stop td-agent
  1. To simulate DST being in effect on the host, manually change the date to any pre-Nov date
> date --set="Sep 8 2020 16:14:00"
Tue Sep  8 16:14:00 EDT 2020

Note: By default, at least on my machine, chronyd notices that the date is wrong, and fixes it, within ~2 minutes!

  1. Confirm that the local time zone on the host is now EDT (UTC - 4)
> timedatectl
                      Local time: Tue 2020-09-08 16:14:01 EDT
                  Universal time: Tue 2020-09-08 20:14:01 UTC
                        RTC time: Tue 2020-12-08 21:14:31
                       Time zone: America/New_York (EDT, -0400)
       System clock synchronized: no
systemd-timesyncd.service active: yes
                 RTC in local TZ: no
  1. Restart Fluentd
> systemctl restart td-agent
  1. Tail Fluentd's log file
> tail -f /var/log/td-agent/td-agent.log
...
2020-09-08 16:14:06 -0400 [info]: #0 fluentd worker is now running worker=0
  1. Send DST log to Fluentd, and view the output (after processing) in Fluentd's log file
> curl -X POST -d '{"log":"Test log", "time":"2020-09-08T20:00:00.000000000Z"}' http://localhost:9880/test
> tail -f /var/log/td-agent/td-agent.log
...
1599595200.00000000                   test    {"log":"Test log","time":"2020-09-08T20:00:00.000000000Z"}
2020-09-08 16:00:00.000000000 -0400   test:   {"log":"Test log","time":"2020-09-08T20:00:00.000000000Z"}

Compare the expected epoch timestamp to the actual epoch timestamp:

Expected Epoch Timestamp: 1599595200.00000000
Actual Epoch Timestamp:   1599595200.00000000

Thus, since DST is in effect on the host, and this is a DST log, Fluentd generates the correct epoch timestamp.

  1. Send non-DST log to Fluentd, and view the output (after processing) in Fluentd's log file
> curl -X POST -d '{"log":"Test log", "time":"2020-12-08T20:00:00.000000000Z"}' http://localhost:9880/test
> tail -f /var/log/td-agent/td-agent.log
...
1607461200.00000000                   test    {"log":"Test log","time":"2020-12-08T20:00:00.000000000Z"}
2020-12-08 16:00:00.000000000 -0500   test:   {"log":"Test log","time":"2020-12-08T20:00:00.000000000Z"}

Compare the expected epoch timestamp to the actual epoch timestamp:

Expected Epoch Timestamp: 1607457600.00000000
Actual Epoch Timestamp:   1607461200.00000000 

Note that the actual epoch timestamp is off by 1 hour (in the future) -- i.e. it corresponds to Dec 8, 2020 at 9 pm UTC!

Thus, since DST is in effect on the host, but this is a non-DST log, Fluentd generates the wrong epoch timestamp!
This corresponds to the situation described at the beginning of this issue, which led me to discover this bug: After Nov 1, all new incoming logs were non-DST logs but, since my Fluentd agent had been running continuously since Sep, it apparently, wrongly, thought that DST was still in effect on the underlying host.

  1. Stop Fluentd
> systemctl stop td-agent
> tail -f /var/log/td-agent/td-agent.log
...
2020-09-08 16:14:27 -0400 [info]: Worker 0 finished with status 0

################

Scenario 2: DST not in effect on the host
  1. Ensure Fluentd is not running
> systemctl stop td-agent
  1. In my case, since DST is currently not in effect, I just confirm the host is set to the correct local time. In general, need to ensure the host date is set to any post-Nov (and pre-March) date
> date
Tue Dec  8 16:24:30 EST 2020
  1. Confirm that the local time zone on the host is EST (UTC - 5):
> timedatectl
                      Local time: Tue 2020-12-08 16:24:44 EST
                  Universal time: Tue 2020-12-08 21:24:44 UTC
                        RTC time: Tue 2020-12-08 21:24:44
                       Time zone: America/New_York (EST, -0500)
       System clock synchronized: no
systemd-timesyncd.service active: yes
                 RTC in local TZ: no
  1. Restart Fluentd
> systemctl restart td-agent
  1. Tail Fluentd's log file
> tail -f /var/log/td-agent/td-agent.log
...
2020-12-08 16:28:09 -0500 [info]: #0 fluentd worker is now running worker=0
  1. Send DST log to Fluentd, and view the output (after processing) in Fluentd's log file
> curl -X POST -d '{"log":"Test log", "time":"2020-09-08T20:00:00.000000000Z"}' http://localhost:9880/test
> tail -f /var/log/td-agent/td-agent.log
...
1599591600.00000000                   test    {"log":"Test log","time":"2020-09-08T20:00:00.000000000Z"}
2020-09-08 15:00:00.000000000 -0400   test:   {"log":"Test log","time":"2020-09-08T20:00:00.000000000Z"}

Compare the expected epoch timestamp to the actual epoch timestamp:

Expected Epoch Timestamp: 1599595200.00000000
Actual Epoch Timestamp:   1599591600.00000000

Note that the actual epoch timestamp is off by 1 hour (in the past) -- i.e. it corresponds to Sep 8, 2020 at 7 pm UTC!

Thus, since DST is not in effect on the host, but this is a DST log, Fluentd generates the wrong epoch timestamp!
This corresponds to what the situation will be after March 14, 2021: After March 14, all new incoming logs will be DST logs but, since my Fluentd agent will have been running continuously since Dec, it will wrongly think that DST is still not in effect on the underlying host.

  1. Send non-DST log to Fluentd, and view the output (after processing) in Fluentd's log file
> curl -X POST -d '{"log":"Test log", "time":"2020-12-08T20:00:00.000000000Z"}' http://localhost:9880/test
> tail -f /var/log/td-agent/td-agent.log
...
1607457600.00000000                   test    {"log":"Test log","time":"2020-12-08T20:00:00.000000000Z"}
2020-12-08 15:00:00.000000000 -0500   test:   {"log":"Test log","time":"2020-12-08T20:00:00.000000000Z"}

Compare the expected epoch timestamp to the actual epoch timestamp:

Expected Epoch Timestamp: 1607457600.00000000
Actual Epoch Timestamp:   1607457600.00000000 

Thus, since DST is not in effect on the host, and this is a non-DST log, Fluentd generates the correct epoch timestamp.

  1. Stop Fluentd
> systemctl stop td-agent
> tail -f /var/log/td-agent/td-agent.log
...
2020-12-08 16:35:47 -0500 [info]: Worker 0 finished with status 0

Environment

  • td-agent version: 4.0.1
  • Fluentd version: 1.11.2
  • Operating system:
> cat /etc/os-release
NAME="Ubuntu"
VERSION="18.04.5 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.5 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic
  • Kernel version:
> uname -r
4.15.0-122-generic
@pranavmarla
Copy link
Author

@repeatedly When you have a chance, would you mind taking a look at this?

@kenhys
Copy link
Contributor

kenhys commented Dec 9, 2020

Hmm, It may need to reset TimeParser after DST.

@pranavmarla
Copy link
Author

pranavmarla commented Jan 17, 2021

@kenhys Just noticed the label that was assigned to this issue -- I'm not sure "feature request" is the appropriate label for this. This issue is describing a flaw in existing functionality, not requesting brand new functionality -- as such, I believe this should be assigned the "bug report" label instead.

@pranavmarla
Copy link
Author

pranavmarla commented Jan 19, 2021

Better Workaround:

As mentioned in my original post, one workaround for Fluentd's Daylight Savings Time (DST) bug is to restart Fluentd after every DST change on the underlying host. However, that is error-prone, inconvenient and not sustainable long-term.

Thankfully, I have discovered another, better workaround: Simply change the time zone of the underlying host (where Fluentd is installed) to UTC (and restart Fluentd afterwards so it sees the new time zone) -- since the host time zone no longer has DST, it will no longer trigger Fluentd's DST bug. Unlike the earlier workaround, this only needs to be done once per host.

ps: To be clear, this is just a workaround for the DST bug, not an actual solution for it! A true, long-term solution would be to actually fix the bug in Fluentd's source code, so that these workarounds are not required.

@abhichandra21
Copy link

@pranavmarla in my case, changing the timezone in UTC would not be feasible for reasons unrelated to fluentd. I may have to use the restart of the service for the change to take effect, which is inconvinient.

As you mentioned, the issue I am seeing is due to Daylight saving time change.
This is my config:

<label @HOURLY>
  <filter tcp.xyz>
    @type grep
    <regexp>
      key logMessage
      pattern /(.+)/
    </regexp>
  </filter>
  <filter tcp.xyz>
    @type base64_decode
    fields logMessage
  </filter>
  <match tcp.xyz>
    @type file
    path /opt/active/%Y%m%d/%Y%m%d%H
    append true
    <format>
      @type single_value
      message_key logMessage
      add_newline false
    </format>
    <buffer time>
      type memory
      timekey 1h
      timekey_wait 0
      timekey_use_utc false
      chunk_limit_size 85MB
      flush_mode immediate
      flush_thread_count 8
    </buffer>
  </match>
</label>

When the Daylight saving time changes, it keeps writing to a log file that is one hour in future or behind.

Sun Mar 14 03:20:11 CDT 2021
total 4104
-rw-r--r-- 1 user group    6317 Mar 14 01:59 2021031401.log
-rw-r--r-- 1 user group 3023622 Mar 14 03:20 2021031402.log

@repeatedly , @kenhys are there any plans to fix it in a future release?

@ashie
Copy link
Member

ashie commented Oct 5, 2021

In this case, the following parser:

fluentd/lib/fluent/time.rb

Lines 258 to 260 in 6a3b221

else
->(v){ t = Time.strptime(v, format); Fluent::EventTime.new(t.to_i + offset_diff, t.nsec) }
end

and the following offset_diff are used:

else Time.now.localtime.utc_offset # utc

Since always Time.now is used to get offset, it's not appropriate for incoming messages which might have past or future timestamp.

@ashie
Copy link
Member

ashie commented Oct 5, 2021

@pranavmarla Although the above code has fault, you can avoid it by your configuration.

curl -X POST -d '{"log":"Test log", "time":"2020-12-08T20:00:00.000000000Z"}' http://localhost:9880/test

Since the trailing Z in your timestamp means UTC, it should be parsed as timezone.
But your configuration doesn't do it.
It should be:

-    time_format %Y-%m-%dT%H:%M:%S.%NZ
+    time_format %Y-%m-%dT%H:%M:%S.%N%z

It will create correct epoch times.

ashie added a commit that referenced this issue Oct 5, 2021
Shouldn't use `Time.now.localtime.utc_offset` as time offset for event
times because incoming events might have past or future time with
different DST.

In addition, require strptime 0.2.4 or later to parse `Z` as UTC:
nurse/strptime@39bbe26

Fix #3195

Signed-off-by: Takuro Ashie <ashie@clear-code.com>
ashie added a commit that referenced this issue Oct 6, 2021
… Time

Fluentd generates wrong epoch timestamp when a configuration doesn't
specify timezone (should be parsed as UTC) and Daylight Saving Time is
started or ended while running Fluentd. It's because utc_offset at
start up is always used to convert localtime to UTC.

This commit parses event times correctly as UTC without complicated
calculation by adding `%z` to a time format and `Z` to a time value on
calling `strptime` for such cases.

In addition, require strptime 0.2.4 or later to parse `Z` as UTC:
nurse/strptime@39bbe26

Fix #3195

Signed-off-by: Takuro Ashie <ashie@clear-code.com>
@pranavmarla
Copy link
Author

@pranavmarla Although the above code has fault, you can avoid it by your configuration.

curl -X POST -d '{"log":"Test log", "time":"2020-12-08T20:00:00.000000000Z"}' http://localhost:9880/test

Since the trailing Z in your timestamp means UTC, it should be parsed as timezone. But your configuration doesn't do it. It should be:

-    time_format %Y-%m-%dT%H:%M:%S.%NZ
+    time_format %Y-%m-%dT%H:%M:%S.%N%z

It will create correct epoch times.

Thanks for taking a look at this @ashie! It's been a long time since I've worked on this code but:

  1. As per Ruby's strftime() documentation (linked to by Fluentd's time_format documentation), %z seems to be meant only for numerical offsets from UTC:

    %z - Time zone as hour and minute offset from UTC (e.g. +0900)

    Even in their examples, they only use %z for numerical offsets, not for the literal character 'Z' (designating UTC):

    %H%M%S%z => 083748-0600 Local time and the difference from UTC (basic)
    ...
    %Y%jT%H%MZ => 2007323T0837Z Ordinal date and UTC of day (basic)

    Unless their documentation is wrong/incomplete, or I'm misinterpreting it, it seems to imply %z cannot be used to interpret the letter Z as UTC.

  2. If I understand correctly, you're saying that this issue is partly caused because Fluentd does not know what time zone the log timestamp is in (i.e. it does not know that the log timestamp is in UTC).
    But, in my configuration, I set the utc param to true:

      # Parse preexisting UTC timestamp, present in the log
      <parse>
        ...
        utc true
      </parse>
    

    Doesn't that parameter tell Fluentd that the log timestamp is in UTC?

@ashie
Copy link
Member

ashie commented Oct 6, 2021

1. As per Ruby's [strftime()](https://docs.ruby-lang.org/en/2.4.0/Time.html#method-i-strftime) documentation (linked to by Fluentd's [time_format](https://docs.fluentd.org/configuration/parse-section#time-parameters) documentation), `%z` seems to be meant only for numerical offsets from UTC:

I'm not sure the exact specification of the API but I confirmed actual behaviour:

$ irb
irb(main):001:0> require 'time'
=> true
irb(main):002:0> Time.strptime("2020-12-08T20:00:00.000000000Z", "%Y-%m-%dT%H:%M:%S.%N%z")
=> 2020-12-08 20:00:00 UTC

And strptime gem also support it: nurse/strptime@39bbe26

Doesn't that parameter tell Fluentd that the log timestamp is in UTC?

Of course it intends to tell Fluentd the timezone.
But Fluentd uses a wrong time offset in this case due to the bug: #3524

@pranavmarla
Copy link
Author

Cool, thanks for explaining @ashie !

ashie added a commit that referenced this issue Oct 7, 2021
Signed-off-by: Takuro Ashie <ashie@clear-code.com>
ashie added a commit that referenced this issue Oct 7, 2021
… Time

Fluentd generates wrong epoch timestamp when a configuration doesn't
specify timezone (should be parsed as UTC) and Daylight Saving Time is
started or ended while running Fluentd. It's because utc_offset at
start up is always used to convert localtime to UTC.

This commit parses event times correctly as UTC without complicated
calculation by adding `%z` to a time format and `Z` to a time value on
calling `strptime` for such cases.

In addition, require strptime 0.2.4 or later to parse `Z` as UTC:
nurse/strptime@39bbe26

Fix #3195

Signed-off-by: Takuro Ashie <ashie@clear-code.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants