Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Supported 'timezone' configuration parameter. (e.g. "+09:00" for JST) #430

Closed
wants to merge 41 commits into from
Closed

Supported 'timezone' configuration parameter. (e.g. "+09:00" for JST) #430

wants to merge 41 commits into from

Conversation

TakahikoKawasaki
Copy link

This commit adds 'timezone' configuration parameter which enables users to specify an arbitrary timezone. The following is an example of the usage.

<match pattern>
  type file
  path /var/log/fluent/myapp
  time_slice_format %Y%m%d
  time_slice_wait 10m
  time_format %Y%m%dT%H%M%S%z
  compress gzip
  timezone +09:00
</match>

The current implementation of fluentd supports 'localtime' and 'utc'. This commit adds an arbitrary timezone there.

The value format of 'timezone' is the format that Time.localtime(utc_offset) accepts. For example, "+09:00".

@repeatedly repeatedly self-assigned this Sep 19, 2014
@repeatedly
Copy link
Member

Cool!
I and @tagomoris just discussed this problem in RubyKaigi.

One question. Can we use 'JST' like parameter instead of '+0.900'?

@TakahikoKawasaki
Copy link
Author

Because Time.localtime(utc_offset) accepts "+HH:MM" or "-HH:MM" only, "JST" would raise ArgumentError. To support formats such as "JST" and "America/New_York", another patch is needed.

@tagomoris
Copy link
Member

@sorah teach me about tzinfo.gem for timezone name resolver. He also says that Rails depends on it.
I think that this gem (built on IANA timezone database) is worth to be added in fluentd dependency.

@frsyuki
Copy link
Member

frsyuki commented Sep 23, 2014

FYI: tzinfo does not include (most of) 3-character names such as "JST" or "PDT". It does include "Asia/Tokyo" and "America/Los_Angeles", though.

@tagomoris
Copy link
Member

ouch.

@TakahikoKawasaki
Copy link
Author

I'll investigate the way to support the three formats:

  1. Offset values (+HH:MM, -HH:MM)
  2. Short names (e.g. JST)
  3. Long names (e.g. Asia/Tokyo)

and hopefully come back with an updated patch.

…rameter.

  1. +HH:MM, -HH:MM  (e.g. "+09:00")
  2. Some time zone abbreviations  (e.g. "JST")
  3. Region/Zone  (e.g. "Asia/Tokyo")
@TakahikoKawasaki
Copy link
Author

Updated the patch. All the three formats mentioned above are supported.

Note 1

According to http://stackoverflow.com/a/18407231/1174054, "Time zone abbreviations are not officially coordinated by anyone." Therefore, the number of time zone abbreviations supported by this patch is limited. To be concrete, this patch supports (1) the abbreviations listed in Time.ZoneOffset (see "time.rb" in Ruby source tree) and (2) the abbreviations listed in Fluent::TimeZone.ZoneOffset (see "lib/fluent/timezone.rb" added by this patch). The concrete values are as follows:

From "time.rb" (depending on Ruby version)

UTC, Z, UT, GMT, EST, EDT, CST, CDT, MST, MDT, PST, PDT,
A, B, C, D, E, F, G, H, I, K, L, M, N, O, P, Q, R, S, T, U, V, W, X, Y

From "lib/fluent/timezone.rb"

JST

Note 2

This patch introduces dependencies on "tzinfo" and "tzinfo-data". See "fluentd.gemspec".

Note 3

Detailed description about the behaviour of the "timezone" configuration parameter is as follows.

If "timezone" is defined and its value is valid (= if Fluent::TimeZone.offset(value) returns a non-nil value), the configuration takes precedence over "localtime" and "utc". Otherwise, "localtime" and "utc" behave as heretofore.

@frsyuki
Copy link
Member

frsyuki commented Sep 24, 2014

Interesting. I thought short names are also managed by IANA because IANA's data and tzinfo-data (automatically generated from IANA's data) include short names:

@repeatedly
Copy link
Member

@frsyuki So we don't need this line https://github.com/fluent/fluentd/pull/430/files#diff-308686d0f5582f5b94772fa4c3e71abcR28 ?

I think PR's approach is good for me but I can't judge code is good or not.
I want another reviewer who knows tzinfo gem.

@frsyuki
Copy link
Member

frsyuki commented Sep 24, 2014

@repeatedly we do need the line to support 'JST'. It would be great if we can automatically generate ZoneOffset hash from tzinfo-data but it's optional.

@nurse
Copy link
Contributor

nurse commented Sep 25, 2014

I doubt this feature.

Short names

As @TakahikoKawasaki says, short names are not coordinated. It means they may conflict between some timezones. fluentd shouldn't run into such hell.

Timezone may change

You may know, timezone are sometimes changed. For example Russia, they changes the timezone at 2014-10-26 Windows Update. It means if you miss to update timezone database, servers send wrong time.

@TakahikoKawasaki
Copy link
Author

There are many conflicts in time zone abbreviations. Examples from List of time zone abbreviations are as follows.

AMT: [Amazon Time (Brazil), UTC-04], [Armenia Time, UTC+04]
AST: [Arabia Standard Time, UTC+03], [Atlantic Standard Time, UTC-04]
BST: [Bangladesh Standard Time, UTC+06], [British Summer Time, UTC+01]
CDT: [Central Daylight Time (North America), UTC-05], [Cuba Daylight Time, UTC-04]
CST: [Central Standard Time (North Amer9ca), UTC-06], [China Standard Time, UTC+08], [Central Standard Time (Australia), UTC+09:30], [Central Summer Time (Australia), UTC+10:30], [Cuba Standard Time, UTC-05]
and many...

Therefore, generating ZoneOffset hash from tzinfo-data automatically would bring about the mess. Picking up time zone abbreviations requires careful consideration.

Time zone abbreviations listed in "time.rb" (in the Ruby source tree) are based on ISO 8601, RFC 822, RFC 1123 and RFC 2822.

Time zone abbreviations listed in "timezone.rb" (in this patch) are just "JST". The reasons I picked it up are as follows.

  1. There are no well-known conflicts on "JST".
  2. "JST" is not contained in Time.ZoneOffset (time.rb).
  3. "JST" is not covered by TZInfo::TimeZone.get(timezone).

BTW, time zone identifiers recognized by tzinfo that do not contain '/' can be listed by typing like below.

$ ruby -e "require 'tzinfo'; TZInfo::Timezone.all_identifiers.each {|value| puts value}" \
  | grep -v /
CET
CST6CDT
Cuba
EET
EST
EST5EDT
Egypt
Eire
GB
GB-Eire
GMT
GMT+0
GMT-0
GMT0
Greenwich
HST
Hongkong
Iceland
Iran
Israel
Jamaica
Japan
Kwajalein
Libya
MET
MST
MST7MDT
NZ
NZ-CHAT
Navajo
PRC
PST8PDT
Poland
Portugal
ROC
ROK
Singapore
Turkey
UCT
UTC
Universal
W-SU
WET
Zulu

@nurse
Copy link
Contributor

nurse commented Sep 25, 2014

tagomoris teach me that this parses external output, so this feature is needed.

@repeatedly
Copy link
Member

After several discussions, I knew short name is nightmare so we should not support short name configuration.
To support only +0:900 or Asia/Tokyo like long name is better.

What do you think?

@frsyuki
Copy link
Member

frsyuki commented Sep 25, 2014

@repeatedly I think that idea is good as long as documents mention about the limitation.

@tagomoris
Copy link
Member

+1 for @repeatedly 's comment.
And we need some links to timezone list in documents.

@nurse
Copy link
Contributor

nurse commented Sep 25, 2014

+1 for @repeatedly
Support offset or tzdata's name sounds reasonable.
http://www.iana.org/time-zones
http://en.wikipedia.org/wiki/Tz_database

@TakahikoKawasaki
Copy link
Author

IMHO, well-known short names are useful and won't do harm if they are well-documented. The standard 'time' library of Ruby supports, although limited, some well-known short names as mentioned so far.

In any case, even if Time.ZoneOffset hash and Fluent::TimeZone.ZoneOffset hash are not referred to from within the implementation of Fluent::TimeZone.offset method, short names recognized by 'tzinfo' (listed in my last comment) are still usable as the value of 'timezone' configuration parameter.

@frsyuki
Copy link
Member

frsyuki commented Sep 25, 2014

@TakahikoKawasaki which list should we use for the short names? For example, this list you mentioned?:

UTC, Z, UT, GMT, EST, EDT, CST, CDT, MST, MDT, PST, PDT,
A, B, C, D, E, F, G, H, I, K, L, M, N, O, P, Q, R, S, T, U, V, W, X, Y

@yorkxin
Copy link

yorkxin commented Sep 25, 2014

I don't know much about fluentd but I have once surveyed the tzinfo gem (and IANA database). I think it is better to use time zone region (Asia/Tokyo) instead of a fixed offset (+09:00), because for some regions that have daylight saving time, the offset may change.

Also, the offset will be changed according to political reason, for example, Taiwan, when it was Japan's territory, during WW2 it adopted the same timezone as Asia/Tokyo i.e. +09:00, instead of +08:00. The abbreviation changes due to political reason too, for example, Japanese Taiwan adopts JWT (Japan Western Time, +8) before WW2, and changed to JCT (Japan Central Time, +9) during WW2, and back to JWT after WW2.

IANA TZ database uses regions (Asia/Tokyo) as unique indexes, and in Rails the time zone selector also stores region instead of offset. Even JST is non-ambiguous, for people living in other countries and the future, it's better to represent time zone by region, although it is much longer.

All offsets including DST is recorded in IANA TZ Database but system admins have to update them regularly. Currently tzinfo gem takes tz database in unix system, so if the OS is up-to-date, future updates will be reflected to the applications -- even if not up-to-date, they can be corrected as log as we store UTC. But still there are some system that don't have latest tz database in the OS, in this case there is a tzinfo-data gem that converts TZ database into Ruby code that tzinfo gem can understand (this is how tzinfo was used before, and is also the only way to use tzinfo gem in Windows.)

@nurse
Copy link
Contributor

nurse commented Sep 25, 2014

I don't know much about fluentd but I have once surveyed the tzinfo gem (and IANA database). I think it is better to use time zone region (Asia/Tokyo) instead of a fixed offset (+09:00), because for some regions that have daylight saving time, the offset may change.

The fact "the offset may change" is not the real problem. The real issue is to follow the generator of the log by fluentd. Even if the other real world changes the timezone, the fluentd daemon must stay old definition until the related generator uses old definition.

@TakahikoKawasaki
Copy link
Author

Since it seems controversial, it may be better to ignore short names completely. It can be implemented like below.

def offset(timezone)
  if timezone.nil?
    return nil
  end

  # [+-]HH:MM, [+-]HH
  if %r{\A[+-]\d\d(:?\d\d)?\z} =~ timezone
    return Time.zone_offset(timezone)
  end

  # Region/Zone
  if %r{\A[^/]+/[^/]+\z} =~ timezone
    begin
      return TZInfo::Timezone.get(timezone).current_period.offset.utc_total_offset
    rescue
      return nil
    end
  end

  return nil
end

Should I update the patch?

@TakahikoKawasaki
Copy link
Author

Updated the patch. Supported formats of the 'timezone' parameter are as follows. Time zone abbreviations such as PST and JST are not supported intentionally.

  1. [+-]HH:MM (e.g. "+09:00")
  2. [+-]HHMM (e.g. "+0900")
  3. [+-]HH (e.g. "+09")
  4. Region/Zone (e.g. "Asia/Tokyo")

@repeatedly
Copy link
Member

@TakahikoKawasaki Cool!
Could you add several tests to test/test_formatter.rb and related plugins?

@@ -53,6 +53,10 @@ def configure(conf)
@localtime = false
end

if conf['timezone']
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these lines for out_exec ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for symmetry. I applied the same relations among timezone, localtime and utc to all the existing parts where localtime appears.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe, this code is not needed because in_exec doesn't use localtime and utc parameters.
I'm not sure why in_exec accepts these parameters...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fundamentally, in_exec has parsers to parse input text data, so parser SHOULD require time parser & timezone configuration.
Now, in_exec does not use timezone because parsers of in_exec are not kinds of TextParser.

We should fix it later, and timezone is required sooner or later.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay. Leave in_exec issue up to you!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I said "we".
Anyway, this line seems right for me.

repeatedly and others added 17 commits October 21, 2014 20:21
Measure lib/* coverage based on test execution in addition to spec.
Test and spec changes are for satisfying generic rules below.

 * simplecov requires target code to be loaded before invoking
   SimpleCov.start.
 * Invoke SimpleCov.start once. Multiple call works if use_merging =
   true (default) but unnecessarily complex.

Report from 'rake coverage' at this point.

  All Files (69.33% covered at 842.16 hits/line)
  64 files in total. 5931 relevant lines. 4112 lines covered and 1819 lines missed
This change needs consultation with CRuby guys.

 * CRuby trunk and 2.2.0-preview1 needs this but is this intentional
   change?
 * If this change only needs for developers living with CRuby trunk?

FWIW I once heard that this is only needed by the ruby command built
from svn repo but unnecesarry in general ruby installation.
We keep this option for backward compatibility.
Maybe, there is no user who wants to disable this option.
We will remove related code in the future.
@TakahikoKawasaki
Copy link
Author

@repeatedly
I rebased the commits. Is this what you requested?
https://github.com/TakahikoKawasaki/fluentd/commit/093be1da628fea3d09b946a0ca8793b893ba0163

@repeatedly
Copy link
Member

My intention is 'rebase master and squash commits' but maybe no problem.
I will check this PR again. Please wait.

@TakahikoKawasaki
Copy link
Author

@repeatedly

https://github.com/TakahikoKawasaki/fluentd/commit/093be1da628fea3d09b946a0ca8793b893ba0163 is a result of squash. It contains all the changes I have made.

@repeatedly
Copy link
Member

I merged this PR manually.
Thank you for all your hard work 👍

@TakahikoKawasaki
Copy link
Author

@repeatedly @tagomoris @frsyuki @nurse @chitsaou @komamitsu @sonots @nahi @kiyoto

Thank you very much for your review and final merge. Comments from all of you have made my first half-baked patch be much better. It was a pleasure for me to cooperate with you.

@tagomoris
Copy link
Member

👏

@repeatedly
Copy link
Member

@sonots Could you cherry-pick this commit to v0.10? Maybe, good for v0.10.56 release with mixin regression fix.

@sonots
Copy link
Member

sonots commented Oct 27, 2014

cherry-picked b5ded5e to v0.10 branch

@kiyoto
Copy link
Contributor

kiyoto commented Oct 27, 2014

@TakahikoKawasaki welcome to the quirky group of fluentd contributors!

@sonots sonots added the v0.10 label Nov 24, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants