Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Set non-UTC timezone for filebeat modules testing #13874

Merged
merged 3 commits into from Oct 2, 2019

Conversation

jsoriano
Copy link
Member

@jsoriano jsoriano commented Oct 2, 2019

Using UTC timezone for filebeat modules testing hides some issues when
modifying code to parse timestamps without timezones. Use a non-UTC
timezone to facilitate the detection of this kind of issues.

Using UTC timezone for filebeat modules testing hides some issues when
modifying code to parse timestamps without timezones. Use a non-UTC
timezone to facilitate the detection of this kind of issues.
@jsoriano jsoriano added module review Filebeat Filebeat Team:Integrations Label for the Integrations team labels Oct 2, 2019
@jsoriano jsoriano requested review from a team as code owners October 2, 2019 09:17
@jsoriano jsoriano self-assigned this Oct 2, 2019
@@ -140,7 +140,7 @@ def run_on_file(self, module, fileset, test_file, cfgfile):
output.write(" ".join(cmd) + "\n")

local_env = os.environ.copy()
local_env["TZ"] = 'Etc/UTC'
Copy link
Member Author

@jsoriano jsoriano Oct 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the only change in code, the rest of changes are the result when regenerating golden files.

@sayden
Copy link
Contributor

sayden commented Oct 2, 2019

Why exactly all timezones are -2 hours and not minus 200? 🙂 Can you develop, please?

@@ -140,7 +140,7 @@ def run_on_file(self, module, fileset, test_file, cfgfile):
output.write(" ".join(cmd) + "\n")

local_env = os.environ.copy()
local_env["TZ"] = 'Etc/UTC'
local_env["TZ"] = 'Etc/GMT+2'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it would be nice to add a comment here explaining why this timezone

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, comment added.

@@ -18,7 +18,7 @@
"event.action": "access_granted",
"event.dataset": "elasticsearch.audit",
"event.module": "elasticsearch",
"event.timezone": "+00:00",
"event.timezone": "-02:00",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this is correct. I think this should remain +00:00.

If you look at the first sample log entry in test-audit-730.log, it has the following field:

"timestamp":"2019-06-11T05:21:08,484-0700"

The corresponding field in test-audit-730.log-expected.json is:

"@timestamp": "2019-06-11T12:21:08.484Z"

So wouldn't we want event.timezone here to be +00:00?

@@ -8,7 +8,7 @@
"event.action": "anonymous_access_denied",
"event.dataset": "elasticsearch.audit",
"event.module": "elasticsearch",
"event.timezone": "+00:00",
"event.timezone": "-02:00",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -8,7 +8,7 @@
"elasticsearch.node.name": "es1_1",
"event.dataset": "elasticsearch.deprecation",
"event.module": "elasticsearch",
"event.timezone": "+00:00",
"event.timezone": "-02:00",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -6,7 +6,7 @@
"elasticsearch.node.name": "node-0",
"event.dataset": "elasticsearch.server",
"event.module": "elasticsearch",
"event.timezone": "+00:00",
"event.timezone": "-02:00",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -16,7 +16,7 @@
"event.dataset": "elasticsearch.slowlog",
"event.duration": 3000000,
"event.module": "elasticsearch",
"event.timezone": "+00:00",
"event.timezone": "-02:00",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -19,7 +19,7 @@
"event.dataset": "elasticsearch.slowlog",
"event.duration": 6000000,
"event.module": "elasticsearch",
"event.timezone": "+00:00",
"event.timezone": "-02:00",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -16,7 +16,7 @@
"event.dataset": "elasticsearch.slowlog",
"event.duration": 4000000,
"event.module": "elasticsearch",
"event.timezone": "+00:00",
"event.timezone": "-02:00",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

@ycombinator ycombinator left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I started reviewing the changes in this PR for stack monitoring modules. I left a few comments but started noticing a pattern in them so I'll leave a more general comment now.

In the cases where we are ingesting JSON logs, the timestamp field in the source log already has time zone information. In that case, it looks like the ingest pipeline is converting the timestamp to UTC before setting the final value in the @timestamp field. Further, the value that's set in the @timestamp field has the Z time zone identifier for UTC. As such, we should either set event.timezone to +00:00 (as it was before this PR) or omit that field entirely, according to ECS:

This field should be populated when the event’s timestamp does not include timezone information already (e.g. default Syslog timestamps). It’s optional otherwise.

@jsoriano
Copy link
Member Author

jsoriano commented Oct 2, 2019

@ycombinator I agree that there are issues to fix, specially in modules. This PR is intended only to change module tests to use non-UTC timezones, to give visibility to problems that we hadn't seen when everything was using UTC. Fixes in modules will come in follow ups (#13877). I would like to merge this change in tests before further fixes, so we can check in follow ups that timestamps actually change to proper values.

Regarding the use of event.timezone, maybe we should further discuss it as a separate issue. At the moment this field is added by the add_locale processor, that sets it to the timezone of the agent (not the timezone of the event). We use it in modules that may contain timestamps without timezone to convert the timestamps, so to correctly consume the data you only need to use the timestamp, and not both the timestamp and event.timezone, what is not supported by Kibana for example.

The case of the stack modules are a bit special, because they have two log formats, one of them contains the timezone in the format and the other one doesn't, and we set event.timezone in both cases, because we don't do any differentiation till the ingest pipeline. Maybe an option to avoid confusion in this case is to drop this field in the json pipeline, or to add it only if the log line doesn't start with {. In any case something that I would discuss in follow ups.

Copy link
Contributor

@ycombinator ycombinator left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Chatted with @jsoriano off-PR. The changes in this PR LGTM (for stack modules). There will be a follow up PR in which we will modify the JSON-processing ingest pipelines for stack (and maybe other?) modules to drop the event.timezone field set by the add_locale processor. This will address my concerns in this PR.

BTW, the core issue here is that ECS specifies event.timezeone to be the time zone of the event, whereas the add_locale processor sets event.timezone to be the timezone where the agent is running. We think the ideal fix is to modify the add_locale processor to set something like agent.timezone but a) this field is not in ECS at the moment and b) this would be a breaking change for the processor so we must wait to do it only in the next major release.

@jsoriano jsoriano merged commit 0c76357 into elastic:master Oct 2, 2019
@jsoriano jsoriano deleted the no-utc-filebeat-tests branch October 2, 2019 16:22
@zube zube bot removed the [zube]: In Review label Oct 2, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Filebeat Filebeat module review Team:Integrations Label for the Integrations team v7.5.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants