Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support for Timestamp in file outputter path #38029

Merged
merged 20 commits into from Mar 14, 2024

Conversation

yspotts
Copy link
Contributor

@yspotts yspotts commented Feb 14, 2024

Proposed commit message

Add support for timestamp substitution variable in the file outputter path configuration. The format for the timestamp variable is is described in https://www.elastic.co/guide/en/beats/libbeat/8.12/config-file-format-type.html#_format_string_sprintf and is the same as for the elasticsearch index specification: https://www.elastic.co/guide/en/beats/filebeat/master/elasticsearch-output.html#index-option-es.

Full set of supported variables:

// Package dtfmt provides time formatter support with pattern syntax mostly
// similar to joda DateTimeFormat. The pattern syntax supported is a subset
// (mostly compatible) with joda DateTimeFormat.
//
// Symbol Meaning Type Supported Examples
// ------ ------- ------- --------- -------
// G era text no AD
// C century of era (>=0) number no 20
// Y year of era (>=0) year yes 1996
//
// x weekyear year yes 1996
// w week of weekyear number yes 27
// e day of week number yes 2
// E day of week text yes Tuesday; Tue
//
// y year year yes 1996
// D day of year number yes 189
// M month of year month yes July; Jul; 07
// d day of month number yes 10
//
// a halfday of day text yes PM
// K hour of halfday (0~11) number yes 0
// h clockhour of halfday (1~12) number yes 12
//
// H hour of day (0~23) number yes 0
// k clockhour of day (1~24) number yes 24
// m minute of hour number yes 30
// s second of minute number yes 55
// S fraction of second nanoseconds yes 978000
// f fraction of seconds nanoseconds yes 123456789
// multiple of 3
// z time zone text no Pacific Standard Time; PST
// Z time zone offset/id zone no -0800; -08:00; America/Los_Angeles
//
// ' escape for text delimiter
// '' single quote literal
//

An example configuration:

path: 'fileoutput-%{+yyyy.MM.dd}'

This is accomplished by changing the config type of path to be fmtstr.TimestampFormatString which will Unpack the variable using the Unpack method specific for that type which handles substitution for timestamps. The current time in UTC is used for the actual substitution.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

How to test this PR locally

In one of the fields used by the file outputter, add a time +FORMAT variable to be substituted.

Related issues

Use cases

We will be using Packetbeat for the COGS network metering project. The use case and necessity for this feature is as follows:

If the Packetbeat instance crashes for some reason and restarts, due to the way packetbeat names files, this could likely cause duplicate filenames. The Archive CSI Driver would then upload the new file which would replace the file with the same name in s3, thus causing data loss of COGS metering information.

In order to avoid this, we have identified an easy “fix” to packetbeat: introduce a unique subdirectory path in the output path configuration. This can be accomplished with the current timestamp.

@yspotts yspotts requested a review from a team as a code owner February 14, 2024 20:27
@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Feb 14, 2024
Copy link
Contributor

mergify bot commented Feb 14, 2024

This pull request does not have a backport label.
If this is a bug or security fix, could you label this PR @yspotts? 🙏.
For such, you'll need to label your PR with:

  • The upcoming major version of the Elastic Stack
  • The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-v8./d.0 is the label to automatically backport to the 8./d branch. /d is the digit

@yspotts yspotts requested a review from a team February 14, 2024 20:28
@elasticmachine
Copy link
Collaborator

elasticmachine commented Feb 14, 2024

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Start Time: 2024-03-05T12:47:53.538+0000

  • Duration: 131 min 10 sec

Test stats 🧪

Test Results
Failed 0
Passed 29155
Skipped 2046
Total 31201

💚 Flaky test report

Tests succeeded.

🤖 GitHub comments

Expand to view the GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages and run the E2E tests.

  • /beats-tester : Run the installation tests with beats-tester.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@pierrehilbert pierrehilbert added the Team:Elastic-Agent Label for the Agent team label Feb 15, 2024
@elasticmachine
Copy link
Collaborator

Pinging @elastic/elastic-agent (Team:Elastic-Agent)

@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Feb 15, 2024
@pierrehilbert pierrehilbert added the Team:Security-Linux Platform Linux Platform Team in Security Solution label Feb 15, 2024
@elasticmachine
Copy link
Collaborator

Pinging @elastic/sec-linux-platform (Team:Security-Linux Platform)

Copy link
Contributor

@belimawr belimawr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't sure if this should be documented or left as an undocumented feature.

Please document it, I see no reason to not document it and other users might find it useful.

Could you also add a test to ensure this new feature works?

libbeat/outputs/fileout/file.go Outdated Show resolved Hide resolved
@yspotts
Copy link
Contributor Author

yspotts commented Feb 20, 2024

@belimawr added unit tests and update documentation

@yspotts
Copy link
Contributor Author

yspotts commented Feb 21, 2024

@belimawr thanks for approving!
Any ideas about the failing tests? They seem completely unrelated to my changes. Should we try to rerun them?

@yspotts
Copy link
Contributor Author

yspotts commented Feb 21, 2024

@andrewkroh
I went in the direction you suggested and allowed the date time to be specified in the path. Let me know what you think. I tested it out locally and it works as designed

@yspotts
Copy link
Contributor Author

yspotts commented Feb 22, 2024

It is interesting to me that the code:

output_file = "output/packetbeat-"+self.today+".ndjson"

Seems to assume a unix path separator. Not sure if that would cause problems in windows, but it seems curious that the failures all seem to be windows

@andrewkroh
Copy link
Member

It seems like the fmtstr code is doing something unexpected with the backslashes.

{
  "@timestamp": "2024-02-22T03:27:42.475Z",
  "ecs.version": "1.6.0",
  "log.level": "info",
  "log.logger": "file",
  "log.origin": {
    "file.line": 104,
    "file.name": "fileout/file.go",
    "function": "github.com/elastic/beats/v7/libbeat/outputs/fileout.(*fileOutput).init"
  },
  "message": "Initialized file output. path=C:UsersjenkinsworkspacePR-38029-8-20a152a8-baa5-4ccc-9622-ac59a245009csrcgithub.comelasticbeatsx-packmetricbeatbuildsystem-testsruntest_airflow.Test.test_server507\\output\\metricbeat max_size_bytes=1024000 max_backups=7 permissions=-rw-------",
  "service.name": "metricbeat"
}
# metricbeat.yml
output.file:
  path: C:\Users\jenkins\workspace\PR-38029-8-20a152a8-baa5-4ccc-9622-ac59a245009c\src\github.com\elastic\beats\x-pack\metricbeat\build\system-tests\run\test_airflow.Test.test_server507/output

From: https://storage.googleapis.com/beats-ci-temp/Beats/beats/PR-38029-8/test-build-artifacts-x-pack-metricbeat-windows-2022-windows-2022-tgz

@yspotts
Copy link
Contributor Author

yspotts commented Feb 22, 2024

@andrewkroh good find.
Looks like the fmtstr code treats backslash as an escape character:

case '\\': // escape next character

So the path separator in this case gets swallowed.
Uggh.
Well I see three ways forward I guess:

1] Write our own Unpacker which splits apart the parts of the path and then puts them back together when Run is called
2] Settle for using the fmtstr ONLY on the filename which I think would accomplish our goals since the filenames would be unique. This done have a downside however: the filenames would look quite awkward since the outputter also includes the current date into the filename and that is hard coded in elastic-agent-lib. It seems kinda awkward to provide a way to include some time format in the filename whilst we ALSO include a hard coded date format to the end of the filename
3] Add a new configuration option for the date format of the suffix of the outputted filename (instead of the hard coded one). I am really hesitant about this one since it means having to modify elastic-agent-lib and Im concerned about unintended consequences. Also - I am conerned about how invasive this would need to be to get the config option down to the code that rotates

More I think about it, the more I favor option 1

Can you see any alternatives?

@andrewkroh
Copy link
Member

andrewkroh commented Feb 22, 2024

@yspotts
Copy link
Contributor Author

yspotts commented Feb 22, 2024

Here's my idea 🤷 :

https://github.com/yspotts/beats/compare/config-rnd-path...andrewkroh:beats:config-rnd-path?expand=1

Nice! Let me try and adopt that - lets see if all the tests pass

Thanks so much for all your help!!

@yspotts
Copy link
Contributor Author

yspotts commented Feb 22, 2024

Here's my idea 🤷 :

https://github.com/yspotts/beats/compare/config-rnd-path...andrewkroh:beats:config-rnd-path?expand=1

well, that wont work for the following example:

c:\path\%{+yyyy-MM-dd-mm} because it will interpret it as escaping the %.
At that point, there is no way to tell if the percent is intended to be escaped or not.

I did a variation of option 3: https://github.com/elastic/beats/pull/38029/files#diff-7b5b4e797d2a07c4776c672a714e5fc98f390ddc4fa855b8d6dfc2f876f473daR1

If that works, and seems ok enough, I can add unit tests there

@elasticmachine
Copy link
Collaborator

💚 Build Succeeded

History

cc @yspotts

@elasticmachine
Copy link
Collaborator

elasticmachine commented Mar 5, 2024

💔 Build Failed

Failed CI Steps

History

cc @yspotts

@elasticmachine
Copy link
Collaborator

💚 Build Succeeded

History

cc @yspotts

@elasticmachine
Copy link
Collaborator

💚 Build Succeeded

History

cc @yspotts

@elasticmachine
Copy link
Collaborator

💚 Build Succeeded

History

cc @yspotts

@elasticmachine
Copy link
Collaborator

💚 Build Succeeded

History

cc @yspotts

@yspotts yspotts requested a review from andrewkroh March 5, 2024 15:18
@pierrehilbert
Copy link
Collaborator

@yspotts it seems that we are good to go with this one. Could you take care of merging it please?

@yspotts yspotts merged commit e067577 into elastic:main Mar 14, 2024
113 checks passed
@yspotts yspotts deleted the config-rnd-path branch March 14, 2024 14:25
@cmacknz cmacknz added the backport-v8.13.0 Automated backport with mergify label Mar 21, 2024
mergify bot pushed a commit that referenced this pull request Mar 21, 2024
* support for TIME_NOW var

* fix import ordering

* add documentation

* add unit tests

* fix linting error

* path is a date time formatter

* fmt

* remove unneeded

* update docs

* add UTC

* improve message

* change link

* create new Unpacker

* fmt

* description of the use case

* unit tests

(cherry picked from commit e067577)
cmacknz added a commit that referenced this pull request Mar 22, 2024
* support for TIME_NOW var

* fix import ordering

* add documentation

* add unit tests

* fix linting error

* path is a date time formatter

* fmt

* remove unneeded

* update docs

* add UTC

* improve message

* change link

* create new Unpacker

* fmt

* description of the use case

* unit tests

(cherry picked from commit e067577)

Co-authored-by: Yoel Spotts <yspotts@users.noreply.github.com>
Co-authored-by: Craig MacKenzie <craig.mackenzie@elastic.co>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-v8.13.0 Automated backport with mergify >enhancement Team:Elastic-Agent Label for the Agent team Team:Security-Linux Platform Linux Platform Team in Security Solution
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow specifying a randomly named directory in file output
6 participants