-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-31384: Add extra information to log record #134
Merged
Merged
Changes from all commits
Commits
Show all changes
3 commits
Select commit
Hold shift + click to select a range
File filter
Filter by extension
Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
* Add the output run to the log record. | ||
* Add ``--log-label`` option to ``pipetask`` command to allow extra information to be injected into the log record. |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hope this does not cause surprises for people who may be parsing MDC from our current logs 🙂
Maybe worth adding news fragment, this feels like user-visible new feature?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, do we really want that info in every record? I guess this will be repeating for all records in the same log file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I was about to write the news fragment. It will only appear in the JSON record as a separate entry and is automatically handled by ButlerLogRecord. I don't think anyone is parsing our JSON log records at the moment and if they are I don't see a problem. The motivation for adding this to every record is that in LogStash or wherever it might make it easier for people to find their own log records in the sea of log records arriving from all the workflows being executed. The output run is the one piece of information provided by the person submitting the workflow and so is the one thing that they know should be there and is easy to find. The job_id is something allocated by the workflow system itself and I would expect to turn up via the
--log-label
option but it's surely easier for scientists to know their RUN than to know the workflow submission ID. This RUN information is not turning up in long-log output. Do you think we need to add a command line option to pipetask to control whether RUN is included? Clearlybps
could add--log-label run={outputRun},job_id={jobId}
to its configuration but I was assuming we'd always like to have RUN in the searchable logs and that's something we already knew.Maybe @ktlim has an opinion on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm OK with RUN and other info being there if it's useful for people. For log harvesting I can imagine we will want a bit more metadata associated with records, but that does not have to come from the records themselves, we'll probably have some other way to specify that extra per-log file or per-dataset metadata.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any given log is associated with a run automatically (in the butler or in some artefact relating to that pipetask run) but I'm worrying about the log records that get injected directly into logstash. Maybe we end up with doing that injection outside of pipetask in the fluentd client or something but at the moment it's a bit undefined. I can always remove RUN later -- shouldn't hurt anything to do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect that most log analysis systems will flatten out any per-file metadata into per-record columns anyway. (Most of these things compress out very well, particularly if columnar.)
The output run may not be provided by the submitter: it could be auto-generated, no? Still, I agree that in the absence of an easily-derived-by-user quantum identifier, adding RUN automatically will be beneficial for filtering logs.