DM-31384: Add extra information to log record #134

timj · 2021-08-12T16:13:39Z

Requires lsst/daf_butler#557

Adds the output RUN to every record and also now supports the --log-label command.

Checklist

ran Jenkins
added a release note for user-visible changes to doc/changes

andy-slac

Looks great, one minor comment.

andy-slac · 2021-08-12T16:56:16Z

python/lsst/ctrl/mpexec/singleQuantumExecutor.py

@@ -236,7 +236,7 @@ def captureLogging(self, taskDef, quantum, butler):

        ctx = _LogCaptureFlag()
        try:
-            with ButlerMDC.set_mdc({"LABEL": label}):
+            with ButlerMDC.set_mdc({"LABEL": label, "RUN": butler.run}):


I hope this does not cause surprises for people who may be parsing MDC from our current logs 🙂

Maybe worth adding news fragment, this feels like user-visible new feature?

Also, do we really want that info in every record? I guess this will be repeating for all records in the same log file?

Yes, I was about to write the news fragment. It will only appear in the JSON record as a separate entry and is automatically handled by ButlerLogRecord. I don't think anyone is parsing our JSON log records at the moment and if they are I don't see a problem. The motivation for adding this to every record is that in LogStash or wherever it might make it easier for people to find their own log records in the sea of log records arriving from all the workflows being executed. The output run is the one piece of information provided by the person submitting the workflow and so is the one thing that they know should be there and is easy to find. The job_id is something allocated by the workflow system itself and I would expect to turn up via the --log-label option but it's surely easier for scientists to know their RUN than to know the workflow submission ID. This RUN information is not turning up in long-log output. Do you think we need to add a command line option to pipetask to control whether RUN is included? Clearly bps could add --log-label run={outputRun},job_id={jobId} to its configuration but I was assuming we'd always like to have RUN in the searchable logs and that's something we already knew.

Maybe @ktlim has an opinion on this.

I'm OK with RUN and other info being there if it's useful for people. For log harvesting I can imagine we will want a bit more metadata associated with records, but that does not have to come from the records themselves, we'll probably have some other way to specify that extra per-log file or per-dataset metadata.

Any given log is associated with a run automatically (in the butler or in some artefact relating to that pipetask run) but I'm worrying about the log records that get injected directly into logstash. Maybe we end up with doing that injection outside of pipetask in the fluentd client or something but at the moment it's a bit undefined. I can always remove RUN later -- shouldn't hurt anything to do that.

I suspect that most log analysis systems will flatten out any per-file metadata into per-record columns anyway. (Most of these things compress out very well, particularly if columnar.)

The output run may not be provided by the submitter: it could be auto-generated, no? Still, I agree that in the absence of an easily-derived-by-user quantum identifier, adding RUN automatically will be beneficial for filtering logs.

timj added 2 commits August 12, 2021 08:44

Add --log-label option to pipetask

8758d46

Add output run to log record

e7ca701

andy-slac approved these changes Aug 12, 2021

View reviewed changes

Add news fragment

86510d7

timj merged commit ea1ed07 into master Aug 12, 2021

timj deleted the tickets/DM-31384 branch August 12, 2021 22:47

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

DM-31384: Add extra information to log record #134

DM-31384: Add extra information to log record #134

timj commented Aug 12, 2021 •

edited

andy-slac left a comment

andy-slac Aug 12, 2021

andy-slac Aug 12, 2021

timj Aug 12, 2021

andy-slac Aug 12, 2021

timj Aug 12, 2021

ktlim Aug 12, 2021

DM-31384: Add extra information to log record #134

DM-31384: Add extra information to log record #134

Conversation

timj commented Aug 12, 2021 • edited

Checklist

andy-slac left a comment

Choose a reason for hiding this comment

andy-slac Aug 12, 2021

Choose a reason for hiding this comment

andy-slac Aug 12, 2021

Choose a reason for hiding this comment

timj Aug 12, 2021

Choose a reason for hiding this comment

andy-slac Aug 12, 2021

Choose a reason for hiding this comment

timj Aug 12, 2021

Choose a reason for hiding this comment

ktlim Aug 12, 2021

Choose a reason for hiding this comment

timj commented Aug 12, 2021 •

edited