Manifest data in weblog message #1076

sven1103 · 2019-03-15T14:07:41Z

Manifest data in weblog message

The weblog message content should also provide the manifest information on workflow submit, such that it can be used for remote database logging.

Usage scenario

When you want to do remote logging of your workflows persistently in a database, and you need to relate the workflow manifest with the trace data.

Suggest implementation

Fetch the manifest object from the Session object as a Map (Session.manifest.toMap()) in the WebLogObserver class. Add a manifest property to the JSON message if the manifest data is provided.
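
To illustrate the proposed message shape, here is a minimal sketch in Python. It is not the WebLogObserver code, which lives inside Nextflow and is not reproduced here. The function name `build_message` and its arguments are hypothetical; the keys `runId`, `runName` and `event` are taken from the example output posted further down in this thread.

```python
import json
from typing import Optional


def build_message(run_id: str, run_name: str, event: str,
                  manifest: Optional[dict] = None) -> str:
    """Serialise a weblog-style message, adding `manifest` only when provided."""
    message = {
        "runId": run_id,
        "runName": run_name,
        "event": event,
    }
    # Assumption for this sketch: the property is omitted entirely
    # (rather than sent as null) when no manifest data is available.
    if manifest:
        message["manifest"] = manifest
    return json.dumps(message)
```

Omitting the key instead of sending an empty object is one possible reading of "if the manifest data is provided"; a receiver would then test for the key's presence.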

sven1103 · 2019-03-18T10:22:33Z

@johandahlberg @apeltzer @ewels any suggestions on this? The current PR #1077 only sends the manifest data on workflow submission, which I think is sufficient, as you can store the manifest data with the uuid4 of the run and link further trace messages to it. What do you think?
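
A rough sketch of what that linking could look like on the receiving side, using Python's built-in sqlite3 module. The table layout, the function names and the top-level `manifest` key are assumptions made for this example and are not part of the PR; the event names are only illustrative.

```python
import json
import sqlite3


def open_store() -> sqlite3.Connection:
    """Create an in-memory store: one row per run, many trace rows per run."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE runs (run_id TEXT PRIMARY KEY, manifest TEXT)")
    db.execute("CREATE TABLE traces (run_id TEXT REFERENCES runs(run_id), event TEXT)")
    return db


def record(db: sqlite3.Connection, message: dict) -> None:
    """Store the manifest once per run and link every message by its run id."""
    manifest = message.get("manifest")
    if manifest is not None:
        # Only the submission message carries the manifest in this sketch.
        db.execute("INSERT OR IGNORE INTO runs VALUES (?, ?)",
                   (message["runId"], json.dumps(manifest)))
    db.execute("INSERT INTO traces VALUES (?, ?)",
               (message["runId"], message["event"]))


db = open_store()
record(db, {"runId": "r1", "event": "started", "manifest": {"version": "1.1.4"}})
record(db, {"runId": "r1", "event": "process_completed"})
```

Because every later message repeats the run id, the manifest never needs to be resent; a join on `run_id` recovers it for any trace row.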

ewels · 2019-03-18T22:27:35Z

Sounds great! Yes I think that’s probably sufficient.

If we’re sending this, why not just send the entire config output on the workflow start?

johandahlberg · 2019-03-19T12:06:22Z

Sounds reasonable, though I don't know enough about the inner workings of nextflow to know the details of how this could best be achieved. 😄 I agree with @ewels that it might be useful to get the full config submitted at the start of the workflow.

pditommaso · 2019-03-19T12:35:40Z

Is there really a need for that? I would prefer to follow a KISS approach.

ewels · 2019-03-19T12:53:18Z

I think that we will definitely need more information about the workflows - the workflow runtime variables at least (e.g. the directories being used, the user, etc.). If it's just the manifest then I could imagine us having a nice webpage showing 50 different rnaseq pipelines running with no real information to differentiate between them. It could be nice to be able to show the input data used from params.input or something, for example.

Because we expect people to implement different tools to monitor workflows using this, I would advocate just sending everything and then people can choose what to use. Shouldn't be much data still and should be pretty easy to serialise into a JSON object.

pditommaso · 2019-03-19T13:14:07Z

I see the rationale, but the full config can contain sensitive information, e.g. cloud security credentials, tokens and passwords as env vars, etc. It would therefore be necessary to strip all this information.

ewels · 2019-03-19T13:23:03Z

hmm, yep, ok. How about just params and workflow? I guess the sensitive stuff will be in executor and env scopes usually?
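
As a sketch of that idea, assuming the resolved config is available as a plain nested dictionary (the function name and the sample values below are made up for illustration):

```python
from typing import Any, Dict, Iterable

# Scopes considered safe to send in this sketch; everything else is dropped.
ALLOWED_SCOPES = ("params", "workflow")


def select_scopes(config: Dict[str, Any],
                  allowed: Iterable[str] = ALLOWED_SCOPES) -> Dict[str, Any]:
    """Keep only allow-listed top-level scopes of a config dictionary."""
    return {scope: config[scope] for scope in allowed if scope in config}


config = {
    "params": {"outdir": "results"},
    "workflow": {"profile": "docker,test"},
    "env": {"SOME_TOKEN": "placeholder-secret"},   # hypothetical sensitive entry
    "executor": {"name": "local"},
}
safe = select_scopes(config)
```

An allow-list is used here instead of removing env and executor explicitly, so that any scope added later stays out of the payload by default. It does nothing about sensitive values placed inside params itself.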

apeltzer · 2019-03-19T13:32:00Z

Hmm, not sure this is feasible but I agree that a restriction makes sense. I assume the same as Phil that an exclusion of the executor and env scopes could already suffice if properly documented? Could also make sure to mention that in the docs for the weblog feature that people shouldn't have params with sensitive info?

sven1103 · 2019-03-19T13:34:23Z

I agree, the information in the params scope would indeed be beneficial for internal quality assessment procedures. Having it in the weblog message makes it easy to have information relevant for reusability...

pditommaso · 2019-03-19T13:41:59Z

+1 for workflow, which is supposed to hold all workflow metadata. Possibly also params; however, the latter may be incomplete because it would not reflect parameters set in the main script. It could therefore be confusing, and it would be better not to include it for now.

ewels · 2019-03-19T16:58:47Z

How much work is required to get the parameters set in the main script? I suspect that this will be a fairly high priority for weblog to be picked up.

pditommaso · 2019-03-20T10:00:30Z

Complexity index 9 out of 10. The problem here is that the parameters in the script are just variable assignments. To evaluate them you need to run the full script. The new modules syntax will introduce some changes that may allow parsing them without running the full pipeline. However, that's not happening anytime soon.

sven1103 · 2019-03-20T11:17:00Z

@ewels @johandahlberg I have now appended the WorkflowMetadata content to the payload. For which event types do you want it included?

apeltzer · 2019-03-20T11:57:48Z

Thanks for the clarification, @pditommaso!

@sven1103 I think once at the beginning of each workflow should suffice - or does it make more sense to send it upon successful (?) finalization of the job? Then we don't have to filter it afterward as it will only be there when a job finishes successfully... ?

ewels · 2019-03-20T12:24:30Z

> To evaluate them you need to run the full script.

@pditommaso - but we're talking about sending this weblog event after the workflow has started, so we're already running the full script here anyway?

ewels · 2019-03-20T12:28:21Z

@sven1103 @apeltzer - I think maybe both start and end? Some would be useful to have at the end, such as workflow.success which obviously only makes sense then. But most things would be most useful when the workflow first starts.

pditommaso · 2019-03-20T12:48:22Z

> we're talking about sending this weblog event after the workflow has started

ouch, I was stuck in the other config issue! yes, here params are valid when the execution starts

apeltzer · 2019-03-20T13:07:18Z

@ewels @sven1103
True, might really make sense to have both beginning and end.

pditommaso · 2019-03-20T13:09:03Z

The metadata are complete on start, so why should they be sent at the end?

sven1103 · 2019-03-20T13:16:01Z

For example: workflow completion date-time, duration, success flag, exit status
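
On the receiving side, one way to picture this is to keep the metadata from the start message and overlay the values that only exist at completion. The sketch below assumes both messages have been parsed into plain dictionaries; `merge_metadata` is a hypothetical helper name, and the sample values are copied from the example output in this thread.

```python
def merge_metadata(at_start: dict, at_end: dict) -> dict:
    """Overlay completion-time values on the metadata received at start."""
    merged = dict(at_start)
    for key, value in at_end.items():
        # Null values in the completion message never erase known data.
        if value is not None:
            merged[key] = value
    return merged


at_start = {"start": "2019-03-20T13:31:40+0000", "complete": None}
at_end = {
    "complete": "2019-03-20T13:32:36+0000",
    "exitStatus": 0,
    "success": True,
    "errorMessage": None,
}
merged = merge_metadata(at_start, at_end)
```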

pditommaso · 2019-03-20T13:20:39Z

+1

sven1103 · 2019-03-20T13:36:15Z

ok, implemented and works. Still need to gather the params

sven1103 · 2019-03-20T13:37:00Z

Example output as an appetizer:

    "metadata": {
        "start": "2019-03-20T13:31:40+0000",
        "projectDir": "/Users/sven1103/.nextflow/assets/nf-core/hlatyping",
        "manifest": {
            "nextflowVersion": ">=18.10.1",
            "defaultBranch": "master",
            "version": "1.1.4",
            "homePage": "https://github.com/nf-core/hlatyping",
            "gitmodules": null,
            "description": "Precision HLA typing from next-generation sequencing data.",
            "name": "nf-core/hlatyping",
            "mainScript": "main.nf",
            "author": null
        },
        "complete": "2019-03-20T13:32:36+0000",
        "profile": "docker,test",
        "homeDir": "/Users/sven1103",
        "workDir": "/Users/sven1103/git/nextflow/work",
        "container": "nfcore/hlatyping:1.1.4",
        "commitId": "4bcced898ee23600bd8c249ff085f8f88db90e7c",
        "errorMessage": null,
        "repository": "https://github.com/nf-core/hlatyping.git",
        "containerEngine": "docker",
        "scriptFile": "/Users/sven1103/.nextflow/assets/nf-core/hlatyping/main.nf",
        "userName": "sven1103",
        "launchDir": "/Users/sven1103/git/nextflow",
        "runName": "elated_murdock",
        "configFiles": [
            "/Users/sven1103/.nextflow/assets/nf-core/hlatyping/nextflow.config"
        ],
        "sessionId": "2a45ef7d-6dc8-4cbc-a51c-f74483ded0c9",
        "errorReport": null,
        "scriptId": "2902f5aa7f297f2dccd6baebac7730a2",
        "revision": "master",
        "exitStatus": 0,
        "commandLine": "./launch.sh run nf-core/hlatyping -profile docker,test -with-weblog 'http://localhost:4567'",
        "nextflow": {
            "version": {
                "minor": "03",
                "major": "19",
                "patch": "0-edge"
            },
            "build": 5114,
            "timestamp": "20-03-2019 13:25 UTC"
        },
        "stats": {
            "computeTimeFmt": "(a few seconds)",
            "cachedCount": 0,
            "cachedDuration": {
                "days": 0,
                "millis": 0,
                "hours": 0,
                "minutes": 0,
                "seconds": 0,
                "durationInMillis": 0
            },
            "failedDuration": {
                "days": 0,
                "millis": 0,
                "hours": 0,
                "minutes": 0,
                "seconds": 0,
                "durationInMillis": 0
            },
            "succeedDuration": {
                "days": 0,
                "millis": 37266,
                "hours": 0,
                "minutes": 0,
                "seconds": 37,
                "durationInMillis": 37266
            },
            "failedCount": 0,
            "cachedPct": 0.0,
            "cachedCountFmt": "0",
            "succeedCountFmt": "6",
            "failedPct": 0.0,
            "failedCountFmt": "0",
            "ignoredCountFmt": "0",
            "ignoredCount": 0,
            "succeedPct": 100.0,
            "succeedCount": 6,
            "ignoredPct": 0.0
        },
        "resume": false,
        "success": true,
        "scriptName": "main.nf",
        "duration": {
            "days": 0,
            "millis": 55688,
            "hours": 0,
            "minutes": 0,
            "seconds": 55,
            "durationInMillis": 55688
        }
    },
    "runId": "2a45ef7d-6dc8-4cbc-a51c-f74483ded0c9",
    "event": "completed",
    "runName": "elated_murdock",
    "runStatus": "completed",
    "utcTime": "2019-03-20T13:32:37Z"

johandahlberg · 2019-03-20T14:10:50Z

@sven1103 I agree with the others here that it would make sense to send it at the workflow start, and at workflow finishing (regardless of whether it was successful or not).

sven1103 · 2019-03-20T14:25:21Z

@johandahlberg It will now be sent when the workflow is started and when it is completed. In case of failure, the success boolean property is false and the errorReport and errorMessage properties will have more detailed information. At least it works when I intentionally break my pipeline :D

For example, with the Docker daemon not running:

...
"errorMessage": "docker: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?.\nSee 'docker run --help'.",
...
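
A consumer could use these properties roughly as follows. This is only a sketch under the assumption that the message has been parsed into a dictionary; `failure_reason` and `summarise` are hypothetical names, and the fallback text is invented for the example.

```python
from typing import Optional


def failure_reason(workflow_meta: dict) -> Optional[str]:
    """Return the first line of the error message for a failed run, else None."""
    if workflow_meta.get("success"):
        return None
    message = workflow_meta.get("errorMessage") or "no error message provided"
    return message.splitlines()[0]


def summarise(message: dict) -> str:
    """Build a one-line status string from a completion message."""
    meta = message["metadata"]
    # The thread shows two layouts: fields directly under "metadata", and
    # fields nested under "metadata" -> "workflow". Accept either.
    workflow = meta.get("workflow", meta)
    reason = failure_reason(workflow)
    if reason is None:
        return f"{message['runName']}: succeeded"
    return f"{message['runName']}: failed ({reason})"
```

Only the first line of errorMessage is kept because the Docker message above spans two lines; the full text remains available in the payload for anyone who needs it.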

ewels · 2019-03-20T15:04:10Z

Awesome work @sven1103 😁

Once the params are added then I think we should have pretty much everything we'll need 👍

johandahlberg · 2019-03-20T15:15:50Z

@sven1103 cool! Looks very useful.

sven1103 · 2019-03-20T19:43:31Z

Ok, params sneak preview (nf-core/hlatyping):

"metadata": {
    "params": {
        "container": "nfcore/hlatyping:1.1.4",
        "help": false,
        "outdir": "results",
        "bam": true,
        "singleEnd": false,
        "single-end": false,
        "reads": "data/test*{1,2}.fq.gz",
        ...
    },
    "workflow": {
        "start": "2019-03-20T19:30:08Z",
        "projectDir": "/Users/sven1103/.nextflow/assets/nf-core/hlatyping",
        "manifest": {
            "nextflowVersion": ">=18.10.1",
            "defaultBranch": "master",
            "version": "1.1.4",
            "homePage": "https://github.com/nf-core/hlatyping",
            "gitmodules": null,
            "description": "Precision HLA typing from next-generation sequencing data.",
            "name": "nf-core/hlatyping",
            "mainScript": "main.nf",
            "author": null
        },
        "complete": null,
        "profile": "docker,test",
        ...
    }
}
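
With params and workflow side by side, a monitoring page could tell otherwise identical pipeline runs apart, as raised earlier in the thread. A small sketch, assuming the nested layout shown in this preview; `run_label` is a hypothetical helper and the run name below is a placeholder.

```python
def run_label(message: dict) -> str:
    """Combine run name, pipeline name and input pattern into a display label."""
    meta = message["metadata"]
    pipeline = meta["workflow"]["manifest"]["name"]
    # Not every pipeline defines a "reads" parameter, so fall back gracefully.
    reads = meta["params"].get("reads", "unknown input")
    return f"{message['runName']}: {pipeline} on {reads}"


message = {
    "runName": "example_run",  # placeholder, not taken from the thread
    "metadata": {
        "params": {"reads": "data/test*{1,2}.fq.gz"},
        "workflow": {"manifest": {"name": "nf-core/hlatyping"}},
    },
}
label = run_label(message)
```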

sven1103 · 2019-04-05T14:35:08Z

@rsuchecki workflow metadata will be sent as a JSON payload soon as well: #1077. This issue is solved IMO, so I will close it :)

sven1103 mentioned this issue Mar 15, 2019

Provides workflow metadata in weblog message on workflow submission event #1077

Merged

rsuchecki mentioned this issue Apr 3, 2019

Capture workflow metadata csiro-crop-informatics/repset#17

Closed

sven1103 closed this as completed Apr 5, 2019

pditommaso added this to the v19.04.0 milestone Apr 5, 2019
