Manifest data in weblog message #1076

Closed
sven1103 opened this issue Mar 15, 2019 · 29 comments

@sven1103
Contributor

sven1103 commented Mar 15, 2019

Manifest data in weblog message

The weblog message content should also provide the manifest information on workflow submit, such that it can be used for remote database logging.

Usage scenario

You want to log your workflows persistently in a remote database and need to relate the workflow manifest to the trace data.

Suggested implementation

Fetch the manifest from the Session object as a Map (Session.manifest.toMap()) in the WebLogObserver class, and add a manifest property to the JSON message if manifest data is provided.
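
The proposal amounts to attaching one optional key to the message. A minimal Python sketch of that shape follows; the real change would live in Nextflow's Groovy WebLogObserver, and build_message and its arguments are hypothetical names used only for illustration.

```python
import json
from typing import Any, Optional


def build_message(run_id: str, event: str,
                  manifest: Optional[dict[str, Any]] = None) -> str:
    """Serialise a weblog-style message; 'manifest' is added only when provided."""
    message: dict[str, Any] = {"runId": run_id, "event": event}
    if manifest:  # omit the key entirely when there is no manifest data
        message["manifest"] = manifest
    return json.dumps(message)
```

Leaving the key out, rather than sending null, keeps messages for pipelines without a manifest unchanged.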

@sven1103
Contributor Author

@johandahlberg @apeltzer @ewels any suggestions on this? The current PR #1077 only sends the manifest data on workflow submission, which I think is sufficient, as you can store the manifest data with the uuid4 of the run and link further trace messages to it. What do you think?
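
On the receiving side, the idea of storing the manifest once and linking later trace messages by run id could look roughly like the sketch below. It assumes each message carries a runId key (as in the example payload later in this thread); the table layout and function names are hypothetical.

```python
import json
import sqlite3


def open_store() -> sqlite3.Connection:
    """Create an in-memory store: one row per run, many trace rows per run."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE runs (run_id TEXT PRIMARY KEY, manifest TEXT)")
    db.execute("CREATE TABLE traces (run_id TEXT REFERENCES runs(run_id), payload TEXT)")
    return db


def record(db: sqlite3.Connection, message: dict) -> None:
    """Store the manifest once per run; file every other message as a trace."""
    run_id = message["runId"]
    if "manifest" in message:
        db.execute("INSERT OR IGNORE INTO runs VALUES (?, ?)",
                   (run_id, json.dumps(message["manifest"])))
    else:
        db.execute("INSERT INTO traces VALUES (?, ?)",
                   (run_id, json.dumps(message)))
```

A join on run_id then relates any trace row back to the manifest of the run that produced it.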

@ewels
Member

ewels commented Mar 18, 2019

Sounds great! Yes I think that’s probably sufficient.

If we’re sending this, why not just send the entire config output on the workflow start?

@johandahlberg

Sounds reasonable, though I don't know enough about the inner workings of nextflow to know the details of how this could best be achieved. 😄 I agree with @ewels that it might be useful to get the full config submitted at the start of the workflow.

@pditommaso
Member

Is there really a need for that? I would prefer to follow a KISS approach.

@ewels
Member

ewels commented Mar 19, 2019

I think that we will definitely need more information about the workflows - the workflow run time variables at least (eg. the directories being used and user etc). If it's just the manifest then I could imagine us having a nice webpage showing 50 different rnaseq pipelines running with no real information to differentiate between them. It could be nice to be able to show the input data used from params.input or something for example.

Because we expect people to implement different tools to monitor workflows using this, I would advocate just sending everything and then people can choose what to use. Shouldn't be much data still and should be pretty easy to serialise into a JSON object.

@pditommaso
Member

pditommaso commented Mar 19, 2019

I see the rationale, but the full config can contain sensitive information, e.g. cloud security credentials, tokens and passwords as env vars, etc. All of this would therefore have to be stripped.

@ewels
Member

ewels commented Mar 19, 2019

hmm, yep, ok. How about just params and workflow? I guess the sensitive stuff will be in executor and env scopes usually?

@apeltzer
Contributor

Hmm, not sure this is feasible but I agree that a restriction makes sense. I assume the same as Phil that an exclusion of the executor and env scopes could already suffice if properly documented? Could also make sure to mention that in the docs for the weblog feature that people shouldn't have params with sensitive info?
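
A scope-based exclusion like the one discussed here can be sketched in a few lines of Python. This assumes the config is available as a plain dictionary keyed by scope name; the function name is hypothetical, and the excluded set is only what this thread guesses to be sensitive.

```python
from typing import Any

# Assumption: these are the scopes the thread suspects of holding secrets.
EXCLUDED_SCOPES = frozenset({"executor", "env"})


def strip_sensitive_scopes(config: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of the config without the excluded top-level scopes."""
    return {scope: value for scope, value in config.items()
            if scope not in EXCLUDED_SCOPES}
```

Filtering by scope does nothing about secrets placed in params, which is why documenting that caveat would still matter.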

@sven1103
Contributor Author

I agree, the information in the params scope would indeed be beneficial for internal quality assessment procedures. Having it in the weblog message makes it easy to have information relevant for reusability...

@pditommaso
Member

pditommaso commented Mar 19, 2019

+1 for workflow, which is supposed to hold all workflow metadata. Possibly also params; however, the latter may be incomplete because it would not reflect parameters set in the main script. That could be confusing, so it would be better not to include it for now.

@ewels
Member

ewels commented Mar 19, 2019

How much work is required to get the parameters set in the main script? I suspect that this will be a fairly high priority for weblog to be picked up.

@pditommaso
Member

pditommaso commented Mar 20, 2019 via email

@sven1103
Contributor Author

@ewels @johandahlberg I have now appended the WorkflowMetadata content to the payload. For which event types do you want it included?

@apeltzer
Contributor

Thanks for the clarification @pditommaso!

@sven1103 I think once at the beginning of each workflow should suffice - or does it make more sense to send it upon successful (?) finalization of the job? Then we don't have to filter it afterward as it will only be there when a job finishes successfully... ?

@ewels
Member

ewels commented Mar 20, 2019

"To evaluate them you need to run the full script."

@pditommaso - but we're talking about sending this weblog event after the workflow has started, so we're already running the full script here anyway?

@ewels
Member

ewels commented Mar 20, 2019

@sven1103 @apeltzer - I think maybe both start and end? Some would be useful to have at the end, such as workflow.success which obviously only makes sense then. But most things would be most useful when the workflow first starts.

@pditommaso
Member

pditommaso commented Mar 20, 2019

"we're talking about sending this weblog event after the workflow has started"

Ouch, I was stuck in the other config issue! Yes, here params are valid when the execution starts.

@apeltzer
Contributor

@ewels @sven1103
True, might really make sense to have both beginning and end.

@pditommaso
Member

The metadata are complete on start, so why should they be sent at the end?

@sven1103
Contributor Author

For example: workflow completion date-time, duration, success flag, exit status
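
A consumer that receives the metadata twice might combine the two snapshots as in this sketch, which assumes both arrive as plain dictionaries and that fields not yet known at start are null (None); merge_metadata is a hypothetical helper, not part of Nextflow.

```python
from typing import Any


def merge_metadata(start: dict[str, Any], end: dict[str, Any]) -> dict[str, Any]:
    """Overlay completion-time metadata on the start-time snapshot.

    Values from 'end' win unless they are None, so fields only known at
    completion (complete, duration, success, exitStatus) fill the gaps
    left at start.
    """
    merged = dict(start)
    merged.update({k: v for k, v in end.items() if v is not None})
    return merged
```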

@pditommaso
Member

+1

@sven1103
Contributor Author

ok, implemented and works. Still need to gather the params

@sven1103
Contributor Author

Example output as appetizer

"metadata": {
        "start": "2019-03-20T13:31:40+0000",
        "projectDir": "/Users/sven1103/.nextflow/assets/nf-core/hlatyping",
        "manifest": {
            "nextflowVersion": ">=18.10.1",
            "defaultBranch": "master",
            "version": "1.1.4",
            "homePage": "https://github.com/nf-core/hlatyping",
            "gitmodules": null,
            "description": "Precision HLA typing from next-generation sequencing data.",
            "name": "nf-core/hlatyping",
            "mainScript": "main.nf",
            "author": null
        },
        "complete": "2019-03-20T13:32:36+0000",
        "profile": "docker,test",
        "homeDir": "/Users/sven1103",
        "workDir": "/Users/sven1103/git/nextflow/work",
        "container": "nfcore/hlatyping:1.1.4",
        "commitId": "4bcced898ee23600bd8c249ff085f8f88db90e7c",
        "errorMessage": null,
        "repository": "https://github.com/nf-core/hlatyping.git",
        "containerEngine": "docker",
        "scriptFile": "/Users/sven1103/.nextflow/assets/nf-core/hlatyping/main.nf",
        "userName": "sven1103",
        "launchDir": "/Users/sven1103/git/nextflow",
        "runName": "elated_murdock",
        "configFiles": [
            "/Users/sven1103/.nextflow/assets/nf-core/hlatyping/nextflow.config"
        ],
        "sessionId": "2a45ef7d-6dc8-4cbc-a51c-f74483ded0c9",
        "errorReport": null,
        "scriptId": "2902f5aa7f297f2dccd6baebac7730a2",
        "revision": "master",
        "exitStatus": 0,
        "commandLine": "./launch.sh run nf-core/hlatyping -profile docker,test -with-weblog 'http://localhost:4567'",
        "nextflow": {
            "version": {
                "minor": "03",
                "major": "19",
                "patch": "0-edge"
            },
            "build": 5114,
            "timestamp": "20-03-2019 13:25 UTC"
        },
        "stats": {
            "computeTimeFmt": "(a few seconds)",
            "cachedCount": 0,
            "cachedDuration": {
                "days": 0,
                "millis": 0,
                "hours": 0,
                "minutes": 0,
                "seconds": 0,
                "durationInMillis": 0
            },
            "failedDuration": {
                "days": 0,
                "millis": 0,
                "hours": 0,
                "minutes": 0,
                "seconds": 0,
                "durationInMillis": 0
            },
            "succeedDuration": {
                "days": 0,
                "millis": 37266,
                "hours": 0,
                "minutes": 0,
                "seconds": 37,
                "durationInMillis": 37266
            },
            "failedCount": 0,
            "cachedPct": 0.0,
            "cachedCountFmt": "0",
            "succeedCountFmt": "6",
            "failedPct": 0.0,
            "failedCountFmt": "0",
            "ignoredCountFmt": "0",
            "ignoredCount": 0,
            "succeedPct": 100.0,
            "succeedCount": 6,
            "ignoredPct": 0.0
        },
        "resume": false,
        "success": true,
        "scriptName": "main.nf",
        "duration": {
            "days": 0,
            "millis": 55688,
            "hours": 0,
            "minutes": 0,
            "seconds": 55,
            "durationInMillis": 55688
        }
    },
    "runId": "2a45ef7d-6dc8-4cbc-a51c-f74483ded0c9",
    "event": "completed",
    "runName": "elated_murdock",
    "runStatus": "completed",
    "utcTime": "2019-03-20T13:32:37Z"
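
As a rough illustration of what a monitoring tool could do with such a message, the Python sketch below condenses a completed event into one line. It assumes the flat metadata layout shown above, wrapped in an enclosing JSON object; summarise_completed is a hypothetical name.

```python
import json


def summarise_completed(raw: str) -> str:
    """Build a one-line summary from a 'completed' weblog message."""
    message = json.loads(raw)
    meta = message["metadata"]
    seconds = meta["duration"]["durationInMillis"] / 1000
    status = "succeeded" if meta["success"] else "failed"
    return (f"{message['runName']} ({meta['manifest']['name']} "
            f"{meta['manifest']['version']}) {status} in {seconds:.1f}s, "
            f"{meta['stats']['succeedCount']} tasks ok")
```

For the payload above this yields "elated_murdock (nf-core/hlatyping 1.1.4) succeeded in 55.7s, 6 tasks ok".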

@johandahlberg

@sven1103 I agree with the others here that it would make sense to send it at the workflow start, and at workflow finishing (regardless of whether it was successful or not).

@sven1103
Contributor Author

sven1103 commented Mar 20, 2019

@johandahlberg It will now be sent when the workflow starts and when it completes. On failure, the success boolean property is false and the errorReport and errorMessage properties carry more detailed information. At least it works when I intentionally break my pipeline :D

For example, with the Docker daemon not running:

...
"errorMessage": "docker: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?.\nSee 'docker run --help'.",
...
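
A receiver could surface such failures with something like the following sketch. It assumes the metadata of a completed event is passed in as a dictionary with the success and errorMessage properties described above; failure_reason is a hypothetical helper.

```python
from typing import Any, Optional


def failure_reason(metadata: dict[str, Any]) -> Optional[str]:
    """Return the first line of the error message for a failed run, else None."""
    if metadata.get("success"):
        return None
    message = metadata.get("errorMessage") or "unknown error"
    return message.splitlines()[0]
```

This is only meaningful for the completion event, since success is not yet decided when the workflow starts.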

@ewels
Member

ewels commented Mar 20, 2019

Awesome work @sven1103 😁

Once the params are added then I think we should have pretty much everything we'll need 👍

@johandahlberg

@sven1103 cool! Looks very useful.

@sven1103
Contributor Author

Ok, params sneak preview (nf-core/hlatyping):

"metadata": {
    "params": {
        "container": "nfcore/hlatyping:1.1.4",
        "help": false,
        "outdir": "results",
        "bam": true,
        "singleEnd": false,
        "single-end": false,
        "reads": "data/test*{1,2}.fq.gz",
        ...
    },
    "workflow": {
        "start": "2019-03-20T19:30:08Z",
        "projectDir": "/Users/sven1103/.nextflow/assets/nf-core/hlatyping",
        "manifest": {
            "nextflowVersion": ">=18.10.1",
            "defaultBranch": "master",
            "version": "1.1.4",
            "homePage": "https://github.com/nf-core/hlatyping",
            "gitmodules": null,
            "description": "Precision HLA typing from next-generation sequencing data.",
            "name": "nf-core/hlatyping",
            "mainScript": "main.nf",
            "author": null
        },
        "complete": null,
        "profile": "docker,test",
        ...
    }
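
With params in the payload, a dashboard can tell apart runs of the same pipeline, which was the concern raised earlier in the thread. A small sketch, assuming the nested params/workflow layout of this preview; run_label and its default parameter choice are hypothetical.

```python
from typing import Any


def run_label(metadata: dict[str, Any], param: str = "reads") -> str:
    """Label a run by pipeline name plus one distinguishing parameter."""
    name = metadata["workflow"]["manifest"]["name"]
    value = metadata.get("params", {}).get(param, "?")
    return f"{name} [{param}={value}]"
```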



@sven1103
Contributor Author

sven1103 commented Apr 5, 2019

@rsuchecki workflow metadata will soon be sent as JSON payload as well: #1077. This issue is solved imo, so I will close it :)

@sven1103 sven1103 closed this as completed Apr 5, 2019
@pditommaso pditommaso added this to the v19.04.0 milestone Apr 5, 2019