
Instructions for server integration / workflow monitor standard? #130

Closed
vsoch opened this issue Dec 6, 2020 · 10 comments
Labels: question (Further information is requested)

Comments


vsoch commented Dec 6, 2020

hey panoptes team!

I heard about you via snakemake, which is already set up to use panoptes for monitoring (great!). I am working on a small (Django) web interface for running snakemake workflows. My thinking is that we can set the default for running snakemake to use --wms-monitor, and I'm wondering if you have a guide / best practices to share about how to integrate the module here into another web server application. E.g., I know that I would define the endpoints needed, but then I'd likely need to map the requests from Snakemake to the models that I have for workflows, etc.

But here is the idea (and why I'm opening this issue): have you thought about implementing some kind of workflow communication standard, akin to OCI (opencontainers) but specific to workflow monitoring / update requests? My thinking is that snakemake would then implement the standard and be able to work with any server implementing it (e.g., panoptes here, or the application I'm working on). At a minimum, it would be useful to have an API spec for the expected inputs and outputs, even if the database models aren't the same. Let me know what you think!
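To make the integration idea concrete, here is a minimal sketch of the server-side handlers such a monitor would need. All names, endpoint shapes, and field layouts here are my assumptions for illustration, not an established spec, and a plain dict stands in for the Django workflow model:

```python
import json
import uuid

# A dict stands in for a real workflow model (e.g., a Django model).
WORKFLOWS = {}

def create_workflow():
    """Issue an id that the workflow client reuses for all later updates."""
    wf_id = str(uuid.uuid4())
    WORKFLOWS[wf_id] = {"status": "running", "messages": []}
    return {"id": wf_id}

def update_workflow_status(payload):
    """Map an incoming update request onto the stored workflow record."""
    workflow = WORKFLOWS[payload["id"]]
    msg = json.loads(payload["msg"])
    workflow["messages"].append(msg)
    if msg.get("level") == "error":
        workflow["status"] = "failed"
    return {"status": "received"}
```

A real integration would wrap these in whatever routing the web framework provides; the point is that the request/response shapes, not the storage, are what a standard would pin down.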

@vsoch vsoch changed the title Instructions for server integration Instructions for server integration / workflow monitor standard? Dec 6, 2020
@fgypas fgypas self-assigned this Dec 8, 2020
@fgypas fgypas added the question Further information is requested label Dec 8, 2020

fgypas commented Dec 10, 2020

Hi @vsoch

First of all, sorry for the late reply, and thank you for contacting us. I think our implementations are related, so it's good to keep things consistent.

Just to be open: one of the next steps for us is to also trigger pipelines from the web interface, so I am curious how you plan to implement this (the vision for us would be something like Nextflow Tower, but one that works not only for cloud users). Via HTTP requests, or by adding an API on the snakemake side?

Regarding our current monitoring service, we essentially hijacked the log service: we send the necessary log messages from snakemake to panoptes, where we parse/process them. We implemented it like this because we did not want to make many changes on the snakemake side.

We are open to discussing a standard both for running snakemake remotely and for getting information out of it, and we can also spend some time implementing it. We would rather introduce some breaking changes now than run into them in the future. Note that we are also open to merging/collaborating on projects if this would be beneficial for the community.

I could write more technical details, but please let us know what you think, or if something is not clear. I guess you are more familiar with the snakemake codebase than we are, so your opinion matters.
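As a sketch of the "hijacked log service" approach described above (the handler shape and field names are illustrative, not panoptes' actual code), a snakemake-style log handler that forwards each record to a monitor server could look like:

```python
import json
import time

def make_forwarding_log_handler(post, server_url, workflow_id):
    """Build a log handler that ships every snakemake log record to the
    monitor server. `post` is any callable taking (url, data), so the
    HTTP layer (requests, urllib, or a test stub) stays pluggable."""
    def handle(msg):
        post(server_url + "/update_workflow_status",
             {"id": workflow_id,
              "timestamp": time.time(),
              "msg": json.dumps(msg)})
    return handle
```

Keeping the transport injectable means the server side never has to know it is being fed repackaged log messages rather than purpose-built events, which is exactly why the approach needed few changes in snakemake itself.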


vsoch commented Dec 10, 2020

I'm creating an interface, "Snakemake Interface" == snakeface, that serves as a wrapper to snakemake - basically having database models that map to workflows, workflow collections, etc. I'm creating it so that the user can run it like a notebook and log in with a token, or it can be deployed on some HPC or cloud environment with a custom configuration of backends. E.g., if an HPC cluster wants to configure a front end for users to log in with LDAP or SAML and then only expose the slurm (cluster) executor, that would work.

The commands of the interface (e.g., the forms to input what you want for a workflow run) are generated directly from the argparse parser, so it should stand the test of time if/when the snakemake client changes.

To have snakemake integrated with the logging, I'm basically going to override those endpoints to send updates to the same server that will update the interface - and it's the structure of those messages that I think we could develop a standard for, meaning endpoints and then data. This would mean that any application that implements those endpoints and expects a particular format of data could be plugged into snakemake.

I figure since you've already developed something, maybe we could start with that, and then I could take a look, contribute feedback, and implement it for snakeface as well? I'm also happy to do any PRs to snakemake that might be needed. And yes, I'd love to work together! I really love snakemake, so I'm definitely enjoying this little side project <3


vsoch commented Dec 11, 2020

@fgypas any context of why this is assigned to me?


fgypas commented Dec 11, 2020

@fgypas any context of why this is assigned to me?

Sorry, mistake :)


fgypas commented Dec 12, 2020

Hi @vsoch

I did not understand only one thing. You mention that the formats of the interface are generated directly from the argparse parser. Does this mean that each interface that you set up can run only a specific pipeline? I thought that exposing the config.yaml would be enough for it to work with any pipeline (given that the data are already on a shared filesystem).

Anyway, other than that, let me try to describe what we implemented in snakemake and what in panoptes, in order to find common ground.

What we wanted initially was to have a monitoring service where all the pipelines of a user would be processed at a specific location. Of course we have plans to support multiple users via tokens (see issue #2), but we have not implemented it yet. Personally, I trigger multiple pipelines at the same time, and since this is a feature I use, we started with that. This is what we implemented:

We are now working on a solution to show the dag in real-time, so this is the information that we definitely need to have. Also Johannes Koester (maybe we should include him in the discussion?) mentioned at some point that it would be a good idea to be able to send the report that snakemake generates and integrate it.

Finally, many of the non command-line (expert) users (e.g. experimental biologists) asked us to support running pipelines from a web interface. This means that it would be good to find a solution for the IDs (it should work both when you start a pipeline from the web interface and from the command line). Also, one issue that we currently have is that when a pipeline crashes and you try to restart it from the command line, it gets a new ID. I think this should not happen (bad user experience), and it's something that we have to consider.

Please let me know if you have questions, or if something is not clear.


vsoch commented Dec 12, 2020

I did not understand only one thing. You mention that the formats of the interface are generated directly from the argparse parser. Does this mean that each interface that you set up can run only a specific pipeline?

Oh no, not at all! It's generated from the general parser object, before it's used, and then the user specifies the specific inputs they want when they create a new workflow. What I was trying to say is that if snakemake has an argument --google-lifesciences-region, it will reliably show up as a text field for new workflow creation. If snakemake changes (and the user updates the installed version) to add a new argument --another-executor-region, that would automatically show up too. The alternative would have been to hard-code options into an interface and map them to the snakemake parser/command, which would be hard to keep up to date.
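A minimal sketch of that idea (a hypothetical helper, not snakeface's actual code): walk the parser's actions and emit one field description per option, so the form always tracks whatever CLI version is installed:

```python
import argparse

def form_fields(parser):
    """Derive form-field descriptions from an argparse parser so a web
    form stays in sync with the CLI automatically."""
    fields = []
    for action in parser._actions:  # note: _actions is not a public API
        if action.dest == "help" or not action.option_strings:
            continue
        fields.append({
            "name": action.option_strings[-1],
            "help": action.help or "",
            # zero-argument flags (store_true/store_false) become checkboxes
            "type": "checkbox" if action.nargs == 0 else "text",
            "default": action.default,
        })
    return fields
```

So a parser defining `--google-lifesciences-region` yields a text field of that name, and a new flag added in a later release appears in the form with no interface changes.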

I thought that if you expose the config.yaml would be enough to work for any pipeline (given that the data are already in a shared filesystem).

The user / cluster admin would be able to specify a config.yaml, but they would still likely want to tweak jobs on the fly.

Thanks for sharing the workflow! Snakeface is similar, but it's not started by snakemake. I can outline the steps (as I see them so far).

  1. The user starts snakeface, usually just by typing snakeface. This is akin to starting a Jupyter notebook: they see a token in the console, open the served port in a browser, and enter the token to authenticate. As another use case (not developed yet), a center can deploy (locally or on the cloud) a snakeface interface and give users access via some other authentication (e.g., SAML, PAM, or OAuth2). For this use case, the user just browses to wherever the server is and then logs in.
  2. Once logged in, the user can create a workflow collection, and then create and interact with workflows in it. This means specifying any or all parameters that are needed, and choosing from some number of exposed executors (e.g., you might imagine that a Google Cloud deployed Snakeface would expose Google Life Sciences, and an HPC one would expose Slurm). Submitting the workflow means that the server executes the snakemake command on behalf of the user. This would be no different than if the user ran snakemake on their local HPC to submit a Slurm job, or to somewhere in the cloud.
  3. The workflow would be run automatically with the --wms-monitor flag, pointing back to the server. The receiving endpoints would update the workflow model (status, dag, etc.). This is why I reached out to you - if we both implement these endpoints, we should have them be the same.
  4. Finally, we'd probably generate reports automatically unless the user disables it, and make these easy to browse in the interface.

And that's it! There would be views to visualize or otherwise monitor the workflow, and (I hope) tables that update automatically with web sockets.
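Steps 2 and 3 above could be sketched like this (a hypothetical helper; only the --wms-monitor flag itself comes from snakemake's CLI):

```python
import shlex
import subprocess

def build_command(snakefile, monitor_url, extra_args=""):
    """Assemble the snakemake invocation the server runs on the user's
    behalf, pointing monitoring back at this server."""
    return (["snakemake", "--snakefile", snakefile,
             "--wms-monitor", monitor_url]
            + shlex.split(extra_args))

def launch_workflow(snakefile, monitor_url, extra_args=""):
    """Start the workflow as a child process; the server then receives
    status updates on its monitor endpoints."""
    return subprocess.Popen(build_command(snakefile, monitor_url, extra_args))
```

The extra arguments string would come straight from the argparse-generated form, so the server stays a thin wrapper around the CLI.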

We are now working on a solution to show the dag in real-time, so this is the information that we definitely need to have. Also Johannes Koester (maybe we should include him in the discussion?) mentioned at some point that it would be a good idea to be able to send the report that snakemake generates and integrate it.

Definitely agree!

Finally, many of the non command line (expert) users (e.g. experimental biologists) asked us to support pipelines from a web interface.

This is what (I hope) snakeface will do!

This means that it would be good to find a solution for the IDs (it should work both when you start a pipeline from the web interface and from the command line).

For the way I'm implementing it, the workflows submitted from the interface will always have an id.

Also, one issue that we currently have is that when a pipeline crashes and you try to restart it from the command line, it gets a new ID. I think this should not happen (bad user experience), and it's something that we have to consider.

I think snakeface having a separate id for the workflow would get around this.

Please let me know if you have questions, or if something is not clear.

Thanks for the details! I think the best plan for now (since I'm early in development and we are about to go on vacation) would be to focus on the messaging standard. For example, you might have a repository panoptes-organization/monitor-schema that has things like:

/service-info/

Endpoint to check status of running monitor service. Possible responses are:

  • 404: not implemented
  • 200: success
  • 503: service not available

For each response above, the response should include a "status" of "not found," "success," and "service not available" respectively.

And then you would write that up for all the other endpoints. Does that make sense?
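The contract above could be pinned down roughly like this (a sketch of the proposed spec only, where each HTTP code pairs with a fixed "status" string in the body):

```python
def service_info(implemented=True, available=True):
    """Resolve the proposed /service-info/ endpoint: return the HTTP
    status code plus the JSON body the spec pairs with it."""
    if not implemented:
        return 404, {"status": "not found"}
    if not available:
        return 503, {"status": "service not available"}
    return 200, {"status": "success"}
```

Any server (panoptes, snakeface, or something else) satisfying this mapping would look identical to a snakemake client probing whether a monitor is up.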


vsoch commented Dec 12, 2020

@fgypas if you are short on time, I can offer to create a first shot at this "spec" - if you want to make a repository here and add me as a contributor. I think I can derive most of the functionality from the code here, and then we can discuss if anything should be changed or extended.


fgypas commented Dec 12, 2020

Hi @vsoch
Thank you for the quick reply and for sharing the plans for snakeface.
I just added you to the organization and gave you maintain permissions to the new repository that you recommended:
https://github.com/panoptes-organization/monitor-schema
Feel free to organize it as you wish. Indeed I am a bit short on time, but I will try to help as much as possible. Please let me know if the permissions you have are fine.


vsoch commented Dec 12, 2020

Awesome thank you! I’ll get a draft up ASAP, and ping you here when it’s ready (and then I’ll also close the issue).


vsoch commented Dec 12, 2020

Okay, all set - we have the start of a standard! Let's pick up the discussion here: panoptes-organization/monitor-schema#1, and feel free to ping Johannes if/when you want his feedback on what you currently do with snakemake.

@vsoch vsoch closed this as completed Dec 12, 2020