New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature - stdout live reporting #16975
base: dev
Are you sure you want to change the base?
Feature - stdout live reporting #16975
Conversation
1f25df5
to
8ac0e54
Compare
@gecage952 Very cool! I'll defer to others for feedback on the API design, but I went ahead and pushed a minor change fixing up the linting and client test failures, and updating the client API schema. |
e5b887e
to
201f556
Compare
201f556
to
24e7d15
Compare
That is very cool and a feature we also get asked every now and then. As a Pulsar deployer, I'm also interested in the Pulsar part and how this works in practice. Are you using the MQ deployment of Pulsar? |
- stdout_position: The index of the character to begin reading stdout from | ||
- stdout_length: How many characters of stdout to read | ||
- stderr_position: The index of the character to begin reading stderr from | ||
- stderr_length: How many characters of stderr to read |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add two separate endpoints job for the jobs' stdout and stderr ?
Yeah, rabbitmq. |
This is very cool, thanks a lot!
that'd be great!
The integration tests are going to run with the local job runner by default (as opposed to the API tests that could run against external Galaxy servers where the stdio streams may not be available). What you can do is submit a tool job against such an instance that prints something to stdout, then sleeps and prints something at the end, in your test you can then assert that you saw the first message but not the second. Take a look at https://github.com/galaxyproject/galaxy/blob/dev/test/integration/test_job_recovery.py#L30-L36 for running tools in integration tests. Let me know if you need any help with this. |
So, I opened a pr in the pulsar repo with that code: galaxyproject/pulsar#345 |
…/galaxy into feature_stdout_live_reporting
Ok, I added a new endpoint for getting stdout and stderr, and then updated everything to use the new endpoint. I see it was mentioned to have them separate, and I can still do that if necessary. I just combined them here now for my own personal testing purposes. |
I'll also take a look at the merge conflicts. |
Ok. Sorry for taking a while to get back to this. I've tried to update everything to be compatible to the latest changes in Galaxy including updating the code to Vue3, updating the api endpoint, fixing merge conflicts, etc. I'll try to fix the rest of these checks, once they finish. |
Just to update on this. This is done enough to start being looked at just fyi. I'm sure there are areas for improvement of course. |
…galaxy into feature_stdout_live_reporting
@@ -212,6 +212,12 @@ class JobOutput(Model): | |||
value: EncodedDataItemSourceId = Field(default=..., title="Dataset", description="The associated dataset.") | |||
|
|||
|
|||
class JobConsoleOutput(Model): | |||
state: Optional[JobState] = Field(None, title="Job State", description="The current job's state") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is going to sounds weird but internally we track both a job_stderr/job_stdout and a tool_stderr/tool_stdout. Pulsar is worse at this than the other runners unfortunately. But I think the thing you're returning here is the tool_stdout - the output of the running tool. The job_stdout would be things like messages printed by SLURM and such. Can you update this description here to say Tool STDOUT and Tool STDERR. There are other places in comments where you refer to it as the job's stdout but I think it should be the job's tool stdout and such. It seems like there probably shouldn't be a distinction I imagine but we actually have disentangled a few problems this was and we shield the users from non-tool stuff they don't care about this way too.
): | ||
if job is None: | ||
raise ObjectNotFound() | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are a bunch of reasons this might not be possible I think. Older Pulsar, remote execution outside of Pulsar, a Pulsar modality that doesn't allow this, runners that respect the location of working directory stuff in this fashion. It feels like the right thing to do here is to throw a galaxy.exceptions.ConfigDoesNotAllowException
and then catch this on the client and not poll. Does that make sense to you?
Ideally I think we would make all this opt-in based on the job destination until we've hardened it and seen it work well in production without performance issues or confusing user experience. The job object has a job.job_destination
YAML thing we could use to configure things per environment/destination.
Hi,
As part of our work at Oak Ridge National Lab, we've been using Galaxy for quite a while (some of us have attended GCC as well). We've also been doing some internal development on some features that users have requested here. One the most common ones was the ability to see live console output as jobs are running. This issue has been brought up before, for example #2332. But given, our users wanted it at the present, I took a stab at an implementation. My main goal was to minimize impact to the way Galaxy works now, so that there aren't any compatibility issues. I'll try to provide details about each part that was touched.
Overview
The overall idea here was to add logic to allow the job manager to read the tool_stdout and tool_stderr files that are saved in the job directory, and return them as part of a status. The reason I put it into the status is because the UI already calls the status regularly form the JobInformation page, so I wouldn't have to make a new thread or anything. It also just kinda made sense to me that you might want it as part ofhte status of the job. To facilitate this, the api endpoint for getting job status was adjusted to allow parameters to select which part of the stdout/stderr (both work the same, so I'll just refer to stdout from here) that you want. There's
stdout_pos
which is the starting index in the stdout_file, andstdout_length
which is how much of the stdout that you want (in chars). Because stdout could potentially be a relatively larger file, I didn't want to force people to read the whole file every time status is called.I then adjusted the UI for the job information view in a few different ways. First, I made the code blocks scrollable and set a max height for them. Then I moved the expand on click functionality over to only be on the expand icons, rather than the whole table row (if users would try and highlight a part of the stdout or click on the scroll bar to scroll, it would collapse the view). Lastly, I added an autoscroll feature that automatically scrolls the code blocks when the user is at the bottom of the stdout. If the user scrolls up, this is disabled. If they scroll back to the bottom, it starts again.
The last thing I want to note is that as far as compatibility with job runners go, we almost exclusively use Pulsar for running jobs. As is, this pr will only work for job runners that save their stdout to the job directory inside of the Galaxy. Internally, we've added functionality for Pulsar to do this (the purpose of lib/galaxy/webapps/galaxy/api/job_files.py changes). I did not include those changes here, because that would require an additional pr to the Pulsar repository. Let me know if there's interest in seeing that however. It would also be nice to get some feedback on testing this.
I understand this is a pretty big change, and I imagine there are a lot of areas for improvement. Please let me know if this is something people are interested in helping with, or if this is a terrible way to try to do this, or whatever. It's worked for us so far internally, but I would love to have some feedback from here.
How to test the changes?
(Select all options that apply)
License