
Feature - stdout live reporting #16975

Open
gecage952 wants to merge 22 commits into dev from feature_stdout_live_reporting

Conversation

@gecage952 (Contributor) commented Nov 3, 2023

Hi,
As part of our work at Oak Ridge National Lab, we've been using Galaxy for quite a while (some of us have attended GCC as well). We've also been doing some internal development on features that users have requested here. One of the most common requests was the ability to see live console output while jobs are running. This has been brought up before, for example in #2332, but since our users wanted it now, I took a stab at an implementation. My main goal was to minimize the impact on how Galaxy works today, so that there aren't any compatibility issues. I'll try to provide details about each part that was touched.

Overview

The overall idea here was to add logic that lets the job manager read the tool_stdout and tool_stderr files saved in the job directory and return them as part of a status response. I put this into the status because the UI already polls job status regularly from the JobInformation page, so I wouldn't have to add a new thread or anything, and it also just made sense to me that you might want this as part of the status of the job. To facilitate this, the API endpoint for getting job status was adjusted to accept parameters that select which part of the stdout/stderr you want (both work the same way, so I'll just refer to stdout from here on). There's stdout_position, which is the starting character index in the stdout file, and stdout_length, which is how many characters of stdout you want. Because stdout could potentially be a relatively large file, I didn't want to force clients to read the whole file every time status is called.
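To illustrate the intended polling pattern, here is a minimal client-side sketch that reads the tool stdout incrementally through the job status endpoint. The parameter names follow the PR diff (stdout_position/stdout_length); the response field name, instance URL, API key, and job id are placeholders and assumptions, not the final API.

```python
import time

import requests

GALAXY_URL = "https://galaxy.example.org"  # placeholder instance
API_KEY = "your-api-key"                   # placeholder key
JOB_ID = "encoded-job-id"                  # placeholder encoded job id

seen = 0  # characters of tool stdout already fetched
while True:
    # Ask the job status endpoint only for stdout we have not seen yet.
    resp = requests.get(
        f"{GALAXY_URL}/api/jobs/{JOB_ID}",
        params={
            "key": API_KEY,
            "stdout_position": seen,  # starting character index (per the PR diff)
            "stdout_length": 4096,    # read at most this many characters per poll
        },
    )
    resp.raise_for_status()
    status = resp.json()
    chunk = status.get("tool_stdout", "")  # assumed response field name
    if chunk:
        print(chunk, end="")
        seen += len(chunk)
    if status["state"] in ("ok", "error", "deleted"):
        break
    time.sleep(5)
```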

I then adjusted the UI for the job information view in a few ways. First, I made the code blocks scrollable and set a max height for them. Then I moved the expand-on-click functionality to the expand icons only, rather than the whole table row (previously, if users tried to highlight part of the stdout or clicked the scroll bar, the view would collapse). Lastly, I added an autoscroll feature that automatically scrolls the code blocks while the user is at the bottom of the stdout; if the user scrolls up, autoscroll is disabled, and if they scroll back to the bottom, it starts again.

The last thing I want to note is that, as far as compatibility with job runners goes, we almost exclusively use Pulsar for running jobs. As is, this PR will only work for job runners that save their stdout to the job directory inside Galaxy. Internally, we've added functionality for Pulsar to do this (the purpose of the lib/galaxy/webapps/galaxy/api/job_files.py changes). I did not include those changes here, because that would require an additional PR to the Pulsar repository; let me know if there's interest in seeing that, however. It would also be nice to get some feedback on testing this.

I understand this is a pretty big change, and I imagine there are a lot of areas for improvement. Please let me know if this is something people are interested in helping with, or if this is a terrible way to approach it, or whatever. It has worked for us internally so far, but I would love feedback from here.

How to test the changes?

(Select all options that apply)

  • I've included appropriate automated tests.
  • This is a refactoring of components with existing test coverage.
  • Instructions for manual testing are as follows:
    1. The best way to test is to start a tool that will run for some time while writing to stdout (a minimal example script is sketched after this list).
    2. Start the tool and go to the JobInformation page for the job.
    3. Expand the Tool stdout by clicking on the expand icon to the right.
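
If you don't have a convenient long-running tool at hand, a hypothetical test tool could wrap a trivial script like the following, which just prints a line every few seconds so there is always fresh stdout to watch:

```python
#!/usr/bin/env python
# Emits a line of stdout every few seconds so there is always new output to watch.
import time

for i in range(60):
    print(f"progress line {i}", flush=True)  # flush so the output file grows promptly
    time.sleep(5)
print("done")
```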

License

  • I agree to license these and all my past contributions to the core galaxy codebase under the MIT license.

@dannon (Member) commented Nov 6, 2023

@gecage952 Very cool! I'll defer to others for feedback on the API design, but I went ahead and pushed a minor change fixing up the linting and client test failures, and updating the client API schema.

@dannon dannon force-pushed the feature_stdout_live_reporting branch from e5b887e to 201f556 Compare November 6, 2023 14:03
@dannon dannon force-pushed the feature_stdout_live_reporting branch from 201f556 to 24e7d15 Compare November 6, 2023 14:05
@bgruening (Member) commented

That is very cool and a feature we also get asked about every now and then. As a Pulsar deployer, I'm also interested in the Pulsar part and how this works in practice. Are you using the MQ deployment of Pulsar?

Comment on lines 192 to 195
- stdout_position: The index of the character to begin reading stdout from
- stdout_length: How many characters of stdout to read
- stderr_position: The index of the character to begin reading stderr from
- stderr_length: How many characters of stderr to read
Member commented:

Can you add two separate endpoints for the job's stdout and stderr?

@gecage952 (Contributor, Author) commented

That is very cool and a feature we also get asked about every now and then. As a Pulsar deployer, I'm also interested in the Pulsar part and how this works in practice. Are you using the MQ deployment of Pulsar?

Yeah, rabbitmq.
The general idea of the Pulsar piece is to send the stdout/stderr files to the job_files API endpoint periodically. To make sure we aren't sending the entire file every time, we keep track of the last position sent in a map. When it's time to send the next chunk, we seek to that position in the files and post the new data to the job_files endpoint, which appends it to the file in the Galaxy job directory. We also considered using the message queue for this instead of the API, but ended up not going in that direction.
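
This is not the code from the Pulsar PR, but a minimal sketch of that idea, under the assumptions that the streams live under outputs/ in the job working directory and that the job_files endpoint accepts the appended chunk as a file upload; the exact URL, parameter names, and file layout may differ from the real implementation.

```python
import os

import requests

# job_id -> bytes of each stream already sent to Galaxy
sent_offsets = {}


def send_new_output(galaxy_url, job_key, job_id, working_directory):
    """Send any stdout/stderr written since the last call to the job_files endpoint."""
    offsets = sent_offsets.setdefault(job_id, {"tool_stdout": 0, "tool_stderr": 0})
    for name in ("tool_stdout", "tool_stderr"):
        path = os.path.join(working_directory, "outputs", name)  # assumed layout
        if not os.path.exists(path):
            continue
        with open(path, "rb") as fh:
            fh.seek(offsets[name])  # skip what was already sent
            chunk = fh.read()
        if not chunk:
            continue
        # Assumed call shape: post the new chunk so Galaxy appends it to the
        # corresponding file in the job directory.
        requests.post(
            f"{galaxy_url}/api/jobs/{job_id}/files",
            params={"job_key": job_key, "path": name},
            files={"file": chunk},
        )
        offsets[name] += len(chunk)
```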

@mvdbeek (Member) commented Nov 7, 2023

This is very cool, thanks a lot!

Let me know if there's interest in seeing that however.

that'd be great!

It would also be nice to get some feedback on testing this.

The integration tests are going to run with the local job runner by default (as opposed to the API tests that could run against external Galaxy servers where the stdio streams may not be available). What you can do is submit a tool job against such an instance that prints something to stdout, then sleeps and prints something at the end, in your test you can then assert that you saw the first message but not the second. Take a look at https://github.com/galaxyproject/galaxy/blob/dev/test/integration/test_job_recovery.py#L30-L36 for running tools in integration tests. Let me know if you need any help with this.
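
A sketch of what such an integration test might look like, loosely following the existing integration-test helpers; the tool id, the console-output endpoint path, and the response field names here are assumptions for illustration, not the PR's final API.

```python
# Names below follow existing Galaxy integration tests, but the tool id,
# endpoint path, and response fields are assumptions for illustration.
from galaxy_test.base.populators import DatasetPopulator
from galaxy_test.driver import integration_util


class TestLiveConsoleOutput(integration_util.IntegrationTestCase):
    def setUp(self):
        super().setUp()
        self.dataset_populator = DatasetPopulator(self.galaxy_interactor)

    def test_stdout_visible_while_running(self):
        history_id = self.dataset_populator.new_history()
        # Hypothetical test tool that prints "first", sleeps, then prints "second".
        run_response = self.dataset_populator.run_tool("slow_echo", {}, history_id)
        job_id = run_response["jobs"][0]["id"]
        # Poll the (hypothetical) console output endpoint while the job is still running.
        response = self._get(f"jobs/{job_id}/console_output?stdout_position=0&stdout_length=1024")
        response.raise_for_status()
        stdout = response.json().get("stdout", "")
        assert "first" in stdout
        assert "second" not in stdout
```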

@gecage952 (Contributor, Author) commented Nov 17, 2023

So, I opened a PR in the Pulsar repo with that code: galaxyproject/pulsar#345
I'll try to get to the suggestions here in the coming weeks (busy time of year).

@martenson martenson marked this pull request as draft November 17, 2023 20:02
@gecage952 (Contributor, Author) commented

Ok, I added a new endpoint for getting stdout and stderr and updated everything to use it. I see it was mentioned to have them separate, and I can still do that if necessary; I just combined them here for my own testing purposes.

@gecage952 (Contributor, Author) commented

I'll also take a look at the merge conflicts.

@gecage952 (Contributor, Author) commented

Ok, sorry for taking a while to get back to this. I've tried to bring everything up to date with the latest changes in Galaxy, including updating the code to Vue 3, updating the API endpoint, and fixing merge conflicts. I'll try to fix the rest of these checks once they finish.

@gecage952 (Contributor, Author) commented

Just to update on this: it's done enough to start being looked at, just FYI. I'm sure there are areas for improvement, of course.

@gecage952 gecage952 changed the title [WIP] Feature - stdout live reporting Feature - stdout live reporting May 3, 2024
@martenson martenson added this to the 24.1 milestone May 3, 2024
@martenson martenson marked this pull request as ready for review May 3, 2024 21:24
@mvdbeek mvdbeek requested a review from jmchilton May 7, 2024 14:31
@@ -212,6 +212,12 @@ class JobOutput(Model):
value: EncodedDataItemSourceId = Field(default=..., title="Dataset", description="The associated dataset.")


class JobConsoleOutput(Model):
state: Optional[JobState] = Field(None, title="Job State", description="The current job's state")
Member commented:

This is going to sound weird, but internally we track both a job_stderr/job_stdout and a tool_stderr/tool_stdout. Pulsar is worse at this than the other runners, unfortunately. But I think what you're returning here is the tool_stdout, i.e. the output of the running tool; the job_stdout would be things like messages printed by SLURM and such. Can you update this description to say Tool STDOUT and Tool STDERR? There are other places in comments where you refer to it as the job's stdout, but I think it should be the job's tool stdout and so on. It seems like there probably shouldn't be a distinction, I imagine, but we have actually disentangled a few problems this way, and we shield users from non-tool stuff they don't care about this way too.
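
For illustration, one way the response model might look after adopting that naming; Model and JobState are the schema classes already used in this diff hunk, and the titles and descriptions here are placeholder wording.

```python
# Sketch only: Model and JobState are the schema classes already used in this
# module; titles/descriptions are placeholder wording.
from typing import Optional

from pydantic import Field


class JobConsoleOutput(Model):
    state: Optional[JobState] = Field(None, title="Job State", description="The current state of the job.")
    tool_stdout: Optional[str] = Field(
        None, title="Tool STDOUT", description="Chunk of standard output produced by the tool itself."
    )
    tool_stderr: Optional[str] = Field(
        None, title="Tool STDERR", description="Chunk of standard error produced by the tool itself."
    )
```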

):
if job is None:
raise ObjectNotFound()

Member commented:

There are a bunch of reasons this might not be possible, I think: older Pulsar, remote execution outside of Pulsar, a Pulsar modality that doesn't allow this, runners that don't handle working directory locations in this fashion. It feels like the right thing to do here is to throw a galaxy.exceptions.ConfigDoesNotAllowException and then catch it on the client and not poll. Does that make sense to you?

Ideally, I think we would make all of this opt-in based on the job destination until we've hardened it and seen it work well in production without performance issues or a confusing user experience. The job object has a job.job_destination YAML thing we could use to configure this per environment/destination.
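
A rough sketch of what that opt-in guard could look like; ConfigDoesNotAllowException is the exception named above, while the destination parameter name and the helper function are hypothetical.

```python
# Hypothetical guard: refuse live console output unless the destination opts in,
# so the client can catch the error and stop polling. The destination parameter
# name "live_tool_output" is made up for illustration.
from galaxy import exceptions


def ensure_live_output_allowed(job):
    destination_params = job.job_destination.params or {}
    if not destination_params.get("live_tool_output", False):
        raise exceptions.ConfigDoesNotAllowException(
            "Live tool output is not enabled for this job destination."
        )
```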

Projects: Triage/Discuss
Linked issues: None yet
6 participants