
Failed GLS job log files are not uploaded to bucket AND --show-failed-logs does not work on instance #1992

Open
cademirch opened this issue Dec 6, 2022 · 9 comments
Labels
bug Something isn't working

Comments

@cademirch
Contributor

cademirch commented Dec 6, 2022

Snakemake version

7.18.2
Describe the bug

When a job fails on GLS the stdout/stderr from the instance is captured and uploaded to the specified bucket. However, if the stderr/stdout is captured in a log file, that file is not uploaded to the bucket. Looking at the executor code, it doesn't seem like this is a feature - so I guess this isn't really a bug - but it would be a nice feature.

I attempted a quick work around by passing the option --show-failed-logs, expecting that would cat the log file to stdout (on the instance), which would be captured. However, it seems that --show-failed-logs is not passed to the snakemake command on the instance, rather it tries to display the log file locally, which of course does not exist.
Logs

  • Execute the minimal example (below) with --show-failed-logs:
    snakemake --google-lifesciences --default-remote-prefix gls-log-test -j1 --show-failed-logs
  • It fails as expected and tries to display the log file (which does not exist locally):
Error in rule hi:
   jobid: 1
   output: gls-log-test/hi.txt
   log: gls-log-test/logs/hi/log.txt (check log file(s) for error message)
   shell:
       hi 2> gls-log-test/logs/hi/log.txt
       (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
   jobid: 6527137249714480570
Logfile gls-log-test/logs/hi/log.txt not found.
  • Check the command executed on the cloud:
> gcloud beta lifesciences operations describe projects/snaketest/locations/us-central1/operations/4249048113650117362

/tmp/workdir.tar.gz && tar -xzvf /tmp/workdir.tar.gz && python -m snakemake
       --snakefile 'Snakefile' --target-jobs 'hi:' --allowed-rules 'hi' --cores 'all'
       --attempt 1 --force-use-threads  --resources 'mem_mb=1000' 'disk_mb=1000'  --force
       --keep-target-files --keep-remote --max-inventory-time 0 --nocolor --notemp
       --no-hooks --nolock --ignore-incomplete --rerun-triggers 'params' 'mtime'
       'software-env' 'code' 'input' --skip-script-cleanup  --conda-frontend 'mamba'
       --wrapper-prefix 'https://github.com/snakemake/snakemake-wrappers/raw/' --latency-wait
       5 --scheduler 'ilp' --default-remote-prefix 'gls-log-test' --default-remote-provider
       'GS' --default-resources 'mem_mb=1000' 'disk_mb=1000' 'tmpdir=system_tmpdir'
       --mode 2

Minimal example

rule all:
    input: "hi.txt"

rule hi:
    output: "hi.txt"
    log: "logs/hi/log.txt"
    shell: "hi 2> {log}" # /bin/bash: hi: command not found

Additional context

Would appreciate your insight @vsoch. It seems to me that uploading the job log file could be handled by gls_helper.py, but it would probably take me a bit to implement that. It would probably be easier to pass --show-failed-logs through to the instance, similar to how --use-conda is passed, though I'm not sure where/how that is handled.
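A minimal sketch of what that upload step might look like (hypothetical helper, not the actual gls_helper.py code; assumes the google-cloud-storage client is available and that the rule's log paths are known):

# Hypothetical sketch: after a failed GLS job, upload any rule log files that
# exist on the instance to the workflow bucket so they can be inspected later.
from pathlib import Path

from google.cloud import storage  # assumes google-cloud-storage is installed


def upload_failed_logs(bucket_name: str, log_paths, prefix: str = "logs") -> None:
    """Upload each existing log file to gs://<bucket_name>/<prefix>/<filename>."""
    client = storage.Client()
    bucket = client.bucket(bucket_name)
    for log_path in log_paths:
        path = Path(log_path)
        if not path.exists():
            continue  # the job may have failed before the log was created
        blob = bucket.blob(f"{prefix}/{path.name}")
        blob.upload_from_filename(str(path))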

@CowanCS1 Interested if/how you handle this too!

@cademirch added the bug label Dec 6, 2022
@vsoch
Contributor

vsoch commented Dec 6, 2022

It's been a while since I worked on this, but at least when I used this API, I relied on the command you described above:

https://github.com/snakemake/snakemake/blob/main/snakemake/executors/google_lifesciences.py#L862-L876

but that would only work if you put more verbose printing in your job. Maybe it would be a matter of adding --verbose to the command there for Snakemake debugging output, and if you still can't see anything even with extra prints, we could minimally have the script that runs everything catch the failure and print the log. I'm also not sure the user needs to clutter their storage with logs just to debug - it's likely better to get them interactively from the client command as you've done above. There are definitely many options - let's discuss with the folks here and @johanneskoester.
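One rule-level version of the "catch the failure and print the log" idea, as a sketch (this changes the user's rule, not the executor, and assumes bash strict mode on the instance):

rule hi:
    output: "hi.txt"
    log: "logs/hi/log.txt"
    # On failure, cat the log to stderr so it ends up in the instance
    # stdout/stderr that GLS already captures and uploads; a workaround only.
    shell: "(hi 2> {log}) || (cat {log} >&2; exit 1)"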

@cademirch
Contributor Author

cademirch commented Dec 6, 2022

Yeah, that only works if you don't redirect the stderr/stdout of your job to a log file, which defeats the purpose of the log directive when defining a rule. I think a suitable workaround is allowing --show-failed-logs to be passed to the cloud instance, because we then retain our defined log file for that rule.

I just tried adding w2a("show_failed_logs") in the return here:

def general_args(self):

However, show_failed_logs isn't an attribute of the workflow so that doesn't work :/

Edit:

I guess we could just hardcode --show-failed-logs... though I'm not sure of the consequences of that.
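For illustration, a minimal sketch of the hardcode idea (hypothetical; not the actual executor code, which builds the argument string differently):

# Hypothetical sketch: append the flag to whatever argument string the executor
# already builds for the snakemake command run on the instance.
def with_show_failed_logs(general_args: str) -> str:
    """Hardcode --show-failed-logs onto the remote snakemake invocation."""
    return f"{general_args} --show-failed-logs"


# e.g. "--nocolor --notemp ..." becomes "--nocolor --notemp ... --show-failed-logs"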

@vsoch
Contributor

vsoch commented Dec 6, 2022

@cademirch remember that the container the life sciences worker is using is the latest snakemake - so if you want to try changing something you'd need to build a container, push to a registry, and then provide the container URI to the executor.

@cademirch
Contributor Author

cademirch commented Dec 6, 2022

Right, but what I'm proposing is just adding --show-failed-logs to the snakemake command line being run in the container, which is handled in the function linked above. Fwiw, I've added it on my fork and it is working as I expected. However, it does hardcode the option into the command executed in the container, which I'm not sure is a good thing.

See:
cademirch@3b90fa3

@vsoch
Contributor

vsoch commented Dec 6, 2022

Ok great! Glad you found a solution.

@cademirch
Contributor Author

cademirch commented Dec 6, 2022

Opened a PR but further discussion is probably needed. Thanks for your help @vsoch! :)

@vsoch
Contributor

vsoch commented Dec 6, 2022

Haha I didn’t do anything - all you @cademirch! 🙌

@oxenit

oxenit commented Aug 22, 2023

+1 on this feature. It was very frustrating to have a job fail all afternoon because a file referenced in a params directive could not be found, without being able to find the exact reason 😝 (in either the pipeline logs or the bucket itself).

What is surprising is that, just like for the OP and contrary to what is written here, a rule's logs are actually not written to the bucket if the rule fails. I can only see rule logs when the rule passes (which is a bit sad).
We can find the GLS pipeline logs in another folder of the bucket, but they are not as verbose as the rule logs.

@cademirch
Contributor Author

@oxenit, I haven't had a chance to work on this unfortunately, but my workaround above has been sufficient for our use. I don't expect this behavior to change anytime soon, since we will eventually have to migrate to Google Batch; hopefully we can solve this issue when that happens.
