K8s executor does not report command log when a task fail #699

Closed
wikiselev opened this issue May 14, 2018 · 9 comments

Comments

@wikiselev
Contributor

I've got an error similar to this multiple times in different processes:

[warm up] executor > k8s
[25/55f4e7] Cached process > irods (ImmHet6564352)
[10/2f2af3] Cached process > irods (ImmHet6564353)
[44/8b51c6] Cached process > irods (ImmHet6564351)
[3f/c8eb8d] Cached process > irods (ImmHet6564354)
[58/4b07df] Cached process > irods (ImmHet6564350)
[0f/6280d6] Submitted process > merge_sample_crams (ImmHet6564351)
[f2/0efce7] Submitted process > merge_sample_crams (ImmHet6564354)
[c4/f1e33b] Submitted process > merge_sample_crams (ImmHet6564350)
[ab/8431ab] Submitted process > merge_sample_crams (ImmHet6564353)
[a7/466d3b] Submitted process > merge_sample_crams (ImmHet6564352)
ERROR ~ Error executing process > 'merge_sample_crams (ImmHet6564351)'

Caused by:
  Process `merge_sample_crams (ImmHet6564351)` terminated with an error exit status (1)

Command executed:

  samtools merge -f ImmHet6564351.cram 20918_1#2.cram 20966_1#2.cram

Command exit status:
  1

Command output:
  (empty)

Work dir:
  /mnt/gluster/root/work/0f/6280d67a12ac7c1bd5a0c3de5671f0

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`

 -- Check '.nextflow.log' file for details
Execution cancelled -- Finishing pending tasks before exit

If I look in the work folder, there are no .command.err, .command.out or .command.log files:

ubuntu@vlad-k8s-test-k8s-master-nf-1:/mnt/gluster/pvc-319c8c17-3ca6-11e8-89b1-fa163e31bb09$ ls -la root/work/0f/6280d67a12ac7c1bd5a0c3de5671f0/.
total 15
drwxr-xr-x 2 root root 4096 May 14 23:31 .
drwxr-xr-x 3 root root 4096 May 14 23:30 ..
-rw-r--r-- 1 root root    0 May 14 23:31 .command.begin
-rw-r--r-- 1 root root 2205 May 14 23:30 .command.run
-rw-r--r-- 1 root root   93 May 14 23:30 .command.sh
-rw-r--r-- 1 root root 2652 May 14 23:30 .command.stub
-rw-r--r-- 1 root root    1 May 14 23:31 .exitcode

In the .nextflow.log it says this:

May-14 23:30:56.837 [Task submitter] INFO  nextflow.Session - [0f/6280d6] Submitted process > merge_sample_crams (ImmHet6564351)
May-14 23:30:56.873 [Task submitter] INFO  nextflow.Session - [f2/0efce7] Submitted process > merge_sample_crams (ImmHet6564354)
May-14 23:30:56.897 [Task submitter] INFO  nextflow.Session - [c4/f1e33b] Submitted process > merge_sample_crams (ImmHet6564350)
May-14 23:30:56.925 [Task submitter] INFO  nextflow.Session - [ab/8431ab] Submitted process > merge_sample_crams (ImmHet6564353)
May-14 23:30:56.959 [Task submitter] INFO  nextflow.Session - [a7/466d3b] Submitted process > merge_sample_crams (ImmHet6564352)
May-14 23:31:11.558 [Task monitor] DEBUG n.processor.TaskPollingMonitor - Task completed > TaskHandler[id: 7; name: merge_sample_crams (ImmHet6564351); status: COMPLETED; exit: 1; error: -; workDir: /mnt/gluster/root/work/0f/6280d67a12ac7c1bd5a0c3de5671f0]
May-14 23:31:11.567 [Task monitor] DEBUG nextflow.processor.TaskRun - Unable to dump output of process 'merge_sample_crams (ImmHet6564351)' -- Cause: java.nio.file.NoSuchFileException: /mnt/gluster/root/work/0f/6280d67a12ac7c1bd5a0c3de5671f0/.command.out
May-14 23:31:11.569 [Task monitor] DEBUG nextflow.processor.TaskRun - Unable to dump error of process 'merge_sample_crams (ImmHet6564351)' -- Cause: java.nio.file.NoSuchFileException: /mnt/gluster/root/work/0f/6280d67a12ac7c1bd5a0c3de5671f0/.command.err
May-14 23:31:11.571 [Task monitor] DEBUG nextflow.processor.TaskRun - Unable to dump error of process 'merge_sample_crams (ImmHet6564351)' -- Cause: java.nio.file.NoSuchFileException: /mnt/gluster/root/work/0f/6280d67a12ac7c1bd5a0c3de5671f0/.command.log
May-14 23:31:11.572 [Task monitor] ERROR nextflow.processor.TaskProcessor - Error executing process > 'merge_sample_crams (ImmHet6564351)'

It looks like NF cannot find its own files (NoSuchFileException) and cannot write to them, so it is really hard to understand the source of the error.
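
A quick way to confirm which task files are absent is to loop over the names that appear in the log and in the listing above. This is only a sketch: the function name `missing_task_files` is made up for illustration, and the list of file names is an assumption taken from this report, not a complete list of what Nextflow writes.

```shell
# Hypothetical helper: print every expected task file that is absent
# from the work directory given as the first argument.
# The file names are the ones seen in the log and listing in this issue.
missing_task_files() {
  for name in .command.sh .command.run .exitcode \
              .command.out .command.err .command.log; do
    # Print the name only when the file does not exist.
    [ -e "$1/$name" ] || printf '%s\n' "$name"
  done
}
```

Calling `missing_task_files` with the task work dir as seen from the current host prints one missing file name per line, and prints nothing if all six files are present.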

@wikiselev
Contributor Author

I am stuck at this step because I am unable to test further. If you are prioritising bug fixes, could you please start with this one among all my recently posted issues?

@pditommaso
Member

Can do little without further info. You need to check the status of the pod using kubectl logs <pod> and kubectl describe pod <id>.
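
For convenience, the two commands can be wrapped in a small helper. This is a sketch, not part of Nextflow: `pod_diagnostics` is a hypothetical name, and the `KUBECTL` variable is an assumption added so that a different binary can be substituted. `kubectl logs` and `kubectl describe pod` are the subcommands named above.

```shell
# Hypothetical helper: show the container output, then the pod status
# and events, for the pod named in the first argument.
# KUBECTL is an assumed override; it defaults to kubectl on the PATH.
pod_diagnostics() {
  pod="$1"
  echo "== logs: $pod =="
  "${KUBECTL:-kubectl}" logs "$pod"
  echo "== describe: $pod =="
  "${KUBECTL:-kubectl}" describe pod "$pod"
}
```

The function takes the pod name as its only argument and needs access to the cluster the task ran on.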

@wikiselev
Contributor Author

OK, cool, looks like kubectl logs <pod> is really useful! I forgot to edit some of the processes where I used LSF-specific stuff in the process definition:

beforeScript "set +u; source activate rnaseq${version}"
afterScript "set +u; source deactivate"

The logs showed what the problem was! Would it be good to duplicate these logs and the output of kubectl describe pod <id> into the NF logs, so that users don't need to dig for them?

root@vlad-k8s-test-k8s-master-nf-1:/mnt/gluster/pvc-319c8c17-3ca6-11e8-89b1-fa163e31bb09# kubectl logs nf-8240762e635f22bc3fbd5e6fe330441e
Could not find conda environment: rnaseq1.5
You can list all discoverable environments with `conda info --envs`.

@pditommaso
Member

In principle, that output should be in the .command.log file; not sure why it's missing. Need to check.

@wikiselev
Contributor Author

Yes, I can confirm that logs are not written for any of the processes, regardless of whether they succeeded or failed.

@pditommaso changed the title from "NoSuchFileException for .err, .out and .log files on K8s" to "K8s executor does not report command log when a task fail" on May 15, 2018
@pditommaso added this to the v0.30.0 milestone on May 15, 2018
@pditommaso
Member

pditommaso commented May 15, 2018

Found. The problem here is that the log file is not created properly.

@wikiselev
Contributor Author

Cool! Will check it tonight!

@pditommaso
Member

Not yet published, wait! ;)

@wikiselev
Contributor Author

I thought it's already in v0.30.0 ;-)
