medaka was stopped with "truncated file" message #186

Closed
hyunjokoo opened this issue Aug 5, 2020 · 11 comments
@hyunjokoo

hyunjokoo commented Aug 5, 2020

Describe the bug
Medaka stops with a "truncated file" message at a random position in the input file.

Logging

Error example in Ubuntu 18.04 servers

(medaka) [nicem@ngs5:/home/nicem/hjk/gajuk]$ medaka_consensus -d gajuk_smartdenovo.dmo.cns.fasta -i PAE44315_porechop.fq -o medaka -t 60
Checking program versions
This is medaka 1.0.3
Program    Version    Required   Pass     
bcftools   1.10.2     1.9        True     
bgzip      1.10.2     1.9        True     
minimap2   2.17       2.11       True     
samtools   1.10       1.9        True     
tabix      1.10.2     1.9        True     
Aligning basecalls to draft
Removing previous index file /home/nicem/hjk/gajuk/gajuk_smartdenovo.dmo.cns.fasta.mmi
Removing previous index file /home/nicem/hjk/gajuk/gajuk_smartdenovo.dmo.cns.fasta.fai
Constructing minimap index.
[M::mm_idx_gen::32.185*1.81] collected minimizers
[M::mm_idx_gen::40.592*2.04] sorted minimizers
[M::main::54.645*1.69] loaded/built the index for 2358 target sequence(s)
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 2358
[M::mm_idx_stat::55.862*1.68] distinct minimizers: 59610097 (52.15% are singletons); average occurrences: 3.599; average spacing: 5.339
[M::main] Version: 2.17-r941
[M::main] CMD: minimap2 -I 16G -x map-ont --MD -d /home/nicem/hjk/gajuk/gajuk_smartdenovo.dmo.cns.fasta.mmi /home/nicem/hjk/gajuk/gajuk_smartdenovo.dmo.cns.fasta
[M::main] Real time: 56.053 sec; CPU: 93.920 sec; Peak RSS: 7.341 GB
[M::main::6.812*1.00] loaded/built the index for 2358 target sequence(s)
[M::mm_mapopt_update::8.154*1.00] mid_occ = 525
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 2358
[M::mm_idx_stat::9.250*1.00] distinct minimizers: 59610097 (52.15% are singletons); average occurrences: 3.599; average spacing: 5.339
[M::worker_pipeline::59.008*43.48] mapped 25020 sequences
[M::worker_pipeline::89.887*45.06] mapped 21745 sequences
[M::worker_pipeline::120.475*45.92] mapped 21873 sequences
   (Lots of "worker_pipeline" lines were removed)
[M::worker_pipeline::6023.380*18.30] mapped 22307 sequences
[M::worker_pipeline::6129.587*18.19] mapped 22515 sequences
[M::worker_pipeline::6234.410*18.10] mapped 22830 sequences
[M::worker_pipeline::6374.811*17.92] mapped 25941 sequences
[main_samview] truncated file.
double free or corruption (!prev)
[W::bgzf_read_block] EOF marker is absent. The input is probably truncated
samtools sort: truncated file. Aborting
Alignment pipeline failed.
Failed to run alignment of reads to draft.

In Ubuntu 16.04 server (error example 1 for the same process)

(medaka) [hjk@ngs4:/data_ngs4/hjk/data/Aa_nanopore_2020/smartdenovo]$ medaka_consensus -d gajuk_smartdenovo.dmo.cns.fasta -i ../PAE44315_porechop.fq -o medaka -t 40 -f
Checking program versions
This is medaka 1.0.3
Program    Version    Required   Pass     
bcftools   1.10.2     1.9        True     
bgzip      1.10.2     1.9        True     
minimap2   2.17       2.11       True     
samtools   1.10       1.9        True     
tabix      1.10.2     1.9        True     
Warning: Output will be overwritten (-f flag)
Aligning basecalls to draft
Removing previous index file /data_ngs4/hjk/data/Aa_nanopore_2020/smartdenovo/gajuk_smartdenovo.dmo.cns.fasta.mmi
Removing previous index file /data_ngs4/hjk/data/Aa_nanopore_2020/smartdenovo/gajuk_smartdenovo.dmo.cns.fasta.fai
Constructing minimap index.
[M::mm_idx_gen::54.627*1.95] collected minimizers
[M::mm_idx_gen::69.959*2.15] sorted minimizers
[M::main::79.916*2.00] loaded/built the index for 2358 target sequence(s)
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 2358
[M::mm_idx_stat::81.771*1.98] distinct minimizers: 59610097 (52.15% are singletons); average occurrences: 3.599; average spacing: 5.339
[M::main] Version: 2.17-r941
[M::main] CMD: minimap2 -I 16G -x map-ont --MD -d /data_ngs4/hjk/data/Aa_nanopore_2020/smartdenovo/gajuk_smartdenovo.dmo.cns.fasta.mmi /data_ngs4/hjk/data/Aa_nanopore_2020/smartdenovo/gajuk_smartdenovo.dmo.cns.fasta
[M::main] Real time: 81.969 sec; CPU: 162.168 sec; Peak RSS: 7.471 GB
[M::main::7.583*1.00] loaded/built the index for 2358 target sequence(s)
[M::mm_mapopt_update::10.032*1.00] mid_occ = 525
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 2358
[M::mm_idx_stat::11.351*1.00] distinct minimizers: 59610097 (52.15% are singletons); average occurrences: 3.599; average spacing: 5.339
[M::worker_pipeline::250.757*16.64] mapped 25020 sequences
[M::worker_pipeline::326.293*20.16] mapped 21745 sequences
[M::worker_pipeline::409.098*21.98] mapped 21873 sequences
   (Lots of "worker_pipeline" lines were removed)
[M::worker_pipeline::7838.325*27.29] mapped 22833 sequences
[M::worker_pipeline::7907.374*27.37] mapped 25297 sequences
[main_samview] truncated file.
*** Error in `samtools': free(): invalid next size (normal): 0x00007fc0381cbd20 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7fc0532b27e5]
/lib/x86_64-linux-gnu/libc.so.6(+0x8037a)[0x7fc0532bb37a]
/lib/x86_64-linux-gnu/libc.so.6(cfree+0x4c)[0x7fc0532bf53c]
/home/hjk/.conda/envs/medaka/bin/../lib/libhts.so.3(+0x47786)[0x7fc053cd5786]
/home/hjk/.conda/envs/medaka/bin/../lib/libhts.so.3(hts_close+0xb6)[0x7fc053cc2246]
samtools(+0x6527f)[0x55be9e3ce27f]
samtools(+0x12b32)[0x55be9e37bb32]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7fc05325b830]
samtools(+0xa0fa)[0x55be9e3730fa]
======= Memory map: ========
55be9e369000-55be9e372000 r--p 00000000 fc:02 1058078                    /home/hjk/.conda/envs/medaka/bin/samtools
55be9e372000-55be9e3cf000 r-xp 00009000 fc:02 1058078                    /home/hjk/.conda/envs/medaka/bin/samtools
55be9e3cf000-55be9e3e9000 r--p 00066000 fc:02 1058078                    /home/hjk/.conda/envs/medaka/bin/samtools
55be9e3e9000-55be9e3ed000 r--p 0007f000 fc:02 1058078                    /home/hjk/.conda/envs/medaka/bin/samtools
55be9e3ed000-55be9e3ee000 rw-p 00083000 fc:02 1058078                    /home/hjk/.conda/envs/medaka/bin/samtools
55be9f3f2000-55bea0b8e000 rw-p 00000000 00:00 0                          [heap]
7fbfb8000000-7fbfbff11000 rw-p 00000000 00:00 0 
7fbfbff11000-7fbfc0000000 ---p 00000000 00:00 0 
7fbfc0000000-7fbfc012e000 rw-p 00000000 00:00 0 
  (Lots of memory map lines were removed)
7fc053d4c000-7fc053d4e000 r--p 000bd000 fc:02 2884128                    /home/hjk/.conda/envs/medaka/lib/libhts.so.1.10.2
7fc053d4e000-7fc053d4f000 rw-p 000bf000 fc:02 2884128                    /home/hjk/.conda/envs/medaka/lib/libhts.so.1.10.2
7fc053d4f000-7fc053d50000 rw-p 00000000 00:00 0 
7fc053d50000-7fc053d51000 r--p 00025000 fc:00 272119                     /lib/x86_64-linux-gnu/ld-2.23.so
7fc053d51000-7fc053d52000 rw-p 00026000 fc:00 272119                     /lib/x86_64-linux-gnu/ld-2.23.so
7fc053d52000-7fc053d53000 rw-p 00000000 00:00 0 
7ffd7d294000-7ffd7d2b5000 rw-p 00000000 00:00 0                          [stack]
7ffd7d2ff000-7ffd7d302000 r--p 00000000 00:00 0                          [vvar]
7ffd7d302000-7ffd7d304000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]
[W::bgzf_read_block] EOF marker is absent. The input is probably truncated
[bam_sort_core] merging from 120 files and 1 in-memory blocks...
/home/hjk/.conda/envs/medaka/bin/mini_align: line 159: 76874 Broken pipe             minimap2 ${ALIGN_OPTS} -t ${THREADS} -a ${REFERENCE}.mmi ${INPUT} -A ${MATCH_SCORE} -B ${MISMATCH_SCORE} -O ${GAP_OPEN} -E ${GAP_EXTEND}
     76875 Aborted                 (core dumped) | samtools view -@ ${THREADS} -T ${REFERENCE} ${FILTER} -bS -
     76876 Done                    | samtools sort -@ ${THREADS} ${SORT} -l 9 -o ${PREFIX}.bam -
Alignment pipeline failed.
Failed to run alignment of reads to draft.

In Ubuntu 16.04 server (error example 2 for the same process)

  ( Front lines were removed)
[M::worker_pipeline::2785.749*27.47] mapped 22268 sequences
[M::worker_pipeline::2859.605*27.63] mapped 22415 sequences
[M::worker_pipeline::2947.122*27.65] mapped 22214 sequences
[main_samview] truncated file.
*** Error in `samtools': free(): invalid next size (normal): 0x00007fb964b23810 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7fb97d95b7e5]
/lib/x86_64-linux-gnu/libc.so.6(+0x8037a)[0x7fb97d96437a]
/lib/x86_64-linux-gnu/libc.so.6(cfree+0x4c)[0x7fb97d96853c]
/home/hjk/.conda/envs/medaka/bin/../lib/libhts.so.3(+0x47786)[0x7fb97e37e786]
/home/hjk/.conda/envs/medaka/bin/../lib/libhts.so.3(hts_close+0xb6)[0x7fb97e36b246]
samtools(+0x6527f)[0x55fb14a7127f]
samtools(+0x12b32)[0x55fb14a1eb32]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7fb97d904830]
samtools(+0xa0fa)[0x55fb14a160fa]
======= Memory map: ========
55fb14a0c000-55fb14a15000 r--p 00000000 fc:02 1058078                    /home/hjk/.conda/envs/medaka/bin/samtools
55fb14a15000-55fb14a72000 r-xp 00009000 fc:02 1058078                    /home/hjk/.conda/envs/medaka/bin/samtools
  (Lots of memory map lines were removed)
7fb97e3f5000-7fb97e3f7000 r--p 000bd000 fc:02 2884128                    /home/hjk/.conda/envs/medaka/lib/libhts.so.1.10.2
7fb97e3f7000-7fb97e3f8000 rw-p 000bf000 fc:02 2884128                    /home/hjk/.conda/envs/medaka/lib/libhts.so.1.10.2
7fb97e3f8000-7fb97e3f9000 rw-p 00000000 00:00 0 
7fb97e3f9000-7fb97e3fa000 r--p 00025000 fc:00 272119                     /lib/x86_64-linux-gnu/ld-2.23.so
7fb97e3fa000-7fb97e3fb000 rw-p 00026000 fc:00 272119                     /lib/x86_64-linux-gnu/ld-2.23.so
7fb97e3fb000-7fb97e3fc000 rw-p 00000000 00:00 0 
7fff86db9000-7fff86dda000 rw-p 00000000 00:00 0                          [stack]
7fff86de8000-7fff86deb000 r--p 00000000 00:00 0                          [vvar]
7fff86deb000-7fff86ded000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]
[W::bgzf_read_block] EOF marker is absent. The input is probably truncated
[bam_sort_core] merging from 40 files and 40 in-memory blocks...
/home/hjk/.conda/envs/medaka/bin/mini_align: line 159: 67428 Broken pipe             minimap2 ${ALIGN_OPTS} -t ${THREADS} -a ${REFERENCE}.mmi ${INPUT} -A ${MATCH_SCORE} -B ${MISMATCH_SCORE} -O ${GAP_OPEN} -E ${GAP_EXTEND}
     67429 Aborted                 (core dumped) | samtools view -@ ${THREADS} -T ${REFERENCE} ${FILTER} -bS -
     67430 Done                    | samtools sort -@ ${THREADS} ${SORT} -l 9 -o ${PREFIX}.bam -
Alignment pipeline failed.
Failed to run alignment of reads to draft.

In Ubuntu 16.04 server (error example 3 for the same process)

  ( Front lines were removed)
[M::worker_pipeline::3757.367*27.87] mapped 22229 sequences
[M::worker_pipeline::3860.228*27.75] mapped 22082 sequences
[M::worker_pipeline::3932.985*27.87] mapped 22297 sequences
[main_samview] truncated file.
[W::bgzf_read_block] EOF marker is absent. The input is probably truncated
[bam_sort_core] merging from 40 files and 40 in-memory blocks...
/home/hjk/.conda/envs/medaka/bin/mini_align: line 159:  1291 Broken pipe             minimap2 ${ALIGN_OPTS} -t ${THREADS} -a ${REFERENCE}.mmi ${INPUT} -A ${MATCH_SCORE} -B ${MISMATCH_SCORE} -O ${GAP_OPEN} -E ${GAP_EXTEND}
      1292 Aborted                 (core dumped) | samtools view -@ ${THREADS} -T ${REFERENCE} ${FILTER} -bS -
      1293 Done                    | samtools sort -@ ${THREADS} ${SORT} -l 9 -o ${PREFIX}.bam -
Alignment pipeline failed.
Failed to run alignment of reads to draft.

Environment (if you do not have a GPU, write No GPU):

  • Installation method conda
  • OS: Ubuntu 18.04.2 LTS, Ubuntu 18.04.3 LTS, Ubuntu 16.04.3 LTS
  • GPU model:
    VGA compatible controller: Matrox Electronics Systems Ltd. G200eR2 (Ubuntu 18)
    VGA compatible controller: Matrox Electronics Systems Ltd. MGA G200eW WPCM450 (rev 0a) (Ubuntu 16)
  • Nvidia driver version
  • CUDA version
  • cuDNN version

Additional context
When I ran medaka several times, the error occurred at a different point each time: the "M::worker_pipeline" stage stopped at a random round (the three Ubuntu 16.04 examples above are all from the same sample). For another sample, medaka succeeded on about the 10th attempt. I had no problems running medaka on several other samples on these servers. Could you please tell me why medaka fails for some samples and stops at a random place? I don't think it is an input file problem, because medaka eventually completed for a sample that had previously failed several times.

Thank you.
HJK
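
To back up the claim that the input file is not at fault, a quick structural check of the FASTQ can help. The following is a minimal sketch, assuming unwrapped four-line FASTQ records; `fastq_looks_complete` is a hypothetical helper name, not part of medaka.

```shell
#!/usr/bin/env bash
# Sketch only: structural sanity check for an uncompressed FASTQ file.
# Assumes four lines per record (header, sequence, "+", qualities) with no
# line wrapping. fastq_looks_complete is a hypothetical helper name.
fastq_looks_complete() {
    awk '
        # Header lines must start with "@", separator lines with "+".
        NR % 4 == 1 && substr($0, 1, 1) != "@" { bad = 1 }
        NR % 4 == 3 && substr($0, 1, 1) != "+" { bad = 1 }
        # Remember the sequence length and compare it to the quality length.
        NR % 4 == 2 { seqlen = length($0) }
        NR % 4 == 0 && length($0) != seqlen { bad = 1 }
        # An empty file or a partial final record also counts as a failure.
        END { if (bad || NR == 0 || NR % 4 != 0) exit 1 }
    ' "$1"
}
```

For example, `fastq_looks_complete PAE44315_porechop.fq && echo "FASTQ structure looks fine"`. A passing check does not prove the reads are valid, only that the file is not obviously cut off.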

@hyunjokoo hyunjokoo added the bug label Aug 5, 2020
@philres
Contributor

philres commented Aug 12, 2020

Hi,

we had the same problem and solved it by downgrading samtools from 1.10 to 1.9

Philipp
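
Before downgrading, it may be worth confirming which samtools release the environment actually resolves to. A minimal sketch, assuming the first line of `samtools --version` has the form `samtools 1.10`; both function names are hypothetical helpers, not part of samtools or medaka.

```shell
#!/usr/bin/env bash
# Sketch only: detect the samtools release that this thread associates with
# the "truncated file" crash. Both functions are hypothetical helpers.

# Extract the version from the text printed by `samtools --version`.
# Assumes the first line looks like "samtools 1.10".
samtools_version_from_banner() {
    printf '%s\n' "$1" | awk 'NR == 1 && $1 == "samtools" { print $2 }'
}

# Return 0 (true) only for release 1.10, the one reported as failing here.
is_affected_samtools() {
    [ "$1" = "1.10" ]
}
```

Usage might look like `v=$(samtools_version_from_banner "$(samtools --version)"); is_affected_samtools "$v" && echo "samtools $v: consider 1.9"`.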

@nextgenusfs

nextgenusfs commented Aug 14, 2020

Bump -- have same issue, sometimes medaka completes, other times errors out with the samtools error. Downgrading on conda to samtools-1.9 results in downgrading medaka.

The workaround was to downgrade medaka in conda and then install the latest version from PyPI.

@cjw85
Member

cjw85 commented Aug 17, 2020

If I recall correctly, the upgrade to samtools, bcftools, and htslib 1.10 was caused by something in bioconda packaging requiring the upgrade. Therefore, using the PyPI distribution is currently the solution. I don't remember ever seeing issues like this with samtools 1.9.

The issue here isn't with medaka per se: as the log above shows, at the point of error we are running minimap2 and piping the output to samtools:

minimap2 ${ALIGN_OPTS} -t ${THREADS} -a ${REFERENCE}.mmi ${INPUT} -A ${MATCH_SCORE} -B ${MISMATCH_SCORE} -O ${GAP_OPEN} -E ${GAP_EXTEND} |
  samtools view -@ ${THREADS} -T ${REFERENCE} ${FILTER} -bS - |
  samtools sort -@ ${THREADS} ${SORT} -l 9 -o ${PREFIX}.bam -

If anyone knows why this error is occurring...
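
When narrowing down a failure in a pipeline like the one above, it can help to record the exit status of every stage rather than only the last one. Below is a minimal bash sketch using the `PIPESTATUS` array; `run_pipeline_report` is a hypothetical helper that is not part of mini_align, and the three arguments are names of commands or functions standing in for the minimap2, samtools view, and samtools sort stages.

```shell
#!/usr/bin/env bash
# Sketch only (bash-specific): report which stage of a three-stage pipeline
# exited non-zero. run_pipeline_report is a hypothetical helper; each argument
# is the name of a command or shell function that takes no arguments.
run_pipeline_report() {
    "$1" | "$2" | "$3"
    # Copy PIPESTATUS right away; the next command would overwrite it.
    local -a status=("${PIPESTATUS[@]}")
    local i failed=0
    for i in 0 1 2; do
        if [ "${status[$i]}" -ne 0 ]; then
            echo "stage $((i + 1)) exited with status ${status[$i]}"
            failed=1
        fi
    done
    return "$failed"
}
```

In the logs in this thread, the equivalent information appears in the shell's job report: minimap2 ends with "Broken pipe", samtools view with "Aborted", and samtools sort with "Done", which points at the samtools view stage.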

@cjw85
Member

cjw85 commented Oct 7, 2020

I've left this issue to stew for a while. The only reports we've had of it happening occur with the conda package. I don't know if that indicates something up with the samtools conda package or simply that most of medaka's users are using conda.

I'm going to close this issue as it is not a bug in medaka per se. If there continue to be issues we will look to amend the conda package.

@cjw85 cjw85 closed this as completed Oct 7, 2020
@lbal-biomat

Hi,
I was able to reproduce the problem consistently when running mini_align with the -t flag set to any number > 1. I also found that if I leave out the -t flag and run with a single thread, the program finishes successfully.
But mapping with one thread was very slow, so I tried running the minimap2/samtools line on its own and found that the only thread option causing the issue is the -@ passed to samtools view. So while this doesn't work:

minimap2 ${ALIGN_OPTS} -t ${THREADS} -a ${REFERENCE}.mmi ${INPUT} \
  -A ${MATCH_SCORE} -B ${MISMATCH_SCORE} -O ${GAP_OPEN} -E ${GAP_EXTEND} |
  samtools view -@ ${THREADS} -T ${REFERENCE} ${FILTER} -bS - |
  samtools sort -@ ${THREADS} ${SORT} -l 9 -o ${PREFIX}.bam -

this one does work without any problem:

minimap2 ${ALIGN_OPTS} -t ${THREADS} -a ${REFERENCE}.mmi ${INPUT} \
  -A ${MATCH_SCORE} -B ${MISMATCH_SCORE} -O ${GAP_OPEN} -E ${GAP_EXTEND} |
  samtools view -T ${REFERENCE} ${FILTER} -bS - |
  samtools sort -@ ${THREADS} ${SORT} -l 9 -o ${PREFIX}.bam -

So I just edited mini_align and voila!
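
The edit described above could be scripted roughly as follows. This is a sketch under the assumption that the installed mini_align contains the `samtools view -@ ${THREADS}` text exactly as quoted in this thread; `strip_view_threads` is a hypothetical helper, and the installed script should be inspected before anything is overwritten.

```shell
#!/usr/bin/env bash
# Sketch only: remove the thread option from the `samtools view` stage.
# Assumes the script contains the literal text "samtools view -@ ${THREADS}"
# as quoted in this thread. strip_view_threads is a hypothetical helper that
# reads a script on stdin and writes the patched script to stdout.
strip_view_threads() {
    # Only lines that call `samtools view` are changed, so a `samtools sort`
    # stage on its own line keeps its -@ option. The first match on the line
    # is removed; \$ matches a literal dollar sign.
    sed '/samtools view/ s/ -@ \${THREADS}//'
}
```

For example, `strip_view_threads < "$(command -v mini_align)" > mini_align.patched`, followed by a `diff` against the original before replacing it.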

@RaverJay

I think this bug of samtools v1.10 was addressed here:
samtools/samtools#1293

So this is probably fixed in samtools v1.11 (and was working in v1.9), but:

conda create -n medaka_test medaka=1.2.1
grabs the bugged samtools v1.10, but builds the environment

conda create -n medaka_test medaka=1.2.1 samtools=1.11
conflicts:

Package samtools conflicts for:
samtools=1.11
medaka=1.2.1 -> samtools[version='>=1.9']

conda create -n medaka_test medaka=1.2.1 samtools=1.9
conflicts:

Package samtools conflicts for:
medaka=1.2.1 -> samtools[version='>=1.9']
samtools=1.9

Huh, I was under the impression that 1.9 >= 1.9, and 1.11 >= 1.9
Ideas?

@cjw85
Member

cjw85 commented Jan 12, 2021

I don't pretend to understand conda's package resolution; I've seen it complain about things like this before. This is probably one for the bioconda developers.

@RaverJay

Agreed, will post there.

Changing medaka's dependencies to avoid samtools 1.10 could be important though, so that the mini_align script stops failing randomly in the future.

@fbemm

fbemm commented Jan 12, 2021

+1, the samtools 1.10 bug still persists in the conda package build and thus also in the biocontainer.

@RaverJay

Think I found why medaka 1.2.1 and samtools 1.11 do not mix:

medaka deps:

medaka 1.2.1 py38hfcf0ad1_0
---------------------------
file name   : medaka-1.2.1-py38hfcf0ad1_0.tar.bz2
name        : medaka
version     : 1.2.1
build       : py38hfcf0ad1_0
build number: 0
size        : 37.9 MB
license     : Mozilla Public License 2.0
subdir      : linux-64
url         : https://conda.anaconda.org/bioconda/linux-64/medaka-1.2.1-py38hfcf0ad1_0.tar.bz2
md5         : 483317cb802995cf3e1193e4707a85ee
timestamp   : 2020-11-26 23:36:19 UTC
dependencies: 
  - bcftools >=1.9
  - biopython
  - bzip2 >=1.0.8,<2.0a0
  - cffi
  - h5py
  - htslib >=1.10.2,<1.11.0a0
  - intervaltree >=3.0.0
  - libgcc-ng >=7.5.0
  - mappy
  - minimap2 >=2.17
  - numpy
  - ont-fast5-api
  - parasail-python
  - pysam >=0.16.0.1
  - pyspoa
  - python >=3.8,<3.9.0a0
  - python-edlib
  - python_abi 3.8.* *_cp38
  - requests
  - samtools >=1.9
  - tensorflow >=2.2.0
  - whatshap >=0.18
  - xz >=5.2.5,<5.3.0a0
  - zlib >=1.2.11,<1.3.0a0

samtools deps:

samtools 1.11 h6270b1f_0
------------------------
file name   : samtools-1.11-h6270b1f_0.tar.bz2
name        : samtools
version     : 1.11
build       : h6270b1f_0
build number: 0
size        : 383 KB
license     : MIT
subdir      : linux-64
url         : https://conda.anaconda.org/bioconda/linux-64/samtools-1.11-h6270b1f_0.tar.bz2
md5         : 185503bd7a14eadbc088a301784b22d1
timestamp   : 2020-10-04 13:29:15 UTC
dependencies: 
  - htslib >=1.11,<1.12.0a0
  - libgcc-ng >=7.5.0
  - ncurses >=6.2,<6.3.0a0
  - zlib >=1.2.11,<1.3.0a0

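The htslib constraints conflict: medaka 1.2.1 requires htslib >=1.10.2,<1.11.0a0, while samtools 1.11 requires htslib >=1.11,<1.12.0a0, so no htslib version satisfies both.
This is probably what conda create found, but did not show in a meaningful way.
I found it using mamba, which gives much better details on what is actually conflicting.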
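
The conflict can be reproduced by hand by treating each pin as a half-open range and checking for an intersection. A minimal sketch follows; `ranges_overlap` is a hypothetical helper, it relies on `sort -V` from GNU coreutils, and it simplifies the pre-release bound 1.11.0a0 to 1.11, so it illustrates the idea rather than conda's actual solver.

```shell
#!/usr/bin/env bash
# Sketch only: check whether two half-open version ranges [lo, hi) overlap.
# ranges_overlap is a hypothetical helper; conda's solver does far more.
# Relies on `sort -V` (GNU coreutils) and expects plain dotted numeric
# versions, so pre-release bounds such as 1.11.0a0 are simplified to 1.11.

# Print the larger of two versions.
version_max() { printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1; }
# Print the smaller of two versions.
version_min() { printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1; }

# ranges_overlap LO1 HI1 LO2 HI2: exit 0 when [LO1,HI1) and [LO2,HI2) intersect.
ranges_overlap() {
    local lo hi
    lo=$(version_max "$1" "$3")
    hi=$(version_min "$2" "$4")
    # The upper bounds are exclusive, so an overlap needs lo strictly below hi.
    [ "$lo" != "$hi" ] && [ "$(version_min "$lo" "$hi")" = "$lo" ]
}
```

With the pins listed above, `ranges_overlap 1.10.2 1.11 1.11 1.12` returns non-zero, meaning no htslib version can satisfy both packages at once.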

@cjw85 Is this htslib <1.11.0a0 dependency necessary?

Cheers

@cjw85
Member

cjw85 commented Jan 19, 2021

@RaverJay

Apologies for not replying sooner; since this issue is closed I don't receive alerts.

The bioconda recipe doesn't set that constraint:
https://github.com/bioconda/bioconda-recipes/blob/master/recipes/medaka/meta.yaml

I suggest raising an issue (or making a PR) on bioconda as we have limited resources available to support the bioconda builds.

@nanoporetech nanoporetech locked as off-topic and limited conversation to collaborators Jan 19, 2021