Batch file contains wrong filename for read 1 #114

@jma1991

Describe the issue
The batch file produced by kb count (kb_python 0.25.1) uses the read 2 filename for both the read 1 and read 2 columns.

What is the exact command that was run?

kb count -i index.idx -g t2g.txt -x SMARTSEQ -o output -t 2 fastq/ERR1830349_1.fastq.gz fastq/ERR1830349_2.fastq.gz

Command output (with --verbose flag)

[2021-04-06 15:37:22,088]   DEBUG Printing verbose output
[2021-04-06 15:37:22,088]   DEBUG kallisto binary located at /opt/miniconda3/envs/kb-python/lib/python3.8/site-packages/kb_python/bins/darwin/kallisto/kallisto
[2021-04-06 15:37:22,088]   DEBUG bustools binary located at /opt/miniconda3/envs/kb-python/lib/python3.8/site-packages/kb_python/bins/darwin/bustools/bustools
[2021-04-06 15:37:22,089]   DEBUG Creating output/tmp directory
[2021-04-06 15:37:22,089]   DEBUG Namespace(c1=None, c2=None, cellranger=False, command='count', dry_run=False, fastqs=['fastq/ERR1830349_1.fastq.gz', 'fastq/ERR1830349_2.fastq.gz'], filter=None, g='t2g.txt', h5ad=False, i='index.idx', keep_tmp=False, lamanno=False, list=False, loom=False, m='4G', mm=False, no_inspect=False, no_validate=False, nucleus=False, o='output', overwrite=False, report=False, t=2, tcc=False, tmp=None, verbose=True, w=None, workflow='standard', x='SMARTSEQ')
[2021-04-06 15:37:22,089]    INFO Found the following FASTQs:
[2021-04-06 15:37:22,089]    INFO         0    fastq/ERR1830349_2.fastq.gz  fastq/ERR1830349_2.fastq.gz
[2021-04-06 15:37:22,089]    INFO Writing batch definition TSV to output/batch.txt
[2021-04-06 15:37:22,090]    INFO Using index index.idx to generate matrices to output
[2021-04-06 15:37:22,090]   DEBUG kallisto pseudo --quant -i index.idx -o output -b output/batch.txt -t 2
[2021-04-06 15:38:07,160]   DEBUG
[2021-04-06 15:38:07,160]   DEBUG [quant] fragment length distribution will be estimated from the data
[2021-04-06 15:38:07,160]   DEBUG [index] k-mer length: 31
[2021-04-06 15:38:07,160]   DEBUG [index] number of targets: 140,725
[2021-04-06 15:38:07,160]   DEBUG [index] number of k-mers: 120,554,425
[2021-04-06 15:38:07,161]   DEBUG [index] number of equivalence classes: 509,144
[2021-04-06 15:38:07,161]   DEBUG [quant] running in paired-end mode
[2021-04-06 15:38:07,161]   DEBUG [quant] will process pair 1: fastq/ERR1830349_2.fastq.gz
[2021-04-06 15:38:07,161]   DEBUG fastq/ERR1830349_2.fastq.gz
[2021-04-06 15:38:07,161]   DEBUG [quant] finding pseudoalignments for all files ... done
[2021-04-06 15:38:07,161]   DEBUG [quant] processed 1,905,288 reads, 1,479,192 reads pseudoaligned
[2021-04-06 15:38:07,161]   DEBUG [quant] Running EM algorithm for each cell .. done
[2021-04-06 15:38:07,161]   DEBUG
[2021-04-06 15:38:07,211]   DEBUG output/matrix.abundance.mtx passed validation
[2021-04-06 15:38:07,756]   DEBUG Removing output/tmp directory
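
The duplicated filename in the log above could also be confirmed directly from output/batch.txt. The following is a minimal sketch, not part of kb_python: it assumes the batch file is tab-separated with the cell ID in the first column followed by the read 1 and read 2 paths, and the function name find_duplicated_pairs is hypothetical.

```python
import csv
from typing import List


def find_duplicated_pairs(batch_path: str) -> List[str]:
    """Return the IDs of batch rows whose read 1 and read 2 paths are identical.

    Assumes a tab-separated layout of: cell ID, read 1 path, read 2 path.
    """
    duplicated = []
    with open(batch_path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            # Skip blank lines and comment/header lines starting with '#'.
            if not row or row[0].startswith("#"):
                continue
            # A paired-end row with the same file in both columns is suspect.
            if len(row) >= 3 and row[1] == row[2]:
                duplicated.append(row[0])
    return duplicated
```

Calling find_duplicated_pairs("output/batch.txt") on the run above would be expected to report the single cell ID if the file matches the assumed layout.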

Possible cause
I think this problem might be caused by the following line of code:

cells[cell_id] = (fastq_2, fastq_2)
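
For illustration, here is a minimal sketch of what the pairing presumably should do. This is not the actual kb_python code: the function name pair_fastqs and the assumption that FASTQs arrive as a flat list of alternating read 1 and read 2 files are mine. The only change relative to the line quoted above is storing (fastq_1, fastq_2) instead of (fastq_2, fastq_2).

```python
from typing import Dict, List, Tuple


def pair_fastqs(fastqs: List[str]) -> Dict[int, Tuple[str, str]]:
    """Group a flat [R1, R2, R1, R2, ...] list into per-cell (read 1, read 2) pairs."""
    if len(fastqs) % 2 != 0:
        raise ValueError("expected an even number of FASTQs for paired-end input")

    cells: Dict[int, Tuple[str, str]] = {}
    for cell_id, start in enumerate(range(0, len(fastqs), 2)):
        fastq_1, fastq_2 = fastqs[start], fastqs[start + 1]
        # The reported line stores (fastq_2, fastq_2); each pair should
        # keep read 1 in the first slot and read 2 in the second.
        cells[cell_id] = (fastq_1, fastq_2)
    return cells
```

With the two files from the command above, this would map cell 0 to fastq/ERR1830349_1.fastq.gz and fastq/ERR1830349_2.fastq.gz, so the batch file would list a different file in each read column.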
