Isoseq collapse filtering out criteria #664

MengjunWu · 2024-03-20T07:27:10Z

Hi,
I have some problems with isoseq collapse. While most of my reads (90%) are mapped, almost half of them are filtered out by isoseq collapse. How do you calculate coverage and identity? I use the mg tag for identity, and I compute per-read coverage as the number of matches and mismatches in the CIGAR string divided by the read length. With the same thresholds, far fewer reads are filtered out than with isoseq collapse. Is either coverage or identity calculated differently?
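For comparison, here is a minimal sketch of the coverage calculation described in the question. It assumes the CIGAR string is available as text and that the full read length (including clipped bases) is known separately. The function names are hypothetical. Under this definition, inserted bases (I) do not count toward the numerator.

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def parse_cigar(cigar: str) -> list[tuple[int, str]]:
    """Split a CIGAR string such as '5S10=1X3I20=' into (length, op) pairs."""
    if not re.fullmatch(r"(?:\d+[MIDNSHP=X])+", cigar):
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    return [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]

def coverage_matches_mismatches(cigar: str, read_length: int) -> float:
    """Coverage as described in the question: bases in M, '=' and 'X'
    operations (matches plus mismatches) divided by the read length.
    Inserted (I), clipped (S/H) and intron (N) bases are not counted."""
    aligned = sum(n for n, op in parse_cigar(cigar) if op in ("M", "=", "X"))
    return aligned / read_length

# A 39 bp read: 5 soft-clipped bases, 31 aligned (=/X) bases, 3 inserted bases.
print(coverage_matches_mismatches("5S10=1X3I20=", 39))
```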

Many thanks
Mengjun

armintoepfer · 2024-03-20T07:53:53Z

Assigning to @jmattick

jmattick · 2024-06-25T18:40:40Z

Hi @MengjunWu,
collapse filters reads based on the following criteria:

- The read is mapped.
- The read is a primary alignment.
- The read meets the minimum coverage: (aligned end - aligned start) / read length.
- The read meets the minimum identity: matches / (matches + mismatches + inserted bases + deleted bases).
- Optional: if using the single-cell workflow, the read must be marked as coming from a real cell via the rc tag.
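The coverage and identity definitions above can be sketched in Python from a read's CIGAR string. This is an illustration, not the IsoSeq source code. It assumes an extended CIGAR that uses = and X (a plain M cannot be split into matches and mismatches), and that clipping (S/H) only occurs at the read ends. The function name is hypothetical. Every inserted and deleted base counts individually in the identity denominator, while intron skips (N) are left out.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def collapse_style_metrics(cigar: str) -> tuple[float, float]:
    """Return (coverage, identity) using the definitions listed above.

    Assumes an extended CIGAR with '=' (match) and 'X' (mismatch); a plain
    'M' does not distinguish the two, so it is rejected.
    """
    if not re.fullmatch(r"(?:\d+[MIDNSHP=X])+", cigar):
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    if any(op == "M" for _, op in ops):
        raise ValueError("CIGAR uses 'M'; matches and mismatches cannot be separated")

    def total(*codes: str) -> int:
        # Sum the lengths of all operations with one of the given codes.
        return sum(n for n, op in ops if op in codes)

    matches, mismatches = total("="), total("X")
    inserted, deleted = total("I"), total("D")

    # Full read length includes soft- (S) and hard- (H) clipped bases.
    read_length = matches + mismatches + inserted + total("S", "H")
    # Aligned end - aligned start on the read: every read base between the
    # leading and trailing clips (matches, mismatches and insertions).
    aligned_span = matches + mismatches + inserted
    coverage = aligned_span / read_length if read_length else 0.0

    # Intron skips (N) are not part of the identity denominator.
    denom = matches + mismatches + inserted + deleted
    identity = matches / denom if denom else 0.0
    return coverage, identity

cov, ident = collapse_style_metrics("4S40=1X2I30=3D20=500N20=")
print(f"coverage={cov:.4f} identity={ident:.4f}")
```

For this example read, coverage is about 0.966 and identity about 0.948, so it would fall below both default thresholds (0.99 and 0.95).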

These minimum values can be changed using the following options.

Alignment Filter Options:
  --min-aln-coverage              FLOAT  Ignore alignments with less than minimum query read coverage. [0.99]
  --min-aln-identity              FLOAT  Ignore alignments with less than minimum alignment identity. [0.95]
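Putting the criteria and the two options together, a per-read filter might look like the sketch below. It is hypothetical, not the IsoSeq implementation. It assumes the coverage and identity values are already computed, that a value exactly equal to the minimum passes (the help text only says alignments with less than the minimum are ignored), and that the rc tag has already been decoded to a boolean for single-cell data.

```python
from typing import Optional

def passes_collapse_filters(
    is_mapped: bool,
    is_primary: bool,
    coverage: float,
    identity: float,
    min_aln_coverage: float = 0.99,  # default of --min-aln-coverage
    min_aln_identity: float = 0.95,  # default of --min-aln-identity
    is_real_cell: Optional[bool] = None,  # single-cell only (rc tag); None for bulk data
) -> bool:
    """Return True if a read would be kept under the criteria listed above."""
    if not (is_mapped and is_primary):
        return False
    # Reads with less than the minimum coverage or identity are ignored.
    if coverage < min_aln_coverage or identity < min_aln_identity:
        return False
    # In the single-cell workflow, only reads from real cells are kept.
    if is_real_cell is not None and not is_real_cell:
        return False
    return True

print(passes_collapse_filters(True, True, coverage=0.98, identity=0.97))
```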

armintoepfer added the IsoSeq label Mar 20, 2024

jmattick closed this as completed Jun 25, 2024
