Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Investigate pandora failing to map reads at the edge of a gene #316

Open
leoisl opened this issue Dec 20, 2022 · 3 comments
Open

Investigate pandora failing to map reads at the edge of a gene #316

leoisl opened this issue Dec 20, 2022 · 3 comments

Comments

@leoisl
Copy link
Collaborator

leoisl commented Dec 20, 2022

This is an example from drprg, Michael's data. Pandora discover and map output here: /hps/nobackup/iqbal/mbhall/drprg/paper/results/amr_predictions/drprg/illumina/PRJNA436454/SAMN08626350/SRR6824314.

We are looking at the pncA gene here. It is a gene that has lots of variants at the start (20+, complex structure on the left), and then gets simple. Most of the variants are alleles with ~500 bps, but there is one that is just 1 bp, i.e. a long deletion with respect to all other alleles. The sample should genotype towards this 1-bp allele.

image

In the PRG file (omitted here), this complex structure is described in the first site, which spans 10250 bps of the PRG. At position 10251 we are past this complex structure (i.e. we are at the red 29-bp node). Looking at the pandora SAM file (/hps/nobackup/iqbal/mbhall/drprg/paper/results/amr_predictions/drprg/illumina/PRJNA436454/SAMN08626350/SRR6824314/SRR6824314.filtered.fq.filtered.sam), we are able to map 379 reads, but all of the mappings start after position 10250, i.e. past the complex structure. Thus, we don't map any reads to the complex structure, and to the 1-bp deletion we should genotype to.

If we map the sample reads using minimap2 against the gene's denovo sequence (which contains the 1-bp deletion we should find), we are able to map 623 reads, 244 reads more. In the following IGV plot, we can see that the average depth in ~300x, and the first bp, that pandora could not map, has a count of 213 reads, all calling G, no indels or errors:

image

This is a very good example to improve the mapping on the edge of the genes. We have the SAM files to debug, but it would be even better to have the 4 other mapping debugging files too (we can easily rerun the command line looking at the @PG header line of the pandora SAM). Reads are illumina and they perfectly map to the edge of the gene

@mbhall88
Copy link
Member

Nice work as always @leoisl. FYI, I have moved that directory to /hps/nobackup/iqbal/mbhall/pandora_issue316/predict_without_debug as it will get overwritten next time I run drprg.

I have also copied the current drprg index (which contains PRG) to /hps/nobackup/iqbal/mbhall/pandora_issue316/index as this might also be overwritten soon.

Regarding getting the extra debugging files, I know you can just rerun pandora separately; as you say, you can just use the command in the header. But I have just rerun drprg with the --debug flag, which uses the debugging files flag in pandora, and you can find that in /hps/nobackup/iqbal/mbhall/pandora_issue316/predict_with_debug

If you want to reproduce any of this, it was done using this container docker://quay.io/mbhall88/drprg:c7d848e

With command

index=/hps/nobackup/iqbal/mbhall/pandora_issue316/index
reads=/hps/nobackup/iqbal/mbhall/drprg/paper/results/filtered/illumina/PRJNA436454/SAMN08626350/SRR6824314/SRR6824314.filtered.fq.gz
outdir=/hps/nobackup/iqbal/mbhall/pandora_issue316/predict_with_debug
sample=SRR6824314
uri="docker://quay.io/mbhall88/drprg:c7d848e"
cd /hps/nobackup/iqbal/mbhall/pandora_issue316
singularity exec -B /hps/scratch "$uri" drprg predict --sample $sample -v --ignore-synonymous -I -f 0.1 --max-gaps 0.3 -d 3 -b 0.01 -g 5 -K 0.51 -i "$reads" -o "$outdir" -x "$index" -t 2 --debug

Here is the log showing the commands that were run in case you want to reproduce any of them (this is using the pandora executable in the container, which is v0.10.0-alpha.0)

[2022-12-20T03:44:38Z DEBUG] Cli { verbose: true, threads: 2, cmd: Predict(Predict { pandora_exec: None, makeprg_exec: None, mafft_exec: None, index: "/hps/nobackup/iqbal/mbhall/pandora_issue316/index", input: "/hps/nobackup/iqbal/mbhall/drprg/paper/results/filtered/illumina/PRJNA436454/SAMN08626350/SRR6824314/SRR6824314.filtered.fq.gz", outdir: "/hps/nobackup/iqbal/mbhall/pandora_issue316/predict_with_debug", sample: Some("SRR6824314"), is_illumina: true, min_allele_freq: 0.1, max_gaps: 0.3, ignore_synonymous: true, filterer: Filterer { min_covg: 3, max_covg: 2147483647, min_strand_bias: 0.01, min_gt_conf: 5.0, max_indel: None, min_frs: 0.51 }, pandora_min_cluster_size: 10, debug: true }) }
[2022-12-20T03:44:38Z DEBUG] Index is valid
[2022-12-20T03:44:38Z INFO ] Discovering variants...
[2022-12-20T03:44:38Z DEBUG] Running pandora discover...
[2022-12-20T03:46:25Z DEBUG] Ran: /usr/bin/pandora discover -g 4411532 -v -o /hps/nobackup/iqbal/mbhall/pandora_issue316/predict_with_debug/discover -t 2 -w 14 -k 15 -c 10 -I -K /hps/nobackup/iqbal/mbhall/pandora_issue316/index/dr.prg /hps/nobackup/iqbal/mbhall/pandora_issue316/predict_with_debug/query.tsv
[2022-12-20T03:46:25Z DEBUG] Updating the PRGs with novel variants
[2022-12-20T03:46:25Z DEBUG] 8 genes to update with novel variants: ["pncA", "inhA", "ethA", "gid", "rpoB", "gyrA", "embA", "embB"]
[2022-12-20T03:46:25Z DEBUG] Updating MSA for pncA
[2022-12-20T03:46:28Z DEBUG] Ran: /usr/bin/mafft --auto --thread -1 --quiet --add /hps/nobackup/iqbal/mbhall/pandora_issue316/predict_with_debug/tmp_msas/pncA.consensus.fa /hps/nobackup/iqbal/mbhall/pandora_issue316/index/msas/pncA.fa
[2022-12-20T03:46:28Z DEBUG] Updating MSA for inhA
[2022-12-20T03:46:29Z DEBUG] Ran: /usr/bin/mafft --auto --thread -1 --quiet --add /hps/nobackup/iqbal/mbhall/pandora_issue316/predict_with_debug/tmp_msas/inhA.consensus.fa /hps/nobackup/iqbal/mbhall/pandora_issue316/index/msas/inhA.fa
[2022-12-20T03:46:29Z DEBUG] Updating MSA for ethA
[2022-12-20T03:46:29Z DEBUG] Ran: /usr/bin/mafft --auto --thread -1 --quiet --add /hps/nobackup/iqbal/mbhall/pandora_issue316/predict_with_debug/tmp_msas/ethA.consensus.fa /hps/nobackup/iqbal/mbhall/pandora_issue316/index/msas/ethA.fa
[2022-12-20T03:46:29Z DEBUG] Updating MSA for gid
[2022-12-20T03:46:31Z DEBUG] Ran: /usr/bin/mafft --auto --thread -1 --quiet --add /hps/nobackup/iqbal/mbhall/pandora_issue316/predict_with_debug/tmp_msas/gid.consensus.fa /hps/nobackup/iqbal/mbhall/pandora_issue316/index/msas/gid.fa
[2022-12-20T03:46:31Z DEBUG] Updating MSA for rpoB
[2022-12-20T03:46:32Z DEBUG] Ran: /usr/bin/mafft --auto --thread -1 --quiet --add /hps/nobackup/iqbal/mbhall/pandora_issue316/predict_with_debug/tmp_msas/rpoB.consensus.fa /hps/nobackup/iqbal/mbhall/pandora_issue316/index/msas/rpoB.fa
[2022-12-20T03:46:32Z DEBUG] Updating MSA for gyrA
[2022-12-20T03:46:33Z DEBUG] Ran: /usr/bin/mafft --auto --thread -1 --quiet --add /hps/nobackup/iqbal/mbhall/pandora_issue316/predict_with_debug/tmp_msas/gyrA.consensus.fa /hps/nobackup/iqbal/mbhall/pandora_issue316/index/msas/gyrA.fa
[2022-12-20T03:46:33Z DEBUG] Updating MSA for embA
[2022-12-20T03:46:34Z DEBUG] Ran: /usr/bin/mafft --auto --thread -1 --quiet --add /hps/nobackup/iqbal/mbhall/pandora_issue316/predict_with_debug/tmp_msas/embA.consensus.fa /hps/nobackup/iqbal/mbhall/pandora_issue316/index/msas/embA.fa
[2022-12-20T03:46:34Z DEBUG] Updating MSA for embB
[2022-12-20T03:46:34Z DEBUG] Ran: /usr/bin/mafft --auto --thread -1 --quiet --add /hps/nobackup/iqbal/mbhall/pandora_issue316/predict_with_debug/tmp_msas/embB.consensus.fa /hps/nobackup/iqbal/mbhall/pandora_issue316/index/msas/embB.fa
[2022-12-20T03:46:57Z DEBUG] Ran: /usr/bin/make_prg from_msa -t 2 -L 5 -N 5 -v --force -o updated -i /hps/nobackup/iqbal/mbhall/pandora_issue316/predict_with_debug/update_msas
[2022-12-20T03:46:57Z DEBUG] make_prg update successfully ran
[2022-12-20T03:46:57Z DEBUG] Indexing updated PRG with pandora index...
[2022-12-20T03:46:58Z DEBUG] Ran: /usr/bin/pandora index -t 2 -w 14 -k 15 /hps/nobackup/iqbal/mbhall/pandora_issue316/predict_with_debug/updated.dr.prg
[2022-12-20T03:46:58Z INFO ] Genotyping reads against the panel with pandora
[2022-12-20T03:48:17Z DEBUG] Ran: /usr/bin/pandora map --genotype --local -v -o /hps/nobackup/iqbal/mbhall/pandora_issue316/predict_with_debug -g 4411532 --vcf-refs /hps/nobackup/iqbal/mbhall/pandora_issue316/index/genes.fa -t 2 -w 14 -k 15 -c 10 -I /hps/nobackup/iqbal/mbhall/pandora_issue316/predict_with_debug/updated.dr.prg /hps/nobackup/iqbal/mbhall/drprg/paper/results/filtered/illumina/PRJNA436454/SAMN08626350/SRR6824314/SRR6824314.filtered.fq.gz
[2022-12-20T03:48:17Z INFO ] Successfully genotyped reads
[2022-12-20T03:48:17Z INFO ] Making predictions from variants...
[2022-12-20T03:48:18Z DEBUG] Predictions written to VCF "/hps/nobackup/iqbal/mbhall/pandora_issue316/predict_with_debug/SRR6824314.drprg.bcf"
[2022-12-20T03:48:18Z DEBUG] Loading the panel...
[2022-12-20T03:48:18Z DEBUG] Loaded panel
[2022-12-20T03:48:18Z INFO ] Prediction report written to JSON file "/hps/nobackup/iqbal/mbhall/pandora_issue316/predict_with_debug/SRR6824314.drprg.json"
[2022-12-20T03:48:18Z INFO ] Done!

Let me know if you want/need anything else

@leoisl
Copy link
Collaborator Author

leoisl commented Dec 20, 2022

Thanks for this Michael! I can see the debugging files are being created when running pandora discover, but it is also important to generate these files when running pandora map post discover, as this is when we would like to debug why we are not mapping to the deletion allele.

I created another dir alongside your predict_with_debug containing the minimal input to reproduce this issue: /hps/nobackup/iqbal/mbhall/pandora_issue316/map_with_debug_minimal_input. It has just the VCF ref for pncA, the update PRG for pncA, and the reads that minimap2 mapped to pncA. I copied the command line to run pandora map from your runs, you can see it in the command_line.sh. 0 reads mapped to the first complex pncA site, but this minimal input is better to run in my local in debug mode as it runs in 0.2 seconds. I will try to understand better why we have this bug, and how hard would be to solve it, but I am a bit on and off in these festive days!

@mbhall88
Copy link
Member

Oh my bad, I'll add the debug stuff to map as well for future. No rush, as you say, it's holiday season

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants