Lineage assignment of 'none' when processing many samples #85

Mach-2 · 2021-06-11T21:54:54Z

Hello, I've been having a problem lately where the run_name_summary_qc.tsv file is failing to populate with lineage assignments. The lineage, lineage_notes, and scorpio_call columns all contain values of 'none' for all samples.

If I run fewer samples, the output seems fine. If there are ~100 or more samples being analyzed, the columns show the 'none' value. The lineage assignments in the lineage/lineage_report.csv seem to be fine regardless of how many samples I'm running ncov-tools on.

Any help with this would be appreciated! Currently I'm just doing a bit of a workaround by stealing the lineages from the lineage_report.csv and sticking them into the run_name_summary_qc.tsv file before I generate the pdf report, but I'd definitely prefer a cleaner fix.

Thanks!
Madison
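
For anyone who needs the same stopgap, here is a minimal sketch of the manual workaround described above: copying the lineage fields from lineage_report.csv into the summary QC table. The column names are assumptions, not confirmed by this thread: `taxon` and `note` on the pangolin report side, and `sample` on the summary side. The function names are hypothetical, and the sketch assumes the `taxon` value matches the `sample` value exactly.

```python
import csv


def fill_lineages(summary_rows, report_rows):
    """Copy lineage fields from pangolin report rows into summary rows.

    Assumed (hypothetical) column names: 'taxon' and 'note' in the report,
    'sample' in the summary. Rows without a matching taxon are left untouched.
    """
    by_taxon = {row["taxon"]: row for row in report_rows}
    for row in summary_rows:
        hit = by_taxon.get(row["sample"])
        if hit is None:
            continue
        row["lineage"] = hit["lineage"]
        row["lineage_notes"] = hit["note"]
        row["scorpio_call"] = hit["scorpio_call"]
    return summary_rows


def patch_summary(summary_path, report_path, out_path):
    """Read both files, patch the summary rows, and write a new TSV."""
    with open(report_path, newline="") as fh:
        report_rows = list(csv.DictReader(fh))
    with open(summary_path, newline="") as fh:
        reader = csv.DictReader(fh, delimiter="\t")
        fields = reader.fieldnames
        summary_rows = list(reader)
    fill_lineages(summary_rows, report_rows)
    with open(out_path, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=fields, delimiter="\t")
        writer.writeheader()
        writer.writerows(summary_rows)
```

Writing to a separate output path keeps the original summary file intact in case the column assumptions turn out to be wrong.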

rdeborja · 2021-06-11T22:03:34Z

Can you tell me the version of ncov-tools and ncov-parser being used?

Mach-2 · 2021-06-11T22:25:07Z

Yep! ncov-tools is 1.7.1, and ncov-parser should be 0.6.7. We re-made the environment at 1:00pm today with mamba, straight from the environment.yml file.

rdeborja · 2021-06-12T01:05:55Z

I just re-created my environment and made sure to match yours with the latest version. I ran it on 158 samples and all three columns (i.e. lineage, lineage_notes, scorpio_call) correctly showed the values in the lineage_report.csv file. I ran it with usher and pangolearn as the pangolin inference engine and both populated the fields in the _summary_qc.tsv file correctly. Were there any errors in the snakemake log file by chance or did everything complete successfully?

DarianHole · 2021-06-17T18:32:52Z

So this doesn't appear to be a sample number issue in itself, although the number of samples does seem to affect the final output for some reason. When I run ncov-tools on a small number of samples (tested on 7), I have no problem with the final result and see the lineage calls as expected.

I tried again on a slightly larger dataset, 17 samples including the original 7, and ran into the same issue with the lineages populating as none. (My env was made on Friday of last week as well, I believe.)

However, I do get proper lineages populating the table when I change the config value of completeness_threshold from 0.5, which is what we normally set it to, to the default of 0.75.

So I'd hazard a guess it's somewhere in the completeness_threshold parameter, but the fact that the number of samples plays a role is odd.

rdeborja · 2021-06-17T18:59:12Z

@DarianHole just to confirm: the _lineage_report.csv file is populated with the expected lineage info, and the none values only occur in the _summary_qc.tsv file, correct?

DarianHole · 2021-06-17T19:12:05Z

Correct! Sorry that was important info I should have included!

The lineage report from pangolin is as expected; the summary_qc file has the none values, which then end up in the pdf output.

rdeborja · 2021-06-18T12:48:19Z

@DarianHole @Mach-2 which platform (e.g. Illumina or Oxford Nanopore) were you processing when this issue occurred?

DarianHole · 2021-06-18T13:14:37Z

I've seen it with both nanopore and the freebayes Illumina files as input. I could quickly check whether the ivar Illumina files lead to a similar output, though, as I'm sure I have a number of them that I could easily find.

Exact input for the nanopore:

data_root: files
amplicon_bed: amplicon.bed
primer_bed: nCoV-2019.bed
bed_type: unique_amplicons
offset: 0
reference_genome: nCoV-2019.reference.fasta
bam_pattern: "{data_root}/{sample}.sorted.bam"
consensus_pattern: "{data_root}/{sample}.consensus.fasta"
variants_pattern: "{data_root}/{sample}.pass.vcf.gz"
metadata: metadata.tsv
assign_lineages: true
platform: oxford-nanopore
completeness_threshold: 0.50

Command is just the all command for me:

snakemake -s workflow/Snakefile all

rdeborja · 2021-06-18T13:37:59Z

@DarianHole I was able to reproduce the issue with a Nanopore run. The problem is with ncov-parser and the lineage file parser. The lack of a scorpio call caused the note field to be populated with "Assigned from designation hash". I'm not sure if or how the number of samples affects scorpio calls, but I have since fixed ncov-parser under branch issue_85. Do you mind giving it a test?
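
As an illustration only (this is not the actual ncov-parser code or the fix on the issue_85 branch), one way to make a lineage-report reader tolerant of a missing scorpio call is to look fields up by header name and fall back to 'none' per field. The function name `read_lineage_report` and the column names `taxon`, `lineage`, `scorpio_call`, and `note` are assumptions made for this sketch.

```python
import csv
import io


def read_lineage_report(handle):
    """Map each taxon to its lineage fields, reading columns by header name.

    An empty or absent value falls back to 'none' for that field only, so a
    missing scorpio call does not affect the lineage or the note.
    """
    result = {}
    for row in csv.DictReader(handle):
        result[row["taxon"]] = {
            "lineage": row.get("lineage") or "none",
            "lineage_notes": row.get("note") or "none",
            "scorpio_call": row.get("scorpio_call") or "none",
        }
    return result


# Hypothetical two-line report: one sample with an empty scorpio_call.
example = io.StringIO(
    "taxon,lineage,scorpio_call,note\n"
    "sampleA,B.1.1.7,,Assigned from designation hash.\n"
)
parsed = read_lineage_report(example)
```

Looking fields up by name rather than by position means an empty cell in one column cannot shift or blank out the others.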

DarianHole · 2021-06-18T13:46:35Z

Thanks for the quick work Richard! It seems like it may not have been a sample number issue but instead a certain type of sample whose output was messing with the final output (so more samples makes it more likely).

I can test it right now and report back pretty quick as well

DarianHole · 2021-06-18T13:58:46Z

Yep @rdeborja I can confirm that the changes to the ncov-parser fix the issue for my dataset! Thanks for the quick fix and help as always!

rdeborja · 2021-06-18T14:17:18Z

@DarianHole great, thanks for the quick turnaround. I've merged and created a new release (ncov-parser 0.6.8) on PyPI. @Mach-2 can you confirm this resolved your issue as well?
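
To check whether an existing environment already has the fixed release, a small sketch like the following may help. It assumes the distribution is registered under the name "ncov-parser", that its version string is a plain X.Y.Z, and that Python 3.8 or later is available for `importlib.metadata`. The helper names are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

# First release containing the fix, per this thread.
FIXED_IN = (0, 6, 8)


def parse_version(text):
    """Turn 'X.Y.Z' into a tuple of ints (assumes purely numeric parts)."""
    return tuple(int(part) for part in text.split(".")[:3])


def has_fix(text):
    """Return True if the given version string is at or past the fixed release."""
    return parse_version(text) >= FIXED_IN


def installed_ncov_parser():
    """Return the installed ncov-parser version string, or None if not installed."""
    try:
        return version("ncov-parser")
    except PackageNotFoundError:
        return None
```

Calling `has_fix(installed_ncov_parser())` is only meaningful when the package is installed; a `None` result from `installed_ncov_parser()` should be handled first.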

Mach-2 · 2021-06-18T14:25:49Z

Thanks for the quick solution Richard (and thanks for helping test things, Darian)! Darian and I were describing the same issue, so I think it's fair to assume that if the fix worked for his test data set then things are good to go now.

Mach-2 closed this as completed Jun 18, 2021
