Lineage assignment of 'none' when processing many samples #85

Closed
Mach-2 opened this issue Jun 11, 2021 · 13 comments

Comments

@Mach-2

Mach-2 commented Jun 11, 2021

Hello, I've been having a problem lately where the run_name_summary_qc.tsv file is failing to populate with lineage assignments. The lineage, lineage_notes, and scorpio_call columns all contain values of 'none' for all samples.

If I run fewer samples, the output seems fine. If there are ~100 or more samples being analyzed, the columns show the 'none' value. The lineage assignments in the lineage/lineage_report.csv seem to be fine regardless of how many samples I'm running ncov-tools on.

Any help with this would be appreciated! Currently I'm just doing a bit of a workaround by stealing the lineages from the lineage_report.csv and sticking them into the run_name_summary_qc.tsv file before I generate the pdf report, but I'd definitely prefer a cleaner fix.
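For anyone needing the same stopgap, the manual copy described above could be scripted roughly as follows. This is only a sketch under assumptions: the summary file is taken to have a `sample` column alongside the `lineage`, `lineage_notes`, and `scorpio_call` columns mentioned above, and the pangolin report is taken to have `taxon`, `lineage`, `scorpio_call`, and `note` columns. The helper names `find_report_row` and `patch_summary` are made up for illustration and are not part of ncov-tools.

```python
import csv


def find_report_row(sample, report_rows):
    """Return the pangolin row for a sample, or None if nothing matches."""
    by_taxon = {row["taxon"]: row for row in report_rows}
    if sample in by_taxon:
        return by_taxon[sample]
    # Assumption: a taxon may extend the sample name past a separator,
    # e.g. "S1/ARTIC/medaka" for "S1", but "S10" must not match "S1".
    for taxon, row in by_taxon.items():
        if taxon.startswith(sample) and not taxon[len(sample)].isalnum():
            return row
    return None


def patch_summary(summary_path, report_path, out_path):
    """Write a copy of the summary with 'none' lineages filled from the report."""
    with open(report_path, newline="") as handle:
        report_rows = list(csv.DictReader(handle))
    with open(summary_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        fieldnames = reader.fieldnames
        summary_rows = list(reader)

    patched = 0
    for row in summary_rows:
        if row["lineage"] != "none":
            continue
        match = find_report_row(row["sample"], report_rows)
        if match is None:
            continue
        row["lineage"] = match["lineage"]
        # Optional columns: copy only when the report has a non-empty value.
        for summary_col, report_col in (
            ("scorpio_call", "scorpio_call"),
            ("lineage_notes", "note"),
        ):
            if summary_col in row and match.get(report_col):
                row[summary_col] = match[report_col]
        patched += 1

    with open(out_path, "w", newline="") as handle:
        writer = csv.DictWriter(
            handle, fieldnames=fieldnames, delimiter="\t", lineterminator="\n"
        )
        writer.writeheader()
        writer.writerows(summary_rows)
    return patched
```

Writing to a separate output file rather than overwriting the summary in place keeps the original around for comparison.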

Thanks!
Madison

@rdeborja
Collaborator

rdeborja commented Jun 11, 2021 via email

@Mach-2
Author

Mach-2 commented Jun 11, 2021

Yep! ncov-tools is 1.7.1, and ncov-parser should be 0.6.7. We re-made the environment at 1:00pm today with mamba, straight from the environment.yml file.

@rdeborja
Collaborator

rdeborja commented Jun 12, 2021 via email

@DarianHole
Contributor

So this doesn't appear to be a sample number issue in itself, although the number of samples does seem to affect the final output for some reason. When I run ncov-tools on a small number of samples (tested on 7 samples), I have no problem with the final result and see the lineage call as expected.

I tried again on a slightly larger dataset, 17 samples including the original 7, and ran into the same issue with the lineages populating as none (my env was also made on Friday of last week, I believe).
image

However, I do get proper lineages populating the table when I change the config value of completeness_threshold from what we normally have it set to, which is 0.5, to the default of 0.75.

So I'd hazard a guess it's somewhere in the completeness_threshold parameter, but the fact that the number of samples plays a role is odd.

@rdeborja
Collaborator

@DarianHole just to confirm: the _lineage_report.csv file is currently populated with the expected lineage info, and the none values only occur in the _summary_qc.tsv file, correct?

@DarianHole
Contributor

Correct! Sorry that was important info I should have included!

The lineage report from pangolin is as expected; the summary_qc file has the none values, which then end up in the PDF output.

@rdeborja
Collaborator

@DarianHole @Mach-2 which platform (i.e. Illumina, Oxford-nanopore) were you processing when this issue occurred?

@DarianHole
Contributor

DarianHole commented Jun 18, 2021

I've seen it with both nanopore and the freebayes Illumina files as input. I could quickly check whether the ivar Illumina files lead to a similar output, as I'm sure I have a number of them I could find.

Exact input for the nanopore:

data_root: files
amplicon_bed: amplicon.bed
primer_bed: nCoV-2019.bed
bed_type: unique_amplicons
offset: 0
reference_genome: nCoV-2019.reference.fasta
bam_pattern: "{data_root}/{sample}.sorted.bam"
consensus_pattern: "{data_root}/{sample}.consensus.fasta"
variants_pattern: "{data_root}/{sample}.pass.vcf.gz"
metadata: metadata.tsv
assign_lineages: true
platform: oxford-nanopore
completeness_threshold: 0.50

Command is just the all command for me:

snakemake -s workflow/Snakefile all

@rdeborja
Collaborator

@DarianHole I was able to reproduce the issue with a Nanopore run. The problem is in ncov-parser's lineage file parser. The lack of a scorpio call caused the note field to be populated with "Assigned from designation hash". I'm not sure if or how the number of samples affects scorpio calls, but I have since fixed ncov-parser under branch issue_85. Do you mind giving it a test?
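To illustrate the general idea of a parser that does not depend on the shape of the note field, here is a minimal sketch. It is not the ncov-parser code and not the actual fix on the issue_85 branch; the function name `parse_lineage_report` and the column names `taxon`, `lineage`, `scorpio_call`, and `note` are assumptions made for the example. Each field is read on its own, and 'none' is used only when that particular field is empty.

```python
import csv
import io


def parse_lineage_report(handle):
    """Yield per-sample lineage info, treating empty optional fields as 'none'."""
    for row in csv.DictReader(handle):
        yield {
            "sample": row["taxon"],
            # The lineage is taken as-is, whatever the note field contains.
            "lineage": row["lineage"] or "none",
            "scorpio_call": row.get("scorpio_call") or "none",
            "lineage_notes": row.get("note") or "none",
        }


if __name__ == "__main__":
    # Made-up example row: no scorpio call, note from the designation hash.
    example = (
        "taxon,lineage,scorpio_call,note\n"
        "S1,B.1.1.7,,Assigned from designation hash.\n"
    )
    for record in parse_lineage_report(io.StringIO(example)):
        print(record)
```

With this layout, a missing scorpio call affects only the `scorpio_call` value and leaves the lineage untouched.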

@DarianHole
Contributor

Thanks for the quick work Richard! It seems like it may not have been a sample number issue, but instead a certain type of sample whose output was messing with the final output (so more samples makes it more likely).

I can test it right now and report back pretty quick as well

@DarianHole
Contributor

Yep @rdeborja I can confirm that the changes to the ncov-parser fix the issue for my dataset! Thanks for the quick fix and help as always!

@rdeborja
Collaborator

@DarianHole great, thanks for the quick turnaround. I've merged and created a new release (ncov-parser 0.6.8) on PyPI. @Mach-2 can you confirm this resolved your issue as well?
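A quick way to check whether an existing environment already has the fixed release is to compare the installed version against 0.6.8. The sketch below assumes the distribution is installed under the name `ncov-parser` and that its version is a plain dotted number such as 0.6.8; `parse_version` and `has_fix` are hypothetical helper names, not part of ncov-parser.

```python
from importlib.metadata import PackageNotFoundError, version

FIXED_VERSION = (0, 6, 8)  # release named in this thread


def parse_version(text):
    """Turn a plain dotted version such as '0.6.8' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def has_fix(installed):
    """Return True if the installed version is at or above the fixed release."""
    return parse_version(installed) >= FIXED_VERSION


if __name__ == "__main__":
    try:
        installed = version("ncov-parser")
    except PackageNotFoundError:
        print("ncov-parser is not installed in this environment")
    else:
        status = "includes" if has_fix(installed) else "predates"
        print(f"ncov-parser {installed} {status} the 0.6.8 fix")
```

Comparing integer tuples rather than strings avoids ranking 0.6.10 below 0.6.8.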

@Mach-2
Author

Mach-2 commented Jun 18, 2021

Thanks for the quick solution Richard (and thanks for helping test things, Darian)! Darian and I were describing the same issue, so I think it's fair to assume that if the fix worked for his test data set then things are good to go now.

Mach-2 closed this as completed Jun 18, 2021