ValueError: Sample NR112 at Scaffold_1:1 genotype ./././. does not match explected ploidy of 2 #90

wangjie07070910 · 2023-01-29T04:28:46Z

Thank you very much for these fantastic scripts. Some of the samples in my VCF file are tetraploid. Does the script not work with them?
I tried to use the script for processing VCF files. Here is my command:
python VCF_processing/parseVCF.py -i input.vcf.gz --skipIndels --minQual 30 --gtf flag=DP min=5 max=50 -o output.geno.gz

simonhmartin · 2023-01-29T07:00:29Z

Hi,
You can specify the ploidy in a file (first column sample ID, second column ploidy). Add the option --ploidyFile ploidy_file.txt

If you get errors, please post the error here so I can help diagnose it.

Simon

wangjie07070910 · 2023-01-30T08:59:04Z

Hi Simon,

Many thanks for your help, but I still get an error: parseVCF.py: error: argument --ploidy: invalid int value: 'ploidy_female.txt'. I presume that my ploidy.txt file is not in the right format. Its contents are as follows:
sample ID ploidy
Sample_1 2
Sample_2 4
Sample_3 4

Thanks again,
Jie

simonhmartin · 2023-01-30T09:44:13Z

If all of your individuals are tetraploid, you can use
--ploidy 4

If some of your individuals are diploid and some are tetraploid, use:
--ploidyFile ploidy_file.txt

wangjie07070910 · 2023-01-30T10:08:30Z

Thanks again.

I used --ploidyFile ploidy_file.txt. The contents of my ploidy.txt file are as follows:

sample_ID ploidy
Sample_1 2
Sample_2 4
Sample_3 4

Then I got this error:
ValueError: invalid literal for int() with base 10: 'ploidy'

Also, I tried the ploidy_file.txt file without the table header:

Sample_1 2
Sample_2 4
Sample_3 4

Then I got this error:
IndexError: list index out of range

simonhmartin · 2023-01-31T19:52:44Z

Please check your ploidy file for empty lines. It sounds like the script is trying to read a line in the file that has no data in it.
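For anyone hitting the same errors, a quick pre-flight check of the ploidy file can help. The sketch below is a standalone helper and not part of parseVCF.py; the function name check_ploidy_file is made up for illustration, and it assumes only the format described above (first column sample ID, second column an integer ploidy, separated by whitespace). It reports header lines, empty lines, and lines with the wrong number of columns.

```python
import io

def check_ploidy_file(lines):
    """Return (line_number, problem) pairs for lines that are not 'sample_ID <integer>'."""
    problems = []
    for n, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            problems.append((n, "empty line"))
        elif len(fields) != 2:
            problems.append((n, f"expected 2 columns, found {len(fields)}"))
        else:
            try:
                int(fields[1])
            except ValueError:
                problems.append((n, f"ploidy {fields[1]!r} is not an integer"))
    return problems

# Example: a file with a text header and a stray blank line.
example = io.StringIO("sample_ID ploidy\nSample_1 2\nSample_2 4\n\nSample_3 4\n")
for n, problem in check_ploidy_file(example):
    print(f"line {n}: {problem}")
```

To check a real file, pass an open file handle instead of the StringIO object, for example open("ploidy_file.txt").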

wangjie07070910 · 2023-02-01T01:18:03Z

Thanks again. When I changed 'ploidy' in the second column of the first row of my ploidy.txt file to a number, as shown below, it worked:

sample_ID 2
Sample_1 2
Sample_2 4
Sample_3 4

I don't know whether that first row has any effect. Besides, I have a new problem.

Error:Sample Sample_2 at Scaffold_1:1 genotype ./././. does not match explected ploidy of 2 (appears when I set Sample_2 to 2x)
Error:Sample Sample_2 at Scaffold_2:25 genotype ./. does not match explected ploidy of 4 (appears when I set Sample_2 to 4x)

I know this is probably a problem with my sample (it is supposed to be tetraploid), but I am posting it here and would appreciate your opinion. How should I preprocess a sample like this?

simonhmartin · 2023-02-01T12:47:30Z

Yes, this is a problem with your VCF, which includes incorrect formatting for some sites. You can add the option --ploidyMismatchToMissing to set these sites to missing data.
In general, please remember that you can run parseVCF.py -h to see all the available options.
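To see how many sites are affected before converting them to missing data, the genotypes can be scanned directly. The following is a rough standalone sketch, not code from parseVCF.py; the function name find_ploidy_mismatches is hypothetical, and it assumes a tab-delimited VCF with a #CHROM header line and a GT entry in the FORMAT column. It counts the alleles in each genotype (split on / or |) and reports samples whose count differs from the expected ploidy.

```python
import re

def find_ploidy_mismatches(vcf_lines, ploidy):
    """Yield (sample, chrom, pos, genotype) where the allele count differs from ploidy[sample]."""
    samples = []
    for line in vcf_lines:
        if line.startswith("##"):
            continue  # meta-information lines
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the 9 fixed columns
            continue
        if len(fields) < 10:
            continue  # no genotype columns on this line
        fmt = fields[8].split(":")
        if "GT" not in fmt:
            continue
        gt_index = fmt.index("GT")
        for sample, entry in zip(samples, fields[9:]):
            gt = entry.split(":")[gt_index]
            n_alleles = len(re.split(r"[/|]", gt))
            if sample in ploidy and n_alleles != ploidy[sample]:
                yield sample, fields[0], fields[1], gt

# Toy data mirroring the two genotypes reported in this thread.
vcf = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSample_1\tSample_2\n",
    "Scaffold_1\t1\t.\tA\tG\t50\t.\t.\tGT:DP\t0/1:10\t./././.:0\n",
    "Scaffold_2\t25\t.\tC\tT\t50\t.\t.\tGT:DP\t0/0:12\t./.:0\n",
]
ploidy = {"Sample_1": 2, "Sample_2": 4}
mismatches = list(find_ploidy_mismatches(vcf, ploidy))
for sample, chrom, pos, gt in mismatches:
    print(f"{sample} at {chrom}:{pos} has genotype {gt}, expected ploidy {ploidy[sample]}")
```

For a real compressed file, the lines can be read with gzip.open("input.vcf.gz", "rt") from the standard library and passed in place of the toy list.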

simonhmartin closed this as completed May 13, 2023
