
ValueError: Sample NR112 at Scaffold_1:1 genotype ./././. does not match explected ploidy of 2 #90

Closed
wangjie07070910 opened this issue Jan 29, 2023 · 7 comments

Comments

@wangjie07070910

Thank you very much for these fantastic scripts. Some of the samples in my VCF file are tetraploid. Is that why the script is not working?
I tried to use the script for processing VCF files with this command:
python VCF_processing/parseVCF.py -i input.vcf.gz --skipIndels --minQual 30 --gtf flag=DP min=5 max=50 -o output.geno.gz

@simonhmartin
Owner

Hi,
You can specify the ploidy in a file (first column sample ID, second column ploidy). Add the option --ploidyFile ploidy_file.txt
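
As an illustration of the two-column layout described above, here is a minimal sketch that builds the file contents from a mapping of sample ID to ploidy. The function name `format_ploidy_table` is hypothetical and not part of parseVCF.py. It assumes columns are separated by a space and that every line is read as a sample row (so no header row), which the errors reported later in this thread suggest but do not confirm.

```python
def format_ploidy_table(ploidies):
    """Return two-column text: sample ID, then ploidy, one sample per line."""
    lines = []
    for sample, ploidy in ploidies.items():
        if not isinstance(ploidy, int) or ploidy < 1:
            raise ValueError(f"ploidy for {sample} must be a positive integer")
        # Assumption: a single space between columns, as in the files shown in this thread.
        lines.append(f"{sample} {ploidy}")
    # No header row and no blank lines; one newline terminates the last row.
    return "\n".join(lines) + "\n"


# Sample names taken from the example file in this thread.
table = format_ploidy_table({"Sample_1": 2, "Sample_2": 4, "Sample_3": 4})
print(table, end="")
```

The returned text can be written to ploidy_file.txt and passed with --ploidyFile.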

If you get errors, please post the error here so I can help diagnose it.

Simon

@wangjie07070910
Author

Hi Simon,

Many thanks for your help, but I still get an error: parseVCF.py: error: argument --ploidy: invalid int value: 'ploidy_female.txt'. I presume that my ploidy.txt file is not in the right format. Its contents are as follows:
sample ID ploidy
Sample_1 2
Sample_2 4
Sample_3 4

Thanks again,
Jie

@simonhmartin
Owner

If all of your individuals are tetraploid, you can use
--ploidy 4

If some of your individuals are diploid and some are tetraploid, use:
--ploidyFile ploidy_file.txt
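
The choice between the two options can be sketched as a small helper. This is only an illustration of the rule above: `ploidy_args` is a hypothetical name, and the only flags it uses are the two mentioned in this thread (--ploidy and --ploidyFile).

```python
def ploidy_args(ploidies, ploidy_file="ploidy_file.txt"):
    """Pick the command-line arguments for a mapping of sample ID -> ploidy."""
    distinct = set(ploidies.values())
    if not distinct:
        raise ValueError("no samples given")
    if len(distinct) == 1:
        # Every sample has the same ploidy: a single integer is enough.
        return ["--ploidy", str(distinct.pop())]
    # Mixed ploidies: point the script at the per-sample file instead.
    return ["--ploidyFile", ploidy_file]


print(ploidy_args({"Sample_2": 4, "Sample_3": 4}))
print(ploidy_args({"Sample_1": 2, "Sample_2": 4, "Sample_3": 4}))
```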

@wangjie07070910
Author

Thanks again.

I used --ploidyFile ploidy_file.txt, and the contents of my ploidy.txt file are as follows:

sample_ID ploidy
Sample_1 2
Sample_2 4
Sample_3 4

Then I got this error:
ValueError: invalid literal for int() with base 10: 'ploidy'

Also, I tried the ploidy_file.txt file without the table header:

Sample_1 2
Sample_2 4
Sample_3 4

Then I got this error:
IndexError: list index out of range

@simonhmartin
Owner

Please check your ploidy file for empty lines. It sounds like the script is trying to read a line in the file that has no data in it.
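
A quick way to check a ploidy file for this kind of problem is sketched below. It is not taken from parseVCF.py: `find_ploidy_file_problems` is a hypothetical helper, and it assumes each line is split on whitespace into a sample ID and an integer ploidy. Under that assumption, a blank line would explain the IndexError and a header row would explain the int() error reported above.

```python
def find_ploidy_file_problems(text):
    """Return (line_number, message) pairs for lines that would not parse."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            problems.append((number, "blank line"))
        elif len(fields) < 2:
            problems.append((number, "missing ploidy column"))
        else:
            try:
                int(fields[1])
            except ValueError:
                problems.append((number, f"ploidy {fields[1]!r} is not an integer"))
    return problems


# Example with a header row and an empty line in the middle.
example = "sample_ID ploidy\nSample_1 2\n\nSample_3 4\n"
for number, message in find_ploidy_file_problems(example):
    print(f"line {number}: {message}")
```

To check a real file, read it with open("ploidy_file.txt").read() and pass the text to the function.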

@wangjie07070910
Author

Thanks again. When I changed 'ploidy' in the second column of the first row to a number, so that my ploidy.txt file looked like this, it worked:

sample_ID 2
Sample_1 2
Sample_2 4
Sample_3 4

I don't know whether this change has any side effect. I have also run into a new problem.

Error:Sample Sample_2 at Scaffold_1:1 genotype ./././. does not match explected ploidy of 2 (appears when I set Sample_2 to be a 2x.)
Error:Sample Sample_2 at Scaffold_2:25 genotype ./. does not match explected ploidy of 4 (appears when I set Sample_2 to be a 4x.)

I know this is probably a problem with my sample (it is supposed to be tetraploid), but I'm posting it here and would appreciate your opinion. How should I preprocess a sample like this?

@simonhmartin
Owner

Yes, this is a problem with your vcf, which includes incorrect formatting for some sites. You can add the option --ploidyMismatchToMissing to set these sites to missing data.
In general, please remember that you can type parseVCF.py -h to see all the available options.
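
To make the idea of setting mismatched sites to missing concrete, here is a minimal sketch. It only illustrates the concept and is not the script's implementation: `mismatch_to_missing` is a hypothetical function, and representing a missing genotype as one "." per expected allele is an assumption (the thread does not show how parseVCF.py encodes missing data in its output).

```python
import re


def mismatch_to_missing(genotype, ploidy):
    """Return the genotype unchanged, or an all-missing one if its allele count is wrong."""
    # VCF genotypes separate alleles with "/" (unphased) or "|" (phased).
    alleles = re.split(r"[/|]", genotype)
    if len(alleles) != ploidy:
        # Assumption: missing is written as one "." per expected allele.
        return "/".join(["."] * ploidy)
    return genotype


# The two cases reported in this thread for Sample_2.
print(mismatch_to_missing("./././.", 2))
print(mismatch_to_missing("./.", 4))
```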
