Error with ReLERNN_SIMULATE #7
Hi there! Sorry you ran into a potential bug! Can you check whether a file named …
Hi Jeffrey, thanks for getting back to me so quickly! Here are the first 1000 lines of my VCF file.
Hmm, the VCF you sent me is missing the header. Are these the first ~1000 lines of the file you tried to run ReLERNN on? Higher up in your original error message, do you see …
Ah, my bad, sorry, I removed the header. Here are the first 1000 lines with the header!
No problem! Thanks for sending this; I'll post a resolution ASAP.
OK, I think I fixed the problem. The VCF you sent me now runs without issue on my end. Can you try pulling the changes/reinstalling and then give it another go? Please let me know if this doesn't resolve your issues.
Hi Jeffrey, great! I have a windows.sizes file now! However, some of my chromosomes now have a sample size of 44, while two chromosomes have 22 ...
Thanks again for your help with this! Let me know if you would like any more info/files etc.
Hmm... Well, n=44 is the correct number of haploid chromosomes (22 diploid samples), at least in the file you sent me. The two contigs showing n=22 are what I need to look at. I'm assuming these are hemizygous sex chromosomes? Can you check how these samples are encoded? Looking at the VCFv4.2 specification, I don't see any standard for encoding hemizygous chromosomes. I believe ReLERNN should be able to handle them as autosomes with missing data if they are encoded as (1/.), but I will need to double-check this. For now, the best option might be to remove them from the original VCF file and run them separately. Could you send me some lines from at least one of the chromosomes that is reporting n=22 in …
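One rough way to see how each contig is encoded is to sum the ploidy of every sample's GT call on the first record of each contig: with 22 diploid samples this gives 44, a contig written with haploid calls (e.g. `1`) gives 22, and a call written as (1/.) still counts as two. This is a minimal sketch, not part of ReLERNN; the helper name `count_haploid_chroms` is hypothetical, and it assumes an uncompressed VCF whose FORMAT column starts with GT.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of ReLERNN): for the first record of each
# contig, sum the number of alleles in every sample's GT call.
# Assumes an uncompressed VCF whose FORMAT column starts with GT.
count_haploid_chroms() {
  awk -F'\t' '
    /^#/ { next }                        # skip header lines
    !($1 in seen) {                      # first record of this contig only
      seen[$1] = 1
      n = 0
      for (i = 10; i <= NF; i++) {
        split($i, f, ":")                # GT is the first colon-separated field
        n += split(f[1], a, "[/|]")      # "0/1" and "1/." count 2, "1" counts 1
      }
      print $1 "\t" n                    # contig name, total allele count
    }' "$1"
}
```

For example, `count_haploid_chroms my.vcf` prints one line per contig in order of appearance; in bash, a bgzipped file can be passed as `count_haploid_chroms <(zcat my.vcf.gz)`.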
Just got your files! I'll post a resolution as soon as I have one. Thanks!
OK, I was able to run ReLERNN successfully on my end with the updated files I sent you. Please let me know if you are still getting errors. Thanks for your patience!
I can confirm that this issue is now fixed! Thanks @jradrion
Hi @jradrion, I appreciate your help, and many thanks in advance.
Hi @gavinmonahan, thank you for bringing this to my attention. I'll take a look and get back to you ASAP.
@gavinmonahan, I just took a quick look at your VCF, and it looks like there are five scaffolds (B01-B05) with a combined total of 618 polymorphic sites between them. If this is correct, I don't think ReLERNN is going to be much help. Are you sure the file you sent is correct? For example, scaffold B04 appears to have only 62 variants, yet its maximum coordinate implies a scaffold length of at least 41 Mb.
@jradrion, thanks for having a look. Yes, that's correct. Is that too few polymorphic sites for ReLERNN? There were originally 1354 sites, but I removed sites with more than 50% no-calls. Could either of those explain the error I got when running ReLERNN?
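Removing sites where more than half of the calls are missing, as described above, can be sketched with awk. This is a hypothetical helper, not part of ReLERNN, assuming an uncompressed VCF with GT as the first FORMAT field; a genotype counts as missing only if all of its alleles are `.`, so a partial call such as `1/.` is kept. bcftools users can get a comparable result with a filter expression on `F_MISSING`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print header lines plus every record in which the
# fraction of samples with a fully missing GT ("./.", ".|." or ".") is at
# most the threshold given as $2 (default 0.5). Assumes GT is the first
# FORMAT field.
filter_missing_sites() {
  awk -F'\t' -v max="${2:-0.5}" '
    /^#/ { print; next }                 # keep all header lines
    {
      miss = 0
      ns = NF - 9                        # number of sample columns
      for (i = 10; i <= NF; i++) {
        split($i, f, ":")
        gt = f[1]
        gsub("[/|.]", "", gt)            # drop separators and missing markers
        if (gt == "") miss++             # nothing left means every allele missing
      }
      if (ns > 0 && miss / ns <= max) print
    }' "$1"
}
```

For example, `filter_missing_sites in.vcf 0.5 > filtered.vcf` writes the filtered records to a new file.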
@gavinmonahan, it's not strictly too few polymorphic sites, but we suggest tempering any conclusions drawn from predictions in genomic windows with fewer than 200 sites, which in this case would be every window in your genome. I think the error you were getting comes from a real bug, though I suspect it is directly triggered by running ReLERNN on so few sites. I'm trying to pin this down now.
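Because a window cannot hold more sites than the contig it sits on, a per-contig site count gives a quick upper bound for checking the 200-site guideline. A minimal awk sketch, assuming an uncompressed VCF; the helper name `sites_per_contig` is hypothetical, and it also reports the largest POS on each contig, the quantity used above to infer a scaffold length of at least 41 Mb.

```shell
#!/usr/bin/env bash
# Hypothetical helper: for each contig (in order of first appearance) print
# the number of variant records, the largest POS seen, and "LOW" if the
# record count is below the threshold given as $2 (default 200).
sites_per_contig() {
  awk -F'\t' -v min="${2:-200}" '
    /^#/ { next }                        # skip header lines
    !($1 in n) { order[++k] = $1 }       # remember contig order
    {
      n[$1]++
      if ($2 + 0 > maxpos[$1] + 0) maxpos[$1] = $2 + 0
    }
    END {
      for (i = 1; i <= k; i++) {
        c = order[i]
        print c "\t" n[c] "\t" maxpos[c] "\t" (n[c] < min ? "LOW" : "ok")
      }
    }' "$1"
}
```

Running `sites_per_contig my.vcf` uses the default threshold of 200; any contig flagged `LOW` cannot contain a window that meets the guideline.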
@gavinmonahan, the errors are, at least in part, due to how you created your VCF. I've noticed at least two problems that cause errors when scikit-allel attempts to parse your file. We use scikit-allel for all VCF parsing, so you'll have to make sure your file conforms to its requirements.
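The two problems are not spelled out here, so the following is only a generic structural check, not a reproduction of them: data lines with a different number of tab-separated fields than the `#CHROM` header line are one kind of error that can make a VCF unparseable. The helper name `check_vcf_columns` is hypothetical, and passing it does not guarantee that scikit-allel will read the file.

```shell
#!/usr/bin/env bash
# Hypothetical pre-check (not part of ReLERNN or scikit-allel): report data
# lines whose tab-separated field count differs from the #CHROM header line,
# and exit with status 1 if any are found.
check_vcf_columns() {
  awk -F'\t' '
    /^##/ { next }                       # skip meta-information lines
    /^#CHROM/ { ncol = NF; next }        # expected column count
    NF != ncol {
      printf "line %d: %d fields, expected %d\n", NR, NF, ncol
      bad++
    }
    END { exit (bad > 0) }' "$1"
}
```

The definitive test is still whether `allel.read_vcf` from scikit-allel loads the file without error.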
I'm going to close this issue for now, but I'm happy to reopen it if you still get errors with a VCF that scikit-allel can parse.
@jradrion I'm glad it's just a problem with my VCF! Thanks again for looking into this and finding the errors. I will update my VCFs, and hopefully that will solve the issue.
No problem, @gavinmonahan. Hopefully this will get the program running without errors. Unfortunately, it won't change the fact that predictions on genomic windows with such extremely low SNP density are going to be unreliable.
Hello, I think I may be having a similar issue. I’ve removed all the hemizygous/haploid chromosomes from my VCF, and my windowSizes file now only lists chromosomes with sample size 6. However, I am getting the following error: I’m including the information I used for this run (the first 50k lines of the VCF). Is this also a problem with how scikit-allel reads the VCF? Any help would be appreciated!
Hi!
I'm really excited about using ReLERNN to estimate recombination in some natural data with a low-ish sample size (n = 22), and also to have a go with some pool-seq data.
I just tried to run it on my natural data, and I get the following error message when it reads the HDF5 files:
My VCF file is pretty standard, although it has some missing data, and I'm running ReLERNN like this:
```shell
ReLERNN_SIMULATE -v paria_marianne_1027798.final.vcf -g STAR.extents.bed -m STAR.chromosomes.release.repeats.bed -d ReLERNN/ -u 4.8e-8 --unphased
```
I checked the VCF files generated in the first step of the script, and they all have the same number of samples:

```shell
for i in *vcf; do bcftools query -l "$i" | wc -l; done | sort | uniq
```

```
22
```