Error when using physical map sorted based on physical position and ID #26

Closed
vguerracanedo opened this issue Nov 7, 2017 · 20 comments

@vguerracanedo

I'm running into a silly problem. My physical map file has repeated physical positions with unique IDs. The data is sorted by physical position and then by ID. An example is at the end of this message.

When I try to compute iHS, I get the following error:

ERROR: Variant physical position must be strictly increasing.
rs201044430 216605 comes after rs112068709 216605

My data is already sorted so that 'rs201044430 216605' comes after 'rs112068709 216605', so I'm not sure what to do differently.

Best,
Vanessa


Sample file

7 rs28527214 216426 216426
7 rs66644650 216512 216512
7 rs148463803 216515 216515
7 rs28485819 216569 216569
7 rs28498692 216570 216570
7 rs112068709 216605 216605
7 rs201044430 216605 216605
7 rs188651719 216660 216660
7 rs193275413 216662 216662
7 rs137869704 216672 216672
7 rs139968177 216735 216735

@szpiech
Owner

szpiech commented Nov 9, 2017

Hi Vanessa,

Sorry for the delay in getting back to you. At the moment selscan can only handle biallelic variants, and so when multiple variants are reported at the same physical position it will throw this error, although I can see how it can be a confusing message. I think your best bet would be to filter these two sites from your dataset. I hope this helps!

-Zach
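
The filtering suggested above can be sketched in Python. This is a minimal illustration, not part of selscan: the function name `drop_duplicate_positions` is hypothetical, and it assumes a whitespace-separated map with the columns chromosome, ID, genetic position, physical position, as in the sample file. It removes every site that shares a physical position with another site on the same chromosome.

```python
from collections import Counter

def drop_duplicate_positions(map_lines):
    """Split map rows into kept lines and the IDs of removed sites.

    Assumed columns: chromosome, ID, genetic position, physical position.
    """
    rows = [line.split() for line in map_lines if line.strip()]
    # Count how many rows share each (chromosome, physical position) pair.
    counts = Counter((row[0], row[3]) for row in rows)
    kept, removed_ids = [], []
    for row in rows:
        if counts[(row[0], row[3])] > 1:
            removed_ids.append(row[1])
        else:
            kept.append(" ".join(row))
    return kept, removed_ids

sample = [
    "7 rs28498692 216570 216570",
    "7 rs112068709 216605 216605",
    "7 rs201044430 216605 216605",
    "7 rs188651719 216660 216660",
]
kept, removed_ids = drop_duplicate_positions(sample)
print(removed_ids)  # ['rs112068709', 'rs201044430']
```

The returned IDs would presumably also have to be removed from the haplotype or VCF input so that it stays in step with the map file.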

@vguerracanedo
Author

Hi Zach,
Thank you for responding. As you noted, removing the entries with repeated physical positions did the trick.
Best,
V

@vs4223

vs4223 commented Mar 31, 2018

Hi Zach,

My problem is along the same lines as Vanessa's so I am posting in the same thread.

I am trying to use selscan to calculate EHH scores. I have 1000 Genomes VCF files, which I used to produce map files with vcftools. The map files look like this:

22 rs8142737 50291889 50291889
22 rs570182536 50291936 50291936
22 rs8135816 50291976 50291976
22 rs8140681 50292081 50292081
22 rs9627785 50292178 50292178
22 rs9616779 50292545 50292545
22 rs9616780 50292763 50292763
22 rs139397353 50292931 50292931
22 rs9616364 50292983 50292983
22 rs12159367 50293281 50293281
22 rs7290342 50294176 50294176
22 rs141187212 50294325 50294325
22 rs6520063 50294378 50294378
22 rs6520064 50294469 50294469

My selscan command is:

selscan --ehh 50292931 --vcf chr22.vcf.gz --map plink22.map --maf 0.0001 --out test.txt

This produces the following error:

ERROR: Variant physical position must be strictly increasing.
    -- -9999 comes after    -- -9999

Now I have tried to identify the problem row by trying to grep for "-9999" but get nothing. I have also tried to sort on the physical position column but get the same error. There are no blank rows at the start or end of the file.

To ensure there was no issue with my map file, I tried using different chromosomes but keep getting this error.

By the way, I have also tried using hapbin with the same files using the following command:

ehhbin --locus 50292931 --hap out.impute.hap --map <(awk '{$3=$4;print}' plink22.map)

But I always get an error:

no locus with the id: 50292931

I have checked and the locus is definitely within the .map and the .hap files (which were created from the vcf files). Therefore I think the problem must be within my map files but I cannot fathom what the issue is.
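
One way to rule the map file's ordering in or out is to scan it directly. The sketch below is an independent check, not selscan's own logic: `first_order_violation` is a hypothetical helper, and it assumes the four-column layout shown above (chromosome, ID, genetic position, physical position). It does not explain where the -9999 values in the error come from.

```python
def first_order_violation(map_lines, column=3, strict=True):
    """Return (line_number, previous_id, current_id) for the first row that
    breaks increasing order in the chosen column, or None if all rows pass.

    column=3 is the physical position, column=2 the genetic position
    (zero-based, assuming columns: chromosome, ID, genetic, physical).
    """
    prev = None  # (id, value) of the last non-blank row
    for lineno, line in enumerate(map_lines, start=1):
        fields = line.split()
        if not fields:
            continue
        value = float(fields[column])
        if prev is not None:
            out_of_order = value <= prev[1] if strict else value < prev[1]
            if out_of_order:
                return lineno, prev[0], fields[1]
        prev = (fields[1], value)
    return None

rows = [
    "22 rs8142737 50291889 50291889",
    "22 rs570182536 50291936 50291936",
    "22 rs8135816 50291976 50291976",
]
print(first_order_violation(rows))  # None
```

To check a file on disk, pass the open file object, for example `first_order_violation(open("plink22.map"))`.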

@szpiech
Owner

szpiech commented Apr 2, 2018

So my first thought is that you should request the site by rsid and not position. Please try

selscan --ehh rs139397353 --vcf chr22.vcf.gz --map plink22.map --maf 0.0001 --out test.txt

and see if that works. Admittedly, the error message you got doesn't seem terribly useful; I'll have to make it more informative. Please let me know if this, at least, solves your problem.

@vs4223

vs4223 commented Apr 3, 2018

Hi Zach,

Many thanks for getting back to me. Unfortunately this does not solve the issue; I still get the exact same error. Also, wouldn't using IDs be a problem for de novo variants that have not been assigned an ID?

@szpiech
Owner

szpiech commented Apr 11, 2018

Sorry for the delay in getting back to you.

Yes, I think that I should modify the lookup scheme to allow for rsid or genomic position. I typically assign variants without an rsid a temporary id based on the chromosome and position, but I forget this isn't what everyone does.
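
The temporary-ID idea can be sketched as follows. The comment above does not give the exact ID format, so the `chromosome:position` pattern and the helper name `with_temporary_id` are assumptions for illustration, as are the alleles in the example record.

```python
def with_temporary_id(vcf_line):
    """Give a VCF record with a missing ID ('.') an ID built from its
    chromosome and position. Header lines are returned unchanged.
    """
    line = vcf_line.rstrip("\n")
    if line.startswith("#"):
        return line
    fields = line.split("\t")
    if fields[2] == ".":
        # Assumed format: CHROM:POS, e.g. 22:50292931
        fields[2] = f"{fields[0]}:{fields[1]}"
    return "\t".join(fields)

# Hypothetical record: the position appears in the thread, the alleles do not.
record = "22\t50292931\t.\tG\tA\t.\tPASS\t.\tGT\t0|1"
print(with_temporary_id(record).split("\t")[2])  # 22:50292931
```

Records that already carry an ID pass through untouched, so the function can be applied to every line of a file.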

Are you using a publicly accessible vcf file? I would like to try to reproduce this problem.

@szpiech
Owner

szpiech commented Jun 1, 2020

Duplicate physical map positions are now allowed, and statistics that are integrated over a map can use physical positions directly with --pmap.

@szpiech closed this as completed on Jun 1, 2020

@TimothyCiesielski

Hi Zach,

I am new to Selscan and I am having a similar issue. I have been able to get nSL output for one chromosome but when I attempt to run whole genomes, I get this error:

ERROR: Variant physical position must be monotonically increasing.
2:10610:G:A 10610 appears after 1:248945650:C:G 248945650

example code:
selscan --nsl --vcf nameofVCFfile.vcf --out selscannSLresults

It looks like --pmap is not available for nSL . . . any thoughts?

Thanks in advance for your help (and for making Selscan user friendly),
Tim

😃
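
The error above shows a chromosome 2 record (position 10610) following a chromosome 1 record (position 248945650), which is what one would expect when positions restart at a chromosome boundary inside a single file. The maintainer's reply is not preserved in this thread, so the following is only a sketch under the assumption that selscan should be run on one chromosome at a time; `split_vcf_by_chromosome` is a hypothetical helper for uncompressed VCF text.

```python
from collections import defaultdict

def split_vcf_by_chromosome(vcf_lines):
    """Group VCF lines by the CHROM column, repeating the header for each group.

    Returns a dict mapping chromosome name to a list of lines (header first).
    """
    header, records = [], defaultdict(list)
    for line in vcf_lines:
        if line.startswith("#"):
            header.append(line)
        elif line.strip():
            chrom = line.split("\t", 1)[0]
            records[chrom].append(line)
    return {chrom: header + lines for chrom, lines in records.items()}

# Tiny made-up example with two chromosomes.
vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID",
    "1\t100\ta",
    "2\t50\tb",
    "1\t200\tc",
]
per_chrom = split_vcf_by_chromosome(vcf)
print(sorted(per_chrom))  # ['1', '2']
```

Each value can then be written to its own file and passed to selscan separately. For large or compressed files a dedicated VCF tool is likely the better choice.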

@szpiech
Owner

szpiech commented Jun 14, 2023 via email

@TimothyCiesielski

Thanks Zach - I appreciate the help on this.
Tim

@malteze2024

Hello! Please tell me how to solve the problem with the sheep genome map file.
If the map file is sorted by genetic position, the program generates a physical position error and vice versa. Of the 26 chromosomes, only 12 are processed without errors.
The initial map file was generated through GenomeStudio.

for reg in $(seq 1 26) ; do selscan --xpehh --vcf phasedRMMrenchr$reg.vcf.gz --vcf-ref phasedDMrenchr$reg.vcf.gz --map MAP_sorted$reg.map --threads 12 --out 2xpEhhcheap$reg; done
selscan v2.0.0
Opening phasedRMMrenchr1.vcf.gz...
Loading 108 haplotypes and 61259 loci...
Opening phasedDMrenchr1.vcf.gz...
Loading 106 haplotypes and 61259 loci...
Opening MAP_sorted1.map...
Loading map data for 61259 loci
ERROR: Variant genetic position must be monotonically increasing.
oar3_OAR1_101700644 122.639 appears after oar3_OAR1_101688882 122.64

for reg in $(seq 1 26) ; do selscan --xpehh --vcf phasedRMMrenchr$reg.vcf.gz --vcf-ref phasedDMrenchr$reg.vcf.gz --map 2MAP_sorted$reg.map --threads 12 --out 2xpEhhcheap$reg; done
selscan v2.0.0
Opening phasedRMMrenchr1.vcf.gz...
Loading 108 haplotypes and 61259 loci...
Opening phasedDMrenchr1.vcf.gz...
Loading 106 haplotypes and 61259 loci...
Opening MAP_sorted1.map...
Loading map data for 61259 loci
ERROR: Variant physical position must be monotonically increasing.
OAR19_64803054.1 204694 appears after DU281551_498.1 315497

With best regards, Lesya
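
If sorting by one position column always breaks the other, the two orderings disagree somewhere in the file. A minimal sketch for locating those places, assuming the columns are chromosome, ID, genetic position, physical position; `genetic_inversions` is a hypothetical helper and the rows below are made up for illustration.

```python
def genetic_inversions(map_lines):
    """Sort rows by physical position, then return (earlier_id, later_id)
    pairs of neighbours whose genetic position decreases.
    """
    rows = []
    for line in map_lines:
        fields = line.split()
        if fields:
            rows.append((fields[1], float(fields[2]), float(fields[3])))
    rows.sort(key=lambda row: row[2])  # order by physical position
    return [
        (a[0], b[0])
        for a, b in zip(rows, rows[1:])
        if b[1] < a[1]
    ]

# Made-up rows: snpB lies after snpA physically but before it genetically.
rows = [
    "1 snpA 1.0 100",
    "1 snpB 0.9 200",
    "1 snpC 1.5 300",
]
print(genetic_inversions(rows))  # [('snpA', 'snpB')]
```

An empty result means the genetic positions are non-decreasing once the file is in physical order. Ties are not flagged here.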

@szpiech
Owner

szpiech commented Mar 26, 2024 via email

@malteze2024

Thank you for such a quick response!
Apparently, I will still have to use the --pmap option.
Does using a physical map have a big impact on my results? There are 600k SNPs in my file.

@szpiech
Owner

szpiech commented Mar 27, 2024 via email

@drsancho

hello sir

I am working on the cow genome and need to run selscan for iHH12. I have phased the 29 chromosomes into a single VCF file. I am running the command
selscan --ihh12 --vcf xyzz.vcf --map abc.map --out final

The error it shows is "variant physical position must be monotonically increasing". I am just starting my studies in bioinformatics. Can you guide me on how to get past it?

thank you

sanchit

@drsancho

selscan problem

I have also tried sorting with the command below, but it gives a similar error:

sort -nk 4 xyz.map > xyz1.map

@szpiech
Copy link
Owner

szpiech commented Apr 22, 2024 via email

@drsancho

okay sir

@drsancho

It is showing the same error, that variant genetic position should be monotonically increasing.
Can you please help me further?

@szpiech
Owner

szpiech commented Apr 24, 2024 via email
