Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merging in LD Map.. results in error #8

Closed
batelz opened this issue Aug 1, 2023 · 3 comments
Closed

Merging in LD Map.. results in error #8

batelz opened this issue Aug 1, 2023 · 3 comments

Comments

@batelz
Copy link

batelz commented Aug 1, 2023

Hi again.
This issue can be merged with #5.

I've used GLIMPSE to impute two samples, and resulted with the following vcf file:

(base) batel:~/glimpse/Levant_BA$ bcftools view merged.vcf.gz | head -20
##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##fileDate=01/08/2023 - 11:43:29
##source=GLIMPSE_phase v2.0.0
##contig=<ID=chr22>
##INFO=<ID=RAF,Number=A,Type=Float,Description="ALT allele frequency in the reference panel">
##INFO=<ID=AF,Number=A,Type=Float,Description="ALT allele frequency computed from DS/GP field across target samples">
##INFO=<ID=INFO,Number=A,Type=Float,Description="IMPUTE info quality score for diploid samples">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Phased genotypes">
##FORMAT=<ID=DS,Number=1,Type=Float,Description="Genotype dosage">
##FORMAT=<ID=GP,Number=3,Type=Float,Description="Genotype posteriors">
##NMAIN=15
##FPLOIDY=2
##bcftools_mergeVersion=1.10.2+htslib-1.10.2-3ubuntu0.1
##bcftools_mergeCommand=merge -l GLIMPSE_ligate/samples_merge.txt -o merged.vcf.gz; Date=Tue Aug  1 13:03:00 2023
##bcftools_viewVersion=1.10.2+htslib-1.10.2-3ubuntu0.1
##bcftools_viewCommand=view merged.vcf.gz; Date=Tue Aug  1 13:04:29 2023
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	S10769.SN	S10770.SN
chr22	10519276	22:10519276:G:C	G	C	.	.	RAF=0.0377889;AF=0.0572749;INFO=1	GT:DS:GP	0|0:0.106:0.896,0.101,0.003	0|0:0.115:0.888,0.108,0.004
chr22	10519389	22:10519389:T:C	T	C	.	.	RAF=0.0376327;AF=0.0567751;INFO=1	GT:DS:GP	0|0:0.104:0.898,0.098,0.004	0|0:0.114:0.889,0.107,0.004

When trying to convert to HD5 using

from ancIBD.IO.prepare_h5 import vcf_to_1240K_hdf
ch = 22
#base_path = f"/n/groups/reich/hringbauer/git/hapBLOCK"
vcf_to_1240K_hdf(#in_vcf_path = f"./data/vcf.raw/example_hazelton_chr{ch}.vcf.gz",
    in_vcf_path = f"./data/vcf.raw/merged_chr22_GT_GP.vcf.gz",
    path_vcf = f"./data/vcf.1240k/example_hazelton_chr{ch}.vcf.gz",
                 path_h5 = f"./data/hdf5/example_hazelton_chr{ch}.h5",
                 marker_path = f"./data/filters/snps_bcftools_ch{ch}.csv",
                 map_path = f"./data/v51.1_1240k.snp",
                 af_path = f"./data/afs/v51.1_1240k_AF_ch{ch}.tsv",
                 col_sample_af = "",
                 buffer_size=20000, chunk_width=8, chunk_length=20000,
                 ch=ch)

I get the following error:

Print downsampling to 1240K...
Finished BCF tools filtering.
Converting to HDF5...
Finished conversion to hdf5!
Merging in LD Map..
---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
/tmp/ipykernel_7983/2495295663.py in <module>
      2 ch = 22
      3 #base_path = f"/n/groups/reich/hringbauer/git/hapBLOCK"
----> 4 vcf_to_1240K_hdf(#in_vcf_path = f"./data/vcf.raw/example_hazelton_chr{ch}.vcf.gz",
      5     in_vcf_path = f"./data/vcf.raw/merged_chr22_GT_GP.vcf.gz",
      6     path_vcf = f"./data/vcf.1240k/example_hazelton_chr{ch}.vcf.gz",

~/anaconda3/lib/python3.9/site-packages/ancIBD/IO/prepare_h5.py in vcf_to_1240K_hdf(in_vcf_path, path_vcf, path_h5, marker_path, map_path, af_path, col_sample_af, chunk_length, chunk_width, buffer_size, ch)
    125     print("Merging in LD Map..")
    126     if len(map_path)>0:
--> 127         merge_in_ld_map(path_h5=path_h5, 
    128                     path_snp1240k=map_path,
    129                     chs=[ch])

~/anaconda3/lib/python3.9/site-packages/ancIBD/IO/h5_modify.py in merge_in_ld_map(path_h5, path_snp1240k, chs, write_mode)
    115     chs: Which Chromosomes to merge in HDF5 [list].
    116     write_mode: Which mode to use on hdf5. a: New field. r+: Change Field"""
--> 117     with h5py.File(path_h5, "r") as f:
    118         print("Lifting LD Map from eigenstrat to HDF5...")
    119         print("Loaded %i variants." % np.shape(f["calldata/GT"])[0])

~/anaconda3/lib/python3.9/site-packages/h5py/_hl/files.py in __init__(self, name, mode, driver, libver, userblock_size, swmr, rdcc_nslots, rdcc_nbytes, rdcc_w0, track_order, fs_strategy, fs_persist, fs_threshold, fs_page_size, page_buf_size, min_meta_keep, min_raw_keep, locking, alignment_threshold, alignment_interval, **kwds)
    531                                  fs_persist=fs_persist, fs_threshold=fs_threshold,
    532                                  fs_page_size=fs_page_size)
--> 533                 fid = make_fid(name, mode, userblock_size, fapl, fcpl, swmr=swmr)
    534 
    535             if isinstance(libver, tuple):

~/anaconda3/lib/python3.9/site-packages/h5py/_hl/files.py in make_fid(name, mode, userblock_size, fapl, fcpl, swmr)
    224         if swmr and swmr_support:
    225             flags |= h5f.ACC_SWMR_READ
--> 226         fid = h5f.open(name, flags, fapl=fapl)
    227     elif mode == 'r+':
    228         fid = h5f.open(name, h5f.ACC_RDWR, fapl=fapl)

h5py/_objects.pyx in h5py._objects.with_phil.wrapper()

h5py/_objects.pyx in h5py._objects.with_phil.wrapper()

h5py/h5f.pyx in h5py.h5f.open()

FileNotFoundError: [Errno 2] Unable to open file (unable to open file: name = './data/hdf5/example_hazelton_chr22.h5', errno = 2, error message = 'No such file or directory', flags = 0, o_flags = 0)

The directory does exist, as I get no error when running:
os.listdir("./data/hdf5/")
and I've also tried writing the absolute path.

Moreover, when running the example you provided -- just by switching the in_vcf it runs perfectly fine.
So it must be something in the vcf file iteslf.

Any ideas?

Batel

@hringbauer
Copy link
Owner

It seems like the output hdf5 file is never produced, and then the last steps (adding map position and allele frequencies to this file) fail because the program cannot find this file.

So the error occurs before that - either when producing the 1240k-filtered VCF or the initial output hdf5 file.

Is the intermediate 1240k VCF produced? It should be at f"./data/vcf.1240k/example_hazelton_chr{ch}.vcf.gz"

@batelz
Copy link
Author

batelz commented Aug 21, 2023

The file is there, but it's empty:

(base):~/ancIBD/data/vcf.1240k$ bcftools view example_hazelton_chr22.vcf.gz
##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##fileDate=01/08/2023 - 11:43:29
##source=GLIMPSE_phase v2.0.0
##contig=<ID=chr22>
##INFO=<ID=RAF,Number=A,Type=Float,Description="ALT allele frequency in the reference panel">
##INFO=<ID=AF,Number=A,Type=Float,Description="ALT allele frequency computed from DS/GP field across target samples">
##INFO=<ID=INFO,Number=A,Type=Float,Description="IMPUTE info quality score for diploid samples">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Phased genotypes">
##FORMAT=<ID=GP,Number=3,Type=Float,Description="Genotype posteriors">
##NMAIN=15
##FPLOIDY=2
##bcftools_mergeVersion=1.10.2+htslib-1.10.2-3ubuntu0.1
##bcftools_mergeCommand=merge -l GLIMPSE_ligate/samples_merge.txt -o merged.vcf.gz; Date=Tue Aug  1 13:03:00 2023
##bcftools_annotateVersion=1.10.2+htslib-1.10.2-3ubuntu0.1
##bcftools_annotateCommand=annotate -x ^FORMAT/GT,FORMAT/GP -o GLIMPSE_ligate/merged_chr22_GT_GP.vcf.gz merged.vcf.gz; Date=Tue Aug  1 13:04:43 2023
##bcftools_viewVersion=1.10.2+htslib-1.10.2-3ubuntu0.1
##bcftools_viewCommand=view -Oz -o ./data/vcf.1240k/example_hazelton_chr22.vcf.gz -T ./data/filters/snps_bcftools_ch22.csv -M2 -v snps ./data/vcf.raw/merged_chr22_GT_GP.vcf.gz; Date=Tue Aug  1 13:28:53 2023
##bcftools_viewCommand=view example_hazelton_chr22.vcf.gz; Date=Mon Aug 21 13:20:39 2023
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  S10769.SN       S10770.SN

@hringbauer
Copy link
Owner

hringbauer commented Aug 21, 2023

Then I would recommend manually running the bcftools command that creates that intermediary 1240k VCF - and trouble-shoot what goes wrong (nothing matches the SNP filter is the obvious first thing to look into).

Indeed, I see from the above that you have chr22 notation, but the filter file has 22 only (without the chr). You should transform your VCF accordingly to match the latter notation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants