## Create an MWE for susie_rss

### Old way of creating the z file from mfi files (no longer used)

This notebook sketches the file handling and formatting required to streamline the fine-mapping process.

## 1. Rename bgen variants to chr:pos:ref:alt

One major issue is that the UK Biobank released the genotype and imputed data using rsIDs as variant identifiers, whereas for the WES data we used the CHR:POS:REF:ALT convention for variant IDs because of the multiple known problems with rsIDs.

Manipulating bgen files and editing their variant IDs is not easy with the available Python libraries. My preferred method is therefore to combine bgenix and plink for file manipulation and format conversion.
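To make the ID convention concrete, here is a minimal Python sketch of building and splitting a CHR:POS:REF:ALT identifier. The function names `make_variant_id` and `split_variant_id` are hypothetical helpers for illustration only; the actual renaming in this notebook is done by plink2 with the `@:#:$r:$a` template.

```python
def make_variant_id(chrom, pos, ref, alt):
    """Build a CHR:POS:REF:ALT identifier from its four components."""
    return f"{chrom}:{int(pos)}:{ref}:{alt}"


def split_variant_id(variant_id):
    """Split a CHR:POS:REF:ALT identifier back into its components."""
    fields = variant_id.split(":")
    if len(fields) != 4:
        raise ValueError(f"expected 4 ':'-separated fields, got {len(fields)}: {variant_id!r}")
    chrom, pos, ref, alt = fields
    return chrom, int(pos), ref, alt


print(make_variant_id("1", 44969590, "C", "T"))
print(split_variant_id("1:44970214:TG:T"))
```

Unlike an rsID, this identifier depends on the genome build, so the same variant gets a different ID in hg19 and hg38.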

### First, get the region from the UKB imputed data

Use this pipeline `113022_bgenix_ldblocks.ipynb` to get the bgen and bgi files for each LD independent region of the genome based on UKB imputed data

```
sos run ~/UKBB_GWAS_dev/workflow/113022_bgenix_ldblocks.ipynb \
    bgenix\
    --cwd test\
    --genofile_prefix test/ukb_imp_chr\
    --genofile_suffix _v3.bgen\
    --region_file data/ldblocks/EUR/fourier_ls-all.bed\
    --job_size 10
```
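The region files used below follow the pattern `01_44969183_46899501.bgen`, and the original UKB bgen is queried with ranges such as `01:44969183-44969383`. A small sketch of how one LD-block row could be turned into those two strings follows. It assumes the BED file labels chromosomes as `chr1`, `chr2`, and so on, which is not confirmed here; `region_prefix` and `bgenix_range` are hypothetical helper names.

```python
def _pad_chrom(chrom):
    """Strip an optional 'chr' prefix and zero-pad to two characters."""
    chrom = str(chrom)
    if chrom.startswith("chr"):
        chrom = chrom[3:]
    return chrom.zfill(2)


def region_prefix(chrom, start, end):
    """File prefix for one LD block, e.g. 01_44969183_46899501."""
    return f"{_pad_chrom(chrom)}_{int(start)}_{int(end)}"


def bgenix_range(chrom, start, end):
    """Range string in the form passed to `bgenix -incl-range`."""
    return f"{_pad_chrom(chrom)}:{int(start)}-{int(end)}"


print(region_prefix("chr1", 44969183, 46899501))
print(bgenix_range("chr1", 44969183, 44969383))
```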

## Using the original bgen without liftover

In [69]:
module load Plink/2.00a
plink2 --bgen /mnt/vast/hpc/csg/UKBiobank/results/pleiotropy_AD_ARHI/111822_LDstore_files/regions_chr1_22/01/01_44969183_46899501.bgen 'ref-first' \
--sample /mnt/vast/hpc/csg/UKBiobank_Yale_transfer/ukb39554_imputeddataset/ukb32285_imputedindiv.sample \
--write-snplist \
--out /home/dmc2245/test_ldstore/01_44969183_46899501_hg19 \
--maf 0.01 \
--export bgen-1.2 'bits=8' 'ref-first' \
--set-all-var-ids '@:#:$r:$a' \
--new-id-max-allele-len 100 \
--make-just-bim

PLINK v2.00a4LM 64-bit Intel (11 Apr 2023)     www.cog-genomics.org/plink/2.0/
(C) 2005-2023 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /home/dmc2245/test_ldstore/01_44969183_46899501_hg19.log.
Options in effect:
  --bgen /mnt/vast/hpc/csg/UKBiobank/results/pleiotropy_AD_ARHI/111822_LDstore_files/regions_chr1_22/01/01_44969183_46899501.bgen ref-first
  --export bgen-1.2 bits=8 ref-first
  --maf 0.01
  --make-just-bim
  --new-id-max-allele-len 100
  --out /home/dmc2245/test_ldstore/01_44969183_46899501_hg19
  --sample /mnt/vast/hpc/csg/UKBiobank_Yale_transfer/ukb39554_imputeddataset/ukb32285_imputedindiv.sample
  --set-all-var-ids @:#:$r:$a
  --write-snplist

Start time: Wed Dec 13 15:24:37 2023
257481 MiB RAM detected, ~209264 available; reserving 128740 MiB for main
workspace.
Allocated 72416 MiB successfully, after larger attempt(s) failed.
Using up to 64 threads (change this with --threads).
--bgen: 59885 variants detected, format v1.2.
487409 sample

In [56]:
import pandas as pd
snplist = pd.read_csv('~/test_ldstore/01_44969183_46899501_hg19.snplist', header=None, sep='\t', names=["rsid"])

In [57]:
snplist.head()

Unnamed: 0,rsid
0,1:44969590:C:T
1,1:44969691:G:T
2,1:44969939:C:T
3,1:44970214:TG:T
4,1:44970950:G:T


In [58]:
snplist[['chromosome', 'position', 'allele1', 'allele2']] = snplist['rsid'].str.split(':', expand=True)

In [59]:
#snplist['chromosome'] = snplist.chromosome.astype(str).str.zfill(2)

In [60]:
#snplist['rsid'] = snplist['rsid'].str.replace(r'^(\d:)', lambda x: 'chr0' + x.group(1), regex=True)

In [61]:
#snplist['rsid'] = snplist['rsid'].str.extract(r'^(\d+:\d+)')

In [62]:
snplist.head()

Unnamed: 0,rsid,chromosome,position,allele1,allele2
0,1:44969590:C:T,1,44969590,C,T
1,1:44969691:G:T,1,44969691,G,T
2,1:44969939:C:T,1,44969939,C,T
3,1:44970214:TG:T,1,44970214,TG,T
4,1:44970950:G:T,1,44970950,G,T


In [63]:
snplist[["rsid","chromosome", "position", "allele1", "allele2"]].to_csv('~/test_ldstore/01_44969183_46899501_hg19.z', header=True, index=False, sep= ' ')
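The cells above can be folded into one function that also checks that every ID really has four fields, since a malformed ID would otherwise produce silently misaligned columns. This is only a sketch; `snplist_to_z` is a hypothetical name, and the column names are the ones written to the `.z` file above.

```python
import pandas as pd

Z_COLUMNS = ["chromosome", "position", "allele1", "allele2"]


def snplist_to_z(ids):
    """Turn CHR:POS:REF:ALT IDs into a z-file-style DataFrame."""
    z = pd.DataFrame({"rsid": list(ids)})
    parts = z["rsid"].str.split(":", expand=True)
    # Reject inputs where any ID does not split into exactly four fields.
    if parts.shape[1] != 4 or parts.isna().any().any():
        raise ValueError("every ID must have exactly four ':'-separated fields")
    parts.columns = Z_COLUMNS
    z = pd.concat([z, parts], axis=1)
    z["position"] = z["position"].astype(int)
    return z


z = snplist_to_z(["1:44969590:C:T", "1:44970214:TG:T"])
print(z.to_csv(sep=" ", index=False))
```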

In [65]:
bgenix -g ~/test_ldstore/01_44969183_46899501_hg19.bgen -index


Welcome to bgenix
(version: 1.1.8, revision )

(C) 2009-2017 University of Oxford

bgenix: creating index for "/home/dmc2245/test_ldstore/01_44969183_46899501_hg19.bgen" in "/home/dmc2245/test_ldstore/01_44969183_46899501_hg19.bgen.bgi"...
bgenix: Opened "/home/dmc2245/test_ldstore/01_44969183_46899501_hg19.bgen" with 5076 variants...
Building BGEN index                                         : [******************************] (5076/5076,0.8s,6570.8/s)

Thank you for using bgenix.


In [66]:
module load  Bgenix/1.1.8
bgenix -g /home/dmc2245/test_ldstore/01_44969183_46899501.bgen -list  -incl-range 1:44969183-44969383


Welcome to bgenix
(version: 1.1.8, revision )

(C) 2009-2017 University of Oxford

Building query                                              :  (3/?,0.0s,22193.8/s)
# bgenix: started 2023-12-13 14:14:15
alternate_ids	rsid	chromosome	position	number_of_alleles	first_allele	alternative_alleles
.	01:44969310:A:G	1	44969310	2	A	G
.	01:44969328:C:T	1	44969328	2	C	T
.	01:44969330:C:G	1	44969330	2	C	G
# bgenix: success, total 3 variants.

Thank you for using bgenix.


In [28]:
bgenix -g /mnt/vast/hpc/csg/UKBiobank/results/pleiotropy_AD_ARHI/111822_LDstore_files/regions_chr1_22/01/01_44969183_46899501.bgen -list -incl-range 01:44969183-44969383 


Welcome to bgenix
(version: 1.1.8, revision )

(C) 2009-2017 University of Oxford

Building query                                              :  (3/?,0.0s,20837.7/s)
# bgenix: started 2023-12-13 13:47:22
alternate_ids	rsid	chromosome	position	number_of_alleles	first_allele	alternative_alleles
1:44969310_A_G	rs777318309	01	44969310	2	A	G
1:44969328_C_T	rs141298448	01	44969328	2	C	T
1:44969330_C_G	rs181919981	01	44969330	2	C	G
# bgenix: success, total 3 variants.

Thank you for using bgenix.


In [47]:
import glob
import pandas as pd
bgen=glob.glob(r"/home/dmc2245/test_ldstore/01_44969183_46899501_hg19.bgen")
masterfile = pd.DataFrame({
    "z" : glob.glob(r"/home/dmc2245/test_ldstore/01_44969183_46899501_hg19.z"),
    "bgen" : glob.glob(r"/home/dmc2245/test_ldstore/01_44969183_46899501_hg19.bgen"),
    "bgi" : glob.glob(r"/home/dmc2245/test_ldstore/01_44969183_46899501_hg19.bgen.bgi"),
    "bcor" : [i.replace('bgen','bcor') for i in bgen],
    "ld" : [i.replace('bgen','ld') for i in bgen],
    "n_samples" : 351430,
    "sample": f"/home/dmc2245/test_ldstore/01_44969183_46899501_hg19.sample",
    "incl" : f"/mnt/vast/hpc/csg/UKBiobank/results/pleiotropy_AD_ARHI/111822_LDstore_files/regions_chr1_22/UKB_genotypedatadownloaded083019.090221_sample_variant_qc_final_callrate90.filtered.extracted.white_europeans.filtered.092821_ldprun_unrelated.filtered.incl"
          })
masterfile = masterfile[['z', 'bgen', 'bgi', 'bcor', 'ld','n_samples', 'sample', 'incl']]

masterfile.to_csv("/home/dmc2245/test_ldstore/mastefile_hg19", sep=";", header=True, index=False)
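Note that `i.replace('bgen','bcor')` replaces every occurrence of `bgen` in the path, so it would also rewrite a directory name that happens to contain `bgen`. The sketch below derives the `bcor` and `ld` paths from the file suffix only. `masterfile_row` is a hypothetical helper, and the `incl` value in the usage example is a placeholder rather than the real sample-inclusion file.

```python
from pathlib import PurePosixPath

COLUMNS = ["z", "bgen", "bgi", "bcor", "ld", "n_samples", "sample", "incl"]


def masterfile_row(bgen_path, z_path, sample_path, incl_path, n_samples):
    """Build one master-file row, touching only the final .bgen suffix."""
    bgen = PurePosixPath(bgen_path)
    if bgen.suffix != ".bgen":
        raise ValueError(f"not a .bgen file: {bgen_path}")
    return {
        "z": str(z_path),
        "bgen": str(bgen),
        "bgi": str(bgen) + ".bgi",
        "bcor": str(bgen.with_suffix(".bcor")),
        "ld": str(bgen.with_suffix(".ld")),
        "n_samples": int(n_samples),
        "sample": str(sample_path),
        "incl": str(incl_path),
    }


row = masterfile_row(
    "/home/dmc2245/test_ldstore/01_44969183_46899501_hg19.bgen",
    "/home/dmc2245/test_ldstore/01_44969183_46899501_hg19.z",
    "/home/dmc2245/test_ldstore/01_44969183_46899501_hg19.sample",
    "samples.incl",  # placeholder path, not the real inclusion list
    351430,
)
print(";".join(COLUMNS))
print(";".join(str(row[c]) for c in COLUMNS))
```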

In [67]:
## this needs at least 83GB of memory

module load Singularity
sos run /home/dmc2245/project/UKBB_GWAS_dev/workflow/111722_LDstore.ipynb \
    bcor\
    --cwd ~/test_ldstore \
    --masterfile /home/dmc2245/test_ldstore/mastefile_hg19 \
    --numThreads 10 \
    --mem 100G \
    --job_size 1

INFO: Running bcor: Create bcor files
INFO: bcor is completed.
INFO: Workflow bcor (ID=wd947528e99aa805d) is executed successfully with 1 completed step.


In [52]:
## Set the bash variables 
cwd=~/test_ldstore
ldstore_sbatch=$cwd/ldstore_test_$(date +"%Y-%m-%d").sbatch
masterfile=/home/dmc2245/test_ldstore/mastefile_hg19
jobsize=1
ldstore_sos=~/project/UKBB_GWAS_dev/workflow/111722_LDstore.ipynb
tpl_file=~/project/bioworkflows/admin/csg.yml
mem='100G'
job_size=1
numThreads=10

ldstore_args="""bcor
 --cwd $cwd
 --masterfile $masterfile
 --numThreads $numThreads 
 --mem $mem 
 --job_size $job_size
"""

sos run ~/project/UKBB_GWAS_dev/admin/Get_Job_Script.ipynb csg_mamba \
 --template-file $tpl_file \
 --workflow-file $ldstore_sos \
 --to-script $ldstore_sbatch \
 --args "$ldstore_args"

INFO: Running csg_mamba: Configuration for Columbia csg partition cluster
INFO: csg_mamba is completed.
INFO: csg_mamba output:   /home/dmc2245/test_ldstore/ldstore_test_2023-12-13.sbatch
INFO: Workflow csg_mamba (ID=w0662ea312dd5cbcc) is executed successfully with 1 completed step.


In [68]:
## this needs at least 83GB of memory

module load Singularity
sos run /home/dmc2245/project/UKBB_GWAS_dev/workflow/111722_LDstore.ipynb \
    ld:1\
    --cwd ~/test_ldstore \
    --masterfile /home/dmc2245/test_ldstore/mastefile_hg19 \
    --numThreads 10 \
    --mem 100G \
    --job_size 1

INFO: Running ld_1: Calculate LD
INFO: ld_1 is completed.
INFO: Workflow ld (ID=w1b8257c32b61fc0e) is executed successfully with 1 completed step.


### Now, use plink to rename the variant ID and output as VCF

FIXME: we do not need the dosage data for this analysis. Would dropping it affect the LD calculations?

In [7]:
module load Plink/2.00a
plink2 --bgen /mnt/vast/hpc/csg/UKBiobank/results/pleiotropy_AD_ARHI/111822_LDstore_files/regions_chr1_22/01/01_44969183_46899501.bgen 'ref-first' \
--sample /mnt/vast/hpc/csg/UKBiobank_Yale_transfer/ukb39554_imputeddataset/ukb32285_imputedindiv.sample \
--export vcf bgz id-paste=iid \
--set-all-var-ids @:#:\$r:\$a \
--new-id-max-allele-len 100 \
--out /home/dmc2245/test_ldstore/01_44969183_46899501

PLINK v2.00a4LM 64-bit Intel (11 Apr 2023)     www.cog-genomics.org/plink/2.0/
(C) 2005-2023 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /home/dmc2245/test_ldstore/01_44969183_46899501.log.
Options in effect:
  --bgen /mnt/vast/hpc/csg/UKBiobank/results/pleiotropy_AD_ARHI/111822_LDstore_files/regions_chr1_22/01/01_44969183_46899501.bgen ref-first
  --export vcf bgz id-paste=iid
  --new-id-max-allele-len 100
  --out /home/dmc2245/test_ldstore/01_44969183_46899501
  --sample /mnt/vast/hpc/csg/UKBiobank_Yale_transfer/ukb39554_imputeddataset/ukb32285_imputedindiv.sample
  --set-all-var-ids @:#:$r:$a

Start time: Tue Dec 12 14:24:16 2023
257481 MiB RAM detected, ~221495 available; reserving 128740 MiB for main
workspace.
Allocated 72416 MiB successfully, after larger attempt(s) failed.
Using up to 64 threads (change this with --threads).
--bgen: 59885 variants detected, format v1.2.
487409 samples imported from .sample file to
/home/dmc2245/test_ldstore/01_44

### A very critical step is to do LiftOver if the downstream analysis will use hg38 genomic coordinates

Get all the necessary programs and reference files

In [9]:
cd /home/dmc2245/test_ldstore && \
wget https://github.com/broadinstitute/picard/releases/download/2.27.4/picard.jar && \
wget https://raw.githubusercontent.com/broadinstitute/gatk/master/scripts/funcotator/data_sources/gnomAD/b37ToHg38.over.chain && \
wget https://ilmn-dragen-giab-samples.s3.amazonaws.com/FASTA/hg38.fa


--2023-12-12 14:31:54--  https://github.com/broadinstitute/picard/releases/download/2.27.4/picard.jar
Resolving menloproxy.cumc.columbia.edu (menloproxy.cumc.columbia.edu)... 10.147.211.93
Connecting to menloproxy.cumc.columbia.edu (menloproxy.cumc.columbia.edu)|10.147.211.93|:8080... connected.
Proxy request sent, awaiting response... 302 Found
Location: https://objects.githubusercontent.com/github-production-release-asset-2e65be/18225913/839cd9dd-e7dc-4c29-ab0d-6ab4bdcded4e?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20231212%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20231212T193154Z&X-Amz-Expires=300&X-Amz-Signature=d66030f0278cda0b061fed812f8f3de9bfa3c976d9c829383e1512baa2749c41&X-Amz-SignedHeaders=host&actor_id=0&key_id=0&repo_id=18225913&response-content-disposition=attachment%3B%20filename%3Dpicard.jar&response-content-type=application%2Foctet-stream [following]
--2023-12-12 14:31:54--  https://objects.githubusercontent.com/github-production-release

Run the liftover on the `vcf.gz`

In [13]:
## hg38 coordinates will be 44503511-46433829
java -jar ~/test_ldstore/picard.jar LiftoverVcf -I ~/test_ldstore/01_44969183_46899501.vcf.gz -O ~/test_ldstore/01_44969183_46899501.lo38.vcf \
   -C b37ToHg38.over.chain --REJECT ~/test_ldstore/rejected_variants.vcf -R ~/test_ldstore/GRCh38_full_analysis_set_plus_decoy_hla.fa \
   --RECOVER_SWAPPED_REF_ALT true --DISABLE_SORT true

14:56:19.306 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/mnt/vast/hpc/homes/dmc2245/test_ldstore/picard.jar!/com/intel/gkl/native/libgkl_compression.so
[Tue Dec 12 14:56:19 EST 2023] LiftoverVcf --INPUT /home/dmc2245/test_ldstore/01_44969183_46899501.vcf.gz --OUTPUT /home/dmc2245/test_ldstore/01_44969183_46899501.lo38.vcf --CHAIN b37ToHg38.over.chain --REJECT /home/dmc2245/test_ldstore/rejected_variants.vcf --RECOVER_SWAPPED_REF_ALT true --DISABLE_SORT true --REFERENCE_SEQUENCE /home/dmc2245/test_ldstore/GRCh38_full_analysis_set_plus_decoy_hla.fa --WARN_ON_MISSING_CONTIG false --LOG_FAILED_INTERVALS true --WRITE_ORIGINAL_POSITION false --WRITE_ORIGINAL_ALLELES false --LIFTOVER_MIN_MATCH 1.0 --ALLOW_MISSING_FIELDS_IN_HEADER false --TAGS_TO_REVERSE AF --TAGS_TO_DROP MAX_AF --VERBOSITY INFO --QUIET false --VALIDATION_STRINGENCY STRICT --COMPRESSION_LEVEL 5 --MAX_RECORDS_IN_RAM 500000 --CREATE_INDEX false --CREATE_MD5_FILE false --GA4GH_CLIENT_SECRETS client_se

Sort the file using bcftools

In [14]:
module load BCFTOOLS/1.18
bcftools sort -o ~/test_ldstore/01_44969183_46899501.lo38.vcf.gz -O z ~/test_ldstore/01_44969183_46899501.lo38.vcf

Writing to /tmp/6958056.1.csg.q/bcftools.dFBgzY
Merging 85 temporary files
Cleaning
Done


### Convert to bgen format again to be able to run bcor in ldstore

In [15]:
plink2 --threads 8 --memory 10000 \
        --vcf ~/test_ldstore/01_44969183_46899501.lo38.vcf.gz \
        --export bgen-1.2 'bits=8' 'sample-v2' 'ref-first' \
        --out ~/test_ldstore/01_44969183_46899501.lo38

PLINK v2.00a4LM 64-bit Intel (11 Apr 2023)     www.cog-genomics.org/plink/2.0/
(C) 2005-2023 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /home/dmc2245/test_ldstore/01_44969183_46899501.lo38.log.
Options in effect:
  --export bgen-1.2 bits=8 sample-v2 ref-first
  --memory 10000
  --out /home/dmc2245/test_ldstore/01_44969183_46899501.lo38
  --threads 8
  --vcf /home/dmc2245/test_ldstore/01_44969183_46899501.lo38.vcf.gz

Start time: Wed Dec 13 11:02:28 2023
257481 MiB RAM detected, ~221246 available; reserving 10000 MiB for main
workspace.
Using up to 8 compute threads.
--vcf: 59885 variants scanned.
--vcf: /home/dmc2245/test_ldstore/01_44969183_46899501.lo38-temporary.pgen +
/home/dmc2245/test_ldstore/01_44969183_46899501.lo38-temporary.pvar.zst +
/home/dmc2245/test_ldstore/01_44969183_46899501.lo38-temporary.psam written.
487409 samples (0 females, 0 males, 487409 ambiguous; 487409 founders) loaded
from /home/dmc2245/test_ldstore/01_44969183_46899501.lo38

In [3]:
module load  Bgenix/1.1.8
bgenix -g ~/test_ldstore/01_44969183_46899501.lo38.bgen -index

### Create the z file using plink

In [16]:
plink2 --threads 8 --memory 10000 \
        --vcf ~/test_ldstore/01_44969183_46899501.lo38.vcf.gz \
        --write-snplist \
        --out ~/test_ldstore/01_44969183_46899501.lo38

PLINK v2.00a4LM 64-bit Intel (11 Apr 2023)     www.cog-genomics.org/plink/2.0/
(C) 2005-2023 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /home/dmc2245/test_ldstore/01_44969183_46899501.lo38.log.
Options in effect:
  --memory 10000
  --out /home/dmc2245/test_ldstore/01_44969183_46899501.lo38
  --threads 8
  --vcf /home/dmc2245/test_ldstore/01_44969183_46899501.lo38.vcf.gz
  --write-snplist

Start time: Wed Dec 13 11:19:29 2023
257481 MiB RAM detected, ~221215 available; reserving 10000 MiB for main
workspace.
Using up to 8 compute threads.
59885 variants loaded from
/home/dmc2245/test_ldstore/01_44969183_46899501.lo38.vcf.gz.
Note: No phenotype data present.
--write-snplist: Variant IDs written to
/home/dmc2245/test_ldstore/01_44969183_46899501.lo38.snplist .
End time: Wed Dec 13 11:20:56 2023


Use Python to finish formatting the z file

In [27]:
import pandas as pd
snplist = pd.read_csv('~/test_ldstore/01_44969183_46899501.lo38.snplist', header=None, names=["rsid_ori"])

In [28]:
snplist.head()

Unnamed: 0,rsid_ori
0,1:44969310:A:G
1,1:44969328:C:T
2,1:44969330:C:G
3,1:44969417:A:T
4,1:44969509:C:T


In [29]:
snplist[['chromosome', 'position', 'allele1', 'allele2']] = snplist['rsid_ori'].str.split(':', expand=True)

In [26]:
snplist['chromosome'] = snplist.chromosome.astype(str).str.zfill(2)

In [17]:
#snplist['rsid'] = snplist['rsid'].str.replace(r'^(\d:)', lambda x: 'chr0' + x.group(1), regex=True)

In [36]:
snplist['rsid'] = snplist['rsid_ori'].str.extract(r'^(\d+:\d+)')

In [37]:
snplist.head()

Unnamed: 0,rsid_ori,chromosome,position,allele1,allele2,rsid
0,1:44969310:A:G,1,44969310,A,G,1:44969310
1,1:44969328:C:T,1,44969328,C,T,1:44969328
2,1:44969330:C:G,1,44969330,C,G,1:44969330
3,1:44969417:A:T,1,44969417,A,T,1:44969417
4,1:44969509:C:T,1,44969509,C,T,1:44969509


In [38]:
snplist[["rsid","chromosome", "position", "allele1", "allele2"]].to_csv('~/test_ldstore/01_44969183_46899501.lo38.z', header=True, index=False, sep= ' ')

### Create the master file for LD store

In [12]:
import glob
import pandas as pd
bgen=glob.glob(r"/home/dmc2245/test_ldstore/*.bgen")
masterfile = pd.DataFrame({
    "z" : glob.glob(r"/home/dmc2245/test_ldstore/01_44969183_46899501.lo38.z"),
    "bgen" : glob.glob(r"/home/dmc2245/test_ldstore/*.bgen"),
    "bgi" : glob.glob(r"/home/dmc2245/test_ldstore/*.bgi"),
    "bcor" : [i.replace('bgen','bcor') for i in bgen],
    "ld" : [i.replace('bgen','ld') for i in bgen],
    "n_samples" : 351430,
    "sample": f"/mnt/vast/hpc/csg/UKBiobank_Yale_transfer/ukb39554_imputeddataset/ukb32285_imputedindiv.sample",
    "incl" : f"/mnt/vast/hpc/csg/UKBiobank/results/pleiotropy_AD_ARHI/111822_LDstore_files/regions_chr1_22/UKB_genotypedatadownloaded083019.090221_sample_variant_qc_final_callrate90.filtered.extracted.white_europeans.filtered.092821_ldprun_unrelated.filtered.incl"
          })
masterfile = masterfile[['z', 'bgen', 'bgi', 'bcor', 'ld','n_samples', 'sample', 'incl']]

masterfile.to_csv("/home/dmc2245/test_ldstore/mastefile", sep=";", header=True, index=False)

## Compute LD using LDstore

In [39]:
module load Singularity
sos run /home/dmc2245/project/UKBB_GWAS_dev/workflow/111722_LDstore.ipynb \
    bcor\
    --cwd ~/test_ldstore \
    --masterfile /home/dmc2245/test_ldstore/mastefile \
    --numThreads 10 \
    --mem 100G \
    --job_size 1

INFO: Running bcor: Create bcor files
INFO: bcor is completed.
INFO: Workflow bcor (ID=we1b00b592f51ddac) is executed successfully with 1 completed step.


In [4]:
bgenix -g /home/dmc2245/test_ldstore/01_44969183_46899501.lo38.bgen -list -incl-range 1:44503511-44503520


Welcome to bgenix
(version: 1.1.8, revision )

(C) 2009-2017 University of Oxford

Building query                                              :  (0/?,0.0s,0.0/s)
# bgenix: started 2023-12-13 12:51:30
alternate_ids	rsid	chromosome	position	number_of_alleles	first_allele	alternative_alleles
# bgenix: success, total 0 variants.

Thank you for using bgenix.


In [1]:
cat /home/dmc2245/test_ldstore/01_44969183_46899501.lo38.z | head

rsid chromosome position allele1 allele2
01:44969310:A:G 01 44969310 A G
01:44969328:C:T 01 44969328 C T
01:44969330:C:G 01 44969330 C G
01:44969417:A:T 01 44969417 A T
01:44969509:C:T 01 44969509 C T
01:44969513:C:G 01 44969513 C G
01:44969515:A:G 01 44969515 A G
01:44969518:C:A 01 44969518 C A
01:44969527:T:C 01 44969527 T C


Error : SNP with rsID '1:46899482:G:A' in file '/home/dmc2245/test_ldstore/01_44969183_46899501.lo38.z' could not be matched with SNP in BGEN file '/home/dmc2245/test_ldstore/01_44969183_46899501.lo38.bgen'
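One way to narrow down this kind of error is to compare the z file against the output of `bgenix -list` for the same bgen and report the rows that have no counterpart. The sketch below uses only the standard library. The matching key (rsid, chromosome, position, compared as strings) is an assumption, since which fields LDstore compares is not established in this notebook, and the sample data is illustrative, modelled on the file formats shown above.

```python
import csv
import io


def read_bgenix_list(text):
    """Parse `bgenix -list` output; lines starting with '#' are comments."""
    rows = [ln for ln in text.splitlines() if ln.strip() and not ln.startswith("#")]
    return list(csv.DictReader(io.StringIO("\n".join(rows)), delimiter="\t"))


def read_z(text):
    """Parse a space-delimited z file with a header line."""
    rows = [ln for ln in text.splitlines() if ln.strip()]
    return list(csv.DictReader(io.StringIO("\n".join(rows)), delimiter=" "))


def unmatched(z_rows, bgen_rows):
    """Return z-file IDs whose (rsid, chromosome, position) is absent from the bgen."""
    def key(row):
        return (row["rsid"], row["chromosome"], row["position"])

    bgen_keys = {key(r) for r in bgen_rows}
    return [r["rsid"] for r in z_rows if key(r) not in bgen_keys]


# Illustrative inputs only, not real output from the files above.
bgen_text = (
    "# bgenix: started\n"
    "alternate_ids\trsid\tchromosome\tposition\tnumber_of_alleles\tfirst_allele\talternative_alleles\n"
    ".\t1:44969310:A:G\t1\t44969310\t2\tA\tG\n"
    ".\t1:44969328:C:T\t1\t44969328\t2\tC\tT\n"
    "# bgenix: success, total 2 variants.\n"
)
z_text = (
    "rsid chromosome position allele1 allele2\n"
    "1:44969310:A:G 1 44969310 A G\n"
    "01:44969328:C:T 01 44969328 C T\n"
)
missing = unmatched(read_z(z_text), read_bgenix_list(bgen_text))
print(missing)
```

In this toy input the second z row is reported because its ID and chromosome are zero-padded while the bgen entries are not, which is the kind of formatting difference visible between the z file and the bgenix listings in this notebook.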

Install bgen-reader from the terminal (see the [installation tutorial](https://bgen-reader.readthedocs.io/en/latest/install.html)):

```
micromamba install bgen-reader
```

In [45]:
# Read into bgen file
from bgen_reader import open_bgen
bgen_file = '/home/dmc2245/test_ldstore/01_44969183_46899501.lo38.bgen'
bgen = open_bgen(bgen_file, verbose=False)

# Print first 5 samples
print(bgen.samples[:5])
# ['sample_0' 'sample_1' 'sample_2' 'sample_3' 'sample_4']
# Print first 5 variants
print(bgen.ids[:5]) #first 5
# ['6:31571218_C_T' '6:31571228_A_C' '6:31571296_A_G' '6:31571308_C_T' '6:31571330_C_A']
print(bgen.nvariants)
# 43053

## Read the probabilities of the first variant
probs = bgen.read(0)
print(probs)

ModuleNotFoundError: No module named 'bgen_reader'

In [43]:
from bgen import BgenReader, BgenWriter

bfile = BgenReader('/home/dmc2245/01_44969183_46899501.lo38.bgen')
rsids = bfile.rsids()


ModuleNotFoundError: No module named 'bgen'

In [64]:
cat /mnt/vast/hpc/csg/UKBiobank/results/pleiotropy_AD_ARHI/111822_LDstore_files/regions_chr1_22/masterfile_chr06_31571218_32682664_v0.001_list_unrelated_whiteEur.txt | head

z;bgen;bgi;bcor;ld;n_samples;sample;incl
/mnt/vast/hpc/csg/UKBiobank/results/pleiotropy_AD_ARHI/111822_LDstore_files/regions_chr1_22/06/ukb_modif_06_31571218_32682664_0.01.z;/mnt/vast/hpc/csg/UKBiobank/results/pleiotropy_AD_ARHI/111822_LDstore_files/regions_chr1_22/06/ukb_modif_06_31571218_32682664.bgen;/mnt/vast/hpc/csg/UKBiobank/results/pleiotropy_AD_ARHI/111822_LDstore_files/regions_chr1_22/06/ukb_modif_06_31571218_32682664.bgen.bgi;/mnt/vast/hpc/csg/UKBiobank/results/pleiotropy_AD_ARHI/111822_LDstore_files/regions_chr1_22/06/ukb_modif_06_31571218_32682664.bcor;/mnt/vast/hpc/csg/UKBiobank/results/pleiotropy_AD_ARHI/111822_LDstore_files/regions_chr1_22/06/ukb_modif_06_31571218_32682664.ld;351430;/mnt/vast/hpc/csg/UKBiobank_Yale_transfer/ukb39554_imputeddataset/ukb32285_imputedindiv.sample;/mnt/vast/hpc/csg/UKBiobank/results/pleiotropy_AD_ARHI/111822_LDstore_files/regions_chr1_22/UKB_genotypedatadownloaded083019.090221_sample_variant_qc_final_callrate90.filtered.extracted.white_europe

In [65]:
cat /mnt/vast/hpc/csg/UKBiobank/results/pleiotropy_AD_ARHI/111822_LDstore_files/regions_chr1_22/06/ukb_modif_06_31571218_32682664_0.01.z | head

rsid chromosome position allele1 allele2
chr6:31571218:C:T 6 31571218 C T
chr6:31571228:A:C 6 31571228 A C
chr6:31571296:A:G 6 31571296 A G
chr6:31571308:C:T 6 31571308 C T
chr6:31571330:C:A 6 31571330 C A
chr6:31571337:C:T 6 31571337 C T
chr6:31571362:C:T 6 31571362 C T
chr6:31571375:A:G 6 31571375 A G
chr6:31571384:A:G 6 31571384 A G


### Create an MWE for susie_rss

Create a `bgen_list_file` for the regions of interest


In [84]:
import pandas as pd
import glob

bgen=glob.glob('/mnt/vast/hpc/csg/UKBiobank/results/pleiotropy_AD_ARHI/111822_LDstore_files/regions_chr1_22/'+'/01/*.bgen')
df = pd.DataFrame({'bgen':bgen})
# Every row uses the same sample file, so assign the constant directly
df['sample'] = '/mnt/vast/hpc/csg/UKBiobank_Yale_transfer/ukb39554_imputeddataset/ukb32285_imputedindiv.sample'

In [87]:
df.to_csv('/mnt/vast/hpc/csg/UKBiobank/results/pleiotropy_AD_ARHI/111822_LDstore_files/regions_chr1_22/01/chr1_bgen_list.txt',sep=';', header=True, index=False)

In [None]:
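# Not run. The original used the SoS expression ${cwd:r}; `cwd` below is a plain Python
# string and its value (the regions directory used above) is an assumption.
cwd = '/mnt/vast/hpc/csg/UKBiobank/results/pleiotropy_AD_ARHI/111822_LDstore_files/regions_chr1_22'
bgi = glob.glob(cwd + '/*/*.bgen.bgi')
df2 = pd.DataFrame({'bgi': bgi})
# `df1` was never defined in this notebook, so only df and df2 are combined here
df3 = pd.concat([df, df2], axis=1)
bcor = [i.replace('bgen', 'bcor') for i in bgen]
df4 = pd.DataFrame({'bcor': bcor})
ld = [i.replace('bgen', 'ld') for i in bgen]
df5 = pd.DataFrame({'ld': ld})
df_final = pd.concat([df3, df4, df5], axis=1)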

## Regions to generate
```
01_159913048_162346721.bgen (index pos 161155392)
01_206073265_208410364.bgen (index pos 207786828)
02_127373764_128034347.bgen (index pos 127891427)
02_233550003_235150987.bgen (index pos 233981912) 
03_56433907_58157519.bgen (index pos 57226150)
```
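The plink2 call below is repeated once per region with only the paths changing, so the commands can also be generated from the region list. The following sketch builds the argument lists without running them; the flags and paths are copied from the cells in this notebook, while `plink2_rename_cmd` and the constant names are hypothetical.

```python
import shlex

REGION_DIR = "/mnt/vast/hpc/csg/UKBiobank/results/pleiotropy_AD_ARHI/111822_LDstore_files/regions_chr1_22"
SAMPLE = "/mnt/vast/hpc/csg/UKBiobank_Yale_transfer/ukb39554_imputeddataset/ukb32285_imputedindiv.sample"
OUT_DIR = "/home/dmc2245/susie_mwe"
REGIONS = [
    "01_159913048_162346721",
    "01_206073265_208410364",
    "02_127373764_128034347",
    "02_233550003_235150987",
    "03_56433907_58157519",
]


def plink2_rename_cmd(region, maf=0.001):
    """Argument list that renames variants to CHR:POS:REF:ALT for one region."""
    chrom_dir = region.split("_")[0]  # regions are stored in per-chromosome folders
    return [
        "plink2",
        "--bgen", f"{REGION_DIR}/{chrom_dir}/{region}.bgen", "ref-first",
        "--sample", SAMPLE,
        "--write-snplist",
        "--maf", str(maf),
        "--export", "bgen-1.2", "bits=8", "ref-first",
        "--set-all-var-ids", "@:#:$r:$a",
        "--new-id-max-allele-len", "100",
        "--make-just-bim",
        "--out", f"{OUT_DIR}/{region}_hg19",
    ]


commands = [plink2_rename_cmd(region) for region in REGIONS]
for cmd in commands:
    # shlex.join quotes the ID template so the shell does not expand $r and $a
    print(shlex.join(cmd))
```

Printing rather than executing keeps the sketch side-effect free; the printed lines can be pasted into a shell or passed to `subprocess.run` as lists.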



In [70]:
module load Plink/2.00a
plink2 --bgen /mnt/vast/hpc/csg/UKBiobank/results/pleiotropy_AD_ARHI/111822_LDstore_files/regions_chr1_22/01/01_159913048_162346721.bgen 'ref-first' \
--sample /mnt/vast/hpc/csg/UKBiobank_Yale_transfer/ukb39554_imputeddataset/ukb32285_imputedindiv.sample \
--write-snplist \
--out /home/dmc2245/susie_mwe/01_159913048_162346721_hg19 \
--maf 0.001 \
--export bgen-1.2 'bits=8' 'ref-first' \
--set-all-var-ids '@:#:$r:$a' \
--new-id-max-allele-len 100 \
--make-just-bim

PLINK v2.00a4LM 64-bit Intel (11 Apr 2023)     www.cog-genomics.org/plink/2.0/
(C) 2005-2023 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /home/dmc2245/susie_mwe/01_159913048_162346721_hg19.log.
Options in effect:
  --bgen /mnt/vast/hpc/csg/UKBiobank/results/pleiotropy_AD_ARHI/111822_LDstore_files/regions_chr1_22/01/01_159913048_162346721.bgen ref-first
  --export bgen-1.2 bits=8 ref-first
  --maf 0.001
  --make-just-bim
  --new-id-max-allele-len 100
  --out /home/dmc2245/susie_mwe/01_159913048_162346721_hg19
  --sample /mnt/vast/hpc/csg/UKBiobank_Yale_transfer/ukb39554_imputeddataset/ukb32285_imputedindiv.sample
  --set-all-var-ids @:#:$r:$a
  --write-snplist

Start time: Wed Dec 13 17:19:56 2023
257481 MiB RAM detected, ~217198 available; reserving 128740 MiB for main
workspace.
Allocated 72416 MiB successfully, after larger attempt(s) failed.
Using up to 64 threads (change this with --threads).
--bgen: 79539 variants detected, format v1.2.
487409 sampl

In [145]:
module load  Bgenix/1.1.8
bgenix -g ~/susie_mwe/01_159913048_162346721_hg19.bgen -index


Welcome to bgenix
(version: 1.1.8, revision )

(C) 2009-2017 University of Oxford

bgenix: creating index for "/home/dmc2245/susie_mwe/01_159913048_162346721_hg19.bgen" in "/home/dmc2245/susie_mwe/01_159913048_162346721_hg19.bgen.bgi"...
bgenix: Opened "/home/dmc2245/susie_mwe/01_159913048_162346721_hg19.bgen" with 17716 variants...
Building BGEN index                                         : [******************************] (17716/17716,1.9s,9524.0/s)

Thank you for using bgenix.


In [71]:
module load Plink/2.00a
plink2 --bgen /mnt/vast/hpc/csg/UKBiobank/results/pleiotropy_AD_ARHI/111822_LDstore_files/regions_chr1_22/01/01_206073265_208410364.bgen 'ref-first' \
--sample /mnt/vast/hpc/csg/UKBiobank_Yale_transfer/ukb39554_imputeddataset/ukb32285_imputedindiv.sample \
--write-snplist \
--out /home/dmc2245/susie_mwe/01_206073265_208410364_hg19 \
--maf 0.001 \
--export bgen-1.2 'bits=8' 'ref-first' \
--set-all-var-ids '@:#:$r:$a' \
--new-id-max-allele-len 100 \
--make-just-bim

PLINK v2.00a4LM 64-bit Intel (11 Apr 2023)     www.cog-genomics.org/plink/2.0/
(C) 2005-2023 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /home/dmc2245/susie_mwe/01_206073265_208410364_hg19.log.
Options in effect:
  --bgen /mnt/vast/hpc/csg/UKBiobank/results/pleiotropy_AD_ARHI/111822_LDstore_files/regions_chr1_22/01/01_206073265_208410364.bgen ref-first
  --export bgen-1.2 bits=8 ref-first
  --maf 0.001
  --make-just-bim
  --new-id-max-allele-len 100
  --out /home/dmc2245/susie_mwe/01_206073265_208410364_hg19
  --sample /mnt/vast/hpc/csg/UKBiobank_Yale_transfer/ukb39554_imputeddataset/ukb32285_imputedindiv.sample
  --set-all-var-ids @:#:$r:$a
  --write-snplist

Start time: Wed Dec 13 17:20:31 2023
257481 MiB RAM detected, ~217180 available; reserving 128740 MiB for main
workspace.
Allocated 72416 MiB successfully, after larger attempt(s) failed.
Using up to 64 threads (change this with --threads).
--bgen: 65942 variants detected, format v1.2.
487409 sampl

In [146]:
module load  Bgenix/1.1.8
bgenix -g ~/susie_mwe/01_206073265_208410364_hg19.bgen -index


Welcome to bgenix
(version: 1.1.8, revision )

(C) 2009-2017 University of Oxford

bgenix: creating index for "/home/dmc2245/susie_mwe/01_206073265_208410364_hg19.bgen" in "/home/dmc2245/susie_mwe/01_206073265_208410364_hg19.bgen.bgi"...
bgenix: Opened "/home/dmc2245/susie_mwe/01_206073265_208410364_hg19.bgen" with 13084 variants...
Building BGEN index                                         : [******************************] (13084/13084,1.4s,9460.9/s)

Thank you for using bgenix.


In [72]:
module load Plink/2.00a
plink2 --bgen /mnt/vast/hpc/csg/UKBiobank/results/pleiotropy_AD_ARHI/111822_LDstore_files/regions_chr1_22/02/02_127373764_128034347.bgen 'ref-first' \
--sample /mnt/vast/hpc/csg/UKBiobank_Yale_transfer/ukb39554_imputeddataset/ukb32285_imputedindiv.sample \
--write-snplist \
--out /home/dmc2245/susie_mwe/02_127373764_128034347_hg19 \
--maf 0.001 \
--export bgen-1.2 'bits=8' 'ref-first' \
--set-all-var-ids '@:#:$r:$a' \
--new-id-max-allele-len 100 \
--make-just-bim

PLINK v2.00a4LM 64-bit Intel (11 Apr 2023)     www.cog-genomics.org/plink/2.0/
(C) 2005-2023 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /home/dmc2245/susie_mwe/02_127373764_128034347_hg19.log.
Options in effect:
  --bgen /mnt/vast/hpc/csg/UKBiobank/results/pleiotropy_AD_ARHI/111822_LDstore_files/regions_chr1_22/02/02_127373764_128034347.bgen ref-first
  --export bgen-1.2 bits=8 ref-first
  --maf 0.001
  --make-just-bim
  --new-id-max-allele-len 100
  --out /home/dmc2245/susie_mwe/02_127373764_128034347_hg19
  --sample /mnt/vast/hpc/csg/UKBiobank_Yale_transfer/ukb39554_imputeddataset/ukb32285_imputedindiv.sample
  --set-all-var-ids @:#:$r:$a
  --write-snplist

Start time: Wed Dec 13 17:21:24 2023
257481 MiB RAM detected, ~217167 available; reserving 128740 MiB for main
workspace.
Allocated 72416 MiB successfully, after larger attempt(s) failed.
Using up to 64 threads (change this with --threads).
--bgen: 24508 variants detected, format v1.2.
487409 sampl

In [147]:
module load  Bgenix/1.1.8
bgenix -g ~/susie_mwe/02_127373764_128034347_hg19.bgen -index


Welcome to bgenix
(version: 1.1.8, revision )

(C) 2009-2017 University of Oxford

bgenix: creating index for "/home/dmc2245/susie_mwe/02_127373764_128034347_hg19.bgen" in "/home/dmc2245/susie_mwe/02_127373764_128034347_hg19.bgen.bgi"...
bgenix: Opened "/home/dmc2245/susie_mwe/02_127373764_128034347_hg19.bgen" with 5409 variants...
Building BGEN index                                         : [******************************] (5409/5409,0.6s,9233.6/s)

Thank you for using bgenix.


In [73]:

module load Plink/2.00a
plink2 --bgen /mnt/vast/hpc/csg/UKBiobank/results/pleiotropy_AD_ARHI/111822_LDstore_files/regions_chr1_22/02/02_233550003_235150987.bgen 'ref-first' \
--sample /mnt/vast/hpc/csg/UKBiobank_Yale_transfer/ukb39554_imputeddataset/ukb32285_imputedindiv.sample \
--write-snplist \
--out /home/dmc2245/susie_mwe/02_233550003_235150987_hg19 \
--maf 0.001 \
--export bgen-1.2 'bits=8' 'ref-first' \
--set-all-var-ids '@:#:$r:$a' \
--new-id-max-allele-len 100 \
--make-just-bim

PLINK v2.00a4LM 64-bit Intel (11 Apr 2023)     www.cog-genomics.org/plink/2.0/
(C) 2005-2023 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /home/dmc2245/susie_mwe/02_233550003_235150987_hg19.log.
Options in effect:
  --bgen /mnt/vast/hpc/csg/UKBiobank/results/pleiotropy_AD_ARHI/111822_LDstore_files/regions_chr1_22/02/02_233550003_235150987.bgen ref-first
  --export bgen-1.2 bits=8 ref-first
  --maf 0.001
  --make-just-bim
  --new-id-max-allele-len 100
  --out /home/dmc2245/susie_mwe/02_233550003_235150987_hg19
  --sample /mnt/vast/hpc/csg/UKBiobank_Yale_transfer/ukb39554_imputeddataset/ukb32285_imputedindiv.sample
  --set-all-var-ids @:#:$r:$a
  --write-snplist

Start time: Wed Dec 13 17:22:02 2023
257481 MiB RAM detected, ~217179 available; reserving 128740 MiB for main
workspace.
Allocated 72416 MiB successfully, after larger attempt(s) failed.
Using up to 64 threads (change this with --threads).
--bgen: 57170 variants detected, format v1.2.
487409 sampl

In [148]:
module load  Bgenix/1.1.8
bgenix -g ~/susie_mwe/02_233550003_235150987_hg19.bgen -index


Welcome to bgenix
(version: 1.1.8, revision )

(C) 2009-2017 University of Oxford

bgenix: creating index for "/home/dmc2245/susie_mwe/02_233550003_235150987_hg19.bgen" in "/home/dmc2245/susie_mwe/02_233550003_235150987_hg19.bgen.bgi"...
bgenix: Opened "/home/dmc2245/susie_mwe/02_233550003_235150987_hg19.bgen" with 11838 variants...
Building BGEN index                                         : [******************************] (11838/11838,1.3s,9218.0/s)

Thank you for using bgenix.


In [75]:

module load Plink/2.00a
plink2 --bgen /mnt/vast/hpc/csg/UKBiobank/results/pleiotropy_AD_ARHI/111822_LDstore_files/regions_chr1_22/03/03_56433907_58157519.bgen 'ref-first' \
--sample /mnt/vast/hpc/csg/UKBiobank_Yale_transfer/ukb39554_imputeddataset/ukb32285_imputedindiv.sample \
--write-snplist \
--out /home/dmc2245/susie_mwe/03_56433907_58157519_hg19 \
--maf 0.001 \
--export bgen-1.2 'bits=8' 'ref-first' \
--set-all-var-ids '@:#:$r:$a' \
--new-id-max-allele-len 100 \
--make-just-bim

PLINK v2.00a4LM 64-bit Intel (11 Apr 2023)     www.cog-genomics.org/plink/2.0/
(C) 2005-2023 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /home/dmc2245/susie_mwe/03_56433907_58157519_hg19.log.
Options in effect:
  --bgen /mnt/vast/hpc/csg/UKBiobank/results/pleiotropy_AD_ARHI/111822_LDstore_files/regions_chr1_22/03/03_56433907_58157519.bgen ref-first
  --export bgen-1.2 bits=8 ref-first
  --maf 0.001
  --make-just-bim
  --new-id-max-allele-len 100
  --out /home/dmc2245/susie_mwe/03_56433907_58157519_hg19
  --sample /mnt/vast/hpc/csg/UKBiobank_Yale_transfer/ukb39554_imputeddataset/ukb32285_imputedindiv.sample
  --set-all-var-ids @:#:$r:$a
  --write-snplist

Start time: Wed Dec 13 17:22:46 2023
257481 MiB RAM detected, ~217163 available; reserving 128740 MiB for main
workspace.
Allocated 72416 MiB successfully, after larger attempt(s) failed.
Using up to 64 threads (change this with --threads).
--bgen: 56772 variants detected, format v1.2.
487409 samples imp

In [149]:
module load  Bgenix/1.1.8
bgenix -g ~/susie_mwe/03_56433907_58157519_hg19.bgen -index


Welcome to bgenix
(version: 1.1.8, revision )

(C) 2009-2017 University of Oxford

bgenix: creating index for "/home/dmc2245/susie_mwe/03_56433907_58157519_hg19.bgen" in "/home/dmc2245/susie_mwe/03_56433907_58157519_hg19.bgen.bgi"...
bgenix: Opened "/home/dmc2245/susie_mwe/03_56433907_58157519_hg19.bgen" with 11417 variants...
Building BGEN index                                         : [******************************] (11417/11417,1.1s,10799.4/s)

Thank you for using bgenix.
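
The plink2 and bgenix cells above are identical apart from the region name, so the commands can be generated rather than copied. The following is a minimal sketch, not the workflow's own implementation: `build_commands` is a hypothetical helper, the relative paths are placeholders, and only flags that already appear in the cells above are used. It builds the argument lists and prints them without executing anything.

```python
import shlex


def build_commands(bgen, sample, out_prefix, maf=0.001):
    """Return (plink2 argv, bgenix argv) for one region; nothing is executed."""
    plink_cmd = [
        "plink2", "--bgen", str(bgen), "ref-first",
        "--sample", str(sample),
        "--write-snplist",
        "--out", str(out_prefix),
        "--maf", str(maf),
        "--export", "bgen-1.2", "bits=8", "ref-first",
        "--set-all-var-ids", "@:#:$r:$a",
        "--new-id-max-allele-len", "100",
        "--make-just-bim",
    ]
    # plink2 writes <out_prefix>.bgen, which bgenix then indexes
    bgenix_cmd = ["bgenix", "-g", f"{out_prefix}.bgen", "-index"]
    return plink_cmd, bgenix_cmd


# Placeholder paths (assumptions): adjust to the real directories
regions = ["02_233550003_235150987", "03_56433907_58157519"]
sample = "ukb32285_imputedindiv.sample"

commands = {}
for region in regions:
    chrom = region.split("_")[0]  # the region name starts with the chromosome
    bgen = f"regions_chr1_22/{chrom}/{region}.bgen"
    commands[region] = build_commands(bgen, sample, f"susie_mwe/{region}_hg19")
    for argv in commands[region]:
        # shlex.join quotes arguments such as '@:#:$r:$a' for the shell
        print(shlex.join(argv))
```

Each printed line can be pasted into a shell cell, or the argument lists can be passed to `subprocess.run` once the paths are confirmed.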


In [144]:
# Step 2 generate subset bgen file
sos dryrun ~/project/UKBB_GWAS_dev/workflow/111722_LDstore.ipynb \
    subset_bgen \
    --cwd ~/susie_mwe \
    --masterfile  \
    --bgen_list_file /mnt/vast/hpc/csg/UKBiobank/results/pleiotropy_AD_ARHI/111822_LDstore_files/regions_chr1_22/01/chr1_bgen_list.txt \
    --maf_filter 0.001 \
    --numThreads 10 \
    --mem 10G \
    --job_size 1 \
    --container ~/containers/lmm.sif

INFO: Checking subset_bgen: Subset bgen files to a specific maf, change the variant id, write snplist to create *z file and output bim files for downstream liftover
HINT: singularity exec  /home/dmc2245/containers/lmm.sif /bin/bash /mnt/vast/hpc/homes/dmc2245/project/UKBB_GWAS_dev/code/python/tmp88ikv6_d/singularity_run_64496.sh
plink2 --bgen /mnt/vast/hpc/csg/UKBiobank/results/pleiotropy_AD_ARHI/111822_LDstore_files/regions_chr1_22/01/01_100826405_102041016.bgen 'ref-first' \
--sample /mnt/vast/hpc/csg/UKBiobank_Yale_transfer/ukb39554_imputeddataset/ukb32285_imputedindiv.sample \
--write-snplist \
--out /home/dmc2245/susie_mwe/01_100826405_102041016.0.001.subset \
--maf 0.001 \
--export bgen-1.2 'bits=8' 'ref-first' \
--set-all-var-ids '@:#:$r:$a' \
--new-id-max-allele-len 100 \
--make-just-bim

bgenix -g /home/dmc2245/susie_mwe/01_100826405_102041016.0.001.subset.bgen -index



INFO: subset_bgen (index=0) is completed.
HINT: singularity exec  /home/dmc2245/

In [80]:
# Step 3 create z file
sos run ~/project/UKBB_GWAS_dev/workflow/111722_LDstore.ipynb \
    z_file \
    --cwd ~/susie_mwe \
    --masterfile ~/susie_mwe/mastefile_hg19\
    --numThreads 10 \
    --mem 100G \
    --job_size 1

INFO: Running Creation of the master file LDStore2: 
INFO: z_file (index=2) is completed.
INFO: z_file (index=1) is completed.
INFO: z_file (index=3) is completed.
INFO: z_file (index=4) is completed.
INFO: z_file (index=0) is completed.
INFO: Creation of the master file LDStore2 output:   /home/dmc2245/susie_mwe/01_159913048_162346721_hg19.z /home/dmc2245/susie_mwe/01_206073265_208410364_hg19.z... (5 items in 5 groups)
INFO: Workflow z_file (ID=wc5cc0b79f48825c8) is executed successfully with 1 completed step and 5 completed substeps.


In [135]:
# Step 4 Create the masterfile with the new subsetted files 
sos run ~/project/UKBB_GWAS_dev/workflow/111722_LDstore.ipynb \
    masterfile \
    --cwd ~/susie_mwe \
    --masterfile 'susie_mwe'\
    --number_of_samples 351430 \
    --incl_samples /mnt/vast/hpc/csg/UKBiobank/results/pleiotropy_AD_ARHI/111822_LDstore_files/regions_chr1_22/UKB_genotypedatadownloaded083019.090221_sample_variant_qc_final_callrate90.filtered.extracted.white_europeans.filtered.092821_ldprun_unrelated.filtered.incl \
    --numThreads 10 \
    --mem 1G \
    --job_size 1

INFO: Running Creation of the masterfile: Creation of the masterfile
INFO: Creation of the masterfile is completed.
INFO: Creation of the masterfile output:   /home/dmc2245/susie_mwe/susie_mwe.masterfile
INFO: Workflow masterfile (ID=wd57c21927186d3ef) is executed successfully with 1 completed step.


In [174]:
# Step 5 Run bcor in LDstore using previously generated masterfile
sos dryrun ~/project/UKBB_GWAS_dev/workflow/111722_LDstore.ipynb \
    bcor \
    --cwd ~/susie_mwe \
    --masterfile 'susie_mwe'\
    --numThreads 10 \
    --mem 100G \
    --job_size 1


INFO: Checking bcor: Create bcor files
HINT: /bin/bash SCRIPT
~/ldstore_v2.0_x86_64/./ldstore_v2.0_x86_64  \
--in-files f'/home/dmc2245/susie_mwe/susie_mwe.masterfile'\
--write-bcor \
--read-only-bgen \
--n-threads 10 \
--compression 'high'



INFO: bcor (index=0) is completed.
HINT: /bin/bash SCRIPT
~/ldstore_v2.0_x86_64/./ldstore_v2.0_x86_64  \
--in-files f'/home/dmc2245/susie_mwe/susie_mwe.masterfile'\
--write-bcor \
--read-only-bgen \
--n-threads 10 \
--compression 'high'



INFO: bcor (index=1) is completed.
HINT: /bin/bash SCRIPT
~/ldstore_v2.0_x86_64/./ldstore_v2.0_x86_64  \
--in-files f'/home/dmc2245/susie_mwe/susie_mwe.masterfile'\
--write-bcor \
--read-only-bgen \
--n-threads 10 \
--compression 'high'



INFO: bcor (index=2) is completed.
HINT: /bin/bash SCRIPT
~/ldstore_v2.0_x86_64/./ldstore_v2.0_x86_64  \
--in-files f'/home/dmc2245/susie_mwe/susie_mwe.masterfile'\
--write-bcor \
--read-only-bgen \
--n-threads 10

In [36]:
# Step 6 Generate ld files and save as xz format
sos run ~/project/UKBB_GWAS_dev/workflow/111722_LDstore.ipynb \
    ld \
    --cwd ~/susie_mwe \
    --masterfile 'susie_mwe'\
    --numThreads 10 \
    --mem 100G \
    --job_size 1 -s build

INFO: Running ld_1: Calculate LD
INFO: Step ld_1 (index=0) is ignored with signature constructed
INFO: Step ld_1 (index=2) is ignored with signature constructed
INFO: Step ld_1 (index=3) is ignored with signature constructed
INFO: Step ld_1 (index=4) is ignored with signature constructed
INFO: Step ld_1 (index=1) is ignored with signature constructed
INFO: ld_1 output:   /home/dmc2245/susie_mwe/01_159913048_162346721_hg19.ld /home/dmc2245/susie_mwe/01_206073265_208410364_hg19.ld... (5 items in 5 groups)
INFO: Running ld_2: Output LD matriz as compressed xz format
INFO: ld_2 (index=2) is completed.
INFO: ld_2 (index=4) is completed.
INFO: ld_2 (index=3) is completed.
INFO: ld_2 (index=1) is completed.
INFO: ld_2 (index=0) is completed.
INFO: ld_2 output:   /hom

In [42]:
# Upload to synapse

import os
import synapseclient
from synapseclient import File

# Read the personal access token from the environment rather than hard-coding it in the notebook.
# SYNAPSE_AUTH_TOKEN is an assumed variable name; export it before starting the kernel.
syn = synapseclient.login(authToken=os.environ["SYNAPSE_AUTH_TOKEN"])

# Add a local file to an existing Synapse project (here syn53163876)
#file = File(path='/home/dmc2245/susie_mwe/01_159913048_162346721_hg19.xz', parent='syn53163876')
#file = syn.store(file)

Welcome, dcornejo88!







In [None]:
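## Save the LD matrix as xz-compressed text
import pandas as pd
import numpy as np
import lzma

np_ld = np.loadtxt('/home/dmc2245/susie_mwe/02_233550003_235150987_hg19.ld', dtype="float16")

# The original cell wrote to an undefined `args.out`; this output path is an assumption
out_path = '/home/dmc2245/susie_mwe/02_233550003_235150987_hg19.xz'

# One space-separated row per line; pass np.triu(np_ld) instead to keep only the upper triangle
with lzma.open(out_path, "wb", preset=9) as f:
    for r in range(np_ld.shape[0]):
        f.write(" ".join(["{:.6f}".format(x) for x in np_ld[r, :]]).encode())
        f.write(b"\n")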

In [11]:
import pandas as pd
import numpy as np
import lzma
#import xz

#z_file=pd.read_csv('/home/dmc2245/susie_mwe/01_159913048_162346721_hg19.z', sep=" ", skiprows=1, header=None)[0].to_numpy()
np_ld = np.loadtxt('/home/dmc2245/susie_mwe/01_159913048_162346721_hg19.ld', dtype = "float16")

In [12]:
np.set_printoptions(formatter={'float': lambda x: "{0:0.6f}".format(x)})

In [13]:
ld_file

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,17707,17708,17709,17710,17711,17712,17713,17714,17715,17716
0,1.000000,-0.039520,0.076936,-0.010260,-0.011528,-0.007244,0.025617,0.349776,-0.040015,-0.003163,...,0.003292,0.001617,-0.002359,0.004602,-0.001753,-0.000437,0.004527,0.003477,-0.001143,
1,-0.039520,1.000000,-0.199267,-0.043424,-0.050493,-0.029090,0.034237,-0.014709,0.577217,-0.015779,...,-0.001471,0.001519,-0.004886,0.001469,0.000761,0.002153,0.000074,0.005014,-0.001248,
2,0.076936,-0.199267,1.000000,0.087308,-0.149868,-0.081060,0.316285,0.036852,-0.458906,-0.043338,...,0.000466,-0.000174,0.003233,0.004464,0.003842,-0.005526,-0.001074,-0.002356,0.002305,
3,-0.010260,-0.043424,0.087308,1.000000,-0.013239,-0.007831,0.028841,-0.003291,-0.041080,-0.004297,...,0.004205,0.001773,-0.002052,-0.000001,-0.000024,-0.002142,-0.001593,0.000161,-0.000206,
4,-0.011528,-0.050493,-0.149868,-0.013239,1.000000,-0.009549,0.030593,-0.005738,-0.045597,-0.001204,...,0.000477,-0.001368,0.003686,0.000412,-0.002626,0.001747,-0.004787,-0.004930,-0.002133,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
17711,-0.001753,0.000761,0.003842,-0.000024,-0.002626,-0.001269,0.002409,-0.002987,0.001626,-0.002262,...,-0.029220,-0.904367,-0.005875,0.063842,1.000000,-0.094399,-0.006545,-0.013314,-0.007536,
17712,-0.000437,0.002153,-0.005526,-0.002142,0.001747,-0.003843,-0.001777,-0.002409,0.003086,-0.002267,...,-0.146516,0.103868,0.067467,-0.201952,-0.094399,1.000000,-0.036214,0.138500,-0.048430,
17713,0.004527,0.000074,-0.001074,-0.001593,-0.004787,-0.001417,-0.005010,0.003239,-0.000946,0.001803,...,-0.012423,0.006909,-0.000102,0.048063,-0.006545,-0.036214,1.000000,-0.000973,-0.003410,
17714,0.003477,0.005014,-0.002356,0.000161,-0.004930,-0.003532,-0.001426,0.000944,0.009759,-0.001598,...,-0.021886,0.015060,-0.004916,-0.076375,-0.013314,0.138500,-0.000973,1.000000,-0.005541,


In [14]:
np_ld

array([[1.000000, -0.039520, 0.076965, ..., 0.004528, 0.003477,
        -0.001143],
       [-0.039520, 1.000000, -0.199219, ..., 0.000074, 0.005013,
        -0.001247],
       [0.076965, -0.199219, 1.000000, ..., -0.001074, -0.002356,
        0.002304],
       ...,
       [0.004528, 0.000074, -0.001074, ..., 1.000000, -0.000973,
        -0.003410],
       [0.003477, 0.005013, -0.002356, ..., -0.000973, 1.000000,
        -0.005539],
       [-0.001143, -0.001247, 0.002304, ..., -0.003410, -0.005539,
        1.000000]], dtype=float16)
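
The small differences between the pandas view of the LD file and `np_ld` (for example 0.076936 versus 0.076965) are consistent with rounding to `float16`, which keeps only about three significant decimal digits. The sketch below, using a single value copied from the output above, shows the size of that rounding error; whether this precision is acceptable for fine-mapping is a judgment call that the notebook does not settle.

```python
import numpy as np

r_text = 0.076936                      # value as shown in the pandas view of the .ld file
r_half = float(np.float16(r_text))     # same value after rounding to half precision
r_single = float(np.float32(r_text))   # and to single precision, for comparison

print("float16: {:.6f}".format(r_half))
print("float32: {:.6f}".format(r_single))

# Round-to-nearest keeps the relative error within eps / 2 of the true value
half_eps = float(np.finfo(np.float16).eps)
print("float16 abs error:", abs(r_half - r_text))
print("float16 error bound:", abs(r_text) * half_eps / 2)
```

Loading with `dtype="float32"` doubles the memory used per entry but removes most of this rounding.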

In [16]:
tri_upper_diag = np.triu(np_ld, k=0)

In [17]:
tri_upper_diag

array([[1.000000, -0.039520, 0.076965, ..., 0.004528, 0.003477,
        -0.001143],
       [0.000000, 1.000000, -0.199219, ..., 0.000074, 0.005013,
        -0.001247],
       [0.000000, 0.000000, 1.000000, ..., -0.001074, -0.002356,
        0.002304],
       ...,
       [0.000000, 0.000000, 0.000000, ..., 1.000000, -0.000973,
        -0.003410],
       [0.000000, 0.000000, 0.000000, ..., 0.000000, 1.000000, -0.005539],
       [0.000000, 0.000000, 0.000000, ..., 0.000000, 0.000000, 1.000000]],
      dtype=float16)

In [30]:
tri_lower_diag = np.tril(np_ld, k=0)

In [35]:
tri_lower_diag

array([[1.000000, 0.000000, 0.000000, ..., 0.000000, 0.000000, 0.000000],
       [-0.039520, 1.000000, 0.000000, ..., 0.000000, 0.000000, 0.000000],
       [0.076965, -0.199219, 1.000000, ..., 0.000000, 0.000000, 0.000000],
       ...,
       [0.004528, 0.000074, -0.001074, ..., 1.000000, 0.000000, 0.000000],
       [0.003477, 0.005013, -0.002356, ..., -0.000973, 1.000000,
        0.000000],
       [-0.001143, -0.001247, 0.002304, ..., -0.003410, -0.005539,
        1.000000]], dtype=float16)

In [None]:
import xz
with xz.open('/home/dmc2245/susie_mwe/01_159913048_162346721_hg19_test_lower_diag.xz', "w+", preset=9) as f:
    for r in range(tri_lower_diag.shape[0]):
        f.write(" ".join(["{:.6f}".format(x) for x in tri_lower_diag[r, :]]).encode())
        f.write(b"\n")
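
A triangular file has to be mirrored back into a full symmetric matrix before it can be used as an LD matrix. The following is a minimal sketch of that round trip on a toy 3x3 matrix, using the standard-library `lzma` module instead of the `xz` package; `write_matrix_xz` and `read_symmetric_xz` are hypothetical helper names, and the file is written to a temporary directory.

```python
import lzma
import os
import tempfile

import numpy as np


def write_matrix_xz(mat, path):
    """Write a matrix as space-separated text, one row per line, xz-compressed."""
    with lzma.open(path, "wb", preset=9) as f:
        for row in mat:
            f.write(" ".join("{:.6f}".format(float(x)) for x in row).encode())
            f.write(b"\n")


def read_symmetric_xz(path):
    """Read a triangular matrix and mirror it into the full symmetric matrix."""
    with lzma.open(path, "rt") as f:
        tri = np.loadtxt(f, ndmin=2)
    # Adding the transpose fills the empty triangle; subtract the diagonal once
    # because it is present in both tri and tri.T
    return tri + tri.T - np.diag(np.diag(tri))


# Toy symmetric matrix standing in for an LD matrix
ld = np.array([
    [1.0, -0.03952, 0.076965],
    [-0.03952, 1.0, -0.199219],
    [0.076965, -0.199219, 1.0],
])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy_lower_diag.xz")
    write_matrix_xz(np.tril(ld), path)   # store only the lower triangle
    restored = read_symmetric_xz(path)

print(restored)
```

The same reader works for an upper-triangular file, since the mirroring step does not depend on which triangle was stored.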

## Run liftover

In [81]:
# Step 7 Run liftover on the bim files
sos run ~/project/UKBB_GWAS_dev/workflow/111722_LDstore.ipynb \
     liftover \
    --cwd ~/susie_mwe/liftover \
    --masterfile 'susie_mwe'\
    --bim_name `echo /home/dmc2245/susie_mwe/*_hg19.bim` \
    --to_build hg38 \
    --chain_file ~/liftover_ucsc/hg19ToHg38.over.chain.gz \
    -s build

INFO: Running liftover_1: Run liftover
INFO: liftover_1 (index=4) is completed.
INFO: liftover_1 (index=2) is completed.
INFO: liftover_1 (index=3) is completed.
INFO: liftover_1 (index=1) is completed.
INFO: liftover_1 (index=0) is completed.
INFO: liftover_1 output:   /home/dmc2245/susie_mwe/liftover/01_159913048_162346721_hg19.bed /home/dmc2245/susie_mwe/liftover/01_159913048_162346721_hg19.hg38.bed... (15 items in 5 groups)
INFO: Running liftover_2: Organize bim file to account for unmapped variants
INFO: liftover_2 (index=2) is completed.
INFO: liftover_2 (index=3) is completed.
INFO: liftover_2 (index=4) is completed.
INFO: liftover_2 (index=1) is completed.
INFO: liftover_2 (index=0) is completed.
INFO: liftover_2 output:   /home/dmc2245/susie_mwe/lifto

In [69]:
## Initial bim file in hg19
import pandas as pd
hg19_bim=pd.read_csv('/home/dmc2245/susie_mwe/01_206073265_208410364_hg19.bim',sep='\t', header=None, names=["chr","id", "cm", "pos_hg19","minor_allele","major_allele"])


In [70]:
hg19_bim

Unnamed: 0,chr,id,cm,pos_hg19,minor_allele,major_allele
0,1,1:206073267:A:T,0,206073267,T,A
1,1,1:206073386:A:G,0,206073386,G,A
2,1,1:206074010:A:G,0,206074010,G,A
3,1,1:206074070:T:G,0,206074070,G,T
4,1,1:206074127:G:C,0,206074127,C,G
...,...,...,...,...,...,...
13079,1,1:208410043:A:C,0,208410043,C,A
13080,1,1:208410130:T:C,0,208410130,C,T
13081,1,1:208410160:G:A,0,208410160,A,G
13082,1,1:208410337:A:C,0,208410337,C,A


In [85]:
# Final file in hg38 after running liftover
hg38_bed=pd.read_csv('/home/dmc2245/susie_mwe/01_206073265_208410364_hg19.hg38.bed',sep='\t', header=None, names=["chr","start", "end", "id", "cm","minor_allele","major_allele"])


In [86]:
hg38_bed

Unnamed: 0,chr,start,end,id,cm,minor_allele,major_allele
0,chr1,206268083,206268083,1:206073267:A:T,0,T,A
1,chr1,206267964,206267964,1:206073386:A:G,0,G,A
2,chr1,206267340,206267340,1:206074010:A:G,0,G,A
3,chr1,206267280,206267280,1:206074070:T:G,0,G,T
4,chr1,206267223,206267223,1:206074127:G:C,0,C,G
...,...,...,...,...,...,...,...
13057,chr1,208236698,208236698,1:208410043:A:C,0,C,A
13058,chr1,208236785,208236785,1:208410130:T:C,0,C,T
13059,chr1,208236815,208236815,1:208410160:G:A,0,A,G
13060,chr1,208236992,208236992,1:208410337:A:C,0,C,A


In [91]:
merged_df = pd.merge(hg19_bim,hg38_bed, on='id', how='outer' )

In [102]:
merged_df_2 = pd.merge(hg19_bim,hg38_bed, on='id', how='left', indicator=True)
rows_not_in_df2 = merged_df_2[merged_df_2['_merge'] == 'left_only'].drop(columns=['_merge'])

print(rows_not_in_df2)

      chr_x                   id  cm_x   pos_hg19 minor_allele_x  \
471       1     1:206207649:CT:C     0  206207649              C   
1474      1     1:206508203:AT:A     0  206508203              A   
1487      1      1:206512352:T:G     0  206512352              G   
1495      1    1:206513621:CCT:C     0  206513621              C   
1590      1  1:206545086:AAAAG:A     0  206545086              A   
1754      1      1:206592248:G:C     0  206592248              C   
1755      1      1:206592250:G:C     0  206592250              C   
1756      1      1:206592252:G:C     0  206592252              C   
1757      1      1:206592254:G:C     0  206592254              C   
1955      1    1:206622631:ATT:A     0  206622631              A   
1970      1     1:206624458:TA:T     0  206624458              T   
2050      1     1:206639880:CA:C     0  206639880              C   
2056      1  1:206640435:AGAAT:A     0  206640435              A   
2813      1     1:206750305:AT:A     0  20675030

In [92]:
merged_df

Unnamed: 0,chr_x,id,cm_x,pos_hg19,minor_allele_x,major_allele_x,chr_y,start,end,cm_y,minor_allele_y,major_allele_y
0,1,1:206073267:A:T,0,206073267,T,A,chr1,206268083.0,206268083.0,0.0,T,A
1,1,1:206073386:A:G,0,206073386,G,A,chr1,206267964.0,206267964.0,0.0,G,A
2,1,1:206074010:A:G,0,206074010,G,A,chr1,206267340.0,206267340.0,0.0,G,A
3,1,1:206074070:T:G,0,206074070,G,T,chr1,206267280.0,206267280.0,0.0,G,T
4,1,1:206074127:G:C,0,206074127,C,G,chr1,206267223.0,206267223.0,0.0,C,G
...,...,...,...,...,...,...,...,...,...,...,...,...
13079,1,1:208410043:A:C,0,208410043,C,A,chr1,208236698.0,208236698.0,0.0,C,A
13080,1,1:208410130:T:C,0,208410130,C,T,chr1,208236785.0,208236785.0,0.0,C,T
13081,1,1:208410160:G:A,0,208410160,A,G,chr1,208236815.0,208236815.0,0.0,A,G
13082,1,1:208410337:A:C,0,208410337,C,A,chr1,208236992.0,208236992.0,0.0,C,A


In [32]:
# Example usage:
df1 = pd.DataFrame({'chr': [1, 1, 1,1,1,1], 'id': ['1:206073267:A:T','1:206073386:A:G','1:206074010:A:G','1:206207649:CT:C', '1:206508203:AT:A','1:208410364:A:T'], 'cm': [0, 0, 0,0,0,0], 'pos_hg19': [206073267, 206073386, 206074010,206207649,206508203,208410364],
                    'minor_allele': ['T', 'G', 'G','C','A','A'], 'major_allele': ['A', 'A', 'A','CT','AT','T']})

df2= pd.DataFrame({'id': ['1:206073267:A:T', '1:206073386:A:G', '1:206074010:A:G','1:208410364:A:T'], 'chr': [1, 1, 1,1], 'cm': [0, 0, 0,0], 'pos_hg38': [206268083, 206267964, 206267340,208237019],
                    'minor_allele': ['T', 'G', 'G','T'], 'major_allele': ['A', 'A', 'A','A']})

In [33]:
df1

Unnamed: 0,chr,id,cm,pos_hg19,minor_allele,major_allele
0,1,1:206073267:A:T,0,206073267,T,A
1,1,1:206073386:A:G,0,206073386,G,A
2,1,1:206074010:A:G,0,206074010,G,A
3,1,1:206207649:CT:C,0,206207649,C,CT
4,1,1:206508203:AT:A,0,206508203,A,AT
5,1,1:208410364:A:T,0,208410364,A,T


In [34]:
df2

Unnamed: 0,id,chr,cm,pos_hg38,minor_allele,major_allele
0,1:206073267:A:T,1,0,206268083,T,A
1,1:206073386:A:G,1,0,206267964,G,A
2,1:206074010:A:G,1,0,206267340,G,A
3,1:208410364:A:T,1,0,208237019,T,A


In [66]:
import pandas as pd
def merge_dataframes(df1, df2):
    # Left join on 'id' keeps every hg19 variant, including those that failed liftover
    merged_df = pd.merge(df1, df2, on='id', how='left', suffixes=('_df1', '_df2'))
    # Collect one output row (a Series) per variant
    result_rows = []
    columns = ['chr_df2', 'id', 'cm_df2', 'pos_hg38', 'minor_allele_df2', 'major_allele_df2']
    for index, row in merged_df.iterrows():
        if not pd.isna(row['pos_hg38']):  # variant was lifted over: rebuild the id from hg38 values
            # NOTE: pos_hg38 is a float after the left join, so the id carries a trailing ".0";
            # wrap it in int() to get an integer position
            id_string = f"{int(row['chr_df2'])}:{row['pos_hg38']}:{row['major_allele_df2']}:{row['minor_allele_df2']}"
            result_row = pd.Series([int(row['chr_df2']), id_string, int(row['cm_df2']), row['pos_hg38'], row['minor_allele_df2'], row['major_allele_df2']],
                                   index=columns)
        else:  # unmapped variant: keep the hg19 values and flag it with chromosome 0
            result_row = pd.Series([0, row['id'], int(row['cm_df1']), row['pos_hg19'], row['minor_allele_df1'], row['major_allele_df1']],
                                   index=columns)
        result_rows.append(result_row)
    # Build the final DataFrame; rows stay in the original df1 order
    result_df = pd.DataFrame(result_rows)
    return result_df

In [36]:
merged_df

Unnamed: 0,chr_df1,id,cm_df1,pos_hg19,minor_allele_df1,major_allele_df1,chr_df2,cm_df2,pos_hg38,minor_allele_df2,major_allele_df2
0,1,1:206073267:A:T,0,206073267,T,A,1.0,0.0,206268083.0,T,A
1,1,1:206073386:A:G,0,206073386,G,A,1.0,0.0,206267964.0,G,A
2,1,1:206074010:A:G,0,206074010,G,A,1.0,0.0,206267340.0,G,A
3,1,1:206207649:CT:C,0,206207649,C,CT,,,,,
4,1,1:206508203:AT:A,0,206508203,A,AT,,,,,
5,1,1:208410364:A:T,0,208410364,A,T,1.0,0.0,208237019.0,T,A


In [67]:
res= merge_dataframes(df1,df2)

In [68]:
res

Unnamed: 0,chr_df2,id,cm_df2,pos_hg38,minor_allele_df2,major_allele_df2
0,1,1:206268083.0:A:T,0,206268083.0,T,A
1,1,1:206267964.0:A:G,0,206267964.0,G,A
2,1,1:206267340.0:A:G,0,206267340.0,G,A
3,0,1:206207649:CT:C,0,206207649.0,C,CT
4,0,1:206508203:AT:A,0,206508203.0,A,AT
5,1,1:208237019.0:A:T,0,208237019.0,T,A
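
The same bookkeeping can be done without a row-by-row loop, which also keeps positions as integers. This is a sketch under two stated assumptions: ids follow `chr:pos:ref:alt`, and liftover changes only the position (not the chromosome or the alleles). `lift_ids` and the output column names are hypothetical. As in the loop version, unmapped variants keep their hg19 id and position and are flagged with chromosome 0.

```python
import pandas as pd


def lift_ids(bim_hg19, lifted):
    """Replace hg19 positions/ids with hg38 ones; flag unmapped variants with chr 0."""
    merged = bim_hg19.merge(lifted[["id", "pos_hg38"]], on="id", how="left")
    mapped = merged["pos_hg38"].notna()

    # Use the hg38 position when available, else fall back to hg19; cast back to int
    pos = merged["pos_hg38"].where(mapped, merged["pos_hg19"]).astype("int64")

    # Rebuild chr:pos:ref:alt ids from the pieces of the original id
    parts = merged["id"].str.split(":", expand=True)
    new_id = parts[0] + ":" + pos.astype(str) + ":" + parts[2] + ":" + parts[3]

    return pd.DataFrame({
        "chr": merged["chr"].where(mapped, 0),
        "id": new_id.where(mapped, merged["id"]),
        "cm": merged["cm"],
        "pos": pos,
        "minor_allele": merged["minor_allele"],
        "major_allele": merged["major_allele"],
    })


# Toy inputs modelled on the example tables above
hg19 = pd.DataFrame({
    "chr": [1, 1, 1],
    "id": ["1:206073267:A:T", "1:206207649:CT:C", "1:208410364:A:T"],
    "cm": [0, 0, 0],
    "pos_hg19": [206073267, 206207649, 208410364],
    "minor_allele": ["T", "C", "T"],
    "major_allele": ["A", "CT", "A"],
})
hg38 = pd.DataFrame({
    "id": ["1:206073267:A:T", "1:208410364:A:T"],
    "pos_hg38": [206268083, 208237019],
})

lifted_bim = lift_ids(hg19, hg38)
print(lifted_bim)
```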


## Read the xz files using R

Reading the files requires the `data.table` package; install it once with `install.packages('data.table')`.

In [19]:
library('data.table')

In [26]:
file <- fread(cmd = paste("xzcat", "/home/dmc2245/susie_mwe/01_159913048_162346721_hg19_test.xz"), header = FALSE, sep = " ")

In [27]:
file

V1,V2,V3,V4,V5,V6,V7,V8,V9,V10,⋯,V17707,V17708,V17709,V17710,V17711,V17712,V17713,V17714,V17715,V17716
<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,⋯,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
1,-0.03952,0.076965,-0.010262,-0.011528,-0.007244,0.025620,0.349854,-0.040009,-0.003162,⋯,-0.000100,0.003292,0.001616,-0.002359,0.004601,-0.001753,-0.000437,0.004528,0.003477,-0.001143
0,1.00000,-0.199219,-0.043427,-0.050507,-0.029083,0.034241,-0.014709,0.577148,-0.015778,⋯,0.001030,-0.001472,0.001519,-0.004887,0.001469,0.000762,0.002153,0.000074,0.005013,-0.001247
0,0.00000,1.000000,0.087280,-0.149902,-0.081055,0.316406,0.036865,-0.458984,-0.043335,⋯,-0.003998,0.000465,-0.000174,0.003233,0.004463,0.003841,-0.005527,-0.001074,-0.002356,0.002304
0,0.00000,0.000000,1.000000,-0.013237,-0.007828,0.028839,-0.003292,-0.041077,-0.004299,⋯,0.000781,0.004204,0.001773,-0.002052,-0.000001,-0.000024,-0.002142,-0.001593,0.000161,-0.000206
0,0.00000,0.000000,0.000000,1.000000,-0.009552,0.030594,-0.005737,-0.045593,-0.001204,⋯,0.005833,0.000477,-0.001369,0.003685,0.000412,-0.002626,0.001747,-0.004787,-0.004929,-0.002132
0,0.00000,0.000000,0.000000,0.000000,1.000000,0.005970,-0.004547,-0.024643,-0.002275,⋯,-0.003075,0.001641,0.001968,-0.000655,0.002501,-0.001268,-0.003843,-0.001417,-0.003532,-0.001376
0,0.00000,0.000000,0.000000,0.000000,0.000000,1.000000,0.012573,0.095459,-0.129761,⋯,0.000926,0.001922,0.002653,0.002949,0.004833,0.002409,-0.001778,-0.005009,-0.001427,0.000268
0,0.00000,0.000000,0.000000,0.000000,0.000000,0.000000,1.000000,-0.016174,-0.002609,⋯,-0.000933,0.003387,0.003214,-0.000992,0.002140,-0.002987,-0.002409,0.003239,0.000944,0.000175
0,0.00000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,1.000000,-0.012650,⋯,0.001830,-0.002825,-0.001021,-0.005630,-0.000737,0.001626,0.003086,-0.000947,0.009758,0.001248
0,0.00000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,1.000000,⋯,0.003416,-0.001208,0.002728,0.001072,-0.001986,-0.002262,-0.002268,0.001803,-0.001597,-0.000621


In [197]:
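# Alternative: decompress to a temporary file with xzdec, then read it.
# fread comes from data.table, so library('data.table') must be loaded in this session.
# The matrix was written space-separated with no header row, so header = FALSE and sep = " " are the matching settings.
temp_file <- tempfile()
system(paste("xzdec", "/home/dmc2245/susie_mwe/01_159913048_162346721_hg19_test.xz", ">", temp_file))
LD.matrix <- fread(temp_file, header = TRUE, sep = "\t")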

ERROR: Error in fread(temp_file, header = TRUE, sep = "\t"): could not find function "fread"


xzdec: /home/dmc2245/susie_mwe/01_159913048_162346721_hg19_test.xz: No such file or directory
