# How to conduct fine-mapping analysis of exon-level eQTL data from islet tissue

Islet exon-level QTL data were downloaded from Viñuela, A., Varshney, A., van de Bunt, M. et al. Genetic variant effects on gene expression in human pancreatic islets and their implications for T2D. Nat Commun 11, 4912 (2020). https://doi.org/10.1038/s41467-020-18581-8. Specifically, the list of lead signals and the full summary statistics can be downloaded from https://zenodo.org/record/3408356.

This document contains instructions on how to conduct fine-mapping analysis for islet exon-level eQTL data.

In this analysis, we employed genotype data from 40,000 unrelated British individuals in the UK Biobank.

We thank Dr. Arushi Varshney (Parker Lab) for their valuable support in shaping the analysis strategies and code development.

## Step 1: LiftOver the summary data from hg19 to hg38

Inputs to this analysis are the UKBB reference data from 40K unrelated individuals, the full summary statistics file `InsPIRE_Islets_Exons_eQTLs_Nominal_Pvalues.txt.gz`, and the list of eGenes (i.e., genes whose expression is significantly associated with at least one variant) `PacreaticIslets_independent_Exons_eQTLs.txt`. See the example Snakemake files `scripts/hg19liftOverToHg38.sf` and `scripts/hg19liftOverToHg38_leads.sf`.

The example Snakefile `scripts/hg19liftOverToHg38.sf` was developed to conduct the following steps:
- Step 1 (`rule ukbb_rsids`): this rule excludes genotypes from the UKBB reference vcf files to make a smaller vcf file per chromosome.
- Step 2 (`rule merge_ukbb`): this rule merges the smaller vcf file per chromosome to create a genome-wide reference file.
- Step 3 (`rule liftover`): this rule translates the hg19 coordinates of variants in the summary stat file to hg38. It requires the LiftOver tool and the hg19ToHg38 chain file. To download LiftOver, see https://hgdownload.soe.ucsc.edu/downloads.html#liftover
- Step 4 (`rule getNewCoor`): this rule will create a file with the new hg38 coordinates and the summary stats
- Step 5 (`rule alignGenes`): this rule will keep SNPs within 500kb of gene TSS only, because after liftover some gene and SNP coordinates may change.
- Step 6 (`rule splitGene`): this rule gives per-feature (e.g., per exon) variants and summary stats
- Step 7 (`rule getSummStatEqtl`): this rule merges variants and summary stats across all chromosomes and indexes the file
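
The window filter described in Step 5 can be illustrated with a short sketch. This is not the code of `rule alignGenes`; the function names and the toy records are hypothetical, and only the 500 kb distance comes from the description above.

```python
# Hypothetical helpers; the real `rule alignGenes` may be implemented differently.
WINDOW_BP = 500_000  # 500 kb on either side of the TSS, as described in Step 5


def within_tss_window(snp_chrom, snp_pos, tss_chrom, tss_pos, window=WINDOW_BP):
    """Return True if the SNP is on the TSS chromosome and within `window` bp of it."""
    return snp_chrom == tss_chrom and abs(snp_pos - tss_pos) <= window


def filter_snps(snps, tss_chrom, tss_pos, window=WINDOW_BP):
    """Keep only (chrom, pos, rsid) tuples that fall inside the TSS window."""
    return [s for s in snps if within_tss_window(s[0], s[1], tss_chrom, tss_pos, window)]


# Toy records after liftover: (chromosome, hg38 position, rsID). All values are made up.
snps = [
    ("chr1", 959_000, "rs_a"),    # close to the TSS, kept
    ("chr1", 1_600_000, "rs_b"),  # more than 500 kb away, dropped
    ("chr2", 959_100, "rs_c"),    # landed on a different chromosome, dropped
]
kept = filter_snps(snps, tss_chrom="chr1", tss_pos=959_309)
print([s[2] for s in kept])
```

Checking the chromosome as well as the distance matters here because liftover can occasionally move a variant or a gene to a different chromosome or far from its original neighbourhood.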

The Snakefile `scripts/hg19liftOverToHg38_leads.sf` is organized similarly to `scripts/hg19liftOverToHg38.sf`; however, it serves a different purpose: it was used to lift over the lead SNPs of eGenes.

It is worth noting that although we lifted over all variant coordinates from hg19 to hg38, all exon IDs are kept the same as those in the original study for tracking purposes.

## Step 2: Set up data for the fine-mapping pipeline

We set up a file with gene-level summary stat files for all lead signals, which will be used in the next step.

In [5]:
library(dplyr)

#ind are SNPs in hg19 genome
# exons
ind <- read.table("/nfs/turbo/umms-scjp-pank/vthihong/colocGWAS_T1D/1_exonQTL-inspire-susie/data/PacreaticIslets_independent_Exons_eQTLs.txt", header = T)
ind <- distinct(ind)
ind$DiscoveryOrder2 <- ifelse(ind$PvalueOrder == 1, "P", "S")
ind$GeneStableID <- unlist(lapply(strsplit(ind$GeneID, '\\.'), '[', 1))
head(ind)

| | GeneName | Strand | GencodeLevel | GeneType | GeneID | ChrPheno | StartPheno | EndPheno | BestExonID | NumExons | ⋯ | Slope | EmpiricalAdjustedPval | BetaAdjustedPval | eQTLnum | OrderByDistanceTSS | OrderBySlope | PvalueOrder | Probability | DiscoveryOrder2 | GeneStableID |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | GCLC | - | 2 | protein_coding | ENSG00000001084.6 | 6 | 53481768 | 53481768 | ENSG00000001084.6_53373974_53375246 | 14 | ⋯ | -0.188911 | 0.000999001 | 0.00238377 | 1 | 1 | 1 | 1 | 0.27124 | P | ENSG00000001084 |
| 2 | STPG1 | - | 2 | protein_coding | ENSG00000001460.13 | 1 | 24743424 | 24743424 | ENSG00000001460.13_24717693_24718169 | 10 | ⋯ | 0.320629 | 0.020979 | 0.0208192 | 2 | 1 | 2 | 2 | 0.400682 | S | ENSG00000001460 |
| 3 | STPG1 | - | 2 | protein_coding | ENSG00000001460.13 | 1 | 24743424 | 24743424 | ENSG00000001460.13_24717693_24718169 | 10 | ⋯ | -0.790738 | 0.000999001 | 4.74265e-44 | 1 | 1 | 1 | 1 | 0.863734 | P | ENSG00000001460 |
| 4 | ENPP4 | + | 2 | protein_coding | ENSG00000001561.6 | 6 | 46097730 | 46097730 | ENSG00000001561.6_46107288_46108146 | 4 | ⋯ | 0.110524 | 0.011988 | 0.0125329 | 2 | 1 | 2 | 2 | 0.212939 | S | ENSG00000001561 |
| 5 | ENPP4 | + | 2 | protein_coding | ENSG00000001561.6 | 6 | 46097730 | 46097730 | ENSG00000001561.6_46111013_46114436 | 4 | ⋯ | -0.169294 | 0.000999001 | 1.15947e-06 | 1 | 1 | 1 | 1 | 0.34211 | P | ENSG00000001561 |
| 6 | ANKIB1 | + | 2 | protein_coding | ENSG00000001629.5 | 7 | 91875548 | 91875548 | ENSG00000001629.5_91948934_91949280 | 21 | ⋯ | -0.425168 | 0.000999001 | 7.29654e-05 | 1 | 2 | 1 | 1 | 0.258469 | P | ENSG00000001629 |


In [6]:
#lead are SNPs in hg38 genome and matching with ukbb
lead <- read.table("/nfs/turbo/umms-scjp-pank/vthihong/colocGWAS_T1D/2_t1d-eQTL-coloc/results/hg38/eqtl_Exons_leads/eqtl_Exons.bed.gz",
                   comment.char = "", header = T, fill = T)
lead <- inner_join(lead, ind[, c("Nominal_Pval",  "GeneStableID", "DiscoveryOrder2")], by = c("Pvalue" = "Nominal_Pval", "GeneStableID" = "GeneStableID"))
lead <- distinct(lead)
df38 <- data.frame(seqnames = lead$X.snp_chrom, start = lead$snp_start, end = lead$snp_end,
                   name = paste0(lead$GeneName, "__", lead$SNP, "__", lead$ExonsID, "__", lead$DiscoveryOrder2),
                   gene_id = lead$GeneStableID, exons_id = lead$ExonsID)

head(df38)

| | seqnames | start | end | name | gene_id | exons_id |
|---|---|---|---|---|---|---|
| 1 | chr1 | 958250 | 958251 | NOC2L__rs6605069__ENSG00000188976.6_888555_888668__P | ENSG00000188976 | ENSG00000188976.6_888555_888668 |
| 2 | chr1 | 998409 | 998410 | RP11-54O7.17__rs9442392__ENSG00000272512.1_931346_933431__P | ENSG00000272512 | ENSG00000272512.1_931346_933431 |
| 3 | chr1 | 1063201 | 1063202 | RP11-54O7.18__rs3813194__ENSG00000273443.1_998441_998668__P | ENSG00000273443 | ENSG00000273443.1_998441_998668 |
| 4 | chr1 | 1303111 | 1303112 | MRPL20__rs12408158__ENSG00000242485.1_1340489_1342399__P | ENSG00000242485 | ENSG00000242485.1_1340489_1342399 |
| 5 | chr1 | 1385552 | 1385553 | RP4-758J18.13__rs9661288__ENSG00000272455.1_1344476_1345998__P | ENSG00000272455 | ENSG00000272455.1_1344476_1345998 |
| 6 | chr1 | 1407231 | 1407232 | B3GALT6__rs2275915__ENSG00000176022.3_1167629_1170421__P | ENSG00000176022 | ENSG00000176022.3_1167629_1170421 |


In [7]:
files <- list.files("/nfs/turbo/umms-scjp-pank/vthihong/colocGWAS_T1D/2_t1d-eQTL-coloc/results/hg38/eqtl_Exons_indexed/", pattern = "gz")
files <- files[grep("tbi", files, invert = T)]
file_df <- data.frame(eqtl_input = files)
file_df$gene <- unlist(lapply(strsplit(file_df$eqtl_input, '__'), '[', 1))
file_df$GeneStableID <- unlist(lapply(strsplit(file_df$gene, '\\.'), '[', 1))

df38 <- inner_join(df38, file_df, by = c("gene_id" = "GeneStableID"))
df38$eqtl_input <- paste0("/nfs/turbo/umms-scjp-pank/vthihong/colocGWAS_T1D/2_t1d-eQTL-coloc/results/hg38/eqtl_Exons_indexed/",
                          df38$eqtl_input)

tss <- read.table("/scratch/scjp_root/scjp99/vthihong/genome/geneTSS.bed", header = F)
df38 <- inner_join(df38, tss[, c(1, 2, 3, 6)], by = c("gene_id" = "V6"))
df <- df38[, c("V1", "V2", "V3", "name", "gene_id", "eqtl_input", "exons_id")]
colnames(df) <- c("chr", "start", "end", "locus", "gene_id", "eqtl_input", "exons_id")
head(df)

write.table(df, row.names = F, sep = "\t", quote = F,
            "/nfs/turbo/umms-scjp-pank/vthihong/colocGWAS_T1D/1_exonQTL-inspire-susie/data/exons_eQTLs-selected.tsv")

| | chr | start | end | locus | gene_id | eqtl_input | exons_id |
|---|---|---|---|---|---|---|---|
| 1 | chr1 | 959309 | 959310 | NOC2L__rs6605069__ENSG00000188976.6_888555_888668__P | ENSG00000188976 | /nfs/turbo/umms-scjp-pank/vthihong/colocGWAS_T1D/2_t1d-eQTL-coloc/results/hg38/eqtl_Exons_indexed/ENSG00000188976.6__InsPIRE_Islets_Exons__1:944203.bed.gz | ENSG00000188976.6_888555_888668 |
| 2 | chr1 | 998051 | 998052 | RP11-54O7.17__rs9442392__ENSG00000272512.1_931346_933431__P | ENSG00000272512 | /nfs/turbo/umms-scjp-pank/vthihong/colocGWAS_T1D/2_t1d-eQTL-coloc/results/hg38/eqtl_Exons_indexed/ENSG00000272512.1__InsPIRE_Islets_Exons__1:995966.bed.gz | ENSG00000272512.1_931346_933431 |
| 3 | chr1 | 1063288 | 1063289 | RP11-54O7.18__rs3813194__ENSG00000273443.1_998441_998668__P | ENSG00000273443 | /nfs/turbo/umms-scjp-pank/vthihong/colocGWAS_T1D/2_t1d-eQTL-coloc/results/hg38/eqtl_Exons_indexed/ENSG00000273443.1__InsPIRE_Islets_Exons__1:1062208.bed.gz | ENSG00000273443.1_998441_998668 |
| 4 | chr1 | 1407313 | 1407314 | MRPL20__rs12408158__ENSG00000242485.1_1340489_1342399__P | ENSG00000242485 | /nfs/turbo/umms-scjp-pank/vthihong/colocGWAS_T1D/2_t1d-eQTL-coloc/results/hg38/eqtl_Exons_indexed/ENSG00000242485.1__InsPIRE_Islets_Exons__1:1401909.bed.gz | ENSG00000242485.1_1340489_1342399 |
| 5 | chr1 | 1409096 | 1409097 | RP4-758J18.13__rs9661288__ENSG00000272455.1_1344476_1345998__P | ENSG00000272455 | /nfs/turbo/umms-scjp-pank/vthihong/colocGWAS_T1D/2_t1d-eQTL-coloc/results/hg38/eqtl_Exons_indexed/ENSG00000272455.1__InsPIRE_Islets_Exons__1:1409096.bed.gz | ENSG00000272455.1_1344476_1345998 |
| 6 | chr1 | 1232265 | 1232266 | B3GALT6__rs2275915__ENSG00000176022.3_1167629_1170421__P | ENSG00000176022 | /nfs/turbo/umms-scjp-pank/vthihong/colocGWAS_T1D/2_t1d-eQTL-coloc/results/hg38/eqtl_Exons_indexed/ENSG00000176022.3__InsPIRE_Islets_Exons__1:1232237.bed.gz | ENSG00000176022.3_1167629_1170421 |


The `df` object is saved to a file named `exons_eQTLs-selected.tsv`, which is used in the next step.

## Step 3: Set up scripts for every eGene of interest

First, we need to set up a config file with housekeeping information such as file directories and parameters. See the example in `config.yaml`. The file `exons_eQTLs-selected.tsv` is used for both `trait1-leads` and `selected-stats`. Then, we can use the `scripts/make_susie-sh.py` script to create a SLURM job per region of interest.

Important note: the script `scripts/make_susie-sh.py` requires two other scripts that should be specified in the config file, namely:
```
prep-template: "{base}/scripts/dosage-template.sh"
susie-template: "{base}/scripts/susie-template.sh"
```
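
The `{base}` placeholder in these entries suggests that the template paths are expanded relative to a base directory. Below is a minimal sketch of how such placeholders could be resolved, assuming the config also defines a `base` key. That key, its value, and the `resolve_paths` helper are assumptions for illustration; how `make_susie-sh.py` actually expands them is not shown in this document.

```python
# Hypothetical config contents; only the two template entries come from the text above.
config = {
    "base": "/path/to/project",  # assumed key and made-up value
    "prep-template": "{base}/scripts/dosage-template.sh",
    "susie-template": "{base}/scripts/susie-template.sh",
}


def resolve_paths(cfg):
    """Expand the `{base}` placeholder in every string value of the config."""
    base = cfg["base"]
    return {
        key: value.format(base=base) if isinstance(value, str) else value
        for key, value in cfg.items()
    }


resolved = resolve_paths(config)
print(resolved["prep-template"])
print(resolved["susie-template"])
```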

```
cd /scratch/scjp_root/scjp99/vthihong/2_PanKBase/colocGWAS_T1D/1_exonQTL-inspire-susie/results/susie-region

python /scratch/scjp_root/scjp99/vthihong/2_PanKBase/colocGWAS_T1D/1_exonQTL-inspire-susie/scripts/make_susie-sh.py --config /scratch/scjp_root/scjp99/vthihong/2_PanKBase/colocGWAS_T1D/1_exonQTL-inspire-susie/scripts/config.yaml
```

At this point, we have a pair of scripts for each region, named in the following format (the locus name must not contain special characters such as `;`, `(` or `)`):

- `exons_eQTLs__<locus name>__<lead SNP rsID>__<exon ID>__<primary P or secondary S>__<region>__<window>.susieprep.sh`
- `exons_eQTLs__<locus name>__<lead SNP rsID>__<exon ID>__<primary P or secondary S>__<region>__<window>.susie.sh`

The `*susieprep.sh` script fetches the required inputs, such as variants and dosages. The `*susie.sh` script runs the fine-mapping analysis.
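
Because the fields of a script name are separated by double underscores, a name can be split back into its components when collecting results. The sketch below is a hypothetical helper (`parse_job_name` and `SusieJob` are not part of the pipeline), applied to the A4GALT file name used as an example in this document.

```python
from typing import NamedTuple


class SusieJob(NamedTuple):
    """Fields encoded in a script name; a hypothetical container for illustration."""
    trait: str
    locus: str
    lead_snp: str
    exon_id: str
    signal: str  # "P" (primary) or "S" (secondary)
    chrom: str
    start: int
    end: int
    window: str


def parse_job_name(filename):
    """Split a `*.susieprep.sh` or `*.susie.sh` file name into its fields."""
    prefix = filename
    for suffix in (".susieprep.sh", ".susie.sh"):
        if prefix.endswith(suffix):
            prefix = prefix[: -len(suffix)]
            break
    fields = prefix.split("__")
    if len(fields) != 7:
        raise ValueError(f"expected 7 '__'-separated fields, got {len(fields)}: {filename}")
    trait, locus, lead_snp, exon_id, signal, region, window = fields
    chrom, start, end = region.split("-")
    return SusieJob(trait, locus, lead_snp, exon_id, signal, chrom, int(start), int(end), window)


job = parse_job_name(
    "exons_eQTLs__A4GALT__rs8138197__ENSG00000128274.11_43088127_43090003"
    "__P__chr22-42471298-42971299__250kb.susie.sh"
)
print(job.locus, job.lead_snp, job.chrom, job.end - job.start)
```

In this example the region spans 500,001 bp, which is consistent with a 250 kb window on each side of a 1-bp anchor interval. Splitting only on `__` keeps single underscores inside `exons_eQTLs` and the exon ID intact, which is presumably why locus names need to be free of special characters.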

An example of a `susieprep.sh` file is shown below:
```
cat exons_eQTLs__A4GALT__rs8138197__ENSG00000128274.11_43088127_43090003__P__chr22-42471298-42971299__250kb.susieprep.sh

#!/bin/bash

## fetch variants in the region and intersect UKBB and FUSION vcfs
for i in /scratch/scjp_root/scjp99/vthihong/2_PanKBase/colocGWAS_T1D/0_data/hg38/chr22.imputed.poly.vcf.gz; do tabix $i chr22:42471298-42971299 | awk '{if (($0 !~ /^#/ && $0 !~ /^chr/)) print "chr"$0; else print $0}' ; done | sort | uniq > exons_eQTLs__A4GALT__rs8138197__ENSG00000128274.11_43088127_43090003__P__chr22-42471298-42971299__250kb.ukbb.genotypes
zcat /scratch/scjp_root/scjp99/vthihong/2_PanKBase/colocGWAS_T1D/0_data/hg38/chr22.imputed.poly.vcf.gz | head -10000 | awk '{if (($0 ~ /^#/)) print $0}' > exons_eQTLs__A4GALT__rs8138197__ENSG00000128274.11_43088127_43090003__P__chr22-42471298-42971299__250kb.ukbb.header
cat exons_eQTLs__A4GALT__rs8138197__ENSG00000128274.11_43088127_43090003__P__chr22-42471298-42971299__250kb.ukbb.header exons_eQTLs__A4GALT__rs8138197__ENSG00000128274.11_43088127_43090003__P__chr22-42471298-42971299__250kb.ukbb.genotypes | bgzip -c > exons_eQTLs__A4GALT__rs8138197__ENSG00000128274.11_43088127_43090003__P__chr22-42471298-42971299__250kb.ukbb.vcf.gz; tabix exons_eQTLs__A4GALT__rs8138197__ENSG00000128274.11_43088127_43090003__P__chr22-42471298-42971299__250kb.ukbb.vcf.gz
rm exons_eQTLs__A4GALT__rs8138197__ENSG00000128274.11_43088127_43090003__P__chr22-42471298-42971299__250kb.ukbb.genotypes exons_eQTLs__A4GALT__rs8138197__ENSG00000128274.11_43088127_43090003__P__chr22-42471298-42971299__250kb.ukbb.header

## fetch UKBB dosages 
zcat exons_eQTLs__A4GALT__rs8138197__ENSG00000128274.11_43088127_43090003__P__chr22-42471298-42971299__250kb.ukbb.vcf.gz | head -10000 | awk -F'\t' '{if (($0 ~/^#CHROM/)) print $0}' OFS='\t' | sed -e 's:#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT:ID:g' > exons_eQTLs__A4GALT__rs8138197__ENSG00000128274.11_43088127_43090003__P__chr22-42471298-42971299__250kb.ukbb-header.txt 
bcftools query -f "%ID-%REF-%ALT[\t%DS]\n" exons_eQTLs__A4GALT__rs8138197__ENSG00000128274.11_43088127_43090003__P__chr22-42471298-42971299__250kb.ukbb.vcf.gz | cat exons_eQTLs__A4GALT__rs8138197__ENSG00000128274.11_43088127_43090003__P__chr22-42471298-42971299__250kb.ukbb-header.txt - > exons_eQTLs__A4GALT__rs8138197__ENSG00000128274.11_43088127_43090003__P__chr22-42471298-42971299__250kb.ukbb-dosages.tsv 

## bgzip to save space
module load Bioinformatics
module load Bioinformatics  gcc/10.3.0-k2osx5y
module load samtools/1.13-fwwss5n

bgzip -@ 2 exons_eQTLs__A4GALT__rs8138197__ENSG00000128274.11_43088127_43090003__P__chr22-42471298-42971299__250kb.ukbb-dosages.tsv

## cleanup
rm -rf exons_eQTLs__A4GALT__rs8138197__ENSG00000128274.11_43088127_43090003__P__chr22-42471298-42971299__250kb.ukbb-header.txt exons_eQTLs__A4GALT__rs8138197__ENSG00000128274.11_43088127_43090003__P__chr22-42471298-42971299__250kb.ukbb.vcf.gz*
```
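
As constructed by the commands above, the `*.ukbb-dosages.tsv.gz` file has a header line that starts with `ID` followed by the sample IDs, and one row per variant whose first field is `<ID>-<REF>-<ALT>` followed by one dosage per sample. The following sketch reads such a table with the Python standard library. The `read_dosages` helper, the sample IDs, and the dosage values are made up for illustration; bgzip output is gzip-compatible, so `gzip.open` can read it.

```python
import csv
import gzip
import io


def read_dosages(handle):
    """Return (sample_ids, {variant_id: [dosage, ...]}) from a tab-separated dosage table."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[1:]  # first column is the variant ID
    dosages = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        dosages[row[0]] = [float(x) for x in row[1:]]
    return samples, dosages


# Toy table in the same layout; all IDs and values are invented.
toy = "ID\ts1\ts2\ts3\nrs1-C-A\t2\t1.02\t0\nrs2-A-C\t0\t0.98\t2\n"

# Compress in memory to mimic reading a gzipped file from disk.
payload = gzip.compress(toy.encode())
with gzip.open(io.BytesIO(payload), "rt", newline="") as handle:
    samples, dosages = read_dosages(handle)

print(samples)
print(dosages["rs1-C-A"])
```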

An example of a `susie.sh` file is shown below:
```
cat exons_eQTLs__A4GALT__rs8138197__ENSG00000128274.11_43088127_43090003__P__chr22-42471298-42971299__250kb.susie.sh 

#!/bin/bash

################## running SuSiE for  exons_eQTLs__A4GALT__rs8138197__ENSG00000128274.11_43088127_43090003__P__chr22-42471298-42971299__250kb:

## Susie 
/scratch/scjp_root/scjp99/vthihong/2_PanKBase/colocGWAS_T1D/1_exonQTL-inspire-susie/scripts/susie-eqtl.R --prefix exons_eQTLs__A4GALT__rs8138197__ENSG00000128274.11_43088127_43090003__P__chr22-42471298-42971299__250kb --type quant --beta beta --p p_nominal --effect ALT --non_effect REF --sdY 1 --coverage 0.95 --maxit 10000 --min_abs_corr 0.1 --s_threshold 0.3 --number_signals_default 10 --number_signals_high_s 1 --marker rs8138197 --trait1 /scratch/scjp_root/scjp99/vthihong/2_PanKBase/colocGWAS_T1D/2_t1d-eQTL-coloc/results/hg38/eqtl_Exons_indexed/ENSG00000128274.11__InsPIRE_Islets_Exons__22:42692121.bed.gz --trait1_ld exons_eQTLs__A4GALT__rs8138197__ENSG00000128274.11_43088127_43090003__P__chr22-42471298-42971299__250kb.ukbb-dosages.tsv.gz  --exon_id ENSG00000128274.11_43088127_43090003
```

## Step 4: Conduct fine-mapping analysis for all regions of interest

After we set up analysis scripts for each eGene, we can run the analysis for every eGene using Snakemake. See example of a Snakemake file at `scripts/susie.sf`.

By default, the signals of each eGene are saved in an R object file named `*.susie.Rda`.

## Step 5: Obtain output files for PanKgraph

For the purpose of PanKgraph, we extract some outputs into text files. Example code is shown below:

In [8]:
library(glue)
library(tidyr)
suppressPackageStartupMessages(library(dplyr))

In [9]:
process_dosage = function(f, snplist){
    ld = read.csv(f, sep='\t', check.names = F)
    dups = ld[ (duplicated(ld$ID) | duplicated(ld$ID, fromLast = TRUE)),]
    print(glue("N duplicates = {nrow(dups)}"))
    ld = ld[! (duplicated(ld$ID) | duplicated(ld$ID, fromLast = TRUE)),]
    row.names(ld) = ld$ID
    ld$ID = NULL
    idlist = intersect(snplist, row.names(ld))
    ld = ld[idlist,]
    print(ld[1:5, 1:10])
    ld = cor(t(ld))
    return(ld)
}

meta <- read.table("/nfs/turbo/umms-scjp-pank/vthihong/colocGWAS_T1D/1_exonQTL-inspire-susie/data/exons_eQTLs-selected.tsv", header = T)
meta <- distinct(meta)
head(meta)

| | chr | start | end | locus | gene_id | eqtl_input | exons_id |
|---|---|---|---|---|---|---|---|
| 1 | chr1 | 959309 | 959310 | NOC2L__rs6605069__ENSG00000188976.6_888555_888668__P | ENSG00000188976 | /nfs/turbo/umms-scjp-pank/vthihong/colocGWAS_T1D/2_t1d-eQTL-coloc/results/hg38/eqtl_Exons_indexed/ENSG00000188976.6__InsPIRE_Islets_Exons__1:944203.bed.gz | ENSG00000188976.6_888555_888668 |
| 2 | chr1 | 998051 | 998052 | RP11-54O7.17__rs9442392__ENSG00000272512.1_931346_933431__P | ENSG00000272512 | /nfs/turbo/umms-scjp-pank/vthihong/colocGWAS_T1D/2_t1d-eQTL-coloc/results/hg38/eqtl_Exons_indexed/ENSG00000272512.1__InsPIRE_Islets_Exons__1:995966.bed.gz | ENSG00000272512.1_931346_933431 |
| 3 | chr1 | 1063288 | 1063289 | RP11-54O7.18__rs3813194__ENSG00000273443.1_998441_998668__P | ENSG00000273443 | /nfs/turbo/umms-scjp-pank/vthihong/colocGWAS_T1D/2_t1d-eQTL-coloc/results/hg38/eqtl_Exons_indexed/ENSG00000273443.1__InsPIRE_Islets_Exons__1:1062208.bed.gz | ENSG00000273443.1_998441_998668 |
| 4 | chr1 | 1407313 | 1407314 | MRPL20__rs12408158__ENSG00000242485.1_1340489_1342399__P | ENSG00000242485 | /nfs/turbo/umms-scjp-pank/vthihong/colocGWAS_T1D/2_t1d-eQTL-coloc/results/hg38/eqtl_Exons_indexed/ENSG00000242485.1__InsPIRE_Islets_Exons__1:1401909.bed.gz | ENSG00000242485.1_1340489_1342399 |
| 5 | chr1 | 1409096 | 1409097 | RP4-758J18.13__rs9661288__ENSG00000272455.1_1344476_1345998__P | ENSG00000272455 | /nfs/turbo/umms-scjp-pank/vthihong/colocGWAS_T1D/2_t1d-eQTL-coloc/results/hg38/eqtl_Exons_indexed/ENSG00000272455.1__InsPIRE_Islets_Exons__1:1409096.bed.gz | ENSG00000272455.1_1344476_1345998 |
| 6 | chr1 | 1232265 | 1232266 | B3GALT6__rs2275915__ENSG00000176022.3_1167629_1170421__P | ENSG00000176022 | /nfs/turbo/umms-scjp-pank/vthihong/colocGWAS_T1D/2_t1d-eQTL-coloc/results/hg38/eqtl_Exons_indexed/ENSG00000176022.3__InsPIRE_Islets_Exons__1:1232237.bed.gz | ENSG00000176022.3_1167629_1170421 |


In [11]:
l <- "A4GALT__rs8138197__ENSG00000128274.11_43088127_43090003__P__chr22-42471298-42971299"
input <- meta[meta$locus == "A4GALT__rs8138197__ENSG00000128274.11_43088127_43090003__P", "eqtl_input"]

for (k in 1:length(input)) {
        qtl <- read.csv(input[k], sep='\t', header=T, check.names=F)
        qtl$snp <- paste0(qtl$SNP, "-", qtl$REF, "-", qtl$ALT)
        qtl$Slope <- qtl$Slope / qtl$multiply #get the slope originally reported by the study
        load(paste0("/nfs/turbo/umms-scjp-pank/vthihong/colocGWAS_T1D/1_exonQTL-inspire-susie/results/susie-prep/exons_eQTLs__", 
                    l, "__250kb.susie.Rda"))

        if (length(S2$sets$cs) > 0) {
        for (j in 1:length(S2$sets$cs)) {
                pip <- data.frame(pip=S2$pip[names(S2$sets$cs[[j]])])
                if (S2$sets$coverage[[j]] < 0.95) {
                        print(names(S2$sets$cs[[j]]))
                        next
                }

                pip$snp <- row.names(pip)
                pip <- inner_join(pip, qtl[,c("snp", "Pvalue", "effect_allele", "other_allele", "Slope")])
                idx = S2$sets$cs_index[j]
                isnps = colnames(S2$lbf_variable)
                bf = S2$lbf_variable[idx, isnps, drop=FALSE]
                bf = data.frame(snp = isnps, lbf = t(bf)[,1])
                pip <- inner_join(pip, bf, by = c("snp" = "snp"))
                print(head(pip))
                colnames(pip) <- c("pip", "snp", "nominal_p", "effect_allele", "other_allele", "slope", "lbf")

                ldf <- process_dosage(paste0("/nfs/turbo/umms-scjp-pank/vthihong/colocGWAS_T1D/1_exonQTL-inspire-susie/results/susie-prep/exons_eQTLs__", 
                                             l, "__250kb.ukbb-dosages.tsv.gz"), pip$snp)
                ldf <- ldf**2
                colnames(ldf) <- stringr::str_extract(colnames(ldf), "[^-]*")
                rownames(ldf) <- stringr::str_extract(rownames(ldf), "[^-]*")
                #write.table(ldf, paste0("/nfs/turbo/umms-scjp-pank/vthihong/colocGWAS_T1D/1_exonQTL-inspire-susie/results/susie/exons_eQTLs__", 
                #                            l, "__250kb__credibleSet", j, "__ld.txt"), sep = "\t", quote = F)

                pip$snp <- stringr::str_extract(pip$snp, "[^-]*")
                print(head(pip))
                #write.table(pip[, c("snp", "pip", "nominal_p", "effect_allele", "other_allele", "slope", "lbf")], 
                #            paste0("/nfs/turbo/umms-scjp-pank/vthihong/colocGWAS_T1D/1_exonQTL-inspire-susie/results/susie/exons_eQTLs__", l, "__250kb__credibleSet", j, ".txt"), row.names = F, sep = "\t", quote = F)
                }
        }

        if (length(S2$sets$cs) > 0) {
            purity <- c()
            coverage <- c()
            p <- data.frame(locus = rep(l, length(S2$sets$cs)), purity = NA, coverage = NA)
            for (j in 1:length(S2$sets$cs)) {
                coverage <- c(coverage, S2$sets$coverage[[j]])
                purity <- c(purity, S2$sets$purity[j, 1])
            }
            p$purity <- purity
            p$coverage <- coverage
            p$credibleset <- 1:length(S2$sets$cs)
            print(head(p))
            #write.table(p, paste0("/nfs/turbo/umms-scjp-pank/vthihong/colocGWAS_T1D/1_exonQTL-inspire-susie/results/susie/purity/exon_eQTLs__", 
            #                l, ".txt"), row.names = F, sep = "\t", quote = F)
        }
}


[1m[22mJoining with `by = join_by(snp)`


        pip           snp      Pvalue effect_allele other_allele     Slope
1 0.3220922 rs5751348-C-A 8.67839e-17             A            C -0.174645
2 0.3220922 rs5751348-C-A 1.73304e-06             A            C -0.241271
3 0.3220922 rs5751348-C-A 3.52230e-13             A            C -0.268396
4 0.3222278 rs2143918-A-C 8.68527e-17             C            A -0.174644
5 0.3222278 rs2143918-A-C 1.73689e-06             C            A -0.241249
6 0.3222278 rs2143918-A-C 3.54455e-13             C            A -0.268367
       lbf
1 33.71260
2 33.71260
3 33.71260
4 33.71303
5 33.71303
6 33.71303
N duplicates = 0
              1000251 1000534 1000542 1000766 1000898 1000924 1000961 1001059
rs5751348-C-A       2       2       0       0       2       2       0       2
rs2143918-A-C       2       2       0       0       2       2       0       2
rs8138197-G-A       2       2       0       0       2       2       0       2
NA                 NA      NA      NA      NA      NA      NA      NA
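
For readers who want to check the LD step outside R, the logic of `process_dosage` above (drop every variant ID that occurs more than once, keep the requested variants, correlate dosages between variants), followed by the squaring applied in the loop, can be sketched in plain Python. This is an illustrative re-implementation, not part of the pipeline; the function names, variant IDs, and dosages are invented, and the sketch assumes no variant has constant dosage across samples.

```python
from collections import Counter
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists (assumes non-zero variance)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ld_r2(rows, snplist):
    """Return (ids, r2 matrix) for the variants in `snplist`.

    `rows` is a list of (variant_id, dosages). Every ID that appears more than
    once is dropped entirely, mirroring the duplicate handling in the R function.
    """
    counts = Counter(variant_id for variant_id, _ in rows)
    unique = {vid: dos for vid, dos in rows if counts[vid] == 1}
    ids = [s for s in snplist if s in unique]  # keep the order of `snplist`
    matrix = [[pearson(unique[a], unique[b]) ** 2 for b in ids] for a in ids]
    return ids, matrix


# Toy dosages for four samples; all values are made up.
rows = [
    ("rs1-C-A", [2, 2, 0, 0]),
    ("rs2-A-C", [2, 2, 0, 0]),  # identical to rs1, so r2 = 1
    ("rs3-G-A", [0, 2, 0, 2]),  # uncorrelated with rs1, so r2 = 0
    ("rs4-T-C", [1, 0, 1, 0]),
    ("rs4-T-C", [0, 1, 0, 1]),  # duplicated ID: both rows are dropped
]
ids, r2 = ld_r2(rows, ["rs1-C-A", "rs2-A-C", "rs3-G-A", "rs4-T-C"])
print(ids)
print(r2[0])
```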