# How to conduct fine-mapping analysis for splicing-level QTL (sQTL) data from pancreatic tissue

Pancreatic sQTL data from GTEx v8 were downloaded from https://www.gtexportal.org/home/downloads/adult-gtex/qtl.

This documentation contains instructions on how to conduct fine-mapping analysis for pancreatic splicing-level QTL data.

In this analysis, we employed genotype data from 40,000 unrelated British individuals in the UK Biobank.

We thank Dr. Arushi Varshney (Parker Lab) for their valuable support in shaping the analysis strategies and code development.

## Step 1: Set up data for the fine-mapping pipeline

We set up a file with gene-level summary stat files for all lead signals, which will be used in the next step.

### Step 1.1: Convert `parquet` format

GTEx provides the data in `parquet` format, which can be converted into tab-delimited text files using the following code:

In [None]:
library("dplyr")
library("tidyr")
library("data.table")
library(arrow)

files <- list.files("/nfs/turbo/umms-scjp-pank/vthihong/colocGWAS_T1D/0_data/GTEx_EUR_sQTL/", pattern="parquet")
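files <- files[11:length(files)]  # NOTE: this skips the first 10 files, presumably left over from resuming an interrupted run; remove this line to process every file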
for (i in files) {
        chr <- gsub(".parquet", "", gsub("GTEx_Analysis_v8_QTLs_GTEx_Analysis_v8_EUR_sQTL_all_associations_Pancreas.v8.EUR.sqtl_allpairs.", "", i))
        a <- read_parquet(paste0("/nfs/turbo/umms-scjp-pank/vthihong/colocGWAS_T1D/0_data/GTEx_EUR_sQTL/", i))
        a <- a[,1:(ncol(a)-1)]
        setDT(a)
        a[, c("snp_chr", "snp_stop", "ref_gtex", "alt_gtex", "gtex_code") := tstrsplit(variant_id, "_")]
        a[, c("pheno_chr", "pheno_start", "pheno_stop", "pheno_clu", "gene_id") := tstrsplit(phenotype_id, ":")]
        a <- as.data.frame(a)
        a$snp_stop <- as.numeric(a$snp_stop)
        a$snp_start <- a$snp_stop-1
        write.table(a, paste0("/nfs/turbo/umms-scjp-pank/vthihong/colocGWAS_T1D/3_t1d-eQTL_GTEx-coloc/results/GTEx_EUR_sQTL/gtex_indexed/", chr, ".bed"), col.names=T, row.names=F, sep="\t", quote=F)
}

### Step 1.2: Split all variants based on feature-level (in this case, gene-level)

GTEx names variants in the form `chr1_666028_G_A_b38`, which differs from our reference data, so we need to map GTEx variants to our reference. Additionally, we want the variants for each gene in a separate file for downstream steps.
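As a minimal sketch of the naming scheme, the snippet below splits a GTEx-style variant ID into its fields with plain bash and derives the 0-based BED start used in Step 1.1. The function name `parse_gtex_variant` is hypothetical and is not part of `scripts/splitGenes.sf`; the actual mapping to the reference variant names is done by the pipeline scripts.

```bash
#!/bin/bash
# Hypothetical helper (not part of the pipeline): split a GTEx v8 variant ID
# of the form chr_pos_ref_alt_build into tab-separated BED-style fields.
parse_gtex_variant() {
    local id="$1"
    local chr pos ref alt build
    # Fields are separated by underscores, e.g. chr1_666028_G_A_b38
    IFS='_' read -r chr pos ref alt build <<< "$id"
    # BED start is 0-based: start = position - 1 (as in Step 1.1)
    printf '%s\t%s\t%s\t%s\t%s\t%s\n' "$chr" "$((pos - 1))" "$pos" "$ref" "$alt" "$build"
}

parse_gtex_variant "chr1_666028_G_A_b38"
```

The sketch assumes that the chromosome name itself contains no underscore, which holds for the autosomes shown in this document.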

This step can be done with Snakemake. See the example Snakemake file at `scripts/splitGenes.sf`. This Snakemake task uses a genome-wide VCF file, which can be obtained by following the instructions in Step 1 of the fine-mapping eQTL InsPIRE code. Every gene file then needs to be indexed, which can be done using the following code:
```
cd /nfs/turbo/umms-scjp-pank/vthihong/colocGWAS_T1D/3_t1d-eQTL_GTEx-coloc/results/GTEx_EUR_sQTL/gtex_indexed
module load Bioinformatics
module load Bioinformatics  gcc/10.3.0-k2osx5y
module load samtools/1.13-fwwss5n

# chromosome to process, passed as the first argument
chr=$1

# compress and index every per-gene BED file for this chromosome
for i in *__"$chr":*bed
do
    bgzip -@ 2 "$i"
    tabix --preset=bed "$i".gz
done
```

To map GTEx variant names to our reference data for the lead SNPs (which are supplied in the file `Pancreas.v8.sgenes.txt.gz`), one can use the script `scripts/getHg38SummStats_sQTLleads.R`. This script produces a file named `sQTL_EUR_leads.txt`, which should be indexed and is used in the next step.

### Step 1.3: Set up a file with gene-level summary stat files for all lead signals

In [1]:
library(dplyr)


ind <- read.table("/nfs/turbo/umms-scjp-pank/vthihong/colocGWAS_T1D/1_sQTL-gtex-susie/data/sQTL_EUR_leads.txt.gz", header = F)
df38 <- ind[, c("V9", "V4", "V19")]



In [2]:
files <- list.files("/nfs/turbo/umms-scjp-pank/vthihong/colocGWAS_T1D/3_t1d-eQTL_GTEx-coloc/results/GTEx_EUR_sQTL/gtex_indexed/", pattern = "gz")
files <- files[grep("tbi", files, invert = T)]
file_df <- data.frame(eqtl_input = files)
file_df$gene <- unlist(lapply(strsplit(file_df$eqtl_input, '__'), '[', 1))
file_df$clu <- unlist(lapply(strsplit(file_df$eqtl_input, '__'), '[', 2))
df38 <- inner_join(df38, file_df, by = c("V9" = "gene", "V19" = "clu"))
df38$eqtl_input <- paste0("/nfs/turbo/umms-scjp-pank/vthihong/colocGWAS_T1D/3_t1d-eQTL_GTEx-coloc/results/GTEx_EUR_sQTL/gtex_indexed/",
                          df38$eqtl_input)
df38$gene_id <- unlist(lapply(strsplit(df38$V9, '\\.'), '[', 1))
head(df38)

row,V9,V4,V19,eqtl_input,gene_id
,<chr>,<chr>,<chr>,<chr>,<chr>
1,ENSG00000188976.10,rs111463901,chr1:946545:948131:clu_38413:ENSG00000188976.10,/nfs/turbo/umms-scjp-pank/vthihong/colocGWAS_T1D/3_t1d-eQTL_GTEx-coloc/results/GTEx_EUR_sQTL/gtex_indexed/ENSG00000188976.10__chr1:946545:948131:clu_38413:ENSG00000188976.10__GTEx_Pancreas_sGene__1:944582.bed.gz,ENSG00000188976
2,ENSG00000188157.14,rs551305078,chr1:1054551:1054824:clu_38443:ENSG00000188157.14,/nfs/turbo/umms-scjp-pank/vthihong/colocGWAS_T1D/3_t1d-eQTL_GTEx-coloc/results/GTEx_EUR_sQTL/gtex_indexed/ENSG00000188157.14__chr1:1054551:1054824:clu_38443:ENSG00000188157.14__GTEx_Pancreas_sGene__1:1020123.bed.gz,ENSG00000188157
3,ENSG00000215790.6,rs11260600,chr1:1722831:1723800:clu_38554:ENSG00000215790.6,/nfs/turbo/umms-scjp-pank/vthihong/colocGWAS_T1D/3_t1d-eQTL_GTEx-coloc/results/GTEx_EUR_sQTL/gtex_indexed/ENSG00000215790.6__chr1:1722831:1723800:clu_38554:ENSG00000215790.6__GTEx_Pancreas_sGene__1:1724838.bed.gz,ENSG00000215790
4,ENSG00000078808.16,rs60252802,chr1:1228946:1231892:clu_38455:ENSG00000078808.16,/nfs/turbo/umms-scjp-pank/vthihong/colocGWAS_T1D/3_t1d-eQTL_GTEx-coloc/results/GTEx_EUR_sQTL/gtex_indexed/ENSG00000078808.16__chr1:1228946:1231892:clu_38455:ENSG00000078808.16__GTEx_Pancreas_sGene__1:1216908.bed.gz,ENSG00000078808
5,ENSG00000169972.11,rs115316182,chr1:1310688:1310909:clu_38477:ENSG00000169972.11,/nfs/turbo/umms-scjp-pank/vthihong/colocGWAS_T1D/3_t1d-eQTL_GTEx-coloc/results/GTEx_EUR_sQTL/gtex_indexed/ENSG00000169972.11__chr1:1310688:1310909:clu_38477:ENSG00000169972.11__GTEx_Pancreas_sGene__1:1308567.bed.gz,ENSG00000169972
6,ENSG00000221978.11,rs307377,chr1:1388065:1388626:clu_38489:ENSG00000221978.11,/nfs/turbo/umms-scjp-pank/vthihong/colocGWAS_T1D/3_t1d-eQTL_GTEx-coloc/results/GTEx_EUR_sQTL/gtex_indexed/ENSG00000221978.11__chr1:1388065:1388626:clu_38489:ENSG00000221978.11__GTEx_Pancreas_sGene__1:1386482.bed.gz,ENSG00000221978


In [3]:
tss <- read.table("/scratch/scjp_root/scjp99/vthihong/genome/geneTSS.bed", header = F)
df38 <- inner_join(df38, tss[, c(1, 2, 3, 4, 6)], by = c("gene_id" = "V6"))
df38$locus <- paste0(df38$V4.y, "__", df38$V4.x, "__", df38$V19, "__P")
df <- df38[, c("V1", "V2", "V3", "locus", "gene_id", "eqtl_input")]
colnames(df) <- c("chr", "start", "end", "locus", "gene_id", "eqtl_input")
head(df)
write.table(df, row.names = F, sep = "\t", quote = F,
            "/nfs/turbo/umms-scjp-pank/vthihong/colocGWAS_T1D/1_sQTL-gtex-susie/data/gene_sQTLs-selected.tsv")

row,chr,start,end,locus,gene_id,eqtl_input
,<chr>,<int>,<int>,<chr>,<chr>,<chr>
1,chr1,959309,959310,NOC2L__rs111463901__chr1:946545:948131:clu_38413:ENSG00000188976.10__P,ENSG00000188976,/nfs/turbo/umms-scjp-pank/vthihong/colocGWAS_T1D/3_t1d-eQTL_GTEx-coloc/results/GTEx_EUR_sQTL/gtex_indexed/ENSG00000188976.10__chr1:946545:948131:clu_38413:ENSG00000188976.10__GTEx_Pancreas_sGene__1:944582.bed.gz
2,chr1,1020123,1020124,AGRN__rs551305078__chr1:1054551:1054824:clu_38443:ENSG00000188157.14__P,ENSG00000188157,/nfs/turbo/umms-scjp-pank/vthihong/colocGWAS_T1D/3_t1d-eQTL_GTEx-coloc/results/GTEx_EUR_sQTL/gtex_indexed/ENSG00000188157.14__chr1:1054551:1054824:clu_38443:ENSG00000188157.14__GTEx_Pancreas_sGene__1:1020123.bed.gz
3,chr1,1745992,1745993,SLC35E2__rs11260600__chr1:1722831:1723800:clu_38554:ENSG00000215790.6__P,ENSG00000215790,/nfs/turbo/umms-scjp-pank/vthihong/colocGWAS_T1D/3_t1d-eQTL_GTEx-coloc/results/GTEx_EUR_sQTL/gtex_indexed/ENSG00000215790.6__chr1:1722831:1723800:clu_38554:ENSG00000215790.6__GTEx_Pancreas_sGene__1:1724838.bed.gz
4,chr1,1232031,1232032,SDF4__rs60252802__chr1:1228946:1231892:clu_38455:ENSG00000078808.16__P,ENSG00000078808,/nfs/turbo/umms-scjp-pank/vthihong/colocGWAS_T1D/3_t1d-eQTL_GTEx-coloc/results/GTEx_EUR_sQTL/gtex_indexed/ENSG00000078808.16__chr1:1228946:1231892:clu_38455:ENSG00000078808.16__GTEx_Pancreas_sGene__1:1216908.bed.gz
5,chr1,1308567,1308568,PUSL1__rs115316182__chr1:1310688:1310909:clu_38477:ENSG00000169972.11__P,ENSG00000169972,/nfs/turbo/umms-scjp-pank/vthihong/colocGWAS_T1D/3_t1d-eQTL_GTEx-coloc/results/GTEx_EUR_sQTL/gtex_indexed/ENSG00000169972.11__chr1:1310688:1310909:clu_38477:ENSG00000169972.11__GTEx_Pancreas_sGene__1:1308567.bed.gz
6,chr1,1399328,1399329,CCNL2__rs307377__chr1:1388065:1388626:clu_38489:ENSG00000221978.11__P,ENSG00000221978,/nfs/turbo/umms-scjp-pank/vthihong/colocGWAS_T1D/3_t1d-eQTL_GTEx-coloc/results/GTEx_EUR_sQTL/gtex_indexed/ENSG00000221978.11__chr1:1388065:1388626:clu_38489:ENSG00000221978.11__GTEx_Pancreas_sGene__1:1386482.bed.gz


The `df` object is saved in a file named `gene_sQTLs-selected.tsv`, which is used when setting up the fine-mapping scripts.
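Before moving on, it can be worth checking that the saved table has the six columns shown in the output above. The following is a small sketch using only `head` and `tr`; the function name `check_selected_header` is hypothetical, and the path in the usage line is a placeholder.

```bash
#!/bin/bash
# Hypothetical sanity check (not part of the pipeline): verify that a
# tab-separated file starts with the header written in Step 1.3.
check_selected_header() {
    local file="$1"
    local expected="chr start end locus gene_id eqtl_input"
    local header
    # Read the first line and turn tabs into spaces for comparison
    header="$(head -n 1 "$file" | tr '\t' ' ')"
    if [ "$header" = "$expected" ]; then
        echo "header OK"
    else
        echo "unexpected header: $header" >&2
        return 1
    fi
}

# Usage (placeholder path): check_selected_header gene_sQTLs-selected.tsv
```

The function returns a non-zero status on a mismatch, so it can be used as a guard at the start of a job script.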

## Step 3: Set up scripts for every sGene of interest

First, we need to set up a config file with housekeeping information such as file directories and parameters. See the example in `config.yaml`. The file `gene_sQTLs-selected.tsv` is used for `trait1-leads` and `selected-stats`. Then, we can use the `scripts/make_susie-sh.py` script to create a SLURM job per region of interest.

Important note: the script `scripts/make_susie-sh.py` requires two other scripts that should be specified in the config file, namely:
```
prep-template: "{base}/scripts/dosage-template.sh"
susie-template: "{base}/scripts/susie-template.sh"
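```

The per-region scripts can then be generated by running `scripts/make_susie-sh.py` with the config file:
```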
cd /scratch/scjp_root/scjp99/vthihong/2_PanKBase/colocGWAS_T1D/1_sQTL-gtex-susie/results/susie-region

python /scratch/scjp_root/scjp99/vthihong/2_PanKBase/colocGWAS_T1D/1_sQTL-gtex-susie/scripts/make_susie-sh.py --config /scratch/scjp_root/scjp99/vthihong/2_PanKBase/colocGWAS_T1D/1_sQTL-gtex-susie/scripts/config.yaml
```

At this point, we have a pair of scripts for each region, named `gene_sQTLs__<locus name, no special characters such as ;()>__<lead SNP rsID>__<phenotype ID>__<primary P or secondary S>__<region>__<window>.susieprep.sh` and `gene_sQTLs__<locus name, no special characters such as ;()>__<lead SNP rsID>__<phenotype ID>__<primary P or secondary S>__<region>__<window>.susie.sh`. The `*.susieprep.sh` script fetches the variants and dosages for the region. The `*.susie.sh` script runs the fine-mapping analysis.
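To illustrate the naming scheme, the sketch below pulls individual fields out of a script prefix by splitting on the double underscore. It assumes the field order seen in the example file names in this document (prefix, gene, lead SNP, phenotype ID, signal type, region, window); the function name `prefix_field` is hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: print the n-th "__"-separated field of a script prefix.
prefix_field() {
    local prefix="$1" n="$2"
    # awk treats the multi-character field separator "__" as a regular expression
    printf '%s\n' "$prefix" | awk -F'__' -v n="$n" '{ print $n }'
}

prefix="gene_sQTLs__CFTR__rs2402203__chr7:117542108:117559464:clu_25847:ENSG00000001626.14__P__chr7-117215784-117715785__250kb"
prefix_field "$prefix" 2   # gene symbol
prefix_field "$prefix" 3   # lead SNP
prefix_field "$prefix" 6   # region
```

Single underscores, as in `gene_sQTLs` or `clu_25847`, are left intact because only the double underscore acts as a separator.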

An example of a `susieprep.sh` file is shown below:
```
cat gene_sQTLs__CFTR__rs2402203__chr7:117542108:117559464:clu_25847:ENSG00000001626.14__P__chr7-117215784-117715785__250kb.susieprep.sh

#!/bin/bash

## fetch variants in the region and intersect UKBB and FUSION vcfs
for i in /scratch/scjp_root/scjp99/vthihong/2_PanKBase/colocGWAS_T1D/0_data/hg38/chr7.imputed.poly.vcf.gz; do tabix $i chr7:117215784-117715785 | awk '{if (($0 !~ /^#/ && $0 !~ /^chr/)) print "chr"$0; else print $0}' ; done | sort | uniq > gene_sQTLs__CFTR__rs2402203__chr7:117542108:117559464:clu_25847:ENSG00000001626.14__P__chr7-117215784-117715785__250kb.ukbb.genotypes
zcat /scratch/scjp_root/scjp99/vthihong/2_PanKBase/colocGWAS_T1D/0_data/hg38/chr7.imputed.poly.vcf.gz | head -10000 | awk '{if (($0 ~ /^#/)) print $0}' > gene_sQTLs__CFTR__rs2402203__chr7:117542108:117559464:clu_25847:ENSG00000001626.14__P__chr7-117215784-117715785__250kb.ukbb.header
cat gene_sQTLs__CFTR__rs2402203__chr7:117542108:117559464:clu_25847:ENSG00000001626.14__P__chr7-117215784-117715785__250kb.ukbb.header gene_sQTLs__CFTR__rs2402203__chr7:117542108:117559464:clu_25847:ENSG00000001626.14__P__chr7-117215784-117715785__250kb.ukbb.genotypes | bgzip -c > gene_sQTLs__CFTR__rs2402203__chr7:117542108:117559464:clu_25847:ENSG00000001626.14__P__chr7-117215784-117715785__250kb.ukbb.vcf.gz; tabix gene_sQTLs__CFTR__rs2402203__chr7:117542108:117559464:clu_25847:ENSG00000001626.14__P__chr7-117215784-117715785__250kb.ukbb.vcf.gz
rm gene_sQTLs__CFTR__rs2402203__chr7:117542108:117559464:clu_25847:ENSG00000001626.14__P__chr7-117215784-117715785__250kb.ukbb.genotypes gene_sQTLs__CFTR__rs2402203__chr7:117542108:117559464:clu_25847:ENSG00000001626.14__P__chr7-117215784-117715785__250kb.ukbb.header

## fetch UKBB dosages 
zcat gene_sQTLs__CFTR__rs2402203__chr7:117542108:117559464:clu_25847:ENSG00000001626.14__P__chr7-117215784-117715785__250kb.ukbb.vcf.gz | head -10000 | awk -F'\t' '{if (($0 ~/^#CHROM/)) print $0}' OFS='\t' | sed -e 's:#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT:ID:g' > gene_sQTLs__CFTR__rs2402203__chr7:117542108:117559464:clu_25847:ENSG00000001626.14__P__chr7-117215784-117715785__250kb.ukbb-header.txt 
bcftools query -f "%ID-%REF-%ALT[\t%DS]\n" gene_sQTLs__CFTR__rs2402203__chr7:117542108:117559464:clu_25847:ENSG00000001626.14__P__chr7-117215784-117715785__250kb.ukbb.vcf.gz | cat gene_sQTLs__CFTR__rs2402203__chr7:117542108:117559464:clu_25847:ENSG00000001626.14__P__chr7-117215784-117715785__250kb.ukbb-header.txt - > gene_sQTLs__CFTR__rs2402203__chr7:117542108:117559464:clu_25847:ENSG00000001626.14__P__chr7-117215784-117715785__250kb.ukbb-dosages.tsv 

## bgzip to save space
module load Bioinformatics
module load Bioinformatics  gcc/10.3.0-k2osx5y
module load samtools/1.13-fwwss5n

bgzip -@ 2 gene_sQTLs__CFTR__rs2402203__chr7:117542108:117559464:clu_25847:ENSG00000001626.14__P__chr7-117215784-117715785__250kb.ukbb-dosages.tsv

## cleanup
rm -rf gene_sQTLs__CFTR__rs2402203__chr7:117542108:117559464:clu_25847:ENSG00000001626.14__P__chr7-117215784-117715785__250kb.ukbb-header.txt gene_sQTLs__CFTR__rs2402203__chr7:117542108:117559464:clu_25847:ENSG00000001626.14__P__chr7-117215784-117715785__250kb.ukbb.vcf.gz*
```

An example of a `susie.sh` file is shown below:
```
cat gene_sQTLs__CFTR__rs2402203__chr7:117542108:117559464:clu_25847:ENSG00000001626.14__P__chr7-117215784-117715785__250kb.susie.sh

#!/bin/bash

################## running SuSiE for  gene_sQTLs__CFTR__rs2402203__chr7:117542108:117559464:clu_25847:ENSG00000001626.14__P__chr7-117215784-117715785__250kb:

## Susie 
/scratch/scjp_root/scjp99/vthihong/2_PanKBase/colocGWAS_T1D/1_sQTL-gtex-susie/scripts/susie-eqtl.R --prefix gene_sQTLs__CFTR__rs2402203__chr7:117542108:117559464:clu_25847:ENSG00000001626.14__P__chr7-117215784-117715785__250kb --type quant --beta slope --p pval_nominal --se slope_se --effect ALT --non_effect REF --sdY 1 --coverage 0.95 --maxit 10000 --min_abs_corr 0.1 --s_threshold 0.3 --number_signals_default 10 --number_signals_high_s 1 --marker rs2402203 --trait1 /scratch/scjp_root/scjp99/vthihong/2_PanKBase/colocGWAS_T1D/3_t1d-eQTL_GTEx-coloc/results/GTEx_EUR_sQTL/gtex_indexed/ENSG00000001626.14__chr7:117542108:117559464:clu_25847:ENSG00000001626.14__GTEx_Pancreas_sGene__7:117465784.bed.gz --trait1_ld gene_sQTLs__CFTR__rs2402203__chr7:117542108:117559464:clu_25847:ENSG00000001626.14__P__chr7-117215784-117715785__250kb.ukbb-dosages.tsv.gz 
```
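The region in the CFTR example appears to be the position in the `--trait1` file name (`7:117465784`) padded by the 250 kb window on both sides. The sketch below reproduces that arithmetic. The padding rule (`pos - window` to `pos + 1 + window`) is an assumption inferred from this one example and not taken from `scripts/make_susie-sh.py`; the function name `region_for` is hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: build a <chr>-<start>-<end> region string.
# Assumption: the region spans pos - window to pos + 1 + window,
# which matches the CFTR example in this document.
region_for() {
    local chr="$1" pos="$2" window="$3"
    printf '%s-%s-%s\n' "$chr" "$((pos - window))" "$((pos + 1 + window))"
}

region_for chr7 117465784 250000
```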

## Step 4: Conduct fine-mapping analysis for all regions of interest

After we set up the analysis scripts for each sGene, we can run the analysis for every sGene using Snakemake. See the example Snakemake file at `scripts/susie.sf`.

By default, the signals of each sGene are saved in an R object file named `*.susie.Rda`.

## Step 5: Obtain output files for PanKgraph

For the purpose of PanKgraph, we extract some outputs into text files. Example code is shown below:

In [4]:
library(glue)
library(tidyr)
suppressPackageStartupMessages(library(dplyr))

In [5]:
process_dosage = function(f, snplist){
    ld = read.csv(f, sep='\t', check.names = F)
    dups = ld[ (duplicated(ld$ID) | duplicated(ld$ID, fromLast = TRUE)),]
    print(glue("N duplicates = {nrow(dups)}"))
    ld = ld[! (duplicated(ld$ID) | duplicated(ld$ID, fromLast = TRUE)),]
    row.names(ld) = ld$ID
    ld$ID = NULL
    idlist = intersect(snplist, row.names(ld))
    ld = ld[idlist,]
    print(ld[1:5, 1:10])
    ld = cor(t(ld))
    return(ld)
}

meta <- read.table("/nfs/turbo/umms-scjp-pank/vthihong/colocGWAS_T1D/1_sQTL-gtex-susie/data/gene_sQTLs-selected.tsv", header = T)
meta <- distinct(meta)
head(meta)

row,chr,start,end,locus,gene_id,eqtl_input
,<chr>,<int>,<int>,<chr>,<chr>,<chr>
1,chr1,959309,959310,NOC2L__rs111463901__chr1:946545:948131:clu_38413:ENSG00000188976.10__P,ENSG00000188976,/nfs/turbo/umms-scjp-pank/vthihong/colocGWAS_T1D/3_t1d-eQTL_GTEx-coloc/results/GTEx_EUR_sQTL/gtex_indexed/ENSG00000188976.10__chr1:946545:948131:clu_38413:ENSG00000188976.10__GTEx_Pancreas_sGene__1:944582.bed.gz
2,chr1,1020123,1020124,AGRN__rs551305078__chr1:1054551:1054824:clu_38443:ENSG00000188157.14__P,ENSG00000188157,/nfs/turbo/umms-scjp-pank/vthihong/colocGWAS_T1D/3_t1d-eQTL_GTEx-coloc/results/GTEx_EUR_sQTL/gtex_indexed/ENSG00000188157.14__chr1:1054551:1054824:clu_38443:ENSG00000188157.14__GTEx_Pancreas_sGene__1:1020123.bed.gz
3,chr1,1745992,1745993,SLC35E2__rs11260600__chr1:1722831:1723800:clu_38554:ENSG00000215790.6__P,ENSG00000215790,/nfs/turbo/umms-scjp-pank/vthihong/colocGWAS_T1D/3_t1d-eQTL_GTEx-coloc/results/GTEx_EUR_sQTL/gtex_indexed/ENSG00000215790.6__chr1:1722831:1723800:clu_38554:ENSG00000215790.6__GTEx_Pancreas_sGene__1:1724838.bed.gz
4,chr1,1232031,1232032,SDF4__rs60252802__chr1:1228946:1231892:clu_38455:ENSG00000078808.16__P,ENSG00000078808,/nfs/turbo/umms-scjp-pank/vthihong/colocGWAS_T1D/3_t1d-eQTL_GTEx-coloc/results/GTEx_EUR_sQTL/gtex_indexed/ENSG00000078808.16__chr1:1228946:1231892:clu_38455:ENSG00000078808.16__GTEx_Pancreas_sGene__1:1216908.bed.gz
5,chr1,1308567,1308568,PUSL1__rs115316182__chr1:1310688:1310909:clu_38477:ENSG00000169972.11__P,ENSG00000169972,/nfs/turbo/umms-scjp-pank/vthihong/colocGWAS_T1D/3_t1d-eQTL_GTEx-coloc/results/GTEx_EUR_sQTL/gtex_indexed/ENSG00000169972.11__chr1:1310688:1310909:clu_38477:ENSG00000169972.11__GTEx_Pancreas_sGene__1:1308567.bed.gz
6,chr1,1399328,1399329,CCNL2__rs307377__chr1:1388065:1388626:clu_38489:ENSG00000221978.11__P,ENSG00000221978,/nfs/turbo/umms-scjp-pank/vthihong/colocGWAS_T1D/3_t1d-eQTL_GTEx-coloc/results/GTEx_EUR_sQTL/gtex_indexed/ENSG00000221978.11__chr1:1388065:1388626:clu_38489:ENSG00000221978.11__GTEx_Pancreas_sGene__1:1386482.bed.gz


In [7]:
l <- "CFTR__rs2402203__chr7:117542108:117559464:clu_25847:ENSG00000001626.14__P__chr7-117215784-117715785"
input <- meta[meta$locus == "CFTR__rs2402203__chr7:117542108:117559464:clu_25847:ENSG00000001626.14__P", "eqtl_input"]

qtl <- read.csv(input, sep='\t', header=T, check.names=F)
qtl$snp <- paste0(qtl$SNP, "-", qtl$REF, "-", qtl$ALT)

load(paste0("/nfs/turbo/umms-scjp-pank/vthihong/colocGWAS_T1D/1_sQTL-gtex-susie/results/susie-prep/gene_sQTLs__", l, "__250kb.susie.Rda"))

if (length(S2$sets$cs) > 0) {
        for (j in 1:length(S2$sets$cs)) {
            pip <- data.frame(pip=S2$pip[names(S2$sets$cs[[j]])])
            if (S2$sets$coverage[[j]] < 0.95) {
                print(names(S2$sets$cs[[j]]))
                next
            }

            pip$snp <- row.names(pip)
            pip <- inner_join(pip, qtl[,c("snp", "pval_nominal", "alt_gtex", "ref_gtex", "slope")]) #The effect sizes of eQTLs are defined as the effect of the alternative allele (ALT) relative to the reference (REF) allele in the human genome reference. In other words, the eQTL effect allele is the ALT allele, not the minor allele. https://gtexportal.org/home/faq
            print(head(pip))

            idx = S2$sets$cs_index[j]
            isnps = colnames(S2$lbf_variable)
            bf = S2$lbf_variable[idx, isnps, drop=FALSE]
            bf = data.frame(snp = isnps, lbf = t(bf)[,1])
            pip <- inner_join(pip, bf, by = c("snp" = "snp"))
            colnames(pip) <- c("pip", "snp", "nominal_p", "effect_allele", "other_allele", "slope", "lbf")

            ldf <- process_dosage(paste0("/nfs/turbo/umms-scjp-pank/vthihong/colocGWAS_T1D/1_sQTL-gtex-susie/results/susie-prep/gene_sQTLs__", l, "__250kb.ukbb-dosages.tsv.gz"), pip$snp)
            ldf <- ldf**2
            colnames(ldf) <- stringr::str_extract(colnames(ldf), "[^-]*")
            rownames(ldf) <- stringr::str_extract(rownames(ldf), "[^-]*")
            #write.table(ldf, paste0("/nfs/turbo/umms-scjp-pank/vthihong/colocGWAS_T1D/1_sQTL-gtex-susie/results/susie/", l, "__250kb__credibleSet", j, "__ld.txt"), sep = "\t", quote = F)

            pip$snp <- stringr::str_extract(pip$snp, "[^-]*")
            print(head(pip))
            #write.table(pip[, c("snp", "pip", "nominal_p", "effect_allele", "other_allele", "slope", "lbf")], 
            #            paste0("/nfs/turbo/umms-scjp-pank/vthihong/colocGWAS_T1D/1_sQTL-gtex-susie/results/susie/", l, "__250kb__credibleSet", j, ".txt"), row.names = F, sep = "\t", quote = F)

        }

        # Summarise purity and coverage of all credible sets for this locus
        # (done once, outside the per-set loop, with its own index `k`)
        purity <- c()
        coverage <- c()
        p <- data.frame(locus = rep(l, length(S2$sets$cs)), purity = NA, coverage = NA)
        for (k in 1:length(S2$sets$cs)) {
            coverage <- c(coverage, S2$sets$coverage[[k]])
            purity <- c(purity, S2$sets$purity[k, 1])
        }
        p$purity <- purity
        p$coverage <- coverage
        p$credibleset <- 1:length(S2$sets$cs)
        print(head(p))
        #write.table(p, paste0("/nfs/turbo/umms-scjp-pank/vthihong/colocGWAS_T1D/1_sQTL-gtex-susie/results/susie/purity/gene_sQTLs__", 
        #                "A1CF", "__", l, ".txt"), row.names = F, sep = "\t", quote = F)
}

[1m[22mJoining with `by = join_by(snp)`


        pip             snp pval_nominal alt_gtex ref_gtex     slope
1 0.5042815    rs177069-A-G  0.000146082        G        A -0.355295
2 0.4955759 rs35320372-A-AT  0.000146082       AT        A -0.355295
N duplicates = 0
                1000251 1000534 1000542 1000766 1000898 1000924 1000961 1001059
rs177069-A-G          1       1       1       1       0       1       1       0
rs35320372-A-AT       1       1       1       1       0       1       1       0
NA                   NA      NA      NA      NA      NA      NA      NA      NA
NA.1                 NA      NA      NA      NA      NA      NA      NA      NA
NA.2                 NA      NA      NA      NA      NA      NA      NA      NA
                1001113 1001172
rs177069-A-G          1       0
rs35320372-A-AT       1       0
NA                   NA      NA
NA.1                 NA      NA
NA.2                 NA      NA
        pip        snp   nominal_p effect_allele other_allele     slope
1 0.5042815   rs177069 0.0001460

[1m[22mJoining with `by = join_by(snp)`


         pip            snp pval_nominal alt_gtex ref_gtex      slope
1 0.05338450  rs2023708-G-A 0.0006463169        A        G -0.3172549
2 0.02907819 rs10264075-G-T 0.0006463169        T        G -0.3172549
3 0.02204619  rs6966836-T-C 0.0005624587        C        T -0.3227859
4 0.02984861  rs6960356-C-T 0.0005624587        T        C -0.3227859
5 0.02777913 rs12534129-T-C 0.0006219257        C        T -0.3249425
6 0.02101997 rs10278130-A-G 0.0005349166        G        A -0.3219897
N duplicates = 0
               1000251 1000534 1000542 1000766 1000898 1000924 1000961 1001059
rs2023708-G-A        2       1       1       1       0       1   0.929       1
rs10264075-G-T       2       1       1       1       0       1   0.996       1
rs6966836-T-C        2       1       1       1       0       1   0.996       1
rs6960356-C-T        2       1       1       1       0       1   1.000       1
rs12534129-T-C       2       1       1       1       0       1   1.000       1
               1001

[1m[22mJoining with `by = join_by(snp)`


         pip                         snp pval_nominal alt_gtex ref_gtex
1 0.69859430               rs2402203-T-C 4.835496e-17        C        T
2 0.21555081              rs11974978-C-T 4.835542e-17        T        C
3 0.08585489 7:117320776_CATTA_C-CATTA-C 4.835542e-17        C    CATTA
      slope
1 -2.361587
2  2.361586
3  2.361586
N duplicates = 0
                            1000251 1000534 1000542 1000766 1000898 1000924
rs2402203-T-C                     2       2       2       2       2  2.0000
rs11974978-C-T                    0       0       0       0       0  0.0353
7:117320776_CATTA_C-CATTA-C       0       0       0       0       0  0.0000
NA                               NA      NA      NA      NA      NA      NA
NA.1                             NA      NA      NA      NA      NA      NA
                            1000961 1001059 1001113 1001172
rs2402203-T-C                     2       2       2       2
rs11974978-C-T                    0       0       0       0
7:117320776