# Liftover conversion from hg19 to hg38

## Aim: 
The GWAS summary statistics were originally based on the hg19 reference genome, whereas our current LD reference panel is based on hg38. To make the variant positions match the LD reference for fine-mapping, we converted the GWAS summary statistics to hg38 using LiftOver.

## Input:
* LiftOver tool: http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/liftOver
* hg19 → hg38 chain file: http://hgdownload.cse.ucsc.edu/goldenPath/hg19/liftOver/hg19ToHg38.over.chain.gz
* GWAS summary statistics:
   1. image_AD: https://drive.google.com/drive/folders/1H1Xj33C-867dxVHOFIh5l_nLluWcnqzx
   2. image_aging: https://drive.google.com/drive/folders/1l7BKGK5tDAlRWtHKyyjDo94d4N552wZh
   3. MS: https://zenodo.org/records/14548072
* `convert.sh`: script that runs LiftOver to convert the hg19 coordinates to hg38.
* `file_path.txt`: file listing the paths to the hg19 GWAS summary statistics (.bed).
## Output:
* hg38-based GWAS summary statistics: `s3://statfungen/ftp_fgc_xqtl/GWAS/image_GWAS_hg38/`
   1. image_AD: dne_pheno_normalized_residualized.AD_SurrealGAN_*.glm.linear_hg38.gz
   2. image_aging: surrealgan_aging_pheno_normalized_residualized.r1.glm.linear_hg38.gz
   3. MS: ms_eur_v4.0.sumstats_hg38.gz

## Procedures:
1. Format GWAS Summary Statistics for Conversion

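Convert the hg19-based GWAS summary statistics into standard .bed format with the following four columns: `chrom` (with the `chr` prefix, e.g. `chr5`, as used in the UCSC chain file), `start` and `end` (both set to the variant position), and `region_id` (the variant ID, used to merge the results back after conversion).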
```
chrom	start	end	region_id
chr5	29439275	29439275	rs667647
chr5	85928892	85928892	rs113534962
```
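**Note: LiftOver does not support .bed files with more than six columns, so only the coordinates and a variant ID are lifted, and the remaining columns are merged back afterwards. All AD imaging GWAS summary statistics share the same variant positions across their dimensions, and so do all aging imaging GWAS summary statistics, so the conversion is run only once per dataset and the lifted coordinates are reused for every dimension.**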
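A minimal base-R sketch of step 1, assuming a summary-statistics data frame with chromosome, 1-based position and ID columns. The function name `make_hg19_bed` and the toy input are illustrative, not part of the pipeline. The sketch follows this notebook's start = end = position convention. Note that strict BED is 0-based and half-open, and under that convention a single 1-based position would usually be written as start = POS - 1, end = POS.

```r
# Hypothetical helper: build the 4-column hg19 .bed from a summary-statistics table
make_hg19_bed <- function(sumstats, chr_col, pos_col, id_col) {
  pos <- as.integer(sumstats[[pos_col]])   # integer, so large positions are never written as 7.2e+07
  data.frame(
    chrom     = paste0("chr", sumstats[[chr_col]]),  # chr prefix, matching the UCSC chain file
    start     = pos,                                 # notebook convention: start = end = position
    end       = pos,
    region_id = sumstats[[id_col]],                  # key for merging back after liftOver
    stringsAsFactors = FALSE
  )
}

# Toy input: the first two MS variants shown in this notebook
ss <- data.frame(CHR = c(5L, 5L), BP = c(29439275, 85928892),
                 MarkerName = c("rs667647", "rs113534962"))
bed <- make_hg19_bed(ss, "CHR", "BP", "MarkerName")

bed_file <- tempfile(fileext = ".bed")
write.table(bed, bed_file, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)   # tab-separated, no header, no quotes
readLines(bed_file)
```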

2. Perform LiftOver Conversion
Use the `convert.sh` script to run LiftOver and map hg19 coordinates to hg38.

**Note: the conversion also maps some variants to additional contigs and scaffolds that represent alternate loci or regions that are difficult to place within the main chromosomes, e.g. `chr14_GL000009v2_random` and `chr19_KI270938v1_alt`. Because these are hard to interpret, I removed them and kept only chr1-22.**
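`convert.sh` itself is not reproduced here. As a hedged sketch, an R wrapper around the documented `liftOver oldFile map.chain newFile unMapped` call could look like the following. The names `liftover_args` and `run_liftover` and the `.unmapped.bed` file name are hypothetical. The `.to_hg38.bed` suffix mirrors the output file names used later in this notebook.

```r
# Hypothetical: build the positional arguments for the UCSC liftOver binary
liftover_args <- function(in_bed, chain = "hg19ToHg38.over.chain.gz") {
  out_bed  <- sub("\\.bed$", ".to_hg38.bed", in_bed)    # naming seen in this notebook
  unmapped <- sub("\\.bed$", ".unmapped.bed", in_bed)   # hypothetical file name
  c(in_bed, chain, out_bed, unmapped)                   # oldFile map.chain newFile unMapped
}

# Hypothetical: run liftOver (requires the binary on the PATH) and return the lifted .bed path
run_liftover <- function(in_bed, chain = "hg19ToHg38.over.chain.gz", exe = "liftOver") {
  args <- liftover_args(in_bed, chain)
  status <- system2(exe, args)
  if (status != 0) stop("liftOver failed with exit status ", status)
  invisible(args[3])
}

liftover_args("PD_GWAS_hg19.bed")
```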

3. Merge Back to GWAS Summary Statistics
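Load the hg38 .bed file and merge it back with the original GWAS summary statistics on the variant ID (`region_id`), so that every original column is kept and only the chromosome and position are replaced by their hg38 values.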
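A base-R sketch of the merge-back step, using `merge()` in place of the `dplyr::inner_join()` calls in the cells below. The function name and toy data are hypothetical; the lifted coordinates are the MS values shown later. Variants missing from the lifted file drop out of the inner join, which is how unmapped variants disappear from the final table.

```r
# Hypothetical helper: replace hg19 coordinates with lifted hg38 ones, joining on the variant ID
merge_back <- function(sumstats, lifted, id_col, old_coords) {
  lifted <- lifted[, c(1, 2, 4)]                    # chrom, start, region_id (end equals start)
  names(lifted) <- c("chr", "pos", id_col)
  lifted$chr <- sub("^chr", "", lifted$chr)         # "chr5" -> "5"
  kept <- sumstats[, setdiff(names(sumstats), old_coords), drop = FALSE]  # drop hg19 coordinates
  merge(kept, lifted, by = id_col)                  # inner join: unmapped variants drop out
}

# Toy data; "rs_unmapped" is a made-up ID with no lifted coordinate
ms_toy <- data.frame(MarkerName = c("rs667647", "rs113534962", "rs_unmapped"),
                     CHR = c(5L, 5L, 1L), BP = c(29439275L, 85928892L, 1000L),
                     beta = c(0.012246231, 0.030798835, 0.1))
lifted_toy <- data.frame(V1 = c("chr5", "chr5"), V2 = c(29439168L, 86633075L),
                         V3 = c(29439168L, 86633075L), V4 = c("rs667647", "rs113534962"))
res <- merge_back(ms_toy, lifted_toy, "MarkerName", c("CHR", "BP"))
res
```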

## Simple summary for the conversion

| Studies | before_conversion(original) | unmapped | after_conversion(final) | overall_dropped | proportion_dropped |
|---------|----------------------------|----------|------------------------|-----------------|-------------------|
| MS | 8,957,460 | 1,806 | 8,954,288 | 3,172 | 0.0354% |
| image_AD* | 6,477,810 | 1,225 | 6,475,770 | 2,040 | 0.0315% |
| image_aging* | 8,469,833 | 1,772 | 8,466,963 | 2,870 | 0.0339% |
| longevity | 9,085,648 | 1,101 | 9,083,178 | 2,470 | 0.0272% |
| mvAge | 6,793,878 | (deleted accidentally) | 6,792,478 | 1,400 | 0.0206% |
| PD | 17,510,617 | 1,455 | 17,506,762 | 3,855 | 0.0220% |
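As a quick consistency check, the derived columns of the table can be recomputed from the before and after counts. The `other_dropped` column is an assumption: it treats every variant dropped beyond the unmapped ones as a variant placed on a non-chr1-22 contig and removed afterwards.

```r
# Counts copied from the summary table above (unmapped is unknown for mvAge)
conv <- data.frame(
  study    = c("MS", "image_AD", "image_aging", "longevity", "mvAge", "PD"),
  before   = c(8957460, 6477810, 8469833, 9085648, 6793878, 17510617),
  unmapped = c(1806, 1225, 1772, 1101, NA, 1455),
  after    = c(8954288, 6475770, 8466963, 9083178, 6792478, 17506762)
)
conv$overall_dropped <- conv$before - conv$after
conv$proportion_dropped <- sprintf("%.4f%%", 100 * conv$overall_dropped / conv$before)
# Assumption: the remainder are variants lifted onto alt/random contigs and then filtered out
conv$other_dropped <- conv$overall_dropped - conv$unmapped
conv
```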


## PD

In [1]:
library(data.table)
library(tidyverse)

In [9]:
PD_GWAS = fread("/home/rl3328/GWAS/PD_GWAS/nallsEtAl2019_excluding23andMe_allVariants.tab.zip")

In [10]:
head(PD_GWAS)
dim(PD_GWAS)

SNP,A1,A2,freq,b,se,p,N_cases,N_controls
<chr>,<chr>,<chr>,<dbl>,<dbl>,<dbl>,<dbl>,<int>,<int>
chr11:88249377,T,C,0.9931,0.1575,0.1783,0.3771,7161,5356
chr1:60320992,A,G,0.9336,0.0605,0.0456,0.1846,26421,442271
chr2:18069070,T,C,0.9988,-0.6774,1.3519,0.6163,582,905
chr8:135908647,A,G,0.2081,-0.0358,0.0273,0.1887,26421,442271
chr12:3871714,A,C,0.9972,0.1489,1.0636,0.8886,749,658
chr16:77148858,A,G,0.9976,-0.1213,0.3874,0.7543,6248,4391


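In [5]:
# The raw file has no separate coordinate columns: derive CHR and POS (hg19) from the SNP ID (chrN:POS)
PD_GWAS = PD_GWAS |> mutate(
    CHR = as.integer(gsub("chr(\\d+):.*", "\\1", SNP)),   # chromosome number
    POS = as.integer(gsub(".*:(\\d+).*", "\\1", SNP))      # base-pair position
  )
# Check that no position is stored in scientific notation (e.g. 7.2e+07)
sum(grepl("[eE]", PD_GWAS$POS))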

In [13]:
PD_GWAS_needed = PD_GWAS |> mutate(chrom = paste0("chr",CHR), start = POS, end = POS, region_id = SNP) |> select(chrom, start, end, region_id)

In [14]:
head(PD_GWAS_needed)
dim(PD_GWAS_needed)

chrom,start,end,region_id
<chr>,<int>,<int>,<chr>
chr11,88249377,88249377,chr11:88249377
chr1,60320992,60320992,chr1:60320992
chr2,18069070,18069070,chr2:18069070
chr8,135908647,135908647,chr8:135908647
chr12,3871714,3871714,chr12:3871714
chr16,77148858,77148858,chr16:77148858


In [15]:
fwrite(PD_GWAS_needed,"/home/rl3328/GWAS/PD_GWAS/hg38conversion/PD_GWAS_hg19.bed", sep = '\t',col.names=FALSE)

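### Read in the hg38 .bed and merge it back to the original summary statistics
The lifted .bed has four columns (chrom, start, end, id). Because start and end are identical, the third column is dropped, leaving chrom, pos and id.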

In [18]:
PD_GWAS_hg38 = fread("/home/rl3328/GWAS/PD_GWAS/hg38conversion/PD_GWAS_hg19.to_hg38.bed")

In [19]:
head(PD_GWAS_hg38)


V1,V2,V3,V4
<chr>,<int>,<int>,<chr>
chr11,88516209,88516209,chr11:88249377
chr1,59855320,59855320,chr1:60320992
chr2,17887803,17887803,chr2:18069070
chr8,134896404,134896404,chr8:135908647
chr12,3762548,3762548,chr12:3871714
chr16,77114961,77114961,chr16:77148858


In [21]:
dim(PD_GWAS_hg38)

In [22]:
PD_GWAS_hg38 = PD_GWAS_hg38[,-3]

In [23]:
colnames(PD_GWAS_hg38) <- c("chr","pos","rsid")

In [24]:
PD_GWAS_hg38 = PD_GWAS_hg38 |> mutate(chr = gsub("chr", "", chr))

In [25]:
# Non-numeric contig names (e.g. 14_GL000009v2_random) become NA here and are removed in the next cell
PD_GWAS_hg38 = PD_GWAS_hg38 |> mutate(chr=as.integer(chr))

ℹ In argument: `chr = as.integer(chr)`.
! NAs introduced by coercion


In [26]:
PD_GWAS_hg38 = PD_GWAS_hg38 |> filter(!is.na(chr))
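The filter above relies on `as.integer()` turning non-numeric contig names into `NA`, so the coercion warning is expected. A more explicit alternative selects chr1-22 by pattern. It is sketched here with a hypothetical helper and, like the coercion approach, it also drops chrX, chrY and chrM.

```r
# Hypothetical helper: TRUE for "chr1".."chr22", FALSE for sex/mito chromosomes and *_random/*_alt contigs
keep_autosomes <- function(chrom) {
  grepl("^chr([1-9]|1[0-9]|2[0-2])$", chrom)
}

chrom <- c("chr11", "chr14_GL000009v2_random", "chr19_KI270938v1_alt", "chrX", "chr22", "chr23")
keep <- keep_autosomes(chrom)
keep
as.integer(sub("^chr", "", chrom[keep]))   # integer chromosome codes, as in the notebook
```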

In [27]:
head(PD_GWAS_hg38)
dim(PD_GWAS_hg38)

chr,pos,rsid
<int>,<int>,<chr>
11,88516209,chr11:88249377
1,59855320,chr1:60320992
2,17887803,chr2:18069070
8,134896404,chr8:135908647
12,3762548,chr12:3871714
16,77114961,chr16:77148858


In [29]:
unique(PD_GWAS_hg38$chr)

In [31]:
PD_GWAS_remain = PD_GWAS |> select(-CHR, -POS)
PD_GWAS_hg38_final = PD_GWAS_remain |> inner_join(PD_GWAS_hg38, by = c('SNP' ='rsid'))
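The join key here is a `chr:pos` string. Two rows at the same position, such as alleles of a multi-allelic site, would therefore share an ID, and an inner join would pair every copy on one side with every copy on the other. The following sketch counts duplicated keys on both sides before joining; the helper names and toy IDs are hypothetical.

```r
# Hypothetical helpers: count duplicated join keys on each side of a join
count_duplicate_ids <- function(ids) sum(duplicated(ids))

check_join_keys <- function(left_ids, right_ids) {
  n_left  <- count_duplicate_ids(left_ids)
  n_right <- count_duplicate_ids(right_ids)
  if (n_left > 0 || n_right > 0)
    warning("duplicated join keys: ", n_left, " (left), ", n_right, " (right)")
  c(left = n_left, right = n_right)
}

# Toy example: two summary-statistics rows share one chr:pos ID
snp_ids  <- c("chr11:88249377", "chr1:60320992", "chr1:60320992")
lift_ids <- c("chr11:88249377", "chr1:60320992")
suppressWarnings(check_join_keys(snp_ids, lift_ids))
```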

In [32]:
head(PD_GWAS_hg38_final)
dim(PD_GWAS_hg38_final)

A1,A2,SNP,freq,b,se,p,N_cases,N_controls,chr,pos
<chr>,<chr>,<chr>,<dbl>,<dbl>,<dbl>,<dbl>,<int>,<int>,<int>,<int>
T,C,chr11:88249377,0.9931,0.1575,0.1783,0.3771,7161,5356,11,88516209
A,G,chr1:60320992,0.9336,0.0605,0.0456,0.1846,26421,442271,1,59855320
T,C,chr2:18069070,0.9988,-0.6774,1.3519,0.6163,582,905,2,17887803
A,G,chr8:135908647,0.2081,-0.0358,0.0273,0.1887,26421,442271,8,134896404
A,C,chr12:3871714,0.9972,0.1489,1.0636,0.8886,749,658,12,3762548
A,G,chr16:77148858,0.9976,-0.1213,0.3874,0.7543,6248,4391,16,77114961


In [33]:
PD_GWAS = PD_GWAS |> mutate(
    CHR = as.integer(gsub("chr(\\d+):.*", "\\1", SNP)),           # Extract chromosome number
    POS = as.integer(gsub(".*:(\\d+).*", "\\1", SNP)) # Extract position number
  )

In [34]:
PD_GWAS_hg38_final = PD_GWAS_hg38_final |> arrange(chr, pos) |> select(chr, pos, A1, A2, everything())

In [36]:
fwrite(PD_GWAS_hg38_final, "/home/rl3328/GWAS/PD_GWAS/PD_nalls2019.sumstats_hg38.tsv.gz", sep = '\t')

In [None]:
# Shell step (run in a terminal, not in the R kernel): sort, bgzip-compress and tabix-index the output.
# The header line is prefixed with '#' so that tabix skips it; awk forces column 2 (pos) to an integer.
(zcat /home/rl3328/GWAS/PD_GWAS/PD_nalls2019.sumstats_hg38.tsv.gz | head -1 | sed 's/^/#/'; \
 zcat /home/rl3328/GWAS/PD_GWAS/PD_nalls2019.sumstats_hg38.tsv.gz | tail -n +2 | \
 awk -F'\t' 'BEGIN{OFS="\t"} {$2=int($2); print}' | sort -k1,1V -k2,2n) | \
bgzip > /home/rl3328/GWAS/PD_GWAS/PD_nalls2019.sumstats_hg38_sorted.tsv.gz

tabix -s 1 -b 2 -e 2 /home/rl3328/GWAS/PD_GWAS/PD_nalls2019.sumstats_hg38_sorted.tsv.gz
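tabix requires the rows of each chromosome to form one contiguous block, sorted by start position within that block. The `sort -k1,1V -k2,2n` step above guarantees this. The following sketch shows a hypothetical R helper for checking a table before indexing.

```r
# Hypothetical helper: TRUE if each chromosome is one contiguous block with non-decreasing positions
is_tabix_ready <- function(chr, pos) {
  runs <- rle(as.character(chr))$values                  # chromosome blocks in file order
  contiguous <- !anyDuplicated(runs)                     # no chromosome appears in two blocks
  within_sorted <- all(tapply(pos, factor(chr, levels = unique(chr)),
                              function(p) !is.unsorted(p)))
  contiguous && within_sorted
}

is_tabix_ready(c(1, 1, 2, 2), c(10, 20, 5, 7))   # sorted
is_tabix_ready(c(1, 2, 1), c(1, 1, 2))           # chromosome 1 split into two blocks
```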

## mvAge

### format the input hg19 .bed for mvAge

In [15]:
mvAge = fread("/home/rl3328/GWAS/mvAge_GWAS/mvAge.summary.EUR.txt")

In [30]:
unique(mvAge$CHR)

In [4]:
head(mvAge)

SNP,CHR,BP,MAF,effect_allele,other_allele,beta,se,Pvalue
<chr>,<int>,<dbl>,<dbl>,<chr>,<chr>,<dbl>,<dbl>,<dbl>
rs3094315,1,752566,0.16004,G,A,-0.00107955,0.001391549,0.4378729
rs3131972,1,752721,0.161034,A,G,-0.001130483,0.00138905,0.415729
rs2073813,1,753541,0.12326,G,A,0.00150233,0.001554297,0.3337611
rs3131969,1,754182,0.128231,A,G,-0.001468959,0.001528045,0.336385
rs3131968,1,754192,0.128231,A,G,-0.001411652,0.001528122,0.3556
rs3131967,1,754334,0.128231,T,C,-0.001514244,0.001528573,0.3218683


In [31]:
sum(grepl("[eE]", mvAge$BP))


In [34]:
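mvAge$BP <- as.integer(mvAge$BP)   # store positions as integers so none are rendered in scientific notation
mvAge[718358, ]                    # inspect a large position (72000000) after the conversion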

SNP,CHR,BP,MAF,effect_allele,other_allele,beta,se,Pvalue
<chr>,<int>,<int>,<dbl>,<chr>,<chr>,<dbl>,<dbl>,<dbl>
rs61573637,2,72000000,0.0894632,G,A,-0.001314454,0.001777309,0.4595576
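The `[eE]` check exists because `BP` was read as a double. When a double such as 72000000 is converted to text, R renders it as `7.2e+07`, which is not a usable integer coordinate in a .bed file. Converting to integer first avoids this, as the following small illustration shows.

```r
bp <- c(752566, 72000000)        # doubles, as fread returned mvAge$BP (<dbl>)
as.character(bp)                 # "752566"  "7.2e+07"
sum(grepl("[eE]", bp))           # grepl() coerces to character first, so this counts 1
bp_int <- as.integer(bp)
as.character(bp_int)             # "752566"   "72000000"
sum(grepl("[eE]", bp_int))       # 0
```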


In [35]:
mvAge_needed = mvAge |> mutate(chrom = paste0("chr",CHR), start = BP, end = BP, region_id = SNP) |> select(chrom, start, end, region_id)

In [36]:
head(mvAge_needed)
dim(mvAge_needed)

chrom,start,end,region_id
<chr>,<int>,<int>,<chr>
chr1,752566,752566,rs3094315
chr1,752721,752721,rs3131972
chr1,753541,753541,rs2073813
chr1,754182,754182,rs3131969
chr1,754192,754192,rs3131968
chr1,754334,754334,rs3131967


In [37]:
fwrite(mvAge_needed,"/home/rl3328/GWAS/mvAge_GWAS/hg38conversion/mvAge_hg19.bed", sep = '\t',col.names=FALSE)

### Read in the hg38 .bed (chrom, start, end, id) and merge it back to the original summary statistics

In [38]:
mvAge_hg38 = fread("/home/rl3328/GWAS/mvAge_GWAS/hg38conversion/mvAge_hg19.to_hg38.bed")

In [39]:
head(mvAge_hg38)


V1,V2,V3,V4
<chr>,<int>,<int>,<chr>
chr1,817186,817186,rs3094315
chr1,817341,817341,rs3131972
chr1,818161,818161,rs2073813
chr1,818802,818802,rs3131969
chr1,818812,818812,rs3131968
chr1,818954,818954,rs3131967


In [40]:
dim(mvAge_hg38)

In [41]:
mvAge_hg38 = mvAge_hg38[,-3]

In [42]:
colnames(mvAge_hg38) <- c("chr","pos","rsid")

In [43]:
mvAge_hg38 = mvAge_hg38 |> mutate(chr = gsub("chr", "", chr))

In [44]:
mvAge_hg38 = mvAge_hg38 |> mutate(chr=as.integer(chr))

ℹ In argument: `chr = as.integer(chr)`.
! NAs introduced by coercion


In [45]:
mvAge_hg38 = mvAge_hg38 |> filter(!is.na(chr))

In [46]:
head(mvAge_hg38)
dim(mvAge_hg38)

chr,pos,rsid
<int>,<int>,<chr>
1,817186,rs3094315
1,817341,rs3131972
1,818161,rs2073813
1,818802,rs3131969
1,818812,rs3131968
1,818954,rs3131967


In [47]:
unique(mvAge_hg38$chr)

In [51]:
mvAge_remain = mvAge |> select(-CHR, -BP)
mvAge_hg38_final = mvAge_remain |> inner_join(mvAge_hg38, by = c('SNP' ='rsid'))

In [52]:
head(mvAge_hg38_final)
dim(mvAge_hg38_final)

SNP,MAF,effect_allele,other_allele,beta,se,Pvalue,chr,pos
<chr>,<dbl>,<chr>,<chr>,<dbl>,<dbl>,<dbl>,<int>,<int>
rs3094315,0.16004,G,A,-0.00107955,0.001391549,0.4378729,1,817186
rs3131972,0.161034,A,G,-0.001130483,0.00138905,0.415729,1,817341
rs2073813,0.12326,G,A,0.00150233,0.001554297,0.3337611,1,818161
rs3131969,0.128231,A,G,-0.001468959,0.001528045,0.336385,1,818802
rs3131968,0.128231,A,G,-0.001411652,0.001528122,0.3556,1,818812
rs3131967,0.128231,T,C,-0.001514244,0.001528573,0.3218683,1,818954


In [55]:
mvAge_hg38_final = mvAge_hg38_final |> arrange(chr, pos) |> select(chr, pos, effect_allele, other_allele, everything())

In [57]:
mvAge_hg38_final = mvAge_hg38_final |> mutate(N = 1958774)

In [58]:
fwrite(mvAge_hg38_final, "/home/rl3328/GWAS/mvAge_GWAS/mvAge.sumstats_hg38.gz", sep = '\t')

In [None]:
# Shell step (run in a terminal, not in the R kernel): sort, bgzip-compress and tabix-index the output.
# The header line is prefixed with '#' so that tabix skips it; awk forces column 2 (pos) to an integer.
(zcat /home/rl3328/GWAS/mvAge_GWAS/mvAge.sumstats_hg38.gz | head -1 | sed 's/^/#/'; \
 zcat /home/rl3328/GWAS/mvAge_GWAS/mvAge.sumstats_hg38.gz | tail -n +2 | \
 awk -F'\t' 'BEGIN{OFS="\t"} {$2=int($2); print}' | sort -k1,1V -k2,2n) | \
bgzip > /home/rl3328/GWAS/mvAge_GWAS/mvAge.sumstats_hg38_sorted.tsv.gz

tabix -s 1 -b 2 -e 2 /home/rl3328/GWAS/mvAge_GWAS/mvAge.sumstats_hg38_sorted.tsv.gz

## Longevity

### format the input hg19 .bed for Longevity

In [1]:
library(data.table)
library(tidyverse)

In [None]:
longevity = fread("/home/rl3328/GWAS/Longevity_GWAS/lifegen_phase2_bothpl_alldr_2017_09_18.tsv.gz")

In [None]:
unique(longevity$chr)

In [None]:
head(longevity)

rsid,snpid,chr,pos,a1,a0,n,freq1,beta1,se,p,direction,info,freq_se,min_freq1,max_freq1,V17
<chr>,<chr>,<int>,<int>,<chr>,<chr>,<int>,<dbl>,<dbl>,<dbl>,<dbl>,<chr>,<dbl>,<dbl>,<dbl>,<dbl>,<lgl>
rs113345124,8_145793211,8,145793211,T,C,620911,0.98036746,0.00055994,0.01470412,0.96962368,+-,0.955205,0.00248008,0.976354,0.9819,
rs145210131,9_11898949,9,11898949,T,C,483897,0.00895121,0.0177755,0.02438387,0.46601024,++,0.889843,0.00062496,0.0087,0.010506,
rs138102812,13_41519377,13,41519377,T,C,348613,0.99059433,0.02986577,0.02273712,0.18900618,+-,0.99836,0.00031691,0.9905,0.991659,
rs113210771,1_240289734,1,240289734,T,C,638113,0.02765505,0.00504805,0.01240526,0.68406061,+-,0.962098,0.0013175,0.0269,0.029954,
rs10859433,12_78219936,12,78219936,A,T,638103,0.62959171,0.00853945,0.00399792,0.03268147,++,0.992782,0.00260477,0.62562,0.6313,
rs545489794,5_131569666,5,131569666,A,G,344932,0.00732857,0.00716053,0.02807414,0.79867843,-+,0.864036,0.00095907,0.007,0.010128,


In [None]:
longevity_needed = longevity |> mutate(chrom = paste0("chr",chr), start = pos, end = pos, region_id = rsid) |> select(chrom, start, end, region_id)

In [None]:
head(longevity_needed)
dim(longevity_needed)

chrom,start,end,region_id
<chr>,<int>,<int>,<chr>
chr8,145793211,145793211,rs113345124
chr9,11898949,11898949,rs145210131
chr13,41519377,41519377,rs138102812
chr1,240289734,240289734,rs113210771
chr12,78219936,78219936,rs10859433
chr5,131569666,131569666,rs545489794


In [None]:
fwrite(longevity_needed,"/home/rl3328/GWAS/Longevity_GWAS/hg38conversion/longevity_hg19.bed", sep = '\t',col.names=FALSE)

### Read in the hg38 .bed (chrom, start, end, id) and merge it back to the original summary statistics

In [None]:
longevity_hg38 = fread("/home/rl3328/hg_conversion/longevity_hg19.to_hg38.bed")

In [None]:
head(longevity_hg38)


V1,V2,V3,V4
<chr>,<int>,<int>,<chr>
chr8,144567827,144567827,rs113345124
chr9,11898949,11898949,rs145210131
chr13,40945241,40945241,rs138102812
chr1,240126434,240126434,rs113210771
chr12,77826156,77826156,rs10859433
chr5,132233973,132233973,rs545489794


In [None]:
dim(longevity_hg38)

In [None]:
longevity_hg38 = longevity_hg38[,-3]

In [None]:
colnames(longevity_hg38) <- c("chr","pos","rsid")

In [None]:
longevity_hg38 = longevity_hg38 |> mutate(chr = gsub("chr", "", chr))

In [None]:
longevity_hg38 = longevity_hg38 |> mutate(chr=as.integer(chr))

ℹ In argument: `chr = as.integer(chr)`.
! NAs introduced by coercion


In [None]:
longevity_hg38 = longevity_hg38 |> filter(!is.na(chr))

In [None]:
head(longevity_hg38)
dim(longevity_hg38)

chr,pos,rsid
<int>,<int>,<chr>
8,144567827,rs113345124
9,11898949,rs145210131
13,40945241,rs138102812
1,240126434,rs113210771
12,77826156,rs10859433
5,132233973,rs545489794


In [None]:
unique(longevity_hg38$chr)

In [None]:
longevity_remain = longevity |> select(-chr, -pos)
longevity_hg38_final = longevity_remain |> inner_join(longevity_hg38, by = 'rsid')

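In [2]:
# Reload the merged hg38 table saved earlier; the cell that reordered its columns, dropped the empty V17 column and wrote this file is not shown
longevity_hg38_final = fread("/home/rl3328/GWAS/Longevity_GWAS/longevity.sumstats_hg38.gz")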


In [3]:
head(longevity_hg38_final)
dim(longevity_hg38_final)

chr,pos,a1,a0,rsid,n,freq1,beta1,se,p,direction,info,freq_se,min_freq1,max_freq1
<int>,<int>,<chr>,<chr>,<chr>,<int>,<dbl>,<dbl>,<dbl>,<dbl>,<chr>,<dbl>,<dbl>,<dbl>,<dbl>
8,144567827,T,C,rs113345124,620911,0.98036746,0.00055994,0.01470412,0.96962368,+-,0.955205,0.00248008,0.976354,0.9819
9,11898949,T,C,rs145210131,483897,0.00895121,0.0177755,0.02438387,0.46601024,++,0.889843,0.00062496,0.0087,0.010506
13,40945241,T,C,rs138102812,348613,0.99059433,0.02986577,0.02273712,0.18900618,+-,0.99836,0.00031691,0.9905,0.991659
1,240126434,T,C,rs113210771,638113,0.02765505,0.00504805,0.01240526,0.68406061,+-,0.962098,0.0013175,0.0269,0.029954
12,77826156,A,T,rs10859433,638103,0.62959171,0.00853945,0.00399792,0.03268147,++,0.992782,0.00260477,0.62562,0.6313
5,132233973,A,G,rs545489794,344932,0.00732857,0.00716053,0.02807414,0.79867843,-+,0.864036,0.00095907,0.007,0.010128


In [10]:
fwrite(longevity_hg38_final, "/home/rl3328/GWAS/Longevity_GWAS/hg38GWAS/longevity.sumstats_hg38.gz", sep = '\t')

## MS

### format the input hg19 .bed for MS

In [135]:
ms = fread("~/data/GWAS/image_GWAS/ms_eur_v4.0.sumstats.gz")

In [136]:
unique(ms$CHR)

In [137]:
head(ms)

MarkerName,CHR,BP,A1,A2,Neff,Zscore,P,Direction,EAF,beta,se
<chr>,<int>,<int>,<chr>,<chr>,<dbl>,<dbl>,<dbl>,<chr>,<dbl>,<dbl>,<dbl>
rs667647,5,29439275,T,C,58174.06,1.991,0.04652,++,0.3489,0.012246231,0.006150794
rs113534962,5,85928892,T,C,58174.06,2.545,0.01093,++,0.0626,0.030798835,0.012101703
rs559397866,2,170966953,T,C,20081.51,-0.205,0.8373,?-,0.9851,-0.008443197,0.041186328
rs2366866,10,128341232,T,C,58174.06,-1.134,0.2566,+-,0.4592,-0.006671282,0.005882965
rs472303,3,62707519,T,C,58174.06,-0.665,0.5059,--,0.0636,-0.007988791,0.012013219
rs13417735,2,80464120,T,G,58174.06,0.888,0.3746,+-,0.9901,0.026295002,0.029611489


In [151]:
ms_needed = ms |> mutate(chrom = paste0("chr",CHR), start = BP, end = BP, region_id = MarkerName) |> select(chrom, start, end, region_id)

In [152]:
head(ms_needed)
dim(ms_needed)

chrom,start,end,region_id
<chr>,<int>,<int>,<chr>
chr5,29439275,29439275,rs667647
chr5,85928892,85928892,rs113534962
chr2,170966953,170966953,rs559397866
chr10,128341232,128341232,rs2366866
chr3,62707519,62707519,rs472303
chr2,80464120,80464120,rs13417735


In [153]:
fwrite(ms_needed,"ms_hg19.bed", sep = '\t',col.names=FALSE)

### Read in the hg38 .bed (chrom, start, end, id) and merge it back to the original summary statistics

In [168]:
ms_hg38 = fread("/home/ubuntu/project/conversion/ms_hg19.to_hg38.bed")

In [175]:
head(ms_hg38)

CHR,BP,MarkerName
<chr>,<int>,<chr>
chr5,29439168,rs667647
chr5,86633075,rs113534962
chr2,170110443,rs559397866
chr10,126652663,rs2366866
chr3,62721844,rs472303
chr2,80236995,rs13417735


In [170]:
ms_hg38 = ms_hg38[,-3]

In [174]:
colnames(ms_hg38) <- c("CHR","BP","MarkerName")

In [176]:
ms_hg38 = ms_hg38 |> mutate(CHR = gsub("chr", "", CHR))

In [177]:
ms_hg38 = ms_hg38 |> mutate(CHR=as.integer(CHR))

ℹ In argument: `CHR = as.integer(CHR)`.
! NAs introduced by coercion


In [180]:
ms_hg38 = ms_hg38 |> filter(!is.na(CHR))

In [181]:
head(ms_hg38)
dim(ms_hg38)

CHR,BP,MarkerName
<int>,<int>,<chr>
5,29439168,rs667647
5,86633075,rs113534962
2,170110443,rs559397866
10,126652663,rs2366866
3,62721844,rs472303
2,80236995,rs13417735


In [182]:
unique(ms_hg38$CHR)

In [185]:
ms_remain = ms |> select(-CHR, -BP)
ms_hg38_final = ms_remain |> inner_join(ms_hg38, by = 'MarkerName')

In [189]:
head(ms_hg38_final)
dim(ms_hg38_final)

MarkerName,A1,A2,Neff,Zscore,P,Direction,EAF,beta,se,CHR,BP
<chr>,<chr>,<chr>,<dbl>,<dbl>,<dbl>,<chr>,<dbl>,<dbl>,<dbl>,<int>,<int>
rs667647,T,C,58174.06,1.991,0.04652,++,0.3489,0.012246231,0.006150794,5,29439168
rs113534962,T,C,58174.06,2.545,0.01093,++,0.0626,0.030798835,0.012101703,5,86633075
rs559397866,T,C,20081.51,-0.205,0.8373,?-,0.9851,-0.008443197,0.041186328,2,170110443
rs2366866,T,C,58174.06,-1.134,0.2566,+-,0.4592,-0.006671282,0.005882965,10,126652663
rs472303,T,C,58174.06,-0.665,0.5059,--,0.0636,-0.007988791,0.012013219,3,62721844
rs13417735,T,G,58174.06,0.888,0.3746,+-,0.9901,0.026295002,0.029611489,2,80236995


In [191]:
fwrite(ms_hg38_final, "/home/ubuntu/project/image_QTL/hg38/ms_eur_v4.0.sumstats_hg38.gz", sep = '\t')

## AD1

### format the input hg19 .bed for AD

In [199]:
AD1 = fread("~/data/GWAS/image_GWAS/dne_pheno_normalized_residualized.AD_SurrealGAN_1.glm.linear")

In [215]:
head(AD1)

#CHROM,POS,ID,REF,ALT,A1,TEST,OBS_CT,BETA,SE,T_STAT,P,ERRCODE
<int>,<int>,<chr>,<chr>,<chr>,<chr>,<chr>,<int>,<dbl>,<dbl>,<dbl>,<dbl>,<chr>
1,717587,rs144155419,G,A,A,ADD,31247,-0.000334046,0.0392128,-0.0085188,0.993203,.
1,719854,1:719854_CAG_C,CAG,C,C,ADD,31168,-0.0975266,0.0726473,-1.34247,0.179454,.
1,723891,rs2977670,C,G,G,ADD,31509,0.0313952,0.0222987,1.40794,0.15916,.
1,724295,1:724295_TGGAAC_T,TGGAAC,T,T,ADD,30942,0.0345953,0.0537308,0.643863,0.519669,.
1,736689,rs181876450,T,C,C,ADD,31104,-0.0480637,0.0579955,-0.828749,0.407253,.
1,752721,rs3131972,G,A,A,ADD,31929,0.00322884,0.0110722,0.291616,0.770582,.


In [201]:
AD1_needed = AD1 |> mutate(chrom = paste0("chr",`#CHROM`), start = POS, end = POS, region_id = ID) |> select(chrom, start, end, region_id)


In [203]:
head(AD1_needed)
dim(AD1_needed)

chrom,start,end,region_id
<chr>,<int>,<int>,<chr>
chr1,717587,717587,rs144155419
chr1,719854,719854,1:719854_CAG_C
chr1,723891,723891,rs2977670
chr1,724295,724295,1:724295_TGGAAC_T
chr1,736689,736689,rs181876450
chr1,752721,752721,rs3131972


In [158]:
fwrite(AD1_needed,"AD1_hg19.bed", sep = '\t',col.names=FALSE)

### Read in the hg38 .bed (chrom, start, end, id) and merge it back to the original summary statistics
All AD imaging GWAS summary statistics share the same variant positions, so the file lifted from AD_SurrealGAN_1 is reused for every AD dimension.
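Because the AD dimensions share one variant set, the per-dimension merges below could also be written once and applied in a loop. This sketch uses toy stand-ins built from the first two AD1/AD2 rows shown in this notebook. It uses base R `lapply()` and `merge()` rather than the dplyr calls in the cells, and `lift_all` is a hypothetical name.

```r
# Hypothetical helper: attach the shared hg38 coordinates to every dimension's summary statistics
lift_all <- function(dims, lifted, id = "ID", old_coords = c("#CHROM", "POS")) {
  lapply(dims, function(ss) {
    kept <- ss[, setdiff(names(ss), old_coords), drop = FALSE]  # drop hg19 coordinates
    merge(kept, lifted, by = id)                                # inner join on the variant ID
  })
}

# Toy stand-ins (check.names = FALSE keeps the PLINK-style "#CHROM" header)
make_dim <- function(beta) data.frame(`#CHROM` = c(1L, 1L), POS = c(717587L, 719854L),
  ID = c("rs144155419", "1:719854_CAG_C"), BETA = beta, check.names = FALSE)
dims <- list(AD1 = make_dim(c(-0.000334046, -0.0975266)),
             AD2 = make_dim(c(-0.0024721, -0.0232367)))
AD_hg38_toy <- data.frame(`#CHROM` = c(1L, 1L), POS = c(782207L, 784474L),
                          ID = c("rs144155419", "1:719854_CAG_C"), check.names = FALSE)

lifted_dims <- lift_all(dims, AD_hg38_toy)
lifted_dims$AD2
```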

In [204]:
AD_hg38 = fread("/home/ubuntu/project/conversion/AD1_hg19.to_hg38.bed")

In [216]:
head(AD_hg38)

#CHROM,POS,ID
<int>,<int>,<chr>
1,782207,rs144155419
1,784474,1:719854_CAG_C
1,788511,rs2977670
1,788915,1:724295_TGGAAC_T
1,801309,rs181876450
1,817341,rs3131972


In [206]:
AD_hg38 = AD_hg38[,-3]

In [None]:
colnames(AD_hg38) <- c("#CHROM","POS","ID")

In [None]:
AD_hg38 = AD_hg38 |> mutate(`#CHROM` = gsub("chr", "", `#CHROM`))

In [217]:
AD_hg38 = AD_hg38 |> mutate(`#CHROM`=as.integer(`#CHROM`))

In [220]:
AD_hg38 = AD_hg38 |> filter(!is.na(`#CHROM`))

In [221]:
head(AD_hg38)
dim(AD_hg38)

#CHROM,POS,ID
<int>,<int>,<chr>
1,782207,rs144155419
1,784474,1:719854_CAG_C
1,788511,rs2977670
1,788915,1:724295_TGGAAC_T
1,801309,rs181876450
1,817341,rs3131972


In [222]:
unique(AD_hg38$`#CHROM`)

In [223]:
AD_remain = AD1 |> select(-`#CHROM`, -POS)
AD1_hg38_final = AD_remain |> inner_join(AD_hg38, by = 'ID')

In [224]:
head(AD1_hg38_final)
dim(AD1_hg38_final)

ID,REF,ALT,A1,TEST,OBS_CT,BETA,SE,T_STAT,P,ERRCODE,#CHROM,POS
<chr>,<chr>,<chr>,<chr>,<chr>,<int>,<dbl>,<dbl>,<dbl>,<dbl>,<chr>,<int>,<int>
rs144155419,G,A,A,ADD,31247,-0.000334046,0.0392128,-0.0085188,0.993203,.,1,782207
1:719854_CAG_C,CAG,C,C,ADD,31168,-0.0975266,0.0726473,-1.34247,0.179454,.,1,784474
rs2977670,C,G,G,ADD,31509,0.0313952,0.0222987,1.40794,0.15916,.,1,788511
1:724295_TGGAAC_T,TGGAAC,T,T,ADD,30942,0.0345953,0.0537308,0.643863,0.519669,.,1,788915
rs181876450,T,C,C,ADD,31104,-0.0480637,0.0579955,-0.828749,0.407253,.,1,801309
rs3131972,G,A,A,ADD,31929,0.00322884,0.0110722,0.291616,0.770582,.,1,817341


In [225]:
fwrite(AD1_hg38_final, "/home/ubuntu/project/image_QTL/hg38/dne_pheno_normalized_residualized.AD_SurrealGAN_1.glm.linear_hg38.gz", sep = '\t')

## AD2

### Merge

In [226]:
AD2 = fread("~/data/GWAS/image_GWAS/dne_pheno_normalized_residualized.AD_SurrealGAN_2.glm.linear")

In [228]:
head(AD2)

#CHROM,POS,ID,REF,ALT,A1,TEST,OBS_CT,BETA,SE,T_STAT,P,ERRCODE
<int>,<int>,<chr>,<chr>,<chr>,<chr>,<chr>,<int>,<dbl>,<dbl>,<dbl>,<dbl>,<chr>
1,717587,rs144155419,G,A,A,ADD,31247,-0.0024721,0.0391868,-0.063085,0.949699,.
1,719854,1:719854_CAG_C,CAG,C,C,ADD,31168,-0.0232367,0.0725512,-0.32028,0.748758,.
1,723891,rs2977670,C,G,G,ADD,31509,-0.0364338,0.022301,-1.63373,0.102326,.
1,724295,1:724295_TGGAAC_T,TGGAAC,T,T,ADD,30942,0.0426242,0.0537116,0.793575,0.427449,.
1,736689,rs181876450,T,C,C,ADD,31104,0.0651785,0.0580388,1.12302,0.261439,.
1,752721,rs3131972,G,A,A,ADD,31929,-0.00897978,0.0110721,-0.811024,0.417358,.


In [240]:
AD2_remain = AD2 |> select(-`#CHROM`, -POS)
AD2_hg38_final = AD2_remain |> inner_join(AD_hg38, by = 'ID')   # reuse the coordinates lifted from AD1

In [241]:
head(AD2_hg38_final)
dim(AD2_hg38_final)

ID,REF,ALT,A1,TEST,OBS_CT,BETA,SE,T_STAT,P,ERRCODE,#CHROM,POS
<chr>,<chr>,<chr>,<chr>,<chr>,<int>,<dbl>,<dbl>,<dbl>,<dbl>,<chr>,<int>,<int>
rs144155419,G,A,A,ADD,31247,-0.0024721,0.0391868,-0.063085,0.949699,.,1,782207
1:719854_CAG_C,CAG,C,C,ADD,31168,-0.0232367,0.0725512,-0.32028,0.748758,.,1,784474
rs2977670,C,G,G,ADD,31509,-0.0364338,0.022301,-1.63373,0.102326,.,1,788511
1:724295_TGGAAC_T,TGGAAC,T,T,ADD,30942,0.0426242,0.0537116,0.793575,0.427449,.,1,788915
rs181876450,T,C,C,ADD,31104,0.0651785,0.0580388,1.12302,0.261439,.,1,801309
rs3131972,G,A,A,ADD,31929,-0.00897978,0.0110721,-0.811024,0.417358,.,1,817341


In [242]:
fwrite(AD2_hg38_final, "/home/ubuntu/project/image_QTL/hg38/dne_pheno_normalized_residualized.AD_SurrealGAN_2.glm.linear_hg38.gz", sep = '\t')

## Aging 1

### format the input hg19 .bed for aging

In [243]:
aging1 = fread("~/data/GWAS/image_GWAS/surrealgan_aging_pheno_normalized_residualized.r1.glm.linear")

In [248]:
head(aging1)

#CHROM,POS,ID,REF,ALT,A1,TEST,OBS_CT,BETA,SE,T_STAT,P,ERRCODE
<int>,<int>,<chr>,<chr>,<chr>,<chr>,<chr>,<int>,<dbl>,<dbl>,<dbl>,<dbl>,<chr>
1,717587,rs144155419,G,A,A,ADD,32139,0.0387506,0.0384495,1.00783,0.313543,.
1,719854,1:719854_CAG_C,CAG,C,C,ADD,32051,-0.00561508,0.071265,-0.0787915,0.937199,.
1,725401,rs553642122,C,T,T,ADD,32156,-0.0212315,0.0709291,-0.299334,0.764687,.
1,736689,rs181876450,T,C,C,ADD,31985,-0.0461184,0.0570188,-0.808827,0.41862,.
1,746211,rs201075335,A,AG,AG,ADD,31857,0.00762611,0.0242194,0.314876,0.752858,.
1,751343,rs28544273,T,A,A,ADD,32270,-0.00561255,0.0120832,-0.464492,0.642298,.


In [245]:

aging1_needed = aging1 |> mutate(chrom = paste0("chr",`#CHROM`), start = POS, end = POS, region_id = ID) |> select(chrom, start, end, region_id)

In [246]:
unique(aging1_needed$chrom)

In [249]:
head(aging1_needed)
dim(aging1_needed)

chrom,start,end,region_id
<chr>,<int>,<int>,<chr>
chr1,717587,717587,rs144155419
chr1,719854,719854,1:719854_CAG_C
chr1,725401,725401,rs553642122
chr1,736689,736689,rs181876450
chr1,746211,746211,rs201075335
chr1,751343,751343,rs28544273


In [164]:
fwrite(aging1_needed,"aging1_hg19.bed", sep = '\t',col.names=FALSE)

### Read in the hg38 .bed (chrom, start, end, id) and merge it back to the original summary statistics
All aging imaging GWAS summary statistics share the same variant positions, so the file lifted from r1 is reused for every aging dimension.
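Reusing one lifted file relies on every aging dimension having exactly the same variants. The AD and aging sets, by contrast, differ: `rs2977670` appears in the AD head and `rs553642122` in the aging head, which is why each set was lifted separately. The following sketch shows a hypothetical check that compares ID and position pairs before a lifted file is reused.

```r
# Hypothetical helper: TRUE if two tables contain the same (ID, POS) pairs, in any order
same_variants <- function(a, b, id = "ID", pos = "POS") {
  key_a <- paste(a[[id]], a[[pos]], sep = "@")
  key_b <- paste(b[[id]], b[[pos]], sep = "@")
  length(key_a) == length(key_b) && setequal(key_a, key_b)
}

# Toy tables based on the heads shown in this notebook
aging1_toy <- data.frame(ID = c("rs144155419", "rs553642122"), POS = c(717587L, 725401L))
aging2_toy <- aging1_toy[2:1, ]                      # same variants, different row order
ad1_toy    <- data.frame(ID = c("rs144155419", "rs2977670"), POS = c(717587L, 723891L))

same_variants(aging1_toy, aging2_toy)
same_variants(aging1_toy, ad1_toy)
```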

In [293]:
aging_hg38 = fread("/home/ubuntu/project/conversion/aging1_hg19.to_hg38.bed")

In [294]:
head(aging_hg38)

V1,V2,V3,V4
<chr>,<int>,<int>,<chr>
chr1,782207,782207,rs144155419
chr1,784474,784474,1:719854_CAG_C
chr1,790021,790021,rs553642122
chr1,801309,801309,rs181876450
chr1,810831,810831,rs201075335
chr1,815963,815963,rs28544273


In [295]:
aging_hg38 = aging_hg38[,-3]

In [296]:
colnames(aging_hg38) <- c("#CHROM","POS","ID")

In [297]:
aging_hg38 = aging_hg38 |> mutate(`#CHROM` = gsub("chr", "", `#CHROM`))

In [298]:
aging_hg38 = aging_hg38 |> mutate(`#CHROM`=as.integer(`#CHROM`))

ℹ In argument: `#CHROM = as.integer(`#CHROM`)`.
! NAs introduced by coercion


In [299]:
aging_hg38 = aging_hg38 |> filter(!is.na(`#CHROM`))

In [300]:
head(aging_hg38)
dim(aging_hg38)

#CHROM,POS,ID
<int>,<int>,<chr>
1,782207,rs144155419
1,784474,1:719854_CAG_C
1,790021,rs553642122
1,801309,rs181876450
1,810831,rs201075335
1,815963,rs28544273


In [301]:
unique(aging_hg38$`#CHROM`)

In [260]:
aging1_remain = aging1 |> select(-`#CHROM`, -POS)
aging1_hg38_final = aging1_remain |> inner_join(aging_hg38, by = 'ID')

In [261]:
head(aging1_hg38_final)
dim(aging1_hg38_final)

ID,REF,ALT,A1,TEST,OBS_CT,BETA,SE,T_STAT,P,ERRCODE,#CHROM,POS
<chr>,<chr>,<chr>,<chr>,<chr>,<int>,<dbl>,<dbl>,<dbl>,<dbl>,<chr>,<int>,<int>
rs144155419,G,A,A,ADD,32139,0.0387506,0.0384495,1.00783,0.313543,.,1,782207
1:719854_CAG_C,CAG,C,C,ADD,32051,-0.00561508,0.071265,-0.0787915,0.937199,.,1,784474
rs553642122,C,T,T,ADD,32156,-0.0212315,0.0709291,-0.299334,0.764687,.,1,790021
rs181876450,T,C,C,ADD,31985,-0.0461184,0.0570188,-0.808827,0.41862,.,1,801309
rs201075335,A,AG,AG,ADD,31857,0.00762611,0.0242194,0.314876,0.752858,.,1,810831
rs28544273,T,A,A,ADD,32270,-0.00561255,0.0120832,-0.464492,0.642298,.,1,815963


In [262]:
fwrite(aging1_hg38_final, "/home/ubuntu/project/image_QTL/hg38/surrealgan_aging_pheno_normalized_residualized.r1.glm.linear_hg38.gz", sep = '\t')

## aging2

In [263]:
aging2 = fread("~/data/GWAS/image_GWAS/surrealgan_aging_pheno_normalized_residualized.r2.glm.linear")

In [105]:
head(aging2)

#CHROM,POS,ID,REF,ALT,A1,TEST,OBS_CT,BETA,SE,T_STAT,P,ERRCODE
<int>,<int>,<chr>,<chr>,<chr>,<chr>,<chr>,<int>,<dbl>,<dbl>,<dbl>,<dbl>,<chr>
1,717587,rs144155419,G,A,A,ADD,32139,0.031892,0.0384615,0.829192,0.407002,.
1,719854,1:719854_CAG_C,CAG,C,C,ADD,32051,-0.0190691,0.0713036,-0.267435,0.789136,.
1,725401,rs553642122,C,T,T,ADD,32156,-0.15762,0.0709128,-2.22273,0.0262407,.
1,736689,rs181876450,T,C,C,ADD,31985,0.0817224,0.0570835,1.43163,0.152259,.
1,746211,rs201075335,A,AG,AG,ADD,31857,-0.0145913,0.0242313,-0.602165,0.547069,.
1,751343,rs28544273,T,A,A,ADD,32270,0.000760252,0.0120759,0.062956,0.949802,.


In [274]:
aging2_remain = aging2 |> select(-`#CHROM`, -POS)
aging2_hg38_final = aging2_remain |> inner_join(aging_hg38, by = 'ID')

In [275]:
head(aging2_hg38_final)
dim(aging2_hg38_final)

ID,REF,ALT,A1,TEST,OBS_CT,BETA,SE,T_STAT,P,ERRCODE,#CHROM,POS
<chr>,<chr>,<chr>,<chr>,<chr>,<int>,<dbl>,<dbl>,<dbl>,<dbl>,<chr>,<int>,<int>
rs144155419,G,A,A,ADD,32139,0.031892,0.0384615,0.829192,0.407002,.,1,782207
1:719854_CAG_C,CAG,C,C,ADD,32051,-0.0190691,0.0713036,-0.267435,0.789136,.,1,784474
rs553642122,C,T,T,ADD,32156,-0.15762,0.0709128,-2.22273,0.0262407,.,1,790021
rs181876450,T,C,C,ADD,31985,0.0817224,0.0570835,1.43163,0.152259,.,1,801309
rs201075335,A,AG,AG,ADD,31857,-0.0145913,0.0242313,-0.602165,0.547069,.,1,810831
rs28544273,T,A,A,ADD,32270,0.000760252,0.0120759,0.062956,0.949802,.,1,815963


In [276]:
fwrite(aging2_hg38_final, "/home/ubuntu/project/image_QTL/hg38/surrealgan_aging_pheno_normalized_residualized.r2.glm.linear_hg38.gz", sep = '\t')

## aging3

### Merge

In [277]:
aging3 = fread("~/data/GWAS/image_GWAS/surrealgan_aging_pheno_normalized_residualized.r3.glm.linear")

In [278]:
head(aging3)

#CHROM,POS,ID,REF,ALT,A1,TEST,OBS_CT,BETA,SE,T_STAT,P,ERRCODE
<int>,<int>,<chr>,<chr>,<chr>,<chr>,<chr>,<int>,<dbl>,<dbl>,<dbl>,<dbl>,<chr>
1,717587,rs144155419,G,A,A,ADD,32139,0.0265442,0.0384743,0.68992,0.49025,.
1,719854,1:719854_CAG_C,CAG,C,C,ADD,32051,-0.130971,0.071376,-1.83494,0.0665241,.
1,725401,rs553642122,C,T,T,ADD,32156,-0.0547216,0.0709278,-0.771511,0.44041,.
1,736689,rs181876450,T,C,C,ADD,31985,0.010966,0.0571502,0.191881,0.847837,.
1,746211,rs201075335,A,AG,AG,ADD,31857,0.0174368,0.0242042,0.720404,0.471282,.
1,751343,rs28544273,T,A,A,ADD,32270,-0.00256608,0.0120756,-0.212502,0.831717,.


In [288]:
aging3_remain = aging3 |> select(-`#CHROM`, -POS)
aging3_hg38_final = aging3_remain |> inner_join(aging_hg38, by = 'ID')

```r
head(aging3_hg38_final)
dim(aging3_hg38_final)
```

```
ID,REF,ALT,A1,TEST,OBS_CT,BETA,SE,T_STAT,P,ERRCODE,#CHROM,POS
<chr>,<chr>,<chr>,<chr>,<chr>,<int>,<dbl>,<dbl>,<dbl>,<dbl>,<chr>,<int>,<int>
rs144155419,G,A,A,ADD,32139,0.0265442,0.0384743,0.68992,0.49025,.,1,782207
1:719854_CAG_C,CAG,C,C,ADD,32051,-0.130971,0.071376,-1.83494,0.0665241,.,1,784474
rs553642122,C,T,T,ADD,32156,-0.0547216,0.0709278,-0.771511,0.44041,.,1,790021
rs181876450,T,C,C,ADD,31985,0.010966,0.0571502,0.191881,0.847837,.,1,801309
rs201075335,A,AG,AG,ADD,31857,0.0174368,0.0242042,0.720404,0.471282,.,1,810831
rs28544273,T,A,A,ADD,32270,-0.00256608,0.0120756,-0.212502,0.831717,.,1,815963
```


```r
fwrite(aging3_hg38_final, "/home/ubuntu/project/image_QTL/hg38/surrealgan_aging_pheno_normalized_residualized.r3.glm.linear_hg38.gz", sep = '\t')
```

## aging4

### Merge

```r
aging4 = fread("~/data/GWAS/image_GWAS/surrealgan_aging_pheno_normalized_residualized.r4.glm.linear")
```

```r
aging4_remain = aging4 |> select(-`#CHROM`, -POS)
aging4_hg38_final = aging4_remain |> inner_join(aging_hg38, by = 'ID')
```

```r
head(aging4_hg38_final)
dim(aging4_hg38_final)
```

```
ID,REF,ALT,A1,TEST,OBS_CT,BETA,SE,T_STAT,P,ERRCODE,#CHROM,POS
<chr>,<chr>,<chr>,<chr>,<chr>,<int>,<dbl>,<dbl>,<dbl>,<dbl>,<chr>,<int>,<int>
rs144155419,G,A,A,ADD,32139,0.070446,0.0384655,1.83141,0.0670486,.,1,782207
1:719854_CAG_C,CAG,C,C,ADD,32051,-0.0217218,0.0712553,-0.304844,0.760487,.,1,784474
rs553642122,C,T,T,ADD,32156,0.119808,0.0709338,1.689,0.0912282,.,1,790021
rs181876450,T,C,C,ADD,31985,0.0088507,0.0570479,0.155145,0.876708,.,1,801309
rs201075335,A,AG,AG,ADD,31857,0.0253933,0.0242232,1.0483,0.294506,.,1,810831
rs28544273,T,A,A,ADD,32270,-0.00208279,0.0120782,-0.172442,0.863091,.,1,815963
```


```r
fwrite(aging4_hg38_final, "/home/ubuntu/project/image_QTL/hg38/surrealgan_aging_pheno_normalized_residualized.r4.glm.linear_hg38.gz", sep = '\t')
```

## aging5

```r
aging5 = fread("~/data/GWAS/image_GWAS/surrealgan_aging_pheno_normalized_residualized.r5.glm.linear")
```

```r
aging5_remain = aging5 |> select(-`#CHROM`, -POS)
aging5_hg38_final = aging5_remain |> inner_join(aging_hg38, by = 'ID')
```

```r
head(aging5_hg38_final)
dim(aging5_hg38_final)
```

```
ID,REF,ALT,A1,TEST,OBS_CT,BETA,SE,T_STAT,P,ERRCODE,#CHROM,POS
<chr>,<chr>,<chr>,<chr>,<chr>,<int>,<dbl>,<dbl>,<dbl>,<dbl>,<chr>,<int>,<int>
rs144155419,G,A,A,ADD,32139,0.0124214,0.0384705,0.322883,0.746786,.,1,782207
1:719854_CAG_C,CAG,C,C,ADD,32051,-0.0194333,0.071249,-0.272751,0.785046,.,1,784474
rs553642122,C,T,T,ADD,32156,-0.0183928,0.0709402,-0.259272,0.795427,.,1,790021
rs181876450,T,C,C,ADD,31985,0.0692841,0.0570195,1.21509,0.224339,.,1,801309
rs201075335,A,AG,AG,ADD,31857,0.066206,0.024197,2.73612,0.00622025,.,1,810831
rs28544273,T,A,A,ADD,32270,-0.00975134,0.012071,-0.807833,0.419193,.,1,815963
```


```r
fwrite(aging5_hg38_final, "/home/ubuntu/project/image_QTL/hg38/surrealgan_aging_pheno_normalized_residualized.r5.glm.linear_hg38.gz", sep = '\t')
```
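The aging cells above repeat the same two steps for every dimension: drop the hg19 `#CHROM`/`POS` columns, then inner-join the lifted coordinates by `ID`. A minimal sketch of folding that into one function follows. `lift_sumstats()` is a hypothetical name; it uses base R `merge()` (an inner join by default) instead of `dplyr::inner_join()` so it needs no extra packages, and the loop in the comments assumes the `r1`..`r5` file naming seen in these cells.

```r
# Hypothetical helper: replace hg19 coordinates with lifted hg38 ones.
# `sumstats` needs `#CHROM`, POS and ID; `lifted` needs `#CHROM`, POS (hg38) and ID.
lift_sumstats <- function(sumstats, lifted) {
  sumstats <- as.data.frame(sumstats)
  lifted <- as.data.frame(lifted)
  keep <- setdiff(names(sumstats), c("#CHROM", "POS"))
  # merge() defaults to all = FALSE, i.e. an inner join: variants without an
  # hg38 position (unmapped, or on dropped contigs) are removed
  merge(sumstats[, keep, drop = FALSE], lifted, by = "ID", sort = FALSE)
}

# Usage sketch (assumes data.table is loaded and aging_hg38 exists as above):
# for (r in 1:5) {
#   stem <- sprintf("surrealgan_aging_pheno_normalized_residualized.r%d.glm.linear", r)
#   ss <- fread(file.path("~/data/GWAS/image_GWAS", stem))
#   fwrite(lift_sumstats(ss, aging_hg38),
#          file.path("/home/ubuntu/project/image_QTL/hg38", paste0(stem, "_hg38.gz")),
#          sep = "\t")
# }
```

Keeping the join in one function means a fix (for example, a different chromosome filter) is applied to all dimensions at once rather than edited in five cells.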

## Aging_maf

```r
Aging_maf = fread("~/project/surrealgan_aging_pheno_normalized_residualized.afreq.gz")
head(Aging_maf)
```

```
#CHROM,ID,REF,ALT,ALT_FREQS,OBS_CT
<int>,<chr>,<chr>,<chr>,<dbl>,<int>
1,rs144155419,G,A,0.0106568,64278
1,1:719854_CAG_C,CAG,C,0.00308883,64102
1,rs553642122,C,T,0.00310984,64312
1,rs181876450,T,C,0.00481476,63970
1,rs201075335,A,AG,0.028314,63714
1,rs28544273,T,A,0.121893,64540
```


```r
Aging_pos = fread("~/project/surrealgan_aging_pheno_normalized_residualized.r1.glm.linear.gz") |> select(-`#CHROM`, -OBS_CT, -ALT, -REF, -A1)
head(Aging_pos)
```

```
POS,ID,TEST,BETA,SE,T_STAT,P,ERRCODE
<int>,<chr>,<chr>,<dbl>,<dbl>,<dbl>,<dbl>,<chr>
717587,rs144155419,ADD,0.0387506,0.0384495,1.00783,0.313543,.
719854,1:719854_CAG_C,ADD,-0.00561508,0.071265,-0.0787915,0.937199,.
725401,rs553642122,ADD,-0.0212315,0.0709291,-0.299334,0.764687,.
736689,rs181876450,ADD,-0.0461184,0.0570188,-0.808827,0.41862,.
746211,rs201075335,ADD,0.00762611,0.0242194,0.314876,0.752858,.
751343,rs28544273,ADD,-0.00561255,0.0120832,-0.464492,0.642298,.
```


```r
dim(Aging_maf)
dim(Aging_pos)
```

```r
Aging_full = merge(Aging_maf, Aging_pos, by = "ID")
head(Aging_full)
dim(Aging_full)
```

```
ID,#CHROM,REF,ALT,ALT_FREQS,OBS_CT,POS,TEST,BETA,SE,T_STAT,P,ERRCODE
<chr>,<int>,<chr>,<chr>,<dbl>,<int>,<int>,<chr>,<dbl>,<dbl>,<dbl>,<dbl>,<chr>
10:100014847_CT_C,10,C,CT,0.361328,65464,100014847,ADD,-0.0221069,0.00815256,-2.71165,0.00669835,.
10:100038800_TTTTTG_T,10,T,TTTTTG,0.0522148,64158,100038800,ADD,0.0167791,0.0177239,0.946695,0.343802,.
10:10005683_TATA_T,10,T,TATA,0.064921,64540,10005683,ADD,0.00175404,0.0159961,0.109654,0.912684,.
10:100057146_AG_A,10,AG,A,0.0227427,64636,100057146,ADD,-0.0667994,0.0263679,-2.53336,0.0113023,.
10:100083551_CTTTCTT_C,10,CTTTCTT,C,0.00824576,64518,100083551,ADD,0.0494061,0.0434806,1.13628,0.255848,.
10:100090169_CTGCAGAAGA_C,10,CTGCAGAAGA,C,0.217569,65446,100090169,ADD,0.0180947,0.00949446,1.90582,0.0566827,.
```


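```r
# Build the liftOver input: chromosome names must match the UCSC chain file
# (chr1, chr2, ...), and the variant ID rides along as the BED name column
Aging_full_needed = Aging_full |>
  mutate(chrom = paste0("chr", `#CHROM`), start = POS, end = POS, region_id = ID) |>
  select(chrom, start, end, region_id)
```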


```r
head(Aging_full_needed)
dim(Aging_full_needed)
```

```
chrom,start,end,region_id
<chr>,<int>,<int>,<chr>
chr10,100014847,100014847,10:100014847_CT_C
chr10,100038800,100038800,10:100038800_TTTTTG_T
chr10,10005683,10005683,10:10005683_TATA_T
chr10,100057146,100057146,10:100057146_AG_A
chr10,100083551,100083551,10:100083551_CTTTCTT_C
chr10,100090169,100090169,10:100090169_CTGCAGAAGA_C
```


```r
# Headerless, tab-separated BED file for liftOver
fwrite(Aging_full_needed, "Aging_full_hg19.bed", sep = '\t', col.names = FALSE)
```
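BED intervals are 0-based and half-open, so the conventional encoding of the single base at 1-based position `POS` is `start = POS - 1`, `end = POS`. The cell above instead writes `start = end = POS`, and the rest of the notebook reads the hg38 position back from the start column (`V2`). The sketch below shows the conventional round trip; `to_bed()` and `from_bed()` are hypothetical helpers, and under this encoding the hg38 `POS` has to come from the end column (`V3`), not `V2`.

```r
# Sketch: encode 1-based PLINK/VCF positions as standard BED intervals.
# BED is 0-based, half-open: the base at 1-based POS is [POS - 1, POS).
to_bed <- function(chrom, pos, id) {
  data.frame(chrom = paste0("chr", chrom),  # UCSC chain files use chr-prefixed names
             start = as.integer(pos) - 1L,
             end = as.integer(pos),
             region_id = id,
             stringsAsFactors = FALSE)
}

# Decode liftOver output read with fread()/read.delim() (columns V1..V4):
# for a single-base interval, the 1-based hg38 position is the end column.
from_bed <- function(bed) {
  data.frame(chrom = sub("^chr", "", bed[[1]]),
             POS = as.integer(bed[[3]]),
             ID = bed[[4]],
             stringsAsFactors = FALSE)
}

# Writing it without a header (hypothetical file name):
# write.table(to_bed(chr, pos, id), "variants_hg19.bed", sep = "\t",
#             quote = FALSE, row.names = FALSE, col.names = FALSE)
```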

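### Read in the hg38 .bed and merge it back to the original summary statistics

The liftOver output has four columns (chrom, start, end, region_id). Because start and end are identical here, the end column is dropped, leaving chrom, POS and ID. All aging image GWAS summary statistics share the same variant positions, so a single lifted-over file serves every dimension.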

```r
Aging_hg38 = fread("/home/ubuntu/project/conversion/aging1_hg19.to_hg38.bed")
```

```r
head(Aging_hg38)
dim(Aging_hg38)
```

```
V1,V2,V3,V4
<chr>,<int>,<int>,<chr>
chr1,782207,782207,rs144155419
chr1,784474,784474,1:719854_CAG_C
chr1,790021,790021,rs553642122
chr1,801309,801309,rs181876450
chr1,810831,810831,rs201075335
chr1,815963,815963,rs28544273
```


```r
# Drop the end column (V3); start and end are identical, so V2 serves as the hg38 POS
Aging_hg38 = Aging_hg38[, -3]
```

```r
colnames(Aging_hg38) <- c("#CHROM","POS","ID")
```

```r
# Strip the leading "chr" to match the PLINK-style #CHROM column
Aging_hg38 = Aging_hg38 |> mutate(`#CHROM` = gsub("^chr", "", `#CHROM`))
```

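```r
# Non-numeric contigs (X, Y, M and the alt/random/Un scaffolds) cannot be
# coerced and become NA; they are removed in the next cell
Aging_hg38 = Aging_hg38 |> mutate(`#CHROM`=as.integer(`#CHROM`))
```

```
ℹ In argument: `#CHROM = as.integer(`#CHROM`)`.
! NAs introduced by coercion
```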


```r
# Keep autosomes only (chromosomes 1-22)
Aging_hg38 = Aging_hg38 |> filter(!is.na(`#CHROM`))
```
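The cells above reach "chr1-22 only" indirectly: stripping `chr` and coercing to integer turns every non-numeric contig into `NA`, which the filter then drops (hence the coercion warning). A sketch of the same selection stated explicitly, as a regular expression on the raw liftOver chromosome names; `keep_autosomes()` is a hypothetical helper, and the example contig names are the ones quoted in the Procedures note.

```r
# Sketch: TRUE only for chr1..chr22. Anchoring with ^ and $ rejects alt/random
# scaffolds such as chr14_GL000009v2_random, as well as chrX, chrY and chrM.
keep_autosomes <- function(chrom) {
  grepl("^chr([1-9]|1[0-9]|2[0-2])$", chrom)
}

keep_autosomes(c("chr1", "chr22", "chrX",
                 "chr14_GL000009v2_random", "chr19_KI270938v1_alt"))

# Usage on the raw fread() output, before renaming the columns:
# Aging_hg38 <- Aging_hg38[keep_autosomes(V1)]
```

Filtering on the name itself makes the intent visible and avoids the coercion warning, while producing the same autosome-only set.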

```r
head(Aging_hg38)
dim(Aging_hg38)
```

```
#CHROM,POS,ID
<int>,<int>,<chr>
1,782207,rs144155419
1,784474,1:719854_CAG_C
1,790021,rs553642122
1,801309,rs181876450
1,810831,rs201075335
1,815963,rs28544273
```


```r
# Confirm that only chromosomes 1-22 remain
unique(Aging_hg38$`#CHROM`)
```

```r
Aging_remain = Aging_full |> select(-`#CHROM`, -POS)
Aging_hg38_final = Aging_remain |> inner_join(Aging_hg38, by = 'ID')
```
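The `inner_join()` silently drops variants that have no hg38 position. liftOver already writes unlifted records to the unMapped file passed as its last argument, but a count on the R side, together with a check that no ID appears twice in the lifted table (which would duplicate rows in the join), is a cheap sanity check. `report_unmapped()` is a hypothetical helper.

```r
# Hypothetical QC helper: count variants present in the hg19 table but absent
# from the lifted hg38 table, and warn about duplicated IDs in the lifted table.
report_unmapped <- function(hg19_ids, hg38_ids) {
  if (anyDuplicated(hg38_ids) > 0) {
    warning("duplicated IDs in the lifted table; an inner join would duplicate rows")
  }
  ids <- unique(hg19_ids)
  lost <- setdiff(ids, hg38_ids)
  message(sprintf("%d of %d variants (%.2f%%) have no hg38 position",
                  length(lost), length(ids), 100 * length(lost) / length(ids)))
  invisible(lost)  # the IDs that were dropped, for further inspection
}

# e.g. lost <- report_unmapped(Aging_full$ID, Aging_hg38$ID)
```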

```r
# Put the lookup key columns (position, ID, alleles, frequency) first
Aging_hg38_maf_lookup = Aging_hg38_final |> select(`#CHROM`, POS, ID, REF, ALT, ALT_FREQS, everything())
```

```r
head(Aging_hg38_maf_lookup)
```

```
#CHROM,POS,ID,REF,ALT,ALT_FREQS,OBS_CT,TEST,BETA,SE,T_STAT,P,ERRCODE
<int>,<int>,<chr>,<chr>,<chr>,<dbl>,<int>,<chr>,<dbl>,<dbl>,<dbl>,<dbl>,<chr>
10,98255090,10:100014847_CT_C,C,CT,0.361328,65464,ADD,-0.0221069,0.00815256,-2.71165,0.00669835,.
10,98279043,10:100038800_TTTTTG_T,T,TTTTTG,0.0522148,64158,ADD,0.0167791,0.0177239,0.946695,0.343802,.
10,9963720,10:10005683_TATA_T,T,TATA,0.064921,64540,ADD,0.00175404,0.0159961,0.109654,0.912684,.
10,98297389,10:100057146_AG_A,AG,A,0.0227427,64636,ADD,-0.0667994,0.0263679,-2.53336,0.0113023,.
10,98323794,10:100083551_CTTTCTT_C,CTTTCTT,C,0.00824576,64518,ADD,0.0494061,0.0434806,1.13628,0.255848,.
10,98330412,10:100090169_CTGCAGAAGA_C,CTGCAGAAGA,C,0.217569,65446,ADD,0.0180947,0.00949446,1.90582,0.0566827,.
```


```r
fwrite(Aging_hg38_maf_lookup, "Aging_hg38_maf_lookup.tsv.gz", sep='\t')
```
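The lookup stores PLINK2's `ALT_FREQS` (the frequency of the ALT allele) rather than a minor allele frequency. If a downstream step needs MAF proper, it is the frequency of the less common allele, `MAF = min(p, 1 - p)`, which does not depend on which allele is labelled ALT. Whether `annotate_maf.sh` expects `ALT_FREQS` or MAF is not stated here, so this is only an optional sketch; `alt_freq_to_maf()` is a hypothetical name.

```r
# Sketch: minor allele frequency from an ALT allele frequency vector.
# pmin() works element-wise; NA frequencies stay NA.
alt_freq_to_maf <- function(alt_freq) {
  pmin(alt_freq, 1 - alt_freq)
}

alt_freq_to_maf(c(0.0106568, 0.121893, 0.9))

# e.g. Aging_hg38_maf_lookup$MAF <- alt_freq_to_maf(Aging_hg38_maf_lookup$ALT_FREQS)
```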

## AD_maf

```r
AD_maf = fread("~/project/SmileGAN_scores_pheno_normalized_residualized.afreq.gz")
head(AD_maf)
```

```
#CHROM,ID,REF,ALT,ALT_FREQS,OBS_CT
<int>,<chr>,<chr>,<chr>,<dbl>,<int>
1,rs144155419,G,A,0.0105855,65656
1,1:719854_CAG_C,CAG,C,0.00310018,65480
1,rs553642122,C,T,0.00305936,65700
1,rs181876450,T,C,0.00486611,65350
1,rs201075335,A,AG,0.0283129,65094
1,rs28544273,T,A,0.121569,65938
```


```r
AD_pos = fread("~/project/dne_pheno_normalized_residualized.AD_SurrealGAN_1.glm.linear.gz") |> select(-`#CHROM`, -OBS_CT, -ALT, -REF, -A1)
head(AD_pos)
```

```
POS,ID,TEST,BETA,SE,T_STAT,P,ERRCODE
<int>,<chr>,<chr>,<dbl>,<dbl>,<dbl>,<dbl>,<chr>
717587,rs144155419,ADD,-0.000334046,0.0392128,-0.0085188,0.993203,.
719854,1:719854_CAG_C,ADD,-0.0975266,0.0726473,-1.34247,0.179454,.
723891,rs2977670,ADD,0.0313952,0.0222987,1.40794,0.15916,.
724295,1:724295_TGGAAC_T,ADD,0.0345953,0.0537308,0.643863,0.519669,.
736689,rs181876450,ADD,-0.0480637,0.0579955,-0.828749,0.407253,.
752721,rs3131972,ADD,0.00322884,0.0110722,0.291616,0.770582,.
```


```r
dim(AD_maf)
dim(AD_pos)
```

```r
AD_full = merge(AD_maf, AD_pos, by = "ID")
head(AD_full)
dim(AD_full)
```

```
ID,#CHROM,REF,ALT,ALT_FREQS,OBS_CT,POS,TEST,BETA,SE,T_STAT,P,ERRCODE
<chr>,<int>,<chr>,<chr>,<dbl>,<int>,<int>,<chr>,<dbl>,<dbl>,<dbl>,<dbl>,<chr>
10:100014847_CT_C,10,C,CT,0.361349,66880,100014847,ADD,-0.00108255,0.00827278,-0.130857,0.895889,.
10:10005683_TATA_T,10,T,TATA,0.0648665,65920,10005683,ADD,-0.0228248,0.0162273,-1.40657,0.159566,.
10:100057146_AG_A,10,AG,A,0.0227018,66030,100057146,ADD,0.0276296,0.0267828,1.03162,0.302259,.
10:100083551_CTTTCTT_C,10,CTTTCTT,C,0.00822358,65908,100083551,ADD,-0.0101553,0.0443864,-0.228793,0.819031,.
10:100090169_CTGCAGAAGA_C,10,CTGCAGAAGA,C,0.217463,66862,100090169,ADD,0.0120323,0.00963658,1.24861,0.211817,.
10:100104300_TC_T,10,TC,T,0.301655,66354,100104300,ADD,0.0133979,0.00867733,1.54401,0.122596,.
```
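`AD_pos` (like `Aging_pos`) drops `REF`, `ALT` and `A1` from the `.glm.linear` file and keeps the alleles from the `.afreq` file. Both are PLINK2 outputs, presumably from the same genotype data, so their alleles are expected to agree; still, a quick check before discarding the glm alleles costs little. A minimal sketch, with `count_allele_mismatch()` as a hypothetical helper.

```r
# Hypothetical check: number of shared IDs whose REF/ALT differ between a
# PLINK2 .afreq table and a .glm.linear table (both need ID, REF and ALT).
count_allele_mismatch <- function(afreq, glm) {
  cols <- c("ID", "REF", "ALT")
  m <- merge(as.data.frame(afreq)[, cols], as.data.frame(glm)[, cols],
             by = "ID", suffixes = c(".afreq", ".glm"))
  # as.character() guards against factor columns with different level sets
  sum(as.character(m$REF.afreq) != as.character(m$REF.glm) |
      as.character(m$ALT.afreq) != as.character(m$ALT.glm))
}

# e.g. (re-reading the glm file, this time keeping its allele columns):
# glm <- fread("~/project/dne_pheno_normalized_residualized.AD_SurrealGAN_1.glm.linear.gz")
# count_allele_mismatch(AD_maf, glm)
```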


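### Read in the hg38 .bed and merge it back to the original summary statistics

As for the aging data, the end column of the four-column liftOver output (chrom, start, end, region_id) is dropped, leaving chrom, POS and ID. All AD image GWAS summary statistics share the same variant positions, so a single lifted-over file serves every dimension.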

```r
AD_hg38 = fread("/home/ubuntu/project/conversion/AD1_hg19.to_hg38.bed")
```

```r
head(AD_hg38)
dim(AD_hg38)
```

```
V1,V2,V3,V4
<chr>,<int>,<int>,<chr>
chr1,782207,782207,rs144155419
chr1,784474,784474,1:719854_CAG_C
chr1,788511,788511,rs2977670
chr1,788915,788915,1:724295_TGGAAC_T
chr1,801309,801309,rs181876450
chr1,817341,817341,rs3131972
```


```r
# Drop the end column (V3); start and end are identical, so V2 serves as the hg38 POS
AD_hg38 = AD_hg38[, -3]
```

```r
colnames(AD_hg38) <- c("#CHROM","POS","ID")
```

```r
# Strip the leading "chr" to match the PLINK-style #CHROM column
AD_hg38 = AD_hg38 |> mutate(`#CHROM` = gsub("^chr", "", `#CHROM`))
```

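```r
# Non-numeric contigs (X, Y, M and the alt/random/Un scaffolds) cannot be
# coerced and become NA; they are removed in the next cell
AD_hg38 = AD_hg38 |> mutate(`#CHROM`=as.integer(`#CHROM`))
```

```
ℹ In argument: `#CHROM = as.integer(`#CHROM`)`.
! NAs introduced by coercion
```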


```r
# Keep autosomes only (chromosomes 1-22)
AD_hg38 = AD_hg38 |> filter(!is.na(`#CHROM`))
```

```r
head(AD_hg38)
dim(AD_hg38)
```

```
#CHROM,POS,ID
<int>,<int>,<chr>
1,782207,rs144155419
1,784474,1:719854_CAG_C
1,788511,rs2977670
1,788915,1:724295_TGGAAC_T
1,801309,rs181876450
1,817341,rs3131972
```


```r
# Confirm that only chromosomes 1-22 remain
unique(AD_hg38$`#CHROM`)
```

```r
AD_remain = AD_full |> select(-`#CHROM`, -POS)
AD_hg38_final = AD_remain |> inner_join(AD_hg38, by = 'ID')
```
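The hg38 minus hg19 position shift tends to be constant within a region but differs between regions: every chr1 variant shown in these outputs moves by +64,620 bp, while the chr10 variants shown move by either -1,759,757 bp or -41,963 bp. A per-chromosome summary of the shift, plus a count of variants whose chromosome changed, is a coarse check that the join attached the intended coordinates. `shift_summary()` is a hypothetical helper; it expects `#CHROM`, `POS` and `ID` in both tables, as in `AD_full` and `AD_hg38`.

```r
# Hypothetical QC helper: per-chromosome summary of POS(hg38) - POS(hg19).
shift_summary <- function(hg19, hg38) {
  cols <- c("ID", "#CHROM", "POS")
  m <- merge(as.data.frame(hg19)[, cols], as.data.frame(hg38)[, cols],
             by = "ID", suffixes = c(".hg19", ".hg38"))
  m$shift <- m$POS.hg38 - m$POS.hg19
  m$chrom_changed <- as.character(m[["#CHROM.hg19"]]) != as.character(m[["#CHROM.hg38"]])
  by_chr <- split(m, m[["#CHROM.hg19"]])
  do.call(rbind, lapply(names(by_chr), function(chr) {
    d <- by_chr[[chr]]
    data.frame(chrom = chr, n = nrow(d),
               min_shift = min(d$shift), median_shift = median(d$shift),
               max_shift = max(d$shift), n_chrom_changed = sum(d$chrom_changed))
  }))
}

# e.g. shift_summary(AD_full, AD_hg38)
```

A wide min/max range is expected across a whole chromosome; the more useful signals are non-zero `n_chrom_changed` or a chromosome whose variants barely match at all.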

```r
# Put the lookup key columns (position, ID, alleles, frequency) first
AD_hg38_maf_lookup = AD_hg38_final |> select(`#CHROM`, POS, ID, REF, ALT, ALT_FREQS, everything())
```

```r
head(AD_hg38_maf_lookup)
```

```
#CHROM,POS,ID,REF,ALT,ALT_FREQS,OBS_CT,TEST,BETA,SE,T_STAT,P,ERRCODE
<int>,<int>,<chr>,<chr>,<chr>,<dbl>,<int>,<chr>,<dbl>,<dbl>,<dbl>,<dbl>,<chr>
10,98255090,10:100014847_CT_C,C,CT,0.361349,66880,ADD,-0.00108255,0.00827278,-0.130857,0.895889,.
10,9963720,10:10005683_TATA_T,T,TATA,0.0648665,65920,ADD,-0.0228248,0.0162273,-1.40657,0.159566,.
10,98297389,10:100057146_AG_A,AG,A,0.0227018,66030,ADD,0.0276296,0.0267828,1.03162,0.302259,.
10,98323794,10:100083551_CTTTCTT_C,CTTTCTT,C,0.00822358,65908,ADD,-0.0101553,0.0443864,-0.228793,0.819031,.
10,98330412,10:100090169_CTGCAGAAGA_C,CTGCAGAAGA,C,0.217463,66862,ADD,0.0120323,0.00963658,1.24861,0.211817,.
10,98344543,10:100104300_TC_T,TC,T,0.301655,66354,ADD,0.0133979,0.00867733,1.54401,0.122596,.
```


```r
fwrite(AD_hg38_maf_lookup, "AD_hg38_maf_lookup.tsv.gz", sep='\t')
```

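## Example commands for MAF annotation

`annotate_maf.sh` is called with four positional arguments: the hg38 MAF lookup table (`*_hg38_maf_lookup.tsv.gz`), two directories, and a quoted filename glob.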

```bash
bash annotate_maf.sh \
  ~/project/AD_hg38_maf_lookup.tsv.gz \
  ~/data/GWAS/rss_imputed_qced_GWAS_image_PD_Aging/image_AD1 \
  ~/GWAS/rss_imputed_qced_GWAS_image_PD_Aging/image_AD1 \
  "*.tsv.gz"
```

```bash
bash annotate_maf.sh \
  ~/project/Aging_hg38_maf_lookup.tsv.gz \
  ~/data/GWAS/rss_imputed_qced_GWAS_image_PD_Aging/image_Aging1 \
  ~/GWAS/rss_imputed_qced_GWAS_image_PD_Aging/image_Aging1 \
  "*.tsv.gz"
```
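The contents of `annotate_maf.sh` are not shown in this notebook. For readers who want a comparable annotation in R, the following is a hypothetical sketch, not a reimplementation of that script: it attaches MAF to a target table by chromosome, position and the unordered allele pair, so a REF/ALT swap between files still matches. The target column names (`#CHROM`, `POS`, `A1`, `A2`) are assumptions about the imputed summary statistics, and `add_maf_column()` is a hypothetical name.

```r
# Key on chrom:pos plus the alleles in sorted order, so A/G and G/A match.
allele_key <- function(chrom, pos, a1, a2) {
  lo <- ifelse(a1 < a2, a1, a2)
  hi <- ifelse(a1 < a2, a2, a1)
  paste(chrom, pos, lo, hi, sep = ":")
}

# Hypothetical: add a MAF column to `target` from a *_hg38_maf_lookup table.
# MAF = min(p, 1 - p) is the same whichever allele is labelled ALT.
add_maf_column <- function(target, lookup, a1_col = "A1", a2_col = "A2") {
  target <- as.data.frame(target)
  lookup <- as.data.frame(lookup)
  lk <- allele_key(lookup[["#CHROM"]], lookup$POS, lookup$REF, lookup$ALT)
  maf <- pmin(lookup$ALT_FREQS, 1 - lookup$ALT_FREQS)
  tk <- allele_key(target[["#CHROM"]], target$POS, target[[a1_col]], target[[a2_col]])
  # match() returns the first hit; variants absent from the lookup get NA
  target$MAF <- maf[match(tk, lk)]
  target
}
```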