In [1]:
library(tidyverse)
library(data.table)


── Attaching packages ─────────────────────────────────────── tidyverse 1.2.1 ──
✔ ggplot2 3.1.0       ✔ purrr   0.3.1  
✔ tibble  2.0.1       ✔ dplyr   0.8.0.1
✔ tidyr   0.8.3       ✔ stringr 1.4.0  
✔ readr   1.3.1       ✔ forcats 0.4.0  
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()

Attaching package: ‘data.table’

The following objects are masked from ‘package:dplyr’:

    between, first, last

The following object is masked from ‘package:purrr’:

    transpose



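We first extract the variants that were successfully mapped to hg38. A variant is kept only if its chromosome assignment is unchanged (column 1 equals column 6) and its REF or ALT allele matches the upper-cased hg38 reference base in the last column of the mapping table.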

```
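# Print the variant ID (column 3) of every row where column 1 == column 6 and
# REF (column 4) or ALT (column 5) equals the upper-cased last column.
zcat ukb24983_cal_cALL_v2_hg19.liftover.hg38.flip.checked.tsv.gz \
| awk -v FS='\t' '(($1 == $6) && ($4 == toupper($NF) || $5 == toupper($NF))){print $3}' \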
| plink2  --threads 4 --memory 30000 \
--pfile /oak/stanford/groups/mrivas/ukbb24983/cal/pgen/ukb24983_cal_cALL_v2_hg19 \
--extract /dev/stdin --sort-vars --out ukb24983_cal_cALL_v2_hg38_tmp --make-pgen
```
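To make the filter concrete, here is a minimal sketch that runs the same awk expression on a made-up four-row table. The column layout (hg19 CHROM, POS, ID, REF, ALT, then an hg38 chromosome, an hg38 position, and the hg38 FASTA base last) is an assumption inferred from the command above, and the rows and the helper names `toy_mapping` and `keep_mapped_ids` are hypothetical.

```shell
# Toy input (hypothetical rows). Assumed columns:
# CHROM POS ID REF ALT CHROM_hg38 POS_hg38 FASTA_hg38_REF
toy_mapping() {
    printf '1\t100\trsA\tA\tG\t1\t90\ta\n'    # REF matches the (lower-case) FASTA base
    printf '1\t200\trsB\tT\tC\t1\t190\tC\n'   # ALT matches the FASTA base
    printf '1\t300\trsC\tG\tA\t1\t290\tT\n'   # neither allele matches
    printf '2\t400\trsD\tA\tC\t3\t390\tA\n'   # column 1 differs from column 6
}

# Same filter expression as in the plink2 pipeline above
keep_mapped_ids() {
    awk -v FS='\t' '(($1 == $6) && ($4 == toupper($NF) || $5 == toupper($NF))){print $3}'
}

# Prints the IDs that pass the filter, one per line
toy_mapping | keep_mapped_ids
```

Only `rsA` and `rsB` pass: `rsC` fails the allele check and `rsD` fails the chromosome check.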

In [30]:
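# Read the liftOver mapping table; sed strips every '#' so that the header
# line parses as plain column names.
mapping <- fread(
    cmd='zcat ukb24983_cal_cALL_v2_hg19.liftover.hg38.flip.checked.tsv.gz | sed -e "s/#//g"',
    sep='\t'
)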

In [31]:
# Read the pvar file written by the extraction step, again stripping '#'
# from the header.
pvar_tmp <- fread(
    cmd='cat ukb24983_cal_cALL_v2_hg38_tmp.pvar | sed -e "s/#//g"',
    sep='\t'
)

In [36]:
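merged <- pvar_tmp %>%
    # Record the pvar row order so it can be restored after the join
    mutate(pvar_order = 1:n()) %>%
    # Attach the hg38 position and the upper-cased hg38 reference base by variant ID
    left_join(
        mapping %>%
            mutate(FASTA_hg38_REF = toupper(FASTA_hg38_REF)) %>%
            select(POS_hg38, ID, FASTA_hg38_REF),
        by='ID'
    ) %>%
    # Keep the hg19 position under a new name and make the hg38 position the main POS
    rename(
        POS_hg19 = POS,
        POS = POS_hg38
    ) %>%
    arrange(pvar_order) %>%
    # Selecting the output columns also drops the pvar_order helper
    select(CHROM, POS, ID, REF, ALT, FASTA_hg38_REF, POS_hg19)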
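
The join above can also be pictured on a tiny example outside R. The sketch below mimics it with awk on made-up data; the three-column layout of `toy_map` (ID, hg38 position, hg38 FASTA base) is a simplification assumed for illustration, and `toy_map`, `toy_pvar`, and `annotate_pvar` are hypothetical helper names, not part of the pipeline.

```shell
# Hypothetical mapping rows: ID, POS_hg38, FASTA_hg38_REF
toy_map() {
    printf 'rsA\t90\ta\n'
    printf 'rsB\t190\tC\n'
}

# Hypothetical pvar rows: CHROM, POS (hg19), ID, REF, ALT
toy_pvar() {
    printf '1\t200\trsB\tT\tC\n'
    printf '1\t100\trsA\tA\tG\n'
}

# Look up each pvar ID (read from stdin) in the mapping file and print
# CHROM, POS (hg38), ID, REF, ALT, FASTA_hg38_REF (upper-cased), POS_hg19.
# Rows keep the pvar order; IDs missing from the mapping get empty fields.
annotate_pvar() {
    map_file=$1
    awk -v FS='\t' -v OFS='\t' '
        NR == FNR { pos[$1] = $2; base[$1] = toupper($3); next }
        { print $1, pos[$3], $3, $4, $5, base[$3], $2 }
    ' "$map_file" -
}

tmp_map=$(mktemp)
toy_map > "$tmp_map"
toy_pvar | annotate_pvar "$tmp_map"
rm -f "$tmp_map"
```

As in the R cell, the output keeps the row order of the pvar input rather than the order of the mapping table.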

In [37]:
merged %>% head()

CHROM,POS,ID,REF,ALT,FASTA_hg38_REF,POS_hg19
1,5426013,rs7411401,A,G,A,5486073
1,5430041,rs12752025,T,C,T,5490101
1,5432923,rs28407474,T,C,T,5492983
1,5437247,rs7367086,G,T,G,5497307
1,5441524,rs12563995,G,A,G,5501584
1,5447070,rs12402981,T,C,T,5507130


In [41]:
merged %>% 
rename(
    '#CHROM' = 'CHROM'
) %>%
fwrite(
    'ukb24983_cal_cALL_v2_hg38_tmp.merged.pvar',
    sep='\t'
)


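We then apply the flip fix and generate the final pgen files. `--ref-allele force` reads the merged pvar (with the header line removed) from standard input, takes the hg38 reference base from column 6 (`FASTA_hg38_REF`) and the variant ID from column 3, and sets REF to that base, swapping REF and ALT where necessary.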

```
cat ukb24983_cal_cALL_v2_hg38_tmp.merged.pvar \
| grep -v '#' \
| plink2 --threads 4 --memory 30000 \
--pgen ukb24983_cal_cALL_v2_hg38_tmp.pgen \
--psam ukb24983_cal_cALL_v2_hg38_tmp.psam \
--pvar ukb24983_cal_cALL_v2_hg38_tmp.merged.pvar \
--ref-allele force /dev/stdin 6 3 \
--sort-vars \
--out ukb24983_cal_cALL_v2_hg38 --make-pgen
```
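
As a rough illustration of which rows the flip fix affects, the sketch below counts the rows of a made-up merged pvar where the hg38 base in column 6 differs from REF in column 4. The rows and the helper names `toy_merged_pvar` and `count_ref_mismatch` are hypothetical. On the real file one would expect this count to correspond to the number of allele sets that PLINK reports as rotated, but that correspondence is an assumption and is not verified here.

```shell
# Toy merged pvar (hypothetical rows) with the same columns as the merged file
toy_merged_pvar() {
    printf '#CHROM\tPOS\tID\tREF\tALT\tFASTA_hg38_REF\tPOS_hg19\n'
    printf '1\t90\trsA\tA\tG\tA\t100\n'    # REF already equals the hg38 base
    printf '1\t190\trsB\tT\tC\tC\t200\n'   # hg38 base is the ALT allele
    printf '1\t290\trsE\tG\tA\tG\t300\n'   # REF already equals the hg38 base
}

# Drop the header, then count rows where column 6 differs from column 4
count_ref_mismatch() {
    grep -v '#' | awk -v FS='\t' '$6 != $4 {n++} END {print n + 0}'
}

toy_merged_pvar | count_ref_mismatch
```

In the toy table only `rsB` has a reference base that differs from REF, so the count is 1.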

#### PLINK logs

In [45]:
cat(system("cat ukb24983_cal_cALL_v2_hg38_tmp.log", intern=TRUE), sep='\n')

PLINK v2.00a2LM AVX2 Intel (26 Aug 2019)
Options in effect:
  --extract /dev/stdin
  --make-pgen
  --memory 30000
  --out ukb24983_cal_cALL_v2_hg38_tmp
  --pfile /oak/stanford/groups/mrivas/ukbb24983/cal/pgen/ukb24983_cal_cALL_v2_hg19
  --sort-vars
  --threads 4

Hostname: sh-109-53.int
Working directory: /oak/stanford/groups/mrivas/users/ytanigaw/repos/rivas-lab/ukbb-tools/09_liftOver
Start time: Wed Sep 11 03:15:21 2019

Random number seed: 1568196921
385212 MiB RAM detected; reserving 30000 MiB for main workspace.
Using up to 4 compute threads.
488377 samples (264861 females, 223509 males, 7 ambiguous; 488377 founders)
loaded from
/oak/stanford/groups/mrivas/ukbb24983/cal/pgen/ukb24983_cal_cALL_v2_hg19.psam.
805426 variants loaded from
/oak/stanford/groups/mrivas/ukbb24983/cal/pgen/ukb24983_cal_cALL_v2_hg19.pvar.
Note: No phenotype data present.
--extract: 800354 variants remaining.
800354 variants remaining after main filters.
Writing ukb24983_cal_cALL_v2_hg38_tmp.pvar ... done.
Wr

In [46]:
cat(system("cat ukb24983_cal_cALL_v2_hg38.log", intern=TRUE), sep='\n')

PLINK v2.00a2LM AVX2 Intel (26 Aug 2019)
Options in effect:
  --make-pgen
  --memory 30000
  --out ukb24983_cal_cALL_v2_hg38
  --pgen ukb24983_cal_cALL_v2_hg38_tmp.pgen
  --psam ukb24983_cal_cALL_v2_hg38_tmp.psam
  --pvar ukb24983_cal_cALL_v2_hg38_tmp.merged.pvar
  --ref-allele force /dev/stdin 6 3
  --sort-vars
  --threads 4

Hostname: sh-109-53.int
Working directory: /oak/stanford/groups/mrivas/users/ytanigaw/repos/rivas-lab/ukbb-tools/09_liftOver
Start time: Wed Sep 11 03:24:38 2019

Random number seed: 1568197478
385212 MiB RAM detected; reserving 30000 MiB for main workspace.
Using up to 4 compute threads.
488377 samples (264861 females, 223509 males, 7 ambiguous; 488377 founders)
loaded from ukb24983_cal_cALL_v2_hg38_tmp.psam.
800354 variants loaded from ukb24983_cal_cALL_v2_hg38_tmp.merged.pvar.
Note: No phenotype data present.
--ref-allele: 1557 sets of allele codes rotated.
Writing ukb24983_cal_cALL_v2_hg38.pvar ... done.
Writing ukb24983_cal_cALL_v2_hg38.psam ... done.
Writin