# NGC LDSC Regression
**Author**: Jesse Marks <br>
**GitHub Issue:** [#140](https://github.com/RTIInternational/bioinformatics/issues/140) <br>
**Results Location:** `s3://rti-heroin/ldsc/results/20200109_fou_vs_oaall` <br>
**Local:** `~/Projects/heroin/ldsc/fou_vs_oaall/`

**TLDR**: Perform LDSC for FOU vs. OAall using only UHS data to see whether the genetic correlation is still the inverse of what we expect. <br>
**Description**: The FOU phenotype for NGC continues to show effects in the opposite direction from the other NGC phenotypes (OAall and OAexp); see the figures in [this GitHub comment](https://github.com/RTIInternational/bioinformatics/issues/140#issuecomment-566154881). To investigate, we will run an LDSC regression of OAall vs. FOU in which only UHS contributes to each phenotype. There will be two separate FOU analyses going into the LDSC: one with the old UHS4 data and one with the new UHS4 data. Recall that the UHS4 data from Rutgers had problems: Rutgers had to resend part of the data because some samples were corrupt or erroneous, with poor GCR quality. We will therefore compare the following analyses:

* FOU_old_uhs4 -- UHS1(897) + UHS2-3(772) + UHS4(861) = 2,530
* FOU_new_uhs4 -- UHS1(897) + UHS2-3(772) + UHS4(1,067) = 2,736
* OAall -- UHS1(9,245)

We will use the [LD score regression pipeline](https://github.com/RTIInternational/ld-regression-pipeline) developed by Alex Waldrop to perform the LD score regression.

## Data
NGC FOU meta-analysis summary statistics (UHS1-4 only):
* FOU with new UHS4: s3://rti-midas-data/studies/ngc/meta/095/processing/fou/uhs1-4.ea.fou.chr{1..22}.maf_gt_0.01.rsq_gt_0.3.gz
* FOU with old UHS4: s3://rti-midas-data/studies/ngc/meta/096/processing/fou/uhs1-4.ea.fou.chr{1..22}.maf_gt_0.01.rsq_gt_0.3.gz

OAall:
* UHS1: s3://rti-midas-data/studies/ngc/uhs1/association_tests/002/ea/oaall/uhs1.ea.oaall.chr{1..22}.1000g_ids.maf_gt_0.01_eur_uhs1.rsq_gt_0.3.gz
* CATS-MOLE: s3://rti-midas-data/studies/ngc/cats/association_tests/010/ea/oaall/catsmole.ea.oaall.chr{1..22}.1000g_ids.maf_gt_0.01_eur_cats.rsq_gt_0.3.gz
* CATS-PERHUNT: s3://rti-midas-data/studies/ngc/cats/association_tests/011/ea/oaall/catsperthunt.ea.oaall.chr{1..22}.1000g_ids.maf_gt_0.01_eur_cats.rsq_gt_0.3.gz
* COGA: s3://rti-midas-data/studies/ngc/coga/association_tests/001/ea/oaall/coga.ea.oaall.chr{1..22}.1000g_ids.maf_gt_0.01_eur_coga.se.rsq_gt_0.3.gz
* Kreek: s3://rti-midas-data/studies/ngc/kreek/association_tests/003/ea/oaall/kreek.ea.oaall.chr{1..22}.1000g_ids.maf_gt_0.01_eur_kreek.rsq_gt_0.3.gz
* Bulgaria: s3://rti-heroin/gwas/op_dep_bulgaria/results/oaall/0001/mis_minimac4_eagle2.4/1000g_p3/eur/chr{1..22}.1000g_ids.maf_gt_0.01_eur_obd.rsq_gt_0.3.gz
* VIDUS: s3://rti-midas-data/studies/ngc/vidus/association_tests/002/ea/oaall/vidus.ea.oaall.chr{1..22}.1000g_ids.maf_gt_0.01_eur_vidus.se.rsq_gt_0.3.gz
* Yale-Penn-CIDR: s3://rti-midas-data/studies/ngc/yale-penn/association_tests/004/ea/oaall/yale-penn.cidr.ea.chr{1..22}.1000g_ids.maf_gt_0.01_eur_yale-penn.rsq_gt_0.3.gz
* Yale-Penn-GO: s3://rti-midas-data/studies/ngc/yale-penn/association_tests/005/ea/oaall/yale-penn.go.ea.chr{1..22}.1000g_ids.maf_gt_0.01_eur_yale-penn.rsq_gt_0.3.gz
* deCODE OAall (N=275,468): s3://rti-midas-data/studies/ngc/decode/association_tests/001/ea/oaall/decode.ea.oaall.chr{1..22}.1000g_ids.maf_gt_0.01_eur_decode.beta_se.rsq_gt_0.3.gz
* deCODE OAexp (N=2581): s3://rti-midas-data/studies/ngc/decode/association_tests/002/ea/oaexp/decode.ea.oaexp.chr{1..22}.1000g_ids.maf_gt_0.01_eur_decode.beta_se.rsq_gt_0.3.gz

FOU: 
* alive: s3://rti-midas-data/studies/ngc/alive/association_tests/001/ea/fou/alivess.ea.fou.chr{1..22}.1000g_ids.maf_gt_0.01_eur_alive.se.rsq_gt_0.3.gz
* cats: s3://rti-midas-data/studies/ngc/cats/association_tests/012/ea/fou/cats.ea.fou.chr{1..22}.1000g_ids.maf_gt_0.01_eur_cats.rsq_gt_0.3.gz
* cogend: s3://rti-midas-data/studies/ngc/cogend/association_tests/001/ea/fou/cogend.ea.fou.chr{1..22}.1000g_ids.maf_gt_0.01_eur_cogend.rsq_gt_0.3.gz
* start: s3://rti-midas-data/studies/ngc/start/association_tests/001/ea/fou/start.ea.fou.chr{1..22}.1000g_ids.maf_gt_0.01_eur_start.se.rsq_gt_0.3.gz
* uhs1: s3://rti-midas-data/studies/ngc/uhs1/association_tests/001/ea/fou/uhs1.ea.fou.chr{1..22}.1000g_ids.maf_gt_0.01_eur_uhs1.rsq_gt_0.3.gz
* uhs2-3: s3://rti-midas-data/studies/ngc/uhs2-3/association_tests/001/ea/fou/uhs2-3.ea.fou.chr{1..22}.1000g_ids.maf_gt_0.03_eur_uhs2-3.se.rsq_gt_0.3.gz
* uhs4: s3://rti-shared/gwas/uhs4/results/box_cox_totopioid_tot_30d/0001/mis_minimac4_eagle2.4/1000g_p3/eur/chr{1..22}.1000g_ids.maf_gt_0.01_eur_uhs4.rsq_gt_0.3.gz
* vidus: s3://rti-shared/gwas/vidus/results/box_cox_useropioid6mfq/0001/mis_minimac3_shapeit2/1000g_p3/eur/chr{1..22}.1000g_ids.maf_gt_0.01_eur_vidus.rsq_gt_0.3.gz
* yale-penn: s3://rti-midas-data/studies/ngc/yale-penn/association_tests/001/ea/fou/yale-penn.ea.fou.chr{1..22}.1000g_ids.maf_gt_0.01_eur_yale-penn.rsq_gt_0.3.gz

**Note**: The effect allele for the FOU analyses is the same as for the other NGC metas (A2). The effect allele for the OAall analysis is ALT.
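
Because the FOU and OAall inputs use different effect-allele conventions, it can help to spot-check allele orientation for SNPs shared between two formatted files before interpreting the sign of the genetic correlation. The sketch below is a hypothetical helper (`compare_alleles` is not part of the LD regression pipeline). It assumes whitespace-delimited files with a header row and the allele columns in positions 4 and 5, as in the formatted files produced in the data-wrangling section. It only reports whether each shared SNP's allele pair is in the same order, swapped, or mismatched; it does not decide which allele is the effect allele. The comparison is case-insensitive, since some inputs may write alleles in lower case.

```shell
# compare_alleles FILE_A FILE_B
# For each MarkerName present in both files, print whether the
# (Allele1, Allele2) pair is in the same order, swapped, or mismatched.
# Assumes a header row and alleles in columns 4 and 5 of both files.
compare_alleles() {
    awk '
        FNR == 1 { next }                          # skip each file header
        NR == FNR {                                # first file: store alleles
            a1[$1] = toupper($4); a2[$1] = toupper($5); next
        }
        ($1 in a1) {                               # second file: compare
            b1 = toupper($4); b2 = toupper($5)
            if (b1 == a1[$1] && b2 == a2[$1])        status = "same"
            else if (b1 == a2[$1] && b2 == a1[$1])   status = "swapped"
            else                                     status = "mismatch"
            print $1, status
        }
    ' "$1" "$2"
}
```

For example, `compare_alleles <(zcat fou_meta096_n2530.txt.gz) <(zcat uhs1_oaall002_n9245.txt.gz) | awk '{n[$2]++} END {for (s in n) print s, n[s]}'` would tally the three categories. A large share of `swapped` SNPs means the two files list their alleles in opposite order, so their effect signs are only comparable after one of them is flipped.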

<br>

**Sample sizes**
* OAall_UHS1: 9,245
* FOU_old_UHS4: 2,530
* FOU_new_UHS4: 2,736

OAall:
* Bulgaria (N=2,765)
* CATS-MOLE (N=3,162)
* CATS-PERHUNT (N=1,920)
* COGA (N=7,631)
* deCODE OAall (N=275,468)
* deCODE OAexp (N=2,581)
* Kreek (N=556)
* UHS1 (N=9,245)
* VIDUS (N=2,177)
* Yale-Penn-CIDR (N=666)
* Yale-Penn-GO (N=917)

FOU:
* alive (N=152)
* cats (N=1,226)
* cogend (N=99)
* start (N=231)
* uhs1 (N=897)
* uhs2-3 (N=772)
* uhs4 (N=1,067)
* vidus (N=300)
* yale-penn (N=850)
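
The totals above can be cross-checked against the per-cohort Ns with shell arithmetic. The FOU totals are the UHS1 + UHS2-3 + UHS4 sums from the description (old UHS4 N=861, new UHS4 N=1,067). For OAall, summing every cohort except deCODE OAexp gives 304,507, the same N that appears in the file name `oaall_meta089_n304507.txt` in the data-wrangling section; reading that as the meta-analysis's cohort composition is an inference from the arithmetic, not something recorded in these notes.

```shell
# Per-cohort sample sizes copied from the lists above.
uhs1_fou=897; uhs23_fou=772; uhs4_fou_old=861; uhs4_fou_new=1067
bulgaria=2765; cats_mole=3162; cats_perhunt=1920; coga=7631
decode_oaall=275468; kreek=556; uhs1_oaall=9245; vidus=2177
yalepenn_cidr=666; yalepenn_go=917

fou_old=$(( uhs1_fou + uhs23_fou + uhs4_fou_old ))
fou_new=$(( uhs1_fou + uhs23_fou + uhs4_fou_new ))

# All OAall cohorts except deCODE OAexp, summed in two steps.
oaall_no_oaexp=$(( bulgaria + cats_mole + cats_perhunt + coga + decode_oaall ))
oaall_no_oaexp=$(( oaall_no_oaexp + kreek + uhs1_oaall + vidus + yalepenn_cidr + yalepenn_go ))

echo "FOU_old_UHS4: $fou_old"                     # 2530
echo "FOU_new_UHS4: $fou_new"                     # 2736
echo "OAall (no deCODE OAexp): $oaall_no_oaexp"   # 304507
```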

### Data wrangling
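Format the summary statistics for input into the Cromwell workflow. Each section below downloads the 22 per-chromosome files, writes a header line, appends the selected columns (MarkerName, CHR, POS, Allele1, Allele2, Effect, P-value, and N where available) from every chromosome, gzips the result, and uploads it to S3.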
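
Every section below repeats the same download, header, append, and gzip pattern. The sketch below factors that pattern into one function. It is a hypothetical helper (`format_sumstats` is not part of the original notebook or the LD regression pipeline) and assumes the per-chromosome files are already downloaded, that their names differ only by the chromosome number, and that columns are whitespace-delimited. It refuses to run if any chromosome file is missing or empty, which guards against reading files before a download has finished.

```shell
# format_sumstats PATTERN OUTFILE COLUMNS
#   PATTERN : per-chromosome file name with the literal token CHR where the
#             chromosome number goes (a convention invented for this sketch)
#   OUTFILE : uncompressed output name; OUTFILE.gz is what remains at the end
#   COLUMNS : comma-separated, 1-based column numbers to keep, in output order
format_sumstats() {
    local pattern=$1 outf=$2 cols=$3 chr inf skip

    # Refuse to start unless all 22 chromosome files exist and are non-empty.
    for chr in {1..22}; do
        inf=${pattern//CHR/$chr}
        if [ ! -s "$inf" ]; then
            echo "missing or empty: $inf" >&2
            return 1
        fi
    done

    : > "$outf"   # start from an empty file so a re-run does not append twice
    for chr in {1..22}; do
        inf=${pattern//CHR/$chr}
        # keep the header line only from chromosome 1
        if [ "$chr" -eq 1 ]; then skip=0; else skip=1; fi
        zcat "$inf" | awk -v cols="$cols" -v skip="$skip" '
            BEGIN { n = split(cols, c, ",") }
            FNR == 1 && skip { next }
            {
                line = $(c[1])
                for (i = 2; i <= n; i++) line = line " " $(c[i])
                print line
            }' >> "$outf"
    done

    gzip -f "$outf"
}
```

For example, the UHS1 OAall section could be written as `format_sumstats 'uhs1.ea.oaall.chrCHR.1000g_ids.maf_gt_0.01_eur_uhs1.rsq_gt_0.3.gz' uhs1_oaall002_n9245.txt 1,2,3,4,5,16,17,6`, followed by the same `aws s3 cp` upload.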

## FOU meta 087
for chr in {1..22}; do
    aws s3 cp s3://rti-midas-data/studies/ngc/meta/087/processing/fou/alive+cats+cogend+start+uhs1-4+vidus+yale-penn.ea.fou.chr$chr.maf_gt_0.01.rsq_gt_0.3.gz . --quiet
done  # not backgrounded: the files must be fully downloaded before they are read below

outf=fou_meta087_n5833.txt
zcat alive+cats+cogend+start+uhs1-4+vidus+yale-penn.ea.fou.chr22.maf_gt_0.01.rsq_gt_0.3.gz |\
    awk 'NR==1{print $1,$2,$3,$4,$5,$6,$8; exit}' > $outf
for chr in {1..22};do
    inf=alive+cats+cogend+start+uhs1-4+vidus+yale-penn.ea.fou.chr$chr.maf_gt_0.01.rsq_gt_0.3.gz
    # MarkerName, CHR, POS, Allele1, Allele2, Effect, P-value
    awk '{print $1,$2,$3,$4,$5,$6,$8}' <(zcat $inf | tail -n +2) >> $outf
done  # not backgrounded: the file must be complete before gzip runs

gzip $outf
## upload to S3
aws s3 cp $outf.gz s3://rti-heroin/ldsc/data/opioid_fou/$outf.gz


########################################################################################################################################################################## 
## OAall meta 089
for chr in {1..22}; do
    aws s3 cp s3://rti-midas-data/studies/ngc/meta/089/processing/oaall/cats+coga+decode+kreek+odb+uhs+vidus+yale-penn.ea.chr$chr.maf_gt_0.01.rsq_gt_0.3.gz . --quiet
done

outf=oaall_meta089_n304507.txt
zcat cats+coga+decode+kreek+odb+uhs+vidus+yale-penn.ea.chr22.maf_gt_0.01.rsq_gt_0.3.gz |\
    awk 'NR==1{print $1,$2,$3,$4,$5,$6,$8; exit}' > $outf
for chr in {1..22};do
    inf=cats+coga+decode+kreek+odb+uhs+vidus+yale-penn.ea.chr$chr.maf_gt_0.01.rsq_gt_0.3.gz
    # MarkerName, CHR, POS, Allele1, Allele2, Effect, P-value
    awk '{print $1,$2,$3,$4,$5,$6,$8}' <(zcat $inf | tail -n +2) >> $outf
done

gzip $outf
## upload to S3
aws s3 cp $outf.gz s3://rti-heroin/ldsc/data/opioid_oaall/$outf.gz


########################################################################################################################################################################## 
## FOU_old_UHS4
cd /shared/jmarks/heroin/ldsc/fou_oaall/001/processing
for chr in {1..22};do 
    aws s3 cp s3://rti-midas-data/studies/ngc/meta/096/processing/fou/uhs1-4.ea.fou.chr$chr.maf_gt_0.01.rsq_gt_0.3.gz . --quiet &
done
wait  # the copies above run in the background; let them all finish first

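outf=fou_meta096_n2530.txt
# write the header line first, as in the other sections
zcat uhs1-4.ea.fou.chr22.maf_gt_0.01.rsq_gt_0.3.gz |\
    awk 'NR==1{print $1,$2,$3,$4,$5,$6,$8; exit}' > $outf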
for chr in {1..22};do
    inf=uhs1-4.ea.fou.chr$chr.maf_gt_0.01.rsq_gt_0.3.gz
    # MarkerName, CHR, POS, Allele1, Allele2, Effect, P-value
    awk '{print $1,$2,$3,$4,$5,$6,$8}' <(zcat $inf | tail -n +2) >> $outf
done

gzip $outf
## upload to S3
aws s3 cp $outf.gz s3://rti-heroin/ldsc/data/opioid_fou/$outf.gz
    
    
########################################################################################################################################################################## 
## FOU_new_UHS4
cd /shared/jmarks/heroin/ldsc/fou_oaall/001/processing
for chr in {1..22};do 
    aws s3 cp s3://rti-midas-data/studies/ngc/meta/095/processing/fou/uhs1-4.ea.fou.chr$chr.maf_gt_0.01.rsq_gt_0.3.gz . --quiet &
done
wait  # the copies above run in the background; let them all finish first

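outf=fou_meta095_n2736.txt
# write the header line first, as in the other sections
zcat uhs1-4.ea.fou.chr22.maf_gt_0.01.rsq_gt_0.3.gz |\
    awk 'NR==1{print $1,$2,$3,$4,$5,$6,$8; exit}' > $outf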
for chr in {1..22};do
    inf=uhs1-4.ea.fou.chr$chr.maf_gt_0.01.rsq_gt_0.3.gz
    # MarkerName, CHR, POS, Allele1, Allele2, Effect, P-value
    awk '{print $1,$2,$3,$4,$5,$6,$8}' <(zcat $inf | tail -n +2) >> $outf
done

gzip $outf
## upload to S3
aws s3 cp $outf.gz s3://rti-heroin/ldsc/data/opioid_fou/$outf.gz


########################################################################################################################################################################## 
## OAall_decode_subset
cd /shared/jmarks/heroin/ldsc/fou_oaall/002/processing
for chr in {1..22};do 
    aws s3 cp s3://rti-midas-data/studies/ngc/meta/094/processing/oaall/cats+coga+decode+kreek+odb+uhs+vidus+yale-penn.ea.chr$chr.maf_gt_0.01.rsq_gt_0.3.gz . --quiet &
done
wait  # the copies above run in the background; let them all finish first

outf=oaall_meta094_n31620.txt
zcat cats+coga+decode+kreek+odb+uhs+vidus+yale-penn.ea.chr22.maf_gt_0.01.rsq_gt_0.3.gz |\
    awk 'NR==1{print $1,$2,$3,$4,$5,$6,$8; exit}' > $outf
for chr in {1..22};do
    inf=cats+coga+decode+kreek+odb+uhs+vidus+yale-penn.ea.chr$chr.maf_gt_0.01.rsq_gt_0.3.gz
    # MarkerName, CHR, POS, Allele1, Allele2, Effect, P-value
    awk '{print $1,$2,$3,$4,$5,$6,$8}' <(zcat $inf | tail -n +2) >> $outf
done

gzip $outf
## upload to S3
aws s3 cp $outf.gz s3://rti-heroin/ldsc/data/opioid_oaall/$outf.gz



########################################################################################################################################################################## 
## CATS-MOLE (3162)
for chr in {1..22}; do 
    aws s3 cp s3://rti-midas-data/studies/ngc/cats/association_tests/010/ea/oaall/catsmole.ea.oaall.chr$chr.1000g_ids.maf_gt_0.01_eur_cats.rsq_gt_0.3.gz .
done


outf=catsmole_oaall010_n3162.txt
zcat catsmole.ea.oaall.chr22.1000g_ids.maf_gt_0.01_eur_cats.rsq_gt_0.3.gz |\
    awk 'NR==1{print $1,$2,$3,$4,$5,$17,$18, $6; exit}' > $outf
for chr in {1..22};do
    inf=catsmole.ea.oaall.chr$chr.1000g_ids.maf_gt_0.01_eur_cats.rsq_gt_0.3.gz
    # MarkerName, CHR, POS, Allele1, Allele2, Effect, P-value, N
    awk '{print $1,$2,$3,$4,$5,$17,$18, $6}' <(zcat $inf | tail -n +2) >> $outf
done

gzip $outf
# upload to S3
aws s3 cp $outf.gz s3://rti-heroin/ldsc/data/$outf.gz



########################################################################################################################################################################## 
## CATS-PERHUNT 
for chr in {1..22}; do 
    aws s3 cp s3://rti-midas-data/studies/ngc/cats/association_tests/011/ea/oaall/catsperthunt.ea.oaall.chr$chr.1000g_ids.maf_gt_0.01_eur_cats.rsq_gt_0.3.gz .
done

outf=catsperthunt_oaall011_n1920.txt
zcat catsperthunt.ea.oaall.chr22.1000g_ids.maf_gt_0.01_eur_cats.rsq_gt_0.3.gz |\
    awk 'NR==1{print $1,$2,$3,$4,$5,$17,$18, $6; exit}' > $outf
for chr in {1..22};do
    inf=catsperthunt.ea.oaall.chr$chr.1000g_ids.maf_gt_0.01_eur_cats.rsq_gt_0.3.gz
    # MarkerName, CHR, POS, Allele1, Allele2, Effect, P-value, N
    awk '{print $1,$2,$3,$4,$5,$17,$18, $6}' <(zcat $inf | tail -n +2) >> $outf
done

gzip $outf
# upload to S3
aws s3 cp $outf.gz s3://rti-heroin/ldsc/data/$outf.gz


########################################################################################################################################################################## 
## COGA
for chr in {1..22}; do 
    aws s3 cp s3://rti-midas-data/studies/ngc/coga/association_tests/001/ea/oaall/coga.ea.oaall.chr$chr.1000g_ids.maf_gt_0.01_eur_coga.se.rsq_gt_0.3.gz .
done

outf=coga_oaall001_n7631.txt
zcat coga.ea.oaall.chr22.1000g_ids.maf_gt_0.01_eur_coga.se.rsq_gt_0.3.gz |\
    awk 'NR==1{print $1,$2,$3,$4,$5,$16,$17, $6; exit}' > $outf
for chr in {1..22};do
    inf=coga.ea.oaall.chr$chr.1000g_ids.maf_gt_0.01_eur_coga.se.rsq_gt_0.3.gz
    # MarkerName, CHR, POS, Allele1, Allele2, Effect, P-value, N
    awk '{print $1,$2,$3,$4,$5,$16,$17, $6}' <(zcat $inf | tail -n +2) >> $outf
done

gzip $outf
# upload to S3
aws s3 cp $outf.gz s3://rti-heroin/ldsc/data/$outf.gz


########################################################################################################################################################################## 
## Kreek
for chr in {1..22}; do 
    aws s3 cp s3://rti-midas-data/studies/ngc/kreek/association_tests/003/ea/oaall/kreek.ea.oaall.chr$chr.1000g_ids.maf_gt_0.01_eur_kreek.rsq_gt_0.3.gz . --quiet 
done

outf=kreek_oaall003_n556.txt
zcat kreek.ea.oaall.chr22.1000g_ids.maf_gt_0.01_eur_kreek.rsq_gt_0.3.gz |\
    awk 'NR==1{print $1,$2,$3,$4,$5,$17,$18, $6; exit}' > $outf
for chr in {1..22};do
    inf=kreek.ea.oaall.chr$chr.1000g_ids.maf_gt_0.01_eur_kreek.rsq_gt_0.3.gz
    # MarkerName, CHR, POS, Allele1, Allele2, Effect, P-value, N
    awk '{print $1,$2,$3,$4,$5,$17,$18, $6}' <(zcat $inf | tail -n +2) >> $outf
done 

gzip $outf
# upload to S3
aws s3 cp $outf.gz s3://rti-heroin/ldsc/data/$outf.gz


########################################################################################################################################################################## 
## Bulgaria
for chr in {1..22}; do 
    aws s3 cp s3://rti-heroin/gwas/op_dep_bulgaria/results/oaall/0001/mis_minimac4_eagle2.4/1000g_p3/eur/chr$chr.1000g_ids.maf_gt_0.01_eur_obd.rsq_gt_0.3.gz .
done

outf=bulgaria_oaall0001_n2765.txt
zcat chr22.1000g_ids.maf_gt_0.01_eur_obd.rsq_gt_0.3.gz |\
    awk 'NR==1{print $1,$2,$3,$4,$5,$17,$18, $6; exit}' > $outf
for chr in {1..22};do
    inf=chr$chr.1000g_ids.maf_gt_0.01_eur_obd.rsq_gt_0.3.gz
    # MarkerName, CHR, POS, Allele1, Allele2, Effect, P-value, N
    awk '{print $1,$2,$3,$4,$5,$17,$18, $6}' <(zcat $inf | tail -n +2) >> $outf
done

gzip $outf
# upload to S3
aws s3 cp $outf.gz s3://rti-heroin/ldsc/data/$outf.gz


########################################################################################################################################################################## 
## VIDUS
for chr in {1..22}; do 
    aws s3 cp s3://rti-midas-data/studies/ngc/vidus/association_tests/002/ea/oaall/vidus.ea.oaall.chr$chr.1000g_ids.maf_gt_0.01_eur_vidus.se.rsq_gt_0.3.gz .
done

outf=vidus_oaall002_n2177.txt
zcat vidus.ea.oaall.chr22.1000g_ids.maf_gt_0.01_eur_vidus.se.rsq_gt_0.3.gz |\
    awk 'NR==1{print $1,$2,$3,$4,$5,$16,$17, $6; exit}' > $outf
for chr in {1..22};do
    inf=vidus.ea.oaall.chr$chr.1000g_ids.maf_gt_0.01_eur_vidus.se.rsq_gt_0.3.gz
    # MarkerName, CHR, POS, Allele1, Allele2, Effect, P-value, N
    awk '{print $1,$2,$3,$4,$5,$16,$17, $6}' <(zcat $inf | tail -n +2) >> $outf
done

gzip $outf
# upload to S3
aws s3 cp $outf.gz s3://rti-heroin/ldsc/data/$outf.gz


########################################################################################################################################################################## 
## Yale-Penn-CIDR
for chr in {1..22}; do 
    aws s3 cp s3://rti-midas-data/studies/ngc/yale-penn/association_tests/004/ea/oaall/yale-penn.cidr.ea.chr$chr.1000g_ids.maf_gt_0.01_eur_yale-penn.rsq_gt_0.3.gz .
done

outf=yalepenn_cidr_oaall004_n666.txt
zcat yale-penn.cidr.ea.chr22.1000g_ids.maf_gt_0.01_eur_yale-penn.rsq_gt_0.3.gz |\
    awk 'NR==1{print $1,$2,$3,$5,$4,$16,$20, $10; exit}' > $outf
for chr in {1..22};do
    inf=yale-penn.cidr.ea.chr$chr.1000g_ids.maf_gt_0.01_eur_yale-penn.rsq_gt_0.3.gz
    # MarkerName, CHR, POS, Allele1, Allele2, Effect, P-value, n
    awk '{print $1,$2,$3,$5,$4,$16,$20, $10}' <(zcat $inf | tail -n +2) >> $outf
done

gzip $outf
# upload to S3
aws s3 cp $outf.gz s3://rti-heroin/ldsc/data/$outf.gz


########################################################################################################################################################################## 
## Yale-Penn-GO
for chr in {1..22}; do 
    aws s3 cp s3://rti-midas-data/studies/ngc/yale-penn/association_tests/005/ea/oaall/yale-penn.go.ea.chr$chr.1000g_ids.maf_gt_0.01_eur_yale-penn.rsq_gt_0.3.gz . --quiet
done

outf=yalepenn_go_oaall005_n917.txt
zcat yale-penn.go.ea.chr22.1000g_ids.maf_gt_0.01_eur_yale-penn.rsq_gt_0.3.gz |\
    awk 'NR==1{print $1,$2,$3,$5,$4,$16,$20, $10; exit}' > $outf
for chr in {1..22};do
    inf=yale-penn.go.ea.chr$chr.1000g_ids.maf_gt_0.01_eur_yale-penn.rsq_gt_0.3.gz
    # MarkerName, CHR, POS, Allele1, Allele2, Effect, P-value, n
    awk '{print $1,$2,$3,$5,$4,$16,$20, $10}' <(zcat $inf | tail -n +2) >> $outf
done

gzip $outf
# upload to S3
aws s3 cp $outf.gz s3://rti-heroin/ldsc/data/$outf.gz


########################################################################################################################################################################## 
## UHS1
for chr in {1..22}; do 
    aws s3 cp s3://rti-midas-data/studies/ngc/uhs1/association_tests/002/ea/oaall/uhs1.ea.oaall.chr$chr.1000g_ids.maf_gt_0.01_eur_uhs1.rsq_gt_0.3.gz .
done


outf=uhs1_oaall002_n9245.txt
zcat uhs1.ea.oaall.chr22.1000g_ids.maf_gt_0.01_eur_uhs1.rsq_gt_0.3.gz |\
    awk 'NR==1{print $1,$2,$3,$4,$5,$16,$17, $6; exit}' > $outf
for chr in {1..22};do
    inf=uhs1.ea.oaall.chr$chr.1000g_ids.maf_gt_0.01_eur_uhs1.rsq_gt_0.3.gz
    # MarkerName, CHR, POS, Allele1, Allele2, Effect, P-value, N
    awk '{print $1,$2,$3,$4,$5,$16,$17, $6}' <(zcat $inf | tail -n +2) >> $outf
done 

gzip $outf
# upload to S3
aws s3 cp $outf.gz s3://rti-heroin/ldsc/data/$outf.gz


########################################################################################################################################################################## 
## deCODE OAall (N=275,468)
for chr in {1..22}; do 
    aws s3 cp s3://rti-midas-data/studies/ngc/decode/association_tests/001/ea/oaall/decode.ea.oaall.chr$chr.1000g_ids.maf_gt_0.01_eur_decode.beta_se.rsq_gt_0.3.gz .
done

outf=decode_oaall001_n275468.txt
zcat decode.ea.oaall.chr22.1000g_ids.maf_gt_0.01_eur_decode.beta_se.rsq_gt_0.3.gz |\
    awk 'NR==1{print $1,$2,$3,$5,$4,$8,$9; exit}' > $outf
for chr in {1..22};do
    inf=decode.ea.oaall.chr$chr.1000g_ids.maf_gt_0.01_eur_decode.beta_se.rsq_gt_0.3.gz
    # MarkerName, CHR, POS, Allele1, Allele2, Effect, P-value
    awk '{print $1,$2,$3,$5,$4,$8,$9}' <(zcat $inf | tail -n +2) >> $outf
done 

gzip $outf
# upload to S3
aws s3 cp $outf.gz s3://rti-heroin/ldsc/data/$outf.gz


########################################################################################################################################################################## 
## deCODE OAexp (N=2,581)
for chr in {1..22}; do 
    aws s3 cp s3://rti-midas-data/studies/ngc/decode/association_tests/002/ea/oaexp/decode.ea.oaexp.chr$chr.1000g_ids.maf_gt_0.01_eur_decode.beta_se.rsq_gt_0.3.gz .
done

outf=decode_oaexp002_n2581.txt
zcat decode.ea.oaexp.chr22.1000g_ids.maf_gt_0.01_eur_decode.beta_se.rsq_gt_0.3.gz |\
    awk 'NR==1{print $1,$2,$3,$5,$4,$8,$9; exit}' > $outf
for chr in {1..22};do
    inf=decode.ea.oaexp.chr$chr.1000g_ids.maf_gt_0.01_eur_decode.beta_se.rsq_gt_0.3.gz
    # MarkerName, CHR, POS, Allele1, Allele2, Effect, P-value
    awk '{print $1,$2,$3,$5,$4,$8,$9}' <(zcat $inf | tail -n +2) >> $outf
done 

gzip $outf
# upload to S3
aws s3 cp $outf.gz s3://rti-heroin/ldsc/data/$outf.gz



########################################################################################################################################################################## 
## OAall (N=302,330) no VIDUS

for chr in {1..22}; do 
    aws s3 cp s3://rti-midas-data/studies/ngc/meta/098/processing/oaall/cats+coga+decode+kreek+odb+uhs+yale-penn.ea.chr$chr.maf_gt_0.01.rsq_gt_0.3.gz .
done

outf=oaall_meta098_n302330.txt
zcat cats+coga+decode+kreek+odb+uhs+yale-penn.ea.chr22.maf_gt_0.01.rsq_gt_0.3.gz |\
    awk 'NR==1{print $1,$2,$3,$4,$5,$6,$8; exit}' > $outf
for chr in {1..22};do
    inf=cats+coga+decode+kreek+odb+uhs+yale-penn.ea.chr$chr.maf_gt_0.01.rsq_gt_0.3.gz
    # MarkerName, CHR, POS, Allele1, Allele2, Effect, P-value
    awk '{print $1,$2,$3,$4,$5,$6,$8}' <(zcat $inf | tail -n +2 ) >> $outf
done 

gzip $outf
# upload to S3
aws s3 cp $outf.gz s3://rti-heroin/ldsc/data/opioid_oaall/$outf.gz



########################################################################################################################################################################## 
## OAall_099 (N=29,443): no VIDUS; deCODE OAexp instead of deCODE OAall


for chr in {1..22}; do 
    aws s3 cp s3://rti-midas-data/studies/ngc/meta/099/processing/oaall/cats+coga+decode+kreek+odb+uhs+yale-penn.ea.chr$chr.maf_gt_0.01.rsq_gt_0.3.gz .
done

outf=oaall_meta099_n29443.txt
zcat cats+coga+decode+kreek+odb+uhs+yale-penn.ea.chr22.maf_gt_0.01.rsq_gt_0.3.gz |\
    awk 'NR==1{print $1,$2,$3,$4,$5,$6,$8; exit}' > $outf
for chr in {1..22};do
    inf=cats+coga+decode+kreek+odb+uhs+yale-penn.ea.chr$chr.maf_gt_0.01.rsq_gt_0.3.gz
    # MarkerName, CHR, POS, Allele1, Allele2, Effect, P-value
    awk '{print $1,$2,$3,$4,$5,$6,$8}' <(zcat $inf | tail -n +2) >> $outf
done 

gzip $outf
# upload to S3
aws s3 cp $outf.gz s3://rti-heroin/ldsc/data/opioid_oaall/$outf.gz


########################################################################################################################################################################## 
## CATS FOU (N=1226)

for chr in {1..22};
    do aws s3 cp s3://rti-midas-data/studies/ngc/cats/association_tests/012/ea/fou/cats.ea.fou.chr$chr.1000g_ids.maf_gt_0.01_eur_cats.rsq_gt_0.3.gz .
done 


outf=cats_fou012_n1226.txt
zcat cats.ea.fou.chr22.1000g_ids.maf_gt_0.01_eur_cats.rsq_gt_0.3.gz |\
    awk 'NR==1{print $1,$2,$3,$4,$5,$17,$18, $6; exit}' > $outf
for chr in {1..22};do
    inf=cats.ea.fou.chr$chr.1000g_ids.maf_gt_0.01_eur_cats.rsq_gt_0.3.gz
    # MarkerName, CHR, POS, Allele1, Allele2, Effect, P-value, N
    awk '{print $1,$2,$3,$4,$5,$17,$18, $6}' <(zcat $inf | tail -n +2) >> $outf
done 

gzip $outf
# upload to S3
aws s3 cp $outf.gz s3://rti-heroin/ldsc/data/opioid_fou/$outf.gz




####################################################################################################
## OAall meta105

for chr in {1..22}; do
    aws s3 cp s3://rti-midas-data/studies/ngc/meta/105/processing/oaall/cats+kreek+odb+uhs+vidus.ea.chr$chr.maf_gt_0.01.rsq_gt_0.3.gz . --quiet
done


outf=oaall_meta105_n18825.txt
zcat cats+kreek+odb+uhs+vidus.ea.chr2.maf_gt_0.01.rsq_gt_0.3.gz |\
    awk 'NR==1{print $1,$2,$3,$4,$5,$6, $8; exit}' > $outf
for chr in {1..22};do
    inf=cats+kreek+odb+uhs+vidus.ea.chr$chr.maf_gt_0.01.rsq_gt_0.3.gz
    # MarkerName, CHR, POS, Allele1, Allele2, Effect, P-value
    awk '{print $1,$2,$3,$4,$5,$6, $8}' <(zcat $inf | tail -n +2) >> $outf
done

gzip $outf
# upload to S3
aws s3 cp $outf.gz s3://rti-heroin/ldsc/data/opioid_oaall/$outf.gz


####################################################################################################
## FOU alive

for chr in {1..22};do
    aws s3 cp s3://rti-midas-data/studies/ngc/alive/association_tests/001/ea/fou/alivess.ea.fou.chr$chr.1000g_ids.maf_gt_0.01_eur_alive.se.rsq_gt_0.3.gz . --quiet 
done 

outf=alive_fou001_n152.txt
zcat alivess.ea.fou.chr7.1000g_ids.maf_gt_0.01_eur_alive.se.rsq_gt_0.3.gz |\
    awk 'NR==1{print $1,$2,$3,$4,$5,$16,$17,$6; exit}' > $outf
for chr in {1..22};do
    inf=alivess.ea.fou.chr$chr.1000g_ids.maf_gt_0.01_eur_alive.se.rsq_gt_0.3.gz
    # MarkerName, CHR, POS, Allele1, Allele2, Effect, P-value, N
    awk '{print $1,$2,$3,$4,$5,$16,$17,$6}' <(zcat $inf | tail -n +2) >> $outf
done

gzip $outf 
# upload to S3
aws s3 cp $outf.gz s3://rti-heroin/ldsc/data/opioid_fou/$outf.gz


####################################################################################################
## FOU cats

for chr in {1..22};do
    aws s3 cp s3://rti-midas-data/studies/ngc/cats/association_tests/012/ea/fou/cats.ea.fou.chr$chr.1000g_ids.maf_gt_0.01_eur_cats.rsq_gt_0.3.gz . --quiet
done


outf=cats_fou012_n1226.txt
zcat cats.ea.fou.chr2.1000g_ids.maf_gt_0.01_eur_cats.rsq_gt_0.3.gz |\
    awk 'NR==1{print $1,$2,$3,$4,$5,$17,$18,$6; exit}' > $outf
for chr in {1..22};do
    inf=cats.ea.fou.chr$chr.1000g_ids.maf_gt_0.01_eur_cats.rsq_gt_0.3.gz
    # MarkerName, CHR, POS, Allele1, Allele2, Effect, P-value, N
    awk '{print $1,$2,$3,$4,$5,$17,$18,$6}' <(zcat $inf | tail -n +2) >> $outf
done

gzip $outf 
# upload to S3
aws s3 cp $outf.gz s3://rti-heroin/ldsc/data/opioid_fou/$outf.gz

####################################################################################################
## FOU cogend

for chr in {1..22};do
    aws s3 cp s3://rti-midas-data/studies/ngc/cogend/association_tests/001/ea/fou/cogend.ea.fou.chr$chr.1000g_ids.maf_gt_0.01_eur_cogend.rsq_gt_0.3.gz . --quiet
done


outf=cogend_fou001_n99.txt
zcat cogend.ea.fou.chr1.1000g_ids.maf_gt_0.01_eur_cogend.rsq_gt_0.3.gz |\
    awk 'NR==1{print $1,$2,$3,$4,$5,$16,$17,$6; exit}' > $outf
for chr in {1..22};do
    inf=cogend.ea.fou.chr$chr.1000g_ids.maf_gt_0.01_eur_cogend.rsq_gt_0.3.gz
    # MarkerName, CHR, POS, Allele1, Allele2, Effect, P-value, N
    awk '{print $1,$2,$3,$4,$5,$16,$17,$6}' <(zcat $inf | tail -n +2) >> $outf
done

gzip $outf 
# upload to S3
aws s3 cp $outf.gz s3://rti-heroin/ldsc/data/opioid_fou/$outf.gz
####################################################################################################
## FOU start


for chr in {1..22};do
    aws s3 cp s3://rti-midas-data/studies/ngc/start/association_tests/001/ea/fou/start.ea.fou.chr$chr.1000g_ids.maf_gt_0.01_eur_start.se.rsq_gt_0.3.gz . --quiet
done


outf=start_fou001_n231.txt
zcat start.ea.fou.chr1.1000g_ids.maf_gt_0.01_eur_start.se.rsq_gt_0.3.gz |\
    awk 'NR==1{print $1,$2,$3,$4,$5,$16,$17,$6; exit}' > $outf
for chr in {1..22};do
    inf=start.ea.fou.chr$chr.1000g_ids.maf_gt_0.01_eur_start.se.rsq_gt_0.3.gz
    # MarkerName, CHR, POS, Allele1, Allele2, Effect, P-value, N
    awk '{print $1,$2,$3,$4,$5,$16,$17,$6}' <(zcat $inf | tail -n +2) >> $outf
done

gzip $outf 
# upload to S3
aws s3 cp $outf.gz s3://rti-heroin/ldsc/data/opioid_fou/$outf.gz
####################################################################################################
## FOU uhs1

for chr in {1..22};do
    aws s3 cp s3://rti-midas-data/studies/ngc/uhs1/association_tests/001/ea/fou/uhs1.ea.fou.chr$chr.1000g_ids.maf_gt_0.01_eur_uhs1.rsq_gt_0.3.gz . --quiet
done


outf=uhs1_fou001_n897.txt
zcat uhs1.ea.fou.chr1.1000g_ids.maf_gt_0.01_eur_uhs1.rsq_gt_0.3.gz |\
    awk 'NR==1{print $1,$2,$3,$4,$5,$16,$17,$6; exit}' > $outf
for chr in {1..22};do
    inf=uhs1.ea.fou.chr$chr.1000g_ids.maf_gt_0.01_eur_uhs1.rsq_gt_0.3.gz
    # MarkerName, CHR, POS, Allele1, Allele2, Effect, P-value, N
    awk '{print $1,$2,$3,$4,$5,$16,$17,$6}' <(zcat $inf | tail -n +2) >> $outf
done

gzip $outf 
# upload to S3
aws s3 cp $outf.gz s3://rti-heroin/ldsc/data/opioid_fou/$outf.gz

####################################################################################################
## FOU uhs2-3

for chr in {1..22};do
    aws s3 cp s3://rti-midas-data/studies/ngc/uhs2-3/association_tests/001/ea/fou/uhs2-3.ea.fou.chr$chr.1000g_ids.maf_gt_0.03_eur_uhs2-3.se.rsq_gt_0.3.gz . --quiet
done

outf=uhs2_3_fou001_n772.txt
zcat uhs2-3.ea.fou.chr3.1000g_ids.maf_gt_0.03_eur_uhs2-3.se.rsq_gt_0.3.gz   |\
    awk 'NR==1{print $1,$2,$3,$4,$5,$16,$17,$6; exit}' > $outf
for chr in {1..22};do
    inf=uhs2-3.ea.fou.chr$chr.1000g_ids.maf_gt_0.03_eur_uhs2-3.se.rsq_gt_0.3.gz                                                                
    # MarkerName, CHR, POS, Allele1, Allele2, Effect, P-value, N
    awk '{print $1,$2,$3,$4,$5,$16,$17,$6}' <(zcat $inf | tail -n +2) >> $outf
done

gzip $outf 
# upload to S3
aws s3 cp $outf.gz s3://rti-heroin/ldsc/data/opioid_fou/$outf.gz

####################################################################################################
## FOU uhs4
        
for chr in {1..22};do
    aws s3 cp s3://rti-shared/gwas/uhs4/results/box_cox_totopioid_tot_30d/0001/mis_minimac4_eagle2.4/1000g_p3/eur/chr$chr.1000g_ids.maf_gt_0.01_eur_uhs4.rsq_gt_0.3.gz . --quiet
done

outf=uhs4_fou0001_n1067.txt
zcat chr7.1000g_ids.maf_gt_0.01_eur_uhs4.rsq_gt_0.3.gz  |\
    awk 'NR==1{print $1,$2,$3,$4,$5,$17,$18,$6; exit}' > $outf
for chr in {1..22};do
    inf=chr$chr.1000g_ids.maf_gt_0.01_eur_uhs4.rsq_gt_0.3.gz                                                                
    # MarkerName, CHR, POS, Allele1, Allele2, Effect, P-value, N
    awk '{print $1,$2,$3,$4,$5,$17,$18,$6}' <(zcat $inf | tail -n +2) >> $outf
done

gzip $outf 
# upload to S3
aws s3 cp $outf.gz s3://rti-heroin/ldsc/data/opioid_fou/$outf.gz

####################################################################################################
## FOU vidus

for chr in {1..22};do
    aws s3 cp s3://rti-shared/gwas/vidus/results/box_cox_useropioid6mfq/0001/mis_minimac3_shapeit2/1000g_p3/eur/chr$chr.1000g_ids.maf_gt_0.01_eur_vidus.rsq_gt_0.3.gz . --quiet
done

outf=vidus_fou0001_n300.txt
zcat  chr1.1000g_ids.maf_gt_0.01_eur_vidus.rsq_gt_0.3.gz |\
    awk 'NR==1{print $1,$2,$3,$4,$5,$17,$18,$6; exit}' > $outf
for chr in {1..22};do
    inf=chr$chr.1000g_ids.maf_gt_0.01_eur_vidus.rsq_gt_0.3.gz
    # MarkerName, CHR, POS, Allele1, Allele2, Effect, P-value, N
    awk '{print $1,$2,$3,$4,$5,$17,$18,$6}' <(zcat $inf | tail -n +2) >> $outf
done

gzip $outf 
# upload to S3
aws s3 cp $outf.gz s3://rti-heroin/ldsc/data/opioid_fou/$outf.gz

####################################################################################################
## FOU yale-penn

for chr in {1..22};do
    aws s3 cp s3://rti-midas-data/studies/ngc/yale-penn/association_tests/001/ea/fou/yale-penn.ea.fou.chr$chr.1000g_ids.maf_gt_0.01_eur_yale-penn.rsq_gt_0.3.gz . --quiet
done

outf=yalepenn_fou001_n850.txt
zcat  yale-penn.ea.fou.chr1.1000g_ids.maf_gt_0.01_eur_yale-penn.rsq_gt_0.3.gz |\
    awk 'NR==1{print $1,$2,$3,$4,$5,$16,$17,$6; exit}' > $outf
for chr in {1..22};do
    inf=yale-penn.ea.fou.chr$chr.1000g_ids.maf_gt_0.01_eur_yale-penn.rsq_gt_0.3.gz
    # MarkerName, CHR, POS, Allele1, Allele2, Effect, P-value, N
    awk '{print $1,$2,$3,$4,$5,$16,$17,$6}' <(zcat $inf | tail -n +2) >> $outf
done

gzip $outf 
# upload to S3
aws s3 cp $outf.gz s3://rti-heroin/ldsc/data/opioid_fou/$outf.gz



####################################################################################################
## FOU UHS2-4
for chr in {1..22}; do
    aws s3 cp s3://rti-heroin/gwas/uhs234/results/eur/stats/final/uhs234.eur.1000G_p3.fou.chr$chr.rsq.maf_study_eur.txt.gz .
done 


outf=uhs234_fou_n1562.txt
zcat  uhs234.eur.1000G_p3.fou.chr1.rsq.maf_study_eur.txt.gz |\
    awk 'NR==1{print $1,$2,$3,$4,$5,$16,$17,$6; exit}' > $outf
for chr in {1..22};do
    inf=uhs234.eur.1000G_p3.fou.chr$chr.rsq.maf_study_eur.txt.gz
    # MarkerName, CHR, POS, Allele1, Allele2, Effect, P-value, N
    awk '{print $1,$2,$3,$4,$5,$16,$17,$6}' <(zcat $inf | tail -n +2) >> $outf
done


gzip $outf 
# upload to S3
aws s3 cp $outf.gz s3://rti-heroin/ldsc/data/opioid_fou/$outf.gz
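
Before uploading, a quick sanity check on each formatted file can catch truncated or malformed output, for example from a loop that was interrupted. This is a minimal sketch; `check_sumstats` is a hypothetical helper, not part of the pipeline, and it assumes a gzipped, whitespace-delimited file with a header row and the chromosome number in column 2, as produced by the sections above.

```shell
# check_sumstats FILE.gz
# Sanity checks on a formatted summary-statistics file:
#   * every row has the same number of fields as the header
#   * all 22 autosomes are present (CHR assumed to be column 2)
# Prints "chromosome row_count" lines; returns non-zero if a check fails.
check_sumstats() {
    zcat "$1" | awk '
        NR == 1 { nf = NF; next }
        NF != nf {
            printf "line %d has %d fields, header has %d\n", NR, NF, nf > "/dev/stderr"
            bad = 1
        }
        { count[$2]++ }
        END {
            for (chr = 1; chr <= 22; chr++) {
                if (!(chr in count)) { print "chromosome " chr " missing" > "/dev/stderr"; bad = 1 }
                else print chr, count[chr]
            }
            exit bad
        }'
}
```

For example, running `check_sumstats uhs1_oaall002_n9245.txt.gz` before the `aws s3 cp` upload would flag a file that is missing chromosomes or has ragged rows.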


## Create workflow inputs
Here is an example entry in the Excel phenotype file:

```
trait	plot_label	sumstats_path	pmid	category	sample_size	id_col	chr_col	pos_col	effect_allele_col	ref_allele_col	effect_col	pvalue_col	sample_size_col	effect_type	w_ld_chr
COPDGWAS Hobbs et al.	COPD	s3://rti-nd/LDSC/COPDGWAS_HobbsEtAl/modGcNoOtherMinMissSorted.withchrpos.txt.gz	28166215	Respiratory	51772	3	1	2	4	5	10	12		beta	s3://clustername--files/eur_w_ld_chr.tar.bz2
```
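
The phenotype file has 16 columns, and blank cells (such as the empty `sample_size_col` in the example above) still occupy a column. If the Excel sheet is also exported as tab-separated text, a quick field count can catch shifted columns before the JSON is generated. A minimal sketch; the TSV export and the helper name `check_pheno_tsv` are assumptions, not part of the pipeline.

```shell
# check_pheno_tsv FILE.tsv
# Reports every line whose tab-separated field count differs from the
# header's; returns non-zero if any such line exists.
check_pheno_tsv() {
    awk -F'\t' '
        NR == 1 { nf = NF; next }
        NF != nf { printf "line %d: %d fields (expected %d)\n", NR, NF, nf; bad = 1 }
        END { exit bad }' "$1"
}
```

Usage: `check_pheno_tsv phenotypes.tsv`, where `phenotypes.tsv` is a hypothetical tab-separated export of the Excel sheet.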


## 1. upload Excel phenotype file to EC2 instance
## 2. then edit full_ld_regression_wf_template.json to include the reference data of choice
## 3. lastly use dockerized tool to finish filling out the json file that will be input for workflow

## login to a larger compute node
#screen
#qrsh

#phenD=20200109_heroin_ldsc_phenotypes_local.xlsx
#procD=/shared/jmarks/heroin/ldsc/fou_oaall/001
git clone https://github.com/RTIInternational/ld-regression-pipeline/ $procD/ld-regression-pipeline
mkdir $procD/ld-regression-pipeline/workflow_inputs
#mkdir -p $procD/{ldhub,plot} # for later processing
## upload phenotype file (Excel) to */workflow_inputs/

# create final workflow input (a json file) 
# edit this file
cp $procD/ld-regression-pipeline/json_input/full_ld_regression_wf_template.json \
    $procD/ld-regression-pipeline/workflow_inputs


docker run -v $procD/ld-regression-pipeline/workflow_inputs/:/data/ \
    rticode/generate_ld_regression_input_json:1ddbd682cb1e44dab6d11ee571add34bd1d06e21 \
    --json-input /data/full_ld_regression_wf_template.json \
    --pheno-file /data/$phenD >\
        $procD/ld-regression-pipeline/workflow_inputs/final_wf_inputs.json
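The `>` redirect above creates `final_wf_inputs.json` even if the docker command fails, so an empty or truncated file is a plausible failure mode. The sketch below checks the file before it is passed to Cromwell. `check_json` is a hypothetical helper. It assumes `python3` is available on the instance, because it uses the standard-library `json.tool` module.

```shell
# Hypothetical helper: fail if a workflow-input file is missing, empty,
# or not valid JSON. Assumes python3 is installed (json.tool is stdlib).
check_json() {
    if [ ! -s "$1" ]; then
        echo "empty or missing: $1" >&2
        return 1
    fi
    # json.tool exits non-zero and prints the parse error on bad JSON
    python3 -m json.tool "$1" > /dev/null
}
```

Usage: `check_json $procD/ld-regression-pipeline/workflow_inputs/final_wf_inputs.json && echo ok`.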

## Run Analysis Workflow

In [None]:
## zip appropriate files 
# Change to directory immediately above ld-regression-pipeline repo
cd $procD/ld-regression-pipeline
cd ..
# Make zipped copy of repo somewhere
# quote the patterns so the shell passes them to zip unexpanded
zip --exclude='*var/*' --exclude='*.git/*' -r \
    $procD/ld-regression-pipeline/workflow_inputs/ld-regression-pipeline.zip \
    ld-regression-pipeline


In [None]:
## copy cromwell config file from S3 to EC2 instance
cd /shared/jmarks/bin/cromwell
#aws s3 cp s3://rti-cromwell-output/cromwell-config/cromwell_default_genomics_queue.conf .

## Run the workflow from the cromwell directory
java -Dconfig.file=/shared/jmarks/bin/cromwell/cromwell_default_genomics_queue.conf \
    -jar cromwell-44.jar \
    run $procD/ld-regression-pipeline/workflow/full_ld_regression_wf.wdl \
    -i $procD/ld-regression-pipeline/workflow_inputs/final_wf_inputs.json \
    -p $procD/ld-regression-pipeline/workflow_inputs/ld-regression-pipeline.zip


Record the workflow log-ID. The results will then be on S3 at `s3://rti-cromwell-output/cromwell-execution/full_ld_regression_wf/<log-ID>/` <br>
You can also find the log-ID in the directory `/shared/jmarks/bin/cromwell/cromwell-workflow-logs/` (for example).
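The lookup can be scripted. This sketch assumes Cromwell names its per-workflow logs `workflow.<log-ID>.log`, which is its usual convention but is not stated in this notebook. It also assumes the bucket layout in the path above. `latest_results_prefix` is a hypothetical helper that prints the S3 prefix for the most recently modified workflow log.

```shell
# Hypothetical helper: print the S3 results prefix for the newest workflow
# log. Assumes logs are named workflow.<log-ID>.log (Cromwell's usual
# convention) and the bucket layout shown above.
latest_results_prefix() {
    local logdir=${1:-/shared/jmarks/bin/cromwell/cromwell-workflow-logs}
    local newest id
    # ls -t sorts newest first; errors (no matching logs) are silenced
    newest=$(ls -t "$logdir"/workflow.*.log 2>/dev/null | head -n 1)
    [ -n "$newest" ] || { echo "no workflow logs in $logdir" >&2; return 1; }
    id=$(basename "$newest" .log)   # workflow.<log-ID>
    id=${id#workflow.}              # <log-ID>
    echo "s3://rti-cromwell-output/cromwell-execution/full_ld_regression_wf/$id/"
}
```

The output can be passed straight to `aws s3 ls` or `aws s3 sync`.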
<br>
<br>
<br>

## View Plot

In [None]:
library("png")
# set the working directory to the folder holding the downloaded plot PNG
#setwd("C:/Users/jmarks/OneDrive - Research Triangle Institute/Projects/heroin/ldsc/ngc_all/fou/001/processing/output/cromwell")

orig <- readPNG("ngc_fou.ld_regression_results-1.png")
#zoom <- readPNG("20191213_heroin_ngc_ldsc_fou_rg_results-1.png")

grid::grid.raster(orig)

# Run Analyses

## 001
FOU_old_uhs4 -- UHS1(897) + UHS2-3(772) + UHS4(861) = 2,530 <br>
FOU_new_uhs4 -- UHS1(897) + UHS2-3(772) + UHS4(1,067) = 2,736 <br>
\*OAall -- UHS1(9,245) <br>
\* reference trait

`df0c2159-798c-4b5b-a0d5-40b1defaee9a`

In [None]:
phenD=20200109_heroin_ldsc_phenotypes_local.xlsx
procD=/shared/jmarks/heroin/ldsc/fou_oaall/001

## 002
This analysis compares FOU_new_UHS4 (N=2,736) to:

* OAall_094 (deCODE subset of OAexp, N=31,620)
* OAall_089 (all, N=304,507)


`44a93a8c-2047-4896-88ce-16342857670a`

In [None]:
phenD=20200110_heroin_ldsc_phenotypes_local.xlsx
procD=/shared/jmarks/heroin/ldsc/fou_oaall/002

## 003
Complete FOU_087 (N=5,388) compared to:
  * OAall_UHS1 GWAS results (N=9,245)


`0b3a6bd5-7d63-47fc-a70e-ae93a9894bf6`

In [None]:
phenD=20200110_heroin_ldsc_phenotypes_local.xlsx
procD=/shared/jmarks/heroin/ldsc/fou_oaall/003

## 004
Complete FOU_087 (N=5,388) compared to OAall GWAS results:
  * CATS-MOLE (N=3,162)
  * CATS-PERHUNT (N=1,920)
  * COGA (N=7,631)
  * Kreek (N=556)
  * Bulgaria (N=2,765)
  * VIDUS (N=2,177)
  * Yale-Penn-CIDR (N=666)
  * Yale-Penn-GO (N=917)
  * deCODE OAall (N=275,468)
  * deCODE OAexp (N=2,581)

`dec6127d-106d-45c1-93e3-dc6df5794bb1`

In [None]:
phenD=20200121_heroin_ldsc_phenotypes_local.xlsx
procD=/shared/jmarks/heroin/ldsc/fou_oaall/004

## 005
Complete FOU_087 (N=5,388) compared to OAall GWAS results:
  * CATS-MOLE (N=3,162)
  * CATS-PERHUNT (N=1,920)
  * COGA (N=7,631)
  * Kreek (N=556)
  * Bulgaria (N=2,765)
  * VIDUS (N=2,177)
  * Yale-Penn-CIDR (N=666)
  * Yale-Penn-GO (N=917)
  * deCODE OAall (N=275,468)
  * deCODE OAexp (N=2,581)
  * UHS1 (N=9,245)


`d068dcb8-aa95-4c34-b284-b5110c1cd10e`

In [None]:
phenD=20200122_heroin_ldsc_phenotypes_local.xlsx
procD=/shared/jmarks/heroin/ldsc/fou_oaall/005

## 006 
(no VIDUS)
EUR meta-analyses without VIDUS: 2,177 VIDUS individuals were removed from OAall and 300 from FOU. The results are at the following S3 locations:

* FOU: s3://rti-midas-data/studies/ngc/meta/097/processing/fou/alive+cats+cogend+start+uhs1-4+yale-penn.ea.fou.chr{1..22}.maf_gt_0.01.rsq_gt_0.3.gz
* OAall (deCODE OAall): s3://rti-midas-data/studies/ngc/meta/098/processing/oaall/cats+coga+decode+kreek+odb+uhs+yale-penn.ea.chr{1..22}.maf_gt_0.01.rsq_gt_0.3.gz
* OAall (deCODE OAexp): s3://rti-midas-data/studies/ngc/meta/099/processing/oaall/cats+coga+decode+kreek+odb+uhs+yale-penn.ea.chr{1..22}.maf_gt_0.01.rsq_gt_0.3.gz

`59304359-aa17-42ca-b341-e11aec16a98e`

In [None]:
phenD=20200127_heroin_ldsc_phenotypes_local.xlsx
procD=/shared/jmarks/heroin/ldsc/fou_oaall/006

## 007
VIDUS_FOU vs VIDUS_OAall

`99c0b4c7-cf16-46e9-8e9e-f73552dc82e1`

In [None]:
phenD=20200130_heroin_ldsc_phenotypes_local.xlsx
procD=/shared/jmarks/heroin/ldsc/fou_oaall/007

## 008
FOU UHS1–4 (with old UHS4) compared to UHS1_OAall

`c5aafd20-ec79-4be5-be1c-501d0136ca03`

In [None]:
phenD=20200131_heroin_ldsc_phenotypes_local.xlsx
procD=/shared/jmarks/heroin/ldsc/fou_oaall/008

## 009
FOU_092 (N=4,527) compared to
* FOU_087 (N=5388)
* OAall_UHS1 GWAS results (N=9,245)
* COGA (N=7,631)

`3e0994ff-3628-4cb9-9d94-946a83ce4300`

In [None]:
phenD=20200131_heroin_ldsc_phenotypes_local.xlsx
procD=/shared/jmarks/heroin/ldsc/fou_oaall/009

## 010
FOU_092 (N=4,527) compared to
* FOU_087 (N=5388)
* OAall_UHS1 GWAS results (N=9,245)
* COGA (N=7,631)

`fd1dacd6-5318-4b8f-82cd-7a7fe5d700c6`

In [None]:
phenD=20200131_heroin_ldsc_phenotypes_local.xlsx
procD=/shared/jmarks/heroin/ldsc/fou_oaall/010

## 011
`CATS_FOU (N=1226) vs CATSPERTHUNT_OAall (N=1920) & CATSMOLE_OAall (N=3162)`

FOU location:
```
s3://rti-midas-data/studies/ngc/cats/association_tests/012/ea/fou/cats.ea.fou.chr{1..22}.1000g_ids.maf_gt_0.01_eur_cats.rsq_gt_0.3.gz
```
OAall location:
```
CATS-MOLE: s3://rti-midas-data/studies/ngc/cats/association_tests/010/ea/oaall/catsmole.ea.oaall.chr{1..22}.1000g_ids.maf_gt_0.01_eur_cats.rsq_gt_0.3.gz

CATS-PERHUNT: s3://rti-midas-data/studies/ngc/cats/association_tests/011/ea/oaall/catsperthunt.ea.oaall.chr{1..22}.1000g_ids.maf_gt_0.01_eur_cats.rsq_gt_0.3.gz
```


`6cdb291a-c900-4963-b2ae-24803e9090ea`

In [None]:
phenD=20200211_heroin_ldsc_phenotypes_local.xlsx
procD=/shared/jmarks/heroin/ldsc/fou_oaall/011

## 012
Same as 011, but with the reference and comparison traits swapped, to test whether trait order matters in the LDSC regression pipeline. <br>
`CATSMOLE_OAall (N=3162) vs CATS_FOU (N=1226) & CATSPERTHUNT_OAall (N=1920) `

FOU location:
```
s3://rti-midas-data/studies/ngc/cats/association_tests/012/ea/fou/cats.ea.fou.chr{1..22}.1000g_ids.maf_gt_0.01_eur_cats.rsq_gt_0.3.gz
```
OAall location:
```
CATS-MOLE: s3://rti-midas-data/studies/ngc/cats/association_tests/010/ea/oaall/catsmole.ea.oaall.chr{1..22}.1000g_ids.maf_gt_0.01_eur_cats.rsq_gt_0.3.gz

CATS-PERHUNT: s3://rti-midas-data/studies/ngc/cats/association_tests/011/ea/oaall/catsperthunt.ea.oaall.chr{1..22}.1000g_ids.maf_gt_0.01_eur_cats.rsq_gt_0.3.gz
```

`fd9bdc16-346c-4bba-bdf4-3e277d69380c`

In [None]:
phenD=20200211_heroin_ldsc_phenotypes_local.xlsx
procD=/shared/jmarks/heroin/ldsc/fou_oaall/012


## 013
Same as 011 and 012, but with a different trait as the reference, to test whether trait order matters in the LDSC regression pipeline. <br>
CATSPERTHUNT_OAall (N=1920) vs CATS_FOU (N=1226) & CATSMOLE_OAall (N=3162)

FOU location:
```
s3://rti-midas-data/studies/ngc/cats/association_tests/012/ea/fou/cats.ea.fou.chr{1..22}.1000g_ids.maf_gt_0.01_eur_cats.rsq_gt_0.3.gz
```
OAall location:
```
CATS-MOLE: s3://rti-midas-data/studies/ngc/cats/association_tests/010/ea/oaall/catsmole.ea.oaall.chr{1..22}.1000g_ids.maf_gt_0.01_eur_cats.rsq_gt_0.3.gz

CATS-PERHUNT: s3://rti-midas-data/studies/ngc/cats/association_tests/011/ea/oaall/catsperthunt.ea.oaall.chr{1..22}.1000g_ids.maf_gt_0.01_eur_cats.rsq_gt_0.3.gz
```


`d684a632-d265-4403-9855-aefefe279b7c`

In [None]:
phenD=20200211_heroin_ldsc_phenotypes_local.xlsx
procD=/shared/jmarks/heroin/ldsc/fou_oaall/013


## 014
Yale-Penn FOU (N=850) vs Yale-Penn-CIDR (N=666) & Yale-Penn-GO (N=917)

FOU location
```
s3://rti-midas-data/studies/ngc/yale-penn/association_tests/001/ea/fou/yale-penn.ea.fou.chr{1..22}.1000g_ids.maf_gt_0.01_eur_yale-penn.rsq_gt_0.3.gz
```

OAall location:
```
Yale-Penn-CIDR: s3://rti-midas-data/studies/ngc/yale-penn/association_tests/004/ea/oaall/yale-penn.cidr.ea.chr{1..22}.1000g_ids.maf_gt_0.01_eur_yale-penn.rsq_gt_0.3.gz

Yale-Penn-GO: s3://rti-midas-data/studies/ngc/yale-penn/association_tests/005/ea/oaall/yale-penn.go.ea.chr{1..22}.1000g_ids.maf_gt_0.01_eur_yale-penn.rsq_gt_0.3.gz
```


`c1da79b8-567f-4e6a-ad92-77bbc766150f`

In [None]:
phenD=20200211_heroin_ldsc_phenotypes_local.xlsx
procD=/shared/jmarks/heroin/ldsc/fou_oaall/014


## 015
Complete FOU_087 (N=5,388) compared to OAall GWAS results:

* Bulgaria (N=2,765)
* CATS-MOLE (N=3,162)
* CATS-PERHUNT (N=1,920)
* COGA (N=7,631)
* deCODE OAall (N=275,468)
* deCODE OAexp (N=2,581)
* Kreek (N=556)
* UHS1 (N=9,245)
* VIDUS (N=2,177)
* Yale-Penn-CIDR (N=666)
* Yale-Penn-GO (N=917)

`5e5449fb-1242-410c-b26d-1a145eff51e2`

In [None]:
phenD=20200211_heroin_ldsc_phenotypes_local.xlsx
procD=/shared/jmarks/heroin/ldsc/fou_oaall/015


## 016
`FOU meta087 vs Yale-Penn_GO`
Attempt to get the Yale-Penn_GO (N=917) comparison to run.

`638d5ae7-7c30-4661-91fc-ac64f746cf1a`

In [None]:
phenD=20200211_heroin_ldsc_phenotypes_local.xlsx
procD=/shared/jmarks/heroin/ldsc/fou_oaall/016

## 017
`FOU meta106 (N=4592) vs OAall meta105 (N=19,825)`  <br>
Excludes cohorts that were analyzed with a linear mixed model (LMM).

`165b5b52-f932-4046-8da5-97bacf508753`

In [None]:
phenD=20200217_heroin_ldsc_phenotypes_local.xlsx
procD=/shared/jmarks/heroin/ldsc/fou_oaall/017

## 018
FOU meta106 (N=4592) vs OAall GWAS cohorts:
* CATS-MOLE: s3://rti-midas-data/studies/ngc/cats/association_tests/010/ea/oaall/catsmole.ea.oaall.chr{1..22}.1000g_ids.maf_gt_0.01_eur_cats.rsq_gt_0.3.gz
* CATS-PERHUNT: s3://rti-midas-data/studies/ngc/cats/association_tests/011/ea/oaall/catsperthunt.ea.oaall.chr{1..22}.1000g_ids.maf_gt_0.01_eur_cats.rsq_gt_0.3.gz
* Kreek: s3://rti-midas-data/studies/ngc/kreek/association_tests/003/ea/oaall/kreek.ea.oaall.chr{1..22}.1000g_ids.maf_gt_0.01_eur_kreek.rsq_gt_0.3.gz
* Bulgaria: s3://rti-heroin/gwas/op_dep_bulgaria/results/oaall/0001/mis_minimac4_eagle2.4/1000g_p3/eur/chr{1..22}.1000g_ids.maf_gt_0.01_eur_obd.rsq_gt_0.3.gz
* UHS1: s3://rti-midas-data/studies/ngc/uhs1/association_tests/002/ea/oaall/uhs1.ea.oaall.chr{1..22}.1000g_ids.maf_gt_0.01_eur_uhs1.rsq_gt_0.3.gz
* VIDUS: s3://rti-midas-data/studies/ngc/vidus/association_tests/002/ea/oaall/vidus.ea.oaall.chr{1..22}.1000g_ids.maf_gt_0.01_eur_vidus.se.rsq_gt_0.3.gz


`5bec8de6-0987-4462-97f7-b1a29939a984`

In [None]:
phenD=20200217_heroin_ldsc_phenotypes_local.xlsx
procD=/shared/jmarks/heroin/ldsc/fou_oaall/018

## 019
VIDUS_FOU (N=300) vs OAall_meta105 individual cohorts

**reference**:
VIDUS_FOU
* s3://rti-midas-data/studies/ngc/vidus/association_tests/001/ea/fou/vidus.ea.fou.chr{1..22}.1000g_ids.maf_gt_0.01_eur_vidus.se.rsq_gt_0.3.gz


**compared to**:
* Bulgaria (N=2,765)  s3://rti-heroin/gwas/op_dep_bulgaria/results/oaall/0001/mis_minimac4_eagle2.4/1000g_p3/eur/chr{1..22}.1000g_ids.maf_gt_0.01_eur_obd.rsq_gt_0.3.gz
* CATS-MOLE (N=3,162)  s3://rti-midas-data/studies/ngc/cats/association_tests/010/ea/oaall/catsmole.ea.oaall.chr{1..22}.1000g_ids.maf_gt_0.01_eur_cats.rsq_gt_0.3.gz
* CATS-PERHUNT (N=1,920)   s3://rti-midas-data/studies/ngc/cats/association_tests/011/ea/oaall/catsperthunt.ea.oaall.chr{1..22}.1000g_ids.maf_gt_0.01_eur_cats.rsq_gt_0.3.gz
* COGA (N=7,631)   s3://rti-midas-data/studies/ngc/coga/association_tests/001/ea/oaall/coga.ea.oaall.chr{1..22}.1000g_ids.maf_gt_0.01_eur_coga.se.rsq_gt_0.3.gz
* deCODE OAall (N=275,468) s3://rti-midas-data/studies/ngc/decode/association_tests/001/ea/oaall/decode.ea.oaall.chr{1..22}.1000g_ids.maf_gt_0.01_eur_decode.beta_se.rsq_gt_0.3.gz
* deCODE OAexp (N=2,581)  s3://rti-midas-data/studies/ngc/decode/association_tests/002/ea/oaexp/decode.ea.oaexp.chr{1..22}.1000g_ids.maf_gt_0.01_eur_decode.beta_se.rsq_gt_0.3.gz
* Kreek (N=556)   s3://rti-midas-data/studies/ngc/kreek/association_tests/003/ea/oaall/kreek.ea.oaall.chr{1..22}.1000g_ids.maf_gt_0.01_eur_kreek.rsq_gt_0.3.gz
* UHS1 (N=9,245)   s3://rti-midas-data/studies/ngc/uhs1/association_tests/002/ea/oaall/uhs1.ea.oaall.chr{1..22}.1000g_ids.maf_gt_0.01_eur_uhs1.rsq_gt_0.3.gz
* VIDUS (N=2,177)   s3://rti-midas-data/studies/ngc/vidus/association_tests/002/ea/oaall/vidus.ea.oaall.chr{1..22}.1000g_ids.maf_gt_0.01_eur_vidus.se.rsq_gt_0.3.gz
* Yale-Penn-CIDR (N=666)   s3://rti-midas-data/studies/ngc/yale-penn/association_tests/004/ea/oaall/yale-penn.cidr.ea.chr{1..22}.1000g_ids.maf_gt_0.01_eur_yale-penn.rsq_gt_0.3.gz
* Yale-Penn-GO (N=917)   s3://rti-midas-data/studies/ngc/yale-penn/association_tests/005/ea/oaall/yale-penn.go.ea.chr{1..22}.1000g_ids.maf_gt_0.01_eur_yale-penn.rsq_gt_0.3.gz


**old**
a:
`acdb0afb-2a04-41e7-bc66-ec9db2bdc7f8`

b:
`6d6d2e6d-fa00-42f1-b594-71152a90ddd1`


___
**new**
a:
`c66d4e68-013a-4b84-ace1-ac9ae26540aa` <br>
`f358dd7e-078f-45cd-9bda-6b91a6322dea` <br>
`8da94fa6-4e36-4ce2-ae0c-880f6bbbc6c9` <br>

b:
`02a88a65-6871-4c4d-9905-ad5c2e2fe2fd` <br>
`8d98e8bc-6c77-45a1-af37-fc9b9803796a`

c:
`fe7c6248-b96f-41dc-b1e1-8c007d4d28b3`

In [None]:
phenD=20200225_heroin_ldsc_phenotypes_local.xlsx
#procD=/shared/jmarks/heroin/ldsc/fou_oaall/019/a
procD=/shared/jmarks/heroin/ldsc/fou_oaall/019/b

In [None]:
for file in {decode-oaexp,decode-oaall,coga,cats-perthunt,cats-mole,bulgaria,yp-go,yp-cidr,vidus,uhs1,kreek}/*; do
    ~/bin/ldsc/ldsc.py \
    --h2 $file \
    --ref-ld-chr eur_w_ld_chr/ \
    --w-ld-chr eur_w_ld_chr/ \
    --out $file
done
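The h2 estimates listed below come from the "Total Observed scale h2" line of each ldsc log. This sketch collects them into a tab-separated table with the columns log file, h2, and SE. `summarize_h2` is a hypothetical helper. It assumes the log line has the form `Total Observed scale h2: 0.2539 (0.1536)`, as in the output shown.

```shell
# Hypothetical helper: print "log file<TAB>h2<TAB>SE" for each ldsc --h2 log.
# Assumes lines like "Total Observed scale h2: 0.2539 (0.1536)".
summarize_h2() {
    awk '/Total Observed scale h2:/ {
        se = $NF                 # "(0.1536)"
        gsub(/[()]/, "", se)     # strip the parentheses
        print FILENAME "\t" $(NF-1) "\t" se
    }' "$@"
}
```

Usage: `summarize_h2 */*.log | sort -k2,2g` orders the cohorts by h2.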


```
bulgaria/bulgaria_oaall0001_n2765.txt.munged.merged.txt.log:Total Observed scale h2: 0.2539 (0.1536)
cats-mole/catsmole_oaall010_n3162.txt.munged.merged.txt.log:Total Observed scale h2: 0.3934 (0.1543)
cats-perthunt/catsperthunt_oaall011_n1920.txt.munged.merged.txt.log:Total Observed scale h2: 0.7583 (0.2346)
coga/coga_oaall001_n7631.txt.munged.merged.txt.log:Total Observed scale h2: -0.2965 (0.0534)
decode-oaall/decode_oaall001_n275468.txt.munged.merged.txt.log:Total Observed scale h2: 0.0068 (0.0017)
decode-oaexp/decode_oaexp002_n2581.txt.munged.merged.txt.log:Total Observed scale h2: 0.0385 (0.1841)
kreek/kreek_oaall003_n556.txt.munged.merged.txt.log:Total Observed scale h2: 2.8865 (0.9397)
uhs1/uhs1_oaall002_n9245.txt.munged.merged.txt.log:Total Observed scale h2: 0.1642 (0.0478)
vidus/vidus_oaall002_n2177.txt.munged.merged.txt.log:Total Observed scale h2: 0.4593 (0.2154)
yp-cidr/yalepenn_cidr_oaall004_n666.txt.munged.merged.txt.log:Total Observed scale h2: -0.4239 (0.6054)
yp-go/yalepenn_go_oaall005_n917.txt.munged.merged.txt.log:Total Observed scale h2: 0.1106 (0.5307)
```

## 020
VIDUS_OAall (N=2,177) vs FOU meta-analysis cohorts

```
    reference trait:
        VIDUS_OAall (s3://rti-midas-data/studies/ngc/vidus/association_tests/002/ea/oaall/vidus.ea.oaall.chr{1..22}.1000g_ids.maf_gt_0.01_eur_vidus.se.rsq_gt_0.3.gz)

    compared to:
        * alive (N=152)   s3://rti-heroin/ldsc/data/opioid_fou/alive_fou001_n152.txt.gz
        * cats (N=1226)   s3://rti-heroin/ldsc/data/opioid_fou/cats_fou001_n1226.txt.gz
        * cogend (N=99)   s3://rti-heroin/ldsc/data/opioid_fou/cogend_fou001_n99.txt.gz
        * start (N=231)   s3://rti-heroin/ldsc/data/opioid_fou/start_fou001_n231.txt.gz
        * uhs1 (N=897)    s3://rti-heroin/ldsc/data/opioid_fou/uhs1_fou001_n897.txt.gz
        * uhs2-3 (N=772)  s3://rti-heroin/ldsc/data/opioid_fou/uhs2_3_fou001_n772.txt.gz
        * uhs4 (N=1067)   s3://rti-heroin/ldsc/data/opioid_fou/uhs4_fou0001_n1067.txt.gz
        * vidus (N=300)   s3://rti-heroin/ldsc/data/opioid_fou/vidus_fou0001_n300.txt.gz
        * yale-penn (N=850)  s3://rti-heroin/ldsc/data/opioid_fou/yalepenn_fou001_n850.txt.gz
```
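In a list like this, the N in parentheses can easily disagree with the `_n<N>` suffix in the file name. This sketch flags such lines. `check_n_suffix` is a hypothetical helper. It assumes the list has been saved to a text file with one cohort per line in the `(N=...) ..._n<N>.txt.gz` form used above. Commas in N, as in `N=2,765`, are ignored.

```shell
# Hypothetical helper: report lines whose "(N=...)" disagrees with the
# "_n<N>.txt" suffix of the file name; exit status 1 if any mismatch.
check_n_suffix() {
    awk '
    match($0, /N=[0-9,]+/) {
        n = substr($0, RSTART + 2, RLENGTH - 2)   # digits after "N="
        gsub(/,/, "", n)                          # drop thousands separators
        if (match($0, /_n[0-9]+\.txt/)) {
            f = substr($0, RSTART + 2, RLENGTH - 6)   # digits between "_n" and ".txt"
            if (n != f) {
                print "mismatch: N=" n " but file says n" f ": " $0
                bad = 1
            }
        }
    }
    END { exit bad }' "$@"
}
```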

`dc83a580-8b5e-4a23-92d0-1c0a0cdab14b`

In [None]:
phenD=20200220_heroin_ldsc_phenotypes_local.xlsx
procD=/shared/jmarks/heroin/ldsc/fou_oaall/020


## 021
VIDUS_OAall (N=2177) vs UHS2-4 FOU (N=1562).

`723b1e4c-8454-427d-a4fd-4f47b1ce87c5`

In [None]:
phenD=20200317_heroin_ldsc_phenotypes_local.xlsx
procD=/shared/jmarks/heroin/ldsc/fou_oaall/021


In [None]:
for file in {alive,cats,cogend,start,uhs1,uhs23,uhs4,vidus,yp}/*; do
    ~/bin/ldsc/ldsc.py \
    --h2 $file \
    --ref-ld-chr eur_w_ld_chr/ \
    --w-ld-chr eur_w_ld_chr/ \
    --out $file.log
done


```
alive/alive_fou001_n152.txt.munged.merged.txt.log.log:Total Observed scale h2: -1.2532 (2.6704)
cats/cats_fou001_n1226.txt.munged.merged.txt.log.log:Total Observed scale h2: 1.8512 (0.4473)
cogend/cogend_fou001_n99.txt.munged.merged.txt.log.log:Total Observed scale h2: -6.3417 (4.0912)
start/start_fou001_n231.txt.munged.merged.txt.log.log:Total Observed scale h2: 0.0173 (1.7918)
uhs1/uhs1_fou001_n897.txt.munged.merged.txt.log.log:Total Observed scale h2: 0.6334 (0.491)
uhs23/uhs2_3_fou001_n772.txt.munged.merged.txt.log.log:Total Observed scale h2: 0.03 (0.6457)
uhs4/uhs4_fou0001_n1067.txt.munged.merged.txt.log.log:Total Observed scale h2: 0.2717 (0.386)
vidus/vidus_fou0001_n300.txt.munged.merged.txt.log.log:Total Observed scale h2: -1.3093 (1.4063)
yp/yalepenn_fou001_n850.txt.munged.merged.txt.log.log:Total Observed scale h2: -0.8698 (0.4161)

```

# LD Hub

## Data Selection
* Neurological diseases
* Personality traits
* Cognitive
* Education
* Brain Volume (ENIGMA)
* Psychiatric diseases

## create input files

In [None]:
#cd /shared/jmarks/heroin/ldsc/ngc_all/oaall/001/ldhub
#
#outF=ngc_oaall_ldhub_with_pvalues.txt # name of file to create for ldhub
#samp_size=304507
#
#### Download outputs for each ref chr from rftm_sumstats step ###
#aws s3 sync s3://rti-heroin/ldsc/ngc_20191211/oaall/call-munge_ref/MUNGE_REF_WF.munge_sumstats_wf/ec7b872f-5fd0-431f-a6de-69f43a787aab/call-munge_chr_wf .
#        
#mv  */MUNGE_CHR.munge_sumstats_chr_wf/*/call-rfmt_sumstats/*.standardized.phase3ID.munge_ready.txt .
#rm -rf shard*
#
### Concat into single file ##
#cat *.chr1.*.standardized.phase3ID.munge_ready.txt > $outF
#for chr in {2..22}; do
#    tail -n +2  *.chr$chr.*.standardized.phase3ID.munge_ready.txt >> $outF
#done
#
### Remove unnecessary columns (keep snpID, A1, A2, Beta, P-value, in that order) ##
#head -1 $outF | cut -f1,4,5,6,7 > tmp
#tail -n +2 $outF | awk 'BEGIN{OFS="\t"}{print $1, $4, $5, $6, $7}'  >> tmp && mv tmp $outF
#
### Add sample size column (N = $samp_size) and rename header columns ##
#cat $outF | awk -v var=$samp_size -F "\t"  \
#    'BEGIN{OFS="\t";} NR==1{print "snpid", "A1", "A2", "BETA", "N", "P-value"} \
#    NR>1{print $1,$2,$3,$4,var, $5}' > tmp && mv tmp $outF

In [None]:
## local ##
cd /home/jmarks/Projects/heroin/ldsc/ngc_all/oaall/001/processing/input/ldhub
scp -i ~/.ssh/gwas_rsa    ec2-user@34.195.174.206:/shared/jmarks/heroin/ldsc/ngc_all/oaall/001/ldhub/ngc_oaall_ldhub_with_pvalues.txt .
    
# zip the file (7-Zip works on Windows; on Linux/macOS use zip, one file only)
zip ngc_oaall_ldhub_with_pvalues.txt.zip ngc_oaall_ldhub_with_pvalues.txt

## upload input file
Follow the steps above to create the input file, then zip and upload it. In summary:
* Download the LDHub input file created above to your local machine.
* Zip that file (and only that file).
* Log in to [LDHub](http://ldsc.broadinstitute.org/ldhub/) by clicking `Get Started with LDHub` and then signing in with your Google account. (jambqc for me)
* Click `Go Test Center`
* Click `Continue`
* Upload zipped file by clicking `Choose File`, naming your trait, and clicking `Continue`.
* Select traits of interest from LDHub by checking the box next to the trait of interest and then clicking `Submit your request`

**Note**: keep browser open during LDSC analysis on LDHub.

<br>

___

```
Important notes for your uploaded file:

1. To save the uploading time, LD Hub only accepts zipped files as input (e.g. mydata.zip).

2. Please check that there is ONLY ONE plain TXT file (e.g. mydata.txt) in your zipped file.

3. Please make sure you do NOT zip any folder together with the plain txt file (e.g. /myfolder/mydata.txt), otherwise you will get an error: [Errno 2] No such file or directory

4. Please do NOT zip multiple files (e.g. zip mydata.zip file1.txt file2.txt ..) or zip a file with in a folder (e.g. zip mydata.zip /path/to/my/file/mydata.txt).

5. Please keep the file name of your plain txt file short (less than 50 characters), otherwise you may get an error: [Errno 2] No such file or directory

6. Please zip your plain txt file using following command (ONE file at a time):

For Windows system: 1) Locate the file that you want to compress. 2) Right-click the file, point to Send to, and then click Compressed (zipped) folder.

For Linux and Mac OS system: zip mydata.zip mydata.txt

Reminder: for Mac OS system, please do NOT zip you file by right click mouse and click "Compress" to zip your file, this will automatically create a folder called "__MACOS". You will get an error: [Errno 2] No such file or directory.

Upload the trait of interest
To save your upload time, we highly recommend you to use the SNP list we used in LD Hub to reduce the number of SNPs in your uploaded file. Click here to download our SNP list (w_hm3.noMHC.snplist.zip).

Please upload the zipped file you just created. Click here to download an input example.
```
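As the LD Hub notes above recommend, the upload can be restricted to LD Hub's SNP list. This sketch keeps the header plus only the listed SNPs. `filter_to_snplist` is a hypothetical helper. It assumes that the unzipped `w_hm3.noMHC.snplist` has the SNP ID in its first column and that the LD Hub input has `snpid` in its first column with one header row, as in the files built in this notebook.

```shell
# Hypothetical helper: keep the header plus rows whose first column (snpid)
# appears in the first column of the SNP list file.
filter_to_snplist() {
    local snplist=$1 infile=$2
    awk 'NR == FNR { keep[$1]; next }   # first file: remember SNP IDs
         FNR == 1 || ($1 in keep)       # second file: header + listed SNPs
        ' "$snplist" "$infile"
}
```

Usage: `filter_to_snplist w_hm3.noMHC.snplist oaexp_with_pvalues.txt > tmp && mv tmp oaexp_with_pvalues.txt`, then zip as usual.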

In [None]:
cd /shared/jmarks/heroin/ldsc/ngc_all/oaexp/002/ldhub
outF=oaexp_with_pvalues.txt # name of file to create for ldhub
samp_size=5561

### Download outputs for each ref chr from rftm_sumstats step ###
aws s3 sync s3://rti-cromwell-output/cromwell-execution/full_ld_regression_wf/b5167ef6-cc76-4f70-8eef-f7414e441cc8/call-munge_ref/MUNGE_REF_WF.munge_sumstats_wf/b25fcc19-9146-4fd4-b7bf-2ca291b908c9/call-munge_chr_wf/ . 
        
mv  */MUNGE_CHR.munge_sumstats_chr_wf/*/call-rfmt_sumstats/*.standardized.phase3ID.munge_ready.txt .
rm -rf shard*

## Concat into single file ##
cat *.chr1.*.standardized.phase3ID.munge_ready.txt > $outF
for chr in {2..22}; do
    tail -n +2  *.chr$chr.*.standardized.phase3ID.munge_ready.txt >> $outF
done

## Remove unnecessary columns (keep snpID, A1, A2, Beta, P-value, in that order) ##
head -1 $outF | cut -f1,4,5,6,7 > tmp
tail -n +2 $outF | awk 'BEGIN{OFS="\t"}{print $1, $4, $5, $6, $7}'  >> tmp && mv tmp $outF

## Add sample size column (N = $samp_size) and rename header columns ##
cat $outF | awk -v var=$samp_size -F "\t"  \
    'BEGIN{OFS="\t";} NR==1{print "snpid", "A1", "A2", "BETA", "N", "P-value"} \
    NR>1{print $1,$2,$3,$4,var, $5}' > tmp && mv tmp $outF
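Before zipping, it may help to confirm that the steps above produced the expected layout. This sketch assumes the output of the awk step: a tab-separated header `snpid A1 A2 BETA N P-value` followed by rows with exactly six fields. `check_ldhub_input` is a hypothetical helper.

```shell
# Hypothetical helper: verify the LD Hub input header and that every data
# row has six tab-separated fields; exit status 1 on any problem.
check_ldhub_input() {
    awk -F '\t' '
        NR == 1 {
            if ($0 != "snpid\tA1\tA2\tBETA\tN\tP-value") {
                print "unexpected header: " $0
                bad = 1
            }
            next
        }
        NF != 6 { print "line " NR " has " NF " fields"; bad = 1 }
        END { exit bad }' "$1"
}
```

Usage: `check_ldhub_input $outF && echo ok`.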

# Create Final Plot
Merge the output tables from cromwell and LDHub. The merged table should have the header:
```
trait2	Trait_Label	Trait_Group	rg	se	z	p	h2_obs	h2_obs_se	h2_int	h2_int_se	gcov_int	gcov_int_se
```
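Once each table has its header renamed to the one above (and both use the same delimiter), merging amounts to keeping one header and stacking the data rows. This sketch does that and refuses to merge tables whose headers differ. `merge_rg_tables` is a hypothetical helper. It does not convert between tab- and comma-delimited files, so the delimiter must already match what the plotting step expects (it is run with `--comma_delimited` below).

```shell
# Hypothetical helper: print the first table in full, then the data rows of
# each further table; stop with status 1 if any header differs.
merge_rg_tables() {
    local first=$1 header f
    shift
    header=$(head -n 1 "$first")
    cat "$first"
    for f in "$@"; do
        if [ "$(head -n 1 "$f")" != "$header" ]; then
            echo "header mismatch in $f" >&2
            return 1
        fi
        tail -n +2 "$f"   # skip the repeated header
    done
}
```

Usage: `merge_rg_tables cromwell_results.csv ldhub_results.csv > merged.csv`, where both file names are placeholders.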

**Note**: upload the plot table to the EC2 instance, where docker is used to create the plot.

In [None]:
## enter interactive mode ##
# note that the image tag corresponds to the latest tag for this image



docker run -it -v"/shared/jmarks/heroin/ldsc/fou_oaall/019/plot/:/data/" \
    rticode/plot_ld_regression_results:b018e08753390ee773ed7e9eb2ca851c88eee749  /bin/bash


cd /data  # the plot folder is mounted here by the -v option above
Rscript /opt/plot_ld_regression/plot_ld_regression_results.R  \
    --input_file 20200226_vidus_fou_vs_oaall.ld_regression_results.csv \
    --output_file 20200226_vidus_fou_vs_oaall.ld_regression_results.pdf  \
    --comma_delimited \
    --xmax 1 \
    --xmin -1.6 \
    --title "VIDUS_FOU vs OAall Cohorts"

# sandbox

In [None]:
java -Dconfig.file=/shared/jmarks/bin/cromwell/cromwell_default_genomics_queue.conf \
    -jar /shared/jmarks/bin/cromwell/cromwell-44.jar \
    run hello.wdl \
    -i inputs.json 
