# LDSC on Opioid Addiction GWAS
**author**: Jesse Marks<br>

We want to perform LDSC to obtain the genetic correlation of our NGC opioid addiction phenotype with several other phenotypes. Specifically, we will create a plot analogous to [Figure 3](https://www.nature.com/articles/s41467-020-19265-z/figures/3) in the paper *Expanding the genetic architecture of nicotine dependence and its shared genetics with multiple traits*. One difference is that we will exclude the groups Cardiometabolic and Respiratory.

We will use the automated WDL/cromwell [ld-regression-pipeline](https://github.com/RTIInternational/ld-regression-pipeline) workflow created by Alex Waldrop that implements [LDSC](https://github.com/bulik/ldsc). We will also need to use [LD-Hub](http://ldsc.broadinstitute.org/ldhub/) to get the genetic correlation of phenotypes that we don't have in-house.

# Create Phenotype Table
Perform LDSC on the following phenotypes:

**Cigarette Smoking**
* Heaviness of smoking (UKB): `s3://rti-shared/ldsc/data/ukb_hsi/`
* Cigarettes per day (GSCAN): `s3://rti-shared/ldsc/data/gscan_liu2019/`
* Smoking cessation (GSCAN): `s3://rti-shared/ldsc/data/gscan_liu2019/`
* Cotinine levels: `s3://rti-shared/ldsc/data/cotinine_ware2016/`
* Smoking initiation (GSCAN): `s3://rti-shared/ldsc/data/gscan_liu2019/`
* Age of initiation (GSCAN): `s3://rti-shared/ldsc/data/gscan_liu2019/`

**Drug and alcohol use**
* Alcohol dependence: `s3://rti-shared/ldsc/data/alcdep_walters2018/`
* Cannabis use disorder: `s3://rti-shared/ldsc/data/cud_demontis2019/`
* Alcohol drinks per week (GSCAN): `s3://rti-shared/ldsc/data/gscan_liu2019/`
* Lifetime cannabis use (ever vs never): `s3://rti-shared/ldsc/data/cannabis_icc_ukb/`
    - Note: these data probably still need to be processed, and the directory should be renamed to `cannabis_ever_vs_never_pasman2018`. This version excludes the 23andMe results, so N=162,082; this should be noted in the S3 directory. (https://www.nature.com/articles/s41593-018-0206-1)
    
~**Cancer**~
* ~Squamous cell lung cancer~
* ~Lung cancer~
* ~Lung adenocarcinoma~
* ~Small cell carcinoma~
    - We don't actually have the results for these cancer phenotypes. The genetic correlation results with ND were provided to us by Michael Bray and Laura Bierut at Wash U. See [this](https://github.com/RTIInternational/bioinformatics/issues/103#issuecomment-477357675) GitHub comment.

~**Cardiometabolic**~
- Don't include these results.

~**Respiratory**~
- Don't include these results.

**Neurological**
* Parkinson's disease: [PMID 19915575](https://pubmed.ncbi.nlm.nih.gov/19915575/) on LDHub
* Amyotrophic lateral sclerosis: [PMID 27455348](https://pubmed.ncbi.nlm.nih.gov/27455348/) on LDHub
* Alzheimer's disease: [PMID 24162737](https://pubmed.ncbi.nlm.nih.gov/24162737/) on LDHub

**Cognitive/education**
* Intelligence: [PMID 28530673](https://pubmed.ncbi.nlm.nih.gov/28530673/) on LDHub
* Childhood IQ: [PMID 23358156](https://pubmed.ncbi.nlm.nih.gov/23358156/) on LDHub
* College completion: [PMID 23722424](https://pubmed.ncbi.nlm.nih.gov/23722424/) on LDHub
* Years of schooling: [PMID 25201988](https://pubmed.ncbi.nlm.nih.gov/25201988/) on LDHub

**Personality**
* Neuroticism: [PMID 24828478](https://pubmed.ncbi.nlm.nih.gov/24828478/) on LDHub
* Conscientiousness: [PMID 21173776](https://pubmed.ncbi.nlm.nih.gov/21173776/) on LDHub
* Openness to experience: [PMID 21173776](https://pubmed.ncbi.nlm.nih.gov/21173776/) on LDHub


**Psychiatric**
* Post-traumatic stress disorder: `s3://rti-shared/ldsc/data/ptsd_nievergelt2019/`
* Attention deficit hyperactivity disorder: `s3://rti-shared/ldsc/data/adhd_demontis2018/`
* Depressive symptoms: `s3://rti-shared/ldsc/data/depressive_symptoms_okbay2016/`
* Major depressive disorder: `s3://rti-shared/ldsc/data/mdd_howard2019/`
* Bipolar disorder: `s3://rti-shared/ldsc/data/bipolar_stahl2019/`
* Autism spectrum disorder: `s3://rti-shared/ldsc/data/autism_groves2019/`
* Schizophrenia: `s3://rti-shared/ldsc/data/schizophrenia_ripke2014/`
* Psychiatric cross-disorder: [PMID 23453885](https://pubmed.ncbi.nlm.nih.gov/23453885/) on LDHub
* Anorexia nervosa: `s3://rti-shared/ldsc/data/anorexia_watson2019/`
* Subjective well being: [PMID 27089181](https://pubmed.ncbi.nlm.nih.gov/27089181/) on LDHub

**Brain volume**
* Putamen volume: [PMID 25607358](https://pubmed.ncbi.nlm.nih.gov/25607358/) on LDHub
* Accumbens volume: [PMID 25607358](https://pubmed.ncbi.nlm.nih.gov/25607358/) on LDHub
* Pallidum volume: [PMID 25607358](https://pubmed.ncbi.nlm.nih.gov/25607358/) on LDHub
* Caudate volume: [PMID 25607358](https://pubmed.ncbi.nlm.nih.gov/25607358/) on LDHub
* Thalamus volume: [PMID 25607358](https://pubmed.ncbi.nlm.nih.gov/25607358/) on LDHub
* Hippocampus volume: [PMID 25607358](https://pubmed.ncbi.nlm.nih.gov/25607358/) on LDHub
* Intracranial volume: [PMID 25607358](https://pubmed.ncbi.nlm.nih.gov/25607358/) on LDHub

<br>

___

Note that we should recreate the Nicotine Dependence plot to verify the allele directions, then run it again for opioid addiction.

See /mnt/c/Users/jmarks/OneDrive - Research Triangle Institute/Projects/heroin/ldsc/oaall/0006/processing/input/20210226_heroin_ldsc_phenotypes_local.xlsx

In [None]:
## Process OA results
aws s3 cp s3://rti-heroin/rti-midas-data/studies/ngc/GenomicSEM/results/29/gSEM/final/genomicSEM_GWAS.oaALL.MVP1_MVP2_YP_SAGE.PGC.Song.table .

# Split the table into one file per chromosome, repeating the header in each file.
# A2 is the coded allele; est is the effect estimate.
#SNP     CHR     BP      MAF     A1      A2      est     SE      Z_Estimate      Pval_Estimate
awk 'NR==1{h = $0} NR>1{ print (!a[$2]++ ? h ORS $0 : $0) > ("genomic_sem_gwas_chr" $2 ".txt") }' \
    genomicSEM_GWAS.oaALL.MVP1_MVP2_YP_SAGE.PGC.Song.table

for file in genomic_sem*; do
    aws s3 cp $file s3://rti-heroin/rti-midas-data/studies/ngc/GenomicSEM/results/29/gSEM/final/ldsc_ready/
done
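
Before uploading, it may be worth confirming that the split did not drop any rows. The sketch below assumes the original table and every per-chromosome file carry exactly one header line, which is what the `awk` command above is meant to produce. `check_split` is a hypothetical helper name, not part of the pipeline.

```shell
# Hypothetical helper (not part of ld-regression-pipeline).
# Usage: check_split <original_table> <per_chr_file_prefix>
# Assumes the original table and each per-chromosome file have one header line.
check_split() {
    cs_expected=$(( $(wc -l < "$1") - 1 ))   # data rows in the original table
    cs_observed=0
    for cs_file in "$2"*.txt; do
        [ -e "$cs_file" ] || continue         # the glob matched nothing
        cs_observed=$(( cs_observed + $(wc -l < "$cs_file") - 1 ))
    done
    if [ "$cs_expected" -eq "$cs_observed" ]; then
        echo "OK $cs_observed"
    else
        echo "MISMATCH expected=$cs_expected observed=$cs_observed"
        return 1
    fi
}
```

For the files above this would be called as `check_split genomicSEM_GWAS.oaALL.MVP1_MVP2_YP_SAGE.PGC.Song.table genomic_sem_gwas_chr`.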

In [None]:
## 1. upload Excel phenotype file to EC2 instance
## 2. then edit full_ld_regression_wf_template.json to include the reference data of choice
## 3. lastly use dockerized tool to finish filling out the json file that will be input for workflow

## login to a larger compute node
qrsh

phen=20210226_heroin_ldsc_phenotypes_local.xlsx
procD=/shared/rti-heroin/ldsc/results/oaall/eur/0006/ecr_test2
#mkdir -p $procD/{ldhub,plot} # for later processing
git clone https://github.com/RTIInternational/ld-regression-pipeline/ $procD/ld-regression-pipeline
mkdir -p $procD/ld-regression-pipeline/workflow_inputs

# create final workflow input (a json file) 
cp $procD/ld-regression-pipeline/json_input/full_ld_regression_wf_template.json \
    $procD/ld-regression-pipeline/workflow_inputs/
# edit this file

## upload files to */workflow_inputs/
#scp -i ~/.ssh/gwas_rsa 20200812_heroin_ldsc_phenotypes_local.xlsx     ec2-user@34.195.174.206:/shared/rti-heroin/ldsc_genetic_correlation/results/oaall/0005/eur/ld-regression-pipeline/workflow_inputs/

docker run -v $procD/ld-regression-pipeline/workflow_inputs/:/data/ \
    rticode/generate_ld_regression_input_json:1ddbd682cb1e44dab6d11ee571add34bd1d06e21 \
    --json-input /data/full_ld_regression_wf_template.json \
    --pheno-file /data/$phen >\
        $procD/ld-regression-pipeline/workflow_inputs/final_wf_inputs.json

In [None]:
## zip appropriate files 
# Change to directory immediately above ld-regression-pipeline repo
cd $procD/ld-regression-pipeline
cd ..
# Make zipped copy of repo somewhere
zip --exclude=*var/* --exclude=*.git/* -r \
    $procD/ld-regression-pipeline/workflow_inputs/ld-regression-pipeline.zip \
    ld-regression-pipeline

#cd /shared/jmarks/bin/cromwell

curl -X POST "http://localhost:8000/api/workflows/v1" -H "accept: application/json" \
    -F "workflowSource=@$procD/ld-regression-pipeline/workflow/full_ld_regression_wf.wdl" \
    -F "workflowInputs=@$procD/ld-regression-pipeline/workflow_inputs/final_wf_inputs.json" \
    -F "workflowDependencies=@$procD/ld-regression-pipeline/workflow_inputs/ld-regression-pipeline.zip" 
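
Once the workflow is submitted, its progress can be checked through Cromwell's `/api/workflows/v1/{id}/status` endpoint rather than by refreshing the Swagger UI. The following is a minimal sketch, assuming the server is still listening on `localhost:8000` as in the submission above. `parse_status` and `wait_for_workflow` are hypothetical helper names, and the loop treats anything other than `Submitted` or `Running` as terminal.

```shell
# Extract the "status" field from a Cromwell status response read on stdin.
parse_status() {
    sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Hypothetical helper: poll until the workflow leaves the Submitted/Running states.
# Usage: wait_for_workflow <workflow_id> [poll_seconds]
# Returns 0 only if the final status is Succeeded.
wait_for_workflow() {
    while :; do
        wf_status=$(curl -s "http://localhost:8000/api/workflows/v1/$1/status" \
            -H "accept: application/json" | parse_status)
        echo "$(date) ${wf_status:-unknown}"
        case "$wf_status" in
            Submitted|Running) sleep "${2:-60}" ;;
            *) break ;;
        esac
    done
    [ "$wf_status" = "Succeeded" ]
}
```

The workflow ID is the `id` field of the JSON returned by the `curl -X POST` submission.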

In [None]:
### alternative job submission
#
### run ldsc workflow on AWS EC2 instance
#java -Dconfig.file=/shared/jmarks/bin/cromwell/cromwell-server-new.conf \
#    -jar ~/bin/cromwell/cromwell-54.jar \
#    run $procD/ld-regression-pipeline/workflow/full_ld_regression_wf.wdl \
#    -i $procD/ld-regression-pipeline/workflow_inputs/final_wf_inputs.json \
#    -p $procD/ld-regression-pipeline/workflow_inputs/ld-regression-pipeline.zip #> $procD/output.log 2>&1
#
#java -jar ~/bin/cromwell/cromwell-54.jar \
#    run $procD/ld-regression-pipeline/workflow/full_ld_regression_wf.wdl \
#    -i $procD/ld-regression-pipeline/workflow_inputs/final_wf_inputs_local.json \
#    -p $procD/ld-regression-pipeline/workflow_inputs/ld-regression-pipeline.zip > $procD/output.log 2>&1

In [7]:
%%bash 

job="19952256-d8d3-4c02-bf05-8c63dae8adde"
phen="oaall"
project="heroin"
version="0006"
aws_path="s3://rti-heroin/ldsc/results/$phen/eur/$version/output/cromwell/"


mkdir -p ~/Projects/$project/ldsc/$phen/$version/processing/output/cromwell/logs/
cd ~/Projects/$project/ldsc/$phen/$version/processing/output/cromwell
    
# Download the outputs JSON from the Cromwell REST API (the same endpoint the Swagger UI exposes).
curl -X GET "http://localhost:8000/api/workflows/v1/$job/outputs" -H "accept: application/json" \
    > final_outputs_${job}.json

curl -X GET "http://localhost:8000/api/workflows/v1/$job/logs" -H "accept: application/json" \
    > final_logs_${job}.json

# Download logs that contain h2
aws s3 cp s3://rti-cromwell-output/cromwell-execution/full_ld_regression_wf/$job/call-ld_regression/ \
    logs/ --recursive --exclude "*" --include "*.ldsc_regression.log" 
    
mv logs/shard-*/LDSC.single_ld_regression_wf/*/call-ld_regression/*.ldsc_regression.log logs/
aws s3 sync logs/ $aws_path
rm -rf logs/shard-*


# Download plot
aws s3 cp s3://rti-cromwell-output/cromwell-execution/full_ld_regression_wf/$job/call-plot_ld/ \
    . --recursive --exclude "*" --include "*.ld_regression_results.*"
    
mv PLOT.plot_ld_regression_wf/*/call-*/*ld_regression_results* .


## upload to S3
# Note: I had to unindent the heredoc below to get it to run
#python - <<EOF
#import json
#import os
#
#
#def traverse(o, tree_types=(list, tuple)):
#    if isinstance(o, tree_types):
#        for value in o:
#            for subvalue in traverse(value, tree_types):
#                yield subvalue
#    else:
#        yield o
#
#with open('final_outputs_' + "$job" + '.json') as f:
#    outputs = json.load(f)
#    outputs = outputs["outputs"]
#    for key in outputs:
#        if (type(outputs[key]) == list):
#            for value in traverse(outputs[key]):
#                if (str(value)[0:2] == "s3"):
#                    message = "aws s3 cp {} $aws_path".format(value)
#                    os.system(message)
#        else:
#            if (str(outputs[key])[0:2] == "s3"):
#                message = "aws s3 cp {} $aws_path".format(outputs[key])
#                os.system(message)
#EOF

rm -rf PLOT.plot_ld_regression_wf

download: s3://rti-cromwell-output/cromwell-execution/full_ld_regression_wf/19952256-d8d3-4c02-bf05-8c63dae8adde/call-ld_regression/shard-0/LDSC.single_ld_regression_wf/a3babf3c-e1ca-4c4b-a772-f9d0386b7fd8/call-ld_regression/oa_by_hsi.ldsc_regression.log to logs/shard-0/LDSC.single_ld_regression_wf/a3babf3c-e1ca-4c4b-a772-f9d0386b7fd8/call-ld_regression/oa_by_hsi.ldsc_regression.log
download: s3://rti-cromwell-output/cromwell-execution/full_ld_regression_wf/19952256-d8d3-4c02-bf05-8c63dae8adde/call-ld_regression/shard-1/LDSC.single_ld_regression_wf/0b072573-40a8-492f-81c5-aca0e6f8d2ce/call-ld_regression/oa_by_cpd.ldsc_regression.log to logs/shard-1/LDSC.single_ld_regression_wf/0b072573-40a8-492f-81c5-aca0e6f8d2ce/call-ld_regression/oa_by_cpd.ldsc_regression.log
download: s3://rti-cromwell-output/cromwell-execution/full_ld_regression_wf/19952256-d8d3-4c02-bf05-8c63dae8adde/call-ld_regression/shard-17/LDSC.single_ld_regression_wf/b4c0e86c-c682-484d-a4e5-f86c23edb388/call-ld_regression/oa

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  6089  100  6089    0     0  21216      0 --:--:-- --:--:-- --:--:-- 21216
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    45  100    45    0     0    226      0 --:--:-- --:--:-- --:--:--   226
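
With the logs collected in `logs/`, the genetic correlation estimates can be pulled out for a quick look before building the plot table. This sketch assumes each `*.ldsc_regression.log` contains the usual LDSC summary line of the form `Genetic Correlation: <rg> (<se>)`; if the pipeline's logs differ, the pattern would need adjusting. `summarize_rg` is a hypothetical helper name.

```shell
# Hypothetical helper: print "<log file><TAB><rg (SE)>" for each LDSC log given.
# Assumes the standard LDSC summary line "Genetic Correlation: <rg> (<se>)".
# The section heading "Genetic Correlation" (no colon) is deliberately not matched.
summarize_rg() {
    awk -F': ' '/^Genetic Correlation: / { print FILENAME "\t" $2 }' "$@"
}
```

For example: `summarize_rg logs/*.ldsc_regression.log`.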


# LD Hub

How to prepare/format data for LDHub.

```
Important notes for your uploaded file:

1. To save the uploading time, LD Hub only accepts zipped files as input (e.g. mydata.zip).

2. Please check that there is ONLY ONE plain TXT file (e.g. mydata.txt) in your zipped file.

3. Please make sure you do NOT zip any folder together with the plain txt file (e.g. /myfolder/mydata.txt), otherwise you will get an error: [Errno 2] No such file or directory

4. Please do NOT zip multiple files (e.g. zip mydata.zip file1.txt file2.txt ..) or zip a file with in a folder (e.g. zip mydata.zip /path/to/my/file/mydata.txt).

5. Please keep the file name of your plain txt file short (less than 50 characters), otherwise you may get an error: [Errno 2] No such file or directory

6. Please zip your plain txt file using following command (ONE file at a time):

For Windows system: 1) Locate the file that you want to compress. 2) Right-click the file, point to Send to, and then click Compressed (zipped) folder.

For Linux and Mac OS system: zip mydata.zip mydata.txt

Reminder: for Mac OS system, please do NOT zip you file by right click mouse and click "Compress" to zip your file, this will automatically create a folder called "__MACOS". You will get an error: [Errno 2] No such file or directory.

Upload the trait of interest
To save your upload time, we highly recommend you to use the SNP list we used in LD Hub to reduce the number of SNPs in your uploaded file. Click here to download our SNP list (w_hm3.noMHC.snplist.zip).

Please upload the zipped file you just created. Click here to download an input example.
```
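
Two of the rules above (no folder in the zipped path, and a file name under 50 characters) are easy to check before zipping. The sketch below covers only those two rules; `ldhub_name_ok` is a hypothetical helper name and is not something LD Hub provides.

```shell
# Hypothetical pre-flight check for the name passed to `zip`.
# Covers only two LD Hub rules: no folder component, and fewer than 50 characters.
ldhub_name_ok() {
    ldhub_name=$1
    case "$ldhub_name" in
        */*) echo "contains a folder path"; return 1 ;;
    esac
    if [ "${#ldhub_name}" -ge 50 ]; then
        echo "name has 50 or more characters"
        return 1
    fi
    echo "ok"
}
```

A typical use would be `ldhub_name_ok mydata.txt && zip mydata.zip mydata.txt`, run from the directory that holds the file.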

## Prepare input file
Create LD Hub input file.

```css
MarkerName      A1      A2      BETA    P    N
```

In [None]:
## Download outputs for each ref chr from the rfmt_sumstats step
procD=/shared/rti-heroin/ldsc/results/oaall/eur/0006
mkdir -p $procD/ldhub/
cd $procD/ldhub

aws s3 sync s3://rti-cromwell-output/cromwell-execution/full_ld_regression_wf/1b601278-9a63-496c-a3e5-4859ea396abd/call-munge_ref/MUNGE_REF_WF.munge_sumstats_wf/1686181f-a5f8-4a6c-9b92-bcd227f89af9/call-munge_chr_wf/ .
    
mv shard-*/MUNGE_CHR.munge_sumstats_chr_wf/*/call-rfmt_sumstats/genomic_sem_gwas_chr*.txt.standardized.phase3ID.munge_ready.txt  .

## Concat into single file
cp genomic_sem_gwas_chr1.txt.standardized.phase3ID.munge_ready.txt \
oa_ngc_gsem_0006_ld_hub_with_pvalues.txt

for chr in {2..22}; do
    tail -n +2   genomic_sem_gwas_chr${chr}.txt.standardized.phase3ID.munge_ready.txt >>\
    oa_ngc_gsem_0006_ld_hub_with_pvalues.txt
done


## Remove unnecessary columns (need snpID, A1, A2, Beta, Pvalue)
cat oa_ngc_gsem_0006_ld_hub_with_pvalues.txt |\
cut -f 1,4,5,6,7 > tmp && mv tmp oa_ngc_gsem_0006_ld_hub_with_pvalues.txt

## Add sample size column 
cat oa_ngc_gsem_0006_ld_hub_with_pvalues.txt |\
awk   '{print $1,$2,$3,$4,"88114.00",$5}' OFS="\t" >\
oa_ngc_gsem_0006_ld_hub_final.txt

## Use vim to change column names to be:
#SNP A1 A2 BETA N P-value

## zip file
zip oa_ngc_gsem_0006_ld_hub_final.txt.zip oa_ngc_gsem_0006_ld_hub_final.txt
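
The manual vim step could be avoided by writing the header at the same time as the sample size column is added. This is a sketch, assuming the five-column input is ordered SNP, A1, A2, BETA, P (as produced by the `cut` step) and that the header names listed above are the ones LD Hub expects. `add_n_column` is a hypothetical helper name.

```shell
# Hypothetical helper: insert a constant sample-size column and rewrite the header.
# Usage: add_n_column <N> < five_column_table > six_column_table
# Assumes tab-delimited input with columns: SNP, A1, A2, BETA, P (one header line).
add_n_column() {
    awk -v n="$1" 'BEGIN { FS = OFS = "\t" }
        NR == 1 { print "SNP", "A1", "A2", "BETA", "N", "P-value"; next }
        { print $1, $2, $3, $4, n, $5 }'
}
```

For the file above: `add_n_column 88114.00 < oa_ngc_gsem_0006_ld_hub_with_pvalues.txt > oa_ngc_gsem_0006_ld_hub_final.txt`.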

# Final Plot
We combine the Cromwell/WDL results and the updated LD Hub results with the previous set of genetic correlation results.

Upload the plot table to EC2 instance to run docker and create the plot.

**Example:**
```

```

In [None]:
## enter interactive mode ##
# note that the image tag corresponds to the latest tag for this image

cd /shared/rti-heroin/ldsc/results/oaall/eur/0006/ecr_test2/plot/

# docker login (AWS CLI v1 syntax; AWS CLI v2 removed `get-login` in favor of `aws ecr get-login-password`)
$(aws ecr get-login --no-include-email --region us-east-1)
docker run -it -v"$PWD:/data/" \
    404545384114.dkr.ecr.us-east-1.amazonaws.com/plot_ld_regression_results:v1.0_0f1f25f /bin/bash
    
inf=20210301_oa_ngc_gsem_ldhub_genetic_correlation.csv
outf=20210301_oa_ngc_gsem_ldhub_genetic_correlation.pdf
plot_order=oa_ngc_gsem_rg_plot_order.csv


Rscript /opt/plot_ld_regression/plot_ld_regression_results.R  \
    --input_file $inf \
    --output_file $outf \
    --group_order_file $plot_order \
    --comma_delimited \
    --vertical_rg 1 \
    --colorblind

    #--title "" \