# FTND LD Score Regression (LDSC) Update
**Author:** Jesse Marks  <br>
**GitHub Issue:** [#103](https://github.com/RTIInternational/bioinformatics/issues/103#issuecomment-680284119)

In this notebook we document the [LD Score Regression](https://github.com/bulik/ldsc) (LDSC) analysis performed for our paper [Expanding the Genetic Architecture of Nicotine Dependence and its Shared Genetics with Multiple Traits: Findings from the Nicotine Dependence GenOmics (iNDiGO) Consortium](https://www.biorxiv.org/content/10.1101/2020.01.15.898858v1.full). Reviewer 3 asked us to add the UK Biobank heaviness of smoking index (HSI) to the plot and to update the color scheme so that our figures are accessible to color-blind readers.



## Data Locations
* `s3://rti-nd/gwas/uk_biobank/GWA_003/ukb.hsi.sex.age.4evs.white.chr{1..22}.1df.1000g_p3.maf_gt_0.01.rsq_gt_0.3.txt.gz`
* `s3://rti-nd/gwas_meta/categorical_ftnd/results/1df/0001/eur/final_stats/ftnd_wave3_meta_analysis_chr{1..22}_eur.txt.gz`
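
The `{1..22}` in the paths above is shell brace expansion, one file per autosome. As a minimal sketch of the same expansion in Python (the helper name `chromosome_uris` and the `{chr}` placeholder are illustrative choices, not part of the pipeline):

```python
def chromosome_uris(template, chromosomes=range(1, 23)):
    """Expand a '{chr}' placeholder into one URI per chromosome."""
    return [template.format(chr=c) for c in chromosomes]


# FTND meta-analysis path from the list above, with {1..22} rewritten as {chr}.
FTND_TEMPLATE = (
    "s3://rti-nd/gwas_meta/categorical_ftnd/results/1df/0001/eur/final_stats/"
    "ftnd_wave3_meta_analysis_chr{chr}_eur.txt.gz"
)

uris = chromosome_uris(FTND_TEMPLATE)
print(len(uris))   # number of per-chromosome files
print(uris[0])     # chromosome 1 URI
```

Listing the URIs up front makes it easy to confirm that all 22 files exist before starting a download loop.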

## Workflow Guideline
1. Create the Excel phenotype file locally, then upload it to the EC2 instance
2. Clone https://github.com/RTIInternational/ld-regression-pipeline
3. Edit `full_ld_regression_wf_template.json` to include the reference data of choice
4. Use the dockerized tool to finish filling out the JSON file that serves as the workflow input
5. Run the WDL workflow for LDSC

## Create Input Files


In [None]:
## UK Biobank
cd /shared/rti-nd/ldsc/0002/processing/

# download results
for chr in {1..22}; do
    file=s3://rti-nd/gwas/uk_biobank/GWA_003/ukb.hsi.sex.age.4evs.white.chr$chr.1df.1000g_p3.maf_gt_0.01.rsq_gt_0.3.txt.gz
    aws s3 cp $file .
done


# combine into one file
outf=ukb_gwa_003_ldsc_ready.txt
echo -e "SNP\tCHR\tPOS\tA1\tA2\tBETA\tP\tN" > $outf
for chr in {1..22};do
    inf=ukb.hsi.sex.age.4evs.white.chr$chr.1df.1000g_p3.maf_gt_0.01.rsq_gt_0.3.txt.gz
    # MarkerName, CHR, POS, Allele1, Allele2, Effect, P-value, N
    tail -n +2 <(zcat $inf) | \
    awk 'BEGIN{OFS="\t"} {print $2,$1,$3,$4,$5,$12,$16,$10}'  >> $outf
done

# compress only after the loop above has finished writing
gzip $outf

## clean up directory
rm ukb.hsi.sex.age.4evs.white.chr*.1df.1000g_p3.maf_gt_0.01.rsq_gt_0.3.txt.gz

## upload to S3
aws s3 cp $outf.gz s3://rti-shared/ldsc/data/ukb_hsi/ 
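
The `awk` command above selects eight source columns and reorders them to match the new header. The following is a minimal Python sketch of that same mapping, useful for checking a single row by hand. The column positions are copied from the `awk` call; the function name and the placeholder row are hypothetical.

```python
# 1-based source columns, in the order printed by the awk command above.
SOURCE_COLUMNS = [2, 1, 3, 4, 5, 12, 16, 10]
LDSC_HEADER = ["SNP", "CHR", "POS", "A1", "A2", "BETA", "P", "N"]


def reorder_row(line):
    """Split on whitespace (awk's default) and pick the LDSC columns."""
    fields = line.split()
    if len(fields) < max(SOURCE_COLUMNS):
        raise ValueError("row has only {} fields".format(len(fields)))
    return [fields[i - 1] for i in SOURCE_COLUMNS]


# Placeholder row with fields named after their position (c1 .. c16).
example_row = " ".join("c{}".format(i) for i in range(1, 17))
print(dict(zip(LDSC_HEADER, reorder_row(example_row))))
```

Raising on short rows surfaces truncated input lines that `awk` would silently turn into empty fields.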

## Run Analysis Workflow

In [None]:
procD=/shared/rti-nd/ldsc/0002/

# enter compute node and use screen tool

# clone github repo
cd $procD
git clone https://github.com/RTIInternational/ld-regression-pipeline
    
# edit file-input json
cd ld-regression-pipeline
mkdir workflow_inputs
cp json_input/full_ld_regression_wf_template.json workflow_inputs
cd workflow_inputs

## vim edit file (see README.md at https://github.com/RTIInternational/ld-regression-pipeline)


# create final workflow input (a json file)
docker run -v $procD/ld-regression-pipeline/workflow_inputs:/data/ \
    rticode/generate_ld_regression_input_json:1ddbd682cb1e44dab6d11ee571add34bd1d06e21 \
    --json-input /data/full_ld_regression_wf_template.json \
    --pheno-file /data/ftnd_ldsc_phenotypes_local.xlsx >\
        $procD/ld-regression-pipeline/workflow_inputs/final_wf_inputs.json

## zip appropriate files 
# Change to directory immediately above ld-regression-pipeline
cd $procD/ld-regression-pipeline
cd ..
# Make zipped copy of repo somewhere
zip --exclude=*var/* --exclude=*.git/* -r \
    $procD/ld-regression-pipeline/workflow_inputs/ld-regression-pipeline.zip \
    ld-regression-pipeline


## run ldsc workflow on AWS EC2 instance
curl -X POST "http://localhost:8000/api/workflows/v1" -H "accept: application/json" \
    -F "workflowSource=@$procD/ld-regression-pipeline/workflow/full_ld_regression_wf.wdl" \
    -F "workflowInputs=@$procD/ld-regression-pipeline/workflow_inputs/final_wf_inputs.json" \
    -F "workflowDependencies=@$procD/ld-regression-pipeline/workflow_inputs/ld-regression-pipeline.zip"
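
The submission call above returns a workflow ID, which the next cell uses as `job`. The `/api/workflows/v1` path and the `rti-cromwell-output` bucket suggest a Cromwell server; assuming that, the following sketch polls the status endpoint until the workflow reaches a terminal state. The function names and the polling interval are illustrative.

```python
import json
import time
import urllib.request

CROMWELL_URL = "http://localhost:8000"  # same server as the curl call above
TERMINAL_STATUSES = {"Succeeded", "Failed", "Aborted"}


def status_url(job_id, base_url=CROMWELL_URL):
    """Build the status endpoint URL for one workflow ID."""
    return "{}/api/workflows/v1/{}/status".format(base_url, job_id)


def is_finished(status_response):
    """True once the parsed status JSON reports a terminal state."""
    return status_response.get("status") in TERMINAL_STATUSES


def wait_for_workflow(job_id, poll_seconds=60):
    """Poll the server until the workflow stops running (needs a live server)."""
    while True:
        with urllib.request.urlopen(status_url(job_id)) as resp:
            body = json.load(resp)
        if is_finished(body):
            return body["status"]
        time.sleep(poll_seconds)
```

Calling `wait_for_workflow` with the returned ID blocks until the run ends, so outputs are not fetched from a workflow that is still in progress.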

In [2]:
%%bash 

# local

job="e249ac6d-7f7b-4e40-b5e4-e45340fa771e"
phen="ftnd_wave3"
project="nicotine"
version="0002b"
aws_path="s3://rti-nd/ldsc_genetic_correlation/results/$phen/eur/$version/output/processing/"

mkdir -p ~/Projects/$project/ldsc/$phen/$version/output/processing/logs/
cd ~/Projects/$project/ldsc/$phen/$version/output/processing/


    
# Download output JSON from Swagger UI.
curl -X GET "http://localhost:8000/api/workflows/v1/$job/outputs" -H "accept: application/json" \
    > final_outputs_${job}.json

curl -X GET "http://localhost:8000/api/workflows/v1/$job/logs" -H "accept: application/json" \
    > final_logs_${job}.json

# Note: the Python heredoc below must start at column 0 (unindented) to run
python - <<EOF
import json
import os

def traverse(o, tree_types=(list, tuple)):
    if isinstance(o, tree_types):
        for value in o:
            for subvalue in traverse(value, tree_types):
                yield subvalue
    else:
        yield o

with open('final_outputs_' + "$job" + '.json') as f:
    outputs = json.load(f)
    outputs = outputs["outputs"]
    for key in outputs:
        if isinstance(outputs[key], list):
            for value in traverse(outputs[key]):
                if str(value).startswith("s3"):
                    message = "aws s3 cp {} .".format(value)
                    os.system(message)
        else:
            if str(outputs[key]).startswith("s3"):
                message = "aws s3 cp {} .".format(outputs[key])
                os.system(message)
EOF

# Download logs that contain h2
aws s3 cp s3://rti-cromwell-output/cromwell-execution/full_ld_regression_wf/$job/call-ld_regression/ \
    logs/ --recursive --exclude "*" --include "*.ldsc_regression.log" 
    
mv logs/shard-*/LDSC.single_ld_regression_wf/*/call-ld_regression/*.ldsc_regression.log logs/
aws s3 sync logs/ $aws_path
rm -rf logs/shard-*

gzip *txt 
aws s3 sync . $aws_path

download: s3://rti-cromwell-output/cromwell-execution/full_ld_regression_wf/e249ac6d-7f7b-4e40-b5e4-e45340fa771e/call-plot_ld/PLOT.plot_ld_regression_wf/db51a7ef-ad81-4c71-9562-c62da4e3cb00/call-adj_pvalues/ftnd_test.ld_regression_results.adj_pvalue.csv to ./ftnd_test.ld_regression_results.adj_pvalue.csv
download: s3://rti-cromwell-output/cromwell-execution/full_ld_regression_wf/e249ac6d-7f7b-4e40-b5e4-e45340fa771e/call-munge_pheno/shard-0/MUNGE_TRAIT_WF.munge_phenotype_sumstats_wf/f76a178a-2d7f-49d5-bc49-758ac35ed401/call-merge_munge_output/ukb_gwa_003_ldsc_ready.txt.munged.merged.txt to ./ukb_gwa_003_ldsc_ready.txt.munged.merged.txt
download: s3://rti-cromwell-output/cromwell-execution/full_ld_regression_wf/e249ac6d-7f7b-4e40-b5e4-e45340fa771e/call-munge_ref/MUNGE_REF_WF.munge_sumstats_wf/dd31f839-1fac-4e22-9bd0-03c763a60fd2/call-merge_munge_output/test.merged.txt to ./test.merged.txt
download: s3://rti-cromwell-output/cromwell-execution/full_ld_regression_wf/e249ac6d-7f7b-4e40-b5e4-

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1269  100  1269    0     0   8348      0 --:--:-- --:--:-- --:--:--  8348
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    45  100    45    0     0    326      0 --:--:-- --:--:-- --:--:--   326
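
The download step in the cell above relies on flattening arbitrarily nested lists in the workflow outputs. As a sketch of how that traversal behaves, the snippet below runs the same generator on a made-up outputs dictionary (the keys, bucket name, and the `s3_uris` helper are hypothetical) and collects the S3 URIs instead of copying them.

```python
def traverse(o, tree_types=(list, tuple)):
    """Yield every leaf of a nested list/tuple; a scalar yields itself."""
    if isinstance(o, tree_types):
        for value in o:
            yield from traverse(value, tree_types)
    else:
        yield o


def s3_uris(outputs):
    """Collect every leaf value that looks like an S3 URI."""
    found = []
    for value in outputs.values():
        for leaf in traverse(value):
            if str(leaf).startswith("s3://"):
                found.append(leaf)
    return found


# Hypothetical outputs payload: a scalar, a nested list, and a non-URI value.
example_outputs = {
    "wf.plot": "s3://example-bucket/plot.pdf",
    "wf.logs": [["s3://example-bucket/a.log"], ["s3://example-bucket/b.log", None]],
    "wf.count": 3,
}

for uri in s3_uris(example_outputs):
    print(uri)
```

Because `traverse` yields a scalar unchanged, list and non-list outputs can share one code path.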


## Final Plot
We combine these updated results with the previous set of genetic correlation results.

Upload the plot table to the EC2 instance, then run the Docker image to create the plot.
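
Combining the updated results with the previous table amounts to appending rows under a shared header. The following is a minimal sketch using only the standard library; the column names and values are made up for illustration and are not the actual plot-table schema.

```python
import csv
import io


def combine_results(previous_text, updated_text):
    """Append updated rows to previous rows; both tables must share a header."""
    previous = csv.DictReader(io.StringIO(previous_text))
    updated = csv.DictReader(io.StringIO(updated_text))
    if previous.fieldnames != updated.fieldnames:
        raise ValueError("column headers differ; align them before combining")
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=previous.fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(previous)
    writer.writerows(updated)
    return out.getvalue()


# Made-up tables with hypothetical columns.
previous_text = "trait,rg,se\ntrait_a,0.50,0.10\n"
updated_text = "trait,rg,se\ntrait_b,0.25,0.05\n"
combined = combine_results(previous_text, updated_text)
print(combined)
```

Checking the headers first avoids silently misaligned columns when the two result sets come from different pipeline versions.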

In [None]:
## enter interactive mode ##
# note that the image tag corresponds to the latest tag for this image

docker run -it -v"/shared/rti-nd/ldsc/0002/plot/:/data/" \
    rticode/plot_ld_regression_results:d3e8c7694a472ba6125a43a2d6fa32da7342ee1d  /bin/bash

inf=20200827_ftnd_ld_regression_results.csv
#outf=20200827_ftnd_ld_regression_results_colorblind_with_ukb_hsi.pdf
#inf=20200827_ftnd_ld_regression_results_no_ukb.csv
#outf=20200827_ftnd_ld_regression_results_colorblind.pdf
outf=20200827_ftnd_ld_regression_results.pdf

Rscript /opt/plot_ld_regression/plot_ld_regression_results.R  \
    --input_file $inf \
    --output_file $outf \
    --group_order_file ftnd_rg_plot_order.csv \
    --comma_delimited \
    --vertical_rg 1 

inf=20200827_ftnd_ld_regression_results.csv
outf=test03.pdf

Rscript plot_ld_regression_results.R  \
    --input_file $inf \
    --output_file $outf \
    --group_order_file ftnd_rg_plot_order.csv \
    --comma_delimited \
    --vertical_rg 1 \
    --colorblind
