# LD Score Regression: WDL Pipeline
**Author**: Jesse Marks
**GitHub Issue:** [#126](https://github.com/RTIInternational/bioinformatics/issues/126)

LD score regression (LDSC) analyses are needed for EUR-specific meta-analysis results for HIV acquisition. We currently have two sets of EUR-specific meta-analysis results:
* s3://rti-hiv/meta_new/016 
* s3://rti-hiv/meta_new/019  

The 016 meta-analysis has `n=4,664` and includes:
* UHS1-4 EA (n=3013)
* WIHS1 EA (n=720)
* VIDUS EA (n=931)

The 019 meta-analysis has `n=3,733` and includes:
* UHS1-4 EA (n=3013)
* WIHS1 EA (n=720)

We will use the [LD score regression pipeline](https://github.com/RTIInternational/ld-regression-pipeline) developed by Alex Waldrop to run these analyses.

<br><br>

### workflow number:
**600a3079-f75b-4de7-8517-78632b30bb1b**

## Create WorkFlow inputs
Here is an example entry in the Excel Phenotype File:

```
trait	plot_label	sumstats_path	pmid	category	sample_size	id_col	chr_col	pos_col	effect_allele_col	ref_allele_col	effect_col	pvalue_col	sample_size_col	effect_type	w_ld_chr
COPDGWAS Hobbs et al.	COPD	s3://rti-nd/LDSC/COPDGWAS_HobbsEtAl/modGcNoOtherMinMissSorted.withchrpos.txt.gz	28166215	Respiratory	51772	3	1	2	4	5	10	12		beta	s3://clustername--files/eur_w_ld_chr.tar.bz2
```

In [None]:
## 1. upload Excel phenotype file to EC2 instance
## 2. then edit full_ld_regression_wf_template.json to include the reference data of choice
## 3. lastly use dockerized tool to finish filling out the json file that will be input for workflow


# create final workflow input (a json file)
docker run -v /shared/jmarks/hiv/ldsc/ld-regression-pipeline/workflow_inputs/:/data/ \
    rticode/generate_ld_regression_input_json:1ddbd682cb1e44dab6d11ee571add34bd1d06e21 \
    --json-input /data/full_ld_regression_wf_template.json \
    --pheno-file /data/hiv_acquisition_ldsc_phenotypes_local.xlsx >\
        /shared/jmarks/hiv/ldsc/ld-regression-pipeline/workflow_inputs/final_wf_inputs.json
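
Because the shell redirection above writes `final_wf_inputs.json` even if the container exits with an error, it may be worth confirming that the file is well-formed JSON before launching the workflow. The following is a minimal sketch, assuming `python3` is available on the instance; `check_json` is a hypothetical helper name and is not part of the pipeline.

```shell
# Hypothetical helper (not part of the pipeline): succeed only if the file
# exists, is non-empty, and parses as JSON.
check_json() {
    local json_file="$1"
    [ -s "$json_file" ] || { echo "missing or empty: $json_file" >&2; return 1; }
    python3 -m json.tool "$json_file" > /dev/null 2>&1 \
        || { echo "not valid JSON: $json_file" >&2; return 1; }
    echo "OK: $json_file"
}
```

It could be called as `check_json /shared/jmarks/hiv/ldsc/ld-regression-pipeline/workflow_inputs/final_wf_inputs.json`.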

## Run Analysis Workflow

In [None]:
## copy cromwell config file from S3 to EC2 instance
cd /shared/jmarks/bin/cromwell
aws s3 cp s3://rti-cromwell-output/cromwell-config/cromwell_default_genomics_queue.conf .
    
## zip appropriate files 
# Change to directory immediately above ld-regression-pipeline repo
cd /shared/jmarks/hiv/ldsc/ld-regression-pipeline
cd ..
# Make zipped copy of repo somewhere
zip --exclude=*var/* --exclude=*.git/* -r \
    /shared/jmarks/hiv/ldsc/ld-regression-pipeline/workflow_inputs/ld-regression-pipeline.zip \
    ld-regression-pipeline

    
## Run workflow—Navigate to cromwell directory
#cd ~/cromwell
#java -Dconfig.file=/full/path/to/metaxcan-pipeline/var/cromwell_default_genomics_queue.conf -jar cromwell-36.jar \
#    run /path/to/metaxcan-pipeline/workflow/s-mulTiXcan_test_wf.wdl \
#    -i ~/PycharmProjects/metaxcan-pipeline/json_input/s-mulTiXcan_test_wf_example_input.json \
#    -p ~/Desktop/metaxcan-pipeline.zip

cd /shared/jmarks/bin/cromwell
java -Dconfig.file=/shared/jmarks/bin/cromwell/cromwell_default_genomics_queue.conf \
    -jar cromwell-44.jar \
    run /shared/jmarks/hiv/ldsc/ld-regression-pipeline/workflow/full_ld_regression_wf.wdl \
    -i /shared/jmarks/hiv/ldsc/ld-regression-pipeline/workflow_inputs/final_wf_inputs.json \
    -p /shared/jmarks/hiv/ldsc/ld-regression-pipeline/workflow_inputs/ld-regression-pipeline.zip


# workflow.ed5747ed-ccbe-4bc9-bb44-1f2d750a27eb.log

# 600a3079-f75b-4de7-8517-78632b30bb1b

### Run a workflow on AWS via CODE's AWS Cromwell-Server

In [None]:
ssh -L localhost:8000:localhost:8000 ec2-user@3.87.9.200
curl -X POST "http://localhost:8000/api/workflows/v1" -H "accept: application/json" \
    -F "workflowSource=@/shared/jmarks/hiv/ldsc/ld-regression-pipeline/workflow/full_ld_regression_wf.wdl" \
    -F "workflowInputs=@/shared/jmarks/hiv/ldsc/ld-regression-pipeline/workflow_inputs/final_wf_inputs.json" \
    -F "workflowDependencies=@/shared/jmarks/hiv/ldsc/ld-regression-pipeline/workflow_inputs/ld-regression-pipeline.zip"
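
After a submission like the one above, the Cromwell server's REST API can be polled for the workflow's state through the `/api/workflows/v1/{id}/status` endpoint. The sketch below assumes the same SSH tunnel is still forwarding `localhost:8000`; the function names and the `CROMWELL_URL` variable are hypothetical conveniences, not part of the pipeline.

```shell
# Hypothetical helpers; assume the SSH tunnel is forwarding localhost:8000.
CROMWELL_URL="${CROMWELL_URL:-http://localhost:8000}"

# Build the status endpoint URL for a given workflow ID.
cromwell_status_url() {
    echo "${CROMWELL_URL}/api/workflows/v1/$1/status"
}

# Ask the server for the current status of a workflow.
cromwell_status() {
    curl -s -X GET "$(cromwell_status_url "$1")" -H "accept: application/json"
}
```

Calling `cromwell_status <workflow-id>` with the ID returned by the POST request should return a small JSON document describing the workflow's current state.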

## LD Hub
```
Important notes for your uploaded file:

1. To save the uploading time, LD Hub only accepts zipped files as input (e.g. mydata.zip).

2. Please check that there is ONLY ONE plain TXT file (e.g. mydata.txt) in your zipped file.

3. Please make sure you do NOT zip any folder together with the plain txt file (e.g. /myfolder/mydata.txt), otherwise you will get an error: [Errno 2] No such file or directory

4. Please do NOT zip multiple files (e.g. zip mydata.zip file1.txt file2.txt ..) or zip a file with in a folder (e.g. zip mydata.zip /path/to/my/file/mydata.txt).

5. Please keep the file name of your plain txt file short (less than 50 characters), otherwise you may get an error: [Errno 2] No such file or directory

6. Please zip your plain txt file using following command (ONE file at a time):

For Windows system: 1) Locate the file that you want to compress. 2) Right-click the file, point to Send to, and then click Compressed (zipped) folder.

For Linux and Mac OS system: zip mydata.zip mydata.txt

Reminder: for Mac OS system, please do NOT zip you file by right click mouse and click "Compress" to zip your file, this will automatically create a folder called "__MACOS". You will get an error: [Errno 2] No such file or directory.

Upload the trait of interest
To save your upload time, we highly recommend you to use the SNP list we used in LD Hub to reduce the number of SNPs in your uploaded file. Click here to download our SNP list (w_hm3.noMHC.snplist.zip).

Please upload the zipped file you just created. Click here to download an input example.
```
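
The naming rules in these notes can be checked before upload. The following is a sketch, assuming `zip` is installed; `ldhub_zip` is a hypothetical helper name. It relies on `zip -j` (junk paths), which stores only the file name in the archive, so no folder is zipped along with the text file.

```shell
# Hypothetical helper: zip a single .txt file for LD Hub with no folder in the archive.
ldhub_zip() {
    local txt_path="$1"
    local name
    name=$(basename "$txt_path")
    case "$name" in
        *.txt) ;;
        *) echo "expected a plain .txt file: $name" >&2; return 1 ;;
    esac
    # LD Hub asks for file names shorter than 50 characters.
    if [ "${#name}" -ge 50 ]; then
        echo "file name too long (${#name} chars): $name" >&2
        return 1
    fi
    # -j stores only the file name, dropping the directory path.
    zip -j "${txt_path%.txt}.zip" "$txt_path"
}
```

For example, `ldhub_zip /shared/jmarks/hiv/ldsc/ldhub/hiv016_ld_hub.txt` would write `hiv016_ld_hub.zip` next to the input file.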

In [None]:
# Download outputs for each ref chr from rfmt_sumstats step
cd /shared/jmarks/hiv/ldsc/ldhub
aws s3 sync s3://rti-cromwell-output/cromwell-execution/full_ld_regression_wf/ed5747ed-ccbe-4bc9-bb44-1f2d750a27eb/call-munge_ref/MUNGE_REF_WF.munge_sumstats_wf/e6c9491a-ca22-4ca0-8ad6-79d2b13a6dbe/call-munge_chr_wf/ .
    
mv  */MUNGE_CHR.munge_sumstats_chr_wf/*/call-rfmt_sumstats/hiv_acquisition_1df_meta_analysis_uhs1-4_ea+vidus_ea+wihs1_ea.chr*.exclude_singletons.1df.standardized.phase3ID.munge_ready.txt .

# Concat into single file
cat hiv_acquisition_1df_meta_analysis_uhs1-4_ea+vidus_ea+wihs1_ea.chr1.exclude_singletons.1df.standardized.phase3ID.munge_ready.txt >\
    hiv016_ld_hub_with_pvalues.txt
for chr in {2..22}
do
    tail -n +2  hiv_acquisition_1df_meta_analysis_uhs1-4_ea+vidus_ea+wihs1_ea.chr$chr.exclude_singletons.1df.standardized.phase3ID.munge_ready.txt >>\
        hiv016_ld_hub_with_pvalues.txt
done


# Remove unnecessary columns (need snpID, A1, A2, Beta, Pvalue)
cat hiv016_ld_hub_with_pvalues.txt | cut -f 1,4,5,6,7 > tmp && mv tmp hiv016_ld_hub_with_pvalues.txt

# Add sample size column (016 meta-analysis sample size = 4664)
cat hiv016_ld_hub_with_pvalues.txt | awk -v OFS="\t" -F"\t" '{print $1,$2,$3,$4,"4664.000",$5}' > hiv016_ld_hub.txt

# Use vi to change column names to be:
# snpid A1 A2 BETA N P-value
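
The sample-size and header steps can also be done in a single pass, which avoids the manual edit. This is a sketch only, assuming a tab-delimited, five-column input with a header row in the order produced by the `cut` step (SNP ID, A1, A2, beta, p-value); `add_n_and_header` is a hypothetical name.

```shell
# Hypothetical helper: insert a constant N column and write the LD Hub header in one pass.
# Usage: add_n_and_header <sample_size> <input.tsv> > <output.tsv>
add_n_and_header() {
    local n="$1" infile="$2"
    awk -v n="$n" -F"\t" -v OFS="\t" '
        # First line: replace the original header with the LD Hub column names.
        NR == 1 { print "snpid", "A1", "A2", "BETA", "N", "P-value"; next }
        # Data rows: place N between the effect size and the p-value.
        { print $1, $2, $3, $4, n, $5 }
    ' "$infile"
}
```

For the 016 results this would be run as `add_n_and_header 4664 hiv016_ld_hub_with_pvalues.txt > hiv016_ld_hub.txt`.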




In [None]:
# enter interactive mode
docker run -it -v"/shared/jmarks/hiv/ldsc/final/:/data/" \
    rticode/plot_ld_regression_results:1ddbd682cb1e44dab6d11ee571add34bd1d06e21 /bin/bash
    
Rscript /opt/plot_ld_regression/plot_ld_regression_results.R  \
    --input_file 20170729_hiv_aqcuisition_meta016_ldsc_copd_lung_function_results_table.csv \
    --output_file 20170729_hiv_aqcuisition_meta016_ldsc_copd_lung_function_results_plot.pdf  \
    --comma_delimited
    #--group_order_file 20170729_hiv_aqcuisition_meta016_ldsc_copd_lung_function_plot_order.txt

In [None]:
Rscript /opt/plot_ld_regression/plot_ld_regression_results.R \
    --input_file ftnd_revised_plot_table_7-29-19.csv \
    --output_file ftnd_ld_regression_results_7-29-19.pdf \
    --comma_delimited