# FUMA tool
**Author**: Jesse Marks

The results processed here were saved to:
`\\rcdcollaboration01.rti.ns\johnson_share\1.HIV GWAS II\technical\McLarenGWAS\`

File prep for the FUMA tool (http://fuma.ctglab.nl/) using the HIV meta-analysis results for viral load. FUMA needs a single compressed file of genome-wide results that is smaller than 600 MB. We prepare a file that concatenates all of the chromosome-specific files into one and includes only the following meta-analysis output columns.

 
```
Chr
Position
MarkerName
P-value
Allele1
Allele2
Effect
StdErr
```

Example files:

* `ihac2.chr17.eur.meta`
* `ihac2.chr16.eur.meta`

Results are at:
`s3://rti-uploads/paul.mclaren/icgh_VL_results.tgz`

## Processing Viral Load
`cat vl.README`
```
#### Written by Paul McLaren (paul.mclaren@canada.ca)
#### When using these results and for more information please cite PMID: 26553974

# The tar file  contains 22 separate results files separated by chromosome
# Note: Although labeled OR, effect sizes are betas from linear regression NOT odds ratios
# Betas indicate change in spVL (HIV RNA copies/ml of plasma) per allele copy

# Each file contains the following columns:
# CHR  - Chromosome
# BP   - Position in base pairs (Hg19)
# SNP  - SNP ID if available
# A1   - Allele 1 (reference allele for effect)
# A2   - Allele 2
# N    - Number cohorts constributing genotype information
# P    - Pvalue (fixed effects meta analysis)
# P(R) - Pvalue (random effects meta analysis)
# OR   - Effect size (beta) of A1 (fixed effects meta analysis)
# OR(R)- Effect size (beta) of A1 (random effects meta analysis)
# Q    - Q^2 test for heterogeneity across cohorts
# I    - I^2 test for heterogeneity across cohorts
```

In [None]:
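## EC2
aws s3 cp s3://rti-uploads/paul.mclaren/icgh_VL_results.tgz .
tar -xvzf icgh_VL_results.tgz

# Write the header once, taken from the first line of chr1.
# Columns kept: CHR BP SNP P A1 A2 OR (OR holds betas; the README lists no standard error).
awk '{print $1,$2,$3,$7,$4,$5,$9; exit}' \
    ihac2.chr1.eur.meta > \
    ihac2.eur.meta.ALL_CHR.FUMA

# Append the data rows (header skipped) of each chromosome in order.
for chr in {1..22}; do
    echo "processing chr$chr"
    awk 'NR>=2 {print $1,$2,$3,$7,$4,$5,$9}' \
        "ihac2.chr$chr.eur.meta" >> ihac2.eur.meta.ALL_CHR.FUMA
done

# Compress once the loop has finished; backgrounding the loop with `&`
# would let gzip start on a partially written file.
gzip ihac2.eur.meta.ALL_CHR.FUMA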
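
Before uploading, it can help to confirm that every row has the expected number of columns and that the compressed file is under the size limit. The following is a minimal sketch on toy data, not the check that was actually run; the file name `toy.ALL_CHR.FUMA.gz`, its rows, and the reading of 600 MB as 600 * 1024 * 1024 bytes are assumptions for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Toy stand-in for the real gzipped output (hypothetical name and rows).
workdir=$(mktemp -d)
fuma_file="$workdir/toy.ALL_CHR.FUMA.gz"
printf '%s\n' \
    'CHR BP SNP P A1 A2 OR' \
    '1 1000 rs1 0.5 A G 0.01' \
    '2 2000 rs2 0.1 T C -0.02' | gzip > "$fuma_file"

# Count rows whose field count differs from the 7 expected columns.
bad_rows=$(gzip -dc "$fuma_file" | awk 'NF != 7 {n++} END {print n+0}')

# Compressed size in bytes versus the upload limit
# (assumption: 600 MB = 600 * 1024 * 1024 bytes).
size_bytes=$(wc -c < "$fuma_file" | tr -d ' ')
limit_bytes=$((600 * 1024 * 1024))

echo "rows with unexpected field count: $bad_rows"
if [ "$size_bytes" -lt "$limit_bytes" ]; then
    echo "size OK: $size_bytes bytes"
else
    echo "too large: $size_bytes bytes"
fi
```

Pointing `fuma_file` at the real `.gz` output (and dropping the toy `printf`) would apply the same two checks to the actual upload.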

## Processing Acquisition
`cat acquisition.README`
```
#### Written by Paul McLaren (paul.mclaren@canada.ca)
#### When using these results and for more information please cite PMID: 23935489

# The tar file icgh_aquisition_results.tar contains 925 separate
# gzipped results files separated by chromosome and position
# Naming convention: dan_chr$N_$Start_pos[Mb]_$Stop_pos[Mb].assoc.dosage.meta.ngt.metadaner.gz
# Note: these files are unfiltered and contain many low frequency and low imputation quality variants

# Each file contains the following columns:
# CHR  - Chromosome
# SNP  - SNP ID if available
# BP   - Position in base pairs (Hg18)
# A1   - Allele 1 (reference allele for frequency and odds ratio)
# A2   - Allele 2
# FRQ_A_6334 - A1 Frequency in 6,334 HIV infected individuals
# FRQ_U_7247 - A1 Frequency in 7,247 population controls
# INFO - Average imputation quality score across included samples
# OR   - Odds ratio (A1)
# SE   - Stadard error of the odds ratio
# P    - Pvalue (fixed effects meta analysis)
# ngt  - Number of cohorts constributing genotype information
# Direction - Text marker indivating the direction of effect per contributing cohort (+ = OR > 1 ; - = OR < 1)
# HetISqt   - I^2 test for heterogeneity across cohorts
# HetChiSq  - Chi^2 test for heterogeneity across cohorts
# HetDf     - Degrees of freedom for heterogeneity test
# HetPVa    - P value for test of heterogeneity across cohorts
```

**Note**: The FUMA file submission requires the number of subjects `N`. According to this README file, there are 6,334 cases and 7,247 controls, for a total of 13,581.
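
The total can also be read off the frequency column names rather than typed by hand. This is a small sketch that assumes the real file header uses the column names exactly as listed in the README; the `header` string below is copied from that list, not from the data.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Header columns as named in the README (assumed to match the real file header).
header='CHR SNP BP A1 A2 FRQ_A_6334 FRQ_U_7247 INFO OR SE P ngt'

# Sum the sample sizes embedded in the FRQ_A_<cases> and FRQ_U_<controls> names.
total_n=$(echo "$header" | awk '{
    for (i = 1; i <= NF; i++)
        if ($i ~ /^FRQ_[AU]_[0-9]+$/) {
            split($i, parts, "_")
            n += parts[3]
        }
} END { print n }')

echo "N = $total_n"   # 6334 + 7247 = 13581
```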

In [None]:
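cd /shared/data/sandbox
aws s3 cp s3://rti-uploads/paul.mclaren/icgh_aquisition_results.tar . --quiet

# Unpack the archive and decompress the per-region files.
# Assumption: the archive extracts into the current directory.
tar -xf icgh_aquisition_results.tar
gunzip dan_chr*.assoc.dosage.meta.ngt.metadaner.gz

# Write the header once. Columns kept: CHR BP SNP P A1 A2 OR SE.
awk '{print $1,$3,$2,$11,$4,$5,$9,$10; exit}' \
    dan_chr9_141_144.assoc.dosage.meta.ngt.metadaner > \
    dan.assoc.dosage.meta.ngt.metadaner.ALL_CHR.FUMA

# FNR restarts at 1 for each file, so the header of every file matched by
# the glob is skipped (NR would skip only the first file's header).
for chr in {1..22}; do
    awk 'FNR>=2 {print $1,$3,$2,$11,$4,$5,$9,$10}' dan_chr${chr}_*.assoc.dosage.meta.ngt.metadaner >> dan.assoc.dosage.meta.ngt.metadaner.ALL_CHR.FUMA
done

gzip dan.assoc.dosage.meta.ngt.metadaner.ALL_CHR.FUMA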
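
Because the glob in the loop hands several region files to a single `awk` call, the difference between `NR` and `FNR` matters when skipping headers. The sketch below illustrates it on two toy files; the `.toy` file names and their rows are made up for the example.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Two toy per-region files, each with its own header (hypothetical contents).
workdir=$(mktemp -d)
printf '%s\n' 'CHR SNP BP' '1 rs1 100' > "$workdir/dan_chr1_000_003.toy"
printf '%s\n' 'CHR SNP BP' '1 rs2 200' > "$workdir/dan_chr1_003_006.toy"

# NR counts records across all input files, so only the first header is skipped.
nr_lines=$(awk 'NR >= 2' "$workdir"/dan_chr1_*.toy | wc -l | tr -d ' ')

# FNR restarts at 1 for each file, so every header is skipped.
fnr_lines=$(awk 'FNR >= 2' "$workdir"/dan_chr1_*.toy | wc -l | tr -d ' ')

# Header rows that leaked into the NR-based output.
stray_headers=$(awk 'NR >= 2' "$workdir"/dan_chr1_*.toy | awk '$1 == "CHR"' | wc -l | tr -d ' ')

echo "NR>=2 keeps $nr_lines lines, FNR>=2 keeps $fnr_lines lines"
echo "stray header rows with NR: $stray_headers"
```

A repeated header row in the middle of a concatenated file is one plausible way to end up with a malformed input, so counting rows whose first field is `CHR` is a cheap check to run on the combined file.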

### FUMA error
The FUMA file upload fails for two reasons. First, the file is too large (>600 MB), so we remove the standard error column. Second, ERROR:001 states that the input file format is not correct. To narrow down the cause, we submit one chromosome at a time.
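
The command used to drop the standard error column is not recorded here. One way to produce a `_no_SE` file is sketched below on toy data; the toy file names and rows are assumptions, and the extra check simply counts data rows whose first field does not look like a chromosome number.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Toy stand-in for the 8-column acquisition file (hypothetical rows).
workdir=$(mktemp -d)
in_file="$workdir/toy.ALL_CHR.FUMA.gz"
out_file="$workdir/toy.ALL_CHR.FUMA_no_SE.gz"
printf '%s\n' \
    'CHR BP SNP P A1 A2 OR SE' \
    '1 100 rsA 0.5 A G 1.1 0.2' \
    '2 300 rsC 0.7 A C 1.0 0.3' | gzip > "$in_file"

# Keep the first 7 columns (CHR BP SNP P A1 A2 OR), dropping SE in column 8.
gzip -dc "$in_file" | awk '{print $1,$2,$3,$4,$5,$6,$7}' | gzip > "$out_file"

# Count data rows whose first field is not a plain integer (e.g. a repeated header).
non_numeric=$(gzip -dc "$out_file" | awk 'NR > 1 && $1 !~ /^[0-9]+$/ {n++} END {print n+0}')

# Number of columns in the header of the new file.
n_cols=$(gzip -dc "$out_file" | awk 'NR == 1 {print NF}')

echo "columns: $n_cols, suspicious rows: $non_numeric"
```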

#### chr1 test
Note that this is with the standard error column removed.

`zcat dan.assoc.dosage.meta.ngt.metadaner.ALL_CHR.FUMA_no_SE.gz | head`

```
CHR BP SNP P A1 A2 OR
1 2712128 rs183704632 0.7174 A G 1.36
1 1767948 rs192615700 0.4913 A G 1.501
1 2866021 rs116066119 0.4732 T G 0.8986
1 1801450 rs189263802 0.1137 A C 6.509e+17
1 1417994 rs142344235 0.8742 A G 1.163
1 1122319 rs7415847 0.07094 T C 0.9203
1 1986224 rs28522768 0.7055 T G 0.8675
1 2715720 rs115586403 0.1593 T C 0.8707
1 1499749 rs146184020 0.7471 A G 1.24
```

In [None]:
## EC2 command line ##
zcat dan.assoc.dosage.meta.ngt.metadaner.ALL_CHR.FUMA_no_SE.gz | head -n1 > chr1.test.a

awk ' { if ( $1 == 1 ) { print $0 } if ( $1 == 2 ) { exit } }' \
    <(zcat dan.assoc.dosage.meta.ngt.metadaner.ALL_CHR.FUMA_no_SE.gz) >> chr1.test.a

We submitted the job on July 18, 2018 at ~noon under the title `mclaren.hiv.acquisition.test.chr1`.

* Got results back at ~12:45 pm. ERROR:005 occurred, which is a benign error stating that no significant SNPs were identified.

#### chr2 test

In [None]:
## EC2 command line ##
zcat dan.assoc.dosage.meta.ngt.metadaner.ALL_CHR.FUMA_no_SE.gz | head -n1 > chr2.test.a

awk ' { if ( $1 == 2 ) { print $0 } if ( $1 == 3 ) { exit } }' \
    <(zcat dan.assoc.dosage.meta.ngt.metadaner.ALL_CHR.FUMA_no_SE.gz) >> chr2.test.a &

We submitted the job on July 18, 2018 at ~12:30 pm under the title `mclaren.hiv.acquisition.test.chr2`.
* Got results back at ~1 pm. ERROR:005 occurred, which is a benign error stating that no significant SNPs were identified.

#### chr3 test

In [None]:
## EC2 command line ##
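mkdir -p test.folder
# Header and data rows go to the same file under test.folder.
zcat dan.assoc.dosage.meta.ngt.metadaner.ALL_CHR.FUMA_no_SE.gz | head -n1 > test.folder/chr3.test.a

# Run in the foreground so gzip only starts on a complete file.
awk ' { if ( $1 == 3 ) { print $0 } if ( $1 == 4 ) { exit } }' \
    <(zcat dan.assoc.dosage.meta.ngt.metadaner.ALL_CHR.FUMA_no_SE.gz) >> test.folder/chr3.test.a
gzip test.folder/chr3.test.a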

We submitted the job on July 18, 2018 at ~12:45 pm under the title `mclaren.hiv.acquisition.test.chr3`.
* Got results back at ~1 pm. ERROR:005 occurred, which is a benign error stating that no significant SNPs were identified.

#### chr4 test

In [None]:
## EC2 command line ##
zcat dan.assoc.dosage.meta.ngt.metadaner.ALL_CHR.FUMA_no_SE.gz | head -n1 > test.folder/chr4.test.a

awk ' { if ( $1 == 4 ) { print $0 } if ( $1 == 5 ) { exit } }' \
    <(zcat dan.assoc.dosage.meta.ngt.metadaner.ALL_CHR.FUMA_no_SE.gz) >> test.folder/chr4.test.a
gzip test.folder/chr4.test.a

We submitted the job on July 18, 2018 at ~12:45 pm under the title `mclaren.hiv.acquisition.test.chr4`.
* Got results back at ~1 pm. ERROR:005 occurred, which is a benign error stating that no significant SNPs were identified.

#### chr5 test

In [None]:
## EC2 command line ##
zcat dan.assoc.dosage.meta.ngt.metadaner.ALL_CHR.FUMA_no_SE.gz | head -n1 > test.folder/chr5.test.a

awk ' { if ( $1 == 5 ) { print $0 } if ( $1 == 6 ) { exit } }' \
    <(zcat dan.assoc.dosage.meta.ngt.metadaner.ALL_CHR.FUMA_no_SE.gz) >> test.folder/chr5.test.a
gzip test.folder/chr5.test.a

We submitted the job on July 18, 2018 at ~12:45 pm under the title `mclaren.hiv.acquisition.test.chr5`.
* Got results back at ~1 pm. ERROR:005 occurred, which is a benign error stating that no significant SNPs were identified.

#### chr6-22 test
Generate the individual chromosome results in parallel.

In [None]:
## EC2 command line
for chr in {6..22};do
    zcat dan.assoc.dosage.meta.ngt.metadaner.ALL_CHR.FUMA_no_SE.gz | head -n1 > test.folder/chr$chr.test.a

    awk -v chr="$chr" ' { if ( $1 == chr ) { print $0 } if ( $1 == chr+1 ) { exit } }' \
        <(zcat dan.assoc.dosage.meta.ngt.metadaner.ALL_CHR.FUMA_no_SE.gz) >> test.folder/chr$chr.test.a &
done
wait  # let every background awk finish before compressing

gzip test.folder/*a &

## share drive ##
scp -i ~/.ssh/gwas_rsa  ec2-user@35.168.108.18:~/test.folder/chr{6..22}* . &
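
The per-chromosome loop decompresses the full file once per chromosome. As an alternative, the split can be done in a single pass by letting `awk` route each row to its chromosome's file. This is a sketch on toy data, not the procedure that was run; the toy input name and rows are assumptions, while the `chrN.test.a` naming follows the cells above.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Toy stand-in for the combined file (hypothetical rows).
workdir=$(mktemp -d)
in_file="$workdir/toy.ALL_CHR.FUMA_no_SE.gz"
outdir="$workdir/test.folder"
mkdir -p "$outdir"
printf '%s\n' \
    'CHR BP SNP P A1 A2 OR' \
    '1 100 rsA 0.5 A G 1.1' \
    '1 200 rsB 0.2 T C 0.9' \
    '2 300 rsC 0.7 A C 1.0' | gzip > "$in_file"

# One pass: remember the header, then route each row to its chromosome's file,
# writing the header the first time a chromosome is seen.
gzip -dc "$in_file" | awk -v outdir="$outdir" '
    NR == 1 { header = $0; next }
    {
        out = outdir "/chr" $1 ".test.a"
        if (!(out in seen)) { print header > out; seen[out] = 1 }
        print > out
    }'

gzip "$outdir"/chr*.test.a

chr1_lines=$(gzip -dc "$outdir/chr1.test.a.gz" | wc -l | tr -d ' ')
chr2_lines=$(gzip -dc "$outdir/chr2.test.a.gz" | wc -l | tr -d ' ')
echo "chr1: $chr1_lines lines, chr2: $chr2_lines lines"
```

Within one `awk` run, `print > out` truncates the file only when it is first opened and appends afterwards, so the header and rows end up in the same file.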

* chr6: FUMA job completed
* chr7: ERROR 5
* chr8: ERROR 5
* chr9:
* chr10: 
* chr11:
* chr12:
* chr13:
* chr14:
* chr15:
* chr16:
* chr17:
* chr18:
* chr19:
* chr20:
* chr21:
* chr22:
