# Transcriptome-Wide Association Study

The transcriptome-wide association study (TWAS) step is included as a continuation of the SuSiE-TWAS workflow. Its output is used to select variants for the causal TWAS (cTWAS) analysis.

Input:

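`--gwas_meta_data`: a file listing the GWAS summary statistics, with one row per study and chromosome and the columns `study_id`, `chrom`, `file_path` (the summary statistics file for that chromosome), and `column_mapping_file` (here a `.yml` file describing the columns of the summary statistics file). For example: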

```
study_id        chrom   file_path       column_mapping_file
Bellenguez_2022 1       $PATH/RSS_QC_RAISS_imputed.AD_Bellenguez_2022_April9_chr1.tsv.gz        $PATH/Bellenguez.yml
Bellenguez_2022 2       $PATH/RSS_QC_RAISS_imputed.AD_Bellenguez_2022_April9_chr2.tsv.gz        $PATH/Bellenguez.yml
Bellenguez_2022 3       $PATH/RSS_QC_RAISS_imputed.AD_Bellenguez_2022_April9_chr3.tsv.gz        $PATH/Bellenguez.yml
Bellenguez_2022 4       $PATH/RSS_QC_RAISS_imputed.AD_Bellenguez_2022_April9_chr4.tsv.gz        $PATH/Bellenguez.yml
```
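A table like this can be generated for all autosomes with a short shell loop. This is a minimal sketch: `GWAS_DIR` and the output name `gwas_meta.tsv` are hypothetical placeholders, the output is assumed to be tab-separated, and every chromosome is assumed to follow the file naming pattern shown above and to share one column mapping file.

```bash
#!/usr/bin/env bash
# Minimal sketch: write a --gwas_meta_data table for one study, chromosomes 1-22.
# GWAS_DIR and OUT are hypothetical placeholders; the file name pattern is taken
# from the example above and is assumed to hold for every chromosome.
set -euo pipefail

STUDY="Bellenguez_2022"
GWAS_DIR="/path/to/gwas"   # hypothetical directory holding the summary statistics
OUT="gwas_meta.tsv"

# Tab-separated header (tab separation is an assumption)
printf 'study_id\tchrom\tfile_path\tcolumn_mapping_file\n' > "$OUT"

# One row per chromosome, all sharing the same column mapping file
for chr in $(seq 1 22); do
  printf '%s\t%s\t%s\t%s\n' \
    "$STUDY" "$chr" \
    "${GWAS_DIR}/RSS_QC_RAISS_imputed.AD_Bellenguez_2022_April9_chr${chr}.tsv.gz" \
    "${GWAS_DIR}/Bellenguez.yml" >> "$OUT"
done
```

The resulting file can then be passed as `--gwas_meta_data gwas_meta.tsv`.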



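`--ld_meta_data`: a file describing the linkage disequilibrium (LD) reference, with one row per LD block and the columns `#chrom`, `start`, `end`, and `path`. Each `path` entry gives the LD matrix file and its `.bim` variant file, separated by a comma. For example: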

```
#chrom  start   end     path
chr1    101384274       104443097       chr1/chr1_101384274_104443097.cor.xz,chr1/chr1_101384274_104443097.cor.xz.bim
chr1    104443097       106225286       chr1/chr1_104443097_106225286.cor.xz,chr1/chr1_104443097_106225286.cor.xz.bim
chr1    106225286       109761915       chr1/chr1_106225286_109761915.cor.xz,chr1/chr1_106225286_109761915.cor.xz.bim
```
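Because each `path` entry bundles two files, a quick consistency check can catch rows that are missing the `.bim` file before a run. The sketch below assumes whitespace-separated columns and that every row should name exactly one LD matrix followed by one `.bim` file, as in the example; `check_ld_meta` and `ld_meta.tsv` are hypothetical names, not part of the pipeline.

```bash
#!/usr/bin/env bash
# Sketch: validate the path column of an LD meta file (hypothetical helper).
# Assumes whitespace-separated columns and "<LD matrix>,<LD matrix>.bim" in column 4.
set -euo pipefail

check_ld_meta() {
  awk '
    /^#/ { next }                          # skip the "#chrom start end path" header
    {
      n = split($4, f, ",")
      if (n != 2 || f[2] !~ /\.bim$/) {    # expect exactly two files, the second a .bim
        print "line " NR ": unexpected path entry: " $4 > "/dev/stderr"
        bad = 1
      } else {
        print $1 ":" $2 "-" $3 "\t" f[1] "\t" f[2]
      }
    }
    END { exit bad }                       # non-zero exit if any row was malformed
  ' "$1"
}

# Example rows copied from the table above
cat > ld_meta.tsv <<'EOF'
#chrom  start   end     path
chr1    101384274       104443097       chr1/chr1_101384274_104443097.cor.xz,chr1/chr1_101384274_104443097.cor.xz.bim
chr1    104443097       106225286       chr1/chr1_104443097_106225286.cor.xz,chr1/chr1_104443097_106225286.cor.xz.bim
chr1    106225286       109761915       chr1/chr1_106225286_109761915.cor.xz,chr1/chr1_106225286_109761915.cor.xz.bim
EOF

check_ld_meta ld_meta.tsv
```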

`--regions`: a BED-style file listing the LD blocks, with the columns `chr`, `start`, and `stop`. For example:

```
chr      start   stop
chr1     16103   2888443
chr1     2888443         4320284
chr1     4320284         5853833
```
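The output files and the cTWAS `--region-name` option appear to refer to regions in the form `chr_start_stop` (for example `chr1_205972031_208461272`), so it can help to find which block holds a variant of interest. A minimal sketch, assuming that naming form and half-open intervals `[start, stop)`, so that a position on a shared boundary belongs to the block that starts there; `block_for` and `regions.bed` are hypothetical names.

```bash
#!/usr/bin/env bash
# Sketch: report the LD block containing a given position (hypothetical helper).
# Assumes a header line, whitespace-separated chr/start/stop columns, and
# half-open intervals [start, stop) so adjacent blocks do not overlap.
set -euo pipefail

block_for() {  # usage: block_for <chr> <position> <regions file>
  awk -v chr="$1" -v pos="$2" '
    BEGIN { pos += 0 }                                  # force a numeric comparison
    NR > 1 && $1 == chr && $2 <= pos && pos < $3 { print $1 "_" $2 "_" $3 }
  ' "$3"
}

# Example rows copied from the table above
cat > regions.bed <<'EOF'
chr      start   stop
chr1     16103   2888443
chr1     2888443         4320284
chr1     4320284         5853833
EOF

block_for chr1 3000000 regions.bed   # prints chr1_2888443_4320284
block_for chr1 4320284 regions.bed   # boundary position: prints chr1_4320284_5853833
```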

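`--xqtl_meta_data`: a file describing the TWAS weight files, with one row per molecular trait (here a gene, identified in `region_id`). The `original_data` column lists one or more comma-separated paths to `.rds` weight files. For example: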

```
"#chr" "region_id" "TSS" "start" "end" "contexts" "original_data"
"chr11" "ENSG00000073921" 86069881 84957175 87360000 NA "$PATH/multi_context_ROSMAP.chr11_ENSG00000073921.multivariate_twas_weights.rds,$PATH/multi_context_MiGA.chr11_ENSG00000073921.multivariate_twas_weights.rds"
```
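Since `original_data` packs several weight files into one field, listing them one per line makes it easier to confirm that every file is present before a run. The sketch below assumes the quoted, whitespace-separated layout shown above and paths without spaces; `weight_files` and `xqtl_meta.tsv` are hypothetical names.

```bash
#!/usr/bin/env bash
# Sketch: print one "<region_id> <weight file>" pair per line (hypothetical helper).
# Assumes quoted, whitespace-separated fields as in the example and no spaces in paths.
set -euo pipefail

weight_files() {
  awk 'NR > 1 {
    gsub(/"/, "")                  # strip quotes; modifying $0 re-splits the fields
    n = split($7, files, ",")      # column 7 = original_data
    for (i = 1; i <= n; i++) print $2, files[i]
  }' "$1"
}

# Example row copied from the table above ($PATH stays literal in the quoted heredoc)
cat > xqtl_meta.tsv <<'EOF'
"#chr" "region_id" "TSS" "start" "end" "contexts" "original_data"
"chr11" "ENSG00000073921" 86069881 84957175 87360000 NA "$PATH/multi_context_ROSMAP.chr11_ENSG00000073921.multivariate_twas_weights.rds,$PATH/multi_context_MiGA.chr11_ENSG00000073921.multivariate_twas_weights.rds"
EOF

weight_files xqtl_meta.tsv
```

With real paths, appending `| while read -r id f; do [ -e "$f" ] || echo "missing: $f"; done` reports any weight file that is not on disk.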

`--xqtl_type_table`: a file with `type` and `context` columns that assigns each xQTL context to a data type (e.g. eQTL). For example:

```
type    context
eQTL    Ast_mega_eQTL
eQTL    Mic_mega_eQTL
eQTL    Oli_mega_eQTL
```

`--mr_pval_cutoff`: p-value cutoff for the Mendelian randomization (MR) analysis.

`--rsq_cutoff`: r-squared cutoff.

`--rsq_pval_cutoff`: p-value cutoff for r-squared.

## Overview

1. Run TWAS
2. Run cTWAS

## Steps

### i. Run TWAS

```
sos run pipeline/twas_ctwas.ipynb twas \
   --cwd output/twas --name test \
   --gwas_meta_data data/twas/gwas_meta_test.tsv \
   --ld_meta_data reference_data/ADSP_R4_EUR/ld_meta_file.tsv \
   --regions data/twas/EUR_LD_blocks.bed \
   --xqtl_meta_data data/twas/mwe_twas_pipeline_test_small.tsv \
   --xqtl_type_table data/twas/data_type_table.txt \
   --rsq_pval_cutoff 0.05 --rsq_cutoff 0.01    
```

### ii. Run cTWAS

```
sos run pipeline/twas_ctwas.ipynb ctwas \
   --cwd output/twas --name test \
   --gwas_meta_data data/twas/gwas_meta_test.tsv \
   --ld_meta_data data/ld_meta_file_with_bim.tsv \
   --xqtl_meta_data data/twas/mwe_twas_pipeline_test_small.tsv \
   --twas_weight_cutoff 0 \
   --chrom 11 \
   --regions data/twas/EUR_LD_blocks.bed \
   --region-name chr10_80126158_82231647 chr11_84267999_86714492
```
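`--region-name` accepts several blocks at once. Rather than typing them by hand, the names for one chromosome can be collected from the regions file. This sketch assumes block names are `chr_start_stop`, matching the names in the command above; the heredoc rows are the blocks named in that command plus one row from the `--regions` example, and the command is printed (a dry run) rather than executed.

```bash
#!/usr/bin/env bash
# Sketch: build the --region-name list for one chromosome from a regions file.
# Assumes a header line and chr/start/stop columns; names follow chr_start_stop.
set -euo pipefail

CHROM=11
REGIONS=regions.bed   # in practice, e.g. data/twas/EUR_LD_blocks.bed

# Example rows taken from the examples in this document
cat > "$REGIONS" <<'EOF'
chr      start   stop
chr1     16103   2888443
chr10    80126158        82231647
chr11    84267999        86714492
EOF

# Collect chr_start_stop names for the chosen chromosome (works with bash 3.2)
REGION_NAMES=()
while read -r name; do
  REGION_NAMES+=("$name")
done < <(awk -v c="chr${CHROM}" 'NR > 1 && $1 == c { print $1 "_" $2 "_" $3 }' "$REGIONS")

# Dry run: print the cTWAS command instead of running it
cmd=(sos run pipeline/twas_ctwas.ipynb ctwas
  --cwd output/twas --name test
  --gwas_meta_data data/twas/gwas_meta_test.tsv
  --ld_meta_data data/ld_meta_file_with_bim.tsv
  --xqtl_meta_data data/twas/mwe_twas_pipeline_test_small.tsv
  --twas_weight_cutoff 0
  --chrom "$CHROM"
  --regions "$REGIONS"
  --region-name "${REGION_NAMES[@]}")
printf '%s ' "${cmd[@]}"
echo
```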

## Anticipated Results

### i. Run TWAS

`twas_region.chr1_205972031_208461272.mr_result.tsv.gz`:
* includes:

1. gene_name
2. num_CS
3. num_IV
4. cpip
5. meta_eff
6. se_meta_eff
7. meta_pval
8. Q
9. Q_pval
10. I2
11. context
12. gwas_study

`twas_region.chr1_205972031_208461272.twas_data.rds`:
* includes:

1. weights - weights for each gene and context
2. z_gene - gene z-scores
3. z_snp - SNP z-scores
4. susie_weights_intermediate_qced - posterior inclusion probabilities (PIPs) and credible set variant information from SuSiE
5. snp_info

`twas_region.chr1_205972031_208461272.twas.tsv.gz`:
* includes:

1. chr
2. molecular_id
3. TSS
4. start
5. end
6. context
7. gwas_study
8. method
9. is_imputable
10. is_selected_method
11. rsq_cv
12. pval_cv
13. twas_z
14. twas_pval
15. type
16. block
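
For a quick look at a TWAS table outside R, rows can be filtered by column name. A minimal sketch, assuming a gzip-compressed, tab-separated file with a header row and R-style `TRUE`/`FALSE` values in `is_imputable` and `is_selected_method`; `twas_hits` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch: keep rows that are imputable and use the selected method (hypothetical helper).
# Assumes a gzip-compressed, tab-separated table with a header and TRUE/FALSE flags.
set -euo pipefail

twas_hits() {
  gzip -dc "$1" | awk -F'\t' -v OFS='\t' '
    NR == 1 {
      for (i = 1; i <= NF; i++) col[$i] = i          # map column names to positions
      split("molecular_id context gwas_study method is_imputable is_selected_method", need, " ")
      for (k in need) if (!(need[k] in col)) {       # stop if an expected column is absent
        print "missing column: " need[k] > "/dev/stderr"; exit 1
      }
      print "molecular_id", "context", "gwas_study", "method"
      next
    }
    $col["is_imputable"] == "TRUE" && $col["is_selected_method"] == "TRUE" {
      print $col["molecular_id"], $col["context"], $col["gwas_study"], $col["method"]
    }
  '
}

# Usage:
# twas_hits twas_region.chr1_205972031_208461272.twas.tsv.gz
```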



### ii. Run cTWAS

`ctwas_region_chr1_z_snp_map.rds`:
* includes:

1. z_snp - SNP z-scores for each study
2. snp_map


`ctwas_region_chr1_LD_map.rds`:
* includes:

1. region_id
2. LD_file
3. SNP_file


`ctwas_region_chr1_ctwas_weights.rds`:
* includes cTWAS weights for each gene in each study

`ctwas_region_chr1_[study_name].ctwas_boundary_genes.thin0.1.rds`:
* one of these files is generated for each study. Includes:

1. chrom
2. id
3. p0
4. p1
5. molecular_id
6. weight_name
7. region_start
8. region_stop
9. region_id
10. n_regions

`ctwas_region_chr1_[study_name].ctwas_region_data.thin0.1.rds`:
* one of these files is generated for each study. Includes these values for each region:

1. region_id
2. chrom
3. start
4. stop
5. minpos
6. maxpos
7. thin
8. gid
9. sid
10. z_gene
11. z_snp
12. types
13. contexts
14. groups