# Gene set analysis using MAGMA and PoPS

MAGMA scores were computed and processed at CCC, using bash. Starting point: NodoBIO/pryectos/genpsych/Resultados_Ines

### Workflow:
- MAGMA scores: Gene annotation of all SNPs present in the BD GWAS summary statistics, followed by gene-based analysis
- MAGMA analysis: Gene set analysis using GO, HPO and hallmark categories, and gene property analysis using WGCNA modules
- PoPS: Polygenic Priority Score based on the MAGMA gene-based output and a feature matrix.

## Generate MAGMA scores
This step was run on the CCC server, so the file paths do not correspond to this repository.

In [None]:
## File preparation
#Filter the BD GWAS summary statistics to keep only markers with an official rsID ('rs' prefix), single-nucleotide A/C/G/T alleles and imputation INFO > 0.6
!awk '$5 ~ /^[ACGT]$/ && $6 ~ /^[ACGT]$/ && $3 ~ /^rs/ && $8 > 0.6 {print $0}' PGC/bd_main > PGC/bd_main_filtered
#Loc file of target SNPs (all those kept from the BD GWAS); MAGMA expects the columns SNP ID, chromosome, base-pair position
!awk '{print $2 "\t" $1 "\t" $3}' PGC/bd_main_filtered > target_snps.loc
#Prepare the gene location file with an ENSID column, derived from the gene annotation file of PoPS (Weeks et al. (2020))
#To add the missing DGE and chrX genes, the genes to add were exported as a table from BioMart (Ensembl hg19) and pasted into the gene_annot_jun10.txt file
!awk '{print $1 "\t" $3 "\t" $4 "\t" $5}' gene_annot_jun10_added.txt > magma_ensid.txt
!sed '1d' magma_ensid.txt > magma_ensid.gene.loc #drop the header line; it was necessary to add genes on chrX

## Gene annotation (SNPs are mapped to genes with a 5 kb window on each side)
!module load plink/1.07
!module load magma/1.09

!magma --annotate window=5 --snp-loc pops-master/myfiles/target_snps.loc --gene-loc pops-master/myfiles/magma_ensid.gene.loc --out pops-master/myfiles/bd_genes

## Gene-based analysis
!magma --bfile MAGMA/g1000_eur --gene-annot pops-master/myfiles/bd_genes.genes.annot --pval PGC/bd_main_filtered use='SNP','P' N=413466 --gene-model multi=snp-wise --out pops-master/myfiles/bd-scores
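
The SNP filter at the top of the cell above can also be expressed in Python, which may help when checking individual rows by hand. This is a minimal sketch, not the code that was run: the function names `keep_snp` and `filter_sumstats` are hypothetical, the column positions are copied from the awk command (column 3 = SNP ID, columns 5 and 6 = alleles, column 8 = INFO), and the example rows are toy data.

```python
import re

# Single-nucleotide alleles only, as in the awk pattern /^[ACGT]$/.
SINGLE_BASE = re.compile(r"[ACGT]")


def keep_snp(line, min_info=0.6):
    """Return True if a whitespace-delimited row passes the filter.

    Column positions are an assumption taken from the awk command:
    field 3 = SNP ID, fields 5 and 6 = alleles, field 8 = INFO.
    """
    fields = line.split()
    if len(fields) < 8:
        return False
    snp_id, allele_1, allele_2, info = fields[2], fields[4], fields[5], fields[7]
    try:
        info_value = float(info)
    except ValueError:
        # Rows with a non-numeric INFO value (such as a header) are dropped here.
        return False
    return (
        snp_id.startswith("rs")
        and SINGLE_BASE.fullmatch(allele_1) is not None
        and SINGLE_BASE.fullmatch(allele_2) is not None
        and info_value > min_info
    )


def filter_sumstats(lines, min_info=0.6):
    """Keep only the rows that pass keep_snp, preserving their order."""
    return [line for line in lines if keep_snp(line, min_info)]


# Toy rows (not real data): only the first one satisfies every condition.
toy_rows = [
    "1 1000 rs111 x A G x 0.95",
    "1 2000 rs222 x AT G x 0.99",   # multi-base allele
    "1 3000 1:3000 x A G x 0.99",   # no rsID
    "1 4000 rs444 x C T x 0.50",    # INFO too low
]
print(len(filter_sumstats(toy_rows)))  # prints 1
```

Keeping the per-row decision in its own function makes it easy to probe edge cases such as a header line or a missing INFO value before filtering a full file.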


## MAGMA analysis
### Gene set analysis for enriched categories

The objective of this step is to validate the enriched categories obtained from both the GSEA results and the ORA results (the latter from the significant WGCNA modules).

In [None]:
!module load plink/1.07
!module load magma/1.09

!magma --gene-results pops-master/myfiles/bd-scores.genes.raw --set-annot MAGMA/validate_sets col=2,1 --out MAGMA/validate_sets_results
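
The `col=2,1` modifier above tells MAGMA to read the gene ID from column 2 and the set name from column 1, so the set file holds one gene-to-set pair per row. A minimal sketch of how such a file could be written is shown below; `write_set_annot` is a hypothetical helper, the set names and gene IDs are placeholders, and replacing spaces in set names is a precaution that assumes the file is parsed as whitespace-delimited.

```python
import csv
import io


def write_set_annot(gene_sets, handle):
    """Write one 'set_name<TAB>gene_id' row per member of each gene set.

    gene_sets maps a set name to an iterable of gene IDs. Spaces in set
    names are replaced with underscores so each row keeps two columns.
    """
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    for set_name in sorted(gene_sets):
        clean_name = set_name.replace(" ", "_")
        for gene_id in sorted(gene_sets[set_name]):
            writer.writerow([clean_name, gene_id])


# Placeholder sets and gene IDs, for illustration only.
example_sets = {
    "HALLMARK_X": ["gene_2", "gene_1"],
    "GO term": ["gene_3"],
}
buffer = io.StringIO()
write_set_annot(example_sets, buffer)
print(buffer.getvalue(), end="")
```

To write to disk instead of a buffer, pass a file opened with `open(path, "w", newline="")` as `handle`.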

### Gene property analysis for WGCNA modules

The purpose of this step is to test whether the gene co-expression modules are related to the GWAS summary statistics, that is, whether the significantly correlated modules also show a significant p-value against the genome-wide association signal.

In [None]:
!magma --gene-results pops-master/myfiles/bd-scores.genes.raw --gene-covar pops-master/myfiles/wgcna_features/raw/MM.tsv --out MAGMA/validate_modules
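
Once the run finishes, the per-module results can be screened for significance. The sketch below is one possible way to do that and is not part of the original analysis: it assumes the results are written as a whitespace-delimited table preceded by `#` comment lines, with `VARIABLE` and `P` columns, and it applies a Bonferroni threshold as an illustrative choice. The function names and the toy module rows are hypothetical.

```python
def read_results_table(lines):
    """Parse a whitespace-delimited table, skipping blank and '#' lines.

    The first remaining line is treated as the header (assumed layout).
    """
    rows = [line.split() for line in lines
            if line.strip() and not line.startswith("#")]
    if not rows:
        return []
    header, records = rows[0], rows[1:]
    return [dict(zip(header, record)) for record in records]


def bonferroni_hits(results, alpha=0.05):
    """Return the VARIABLE names whose P is below alpha / number of tests."""
    if not results:
        return []
    threshold = alpha / len(results)
    return [row["VARIABLE"] for row in results if float(row["P"]) < threshold]


# Toy output (not real results): two modules, threshold = 0.05 / 2 = 0.025.
toy_output = [
    "# TOTAL_GENES = 3",
    "VARIABLE TYPE NGENES BETA BETA_STD SE P",
    "blue COVAR 100 0.2 0.05 0.04 0.001",
    "brown COVAR 100 0.01 0.002 0.04 0.40",
]
print(bonferroni_hits(read_results_table(toy_output)))  # prints ['blue']
```

For a real run, the lines would come from the results file produced next to the `--out` prefix, for example via `open(path).read().splitlines()`, after checking that its header matches the assumed column names.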