# Task 6 Gene-wise statistics (MAGMA)

In gene analysis, genetic marker data is aggregated to the level of whole genes, testing the joint association of all markers in the gene with the phenotype. Similarly, in gene-set analysis individual genes are aggregated to groups of genes sharing certain biological, functional or other characteristics.

This is done using MAGMA [1]. Gene-set analysis in MAGMA is divided into two distinct and largely independent parts. In the first part, a gene analysis quantifies the degree of association each gene has with the phenotype, and the correlations between genes are estimated. These correlations reflect the LD between genes and are needed to compensate for the dependencies between genes during the gene-set analysis. In the second part, the gene p-values and the gene correlation matrix are used to perform the actual gene-set analysis. This task covers the first part, the gene analysis.


[1] de Leeuw CA, Mooij JM, Heskes T, Posthuma D. MAGMA: generalized gene-set analysis of GWAS data. PLoS Computational Biology. Published April 17, 2015. Accessed August 18, 2020. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4401657/

In [2]:
%load_ext rpy2.ipython

The rpy2.ipython extension is already loaded. To reload it, use:
  %reload_ext rpy2.ipython


In [None]:
%env path=/mnt/data/GWAS/output/task6_genewise
%env task3path=/mnt/data/GWAS/output/task3_imputation/imputed_files

**Filter out low frequency SNPs (MAF<0.01) for MAGMA**

In [50]:
%%bash
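# Keep the header plus SNPs with 0.01 < FRQ < 0.99, then convert spaces to tabs (awk must read the input file itself)
awk 'NR==1 || ($6>0.01 && $6<0.99)' $task3path/dataset.b37.imputed.assoc.dosage.clean.rs.200kb.annot | sed 's/ /\t/g' > $path/dataset.b37.imputed.assoc.dosage.clean.rs.200kb.annot.maf.0.01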
wc $path/dataset.b37.imputed.assoc.dosage.clean.rs.200kb.annot.maf.0.01
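
The filtering rule can also be checked on a small in-memory sample. The following is a minimal sketch, assuming whitespace-separated columns with FRQ in the sixth position; the function name `filter_maf` and the last sample row are hypothetical and only serve to show a rejected SNP.

```python
import io

# Two rows taken from the association file shown in this notebook (ANNOT column omitted),
# plus one invented row (rs_hypothetical) with a frequency below the threshold.
SAMPLE = (
    "CHR BP SNP A1 A2 FRQ INFO OR SE P RS\n"
    "10 100000625 10:100000625:A:G A G 0.44 0.9621 0.8031 0.2456 0.3718 rs7899632\n"
    "10 100001867 10:100001867:C:T C T 0.0101 0.894 2.8564 0.9312 0.2597 rs150203744\n"
    "10 100009999 10:100009999:C:T C T 0.004 0.91 1.2 0.5 0.6 rs_hypothetical\n"
)

def filter_maf(lines, low=0.01, high=0.99, frq_col=5):
    """Yield the header and every row whose frequency lies strictly inside (low, high)."""
    for i, line in enumerate(lines):
        fields = line.split()
        if not fields:
            continue  # skip empty lines
        if i == 0 or low < float(fields[frq_col]) < high:
            yield "\t".join(fields)  # tab-separated output, as in the bash cell

kept = list(filter_maf(io.StringIO(SAMPLE)))
print(len(kept) - 1, "SNPs kept")
```

If the filtered file has the same line count as the unfiltered one, that is a sign the filter never saw the data.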


In [53]:
%%bash
wc $path/dataset.b37.imputed.assoc.dosage.clean.rs.200kb.annot.maf.0.01
wc $task3path/dataset.b37.imputed.assoc.dosage.clean.rs.200kb.annot

   7813810   93765720 1061436709 /mnt/Almacen6/Adapted/jupyterAnalisis_others/Genomic_pipeline/outputs/Imputed_files/dataset.b37.imputed.dosage.full.assoc.dosage.clean.rs.200kb.annotated.annot.maf.0.01
   7813810   93765720 1061436709 /mnt/Almacen6/Adapted/jupyterAnalisis_others/Genomic_pipeline/outputs/Imputed_files/dataset.b37.imputed.dosage.full.assoc.dosage.clean.rs.200kb.annotated.annot


In [73]:
%%bash
head $path/dataset.b37.imputed.assoc.dosage.clean.rs.200kb.annot.maf.0.01

CHR	BP	SNP	A1	A2	FRQ	INFO	OR	SE	P	RS	ANNOT
10	100000625	10:100000625:A:G	A	G	0.44	0.9621	0.8031	0.2456	0.3718	rs7899632	HPS1(-175.3kb)|LOXL4(-6.817kb)|MIR1287(-154.3kb)|MIR4685(-190.4kb)|PYROXD2(-142.7kb)|R3HCC1L(0)
10	100000645	10:100000645:A:C	A	C	0.2199	0.8997	0.9008	0.2801	0.709	rs61875309	HPS1(-175.3kb)|LOXL4(-6.797kb)|MIR1287(-154.3kb)|MIR4685(-190.4kb)|PYROXD2(-142.7kb)|R3HCC1L(0)
10	100001867	10:100001867:C:T	C	T	0.0101	0.894	2.8564	0.9312	0.2597	rs150203744	HPS1(-174.1kb)|LOXL4(-5.575kb)|MIR1287(-153.1kb)|MIR4685(-189.2kb)|PYROXD2(-141.5kb)|R3HCC1L(0)
10	100002464	10:100002464:T:C	T	C	0.0121	0.9859	0.1214	1.2119	0.08183	rs111551711	HPS1(-173.5kb)|LOXL4(-4.978kb)|MIR1287(-152.5kb)|MIR4685(-188.6kb)|PYROXD2(-140.9kb)|R3HCC1L(0)
10	100003242	10:100003242:T:G	T	G	0.1317	0.972	1.7964	0.3379	0.08294	rs12258651	HPS1(-172.7kb)|LOXL4(-4.2kb)|MIR1287(-151.7kb)|MIR4685(-187.8kb)|PYROXD2(-140.1kb)|R3HCC1L(0)
10	100003304	10:100003304:A:G	A	G	0.0341	0.8722	3.4588	0.7188	0.08427	rs72828461	

In [74]:
%%bash
# Generate the *.SNP.LOC (SNP, CHR, BP) and *.SNP.PVAL (SNP, P) input files for MAGMA

awk 'BEGIN{OFS="\t"; print "SNP","CHR","BP"} NR>1 {print $11,$1,$2}' $path/dataset.b37.imputed.assoc.dosage.clean.rs.200kb.annot.maf.0.01 > $path/dataset.b37.imputed.dosage.maf.0.01.SNP.LOC
# Upper-case the exponent marker in the p-value only (e.g. 6.5e-05 becomes 6.5E-05), leaving SNP IDs untouched
awk 'BEGIN{OFS="\t"; print "SNP","P"} NR>1 {p=$10; sub(/e/,"E",p); print $11,p}' $path/dataset.b37.imputed.assoc.dosage.clean.rs.200kb.annot.maf.0.01 > $path/dataset.b37.imputed.dosage.maf.0.01.SNP.PVAL

head -3 $path/dataset.b37.imputed.dosage.maf.0.01.SNP.LOC 
head -3  $path/dataset.b37.imputed.dosage.maf.0.01.SNP.PVAL



SNP	CHR	BP
rs7899632	10	100000625
rs61875309	10	100000645
SNP	P
rs7899632	0.3718
rs61875309	0.709
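
The same column selection can be sketched in Python to make the two output layouts explicit. This is an illustrative sketch, not part of the pipeline: `snp_loc_and_pval` is a hypothetical helper, and the third input row is invented to show a p-value in exponent notation.

```python
def snp_loc_and_pval(rows):
    """Build the SNP.LOC and SNP.PVAL tables from rows keyed by CHR, BP, RS and P."""
    loc = [("SNP", "CHR", "BP")]
    pval = [("SNP", "P")]
    for r in rows:
        loc.append((r["RS"], r["CHR"], r["BP"]))
        # Only the p-value is rewritten, so an 'e' inside a SNP ID is never touched
        pval.append((r["RS"], r["P"].replace("e", "E")))
    return loc, pval

rows = [
    {"CHR": "10", "BP": "100000625", "RS": "rs7899632", "P": "0.3718"},
    {"CHR": "10", "BP": "100000645", "RS": "rs61875309", "P": "0.709"},
    {"CHR": "10", "BP": "100000700", "RS": "rs_example", "P": "6.5e-05"},  # hypothetical row
]

loc, pval = snp_loc_and_pval(rows)
for line in loc[:3]:
    print("\t".join(line))
for line in pval:
    print("\t".join(line))
```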


**The window goes from 200 kb to 50 kb here. Is that OK?**

In [75]:
%%bash
# Annotate
/usr/lib/magma/magma --annotate window=50,50 --snp-loc $path/dataset.b37.imputed.dosage.maf.0.01.SNP.LOC --gene-loc /mnt/Almacen6/Adapted/protocol/ref_files/NCBI37.3.gene.loc --out $path/dataset.b37.imputed.dosage.maf.0.01.LOC.50kb
head -3 $path/dataset.b37.imputed.dosage.maf.0.01.LOC.50kb.genes.annot

Welcome to MAGMA v1.06 (linux)
Using flags:
	--annotate window=50,50
	--snp-loc /mnt/Almacen6/Adapted/jupyterAnalisis_others/Genomic_pipeline/outputs/Imputed_files/dataset.b37.imputed.dosage.maf.0.01.SNP.LOC
	--gene-loc /mnt/Almacen6/Adapted/protocol/ref_files/NCBI37.3.gene.loc
	--out /mnt/Almacen6/Adapted/jupyterAnalisis_others/Genomic_pipeline/outputs/Imputed_files/dataset.b37.imputed.dosage.maf.0.01.LOC.50kb

Start time is 08:35:12, Tuesday 12 Nov 2019

Starting annotation...
Reading gene locations from file /mnt/Almacen6/Adapted/protocol/ref_files/NCBI37.3.gene.loc... 
	adding window: 50000bp
	19427 gene locations read from file
	chromosome  1: 2016 genes
	chromosome  2: 1226 genes
	chromosome  3: 1050 genes
	chromosome  4: 745 genes
	chromosome  5: 856 genes
	chromosome  6: 1016 genes
	chromosome  7: 906 genes
	chromosome  8: 669 genes
	chromosome  9: 775 genes
	chromosome 10: 723 genes
	chromosome 11: 1275 genes
	chromosome 12: 1009 genes
	chromosome 13: 320 genes
	chromosome 14:
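
To make the effect of `window=50,50` concrete, the sketch below assigns a SNP to every gene whose boundaries, extended by 50 kb on each side, contain its position. It is an illustration under stated assumptions, not MAGMA's implementation: the inclusive boundary handling and the helper `genes_for_snp` are assumptions, the two gene coordinates are the reference coordinates shown for TACR1 and TFF1 in the annotated results of this notebook, and the SNP positions are hypothetical.

```python
WINDOW_BP = 50_000  # 50 kb on each side of the gene

# Reference coordinates (chromosome, start, stop) as listed in NCBI37.3.gene.loc
GENES = {
    "TACR1": ("2", 75273590, 75426645),
    "TFF1": ("21", 43782391, 43786644),
}

def genes_for_snp(chrom, bp, genes=GENES, window=WINDOW_BP):
    """Return the genes whose window-extended interval contains the SNP position."""
    return [
        name
        for name, (g_chrom, start, stop) in genes.items()
        if g_chrom == chrom and start - window <= bp <= stop + window
    ]

# Hypothetical SNP positions on chromosome 2
print(genes_for_snp("2", 75230000))  # inside the extended TACR1 interval
print(genes_for_snp("2", 75200000))  # more than 50 kb upstream of TACR1
```

With this window, the extended TACR1 interval is 75223590 to 75476645, which matches the START and STOP values MAGMA reports for gene 6869 in the gene-level output.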

In [3]:
%%bash
# Gene-level analysis is performed with MAGMA, which computes gene-wise statistics taking into account physical distance and linkage disequilibrium (LD) between markers (de Leeuw et al. 2015).
# All SNPs with MAF above 1% (the filter applied above) are used in these analyses, with a distance threshold of 50 kb.
for i in {1..22}
do
    nohup /usr/lib/magma/magma --batch $i chr --big-data --seed 1234 --genes-only --bfile /mnt/Almacen6/Adapted/protocol/1K_Genome/Phase3_v5_reduced/mydataset.b37.rs --gene-annot $path/dataset.b37.imputed.dosage.maf.0.01.LOC.50kb.genes.annot --pval $path/dataset.b37.imputed.dosage.maf.0.01.SNP.PVAL N=10000 --gene-model multi --out $path/dataset.b37.imputed.dosage.maf.0.01.LOC.50kb.genes.annot.magma &
done
# Wait for all background batches to finish before merging the results
wait

nohup: failed to run command '/usr/lib/magma/magma': Permission denied
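
Before launching 22 background jobs it can help to build the command lines once and check that the binary is executable. The sketch below only constructs and prints the commands; it does not run MAGMA. The flags and paths are copied from the cell above, while the helper `batch_command` is a hypothetical name introduced for illustration.

```python
import os
import shlex

MAGMA = "/usr/lib/magma/magma"                 # binary path used in the cell above
OUT = "/mnt/data/GWAS/output/task6_genewise"   # value of $path in this notebook
PREFIX = OUT + "/dataset.b37.imputed.dosage.maf.0.01"
BFILE = "/mnt/Almacen6/Adapted/protocol/1K_Genome/Phase3_v5_reduced/mydataset.b37.rs"

def batch_command(batch):
    """Return the argument list for one chromosome batch (same flags as the bash loop)."""
    return [
        MAGMA, "--batch", str(batch), "chr", "--big-data", "--seed", "1234",
        "--genes-only", "--bfile", BFILE,
        "--gene-annot", PREFIX + ".LOC.50kb.genes.annot",
        "--pval", PREFIX + ".SNP.PVAL", "N=10000",
        "--gene-model", "multi",
        "--out", PREFIX + ".LOC.50kb.genes.annot.magma",
    ]

commands = [batch_command(i) for i in range(1, 23)]

# nohup reports "Permission denied" when the file exists but the current user cannot execute it
if os.access(MAGMA, os.X_OK):
    print("MAGMA binary is executable")
else:
    print("MAGMA binary is missing or not executable; check it with ls -l")
print(shlex.join(commands[0]))
```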

In [86]:
%%bash
# Merge the per-chromosome batch results into a single set of gene-level output files
magma --merge $path/dataset.b37.imputed.dosage.maf.0.01.LOC.50kb.genes.annot.magma --out $path/dataset.b37.imputed.dosage.maf.0.01.LOC.50kb.genes.annot.magma
head $path/dataset.b37.imputed.dosage.maf.0.01.LOC.50kb.genes.annot.magma.genes.out


Welcome to MAGMA v1.06 (linux)
Using flags:
	--merge /mnt/Almacen6/Adapted/jupyterAnalisis_others/Genomic_pipeline/outputs/Imputed_files/dataset.b37.imputed.dosage.maf.0.01.LOC.50kb.genes.annot.magma
	--out /mnt/Almacen6/Adapted/jupyterAnalisis_others/Genomic_pipeline/outputs/Imputed_files/dataset.b37.imputed.dosage.maf.0.01.LOC.50kb.genes.annot.magma

Start time is 09:19:36, Tuesday 12 Nov 2019

Merging gene results files with prefix '/mnt/Almacen6/Adapted/jupyterAnalisis_others/Genomic_pipeline/outputs/Imputed_files/dataset.b37.imputed.dosage.maf.0.01.LOC.50kb.genes.annot.magma'... 
Reading file /mnt/Almacen6/Adapted/jupyterAnalisis_others/Genomic_pipeline/outputs/Imputed_files/dataset.b37.imputed.dosage.maf.0.01.LOC.50kb.genes.annot.magma.batch1_chr.genes.out... 
	2001 genes read from file
Reading file /mnt/Almacen6/Adapted/jupyterAnalisis_others/Genomic_pipeline/outputs/Imputed_files/dataset.b37.imputed.dosage.maf.0.01.LOC.50kb.genes.annot.magma.batch2_chr.genes.out... 
	1223 genes

In [88]:
%%bash
sort -gk10,10 $path/dataset.b37.imputed.dosage.maf.0.01.LOC.50kb.genes.annot.magma.genes.out >$path/dataset.b37.imputed.dosage.maf.0.01.LOC.50kb.genes.annot.magma.genes.out.sorted
head $path/dataset.b37.imputed.dosage.maf.0.01.LOC.50kb.genes.annot.magma.genes.out.sorted


GENE       CHR      START       STOP  NSNPS  NPARAM      N        ZSTAT      P_JOINT  P_SNPWISE_MEAN  P_SNPWISE_TOP1
6869         2   75223590   75476645    729      80  10000       3.8263   6.5051e-05      1.6509e-05       0.0092927
7031        21   43732391   43836644    485      67  10000       3.4969   0.00023536      2.5108e-05        0.039246
8698        19    3128250    3230335    495      71  10000       3.2452   0.00058691      5.7742e-05        0.073409
223075       7   31503685   31748334    734      47  10000       3.1313   0.00087013      8.6048e-05         0.10956
5145         5  149187519  149374356    495      66  10000       3.7089   0.00010407      9.1283e-05       0.0041425
284257      18   54764293   54867639    338      33  10000       3.5861   0.00016785      9.2579e-05        0.003654
11214       15   85873818   86342589   1637      47  10000       3.5958   0.00016172      0.00010182        0.016458
114112       3  126275895  126425056    449      37  10000      
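
In the rows shown, ZSTAT looks like the one-sided normal quantile of P_JOINT. The sketch below checks that relationship on three rows copied from the table above. It is an observation about this output, not a statement taken from the MAGMA documentation, and `z_from_p` is a hypothetical helper.

```python
from statistics import NormalDist

def z_from_p(p):
    """Convert a one-sided p-value to the corresponding standard normal quantile."""
    return NormalDist().inv_cdf(1.0 - p)

# GENE: (P_JOINT, ZSTAT) copied from the sorted gene-level output
rows = {
    "6869": (6.5051e-05, 3.8263),
    "7031": (0.00023536, 3.4969),
    "8698": (0.00058691, 3.2452),
}

for gene, (p_joint, zstat) in rows.items():
    print(gene, round(z_from_p(p_joint), 3), zstat)
```

Note that the file is sorted on column 10 (P_SNPWISE_MEAN), so neither ZSTAT nor P_JOINT is monotone down the table.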

In [91]:
%%R
# Annotate genes with their symbols using the reference file NCBI37.3.gene.loc
magma <- read.table("/mnt/data/GWAS/output/task6_genewise/dataset.b37.imputed.dosage.maf.0.01.LOC.50kb.genes.annot.magma.genes.out.sorted", header=TRUE)
genes <- read.table("/mnt/data/GWAS/ref_files/NCBI37.3.gene.loc")
colnames(genes) <- c("GENE","CHR","START","STOP","STRAND","HUGO")

# Inner join on the gene ID; the shared columns CHR, START and STOP get the suffixes .x (MAGMA) and .y (reference)
magma_merged <- merge(magma, genes, by="GENE")
magma_merged <- magma_merged[order(magma_merged$P_SNPWISE_MEAN), ]
# Rank by P_SNPWISE_MEAN, referring to the column by name rather than by position
magma_rank <- rank(magma_merged$P_SNPWISE_MEAN, na.last = "keep", ties.method = "min")
magma_ranked <- cbind(magma_rank, magma_merged)

write.table(magma_ranked, "/mnt/data/GWAS/output/task6_genewise/dataset.b37.imputed.dosage.maf.0.01.LOC.50kb.genes.annot.magma.genes.out.sorted.annot", quote=FALSE, sep="\t", row.names = FALSE)
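
The join-and-rank logic of the R cell can be sketched with plain Python data structures. This is a minimal illustration, assuming tied p-values should share the smallest rank (as with `ties.method = "min"`); missing values are not handled, and `rank_min` is a hypothetical helper. The gene IDs, p-values and symbols are copied from the outputs in this notebook.

```python
def rank_min(values):
    """Rank values in ascending order; tied values share the smallest rank."""
    first_pos = {}
    for pos, v in enumerate(sorted(values), start=1):
        first_pos.setdefault(v, pos)  # remember the first position of each value
    return [first_pos[v] for v in values]

# (GENE, P_SNPWISE_MEAN) from the sorted MAGMA output
magma = [("7031", 2.5108e-05), ("6869", 1.6509e-05), ("8698", 5.7742e-05)]
# GENE -> HUGO symbol from the reference file
symbols = {"6869": "TACR1", "7031": "TFF1", "8698": "S1PR4"}

# Inner join on the gene ID, then sort and rank by p-value
merged = [(gene, p, symbols[gene]) for gene, p in magma if gene in symbols]
merged.sort(key=lambda row: row[1])
ranks = rank_min([p for _, p, _ in merged])
ranked = [(rank, *row) for rank, row in zip(ranks, merged)]

for row in ranked:
    print(*row, sep="\t")
```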


In [4]:
%%R
magma <- read.table("/mnt/data/GWAS/output/task6_genewise/dataset.b37.imputed.dosage.maf.0.01.LOC.50kb.genes.annot.magma.genes.out.sorted", header=TRUE)
head(magma)

UsageError: Cell magic `%%R` not found.


In [3]:
%%R
genes <- read.table("/mnt/data/GWAS/ref_files/NCBI37.3.gene.loc")
colnames(genes) <- c("GENE","CHR","START","STOP","STRAND","HUGO")
head(genes)



In [93]:
%%bash
head /mnt/data/GWAS/output/task6_genewise/dataset.b37.imputed.dosage.maf.0.01.LOC.50kb.genes.annot.magma.genes.out.sorted.annot
tail /mnt/data/GWAS/output/task6_genewise/dataset.b37.imputed.dosage.maf.0.01.LOC.50kb.genes.annot.magma.genes.out.sorted.annot

magma_rank	GENE	CHR.x	START.x	STOP.x	NSNPS	NPARAM	N	ZSTAT	P_JOINT	P_SNPWISE_MEAN	P_SNPWISE_TOP1	CHR.y	START.y	STOP.y	STRAND	HUGO
1	6869	2	75223590	75476645	729	80	10000	3.8263	6.5051e-05	1.6509e-05	0.0092927	2	75273590	75426645	-	TACR1
2	7031	21	43732391	43836644	485	67	10000	3.4969	0.00023536	2.5108e-05	0.039246	21	43782391	43786644	-	TFF1
3	8698	19	3128250	3230335	495	71	10000	3.2452	0.00058691	5.7742e-05	0.073409	19	3178250	3180335	+	S1PR4
4	223075	7	31503685	31748334	734	47	10000	3.1313	0.00087013	8.6048e-05	0.10956	7	31553685	31698334	+	CCDC129
5	5145	5	149187519	149374356	495	66	10000	3.7089	0.00010407	9.1283e-05	0.0041425	5	149237519	149324356	-	PDE6A
6	284257	18	54764293	54867639	338	33	10000	3.5861	0.00016785	9.2579e-05	0.003654	18	54814293	54817639	+	BOD1L2
7	11214	15	85873818	86342589	1637	47	10000	3.5958	0.00016172	0.00010182	0.016458	15	85923818	86292589	+	AKAP13
8	114112	3	126275895	126425056	449	37	10000	3.7148	0.00010167	0.00015436	0.0031081	3	126325895	126375056	-	TXNR

**TODO: choose a shorter name for the output file and check it against the user manual**

In [None]:
%%bash
head /mnt/data/GWAS/output/task6_genewise/dataset.b37.imputed.dosage.maf.0.01.LOC.50kb.genes.annot.magma.genes.out.sorted.annot