# CSF ANALYSIS FOR PPMI data

<span style="color:green">
Created: 2018-08-14     
By: Hirotaka Iwaki    
This document is a branched version of [prior analysis](https://github.com/hirotaka-i/sp01_ASYN/blob/master/CSF_ASync.ipynb)     
Instead of using raw CSF $\alpha$-synuclein concentrations, here we use log-transformed concentrations.    
The main reason for doing so is to combine the results with a [previous study on ADNI](https://www.ncbi.nlm.nih.gov/pubmed/29959729).    
**Analysis summary**    
Among the 51 significant or suggestive variants in the ADNI study, 4 were imputed with sufficient quality (Rsq > 0.8) in the PPMI dataset. Within PPMI, only one of these had p < 0.05 for association with the outcome. This association was driven mainly by controls rather than cases, and its direction was opposite to that in the ADNI study.     
I also evaluated other variants of interest, including the Meta5 risk variants, but found no signals there either.
</span>


## Variants of interest from the ADNI study

#### The ADNI study used a different reference build for variant positions, so GRCh37 positions are obtained from the rsIDs.

In [2]:
%%bash
mkdir -p t
sed 's/,/\t/g' data/sigADNI.csv | awk 'NR==1{print $0; next}{print $0 | "LANG=C sort"}' > t/signADNI_IDsorted.tab
LANG=C join --header  /home/$USER/tools/rs_37_IDsorted.txt t/signADNI_IDsorted.tab > t/signANDI_IDsorted_pos.tab
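For readers less familiar with `join --header`, the same rsID-to-position lookup can be sketched in pandas. This is a minimal illustration, assuming a space-delimited lookup table with `ID` and `POS` columns and a tab-delimited study table keyed by `ID`; the inline rows reuse two rsIDs from the ADNI table below, and `rs_hypothetical` is a made-up placeholder.

```python
import io

import pandas as pd

# Hypothetical excerpts: a GRCh37 lookup (ID -> POS) and a study table keyed by rsID.
lookup_txt = "ID POS\nrs10119993 9:132139969\nrs1041915 6:130014253\nrs_hypothetical 1:1000\n"
study_txt = "ID\tA1\tBeTA\tP\nrs1041915\tC\t0.2461\t2.16e-06\nrs10119993\tT\t0.2877\t2.74e-06\n"

lookup = pd.read_csv(io.StringIO(lookup_txt), sep=" ")
study = pd.read_csv(io.StringIO(study_txt), sep="\t")

# Inner join on rsID, like `join`: IDs present in only one table are dropped.
# Unlike `join`, merge() does not require either input to be sorted.
with_pos = lookup.merge(study, on="ID", how="inner")
print(with_pos)
```

Because `merge` does not depend on sort order, the `LANG=C sort` step is only needed for the `join`-based pipeline.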

#### Convert ADNI betas so that they refer to the alternative (ALT) alleles in the PPMI dataset

In [3]:
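import pandas as pd

df = pd.read_csv("t/signANDI_IDsorted_pos.tab", sep=" ")

# Encode bases so that complementary bases sum to zero (A/T -> +1/-1, G/C -> +10/-10).
def ConvGenotype(x):
    return {"A": 1, "T": -1, "G": 10, "C": -10}.get(x)  # None for non-A/C/G/T alleles

# Express the ADNI beta (reported for A1) per copy of the PPMI ALT allele.
def CheckFlip(x):
    ref, alt, a1 = ConvGenotype(x.REF), ConvGenotype(x.ALT), ConvGenotype(x.A1)
    if None in (ref, alt, a1):
        return None # unrecognized allele
    if ref + alt == 0:
        return None # palindrome (A/T or G/C): strand cannot be resolved
    elif abs(alt) == abs(a1):
        return x.BeTA # not flipped: A1 is ALT (or its complement)
    else:
        return -x.BeTA # flipped: A1 is REF (or its complement)

df['convBETA'] = df.apply(CheckFlip, axis=1)
# Columns written: POS ID NearestG REF ALT MAF BeTA P NearestG (the original BeTA, not convBETA)
df.iloc[:,[1,0,8,2,3,4,6,7,8]].to_csv('t/ADNIsnps.txt', index=False, sep=" ")
df.head()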

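|   | ID | POS | REF | ALT | MAF | A1 | BeTA | P | NearestG | convBETA |
|---|----|-----|-----|-----|-----|----|------|---|----------|----------|
| 0 | rs10119993 | 9:132139969 | T | C | 0.05364 | T | 0.2877 | 2.74e-06 | Unknown | -0.2877 |
| 1 | rs1041915 | 6:130014253 | A | G | 0.09987 | C | 0.2461 | 2.16e-06 | ARHGAP18 | 0.2461 |
| 2 | rs10489390 | 1:210507809 | A | G | 0.0852 | C | 0.2736 | 3.74e-06 | HHAT | 0.2736 |
| 3 | rs10762004 | 10:67479447 | G | T | 0.05812 | T | 0.4406 | 1.91e-08 | LINC01515 | 0.4406 |
| 4 | rs10762011 | 10:67525572 | T | C | 0.06424 | C | 0.3394 | 1.96e-06 | LINC01515 | 0.3394 |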
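The sign-flip rule in `CheckFlip` can also be written with an explicit complement map, which may be easier to audit. The sketch below is an assumed-equivalent reformulation (the function name `align_beta` is hypothetical): it returns the beta per copy of ALT, or `None` when an allele is unrecognized, the SNP is palindromic, or A1 matches neither allele. The first rows of the table above serve as checks.

```python
# Complement map used to compare alleles reported on either strand.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def align_beta(ref, alt, a1, beta):
    """Return beta per copy of ALT, or None if it cannot be aligned safely."""
    if any(allele not in COMPLEMENT for allele in (ref, alt, a1)):
        return None  # not a simple A/C/G/T SNV
    if COMPLEMENT[ref] == alt:
        return None  # palindromic (A/T or G/C): strand is ambiguous
    if a1 in (alt, COMPLEMENT[alt]):
        return beta  # A1 is ALT, possibly reported on the other strand
    if a1 in (ref, COMPLEMENT[ref]):
        return -beta  # A1 is REF: flip the sign
    return None  # A1 matches neither allele


print(align_beta("T", "C", "T", 0.2877))  # rs10119993
print(align_beta("A", "G", "C", 0.2461))  # rs1041915
```

Returning `None` rather than guessing keeps ambiguous A/T and G/C variants out of any downstream meta-analysis.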


In [4]:
%%bash
awk 'NR==1{print $0; next}{print $0 | "LANG=C sort"}' t/ADNIsnps.txt > data/_snpADNI.txt

## Other variants of interest
<span style="color:green">
    92 recently identified risk variants plus other variants of interest (GBA, LRRK2, APOE, ...)
</span> 

In [5]:
%%bash
# Build per-source variant lists (header kept on line 1, remaining lines sorted)
awk '{print $1,$48,$34,$26,$27,$28,$29,$30,$31}' data/Meta5.tab | sed 's/chr//' |\
    awk 'NR==1{print;next}{print | "LANG=C sort"}' > data/_snpMETA5.txt
awk -F',' '{print $4":"$5,$2,$1}' data/ProgressionPlus.csv |\
    awk 'NR==1{print;next}{print | "LANG=C sort"}' > data/_snpPLUS.txt
awk -F' ' '/rs/{print $1,$2,$3}' data/_snp* | LANG=C sort | sed '1 iPOS ID NearGene' | sed 's/ /\t/g' > data/snpAll.txt

In [9]:
%%bash
cut -d' ' -f2 t/signANDI_IDsorted_pos.tab | tail -n +2 | LANG=C sort > posADNI.txt
cut -f1 data/Meta5.tab | tail -n +2 | sed 's/chr//g' | LANG=C sort > posMETA5.txt
# Note: `head` keeps only the first 10 ProgressionPlus positions.
cut -d',' -f 4,5 data/ProgressionPlus.csv | sed 's/,/:/g' | tail -n +2 | LANG=C sort | head > posPLUS.txt
cat posADNI.txt posMETA5.txt posPLUS.txt | LANG=C sort | uniq |sed '1 iPOS'> posAll.txt
LANG=C join -t$'\t' -a 1 --header posAll.txt ../dataset/PPMI/maf001rsq3.info > posAll_freq.txt
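`join -a 1` keeps every position from `posAll.txt`, even those absent from the imputation info file; such lines simply carry no `ALT_Frq` or `Rsq` fields. A pandas sketch of that left join follows. The two info rows are copied from the Rsq-filtered output further down, and `1:1000` is a hypothetical position standing in for a variant missing from the info file.

```python
import io

import pandas as pd

# Hypothetical inputs: positions of interest and an imputation info table.
positions = pd.DataFrame({"POS": ["2:152239059", "6:28018944", "1:1000"]})
info_txt = "POS\tALT_Frq\tRsq\n2:152239059\t0.12421\t0.88856\n6:28018944\t0.04194\t0.93115\n"
info = pd.read_csv(io.StringIO(info_txt), sep="\t")

# how="left" keeps every position (like `join -a 1`);
# indicator=True adds a _merge column flagging rows without a match.
merged = positions.merge(info, on="POS", how="left", indicator=True)
missing = merged.loc[merged["_merge"] == "left_only", "POS"].tolist()
print(merged)
print("not in info file:", missing)
```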

In [191]:
%%bash
# Extract the variants of interest from the PLINK hard-call fileset (already filtered to imputation Rsq > 0.8)
module load plink
DATASET=PPMI
plink --bfile /data/LNG/CORNELIS_TEMP/progression_GWAS/$DATASET/plink_files_hard/$DATASET.HARDCALLS \
    --extract posAll.txt --recodeA include-alt --out t/"$DATASET"_extract

PLINK v1.90b4.4 64-bit (21 May 2017)           www.cog-genomics.org/plink/1.9/
(C) 2005-2017 Shaun Purcell, Christopher Chang   GNU General Public License v3
Note: --recodeA flag deprecated.  Use 'recode A ...'.
Logging to t/PPMI_extract.log.
Options in effect:
  --bfile /data/LNG/CORNELIS_TEMP/progression_GWAS/PPMI/plink_files_hard/PPMI.HARDCALLS
  --extract posAll.txt
  --out t/PPMI_extract
  --recode A include-alt

257653 MB RAM detected; reserving 128826 MB for main workspace.
1073549 variants loaded from .bim file.
580 people (0 males, 0 females, 580 ambiguous) loaded from .fam.
Ambiguous sex IDs written to t/PPMI_extract.nosex .
--extract: 82 variants remaining.
Using 1 thread (no multithreaded calculations invoked).
Before main variant filters, 580 founders and 0 nonfounders present.
Calculating allele frequencies... 0%1%2%3%4%5%6%7%8%9%10%11%12%13%14%15%16%17%18%19%20%21%22%23%24%25%26%27%28%29%30%

[+] Loading plink  1.9.0-beta4.4  on cn3297 
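With `--recode A include-alt`, each variant column of the `.raw` file is named `<ID>_<A1>(/<A2>)`, where A1 is the counted allele, after the six leading columns `FID IID PAT MAT SEX PHENOTYPE`. The sed pipeline used later to split these names can be sketched in Python; `parse_raw_column` is a hypothetical helper, and the two example names are reconstructed from the per-stratum results shown further below.

```python
import re

# "<ID>_<A1>(/<A2>)"; the greedy ID group lets IDs themselves contain underscores.
RAW_COL = re.compile(r"^(?P<id>.+)_(?P<a1>[^_(]+)\(/(?P<a2>[^)]+)\)$")


def parse_raw_column(name):
    """Split a .raw variant column name into (ID, A1, A2)."""
    m = RAW_COL.match(name)
    if m is None:
        raise ValueError(f"unexpected column name: {name!r}")
    return m.group("id"), m.group("a1"), m.group("a2")


header = "FID IID PAT MAT SEX PHENOTYPE 1:154898185_C(/G) 1:155135036_A(/G)".split()
snp_cols = header[6:]  # skip FID IID PAT MAT SEX PHENOTYPE
parsed = [parse_raw_column(c) for c in snp_cols]
print(parsed)
```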


### Analysis
<span style="color:green">
    The analysis is done in R.    
    The variants are split into chunks of 20K SNPs, and _analysis.R processes one chunk at a time (this matters for a later step, not for the analysis here).    
    The analysis is run separately for cases only and controls only, and for all subjects together.
</span>

In [192]:
%%bash
mkdir -p PPMI/cut20_extract
FOLDER=PPMI/cut20_extract
DATASET=PPMI
N_COL=$(head -n1 t/"$DATASET"_extract.raw | sed 's/ /\n/g' | wc -l)
N_SNP=$(expr $N_COL - 6) # first 6 columns (FID IID PAT MAT SEX PHENOTYPE)
N_ITER=$(expr $(expr $N_SNP - 1) / 20000)  # index of the last 20K-SNP chunk (counting from 0)
echo $DATASET N_COL_$N_COL N_SNP_$N_SNP N_ITER_$N_ITER
for i in $(seq 0 $N_ITER);do
    START=$(expr 20000 \* $i + 7)
    if [ $i == $N_ITER ];then
        STOP=$N_COL
        cut -d" " -f2,"$START"-"$STOP" t/"$DATASET"_extract.raw | sed 's/ /\t/g' > $FOLDER/tp"$i".txt
    else
        STOP=$(expr 20000 \* $(expr $i + 1) + 6)
        cut -d" " -f2,"$START"-"$STOP" t/"$DATASET"_extract.raw | sed 's/ /\t/g' > $FOLDER/tp"$i".txt
    fi
done

PPMI N_COL_88 N_SNP_82 N_ITER_0
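The chunk boundaries computed by the loop can be checked with a small Python sketch of the same arithmetic (the helper `chunk_ranges` is hypothetical). It returns 1-based, inclusive column ranges of the `.raw` file; the bash loop additionally keeps column 2 (`IID`) in every chunk. For this run, 88 columns give a single chunk covering columns 7 to 88.

```python
def chunk_ranges(n_col, chunk=20000, n_fixed=6):
    """Return 1-based inclusive (start, stop) SNP-column ranges, one per chunk."""
    n_snp = n_col - n_fixed  # columns after FID IID PAT MAT SEX PHENOTYPE
    n_iter = (n_snp - 1) // chunk  # index of the last chunk, counting from 0
    ranges = []
    for i in range(n_iter + 1):
        start = chunk * i + n_fixed + 1
        # The last chunk runs to the final column; earlier chunks are full.
        stop = n_col if i == n_iter else chunk * (i + 1) + n_fixed
        ranges.append((start, stop))
    return ranges


print(chunk_ranges(88))
```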


#### Separate analyses for cases and controls

In [231]:
%%bash
# Usage: Rscript _analysis1.R FILE OUTCOME CASECODE
echo  '
library(data.table);library(dplyr);library(lme4)
args <- commandArgs(trailingOnly = TRUE)
FILE = args[1]
DATASET=strsplit(FILE, "/")[[1]][1]
ITER=strsplit(FILE, "/")[[1]][3] %>% gsub("^tp|.txt$","" , .)
OUTCOME=args[2]
CASECODE = as.numeric(args[3])
print(paste(DATASET,ITER, OUTCOME, CASECODE))
# Read data/pheno_PC5.txt and set covariates (PC1 and PC2 only among the PCs)
cohort=fread("data/pheno_PC5.txt") %>% filter(CASE==CASECODE) # Only limiting to CASE/CONTROL
COVs = names(cohort)[c(-(1:3),-9, -10, -11)] # eliminate ID, OUTCOME, CASE, PC3-5 from covariates
cohort = cohort %>% mutate(logASYN=log(ASYN))
# Read genotyping data
SNPset = fread(FILE)
SNPs = names(SNPset)[-1] # 1 ID, SNP name starts from 2nd col
## Merge
cohort_snp = left_join(cohort, SNPset, by = "IID")
# Set function for analysis
glmm.listfunc = function(x){
  # Models
  MODEL = paste(OUTCOME, "~", "`", SNPs[x], "`+", paste(COVs, collapse="+"), "+(1|IID)", sep = "")
  testLmer = try(lmer(as.formula(MODEL), data = cohort_snp), silent = TRUE)
  if(inherits(testLmer, "try-error")){
    sumstat=rep("DROP",6)
  }else{
    temp = summary(testLmer)
    temp1 = temp$coefficients
    if(grep(substr(SNPs[x],1,7), rownames(temp1)) %>% length == 0){ # In this case, the SNP was dropped from the model
      sumstat=rep(NA,6)
    }else{
      RES = temp1[2,] # The first row is the intercept
      PV_APPROX = 2 * pnorm(abs(RES[3]), lower.tail=F) # normal approximation; df is large enough
      OBS_N = paste(length(temp$residuals), "_", temp$ngrps, sep="")
      sumstat <- c(SNPs[x], OBS_N, RES[3], RES[1], RES[2], PV_APPROX)
    }
  }
  return(sumstat)
}
temp = lapply(1:length(SNPs), glmm.listfunc)
temp2 = do.call(rbind, temp)
attributes(temp2)$dimnames[[2]]=c("POS_A1(/A2)", "OBS_N", "Tvalue", "Beta", "SE", "PvApprox")
write.table(temp2, paste(DATASET, "/", OUTCOME, "/PHENO", CASECODE, "_", ITER, ".txt", sep=""), row.names = F, quote = F, sep = "\t")
print("complete")
' > _analysis1.R

DATASET=PPMI
OUTCOME=logASYN
FOLDER=PPMI/cut20_extract
module load R
mkdir -p $DATASET/$OUTCOME
for FILE in $(ls $FOLDER);do
    for CASECODE in 0 1;do
        Rscript --vanilla _analysis1.R $FOLDER/$FILE logASYN $CASECODE
    done
done

[1] "PPMI 0 logASYN 0"
[1] "complete"
[1] "PPMI 0 logASYN 1"
[1] "complete"


[+] Loading gcc  7.2.0  ... 
[+] Loading GSL 2.4 for GCC 7.2.0 ... 
[+] Loading openmpi 3.0.0  for GCC 7.2.0 
[+] Loading R 3.5.0_build2 

Attaching package: ‘dplyr’

The following objects are masked from ‘package:data.table’:

    between, first, last

The following objects are masked from ‘package:stats’:

    filter, lag

The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union

Loading required package: Matrix
fixed-effect model matrix is rank deficient so dropping 1 column / coefficient



In [232]:
%%bash
# Note: A1 is the counted allele and is usually the minor allele (A2 is usually the major allele)
cut -f 1,4-6 PPMI/logASYN/PHENO0_0.txt | sed 's/_/\t/g' | sed 's/(\//\t/g' | sed 's/)//g' | head

POS	A1	A2	Beta	SE	PvApprox
1:154898185	C	G	0.120038966816388	0.227873839748725	0.598347636439312
1:155135036	A	G	-0.325598320954437	0.190698836613698	0.0877485803723177
1:155205634	C	T	0.249251667011242	0.385078639295328	0.517454162152952
1:161469054	G	C	0.0520186318981671	0.0426181488793378	0.222247111556426
1:171719769	T	C	0.0359282275543565	0.0566832618258547	0.526183991974181
1:205723572	C	T	-0.00272207441549625	0.0467392361407523	0.953557778026501
1:205737739	A	G	0.0114576208926563	0.0611025738610337	0.851257226375337
1:226916078	C	T	-0.0754508538783502	0.0481833018542482	0.117369009712297
1:232664611	T	C	-0.172876359199629	0.0649641967329956	0.00778853398931668
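`PvApprox` is a two-sided P-value from the normal approximation to the t statistic, `2 * pnorm(abs(t), lower.tail = F)` with t = Beta / SE. A Python equivalent can use the identity $2\,(1 - \Phi(|t|)) = \operatorname{erfc}(|t| / \sqrt{2})$; the function name below is hypothetical. Applied to the `1:232664611` row above, it should reproduce the reported `PvApprox` to within floating-point error.

```python
import math


def approx_two_sided_p(beta, se):
    """Two-sided P-value for t = beta / se, treating t as standard normal."""
    t = beta / se
    # 2 * (1 - Phi(|t|)) equals erfc(|t| / sqrt(2))
    return math.erfc(abs(t) / math.sqrt(2))


p = approx_two_sided_p(-0.172876359199629, 0.0649641967329956)
print(p)
```

The approximation is reasonable only when the residual degrees of freedom are large, as the comment in the R script assumes.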


In [234]:
%%bash
DATASET=PPMI
for STRATA in PHENO0_0 PHENO1_0;do
    cut -f 1,4-6 $DATASET/logASYN/$STRATA.txt | sed 's/_/\t/g' | sed 's/(\//\t/g' | sed 's/)//g' |\
    awk 'NR==1{print $0;next}{print $0 | "LANG=C sort"}' |\
    LANG=C join -t$'\t' --header - ../dataset/$DATASET/maf001rsq3.info > t/$STRATA.txt
done

In [241]:
%%bash
DATASET=PPMI
OUTCOME=logASYN
mkdir -p meta/$DATASET/$OUTCOME
rm meta/$DATASET/$OUTCOME/metal.txt
echo "
SCHEME STDERR
AVERAGEFREQ ON
MINMAXFREQ ON
" > meta/$DATASET/$OUTCOME/metal.txt
for STRATA in PHENO0_0 PHENO1_0;do
    echo "
    MARKER POS
    ALLELE A1 A2
    FREQ   ALT_Frq
    EFFECT Beta
    STDERR SE
    PVALUE PvApprox
    PROCESS t/$STRATA.txt
  " >> meta/$DATASET/$OUTCOME/metal.txt
done
echo "
OUTFILE meta/$DATASET/$OUTCOME/metares .tbl
ANALYZE HETEROGENEITY
QUIT
" >> meta/$DATASET/$OUTCOME/metal.txt
module load metal
metal meta/$DATASET/$OUTCOME/metal.txt
sort -gk 10 meta/$DATASET/$OUTCOME/metares1.tbl | head

MetaAnalysis Helper - (c) 2007 - 2009 Goncalo Abecasis

# This program faciliates meta-analysis of genome-wide association studies.
# Commonly used commands are listed below:
#
# Options for describing input files ...
#   SEPARATOR        [WHITESPACE|COMMA|BOTH|TAB] (default = WHITESPACE)
#   COLUMNCOUNTING   [STRICT|LENIENT]            (default = 'STRICT')
#   MARKERLABEL      [LABEL]                     (default = 'MARKER')
#   ALLELELABELS     [LABEL1 LABEL2]             (default = 'ALLELE1','ALLELE2')
#   EFFECTLABEL      [LABEL|log(LABEL)]          (default = 'EFFECT')
#   FLIP
#
# Options for filtering input files ...
#   ADDFILTER        [LABEL CONDITION VALUE]     (example = ADDFILTER N > 10)
#                    (available conditions are <, >, <=, >=, =, !=, IN)
#   REMOVEFILTERS
#
# Options for sample size weighted meta-analysis ...
#   WEIGHTLABEL      [LABEL]                     (default = 'N')
#   PVALUELABEL      [LABEL]                     (default = 'PVALUE')
#   DEFAUL

rm: cannot remove ‘meta/PPMI/logASYN/metal.txt’: No such file or directory
[+] Loading metal  2017-12-21 


<span style="color:green">
    Note that METAL reports alleles in the priority order A > T > C > G, so some alleles are flipped relative to A1/A2 in the input files.
</span>

#### Analysis using cases and controls together

In [10]:
%%bash 
# Usage: Rscript _analysis2.R FILE OUTCOME
echo '
library(data.table);library(dplyr);library(lme4)
args <- commandArgs(trailingOnly = TRUE)
FILE = args[1]
DATASET=strsplit(FILE, "/")[[1]][1]
ITER=strsplit(FILE, "/")[[1]][3] %>% gsub("^tp|.txt$","" , .)
OUTCOME=args[2]
CASECODE="BOTH"
print(paste(DATASET,ITER, OUTCOME, CASECODE))
# Read data/pheno_PC5.txt and set covariates (PC1 and PC2 only among the PCs)
cohort=fread("data/pheno_PC5.txt") 
COVs = names(cohort)[c(-(1:2),-9, -10, -11)] # eliminate ID, OUTCOME, PC3-5 from covariates. (CASE will be included)
cohort = cohort %>% mutate(logASYN=log(ASYN))
# Read genotyping data
SNPset = fread(FILE)
SNPs = names(SNPset)[-1] # 1 ID, SNP name starts from 2nd col
## Merge
cohort_snp = left_join(cohort, SNPset, by = "IID")
# Set function for analysis
glmm.listfunc = function(x){
  # Models
  MODEL = paste(OUTCOME, "~", "`", SNPs[x], "`+", paste(COVs, collapse="+"), "+(1|IID)", sep = "")
  testLmer = try(lmer(as.formula(MODEL), data = cohort_snp), silent = TRUE)
  if(inherits(testLmer, "try-error")){
    sumstat=rep("DROP",6)
  }else{
    temp = summary(testLmer)
    temp1 = temp$coefficients
    if(grep(substr(SNPs[x],1,7), rownames(temp1)) %>% length == 0){ # In this case, the SNP was dropped from the model
      sumstat=rep(NA,6)
    }else{
      RES = temp1[2,] # The first row is the intercept
      PV_APPROX = 2 * pnorm(abs(RES[3]), lower.tail=F) # normal approximation; df is large enough
      OBS_N = paste(length(temp$residuals), "_", temp$ngrps, sep="")
      sumstat <- c(SNPs[x], OBS_N, RES[3], RES[1], RES[2], PV_APPROX)
    }
  }
  return(sumstat)
}
temp = lapply(1:length(SNPs), glmm.listfunc)
temp2 = do.call(rbind, temp)
attributes(temp2)$dimnames[[2]]=c("POS_A1(/A2)", "OBS_N", "Tvalue", "Beta", "SE", "PvApprox")
write.table(temp2, paste(DATASET, "/", OUTCOME, "/PHENO", CASECODE, "_", ITER, ".txt", sep=""), row.names = F, quote = F, sep = "\t")
print("complete")
' > _analysis2.R

DATASET=PPMI
OUTCOME=logASYN
FOLDER=PPMI/cut20_extract
module load R
for FILE in $(ls $FOLDER);do
    Rscript --vanilla _analysis2.R $FOLDER/$FILE logASYN
done

[1] "PPMI 0 logASYN BOTH"
[1] "complete"


[+] Loading gcc  7.2.0  ... 
[+] Loading GSL 2.4 for GCC 7.2.0 ... 
[+] Loading openmpi 3.0.0  for GCC 7.2.0 
[+] Loading R 3.5.0_build2 

Attaching package: ‘dplyr’

The following objects are masked from ‘package:data.table’:

    between, first, last

The following objects are masked from ‘package:stats’:

    filter, lag

The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union

Loading required package: Matrix


## Analysis results
<span style="color:green">
    Four of the variants reported in the ADNI study are present in PPMI with Rsq > 0.8.
</span>

In [29]:
%%bash
# Keep the header and variants with Rsq > 0.8, then attach the ADNI annotations
awk 'NR==1 || $3 > 0.8' posAll_freq.txt | LANG=C join --header - data/_snpADNI.txt

POS ALT_Frq Rsq ID NearestG REF ALT MAF BeTA P NearestG
2:152239059 0.12421 0.88856 rs11885282 TNFAIP6 C A 0.1229 0.2295 9.03e-06 TNFAIP6
6:28018944 0.04194 0.93115 rs9393879 ZNF165 G A 0.05629 0.2676 7.95e-06 ZNF165
6:80626375 0.11332 0.99886 rs3812153 ELOVL4 T C 0.1156 0.2218 7.6e-06 ELOVL4
6:80652229 0.11854 0.89586 rs239520 ELOVL4 G T 0.1282 0.2218 7.6e-06 ELOVL4


In [14]:
%%bash
echo "ADNI variants in the results (CASES)"
awk 'NR==1{print;next}{print|"grep -f posADNI.txt"}' PPMI/logASYN/PHENO1_0.txt | cut -f 1,4-6
echo "ADNI variants in the results (CONTROLS)"
awk 'NR==1{print;next}{print|"grep -f posADNI.txt"}' PPMI/logASYN/PHENO0_0.txt | cut -f 1,4-6
echo "ADNI variants in the results (BOTH in the same model)"
awk 'NR==1{print;next}{print|"grep -f posADNI.txt"}' PPMI/logASYN/PHENOBOTH_0.txt | cut -f 1,4-6
echo "ADNI variants in the results (meta-analysis of cases and controls; P-value is column 10 of metares1.tbl)"
cut -f 1-4,8- meta/PPMI/logASYN/metares1.tbl | awk 'NR==1{print;next}{print|"grep -f posADNI.txt"}'  | cut -f 1,5-

ADNI variants in the results (CASES)
POS_A1(/A2)	Beta	SE	PvApprox
2:152239059_A(/C)	-0.0410664010484468	0.043845041944795	0.348951034166631
6:28018944_A(/G)	0.0108312130627539	0.0718106092103342	0.880109613053781
6:80626375_C(/T)	0.0527877197548212	0.0441769184478105	0.232119735326959
6:80652229_T(/G)	0.0554331009655801	0.044007964647903	0.207808211396904
ADNI variants in the results (CONTROLS)
POS_A1(/A2)	Beta	SE	PvApprox
2:152239059_A(/C)	-0.136135647733776	0.0603830920370812	0.0241626840115805
6:28018944_A(/G)	-0.0647129905249892	0.0965164429618842	0.50254758741586
6:80626375_C(/T)	0.0222083061871118	0.0729757581628624	0.760880732928569
6:80652229_T(/G)	0.0102669948957751	0.0727697730663932	0.887799846407267
ADNI variants in the results (BOTH in the same model)
POS_A1(/A2)	Beta	SE	PvApprox
2:152239059_A(/C)	-0.0769353390590187	0.0353351641262175	0.0294580092807975
6:28018944_A(/G)	-0.0100939197480789	0.0574544224740449	0.860540879638807
6:80626375_C(/T)	0.0354214892808076	0.037482798

<span style="color:green">
    Top hits from the meta-analysis among the variants of interest (P < 0.1):
</span>

In [19]:
%%bash
cut -f 1-4,8- meta/PPMI/logASYN/metares1.tbl | awk 'NR==1{print;next}$7<0.1{print | "sort -gk7"}'

MarkerName	Allele1	Allele2	Freq1	Effect	StdErr	P-value	Direction	HetISq	HetChiSq	HetDf	HetPVal
2:152239059	a	c	0.1242	-0.0739	0.0355	0.03729	--	38.4	1.623	1	0.2027
4:90636630	a	g	0.3308	-0.0529	0.0257	0.03913	--	0.0	0.021	1	0.8847
11:83487277	a	c	0.4004	0.0503	0.0246	0.04047	++	0.0	0.454	1	0.5006
1:171719769	t	c	0.2250	0.0588	0.0305	0.05332	++	0.0	0.230	1	0.6317
3:161077630	a	g	0.6633	-0.0499	0.0260	0.05494	--	0.0	0.171	1	0.6791
10:121415685	a	g	0.2418	0.0564	0.0296	0.0567	++	0.0	0.125	1	0.7236
6:32578772	a	c	0.1573	0.0616	0.0334	0.06468	++	61.2	2.577	1	0.1084
14:37989270	t	c	0.6029	0.0435	0.0237	0.06638	++	0.0	0.056	1	0.8132
13:97865021	t	c	0.7469	-0.0475	0.0282	0.09175	--	0.0	0.756	1	0.3845
10:15557406	t	c	0.3103	0.0421	0.0255	0.09909	++	0.0	0.037	1	0.8466
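With `SCHEME STDERR`, METAL performs a fixed-effect, inverse-variance weighted meta-analysis, and `ANALYZE HETEROGENEITY` adds Cochran's Q (`HetChiSq`) and $I^2$ (`HetISq`). The sketch below shows those formulas, assuming the per-stratum effects are already aligned to the same allele (METAL's own allele alignment is omitted); `ivw_meta` is a hypothetical helper. With the control and case estimates for `2:152239059` from the stratified results above, it should reproduce the first row of this table up to rounding.

```python
import math
from statistics import NormalDist


def ivw_meta(betas, ses):
    """Fixed-effect inverse-variance meta-analysis with Cochran's Q and I^2 (%)."""
    weights = [1.0 / se ** 2 for se in ses]
    w_sum = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / w_sum
    se = math.sqrt(1.0 / w_sum)
    p = 2 * NormalDist().cdf(-abs(beta / se))
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return beta, se, p, q, i2


# 2:152239059, allele A: controls (PHENO0) then cases (PHENO1)
result = ivw_meta([-0.136135647733776, -0.0410664010484468],
                  [0.0603830920370812, 0.043845041944795])
print(result)
```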


## Meta-analysis with the ADNI study

In [61]:
%%bash
echo '
SCHEME SAMPLESIZE
AVERAGEFREQ ON
MINMAXFREQ ON

MARKER POS
ALLELE ALT REF
FREQ MAF
EFFECT BeTA
PVALUE P
WEIGHTLABEL DONTUSECOLUMN
DEFAULTWEIGHT 209
PROCESS data/signANDI_IDsorted_pos.tab

MARKER POS
ALLELE A1 A2
FREQ   ALT_Frq
EFFECT Beta
STDERR SE
PVALUE PvApprox
WEIGHTLABEL DONTUSECOLUMN
DEFAULTWEIGHT 153
PROCESS t/PHENO0_0.txt
  

MARKER POS
ALLELE A1 A2
FREQ   ALT_Frq
EFFECT Beta
PVALUE PvApprox
WEIGHTLABEL DONTUSECOLUMN
DEFAULTWEIGHT 346
PROCESS t/PHENO1_0.txt
  

OUTFILE meta/PPMI/logASYN/metares2 .tbl
ANALYZE HETEROGENEITY
QUIT
'> meta/PPMI/logASYN/metal2.txt
module load metal
metal meta/PPMI/logASYN/metal2.txt

[+] Loading metal  2017-12-21 


In [30]:
%%bash 
# Keep the header and markers present in all three inputs (HetDf = 2)
awk 'NR==1 || $14 > 1' meta/PPMI/logASYN/metares21.tbl | head

MarkerName	Allele1	Allele2	Freq1	FreqSE	MinFreq	MaxFreq	Weight	Zscore	P-value	Direction	HetISq	HetChiSq	HetDf	HetPVal
6:28018944	a	g	0.0462	0.0065	0.0419	0.0563	708.00	2.221	0.02638	+-+	87.1	15.491	2	0.0004326
6:80652229	t	g	0.1214	0.0044	0.1185	0.1282	708.00	3.378	0.0007298	+++	80.5	10.231	2	0.006004
6:80626375	t	c	0.8860	0.0010	0.8844	0.8867	708.00	-3.409	0.0006525	---	79.9	9.937	2	0.006955
2:152239059	a	c	0.1238	0.0006	0.1229	0.1242	708.00	0.709	0.4783	+--	92.1	25.164	2	3.434e-06
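With `SCHEME SAMPLESIZE`, METAL turns each P-value into a Z-score signed by the direction of effect and combines them as $Z = \sum_i z_i \sqrt{N_i} / \sqrt{\sum_i N_i}$. The sketch below assumes the effects are already aligned to the same allele (hypothetical helper `samplesize_meta`). With the ADNI and PPMI inputs for `2:152239059` and the `DEFAULTWEIGHT` values 209, 153 and 346, it should give a Z-score close to the 0.709 reported above.

```python
import math
from statistics import NormalDist


def samplesize_meta(effects, pvalues, ns):
    """Sample-size weighted Z-score meta-analysis (Stouffer with sqrt(N) weights)."""
    nd = NormalDist()
    num = 0.0
    for b, p, n in zip(effects, pvalues, ns):
        z = -nd.inv_cdf(p / 2)  # |z| from the two-sided P-value
        z = math.copysign(z, b)  # sign taken from the effect direction
        num += z * math.sqrt(n)
    z_meta = num / math.sqrt(sum(ns))
    p_meta = 2 * nd.cdf(-abs(z_meta))
    return z_meta, p_meta


# 2:152239059: ADNI, PPMI controls, PPMI cases
z_meta, p_meta = samplesize_meta(
    [0.2295, -0.136135647733776, -0.0410664010484468],
    [9.03e-06, 0.0241626840115805, 0.348951034166631],
    [209, 153, 346],
)
print(z_meta, p_meta)
```

Because only signs and P-values enter the Z-score, this scheme is insensitive to differences in outcome scaling between studies, but it is still sensitive to effect-allele mismatches.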


<span style="color:green">
    The P-values are driven mostly by the ADNI study, and heterogeneity is high (HetISq 79.9-92.1%). The top PPMI hit (2:152239059) had the opposite direction of effect.
</span>
