In this tutorial we run association tests on a synthetic dataset generated with plink (400 individuals, 1 SNP, binary phenotype). We use the following commands to generate the data, recode it into ``.ped`` format, and then run the association test.
 
```
plink --dummy 400 1 acgt
plink --bfile --recode
plink --bfile --assoc
```

The resulting `plink.assoc` file looks as follows.
```
 CHR  SNP         BP   A1      F_A      F_U   A2        CHISQ            P           OR 
   1 snp0          0    G   0.4782      0.5    A       0.3816       0.5367       0.9163 
```

Our goal is to reproduce these results using the ``.ped`` file.

In [1]:
import pandas as pd
import numpy as np
import math

df = pd.read_csv('plink.ped', sep=' ', header=None, 
                 names=['FamilyId', 'Id', 'FatherId', 'MotherId', 'Sex', 'Affection',
                        'A1', 'A2'],
                 index_col=1)
df = df[['Affection', 'A1', 'A2']]  # prune all irrelevant columns
df[:6]

| Id   | Affection | A1 | A2 |
|------|-----------|----|----|
| per0 | 1         | G  | A  |
| per1 | 1         | G  | G  |
| per2 | 2         | G  | A  |
| per3 | 2         | G  | A  |
| per4 | 2         | A  | A  |
| per5 | 2         | G  | A  |


First we combine the ``A1`` and ``A2`` columns into a single column via [pandas.melt](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.melt.html), so that each row holds one allele.

In [2]:
df = pd.melt(df, id_vars=['Affection'], value_vars=['A1', 'A2'])
df[:6]

|   | Affection | variable | value |
|---|-----------|----------|-------|
| 0 | 1         | A1       | G     |
| 1 | 1         | A1       | G     |
| 2 | 2         | A1       | G     |
| 3 | 2         | A1       | G     |
| 4 | 2         | A1       | A     |
| 5 | 2         | A1       | G     |
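
To make the reshaping concrete, here is a minimal sketch on a hypothetical three-person frame (the names `toy` and `long` and the genotypes are made up for illustration). Each individual contributes two rows after melting, one per allele, so the row count doubles.

```python
import pandas as pd

# Hypothetical toy genotypes: one row per individual, two allele columns.
toy = pd.DataFrame({
    'Affection': [1, 2, 2],
    'A1': ['G', 'G', 'A'],
    'A2': ['A', 'G', 'A'],
})

# Stack A1 and A2 into a single 'value' column;
# 'variable' records which column each allele came from.
long = pd.melt(toy, id_vars=['Affection'], value_vars=['A1', 'A2'])

print(len(toy), len(long))     # 3 individuals -> 6 alleles
print(long['value'].tolist())  # all A1 entries first, then all A2 entries
```

Because the association test here works on allele counts rather than genotypes, losing the pairing between the two alleles of one individual does not matter for what follows.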


Then we compute the contingency table of allele counts by affection status.

In [3]:
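df['count'] = 1  # each row is a single allele observation
# sum the counts per (Affection, allele) pair, then pivot the alleles into columns
freq = df.groupby(['Affection', 'value'])['count'].sum().unstack(level=1)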
freq

| Affection | A   | G   |
|-----------|-----|-----|
| 1         | 194 | 194 |
| 2         | 215 | 197 |
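
The same kind of table can also be built without a helper `count` column. The sketch below uses `pandas.crosstab` on a small hypothetical long-format frame (the data and the names `toy` and `table` are made up for illustration); applied to the melted `df` it should give the same counts as the `groupby` version.

```python
import pandas as pd

# Hypothetical long-format data: one row per allele.
toy = pd.DataFrame({
    'Affection': [1, 1, 1, 2, 2, 2, 2],
    'value':     ['A', 'G', 'G', 'A', 'A', 'G', 'A'],
})

# Rows: phenotype code (1 = control, 2 = case); columns: allele.
table = pd.crosstab(toy['Affection'], toy['value'])
print(table)
```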


Now we can analyze this data in R.
```
> freq <- matrix(c(194, 194, 215, 197), ncol=2, byrow=TRUE)
> colnames(freq) <- c("A", "G")
> rownames(freq) <- c("Control", "Cases")
> chisq.test(freq)

        Pearson's Chi-squared test with Yates' continuity correction

data:  freq
X-squared = 0.29919, df = 1, p-value = 0.5844

> fisher.test(freq)

        Fisher's Exact Test for Count Data

data:  freq
p-value = 0.5714
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
 0.6875727 1.2210360
sample estimates:
odds ratio 
   0.91638 
```

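The ``odds ratio`` estimate (0.91638) closely matches the plink output (0.9163); `fisher.test` reports a conditional maximum likelihood estimate rather than the sample odds ratio, which accounts for the small gap. The chi-squared statistic and the p-values differ more: `chisq.test` applies Yates' continuity correction to 2x2 tables by default, as the header of its output indicates, and Fisher's test is exact rather than a chi-squared approximation. Without the correction (`correct=FALSE`) the Pearson statistic for this table is 0.3816, the value plink reports.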
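
A quick way to see where the two chi-squared values come from is to recompute both from the four allele counts. The sketch below is a hand calculation, not plink's or R's actual code. It assumes that plink's `--assoc` statistic is the Pearson chi-squared test on the 2x2 allele table without continuity correction, an assumption that the numbers bear out here. It also uses the identity that, for 1 degree of freedom, the p-value equals `erfc(sqrt(chisq / 2))`. All variable names are illustrative.

```python
import math

# Allele counts from the contingency table (rows: controls, cases).
ctrl_A, ctrl_G = 194, 194
case_A, case_G = 215, 197
n = ctrl_A + ctrl_G + case_A + case_G

# Frequency of allele G (plink's A1) in cases (F_A) and controls (F_U).
f_a = case_G / (case_A + case_G)
f_u = ctrl_G / (ctrl_A + ctrl_G)

# Sample odds ratio for allele G, cases versus controls.
odds_ratio = (case_G * ctrl_A) / (case_A * ctrl_G)

# Shared pieces of the 2x2 chi-squared formula.
det = abs(ctrl_A * case_G - ctrl_G * case_A)
margins = ((ctrl_A + ctrl_G) * (case_A + case_G)
           * (ctrl_A + case_A) * (ctrl_G + case_G))

def p_value_1df(chisq):
    """Upper tail of the chi-squared distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(chisq / 2))

# Pearson chi-squared without continuity correction.
chisq = n * det ** 2 / margins
# Same statistic with Yates' continuity correction (R's default for 2x2).
chisq_yates = n * (det - n / 2) ** 2 / margins

print(f"F_A={f_a:.4f} F_U={f_u:.4f} OR={odds_ratio:.4f}")
print(f"uncorrected: CHISQ={chisq:.4f} P={p_value_1df(chisq):.4f}")
print(f"Yates:       CHISQ={chisq_yates:.5f} P={p_value_1df(chisq_yates):.4f}")
# F_A=0.4782 F_U=0.5000 OR=0.9163
# uncorrected: CHISQ=0.3816 P=0.5367
# Yates:       CHISQ=0.29919 P=0.5844
```

The uncorrected line reproduces the `plink.assoc` row, and the Yates line reproduces the `chisq.test` output.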