# Plink.jl

Julia functions to read binary Plink files (.fam, .bim, .bed). The .bed file is memory-mapped as a `BitArray`.

In [1]:
using Plink

Open a Plink file set by passing the path prefix shared by the `.fam`, `.bim` and `.bed` files (no extension):

In [2]:
p = PlinkFile("../data/plink")

<PLINK file (1056 samples x 4 markers) at ../data/plink>

Number of samples

In [3]:
nsamples(p)

1056

Samples are stored in a `Sample` struct with the following fields:

- fid (family id)
- iid (individual id within family)
- iid of father (within family)
- iid of mother (within family)
- sex (1 = male, 2 = female, 0 = unknown)
- phenotype (1 = control, 2 = case, 0/-9 = missing)

In [4]:
samples(p)[1:5]

5-element Vector{Sample}:
 Sample("1", "1", "0", "0", 1, 2)
 Sample("2", "1", "0", "0", 1, 2)
 Sample("3", "1", "0", "0", 1, 2)
 Sample("4", "1", "0", "0", 1, 2)
 Sample("5", "1", "0", "0", 1, 2)
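
For orientation, each line of a `.fam` file holds six whitespace-separated columns that correspond to the fields listed above. The following is a minimal sketch of parsing one such line. `FamRecord` and `parse_fam_line` are hypothetical names used only for illustration; they are not part of Plink.jl, and the package's own `Sample` type may use different field names or types.

```julia
# Hypothetical stand-in for the package's `Sample` type (illustration only).
struct FamRecord
    fid::String
    iid::String
    father::String
    mother::String
    sex::Int
    phenotype::Int
end

# Parse one whitespace-delimited .fam line into a FamRecord.
# Assumes the phenotype column holds an integer code (1, 2, 0 or -9).
function parse_fam_line(line::AbstractString)
    cols = split(line)
    length(cols) == 6 ||
        throw(ArgumentError("expected 6 columns, got $(length(cols))"))
    return FamRecord(cols[1], cols[2], cols[3], cols[4],
                     parse(Int, cols[5]), parse(Int, cols[6]))
end

rec = parse_fam_line("12 1 0 0 1 2")
```

Here `rec.fid` is `"12"` and `rec.phenotype` is `2`, mirroring the layout of the `Sample` values printed above.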

Number of markers

In [5]:
nmarkers(p)

4

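Markers are stored in a `Marker` struct with the following fields:

- chrom (chromosome)
- id (marker identifier)
- cM (genetic distance in centimorgans)
- pos (base-pair position)
- a1 (allele 1)
- a2 (allele 2)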
In [6]:
markers(p)

4-element Vector{Marker}:
 Marker("1", "rs001", 2.09, 123446, "2", "1")
 Marker("1", "rs002", 2.15, 123452, "2", "1")
 Marker("1", "rs003", 2.2, 123457, "2", "1")
 Marker("1", "rs004", 2.21, 123458, "1", "2")

Genotypes can be accessed by indexing a `PlinkFile` with `[sample_idx, marker_idx]`. Each genotype is returned as an `(A2, A1)` tuple of `Bool`s.

Mapping from `(A2, A1)` to genotypes:

- (false, false): Hom a1
- (false, true): Missing
- (true, false): Het
- (true, true): Hom a2

NOTE: The tuple order follows Plink, which refers to a genotype as A2A1.

In [7]:
p[12,1]

(true, false)
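
The mapping listed above can be turned into a small decoding helper. This is a sketch only: `genotype_label` is a hypothetical function, not part of Plink.jl, and it assumes the tuple is passed in the same order in which indexing returns it.

```julia
# Map a genotype tuple to a label, following the mapping listed above.
# The tuple is expected in the order returned by indexing a PlinkFile.
function genotype_label(g::Tuple{Bool,Bool})
    g == (false, false) && return :hom_a1
    g == (false, true)  && return :missing
    g == (true, false)  && return :het
    return :hom_a2      # the only remaining case is (true, true)
end

label = genotype_label((true, false))   # same tuple as shown for p[12,1]
```

With the value shown for `p[12,1]`, the helper returns `:het`.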

Single alleles A1 and A2 can also be accessed directly as `BitArray` views.

In [13]:
p.A2[12,1], p.A1[12,1]

(true, false)
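
Working on the allele bits directly makes per-marker summaries cheap, since they reduce to bitwise operations and counts. The sketch below uses two made-up `BitVector`s standing in for one marker's A2 and A1 bits across five samples. How such vectors are best sliced out of `p.A2` and `p.A1` is not shown here and would be an assumption. Genotype classes follow the mapping given earlier.

```julia
# Toy allele bits for one marker across five samples (made-up data).
a2 = BitVector([false, false, true,  true, true])
a1 = BitVector([false, true,  false, true, true])

# Count each genotype class with element-wise bit operations.
n_hom_a1  = count(.!a2 .& .!a1)   # (false, false)
n_missing = count(.!a2 .& a1)     # (false, true)
n_het     = count(a2 .& .!a1)     # (true, false)
n_hom_a2  = count(a2 .& a1)       # (true, true)

# Frequency of allele a2 among the non-missing genotypes.
n_called = length(a2) - n_missing
freq_a2  = (2 * n_hom_a2 + n_het) / (2 * n_called)
```

For this toy data there is one missing genotype, so four samples are called and `freq_a2` is `5 / 8 = 0.625`.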

The index of a marker in the Plink file can be looked up by its id using `marker_index`, which performs a linear search.

In [9]:
mi = marker_index(p, "rs003")

3

In [10]:
markers(p)[mi]

Marker("1", "rs003", 2.2, 123457, "2", "1")
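
Because the lookup is a linear search, many repeated lookups may be faster against a dictionary built once. The sketch below contrasts the two approaches on a plain vector of ids taken from the output above. `linear_index` and `index_of` are hypothetical names, not part of Plink.jl. With the package, the ids would presumably come from the `id` field of each `Marker`, which is an assumption about field access.

```julia
# Marker ids in file order (copied from the example output above).
ids = ["rs001", "rs002", "rs003", "rs004"]

# Linear search: scans the ids on every call, O(n) per lookup.
# Returns `nothing` when the id is absent.
linear_index(ids, id) = findfirst(==(id), ids)

# Build a Dict once, then each lookup takes expected constant time.
index_of = Dict(id => i for (i, id) in enumerate(ids))

i_linear = linear_index(ids, "rs003")
i_dict   = index_of["rs003"]
```

Both lookups give index 3 for `"rs003"`. The dictionary costs extra memory and only pays off when the same set of markers is queried many times.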

The index of a sample in the Plink file can be looked up by its family id and individual id using `sample_index`, which also performs a linear search.

In [11]:
si = sample_index(p, "12", "1")

12

In [12]:
samples(p)[si]

Sample("12", "1", "0", "0", 1, 2)