# Example workflow

We'll work with pylibseq's wrapper around libsequence's SimData, a class for biallelic data encoded as 0 = ancestral and 1 = derived.

In [1]:
from __future__ import print_function
from libsequence.polytable import SimData

## Assigning to an object

You may assign from a list of tuples.

Each tuple is a site: (position, genotypes), where character *i* of the genotype string is the state of haplotype *i* at that site.

Here, there are 2 sites and a sample size of $n=4$.

In [2]:
rawData1 = [(0.1,'0101'),(0.2,'1010')]

In [3]:
sd = SimData()

In [4]:
sd.assign(rawData1)
sd.numsites()

2

In [5]:
sd.size()

4

In [6]:
sd.pos()

[0.1, 0.2]

In [7]:
sd.data()

['01', '10', '01', '10']
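The haplotype strings returned by `sd.data()` are the genotype columns of `rawData1` read across. As a minimal pure-Python sketch of that transposition (the helper `sites_to_haplotypes` is hypothetical, not part of pylibseq):

```python
def sites_to_haplotypes(sites):
    """Split [(pos, genotypes), ...] into a list of positions and a list of
    haplotype strings. Character i of each genotype string is haplotype i."""
    positions = [pos for pos, _ in sites]
    columns = [genotypes for _, genotypes in sites]
    # zip(*columns) yields the i-th character of every column, i.e. haplotype i
    haplotypes = ["".join(chars) for chars in zip(*columns)]
    return positions, haplotypes


positions, haplotypes = sites_to_haplotypes([(0.1, "0101"), (0.2, "1010")])
print(positions)   # [0.1, 0.2]
print(haplotypes)  # ['01', '10', '01', '10']
```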

Alternatively, you can assign from separate lists of positions and haplotypes:

In [9]:
rawDataPos = [0.1,0.2]
rawDataGenos = ['01','10','01','10']
sd.assign(rawDataPos,rawDataGenos)

In [10]:
sd.numsites()

2

In [11]:
sd.size()

4

In [12]:
sd.pos()

[0.1, 0.2]

In [13]:
sd.data()

['01', '10', '01', '10']
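Going the other way, transposing the haplotype strings recovers the per-site tuples, which is why both forms of assign() describe the same data. Another plain-Python sketch; the helper name `haplotypes_to_sites` is hypothetical:

```python
def haplotypes_to_sites(positions, haplotypes):
    """Combine parallel lists of positions and haplotype strings into
    [(pos, genotypes), ...], one tuple per site."""
    for h in haplotypes:
        if len(h) != len(positions):
            raise ValueError("each haplotype needs one character per position")
    # zip(*haplotypes) yields the j-th character of every haplotype, i.e. site j
    columns = ["".join(chars) for chars in zip(*haplotypes)]
    return list(zip(positions, columns))


print(haplotypes_to_sites([0.1, 0.2], ["01", "10", "01", "10"]))
# [(0.1, '0101'), (0.2, '1010')]
```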

## Summary statistics
Let's calculate some basic summary statistics for a larger sample simulated with ms:

In [15]:
from libsequence.summstats import PolySIM
# Simulated with: ms 10 1 -s 10 -I 2 5 5 0.05 (n = 10, 10 segregating sites, 2 demes of 5, 4Nm = 0.05)
rawDataPos=[0.0997, 0.2551, 0.3600, 0.4831, 0.5205, 0.5668, 0.5824, 0.6213, 0.7499, 0.9669]
rawDataGenos=['0000001010',
              '0000000011',
              '0000001010',
              '0000001010',
              '0000001010',
              '1111010100',
              '1111010100',
              '1111110100',
              '1111010100',
              '1111010100']
sd.assign(rawDataPos,rawDataGenos)

In [16]:
ps = PolySIM(sd)

In [17]:
ps.thetapi()

4.822222222222222

In [18]:
ps.thetaw()

3.5348576237901534
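Both estimators can be recomputed by hand as a sanity check. This sketch uses the textbook definitions rather than libsequence's code: $\hat\theta_\pi = \sum_j k_j(n-k_j)/\binom{n}{2}$, where $k_j$ is the number of derived alleles at site $j$, and Watterson's $\hat\theta_W = S/a_1$ with $a_1 = \sum_{i=1}^{n-1} 1/i$. The function names are our own.

```python
# Haplotypes from the ms sample above (1 = derived allele)
haplotypes = ['0000001010', '0000000011', '0000001010', '0000001010', '0000001010',
              '1111010100', '1111010100', '1111110100', '1111010100', '1111010100']


def derived_counts(haplotypes):
    """Number of haplotypes carrying the derived allele at each site."""
    return [sum(h[j] == "1" for h in haplotypes) for j in range(len(haplotypes[0]))]


def theta_pi(haplotypes):
    """Mean number of pairwise differences: sum_j k_j (n - k_j) / C(n, 2)."""
    n = len(haplotypes)
    return sum(k * (n - k) for k in derived_counts(haplotypes)) / (n * (n - 1) / 2)


def theta_w(haplotypes):
    """Watterson's estimator: S / a1, with a1 = sum_{i=1}^{n-1} 1/i."""
    n = len(haplotypes)
    S = sum(1 for k in derived_counts(haplotypes) if 0 < k < n)  # segregating sites
    a1 = sum(1.0 / i for i in range(1, n))
    return S / a1


print(theta_pi(haplotypes))  # ≈ 4.8222 (= 217/45)
print(theta_w(haplotypes))   # ≈ 3.5349
```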

In [19]:
ps.tajimasd()

1.6142469967484658
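Tajima's D compares $\hat\theta_\pi$ with $\hat\theta_W$, scaled by the standard error of their difference. A sketch of Tajima's (1989) formula with its usual constants, fed with the $\hat\theta_\pi$, $S = 10$, and $n = 10$ values from above (the function name is our own, not pylibseq's):

```python
from math import sqrt


def tajimas_d(theta_pi, S, n):
    """Tajima's D from theta_pi, the number of segregating sites S, and the
    sample size n, using the standard a1, a2, b1, b2, c1, c2, e1, e2 constants."""
    if S == 0:
        raise ValueError("Tajima's D is undefined when S == 0")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    # Numerator: theta_pi minus Watterson's theta_W = S / a1
    return (theta_pi - S / a1) / sqrt(e1 * S + e2 * S * (S - 1))


print(tajimas_d(4.822222222222222, S=10, n=10))  # ≈ 1.6142
```

A positive D here reflects an excess of intermediate-frequency variants, as expected when samples come from two weakly connected demes.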

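## Sliding windows

Windows splits a SimData object into overlapping windows of width window_size, starting at starting_pos and advancing by step_len up to ending_pos. Each window is itself a SimData object, so the summary statistics above can be computed window by window.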

In [20]:
from libsequence.windows import Windows

In [21]:
w = Windows(sd,window_size=0.1,step_len=0.05,starting_pos=0.,ending_pos=1.0)

In [22]:
len(w)

20
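Where does 20 come from? Assuming windows start at `starting_pos` and advance by `step_len` for as long as the start lies below `ending_pos` (our reading of the parameters, not checked against the libsequence source), the starts fall at 0, 0.05, ..., 0.95. A plain-Python sketch with a hypothetical `window_bounds` helper:

```python
import math


def window_bounds(starting_pos, ending_pos, window_size, step_len):
    """Return (start, end) pairs for overlapping sliding windows.

    Assumed semantics: windows start at starting_pos, starting_pos + step_len,
    ... for every start below ending_pos; the last windows may run past
    ending_pos.
    """
    # Count the starts up front; the small epsilon absorbs float round-off
    # in the division so an exact multiple does not add a spurious window
    n_windows = math.ceil((ending_pos - starting_pos) / step_len - 1e-9)
    # Build each start from an integer index to avoid accumulating error
    starts = [starting_pos + i * step_len for i in range(n_windows)]
    return [(start, start + window_size) for start in starts]


bounds = window_bounds(0.0, 1.0, window_size=0.1, step_len=0.05)
print(len(bounds))  # 20
```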

In [23]:
for i in range(len(w)):
    # Each window is itself a SimData object
    wi = w[i]
    pswi = PolySIM(wi)
    print(pswi.thetaw())

0.3534857623790153
0.3534857623790153
0.0
0.0
0.3534857623790153
0.3534857623790153
0.3534857623790153
0.3534857623790153
0.3534857623790153
0.7069715247580306
1.060457287137046
1.060457287137046
0.3534857623790153
0.3534857623790153
0.3534857623790153
0.0
0.0
0.0
0.3534857623790153
0.3534857623790153
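Since every site in this sample is segregating, each printed value is just (number of positions in the window) / $a_1$: one site gives 0.3535, two give 0.7070, three give 1.0605. The sketch below reproduces the counts; it assumes half-open windows [start, end), which does not matter here because no position falls on a window boundary.

```python
# Positions from the ms sample above; every one of these sites is segregating
positions = [0.0997, 0.2551, 0.3600, 0.4831, 0.5205,
             0.5668, 0.5824, 0.6213, 0.7499, 0.9669]
n = 10                                   # sample size
a1 = sum(1.0 / i for i in range(1, n))   # Watterson's constant
window_size, step_len, n_windows = 0.1, 0.05, 20

sites_per_window = []
for i in range(n_windows):
    start = i * step_len
    end = start + window_size
    # Assumed half-open window [start, end)
    sites_per_window.append(sum(1 for p in positions if start <= p < end))

theta_w_per_window = [s / a1 for s in sites_per_window]
print(sites_per_window)
# [1, 1, 0, 0, 1, 1, 1, 1, 1, 2, 3, 3, 1, 1, 1, 0, 0, 0, 1, 1]
```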


## $F_{ST}$

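Our simulated data come from two demes of $n/2 = 5$ haplotypes each (the -I 2 5 5 option to ms), with the first five haplotypes sampled from deme 0 and the last five from deme 1.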

Most flavors of $F_{ST}$ are very similar to one another. See Charlesworth, B. (1998) Mol. Biol. Evol. 15(5): 538-543 for a great overview.

In [25]:
from libsequence.fst import Fst
# sd.size() is 10: the first 5 haplotypes form deme 0, the last 5 form deme 1
f = Fst(sd,[5,5])

In [26]:
# Hudson, Slatkin, and Maddison's (1992) FST:
f.hsm()

0.9268292682926829
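Hudson, Slatkin, and Maddison define $F_{ST} = 1 - H_w/H_b$, where $H_w$ is the mean number of differences between two haplotypes from the same deme and $H_b$ between haplotypes from different demes. A pure-Python sketch of that definition (not libsequence's code; with equal deme sizes we simply average the two within-deme values) reproduces the number above:

```python
from itertools import combinations, product

haplotypes = ['0000001010', '0000000011', '0000001010', '0000001010', '0000001010',
              '1111010100', '1111010100', '1111110100', '1111010100', '1111010100']
deme0, deme1 = haplotypes[:5], haplotypes[5:]   # config [5, 5]


def n_diffs(a, b):
    """Number of sites at which two haplotypes differ."""
    return sum(x != y for x, y in zip(a, b))


def mean_within(deme):
    """Mean pairwise differences among distinct haplotypes of one deme."""
    pairs = list(combinations(deme, 2))
    return sum(n_diffs(a, b) for a, b in pairs) / len(pairs)


# H_w: within-deme diversity, equal weights because the demes are equal-sized
pi_within = (mean_within(deme0) + mean_within(deme1)) / 2
# H_b: mean differences between one haplotype from each deme
pi_between = sum(n_diffs(a, b) for a, b in product(deme0, deme1)) / (len(deme0) * len(deme1))

fst_hsm = 1 - pi_within / pi_between
print(pi_within, pi_between)  # ≈ 0.6, 8.2
print(fst_hsm)                # ≈ 0.9268 (= 38/41)
```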

In [27]:
# Slatkin's FST:
f.slatkin()

0.8636363636363636

In [28]:
# Hudson, Boos, and Kaplan's FST, which is also Nei's Gst:
f.hbk()

0.8636363636363635
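Why do slatkin() and hbk() agree to 15 digits? For these data the mean number of pairwise differences is 0.6 within demes and 8.2 between them. Both printed values equal $(\pi_B - \pi_W)/(\pi_B + \pi_W) = 7.6/8.8$, which is algebraically the same as a $G_{ST}$-style $1 - \pi_W/\pi_T$ if total diversity is taken as $\pi_T = (\pi_W + \pi_B)/2$. This is our reconstruction from the numbers, not a statement of how libsequence implements either method; the sketch only checks the algebra:

```python
pi_w, pi_b = 0.6, 8.2  # mean within- and between-deme pairwise differences

# Ratio of (between - within) to (between + within)
slatkin_style = (pi_b - pi_w) / (pi_b + pi_w)

# G_ST-style ratio, assuming total diversity is the mean of within and between
pi_t = (pi_w + pi_b) / 2
gst_style = 1 - pi_w / pi_t

print(slatkin_style, gst_style)  # both ≈ 0.8636 (= 19/22)
```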

In [29]:
# Positions of SNPs polymorphic in both demes 0 and 1:
f.shared(0,1)

set()

In [30]:
# Positions of mutations private to deme 0 and to deme 1, respectively:
f.priv(0,1)

({0.5824, 0.9669}, {0.5205})

In [31]:
# Positions of fixed differences between demes 0 and 1:
f.fixed(0,1)

{0.0997, 0.2551, 0.36, 0.4831, 0.5668, 0.6213, 0.7499}
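The three site classes can be recovered by checking, site by site, whether each deme is polymorphic. This is a sketch under our own assumptions, not libsequence's implementation: "shared" means polymorphic in both demes, "private" means polymorphic in exactly one, and "fixed" means each deme is monomorphic for a different allele. The helper names `split_demes` and `classify_sites` are hypothetical.

```python
positions = [0.0997, 0.2551, 0.3600, 0.4831, 0.5205,
             0.5668, 0.5824, 0.6213, 0.7499, 0.9669]
haplotypes = ['0000001010', '0000000011', '0000001010', '0000001010', '0000001010',
              '1111010100', '1111010100', '1111110100', '1111010100', '1111010100']


def split_demes(haplotypes, sizes):
    """Slice consecutive haplotypes into demes of the given sizes."""
    demes, start = [], 0
    for size in sizes:
        demes.append(haplotypes[start:start + size])
        start += size
    return demes


def classify_sites(positions, demes):
    """Return (shared, private0, private1, fixed) position sets for two demes."""
    d0, d1 = demes
    shared, priv0, priv1, fixed = set(), set(), set(), set()
    for j, pos in enumerate(positions):
        alleles0 = {h[j] for h in d0}
        alleles1 = {h[j] for h in d1}
        poly0, poly1 = len(alleles0) > 1, len(alleles1) > 1
        if poly0 and poly1:
            shared.add(pos)
        elif poly0:
            priv0.add(pos)
        elif poly1:
            priv1.add(pos)
        elif alleles0 != alleles1:
            fixed.add(pos)  # each deme monomorphic, but for different alleles
    return shared, priv0, priv1, fixed


shared, priv0, priv1, fixed = classify_sites(positions, split_demes(haplotypes, [5, 5]))
print(shared)        # set()
print(priv0, priv1)  # {0.5824, 0.9669} {0.5205}
print(sorted(fixed)) # [0.0997, 0.2551, 0.36, 0.4831, 0.5668, 0.6213, 0.7499]
```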