### Design

We want to fit the window design for two-site statistics into the `tsk_treeseq_sample_count_stat` framework while supporting pairs of windows instead of the usual 1-d windows. Users will not have to specify two ranges of windows explicitly: the second range will be optional in the Python API. In the C API, however, the two-site statistics code will always parse pairs of windows, for simplicity. Our window validation code will run on the two-site code path, and the original window validation code will continue to run for single-site statistics.

Unfortunately, I can't think of a way to keep the C API exactly the same. We'll either have to turn `num_windows` into a length-2 array or add a second window-count parameter, perhaps `num_windows_2` or `num_windows_total`.

Ultimately, we would like to be able to compute the LD between pairs of windows. This will allow users to compute subsets of the whole correlation matrix. Depending on the windows specified, we will obtain either an n x n matrix or an n x m matrix.

Our validation for windows will require:

1. Windows are sorted (left-hand and right-hand windows, by position).
1. There is at least one window.

The `tsk_treeseq_sample_count_stat` code accepts an array of doubles for window boundaries, so we will store both ranges in a single array that is passed into the two-site statistics code and split into two vectors.

For example, `[1, 3, 6, 1, 2, 4]` (with three breakpoints in each range) will become `[[1, 3), [3, 6)]` and `[[1, 2), [2, 4)]`.

In the two-locus code, we'll make comparisons between `[1, 3) x [1, 2)`, `[1, 3) x [2, 4)`, `[3, 6) x [1, 2)`, and `[3, 6) x [2, 4)`.
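
The unpacking and pairing described above can be sketched in a few lines of Python. The helper name `split_windows` is hypothetical and not part of tskit; it assumes `num_windows` holds the number of breakpoints (not intervals) in each range.

```python
from itertools import product


def split_windows(windows, num_windows):
    """Split a flat breakpoint array into two lists of half-open intervals.

    Hypothetical helper: num_windows is (left_len, right_len), the number
    of breakpoints in each range.
    """
    left_len, right_len = num_windows
    left = windows[:left_len]
    right = windows[left_len:left_len + right_len]
    # Consecutive breakpoints form the intervals [start, stop)
    left_iv = list(zip(left[:-1], left[1:]))
    right_iv = list(zip(right[:-1], right[1:]))
    return left_iv, right_iv


left_iv, right_iv = split_windows([1, 3, 6, 1, 2, 4], (3, 3))
# Every left window is compared against every right window
pairs = list(product(left_iv, right_iv))
print(pairs)
```

With n left windows and m right windows this produces n x m pairs, one per cell of the output matrix.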

In the Python API, if only one list of windows is specified, we'll make pairwise comparisons between all of the windows in that list.
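
One way the Python layer might build the flat array is sketched below. `pack_windows` is a hypothetical name, and reusing the left breakpoints when `right` is omitted is an assumption about how the default would be implemented.

```python
def pack_windows(left, right=None):
    """Flatten one or two breakpoint lists into (windows, num_windows).

    Hypothetical helper: when right is omitted, the left breakpoints are
    reused so that every window is compared with every window.
    """
    if right is None:
        right = left
    windows = list(left) + list(right)
    return windows, (len(left), len(right))


print(pack_windows([1, 3, 6]))
print(pack_windows([1, 3, 6], [1, 2, 4]))
```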

What follows is an example implementation of windows in this style. If we think it's a good direction, I can add it to the two-locus C code.

In [46]:
import numpy as np

In [85]:
def print_windows(left, right):
    print('left: ', [(left[i], left[i + 1]) for i in range(len(left) - 1)])
    print('right: ', [(right[i], right[i + 1]) for i in range(len(right) - 1)])

def check_windows(windows, num_windows, ts_length, print_=True):
    left_len, right_len = num_windows
    left_windows = windows[:left_len]
    right_windows = windows[left_len:]

    if print_:
        print_windows(left_windows, right_windows)

    for i in range(left_len - 1):
        if (left_windows[i] >= left_windows[i + 1]):
            raise ValueError(f'Bad window (left) [{left_windows[i]}, {left_windows[i + 1]})')

    for i in range(right_len - 1):
        if (right_windows[i] >= right_windows[i + 1]):
            raise ValueError(f'Bad window (right) {right_windows[i]}, {right_windows[i + 1]}')

    if left_windows[left_len - 1] > ts_length:
        raise ValueError('Left windows out of bounds')

    if right_windows[right_len - 1] > ts_length:
        raise ValueError('Right windows out of bounds')

In [86]:
ts_length = 10
windows = [1, 6, 3, 1, 2, 4]
check_windows(windows, (3, 3), ts_length)

left:  [(1, 6), (6, 3)]
right:  [(1, 2), (2, 4)]


ValueError: Bad window (left) [6, 3)

In [87]:
ts_length = 10
windows = [1, 3, 6, 1, 4, 2]
check_windows(windows, (3, 3), ts_length)

left:  [(1, 3), (3, 6)]
right:  [(1, 4), (4, 2)]


ValueError: Bad window (right) 4, 2

In [88]:
ts_length = 10
windows = [1, 3, 6, 1, 4]
check_windows(windows, (3, 2), ts_length)

left:  [(1, 3), (3, 6)]
right:  [(1, 4)]


In [89]:
ts_length = 10
windows = [1, 3, 6, 1, 2, 4]
check_windows(windows, (3, 3), 10)

left:  [(1, 3), (3, 6)]
right:  [(1, 2), (2, 4)]
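
Note that `check_windows` as written does not enforce the "at least one window" requirement: a range with a single breakpoint passes silently. A possible stricter variant is sketched here. The function names are hypothetical, and the checks that windows start at or after position 0 and that `num_windows` matches the array length are assumptions beyond the two requirements listed earlier.

```python
def check_window_list(breakpoints, ts_length, label):
    """Validate one list of window breakpoints (hypothetical helper)."""
    if len(breakpoints) < 2:
        raise ValueError(f'{label}: need at least one window (two breakpoints)')
    # Assumption: positions are non-negative
    if breakpoints[0] < 0:
        raise ValueError(f'{label}: windows start before position 0')
    # Breakpoints must be strictly increasing
    for lo, hi in zip(breakpoints[:-1], breakpoints[1:]):
        if lo >= hi:
            raise ValueError(f'{label}: bad window [{lo}, {hi})')
    if breakpoints[-1] > ts_length:
        raise ValueError(f'{label}: windows out of bounds')


def check_windows_strict(windows, num_windows, ts_length):
    left_len, right_len = num_windows
    # Assumption: the two counts must account for the whole array
    if left_len + right_len != len(windows):
        raise ValueError('num_windows does not match the length of windows')
    check_window_list(windows[:left_len], ts_length, 'left')
    check_window_list(windows[left_len:], ts_length, 'right')


check_windows_strict([1, 3, 6, 1, 2, 4], (3, 3), 10)  # passes silently
```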


### Application

Now, let's demonstrate how this window format will be used when making comparisons between pairs of sites.

We will seek to the first window and then process sites until we reach the end of the final window.

In [90]:
sites = [0, .2, .5, 1, 1.5, 1.8, 2, 2.3, 2.8, 3, 3.3, 3.5, 4, 4.6, 4.8, 5, 6, 8, 9, 9.9]

In [152]:
def get_site_ranges(windows, num_windows, sites):
    ranges = np.zeros(num_windows, np.uint64)
    win = 0
    s = 0
    while True:
        start = windows[win]
        stop = windows[win + 1]
        # seek to start
        while sites[s] < start:
            s += 1
        ranges[win] = s
        # seek within range
        while sites[s + 1] < stop:  # TODO: bounds checking?
            s += 1
        ranges[win + 1] = s
        win += 1
        if win == num_windows - 1:
            break
    return ranges
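
Two caveats about `get_site_ranges` as written. First, the final entry of `ranges` is the index of the last site inside the last window, whereas the interior entries are the index of the first site of the next window, so iterating over the half-open `range(ranges[i], ranges[i + 1])` skips the last site of the final window. Second, `sites[s + 1]` can run past the end of `sites` when a window extends beyond the last site. One possible alternative, assuming `sites` is sorted, is `np.searchsorted`, which returns for each breakpoint the index of the first site at or after it. The function name below is hypothetical.

```python
import numpy as np


def get_site_ranges_searchsorted(breakpoints, sites):
    """Return, for each breakpoint, the index of the first site whose
    position is >= that breakpoint (assumes sites is sorted)."""
    return np.searchsorted(np.asarray(sites, dtype=float), breakpoints, side='left')


sites = [0, .2, .5, 1, 1.5, 1.8, 2, 2.3, 2.8, 3, 3.3, 3.5, 4, 4.6, 4.8, 5, 6, 8, 9, 9.9]
left_range = get_site_ranges_searchsorted([1, 3, 6], sites)
right_range = get_site_ranges_searchsorted([1, 2, 4], sites)
print(left_range, right_range)
```

With `side='left'`, a site lying exactly on a breakpoint belongs to the window that starts there, which is consistent with half-open `[start, stop)` windows, and every entry can be used directly as an exclusive end index.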

In [167]:
def compare_sites(windows, num_windows, sites, num_sites):
    result = np.zeros((num_sites, num_sites), np.uint64)
    left_len, right_len = num_windows
    left_windows = windows[:left_len]
    right_windows = windows[left_len:]

    left_range = get_site_ranges(left_windows, left_len, sites)
    right_range = get_site_ranges(right_windows, right_len, sites)
    
    print('left_range:', left_range)
    print('right_range:', right_range)

    for w_l in range(left_len - 1):
        for w_r in range(right_len - 1):
            for site_l_idx in range(left_range[w_l], left_range[w_l + 1]):
                for site_r_idx in range(right_range[w_r], right_range[w_r + 1]):
                    result[site_l_idx, site_r_idx] += 1
                    print(w_l, w_r, site_l_idx, site_r_idx, sites[site_l_idx], sites[site_r_idx])
    return result

In [168]:
ts_length = 10
windows = [1, 3, 6, 1, 2, 4]
check_windows(windows, (3, 3), 10)
compare_sites(windows, (3, 3), sites, len(sites))

left:  [(1, 3), (3, 6)]
right:  [(1, 2), (2, 4)]
left_range: [ 3  9 15]
right_range: [ 3  6 11]
0 0 3 3 1 1
0 0 3 4 1 1.5
0 0 3 5 1 1.8
0 0 4 3 1.5 1
0 0 4 4 1.5 1.5
0 0 4 5 1.5 1.8
0 0 5 3 1.8 1
0 0 5 4 1.8 1.5
0 0 5 5 1.8 1.8
0 0 6 3 2 1
0 0 6 4 2 1.5
0 0 6 5 2 1.8
0 0 7 3 2.3 1
0 0 7 4 2.3 1.5
0 0 7 5 2.3 1.8
0 0 8 3 2.8 1
0 0 8 4 2.8 1.5
0 0 8 5 2.8 1.8
0 1 3 6 1 2
0 1 3 7 1 2.3
0 1 3 8 1 2.8
0 1 3 9 1 3
0 1 3 10 1 3.3
0 1 4 6 1.5 2
0 1 4 7 1.5 2.3
0 1 4 8 1.5 2.8
0 1 4 9 1.5 3
0 1 4 10 1.5 3.3
0 1 5 6 1.8 2
0 1 5 7 1.8 2.3
0 1 5 8 1.8 2.8
0 1 5 9 1.8 3
0 1 5 10 1.8 3.3
0 1 6 6 2 2
0 1 6 7 2 2.3
0 1 6 8 2 2.8
0 1 6 9 2 3
0 1 6 10 2 3.3
0 1 7 6 2.3 2
0 1 7 7 2.3 2.3
0 1 7 8 2.3 2.8
0 1 7 9 2.3 3
0 1 7 10 2.3 3.3
0 1 8 6 2.8 2
0 1 8 7 2.8 2.3
0 1 8 8 2.8 2.8
0 1 8 9 2.8 3
0 1 8 10 2.8 3.3
1 0 9 3 3 1
1 0 9 4 3 1.5
1 0 9 5 3 1.8
1 0 10 3 3.3 1
1 0 10 4 3.3 1.5
1 0 10 5 3.3 1.8
1 0 11 3 3.5 1
1 0 11 4 3.5 1.5
1 0 11 5 3.5 1.8
1 0 12 3 4 1
1 0 12 4 4 1.5
1 0 12 5 4 1.8
1 0 13 3 4.6 1
1 

array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 1, 1, 1, 1, 1, 1,