# Week 6 Wednesday afternoon exercise on local ancestry inference (LAI) with hapla
## First up, we are just going to do a quick check to see if hapla works. The code in this exercise is a mix of (poorly written) Python, R and shell scripts, so keep an eye on the dropdown menu in the top right corner of the code cells if you get lost in the syntax. There will be questions throughout the exercise marked with bullet points. Please discuss these with each other, and ask if you have any questions.

In [None]:
which hapla
hapla -h

## For this exercise we will simulate some data for you to analyse, and you will run the simulation yourself with the following code (it should take ~2.5 min). We are simulating 100 megabases, which is roughly a medium-sized human chromosome.
## You don't necessarily need to understand the details of the code, just that it gives us some phased bcf files for samples simulated under a specific demographic model, along with some files containing the true ancestries from the simulation so that we can check how well our inference went.

In [None]:
import numpy as np
import time
import subprocess
import pandas as pd

import demes
import msprime
import tspop


t0 = time.time()

sequence_length = 1e8
seed = 1337
rho = 1.3e-8 ## Recombination rate
mu = 1.25e-8  ## Mutation rate

## Make the Demography object.
graph = demes.load("/davidData/users/thomas/hmmadvbinf/aa.yaml")
demography = msprime.Demography.from_demes(graph)
demography.add_census(time=13)
demography.sort_events()


## Simulate.
ts = msprime.sim_ancestry(
    samples={"AFR": 100, "EUR": 100, "EAS": 100 ,"ADMIX" : 100},
    demography=demography,
    random_seed=seed,
    sequence_length=sequence_length,
    recombination_rate=rho
)

## Simulate mutations on the tree sequence.
mutated_ts = msprime.sim_mutations(ts, rate=mu, random_seed=seed)

## Define the output VCF and BCF file paths.
vcf_file_path = "simAA.vcf"
bcf_file_path = "simAA_maf.bcf"

## Save the mutated tree sequence to a VCF file.
with open(vcf_file_path, "w") as vcf_file:
    mutated_ts.write_vcf(vcf_file, individual_names=[f"{pop}_{i}" for pop in ["AFR", "EUR", "EAS", "ADMIX"] for i in range(1, 101)])

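## Convert VCF to BCF and annotate each site with its minor allele frequency (MAF)
subprocess.run(["bcftools", "+fill-tags", vcf_file_path, "-O", "b", "-o", "simAA.bcf", "--", "-t", "MAF"], check=True)

## Keep only sites with MAF > 0.05
subprocess.run(["bcftools", "view", "simAA.bcf", "-O", "b", "-o", bcf_file_path, "-i", "INFO/MAF > 0.05"], check=True)

## Remove the intermediate VCF
subprocess.run(["rm", vcf_file_path], check=True)

## Index both BCF files (-f overwrites an existing index if the cell is rerun)
subprocess.run(["bcftools", "index", "-f", "simAA.bcf"], check=True)
subprocess.run(["bcftools", "index", "-f", bcf_file_path], check=True)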


pa = tspop.get_pop_ancestry(ts, census_time=13)
## print(pa)

st = pa.squashed_table
## print(st)

## Remap numbering of populations to order of sampled pops
remap = {2: 0, 4: 1, 5: 2}
st = st.replace({"population": remap})

st_path = "simAA_true_anc.csv"
st.to_csv(st_path, sep="\t", index=False)

st["tract_len"] = st["right"] - st["left"]

st['sample'] = st['sample'] // 2 ## switch to individual numbering, instead of haplotype
st_agg = st[['sample', 'population', 'tract_len']].groupby(['sample', 'population'], as_index=False).sum()
st_pivoted = st_agg.pivot(index='sample', columns='population', values='tract_len').fillna(0)
Q = st_pivoted.div(st_pivoted.sum(axis=1), axis=0).to_numpy()
np.savetxt("simAA_trueQ.csv", Q, delimiter="\t", fmt='%.4f')


t1 = time.time()
print(f"Seconds elapsed: {t1-t0}")
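
## The last part of the simulation cell turns the true ancestry tracts into a true Q matrix: for each individual, the tract lengths are summed per ancestry and divided by the total length. Here is a minimal sketch of that same calculation on a made-up tract table (the `tracts` values and `n_pops` are invented for illustration, they are not output from the simulation):

```python
from collections import defaultdict

# Made-up tracts: (haplotype index, left, right, population).
# Two haplotypes per individual, so haplotypes 0-1 are individual 0, 2-3 are individual 1.
tracts = [
    (0, 0, 60, 0), (0, 60, 100, 1),
    (1, 0, 100, 0),
    (2, 0, 25, 2), (2, 25, 100, 1),
    (3, 0, 100, 1),
]
n_pops = 3

# Sum tract lengths per individual and ancestry.
length_by_ind = defaultdict(lambda: [0.0] * n_pops)
for hap, left, right, pop in tracts:
    ind = hap // 2  # switch from haplotype to individual numbering
    length_by_ind[ind][pop] += right - left

# Normalise each individual's lengths so they sum to 1.
true_q = {
    ind: [length / sum(lengths) for length in lengths]
    for ind, lengths in length_by_ind.items()
}

for ind in sorted(true_q):
    print(ind, [round(x, 4) for x in true_q[ind]])
```

## Each row sums to 1, just like the rows of simAA_trueQ.csv.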

## Let's just have a quick look at the data we generated for good measure:

In [None]:
bcftools view simAA.bcf | head -n20
bcftools view simAA.bcf -h | wc -l

* ## Do you recognize the format?
* ## What do the different fields in the header line mean?
* ## Do you remember what MAF is short for?
* ## Can you see how many individuals we are working with, either from the cell output above or the simulation script?
* ## How many sites?

## Simulating our dataset allows us to know the true underlying ancestry, which we could never know completely with real data. The scenario our samples were generated under is a commonly used one that approximates the part of human demography behind modern American admixture between source populations from Africa, Europe and East Asia (Browning et al. 2018). It can be visualized with the following snippet:

In [None]:
import msprime
import demes
import demesdraw
import matplotlib.pyplot as plt


## Make the Demography object.
graph = demes.load("/davidData/users/thomas/hmmadvbinf/aa.yaml")

fig, ax = plt.subplots(figsize=(10, 8))
demesdraw.tubes(graph, ax=ax)
ax.set_yscale('symlog')
ax.set_xlabel("Population Size")
ax.set_ylabel("Time (log scale)")
plt.show()

## Here we first see an ancient human population (also in Africa) followed by anatomically modern humans, then a split out of Africa with a small population size (a bottleneck), and a subsequent split into European and East Asian populations. There is continuous ongoing migration between the source populations and an increase in population sizes through the generations (the Y-axis is measured in generations). Lastly, we have an admixture event in recent history forming a modern American population, and it is in the samples from this population that we will try to infer ancestry.
## The .yaml file from which we load the demographic scenario looks like this:

In [None]:
cat /davidData/users/thomas/hmmadvbinf/aa.yaml

* ## From the above, can you tell how much ancestry on average we would expect in a randomly sampled individual in the admixed population? (look for "proportions:")
## We will be working with just a single simulated contiguous chromosome, but of course in the real world we would run this for all available chromosomes. The first thing we need to do is run the clustering module, which will turn our perfectly phased bcf file into cluster assignments for each individual over windows of 8 SNPs:

In [None]:
hapla cluster --bcf simAA.bcf --size 8 --threads 8 --out simAA_wdsize8

## Hopefully it runs for you now, and if it does you may see that we get a lot of warnings. The reason for these warnings is that a lot of our windows do not contain enough variation to generate more than a single cluster. We could simply lower the lambda value to get more clusters, but this might lead us to find "meaningful" variation where there is none. The proper way to fix this problem is to filter our data on MAF so that we remove any sites where one allele is only observed a couple of times, since these will not be very informative about systematic trends in drift of alleles on a population level. Luckily we already made a bcf file earlier that keeps only sites with MAF > 0.05, which we can use:

In [None]:
hapla cluster --bcf simAA_maf.bcf --size 8 --threads 8 --out simAA_maf_wdsize8
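
## To make the MAF filter concrete, here is a small sketch of how the minor allele frequency of a site can be computed from 0/1 alleles and compared to the 0.05 cutoff. The function name `minor_allele_frequency` and the three toy sites are made up for illustration (800 haplotypes, matching our 400 diploid samples); this is not how bcftools does it internally.

```python
def minor_allele_frequency(alleles):
    """Return the frequency of the rarer allele at a biallelic site.

    alleles: list of 0/1 alleles, one per haplotype.
    """
    alt_freq = sum(alleles) / len(alleles)
    return min(alt_freq, 1 - alt_freq)


# Made-up sites, each with 800 haplotypes.
sites = {
    "rare": [1] * 3 + [0] * 797,            # derived allele seen 3 times
    "common": [1] * 300 + [0] * 500,        # both alleles are common
    "nearly_fixed": [1] * 790 + [0] * 10,   # ancestral allele seen 10 times
}

threshold = 0.05
kept = [
    name for name, alleles in sites.items()
    if minor_allele_frequency(alleles) > threshold
]

for name, alleles in sites.items():
    print(name, round(minor_allele_frequency(alleles), 4))
print("kept:", kept)
```

## Note that a site where the derived allele is almost fixed is filtered out as well, since it is the rarer of the two alleles that counts.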

## The next step will be to use these cluster assignments to infer admixture proportions (Q) and source population cluster frequencies (P, formerly F) for our data. But first let's have a look at the output we have generated so far. The .win file contains information about each window, the .ids file lists the samples by their ID/name in order of appearance, and the .bca file contains the bulk of our data, which is our cluster assignments in binary format:

In [None]:
ll -h

In [None]:
head simAA_maf_wdsize8.win

* ## Can you make sense of these fields?
* ## Why do the Lengths of each window differ?
## K is the number of clusters found for each window. Let's have a look at our distribution of Ks:

In [None]:
tail -n+2 simAA_maf_wdsize8.win | cut -f6 | sort -n -r | uniq -c

* ## What is the most common number of clusters across all windows?
## 24 clusters might sound like a lot, but with 8 SNPs in a window we will have 2<sup>8</sup> = 256 different possible haplotypes which puts this number into perspective.
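
## As a small illustration of the 2<sup>8</sup> argument, the sketch below counts how many distinct haplotypes show up in one 8-SNP window when all 800 haplotypes are copies of a handful of "founder" haplotypes. Everything here is made up (the founders, the random seed, the copying), and it only counts distinct patterns, it is not the clustering that hapla performs.

```python
import random
from collections import Counter

window_size = 8
n_possible = 2 ** window_size  # every SNP can be 0 or 1

random.seed(1)

# Made-up founders: in real data linked SNPs are correlated,
# so only a small part of the possible haplotypes is actually observed.
founders = [
    tuple(random.randint(0, 1) for _ in range(window_size))
    for _ in range(5)
]

# 800 haplotypes, each an exact copy of one founder.
haplotypes = [random.choice(founders) for _ in range(800)]
counts = Counter(haplotypes)

print("possible haplotypes:", n_possible)
print("distinct haplotypes observed:", len(counts))
for haplotype, count in counts.most_common():
    print("".join(map(str, haplotype)), count)
```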


## The next step is to run admixture inference on our clustered haplotypes. The order in which Ks are assigned is arbitrary, so I've set the seed to 6 because this makes our results match the order saved for the true ancestries, which makes the plotting a little smoother. Note that usually we would also need to run the admixture part multiple times to ensure convergence, but we will skip this part here. Don't tell Anders.

In [None]:
hapla admix --clusters simAA_maf_wdsize8 --K 3 --seed 6 --threads 8 --out simAA_maf_wdsize8

## And now let's have a look at the output, our new P and Q matrices:

In [None]:
head simAA_maf_wdsize8.K3.s6.P

In [None]:
wc -l simAA_maf_wdsize8.K3.s6.P

* ## Why do we have 3 columns in our P matrix?
* ## What is the number of rows and why do we have so many compared to the number of windows?
## If you can't figure out the previous question, try running this next line of code and work out what it does (the "| paste -sd+ | bc" part just sums up the numbers piped in).

In [None]:
cut -f6 simAA_maf_wdsize8.win | tail -n +2 | paste -sd+ | bc
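
## If the shell pipeline above looks cryptic, the same "take column 6, skip the header, add everything up" steps can be sketched in Python. The table below is a made-up stand-in for a .win file: the header names c1 to c5 and all the numbers are invented, only the position of the K column (the sixth) is taken from the exercise.

```python
import io

# Made-up, tab-separated stand-in for a .win file with a header line.
toy_win = io.StringIO(
    "c1\tc2\tc3\tc4\tc5\tK\n"
    "1\t100\t900\t800\t8\t5\n"
    "1\t950\t2100\t1150\t8\t3\n"
    "1\t2200\t2900\t700\t8\t7\n"
)

lines = toy_win.read().splitlines()

# cut -f6       -> take the sixth tab-separated field (index 5)
# tail -n +2    -> skip the header line
k_values = [int(line.split("\t")[5]) for line in lines[1:]]

# paste -sd+ | bc -> join the numbers with "+" and evaluate the sum
total = sum(k_values)
print(total)
```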

## Next, let's plot our inferred admixture proportions from Q to see if they match our expectations from the demographic model:

In [None]:
options(repr.plot.width = 26)

popinfo <- gsub("[^A-Z]", "", read.table("simAA_maf_wdsize8.ids", h=F)[, 1])
Q <- read.table("simAA_maf_wdsize8.K3.s6.Q", h=F)

N <- length(popinfo)
popNames <- unique(popinfo)
Npop <- length(popNames)
popIDint <- rep(1:Npop,table(popinfo)[popNames])
popSep <- c(0,cumsum(table(popIDint)))
meanPopX <- popSep[-1]-table(popIDint)/2

barplot(t(Q), col=c("#b783c9", "#3eb05d", "#826746"), space=0, border=NA, cex.axis=1.2,cex.lab=1.8,axisnames=FALSE,
             ylab=paste0("Admixture proportions\nfor K = ", ncol(Q)), xlab="", main="", cex.main=1.5,xpd=NA)
text(meanPopX, rep(-0.15,length(meanPopX)), unique(popinfo),xpd=NA,font=2,cex=2)
abline(v=popSep)
popinfo

* ## Do you see any immediate problems here?
## And since we simulated our data we can also cheat a little and check against the true admixture proportions:

In [None]:
popinfo <- gsub("[^A-Z]", "", read.table("simAA_maf_wdsize8.ids", h=F)[, 1])

Q <- read.table("simAA_trueQ.csv", h=F)[, c(2,1,3)]

N <- length(popinfo)
popNames <- unique(popinfo)
Npop <- length(popNames)
popIDint <- rep(1:Npop,table(popinfo)[popNames])
popSep <- c(0,cumsum(table(popIDint)))
meanPopX <- popSep[-1]-table(popIDint)/2

barplot(t(Q), col=c("#b783c9", "#3eb05d", "#826746"), space=0, border=NA, cex.axis=1.2,cex.lab=1.8,axisnames=FALSE,
             ylab=paste0("Admixture proportions\nfor K = ", ncol(Q)), xlab="", main="", cex.main=1.5,xpd=NA)
text(meanPopX, rep(-0.15,length(meanPopX)), unique(popinfo),xpd=NA,font=2,cex=2)
abline(v=popSep)

* ## What do you think of our inferred ancestry proportions? Can you spot any differences?

## Now we can start estimating local ancestry with the "fatash" module, where we use our observed cluster assignments and our Q and P matrices as input, and otherwise just default options:

In [None]:
hapla fatash --clusters simAA_maf_wdsize8 --qfile simAA_maf_wdsize8.K3.s6.Q --pfile simAA_maf_wdsize8.K3.s6.P --threads 8 --out simAA_maf_wdsize8
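
## To get a feeling for how cluster assignments, Q and P can be combined into a local ancestry path, here is a toy hidden Markov model decoded with the Viterbi algorithm. This is a sketch under my own assumptions, not the fatash implementation: the function name `viterbi_local_ancestry`, the `switch_prob` parameter, the transition model (stay in the same ancestry, or switch and draw a new ancestry according to Q) and all the toy numbers are made up for illustration.

```python
import math


def safe_log(x):
    """Logarithm that returns -inf for zero probabilities."""
    return math.log(x) if x > 0 else float("-inf")


def viterbi_local_ancestry(obs, emissions, q, switch_prob):
    """Most likely ancestry per window for a single haplotype.

    obs[w]             : observed cluster index in window w
    emissions[w][c][k] : frequency of cluster c in window w in ancestry k (toy P)
    q[k]               : admixture proportion of ancestry k (toy Q row)
    switch_prob        : assumed probability of an ancestry switch between windows
    """
    n_states = len(q)
    # With probability switch_prob a new ancestry is drawn from q, otherwise we stay.
    log_trans = [
        [safe_log(switch_prob * q[j] + (1 - switch_prob if i == j else 0.0))
         for j in range(n_states)]
        for i in range(n_states)
    ]
    score = [safe_log(q[k]) + safe_log(emissions[0][obs[0]][k]) for k in range(n_states)]
    backpointers = []
    for w in range(1, len(obs)):
        new_score, pointers = [], []
        for j in range(n_states):
            best_i = max(range(n_states), key=lambda i: score[i] + log_trans[i][j])
            pointers.append(best_i)
            new_score.append(
                score[best_i] + log_trans[best_i][j] + safe_log(emissions[w][obs[w]][j])
            )
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda k: score[k])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Toy setup: 2 ancestries, 2 clusters per window, same frequencies in every window.
window_emissions = [[0.9, 0.1], [0.1, 0.9]]  # cluster 0 is typical for ancestry 0
q = [0.5, 0.5]

obs_a = [0, 0, 0, 1, 1, 1]
path_a = viterbi_local_ancestry(obs_a, [window_emissions] * len(obs_a), q, 0.1)

obs_b = [0, 0, 1, 0, 0]
path_b = viterbi_local_ancestry(obs_b, [window_emissions] * len(obs_b), q, 0.1)

print(path_a)
print(path_b)
```

## In the second toy haplotype a single odd window is not enough to pay for two ancestry switches, so the path stays in one ancestry. This kind of smoothing is worth keeping in mind when you compare inferred and true tracts later on.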

## Now we get an output in the form of a .path file, which looks like this inside:

In [None]:
ll

In [None]:
wc -l simAA_maf_wdsize8.path
head simAA_maf_wdsize8.path -n1

* ## What does the number of rows correspond to?
* ## What are we seeing as the output of the second command above?
## I have written up a little plotting script to make this easier to read. You don't have to understand everything in it (especially not the parts where I use "<<-"; I don't even understand why I chose to do that).

In [None]:
popcols <- c("0" = "#b783c9", "1" = "#3eb05d", "2" = "#826746")

subset_hap <- function(haplo_index) {
    true_anc_sub <<- data.frame(pos = true_anc[true_anc["sample"] == haplo_index, ]$pos,
                               pop = true_anc[true_anc["sample"] == haplo_index, ]$population)
    true_anc_sub$len <<- c(diff(true_anc_sub$pos), seqlen - true_anc_sub[nrow(true_anc_sub), 1])

    hapla_sub <<- data.frame(pos = as.numeric(names(hapla)),
                            pop = unlist(hapla[haplo_index, ]))
    hapla_sub[1, 1] <<- 0
    hapla_sub <<- hapla_sub[c(1, 1 + which(diff(hapla_sub$pop) != 0)), ] ## only keep first instances of each pop in tract to remove redundancy and make plots look nicer
    hapla_sub$len <<- c(diff(hapla_sub$pos), seqlen - hapla_sub[nrow(hapla_sub), 1])

}


plot_hap <- function(haplo_index, remap) {
    subset_hap(haplo_index)
    ind_index <- (haplo_index + 1) %/% 2

    q <- read.table(list.files(pattern = "simAA_maf_wdsize8.K3.s6.Q")[1])
    admix_str <- "Admixture proportions inferred from hapla: "
    for (true_k in 0:(length(remap) - 1)) {
        inferred_k <- names(remap[remap == as.character(true_k)])

        admix_str <- paste0(
            admix_str,
            true_k, ": ",
            round(q[ind_index, as.numeric(inferred_k) + 1], 3)
        )
    }

    true_admix <- read.table("simAA_trueQ.csv", header = FALSE)
    true_admix_str <- "True admixture proportions: "
    for (i in 1:dim(q)[2]){
        true_admix_str <- paste0(true_admix_str, i - 1, ": ", round(true_admix[ind_index, i], 3))
    }
    
    
    plot(c(0, seqlen), c(0, 1), type = "n", main = paste("Haplotype no. ", haplo_index),
        ylab = "", yaxt = "n", xlab = "Position\n", cex.main = 3, cex.lab = 2, cex.sub = 3, bty="n")

    for (start in seq(1, nrow(hapla_sub))){
        segments(hapla_sub[start, 1], .3, hapla_sub[start + 1, 1], .3, col = popcols[as.character(hapla_sub[start, 2])], lend = 1, lwd = 64)
    }
    segments(hapla_sub[nrow(hapla_sub), 1], .3, seqlen, .3, col = popcols[as.character(hapla_sub[nrow(hapla_sub), 2])], lend = 1, lwd = 64)
    
    for (start in seq(1, nrow(true_anc_sub))){
        segments(true_anc_sub[start, 1], 0.1, true_anc_sub[start + 1, 1], 0.1, col = popcols[as.character(true_anc_sub[start, 2])], lend = 1, lwd = 64)
    }
    segments(true_anc_sub[nrow(true_anc_sub), 1], 0.1, seqlen, 0.1, col = popcols[as.character(true_anc_sub[nrow(true_anc_sub), 2])], lend = 1, lwd = 64)
    
    text(seqlen/2, -0.4, paste0(admix_str,  "\n", true_admix_str), xpd=T, cex=2, font=2, pos=2)
    text(-seqlen * 0.06, .3, "Inferred Ancestry",font=2, cex = 1.8, xpd=T)
    text(-seqlen * 0.05, .1, "True Ancestry",font=2, cex = 1.8, xpd=T)
    legend("top", legend = c("EUR", "AFR", "EAS"),
                   fill = popcols, ## one colour per legend label, even if this haplotype carries fewer ancestries
                   cex = 2.5, bty="n")
}
###### read true path
true_anc <- read.table("simAA_true_anc.csv", header = TRUE)
true_anc["sample"] <- true_anc["sample"] + 1 ## change indexing from 0 to 1
names(true_anc)[2] <- "pos"

seqlen <- max(true_anc$right)
npop <- length(unique(true_anc$population))


###### read hapla path res
hapla <- read.table("simAA_maf_wdsize8.path", header = FALSE)
names(hapla) <- read.table("simAA_maf_wdsize8.win")[, 2]

## Order of Ks are inferred arbitrarily, so we have to adjust them to fit the true values
remap <- c("0"="1", "1"="0", "2"="2")
hapla[] <- lapply(hapla, function(x) {
    as.numeric(unname(remap[as.character(x)])) })


par(mar=c(12,12,4,4))
for (i in 601:602){
    plot_hap(i, remap)
}


* ## What do you make of this result?
* ## Does it make any mistakes compared to the true ancestry? Any common type of error?
## Try looking at some more individuals by adjusting the last little loop of the code cell (set it to e.g. "i in 601:610").
* ## Why does the index start at 601?
## Next try setting it to 1:10 instead
* ## Which individuals are shown now?
* ## What about 201:210?
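
## The plotting script collapses the per-window ancestry calls into tracts before drawing them, and we then compare the tracts to the truth by eye. Both steps can be sketched in a few lines of Python. The function names `path_to_tracts` and `agreement`, and all the toy numbers, are made up for illustration; nothing here reads the real .path or .win files.

```python
from itertools import groupby


def path_to_tracts(starts, path, seq_len):
    """Collapse per-window ancestry calls into (start, end, ancestry) tracts.

    starts[w] : start position of window w
    path[w]   : ancestry called in window w
    seq_len   : end of the last tract
    """
    tracts = []
    index = 0
    for ancestry, run in groupby(path):
        run_length = len(list(run))
        start = starts[index]
        index += run_length
        end = starts[index] if index < len(starts) else seq_len
        tracts.append((start, end, ancestry))
    return tracts


def agreement(inferred, truth):
    """Fraction of the inferred length where both tract lists carry the same ancestry."""
    shared = 0
    total = 0
    for start_a, end_a, ancestry_a in inferred:
        total += end_a - start_a
        for start_b, end_b, ancestry_b in truth:
            overlap = min(end_a, end_b) - max(start_a, start_b)
            if overlap > 0 and ancestry_a == ancestry_b:
                shared += overlap
    return shared / total


# Toy haplotype with six windows.
starts = [0, 10, 20, 30, 40, 50]
path = [0, 0, 1, 1, 1, 2]
seq_len = 60

inferred_tracts = path_to_tracts(starts, path, seq_len)
true_tracts = [(0, 25, 0), (25, 60, 1)]

print(inferred_tracts)
print(agreement(inferred_tracts, true_tracts))
```

## A single number like this hides where the errors are, so it is a complement to the plots rather than a replacement for them.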

## Now let's try changing some of our parameters to see if we can get a better estimate. In the following code I've set the window size to 32 instead of 8 and lowered the lambda parameter from 0.125 to 0.05 to get more clusters per window. Let's see what difference that makes:

In [None]:
hapla cluster --bcf simAA_maf.bcf --size 32 --threads 8 --out simAA_maf_wdsize32 --lmbda 0.05
hapla admix --clusters simAA_maf_wdsize32 --K 3 --seed 2 --threads 8 --out simAA_maf_wdsize32
hapla fatash --clusters simAA_maf_wdsize32 --qfile simAA_maf_wdsize32.K3.s2.Q --pfile simAA_maf_wdsize32.K3.s2.P --threads 8 --out simAA_maf_wdsize32

In [None]:
popcols <- c("0" = "#b783c9", "1" = "#3eb05d", "2" = "#826746")

subset_hap <- function(haplo_index) {
    true_anc_sub <<- data.frame(pos = true_anc[true_anc["sample"] == haplo_index, ]$pos,
                               pop = true_anc[true_anc["sample"] == haplo_index, ]$population)
    true_anc_sub$len <<- c(diff(true_anc_sub$pos), seqlen - true_anc_sub[nrow(true_anc_sub), 1])

    hapla_sub <<- data.frame(pos = as.numeric(names(hapla)),
                            pop = unlist(hapla[haplo_index, ]))
    hapla_sub[1, 1] <<- 0
    hapla_sub <<- hapla_sub[c(1, 1 + which(diff(hapla_sub$pop) != 0)), ] ## only keep first instances of each pop in tract to remove redundancy and make plots look nicer
    hapla_sub$len <<- c(diff(hapla_sub$pos), seqlen - hapla_sub[nrow(hapla_sub), 1])

}


plot_hap <- function(haplo_index, remap) {
    subset_hap(haplo_index)
    ind_index <- (haplo_index + 1) %/% 2

    q <- read.table(list.files(pattern = "simAA_maf_wdsize32.K3.s2.Q")[1])
    admix_str <- "Admixture proportions inferred from hapla: "
    for (true_k in 0:(length(remap) - 1)) {
        inferred_k <- names(remap[remap == as.character(true_k)])

        admix_str <- paste0(
            admix_str,
            true_k, ": ",
            round(q[ind_index, as.numeric(inferred_k) + 1], 3)
        )
    }

    true_admix <- read.table("simAA_trueQ.csv", header = FALSE)
    true_admix_str <- "True admixture proportions: "
    for (i in 1:dim(q)[2]){
        true_admix_str <- paste0(true_admix_str, i - 1, ": ", round(true_admix[ind_index, i], 3))
    }
    
    
    plot(c(0, seqlen), c(0, 1), type = "n", main = paste("Haplotype no. ", haplo_index),
        ylab = "", yaxt = "n", xlab = "Position\n", cex.main = 3, cex.lab = 2, cex.sub = 3, bty="n")

    for (start in seq(1, nrow(hapla_sub))){
        segments(hapla_sub[start, 1], .3, hapla_sub[start + 1, 1], .3, col = popcols[as.character(hapla_sub[start, 2])], lend = 1, lwd = 64)
    }
    segments(hapla_sub[nrow(hapla_sub), 1], .3, seqlen, .3, col = popcols[as.character(hapla_sub[nrow(hapla_sub), 2])], lend = 1, lwd = 64)
    
    for (start in seq(1, nrow(true_anc_sub))){
        segments(true_anc_sub[start, 1], 0.1, true_anc_sub[start + 1, 1], 0.1, col = popcols[as.character(true_anc_sub[start, 2])], lend = 1, lwd = 64)
    }
    segments(true_anc_sub[nrow(true_anc_sub), 1], 0.1, seqlen, 0.1, col = popcols[as.character(true_anc_sub[nrow(true_anc_sub), 2])], lend = 1, lwd = 64)
    
    text(seqlen/2, -0.4, paste0(admix_str,  "\n", true_admix_str), xpd=T, cex=2, font=2, pos=2)
    text(-seqlen * 0.06, .3, "Inferred Ancestry",font=2, cex = 1.8, xpd=T)
    text(-seqlen * 0.05, .1, "True Ancestry",font=2, cex = 1.8, xpd=T)
    legend("top", legend = c("EUR", "AFR", "EAS"),
                   fill = popcols, ## one colour per legend label, even if this haplotype carries fewer ancestries
                   cex = 2.5, bty="n")
}



###### read true path
true_anc <- read.table("simAA_true_anc.csv", header = TRUE)
true_anc["sample"] <- true_anc["sample"] + 1 ## change indexing from 0 to 1
names(true_anc)[2] <- "pos"

seqlen <- max(true_anc$right)
npop <- length(unique(true_anc$population))


###### read hapla path res
hapla <- read.table("simAA_maf_wdsize32.path", header = FALSE)
names(hapla) <- read.table("simAA_maf_wdsize32.win")[, 2]

## Order of Ks are inferred arbitrarily, so we have to adjust them to fit the true values
remap <- c("0"="1", "1"="2", "2"="0")
hapla[] <- lapply(hapla, function(x) {
    as.numeric(unname(remap[as.character(x)])) })


par(mar=c(12,12,4,4))
for (i in 601:610){
    plot_hap(i, remap)
}


* ## Did this help? If so, why do you think that is?
## If you have time left, try messing around with the parameters and see what changes. Note that the ordering of Ks might change and mess up the colors. You can fix this by changing the "remap" named vector in the plotting code, or by trying a different seed for the admixture part.
## If you have yet more time left you can also try changing the number of samples in the simulation to 10 for each of the unadmixed populations to see what happens to our inference and try to adjust parameters to see if you can get a better result.
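
## When the ordering of Ks changes, you can also work out the matching order instead of guessing it. The sketch below tries every ordering of the inferred Q columns and keeps the one closest to the true Q. The function name `best_column_order` and both toy matrices are made up for illustration; with real files you would read the true and inferred Q matrices instead.

```python
from itertools import permutations


def best_column_order(inferred_q, true_q):
    """Find which inferred column matches each true column.

    Returns a tuple `order` where order[k] is the inferred column
    that best matches true column k (smallest summed squared error).
    """
    n_k = len(true_q[0])

    def cost(order):
        return sum(
            (row_inf[order[k]] - row_true[k]) ** 2
            for row_inf, row_true in zip(inferred_q, true_q)
            for k in range(n_k)
        )

    return min(permutations(range(n_k)), key=cost)


# Made-up true proportions for four individuals.
true_q = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.8, 0.15, 0.05],
]

# Made-up inferred proportions: same individuals, shuffled columns, a bit of noise.
inferred_q = [
    [0.01, 0.97, 0.02],
    [0.01, 0.01, 0.98],
    [0.99, 0.00, 0.01],
    [0.06, 0.79, 0.15],
]

order = best_column_order(inferred_q, true_q)
reordered_q = [[row[j] for j in order] for row in inferred_q]

# Same idea as the "remap" vector in the R code: inferred label -> true label.
remap = {str(order[k]): str(k) for k in range(len(order))}

print(order)
print(remap)
```

## Trying all orderings is fine for K = 3 (only 6 of them), but the number of orderings grows very quickly with K.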

In [None]:
## cow stuff
hapla cluster --bcf /davidData/users/thomas/hmmadvbinf/cattle_hmmix_exercise/Chr25.314inds.imputated.BosTau9.bcf --size 256 --step 64 --threads 8 --out BosTau9
hapla admix --clusters BosTau9 --K 2 --seed 2 --threads 8 --out BosTau9
hapla fatash --clusters BosTau9 --qfile BosTau9.K2.s2.Q --pfile BosTau9.K2.s2.P --threads 8 --out BosTau9 --viterbi

In [None]:
cat /davidData/users/thomas/hmmadvbinf/cattle_hmmix_exercise/F1.pure.inds

In [None]:
options(repr.plot.width = 26, repr.plot.height = 10)

popinfo <- read.table("BosTau9.ids", h=F)[,1]
Q <- read.table("BosTau9.K2.s2.Q", h=F)

par(mar=c(12,14,4,4), oma=c(0,10,0,0))

barplot(t(Q), col=c("#83c996", "#ed7551"), space=0, border=NA, cex.axis=1.2,cex.lab=1.8,axisnames=FALSE,
             ylab=paste0("Admixture proportions\nfor K = ", ncol(Q)), xlab="", main="", cex.main=1.5,xpd=NA)

nlabrows <- 26
text(1:nrow(Q), rep((-1:-nlabrows)*0.03,nrow(Q)/nlabrows), popinfo, xpd=NA,font=2,cex=0.8, adj=.03)



In [None]:
options(repr.plot.width = 26, repr.plot.height = 4)
popcols <- c("0" = "#83c996", "1" = "#ed7551")

subset_hap <- function(haplo_index) {
    hapla_sub <<- data.frame(pos = as.numeric(names(hapla)),
                            pop = unlist(hapla[haplo_index, ]))
    hapla_sub[1, 1] <<- 0
    hapla_sub <<- hapla_sub[c(1, 1 + which(diff(hapla_sub$pop) != 0)), ] ## only keep first instances of each pop in tract to remove redundancy and make plots look nicer
    hapla_sub$len <<- c(diff(hapla_sub$pos), seqlen - hapla_sub[nrow(hapla_sub), 1])

}


plot_hap <- function(haplo_index) {
    subset_hap(haplo_index)
    plot(c(0, seqlen), c(0, .2), type = "n", main = paste("Haplotype no. ", haplo_index),
        ylab = "", yaxt = "n", xlab = "Position\n", cex.main = 3, cex.lab = 2, cex.sub = 3, bty="n")

    for (start in seq(1, nrow(hapla_sub))){
        segments(hapla_sub[start, 1], .1, hapla_sub[start + 1, 1], .1, col = popcols[as.character(hapla_sub[start, 2])], lend = 1, lwd = 64)
    }
    segments(hapla_sub[nrow(hapla_sub), 1], .1, seqlen, .1, col = popcols[as.character(hapla_sub[nrow(hapla_sub), 2])], lend = 1, lwd = 64)
    
    text(-seqlen * 0.06, .1, "Inferred Ancestry",font=2, cex = 1.8, xpd=T) ## y = .1 puts the label level with the bar
#     legend("top", legend = c("EUR", "AFR", "EAS"),
#                    fill = popcols[1:length(unique(true_anc_sub$pop))],
#                    cex = 2.5, bty="n")
}


###### read hapla path res
hapla <- read.table("BosTau9.path", header = FALSE)
names(hapla) <- read.table("BosTau9.win")[, 2]
seqlen <- max(as.numeric(names(hapla)))

popinfo <- read.table("BosTau9.ids", h=F)[,1]
F1 <- which(popinfo=="N_31B")
admixed <- 129:150

par(mar=c(12,12,4,4))
for (i in c(F1*2-1, F1*2, admixed)){
    plot_hap(i)
}
