# Population genetics summer course, Denmark

# Dating Admixture and Selection in Admixed Individuals

For the first part of this practical, we will be applying the statistical software ALDER, MALDER, fastGLOBETROTTER and MOSAIC to simulated individuals in order to detect and date admixture events.

Here we will use the same dataset as in the “Clustering Algorithms” practical, though now using chromosomes 20, 21 and 22. Again, this consists of the following populations:

| Population | Country | Region | number of individuals |
|:--- |:--- |:--- |:---:|
| Balochi | Pakistan | Central South Asia | 21 |
| BantuKenya | Kenya | Africa | 11 |
| BantuSouthAfrica | South Africa | Africa | 8 |
| Burusho | Pakistan | Central South Asia | 25 |
| English | Britain | Europe | 6 |
| HanNchina | China | East Asia | 10 |
| Kalash | Pakistan | Central South Asia | 23 |
| Makrani | Pakistan | Central South Asia | 22 |
| Mandenka | Senegal | Africa | 22 |
| MbutiPygmy | Congo | Africa | 13 |
| Mongola | Mongolia | East Asia | 10 |
| NorthItalian | Italy | Europe | 12 |
| Orcadian | Britain | Europe | 15 |
| Pathan | Pakistan | Central South Asia | 22 |
| Sardinian | Italy | Europe | 28 |
| Tuscan | Italy | Europe | 8 |
|  |  |  |  |
| Total |  |  | 256 |

The aim here is to see how well ALDER, MALDER, fastGLOBETROTTER and MOSAIC can reconstruct an admixture event in the simulated “population” described in the “Clustering Algorithms” practical. This simulated group consists of 20 individuals descending from an admixture event occurring 30 generations ago, where 80% of the DNA was contributed by present-day Brahui individuals (from Pakistan, Central South Asia) and the remaining 20% by present-day Yoruba individuals (from Nigeria, Africa). To identify this admixture event, we will use the 16 populations above (or a subset of them) as surrogates for the admixing sources.

# 1 Inferring admixture: ALDER/MALDER

Navigate to the folder `AlderMalderFiles/`. First, we will run ALDER to detect admixture in the simulated population.

Unzip and extract ALDER:

In [None]:
tar -xzf AdmixtureSelectionPractical.tar.gz; cd AlderMalderFiles; tar -xzvf alder_v1.03.tar.gz

Then compile:

In [None]:
cd alder; make; cd ..

Then run on the Brahui/Yoruba simulation:

In [None]:
alder/alder -p BrahuiYorubaSimulation.alder.par > BrahuiYorubaSimulation.alder.out

The results of the above run will be in `BrahuiYorubaSimulation.alder.out`.
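
As background for reading this output: ALDER dates admixture from how quickly weighted admixture LD decays with genetic distance. A minimal sketch of the single-pulse decay model is given below. The symbols $a(d)$, $A$, $c$, $n$ and $d$ are notation introduced here for illustration and are not names taken from the ALDER output.

$$
a(d) \approx A\,e^{-n d} + c
$$

Here $d$ is the genetic distance between two SNPs in Morgans, $n$ is the number of generations since admixture, $A$ is the amplitude and $c$ is a constant offset. For the event simulated in this practical ($n = 30$), the exponential term falls to $e^{-30 \times 0.01} = e^{-0.3} \approx 0.74$ of its starting value at 1 cM and to $e^{-1.5} \approx 0.22$ at 5 cM. It halves every $\ln 2 / 30 \approx 0.023$ Morgans, i.e. roughly every 2.3 cM.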

Also, run MALDER on the Brahui/Yoruba simulation. To do so, first unzip and extract MALDER:

In [None]:
unzip malder-master.zip

Then compile:

In [None]:
cd malder-master/MALDER; make; cd ../..

Then run on the Brahui/Yoruba simulation:

In [None]:
malder-master/MALDER/malder -p BrahuiYorubaSimulation.malder.par > BrahuiYorubaSimulation.malder.out

Once finished, answer the following questions:
1. Does ALDER detect admixture in this simulation? If so, what is the inferred date?
2. What does the evidence for admixture look like here?
3. When running MALDER, does the inferred admixture change when using different combinations of the surrogate populations?

# 2 Inferring admixture: fastGLOBETROTTER

Navigate to the folder `GlobetrotterFiles/`. As mentioned in the lecture, running GLOBETROTTER or fastGLOBETROTTER requires three steps:
1. use CHROMOPAINTER to paint surrogate populations against each other
2. use CHROMOPAINTER to paint target (admixed) populations against surrogates
3. run GLOBETROTTER or fastGLOBETROTTER using combined results from (1)-(2)

For steps (1)-(2), we will use ChromoPainterv2. Unzip and compile ChromoPainterv2:

In [None]:
cd ../GlobetrotterFiles
tar -xzvf ChromoPainterv2.tar.gz
gcc -Wall -o ChromoPainterv2 ChromoPainterv2.c -lm -lz

Step (1) was already done in the last practical. Step (2) was also done there, but note that the command below uses `-s 10` so that painting samples are output:

In [None]:
./ChromoPainterv2 -g data/BrahuiYorubaSimulationChrom22.haplotypes \
-r data/BrahuiYorubaSimulationChrom22.recomrates \
-t example/BrahuiYorubaSimulation.idfile.txt \
-f BrahuiYorubaSimulation.poplistReduced.txt 0 0 \
-o example/BrahuiYorubaSimulationAdmixtureChrom22 -s 10

Repeat the above ChromoPainterv2 command for chromosomes 20 and 21. As mentioned in the lecture, there are two output files of interest for this analysis: `example/BrahuiYorubaSimulationAdmixtureChrom22.chunklengths.out`
and `example/BrahuiYorubaSimulationAdmixtureChrom22.samples.out`.

(In real applications, we want to sum the `.chunklengths.out` files across chromosomes, and then combine the output from steps (1) and (2). For simplicity here, we will use the combined matrix we made in the previous practical, which is only for chromosome 22, in `data/BrahuiYorubaSimulationAllVersusAllChrom22.chunklengths.out`.)
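
The summing step mentioned here can be sketched with `awk`. This is only an illustration on two tiny made-up matrices. It assumes the files are whitespace-delimited, with one header row followed by one row per recipient whose first column is the recipient label. The function name `sum_chunklengths` and the toy file names are hypothetical.

```shell
# Toy demonstration: sum two small matrices laid out like .chunklengths.out
# files (ASSUMED layout: one header row, then one row per recipient with the
# label in column 1 and numeric values in the remaining columns).
tmpdir=$(mktemp -d)
printf 'Recipient PopA PopB\nind1 1.5 2.0\nind2 0.5 1.0\n' > "$tmpdir/chr20.txt"
printf 'Recipient PopA PopB\nind1 0.5 1.0\nind2 1.5 3.0\n' > "$tmpdir/chr21.txt"

# Hypothetical helper: add the matrices given as arguments cell by cell.
sum_chunklengths() {
  awk '
    FNR == 1 { if (NR == 1) header = $0; next }      # keep the first header only
    {
      if (!($1 in seen)) { seen[$1] = 1; order[++n] = $1 }
      ncol = NF
      for (j = 2; j <= NF; j++) total[$1, j] += $j   # add values cell by cell
    }
    END {
      print header
      for (i = 1; i <= n; i++) {
        line = order[i]
        for (j = 2; j <= ncol; j++) line = line " " sprintf("%.10g", total[order[i], j])
        print line
      }
    }' "$@"
}

summed=$(sum_chunklengths "$tmpdir/chr20.txt" "$tmpdir/chr21.txt")
echo "$summed"
rm -r "$tmpdir"
```

On real output the same function would be called with one `.chunklengths.out` file per chromosome. The sketch assumes that every file lists the same donors in the same column order, so check that before using it.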

The ChromoPainterv2 commands for chromosomes 20 and 21 referred to above are:

In [None]:
./ChromoPainterv2 -g data/BrahuiYorubaSimulationChrom20.haplotypes \
-r data/BrahuiYorubaSimulationChrom20.recomrates \
-t example/BrahuiYorubaSimulation.idfile.txt \
-f BrahuiYorubaSimulation.poplistReduced.txt 0 0 \
-o example/BrahuiYorubaSimulationAdmixtureChrom20 -s 10

In [None]:
./ChromoPainterv2 -g data/BrahuiYorubaSimulationChrom21.haplotypes \
-r data/BrahuiYorubaSimulationChrom21.recomrates \
-t example/BrahuiYorubaSimulation.idfile.txt \
-f BrahuiYorubaSimulation.poplistReduced.txt 0 0 \
-o example/BrahuiYorubaSimulationAdmixtureChrom21 -s 10

Finally, for step (3) we’ll run fastGLOBETROTTER to infer admixture, using this output from ChromoPainterv2. Unzip and extract fastGLOBETROTTER:

In [None]:
tar -xzvf fastGLOBETROTTER.tar.gz

Next compile with:

In [None]:
R CMD SHLIB -o fastGLOBETROTTERCompanion.so fastGLOBETROTTERCompanion.c -lz

To run fastGLOBETROTTER for the Brahui-Yoruba simulation, type:

In [None]:
R < fastGLOBETROTTER.R BrahuiYorubaSimulationAdmixture.paramfile.txt BrahuiYorubaSimulationAdmixture.samplesfile.txt BrahuiYorubaSimulationAdmixture.recomfile.txt 1 --no-save > output.out

It will take a few minutes to complete. You can follow progress by typing:

In [None]:
tail output.out

Once finished, the following output files will be produced, each in the `example/` directory:

`example/BrahuiYorubaSimulationAdmixed.fastGT.main.txt`
`example/BrahuiYorubaSimulationAdmixed.fastGT.main.pdf`
`example/BrahuiYorubaSimulationAdmixed.fastGT.main_curves.txt`
`example/BrahuiYorubaSimulationAdmixed.fastGT.boot.txt`

Using these files, answer the following questions:
1. From the fastGLOBETROTTER user manual, what do the different measures in `BrahuiYorubaSimulationAdmixed.fastGT.main.txt` tell you? In particular, what is fastGLOBETROTTER’s conclusion about admixture in this application? And what are the inferred sources and dates of the admixture event?
2. How do you interpret the coancestry curves in `BrahuiYorubaSimulationAdmixed.fastGT.main.pdf`? Do the results from `BrahuiYorubaSimulationAdmixed.fastGT.main.txt` make sense in light of these coancestry curves?
3. How confident are the date estimates?

# 3 Inferring admixture: MOSAIC

Navigate to the folder `MosaicFiles/`. Then unzip and extract MOSAIC:

In [None]:
cd ../MosaicFiles/
tar -xzvf mosaic-master.tar.gz

Then run on the Brahui/Yoruba simulation:

In [None]:
Rscript mosaic-master/mosaic.R -c 20:22 -p "Balochi BantuKenya BantuSouthAfrica \
Burusho English HanNchina Kalash Makrani Mandenka MbutiPygmy Mongola NorthItalian \
Orcadian Pathan Sardinian Tuscan" BrahuiYorubaSimulation -a 2 data/

It will take a few minutes to complete. The results of the above run will be in three folders: `MOSAIC_RESULTS`, `MOSAIC_PLOTS` and `FREQS`.
Looking at the plots in `MOSAIC_PLOTS/`, answer the following questions:
1. What are the conclusions of admixture here, i.e. the inferred date and sources?
2. Does it seem as if the algorithm has converged? 
3. What does the local painting look like?

# 4 Inferring selection in admixed individuals: AdaptMix

For this last section, we will simulate and test for selection in admixed populations with AdaptMix, using example data provided with the program. This is a small subset of the data from 1000 Genomes populations. In particular, we will test for selection in a simulated admixed Peruvian population (PEL), using admixture surrogates from China (CHB), Nigeria (YRI) and Spain (IBS).

Navigate to the folder `AdaptMixFiles/`. Then unzip and extract AdaptMix and AdaptMixSimulator:

In [None]:
cd ../AdaptMixFiles
tar -xzvf AdaptMixv1.tar.gz
tar -xzvf AdaptMixSimulator.tar.gz

First we will use `AdaptMixSimulator.R`, running it with `CHB_selection_paramfile.txt` and the example data in `simexample/`, to generate a simulated “`PEL`” population that has selection and is admixed from simulated sources related to `{CHB, YRI, IBS}`. (How related the sources are depends on `drift.btwn.surrogates.and.sources` in `CHB_selection_paramfile.txt`, with higher values making the simulated sources more different from `{CHB, YRI, IBS}`.)

In [None]:
R < AdaptMixSimulator.R CHB_selection_paramfile.txt simexample/PEL_REFs_ALLCHR_chr.txt \
simexample/PEL_REFs.ids.txt CHB_selection_ALLCHR --no-save > screenoutput.out

This will simulate input data to be read into `run_AdaptMix.R` that consists of the real data for the surrogate populations, added atop a simulated `PEL` population. While nearly all SNPs are neutral, one randomly selected SNP – with starting frequency ≥0.05 and ≤0.1 (`range.startfrequency.selected.snp:0.05 0.1`) in the population undergoing selection (`CHB`) – will have strong selection (`sel.coeff: 0.1` per generation, for 150 generations) occurring prior to admixture. This selected SNP will be the last SNP in the output file `CHB_selection_ALLCHR.haps`.
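
To get a feel for how strong these settings are, the sketch below tracks the frequency of a favoured allele under a simple deterministic haploid (genic) selection model, in which the odds of carrying the allele are multiplied by (1 + s) each generation. This model is an assumption made here for illustration. It is not necessarily the one implemented in `AdaptMixSimulator.R`, and the function name `freq_after_selection` is hypothetical.

```shell
# Deterministic frequency of a favoured allele after t generations.
# ASSUMPTION: simple haploid (genic) selection, no drift; the odds p/(1-p)
# are multiplied by (1 + s) every generation. Illustrative only.
freq_after_selection() {
  # usage: freq_after_selection p0 s t
  awk -v p0="$1" -v s="$2" -v t="$3" 'BEGIN {
    odds = p0 / (1 - p0)
    for (g = 1; g <= t; g++) odds *= (1 + s)
    printf "%.4f\n", odds / (1 + odds)
  }'
}

# Both ends of the starting-frequency range, with s = 0.1 for 150 generations
freq_after_selection 0.05 0.1 150
freq_after_selection 0.10 0.1 150
```

Under this simplified model the allele is essentially fixed in the selected source by the time admixture occurs, from either end of the starting range.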

Next run AdaptMix on this simulated dataset, testing for selection in the simulated PEL population:

In [None]:
R < run_AdaptMix.R example/PEL_analysis_paramfile.txt CHB_selection_ALLCHR.txt \
CHB_selection_ALLCHR.idfile.txt CHB_selection_ALLCHR.adaptmix.txt --no-save > screenoutput.out2

The output will be in `CHB_selection_ALLCHR.adaptmix.txt`, with scores for the selected SNP in the last row of this file. The header is in the third row, with columns giving the p-value of the selection test (column 3) and other information, such as AIC scores.
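
One way to pull out the relevant rows is sketched below on a toy stand-in file. The two leading lines, the column names and all values in it are made up. Only the layout (header on row 3, p-value in column 3, selected SNP on the last row) follows the description above, and whitespace-delimited columns are assumed.

```shell
# Toy stand-in for an AdaptMix output file; every value below is made up.
toy=$(mktemp)
cat > "$toy" <<'EOF'
placeholder line 1
placeholder line 2
col1 col2 col3 col4
snpA 101 0.80 x
snpB 202 0.35 x
snpC 303 0.0001 x
EOF

header=$(sed -n '3p' "$toy")                              # row 3 holds the column names
last_row=$(tail -n 1 "$toy")                              # last row is the selected SNP
pval=$(printf '%s\n' "$last_row" | awk '{ print $3 }')    # column 3 is the p-value

echo "$header"
echo "$last_row"
echo "p-value for the selected SNP: $pval"
rm "$toy"
```

Replacing `"$toy"` with `CHB_selection_ALLCHR.adaptmix.txt` (and dropping the toy-file lines) would apply the same steps to the real output, provided the column layout assumed here holds.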

Repeat this for another simulation described in `PEL_selection_paramfile.txt`, which instead simulates selection post-admixture, with selection strength s = 0.15 for 50 generations:

In [None]:
R < AdaptMixSimulator.R PEL_selection_paramfile.txt simexample/PEL_REFs_ALLCHR_chr.txt \
simexample/PEL_REFs.ids.txt PEL_selection_ALLCHR --no-save > screenoutput.out

R < run_AdaptMix.R example/PEL_analysis_paramfile.txt PEL_selection_ALLCHR.txt \
PEL_selection_ALLCHR.idfile.txt PEL_selection_ALLCHR.adaptmix.txt --no-save > screenoutput.out2

The `AdaptMix` output for this run will be in `PEL_selection_ALLCHR.adaptmix.txt`. 

Use the two `AdaptMix` output files for these two simulations to answer the following questions.
1. For each simulation scenario, is there evidence of selection at the SNP with simulated selection?
2. For the SNP with simulated selection in each scenario, do the results indicate selection post-admixture, or in a particular source population pre-admixture?
3. Looking at the bottom of `screenoutput.out`, how well do the correlations between the simulated allele frequencies of the sources and their respective surrogate populations match those observed in the real data? How would you adjust `drift.btwn.surrogates.and.sources` in the input parameter files to make a better match?

In [None]:
tail screenoutput.out