# Day 1, Practical 2

</br>
<font size="12">ROHs from data</font>


For this exercise we will use data from the blue wildebeest. There are five subspecies. To simplify the analysis we have included only one of the Brindle populations (B-Etosha), but all three populations of the eastern white-bearded subspecies.

<img src="https://raw.githubusercontent.com/popgenDK/popgenDK.github.io/gh-pages/images/slider/wildeBeastMap.png" alt="image info" />

In this exercise we will cover:
 - Using plink to estimate runs of homozygosity
 - Plotting these estimated runs from select individuals for a closer look
 - Plotting the runs in a summarized manner for many individuals
    
    
Tools used: plink, R

The notebooks are editable, so feel free to experiment and change the code to see what happens, or to write notes in the text cells. Just remember to download the notebooks (i.e. both the originals and any edited versions you make) to your own computer at some point so you can access them later.

First, we define the paths for the files we need during the exercise.

In [None]:
# Set path to data 
PL=/davidData/data/course/kenyaWorkshop/anders/structure_day3/blue_wildebeest_thin

# make sure the required programs are installed
which plink

# make directory for the exercise
mkdir -p ~/kenya2024/Inbreeding_ROH
cd ~/kenya2024/Inbreeding_ROH

# download plotting script
wget https://raw.githubusercontent.com/popgenDK/ROH/main/plotPlinkROH.R

We will use PLINK v1.9 (https://www.cog-genomics.org/plink/) to estimate runs of homozygosity (ROHs).

## Input
Now let's have a look at the files we will be using as input:

### Fam file

In [None]:
echo -- number of lines in fam file --
wc -l $PL.fam

echo -- first 10 lines fam file --
head $PL.fam

echo -- counts of populations/subspecies from first column of fam file --
cut -f1 -d" " $PL.fam | sort | uniq -c

### Bim file 

In [None]:
echo -- number of lines in bim file --
wc -l $PL.bim

echo -e "\n-- first 10 lines bim file --"
echo -e "CHR\tvariantID\tCM\tPosition\tallele_1\tallele_2"
head $PL.bim

echo -e "\n-- counts number of variants per chromosome from the first column of bim file --"
echo \#Var Chromosome_name
cat $PL.bim | cut -f1  | uniq -c

## Inferring ROHs

In [None]:

plink --bfile $PL --chr-set 29  --make-bed --homozyg --homozyg-window-het 3 --homozyg-window-missing 20 --out wildebeest_ROH --geno 0.01 --maf 0.1

## Understanding the settings and plotting ROH results for one individual
Notice that we use many options in the plink command. Choose two (or all if you have time) and try to find out what they do.
Replace "..." with the option you want to know about, for example "plink --help geno"

In [None]:
plink --help ...

The output of the analysis is given in three different files, which look like this inside:

In [None]:
# this one contains the inferred ROHs for all individuals (one line per ROH)
head wildebeest_ROH.hom

In [None]:
# this one contains summarised information per site/variant
head wildebeest_ROH.hom.summary

In [None]:
# this one contains summarised information per individual
head wildebeest_ROH.hom.indiv
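
When working with these files on the command line, it helps to know which column number holds which field. The following is a minimal sketch, not part of the original exercise: the file `toy.hom.indiv`, its values, and the helper `list_columns` are made up for illustration, and the header is assumed to follow the PLINK 1.9 `.hom.indiv` layout.

```shell
# Toy stand-in for a .hom.indiv file (values are made up)
cat > toy.hom.indiv <<'EOF'
     FID          IID  PHE  NSEG       KB    KBAVG
    popA        ind_1   -9    12  30000.5  2500.04
EOF

# Print each column name of a whitespace-delimited file with its index
# (list_columns is a hypothetical helper, not a plink feature)
list_columns() {
    head -n 1 "$1" | awk '{ for (i = 1; i <= NF; i++) print i, $i }'
}

list_columns toy.hom.indiv
```

The same helper can be pointed at `wildebeest_ROH.hom` or `wildebeest_ROH.hom.summary` to look up column numbers before filtering with `awk` or `cut`.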

## Plotting
Plot the estimated ROHs for individual CTauKeS__701 from East Amboseli.

In [None]:
Rscript plotPlinkROH.R -p wildebeest_ROH.bed -s CTauKeS__701 --homfile wildebeest_ROH.hom

This will generate a plot (as a .png file) in the same directory where your notebook is placed. You can open it as instructed below.

![ROH plot](https://github.com/popgenDK/courses/blob/main/kenya2024/exercises/day2/openROH.png?raw=true)

Here is a graphical explanation of what you see in the plot.

![ROH plot](https://github.com/popgenDK/courses/blob/main/kenya2024/exercises/day2/ROHplot.png?raw=true)


 - Does it look like plink has identified all ROHs in this individual?
 - Do you see any ROHs longer than 5 Mb?
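
The plot can be cross-checked against the `.hom` file itself. As a rough sketch (the toy file `toy.hom`, its values, and the helper `long_rohs` are assumptions for illustration; the column order is assumed to follow the PLINK 1.9 `.hom` layout, with IID in column 2 and KB in column 9), the following lists the ROHs of one individual that exceed a length threshold given in kb:

```shell
# Toy stand-in for a .hom file (made-up values)
cat > toy.hom <<'EOF'
 FID    IID PHE CHR  SNP1  SNP2     POS1     POS2       KB NSNP DENSITY  PHOM  PHET
popA  ind_1  -9   1  snp1  snp2  1000000  7500000   6500.0  900   7.222 0.998 0.002
popA  ind_1  -9   2  snp3  snp4  2000000  3200000   1200.0  200   6.000 0.995 0.005
popA  ind_2  -9   1  snp5  snp6  4000000 12000000   8000.0 1100   7.273 0.999 0.001
EOF

# Print chromosome, start, end and length (kb) of the ROHs longer than
# min_kb for one individual (long_rohs is a hypothetical helper).
# Usage: long_rohs <hom file> <IID> <min_kb>
long_rohs() {
    awk -v iid="$2" -v min_kb="$3" \
        'NR > 1 && $2 == iid && $9 > min_kb { print $4, $7, $8, $9 }' "$1"
}

# 5 Mb = 5000 kb
long_rohs toy.hom ind_1 5000
```

On the real data, the equivalent call would be `long_rohs wildebeest_ROH.hom CTauKeS__701 5000`.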

## Plotting ROH for other individuals
Now try to copy the code above and change the name given with "-s" to plot the estimated ROHs for individual CTauKeS__709 from East Nairobi.

In [None]:
...

And then for individual CTauKeW__638 from West Serengeti.

In [None]:
...

 - When you compare these three individuals, what do you see? Which ones have more or longer ROHs, and which have fewer or shorter ROHs?

Now, we will plot the average ROH proportion in various populations. This proportion is also called FROH, and is a good estimator of the inbreeding coefficient of each individual.

In [None]:
library('ggplot2')
total_ROH <- read.table('~/kenya2024/Inbreeding_ROH/wildebeest_ROH.hom.indiv', header=T)
autosomal_genome_size = 2000000  # in kb (i.e. 2 Gb), the same unit as the KB column

ggplot(total_ROH, aes(x=FID, y=KB/autosomal_genome_size)) +
  geom_boxplot() + ylab('FROH') +
  theme_bw() + theme(panel.border = element_blank(), panel.grid.major = element_blank(),
                     panel.grid.minor = element_blank(), axis.line = element_line(colour = "black", linewidth = 1))


 - What does an FROH of 0.15 mean?
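
The quantity on the y-axis can also be computed directly on the command line: it is the total length of an individual's ROHs divided by the autosomal genome length. A minimal sketch, assuming the PLINK 1.9 `.hom.indiv` layout (total kb in column 5) and the same genome size of 2,000,000 kb used in the R cell above; the toy file, its values, and the helper `froh` are made up for illustration:

```shell
# Toy stand-in for a .hom.indiv file (made-up values)
cat > toy_froh.hom.indiv <<'EOF'
 FID    IID PHE NSEG      KB  KBAVG
popA  ind_1  -9   40  200000   5000
popB  ind_2  -9   10   50000   5000
EOF

# FROH = total kb in ROHs (column 5) / autosomal genome size in kb
# (froh is a hypothetical helper).
# Usage: froh <hom.indiv file> <genome size in kb>
froh() {
    awk -v genome_kb="$2" \
        'NR > 1 { printf "%s %s %.3f\n", $1, $2, $5 / genome_kb }' "$1"
}

froh toy_froh.hom.indiv 2000000
```

Each output line gives the population, the individual and its FROH, which should match the values the boxplot is drawn from when run on `wildebeest_ROH.hom.indiv`.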

Next, we will visualize the distribution of ROHs of different lengths in each individual. We will get two plots, one showing the number of ROHs of different sizes, the other how large a proportion of the genome is contained in ROHs of different sizes. Script by Anders Albrechtsen.

In [None]:
options(repr.plot.width=17)

hom <- read.table("~/kenya2024/Inbreeding_ROH/wildebeest_ROH.hom",header=TRUE)
hom <- subset(hom,KB>100)  # keep ROHs longer than 100 kb
# count ROHs per individual in the length classes (1,2], (2,5], (5,10] and (10,100] Mb
tab <- tapply(hom$KB/1e3,hom$IID,function(x) table(cut(x,c(1e6,2e6,5e6,1e7,1e8)/1e6)))

res <- do.call(rbind,tab)
barplot(t(res),col=1:7+1,las=2,ylab="Number of ROHs")
legend("topright",fill=1:length(tab[[1]])+1,legend=names(tab[[1]]),horiz=TRUE,title="Size of ROH in Mb")



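# sum the ROH lengths (in Mb) per individual within each length class
tab2 <- tapply(hom$KB/1e3,hom$IID,function(x) tapply(x,cut(x,c(1e6,2e6,5e6,1e7,1e8)/1e6),sum))

res2 <- do.call(rbind,tab2)
res2[is.na(res2)] <- 0  # length classes without any ROHs give NA; treat them as 0
# divide by the autosomal genome size in Mb (2e3 Mb, as in the FROH plot)
barplot(t(res2)/2e3,col=1:7+1,las=2,ylab="Fraction of genome")
legend("top",fill=1:length(tab2[[1]])+1,legend=names(tab2[[1]]),horiz=TRUE,title="Size of ROH in Mb")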
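
The binning step behind the first barplot can be reproduced outside R as a sanity check. The sketch below is an illustration under assumptions, not part of the original script: it uses a made-up `.hom` file in the PLINK 1.9 layout (IID in column 2, KB in column 9) and a hypothetical helper `roh_classes`, and counts ROHs per individual in the same length classes, with each class open on the left and closed on the right, as in R's `cut`.

```shell
# Toy stand-in for a .hom file (made-up values)
cat > toy_bins.hom <<'EOF'
 FID    IID PHE CHR  SNP1  SNP2     POS1     POS2       KB NSNP DENSITY  PHOM  PHET
popA  ind_1  -9   1  snp1  snp2  1000000  2500000   1500.0  200   7.500 0.998 0.002
popA  ind_1  -9   2  snp3  snp4  2000000  3800000   1800.0  250   7.200 0.995 0.005
popA  ind_1  -9   3  snp5  snp6  1000000  7000000   6000.0  800   7.500 0.997 0.003
popB  ind_2  -9   1  snp7  snp8  4000000 16000000  12000.0 1600   7.500 0.999 0.001
popB  ind_2  -9   2  snp9 snp10  1000000  1500000    500.0   70   7.143 0.996 0.004
EOF

# Count ROHs per individual and length class (in Mb); ROHs outside
# (1,100] Mb are skipped, as they fall outside the breaks used in R.
roh_classes() {
    awk 'NR > 1 {
        mb = $9 / 1000
        if      (mb > 1  && mb <= 2)   cls = "(1,2]"
        else if (mb > 2  && mb <= 5)   cls = "(2,5]"
        else if (mb > 5  && mb <= 10)  cls = "(5,10]"
        else if (mb > 10 && mb <= 100) cls = "(10,100]"
        else next
        n[$2 " " cls]++
    }
    END { for (k in n) print k, n[k] }' "$1" | LC_ALL=C sort
}

roh_classes toy_bins.hom
```

Each output line holds the individual, the length class and the number of ROHs in that class, which corresponds to one coloured segment of one bar in the first plot.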

## Interpretation of wildebeest ROHs 
 - Which population needs more protection based on inbreeding values?
 - Which population has lower inbreeding and why?
 - How inbred are wildebeest compared to other Alcelaphines such as hirola, hartebeest, topi or blesbok?