# Characterizing CpG Methylation

In this notebook, general methylation landscapes in *Montipora capitata* and *Pocillopora acuta* will be characterized based on WGBS, RRBS, and MBD-BSseq data.

1. Download coverage files
2. Characterize methylation for each CpG dinucleotide
3. Characterize genomic locations of all sequenced data, methylated CpGs, sparsely methylated CpGs, and unmethylated CpGs for each sequencing type
4. Identify methylation islands for each species
5. Characterize genomic location of methylation islands

## 0. Set working directory

In [1]:
!pwd

/Users/yaaminivenkataraman/Documents/Meth_Compare/scripts


In [2]:
cd ../analyses/

/Users/yaaminivenkataraman/Documents/Meth_Compare/analyses


In [3]:
!mkdir Characterizing-CpG-Methylation

In [4]:
cd Characterizing-CpG-Methylation/

/Users/yaaminivenkataraman/Documents/Meth_Compare/analyses/Characterizing-CpG-Methylation


## 1. Download coverage files

In [None]:
#MASTER MC FILE

In [None]:
#MASTER PA FILE

## 2. Characterize methylation for each CpG dinucleotide

- Methylated: ≥ 50% methylation
- Sparsely methylated: > 10% and < 50% methylation
- Unmethylated: ≤ 10% methylation
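
The three categories can also be assigned in a single `awk` pass, which guarantees that every CpG lands in exactly one file. The following is a minimal sketch, assuming a four-column bedgraph (chromosome, start, end, percent methylation). The file name `example.bedgraph`, its contents, and the output names are hypothetical, and the cutoffs mirror the `awk` commands used in this section.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Work in a scratch directory so nothing in the analysis folder is touched
workdir=$(mktemp -d)
cd "${workdir}"

# Hypothetical bedgraph: chromosome, start, end, percent methylation
printf 'chr1\t10\t12\t85.0\nchr1\t40\t42\t50.0\nchr1\t90\t92\t25.5\nchr2\t5\t7\t10.0\nchr2\t30\t32\t0.0\n' > example.bedgraph

# Assign each CpG to exactly one category while reading the file once
awk -v OFS='\t' '{
    if ($4 >= 50)      out = "example-Meth.bedgraph"
    else if ($4 > 10)  out = "example-sparseMeth.bedgraph"
    else               out = "example-unMeth.bedgraph"
    print $1, $2, $3, $4 > out
}' example.bedgraph

# Line counts per category should add up to the number of input loci
meth=$(wc -l < example-Meth.bedgraph | tr -d ' ')
sparse=$(wc -l < example-sparseMeth.bedgraph | tr -d ' ')
unmeth=$(wc -l < example-unMeth.bedgraph | tr -d ' ')
total=$(wc -l < example.bedgraph | tr -d ' ')
echo "methylated=${meth} sparse=${sparse} unmethylated=${unmeth} total=${total}"
```

Comparing the summed category counts with the input line count is a quick check that no locus was dropped or counted twice. Note that this sketch writes tab-delimited output, whereas the commands in this section write space-delimited output.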

### 2a. *M. capitata*

#### Methylated loci

In [None]:
!awk '{if ($4 >= 50) { print $1, $2, $3, $4 }}' ${f} > ${f}-Meth.bedgraph-MC

In [None]:
!head ${f}-Meth.bedgraph-MC
!wc -l ${f}-Meth.bedgraph-MC

#### Sparsely methylated loci

In [None]:
!awk '{if ($4 < 50) { print $1, $2, $3, $4}}' ${f} \
| awk '{if ($4 > 10) { print $1, $2, $3, $4 }}' > ${f}-sparseMeth.bedgraph-MC

In [None]:
!head ${f}-sparseMeth.bedgraph-MC
!wc -l ${f}-sparseMeth.bedgraph-MC

#### Unmethylated loci

In [None]:
!awk '{if ($4 <= 10) { print $1, $2, $3, $4 }}' ${f} > ${f}-unMeth.bedgraph-MC

In [None]:
!head ${f}-unMeth.bedgraph-MC
!wc -l ${f}-unMeth.bedgraph-MC

### 2b. *P. acuta*

#### Methylated loci

In [None]:
!awk '{if ($4 >= 50) { print $1, $2, $3, $4 }}' ${f} > ${f}-Meth.bedgraph-PA

In [None]:
!head ${f}-Meth.bedgraph-PA
!wc -l ${f}-Meth.bedgraph-PA

#### Sparsely methylated loci

In [None]:
!awk '{if ($4 < 50) { print $1, $2, $3, $4}}' ${f} \
| awk '{if ($4 > 10) { print $1, $2, $3, $4 }}' > ${f}-sparseMeth.bedgraph-PA

In [None]:
!head ${f}-sparseMeth.bedgraph-PA
!wc -l ${f}-sparseMeth.bedgraph-PA

#### Unmethylated loci

In [None]:
!awk '{if ($4 <= 10) { print $1, $2, $3, $4 }}' ${f} > ${f}-unMeth.bedgraph-PA

In [None]:
!head ${f}-unMeth.bedgraph-PA
!wc -l ${f}-unMeth.bedgraph-PA

## 3. Characterize genomic locations of CpGs

### 3a. Create BEDfiles

In [None]:
!find *bedgraph-MC

In [None]:
%%bash

for f in *bedgraph-MC
do
    awk '{print $1"\t"$2"\t"$3}' ${f} > ${f}.bed-MC
    wc -l ${f}.bed-MC
done

In [None]:
#Confirm file creation
!head .bed-MC

In [None]:
!find *bedgraph-PA

In [None]:
%%bash

for f in *bedgraph-PA
do
    awk '{print $1"\t"$2"\t"$3}' ${f} > ${f}.bed-PA
    wc -l ${f}.bed-PA
done

In [None]:
#Confirm file creation
!head .bed-PA

### 3b. Set variable paths

In [None]:
bedtoolsDirectory = ""

In [None]:
mcPromoters = ""

In [None]:
mcExonUTR = ""

In [None]:
mcExons = ""

In [None]:
mcIntrons = ""

In [None]:
mcGenes = ""

In [None]:
mcTransElem = ""

In [None]:
mcIntergenic = ""

In [None]:
paPromoters = ""

In [None]:
paExonUTR = ""

In [None]:
paExons = ""

In [None]:
paIntrons = ""

In [None]:
paGenes = ""

In [None]:
paTransElem = ""

In [None]:
paIntergenic = ""

### 3c. Exons

#### *M. capitata*

In [None]:
!find *bed-MC

In [None]:
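%%bash -s "$bedtoolsDirectory" "$mcExons"

#Python variables are not expanded inside a %%bash cell, so pass them as positional arguments
#$1 = bedtools directory, $2 = exon track
for f in *.bed-MC
do
  ${1}intersectBed \
  -wb \
  -a ${f} \
  -b ${2} \
  > ${f}-mcExon.txt
done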

In [None]:
%%bash

for f in *mcExon.txt
do
    wc -l ${f}
done
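
To make the overlap logic concrete, the sketch below reproduces in plain `awk` what the intersection step is expected to report for BED input: the overlapping portion of each CpG, followed by the exon entry it overlaps (the `-wb` behaviour). It is an illustration on hypothetical toy files (`cpg.bed`, `exons.bed`), not a replacement for `intersectBed`. It assumes 0-based, half-open BED coordinates and compares every CpG against every exon, which would be far too slow for genome-scale data.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Work in a scratch directory so nothing in the analysis folder is touched
workdir=$(mktemp -d)
cd "${workdir}"

# Hypothetical CpG positions (BED: chromosome, start, end)
printf 'chr1\t10\t12\nchr1\t150\t152\nchr2\t20\t22\n' > cpg.bed
# Hypothetical exon annotations (BED: chromosome, start, end)
printf 'chr1\t0\t100\nchr2\t0\t21\n' > exons.bed

awk -v OFS='\t' '
# First file: remember every exon
NR == FNR { exChrom[NR] = $1; exStart[NR] = $2 + 0; exEnd[NR] = $3 + 0; n = NR; next }
# Second file: compare each CpG with every stored exon
{
    cStart = $2 + 0; cEnd = $3 + 0
    for (i = 1; i <= n; i++) {
        # Half-open intervals overlap when each one starts before the other ends
        if ($1 == exChrom[i] && cStart < exEnd[i] && exStart[i] < cEnd) {
            s = (cStart > exStart[i]) ? cStart : exStart[i]
            e = (cEnd < exEnd[i]) ? cEnd : exEnd[i]
            print $1, s, e, exChrom[i], exStart[i], exEnd[i]
        }
    }
}' exons.bed cpg.bed > cpg-exon-overlaps.txt

cat cpg-exon-overlaps.txt
overlaps=$(wc -l < cpg-exon-overlaps.txt | tr -d ' ')
echo "CpGs overlapping exons: ${overlaps}"
```

In the toy data, the CpG at `chr1:150-152` lies outside every exon and is dropped, while the CpG at `chr2:20-22` is clipped to the part that falls inside the exon.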

#### *P. acuta*

In [None]:
!find *bed-PA

In [None]:
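%%bash -s "$bedtoolsDirectory" "$paExons"

#Python variables are not expanded inside a %%bash cell, so pass them as positional arguments
#$1 = bedtools directory, $2 = exon track
for f in *.bed-PA
do
  ${1}intersectBed \
  -wb \
  -a ${f} \
  -b ${2} \
  > ${f}-paExon.txt
done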

In [None]:
%%bash

for f in *paExon.txt
do
    wc -l ${f}
done

### 3d. Introns

### 3e. Genes

### 3f. Promoters

### 3g. Transposable Elements

### 3h. Intergenic

## 4. Identify methylation islands

To identify methylation islands using the method from Jeong et al. (2018), define:

- starting size of the methylation window: 500 bp
- minimum fraction of methylated CpGs required within the window to be accepted: 0.02
- step size to extend the accepted window as long as the mCpG fraction is met: 50 bp
- mCpG file: input with mCpG chromosome and bp position
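
The Perl script `methyl_island_sliding_window.pl` is not shown in this notebook, so the sketch below is only one possible reading of the four inputs above, written in `awk`; it should not be taken as a description of that script. It assumes that the fraction is the number of mCpGs divided by the window length in bp (under that assumption, a 500 bp window at 0.02 needs at least 10 mCpGs), that each window starts at an mCpG, and that the input is sorted by chromosome and position. The file name `mCpG.txt`, its contents, and the toy parameter values are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Work in a scratch directory so nothing in the analysis folder is touched
workdir=$(mktemp -d)
cd "${workdir}"

# Hypothetical mCpG positions (chromosome, position), sorted by chromosome then position
printf 'chr1\t100\nchr1\t102\nchr1\t104\nchr1\t106\nchr1\t113\nchr1\t400\nchr2\t50\n' > mCpG.txt

# Toy parameters so the example can be followed by hand;
# the notebook itself uses 500 (window), 0.02 (fraction), and 50 (step)
window=10
fraction=0.3
step=5

awk -v W="${window}" -v F="${fraction}" -v S="${step}" -v OFS='\t' '
# Number of stored positions in the half-open interval [lo, hi), scanning from index "from"
function count_in(lo, hi, from,    j, c) {
    c = 0
    for (j = from; j <= n && pos[j] < hi; j++) if (pos[j] >= lo) c++
    return c
}
# Scan one chromosome and print every accepted island
function scan(chrom,    i, lo, hi, c, c2) {
    i = 1
    while (i <= n) {
        lo = pos[i]; hi = lo + W
        c = count_in(lo, hi, i)
        if (c / (hi - lo) < F) { i++; continue }
        # Extend in steps while the mCpG fraction still holds
        while (1) {
            c2 = count_in(lo, hi + S, i)
            if (c2 / (hi + S - lo) < F) break
            hi += S; c = c2
        }
        print chrom, lo, hi, c
        # Resume at the first mCpG beyond the island
        while (i <= n && pos[i] < hi) i++
    }
}
# A new chromosome starts: process the previous one and reset the buffer
$1 != current { if (n > 0) scan(current); current = $1; n = 0 }
{ pos[++n] = $2 + 0 }
END { if (n > 0) scan(current) }
' mCpG.txt > islands.tab

cat islands.tab
```

The output columns follow the layout used later in this section: chromosome, island start, island end, and number of mCpGs in the island.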

### 4a. *M. capitata*

In [None]:
#Modify mCpG file by removing the third column that is not needed for methylation island analysis
!awk '{print $1"\t"$2}' .bed-MC > .bed-MC-Reduced

In [None]:
#Identify methylation islands using 0.02 mCpG fraction
! ./methyl_island_sliding_window.pl 500 0.02 50 .bed-MC-Reduced \
> MC-Methylation-Islands-500_0.02_50.tab

In [None]:
#Filter by MI length and print MI length in a new column
!awk '{if ($3-$2 >= 500) { print $1"\t"$2"\t"$3"\t"$4"\t"$3-$2}}' MC-Methylation-Islands-500_0.02_50.tab \
> MC-Methylation-Islands-500_0.02_50-filtered.tab
!head MC-Methylation-Islands-500_0.02_50-filtered.tab
! wc -l MC-Methylation-Islands-500_0.02_50-filtered.tab

In [None]:
#Count max mCpG in an island
#Count min mCpG in an island
!awk 'NR==1{max = $4 + 0; next} {if ($4 > max) max = $4;} END {print max}' \
MC-Methylation-Islands-500_0.02_50-filtered.tab
!awk 'NR==1{min = $4 + 0; next} {if ($4 < min) min = $4;} END {print min}' \
MC-Methylation-Islands-500_0.02_50-filtered.tab
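
Both extremes can also be collected while reading the table once. This is a minimal sketch, assuming the filtered table keeps the mCpG count in column 4; the file name `islands-filtered.tab` and its contents are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Work in a scratch directory so nothing in the analysis folder is touched
workdir=$(mktemp -d)
cd "${workdir}"

# Hypothetical filtered island table: chromosome, start, end, mCpG count, island length
printf 'chr1\t100\t700\t14\t600\nchr1\t2000\t2500\t10\t500\nchr2\t50\t1050\t31\t1000\n' > islands-filtered.tab

# Track the minimum and maximum of column 4 in a single pass
summary=$(awk 'NR == 1 { min = max = $4 + 0 }
{ v = $4 + 0; if (v < min) min = v; if (v > max) max = v }
END { if (NR > 0) print min, max }' islands-filtered.tab)

echo "min and max mCpG per island: ${summary}"
```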

In [None]:
#Create tab-delimited BEDfile without additional information
#Write to a new file: redirecting to the input file would empty it before awk reads it
!awk '{print $1"\t"$2"\t"$3}' MC-Methylation-Islands-500_0.02_50-filtered.tab \
> MC-Methylation-Islands-500_0.02_50-filtered.tab.bed

### 4b. *P. acuta*

In [None]:
#Modify mCpG file by removing the third column that is not needed for methylation island analysis
!awk '{print $1"\t"$2}' .bed-PA > .bed-PA-Reduced

In [None]:
#Identify methylation islands using 0.02 mCpG fraction (same as original paper)
! ./methyl_island_sliding_window.pl 500 0.02 50 .bed-PA-Reduced \
> PA-Methylation-Islands-500_0.02_50.tab

In [None]:
#Filter by MI length and print MI length in a new column
!awk '{if ($3-$2 >= 500) { print $1"\t"$2"\t"$3"\t"$4"\t"$3-$2}}' PA-Methylation-Islands-500_0.02_50.tab \
> PA-Methylation-Islands-500_0.02_50-filtered.tab
!head PA-Methylation-Islands-500_0.02_50-filtered.tab
! wc -l PA-Methylation-Islands-500_0.02_50-filtered.tab

In [None]:
#Count max mCpG in an island
#Count min mCpG in an island
!awk 'NR==1{max = $4 + 0; next} {if ($4 > max) max = $4;} END {print max}' \
PA-Methylation-Islands-500_0.02_50-filtered.tab
!awk 'NR==1{min = $4 + 0; next} {if ($4 < min) min = $4;} END {print min}' \
PA-Methylation-Islands-500_0.02_50-filtered.tab

## 5. Characterize genomic location of methylation islands

### 5a. Set variable paths

In [None]:
mcMethylationIslands = ""

In [None]:
paMethylationIslands = ""

### 5b. Genes

In [None]:
! {bedtoolsDirectory}intersectBed \
-wo \
-a {mcMethylationIslands} \
-b {mcGenes} \
> mcMethylationIslands-Genes.txt

In [None]:
!head mcMethylationIslands-Genes.txt
!wc -l mcMethylationIslands-Genes.txt

In [None]:
! {bedtoolsDirectory}intersectBed \
-wo \
-a {paMethylationIslands} \
-b {paGenes} \
> paMethylationIslands-Genes.txt

In [None]:
!head paMethylationIslands-Genes.txt
!wc -l paMethylationIslands-Genes.txt

### 5c. Intergenic

In [None]:
! {bedtoolsDirectory}intersectBed \
-wo \
-a {mcMethylationIslands} \
-b {mcIntergenic} \
> mcMethylationIslands-Intergenic.txt

In [None]:
!head mcMethylationIslands-Intergenic.txt
!wc -l mcMethylationIslands-Intergenic.txt

In [None]:
! {bedtoolsDirectory}intersectBed \
-wo \
-a {paMethylationIslands} \
-b {paIntergenic} \
> paMethylationIslands-Intergenic.txt

In [None]:
!head paMethylationIslands-Intergenic.txt
!wc -l paMethylationIslands-Intergenic.txt