# Characterizing CpG Methylation

In this notebook, general methylation landscapes in *Montipora capitata* and *Pocillopora acuta* will be characterized based on WGBS, RRBS, and MBD-BSseq data.

1. Download coverage files
2. Characterize methylation for each CpG dinucleotide
3. Characterize genomic locations of all sequenced data, methylated CpGs, sparsely methylated CpGs, and unmethylated CpGs for each sequencing type
4. Identify methylation islands for each species
5. Characterize genomic location of methylation islands

## 0. Set working directory

In [1]:
!pwd

/Users/yaaminivenkataraman/Documents/Meth_Compare/scripts


In [2]:
cd ../analyses/

/Users/yaaminivenkataraman/Documents/Meth_Compare/analyses


In [3]:
#!mkdir Characterizing-CpG-Methylation

In [3]:
cd Characterizing-CpG-Methylation/

/Users/yaaminivenkataraman/Documents/Meth_Compare/analyses/Characterizing-CpG-Methylation


## 1. Download coverage files

### 1a. *M. capitata*

In [16]:
#Download Mcap WGBS and MBD-BS 10x sample bedgraphs
!wget -r -l1 --no-parent -A "*_10x.bedgraph" https://gannet.fish.washington.edu/tmp/Mcap_full-u1M/dedup/

--2020-03-07 13:14:04--  https://gannet.fish.washington.edu/tmp/Mcap_full-u1M/dedup/
Resolving gannet.fish.washington.edu... 128.95.149.52
Connecting to gannet.fish.washington.edu|128.95.149.52|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘gannet.fish.washington.edu/tmp/Mcap_full-u1M/dedup/index.html.tmp’

gannet.fish.washing     [ <=>                ]  49.55K  --.-KB/s    in 0.006s  

2020-03-07 13:14:05 (7.89 MB/s) - ‘gannet.fish.washington.edu/tmp/Mcap_full-u1M/dedup/index.html.tmp’ saved [50740]

Loading robots.txt; please ignore errors.
--2020-03-07 13:14:05--  https://gannet.fish.washington.edu/robots.txt
Reusing existing connection to gannet.fish.washington.edu:443.
HTTP request sent, awaiting response... 404 Not Found
2020-03-07 13:14:05 ERROR 404: Not Found.

Removing gannet.fish.washington.edu/tmp/Mcap_full-u1M/dedup/index.html.tmp since it should be rejected.

--2020-03-07 13:14:05--  https://gannet.fish.washing

In [17]:
#Move samples from the directory structure mirrored from gannet to the current directory
!mv gannet.fish.washington.edu/tmp/Mcap_full-u1M/dedup/* .

In [20]:
#Remove empty directory
!rm -r gannet.fish.washington.edu/

In [76]:
#Download Mcap RRBS 10x sample bedgraphs
!wget -r -l1 --no-parent -A "*_10x.bedgraph" https://gannet.fish.washington.edu/tmp/Mcap_full-u1M/nodedup/

--2020-03-07 14:36:50--  https://gannet.fish.washington.edu/tmp/Mcap_full-u1M/nodedup/
Resolving gannet.fish.washington.edu... 128.95.149.52
Connecting to gannet.fish.washington.edu|128.95.149.52|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘gannet.fish.washington.edu/tmp/Mcap_full-u1M/nodedup/index.html.tmp’

gannet.fish.washing     [ <=>                ]  23.48K  --.-KB/s    in 0.005s  

2020-03-07 14:36:50 (4.74 MB/s) - ‘gannet.fish.washington.edu/tmp/Mcap_full-u1M/nodedup/index.html.tmp’ saved [24044]

Loading robots.txt; please ignore errors.
--2020-03-07 14:36:50--  https://gannet.fish.washington.edu/robots.txt
Reusing existing connection to gannet.fish.washington.edu:443.
HTTP request sent, awaiting response... 404 Not Found
2020-03-07 14:36:50 ERROR 404: Not Found.

Removing gannet.fish.washington.edu/tmp/Mcap_full-u1M/nodedup/index.html.tmp since it should be rejected.

--2020-03-07 14:36:50--  https://gannet.fish

In [77]:
#Move samples from the directory structure mirrored from gannet to the current directory
!mv gannet.fish.washington.edu/tmp/Mcap_full-u1M/nodedup/* .

In [78]:
#Remove empty directory
!rm -r gannet.fish.washington.edu/

In [80]:
!find *bedgraph

Meth10.fastp-trim.202003063900_R1_001_bismark_bt2_pe._10x.bedgraph
Meth11.fastp-trim.202003065734_R1_001_bismark_bt2_pe._10x.bedgraph
Meth12.fastp-trim.202003060645_R1_001_bismark_bt2_pe._10x.bedgraph
Meth13.fastp-trim.202003062040_R1_001_bismark_bt2_pe_10x.bedgraph
Meth14.fastp-trim.202003064415_R1_001_bismark_bt2_pe_10x.bedgraph
Meth15.fastp-trim.202003065503_R1_001_bismark_bt2_pe_10x.bedgraph
Meth16.fastp-trim.202003062412_R1_001_bismark_bt2_pe._10x.bedgraph
Meth17.fastp-trim.202003063731_R1_001_bismark_bt2_pe._10x.bedgraph
Meth18.fastp-trim.202003065117_R1_001_bismark_bt2_pe._10x.bedgraph


In [81]:
%%bash

#Copy each bedgraph to a new file with a species suffix
for f in *bedgraph
do
    cp "${f}" "${f}-Mcap"
done

In [82]:
#Columns: chr, start, end, %meth
!head Meth10.fastp-trim.202003063900_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap

2	1779837	1779839	0.000000
2	1779853	1779855	0.000000
2	1779898	1779900	0.000000
2	1804233	1804235	0.000000
2	1804297	1804299	0.000000
2	1804303	1804305	4.000000
2	1804306	1804308	0.000000
2	1804322	1804324	0.000000
2	1804347	1804349	7.142857
2	1804359	1804361	0.000000


In [83]:
!wc -l *bedgraph-Mcap

    9099 Meth10.fastp-trim.202003063900_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap
   11090 Meth11.fastp-trim.202003065734_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap
    8966 Meth12.fastp-trim.202003060645_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap
 1405400 Meth13.fastp-trim.202003062040_R1_001_bismark_bt2_pe_10x.bedgraph-Mcap
 1177284 Meth14.fastp-trim.202003064415_R1_001_bismark_bt2_pe_10x.bedgraph-Mcap
 1238727 Meth15.fastp-trim.202003065503_R1_001_bismark_bt2_pe_10x.bedgraph-Mcap
    7230 Meth16.fastp-trim.202003062412_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap
    4776 Meth17.fastp-trim.202003063731_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap
    3739 Meth18.fastp-trim.202003065117_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap
 3866311 total


### 1b. *P. acuta*

## 2. Characterize methylation for each CpG dinucleotide

- Methylated: ≥ 50% methylation
- Sparsely methylated: > 10% and < 50% methylation
- Unmethylated: ≤ 10% methylation
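The awk filters in this section place loci at exactly 50% in the methylated set and loci at exactly 10% in the unmethylated set. A minimal Python sketch of the same cutoffs is given here as a cross-check; the function names are hypothetical and are not used elsewhere in this notebook.

```python
def classify_methylation(percent_meth):
    """Assign a CpG to a category using the same cutoffs as the awk filters."""
    if percent_meth >= 50:
        return "methylated"
    if percent_meth > 10:
        return "sparsely methylated"
    return "unmethylated"


def count_categories(lines):
    """Count categories for whitespace-delimited lines: chr, start, end, %meth."""
    counts = {"methylated": 0, "sparsely methylated": 0, "unmethylated": 0}
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        counts[classify_methylation(float(fields[3]))] += 1
    return counts


# Four lines taken from the bedgraph output shown in this notebook
example = [
    "2\t1804303\t1804305\t4.000000",
    "1\t893246\t893248\t10.000000",
    "4\t2282559\t2282561\t17.647059",
    "9\t525025\t525027\t50.000000",
]
print(count_categories(example))
```

Because every locus falls into exactly one branch, the three counts should always sum to the number of CpGs with data.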

### 2a. *M. capitata*

#### Methylated loci

In [86]:
%%bash
for f in *Mcap
do
    awk '{if ($4 >= 50) { print $1, $2, $3, $4 }}' ${f} \
    > ${f}-Meth
done

In [87]:
!head *Mcap-Meth

==> Meth10.fastp-trim.202003063900_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-Meth <==
9 525025 525027 50.000000
16 1095928 1095930 50.000000
16 1095987 1095989 54.545455
16 1095993 1095995 66.666667
16 1937017 1937019 53.846154
26 1550319 1550321 61.538462
26 1550325 1550327 64.285714
26 1550327 1550329 53.846154
28 407411 407413 50.000000
28 407920 407922 50.000000

==> Meth11.fastp-trim.202003065734_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-Meth <==
2 81840 81842 100.000000
2 81845 81847 100.000000
2 81865 81867 92.857143
2 81900 81902 91.304348
2 81997 81999 100.000000
2 82001 82003 100.000000
2 82009 82011 100.000000
2 82292 82294 70.000000
2 82298 82300 100.000000
2 2675474 2675476 74.074074

==> Meth12.fastp-trim.202003060645_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-Meth <==
7 87179 87181 63.636364
7 2064280 2064282 100.000000
10 732017 732019 60.000000
10 732020 732022 100.000000
10 1190063 1190065 83.333333
10 1190118 1190120 95.454545
10 1190131 11

In [88]:
!wc -l *Mcap-Meth

     493 Meth10.fastp-trim.202003063900_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-Meth
     827 Meth11.fastp-trim.202003065734_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-Meth
     370 Meth12.fastp-trim.202003060645_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-Meth
  170811 Meth13.fastp-trim.202003062040_R1_001_bismark_bt2_pe_10x.bedgraph-Mcap-Meth
  153132 Meth14.fastp-trim.202003064415_R1_001_bismark_bt2_pe_10x.bedgraph-Mcap-Meth
  159475 Meth15.fastp-trim.202003065503_R1_001_bismark_bt2_pe_10x.bedgraph-Mcap-Meth
    2828 Meth16.fastp-trim.202003062412_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-Meth
     956 Meth17.fastp-trim.202003063731_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-Meth
     942 Meth18.fastp-trim.202003065117_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-Meth
  489834 total


#### Sparsely methylated loci

In [89]:
%%bash
for f in *Mcap
do
    awk '{if ($4 > 10 && $4 < 50) { print $1, $2, $3, $4 }}' ${f} \
    > ${f}-sparseMeth
done

In [90]:
!head *Mcap-sparseMeth

==> Meth10.fastp-trim.202003063900_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-sparseMeth <==
4 2282559 2282561 17.647059
4 2653035 2653037 20.000000
6 479467 479469 18.181818
10 1079591 1079593 20.000000
10 1222877 1222879 20.000000
10 1239917 1239919 15.384615
10 1240019 1240021 20.000000
13 2124876 2124878 18.181818
14 495499 495501 35.294118
16 593911 593913 16.666667

==> Meth11.fastp-trim.202003065734_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-sparseMeth <==
2 2675448 2675450 16.666667
2 2675927 2675929 20.000000
2 2679017 2679019 41.666667
2 2679956 2679958 16.666667
2 2680034 2680036 18.750000
2 2680170 2680172 42.105263
2 2680234 2680236 28.571429
4 2282559 2282561 14.285714
4 2933639 2933641 16.666667
10 1079949 1079951 27.272727

==> Meth12.fastp-trim.202003060645_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-sparseMeth <==
4 2282559 2282561 11.764706
4 2282561 2282563 11.111111
4 2762159 2762161 11.111111
4 2762294 2762296 11.111111
5 1381724 1381726 15.3

In [91]:
!wc -l *Mcap-sparseMeth

    1116 Meth10.fastp-trim.202003063900_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-sparseMeth
    1422 Meth11.fastp-trim.202003065734_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-sparseMeth
    1043 Meth12.fastp-trim.202003060645_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-sparseMeth
  141418 Meth13.fastp-trim.202003062040_R1_001_bismark_bt2_pe_10x.bedgraph-Mcap-sparseMeth
  132172 Meth14.fastp-trim.202003064415_R1_001_bismark_bt2_pe_10x.bedgraph-Mcap-sparseMeth
  138554 Meth15.fastp-trim.202003065503_R1_001_bismark_bt2_pe_10x.bedgraph-Mcap-sparseMeth
     840 Meth16.fastp-trim.202003062412_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-sparseMeth
     650 Meth17.fastp-trim.202003063731_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-sparseMeth
     532 Meth18.fastp-trim.202003065117_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-sparseMeth
  417747 total


#### Unmethylated loci

In [92]:
%%bash
for f in *Mcap
do
    awk '{if ($4 <= 10) { print $1, $2, $3, $4 }}' ${f} \
    > ${f}-unMeth
done

In [93]:
!head *Mcap-unMeth

==> Meth10.fastp-trim.202003063900_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-unMeth <==
2 1779837 1779839 0.000000
2 1779853 1779855 0.000000
2 1779898 1779900 0.000000
2 1804233 1804235 0.000000
2 1804297 1804299 0.000000
2 1804303 1804305 4.000000
2 1804306 1804308 0.000000
2 1804322 1804324 0.000000
2 1804347 1804349 7.142857
2 1804359 1804361 0.000000

==> Meth11.fastp-trim.202003065734_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-unMeth <==
1 893230 893232 0.000000
1 893246 893248 10.000000
1 893258 893260 0.000000
1 893276 893278 0.000000
2 2437213 2437215 0.000000
2 2437231 2437233 0.000000
2 2675538 2675540 6.896552
2 2675971 2675973 0.000000
2 2675985 2675987 0.000000
2 2676029 2676031 0.000000

==> Meth12.fastp-trim.202003060645_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-unMeth <==
1 1620404 1620406 9.090909
2 1704736 1704738 0.000000
2 1805539 1805541 0.000000
2 1805567 1805569 0.000000
2 1806267 1806269 0.000000
2 1806331 1806333 0.000000
2 1806337 1

In [94]:
!wc -l *Mcap-unMeth

    7490 Meth10.fastp-trim.202003063900_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-unMeth
    8841 Meth11.fastp-trim.202003065734_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-unMeth
    7553 Meth12.fastp-trim.202003060645_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-unMeth
 1093171 Meth13.fastp-trim.202003062040_R1_001_bismark_bt2_pe_10x.bedgraph-Mcap-unMeth
  891980 Meth14.fastp-trim.202003064415_R1_001_bismark_bt2_pe_10x.bedgraph-Mcap-unMeth
  940698 Meth15.fastp-trim.202003065503_R1_001_bismark_bt2_pe_10x.bedgraph-Mcap-unMeth
    3562 Meth16.fastp-trim.202003062412_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-unMeth
    3170 Meth17.fastp-trim.202003063731_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-unMeth
    2265 Meth18.fastp-trim.202003065117_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-unMeth
 2958730 total


#### Summary

| **Sample** 	| **Method** 	| **CpG with Data** 	| **Methylated CpG** 	| **Sparsely Methylated CpG** 	| **Unmethylated CpG** 	|
|:----------:	|:----------:	|:-----------------:	|:------------------:	|:---------------------------:	|:--------------------:	|
|     10     	|    WGBS    	|        9099       	|         493        	|             1116            	|         7490         	|
|     11     	|    WGBS    	|       11090       	|         827        	|             1422            	|         8841         	|
|     12     	|    WGBS    	|        8966       	|         370        	|             1043            	|         7553         	|
|     13     	|    RRBS    	|      1405400      	|       170811       	|            141418           	|        1093171       	|
|     14     	|    RRBS    	|      1177284      	|       153132       	|            132172           	|        891980        	|
|     15     	|    RRBS    	|      1238727      	|       159475       	|            138554           	|        940698        	|
|     16     	|  MBD-BSseq 	|        7230       	|        2828        	|             840             	|         3562         	|
|     17     	|  MBD-BSseq 	|        4776       	|         956        	|             650             	|         3170         	|
|     18     	|  MBD-BSseq 	|        3739       	|         942        	|             532             	|         2265         	|
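
The raw counts differ by orders of magnitude between methods, so proportions may be easier to compare. The sketch below copies the counts from the table above, checks that the three categories sum to the number of CpGs with data, and converts them to percentages. The dictionary and function names are hypothetical.

```python
# sample: (method, CpG with data, methylated, sparsely methylated, unmethylated)
mcap_counts = {
    10: ("WGBS", 9099, 493, 1116, 7490),
    11: ("WGBS", 11090, 827, 1422, 8841),
    12: ("WGBS", 8966, 370, 1043, 7553),
    13: ("RRBS", 1405400, 170811, 141418, 1093171),
    14: ("RRBS", 1177284, 153132, 132172, 891980),
    15: ("RRBS", 1238727, 159475, 138554, 940698),
    16: ("MBD-BSseq", 7230, 2828, 840, 3562),
    17: ("MBD-BSseq", 4776, 956, 650, 3170),
    18: ("MBD-BSseq", 3739, 942, 532, 2265),
}


def category_percentages(total, meth, sparse, unmeth):
    """Return (methylated, sparse, unmethylated) as percentages of total."""
    if meth + sparse + unmeth != total:
        raise ValueError("categories do not sum to the number of CpGs with data")
    return tuple(round(100 * n / total, 1) for n in (meth, sparse, unmeth))


for sample, (method, *counts) in mcap_counts.items():
    print(sample, method, category_percentages(*counts))
```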

### 2b. *P. acuta*

## 3. Characterize genomic locations of CpGs

### 3a. Create BEDfiles

In [61]:
%%bash

for f in *bedgraph-Mcap*
do
    awk '{print $1"\t"$2"\t"$3}' ${f} > ${f}.bed
    wc -l ${f}.bed
done

    9099 Meth10.fastp-trim.202003063900_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap.bed
     493 Meth10.fastp-trim.202003063900_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-Meth.bed
    1116 Meth10.fastp-trim.202003063900_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-sparseMeth.bed
    7490 Meth10.fastp-trim.202003063900_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-unMeth.bed
   11090 Meth11.fastp-trim.202003065734_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap.bed
     827 Meth11.fastp-trim.202003065734_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-Meth.bed
    1422 Meth11.fastp-trim.202003065734_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-sparseMeth.bed
    8841 Meth11.fastp-trim.202003065734_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-unMeth.bed
    8966 Meth12.fastp-trim.202003060645_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap.bed
     370 Meth12.fastp-trim.202003060645_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-Meth.bed
    1043 Meth12.fastp-trim.202003060645_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-sparseMeth.bed
   

In [62]:
#Confirm file creation
!head Meth10.fastp-trim.202003063900_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap.bed

2	1779837	1779839
2	1779853	1779855
2	1779898	1779900
2	1804233	1804235
2	1804297	1804299
2	1804303	1804305
2	1804306	1804308
2	1804322	1804324
2	1804347	1804349
2	1804359	1804361


### 3b. Set variable paths

In [63]:
bedtoolsDirectory = "/Users/bedtools2/bin/"

In [None]:
mcPromoters = ""

In [None]:
mcExonUTR = ""

In [None]:
mcExons = ""

In [None]:
mcIntrons = ""

In [64]:
mcGenes = "../../genome-feature-files/Mcap.GFFannotation.gene.gff"

In [None]:
mcTransElem = ""

In [None]:
mcIntergenic = ""

In [None]:
paPromoters = ""

In [None]:
paExonUTR = ""

In [None]:
paExons = ""

In [None]:
paIntrons = ""

In [None]:
paGenes = ""

In [None]:
paTransElem = ""

In [None]:
paIntergenic = ""

### 3c. Check variable paths

In [71]:
!{bedtoolsDirectory}intersectBed -help

/Users/bedtools2/bin/intersectBed: line 2: 22426 Killed: 9               ${0%/*}/bedtools intersect "$@"


In [72]:
!head {mcGenes}

1	AUGUSTUS	gene	18387	18755	0.97	-	.	g21532
1	AUGUSTUS	CDS	18387	18755	0.97	-	0	transcript_id "g21532.t1"; gene_id "g21532";
1	AUGUSTUS	gene	22321	27293	0.23	-	.	g21533
1	AUGUSTUS	CDS	22321	22608	0.55	-	0	transcript_id "g21533.t1"; gene_id "g21533";
1	AUGUSTUS	intron	22609	26300	0.25	-	.	transcript_id "g21533.t1"; gene_id "g21533";
1	AUGUSTUS	CDS	26301	27293	0.29	-	0	transcript_id "g21533.t1"; gene_id "g21533";
1	AUGUSTUS	gene	37447	52266	1	+	.	g21534
1	AUGUSTUS	CDS	37447	37810	1	+	0	transcript_id "g21534.t1"; gene_id "g21534";
1	AUGUSTUS	intron	37811	45037	1	+	.	transcript_id "g21534.t1"; gene_id "g21534";
1	AUGUSTUS	CDS	45038	45208	1	+	2	transcript_id "g21534.t1"; gene_id "g21534";


In [73]:
!wc -l {mcGenes}

  458273 ../../genome-feature-files/Mcap.GFFannotation.gene.gff


### 3d. *M. capitata*

#### Genes

In [66]:
%%bash -s "$bedtoolsDirectory" "$mcGenes"

#Python variables are not expanded inside a %%bash cell, so pass them in as positional arguments
#$1 = bedtools directory, $2 = gene track
for f in *Mcap*bed
do
  "${1}intersectBed" \
  -wb \
  -a "${f}" \
  -b "${2}" \
  > "${f}-mcGenes"
done

In [None]:
#Check output
!head *mcGenes

In [None]:
#Count number of overlaps
!wc -l *mcGenes

#### Intergenic

In [None]:
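%%bash -s "$bedtoolsDirectory" "$mcGenes"

#Python variables are not expanded inside a %%bash cell, so pass them in as positional arguments
#$1 = bedtools directory, $2 = gene track
for f in *Mcap*bed
do
  "${1}intersectBed" \
  -v \
  -a "${f}" \
  -b "${2}" \
  > "${f}-mcIntergenic"
done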

In [None]:
#Check output
!head *mcIntergenic

In [None]:
#Count number of CpGs that do not overlap the gene track
!wc -l *mcIntergenic
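
The gene and intergenic steps rely on two `intersectBed` behaviors: by default one line is reported per overlap between an entry in `-a` and a feature in `-b`, while `-v` reports only the `-a` entries with no overlap at all. The sketch below illustrates that split in plain Python. It is not how bedtools is implemented, the function names are hypothetical, and the coordinates are invented toy values.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if two half-open intervals [start, end) share at least one base."""
    return a_start < b_end and b_start < a_end


def split_by_feature_overlap(cpgs, features):
    """Split CpGs into those overlapping any feature and those overlapping none.

    cpgs and features are lists of (chrom, start, end) tuples.
    """
    overlapping, non_overlapping = [], []
    for chrom, start, end in cpgs:
        hit = any(
            chrom == f_chrom and overlaps(start, end, f_start, f_end)
            for f_chrom, f_start, f_end in features
        )
        (overlapping if hit else non_overlapping).append((chrom, start, end))
    return overlapping, non_overlapping


# Toy data (hypothetical): one feature on chromosome 1 and three CpGs
toy_features = [("1", 18386, 18755)]
toy_cpgs = [("1", 18400, 18402), ("1", 20000, 20002), ("2", 18400, 18402)]
genic, intergenic = split_by_feature_overlap(toy_cpgs, toy_features)
print(genic)
print(intergenic)
```

Because the gene track shown above contains `gene`, `CDS`, and `intron` records, a single CpG can overlap several features, so the line count of a `-wb` output may exceed the number of distinct CpGs. The `-v` output has at most one line per CpG.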

### 3e. *P. acuta*

## 4. Identify methylation islands

To identify methylation islands using the method from Jeong et al. (2018), define:

- starting size of the methylation window: 500 bp
- minimum fraction of methylated CpGs required within the window to be accepted: 0.02
- step size to extend the accepted window as long as the mCpG fraction is met: 50 bp
- mCpG file: input with mCpG chromosome and bp position
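
To make the role of the three parameters concrete, here is a rough Python sketch of a greedy sliding-window scan. It is an illustration only and is not the logic of `methyl_island_sliding_window.pl`, which is not shown in this notebook. It assumes that the mCpG fraction is the number of mCpGs in the window divided by the window length in bp, and that an island ends at the last mCpG inside the accepted window. The function name `find_islands` is hypothetical.

```python
from bisect import bisect_left


def find_islands(positions, window=500, min_fraction=0.02, step=50):
    """Greedy sliding-window scan over mCpG positions on one chromosome.

    Assumption: fraction = (mCpGs inside window) / (window length in bp).
    Returns a list of (start, end, mCpG count) tuples.
    """
    positions = sorted(positions)
    islands = []
    i = 0
    while i < len(positions):
        start = positions[i]
        end = start + window
        j = bisect_left(positions, end)  # first index with position >= end
        if (j - i) / (end - start) < min_fraction:
            i += 1  # window too sparse; try the next mCpG as a start
            continue
        # Extend by `step` while the longer window still meets the fraction
        while True:
            new_end = end + step
            new_j = bisect_left(positions, new_end)
            if (new_j - i) / (new_end - start) < min_fraction:
                break
            end, j = new_end, new_j
        islands.append((start, positions[j - 1], j - i))
        i = j  # resume after the last mCpG in this island
    return islands


# Toy input (hypothetical): 20 mCpGs spaced 25 bp apart, plus one isolated mCpG
toy_positions = list(range(1000, 1500, 25)) + [10000]
print(find_islands(toy_positions))  # [(1000, 1475, 20)]
```

With the default parameters, a 500 bp starting window needs at least 10 mCpGs (0.02 × 500) to be accepted under this assumed definition of the fraction.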

### 4a. *M. capitata*

In [None]:
#Modify mCpG file by removing the third column that is not needed for methylation island analysis
!awk '{print $1"\t"$2}' .bed-MC > .bed-MC-Reduced

In [None]:
#Identify methylation islands using 0.02 mCpG fraction
! ./methyl_island_sliding_window.pl 500 0.02 50 .bed-MC-Reduced \
> MC-Methylation-Islands-500_0.02_50.tab

In [None]:
#Filter by MI length and print MI length in a new column
!awk '{if ($3-$2 >= 500) { print $1"\t"$2"\t"$3"\t"$4"\t"$3-$2}}' MC-Methylation-Islands-500_0.02_50.tab \
> MC-Methylation-Islands-500_0.02_50-filtered.tab
!head MC-Methylation-Islands-500_0.02_50-filtered.tab
! wc -l MC-Methylation-Islands-500_0.02_50-filtered.tab

In [None]:
#Count max mCpG in an island
#Count min mCpG in an island
!awk 'NR==1{max = $4 + 0; next} {if ($4 > max) max = $4;} END {print max}' \
MC-Methylation-Islands-500_0.02_50-filtered.tab
!awk 'NR==1{min = $4 + 0; next} {if ($4 < min) min = $4;} END {print min}' \
MC-Methylation-Islands-500_0.02_50-filtered.tab

In [None]:
#Create tab-delimited BEDfile without additional information
#Write to a new file: redirecting to the input file would truncate it before awk reads it
!awk '{print $1"\t"$2"\t"$3}' MC-Methylation-Islands-500_0.02_50-filtered.tab \
> MC-Methylation-Islands-500_0.02_50-filtered.bed

### 4b. *P. acuta*

In [None]:
#Modify mCpG file by removing the third column that is not needed for methylation island analysis
!awk '{print $1"\t"$2}' .bed-PA > .bed-PA-Reduced

In [None]:
#Identify methylation islands using 0.02 mCpG fraction (same as original paper)
! ./methyl_island_sliding_window.pl 500 0.02 50 .bed-PA-Reduced \
> PA-Methylation-Islands-500_0.02_50.tab

In [None]:
#Filter by MI length and print MI length in a new column
!awk '{if ($3-$2 >= 500) { print $1"\t"$2"\t"$3"\t"$4"\t"$3-$2}}' PA-Methylation-Islands-500_0.02_50.tab \
> PA-Methylation-Islands-500_0.02_50-filtered.tab
!head PA-Methylation-Islands-500_0.02_50-filtered.tab
! wc -l PA-Methylation-Islands-500_0.02_50-filtered.tab

In [None]:
#Count max mCpG in an island
#Count min mCpG in an island
!awk 'NR==1{max = $4 + 0; next} {if ($4 > max) max = $4;} END {print max}' \
PA-Methylation-Islands-500_0.02_50-filtered.tab
!awk 'NR==1{min = $4 + 0; next} {if ($4 < min) min = $4;} END {print min}' \
PA-Methylation-Islands-500_0.02_50-filtered.tab

## 5. Characterize genomic location of methylation islands

### 5a. Set variable paths

In [None]:
mcMethylationIslands = ""

In [None]:
paMethylationIslands = ""

### 5b. Genes

In [None]:
! {bedtoolsDirectory}intersectBed \
-wo \
-a {mcMethylationIslands} \
-b {mcGenes} \
> mcMethylationIslands-Genes.txt

In [None]:
!head mcMethylationIslands-Genes.txt
!wc -l mcMethylationIslands-Genes.txt

In [None]:
! {bedtoolsDirectory}intersectBed \
-wo \
-a {paMethylationIslands} \
-b {paGenes} \
> paMethylationIslands-Genes.txt

In [None]:
!head paMethylationIslands-Genes.txt
!wc -l paMethylationIslands-Genes.txt

### 5c. Intergenic

In [None]:
! {bedtoolsDirectory}intersectBed \
-wo \
-a {mcMethylationIslands} \
-b {mcIntergenic} \
> mcMethylationIslands-Intergenic.txt

In [None]:
!head mcMethylationIslands-Intergenic.txt
!wc -l mcMethylationIslands-Intergenic.txt

In [None]:
! {bedtoolsDirectory}intersectBed \
-wo \
-a {paMethylationIslands} \
-b {paIntergenic} \
> paMethylationIslands-Intergenic.txt

In [None]:
!head paMethylationIslands-Intergenic.txt
!wc -l paMethylationIslands-Intergenic.txt