# Characterizing CpG Methylation (5x data)

In this notebook, I will characterize general methylation landscapes in *Montipora capitata* and *Pocillopora acuta* using WGBS, RRBS, and MBD-BSseq data. I will also assess overlaps between CG motifs and various genome feature tracks to understand where methylation may occur across the genome. All analyses use 5x data (bedgraphs restricted to CpGs with at least 5x coverage).

1. Characterize overlap between CG motifs and genome feature tracks
2. Download coverage files
3. Characterize methylation for each CpG dinucleotide
4. Characterize genomic locations of all sequenced data, methylated CpGs, sparsely methylated CpGs, and unmethylated CpGs for each sequencing type

## 0. Set working directory and obtain checksums

In [1]:
!pwd

/Users/yaamini/Documents/Meth_Compare/scripts


In [2]:
cd ../analyses/

/Users/yaamini/Documents/Meth_Compare/analyses


In [3]:
#!mkdir Characterizing-CpG-Methylation-5x

In [4]:
cd Characterizing-CpG-Methylation-5x/

/Users/yaamini/Documents/Meth_Compare/analyses/Characterizing-CpG-Methylation-5x


In [31]:
!wget https://gannet.fish.washington.edu/metacarcinus/FROGER_meth_compare/20200410/all_031520-TG-bs_files_GANNET_md5sum.txt

--2020-04-27 09:22:10--  https://gannet.fish.washington.edu/metacarcinus/FROGER_meth_compare/20200410/all_031520-TG-bs_files_GANNET_md5sum.txt
Resolving gannet.fish.washington.edu (gannet.fish.washington.edu)... 128.95.149.52
Connecting to gannet.fish.washington.edu (gannet.fish.washington.edu)|128.95.149.52|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 90413 (88K) [text/plain]
Saving to: ‘all_031520-TG-bs_files_GANNET_md5sum.txt’


2020-04-27 09:22:10 (40.5 MB/s) - ‘all_031520-TG-bs_files_GANNET_md5sum.txt’ saved [90413/90413]



In [32]:
!head all_031520-TG-bs_files_GANNET_md5sum.txt

04829778554df5986ae415fcda3b7e81  /Volumes/web/seashell/bu-mox/scrubbed/031520-TG-bs/Meth9_R1_001_val_1.fq.gz
e1048fea898bc32cb03ff801534183d9  /Volumes/web/seashell/bu-mox/scrubbed/031520-TG-bs/Meth15_R2_001_val_2.fq.gz
d6e026bb59b10a11ad9b51b8acdd18a7  /Volumes/web/seashell/bu-mox/scrubbed/031520-TG-bs/Meth5_R2_001_val_2.fq.gz
bfe70cae27f3251ead4e6686391940ca  /Volumes/web/seashell/bu-mox/scrubbed/031520-TG-bs/Meth8_R1_001_val_1.fq.gz_G_to_A.fastq
26c6f90dd9cef5e30f32e312007f3176  /Volumes/web/seashell/bu-mox/scrubbed/031520-TG-bs/Meth15_R2_001_val_2.fq.gz_G_to_A.fastq
f41790ce58777f20ee742cba75692065  /Volumes/web/seashell/bu-mox/scrubbed/031520-TG-bs/Meth1_R1_001_val_1.fq.gz
4ed014c23ba4c28681d5b4af17e95346  /Volumes/web/seashell/bu-mox/scrubbed/031520-TG-bs/Meth14_R1_001_val_1.fq.gz
fc3ad5f9624c63e28bab515b5848158c  /Volumes/web/seashell/bu-mox/scrubbed/031520-TG-bs/Meth13_R2_001_val_2.fq.gz_C_to_T.fastq
8b2c14989c4638fa2cdd7d16a36a7b99  /Volumes/web/seashell/bu-mox/scrubb

### *M. capitata*

In [33]:
#Get all lines from original checksum document
#Extract information for 5x bedgraphs
#Extract information for Mcap data only
#Only keep the first 32 characters in each line (md5sum hashes)
#Save hashes
!cat all_031520-TG-bs_files_GANNET_md5sum.txt \
| grep 5x.bedgraph \
| grep Mcap \
| cut -c1-32 \
> Mcap-5xbedgraph-GANNET-md5sum-hashes.txt

In [45]:
#Get all lines from original checksum document
#Extract information for 5x bedgraphs
#Extract information for Mcap data only
#Reverse order of characters in each line
#Only keep the first 47 characters in each reversed line
#(i.e. the last 47 characters of the original line: the bedgraph file name without its gannet path)
#Reverse characters back to their original order
#Save paths
!cat all_031520-TG-bs_files_GANNET_md5sum.txt \
| grep 5x.bedgraph \
| grep Mcap \
| rev \
| cut -c1-47 \
| rev \
> Mcap-5xbedgraph-GANNET-md5sum-paths.txt
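The `rev | cut -c1-47 | rev` step works only because every *M. capitata* file name here is exactly 47 characters long (46 for the single-digit *P. acuta* samples). A minimal Python sketch of the same filtering is below; it keeps the file name with `os.path.basename`, so it does not depend on name length. The function name and its defaults are illustrative, not part of the original workflow, and it writes the two-space separator that `md5sum` itself produces.

```python
import os


def build_checksum_lines(lines, feature="5x.bedgraph", species="Mcap"):
    """Return 'hash  file name' lines for checksum entries matching both filters.

    Each input line is assumed to look like '<md5 hash>  <full path on gannet>',
    as in all_031520-TG-bs_files_GANNET_md5sum.txt.
    """
    out = []
    for line in lines:
        # Same filtering as the two grep calls
        if feature not in line or species not in line:
            continue
        digest, path = line.split(maxsplit=1)
        # Keep only the file name, whatever its length, instead of a fixed
        # number of trailing characters
        out.append(f"{digest}  {os.path.basename(path.strip())}")
    return out
```

Passing the lines of `all_031520-TG-bs_files_GANNET_md5sum.txt` and joining the result with newlines would give a file with the same hashes and file names as `Mcap-5xbedgraph-GANNET-md5sum.txt`; `species="Pact"` gives the *P. acuta* version.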

In [46]:
#Paste hashes and paths to create a md5sum file
#Save checksum file
#Check output
#Count number of lines        
!paste Mcap-5xbedgraph-GANNET-md5sum-hashes.txt Mcap-5xbedgraph-GANNET-md5sum-paths.txt \
> Mcap-5xbedgraph-GANNET-md5sum.txt
!head Mcap-5xbedgraph-GANNET-md5sum.txt
!wc -l Mcap-5xbedgraph-GANNET-md5sum.txt

04fb72d5df60656e6cec15637164fbec	Meth16_R1_001_val_1_bismark_bt2_pe._5x.bedgraph
b2f097299df0cb7d518d22338fdcf39f	Meth12_R1_001_val_1_bismark_bt2_pe._5x.bedgraph
073d1c40116a3f93f7a7022cfb4cd3d2	Meth17_R1_001_val_1_bismark_bt2_pe._5x.bedgraph
83035e7e47b8ad486de22dacc17ae8ed	Meth10_R1_001_val_1_bismark_bt2_pe._5x.bedgraph
a255210553db073e5458ccb523a34798	Meth18_R1_001_val_1_bismark_bt2_pe._5x.bedgraph
6493359aad0b4228f65b5e563d337ceb	Meth11_R1_001_val_1_bismark_bt2_pe._5x.bedgraph
fc0f66cf04ffebe76d61c1db75cfed6e	Meth14_R1_001_val_1_bismark_bt2_pe._5x.bedgraph
1d7c24b238dc72cd92346213b3523611	Meth15_R1_001_val_1_bismark_bt2_pe._5x.bedgraph
2bb476cb98072f0e76bfb5c318246c38	Meth13_R1_001_val_1_bismark_bt2_pe._5x.bedgraph
       9 Mcap-5xbedgraph-GANNET-md5sum.txt


### *P. acuta*

In [38]:
#Get all lines from original checksum document
#Extract information for 5x bedgraphs
#Extract information for Pact data only
#Only keep the first 32 characters in each line (md5sum hashes)
#Save hashes
!cat all_031520-TG-bs_files_GANNET_md5sum.txt \
| grep 5x.bedgraph \
| grep Pact \
| cut -c1-32 \
> Pact-5xbedgraph-GANNET-md5sum-hashes.txt

In [43]:
#Get all lines from original checksum document
#Extract information for 5x bedgraphs
#Extract information for Pact data only
#Reverse order of characters in each line
#Only keep the first 46 characters in each reversed line
#(i.e. the last 46 characters of the original line: the bedgraph file name without its gannet path)
#Reverse characters back to their original order
#Save paths
!cat all_031520-TG-bs_files_GANNET_md5sum.txt \
| grep 5x.bedgraph \
| grep Pact \
| rev \
| cut -c1-46 \
| rev \
> Pact-5xbedgraph-GANNET-md5sum-paths.txt

In [44]:
#Paste hashes and paths to create a md5sum file
#Save checksum file
#Check output
#Count number of lines        
!paste Pact-5xbedgraph-GANNET-md5sum-hashes.txt Pact-5xbedgraph-GANNET-md5sum-paths.txt \
> Pact-5xbedgraph-GANNET-md5sum.txt
!head Pact-5xbedgraph-GANNET-md5sum.txt
!wc -l Pact-5xbedgraph-GANNET-md5sum.txt

c838562956c7abe3656a2b7438a40dc1	Meth8_R1_001_val_1_bismark_bt2_pe._5x.bedgraph
c9a4b002113e2501d81e4762cf952b79	Meth9_R1_001_val_1_bismark_bt2_pe._5x.bedgraph
1ec934f5b4ce012b64b77dd69d70ee5f	Meth2_R1_001_val_1_bismark_bt2_pe._5x.bedgraph
c456156b7f6a11543d8dc697e8e74b4e	Meth3_R1_001_val_1_bismark_bt2_pe._5x.bedgraph
d634ffc3f062d248e36b8dddc9a315e0	Meth1_R1_001_val_1_bismark_bt2_pe._5x.bedgraph
a2b842c439c3df3fb699690cd5b55d5a	Meth7_R1_001_val_1_bismark_bt2_pe._5x.bedgraph
5994ba73d412d8992f2465b148f5ae80	Meth6_R1_001_val_1_bismark_bt2_pe._5x.bedgraph
ed4428a6c8cb6a4964687d91c0d8ccb3	Meth5_R1_001_val_1_bismark_bt2_pe._5x.bedgraph
736fd3802ce1b45b6eb32abf6e1bcb3f	Meth4_R1_001_val_1_bismark_bt2_pe._5x.bedgraph
       9 Pact-5xbedgraph-GANNET-md5sum.txt


## *M. capitata*

In [5]:
#Make a directory for Mcap output
#!mkdir Mcap

In [47]:
cd Mcap/

/Users/yaamini/Documents/Meth_Compare/analyses/Characterizing-CpG-Methylation-5x/Mcap


### 1. Characterize CG motif locations in feature tracks

#### 1a. Set variable paths

In [30]:
bedtoolsDirectory = "/usr/local/bin/"

In [8]:
mcGenes = "../../../genome-feature-files/Mcap.GFFannotation.gene.gff"

In [9]:
mcCDS = "../../../genome-feature-files/Mcap.GFFannotation.CDS.gff"

In [10]:
mcIntron = "../../../genome-feature-files/Mcap.GFFannotation.intron.gff"

In [11]:
mcCGMotifs = "../../../genome-feature-files/Mcap_CpG.gff"

#### 1b. Check variable paths

In [12]:
!head {mcGenes}
!wc -l {mcGenes}

1	AUGUSTUS	gene	18387	18755	0.97	-	.	g21532
1	AUGUSTUS	gene	22321	27293	0.23	-	.	g21533
1	AUGUSTUS	gene	37447	52266	1	+	.	g21534
1	AUGUSTUS	gene	58322	62557	1	-	.	g21535
1	AUGUSTUS	gene	64466	84798	1	+	.	g21536
1	AUGUSTUS	gene	88347	97184	1	+	.	g21537
1	AUGUSTUS	gene	100215	109729	0.99	-	.	g21538
1	AUGUSTUS	gene	109867	128510	0.89	+	.	g21539
1	AUGUSTUS	gene	132854	139285	1	-	.	g21540
1	AUGUSTUS	gene	148344	149588	0.44	+	.	g21541
   53875 ../../../genome-feature-files/Mcap.GFFannotation.gene.gff


In [13]:
!head {mcCDS}
!wc -l {mcCDS}

1	AUGUSTUS	CDS	18387	18755	0.97	-	0	transcript_id "g21532.t1"; gene_id "g21532";
1	AUGUSTUS	CDS	22321	22608	0.55	-	0	transcript_id "g21533.t1"; gene_id "g21533";
1	AUGUSTUS	CDS	26301	27293	0.29	-	0	transcript_id "g21533.t1"; gene_id "g21533";
1	AUGUSTUS	CDS	37447	37810	1	+	0	transcript_id "g21534.t1"; gene_id "g21534";
1	AUGUSTUS	CDS	45038	45208	1	+	2	transcript_id "g21534.t1"; gene_id "g21534";
1	AUGUSTUS	CDS	46625	47272	1	+	2	transcript_id "g21534.t1"; gene_id "g21534";
1	AUGUSTUS	CDS	49943	50132	1	+	2	transcript_id "g21534.t1"; gene_id "g21534";
1	AUGUSTUS	CDS	51903	52266	1	+	1	transcript_id "g21534.t1"; gene_id "g21534";
1	AUGUSTUS	CDS	58322	59506	1	-	0	transcript_id "g21535.t1"; gene_id "g21535";
1	AUGUSTUS	CDS	62261	62557	1	-	0	transcript_id "g21535.t1"; gene_id "g21535";
  224096 ../../../genome-feature-files/Mcap.GFFannotation.CDS.gff


In [14]:
!head {mcIntron}
!wc -l {mcIntron}

1	AUGUSTUS	intron	22609	26300	0.25	-	.	transcript_id "g21533.t1"; gene_id "g21533";
1	AUGUSTUS	intron	37811	45037	1	+	.	transcript_id "g21534.t1"; gene_id "g21534";
1	AUGUSTUS	intron	45209	46624	1	+	.	transcript_id "g21534.t1"; gene_id "g21534";
1	AUGUSTUS	intron	47273	49942	1	+	.	transcript_id "g21534.t1"; gene_id "g21534";
1	AUGUSTUS	intron	50133	51902	1	+	.	transcript_id "g21534.t1"; gene_id "g21534";
1	AUGUSTUS	intron	59507	62260	1	-	.	transcript_id "g21535.t1"; gene_id "g21535";
1	AUGUSTUS	intron	64578	64654	1	+	.	transcript_id "g21536.t1"; gene_id "g21536";
1	AUGUSTUS	intron	64735	67263	1	+	.	transcript_id "g21536.t1"; gene_id "g21536";
1	AUGUSTUS	intron	67319	71345	1	+	.	transcript_id "g21536.t1"; gene_id "g21536";
1	AUGUSTUS	intron	71456	72865	1	+	.	transcript_id "g21536.t1"; gene_id "g21536";
  170950 ../../../genome-feature-files/Mcap.GFFannotation.intron.gff


In [15]:
!head {mcCGMotifs}
!wc -l {mcCGMotifs}

##gff-version 2.0
##date 2020-03-29
##Type DNA 1
1	fuzznuc	misc_feature	37	38	2.000	+	.	Sequence "1.1" ; note "*pat pattern1"
1	fuzznuc	misc_feature	90	91	2.000	+	.	Sequence "1.2" ; note "*pat pattern1"
1	fuzznuc	misc_feature	121	122	2.000	+	.	Sequence "1.3" ; note "*pat pattern1"
1	fuzznuc	misc_feature	132	133	2.000	+	.	Sequence "1.4" ; note "*pat pattern1"
1	fuzznuc	misc_feature	153	154	2.000	+	.	Sequence "1.5" ; note "*pat pattern1"
1	fuzznuc	misc_feature	170	171	2.000	+	.	Sequence "1.6" ; note "*pat pattern1"
1	fuzznuc	misc_feature	220	221	2.000	+	.	Sequence "1.7" ; note "*pat pattern1"
 28684519 ../../../genome-feature-files/Mcap_CpG.gff


#### 1c. Characterize overlaps with `bedtools`

In [16]:
!{bedtoolsDirectory}intersectBed -h


Tool:    bedtools intersect (aka intersectBed)
Version: v2.17.0
Summary: Report overlaps between two feature files.

Usage:   bedtools intersect [OPTIONS] -a <bed/gff/vcf> -b <bed/gff/vcf>

Options: 
	-abam	The A input file is in BAM format.  Output will be BAM as well.

	-ubam	Write uncompressed BAM output. Default writes compressed BAM.

	-bed	When using BAM input (-abam), write output as BED. The default
		is to write output in BAM when using -abam.

	-wa	Write the original entry in A for each overlap.

	-wb	Write the original entry in B for each overlap.
		- Useful for knowing _what_ A overlaps. Restricted by -f and -r.

	-loj	Perform a "left outer join". That is, for each feature in A
		report each overlap with B.  If no overlaps are found, 
		report a NULL feature for B.

	-wo	Write the original A and B entries plus the number of base
		pairs of overlap between the two features.
		- Overlaps restricted by -f and -r.
		  Only A features with overlap are

In [17]:
!{bedtoolsDirectory}intersectBed \
-u \
-a {mcCGMotifs} \
-b {mcGenes} \
| wc -l
!echo "CG motif overlaps with genes"

 9450564
CG motif overlaps with genes


In [18]:
!{bedtoolsDirectory}intersectBed \
-u \
-a {mcCGMotifs} \
-b {mcCDS} \
| wc -l
!echo "CG motif overlaps with coding sequences (CDS)"

 1953206
CG motif overlaps with coding sequences (CDS)


In [19]:
!{bedtoolsDirectory}intersectBed \
-u \
-a {mcCGMotifs} \
-b {mcIntron} \
| wc -l
!echo "CG motif overlaps with introns"

 7503314
CG motif overlaps with introns


In [20]:
!{bedtoolsDirectory}intersectBed \
-v \
-a {mcCGMotifs} \
-b {mcGenes} \
| wc -l
!echo "CG motifs that do not overlap with genes (i.e. intergenic regions)"

 19224826
CG motifs that do not overlap with genes (i.e. intergenic regions)
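The `-u` and `-v` flags are what turn `intersectBed` into motif counts: `-u` writes each motif once if it overlaps any feature, and `-v` writes only the motifs with no overlap. `bedtools` also handles the 1-based GFF coordinates of both files internally. The toy Python sketch below mirrors that logic on 0-based, half-open intervals; all function names are hypothetical, and it is not how `bedtools` is implemented.

```python
def gff_to_half_open(start, end):
    """Convert 1-based, inclusive GFF coordinates to 0-based, half-open ones."""
    return start - 1, end


def overlaps(a, b):
    """True if half-open intervals (chrom, start, end) share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def count_u_and_v(motifs, features):
    """Count motifs overlapping at least one feature (-u) and none (-v).

    A motif hit by several features is still counted once, so in this
    sketch the two counts always sum to the number of motifs.
    Brute-force scan: fine for a toy example, far too slow for 28 million motifs.
    """
    hit = sum(1 for m in motifs if any(overlaps(m, f) for f in features))
    return hit, len(motifs) - hit
```

For example, the first motif in `Mcap_CpG.gff` (GFF positions 37 to 38 on scaffold 1) becomes `("1", 36, 38)`.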


#### 1d. Summary

| ***M. capitata* Genome Feature** | **Number of Individual Features** | **Overlaps with CG Motifs** | **% Total CG Motifs** |
|:---:|:---:|:---:|:---:|
| Genes | 458273 | 9450564 | 32.9 |
| Coding Sequences | 283926 | 1953206 | 6.8 |
| Introns | 221428 | 7503314 | 26.2 |
| Intergenic Regions | N/A | 19224826 | 67.0 |
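The "% Total CG Motifs" column is each overlap count divided by the total number of CG motifs. A short Python sketch of that arithmetic, assuming the total is the `wc -l` count for `Mcap_CpG.gff` minus its three `##` header lines (using 28684519 directly gives the same rounded values); the function and variable names are hypothetical. CDS and introns both lie within genes, so the four rows are not meant to sum to 100%.

```python
def percent_of_motifs(overlap_counts, total_motifs):
    """Express each overlap count as a percentage of all CG motifs (1 decimal)."""
    return {name: round(100 * n / total_motifs, 1) for name, n in overlap_counts.items()}


# Assumption: the motif total is the `wc -l` count for Mcap_CpG.gff (28684519)
# minus its three '##' header lines.
TOTAL_MOTIFS = 28684519 - 3

motif_overlaps = {
    "Genes": 9450564,
    "Coding Sequences": 1953206,
    "Introns": 7503314,
    "Intergenic Regions": 19224826,
}

print(percent_of_motifs(motif_overlaps, TOTAL_MOTIFS))
# {'Genes': 32.9, 'Coding Sequences': 6.8, 'Introns': 26.2, 'Intergenic Regions': 67.0}
```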

### 2. Download coverage files

In [5]:
#Download Mcap WGBS and MBD-BS 5x sample bedgraphs
!wget -r -l1 --no-parent -A "*5x.bedgraph" https://gannet.fish.washington.edu/seashell/bu-mox/scrubbed/031520-TG-bs/Mcap_tg/dedup/

--2020-04-27 16:35:56--  https://gannet.fish.washington.edu/seashell/bu-mox/scrubbed/031520-TG-bs/Mcap_tg/dedup/
Resolving gannet.fish.washington.edu (gannet.fish.washington.edu)... 128.95.149.52
Connecting to gannet.fish.washington.edu (gannet.fish.washington.edu)|128.95.149.52|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘gannet.fish.washington.edu/seashell/bu-mox/scrubbed/031520-TG-bs/Mcap_tg/dedup/index.html.tmp’

gannet.fish.washing     [ <=>                ]  42.27K  --.-KB/s    in 0.001s  

2020-04-27 16:35:57 (38.9 MB/s) - ‘gannet.fish.washington.edu/seashell/bu-mox/scrubbed/031520-TG-bs/Mcap_tg/dedup/index.html.tmp’ saved [43285]

Loading robots.txt; please ignore errors.
--2020-04-27 16:35:57--  https://gannet.fish.washington.edu/robots.txt
Reusing existing connection to gannet.fish.washington.edu:443.
HTTP request sent, awaiting response... 404 Not Found
2020-04-27 16:35:57 ERROR 404: Not Found.

Removing gannet

In [6]:
#Move samples from the gannet directory structure to the current directory
!mv gannet.fish.washington.edu/seashell/bu-mox/scrubbed/031520-TG-bs/Mcap_tg/dedup/* .

In [7]:
#Remove empty directory
!rm -r gannet.fish.washington.edu/

In [8]:
#Check downloaded files
!ls *bedgraph

Meth10_R1_001_val_1_bismark_bt2_pe._5x.bedgraph
Meth11_R1_001_val_1_bismark_bt2_pe._5x.bedgraph
Meth12_R1_001_val_1_bismark_bt2_pe._5x.bedgraph
Meth16_R1_001_val_1_bismark_bt2_pe._5x.bedgraph
Meth17_R1_001_val_1_bismark_bt2_pe._5x.bedgraph
Meth18_R1_001_val_1_bismark_bt2_pe._5x.bedgraph


In [9]:
#Download Mcap RRBS 5x sample bedgraphs
!wget -r -l1 --no-parent -A "*5x.bedgraph" https://gannet.fish.washington.edu/seashell/bu-mox/scrubbed/031520-TG-bs/Mcap_tg/nodedup/

--2020-04-27 16:36:52--  https://gannet.fish.washington.edu/seashell/bu-mox/scrubbed/031520-TG-bs/Mcap_tg/nodedup/
Resolving gannet.fish.washington.edu (gannet.fish.washington.edu)... 128.95.149.52
Connecting to gannet.fish.washington.edu (gannet.fish.washington.edu)|128.95.149.52|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘gannet.fish.washington.edu/seashell/bu-mox/scrubbed/031520-TG-bs/Mcap_tg/nodedup/index.html.tmp’

gannet.fish.washing     [ <=>                ]  19.31K  --.-KB/s    in 0.001s  

2020-04-27 16:36:52 (26.6 MB/s) - ‘gannet.fish.washington.edu/seashell/bu-mox/scrubbed/031520-TG-bs/Mcap_tg/nodedup/index.html.tmp’ saved [19778]

Loading robots.txt; please ignore errors.
--2020-04-27 16:36:52--  https://gannet.fish.washington.edu/robots.txt
Reusing existing connection to gannet.fish.washington.edu:443.
HTTP request sent, awaiting response... 404 Not Found
2020-04-27 16:36:52 ERROR 404: Not Found.

Removing 

In [10]:
#Move samples from the gannet directory structure to the current directory
!mv gannet.fish.washington.edu/seashell/bu-mox/scrubbed/031520-TG-bs/Mcap_tg/nodedup/* .

In [11]:
#Remove empty directory
!rm -r gannet.fish.washington.edu/

In [12]:
!find *bedgraph

Meth10_R1_001_val_1_bismark_bt2_pe._5x.bedgraph
Meth11_R1_001_val_1_bismark_bt2_pe._5x.bedgraph
Meth12_R1_001_val_1_bismark_bt2_pe._5x.bedgraph
Meth13_R1_001_val_1_bismark_bt2_pe._5x.bedgraph
Meth14_R1_001_val_1_bismark_bt2_pe._5x.bedgraph
Meth15_R1_001_val_1_bismark_bt2_pe._5x.bedgraph
Meth16_R1_001_val_1_bismark_bt2_pe._5x.bedgraph
Meth17_R1_001_val_1_bismark_bt2_pe._5x.bedgraph
Meth18_R1_001_val_1_bismark_bt2_pe._5x.bedgraph


In [13]:
#Verify checksums from gannet
!md5sum -c ../Mcap-5xbedgraph-GANNET-md5sum.txt

md5sum: ../Pact-5xbedgraph-GANNET-md5sum.txt: No such file or directory
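The same check can be sketched in Python with `hashlib`, which makes the comparison explicit. Like `md5sum -c`, it resolves file names relative to the current working directory, so it would be run from the folder holding the bedgraphs and pointed at the checksum file one level up. The function names are hypothetical; the parser splits on any whitespace, so it accepts the tab-separated file produced by `paste`.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksums(checksum_file):
    """Return {file name: 'OK' | 'FAILED' | 'MISSING'} for each checksum entry.

    Entries are '<md5 hash><whitespace><file name>'. File names are resolved
    relative to the current working directory, as with md5sum -c.
    """
    results = {}
    for line in Path(checksum_file).read_text().splitlines():
        if not line.strip():
            continue  # skip blank lines
        expected, name = line.split(maxsplit=1)
        name = name.strip()
        if not Path(name).is_file():
            results[name] = "MISSING"
        elif md5_of(name) == expected.lower():
            results[name] = "OK"
        else:
            results[name] = "FAILED"
    return results
```

From the `Mcap` directory, `verify_checksums("../Mcap-5xbedgraph-GANNET-md5sum.txt")` would report one status per bedgraph.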


In [49]:
!wc -l *bedgraph

 4571288 Meth10_R1_001_val_1_bismark_bt2_pe._5x.bedgraph
 4661716 Meth11_R1_001_val_1_bismark_bt2_pe._5x.bedgraph
 8791700 Meth12_R1_001_val_1_bismark_bt2_pe._5x.bedgraph
 3173254 Meth13_R1_001_val_1_bismark_bt2_pe._5x.bedgraph
 2648697 Meth14_R1_001_val_1_bismark_bt2_pe._5x.bedgraph
 3176517 Meth15_R1_001_val_1_bismark_bt2_pe._5x.bedgraph
  583599 Meth16_R1_001_val_1_bismark_bt2_pe._5x.bedgraph
  242390 Meth17_R1_001_val_1_bismark_bt2_pe._5x.bedgraph
  153392 Meth18_R1_001_val_1_bismark_bt2_pe._5x.bedgraph
 28002553 total


### 3. Characterize methylation for each CpG dinucleotide

- Methylated: ≥ 50% methylation
- Sparsely methylated: > 10% and < 50% methylation
- Unmethylated: ≤ 10% methylation
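The three awk filters below can be restated as a single Python function (a sketch; the function name is hypothetical, and the returned labels simply mirror the output file suffixes). The boundaries follow the awk conditions: a CpG at exactly 50% counts as methylated and one at exactly 10% as unmethylated, so every CpG lands in exactly one class.

```python
def classify_cpg(percent_meth):
    """Assign a CpG to a methylation class using the awk thresholds.

    Methylated: >= 50; sparsely methylated: > 10 and < 50; unmethylated: <= 10.
    """
    if percent_meth >= 50:
        return "Meth"
    if percent_meth > 10:
        return "sparseMeth"
    return "unMeth"
```

For a bedgraph row such as `1 106012 106014 50.000000`, `classify_cpg(float(row.split()[3]))` returns `"Meth"`.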

##### Methylated loci

In [50]:
%%bash
for f in *bedgraph
do
    awk '{if ($4 >= 50) { print $1, $2, $3, $4 }}' ${f} \
    > ${f}-Meth
done

In [51]:
!head *Meth

==> Meth10_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth <==
1 58745 58747 100.000000
1 103334 103336 100.000000
1 103347 103349 100.000000
1 103356 103358 100.000000
1 103360 103362 100.000000
1 103398 103400 100.000000
1 105953 105955 80.000000
1 106012 106014 50.000000
1 106155 106157 60.000000
1 106173 106175 66.666667

==> Meth11_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth <==
1 6905 6907 60.000000
1 7273 7275 80.000000
1 58745 58747 100.000000
1 59207 59209 100.000000
1 69235 69237 100.000000
1 69271 69273 80.000000
1 69275 69277 100.000000
1 69451 69453 100.000000
1 69580 69582 100.000000
1 69584 69586 100.000000

==> Meth12_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth <==
1 4948 4950 50.000000
1 4967 4969 50.000000
1 4986 4988 50.000000
1 57065 57067 80.000000
1 58609 58611 100.000000
1 58618 58620 100.000000
1 59207 59209 100.000000
1 59277 59279 100.000000
1 59393 59395 100.000000
1 59438 59440 100.000000

==> Meth13_R1_001_val_1_bismark

In [52]:
!wc -l *Meth

  450582 Meth10_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth
  528902 Meth11_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth
 1059904 Meth12_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth
  257741 Meth13_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth
  184742 Meth14_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth
  231347 Meth15_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth
  106695 Meth16_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth
   45506 Meth17_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth
   29468 Meth18_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth
 2894887 total


##### Sparsely methylated loci

In [53]:
%%bash
for f in *bedgraph
do
    awk '{if ($4 < 50) { print $1, $2, $3, $4}}' ${f} \
    | awk '{if ($4 > 10) { print $1, $2, $3, $4 }}' \
    > ${f}-sparseMeth
done

In [54]:
!head *sparseMeth

==> Meth10_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth <==
1 27782 27784 20.000000
1 80133 80135 20.000000
1 106202 106204 40.000000
1 140551 140553 33.333333
1 148080 148082 16.666667
1 150099 150101 40.000000
1 169735 169737 12.500000
1 169771 169773 42.857143
1 169796 169798 14.285714
1 169800 169802 16.666667

==> Meth11_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth <==
1 6550 6552 12.500000
1 6671 6673 20.000000
1 6996 6998 20.000000
1 7016 7018 40.000000
1 7019 7021 40.000000
1 7293 7295 16.666667
1 7427 7429 16.666667
1 74928 74930 14.285714
1 153767 153769 20.000000
1 193930 193932 20.000000

==> Meth12_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth <==
1 4190 4192 16.666667
1 4891 4893 33.333333
1 4910 4912 28.571429
1 4929 4931 33.333333
1 5005 5007 28.571429
1 5024 5026 40.000000
1 5151 5153 20.000000
1 5160 5162 16.666667
1 5228 5230 11.111111
1 6282 6284 11.111111

==> Meth13_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sp

In [55]:
!wc -l *sparseMeth

  547868 Meth10_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth
  517805 Meth11_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth
 1000337 Meth12_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth
  152042 Meth13_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth
  135052 Meth14_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth
  179454 Meth15_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth
   74839 Meth16_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth
   28850 Meth17_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth
   16793 Meth18_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth
 2653040 total


##### Unmethylated loci

In [56]:
%%bash
for f in *bedgraph
do
    awk '{if ($4 <= 10) { print $1, $2, $3, $4 }}' ${f} \
    > ${f}-unMeth
done

In [57]:
!head *unMeth

==> Meth10_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth <==
1 6570 6572 0.000000
1 6713 6715 0.000000
1 6780 6782 0.000000
1 6813 6815 0.000000
1 6818 6820 0.000000
1 27606 27608 0.000000
1 27613 27615 0.000000
1 27641 27643 0.000000
1 27643 27645 0.000000
1 27674 27676 0.000000

==> Meth11_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth <==
1 4929 4931 0.000000
1 5665 5667 0.000000
1 6453 6455 0.000000
1 6484 6486 0.000000
1 6527 6529 0.000000
1 6570 6572 0.000000
1 6618 6620 0.000000
1 6652 6654 0.000000
1 6661 6663 0.000000
1 6668 6670 0.000000

==> Meth12_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth <==
1 4062 4064 0.000000
1 4069 4071 0.000000
1 4077 4079 0.000000
1 4086 4088 0.000000
1 4146 4148 0.000000
1 4150 4152 0.000000
1 4155 4157 0.000000
1 4172 4174 0.000000
1 4184 4186 0.000000
1 5043 5045 0.000000

==> Meth13_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth <==
1 3493 3495 0.000000
1 3518 3520 0.000000
1 3727 3729 0.000000
1 

In [58]:
!wc -l *unMeth

 3572838 Meth10_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth
 3615009 Meth11_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth
 6731459 Meth12_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth
 2763471 Meth13_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth
 2328903 Meth14_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth
 2765716 Meth15_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth
  402065 Meth16_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth
  168034 Meth17_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth
  107131 Meth18_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth
 22454626 total


##### Summary

| **Sample** | **Method** | **CpG with Data** | **Methylated CpG** | **Sparsely Methylated CpG** | **Unmethylated CpG** |
|:---:|:---:|:---:|:---:|:---:|:---:|
| 10 | WGBS | 4571288 | 450582 (9.9%) | 547868 (12.0%) | 3572838 (78.2%) |
| 11 | WGBS | 4661716 | 528902 (11.3%) | 517805 (11.1%) | 3615009 (77.5%) |
| 12 | WGBS | 8791700 | 1059904 (12.1%) | 1000337 (11.4%) | 6731459 (76.6%) |
| 13 | RRBS | 3173254 | 257741 (8.1%) | 152042 (4.8%) | 2763471 (87.1%) |
| 14 | RRBS | 2648697 | 184742 (7.0%) | 135052 (5.1%) | 2328903 (87.9%) |
| 15 | RRBS | 3176517 | 231347 (7.3%) | 179454 (5.6%) | 2765716 (87.1%) |
| 16 | MBD-BSseq | 583599 | 106695 (18.3%) | 74839 (12.8%) | 402065 (68.9%) |
| 17 | MBD-BSseq | 242390 | 45506 (18.8%) | 28850 (11.9%) | 168034 (69.3%) |
| 18 | MBD-BSseq | 153392 | 29468 (19.2%) | 16793 (10.9%) | 107131 (69.8%) |
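Each percentage in this table is a class count divided by the number of CpGs with data for that sample. Because every CpG falls in exactly one class, the three counts should also sum to that total. The Python sketch below (function name hypothetical) checks the partition and reproduces the rows for samples 10 and 18.

```python
def summarize(total, meth, sparse, unmeth):
    """Return (methylated %, sparse %, unmethylated %) rounded to one decimal.

    Raises ValueError if the three classes do not partition the CpGs with data.
    """
    if meth + sparse + unmeth != total:
        raise ValueError("class counts do not add up to the CpGs with data")
    return tuple(round(100 * n / total, 1) for n in (meth, sparse, unmeth))


print(summarize(4571288, 450582, 547868, 3572838))  # Sample 10: (9.9, 12.0, 78.2)
print(summarize(153392, 29468, 16793, 107131))      # Sample 18: (19.2, 10.9, 69.8)
```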

### 4. Characterize genomic locations of CpGs

#### 4a. Create BEDfiles

In [59]:
%%bash

for f in *bedgraph*
do
    awk '{print $1"\t"$2"\t"$3}' ${f} > ${f}.bed
    wc -l ${f}.bed
done

 4571288 Meth10_R1_001_val_1_bismark_bt2_pe._5x.bedgraph.bed
  450582 Meth10_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed
  547868 Meth10_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth.bed
 3572838 Meth10_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth.bed
 4661716 Meth11_R1_001_val_1_bismark_bt2_pe._5x.bedgraph.bed
  528902 Meth11_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed
  517805 Meth11_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth.bed
 3615009 Meth11_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth.bed
 8791700 Meth12_R1_001_val_1_bismark_bt2_pe._5x.bedgraph.bed
 1059904 Meth12_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed
 1000337 Meth12_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth.bed
 6731459 Meth12_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth.bed
 3173254 Meth13_R1_001_val_1_bismark_bt2_pe._5x.bedgraph.bed
  257741 Meth13_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed
  152042 Meth13_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth.bed
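The awk loop above keeps the first three columns of each file and joins them with tabs. The same conversion as a Python sketch (function name hypothetical): like awk's default field splitting, it splits on runs of whitespace, so it works whether a row is tab-separated or uses the single spaces written by the `print $1, $2, $3, $4` filters. Unlike the awk one-liner, it skips blank or short lines instead of writing empty fields.

```python
def bedgraph_to_bed(lines):
    """Keep the chromosome, start, and end columns of each row, tab-separated."""
    out = []
    for line in lines:
        fields = line.split()  # any run of spaces or tabs
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        out.append("\t".join(fields[:3]))
    return out
```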


In [60]:
#Confirm BEDfile creation
!find *.bed

Meth10_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed
Meth10_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth.bed
Meth10_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth.bed
Meth10_R1_001_val_1_bismark_bt2_pe._5x.bedgraph.bed
Meth11_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed
Meth11_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth.bed
Meth11_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth.bed
Meth11_R1_001_val_1_bismark_bt2_pe._5x.bedgraph.bed
Meth12_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed
Meth12_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth.bed
Meth12_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth.bed
Meth12_R1_001_val_1_bismark_bt2_pe._5x.bedgraph.bed
Meth13_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed
Meth13_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth.bed
Meth13_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth.bed
Meth13_R1_001_val_1_bismark_bt2_pe._5x.bedgraph.bed
Meth14_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed
Me

In [62]:
#Confirm file creation
!head Meth10_R1_001_val_1_bismark_bt2_pe._5x.bedgraph.bed

1	6570	6572
1	6713	6715
1	6780	6782
1	6813	6815
1	6818	6820
1	27606	27608
1	27613	27615
1	27641	27643
1	27643	27645
1	27674	27676


#### 4b. Genes

In [63]:
%%bash

for f in *bed
do
  /usr/local/bin/intersectBed \
  -wb \
  -a ${f} \
  -b ../../../genome-feature-files/Mcap.GFFannotation.gene.gff \
  > ${f}-mcGenes
done

In [64]:
#Check output
!head *mcGenes

==> Meth10_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed-mcGenes <==
1	58745	58747	1	AUGUSTUS	gene	58322	62557	1	-	.	g21535
1	103334	103336	1	AUGUSTUS	gene	100215	109729	0.99	-	.	g21538
1	103347	103349	1	AUGUSTUS	gene	100215	109729	0.99	-	.	g21538
1	103356	103358	1	AUGUSTUS	gene	100215	109729	0.99	-	.	g21538
1	103360	103362	1	AUGUSTUS	gene	100215	109729	0.99	-	.	g21538
1	103398	103400	1	AUGUSTUS	gene	100215	109729	0.99	-	.	g21538
1	105953	105955	1	AUGUSTUS	gene	100215	109729	0.99	-	.	g21538
1	106012	106014	1	AUGUSTUS	gene	100215	109729	0.99	-	.	g21538
1	106155	106157	1	AUGUSTUS	gene	100215	109729	0.99	-	.	g21538
1	106173	106175	1	AUGUSTUS	gene	100215	109729	0.99	-	.	g21538

==> Meth10_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth.bed-mcGenes <==
1	80133	80135	1	AUGUSTUS	gene	64466	84798	1	+	.	g21536
1	106202	106204	1	AUGUSTUS	gene	100215	109729	0.99	-	.	g21538
1	184227	184229	1	AUGUSTUS	gene	183017	185772	0.26	+	.	g21546
1	184266	184268	1	AUGUSTUS	gene	183017	

In [65]:
#Count number of overlaps
!wc -l *mcGenes

  230343 Meth10_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed-mcGenes
  196827 Meth10_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth.bed-mcGenes
 1255899 Meth10_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth.bed-mcGenes
 1683069 Meth10_R1_001_val_1_bismark_bt2_pe._5x.bedgraph.bed-mcGenes
  267181 Meth11_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed-mcGenes
  188348 Meth11_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth.bed-mcGenes
 1284620 Meth11_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth.bed-mcGenes
 1740149 Meth11_R1_001_val_1_bismark_bt2_pe._5x.bedgraph.bed-mcGenes
  533230 Meth12_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed-mcGenes
  348967 Meth12_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth.bed-mcGenes
 2322688 Meth12_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth.bed-mcGenes
 3204885 Meth12_R1_001_val_1_bismark_bt2_pe._5x.bedgraph.bed-mcGenes
  123271 Meth13_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed-mcGenes
   53235 Meth13_R1_001_val_1_
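`intersectBed -wb` writes one line per CpG-feature overlap, so the `wc -l` counts in this section count overlaps: a CpG falling in two overlapping feature models would appear twice. The `-u` flag used in section 1 instead reports each entry of `-a` once. A Python sketch that counts distinct loci in a saved `-wb` output file (the function name is hypothetical):

```python
def count_unique_loci(intersect_lines):
    """Count distinct CpG loci (chrom, start, end) in intersectBed -wb output.

    With -wb, a CpG overlapping two features is written once per feature,
    so `wc -l` counts CpG-feature pairs rather than CpGs.
    """
    loci = set()
    for line in intersect_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3:
            loci.add(tuple(fields[:3]))
    return len(loci)
```

It can be called on an open file, e.g. the `...bedgraph-Meth.bed-mcGenes` output for sample 10, and compared with its `wc -l` count.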

#### 4c. Coding Sequences (CDS)

In [66]:
%%bash

for f in *bed
do
  /usr/local/bin/intersectBed \
  -wb \
  -a ${f} \
  -b ../../../genome-feature-files/Mcap.GFFannotation.CDS.gff \
  > ${f}-mcCDS
done

In [67]:
#Check output
!head *mcCDS

==> Meth10_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed-mcCDS <==
1	58745	58747	1	AUGUSTUS	CDS	58322	59506	1	-	0	transcript_id "g21535.t1"; gene_id "g21535";
1	438779	438781	1	AUGUSTUS	CDS	438772	439162	1	+	0	transcript_id "g21564.t1"; gene_id "g21564";
1	438791	438793	1	AUGUSTUS	CDS	438772	439162	1	+	0	transcript_id "g21564.t1"; gene_id "g21564";
1	786125	786127	1	AUGUSTUS	CDS	785899	786207	0.98	-	0	transcript_id "g21598.t1"; gene_id "g21598";
1	786144	786146	1	AUGUSTUS	CDS	785899	786207	0.98	-	0	transcript_id "g21598.t1"; gene_id "g21598";
1	789544	789546	1	AUGUSTUS	CDS	789380	789726	0.68	-	2	transcript_id "g21600.t1"; gene_id "g21600";
1	789590	789592	1	AUGUSTUS	CDS	789380	789726	0.68	-	2	transcript_id "g21600.t1"; gene_id "g21600";
1	879226	879228	1	AUGUSTUS	CDS	879219	879325	1	+	2	transcript_id "g21603.t1"; gene_id "g21603";
1	983540	983542	1	AUGUSTUS	CDS	983471	983576	1	-	2	transcript_id "g21609.t1"; gene_id "g21609";
1	1263116	1263118	1	AUGUSTUS	CDS	1262915	126430

In [68]:
#Count number of overlaps
!wc -l *mcCDS

   54412 Meth10_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed-mcCDS
   60266 Meth10_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth.bed-mcCDS
  361901 Meth10_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth.bed-mcCDS
  476579 Meth10_R1_001_val_1_bismark_bt2_pe._5x.bedgraph.bed-mcCDS
   64070 Meth11_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed-mcCDS
   58258 Meth11_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth.bed-mcCDS
  374559 Meth11_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth.bed-mcCDS
  496887 Meth11_R1_001_val_1_bismark_bt2_pe._5x.bedgraph.bed-mcCDS
  113396 Meth12_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed-mcCDS
   89455 Meth12_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth.bed-mcCDS
  589114 Meth12_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth.bed-mcCDS
  791965 Meth12_R1_001_val_1_bismark_bt2_pe._5x.bedgraph.bed-mcCDS
   18158 Meth13_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed-mcCDS
   11383 Meth13_R1_001_val_1_bismark_bt2_pe._5x.bedgrap

#### 4d. Introns

In [69]:
%%bash

for f in *bed
do
  /usr/local/bin/intersectBed \
  -wb \
  -a ${f} \
  -b ../../../genome-feature-files/Mcap.GFFannotation.intron.gff \
  > ${f}-mcIntrons
done

In [70]:
#Check output
!head *mcIntrons

==> Meth10_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed-mcIntrons <==
1	103334	103336	1	AUGUSTUS	intron	102849	104430	1	-	.	transcript_id "g21538.t1"; gene_id "g21538";
1	103347	103349	1	AUGUSTUS	intron	102849	104430	1	-	.	transcript_id "g21538.t1"; gene_id "g21538";
1	103356	103358	1	AUGUSTUS	intron	102849	104430	1	-	.	transcript_id "g21538.t1"; gene_id "g21538";
1	103360	103362	1	AUGUSTUS	intron	102849	104430	1	-	.	transcript_id "g21538.t1"; gene_id "g21538";
1	103398	103400	1	AUGUSTUS	intron	102849	104430	1	-	.	transcript_id "g21538.t1"; gene_id "g21538";
1	105953	105955	1	AUGUSTUS	intron	104815	109637	1	-	.	transcript_id "g21538.t1"; gene_id "g21538";
1	106012	106014	1	AUGUSTUS	intron	104815	109637	1	-	.	transcript_id "g21538.t1"; gene_id "g21538";
1	106155	106157	1	AUGUSTUS	intron	104815	109637	1	-	.	transcript_id "g21538.t1"; gene_id "g21538";
1	106173	106175	1	AUGUSTUS	intron	104815	109637	1	-	.	transcript_id "g21538.t1"; gene_id "g21538";
1	106216	106218	1	AUGUST

In [71]:
#Count number of overlaps
!wc -l *mcIntrons

  176102 Meth10_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed-mcIntrons
  136671 Meth10_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth.bed-mcIntrons
  894836 Meth10_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth.bed-mcIntrons
 1207609 Meth10_R1_001_val_1_bismark_bt2_pe._5x.bedgraph.bed-mcIntrons
  203311 Meth11_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed-mcIntrons
  130195 Meth11_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth.bed-mcIntrons
  910897 Meth11_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth.bed-mcIntrons
 1244403 Meth11_R1_001_val_1_bismark_bt2_pe._5x.bedgraph.bed-mcIntrons
  420249 Meth12_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed-mcIntrons
  259702 Meth12_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth.bed-mcIntrons
 1735085 Meth12_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth.bed-mcIntrons
 2415036 Meth12_R1_001_val_1_bismark_bt2_pe._5x.bedgraph.bed-mcIntrons
  105145 Meth13_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed-mcIntrons
   

#### 4e. Intergenic

In [72]:
%%bash 

for f in *bed
do
  /usr/local/bin/intersectBed \
  -v \
  -a ${f} \
  -b ../../../genome-feature-files/Mcap.GFFannotation.gene.gff \
  > ${f}-mcIntergenic
done

In [73]:
#Check output
!head *mcIntergenic

==> Meth10_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed-mcIntergenic <==
1	320600	320602
1	320631	320633
1	443126	443128
1	444404	444406
1	446577	446579
1	446641	446643
1	446659	446661
1	446682	446684
1	446691	446693
1	446746	446748

==> Meth10_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth.bed-mcIntergenic <==
1	27782	27784
1	140551	140553
1	148080	148082
1	150099	150101
1	169735	169737
1	169771	169773
1	169796	169798
1	169800	169802
1	182756	182758
1	185808	185810

==> Meth10_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth.bed-mcIntergenic <==
1	6570	6572
1	6713	6715
1	6780	6782
1	6813	6815
1	6818	6820
1	27606	27608
1	27613	27615
1	27641	27643
1	27643	27645
1	27674	27676

==> Meth10_R1_001_val_1_bismark_bt2_pe._5x.bedgraph.bed-mcIntergenic <==
1	6570	6572
1	6713	6715
1	6780	6782
1	6813	6815
1	6818	6820
1	27606	27608
1	27613	27615
1	27641	27643
1	27643	27645
1	27674	27676

==> Meth11_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Met

In [74]:
#Count number of overlaps
!wc -l *mcIntergenic

  220239 Meth10_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed-mcIntergenic
  351041 Meth10_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth.bed-mcIntergenic
 2316939 Meth10_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth.bed-mcIntergenic
 2888219 Meth10_R1_001_val_1_bismark_bt2_pe._5x.bedgraph.bed-mcIntergenic
  261721 Meth11_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed-mcIntergenic
  329457 Meth11_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth.bed-mcIntergenic
 2330389 Meth11_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth.bed-mcIntergenic
 2921567 Meth11_R1_001_val_1_bismark_bt2_pe._5x.bedgraph.bed-mcIntergenic
  526674 Meth12_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed-mcIntergenic
  651370 Meth12_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth.bed-mcIntergenic
 4408771 Meth12_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth.bed-mcIntergenic
 5586815 Meth12_R1_001_val_1_bismark_bt2_pe._5x.bedgraph.bed-mcIntergenic
  134470 Meth13_R1_001_val_1_bismark_bt2_pe

#### Summary

##### Overlaps with Genes

| **Sample** 	| **Method** 	| **CpGs with Data** 	| **Methylated CpGs** 	| **Sparsely Methylated CpGs** 	| **Unmethylated CpGs** 	|
|:----------:	|:----------:	|:------------------:	|:-------------------:	|:----------------------------:	|:---------------------:	|
|     10     	|    WGBS    	|       1683069      	|    230343 (13.7%)   	|        196827 (11.7%)        	|    1255899 (74.6%)    	|
|     11     	|    WGBS    	|       1740149      	|    267181 (15.4%)   	|        188348 (10.8%)        	|    1284620 (73.8%)    	|
|     12     	|    WGBS    	|       3204885      	|    533230 (16.6%)   	|        348967 (10.9%)        	|    2322688 (72.5%)    	|
|     13     	|    RRBS    	|       1062577      	|    123271 (11.6%)   	|         53235 (5.0%)         	|     886071 (83.4%)    	|
|     14     	|    RRBS    	|       895852       	|    89865 (10.0%)    	|         48499 (5.4%)         	|     757488 (84.6%)    	|
|     15     	|    RRBS    	|       1064769      	|    113217 (10.6%)   	|         64106 (6.0%)         	|     887446 (83.3%)    	|
|     16     	|  MBD-BSSeq 	|       208503       	|    48436 (23.2%)    	|         25316 (12.1%)        	|     134751 (64.6%)    	|
|     17     	|  MBD-BSSeq 	|        83519       	|    20458 (24.5%)    	|         9441 (11.3%)         	|     53620 (64.2%)     	|
|     18     	|  MBD-BSSeq 	|        50302       	|    12960 (25.8%)    	|          4843 (9.6%)         	|     32499 (64.6%)     	|

##### Overlaps with Coding Sequences (CDS)

| **Sample** 	| **Method** 	| **CpGs with Data** 	| **Methylated CpGs** 	| **Sparsely Methylated CpGs** 	| **Unmethylated CpGs** 	|
|:----------:	|:----------:	|:------------------:	|:-------------------:	|:----------------------------:	|:---------------------:	|
|     10     	|    WGBS    	|       476579       	|    54412 (11.4%)    	|         60266 (12.6%)        	|     361901 (75.9%)    	|
|     11     	|    WGBS    	|       496887       	|    64070 (12.9%)    	|         58258 (11.7%)        	|     374559 (75.4%)    	|
|     12     	|    WGBS    	|       791965       	|    113396 (14.3%)   	|         89455 (11.3%)        	|     589114 (74.4%)    	|
|     13     	|    RRBS    	|       200808       	|     18158 (9.0%)    	|         11383 (5.7%)         	|     171267 (85.3%)    	|
|     14     	|    RRBS    	|       172313       	|     12833 (7.4%)    	|         10486 (6.1%)         	|     148994 (86.5%)    	|
|     15     	|    RRBS    	|       203147       	|     15130 (7.4%)    	|         14015 (6.9%)         	|     174002 (85.7%)    	|
|     16     	|  MBD-BSSeq 	|        72490       	|    13535 (18.7%)    	|         9666 (13.3%)         	|     49289 (68.0%)     	|
|     17     	|  MBD-BSSeq 	|        29305       	|     5649 (19.3%)    	|         3690 (12.6%)         	|     19966 (68.1%)     	|
|     18     	|  MBD-BSSeq 	|        17619       	|     3992 (22.7%)    	|         1907 (10.8%)         	|     11720 (66.5%)     	|

##### Overlaps with Introns

| **Sample** 	| **Method** 	| **CpGs with Data** 	| **Methylated CpGs** 	| **Sparsely Methylated CpGs** 	| **Unmethylated CpGs** 	|
|:----------:	|:----------:	|:------------------:	|:-------------------:	|:----------------------------:	|:---------------------:	|
|     10     	|    WGBS    	|       1207609      	|    176102 (14.6%)   	|        136671 (11.3%)        	|     894836 (74.1%)    	|
|     11     	|    WGBS    	|       1244403      	|    203311 (16.3%)   	|        130195 (10.5%)        	|     910897 (73.2%)    	|
|     12     	|    WGBS    	|       2415036      	|    420249 (17.4%)   	|        259702 (10.8%)        	|    1735085 (71.8%)    	|
|     13     	|    RRBS    	|       862191       	|    105145 (12.2%)   	|         41866 (4.9%)         	|     715180 (82.9%)    	|
|     14     	|    RRBS    	|       723933       	|    77053 (10.6%)    	|         38032 (5.3%)         	|     608848 (84.1%)    	|
|     15     	|    RRBS    	|       862091       	|    98111 (11.4%)    	|         50116 (5.8%)         	|     713864 (82.8%)    	|
|     16     	|  MBD-BSSeq 	|       136153       	|    34929 (25.7%)    	|         15666 (11.5%)        	|     85558 (62.8%)     	|
|     17     	|  MBD-BSSeq 	|        54275       	|    14821 (27.3%)    	|         5755 (10.6%)         	|     33699 (62.1%)     	|
|     18     	|  MBD-BSSeq 	|        32717       	|     8973 (27.4%)    	|          2942 (9.0%)         	|     20802 (63.6%)     	|

##### Overlaps with Intergenic regions

| **Sample** 	| **Method** 	| **CpGs with Data** 	| **Methylated CpGs** 	| **Sparsely Methylated CpGs** 	| **Unmethylated CpGs** 	|
|:----------:	|:----------:	|:------------------:	|:-------------------:	|:----------------------------:	|:---------------------:	|
|     10     	|    WGBS    	|       2888219      	|    220239 (7.6%)    	|        351041 (12.2%)        	|    2316939 (80.2%)    	|
|     11     	|    WGBS    	|       2921567      	|    261721 (9.0%)    	|        329457 (11.3%)        	|    2330389 (79.8%)    	|
|     12     	|    WGBS    	|       5586815      	|    526674 (9.4%)    	|        651370 (11.7%)        	|    4408771 (78.9%)    	|
|     13     	|    RRBS    	|       2110677      	|    134470 (6.4%)    	|         98807 (4.7%)         	|    1877400 (88.9%)    	|
|     14     	|    RRBS    	|       1752845      	|     94877 (5.4%)    	|         86553 (4.9%)         	|    1571415 (89.6%)    	|
|     15     	|    RRBS    	|       2111748      	|    118130 (5.6%)    	|         115348 (5.5%)        	|    1878270 (88.9%)    	|
|     16     	|  MBD-BSSeq 	|       375096       	|    58259 (15.5%)    	|         49523 (13.2%)        	|     267314 (71.3%)    	|
|     17     	|  MBD-BSSeq 	|       158871       	|    25048 (15.8%)    	|         19409 (12.2%)        	|     114414 (72.0%)    	|
|     18     	|  MBD-BSSeq 	|       103090       	|    16508 (16.0%)    	|         11950 (11.6%)        	|     74632 (72.4%)     	|
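Each percentage in these tables is a category count divided by CpGs with Data, and the three categories should add up to that total. Below is a minimal Python sketch of that arithmetic. It assumes the counts have already been read off the `wc -l` output, and the function name `summarize` is hypothetical:

```python
def summarize(total, meth, sparse, unmeth):
    """Format the three category cells of one summary-table row.

    Each percentage is that category's count divided by the number of
    CpGs with data, rounded to one decimal place. Raises ValueError if
    the categories do not add up to the total, which would mean a CpG
    was dropped or double-counted somewhere upstream.
    """
    if meth + sparse + unmeth != total:
        raise ValueError(
            f"categories sum to {meth + sparse + unmeth}, expected {total}"
        )
    return [f"{n} ({100 * n / total:.1f}%)" for n in (meth, sparse, unmeth)]


# Sample 10, CDS overlaps (counts from `wc -l *mcCDS`)
print(summarize(476579, 54412, 60266, 361901))
# ['54412 (11.4%)', '60266 (12.6%)', '361901 (75.9%)']
```

If you run the check on every row, it flags any cell where the counts do not add up or a percentage was rounded inconsistently.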

## *P. acuta*

In [64]:
cd ..

/Users/yaaminivenkataraman/Documents/Meth_Compare/analyses/Characterizing-CpG-Methylation


In [16]:
#Make a directory for Pact output
#!mkdir Pact

In [17]:
cd Pact/

/Users/yaamini/Documents/Meth_Compare/analyses/Characterizing-CpG-Methylation-5x/Pact


### 1. Characterize CG motif locations in feature tracks

#### 1a. Set variable paths

In [18]:
paGenes = "../../../genome-feature-files/Pact.GFFannotation.Genes.gff"

In [19]:
paCDS = "../../../genome-feature-files/Pact.GFFannotation.CDS.gff"

In [20]:
paIntron = "../../../genome-feature-files/Pact.GFFannotation.Intron.gff"

In [21]:
paCGMotifs = "../../../genome-feature-files/Pact_CpG.gff"

#### 1b. Check variable paths

In [22]:
!head {paGenes}
!wc -l {paGenes}

scaffold6_cov64	AUGUSTUS	gene	1	5652	0.46	-	.	g1
scaffold6_cov64	AUGUSTUS	gene	5805	6678	0.57	+	.	g2
scaffold7_cov100	AUGUSTUS	gene	1	2566	0.96	+	.	g3
scaffold7_cov100	AUGUSTUS	gene	3467	6217	0.78	-	.	g4
scaffold7_cov100	AUGUSTUS	gene	7069	9073	1	-	.	g5
scaffold7_cov100	AUGUSTUS	gene	9590	11670	0.8	-	.	g6
scaffold7_cov100	AUGUSTUS	gene	13339	15463	0.92	-	.	g7
scaffold7_cov100	AUGUSTUS	gene	15738	18320	0.96	+	.	g8
scaffold7_cov100	AUGUSTUS	gene	18586	19270	0.99	-	.	g9
scaffold7_cov100	AUGUSTUS	gene	19312	20050	0.74	+	.	g10
   64558 ../../../genome-feature-files/Pact.GFFannotation.Genes.gff


In [23]:
!head {paCDS}
!wc -l {paCDS}

scaffold6_cov64	AUGUSTUS	CDS	495	842	0.84	-	2	transcript_id "g1.t1"; gene_id "g1";
scaffold6_cov64	AUGUSTUS	CDS	1208	1555	0.92	-	2	transcript_id "g1.t1"; gene_id "g1";
scaffold6_cov64	AUGUSTUS	CDS	1922	2269	1	-	2	transcript_id "g1.t1"; gene_id "g1";
scaffold6_cov64	AUGUSTUS	CDS	5583	5652	0.26	-	0	transcript_id "g1.t1"; gene_id "g1";
scaffold6_cov64	AUGUSTUS	CDS	495	842	0.84	-	2	transcript_id "g1.t2"; gene_id "g1";
scaffold6_cov64	AUGUSTUS	CDS	1208	1555	0.92	-	2	transcript_id "g1.t2"; gene_id "g1";
scaffold6_cov64	AUGUSTUS	CDS	1922	2269	1	-	2	transcript_id "g1.t2"; gene_id "g1";
scaffold6_cov64	AUGUSTUS	CDS	4754	4851	0.4	-	1	transcript_id "g1.t2"; gene_id "g1";
scaffold6_cov64	AUGUSTUS	CDS	5594	5652	0.54	-	0	transcript_id "g1.t2"; gene_id "g1";
scaffold6_cov64	AUGUSTUS	CDS	5805	5838	0.98	+	0	transcript_id "g2.t1"; gene_id "g2";
  318484 ../../../genome-feature-files/Pact.GFFannotation.CDS.gff


In [24]:
!head {paIntron}
!wc -l {paIntron}

scaffold6_cov64	AUGUSTUS	intron	1	494	0.82	-	.	transcript_id "g1.t1"; gene_id "g1";
scaffold6_cov64	AUGUSTUS	intron	843	1207	0.92	-	.	transcript_id "g1.t1"; gene_id "g1";
scaffold6_cov64	AUGUSTUS	intron	1556	1921	1	-	.	transcript_id "g1.t1"; gene_id "g1";
scaffold6_cov64	AUGUSTUS	intron	2270	5582	0.23	-	.	transcript_id "g1.t1"; gene_id "g1";
scaffold6_cov64	AUGUSTUS	intron	1	494	0.82	-	.	transcript_id "g1.t2"; gene_id "g1";
scaffold6_cov64	AUGUSTUS	intron	843	1207	0.92	-	.	transcript_id "g1.t2"; gene_id "g1";
scaffold6_cov64	AUGUSTUS	intron	1556	1921	1	-	.	transcript_id "g1.t2"; gene_id "g1";
scaffold6_cov64	AUGUSTUS	intron	2270	4753	0.4	-	.	transcript_id "g1.t2"; gene_id "g1";
scaffold6_cov64	AUGUSTUS	intron	4852	5593	0.48	-	.	transcript_id "g1.t2"; gene_id "g1";
scaffold6_cov64	AUGUSTUS	intron	5839	5945	0.54	+	.	transcript_id "g2.t1"; gene_id "g2";
  241534 ../../../genome-feature-files/Pact.GFFannotation.Intron.gff


In [25]:
!head {paCGMotifs}
!wc -l {paCGMotifs}

##gff-version 2.0
##date 2020-03-29
##Type DNA scaffold1_cov55
scaffold1_cov55	fuzznuc	misc_feature	23	24	2.000	+	.	Sequence "scaffold1_cov55.1" ; note "*pat pattern1"
scaffold1_cov55	fuzznuc	misc_feature	35	36	2.000	+	.	Sequence "scaffold1_cov55.2" ; note "*pat pattern1"
scaffold1_cov55	fuzznuc	misc_feature	50	51	2.000	+	.	Sequence "scaffold1_cov55.3" ; note "*pat pattern1"
scaffold1_cov55	fuzznuc	misc_feature	85	86	2.000	+	.	Sequence "scaffold1_cov55.4" ; note "*pat pattern1"
scaffold1_cov55	fuzznuc	misc_feature	93	94	2.000	+	.	Sequence "scaffold1_cov55.5" ; note "*pat pattern1"
scaffold1_cov55	fuzznuc	misc_feature	103	104	2.000	+	.	Sequence "scaffold1_cov55.6" ; note "*pat pattern1"
scaffold1_cov55	fuzznuc	misc_feature	106	107	2.000	+	.	Sequence "scaffold1_cov55.7" ; note "*pat pattern1"
 9639415 ../../../genome-feature-files/Pact_CpG.gff


#### 1c. Characterize overlaps with `bedtools`

In [31]:
!{bedtoolsDirectory}intersectBed \
-u \
-a {paCGMotifs} \
-b {paGenes} \
| wc -l
!echo "CG motif overlaps with genes"

 3434720
CG motif overlaps with genes


In [32]:
!{bedtoolsDirectory}intersectBed \
-u \
-a {paCGMotifs} \
-b {paCDS} \
| wc -l
!echo "CG motif overlaps with CDS"

 1455630
CG motif overlaps with CDS


In [33]:
!{bedtoolsDirectory}intersectBed \
-u \
-a {paCGMotifs} \
-b {paIntron} \
| wc -l
!echo "CG motif overlaps with introns"

 1999490
CG motif overlaps with introns


In [34]:
!{bedtoolsDirectory}intersectBed \
-v \
-a {paCGMotifs} \
-b {paGenes} \
| wc -l
!echo "CG motif overlaps that do not overlap with genes (i.e. intergenic regions)"

 5720900
CG motif overlaps that do not overlap with genes (i.e. intergenic regions)
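With `-u`, `intersectBed` reports each CG motif that overlaps at least one feature exactly once. With `-v`, it reports the motifs that overlap no feature. The two sets therefore partition the motif file, so the gene and intergenic counts should together equal the number of motif records. bedtools accepts both BED (0-based, half-open) and GFF (1-based, closed) input and accounts for the coordinate difference.

The small brute-force Python sketch below illustrates those semantics, including the coordinate conversion. The names are hypothetical, and this is not how bedtools is implemented:

```python
def gff_to_half_open(start, end):
    """Convert a 1-based, closed GFF interval to 0-based, half-open (BED-style)."""
    return start - 1, end


def overlaps(a, b):
    """True if half-open intervals a and b, each (start, end), share at least 1 bp."""
    return a[0] < b[1] and b[0] < a[1]


def split_by_overlap(queries, features):
    """Partition queries the way `intersectBed -u` and `-v` do, for one scaffold.

    Returns (hits, misses). Hits overlap at least one feature and are listed
    once no matter how many features they touch (-u). Misses overlap none (-v).
    Brute force, O(len(queries) * len(features)); bedtools uses indexed
    searches that scale to whole genomes.
    """
    hits, misses = [], []
    for q in queries:
        (hits if any(overlaps(q, f) for f in features) else misses).append(q)
    return hits, misses


# A GFF gene spanning bases 1-50 becomes the half-open interval (0, 50).
# Queries are already half-open, like the BED files made from the bedGraphs.
gene = gff_to_half_open(1, 50)
hits, misses = split_by_overlap([(22, 24), (49, 51), (50, 52)], [gene])
print(hits, misses)  # [(22, 24), (49, 51)] [(50, 52)]
```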


#### 1d. Summary

| ***P. acuta* Genome Feature** 	| **Number of Individual Features** 	| **Overlaps with CG Motifs** 	| **% of Total CG Motifs** 	|
|:-------------------------------:	|:------------------------------:	|:---------------------------:	|:---------------------:	|
|              Genes              	|              64558             	|           3434720           	|          35.6         	|
|         Coding Sequences        	|             318484             	|           1455630           	|          15.1         	|
|             Introns             	|             241534             	|           1999490           	|          20.7         	|
|        Intergenic Regions       	|               N/A              	|           5720900           	|          59.3         	|
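The percentages in this table use the number of CG motifs as the denominator. Running `wc -l` on `Pact_CpG.gff` also counts the `##` header lines that fuzznuc writes (three appear in the `head` output above), so that line count is at least slightly larger than the number of motifs.

Below is a minimal Python sketch that counts only feature records. It assumes every header line starts with `#`, and the function names are hypothetical:

```python
def count_gff_records(lines):
    """Count feature records in GFF text.

    Assumes comment/header lines (such as fuzznuc's `##gff-version`,
    `##date`, and `##Type DNA ...` lines) are the only lines that start
    with '#', and ignores blank lines.
    """
    return sum(1 for line in lines if line.strip() and not line.startswith("#"))


def percent_of(count, total):
    """Percentage of `total`, rounded to one decimal place."""
    return round(100 * count / total, 1)


# Usage with the notebook's path variable (not run here):
# with open(paCGMotifs) as fh:
#     n_motifs = count_gff_records(fh)
# percent_of(3434720, n_motifs)
```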

### 2. Download coverage files

In [35]:
#Download Pact WGBS and MBD-BS 5x sample bedgraphs
!wget -r -l1 --no-parent -A "*5x.bedgraph" https://gannet.fish.washington.edu/seashell/bu-mox/scrubbed/031520-TG-bs/Pact_tg/dedup/

--2020-04-27 16:53:05--  https://gannet.fish.washington.edu/seashell/bu-mox/scrubbed/031520-TG-bs/Pact_tg/dedup/
Resolving gannet.fish.washington.edu (gannet.fish.washington.edu)... 128.95.149.52
Connecting to gannet.fish.washington.edu (gannet.fish.washington.edu)|128.95.149.52|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘gannet.fish.washington.edu/seashell/bu-mox/scrubbed/031520-TG-bs/Pact_tg/dedup/index.html.tmp’

gannet.fish.washing     [ <=>                ]  42.11K  --.-KB/s    in 0.001s  

2020-04-27 16:53:06 (34.2 MB/s) - ‘gannet.fish.washington.edu/seashell/bu-mox/scrubbed/031520-TG-bs/Pact_tg/dedup/index.html.tmp’ saved [43123]

Loading robots.txt; please ignore errors.
--2020-04-27 16:53:06--  https://gannet.fish.washington.edu/robots.txt
Reusing existing connection to gannet.fish.washington.edu:443.
HTTP request sent, awaiting response... 404 Not Found
2020-04-27 16:53:06 ERROR 404: Not Found.

Removing gannet

In [36]:
#Move samples from directory structure on gannet to current directory
!mv gannet.fish.washington.edu/seashell/bu-mox/scrubbed/031520-TG-bs/Pact_tg/dedup/* .

In [37]:
#Remove empty directory
!rm -r gannet.fish.washington.edu/

In [38]:
#Check files
!find *bedgraph

Meth1_R1_001_val_1_bismark_bt2_pe._5x.bedgraph
Meth2_R1_001_val_1_bismark_bt2_pe._5x.bedgraph
Meth3_R1_001_val_1_bismark_bt2_pe._5x.bedgraph
Meth7_R1_001_val_1_bismark_bt2_pe._5x.bedgraph
Meth8_R1_001_val_1_bismark_bt2_pe._5x.bedgraph
Meth9_R1_001_val_1_bismark_bt2_pe._5x.bedgraph


In [39]:
#Download Pact RRBS 5x sample bedgraphs
!wget -r -l1 --no-parent -A "*5x.bedgraph" https://gannet.fish.washington.edu/seashell/bu-mox/scrubbed/031520-TG-bs/Pact_tg/nodedup/

--2020-04-27 16:53:20--  https://gannet.fish.washington.edu/seashell/bu-mox/scrubbed/031520-TG-bs/Pact_tg/nodedup/
Resolving gannet.fish.washington.edu (gannet.fish.washington.edu)... 128.95.149.52
Connecting to gannet.fish.washington.edu (gannet.fish.washington.edu)|128.95.149.52|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘gannet.fish.washington.edu/seashell/bu-mox/scrubbed/031520-TG-bs/Pact_tg/nodedup/index.html.tmp’

gannet.fish.washing     [ <=>                ]  19.51K  --.-KB/s    in 0.001s  

2020-04-27 16:53:21 (25.4 MB/s) - ‘gannet.fish.washington.edu/seashell/bu-mox/scrubbed/031520-TG-bs/Pact_tg/nodedup/index.html.tmp’ saved [19983]

Loading robots.txt; please ignore errors.
--2020-04-27 16:53:21--  https://gannet.fish.washington.edu/robots.txt
Reusing existing connection to gannet.fish.washington.edu:443.
HTTP request sent, awaiting response... 404 Not Found
2020-04-27 16:53:21 ERROR 404: Not Found.

Removing 

In [40]:
#Move samples from directory structure on gannet to current directory
!mv gannet.fish.washington.edu/seashell/bu-mox/scrubbed/031520-TG-bs/Pact_tg/nodedup/* .

In [41]:
#Remove empty directory
!rm -r gannet.fish.washington.edu/

In [42]:
!find *bedgraph

Meth1_R1_001_val_1_bismark_bt2_pe._5x.bedgraph
Meth2_R1_001_val_1_bismark_bt2_pe._5x.bedgraph
Meth3_R1_001_val_1_bismark_bt2_pe._5x.bedgraph
Meth4_R1_001_val_1_bismark_bt2_pe._5x.bedgraph
Meth5_R1_001_val_1_bismark_bt2_pe._5x.bedgraph
Meth6_R1_001_val_1_bismark_bt2_pe._5x.bedgraph
Meth7_R1_001_val_1_bismark_bt2_pe._5x.bedgraph
Meth8_R1_001_val_1_bismark_bt2_pe._5x.bedgraph
Meth9_R1_001_val_1_bismark_bt2_pe._5x.bedgraph


In [43]:
#Verify checksums from gannet
!md5sum -c ../Pact-5xbedgraph-GANNET-md5sum.txt

Meth8_R1_001_val_1_bismark_bt2_pe._5x.bedgraph: OK
Meth9_R1_001_val_1_bismark_bt2_pe._5x.bedgraph: OK
Meth2_R1_001_val_1_bismark_bt2_pe._5x.bedgraph: OK
Meth3_R1_001_val_1_bismark_bt2_pe._5x.bedgraph: OK
Meth1_R1_001_val_1_bismark_bt2_pe._5x.bedgraph: OK
Meth7_R1_001_val_1_bismark_bt2_pe._5x.bedgraph: OK
Meth6_R1_001_val_1_bismark_bt2_pe._5x.bedgraph: OK
Meth5_R1_001_val_1_bismark_bt2_pe._5x.bedgraph: OK
Meth4_R1_001_val_1_bismark_bt2_pe._5x.bedgraph: OK
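`md5sum -c` recomputes each file's MD5 digest and compares it with the value recorded on gannet. On systems without GNU `md5sum`, a minimal Python equivalent using only the standard library is sketched below. The function names are hypothetical, and the sketch assumes the usual `md5sum` line format:

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """MD5 hex digest of a file, read in chunks so large bedGraphs need little memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_md5sums(checksum_file, base_dir="."):
    """Roughly mimic `md5sum -c`: map each listed file to OK, FAILED, or MISSING.

    Expects lines of the form '<hex digest>  <file name>'.
    """
    results = {}
    for line in Path(checksum_file).read_text().splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*")  # md5sum prefixes binary-mode entries with '*'
        path = Path(base_dir) / name
        if not path.exists():
            results[name] = "MISSING"
        elif md5_of(path) == expected.lower():
            results[name] = "OK"
        else:
            results[name] = "FAILED"
    return results
```

For example, `check_md5sums("../Pact-5xbedgraph-GANNET-md5sum.txt")` run from the `Pact/` directory would return one status per bedGraph.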


In [44]:
!wc -l *bedgraph

 5546051 Meth1_R1_001_val_1_bismark_bt2_pe._5x.bedgraph
 6358722 Meth2_R1_001_val_1_bismark_bt2_pe._5x.bedgraph
 5866786 Meth3_R1_001_val_1_bismark_bt2_pe._5x.bedgraph
 1835561 Meth4_R1_001_val_1_bismark_bt2_pe._5x.bedgraph
 1451229 Meth5_R1_001_val_1_bismark_bt2_pe._5x.bedgraph
 1517358 Meth6_R1_001_val_1_bismark_bt2_pe._5x.bedgraph
 2640625 Meth7_R1_001_val_1_bismark_bt2_pe._5x.bedgraph
  539008 Meth8_R1_001_val_1_bismark_bt2_pe._5x.bedgraph
 2732607 Meth9_R1_001_val_1_bismark_bt2_pe._5x.bedgraph
 28487947 total


### 3. Characterize methylation for each CpG dinucleotide

- Methylated: ≥ 50% methylation
- Sparsely methylated: > 10% and < 50% methylation
- Unmethylated: ≤ 10% methylation
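The awk filters in the cells below classify each CpG by the fourth bedGraph column (percent methylation):

- `>= 50` is methylated.
- `> 10` and `< 50` is sparsely methylated.
- `<= 10` is unmethylated.

Every CpG therefore lands in exactly one category, and the three counts should sum to the bedGraph line count. Below is a Python sketch of the same rule, useful for spot-checking boundary values such as exactly 10% or 50%. The function names are hypothetical:

```python
from collections import Counter


def classify(percent_meth):
    """Assign one CpG to a category, using the same cutoffs as the awk filters."""
    if percent_meth >= 50:
        return "Meth"        # awk: $4 >= 50
    if percent_meth > 10:
        return "sparseMeth"  # awk: $4 < 50 and $4 > 10
    return "unMeth"          # awk: $4 <= 10


def count_categories(bedgraph_lines):
    """Tally categories over bedGraph lines (chrom, start, end, % methylation).

    Short lines and `track` header lines are skipped.
    """
    counts = Counter()
    for line in bedgraph_lines:
        fields = line.split()
        if len(fields) < 4 or fields[0] == "track":
            continue
        counts[classify(float(fields[3]))] += 1
    return counts
```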

##### Methylated loci

In [45]:
%%bash
for f in *bedgraph
do
    awk '{if ($4 >= 50) { print $1, $2, $3, $4 }}' ${f} \
    > ${f}-Meth
done

In [46]:
!head *Meth

==> Meth1_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth <==
scaffold7_cov100 4351 4353 50.000000
scaffold7_cov100 5500 5502 83.333333
scaffold7_cov100 5578 5580 57.142857
scaffold7_cov100 5986 5988 100.000000
scaffold7_cov100 6144 6146 100.000000
scaffold7_cov100 6188 6190 100.000000
scaffold7_cov100 6198 6200 88.888889
scaffold7_cov100 6231 6233 100.000000
scaffold7_cov100 6233 6235 100.000000
scaffold7_cov100 7438 7440 100.000000

==> Meth2_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth <==
scaffold7_cov100 5500 5502 62.500000
scaffold7_cov100 5986 5988 66.666667
scaffold7_cov100 6144 6146 100.000000
scaffold7_cov100 6188 6190 94.117647
scaffold7_cov100 6198 6200 100.000000
scaffold7_cov100 6231 6233 71.428571
scaffold7_cov100 6233 6235 100.000000
scaffold7_cov100 7438 7440 88.235294
scaffold7_cov100 7696 7698 95.833333
scaffold7_cov100 7796 7798 60.000000

==> Meth3_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth <==
scaffold7_cov100 5500 5502 87.500000
scaffo

In [47]:
!wc -l *Meth

  110364 Meth1_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth
  126440 Meth2_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth
  124819 Meth3_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth
   31047 Meth4_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth
   30345 Meth5_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth
   26617 Meth6_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth
  258222 Meth7_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth
  213342 Meth8_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth
  255370 Meth9_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth
 1176566 total


##### Sparsely methylated loci

In [None]:
%%bash
for f in *bedgraph
do
    awk '{if ($4 > 10 && $4 < 50) { print $1, $2, $3, $4 }}' ${f} \
    > ${f}-sparseMeth
done

In [None]:
!head *sparseMeth

In [None]:
!wc -l *sparseMeth

##### Unmethylated loci

In [92]:
%%bash
for f in *bedgraph
do
    awk '{if ($4 <= 10) { print $1, $2, $3, $4 }}' ${f} \
    > ${f}-unMeth
done

In [93]:
!head *unMeth

==> Meth1_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth <==
scaffold2_cov51 649 651 0.000000
scaffold2_cov51 686 688 8.333333
scaffold3_cov83 189 191 6.250000
scaffold3_cov83 208 210 0.000000
scaffold3_cov83 243 245 0.000000
scaffold3_cov83 261 263 8.108108
scaffold6_cov64 290 292 0.000000
scaffold6_cov64 298 300 0.000000
scaffold6_cov64 489 491 0.000000
scaffold6_cov64 826 828 0.000000

==> Meth2_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth <==
scaffold1_cov55 169 171 0.000000
scaffold1_cov55 194 196 0.000000
scaffold1_cov55 250 252 0.000000
scaffold1_cov55 291 293 0.000000
scaffold3_cov83 189 191 8.695652
scaffold3_cov83 208 210 6.896552
scaffold3_cov83 243 245 0.000000
scaffold3_cov83 261 263 7.692308
scaffold3_cov83 475 477 4.761905
scaffold3_cov83 484 486 7.317073

==> Meth3_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth <==
scaffold2_cov51 649 651 0.000000
scaffold2_cov51 778 780 0.000000
scaffold3_cov83 208 210 5.128205
scaffold3_cov83 243 24

In [94]:
!wc -l *unMeth

 2396869 Meth1_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth
 3750400 Meth2_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth
 2872071 Meth3_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth
 1106625 Meth4_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth
  949949 Meth5_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth
  965584 Meth6_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth
  441932 Meth7_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth
   54045 Meth8_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth
  424531 Meth9_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth
 12962006 total


##### Summary

| **Sample** 	| **Method** 	| **CpGs with Data** 	| **Methylated CpGs** 	| **Sparsely Methylated CpGs** 	| **Unmethylated CpGs** 	|
|:----------:	|:----------:	|:------------------:	|:-------------------:	|:----------------------------:	|:---------------------:	|
|      1     	|    WGBS    	|       2518069      	|     37201 (1.5%)    	|         83999 (3.3%)         	|    2396869 (95.2%)    	|
|      2     	|    WGBS    	|       3926923      	|     66524 (1.7%)    	|        109999 (2.8%)         	|    3750400 (95.5%)    	|
|      3     	|    WGBS    	|       3028012      	|     51081 (1.7%)    	|        104860 (3.5%)         	|    2872071 (94.9%)    	|
|      4     	|    RRBS    	|       1184293      	|     12021 (1.0%)    	|         65647 (5.5%)         	|    1106625 (93.4%)    	|
|      5     	|    RRBS    	|       992337       	|     14557 (1.5%)    	|         27831 (2.8%)         	|     949949 (95.7%)    	|
|      6     	|    RRBS    	|       1014588      	|     10621 (1.0%)    	|         38383 (3.8%)         	|     965584 (95.2%)    	|
|      7     	|  MBD-BSSeq 	|       744052       	|    195284 (26.2%)   	|        106836 (14.4%)        	|     441932 (59.4%)    	|
|      8     	|  MBD-BSSeq 	|       250032       	|    156098 (62.4%)   	|         39889 (16.0%)        	|     54045 (21.6%)     	|
|      9     	|  MBD-BSSeq 	|       725079       	|    187956 (25.9%)   	|        112592 (15.5%)        	|     424531 (58.5%)    	|

### 4. Characterize genomic locations of CpGs

#### 4a. Create BEDfiles

In [97]:
%%bash

for f in *bedgraph*
do
    awk '{print $1"\t"$2"\t"$3}' ${f} > ${f}.bed
    wc -l ${f}.bed
done

 2518069 Meth1_R1_001_val_1_bismark_bt2_pe._10x.bedgraph.bed
   37201 Meth1_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth.bed
   83999 Meth1_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth.bed
 2396869 Meth1_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth.bed
 3926923 Meth2_R1_001_val_1_bismark_bt2_pe._10x.bedgraph.bed
   66524 Meth2_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth.bed
  109999 Meth2_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth.bed
 3750400 Meth2_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth.bed
 3028012 Meth3_R1_001_val_1_bismark_bt2_pe._10x.bedgraph.bed
   51081 Meth3_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth.bed
  104860 Meth3_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth.bed
 2872071 Meth3_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth.bed
 1184293 Meth4_R1_001_val_1_bismark_bt2_pe._10x.bedgraph.bed
   12021 Meth4_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth.bed
   65647 Meth4_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth.bed


In [98]:
#Confirm BEDfile creation
!find *.bed

Meth1_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth.bed
Meth1_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth.bed
Meth1_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth.bed
Meth1_R1_001_val_1_bismark_bt2_pe._10x.bedgraph.bed
Meth2_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth.bed
Meth2_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth.bed
Meth2_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth.bed
Meth2_R1_001_val_1_bismark_bt2_pe._10x.bedgraph.bed
Meth3_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth.bed
Meth3_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth.bed
Meth3_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth.bed
Meth3_R1_001_val_1_bismark_bt2_pe._10x.bedgraph.bed
Meth4_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth.bed
Meth4_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth.bed
Meth4_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth.bed
Meth4_R1_001_val_1_bismark_bt2_pe._10x.bedgraph.bed
Meth5_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth.bed
Me

In [99]:
#Confirm file creation
!head Meth1_R1_001_val_1_bismark_bt2_pe._10x.bedgraph.bed

scaffold2_cov51	649	651
scaffold2_cov51	686	688
scaffold3_cov83	189	191
scaffold3_cov83	208	210
scaffold3_cov83	243	245
scaffold3_cov83	261	263
scaffold3_cov83	475	477
scaffold3_cov83	484	486
scaffold3_cov83	504	506
scaffold6_cov64	290	292


#### 4b. Genes

In [102]:
%%bash

for f in *bed
do
  /usr/local/bin/intersectBed \
  -wb \
  -a ${f} \
  -b ../../../genome-feature-files/Pact.GFFannotation.Genes.gff \
  > ${f}-paGenes
done

In [103]:
#Check output
!head *paGenes

==> Meth1_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth.bed-paGenes <==
scaffold7_cov100	6144	6146	scaffold7_cov100	AUGUSTUS	gene	3467	6217	0.78	-	.	g4
scaffold7_cov100	6188	6190	scaffold7_cov100	AUGUSTUS	gene	3467	6217	0.78	-	.	g4
scaffold7_cov100	7438	7440	scaffold7_cov100	AUGUSTUS	gene	7069	9073	1	-	.	g5
scaffold7_cov100	7891	7893	scaffold7_cov100	AUGUSTUS	gene	7069	9073	1	-	.	g5
scaffold7_cov100	8323	8325	scaffold7_cov100	AUGUSTUS	gene	7069	9073	1	-	.	g5
scaffold7_cov100	9877	9879	scaffold7_cov100	AUGUSTUS	gene	9590	11670	0.8	-	.	g6
scaffold7_cov100	10216	10218	scaffold7_cov100	AUGUSTUS	gene	9590	11670	0.8	-	.	g6
scaffold7_cov100	16910	16912	scaffold7_cov100	AUGUSTUS	gene	15738	18320	0.96	+	.	g8
scaffold7_cov100	17090	17092	scaffold7_cov100	AUGUSTUS	gene	15738	18320	0.96	+	.	g8
scaffold7_cov100	17461	17463	scaffold7_cov100	AUGUSTUS	gene	15738	18320	0.96	+	.	g8

==> Meth1_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth.bed-paGenes <==
scaffold7_cov100	1293	1295	sc

In [104]:
#Count number of overlaps
!wc -l *paGenes

   25899 Meth1_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth.bed-paGenes
   34626 Meth1_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth.bed-paGenes
 1118909 Meth1_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth.bed-paGenes
 1179434 Meth1_R1_001_val_1_bismark_bt2_pe._10x.bedgraph.bed-paGenes
   47716 Meth2_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth.bed-paGenes
   49984 Meth2_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth.bed-paGenes
 1723334 Meth2_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth.bed-paGenes
 1821034 Meth2_R1_001_val_1_bismark_bt2_pe._10x.bedgraph.bed-paGenes
   35357 Meth3_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth.bed-paGenes
   44108 Meth3_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth.bed-paGenes
 1346622 Meth3_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth.bed-paGenes
 1426087 Meth3_R1_001_val_1_bismark_bt2_pe._10x.bedgraph.bed-paGenes
    4988 Meth4_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth.bed-paGenes
   26205 Meth4_R1_001_val_1_b

#### 4c. Coding Sequences (CDS)

In [105]:
%%bash

for f in *bed
do
  /usr/local/bin/intersectBed \
  -wb \
  -a ${f} \
  -b ../../../genome-feature-files/Pact.GFFannotation.CDS.gff \
  > ${f}-paCDS
done

In [106]:
#Check output
!head *paCDS

==> Meth1_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth.bed-paCDS <==
scaffold7_cov100	6144	6146	scaffold7_cov100	AUGUSTUS	CDS	6091	6217	0.48	-	0	transcript_id "g4.t1"; gene_id "g4";
scaffold7_cov100	6144	6146	scaffold7_cov100	AUGUSTUS	CDS	6091	6211	0.52	-	0	transcript_id "g4.t2"; gene_id "g4";
scaffold7_cov100	6188	6190	scaffold7_cov100	AUGUSTUS	CDS	6091	6217	0.48	-	0	transcript_id "g4.t1"; gene_id "g4";
scaffold7_cov100	6188	6190	scaffold7_cov100	AUGUSTUS	CDS	6091	6211	0.52	-	0	transcript_id "g4.t2"; gene_id "g4";
scaffold7_cov100	7892	7893	scaffold7_cov100	AUGUSTUS	CDS	7893	7980	1	-	0	transcript_id "g5.t1"; gene_id "g5";
scaffold7_cov100	7892	7893	scaffold7_cov100	AUGUSTUS	CDS	7893	7980	1	-	0	transcript_id "g5.t2"; gene_id "g5";
scaffold7_cov100	8323	8325	scaffold7_cov100	AUGUSTUS	CDS	8286	8363	1	-	0	transcript_id "g5.t1"; gene_id "g5";
scaffold7_cov100	8323	8325	scaffold7_cov100	AUGUSTUS	CDS	8286	8363	1	-	0	transcript_id "g5.t2"; gene_id "g5";
scaffold7_cov100	9877	9879	s

In [107]:
#Count number of overlaps
!wc -l *paCDS

   23872 Meth1_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth.bed-paCDS
   26408 Meth1_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth.bed-paCDS
  800209 Meth1_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth.bed-paCDS
  850489 Meth1_R1_001_val_1_bismark_bt2_pe._10x.bedgraph.bed-paCDS
   41839 Meth2_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth.bed-paCDS
   34947 Meth2_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth.bed-paCDS
 1143040 Meth2_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth.bed-paCDS
 1219826 Meth2_R1_001_val_1_bismark_bt2_pe._10x.bedgraph.bed-paCDS
   32069 Meth3_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth.bed-paCDS
   32550 Meth3_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth.bed-paCDS
  946018 Meth3_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth.bed-paCDS
 1010637 Meth3_R1_001_val_1_bismark_bt2_pe._10x.bedgraph.bed-paCDS
    3840 Meth4_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth.bed-paCDS
   18125 Meth4_R1_001_val_1_bismark_bt2_pe._10x.bedgrap
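
In the CDS output above, the same CpG (e.g. `scaffold7_cov100 6144 6146`) appears once for transcript `g4.t1` and again for `g4.t2`, because `-wb` writes one line per overlapping feature. `wc -l` therefore counts overlap records, not distinct CpGs. The sketch below counts distinct CpG positions instead. It assumes the CpG coordinates occupy the first three columns, as they do in these files; `count_unique_cpgs` is a hypothetical helper, and the toy input is trimmed to four columns.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count distinct CpG positions (columns 1-3) in an
# intersectBed -wb output. Reads the files given as arguments, or stdin if none.
count_unique_cpgs() {
  cut -f1-3 "$@" | sort -u | awk 'END { print NR }'
}

# Toy input modeled on the first paCDS lines above: two CpGs, each reported
# once per overlapping transcript (g4.t1 and g4.t2)
toy_cds=$(printf '%s\t%s\t%s\t%s\n' \
  scaffold7_cov100 6144 6146 g4.t1 \
  scaffold7_cov100 6144 6146 g4.t2 \
  scaffold7_cov100 6188 6190 g4.t1 \
  scaffold7_cov100 6188 6190 g4.t2)

echo "overlap records: $(printf '%s\n' "$toy_cds" | awk 'END { print NR }')"   # overlap records: 4
echo "unique CpGs: $(printf '%s\n' "$toy_cds" | count_unique_cpgs)"             # unique CpGs: 2
```

On the real files this could be run as `for f in *paCDS; do printf '%s\t%s\n' "$f" "$(count_unique_cpgs "$f")"; done`.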

#### 4d. Introns

In [108]:
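%%bash

#Intersect each CpG BED file with the P. acuta (Pact) intron track;
#-wb appends the overlapping GFF record to each CpG line
for f in *bed
do
  /usr/local/bin/intersectBed \
  -wb \
  -a "${f}" \
  -b ../../../genome-feature-files/Pact.GFFannotation.Intron.gff \
  > "${f}-paIntron"
done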

In [109]:
#Check output
!head *paIntron

==> Meth1_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth.bed-paIntron <==
scaffold7_cov100	7438	7440	scaffold7_cov100	AUGUSTUS	intron	7104	7649	1	-	.	transcript_id "g5.t1"; gene_id "g5";
scaffold7_cov100	7438	7440	scaffold7_cov100	AUGUSTUS	intron	7104	7649	1	-	.	transcript_id "g5.t2"; gene_id "g5";
scaffold7_cov100	7891	7892	scaffold7_cov100	AUGUSTUS	intron	7716	7892	1	-	.	transcript_id "g5.t1"; gene_id "g5";
scaffold7_cov100	7891	7892	scaffold7_cov100	AUGUSTUS	intron	7716	7892	1	-	.	transcript_id "g5.t2"; gene_id "g5";
scaffold7_cov100	18941	18943	scaffold7_cov100	AUGUSTUS	intron	18757	19234	0.99	-	.	transcript_id "g9.t1"; gene_id "g9";
scaffold7_cov100	21273	21275	scaffold7_cov100	AUGUSTUS	intron	20697	24051	0.49	+	.	transcript_id "g11.t1"; gene_id "g11";
scaffold7_cov100	50493	50495	scaffold7_cov100	AUGUSTUS	intron	50191	50604	1	+	.	transcript_id "g16.t1"; gene_id "g16";
scaffold7_cov100	69268	69270	scaffold7_cov100	AUGUSTUS	intron	69212	69346	1	-	.	transcript_id "g18.t1"; g

In [110]:
#Count number of overlaps
!wc -l *paIntron

   11164 Meth1_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth.bed-paIntron
   19112 Meth1_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth.bed-paIntron
  703614 Meth1_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth.bed-paIntron
  733890 Meth1_R1_001_val_1_bismark_bt2_pe._10x.bedgraph.bed-paIntron
   23483 Meth2_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth.bed-paIntron
   32016 Meth2_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth.bed-paIntron
 1181764 Meth2_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth.bed-paIntron
 1237263 Meth2_R1_001_val_1_bismark_bt2_pe._10x.bedgraph.bed-paIntron
   15720 Meth3_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth.bed-paIntron
   25760 Meth3_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth.bed-paIntron
  867166 Meth3_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth.bed-paIntron
  908646 Meth3_R1_001_val_1_bismark_bt2_pe._10x.bedgraph.bed-paIntron
    2726 Meth4_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth.bed-paIntron
   16685 Meth4_R
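
Each row of the summary tables divides the methylated, sparsely methylated, and unmethylated counts by the count for all CpGs with data. The sketch below reproduces that arithmetic using the Meth1 intron counts above. `summary_row` is a hypothetical helper; it also warns when the three categories do not add up to the total.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one markdown row of a summary table.
# Arguments: sample, method, total CpGs, methylated, sparsely methylated, unmethylated.
summary_row() {
  awk -v s="$1" -v m="$2" -v t="$3" -v a="$4" -v b="$5" -v c="$6" 'BEGIN {
    # The three methylation categories should partition the total
    if (a + b + c != t)
      printf "warning: categories sum to %d, not %d\n", a + b + c, t > "/dev/stderr"
    printf "| %s | %s | %d | %d (%.1f%%) | %d (%.1f%%) | %d (%.1f%%) |\n",
      s, m, t, a, 100 * a / t, b, 100 * b / t, c, 100 * c / t
  }'
}

# Meth1 (WGBS) intron counts from the wc -l output above
summary_row 1 WGBS 733890 11164 19112 703614
# | 1 | WGBS | 733890 | 11164 (1.5%) | 19112 (2.6%) | 703614 (95.9%) |
```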

#### 4e. Intergenic

In [111]:
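%%bash

#-v keeps only CpGs that overlap no gene in the P. acuta (Pact) gene track,
#i.e. intergenic CpGs; no GFF columns are appended
for f in *bed
do
  /usr/local/bin/intersectBed \
  -v \
  -a "${f}" \
  -b ../../../genome-feature-files/Pact.GFFannotation.Genes.gff \
  > "${f}-paIntergenic"
done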

In [112]:
#Check output
!head *paIntergenic

==> Meth1_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth.bed-paIntergenic <==
scaffold7_cov100	24494	24496
scaffold7_cov100	24941	24943
scaffold7_cov100	78473	78475
scaffold7_cov100	107792	107794
scaffold7_cov100	107834	107836
scaffold7_cov100	108138	108140
scaffold7_cov100	148319	148321
scaffold7_cov100	148342	148344
scaffold7_cov100	230317	230319
scaffold7_cov100	327789	327791

==> Meth1_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth.bed-paIntergenic <==
scaffold3_cov83	475	477
scaffold3_cov83	484	486
scaffold3_cov83	504	506
scaffold7_cov100	24454	24456
scaffold7_cov100	25157	25159
scaffold7_cov100	31789	31791
scaffold7_cov100	121132	121134
scaffold7_cov100	163515	163517
scaffold7_cov100	194346	194348
scaffold7_cov100	196956	196958

==> Meth1_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth.bed-paIntergenic <==
scaffold2_cov51	649	651
scaffold2_cov51	686	688
scaffold3_cov83	189	191
scaffold3_cov83	208	210
scaffold3_cov83	243	245
scaffold3_cov83	261	263
scaffold6_cov64	5797	5799

In [113]:
#Count number of intergenic CpGs (CpGs with no gene overlap)
!wc -l *paIntergenic

   11319 Meth1_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth.bed-paIntergenic
   49388 Meth1_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth.bed-paIntergenic
 1278710 Meth1_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth.bed-paIntergenic
 1339417 Meth1_R1_001_val_1_bismark_bt2_pe._10x.bedgraph.bed-paIntergenic
   18848 Meth2_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth.bed-paIntergenic
   60047 Meth2_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth.bed-paIntergenic
 2028226 Meth2_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth.bed-paIntergenic
 2107121 Meth2_R1_001_val_1_bismark_bt2_pe._10x.bedgraph.bed-paIntergenic
   15747 Meth3_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth.bed-paIntergenic
   60783 Meth3_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth.bed-paIntergenic
 1526292 Meth3_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth.bed-paIntergenic
 1602822 Meth3_R1_001_val_1_bismark_bt2_pe._10x.bedgraph.bed-paIntergenic
    7040 Meth4_R1_001_val_1_bismark_bt2_pe.
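
`intersectBed -v` keeps only the CpGs in `-a` that overlap no feature in `-b`. The plain-awk sketch below illustrates that logic; it is not a replacement for bedtools, which handles large inputs far more efficiently, while this version loops over every gene for every CpG. It assumes, as bedtools does when reading GFF, that 1-based, closed GFF coordinates become 0-based, half-open intervals by subtracting 1 from the start. The toy gene is `g8` and the CpGs come from the outputs above; `intergenic_sketch` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print CpGs (BED) that overlap no gene (GFF), like intersectBed -v.
# Naive O(CpGs x genes) loop, for illustration only.
intergenic_sketch() {
  awk -F'\t' '
    # First file: GFF genes; convert the 1-based closed start to 0-based half-open
    NR == FNR { n++; chr[n] = $1; gs[n] = $4 - 1; ge[n] = $5 + 0; next }
    # Second file: CpGs in BED (0-based, half-open); skip any that overlap a gene
    {
      for (i = 1; i <= n; i++)
        if (chr[i] == $1 && $2 + 0 < ge[i] && gs[i] < $3 + 0) next
      print
    }' "$1" "$2"
}

genes=$(mktemp); cpgs=$(mktemp)
# Gene g8 on scaffold7_cov100 (GFF columns 1-9)
printf 'scaffold7_cov100\tAUGUSTUS\tgene\t15738\t18320\t0.96\t+\t.\tg8\n' > "$genes"
# One genic CpG and one intergenic CpG taken from the outputs above
printf 'scaffold7_cov100\t%s\t%s\n' 16910 16912 24494 24496 > "$cpgs"

intergenic=$(intergenic_sketch "$genes" "$cpgs")
printf '%s\n' "$intergenic"   # scaffold7_cov100	24494	24496
rm -f "$genes" "$cpgs"
```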

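#### Summary

Counts in these tables are the `wc -l` line counts shown above, and percentages are relative to the **CpG with Data** column. For genes, CDS, and introns (`intersectBed -wb`), each line is a CpG-feature overlap, so a CpG that overlaps more than one feature (for example, two transcripts of the same gene in the CDS and intron files) is counted once per overlap. Intergenic counts (`intersectBed -v`) have one line per CpG.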

##### Overlaps with Genes

| **Sample** 	| **Method** 	| **CpG with Data** 	| **Methylated CpG** 	| **Sparsely Methylated CpG** 	| **Unmethylated CpG** 	|
|------------	|------------	|-------------------	|--------------------	|-----------------------------	|----------------------	|
|     1     	|    WGBS    	|        1179434       	|         25899 (2.2%)        	|             34626 (2.9%)             	|         1118909 (94.9%)         	|
|     2     	|    WGBS    	|        1821034       	|         47716 (2.6%)        	|             49984 (2.7%)             	|         1723334 (94.6%)         	|
|     3     	|    WGBS    	|        1426087        	|         35357 (2.5%)       	|             44108 (3.1%)            	|         1346622 (94.4%)        	|
|     4     	|    RRBS    	|      502813      	|       4988 (1.0%)      	|            26205 (5.2%)          	|        471620 (93.8%)       	|
|     5     	|    RRBS    	|       416016      	|       5815 (1.4%)      	|            10664 (2.6%)          	|        399537 (96.0%)       	|
|     6     	|    RRBS    	|       428818      	|       4293 (1.0%)      	|             14614 (3.4%)          	|        409911 (95.6%)       	|
|     7     	|  MBD-BSSeq 	|        325489       	|        87468 (26.9%)       	|             31803 (9.8%)            	|          206218 (63.4%)        	|
|     8     	|  MBD-BSSeq 	|        88675       	|         60278 (68.0%)       	|              11894 (13.4%)            	|          16503 (18.6%)        	|
|     9     	|  MBD-BSSeq 	|        314739       	|         92131 (29.3%)       	|              34146 (10.8%)            	|          188462 (59.9%)        	|

##### Overlaps with Coding Sequences (CDS)

| **Sample** 	| **Method** 	| **CpG with Data** 	| **Methylated CpG** 	| **Sparsely Methylated CpG** 	| **Unmethylated CpG** 	|
|------------	|------------	|-------------------	|--------------------	|-----------------------------	|----------------------	|
|     1     	|    WGBS    	|        850489       	|         23872 (2.8%)        	|             26408 (3.1%)             	|         800209 (94.1%)         	|
|     2     	|    WGBS    	|        1219826       	|         41839 (3.4%)        	|             34947 (2.9%)             	|         1143040 (93.7%)         	|
|     3     	|    WGBS    	|        1010637        	|         32069 (3.2%)       	|             32550 (3.2%)            	|         946018 (93.6%)        	|
|     4     	|    RRBS    	|      334839      	|       3840 (1.1%)      	|            18125 (5.4%)          	|        312874 (93.4%)       	|
|     5     	|    RRBS    	|       275416      	|       4217 (1.5%)      	|            7617 (2.8%)          	|        263582 (95.7%)       	|
|     6     	|    RRBS    	|       283255      	|       3230 (1.1%)      	|             10420 (3.7%)          	|        269605 (95.2%)       	|
|     7     	|  MBD-BSSeq 	|        284856       	|        72744 (25.5%)       	|             24809 (8.7%)            	|          187303 (65.8%)        	|
|     8     	|  MBD-BSSeq 	|        73240       	|         51129 (69.8%)       	|              9109 (12.4%)            	|          13002 (17.8%)        	|
|     9     	|  MBD-BSSeq 	|        265810       	|         75705 (28.5%)       	|              26175 (9.8%)            	|          163930 (61.7%)        	|

##### Overlaps with Introns

| **Sample** 	| **Method** 	| **CpG with Data** 	| **Methylated CpG** 	| **Sparsely Methylated CpG** 	| **Unmethylated CpG** 	|
|------------	|------------	|-------------------	|--------------------	|-----------------------------	|----------------------	|
|     1     	|    WGBS    	|        733890       	|         11164 (1.5%)        	|             19112 (2.6%)             	|         703614 (95.9%)         	|
|     2     	|    WGBS    	|        1237263       	|         23483 (1.9%)        	|             32016 (2.6%)             	|         1181764 (95.5%)         	|
|     3     	|    WGBS    	|        908646        	|         15720 (1.7%)       	|             25760 (2.8%)            	|         867166 (95.4%)        	|
|     4     	|    RRBS    	|      336436      	|       2726 (0.8%)      	|            16685 (5.0%)          	|        317025 (94.2%)       	|
|     5     	|    RRBS    	|       277680      	|       3514 (1.3%)      	|            6409 (2.3%)          	|        267757 (96.4%)       	|
|     6     	|    RRBS    	|       287803      	|       2348 (0.8%)      	|             8824 (3.1%)          	|        276631 (96.1%)       	|
|     7     	|  MBD-BSSeq 	|        140343       	|        42347 (30.2%)       	|             15003 (10.7%)            	|          82993 (59.1%)        	|
|     8     	|  MBD-BSSeq 	|        40384       	|         27138 (67.2%)       	|              5541 (13.7%)            	|          7705 (19.1%)        	|
|     9     	|  MBD-BSSeq 	|        146914       	|         46592 (31.7%)       	|              16913 (11.5%)            	|          83409 (56.8%)        	|

##### Overlaps with Intergenic Regions

| **Sample** 	| **Method** 	| **CpG with Data** 	| **Methylated CpG** 	| **Sparsely Methylated CpG** 	| **Unmethylated CpG** 	|
|------------	|------------	|-------------------	|--------------------	|-----------------------------	|----------------------	|
|     1     	|    WGBS    	|        1339417       	|         11319 (0.8%)        	|             49388 (3.7%)             	|         1278710 (95.5%)         	|
|     2     	|    WGBS    	|        2107121       	|         18848 (0.9%)        	|             60047 (2.8%)             	|         2028226 (96.3%)         	|
|     3     	|    WGBS    	|        1602822        	|         15747 (1.0%)       	|             60783 (3.8%)            	|         1526292 (95.2%)        	|
|     4     	|    RRBS    	|      681870      	|       7040 (1.0%)      	|            39459 (5.8%)          	|        635371 (93.2%)       	|
|     5     	|    RRBS    	|       576682      	|       8749 (1.5%)      	|            17176 (3.0%)          	|        550757 (95.5%)       	|
|     6     	|    RRBS    	|       586105      	|       6335 (1.1%)      	|             23780 (4.1%)          	|        555990 (94.9%)       	|
|     7     	|  MBD-BSSeq 	|        418826       	|        107907 (25.8%)       	|             75067 (17.9%)            	|          235852 (56.3%)        	|
|     8     	|  MBD-BSSeq 	|        161464       	|         95906 (59.4%)       	|              28006 (17.3%)            	|          37552 (23.3%)        	|
|     9     	|  MBD-BSSeq 	|        410614       	|         95913 (23.4%)       	|              78494 (19.1%)            	|          236207 (57.5%)        	|