# Characterizing CpG Methylation

In this notebook, general methylation landscapes in *Montipora capitata* and *Pocillopora acuta* will be characterized based on WGSB, RRBS, and MBD-BSseq data. I will also assess CG motif overlaps with various genome feature tracks to understand where methylation may occur across the genome.

1. Characterize overlap between CG motifs and genome feature tracks
1. Download coverage files
2. Characterize methylation for each CpG dinucleotide
3. Characterize genomic locations of all sequenced data, methylated CpGs, sparsely methylated CpGs, and unmethylated CpGs for each sequencing type

## 0. Set working directory

In [1]:
!pwd

/Users/yaaminivenkataraman/Documents/Meth_Compare/scripts


In [2]:
cd ../analyses/

/Users/yaaminivenkataraman/Documents/Meth_Compare/analyses


In [3]:
#!mkdir Characterizing-CpG-Methylation

In [3]:
cd Characterizing-CpG-Methylation/

/Users/yaaminivenkataraman/Documents/Meth_Compare/analyses/Characterizing-CpG-Methylation


## *M. capitata*

In [36]:
#Make a directory for Mcap output
#!mkdir Mcap

In [4]:
cd Mcap/

/Users/yaaminivenkataraman/Documents/Meth_Compare/analyses/Characterizing-CpG-Methylation/Mcap


### 1. Characterize CG motif locations in feature tracks

#### 1a. Set variable paths

In [44]:
bedtoolsDirectory = "/usr/local/bin/"

In [35]:
mcGenes = "../../../genome-feature-files/Mcap.GFFannotation.gene.gff"

In [36]:
mcCDS = "../../../genome-feature-files/Mcap.GFFannotation.CDS.gff"

In [37]:
mcIntron = "../../../genome-feature-files/Mcap.GFFannotation.intron.gff"

In [38]:
mcCGMotifs = "../../../genome-feature-files/Mcap_CpG.gff"

#### 1b. Check variable paths

In [39]:
!head {mcGenes}
!wc -l {mcGenes}

1	AUGUSTUS	gene	18387	18755	0.97	-	.	g21532
1	AUGUSTUS	gene	22321	27293	0.23	-	.	g21533
1	AUGUSTUS	gene	37447	52266	1	+	.	g21534
1	AUGUSTUS	gene	58322	62557	1	-	.	g21535
1	AUGUSTUS	gene	64466	84798	1	+	.	g21536
1	AUGUSTUS	gene	88347	97184	1	+	.	g21537
1	AUGUSTUS	gene	100215	109729	0.99	-	.	g21538
1	AUGUSTUS	gene	109867	128510	0.89	+	.	g21539
1	AUGUSTUS	gene	132854	139285	1	-	.	g21540
1	AUGUSTUS	gene	148344	149588	0.44	+	.	g21541
   53875 ../../../genome-feature-files/Mcap.GFFannotation.gene.gff


In [40]:
!head {mcCDS}
!wc -l {mcCDS}

1	AUGUSTUS	CDS	18387	18755	0.97	-	0	transcript_id "g21532.t1"; gene_id "g21532";
1	AUGUSTUS	CDS	22321	22608	0.55	-	0	transcript_id "g21533.t1"; gene_id "g21533";
1	AUGUSTUS	CDS	26301	27293	0.29	-	0	transcript_id "g21533.t1"; gene_id "g21533";
1	AUGUSTUS	CDS	37447	37810	1	+	0	transcript_id "g21534.t1"; gene_id "g21534";
1	AUGUSTUS	CDS	45038	45208	1	+	2	transcript_id "g21534.t1"; gene_id "g21534";
1	AUGUSTUS	CDS	46625	47272	1	+	2	transcript_id "g21534.t1"; gene_id "g21534";
1	AUGUSTUS	CDS	49943	50132	1	+	2	transcript_id "g21534.t1"; gene_id "g21534";
1	AUGUSTUS	CDS	51903	52266	1	+	1	transcript_id "g21534.t1"; gene_id "g21534";
1	AUGUSTUS	CDS	58322	59506	1	-	0	transcript_id "g21535.t1"; gene_id "g21535";
1	AUGUSTUS	CDS	62261	62557	1	-	0	transcript_id "g21535.t1"; gene_id "g21535";
  224096 ../../../genome-feature-files/Mcap.GFFannotation.CDS.gff


In [41]:
!head {mcIntron}
!wc -l {mcIntron}

1	AUGUSTUS	intron	22609	26300	0.25	-	.	transcript_id "g21533.t1"; gene_id "g21533";
1	AUGUSTUS	intron	37811	45037	1	+	.	transcript_id "g21534.t1"; gene_id "g21534";
1	AUGUSTUS	intron	45209	46624	1	+	.	transcript_id "g21534.t1"; gene_id "g21534";
1	AUGUSTUS	intron	47273	49942	1	+	.	transcript_id "g21534.t1"; gene_id "g21534";
1	AUGUSTUS	intron	50133	51902	1	+	.	transcript_id "g21534.t1"; gene_id "g21534";
1	AUGUSTUS	intron	59507	62260	1	-	.	transcript_id "g21535.t1"; gene_id "g21535";
1	AUGUSTUS	intron	64578	64654	1	+	.	transcript_id "g21536.t1"; gene_id "g21536";
1	AUGUSTUS	intron	64735	67263	1	+	.	transcript_id "g21536.t1"; gene_id "g21536";
1	AUGUSTUS	intron	67319	71345	1	+	.	transcript_id "g21536.t1"; gene_id "g21536";
1	AUGUSTUS	intron	71456	72865	1	+	.	transcript_id "g21536.t1"; gene_id "g21536";
  170950 ../../../genome-feature-files/Mcap.GFFannotation.intron.gff


In [42]:
!head {mcCGMotifs}
!wc -l {mcCGMotifs}

##gff-version 2.0
##date 2020-03-29
##Type DNA 1
1	fuzznuc	misc_feature	37	38	2.000	+	.	Sequence "1.1" ; note "*pat pattern1"
1	fuzznuc	misc_feature	90	91	2.000	+	.	Sequence "1.2" ; note "*pat pattern1"
1	fuzznuc	misc_feature	121	122	2.000	+	.	Sequence "1.3" ; note "*pat pattern1"
1	fuzznuc	misc_feature	132	133	2.000	+	.	Sequence "1.4" ; note "*pat pattern1"
1	fuzznuc	misc_feature	153	154	2.000	+	.	Sequence "1.5" ; note "*pat pattern1"
1	fuzznuc	misc_feature	170	171	2.000	+	.	Sequence "1.6" ; note "*pat pattern1"
1	fuzznuc	misc_feature	220	221	2.000	+	.	Sequence "1.7" ; note "*pat pattern1"
 28684519 ../../../genome-feature-files/Mcap_CpG.gff


#### 1c. Characterize overlaps with `bedtools`

In [21]:
!{bedtoolsDirectory}intersectBed -h


Tool:    bedtools intersect (aka intersectBed)
Version: v2.29.2
Summary: Report overlaps between two feature files.

Usage:   bedtools intersect [OPTIONS] -a <bed/gff/vcf/bam> -b <bed/gff/vcf/bam>

	Note: -b may be followed with multiple databases and/or 
	wildcard (*) character(s). 
Options: 
	-wa	Write the original entry in A for each overlap.

	-wb	Write the original entry in B for each overlap.
		- Useful for knowing _what_ A overlaps. Restricted by -f and -r.

	-loj	Perform a "left outer join". That is, for each feature in A
		report each overlap with B.  If no overlaps are found, 
		report a NULL feature for B.

	-wo	Write the original A and B entries plus the number of base
		pairs of overlap between the two features.
		- Overlaps restricted by -f and -r.
		  Only A features with overlap are reported.

	-wao	Write the original A and B entries plus the number of base
		pairs of overlap between the two features.
		- Overlapping features restricted by -f 

In [45]:
!{bedtoolsDirectory}intersectBed \
-u \
-a {mcCGMotifs} \
-b {mcGenes} \
| wc -l
!echo "CG motif overlaps with genes"

 9450564
CG motif overlaps with genes


In [46]:
!{bedtoolsDirectory}intersectBed \
-u \
-a {mcCGMotifs} \
-b {mcCDS} \
| wc -l
!echo "CG motif overlaps with coding sequences (CDS)"

 1953206
CG motif overlaps with coding sequences (CDS)


In [47]:
!{bedtoolsDirectory}intersectBed \
-u \
-a {mcCGMotifs} \
-b {mcIntron} \
| wc -l
!echo "CG motif overlaps with introns"

 7503314
CG motif overlaps with introns


In [48]:
!{bedtoolsDirectory}intersectBed \
-v \
-a {mcCGMotifs} \
-b {mcGenes} \
| wc -l
!echo "CG motif overlaps that do not overlap with genes (i.e. intergenic regions)"

 19224826
CG motif overlaps that do not overlap with genes (i.e. intergenic regions)


#### 1d. Summary

| *M. capitata* Genome Feature 	| Number individual features 	| **Overlaps with CG Motifs** 	| **% Total CG Motifs** 	|
|:----------------------------------:	|:------------------------------:	|:---------------------------:	|:--------------------:	|
|                Genes               	|             458273             	|           9450564          	|         32.9         	|
|          Coding Sequences          	|             283926             	|           1953206           	|          6.8         	|
|               Introns              	|             221428             	|           7503314          	|         26.2         	|
|         Intergenic Regions         	|               N/A              	|           19224826          	|         67.0         	|

### 2. Download coverage files

In [5]:
#Download Mcap WGBS and MBD-BS 10x sample bedgraphs
!wget -r -l1 --no-parent -A "*10x.bedgraph" https://gannet.fish.washington.edu/seashell/bu-mox/scrubbed/031520-TG-bs/Mcap_tg/dedup/

--2020-04-13 10:36:29--  https://gannet.fish.washington.edu/seashell/bu-mox/scrubbed/031520-TG-bs/Mcap_tg/dedup/
Resolving gannet.fish.washington.edu... 128.95.149.52
Connecting to gannet.fish.washington.edu|128.95.149.52|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘gannet.fish.washington.edu/seashell/bu-mox/scrubbed/031520-TG-bs/Mcap_tg/dedup/index.html.tmp’

gannet.fish.washing     [ <=>                ]  42.27K  --.-KB/s    in 0.09s   

2020-04-13 10:36:31 (470 KB/s) - ‘gannet.fish.washington.edu/seashell/bu-mox/scrubbed/031520-TG-bs/Mcap_tg/dedup/index.html.tmp’ saved [43285]

Loading robots.txt; please ignore errors.
--2020-04-13 10:36:31--  https://gannet.fish.washington.edu/robots.txt
Reusing existing connection to gannet.fish.washington.edu:443.
HTTP request sent, awaiting response... 404 Not Found
2020-04-13 10:36:31 ERROR 404: Not Found.

Removing gannet.fish.washington.edu/seashell/bu-mox/scrubbed/031520-TG-bs/

In [7]:
#Move samples from directory structure on gannet to cd
!mv gannet.fish.washington.edu/seashell/bu-mox/scrubbed/031520-TG-bs/Mcap_tg/dedup/* .

In [8]:
#Remove empty directory
!rm -r gannet.fish.washington.edu/

In [11]:
#Check downloaded files
!ls *bedgraph

Meth10_R1_001_val_1_bismark_bt2_pe._10x.bedgraph
Meth11_R1_001_val_1_bismark_bt2_pe._10x.bedgraph
Meth12_R1_001_val_1_bismark_bt2_pe._10x.bedgraph
Meth16_R1_001_val_1_bismark_bt2_pe._10x.bedgraph
Meth17_R1_001_val_1_bismark_bt2_pe._10x.bedgraph
Meth18_R1_001_val_1_bismark_bt2_pe._10x.bedgraph


In [12]:
#Download Mcap RRBS 10x sample bedgraphs
!wget -r -l1 --no-parent -A "*10x.bedgraph" https://gannet.fish.washington.edu/seashell/bu-mox/scrubbed/031520-TG-bs/Mcap_tg/nodedup/

--2020-04-13 10:39:10--  https://gannet.fish.washington.edu/seashell/bu-mox/scrubbed/031520-TG-bs/Mcap_tg/nodedup/
Resolving gannet.fish.washington.edu... 128.95.149.52
Connecting to gannet.fish.washington.edu|128.95.149.52|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘gannet.fish.washington.edu/seashell/bu-mox/scrubbed/031520-TG-bs/Mcap_tg/nodedup/index.html.tmp’

gannet.fish.washing     [ <=>                ]  19.31K  --.-KB/s    in 0.04s   

2020-04-13 10:39:11 (470 KB/s) - ‘gannet.fish.washington.edu/seashell/bu-mox/scrubbed/031520-TG-bs/Mcap_tg/nodedup/index.html.tmp’ saved [19778]

Loading robots.txt; please ignore errors.
--2020-04-13 10:39:11--  https://gannet.fish.washington.edu/robots.txt
Reusing existing connection to gannet.fish.washington.edu:443.
HTTP request sent, awaiting response... 404 Not Found
2020-04-13 10:39:11 ERROR 404: Not Found.

Removing gannet.fish.washington.edu/seashell/bu-mox/scrubbed/031520-

In [13]:
#Move samples from directory structure on gannet to cd
!mv gannet.fish.washington.edu/seashell/bu-mox/scrubbed/031520-TG-bs/Mcap_tg/nodedup//* .

In [14]:
#Remove empty directory
!rm -r gannet.fish.washington.edu/

In [15]:
!find *bedgraph

Meth10_R1_001_val_1_bismark_bt2_pe._10x.bedgraph
Meth11_R1_001_val_1_bismark_bt2_pe._10x.bedgraph
Meth12_R1_001_val_1_bismark_bt2_pe._10x.bedgraph
Meth13_R1_001_val_1_bismark_bt2_pe._10x.bedgraph
Meth14_R1_001_val_1_bismark_bt2_pe._10x.bedgraph
Meth15_R1_001_val_1_bismark_bt2_pe._10x.bedgraph
Meth16_R1_001_val_1_bismark_bt2_pe._10x.bedgraph
Meth17_R1_001_val_1_bismark_bt2_pe._10x.bedgraph
Meth18_R1_001_val_1_bismark_bt2_pe._10x.bedgraph


In [16]:
!wc -l *bedgraph

  470893 Meth10_R1_001_val_1_bismark_bt2_pe._10x.bedgraph
  479520 Meth11_R1_001_val_1_bismark_bt2_pe._10x.bedgraph
 1756997 Meth12_R1_001_val_1_bismark_bt2_pe._10x.bedgraph
 2945967 Meth13_R1_001_val_1_bismark_bt2_pe._10x.bedgraph
 2310457 Meth14_R1_001_val_1_bismark_bt2_pe._10x.bedgraph
 2874355 Meth15_R1_001_val_1_bismark_bt2_pe._10x.bedgraph
   44091 Meth16_R1_001_val_1_bismark_bt2_pe._10x.bedgraph
   21797 Meth17_R1_001_val_1_bismark_bt2_pe._10x.bedgraph
   14818 Meth18_R1_001_val_1_bismark_bt2_pe._10x.bedgraph
 10918895 total


### 3. Characterize methylation for each CpG dinucleotide

- Methylated: > 50% methylation
- Sparsely methylated: 10-50% methylation
- Unmethylated: < 10% methylation

##### Methylated loci

In [17]:
%%bash
for f in *bedgraph
do
    awk '{if ($4 >= 50) { print $1, $2, $3, $4 }}' ${f} \
    > ${f}-Meth
done

In [18]:
!head *Meth

==> Meth10_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth <==
1 448144 448146 76.923077
1 789544 789546 50.000000
1 789590 789592 50.000000
1 876587 876589 100.000000
1 876606 876608 63.636364
1 1267992 1267994 100.000000
1 1269495 1269497 100.000000
1 1269508 1269510 83.333333
1 1269532 1269534 100.000000
1 1373604 1373606 78.571429

==> Meth11_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth <==
1 448144 448146 63.636364
1 450663 450665 83.333333
1 450675 450677 81.818182
1 450703 450705 80.000000
1 744619 744621 54.545455
1 1264052 1264054 100.000000
1 1269495 1269497 90.000000
1 1269508 1269510 80.000000
1 1269532 1269534 100.000000
1 1273421 1273423 90.909091

==> Meth12_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth <==
1 69151 69153 90.909091
1 69235 69237 80.000000
1 69580 69582 100.000000
1 69584 69586 100.000000
1 69845 69847 100.000000
1 69882 69884 100.000000
1 70019 70021 91.666667
1 70068 70070 100.000000
1 70083 70085 100.000000
1 70129 70

In [19]:
!wc -l *Meth

   35488 Meth10_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth
   49640 Meth11_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth
  173085 Meth12_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth
  204797 Meth13_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth
  136505 Meth14_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth
  172957 Meth15_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth
   16168 Meth16_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth
    6080 Meth17_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth
    4837 Meth18_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth
  799557 total


##### Sparsely methylated loci

In [20]:
%%bash
for f in *bedgraph
do
    awk '{if ($4 < 50) { print $1, $2, $3, $4}}' ${f} \
    | awk '{if ($4 > 10) { print $1, $2, $3, $4 }}' \
    > ${f}-sparseMeth
done

In [21]:
!head *sparseMeth

==> Meth10_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth <==
1 405936 405938 18.181818
1 437346 437348 18.181818
1 437349 437351 18.181818
1 460307 460309 15.384615
1 463857 463859 18.181818
1 527701 527703 15.384615
1 527706 527708 15.384615
1 527726 527728 15.384615
1 527739 527741 13.333333
1 601991 601993 20.000000

==> Meth11_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth <==
1 219905 219907 15.384615
1 230777 230779 18.181818
1 272290 272292 20.000000
1 411793 411795 20.000000
1 425364 425366 16.666667
1 460884 460886 18.181818
1 460920 460922 23.529412
1 460947 460949 20.000000
1 460953 460955 13.333333
1 461445 461447 16.666667

==> Meth12_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth <==
1 79955 79957 16.666667
1 205989 205991 15.789474
1 216074 216076 15.384615
1 243227 243229 18.181818
1 273081 273083 25.000000
1 356393 356395 20.000000
1 380405 380407 28.571429
1 380486 380488 25.000000
1 382279 382281 16.666667
1 382287

In [22]:
!wc -l *sparseMeth

   43405 Meth10_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth
   42459 Meth11_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth
  127160 Meth12_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth
  129206 Meth13_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth
  107375 Meth14_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth
  145976 Meth15_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth
    5283 Meth16_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth
    2282 Meth17_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth
    1433 Meth18_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth
  604579 total


##### Unmethylated loci

In [23]:
%%bash
for f in *bedgraph
do
    awk '{if ($4 <= 10) { print $1, $2, $3, $4 }}' ${f} \
    > ${f}-unMeth
done

In [24]:
!head *unMeth

==> Meth10_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth <==
1 208521 208523 0.000000
1 208524 208526 0.000000
1 212911 212913 0.000000
1 213160 213162 0.000000
1 213207 213209 0.000000
1 213247 213249 0.000000
1 213252 213254 0.000000
1 217269 217271 0.000000
1 217349 217351 10.000000
1 238052 238054 0.000000

==> Meth11_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth <==
1 6453 6455 0.000000
1 6484 6486 0.000000
1 221750 221752 10.000000
1 223373 223375 0.000000
1 229852 229854 0.000000
1 230363 230365 0.000000
1 230742 230744 0.000000
1 230854 230856 0.000000
1 232150 232152 0.000000
1 233751 233753 0.000000

==> Meth12_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth <==
1 5243 5245 10.000000
1 5296 5298 0.000000
1 5332 5334 0.000000
1 5343 5345 0.000000
1 5368 5370 8.333333
1 5665 5667 0.000000
1 5737 5739 0.000000
1 5749 5751 0.000000
1 5751 5753 0.000000
1 6320 6322 0.000000

==> Meth13_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth <==

In [25]:
!wc -l *unMeth

  392000 Meth10_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth
  387421 Meth11_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth
 1456752 Meth12_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth
 2611964 Meth13_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth
 2066577 Meth14_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth
 2555422 Meth15_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth
   22640 Meth16_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth
   13435 Meth17_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth
    8548 Meth18_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth
 9514759 total


##### Summary

| **Sample** 	| **Method** 	| **CpG with Data** 	| **Methylated CpG** 	| **Sparsely Methylated CpG** 	| **Unmethylated CpG** 	|
|:----------:	|:----------:	|:-----------------:	|:------------------:	|:---------------------------:	|:--------------------:	|
|     10     	|    WGBS    	|        470893       	|         35488 (7.5%)        	|             43405 (9.2%)            	|         392000 (83.2%)        	|
|     11     	|    WGBS    	|       479520       	|         49640 (10.3%)       	|             42459 (8.9%)           	|         387421 (80.8%)        	|
|     12     	|    WGBS    	|        1756997       	|         173085 (9.9%)       	|             127160 (7.2%)           	|         1456752 (82.9%)         	|
|     13     	|    RRBS    	|      2945967      	|       204797 (7.0%)       	|            129206 (4.4%)           	|        2611964 (88.7%)      	|
|     14     	|    RRBS    	|      2310457      	|       136505 (5.9%)       	|            107375 (4.6%)          	|        2066577 (89.4%)       	|
|     15     	|    RRBS    	|      2874355      	|       172957 (6.0%)       	|            145976 (5.1%)          	|        2555422 (88.9%)       	|
|     16     	|  MBD-BSSeq 	|        44091       	|        16168 (36.7%)        	|             5283 (12.0%)             	|         22640 (51.3%)        	|
|     17     	|  MBD-BSSeq 	|        21797       	|         6080 (27.9%)        	|             2282 (10.5%)            	|         13435 (61.6%)        	|
|     18     	|  MBD-BSSeq 	|        14818       	|         4837 (32.6%)       	|             1433 (9.7%)            	|         8548 (57.7%)        	|

### 4. Characterize genomic locations of CpGs

#### 4a. Create BEDfiles

In [27]:
%%bash

for f in *bedgraph*
do
    awk '{print $1"\t"$2"\t"$3}' ${f} > ${f}.bed
    wc -l ${f}.bed
done

  470893 Meth10_R1_001_val_1_bismark_bt2_pe._10x.bedgraph.bed
   35488 Meth10_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth.bed
   43405 Meth10_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth.bed
  392000 Meth10_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth.bed
  479520 Meth11_R1_001_val_1_bismark_bt2_pe._10x.bedgraph.bed
   49640 Meth11_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth.bed
   42459 Meth11_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth.bed
  387421 Meth11_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth.bed
 1756997 Meth12_R1_001_val_1_bismark_bt2_pe._10x.bedgraph.bed
  173085 Meth12_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth.bed
  127160 Meth12_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth.bed
 1456752 Meth12_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth.bed
 2945967 Meth13_R1_001_val_1_bismark_bt2_pe._10x.bedgraph.bed
  204797 Meth13_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth.bed
  129206 Meth13_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-

In [28]:
#Confirm BEDfile creation
!find *.bed

Meth10_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth.bed
Meth10_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth.bed
Meth10_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth.bed
Meth10_R1_001_val_1_bismark_bt2_pe._10x.bedgraph.bed
Meth11_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth.bed
Meth11_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth.bed
Meth11_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth.bed
Meth11_R1_001_val_1_bismark_bt2_pe._10x.bedgraph.bed
Meth12_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth.bed
Meth12_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth.bed
Meth12_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth.bed
Meth12_R1_001_val_1_bismark_bt2_pe._10x.bedgraph.bed
Meth13_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth.bed
Meth13_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth.bed
Meth13_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth.bed
Meth13_R1_001_val_1_bismark_bt2_pe._10x.bedgraph.bed
Meth14_R1_001_val_1_bismark_bt2_pe._10x.bedg

In [30]:
#Confirm file creation
!head Meth10_R1_001_val_1_bismark_bt2_pe._10x.bedgraph.bed

1	208521	208523
1	208524	208526
1	212911	212913
1	213160	213162
1	213207	213209
1	213247	213249
1	213252	213254
1	217269	217271
1	217349	217351
1	238052	238054


#### 4b. Genes

In [49]:
%%bash

for f in *bed
do
  /usr/local/bin/intersectBed \
  -wb \
  -a ${f} \
  -b ../../../genome-feature-files/Mcap.GFFannotation.gene.gff \
  > ${f}-mcGenes
done

In [50]:
#Check output
!head *mcGenes

==> Meth10_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth.bed-mcGenes <==
1	789544	789546	1	AUGUSTUS	gene	789380	790334	0.68	-	.	g21600
1	789590	789592	1	AUGUSTUS	gene	789380	790334	0.68	-	.	g21600
1	876587	876589	1	AUGUSTUS	gene	875739	924977	0.37	+	.	g21603
1	876606	876608	1	AUGUSTUS	gene	875739	924977	0.37	+	.	g21603
1	1267992	1267994	1	AUGUSTUS	gene	1234405	1276748	0.95	-	.	g21628
1	1269495	1269497	1	AUGUSTUS	gene	1234405	1276748	0.95	-	.	g21628
1	1269508	1269510	1	AUGUSTUS	gene	1234405	1276748	0.95	-	.	g21628
1	1269532	1269534	1	AUGUSTUS	gene	1234405	1276748	0.95	-	.	g21628
1	1373604	1373606	1	AUGUSTUS	gene	1361058	1381745	1	-	.	g21633
1	1387604	1387606	1	AUGUSTUS	gene	1385571	1392430	1	+	.	g21634

==> Meth10_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth.bed-mcGenes <==
1	437346	437348	1	AUGUSTUS	gene	435136	440238	0.92	+	.	g21564
1	437349	437351	1	AUGUSTUS	gene	435136	440238	0.92	+	.	g21564
1	527701	527703	1	AUGUSTUS	gene	524298	537113	1	-	.	g21573
1	527706	527708	1	AUGUST

6	2130399	2130401	6	AUGUSTUS	gene	2129874	2130850	0.47	-	.	g22813
6	2130495	2130497	6	AUGUSTUS	gene	2129874	2130850	0.47	-	.	g22813
6	2130504	2130506	6	AUGUSTUS	gene	2129874	2130850	0.47	-	.	g22813
7	1957296	1957298	7	AUGUSTUS	gene	1948122	1976462	0.84	-	.	g23037
7	1957322	1957324	7	AUGUSTUS	gene	1948122	1976462	0.84	-	.	g23037
9	23708	23710	9	AUGUSTUS	gene	23704	24276	1	+	.	g23074
9	23768	23770	9	AUGUSTUS	gene	23704	24276	1	+	.	g23074
9	23821	23823	9	AUGUSTUS	gene	23704	24276	1	+	.	g23074

==> Meth17_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth.bed-mcGenes <==
1	832598	832600	1	AUGUSTUS	gene	820948	851855	0.36	-	.	g21601
1	2608371	2608373	1	AUGUSTUS	gene	2548946	2617412	0.39	+	.	g21721
4	811590	811592	4	AUGUSTUS	gene	784630	817728	1	-	.	g22330
4	1192766	1192768	4	AUGUSTUS	gene	1187854	1193900	0.92	+	.	g22353
4	2291584	2291586	4	AUGUSTUS	gene	2287083	2295982	0.49	+	.	g22430
4	2291603	2291605	4	AUGUSTUS	gene	2287083	2295982	0.49	+	.	g22430
4	2291633	2291635	4	AUGUSTU

In [51]:
#Count number of overlaps
!wc -l *mcGenes

   17007 Meth10_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth.bed-mcGenes
   15647 Meth10_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth.bed-mcGenes
  127448 Meth10_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth.bed-mcGenes
  160102 Meth10_R1_001_val_1_bismark_bt2_pe._10x.bedgraph.bed-mcGenes
   23163 Meth11_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth.bed-mcGenes
   15042 Meth11_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth.bed-mcGenes
  128305 Meth11_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth.bed-mcGenes
  166510 Meth11_R1_001_val_1_bismark_bt2_pe._10x.bedgraph.bed-mcGenes
   82744 Meth12_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth.bed-mcGenes
   44107 Meth12_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth.bed-mcGenes
  507401 Meth12_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth.bed-mcGenes
  634252 Meth12_R1_001_val_1_bismark_bt2_pe._10x.bedgraph.bed-mcGenes
   99753 Meth13_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth.bed-mcGenes
   46167 Meth13_

#### 4c. Coding Sequences (CDS)

In [52]:
%%bash

for f in *bed
do
  /usr/local/bin/intersectBed \
  -wb \
  -a ${f} \
  -b ../../../genome-feature-files/Mcap.GFFannotation.CDS.gff \
  > ${f}-mcCDS
done

In [53]:
#Check output
!head *mcCDS

==> Meth10_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth.bed-mcCDS <==
1	789544	789546	1	AUGUSTUS	CDS	789380	789726	0.68	-	2	transcript_id "g21600.t1"; gene_id "g21600";
1	789590	789592	1	AUGUSTUS	CDS	789380	789726	0.68	-	2	transcript_id "g21600.t1"; gene_id "g21600";
1	1390109	1390111	1	AUGUSTUS	CDS	1390067	1390192	1	+	0	transcript_id "g21634.t1"; gene_id "g21634";
1	1409734	1409736	1	AUGUSTUS	CDS	1409580	1409782	1	-	2	transcript_id "g21635.t1"; gene_id "g21635";
1	1409778	1409780	1	AUGUSTUS	CDS	1409580	1409782	1	-	2	transcript_id "g21635.t1"; gene_id "g21635";
1	1425319	1425321	1	AUGUSTUS	CDS	1425203	1425363	1	+	2	transcript_id "g21636.t1"; gene_id "g21636";
1	1556613	1556615	1	AUGUSTUS	CDS	1556557	1556755	1	+	0	transcript_id "g21645.t1"; gene_id "g21645";
1	1835566	1835568	1	AUGUSTUS	CDS	1835411	1835679	1	+	2	transcript_id "g21667.t1"; gene_id "g21667";
1	1851774	1851776	1	AUGUSTUS	CDS	1851564	1852055	1	-	0	transcript_id "g21670.t1"; gene_id "g21670";
1	1851897	1851899	1	

In [54]:
#Count number of overlaps
!wc -l *mcCDS

    5918 Meth10_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth.bed-mcCDS
    7652 Meth10_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth.bed-mcCDS
   50048 Meth10_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth.bed-mcCDS
   63618 Meth10_R1_001_val_1_bismark_bt2_pe._10x.bedgraph.bed-mcCDS
    8085 Meth11_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth.bed-mcCDS
    7257 Meth11_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth.bed-mcCDS
   51737 Meth11_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth.bed-mcCDS
   67079 Meth11_R1_001_val_1_bismark_bt2_pe._10x.bedgraph.bed-mcCDS
   25530 Meth12_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth.bed-mcCDS
   16812 Meth12_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth.bed-mcCDS
  166759 Meth12_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth.bed-mcCDS
  209101 Meth12_R1_001_val_1_bismark_bt2_pe._10x.bedgraph.bed-mcCDS
   14708 Meth13_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth.bed-mcCDS
   10382 Meth13_R1_001_val_1_bismark_bt2_p

#### 4d. Introns

In [55]:
%%bash

for f in *bed
do
  /usr/local/bin/intersectBed \
  -wb \
  -a ${f} \
  -b ../../../genome-feature-files/Mcap.GFFannotation.intron.gff \
  > ${f}-mcIntrons
done

In [56]:
#Check output
!head *mcIntrons

==> Meth10_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth.bed-mcIntrons <==
1	876587	876589	1	AUGUSTUS	intron	875789	877365	1	+	.	transcript_id "g21603.t1"; gene_id "g21603";
1	876606	876608	1	AUGUSTUS	intron	875789	877365	1	+	.	transcript_id "g21603.t1"; gene_id "g21603";
1	1267992	1267994	1	AUGUSTUS	intron	1266365	1270850	1	-	.	transcript_id "g21628.t1"; gene_id "g21628";
1	1269495	1269497	1	AUGUSTUS	intron	1266365	1270850	1	-	.	transcript_id "g21628.t1"; gene_id "g21628";
1	1269508	1269510	1	AUGUSTUS	intron	1266365	1270850	1	-	.	transcript_id "g21628.t1"; gene_id "g21628";
1	1269532	1269534	1	AUGUSTUS	intron	1266365	1270850	1	-	.	transcript_id "g21628.t1"; gene_id "g21628";
1	1373604	1373606	1	AUGUSTUS	intron	1372665	1373781	1	-	.	transcript_id "g21633.t1"; gene_id "g21633";
1	1387604	1387606	1	AUGUSTUS	intron	1385589	1390066	1	+	.	transcript_id "g21634.t1"; gene_id "g21634";
1	1390056	1390058	1	AUGUSTUS	intron	1385589	1390066	1	+	.	transcript_id "g21634.t1"; gene_id "g2163

In [57]:
#Count number of overlaps
!wc -l *mcIntrons

   11106 Meth10_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth.bed-mcIntrons
    8003 Meth10_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth.bed-mcIntrons
   77481 Meth10_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth.bed-mcIntrons
   96590 Meth10_R1_001_val_1_bismark_bt2_pe._10x.bedgraph.bed-mcIntrons
   15088 Meth11_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth.bed-mcIntrons
    7793 Meth11_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth.bed-mcIntrons
   76641 Meth11_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth.bed-mcIntrons
   99522 Meth11_R1_001_val_1_bismark_bt2_pe._10x.bedgraph.bed-mcIntrons
   57278 Meth12_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth.bed-mcIntrons
   27323 Meth12_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth.bed-mcIntrons
  340957 Meth12_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth.bed-mcIntrons
  425558 Meth12_R1_001_val_1_bismark_bt2_pe._10x.bedgraph.bed-mcIntrons
   85069 Meth13_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth.bed-

#### 4e. Intergenic

In [43]:
%%bash 

for f in *Mcap*bed
do
  /usr/local/bin/intersectBed \
  -v \
  -a ${f} \
  -b ../../../genome-feature-files/Mcap.GFFannotation.gene.gff \
  > ${f}-mcIntergenic
done

In [44]:
#Check output
!head *mcIntergenic

==> Meth10.fastp-trim.202003063900_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-Meth.bed-mcIntergenic <==
16	1095928	1095930
16	1095987	1095989
16	1095993	1095995
16	1937017	1937019
26	1550319	1550321
26	1550325	1550327
26	1550327	1550329
28	407411	407413
28	407920	407922
29	601463	601465

==> Meth10.fastp-trim.202003063900_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-sparseMeth.bed-mcIntergenic <==
4	2282559	2282561
4	2653035	2653037
10	1222877	1222879
10	1239917	1239919
10	1240019	1240021
13	2124876	2124878
14	495499	495501
16	593911	593913
16	1095907	1095909
16	1095913	1095915

==> Meth10.fastp-trim.202003063900_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-unMeth.bed-mcIntergenic <==
2	1779837	1779839
2	1779853	1779855
2	1779898	1779900
2	1804233	1804235
2	1804297	1804299
2	1804303	1804305
2	1804306	1804308
2	1804322	1804324
2	1804347	1804349
2	1804359	1804361

==> Meth10.fastp-trim.202003063900_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap.bed-mcIntergenic <==
2	1779837	1779839
2	1779853	1779855
2	

In [45]:
#Count number of overlaps
!wc -l *mcIntergenic

     209 Meth10.fastp-trim.202003063900_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-Meth.bed-mcIntergenic
     977 Meth10.fastp-trim.202003063900_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-sparseMeth.bed-mcIntergenic
    6386 Meth10.fastp-trim.202003063900_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-unMeth.bed-mcIntergenic
    7572 Meth10.fastp-trim.202003063900_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap.bed-mcIntergenic
     369 Meth11.fastp-trim.202003065734_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-Meth.bed-mcIntergenic
    1198 Meth11.fastp-trim.202003065734_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-sparseMeth.bed-mcIntergenic
    7553 Meth11.fastp-trim.202003065734_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-unMeth.bed-mcIntergenic
    9120 Meth11.fastp-trim.202003065734_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap.bed-mcIntergenic
     220 Meth12.fastp-trim.202003060645_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-Meth.bed-mcIntergenic
     941 Meth12.fastp-trim.202003060645_R1_001_bismark_bt2_pe._10x.

#### Summary

##### Overlaps with Genes

| **Sample** 	| **Method** 	| **CpG with Data** 	| **Methylated CpG** 	| **Sparsely Methylated CpG** 	| **Unmethylated CpG** 	|
|------------	|------------	|-------------------	|--------------------	|-----------------------------	|----------------------	|
|     10     	|    WGBS    	|        160102       	|         17007 (10.6%)        	|             15647 (9.8%)             	|         127448 (79.6%)         	|
|     11     	|    WGBS    	|        166510       	|         23163 (13.9%)        	|             15042 (9.0%)             	|         128305 (77.1%)         	|
|     12     	|    WGBS    	|        634252        	|         82744 (13.0%)       	|             44107 (7.0%)            	|         507401 (80.0%)        	|
|     13     	|    RRBS    	|      988858      	|       99753 (10.1%)      	|            46167 (4.7%)          	|        842938 (85.2%)       	|
|     14     	|    RRBS    	|       780718      	|       67014 (8.6%)      	|            39102 (5.0%)          	|        674602 (86.4%)       	|
|     15     	|    RRBS    	|       964930      	|       86291 (8.9%)      	|             53429 (5.5%)          	|        825210 (85.5%)       	|
|     16     	|  MBD-BSSeq 	|        11499       	|        6390 (55.6%)       	|             1215 (10.6%)            	|          3894 (33.9%)        	|
|     17     	|  MBD-BSSeq 	|        5127       	|         2373 (46.3%)       	|              511 (10.0%)            	|          2243 (43.7%)        	|
|     18     	|  MBD-BSSeq 	|        3278       	|         2018 (61.6%)       	|              165 (5.0%)            	|          1095 (33.4%)        	|

##### Overlaps with Coding Sequences (CDS)

| **Sample** 	| **Method** 	| **CpG with Data** 	| **Methylated CpG** 	| **Sparsely Methylated CpG** 	| **Unmethylated CpG** 	|
|------------	|------------	|-------------------	|--------------------	|-----------------------------	|----------------------	|
|     10     	|    WGBS    	|        63618       	|         5918 (9.3%)        	|             7652 (12.0%)             	|         50048 (78.7%)         	|
|     11     	|    WGBS    	|        67079       	|         8085 (12.1%)        	|             7257 (10.8%)             	|         51737 (77.1%)         	|
|     12     	|    WGBS    	|        209101        	|         25530 (12.2%)       	|             16812 (8.0%)            	|         166759 (79.8%)        	|
|     13     	|    RRBS    	|      190252      	|       14708 (7.7%)      	|            10382 (5.5%)          	|        165162 (86.8%)       	|
|     14     	|    RRBS    	|       151699      	|       9279 (6.1%)      	|            8654 (5.7%)          	|        133766 (8.8%)       	|
|     15     	|    RRBS    	|       186877      	|       11132 (6.0%)      	|             12318 (6.6%)          	|        163427 (87.5%)       	|
|     16     	|  MBD-BSSeq 	|        4664       	|        2046 (43.9%)       	|             623 (13.4%)            	|          1995 (42.8%)        	|
|     17     	|  MBD-BSSeq 	|        2307       	|         758 (32.9%)       	|              220 (9.5%)            	|          1329 (57.6%)        	|
|     18     	|  MBD-BSSeq 	|        1282       	|         750 (58.5)       	|              93 (7.3%)            	|          439 (34.2%)        	|

##### Overalps with Introns

| **Sample** 	| **Method** 	| **CpG with Data** 	| **Methylated CpG** 	| **Sparsely Methylated CpG** 	| **Unmethylated CpG** 	|
|------------	|------------	|-------------------	|--------------------	|-----------------------------	|----------------------	|
|     10     	|    WGBS    	|        96590       	|         11106 (11.5%)        	|             8003 (8.3%)             	|         77481 (80.2%)         	|
|     11     	|    WGBS    	|        99522       	|         15088 (15.2%)        	|             7793 (7.8%)             	|         76641 (77.0%)         	|
|     12     	|    WGBS    	|        425558        	|         57278 (13.5%)       	|             27323 (6.4%)            	|         340957 (80.1%)        	|
|     13     	|    RRBS    	|      798984      	|       85069 (10.6%)      	|            35795 (4.5%)          	|        678120 (84.9%)       	|
|     14     	|    RRBS    	|       629356      	|       57748 (9.2%)      	|            30463 (4.8%)          	|        541145 (86.0%)       	|
|     15     	|    RRBS    	|       778492      	|       75177 (9.7%)      	|             41135 (5.3%)          	|        662180 (85.1%)       	|
|     16     	|  MBD-BSSeq 	|        6841       	|        4347 (63.5%)       	|             595 (8.7%)            	|          1899 (27.8%)        	|
|     17     	|  MBD-BSSeq 	|        2824       	|         1618 (57.3%)       	|              291 (10.3%)            	|          915 (32.4%)        	|
|     18     	|  MBD-BSSeq 	|        2000       	|         1270 (63.5%)       	|              73 (3.7%)            	|          657 (32.9%)        	|

##### Overlaps with Integenic regions

| **Sample** 	| **Method** 	| **CpG with Data** 	| **Methylated CpG** 	| **Sparsely Methylated CpG** 	| **Unmethylated CpG** 	|
|------------	|------------	|-------------------	|--------------------	|-----------------------------	|----------------------	|
|     10     	|    WGBS    	|        7572       	|         209 (%)        	|             977 (%)             	|         6386 (%)         	|
|     11     	|    WGBS    	|        9120       	|         369 (%)        	|             1198 (%)             	|         7553 (%)         	|
|     12     	|    WGBS    	|        7812        	|         220 (%)       	|             941 (%)            	|         6651 (%)        	|
|     13     	|    RRBS    	|      797611      	|       90640 (%)      	|            78930 (%)          	|        628041 (%)       	|
|     14     	|    RRBS    	|       663746      	|       80696 (%)      	|            72762 (%)          	|        510288 (%)       	|
|     15     	|    RRBS    	|       702827      	|       85205 (%)      	|             76752 (%)          	|        540870 (%)       	|
|     16     	|  MBD-BSSeq 	|        5785       	|        1977 (%)       	|             739 (%)            	|          3069 (%)        	|
|     17     	|  MBD-BSSeq 	|        3914       	|         657 (%)       	|              596 (%)            	|          2661 (%)        	|
|     18     	|  MBD-BSSeq 	|        3131       	|         652 (%)       	|              497 (%)            	|          1982 (%)        	|

## *P. acuta*

In [6]:
cd ..

/Users/yaaminivenkataraman/Documents/Meth_Compare/analyses/Characterizing-CpG-Methylation


In [65]:
#Make a directory for Pact output
#!mkdir Pact

In [7]:
cd Pact/

/Users/yaaminivenkataraman/Documents/Meth_Compare/analyses/Characterizing-CpG-Methylation/Pact


### 1. Characterize CG motif locations in feature tracks

#### 1a. Set variable paths

In [11]:
paGenes = "../../../genome-feature-files/Pact.GFFannotation.Genes.gff"

In [9]:
paCDS = "../../../genome-feature-files/Pact.GFFannotation.CDS.gff"

In [11]:
paIntron = "../../../genome-feature-files/Pact.GFFannotation.Intron.gff"

In [12]:
paCGMotifs = "../../../genome-feature-files/Pact_CpG.gff"

#### 1b. Check variable paths

In [12]:
!head {paGenes}
!wc -l {paGenes}

scaffold6_cov64	AUGUSTUS	gene	1	5652	0.46	-	.	g1
scaffold6_cov64	AUGUSTUS	gene	5805	6678	0.57	+	.	g2
scaffold7_cov100	AUGUSTUS	gene	1	2566	0.96	+	.	g3
scaffold7_cov100	AUGUSTUS	gene	3467	6217	0.78	-	.	g4
scaffold7_cov100	AUGUSTUS	gene	7069	9073	1	-	.	g5
scaffold7_cov100	AUGUSTUS	gene	9590	11670	0.8	-	.	g6
scaffold7_cov100	AUGUSTUS	gene	13339	15463	0.92	-	.	g7
scaffold7_cov100	AUGUSTUS	gene	15738	18320	0.96	+	.	g8
scaffold7_cov100	AUGUSTUS	gene	18586	19270	0.99	-	.	g9
scaffold7_cov100	AUGUSTUS	gene	19312	20050	0.74	+	.	g10
   64558 ../../../genome-feature-files/Pact.GFFannotation.Genes.gff


In [18]:
!head {paCDS}
!wc -l {paCDS}

scaffold6_cov64	AUGUSTUS	CDS	495	842	0.84	-	2	transcript_id "g1.t1"; gene_id "g1";
scaffold6_cov64	AUGUSTUS	CDS	1208	1555	0.92	-	2	transcript_id "g1.t1"; gene_id "g1";
scaffold6_cov64	AUGUSTUS	CDS	1922	2269	1	-	2	transcript_id "g1.t1"; gene_id "g1";
scaffold6_cov64	AUGUSTUS	CDS	5583	5652	0.26	-	0	transcript_id "g1.t1"; gene_id "g1";
scaffold6_cov64	AUGUSTUS	CDS	495	842	0.84	-	2	transcript_id "g1.t2"; gene_id "g1";
scaffold6_cov64	AUGUSTUS	CDS	1208	1555	0.92	-	2	transcript_id "g1.t2"; gene_id "g1";
scaffold6_cov64	AUGUSTUS	CDS	1922	2269	1	-	2	transcript_id "g1.t2"; gene_id "g1";
scaffold6_cov64	AUGUSTUS	CDS	4754	4851	0.4	-	1	transcript_id "g1.t2"; gene_id "g1";
scaffold6_cov64	AUGUSTUS	CDS	5594	5652	0.54	-	0	transcript_id "g1.t2"; gene_id "g1";
scaffold6_cov64	AUGUSTUS	CDS	5805	5838	0.98	+	0	transcript_id "g2.t1"; gene_id "g2";
  318484 ../genome-feature-files/Pact.GFFannotation.CDS.gff


In [19]:
!head {paIntron}
!wc -l {paIntron}

scaffold6_cov64	AUGUSTUS	intron	1	494	0.82	-	.	transcript_id "g1.t1"; gene_id "g1";
scaffold6_cov64	AUGUSTUS	intron	843	1207	0.92	-	.	transcript_id "g1.t1"; gene_id "g1";
scaffold6_cov64	AUGUSTUS	intron	1556	1921	1	-	.	transcript_id "g1.t1"; gene_id "g1";
scaffold6_cov64	AUGUSTUS	intron	2270	5582	0.23	-	.	transcript_id "g1.t1"; gene_id "g1";
scaffold6_cov64	AUGUSTUS	intron	1	494	0.82	-	.	transcript_id "g1.t2"; gene_id "g1";
scaffold6_cov64	AUGUSTUS	intron	843	1207	0.92	-	.	transcript_id "g1.t2"; gene_id "g1";
scaffold6_cov64	AUGUSTUS	intron	1556	1921	1	-	.	transcript_id "g1.t2"; gene_id "g1";
scaffold6_cov64	AUGUSTUS	intron	2270	4753	0.4	-	.	transcript_id "g1.t2"; gene_id "g1";
scaffold6_cov64	AUGUSTUS	intron	4852	5593	0.48	-	.	transcript_id "g1.t2"; gene_id "g1";
scaffold6_cov64	AUGUSTUS	intron	5839	5945	0.54	+	.	transcript_id "g2.t1"; gene_id "g2";
  241534 ../genome-feature-files/Pact.GFFannotation.Intron.gff


In [20]:
!head {paCGMotifs}
!wc -l {paCGMotifs}

##gff-version 2.0
##date 2020-03-29
##Type DNA scaffold1_cov55
scaffold1_cov55	fuzznuc	misc_feature	23	24	2.000	+	.	Sequence "scaffold1_cov55.1" ; note "*pat pattern1"
scaffold1_cov55	fuzznuc	misc_feature	35	36	2.000	+	.	Sequence "scaffold1_cov55.2" ; note "*pat pattern1"
scaffold1_cov55	fuzznuc	misc_feature	50	51	2.000	+	.	Sequence "scaffold1_cov55.3" ; note "*pat pattern1"
scaffold1_cov55	fuzznuc	misc_feature	85	86	2.000	+	.	Sequence "scaffold1_cov55.4" ; note "*pat pattern1"
scaffold1_cov55	fuzznuc	misc_feature	93	94	2.000	+	.	Sequence "scaffold1_cov55.5" ; note "*pat pattern1"
scaffold1_cov55	fuzznuc	misc_feature	103	104	2.000	+	.	Sequence "scaffold1_cov55.6" ; note "*pat pattern1"
scaffold1_cov55	fuzznuc	misc_feature	106	107	2.000	+	.	Sequence "scaffold1_cov55.7" ; note "*pat pattern1"
 9639415 ../genome-feature-files/Pact_CpG.gff


#### 1c. Characterize overlaps with `bedtools`

In [26]:
!{bedtoolsDirectory}intersectBed \
-u \
-a {paCGMotifs} \
-b {paGenes} \
| wc -l
!echo "CG motif overlaps with genes"

 3434720
CG motif overlaps with genes


In [27]:
!{bedtoolsDirectory}intersectBed \
-u \
-a {paCGMotifs} \
-b {paCDS} \
| wc -l
!echo "CG motif overlaps with CDS"

 1455630
CG motif overlaps with CDS


In [28]:
!{bedtoolsDirectory}intersectBed \
-u \
-a {paCGMotifs} \
-b {paIntron} \
| wc -l
!echo "CG motif overlaps with introns"

 1999490
CG motif overlaps with introns


In [29]:
!{bedtoolsDirectory}intersectBed \
-v \
-a {paCGMotifs} \
-b {paGenes} \
| wc -l
!echo "CG motif overlaps that do not overlap with genes (i.e. intergenic regions)"

 5720900
CG motif overlaps that do not overlap with genes (i.e. intergenic regions)


#### 1d. Summary

| *P. acuta* Genome Feature 	| **Number individual features** 	| **Overlaps with CG Motifs** 	| **% Total CG Motifs** 	|
|:-------------------------------:	|:------------------------------:	|:---------------------------:	|:---------------------:	|
|              Genes              	|              64558             	|           3434720           	|          35.6         	|
|         Coding Sequences        	|             318484             	|           1455630           	|          15.1         	|
|             Introns             	|             241534             	|           1999490           	|          20.7         	|
|        Intergenic Regions       	|               N/A              	|           5720900           	|          59.3         	|

### 2. Download coverage files

In [8]:
#Download Pact WGBS and MBD-BS 10x sample bedgraphs
#!wget -r -l1 --no-parent -A "*10x.bedgraph" LINK

--2020-03-09 13:34:14--  https://gannet.fish.washington.edu/tmp/Mcap_full-u1M/dedup/
Resolving gannet.fish.washington.edu... 128.95.149.52
Connecting to gannet.fish.washington.edu|128.95.149.52|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘gannet.fish.washington.edu/tmp/Mcap_full-u1M/dedup/index.html.tmp’

gannet.fish.washing     [ <=>                ]  49.55K  --.-KB/s    in 0.004s  

2020-03-09 13:34:16 (10.8 MB/s) - ‘gannet.fish.washington.edu/tmp/Mcap_full-u1M/dedup/index.html.tmp’ saved [50740]

Loading robots.txt; please ignore errors.
--2020-03-09 13:34:16--  https://gannet.fish.washington.edu/robots.txt
Reusing existing connection to gannet.fish.washington.edu:443.
HTTP request sent, awaiting response... 404 Not Found
2020-03-09 13:34:16 ERROR 404: Not Found.

Removing gannet.fish.washington.edu/tmp/Mcap_full-u1M/dedup/index.html.tmp since it should be rejected.

--2020-03-09 13:34:16--  https://gannet.fish.washing

In [9]:
#Move samples from directory structure on gannet to cd
#!mv LINK/* .

In [10]:
#Remove empty directory
!rm -r gannet.fish.washington.edu/

In [18]:
#Download Pact RRBS 10x sample bedgraphs
#!wget -r -l1 --no-parent -A "*10x.bedgraph" LINK

--2020-03-09 13:36:12--  https://gannet.fish.washington.edu/tmp/nodedup-r2/
Resolving gannet.fish.washington.edu... 128.95.149.52
Connecting to gannet.fish.washington.edu|128.95.149.52|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘gannet.fish.washington.edu/tmp/nodedup-r2/index.html.tmp’

gannet.fish.washing     [ <=>                ]  20.50K  --.-KB/s    in 0.001s  

2020-03-09 13:36:12 (34.1 MB/s) - ‘gannet.fish.washington.edu/tmp/nodedup-r2/index.html.tmp’ saved [20990]

Loading robots.txt; please ignore errors.
--2020-03-09 13:36:12--  https://gannet.fish.washington.edu/robots.txt
Reusing existing connection to gannet.fish.washington.edu:443.
HTTP request sent, awaiting response... 404 Not Found
2020-03-09 13:36:12 ERROR 404: Not Found.

Removing gannet.fish.washington.edu/tmp/nodedup-r2/index.html.tmp since it should be rejected.

--2020-03-09 13:36:12--  https://gannet.fish.washington.edu/tmp/nodedup-r2/?C=N;O=D
Reus

In [19]:
#Move samples from directory structure on gannet to cd
#!mv LINK/* .

In [20]:
#Remove empty directory
!rm -r gannet.fish.washington.edu/

In [21]:
!find *bedgraph

*merged_CpG_evidence.cov_10x.bedgraph
Meth10.fastp-trim.202003063900_R1_001_bismark_bt2_pe._10x.bedgraph
Meth11.fastp-trim.202003065734_R1_001_bismark_bt2_pe._10x.bedgraph
Meth12.fastp-trim.202003060645_R1_001_bismark_bt2_pe._10x.bedgraph
Meth13.fastp-trim.202003062040_R1_001_bismark_bt2_pe_10x.bedgraph
Meth14.fastp-trim.202003064415_R1_001_bismark_bt2_pe_10x.bedgraph
Meth15.fastp-trim.202003065503_R1_001_bismark_bt2_pe_10x.bedgraph
Meth16.fastp-trim.202003062412_R1_001_bismark_bt2_pe._10x.bedgraph
Meth17.fastp-trim.202003063731_R1_001_bismark_bt2_pe._10x.bedgraph
Meth18.fastp-trim.202003065117_R1_001_bismark_bt2_pe._10x.bedgraph


### 3. Characterize methylation for each CpG dinucleotide

- Methylated: > 50% methylation
- Sparsely methylated: 10-50% methylation
- Unmethylated: < 10% methylation

##### Methylated loci

In [25]:
%%bash
for f in *bedgraph
do
    awk '{if ($4 >= 50) { print $1, $2, $3, $4 }}' ${f} \
    > ${f}-Meth
done

In [26]:
!head *Meth

==> Meth10.fastp-trim.202003063900_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-Meth <==
9 525025 525027 50.000000
16 1095928 1095930 50.000000
16 1095987 1095989 54.545455
16 1095993 1095995 66.666667
16 1937017 1937019 53.846154
26 1550319 1550321 61.538462
26 1550325 1550327 64.285714
26 1550327 1550329 53.846154
28 407411 407413 50.000000
28 407920 407922 50.000000

==> Meth11.fastp-trim.202003065734_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-Meth <==
2 81840 81842 100.000000
2 81845 81847 100.000000
2 81865 81867 92.857143
2 81900 81902 91.304348
2 81997 81999 100.000000
2 82001 82003 100.000000
2 82009 82011 100.000000
2 82292 82294 70.000000
2 82298 82300 100.000000
2 2675474 2675476 74.074074

==> Meth12.fastp-trim.202003060645_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-Meth <==
7 87179 87181 63.636364
7 2064280 2064282 100.000000
10 732017 732019 60.000000
10 732020 732022 100.000000
10 1190063 1190065 83.333333
10 1190118 1190120 95.454545
10 1190131 11

In [27]:
!wc -l *Meth

     493 Meth10.fastp-trim.202003063900_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-Meth
     827 Meth11.fastp-trim.202003065734_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-Meth
     370 Meth12.fastp-trim.202003060645_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-Meth
  170838 Meth13.fastp-trim.202003062040_R1_001_bismark_bt2_pe_10x.bedgraph-Mcap-Meth
  153150 Meth14.fastp-trim.202003064415_R1_001_bismark_bt2_pe_10x.bedgraph-Mcap-Meth
  159484 Meth15.fastp-trim.202003065503_R1_001_bismark_bt2_pe_10x.bedgraph-Mcap-Meth
    2828 Meth16.fastp-trim.202003062412_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-Meth
     956 Meth17.fastp-trim.202003063731_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-Meth
     942 Meth18.fastp-trim.202003065117_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-Meth
  489888 total


##### Sparsely methylated loci

In [28]:
%%bash
for f in *bedgraph
do
    awk '{if ($4 < 50) { print $1, $2, $3, $4}}' ${f} \
    | awk '{if ($4 > 10) { print $1, $2, $3, $4 }}' \
    > ${f}-sparseMeth
done

In [29]:
!head *sparseMeth

==> Meth10.fastp-trim.202003063900_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-sparseMeth <==
4 2282559 2282561 17.647059
4 2653035 2653037 20.000000
6 479467 479469 18.181818
10 1079591 1079593 20.000000
10 1222877 1222879 20.000000
10 1239917 1239919 15.384615
10 1240019 1240021 20.000000
13 2124876 2124878 18.181818
14 495499 495501 35.294118
16 593911 593913 16.666667

==> Meth11.fastp-trim.202003065734_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-sparseMeth <==
2 2675448 2675450 16.666667
2 2675927 2675929 20.000000
2 2679017 2679019 41.666667
2 2679956 2679958 16.666667
2 2680034 2680036 18.750000
2 2680170 2680172 42.105263
2 2680234 2680236 28.571429
4 2282559 2282561 14.285714
4 2933639 2933641 16.666667
10 1079949 1079951 27.272727

==> Meth12.fastp-trim.202003060645_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-sparseMeth <==
4 2282559 2282561 11.764706
4 2282561 2282563 11.111111
4 2762159 2762161 11.111111
4 2762294 2762296 11.111111
5 1381724 1381726 15.3

In [30]:
!wc -l *sparseMeth

    1116 Meth10.fastp-trim.202003063900_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-sparseMeth
    1422 Meth11.fastp-trim.202003065734_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-sparseMeth
    1043 Meth12.fastp-trim.202003060645_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-sparseMeth
  141204 Meth13.fastp-trim.202003062040_R1_001_bismark_bt2_pe_10x.bedgraph-Mcap-sparseMeth
  132046 Meth14.fastp-trim.202003064415_R1_001_bismark_bt2_pe_10x.bedgraph-Mcap-sparseMeth
  138386 Meth15.fastp-trim.202003065503_R1_001_bismark_bt2_pe_10x.bedgraph-Mcap-sparseMeth
     840 Meth16.fastp-trim.202003062412_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-sparseMeth
     650 Meth17.fastp-trim.202003063731_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-sparseMeth
     532 Meth18.fastp-trim.202003065117_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-sparseMeth
  417239 total


##### Unmethylated loci

In [31]:
%%bash
for f in *bedgraph
do
    awk '{if ($4 <= 10) { print $1, $2, $3, $4 }}' ${f} \
    > ${f}-unMeth
done

In [32]:
!head *unMeth

==> Meth10.fastp-trim.202003063900_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-unMeth <==
2 1779837 1779839 0.000000
2 1779853 1779855 0.000000
2 1779898 1779900 0.000000
2 1804233 1804235 0.000000
2 1804297 1804299 0.000000
2 1804303 1804305 4.000000
2 1804306 1804308 0.000000
2 1804322 1804324 0.000000
2 1804347 1804349 7.142857
2 1804359 1804361 0.000000

==> Meth11.fastp-trim.202003065734_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-unMeth <==
1 893230 893232 0.000000
1 893246 893248 10.000000
1 893258 893260 0.000000
1 893276 893278 0.000000
2 2437213 2437215 0.000000
2 2437231 2437233 0.000000
2 2675538 2675540 6.896552
2 2675971 2675973 0.000000
2 2675985 2675987 0.000000
2 2676029 2676031 0.000000

==> Meth12.fastp-trim.202003060645_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-unMeth <==
1 1620404 1620406 9.090909
2 1704736 1704738 0.000000
2 1805539 1805541 0.000000
2 1805567 1805569 0.000000
2 1806267 1806269 0.000000
2 1806331 1806333 0.000000
2 1806337 1

In [33]:
!wc -l *unMeth

    7490 Meth10.fastp-trim.202003063900_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-unMeth
    8841 Meth11.fastp-trim.202003065734_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-unMeth
    7553 Meth12.fastp-trim.202003060645_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-unMeth
 1091170 Meth13.fastp-trim.202003062040_R1_001_bismark_bt2_pe_10x.bedgraph-Mcap-unMeth
  890583 Meth14.fastp-trim.202003064415_R1_001_bismark_bt2_pe_10x.bedgraph-Mcap-unMeth
  938884 Meth15.fastp-trim.202003065503_R1_001_bismark_bt2_pe_10x.bedgraph-Mcap-unMeth
    3562 Meth16.fastp-trim.202003062412_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-unMeth
    3170 Meth17.fastp-trim.202003063731_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-unMeth
    2265 Meth18.fastp-trim.202003065117_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-unMeth
 2953518 total


##### Summary

| **Sample** 	| **Method** 	| **CpG with Data** 	| **Methylated CpG** 	| **Sparsely Methylated CpG** 	| **Unmethylated CpG** 	|
|:----------:	|:----------:	|:-----------------:	|:------------------:	|:---------------------------:	|:--------------------:	|
|     1     	|    WGBS    	|               	|         493        	|             1116            	|         7490         	|
|     2     	|    WGBS    	|              	|         827        	|             1422            	|         8841         	|
|     3     	|    WGBS    	|               	|         370        	|             1043            	|         7553         	|
|     4     	|    RRBS    	|            	|       170838       	|            141204           	|        1091170       	|
|     5     	|    RRBS    	|            	|       153150       	|            132046           	|        891583        	|
|     6     	|    RRBS    	|            	|       159484       	|            138386           	|        938884        	|
|     7     	|  MBD-BSSeq 	|        7230       	|        2828        	|             840             	|         3562         	|
|     8     	|  MBD-BSSeq 	|        4776       	|         956        	|             650             	|         3170         	|
|     9     	|  MBD-BSSeq 	|        3739       	|         942        	|             532             	|         2265         	|

### 4. Characterize genomic locations of CpGs

#### 4a. Create BEDfiles

In [34]:
%%bash

for f in *bedgraph*
do
    awk '{print $1"\t"$2"\t"$3}' ${f} > ${f}.bed
    wc -l ${f}.bed
done

    9099 Meth10.fastp-trim.202003063900_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap.bed
     493 Meth10.fastp-trim.202003063900_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-Meth.bed
    1116 Meth10.fastp-trim.202003063900_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-sparseMeth.bed
    7490 Meth10.fastp-trim.202003063900_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-unMeth.bed
   11090 Meth11.fastp-trim.202003065734_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap.bed
     827 Meth11.fastp-trim.202003065734_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-Meth.bed
    1422 Meth11.fastp-trim.202003065734_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-sparseMeth.bed
    8841 Meth11.fastp-trim.202003065734_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-unMeth.bed
    8966 Meth12.fastp-trim.202003060645_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap.bed
     370 Meth12.fastp-trim.202003060645_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-Meth.bed
    1043 Meth12.fastp-trim.202003060645_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-sparseMeth.bed
   

In [38]:
#Confirm BEDfile creation
!find *.bed

Meth10.fastp-trim.202003063900_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-Meth.bed
Meth10.fastp-trim.202003063900_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-sparseMeth.bed
Meth10.fastp-trim.202003063900_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-unMeth.bed
Meth10.fastp-trim.202003063900_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap.bed
Meth11.fastp-trim.202003065734_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-Meth.bed
Meth11.fastp-trim.202003065734_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-sparseMeth.bed
Meth11.fastp-trim.202003065734_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-unMeth.bed
Meth11.fastp-trim.202003065734_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap.bed
Meth12.fastp-trim.202003060645_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-Meth.bed
Meth12.fastp-trim.202003060645_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-sparseMeth.bed
Meth12.fastp-trim.202003060645_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-unMeth.bed
Meth12.fastp-trim.202003060645_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap.bed
Meth13.

In [39]:
#Confirm file creation
!head FILE

2	1779837	1779839
2	1779853	1779855
2	1779898	1779900
2	1804233	1804235
2	1804297	1804299
2	1804303	1804305
2	1804306	1804308
2	1804322	1804324
2	1804347	1804349
2	1804359	1804361


#### 4b. Genes

In [40]:
%%bash

for f in *bed
do
  /usr/local/bin/intersectBed \
  -wb \
  -a ${f} \
  -b ./../../genome-feature-files/Pact.GFFannotation.Genes.gff \
  > ${f}-paGenes
done

In [41]:
#Check output
!head *paGenes

==> Meth10.fastp-trim.202003063900_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-Meth.bed-mcGenes <==
9	525025	525027	9	AUGUSTUS	gene	519464	527530	0.19	+	.	g23129
9	525025	525027	9	AUGUSTUS	intron	523974	527201	0.35	+	.	transcript_id "g23129.t1"; gene_id "g23129";
161	892370	892372	161	AUGUSTUS	gene	887972	894174	0.12	+	.	g38942
161	892370	892372	161	AUGUSTUS	intron	892194	893069	1	+	.	transcript_id "g38942.t1"; gene_id "g38942";
161	892400	892402	161	AUGUSTUS	gene	887972	894174	0.12	+	.	g38942
161	892400	892402	161	AUGUSTUS	intron	892194	893069	1	+	.	transcript_id "g38942.t1"; gene_id "g38942";
161	892671	892673	161	AUGUSTUS	gene	887972	894174	0.12	+	.	g38942
161	892671	892673	161	AUGUSTUS	intron	892194	893069	1	+	.	transcript_id "g38942.t1"; gene_id "g38942";
161	893122	893124	161	AUGUSTUS	gene	887972	894174	0.12	+	.	g38942
161	893122	893124	161	AUGUSTUS	CDS	893070	893203	1	+	0	transcript_id "g38942.t1"; gene_id "g38942";

==> Meth10.fastp-trim.202003063900_R1_001_bismark_bt2

In [42]:
#Count number of overlaps
!wc -l *paGenes

     494 Meth10.fastp-trim.202003063900_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-Meth.bed-mcGenes
     263 Meth10.fastp-trim.202003063900_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-sparseMeth.bed-mcGenes
    1969 Meth10.fastp-trim.202003063900_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-unMeth.bed-mcGenes
    2726 Meth10.fastp-trim.202003063900_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap.bed-mcGenes
     760 Meth11.fastp-trim.202003065734_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-Meth.bed-mcGenes
     422 Meth11.fastp-trim.202003065734_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-sparseMeth.bed-mcGenes
    2313 Meth11.fastp-trim.202003065734_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-unMeth.bed-mcGenes
    3495 Meth11.fastp-trim.202003065734_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap.bed-mcGenes
     294 Meth12.fastp-trim.202003060645_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-Meth.bed-mcGenes
     182 Meth12.fastp-trim.202003060645_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-sparseMeth.bed-mcGenes
    1661

#### 4c. Coding Sequences (CDS)

In [30]:
%%bash

for f in *bed
do
  /usr/local/bin/intersectBed \
  -wb \
  -a ${f} \
  -b ../../../genome-feature-files/Pact.GFFannotation.CDS.gff \
  > ${f}-paCDS
done

Error: Unable to open file *Mcap*bed. Exiting.


CalledProcessError: Command 'b'\nfor f in *Mcap*bed\ndo\n  /usr/local/bin/intersectBed \\\n  -wb \\\n  -a ${f} \\\n  -b ../../genome-feature-files/Mcap.GFFannotation.CDS.gff \\\n  > ${f}-mcCDS\ndone\n'' returned non-zero exit status 1.

In [41]:
#Check output
!head *paCDS

==> Meth10.fastp-trim.202003063900_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-Meth.bed-mcGenes <==
9	525025	525027	9	AUGUSTUS	gene	519464	527530	0.19	+	.	g23129
9	525025	525027	9	AUGUSTUS	intron	523974	527201	0.35	+	.	transcript_id "g23129.t1"; gene_id "g23129";
161	892370	892372	161	AUGUSTUS	gene	887972	894174	0.12	+	.	g38942
161	892370	892372	161	AUGUSTUS	intron	892194	893069	1	+	.	transcript_id "g38942.t1"; gene_id "g38942";
161	892400	892402	161	AUGUSTUS	gene	887972	894174	0.12	+	.	g38942
161	892400	892402	161	AUGUSTUS	intron	892194	893069	1	+	.	transcript_id "g38942.t1"; gene_id "g38942";
161	892671	892673	161	AUGUSTUS	gene	887972	894174	0.12	+	.	g38942
161	892671	892673	161	AUGUSTUS	intron	892194	893069	1	+	.	transcript_id "g38942.t1"; gene_id "g38942";
161	893122	893124	161	AUGUSTUS	gene	887972	894174	0.12	+	.	g38942
161	893122	893124	161	AUGUSTUS	CDS	893070	893203	1	+	0	transcript_id "g38942.t1"; gene_id "g38942";

==> Meth10.fastp-trim.202003063900_R1_001_bismark_bt2

In [None]:
#Count number of overlaps
!wc -l *paCDS

#### 4d. Introns

In [30]:
%%bash

for f in *bed
do
  /usr/local/bin/intersectBed \
  -wb \
  -a ${f} \
  -b ../../../genome-feature-files/Pact.GFFannotation.Intron.gff \
  > ${f}-paIntron
done

Error: Unable to open file *Mcap*bed. Exiting.


CalledProcessError: Command 'b'\nfor f in *Mcap*bed\ndo\n  /usr/local/bin/intersectBed \\\n  -wb \\\n  -a ${f} \\\n  -b ../../genome-feature-files/Mcap.GFFannotation.CDS.gff \\\n  > ${f}-mcCDS\ndone\n'' returned non-zero exit status 1.

In [41]:
#Check output
!head *paIntron

==> Meth10.fastp-trim.202003063900_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-Meth.bed-mcGenes <==
9	525025	525027	9	AUGUSTUS	gene	519464	527530	0.19	+	.	g23129
9	525025	525027	9	AUGUSTUS	intron	523974	527201	0.35	+	.	transcript_id "g23129.t1"; gene_id "g23129";
161	892370	892372	161	AUGUSTUS	gene	887972	894174	0.12	+	.	g38942
161	892370	892372	161	AUGUSTUS	intron	892194	893069	1	+	.	transcript_id "g38942.t1"; gene_id "g38942";
161	892400	892402	161	AUGUSTUS	gene	887972	894174	0.12	+	.	g38942
161	892400	892402	161	AUGUSTUS	intron	892194	893069	1	+	.	transcript_id "g38942.t1"; gene_id "g38942";
161	892671	892673	161	AUGUSTUS	gene	887972	894174	0.12	+	.	g38942
161	892671	892673	161	AUGUSTUS	intron	892194	893069	1	+	.	transcript_id "g38942.t1"; gene_id "g38942";
161	893122	893124	161	AUGUSTUS	gene	887972	894174	0.12	+	.	g38942
161	893122	893124	161	AUGUSTUS	CDS	893070	893203	1	+	0	transcript_id "g38942.t1"; gene_id "g38942";

==> Meth10.fastp-trim.202003063900_R1_001_bismark_bt2

In [42]:
#Count number of overlaps
!wc -l *paIntron

     494 Meth10.fastp-trim.202003063900_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-Meth.bed-mcGenes
     263 Meth10.fastp-trim.202003063900_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-sparseMeth.bed-mcGenes
    1969 Meth10.fastp-trim.202003063900_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-unMeth.bed-mcGenes
    2726 Meth10.fastp-trim.202003063900_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap.bed-mcGenes
     760 Meth11.fastp-trim.202003065734_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-Meth.bed-mcGenes
     422 Meth11.fastp-trim.202003065734_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-sparseMeth.bed-mcGenes
    2313 Meth11.fastp-trim.202003065734_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-unMeth.bed-mcGenes
    3495 Meth11.fastp-trim.202003065734_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap.bed-mcGenes
     294 Meth12.fastp-trim.202003060645_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-Meth.bed-mcGenes
     182 Meth12.fastp-trim.202003060645_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-sparseMeth.bed-mcGenes
    1661

#### 4e. Intergenic

In [43]:
%%bash 

for f in *Mcap*bed
do
  /usr/local/bin/intersectBed \
  -v \
  -a ${f} \
  -b ../../../genome-feature-files/Pact.GFFannotation.Genes.gff \
  > ${f}-paIntergenic
done

In [44]:
#Check output
!head *paIntergenic

==> Meth10.fastp-trim.202003063900_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-Meth.bed-mcIntergenic <==
16	1095928	1095930
16	1095987	1095989
16	1095993	1095995
16	1937017	1937019
26	1550319	1550321
26	1550325	1550327
26	1550327	1550329
28	407411	407413
28	407920	407922
29	601463	601465

==> Meth10.fastp-trim.202003063900_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-sparseMeth.bed-mcIntergenic <==
4	2282559	2282561
4	2653035	2653037
10	1222877	1222879
10	1239917	1239919
10	1240019	1240021
13	2124876	2124878
14	495499	495501
16	593911	593913
16	1095907	1095909
16	1095913	1095915

==> Meth10.fastp-trim.202003063900_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-unMeth.bed-mcIntergenic <==
2	1779837	1779839
2	1779853	1779855
2	1779898	1779900
2	1804233	1804235
2	1804297	1804299
2	1804303	1804305
2	1804306	1804308
2	1804322	1804324
2	1804347	1804349
2	1804359	1804361

==> Meth10.fastp-trim.202003063900_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap.bed-mcIntergenic <==
2	1779837	1779839
2	1779853	1779855
2	

In [45]:
#Count number of overlaps
!wc -l *paIntergenic

     209 Meth10.fastp-trim.202003063900_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-Meth.bed-mcIntergenic
     977 Meth10.fastp-trim.202003063900_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-sparseMeth.bed-mcIntergenic
    6386 Meth10.fastp-trim.202003063900_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-unMeth.bed-mcIntergenic
    7572 Meth10.fastp-trim.202003063900_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap.bed-mcIntergenic
     369 Meth11.fastp-trim.202003065734_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-Meth.bed-mcIntergenic
    1198 Meth11.fastp-trim.202003065734_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-sparseMeth.bed-mcIntergenic
    7553 Meth11.fastp-trim.202003065734_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-unMeth.bed-mcIntergenic
    9120 Meth11.fastp-trim.202003065734_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap.bed-mcIntergenic
     220 Meth12.fastp-trim.202003060645_R1_001_bismark_bt2_pe._10x.bedgraph-Mcap-Meth.bed-mcIntergenic
     941 Meth12.fastp-trim.202003060645_R1_001_bismark_bt2_pe._10x.

#### Summary

##### Overlaps with Genes

##### Overlaps with Coding Sequences (CDS)


##### Overalps wtih Introns

##### Overlaps with Integenic regions