# Characterizing CpG Methylation (union bedgraphs with 5x data)

In this notebook, general methylation landscapes in *Montipora capitata* and *Pocillopora acuta* will be characterized based on WGSB, RRBS, and MBD-BSseq data. I will also assess CG motif overlaps with various genome feature tracks to understand where methylation may occur across the genome. I will use [union bedgraphs](https://gannet.fish.washington.edu/seashell/bu-github/Meth_Compare/analyses/10-unionbedg/) with 5x data.

1. Download union bedgraphs and format for downstream analyses
2. Characterize methylation for each CpG dinucleotide
3. Characterize genomic locations of methylated CpGs, sparsely methylated CpGs, and unmethylated CpGs for each sequencing type

## 0. Set working directory and install programs

In [1]:
!pwd

/Users/yaamini/Documents/Meth_Compare/scripts


In [2]:
cd ../analyses/

/Users/yaamini/Documents/Meth_Compare/analyses


In [3]:
#!mkdir Characterizing-CpG-Methylation-5x-Union

In [4]:
cd Characterizing-CpG-Methylation-5x-Union/

/Users/yaamini/Documents/Meth_Compare/analyses/Characterizing-CpG-Methylation-5x-Union


In [33]:
#Install pandas for this notebook
import pandas as pd
print(pd.__version__)

0.18.1


## *M. capitata*

In [5]:
#Make a directory for Mcap output
#!mkdir Mcap

In [6]:
cd Mcap/

/Users/yaamini/Documents/Meth_Compare/analyses/Characterizing-CpG-Methylation-5x-Union/Mcap


### 1. Format data

#### 1a. Download bedgraph

In [13]:
#Download Mcap 5x union bedgraph
!wget https://gannet.fish.washington.edu/seashell/bu-github/Meth_Compare/analyses/10-unionbedg/Mcap_union_5x.bedgraph

--2020-05-06 10:13:09--  https://gannet.fish.washington.edu/seashell/bu-github/Meth_Compare/analyses/10-unionbedg/Mcap_union_5x.bedgraph
Resolving gannet.fish.washington.edu (gannet.fish.washington.edu)... 128.95.149.52
Connecting to gannet.fish.washington.edu (gannet.fish.washington.edu)|128.95.149.52|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 862402537 (822M)
Saving to: ‘Mcap_union_5x.bedgraph’


2020-05-06 10:13:20 (78.1 MB/s) - ‘Mcap_union_5x.bedgraph’ saved [862402537/862402537]



In [27]:
#Check downloaded file
#WGBS: 10-12
#RRBS: 14-16
#MBD-BS: 16-18
!tail Mcap_union_5x.bedgraph

3043	18674	18676	12.500000	N/A	0.000000	N/A	N/A	N/A	N/A	N/A	N/A
3043	18688	18690	7.692308	0.000000	0.000000	N/A	N/A	N/A	N/A	0.000000	N/A
3043	18720	18722	8.333333	0.000000	0.000000	N/A	N/A	N/A	N/A	0.000000	0.000000
3043	18734	18736	8.333333	0.000000	0.000000	N/A	N/A	N/A	N/A	N/A	0.000000
3043	18740	18742	0.000000	0.000000	0.000000	N/A	N/A	N/A	N/A	N/A	0.000000
3043	18755	18757	11.111111	0.000000	0.000000	N/A	N/A	N/A	N/A	N/A	0.000000
3043	18784	18786	20.000000	0.000000	0.000000	N/A	N/A	N/A	N/A	0.000000	0.000000
3043	18801	18803	0.000000	0.000000	0.000000	N/A	N/A	N/A	N/A	0.000000	0.000000
3043	18822	18824	N/A	0.000000	0.000000	N/A	N/A	N/A	N/A	0.000000	N/A
3043	18827	18829	N/A	0.000000	0.000000	N/A	N/A	N/A	N/A	0.000000	N/A


#### 1b. Manipulate with `pandas`

In [38]:
#Import union data into pandas
#Check head
df = pd.read_table("Mcap_union_5x.bedgraph")
df.head(5)

Unnamed: 0,chrom,start,end,10,11,12,13,14,15,16,17,18
0,1,3493,3495,,,,0.0,,0.0,,,
1,1,3518,3520,,,,0.0,,0.0,,,
2,1,3727,3729,,,,0.0,0.0,8.695652,,,
3,1,3752,3754,,,,0.0,0.0,0.0,,,
4,1,3757,3759,,,,0.0,0.0,0.0,,,


In [43]:
#Average the first three columns for WGBS information and save as a new column
#Average the middle three columns for RRBS information and save as a new column
#Average the last three columns for MBD-BS information and save as a new column
#Check output
df['WGBS'] = df[['10', '11', "12"]].mean(axis=1)
df['RRBS'] = df[['13', '14', "15"]].mean(axis=1)
df['MBD-BS'] = df[['16', '17', "18"]].mean(axis=1)
df.tail(10)

Unnamed: 0,chrom,start,end,10,11,12,13,14,15,16,17,18,WGBS,RRBS,MBD-BS
13340258,3043,18674,18676,12.5,,0.0,,,,,,,6.25,,
13340259,3043,18688,18690,7.692308,0.0,0.0,,,,,0.0,,2.564103,,0.0
13340260,3043,18720,18722,8.333333,0.0,0.0,,,,,0.0,0.0,2.777778,,0.0
13340261,3043,18734,18736,8.333333,0.0,0.0,,,,,,0.0,2.777778,,0.0
13340262,3043,18740,18742,0.0,0.0,0.0,,,,,,0.0,0.0,,0.0
13340263,3043,18755,18757,11.111111,0.0,0.0,,,,,,0.0,3.703704,,0.0
13340264,3043,18784,18786,20.0,0.0,0.0,,,,,0.0,0.0,6.666667,,0.0
13340265,3043,18801,18803,0.0,0.0,0.0,,,,,0.0,0.0,0.0,,0.0
13340266,3043,18822,18824,,0.0,0.0,,,,,0.0,,0.0,,0.0
13340267,3043,18827,18829,,0.0,0.0,,,,,0.0,,0.0,,0.0


In [87]:
#Save dataframe in a tabular format and include N/As. Do not include quotes.
df.to_csv("Mcap_union_5x-averages.bedgraph", sep = "\t", na_rep = "N/A", quoting = 3)

#### 1c. Separate methods into new bedgraphs

In [88]:
#Check pandas manipulations
!head Mcap_union_5x-averages.bedgraph

	chrom	start	end	10	11	12	13	14	15	16	17	18	WGBS	RRBS	MBD-BS
0	1	3493	3495	N/A	N/A	N/A	0.0	N/A	0.0	N/A	N/A	N/A	N/A	0.0	N/A
1	1	3518	3520	N/A	N/A	N/A	0.0	N/A	0.0	N/A	N/A	N/A	N/A	0.0	N/A
2	1	3727	3729	N/A	N/A	N/A	0.0	0.0	8.695652	N/A	N/A	N/A	N/A	2.898550666666667	N/A
3	1	3752	3754	N/A	N/A	N/A	0.0	0.0	0.0	N/A	N/A	N/A	N/A	0.0	N/A
4	1	3757	3759	N/A	N/A	N/A	0.0	0.0	0.0	N/A	N/A	N/A	N/A	0.0	N/A
5	1	3770	3772	N/A	N/A	N/A	0.0	0.0	0.0	N/A	N/A	N/A	N/A	0.0	N/A
6	1	4062	4064	N/A	N/A	0.0	N/A	N/A	N/A	N/A	N/A	N/A	0.0	N/A	N/A
7	1	4069	4071	N/A	N/A	0.0	N/A	N/A	N/A	N/A	N/A	N/A	0.0	N/A	N/A
8	1	4077	4079	N/A	N/A	0.0	N/A	N/A	N/A	N/A	N/A	N/A	0.0	N/A	N/A


In [145]:
#Remove header
#Keep chr, start, end, and WGBS average (col 2-4, 13)
#Remove rows where the 4th column (average %meth) is N/A
#Save file
!tail -n +2 Mcap_union_5x-averages.bedgraph \
| awk -F'\t' -v OFS='\t' '{print $2, $3, $4, $13}' \
| awk -F'\t' -v OFS='\t' '$4 != "N/A"' \
> Mcap_union_5x-averages-WGBS.bedgraph

In [146]:
#Check output: chr, start, end, % meth
!head Mcap_union_5x-averages-WGBS.bedgraph
!wc -l Mcap_union_5x-averages-WGBS.bedgraph

1	8113	8115	20.0
1	224609	224611	0.0
1	264560	264562	0.0
1	264598	264600	0.0
1	271145	271147	0.0
1	277994	277996	16.666667
1	278004	278006	0.0
1	278039	278041	0.0
1	278049	278051	0.0
1	278067	278069	0.0
  153392 Mcap_union_5x-averages-WGBS.bedgraph


In [147]:
#Remove header
#Keep chr, start, end, and RRBS average
#Remove rows where the 4th column (average %meth) is N/A
#Save file
!tail -n +2 Mcap_union_5x-averages.bedgraph \
| awk -F'\t' -v OFS='\t' '{print $2, $3, $4, $14}' \
| awk -F'\t' -v OFS='\t' '$4 != "N/A"' \
> Mcap_union_5x-averages-RRBS.bedgraph

In [148]:
#Check output: chr, start, end, % meth
!head Mcap_union_5x-averages-RRBS.bedgraph
!wc -l Mcap_union_5x-averages-RRBS.bedgraph

1	4062	4064	0.0
1	4069	4071	0.0
1	4077	4079	0.0
1	4086	4088	0.0
1	4146	4148	0.0
1	4150	4152	0.0
1	4155	4157	0.0
1	4172	4174	0.0
1	4184	4186	0.0
1	4190	4192	16.666667
 11509837 Mcap_union_5x-averages-RRBS.bedgraph


In [149]:
#Remove header
#Keep chr, start, end, and MBD-BS average
#Remove rows where the 4th column (average %meth) is N/A
#Save file
!tail -n +2 Mcap_union_5x-averages.bedgraph \
| awk -F'\t' -v OFS='\t' '{print $2, $3, $4, $15}' \
| awk -F'\t' -v OFS='\t' '$4 != "N/A"' \
> Mcap_union_5x-averages-MBDBS.bedgraph

In [150]:
#Check output: chr, start, end, % meth
!head Mcap_union_5x-averages-MBDBS.bedgraph
!wc -l Mcap_union_5x-averages-MBDBS.bedgraph

1	3493	3495	0.0
1	3518	3520	0.0
1	3727	3729	2.898550666666667
1	3752	3754	0.0
1	3757	3759	0.0
1	3770	3772	0.0
1	11876	11878	0.0
1	11887	11889	0.0
1	11894	11896	0.0
1	11941	11943	0.0
 3981450 Mcap_union_5x-averages-MBDBS.bedgraph


In [151]:
!find *averages-*bedgraph

Mcap_union_5x-averages-MBDBS.bedgraph
Mcap_union_5x-averages-RRBS.bedgraph
Mcap_union_5x-averages-WGBS.bedgraph


In [152]:
!wc -l *averages-*bedgraph > Mcap_union_5x-averages-counts.txt

### 2. Characterize methylation for each CpG dinucleotide

- Methylated: > 50% methylation
- Sparsely methylated: 10-50% methylation
- Unmethylated: < 10% methylation

##### Methylated loci

In [153]:
%%bash
for f in *averages-*bedgraph
do
    awk '{if ($4 >= 50) { print $1, $2, $3, $4 }}' ${f} \
    > ${f}-Meth
done

In [158]:
!head *-Meth

==> Mcap_union_5x-averages-MBDBS.bedgraph-Meth <==
1 32228 32230 50.413223
1 58618 58620 95.65826333333332
1 58745 58747 96.819728
1 58764 58766 99.16666666666667
1 58792 58794 83.42830033333334
1 66041 66043 100.0
1 66050 66052 100.0
1 66339 66341 88.888889
1 66345 66347 77.777778
1 66354 66356 77.777778

==> Mcap_union_5x-averages-RRBS.bedgraph-Meth <==
1 4948 4950 50.0
1 4967 4969 50.0
1 4986 4988 50.0
1 57065 57067 80.0
1 58609 58611 100.0
1 58618 58620 100.0
1 58745 58747 100.0
1 59207 59209 100.0
1 59277 59279 100.0
1 59393 59395 100.0

==> Mcap_union_5x-averages-WGBS.bedgraph-Meth <==
1 1002973 1002975 50.0
1 1343240 1343242 100.0
1 1343249 1343251 100.0
1 1343263 1343265 83.333333
1 1343265 1343267 100.0
1 1343295 1343297 100.0
1 1343304 1343306 100.0
1 1343320 1343322 100.0
1 1451821 1451823 60.0
1 1468323 1468325 100.0


In [155]:
!wc -l *-Meth

  329361 Mcap_union_5x-averages-MBDBS.bedgraph-Meth
 1350936 Mcap_union_5x-averages-RRBS.bedgraph-Meth
   29468 Mcap_union_5x-averages-WGBS.bedgraph-Meth
 1709765 total


In [159]:
!wc -l *-Meth > Mcap_union_5x-Meth-counts.txt

##### Sparsely methylated loci

In [160]:
%%bash
for f in *averages-*bedgraph
do
    awk '{if ($4 < 50) { print $1, $2, $3, $4}}' ${f} \
    | awk '{if ($4 > 10) { print $1, $2, $3, $4 }}' \
    > ${f}-sparseMeth
done

In [161]:
!head *-sparseMeth

==> Mcap_union_5x-averages-MBDBS.bedgraph-sparseMeth <==
1 15092 15094 30.0
1 21739 21741 13.636364000000002
1 34139 34141 11.764706
1 42261 42263 10.539216
1 45163 45165 10.31746
1 48370 48372 14.285714000000002
1 87492 87494 33.333333
1 89011 89013 14.285714000000002
1 101503 101505 17.380952
1 101545 101547 23.3333335

==> Mcap_union_5x-averages-RRBS.bedgraph-sparseMeth <==
1 4190 4192 16.666667
1 4891 4893 33.333333
1 4910 4912 28.571429
1 4929 4931 16.6666665
1 5005 5007 28.571429
1 5024 5026 40.0
1 5151 5153 20.0
1 5160 5162 16.666667
1 5228 5230 11.111111
1 6282 6284 11.111111

==> Mcap_union_5x-averages-WGBS.bedgraph-sparseMeth <==
1 8113 8115 20.0
1 277994 277996 16.666667
1 387294 387296 20.0
1 461787 461789 40.0
1 480696 480698 20.0
1 605019 605021 28.571429
1 605050 605052 33.333333
1 646162 646164 20.0
1 667790 667792 40.0
1 726420 726422 20.0


In [162]:
!wc -l *-sparseMeth

  220277 Mcap_union_5x-averages-MBDBS.bedgraph-sparseMeth
 1155033 Mcap_union_5x-averages-RRBS.bedgraph-sparseMeth
   16793 Mcap_union_5x-averages-WGBS.bedgraph-sparseMeth
 1392103 total


In [163]:
!wc -l *-sparseMeth > Mcap_union_5x-sparseMeth-counts.txt

##### Unmethylated loci

In [164]:
%%bash
for f in *averages-*bedgraph
do
    awk '{if ($4 <= 10) { print $1, $2, $3, $4 }}' ${f} \
    > ${f}-unMeth
done

In [165]:
!head *-unMeth

==> Mcap_union_5x-averages-MBDBS.bedgraph-unMeth <==
1 3493 3495 0.0
1 3518 3520 0.0
1 3727 3729 2.898550666666667
1 3752 3754 0.0
1 3757 3759 0.0
1 3770 3772 0.0
1 11876 11878 0.0
1 11887 11889 0.0
1 11894 11896 0.0
1 11941 11943 0.0

==> Mcap_union_5x-averages-RRBS.bedgraph-unMeth <==
1 4062 4064 0.0
1 4069 4071 0.0
1 4077 4079 0.0
1 4086 4088 0.0
1 4146 4148 0.0
1 4150 4152 0.0
1 4155 4157 0.0
1 4172 4174 0.0
1 4184 4186 0.0
1 5043 5045 0.0

==> Mcap_union_5x-averages-WGBS.bedgraph-unMeth <==
1 224609 224611 0.0
1 264560 264562 0.0
1 264598 264600 0.0
1 271145 271147 0.0
1 278004 278006 0.0
1 278039 278041 0.0
1 278049 278051 0.0
1 278067 278069 0.0
1 280413 280415 0.0
1 280448 280450 0.0


In [166]:
!wc -l *-unMeth

 3431812 Mcap_union_5x-averages-MBDBS.bedgraph-unMeth
 9003868 Mcap_union_5x-averages-RRBS.bedgraph-unMeth
  107131 Mcap_union_5x-averages-WGBS.bedgraph-unMeth
 12542811 total


In [167]:
!wc -l *-unMeth > Mcap_union_5x-unMeth-counts.txt

### 3. Characterize genomic locations of CpGs

#### 3a. Create BEDfiles

In [182]:
%%bash

for f in *averages-*bedgraph*
do
    awk '{print $1"\t"$2"\t"$3}' ${f} > ${f}.bed
    wc -l ${f}.bed
done

 3981450 Mcap_union_5x-averages-MBDBS.bedgraph.bed
  329361 Mcap_union_5x-averages-MBDBS.bedgraph-Meth.bed
  220277 Mcap_union_5x-averages-MBDBS.bedgraph-sparseMeth.bed
 3431812 Mcap_union_5x-averages-MBDBS.bedgraph-unMeth.bed
 11509837 Mcap_union_5x-averages-RRBS.bedgraph.bed
 1350936 Mcap_union_5x-averages-RRBS.bedgraph-Meth.bed
 1155033 Mcap_union_5x-averages-RRBS.bedgraph-sparseMeth.bed
 9003868 Mcap_union_5x-averages-RRBS.bedgraph-unMeth.bed
  153392 Mcap_union_5x-averages-WGBS.bedgraph.bed
   29468 Mcap_union_5x-averages-WGBS.bedgraph-Meth.bed
   16793 Mcap_union_5x-averages-WGBS.bedgraph-sparseMeth.bed
  107131 Mcap_union_5x-averages-WGBS.bedgraph-unMeth.bed


In [183]:
#Confirm file creation
!head Mcap_union_5x-averages-MBDBS.bedgraph-unMeth.bed

1	3493	3495
1	3518	3520
1	3727	3729
1	3752	3754
1	3757	3759
1	3770	3772
1	11876	11878
1	11887	11889
1	11894	11896
1	11941	11943


#### 3b. Genes

In [185]:
%%bash

for f in *bed
do
  /usr/local/bin/intersectBed \
  -wb \
  -a ${f} \
  -b ../../../genome-feature-files/Mcap.GFFannotation.gene.gff \
  > ${f}-mcGenes
done

In [186]:
#Check output
!head *mcGenes

==> Mcap_union_5x-averages-MBDBS.bedgraph-Meth.bed-mcGenes <==
1	58618	58620	1	AUGUSTUS	gene	58322	62557	1	-	.	g21535
1	58745	58747	1	AUGUSTUS	gene	58322	62557	1	-	.	g21535
1	58764	58766	1	AUGUSTUS	gene	58322	62557	1	-	.	g21535
1	58792	58794	1	AUGUSTUS	gene	58322	62557	1	-	.	g21535
1	66041	66043	1	AUGUSTUS	gene	64466	84798	1	+	.	g21536
1	66050	66052	1	AUGUSTUS	gene	64466	84798	1	+	.	g21536
1	66339	66341	1	AUGUSTUS	gene	64466	84798	1	+	.	g21536
1	66345	66347	1	AUGUSTUS	gene	64466	84798	1	+	.	g21536
1	66354	66356	1	AUGUSTUS	gene	64466	84798	1	+	.	g21536
1	66400	66402	1	AUGUSTUS	gene	64466	84798	1	+	.	g21536

==> Mcap_union_5x-averages-MBDBS.bedgraph-sparseMeth.bed-mcGenes <==
1	42261	42263	1	AUGUSTUS	gene	37447	52266	1	+	.	g21534
1	45163	45165	1	AUGUSTUS	gene	37447	52266	1	+	.	g21534
1	48370	48372	1	AUGUSTUS	gene	37447	52266	1	+	.	g21534
1	89011	89013	1	AUGUSTUS	gene	88347	97184	1	+	.	g21537
1	101503	101505	1	AUGUSTUS	gene	100215	109729	0.99	-	.	g21538
1	101545	101547	1

In [187]:
#Count number of overlaps
!wc -l *mcGenes

  156521 Mcap_union_5x-averages-MBDBS.bedgraph-Meth.bed-mcGenes
   77322 Mcap_union_5x-averages-MBDBS.bedgraph-sparseMeth.bed-mcGenes
 1073239 Mcap_union_5x-averages-MBDBS.bedgraph-unMeth.bed-mcGenes
 1307082 Mcap_union_5x-averages-MBDBS.bedgraph.bed-mcGenes
  685807 Mcap_union_5x-averages-RRBS.bedgraph-Meth.bed-mcGenes
  400637 Mcap_union_5x-averages-RRBS.bedgraph-sparseMeth.bed-mcGenes
 3054154 Mcap_union_5x-averages-RRBS.bedgraph-unMeth.bed-mcGenes
 4140598 Mcap_union_5x-averages-RRBS.bedgraph.bed-mcGenes
   12960 Mcap_union_5x-averages-WGBS.bedgraph-Meth.bed-mcGenes
    4843 Mcap_union_5x-averages-WGBS.bedgraph-sparseMeth.bed-mcGenes
   32499 Mcap_union_5x-averages-WGBS.bedgraph-unMeth.bed-mcGenes
   50302 Mcap_union_5x-averages-WGBS.bedgraph.bed-mcGenes
 10995964 total


In [188]:
!wc -l *mcGenes > Mcap_union_5x-mcGenes-counts.txt

#### 3c. Coding Sequences (CDS)

In [189]:
%%bash

for f in *bed
do
  /usr/local/bin/intersectBed \
  -wb \
  -a ${f} \
  -b ../../../genome-feature-files/Mcap.GFFannotation.CDS.gff \
  > ${f}-mcCDS
done

In [190]:
#Check output
!head *mcCDS

==> Mcap_union_5x-averages-MBDBS.bedgraph-Meth.bed-mcCDS <==
1	58618	58620	1	AUGUSTUS	CDS	58322	59506	1	-	0	transcript_id "g21535.t1"; gene_id "g21535";
1	58745	58747	1	AUGUSTUS	CDS	58322	59506	1	-	0	transcript_id "g21535.t1"; gene_id "g21535";
1	58764	58766	1	AUGUSTUS	CDS	58322	59506	1	-	0	transcript_id "g21535.t1"; gene_id "g21535";
1	58792	58794	1	AUGUSTUS	CDS	58322	59506	1	-	0	transcript_id "g21535.t1"; gene_id "g21535";
1	1174296	1174298	1	AUGUSTUS	CDS	1173914	1174302	0.48	-	1	transcript_id "g21623.t1"; gene_id "g21623";
1	1367668	1367670	1	AUGUSTUS	CDS	1367658	1367706	1	-	0	transcript_id "g21633.t1"; gene_id "g21633";
1	1432386	1432388	1	AUGUSTUS	CDS	1431855	1432904	1	-	0	transcript_id "g21638.t1"; gene_id "g21638";
1	1432398	1432400	1	AUGUSTUS	CDS	1431855	1432904	1	-	0	transcript_id "g21638.t1"; gene_id "g21638";
1	1432427	1432429	1	AUGUSTUS	CDS	1431855	1432904	1	-	0	transcript_id "g21638.t1"; gene_id "g21638";
1	1432441	1432443	1	AUGUSTUS	CDS	1431855	1432904	1	-	0	tra

In [191]:
#Count number of overlaps
!wc -l *mcCDS

   22660 Mcap_union_5x-averages-MBDBS.bedgraph-Meth.bed-mcCDS
   16593 Mcap_union_5x-averages-MBDBS.bedgraph-sparseMeth.bed-mcCDS
  205283 Mcap_union_5x-averages-MBDBS.bedgraph-unMeth.bed-mcCDS
  244536 Mcap_union_5x-averages-MBDBS.bedgraph.bed-mcCDS
  139018 Mcap_union_5x-averages-RRBS.bedgraph-Meth.bed-mcCDS
  103408 Mcap_union_5x-averages-RRBS.bedgraph-sparseMeth.bed-mcCDS
  753113 Mcap_union_5x-averages-RRBS.bedgraph-unMeth.bed-mcCDS
  995539 Mcap_union_5x-averages-RRBS.bedgraph.bed-mcCDS
    3992 Mcap_union_5x-averages-WGBS.bedgraph-Meth.bed-mcCDS
    1907 Mcap_union_5x-averages-WGBS.bedgraph-sparseMeth.bed-mcCDS
   11720 Mcap_union_5x-averages-WGBS.bedgraph-unMeth.bed-mcCDS
   17619 Mcap_union_5x-averages-WGBS.bedgraph.bed-mcCDS
 2515388 total


In [192]:
!wc -l *mcCDS > Mcap_union_5x-mcCDS-counts.txt

#### 3d. Introns

In [193]:
%%bash

for f in *bed
do
  /usr/local/bin/intersectBed \
  -wb \
  -a ${f} \
  -b ../../../genome-feature-files/Mcap.GFFannotation.intron.gff \
  > ${f}-mcIntrons
done

In [194]:
#Check output
!head *mcIntrons

==> Mcap_union_5x-averages-MBDBS.bedgraph-Meth.bed-mcIntrons <==
1	66041	66043	1	AUGUSTUS	intron	64735	67263	1	+	.	transcript_id "g21536.t1"; gene_id "g21536";
1	66050	66052	1	AUGUSTUS	intron	64735	67263	1	+	.	transcript_id "g21536.t1"; gene_id "g21536";
1	66339	66341	1	AUGUSTUS	intron	64735	67263	1	+	.	transcript_id "g21536.t1"; gene_id "g21536";
1	66345	66347	1	AUGUSTUS	intron	64735	67263	1	+	.	transcript_id "g21536.t1"; gene_id "g21536";
1	66354	66356	1	AUGUSTUS	intron	64735	67263	1	+	.	transcript_id "g21536.t1"; gene_id "g21536";
1	66400	66402	1	AUGUSTUS	intron	64735	67263	1	+	.	transcript_id "g21536.t1"; gene_id "g21536";
1	66540	66542	1	AUGUSTUS	intron	64735	67263	1	+	.	transcript_id "g21536.t1"; gene_id "g21536";
1	66543	66545	1	AUGUSTUS	intron	64735	67263	1	+	.	transcript_id "g21536.t1"; gene_id "g21536";
1	66613	66615	1	AUGUSTUS	intron	64735	67263	1	+	.	transcript_id "g21536.t1"; gene_id "g21536";
1	66668	66670	1	AUGUSTUS	intron	64735	67263	1	+	.	transcript_id "g21536.t1"; gen

In [195]:
#Count number of overlaps
!wc -l *mcIntrons

  133901 Mcap_union_5x-averages-MBDBS.bedgraph-Meth.bed-mcIntrons
   60756 Mcap_union_5x-averages-MBDBS.bedgraph-sparseMeth.bed-mcIntrons
  868445 Mcap_union_5x-averages-MBDBS.bedgraph-unMeth.bed-mcIntrons
 1063102 Mcap_union_5x-averages-MBDBS.bedgraph.bed-mcIntrons
  547317 Mcap_union_5x-averages-RRBS.bedgraph-Meth.bed-mcIntrons
  297449 Mcap_union_5x-averages-RRBS.bedgraph-sparseMeth.bed-mcIntrons
 2303015 Mcap_union_5x-averages-RRBS.bedgraph-unMeth.bed-mcIntrons
 3147781 Mcap_union_5x-averages-RRBS.bedgraph.bed-mcIntrons
    8973 Mcap_union_5x-averages-WGBS.bedgraph-Meth.bed-mcIntrons
    2942 Mcap_union_5x-averages-WGBS.bedgraph-sparseMeth.bed-mcIntrons
   20802 Mcap_union_5x-averages-WGBS.bedgraph-unMeth.bed-mcIntrons
   32717 Mcap_union_5x-averages-WGBS.bedgraph.bed-mcIntrons
 8487200 total


In [196]:
!wc -l *mcIntrons > Mcap-5x-mcIntrons-counts.txt

#### 3e. Flanking regions

In [197]:
%%bash

for f in *bed
do
  /usr/local/bin/intersectBed \
  -wb \
  -a ${f} \
  -b ../../../genome-feature-files/Mcap.GFFannotation.flanks.gff \
  > ${f}-mcFlanks
done

In [198]:
#Check output
!head *mcFlanks

==> Mcap_union_5x-averages-MBDBS.bedgraph-Meth.bed-mcFlanks <==
1	147722	147724	1	AUGUSTUS	gene	147344	148343	0.44	+	.	g21541
1	147732	147734	1	AUGUSTUS	gene	147344	148343	0.44	+	.	g21541
1	147767	147769	1	AUGUSTUS	gene	147344	148343	0.44	+	.	g21541
1	147785	147787	1	AUGUSTUS	gene	147344	148343	0.44	+	.	g21541
1	147794	147796	1	AUGUSTUS	gene	147344	148343	0.44	+	.	g21541
1	147806	147808	1	AUGUSTUS	gene	147344	148343	0.44	+	.	g21541
1	788995	788997	1	AUGUSTUS	gene	788380	789379	0.68	-	.	g21600
1	1501390	1501392	1	AUGUSTUS	gene	1500581	1501580	0.99	+	.	g21643
1	1501390	1501392	1	AUGUSTUS	gene	1500925	1501924	0.93	-	.	g21644
1	1501624	1501626	1	AUGUSTUS	gene	1500925	1501924	0.93	-	.	g21644

==> Mcap_union_5x-averages-MBDBS.bedgraph-sparseMeth.bed-mcFlanks <==
1	21739	21741	1	AUGUSTUS	gene	21321	22320	0.23	-	.	g21533
1	87492	87494	1	AUGUSTUS	gene	87347	88346	1	+	.	g21537
1	185844	185846	1	AUGUSTUS	gene	185773	186181	0.26	+	.	g21546
1	185844	185846	1	AUGUSTUS	gene	185773	186181	0.68	+	.	g21

In [199]:
#Count number of overlaps
!wc -l *mcFlanks

   37968 Mcap_union_5x-averages-MBDBS.bedgraph-Meth.bed-mcFlanks
   27669 Mcap_union_5x-averages-MBDBS.bedgraph-sparseMeth.bed-mcFlanks
  361553 Mcap_union_5x-averages-MBDBS.bedgraph-unMeth.bed-mcFlanks
  427190 Mcap_union_5x-averages-MBDBS.bedgraph.bed-mcFlanks
  151474 Mcap_union_5x-averages-RRBS.bedgraph-Meth.bed-mcFlanks
  142393 Mcap_union_5x-averages-RRBS.bedgraph-sparseMeth.bed-mcFlanks
  985243 Mcap_union_5x-averages-RRBS.bedgraph-unMeth.bed-mcFlanks
 1279110 Mcap_union_5x-averages-RRBS.bedgraph.bed-mcFlanks
    4136 Mcap_union_5x-averages-WGBS.bedgraph-Meth.bed-mcFlanks
    1765 Mcap_union_5x-averages-WGBS.bedgraph-sparseMeth.bed-mcFlanks
   10361 Mcap_union_5x-averages-WGBS.bedgraph-unMeth.bed-mcFlanks
   16262 Mcap_union_5x-averages-WGBS.bedgraph.bed-mcFlanks
 3445124 total


In [200]:
!wc -l *mcFlanks > Mcap-5x-mcFlanks-counts.txt

#### 3f. Intergenic

In [201]:
%%bash 

for f in *bed
do
  /usr/local/bin/intersectBed \
  -v \
  -a ${f} \
  -b ../../../genome-feature-files/Mcap.GFFannotation.gene.gff \
  > ${f}-mcIntergenic
done

In [202]:
#Check output
!head *mcIntergenic

==> Mcap_union_5x-averages-MBDBS.bedgraph-Meth.bed-mcIntergenic <==
1	32228	32230
1	130507	130509
1	147722	147724
1	147732	147734
1	147767	147769
1	147785	147787
1	147794	147796
1	147806	147808
1	241717	241719
1	241722	241724

==> Mcap_union_5x-averages-MBDBS.bedgraph-sparseMeth.bed-mcIntergenic <==
1	15092	15094
1	21739	21741
1	34139	34141
1	87492	87494
1	166013	166015
1	176289	176291
1	185844	185846
1	186587	186589
1	198078	198080
1	237921	237923

==> Mcap_union_5x-averages-MBDBS.bedgraph-unMeth.bed-mcIntergenic <==
1	3493	3495
1	3518	3520
1	3727	3729
1	3752	3754
1	3757	3759
1	3770	3772
1	11876	11878
1	11887	11889
1	11894	11896
1	11941	11943

==> Mcap_union_5x-averages-MBDBS.bedgraph.bed-mcIntergenic <==
1	3493	3495
1	3518	3520
1	3727	3729
1	3752	3754
1	3757	3759
1	3770	3772
1	11876	11878
1	11887	11889
1	11894	11896
1	11941	11943

==> Mcap_union_5x-averages-RRBS.bedgraph-Meth.bed-mcIntergenic <==
1	4948	4950
1	4967	4969
1	4986	4988
1	57065	57067
1	443126	443128
1	444404	444406
1	4461

In [203]:
#Count number of overlaps
!wc -l *mcIntergenic

  172840 Mcap_union_5x-averages-MBDBS.bedgraph-Meth.bed-mcIntergenic
  142955 Mcap_union_5x-averages-MBDBS.bedgraph-sparseMeth.bed-mcIntergenic
 2358573 Mcap_union_5x-averages-MBDBS.bedgraph-unMeth.bed-mcIntergenic
 2674368 Mcap_union_5x-averages-MBDBS.bedgraph.bed-mcIntergenic
  665129 Mcap_union_5x-averages-RRBS.bedgraph-Meth.bed-mcIntergenic
  754396 Mcap_union_5x-averages-RRBS.bedgraph-sparseMeth.bed-mcIntergenic
 5949714 Mcap_union_5x-averages-RRBS.bedgraph-unMeth.bed-mcIntergenic
 7369239 Mcap_union_5x-averages-RRBS.bedgraph.bed-mcIntergenic
   16508 Mcap_union_5x-averages-WGBS.bedgraph-Meth.bed-mcIntergenic
   11950 Mcap_union_5x-averages-WGBS.bedgraph-sparseMeth.bed-mcIntergenic
   74632 Mcap_union_5x-averages-WGBS.bedgraph-unMeth.bed-mcIntergenic
  103090 Mcap_union_5x-averages-WGBS.bedgraph.bed-mcIntergenic
 20293394 total


In [204]:
!wc -l *mcIntergenic > Mcap-5x-mcIntergenic-counts.txt

## *P. acuta*

In [206]:
cd ..

/Users/yaamini/Documents/Meth_Compare/analyses/Characterizing-CpG-Methylation-5x-Union


In [207]:
#Make a directory for Pact output
#!mkdir Pact

In [208]:
cd Pact/

/Users/yaamini/Documents/Meth_Compare/analyses/Characterizing-CpG-Methylation-5x-Union/Pact


#### 1a. Download bedgraph

In [209]:
#Download Mcap 5x union bedgraph
!wget https://gannet.fish.washington.edu/seashell/bu-github/Meth_Compare/analyses/10-unionbedg/Pact_union_5x.bedgraph

--2020-05-06 21:58:27--  https://gannet.fish.washington.edu/seashell/bu-github/Meth_Compare/analyses/10-unionbedg/Pact_union_5x.bedgraph
Resolving gannet.fish.washington.edu (gannet.fish.washington.edu)... 128.95.149.52
Connecting to gannet.fish.washington.edu (gannet.fish.washington.edu)|128.95.149.52|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1212075709 (1.1G)
Saving to: ‘Pact_union_5x.bedgraph’


2020-05-06 21:58:42 (75.7 MB/s) - ‘Pact_union_5x.bedgraph’ saved [1212075709/1212075709]



In [210]:
#Check downloaded file
#WGBS: 1-3
#RRBS: 4-6
#MBD-BS: 7-9
!tail Pact_union_5x.bedgraph

scaffold168460_cov188	280	282	N/A	0.000000	0.000000	N/A	N/A	N/A	N/A	N/A	N/A
scaffold168461_cov203	69	71	N/A	N/A	0.000000	N/A	N/A	N/A	N/A	N/A	N/A
scaffold168461_cov203	142	144	N/A	N/A	0.000000	N/A	N/A	N/A	N/A	N/A	N/A
scaffold168461_cov203	148	150	N/A	N/A	12.500000	N/A	N/A	N/A	N/A	N/A	N/A
scaffold168463_cov229	14	16	N/A	10.000000	N/A	N/A	N/A	N/A	N/A	N/A	N/A
scaffold168463_cov229	95	97	N/A	5.555556	0.000000	N/A	N/A	N/A	N/A	N/A	N/A
scaffold168463_cov229	106	108	N/A	0.000000	0.000000	N/A	N/A	N/A	N/A	N/A	N/A
scaffold168465_cov225	45	47	N/A	0.000000	N/A	N/A	N/A	N/A	N/A	N/A	N/A
scaffold168465_cov225	48	50	N/A	0.000000	N/A	N/A	N/A	N/A	N/A	N/A	N/A
scaffold168465_cov225	59	61	N/A	0.000000	0.000000	N/A	N/A	N/A	N/A	N/A	N/A


#### 1b. Manipulate with `pandas`

In [211]:
#Import union data into pandas
#Check head
df = pd.read_table("Pact_union_5x.bedgraph")
df.head(5)

Unnamed: 0,chrom,start,end,1,2,3,4,5,6,7,8,9
0,scaffold1_cov55,49,51,,0.0,,,,,,,
1,scaffold1_cov55,84,86,,0.0,,,,,,,
2,scaffold1_cov55,92,94,,0.0,,,,,,,
3,scaffold1_cov55,102,104,16.666667,0.0,,,,,,,
4,scaffold1_cov55,105,107,0.0,12.5,,,,,,,


In [212]:
#Average the first three columns for WGBS information and save as a new column
#Average the middle three columns for RRBS information and save as a new column
#Average the last three columns for MBD-BS information and save as a new column
#Check output
df['WGBS'] = df[['1', '2', "3"]].mean(axis=1)
df['RRBS'] = df[['4', '5', "6"]].mean(axis=1)
df['MBD-BS'] = df[['7', '8', "9"]].mean(axis=1)
df.tail(10)

Unnamed: 0,chrom,start,end,1,2,3,4,5,6,7,8,9,WGBS,RRBS,MBD-BS
15758665,scaffold168460_cov188,280,282,,0.0,0.0,,,,,,,0.0,,
15758666,scaffold168461_cov203,69,71,,,0.0,,,,,,,0.0,,
15758667,scaffold168461_cov203,142,144,,,0.0,,,,,,,0.0,,
15758668,scaffold168461_cov203,148,150,,,12.5,,,,,,,12.5,,
15758669,scaffold168463_cov229,14,16,,10.0,,,,,,,,10.0,,
15758670,scaffold168463_cov229,95,97,,5.555556,0.0,,,,,,,2.777778,,
15758671,scaffold168463_cov229,106,108,,0.0,0.0,,,,,,,0.0,,
15758672,scaffold168465_cov225,45,47,,0.0,,,,,,,,0.0,,
15758673,scaffold168465_cov225,48,50,,0.0,,,,,,,,0.0,,
15758674,scaffold168465_cov225,59,61,,0.0,0.0,,,,,,,0.0,,


In [213]:
#Save dataframe in a tabular format and include N/As. Do not include quotes.
df.to_csv("Pact_union_5x-averages.bedgraph", sep = "\t", na_rep = "N/A", quoting = 3)

#### 1c. Separate methods into new bedgraphs

In [214]:
#Check pandas manipulations
!head Pact_union_5x-averages.bedgraph

	chrom	start	end	1	2	3	4	5	6	7	8	9	WGBS	RRBS	MBD-BS
0	scaffold1_cov55	49	51	N/A	0.0	N/A	N/A	N/A	N/A	N/A	N/A	N/A	0.0	N/A	N/A
1	scaffold1_cov55	84	86	N/A	0.0	N/A	N/A	N/A	N/A	N/A	N/A	N/A	0.0	N/A	N/A
2	scaffold1_cov55	92	94	N/A	0.0	N/A	N/A	N/A	N/A	N/A	N/A	N/A	0.0	N/A	N/A
3	scaffold1_cov55	102	104	16.666667	0.0	N/A	N/A	N/A	N/A	N/A	N/A	N/A	8.3333335	N/A	N/A
4	scaffold1_cov55	105	107	0.0	12.5	N/A	N/A	N/A	N/A	N/A	N/A	N/A	6.25	N/A	N/A
5	scaffold1_cov55	116	118	0.0	0.0	N/A	N/A	N/A	N/A	N/A	N/A	N/A	0.0	N/A	N/A
6	scaffold1_cov55	119	121	0.0	0.0	20.0	N/A	N/A	N/A	N/A	N/A	N/A	6.666666666666667	N/A	N/A
7	scaffold1_cov55	146	148	0.0	0.0	N/A	N/A	N/A	N/A	N/A	N/A	N/A	0.0	N/A	N/A
8	scaffold1_cov55	169	171	N/A	0.0	N/A	N/A	N/A	N/A	N/A	N/A	N/A	0.0	N/A	N/A


In [215]:
#Remove header
#Keep chr, start, end, and WGBS average (col 2-4, 13)
#Remove rows where the 4th column (average %meth) is N/A
#Save file
!tail -n +2 Pact_union_5x-averages.bedgraph \
| awk -F'\t' -v OFS='\t' '{print $2, $3, $4, $13}' \
| awk -F'\t' -v OFS='\t' '$4 != "N/A"' \
> Pact_union_5x-averages-WGBS.bedgraph

In [216]:
#Check output: chr, start, end, % meth
!head Pact_union_5x-averages-WGBS.bedgraph
!wc -l Pact_union_5x-averages-WGBS.bedgraph

scaffold3_cov83	118	120	14.285714000000002
scaffold3_cov83	130	132	12.5
scaffold3_cov83	137	139	25.0
scaffold3_cov83	189	191	38.461538
scaffold3_cov83	208	210	23.529412
scaffold3_cov83	243	245	0.0
scaffold3_cov83	261	263	23.809524
scaffold3_cov83	475	477	48.0
scaffold3_cov83	484	486	32.0
scaffold6_cov64	290	292	0.0
 2732607 Pact_union_5x-averages-WGBS.bedgraph


In [217]:
#Remove header
#Keep chr, start, end, and RRBS average
#Remove rows where the 4th column (average %meth) is N/A
#Save file
!tail -n +2 Pact_union_5x-averages.bedgraph \
| awk -F'\t' -v OFS='\t' '{print $2, $3, $4, $14}' \
| awk -F'\t' -v OFS='\t' '$4 != "N/A"' \
> Pact_union_5x-averages-RRBS.bedgraph

In [218]:
#Check output: chr, start, end, % meth
!head Pact_union_5x-averages-RRBS.bedgraph
!wc -l Pact_union_5x-averages-RRBS.bedgraph

scaffold1_cov55	49	51	0.0
scaffold1_cov55	84	86	0.0
scaffold1_cov55	92	94	0.0
scaffold1_cov55	102	104	8.3333335
scaffold1_cov55	105	107	6.25
scaffold1_cov55	116	118	0.0
scaffold1_cov55	119	121	6.666666666666667
scaffold1_cov55	146	148	0.0
scaffold1_cov55	169	171	0.0
scaffold1_cov55	186	188	10.0
 7665143 Pact_union_5x-averages-RRBS.bedgraph


In [219]:
#Remove header
#Keep chr, start, end, and MBD-BS average
#Remove rows where the 4th column (average %meth) is N/A
#Save file
!tail -n +2 Pact_union_5x-averages.bedgraph \
| awk -F'\t' -v OFS='\t' '{print $2, $3, $4, $15}' \
| awk -F'\t' -v OFS='\t' '$4 != "N/A"' \
> Pact_union_5x-averages-MBDBS.bedgraph

In [220]:
#Check output: chr, start, end, % meth
!head Pact_union_5x-averages-MBDBS.bedgraph
!wc -l Pact_union_5x-averages-MBDBS.bedgraph

scaffold6_cov64	2536	2538	0.0
scaffold6_cov64	2584	2586	0.0
scaffold6_cov64	2676	2678	48.888889
scaffold6_cov64	2682	2684	0.0
scaffold6_cov64	4553	4555	17.142857
scaffold6_cov64	4588	4590	7.407407333333333
scaffold6_cov64	5101	5103	0.0
scaffold6_cov64	5309	5311	0.0
scaffold6_cov64	5456	5458	0.0
scaffold6_cov64	5486	5488	0.0
 3508578 Pact_union_5x-averages-MBDBS.bedgraph


In [151]:
!find *averages-*bedgraph

Mcap_union_5x-averages-MBDBS.bedgraph
Mcap_union_5x-averages-RRBS.bedgraph
Mcap_union_5x-averages-WGBS.bedgraph


In [221]:
!wc -l *averages-*bedgraph > Pact_union_5x-averages-counts.txt

### 2. Characterize methylation for each CpG dinucleotide

- Methylated: > 50% methylation
- Sparsely methylated: 10-50% methylation
- Unmethylated: < 10% methylation

##### Methylated loci

In [222]:
%%bash
for f in *averages-*bedgraph
do
    awk '{if ($4 >= 50) { print $1, $2, $3, $4 }}' ${f} \
    > ${f}-Meth
done

In [223]:
!head *Meth

==> Pact_union_5x-averages-MBDBS.bedgraph-Meth <==
scaffold7_cov100 1535 1537 60.0
scaffold7_cov100 17000 17002 100.0
scaffold7_cov100 17090 17092 100.0
scaffold7_cov100 24454 24456 100.0
scaffold7_cov100 24494 24496 87.5
scaffold7_cov100 24509 24511 100.0
scaffold7_cov100 24557 24559 100.0
scaffold7_cov100 33140 33142 80.0
scaffold7_cov100 33157 33159 80.0
scaffold7_cov100 40896 40898 50.0

==> Pact_union_5x-averages-RRBS.bedgraph-Meth <==
scaffold7_cov100 5500 5502 77.77777766666667
scaffold7_cov100 5986 5988 75.55555566666668
scaffold7_cov100 6144 6146 100.0
scaffold7_cov100 6188 6190 98.03921566666666
scaffold7_cov100 6198 6200 96.29629633333333
scaffold7_cov100 6231 6233 90.47619033333332
scaffold7_cov100 6233 6235 100.0
scaffold7_cov100 7438 7440 96.07843133333334
scaffold7_cov100 7696 7698 98.611111
scaffold7_cov100 7796 7798 80.60606066666666

==> Pact_union_5x-averages-WGBS.bedgraph-Meth <==
scaffold7_cov100 5578 5580 66.666667
scaffold7_cov100 5986 5

In [224]:
!wc -l *Meth

   66798 Pact_union_5x-averages-MBDBS.bedgraph-Meth
  154042 Pact_union_5x-averages-RRBS.bedgraph-Meth
  255370 Pact_union_5x-averages-WGBS.bedgraph-Meth
  476210 total


In [225]:
!wc -l *-Meth > Pact_union_5x-Meth-counts.txt

##### Sparsely methylated loci

In [226]:
%%bash
for f in *averages-*bedgraph
do
    awk '{if ($4 < 50) { print $1, $2, $3, $4}}' ${f} \
    | awk '{if ($4 > 10) { print $1, $2, $3, $4 }}' \
    > ${f}-sparseMeth
done

In [227]:
!head *sparseMeth

==> Pact_union_5x-averages-MBDBS.bedgraph-sparseMeth <==
scaffold6_cov64 2676 2678 48.888889
scaffold6_cov64 4553 4555 17.142857
scaffold6_cov64 5904 5906 10.800086
scaffold6_cov64 6374 6376 24.382284333333335
scaffold6_cov64 7373 7375 16.666667
scaffold7_cov100 2301 2303 25.0
scaffold7_cov100 4351 4353 20.935810000000004
scaffold7_cov100 15408 15410 20.0
scaffold7_cov100 17074 17076 33.333333
scaffold7_cov100 17098 17100 20.0

==> Pact_union_5x-averages-RRBS.bedgraph-sparseMeth <==
scaffold1_cov55 252 254 20.0
scaffold2_cov51 686 688 11.609686333333334
scaffold3_cov83 475 477 12.281745999999998
scaffold3_cov83 504 506 10.047846999999999
scaffold7_cov100 4305 4307 16.6666665
scaffold7_cov100 4351 4353 24.761904666666666
scaffold7_cov100 4630 4632 11.416666666666666
scaffold7_cov100 5578 5580 37.56613766666666
scaffold7_cov100 7121 7123 18.75
scaffold7_cov100 7201 7203 29.761905

==> Pact_union_5x-averages-WGBS.bedgraph-sparseMeth <==
scaffold3_cov83 118 120 14.

In [228]:
!wc -l *sparseMeth

  210985 Pact_union_5x-averages-MBDBS.bedgraph-sparseMeth
  311665 Pact_union_5x-averages-RRBS.bedgraph-sparseMeth
  337855 Pact_union_5x-averages-WGBS.bedgraph-sparseMeth
  860505 total


In [229]:
!wc -l *-sparseMeth > Pact_union_5x-sparseMeth-counts.txt

##### Unmethylated loci

In [230]:
%%bash
for f in *averages-*bedgraph
do
    awk '{if ($4 <= 10) { print $1, $2, $3, $4 }}' ${f} \
    > ${f}-unMeth
done

In [231]:
!head *unMeth

==> Pact_union_5x-averages-MBDBS.bedgraph-unMeth <==
scaffold6_cov64 2536 2538 0.0
scaffold6_cov64 2584 2586 0.0
scaffold6_cov64 2682 2684 0.0
scaffold6_cov64 4588 4590 7.407407333333333
scaffold6_cov64 5101 5103 0.0
scaffold6_cov64 5309 5311 0.0
scaffold6_cov64 5456 5458 0.0
scaffold6_cov64 5486 5488 0.0
scaffold6_cov64 5545 5547 4.761904666666667
scaffold6_cov64 5559 5561 3.952991666666667

==> Pact_union_5x-averages-RRBS.bedgraph-unMeth <==
scaffold1_cov55 49 51 0.0
scaffold1_cov55 84 86 0.0
scaffold1_cov55 92 94 0.0
scaffold1_cov55 102 104 8.3333335
scaffold1_cov55 105 107 6.25
scaffold1_cov55 116 118 0.0
scaffold1_cov55 119 121 6.666666666666667
scaffold1_cov55 146 148 0.0
scaffold1_cov55 169 171 0.0
scaffold1_cov55 186 188 10.0

==> Pact_union_5x-averages-WGBS.bedgraph-unMeth <==
scaffold3_cov83 243 245 0.0
scaffold6_cov64 290 292 0.0
scaffold6_cov64 298 300 0.0
scaffold6_cov64 489 491 0.0
scaffold6_cov64 826 828 0.0
scaffold6_cov64 2097 2099 0.0
sc

In [232]:
!wc -l *unMeth

 3230795 Pact_union_5x-averages-MBDBS.bedgraph-unMeth
 7199436 Pact_union_5x-averages-RRBS.bedgraph-unMeth
 2139382 Pact_union_5x-averages-WGBS.bedgraph-unMeth
 12569613 total


In [233]:
!wc -l *-unMeth > Pact_union_5x-unMeth-counts.txt

### 4. Characterize genomic locations of CpGs

#### 4a. Create BEDfiles

In [54]:
%%bash

for f in *averages-*bedgraph*
do
    awk '{print $1"\t"$2"\t"$3}' ${f} > ${f}.bed
    wc -l ${f}.bed
done

 5546051 Meth1_R1_001_val_1_bismark_bt2_pe._5x.bedgraph.bed
 6358722 Meth2_R1_001_val_1_bismark_bt2_pe._5x.bedgraph.bed
 5866786 Meth3_R1_001_val_1_bismark_bt2_pe._5x.bedgraph.bed
 1835561 Meth4_R1_001_val_1_bismark_bt2_pe._5x.bedgraph.bed
 1451229 Meth5_R1_001_val_1_bismark_bt2_pe._5x.bedgraph.bed
 1517358 Meth6_R1_001_val_1_bismark_bt2_pe._5x.bedgraph.bed
 2640625 Meth7_R1_001_val_1_bismark_bt2_pe._5x.bedgraph.bed
  539008 Meth8_R1_001_val_1_bismark_bt2_pe._5x.bedgraph.bed
 2732607 Meth9_R1_001_val_1_bismark_bt2_pe._5x.bedgraph.bed


In [55]:
%%bash

for f in *bedgraph-Meth
do
    awk '{print $1"\t"$2"\t"$3}' ${f} > ${f}.bed
    wc -l ${f}.bed
done

  110364 Meth1_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed
  126440 Meth2_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed
  124819 Meth3_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed
   31047 Meth4_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed
   30345 Meth5_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed
   26617 Meth6_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed
  258222 Meth7_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed
  213342 Meth8_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed
  255370 Meth9_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed


In [56]:
%%bash

for f in *bedgraph-sparseMeth
do
    awk '{print $1"\t"$2"\t"$3}' ${f} > ${f}.bed
    wc -l ${f}.bed
done

  367019 Meth1_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth.bed
  345887 Meth2_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth.bed
  385346 Meth3_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth.bed
  137700 Meth4_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth.bed
   64837 Meth5_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth.bed
   89246 Meth6_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth.bed
  296059 Meth7_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth.bed
   80086 Meth8_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth.bed
  337855 Meth9_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth.bed


In [57]:
%%bash

for f in *bedgraph-unMeth
do
    awk '{print $1"\t"$2"\t"$3}' ${f} > ${f}.bed
    wc -l ${f}.bed
done

 5068668 Meth1_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth.bed
 5886395 Meth2_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth.bed
 5356621 Meth3_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth.bed
 1666814 Meth4_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth.bed
 1356047 Meth5_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth.bed
 1401495 Meth6_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth.bed
 2086344 Meth7_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth.bed
  245580 Meth8_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth.bed
 2139382 Meth9_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth.bed


In [58]:
#Confirm BEDfile creation
!find *.bed

Meth1_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed
Meth1_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth.bed
Meth1_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth.bed
Meth1_R1_001_val_1_bismark_bt2_pe._5x.bedgraph.bed
Meth2_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed
Meth2_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth.bed
Meth2_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth.bed
Meth2_R1_001_val_1_bismark_bt2_pe._5x.bedgraph.bed
Meth3_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed
Meth3_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth.bed
Meth3_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth.bed
Meth3_R1_001_val_1_bismark_bt2_pe._5x.bedgraph.bed
Meth4_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed
Meth4_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth.bed
Meth4_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth.bed
Meth4_R1_001_val_1_bismark_bt2_pe._5x.bedgraph.bed
Meth5_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed
Meth5_R1_001_val_1_

In [60]:
#Confirm file creation
!head Meth1_R1_001_val_1_bismark_bt2_pe._5x.bedgraph.bed

scaffold1_cov55	102	104
scaffold1_cov55	105	107
scaffold1_cov55	116	118
scaffold1_cov55	119	121
scaffold1_cov55	146	148
scaffold1_cov55	186	188
scaffold1_cov55	194	196
scaffold2_cov51	649	651
scaffold2_cov51	686	688
scaffold2_cov51	778	780


#### 4b. Genes

In [57]:
%%bash

for f in *bed
do
  /usr/local/bin/intersectBed \
  -wb \
  -a ${f} \
  -b ../../../genome-feature-files/Pact.GFFannotation.Genes.gff \
  > ${f}-paGenes
done

In [58]:
#Check output
!head *paGenes

==> Meth1_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed-paGenes <==
scaffold7_cov100	4351	4353	scaffold7_cov100	AUGUSTUS	gene	3467	6217	0.78	-	.	g4
scaffold7_cov100	5500	5502	scaffold7_cov100	AUGUSTUS	gene	3467	6217	0.78	-	.	g4
scaffold7_cov100	5578	5580	scaffold7_cov100	AUGUSTUS	gene	3467	6217	0.78	-	.	g4
scaffold7_cov100	5986	5988	scaffold7_cov100	AUGUSTUS	gene	3467	6217	0.78	-	.	g4
scaffold7_cov100	6144	6146	scaffold7_cov100	AUGUSTUS	gene	3467	6217	0.78	-	.	g4
scaffold7_cov100	6188	6190	scaffold7_cov100	AUGUSTUS	gene	3467	6217	0.78	-	.	g4
scaffold7_cov100	6198	6200	scaffold7_cov100	AUGUSTUS	gene	3467	6217	0.78	-	.	g4
scaffold7_cov100	7438	7440	scaffold7_cov100	AUGUSTUS	gene	7069	9073	1	-	.	g5
scaffold7_cov100	7696	7698	scaffold7_cov100	AUGUSTUS	gene	7069	9073	1	-	.	g5
scaffold7_cov100	7796	7798	scaffold7_cov100	AUGUSTUS	gene	7069	9073	1	-	.	g5

==> Meth1_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth.bed-paGenes <==
scaffold7_cov100	1293	1295	scaffold7_cov100	AUGUSTUS	gene	

In [59]:
#Count number of overlaps
!wc -l *paGenes

   73959 Meth1_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed-paGenes
  157337 Meth1_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth.bed-paGenes
 2235696 Meth1_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth.bed-paGenes
 2466992 Meth1_R1_001_val_1_bismark_bt2_pe._5x.bedgraph.bed-paGenes
   85861 Meth2_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed-paGenes
  144292 Meth2_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth.bed-paGenes
 2531803 Meth2_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth.bed-paGenes
 2761956 Meth2_R1_001_val_1_bismark_bt2_pe._5x.bedgraph.bed-paGenes
   82377 Meth3_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed-paGenes
  161791 Meth3_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth.bed-paGenes
 2344110 Meth3_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth.bed-paGenes
 2588278 Meth3_R1_001_val_1_bismark_bt2_pe._5x.bedgraph.bed-paGenes
   13588 Meth4_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed-paGenes
   56290 Meth4_R1_001_val_1_bismark_bt2_pe

In [22]:
!wc -l *paGenes > Pact-5x-paGenes-counts.txt

#### 4c. Coding Sequences (CDS)

In [60]:
%%bash

for f in *bed
do
  /usr/local/bin/intersectBed \
  -wb \
  -a ${f} \
  -b ../../../genome-feature-files/Pact.GFFannotation.CDS.gff \
  > ${f}-paCDS
done

In [61]:
#Check output
!head *paCDS

==> Meth1_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed-paCDS <==
scaffold7_cov100	5500	5502	scaffold7_cov100	AUGUSTUS	CDS	5466	5540	1	-	2	transcript_id "g4.t1"; gene_id "g4";
scaffold7_cov100	5500	5502	scaffold7_cov100	AUGUSTUS	CDS	5466	5540	1	-	2	transcript_id "g4.t2"; gene_id "g4";
scaffold7_cov100	6144	6146	scaffold7_cov100	AUGUSTUS	CDS	6091	6217	0.48	-	0	transcript_id "g4.t1"; gene_id "g4";
scaffold7_cov100	6144	6146	scaffold7_cov100	AUGUSTUS	CDS	6091	6211	0.52	-	0	transcript_id "g4.t2"; gene_id "g4";
scaffold7_cov100	6188	6190	scaffold7_cov100	AUGUSTUS	CDS	6091	6217	0.48	-	0	transcript_id "g4.t1"; gene_id "g4";
scaffold7_cov100	6188	6190	scaffold7_cov100	AUGUSTUS	CDS	6091	6211	0.52	-	0	transcript_id "g4.t2"; gene_id "g4";
scaffold7_cov100	6198	6200	scaffold7_cov100	AUGUSTUS	CDS	6091	6217	0.48	-	0	transcript_id "g4.t1"; gene_id "g4";
scaffold7_cov100	6198	6200	scaffold7_cov100	AUGUSTUS	CDS	6091	6211	0.52	-	0	transcript_id "g4.t2"; gene_id "g4";
scaffold7_cov100	7696	7698	scaff

In [62]:
#Count number of overlaps
!wc -l *paCDS

   59188 Meth1_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed-paCDS
   89863 Meth1_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth.bed-paCDS
 1345289 Meth1_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth.bed-paCDS
 1494340 Meth1_R1_001_val_1_bismark_bt2_pe._5x.bedgraph.bed-paCDS
   66365 Meth2_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed-paCDS
   76868 Meth2_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth.bed-paCDS
 1477399 Meth2_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth.bed-paCDS
 1620632 Meth2_R1_001_val_1_bismark_bt2_pe._5x.bedgraph.bed-paCDS
   65245 Meth3_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed-paCDS
   89654 Meth3_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth.bed-paCDS
 1397816 Meth3_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth.bed-paCDS
 1552715 Meth3_R1_001_val_1_bismark_bt2_pe._5x.bedgraph.bed-paCDS
    9644 Meth4_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed-paCDS
   36616 Meth4_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth.b

In [23]:
!wc -l *paCDS > Pact-5x-paCDS-counts.txt

#### 4d. Introns

In [63]:
%%bash

for f in *bed
do
  /usr/local/bin/intersectBed \
  -wb \
  -a ${f} \
  -b ../../../genome-feature-files/Pact.GFFannotation.Intron.gff \
  > ${f}-paIntron
done

In [64]:
#Check output
!head *paIntron

==> Meth1_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed-paIntron <==
scaffold7_cov100	4351	4353	scaffold7_cov100	AUGUSTUS	intron	4181	4607	1	-	.	transcript_id "g4.t1"; gene_id "g4";
scaffold7_cov100	4351	4353	scaffold7_cov100	AUGUSTUS	intron	4181	4607	1	-	.	transcript_id "g4.t2"; gene_id "g4";
scaffold7_cov100	5578	5580	scaffold7_cov100	AUGUSTUS	intron	5541	6090	1	-	.	transcript_id "g4.t1"; gene_id "g4";
scaffold7_cov100	5578	5580	scaffold7_cov100	AUGUSTUS	intron	5541	6090	1	-	.	transcript_id "g4.t2"; gene_id "g4";
scaffold7_cov100	5986	5988	scaffold7_cov100	AUGUSTUS	intron	5541	6090	1	-	.	transcript_id "g4.t1"; gene_id "g4";
scaffold7_cov100	5986	5988	scaffold7_cov100	AUGUSTUS	intron	5541	6090	1	-	.	transcript_id "g4.t2"; gene_id "g4";
scaffold7_cov100	7438	7440	scaffold7_cov100	AUGUSTUS	intron	7104	7649	1	-	.	transcript_id "g5.t1"; gene_id "g5";
scaffold7_cov100	7438	7440	scaffold7_cov100	AUGUSTUS	intron	7104	7649	1	-	.	transcript_id "g5.t2"; gene_id "g5";
scaffold7_cov100	7796	7

In [65]:
#Count number of overlaps
!wc -l *paIntron

   41787 Meth1_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed-paIntron
  122271 Meth1_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth.bed-paIntron
 1676080 Meth1_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth.bed-paIntron
 1840138 Meth1_R1_001_val_1_bismark_bt2_pe._5x.bedgraph.bed-paIntron
   51567 Meth2_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed-paIntron
  117738 Meth2_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth.bed-paIntron
 1943990 Meth2_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth.bed-paIntron
 2113295 Meth2_R1_001_val_1_bismark_bt2_pe._5x.bedgraph.bed-paIntron
   47352 Meth3_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed-paIntron
  128122 Meth3_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth.bed-paIntron
 1770676 Meth3_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth.bed-paIntron
 1946150 Meth3_R1_001_val_1_bismark_bt2_pe._5x.bedgraph.bed-paIntron
    8446 Meth4_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed-paIntron
   38589 Meth4_R1_001_val_1_b

In [24]:
!wc -l *paIntron > Pact-5x-paIntron-counts.txt

#### 4e. Flanking regions

In [61]:
%%bash

for f in *bed
do
  /usr/local/bin/intersectBed \
  -wb \
  -a ${f} \
  -b ../../../genome-feature-files/Pact.GFFannotation.flanks.gff \
  > ${f}-paFlanks
done

In [62]:
#Check output
!head *paFlanks

==> Meth1_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed-paFlanks <==
scaffold7_cov100	6231	6233	scaffold7_cov100	AUGUSTUS	gene	6218	7068	0.78	-	.	g4
scaffold7_cov100	6231	6233	scaffold7_cov100	AUGUSTUS	gene	6218	7068	1	-	.	g5
scaffold7_cov100	6233	6235	scaffold7_cov100	AUGUSTUS	gene	6218	7068	0.78	-	.	g4
scaffold7_cov100	6233	6235	scaffold7_cov100	AUGUSTUS	gene	6218	7068	1	-	.	g5
scaffold7_cov100	19284	19286	scaffold7_cov100	AUGUSTUS	gene	19271	19311	0.96	+	.	g8
scaffold7_cov100	19284	19286	scaffold7_cov100	AUGUSTUS	gene	19271	19311	0.99	-	.	g9
scaffold7_cov100	19284	19286	scaffold7_cov100	AUGUSTUS	gene	19271	19311	0.74	+	.	g10
scaffold7_cov100	19284	19286	scaffold7_cov100	AUGUSTUS	gene	19271	19311	0.39	+	.	g11
scaffold7_cov100	19296	19298	scaffold7_cov100	AUGUSTUS	gene	19271	19311	0.96	+	.	g8
scaffold7_cov100	19296	19298	scaffold7_cov100	AUGUSTUS	gene	19271	19311	0.99	-	.	g9

==> Meth1_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth.bed-paFlanks <==
scaffold6_cov64	7373	7375	s

In [63]:
#Count number of overlaps
!wc -l *paFlanks

   28031 Meth1_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed-paFlanks
   97808 Meth1_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth.bed-paFlanks
 1317046 Meth1_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth.bed-paFlanks
 1442885 Meth1_R1_001_val_1_bismark_bt2_pe._5x.bedgraph.bed-paFlanks
   32259 Meth2_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed-paFlanks
   93054 Meth2_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth.bed-paFlanks
 1536795 Meth2_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth.bed-paFlanks
 1662108 Meth2_R1_001_val_1_bismark_bt2_pe._5x.bedgraph.bed-paFlanks
   31840 Meth3_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed-paFlanks
  102280 Meth3_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth.bed-paFlanks
 1392903 Meth3_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth.bed-paFlanks
 1527023 Meth3_R1_001_val_1_bismark_bt2_pe._5x.bedgraph.bed-paFlanks
    7491 Meth4_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed-paFlanks
   34079 Meth4_R1_001_val_1_b

In [64]:
!wc -l *paFlanks > Pact-5x-paFlanks-counts.txt

#### 4e. Intergenic

In [66]:
%%bash 

for f in *bed
do
  /usr/local/bin/intersectBed \
  -v \
  -a ${f} \
  -b ../../../genome-feature-files/Pact.GFFannotation.Genes.gff \
  > ${f}-paIntergenic
done

In [67]:
#Check output
!head *paIntergenic

==> Meth1_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed-paIntergenic <==
scaffold7_cov100	6231	6233
scaffold7_cov100	6233	6235
scaffold7_cov100	19284	19286
scaffold7_cov100	19296	19298
scaffold7_cov100	24494	24496
scaffold7_cov100	24509	24511
scaffold7_cov100	24557	24559
scaffold7_cov100	24617	24619
scaffold7_cov100	24895	24897
scaffold7_cov100	24941	24943

==> Meth1_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth.bed-paIntergenic <==
scaffold1_cov55	102	104
scaffold1_cov55	186	188
scaffold3_cov83	118	120
scaffold3_cov83	137	139
scaffold3_cov83	475	477
scaffold3_cov83	484	486
scaffold3_cov83	504	506
scaffold6_cov64	7373	7375
scaffold6_cov64	7983	7985
scaffold7_cov100	13275	13277

==> Meth1_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth.bed-paIntergenic <==
scaffold1_cov55	105	107
scaffold1_cov55	116	118
scaffold1_cov55	119	121
scaffold1_cov55	146	148
scaffold1_cov55	194	196
scaffold2_cov51	649	651
scaffold2_cov51	686	688
scaffold2_cov51	778	780

In [68]:
#Count number of overlaps
!wc -l *paIntergenic

   36461 Meth1_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed-paIntergenic
  209781 Meth1_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth.bed-paIntergenic
 2834593 Meth1_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth.bed-paIntergenic
 3080835 Meth1_R1_001_val_1_bismark_bt2_pe._5x.bedgraph.bed-paIntergenic
   40642 Meth2_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed-paIntergenic
  201712 Meth2_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth.bed-paIntergenic
 3356494 Meth2_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth.bed-paIntergenic
 3598848 Meth2_R1_001_val_1_bismark_bt2_pe._5x.bedgraph.bed-paIntergenic
   42507 Meth3_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed-paIntergenic
  223666 Meth3_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth.bed-paIntergenic
 3014184 Meth3_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth.bed-paIntergenic
 3280357 Meth3_R1_001_val_1_bismark_bt2_pe._5x.bedgraph.bed-paIntergenic
   17473 Meth4_R1_001_val_1_bismark_bt2_pe._5x.bedgraph

In [25]:
!wc -l *paIntergenic > Pact-5x-paIntergenic-counts.txt