# Characterizing CpG Methylation (union bedgraphs with 5x data)

This notebook characterizes general methylation landscapes in *Montipora capitata* and *Pocillopora acuta* based on WGBS, RRBS, and MBD-BS data. I will also assess CG motif overlaps with various genome feature tracks to understand where methylation occurs across the genome. I will use [union bedgraphs](https://gannet.fish.washington.edu/seashell/bu-github/Meth_Compare/analyses/10-unionbedg/) with 5x data.

1. Download union bedgraphs and format for downstream analyses
2. Characterize methylation for each CpG dinucleotide
3. Characterize genomic locations of methylated CpGs, sparsely methylated CpGs, and unmethylated CpGs for each sequencing type

## 0. Set working directory and install programs

In [1]:
!pwd

/Users/yaamini/Documents/Meth_Compare/scripts


In [2]:
cd ../analyses/

/Users/yaamini/Documents/Meth_Compare/analyses


In [3]:
#!mkdir Characterizing-CpG-Methylation-5x-Union

In [4]:
cd Characterizing-CpG-Methylation-5x-Union/

/Users/yaamini/Documents/Meth_Compare/analyses/Characterizing-CpG-Methylation-5x-Union


In [5]:
#Import pandas for this notebook and check the version
import pandas as pd
print(pd.__version__)

0.18.1


## *M. capitata*

In [5]:
#Make a directory for Mcap output
!mkdir Mcap

mkdir: Mcap: File exists


In [6]:
cd Mcap/

/Users/yaamini/Documents/Meth_Compare/analyses/Characterizing-CpG-Methylation-5x-Union/Mcap


### 1. Format data

#### 1a. Download bedgraph

In [8]:
#Download Mcap 5x union bedgraph
!wget https://gannet.fish.washington.edu/seashell/bu-github/Meth_Compare/analyses/10-unionbedg/Mcap_union_5x.bedgraph

--2020-05-19 21:52:34--  https://gannet.fish.washington.edu/seashell/bu-github/Meth_Compare/analyses/10-unionbedg/Mcap_union_5x.bedgraph
Resolving gannet.fish.washington.edu (gannet.fish.washington.edu)... 128.95.149.52
Connecting to gannet.fish.washington.edu (gannet.fish.washington.edu)|128.95.149.52|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 862402537 (822M)
Saving to: ‘Mcap_union_5x.bedgraph’


2020-05-19 21:52:45 (81.3 MB/s) - ‘Mcap_union_5x.bedgraph’ saved [862402537/862402537]



In [9]:
#Check downloaded file
#WGBS: 10-12
#RRBS: 13-15
#MBD-BS: 16-18
!tail Mcap_union_5x.bedgraph

3043	18674	18676	12.500000	N/A	0.000000	N/A	N/A	N/A	N/A	N/A	N/A
3043	18688	18690	7.692308	0.000000	0.000000	N/A	N/A	N/A	N/A	0.000000	N/A
3043	18720	18722	8.333333	0.000000	0.000000	N/A	N/A	N/A	N/A	0.000000	0.000000
3043	18734	18736	8.333333	0.000000	0.000000	N/A	N/A	N/A	N/A	N/A	0.000000
3043	18740	18742	0.000000	0.000000	0.000000	N/A	N/A	N/A	N/A	N/A	0.000000
3043	18755	18757	11.111111	0.000000	0.000000	N/A	N/A	N/A	N/A	N/A	0.000000
3043	18784	18786	20.000000	0.000000	0.000000	N/A	N/A	N/A	N/A	0.000000	0.000000
3043	18801	18803	0.000000	0.000000	0.000000	N/A	N/A	N/A	N/A	0.000000	0.000000
3043	18822	18824	N/A	0.000000	0.000000	N/A	N/A	N/A	N/A	0.000000	N/A
3043	18827	18829	N/A	0.000000	0.000000	N/A	N/A	N/A	N/A	0.000000	N/A


#### 1b. Manipulate with `pandas`

In [10]:
#Import union data into pandas
#Check head
df = pd.read_table("Mcap_union_5x.bedgraph")
df.head(5)

Unnamed: 0,chrom,start,end,10,11,12,13,14,15,16,17,18
0,1,3493,3495,,,,0.0,,0.0,,,
1,1,3518,3520,,,,0.0,,0.0,,,
2,1,3727,3729,,,,0.0,0.0,8.695652,,,
3,1,3752,3754,,,,0.0,0.0,0.0,,,
4,1,3757,3759,,,,0.0,0.0,0.0,,,
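
The `N/A` strings in the union bedgraph appear as empty (missing) cells after import because `N/A` is one of the strings pandas treats as missing by default. A minimal sketch of that behaviour, assuming a tab-separated file with a header row; the two data rows below are hypothetical and only mimic the bedgraph layout:

```python
import io
import pandas as pd

# Hypothetical header plus two rows in the union bedgraph layout (tab-separated)
text = (
    "chrom\tstart\tend\t10\t11\n"
    "1\t3493\t3495\tN/A\t0.000000\n"
    "1\t3518\t3520\t12.500000\tN/A\n"
)

# "N/A" is parsed as NaN by default, so the sample columns stay numeric
toy = pd.read_csv(io.StringIO(text), sep="\t")
print(toy.dtypes)
print(toy.isna().sum())
```

Because the sample columns are parsed as floats with `NaN` for missing values, they can be averaged directly in the next step.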


In [11]:
#Average the first three columns for WGBS information and save as a new column
#Average the middle three columns for RRBS information and save as a new column
#Average the last three columns for MBD-BS information and save as a new column
#Check output
df['WGBS'] = df[['10', '11', "12"]].mean(axis=1)
df['RRBS'] = df[['13', '14', "15"]].mean(axis=1)
df['MBD-BS'] = df[['16', '17', "18"]].mean(axis=1)
df.tail(10)

Unnamed: 0,chrom,start,end,10,11,12,13,14,15,16,17,18,WGBS,RRBS,MBD-BS
13340258,3043,18674,18676,12.5,,0.0,,,,,,,6.25,,
13340259,3043,18688,18690,7.692308,0.0,0.0,,,,,0.0,,2.564103,,0.0
13340260,3043,18720,18722,8.333333,0.0,0.0,,,,,0.0,0.0,2.777778,,0.0
13340261,3043,18734,18736,8.333333,0.0,0.0,,,,,,0.0,2.777778,,0.0
13340262,3043,18740,18742,0.0,0.0,0.0,,,,,,0.0,0.0,,0.0
13340263,3043,18755,18757,11.111111,0.0,0.0,,,,,,0.0,3.703704,,0.0
13340264,3043,18784,18786,20.0,0.0,0.0,,,,,0.0,0.0,6.666667,,0.0
13340265,3043,18801,18803,0.0,0.0,0.0,,,,,0.0,0.0,0.0,,0.0
13340266,3043,18822,18824,,0.0,0.0,,,,,0.0,,0.0,,0.0
13340267,3043,18827,18829,,0.0,0.0,,,,,0.0,,0.0,,0.0
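
The row-wise `mean(axis=1)` used above skips missing values by default, so a CpG with 5x coverage in only one or two replicates is still averaged over the replicates that have data. The average is missing only when all replicates for a method are missing. A minimal sketch with a toy dataframe, using values copied from the tail above (the dataframe name `toy` is hypothetical):

```python
import numpy as np
import pandas as pd

# Toy rows using values from the tail above; NaN marks a replicate without 5x coverage
toy = pd.DataFrame({
    "10": [12.5, 7.692308, np.nan],
    "11": [np.nan, 0.0, 0.0],
    "12": [0.0, 0.0, 0.0],
    "13": [np.nan, np.nan, np.nan],
})

# mean(axis=1) ignores NaN, so the divisor is the number of covered replicates
toy["WGBS"] = toy[["10", "11", "12"]].mean(axis=1)
# A row with no covered replicates yields NaN
toy["RRBS"] = toy[["13"]].mean(axis=1)
print(toy[["WGBS", "RRBS"]])
```

One consequence is that averages are not all based on the same number of replicates, which is worth keeping in mind when comparing loci.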


In [12]:
#Save dataframe in a tabular format and include N/As. Do not include quotes.
df.to_csv("Mcap_union_5x-averages.bedgraph", sep = "\t", na_rep = "N/A", quoting = 3)

#### 1c. Separate methods into new bedgraphs

In [13]:
#Check pandas manipulations
!head Mcap_union_5x-averages.bedgraph

	chrom	start	end	10	11	12	13	14	15	16	17	18	WGBS	RRBS	MBD-BS
0	1	3493	3495	N/A	N/A	N/A	0.0	N/A	0.0	N/A	N/A	N/A	N/A	0.0	N/A
1	1	3518	3520	N/A	N/A	N/A	0.0	N/A	0.0	N/A	N/A	N/A	N/A	0.0	N/A
2	1	3727	3729	N/A	N/A	N/A	0.0	0.0	8.695652	N/A	N/A	N/A	N/A	2.898550666666667	N/A
3	1	3752	3754	N/A	N/A	N/A	0.0	0.0	0.0	N/A	N/A	N/A	N/A	0.0	N/A
4	1	3757	3759	N/A	N/A	N/A	0.0	0.0	0.0	N/A	N/A	N/A	N/A	0.0	N/A
5	1	3770	3772	N/A	N/A	N/A	0.0	0.0	0.0	N/A	N/A	N/A	N/A	0.0	N/A
6	1	4062	4064	N/A	N/A	0.0	N/A	N/A	N/A	N/A	N/A	N/A	0.0	N/A	N/A
7	1	4069	4071	N/A	N/A	0.0	N/A	N/A	N/A	N/A	N/A	N/A	0.0	N/A	N/A
8	1	4077	4079	N/A	N/A	0.0	N/A	N/A	N/A	N/A	N/A	N/A	0.0	N/A	N/A


In [14]:
#Remove header
#Keep chr, start, end, and WGBS average (col 2-4, 14; col 1 is the pandas index)
#Remove rows where the 4th column (average %meth) is N/A
#Save file
!tail -n +2 Mcap_union_5x-averages.bedgraph \
| awk -F'\t' -v OFS='\t' '{print $2, $3, $4, $14}' \
| awk -F'\t' -v OFS='\t' '$4 != "N/A"' \
> Mcap_union_5x-averages-WGBS.bedgraph
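
The awk field numbers are offset by one because `to_csv` wrote the dataframe index as the first column: fields 2-4 hold the coordinates and fields 14, 15, and 16 hold the WGBS, RRBS, and MBD-BS averages. The same split could also be done in pandas before writing. This is a sketch under assumptions, not the method used in this notebook; the helper name `split_method` and the toy values are hypothetical:

```python
import numpy as np
import pandas as pd

# Toy stand-in for the averaged dataframe (hypothetical values)
avg = pd.DataFrame({
    "chrom": [1, 1, 1],
    "start": [3493, 4062, 5228],
    "end": [3495, 4064, 5230],
    "WGBS": [np.nan, 0.0, 11.111111],
    "RRBS": [0.0, np.nan, np.nan],
    "MBD-BS": [np.nan, np.nan, 0.0],
})

def split_method(df, method):
    """Return chrom, start, end, and the method average for rows with data."""
    cols = ["chrom", "start", "end", method]
    return df.loc[df[method].notna(), cols]

wgbs = split_method(avg, "WGBS")
# Without header or index, the text has the four-column bedgraph layout
wgbs_text = wgbs.to_csv(sep="\t", header=False, index=False)
print(wgbs_text)
```

Dropping the index and header at write time avoids the need for `tail -n +2` and the shifted column numbers.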

In [15]:
#Check output: chr, start, end, % meth
!head Mcap_union_5x-averages-WGBS.bedgraph
!wc -l Mcap_union_5x-averages-WGBS.bedgraph

1	4062	4064	0.0
1	4069	4071	0.0
1	4077	4079	0.0
1	4086	4088	0.0
1	4146	4148	0.0
1	4150	4152	0.0
1	4155	4157	0.0
1	4172	4174	0.0
1	4184	4186	0.0
1	4190	4192	16.666667
 11509837 Mcap_union_5x-averages-WGBS.bedgraph


In [16]:
#Remove header
#Keep chr, start, end, and RRBS average
#Remove rows where the 4th column (average %meth) is N/A
#Save file
!tail -n +2 Mcap_union_5x-averages.bedgraph \
| awk -F'\t' -v OFS='\t' '{print $2, $3, $4, $15}' \
| awk -F'\t' -v OFS='\t' '$4 != "N/A"' \
> Mcap_union_5x-averages-RRBS.bedgraph

In [17]:
#Check output: chr, start, end, % meth
!head Mcap_union_5x-averages-RRBS.bedgraph
!wc -l Mcap_union_5x-averages-RRBS.bedgraph

1	3493	3495	0.0
1	3518	3520	0.0
1	3727	3729	2.898550666666667
1	3752	3754	0.0
1	3757	3759	0.0
1	3770	3772	0.0
1	11876	11878	0.0
1	11887	11889	0.0
1	11894	11896	0.0
1	11941	11943	0.0
 3981450 Mcap_union_5x-averages-RRBS.bedgraph


In [18]:
#Remove header
#Keep chr, start, end, and MBD-BS average
#Remove rows where the 4th column (average %meth) is N/A
#Save file
!tail -n +2 Mcap_union_5x-averages.bedgraph \
| awk -F'\t' -v OFS='\t' '{print $2, $3, $4, $16}' \
| awk -F'\t' -v OFS='\t' '$4 != "N/A"' \
> Mcap_union_5x-averages-MBDBS.bedgraph

In [19]:
#Check output: chr, start, end, % meth
!head Mcap_union_5x-averages-MBDBS.bedgraph
!wc -l Mcap_union_5x-averages-MBDBS.bedgraph

1	5228	5230	0.0
1	5243	5245	0.0
1	5247	5249	0.0
1	5296	5298	0.0
1	8113	8115	20.0
1	59438	59440	100.0
1	77096	77098	0.0
1	77145	77147	0.0
1	77151	77153	0.0
1	77179	77181	0.0
  866555 Mcap_union_5x-averages-MBDBS.bedgraph


In [20]:
!find *averages-*bedgraph

Mcap_union_5x-averages-MBDBS.bedgraph
Mcap_union_5x-averages-RRBS.bedgraph
Mcap_union_5x-averages-WGBS.bedgraph


In [21]:
!wc -l *averages-*bedgraph > Mcap_union_5x-averages-counts.txt

### 2. Characterize methylation for each CpG dinucleotide

- Methylated: ≥ 50% methylation
- Sparsely methylated: > 10% and < 50% methylation
- Unmethylated: ≤ 10% methylation
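
The boundary handling matters for loci at exactly 10% or 50%. The awk filters in the cells that follow use `>= 50` for methylated, `> 10` and `< 50` for sparsely methylated, and `<= 10` for unmethylated. A minimal sketch of those cutoffs as a single function (the name `classify_cpg` is hypothetical):

```python
def classify_cpg(pct_meth):
    """Assign a methylation category using the cutoffs applied by the awk filters."""
    if pct_meth >= 50:
        return "Meth"
    if pct_meth > 10:
        return "sparseMeth"
    return "unMeth"

# Boundary values land in Meth (50) and unMeth (10), so every locus gets one category
for value in (100.0, 50.0, 16.666667, 10.0, 0.0):
    print(value, classify_cpg(value))
```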

##### Methylated loci

In [23]:
%%bash
for f in *averages-*bedgraph
do
    awk '{if ($4 >= 50) { print $1, $2, $3, $4 }}' ${f} \
    > ${f}-Meth
done

In [24]:
!head *-Meth

==> Mcap_union_5x-averages-MBDBS.bedgraph-Meth <==
1 59438 59440 100.0
1 106173 106175 100.0
1 106202 106204 100.0
1 344031 344033 50.0
1 344044 344046 60.0
1 446326 446328 80.0
1 446344 446346 100.0
1 446367 446369 100.0
1 446376 446378 100.0
1 786125 786127 60.0

==> Mcap_union_5x-averages-RRBS.bedgraph-Meth <==
1 32228 32230 50.413223
1 58618 58620 95.65826333333332
1 58745 58747 96.819728
1 58764 58766 99.16666666666667
1 58792 58794 83.42830033333334
1 66041 66043 100.0
1 66050 66052 100.0
1 66339 66341 88.888889
1 66345 66347 77.777778
1 66354 66356 77.777778

==> Mcap_union_5x-averages-WGBS.bedgraph-Meth <==
1 4948 4950 50.0
1 4967 4969 50.0
1 4986 4988 50.0
1 57065 57067 80.0
1 58609 58611 100.0
1 58618 58620 100.0
1 58745 58747 100.0
1 59207 59209 100.0
1 59277 59279 100.0
1 59393 59395 100.0


In [25]:
!wc -l *-Meth

  148321 Mcap_union_5x-averages-MBDBS.bedgraph-Meth
  329361 Mcap_union_5x-averages-RRBS.bedgraph-Meth
 1350936 Mcap_union_5x-averages-WGBS.bedgraph-Meth
 1828618 total


In [26]:
!wc -l *-Meth > Mcap_union_5x-Meth-counts.txt

##### Sparsely methylated loci

In [27]:
%%bash
for f in *averages-*bedgraph
do
    awk '{if ($4 < 50) { print $1, $2, $3, $4}}' ${f} \
    | awk '{if ($4 > 10) { print $1, $2, $3, $4 }}' \
    > ${f}-sparseMeth
done

In [28]:
!head *-sparseMeth

==> Mcap_union_5x-averages-MBDBS.bedgraph-sparseMeth <==
1 8113 8115 20.0
1 211907 211909 40.0
1 217198 217200 14.285714000000002
1 234158 234160 14.285714000000002
1 234196 234198 12.5
1 244563 244565 20.0
1 269174 269176 16.666667
1 269178 269180 16.666667
1 269182 269184 16.666667
1 277994 277996 16.666667

==> Mcap_union_5x-averages-RRBS.bedgraph-sparseMeth <==
1 15092 15094 30.0
1 21739 21741 13.636364000000002
1 34139 34141 11.764706
1 42261 42263 10.539216
1 45163 45165 10.31746
1 48370 48372 14.285714000000002
1 87492 87494 33.333333
1 89011 89013 14.285714000000002
1 101503 101505 17.380952
1 101545 101547 23.3333335

==> Mcap_union_5x-averages-WGBS.bedgraph-sparseMeth <==
1 4190 4192 16.666667
1 4891 4893 33.333333
1 4910 4912 28.571429
1 4929 4931 16.6666665
1 5005 5007 28.571429
1 5024 5026 40.0
1 5151 5153 20.0
1 5160 5162 16.666667
1 5228 5230 11.111111
1 6282 6284 11.111111


In [29]:
!wc -l *-sparseMeth

  103713 Mcap_union_5x-averages-MBDBS.bedgraph-sparseMeth
  220277 Mcap_union_5x-averages-RRBS.bedgraph-sparseMeth
 1155033 Mcap_union_5x-averages-WGBS.bedgraph-sparseMeth
 1479023 total


In [30]:
!wc -l *-sparseMeth > Mcap_union_5x-sparseMeth-counts.txt

##### Unmethylated loci

In [31]:
%%bash
for f in *averages-*bedgraph
do
    awk '{if ($4 <= 10) { print $1, $2, $3, $4 }}' ${f} \
    > ${f}-unMeth
done

In [32]:
!head *-unMeth

==> Mcap_union_5x-averages-MBDBS.bedgraph-unMeth <==
1 5228 5230 0.0
1 5243 5245 0.0
1 5247 5249 0.0
1 5296 5298 0.0
1 77096 77098 0.0
1 77145 77147 0.0
1 77151 77153 0.0
1 77179 77181 0.0
1 81812 81814 0.0
1 81817 81819 0.0

==> Mcap_union_5x-averages-RRBS.bedgraph-unMeth <==
1 3493 3495 0.0
1 3518 3520 0.0
1 3727 3729 2.898550666666667
1 3752 3754 0.0
1 3757 3759 0.0
1 3770 3772 0.0
1 11876 11878 0.0
1 11887 11889 0.0
1 11894 11896 0.0
1 11941 11943 0.0

==> Mcap_union_5x-averages-WGBS.bedgraph-unMeth <==
1 4062 4064 0.0
1 4069 4071 0.0
1 4077 4079 0.0
1 4086 4088 0.0
1 4146 4148 0.0
1 4150 4152 0.0
1 4155 4157 0.0
1 4172 4174 0.0
1 4184 4186 0.0
1 5043 5045 0.0


In [33]:
!wc -l *-unMeth

  614521 Mcap_union_5x-averages-MBDBS.bedgraph-unMeth
 3431812 Mcap_union_5x-averages-RRBS.bedgraph-unMeth
 9003868 Mcap_union_5x-averages-WGBS.bedgraph-unMeth
 13050201 total
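
Since the three categories are meant to partition the CpGs for each method, their line counts should sum to the totals from the method-specific bedgraphs. A quick sanity check using the counts reported by `wc -l` in this notebook (the dictionary names are hypothetical):

```python
# Line counts copied from the wc -l outputs in this notebook
totals = {"WGBS": 11509837, "RRBS": 3981450, "MBDBS": 866555}
meth = {"WGBS": 1350936, "RRBS": 329361, "MBDBS": 148321}
sparse = {"WGBS": 1155033, "RRBS": 220277, "MBDBS": 103713}
unmeth = {"WGBS": 9003868, "RRBS": 3431812, "MBDBS": 614521}

for method, total in totals.items():
    summed = meth[method] + sparse[method] + unmeth[method]
    # Every CpG should land in exactly one category
    assert summed == total, (method, summed, total)
    print(method, "categories sum to", summed)
```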


In [34]:
!wc -l *-unMeth > Mcap_union_5x-unMeth-counts.txt

### 3. Characterize genomic locations of CpGs

#### 3a. Create BEDfiles

In [35]:
%%bash

for f in *averages-*bedgraph*
do
    awk '{print $1"\t"$2"\t"$3}' ${f} > ${f}.bed
    wc -l ${f}.bed
done

  866555 Mcap_union_5x-averages-MBDBS.bedgraph.bed
  148321 Mcap_union_5x-averages-MBDBS.bedgraph-Meth.bed
  103713 Mcap_union_5x-averages-MBDBS.bedgraph-sparseMeth.bed
  614521 Mcap_union_5x-averages-MBDBS.bedgraph-unMeth.bed
 3981450 Mcap_union_5x-averages-RRBS.bedgraph.bed
  329361 Mcap_union_5x-averages-RRBS.bedgraph-Meth.bed
  220277 Mcap_union_5x-averages-RRBS.bedgraph-sparseMeth.bed
 3431812 Mcap_union_5x-averages-RRBS.bedgraph-unMeth.bed
 11509837 Mcap_union_5x-averages-WGBS.bedgraph.bed
 1350936 Mcap_union_5x-averages-WGBS.bedgraph-Meth.bed
 1155033 Mcap_union_5x-averages-WGBS.bedgraph-sparseMeth.bed
 9003868 Mcap_union_5x-averages-WGBS.bedgraph-unMeth.bed


In [None]:
#Confirm file creation
!head Mcap_union_5x-averages-MBDBS.bedgraph-unMeth.bed

1	5228	5230
1	5243	5245
1	5247	5249
1	5296	5298
1	77096	77098
1	77145	77147
1	77151	77153
1	77179	77181
1	81812	81814
1	81817	81819


#### 3b. Genes

In [8]:
%%bash

for f in *bed
do
  /usr/local/bin/intersectBed \
  -u \
  -a ${f} \
  -b ../../../genome-feature-files/Mcap.GFFannotation.gene.gff \
  > ${f}-mcGenes
done
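
The `-u` flag reports each CpG from the `-a` file once if it overlaps any feature in the `-b` file, so the line counts below are numbers of CpGs rather than numbers of CpG-feature pairs. A minimal sketch of that logic, not of how bedtools is implemented; the function names and intervals are hypothetical, and the features are assumed to be already expressed as 0-based half-open coordinates:

```python
def overlaps(a, b):
    """True when two (chrom, start, end) half-open intervals share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

def report_unique(a_intervals, b_intervals):
    """Keep each A interval once if it overlaps any B interval."""
    return [a for a in a_intervals if any(overlaps(a, b) for b in b_intervals)]

# Hypothetical CpG positions (BED-style, 0-based half-open)
cpgs = [("1", 100, 102), ("1", 500, 502), ("2", 100, 102)]
# Hypothetical features; the two features overlap each other and the first CpG
features = [("1", 50, 200), ("1", 90, 150)]

hits = report_unique(cpgs, features)
print(hits)
```

The first CpG overlaps both features but appears once in `hits`, which is why overlapping annotations do not inflate the counts.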

In [9]:
#Check output
!head *mcGenes

==> Mcap_union_5x-averages-MBDBS.bedgraph-Meth.bed-mcGenes <==
1	59438	59440
1	106173	106175
1	106202	106204
1	344031	344033
1	344044	344046
1	786125	786127
1	786144	786146
1	786151	786153
1	879915	879917
1	883893	883895

==> Mcap_union_5x-averages-MBDBS.bedgraph-sparseMeth.bed-mcGenes <==
1	323095	323097
1	328382	328384
1	328386	328388
1	330194	330196
1	330197	330199
1	334750	334752
1	334782	334784
1	341742	341744
1	343939	343941
1	343962	343964

==> Mcap_union_5x-averages-MBDBS.bedgraph-unMeth.bed-mcGenes <==
1	77096	77098
1	77145	77147
1	77151	77153
1	77179	77181
1	81812	81814
1	81817	81819
1	81835	81837
1	81874	81876
1	81887	81889
1	109670	109672

==> Mcap_union_5x-averages-MBDBS.bedgraph.bed-mcGenes <==
1	59438	59440
1	77096	77098
1	77145	77147
1	77151	77153
1	77179	77181
1	81812	81814
1	81817	81819
1	81835	81837
1	81874	81876
1	81887	81889

==> Mcap_union_5x-averages-RRBS.bedgraph-Meth.bed-mcGenes <==
1	58618	58620
1	58745	58747


In [10]:
#Count number of overlaps
!wc -l *mcGenes

   67258 Mcap_union_5x-averages-MBDBS.bedgraph-Meth.bed-mcGenes
   35276 Mcap_union_5x-averages-MBDBS.bedgraph-sparseMeth.bed-mcGenes
  207003 Mcap_union_5x-averages-MBDBS.bedgraph-unMeth.bed-mcGenes
  309537 Mcap_union_5x-averages-MBDBS.bedgraph.bed-mcGenes
  156521 Mcap_union_5x-averages-RRBS.bedgraph-Meth.bed-mcGenes
   77322 Mcap_union_5x-averages-RRBS.bedgraph-sparseMeth.bed-mcGenes
 1073239 Mcap_union_5x-averages-RRBS.bedgraph-unMeth.bed-mcGenes
 1307082 Mcap_union_5x-averages-RRBS.bedgraph.bed-mcGenes
  685807 Mcap_union_5x-averages-WGBS.bedgraph-Meth.bed-mcGenes
  400637 Mcap_union_5x-averages-WGBS.bedgraph-sparseMeth.bed-mcGenes
 3054154 Mcap_union_5x-averages-WGBS.bedgraph-unMeth.bed-mcGenes
 4140598 Mcap_union_5x-averages-WGBS.bedgraph.bed-mcGenes
 11514434 total
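
Raw overlap counts are hard to compare across methods because each method covers a different number of CpGs. One way to put them on a common scale, sketched here with the counts from this notebook (the dictionary names are hypothetical), is to express the gene overlaps as a percentage of all 5x CpGs for that method:

```python
# Counts copied from the wc -l outputs: all 5x CpGs and those overlapping genes
total_cpgs = {"WGBS": 11509837, "RRBS": 3981450, "MBDBS": 866555}
gene_cpgs = {"WGBS": 4140598, "RRBS": 1307082, "MBDBS": 309537}

percent_in_genes = {
    method: 100 * gene_cpgs[method] / total_cpgs[method]
    for method in total_cpgs
}
for method, pct in percent_in_genes.items():
    print(f"{method}: {pct:.1f}% of 5x CpGs overlap genes")
```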


In [11]:
!wc -l *mcGenes > Mcap_union_5x-mcGenes-counts.txt

#### 3c. Coding Sequences (CDS)

In [12]:
%%bash

for f in *bed
do
  /usr/local/bin/intersectBed \
  -u \
  -a ${f} \
  -b ../../../genome-feature-files/Mcap.GFFannotation.CDS.gff \
  > ${f}-mcCDS
done

In [13]:
#Check output
!head *mcCDS

==> Mcap_union_5x-averages-MBDBS.bedgraph-Meth.bed-mcCDS <==
1	59438	59440
1	786125	786127
1	786144	786146
1	786151	786153
1	1263040	1263042
1	1409642	1409644
1	1409734	1409736
1	1543924	1543926
1	1601051	1601053
1	1641103	1641105

==> Mcap_union_5x-averages-MBDBS.bedgraph-sparseMeth.bed-mcCDS <==
1	323095	323097
1	354622	354624
1	601511	601513
1	666749	666751
1	667790	667792
1	709103	709105
1	744333	744335
1	744365	744367
1	786094	786096
1	786097	786099

==> Mcap_union_5x-averages-MBDBS.bedgraph-unMeth.bed-mcCDS <==
1	109670	109672
1	238112	238114
1	238133	238135
1	323036	323038
1	323051	323053
1	323066	323068
1	323098	323100
1	354586	354588
1	354616	354618
1	361975	361977

==> Mcap_union_5x-averages-MBDBS.bedgraph.bed-mcCDS <==
1	59438	59440
1	109670	109672
1	238112	238114
1	238133	238135
1	323036	323038
1	323051	323053
1	323066	323068
1	323095	323097
1	323098	323100
1	354586	354588

==> Mcap_union_5x-averages-RRBS.bedgraph-Meth.bed-mcC

In [14]:
#Count number of overlaps
!wc -l *mcCDS

   18518 Mcap_union_5x-averages-MBDBS.bedgraph-Meth.bed-mcCDS
   13110 Mcap_union_5x-averages-MBDBS.bedgraph-sparseMeth.bed-mcCDS
   74148 Mcap_union_5x-averages-MBDBS.bedgraph-unMeth.bed-mcCDS
  105776 Mcap_union_5x-averages-MBDBS.bedgraph.bed-mcCDS
   22660 Mcap_union_5x-averages-RRBS.bedgraph-Meth.bed-mcCDS
   16593 Mcap_union_5x-averages-RRBS.bedgraph-sparseMeth.bed-mcCDS
  205283 Mcap_union_5x-averages-RRBS.bedgraph-unMeth.bed-mcCDS
  244536 Mcap_union_5x-averages-RRBS.bedgraph.bed-mcCDS
  139018 Mcap_union_5x-averages-WGBS.bedgraph-Meth.bed-mcCDS
  103408 Mcap_union_5x-averages-WGBS.bedgraph-sparseMeth.bed-mcCDS
  753113 Mcap_union_5x-averages-WGBS.bedgraph-unMeth.bed-mcCDS
  995539 Mcap_union_5x-averages-WGBS.bedgraph.bed-mcCDS
 2691702 total


In [15]:
!wc -l *mcCDS > Mcap_union_5x-mcCDS-counts.txt

#### 3d. Introns

In [16]:
%%bash

for f in *bed
do
  /usr/local/bin/intersectBed \
  -u \
  -a ${f} \
  -b ../../../genome-feature-files/Mcap.GFFannotation.intron.gff \
  > ${f}-mcIntrons
done

In [17]:
#Check output
!head *mcIntrons

==> Mcap_union_5x-averages-MBDBS.bedgraph-Meth.bed-mcIntrons <==
1	106173	106175
1	106202	106204
1	344031	344033
1	344044	344046
1	879915	879917
1	883893	883895
1	982886	982888
1	1243019	1243021
1	1259506	1259508
1	1259529	1259531

==> Mcap_union_5x-averages-MBDBS.bedgraph-sparseMeth.bed-mcIntrons <==
1	328382	328384
1	328386	328388
1	330194	330196
1	330197	330199
1	334750	334752
1	334782	334784
1	341742	341744
1	343939	343941
1	343962	343964
1	344000	344002

==> Mcap_union_5x-averages-MBDBS.bedgraph-unMeth.bed-mcIntrons <==
1	77096	77098
1	77145	77147
1	77151	77153
1	77179	77181
1	81812	81814
1	81817	81819
1	81835	81837
1	81874	81876
1	81887	81889
1	323150	323152

==> Mcap_union_5x-averages-MBDBS.bedgraph.bed-mcIntrons <==
1	77096	77098
1	77145	77147
1	77151	77153
1	77179	77181
1	81812	81814
1	81817	81819
1	81835	81837
1	81874	81876
1	81887	81889
1	106173	106175

==> Mcap_union_5x-averages-RRBS.bedgraph-Meth.bed-mcIntrons <==
1	66041	66

In [18]:
#Count number of overlaps
!wc -l *mcIntrons

   48781 Mcap_union_5x-averages-MBDBS.bedgraph-Meth.bed-mcIntrons
   22188 Mcap_union_5x-averages-MBDBS.bedgraph-sparseMeth.bed-mcIntrons
  133009 Mcap_union_5x-averages-MBDBS.bedgraph-unMeth.bed-mcIntrons
  203978 Mcap_union_5x-averages-MBDBS.bedgraph.bed-mcIntrons
  133901 Mcap_union_5x-averages-RRBS.bedgraph-Meth.bed-mcIntrons
   60756 Mcap_union_5x-averages-RRBS.bedgraph-sparseMeth.bed-mcIntrons
  868445 Mcap_union_5x-averages-RRBS.bedgraph-unMeth.bed-mcIntrons
 1063102 Mcap_union_5x-averages-RRBS.bedgraph.bed-mcIntrons
  547317 Mcap_union_5x-averages-WGBS.bedgraph-Meth.bed-mcIntrons
  297449 Mcap_union_5x-averages-WGBS.bedgraph-sparseMeth.bed-mcIntrons
 2303015 Mcap_union_5x-averages-WGBS.bedgraph-unMeth.bed-mcIntrons
 3147781 Mcap_union_5x-averages-WGBS.bedgraph.bed-mcIntrons
 8829722 total


In [19]:
!wc -l *mcIntrons > Mcap_union_5x-mcIntrons-counts.txt

#### 3e. Flanking regions

In [20]:
%%bash

for f in *bed
do
  /usr/local/bin/intersectBed \
  -u \
  -a ${f} \
  -b ../../../genome-feature-files/Mcap.GFFannotation.flanks.gff \
  > ${f}-mcFlanks
done

In [21]:
#Check output
!head *mcFlanks

==> Mcap_union_5x-averages-MBDBS.bedgraph-Meth.bed-mcFlanks <==
1	789213	789215
1	2070697	2070699
1	2070732	2070734
2	126187	126189
2	126190	126192
2	126197	126199
2	126199	126201
2	173804	173806
2	173810	173812
2	197311	197313

==> Mcap_union_5x-averages-MBDBS.bedgraph-sparseMeth.bed-mcFlanks <==
1	217198	217200
1	376177	376179
1	458648	458650
1	618190	618192
1	618205	618207
1	726420	726422
1	743459	743461
1	778795	778797
1	789254	789256
1	789277	789279

==> Mcap_union_5x-averages-MBDBS.bedgraph-unMeth.bed-mcFlanks <==
1	217219	217221
1	217248	217250
1	217269	217271
1	237189	237191
1	322944	322946
1	322963	322965
1	375501	375503
1	375506	375508
1	376200	376202
1	376220	376222

==> Mcap_union_5x-averages-MBDBS.bedgraph.bed-mcFlanks <==
1	217198	217200
1	217219	217221
1	217248	217250
1	217269	217271
1	237189	237191
1	322944	322946
1	322963	322965
1	375501	375503
1	375506	375508
1	376177	376179

==> Mcap_union_5x-averages-RRBS.bedgraph-Meth

In [22]:
#Count number of overlaps
!wc -l *mcFlanks

   15511 Mcap_union_5x-averages-MBDBS.bedgraph-Meth.bed-mcFlanks
   10623 Mcap_union_5x-averages-MBDBS.bedgraph-sparseMeth.bed-mcFlanks
   59451 Mcap_union_5x-averages-MBDBS.bedgraph-unMeth.bed-mcFlanks
   85585 Mcap_union_5x-averages-MBDBS.bedgraph.bed-mcFlanks
   32478 Mcap_union_5x-averages-RRBS.bedgraph-Meth.bed-mcFlanks
   23685 Mcap_union_5x-averages-RRBS.bedgraph-sparseMeth.bed-mcFlanks
  335368 Mcap_union_5x-averages-RRBS.bedgraph-unMeth.bed-mcFlanks
  391531 Mcap_union_5x-averages-RRBS.bedgraph.bed-mcFlanks
  127379 Mcap_union_5x-averages-WGBS.bedgraph-Meth.bed-mcFlanks
  123698 Mcap_union_5x-averages-WGBS.bedgraph-sparseMeth.bed-mcFlanks
  910305 Mcap_union_5x-averages-WGBS.bedgraph-unMeth.bed-mcFlanks
 1161382 Mcap_union_5x-averages-WGBS.bedgraph.bed-mcFlanks
 3276996 total


In [23]:
!wc -l *mcFlanks > Mcap_union_5x-mcFlanks-counts.txt

#### 3f. Upstream flanking regions

In [24]:
%%bash

for f in *bed
do
  /usr/local/bin/intersectBed \
  -u \
  -a ${f} \
  -b ../../../genome-feature-files/Mcap.GFFannotation.flanks.Upstream.gff \
  > ${f}-mcFlanksUpstream
done

In [25]:
#Check output
!head *mcFlanksUpstream

==> Mcap_union_5x-averages-MBDBS.bedgraph-Meth.bed-mcFlanksUpstream <==
2	126187	126189
2	126190	126192
2	126197	126199
2	126199	126201
2	173804	173806
2	173810	173812
2	389824	389826
2	420160	420162
2	445154	445156
2	445170	445172

==> Mcap_union_5x-averages-MBDBS.bedgraph-sparseMeth.bed-mcFlanksUpstream <==
1	376177	376179
1	618190	618192
1	618205	618207
1	726420	726422
1	944356	944358
1	1276837	1276839
1	1276872	1276874
1	1700903	1700905
1	1700905	1700907
1	1852075	1852077

==> Mcap_union_5x-averages-MBDBS.bedgraph-unMeth.bed-mcFlanksUpstream <==
1	237189	237191
1	375501	375503
1	375506	375508
1	376200	376202
1	376220	376222
1	376235	376237
1	376261	376263
1	376283	376285
1	376288	376290
1	376319	376321

==> Mcap_union_5x-averages-MBDBS.bedgraph.bed-mcFlanksUpstream <==
1	237189	237191
1	375501	375503
1	375506	375508
1	376177	376179
1	376200	376202
1	376220	376222
1	376235	376237
1	376261	376263
1	376283	376285
1	376288	376290

==> Mca

In [26]:
#Count number of overlaps
!wc -l *mcFlanksUpstream

    8984 Mcap_union_5x-averages-MBDBS.bedgraph-Meth.bed-mcFlanksUpstream
    6301 Mcap_union_5x-averages-MBDBS.bedgraph-sparseMeth.bed-mcFlanksUpstream
   33652 Mcap_union_5x-averages-MBDBS.bedgraph-unMeth.bed-mcFlanksUpstream
   48937 Mcap_union_5x-averages-MBDBS.bedgraph.bed-mcFlanksUpstream
   18189 Mcap_union_5x-averages-RRBS.bedgraph-Meth.bed-mcFlanksUpstream
   13354 Mcap_union_5x-averages-RRBS.bedgraph-sparseMeth.bed-mcFlanksUpstream
  190835 Mcap_union_5x-averages-RRBS.bedgraph-unMeth.bed-mcFlanksUpstream
  222378 Mcap_union_5x-averages-RRBS.bedgraph.bed-mcFlanksUpstream
   69987 Mcap_union_5x-averages-WGBS.bedgraph-Meth.bed-mcFlanksUpstream
   68425 Mcap_union_5x-averages-WGBS.bedgraph-sparseMeth.bed-mcFlanksUpstream
  503989 Mcap_union_5x-averages-WGBS.bedgraph-unMeth.bed-mcFlanksUpstream
  642401 Mcap_union_5x-averages-WGBS.bedgraph.bed-mcFlanksUpstream
 1827432 total


In [27]:
!wc -l *mcFlanksUpstream > Mcap_union_5x-mcFlanksUpstream-counts.txt

#### 3g. Downstream flanking regions

In [28]:
%%bash

for f in *bed
do
  /usr/local/bin/intersectBed \
  -u \
  -a ${f} \
  -b ../../../genome-feature-files/Mcap.GFFannotation.flanks.Downstream.gff \
  > ${f}-mcFlanksDownstream
done

In [29]:
#Check output
!head *mcFlanksDownstream

==> Mcap_union_5x-averages-MBDBS.bedgraph-Meth.bed-mcFlanksDownstream <==
1	789213	789215
1	2070697	2070699
1	2070732	2070734
2	197311	197313
2	197321	197323
2	197327	197329
2	260911	260913
2	301621	301623
2	330280	330282
2	330291	330293

==> Mcap_union_5x-averages-MBDBS.bedgraph-sparseMeth.bed-mcFlanksDownstream <==
1	217198	217200
1	458648	458650
1	743459	743461
1	778795	778797
1	789254	789256
1	789277	789279
1	1700903	1700905
1	1700905	1700907
1	1708805	1708807
1	1709206	1709208

==> Mcap_union_5x-averages-MBDBS.bedgraph-unMeth.bed-mcFlanksDownstream <==
1	217219	217221
1	217248	217250
1	217269	217271
1	322944	322946
1	322963	322965
1	458552	458554
1	458666	458668
1	458703	458705
1	458918	458920
1	458933	458935

==> Mcap_union_5x-averages-MBDBS.bedgraph.bed-mcFlanksDownstream <==
1	217198	217200
1	217219	217221
1	217248	217250
1	217269	217271
1	322944	322946
1	322963	322965
1	458552	458554
1	458648	458650
1	458666	458668
1	458703	458705

In [30]:
#Count number of overlaps
!wc -l *mcFlanksDownstream

    8342 Mcap_union_5x-averages-MBDBS.bedgraph-Meth.bed-mcFlanksDownstream
    5233 Mcap_union_5x-averages-MBDBS.bedgraph-sparseMeth.bed-mcFlanksDownstream
   28232 Mcap_union_5x-averages-MBDBS.bedgraph-unMeth.bed-mcFlanksDownstream
   41807 Mcap_union_5x-averages-MBDBS.bedgraph.bed-mcFlanksDownstream
   16949 Mcap_union_5x-averages-RRBS.bedgraph-Meth.bed-mcFlanksDownstream
   12157 Mcap_union_5x-averages-RRBS.bedgraph-sparseMeth.bed-mcFlanksDownstream
  154771 Mcap_union_5x-averages-RRBS.bedgraph-unMeth.bed-mcFlanksDownstream
  183877 Mcap_union_5x-averages-RRBS.bedgraph.bed-mcFlanksDownstream
   69176 Mcap_union_5x-averages-WGBS.bedgraph-Meth.bed-mcFlanksDownstream
   63486 Mcap_union_5x-averages-WGBS.bedgraph-sparseMeth.bed-mcFlanksDownstream
  435477 Mcap_union_5x-averages-WGBS.bedgraph-unMeth.bed-mcFlanksDownstream
  568139 Mcap_union_5x-averages-WGBS.bedgraph.bed-mcFlanksDownstream
 1587646 total


In [31]:
!wc -l *mcFlanksDownstream > Mcap_union_5x-mcFlanksDownstream-counts.txt

#### 3h. Intergenic

In [32]:
%%bash 

for f in *bed
do
  /usr/local/bin/intersectBed \
  -u \
  -a ${f} \
  -b ../../../genome-feature-files/Mcap.GFFannotation.intergenic.bed \
  > ${f}-mcIntergenic
done

In [33]:
#Check output
!head *mcIntergenic

==> Mcap_union_5x-averages-MBDBS.bedgraph-Meth.bed-mcIntergenic <==
1	446326	446328
1	446344	446346
1	446367	446369
1	446376	446378
1	1002973	1002975
1	1006917	1006919
1	1006924	1006926
1	1343240	1343242
1	1343249	1343251
1	1343263	1343265

==> Mcap_union_5x-averages-MBDBS.bedgraph-sparseMeth.bed-mcIntergenic <==
1	8113	8115
1	211907	211909
1	234158	234160
1	234196	234198
1	244563	244565
1	269174	269176
1	269178	269180
1	269182	269184
1	277994	277996
1	284269	284271

==> Mcap_union_5x-averages-MBDBS.bedgraph-unMeth.bed-mcIntergenic <==
1	5228	5230
1	5243	5245
1	5247	5249
1	5296	5298
1	192753	192755
1	210921	210923
1	210930	210932
1	211905	211907
1	211917	211919
1	211925	211927

==> Mcap_union_5x-averages-MBDBS.bedgraph.bed-mcIntergenic <==
1	5228	5230
1	5243	5245
1	5247	5249
1	5296	5298
1	8113	8115
1	192753	192755
1	210921	210923
1	210930	210932
1	211905	211907
1	211907	211909

==> Mcap_union_5x-averages-RRBS.bedgraph-Meth.bed-mcIntergeni

In [34]:
#Count number of overlaps
!wc -l *mcIntergenic

   65566 Mcap_union_5x-averages-MBDBS.bedgraph-Meth.bed-mcIntergenic
   57824 Mcap_union_5x-averages-MBDBS.bedgraph-sparseMeth.bed-mcIntergenic
  348119 Mcap_union_5x-averages-MBDBS.bedgraph-unMeth.bed-mcIntergenic
  471509 Mcap_union_5x-averages-MBDBS.bedgraph.bed-mcIntergenic
  140391 Mcap_union_5x-averages-RRBS.bedgraph-Meth.bed-mcIntergenic
  119284 Mcap_union_5x-averages-RRBS.bedgraph-sparseMeth.bed-mcIntergenic
 2023518 Mcap_union_5x-averages-RRBS.bedgraph-unMeth.bed-mcIntergenic
 2283193 Mcap_union_5x-averages-RRBS.bedgraph.bed-mcIntergenic
  537847 Mcap_union_5x-averages-WGBS.bedgraph-Meth.bed-mcIntergenic
  630806 Mcap_union_5x-averages-WGBS.bedgraph-sparseMeth.bed-mcIntergenic
 5040188 Mcap_union_5x-averages-WGBS.bedgraph-unMeth.bed-mcIntergenic
 6208841 Mcap_union_5x-averages-WGBS.bedgraph.bed-mcIntergenic
 17927086 total


In [35]:
!wc -l *mcIntergenic > Mcap_union_5x-mcIntergenic-counts.txt

## *P. acuta*

In [36]:
cd ..

/Users/yaamini/Documents/Meth_Compare/analyses/Characterizing-CpG-Methylation-5x-Union


In [66]:
#Make a directory for Pact output
!mkdir Pact

In [37]:
cd Pact/

/Users/yaamini/Documents/Meth_Compare/analyses/Characterizing-CpG-Methylation-5x-Union/Pact


### 1. Format data

#### 1a. Download bedgraph

In [68]:
#Download Pact 5x union bedgraph
!wget https://gannet.fish.washington.edu/seashell/bu-github/Meth_Compare/analyses/10-unionbedg/Pact_union_5x.bedgraph

--2020-05-19 22:34:21--  https://gannet.fish.washington.edu/seashell/bu-github/Meth_Compare/analyses/10-unionbedg/Pact_union_5x.bedgraph
Resolving gannet.fish.washington.edu (gannet.fish.washington.edu)... 128.95.149.52
Connecting to gannet.fish.washington.edu (gannet.fish.washington.edu)|128.95.149.52|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1212075709 (1.1G)
Saving to: ‘Pact_union_5x.bedgraph’


2020-05-19 22:34:35 (84.8 MB/s) - ‘Pact_union_5x.bedgraph’ saved [1212075709/1212075709]



In [69]:
#Check downloaded file
#WGBS: 1-3
#RRBS: 4-6
#MBD-BS: 7-9
!tail Pact_union_5x.bedgraph

scaffold168460_cov188	280	282	N/A	0.000000	0.000000	N/A	N/A	N/A	N/A	N/A	N/A
scaffold168461_cov203	69	71	N/A	N/A	0.000000	N/A	N/A	N/A	N/A	N/A	N/A
scaffold168461_cov203	142	144	N/A	N/A	0.000000	N/A	N/A	N/A	N/A	N/A	N/A
scaffold168461_cov203	148	150	N/A	N/A	12.500000	N/A	N/A	N/A	N/A	N/A	N/A
scaffold168463_cov229	14	16	N/A	10.000000	N/A	N/A	N/A	N/A	N/A	N/A	N/A
scaffold168463_cov229	95	97	N/A	5.555556	0.000000	N/A	N/A	N/A	N/A	N/A	N/A
scaffold168463_cov229	106	108	N/A	0.000000	0.000000	N/A	N/A	N/A	N/A	N/A	N/A
scaffold168465_cov225	45	47	N/A	0.000000	N/A	N/A	N/A	N/A	N/A	N/A	N/A
scaffold168465_cov225	48	50	N/A	0.000000	N/A	N/A	N/A	N/A	N/A	N/A	N/A
scaffold168465_cov225	59	61	N/A	0.000000	0.000000	N/A	N/A	N/A	N/A	N/A	N/A


#### 1b. Manipulate with `pandas`

In [70]:
#Import union data into pandas
#Check head
df = pd.read_table("Pact_union_5x.bedgraph")
df.head(5)

Unnamed: 0,chrom,start,end,1,2,3,4,5,6,7,8,9
0,scaffold1_cov55,49,51,,0.0,,,,,,,
1,scaffold1_cov55,84,86,,0.0,,,,,,,
2,scaffold1_cov55,92,94,,0.0,,,,,,,
3,scaffold1_cov55,102,104,16.666667,0.0,,,,,,,
4,scaffold1_cov55,105,107,0.0,12.5,,,,,,,


In [71]:
#Average the first three columns for WGBS information and save as a new column
#Average the middle three columns for RRBS information and save as a new column
#Average the last three columns for MBD-BS information and save as a new column
#Check output
df['WGBS'] = df[['1', '2', "3"]].mean(axis=1)
df['RRBS'] = df[['4', '5', "6"]].mean(axis=1)
df['MBD-BS'] = df[['7', '8', "9"]].mean(axis=1)
df.tail(10)

Unnamed: 0,chrom,start,end,1,2,3,4,5,6,7,8,9,WGBS,RRBS,MBD-BS
15758665,scaffold168460_cov188,280,282,,0.0,0.0,,,,,,,0.0,,
15758666,scaffold168461_cov203,69,71,,,0.0,,,,,,,0.0,,
15758667,scaffold168461_cov203,142,144,,,0.0,,,,,,,0.0,,
15758668,scaffold168461_cov203,148,150,,,12.5,,,,,,,12.5,,
15758669,scaffold168463_cov229,14,16,,10.0,,,,,,,,10.0,,
15758670,scaffold168463_cov229,95,97,,5.555556,0.0,,,,,,,2.777778,,
15758671,scaffold168463_cov229,106,108,,0.0,0.0,,,,,,,0.0,,
15758672,scaffold168465_cov225,45,47,,0.0,,,,,,,,0.0,,
15758673,scaffold168465_cov225,48,50,,0.0,,,,,,,,0.0,,
15758674,scaffold168465_cov225,59,61,,0.0,0.0,,,,,,,0.0,,


In [72]:
#Save dataframe in a tabular format and include N/As. Do not include quotes.
df.to_csv("Pact_union_5x-averages.bedgraph", sep = "\t", na_rep = "N/A", quoting = 3)

#### 1c. Separate methods into new bedgraphs

In [73]:
#Check pandas manipulations
!head Pact_union_5x-averages.bedgraph

	chrom	start	end	1	2	3	4	5	6	7	8	9	WGBS	RRBS	MBD-BS
0	scaffold1_cov55	49	51	N/A	0.0	N/A	N/A	N/A	N/A	N/A	N/A	N/A	0.0	N/A	N/A
1	scaffold1_cov55	84	86	N/A	0.0	N/A	N/A	N/A	N/A	N/A	N/A	N/A	0.0	N/A	N/A
2	scaffold1_cov55	92	94	N/A	0.0	N/A	N/A	N/A	N/A	N/A	N/A	N/A	0.0	N/A	N/A
3	scaffold1_cov55	102	104	16.666667	0.0	N/A	N/A	N/A	N/A	N/A	N/A	N/A	8.3333335	N/A	N/A
4	scaffold1_cov55	105	107	0.0	12.5	N/A	N/A	N/A	N/A	N/A	N/A	N/A	6.25	N/A	N/A
5	scaffold1_cov55	116	118	0.0	0.0	N/A	N/A	N/A	N/A	N/A	N/A	N/A	0.0	N/A	N/A
6	scaffold1_cov55	119	121	0.0	0.0	20.0	N/A	N/A	N/A	N/A	N/A	N/A	6.666666666666667	N/A	N/A
7	scaffold1_cov55	146	148	0.0	0.0	N/A	N/A	N/A	N/A	N/A	N/A	N/A	0.0	N/A	N/A
8	scaffold1_cov55	169	171	N/A	0.0	N/A	N/A	N/A	N/A	N/A	N/A	N/A	0.0	N/A	N/A


In [74]:
#Remove header
#Keep chr, start, end, and WGBS average (col 2-4, 14; col 1 is the pandas index)
#Remove rows where the 4th column (average %meth) is N/A
#Save file
!tail -n +2 Pact_union_5x-averages.bedgraph \
| awk -F'\t' -v OFS='\t' '{print $2, $3, $4, $14}' \
| awk -F'\t' -v OFS='\t' '$4 != "N/A"' \
> Pact_union_5x-averages-WGBS.bedgraph

In [75]:
#Check output: chr, start, end, % meth
!head Pact_union_5x-averages-WGBS.bedgraph
!wc -l Pact_union_5x-averages-WGBS.bedgraph

scaffold1_cov55	49	51	0.0
scaffold1_cov55	84	86	0.0
scaffold1_cov55	92	94	0.0
scaffold1_cov55	102	104	8.3333335
scaffold1_cov55	105	107	6.25
scaffold1_cov55	116	118	0.0
scaffold1_cov55	119	121	6.666666666666667
scaffold1_cov55	146	148	0.0
scaffold1_cov55	169	171	0.0
scaffold1_cov55	186	188	10.0
 7665143 Pact_union_5x-averages-WGBS.bedgraph
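
The field numbers passed to awk in this section (`$2`-`$4` plus `$14`, `$15`, or `$16`) follow from the header of `Pact_union_5x-averages.bedgraph`, where the unnamed pandas index occupies the first field. A small sketch that derives the 1-based awk positions from that header line (the dictionary name `awk_field` is an assumption for illustration):

```python
# Header line as written by to_csv above: the leading empty field is the
# unnamed pandas index column.
header = "\tchrom\tstart\tend\t1\t2\t3\t4\t5\t6\t7\t8\t9\tWGBS\tRRBS\tMBD-BS"
fields = header.split("\t")

# awk fields are 1-based, so add 1 to Python's 0-based position.
awk_field = {name: i + 1 for i, name in enumerate(fields) if name}

print(awk_field["chrom"], awk_field["start"], awk_field["end"])
print(awk_field["WGBS"], awk_field["RRBS"], awk_field["MBD-BS"])
```

If the dataframe were saved without its index, every position would shift down by one, so the awk commands and the saved file need to stay in sync.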


In [76]:
#Remove header
#Keep chr, start, end, and RRBS average (col 2-4, 15)
#Remove rows where the 4th column (average %meth) is N/A
#Save file
!tail -n +2 Pact_union_5x-averages.bedgraph \
| awk -F'\t' -v OFS='\t' '{print $2, $3, $4, $15}' \
| awk -F'\t' -v OFS='\t' '$4 != "N/A"' \
> Pact_union_5x-averages-RRBS.bedgraph

In [77]:
#Check output: chr, start, end, % meth
!head Pact_union_5x-averages-RRBS.bedgraph
!wc -l Pact_union_5x-averages-RRBS.bedgraph

scaffold6_cov64	2536	2538	0.0
scaffold6_cov64	2584	2586	0.0
scaffold6_cov64	2676	2678	48.888889
scaffold6_cov64	2682	2684	0.0
scaffold6_cov64	4553	4555	17.142857
scaffold6_cov64	4588	4590	7.407407333333333
scaffold6_cov64	5101	5103	0.0
scaffold6_cov64	5309	5311	0.0
scaffold6_cov64	5456	5458	0.0
scaffold6_cov64	5486	5488	0.0
 3508578 Pact_union_5x-averages-RRBS.bedgraph


In [78]:
#Remove header
#Keep chr, start, end, and MBD-BS average (col 2-4, 16)
#Remove rows where the 4th column (average %meth) is N/A
#Save file
!tail -n +2 Pact_union_5x-averages.bedgraph \
| awk -F'\t' -v OFS='\t' '{print $2, $3, $4, $16}' \
| awk -F'\t' -v OFS='\t' '$4 != "N/A"' \
> Pact_union_5x-averages-MBDBS.bedgraph

In [79]:
#Check output: chr, start, end, % meth
!head Pact_union_5x-averages-MBDBS.bedgraph
!wc -l Pact_union_5x-averages-MBDBS.bedgraph

scaffold2_cov51	649	651	0.0
scaffold2_cov51	686	688	0.0
scaffold2_cov51	778	780	0.0
scaffold3_cov83	118	120	37.142857
scaffold3_cov83	130	132	26.25
scaffold3_cov83	137	139	37.5
scaffold3_cov83	189	191	41.452991
scaffold3_cov83	208	210	42.12885166666667
scaffold3_cov83	243	245	0.0
scaffold3_cov83	261	263	47.75132266666666
 5906529 Pact_union_5x-averages-MBDBS.bedgraph


In [80]:
!find *averages-*bedgraph

Pact_union_5x-averages-MBDBS.bedgraph
Pact_union_5x-averages-RRBS.bedgraph
Pact_union_5x-averages-WGBS.bedgraph


In [81]:
!wc -l *averages-*bedgraph > Pact_union_5x-averages-counts.txt

### 2. Characterize methylation for each CpG dinucleotide

- Methylated: >= 50% methylation
- Sparsely methylated: > 10% and < 50% methylation
- Unmethylated: <= 10% methylation
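
The same categories can be sketched as a small Python function. It uses the boundaries that the awk filters in this section apply (`>= 50`, strictly between 10 and 50, and `<= 10`); the function name `classify_cpg` is hypothetical, and the labels reuse the file suffixes from this notebook.

```python
def classify_cpg(percent_meth):
    """Return the methylation category for one locus, mirroring the awk cutoffs."""
    if percent_meth >= 50:
        return "Meth"
    if percent_meth > 10:
        return "sparseMeth"
    return "unMeth"


# Boundary values land in Meth (50) and unMeth (10), never in sparseMeth.
for value in (0.0, 10.0, 10.714285500000003, 48.888889, 50.0, 100.0):
    print(value, classify_cpg(value))
```

Because the boundaries are assigned to the outer categories, each locus belongs to exactly one class.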

##### Methylated loci

In [82]:
%%bash
#Keep loci with at least 50% average methylation for each method
for f in *averages-*bedgraph
do
    awk '{if ($4 >= 50) { print $1, $2, $3, $4 }}' ${f} \
    > ${f}-Meth
done

In [83]:
!head *Meth

==> Pact_union_5x-averages-MBDBS.bedgraph-Meth <==
scaffold3_cov83 475 477 61.45454566666666
scaffold3_cov83 504 506 51.6666665
scaffold7_cov100 5986 5988 77.14912266666668
scaffold7_cov100 6144 6146 96.29377666666666
scaffold7_cov100 6188 6190 95.25813700000002
scaffold7_cov100 6198 6200 95.69715500000001
scaffold7_cov100 6231 6233 78.51851866666668
scaffold7_cov100 6233 6235 95.98157266666665
scaffold7_cov100 7201 7203 80.0
scaffold7_cov100 7438 7440 80.0

==> Pact_union_5x-averages-RRBS.bedgraph-Meth <==
scaffold7_cov100 1535 1537 60.0
scaffold7_cov100 17000 17002 100.0
scaffold7_cov100 17090 17092 100.0
scaffold7_cov100 24454 24456 100.0
scaffold7_cov100 24494 24496 87.5
scaffold7_cov100 24509 24511 100.0
scaffold7_cov100 24557 24559 100.0
scaffold7_cov100 33140 33142 80.0
scaffold7_cov100 33157 33159 80.0
scaffold7_cov100 40896 40898 50.0

==> Pact_union_5x-averages-WGBS.bedgraph-Meth <==
scaffold7_cov100 5500 5502 77.77777766666667
scaffold7_cov100 5986 

In [84]:
!wc -l *Meth

  725453 Pact_union_5x-averages-MBDBS.bedgraph-Meth
   66798 Pact_union_5x-averages-RRBS.bedgraph-Meth
  154042 Pact_union_5x-averages-WGBS.bedgraph-Meth
  946293 total


In [85]:
!wc -l *-Meth > Pact_union_5x-Meth-counts.txt
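
The saved counts files keep the raw `wc -l` layout, including the trailing `total` row. A minimal sketch of one way to read that layout back into Python for later summaries; the helper name `parse_wc_output` is hypothetical, and the sample text is copied from the methylated-loci counts above.

```python
def parse_wc_output(text):
    """Parse `wc -l` output into {filename: line_count}, skipping the 'total' row."""
    counts = {}
    for line in text.splitlines():
        parts = line.split(None, 1)  # split on the first run of whitespace
        if len(parts) != 2:
            continue  # ignore blank or malformed lines
        n, name = parts[0], parts[1].strip()
        if name == "total":
            continue
        counts[name] = int(n)
    return counts


sample = """  725453 Pact_union_5x-averages-MBDBS.bedgraph-Meth
   66798 Pact_union_5x-averages-RRBS.bedgraph-Meth
  154042 Pact_union_5x-averages-WGBS.bedgraph-Meth
  946293 total
"""
meth_counts = parse_wc_output(sample)
print(meth_counts)
```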

##### Sparsely methylated loci

In [86]:
%%bash
#Keep loci with more than 10% and less than 50% average methylation for each method
for f in *averages-*bedgraph
do
    awk '{if ($4 < 50) { print $1, $2, $3, $4}}' ${f} \
    | awk '{if ($4 > 10) { print $1, $2, $3, $4 }}' \
    > ${f}-sparseMeth
done

In [87]:
!head *sparseMeth

==> Pact_union_5x-averages-MBDBS.bedgraph-sparseMeth <==
scaffold3_cov83 118 120 37.142857
scaffold3_cov83 130 132 26.25
scaffold3_cov83 137 139 37.5
scaffold3_cov83 189 191 41.452991
scaffold3_cov83 208 210 42.12885166666667
scaffold3_cov83 261 263 47.75132266666666
scaffold3_cov83 484 486 47.386809
scaffold6_cov64 4146 4148 16.25
scaffold6_cov64 5904 5906 14.285714000000002
scaffold6_cov64 6880 6882 10.714285500000003

==> Pact_union_5x-averages-RRBS.bedgraph-sparseMeth <==
scaffold6_cov64 2676 2678 48.888889
scaffold6_cov64 4553 4555 17.142857
scaffold6_cov64 5904 5906 10.800086
scaffold6_cov64 6374 6376 24.382284333333335
scaffold6_cov64 7373 7375 16.666667
scaffold7_cov100 2301 2303 25.0
scaffold7_cov100 4351 4353 20.935810000000004
scaffold7_cov100 15408 15410 20.0
scaffold7_cov100 17074 17076 33.333333
scaffold7_cov100 17098 17100 20.0

==> Pact_union_5x-averages-WGBS.bedgraph-sparseMeth <==
scaffold1_cov55 252 254 20.0
scaffold2_cov51 686 688 11.609686

In [88]:
!wc -l *sparseMeth

  713277 Pact_union_5x-averages-MBDBS.bedgraph-sparseMeth
  210985 Pact_union_5x-averages-RRBS.bedgraph-sparseMeth
  311665 Pact_union_5x-averages-WGBS.bedgraph-sparseMeth
 1235927 total


In [89]:
!wc -l *-sparseMeth > Pact_union_5x-sparseMeth-counts.txt

##### Unmethylated loci

In [90]:
%%bash
#Keep loci with at most 10% average methylation for each method
for f in *averages-*bedgraph
do
    awk '{if ($4 <= 10) { print $1, $2, $3, $4 }}' ${f} \
    > ${f}-unMeth
done

In [91]:
!head *unMeth

==> Pact_union_5x-averages-MBDBS.bedgraph-unMeth <==
scaffold2_cov51 649 651 0.0
scaffold2_cov51 686 688 0.0
scaffold2_cov51 778 780 0.0
scaffold3_cov83 243 245 0.0
scaffold6_cov64 290 292 0.0
scaffold6_cov64 298 300 0.0
scaffold6_cov64 489 491 0.0
scaffold6_cov64 826 828 7.142857000000001
scaffold6_cov64 2097 2099 0.0
scaffold6_cov64 2179 2181 0.0

==> Pact_union_5x-averages-RRBS.bedgraph-unMeth <==
scaffold6_cov64 2536 2538 0.0
scaffold6_cov64 2584 2586 0.0
scaffold6_cov64 2682 2684 0.0
scaffold6_cov64 4588 4590 7.407407333333333
scaffold6_cov64 5101 5103 0.0
scaffold6_cov64 5309 5311 0.0
scaffold6_cov64 5456 5458 0.0
scaffold6_cov64 5486 5488 0.0
scaffold6_cov64 5545 5547 4.761904666666667
scaffold6_cov64 5559 5561 3.952991666666667

==> Pact_union_5x-averages-WGBS.bedgraph-unMeth <==
scaffold1_cov55 49 51 0.0
scaffold1_cov55 84 86 0.0
scaffold1_cov55 92 94 0.0
scaffold1_cov55 102 104 8.3333335
scaffold1_cov55 105 107 6.25
scaffold1_cov55 116 118 0.0
s

In [92]:
!wc -l *unMeth

 4467799 Pact_union_5x-averages-MBDBS.bedgraph-unMeth
 3230795 Pact_union_5x-averages-RRBS.bedgraph-unMeth
 7199436 Pact_union_5x-averages-WGBS.bedgraph-unMeth
 14898030 total
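
Since the three filters assign every locus to exactly one category, the category counts for each method should add up to that method's total number of loci with data. A quick check, with all numbers copied from the `wc -l` outputs in this notebook (the dictionary layout is just one convenient arrangement):

```python
# Line counts copied from the wc -l outputs in this notebook.
counts = {
    "WGBS": {"total": 7665143, "Meth": 154042, "sparseMeth": 311665, "unMeth": 7199436},
    "RRBS": {"total": 3508578, "Meth": 66798, "sparseMeth": 210985, "unMeth": 3230795},
    "MBDBS": {"total": 5906529, "Meth": 725453, "sparseMeth": 713277, "unMeth": 4467799},
}

# For each method, compare the sum of the three categories with the total.
partition_ok = {}
for method, c in counts.items():
    parts = c["Meth"] + c["sparseMeth"] + c["unMeth"]
    partition_ok[method] = (parts == c["total"])
    print(method, parts, c["total"], partition_ok[method])
```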


In [93]:
!wc -l *-unMeth > Pact_union_5x-unMeth-counts.txt

### 3. Characterize genomic locations of CpGs

#### 3a. Create BEDfiles

In [94]:
%%bash

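#Keep chr, start, and end as tab-delimited BED for every bedgraph, including the Meth, sparseMeth, and unMeth subsets
for f in *averages-*bedgraph*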
do
    awk '{print $1"\t"$2"\t"$3}' ${f} > ${f}.bed
    wc -l ${f}.bed
done

 5906529 Pact_union_5x-averages-MBDBS.bedgraph.bed
  725453 Pact_union_5x-averages-MBDBS.bedgraph-Meth.bed
  713277 Pact_union_5x-averages-MBDBS.bedgraph-sparseMeth.bed
 4467799 Pact_union_5x-averages-MBDBS.bedgraph-unMeth.bed
 3508578 Pact_union_5x-averages-RRBS.bedgraph.bed
   66798 Pact_union_5x-averages-RRBS.bedgraph-Meth.bed
  210985 Pact_union_5x-averages-RRBS.bedgraph-sparseMeth.bed
 3230795 Pact_union_5x-averages-RRBS.bedgraph-unMeth.bed
 7665143 Pact_union_5x-averages-WGBS.bedgraph.bed
  154042 Pact_union_5x-averages-WGBS.bedgraph-Meth.bed
  311665 Pact_union_5x-averages-WGBS.bedgraph-sparseMeth.bed
 7199436 Pact_union_5x-averages-WGBS.bedgraph-unMeth.bed


In [40]:
#Confirm file creation
!head Pact_union_5x-averages-MBDBS.bedgraph.bed

scaffold2_cov51	649	651
scaffold2_cov51	686	688
scaffold2_cov51	778	780
scaffold3_cov83	118	120
scaffold3_cov83	130	132
scaffold3_cov83	137	139
scaffold3_cov83	189	191
scaffold3_cov83	208	210
scaffold3_cov83	243	245
scaffold3_cov83	261	263


#### 3b. Genes

In [41]:
%%bash

#Report each CpG once (-u) if it overlaps at least one gene
for f in *bed
do
  /usr/local/bin/intersectBed \
  -u \
  -a ${f} \
  -b ../../../genome-feature-files/Pact.GFFannotation.Genes.gff \
  > ${f}-paGenes
done
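
The `-u` option matters for the counts that follow: each CpG in the `-a` file is written once if it overlaps any feature in the `-b` file, no matter how many features it overlaps. A brute-force sketch of that behavior, for illustration only and not how bedtools is implemented; the function name and the toy intervals are hypothetical, and all coordinates are assumed to be 0-based and half-open.

```python
def report_unique_overlaps(a_intervals, b_intervals):
    """Return each A interval once if it overlaps at least one B interval.

    Intervals are (chrom, start, end) tuples with 0-based, half-open coordinates.
    """
    hits = []
    for chrom, start, end in a_intervals:
        for b_chrom, b_start, b_end in b_intervals:
            if chrom == b_chrom and start < b_end and b_start < end:
                hits.append((chrom, start, end))
                break  # report the A interval once, like -u
    return hits


# Hypothetical CpG loci and feature intervals for illustration only.
cpgs = [("scaffoldA", 10, 12), ("scaffoldA", 50, 52), ("scaffoldB", 5, 7)]
features = [("scaffoldA", 0, 20), ("scaffoldA", 5, 15), ("scaffoldB", 100, 200)]
unique_hits = report_unique_overlaps(cpgs, features)
print(unique_hits)
```

The first toy CpG overlaps two features but appears once in the result, so line counts of the output files can be read as numbers of CpGs rather than numbers of CpG-feature pairs.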

In [42]:
#Check output
!head *paGenes

==> Pact_union_5x-averages-MBDBS.bedgraph-Meth.bed-paGenes <==
scaffold7_cov100	5986	5988
scaffold7_cov100	6144	6146
scaffold7_cov100	6188	6190
scaffold7_cov100	6198	6200
scaffold7_cov100	7201	7203
scaffold7_cov100	7438	7440
scaffold7_cov100	7696	7698
scaffold7_cov100	7796	7798
scaffold7_cov100	7891	7893
scaffold7_cov100	8323	8325

==> Pact_union_5x-averages-MBDBS.bedgraph-sparseMeth.bed-paGenes <==
scaffold6_cov64	4146	4148
scaffold6_cov64	5904	5906
scaffold7_cov100	1390	1392
scaffold7_cov100	1422	1424
scaffold7_cov100	3956	3958
scaffold7_cov100	4630	4632
scaffold7_cov100	4678	4680
scaffold7_cov100	5500	5502
scaffold7_cov100	5578	5580
scaffold7_cov100	11439	11441

==> Pact_union_5x-averages-MBDBS.bedgraph-unMeth.bed-paGenes <==
scaffold6_cov64	290	292
scaffold6_cov64	298	300
scaffold6_cov64	489	491
scaffold6_cov64	826	828
scaffold6_cov64	2097	2099
scaffold6_cov64	2179	2181
scaffold6_cov64	2805	2807
scaffold6_cov64	2872	2874
scaffold6_cov64	2876	2878
s

In [43]:
#Count number of overlaps
!wc -l *paGenes

  328451 Pact_union_5x-averages-MBDBS.bedgraph-Meth.bed-paGenes
  286670 Pact_union_5x-averages-MBDBS.bedgraph-sparseMeth.bed-paGenes
 2134275 Pact_union_5x-averages-MBDBS.bedgraph-unMeth.bed-paGenes
 2749396 Pact_union_5x-averages-MBDBS.bedgraph.bed-paGenes
   28777 Pact_union_5x-averages-RRBS.bedgraph-Meth.bed-paGenes
   85319 Pact_union_5x-averages-RRBS.bedgraph-sparseMeth.bed-paGenes
 1357962 Pact_union_5x-averages-RRBS.bedgraph-unMeth.bed-paGenes
 1472058 Pact_union_5x-averages-RRBS.bedgraph.bed-paGenes
  100773 Pact_union_5x-averages-WGBS.bedgraph-Meth.bed-paGenes
  119682 Pact_union_5x-averages-WGBS.bedgraph-sparseMeth.bed-paGenes
 2978882 Pact_union_5x-averages-WGBS.bedgraph-unMeth.bed-paGenes
 3199337 Pact_union_5x-averages-WGBS.bedgraph.bed-paGenes
 14841582 total
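
Raw overlap counts are hard to compare across methods because each method starts from a different number of loci. One way to put them on a common scale, sketched here under the assumption that a simple percentage is the summary of interest, is to divide the gene overlaps by the number of methylated loci for that method. All counts are copied from the `wc -l` outputs above.

```python
# Counts copied from the wc -l outputs above (methylated loci only).
meth_total = {"WGBS": 154042, "RRBS": 66798, "MBDBS": 725453}
meth_in_genes = {"WGBS": 100773, "RRBS": 28777, "MBDBS": 328451}

# Percent of methylated CpGs that overlap genes, per method.
pct_in_genes = {}
for method in meth_total:
    pct_in_genes[method] = 100.0 * meth_in_genes[method] / meth_total[method]
    print("{}: {:.1f}% of methylated CpGs overlap genes".format(method, pct_in_genes[method]))
```

The same calculation applies to the other feature tracks by swapping in their counts.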


In [44]:
!wc -l *paGenes > Pact_union_5x-paGenes-counts.txt

#### 3c. Coding Sequences (CDS)

In [45]:
%%bash

for f in *bed
do
  /usr/local/bin/intersectBed \
  -u \
  -a ${f} \
  -b ../../../genome-feature-files/Pact.GFFannotation.CDS.gff \
  > ${f}-paCDS
done

In [46]:
#Check output
!head *paCDS

==> Pact_union_5x-averages-MBDBS.bedgraph-Meth.bed-paCDS <==
scaffold7_cov100	6144	6146
scaffold7_cov100	6188	6190
scaffold7_cov100	6198	6200
scaffold7_cov100	7696	7698
scaffold7_cov100	7891	7893
scaffold7_cov100	8323	8325
scaffold7_cov100	9715	9717
scaffold7_cov100	9877	9879
scaffold7_cov100	10216	10218
scaffold7_cov100	10273	10275

==> Pact_union_5x-averages-MBDBS.bedgraph-sparseMeth.bed-paCDS <==
scaffold7_cov100	1390	1392
scaffold7_cov100	1422	1424
scaffold7_cov100	4630	4632
scaffold7_cov100	4678	4680
scaffold7_cov100	5500	5502
scaffold7_cov100	14042	14044
scaffold7_cov100	17164	17166
scaffold7_cov100	28638	28640
scaffold7_cov100	28791	28793
scaffold7_cov100	33157	33159

==> Pact_union_5x-averages-MBDBS.bedgraph-unMeth.bed-paCDS <==
scaffold6_cov64	826	828
scaffold6_cov64	2097	2099
scaffold6_cov64	2179	2181
scaffold6_cov64	4793	4795
scaffold6_cov64	5581	5583
scaffold6_cov64	5583	5585
scaffold6_cov64	5592	5594
scaffold6_cov64	5594	5596
scaffold6_cov6

In [47]:
#Count number of overlaps
!wc -l *paCDS

  200688 Pact_union_5x-averages-MBDBS.bedgraph-Meth.bed-paCDS
  165073 Pact_union_5x-averages-MBDBS.bedgraph-sparseMeth.bed-paCDS
 1209545 Pact_union_5x-averages-MBDBS.bedgraph-unMeth.bed-paCDS
 1575306 Pact_union_5x-averages-MBDBS.bedgraph.bed-paCDS
   15427 Pact_union_5x-averages-RRBS.bedgraph-Meth.bed-paCDS
   43226 Pact_union_5x-averages-RRBS.bedgraph-sparseMeth.bed-paCDS
  676526 Pact_union_5x-averages-RRBS.bedgraph-unMeth.bed-paCDS
  735179 Pact_union_5x-averages-RRBS.bedgraph.bed-paCDS
   57154 Pact_union_5x-averages-WGBS.bedgraph-Meth.bed-paCDS
   50536 Pact_union_5x-averages-WGBS.bedgraph-sparseMeth.bed-paCDS
 1300474 Pact_union_5x-averages-WGBS.bedgraph-unMeth.bed-paCDS
 1408164 Pact_union_5x-averages-WGBS.bedgraph.bed-paCDS
 7437298 total


In [48]:
!wc -l *paCDS > Pact_union_5x-paCDS-counts.txt

#### 3d. Introns

In [49]:
%%bash

for f in *bed
do
  /usr/local/bin/intersectBed \
  -u \
  -a ${f} \
  -b ../../../genome-feature-files/Pact.GFFannotation.Intron.gff \
  > ${f}-paIntron
done

In [50]:
#Check output
!head *paIntron

==> Pact_union_5x-averages-MBDBS.bedgraph-Meth.bed-paIntron <==
scaffold7_cov100	5986	5988
scaffold7_cov100	7201	7203
scaffold7_cov100	7438	7440
scaffold7_cov100	7796	7798
scaffold7_cov100	7891	7893
scaffold7_cov100	10414	10416
scaffold7_cov100	13385	13387
scaffold7_cov100	13450	13452
scaffold7_cov100	18941	18943
scaffold7_cov100	19095	19097

==> Pact_union_5x-averages-MBDBS.bedgraph-sparseMeth.bed-paIntron <==
scaffold6_cov64	4146	4148
scaffold6_cov64	5904	5906
scaffold7_cov100	3956	3958
scaffold7_cov100	5578	5580
scaffold7_cov100	11439	11441
scaffold7_cov100	35655	35657
scaffold7_cov100	35661	35663
scaffold7_cov100	36634	36636
scaffold7_cov100	43825	43827
scaffold7_cov100	43840	43842

==> Pact_union_5x-averages-MBDBS.bedgraph-unMeth.bed-paIntron <==
scaffold6_cov64	290	292
scaffold6_cov64	298	300
scaffold6_cov64	489	491
scaffold6_cov64	2805	2807
scaffold6_cov64	2872	2874
scaffold6_cov64	2876	2878
scaffold6_cov64	3198	3200
scaffold6_cov64	3343	3345
sca

In [51]:
#Count number of overlaps
!wc -l *paIntron

  130904 Pact_union_5x-averages-MBDBS.bedgraph-Meth.bed-paIntron
  123293 Pact_union_5x-averages-MBDBS.bedgraph-sparseMeth.bed-paIntron
  935497 Pact_union_5x-averages-MBDBS.bedgraph-unMeth.bed-paIntron
 1189694 Pact_union_5x-averages-MBDBS.bedgraph.bed-paIntron
   13535 Pact_union_5x-averages-RRBS.bedgraph-Meth.bed-paIntron
   42572 Pact_union_5x-averages-RRBS.bedgraph-sparseMeth.bed-paIntron
  688533 Pact_union_5x-averages-RRBS.bedgraph-unMeth.bed-paIntron
  744640 Pact_union_5x-averages-RRBS.bedgraph.bed-paIntron
   44757 Pact_union_5x-averages-WGBS.bedgraph-Meth.bed-paIntron
   69941 Pact_union_5x-averages-WGBS.bedgraph-sparseMeth.bed-paIntron
 1694853 Pact_union_5x-averages-WGBS.bedgraph-unMeth.bed-paIntron
 1809551 Pact_union_5x-averages-WGBS.bedgraph.bed-paIntron
 7487770 total


In [52]:
!wc -l *paIntron > Pact_union_5x-paIntron-counts.txt

#### 3e. Flanking regions

In [53]:
%%bash

for f in *bed
do
  /usr/local/bin/intersectBed \
  -u \
  -a ${f} \
  -b ../../../genome-feature-files/Pact.GFFannotation.flanks.gff \
  > ${f}-paFlanks
done

In [54]:
#Check output
!head *paFlanks

==> Pact_union_5x-averages-MBDBS.bedgraph-Meth.bed-paFlanks <==
scaffold7_cov100	6231	6233
scaffold7_cov100	6233	6235
scaffold7_cov100	11815	11817
scaffold7_cov100	12652	12654
scaffold7_cov100	12662	12664
scaffold7_cov100	12675	12677
scaffold7_cov100	12683	12685
scaffold7_cov100	12704	12706
scaffold7_cov100	12726	12728
scaffold7_cov100	12737	12739

==> Pact_union_5x-averages-MBDBS.bedgraph-sparseMeth.bed-paFlanks <==
scaffold6_cov64	6880	6882
scaffold7_cov100	12131	12133
scaffold7_cov100	12247	12249
scaffold7_cov100	12861	12863
scaffold7_cov100	15597	15599
scaffold7_cov100	15614	15616
scaffold7_cov100	15630	15632
scaffold7_cov100	24443	24445
scaffold7_cov100	24617	24619
scaffold7_cov100	24969	24971

==> Pact_union_5x-averages-MBDBS.bedgraph-unMeth.bed-paFlanks <==
scaffold6_cov64	5797	5799
scaffold6_cov64	5800	5802
scaffold6_cov64	6751	6753
scaffold6_cov64	6805	6807
scaffold6_cov64	6813	6815
scaffold6_cov64	6862	6864
scaffold6_cov64	6885	6887
scaffold6_c

In [55]:
#Count number of overlaps
!wc -l *paFlanks

  116700 Pact_union_5x-averages-MBDBS.bedgraph-Meth.bed-paFlanks
  121769 Pact_union_5x-averages-MBDBS.bedgraph-sparseMeth.bed-paFlanks
  805964 Pact_union_5x-averages-MBDBS.bedgraph-unMeth.bed-paFlanks
 1044433 Pact_union_5x-averages-MBDBS.bedgraph.bed-paFlanks
   11990 Pact_union_5x-averages-RRBS.bedgraph-Meth.bed-paFlanks
   40163 Pact_union_5x-averages-RRBS.bedgraph-sparseMeth.bed-paFlanks
  646364 Pact_union_5x-averages-RRBS.bedgraph-unMeth.bed-paFlanks
  698517 Pact_union_5x-averages-RRBS.bedgraph.bed-paFlanks
   27106 Pact_union_5x-averages-WGBS.bedgraph-Meth.bed-paFlanks
   57568 Pact_union_5x-averages-WGBS.bedgraph-sparseMeth.bed-paFlanks
 1438851 Pact_union_5x-averages-WGBS.bedgraph-unMeth.bed-paFlanks
 1523525 Pact_union_5x-averages-WGBS.bedgraph.bed-paFlanks
 6532950 total


In [56]:
!wc -l *paFlanks > Pact_union_5x-paFlanks-counts.txt

#### 3f. Upstream flanking regions

In [57]:
%%bash

for f in *bed
do
  /usr/local/bin/intersectBed \
  -u \
  -a ${f} \
  -b ../../../genome-feature-files/Pact.GFFannotation.flanks.Upstream.gff \
  > ${f}-paFlanksUpstream
done

In [58]:
#Check output
!head *paFlanksUpstream

==> Pact_union_5x-averages-MBDBS.bedgraph-Meth.bed-paFlanksUpstream <==
scaffold7_cov100	6231	6233
scaffold7_cov100	6233	6235
scaffold7_cov100	11815	11817
scaffold7_cov100	12652	12654
scaffold7_cov100	12662	12664
scaffold7_cov100	18520	18522
scaffold7_cov100	19284	19286
scaffold7_cov100	19296	19298
scaffold7_cov100	77986	77988
scaffold7_cov100	77988	77990

==> Pact_union_5x-averages-MBDBS.bedgraph-sparseMeth.bed-paFlanksUpstream <==
scaffold7_cov100	12131	12133
scaffold7_cov100	12247	12249
scaffold7_cov100	15597	15599
scaffold7_cov100	15614	15616
scaffold7_cov100	15630	15632
scaffold7_cov100	38477	38479
scaffold7_cov100	41092	41094
scaffold7_cov100	49521	49523
scaffold7_cov100	73493	73495
scaffold7_cov100	78244	78246

==> Pact_union_5x-averages-MBDBS.bedgraph-unMeth.bed-paFlanksUpstream <==
scaffold6_cov64	5797	5799
scaffold6_cov64	5800	5802
scaffold7_cov100	6764	6766
scaffold7_cov100	6913	6915
scaffold7_cov100	12587	12589
scaffold7_cov100	15549	15551
sca

In [59]:
#Count number of overlaps
!wc -l *paFlanksUpstream

   70985 Pact_union_5x-averages-MBDBS.bedgraph-Meth.bed-paFlanksUpstream
   76373 Pact_union_5x-averages-MBDBS.bedgraph-sparseMeth.bed-paFlanksUpstream
  515522 Pact_union_5x-averages-MBDBS.bedgraph-unMeth.bed-paFlanksUpstream
  662880 Pact_union_5x-averages-MBDBS.bedgraph.bed-paFlanksUpstream
    7390 Pact_union_5x-averages-RRBS.bedgraph-Meth.bed-paFlanksUpstream
   25148 Pact_union_5x-averages-RRBS.bedgraph-sparseMeth.bed-paFlanksUpstream
  415601 Pact_union_5x-averages-RRBS.bedgraph-unMeth.bed-paFlanksUpstream
  448139 Pact_union_5x-averages-RRBS.bedgraph.bed-paFlanksUpstream
   15908 Pact_union_5x-averages-WGBS.bedgraph-Meth.bed-paFlanksUpstream
   33921 Pact_union_5x-averages-WGBS.bedgraph-sparseMeth.bed-paFlanksUpstream
  881679 Pact_union_5x-averages-WGBS.bedgraph-unMeth.bed-paFlanksUpstream
  931508 Pact_union_5x-averages-WGBS.bedgraph.bed-paFlanksUpstream
 4085054 total


In [60]:
!wc -l *paFlanksUpstream > Pact_union_5x-paFlanksUpstream-counts.txt

#### 3g. Downstream flanking regions

In [61]:
%%bash

for f in *bed
do
  /usr/local/bin/intersectBed \
  -u \
  -a ${f} \
  -b ../../../genome-feature-files/Pact.GFFannotation.flanks.Downstream.gff \
  > ${f}-paFlanksDownstream
done

In [62]:
#Check output
!head *paFlanksDownstream

==> Pact_union_5x-averages-MBDBS.bedgraph-Meth.bed-paFlanksDownstream <==
scaffold7_cov100	6231	6233
scaffold7_cov100	6233	6235
scaffold7_cov100	12652	12654
scaffold7_cov100	12662	12664
scaffold7_cov100	12675	12677
scaffold7_cov100	12683	12685
scaffold7_cov100	12704	12706
scaffold7_cov100	12726	12728
scaffold7_cov100	12737	12739
scaffold7_cov100	12806	12808

==> Pact_union_5x-averages-MBDBS.bedgraph-sparseMeth.bed-paFlanksDownstream <==
scaffold6_cov64	6880	6882
scaffold7_cov100	12861	12863
scaffold7_cov100	24443	24445
scaffold7_cov100	24617	24619
scaffold7_cov100	24969	24971
scaffold7_cov100	74997	74999
scaffold7_cov100	78244	78246
scaffold7_cov100	91999	92001
scaffold7_cov100	92019	92021
scaffold7_cov100	92029	92031

==> Pact_union_5x-averages-MBDBS.bedgraph-unMeth.bed-paFlanksDownstream <==
scaffold6_cov64	6751	6753
scaffold6_cov64	6805	6807
scaffold6_cov64	6813	6815
scaffold6_cov64	6862	6864
scaffold6_cov64	6885	6887
scaffold6_cov64	6909	6911
scaffold

In [63]:
#Count number of overlaps
!wc -l *paFlanksDownstream

   78103 Pact_union_5x-averages-MBDBS.bedgraph-Meth.bed-paFlanksDownstream
   71684 Pact_union_5x-averages-MBDBS.bedgraph-sparseMeth.bed-paFlanksDownstream
  416647 Pact_union_5x-averages-MBDBS.bedgraph-unMeth.bed-paFlanksDownstream
  566434 Pact_union_5x-averages-MBDBS.bedgraph.bed-paFlanksDownstream
    6897 Pact_union_5x-averages-RRBS.bedgraph-Meth.bed-paFlanksDownstream
   21410 Pact_union_5x-averages-RRBS.bedgraph-sparseMeth.bed-paFlanksDownstream
  329980 Pact_union_5x-averages-RRBS.bedgraph-unMeth.bed-paFlanksDownstream
  358287 Pact_union_5x-averages-RRBS.bedgraph.bed-paFlanksDownstream
   18803 Pact_union_5x-averages-WGBS.bedgraph-Meth.bed-paFlanksDownstream
   35085 Pact_union_5x-averages-WGBS.bedgraph-sparseMeth.bed-paFlanksDownstream
  767316 Pact_union_5x-averages-WGBS.bedgraph-unMeth.bed-paFlanksDownstream
  821204 Pact_union_5x-averages-WGBS.bedgraph.bed-paFlanksDownstream
 3491850 total


In [64]:
!wc -l *paFlanksDownstream > Pact_union_5x-paFlanksDownstream-counts.txt

#### 3h. Intergenic

In [65]:
%%bash

for f in *bed
do
  /usr/local/bin/intersectBed \
  -u \
  -a ${f} \
  -b ../../../genome-feature-files/Pact.GFFannotation.intergenic.bed \
  > ${f}-paIntergenic
done

In [66]:
#Check output
!head *paIntergenic

==> Pact_union_5x-averages-MBDBS.bedgraph-Meth.bed-paIntergenic <==
scaffold3_cov83	475	477
scaffold3_cov83	504	506
scaffold7_cov100	216293	216295
scaffold7_cov100	230584	230586
scaffold7_cov100	267363	267365
scaffold7_cov100	273435	273437
scaffold7_cov100	304579	304581
scaffold7_cov100	304583	304585
scaffold7_cov100	304590	304592
scaffold7_cov100	304601	304603

==> Pact_union_5x-averages-MBDBS.bedgraph-sparseMeth.bed-paIntergenic <==
scaffold3_cov83	118	120
scaffold3_cov83	130	132
scaffold3_cov83	137	139
scaffold3_cov83	189	191
scaffold3_cov83	208	210
scaffold3_cov83	261	263
scaffold3_cov83	484	486
scaffold7_cov100	84244	84246
scaffold7_cov100	84631	84633
scaffold7_cov100	84874	84876

==> Pact_union_5x-averages-MBDBS.bedgraph-unMeth.bed-paIntergenic <==
scaffold2_cov51	649	651
scaffold2_cov51	686	688
scaffold2_cov51	778	780
scaffold3_cov83	243	245
scaffold6_cov64	7687	7689
scaffold6_cov64	7693	7695
scaffold6_cov64	7698	7700
scaffold6_cov64	7701	7703
sc

In [67]:
#Count number of overlaps
!wc -l *paIntergenic

  280338 Pact_union_5x-averages-MBDBS.bedgraph-Meth.bed-paIntergenic
  304903 Pact_union_5x-averages-MBDBS.bedgraph-sparseMeth.bed-paIntergenic
 1528019 Pact_union_5x-averages-MBDBS.bedgraph-unMeth.bed-paIntergenic
 2113260 Pact_union_5x-averages-MBDBS.bedgraph.bed-paIntergenic
   26043 Pact_union_5x-averages-RRBS.bedgraph-Meth.bed-paIntergenic
   85523 Pact_union_5x-averages-RRBS.bedgraph-sparseMeth.bed-paIntergenic
 1226877 Pact_union_5x-averages-RRBS.bedgraph-unMeth.bed-paIntergenic
 1338443 Pact_union_5x-averages-RRBS.bedgraph.bed-paIntergenic
   26172 Pact_union_5x-averages-WGBS.bedgraph-Meth.bed-paIntergenic
  134445 Pact_union_5x-averages-WGBS.bedgraph-sparseMeth.bed-paIntergenic
 2782628 Pact_union_5x-averages-WGBS.bedgraph-unMeth.bed-paIntergenic
 2943245 Pact_union_5x-averages-WGBS.bedgraph.bed-paIntergenic
 12789896 total


In [68]:
!wc -l *paIntergenic > Pact_union_5x-paIntergenic-counts.txt