# Characterizing CpG Methylation (union bedgraphs with 5x data)

In this notebook, general methylation landscapes in *Montipora capitata* and *Pocillopora acuta* will be characterized based on WGSB, RRBS, and MBD-BSseq data. I will also assess CG motif overlaps with various genome feature tracks to understand where methylation may occur across the genome. I will use [union bedgraphs](https://gannet.fish.washington.edu/seashell/bu-github/Meth_Compare/analyses/10-unionbedg/) with 5x data.

1. Download union bedgraphs and format for downstream analyses
2. Characterize methylation for each CpG dinucleotide
3. Characterize genomic locations of methylated CpGs, sparsely methylated CpGs, and unmethylated CpGs for each sequencing type

## 0. Set working directory and install programs

In [1]:
!pwd

/Users/yaamini/Documents/Meth_Compare/scripts


In [2]:
cd ../analyses/

/Users/yaamini/Documents/Meth_Compare/analyses


In [3]:
#!mkdir Characterizing-CpG-Methylation-5x-Union

In [4]:
cd Characterizing-CpG-Methylation-5x-Union/

/Users/yaamini/Documents/Meth_Compare/analyses/Characterizing-CpG-Methylation-5x-Union


In [33]:
#Install pandas for this notebook
import pandas as pd
print(pd.__version__)

0.18.1


## *M. capitata*

In [5]:
#Make a directory for Mcap output
#!mkdir Mcap

In [6]:
cd Mcap/

/Users/yaamini/Documents/Meth_Compare/analyses/Characterizing-CpG-Methylation-5x-Union/Mcap


### 1. Add variable paths

In [8]:
mcGenes = "../../../genome-feature-files/Mcap.GFFannotation.gene.gff"

In [9]:
mcCDS = "../../../genome-feature-files/Mcap.GFFannotation.CDS.gff"

In [10]:
mcIntron = "../../../genome-feature-files/Mcap.GFFannotation.intron.gff"

In [11]:
mcFlanks = "../../../genome-feature-files/Mcap.GFFannotation.flanks.gff"

In [12]:
mcCGMotifs = "../../../genome-feature-files/Mcap_CpG.gff"

### 2. Format data

#### 2a. Download bedgraph

In [13]:
#Download Mcap 5x union bedgraph
!wget https://gannet.fish.washington.edu/seashell/bu-github/Meth_Compare/analyses/10-unionbedg/Mcap_union_5x.bedgraph

--2020-05-06 10:13:09--  https://gannet.fish.washington.edu/seashell/bu-github/Meth_Compare/analyses/10-unionbedg/Mcap_union_5x.bedgraph
Resolving gannet.fish.washington.edu (gannet.fish.washington.edu)... 128.95.149.52
Connecting to gannet.fish.washington.edu (gannet.fish.washington.edu)|128.95.149.52|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 862402537 (822M)
Saving to: ‘Mcap_union_5x.bedgraph’


2020-05-06 10:13:20 (78.1 MB/s) - ‘Mcap_union_5x.bedgraph’ saved [862402537/862402537]



In [27]:
#Check downloaded file
#WGBS: 10-12
#RRBS: 14-16
#MBD-BS: 16-18
!tail Mcap_union_5x.bedgraph

3043	18674	18676	12.500000	N/A	0.000000	N/A	N/A	N/A	N/A	N/A	N/A
3043	18688	18690	7.692308	0.000000	0.000000	N/A	N/A	N/A	N/A	0.000000	N/A
3043	18720	18722	8.333333	0.000000	0.000000	N/A	N/A	N/A	N/A	0.000000	0.000000
3043	18734	18736	8.333333	0.000000	0.000000	N/A	N/A	N/A	N/A	N/A	0.000000
3043	18740	18742	0.000000	0.000000	0.000000	N/A	N/A	N/A	N/A	N/A	0.000000
3043	18755	18757	11.111111	0.000000	0.000000	N/A	N/A	N/A	N/A	N/A	0.000000
3043	18784	18786	20.000000	0.000000	0.000000	N/A	N/A	N/A	N/A	0.000000	0.000000
3043	18801	18803	0.000000	0.000000	0.000000	N/A	N/A	N/A	N/A	0.000000	0.000000
3043	18822	18824	N/A	0.000000	0.000000	N/A	N/A	N/A	N/A	0.000000	N/A
3043	18827	18829	N/A	0.000000	0.000000	N/A	N/A	N/A	N/A	0.000000	N/A


#### 2b. Manipulate with `pandas`

In [38]:
#Import union data into pandas
#Check head
df = pd.read_table("Mcap_union_5x.bedgraph")
df.head(5)

Unnamed: 0,chrom,start,end,10,11,12,13,14,15,16,17,18
0,1,3493,3495,,,,0.0,,0.0,,,
1,1,3518,3520,,,,0.0,,0.0,,,
2,1,3727,3729,,,,0.0,0.0,8.695652,,,
3,1,3752,3754,,,,0.0,0.0,0.0,,,
4,1,3757,3759,,,,0.0,0.0,0.0,,,


In [43]:
#Average the first three columns for WGBS information and save as a new column
#Average the middle three columns for RRBS information and save as a new column
#Average the last three columns for MBD-BS information and save as a new column
#Check output
df['WGBS'] = df[['10', '11', "12"]].mean(axis=1)
df['RRBS'] = df[['13', '14', "15"]].mean(axis=1)
df['MBD-BS'] = df[['16', '17', "18"]].mean(axis=1)
df.tail(10)

Unnamed: 0,chrom,start,end,10,11,12,13,14,15,16,17,18,WGBS,RRBS,MBD-BS
13340258,3043,18674,18676,12.5,,0.0,,,,,,,6.25,,
13340259,3043,18688,18690,7.692308,0.0,0.0,,,,,0.0,,2.564103,,0.0
13340260,3043,18720,18722,8.333333,0.0,0.0,,,,,0.0,0.0,2.777778,,0.0
13340261,3043,18734,18736,8.333333,0.0,0.0,,,,,,0.0,2.777778,,0.0
13340262,3043,18740,18742,0.0,0.0,0.0,,,,,,0.0,0.0,,0.0
13340263,3043,18755,18757,11.111111,0.0,0.0,,,,,,0.0,3.703704,,0.0
13340264,3043,18784,18786,20.0,0.0,0.0,,,,,0.0,0.0,6.666667,,0.0
13340265,3043,18801,18803,0.0,0.0,0.0,,,,,0.0,0.0,0.0,,0.0
13340266,3043,18822,18824,,0.0,0.0,,,,,0.0,,0.0,,0.0
13340267,3043,18827,18829,,0.0,0.0,,,,,0.0,,0.0,,0.0


In [87]:
#Save dataframe in a tabular format and include N/As. Do not include quotes.
df.to_csv("Mcap_union_5x-averages.bedgraph", sep = "\t", na_rep = "N/A", quoting = 3)

#### 2c. Separate methods into new bedgraphs

In [88]:
#Check pandas manipulations
!head Mcap_union_5x-averages.bedgraph

	chrom	start	end	10	11	12	13	14	15	16	17	18	WGBS	RRBS	MBD-BS
0	1	3493	3495	N/A	N/A	N/A	0.0	N/A	0.0	N/A	N/A	N/A	N/A	0.0	N/A
1	1	3518	3520	N/A	N/A	N/A	0.0	N/A	0.0	N/A	N/A	N/A	N/A	0.0	N/A
2	1	3727	3729	N/A	N/A	N/A	0.0	0.0	8.695652	N/A	N/A	N/A	N/A	2.898550666666667	N/A
3	1	3752	3754	N/A	N/A	N/A	0.0	0.0	0.0	N/A	N/A	N/A	N/A	0.0	N/A
4	1	3757	3759	N/A	N/A	N/A	0.0	0.0	0.0	N/A	N/A	N/A	N/A	0.0	N/A
5	1	3770	3772	N/A	N/A	N/A	0.0	0.0	0.0	N/A	N/A	N/A	N/A	0.0	N/A
6	1	4062	4064	N/A	N/A	0.0	N/A	N/A	N/A	N/A	N/A	N/A	0.0	N/A	N/A
7	1	4069	4071	N/A	N/A	0.0	N/A	N/A	N/A	N/A	N/A	N/A	0.0	N/A	N/A
8	1	4077	4079	N/A	N/A	0.0	N/A	N/A	N/A	N/A	N/A	N/A	0.0	N/A	N/A


In [145]:
#Remove header
#Keep chr, start, end, and WGBS average (col 2-4, 13)
#Remove rows where the 4th column (average %meth) is N/A
#Save file
!tail -n +2 Mcap_union_5x-averages.bedgraph \
| awk -F'\t' -v OFS='\t' '{print $2, $3, $4, $13}' \
| awk -F'\t' -v OFS='\t' '$4 != "N/A"' \
> Mcap_union_5x-averages-WGBS.bedgraph

In [146]:
#Check output: chr, start, end, % meth
!head Mcap_union_5x-averages-WGBS.bedgraph
!wc -l Mcap_union_5x-averages-WGBS.bedgraph

1	8113	8115	20.0
1	224609	224611	0.0
1	264560	264562	0.0
1	264598	264600	0.0
1	271145	271147	0.0
1	277994	277996	16.666667
1	278004	278006	0.0
1	278039	278041	0.0
1	278049	278051	0.0
1	278067	278069	0.0
  153392 Mcap_union_5x-averages-WGBS.bedgraph


In [147]:
#Remove header
#Keep chr, start, end, and RRBS average
#Remove rows where the 4th column (average %meth) is N/A
#Save file
!tail -n +2 Mcap_union_5x-averages.bedgraph \
| awk -F'\t' -v OFS='\t' '{print $2, $3, $4, $14}' \
| awk -F'\t' -v OFS='\t' '$4 != "N/A"' \
> Mcap_union_5x-averages-RRBS.bedgraph

In [148]:
#Check output: chr, start, end, % meth
!head Mcap_union_5x-averages-RRBS.bedgraph
!wc -l Mcap_union_5x-averages-RRBS.bedgraph

1	4062	4064	0.0
1	4069	4071	0.0
1	4077	4079	0.0
1	4086	4088	0.0
1	4146	4148	0.0
1	4150	4152	0.0
1	4155	4157	0.0
1	4172	4174	0.0
1	4184	4186	0.0
1	4190	4192	16.666667
 11509837 Mcap_union_5x-averages-RRBS.bedgraph


In [149]:
#Remove header
#Keep chr, start, end, and MBD-BS average
#Remove rows where the 4th column (average %meth) is N/A
#Save file
!tail -n +2 Mcap_union_5x-averages.bedgraph \
| awk -F'\t' -v OFS='\t' '{print $2, $3, $4, $15}' \
| awk -F'\t' -v OFS='\t' '$4 != "N/A"' \
> Mcap_union_5x-averages-MBDBS.bedgraph

In [150]:
#Check output: chr, start, end, % meth
!head Mcap_union_5x-averages-MBDBS.bedgraph
!wc -l Mcap_union_5x-averages-MBDBS.bedgraph

1	3493	3495	0.0
1	3518	3520	0.0
1	3727	3729	2.898550666666667
1	3752	3754	0.0
1	3757	3759	0.0
1	3770	3772	0.0
1	11876	11878	0.0
1	11887	11889	0.0
1	11894	11896	0.0
1	11941	11943	0.0
 3981450 Mcap_union_5x-averages-MBDBS.bedgraph


In [151]:
!find *averages-*bedgraph

Mcap_union_5x-averages-MBDBS.bedgraph
Mcap_union_5x-averages-RRBS.bedgraph
Mcap_union_5x-averages-WGBS.bedgraph


In [152]:
!wc -l *averages-*bedgraph > Mcap_union_5x-averages-counts.txt

### 3. Characterize methylation for each CpG dinucleotide

- Methylated: > 50% methylation
- Sparsely methylated: 10-50% methylation
- Unmethylated: < 10% methylation

##### Methylated loci

In [153]:
%%bash
for f in *averages-*bedgraph
do
    awk '{if ($4 >= 50) { print $1, $2, $3, $4 }}' ${f} \
    > ${f}-Meth
done

In [158]:
!head *-Meth

==> Mcap_union_5x-averages-MBDBS.bedgraph-Meth <==
1 32228 32230 50.413223
1 58618 58620 95.65826333333332
1 58745 58747 96.819728
1 58764 58766 99.16666666666667
1 58792 58794 83.42830033333334
1 66041 66043 100.0
1 66050 66052 100.0
1 66339 66341 88.888889
1 66345 66347 77.777778
1 66354 66356 77.777778

==> Mcap_union_5x-averages-RRBS.bedgraph-Meth <==
1 4948 4950 50.0
1 4967 4969 50.0
1 4986 4988 50.0
1 57065 57067 80.0
1 58609 58611 100.0
1 58618 58620 100.0
1 58745 58747 100.0
1 59207 59209 100.0
1 59277 59279 100.0
1 59393 59395 100.0

==> Mcap_union_5x-averages-WGBS.bedgraph-Meth <==
1 1002973 1002975 50.0
1 1343240 1343242 100.0
1 1343249 1343251 100.0
1 1343263 1343265 83.333333
1 1343265 1343267 100.0
1 1343295 1343297 100.0
1 1343304 1343306 100.0
1 1343320 1343322 100.0
1 1451821 1451823 60.0
1 1468323 1468325 100.0


In [155]:
!wc -l *-Meth

  329361 Mcap_union_5x-averages-MBDBS.bedgraph-Meth
 1350936 Mcap_union_5x-averages-RRBS.bedgraph-Meth
   29468 Mcap_union_5x-averages-WGBS.bedgraph-Meth
 1709765 total


In [159]:
!wc -l *-Meth > Mcap_union_5x-Meth-counts.txt

##### Sparsely methylated loci

In [160]:
%%bash
for f in *averages-*bedgraph
do
    awk '{if ($4 < 50) { print $1, $2, $3, $4}}' ${f} \
    | awk '{if ($4 > 10) { print $1, $2, $3, $4 }}' \
    > ${f}-sparseMeth
done

In [161]:
!head *-sparseMeth

==> Mcap_union_5x-averages-MBDBS.bedgraph-sparseMeth <==
1 15092 15094 30.0
1 21739 21741 13.636364000000002
1 34139 34141 11.764706
1 42261 42263 10.539216
1 45163 45165 10.31746
1 48370 48372 14.285714000000002
1 87492 87494 33.333333
1 89011 89013 14.285714000000002
1 101503 101505 17.380952
1 101545 101547 23.3333335

==> Mcap_union_5x-averages-RRBS.bedgraph-sparseMeth <==
1 4190 4192 16.666667
1 4891 4893 33.333333
1 4910 4912 28.571429
1 4929 4931 16.6666665
1 5005 5007 28.571429
1 5024 5026 40.0
1 5151 5153 20.0
1 5160 5162 16.666667
1 5228 5230 11.111111
1 6282 6284 11.111111

==> Mcap_union_5x-averages-WGBS.bedgraph-sparseMeth <==
1 8113 8115 20.0
1 277994 277996 16.666667
1 387294 387296 20.0
1 461787 461789 40.0
1 480696 480698 20.0
1 605019 605021 28.571429
1 605050 605052 33.333333
1 646162 646164 20.0
1 667790 667792 40.0
1 726420 726422 20.0


In [162]:
!wc -l *-sparseMeth

  220277 Mcap_union_5x-averages-MBDBS.bedgraph-sparseMeth
 1155033 Mcap_union_5x-averages-RRBS.bedgraph-sparseMeth
   16793 Mcap_union_5x-averages-WGBS.bedgraph-sparseMeth
 1392103 total


In [163]:
!wc -l *-sparseMeth > Mcap_union_5x-sparseMeth-counts.txt

##### Unmethylated loci

In [164]:
%%bash
for f in *averages-*bedgraph
do
    awk '{if ($4 <= 10) { print $1, $2, $3, $4 }}' ${f} \
    > ${f}-unMeth
done

In [165]:
!head *-unMeth

==> Mcap_union_5x-averages-MBDBS.bedgraph-unMeth <==
1 3493 3495 0.0
1 3518 3520 0.0
1 3727 3729 2.898550666666667
1 3752 3754 0.0
1 3757 3759 0.0
1 3770 3772 0.0
1 11876 11878 0.0
1 11887 11889 0.0
1 11894 11896 0.0
1 11941 11943 0.0

==> Mcap_union_5x-averages-RRBS.bedgraph-unMeth <==
1 4062 4064 0.0
1 4069 4071 0.0
1 4077 4079 0.0
1 4086 4088 0.0
1 4146 4148 0.0
1 4150 4152 0.0
1 4155 4157 0.0
1 4172 4174 0.0
1 4184 4186 0.0
1 5043 5045 0.0

==> Mcap_union_5x-averages-WGBS.bedgraph-unMeth <==
1 224609 224611 0.0
1 264560 264562 0.0
1 264598 264600 0.0
1 271145 271147 0.0
1 278004 278006 0.0
1 278039 278041 0.0
1 278049 278051 0.0
1 278067 278069 0.0
1 280413 280415 0.0
1 280448 280450 0.0


In [166]:
!wc -l *-unMeth

 3431812 Mcap_union_5x-averages-MBDBS.bedgraph-unMeth
 9003868 Mcap_union_5x-averages-RRBS.bedgraph-unMeth
  107131 Mcap_union_5x-averages-WGBS.bedgraph-unMeth
 12542811 total


In [167]:
!wc -l *-unMeth > Mcap_union_5x-unMeth-counts.txt

### 4. Characterize genomic locations of CpGs

#### 4a. Create BEDfiles

In [182]:
%%bash

for f in *averages-*bedgraph*
do
    awk '{print $1"\t"$2"\t"$3}' ${f} > ${f}.bed
    wc -l ${f}.bed
done

 3981450 Mcap_union_5x-averages-MBDBS.bedgraph.bed
  329361 Mcap_union_5x-averages-MBDBS.bedgraph-Meth.bed
  220277 Mcap_union_5x-averages-MBDBS.bedgraph-sparseMeth.bed
 3431812 Mcap_union_5x-averages-MBDBS.bedgraph-unMeth.bed
 11509837 Mcap_union_5x-averages-RRBS.bedgraph.bed
 1350936 Mcap_union_5x-averages-RRBS.bedgraph-Meth.bed
 1155033 Mcap_union_5x-averages-RRBS.bedgraph-sparseMeth.bed
 9003868 Mcap_union_5x-averages-RRBS.bedgraph-unMeth.bed
  153392 Mcap_union_5x-averages-WGBS.bedgraph.bed
   29468 Mcap_union_5x-averages-WGBS.bedgraph-Meth.bed
   16793 Mcap_union_5x-averages-WGBS.bedgraph-sparseMeth.bed
  107131 Mcap_union_5x-averages-WGBS.bedgraph-unMeth.bed


In [183]:
#Confirm file creation
!head Mcap_union_5x-averages-MBDBS.bedgraph-unMeth.bed

1	3493	3495
1	3518	3520
1	3727	3729
1	3752	3754
1	3757	3759
1	3770	3772
1	11876	11878
1	11887	11889
1	11894	11896
1	11941	11943


#### 4b. Genes

In [177]:
!/usr/local/bin/intersectBed \
-wb \
-a Mcap_union_5x-averages-MBDBS.bedgraph-unMeth.bed \
-b ../../../genome-feature-files/Mcap.GFFannotation.gene.gff \
| head

Unexpected file format.  Please use tab-delimited BED, GFF, or VCF. Perhaps you have non-integer starts or ends at line 1?


In [174]:
%%bash

for f in *bed
do
  /usr/local/bin/intersectBed \
  -wb \
  -a ${f} \
  -b ../../../genome-feature-files/Mcap.GFFannotation.gene.gff \
  > ${f}-mcGenes
done

Unexpected file format.  Please use tab-delimited BED, GFF, or VCF. Perhaps you have non-integer starts or ends at line 1?
Unexpected file format.  Please use tab-delimited BED, GFF, or VCF. Perhaps you have non-integer starts or ends at line 1?
Unexpected file format.  Please use tab-delimited BED, GFF, or VCF. Perhaps you have non-integer starts or ends at line 1?
Unexpected file format.  Please use tab-delimited BED, GFF, or VCF. Perhaps you have non-integer starts or ends at line 1?
Unexpected file format.  Please use tab-delimited BED, GFF, or VCF. Perhaps you have non-integer starts or ends at line 1?
Unexpected file format.  Please use tab-delimited BED, GFF, or VCF. Perhaps you have non-integer starts or ends at line 1?
Unexpected file format.  Please use tab-delimited BED, GFF, or VCF. Perhaps you have non-integer starts or ends at line 1?
Unexpected file format.  Please use tab-delimited BED, GFF, or VCF. Perhaps you have non-integer starts or ends at line 1?
Unexpected file 

In [175]:
#Check output
!head *mcGenes

==> Mcap_union_5x-averages-MBDBS.bedgraph-Meth.bed-mcGenes <==

==> Mcap_union_5x-averages-MBDBS.bedgraph-sparseMeth.bed-mcGenes <==

==> Mcap_union_5x-averages-MBDBS.bedgraph-unMeth.bed-mcGenes <==

==> Mcap_union_5x-averages-MBDBS.bedgraph.bed-mcGenes <==

==> Mcap_union_5x-averages-RRBS.bedgraph-Meth.bed-mcGenes <==

==> Mcap_union_5x-averages-RRBS.bedgraph-sparseMeth.bed-mcGenes <==

==> Mcap_union_5x-averages-RRBS.bedgraph-unMeth.bed-mcGenes <==

==> Mcap_union_5x-averages-RRBS.bedgraph.bed-mcGenes <==

==> Mcap_union_5x-averages-WGBS.bedgraph-Meth.bed-mcGenes <==

==> Mcap_union_5x-averages-WGBS.bedgraph-sparseMeth.bed-mcGenes <==

==> Mcap_union_5x-averages-WGBS.bedgraph-unMeth.bed-mcGenes <==

==> Mcap_union_5x-averages-WGBS.bedgraph.bed-mcGenes <==


In [176]:
#Count number of overlaps
!wc -l *mcGenes

       0 Mcap_union_5x-averages-MBDBS.bedgraph-Meth.bed-mcGenes
       0 Mcap_union_5x-averages-MBDBS.bedgraph-sparseMeth.bed-mcGenes
       0 Mcap_union_5x-averages-MBDBS.bedgraph-unMeth.bed-mcGenes
       0 Mcap_union_5x-averages-MBDBS.bedgraph.bed-mcGenes
       0 Mcap_union_5x-averages-RRBS.bedgraph-Meth.bed-mcGenes
       0 Mcap_union_5x-averages-RRBS.bedgraph-sparseMeth.bed-mcGenes
       0 Mcap_union_5x-averages-RRBS.bedgraph-unMeth.bed-mcGenes
       0 Mcap_union_5x-averages-RRBS.bedgraph.bed-mcGenes
       0 Mcap_union_5x-averages-WGBS.bedgraph-Meth.bed-mcGenes
       0 Mcap_union_5x-averages-WGBS.bedgraph-sparseMeth.bed-mcGenes
       0 Mcap_union_5x-averages-WGBS.bedgraph-unMeth.bed-mcGenes
       0 Mcap_union_5x-averages-WGBS.bedgraph.bed-mcGenes
       0 total


In [11]:
!wc -l *mcGenes > Mcap_union_5x-mcGenes-counts.txt

#### 4c. Coding Sequences (CDS)

In [66]:
%%bash

for f in *bed
do
  /usr/local/bin/intersectBed \
  -wb \
  -a ${f} \
  -b ../../../genome-feature-files/Mcap.GFFannotation.CDS.gff \
  > ${f}-mcCDS
done

In [67]:
#Check output
!head *mcCDS

==> Meth10_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed-mcCDS <==
1	58745	58747	1	AUGUSTUS	CDS	58322	59506	1	-	0	transcript_id "g21535.t1"; gene_id "g21535";
1	438779	438781	1	AUGUSTUS	CDS	438772	439162	1	+	0	transcript_id "g21564.t1"; gene_id "g21564";
1	438791	438793	1	AUGUSTUS	CDS	438772	439162	1	+	0	transcript_id "g21564.t1"; gene_id "g21564";
1	786125	786127	1	AUGUSTUS	CDS	785899	786207	0.98	-	0	transcript_id "g21598.t1"; gene_id "g21598";
1	786144	786146	1	AUGUSTUS	CDS	785899	786207	0.98	-	0	transcript_id "g21598.t1"; gene_id "g21598";
1	789544	789546	1	AUGUSTUS	CDS	789380	789726	0.68	-	2	transcript_id "g21600.t1"; gene_id "g21600";
1	789590	789592	1	AUGUSTUS	CDS	789380	789726	0.68	-	2	transcript_id "g21600.t1"; gene_id "g21600";
1	879226	879228	1	AUGUSTUS	CDS	879219	879325	1	+	2	transcript_id "g21603.t1"; gene_id "g21603";
1	983540	983542	1	AUGUSTUS	CDS	983471	983576	1	-	2	transcript_id "g21609.t1"; gene_id "g21609";
1	1263116	1263118	1	AUGUSTUS	CDS	1262915	126430

In [68]:
#Count number of overlaps
!wc -l *mcCDS

   54412 Meth10_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed-mcCDS
   60266 Meth10_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth.bed-mcCDS
  361901 Meth10_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth.bed-mcCDS
  476579 Meth10_R1_001_val_1_bismark_bt2_pe._5x.bedgraph.bed-mcCDS
   64070 Meth11_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed-mcCDS
   58258 Meth11_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth.bed-mcCDS
  374559 Meth11_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth.bed-mcCDS
  496887 Meth11_R1_001_val_1_bismark_bt2_pe._5x.bedgraph.bed-mcCDS
  113396 Meth12_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed-mcCDS
   89455 Meth12_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth.bed-mcCDS
  589114 Meth12_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth.bed-mcCDS
  791965 Meth12_R1_001_val_1_bismark_bt2_pe._5x.bedgraph.bed-mcCDS
   18158 Meth13_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed-mcCDS
   11383 Meth13_R1_001_val_1_bismark_bt2_pe._5x.bedgrap

In [12]:
!wc -l *mcCDS > Mcap-5x-mcCDS-counts.txt

#### 4d. Introns

In [69]:
%%bash

for f in *bed
do
  /usr/local/bin/intersectBed \
  -wb \
  -a ${f} \
  -b ../../../genome-feature-files/Mcap.GFFannotation.intron.gff \
  > ${f}-mcIntrons
done

In [70]:
#Check output
!head *mcIntrons

==> Meth10_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed-mcIntrons <==
1	103334	103336	1	AUGUSTUS	intron	102849	104430	1	-	.	transcript_id "g21538.t1"; gene_id "g21538";
1	103347	103349	1	AUGUSTUS	intron	102849	104430	1	-	.	transcript_id "g21538.t1"; gene_id "g21538";
1	103356	103358	1	AUGUSTUS	intron	102849	104430	1	-	.	transcript_id "g21538.t1"; gene_id "g21538";
1	103360	103362	1	AUGUSTUS	intron	102849	104430	1	-	.	transcript_id "g21538.t1"; gene_id "g21538";
1	103398	103400	1	AUGUSTUS	intron	102849	104430	1	-	.	transcript_id "g21538.t1"; gene_id "g21538";
1	105953	105955	1	AUGUSTUS	intron	104815	109637	1	-	.	transcript_id "g21538.t1"; gene_id "g21538";
1	106012	106014	1	AUGUSTUS	intron	104815	109637	1	-	.	transcript_id "g21538.t1"; gene_id "g21538";
1	106155	106157	1	AUGUSTUS	intron	104815	109637	1	-	.	transcript_id "g21538.t1"; gene_id "g21538";
1	106173	106175	1	AUGUSTUS	intron	104815	109637	1	-	.	transcript_id "g21538.t1"; gene_id "g21538";
1	106216	106218	1	AUGUST

In [71]:
#Count number of overlaps
!wc -l *mcIntrons

  176102 Meth10_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed-mcIntrons
  136671 Meth10_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth.bed-mcIntrons
  894836 Meth10_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth.bed-mcIntrons
 1207609 Meth10_R1_001_val_1_bismark_bt2_pe._5x.bedgraph.bed-mcIntrons
  203311 Meth11_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed-mcIntrons
  130195 Meth11_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth.bed-mcIntrons
  910897 Meth11_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth.bed-mcIntrons
 1244403 Meth11_R1_001_val_1_bismark_bt2_pe._5x.bedgraph.bed-mcIntrons
  420249 Meth12_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed-mcIntrons
  259702 Meth12_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth.bed-mcIntrons
 1735085 Meth12_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth.bed-mcIntrons
 2415036 Meth12_R1_001_val_1_bismark_bt2_pe._5x.bedgraph.bed-mcIntrons
  105145 Meth13_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed-mcIntrons
   

In [13]:
!wc -l *mcIntrons > Mcap-5x-mcIntrons-counts.txt

#### 4e. Flanking regions

In [24]:
%%bash

for f in *bed
do
  /usr/local/bin/intersectBed \
  -wb \
  -a ${f} \
  -b ../../../genome-feature-files/Mcap.GFFannotation.flanks.gff \
  > ${f}-mcFlanks
done

In [25]:
#Check output
!head *mcFlanks

==> Meth10_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed-mcFlanks <==
1	443126	443128	1	AUGUSTUS	gene	443083	443143	0.79	-	.	g21565
1	443126	443128	1	AUGUSTUS	gene	443083	443143	0.61	-	.	g21566
1	444404	444406	1	AUGUSTUS	gene	443636	444635	0.61	-	.	g21566
1	1392759	1392761	1	AUGUSTUS	gene	1392431	1393430	1	+	.	g21634
1	1392780	1392782	1	AUGUSTUS	gene	1392431	1393430	1	+	.	g21634
1	1392793	1392795	1	AUGUSTUS	gene	1392431	1393430	1	+	.	g21634
1	1392832	1392834	1	AUGUSTUS	gene	1392431	1393430	1	+	.	g21634
1	1392838	1392840	1	AUGUSTUS	gene	1392431	1393430	1	+	.	g21634
1	1392908	1392910	1	AUGUSTUS	gene	1392431	1393430	1	+	.	g21634
1	1392921	1392923	1	AUGUSTUS	gene	1392431	1393430	1	+	.	g21634

==> Meth10_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth.bed-mcFlanks <==
1	27782	27784	1	AUGUSTUS	gene	27294	28293	0.23	-	.	g21533
1	148080	148082	1	AUGUSTUS	gene	147344	148343	0.44	+	.	g21541
1	150099	150101	1	AUGUSTUS	gene	149589	150588	0.44	+	.	g21541
1	182756	182758	1	AUGUSTUS	gene	1820

In [26]:
#Count number of overlaps
!wc -l *mcFlanks

   51223 Meth10_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed-mcFlanks
   68513 Meth10_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth.bed-mcFlanks
  391706 Meth10_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth.bed-mcFlanks
  511442 Meth10_R1_001_val_1_bismark_bt2_pe._5x.bedgraph.bed-mcFlanks
   60053 Meth11_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed-mcFlanks
   63957 Meth11_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth.bed-mcFlanks
  396934 Meth11_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth.bed-mcFlanks
  520944 Meth11_R1_001_val_1_bismark_bt2_pe._5x.bedgraph.bed-mcFlanks
  121924 Meth12_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed-mcFlanks
  118462 Meth12_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth.bed-mcFlanks
  736137 Meth12_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth.bed-mcFlanks
  976523 Meth12_R1_001_val_1_bismark_bt2_pe._5x.bedgraph.bed-mcFlanks
   30117 Meth13_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed-mcFlanks
   18490 Meth13_

In [27]:
!wc -l *mcFlanks > Mcap-5x-mcFlanks-counts.txt

#### 4f. Intergenic

In [72]:
%%bash 

for f in *bed
do
  /usr/local/bin/intersectBed \
  -v \
  -a ${f} \
  -b ../../../genome-feature-files/Mcap.GFFannotation.gene.gff \
  > ${f}-mcIntergenic
done

In [73]:
#Check output
!head *mcIntergenic

==> Meth10_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed-mcIntergenic <==
1	320600	320602
1	320631	320633
1	443126	443128
1	444404	444406
1	446577	446579
1	446641	446643
1	446659	446661
1	446682	446684
1	446691	446693
1	446746	446748

==> Meth10_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth.bed-mcIntergenic <==
1	27782	27784
1	140551	140553
1	148080	148082
1	150099	150101
1	169735	169737
1	169771	169773
1	169796	169798
1	169800	169802
1	182756	182758
1	185808	185810

==> Meth10_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth.bed-mcIntergenic <==
1	6570	6572
1	6713	6715
1	6780	6782
1	6813	6815
1	6818	6820
1	27606	27608
1	27613	27615
1	27641	27643
1	27643	27645
1	27674	27676

==> Meth10_R1_001_val_1_bismark_bt2_pe._5x.bedgraph.bed-mcIntergenic <==
1	6570	6572
1	6713	6715
1	6780	6782
1	6813	6815
1	6818	6820
1	27606	27608
1	27613	27615
1	27641	27643
1	27643	27645
1	27674	27676

==> Meth11_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Met

In [74]:
#Count number of overlaps
!wc -l *mcIntergenic

  220239 Meth10_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed-mcIntergenic
  351041 Meth10_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth.bed-mcIntergenic
 2316939 Meth10_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth.bed-mcIntergenic
 2888219 Meth10_R1_001_val_1_bismark_bt2_pe._5x.bedgraph.bed-mcIntergenic
  261721 Meth11_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed-mcIntergenic
  329457 Meth11_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth.bed-mcIntergenic
 2330389 Meth11_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth.bed-mcIntergenic
 2921567 Meth11_R1_001_val_1_bismark_bt2_pe._5x.bedgraph.bed-mcIntergenic
  526674 Meth12_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed-mcIntergenic
  651370 Meth12_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth.bed-mcIntergenic
 4408771 Meth12_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth.bed-mcIntergenic
 5586815 Meth12_R1_001_val_1_bismark_bt2_pe._5x.bedgraph.bed-mcIntergenic
  134470 Meth13_R1_001_val_1_bismark_bt2_pe

In [14]:
!wc -l *mcIntergenic > Mcap-5x-mcIntergenic-counts.txt

## *P. acuta*

In [28]:
cd ..

/Users/yaamini/Documents/Meth_Compare/analyses/Characterizing-CpG-Methylation-5x


In [16]:
#Make a directory for Pact output
#!mkdir Pact

In [29]:
cd Pact/

/Users/yaamini/Documents/Meth_Compare/analyses/Characterizing-CpG-Methylation-5x/Pact


### 1. Set variable paths

In [18]:
paGenes = "../../../genome-feature-files/Pact.GFFannotation.Genes.gff"

In [19]:
paCDS = "../../../genome-feature-files/Pact.GFFannotation.CDS.gff"

In [20]:
paIntron = "../../../genome-feature-files/Pact.GFFannotation.Intron.gff"

In [30]:
paFlanks = "../../../genome-feature-files/Pact.GFFannotation.flanks.gff"

In [31]:
paCGMotifs = "../../../genome-feature-files/Pact_CpG.gff"

### 2. Download coverage files

In [34]:
#Download Pact WGBS and MBD-BS 5x sample bedgraphs
!wget -r -l1 --no-parent -A "*5x.bedgraph" https://gannet.fish.washington.edu/seashell/bu-mox/scrubbed/031520-TG-bs/Pact_tg/dedup/

--2020-05-05 13:25:32--  https://gannet.fish.washington.edu/seashell/bu-mox/scrubbed/031520-TG-bs/Pact_tg/dedup/
Resolving gannet.fish.washington.edu (gannet.fish.washington.edu)... 128.95.149.52
Connecting to gannet.fish.washington.edu (gannet.fish.washington.edu)|128.95.149.52|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘gannet.fish.washington.edu/seashell/bu-mox/scrubbed/031520-TG-bs/Pact_tg/dedup/index.html.tmp’

gannet.fish.washing     [ <=>                ]  42.11K  --.-KB/s    in 0.001s  

2020-05-05 13:25:34 (47.9 MB/s) - ‘gannet.fish.washington.edu/seashell/bu-mox/scrubbed/031520-TG-bs/Pact_tg/dedup/index.html.tmp’ saved [43123]

Loading robots.txt; please ignore errors.
--2020-05-05 13:25:34--  https://gannet.fish.washington.edu/robots.txt
Reusing existing connection to gannet.fish.washington.edu:443.
HTTP request sent, awaiting response... 404 Not Found
2020-05-05 13:25:34 ERROR 404: Not Found.

Removing gannet

In [35]:
#Move samples from directory structure on gannet to cd
!mv gannet.fish.washington.edu/seashell/bu-mox/scrubbed/031520-TG-bs/Pact_tg/dedup/* .

In [36]:
#Remove empty directory
!rm -r gannet.fish.washington.edu/

In [37]:
#Check files
!find *bedgraph

Meth1_R1_001_val_1_bismark_bt2_pe._5x.bedgraph
Meth2_R1_001_val_1_bismark_bt2_pe._5x.bedgraph
Meth3_R1_001_val_1_bismark_bt2_pe._5x.bedgraph
Meth7_R1_001_val_1_bismark_bt2_pe._5x.bedgraph
Meth8_R1_001_val_1_bismark_bt2_pe._5x.bedgraph
Meth9_R1_001_val_1_bismark_bt2_pe._5x.bedgraph


In [38]:
#Download Pact RRBS 5x sample bedgraphs
!wget -r -l1 --no-parent -A "*5x.bedgraph" https://gannet.fish.washington.edu/seashell/bu-mox/scrubbed/031520-TG-bs/Pact_tg/nodedup/

--2020-05-05 13:25:49--  https://gannet.fish.washington.edu/seashell/bu-mox/scrubbed/031520-TG-bs/Pact_tg/nodedup/
Resolving gannet.fish.washington.edu (gannet.fish.washington.edu)... 128.95.149.52
Connecting to gannet.fish.washington.edu (gannet.fish.washington.edu)|128.95.149.52|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘gannet.fish.washington.edu/seashell/bu-mox/scrubbed/031520-TG-bs/Pact_tg/nodedup/index.html.tmp’

gannet.fish.washing     [ <=>                ]  19.51K  --.-KB/s    in 0.001s  

2020-05-05 13:25:50 (32.7 MB/s) - ‘gannet.fish.washington.edu/seashell/bu-mox/scrubbed/031520-TG-bs/Pact_tg/nodedup/index.html.tmp’ saved [19983]

Loading robots.txt; please ignore errors.
--2020-05-05 13:25:50--  https://gannet.fish.washington.edu/robots.txt
Reusing existing connection to gannet.fish.washington.edu:443.
HTTP request sent, awaiting response... 404 Not Found
2020-05-05 13:25:50 ERROR 404: Not Found.

Removing 

In [39]:
#Move samples from directory structure on gannet to cd
!mv gannet.fish.washington.edu/seashell/bu-mox/scrubbed/031520-TG-bs/Pact_tg/nodedup/* .

In [40]:
#Remove empty directory
!rm -r gannet.fish.washington.edu/

In [41]:
!find *bedgraph

Meth1_R1_001_val_1_bismark_bt2_pe._5x.bedgraph
Meth2_R1_001_val_1_bismark_bt2_pe._5x.bedgraph
Meth3_R1_001_val_1_bismark_bt2_pe._5x.bedgraph
Meth4_R1_001_val_1_bismark_bt2_pe._5x.bedgraph
Meth5_R1_001_val_1_bismark_bt2_pe._5x.bedgraph
Meth6_R1_001_val_1_bismark_bt2_pe._5x.bedgraph
Meth7_R1_001_val_1_bismark_bt2_pe._5x.bedgraph
Meth8_R1_001_val_1_bismark_bt2_pe._5x.bedgraph
Meth9_R1_001_val_1_bismark_bt2_pe._5x.bedgraph


In [42]:
#Verify checksums from gannet
!md5sum -c ../Pact-5xbedgraph-GANNET-md5sum.txt

Meth8_R1_001_val_1_bismark_bt2_pe._5x.bedgraph: OK
Meth9_R1_001_val_1_bismark_bt2_pe._5x.bedgraph: OK
Meth2_R1_001_val_1_bismark_bt2_pe._5x.bedgraph: OK
Meth3_R1_001_val_1_bismark_bt2_pe._5x.bedgraph: OK
Meth1_R1_001_val_1_bismark_bt2_pe._5x.bedgraph: OK
Meth7_R1_001_val_1_bismark_bt2_pe._5x.bedgraph: OK
Meth6_R1_001_val_1_bismark_bt2_pe._5x.bedgraph: OK
Meth5_R1_001_val_1_bismark_bt2_pe._5x.bedgraph: OK
Meth4_R1_001_val_1_bismark_bt2_pe._5x.bedgraph: OK


In [44]:
!wc -l *bedgraph

 5546051 Meth1_R1_001_val_1_bismark_bt2_pe._5x.bedgraph
 6358722 Meth2_R1_001_val_1_bismark_bt2_pe._5x.bedgraph
 5866786 Meth3_R1_001_val_1_bismark_bt2_pe._5x.bedgraph
 1835561 Meth4_R1_001_val_1_bismark_bt2_pe._5x.bedgraph
 1451229 Meth5_R1_001_val_1_bismark_bt2_pe._5x.bedgraph
 1517358 Meth6_R1_001_val_1_bismark_bt2_pe._5x.bedgraph
 2640625 Meth7_R1_001_val_1_bismark_bt2_pe._5x.bedgraph
  539008 Meth8_R1_001_val_1_bismark_bt2_pe._5x.bedgraph
 2732607 Meth9_R1_001_val_1_bismark_bt2_pe._5x.bedgraph
 28487947 total


In [18]:
!wc -l *bedgraph > Pact-5x-bedgraph-counts.txt

### 3. Characterize methylation for each CpG dinucleotide

- Methylated: > 50% methylation
- Sparsely methylated: 10-50% methylation
- Unmethylated: < 10% methylation

##### Methylated loci

In [45]:
%%bash
for f in *bedgraph
do
    awk '{if ($4 >= 50) { print $1, $2, $3, $4 }}' ${f} \
    > ${f}-Meth
done

In [46]:
!head *Meth

==> Meth1_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth <==
scaffold7_cov100 4351 4353 50.000000
scaffold7_cov100 5500 5502 83.333333
scaffold7_cov100 5578 5580 57.142857
scaffold7_cov100 5986 5988 100.000000
scaffold7_cov100 6144 6146 100.000000
scaffold7_cov100 6188 6190 100.000000
scaffold7_cov100 6198 6200 88.888889
scaffold7_cov100 6231 6233 100.000000
scaffold7_cov100 6233 6235 100.000000
scaffold7_cov100 7438 7440 100.000000

==> Meth2_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth <==
scaffold7_cov100 5500 5502 62.500000
scaffold7_cov100 5986 5988 66.666667
scaffold7_cov100 6144 6146 100.000000
scaffold7_cov100 6188 6190 94.117647
scaffold7_cov100 6198 6200 100.000000
scaffold7_cov100 6231 6233 71.428571
scaffold7_cov100 6233 6235 100.000000
scaffold7_cov100 7438 7440 88.235294
scaffold7_cov100 7696 7698 95.833333
scaffold7_cov100 7796 7798 60.000000

==> Meth3_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth <==
scaffold7_cov100 5500 5502 87.500000
scaffo

In [47]:
!wc -l *Meth

  110364 Meth1_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth
  126440 Meth2_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth
  124819 Meth3_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth
   31047 Meth4_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth
   30345 Meth5_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth
   26617 Meth6_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth
  258222 Meth7_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth
  213342 Meth8_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth
  255370 Meth9_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth
 1176566 total


In [19]:
!wc -l *-Meth > Pact-5x-Meth-counts.txt

##### Sparsely methylated loci

In [48]:
%%bash
for f in *bedgraph
do
    awk '{if ($4 < 50) { print $1, $2, $3, $4}}' ${f} \
    | awk '{if ($4 > 10) { print $1, $2, $3, $4 }}' \
    > ${f}-sparseMeth
done

In [49]:
!head *sparseMeth

==> Meth1_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth <==
scaffold1_cov55 102 104 16.666667
scaffold1_cov55 186 188 20.000000
scaffold3_cov83 118 120 12.500000
scaffold3_cov83 137 139 12.500000
scaffold3_cov83 475 477 18.750000
scaffold3_cov83 484 486 14.893617
scaffold3_cov83 504 506 21.052632
scaffold6_cov64 7373 7375 12.500000
scaffold6_cov64 7983 7985 11.111111
scaffold7_cov100 1293 1295 11.111111

==> Meth2_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth <==
scaffold1_cov55 105 107 12.500000
scaffold1_cov55 252 254 20.000000
scaffold2_cov51 686 688 11.111111
scaffold6_cov64 3978 3980 11.111111
scaffold6_cov64 7077 7079 12.500000
scaffold7_cov100 2652 2654 16.666667
scaffold7_cov100 3994 3996 10.526316
scaffold7_cov100 7121 7123 25.000000
scaffold7_cov100 7201 7203 16.666667
scaffold7_cov100 10755 10757 13.333333

==> Meth3_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth <==
scaffold1_cov55 119 121 20.000000
scaffold1_cov55 194 196 20.00000

In [50]:
!wc -l *sparseMeth

  367019 Meth1_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth
  345887 Meth2_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth
  385346 Meth3_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth
  137700 Meth4_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth
   64837 Meth5_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth
   89246 Meth6_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth
  296059 Meth7_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth
   80086 Meth8_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth
  337855 Meth9_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth
 2104035 total


In [20]:
!wc -l *-sparseMeth > Pact-5x-sparseMeth-counts.txt

##### Unmethylated loci

In [51]:
%%bash
for f in *bedgraph
do
    awk '{if ($4 <= 10) { print $1, $2, $3, $4 }}' ${f} \
    > ${f}-unMeth
done

In [52]:
!head *unMeth

==> Meth1_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth <==
scaffold1_cov55 105 107 0.000000
scaffold1_cov55 116 118 0.000000
scaffold1_cov55 119 121 0.000000
scaffold1_cov55 146 148 0.000000
scaffold1_cov55 194 196 0.000000
scaffold2_cov51 649 651 0.000000
scaffold2_cov51 686 688 8.333333
scaffold2_cov51 778 780 0.000000
scaffold3_cov83 130 132 0.000000
scaffold3_cov83 189 191 6.250000

==> Meth2_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth <==
scaffold1_cov55 49 51 0.000000
scaffold1_cov55 84 86 0.000000
scaffold1_cov55 92 94 0.000000
scaffold1_cov55 102 104 0.000000
scaffold1_cov55 116 118 0.000000
scaffold1_cov55 119 121 0.000000
scaffold1_cov55 146 148 0.000000
scaffold1_cov55 169 171 0.000000
scaffold1_cov55 186 188 0.000000
scaffold1_cov55 194 196 0.000000

==> Meth3_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth <==
scaffold1_cov55 250 252 0.000000
scaffold2_cov51 649 651 0.000000
scaffold2_cov51 778 780 0.000000
scaffold3_cov83 118 120 0.000000
scaffold3_cov83 130 132 0.

In [53]:
!wc -l *unMeth

 5068668 Meth1_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth
 5886395 Meth2_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth
 5356621 Meth3_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth
 1666814 Meth4_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth
 1356047 Meth5_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth
 1401495 Meth6_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth
 2086344 Meth7_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth
  245580 Meth8_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth
 2139382 Meth9_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth
 25207346 total


In [21]:
!wc -l *-unMeth > Pact-5x-unMeth-counts.txt

### 4. Characterize genomic locations of CpGs

#### 4a. Create BEDfiles

In [54]:
%%bash

for f in *bedgraph
do
    awk '{print $1"\t"$2"\t"$3}' ${f} > ${f}.bed
    wc -l ${f}.bed
done

 5546051 Meth1_R1_001_val_1_bismark_bt2_pe._5x.bedgraph.bed
 6358722 Meth2_R1_001_val_1_bismark_bt2_pe._5x.bedgraph.bed
 5866786 Meth3_R1_001_val_1_bismark_bt2_pe._5x.bedgraph.bed
 1835561 Meth4_R1_001_val_1_bismark_bt2_pe._5x.bedgraph.bed
 1451229 Meth5_R1_001_val_1_bismark_bt2_pe._5x.bedgraph.bed
 1517358 Meth6_R1_001_val_1_bismark_bt2_pe._5x.bedgraph.bed
 2640625 Meth7_R1_001_val_1_bismark_bt2_pe._5x.bedgraph.bed
  539008 Meth8_R1_001_val_1_bismark_bt2_pe._5x.bedgraph.bed
 2732607 Meth9_R1_001_val_1_bismark_bt2_pe._5x.bedgraph.bed


In [55]:
%%bash

for f in *bedgraph-Meth
do
    awk '{print $1"\t"$2"\t"$3}' ${f} > ${f}.bed
    wc -l ${f}.bed
done

  110364 Meth1_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed
  126440 Meth2_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed
  124819 Meth3_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed
   31047 Meth4_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed
   30345 Meth5_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed
   26617 Meth6_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed
  258222 Meth7_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed
  213342 Meth8_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed
  255370 Meth9_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed


In [56]:
%%bash

for f in *bedgraph-sparseMeth
do
    awk '{print $1"\t"$2"\t"$3}' ${f} > ${f}.bed
    wc -l ${f}.bed
done

  367019 Meth1_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth.bed
  345887 Meth2_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth.bed
  385346 Meth3_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth.bed
  137700 Meth4_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth.bed
   64837 Meth5_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth.bed
   89246 Meth6_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth.bed
  296059 Meth7_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth.bed
   80086 Meth8_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth.bed
  337855 Meth9_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth.bed


In [57]:
%%bash

for f in *bedgraph-unMeth
do
    awk '{print $1"\t"$2"\t"$3}' ${f} > ${f}.bed
    wc -l ${f}.bed
done

 5068668 Meth1_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth.bed
 5886395 Meth2_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth.bed
 5356621 Meth3_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth.bed
 1666814 Meth4_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth.bed
 1356047 Meth5_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth.bed
 1401495 Meth6_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth.bed
 2086344 Meth7_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth.bed
  245580 Meth8_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth.bed
 2139382 Meth9_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth.bed


In [58]:
#Confirm BEDfile creation
!find *.bed

Meth1_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed
Meth1_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth.bed
Meth1_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth.bed
Meth1_R1_001_val_1_bismark_bt2_pe._5x.bedgraph.bed
Meth2_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed
Meth2_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth.bed
Meth2_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth.bed
Meth2_R1_001_val_1_bismark_bt2_pe._5x.bedgraph.bed
Meth3_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed
Meth3_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth.bed
Meth3_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth.bed
Meth3_R1_001_val_1_bismark_bt2_pe._5x.bedgraph.bed
Meth4_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed
Meth4_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth.bed
Meth4_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth.bed
Meth4_R1_001_val_1_bismark_bt2_pe._5x.bedgraph.bed
Meth5_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed
Meth5_R1_001_val_1_

In [60]:
#Confirm file creation
!head Meth1_R1_001_val_1_bismark_bt2_pe._5x.bedgraph.bed

scaffold1_cov55	102	104
scaffold1_cov55	105	107
scaffold1_cov55	116	118
scaffold1_cov55	119	121
scaffold1_cov55	146	148
scaffold1_cov55	186	188
scaffold1_cov55	194	196
scaffold2_cov51	649	651
scaffold2_cov51	686	688
scaffold2_cov51	778	780


#### 4b. Genes

In [57]:
%%bash

for f in *bed
do
  /usr/local/bin/intersectBed \
  -wb \
  -a ${f} \
  -b ../../../genome-feature-files/Pact.GFFannotation.Genes.gff \
  > ${f}-paGenes
done

In [58]:
#Check output
!head *paGenes

==> Meth1_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed-paGenes <==
scaffold7_cov100	4351	4353	scaffold7_cov100	AUGUSTUS	gene	3467	6217	0.78	-	.	g4
scaffold7_cov100	5500	5502	scaffold7_cov100	AUGUSTUS	gene	3467	6217	0.78	-	.	g4
scaffold7_cov100	5578	5580	scaffold7_cov100	AUGUSTUS	gene	3467	6217	0.78	-	.	g4
scaffold7_cov100	5986	5988	scaffold7_cov100	AUGUSTUS	gene	3467	6217	0.78	-	.	g4
scaffold7_cov100	6144	6146	scaffold7_cov100	AUGUSTUS	gene	3467	6217	0.78	-	.	g4
scaffold7_cov100	6188	6190	scaffold7_cov100	AUGUSTUS	gene	3467	6217	0.78	-	.	g4
scaffold7_cov100	6198	6200	scaffold7_cov100	AUGUSTUS	gene	3467	6217	0.78	-	.	g4
scaffold7_cov100	7438	7440	scaffold7_cov100	AUGUSTUS	gene	7069	9073	1	-	.	g5
scaffold7_cov100	7696	7698	scaffold7_cov100	AUGUSTUS	gene	7069	9073	1	-	.	g5
scaffold7_cov100	7796	7798	scaffold7_cov100	AUGUSTUS	gene	7069	9073	1	-	.	g5

==> Meth1_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth.bed-paGenes <==
scaffold7_cov100	1293	1295	scaffold7_cov100	AUGUSTUS	gene	

In [59]:
#Count number of overlaps
!wc -l *paGenes

   73959 Meth1_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed-paGenes
  157337 Meth1_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth.bed-paGenes
 2235696 Meth1_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth.bed-paGenes
 2466992 Meth1_R1_001_val_1_bismark_bt2_pe._5x.bedgraph.bed-paGenes
   85861 Meth2_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed-paGenes
  144292 Meth2_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth.bed-paGenes
 2531803 Meth2_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth.bed-paGenes
 2761956 Meth2_R1_001_val_1_bismark_bt2_pe._5x.bedgraph.bed-paGenes
   82377 Meth3_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed-paGenes
  161791 Meth3_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth.bed-paGenes
 2344110 Meth3_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth.bed-paGenes
 2588278 Meth3_R1_001_val_1_bismark_bt2_pe._5x.bedgraph.bed-paGenes
   13588 Meth4_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed-paGenes
   56290 Meth4_R1_001_val_1_bismark_bt2_pe

In [22]:
!wc -l *paGenes > Pact-5x-paGenes-counts.txt

#### 4c. Coding Sequences (CDS)

In [60]:
%%bash

for f in *bed
do
  /usr/local/bin/intersectBed \
  -wb \
  -a ${f} \
  -b ../../../genome-feature-files/Pact.GFFannotation.CDS.gff \
  > ${f}-paCDS
done

In [61]:
#Check output
!head *paCDS

==> Meth1_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed-paCDS <==
scaffold7_cov100	5500	5502	scaffold7_cov100	AUGUSTUS	CDS	5466	5540	1	-	2	transcript_id "g4.t1"; gene_id "g4";
scaffold7_cov100	5500	5502	scaffold7_cov100	AUGUSTUS	CDS	5466	5540	1	-	2	transcript_id "g4.t2"; gene_id "g4";
scaffold7_cov100	6144	6146	scaffold7_cov100	AUGUSTUS	CDS	6091	6217	0.48	-	0	transcript_id "g4.t1"; gene_id "g4";
scaffold7_cov100	6144	6146	scaffold7_cov100	AUGUSTUS	CDS	6091	6211	0.52	-	0	transcript_id "g4.t2"; gene_id "g4";
scaffold7_cov100	6188	6190	scaffold7_cov100	AUGUSTUS	CDS	6091	6217	0.48	-	0	transcript_id "g4.t1"; gene_id "g4";
scaffold7_cov100	6188	6190	scaffold7_cov100	AUGUSTUS	CDS	6091	6211	0.52	-	0	transcript_id "g4.t2"; gene_id "g4";
scaffold7_cov100	6198	6200	scaffold7_cov100	AUGUSTUS	CDS	6091	6217	0.48	-	0	transcript_id "g4.t1"; gene_id "g4";
scaffold7_cov100	6198	6200	scaffold7_cov100	AUGUSTUS	CDS	6091	6211	0.52	-	0	transcript_id "g4.t2"; gene_id "g4";
scaffold7_cov100	7696	7698	scaff

In [62]:
#Count number of overlaps
!wc -l *paCDS

   59188 Meth1_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed-paCDS
   89863 Meth1_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth.bed-paCDS
 1345289 Meth1_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth.bed-paCDS
 1494340 Meth1_R1_001_val_1_bismark_bt2_pe._5x.bedgraph.bed-paCDS
   66365 Meth2_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed-paCDS
   76868 Meth2_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth.bed-paCDS
 1477399 Meth2_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth.bed-paCDS
 1620632 Meth2_R1_001_val_1_bismark_bt2_pe._5x.bedgraph.bed-paCDS
   65245 Meth3_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed-paCDS
   89654 Meth3_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth.bed-paCDS
 1397816 Meth3_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth.bed-paCDS
 1552715 Meth3_R1_001_val_1_bismark_bt2_pe._5x.bedgraph.bed-paCDS
    9644 Meth4_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed-paCDS
   36616 Meth4_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth.b

In [23]:
!wc -l *paCDS > Pact-5x-paCDS-counts.txt

#### 4d. Introns

In [63]:
%%bash

for f in *bed
do
  /usr/local/bin/intersectBed \
  -wb \
  -a ${f} \
  -b ../../../genome-feature-files/Pact.GFFannotation.Intron.gff \
  > ${f}-paIntron
done

In [64]:
#Check output
!head *paIntron

==> Meth1_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed-paIntron <==
scaffold7_cov100	4351	4353	scaffold7_cov100	AUGUSTUS	intron	4181	4607	1	-	.	transcript_id "g4.t1"; gene_id "g4";
scaffold7_cov100	4351	4353	scaffold7_cov100	AUGUSTUS	intron	4181	4607	1	-	.	transcript_id "g4.t2"; gene_id "g4";
scaffold7_cov100	5578	5580	scaffold7_cov100	AUGUSTUS	intron	5541	6090	1	-	.	transcript_id "g4.t1"; gene_id "g4";
scaffold7_cov100	5578	5580	scaffold7_cov100	AUGUSTUS	intron	5541	6090	1	-	.	transcript_id "g4.t2"; gene_id "g4";
scaffold7_cov100	5986	5988	scaffold7_cov100	AUGUSTUS	intron	5541	6090	1	-	.	transcript_id "g4.t1"; gene_id "g4";
scaffold7_cov100	5986	5988	scaffold7_cov100	AUGUSTUS	intron	5541	6090	1	-	.	transcript_id "g4.t2"; gene_id "g4";
scaffold7_cov100	7438	7440	scaffold7_cov100	AUGUSTUS	intron	7104	7649	1	-	.	transcript_id "g5.t1"; gene_id "g5";
scaffold7_cov100	7438	7440	scaffold7_cov100	AUGUSTUS	intron	7104	7649	1	-	.	transcript_id "g5.t2"; gene_id "g5";
scaffold7_cov100	7796	7

In [65]:
#Count number of overlaps
!wc -l *paIntron

   41787 Meth1_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed-paIntron
  122271 Meth1_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth.bed-paIntron
 1676080 Meth1_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth.bed-paIntron
 1840138 Meth1_R1_001_val_1_bismark_bt2_pe._5x.bedgraph.bed-paIntron
   51567 Meth2_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed-paIntron
  117738 Meth2_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth.bed-paIntron
 1943990 Meth2_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth.bed-paIntron
 2113295 Meth2_R1_001_val_1_bismark_bt2_pe._5x.bedgraph.bed-paIntron
   47352 Meth3_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed-paIntron
  128122 Meth3_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth.bed-paIntron
 1770676 Meth3_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth.bed-paIntron
 1946150 Meth3_R1_001_val_1_bismark_bt2_pe._5x.bedgraph.bed-paIntron
    8446 Meth4_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed-paIntron
   38589 Meth4_R1_001_val_1_b

In [24]:
!wc -l *paIntron > Pact-5x-paIntron-counts.txt

#### 4e. Flanking regions

In [61]:
%%bash

for f in *bed
do
  /usr/local/bin/intersectBed \
  -wb \
  -a ${f} \
  -b ../../../genome-feature-files/Pact.GFFannotation.flanks.gff \
  > ${f}-paFlanks
done

In [62]:
#Check output
!head *paFlanks

==> Meth1_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed-paFlanks <==
scaffold7_cov100	6231	6233	scaffold7_cov100	AUGUSTUS	gene	6218	7068	0.78	-	.	g4
scaffold7_cov100	6231	6233	scaffold7_cov100	AUGUSTUS	gene	6218	7068	1	-	.	g5
scaffold7_cov100	6233	6235	scaffold7_cov100	AUGUSTUS	gene	6218	7068	0.78	-	.	g4
scaffold7_cov100	6233	6235	scaffold7_cov100	AUGUSTUS	gene	6218	7068	1	-	.	g5
scaffold7_cov100	19284	19286	scaffold7_cov100	AUGUSTUS	gene	19271	19311	0.96	+	.	g8
scaffold7_cov100	19284	19286	scaffold7_cov100	AUGUSTUS	gene	19271	19311	0.99	-	.	g9
scaffold7_cov100	19284	19286	scaffold7_cov100	AUGUSTUS	gene	19271	19311	0.74	+	.	g10
scaffold7_cov100	19284	19286	scaffold7_cov100	AUGUSTUS	gene	19271	19311	0.39	+	.	g11
scaffold7_cov100	19296	19298	scaffold7_cov100	AUGUSTUS	gene	19271	19311	0.96	+	.	g8
scaffold7_cov100	19296	19298	scaffold7_cov100	AUGUSTUS	gene	19271	19311	0.99	-	.	g9

==> Meth1_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth.bed-paFlanks <==
scaffold6_cov64	7373	7375	s

In [63]:
#Count number of overlaps
!wc -l *paFlanks

   28031 Meth1_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed-paFlanks
   97808 Meth1_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth.bed-paFlanks
 1317046 Meth1_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth.bed-paFlanks
 1442885 Meth1_R1_001_val_1_bismark_bt2_pe._5x.bedgraph.bed-paFlanks
   32259 Meth2_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed-paFlanks
   93054 Meth2_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth.bed-paFlanks
 1536795 Meth2_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth.bed-paFlanks
 1662108 Meth2_R1_001_val_1_bismark_bt2_pe._5x.bedgraph.bed-paFlanks
   31840 Meth3_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed-paFlanks
  102280 Meth3_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth.bed-paFlanks
 1392903 Meth3_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth.bed-paFlanks
 1527023 Meth3_R1_001_val_1_bismark_bt2_pe._5x.bedgraph.bed-paFlanks
    7491 Meth4_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed-paFlanks
   34079 Meth4_R1_001_val_1_b

In [64]:
!wc -l *paFlanks > Pact-5x-paFlanks-counts.txt

#### 4e. Intergenic

In [66]:
%%bash 

for f in *bed
do
  /usr/local/bin/intersectBed \
  -v \
  -a ${f} \
  -b ../../../genome-feature-files/Pact.GFFannotation.Genes.gff \
  > ${f}-paIntergenic
done

In [67]:
#Check output
!head *paIntergenic

==> Meth1_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed-paIntergenic <==
scaffold7_cov100	6231	6233
scaffold7_cov100	6233	6235
scaffold7_cov100	19284	19286
scaffold7_cov100	19296	19298
scaffold7_cov100	24494	24496
scaffold7_cov100	24509	24511
scaffold7_cov100	24557	24559
scaffold7_cov100	24617	24619
scaffold7_cov100	24895	24897
scaffold7_cov100	24941	24943

==> Meth1_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth.bed-paIntergenic <==
scaffold1_cov55	102	104
scaffold1_cov55	186	188
scaffold3_cov83	118	120
scaffold3_cov83	137	139
scaffold3_cov83	475	477
scaffold3_cov83	484	486
scaffold3_cov83	504	506
scaffold6_cov64	7373	7375
scaffold6_cov64	7983	7985
scaffold7_cov100	13275	13277

==> Meth1_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth.bed-paIntergenic <==
scaffold1_cov55	105	107
scaffold1_cov55	116	118
scaffold1_cov55	119	121
scaffold1_cov55	146	148
scaffold1_cov55	194	196
scaffold2_cov51	649	651
scaffold2_cov51	686	688
scaffold2_cov51	778	780

In [68]:
#Count number of overlaps
!wc -l *paIntergenic

   36461 Meth1_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed-paIntergenic
  209781 Meth1_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth.bed-paIntergenic
 2834593 Meth1_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth.bed-paIntergenic
 3080835 Meth1_R1_001_val_1_bismark_bt2_pe._5x.bedgraph.bed-paIntergenic
   40642 Meth2_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed-paIntergenic
  201712 Meth2_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth.bed-paIntergenic
 3356494 Meth2_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth.bed-paIntergenic
 3598848 Meth2_R1_001_val_1_bismark_bt2_pe._5x.bedgraph.bed-paIntergenic
   42507 Meth3_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-Meth.bed-paIntergenic
  223666 Meth3_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-sparseMeth.bed-paIntergenic
 3014184 Meth3_R1_001_val_1_bismark_bt2_pe._5x.bedgraph-unMeth.bed-paIntergenic
 3280357 Meth3_R1_001_val_1_bismark_bt2_pe._5x.bedgraph.bed-paIntergenic
   17473 Meth4_R1_001_val_1_bismark_bt2_pe._5x.bedgraph

In [25]:
!wc -l *paIntergenic > Pact-5x-paIntergenic-counts.txt