# Characterizing CpG Methylation

To describe general methylation trends in *C. virginica* gonad sequence data, irrespective of pCO<sub>2</sub> treatment, I need to characterize individual CpG loci. Gavery and Roberts (2013) and Olson and Roberts (2014) define a CpG locus as methylated if at least half of the reads remained unconverted after bisulfite treatment. I will use information in `.cov` files to identify methylated CpG loci.

1. Download coverage files
2. Limit to loci with at least 5x coverage
3. Concatenate 5x loci for all samples
4. Identify methylated loci
5. Characterize the location of methylated loci
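
The "at least half of the reads" rule can be written as a small check on the methylated and unmethylated read counts at a locus. This is a minimal sketch rather than the code used in this notebook; the function name `is_methylated` and the choice to return `False` for a locus with no reads are my own assumptions.

```python
def is_methylated(count_methylated, count_unmethylated, threshold=0.5):
    """Return True when at least `threshold` of the reads at a locus are methylated."""
    total = count_methylated + count_unmethylated
    if total == 0:
        # Assumption of this sketch: a locus with no reads gets no methylated call
        return False
    return count_methylated / total >= threshold


# 3 of 5 reads unconverted: methylated under the 50% rule
print(is_methylated(3, 2))
# 0 of 5 reads unconverted: not methylated
print(is_methylated(0, 5))
```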

## 0. Prepare for analyses

### 0a. Set working directory

In [2]:
pwd

'/Users/yaamini/Documents/yaamini-virginica/notebooks'

In [3]:
cd ../analyses/

/Users/yaamini/Documents/yaamini-virginica/analyses


In [3]:
!mkdir 2019-03-18-Characterizing-CpG-Methylation

In [4]:
cd 2019-03-18-Characterizing-CpG-Methylation/

/Users/yaamini/Documents/yaamini-virginica/analyses/2019-03-18-Characterizing-CpG-Methylation


## 1. Obtain coverage files

In [5]:
#Download files from gannet. The files will be downloaded in the same directory structure they are in online.
!wget -r -l1 --no-parent -A.deduplicated.bismark.cov.gz \
http://gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/

--2019-03-18 16:16:05--  http://gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/
Resolving gannet.fish.washington.edu... 128.95.149.52
Connecting to gannet.fish.washington.edu|128.95.149.52|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: 'gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/index.html'

gannet.fish.washing     [ <=>                ]  61.14K  --.-KB/s    in 0.001s  

2019-03-18 16:16:07 (45.1 MB/s) - 'gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/index.html' saved [62605]

Loading robots.txt; please ignore errors.
--2019-03-18 16:16:07--  http://gannet.fish.washington.edu/robots.txt
Reusing existing connection to gannet.fish.washington.edu:80.
HTTP request sent, awaiting response... 404 Not Found
2019-03-18 16:16:07 ERROR 404: Not Found.

Removing gann

In [6]:
#Move all files from gannet folder to the current directory
!mv gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/* .

In [7]:
#Confirm all files were moved
!ls

@eaDir
gannet.fish.washington.edu
zr2096_10_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz
zr2096_1_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz
zr2096_2_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz
zr2096_3_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz
zr2096_4_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz
zr2096_5_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz
zr2096_6_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz
zr2096_7_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz
zr2096_8_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz
zr2096_9_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz


In [8]:
#Remove the empty gannet directory
!rm -r gannet.fish.washington.edu

In [9]:
#Unzip the coverage files
!gunzip *cov.gz

In [10]:
#Confirm files were unzipped
!ls *cov

zr2096_10_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov
zr2096_1_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov
zr2096_2_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov
zr2096_3_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov
zr2096_4_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov
zr2096_5_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov
zr2096_6_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov
zr2096_7_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov
zr2096_8_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov
zr2096_9_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov


In [14]:
#Remove samples from high pCO2 treatment
!rm zr2096_6_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov \
zr2096_7_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov \
zr2096_8_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov \
zr2096_9_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov \
zr2096_10_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov

In [15]:
#Confirm file removal
!ls *cov

zr2096_1_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov
zr2096_2_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov
zr2096_3_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov
zr2096_4_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov
zr2096_5_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov


In [13]:
#See what the file looks like. 
#Columns: <chromosome> <start position> <end position> <methylation percentage> <count methylated> <count unmethylated>
!head -n 1 zr2096_10_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov

NC_007175.2	49	49	0	0	5


## 2. Limit to 5x coverage

For each coverage file, I want to retain only loci that have at least 5x coverage. Using `awk`, I'll add the methylated and unmethylated counts to get coverage. If that coverage is 5 or higher, I'll redirect that information into a new file.

In [18]:
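%%bash
#The first awk writes <chromosome> <position - 1> <position> <% methylation> <coverage>; the second keeps loci with at least 5x coverage.
#The second awk subtracts 1 from the start again, so position 49 in a coverage file is written as 47 48.
for f in *.cov
do
    awk '{print $1, $2-1, $2, $4, $5+$6}' ${f} | awk '{if ($5 >= 5) { print $1, $2-1, $2, $4 }}' \
> ${f}_5x.bedgraph
done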
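
For comparison, the same coverage filter can be sketched in Python. This is an illustration under my own assumptions, not the code run above: the function name `cov_to_bedgraph` is hypothetical, output is tab-delimited, and the start is computed once as position − 1, so position 49 becomes the interval 48 to 49. The first example line is the coverage record shown earlier; the second is made up to show a locus that fails the filter.

```python
def cov_to_bedgraph(lines, min_coverage=5):
    """Yield tab-delimited bedGraph lines for loci with at least `min_coverage` reads."""
    for line in lines:
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        chrom, start, _end, pct, meth, unmeth = fields[:6]
        coverage = int(meth) + int(unmeth)
        if coverage >= min_coverage:
            pos = int(start)
            # 1-based position -> 0-based start, 1-based end
            yield f"{chrom}\t{pos - 1}\t{pos}\t{pct}"


example = [
    "NC_007175.2\t49\t49\t0\t0\t5",   # record shown in the notebook output
    "NC_007175.2\t60\t60\t50\t1\t1",  # made-up record with 2x coverage
]
result = list(cov_to_bedgraph(example))
print(result)
```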

In [21]:
#Confirm files were created
!ls *5x.bedgraph

zr2096_1_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_5x.bedgraph
zr2096_2_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_5x.bedgraph
zr2096_3_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_5x.bedgraph
zr2096_4_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_5x.bedgraph
zr2096_5_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_5x.bedgraph


In [20]:
#Check columns for one of the files
!head zr2096_1_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_5x.bedgraph

NC_007175.2 1579 1580 0
NC_007175.2 2180 2181 0
NC_007175.2 3383 3384 0
NC_007175.2 3394 3395 0
NC_007175.2 5413 5414 0
NC_007175.2 5415 5416 0
NC_007175.2 5426 5427 0
NC_007175.2 11101 11102 0
NC_007175.2 12881 12882 0
NC_007175.2 12985 12986 20


## 3. Concatenate 5x loci for all samples

Now that I have the loci with at least 5x coverage in each control sample, I want to see which loci have 5x coverage across all samples. 

I will use a series of `join` commands to merge sample information. Since I only want to retain loci that have entries in all samples, I don't need to do any type of outer join. Since all loci have a defined start and stop position, I can merge samples by the start position: `-1 2 -2 2` joins on the second field of both files. `join` writes the join field first, so later merges match the first field of the merged file to the second field of the next sample (`-1 1 -2 2`).
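
`join` matches on a single field and expects both inputs to be sorted on that field. Because start positions can repeat across chromosomes, a join on position alone can pair loci from different chromosomes. As a point of comparison, here is a minimal Python sketch of an inner join keyed on both chromosome and start. The helper names `read_bedgraph` and `inner_join_loci` are hypothetical, and the `NC_035780.1` line in the example is made up to show a locus that is dropped because it is missing from one sample.

```python
def read_bedgraph(lines):
    """Map (chromosome, start) -> percent methylation for one sample."""
    loci = {}
    for line in lines:
        chrom, start, _end, pct = line.split()
        loci[(chrom, int(start))] = float(pct)
    return loci


def inner_join_loci(samples):
    """Keep only loci present in every sample; values are listed in sample order."""
    shared = set(samples[0])
    for sample in samples[1:]:
        shared &= set(sample)
    return {key: [sample[key] for sample in samples] for key in sorted(shared)}


sample_2 = read_bedgraph([
    "NC_007175.2 47 48 0",
    "NC_007175.2 49 50 0",
    "NC_035780.1 47 48 90",  # made-up locus, absent from sample 3
])
sample_3 = read_bedgraph([
    "NC_007175.2 47 48 12.5",
    "NC_007175.2 49 50 10",
])
merged = inner_join_loci([sample_2, sample_3])
for (chrom, start), values in merged.items():
    print(chrom, start, start + 1, *values)
```

Keying on the pair also removes the need to sort the inputs before merging.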

In [27]:
#Merge information from samples 2 and 3
!join -1 2 -2 2 zr2096_2_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_5x.bedgraph \
zr2096_3_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_5x.bedgraph \
> 2019-03-18-S2-S3-5x-CpG-Loci.bedgraph

In [28]:
#Confirm merge occurred
!head 2019-03-18-S2-S3-5x-CpG-Loci.bedgraph

47 NC_007175.2 48 0 NC_007175.2 48 12.5
49 NC_007175.2 50 0 NC_007175.2 50 10
86 NC_007175.2 87 0 NC_007175.2 87 0
87 NC_007175.2 88 0 NC_007175.2 88 0
145 NC_007175.2 146 3.7037037037037 NC_007175.2 146 0
191 NC_007175.2 192 3.125 NC_007175.2 192 0
244 NC_007175.2 245 2.53164556962025 NC_007175.2 245 0
255 NC_007175.2 256 3.2258064516129 NC_007175.2 256 0
262 NC_007175.2 263 0 NC_007175.2 263 0
264 NC_007175.2 265 0.909090909090909 NC_007175.2 265 4.16666666666667


In [32]:
#Merge sample 4 with 2 and 3
!join -1 1 -2 2 2019-03-18-S2-S3-5x-CpG-Loci.bedgraph \
zr2096_4_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_5x.bedgraph \
> 2019-03-18-S2-S3-S4-5x-CpG-Loci.bedgraph

In [33]:
#Confirm merge occurred
!head 2019-03-18-S2-S3-S4-5x-CpG-Loci.bedgraph

47 NC_007175.2 48 0 NC_007175.2 48 12.5 NC_007175.2 48 0
49 NC_007175.2 50 0 NC_007175.2 50 10 NC_007175.2 50 0
86 NC_007175.2 87 0 NC_007175.2 87 0 NC_007175.2 87 0
87 NC_007175.2 88 0 NC_007175.2 88 0 NC_007175.2 88 0
145 NC_007175.2 146 3.7037037037037 NC_007175.2 146 0 NC_007175.2 146 1.66666666666667
191 NC_007175.2 192 3.125 NC_007175.2 192 0 NC_007175.2 192 0
244 NC_007175.2 245 2.53164556962025 NC_007175.2 245 0 NC_007175.2 245 0
255 NC_007175.2 256 3.2258064516129 NC_007175.2 256 0 NC_007175.2 256 2.89855072463768
262 NC_007175.2 263 0 NC_007175.2 263 0 NC_007175.2 263 1.20481927710843
264 NC_007175.2 265 0.909090909090909 NC_007175.2 265 4.16666666666667 NC_007175.2 265 2.4390243902439


In [34]:
#Merge sample 5 with 2-4
!join -1 1 -2 2 2019-03-18-S2-S3-S4-5x-CpG-Loci.bedgraph \
zr2096_5_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_5x.bedgraph \
> 2019-03-18-S2-S3-S4-S5-5x-CpG-Loci.bedgraph

In [35]:
#Confirm merge occurred
!head 2019-03-18-S2-S3-S4-S5-5x-CpG-Loci.bedgraph

47 NC_007175.2 48 0 NC_007175.2 48 12.5 NC_007175.2 48 0 NC_007175.2 48 1.66666666666667
49 NC_007175.2 50 0 NC_007175.2 50 10 NC_007175.2 50 0 NC_007175.2 50 7.8125
86 NC_007175.2 87 0 NC_007175.2 87 0 NC_007175.2 87 0 NC_007175.2 87 1.37931034482759
87 NC_007175.2 88 0 NC_007175.2 88 0 NC_007175.2 88 0 NC_007175.2 88 1.28205128205128
145 NC_007175.2 146 3.7037037037037 NC_007175.2 146 0 NC_007175.2 146 1.66666666666667 NC_007175.2 146 1.55038759689922
191 NC_007175.2 192 3.125 NC_007175.2 192 0 NC_007175.2 192 0 NC_007175.2 192 2.85714285714286
244 NC_007175.2 245 2.53164556962025 NC_007175.2 245 0 NC_007175.2 245 0 NC_007175.2 245 3.05676855895197
255 NC_007175.2 256 3.2258064516129 NC_007175.2 256 0 NC_007175.2 256 2.89855072463768 NC_007175.2 256 0
262 NC_007175.2 263 0 NC_007175.2 263 0 NC_007175.2 263 1.20481927710843 NC_007175.2 263 1.29032258064516
264 NC_007175.2 265 0.909090909090909 NC_007175.2 265 4.16666666666667 NC_007175.2 265 2.4390243902439 NC_007175.2 265 0.

In [42]:
#Merge sample 1 with 2-5
!join -1 1 -2 2 2019-03-18-S2-S3-S4-S5-5x-CpG-Loci.bedgraph \
zr2096_1_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_5x.bedgraph \
> 2019-03-18-S2-S3-S4-S5-S1-5x-CpG-Loci.bedgraph

In [49]:
#Confirm merge occurred
!head 2019-03-18-S2-S3-S4-S5-S1-5x-CpG-Loci.bedgraph

9998181 NC_035780.1 9998182 85.7142857142857 NC_035780.1 9998182 86.8421052631579 NC_035780.1 9998182 93.75 NC_035780.1 9998182 84.6153846153846 NC_035780.1 9998182 88.8888888888889
9998210 NC_035780.1 9998211 88.8888888888889 NC_035780.1 9998211 90.9090909090909 NC_035780.1 9998211 83.3333333333333 NC_035780.1 9998211 90 NC_035780.1 9998211 83.3333333333333
9999496 NC_035780.1 9999497 87.5 NC_035780.1 9999497 80 NC_035780.1 9999497 100 NC_035780.1 9999497 61.5384615384615 NC_035780.1 9999497 100
9999553 NC_035780.1 9999554 86.6666666666667 NC_035780.1 9999554 50 NC_035780.1 9999554 85.7142857142857 NC_035780.1 9999554 90 NC_035780.1 9999554 81.8181818181818
9999564 NC_035780.1 9999565 100 NC_035780.1 9999565 55.5555555555556 NC_035780.1 9999565 88.8888888888889 NC_035780.1 9999565 87.8787878787879 NC_035780.1 9999565 91.6666666666667
9999607 NC_035780.1 9999608 91.6666666666667 NC_035780.1 9999608 63.6363636363636 NC_035780.1 9999608 60 NC_035780.1 9999608 80 NC_035780.1 9999608 

In [47]:
#Remove fields with redundant information
!awk '{print $2, $1, $3, $16, $4, $7, $10, $13}' 2019-03-18-S2-S3-S4-S5-S1-5x-CpG-Loci.bedgraph \
> 2019-03-18-Control-5x-CpG-Loci.bedgraph

In [48]:
#Confirm file changes
#Columns: <chromosome> <start> <stop> <sample 1 %methylation> <sample 2> <sample 3> <sample 4> <sample 5>
!head 2019-03-18-Control-5x-CpG-Loci.bedgraph

NC_035780.1 9998181 9998182 88.8888888888889 85.7142857142857 86.8421052631579 93.75 84.6153846153846
NC_035780.1 9998210 9998211 83.3333333333333 88.8888888888889 90.9090909090909 83.3333333333333 90
NC_035780.1 9999496 9999497 100 87.5 80 100 61.5384615384615
NC_035780.1 9999553 9999554 81.8181818181818 86.6666666666667 50 85.7142857142857 90
NC_035780.1 9999564 9999565 91.6666666666667 100 55.5555555555556 88.8888888888889 87.8787878787879
NC_035780.1 9999607 9999608 100 91.6666666666667 63.6363636363636 60 80
NC_035780.1 9999618 9999619 71.4285714285714 94.1176470588235 75 85.1851851851852 95.1219512195122
NC_035780.1 9999672 9999673 96.2962962962963 86.046511627907 90.9090909090909 84.375 88.8888888888889
NC_035780.1 9999700 9999701 97.2222222222222 88.135593220339 11.1111111111111 95.6521739130435 84.9557522123894
NC_035780.1 9999753 9999754 84.6153846153846 86.9565217391304 31.25 78.5714285714286 80


In [52]:
#Count number of loci
!wc -l 2019-03-18-Control-5x-CpG-Loci.bedgraph

   63827 2019-03-18-Control-5x-CpG-Loci.bedgraph


## 4. Identify methylated loci

Olson and Roberts (2014) define the following categories for CpG methylation:

- Methylated (50% methylation and above)
- Sparsely methylated (0-50% methylated)
- Unmethylated (0% methylation)

I will slightly modify this since I have multiple samples:

- Methylated (50% methylation and above)
- Sparsely methylated (10-50% methylated)
- Unmethylated (10% methylation and below)

With five control samples, I can sum the percent methylation columns from each sample and scale the cutoffs by five (5 × 50% = 250; 5 × 10% = 50) to identify methylated CpG loci:

- Methylated (sum ≥ 250)
- Sparsely methylated (50 < sum < 250)
- Unmethylated (sum ≤ 50)
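
The summed cutoffs can be sketched as a small classifier. This is an illustration, not the `awk` code used below: the function name `classify_locus` is hypothetical, and the cutoffs are scaled by the number of samples so that five samples give 250 and 50. The first example uses the per-sample percentages of the first locus in the merged control file; the sparsely methylated example is made up.

```python
def classify_locus(percentages, methylated_cutoff=50.0, unmethylated_cutoff=10.0):
    """Classify a locus from per-sample percent methylation using summed cutoffs."""
    n_samples = len(percentages)
    total = sum(percentages)
    if total >= methylated_cutoff * n_samples:
        return "methylated"
    if total <= unmethylated_cutoff * n_samples:
        return "unmethylated"
    return "sparsely methylated"


# First locus in the merged control file (sum is roughly 439.81)
first_locus = [88.8888888888889, 85.7142857142857, 86.8421052631579, 93.75, 84.6153846153846]
print(classify_locus(first_locus))
# Made-up locus with a sum of 160
print(classify_locus([20, 30, 40, 10, 60]))
# A locus with 0% methylation in every sample
print(classify_locus([0, 0, 0, 0, 0]))
```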

### 4a. Methylated loci

In [77]:
%%bash
awk '{print $1, $2, $3, $4+$5+$6+$7+$8}' 2019-03-18-Control-5x-CpG-Loci.bedgraph \
| awk '{if ($4 >= 250) { print $1, $2, $3, $4+$5+$6+$7+$8 }}' \
> 2019-03-18-Control-5x-CpG-Loci-Methylated.bedgraph

In [78]:
#Confirm methylated loci were saved
!head 2019-03-18-Control-5x-CpG-Loci-Methylated.bedgraph

NC_035780.1 9998181 9998182 439.811
NC_035780.1 9998210 9998211 436.465
NC_035780.1 9999496 9999497 429.038
NC_035780.1 9999553 9999554 394.199
NC_035780.1 9999564 9999565 423.99
NC_035780.1 9999607 9999608 395.303
NC_035780.1 9999618 9999619 420.853
NC_035780.1 9999672 9999673 446.516
NC_035780.1 9999700 9999701 377.077
NC_035780.1 9999753 9999754 361.393


In [91]:
#Count methylated loci
!wc -l 2019-03-18-Control-5x-CpG-Loci-Methylated.bedgraph

   60552 2019-03-18-Control-5x-CpG-Loci-Methylated.bedgraph


### 4b. Sparsely methylated loci

In [80]:
%%bash
awk '{print $1, $2, $3, $4+$5+$6+$7+$8}' 2019-03-18-Control-5x-CpG-Loci.bedgraph \
| awk '{if ($4 < 250) { print $1, $2, $3, $4+$5+$6+$7+$8 }}' \
| awk '{if ($4 > 50) { print $1, $2, $3, $4+$5+$6+$7+$8 }}' \
> 2019-03-18-Control-5x-CpG-Loci-Sparsely-Methylated.bedgraph

In [81]:
#Confirm sparsely methylated loci were saved
!head 2019-03-18-Control-5x-CpG-Loci-Sparsely-Methylated.bedgraph

NC_035782.1 10000240 10000241 163.835
NC_035782.1 10000241 10000242 211.057
NC_035782.1 10000258 10000259 139.353
NC_035782.1 10000259 10000260 159.116
NC_035782.1 10000290 10000291 162.222
NC_035782.1 10000291 10000292 152.043
NC_035782.1 10000460 10000461 93.1705
NC_035782.1 10000529 10000530 159.852
NC_035782.1 10000810 10000811 82.5068
NC_035782.1 10000954 10000955 57.2039


In [90]:
#Count sparsely methylated loci
!wc -l 2019-03-18-Control-5x-CpG-Loci-Sparsely-Methylated.bedgraph

    2796 2019-03-18-Control-5x-CpG-Loci-Sparsely-Methylated.bedgraph


### 4c. Unmethylated loci

In [83]:
%%bash
awk '{print $1, $2, $3, $4+$5+$6+$7+$8}' 2019-03-18-Control-5x-CpG-Loci.bedgraph \
| awk '{if ($4 <= 50) { print $1, $2, $3, $4+$5+$6+$7+$8 }}' \
> 2019-03-18-Control-5x-CpG-Loci-Unmethylated.bedgraph

In [84]:
#Confirm unmethylated loci were saved
!head 2019-03-18-Control-5x-CpG-Loci-Unmethylated.bedgraph

NC_035782.1 10001428 10001429 0
NC_035782.1 10001429 10001430 14.2857
NC_035782.1 10001448 10001449 0
NC_035782.1 10001449 10001450 20.5357
NC_035782.1 10001473 10001474 9.09091
NC_035782.1 10001475 10001476 20.202
NC_035782.1 10013203 10013204 0
NC_035782.1 10020330 10020331 8.33333
NC_035782.1 10020331 10020332 11.1111
NC_035782.1 10020475 10020476 25.0256


In [89]:
#Count unmethylated loci
!wc -l 2019-03-18-Control-5x-CpG-Loci-Unmethylated.bedgraph

     479 2019-03-18-Control-5x-CpG-Loci-Unmethylated.bedgraph


## 5. Location of methylated loci

My final step is to characterize the location of methylated loci in the genome. I will use `intersectBed` to find overlaps between methylated loci and exons, introns, mRNA coding regions, transposable elements, and putative promoter regions.
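
To make the overlap step concrete, here is a minimal Python sketch of what an overlap report does: it returns each locus once if it overlaps any feature on the same chromosome, which is the behavior of the `-u` option of `intersectBed`. The function names are hypothetical, intervals are treated as half-open (0-based start, exclusive end), and the exon coordinates are made up; only the two loci come from the methylated loci file.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True when two half-open intervals [start, end) share at least one base."""
    return a_start < b_end and b_start < a_end


def loci_overlapping_features(loci, features):
    """Return each locus once if it overlaps any feature on the same chromosome."""
    hits = []
    for chrom, start, end in loci:
        if any(
            f_chrom == chrom and overlaps(start, end, f_start, f_end)
            for f_chrom, f_start, f_end in features
        ):
            hits.append((chrom, start, end))
    return hits


loci = [("NC_035780.1", 9998181, 9998182), ("NC_035780.1", 9999496, 9999497)]
features = [("NC_035780.1", 9998000, 9998500)]  # hypothetical exon
hits = loci_overlapping_features(loci, features)
print(len(hits), "methylated loci overlap with the example feature")
```

This brute-force comparison is only meant to show the logic; bedtools is the better choice for genome-scale files.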

### 5a. Create `.bed` file

In [99]:
%%bash
awk '{print $1, $2, $3}' 2019-03-18-Control-5x-CpG-Loci-Methylated.bedgraph \
> 2019-03-18-Control-5x-CpG-Loci-Methylated.bed

In [100]:
#Confirm file creation
!head 2019-03-18-Control-5x-CpG-Loci-Methylated.bed

NC_035780.1 9998181 9998182
NC_035780.1 9998210 9998211
NC_035780.1 9999496 9999497
NC_035780.1 9999553 9999554
NC_035780.1 9999564 9999565
NC_035780.1 9999607 9999608
NC_035780.1 9999618 9999619
NC_035780.1 9999672 9999673
NC_035780.1 9999700 9999701
NC_035780.1 9999753 9999754
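
bedtools reads BED files as tab-delimited text, while `awk` separates printed fields with a single space by default, so the file above is space-delimited. The "non positional records" message that bedtools reports for this file is commonly a sign of that mismatch. A minimal sketch of a converter, assuming the first three whitespace-separated fields are chromosome, start, and end (the function name `to_tab_delimited` is hypothetical):

```python
def to_tab_delimited(lines, n_fields=3):
    """Yield the first `n_fields` whitespace-separated fields joined by tabs."""
    for line in lines:
        fields = line.split()
        if len(fields) >= n_fields:
            yield "\t".join(fields[:n_fields])


# First two lines of the methylated loci bedgraph
example = [
    "NC_035780.1 9998181 9998182 439.811",
    "NC_035780.1 9998210 9998211 436.465",
]
bed_lines = list(to_tab_delimited(example))
print(bed_lines)
```

In `awk`, the equivalent change would be setting the output field separator to a tab with `BEGIN {OFS="\t"}`.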


### 5a. Set variable paths

In [69]:
bedtoolsDirectory = "/Users/Shared/bioinformatics/bedtools2/bin/"

In [94]:
methylatedLoci = "2019-03-18-Control-5x-CpG-Loci-Methylated.bed"

In [71]:
exonList = "../2018-11-01-DML-and-DMR-Analysis/C_virginica-3.0_Gnomon_exon.bed"

In [72]:
intronList = "../2018-11-01-DML-and-DMR-Analysis/C_virginica-3.0_intron.bed"

In [73]:
mRNAList = "../2018-11-01-DML-and-DMR-Analysis/C_virginica-3.0_Gnomon_mRNA.gff3"

In [74]:
transposableElementsCg = "../2018-11-01-DML-and-DMR-Analysis/C_virginica-3.0_TE-Cg.gff"

In [75]:
putativePromoters = "../2018-11-01-DML-and-DMR-Analysis/2018-11-14-Flanking-Analysis/2018-11-15-mRNA-Upstream-Flanks.bed"

### 5b. Exons

In [101]:
! {bedtoolsDirectory}intersectBed \
-u \
-a {methylatedLoci} \
-b {exonList} \
| wc -l
!echo "methylated loci overlaps with exons"

ERROR: file 2019-03-18-Control-5x-CpG-Loci-Methylated.bed has non positional records, which are only valid for the groupBy tool.

Tool:    bedtools intersect (aka intersectBed)
Version: v2.26.0
Summary: Report overlaps between two feature files.

Usage:   bedtools intersect [OPTIONS] -a <bed/gff/vcf/bam> -b <bed/gff/vcf/bam>

	Note: -b may be followed with multiple databases and/or 
	wildcard (*) character(s). 
Options: 
	-wa	Write the original entry in A for each overlap.

	-wb	Write the original entry in B for each overlap.
		- Useful for knowing _what_ A overlaps. Restricted by -f and -r.

	-loj	Perform a "left outer join". That is, for each feature in A
		report each overlap with B.  If no overlaps are found, 
		report a NULL feature for B.

	-wo	Write the original A and B entries plus the number of base
		pairs of overlap between the two features.
		- Overlaps restricted by -f and -r.
		  Only A features with overlap are reported.

	-wao	Write the original A and B entries plus the

In [None]:
! {bedtoolsDirectory}intersectBed \
-wb \
-a {methylatedLoci} \
-b {exonList} \
> 2019-03-18-MethLoci-Exon.txt

In [None]:
!head 2019-03-18-MethLoci-Exon.txt

### 5c. Introns

In [98]:
! {bedtoolsDirectory}intersectBed \
-u \
-a {methylatedLoci} \
-b {intronList} \
| wc -l
!echo "methylated loci overlaps with introns"

ERROR: file 2019-03-18-Control-5x-CpG-Loci-Methylated.bed has non positional records, which are only valid for the groupBy tool.


In [None]:
! {bedtoolsDirectory}intersectBed \
-wb \
-a {methylatedLoci} \
-b {intronList} \
> 2019-03-18-MethLoci-Intron.txt

In [None]:
!head 2019-03-18-MethLoci-Intron.txt

### 5d. mRNA

In [98]:
! {bedtoolsDirectory}intersectBed \
-u \
-a {methylatedLoci} \
-b {mRNAList} \
| wc -l
!echo "methylated loci overlaps with mRNA coding regions"

ERROR: file 2019-03-18-Control-5x-CpG-Loci-Methylated.bed has non positional records, which are only valid for the groupBy tool.


In [None]:
! {bedtoolsDirectory}intersectBed \
-wb \
-a {methylatedLoci} \
-b {mRNAList} \
> 2019-03-18-MethLoci-mRNA.txt

In [None]:
!head 2019-03-18-MethLoci-mRNA.txt

In [None]:
! cut -f14 2019-03-18-MethLoci-mRNA.txt| sort | uniq -c > 2019-03-18-Unique-Genes-in-MethLoci-mRNA-Overlap.txt

In [None]:
!head 2019-03-18-Unique-Genes-in-MethLoci-mRNA-Overlap.txt

In [None]:
!wc -l 2019-03-18-Unique-Genes-in-MethLoci-mRNA-Overlap.txt
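
The `cut | sort | uniq -c` pipeline tallies how often each distinct value appears in column 14 of the overlap file. A rough Python equivalent is sketched below; the function name `count_column_values` is hypothetical, the example rows are made up, and I have not confirmed what column 14 contains in the real output.

```python
from collections import Counter


def count_column_values(lines, column=14):
    """Tally distinct values in a 1-based, tab-delimited column."""
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= column:
            counts[fields[column - 1]] += 1
    return counts


# Made-up rows with 14 tab-delimited fields; only the last field matters here
rows = [
    "\t".join(["x"] * 13 + ["gene-A"]),
    "\t".join(["x"] * 13 + ["gene-B"]),
    "\t".join(["x"] * 13 + ["gene-A"]),
]
gene_counts = count_column_values(rows)
for gene, n in sorted(gene_counts.items()):
    print(n, gene)
print(len(gene_counts), "unique values")
```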

### 5e. Transposable elements (*C. gigas* only)

In [98]:
! {bedtoolsDirectory}intersectBed \
-u \
-a {methylatedLoci} \
-b {transposableElementsCg} \
| wc -l
!echo "methylated loci overlaps with transposable elements (Cg)"

ERROR: file 2019-03-18-Control-5x-CpG-Loci-Methylated.bed has non positional records, which are only valid for the groupBy tool.


In [None]:
! {bedtoolsDirectory}intersectBed \
-wb \
-a {methylatedLoci} \
-b {transposableElementsCg} \
> 2019-03-18-MethLoci-TE-Cg.txt

In [None]:
!head 2019-03-18-MethLoci-TE-Cg.txt

### 5f. Putative promoters

In [98]:
! {bedtoolsDirectory}intersectBed \
-u \
-a {methylatedLoci} \
-b {putativePromoters} \
| wc -l
!echo "methylated loci overlaps with putative promoters"

ERROR: file 2019-03-18-Control-5x-CpG-Loci-Methylated.bed has non positional records, which are only valid for the groupBy tool.


In [None]:
! {bedtoolsDirectory}intersectBed \
-wb \
-a {methylatedLoci} \
-b {putativePromoters} \
> 2019-03-18-MethLoci-Putative-Promoters.txt

In [None]:
!head 2019-03-18-MethLoci-Putative-Promoters.txt