# Characterizing CpG Methylation

I'll describe general methylation trends in the geoduck genome using a concatenated file of CpG loci with 10x coverage from all samples. I'll also generate methylation islands for the geoduck genome based on a Perl script from Jeong et al. (2018).

1. Obtain concatenated coverage file
2. Characterize methylation levels for each CpG dinucleotide
3. Determine genomic locations for CpGs
4. Generate methylation islands

## 1. Obtain concatenated coverage file

In [20]:
#Download from gannet
!wget https://gannet.fish.washington.edu/seashell/bu-mox/scrubbed/0102/Pg_val_1_bismark_bt2_pe._10x.bedgraph

--2020-03-09 10:31:27--  https://gannet.fish.washington.edu/seashell/bu-mox/scrubbed/0102/Pg_val_1_bismark_bt2_pe._10x.bedgraph
Resolving gannet.fish.washington.edu... 128.95.149.52
Connecting to gannet.fish.washington.edu|128.95.149.52|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 39782030 (38M)
Saving to: ‘Pg_val_1_bismark_bt2_pe._10x.bedgraph’


2020-03-09 10:31:30 (13.2 MB/s) - ‘Pg_val_1_bismark_bt2_pe._10x.bedgraph’ saved [39782030/39782030]

FINISHED --2020-03-09 10:31:30--
Total wall clock time: 3.0s
Downloaded: 1 files, 38M in 2.9s (13.2 MB/s)


In [21]:
#Move to Data folder
!mv *bedgraph ../Data/

In [23]:
#Confirm file was moved
!ls ../Data/*bedgraph

../Data/Pg_val_1_bismark_bt2_pe._10x.bedgraph


In [26]:
#Columns: chr, start, end, %meth
!head ../Data/Pg_val_1_bismark_bt2_pe._10x.bedgraph

Scaffold_01	53	55	3.125000
Scaffold_01	71	73	2.777778
Scaffold_01	95	97	2.040816
Scaffold_01	118	120	0.000000
Scaffold_01	192	194	0.000000
Scaffold_01	201	203	0.000000
Scaffold_01	208	210	0.000000
Scaffold_01	212	214	0.000000
Scaffold_01	220	222	0.000000
Scaffold_01	237	239	1.315789
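
A minimal Python sketch of reading one line of this file into a typed record, assuming the four whitespace-separated columns listed above (chr, start, end, %meth). `CpGSite` and `parse_bedgraph_line` are hypothetical names for illustration, not part of any library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CpGSite:
    """One CpG locus from a four-column bedGraph (hypothetical helper type)."""
    chrom: str
    start: int
    end: int
    percent_meth: float


def parse_bedgraph_line(line: str) -> CpGSite:
    """Parse 'chr start end %meth' into a CpGSite."""
    # str.split() with no argument splits on any run of whitespace, so this
    # accepts tab-delimited lines as well as space-delimited ones
    chrom, start, end, pct = line.split()[:4]
    return CpGSite(chrom, int(start), int(end), float(pct))


print(parse_bedgraph_line("Scaffold_01\t237\t239\t1.315789"))
# CpGSite(chrom='Scaffold_01', start=237, end=239, percent_meth=1.315789)
```

Splitting on any whitespace rather than only on tabs keeps the parser working on the space-delimited files that the `awk` commands in section 2 write.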


In [24]:
!wc -l ../Data/Pg_val_1_bismark_bt2_pe._10x.bedgraph
!echo "CpG loci with 10x coverage"

 1016980 ../Data/Pg_val_1_bismark_bt2_pe._10x.bedgraph
CpG loci with 10x coverage


## 2. Characterize methylation level for each CpG dinucleotide

- methylated: ≥ 50% methylated
- sparsely methylated: > 10% and < 50% methylated
- unmethylated: ≤ 10% methylated
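
These cut-offs can be written as a small Python function. The sketch uses the same boundaries as the `awk` commands in 2a to 2c (exactly 50% counts as methylated, exactly 10% counts as unmethylated), so every locus lands in exactly one class. `classify_cpg` is a hypothetical name.

```python
def classify_cpg(percent_meth: float) -> str:
    """Return the methylation class for a CpG's percent methylation."""
    if percent_meth >= 50:  # same test as 2a: $4 >= 50
        return "methylated"
    if percent_meth > 10:   # same test as 2b: 10 < $4 < 50
        return "sparsely methylated"
    return "unmethylated"   # same test as 2c: $4 <= 10


for pct in (0.0, 10.0, 12.658228, 50.0, 77.083333):
    print(pct, classify_cpg(pct))
```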

### 2a. Methylated CpGs

In [27]:
#If percent methylation is greater than or equal to 50, save the locus information
!awk '{if ($4 >= 50) { print $1, $2, $3, $4 }}' ../Data/Pg_val_1_bismark_bt2_pe._10x.bedgraph \
> ../Data/Pg_val_1_bismark_bt2_pe._10x-Methylated.bedgraph

In [28]:
!head ../Data/Pg_val_1_bismark_bt2_pe._10x-Methylated.bedgraph

Scaffold_01 11797 11799 50.000000
Scaffold_01 11838 11840 50.000000
Scaffold_01 11843 11845 50.000000
Scaffold_01 11846 11848 50.000000
Scaffold_01 11851 11853 55.555556
Scaffold_01 12029 12031 56.250000
Scaffold_01 51414 51416 56.834532
Scaffold_01 51426 51428 58.333333
Scaffold_01 51470 51472 55.072464
Scaffold_01 51563 51565 77.083333


In [29]:
!wc -l ../Data/Pg_val_1_bismark_bt2_pe._10x-Methylated.bedgraph
!echo "methylated CpGs"

  310808 ../Data/Pg_val_1_bismark_bt2_pe._10x-Methylated.bedgraph
methylated CpGs


### 2b. Sparsely methylated CpGs

In [30]:
%%bash
#If percent methylation is greater than 10 and less than 50, save the locus information
awk '{if ($4 > 10 && $4 < 50) { print $1, $2, $3, $4 }}' ../Data/Pg_val_1_bismark_bt2_pe._10x.bedgraph \
> ../Data/Pg_val_1_bismark_bt2_pe._10x-Sparsely-Methylated.bedgraph

In [31]:
!head ../Data/Pg_val_1_bismark_bt2_pe._10x-Sparsely-Methylated.bedgraph

Scaffold_01 617 619 15.384615
Scaffold_01 2962 2964 12.658228
Scaffold_01 7518 7520 12.000000
Scaffold_01 8343 8345 17.073171
Scaffold_01 8347 8349 15.384615
Scaffold_01 8352 8354 22.448980
Scaffold_01 8358 8360 31.506849
Scaffold_01 8366 8368 34.426230
Scaffold_01 8381 8383 26.363636
Scaffold_01 8385 8387 23.931624


In [33]:
!wc -l ../Data/Pg_val_1_bismark_bt2_pe._10x-Sparsely-Methylated.bedgraph
!echo "sparsely methylated CpGs"

   92368 ../Data/Pg_val_1_bismark_bt2_pe._10x-Sparsely-Methylated.bedgraph
sparsely methylated CpGs


### 2c. Unmethylated CpGs

In [34]:
#If percent methylation is less than or equal to 10, save the locus information
!awk '{if ($4 <= 10) { print $1, $2, $3, $4 }}' ../Data/Pg_val_1_bismark_bt2_pe._10x.bedgraph \
> ../Data/Pg_val_1_bismark_bt2_pe._10x-Unmethylated.bedgraph
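
Cells 2a to 2c read the full bedGraph three times, once per class. As an alternative sketch (not what this notebook runs), Python can route every locus to its class in a single pass using the same cut-offs. `split_by_class` is a hypothetical helper; it joins fields with tabs, whereas `awk`'s `print $1, $2, ...` separates them with spaces by default.

```python
from collections import defaultdict


def classify_cpg(percent_meth: float) -> str:
    """Same cut-offs as the awk filters: >= 50, between 10 and 50, <= 10."""
    if percent_meth >= 50:
        return "methylated"
    if percent_meth > 10:
        return "sparsely methylated"
    return "unmethylated"


def split_by_class(lines):
    """Group bedGraph lines by methylation class in one pass."""
    groups = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        groups[classify_cpg(float(fields[3]))].append("\t".join(fields[:4]))
    return dict(groups)


sample = [
    "Scaffold_01\t53\t55\t3.125000",
    "Scaffold_01\t617\t619\t15.384615",
    "Scaffold_01\t11797\t11799\t50.000000",
]
for label, rows in split_by_class(sample).items():
    print(label, len(rows))
```

For the real file, pass an open file handle instead of `sample` and write each list to its own output file.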

In [35]:
!head ../Data/Pg_val_1_bismark_bt2_pe._10x-Unmethylated.bedgraph

Scaffold_01 53 55 3.125000
Scaffold_01 71 73 2.777778
Scaffold_01 95 97 2.040816
Scaffold_01 118 120 0.000000
Scaffold_01 192 194 0.000000
Scaffold_01 201 203 0.000000
Scaffold_01 208 210 0.000000
Scaffold_01 212 214 0.000000
Scaffold_01 220 222 0.000000
Scaffold_01 237 239 1.315789


In [37]:
!wc -l ../Data/Pg_val_1_bismark_bt2_pe._10x-Unmethylated.bedgraph
!echo "unmethylated CpGs"

  613804 ../Data/Pg_val_1_bismark_bt2_pe._10x-Unmethylated.bedgraph
unmethylated CpGs
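
Because the cut-offs in 2a to 2c are exhaustive and non-overlapping, the three counts should sum to the 1,016,980 loci in the full file. A quick Python check of that, plus the share of loci in each class, using only the `wc -l` numbers reported above:

```python
# Line counts reported by `wc -l` in sections 1 and 2
total = 1_016_980
counts = {
    "methylated": 310_808,
    "sparsely methylated": 92_368,
    "unmethylated": 613_804,
}

# The three filters partition the loci, so the classes should account
# for every line in the full file
assert sum(counts.values()) == total

percentages = {label: 100 * n / total for label, n in counts.items()}
for label, pct in percentages.items():
    print(f"{label}: {counts[label]} CpGs ({pct:.2f}%)")
# methylated: 310808 CpGs (30.56%)
# sparsely methylated: 92368 CpGs (9.08%)
# unmethylated: 613804 CpGs (60.36%)
```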


## 3. Determine genomic locations for CpGs

### 3a. Create BED files for `bedtools` and IGV

In [38]:
!find ../Data/*bedgraph

../Data/Pg_val_1_bismark_bt2_pe._10x-Methylated.bedgraph
../Data/Pg_val_1_bismark_bt2_pe._10x-Sparsely-Methylated.bedgraph
../Data/Pg_val_1_bismark_bt2_pe._10x-Unmethylated.bedgraph
../Data/Pg_val_1_bismark_bt2_pe._10x.bedgraph


In [39]:
%%bash

for f in ../Data/*bedgraph
do
    awk '{print $1"\t"$2"\t"$3}' ${f} > ${f}.bed
done
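
The loop above keeps only the first three columns and writes them tab-separated, the minimal BED3 layout (chrom, start, end) that `bedtools` and IGV accept. A Python sketch of the same conversion; `bedgraph_to_bed3` and `convert_file` are hypothetical helpers, and `convert_file` mirrors the loop's `<input>.bed` naming.

```python
from typing import Iterable, Iterator


def bedgraph_to_bed3(lines: Iterable[str]) -> Iterator[str]:
    """Yield tab-delimited 'chrom start end' records from bedGraph lines."""
    for line in lines:
        fields = line.split()  # works for tab- or space-delimited input
        if len(fields) < 3:
            continue  # skip blank lines
        yield "\t".join(fields[:3])


def convert_file(path: str) -> str:
    """Write <path>.bed next to the input, like the bash loop, and return its name."""
    out_path = path + ".bed"
    with open(path) as src, open(out_path, "w") as dst:
        for record in bedgraph_to_bed3(src):
            dst.write(record + "\n")
    return out_path


print(list(bedgraph_to_bed3(["Scaffold_01 11797 11799 50.000000", ""])))
# ['Scaffold_01\t11797\t11799']
```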

In [42]:
!ls ../Data/*bed

../Data/Pg_val_1_bismark_bt2_pe._10x-Methylated.bedgraph.bed
../Data/Pg_val_1_bismark_bt2_pe._10x-Sparsely-Methylated.bedgraph.bed
../Data/Pg_val_1_bismark_bt2_pe._10x-Unmethylated.bedgraph.bed
../Data/Pg_val_1_bismark_bt2_pe._10x.bedgraph.bed


### 3b. Obtain genome feature tracks

In [43]:
#Save into ../Data/Genome/: -P sets wget's output directory (a shell > redirect cannot write to a directory)
!wget -P ../Data/Genome/ https://osf.io/bcxk7/
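
Placing CpGs relative to genome features comes down to interval overlap, which `bedtools intersect` handles in practice. To illustrate the logic only (this is not how `bedtools` is implemented), a Python sketch that keeps CpGs overlapping any feature on the same scaffold, similar in spirit to `bedtools intersect -u`. It assumes BED-style half-open coordinates; the feature interval is made up for the example, and `cpgs_in_features` is a hypothetical name.

```python
from collections import defaultdict


def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """Half-open intervals [a_start, a_end) and [b_start, b_end) overlap
    when each one starts before the other ends."""
    return a_start < b_end and b_start < a_end


def cpgs_in_features(cpgs, features):
    """Return CpGs (chrom, start, end) that overlap at least one feature.

    Linear scan per CpG: fine for a sketch, far too slow for ~1 million
    loci, where bedtools or an interval index is the better choice.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in features:
        by_chrom[chrom].append((start, end))
    return [
        (chrom, start, end)
        for chrom, start, end in cpgs
        if any(overlaps(start, end, fs, fe) for fs, fe in by_chrom.get(chrom, []))
    ]


cpgs = [
    ("Scaffold_01", 11797, 11799),
    ("Scaffold_01", 11838, 11840),
    ("Scaffold_01", 51414, 51416),
]
features = [("Scaffold_01", 11798, 12000)]  # hypothetical feature interval
print(cpgs_in_features(cpgs, features))
# [('Scaffold_01', 11797, 11799), ('Scaffold_01', 11838, 11840)]
```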


## 4. Generate methylation islands