# Characterizing the general methylation landscape

In this notebook, I will characterize the general methylation landscape for each sex separately. To characterize CpG methylation, I will use individual samples, as well as a union BEDgraph that concatenates all sample information.

1. Concatenate coverage information
2. Characterize methylation for each CpG dinucleotide in individual samples and union BEDgraph
2. Determine genomic location of highly methylated, moderately methylated, and lowly CpGs

## 0. Set working directory

In [1]:
!pwd

/Users/yaaminivenkataraman/Documents/ceabigr/code


In [2]:
cd ../output/

/Users/yaaminivenkataraman/Documents/ceabigr/output


In [3]:
!mkdir methylation-landscape

mkdir: methylation-landscape: File exists


In [3]:
cd methylation-landscape/

/Users/yaaminivenkataraman/Documents/ceabigr/output/methylation-landscape


In [4]:
bedtoolsDirectory = "/opt/homebrew/bin/"

In [5]:
#Install pandas for this notebook
import pandas as pd
print(pd.__version__)

0.25.1


## 1. Set up analysis

### 1a. Obtain sample bedGraphs

In [41]:
#Download 10x bedgraphs
!wget -r \
--no-check-certificate --no-directories --no-parent --reject "index.html*" \
-P . \
-A "*10x.bedgraph" https://gannet.fish.washington.edu/seashell/bu-mox/scrubbed/120321-cvBS/

--2022-02-21 14:45:35--  https://gannet.fish.washington.edu/seashell/bu-mox/scrubbed/120321-cvBS/
Resolving gannet.fish.washington.edu (gannet.fish.washington.edu)... 128.95.149.52
Connecting to gannet.fish.washington.edu (gannet.fish.washington.edu)|128.95.149.52|:443... connected.
  Unable to locally verify the issuer's authority.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘./index.html.tmp’

index.html.tmp          [ <=>                ]  62.99K  --.-KB/s    in 0.03s   

2022-02-21 14:45:41 (2.27 MB/s) - ‘./index.html.tmp’ saved [64500]

Loading robots.txt; please ignore errors.
--2022-02-21 14:45:41--  https://gannet.fish.washington.edu/robots.txt
Reusing existing connection to gannet.fish.washington.edu:443.
HTTP request sent, awaiting response... 404 Not Found
2022-02-21 14:45:41 ERROR 404: Not Found.

Removing ./index.html.tmp since it should be rejected.

--2022-02-21 14:45:41--  https://gannet.fish.washington.edu/seashell/bu-mox/s

HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘./index.html?C=S;O=D.tmp’

index.html?C=S;O=D.     [ <=>                ]  62.99K  --.-KB/s    in 0.03s   

2022-02-21 15:03:49 (2.12 MB/s) - ‘./index.html?C=S;O=D.tmp’ saved [64500]

Removing ./index.html?C=S;O=D.tmp since it should be rejected.

--2022-02-21 15:03:49--  https://gannet.fish.washington.edu/seashell/bu-mox/scrubbed/120321-cvBS/?C=D;O=D
Reusing existing connection to gannet.fish.washington.edu:443.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘./index.html?C=D;O=D.tmp’

index.html?C=D;O=D.     [ <=>                ]  62.99K  --.-KB/s    in 0.03s   

2022-02-21 15:03:52 (2.11 MB/s) - ‘./index.html?C=D;O=D.tmp’ saved [64500]

Removing ./index.html?C=D;O=D.tmp since it should be rejected.

FINISHED --2022-02-21 15:03:52--
Total wall clock time: 18m 17s
Downloaded: 35 files, 7.6G in 17m 26s (7.44 MB/s)


In [42]:
#Check directory for all files
!ls

12M_R1_val_1_10x.bedgraph 36F_R1_val_1_10x.bedgraph 54F_R1_val_1_10x.bedgraph
13M_R1_val_1_10x.bedgraph 39F_R1_val_1_10x.bedgraph 59M_R1_val_1_10x.bedgraph
16F_R1_val_1_10x.bedgraph 3F_R1_val_1_10x.bedgraph  64M_R1_val_1_10x.bedgraph
19F_R1_val_1_10x.bedgraph 41F_R1_val_1_10x.bedgraph 6M_R1_val_1_10x.bedgraph
22F_R1_val_1_10x.bedgraph 44F_R1_val_1_10x.bedgraph 76F_R1_val_1_10x.bedgraph
23M_R1_val_1_10x.bedgraph 48M_R1_val_1_10x.bedgraph 77F_R1_val_1_10x.bedgraph
29F_R1_val_1_10x.bedgraph 50F_R1_val_1_10x.bedgraph 7M_R1_val_1_10x.bedgraph
31M_R1_val_1_10x.bedgraph 52F_R1_val_1_10x.bedgraph 9M_R1_val_1_10x.bedgraph
35F_R1_val_1_10x.bedgraph 53F_R1_val_1_10x.bedgraph


In [43]:
#Obtain md5
!md5 *

MD5 (12M_R1_val_1_10x.bedgraph) = 23710e666ba1d8d0aff05465a7ec143d
MD5 (13M_R1_val_1_10x.bedgraph) = 895d0b04d434e9567e648689e734308e
MD5 (16F_R1_val_1_10x.bedgraph) = 5e031b5c1aac89336d1a1a32c84383ff
MD5 (19F_R1_val_1_10x.bedgraph) = ad8e874af92071a08a9fc0e5cd31c959
MD5 (22F_R1_val_1_10x.bedgraph) = a34054b0cedf5b219e1a7f236571f882
MD5 (23M_R1_val_1_10x.bedgraph) = 8b5aa08e26db26865b35090e69c27ce0
MD5 (29F_R1_val_1_10x.bedgraph) = cda12bdb3ac9a2a304005d496c8d641b
MD5 (31M_R1_val_1_10x.bedgraph) = 6ea9d16b7c88775863e5f890144d900b
MD5 (35F_R1_val_1_10x.bedgraph) = fc230a6475afe86a21a99f70ddd51012
MD5 (36F_R1_val_1_10x.bedgraph) = bb83960173d236e832183e7b7a641156
MD5 (39F_R1_val_1_10x.bedgraph) = de0e77ea6da72f370dca9d5b09d7b1f3
MD5 (3F_R1_val_1_10x.bedgraph) = 7be44c21611319f59cfa94eec5a84851
MD5 (41F_R1_val_1_10x.bedgraph) = c3e39faa21463fdecafd78eb6f17b51a
MD5 (44F_R1_val_1_10x.bedgraph) = cd82809a568298c8cc84f7c8ab0a1fa7
MD5 (48M_R1_val_1_10x.bedgraph) = beeb7af3ab210324b787e6b50eed5

In [44]:
%%bash

for f in *10x.bedgraph
do
/opt/homebrew/bin/sortBed \
-i ${f} \
> $(basename ${f%_10x.bedgraph})_10x.sort.bedgraph
done

In [6]:
!ls *sort*

12M_R1_val_1_10x.sort.bedgraph 44F_R1_val_1_10x.sort.bedgraph
13M_R1_val_1_10x.sort.bedgraph 48M_R1_val_1_10x.sort.bedgraph
16F_R1_val_1_10x.sort.bedgraph 50F_R1_val_1_10x.sort.bedgraph
19F_R1_val_1_10x.sort.bedgraph 52F_R1_val_1_10x.sort.bedgraph
22F_R1_val_1_10x.sort.bedgraph 53F_R1_val_1_10x.sort.bedgraph
23M_R1_val_1_10x.sort.bedgraph 54F_R1_val_1_10x.sort.bedgraph
29F_R1_val_1_10x.sort.bedgraph 59M_R1_val_1_10x.sort.bedgraph
31M_R1_val_1_10x.sort.bedgraph 64M_R1_val_1_10x.sort.bedgraph
35F_R1_val_1_10x.sort.bedgraph 6M_R1_val_1_10x.sort.bedgraph
36F_R1_val_1_10x.sort.bedgraph 76F_R1_val_1_10x.sort.bedgraph
39F_R1_val_1_10x.sort.bedgraph 77F_R1_val_1_10x.sort.bedgraph
3F_R1_val_1_10x.sort.bedgraph  7M_R1_val_1_10x.sort.bedgraph
41F_R1_val_1_10x.sort.bedgraph 9M_R1_val_1_10x.sort.bedgraph


### 1b. Remove C->T SNPs

For each sample, I will use BS-Snper output to change the percent methylation for a C->T SNP to 0.

In [46]:
#Download 10x SNP
!wget -r \
--no-check-certificate --no-directories --no-parent --reject "index.html*" \
-P . \
-A "*vcf" https://gannet.fish.washington.edu/seashell/bu-github/nb-2022/C_virginica/analyses/bsnp02/

--2022-02-21 15:07:09--  https://gannet.fish.washington.edu/seashell/bu-github/nb-2022/C_virginica/analyses/bsnp02/
Resolving gannet.fish.washington.edu (gannet.fish.washington.edu)... 128.95.149.52
Connecting to gannet.fish.washington.edu (gannet.fish.washington.edu)|128.95.149.52|:443... connected.
  Unable to locally verify the issuer's authority.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘./index.html.tmp’

index.html.tmp          [ <=>                ]  32.59K  --.-KB/s    in 0.02s   

2022-02-21 15:07:11 (1.89 MB/s) - ‘./index.html.tmp’ saved [33377]

Loading robots.txt; please ignore errors.
--2022-02-21 15:07:11--  https://gannet.fish.washington.edu/robots.txt
Reusing existing connection to gannet.fish.washington.edu:443.
HTTP request sent, awaiting response... 404 Not Found
2022-02-21 15:07:11 ERROR 404: Not Found.

Removing ./index.html.tmp since it should be rejected.

--2022-02-21 15:07:11--  https://gannet.fish.washington.edu

HTTP request sent, awaiting response... 200 OK
Length: 537035183 (512M) [text/x-vcard]
Saving to: ‘./23M_R1_val_1_bismark_bt2_pe.SNP-results.vcf’


2022-02-21 15:17:44 (7.32 MB/s) - ‘./23M_R1_val_1_bismark_bt2_pe.SNP-results.vcf’ saved [537035183/537035183]

--2022-02-21 15:17:44--  https://gannet.fish.washington.edu/seashell/bu-github/nb-2022/C_virginica/analyses/bsnp02/29F_R1_val_1_bismark_bt2_pe.SNP-results.vcf
Reusing existing connection to gannet.fish.washington.edu:443.
HTTP request sent, awaiting response... 200 OK
Length: 492776744 (470M) [text/x-vcard]
Saving to: ‘./29F_R1_val_1_bismark_bt2_pe.SNP-results.vcf’


2022-02-21 15:18:39 (8.49 MB/s) - ‘./29F_R1_val_1_bismark_bt2_pe.SNP-results.vcf’ saved [492776744/492776744]

--2022-02-21 15:18:39--  https://gannet.fish.washington.edu/seashell/bu-github/nb-2022/C_virginica/analyses/bsnp02/31M_R1_val_1_bismark_bt2_pe.SNP-results.vcf
Reusing existing connection to gannet.fish.washington.edu:443.
HTTP request sent, awaiting response..

HTTP request sent, awaiting response... 200 OK
Length: 494986093 (472M) [text/x-vcard]
Saving to: ‘./76F_R1_val_1_bismark_bt2_pe.SNP-results.vcf’


2022-02-21 15:31:35 (8.92 MB/s) - ‘./76F_R1_val_1_bismark_bt2_pe.SNP-results.vcf’ saved [494986093/494986093]

--2022-02-21 15:31:35--  https://gannet.fish.washington.edu/seashell/bu-github/nb-2022/C_virginica/analyses/bsnp02/77F_R1_val_1_bismark_bt2_pe.SNP-results.vcf
Reusing existing connection to gannet.fish.washington.edu:443.
HTTP request sent, awaiting response... 200 OK
Length: 518123700 (494M) [text/x-vcard]
Saving to: ‘./77F_R1_val_1_bismark_bt2_pe.SNP-results.vcf’


2022-02-21 15:32:33 (8.60 MB/s) - ‘./77F_R1_val_1_bismark_bt2_pe.SNP-results.vcf’ saved [518123700/518123700]

--2022-02-21 15:32:33--  https://gannet.fish.washington.edu/seashell/bu-github/nb-2022/C_virginica/analyses/bsnp02/?C=N;O=A
Reusing existing connection to gannet.fish.washington.edu:443.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/

In [28]:
%%bash

for f in *10x.sort.bedgraph
do
awk -F"\t" 'NR==FNR{a[$1"\t"$2]=$1"\t"$2;next}{if(($1"\t"$2) in a) print $1"\t"$2"\t"$3"\t"0;else print $0}' \
$(basename ${f%_10x.sort.bedgraph})_bismark_bt2_pe.SNP-results.vcf \
${f} \
> $(basename ${f%_10x.sort.bedgraph})_10x.SNPcorr.bedgraph
done

In [7]:
!ls *10x.SNPcorr.bedgraph

12M_R1_val_1_10x.SNPcorr.bedgraph 44F_R1_val_1_10x.SNPcorr.bedgraph
13M_R1_val_1_10x.SNPcorr.bedgraph 48M_R1_val_1_10x.SNPcorr.bedgraph
16F_R1_val_1_10x.SNPcorr.bedgraph 50F_R1_val_1_10x.SNPcorr.bedgraph
19F_R1_val_1_10x.SNPcorr.bedgraph 52F_R1_val_1_10x.SNPcorr.bedgraph
22F_R1_val_1_10x.SNPcorr.bedgraph 53F_R1_val_1_10x.SNPcorr.bedgraph
23M_R1_val_1_10x.SNPcorr.bedgraph 54F_R1_val_1_10x.SNPcorr.bedgraph
29F_R1_val_1_10x.SNPcorr.bedgraph 59M_R1_val_1_10x.SNPcorr.bedgraph
31M_R1_val_1_10x.SNPcorr.bedgraph 64M_R1_val_1_10x.SNPcorr.bedgraph
35F_R1_val_1_10x.SNPcorr.bedgraph 6M_R1_val_1_10x.SNPcorr.bedgraph
36F_R1_val_1_10x.SNPcorr.bedgraph 76F_R1_val_1_10x.SNPcorr.bedgraph
39F_R1_val_1_10x.SNPcorr.bedgraph 77F_R1_val_1_10x.SNPcorr.bedgraph
3F_R1_val_1_10x.SNPcorr.bedgraph  7M_R1_val_1_10x.SNPcorr.bedgraph
41F_R1_val_1_10x.SNPcorr.bedgraph 9M_R1_val_1_10x.SNPcorr.bedgraph


In [44]:
%%bash

for f in *10x.sort.bedgraph
do
/opt/homebrew/bin/intersectBed \
-wa \
-a ${f} \
-b $(basename ${f%_10x.sort.bedgraph})_bismark_bt2_pe.SNP-results.vcf \
> $(basename ${f%_10x.sort.bedgraph})_10x.SNPcorr.changedpos.bedgraph
done

## 2. General methylation landscape

### 2a. Create a union BEDgraph

I will use `unionBedGraphs` to concatenate information for all loci across samples. This will be use in separate analyses.

In [8]:
!{bedtoolsDirectory}unionBedGraphs -h


Tool:    bedtools unionbedg (aka unionBedGraphs)
Version: v2.30.0
Summary: Combines multiple BedGraph files into a single file,
	 allowing coverage comparisons between them.

Usage:   bedtools unionbedg [OPTIONS] -i FILE1 FILE2 .. FILEn
	 Assumes that each BedGraph file is sorted by chrom/start 
	 and that the intervals in each are non-overlapping.

Options: 
	-header		Print a header line.
			(chrom/start/end + names of each file).

	-names		A list of names (one/file) to describe each file in -i.
			These names will be printed in the header line.

	-g		Use genome file to calculate empty regions.
			- STRING.

	-empty		Report empty regions (i.e., start/end intervals w/o
			values in all files).
			- Requires the '-g FILE' parameter.

	-filler TEXT	Use TEXT when representing intervals having no value.
			- Default is '0', but you can use 'N/A' or any text.

	-examples	Show detailed usage examples.



In [29]:
#Create union BEDgraph from sorted files
#Include a header
#Use N/A when there is no data for a CpG in a sample
#Define sample IDs
#Use sorted bedgraphs
#Save output
!{bedtoolsDirectory}unionBedGraphs \
-header \
-filler N/A \
-names 12M 13M 16F 19F 22F 23M 29F 31M 35F 36F 39F 3F 41F 44F 48M 50F 52F 53F 54F 59M 64M 6M 76F 77F 7M 9M \
-i \
*10x.SNPcorr.bedgraph \
> union_10x.bedgraph

In [30]:
#Check output
!head union_10x.bedgraph
!wc -l union_10x.bedgraph

chrom	start	end	12M	13M	16F	19F	22F	23M	29F	31M	35F	36F	39F	3F	41F	44F	48M	50F	52F	53F	54F	59M	64M	6M	76F	77F	7M	9M
NC_007175.2	48	50	0.000000	0.000000	1.923077	0.731452	1.015228	0.000000	1.444623	3.125000	0.844511	1.699182	0.541339	1.694915	1.928375	2.136076	1.086957	1.672640	1.806240	1.100917	1.780694	0.000000	3.973510	0.653595	0.682057	1.661475	3.750000	1.754386
NC_007175.2	50	52	0.000000	0.000000	1.626016	0.733855	1.507937	0.000000	1.349325	1.470588	0.600462	1.700880	0.599908	1.500000	2.570694	1.327434	1.036269	1.874311	1.371951	1.069218	1.842105	0.000000	3.797468	0.632911	0.442478	1.439539	3.488372	1.666667
NC_007175.2	87	89	1.169591	1.293103	1.045857	0.286907	0.789771	0.990099	0.859599	0.671141	0.666349	0.813008	0.444115	1.284875	0.800915	1.361796	0.245700	0.959596	0.852273	0.758853	0.866927	0.000000	0.319489	0.666667	0.208008	1.021477	0.478469	0.000000
NC_007175.2	146	148	1.261830	0.461894	1.502146	0.684369	1.081187	0.819672	1.183206	0.332226	0.721028	1.522344	0.562023	0.995025	

### 2b. Manipulate with `pandas`

In [31]:
#Import union data into pandas
#Check head
df = pd.read_table("union_10x.bedgraph")
df.head(5)

Unnamed: 0,chrom,start,end,12M,13M,16F,19F,22F,23M,29F,...,52F,53F,54F,59M,64M,6M,76F,77F,7M,9M
0,NC_007175.2,48,50,0.0,0.0,1.923077,0.731452,1.015228,0.0,1.444623,...,1.80624,1.100917,1.780694,0.0,3.97351,0.653595,0.682057,1.661475,3.75,1.754386
1,NC_007175.2,50,52,0.0,0.0,1.626016,0.733855,1.507937,0.0,1.349325,...,1.371951,1.069218,1.842105,0.0,3.797468,0.632911,0.442478,1.439539,3.488372,1.666667
2,NC_007175.2,87,89,1.169591,1.293103,1.045857,0.286907,0.789771,0.990099,0.859599,...,0.852273,0.758853,0.866927,0.0,0.319489,0.666667,0.208008,1.021477,0.478469,0.0
3,NC_007175.2,146,148,1.26183,0.461894,1.502146,0.684369,1.081187,0.819672,1.183206,...,1.306458,1.210914,0.924855,0.0,2.027027,0.702988,0.475325,1.308017,1.56658,0.456621
4,NC_007175.2,192,194,1.129944,1.271186,1.726908,0.691776,1.094563,1.95599,1.097734,...,1.486346,1.1942,0.896287,0.47619,1.369863,0.161031,0.558659,1.286383,2.195122,0.840336


In [32]:
#Average all samples for total genome methylation information and save as a new column
#NA are not included in averages
#Check output
df['total'] = df[['12M', '13M', '16F', '19F', '22F', '23M', '29F', '31M', '35F', '36F', '39F', '3F', '41F', '44F', '48M', '50F', '52F', '53F', '54F', '59M', '64M', '6M', '76F', '77F', '7M', '9M']].mean(axis=1)
df.tail(10)

Unnamed: 0,chrom,start,end,12M,13M,16F,19F,22F,23M,29F,...,53F,54F,59M,64M,6M,76F,77F,7M,9M,total
12951894,NC_035789.1,32649654,32649656,0.0,0.0,0.0,,,,0.0,...,,,0.0,,0.0,0.0,0.0,0.0,,0.0
12951895,NC_035789.1,32649732,32649734,0.0,0.0,0.0,,,,0.0,...,0.0,,,,0.0,,0.0,0.0,,0.0
12951896,NC_035789.1,32649736,32649738,0.0,0.0,0.0,,,,0.0,...,0.0,,,,0.0,,0.0,0.0,,0.0
12951897,NC_035789.1,32649799,32649801,,0.0,,,,,,...,,0.0,,,,,,,,0.0
12951898,NC_035789.1,32649876,32649878,0.0,0.0,0.0,,0.0,,0.0,...,,0.0,0.0,,0.0,0.0,0.0,0.0,,0.0
12951899,NC_035789.1,32649885,32649887,0.0,0.0,0.0,,0.0,,0.0,...,,0.0,0.0,,0.0,0.0,0.0,0.0,,0.0
12951900,NC_035789.1,32649895,32649897,0.0,0.0,0.0,,0.0,,0.0,...,,0.0,0.0,,0.0,0.0,0.0,0.0,,0.0
12951901,NC_035789.1,32649930,32649932,,0.0,,,0.0,,,...,,0.0,0.0,,0.0,,0.0,,,0.0
12951902,NC_035789.1,32649933,32649935,,0.0,,,0.0,,,...,,0.0,0.0,0.0,0.0,,0.0,,,0.0
12951903,NC_035789.1,32649966,32649968,,0.0,,,0.0,,,...,,0.0,0.0,0.0,,,0.0,,,0.0


In [33]:
#Save dataframe in a tabular format and include N/As. Do not include quotes.
df.to_csv("union-averages_10x.bedgraph", sep = "\t", na_rep = "N/A", quoting = 3)

In [34]:
#Check pandas manipulations
!tail union-averages_10x.bedgraph

12951894	NC_035789.1	32649654	32649656	0.0	0.0	0.0	N/A	N/A	N/A	0.0	N/A	N/A	0.0	N/A	0.0	N/A	N/A	0.0	0.0	N/A	N/A	N/A	0.0	N/A	0.0	0.0	0.0	0.0	N/A	0.0
12951895	NC_035789.1	32649732	32649734	0.0	0.0	0.0	N/A	N/A	N/A	0.0	N/A	N/A	N/A	N/A	0.0	N/A	N/A	0.0	N/A	N/A	0.0	N/A	N/A	N/A	0.0	N/A	0.0	0.0	N/A	0.0
12951896	NC_035789.1	32649736	32649738	0.0	0.0	0.0	N/A	N/A	N/A	0.0	N/A	N/A	N/A	N/A	0.0	N/A	N/A	0.0	N/A	N/A	0.0	N/A	N/A	N/A	0.0	N/A	0.0	0.0	N/A	0.0
12951897	NC_035789.1	32649799	32649801	N/A	0.0	N/A	N/A	N/A	N/A	N/A	N/A	0.0	N/A	N/A	0.0	N/A	N/A	N/A	N/A	N/A	N/A	0.0	N/A	N/A	N/A	N/A	N/A	N/A	N/A	0.0
12951898	NC_035789.1	32649876	32649878	0.0	0.0	0.0	N/A	0.0	N/A	0.0	N/A	0.0	0.0	N/A	0.0	N/A	N/A	N/A	0.0	0.0	N/A	0.0	0.0	N/A	0.0	0.0	0.0	0.0	N/A	0.0
12951899	NC_035789.1	32649885	32649887	0.0	0.0	0.0	N/A	0.0	N/A	0.0	N/A	0.0	0.0	N/A	0.0	N/A	N/A	N/A	0.0	0.0	N/A	0.0	0.0	N/A	0.0	0.0	0.0	0.0	N/A	0.0
12951900	NC_035789.1	32649895	32649897	0.0	0.0	0.0	N/A	0.0	N/A	0.0	N/A	0.0	0.0	N/A	0.0	N/A	N/A	N/A	0.0	0.0	N/A	0

## 3. Sex-specific methylation landscape

### 3a. Female

In [9]:
!ls *F*10x.SNPcorr.bedgraph

16F_R1_val_1_10x.SNPcorr.bedgraph 41F_R1_val_1_10x.SNPcorr.bedgraph
19F_R1_val_1_10x.SNPcorr.bedgraph 44F_R1_val_1_10x.SNPcorr.bedgraph
22F_R1_val_1_10x.SNPcorr.bedgraph 50F_R1_val_1_10x.SNPcorr.bedgraph
29F_R1_val_1_10x.SNPcorr.bedgraph 52F_R1_val_1_10x.SNPcorr.bedgraph
35F_R1_val_1_10x.SNPcorr.bedgraph 53F_R1_val_1_10x.SNPcorr.bedgraph
36F_R1_val_1_10x.SNPcorr.bedgraph 54F_R1_val_1_10x.SNPcorr.bedgraph
39F_R1_val_1_10x.SNPcorr.bedgraph 76F_R1_val_1_10x.SNPcorr.bedgraph
3F_R1_val_1_10x.SNPcorr.bedgraph  77F_R1_val_1_10x.SNPcorr.bedgraph


In [35]:
#Create union BEDgraph from sorted files
#Include a header
#Use N/A when there is no data for a CpG in a sample
#Define sample IDs
#Use sorted bedgraphs
#Save output
!{bedtoolsDirectory}unionBedGraphs \
-header \
-filler N/A \
-names 16F 19F 22F 29F 35F 36F 39F 3F 41F 44F 50F 52F 53F 54F 76F 77F \
-i \
*F*10x.SNPcorr.bedgraph \
> fem-union_10x.bedgraph

In [36]:
#Check output
!head fem-union_10x.bedgraph
!wc -l fem-union_10x.bedgraph

chrom	start	end	16F	19F	22F	29F	35F	36F	39F	3F	41F	44F	50F	52F	53F	54F	76F	77F
NC_007175.2	48	50	1.923077	0.731452	1.015228	1.444623	0.844511	1.699182	0.541339	1.694915	1.928375	2.136076	1.672640	1.806240	1.100917	1.780694	0.682057	1.661475
NC_007175.2	50	52	1.626016	0.733855	1.507937	1.349325	0.600462	1.700880	0.599908	1.500000	2.570694	1.327434	1.874311	1.371951	1.069218	1.842105	0.442478	1.439539
NC_007175.2	87	89	1.045857	0.286907	0.789771	0.859599	0.666349	0.813008	0.444115	1.284875	0.800915	1.361796	0.959596	0.852273	0.758853	0.866927	0.208008	1.021477
NC_007175.2	146	148	1.502146	0.684369	1.081187	1.183206	0.721028	1.522344	0.562023	0.995025	1.548541	1.321760	1.619645	1.306458	1.210914	0.924855	0.475325	1.308017
NC_007175.2	192	194	1.726908	0.691776	1.094563	1.097734	0.996997	1.613626	0.538915	1.198820	1.716501	1.497326	1.575555	1.486346	1.194200	0.896287	0.558659	1.286383
NC_007175.2	245	247	1.228250	0.414110	0.794148	1.226456	0.824253	1.138753	0.516834	1.106095	1.192146	1.2878

In [37]:
#Import union data into pandas
#Check head
df = pd.read_table("fem-union_10x.bedgraph")
df.head(5)

Unnamed: 0,chrom,start,end,16F,19F,22F,29F,35F,36F,39F,3F,41F,44F,50F,52F,53F,54F,76F,77F
0,NC_007175.2,48,50,1.923077,0.731452,1.015228,1.444623,0.844511,1.699182,0.541339,1.694915,1.928375,2.136076,1.67264,1.80624,1.100917,1.780694,0.682057,1.661475
1,NC_007175.2,50,52,1.626016,0.733855,1.507937,1.349325,0.600462,1.70088,0.599908,1.5,2.570694,1.327434,1.874311,1.371951,1.069218,1.842105,0.442478,1.439539
2,NC_007175.2,87,89,1.045857,0.286907,0.789771,0.859599,0.666349,0.813008,0.444115,1.284875,0.800915,1.361796,0.959596,0.852273,0.758853,0.866927,0.208008,1.021477
3,NC_007175.2,146,148,1.502146,0.684369,1.081187,1.183206,0.721028,1.522344,0.562023,0.995025,1.548541,1.32176,1.619645,1.306458,1.210914,0.924855,0.475325,1.308017
4,NC_007175.2,192,194,1.726908,0.691776,1.094563,1.097734,0.996997,1.613626,0.538915,1.19882,1.716501,1.497326,1.575555,1.486346,1.1942,0.896287,0.558659,1.286383


In [38]:
#Average all samples for total genome methylation information and save as a new column
#NA are not included in averages
#Check output
df['total'] = df[['16F', '19F', '22F', '29F', '35F', '36F', '39F', '3F', '41F', '44F', '50F', '52F', '53F', '54F', '76F', '77F']].mean(axis=1)
df.tail(10)

Unnamed: 0,chrom,start,end,16F,19F,22F,29F,35F,36F,39F,3F,41F,44F,50F,52F,53F,54F,76F,77F,total
12678561,NC_035789.1,32649654,32649656,0.0,,,0.0,,0.0,,0.0,,,0.0,,,,0.0,0.0,0.0
12678562,NC_035789.1,32649732,32649734,0.0,,,0.0,,,,0.0,,,,,0.0,,,0.0,0.0
12678563,NC_035789.1,32649736,32649738,0.0,,,0.0,,,,0.0,,,,,0.0,,,0.0,0.0
12678564,NC_035789.1,32649799,32649801,,,,,0.0,,,0.0,,,,,,0.0,,,0.0
12678565,NC_035789.1,32649876,32649878,0.0,,0.0,0.0,0.0,0.0,,0.0,,,0.0,0.0,,0.0,0.0,0.0,0.0
12678566,NC_035789.1,32649885,32649887,0.0,,0.0,0.0,0.0,0.0,,0.0,,,0.0,0.0,,0.0,0.0,0.0,0.0
12678567,NC_035789.1,32649895,32649897,0.0,,0.0,0.0,0.0,0.0,,0.0,,,0.0,0.0,,0.0,0.0,0.0,0.0
12678568,NC_035789.1,32649930,32649932,,,0.0,,0.0,,,0.0,,,,0.0,,0.0,,0.0,0.0
12678569,NC_035789.1,32649933,32649935,,,0.0,,0.0,,,0.0,,,,0.0,,0.0,,0.0,0.0
12678570,NC_035789.1,32649966,32649968,,,0.0,,0.0,,,0.0,,,,,,0.0,,0.0,0.0


In [55]:
#Save dataframe in a tabular format and include N/As. Do not include quotes.
df.to_csv("fem-union-averages_10x.SNPcorr.bedgraph", sep = "\t", na_rep = "N/A", quoting = 3)

In [56]:
#Check pandas manipulations
!tail fem-union-averages_10x.SNPcorr.bedgraph

11991486	NC_035789.1	32649654	32649656	0.0	0.0	N/A	N/A	0.0	0.0	N/A	0.0	0.0	N/A	0.0
11991487	NC_035789.1	32649732	32649734	0.0	0.0	N/A	N/A	0.0	N/A	N/A	0.0	0.0	N/A	0.0
11991488	NC_035789.1	32649736	32649738	0.0	0.0	N/A	N/A	0.0	N/A	N/A	0.0	0.0	N/A	0.0
11991489	NC_035789.1	32649799	32649801	N/A	0.0	N/A	N/A	N/A	N/A	N/A	N/A	N/A	N/A	0.0
11991490	NC_035789.1	32649876	32649878	0.0	0.0	N/A	N/A	N/A	0.0	N/A	0.0	0.0	N/A	0.0
11991491	NC_035789.1	32649885	32649887	0.0	0.0	N/A	N/A	N/A	0.0	N/A	0.0	0.0	N/A	0.0
11991492	NC_035789.1	32649895	32649897	0.0	0.0	N/A	N/A	N/A	0.0	N/A	0.0	0.0	N/A	0.0
11991493	NC_035789.1	32649930	32649932	N/A	0.0	N/A	N/A	N/A	0.0	N/A	0.0	N/A	N/A	0.0
11991494	NC_035789.1	32649933	32649935	N/A	0.0	N/A	N/A	N/A	0.0	0.0	0.0	N/A	N/A	0.0
11991495	NC_035789.1	32649966	32649968	N/A	0.0	N/A	N/A	N/A	0.0	0.0	N/A	N/A	N/A	0.0


### 3b. Male methylation landscape

In [43]:
#Create union BEDgraph from sorted files
#Include a header
#Use N/A when there is no data for a CpG in a sample
#Define sample IDs
#Use sorted bedgraphs
#Save output
!{bedtoolsDirectory}unionBedGraphs \
-header \
-filler N/A \
-names 12M 13M 23M 31M 48M 59M 64M 6M 7M 9M \
-i \
*M*10x.SNPcorr.bedgraph \
> male-union_10x.bedgraph

In [44]:
#Check output
!head male-union_10x.bedgraph
!wc -l male-union_10x.bedgraph

chrom	start	end	12M	13M	23M	31M	48M	59M	64M	6M	7M	9M
NC_007175.2	48	50	0.000000	0.000000	0.000000	3.125000	1.086957	0.000000	3.973510	0.653595	3.750000	1.754386
NC_007175.2	50	52	0.000000	0.000000	0.000000	1.470588	1.036269	0.000000	3.797468	0.632911	3.488372	1.666667
NC_007175.2	87	89	1.169591	1.293103	0.990099	0.671141	0.245700	0.000000	0.319489	0.666667	0.478469	0.000000
NC_007175.2	146	148	1.261830	0.461894	0.819672	0.332226	0.383142	0.000000	2.027027	0.702988	1.566580	0.456621
NC_007175.2	192	194	1.129944	1.271186	1.955990	0.303951	0.907029	0.476190	1.369863	0.161031	2.195122	0.840336
NC_007175.2	245	247	0.829876	0.550964	0.626959	1.321586	0.891530	1.986755	1.195219	0.451467	0.993377	0.609756
NC_007175.2	256	258	0.738007	0.496278	0.877193	0.404858	0.405954	1.111111	2.083333	0.202429	0.282486	1.117318
NC_007175.2	263	265	1.365188	0.917431	1.133144	0.000000	0.527704	1.058201	1.827243	0.193424	0.555556	0.537634
NC_007175.2	265	267	0.337838	0.453515	1.111111	0.763359	0.651890	1.595745

In [45]:
#Import union data into pandas
#Check head
df = pd.read_table("male-union_10x.bedgraph")
df.head(5)

Unnamed: 0,chrom,start,end,12M,13M,23M,31M,48M,59M,64M,6M,7M,9M
0,NC_007175.2,48,50,0.0,0.0,0.0,3.125,1.086957,0.0,3.97351,0.653595,3.75,1.754386
1,NC_007175.2,50,52,0.0,0.0,0.0,1.470588,1.036269,0.0,3.797468,0.632911,3.488372,1.666667
2,NC_007175.2,87,89,1.169591,1.293103,0.990099,0.671141,0.2457,0.0,0.319489,0.666667,0.478469,0.0
3,NC_007175.2,146,148,1.26183,0.461894,0.819672,0.332226,0.383142,0.0,2.027027,0.702988,1.56658,0.456621
4,NC_007175.2,192,194,1.129944,1.271186,1.95599,0.303951,0.907029,0.47619,1.369863,0.161031,2.195122,0.840336


In [46]:
#Average all samples for total genome methylation information and save as a new column
#NA are not included in averages
#Check output
df['total'] = df[['12M', '13M', '23M', '31M', '48M', '59M', '64M', '6M', '7M', '9M']].mean(axis=1)
df.tail(10)

Unnamed: 0,chrom,start,end,12M,13M,23M,31M,48M,59M,64M,6M,7M,9M,total
11991486,NC_035789.1,32649654,32649656,0.0,0.0,,,0.0,0.0,,0.0,0.0,,0.0
11991487,NC_035789.1,32649732,32649734,0.0,0.0,,,0.0,,,0.0,0.0,,0.0
11991488,NC_035789.1,32649736,32649738,0.0,0.0,,,0.0,,,0.0,0.0,,0.0
11991489,NC_035789.1,32649799,32649801,,0.0,,,,,,,,,0.0
11991490,NC_035789.1,32649876,32649878,0.0,0.0,,,,0.0,,0.0,0.0,,0.0
11991491,NC_035789.1,32649885,32649887,0.0,0.0,,,,0.0,,0.0,0.0,,0.0
11991492,NC_035789.1,32649895,32649897,0.0,0.0,,,,0.0,,0.0,0.0,,0.0
11991493,NC_035789.1,32649930,32649932,,0.0,,,,0.0,,0.0,,,0.0
11991494,NC_035789.1,32649933,32649935,,0.0,,,,0.0,0.0,0.0,,,0.0
11991495,NC_035789.1,32649966,32649968,,0.0,,,,0.0,0.0,,,,0.0


In [59]:
#Save dataframe in a tabular format and include N/As. Do not include quotes.
df.to_csv("male-union-averages_10x.SNPcorr.bedgraph", sep = "\t", na_rep = "N/A", quoting = 3)

In [60]:
#Check pandas manipulations
!tail male-union-averages_10x.SNPcorr.bedgraph

11991486	NC_035789.1	32649654	32649656	0.0	0.0	N/A	N/A	0.0	0.0	N/A	0.0	0.0	N/A	0.0
11991487	NC_035789.1	32649732	32649734	0.0	0.0	N/A	N/A	0.0	N/A	N/A	0.0	0.0	N/A	0.0
11991488	NC_035789.1	32649736	32649738	0.0	0.0	N/A	N/A	0.0	N/A	N/A	0.0	0.0	N/A	0.0
11991489	NC_035789.1	32649799	32649801	N/A	0.0	N/A	N/A	N/A	N/A	N/A	N/A	N/A	N/A	0.0
11991490	NC_035789.1	32649876	32649878	0.0	0.0	N/A	N/A	N/A	0.0	N/A	0.0	0.0	N/A	0.0
11991491	NC_035789.1	32649885	32649887	0.0	0.0	N/A	N/A	N/A	0.0	N/A	0.0	0.0	N/A	0.0
11991492	NC_035789.1	32649895	32649897	0.0	0.0	N/A	N/A	N/A	0.0	N/A	0.0	0.0	N/A	0.0
11991493	NC_035789.1	32649930	32649932	N/A	0.0	N/A	N/A	N/A	0.0	N/A	0.0	N/A	N/A	0.0
11991494	NC_035789.1	32649933	32649935	N/A	0.0	N/A	N/A	N/A	0.0	0.0	0.0	N/A	N/A	0.0
11991495	NC_035789.1	32649966	32649968	N/A	0.0	N/A	N/A	N/A	0.0	0.0	N/A	N/A	N/A	0.0


## 4. Characterize methylation for each CpG dinucleotude

I will use the following definitions:

- highly methylated: ≥ 50%
- moderately methylated: 10-50%
- lowly methylated : ≤ 10%

I will use the files with corrected SNP positions.

In [61]:
!find *10x.SNPcorr.bedgraph
!wc -l *10x.SNPcorr.bedgraph

12M_R1_val_1_10x.SNPcorr.bedgraph
13M_R1_val_1_10x.SNPcorr.bedgraph
16F_R1_val_1_10x.SNPcorr.bedgraph
19F_R1_val_1_10x.SNPcorr.bedgraph
22F_R1_val_1_10x.SNPcorr.bedgraph
23M_R1_val_1_10x.SNPcorr.bedgraph
29F_R1_val_1_10x.SNPcorr.bedgraph
31M_R1_val_1_10x.SNPcorr.bedgraph
35F_R1_val_1_10x.SNPcorr.bedgraph
36F_R1_val_1_10x.SNPcorr.bedgraph
39F_R1_val_1_10x.SNPcorr.bedgraph
3F_R1_val_1_10x.SNPcorr.bedgraph
41F_R1_val_1_10x.SNPcorr.bedgraph
44F_R1_val_1_10x.SNPcorr.bedgraph
48M_R1_val_1_10x.SNPcorr.bedgraph
50F_R1_val_1_10x.SNPcorr.bedgraph
52F_R1_val_1_10x.SNPcorr.bedgraph
53F_R1_val_1_10x.SNPcorr.bedgraph
54F_R1_val_1_10x.SNPcorr.bedgraph
59M_R1_val_1_10x.SNPcorr.bedgraph
64M_R1_val_1_10x.SNPcorr.bedgraph
6M_R1_val_1_10x.SNPcorr.bedgraph
76F_R1_val_1_10x.SNPcorr.bedgraph
77F_R1_val_1_10x.SNPcorr.bedgraph
7M_R1_val_1_10x.SNPcorr.bedgraph
9M_R1_val_1_10x.SNPcorr.bedgraph
fem-union-averages_10x.SNPcorr.bedgraph
male-union-averages_10x.SNPcorr.bedgraph
 8414935 12M_R1_val_1_10x.SNPcorr.bedgr

### 4a. Highly methylated loci

In [62]:
%%bash
for f in *10x.SNPcorr.bedgraph
do
    awk '{if ($4 >= 50) { print $1, $2, $3, $4 }}' ${f} \
    > ${f}-Meth
done

In [63]:
!head *-Meth

==> 12M_R1_val_1_10x.SNPcorr.bedgraph-Meth <==
NC_035780.1 100575 100577 93.750000
NC_035780.1 100634 100636 98.461538
NC_035780.1 100643 100645 98.571429
NC_035780.1 100651 100653 98.529412
NC_035780.1 100664 100666 98.484848
NC_035780.1 101083 101085 100.000000
NC_035780.1 101305 101307 95.121951
NC_035780.1 101408 101410 96.774194
NC_035780.1 101464 101466 94.871795
NC_035780.1 101594 101596 96.153846

==> 13M_R1_val_1_10x.SNPcorr.bedgraph-Meth <==
NC_035780.1 22531 22533 56.250000
NC_035780.1 22536 22538 56.250000
NC_035780.1 75804 75806 58.823529
NC_035780.1 100558 100560 51.351351
NC_035780.1 100575 100577 96.551724
NC_035780.1 100634 100636 98.611111
NC_035780.1 100643 100645 95.000000
NC_035780.1 100651 100653 94.736842
NC_035780.1 100664 100666 95.945946
NC_035780.1 100974 100976 98.529412

==> 16F_R1_val_1_10x.SNPcorr.bedgraph-Meth <==
NC_035780.1 100575 100577 100.000000
NC_035780.1 100634 100636 91.525424
NC_035780.1 100643 100645 80.303030
NC_035780.1 100651 100653 93.7500


==> 6M_R1_val_1_10x.SNPcorr.bedgraph-Meth <==
NC_035780.1 100575 100577 100.000000
NC_035780.1 100634 100636 100.000000
NC_035780.1 100643 100645 97.916667
NC_035780.1 100651 100653 100.000000
NC_035780.1 100664 100666 97.727273
NC_035780.1 100974 100976 63.333333
NC_035780.1 101083 101085 97.142857
NC_035780.1 101305 101307 89.024390
NC_035780.1 101408 101410 100.000000
NC_035780.1 101464 101466 96.551724

==> 76F_R1_val_1_10x.SNPcorr.bedgraph-Meth <==
NC_035780.1 16113 16115 50.000000
NC_035780.1 100575 100577 97.916667
NC_035780.1 100581 100583 52.000000
NC_035780.1 100643 100645 61.818182
NC_035780.1 100651 100653 90.740741
NC_035780.1 100664 100666 78.947368
NC_035780.1 100916 100918 57.575758
NC_035780.1 101083 101085 80.851064
NC_035780.1 101305 101307 55.555556
NC_035780.1 101408 101410 54.545455

==> 77F_R1_val_1_10x.SNPcorr.bedgraph-Meth <==
NC_035780.1 69481 69483 50.000000
NC_035780.1 69487 69489 52.500000
NC_035780.1 100575 100577 95.238095
NC_035780.1 100634 100636 97.05

In [64]:
!wc -l *-Meth

 1172075 12M_R1_val_1_10x.SNPcorr.bedgraph-Meth
 1166184 13M_R1_val_1_10x.SNPcorr.bedgraph-Meth
  989685 16F_R1_val_1_10x.SNPcorr.bedgraph-Meth
  998280 19F_R1_val_1_10x.SNPcorr.bedgraph-Meth
  925525 22F_R1_val_1_10x.SNPcorr.bedgraph-Meth
 1207749 23M_R1_val_1_10x.SNPcorr.bedgraph-Meth
  928713 29F_R1_val_1_10x.SNPcorr.bedgraph-Meth
 1119351 31M_R1_val_1_10x.SNPcorr.bedgraph-Meth
  997234 35F_R1_val_1_10x.SNPcorr.bedgraph-Meth
  995424 36F_R1_val_1_10x.SNPcorr.bedgraph-Meth
  972293 39F_R1_val_1_10x.SNPcorr.bedgraph-Meth
  970953 3F_R1_val_1_10x.SNPcorr.bedgraph-Meth
  825342 41F_R1_val_1_10x.SNPcorr.bedgraph-Meth
 1036681 44F_R1_val_1_10x.SNPcorr.bedgraph-Meth
 1099961 48M_R1_val_1_10x.SNPcorr.bedgraph-Meth
  964824 50F_R1_val_1_10x.SNPcorr.bedgraph-Meth
  972492 52F_R1_val_1_10x.SNPcorr.bedgraph-Meth
  999871 53F_R1_val_1_10x.SNPcorr.bedgraph-Meth
 1007868 54F_R1_val_1_10x.SNPcorr.bedgraph-Meth
 1034559 59M_R1_val_1_10x.SNPcorr.bedgraph-Meth
 1123237 64M_R1_val_1_10x.SNPcorr.bedgrap

In [68]:
#Get line counts for each fine
# Remove 10th line (total entries)
#Ensure output is tab-delimited
#Save output
!wc -l *-Meth \
| sed '29,$ d' \
| awk '{print $1"\t"$2}' \
> Meth-counts.txt

In [69]:
!tail Meth-counts.txt

1007868	54F_R1_val_1_10x.SNPcorr.bedgraph-Meth
1034559	59M_R1_val_1_10x.SNPcorr.bedgraph-Meth
1123237	64M_R1_val_1_10x.SNPcorr.bedgraph-Meth
1050757	6M_R1_val_1_10x.SNPcorr.bedgraph-Meth
998920	76F_R1_val_1_10x.SNPcorr.bedgraph-Meth
983080	77F_R1_val_1_10x.SNPcorr.bedgraph-Meth
1136818	7M_R1_val_1_10x.SNPcorr.bedgraph-Meth
1065292	9M_R1_val_1_10x.SNPcorr.bedgraph-Meth
11991495	fem-union-averages_10x.SNPcorr.bedgraph-Meth
11991495	male-union-averages_10x.SNPcorr.bedgraph-Meth


### 4b. Moderately methylated loci

In [70]:
%%bash
for f in *10x.SNPcorr.bedgraph
do
    awk '{if ($4 < 50) { print $1, $2, $3, $4}}' ${f} \
    | awk '{if ($4 > 10) { print $1, $2, $3, $4 }}' \
    > ${f}-modMeth
done

In [71]:
!head *-modMeth

==> 12M_R1_val_1_10x.SNPcorr.bedgraph-modMeth <==
NC_035780.1 11418 11420 12.500000
NC_035780.1 11437 11439 15.789474
NC_035780.1 11446 11448 16.666667
NC_035780.1 11465 11467 20.000000
NC_035780.1 14430 14432 27.272727
NC_035780.1 14453 14455 14.285714
NC_035780.1 16113 16115 45.454545
NC_035780.1 16307 16309 14.285714
NC_035780.1 16817 16819 15.000000
NC_035780.1 21767 21769 10.526316

==> 13M_R1_val_1_10x.SNPcorr.bedgraph-modMeth <==
NC_035780.1 9266 9268 12.121212
NC_035780.1 14430 14432 15.384615
NC_035780.1 14453 14455 15.384615
NC_035780.1 19741 19743 17.567568
NC_035780.1 19960 19962 17.000000
NC_035780.1 20026 20028 11.666667
NC_035780.1 20044 20046 10.526316
NC_035780.1 20081 20083 10.416667
NC_035780.1 22529 22531 43.750000
NC_035780.1 23584 23586 23.076923

==> 16F_R1_val_1_10x.SNPcorr.bedgraph-modMeth <==
NC_035780.1 12757 12759 41.666667
NC_035780.1 12796 12798 27.272727
NC_035780.1 12835 12837 20.000000
NC_035780.1 14430 14432 18.750000
NC_035780.1 14453 14455 18.750000



==> 76F_R1_val_1_10x.SNPcorr.bedgraph-modMeth <==
NC_035780.1 9853 9855 11.111111
NC_035780.1 10160 10162 10.638298
NC_035780.1 10200 10202 16.129032
NC_035780.1 10258 10260 18.181818
NC_035780.1 17551 17553 28.571429
NC_035780.1 20623 20625 18.000000
NC_035780.1 20625 20627 22.000000
NC_035780.1 20630 20632 12.727273
NC_035780.1 22558 22560 13.793103
NC_035780.1 22592 22594 16.000000

==> 77F_R1_val_1_10x.SNPcorr.bedgraph-modMeth <==
NC_035780.1 1882 1884 12.550607
NC_035780.1 10160 10162 23.809524
NC_035780.1 10200 10202 18.181818
NC_035780.1 20157 20159 11.764706
NC_035780.1 23620 23622 12.903226
NC_035780.1 23892 23894 10.344828
NC_035780.1 23896 23898 11.111111
NC_035780.1 36010 36012 10.810811
NC_035780.1 69491 69493 45.945946
NC_035780.1 69511 69513 20.833333

==> 7M_R1_val_1_10x.SNPcorr.bedgraph-modMeth <==
NC_035780.1 9637 9639 10.344828
NC_035780.1 9657 9659 12.121212
NC_035780.1 9729 9731 16.000000
NC_035780.1 9788 9790 11.538462
NC_035780.1 10160 10162 20.000000
NC_035780.

In [72]:
!wc -l *-modMeth

  469413 12M_R1_val_1_10x.SNPcorr.bedgraph-modMeth
  468971 13M_R1_val_1_10x.SNPcorr.bedgraph-modMeth
  628398 16F_R1_val_1_10x.SNPcorr.bedgraph-modMeth
  640938 19F_R1_val_1_10x.SNPcorr.bedgraph-modMeth
  589664 22F_R1_val_1_10x.SNPcorr.bedgraph-modMeth
  481345 23M_R1_val_1_10x.SNPcorr.bedgraph-modMeth
  607814 29F_R1_val_1_10x.SNPcorr.bedgraph-modMeth
  372895 31M_R1_val_1_10x.SNPcorr.bedgraph-modMeth
  599484 35F_R1_val_1_10x.SNPcorr.bedgraph-modMeth
  610781 36F_R1_val_1_10x.SNPcorr.bedgraph-modMeth
  589170 39F_R1_val_1_10x.SNPcorr.bedgraph-modMeth
  587527 3F_R1_val_1_10x.SNPcorr.bedgraph-modMeth
  550976 41F_R1_val_1_10x.SNPcorr.bedgraph-modMeth
  816442 44F_R1_val_1_10x.SNPcorr.bedgraph-modMeth
  430914 48M_R1_val_1_10x.SNPcorr.bedgraph-modMeth
  636465 50F_R1_val_1_10x.SNPcorr.bedgraph-modMeth
  650263 52F_R1_val_1_10x.SNPcorr.bedgraph-modMeth
  629421 53F_R1_val_1_10x.SNPcorr.bedgraph-modMeth
  638438 54F_R1_val_1_10x.SNPcorr.bedgraph-modMeth
  353563 59M_R1_val_1_10x.SNPcor

In [73]:
#Get line counts for each fine
# Remove 29th line (total entries)
#Ensure output is tab-delimited
#Save output
!wc -l *-modMeth \
| sed '29,$ d' \
| awk '{print $1"\t"$2}' \
> modMeth-counts.txt

In [75]:
!tail modMeth-counts.txt

638438	54F_R1_val_1_10x.SNPcorr.bedgraph-modMeth
353563	59M_R1_val_1_10x.SNPcorr.bedgraph-modMeth
425919	64M_R1_val_1_10x.SNPcorr.bedgraph-modMeth
397826	6M_R1_val_1_10x.SNPcorr.bedgraph-modMeth
566946	76F_R1_val_1_10x.SNPcorr.bedgraph-modMeth
600153	77F_R1_val_1_10x.SNPcorr.bedgraph-modMeth
407720	7M_R1_val_1_10x.SNPcorr.bedgraph-modMeth
383314	9M_R1_val_1_10x.SNPcorr.bedgraph-modMeth
2	fem-union-averages_10x.SNPcorr.bedgraph-modMeth
2	male-union-averages_10x.SNPcorr.bedgraph-modMeth


### 4c. Lowly methylated loci

In [77]:
%%bash
for f in *10x.SNPcorr.bedgraph
do
    awk '{if ($4 <= 10) { print $1, $2, $3, $4 }}' ${f} \
    > ${f}-lowMeth
done

In [78]:
!head *-lowMeth

==> 10x.SNPcorr.bedgraph-lowMeth <==

==> 12M_R1_val_1_10x.SNPcorr.bedgraph-lowMeth <==
NC_007175.2 48 50 0.000000
NC_007175.2 50 52 0.000000
NC_007175.2 87 89 1.169591
NC_007175.2 146 148 1.261830
NC_007175.2 192 194 1.129944
NC_007175.2 245 247 0.829876
NC_007175.2 256 258 0.738007
NC_007175.2 263 265 1.365188
NC_007175.2 265 267 0.337838
NC_007175.2 331 333 0.645161

==> 13M_R1_val_1_10x.SNPcorr.bedgraph-lowMeth <==
NC_007175.2 48 50 0.000000
NC_007175.2 50 52 0.000000
NC_007175.2 87 89 1.293103
NC_007175.2 146 148 0.461894
NC_007175.2 192 194 1.271186
NC_007175.2 245 247 0.550964
NC_007175.2 256 258 0.496278
NC_007175.2 263 265 0.917431
NC_007175.2 265 267 0.453515
NC_007175.2 331 333 0.826446

==> 16F_R1_val_1_10x.SNPcorr.bedgraph-lowMeth <==
NC_007175.2 48 50 1.923077
NC_007175.2 50 52 1.626016
NC_007175.2 87 89 1.045857
NC_007175.2 146 148 1.502146
NC_007175.2 192 194 1.726908
NC_007175.2 245 247 1.228250
NC_007175.2 256 258 1.417467
NC_007175.2 263 265 1.681034
NC_007175.2 265 


==> 9M_R1_val_1_10x.SNPcorr.bedgraph-lowMeth <==
NC_007175.2 48 50 1.754386
NC_007175.2 50 52 1.666667
NC_007175.2 87 89 0.000000
NC_007175.2 146 148 0.456621
NC_007175.2 192 194 0.840336
NC_007175.2 245 247 0.609756
NC_007175.2 256 258 1.117318
NC_007175.2 263 265 0.537634
NC_007175.2 265 267 0.000000
NC_007175.2 331 333 1.010101

==> fem-union-averages_10x.SNPcorr.bedgraph-lowMeth <==

==> male-union-averages_10x.SNPcorr.bedgraph-lowMeth <==


In [79]:
!wc -l *-lowMeth

       0 10x.SNPcorr.bedgraph-lowMeth
 6773447 12M_R1_val_1_10x.SNPcorr.bedgraph-lowMeth
 6795754 13M_R1_val_1_10x.SNPcorr.bedgraph-lowMeth
 6657213 16F_R1_val_1_10x.SNPcorr.bedgraph-lowMeth
 6678673 19F_R1_val_1_10x.SNPcorr.bedgraph-lowMeth
 6105344 22F_R1_val_1_10x.SNPcorr.bedgraph-lowMeth
 6864672 23M_R1_val_1_10x.SNPcorr.bedgraph-lowMeth
 6412017 29F_R1_val_1_10x.SNPcorr.bedgraph-lowMeth
 6266637 31M_R1_val_1_10x.SNPcorr.bedgraph-lowMeth
 6504469 35F_R1_val_1_10x.SNPcorr.bedgraph-lowMeth
 6584940 36F_R1_val_1_10x.SNPcorr.bedgraph-lowMeth
 6375032 39F_R1_val_1_10x.SNPcorr.bedgraph-lowMeth
 6772748 3F_R1_val_1_10x.SNPcorr.bedgraph-lowMeth
 5471284 41F_R1_val_1_10x.SNPcorr.bedgraph-lowMeth
 7538432 44F_R1_val_1_10x.SNPcorr.bedgraph-lowMeth
 6377489 48M_R1_val_1_10x.SNPcorr.bedgraph-lowMeth
 6516209 50F_R1_val_1_10x.SNPcorr.bedgraph-lowMeth
 6753644 52F_R1_val_1_10x.SNPcorr.bedgraph-lowMeth
 6684138 53F_R1_val_1_10x.SNPcorr.bedgraph-lowMeth
 6750931 54F_R1_val_1_10x.SNPcorr.bedgraph-lo

In [80]:
#Get line counts for each fine
# Remove 10th line (total entries)
#Ensure output is tab-delimited
#Save output
!wc -l *-lowMeth \
| sed '29,$ d' \
| awk '{print $1"\t"$2}' \
> lowMeth-counts.txt

In [81]:
!head lowMeth-counts.txt

0	10x.SNPcorr.bedgraph-lowMeth
6773447	12M_R1_val_1_10x.SNPcorr.bedgraph-lowMeth
6795754	13M_R1_val_1_10x.SNPcorr.bedgraph-lowMeth
6657213	16F_R1_val_1_10x.SNPcorr.bedgraph-lowMeth
6678673	19F_R1_val_1_10x.SNPcorr.bedgraph-lowMeth
6105344	22F_R1_val_1_10x.SNPcorr.bedgraph-lowMeth
6864672	23M_R1_val_1_10x.SNPcorr.bedgraph-lowMeth
6412017	29F_R1_val_1_10x.SNPcorr.bedgraph-lowMeth
6266637	31M_R1_val_1_10x.SNPcorr.bedgraph-lowMeth
6504469	35F_R1_val_1_10x.SNPcorr.bedgraph-lowMeth


## 5. Characterize genomic location of CpGs

I will identify overlaps between CpG loci (methylated, sparsely methylated, unmethylated) and various genome feature tracks:

- gene
- exon UTR
- CDS
- intron
- upstream flanks
- downstream flanks
- intergenic regions
- lncRNA
- transposable elements

Since the exon track = exon UTR + CDS, and mRNA = exon + intron, I will not need to use those tracks separately.

In [31]:
#28 file types (26 samples + 2 unions), 3 files per type (Meth, modMeth, lowMeth) = 27 total
!find 10x.SNPcorr.bedgraph-*
!find 10x.SNPcorr.bedgraph-* | wc -l

zr3616_1_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-Meth
zr3616_1_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-sparseMeth
zr3616_1_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-unMeth
zr3616_2_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-Meth
zr3616_2_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-sparseMeth
zr3616_2_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-unMeth
zr3616_3_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-Meth
zr3616_3_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-sparseMeth
zr3616_3_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-unMeth
zr3616_4_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-Meth
zr3616_4_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-sparseMeth
zr3616_4_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-unMeth
zr3616_5_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-Meth
zr3616_5_R1_val_1_val_1_val_1_bismark_bt2_pe.

In [32]:
%%bash

for f in zr3616*5x.bedgraph*SNPs-*
do
    awk '{print $1"\t"$2"\t"$3}' ${f} > ${f}.bed
    wc -l ${f}.bed
done

  852626 zr3616_1_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-Meth.bed
  651862 zr3616_1_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-sparseMeth.bed
 6644420 zr3616_1_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-unMeth.bed
  840818 zr3616_2_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-Meth.bed
  654311 zr3616_2_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-sparseMeth.bed
 6698813 zr3616_2_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-unMeth.bed
  861662 zr3616_3_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-Meth.bed
  622053 zr3616_3_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-sparseMeth.bed
 6626863 zr3616_3_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-unMeth.bed
  891492 zr3616_4_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-Meth.bed
  668946 zr3616_4_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-sparseMeth.bed
 6852461 zr3616_4_R1_val_1_val_1_val_1_bismark

In [33]:
!find *bed

zr3616_1_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-Meth.bed
zr3616_1_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-sparseMeth.bed
zr3616_1_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-unMeth.bed
zr3616_2_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-Meth.bed
zr3616_2_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-sparseMeth.bed
zr3616_2_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-unMeth.bed
zr3616_3_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-Meth.bed
zr3616_3_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-sparseMeth.bed
zr3616_3_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-unMeth.bed
zr3616_4_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-Meth.bed
zr3616_4_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-sparseMeth.bed
zr3616_4_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-unMeth.bed
zr3616_5_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgrap

### 4b. Gene

In [9]:
#Get overlaps between CpG background and genes for downstream annotation

!{bedtoolsDirectory}intersectBed \
-wb \
-a zr3616_union-averages_5x.bedgraph.bed \
-b ../../genome-feature-files/cgigas_uk_roslin_v1_gene.gff \
> zr3616_union-averages_5x-Gene-wb.bed
!head zr3616_union-averages_5x-Gene-wb.bed
!wc -l zr3616_union-averages_5x-Gene-wb.bed

NC_047559.1	10155	10157	NC_047559.1	Gnomon	gene	9839	11386	.	+	.	ID=gene-LOC117693020;Dbxref=GeneID:117693020;Name=LOC117693020;gbkey=Gene;gene=LOC117693020;gene_biotype=lncRNA
NC_047559.1	10215	10217	NC_047559.1	Gnomon	gene	9839	11386	.	+	.	ID=gene-LOC117693020;Dbxref=GeneID:117693020;Name=LOC117693020;gbkey=Gene;gene=LOC117693020;gene_biotype=lncRNA
NC_047559.1	10270	10272	NC_047559.1	Gnomon	gene	9839	11386	.	+	.	ID=gene-LOC117693020;Dbxref=GeneID:117693020;Name=LOC117693020;gbkey=Gene;gene=LOC117693020;gene_biotype=lncRNA
NC_047559.1	10292	10294	NC_047559.1	Gnomon	gene	9839	11386	.	+	.	ID=gene-LOC117693020;Dbxref=GeneID:117693020;Name=LOC117693020;gbkey=Gene;gene=LOC117693020;gene_biotype=lncRNA
NC_047559.1	10314	10316	NC_047559.1	Gnomon	gene	9839	11386	.	+	.	ID=gene-LOC117693020;Dbxref=GeneID:117693020;Name=LOC117693020;gbkey=Gene;gene=LOC117693020;gene_biotype=lncRNA
NC_047559.1	10358	10360	NC_047559.1	Gnomon	gene	9839	11386	.	+	.	ID=gene-LOC117693020;Dbxref=GeneID:117693020;Name=

In [34]:
%%bash
for f in *bed
do
    /opt/homebrew/bin/intersectBed \
    -u \
    -a ${f} \
    -b ../../genome-feature-files/cgigas_uk_roslin_v1_gene.gff \
    > ${f}-Gene
done

In [35]:
#Check output
!head *Gene

==> zr3616_1_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-Meth.bed-Gene <==
NC_047559.1	360322	360324
NC_047559.1	360361	360363
NC_047559.1	361347	361349
NC_047559.1	364329	364331
NC_047559.1	375341	375343
NC_047559.1	375360	375362
NC_047559.1	376111	376113
NC_047559.1	376170	376172
NC_047559.1	376962	376964
NC_047559.1	377335	377337

==> zr3616_1_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-sparseMeth.bed-Gene <==
NC_047559.1	10899	10901
NC_047559.1	18234	18236
NC_047559.1	61081	61083
NC_047559.1	68070	68072
NC_047559.1	100249	100251
NC_047559.1	100276	100278
NC_047559.1	100305	100307
NC_047559.1	100319	100321
NC_047559.1	100440	100442
NC_047559.1	100454	100456

==> zr3616_1_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-unMeth.bed-Gene <==
NC_047559.1	10270	10272
NC_047559.1	10292	10294
NC_047559.1	10314	10316
NC_047559.1	10358	10360
NC_047559.1	10380	10382
NC_047559.1	10391	10393
NC_047559.1	10402	10404
NC_047559.1	10413	10415
NC_047559.1	10457	1045


==> zr3616_union-averages_5x.bedgraph.NO-SNPs-sparseMeth.bed-Gene <==
NC_047559.1	60313	60315
NC_047559.1	60371	60373
NC_047559.1	60492	60494
NC_047559.1	61081	61083
NC_047559.1	62078	62080
NC_047559.1	62234	62236
NC_047559.1	63068	63070
NC_047559.1	63072	63074
NC_047559.1	63078	63080
NC_047559.1	64815	64817

==> zr3616_union-averages_5x.bedgraph.NO-SNPs-unMeth.bed-Gene <==
NC_047559.1	10155	10157
NC_047559.1	10215	10217
NC_047559.1	10270	10272
NC_047559.1	10292	10294
NC_047559.1	10314	10316
NC_047559.1	10358	10360
NC_047559.1	10380	10382
NC_047559.1	10391	10393
NC_047559.1	10402	10404
NC_047559.1	10413	10415


In [36]:
#Count number of overlaps
!wc -l *Gene

  793356 zr3616_1_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-Meth.bed-Gene
  456348 zr3616_1_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-sparseMeth.bed-Gene
 3615169 zr3616_1_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-unMeth.bed-Gene
  780836 zr3616_2_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-Meth.bed-Gene
  454638 zr3616_2_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-sparseMeth.bed-Gene
 3648886 zr3616_2_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-unMeth.bed-Gene
  800933 zr3616_3_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-Meth.bed-Gene
  432941 zr3616_3_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-sparseMeth.bed-Gene
 3613729 zr3616_3_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-unMeth.bed-Gene
  828225 zr3616_4_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-Meth.bed-Gene
  463701 zr3616_4_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-sparseMeth.

In [37]:
#Get line counts for each fine
# Remove 28th line (total entries)
#Ensure output is tab-delimited
#Save output
!wc -l *-Gene \
| sed '28,$ d' \
| awk '{print $1"\t"$2}' \
> zr3616_5x-Gene-counts.txt

### 4c. CDS

In [40]:
%%bash
for f in *bed
do
    /opt/homebrew/bin/intersectBed \
    -u \
    -a ${f} \
    -b ../../genome-feature-files/cgigas_uk_roslin_v1_CDS.gff \
    > ${f}-CDS
done

In [41]:
#Check output
!head *CDS

==> zr3616_1_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-Meth.bed-CDS <==
NC_047559.1	432531	432533
NC_047559.1	432606	432608
NC_047559.1	433203	433205
NC_047559.1	433271	433273
NC_047559.1	545968	545970
NC_047559.1	548186	548188
NC_047559.1	548188	548190
NC_047559.1	548918	548920
NC_047559.1	549827	549829
NC_047559.1	549832	549834

==> zr3616_1_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-sparseMeth.bed-CDS <==
NC_047559.1	177876	177878
NC_047559.1	337616	337618
NC_047559.1	337639	337641
NC_047559.1	338662	338664
NC_047559.1	338708	338710
NC_047559.1	338753	338755
NC_047559.1	338777	338779
NC_047559.1	338819	338821
NC_047559.1	338862	338864
NC_047559.1	432696	432698

==> zr3616_1_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-unMeth.bed-CDS <==
NC_047559.1	14862	14864
NC_047559.1	15162	15164
NC_047559.1	15194	15196
NC_047559.1	15204	15206
NC_047559.1	15291	15293
NC_047559.1	15293	15295
NC_047559.1	15336	15338
NC_047559.1	15338	15340
NC_047559.1	15357


==> zr3616_union-averages_5x.bedgraph.NO-SNPs-Meth.bed-CDS <==
NC_047559.1	140284	140286
NC_047559.1	140286	140288
NC_047559.1	432531	432533
NC_047559.1	432606	432608
NC_047559.1	432670	432672
NC_047559.1	432696	432698
NC_047559.1	433203	433205
NC_047559.1	433271	433273
NC_047559.1	545968	545970
NC_047559.1	548148	548150

==> zr3616_union-averages_5x.bedgraph.NO-SNPs-sparseMeth.bed-CDS <==
NC_047559.1	140131	140133
NC_047559.1	140139	140141
NC_047559.1	140172	140174
NC_047559.1	140209	140211
NC_047559.1	140241	140243
NC_047559.1	140257	140259
NC_047559.1	140292	140294
NC_047559.1	140295	140297
NC_047559.1	140334	140336
NC_047559.1	140381	140383

==> zr3616_union-averages_5x.bedgraph.NO-SNPs-unMeth.bed-CDS <==
NC_047559.1	14862	14864
NC_047559.1	15099	15101
NC_047559.1	15162	15164
NC_047559.1	15194	15196
NC_047559.1	15204	15206
NC_047559.1	15291	15293
NC_047559.1	15293	15295
NC_047559.1	15336	15338
NC_047559.1	15338	15340
NC_047559.1	15357	15359


In [42]:
#Count number of overlaps
!wc -l *CDS

  305690 zr3616_1_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-Meth.bed-CDS
   62303 zr3616_1_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-sparseMeth.bed-CDS
  829997 zr3616_1_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-unMeth.bed-CDS
  304850 zr3616_2_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-Meth.bed-CDS
   63217 zr3616_2_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-sparseMeth.bed-CDS
  827906 zr3616_2_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-unMeth.bed-CDS
  310883 zr3616_3_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-Meth.bed-CDS
   55085 zr3616_3_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-sparseMeth.bed-CDS
  826079 zr3616_3_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-unMeth.bed-CDS
  313947 zr3616_4_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-Meth.bed-CDS
   58402 zr3616_4_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-sparseMeth.bed-CDS
  

In [43]:
#Get line counts for each fine
# Remove 28th line (total entries)
#Ensure output is tab-delimited
#Save output
!wc -l *-CDS \
| sed '28,$ d' \
| awk '{print $1"\t"$2}' \
> zr3616_5x-CDS-counts.txt

### 4d. Exon UTR

In [44]:
%%bash
for f in *bed
do
    /opt/homebrew/bin/intersectBed \
    -u \
    -a ${f} \
    -b ../../genome-feature-files/cgigas_uk_roslin_v1_exonUTR.gff \
    > ${f}-exonUTR
done

In [45]:
#Check output
!head *exonUTR

==> zr3616_1_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-Meth.bed-exonUTR <==
NC_047559.1	545229	545231
NC_047559.1	545256	545258
NC_047559.1	571906	571908
NC_047559.1	571929	571931
NC_047559.1	572049	572051
NC_047559.1	572233	572235
NC_047559.1	572245	572247
NC_047559.1	572263	572265
NC_047559.1	572453	572455
NC_047559.1	572557	572559

==> zr3616_1_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-sparseMeth.bed-exonUTR <==
NC_047559.1	10899	10901
NC_047559.1	431124	431126
NC_047559.1	431205	431207
NC_047559.1	545687	545689
NC_047559.1	571583	571585
NC_047559.1	571838	571840
NC_047559.1	572075	572077
NC_047559.1	572153	572155
NC_047559.1	572367	572369
NC_047559.1	572615	572617

==> zr3616_1_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-unMeth.bed-exonUTR <==
NC_047559.1	10920	10922
NC_047559.1	10950	10952
NC_047559.1	11000	11002
NC_047559.1	11026	11028
NC_047559.1	14214	14216
NC_047559.1	14232	14234
NC_047559.1	14243	14245
NC_047559.1	14259	14261
NC_0475


==> zr3616_union-averages_5x.bedgraph.NO-SNPs-Meth.bed-exonUTR <==
NC_047559.1	418038	418040
NC_047559.1	418118	418120
NC_047559.1	418153	418155
NC_047559.1	418166	418168
NC_047559.1	418179	418181
NC_047559.1	431124	431126
NC_047559.1	545256	545258
NC_047559.1	571583	571585
NC_047559.1	571838	571840
NC_047559.1	571906	571908

==> zr3616_union-averages_5x.bedgraph.NO-SNPs-sparseMeth.bed-exonUTR <==
NC_047559.1	62234	62236
NC_047559.1	343574	343576
NC_047559.1	343603	343605
NC_047559.1	372095	372097
NC_047559.1	415945	415947
NC_047559.1	415975	415977
NC_047559.1	415998	416000
NC_047559.1	416071	416073
NC_047559.1	416078	416080
NC_047559.1	416088	416090

==> zr3616_union-averages_5x.bedgraph.NO-SNPs-unMeth.bed-exonUTR <==
NC_047559.1	10899	10901
NC_047559.1	10920	10922
NC_047559.1	10950	10952
NC_047559.1	11000	11002
NC_047559.1	11026	11028
NC_047559.1	11080	11082
NC_047559.1	14214	14216
NC_047559.1	14232	14234
NC_047559.1	14243	14245
NC_047559.1	14259	14261


In [46]:
#Count number of overlaps
!wc -l *exonUTR

   48945 zr3616_1_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-Meth.bed-exonUTR
   32728 zr3616_1_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-sparseMeth.bed-exonUTR
  422149 zr3616_1_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-unMeth.bed-exonUTR
   46369 zr3616_2_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-Meth.bed-exonUTR
   32998 zr3616_2_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-sparseMeth.bed-exonUTR
  427078 zr3616_2_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-unMeth.bed-exonUTR
   49947 zr3616_3_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-Meth.bed-exonUTR
   31460 zr3616_3_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-sparseMeth.bed-exonUTR
  422437 zr3616_3_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-unMeth.bed-exonUTR
   51903 zr3616_4_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-Meth.bed-exonUTR
   33259 zr3616_4_R1_val_1_val_1_val_1_bismark_bt2_pe._5

In [47]:
#Get line counts for each fine
# Remove 28th line (total entries)
#Ensure output is tab-delimited
#Save output
!wc -l *-exonUTR \
| sed '28,$ d' \
| awk '{print $1"\t"$2}' \
> zr3616_5x-exonUTR-counts.txt

### 4e. Intron

In [48]:
%%bash
for f in *bed
do
    /opt/homebrew/bin/intersectBed \
    -u \
    -a ${f} \
    -b ../../genome-feature-files/cgigas_uk_roslin_v1_intron.bed \
    > ${f}-intron
done

In [49]:
#Check output
!head *intron

==> zr3616_1_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-Meth.bed-intron <==
NC_047559.1	360322	360324
NC_047559.1	360361	360363
NC_047559.1	361347	361349
NC_047559.1	364329	364331
NC_047559.1	375341	375343
NC_047559.1	375360	375362
NC_047559.1	376111	376113
NC_047559.1	376170	376172
NC_047559.1	376962	376964
NC_047559.1	377335	377337

==> zr3616_1_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-sparseMeth.bed-intron <==
NC_047559.1	18234	18236
NC_047559.1	61081	61083
NC_047559.1	68070	68072
NC_047559.1	100249	100251
NC_047559.1	100276	100278
NC_047559.1	100305	100307
NC_047559.1	100319	100321
NC_047559.1	100440	100442
NC_047559.1	100454	100456
NC_047559.1	101107	101109

==> zr3616_1_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-unMeth.bed-intron <==
NC_047559.1	10270	10272
NC_047559.1	10292	10294
NC_047559.1	10314	10316
NC_047559.1	10358	10360
NC_047559.1	10380	10382
NC_047559.1	10391	10393
NC_047559.1	10402	10404
NC_047559.1	10413	10415
NC_047559.1	10


==> zr3616_union-averages_5x.bedgraph.NO-SNPs-Meth.bed-intron <==
NC_047559.1	61996	61998
NC_047559.1	110795	110797
NC_047559.1	356335	356337
NC_047559.1	356378	356380
NC_047559.1	356395	356397
NC_047559.1	356430	356432
NC_047559.1	356436	356438
NC_047559.1	356532	356534
NC_047559.1	356685	356687
NC_047559.1	356699	356701

==> zr3616_union-averages_5x.bedgraph.NO-SNPs-sparseMeth.bed-intron <==
NC_047559.1	60313	60315
NC_047559.1	60371	60373
NC_047559.1	60492	60494
NC_047559.1	61081	61083
NC_047559.1	62078	62080
NC_047559.1	63068	63070
NC_047559.1	63072	63074
NC_047559.1	63078	63080
NC_047559.1	64815	64817
NC_047559.1	66982	66984

==> zr3616_union-averages_5x.bedgraph.NO-SNPs-unMeth.bed-intron <==
NC_047559.1	10155	10157
NC_047559.1	10215	10217
NC_047559.1	10270	10272
NC_047559.1	10292	10294
NC_047559.1	10314	10316
NC_047559.1	10358	10360
NC_047559.1	10380	10382
NC_047559.1	10391	10393
NC_047559.1	10402	10404
NC_047559.1	10413	10415


In [50]:
#Count number of overlaps
!wc -l *intron

  441010 zr3616_1_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-Meth.bed-intron
  361922 zr3616_1_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-sparseMeth.bed-intron
 2370103 zr3616_1_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-unMeth.bed-intron
  431889 zr3616_2_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-Meth.bed-intron
  359018 zr3616_2_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-sparseMeth.bed-intron
 2400993 zr3616_2_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-unMeth.bed-intron
  442471 zr3616_3_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-Meth.bed-intron
  346894 zr3616_3_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-sparseMeth.bed-intron
 2372125 zr3616_3_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-unMeth.bed-intron
  464828 zr3616_4_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-Meth.bed-intron
  372558 zr3616_4_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph

In [51]:
#Get line counts for each fine
# Remove 28th line (total entries)
#Ensure output is tab-delimited
#Save output
!wc -l *-intron \
| sed '28,$ d' \
| awk '{print $1"\t"$2}' \
> zr3616_5x-intron-counts.txt

### 4f. Upstream flanks

In [52]:
%%bash
for f in *bed
do
    /opt/homebrew/bin/intersectBed \
    -u \
    -a ${f} \
    -b ../../genome-feature-files/cgigas_uk_roslin_v1_upstream.gff \
    > ${f}-upstreamFlanks
done

In [53]:
#Check output
!head *upstreamFlanks

==> zr3616_1_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-Meth.bed-upstreamFlanks <==
NC_047559.1	576634	576636
NC_047559.1	576752	576754
NC_047559.1	1468258	1468260
NC_047559.1	1800917	1800919
NC_047559.1	1800924	1800926
NC_047559.1	2253122	2253124
NC_047559.1	3763635	3763637
NC_047559.1	3763649	3763651
NC_047559.1	3763653	3763655
NC_047559.1	3763678	3763680

==> zr3616_1_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-sparseMeth.bed-upstreamFlanks <==
NC_047559.1	9237	9239
NC_047559.1	9658	9660
NC_047559.1	9661	9663
NC_047559.1	335850	335852
NC_047559.1	335858	335860
NC_047559.1	335878	335880
NC_047559.1	335886	335888
NC_047559.1	335892	335894
NC_047559.1	576308	576310
NC_047559.1	576693	576695

==> zr3616_1_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-unMeth.bed-upstreamFlanks <==
NC_047559.1	9122	9124
NC_047559.1	9140	9142
NC_047559.1	9159	9161
NC_047559.1	9240	9242
NC_047559.1	9664	9666
NC_047559.1	9774	9776
NC_047559.1	9781	9783
NC_047559.1	9787	9


==> zr3616_union-averages_5x.bedgraph.NO-SNPs-sparseMeth.bed-upstreamFlanks <==
NC_047559.1	8979	8981
NC_047559.1	9658	9660
NC_047559.1	141708	141710
NC_047559.1	141717	141719
NC_047559.1	141777	141779
NC_047559.1	141800	141802
NC_047559.1	141853	141855
NC_047559.1	141858	141860
NC_047559.1	141868	141870
NC_047559.1	141885	141887

==> zr3616_union-averages_5x.bedgraph.NO-SNPs-unMeth.bed-upstreamFlanks <==
NC_047559.1	8970	8972
NC_047559.1	9122	9124
NC_047559.1	9140	9142
NC_047559.1	9159	9161
NC_047559.1	9237	9239
NC_047559.1	9240	9242
NC_047559.1	9603	9605
NC_047559.1	9661	9663
NC_047559.1	9664	9666
NC_047559.1	9774	9776


In [54]:
#Count number of overlaps
!wc -l *upstreamFlanks

    4883 zr3616_1_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-Meth.bed-upstreamFlanks
   15102 zr3616_1_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-sparseMeth.bed-upstreamFlanks
  365079 zr3616_1_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-unMeth.bed-upstreamFlanks
    4865 zr3616_2_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-Meth.bed-upstreamFlanks
   15819 zr3616_2_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-sparseMeth.bed-upstreamFlanks
  366613 zr3616_2_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-unMeth.bed-upstreamFlanks
    4934 zr3616_3_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-Meth.bed-upstreamFlanks
   15049 zr3616_3_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-sparseMeth.bed-upstreamFlanks
  365126 zr3616_3_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-unMeth.bed-upstreamFlanks
    5239 zr3616_4_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-Meth.bed-u

In [55]:
#Get line counts for each fine
# Remove 28th line (total entries)
#Ensure output is tab-delimited
#Save output
!wc -l *-upstreamFlanks \
| sed '28,$ d' \
| awk '{print $1"\t"$2}' \
> zr3616_5x-upstreamFlanks-counts.txt

### 4g. Downstream flanks

In [56]:
%%bash
for f in *bed
do
    /opt/homebrew/bin/intersectBed \
    -u \
    -a ${f} \
    -b ../../genome-feature-files/cgigas_uk_roslin_v1_downstream.gff \
    > ${f}-downstreamFlanks
done

In [57]:
#Check output
!head *downstreamFlanks

==> zr3616_1_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-Meth.bed-downstreamFlanks <==
NC_047559.1	264885	264887
NC_047559.1	264911	264913
NC_047559.1	264924	264926
NC_047559.1	344440	344442
NC_047559.1	344447	344449
NC_047559.1	344477	344479
NC_047559.1	344549	344551
NC_047559.1	344794	344796
NC_047559.1	344812	344814
NC_047559.1	344829	344831

==> zr3616_1_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-sparseMeth.bed-downstreamFlanks <==
NC_047559.1	258442	258444
NC_047559.1	264959	264961
NC_047559.1	265013	265015
NC_047559.1	265028	265030
NC_047559.1	265111	265113
NC_047559.1	326295	326297
NC_047559.1	326317	326319
NC_047559.1	344861	344863
NC_047559.1	345009	345011
NC_047559.1	434413	434415

==> zr3616_1_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-unMeth.bed-downstreamFlanks <==
NC_047559.1	16061	16063
NC_047559.1	16105	16107
NC_047559.1	16112	16114
NC_047559.1	16220	16222
NC_047559.1	16260	16262
NC_047559.1	16289	16291
NC_047559.1	16310	16312
NC


==> zr3616_8_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-unMeth.bed-downstreamFlanks <==
NC_047559.1	16061	16063
NC_047559.1	16105	16107
NC_047559.1	16112	16114
NC_047559.1	16220	16222
NC_047559.1	16260	16262
NC_047559.1	16289	16291
NC_047559.1	16310	16312
NC_047559.1	16369	16371
NC_047559.1	16409	16411
NC_047559.1	16438	16440

==> zr3616_union-averages_5x.bedgraph.NO-SNPs-Meth.bed-downstreamFlanks <==
NC_047559.1	433588	433590
NC_047559.1	433598	433600
NC_047559.1	576634	576636
NC_047559.1	576693	576695
NC_047559.1	576752	576754
NC_047559.1	576866	576868
NC_047559.1	576880	576882
NC_047559.1	577126	577128
NC_047559.1	577456	577458
NC_047559.1	1010121	1010123

==> zr3616_union-averages_5x.bedgraph.NO-SNPs-sparseMeth.bed-downstreamFlanks <==
NC_047559.1	139183	139185
NC_047559.1	141853	141855
NC_047559.1	141858	141860
NC_047559.1	141868	141870
NC_047559.1	141885	141887
NC_047559.1	141888	141890
NC_047559.1	141894	141896
NC_047559.1	141905	141907
NC_047559.1	141912	141914
N

In [58]:
#Count number of overlaps
!wc -l *downstreamFlanks

   20108 zr3616_1_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-Meth.bed-downstreamFlanks
   26394 zr3616_1_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-sparseMeth.bed-downstreamFlanks
  304826 zr3616_1_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-unMeth.bed-downstreamFlanks
   18944 zr3616_2_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-Meth.bed-downstreamFlanks
   26915 zr3616_2_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-sparseMeth.bed-downstreamFlanks
  307855 zr3616_2_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-unMeth.bed-downstreamFlanks
   19824 zr3616_3_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-Meth.bed-downstreamFlanks
   25556 zr3616_3_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-sparseMeth.bed-downstreamFlanks
  306478 zr3616_3_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-unMeth.bed-downstreamFlanks
   20462 zr3616_4_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.

In [59]:
#Get line counts for each fine
# Remove 28th line (total entries)
#Ensure output is tab-delimited
#Save output
!wc -l *-downstreamFlanks \
| sed '28,$ d' \
| awk '{print $1"\t"$2}' \
> zr3616_5x-downstreamFlanks-counts.txt

### 4h. Intergenic regions

In [60]:
%%bash
for f in *bed
do
    /opt/homebrew/bin/intersectBed \
    -u \
    -a ${f} \
    -b ../../genome-feature-files/cgigas_uk_roslin_v1_intergenic.bed \
    > ${f}-intergenic
done

In [61]:
#Check output
!head *intergenic

==> zr3616_1_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-Meth.bed-intergenic <==
NC_047559.1	1547	1549
NC_047559.1	1571	1573
NC_047559.1	2267	2269
NC_047559.1	2291	2293
NC_047559.1	4073	4075
NC_047559.1	4791	4793
NC_047559.1	4835	4837
NC_047559.1	4843	4845
NC_047559.1	5605	5607
NC_047559.1	5613	5615

==> zr3616_1_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-sparseMeth.bed-intergenic <==
NC_047559.1	4887	4889
NC_047559.1	4909	4911
NC_047559.1	5500	5502
NC_047559.1	7716	7718
NC_047559.1	7814	7816
NC_047559.1	23610	23612
NC_047559.1	24932	24934
NC_047559.1	24934	24936
NC_047559.1	26463	26465
NC_047559.1	26485	26487

==> zr3616_1_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-unMeth.bed-intergenic <==
NC_047559.1	5883	5885
NC_047559.1	20252	20254
NC_047559.1	20297	20299
NC_047559.1	20319	20321
NC_047559.1	20341	20343
NC_047559.1	20363	20365
NC_047559.1	20385	20387
NC_047559.1	20407	20409
NC_047559.1	20429	20431
NC_047559.1	20451	20453

==> zr3616_2_R1_val


==> zr3616_union-averages_5x.bedgraph.NO-SNPs-unMeth.bed-intergenic <==
NC_047559.1	5883	5885
NC_047559.1	8295	8297
NC_047559.1	12439	12441
NC_047559.1	12503	12505
NC_047559.1	12520	12522
NC_047559.1	12906	12908
NC_047559.1	12923	12925
NC_047559.1	20252	20254
NC_047559.1	20297	20299
NC_047559.1	20319	20321


In [62]:
#Count number of overlaps
!wc -l *intergenic

   35770 zr3616_1_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-Meth.bed-intergenic
  155853 zr3616_1_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-sparseMeth.bed-intergenic
 2389807 zr3616_1_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-unMeth.bed-intergenic
   37568 zr3616_2_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-Meth.bed-intergenic
  159001 zr3616_2_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-sparseMeth.bed-intergenic
 2406131 zr3616_2_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-unMeth.bed-intergenic
   37512 zr3616_3_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-Meth.bed-intergenic
  150346 zr3616_3_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-sparseMeth.bed-intergenic
 2372012 zr3616_3_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-unMeth.bed-intergenic
   39199 zr3616_4_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-Meth.bed-intergenic
  163198 zr3616_4_R1_val_1

In [63]:
#Get line counts for each fine
# Remove 28th line (total entries)
#Ensure output is tab-delimited
#Save output
!wc -l *-intergenic \
| sed '28,$ d' \
| awk '{print $1"\t"$2}' \
> zr3616_5x-intergenic-counts.txt

### 4i. lncRNA

In [64]:
%%bash
for f in *bed
do
    /opt/homebrew/bin/intersectBed \
    -u \
    -a ${f} \
    -b ../../genome-feature-files/cgigas_uk_roslin_v1_lncRNA.gff \
    > ${f}-lncRNA
done

In [65]:
#Check output
!head *lncRNA

==> zr3616_1_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-Meth.bed-lncRNA <==
NC_047559.1	1664457	1664459
NC_047559.1	1664745	1664747
NC_047559.1	1664773	1664775
NC_047559.1	1664902	1664904
NC_047559.1	1665259	1665261
NC_047559.1	1688362	1688364
NC_047559.1	1688423	1688425
NC_047559.1	1688433	1688435
NC_047559.1	1688466	1688468
NC_047559.1	2139066	2139068

==> zr3616_1_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-sparseMeth.bed-lncRNA <==
NC_047559.1	10899	10901
NC_047559.1	255751	255753
NC_047559.1	255789	255791
NC_047559.1	416920	416922
NC_047559.1	417337	417339
NC_047559.1	418447	418449
NC_047559.1	419514	419516
NC_047559.1	786896	786898
NC_047559.1	789201	789203
NC_047559.1	789687	789689

==> zr3616_1_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-unMeth.bed-lncRNA <==
NC_047559.1	10270	10272
NC_047559.1	10292	10294
NC_047559.1	10314	10316
NC_047559.1	10358	10360
NC_047559.1	10380	10382
NC_047559.1	10391	10393
NC_047559.1	10402	10404
NC_047559.1	10


==> zr3616_union-averages_5x.bedgraph.NO-SNPs-Meth.bed-lncRNA <==
NC_047559.1	417944	417946
NC_047559.1	418038	418040
NC_047559.1	418118	418120
NC_047559.1	418153	418155
NC_047559.1	418166	418168
NC_047559.1	418179	418181
NC_047559.1	418646	418648
NC_047559.1	418665	418667
NC_047559.1	419784	419786
NC_047559.1	419933	419935

==> zr3616_union-averages_5x.bedgraph.NO-SNPs-sparseMeth.bed-lncRNA <==
NC_047559.1	255751	255753
NC_047559.1	255789	255791
NC_047559.1	415945	415947
NC_047559.1	415975	415977
NC_047559.1	415998	416000
NC_047559.1	416071	416073
NC_047559.1	416078	416080
NC_047559.1	416088	416090
NC_047559.1	416920	416922
NC_047559.1	416993	416995

==> zr3616_union-averages_5x.bedgraph.NO-SNPs-unMeth.bed-lncRNA <==
NC_047559.1	10155	10157
NC_047559.1	10215	10217
NC_047559.1	10270	10272
NC_047559.1	10292	10294
NC_047559.1	10314	10316
NC_047559.1	10358	10360
NC_047559.1	10380	10382
NC_047559.1	10391	10393
NC_047559.1	10402	10404
NC_047559.1	10413	10415


In [66]:
#Count number of overlaps
!wc -l *lncRNA

   16324 zr3616_1_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-Meth.bed-lncRNA
   24360 zr3616_1_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-sparseMeth.bed-lncRNA
  215175 zr3616_1_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-unMeth.bed-lncRNA
   15369 zr3616_2_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-Meth.bed-lncRNA
   24724 zr3616_2_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-sparseMeth.bed-lncRNA
  218468 zr3616_2_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-unMeth.bed-lncRNA
   15533 zr3616_3_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-Meth.bed-lncRNA
   23462 zr3616_3_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-sparseMeth.bed-lncRNA
  217209 zr3616_3_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-unMeth.bed-lncRNA
   16604 zr3616_4_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-Meth.bed-lncRNA
   25814 zr3616_4_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph

In [67]:
#Get line counts for each fine
# Remove 28th line (total entries)
#Ensure output is tab-delimited
#Save output
!wc -l *-lncRNA \
| sed '28,$ d' \
| awk '{print $1"\t"$2}' \
> zr3616_5x-lncRNA-counts.txt

### 4j. Transposable elements

In [68]:
%%bash
for f in *bed
do
    /opt/homebrew/bin/intersectBed \
    -u \
    -a ${f} \
    -b ../../genome-feature-files/cgigas_uk_roslin_v1_rm.te.bed \
    > ${f}-TE
done

In [69]:
#Check output
!head *TE

==> zr3616_1_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-Meth.bed-TE <==
NC_047559.1	91258	91260
NC_047559.1	91312	91314
NC_047559.1	232127	232129
NC_047559.1	234978	234980
NC_047559.1	264885	264887
NC_047559.1	264911	264913
NC_047559.1	264924	264926
NC_047559.1	293248	293250
NC_047559.1	294921	294923
NC_047559.1	294970	294972

==> zr3616_1_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-sparseMeth.bed-TE <==
NC_047559.1	26463	26465
NC_047559.1	26485	26487
NC_047559.1	26966	26968
NC_047559.1	27211	27213
NC_047559.1	44183	44185
NC_047559.1	46646	46648
NC_047559.1	47794	47796
NC_047559.1	50864	50866
NC_047559.1	50869	50871
NC_047559.1	50878	50880

==> zr3616_1_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-unMeth.bed-TE <==
NC_047559.1	15746	15748
NC_047559.1	24588	24590
NC_047559.1	26419	26421
NC_047559.1	26448	26450
NC_047559.1	26509	26511
NC_047559.1	26532	26534
NC_047559.1	26547	26549
NC_047559.1	26555	26557
NC_047559.1	26561	26563
NC_047559.1	26573	26


==> zr3616_union-averages_5x.bedgraph.NO-SNPs-sparseMeth.bed-TE <==
NC_047559.1	45010	45012
NC_047559.1	46646	46648
NC_047559.1	47276	47278
NC_047559.1	47794	47796
NC_047559.1	50821	50823
NC_047559.1	50848	50850
NC_047559.1	50857	50859
NC_047559.1	50864	50866
NC_047559.1	50869	50871
NC_047559.1	50878	50880

==> zr3616_union-averages_5x.bedgraph.NO-SNPs-unMeth.bed-TE <==
NC_047559.1	15746	15748
NC_047559.1	24588	24590
NC_047559.1	25485	25487
NC_047559.1	25667	25669
NC_047559.1	26135	26137
NC_047559.1	26419	26421
NC_047559.1	26448	26450
NC_047559.1	26463	26465
NC_047559.1	26485	26487
NC_047559.1	26509	26511


In [70]:
#Count number of overlaps
!wc -l *TE

  246906 zr3616_1_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-Meth.bed-TE
  345635 zr3616_1_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-sparseMeth.bed-TE
 2341295 zr3616_1_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-unMeth.bed-TE
  248977 zr3616_2_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-Meth.bed-TE
  352367 zr3616_2_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-sparseMeth.bed-TE
 2353758 zr3616_2_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-unMeth.bed-TE
  244808 zr3616_3_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-Meth.bed-TE
  334790 zr3616_3_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-sparseMeth.bed-TE
 2329821 zr3616_3_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-unMeth.bed-TE
  252579 zr3616_4_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-Meth.bed-TE
  365887 zr3616_4_R1_val_1_val_1_val_1_bismark_bt2_pe._5x.bedgraph.NO-SNPs-sparseMeth.bed-TE
 2452141 zr36

In [71]:
#Get line counts for each fine
# Remove 28th line (total entries)
#Ensure output is tab-delimited
#Save output
!wc -l *-TE \
| sed '28,$ d' \
| awk '{print $1"\t"$2}' \
> zr3616_5x-TE-counts.txt