# DML and DMR Analysis

In this notebook, I will examine the location of differentially methylated loci (DML) and regions (DMR) in the *C. virginica* genome. The DML and DMR were identified using methylKit in [this R script](https://github.com/fish546-2018/yaamini-virginica/tree/master/analyses/2018-10-25-MethylKit).

Methods:

0. Prepare for Analyses
1. Locate Relevant Files and Set Variable Path Names
2. Identify DML and DMR Overlaps with Genomic Feature Tracks
3. Gene Flanking

## 0. Prepare for Analyses

### 0a. Set Working Directory

In [1]:
pwd

'/Users/yaamini/Documents/yaamini-virginica/notebooks'

In [2]:
cd ../analyses/

/Users/yaamini/Documents/yaamini-virginica/analyses


In [1]:
!mkdir 2018-11-01-DML-and-DMR-Analysis

In [4]:
ls -F

2018-10-25-MethylKit/                2019-01-15-Sample-Clustering/
2018-11-01-DML-and-DMR-Analysis/     2019-03-07-IGV-Verification/
2018-12-02-Gene-Enrichment-Analysis/ README.md


In [3]:
cd 2018-11-01-DML-and-DMR-Analysis/

/Users/yaamini/Documents/yaamini-virginica/analyses/2018-11-01-DML-and-DMR-Analysis


### 0b. Download Genome Feature Files

I will be using the following tracks:

1. Exon: Coding regions
2. Intron: Regions within genes that are spliced out of the transcript
3. Genes: This includes exons and introns, as well as constituent mRNA.
4. Transposable elements (all): Transposable elements located using information from all species in the RepeatMasker database (see [Sam's notes](http://onsnetwork.org/kubu4/2018/08/28/transposable-element-mapping-crassostrea-virginica-genome-cvirginica_v300-using-repeatmasker-4-07/) for more information)
5. Transposable elements (_C. gigas_): Transposable elements located using information from _C. gigas_ only (see [Sam's notes](http://onsnetwork.org/kubu4/2018/08/28/transposable-element-mapping-crassostrea-virginica-genome-cvirginica_v300-using-repeatmasker-4-07/) for more information)
6. CG motifs: Regions with CGs where methylation can occur
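
The two transposable element tracks are distributed as GFF files, while the remaining tracks are BED files. GFF coordinates are 1-based and inclusive, whereas BED coordinates are 0-based and half-open. As far as I know, `bedtools` accounts for this difference itself, so no conversion is needed for the analyses in this notebook. The sketch below only illustrates the relationship between the two conventions; `gff_to_bed` is a hypothetical helper and is not part of the original workflow.

```python
def gff_to_bed(gff_line):
    """Convert one GFF feature line to a (chrom, start, end) BED-style tuple."""
    fields = gff_line.rstrip("\n").split("\t")
    chrom = fields[0]
    start = int(fields[3])  # GFF start: 1-based, inclusive
    end = int(fields[4])    # GFF end: 1-based, inclusive
    # BED is 0-based and half-open, so only the start shifts by one.
    return (chrom, start - 1, end)


# A RepeatMasker-style line in the same layout as the TE tracks
example = (
    "NC_007175.2\tRepeatMasker\tsimilarity\t262\t1389\t31.1\t+\t.\t"
    'Target "Motif:REP-6_LMi" 2920 4055'
)
bed_interval = gff_to_bed(example)
print(bed_interval)
```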

In [None]:
!curl https://gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2019-05-13-Yaamini-Virginica-Repository/analyses/2019-05-13-Generating-Genome-Feature-Tracks/C_virginica-3.0_Gnomon_exon_sorted_yrv.bed > C_virginica-3.0_Gnomon_exon_sorted_yrv.bed

In [None]:
!curl https://gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2019-05-13-Yaamini-Virginica-Repository/analyses/2019-05-13-Generating-Genome-Feature-Tracks/C_virginica-3.0_Gnomon_intron_yrv.bed > C_virginica-3.0_Gnomon_intron_yrv.bed

In [None]:
!curl https://gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2019-05-13-Yaamini-Virginica-Repository/analyses/2019-05-13-Generating-Genome-Feature-Tracks/C_virginica-3.0_Gnomon_gene_sorted_yrv.bed > C_virginica-3.0_Gnomon_gene_sorted_yrv.bed

In [9]:
!curl http://owl.fish.washington.edu/halfshell/genomic-databank/C_virginica-3.0_TE-all.gff > C_virginica-3.0_TE-all.gff

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 63.0M  100 63.0M    0     0  45.2M      0  0:00:01  0:00:01 --:--:-- 45.3M


In [10]:
!curl http://owl.fish.washington.edu/halfshell/genomic-databank/C_virginica-3.0_TE-Cg.gff > C_virginica-3.0_TE-Cg.gff

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 57.4M  100 57.4M    0     0  47.4M      0  0:00:01  0:00:01 --:--:-- 47.5M


In [None]:
!curl https://gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2019-05-13-Yaamini-Virginica-Repository/analyses/2019-05-13-Generating-Genome-Feature-Tracks/C_virginica-3.0_Gnomon_mRNA_yrv.bed >  C_virginica-3.0_Gnomon_mRNA_yrv.bed

In [104]:
!ls C_virginica*

C_virginica-3.0_CG-motif.bed
C_virginica-3.0_CG-motif.bed.idx
C_virginica-3.0_Gnomon_exon_sorted_yrv.bed
C_virginica-3.0_Gnomon_gene_sorted_yrv.bed
C_virginica-3.0_Gnomon_intron_yrv.bed
C_virginica-3.0_Gnomon_mRNA_yrv.bed
C_virginica-3.0_TE-Cg.gff
C_virginica-3.0_TE-all.gff


## 1. Locate Relevant Files and Set Variable Path Names

### 1a. Set Variable Path Names

Setting the variable path names allows me to reuse this script with different input files or different paths to programs without manually changing the file names each time.

In [4]:
bedtoolsDirectory = "/Users/Shared/bioinformatics/bedtools2/bin/"

In [5]:
DMLlist = "../../analyses/2018-10-25-MethylKit/2019-04-05-DML-Destrand-5x-Locations.bed"

In [6]:
hyperDML = "../../analyses/2018-10-25-MethylKit/2019-04-05-DML-Destrand-5x-Locations-Hypermethylated.bed"

In [7]:
hypoDML = "../../analyses/2018-10-25-MethylKit/2019-04-05-DML-Destrand-5x-Locations-Hypomethylated.bed"

In [5]:
DMLBackground = "../2018-10-25-MethylKit/2019-05-14-Methylation-Information-Filtered-Destrand-Cov5.bed"

In [8]:
DMRlist = "../../analyses/2018-10-25-MethylKit/2019-06-05-DMR-Locations.bed"

In [9]:
hyperDMR = "../../analyses/2018-10-25-MethylKit/2019-06-05-DMR-Destrand-5x-Locations-Tiles100-Hypermethylated.bed"

In [10]:
hypoDMR = "../../analyses/2018-10-25-MethylKit/2019-06-05-DMR-Destrand-5x-Locations-Tiles100-Hypomethylated.bed"

In [11]:
DMRBackground = "../../analyses/2018-10-25-MethylKit/2019-06-05-Methylation-Information-Filtered-Destrand-Cov5-Tiles100.bed"

In [10]:
exonList = "C_virginica-3.0_Gnomon_exon_sorted_yrv.bed"

In [11]:
intronList = "C_virginica-3.0_Gnomon_intron_yrv.bed"

In [12]:
geneList = "C_virginica-3.0_Gnomon_gene_sorted_yrv.bed"

In [13]:
transposableElementsAll = "C_virginica-3.0_TE-all.gff"

In [14]:
transposableElementsCg = "C_virginica-3.0_TE-Cg.gff"

In [15]:
CGMotifList = "C_virginica-3.0_CG-motif.bed"

In [6]:
mRNAList = "C_virginica-3.0_Gnomon_mRNA_yrv.bed"
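
Because every later cell depends on these variables, it can help to confirm that each path resolves before running any intersections. This is a minimal sketch, assuming the paths are relative to the current working directory; `missing_paths` and `paths_to_check` are hypothetical names, and the dictionary lists only a subset of the variables.

```python
from pathlib import Path


def missing_paths(paths):
    """Return the names from a {name: path} mapping whose path does not exist."""
    return [name for name, path in paths.items() if not Path(path).exists()]


# Hypothetical subset of the path variables; extend with the rest as needed.
paths_to_check = {
    "exonList": "C_virginica-3.0_Gnomon_exon_sorted_yrv.bed",
    "intronList": "C_virginica-3.0_Gnomon_intron_yrv.bed",
    "geneList": "C_virginica-3.0_Gnomon_gene_sorted_yrv.bed",
}
print(missing_paths(paths_to_check))
```

An empty list means every listed file was found.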

### 1b. Confirm Variable Path Works and Characterize Files

The BED files with DML and DMR can be viewed below. Columns are the chromosome, start position, end position, and percent methylation difference with direction; `DMRlist` additionally carries a `DMR` label in its fourth column. The files only include DML and DMR with a methylation difference of at least 50% between the two treatments (control and elevated pCO<sub>2</sub>).
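
To make the column layout concrete, here is a minimal sketch that reads lines in the four-column, tab-separated layout of the DML file and labels each locus by the sign of its methylation difference. `parse_dml` and `direction` are hypothetical helpers; the hypermethylated and hypomethylated files used in this notebook were produced upstream, not by this code.

```python
def parse_dml(line):
    """Split a four-column DML BED line into typed fields."""
    chrom, start, end, diff = line.rstrip("\n").split("\t")
    return chrom, int(start), int(end), float(diff)


def direction(diff):
    """Label a locus by the sign of its methylation difference."""
    return "hyper" if diff > 0 else "hypo"


# Three lines in the same layout as the DML BED file
sample_lines = [
    "NC_035780.1\t571138\t571140\t58",
    "NC_035780.1\t2538924\t2538926\t-50",
    "NC_035780.1\t2541726\t2541728\t-54",
]
labels = [direction(parse_dml(line)[3]) for line in sample_lines]
print(labels)
```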

In [14]:
#Previewing the files
!head {DMLlist}

NC_035780.1	571138	571140	58
NC_035780.1	1882691	1882693	64
NC_035780.1	1885022	1885024	61
NC_035780.1	1933499	1933501	51
NC_035780.1	1958998	1959000	50
NC_035780.1	2538924	2538926	-50
NC_035780.1	2541726	2541728	-54
NC_035780.1	2584492	2584494	56
NC_035780.1	2586508	2586510	-53
NC_035780.1	2588794	2588796	-53


In [13]:
#Counting the number of lines to count DML
!wc -l {DMLlist}

     598 ../../analyses/2018-10-25-MethylKit/2019-04-05-DML-Destrand-5x-Locations.bed


In [16]:
!wc -l {hyperDML}

     310 ../../analyses/2018-10-25-MethylKit/2019-04-05-DML-Destrand-5x-Locations-Hypermethylated.bed


In [18]:
!head {hyperDML}

NC_035780.1	401630	401632	53
NC_035780.1	571138	571140	58
NC_035780.1	1882691	1882693	64
NC_035780.1	1885022	1885024	61
NC_035780.1	1933499	1933501	51
NC_035780.1	2584492	2584494	56
NC_035780.1	2589720	2589722	57
NC_035780.1	4286286	4286288	67
NC_035780.1	8833124	8833126	60
NC_035780.1	12631453	12631455	60


In [17]:
!wc -l {hypoDML}

     288 ../../analyses/2018-10-25-MethylKit/2019-04-05-DML-Destrand-5x-Locations-Hypomethylated.bed


In [19]:
!head {hypoDML}

NC_035780.1	2538924	2538926	-50
NC_035780.1	2541726	2541728	-54
NC_035780.1	2586508	2586510	-53
NC_035780.1	4286802	4286804	-62
NC_035780.1	4288213	4288215	-58
NC_035780.1	4289628	4289630	-52
NC_035780.1	8693287	8693289	-52
NC_035780.1	9110274	9110276	-63
NC_035780.1	17093218	17093220	-52
NC_035780.1	17488958	17488960	-57


In [15]:
!head {DMRlist}

NC_035780.1	571100	571200	DMR	58
NC_035780.1	1885000	1885100	DMR	50
NC_035780.1	1933500	1933600	DMR	53
NC_035780.1	2538900	2539000	DMR	-50
NC_035780.1	22276700	22276800	DMR	56
NC_035780.1	28563400	28563500	DMR	61
NC_035780.1	31302900	31303000	DMR	-60
NC_035780.1	35969100	35969200	DMR	-53
NC_035780.1	38236400	38236500	DMR	50
NC_035781.1	5386400	5386500	DMR	51


In [16]:
!wc -l {DMRlist}

      71 ../../analyses/2018-10-25-MethylKit/2019-06-05-DMR-Locations.bed


In [17]:
!head {hyperDMR}

NC_035780.1	571100	571201	58
NC_035780.1	1885000	1885101	50
NC_035780.1	1933500	1933601	53
NC_035780.1	22276700	22276801	56
NC_035780.1	28563400	28563501	61
NC_035780.1	38236400	38236501	50
NC_035781.1	5386400	5386501	51
NC_035781.1	24474500	24474601	53
NC_035781.1	43942600	43942701	52
NC_035781.1	45110100	45110201	71


In [18]:
!wc -l {hyperDMR}

      37 ../../analyses/2018-10-25-MethylKit/2019-06-05-DMR-Destrand-5x-Locations-Tiles100-Hypermethylated.bed


In [19]:
!head {hypoDMR}

NC_035780.1	2538900	2539001	-50
NC_035780.1	31302900	31303001	-60
NC_035780.1	35969100	35969201	-53
NC_035781.1	7626500	7626601	-56
NC_035781.1	13281000	13281101	-57
NC_035781.1	20126000	20126101	-52
NC_035781.1	30789600	30789701	-57
NC_035781.1	43054100	43054201	-60
NC_035781.1	45110200	45110301	-51
NC_035781.1	59605700	59605801	-54


In [20]:
!wc -l {hypoDMR}

      34 ../../analyses/2018-10-25-MethylKit/2019-06-05-DMR-Destrand-5x-Locations-Tiles100-Hypomethylated.bed


In [13]:
!head {DMRBackground}

NC_007175.2	101	200	*
NC_007175.2	601	700	*
NC_007175.2	1501	1600	*
NC_007175.2	2201	2300	*
NC_007175.2	3301	3400	*
NC_007175.2	4801	4900	*
NC_007175.2	5301	5400	*
NC_007175.2	5401	5500	*
NC_007175.2	5501	5600	*
NC_007175.2	6001	6100	*


In [14]:
!wc -l {DMRBackground}

  152226 ../../analyses/2018-10-25-MethylKit/2019-06-05-Methylation-Information-Filtered-Destrand-Cov5-Tiles100.bed


In [17]:
!head {exonList}

NC_035780.1	13578	13603
NC_035780.1	14237	14290
NC_035780.1	14557	14594
NC_035780.1	28961	29073
NC_035780.1	30524	31557
NC_035780.1	31736	31887
NC_035780.1	31977	32565
NC_035780.1	32959	33324
NC_035780.1	43111	44358
NC_035780.1	43111	44358


In [18]:
!wc -l {exonList}

  731279 C_virginica-3.0_Gnomon_exon_sorted_yrv.bed


In [21]:
!head {intronList}

NC_035780.1	13603	14236
NC_035780.1	14290	14556
NC_035780.1	29073	30523
NC_035780.1	31557	31735
NC_035780.1	31887	31976
NC_035780.1	32565	32958
NC_035780.1	44358	45912
NC_035780.1	46506	64122
NC_035780.1	64334	66868
NC_035780.1	85777	88422


In [22]:
!wc -l {intronList}

  316614 C_virginica-3.0_Gnomon_intron_yrv.bed


In [24]:
!head {geneList}

NC_035780.1	13578	14594
NC_035780.1	28961	33324
NC_035780.1	43111	66897
NC_035780.1	85606	95254
NC_035780.1	99840	106460
NC_035780.1	108305	110077
NC_035780.1	151859	157536
NC_035780.1	163809	183798
NC_035780.1	164820	166793
NC_035780.1	169468	170178


In [25]:
!wc -l {geneList}

   38929 C_virginica-3.0_Gnomon_gene_sorted_yrv.bed


In [27]:
!head {transposableElementsAll}

##gff-version 2
##date 2018-08-23
##sequence-region Cvirginica_v300.fa
NC_007175.2	RepeatMasker	similarity	262	1389	31.1	+	.	Target "Motif:REP-6_LMi" 2920 4055
NC_007175.2	RepeatMasker	similarity	1728	1947	26.1	-	.	Target "Motif:REP-6_LMi" 14320 14534
NC_007175.2	RepeatMasker	similarity	1866	2013	33.6	+	.	Target "Motif:LSU-rRNA_Cel" 2372 2520
NC_007175.2	RepeatMasker	similarity	2129	2367	20.5	-	.	Target "Motif:REP-6_LMi" 13886 14118
NC_007175.2	RepeatMasker	similarity	2836	2980	31.5	+	.	Target "Motif:REP-6_LMi" 6216 6359
NC_007175.2	RepeatMasker	similarity	3196	3277	30.5	+	.	Target "Motif:REP-6_LMi" 6572 6653
NC_007175.2	RepeatMasker	similarity	5168	5532	32.9	+	.	Target "Motif:REP-6_LMi" 4620 4983


In [29]:
!wc -l {transposableElementsAll}

  692371 C_virginica-3.0_TE-all.gff


In [30]:
!head {transposableElementsCg}

##gff-version 2
##date 2018-08-27
##sequence-region Cvirginica_v300.fa
NC_007175.2	RepeatMasker	similarity	1866	2013	33.6	+	.	Target "Motif:LSU-rRNA_Cel" 2372 2520
NC_007175.2	RepeatMasker	similarity	6529	6628	19.0	+	.	Target "Motif:(TA)n" 2 102
NC_035780.1	RepeatMasker	similarity	1473	1535	 0.0	+	.	Target "Motif:(TAACCC)n" 1 63
NC_035780.1	RepeatMasker	similarity	5080	7289	32.5	-	.	Target "Motif:Gypsy-62_CGi-I" 2102 4631
NC_035780.1	RepeatMasker	similarity	7423	7489	25.4	-	.	Target "Motif:Gypsy-62_CGi-I" 2097 2163
NC_035780.1	RepeatMasker	similarity	7623	8079	34.1	-	.	Target "Motif:Gypsy-62_CGi-I" 1516 1975
NC_035780.1	RepeatMasker	similarity	8261	8295	14.1	+	.	Target "Motif:(CTCCT)n" 1 33


In [31]:
!wc -l {transposableElementsCg}

  626665 C_virginica-3.0_TE-Cg.gff


In [25]:
!head {CGMotifList}

NC_035780.1	28	30	CG_motif
NC_035780.1	54	56	CG_motif
NC_035780.1	75	77	CG_motif
NC_035780.1	93	95	CG_motif
NC_035780.1	103	105	CG_motif
NC_035780.1	116	118	CG_motif
NC_035780.1	134	136	CG_motif
NC_035780.1	159	161	CG_motif
NC_035780.1	209	211	CG_motif
NC_035780.1	224	226	CG_motif


In [26]:
!wc -l {CGMotifList}

 14458703 C_virginica-3.0_CG-motif.bed


In [107]:
!head {mRNAList}

NC_035780.1	28961	33324
NC_035780.1	43111	66897
NC_035780.1	43111	46506
NC_035780.1	85606	95254
NC_035780.1	99840	106460
NC_035780.1	108305	110077
NC_035780.1	151859	157536
NC_035780.1	163809	183798
NC_035780.1	164820	166793
NC_035780.1	190449	193594


In [109]:
!wc -l {mRNAList}

   60201 C_virginica-3.0_Gnomon_mRNA_yrv.bed


## 2. Identify DML and DMR Overlaps with Genomic Feature Tracks

To identify the location of DML and DMR in the *C. virginica* genome, I will use `intersect` from `bedtools`. [The BEDtools suite](http://bedtools.readthedocs.io/en/latest/content/bedtools-suite.html) allows me to easily find overlapping regions of different BED files. For each track, I first count the DML or DMR that have at least one overlap using `-u`, then save the overlapping entries with `-wb`.
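
As a rough illustration of what counting overlaps means, the sketch below checks half-open intervals for shared bases and counts each A feature once if it overlaps any B feature, which is the behaviour I rely on from `-u`. It is a brute-force toy with made-up intervals and hypothetical function names; `bedtools` itself is far more efficient and is what this notebook actually runs.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def count_with_overlap(a_features, b_features):
    """Count A features that overlap any B feature (each A counted once)."""
    return sum(any(overlaps(a, b) for b in b_features) for a in a_features)


# Toy intervals, not real data
loci = [("chr1", 100, 102), ("chr1", 500, 502), ("chr2", 100, 102)]
exons = [("chr1", 90, 150), ("chr1", 95, 120), ("chr2", 300, 400)]
print(count_with_overlap(loci, exons))
```

The first locus overlaps two exons but is counted only once, which is why the `-u` counts can be smaller than the number of lines written with `-wb`.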

In [36]:
! {bedtoolsDirectory}intersectBed -h


Tool:    bedtools intersect (aka intersectBed)
Version: v2.26.0
Summary: Report overlaps between two feature files.

Usage:   bedtools intersect [OPTIONS] -a <bed/gff/vcf/bam> -b <bed/gff/vcf/bam>

	Note: -b may be followed with multiple databases and/or 
	wildcard (*) character(s). 
Options: 
	-wa	Write the original entry in A for each overlap.

	-wb	Write the original entry in B for each overlap.
		- Useful for knowing _what_ A overlaps. Restricted by -f and -r.

	-loj	Perform a "left outer join". That is, for each feature in A
		report each overlap with B.  If no overlaps are found, 
		report a NULL feature for B.

	-wo	Write the original A and B entries plus the number of base
		pairs of overlap between the two features.
		- Overlaps restricted by -f and -r.
		  Only A features with overlap are reported.

	-wao	Write the original A and B entries plus the number of base
		pairs of overlap between the two features.
		- Overlapping features restricted by -f 

### 2a. Exons

#### All DML

In [26]:
! {bedtoolsDirectory}intersectBed \
-u \
-a {DMLlist} \
-b {exonList} \
| wc -l
!echo "DML overlaps with exons"

     368
DML overlaps with exons


In [27]:
! {bedtoolsDirectory}intersectBed \
-wb \
-a {DMLlist} \
-b {exonList} \
> 2019-05-29-DML-Exon.txt

In [28]:
!head 2019-05-29-DML-Exon.txt

NC_035780.1	571138	571140	58	NC_035780.1	570942	571194
NC_035780.1	2538924	2538926	-50	NC_035780.1	2538624	2538955
NC_035780.1	2586508	2586510	-53	NC_035780.1	2586438	2586557
NC_035780.1	2589720	2589722	57	NC_035780.1	2589716	2589955
NC_035780.1	4286286	4286288	67	NC_035780.1	4286174	4286407
NC_035780.1	4286802	4286804	-62	NC_035780.1	4286783	4286927
NC_035780.1	4289628	4289630	-52	NC_035780.1	4288592	4290756
NC_035780.1	8693287	8693289	-52	NC_035780.1	8692509	8693320
NC_035780.1	9110274	9110276	-63	NC_035780.1	9109982	9111843
NC_035780.1	12631453	12631455	60	NC_035780.1	12630576	12631487


#### Hypermethylated DML

In [29]:
! {bedtoolsDirectory}intersectBed \
-u \
-a {hyperDML} \
-b {exonList} \
| wc -l
!echo "hypermethylated DML overlaps with exons"

     190
hypermethylated DML overlaps with exons


In [30]:
! {bedtoolsDirectory}intersectBed \
-wb \
-a {hyperDML} \
-b {exonList} \
> 2019-05-29-Hypermethylated-DML-Exon.txt

In [31]:
!head 2019-05-29-Hypermethylated-DML-Exon.txt

NC_035780.1	571138	571140	58	NC_035780.1	570942	571194
NC_035780.1	2589720	2589722	57	NC_035780.1	2589716	2589955
NC_035780.1	4286286	4286288	67	NC_035780.1	4286174	4286407
NC_035780.1	12631453	12631455	60	NC_035780.1	12630576	12631487
NC_035780.1	12631453	12631455	60	NC_035780.1	12630576	12631487
NC_035780.1	12631453	12631455	60	NC_035780.1	12630577	12631487
NC_035780.1	12631453	12631455	60	NC_035780.1	12630577	12631487
NC_035780.1	15412264	15412266	50	NC_035780.1	15412219	15412410
NC_035780.1	15412264	15412266	50	NC_035780.1	15412219	15412410
NC_035780.1	15414935	15414936	51	NC_035780.1	15414935	15415225


#### Hypomethylated DML

In [32]:
! {bedtoolsDirectory}intersectBed \
-u \
-a {hypoDML} \
-b {exonList} \
| wc -l
!echo "hypomethylated DML overlaps with exons"

     178
hypomethylated DML overlaps with exons


In [33]:
! {bedtoolsDirectory}intersectBed \
-wb \
-a {hypoDML} \
-b {exonList} \
> 2019-05-29-Hypomethylated-DML-Exon.txt

In [34]:
!head 2019-05-29-Hypomethylated-DML-Exon.txt

NC_035780.1	2538924	2538926	-50	NC_035780.1	2538624	2538955
NC_035780.1	2586508	2586510	-53	NC_035780.1	2586438	2586557
NC_035780.1	4286802	4286804	-62	NC_035780.1	4286783	4286927
NC_035780.1	4289628	4289630	-52	NC_035780.1	4288592	4290756
NC_035780.1	8693287	8693289	-52	NC_035780.1	8692509	8693320
NC_035780.1	9110274	9110276	-63	NC_035780.1	9109982	9111843
NC_035780.1	17093218	17093220	-52	NC_035780.1	17092983	17093548
NC_035780.1	19149580	19149582	-61	NC_035780.1	19149513	19149749
NC_035780.1	19149580	19149582	-61	NC_035780.1	19149513	19149749
NC_035780.1	19149580	19149582	-61	NC_035780.1	19149513	19150486


#### All DMR

In [21]:
! {bedtoolsDirectory}intersectBed \
-u \
-a {DMRlist} \
-b {exonList} \
| wc -l
!echo "DMR overlaps with exons"

      38
DMR overlaps with exons


In [22]:
! {bedtoolsDirectory}intersectBed \
-wb \
-a {DMRlist} \
-b {exonList} \
> 2019-06-05-DMR-Exon.txt

In [23]:
!head 2019-06-05-DMR-Exon.txt

NC_035780.1	571100	571194	DMR	58	NC_035780.1	570942	571194
NC_035780.1	1933574	1933600	DMR	53	NC_035780.1	1933574	1933615
NC_035780.1	2538900	2538955	DMR	-50	NC_035780.1	2538624	2538955
NC_035780.1	22276700	22276800	DMR	56	NC_035780.1	22275427	22278631
NC_035780.1	22276700	22276800	DMR	56	NC_035780.1	22275427	22278631
NC_035781.1	5386400	5386493	DMR	51	NC_035781.1	5386310	5386493
NC_035781.1	5386400	5386493	DMR	51	NC_035781.1	5386310	5386493
NC_035781.1	7626500	7626546	DMR	-56	NC_035781.1	7626417	7626546
NC_035781.1	13281000	13281010	DMR	-57	NC_035781.1	13280898	13281010
NC_035781.1	20126000	20126100	DMR	-52	NC_035781.1	20125936	20126403


#### Hypermethylated DMR

In [24]:
! {bedtoolsDirectory}intersectBed \
-u \
-a {hyperDMR} \
-b {exonList} \
| wc -l
!echo "hyper DMR overlaps with exons"

      19
hyper DMR overlaps with exons


In [25]:
! {bedtoolsDirectory}intersectBed \
-wb \
-a {hyperDMR} \
-b {exonList} \
> 2019-06-05-HyperDMR-Exon.txt

In [26]:
!head 2019-06-05-HyperDMR-Exon.txt

NC_035780.1	571100	571194	58	NC_035780.1	570942	571194
NC_035780.1	1933574	1933601	53	NC_035780.1	1933574	1933615
NC_035780.1	22276700	22276801	56	NC_035780.1	22275427	22278631
NC_035780.1	22276700	22276801	56	NC_035780.1	22275427	22278631
NC_035781.1	5386400	5386493	51	NC_035781.1	5386310	5386493
NC_035781.1	5386400	5386493	51	NC_035781.1	5386310	5386493
NC_035781.1	24474587	24474601	53	NC_035781.1	24474587	24474788
NC_035781.1	45110100	45110157	71	NC_035781.1	45109859	45110157
NC_035783.1	19526000	19526101	50	NC_035783.1	19525785	19526784
NC_035783.1	19526000	19526101	50	NC_035783.1	19525785	19526784


#### Hypomethylated DMR

In [27]:
! {bedtoolsDirectory}intersectBed \
-u \
-a {hypoDMR} \
-b {exonList} \
| wc -l
!echo "hypo DMR overlaps with exons"

      19
hypo DMR overlaps with exons


In [28]:
! {bedtoolsDirectory}intersectBed \
-wb \
-a {hypoDMR} \
-b {exonList} \
> 2019-06-05-HypoDMR-Exon.txt

In [29]:
!head 2019-06-05-HypoDMR-Exon.txt

NC_035780.1	2538900	2538955	-50	NC_035780.1	2538624	2538955
NC_035781.1	7626500	7626546	-56	NC_035781.1	7626417	7626546
NC_035781.1	13281000	13281010	-57	NC_035781.1	13280898	13281010
NC_035781.1	20126000	20126101	-52	NC_035781.1	20125936	20126403
NC_035781.1	30789600	30789701	-57	NC_035781.1	30789514	30790310
NC_035781.1	43054149	43054201	-60	NC_035781.1	43054149	43054333
NC_035783.1	38039050	38039101	-51	NC_035783.1	38039050	38039211
NC_035783.1	38039050	38039101	-51	NC_035783.1	38039050	38039211
NC_035783.1	38039050	38039101	-51	NC_035783.1	38039050	38039211
NC_035783.1	58980000	58980101	-61	NC_035783.1	58979191	58980564


#### DMR Background

In [15]:
! {bedtoolsDirectory}intersectBed \
-u \
-a {DMRBackground} \
-b {exonList} \
| wc -l
!echo "DMR background overlaps with exons"

   92552
DMR background overlaps with exons


In [16]:
! {bedtoolsDirectory}intersectBed \
-wb \
-a {DMRBackground} \
-b {exonList} \
> 2019-06-05-DMRBackground-Exon.txt

In [17]:
!head 2019-06-05-DMRBackground-Exon.txt

NC_035780.1	100554	100600	*	NC_035780.1	100554	100661
NC_035780.1	100601	100661	*	NC_035780.1	100554	100661
NC_035780.1	250301	250400	*	NC_035780.1	250285	250608
NC_035780.1	250401	250500	*	NC_035780.1	250285	250608
NC_035780.1	250501	250600	*	NC_035780.1	250285	250608
NC_035780.1	250601	250608	*	NC_035780.1	250285	250608
NC_035780.1	258108	258200	*	NC_035780.1	258108	259494
NC_035780.1	258201	258300	*	NC_035780.1	258108	259494
NC_035780.1	258301	258400	*	NC_035780.1	258108	259494
NC_035780.1	258901	259000	*	NC_035780.1	258108	259494
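
One way the background counts might be used is to compare the fraction of DMR that overlap exons with the fraction of all background tiles that do. The sketch below uses the line counts reported in this notebook (38 of 71 DMR, 92552 of 152226 background tiles); `proportion` is a hypothetical helper, and this is plain arithmetic rather than a statistical test.

```python
def proportion(overlapping, total):
    """Fraction of features that overlap a track."""
    return overlapping / total


# Counts taken from the wc -l outputs in this notebook
dmr_in_exons = proportion(38, 71)
background_in_exons = proportion(92552, 152226)
print(f"DMR: {dmr_in_exons:.3f}, background: {background_in_exons:.3f}")
```

Whether any difference between the two fractions is meaningful would need a formal test, which is not shown here.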


### 2b. Introns

#### DML

In [36]:
! {bedtoolsDirectory}intersectBed \
-u \
-a {DMLlist} \
-b {intronList} \
| wc -l
!echo "DML overlaps with introns"

     192
DML overlaps with introns


In [37]:
! {bedtoolsDirectory}intersectBed \
-wb \
-a {DMLlist} \
-b {intronList} \
> 2019-06-05-DML-Intron.txt

In [38]:
!head 2019-06-05-DML-Intron.txt

NC_035780.1	401630	401632	53	NC_035780.1	401604	401800
NC_035780.1	1882691	1882693	64	NC_035780.1	1882355	1882971
NC_035780.1	1885022	1885024	61	NC_035780.1	1884754	1886042
NC_035780.1	1933499	1933501	51	NC_035780.1	1932876	1933573
NC_035780.1	2541726	2541728	-54	NC_035780.1	2538955	2541768
NC_035780.1	2584492	2584494	56	NC_035780.1	2584153	2584504
NC_035780.1	4288213	4288215	-58	NC_035780.1	4288128	4288230
NC_035780.1	8833124	8833126	60	NC_035780.1	8832171	8833699
NC_035780.1	17488958	17488960	-57	NC_035780.1	17488942	17489178
NC_035780.1	22177828	22177830	-51	NC_035780.1	22154686	22178240


#### Hypermethylated DML

In [39]:
! {bedtoolsDirectory}intersectBed \
-u \
-a {hyperDML} \
-b {intronList} \
| wc -l
!echo "hypermethylated DML overlaps with introns"

      99
hypermethylated DML overlaps with introns


In [40]:
! {bedtoolsDirectory}intersectBed \
-wb \
-a {hyperDML} \
-b {intronList} \
> 2019-06-05-Hypermethylated-DML-Intron.txt

In [41]:
!head 2019-06-05-Hypermethylated-DML-Intron.txt

NC_035780.1	401630	401632	53	NC_035780.1	401604	401800
NC_035780.1	1882691	1882693	64	NC_035780.1	1882355	1882971
NC_035780.1	1885022	1885024	61	NC_035780.1	1884754	1886042
NC_035780.1	1933499	1933501	51	NC_035780.1	1932876	1933573
NC_035780.1	2584492	2584494	56	NC_035780.1	2584153	2584504
NC_035780.1	8833124	8833126	60	NC_035780.1	8832171	8833699
NC_035780.1	27396182	27396184	52	NC_035780.1	27396140	27396706
NC_035780.1	32766797	32766799	58	NC_035780.1	32766346	32769863
NC_035780.1	32766797	32766799	58	NC_035780.1	32766346	32769863
NC_035780.1	38236493	38236495	50	NC_035780.1	38236121	38236506


#### Hypomethylated DML

In [42]:
! {bedtoolsDirectory}intersectBed \
-u \
-a {hypoDML} \
-b {intronList} \
| wc -l
!echo "hypomethylated DML overlaps with introns"

      93
hypomethylated DML overlaps with introns


In [43]:
! {bedtoolsDirectory}intersectBed \
-wb \
-a {hypoDML} \
-b {intronList} \
> 2019-05-29-Hypomethylated-DML-Intron.txt

In [44]:
!head 2019-05-29-Hypomethylated-DML-Intron.txt

NC_035780.1	2541726	2541728	-54	NC_035780.1	2538955	2541768
NC_035780.1	4288213	4288215	-58	NC_035780.1	4288128	4288230
NC_035780.1	17488958	17488960	-57	NC_035780.1	17488942	17489178
NC_035780.1	22177828	22177830	-51	NC_035780.1	22154686	22178240
NC_035780.1	22177828	22177830	-51	NC_035780.1	22154686	22178240
NC_035780.1	25858297	25858299	-51	NC_035780.1	25858281	25863048
NC_035780.1	31302904	31302906	-60	NC_035780.1	31302841	31303151
NC_035780.1	31302934	31302936	-58	NC_035780.1	31302841	31303151
NC_035780.1	32717030	32717032	-52	NC_035780.1	32716795	32717179
NC_035780.1	35969128	35969130	-53	NC_035780.1	35969070	35986498


#### DMR

In [30]:
! {bedtoolsDirectory}intersectBed \
-u \
-a {DMRlist} \
-b {intronList} \
| wc -l
!echo "DMR overlaps with introns"

      51
DMR overlaps with introns


In [31]:
! {bedtoolsDirectory}intersectBed \
-wb \
-a {DMRlist} \
-b {intronList} \
> 2019-06-05-DMR-Intron.txt

In [32]:
!head 2019-06-05-DMR-Intron.txt

NC_035780.1	571194	571200	DMR	58	NC_035780.1	571194	572676
NC_035780.1	1885000	1885100	DMR	50	NC_035780.1	1884754	1886042
NC_035780.1	1933500	1933573	DMR	53	NC_035780.1	1932876	1933573
NC_035780.1	2538955	2539000	DMR	-50	NC_035780.1	2538955	2541768
NC_035780.1	28563400	28563500	DMR	61	NC_035780.1	28563399	28564615
NC_035780.1	31302900	31303000	DMR	-60	NC_035780.1	31302841	31303151
NC_035780.1	35969100	35969200	DMR	-53	NC_035780.1	35969070	35986498
NC_035780.1	38236400	38236500	DMR	50	NC_035780.1	38236121	38236506
NC_035781.1	5386493	5386500	DMR	51	NC_035781.1	5386493	5386634
NC_035781.1	7626546	7626600	DMR	-56	NC_035781.1	7626546	7626816


#### Hypermethylated DMR

In [33]:
! {bedtoolsDirectory}intersectBed \
-u \
-a {hyperDMR} \
-b {intronList} \
| wc -l
!echo "hyperDMR overlaps with introns"

      27
hyperDMR overlaps with introns


In [34]:
! {bedtoolsDirectory}intersectBed \
-wb \
-a {hyperDMR} \
-b {intronList} \
> 2019-06-05-HyperDMR-Intron.txt

In [35]:
!head 2019-06-05-HyperDMR-Intron.txt

NC_035780.1	571194	571201	58	NC_035780.1	571194	572676
NC_035780.1	1885000	1885101	50	NC_035780.1	1884754	1886042
NC_035780.1	1933500	1933573	53	NC_035780.1	1932876	1933573
NC_035780.1	28563400	28563501	61	NC_035780.1	28563399	28564615
NC_035780.1	38236400	38236501	50	NC_035780.1	38236121	38236506
NC_035781.1	5386493	5386501	51	NC_035781.1	5386493	5386634
NC_035781.1	24474500	24474586	53	NC_035781.1	24473564	24474586
NC_035781.1	43942600	43942701	52	NC_035781.1	43940334	43944055
NC_035781.1	45110157	45110201	71	NC_035781.1	45110157	45110508
NC_035781.1	53358700	53358801	56	NC_035781.1	53358683	53358814


#### Hypomethylated DMR

In [36]:
! {bedtoolsDirectory}intersectBed \
-u \
-a {hypoDMR} \
-b {intronList} \
| wc -l
!echo "hypoDMR overlaps with introns"

      24
hypoDMR overlaps with introns


In [37]:
! {bedtoolsDirectory}intersectBed \
-wb \
-a {hypoDMR} \
-b {intronList} \
> 2019-06-05-HypoDMR-Intron.txt

In [38]:
!head 2019-06-05-HypoDMR-Intron.txt

NC_035780.1	2538955	2539001	-50	NC_035780.1	2538955	2541768
NC_035780.1	31302900	31303001	-60	NC_035780.1	31302841	31303151
NC_035780.1	35969100	35969201	-53	NC_035780.1	35969070	35986498
NC_035781.1	7626546	7626601	-56	NC_035781.1	7626546	7626816
NC_035781.1	7626546	7626601	-56	NC_035781.1	7626546	7626816
NC_035781.1	13281010	13281101	-57	NC_035781.1	13281010	13281749
NC_035781.1	43054100	43054148	-60	NC_035781.1	43054064	43054148
NC_035781.1	45110200	45110301	-51	NC_035781.1	45110157	45110508
NC_035781.1	59605700	59605801	-54	NC_035781.1	59605698	59605807
NC_035783.1	38039000	38039049	-51	NC_035783.1	38038913	38039049


#### DMR Background

In [18]:
! {bedtoolsDirectory}intersectBed \
-u \
-a {DMRBackground} \
-b {intronList} \
| wc -l
!echo "DMR background overlaps with introns"

   93707
DMR background overlaps with introns


In [19]:
! {bedtoolsDirectory}intersectBed \
-wb \
-a {DMRBackground} \
-b {intronList} \
> 2019-06-05-DMRBackground-Intron.txt

In [20]:
!head 2019-06-05-DMRBackground-Intron.txt

NC_035780.1	100501	100553	*	NC_035780.1	100122	100553
NC_035780.1	100661	100700	*	NC_035780.1	100661	104928
NC_035780.1	103201	103300	*	NC_035780.1	100661	104928
NC_035780.1	250608	250700	*	NC_035780.1	250608	252746
NC_035780.1	250701	250800	*	NC_035780.1	250608	252746
NC_035780.1	259494	259500	*	NC_035780.1	259494	261477
NC_035780.1	259801	259900	*	NC_035780.1	259494	261477
NC_035780.1	260001	260100	*	NC_035780.1	259494	261477
NC_035780.1	260101	260200	*	NC_035780.1	259494	261477
NC_035780.1	260401	260500	*	NC_035780.1	259494	261477


### 2c. Genes

#### DML

In [45]:
! {bedtoolsDirectory}intersectBed \
-u \
-a {DMLlist} \
-b {geneList} \
| wc -l
!echo "DML overlaps with genes"

     560
DML overlaps with genes


In [55]:
! {bedtoolsDirectory}intersectBed \
-wb \
-a {DMLlist} \
-b {geneList} \
> 2019-05-29-DML-Genes.txt

In [56]:
!head 2019-05-29-DML-Genes.txt

NC_035780.1	401630	401632	53	NC_035780.1	394983	409280
NC_035780.1	571138	571140	58	NC_035780.1	544088	573497
NC_035780.1	1882691	1882693	64	NC_035780.1	1882143	1890106
NC_035780.1	1885022	1885024	61	NC_035780.1	1882143	1890106
NC_035780.1	1933499	1933501	51	NC_035780.1	1928718	1940217
NC_035780.1	2538924	2538926	-50	NC_035780.1	2524425	2553408
NC_035780.1	2541726	2541728	-54	NC_035780.1	2524425	2553408
NC_035780.1	2584492	2584494	56	NC_035780.1	2554181	2599559
NC_035780.1	2586508	2586510	-53	NC_035780.1	2554181	2599559
NC_035780.1	2589720	2589722	57	NC_035780.1	2554181	2599559


I know how many DML overlap with genes, but I also want to know how many unique genes contain DML. For this, I will use the following code:

`cut -f7 2019-05-29-DML-Genes.txt | sort | uniq -c`

`cut` isolates a single column. I'm assuming each gene has a unique end position, so I'll look at distinct entries in the seventh column (`-f7`), which holds the gene end coordinate. The column is piped into `sort` so that identical values are adjacent, and `uniq -c` then collapses them into one line per distinct value with a count. Finally, I'll pipe this into `wc -l` to count the number of unique genes.
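
Counting by end position alone assumes no two genes, on any chromosome, share an end coordinate. A stricter alternative is to key on the gene's chromosome, start, and end together. The sketch below shows that idea on three lines in the same layout as the overlap file; `unique_genes` is a hypothetical helper, not the method used to produce the counts in this notebook.

```python
def unique_genes(overlap_lines, gene_columns=(4, 5, 6)):
    """Collect distinct genes from `intersect -wb` output lines.

    gene_columns are the 0-based positions of the gene's chromosome, start,
    and end; the default matches a four-column A file followed by a
    three-column gene BED file.
    """
    genes = set()
    for line in overlap_lines:
        fields = line.rstrip("\n").split("\t")
        genes.add(tuple(fields[i] for i in gene_columns))
    return genes


# Lines in the same layout as 2019-05-29-DML-Genes.txt
sample = [
    "NC_035780.1\t1882691\t1882693\t64\tNC_035780.1\t1882143\t1890106",
    "NC_035780.1\t1885022\t1885024\t61\tNC_035780.1\t1882143\t1890106",
    "NC_035780.1\t1933499\t1933501\t51\tNC_035780.1\t1928718\t1940217",
]
print(len(unique_genes(sample)))
```

For the five-column DMR file, the gene columns shift by one, so `gene_columns=(5, 6, 7)` would apply.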

In [57]:
! cut -f7 2019-05-29-DML-Genes.txt | sort | uniq -c | wc -l
!echo "unique genes overlapping with DML"

     481
unique genes overlapping with DML


#### Hypermethylated DML

In [58]:
! {bedtoolsDirectory}intersectBed \
-u \
-a {hyperDML} \
-b {geneList} \
| wc -l
!echo "hypermethylated DML overlaps with genes"

     289
hypermethylated DML overlaps with genes


In [59]:
! {bedtoolsDirectory}intersectBed \
-wb \
-a {hyperDML} \
-b {geneList} \
> 2019-05-29-Hypermethylated-DML-Genes.txt

In [60]:
!head 2019-05-29-Hypermethylated-DML-Genes.txt

NC_035780.1	401630	401632	53	NC_035780.1	394983	409280
NC_035780.1	571138	571140	58	NC_035780.1	544088	573497
NC_035780.1	1882691	1882693	64	NC_035780.1	1882143	1890106
NC_035780.1	1885022	1885024	61	NC_035780.1	1882143	1890106
NC_035780.1	1933499	1933501	51	NC_035780.1	1928718	1940217
NC_035780.1	2584492	2584494	56	NC_035780.1	2554181	2599559
NC_035780.1	2589720	2589722	57	NC_035780.1	2554181	2599559
NC_035780.1	4286286	4286288	67	NC_035780.1	4282771	4298209
NC_035780.1	8833124	8833126	60	NC_035780.1	8829533	8833841
NC_035780.1	12631453	12631455	60	NC_035780.1	12630576	12697104


In [63]:
! cut -f7 2019-05-29-Hypermethylated-DML-Genes.txt | sort | uniq -c | wc -l
!echo "unique genes overlapping with hypermethylated DML"

     269
unique genes overlapping with hypermethylated DML


#### Hypomethylated DML

In [64]:
! {bedtoolsDirectory}intersectBed \
-u \
-a {hypoDML} \
-b {geneList} \
| wc -l
!echo "hypomethylated DML overlaps with genes"

     271
hypomethylated DML overlaps with genes


In [65]:
! {bedtoolsDirectory}intersectBed \
-wb \
-a {hypoDML} \
-b {geneList} \
> 2019-05-29-Hypomethylated-DML-mRNA.txt

In [66]:
!head 2019-05-29-Hypomethylated-DML-mRNA.txt

NC_035780.1	2538924	2538926	-50	NC_035780.1	2524425	2553408
NC_035780.1	2541726	2541728	-54	NC_035780.1	2524425	2553408
NC_035780.1	2586508	2586510	-53	NC_035780.1	2554181	2599559
NC_035780.1	4286802	4286804	-62	NC_035780.1	4282771	4298209
NC_035780.1	4288213	4288215	-58	NC_035780.1	4282771	4298209
NC_035780.1	4289628	4289630	-52	NC_035780.1	4282771	4298209
NC_035780.1	8693287	8693289	-52	NC_035780.1	8692509	8698183
NC_035780.1	9110274	9110276	-63	NC_035780.1	9103662	9111843
NC_035780.1	17093218	17093220	-52	NC_035780.1	17089706	17093548
NC_035780.1	17488958	17488960	-57	NC_035780.1	17457431	17541765


In [67]:
! cut -f7 2019-05-29-Hypomethylated-DML-mRNA.txt | sort | uniq -c | wc -l
!echo "unique genes overlapping with hypomethylated DML"

     241
unique genes overlapping with hypomethylated DML


#### DMR

In [40]:
! {bedtoolsDirectory}intersectBed \
-u \
-a {DMRlist} \
-b {geneList} \
| wc -l
!echo "DMR overlaps with genes"

      66
DMR overlaps with genes


In [41]:
! {bedtoolsDirectory}intersectBed \
-wb \
-a {DMRlist} \
-b {geneList} \
> 2019-06-05-DMR-Genes.txt

In [42]:
!head 2019-06-05-DMR-Genes.txt

NC_035780.1	571100	571200	DMR	58	NC_035780.1	544088	573497
NC_035780.1	1885000	1885100	DMR	50	NC_035780.1	1882143	1890106
NC_035780.1	1933500	1933600	DMR	53	NC_035780.1	1928718	1940217
NC_035780.1	2538900	2539000	DMR	-50	NC_035780.1	2524425	2553408
NC_035780.1	22276700	22276800	DMR	56	NC_035780.1	22269635	22278631
NC_035780.1	28563400	28563500	DMR	61	NC_035780.1	28552157	28576101
NC_035780.1	31302900	31303000	DMR	-60	NC_035780.1	31295876	31307973
NC_035780.1	35969100	35969200	DMR	-53	NC_035780.1	35960923	35999467
NC_035780.1	38236400	38236500	DMR	50	NC_035780.1	38209799	38243110
NC_035781.1	5386400	5386500	DMR	51	NC_035781.1	5383711	5397505


In [44]:
! cut -f8 2019-06-05-DMR-Genes.txt | sort | uniq -c | wc -l
!echo "DMR overlaps with unique genes"

      65
DMR overlaps with unique genes


#### Hypermethylated DMR

In [45]:
! {bedtoolsDirectory}intersectBed \
-u \
-a {hyperDMR} \
-b {geneList} \
| wc -l
!echo "hyperDMR overlaps with genes"

      33
hyperDMR overlaps with genes


In [46]:
! {bedtoolsDirectory}intersectBed \
-wb \
-a {hyperDMR} \
-b {geneList} \
> 2019-06-05-HyperDMR-Genes.txt

In [49]:
!head 2019-06-05-HyperDMR-Genes.txt

NC_035780.1	571100	571201	58	NC_035780.1	544088	573497
NC_035780.1	1885000	1885101	50	NC_035780.1	1882143	1890106
NC_035780.1	1933500	1933601	53	NC_035780.1	1928718	1940217
NC_035780.1	22276700	22276801	56	NC_035780.1	22269635	22278631
NC_035780.1	28563400	28563501	61	NC_035780.1	28552157	28576101
NC_035780.1	38236400	38236501	50	NC_035780.1	38209799	38243110
NC_035781.1	5386400	5386501	51	NC_035781.1	5383711	5397505
NC_035781.1	24474500	24474601	53	NC_035781.1	24468785	24491957
NC_035781.1	43942600	43942701	52	NC_035781.1	43936785	43944143
NC_035781.1	45110100	45110201	71	NC_035781.1	45108521	45113815


In [50]:
! cut -f7 2019-06-05-HyperDMR-Genes.txt | sort | uniq -c | wc -l
!echo "hyperDMR overlaps with unique genes"

      33
hyperDMR overlaps with unique genes


#### Hypomethylated DMR

In [51]:
! {bedtoolsDirectory}intersectBed \
-u \
-a {hypoDMR} \
-b {geneList} \
| wc -l
!echo "hypoDMR overlaps with genes"

      33
hypoDMR overlaps with genes


In [52]:
! {bedtoolsDirectory}intersectBed \
-wb \
-a {hypoDMR} \
-b {geneList} \
> 2019-06-05-HypoDMR-Genes.txt

In [53]:
!head 2019-06-05-HypoDMR-Genes.txt

NC_035780.1	2538900	2539001	-50	NC_035780.1	2524425	2553408
NC_035780.1	31302900	31303001	-60	NC_035780.1	31295876	31307973
NC_035780.1	35969100	35969201	-53	NC_035780.1	35960923	35999467
NC_035781.1	7626500	7626601	-56	NC_035781.1	7589782	7641768
NC_035781.1	7626500	7626601	-56	NC_035781.1	7590843	7626904
NC_035781.1	13281000	13281101	-57	NC_035781.1	13268605	13286933
NC_035781.1	20126000	20126101	-52	NC_035781.1	20125441	20128033
NC_035781.1	30789600	30789701	-57	NC_035781.1	30781806	30790310
NC_035781.1	43054100	43054201	-60	NC_035781.1	43048000	43060502
NC_035781.1	45110200	45110301	-51	NC_035781.1	45108521	45113815


In [54]:
! cut -f7 2019-06-05-HypoDMR-Genes.txt | sort | uniq -c | wc -l
!echo "hypoDMR overlaps with unique genes"

      33
hypoDMR overlaps with unique genes


#### DMR Background

In [21]:
! {bedtoolsDirectory}intersectBed \
-u \
-a {DMRBackground} \
-b {geneList} \
| wc -l
!echo "DMR background overlaps with genes"

  142153
DMR background overlaps with genes


In [22]:
! {bedtoolsDirectory}intersectBed \
-wb \
-a {DMRBackground} \
-b {geneList} \
> 2019-06-05-DMRBackground-Genes.txt

In [23]:
!head 2019-06-05-DMRBackground-Genes.txt

NC_035780.1	100501	100600	*	NC_035780.1	99840	106460
NC_035780.1	100601	100700	*	NC_035780.1	99840	106460
NC_035780.1	103201	103300	*	NC_035780.1	99840	106460
NC_035780.1	250301	250400	*	NC_035780.1	245532	253042
NC_035780.1	250401	250500	*	NC_035780.1	245532	253042
NC_035780.1	250501	250600	*	NC_035780.1	245532	253042
NC_035780.1	250601	250700	*	NC_035780.1	245532	253042
NC_035780.1	250701	250800	*	NC_035780.1	245532	253042
NC_035780.1	258108	258200	*	NC_035780.1	258108	272839
NC_035780.1	258201	258300	*	NC_035780.1	258108	272839


In [27]:
! cut -f7 2019-06-05-DMRBackground-Genes.txt | sort | uniq -c | wc -l
!echo "DMR background overlaps with unique genes"

   11578
DMR background overlaps with unique genes


### 2c. Transposable Elements (all)

#### DML

In [68]:
! {bedtoolsDirectory}intersectBed \
-u \
-a {DMLlist} \
-b {transposableElementsAll} \
| wc -l
!echo "DML overlaps with transposable elements (all)"

      57
DML overlaps with transposable elements (all)


In [69]:
! {bedtoolsDirectory}intersectBed \
-wb \
-a {DMLlist} \
-b {transposableElementsAll} \
> 2019-05-29-DML-TE-all.txt

In [70]:
!head 2019-05-29-DML-TE-all.txt

NC_035780.1	8833124	8833126	60	NC_035780.1	RepeatMasker	similarity	8833042	8833288	18.2	-	.	Target "Motif:CVA" 1 272
NC_035780.1	22177828	22177830	-51	NC_035780.1	RepeatMasker	similarity	22177766	22177877	22.3	-	.	Target "Motif:DNA9-6_CGi" 1 115
NC_035780.1	57337100	57337102	-54	NC_035780.1	RepeatMasker	similarity	57337042	57337128	18.6	-	.	Target "Motif:DNA2-2_CGi" 413 498
NC_035780.1	58135767	58135769	74	NC_035780.1	RepeatMasker	similarity	58135699	58135837	22.4	+	.	Target "Motif:BivaMD-SINE1_CrVi" 169 314
NC_035781.1	22439769	22439771	53	NC_035781.1	RepeatMasker	similarity	22439740	22439796	28.1	+	.	Target "Motif:Mariner-6_AMi" 698 754
NC_035781.1	29178318	29178320	-55	NC_035781.1	RepeatMasker	similarity	29177336	29178341	16.0	-	.	Target "Motif:CVA" 2 863
NC_035781.1	54151548	54151550	54	NC_035781.1	RepeatMasker	similarity	54150482	54151750	14.3	+	.	Target "Motif:CVA" 1 1018
NC_035781.1	59742649	59742651	-65	NC_035781.1	RepeatMasker	similarity	59742603	59742651	 4.2	+	.	Targe
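
Each row in this file carries the RepeatMasker motif name in its final field (for example `Target "Motif:CVA" 1 272`). As a rough sketch, assuming that attribute format holds for every complete row, the motif names could be tallied in Python to see which transposable element families the DML fall in. `tally_motifs` is a hypothetical helper, and the sample rows are copied from the output above.

```python
import re
from collections import Counter

# Captures the name between 'Motif:' and the closing quote.
MOTIF_PATTERN = re.compile(r'Target "Motif:([^"]+)"')


def tally_motifs(lines):
    """Count how often each RepeatMasker motif name appears."""
    counts = Counter()
    for line in lines:
        match = MOTIF_PATTERN.search(line)
        if match:  # truncated rows without a complete attribute are skipped
            counts[match.group(1)] += 1
    return counts


# Four complete rows from 2019-05-29-DML-TE-all.txt
sample = [
    'NC_035780.1\t8833124\t8833126\t60\tNC_035780.1\tRepeatMasker\tsimilarity\t8833042\t8833288\t18.2\t-\t.\tTarget "Motif:CVA" 1 272',
    'NC_035780.1\t22177828\t22177830\t-51\tNC_035780.1\tRepeatMasker\tsimilarity\t22177766\t22177877\t22.3\t-\t.\tTarget "Motif:DNA9-6_CGi" 1 115',
    'NC_035781.1\t29178318\t29178320\t-55\tNC_035781.1\tRepeatMasker\tsimilarity\t29177336\t29178341\t16.0\t-\t.\tTarget "Motif:CVA" 2 863',
    'NC_035781.1\t54151548\t54151550\t54\tNC_035781.1\tRepeatMasker\tsimilarity\t54150482\t54151750\t14.3\t+\t.\tTarget "Motif:CVA" 1 1018',
]
motif_counts = tally_motifs(sample)
for motif, n in motif_counts.most_common():
    print(motif, n)
```

The tally counts rows, not distinct elements, so a single TE copy that contains several DML contributes once per DML.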

#### Hypermethylated DML

In [71]:
! {bedtoolsDirectory}intersectBed \
-u \
-a {hyperDML} \
-b {transposableElementsAll} \
| wc -l
!echo "hypermethylated DML overlaps with TE (all)"

      26
hypermethylated DML overlaps with TE (all)


In [72]:
! {bedtoolsDirectory}intersectBed \
-wb \
-a {hyperDML} \
-b {transposableElementsAll} \
> 2019-05-29-Hypermethylated-DML-TEall.txt

In [73]:
!head 2019-05-29-Hypermethylated-DML-TEall.txt

NC_035780.1	8833124	8833126	60	NC_035780.1	RepeatMasker	similarity	8833042	8833288	18.2	-	.	Target "Motif:CVA" 1 272
NC_035780.1	58135767	58135769	74	NC_035780.1	RepeatMasker	similarity	58135699	58135837	22.4	+	.	Target "Motif:BivaMD-SINE1_CrVi" 169 314
NC_035781.1	22439769	22439771	53	NC_035781.1	RepeatMasker	similarity	22439740	22439796	28.1	+	.	Target "Motif:Mariner-6_AMi" 698 754
NC_035781.1	54151548	54151550	54	NC_035781.1	RepeatMasker	similarity	54150482	54151750	14.3	+	.	Target "Motif:CVA" 1 1018
NC_035782.1	45857195	45857197	52	NC_035782.1	RepeatMasker	similarity	45857026	45858123	32.7	+	.	Target "Motif:Mariner-21_LCh" 450 1999
NC_035782.1	53693367	53693369	61	NC_035782.1	RepeatMasker	similarity	53693299	53693466	19.6	+	.	Target "Motif:Crypton-N6B_CGi" 566 735
NC_035782.1	58675269	58675271	50	NC_035782.1	RepeatMasker	similarity	58675249	58675337	19.1	-	.	Target "Motif:DNA3-12_CGi" 290 378
NC_035782.1	61203970	61203972	51	NC_035782.1	RepeatMasker	similarity	61203541	61204

#### Hypomethylated DML

In [36]:
! {bedtoolsDirectory}intersectBed \
-u \
-a {hypoDML} \
-b {transposableElementsAll} \
| wc -l
!echo "hypomethylated DML overlaps with TE (all)"

      31
hypomethylated DML overlaps with TE (all)


In [74]:
! {bedtoolsDirectory}intersectBed \
-wb \
-a {hypoDML} \
-b {transposableElementsAll} \
> 2019-05-29-Hypomethylated-DML-TEall.txt

In [75]:
!head 2019-05-29-Hypomethylated-DML-TEall.txt

NC_035780.1	22177828	22177830	-51	NC_035780.1	RepeatMasker	similarity	22177766	22177877	22.3	-	.	Target "Motif:DNA9-6_CGi" 1 115
NC_035780.1	57337100	57337102	-54	NC_035780.1	RepeatMasker	similarity	57337042	57337128	18.6	-	.	Target "Motif:DNA2-2_CGi" 413 498
NC_035781.1	29178318	29178320	-55	NC_035781.1	RepeatMasker	similarity	29177336	29178341	16.0	-	.	Target "Motif:CVA" 2 863
NC_035781.1	59742649	59742651	-65	NC_035781.1	RepeatMasker	similarity	59742603	59742651	 4.2	+	.	Target "Motif:(ACTAACG)n" 1 49
NC_035782.1	6685343	6685345	-68	NC_035782.1	RepeatMasker	similarity	6685308	6685646	15.0	+	.	Target "Motif:BivaMD-SINE1_CrVi" 1 335
NC_035782.1	6685349	6685351	-50	NC_035782.1	RepeatMasker	similarity	6685308	6685646	15.0	+	.	Target "Motif:BivaMD-SINE1_CrVi" 1 335
NC_035782.1	34498893	34498895	-55	NC_035782.1	RepeatMasker	similarity	34498501	34500091	24.8	+	.	Target "Motif:Helitron-N40_CGi" 1 1569
NC_035782.1	34498895	34498897	-71	NC_035782.1	RepeatMasker	similarity	34498501	3450

#### DMR

In [55]:
! {bedtoolsDirectory}intersectBed \
-u \
-a {DMRlist} \
-b {transposableElementsAll} \
| wc -l
!echo "DMR overlaps with transposable elements (all)"

      11
DMR overlaps with transposable elements (all)


In [37]:
! {bedtoolsDirectory}intersectBed \
-wb \
-a {DMRlist} \
-b {transposableElementsAll} \
> 2019-06-05-DMR-TE-all.txt

In [38]:
!head 2019-06-05-DMR-TE-all.txt

NC_035781.1	54151500	54151600	DMR	54	NC_035781.1	RepeatMasker	similarity	54150482	54151750	14.3	+	.	Target "Motif:CVA" 1 1018
NC_035783.1	30386535	30386600	DMR	52	NC_035783.1	RepeatMasker	similarity	30386536	30387049	25.2	+	.	Target "Motif:Kolobok-N4_CGi" 1 497
NC_035784.1	41345100	41345105	DMR	-51	NC_035784.1	RepeatMasker	similarity	41345048	41345105	 6.9	+	.	Target "Motif:Helitron-10_CGi" 282 358
NC_035784.1	41345184	41345200	DMR	-51	NC_035784.1	RepeatMasker	similarity	41345185	41345249	20.3	-	.	Target "Motif:Helitron-N42_CGi" 1 65
NC_035784.1	57163819	57163900	DMR	-63	NC_035784.1	RepeatMasker	similarity	57163820	57163967	27.4	+	.	Target "Motif:BivaMD-SINE1_CrVi" 179 325
NC_035784.1	86309411	86309430	DMR	-50	NC_035784.1	RepeatMasker	similarity	86309412	86309430	 5.5	+	.	Target "Motif:(C)n" 1 19
NC_035785.1	35798179	35798200	DMR	-52	NC_035785.1	RepeatMasker	similarity	35798180	35798247	31.6	+	.	Target "Motif:GA-rich" 1 68
NC_035787.1	47281023	47281063	DMR	50	NC_035787.1	RepeatM

#### Hypermethylated DMR

In [58]:
! {bedtoolsDirectory}intersectBed \
-u \
-a {hyperDMR} \
-b {transposableElementsAll} \
| wc -l
!echo "hyperDMR overlaps with transposable elements (all)"

       3
hyperDMR overlaps with transposable elements (all)


In [41]:
! {bedtoolsDirectory}intersectBed \
-wb \
-a {hyperDMR} \
-b {transposableElementsAll} \
> 2019-06-05-HyperDMR-TE-all.txt

In [42]:
!head 2019-06-05-HyperDMR-TE-all.txt

NC_035781.1	54151500	54151601	54	NC_035781.1	RepeatMasker	similarity	54150482	54151750	14.3	+	.	Target "Motif:CVA" 1 1018
NC_035783.1	30386535	30386601	52	NC_035783.1	RepeatMasker	similarity	30386536	30387049	25.2	+	.	Target "Motif:Kolobok-N4_CGi" 1 497
NC_035787.1	47281023	47281063	50	NC_035787.1	RepeatMasker	similarity	47281024	47281063	17.5	-	.	Target "Motif:DNA9-6_CGi" 758 797
NC_035787.1	47281063	47281101	50	NC_035787.1	RepeatMasker	similarity	47281064	47281118	22.2	+	.	Target "Motif:DNA9-6_CGi" 744 797
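
The saved `-wb` file above has four rows, while only three distinct hypermethylated DMR overlap transposable elements. This is because `-u` reports each DMR once, whereas `-wb` writes one row per DMR-TE pair and clips the DMR to the overlapping portion. The Python sketch below reproduces the last two rows under two assumptions: that the underlying DMR window is 47281000-47281101, and that the 1-based GFF start is shifted to 0-based before comparison, which is consistent with the coordinates shown. `gff_to_bed` and `intersect_pieces` are hypothetical helpers, not bedtools functions.

```python
def gff_to_bed(start, end):
    """Convert 1-based inclusive GFF coordinates to 0-based half-open."""
    return start - 1, end


def intersect_pieces(a, b_list):
    """Return the clipped portion of interval a for each overlapping b.

    Mirrors the DMR columns that -wb writes: one piece per A-B pair.
    """
    pieces = []
    for b_start, b_end in b_list:
        start = max(a[0], b_start)
        end = min(a[1], b_end)
        if start < end:  # keep only pairs that actually overlap
            pieces.append((start, end))
    return pieces


dmr = (47281000, 47281101)  # assumed DMR window on NC_035787.1
tes = [
    gff_to_bed(47281024, 47281063),  # DNA9-6_CGi, minus strand
    gff_to_bed(47281064, 47281118),  # DNA9-6_CGi, plus strand
]

pieces = intersect_pieces(dmr, tes)  # what -wb would list
reported_once = 1 if pieces else 0   # what -u would count
print(pieces)
print(reported_once)
```

Counting rows in a `-wb` file therefore gives the number of overlap pairs, which can exceed the number of distinct DMR.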


#### Hypomethylated DMR

In [61]:
! {bedtoolsDirectory}intersectBed \
-u \
-a {hypoDMR} \
-b {transposableElementsAll} \
| wc -l
!echo "hypoDMR overlaps with transposable elements (all)"

       8
hypoDMR overlaps with transposable elements (all)


In [43]:
! {bedtoolsDirectory}intersectBed \
-wb \
-a {hypoDMR} \
-b {transposableElementsAll} \
> 2019-06-05-HypoDMR-TE-all.txt

In [44]:
!head 2019-06-05-HypoDMR-TE-all.txt

NC_035784.1	41345100	41345105	-51	NC_035784.1	RepeatMasker	similarity	41345048	41345105	 6.9	+	.	Target "Motif:Helitron-10_CGi" 282 358
NC_035784.1	41345184	41345201	-51	NC_035784.1	RepeatMasker	similarity	41345185	41345249	20.3	-	.	Target "Motif:Helitron-N42_CGi" 1 65
NC_035784.1	57163819	57163901	-63	NC_035784.1	RepeatMasker	similarity	57163820	57163967	27.4	+	.	Target "Motif:BivaMD-SINE1_CrVi" 179 325
NC_035784.1	86309411	86309430	-50	NC_035784.1	RepeatMasker	similarity	86309412	86309430	 5.5	+	.	Target "Motif:(C)n" 1 19
NC_035785.1	35798179	35798201	-52	NC_035785.1	RepeatMasker	similarity	35798180	35798247	31.6	+	.	Target "Motif:GA-rich" 1 68
NC_035787.1	52112659	52112681	-53	NC_035787.1	RepeatMasker	similarity	52112660	52112681	 0.0	+	.	Target "Motif:(G)n" 1 22
NC_035787.1	61149800	61149804	-53	NC_035787.1	RepeatMasker	similarity	61149705	61149804	26.0	-	.	Target "Motif:DNA2-22_CGi" 1 103
NC_035787.1	61149807	61149901	-53	NC_035787.1	RepeatMasker	similarity	61149808	6114999

#### DMR Background

In [28]:
! {bedtoolsDirectory}intersectBed \
-u \
-a {DMRBackground} \
-b {transposableElementsAll} \
| wc -l
!echo "DMR background overlaps with transposable elements (all)"

   25117
DMR background overlaps with transposable elements (all)


In [45]:
! {bedtoolsDirectory}intersectBed \
-wb \
-a {DMRBackground} \
-b {transposableElementsAll} \
> 2019-06-05-DMRBackground-TE-all.txt

In [46]:
!head 2019-06-05-DMRBackground-TE-all.txt

NC_007175.2	601	700	*	NC_007175.2	RepeatMasker	similarity	262	1389	31.1	+	.	Target "Motif:REP-6_LMi" 2920 4055
NC_007175.2	2201	2300	*	NC_007175.2	RepeatMasker	similarity	2129	2367	20.5	-	.	Target "Motif:REP-6_LMi" 13886 14118
NC_007175.2	5301	5400	*	NC_007175.2	RepeatMasker	similarity	5168	5532	32.9	+	.	Target "Motif:REP-6_LMi" 4620 4983
NC_007175.2	5401	5500	*	NC_007175.2	RepeatMasker	similarity	5168	5532	32.9	+	.	Target "Motif:REP-6_LMi" 4620 4983
NC_007175.2	5501	5532	*	NC_007175.2	RepeatMasker	similarity	5168	5532	32.9	+	.	Target "Motif:REP-6_LMi" 4620 4983
NC_007175.2	12301	12368	*	NC_007175.2	RepeatMasker	similarity	12086	12368	30.0	-	.	Target "Motif:REP-6_LMi" 9850 10131
NC_007175.2	16531	16600	*	NC_007175.2	RepeatMasker	similarity	16532	16610	24.1	-	.	Target "Motif:REP-6_LMi" 13114 13192
NC_007175.2	16601	16610	*	NC_007175.2	RepeatMasker	similarity	16532	16610	24.1	-	.	Target "Motif:REP-6_LMi" 13114 13192
NC_035780.1	1472	1500	*	NC_035780.1	RepeatMasker	similarity	1473

### 2e. Transposable Elements (_C. gigas_ only)

#### DML

In [29]:
! {bedtoolsDirectory}intersectBed \
-u \
-a {DMLlist} \
-b {transposableElementsCg} \
| wc -l
!echo "DML overlaps with transposable elements (Cg)"

      39
DML overlaps with transposable elements (Cg)


In [76]:
! {bedtoolsDirectory}intersectBed \
-wb \
-a {DMLlist} \
-b {transposableElementsCg} \
> 2019-05-29-DML-TE-Cg.txt

In [77]:
!head 2019-05-29-DML-TE-Cg.txt

NC_035780.1	8833124	8833126	60	NC_035780.1	RepeatMasker	similarity	8833045	8833287	22.6	-	.	Target "Motif:Helitron-N2f_CGi" 1 276
NC_035780.1	22177828	22177830	-51	NC_035780.1	RepeatMasker	similarity	22177766	22177877	22.3	-	.	Target "Motif:DNA9-6_CGi" 1 115
NC_035780.1	57337100	57337102	-54	NC_035780.1	RepeatMasker	similarity	57337042	57337128	18.6	-	.	Target "Motif:DNA2-2_CGi" 413 498
NC_035781.1	29178318	29178320	-55	NC_035781.1	RepeatMasker	similarity	29177333	29178341	24.4	-	.	Target "Motif:Helitron-N2d_CGi" 2 863
NC_035781.1	54151548	54151550	54	NC_035781.1	RepeatMasker	similarity	54150483	54151741	23.3	+	.	Target "Motif:Helitron-N2f_CGi" 1 1018
NC_035781.1	59742649	59742651	-65	NC_035781.1	RepeatMasker	similarity	59742603	59742651	 4.2	+	.	Target "Motif:(ACTAACG)n" 1 49
NC_035782.1	34498893	34498895	-55	NC_035782.1	RepeatMasker	similarity	34498501	34500091	24.8	+	.	Target "Motif:Helitron-N40_CGi" 1 1569
NC_035782.1	34498895	34498897	-71	NC_035782.1	RepeatMasker	similarity

#### Hypermethylated DML

In [32]:
!{bedtoolsDirectory}intersectBed \
-u \
-a {hyperDML} \
-b {transposableElementsCg} \
| wc -l
!echo "hypermethylated DML overlaps with TE (Cg)"

      16
hypermethylated DML overlaps with TE (Cg)


In [80]:
! {bedtoolsDirectory}intersectBed \
-wb \
-a {hyperDML} \
-b {transposableElementsCg} \
> 2019-05-29-Hypermethylated-DML-TECg.txt

In [81]:
!head 2019-05-29-Hypermethylated-DML-TECg.txt

NC_035780.1	8833124	8833126	60	NC_035780.1	RepeatMasker	similarity	8833045	8833287	22.6	-	.	Target "Motif:Helitron-N2f_CGi" 1 276
NC_035781.1	54151548	54151550	54	NC_035781.1	RepeatMasker	similarity	54150483	54151741	23.3	+	.	Target "Motif:Helitron-N2f_CGi" 1 1018
NC_035782.1	53693367	53693369	61	NC_035782.1	RepeatMasker	similarity	53693299	53693466	19.6	+	.	Target "Motif:Crypton-N6B_CGi" 566 735
NC_035782.1	58675269	58675271	50	NC_035782.1	RepeatMasker	similarity	58675249	58675337	19.1	-	.	Target "Motif:DNA3-12_CGi" 290 378
NC_035782.1	61203970	61203972	51	NC_035782.1	RepeatMasker	similarity	61203650	61204350	24.8	-	.	Target "Motif:Helitron-N2d_CGi" 1 686
NC_035783.1	4336100	4336102	63	NC_035783.1	RepeatMasker	similarity	4335884	4336135	21.8	+	.	Target "Motif:DNA8-4_CGi" 42 268
NC_035783.1	23130125	23130127	53	NC_035783.1	RepeatMasker	similarity	23130086	23130209	18.6	+	.	Target "Motif:Crypton-8N1_CGi" 516 639
NC_035783.1	29749414	29749416	57	NC_035783.1	RepeatMasker	similarity

#### Hypomethylated DML

In [35]:
! {bedtoolsDirectory}intersectBed \
-u \
-a {hypoDML} \
-b {transposableElementsCg} \
| wc -l
!echo "hypomethylated DML overlaps with TE (Cg)"

      23
hypomethylated DML overlaps with TE (Cg)


In [82]:
! {bedtoolsDirectory}intersectBed \
-wb \
-a {hypoDML} \
-b {transposableElementsCg} \
> 2019-05-29-Hypomethylated-DML-TECg.txt

In [83]:
!head 2019-05-29-Hypomethylated-DML-TECg.txt

NC_035780.1	22177828	22177830	-51	NC_035780.1	RepeatMasker	similarity	22177766	22177877	22.3	-	.	Target "Motif:DNA9-6_CGi" 1 115
NC_035780.1	57337100	57337102	-54	NC_035780.1	RepeatMasker	similarity	57337042	57337128	18.6	-	.	Target "Motif:DNA2-2_CGi" 413 498
NC_035781.1	29178318	29178320	-55	NC_035781.1	RepeatMasker	similarity	29177333	29178341	24.4	-	.	Target "Motif:Helitron-N2d_CGi" 2 863
NC_035781.1	59742649	59742651	-65	NC_035781.1	RepeatMasker	similarity	59742603	59742651	 4.2	+	.	Target "Motif:(ACTAACG)n" 1 49
NC_035782.1	34498893	34498895	-55	NC_035782.1	RepeatMasker	similarity	34498501	34500091	24.8	+	.	Target "Motif:Helitron-N40_CGi" 1 1569
NC_035782.1	34498895	34498897	-71	NC_035782.1	RepeatMasker	similarity	34498501	34500091	24.8	+	.	Target "Motif:Helitron-N40_CGi" 1 1569
NC_035783.1	48434286	48434288	-53	NC_035783.1	RepeatMasker	similarity	48434172	48434360	26.1	-	.	Target "Motif:DNA3-11_CGi" 1856 2040
NC_035783.1	49079096	49079097	-50	NC_035783.1	RepeatMasker	simil

#### DMR

In [64]:
! {bedtoolsDirectory}intersectBed \
-u \
-a {DMRlist} \
-b {transposableElementsCg} \
| wc -l
!echo "DMR overlaps with transposable elements (Cg)"

       9
DMR overlaps with transposable elements (Cg)


In [65]:
! {bedtoolsDirectory}intersectBed \
-wb \
-a {DMRlist} \
-b {transposableElementsCg} \
> 2019-06-05-DMR-TE-Cg.txt

In [66]:
!head 2019-06-05-DMR-TE-Cg.txt

NC_035781.1	54151500	54151600	DMR	54	NC_035781.1	RepeatMasker	similarity	54150483	54151741	23.3	+	.	Target "Motif:Helitron-N2f_CGi" 1 1018
NC_035783.1	30386535	30386600	DMR	52	NC_035783.1	RepeatMasker	similarity	30386536	30387049	25.2	+	.	Target "Motif:Kolobok-N4_CGi" 1 497
NC_035784.1	41345100	41345105	DMR	-51	NC_035784.1	RepeatMasker	similarity	41345048	41345105	 6.9	+	.	Target "Motif:Helitron-10_CGi" 282 358
NC_035784.1	41345184	41345200	DMR	-51	NC_035784.1	RepeatMasker	similarity	41345185	41345249	20.3	-	.	Target "Motif:Helitron-N42_CGi" 1 65
NC_035784.1	86309411	86309430	DMR	-50	NC_035784.1	RepeatMasker	similarity	86309412	86309430	 5.5	+	.	Target "Motif:(C)n" 1 19
NC_035785.1	35798179	35798200	DMR	-52	NC_035785.1	RepeatMasker	similarity	35798180	35798247	31.6	+	.	Target "Motif:GA-rich" 1 68
NC_035787.1	47281023	47281063	DMR	50	NC_035787.1	RepeatMasker	similarity	47281024	47281063	17.5	-	.	Target "Motif:DNA9-6_CGi" 758 797
NC_035787.1	47281063	47281100	DMR	50	NC_035787.1	Re

#### Hypermethylated DMR

In [67]:
! {bedtoolsDirectory}intersectBed \
-u \
-a {hyperDMR} \
-b {transposableElementsCg} \
| wc -l
!echo "hyperDMR overlaps with transposable elements (Cg)"

       3
hyperDMR overlaps with transposable elements (Cg)


In [68]:
! {bedtoolsDirectory}intersectBed \
-wb \
-a {hyperDMR} \
-b {transposableElementsCg} \
> 2019-06-05-HyperDMR-TE-Cg.txt

In [69]:
!head 2019-06-05-HyperDMR-TE-Cg.txt

NC_035781.1	54151500	54151601	54	NC_035781.1	RepeatMasker	similarity	54150483	54151741	23.3	+	.	Target "Motif:Helitron-N2f_CGi" 1 1018
NC_035783.1	30386535	30386601	52	NC_035783.1	RepeatMasker	similarity	30386536	30387049	25.2	+	.	Target "Motif:Kolobok-N4_CGi" 1 497
NC_035787.1	47281023	47281063	50	NC_035787.1	RepeatMasker	similarity	47281024	47281063	17.5	-	.	Target "Motif:DNA9-6_CGi" 758 797
NC_035787.1	47281063	47281101	50	NC_035787.1	RepeatMasker	similarity	47281064	47281118	22.2	+	.	Target "Motif:DNA9-6_CGi" 744 797


#### Hypomethylated DMR

In [70]:
! {bedtoolsDirectory}intersectBed \
-u \
-a {hypoDMR} \
-b {transposableElementsCg} \
| wc -l
!echo "hypoDMR overlaps with transposable elements (Cg)"

       6
hypoDMR overlaps with transposable elements (Cg)


In [71]:
! {bedtoolsDirectory}intersectBed \
-wb \
-a {hypoDMR} \
-b {transposableElementsCg} \
> 2019-06-05-HypoDMR-TE-Cg.txt

In [72]:
!head 2019-06-05-HypoDMR-TE-Cg.txt

NC_035784.1	41345100	41345105	-51	NC_035784.1	RepeatMasker	similarity	41345048	41345105	 6.9	+	.	Target "Motif:Helitron-10_CGi" 282 358
NC_035784.1	41345184	41345201	-51	NC_035784.1	RepeatMasker	similarity	41345185	41345249	20.3	-	.	Target "Motif:Helitron-N42_CGi" 1 65
NC_035784.1	86309411	86309430	-50	NC_035784.1	RepeatMasker	similarity	86309412	86309430	 5.5	+	.	Target "Motif:(C)n" 1 19
NC_035785.1	35798179	35798201	-52	NC_035785.1	RepeatMasker	similarity	35798180	35798247	31.6	+	.	Target "Motif:GA-rich" 1 68
NC_035787.1	52112659	52112681	-53	NC_035787.1	RepeatMasker	similarity	52112660	52112681	 0.0	+	.	Target "Motif:(G)n" 1 22
NC_035787.1	61149800	61149804	-53	NC_035787.1	RepeatMasker	similarity	61149705	61149804	26.0	-	.	Target "Motif:DNA2-22_CGi" 1 103
NC_035787.1	61149807	61149901	-53	NC_035787.1	RepeatMasker	similarity	61149808	61149990	24.0	+	.	Target "Motif:Helitron-N2_CGi" 127 305
NC_035788.1	56052700	56052733	-73	NC_035788.1	RepeatMasker	similarity	56052674	56052733	

#### DMR Background

In [31]:
! {bedtoolsDirectory}intersectBed \
-u \
-a {DMRBackground} \
-b {transposableElementsCg} \
| wc -l
!echo "DMR background overlaps with transposable elements (Cg)"

   20228
DMR background overlaps with transposable elements (Cg)


In [32]:
! {bedtoolsDirectory}intersectBed \
-wb \
-a {DMRBackground} \
-b {transposableElementsCg} \
> 2019-06-05-DMRBackground-TE-Cg.txt

In [33]:
!head 2019-06-05-DMRBackground-TE-Cg.txt

NC_035780.1	1472	1500	*	NC_035780.1	RepeatMasker	similarity	1473	1535	 0.0	+	.	Target "Motif:(TAACCC)n" 1 63
NC_035780.1	259831	259900	*	NC_035780.1	RepeatMasker	similarity	259832	259930	25.8	-	.	Target "Motif:DNA9-7_CGi" 1 97
NC_035780.1	269562	269600	*	NC_035780.1	RepeatMasker	similarity	269563	269603	17.1	+	.	Target "Motif:(ATG)n" 1 42
NC_035780.1	269601	269603	*	NC_035780.1	RepeatMasker	similarity	269563	269603	17.1	+	.	Target "Motif:(ATG)n" 1 42
NC_035780.1	270801	270894	*	NC_035780.1	RepeatMasker	similarity	270702	270894	22.2	+	.	Target "Motif:DNA2-5_CGi" 1 213
NC_035780.1	272001	272062	*	NC_035780.1	RepeatMasker	similarity	271965	272062	26.5	+	.	Target "Motif:Kolobok-2_CGi" 2384 2485
NC_035780.1	283736	283800	*	NC_035780.1	RepeatMasker	similarity	283737	283817	 6.7	-	.	Target "Motif:DIRS-1_CGi" 4930 5010
NC_035780.1	291341	291377	*	NC_035780.1	RepeatMasker	similarity	291342	291377	 8.9	+	.	Target "Motif:(CAAGCA)n" 1 39
NC_035780.1	293792	293800	*	NC_035780.1	RepeatMasker

## 3. Identify Overlaps between Other Genome Feature Tracks

I started this work in [this Jupyter notebook](https://github.com/fish546-2018/yaamini-virginica/blob/master/notebooks/2019-05-13-Generating-Genome-Feature-Tracks.ipynb), where I identified CG motif overlaps with genomic feature tracks. Here I'll do the same for other tracks.

### 3a. Transposable Elements (all)

To fully understand my results, I also need to know where TEs are located with respect to exons, introns, and genes.

#### Exons

In [85]:
! {bedtoolsDirectory}intersectBed \
-u \
-a {exonList} \
-b {transposableElementsAll} \
| wc -l
!echo "Exon overlaps with transposable elements (all)"

   50331
Exon overlaps with transposable elements (all)


In [18]:
! {bedtoolsDirectory}intersectBed \
-wo \
-a {exonList} \
-b {transposableElementsAll} \
> 2018-11-07-Exon-TE-all.txt

In [19]:
!head 2018-11-07-Exon-TE-all.txt

NC_035780.1	108305	110077	NC_035780.1	RepeatMasker	similarity	109968	109996	 0.0	+	.	Target "Motif:(CCT)n" 1 29	29
NC_035780.1	164820	164941	NC_035780.1	RepeatMasker	similarity	164886	164914	 7.3	+	.	Target "Motif:(GAG)n" 1 29	29
NC_035780.1	165620	166793	NC_035780.1	RepeatMasker	similarity	166075	166280	32.8	+	.	Target "Motif:Harbinger1_DR" 1472 1676	206
NC_035780.1	165620	166793	NC_035780.1	RepeatMasker	similarity	166501	166566	30.3	+	.	Target "Motif:Harbinger-6_DR" 1152 1217	66
NC_035780.1	165620	166793	NC_035780.1	RepeatMasker	similarity	166598	166642	17.8	+	.	Target "Motif:hATw-1_HM" 2778 2822	45
NC_035780.1	219451	220204	NC_035780.1	RepeatMasker	similarity	220122	220199	24.7	-	.	Target "Motif:Gypsy-75_CQ-I" 1012 1091	78
NC_035780.1	227734	228033	NC_035780.1	RepeatMasker	similarity	227768	227819	25.0	+	.	Target "Motif:A-rich" 1 54	52
NC_035780.1	227734	228033	NC_035780.1	RepeatMasker	similarity	227768	227819	25.0	+	.	Target "Motif:A-rich" 1 54	52
NC_035780.1	227734	228033	
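
The final column that `-wo` appends is the number of overlapping base pairs for each exon-TE pair. As a sketch of how that column could be summarized, the Python below sums it per exon, assuming a three-column exon record followed by a nine-column GFF record and the overlap count (13 tab-separated fields in total). `te_bases_per_feature` and its `a_columns` parameter are hypothetical, and the sample rows are copied from the output above.

```python
from collections import defaultdict


def te_bases_per_feature(lines, a_columns=3):
    """Sum the -wo overlap column for each A feature.

    Assumes each complete row has a_columns fields for the A record,
    nine GFF fields for the TE, and one trailing overlap count.
    """
    expected_fields = a_columns + 9 + 1
    totals = defaultdict(int)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != expected_fields:
            continue  # skip truncated or malformed rows
        feature = tuple(fields[:3])  # chromosome, start, end
        totals[feature] += int(fields[-1])
    return dict(totals)


# Four complete rows from 2018-11-07-Exon-TE-all.txt
sample = [
    'NC_035780.1\t108305\t110077\tNC_035780.1\tRepeatMasker\tsimilarity\t109968\t109996\t 0.0\t+\t.\tTarget "Motif:(CCT)n" 1 29\t29',
    'NC_035780.1\t165620\t166793\tNC_035780.1\tRepeatMasker\tsimilarity\t166075\t166280\t32.8\t+\t.\tTarget "Motif:Harbinger1_DR" 1472 1676\t206',
    'NC_035780.1\t165620\t166793\tNC_035780.1\tRepeatMasker\tsimilarity\t166501\t166566\t30.3\t+\t.\tTarget "Motif:Harbinger-6_DR" 1152 1217\t66',
    'NC_035780.1\t165620\t166793\tNC_035780.1\tRepeatMasker\tsimilarity\t166598\t166642\t17.8\t+\t.\tTarget "Motif:hATw-1_HM" 2778 2822\t45',
]
totals = te_bases_per_feature(sample)
for feature, bases in totals.items():
    print(feature, bases)
```

The totals are best read as upper bounds: duplicated rows (such as the repeated A-rich rows above) and TEs that overlap one another are each added in full.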

#### Introns

In [86]:
!{bedtoolsDirectory}intersectBed \
-u \
-a {intronList} \
-b {transposableElementsAll} \
| wc -l
!echo "Intron overlaps with transposable elements (all)"

  115151
Intron overlaps with transposable elements (all)


In [20]:
! {bedtoolsDirectory}intersectBed \
-wo \
-a {intronList} \
-b {transposableElementsAll} \
> 2018-11-07-Intron-TE-all.txt

In [21]:
!head 2018-11-07-Intron-TE-all.txt

NC_035780.1	32565	32958	NC_035780.1	RepeatMasker	similarity	32720	32819	18.2	+	.	Target "Motif:Crypton-9N1_CGi" 239 337	100
NC_035780.1	46506	64122	NC_035780.1	RepeatMasker	similarity	48463	48520	 8.8	+	.	Target "Motif:BivaMD-SINE1_CrVi" 280 337	58
NC_035780.1	46506	64122	NC_035780.1	RepeatMasker	similarity	48666	49000	10.9	-	.	Target "Motif:BivaMD-SINE1_CrVi" 1 337	335
NC_035780.1	46506	64122	NC_035780.1	RepeatMasker	similarity	50251	50279	 0.0	+	.	Target "Motif:(GGTTAG)n" 1 29	29
NC_035780.1	46506	64122	NC_035780.1	RepeatMasker	similarity	50606	50760	21.3	+	.	Target "Motif:Harbinger-2N1_CGi" 1 166	155
NC_035780.1	46506	64122	NC_035780.1	RepeatMasker	similarity	50977	51034	 0.0	+	.	Target "Motif:(TA)n" 1 58	58
NC_035780.1	46506	64122	NC_035780.1	RepeatMasker	similarity	51456	51498	 0.0	+	.	Target "Motif:(AG)n" 1 43	43
NC_035780.1	46506	64122	NC_035780.1	RepeatMasker	similarity	51721	51922	21.8	+	.	Target "Motif:Harbinger-2N1_CGi" 2568 2776	202
NC_035780.1	46506	64122	NC_035780

#### Genes

In [92]:
!{bedtoolsDirectory}intersectBed \
-u \
-a {geneList} \
-b {transposableElementsAll} \
| wc -l
!echo "gene overlaps with transposable elements (all)"

   33739
gene overlaps with transposable elements (all)


In [16]:
! {bedtoolsDirectory}intersectBed \
-wo \
-a {geneList} \
-b {transposableElementsAll} \
> 2018-11-07-Genes-TE-all.txt

In [17]:
!head 2018-11-07-Genes-TE-all.txt

NC_035780.1	28961	33324	NC_035780.1	RepeatMasker	similarity	32720	32819	18.2	+	.	Target "Motif:Crypton-9N1_CGi" 239 337	100
NC_035780.1	43111	66897	NC_035780.1	RepeatMasker	similarity	48463	48520	 8.8	+	.	Target "Motif:BivaMD-SINE1_CrVi" 280 337	58
NC_035780.1	43111	66897	NC_035780.1	RepeatMasker	similarity	48666	49000	10.9	-	.	Target "Motif:BivaMD-SINE1_CrVi" 1 337	335
NC_035780.1	43111	66897	NC_035780.1	RepeatMasker	similarity	50251	50279	 0.0	+	.	Target "Motif:(GGTTAG)n" 1 29	29
NC_035780.1	43111	66897	NC_035780.1	RepeatMasker	similarity	50606	50760	21.3	+	.	Target "Motif:Harbinger-2N1_CGi" 1 166	155
NC_035780.1	43111	66897	NC_035780.1	RepeatMasker	similarity	50977	51034	 0.0	+	.	Target "Motif:(TA)n" 1 58	58
NC_035780.1	43111	66897	NC_035780.1	RepeatMasker	similarity	51456	51498	 0.0	+	.	Target "Motif:(AG)n" 1 43	43
NC_035780.1	43111	66897	NC_035780.1	RepeatMasker	similarity	51721	51922	21.8	+	.	Target "Motif:Harbinger-2N1_CGi" 2568 2776	202
NC_035780.1	43111	66897	NC_035780

#### CG motifs

In [48]:
! {bedtoolsDirectory}intersectBed \
-u \
-a {CGMotifList} \
-b {transposableElementsAll} \
| wc -l
!echo "CG motif overlaps with transposable elements (all)"

 2828372
CG motif overlaps with transposable elements (all)


In [22]:
! {bedtoolsDirectory}intersectBed \
-wo \
-a {CGMotifList} \
-b {transposableElementsAll} \
> 2018-11-07-TE-all-CGmotif.txt

In [23]:
!head 2018-11-07-TE-all-CGmotif.txt

NC_035780.1	5078	5080	CG_motif	NC_035780.1	RepeatMasker	similarity	5080	7289	32.5	-	.	Target "Motif:Gypsy-62_CGi-I" 2102 4631	1
NC_035780.1	5159	5161	CG_motif	NC_035780.1	RepeatMasker	similarity	5080	7289	32.5	-	.	Target "Motif:Gypsy-62_CGi-I" 2102 4631	2
NC_035780.1	5162	5164	CG_motif	NC_035780.1	RepeatMasker	similarity	5080	7289	32.5	-	.	Target "Motif:Gypsy-62_CGi-I" 2102 4631	2
NC_035780.1	5174	5176	CG_motif	NC_035780.1	RepeatMasker	similarity	5080	7289	32.5	-	.	Target "Motif:Gypsy-62_CGi-I" 2102 4631	2
NC_035780.1	5191	5193	CG_motif	NC_035780.1	RepeatMasker	similarity	5080	7289	32.5	-	.	Target "Motif:Gypsy-62_CGi-I" 2102 4631	2
NC_035780.1	5220	5222	CG_motif	NC_035780.1	RepeatMasker	similarity	5080	7289	32.5	-	.	Target "Motif:Gypsy-62_CGi-I" 2102 4631	2
NC_035780.1	5317	5319	CG_motif	NC_035780.1	RepeatMasker	similarity	5080	7289	32.5	-	.	Target "Motif:Gypsy-62_CGi-I" 2102 4631	2
NC_035780.1	5357	5359	CG_motif	NC_035780.1	RepeatMasker	similarity	5080	7289	32.5	-	.	Target "Mot

### 3b. Transposable Elements (_C. gigas_ only)

#### Exons

In [24]:
! {bedtoolsDirectory}intersectBed \
-u \
-a {exonList} \
-b {transposableElementsCg} \
| wc -l
!echo "Exon overlaps with transposable elements (Cg)"

   41511
Exon overlaps with transposable elements (Cg)


In [27]:
! {bedtoolsDirectory}intersectBed \
-wb \
-a {exonList} \
-b {transposableElementsCg} \
> 2018-11-07-Exon-TE-Cg.txt

In [28]:
!head 2018-11-07-Exon-TE-Cg.txt

NC_035780.1	109967	109996	NC_035780.1	RepeatMasker	similarity	109968	109996	 0.0	+	.	Target "Motif:(CCT)n" 1 29
NC_035780.1	164885	164914	NC_035780.1	RepeatMasker	similarity	164886	164914	 7.3	+	.	Target "Motif:(GAG)n" 1 29
NC_035780.1	227767	227819	NC_035780.1	RepeatMasker	similarity	227768	227819	25.0	+	.	Target "Motif:A-rich" 1 54
NC_035780.1	227767	227819	NC_035780.1	RepeatMasker	similarity	227768	227819	25.0	+	.	Target "Motif:A-rich" 1 54
NC_035780.1	227767	227819	NC_035780.1	RepeatMasker	similarity	227768	227819	25.0	+	.	Target "Motif:A-rich" 1 54
NC_035780.1	233475	233478	NC_035780.1	RepeatMasker	similarity	233445	233478	10.1	+	.	Target "Motif:(CCTTT)n" 1 35
NC_035780.1	232863	233028	NC_035780.1	RepeatMasker	similarity	232798	233028	29.7	-	.	Target "Motif:ISL2EU-N8_CGi" 15 237
NC_035780.1	269562	269603	NC_035780.1	RepeatMasker	similarity	269563	269603	17.1	+	.	Target "Motif:(ATG)n" 1 42
NC_035780.1	258539	258574	NC_035780.1	RepeatMasker	similarity	258540	258574	16.3	+	.	

#### Introns

In [100]:
! {bedtoolsDirectory}intersectBed \
-u \
-a {intronList} \
-b {transposableElementsCg} \
| wc -l
!echo "Intron overlaps with transposable elements (Cg)"

  107542
Intron overlaps with transposable elements (Cg)


In [101]:
! {bedtoolsDirectory}intersectBed \
-wb \
-a {intronList} \
-b {transposableElementsCg} \
> 2018-11-07-Intron-TE-Cg.txt

In [102]:
!head 2018-11-07-Intron-TE-Cg.txt

NC_035780.1	32719	32819	NC_035780.1	RepeatMasker	similarity	32720	32819	18.2	+	.	Target "Motif:Crypton-9N1_CGi" 239 337
NC_035780.1	46753	46805	NC_035780.1	RepeatMasker	similarity	46754	46805	 6.8	+	.	Target "Motif:DNA-22_CGi" 631 722
NC_035780.1	50250	50279	NC_035780.1	RepeatMasker	similarity	50251	50279	 0.0	+	.	Target "Motif:(GGTTAG)n" 1 29
NC_035780.1	50605	50760	NC_035780.1	RepeatMasker	similarity	50606	50760	21.3	+	.	Target "Motif:Harbinger-2N1_CGi" 1 166
NC_035780.1	50976	51034	NC_035780.1	RepeatMasker	similarity	50977	51034	 0.0	+	.	Target "Motif:(TA)n" 1 58
NC_035780.1	51455	51498	NC_035780.1	RepeatMasker	similarity	51456	51498	 0.0	+	.	Target "Motif:(AG)n" 1 43
NC_035780.1	51720	51922	NC_035780.1	RepeatMasker	similarity	51721	51922	21.8	+	.	Target "Motif:Harbinger-2N1_CGi" 2568 2776
NC_035780.1	86839	86942	NC_035780.1	RepeatMasker	similarity	86840	86942	27.4	-	.	Target "Motif:Helitron-N14_CGi" 83 189
NC_035780.1	87408	87513	NC_035780.1	RepeatMasker	similarity	87409	87

#### Genes

In [97]:
! {bedtoolsDirectory}intersectBed \
-u \
-a {geneList} \
-b {transposableElementsCg} \
| wc -l
!echo "gene overlaps with transposable elements (Cg)"

   32705
gene overlaps with transposable elements (Cg)


In [98]:
! {bedtoolsDirectory}intersectBed \
-wb \
-a {geneList} \
-b {transposableElementsCg} \
> 2018-11-07-Gene-TE-Cg.txt

In [99]:
!head 2018-11-07-Gene-TE-Cg.txt

NC_035780.1	32719	32819	NC_035780.1	RepeatMasker	similarity	32720	32819	18.2	+	.	Target "Motif:Crypton-9N1_CGi" 239 337
NC_035780.1	46753	46805	NC_035780.1	RepeatMasker	similarity	46754	46805	 6.8	+	.	Target "Motif:DNA-22_CGi" 631 722
NC_035780.1	50250	50279	NC_035780.1	RepeatMasker	similarity	50251	50279	 0.0	+	.	Target "Motif:(GGTTAG)n" 1 29
NC_035780.1	50605	50760	NC_035780.1	RepeatMasker	similarity	50606	50760	21.3	+	.	Target "Motif:Harbinger-2N1_CGi" 1 166
NC_035780.1	50976	51034	NC_035780.1	RepeatMasker	similarity	50977	51034	 0.0	+	.	Target "Motif:(TA)n" 1 58
NC_035780.1	51455	51498	NC_035780.1	RepeatMasker	similarity	51456	51498	 0.0	+	.	Target "Motif:(AG)n" 1 43
NC_035780.1	51720	51922	NC_035780.1	RepeatMasker	similarity	51721	51922	21.8	+	.	Target "Motif:Harbinger-2N1_CGi" 2568 2776
NC_035780.1	86839	86942	NC_035780.1	RepeatMasker	similarity	86840	86942	27.4	-	.	Target "Motif:Helitron-N14_CGi" 83 189
NC_035780.1	87408	87513	NC_035780.1	RepeatMasker	similarity	87409	87

#### CG motifs

In [84]:
! {bedtoolsDirectory}intersectBed \
-u \
-a {CGMotifList} \
-b {transposableElementsCg} \
| wc -l
!echo "CG motif overlaps with transposable elements (Cg)"

 2142774
CG motif overlaps with transposable elements (Cg)


In [52]:
! {bedtoolsDirectory}intersectBed \
-wb \
-a {CGMotifList} \
-b {transposableElementsCg} \
> 2018-11-07-TE-Cg-CGmotif.txt

In [53]:
!head 2018-11-07-TE-Cg-CGmotif.txt

NC_035780.1	5079	5080	CG_motif	NC_035780.1	RepeatMasker	similarity	5080	7289	32.5	-	.	Target "Motif:Gypsy-62_CGi-I" 2102 4631
NC_035780.1	5159	5161	CG_motif	NC_035780.1	RepeatMasker	similarity	5080	7289	32.5	-	.	Target "Motif:Gypsy-62_CGi-I" 2102 4631
NC_035780.1	5162	5164	CG_motif	NC_035780.1	RepeatMasker	similarity	5080	7289	32.5	-	.	Target "Motif:Gypsy-62_CGi-I" 2102 4631
NC_035780.1	5174	5176	CG_motif	NC_035780.1	RepeatMasker	similarity	5080	7289	32.5	-	.	Target "Motif:Gypsy-62_CGi-I" 2102 4631
NC_035780.1	5191	5193	CG_motif	NC_035780.1	RepeatMasker	similarity	5080	7289	32.5	-	.	Target "Motif:Gypsy-62_CGi-I" 2102 4631
NC_035780.1	5220	5222	CG_motif	NC_035780.1	RepeatMasker	similarity	5080	7289	32.5	-	.	Target "Motif:Gypsy-62_CGi-I" 2102 4631
NC_035780.1	5317	5319	CG_motif	NC_035780.1	RepeatMasker	similarity	5080	7289	32.5	-	.	Target "Motif:Gypsy-62_CGi-I" 2102 4631
NC_035780.1	5357	5359	CG_motif	NC_035780.1	RepeatMasker	similarity	5080	7289	32.5	-	.	Target "Motif:Gypsy-62_CG

### 3c. Exons

To help with downstream annotations, I also want to look at exon overlaps with genes.

In [8]:
! {bedtoolsDirectory}intersectBed \
-u \
-a {exonList} \
-b {geneList} \
| wc -l
!echo "exon overlaps with genes"

  731279
exon overlaps with genes


In [9]:
! {bedtoolsDirectory}intersectBed \
-wb \
-a {exonList} \
-b {geneList} \
> 2019-06-20-Exon-Gene.txt

In [10]:
!head 2019-06-20-Exon-Gene.txt

NC_035780.1	13578	13603	NC_035780.1	13578	14594
NC_035780.1	14237	14290	NC_035780.1	13578	14594
NC_035780.1	14557	14594	NC_035780.1	13578	14594
NC_035780.1	28961	29073	NC_035780.1	28961	33324
NC_035780.1	30524	31557	NC_035780.1	28961	33324
NC_035780.1	31736	31887	NC_035780.1	28961	33324
NC_035780.1	31977	32565	NC_035780.1	28961	33324
NC_035780.1	32959	33324	NC_035780.1	28961	33324
NC_035780.1	43111	44358	NC_035780.1	43111	66897
NC_035780.1	43111	44358	NC_035780.1	43111	66897


### 3d. Introns

In [11]:
! {bedtoolsDirectory}intersectBed \
-u \
-a {intronList} \
-b {geneList} \
| wc -l
!echo "intron overlaps with genes"

  316614
intron overlaps with genes


In [12]:
! {bedtoolsDirectory}intersectBed \
-wb \
-a {intronList} \
-b {geneList} \
> 2019-06-20-Intron-Gene.txt

In [13]:
!head 2019-06-20-Intron-Gene.txt

NC_035780.1	13603	14236	NC_035780.1	13578	14594
NC_035780.1	14290	14556	NC_035780.1	13578	14594
NC_035780.1	29073	30523	NC_035780.1	28961	33324
NC_035780.1	31557	31735	NC_035780.1	28961	33324
NC_035780.1	31887	31976	NC_035780.1	28961	33324
NC_035780.1	32565	32958	NC_035780.1	28961	33324
NC_035780.1	44358	45912	NC_035780.1	43111	66897
NC_035780.1	46506	64122	NC_035780.1	43111	66897
NC_035780.1	64334	66868	NC_035780.1	43111	66897
NC_035780.1	85777	88422	NC_035780.1	85606	95254


## 4. Gene Flanking

I will perform a flanking analysis in two ways. First, I will use `bedtools flank` to create 1000 bp regions on either side of each mRNA coding region. I can then isolate these flanks and intersect them with various genomic feature files. Second, I will use `bedtools closest` to find the closest non-overlapping DML or DMR to each mRNA coding region.

In [110]:
mkdir 2019-05-29-Flanking-Analysis #Create a new directory for flanking analysis output

### 4a. `flank`

I also need to know if DML and CG motifs overlap with regions that flank mRNA. These flanking regions could contain promoters or transcription factor binding sites that regulate transcription. To do this, I will use `bedtools flank`:

1. Path to `flankBed`
2. -i: Path to mRNA GFF file
3. -g: Path to the *C. virginica* "genome" file. flankBed requires the length of each chromosome so that flanks do not run past chromosome ends (see this issue). I created this file in TextWrangler using chromosome lengths from NCBI.
4. -b 1000: Add 1000 bp flanks to each end of the coding region
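
The arithmetic behind `-b 1000` can be sketched in a few lines of Python. This is only an illustration of the interval math, not how bedtools is implemented. The function name `flank` and the chromosome length in `chrom_sizes` are placeholders invented for the example, and the real lengths come from the genome file passed to `-g`.

```python
def flank(chrom, start, end, chrom_sizes, width=1000):
    """Return (upstream, downstream) flanks of one BED interval.

    Flanks are clamped so they never start below 0 or end past the
    chromosome length, which is why a genome file is needed.
    """
    size = chrom_sizes[chrom]
    upstream = (chrom, max(0, start - width), start)
    downstream = (chrom, end, min(size, end + width))
    return upstream, downstream


# Placeholder chromosome length, for illustration only (not the real value)
chrom_sizes = {"NC_035780.1": 100_000}

# First mRNA entry from the mRNA BED file
up, down = flank("NC_035780.1", 28961, 33324, chrom_sizes)
print(up)
print(down)
```

For the first mRNA entry (28961 to 33324), this gives 27961 to 28961 upstream and 33324 to 34324 downstream, the same coordinates as the first two rows of the flank file.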

In [111]:
! {bedtoolsDirectory}flankBed \
-i {mRNAList} \
-g 2018-11-14-Flanking-Analysis/2018-11-14-bedtools-Chromosome-Length.txt \
-b 1000 \
> 2019-05-29-Flanking-Analysis/2019-05-29-mRNA-1000bp-Flanks.bed

In [112]:
!head {mRNAList} #The original file, just for comparison

NC_035780.1	28961	33324
NC_035780.1	43111	66897
NC_035780.1	43111	46506
NC_035780.1	85606	95254
NC_035780.1	99840	106460
NC_035780.1	108305	110077
NC_035780.1	151859	157536
NC_035780.1	163809	183798
NC_035780.1	164820	166793
NC_035780.1	190449	193594


In [113]:
!head 2019-05-29-Flanking-Analysis/2019-05-29-mRNA-1000bp-Flanks.bed #Isolated flanks. The first entry is the upstream flank for the first mRNA coding region, second is the downstream flank for the mRNA coding region, etc.

NC_035780.1	27961	28961
NC_035780.1	33324	34324
NC_035780.1	42111	43111
NC_035780.1	66897	67897
NC_035780.1	42111	43111
NC_035780.1	46506	47506
NC_035780.1	84606	85606
NC_035780.1	95254	96254
NC_035780.1	98840	99840
NC_035780.1	106460	107460


In [114]:
!wc -l 2019-05-29-Flanking-Analysis/2019-05-29-mRNA-1000bp-Flanks.bed

  120402 2019-05-29-Flanking-Analysis/2019-05-29-mRNA-1000bp-Flanks.bed


Now that I have these flanks, I want to separate the upstream flanks from the downstream flanks. I will do this using `awk`. If the row number is odd, the row goes into the upstream flank file. If the row number is even, it goes into the downstream flank file.
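
The same odd/even split can be sketched in Python. `awk` counts rows from 1, so odd rows correspond to list indices 0, 2, 4, and so on. The helper name `split_alternating` is made up for this example, and the sketch assumes the flank file contains exactly two rows per mRNA, in upstream-then-downstream order.

```python
def split_alternating(rows):
    """Split rows into (odd-numbered, even-numbered) using 1-based row numbers."""
    upstream = rows[0::2]    # rows 1, 3, 5, ... (NR % 2 == 1 in awk)
    downstream = rows[1::2]  # rows 2, 4, 6, ... (NR % 2 == 0 in awk)
    return upstream, downstream


# First four rows of the combined flank file
flanks = [
    "NC_035780.1\t27961\t28961",
    "NC_035780.1\t33324\t34324",
    "NC_035780.1\t42111\t43111",
    "NC_035780.1\t66897\t67897",
]

upstream, downstream = split_alternating(flanks)
print(upstream)
print(downstream)
```

If a flank were ever missing from the combined file, every later row would land in the wrong output, so comparing both line counts against the number of mRNA is a worthwhile check.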

In [115]:
!awk '{ if (NR%2) print > "2019-05-29-Flanking-Analysis/2019-05-29-mRNA-Upstream-Flanks.bed"; \
else print > "2019-05-29-Flanking-Analysis/2019-05-29-mRNA-Downstream-Flanks.bed" }' \
2019-05-29-Flanking-Analysis/2019-05-29-mRNA-1000bp-Flanks.bed

#### Upstream flanks (i.e. putative promoters)

In [15]:
!head 2019-05-29-Flanking-Analysis/2019-05-29-mRNA-Upstream-Flanks.bed

NC_035780.1	27961	28961
NC_035780.1	42111	43111
NC_035780.1	42111	43111
NC_035780.1	84606	85606
NC_035780.1	98840	99840
NC_035780.1	107305	108305
NC_035780.1	150859	151859
NC_035780.1	162809	163809
NC_035780.1	163820	164820
NC_035780.1	189449	190449


In [118]:
!wc -l 2019-05-29-Flanking-Analysis/2019-05-29-mRNA-Upstream-Flanks.bed

   60201 2019-05-29-Flanking-Analysis/2019-05-29-mRNA-Upstream-Flanks.bed


There are just as many upstream flanks as there are mRNA, so that's good!

#### Downstream flanks

In [120]:
!head 2019-05-29-Flanking-Analysis/2019-05-29-mRNA-Downstream-Flanks.bed

NC_035780.1	33324	34324
NC_035780.1	66897	67897
NC_035780.1	46506	47506
NC_035780.1	95254	96254
NC_035780.1	106460	107460
NC_035780.1	110077	111077
NC_035780.1	157536	158536
NC_035780.1	183798	184798
NC_035780.1	166793	167793
NC_035780.1	193594	194594


In [121]:
!wc -l 2019-05-29-Flanking-Analysis/2019-05-29-mRNA-Downstream-Flanks.bed

   60201 2019-05-29-Flanking-Analysis/2019-05-29-mRNA-Downstream-Flanks.bed


Now I'll take the upstream and downstream flank BED files I made and use them in intersectBed to find overlaps with DML, DMR, and CG motifs!

1. Path to intersectBed
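2. -wo: Write the original A and B entries, followed by the number of overlapping base pairs. The DMR cells use `-wb` instead, which writes the overlapping portion of A followed by the original B entry
3. -a: Path to BED file created with flanks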
4. -b: Specify either DML, DMR, or CG motif file. Overlaps between the flanks and CG motifs can be used as a background when comparing DML-flank and DMR-flank results
5. ">" filename: Redirect output to a .txt file
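
To make the `-wo` output columns easier to read, here is a minimal brute-force sketch of the same idea in Python. It compares every A row with every B row, which is fine for a toy example but is not how bedtools works internally. The function names are invented for the example, and intervals are treated as half-open BED coordinates.

```python
def overlap_bp(a_start, a_end, b_start, b_end):
    """Bases shared by two half-open intervals (0 if they do not overlap)."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def intersect_wo(a_rows, b_rows):
    """For each overlapping pair, emit the A row, the B row, and the overlap size."""
    out = []
    for a in a_rows:
        for b in b_rows:
            if a[0] != b[0]:
                continue  # different chromosomes never overlap
            n = overlap_bp(a[1], a[2], b[1], b[2])
            if n > 0:
                out.append(a + b + (n,))
    return out


# One upstream flank and two DML (chromosome, start, end, methylation difference)
flanks = [("NC_035780.1", 8832968, 8833968)]
dml = [
    ("NC_035780.1", 8833124, 8833126, 60),   # inside the flank
    ("NC_035781.1", 7626510, 7626512, -56),  # different chromosome
]

rows = intersect_wo(flanks, dml)
for row in rows:
    print("\t".join(str(field) for field in row))
```

The last column is the overlap in base pairs, which is why every DML row in the output files ends in 2: each DML is a 2 bp CG dinucleotide that sits entirely inside the flank.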

#### DML

In [19]:
! {bedtoolsDirectory}intersectBed \
-wo \
-a 2019-05-29-Flanking-Analysis/2019-05-29-mRNA-Upstream-Flanks.bed \
-b {DMLlist} \
> 2019-05-29-Flanking-Analysis/2019-05-29-mRNA-1000bp-UpstreamFlanks-DML.txt

In [20]:
!head 2019-05-29-Flanking-Analysis/2019-05-29-mRNA-1000bp-UpstreamFlanks-DML.txt

NC_035780.1	8832968	8833968	NC_035780.1	8833124	8833126	60	2
NC_035780.1	55484028	55485028	NC_035780.1	55484705	55484707	-52	2
NC_035780.1	55484028	55485028	NC_035780.1	55484705	55484707	-52	2
NC_035780.1	58134923	58135923	NC_035780.1	58135767	58135769	74	2
NC_035781.1	7626060	7627060	NC_035781.1	7626510	7626512	-56	2
NC_035781.1	7626070	7627070	NC_035781.1	7626510	7626512	-56	2
NC_035781.1	30789563	30790563	NC_035781.1	30789623	30789625	-57	2
NC_035781.1	31149663	31150663	NC_035781.1	31150010	31150012	53	2
NC_035782.1	4729317	4730317	NC_035782.1	4729348	4729350	55	2
NC_035782.1	4729322	4730322	NC_035782.1	4729348	4729350	55	2


In [21]:
!wc -l 2019-05-29-Flanking-Analysis/2019-05-29-mRNA-1000bp-UpstreamFlanks-DML.txt

      67 2019-05-29-Flanking-Analysis/2019-05-29-mRNA-1000bp-UpstreamFlanks-DML.txt


In [22]:
! {bedtoolsDirectory}intersectBed \
-wo \
-a 2019-05-29-Flanking-Analysis/2019-05-29-mRNA-Downstream-Flanks.bed \
-b {DMLlist} \
> 2019-05-29-Flanking-Analysis/2019-05-29-mRNA-1000bp-DownstreamFlanks-DML.txt

In [23]:
!head 2019-05-29-Flanking-Analysis/2019-05-29-mRNA-1000bp-DownstreamFlanks-DML.txt

NC_035780.1	1881983	1882983	NC_035780.1	1882691	1882693	64	2
NC_035781.1	20125253	20126253	NC_035781.1	20126029	20126031	-52	2
NC_035781.1	28992058	28993058	NC_035781.1	28992818	28992820	52	2
NC_035781.1	30061881	30062881	NC_035781.1	30062222	30062224	60	2
NC_035781.1	31149350	31150350	NC_035781.1	31150010	31150012	53	2
NC_035781.1	31149350	31150350	NC_035781.1	31150010	31150012	53	2
NC_035781.1	31149350	31150350	NC_035781.1	31150010	31150012	53	2
NC_035781.1	31149350	31150350	NC_035781.1	31150010	31150012	53	2
NC_035781.1	31149350	31150350	NC_035781.1	31150010	31150012	53	2
NC_035781.1	31149350	31150350	NC_035781.1	31150010	31150012	53	2


In [24]:
!wc -l 2019-05-29-Flanking-Analysis/2019-05-29-mRNA-1000bp-DownstreamFlanks-DML.txt

      49 2019-05-29-Flanking-Analysis/2019-05-29-mRNA-1000bp-DownstreamFlanks-DML.txt


#### Hypermethylated DML

In [25]:
! {bedtoolsDirectory}intersectBed \
-wo \
-a 2019-05-29-Flanking-Analysis/2019-05-29-mRNA-Upstream-Flanks.bed \
-b {hyperDML} \
> 2019-05-29-Flanking-Analysis/2019-05-29-mRNA-1000bp-UpstreamFlanks-Hypermethylated-DML.txt

In [26]:
!head 2019-05-29-Flanking-Analysis/2019-05-29-mRNA-1000bp-UpstreamFlanks-Hypermethylated-DML.txt

NC_035780.1	8832968	8833968	NC_035780.1	8833124	8833126	60	2
NC_035780.1	58134923	58135923	NC_035780.1	58135767	58135769	74	2
NC_035781.1	31149663	31150663	NC_035781.1	31150010	31150012	53	2
NC_035782.1	4729317	4730317	NC_035782.1	4729348	4729350	55	2
NC_035782.1	4729322	4730322	NC_035782.1	4729348	4729350	55	2
NC_035783.1	19545403	19546403	NC_035783.1	19545473	19545475	50	2
NC_035783.1	45068784	45069784	NC_035783.1	45068802	45068804	52	2
NC_035783.1	57930863	57931863	NC_035783.1	57931740	57931742	50	2
NC_035784.1	26799941	26800941	NC_035784.1	26800339	26800341	66	2
NC_035784.1	57514330	57515330	NC_035784.1	57514692	57514694	53	2


In [27]:
!wc -l 2019-05-29-Flanking-Analysis/2019-05-29-mRNA-1000bp-UpstreamFlanks-Hypermethylated-DML.txt

      44 2019-05-29-Flanking-Analysis/2019-05-29-mRNA-1000bp-UpstreamFlanks-Hypermethylated-DML.txt


In [28]:
! {bedtoolsDirectory}intersectBed \
-wo \
-a 2019-05-29-Flanking-Analysis/2019-05-29-mRNA-Downstream-Flanks.bed \
-b {hyperDML} \
> 2019-05-29-Flanking-Analysis/2019-05-29-mRNA-1000bp-DownstreamFlanks-Hypermethylated-DML.txt

In [29]:
!head 2019-05-29-Flanking-Analysis/2019-05-29-mRNA-1000bp-DownstreamFlanks-Hypermethylated-DML.txt

NC_035780.1	1881983	1882983	NC_035780.1	1882691	1882693	64	2
NC_035781.1	28992058	28993058	NC_035781.1	28992818	28992820	52	2
NC_035781.1	30061881	30062881	NC_035781.1	30062222	30062224	60	2
NC_035781.1	31149350	31150350	NC_035781.1	31150010	31150012	53	2
NC_035781.1	31149350	31150350	NC_035781.1	31150010	31150012	53	2
NC_035781.1	31149350	31150350	NC_035781.1	31150010	31150012	53	2
NC_035781.1	31149350	31150350	NC_035781.1	31150010	31150012	53	2
NC_035781.1	31149350	31150350	NC_035781.1	31150010	31150012	53	2
NC_035781.1	31149350	31150350	NC_035781.1	31150010	31150012	53	2
NC_035781.1	49000124	49001124	NC_035781.1	49000882	49000884	61	2


In [31]:
!wc -l 2019-05-29-Flanking-Analysis/2019-05-29-mRNA-1000bp-DownstreamFlanks-Hypermethylated-DML.txt

      34 2019-05-29-Flanking-Analysis/2019-05-29-mRNA-1000bp-DownstreamFlanks-Hypermethylated-DML.txt


#### Hypomethylated DML

In [32]:
! {bedtoolsDirectory}intersectBed \
-wo \
-a 2019-05-29-Flanking-Analysis/2019-05-29-mRNA-Upstream-Flanks.bed \
-b {hypoDML} \
> 2019-05-29-Flanking-Analysis/2019-05-29-mRNA-1000bp-UpstreamFlanks-Hypomethylated-DML.txt

In [33]:
!head 2019-05-29-Flanking-Analysis/2019-05-29-mRNA-1000bp-UpstreamFlanks-Hypomethylated-DML.txt

NC_035780.1	55484028	55485028	NC_035780.1	55484705	55484707	-52	2
NC_035780.1	55484028	55485028	NC_035780.1	55484705	55484707	-52	2
NC_035781.1	7626060	7627060	NC_035781.1	7626510	7626512	-56	2
NC_035781.1	7626070	7627070	NC_035781.1	7626510	7626512	-56	2
NC_035781.1	30789563	30790563	NC_035781.1	30789623	30789625	-57	2
NC_035782.1	6684983	6685983	NC_035782.1	6685343	6685345	-68	2
NC_035782.1	6684983	6685983	NC_035782.1	6685349	6685351	-50	2
NC_035782.1	6897189	6898189	NC_035782.1	6897406	6897408	-53	2
NC_035782.1	72204997	72205997	NC_035782.1	72205396	72205398	-55	2
NC_035782.1	72204997	72205997	NC_035782.1	72205396	72205398	-55	2


In [34]:
!wc -l 2019-05-29-Flanking-Analysis/2019-05-29-mRNA-1000bp-UpstreamFlanks-Hypomethylated-DML.txt

      23 2019-05-29-Flanking-Analysis/2019-05-29-mRNA-1000bp-UpstreamFlanks-Hypomethylated-DML.txt


In [35]:
! {bedtoolsDirectory}intersectBed \
-wo \
-a 2019-05-29-Flanking-Analysis/2019-05-29-mRNA-Downstream-Flanks.bed \
-b {hypoDML} \
> 2019-05-29-Flanking-Analysis/2019-05-29-mRNA-1000bp-DownstreamFlanks-Hypomethylated-DML.txt

In [36]:
!head 2019-05-29-Flanking-Analysis/2019-05-29-mRNA-1000bp-DownstreamFlanks-Hypomethylated-DML.txt

NC_035781.1	20125253	20126253	NC_035781.1	20126029	20126031	-52	2
NC_035784.1	1061925	1062925	NC_035784.1	1062719	1062721	-51	2
NC_035784.1	1061925	1062925	NC_035784.1	1062719	1062721	-51	2
NC_035784.1	1061925	1062925	NC_035784.1	1062719	1062721	-51	2
NC_035784.1	1061925	1062925	NC_035784.1	1062719	1062721	-51	2
NC_035784.1	1061925	1062925	NC_035784.1	1062719	1062721	-51	2
NC_035784.1	1061925	1062925	NC_035784.1	1062719	1062721	-51	2
NC_035784.1	1061925	1062925	NC_035784.1	1062719	1062721	-51	2
NC_035784.1	1061925	1062925	NC_035784.1	1062719	1062721	-51	2
NC_035784.1	1061925	1062925	NC_035784.1	1062719	1062721	-51	2


In [37]:
!wc -l 2019-05-29-Flanking-Analysis/2019-05-29-mRNA-1000bp-DownstreamFlanks-Hypomethylated-DML.txt

      15 2019-05-29-Flanking-Analysis/2019-05-29-mRNA-1000bp-DownstreamFlanks-Hypomethylated-DML.txt


#### DMR

In [74]:
! {bedtoolsDirectory}intersectBed \
-wb \
-a 2019-05-29-Flanking-Analysis/2019-05-29-mRNA-Upstream-Flanks.bed \
-b {DMRlist} \
> 2019-05-29-Flanking-Analysis/2019-06-05-mRNA-100bp-UpstreamFlanks-DMR.txt

In [75]:
!head 2019-05-29-Flanking-Analysis/2019-06-05-mRNA-100bp-UpstreamFlanks-DMR.txt

NC_035781.1	7626500	7626600	NC_035781.1	7626500	7626600	DMR	-56
NC_035781.1	7626500	7626600	NC_035781.1	7626500	7626600	DMR	-56
NC_035781.1	30789600	30789700	NC_035781.1	30789600	30789700	DMR	-57
NC_035784.1	86309481	86309500	NC_035784.1	86309400	86309500	DMR	-50
NC_035784.1	92841500	92841546	NC_035784.1	92841500	92841600	DMR	-52
NC_035787.1	47281000	47281100	NC_035787.1	47281000	47281100	DMR	50
NC_035787.1	55630800	55630900	NC_035787.1	55630800	55630900	DMR	-51
NC_035788.1	56054548	56054600	NC_035788.1	56054500	56054600	DMR	-51


In [83]:
!wc -l 2019-05-29-Flanking-Analysis/2019-06-05-mRNA-100bp-UpstreamFlanks-DMR.txt

       8 2019-05-29-Flanking-Analysis/2019-06-05-mRNA-100bp-UpstreamFlanks-DMR.txt


In [77]:
! {bedtoolsDirectory}intersectBed \
-wb \
-a 2019-05-29-Flanking-Analysis/2019-05-29-mRNA-Downstream-Flanks.bed \
-b {DMRlist} \
> 2019-05-29-Flanking-Analysis/2019-06-05-mRNA-100bp-DownstreamFlanks-DMR.txt

In [78]:
!head 2019-05-29-Flanking-Analysis/2019-06-05-mRNA-100bp-DownstreamFlanks-DMR.txt

NC_035780.1	2538955	2539000	NC_035780.1	2538900	2539000	DMR	-50
NC_035781.1	20126000	20126100	NC_035781.1	20126000	20126100	DMR	-52
NC_035785.1	31238800	31238900	NC_035785.1	31238800	31238900	DMR	59


In [79]:
!wc -l 2019-05-29-Flanking-Analysis/2019-06-05-mRNA-100bp-DownstreamFlanks-DMR.txt

       3 2019-05-29-Flanking-Analysis/2019-06-05-mRNA-100bp-DownstreamFlanks-DMR.txt


#### Hypermethylated DMR

In [80]:
! {bedtoolsDirectory}intersectBed \
-wb \
-a 2019-05-29-Flanking-Analysis/2019-05-29-mRNA-Upstream-Flanks.bed \
-b {hyperDMR} \
> 2019-05-29-Flanking-Analysis/2019-06-05-mRNA-100bp-UpstreamFlanks-HyperDMR.txt

In [81]:
!head 2019-05-29-Flanking-Analysis/2019-06-05-mRNA-100bp-UpstreamFlanks-HyperDMR.txt

NC_035787.1	47281000	47281101	NC_035787.1	47281000	47281101	50


In [84]:
!wc -l 2019-05-29-Flanking-Analysis/2019-06-05-mRNA-100bp-UpstreamFlanks-HyperDMR.txt

       1 2019-05-29-Flanking-Analysis/2019-06-05-mRNA-100bp-UpstreamFlanks-HyperDMR.txt


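Note: the downstream cell below passes `{DMRlist}` to `-b` rather than `{hyperDMR}`, so its output repeats the all-DMR downstream result (including hypomethylated regions). It should be rerun with `-b {hyperDMR}` to restrict the overlaps to hypermethylated DMR.

In [85]: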
! {bedtoolsDirectory}intersectBed \
-wb \
-a 2019-05-29-Flanking-Analysis/2019-05-29-mRNA-Downstream-Flanks.bed \
-b {DMRlist} \
> 2019-05-29-Flanking-Analysis/2019-06-05-mRNA-100bp-DownstreamFlanks-HyperDMR.txt

In [86]:
!head 2019-05-29-Flanking-Analysis/2019-06-05-mRNA-100bp-DownstreamFlanks-HyperDMR.txt

NC_035780.1	2538955	2539000	NC_035780.1	2538900	2539000	DMR	-50
NC_035781.1	20126000	20126100	NC_035781.1	20126000	20126100	DMR	-52
NC_035785.1	31238800	31238900	NC_035785.1	31238800	31238900	DMR	59


In [87]:
!wc -l 2019-05-29-Flanking-Analysis/2019-06-05-mRNA-100bp-DownstreamFlanks-HyperDMR.txt

       3 2019-05-29-Flanking-Analysis/2019-06-05-mRNA-100bp-DownstreamFlanks-HyperDMR.txt


#### Hypomethylated DMR

In [88]:
! {bedtoolsDirectory}intersectBed \
-wb \
-a 2019-05-29-Flanking-Analysis/2019-05-29-mRNA-Upstream-Flanks.bed \
-b {hypoDMR} \
> 2019-05-29-Flanking-Analysis/2019-06-05-mRNA-100bp-UpstreamFlanks-HypoDMR.txt

In [89]:
!head 2019-05-29-Flanking-Analysis/2019-06-05-mRNA-100bp-UpstreamFlanks-HypoDMR.txt

NC_035781.1	7626500	7626601	NC_035781.1	7626500	7626601	-56
NC_035781.1	7626500	7626601	NC_035781.1	7626500	7626601	-56
NC_035781.1	30789600	30789701	NC_035781.1	30789600	30789701	-57
NC_035784.1	86309481	86309501	NC_035784.1	86309400	86309501	-50
NC_035784.1	92841500	92841546	NC_035784.1	92841500	92841601	-52
NC_035787.1	55630800	55630901	NC_035787.1	55630800	55630901	-51
NC_035788.1	56054548	56054601	NC_035788.1	56054500	56054601	-51


In [90]:
!wc -l 2019-05-29-Flanking-Analysis/2019-06-05-mRNA-100bp-UpstreamFlanks-HypoDMR.txt

       7 2019-05-29-Flanking-Analysis/2019-06-05-mRNA-100bp-UpstreamFlanks-HypoDMR.txt


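Note: the downstream cell below passes `{DMRlist}` to `-b` rather than `{hypoDMR}`, so its output repeats the all-DMR downstream result (including the hypermethylated region). It should be rerun with `-b {hypoDMR}` to restrict the overlaps to hypomethylated DMR.

In [91]: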
! {bedtoolsDirectory}intersectBed \
-wb \
-a 2019-05-29-Flanking-Analysis/2019-05-29-mRNA-Downstream-Flanks.bed \
-b {DMRlist} \
> 2019-05-29-Flanking-Analysis/2019-06-05-mRNA-100bp-DownstreamFlanks-HypoDMR.txt

In [92]:
!head 2019-05-29-Flanking-Analysis/2019-06-05-mRNA-100bp-DownstreamFlanks-HypoDMR.txt

NC_035780.1	2538955	2539000	NC_035780.1	2538900	2539000	DMR	-50
NC_035781.1	20126000	20126100	NC_035781.1	20126000	20126100	DMR	-52
NC_035785.1	31238800	31238900	NC_035785.1	31238800	31238900	DMR	59


In [93]:
!wc -l 2019-05-29-Flanking-Analysis/2019-06-05-mRNA-100bp-DownstreamFlanks-HypoDMR.txt

       3 2019-05-29-Flanking-Analysis/2019-06-05-mRNA-100bp-DownstreamFlanks-HypoDMR.txt


#### DMR Background

In [47]:
! {bedtoolsDirectory}intersectBed \
-wb \
-a 2019-05-29-Flanking-Analysis/2019-05-29-mRNA-Upstream-Flanks.bed \
-b {DMRBackground} \
> 2019-05-29-Flanking-Analysis/2019-06-05-mRNA-100bp-UpstreamFlanks-DMRBackground.txt

In [48]:
!head 2019-05-29-Flanking-Analysis/2019-06-05-mRNA-100bp-UpstreamFlanks-DMRBackground.txt

NC_035780.1	257108	257200	NC_035780.1	257101	257200	*
NC_035780.1	257201	257300	NC_035780.1	257201	257300	*
NC_035780.1	257401	257500	NC_035780.1	257401	257500	*
NC_035780.1	257501	257600	NC_035780.1	257501	257600	*
NC_035780.1	257701	257800	NC_035780.1	257701	257800	*
NC_035780.1	257801	257900	NC_035780.1	257801	257900	*
NC_035780.1	257901	258000	NC_035780.1	257901	258000	*
NC_035780.1	258001	258100	NC_035780.1	258001	258100	*
NC_035780.1	258101	258108	NC_035780.1	258101	258200	*
NC_035780.1	260478	260500	NC_035780.1	260401	260500	*


In [49]:
!wc -l 2019-05-29-Flanking-Analysis/2019-06-05-mRNA-100bp-UpstreamFlanks-DMRBackground.txt

    8238 2019-05-29-Flanking-Analysis/2019-06-05-mRNA-100bp-UpstreamFlanks-DMRBackground.txt


In [50]:
! {bedtoolsDirectory}intersectBed \
-wb \
-a 2019-05-29-Flanking-Analysis/2019-05-29-mRNA-Downstream-Flanks.bed \
-b {DMRBackground} \
> 2019-05-29-Flanking-Analysis/2019-06-05-mRNA-100bp-DownstreamFlanks-DMRBackground.txt

In [51]:
!head 2019-05-29-Flanking-Analysis/2019-06-05-mRNA-100bp-DownstreamFlanks-DMRBackground.txt

NC_035780.1	273501	273600	NC_035780.1	273501	273600	*
NC_035780.1	273601	273700	NC_035780.1	273601	273700	*
NC_035780.1	273701	273800	NC_035780.1	273701	273800	*
NC_035780.1	273501	273600	NC_035780.1	273501	273600	*
NC_035780.1	273601	273700	NC_035780.1	273601	273700	*
NC_035780.1	273701	273800	NC_035780.1	273701	273800	*
NC_035780.1	273501	273600	NC_035780.1	273501	273600	*
NC_035780.1	273601	273700	NC_035780.1	273601	273700	*
NC_035780.1	273701	273800	NC_035780.1	273701	273800	*
NC_035780.1	273501	273600	NC_035780.1	273501	273600	*


In [52]:
!wc -l 2019-05-29-Flanking-Analysis/2019-06-05-mRNA-100bp-DownstreamFlanks-DMRBackground.txt

    7515 2019-05-29-Flanking-Analysis/2019-06-05-mRNA-100bp-DownstreamFlanks-DMRBackground.txt


#### CG motifs

In [53]:
!{bedtoolsDirectory}intersectBed \
-wo \
-a 2019-05-29-Flanking-Analysis/2019-05-29-mRNA-Upstream-Flanks.bed \
-b {CGMotifList} \
> 2019-05-29-Flanking-Analysis/2019-05-29-mRNA-1000bp-UpstreamFlanks-CGmotif.txt

In [54]:
!head 2019-05-29-Flanking-Analysis/2019-05-29-mRNA-1000bp-UpstreamFlanks-CGmotif.txt

NC_035780.1	27961	28961	NC_035780.1	27969	27971	CG_motif	2
NC_035780.1	27961	28961	NC_035780.1	27979	27981	CG_motif	2
NC_035780.1	27961	28961	NC_035780.1	28081	28083	CG_motif	2
NC_035780.1	27961	28961	NC_035780.1	28130	28132	CG_motif	2
NC_035780.1	27961	28961	NC_035780.1	28147	28149	CG_motif	2
NC_035780.1	27961	28961	NC_035780.1	28169	28171	CG_motif	2
NC_035780.1	27961	28961	NC_035780.1	28209	28211	CG_motif	2
NC_035780.1	27961	28961	NC_035780.1	28211	28213	CG_motif	2
NC_035780.1	27961	28961	NC_035780.1	28228	28230	CG_motif	2
NC_035780.1	27961	28961	NC_035780.1	28308	28310	CG_motif	2


In [55]:
!wc -l 2019-05-29-Flanking-Analysis/2019-05-29-mRNA-1000bp-UpstreamFlanks-CGmotif.txt

 1287046 2019-05-29-Flanking-Analysis/2019-05-29-mRNA-1000bp-UpstreamFlanks-CGmotif.txt


In [56]:
!{bedtoolsDirectory}intersectBed \
-wo \
-a 2019-05-29-Flanking-Analysis/2019-05-29-mRNA-Downstream-Flanks.bed \
-b {CGMotifList} \
> 2019-05-29-Flanking-Analysis/2019-05-29-mRNA-1000bp-DownstreamFlanks-CGmotif.txt

In [57]:
!head 2019-05-29-Flanking-Analysis/2019-05-29-mRNA-1000bp-DownstreamFlanks-CGmotif.txt

NC_035780.1	33324	34324	NC_035780.1	33407	33409	CG_motif	2
NC_035780.1	33324	34324	NC_035780.1	33451	33453	CG_motif	2
NC_035780.1	33324	34324	NC_035780.1	33480	33482	CG_motif	2
NC_035780.1	33324	34324	NC_035780.1	33637	33639	CG_motif	2
NC_035780.1	33324	34324	NC_035780.1	33646	33648	CG_motif	2
NC_035780.1	33324	34324	NC_035780.1	33783	33785	CG_motif	2
NC_035780.1	33324	34324	NC_035780.1	33796	33798	CG_motif	2
NC_035780.1	33324	34324	NC_035780.1	34283	34285	CG_motif	2
NC_035780.1	33324	34324	NC_035780.1	34310	34312	CG_motif	2
NC_035780.1	33324	34324	NC_035780.1	34321	34323	CG_motif	2


In [58]:
!wc -l 2019-05-29-Flanking-Analysis/2019-05-29-mRNA-1000bp-DownstreamFlanks-CGmotif.txt

 1285232 2019-05-29-Flanking-Analysis/2019-05-29-mRNA-1000bp-DownstreamFlanks-CGmotif.txt


### 4b. No overlaps

I also want to count the number of DML or DMR that do not overlap with any features (i.e. DML and DMR in unannotated intergenic regions). To do this, I'll use the `-v` argument in `intersectBed`, which reports "those entries in A that have no overlap in B." I can specify multiple files with `-b`, in which case an entry is only reported if it overlaps none of them. I'll use exons, introns, transposable elements identified using all species, and putative promoter regions (upstream flanks).
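
The logic of `-v` with several `-b` files can be sketched as follows. This is a toy Python version: the function names are invented, and the exon and promoter coordinates are hypothetical values chosen so that one DML is kept and one is dropped.

```python
def overlaps(a, b):
    """True if two BED-style rows share a chromosome and at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def no_overlap(a_rows, *b_tracks):
    """Keep A rows that overlap nothing in any of the B tracks (like -v)."""
    return [
        a for a in a_rows
        if not any(overlaps(a, b) for track in b_tracks for b in track)
    ]


# Two DML (chromosome, start, end, methylation difference)
dml = [
    ("NC_035781.1", 20620123, 20620125, 57),
    ("NC_035780.1", 8833124, 8833126, 60),
]

# Hypothetical feature tracks, for illustration only
exons = [("NC_035781.1", 100, 200)]
promoters = [("NC_035780.1", 8832968, 8833968)]

intergenic = no_overlap(dml, exons, promoters)
print(intergenic)
```

Only the first DML survives, because the second one falls inside the promoter track. A DML has to miss every track to be reported.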

#### DML

In [147]:
! {bedtoolsDirectory}intersectBed \
-v \
-a {DMLlist} \
-b {exonList} {intronList} {transposableElementsAll} 2019-05-29-Flanking-Analysis/2019-05-29-mRNA-Upstream-Flanks.bed  \
| wc -l
!echo "DML do not overlap with exons, introns, transposable elements (all), or putative promoters"

      15
DML do not overlap with exons, introns, transposable elements (all), or putative promoters


In [148]:
! {bedtoolsDirectory}intersectBed \
-v \
-a {DMLlist} \
-b {exonList} {intronList} {transposableElementsAll} 2019-05-29-Flanking-Analysis/2019-05-29-mRNA-Upstream-Flanks.bed \
> 2019-05-29-No-Overlap-DML.txt

In [149]:
!head 2019-05-29-No-Overlap-DML.txt

NC_035781.1	20620123	20620125	57
NC_035781.1	30062222	30062224	60
NC_035781.1	39583208	39583210	-50
NC_035781.1	50711254	50711256	-71
NC_035782.1	58675230	58675232	52
NC_035782.1	65377028	65377030	51
NC_035784.1	2011997	2011999	-60
NC_035784.1	45667412	45667414	56
NC_035784.1	53515949	53515951	50
NC_035784.1	81666532	81666534	-65


#### Hypermethylated DML

In [150]:
! {bedtoolsDirectory}intersectBed \
-v \
-a {hyperDML} \
-b {exonList} {intronList} {transposableElementsAll} 2019-05-29-Flanking-Analysis/2019-05-29-mRNA-Upstream-Flanks.bed  \
| wc -l
!echo "hypermethylated DML do not overlap with exons, introns, transposable elements (all), or putative promoters"

       9
hypermethylated DML do not overlap with exons, introns, transposable elements (all), or putative promoters


In [151]:
! {bedtoolsDirectory}intersectBed \
-v \
-a {hyperDML} \
-b {exonList} {intronList} {transposableElementsAll} 2019-05-29-Flanking-Analysis/2019-05-29-mRNA-Upstream-Flanks.bed \
> 2019-05-29-No-Overlap-Hypermethylated-DML.txt

In [152]:
!head 2019-05-29-No-Overlap-Hypermethylated-DML.txt

NC_035781.1	20620123	20620125	57
NC_035781.1	30062222	30062224	60
NC_035782.1	58675230	58675232	52
NC_035782.1	65377028	65377030	51
NC_035784.1	45667412	45667414	56
NC_035784.1	53515949	53515951	50
NC_035785.1	31238802	31238804	59
NC_035787.1	42603398	42603400	57
NC_035787.1	44016221	44016223	70


#### Hypomethylated DML

In [154]:
! {bedtoolsDirectory}intersectBed \
-v \
-a {hypoDML} \
-b {exonList} {intronList} {transposableElementsAll} 2019-05-29-Flanking-Analysis/2019-05-29-mRNA-Upstream-Flanks.bed  \
| wc -l
!echo "hypomethylated DML do not overlap with exons, introns, transposable elements (all), or putative promoters"

       6
hypomethylated DML do not overlap with exons, introns, transposable elements (all), or putative promoters


In [155]:
! {bedtoolsDirectory}intersectBed \
-v \
-a {hypoDML} \
-b {exonList} {intronList} {transposableElementsAll} 2019-05-29-Flanking-Analysis/2019-05-29-mRNA-Upstream-Flanks.bed \
> 2019-05-29-No-Overlap-Hypomethylated-DML.txt

In [156]:
!head 2019-05-29-No-Overlap-Hypomethylated-DML.txt

NC_035781.1	39583208	39583210	-50
NC_035781.1	50711254	50711256	-71
NC_035784.1	2011997	2011999	-60
NC_035784.1	81666532	81666534	-65
NC_035787.1	42755937	42755939	-54
NC_035788.1	78353418	78353420	-75


#### DMR

In [94]:
! {bedtoolsDirectory}intersectBed \
-v \
-a {DMRlist} \
-b {exonList} {intronList} {transposableElementsAll} 2019-05-29-Flanking-Analysis/2019-05-29-mRNA-Upstream-Flanks.bed  \
| wc -l
!echo "DMR do not overlap with exons, introns, transposable elements (all), or putative promoters"

       2
DMR do not overlap with exons, introns, transposable elements (all), or putative promoters


In [95]:
! {bedtoolsDirectory}intersectBed \
-v \
-a {DMRlist} \
-b {exonList} {intronList} {transposableElementsAll} 2019-05-29-Flanking-Analysis/2019-05-29-mRNA-Upstream-Flanks.bed \
> 2019-06-05-No-Overlap-DMR.txt

In [96]:
!head 2019-06-05-No-Overlap-DMR.txt

NC_035782.1	65377000	65377100	DMR	51
NC_035785.1	31238800	31238900	DMR	59


#### Hypermethylated DMR

In [97]:
! {bedtoolsDirectory}intersectBed \
-v \
-a {hyperDMR} \
-b {exonList} {intronList} {transposableElementsAll} 2019-05-29-Flanking-Analysis/2019-05-29-mRNA-Upstream-Flanks.bed  \
| wc -l
!echo "hyperDMR do not overlap with exons, introns, transposable elements (all), or putative promoters"

       2
hyperDMR do not overlap with exons, introns, transposable elements (all), or putative promoters


In [98]:
! {bedtoolsDirectory}intersectBed \
-v \
-a {hyperDMR} \
-b {exonList} {intronList} {transposableElementsAll} 2019-05-29-Flanking-Analysis/2019-05-29-mRNA-Upstream-Flanks.bed \
> 2019-06-05-No-Overlap-HyperDMR.txt

In [99]:
!head 2019-06-05-No-Overlap-HyperDMR.txt

NC_035782.1	65377000	65377101	51
NC_035785.1	31238800	31238901	59


#### Hypomethylated DMR

In [100]:
! {bedtoolsDirectory}intersectBed \
-v \
-a {hypoDMR} \
-b {exonList} {intronList} {transposableElementsAll} 2019-05-29-Flanking-Analysis/2019-05-29-mRNA-Upstream-Flanks.bed  \
| wc -l
!echo "hypoDMR do not overlap with exons, introns, transposable elements (all), or putative promoters"

       0
hypoDMR do not overlap with exons, introns, transposable elements (all), or putative promoters


#### DMR Background

In [53]:
! {bedtoolsDirectory}intersectBed \
-v \
-a {DMRBackground} \
-b {exonList} {intronList} {transposableElementsAll} 2019-05-29-Flanking-Analysis/2019-05-29-mRNA-Upstream-Flanks.bed  \
| wc -l
!echo "DMR background regions do not overlap with exons, introns, transposable elements (all), or putative promoters"

    4649
DMR background regions do not overlap with exons, introns, transposable elements (all), or putative promoters


In [54]:
! {bedtoolsDirectory}intersectBed \
-v \
-a {DMRBackground} \
-b {exonList} {intronList} {transposableElementsAll} 2019-05-29-Flanking-Analysis/2019-05-29-mRNA-Upstream-Flanks.bed \
> 2019-06-05-No-Overlap-DMRBackground.txt

In [55]:
!head 2019-06-05-No-Overlap-DMRBackground.txt

NC_007175.2	101	200	*
NC_007175.2	1501	1600	*
NC_007175.2	3301	3400	*
NC_007175.2	4801	4900	*
NC_007175.2	6001	6100	*
NC_007175.2	7201	7300	*
NC_007175.2	11601	11700	*
NC_007175.2	11701	11800	*
NC_007175.2	12501	12600	*
NC_007175.2	14101	14200	*


#### CG motifs

In [157]:
! {bedtoolsDirectory}intersectBed \
-v \
-a {CGMotifList} \
-b {exonList} {intronList} {transposableElementsAll} 2019-05-29-Flanking-Analysis/2019-05-29-mRNA-Upstream-Flanks.bed  \
| wc -l
!echo "CG motifs do not overlap with exons, introns, transposable elements (all), or putative promoters"

 4528757
CG motifs do not overlap with exons, introns, transposable elements (all), or putative promoters


In [158]:
! {bedtoolsDirectory}intersectBed \
-v \
-a {CGMotifList} \
-b {exonList} {intronList} {transposableElementsAll} 2019-05-29-Flanking-Analysis/2019-05-29-mRNA-Upstream-Flanks.bed \
> 2019-05-29-No-Overlap-CGmotifs.txt

In [159]:
!head 2019-05-29-No-Overlap-CGmotifs.txt

NC_035780.1	28	30	CG_motif
NC_035780.1	54	56	CG_motif
NC_035780.1	75	77	CG_motif
NC_035780.1	93	95	CG_motif
NC_035780.1	103	105	CG_motif
NC_035780.1	116	118	CG_motif
NC_035780.1	134	136	CG_motif
NC_035780.1	159	161	CG_motif
NC_035780.1	209	211	CG_motif
NC_035780.1	224	226	CG_motif


### 4c. `closest`

[`bedtools closest`](https://bedtools.readthedocs.io/en/latest/content/tools/closest.html) finds the gene nearest to each DML, DMR, or CG motif. If the nearest gene overlaps the feature, the reported distance is zero. Otherwise, I'll get the distance to that gene. I will use the following code:

1. Path to `closestBed`
2. -a: Specify either DML, DMR, or CG motif file.
3. -b: Path to gene list
4. -t all: In case of a tie, report all matches
5. -D ref: Report the distance from each A feature to its closest B feature in an extra column. Distances are signed with respect to the reference genome: B features with a lower (start, stop) than A are upstream and are reported with negative distances.
6. ">" filename: Redirect output to a .txt file
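To make the sign convention concrete, here is a minimal Python sketch of the distance I expect `-D ref` to report for one A feature and one B feature on the same chromosome. The function name `signed_distance` is hypothetical, and the `+ 1` offset for non-overlapping features is an assumption: it reproduces the distances in the CG motif output in this section, but it is not taken from the bedtools source.

```python
def signed_distance(a_start, a_end, b_start, b_end):
    """Signed distance from feature A to feature B on the same chromosome.

    Coordinates are BED-style (0-based start, exclusive end).
    """
    # Overlapping features are reported with a distance of zero
    if a_start < b_end and b_start < a_end:
        return 0
    # B lies downstream of A on the reference: positive distance
    if b_start >= a_end:
        return b_start - a_end + 1  # "+ 1" is an assumed convention
    # B lies upstream of A on the reference: negative distance
    return -(a_start - b_end + 1)


# CG motif at 28-30 and the gene at 13578-14594
print(signed_distance(28, 30, 13578, 14594))
# DML at 401630-401632 inside the gene at 394983-409280
print(signed_distance(401630, 401632, 394983, 409280))
```

The sketch ignores strand, which matches `-D ref`: upstream and downstream are defined by reference coordinates, not by the orientation of the gene.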

In [15]:
! {bedtoolsDirectory}closestBed \
-a {DMLlist} \
-b {geneList} \
-t all \
-D ref \
> 2019-05-29-Flanking-Analysis/2019-05-29-Genes-Closest-NoOverlap-DMLs.txt

In [16]:
!head 2019-05-29-Flanking-Analysis/2019-05-29-Genes-Closest-NoOverlap-DMLs.txt

NC_035780.1	401630	401632	53	NC_035780.1	394983	409280	0
NC_035780.1	571138	571140	58	NC_035780.1	544088	573497	0
NC_035780.1	1882691	1882693	64	NC_035780.1	1882143	1890106	0
NC_035780.1	1885022	1885024	61	NC_035780.1	1882143	1890106	0
NC_035780.1	1933499	1933501	51	NC_035780.1	1928718	1940217	0
NC_035780.1	2538924	2538926	-50	NC_035780.1	2524425	2553408	0
NC_035780.1	2541726	2541728	-54	NC_035780.1	2524425	2553408	0
NC_035780.1	2584492	2584494	56	NC_035780.1	2554181	2599559	0
NC_035780.1	2586508	2586510	-53	NC_035780.1	2554181	2599559	0
NC_035780.1	2589720	2589722	57	NC_035780.1	2554181	2599559	0


In [17]:
! {bedtoolsDirectory}closestBed \
-a {DMRlist} \
-b {geneList} \
-t all \
-D ref \
> 2019-05-29-Flanking-Analysis/2019-06-05-Genes-Closest-NoOverlap-DMRs.txt

In [18]:
!head 2019-05-29-Flanking-Analysis/2019-06-05-Genes-Closest-NoOverlap-DMRs.txt

NC_035780.1	571100	571200	DMR	58	NC_035780.1	544088	573497	0
NC_035780.1	1885000	1885100	DMR	50	NC_035780.1	1882143	1890106	0
NC_035780.1	1933500	1933600	DMR	53	NC_035780.1	1928718	1940217	0
NC_035780.1	2538900	2539000	DMR	-50	NC_035780.1	2524425	2553408	0
NC_035780.1	22276700	22276800	DMR	56	NC_035780.1	22269635	22278631	0
NC_035780.1	28563400	28563500	DMR	61	NC_035780.1	28552157	28576101	0
NC_035780.1	31302900	31303000	DMR	-60	NC_035780.1	31295876	31307973	0
NC_035780.1	35969100	35969200	DMR	-53	NC_035780.1	35960923	35999467	0
NC_035780.1	38236400	38236500	DMR	50	NC_035780.1	38209799	38243110	0
NC_035781.1	5386400	5386500	DMR	51	NC_035781.1	5383711	5397505	0


In [19]:
! {bedtoolsDirectory}closestBed \
-a {CGMotifList} \
-b {geneList} \
-t all \
-D ref \
> 2019-05-29-Flanking-Analysis/2019-05-29-Gene-Closest-NoOverlap-CGmotifs.txt

In [20]:
!head 2019-05-29-Flanking-Analysis/2019-05-29-Gene-Closest-NoOverlap-CGmotifs.txt

NC_035780.1	28	30	CG_motif	NC_035780.1	13578	14594	13549
NC_035780.1	54	56	CG_motif	NC_035780.1	13578	14594	13523
NC_035780.1	75	77	CG_motif	NC_035780.1	13578	14594	13502
NC_035780.1	93	95	CG_motif	NC_035780.1	13578	14594	13484
NC_035780.1	103	105	CG_motif	NC_035780.1	13578	14594	13474
NC_035780.1	116	118	CG_motif	NC_035780.1	13578	14594	13461
NC_035780.1	134	136	CG_motif	NC_035780.1	13578	14594	13443
NC_035780.1	159	161	CG_motif	NC_035780.1	13578	14594	13418
NC_035780.1	209	211	CG_motif	NC_035780.1	13578	14594	13368
NC_035780.1	224	226	CG_motif	NC_035780.1	13578	14594	13353
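
One way to summarize these `closestBed` outputs is to tally the sign of the final distance column. The Python sketch below is illustrative only: `summarize_closest` is a hypothetical helper, and it assumes the distance is always the last tab-delimited column, as in the outputs above. Because `-t all` reports every tie, it counts output lines, not unique features, and it does not treat features without any gene on the same chromosome as a special case.

```python
from collections import Counter


def summarize_closest(lines):
    """Tally closestBed -D ref output lines by the sign of the distance column."""
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank lines
        distance = int(fields[-1])  # assumed: distance is the last column
        if distance == 0:
            counts["overlapping"] += 1
        elif distance < 0:
            counts["gene upstream"] += 1
        else:
            counts["gene downstream"] += 1
    return counts


# Two lines copied from the DML and CG motif outputs above
example = [
    "NC_035780.1\t401630\t401632\t53\tNC_035780.1\t394983\t409280\t0",
    "NC_035780.1\t28\t30\tCG_motif\tNC_035780.1\t13578\t14594\t13549",
]
print(summarize_closest(example))
```

To run it on a full file, pass an open file handle (for example, one of the `.txt` files written to `2019-05-29-Flanking-Analysis/`) in place of `example`.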


## 5. Characterize DML Background

In [7]:
! {bedtoolsDirectory}intersectBed \
-u \
-a {DMLBackground} \
-b {mRNAList} \
| wc -l
!echo "DML background overlaps with mRNA"

  333917
DML background overlaps with mRNA


In [8]:
! {bedtoolsDirectory}intersectBed \
-wb \
-a {DMLBackground} \
-b {mRNAList} \
> 2019-06-20-DMLBackground-mRNA.txt

In [9]:
!head 2019-06-20-DMLBackground-mRNA.txt

NC_035780.1	100558	100559	+	NC_035780.1	99840	106460
NC_035780.1	100575	100576	+	NC_035780.1	99840	106460
NC_035780.1	100581	100582	+	NC_035780.1	99840	106460
NC_035780.1	100634	100635	+	NC_035780.1	99840	106460
NC_035780.1	100643	100644	+	NC_035780.1	99840	106460
NC_035780.1	100651	100652	+	NC_035780.1	99840	106460
NC_035780.1	100664	100665	+	NC_035780.1	99840	106460
NC_035780.1	103268	103269	+	NC_035780.1	99840	106460
NC_035780.1	103272	103273	+	NC_035780.1	99840	106460
NC_035780.1	103283	103284	+	NC_035780.1	99840	106460
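
The `-u` count and the number of lines in the `-wb` file are not guaranteed to match: `-u` reports each A feature once, while `-wb` writes one line per overlapping A and B pair, so a locus that overlaps several mRNA would appear several times. A minimal Python sketch for counting unique loci in the `-wb` output follows. `count_unique_loci` is a hypothetical helper, and the second example line uses made-up mRNA coordinates purely to show a duplicated locus.

```python
def count_unique_loci(lines):
    """Count distinct A features (chromosome, start, end) in intersectBed -wb output."""
    loci = set()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        # The first three columns identify the A feature (the locus)
        loci.add((fields[0], int(fields[1]), int(fields[2])))
    return len(loci)


example = [
    # Copied from the output above
    "NC_035780.1\t100558\t100559\t+\tNC_035780.1\t99840\t106460",
    # Hypothetical second mRNA overlapping the same locus (coordinates invented)
    "NC_035780.1\t100558\t100559\t+\tNC_035780.1\t99000\t107000",
    # Copied from the output above
    "NC_035780.1\t100575\t100576\t+\tNC_035780.1\t99840\t106460",
]
print(count_unique_loci(example))
```

Run on `2019-06-20-DMLBackground-mRNA.txt`, the result should be comparable to the `-u` line count if the assumptions above hold.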
