## Use bedtools to see where DMLs and MACAU loci are located.

DMLs between the Olympia oyster populations, Hood Canal and South Sound, were identified using MethylKit. File is: [analyses/dml25.bed](https://github.com/sr320/paper-oly-mbdbs-gen/blob/master/analyses/dml25.bed) 

MACAU was used to identify loci at which methylation is associated with a phenotype, in our case shell length, while controlling for relatedness. 
- Loci, all samples with 10x coverage: [analyses/macau-all10x.bed](https://github.com/sr320/paper-oly-mbdbs-gen/blob/master/analyses/macau-all10x.bed)
- Loci, any samples with 10x coverage:[analyses/macau-any10x.bed](https://github.com/sr320/paper-oly-mbdbs-gen/blob/master/analyses/macau-any10x.bed)

### Preview DML and MACAU loci bed files 

In [3]:
!head ../analyses/dml25.bed
!wc -l ../analyses/dml25.bed

Contig102998	2220	2222	26
Contig104531	8145	8147	-37
Contig109515	3377	3379	54
Contig1104	15920	15922	29
Contig128059	154	156	-27
Contig129435	3172	3174	-26
Contig1297	49910	49912	25
Contig131260	1798	1800	-29
Contig132309	816	818	28
Contig13829	2520	2522	25
      51 ../analyses/dml25.bed


In [43]:
!head ../analyses/macau/macau-any10x.bed
!wc -l ../analyses/macau/macau-any10x.bed

Contig1054173286	3286	3286
Contig109628199	28199	28199
Contig1102783989	3989	3989
Contig110906429	429	429
Contig110927764	7764	7764
Contig111772433	2433	2433
Contig1118734850	4850	4850
Contig1129975372	5372	5372
Contig11310801	10801	10801
Contig113403874	874	874
     219 ../analyses/macau/macau-any10x.bed


In [44]:
!head ../analyses/macau/macau-all10x.bed
!wc -l ../analyses/macau/macau-all10x.bed

Contig110906429	429	429
Contig120547984	984	984
Contig1665313524	13524	13524
Contig1808410113	10113	10113
Contig186755901	5901	5901
Contig20226526	26526	26526
Contig218272401	401	401
Contig221432429	2429	2429
Contig229987506	7506	7506
Contig229987604	7604	7604
      27 ../analyses/macau/macau-all10x.bed


### Preview feature files 

[Olurida_v081.gene.gff](https://github.com/sr320/paper-oly-mbdbs-gen/blob/master/genome-features/Olurida_v081.gene.gff) - genes    
[Olurida_v081.CDS.gff](https://github.com/sr320/paper-oly-mbdbs-gen/blob/master/genome-features/Olurida_v081.CDS.gff) - Coding regions of genes    
[Olurida_v081.exon.gff](https://github.com/sr320/paper-oly-mbdbs-gen/blob/master/genome-features/Olurida_v081.exon.gff) - Exons   
[Olurida_v081.intron.bed](https://github.com/sr320/paper-oly-mbdbs-gen/blob/master/genome-features/Olurida_v081.intron.bed) - Introns    
[Olurida_v081.mRNA.gff](https://github.com/sr320/paper-oly-mbdbs-gen/blob/master/genome-features/Olurida_v081.mRNA.gff) - mRNA    
[Olurida_v081.three_prime_UTR.gff](https://github.com/sr320/paper-oly-mbdbs-gen/blob/master/genome-features/Olurida_v081.three_prime_UTR.gff) - 3' untranslated regions   
[Olurida_v081.five_prime_UTR.gff](https://github.com/sr320/paper-oly-mbdbs-gen/blob/master/genome-features/Olurida_v081.five_prime_UTR.gff) - 5' untranslated regions   

In [11]:
! bedtools intersect \


Tool:    bedtools intersect (aka intersectBed)
Version: v2.29.0
Summary: Report overlaps between two feature files.

Usage:   bedtools intersect [OPTIONS] -a <bed/gff/vcf/bam> -b <bed/gff/vcf/bam>

	Note: -b may be followed with multiple databases and/or 
	wildcard (*) character(s). 
Options: 
	-wa	Write the original entry in A for each overlap.

	-wb	Write the original entry in B for each overlap.
		- Useful for knowing _what_ A overlaps. Restricted by -f and -r.

	-loj	Perform a "left outer join". That is, for each feature in A
		report each overlap with B.  If no overlaps are found, 
		report a NULL feature for B.

	-wo	Write the original A and B entries plus the number of base
		pairs of overlap between the two features.
		- Overlaps restricted by -f and -r.
		  Only A features with overlap are reported.

	-wao	Write the original A and B entries plus the number of base
		pairs of overlap between the two features.
		- Overlapping features restricted by -f 

Bedtool options to use:  
`-u` - Write the original A entry _once_ if _any_ overlaps found in B, _i.e._ just report the fact >=1 hit was found  
`-a` - File A  
`-b` - File B  

### There are 51 DMLs, which are located ....

In [79]:
# Within any feature track  
! bedtools intersect \
-u \
-a ../analyses/dml25.bed \
-b ../genome-features/O* | wc -l
!echo "of 51 DMLs overlap with any feature track"


      24
of 51 DMLs overlap with any feature track


In [80]:
# Within genes 
! bedtools intersect \
-u \
-a ../analyses/dml25.bed \
-b ../genome-features/Olurida_v081.gene.gff | wc -l
!echo "of 51 DMLs overlap with genes"

      22
of 51 DMLs overlap with genes


In [81]:
# Within coding sequences   
! bedtools intersect \
-u \
-a ../analyses/dml25.bed \
-b ../genome-features/Olurida_v081.CDS.gff | wc -l
!echo "of 51 DMLs overlap with coding sequences"

      19
of 51 DMLs overlap with coding sequences


In [82]:
# Within exons 
! bedtools intersect \
-a ../analyses/dml25.bed \
-b ../genome-features/Olurida_v081.exon.gff | wc -l
!echo "of 51 DMLs overlap with exons"

      20
of 51 DMLs overlap with exons


In [83]:
# Within introns  
! bedtools intersect \
-a ../analyses/dml25.bed \
-b ../genome-features/Olurida_v081.intron.bed | wc -l
!echo "of 51 DMLs overlap with introns"

       3
of 51 DMLs overlap with introns


In [84]:
# Within mRNA   
! bedtools intersect \
-a ../analyses/dml25.bed \
-b ../genome-features/Olurida_v081.mRNA.gff | wc -l
!echo "of 51 DMLs overlap with mRNA"

      23
of 51 DMLs overlap with mRNA


In [85]:
# Within 5' untranslated regions   
! bedtools intersect \
-a ../analyses/dml25.bed \
-b ../genome-features/Olurida_v081.five_prime_UTR.gff | wc -l
!echo "of 51 DML overlaps with 5' UTRs"

       0
of 51 DML overlaps with 5' UTRs


In [86]:
# Within 3' untranslated regions   
! bedtools intersect \
-a ../analyses/dml25.bed \
-b ../genome-features/Olurida_v081.three_prime_UTR.gff | wc -l
!echo "of 51 DMLs overlap with 3' UTRs"

       1
of 51 DMLs overlap with 3' UTRs


In [87]:
# Within TE's
! bedtools intersect \
-a ../analyses/dml25.bed \
-b ../genome-features/Olurida_v081_TE-Cg.gff | wc -l
!echo "of 51 DMLs overlap with Transposable Elements"

       3
of 51 DMLs overlap with Transposable Elements


### MACAU-identified 219 loci with 10x coverage in at least 1 sample ("10x-any"), and 27 loci with 10x coverage in all samples  ("10x-any"). Of those ... 

In [70]:
# Within any feature track  
! bedtools intersect \
-u \
-a ../analyses/macau/macau-any10x.bed \
-b ../genome-features/O* | wc -l
!echo "of 219 10x-any loci overlap with any feature track"

! bedtools intersect \
-u \
-a ../analyses/macau/macau-all10x.bed \
-b ../genome-features/O* | wc -l
!echo "of 27 10x-all loci overlap with any feature track"

     114
of 219 10x-any loci overlap with any feature track
      13
of 27 10x-all loci overlap with any feature track


In [71]:
# Within genes 
! bedtools intersect \
-u \
-a ../analyses/macau/macau-any10x.bed \
-b ../genome-features/Olurida_v081.gene.gff | wc -l
!echo "of 219 10x-any loci overlap with genes"

! bedtools intersect \
-u \
-a ../analyses/macau/macau-all10x.bed \
-b ../genome-features/Olurida_v081.gene.gff | wc -l
!echo "of 27 10x-all loci overlap with genes"


     101
of 219 10x-any loci overlap with genes
      12
of 27 10x-all loci overlap with genes


In [72]:
# Within CDS  
! bedtools intersect \
-u \
-a ../analyses/macau/macau-any10x.bed \
-b ../genome-features/Olurida_v081.CDS.gff | wc -l
!echo "of 219 10x-any loci overlap with coding sequences"

! bedtools intersect \
-u \
-a ../analyses/macau/macau-all10x.bed \
-b ../genome-features/Olurida_v081.CDS.gff | wc -l
!echo "of 27 10x-all overlap with coding sequences"


      85
of 219 10x-any loci overlap with coding sequences
      11
of 27 10x-all overlap with coding sequences


In [73]:
# Within exons  
! bedtools intersect \
-u \
-a ../analyses/macau/macau-any10x.bed \
-b ../genome-features/Olurida_v081.exon.gff | wc -l
!echo "of 219 10x-any loci overlap with exons"

! bedtools intersect \
-u \
-a ../analyses/macau/macau-all10x.bed \
-b ../genome-features/Olurida_v081.exon.gff | wc -l
!echo "10x-all overlap with exons"

      87
of 219 10x-any loci overlap with exons
      11
10x-all overlap with exons


In [74]:
# Within introns  
! bedtools intersect \
-u \
-a ../analyses/macau/macau-any10x.bed \
-b ../genome-features/Olurida_v081.intron.bed | wc -l
!echo "of 219 10x-any overlap with introns"

! bedtools intersect \
-u \
-a ../analyses/macau/macau-all10x.bed \
-b ../genome-features/Olurida_v081.intron.bed | wc -l
!echo "of 27 10x-all overlap with introns"


      14
of 219 10x-any overlap with introns
       1
of 27 10x-all overlap with introns


In [75]:
# Within mRNA  
! bedtools intersect \
-u \
-a ../analyses/macau/macau-any10x.bed \
-b ../genome-features/Olurida_v081.mRNA.gff | wc -l
!echo "of 219 10x-any overlap with mRNA"

! bedtools intersect \
-u \
-a ../analyses/macau/macau-all10x.bed \
-b ../genome-features/Olurida_v081.mRNA.gff | wc -l
!echo "of 27 10x-all overlap with mRNA"


     101
of 219 10x-any overlap with mRNA
      12
of 27 10x-all overlap with mRNA


In [76]:
# Within 5' UTRs  
! bedtools intersect \
-u \
-a ../analyses/macau/macau-any10x.bed \
-b ../genome-features/Olurida_v081.five_prime_UTR.gff | wc -l
!echo "of 219 10x-any overlap with 5' UTRs"

! bedtools intersect \
-u \
-a ../analyses/macau/macau-all10x.bed \
-b ../genome-features/Olurida_v081.five_prime_UTR.gff | wc -l
!echo "of 27 10x-all overlap with 5' UTRs"


       0
of 219 10x-any overlap with 5' UTRs
       0
of 27 10x-all overlap with 5' UTRs


In [77]:
# Within 3' UTRs  
! bedtools intersect \
-u \
-a ../analyses/macau/macau-any10x.bed \
-b ../genome-features/Olurida_v081.mRNA.gff | wc -l
!echo "of 219 10x-any overlap with 3' UTRs"

! bedtools intersect \
-u \
-a ../analyses/macau/macau-all10x.bed \
-b ../genome-features/Olurida_v081.mRNA.gff | wc -l
!echo "of 27 10x-all overlap with 3' UTRs"


     101
of 219 10x-any overlap with 3' UTRs
      12
of 27 10x-all overlap with 3' UTRs


In [78]:
# Within Transposable Elements
! bedtools intersect \
-u \
-a ../analyses/macau/macau-any10x.bed \
-b ../genome-features/Olurida_v081_TE-Cg.gff | wc -l
!echo "of 219 10x-any overlap with 3' UTRs"

! bedtools intersect \
-u \
-a ../analyses/macau/macau-all10x.bed \
-b ../genome-features/Olurida_v081_TE-Cg.gff | wc -l
!echo "of 27 10x-all overlap with 3' UTRs"

      15
of 219 10x-any overlap with 3' UTRs
       1
of 27 10x-all overlap with 3' UTRs


In [92]:
pwd

'/Users/laura/Documents/roberts-lab/paper-oly-mbdbs-gen/code'

In [94]:
! bedtools intersect \
-wb \
-a ../analyses/dml25.bed \
-b ../genome-features/O* \
> ../analyses/20191120-DML-annotated.txt
! head ../analyses/20191120-DML-annotated.txt

In [107]:
! bedtools intersect \
-wb \
-a ../analyses/macau/macau-all10x.bed \
-b ../genome-features/O* \
> ../analyses/macau/20191120-MACAU-10xall-features.txt

==> ../analyses/macau/20191120-MACAU-10xall-features.txt <==
Contig18084	10113	10113	1	Contig18084	maker	CDS	10046	10226	.	+	1	ID=Olurida_00008129-RA:cds;Parent=Olurida_00008129-RA;
Contig18084	10113	10113	2	Contig18084	maker	exon	10046	10226	.	+	.	ID=Olurida_00008129-RA:exon:403;Parent=Olurida_00008129-RA;
Contig18084	10113	10113	4	Contig18084	maker	gene	8545	20785	.	+	.	ID=Olurida_00008129;Name=Olurida_00008129;Alias=snap_masked-Contig18084-processed-gene-0.1;Note=Similar to Vcpkmt: Protein-lysine methyltransferase METTL21D (Mus musculus OX%3D10090);Dbxref=Gene3D:G3DSA:3.40.50.150,InterPro:IPR019410,InterPro:IPR029063,Pfam:PF10294,SUPERFAMILY:SSF53335;
Contig18084	10113	10113	4	Contig18084	maker	gene	1368	20969	.	-	.	ID=Olurida_00008128;Name=Olurida_00008128;Alias=snap_masked-Contig18084-processed-gene-0.2;Note=Protein of unknown function;Dbxref=MobiDBLite:mobidb-lite;
Contig18084	10113	10113	6	Contig18084	maker	mRNA	8545	20785	.	+	.	ID=Olurida_00008129-RA;Parent=Olurida_0000812

In [104]:
! bedtools intersect \
-wb \
-a ../analyses/macau/macau-any10x.bed \
-b ../genome-features/O* \
> ../analyses/macau/20191120-MACAU-10xany-features.txt 

     421
Contig11092	7764	7764	1	Contig11092	maker	CDS	7652	7967	.	+	1	ID=Olurida_00013018-RA:cds;Parent=Olurida_00013018-RA;
Contig11092	7764	7764	2	Contig11092	maker	exon	7652	9918	.	+	.	ID=Olurida_00013018-RA:exon:353;Parent=Olurida_00013018-RA;
Contig11092	7764	7764	4	Contig11092	maker	gene	2832	9918	.	+	.	ID=Olurida_00013018;Name=Olurida_00013018;Alias=maker-Contig11092-snap-gene-0.2;Note=Similar to mknk1: MAP kinase-interacting serine/threonine-protein kinase 1 (Xenopus tropicalis OX%3D8364);Dbxref=Gene3D:G3DSA:1.10.510.10,Gene3D:G3DSA:3.30.200.20,InterPro:IPR000719,InterPro:IPR008271,InterPro:IPR011009,InterPro:IPR017441,MobiDBLite:mobidb-lite,Pfam:PF00069,ProSitePatterns:PS00107,ProSitePatterns:PS00108,ProSiteProfiles:PS50011,SMART:SM00220,SUPERFAMILY:SSF56112;Ontology_term=GO:0004672,GO:0005524,GO:0006468;
Contig11092	7764	7764	6	Contig11092	maker	mRNA	2832	9918	.	+	.	ID=Olurida_00013018-RA;Parent=Olurida_00013018;Name=Olurida_00013018-RA;Alias=maker-Contig11092-snap-gene-0.2-