# Meta'omics for Ocean Science

![scales](./tutorial_images/rainbow_satellite_to_microbe.png)  

### What is 'omics data?

* Data on biological molecules  
* 'Meta' refers to collecting and processing samples in bulk. 
* Data often focused on specific size fractions  


### What can it tell us?

#### Taxonomy: 
* What microbes are present -- DNA
* Which microbes are active -- RNA

#### Function:
* What is the metabolic potential? -- DNA
* What processes are being carried out? -- RNA

### How is it generated?
![omics](./tutorial_images/metaXomics_diagram.png)  

Data are generated by collecting bulk community samples. For planktonic microbes, samples are collected from a specific size fraction that targets particular microbial groups (e.g. bacteria and archaea, protists, phytoplankton, viruses). Nucleic acids, proteins, or other target molecules are then extracted and sequenced. For nucleotide data (DNA and RNA), samples are often sequenced on Illumina platforms, generating short __paired-end__ reads that can be characterized directly or assembled into longer __contiguous sequences__ (contigs).

### What does it look like?
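
Short-read nucleotide data are commonly stored as FASTQ files, with four lines per read: a header, the sequence, a `+` separator, and a per-base quality string. The sketch below parses two made-up records standing in for the two mates of a read pair. The read names, sequences, and the `parse_fastq` helper are illustrative assumptions, and the quality decoding assumes Phred+33 encoding.

```python
from io import StringIO

# Made-up records for illustration only (not real data): the two mates of one
# hypothetical read pair, as they might appear in the R1 and R2 FASTQ files.
fastq_r1 = "@read1/1\nACGTTGCA\n+\nIIIIHHGG\n"
fastq_r2 = "@read1/2\nTGCAACGT\n+\nIIIIGGFF\n"


def parse_fastq(handle):
    """Yield (name, sequence, quality_scores) from a four-line-per-record FASTQ handle."""
    while True:
        header = handle.readline().rstrip()
        if not header:
            break
        sequence = handle.readline().rstrip()
        handle.readline()  # '+' separator line
        quality = handle.readline().rstrip()
        # Assumes Phred+33 encoding: score = ASCII code - 33
        scores = [ord(char) - 33 for char in quality]
        yield header[1:], sequence, scores


for mate in (fastq_r1, fastq_r2):
    for name, sequence, scores in parse_fastq(StringIO(mate)):
        print(name, sequence, scores)
```

For real datasets a dedicated parser (for example from Biopython) is likely a better choice than a hand-rolled one.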




### How can we use it?

Read profiling is one of the most commonly used processes in 'omics analysis. It is used to assess the relative abundance of taxonomic groups in metagenomic datasets (DNA metagenomes) or to estimate the expression of different microbial taxa (RNA metatranscriptomes).

In a nutshell, short reads are aligned to genomic reference sequences. Each reference carries taxonomic information, which can then be transferred to the reads that align to it.

![recruitment](./tutorial_images/01-metagenomic-read-recruitment-simple.gif)  
(Thank you [MerenLab](https://merenlab.org/) for the animation)
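
The read recruitment idea can be sketched with a toy example. This is a deliberately simplified illustration, not how a real aligner works: it uses exact substring matching, whereas real tools tolerate mismatches and use indexed references. The reference sequences, their taxon labels, and the reads are all made up.

```python
from collections import Counter

# Hypothetical reference sequences, each tagged with a made-up taxonomic label
references = {
    "ref_A": ("Prochlorococcus", "ATGGCGTACGTTAGCCTAGGATCCGATCGA"),
    "ref_B": ("Pelagibacter", "TTGACCGGTATCGGCTAAGCTTGCAATGCC"),
}

# Made-up short reads; the last one matches neither reference
reads = ["GCGTACGTTAGC", "GGATCCGATC", "CGGTATCGGC", "AAAAAAAAAAAA"]


def classify(read):
    """Return the taxon of the first reference containing the read exactly."""
    for taxon, sequence in references.values():
        if read in sequence:
            return taxon
    return "unclassified"


# Count reads per taxon, then convert counts to relative abundance
counts = Counter(classify(read) for read in reads)
relative_abundance = {taxon: n / len(reads) for taxon, n in counts.items()}

for taxon, fraction in relative_abundance.items():
    print(f"{taxon}: {fraction:.2f}")
```

The taxonomic label travels from the reference to every read recruited to it, and the per-taxon read counts give the relative abundance profile.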

# Read classification tools

# [Kaiju](https://kaiju.binf.ku.dk/server)
![Kaiju](https://kaiju.binf.ku.dk/images/kaiju3_header.gif)

#### Alternative: [Kraken2](https://github.com/DerrickWood/kraken2/)

### How do they work?
Translated Reads --> Proteins --> Genomes

![kaiju_diagram](./tutorial_images/read_classification_workflow.png)
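
The "Translated Reads --> Proteins --> Genomes" flow can be illustrated with a minimal sketch: translate each read in all six reading frames, split the translations at stop codons, and look for the longest fragment that exactly matches a reference protein. This is a simplification under stated assumptions, not Kaiju's actual implementation (which uses an indexed protein database rather than a linear scan). The protein sequences, the `taxon_A`/`taxon_B` labels, and all function names are hypothetical.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Hypothetical protein database: protein sequence -> made-up taxon label
protein_db = {"MKTAYIAKQR": "taxon_A", "MSLLNDPRGW": "taxon_B"}


def reverse_complement(dna):
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def six_frame_fragments(read, min_len=4):
    """Translate both strands in three frames each and split at stop codons."""
    fragments = []
    for strand in (read, reverse_complement(read)):
        for frame in range(3):
            protein = translate(strand[frame:])
            fragments += [f for f in protein.split("*") if len(f) >= min_len]
    return fragments


def classify(read):
    """Assign the taxon of the protein hit by the longest matching fragment."""
    best_taxon, best_len = "unclassified", 0
    for fragment in six_frame_fragments(read):
        for protein, taxon in protein_db.items():
            if fragment in protein and len(fragment) > best_len:
                best_taxon, best_len = taxon, len(fragment)
    return best_taxon


read = "AAAACCGCGTATATTGCG"  # made-up read encoding the peptide KTAYIA
print(classify(read), classify(reverse_complement(read)))
```

Searching in protein space is what lets this kind of classifier tolerate synonymous nucleotide differences between a read and its closest reference.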

# Tools are as good as your reference database

Kaiju and other classifiers rely on genome databases that primarily contain genomes from isolated microbes and genomes assembled from metagenomes ('MAGs').

__Isolates__ are not representative of the microbial diversity of the planet, particularly in oligotrophic and extreme environments.  

__MAGs__ are also not representative of overall microbial diversity. For example, it is challenging to assemble a MAG of SAR11 (i.e. Pelagibacter) because their genomes vary between individual cells.


# GORG-Tropics: A collection of reference genomes from individual cells from the Tropical and Sub-tropical Epipelagic Ocean
![GORG-Figure2](https://ars.els-cdn.com/content/image/1-s2.0-S0092867419312735-gr2.jpg)  

GORG-Tropics is more representative of global ocean microbes than MAGs.


![GORG-Figure](https://ars.els-cdn.com/content/image/1-s2.0-S0092867419312735-gr6.jpg)  

GORG-Tropics is more accurate and sensitive than default databases used for read classification by Kaiju. 

In [4]:
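import pandas as pd

# Load the SRA run metadata for BioProject PRJNA385855
df = pd.read_csv("./data/PRJNA385855_sra_metadata.csv")

# Keep near-surface (1 m and 10 m) runs whose cruise_id contains 'BATS',
# ordered by collection date. na=False treats missing cruise IDs as non-matches.
is_bats = df['cruise_id'].str.contains('BATS', na=False)
is_surface = df['Depth'].isin(['10m', '1m'])
columns = ['Run', 'Collection_date', 'cruise_id', 'BioSample', 'Depth']
mgoi = df.loc[is_bats & is_surface, columns].sort_values(by='Collection_date')

# Save the table of metagenomes of interest to file
mgoi.to_csv("./data/bats_metagenomes_of_interest.csv", index=False)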

In [3]:
df.head()

| Unnamed: 0 | Run | Assay Type | AvgSpotLen | Bases | BioProject | BioSample | BioSampleModel | bottle_id | Bytes | Center Name | ... | lat_lon | Library Name | LibraryLayout | LibrarySelection | LibrarySource | Organism | Platform | ReleaseDate | Sample Name | SRA Study |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | SRR6507277 | WGS | 300 | 16133582100 | PRJNA385855 | SAMN08390922 | MIMS.me,MIGS/MIMS/MIMARKS.water | 2140200308 | 6618578156 | MIT | ... | 22.75 N 158 W | S0627 | PAIRED | RANDOM | METAGENOMIC | marine metagenome | ILLUMINA | 2018-05-01T00:00:00Z | S0627 | SRP109831 |
| 1 | SRR6507278 | WGS | 300 | 15874959000 | PRJNA385855 | SAMN08390923 | MIMS.me,MIGS/MIMS/MIMARKS.water | 2160200304 | 6562862443 | MIT | ... | 22.75 N 158 W | S0628 | PAIRED | RANDOM | METAGENOMIC | marine metagenome | ILLUMINA | 2018-05-01T00:00:00Z | S0628 | SRP109831 |
| 2 | SRR6507279 | WGS | 300 | 15069825300 | PRJNA385855 | SAMN08390924 | MIMS.me,MIGS/MIMS/MIMARKS.water | 1024800503 | 6265839401 | MIT | ... | 31.66 N 64.16 W | S0629 | PAIRED | RANDOM | METAGENOMIC | marine metagenome | ILLUMINA | 2018-05-01T00:00:00Z | S0629 | SRP109831 |
| 3 | SRR6507280 | WGS | 300 | 25807308000 | PRJNA385855 | SAMN08390925 | MIMS.me,MIGS/MIMS/MIMARKS.water | 1025200510 | 10523504402 | MIT | ... | 31.66 N 64.16 W | S0630 | PAIRED | RANDOM | METAGENOMIC | marine metagenome | ILLUMINA | 2018-05-01T00:00:00Z | S0630 | SRP109831 |
| 4 | SRR5720219 | WGS | 300 | 6713331000 | PRJNA385855 | SAMN07137016 | MIMS.me,MIGS/MIMS/MIMARKS.water | 1640201117 | 2811014041 | MIT | ... | 22.75 N 158 W | S0519 | PAIRED | RANDOM | METAGENOMIC | marine metagenome | ILLUMINA | 2018-05-01T00:00:00Z | S0519 | SRP109831 |


