# Preliminary Proteomic Data Analyses

Using ABACUS data from the [Pacific oyster seed experiment](https://github.com/RobertsLab/project-pacific.oyster-larvae/wiki/2016-Oyster-Seed-experiment-23C-vs.-29C), I am going to identify differentially expressed proteins, similar to how Yaamini did [here](https://github.com/RobertsLab/project-oyster-oa/blob/master/notebooks/2017-03-21-Preliminary-Proteomic-Data-Analyses.ipynb).

I will be answering two main questions using the [ABACUS-Gigaton-Uniprot table](https://github.com/kaitlynrm/OysterSeedProject/blob/master/raw_os_data/annotated_os.xlsx), based on the NMDS plot: ![alternatetext](https://raw.githubusercontent.com/kaitlynrm/labnotebook/master/oysterprojectinfo/Pictures/NMDS_oysterseed.JPG)


1) What proteins were differentially expressed in Silo 2 on days 11 and 13?

2) What proteins were differentially expressed between all silos on day 9?

I will parse out data from the ABACUS-Gigaton-Uniprot table to answer each question, starting with proteins that increased in expression.

### Data Exploration

To simplify the process of identifying proteins differentially expressed between days 11 and 13, I created a [new table](https://github.com/kaitlynrm/OysterSeedProject/blob/master/unq-exp/silo2.csv).

![alternatetext](https://raw.githubusercontent.com/kaitlynrm/OysterSeedProject/master/Pictures/silo2Q.JPG) ![alternatetext](https://raw.githubusercontent.com/kaitlynrm/OysterSeedProject/master/Pictures/silo2Qend.JPG)

Many of the proteins ABACUS identified were not expressed and, as expected, most proteins were expressed at the same level. Because this is a preliminary analysis, I arbitrarily decided that proteins whose expression differed by more than 10 were differentially expressed.

I highlighted proteins that decreased in expression because I want to analyze proteins that increased and decreased in expression separately, since far more proteins increased in expression. It is also worth mentioning that it may be easier to justify increased expression than decreased expression.
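The cutoff described above could also be applied programmatically rather than by highlighting rows by hand. The following is only a minimal sketch: the column names (`protein`, `day11`, `day13`) and the example values are made up for illustration and do not reflect the actual layout of `silo2.csv`.

```python
import csv
import io

THRESHOLD = 10  # arbitrary preliminary cutoff described in the text


def classify_proteins(rows, id_col="protein", day11_col="day11",
                      day13_col="day13", threshold=THRESHOLD):
    """Split proteins into increased/decreased lists by (day 13 - day 11)."""
    increased, decreased = [], []
    for row in rows:
        diff = float(row[day13_col]) - float(row[day11_col])
        if diff > threshold:
            increased.append(row[id_col])
        elif diff < -threshold:
            decreased.append(row[id_col])
        # anything else is treated as unchanged and left out of both lists
    return increased, decreased


# Hypothetical example data; the real table has different columns and values.
example_csv = """protein,day11,day13
protA,2,30
protB,25,3
protC,5,8
protD,0,0
"""

rows = list(csv.DictReader(io.StringIO(example_csv)))
increased, decreased = classify_proteins(rows)
print("increased:", increased)
print("decreased:", decreased)
```

Keeping the two lists separate mirrors the plan to run the enrichment analysis on increased and decreased proteins independently.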

### Enrichment Analysis Preparation


The objective is to create plots in REVIGO to better visualize changes in biological processes between days 11 and 13 for silo 2, the same way [Steven did for Laura's data](https://sr320.github.io/Proteomic-Visualization/).

#### Accession Codes

Since Yaamini already modified Rhonda's Uniprot accessions for the [*Crassostrea gigas* proteome](https://raw.githubusercontent.com/RobertsLab/project-oyster-oa/master/analyses/DNR_Preliminary_Analyses_20170321/background-proteome-accession-no-pipes.txt), I was able to quickly download the background, and my ABACUS-Gigaton-Uniprot table already provides Uniprot accessions.
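For reference, the background file name (`...accession-no-pipes.txt`) suggests the accessions were pulled out of pipe-delimited identifiers. I don't know exactly how that was done, but a rough sketch of that kind of cleanup might look like the following, assuming identifiers in the Uniprot FASTA style `sp|ACCESSION|ENTRY_NAME`. The function name and the example IDs are hypothetical.

```python
def extract_accession(identifier):
    """Return the accession from a pipe-delimited ID such as 'sp|P12345|NAME'.

    Assumes the accession is the field right after a 'sp' or 'tr' tag.
    Identifiers without such a tag are returned unchanged (minus whitespace).
    """
    fields = identifier.strip().split("|")
    for i, field in enumerate(fields[:-1]):
        if field in ("sp", "tr"):
            return fields[i + 1]
    return identifier.strip()


# Hypothetical identifiers, for illustration only.
raw_ids = ["sp|P12345|EXAMPLE_CRAGI", "tr|K1QAB2|K1QAB2_CRAGI\n", "P12345"]
accessions = [extract_accession(raw) for raw in raw_ids]
print(accessions)
```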

### DAVID

Now I will use the functional annotation tool for [DAVID](https://david.ncifcrf.gov/tools.jsp) to get my gene ontology (GO) terms for proteins that increased in expression.

#### Uploading Codes

First I need to upload my gene list: ![alternatetext](https://raw.githubusercontent.com/kaitlynrm/OysterSeedProject/master/Pictures/genelist-oysterseed.JPG) 


However, the accession codes were matching to multiple codes, so I converted all codes to Uniprot accessions. ![alternatetext](https://raw.githubusercontent.com/kaitlynrm/OysterSeedProject/master/Pictures/increase-convertids.JPG)


I found out they matched to GenPept codes: ![alternatetext](https://raw.githubusercontent.com/kaitlynrm/OysterSeedProject/master/Pictures/increase-conversion-OS.JPG)


After converting to Uniprot a second time, I was able to submit the converted list as my gene list.
![alternatetext](https://raw.githubusercontent.com/kaitlynrm/OysterSeedProject/master/Pictures/increase-submitconversion.JPG)


I followed the same steps with my background, without having to convert to Uniprot accessions. I did get a message that multiple species were detected, but that is probably due to the oyster not being a "model species". ![alternatetext](https://raw.githubusercontent.com/kaitlynrm/OysterSeedProject/master/Pictures/background-multiplespeciesdeetected.JPG)


#### Information from DAVID

After submitting my lists, I had the following options:

![alternatetext](https://raw.githubusercontent.com/kaitlynrm/OysterSeedProject/master/Pictures/DAVID-options-increase.JPG)

To find which GO terms were overrepresented, I looked at [biological process](https://github.com/kaitlynrm/OysterSeedProject/blob/master/DAVID-analysis/biological_processes-increasedproteins-oysterseed.txt) and [molecular function](https://github.com/kaitlynrm/OysterSeedProject/blob/master/DAVID-analysis/molecular_function-increasedproteins-oysterseed.txt).

![alternatetext](https://raw.githubusercontent.com/kaitlynrm/OysterSeedProject/master/Pictures/BP-incr-os.JPG)


![alternatetext](https://raw.githubusercontent.com/kaitlynrm/OysterSeedProject/master/Pictures/MF-incr-os.JPG)

I also looked at the [KEGG pathway](https://github.com/kaitlynrm/OysterSeedProject/blob/master/DAVID-analysis/kegg_pathway-incr-os.txt), but I only received one result back, which is why I didn't get a pathway map.

![alternatetext](https://raw.githubusercontent.com/kaitlynrm/OysterSeedProject/master/Pictures/kegg_pathway-incr-os.JPG)

I looked at the [excluded genes](https://github.com/kaitlynrm/OysterSeedProject/blob/master/DAVID-analysis/excluded_genes_kegg_pathway-incr-os.txt) as well.

![alternatetext](https://raw.githubusercontent.com/kaitlynrm/OysterSeedProject/master/Pictures/excluded_genes-kegg-pathway-incr-os.JPG)

### REVIGO

Using [REVIGO](http://revigo.irb.hr/), I will make plots of the biological processes affected by treatment, indicated in this case by increased protein expression.

I took the GO terms from the [biological process table generated by DAVID](https://github.com/kaitlynrm/OysterSeedProject/blob/master/DAVID-analysis/biological_processes-increasedproteins-oysterseed.txt) and entered them into REVIGO.
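Copying the GO terms over by hand works, but the step could also be scripted. The sketch below assumes the DAVID export is tab-delimited with a `Term` column formatted like `GO:0006412~translation` and a `PValue` column; check the actual file before relying on this. The p-values in the example are made up.

```python
import csv
import io


def david_to_revigo(chart_text):
    """Turn a DAVID chart export into 'GO_ID<TAB>p-value' lines for REVIGO.

    Assumes a tab-delimited table with 'Term' and 'PValue' columns, where
    GO terms look like 'GO:0006412~translation'. Non-GO rows are skipped.
    """
    reader = csv.DictReader(io.StringIO(chart_text), delimiter="\t")
    lines = []
    for row in reader:
        go_id = row["Term"].split("~", 1)[0]
        if go_id.startswith("GO:"):
            lines.append(f"{go_id}\t{row['PValue']}")
    return lines


# Hypothetical excerpt of a DAVID chart; values are illustrative only.
example_chart = (
    "Category\tTerm\tCount\tPValue\n"
    "GOTERM_BP_DIRECT\tGO:0006412~translation\t12\t0.001\n"
    "GOTERM_BP_DIRECT\tGO:0006457~protein folding\t5\t0.03\n"
)

revigo_lines = david_to_revigo(example_chart)
print("\n".join(revigo_lines))
```

The printed lines can be pasted into the REVIGO input box, one GO term and p-value per line.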

![alternatetext](https://raw.githubusercontent.com/kaitlynrm/OysterSeedProject/master/Pictures/enterGO-REVIGO-incr-os.JPG)

This gave me a lovely plot,

![alternatetext](https://raw.githubusercontent.com/kaitlynrm/OysterSeedProject/master/Pictures/REVIGOplot-incr-os.JPG)


as well as a table.

![alternatetext](https://raw.githubusercontent.com/kaitlynrm/OysterSeedProject/master/Pictures/REVIGOtable-incr-os.JPG)

The plot shows which biological processes are overrepresented among the proteins that increased in expression by more than 10 between days 11 and 13 in silo 2.

#### Decreased Proteins

Using the same process, I produced a plot of biological processes based on proteins that decreased in expression from day 11 to 13 in silo 2. 

I had the same problem with my accession codes in DAVID as before. No KEGG pathway, BP direct, or MF direct was chartable, so I only looked at [all biological processes](https://github.com/kaitlynrm/OysterSeedProject/blob/master/DAVID-analysis/BPall-decr-os.txt).

![alternatetext](https://raw.githubusercontent.com/kaitlynrm/OysterSeedProject/master/Pictures/REVIGOplot-decr-os.JPG)


![alternatetext](https://raw.githubusercontent.com/kaitlynrm/OysterSeedProject/master/Pictures/REVIGOtable-dcr-os.JPG)



Or I can look at both together to see all overrepresentation.

![alternatetext](https://raw.githubusercontent.com/kaitlynrm/OysterSeedProject/master/Pictures/REVIGOplot-allproteins-os.JPG)

![alternatetext](https://raw.githubusercontent.com/kaitlynrm/OysterSeedProject/master/Pictures/revigotable-allproteins-os.JPG)

### CompGO

Emma also had another [GO enrichment analysis software](http://www.yeastrc.org/compgo_oyster/pages/goAnalysisForm.jsp) produced exclusively for the oyster seed data. When I put in all the proteins that changed expression, I got back this plot.

![alternatetext](https://raw.githubusercontent.com/kaitlynrm/OysterSeedProject/master/Pictures/goAnalysisplot-allproteins-os.JPG) 
