---
# Functional and Pathway Analysis in Microbiome Research

### Questions:
- What functional pathways differ based on your groups?
- Do you see differences in the groups using functional analysis, compared to taxonomic analysis?
- Do you see specific pathways that are significantly different between groups?
- How do these significant pathways compare to the literature from journal club?

### Objectives:
- Understand how to visualize and explore functional differences in groups of microbiome samples.

### Keypoints:
- Differences in taxonomy do not always translate to differences in function between groups.
---

## Getting Started

In [None]:
# set the variable for your netid
netid = "NETID"

In [None]:
# make a variable for the working directory
work_dir = "/xdisk/bhurwitz/bh_class/" + netid + "/assignments/17_pathways"

### Functional and Pathway Analyses

In this assignment, we will explore functional and pathway analyses in MicrobiomeAnalyst.



### Getting Started

Let's try this out in MicrobiomeAnalyst.

We will use the Shotgun Data Profiling tab for this analysis.

![image.png](attachment:image.png)

### Upload your sample and KO functional data

First, upload your data to the portal using the gene abundance table tab. There are two files. The first is the metadata file containing your sample information. The second is the functional data: a list of KO (KEGG Orthology) identifiers and their abundance in each sample. Be sure to select Normalized data, because the output from HUMAnN 3 is already normalized.

The files for your dataset are named `project*_koterms.txt` and `project*_metadata.txt`.

![image.png](attachment:image.png)
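
Before uploading, it can help to confirm that the sample IDs in the two files agree, since mismatched IDs are a common cause of upload errors. The sketch below is only an illustration: it assumes both files are tab-delimited, that the KO table has sample IDs in its header row, and that the metadata has sample IDs in its first column. The `#NAME` header, the helper names `read_tsv` and `compare_sample_ids`, and the toy values are all assumptions, not taken from your project files.

```python
import csv
import io

def read_tsv(handle):
    """Return the non-empty rows of a tab-delimited file as lists of strings."""
    return [row for row in csv.reader(handle, delimiter="\t") if row]

def compare_sample_ids(ko_rows, meta_rows):
    """Compare sample IDs in the KO table header with those in the metadata.

    Returns (missing_from_metadata, missing_from_ko_table) as sorted lists.
    """
    ko_samples = set(ko_rows[0][1:])                  # header row, skipping the KO ID column
    meta_samples = {row[0] for row in meta_rows[1:]}  # first column, skipping the header row
    return sorted(ko_samples - meta_samples), sorted(meta_samples - ko_samples)

# Toy stand-ins for the KO table and metadata files (made-up values).
ko_text = "#NAME\tS1\tS2\tS3\nK00001\t0.10\t0.00\t0.25\nK00002\t0.05\t0.12\t0.00\n"
meta_text = "#NAME\tGroup\nS1\tcontrol\nS2\tcase\nS4\tcase\n"

ko_rows = read_tsv(io.StringIO(ko_text))
meta_rows = read_tsv(io.StringIO(meta_text))
missing_meta, missing_ko = compare_sample_ids(ko_rows, meta_rows)
print("In KO table but not metadata:", missing_meta)
print("In metadata but not KO table:", missing_ko)
```

To run this on real files, replace the `io.StringIO(...)` objects with `open(path, newline="")` handles for your own KO table and metadata.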

### Data Inspection

The data inspection summary should report no problems. Check for errors and then click "Proceed".

![image.png](attachment:image.png)


### Data Filter

Use the defaults to remove any low-abundance or low-variance features. Click "Submit", and then "Proceed".

![image.png](attachment:image.png)
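
To build intuition for what this kind of filter does, here is a minimal sketch in Python. It is not MicrobiomeAnalyst's implementation: the function name `filter_features`, the thresholds, and the use of plain variance as the variability measure are all illustrative assumptions, and the toy abundances are made up.

```python
import statistics

def filter_features(table, min_abundance=0.0, min_prevalence=0.2, drop_low_variance=0.1):
    """Drop rarely observed features, then the least variable of what remains.

    table maps a feature ID to its list of per-sample abundances.
    """
    # Prevalence filter: keep features above min_abundance in enough samples.
    kept = {
        ko: values
        for ko, values in table.items()
        if sum(v > min_abundance for v in values) / len(values) >= min_prevalence
    }
    # Variance filter: rank by variance and drop the lowest fraction.
    ranked = sorted(kept, key=lambda ko: statistics.pvariance(kept[ko]))
    n_drop = int(len(ranked) * drop_low_variance)
    return {ko: kept[ko] for ko in ranked[n_drop:]}

# Toy table with made-up abundances for four KO terms across five samples.
toy = {
    "K00001": [0.10, 0.12, 0.30, 0.28, 0.11],
    "K00002": [0.00, 0.00, 0.00, 0.00, 0.00],  # never observed
    "K00003": [0.05, 0.05, 0.05, 0.05, 0.05],  # observed, but constant
    "K00004": [0.00, 0.20, 0.00, 0.25, 0.00],
}
filtered = filter_features(toy, drop_low_variance=0.34)
print(sorted(filtered))
```

In this toy table, the never-observed feature fails the prevalence filter and the constant feature is removed by the variance filter, because neither can help distinguish groups.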

### Normalization

Your data are already normalized, so there is no need to do this step in MicrobiomeAnalyst. Select "Do not rarefy my data", "Do not scale my data", and "Do not transform my data". Click "Submit", and then "Proceed".

![image.png](attachment:image.png)


#### Part 1

First, we will check for functional differences between groups by visually exploring the abundance profiles of different functional categories across experimental factors.

Select Diversity Overview in the functional profiling section:

![image.png](attachment:image.png)

![image-2.png](attachment:image-2.png)

#### Question 1: 

Diversity Overview. Explore the functional categories in the drop-down menu. Do the samples differ by group based on what you see visually? How does this compare to what you found in the taxonomic analysis and bar charts?

Describe your parameters / methods here:


Paste your Figure here:


Describe your results here:

#### Part 2

Clustering Analysis

Next, let's look at how your samples cluster. Run a Principal Component Analysis (PCA) and check whether the samples separate according to the groups they come from.

![image.png](attachment:image.png)

![image-2.png](attachment:image-2.png)
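
For intuition about what the PCA plot shows, the sketch below projects samples onto their first principal component using power iteration on the covariance matrix. It is a simplified illustration under stated assumptions, not what MicrobiomeAnalyst runs internally: the function name `first_principal_component`, the starting vector, and the toy abundances are all hypothetical, and a real analysis would use an established library and more than one component.

```python
import math

def first_principal_component(samples, n_iter=200):
    """Project samples onto their first principal component via power iteration.

    samples is a list of equal-length feature vectors, one per sample.
    Returns (scores, loadings); the sign of the axis is arbitrary.
    """
    n, p = len(samples), len(samples[0])
    means = [sum(row[j] for row in samples) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in samples]
    # Feature-by-feature sample covariance matrix.
    cov = [
        [sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(p)]
        for a in range(p)
    ]
    # Non-uniform start so it is unlikely to be orthogonal to the main axis.
    vec = [float(j + 1) for j in range(p)]
    norm = math.sqrt(sum(x * x for x in vec))
    vec = [x / norm for x in vec]
    for _ in range(n_iter):
        nxt = [sum(cov[a][b] * vec[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0:  # no variance along the current direction
            break
        vec = [x / norm for x in nxt]
    scores = [sum(r[j] * vec[j] for j in range(p)) for r in centered]
    return scores, vec

# Made-up abundances of three KO terms: first three samples are one group, last three another.
toy_samples = [
    [0.30, 0.05, 0.10], [0.28, 0.07, 0.11], [0.32, 0.04, 0.09],
    [0.06, 0.29, 0.10], [0.05, 0.31, 0.12], [0.08, 0.27, 0.11],
]
scores, loadings = first_principal_component(toy_samples)
for label, score in zip(["A", "A", "A", "B", "B", "B"], scores):
    print(label, round(score, 3))
```

In this toy example the two groups land on opposite sides of zero along the first component, which is the pattern to look for in your own PCA plot. PCA itself never sees the group labels, so any separation reflects structure in the abundance data.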


#### Question 2: 

Do your samples cluster into the groups you are investigating for your study? How does this differ from the taxonomic analysis? Do you see a clearer pattern with the functional data than with the taxonomic data?

Describe your parameters / methods here:


Paste your Figure here:


Describe your results:

#### Part 3

Differential abundance analysis. Go back to the analysis menu and scroll down to the Differential Abundance Analysis section.

![image.png](attachment:image.png)

Compare and contrast which KEGG IDs are found to be significantly different using a univariate and a multivariate analysis.

First, try running a Univariate analysis with the Mann-Whitney/Kruskal-Wallis (KW) test.

![image-2.png](attachment:image-2.png)
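
The Mann-Whitney test compares two groups by rank (Kruskal-Wallis extends the idea to more than two groups). As a rough illustration of what is computed for each KO term, here is a minimal sketch using the normal approximation. It is not MicrobiomeAnalyst's code: the function name `mann_whitney_u` and the toy abundances are assumptions, and the sketch applies no tie correction, continuity correction, or multiple-testing adjustment.

```python
import math

def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test using the normal approximation.

    Ties get average ranks; no tie or continuity correction is applied,
    so p-values are approximate, especially for small groups.
    """
    values = sorted(list(x) + list(y))
    # Average rank for each distinct value (ranks start at 1).
    rank_of = {}
    i = 0
    while i < len(values):
        j = i
        while j < len(values) and values[j] == values[i]:
            j += 1
        rank_of[values[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j
    n1, n2 = len(x), len(y)
    u1 = sum(rank_of[v] for v in x) - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail of the standard normal
    return u, p

# Made-up abundances of one KO term in two groups of five samples.
control = [0.10, 0.12, 0.11, 0.09, 0.13]
case = [0.30, 0.28, 0.35, 0.27, 0.31]
u, p = mann_whitney_u(control, case)
print("U =", u, "approximate p =", round(p, 4))
```

Because the test is run once per KO term, many terms are tested at once, so look at the adjusted p-values reported in the results table rather than raw p-values alone.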

Next, try running a Multivariate analysis using metagenomeSeq and selecting the zero-inflated Gaussian fit (fitZig) for a single group comparison.

![image-3.png](attachment:image-3.png)



#### Question 3 (part1): 

Do you see different KO terms returned when using a univariate vs. a multivariate approach?

Describe your Univariate parameters / methods here:

Paste your Univariate Figure here:

#### Question 3 (part2): 

Describe your Multivariate parameters / methods here:

Paste your Multivariate Figure here:

Describe your results:

#### Part 4

Try using LEfSe to find biomarkers! 

![image.png](attachment:image.png)

Try using different models for your analysis. 

![image-2.png](attachment:image-2.png)

Which KO terms do you see? Check out the metabolic map to visualize differences.

![image-3.png](attachment:image-3.png)

#### Question 4 (Part 1): 

Find features that are significantly different between groups for your project using LEfSe. Which model works best? Do you see the same results for the top differences between KO terms?

Describe your parameters / methods here:


Paste your Figure here.


Describe your results:


#### Question 4 (Part 2):

Check out the metabolic map from LEfSe. Do you see any pathways that are significantly different based on your biomarker analysis?

![image.png](attachment:image.png)

Paste your Figure here.



## The End

Copy your notebook to your working directory to turn it in.

In [None]:
!cp ~/be487-fall-2024/assignments/17_pathways/hw17_pathways_abridged.ipynb $work_dir