---
# Clustering and Correlation Analyses

### Questions:
- How can we use heatmaps and correlation analysis to analyze our microbiome samples?

### Objectives:
- Understand how clustering and correlation analysis can be used to understand complex microbial community structures, interactions, and their relationships.

### Key tasks:
- Extract meaningful patterns from microbiome data to identify significant relationships between microbial taxa.
- Uncover insights into the roles microbes play in the context of your project.

---

## Getting Started

In [None]:
# set the variables for your netid
netid = "NETID"

In [None]:
# make a variable for the working directory
work_dir = "/xdisk/bhurwitz/bh_class/" + netid + "/assignments/14_clustering"

### Clustering and Correlation Networks

In this assignment, we are going to use clustering techniques and correlation networks to extract meaningful patterns from our project data, identify significant relationships between microbial taxa, and uncover insights into the roles microbes play in various contexts (e.g., via birth mode, over time in infant gut development, and with antibiotic usage). 

#### Go to MicrobiomeAnalyst

Go through the same steps from the previous assignments with your project biom file to get to the Analysis Overview step below. Go to the Clustering and Correlation Network Section.

![image.png](attachment:image.png)

#### Part 1 Heatmap visualization

Click ‘Interactive Heatmap’ from the Analysis Overview. The heatmap analysis uses the hierarchical structure of taxonomic classifications to depict group-wise relative abundance for microbial communities. The upper part of the page contains key parameters for creating and customizing a heatmap. Use the following initial set of parameters:

Set ‘Family’ as the current taxonomy level, specify ‘Minkowski distance’ for the distance metric, specify ‘complete’ for the clustering method, and then select a group for the comparison of interest based on your project. Click ‘Submit’ to generate the corresponding heatmap.

Try out different distance metrics and clustering methods for your analysis to find the heatmap that best shows differences among taxa for your project.
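To see what the distance metric and clustering method do, here is a minimal Python sketch using SciPy. It is not MicrobiomeAnalyst's own implementation: the family names and abundance values are made up for illustration, and `p=2` (which makes the Minkowski distance equal to the Euclidean distance) is an assumption, since the tool's setting is not stated here.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, leaves_list
from scipy.spatial.distance import pdist

# Hypothetical relative abundances: rows are families, columns are samples.
families = ["Bacteroidaceae", "Bifidobacteriaceae",
            "Enterobacteriaceae", "Lachnospiraceae"]
abundance = np.array([
    [0.40, 0.42, 0.05, 0.06],
    [0.38, 0.41, 0.07, 0.05],
    [0.10, 0.08, 0.50, 0.48],
    [0.12, 0.09, 0.38, 0.41],
])

# Pairwise Minkowski distances between families (assumed p=2, i.e. Euclidean).
distances = pdist(abundance, metric="minkowski", p=2)

# Complete linkage measures the distance between two clusters
# by their farthest pair of members.
tree = linkage(distances, method="complete")

# The leaf order is the row order a clustered heatmap would display.
row_order = [families[i] for i in leaves_list(tree)]
print(row_order)
```

Families with similar abundance profiles across samples end up next to each other in `row_order`. Changing `metric` or `method` can change that order, which is why the heatmap looks different when you change these settings.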

Here are the possible distance metrics:

![image.png](attachment:image.png)


Here are the possible clustering methods:

![image.png](attachment:image.png)

#### Question 1

What are the heatmap parameters you selected to reveal patterns in the composition and structure of microbial communities across different samples? 

Describe your parameters / methods here:



#### Question 2

Heatmaps can help to identify relationships between taxa and how they vary across samples. Did you discover any taxa that can help you understand microbial diversity between groups in your samples? At what taxonomic level do you see these differences?

Answer:

#### Question 3

Set your heatmap to the species level. Are there any species that could potentially be used as biomarkers to distinguish between groups in your project?

Answer:

#### Part 2 Correlation Analysis

Click ‘Correlation Analysis’ from the Analysis Overview. The aim of correlation networks is to identify biologically meaningful relationships between taxa or features. These can be potential interactions between microbes that could represent mutualistic, commensal, parasitic or even competitive relationships. Uncovering such interactions could hold important therapeutic implications for the health of the microbial community and ultimately lead to understanding microbiome function. Use the following initial set of parameters:

Set ‘Spearman rank correlation’ as the algorithm, ‘Class’ as the current taxonomy level, and then select an experimental factor for the comparison of interest based on your project. Keep the other defaults. Click ‘Submit’ to generate the corresponding correlation network.

Try out different algorithms and taxonomic levels for your analysis to find the result that best shows differences among taxa for your project.
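As a rough illustration of what a Spearman rank correlation measures, the sketch below computes one with pandas. The class names and abundance values are hypothetical, and this is not how MicrobiomeAnalyst computes its results internally.

```python
import pandas as pd

# Hypothetical class-level abundances: rows are samples, columns are classes.
abundance = pd.DataFrame(
    {
        "Bacilli": [0.10, 0.20, 0.30, 0.40, 0.50],
        "Clostridia": [0.12, 0.25, 0.28, 0.45, 0.60],
        "Gammaproteobacteria": [0.70, 0.50, 0.35, 0.10, 0.05],
    }
)

# Spearman correlation ranks the values in each column and then
# correlates the ranks, so it captures monotonic relationships
# even when they are not linear.
corr = abundance.corr(method="spearman")
print(corr.round(2))
```

In this made-up data, Bacilli and Clostridia rise together across samples, so their coefficient is positive, while Gammaproteobacteria falls as Bacilli rises, so that coefficient is negative.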

> ## Correlation Exploration
> Try using a Spearman rank correlation at the Class level for host_sex as the experimental factor.

![image.png](attachment:image.png)

Can you find classes of bacteria that have different abundances based on your project?

![image.png](attachment:image.png)

Note that you will get an error if the correlation analysis does not produce any meaningful results. For example, if I run the same analysis as above, but at the family level, there are no meaningful results and I get an error that looks like this:

![image.png](attachment:image.png)

#### Question 4

What are the parameters you used to construct your correlation network? Why did you select these parameters?

Answer:

#### Question 5

What taxonomic level best illustrated the relationships between different microbial taxa for your project?

Answer:

#### Question 6

Community structure: By visualizing correlations, researchers can identify how microbial communities might interact with each other. Positive correlations suggest microbes that are likely to be in a similar ecological niche, while negative correlations suggest competitive or antagonistic relationships. Could you identify any taxa that might interact with each other? Which ones?

Answer:


#### Question 7

Certain microbes may act as hubs in a correlation network, being strongly connected to many other taxa. These "keystone" species are often important for maintaining the balance of the microbiome ecosystem. Did you identify any keystone species/taxa?
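One simple way to think about hubs is to count how many strong correlations each taxon has. The sketch below does this with NumPy. The taxon names, the correlation matrix, and the 0.6 cutoff are all hypothetical, and a high count only flags a candidate; it does not prove a taxon is a keystone.

```python
import numpy as np

# Hypothetical taxa and a made-up symmetric correlation matrix.
taxa = ["Taxon_A", "Taxon_B", "Taxon_C", "Taxon_D"]
corr = np.array([
    [1.0, 0.8, -0.7, 0.9],
    [0.8, 1.0, 0.1, 0.2],
    [-0.7, 0.1, 1.0, 0.3],
    [0.9, 0.2, 0.3, 1.0],
])

threshold = 0.6  # assumed cutoff for a "strong" correlation

# An edge exists where the absolute correlation meets the cutoff;
# self-correlations on the diagonal are ignored.
edges = np.abs(corr) >= threshold
np.fill_diagonal(edges, False)

# Degree = number of strong connections for each taxon.
degree = dict(zip(taxa, edges.sum(axis=1).tolist()))
hub = max(degree, key=degree.get)
print(degree, hub)
```

In your own network, the equivalent is to look for nodes with many edges, counting both positive and negative correlations.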

Answer:

## The End

Copy your notebook to your working directory to turn it in.

In [None]:
!cp ~/be487-fall-2024/assignments/14_clustering/hw14_clustering.ipynb $work_dir