---
# Functional Analyses

### Questions:
- What is HUMAnN 3?
- What is the difference between a taxonomic profiler and classifier?
- What is a pangenome?
- Why do we use marker genes to identify taxa?
- How does HUMAnN 3 convert marker genes into gene and pathway abundance?

### Objectives:
- Show how HUMAnN 3 can be used to quantify gene abundance to identify a parsimonious set of pathways.

### Keypoints:

- We can quantify pathway abundance and coverage by species using HUMAnN 3.
- HUMAnN 3 has mostly been developed and evaluated on human microbiome datasets.
---

## Getting Started

In [None]:
# set the variables for your netid
netid = "NETID"

In [None]:
# make a variable for the working directory
work_dir = "/xdisk/bhurwitz/bh_class/" + netid + "/exercises/16_function"

### Functional analyses with HUMAnN 3

HUMAnN 3 (the HMP Unified Metabolic Analysis Network, version 3) is a bioinformatics tool for profiling microbial gene families and metabolic pathways in metagenomic sequencing data. It is part of the bioBakery suite of tools from the Huttenhower lab, which is widely used in microbiome research to interpret complex microbial communities by linking sequencing data to biological functions, rather than only identifying which microbes are present.

### Overview

Here’s an overview of HUMAnN 3, its main features, and how it fits into microbiome research:

#### Key Features of HUMAnN 3

- Functional Profiling: HUMAnN 3 focuses on functional analysis of microbial communities rather than just taxonomic identification. It describes the metabolic potential of the microbiome by identifying and quantifying the gene families and pathways that the microbes in a sample encode.
- Comprehensive Pathway Mapping: HUMAnN 3 maps microbial genes to known metabolic pathways and predicts the functional capacities of microbial communities from metagenomic sequencing data.
- Expanded Databases: HUMAnN 3 draws on large reference databases: the ChocoPhlAn pangenome collection for nucleotide search, UniRef protein clusters (built from UniProt) for gene families, and MetaCyc for metabolic pathways. Gene families can also be regrouped into other annotation systems, such as KEGG Orthology. This allows for more accurate and comprehensive functional annotations than earlier versions.
- Improved Database Integration: It integrates microbial genomes, gene families, and pathway annotations into one unified framework.
- Microbial Community Functionality: HUMAnN 3 identifies functional categories, such as carbohydrate metabolism and amino acid metabolism, helping researchers understand the ecological and metabolic functions of microbiomes in different environments (e.g., human gut, skin, environmental samples).
- Species-Level Stratification: In addition to community-level totals, HUMAnN 3 reports how much each species contributes to each gene family and pathway, showing how functions are distributed even among closely related organisms. Strain-level analysis is left to companion tools such as StrainPhlAn.

#### How HUMAnN 3 Works

Input Data:

- Metagenomic sequence data is the primary input. HUMAnN 3 is read-based: it works directly with sequencing reads (e.g., from Illumina sequencing) and does not require an assembly.
- The tool takes FASTQ or FASTA read files as input, or precomputed alignments in SAM/BAM format.

Preprocessing (Optional):

- The sequence data usually goes through quality filtering, read trimming, and adapter removal with other tools before it is passed to HUMAnN 3.
- For human-derived samples, contaminating host reads should also be removed beforehand. HUMAnN 3 does not do this itself; a separate tool such as KneadData is commonly used.
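
Once the reads are cleaned, a basic run needs little more than an input file and an output directory. The sketch below only assembles and prints the command rather than running it. `--input`, `--output`, and `--threads` are HUMAnN options; the helper name `build_humann_command` and the file names are hypothetical and chosen for illustration.

```python
import shlex


def build_humann_command(reads_path, output_dir, threads=4):
    """Return the argument list for a basic HUMAnN 3 run (hypothetical helper)."""
    return [
        "humann",
        "--input", str(reads_path),    # quality-controlled reads
        "--output", str(output_dir),   # directory for the result tables
        "--threads", str(threads),     # number of CPU threads to use
    ]


# Hypothetical file names, for illustration only
cmd = build_humann_command("sample_qc.fastq.gz", "humann_out", threads=8)
print(shlex.join(cmd))
```

In a notebook, the same list could be passed to `subprocess.run(cmd, check=True)` once HUMAnN 3 and its databases are installed.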
Taxonomic and Functional Annotation:

- HUMAnN 3 first runs MetaPhlAn, which uses clade-specific marker genes to identify the species present in the sample.
- Reads are then mapped with Bowtie2 against the pangenomes (from the ChocoPhlAn database) of the detected species. Reads that remain unmapped go through a translated search with DIAMOND against a UniRef protein database.
- The resulting hits are assigned to gene families, which are then linked to reactions and metabolic pathways (MetaCyc by default).

Pathway Mapping:

- HUMAnN 3 aggregates gene-level annotations into pathways, using MinPath to select a parsimonious set of pathways that explains the observed genes. It reports which metabolic pathways are present and their abundance in the sample.
- Pathways are categorized into different metabolic classes (e.g., carbohydrate metabolism, amino acid biosynthesis, or lipid metabolism).
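
The idea of rolling gene-level abundances up into pathways can be illustrated with a toy example. The sketch below is a deliberate simplification and not HUMAnN 3's actual algorithm: it sums gene abundances per reaction and scores each pathway by its least abundant reaction, so a pathway with a missing step scores zero. All gene, reaction, and pathway names and values are made up.

```python
from collections import defaultdict

# Toy gene-family abundances (made-up values)
gene_abundance = {"geneA": 10.0, "geneB": 4.0, "geneC": 6.0, "geneD": 2.0}

# Hypothetical mappings: gene family -> reaction, pathway -> required reactions
gene_to_reaction = {"geneA": "rxn1", "geneB": "rxn1", "geneC": "rxn2", "geneD": "rxn3"}
pathway_reactions = {
    "pathway_X": ["rxn1", "rxn2"],
    "pathway_Y": ["rxn2", "rxn3", "rxn4"],  # rxn4 has no supporting gene
}


def reaction_abundance(genes, mapping):
    """Sum the abundance of all gene families that encode each reaction."""
    totals = defaultdict(float)
    for gene, value in genes.items():
        totals[mapping[gene]] += value
    return dict(totals)


def pathway_abundance(reactions, pathways):
    """Score each pathway by its least abundant required reaction (toy rule)."""
    return {
        name: min(reactions.get(rxn, 0.0) for rxn in required)
        for name, required in pathways.items()
    }


rxn = reaction_abundance(gene_abundance, gene_to_reaction)
paths = pathway_abundance(rxn, pathway_reactions)
print(paths)  # {'pathway_X': 6.0, 'pathway_Y': 0.0}
```

The bottleneck rule was chosen only because it is easy to follow; it shows why a pathway is not counted as present just because some of its genes are detected.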
Output:

- The main output of HUMAnN 3 is a set of tab-separated tables of functional abundances, which can be used to compare microbial metabolic profiles across samples.
- By default, the output includes three tables:
  - Gene family abundances (the gene families are protein clusters detected in the metagenomic data, reported in reads per kilobase, RPK).
  - Pathway abundances (how abundant each metabolic pathway is in the sample).
  - Pathway coverage (how completely each pathway is detected).
- Each table reports both community totals and the contribution of individual species.

The output tables are plain text, so they can be loaded directly into standard tools for statistical analysis, such as R or Python.
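
As a rough illustration of working with these tables in Python, the sketch below separates community totals from per-species rows. It assumes the usual layout of a HUMAnN gene families table, where a stratified row appends `|` and a taxon name to the feature name; the table contents and the helper name `split_stratified` are made up for illustration.

```python
import csv
import io

# Made-up table laid out like a HUMAnN gene families file (assumed format)
example_tsv = (
    "# Gene Family\tsample1_Abundance-RPKs\n"
    "UNMAPPED\t120.0\n"
    "UniRef90_EXAMPLE1\t30.0\n"
    "UniRef90_EXAMPLE1|g__Bacteroides.s__Bacteroides_vulgatus\t20.0\n"
    "UniRef90_EXAMPLE1|unclassified\t10.0\n"
)


def split_stratified(handle):
    """Separate community totals from per-species rows (hypothetical helper)."""
    reader = csv.reader(handle, delimiter="\t")
    next(reader)  # skip the header line
    totals, by_species = {}, {}
    for feature, value in reader:
        if "|" in feature:
            # Stratified row: "<gene family>|<taxon>"
            family, taxon = feature.split("|", 1)
            by_species[(family, taxon)] = float(value)
        else:
            totals[feature] = float(value)
    return totals, by_species


totals, by_species = split_stratified(io.StringIO(example_tsv))
print(totals)
print(by_species)
```

To read a real file, replace the `io.StringIO(...)` object with an open file handle, e.g. `open(path, newline="")`.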

#### Key Advantages of HUMAnN 3

- Comprehensive Functional Insight: HUMAnN 3 is designed to go beyond taxonomic composition to provide a deeper understanding of functional capacity and metabolic diversity within microbial communities. This is particularly important for microbiome research, where function often matters more than the identity of individual microbes.
- Flexibility and Customization: HUMAnN 3 is highly customizable, allowing users to select specific databases, taxonomic groups, or functional pathways to focus on. It also provides options for filtering out certain taxa or pathways, depending on the research question.
- Scalability: HUMAnN 3 is built to handle large metagenomic datasets, making it suitable for high-throughput microbiome studies and longitudinal analysis of microbiome function across different environments or conditions. It can process data from diverse sources, from human microbiomes to environmental samples.
- Integration with Other Tools: HUMAnN 3 can be integrated with other microbiome analysis tools and pipelines, such as QIIME 2 or Mothur, allowing users to combine taxonomic and functional data for more complete microbiome profiles.

### Applications of HUMAnN 3 in Microbiome Research

- Functional Microbiome Characterization: HUMAnN 3 is widely used to study the functional potential of microbial communities in the human gut, oral cavity, skin, and other environments. It helps identify key microbial pathways involved in digestion, metabolism, and immune modulation.
- Disease Associations: Researchers can use HUMAnN 3 to investigate how the functional profiles of microbiomes differ in disease states, such as inflammatory bowel disease (IBD), obesity, type 2 diabetes, and other conditions where microbial function plays a key role.
- Therapeutic Target Discovery: HUMAnN 3 can be used to identify microbial pathways that might be targeted for therapeutic interventions, such as probiotics, dietary changes, or antibiotics. By profiling the metabolic capabilities of microbiomes, it can also help identify biomarkers associated with different diseases or therapeutic responses.
- Environmental Microbiomes: In addition to human microbiomes, HUMAnN 3 is also useful for profiling environmental microbiomes (e.g., soil, water, and air) to assess microbial activity and its role in ecological processes.
- Comparative Microbiome Analysis: HUMAnN 3 enables the comparison of microbial communities across different groups, conditions, or time points, to understand how microbiomes change over time or respond to treatments or environmental changes.
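
Comparing samples usually requires correcting for differences in sequencing depth first. HUMAnN ships a `humann_renorm_table` utility for this purpose; the sketch below only illustrates the arithmetic behind a copies-per-million rescaling. The function name `to_copies_per_million` and the abundance values are made up for illustration.

```python
def to_copies_per_million(abundances):
    """Rescale one sample's abundances so that they sum to one million."""
    total = sum(abundances.values())
    if total == 0:
        raise ValueError("cannot normalize a sample with zero total abundance")
    return {name: value / total * 1_000_000 for name, value in abundances.items()}


# Two made-up samples with the same composition but different sequencing depth
sample_a = {"pathway_X": 5.0, "pathway_Y": 5.0, "pathway_Z": 10.0}
sample_b = {"pathway_X": 50.0, "pathway_Y": 50.0, "pathway_Z": 100.0}

cpm_a = to_copies_per_million(sample_a)
cpm_b = to_copies_per_million(sample_b)
print(cpm_a)
print(cpm_b)
```

After rescaling, the two samples have identical profiles, which is the property that makes between-sample comparisons meaningful.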

### Limitations of HUMAnN 3

- Database Dependency: While HUMAnN 3 integrates multiple databases, its performance and accuracy depend heavily on the completeness and quality of the underlying databases. Gaps in the databases can lead to incomplete or inaccurate functional annotations.
- Reference Bias: As with most functional annotation tools, HUMAnN 3 may exhibit reference bias, particularly when dealing with microbiomes from less-characterized or rare organisms. Some taxa or pathways may not be well-represented in the available reference databases.
- Computational Demands: Running HUMAnN 3 on large metagenomic datasets can be computationally intensive, requiring substantial memory and processing power, especially for complex microbial communities with high diversity.

#### Conclusion

HUMAnN 3 is a powerful tool for microbiome researchers who want to explore microbial functionality in addition to taxonomic composition. By providing detailed profiles of microbial metabolic pathways, functional genes, and protein families, HUMAnN 3 enables deeper insights into the biological roles that microbiomes play in health and disease. It is a key tool for studying complex microbial systems and discovering how microbial communities contribute to human health, disease states, and environmental interactions.

# Check out the slides for the workshop on HUMAnN 3

## The End

Copy your notebook for future reference...

In [None]:
!cp ~/be487-fall-2024/exercises/16_function/ex16_function.ipynb $work_dir