Functional Profiling Pipeline (HUMAnN3)

This notebook runs a functional profiling workflow:

- Install HUMAnN3
- Set configuration (input reads, databases, output directories)
- Download databases (ChocoPhlAn + UniRef)
- Merge paired reads
- Run HUMAnN3 functional profiling
- Post-process outputs (join + normalize tables)

# Install HUMAnN3
Colab does not preinstall conda, so adjust these commands to the package manager available in your runtime.

In [None]:
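# Shell commands need a leading "!" to run from a notebook code cell.
# Register the channels that HUMAnN3 and its dependencies are installed from.
!conda config --add channels defaults
!conda config --add channels bioconda
!conda config --add channels conda-forge
!conda config --add channels biobakery

!conda install -y humann -c biobakery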
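
After the install cell finishes, it can help to confirm that the commands used later in this notebook are actually on `PATH`. The following is a minimal sketch; the helper name `missing_tools` is an illustration, not part of HUMAnN3.

```python
import shutil

def missing_tools(tools=("humann", "humann_databases",
                         "humann_join_tables", "humann_normalize_table")):
    """Return the commands from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]

missing = missing_tools()
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("All HUMAnN3 commands are available.")
```

If anything is reported as missing, rerun the install step before continuing.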


# 1. Set configuration
Define input reads, databases, and output directories.

In [None]:
import os

READ1 = "/content/data/sample_R1.fastq.gz"
READ2 = "/content/data/sample_R2.fastq.gz"

OUTPUT_DIR = "/content/humann_output"
MERGED_DIR = os.path.join(OUTPUT_DIR, "merged_reads")
os.makedirs(MERGED_DIR, exist_ok=True)

DB_DIR = "/content/humann_databases"
CHOCO_DB = os.path.join(DB_DIR, "chocophlan")
UNIREF_DB = os.path.join(DB_DIR, "uniref")
os.makedirs(CHOCO_DB, exist_ok=True)
os.makedirs(UNIREF_DB, exist_ok=True)

THREADS = "4"
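
The merged file name used later (`sample_merged.fastq.gz`) follows from the read file names. One possible way to derive it instead of typing it by hand is sketched below. The helper `merged_path` and the assumption that forward reads end in `_R1.fastq.gz` are illustrative only.

```python
import os

def merged_path(read1, merged_dir, r1_suffix="_R1.fastq.gz"):
    """Build the merged-reads path for a sample from its forward-read path."""
    name = os.path.basename(read1)
    if not name.endswith(r1_suffix):
        raise ValueError(f"{name!r} does not end with {r1_suffix!r}")
    sample = name[: -len(r1_suffix)]  # e.g. "sample"
    return os.path.join(merged_dir, sample + "_merged.fastq.gz")

print(merged_path("/content/data/sample_R1.fastq.gz",
                  "/content/humann_output/merged_reads"))
```

Deriving the name this way keeps the notebook consistent if `READ1` is pointed at a different sample.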


# 2. Database setup
Download the ChocoPhlAn and UniRef databases. The downloads are large, so skip this cell if the databases are already present.


In [None]:
# ChocoPhlAn (full pangenome database); {CHOCO_DB} comes from the configuration cell
!humann_databases --download chocophlan full {CHOCO_DB} --update-config yes

# UniRef50 (swap in uniref90_diamond to use UniRef90 instead)
!humann_databases --download uniref uniref50_diamond {UNIREF_DB} --update-config yes
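
The "already present" check can be made explicit before triggering a download. This is a rough sketch: the helper `needs_download` is hypothetical and only tests whether a directory is missing or empty, so it cannot detect a partial or corrupted download.

```python
import os

def needs_download(db_dir):
    """True if db_dir does not exist or contains no entries."""
    return not os.path.isdir(db_dir) or not os.listdir(db_dir)

# Paths match the configuration cell of this notebook.
for db_dir in ("/content/humann_databases/chocophlan",
               "/content/humann_databases/uniref"):
    status = "download needed" if needs_download(db_dir) else "already present"
    print(db_dir, "->", status)
```

Run the corresponding `humann_databases` command only for the directories reported as needing a download.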


# 3. Merge paired reads
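Combine forward and reverse reads into one file. HUMAnN3 does not use pairing information, so the two files are simply concatenated; concatenated gzip files are still a valid gzip stream.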


In [None]:
# READ1, READ2 and MERGED_DIR come from the configuration cell
!cat {READ1} {READ2} > {MERGED_DIR}/sample_merged.fastq.gz
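
The same concatenation can be done in Python, which also makes it easy to check that the result still decompresses as a single stream. The sketch below uses a hypothetical helper `concat_files` and toy FASTQ records written to a temporary directory, not real data.

```python
import gzip
import os
import shutil
import tempfile

def concat_files(sources, dest):
    """Append the raw bytes of each source file to dest, in order."""
    with open(dest, "wb") as out:
        for src in sources:
            with open(src, "rb") as fh:
                shutil.copyfileobj(fh, out)

# Self-check with two toy gzip-compressed FASTQ files.
with tempfile.TemporaryDirectory() as tmp:
    r1 = os.path.join(tmp, "toy_R1.fastq.gz")
    r2 = os.path.join(tmp, "toy_R2.fastq.gz")
    with gzip.open(r1, "wt") as fh:
        fh.write("@read1/1\nACGT\n+\nIIII\n")
    with gzip.open(r2, "wt") as fh:
        fh.write("@read1/2\nTGCA\n+\nIIII\n")

    merged = os.path.join(tmp, "toy_merged.fastq.gz")
    concat_files([r1, r2], merged)

    # Reading the merged file yields the records of both inputs.
    with gzip.open(merged, "rt") as fh:
        n_lines = sum(1 for _ in fh)
    print(n_lines)  # 8 (two 4-line FASTQ records)
```

For the real data, the call would be `concat_files([READ1, READ2], ...)` with the merged path under `MERGED_DIR`.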

# 4. Run HUMAnN3
Run functional profiling on merged reads.

In [None]:
# MERGED_DIR, OUTPUT_DIR and THREADS come from the configuration cell
!humann \
  --input {MERGED_DIR}/sample_merged.fastq.gz \
  --output {OUTPUT_DIR} \
  --threads {THREADS}
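
When the run needs to be scripted (for example, looping over several samples), the same command can be assembled as an argument list. This is a minimal sketch; `humann_command` is a hypothetical helper, and it only uses the three options shown in the cell above.

```python
import shlex

def humann_command(input_path, output_dir, threads):
    """Build the humann argument list used in this notebook."""
    return [
        "humann",
        "--input", input_path,
        "--output", output_dir,
        "--threads", str(threads),
    ]

cmd = humann_command(
    "/content/humann_output/merged_reads/sample_merged.fastq.gz",
    "/content/humann_output",
    4,
)
print(shlex.join(cmd))  # the command as it would be typed in a shell
```

Passing `cmd` to `subprocess.run(cmd, check=True)` would execute it and raise an error if HUMAnN3 exits with a non-zero status.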

# 5. Post-processing
Join and normalize HUMAnN3 output tables.

In [None]:
# Join per-sample tables; OUTPUT_DIR comes from the configuration cell
!humann_join_tables --input {OUTPUT_DIR} --output {OUTPUT_DIR}/genefamilies.tsv --file_name genefamilies
!humann_join_tables --input {OUTPUT_DIR} --output {OUTPUT_DIR}/pathabundance.tsv --file_name pathabundance

# Normalize: gene families to copies per million, pathways to relative abundance
!humann_normalize_table --input {OUTPUT_DIR}/genefamilies.tsv --output {OUTPUT_DIR}/genefamilies_cpm.tsv --units cpm
!humann_normalize_table --input {OUTPUT_DIR}/pathabundance.tsv --output {OUTPUT_DIR}/pathabundance_relab.tsv --units relab
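
To make the two units concrete, the arithmetic behind them can be sketched for a single sample: `relab` divides each value by the column total, and `cpm` scales that fraction by one million. The function and feature names below are made up for illustration, and the sketch ignores how `humann_normalize_table` treats stratified rows and special features such as `UNMAPPED`.

```python
def normalize(abundances, units="cpm"):
    """Sum-normalize one sample's abundances to 'relab' or 'cpm'."""
    if units not in ("cpm", "relab"):
        raise ValueError("units must be 'cpm' or 'relab'")
    total = sum(abundances.values())
    if total == 0:
        raise ValueError("cannot normalize a sample with zero total abundance")
    scale = 1_000_000 if units == "cpm" else 1.0
    return {feature: value / total * scale
            for feature, value in abundances.items()}

toy = {"feature_A": 1.0, "feature_B": 3.0}  # hypothetical feature names
print(normalize(toy, "relab"))  # {'feature_A': 0.25, 'feature_B': 0.75}
print(normalize(toy, "cpm"))    # {'feature_A': 250000.0, 'feature_B': 750000.0}
```

Both units make samples with different sequencing depths comparable; they differ only in scale.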
