
korem-lab/copangraph


copangraph: Comparative pan-metagenomic graph

Copangraph is designed to represent the pan-metagenomic content across multiple metagenomic samples as a sequence graph. By doing so, it provides the foundation of a "comparative metagenomic" framework to associate microbial genomic signatures to host (or microbiome) phenotypes.

Dependencies

  • Anaconda-2025-06

Downloading and installing copangraph

Copangraph is implemented in C++20. To download, clone copangraph with

git clone https://github.com/korem-lab/copangraph.git

Installation depends on conda; all other dependencies are installed in the copangraph conda environment.

To construct the copangraph conda environment, first change into the cloned repository:

cd copangraph

From here, run

conda env create --file environments/copangraph.yaml

which creates the copangraph environment and installs into it all dependencies required for compilation and runtime. Please consult environments/copangraph.yaml for the complete list of dependencies with version numbers.

To compile copangraph, activate the conda environment

conda activate copangraph

and then compile with

snakemake -c 1 -s compile_code.smk

This runs a snakemake routine that compiles copangraph from source. On a standard laptop, compilation takes under 5 minutes. Successful compilation produces four executables in the following directory structure:

./bin/
  release/
    copangraph
    extension
  debug/
    copangraph
    extension
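
To confirm that compilation produced all four executables, a quick check along the following lines may help. This is a minimal sketch, assuming it is run from the repository root; the function name check_binaries is hypothetical and not part of copangraph.

```shell
#!/usr/bin/env bash
# check_binaries: hypothetical helper, not part of copangraph.
# Reports every expected executable that is missing or not executable.
check_binaries() {
    local root="${1:-./bin}"   # directory that holds release/ and debug/
    local missing=0
    local build exe
    for build in release debug; do
        for exe in copangraph extension; do
            if [ ! -x "$root/$build/$exe" ]; then
                echo "missing: $root/$build/$exe" >&2
                missing=1
            fi
        done
    done
    # Exit status 0 only when all four executables were found.
    return "$missing"
}
```

For example, `check_binaries ./bin && echo "all four executables found"` prints the message only when the release and debug builds are both complete.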

Running copangraph

Paired-end extension

Copangraph requires paired-end extended contigs, one set constructed for each sample, as input. Paired-end extended contigs are constructed for each sample as follows:

# First assemble each sample with a metagenomic assembler. Currently, we support MEGAHIT:
megahit -t 4 -1 sampleA_1.fastq.gz -2 sampleA_2.fastq.gz -o sampleA_asm

# Next, construct an index from the sample's assembly and map the sample's reads to the assembly, writing the mappings to bam format.
bowtie2-build --threads 4 sampleA_asm/final.contigs.fa sampleA_asm/sampleA_idx
bowtie2 --threads 4 -x sampleA_asm/sampleA_idx -1 sampleA_1.fastq.gz -2 sampleA_2.fastq.gz | samtools view -@ 2 -bS -h - > sampleA_asm/sampleA_mapping.bam

# Next, sort the mappings by read name.
samtools sort -n -@ 4 -o sampleA_asm/sampleA_sorted_mapping.bam sampleA_asm/sampleA_mapping.bam

# Then run paired-end extension. 
copangraph/bin/release/extension -t 4 -i sampleA_asm/final.contigs.fa -b sampleA_asm/sampleA_sorted_mapping.bam --pe-only -o extended_contigs -n sampleA

# This will result in a file, ./extended_contigs/sampleA.pe_ext.fasta, which can be input into copangraph.
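
When several samples have been extended this way, their outputs can be gathered into the list file that a multi-sample run expects (a file of absolute paths, one per line, passed to -s). The following is a minimal sketch, assuming all extended contigs were written to the same output directory; the function name write_sample_list and the list file name are hypothetical.

```shell
#!/usr/bin/env bash
# write_sample_list: hypothetical helper, not part of copangraph.
# Writes the absolute path of every *.pe_ext.fasta file in a directory
# to a list file, one path per line.
write_sample_list() {
    local ext_dir="$1"     # e.g. ./extended_contigs
    local list_file="$2"   # e.g. ./sample_list.txt (name is an assumption)
    local abs_dir f
    # Resolve the directory to an absolute path.
    abs_dir="$(cd "$ext_dir" && pwd)" || return 1
    for f in "$abs_dir"/*.pe_ext.fasta; do
        # Skip the unexpanded pattern when no file matches.
        if [ -f "$f" ]; then
            printf '%s\n' "$f"
        fi
    done > "$list_file"
}

# Example usage (paths are placeholders):
#   write_sample_list ./extended_contigs ./sample_list.txt
```

The resulting file can then be given to -s in place of a single fasta path.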

Demo

We have included a small demo file to show how to run copangraph and to explain its output. The demo runs quickly (under 5 minutes). Note that to construct a multi-sample copangraph, rather than passing the path of a single paired-end extended fasta file to -s, pass a file containing the absolute paths of the paired-end extended fasta files, one file path per line. To run copangraph on the demo, execute

./bin/release/copangraph  -s demo/simple_test.pe_ext.fasta -g demo -o demo/ -t 2 -d 0.02

This runs copangraph using two threads (-t 2) and collapses homologous sequences with 98% sequence identity (-d 0.02; 1 - 0.02 = 0.98). Once the demo completes, the following output files will be written to the demo folder:

- demo.gfa : the copangraph in gfa format
- demo.fasta : a multi-fasta file, each element being a sequence assigned to a copangraph node
- demo.ncolor.gfa : the node (color) occurrence file, describing the sample occurrence in each node
- demo.ecolor.gfa : the edge (color) occurrence file, describing the sample occurrence in each edge
- demo.log : a log file of the copangraph run
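
The file list above can be turned into a quick post-run check. The following is a minimal sketch, assuming the output files are named after the value given to -g and written to the directory given to -o, as in the demo; the function name check_outputs is hypothetical and not part of copangraph.

```shell
#!/usr/bin/env bash
# check_outputs: hypothetical helper, not part of copangraph.
# Verifies that the five output files listed for the demo exist.
check_outputs() {
    local out_dir="$1"   # directory passed to -o (e.g. demo)
    local name="$2"      # name passed to -g (e.g. demo); assumed to be the file prefix
    local missing=0
    local suffix
    for suffix in gfa fasta ncolor.gfa ecolor.gfa log; do
        if [ ! -f "$out_dir/$name.$suffix" ]; then
            echo "missing: $out_dir/$name.$suffix" >&2
            missing=1
        fi
    done
    # Exit status 0 only when every expected file was found.
    return "$missing"
}

# Example usage after running the demo:
#   check_outputs demo demo && echo "all demo outputs present"
```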

Citing copangraph

Coleman I. et al. Comparative metagenomics using pan-metagenomic graphs. bioRxiv (2025). https://doi.org/10.1101/2025.09.07.674724
