raw-lab/pathview-plus

Pathview-plus: Complete Pathway Visualization

Full-featured Python implementation of R pathview + SBGNview with support for KEGG, Reactome, MetaCyc, and more.

Python 3.10+


🎯 Features

Core Capabilities

  • ✅ KEGG Pathways: Download and visualize any KEGG pathway
  • ✅ SBGN Pathways: Support for Reactome, MetaCyc, PANTHER, SMPDB
  • ✅ Multiple Formats: PNG (native overlay), SVG (vector), PDF (graph layout)
  • ✅ Gene & Metabolite Data: Overlay expression and abundance data
  • ✅ Multi-Condition: Visualize multiple experiments side-by-side
  • ✅ ID Conversion: Automatic mapping between Entrez ↔ Symbol ↔ UniProt ↔ Ensembl
  • ✅ Highlighting: Post-hoc emphasis of specific nodes/edges/paths
  • ✅ Spline Curves: Smooth Bezier edge routing
  • ✅ Custom Colors: Configurable diverging color scales

New in v2.0

  • 🆕 Full SBGN-ML support: Parse and render SBGN Process Description files
  • 🆕 Database integration: Direct download from Reactome, MetaCyc
  • 🆕 SVG vector output: Scalable graphics for web and publication
  • 🆕 Highlighting system: ggplot2-style composable modifications
  • 🆕 Spline rendering: Cubic Bezier and Catmull-Rom curves

📦 Installation

Quick install

pip install pathview-plus

Custom install

# Clone repository
git clone https://github.com/raw-lab/pathview-plus
cd pathview-plus

# Install dependencies
pip install -r requirements.txt
pip install .

# Or install specific packages
pip install polars numpy matplotlib seaborn Pillow networkx requests

Dependencies:

  • Python ≥ 3.10
  • polars ≥ 0.19.0
  • matplotlib ≥ 3.7.0
  • seaborn ≥ 0.12.0
  • numpy ≥ 1.24.0
  • Pillow ≥ 10.0.0
  • networkx ≥ 3.1
  • requests ≥ 2.31.0

[Figure: workflow]


🚀 Quick Start

1. Basic KEGG Pathway

import polars as pl
from pathview import pathview

# Load your data
gene_data = pl.read_csv("gene_expr.tsv", separator="\t")

# Visualize on KEGG pathway
result = pathview(
    pathway_id="04110",      # Cell cycle
    gene_data=gene_data,
    species="hsa",
    output_format="png"
)

2. Reactome SBGN Pathway

from pathview import download_reactome, parse_sbgn, sbgn_to_df, pathview

# Download Reactome pathway
path = download_reactome("R-HSA-109582")  # Hemostasis

# Parse and visualize
pathway = parse_sbgn(path)
node_df = sbgn_to_df(pathway)

# Overlay data
result = pathview(
    pathway_id="R-HSA-109582",
    gene_data=gene_data,
    output_format="svg"  # Vector graphics
)

3. Multi-Condition Comparison

# Three experimental conditions
gene_data = pl.DataFrame({
    "entrez": ["1956", "2099", "5594", "207"],
    "Control": [0.5, -0.3, 1.2, -0.8],
    "Treatment_A": [2.1, -1.5, 0.4, 1.3],
    "Treatment_B": [1.8, -0.9, 2.3, 0.7],
})

result = pathview(
    pathway_id="04010",  # MAPK signaling
    gene_data=gene_data,
    species="hsa",
    limit={"gene": 2.5, "cpd": 1.5},
)
# Each node shows 3 color bands (one per condition)
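
One way to picture the multi-band nodes: the node rectangle is divided into equal-width vertical bands, one per condition column. The sketch below only illustrates that idea. `split_into_bands` is a hypothetical helper, not part of the package, and the actual renderer may lay the bands out differently.

```python
def split_into_bands(x, y, width, height, n_conditions):
    """Split a node rectangle into equal-width vertical bands.

    (x, y) is the top-left corner. Returns one (x, y, width, height)
    tuple per condition, ordered left to right.
    Hypothetical helper for illustration only.
    """
    if n_conditions < 1:
        raise ValueError("n_conditions must be at least 1")
    band_width = width / n_conditions
    return [
        (x + i * band_width, y, band_width, height)
        for i in range(n_conditions)
    ]


# A 46 x 17 node split for Control, Treatment_A, Treatment_B.
bands = split_into_bands(100.0, 50.0, 46.0, 17.0, n_conditions=3)
for band in bands:
    print(band)
```

Each band would then be filled with the color computed for its condition.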

4. Custom Color Schemes

result = pathview(
    pathway_id="04151",
    gene_data=gene_data,
    species="hsa",
    low={"gene": "#2166AC", "cpd": "#4575B4"},   # Blue
    mid={"gene": "#F7F7F7", "cpd": "#F7F7F7"},   # White
    high={"gene": "#D6604D", "cpd": "#B2182B"},  # Red
)

📖 Complete Examples

Example 1: Gene Symbol IDs

gene_data = pl.DataFrame({
    "symbol": ["TP53", "EGFR", "KRAS", "PIK3CA", "AKT1"],
    "log2fc": [-1.8, 2.4, 1.1, 1.5, 0.9],
})

result = pathview(
    pathway_id="04151",
    gene_data=gene_data,
    species="hsa",
    gene_idtype="SYMBOL",  # Automatic conversion to Entrez
)

Example 2: Combined Gene + Metabolite

from pathview import sim_mol_data

gene_data = sim_mol_data(mol_type="gene", species="hsa", n_mol=80)
cpd_data = sim_mol_data(mol_type="cpd", n_mol=30)

result = pathview(
    pathway_id="00010",  # Glycolysis
    gene_data=gene_data,
    cpd_data=cpd_data,
    species="hsa",
    low={"gene": "green", "cpd": "blue"},
    high={"gene": "red", "cpd": "yellow"},
)

Example 3: SVG Vector Output

result = pathview(
    pathway_id="04110",
    gene_data=gene_data,
    species="hsa",
    output_format="svg",  # Scalable vector graphics
)
# Output: hsa04110.pathview.svg
# - Scalable without quality loss
# - Smaller file size
# - Editable in Inkscape/Illustrator

Example 4: Graph Layout (No PNG Background)

result = pathview(
    pathway_id="04010",
    gene_data=gene_data,
    species="hsa",
    kegg_native=False,     # Use NetworkX layout
    output_format="pdf",
)
# Output: hsa04010.pathview.pdf
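
The two output names shown in Examples 3 and 4 suggest a naming pattern of species code, pathway number, suffix, and format. The helper below reproduces that pattern as a sketch. `output_filename` is a hypothetical name, and the pattern is inferred from those two filenames only, so the package may build names differently in other cases.

```python
def output_filename(species, pathway_id, out_suffix="pathview", output_format="png"):
    """Build an output name like 'hsa04110.pathview.svg'.

    Hypothetical helper; the pattern is inferred from the example
    filenames in this README, not taken from the package source.
    """
    return f"{species}{pathway_id}.{out_suffix}.{output_format}"


print(output_filename("hsa", "04110", output_format="svg"))
print(output_filename("hsa", "04010", output_format="pdf"))
```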

Example 5: Highlighting (API Preview)

from pathview import highlight_nodes, highlight_path

result = pathview("04010", gene_data=gene_data)

# Composable modifications (ggplot2-style)
highlighted = (result
               + highlight_nodes(["1956", "2099"], color="red", width=4)
               + highlight_path(["1956", "2099", "5594"], color="orange"))

highlighted.save("highlighted.png")

Example 6: Spline Curves

from pathview import cubic_bezier, catmull_rom_spline
import matplotlib.pyplot as plt

# Smooth Bezier curve
curve = cubic_bezier((0,0), (1,2), (3,2), (4,0), n_points=100)

plt.plot(curve[:, 0], curve[:, 1], linewidth=2)
plt.title("Bezier Curve Edge Routing")
plt.savefig("bezier_example.png")

Example 7: Batch Processing

pathways = ["04110", "04010", "04151", "00010"]

for pw_id in pathways:
    try:
        result = pathview(
            pathway_id=pw_id,
            gene_data=gene_data,
            species="hsa",
            out_suffix=f"batch_{pw_id}",
        )
        print(f"✓ Completed {pw_id}")
    except Exception as e:
        print(f"✗ Failed {pw_id}: {e}")

🖥️ Command Line Interface

# Basic usage
python pathview_cli.py --pathway-id 04110 --gene-data expr.tsv

# Specify species and ID type
python pathview_cli.py \
    --pathway-id 04110 \
    --species hsa \
    --gene-data expr.tsv \
    --gene-idtype SYMBOL

# Custom colors
python pathview_cli.py \
    --pathway-id 04010 \
    --gene-data expr.tsv \
    --low-gene '#2166AC' \
    --high-gene '#D6604D' \
    --output-format svg

# Simulate data (for testing)
python pathview_cli.py \
    --pathway-id 04110 \
    --simulate \
    --n-sim 200

# Display KEGG legend
python pathview_cli.py --legend

CLI Arguments:

Pathway:
  --pathway-id ID          KEGG pathway number (e.g., '04110')

Input data:
  --gene-data TSV          Gene expression file (TSV)
  --cpd-data TSV           Compound abundance file (TSV)
  --gene-idtype TYPE       Gene ID type: ENTREZ, SYMBOL, UNIPROT, ENSEMBL
  --cpd-idtype TYPE        Compound ID type: KEGG, PUBCHEM, CHEBI

Species & paths:
  --species CODE           KEGG species code (default: hsa)
  --kegg-dir DIR           Directory for files (default: .)
  --out-suffix SUFFIX      Output filename suffix (default: pathview)

Rendering:
  --kegg-native            Use KEGG PNG background (default: True)
  --output-format FORMAT   Output format: png, pdf, svg (default: png)
  --map-symbol             Replace Entrez with symbols (default: True)
  --node-sum METHOD        Aggregation: sum, mean, median, max
  --no-signature           Suppress watermark
  --no-col-key             Suppress color legend

Color scale:
  --limit-gene FLOAT       Color scale limit (default: 1.0)
  --bins-gene INT          Color bins (default: 10)
  --low-gene COLOR         Low-end color (default: green)
  --mid-gene COLOR         Mid-point color (default: gray)
  --high-gene COLOR        High-end color (default: red)
  --low-cpd COLOR          Low compound color (default: blue)
  --high-cpd COLOR         High compound color (default: yellow)

Utilities:
  --legend                 Display KEGG legend and exit
  --simulate               Generate simulated data
  --n-sim INT              Number of simulated molecules (default: 200)
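
For readers who want to wrap or test the CLI, a handful of the flags listed above can be modeled with `argparse`. This is a minimal sketch, not the actual contents of `pathview_cli.py`. Only the flag names and defaults come from the list above, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser():
    """Model a subset of the documented flags (sketch, not the real CLI)."""
    parser = argparse.ArgumentParser(prog="pathview_cli.py")
    parser.add_argument("--pathway-id", help="KEGG pathway number, e.g. '04110'")
    parser.add_argument("--gene-data", help="Gene expression file (TSV)")
    parser.add_argument("--species", default="hsa", help="KEGG species code")
    parser.add_argument("--output-format", choices=["png", "pdf", "svg"], default="png")
    parser.add_argument("--limit-gene", type=float, default=1.0)
    parser.add_argument("--simulate", action="store_true")
    parser.add_argument("--n-sim", type=int, default=200)
    return parser


# Parse an explicit argument list instead of sys.argv.
args = build_parser().parse_args(
    ["--pathway-id", "04110", "--simulate", "--output-format", "svg"]
)
print(args.pathway_id, args.species, args.output_format, args.n_sim)
```

Dashes in flag names become underscores on the parsed namespace, so `--pathway-id` is read as `args.pathway_id`.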

📊 Input File Formats

Gene Data (TSV)

First column = gene IDs, remaining columns = numeric expression values.

entrez	Control	Treatment_A	Treatment_B
1956	2.31	0.45	1.82
2099	-1.14	-0.88	0.33
5594	0.72	1.33	-0.51
207	-0.88	1.21	0.94
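
To check a file against this layout before passing it to `pathview`, it can be read with the standard library alone. The sketch below assumes a tab-separated file with a header row. `read_gene_tsv` is a hypothetical helper and is independent of how the package itself loads data.

```python
import csv
import io

# Two rows of the example table above, as tab-separated text.
TSV_TEXT = (
    "entrez\tControl\tTreatment_A\tTreatment_B\n"
    "1956\t2.31\t0.45\t1.82\n"
    "2099\t-1.14\t-0.88\t0.33\n"
)


def read_gene_tsv(handle):
    """Return (condition_names, {gene_id: [values...]}).

    The first column is kept as a string ID; every other column must
    parse as a float, otherwise ValueError is raised.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    conditions = header[1:]
    values = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        values[row[0]] = [float(cell) for cell in row[1:]]
    return conditions, values


conditions, values = read_gene_tsv(io.StringIO(TSV_TEXT))
print(conditions)
print(values)
```

Keeping the IDs as strings avoids losing leading zeros or mixing integer and string keys later on.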

Gene Symbols

gene_symbol	log2fc	p_value
TP53	-1.8	0.001
EGFR	2.4	0.0001
KRAS	1.1	0.01

Compound Data (TSV)

kegg	abundance
C00031	1.45
C00118	-0.83
C00022	2.11

🎨 Color Scale Configuration

Three-Point Diverging Scale

pathview(
    pathway_id="04110",
    gene_data=data,
    limit={"gene": 2.0, "cpd": 1.5},      # ±2.0 for genes, ±1.5 for compounds
    bins={"gene": 20, "cpd": 10},          # Color resolution
    low={"gene": "blue", "cpd": "green"},
    mid={"gene": "white", "cpd": "gray"},
    high={"gene": "red", "cpd": "yellow"},
)

The scale maps:

  • low value → low color (default: green/blue)
  • 0 → mid color (default: gray)
  • high value → high color (default: red/yellow)
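
The mapping above can be sketched as a clamp, a binning step, and a two-segment color interpolation. This is an illustration under stated assumptions, not the package's implementation: `diverging_color` is a hypothetical function, the hex values stand in for the named default colors, and snapping to `bins` equal steps is a simplification of whatever binning the package actually uses.

```python
def hex_to_rgb(color):
    """Convert '#RRGGBB' to an (r, g, b) tuple of ints."""
    color = color.lstrip("#")
    return tuple(int(color[i:i + 2], 16) for i in (0, 2, 4))


def blend(start, end, t):
    """Linear interpolation between two RGB tuples, t in [0, 1]."""
    return tuple(round(s + (e - s) * t) for s, e in zip(start, end))


def diverging_color(value, limit=1.0, bins=10,
                    low="#00FF00", mid="#BEBEBE", high="#FF0000"):
    """Map a value to an RGB color on a low/mid/high scale (sketch)."""
    # Clamp so values beyond +/-limit saturate at the end colors.
    clamped = max(-limit, min(limit, value))
    # Position on the scale: 0.0 at -limit, 0.5 at zero, 1.0 at +limit.
    position = (clamped + limit) / (2 * limit)
    # Snap to one of `bins` equal steps to get discrete colors.
    step = round(position * bins) / bins
    if step <= 0.5:
        return blend(hex_to_rgb(low), hex_to_rgb(mid), step / 0.5)
    return blend(hex_to_rgb(mid), hex_to_rgb(high), (step - 0.5) / 0.5)


for v in (-5.0, -0.5, 0.0, 0.5, 1.0):
    print(v, diverging_color(v))
```

Raising `limit` spreads the same colors over a wider value range, while raising `bins` produces finer color steps.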

One-Directional Scale

both_dirs={"gene": False, "cpd": False}
# Maps: 0 (mid) → max (high)

🗂️ Supported ID Types

Gene IDs

Type      gene_idtype value   Example
Entrez    ENTREZ              1956
Symbol    SYMBOL              EGFR
UniProt   UNIPROT             P00533
Ensembl   ENSEMBL             ENSG00000146648
KEGG      KEGG                hsa:1956

Compound IDs

Type      cpd_idtype value    Example
KEGG      KEGG                C00031
PubChem   PUBCHEM             5793
ChEBI     CHEBI               4167

🧬 Supported Databases

KEGG

  • Format: KGML (XML)
  • Species: 500+ organisms
  • Download: Automatic via KEGG REST API
  • Example: pathway_id="hsa04110"
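
For reference, the public KEGG REST service exposes a pathway's KGML and image under `https://rest.kegg.jp/get/<id>/kgml` and `https://rest.kegg.jp/get/<id>/image`. The sketch below only builds those URLs and makes no network request. `kegg_urls` is a hypothetical helper, and how the package itself constructs or caches these requests is not stated here.

```python
KEGG_REST = "https://rest.kegg.jp"


def kegg_urls(pathway_id, species="hsa"):
    """Return the KGML and image URLs for a pathway (sketch).

    Accepts either a bare number ('04110') or a full ID ('hsa04110').
    """
    if pathway_id[0].isdigit():
        full_id = f"{species}{pathway_id}"
    else:
        full_id = pathway_id
    return {
        "kgml": f"{KEGG_REST}/get/{full_id}/kgml",
        "image": f"{KEGG_REST}/get/{full_id}/image",
    }


print(kegg_urls("04110"))
print(kegg_urls("mmu04110"))
```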

Reactome

  • Format: SBGN-ML
  • Species: Human, mouse, rat, and more
  • Download: download_reactome("R-HSA-109582")
  • Example: Hemostasis, Immune System, Signaling

MetaCyc

  • Format: SBGN-ML
  • Coverage: 2,800+ metabolic pathways
  • Download: download_metacyc("PWY-7210")
  • Example: Pyrimidine biosynthesis

PANTHER

  • Format: SBGN-ML
  • Coverage: 177 signaling and metabolic pathways
  • Note: Manual download required

SMPDB

  • Format: SBGN-ML
  • Coverage: Small molecule pathways
  • Note: Manual download from website

🏗️ Architecture

pathview/
├── __init__.py           # Public API exports
├── constants.py          # Type definitions
├── utils.py              # String/numeric utilities
│
├── id_mapping.py         # Gene/compound ID conversion
├── mol_data.py           # Data aggregation, simulation
│
├── kegg_api.py           # KEGG REST API
├── databases.py          # Reactome, MetaCyc downloaders
│
├── kgml_parser.py        # KEGG KGML (XML) parser
├── sbgn_parser.py        # SBGN-ML (XML) parser
│
├── color_mapping.py      # Colormaps, node coloring
├── node_mapping.py       # Map data onto nodes
│
├── rendering.py          # PNG/PDF renderers
├── svg_rendering.py      # SVG vector renderer
├── highlighting.py       # Post-hoc modifications
├── splines.py            # Bezier curve math
│
└── pathview.py           # Core orchestrator

pathview_cli.py           # Command-line interface
requirements.txt          # Dependencies
README.md                 # This file

Module Statistics:

  • 15 modules | 3,506 lines of code
  • Functional programming style
  • Full type hints
  • Comprehensive docstrings

🔧 API Reference

Core Function

pathview(
    pathway_id: str,
    gene_data: Optional[pl.DataFrame] = None,
    cpd_data: Optional[pl.DataFrame] = None,
    species: str = "hsa",
    kegg_dir: Path = ".",
    kegg_native: bool = True,
    output_format: str = "png",  # "png", "pdf", "svg"
    gene_idtype: str = "ENTREZ",
    cpd_idtype: str = "KEGG",
    out_suffix: str = "pathview",
    node_sum: str = "sum",
    map_symbol: bool = True,
    map_null: bool = True,
    min_nnodes: int = 3,
    new_signature: bool = True,
    plot_col_key: bool = True,
    # Color scale parameters
    limit: dict = {"gene": 1.0, "cpd": 1.0},
    bins: dict = {"gene": 10, "cpd": 10},
    both_dirs: dict = {"gene": True, "cpd": True},
    low: dict = {"gene": "green", "cpd": "blue"},
    mid: dict = {"gene": "gray", "cpd": "gray"},
    high: dict = {"gene": "red", "cpd": "yellow"},
    na_col: str = "transparent",
) -> dict

Data Functions

sim_mol_data(mol_type="gene", species="hsa", n_mol=100, n_exp=1) → pl.DataFrame
mol_sum(mol_data, id_map, sum_method="sum") → pl.DataFrame
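
When several input IDs map to the same pathway node, their values have to be combined into one number per node. The sketch below shows that aggregation with plain dictionaries for the four methods named in the CLI (`sum`, `mean`, `median`, `max`). `aggregate_by_node` is a hypothetical stand-in, and it does not reproduce the `mol_sum` signature or its DataFrame handling.

```python
from collections import defaultdict
from statistics import mean, median

# Aggregation methods named by --node-sum in the CLI section.
SUMMARIZERS = {"sum": sum, "mean": mean, "median": median, "max": max}


def aggregate_by_node(mol_values, id_map, method="sum"):
    """Combine per-molecule values into one value per node (sketch).

    mol_values: {molecule_id: value}
    id_map:     {molecule_id: node_id}
    Molecules missing from id_map are ignored.
    """
    if method not in SUMMARIZERS:
        raise ValueError(f"unknown method: {method}")
    grouped = defaultdict(list)
    for mol_id, value in mol_values.items():
        node_id = id_map.get(mol_id)
        if node_id is not None:
            grouped[node_id].append(value)
    summarize = SUMMARIZERS[method]
    return {node_id: summarize(vals) for node_id, vals in grouped.items()}


values = {"1956": 2.0, "2099": -1.0, "5594": 0.5, "207": 9.9}
mapping = {"1956": "node1", "2099": "node1", "5594": "node2"}
print(aggregate_by_node(values, mapping, method="sum"))
print(aggregate_by_node(values, mapping, method="mean"))
```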

ID Mapping

id2eg(ids, category, org="Hs") → pl.DataFrame
eg2id(eg_ids, category="SYMBOL", org="Hs") → pl.DataFrame
cpd_id_map(in_ids, in_type, out_type="KEGG") → pl.DataFrame

Parsing

# KEGG
parse_kgml(filepath) → KGMLPathway
node_info(pathway) → pl.DataFrame

# SBGN
parse_sbgn(filepath) → SBGNPathway
sbgn_to_df(pathway) → pl.DataFrame
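
To give a feel for what a KGML parser extracts, the sketch below reads `entry` elements and their `graphics` child from a toy KGML fragment using the standard library. It is not the package's `parse_kgml`. `kgml_nodes` is a hypothetical function, the fragment is made-up sample data, and the returned dictionaries are an assumed shape rather than the `KGMLPathway` type.

```python
import xml.etree.ElementTree as ET

# Toy fragment in KGML layout (sample data, not a real pathway file).
KGML_TEXT = """<pathway name="path:hsa04110" org="hsa" number="04110" title="Cell cycle">
  <entry id="1" name="hsa:1029 hsa:1030" type="gene">
    <graphics name="CDKN2A, CDKN2B" type="rectangle" x="532" y="218" width="46" height="17"/>
  </entry>
  <entry id="2" name="cpd:C00031" type="compound">
    <graphics name="C00031" type="circle" x="100" y="80" width="8" height="8"/>
  </entry>
</pathway>"""


def kgml_nodes(xml_text):
    """Return one dict per entry that has a graphics element (sketch)."""
    root = ET.fromstring(xml_text)
    nodes = []
    for entry in root.iter("entry"):
        graphics = entry.find("graphics")
        if graphics is None:
            continue
        nodes.append({
            "entry_id": entry.get("id"),
            "type": entry.get("type"),
            # One entry can list several space-separated IDs.
            "ids": entry.get("name", "").split(),
            "label": graphics.get("name"),
            "x": float(graphics.get("x")),
            "y": float(graphics.get("y")),
            "width": float(graphics.get("width")),
            "height": float(graphics.get("height")),
        })
    return nodes


nodes = kgml_nodes(KGML_TEXT)
for node in nodes:
    print(node["entry_id"], node["type"], node["ids"], node["x"], node["y"])
```

The coordinates in `graphics` are what allow colors to be drawn over the matching positions of the KEGG PNG.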

Database Downloads

download_kegg(pathway_id, species="hsa", kegg_dir=".") → dict
download_reactome(pathway_id, output_dir=".") → Path
download_metacyc(pathway_id, output_dir=".") → Path
list_reactome_pathways(species="Homo sapiens") → list[dict]
detect_database(pathway_id) → str
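
The ID examples in this README differ enough in shape that a database can often be guessed from the ID alone. The sketch below does this with regular expressions. It is an assumption about how such detection could work, not the logic of `detect_database`: the function name `guess_database`, the patterns, and the returned strings are all hypothetical, and real MetaCyc IDs are more varied than the `PWY-` form matched here.

```python
import re

# Patterns inferred from the example IDs in this README (assumptions).
PATTERNS = [
    ("reactome", re.compile(r"^R-[A-Z]{3}-\d+$")),        # e.g. R-HSA-109582
    ("metacyc", re.compile(r"^PWY[0-9A-Z]*-\d+$")),       # e.g. PWY-7210
    ("kegg", re.compile(r"^(?:[a-z]{2,4})?\d{5}$")),      # e.g. 04110, hsa04110
]


def guess_database(pathway_id):
    """Return a database name guessed from the ID shape, or 'unknown'."""
    for name, pattern in PATTERNS:
        if pattern.match(pathway_id):
            return name
    return "unknown"


for pid in ("R-HSA-109582", "PWY-7210", "04110", "hsa04110", "not-an-id"):
    print(pid, "->", guess_database(pid))
```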

Highlighting

# API design (full implementation in progress)
result = pathview(...)
highlighted = result + highlight_nodes(["1956", "2099"], color="red")
highlighted.save("output.png")
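
The `+` composition used here can be modeled with `__add__` on an immutable result object, where each addition returns a new object carrying one more highlight layer. The sketch below is a toy model of that design, not the package's code. `Figure`, `Highlight`, and the `*_sketch` functions are hypothetical names, and no image is rendered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Highlight:
    """One highlight request (hypothetical type)."""
    kind: str          # "nodes" or "path"
    targets: tuple
    color: str = "red"
    width: int = 2


@dataclass(frozen=True)
class Figure:
    """Stand-in for a rendered result that accepts highlight layers."""
    pathway_id: str
    layers: tuple = ()

    def __add__(self, other):
        if not isinstance(other, Highlight):
            return NotImplemented
        # Return a new Figure so the original stays unchanged.
        return Figure(self.pathway_id, self.layers + (other,))


def highlight_nodes_sketch(node_ids, color="red", width=2):
    return Highlight("nodes", tuple(node_ids), color, width)


def highlight_path_sketch(node_ids, color="red", width=2):
    return Highlight("path", tuple(node_ids), color, width)


base = Figure("04010")
styled = (base
          + highlight_nodes_sketch(["1956", "2099"], color="red", width=4)
          + highlight_path_sketch(["1956", "2099", "5594"], color="orange"))
print(len(base.layers), len(styled.layers))
```

Because every `+` returns a new object, the same base result can be reused to produce several differently highlighted variants.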

Splines

cubic_bezier(p0, p1, p2, p3, n_points=50) → np.ndarray
quadratic_bezier(p0, p1, p2, n_points=50) → np.ndarray
catmull_rom_spline(points, n_points=50, alpha=0.5) → np.ndarray
route_edge_spline(source, target, obstacles, mode="orthogonal") → np.ndarray
bezier_to_svg_path(curve, close=False) → str
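
As background for these functions, a cubic Bezier curve is evaluated from its four control points with the Bernstein form B(t) = (1-t)³·p0 + 3(1-t)²t·p1 + 3(1-t)t²·p2 + t³·p3 for t in [0, 1]. The sketch below samples that formula in pure Python and writes the samples as an SVG polyline path. The function names are hypothetical, and unlike the package functions listed above they return plain lists rather than NumPy arrays.

```python
def cubic_bezier_points(p0, p1, p2, p3, n_points=50):
    """Sample a cubic Bezier curve at n_points evenly spaced t values."""
    if n_points < 2:
        raise ValueError("n_points must be at least 2")
    points = []
    for i in range(n_points):
        t = i / (n_points - 1)
        u = 1.0 - t
        # Bernstein weights for the four control points.
        w0, w1, w2, w3 = u**3, 3 * u * u * t, 3 * u * t * t, t**3
        x = w0 * p0[0] + w1 * p1[0] + w2 * p2[0] + w3 * p3[0]
        y = w0 * p0[1] + w1 * p1[1] + w2 * p2[1] + w3 * p3[1]
        points.append((x, y))
    return points


def points_to_svg_path(points, close=False):
    """Format sampled points as an SVG path string (M ... L ... [Z])."""
    first, rest = points[0], points[1:]
    parts = [f"M {first[0]:.2f} {first[1]:.2f}"]
    parts += [f"L {x:.2f} {y:.2f}" for x, y in rest]
    if close:
        parts.append("Z")
    return " ".join(parts)


# Same control points as Example 6.
curve = cubic_bezier_points((0, 0), (1, 2), (3, 2), (4, 0), n_points=5)
print(curve)
print(points_to_svg_path(curve))
```

The curve starts at `p0` and ends at `p3`, while `p1` and `p2` only pull the curve toward them, which is what makes the form convenient for smooth edge routing.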

📈 Performance

  • KEGG pathways: ~2-5 seconds (download + render)
  • SBGN pathways: ~3-8 seconds (more complex)
  • Multi-condition: Scales linearly with the number of conditions
  • Batch processing: Parallel processing possible
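
Since each pathway is handled independently in the batch example, one possible way to parallelize is a worker pool that collects successes and failures separately. The sketch below uses a thread pool with a stand-in worker, so it runs without the package installed. `render_stub` and `run_batch` are hypothetical names. Whether `pathview` itself is safe to call from several threads is not stated in this README, so a process pool may be the safer choice for real rendering.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def render_stub(pathway_id):
    """Stand-in for a pathview(...) call; returns a fake output name."""
    if not pathway_id.isdigit():
        raise ValueError(f"bad pathway id: {pathway_id}")
    return f"hsa{pathway_id}.pathview.png"


def run_batch(pathway_ids, worker, max_workers=4):
    """Run worker on every ID in parallel; return (done, failed) dicts."""
    done, failed = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(worker, pw_id): pw_id for pw_id in pathway_ids}
        for future in as_completed(futures):
            pw_id = futures[future]
            try:
                done[pw_id] = future.result()
            except Exception as exc:
                # One failing pathway should not stop the rest.
                failed[pw_id] = str(exc)
    return done, failed


done, failed = run_batch(["04110", "04010", "04151", "oops"], render_stub)
print(sorted(done))
print(failed)
```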

Optimization tips:

  • Cache downloaded files (automatic)
  • Use output_format="svg" for faster rendering
  • Disable color key for batch jobs: plot_col_key=False

🤝 Contributing

Contributions welcome! Areas for improvement:

  1. SBGN rendering: Improve glyph shape variety
  2. Edge routing: Implement A* pathfinding for splines
  3. Database integration: Add PANTHER, SMPDB auto-download
  4. Highlighting: Wire up image modification backend
  5. Performance: Parallel pathway processing

📄 License

Creative Commons Attribution-NonCommercial (CC BY-NC 4.0). See the LICENSE file.

Citations:

If you are publishing results obtained using Pathview-Plus, please cite:

  • Pre-Print Pathview-Plus: Figueroa III JL, Brouwer CR, White III RA. 2026. Pathview-plus: unlocking the metabolic pathways from cells to ecosystems. bioRxiv.

If you are using the R version, please cite:

  • Original Pathview R: Luo, W., & Brouwer, C. 2013. Pathview: an R/Bioconductor package for pathway-based data integration and visualization. Bioinformatics, 29(14), 1830–1831.
  • Original SBGNview R: Shashikant, T., et al. 2022. SBGNview: Data analysis, integration and visualization on all pathways using SBGN. Bioinformatics, 38(11), 3006–3008.

Contributing to Pathview-plus

We welcome contributions from other experts who want to expand the features of Pathview-plus, in both the R and Python versions. Please contact us via support.


📞 Support


Made with ❤️ for the pathway visualization community
