PathwayDB

A lightweight Python library for querying and storing biological pathway and gene set annotations from major databases.

Perfect for:

🧬 Gene set enrichment analysis (GSEA)
🔬 Pathway annotation and analysis
📊 Functional genomics workflows
🧪 Bioinformatics pipelines
📈 Integration with pandas/R for downstream analysis

Why PathwayDB?

No Dependencies Hassle: Pure Python stdlib - no compilation, no conflicts, works everywhere
Offline-First: Download once, query forever - perfect for HPC clusters without internet
Fast: Millisecond queries on local SQLite databases
DataFrame-Friendly: Export directly to pandas format for analysis (like clusterProfiler in R)
Simple API: Intuitive methods that feel natural for bioinformaticians
Well-Documented: Clear examples and comprehensive documentation

What's New in v0.2.0

🎉 Major update with game-changing features!

🔍 Search by Description: Filter pathways/terms by name instead of remembering IDs

# KEGG
cancer = kegg.filter(pathway_name='cancer')

# GO
dna_repair = go.filter(term_name='DNA repair')

# MSigDB
apoptosis = msigdb.filter(gene_set_name='apoptosis')

⚡ Instant GO Term Names: ~1.5 MB mapping bundled with package - no downloads needed!
```
go.download_annotations(species='human')  # Term names included automatically!
```

💾 Centralized Caching: Download once, use across all projects

go = GO.from_cache(species='human')  # Loads from shared cache

📊 Complete DataFrame Export: All databases support pandas-compatible export

df_data = kegg.to_dataframe()  # GeneID, PATH, Annot
df_data = go.to_dataframe()    # GeneID, TERM, Aspect, Evidence
df_data = msigdb.to_dataframe()  # GeneID, GeneSet, Collection, Description

See CHANGELOG.md for complete details.

Features

✅ Multiple Database Support: KEGG, Gene Ontology (GO), and MSigDB
✅ Zero External Dependencies: Uses only Python standard library
✅ Description-Based Filtering: Search by pathway/term names, not just IDs
✅ Bundled GO Term Names: ~1.5 MB mapping included for instant term name access
✅ Local SQLite Storage: Download once, query offline forever
✅ DataFrame Export: Export to pandas-compatible format (like clusterProfiler)
✅ Smart Caching: HTTP response caching and centralized annotation cache
✅ Rate Limiting: Built-in rate limiting for respectful API usage
✅ Gene ID Conversion: Convert between Entrez, Symbol, Ensembl, and UniProt IDs
✅ Fast Queries: Millisecond-level queries on local databases

Installation

From Source

git clone https://github.com/guokai8/pathwaydb.git
cd pathwaydb
pip install -e .

From PyPI (coming soon)

pip install pathwaydb

Quick Start

KEGG Pathways

from pathwaydb import KEGG

# Initialize KEGG client with local storage
kegg = KEGG(species='hsa', storage_path='kegg_human.db')

# Download all pathway annotations (first time only)
# Automatically includes pathway hierarchy (Level1, Level2, Level3)!
kegg.download_annotations()
# Output: Downloaded 8,000+ pathway-gene annotations
#         Downloading KEGG pathway hierarchy...
#         ✓ Updated 354 pathways with hierarchy information
kegg.convert_ids_to_symbols()

# Query pathways for a specific gene
results = kegg.query_by_gene('TP53')
print(f"TP53 is in {len(results)} pathways")
# Output: TP53 is in 73 pathways

for pathway in results[:3]:
    print(f"  {pathway.pathway_id}: {pathway.pathway_name}")
# Output:
#   hsa05200: Pathways in cancer
#   hsa04115: p53 signaling pathway
#   hsa04110: Cell cycle

# Filter by pathway name (case-insensitive substring match)
cancer_pathways = kegg.filter(pathway_name='cancer')
print(f"Found {len(cancer_pathways)} cancer-related annotations")
# Output: Found 2,389 cancer-related annotations

# Combine filters: specific gene + pathway name
tp53_cancer = kegg.filter(gene_symbols=['TP53'], pathway_name='cancer')
print(f"TP53 in {len(tp53_cancer)} cancer pathway annotations")
# Output: TP53 in 15 cancer pathway annotations

# Get database statistics
stats = kegg.stats()
print(stats)
# Output: {'total_annotations': 8234, 'unique_genes': 7894, 'unique_pathways': 354}

# Export to DataFrame format (includes hierarchy!)
df_data = kegg.to_dataframe()
# Returns: [{'GeneID': 'TP53', 'PATH': 'hsa05200', 'Annot': 'Pathways in cancer',
#            'Level1': 'Human Diseases', 'Level2': 'Cancer: overview', 'Level3': 'Pathways in cancer'}, ...]

KEGG Pathway Hierarchy

Pathway annotations include hierarchical classification from KEGG BRITE:

Level	Description	Example
Level1	Top-level category	Metabolism, Human Diseases, Cellular Processes
Level2	Sub-category	Carbohydrate metabolism, Cancer, Cell growth
Level3	Pathway name	Glycolysis, Pathways in cancer, Cell cycle

# Access hierarchy in DataFrame export
import pandas as pd
df = pd.DataFrame(kegg.to_dataframe())
print(df[['GeneID', 'PATH', 'Level1', 'Level2']].head())
#    GeneID       PATH           Level1                  Level2
# 0    TP53  hsa05200  Human Diseases         Cancer: overview
# 1    TP53  hsa04115  Human Diseases  Cancer: specific types
# 2    TP53  hsa04110  Cellular Processes    Cell growth and death


### Gene Ontology (GO)

PathwayDB offers multiple ways to build and use GO annotations:

#### Method 1: Download Fresh (Recommended for Production)

```python
from pathwaydb import GO

# Initialize GO client with local storage
go = GO(storage_path='go_human.db')

# Download GO annotations (first time only)
# Term names are automatically populated!
go.download_annotations(species='human')
# Output: Downloading GO annotations for human...
#         Populating names for 18,000+ GO terms...
#         ✓ All GO term names populated successfully!

# Check database statistics
print(go.stats())
# {'total_annotations': 500000+, 'unique_genes': 20000+, 'unique_terms': 18000+}

Method 2: Use Centralized Cache (Share Across Projects)

from pathwaydb import GO

# Load from cache - downloads automatically if not cached
go = GO.from_cache(species='human')
# Uses ~/.pathwaydb_cache/go_annotations/go_human_cached.db

# Or manually download to cache first
from pathwaydb import download_to_cache, load_from_cache
download_to_cache(species='human')  # Download once
db = load_from_cache(species='human')  # Reuse in any project

Method 3: Auto-Detect Best Source

from pathwaydb import GO

# Automatically uses best available source:
# 1. Bundled package data (instant, if available)
# 2. User cache (~/.pathwaydb_cache/)
# 3. Download fresh (if nothing found)
go = GO.load(species='human')

Querying GO Annotations

# Query GO terms for a specific gene
annotations = go.query_by_gene('BRCA1')
print(f"BRCA1 has {len(annotations)} GO annotations")
# Output: BRCA1 has 156 GO annotations

# Term names are already available!
for ann in annotations[:3]:
    print(f"  {ann.go_id}: {ann.term_name} [{ann.evidence_code}]")
# Output:
#   GO:0006281: DNA repair [IBA]
#   GO:0006355: regulation of transcription, DNA-templated [TAS]
#   GO:0005515: protein binding [IPI]

# Filter by term name (case-insensitive substring match)
dna_repair = go.filter(term_name='DNA repair')
apoptosis = go.filter(term_name='apoptosis')
print(f"Found {len(dna_repair)} DNA repair annotations")

# Filter by namespace (biological_process, molecular_function, cellular_component)
bp_terms = go.filter(namespace='biological_process')
print(f"Biological Process annotations: {len(bp_terms)}")

# Filter by evidence codes (experimental evidence only)
exp_annotations = go.filter(evidence_codes=['EXP', 'IDA', 'IPI', 'IMP'])
print(f"Experimental evidence: {len(exp_annotations)}")

# Combine filters: TP53 + term name + experimental evidence
tp53_dna_exp = go.filter(
    gene_symbols=['TP53'],
    term_name='DNA',
    evidence_codes=['EXP', 'IDA']
)
print(f"TP53 DNA-related (experimental): {len(tp53_dna_exp)}")

# Export to DataFrame format
df_data = go.to_dataframe()
# Returns: [{'GeneID': 'BRCA1', 'TERM': 'GO:0006281', 'Aspect': 'P', 'Evidence': 'IBA'}, ...]

Term Name Sources

GO term names are populated automatically from bundled package data (instant, no download needed!).

The package includes ~1.5 MB of pre-compiled GO term name mappings, so term names are available immediately after downloading annotations.

# Default behavior: uses bundled data (instant!)
go.download_annotations(species='human')  # Term names included automatically

# Skip term names if you don't need them
go.download_annotations(species='human', fetch_term_names=False)

# Manually populate term names with different sources
go.populate_term_names(source='bundled')  # Use bundled data only (default, instant)
go.populate_term_names(source='obo')      # Download GO OBO file (~35MB)
go.populate_term_names(source='auto')     # Try bundled > OBO > QuickGO API
go.populate_term_names(source='quickgo')  # Use QuickGO API only (slow)

MSigDB Gene Sets

from pathwaydb import MSigDB

# Initialize MSigDB client
msigdb = MSigDB(storage_path='msigdb.db')

# Download specific collections
msigdb.download_collection('H')  # Hallmark gene sets
msigdb.download_collection('C2')  # Curated gene sets (KEGG, Reactome, etc.)

# NEW: Filter by gene set name (case-insensitive substring match)
apoptosis_sets = msigdb.filter(gene_set_name='apoptosis')
print(f"Found {len(apoptosis_sets)} apoptosis gene sets")
# Output: Found 15 apoptosis gene sets

# Filter by description
immune_sets = msigdb.filter(description='immune')
print(f"Found {len(immune_sets)} immune-related gene sets")

# Filter by collection
hallmark_sets = msigdb.filter(collection='H')
print(f"Found {len(hallmark_sets)} Hallmark gene sets")

# Query gene sets containing specific genes
tp53_sets = msigdb.filter(gene_symbols=['TP53'])
print(f"TP53 in {len(tp53_sets)} gene sets")

# Combine filters
hallmark_interferon = msigdb.filter(
    collection='H',
    gene_set_name='interferon'
)
print(f"Hallmark interferon sets: {len(hallmark_interferon)}")

# Export to DataFrame format
df_data = msigdb.to_dataframe(collection='H')
# Returns: [{'GeneID': 'TP53', 'GeneSet': 'HALLMARK_APOPTOSIS', 'Collection': 'H', 'Description': '...'}, ...]

Gene ID Conversion

from pathwaydb import IDConverter

# Initialize converter
converter = IDConverter(species='human')

# Convert single ID
symbol = converter.entrez_to_symbol('7157')  # Returns 'TP53'

# Batch conversion
entrez_ids = ['7157', '675', '4609']
symbols = converter.batch_convert(entrez_ids, from_type='entrez', to_type='symbol')

# Multiple ID types supported
ensembl_id = converter.symbol_to_ensembl('TP53')
uniprot_id = converter.symbol_to_uniprot('TP53')

Database Information

KEGG (Kyoto Encyclopedia of Genes and Genomes)

Coverage: 500+ organisms, 500+ pathways per species
Content: Metabolic, signaling, disease pathways
Update: Manually curated, regularly updated
Species codes: 'hsa' (human), 'mmu' (mouse), 'rno' (rat), etc.

GO (Gene Ontology)

Coverage: Thousands of species
Content: Biological processes, molecular functions, cellular components
Update: Continuously updated by consortium
Hierarchy: DAG structure with parent-child relationships

MSigDB (Molecular Signatures Database)

Collections:
- H: Hallmark gene sets (50 sets)
- C1: Positional gene sets
- C2: Curated gene sets (KEGG, Reactome, BioCarta, etc.)
- C3: Regulatory target gene sets
- C4: Computational gene sets
- C5: Gene Ontology gene sets
- C6: Oncogenic signatures
- C7: Immunologic signatures
- C8: Cell type signatures

Advanced Usage

Working with Local Databases

from pathwaydb.storage import KEGGAnnotationDB

# Load existing database
db = KEGGAnnotationDB('kegg_human.db')

# Query with filters - search by pathway name (case-insensitive substring match)
results = db.filter(pathway_name='cancer')
print(f"Found {len(results)} annotations in cancer-related pathways")
# Output: Found 2389 annotations in cancer-related pathways

# Combine multiple filters
cancer_tp53 = db.filter(pathway_name='cancer', gene_symbols=['TP53'])
print(f"TP53 in {len(cancer_tp53)} cancer pathways")
# Output: TP53 in 15 cancer pathways

# Other filter options
metabolism = db.filter(pathway_name='metabolism')
specific_genes = db.filter(gene_symbols=['TP53', 'BRCA1', 'EGFR'])
specific_pathways = db.filter(pathway_ids=['hsa04110', 'hsa04115'])

# Export to different formats
records = db.to_records()  # List of dicts
gene_sets = db.to_gene_sets()  # For enrichment tools

# Database statistics
stats = db.stats()
print(f"Total annotations: {stats['total_annotations']}")
print(f"Unique pathways: {stats['unique_pathways']}")
print(f"Unique genes: {stats['unique_genes']}")

Centralized GO Caching (NEW in v0.2.0)

Download GO annotations once and reuse across all your projects:

from pathwaydb import GO

# Option 1: Load from cache (auto-downloads if missing)
go = GO.from_cache(species='human')  # Uses ~/.pathwaydb_cache/go_annotations/

# Option 2: Smart load - auto-detects best source
# Tries: bundled package data > cache > download
go = GO.load(species='human')

# Option 3: Manually download to cache first
from pathwaydb.storage.go_db import download_to_cache
download_to_cache(species='human')  # Download once
go = GO.from_cache(species='human')  # Reuse in any project

See GO_CACHE_GUIDE.md for complete caching documentation.

Custom HTTP Caching

from pathwaydb import KEGG

# Use custom HTTP cache directory
kegg = KEGG(
    species='hsa',
    cache_dir='/path/to/custom/cache',
    storage_path='kegg.db'
)

Batch Operations

from pathwaydb import KEGG

kegg = KEGG(species='hsa', storage_path='kegg.db')

# Download and convert IDs in one step
kegg.download_annotations()
kegg.convert_ids_to_symbols()  # Convert Entrez IDs to gene symbols

# Query multiple genes
genes = ['TP53', 'BRCA1', 'EGFR']
for gene in genes:
    pathways = kegg.query_by_gene(gene)
    print(f"{gene}: {len(pathways)} pathways")

DataFrame Export for Enrichment Analysis

NEW FEATURE: Export annotations in tabular format compatible with pandas DataFrame and enrichment tools (similar to clusterProfiler in R).

Direct Export from Connectors

from pathwaydb import KEGG, GO
import pandas as pd

# KEGG - Export to DataFrame format
kegg = KEGG(species='hsa', storage_path='kegg_human.db')
df_data = kegg.to_dataframe()  # Get all annotations

# Convert to pandas DataFrame
df = pd.DataFrame(df_data)
print(df.head())

Output:

     GeneID      PATH                                  Annot
0       A2M  hsa04610  Complement and coagulation cascades
1      NAT1  hsa00232                    Caffeine metabolism
2      NAT1  hsa00983        Drug metabolism - other enzymes
3      NAT1  hsa01100                     Metabolic pathways
4      NAT2  hsa00232                    Caffeine metabolism

DataFrame Format Specifications

KEGG DataFrame columns:

GeneID: Gene symbol (e.g., 'TP53')
PATH: Pathway ID (e.g., 'hsa04110')
Annot: Pathway name/description

GO DataFrame columns:

GeneID: Gene symbol (e.g., 'BRCA1')
TERM: GO term ID (e.g., 'GO:0006281')
Aspect: P (biological_process), F (molecular_function), C (cellular_component)
Evidence: Evidence code (e.g., 'EXP', 'IDA', 'IEA')

MSigDB DataFrame columns:

GeneID: Gene symbol (e.g., 'TP53')
GeneSet: Gene set name (e.g., 'HALLMARK_APOPTOSIS')
Collection: Collection code (e.g., 'H', 'C2')
Description: Gene set description

Analysis Examples with pandas

# Get KEGG annotations
kegg = KEGG(species='hsa', storage_path='kegg_human.db')
df = pd.DataFrame(kegg.to_dataframe())

# Save to CSV
df.to_csv('kegg_annotations.csv', index=False)

# Filter for specific gene
tp53_pathways = df[df['GeneID'] == 'TP53']
print(f"TP53 pathways: {len(tp53_pathways)}")

# Find all genes in cancer-related pathways
cancer_df = df[df['Annot'].str.contains('cancer', case=False)]
cancer_genes = cancer_df['GeneID'].unique()
print(f"Genes in cancer pathways: {len(cancer_genes)}")

# Get pathway sizes
pathway_sizes = df.groupby('PATH')['GeneID'].count()
print(pathway_sizes.head())

# GO annotations
go = GO(storage_path='go_human.db')
df_go = pd.DataFrame(go.to_dataframe())

# Filter biological processes only
bp_df = df_go[df_go['Aspect'] == 'P']

# Get genes with experimental evidence
exp_df = df_go[df_go['Evidence'].isin(['EXP', 'IDA', 'IPI', 'IMP'])]
print(f"Annotations with experimental evidence: {len(exp_df)}")

# Create gene-to-term mapping
gene_to_terms = df_go.groupby('GeneID')['TERM'].apply(list).to_dict()

# MSigDB gene sets
msigdb = MSigDB(storage_path='msigdb.db')
df_msigdb = pd.DataFrame(msigdb.to_dataframe(collection='H'))

# Find genes in specific gene sets
apoptosis_genes = df_msigdb[df_msigdb['GeneSet'].str.contains('APOPTOSIS', case=False)]
print(f"Genes in apoptosis gene sets: {len(apoptosis_genes)}")

# Get all gene sets for a specific gene
tp53_sets = df_msigdb[df_msigdb['GeneID'] == 'TP53']['GeneSet'].unique()
print(f"TP53 is in {len(tp53_sets)} gene sets")

Use with Enrichment Analysis Tools

# Prepare background gene set
all_genes = df['GeneID'].unique()

# Prepare pathway gene sets for enrichment
pathway_dict = df.groupby('PATH').apply(
    lambda x: {
        'genes': x['GeneID'].tolist(),
        'name': x['Annot'].iloc[0]
    }
).to_dict()

# Your gene list of interest
my_genes = ['TP53', 'BRCA1', 'EGFR', 'MYC', 'KRAS']

# Find enriched pathways (simple overlap example)
for pathway_id, info in pathway_dict.items():
    overlap = set(my_genes) & set(info['genes'])
    if overlap:
        print(f"{pathway_id}: {info['name']} - {len(overlap)} genes")

Architecture

PathwayDB follows a clean 3-layer architecture:

Connectors Layer (pathwaydb/connectors/): API clients for external databases
Storage Layer (pathwaydb/storage/): SQLite-backed local storage with query interfaces
HTTP Layer (pathwaydb/http/): Centralized HTTP client with caching and rate limiting

Key Design Principles

No external dependencies: Easier deployment, fewer conflicts
Caching by default: Respectful of API servers, faster repeat queries
Separation of concerns: Connectors and storage are independent
Extensible: Easy to add new databases following existing patterns

Performance

Initial download: 1-5 minutes depending on database size
Subsequent queries: Milliseconds (SQLite local queries)
Memory footprint: Low (streaming downloads, efficient storage)
Storage size:
- KEGG (human): ~8 MB
- MSigDB (all collections): ~77 MB
- GO (human): ~50 MB

Species Support

KEGG

Use organism codes: hsa (human), mmu (mouse), rno (rat), dme (fly), cel (worm), sce (yeast), etc.

GO (Gene Ontology)

Supported model organisms:

Category	Species	Name
Mammals	`human`	Homo sapiens
	`mouse`	Mus musculus
	`rat`	Rattus norvegicus
	`pig`	Sus scrofa
	`cow`	Bos taurus
	`dog`	Canis familiaris
	`chicken`	Gallus gallus
Fish	`zebrafish`	Danio rerio
Invertebrates	`fly`	Drosophila melanogaster
	`worm`	Caenorhabditis elegans
Plants	`arabidopsis`	Arabidopsis thaliana
Fungi	`yeast`	Saccharomyces cerevisiae

from pathwaydb import get_supported_species

# List all supported species
print(get_supported_species())
# ['arabidopsis', 'chicken', 'cow', 'dog', 'fly', 'human', 'mouse', 'pig', 'rat', 'worm', 'yeast', 'zebrafish']

# Download for any supported species
go = GO(storage_path='go_fly.db')
go.download_annotations(species='fly')

MSigDB

Use common names: human, mouse.

Development

Running Tests

# Install dev dependencies
pip install -e ".[dev]"

# Run tests
pytest tests/

# With coverage
pytest --cov=pathwaydb tests/

Code Formatting

# Format with black
black pathwaydb/

# Lint with flake8
flake8 pathwaydb/

# Type checking
mypy pathwaydb/

Documentation

Guides

Feature Guides:

DATABASE_FILTERING_GUIDE.md - Complete filtering guide for all databases
GO_TERM_NAME_GUIDE.md - GO term name filtering
GO_CACHE_GUIDE.md - Centralized caching system
GO_TERM_NAMES_PACKAGING.md - How bundled term names work

Developer Guides:

CLAUDE.md - Architecture and development guidelines
PACKAGING_GUIDE.md - Building and packaging instructions
CONTRIBUTING.md - Contribution guidelines

API Reference

Main Classes:

KEGG(species, storage_path, cache_dir) - KEGG pathway database client
GO(storage_path, cache_dir) - Gene Ontology client
- GO.from_cache(species) - Load from centralized cache
- GO.load(species) - Auto-detect best source (bundled > cache > download)
MSigDB(storage_path, cache_dir) - MSigDB gene sets client
IDConverter(species, cache_path) - Gene ID converter

Key Methods:

download_annotations() - Download and store annotations (auto-populates term names for GO)
query_by_gene(gene) - Query annotations for a specific gene
to_dataframe(limit) - Export to pandas-compatible format
filter(**criteria) - Filter annotations by various criteria
- KEGG: pathway_name, gene_symbols, pathway_ids, organism
- GO: term_name, gene_symbols, go_ids, namespace, evidence_codes
- MSigDB: gene_set_name, description, gene_symbols, collection
stats() - Get database statistics
populate_term_names() - Manually populate GO term names (uses bundled data)

Storage Classes:

KEGGAnnotationDB(db_path) - Direct access to KEGG storage
GOAnnotationDB(db_path) - Direct access to GO storage

Package Data Functions:

load_go_term_names() - Load bundled GO term name mapping
download_to_cache(species) - Download GO annotations to centralized cache
load_from_cache(species) - Load GO annotations from cache

For detailed architecture and development guidelines, see CLAUDE.md.

Examples

See the examples/ directory for comprehensive usage examples:

examples/quickstart.py - Basic usage for all databases
examples/dataframe_export.py - DataFrame export and analysis
examples/go_filter_examples.py - GO filtering examples
test_go_cache.py - Centralized caching examples
test_msigdb_filter.py - MSigDB filtering examples

Contributing

Contributions are welcome! Here are some ways to contribute:

Add new database connectors (WikiPathways, STRING, DisGeNET, etc.)
Improve documentation
Add tests
Report bugs
Suggest features

See CONTRIBUTING.md for contribution guidelines and CLAUDE.md for detailed development guidelines.

License

MIT License - see LICENSE file for details

Citation

If you use PathwayDB in your research, please cite:

@software{pathwaydb,
  title = {PathwayDB: A Lightweight Pathway Annotation Toolkit},
  author = {Guo, Kai},
  year = {2026},
  url = {https://github.com/guokai8/pathwaydb}
}

Acknowledgments

KEGG: Kanehisa, M. et al. (2023) KEGG for taxonomy-based analysis
GO: Gene Ontology Consortium (2023) The Gene Ontology knowledgebase
MSigDB: Liberzon, A. et al. (2015) The Molecular Signatures Database
MyGene.info: Used for gene ID conversion

Support

Issues: GitHub Issues
Documentation: CLAUDE.md
Email: guokai8@gmail.com

Roadmap

Version 0.2.0 (Released):

✅ Description-based filtering for KEGG, GO, and MSigDB
✅ Bundled GO term name mapping (~1.5 MB)
✅ Automatic term name population
✅ Centralized caching system
✅ Enhanced DataFrame export for all databases
✅ Unified filtering API across databases

Version 0.3.0 (Planned):

WikiPathways connector
Batch download utilities
Comprehensive test suite
Performance optimizations

Future Considerations (based on user feedback):

STRING protein-protein interactions
DisGeNET disease-gene associations
Human Phenotype Ontology (HPO)
Integration helpers for GSEA/enrichR
REST API server mode
Command-line interface (CLI)

Want to contribute? See CONTRIBUTING.md for how to add new database connectors!

Related Projects

mygene - Gene annotation queries
bioservices - Comprehensive bio web services
gprofiler - Functional enrichment analysis
gseapy - GSEA in Python

Made with ❤️ for the bioinformatics community

Name		Name	Last commit message	Last commit date
Latest commit History 20 Commits
examples		examples
pathwaydb		pathwaydb
scripts		scripts
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
CONTRIBUTING.md		CONTRIBUTING.md
DATABASE_FILTERING_GUIDE.md		DATABASE_FILTERING_GUIDE.md
FILTERING_SUMMARY.md		FILTERING_SUMMARY.md
FILTER_GUIDE.md		FILTER_GUIDE.md
GO_CACHE_GUIDE.md		GO_CACHE_GUIDE.md
GO_TERM_NAMES_PACKAGING.md		GO_TERM_NAMES_PACKAGING.md
GO_TERM_NAME_GUIDE.md		GO_TERM_NAME_GUIDE.md
LICENSE		LICENSE
MANIFEST.in		MANIFEST.in
PACKAGING_GUIDE.md		PACKAGING_GUIDE.md
README.md		README.md
requirements-dev.txt		requirements-dev.txt
setup.py		setup.py
test_go_term_name_filter.py		test_go_term_name_filter.py
test_msigdb_filter.py		test_msigdb_filter.py

Folders and files

Latest commit

History

Repository files navigation

PathwayDB

Why PathwayDB?

Table of Contents

What's New in v0.2.0

Features

Installation

From Source

From PyPI (coming soon)

Quick Start

KEGG Pathways

KEGG Pathway Hierarchy

Method 2: Use Centralized Cache (Share Across Projects)

Method 3: Auto-Detect Best Source

Querying GO Annotations

Term Name Sources

MSigDB Gene Sets

Gene ID Conversion

Database Information

KEGG (Kyoto Encyclopedia of Genes and Genomes)

GO (Gene Ontology)

MSigDB (Molecular Signatures Database)

Advanced Usage

Working with Local Databases

Centralized GO Caching (NEW in v0.2.0)

Custom HTTP Caching

Batch Operations

DataFrame Export for Enrichment Analysis

Direct Export from Connectors

DataFrame Format Specifications

Analysis Examples with pandas

Use with Enrichment Analysis Tools

Architecture

Key Design Principles

Performance

Species Support

KEGG

GO (Gene Ontology)

MSigDB

Development

Running Tests

Code Formatting

Documentation

Guides

API Reference

Examples

Contributing

License

Citation

Acknowledgments

Support

Roadmap

Related Projects

About

Resources

License

Contributing

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages