# Gen Python API Tutorial

This notebook demonstrates how to use the Gen Python API for graph-based genomic data management with git-like versioning capabilities.

## Setup and Initialization

First, let's import the necessary modules and initialize a repository.

In [2]:
import gen
from gen.helpers import Repository

# Get the default database path
default_db_path = gen.get_gen_db_path()
print(f"Default database path: {default_db_path}")

# Open a repository from the default path
repo = Repository()
print(f"Repository initialized with database at: {repo.db_path}")

Default database path: /Users/bvh/Git/gen/.gen/gen.db
Repository initialized with database at: /Users/bvh/Git/gen/.gen/default.db


## Working with Graphs

Gen organizes genomic data in graphs, which are grouped into named collections. Let's list the available collections and load a graph from one of them.

In [3]:
# List all available collections
collections = repo.query("SELECT DISTINCT collection_name FROM block_groups")
print("Available collections:")
for collection in collections:
    print(f"- {collection[0]}")

# If there are collections available, get the first one
collection_name = collections[0][0] if collections else None

if collection_name:
    # Get a graph from the collection
    graph = repo.get_graph(collection_name=collection_name)
    print(f"\nLoaded graph: {graph}")
    
    # Get graph as dictionary
    graph_dict = graph.as_dict()
    print(f"\nGraph has {len(graph_dict.get('nodes', []))} nodes and {len(graph_dict.get('edges', []))} edges")
else:
    print("\nNo collections available. You need to create or import a graph first.")

Available collections:
- default
- default/foo

Loaded graph: <gen.helpers.Graph object at 0x105d05fd0>

Graph has 3 nodes and 2 edges


## Direct SQL Queries

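Gen allows direct SQL queries against the underlying database: `repo.query` returns the result rows as a list of lists, and `repo.execute` runs statements that return no results.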

In [5]:
# Execute a query that doesn't return results
# repo.execute("CREATE INDEX IF NOT EXISTS idx_nodes_block_id ON nodes(block_id)")

# Query that returns results
# BUG: queries do not return integer values such as IDs and counts
# (they come back empty; see the output below)
result = repo.query("SELECT COUNT(*) FROM block_groups")
print(f"Number of block groups: {result[0][0]}")



Number of block groups: 
Node statistics: Count=, Min ID=, Max ID=


In [8]:
repo.query("SELECT * FROM block_groups")

[['', 'default', 'NULL', 'm123'],
 ['', 'default', 'unknown', 'm123'],
 ['', 'default', 'G1', 'm123'],
 ['', 'default', 'foo', 'm123'],
 ['', 'default', 'test', 'm123'],
 ['', 'default/foo', 'NULL', 'm123']]
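
For comparison, here is a minimal sketch of the same kind of queries run through Python's standard `sqlite3` module. It assumes the Gen database is an ordinary SQLite file; the in-memory table and its column names (`id`, `collection_name`, `sample_name`, `name`) are hypothetical stand-ins inferred from the output above, not Gen's actual schema.

```python
import sqlite3

# Hypothetical stand-in for the Gen database: an in-memory SQLite table whose
# columns are guessed from the rows printed above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE block_groups ("
    "id INTEGER PRIMARY KEY, collection_name TEXT, sample_name TEXT, name TEXT)"
)
conn.executemany(
    "INSERT INTO block_groups (collection_name, sample_name, name) VALUES (?, ?, ?)",
    [
        ("default", None, "m123"),
        ("default", "unknown", "m123"),
        ("default", "G1", "m123"),
        ("default", "foo", "m123"),
        ("default", "test", "m123"),
        ("default/foo", None, "m123"),
    ],
)

# Aggregate query: sqlite3 returns the count as a Python int.
count = conn.execute("SELECT COUNT(*) FROM block_groups").fetchone()[0]

# Same DISTINCT query as the collection listing earlier in the notebook.
collections = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT collection_name FROM block_groups ORDER BY collection_name"
    )
]

# Full row: the integer primary key is populated and NULL maps to None.
first_row = conn.execute("SELECT * FROM block_groups ORDER BY id").fetchone()

print(f"Number of block groups: {count}")
print(f"Collections: {collections}")
print(f"First row: {first_row}")
conn.close()
```

Under these assumptions the integer columns come back as Python `int` values, which is what `repo.query` would be expected to return once the issue noted above is resolved.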

## Graph Visualization

Gen provides tools for visualizing graphs. Let's create a layout for a graph.

In [10]:
# BUG: edge_positions is empty (this cell was run after the layout cell below, In [9])
edge_positions

{}

In [9]:
if collection_name:
    # Create a layout for the graph
    layout = graph.create_layout()
    print("Created layout for graph")
    
    # Create a scaled layout for visualization
    layout.create_scaled_layout(label_width=100, scale=20)
    
    # Get node positions (a dictionary keyed by node)
    node_positions = layout.get_node_positions()
    print("\nNode positions (first 3):")
    for node_id, pos in list(node_positions.items())[:3]:
        print(f"Node {node_id}: position {pos}")
    
    # Get edge positions (a dictionary keyed by edge)
    edge_positions = layout.get_edge_positions()
    print("\nEdge positions (first 3):")
    for edge_id, pos in list(edge_positions.items())[:3]:
        print(f"Edge {edge_id}: positions {pos}")
    
    # Expand the layout
    layout.expand_right()
    print("\nExpanded layout to the right")
    
    # Convert layout to dictionary
    layout_dict = layout.to_dict()
    print(f"\nLayout dictionary keys: {list(layout_dict.keys())}")

Created layout for graph

Node positions (first 3):
Node (3, 2, 0, 0): position ((115.0, 0.0), (115.0, 0.0))
Node (0, 3, 0, 34): position ((41.0, 0.0), (75.0, 0.0))
Node (2, 1, 0, 0): position ((1.0, 0.0), (1.0, 0.0))

Edge positions (first 3):

Expanded layout to the right

Layout dictionary keys: ['label_width', 'scale', 'aspect_ratio', 'nodes', 'edges']
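
The position dictionaries can be post-processed with plain Python. The sketch below copies the three node positions printed above and assumes each value is a `(start_point, end_point)` pair of `(x, y)` coordinates; that reading, and the helper names `bounding_box` and `node_widths`, are illustrative assumptions rather than part of the Gen API.

```python
# Sample values copied from the output above.
node_positions = {
    (3, 2, 0, 0): ((115.0, 0.0), (115.0, 0.0)),
    (0, 3, 0, 34): ((41.0, 0.0), (75.0, 0.0)),
    (2, 1, 0, 0): ((1.0, 0.0), (1.0, 0.0)),
}


def bounding_box(positions):
    """Return (min_x, min_y, max_x, max_y) over all endpoints, or None if empty."""
    points = [pt for start, end in positions.values() for pt in (start, end)]
    if not points:
        return None
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


def node_widths(positions):
    """Return the horizontal extent (end x minus start x) of each entry."""
    return {key: end[0] - start[0] for key, (start, end) in positions.items()}


print(f"Bounding box: {bounding_box(node_positions)}")
print(f"Widths: {node_widths(node_positions)}")

# An empty dictionary, like the edge_positions result above, is handled explicitly.
print(f"Bounding box of empty layout: {bounding_box({})}")
```

Returning `None` for an empty dictionary avoids calling `min` on an empty sequence.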


## Working with RustWorkX

Gen can convert graphs to RustWorkX graph objects for further analysis, such as node degrees and connected components.

In [None]:
try:
    import rustworkx as rx
    
    if collection_name:
        # Convert to RustWorkX graph
        rx_graph = graph.as_rustworkx()
        print(f"Converted to RustWorkX graph with {rx_graph.num_nodes()} nodes and {rx_graph.num_edges()} edges")
        
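        # Analyze the graph
        if rx_graph.num_nodes() > 0:
            # as_rustworkx() may return a directed (PyDiGraph) or undirected (PyGraph) graph
            directed = isinstance(rx_graph, rx.PyDiGraph)

            # Get node degrees; node_indices() also works when indices are not contiguous
            if directed:
                degrees = [rx_graph.in_degree(i) + rx_graph.out_degree(i) for i in rx_graph.node_indices()]
            else:
                degrees = [rx_graph.degree(i) for i in rx_graph.node_indices()]
            avg_degree = sum(degrees) / len(degrees)
            print(f"Average node degree: {avg_degree:.2f}")

            # Count connected components (weakly connected for directed graphs)
            if directed:
                components = rx.weakly_connected_components(rx_graph)
            else:
                components = rx.connected_components(rx_graph)
            print(f"Number of connected components: {len(components)}")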
except ImportError:
    print("RustWorkX not installed. Install with: pip install rustworkx")
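
To make concrete what the two statistics in the cell above measure, here is a dependency-free sketch that computes them by hand. The three-node, two-edge path graph is a hypothetical stand-in that only matches the node and edge counts reported earlier; it does not reproduce the real graph's topology, and edges are treated as undirected.

```python
from collections import defaultdict

# Hypothetical topology: a simple path 0 - 1 - 2.
NUM_NODES = 3
EDGES = [(0, 1), (1, 2)]


def average_degree(num_nodes, edges):
    """Each undirected edge contributes to the degree of both of its endpoints."""
    if num_nodes == 0:
        return 0.0
    return 2 * len(edges) / num_nodes


def connected_components(num_nodes, edges):
    """Return a list of node sets, one per connected component."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)

    seen = set()
    components = []
    for start in range(num_nodes):
        if start in seen:
            continue
        # Depth-first traversal from an unvisited node collects one component.
        component = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        components.append(component)
    return components


print(f"Average node degree: {average_degree(NUM_NODES, EDGES):.2f}")
print(f"Number of connected components: {len(connected_components(NUM_NODES, EDGES))}")
```

For real data the rustworkx functions are the better choice; this version only illustrates the definitions.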

## Conclusion

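This notebook covered the basics of the Gen Python API: opening a repository, loading a graph from a collection, querying the database with SQL, computing a layout for visualization, and converting a graph to RustWorkX for analysis. You can use these tools to manage and analyze genomic data in a graph-based format with version control capabilities.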