# Multi-Omics Network Integration: A Walkthrough

This notebook provides a step-by-step guide to using the `Multi-Omics-Network-Integrator`. We will:
1. Load differential expression/abundance data from RNA-Seq and proteomics.
2. Load a background Protein-Protein Interaction (PPI) network.
3. Integrate the omics data to score each gene/protein in the network.
4. Build a weighted graph using `networkx`.
5. Identify and rank dysregulated "hotspot" subnetworks.
6. Visualize the results.

### Step 1: Setup and Imports

In [None]:
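import os
import sys

import pandas as pd
import yaml
from IPython.display import display  # explicit import so display() also works outside Jupyter's auto-namespace

# Make the project's modules in ../src importable (path is relative to the notebook's working directory)
sys.path.append(os.path.abspath('../src'))

from loader import load_omics_data, load_interaction_network
from integrator import score_nodes
from network_builder import build_graph
from analysis import find_hotspot_subnetworks
from visualize import plot_network, plot_interactive_network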

### Step 2: Load Configuration and Data Paths

In [None]:
# Load config file
with open('../config/default_config.yaml', 'r') as f:
    config = yaml.safe_load(f)

print("Configuration loaded:")
print(yaml.dump(config))

# Define file paths
rna_file = '../data/example_rnaseq_results.csv'
prot_file = '../data/example_proteomics_results.csv'
ppi_file = '../data/example_ppi_network.tsv'
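The rest of the notebook reads two entries from this config: the `scoring` section (passed to `score_nodes` in Step 4) and `analysis.min_subnetwork_size` (used in Step 6). A small check like the one below fails early with a readable message if either is missing. It is a sketch, not part of the package, and the helper name `missing_config_keys` is hypothetical.

```python
from collections.abc import Mapping
from typing import Any, List

# Key paths the notebook itself reads from the config (Steps 4 and 6).
REQUIRED_KEYS = [("scoring",), ("analysis", "min_subnetwork_size")]


def missing_config_keys(config: Mapping) -> List[str]:
    """Return dotted names of required keys absent from a loaded config (hypothetical helper)."""
    missing = []
    for path in REQUIRED_KEYS:
        node: Any = config
        for key in path:
            # Stop at the first level that is not a mapping or lacks the key
            if not isinstance(node, Mapping) or key not in node:
                missing.append(".".join(path))
                break
            node = node[key]
    return missing


# Usage in the notebook, right after yaml.safe_load:
# problems = missing_config_keys(config)
# if problems:
#     raise KeyError(f"Config is missing: {', '.join(problems)}")
```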

### Step 3: Load Omics and Network Data

In [None]:
rna_df = load_omics_data(rna_file)
prot_df = load_omics_data(prot_file)
ppi_df = load_interaction_network(ppi_file)

print("--- RNA-Seq Data ---")
display(rna_df.head())

print("--- Proteomics Data ---")
display(prot_df.head())

print("--- PPI Network Data ---")
display(ppi_df.head())

### Step 4: Integrate Omics Data and Score Nodes

Here, we combine each gene's p-values from the RNA-Seq and proteomics datasets into a single node score. The weight given to each omics layer is set in the `scoring` section of the config file.
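The exact formula inside `score_nodes` is not shown in this notebook. One common way to turn per-layer p-values into a single score is a weighted sum of -log10(p), so smaller p-values give larger scores and a gene measured in only one layer still gets credit from that layer. The sketch below assumes that scheme and leaves out whatever `score_nodes` does with `ppi_df`; the function name and the default weights are hypothetical stand-ins for the real `config['scoring']` entries.

```python
import math
from typing import Dict


def weighted_neglog10_scores(
    rna_pvalues: Dict[str, float],
    prot_pvalues: Dict[str, float],
    rna_weight: float = 0.5,   # hypothetical; stands in for a config['scoring'] entry
    prot_weight: float = 0.5,  # hypothetical; stands in for a config['scoring'] entry
    min_p: float = 1e-300,     # floor that keeps log10 finite when p == 0
) -> Dict[str, float]:
    """Score each gene as a weighted sum of -log10(p) over the layers where it was measured."""
    scores: Dict[str, float] = {}
    for pvalues, weight in ((rna_pvalues, rna_weight), (prot_pvalues, prot_weight)):
        for gene, p in pvalues.items():
            contribution = weight * -math.log10(max(p, min_p))
            scores[gene] = scores.get(gene, 0.0) + contribution
    return scores


scores = weighted_neglog10_scores({"TP53": 0.001, "EGFR": 0.5}, {"TP53": 0.01, "MYC": 1e-4})
# TP53 ≈ 0.5*3 + 0.5*2 = 2.5, MYC ≈ 0.5*4 = 2.0, EGFR ≈ 0.5*0.301 ≈ 0.15
```

Sorting `scores.items()` by value, as the next cell does with `node_scores`, would rank TP53 first.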

In [None]:
node_scores = score_nodes(rna_df, prot_df, ppi_df, config['scoring'])

# Display top 10 scoring nodes
sorted_scores = sorted(node_scores.items(), key=lambda item: item[1], reverse=True)
print("Top 10 scoring nodes:")
for node, score in sorted_scores[:10]:
    print(f"{node}: {score:.2f}")

### Step 5: Build the Weighted Network Graph

Next, we build a `networkx` graph from the PPI interactions and attach each node's integrated score as a node attribute.
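A minimal sketch of what `build_graph` plausibly does, assuming the PPI table reduces to `(protein_a, protein_b)` pairs and that proteins without an integrated score default to 0.0. The function name `build_scored_graph`, the `score` attribute name, and the default are assumptions, not the package's documented behavior.

```python
from typing import Dict, Iterable, Tuple

import networkx as nx


def build_scored_graph(
    edges: Iterable[Tuple[str, str]],
    node_scores: Dict[str, float],
    default_score: float = 0.0,  # hypothetical choice for proteins absent from both omics layers
) -> nx.Graph:
    """Build an undirected PPI graph and store each node's score under the 'score' attribute."""
    graph = nx.Graph()
    graph.add_edges_from(edges)  # duplicate or reversed pairs collapse into one undirected edge
    for node in graph.nodes:
        graph.nodes[node]["score"] = node_scores.get(node, default_score)
    return graph


g = build_scored_graph([("TP53", "MDM2"), ("MDM2", "TP53"), ("TP53", "EGFR")], {"TP53": 2.5})
print(g.number_of_nodes(), g.number_of_edges())  # 3 2
print(g.nodes["MDM2"]["score"])  # 0.0
```

Because this sketch builds the graph from the edge list alone, a scored gene with no interaction partner in the PPI table never becomes a node; whether the real `build_graph` keeps such genes as isolated nodes is not stated here.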

In [None]:
graph = build_graph(ppi_df, node_scores)
print(f"Graph created with {graph.number_of_nodes()} nodes and {graph.number_of_edges()} edges.")

### Step 6: Find Dysregulated 'Hotspot' Subnetworks

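We find connected components in the scored graph, keep only those with at least `min_subnetwork_size` nodes (set under `analysis` in the config), and rank them by their total node score. The top-ranked components are the most dysregulated subnetworks and serve as candidate pathways.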
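Real PPI networks typically have one giant connected component, so taking components of the full network tends to return a single huge "hotspot". A common remedy is to restrict the graph to high-scoring nodes first. The sketch below assumes that approach: keep nodes whose score exceeds a hypothetical `score_threshold`, take connected components of what remains, drop those smaller than `min_size`, and rank by total score. It returns rows with `nodes` as a `', '`-joined string because the visualization cell splits that column on `', '`. Whether `find_hotspot_subnetworks` filters by score this way is an assumption; `rank_hotspots` is a hypothetical name.

```python
from typing import Dict, List

import networkx as nx


def rank_hotspots(
    graph: nx.Graph,
    min_size: int = 3,
    score_threshold: float = 1.0,  # hypothetical cut-off; not read from the package's config
) -> List[Dict[str, object]]:
    """Rank connected components of the high-scoring subgraph by total node score."""
    # Keep only nodes whose 'score' attribute exceeds the threshold (missing scores count as 0.0)
    keep = [n for n, s in graph.nodes(data="score", default=0.0) if s > score_threshold]
    core = graph.subgraph(keep)

    rows = []
    for component in nx.connected_components(core):
        if len(component) < min_size:
            continue  # too small to count as a subnetwork
        total = sum(graph.nodes[n].get("score", 0.0) for n in component)
        rows.append({
            "nodes": ", ".join(sorted(component)),
            "size": len(component),
            "total_score": total,
        })
    rows.sort(key=lambda row: row["total_score"], reverse=True)  # best hotspot first
    return rows
```

`pd.DataFrame(rank_hotspots(graph))` would give a table shaped like `hotspots_df`, with the highest-scoring component in row 0.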

In [None]:
hotspots_df = find_hotspot_subnetworks(graph, min_size=config['analysis']['min_subnetwork_size'])

print("Identified Hotspot Subnetworks:")
display(hotspots_df)

### Step 7: Visualize the Top Subnetwork

Finally, we visualize the top-ranked (highest total score) subnetwork to inspect the dysregulated pathway, saving both a static image and an interactive HTML view.
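The package's `plot_network` is not shown in this notebook. A static view of this kind could look like the sketch below, which assumes each node carries its integrated score under a `score` attribute, colors nodes by that score, and uses a seeded spring layout. The name `plot_scored_subgraph` is hypothetical and the styling choices are illustrative.

```python
import os
from typing import Optional

import networkx as nx
from matplotlib.figure import Figure


def plot_scored_subgraph(subgraph: nx.Graph, path: str, title: Optional[str] = None) -> None:
    """Save a static drawing of `subgraph` with nodes colored by their 'score' attribute."""
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)  # create the output folder if needed
    pos = nx.spring_layout(subgraph, seed=42)  # fixed seed so the layout is reproducible
    scores = [subgraph.nodes[n].get("score", 0.0) for n in subgraph.nodes]

    fig = Figure(figsize=(8, 6))  # object-oriented API: no pyplot global state
    ax = fig.subplots()
    node_artist = nx.draw_networkx_nodes(
        subgraph, pos, node_color=scores, cmap="viridis", ax=ax
    )
    nx.draw_networkx_edges(subgraph, pos, alpha=0.4, ax=ax)
    nx.draw_networkx_labels(subgraph, pos, font_size=8, ax=ax)
    fig.colorbar(node_artist, ax=ax, label="integrated score")
    ax.set_title(title or "")
    ax.axis("off")
    fig.savefig(path, dpi=150, bbox_inches="tight")
```

Using `matplotlib.figure.Figure` directly instead of `pyplot` leaves the notebook's global plotting state untouched; the trade-off is that the figure is only saved to disk, not displayed inline.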

In [None]:
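# Get the nodes of the top-ranked hotspot (row 0, since hotspots_df is ranked by total score)
if not hotspots_df.empty:
    # The 'nodes' column stores the members as one ', '-separated string
    top_hotspot_nodes = hotspots_df.iloc[0]['nodes'].split(', ')
    subgraph = graph.subgraph(top_hotspot_nodes)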
    
    # Create output directory if it doesn't exist
    os.makedirs('../notebooks/output', exist_ok=True)
    
    # Static Plot
    print("Generating static plot...")
    static_plot_path = '../notebooks/output/top_hotspot_static.png'
    plot_network(subgraph, static_plot_path, title="Top Dysregulated Subnetwork")
    print(f"Static plot saved to {static_plot_path}")
    
    # Interactive Plot
    print("\nGenerating interactive plot...")
    interactive_plot_path = '../notebooks/output/top_hotspot_interactive.html'
    plot_interactive_network(subgraph, interactive_plot_path, title="Top Dysregulated Subnetwork")
    print(f"Interactive plot saved to {interactive_plot_path}. Open this file in your browser to explore.")
else:
    print("No significant hotspots found with the current settings.")