# Protein Sub-Cellular Localization Pipeline - Jupyter Notebook

This notebook demonstrates how to use the protein localization pipeline in an interactive Jupyter environment.

**You do NOT need to copy-paste all the code!** Simply import the modules and use them as shown below.

## Setup

First, ensure you have installed all dependencies:

```bash
pip install -r requirements.txt
```

In [None]:
# Add the scripts directory to the Python path
import sys
import os

# Adjust this path to point to your scripts directory
sys.path.insert(0, os.path.join(os.getcwd(), 'scripts'))

print("✓ Path configured")
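
The path setup above assumes the notebook is launched from the project root. As a minimal sketch of a more defensive variant, the helper below resolves the directory, checks that it exists, and avoids adding duplicate entries when the cell is re-run. The name `add_scripts_dir` is hypothetical and not part of the pipeline.

```python
import sys
from pathlib import Path


def add_scripts_dir(base_dir, subdir="scripts"):
    """Prepend base_dir/subdir to sys.path if it exists; return True on success.

    Hypothetical helper for illustration; adjust `subdir` to your layout.
    """
    scripts_path = Path(base_dir, subdir).resolve()
    if not scripts_path.is_dir():
        return False
    path_str = str(scripts_path)
    if path_str not in sys.path:  # avoid duplicate entries on re-run
        sys.path.insert(0, path_str)
    return True
```

Calling `add_scripts_dir(Path.cwd())` mirrors the cell above, and a `False` return value is a hint that the notebook was started from the wrong directory.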

## Option 1: Run Complete Pipeline (Easiest)

The simplest approach is to import the pipeline class and run every stage in one call:

In [None]:
from pipeline import ProteinLocalizationPipeline

# Configure your paths
input_dir = "D:\\5TH_SEM\\CELLULAR\\input"  # Your TIFF files location
output_dir = "D:\\5TH_SEM\\CELLULAR\\output"  # Where to save results

# Create and run pipeline
pipeline = ProteinLocalizationPipeline(input_dir, output_dir)
model, history = pipeline.run_complete_pipeline(epochs=20)

print("\n✓ Pipeline completed!")
print(f"Final accuracy: {history['test_accuracy'][-1]:.4f}")
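
The doubled backslashes in the paths above are needed because `\` starts an escape sequence in ordinary Python string literals. The sketch below shows two equivalent spellings that can be easier to read; the directory names are simply the ones used in this notebook.

```python
from pathlib import PureWindowsPath

# Escaped backslashes, as used in the cell above
escaped = "D:\\5TH_SEM\\CELLULAR\\input"

# A raw string literal yields the same characters without doubling
raw = r"D:\5TH_SEM\CELLULAR\input"

# Building the path from parts avoids typing separators altogether
base = PureWindowsPath("D:/5TH_SEM/CELLULAR")
joined = str(base / "input")

# All three spellings describe the same path
print(escaped == raw == joined)
```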

## Option 2: Step-by-Step Execution

If you want more control, you can run each step individually:

### Step 1: Load TIFF Images

In [None]:
from tiff_loader import TIFFLoader

# Initialize loader
loader = TIFFLoader("D:\\5TH_SEM\\CELLULAR\\input")

# Scan for TIFF files
tiff_files = loader.scan_directory()

print(f"Found {len(tiff_files)} TIFF files")

# Load first image as example
if tiff_files:
    first_image = loader.load_single_tiff(tiff_files[0])
    info = loader.get_image_info(first_image)
    print("\nFirst image info:")
    for key, value in info.items():
        print(f"  {key}: {value}")
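
The internals of `TIFFLoader.scan_directory()` are not shown in this notebook. As a rough sketch of what scanning a folder for TIFF files can look like, the hypothetical function below walks a directory recursively and keeps files with a `.tif` or `.tiff` extension (the extension set and the recursive search are assumptions).

```python
from pathlib import Path

TIFF_SUFFIXES = {".tif", ".tiff"}  # assumed extensions, matched case-insensitively


def find_tiff_files(directory):
    """Return a sorted list of TIFF paths found under `directory` (recursive)."""
    root = Path(directory)
    if not root.is_dir():
        return []  # a missing directory yields no files rather than an error
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in TIFF_SUFFIXES
    )
```

Returning an empty list for a missing directory keeps a check like `if tiff_files:` in the cell above meaningful.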

### Step 2: Preprocess and Segment Images

In [None]:
from preprocessing import ImagePreprocessor
import matplotlib.pyplot as plt

# Initialize preprocessor
preprocessor = ImagePreprocessor()

# Process the first image
if tiff_files:
    image = loader.load_single_tiff(tiff_files[0])
    labeled_regions, features = preprocessor.process_image(image)
    
    print(f"Detected {len(features)} regions")
    if features:  # Guard against images where no regions were detected
        print("\nFirst region features:")
        for key, value in features[0].items():
            print(f"  {key}: {value}")
    
    # Visualize segmentation
    plt.figure(figsize=(10, 5))
    plt.subplot(1, 2, 1)
    plt.imshow(image if image.ndim == 2 else image.max(axis=0), cmap='gray')
    plt.title('Original Image')
    plt.axis('off')
    
    plt.subplot(1, 2, 2)
    plt.imshow(labeled_regions, cmap='tab20')
    plt.title('Segmented Regions')
    plt.axis('off')
    plt.tight_layout()
    plt.show()
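
`process_image` returns a label image and a list of per-region features, but its internals are not shown here. To illustrate the general idea behind such a step, the sketch below thresholds a small 2D grid and labels its 4-connected foreground regions with a flood fill. This is a teaching example under stated assumptions, not the preprocessor's actual method; libraries such as scikit-image provide the same operation through `skimage.measure.label` and `skimage.measure.regionprops`.

```python
from collections import deque


def label_regions(image, threshold):
    """Label 4-connected foreground regions in a 2D grid of intensities.

    Pixels with value > threshold are foreground. Returns (labels, count),
    where labels has the same shape as image and 0 marks background.
    """
    rows = len(image)
    cols = len(image[0]) if rows else 0
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if image[r][c] <= threshold or labels[r][c]:
                continue
            count += 1  # start a new region and flood-fill it
            labels[r][c] = count
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and image[ny][nx] > threshold
                            and not labels[ny][nx]):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count


def region_areas(labels, count):
    """Count the pixels belonging to each label 1..count."""
    areas = [0] * count
    for row in labels:
        for value in row:
            if value:
                areas[value - 1] += 1
    return areas
```

Area is only one of the features used later in this notebook; intensity, eccentricity, and solidity would be computed per region in a similar pass.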

### Step 3: Construct Graph from Segmented Image

In [None]:
from graph_construction import GraphConstructor

# Initialize constructor
constructor = GraphConstructor()

# Create graph from the region features computed in Step 2
graph = constructor.create_graph_from_regions(features, distance_threshold=50.0)

# Print graph statistics
stats = constructor.get_graph_statistics(graph)
print("Graph statistics:")
for key, value in stats.items():
    print(f"  {key}: {value}")

# Get feature matrix
feature_matrix = constructor.get_node_feature_matrix(graph)
print(f"\nFeature matrix shape: {feature_matrix.shape}")
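
The `distance_threshold` argument suggests that regions become connected when they lie close together. A minimal sketch of that idea, assuming each region is represented by a 2D centroid and that closeness means Euclidean distance (both are assumptions about `GraphConstructor`, whose code is not shown here):

```python
import math
from itertools import combinations


def build_edges(centroids, distance_threshold=50.0):
    """Return undirected edges (i, j), i < j, for centroids within the threshold."""
    edges = []
    for (i, a), (j, b) in combinations(enumerate(centroids), 2):
        if math.dist(a, b) <= distance_threshold:
            edges.append((i, j))
    return edges
```

A larger threshold produces a denser graph, so it is worth checking the edge count in the statistics printed above when tuning this value.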

### Step 4: Visualize Graph

In [None]:
from visualization import GraphVisualizer

# Initialize visualizer
visualizer = GraphVisualizer()

# Visualize graph
fig = visualizer.visualize_graph(
    graph, 
    predictions=None,  # No predictions yet
    title="Protein Location Graph"
)
plt.show()

# Visualize features
fig = visualizer.visualize_features(
    feature_matrix,
    labels=[f"Node {i+1}" for i in range(feature_matrix.shape[0])],
    feature_names=['Area', 'Intensity', 'Eccentricity', 'Solidity']
)
plt.show()

### Step 5: Train Graph-CNN Model

In [None]:
from model_training import GraphCNN, ModelTrainer
import torch
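from torch_geometric.data import Data
# PyG >= 2.0 provides DataLoader in torch_geometric.loader;
# older releases export it from torch_geometric.data instead
from torch_geometric.loader import DataLoader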

# This cell only builds the model; it does not create any training data.
# In practice, you would prepare DataLoader objects from your own graphs.

# Model configuration
num_features = 4  # area, intensity, eccentricity, solidity
num_classes = 5   # nucleus, mitochondria, ER, golgi, cytoplasm

# Initialize model
model = GraphCNN(num_features, num_classes)
trainer = ModelTrainer(model, device='cpu')
trainer.setup_training(learning_rate=0.001)

print(f"Model has {sum(p.numel() for p in model.parameters())} parameters")
print("\nModel ready for training!")
print("To train with your data, prepare DataLoader objects and call:")
print("  history = trainer.train(train_loader, test_loader, num_epochs=50)")
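
Preparing data for the loaders means converting each graph into the layout PyTorch Geometric expects: connectivity is stored as an `edge_index` of shape `[2, num_edges]`, with each undirected edge listed once per direction. The sketch below shows that conversion with plain lists; the function name `to_edge_index` is hypothetical.

```python
def to_edge_index(edges):
    """Convert undirected (i, j) pairs into [sources, targets] with both directions."""
    sources, targets = [], []
    for i, j in edges:
        sources += [i, j]  # i -> j and j -> i
        targets += [j, i]
    return [sources, targets]
```

The result can be wrapped with `torch.tensor(..., dtype=torch.long)` and passed as `edge_index` to `torch_geometric.data.Data`, alongside the node feature matrix as `x` and the labels as `y`.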

### Step 6: Make Predictions

In [None]:
# After training, you can make predictions

# Example prediction code (requires trained model)
# model.eval()
# with torch.no_grad():
#     output = model(test_data)
#     predictions = output.argmax(dim=1)

# Class names
class_names = ['Nucleus', 'Mitochondria', 'Endoplasmic Reticulum', 'Golgi', 'Cytoplasm']

print("Prediction classes:")
for i, name in enumerate(class_names):
    print(f"  {i}: {name}")
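
The commented-out code above takes the `argmax` over the class dimension to get one class index per node. As a small sketch of the last step, turning raw scores into readable names, the hypothetical helper below works on plain nested lists (for a tensor, `output.tolist()` would give the same shape):

```python
CLASS_NAMES = ['Nucleus', 'Mitochondria', 'Endoplasmic Reticulum', 'Golgi', 'Cytoplasm']


def decode_predictions(scores, class_names=CLASS_NAMES):
    """Map each row of per-class scores to the name of its highest-scoring class."""
    names = []
    for row in scores:
        best = max(range(len(row)), key=row.__getitem__)  # argmax over classes
        names.append(class_names[best])
    return names
```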

## Option 3: Use Pre-trained Model (If Available)

In [None]:
# Load a pre-trained model
import os
import torch
from model_training import GraphCNN, ModelTrainer

# Initialize model
model = GraphCNN(num_features=4, num_classes=5)
trainer = ModelTrainer(model)

# Load trained weights (if you have them); adjust the path to match your output directory
model_path = "output/models/graph_cnn.pt"
if os.path.exists(model_path):
    trainer.load_model(model_path)
    print("✓ Pre-trained model loaded!")
else:
    print("No pre-trained model found. Train one using the pipeline first.")

## Quick Demo with Synthetic Data

In [None]:
# Run a quick demo with synthetic data
from pipeline import ProteinLocalizationPipeline

# This will automatically generate synthetic data if no real data is found
pipeline = ProteinLocalizationPipeline(
    input_dir="./nonexistent",  # Use fake path to trigger synthetic data
    output_dir="./demo_output"
)

print("Running demo with synthetic data...")
model, history = pipeline.run_complete_pipeline(epochs=10)

print("\n✓ Demo completed!")
print(f"Final test accuracy: {history['test_accuracy'][-1]:.4f}")

## Visualize Training Results

In [None]:
import matplotlib.pyplot as plt
from visualization import GraphVisualizer

visualizer = GraphVisualizer()

# Plot training history (`history` is returned by run_complete_pipeline above)
fig = visualizer.plot_training_history(history)
plt.show()
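
Alongside the plot, a short numeric summary can help when comparing runs. The sketch below assumes `history` is a dictionary of per-epoch lists, as implied by `history['test_accuracy'][-1]` earlier in this notebook; the function name `summarize_history` is hypothetical.

```python
def summarize_history(history, key="test_accuracy"):
    """Return (best_epoch, best_value, final_value) for one metric; epochs are 1-based."""
    values = history[key]
    if not values:
        raise ValueError(f"history[{key!r}] is empty")
    best_index = max(range(len(values)), key=values.__getitem__)
    return best_index + 1, values[best_index], values[-1]
```

A best epoch well before the final one can indicate that training ran longer than needed or that the model started to overfit.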

## Summary

**This notebook covers three ways to use the pipeline:**

1. **Easiest**: Use `pipeline.run_complete_pipeline()` to run everything automatically
2. **More Control**: Import individual modules and run step-by-step
3. **Pre-trained**: Load saved weights into `GraphCNN` if a trained model is available

For custom needs, you can also modify the code in the scripts directory.

**You do NOT need to copy-paste code!** Just import the modules as shown above.

### Next Steps

1. Ensure your TIFF files are in the input directory
2. Run the complete pipeline or execute steps individually
3. Check the output directory for results:
   - `graphs/` - Graph structures
   - `models/` - Trained models
   - `visualizations/` - Result images

### Need Help?

- See `README.md` for detailed documentation
- See `docs/QUICKSTART.md` for quick setup guide
- See `docs/PROJECT_OVERVIEW.md` for technical details