# OpenOrganelle Cellular Imaging Data Explorer

This notebook demonstrates how to explore and download cellular imaging data from the OpenOrganelle platform using your uv virtual environment.

## About OpenOrganelle

OpenOrganelle is a data portal that provides access to FIB-SEM (Focused Ion Beam Scanning Electron Microscopy) datasets and organelle segmentations. The platform hosts high-resolution cellular imaging data that can be used for research in cell biology, machine learning, and image analysis.

**Key Features:**
- High-resolution FIB-SEM volumes
- Machine learning-generated organelle segmentations  
- Correlative light microscopy data
- Analysis results and measurements
- Open access under the CC BY 4.0 license

Let's start by setting up our environment and exploring the available data!

## 1. Check uv Installation and Environment

First, let's verify that uv is installed and check our current environment status.

In [1]:
import subprocess
import sys
import os

# Check uv version
try:
    result = subprocess.run(['uv', '--version'], capture_output=True, text=True)
    print(f"uv version: {result.stdout.strip()}")
except FileNotFoundError:
    print("uv is not installed or not in PATH")

# Check current Python environment
print(f"Python executable: {sys.executable}")
print(f"Python version: {sys.version}")
print(f"Current working directory: {os.getcwd()}")

# Check if we're in a virtual environment
if hasattr(sys, 'real_prefix') or (hasattr(sys, 'base_prefix') and sys.base_prefix != sys.prefix):
    print("✅ Running in a virtual environment")
else:
    print("❌ Not running in a virtual environment")

uv version: uv 0.8.5 (ce3728681 2025-08-05)
Python executable: C:\Users\nhg43\OneDrive\Documents\code_directory\uv-python-project\.venv\Scripts\python.exe
Python version: 3.13.5 (main, Jul 23 2025, 00:30:06) [MSC v.1944 64 bit (AMD64)]
Current working directory: C:\Users\nhg43\OneDrive\Documents\code_directory\uv-python-project
✅ Running in a virtual environment
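
Before moving on to the imports in the next section, it can help to confirm that the packages are actually present in this environment. The following is a minimal sketch: `missing_packages` is a hypothetical helper (not part of the notebook's downloader), and the package list simply mirrors the imports used in Section 2.

```python
import importlib.util

def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

# Top-level packages imported in Section 2 of this notebook
required = ["numpy", "matplotlib", "zarr", "fsspec", "dask", "tqdm"]

missing = missing_packages(required)
if missing:
    # `uv add` installs packages into the project's environment
    print(f"Missing packages: {missing}. Try: uv add {' '.join(missing)}")
else:
    print("All required packages are importable")
```

`find_spec` only checks that a package can be located; it does not import it, so the check stays fast and has no side effects.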


## 2. Import Required Libraries

Let's import the necessary libraries for working with OpenOrganelle data. We'll use the packages we installed in our uv environment.

In [2]:
# Import required libraries
import numpy as np
import matplotlib.pyplot as plt
import zarr
import fsspec
import dask.array as da
from typing import List, Dict, Optional, Tuple
import json
from tqdm import tqdm

# Import our custom OpenOrganelle downloader
sys.path.append('./src')
from openorganelle_downloader import OpenOrganelleDownloader

# Set up matplotlib for inline plotting
%matplotlib inline

# Initialize the downloader
downloader = OpenOrganelleDownloader(output_dir="./data")

print("✅ Libraries imported successfully!")
print("✅ OpenOrganelle downloader initialized!")

2025-08-07 16:19:59,324 - INFO - OpenOrganelle downloader initialized. Output directory: ./data


✅ Libraries imported successfully!
✅ OpenOrganelle downloader initialized!


## 3. Explore Available Datasets

Let's start by listing all available datasets on the OpenOrganelle platform.

In [3]:
# List available datasets
print("🔍 Discovering available datasets...")
datasets = downloader.list_datasets()

print(f"\n📊 Found {len(datasets)} datasets on OpenOrganelle:")
print("=" * 50)

for i, dataset in enumerate(datasets, 1):
    print(f"{i:2d}. {dataset}")

# Let's focus on a specific dataset for our exploration
# Look for HeLa cell data, which is commonly available
target_dataset = None
for dataset in datasets:
    if 'hela' in dataset.lower():
        target_dataset = dataset
        break

if not target_dataset and datasets:
    # If no HeLa dataset, use the first available
    target_dataset = datasets[0]

if target_dataset:
    print(f"\n🎯 Selected dataset for exploration: {target_dataset}")
else:
    print("\n❌ No datasets available")

🔍 Discovering available datasets...


severe performance issues, see also https://github.com/dask/dask/issues/10276

To fix, you should specify a lower version bound on s3fs, or
update the current installation.

2025-08-07 16:20:07,126 - INFO - Found 89 datasets



📊 Found 89 datasets on OpenOrganelle:
 1. aic_desmosome-1
 2. aic_desmosome-2
 3. aic_desmosome-3
 4. cam_hum-airway-14500
 5. cam_hum-airway-14771-b
 6. csc-zipped-data
 7. jrc_ccl81-covid-1
 8. jrc_choroid-plexus-2
 9. jrc_cos7-11
10. jrc_cos7-1a
11. jrc_cos7-1b
12. jrc_ctl-id8-1
13. jrc_ctl-id8-2
14. jrc_ctl-id8-3
15. jrc_ctl-id8-4
16. jrc_ctl-id8-5
17. jrc_dauer-larva
18. jrc_fly-acc-calyx-1
19. jrc_fly-ellipsoid-body
20. jrc_fly-fsb-1
21. jrc_fly-fsb-2
22. jrc_fly-larva-1
23. jrc_fly-mb-1a
24. jrc_fly-mb-z0419-20
25. jrc_fly-protocerebral-bridge
26. jrc_fly-vnc-1
27. jrc_hela-1
28. jrc_hela-2
29. jrc_hela-21
30. jrc_hela-22
31. jrc_hela-3
32. jrc_hela-4
33. jrc_hela-bfa
34. jrc_hela-h89-1
35. jrc_hela-h89-2
36. jrc_hela-nz-1
37. jrc_hela-nz-2
38. jrc_hum-airway-14953vc
39. jrc_jurkat-1
40. jrc_macrophage-2
41. jrc_mus-cerebellum-4
42. jrc_mus-cerebellum-5
43. jrc_mus-choroid-plexus-3
44. jrc_mus-cortex-3
45. jrc_mus-dorsal-striatum
46. jrc_mus-dorsal-striatum-2
47. jrc_mus-epidid
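
The selection loop above takes the first name containing `hela`. With 89 datasets it can be easier to filter or group the names first. The sketch below uses two hypothetical helpers, `find_datasets` and `group_by_prefix`, on a handful of names copied from the listing; in the notebook you would pass the full `datasets` list instead.

```python
from collections import defaultdict

def find_datasets(names, keyword):
    """Case-insensitive substring match on dataset names."""
    return [name for name in names if keyword.lower() in name.lower()]

def group_by_prefix(names):
    """Group names by the text before the first underscore (e.g. 'jrc', 'aic')."""
    groups = defaultdict(list)
    for name in names:
        prefix, _, _ = name.partition("_")
        groups[prefix].append(name)
    return dict(groups)

# A few names taken from the listing above
sample_names = [
    "aic_desmosome-1", "cam_hum-airway-14500", "jrc_cos7-11",
    "jrc_hela-1", "jrc_hela-2", "jrc_mus-cortex-3",
]

print(find_datasets(sample_names, "hela"))
print({prefix: len(items) for prefix, items in group_by_prefix(sample_names).items()})
```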

## 4. Explore Dataset Structure

Now let's dive deeper into the structure of our selected dataset to understand what data is available.

In [4]:
if target_dataset:
    print(f"🔬 Exploring structure of dataset: {target_dataset}")
    print("=" * 60)
    
    # Get detailed dataset information
    info = downloader.get_dataset_info(target_dataset)
    
    if 'error' not in info:
        print(f"📁 Main groups: {info.get('groups', [])}")
        print(f"📊 Main arrays: {info.get('arrays', [])}")
        
        # Explore each group in detail
        for group_name in info.get('groups', []):
            print(f"\n📂 Group: {group_name}")
            print("-" * 30)
            
            # Show arrays in this group
            group_arrays_key = f'{group_name}_arrays'
            group_groups_key = f'{group_name}_groups'
            
            arrays = info.get(group_arrays_key, [])
            subgroups = info.get(group_groups_key, [])
            
            if arrays:
                print(f"   📊 Arrays: {arrays}")
                
                # Get detailed info for each array
                for array_name in arrays[:3]:  # Limit to first 3 arrays
                    try:
                        array_path = f"{group_name}/{array_name}"
                        array_info = downloader.get_array_info(target_dataset, array_path)
                        if 'error' not in array_info:
                            shape = array_info['shape']
                            dtype = array_info['dtype']
                            size_mb = array_info['size_mb']
                            print(f"      • {array_name}: {shape} {dtype} ({size_mb:.1f} MB)")
                    except Exception as e:
                        print(f"      • {array_name}: Error getting info")
            
            if subgroups:
                print(f"   📁 Subgroups: {subgroups}")
        
        # Save metadata for later reference
        metadata_file = downloader.download_metadata(target_dataset)
        print(f"\n💾 Metadata saved to: {metadata_file}")
        
    else:
        print(f"❌ Error exploring dataset: {info['error']}")
else:
    print("❌ No dataset selected for exploration")

🔬 Exploring structure of dataset: jrc_hela-1


2025-08-07 16:20:15,258 - INFO - Retrieved information for dataset: jrc_hela-1
2025-08-07 16:20:15,321 - INFO - Array info (from attributes) for jrc_hela-1/labels/endo_pred
2025-08-07 16:20:15,380 - INFO - Array info (from attributes) for jrc_hela-1/labels/endo_seg
2025-08-07 16:20:15,425 - INFO - Array info (from attributes) for jrc_hela-1/labels/er_pred
2025-08-07 16:20:15,426 - INFO - Retrieved information for dataset: jrc_hela-1
2025-08-07 16:20:15,429 - INFO - Metadata saved to: ./data\jrc_hela-1_metadata.json


📁 Main groups: ['labels']
📊 Main arrays: []

📂 Group: labels
------------------------------
   📊 Arrays: ['endo_pred', 'endo_seg', 'er_pred', 'er_seg', 'mito_pred', 'mito_seg', 'nucleus_pred', 'nucleus_seg', 'pm_pred', 'pm_seg', 'vesicle_pred', 'vesicle_seg']
      • endo_pred: Error getting info
      • endo_seg: Error getting info
      • er_pred: Error getting info
   📁 Subgroups: ['endo_pred', 'endo_seg', 'er_pred', 'er_seg', 'mito_pred', 'mito_seg', 'nucleus_pred', 'nucleus_seg', 'pm_pred', 'pm_seg', 'vesicle_pred', 'vesicle_seg']

💾 Metadata saved to: ./data\jrc_hela-1_metadata.json
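
The "Error getting info" lines above hide the underlying exception, because the `except` branch in the cell discards `e`. A small sketch of how the failure could be surfaced is shown below. Both helpers are hypothetical, and the missing `dtype` key is only an illustrative failure, not a diagnosis of what went wrong in the recorded run.

```python
import math

def estimate_size_mb(shape, itemsize):
    """Uncompressed size in MB for an array of `shape` with `itemsize` bytes per element."""
    return math.prod(shape) * itemsize / (1024 * 1024)

def describe_error(exc, limit=100):
    """One-line summary that keeps the exception type, truncated to `limit` characters."""
    return f"{type(exc).__name__}: {exc}"[:limit]

try:
    array_info = {"shape": (32, 32, 32)}  # illustrative metadata with no 'dtype' key
    print(array_info["dtype"])
except Exception as exc:
    print(f"      • endo_pred: {describe_error(exc)}")

# 1024 x 1024 elements at 1 byte each
print(f"{estimate_size_mb((1024, 1024), 1):.1f} MB")
```

Printing the exception type alongside the message usually makes it clear whether the problem is a missing key, a network error, or a formatting issue.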


## 5. Download Sample Data

Let's download a small sample of the cellular imaging data for analysis and visualization.

In [5]:
if target_dataset:
    print(f"📥 Downloading sample data from {target_dataset}...")
    
    # Try to download EM data first (common data type)
    sample_data = None
    sample_path = None
    
    # Common data paths to try
    data_paths_to_try = [
        'em/fibsem-uint16/s0',  # Full resolution EM data
        'em/fibsem-uint8/s0',   # Alternative EM data format
        'labels/fibsem-uint64/s0',  # Segmentation labels
    ]
    
    for data_path in data_paths_to_try:
        try:
            print(f"   Trying data path: {data_path}")
            
            # Download a small 32x32x32 cube sample
            sample_file = downloader.download_array_slice(
                target_dataset, 
                data_path,
                slice_spec=(slice(0, 32), slice(0, 32), slice(0, 32))
            )
            
            if sample_file and os.path.exists(sample_file):
                print(f"   ✅ Successfully downloaded: {sample_file}")
                sample_data = np.load(sample_file)
                sample_path = data_path
                break
                
        except Exception as e:
            print(f"   ❌ Failed to download {data_path}: {str(e)[:100]}...")
            continue
    
    if sample_data is not None:
        print(f"\n📊 Sample data information:")
        print(f"   Shape: {sample_data.shape}")
        print(f"   Data type: {sample_data.dtype}")
        print(f"   Value range: {sample_data.min()} - {sample_data.max()}")
        print(f"   Memory size: {sample_data.nbytes / (1024*1024):.2f} MB")
        print(f"   Data path: {sample_path}")
    else:
        print("❌ Could not download any sample data. Dataset might have different structure.")
else:
    print("❌ No dataset available for download")

📥 Downloading sample data from jrc_hela-1...
   Trying data path: em/fibsem-uint16/s0
   Trying data path: em/fibsem-uint8/s0
   Trying data path: labels/fibsem-uint64/s0
❌ Could not download any sample data. Dataset might have different structure.
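
None of the three hard-coded paths matches the structure reported in Section 4, where `jrc_hela-1` exposed only a `labels` group containing entries such as `endo_seg` and `er_seg`. One option is to build the candidate paths from that discovered structure. This is a sketch under assumptions: `candidate_paths` and `cube_slices` are hypothetical helpers, and the presence of an `s0` scale level under each label is assumed, not confirmed by the output above.

```python
def candidate_paths(group_name, array_names, scales=("s0",)):
    """Build '<group>/<array>/<scale>' paths; the scale names are an assumption."""
    return [
        f"{group_name}/{name}/{scale}"
        for name in array_names
        for scale in scales
    ]

def cube_slices(origin=(0, 0, 0), size=32):
    """Slices selecting a cube of edge length `size` starting at `origin`."""
    return tuple(slice(start, start + size) for start in origin)

# Label names taken from the Section 4 output
label_names = ["endo_seg", "er_seg", "mito_seg"]
data_paths_to_try = candidate_paths("labels", label_names)

print(data_paths_to_try)
print(cube_slices())
```

The resulting list can be used in place of `data_paths_to_try` in the cell above, with `cube_slices()` as the `slice_spec` argument. Note also that the loop prints nothing when `download_array_slice` returns without raising but yields no file, which matches the silent attempts in the recorded output.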


## 6. Visualize Cellular Imaging Data

Now let's create visualizations of our downloaded cellular imaging data to see the cellular structures.

In [6]:
if sample_data is not None:
    print(f"🎨 Creating visualizations of {sample_path} data...")
    
    # Create a comprehensive figure with multiple views
    fig, axes = plt.subplots(2, 3, figsize=(15, 10))
    fig.suptitle(f'Cellular Imaging Data: {target_dataset}\nData Path: {sample_path}', fontsize=14, fontweight='bold')
    
    # Calculate middle slices for each dimension
    z_mid = sample_data.shape[0] // 2
    y_mid = sample_data.shape[1] // 2
    x_mid = sample_data.shape[2] // 2
    
    # Row 1: Different slice orientations
    # XY slice (looking down through Z)
    im1 = axes[0, 0].imshow(sample_data[z_mid, :, :], cmap='gray', origin='lower')
    axes[0, 0].set_title(f'XY Slice (Z={z_mid})')
    axes[0, 0].set_xlabel('X')
    axes[0, 0].set_ylabel('Y')
    plt.colorbar(im1, ax=axes[0, 0], shrink=0.8)
    
    # XZ slice (side view through Y)
    im2 = axes[0, 1].imshow(sample_data[:, y_mid, :], cmap='gray', origin='lower')
    axes[0, 1].set_title(f'XZ Slice (Y={y_mid})')
    axes[0, 1].set_xlabel('X')
    axes[0, 1].set_ylabel('Z')
    plt.colorbar(im2, ax=axes[0, 1], shrink=0.8)
    
    # YZ slice (side view through X)
    im3 = axes[0, 2].imshow(sample_data[:, :, x_mid], cmap='gray', origin='lower')
    axes[0, 2].set_title(f'YZ Slice (X={x_mid})')
    axes[0, 2].set_xlabel('Y')
    axes[0, 2].set_ylabel('Z')
    plt.colorbar(im3, ax=axes[0, 2], shrink=0.8)
    
    # Row 2: Analysis plots
    # Histogram of pixel intensities
    axes[1, 0].hist(sample_data.flatten(), bins=50, alpha=0.7, color='blue', edgecolor='black')
    axes[1, 0].set_title('Pixel Intensity Distribution')
    axes[1, 0].set_xlabel('Intensity Value')
    axes[1, 0].set_ylabel('Frequency')
    axes[1, 0].grid(True, alpha=0.3)
    
    # Maximum intensity projection (MIP) in Z direction
    mip_z = np.max(sample_data, axis=0)
    im4 = axes[1, 1].imshow(mip_z, cmap='hot', origin='lower')
    axes[1, 1].set_title('Maximum Intensity Projection (Z)')
    axes[1, 1].set_xlabel('X')
    axes[1, 1].set_ylabel('Y')
    plt.colorbar(im4, ax=axes[1, 1], shrink=0.8)
    
    # Sum projection along Z (no contrast adjustment is applied)
    sum_proj = np.sum(sample_data, axis=0)
    im5 = axes[1, 2].imshow(sum_proj, cmap='viridis', origin='lower')
    axes[1, 2].set_title('Sum Projection (Z)')
    axes[1, 2].set_xlabel('X')
    axes[1, 2].set_ylabel('Y')
    plt.colorbar(im5, ax=axes[1, 2], shrink=0.8)
    
    plt.tight_layout()
    plt.show()
    
    # Print some statistics
    print(f"\n📈 Data Statistics:")
    print(f"   Mean intensity: {sample_data.mean():.2f}")
    print(f"   Standard deviation: {sample_data.std():.2f}")
    print(f"   Min/Max values: {sample_data.min()} / {sample_data.max()}")
    print(f"   Data shape: {sample_data.shape}")
    print(f"   Voxel count: {np.prod(sample_data.shape):,}")
    
else:
    print("❌ No sample data available for visualization")
    print("💡 Try running the previous cell again or check dataset availability")

❌ No sample data available for visualization
💡 Try running the previous cell again or check dataset availability
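
Because no sample was downloaded in this run, the plotting cell never executed. To exercise it without network access, a synthetic volume can stand in for `sample_data`. The sketch below is purely illustrative: the blob is artificial data, and `synthetic/blob` is a placeholder label, not an OpenOrganelle path.

```python
import numpy as np

def synthetic_volume(size=32, seed=0):
    """Gaussian blob plus noise as uint8, indexed (Z, Y, X) like the cell above."""
    rng = np.random.default_rng(seed)
    z, y, x = np.indices((size, size, size))
    center = (size - 1) / 2
    r2 = (z - center) ** 2 + (y - center) ** 2 + (x - center) ** 2
    blob = 200 * np.exp(-r2 / (2 * (size / 6) ** 2))
    noise = rng.normal(0, 10, size=blob.shape)
    return np.clip(blob + noise, 0, 255).astype(np.uint8)

sample_data = synthetic_volume()
sample_path = "synthetic/blob"  # placeholder label, not a real dataset path

print(sample_data.shape, sample_data.dtype)
print("Middle XY slice shape:", sample_data[sample_data.shape[0] // 2].shape)
```

After running this, re-running the visualization cell should produce all six panels, which is a quick way to check the plotting code independently of the download step.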


## 7. Advanced Data Access

For more advanced analysis, you might want to access larger portions of the data or specific organelle segmentations.

In [7]:
# Example of direct access to larger data using zarr and dask
if target_dataset:
    print(f"🔬 Advanced data access example for {target_dataset}")
    
    try:
        # Direct access to the dataset's N5 container on S3
        n5_path = f"s3://janelia-cosem-datasets/{target_dataset}/{target_dataset}.n5"
        
        # Try two ways of opening the store; which one works depends on the installed zarr version
        group = None
        try:
            # Method 1: pass an fsspec mapper to zarr.open
            store = fsspec.get_mapper(n5_path, anon=True)
            group = zarr.open(store, mode='r')
        except Exception as e1:
            try:
                # Method 2: build an FSStore on an anonymous S3 filesystem (fsspec is already imported above)
                fs = fsspec.filesystem('s3', anon=True)
                store = zarr.storage.FSStore(fs=fs, path=n5_path.replace('s3://', ''))
                group = zarr.open_group(store=store, mode='r')
            except Exception as e2:
                print(f"   ❌ Could not open dataset: {e1}, {e2}")
                group = None
        
        if group is not None:
            print(f"📁 Available data groups:")
            
            # Get groups with compatibility for different zarr versions
            groups = []
            try:
                if hasattr(group, 'group_keys'):
                    groups = list(group.group_keys())
                elif hasattr(group, 'keys'):
                    all_keys = list(group.keys())
                    groups = [k for k in all_keys if hasattr(group.get(k, None), 'keys')]
            except Exception as e:
                print(f"   ❌ Could not list groups: {e}")
            
            for key in groups:
                print(f"   • {key}")
                try:
                    subgroup = group[key]
                    if hasattr(subgroup, 'array_keys'):
                        arrays = list(subgroup.array_keys())
                    elif hasattr(subgroup, 'keys'):
                        all_keys = list(subgroup.keys())
                        arrays = [k for k in all_keys if not hasattr(subgroup.get(k, None), 'keys')]
                    else:
                        arrays = []
                    
                    if arrays:
                        print(f"     Arrays: {arrays[:3]}{'...' if len(arrays) > 3 else ''}")
                except Exception as e:
                    print(f"     Error accessing {key}: {str(e)[:50]}...")
            
            # Example: Access EM data directly with dask for lazy loading
            em_data_paths = ['em/fibsem-uint16/s0', 'em/fibsem-uint8/s0']
            
            for em_path in em_data_paths:
                try:
                    if em_path.split('/')[0] in groups:
                        print(f"\n🔍 Accessing: {em_path}")
                        zarray = group[em_path]
                        
                        # Handle chunks compatibility
                        chunks = zarray.chunks if hasattr(zarray, 'chunks') else None
                        if chunks is None:
                            chunks = tuple(min(64, s) for s in zarray.shape)
                        
                        darray = da.from_array(zarray, chunks=chunks)
                        
                        print(f"   Full dataset shape: {darray.shape}")
                        print(f"   Data type: {darray.dtype}")
                        print(f"   Chunk size: {chunks}")
                        print(f"   Estimated size: {darray.nbytes / (1024**3):.2f} GB")
                        
                        # Show how to access specific regions efficiently
                        print(f"   Example: Access a 100x100x100 region:")
                        print(f"   region = darray[0:100, 0:100, 0:100].compute()")
                        break
                except Exception as e:
                    print(f"   ❌ Could not access {em_path}: {str(e)[:50]}...")
            
            # Show organelle segmentation data if available
            if 'labels' in groups:
                print(f"\n🏷️  Available organelle segmentations:")
                try:
                    labels_group = group['labels']
                    if hasattr(labels_group, 'group_keys'):
                        label_keys = list(labels_group.group_keys())
                    elif hasattr(labels_group, 'keys'):
                        all_keys = list(labels_group.keys())
                        label_keys = [k for k in all_keys if hasattr(labels_group.get(k, None), 'keys')]
                    else:
                        label_keys = []
                    
                    for key in label_keys[:5]:  # Show first 5
                        print(f"   • {key}")
                except Exception as e:
                    print(f"   ❌ Could not access labels: {str(e)[:50]}...")
        else:
            print("❌ Could not access dataset with any method")
                
    except Exception as e:
        print(f"❌ Error in advanced access: {str(e)[:100]}...")

print(f"\n💡 Tips for working with large datasets:")
print(f"   • Use dask arrays for lazy loading of large data")
print(f"   • Access only the regions you need using slicing")
print(f"   • Consider using lower resolution versions (s1, s2, etc.) for exploration")
print(f"   • Use chunked processing for analysis of full datasets")
print(f"   • The OpenOrganelle website provides Neuroglancer links for online visualization")

🔬 Advanced data access example for jrc_hela-1
   ❌ Could not open dataset: nothing found at path '', FSStore.__init__() missing 1 required positional argument: 'url'
❌ Could not access dataset with any method

💡 Tips for working with large datasets:
   • Use dask arrays for lazy loading of large data
   • Access only the regions you need using slicing
   • Consider using lower resolution versions (s1, s2, etc.) for exploration
   • Use chunked processing for analysis of full datasets
   • The OpenOrganelle website provides Neuroglancer links for online visualization
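
Both attempts in the recorded run failed: the plain mapper found nothing it recognised as a Zarr hierarchy, and `FSStore` was called without its required `url` argument. Since the container is in N5 format, one alternative is zarr's N5-aware store. The sketch below assumes zarr 2.x (where `zarr.N5FSStore` is available) and an installed `s3fs`; `n5_url` and `open_n5_group` are hypothetical helper names, and the function is defined but not called here because it needs network access.

```python
def n5_url(dataset, bucket="janelia-cosem-datasets"):
    """S3 URL of a dataset's N5 container, following the pattern used in the cell above."""
    return f"s3://{bucket}/{dataset}/{dataset}.n5"

def open_n5_group(dataset):
    """Open the N5 container read-only (assumes zarr 2.x and s3fs are installed)."""
    import zarr  # imported lazily so n5_url works without zarr installed

    # anon=True is forwarded to the S3 filesystem for unauthenticated access
    store = zarr.N5FSStore(n5_url(dataset), anon=True)
    return zarr.open(store, mode="r")

print(n5_url("jrc_hela-1"))

# Usage (requires network access):
# group = open_n5_group("jrc_hela-1")
# print(list(group.keys()))
```

If this opens successfully, the returned group can be passed through the same group and array listing logic as in the cell above.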


## 8. Next Steps and Resources

Congratulations! You've worked through exploring cellular imaging data from OpenOrganelle using your uv virtual environment.

### What you've accomplished:
- ✅ Set up and verified your uv virtual environment
- ✅ Imported the required packages for cellular imaging analysis
- ✅ Connected to the OpenOrganelle data platform
- ✅ Explored available datasets and their structure
- ⚠️ Attempted a sample download (in the run recorded above, none of the candidate paths returned data for `jrc_hela-1`)
- ⚠️ Set up visualizations of FIB-SEM data (skipped in the recorded run because no sample was downloaded)

### Next Steps:
1. **Explore more datasets**: Try different cell types and organisms
2. **Analyze organelle segmentations**: Download and visualize organelle labels
3. **Scale up analysis**: Use dask for processing larger data regions
4. **Machine learning**: Use the data for training image analysis models
5. **Quantitative analysis**: Measure organelle properties and relationships
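
For steps 3 and 5, a common pattern is to walk a large volume block by block so that only one block is in memory at a time. The sketch below is one way to generate the block slices; `iter_blocks` is a hypothetical helper, and the shape and block size are arbitrary illustrative values.

```python
from itertools import product

def iter_blocks(shape, block):
    """Yield tuples of slices that tile an array of `shape` with blocks of size `block`."""
    ranges = [range(0, dim, step) for dim, step in zip(shape, block)]
    for starts in product(*ranges):
        yield tuple(
            slice(start, min(start + step, dim))  # clip the last block at the array edge
            for start, step, dim in zip(starts, block, shape)
        )

blocks = list(iter_blocks((100, 64, 64), (64, 64, 64)))
print(len(blocks))
print(blocks[-1])
```

Each yielded tuple can index a NumPy, zarr, or dask array directly, for example `darray[block].mean().compute()` for a dask array named `darray`.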

### Useful Resources:
- **OpenOrganelle Website**: https://openorganelle.janelia.org/
- **Documentation**: https://github.com/janelia-cosem/fibsem-tools
- **CellMap Project**: https://www.janelia.org/project-team/cellmap
- **Neuroglancer Viewer**: For online 3D visualization
- **N5 Format**: https://github.com/saalfeldlab/n5

### Command Line Usage:
You can also use the downloader from the command line:
```bash
# Activate your uv environment first
source .venv/Scripts/activate  # Windows (Git Bash); in PowerShell use: .venv\Scripts\Activate.ps1
# or: source .venv/bin/activate  # Linux/Mac

# Then run the downloader
python src/openorganelle_downloader.py --list-datasets
python src/openorganelle_downloader.py --explore jrc_hela-2
python src/openorganelle_downloader.py --download jrc_hela-2
```
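
The source of `openorganelle_downloader.py` is not shown in this notebook. Purely as an illustration of how flags like the ones above could be wired up, here is a hypothetical `argparse` sketch. The three flag names come from the commands above; `--output-dir` and everything else are assumptions and may not match the actual script.

```python
import argparse

def build_parser():
    """Hypothetical parser mirroring the flags used in the commands above."""
    parser = argparse.ArgumentParser(description="OpenOrganelle downloader (illustrative sketch)")
    parser.add_argument("--list-datasets", action="store_true", help="List dataset names")
    parser.add_argument("--explore", metavar="DATASET", help="Print the structure of DATASET")
    parser.add_argument("--download", metavar="DATASET", help="Download data from DATASET")
    # Assumed option, mirroring the output_dir argument used in Section 2
    parser.add_argument("--output-dir", default="./data", help="Where to write downloaded files")
    return parser

args = build_parser().parse_args(["--explore", "jrc_hela-2"])
print(args.explore, args.list_datasets, args.output_dir)
```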

Happy exploring! 🔬