# IMC ROI Analysis Pipeline - Stage 1: Per-ROI Processing

**Goal:** This notebook processes individual Imaging Mass Cytometry (IMC) ROI files (`.txt` format). For each ROI, it:
1.  Loads and validates the data.
2.  Calculates optimal arcsinh cofactors.
3.  Applies a per-channel arcsinh transformation and scaling.
4.  Generates resolution-independent visualizations (pixel correlation clustermap, raw vs. scaled comparison).
5.  Iterates through the configured Leiden resolutions and, for each one:
    *   Runs spatial Leiden clustering on pixel data.
    *   Calculates community profiles (average scaled expression per community).
    *   Runs differential expression analysis between adjacent communities (optional).
    *   Generates resolution-dependent visualizations (community correlation map, UMAP, co-expression matrix).
6.  Saves processed data (scaled pixel results, community profiles, etc.) and visualizations into ROI-specific output directories.
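
Steps 2 and 3 can be illustrated with a minimal sketch. It assumes the transform is `arcsinh(x / cofactor)` followed by min-max scaling to the range [0, 1], one channel at a time. The function name `arcsinh_minmax` and the sample counts are illustrative only; the pipeline's own implementation lives in `apply_per_channel_arcsinh_and_scale` and may differ in detail.

```python
import math
from typing import List


def arcsinh_minmax(values: List[float], cofactor: float) -> List[float]:
    """Arcsinh-transform one channel, then rescale it to the [0, 1] range."""
    if cofactor <= 0:
        raise ValueError("cofactor must be positive")
    transformed = [math.asinh(v / cofactor) for v in values]
    lo, hi = min(transformed), max(transformed)
    if hi == lo:
        # Constant channel: avoid division by zero and map everything to 0.
        return [0.0 for _ in transformed]
    return [(t - lo) / (hi - lo) for t in transformed]


# Hypothetical ion counts for a single channel (not taken from the dataset).
counts = [0.0, 1.0, 5.0, 50.0, 500.0]
scaled = arcsinh_minmax(counts, cofactor=5.0)
print([round(s, 3) for s in scaled])
```

A smaller cofactor stretches the low end of the intensity range, which is why the cofactor is chosen before scaling.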

**Methodology:** The notebook calls functions from the `src.roi_pipeline` modules and uses `joblib` to process ROIs in parallel.

**Input:**
*   Raw IMC `.txt` files located in the directory specified in `config.yaml` (`paths: data_dir`).
*   Configuration settings from `config.yaml`.

**Output:**
*   A structured output directory (specified in `config.yaml`, `paths: output_dir`) containing subdirectories for each processed ROI.
*   Within each ROI directory:
    *   Cofactor information (`asinh_cofactors_*.json`).
    *   Resolution-independent plots (`pixel_channel_correlation_*.svg`, `spatial_raw_vs_scaled_matrix_*.svg`).
    *   Subdirectories for each processed resolution (e.g., `resolution_0_3/`).
        *   Community profiles (`community_profiles_scaled_*.csv`).
        *   Differential expression results (optional) (`community_diff_profiles_*.csv`, `community_top_channels_*.csv`).
        *   UMAP coordinates (optional) (`umap_coords_*.csv`).
        *   Final pixel results with community assignments (`pixel_analysis_results_final_*.csv`).
        *   Resolution-dependent plots (`community_channel_correlation_*.svg`, `umap_community_scatter_*.svg`, `coexpression_matrix_scaled_vs_avg_*.svg`).
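
The directory layout above can be sketched as a small path helper. The naming rule used here (replace the decimal point in the resolution with an underscore) is inferred from the `resolution_0_3/` example and is an assumption about how the pipeline formats names; `resolution_dir` is a hypothetical helper, not part of `src.roi_pipeline`.

```python
from pathlib import Path
from typing import Union


def resolution_dir(output_dir: str, roi_string: str, resolution: Union[int, float]) -> Path:
    """Build <output_dir>/<roi_string>/resolution_<r>, with '.' replaced by '_'."""
    tag = str(resolution).replace(".", "_")
    return Path(output_dir) / roi_string / f"resolution_{tag}"


# Placeholder output directory and ROI string, for illustration only.
path = resolution_dir("output_plots", "ROI_D1_M1_01_9", 0.3)
print(path.as_posix())
```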

**Next Steps:** The outputs generated by this notebook (specifically the `community_profiles_scaled_*.csv` and `pixel_analysis_results_final_*.csv` files) serve as the primary inputs for the **Experiment-Level Analysis Notebook**.

In [2]:
# Imports
import yaml
import pandas as pd
import numpy as np
import os
import glob
import time
import sys
import multiprocessing
import traceback
import gc
from typing import List, Tuple, Optional, Dict, Any
from joblib import Parallel, delayed

# --- Import Pipeline Modules ---
# Encapsulated logic resides in these modules
try:
    from src.roi_pipeline.imc_data_utils import (
        load_and_validate_roi_data,
        calculate_asinh_cofactors_for_roi,
        apply_per_channel_arcsinh_and_scale,
    )
    from src.roi_pipeline.pixel_analysis_core import (
        run_spatial_leiden,
        calculate_and_save_profiles,
        calculate_differential_expression # Keep import even if DiffEx is optional via config
    )
    from src.roi_pipeline.pixel_visualization import (
        plot_correlation_clustermap,
        plot_umap_scatter, # Requires umap-learn
        plot_coexpression_matrix,
        plot_raw_vs_scaled_spatial_comparison
    )
    # Attempt to import UMAP, set flag
    try:
        import umap
        umap_available = True
    except ImportError:
        print("WARNING: package 'umap-learn' not found. UMAP visualization will be skipped.")
        umap_available = False

    print("Successfully imported pipeline modules.")
except ImportError as e:
    print(f"ERROR: Failed to import pipeline modules. Ensure 'src' is in the Python path.")
    print(f"Details: {e}")
    # Optionally raise error or exit if imports fail
    # raise e



Successfully imported pipeline modules.


## 1. Load Configuration

Load settings from the central `config.yaml` file.

In [3]:
CONFIG_PATH = "config.yaml" # Or allow user input

def load_config(config_path: str) -> Optional[Dict]:
    """Loads the pipeline configuration from a YAML file."""
    try:
        with open(config_path, 'r') as f:
            config = yaml.safe_load(f)
        print(f"Configuration loaded successfully from: {config_path}")
        # Basic validation (can be expanded)
        if not isinstance(config, dict) or not all(k in config for k in ['paths', 'data', 'analysis', 'processing']):
            print(f"ERROR: Config file {config_path} is missing required top-level keys (paths, data, analysis, processing) or is not a valid dictionary.")
            return None
        # Validate essential sub-keys
        if not config.get('paths',{}).get('data_dir') or not config.get('paths',{}).get('output_dir'):
             print("ERROR: Config missing paths -> data_dir or paths -> output_dir")
             return None
        if not config.get('data', {}).get('master_protein_channels'):
             print("ERROR: Config missing data -> master_protein_channels")
             return None
        print("Config basic validation passed.")
        return config
    except FileNotFoundError:
        print(f"ERROR: Configuration file not found at {config_path}")
        return None
    except yaml.YAMLError as e:
        print(f"ERROR: Failed to parse configuration file {config_path}: {e}")
        return None
    except Exception as e:
        print(f"ERROR: An unexpected error occurred while loading configuration: {e}")
        return None

# Load the config
config = load_config(CONFIG_PATH)

# Display some key config values (optional)
if config:
    print("\nKey Configuration Parameters:")
    print(f"  Data Directory: {config.get('paths', {}).get('data_dir')}")
    print(f"  Output Directory: {config.get('paths', {}).get('output_dir')}")
    print(f"  Metadata File: {config.get('paths', {}).get('metadata_file')}")
    print(f"  Master Protein Channels: {config.get('data', {}).get('master_protein_channels')}")
    print(f"  Default Cofactor: {config.get('data', {}).get('default_arcsinh_cofactor')}")
    print(f"  Differential Expression: {config.get('analysis', {}).get('differential_expression', {}).get('run_differential_expression', False)}")
    print(f"  Non-protein Markers for UMAP: {config.get('analysis', {}).get('differential_expression', {}).get('non_protein_markers_for_umap', [])}")
    print(f"  UMAP Parameters: {config.get('analysis', {}).get('umap', {}).get('n_neighbors', 'Not Set')} neighbors, {config.get('analysis', {}).get('umap', {}).get('min_dist', 'Not Set')} min dist, {config.get('analysis', {}).get('umap', {}).get('n_components', 'Not Set')} components")
    print(f"  Clustering Seed: {config.get('analysis', {}).get('clustering', {}).get('seed', 'Not Set')}")
    print(f"  Spatial Clustering Parameters: {config.get('analysis', {}).get('clustering', {}).get('n_neighbors', 'Not Set')} neighbors, {config.get('analysis', {}).get('clustering', {}).get('resolution_params', 'Not Set')} resolution")    

    # Add more relevant parameters as needed
else:
    print("\nStopping notebook execution due to configuration loading error.")
    # Consider raising an error or using sys.exit() if running non-interactively
    # raise ValueError("Failed to load configuration.")

Configuration loaded successfully from: config.yaml
Config basic validation passed.

Key Configuration Parameters:
  Data Directory: /home/noot/IMC/data/241218_IMC_Alun
  Output Directory: /home/noot/IMC/output_plots/
  Metadata File: /home/noot/IMC/data/Data_annotations_Karen/Metadata-Table 1.csv
  Master Protein Channels: ['CD45(Y89Di)', 'Ly6G(Pr141Di)', 'CD11b(Nd143Di)', 'CD140a(Nd148Di)', 'CD140b(Eu151Di)', 'CD31(Sm154Di)', 'CD34(Er166Di)', 'CD206(Tm169Di)', 'CD44(Yb171Di)', 'DNA1(Ir191Di)', 'DNA2(Ir193Di)', '190BCKG(BCKG190Di)', '80ArAr(ArAr80Di)', '130Ba(Ba130Di)', '131Xe(Xe131Di)']
  Default Cofactor: 5.0
  Differential Expression: False
  Non-protein Markers for UMAP: ['80ArAr(ArAr80Di)', '130Ba(Ba130Di)', '131Xe(Xe131Di)', '190BCKG(BCKG190Di)', 'DNA1(Ir191Di)', 'DNA2(Ir193Di)']
  UMAP Parameters: 15 neighbors, 0.1 min dist, 8 components
  Clustering Seed: 42
  Spatial Clustering Parameters: 30 neighbors, [1, 0.1] resolution
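
The keys read throughout this notebook imply a configuration shaped roughly like the dictionary below. This is a sketch, not the project's actual `config.yaml`: paths, column names marked as placeholders, and the `parallel_jobs` value are assumptions, and `missing_keys` is a hypothetical helper that mirrors the spirit of the validation in `load_config`.

```python
from typing import Any, Dict, List

# Illustrative configuration mirroring the keys this notebook reads.
# Paths and values marked "placeholder" are not the project's real settings.
example_config: Dict[str, Any] = {
    "paths": {
        "data_dir": "/path/to/imc_txt_files",      # placeholder
        "output_dir": "/path/to/output",           # placeholder
        "metadata_file": "/path/to/metadata.csv",  # placeholder
    },
    "data": {
        "master_protein_channels": ["CD45(Y89Di)", "CD31(Sm154Di)"],
        "default_arcsinh_cofactor": 5.0,
        "metadata_cols": ["X", "Y"],  # placeholder column names
    },
    "analysis": {
        "clustering": {"seed": 42, "n_neighbors": 30,
                       "resolution_params": [1, 0.1], "linkage": "ward"},
        "umap": {"n_neighbors": 15, "min_dist": 0.1, "n_components": 8},
        "differential_expression": {"run_differential_expression": False},
    },
    "processing": {"parallel_jobs": -2},  # placeholder
    "experiment_analysis": {
        "metadata_roi_col": "ROI",  # placeholder column name
        "timepoint_col": "Injury_Day",
    },
}


def missing_keys(config: Dict[str, Any], required: List[str]) -> List[str]:
    """Return the dotted key paths from `required` that are absent or empty."""
    missing = []
    for dotted in required:
        node: Any = config
        for part in dotted.split("."):
            if not isinstance(node, dict) or part not in node:
                node = None
                break
            node = node[part]
        if node is None or (hasattr(node, "__len__") and len(node) == 0):
            missing.append(dotted)
    return missing


required = ["paths.data_dir", "paths.output_dir", "data.master_protein_channels"]
print(missing_keys(example_config, required))  # nothing missing
print(missing_keys({"paths": {}}, required))   # all three missing
```

Reporting every missing key at once, rather than stopping at the first, makes a broken configuration quicker to repair.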


## 2. Define the Core ROI Processing Function (`analyze_roi`)

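This function encapsulates the entire analysis workflow for a *single* ROI file. It calls the underlying functions imported from our pipeline modules. It is not defined in this notebook: it lives in the `run_roi_analysis` module and is imported in section 5, so that parallel workers can load it.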

## 3. Find Input Files
Locate all `.txt` files in the specified data directory.

In [4]:
imc_files = []
if config:
    data_dir = config['paths']['data_dir']
    try:
        imc_files = sorted(glob.glob(os.path.join(data_dir, "*.txt"))) # Sort for consistency
        if not imc_files:
            print(f"ERROR: No .txt files found in data directory: {data_dir}")
        else:
            print(f"\nFound {len(imc_files)} IMC data files to process:")
            # Print first few files
            for f in imc_files[:min(5, len(imc_files))]: print(f"  - {os.path.basename(f)}")
            if len(imc_files) > 5: print("  ...")
    except Exception as e:
         print(f"ERROR finding input files in {data_dir}: {e}")
         imc_files = [] # Ensure it's empty on error
else:
    print("Skipping file search due to missing configuration.")



Found 25 IMC data files to process:
  - IMC_241218_Alun_ROI_D1_M1_01_9.txt
  - IMC_241218_Alun_ROI_D1_M1_02_10.txt
  - IMC_241218_Alun_ROI_D1_M1_03_11.txt
  - IMC_241218_Alun_ROI_D1_M2_01_12.txt
  - IMC_241218_Alun_ROI_D1_M2_02_13.txt
  ...


## 4. Setup Parallel Processing
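Determine the number of CPU cores to use based on the configuration (`processing: parallel_jobs`). `-1` uses all cores, `-2` uses all but one, and so on. Positive values are capped at the number of available cores, and `0` or a non-integer value falls back to a single core.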

In [5]:
n_jobs = 1
if config:
    try:
        parallel_jobs_config = config['processing']['parallel_jobs']
        cpu_count = multiprocessing.cpu_count()
        if isinstance(parallel_jobs_config, int):
            if parallel_jobs_config == -1:
                n_jobs = cpu_count
            elif parallel_jobs_config <= -2:
                n_jobs = max(1, cpu_count + parallel_jobs_config + 1)
            elif parallel_jobs_config > 0:
                n_jobs = min(parallel_jobs_config, cpu_count)
            else: # 0 or invalid
                n_jobs = 1
        else: n_jobs = 1 # Default for invalid type
        print(f"\nConfigured to use {n_jobs} cores for parallel processing.")
    except KeyError:
         print("\nWarning: 'parallel_jobs' not found in config. Defaulting to 1 core.")
         n_jobs = 1
    except Exception as e:
         print(f"\nWarning: Error determining parallel jobs: {e}. Defaulting to 1 core.")
         n_jobs = 1
else:
    print("Skipping parallel setup due to missing configuration.")



Configured to use 15 cores for parallel processing.
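
The mapping from `parallel_jobs` to a worker count can also be expressed as a pure function, which makes the edge cases easy to check without touching the machine's real core count. `resolve_n_jobs` is a hypothetical name. The sketch follows the rules of the cell above, with one deliberate difference: it also rejects booleans, which Python otherwise treats as integers.

```python
def resolve_n_jobs(parallel_jobs: object, cpu_count: int) -> int:
    """Translate a `parallel_jobs` setting into a concrete worker count."""
    if isinstance(parallel_jobs, bool) or not isinstance(parallel_jobs, int):
        return 1  # wrong type: fall back to a single worker
    if parallel_jobs > 0:
        return min(parallel_jobs, cpu_count)  # never exceed available cores
    if parallel_jobs < 0:
        # -1 -> all cores, -2 -> all but one, ...; never drop below one worker
        return max(1, cpu_count + parallel_jobs + 1)
    return 1  # 0 is treated as "run serially"


# Worked examples on a hypothetical 16-core machine.
for setting in (-1, -2, 4, 64, 0, -100, "4"):
    print(repr(setting), "->", resolve_n_jobs(setting, cpu_count=16))
```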


## Load metadata

In [6]:
import os
import re
import pandas as pd
import numpy as np
import traceback

# --- Load Metadata ---
metadata_file = config['paths'].get('metadata_file') if config else None  # Guard against a failed config load
metadata = None
metadata_map = {}
first_timepoint = None
reference_channel_order = None
reference_roi_path = None
timepoint_col = None  # Define outside try block

print("--- Loading Metadata ---")
if metadata_file and os.path.exists(metadata_file):
    try:
        metadata = pd.read_csv(metadata_file)
        print(f"Metadata loaded successfully from: {metadata_file}")
        # Prepare for reference timepoint identification
        metadata_roi_col = config['experiment_analysis']['metadata_roi_col']
        timepoint_col = config['experiment_analysis']['timepoint_col']  # Assign here
        if metadata_roi_col not in metadata.columns:
            raise ValueError(f"Metadata ROI column '{metadata_roi_col}' not found in {metadata_file}")
        if timepoint_col not in metadata.columns:
            raise ValueError(f"Metadata timepoint column '{timepoint_col}' not found in {metadata_file}")

        # Define helper function (can be defined globally or here)
        def get_roi_string_from_path(p):
            """
            Extracts the ROI string (e.g. ROI_D7_M1_01_21) from 
            a full filename like IMC_241218_Alun_ROI_D7_M1_01_21.txt
            """
            fname = os.path.basename(p)
            base, _ = os.path.splitext(fname)
            m = re.search(r"(ROI_[A-Za-z0-9_]+)$", base)
            if m:
                return m.group(1)
            else:
                # fallback: return the full basename if no match
                return base
                
        metadata[metadata_roi_col] = metadata[metadata_roi_col].astype(str)
        metadata_map = {}
        for index, row in metadata.iterrows():
            roi_key = row[metadata_roi_col]
            if roi_key not in metadata_map:
                metadata_map[roi_key] = row.to_dict()

        all_timepoint_values = metadata[timepoint_col].unique()

        # Filter out potential NaN values before sorting
        valid_timepoint_values = [v for v in all_timepoint_values if pd.notna(v)]
        print(f"Valid (non-NaN) timepoint values: {valid_timepoint_values}")

        # Identify the first timepoint value
        if not valid_timepoint_values:
            print("WARNING: No valid (non-NaN) timepoint values found in metadata column.")
            first_timepoint = None
        else:
            try:
                # Attempt numeric sort first
                sorted_numeric = sorted(valid_timepoint_values, key=lambda x: float(x))
                first_timepoint = sorted_numeric[0]
                print(f"Identified first timepoint with numeric sort: {first_timepoint}")
            except (ValueError, TypeError):
                # Fallback to string sort if numeric fails
                sorted_string = sorted([str(v) for v in valid_timepoint_values])  # Ensure all are strings for sort
                first_timepoint = sorted_string[0]
                print(f"Identified first timepoint with string sort: {first_timepoint}")

        if first_timepoint is not None:
            print(f"Confirmed first timepoint: {first_timepoint} (from column '{timepoint_col}')")
        else:
            print(f"Could not identify a valid first timepoint from column '{timepoint_col}'.")

    except FileNotFoundError:
        print(f"WARNING: Metadata file specified but not found: {metadata_file}. Cannot use metadata features.")
    except KeyError as e:
        print(f"WARNING: Missing expected key in config for metadata: {e}. Cannot use metadata features.")
    except ValueError as e:
        print(f"WARNING: Error processing metadata: {e}. Cannot use metadata features.")
    except Exception as e:
        print(f"ERROR loading or processing metadata from {metadata_file}: {e}")
        metadata = None  # Ensure metadata is None on error
else:
    print("WARNING: No metadata file specified or found in config/path. Cannot apply metadata-based ordering.")

--- Loading Metadata ---
Metadata loaded successfully from: /home/noot/IMC/data/Data_annotations_Karen/Metadata-Table 1.csv
Valid (non-NaN) timepoint values: [np.int64(7), np.int64(0), np.int64(1), np.int64(3)]
Identified first timepoint with numeric sort: 0
Confirmed first timepoint: 0 (from column 'Injury_Day')
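
The first-timepoint rule used above (numeric order when every value parses as a number, string order otherwise) can be isolated in a small function. `pick_first_timepoint` is a hypothetical name, and the inputs below are illustrative apart from the `[7, 0, 1, 3]` values printed by the cell.

```python
import math
from typing import Any, Iterable, Optional


def pick_first_timepoint(values: Iterable[Any]) -> Optional[Any]:
    """Return the earliest timepoint, preferring numeric order over string order."""
    valid = [
        v for v in values
        if v is not None and not (isinstance(v, float) and math.isnan(v))
    ]
    if not valid:
        return None
    try:
        # Numeric comparison; raises if any value cannot be parsed as a number.
        return min(valid, key=float)
    except (ValueError, TypeError):
        # Fallback: lexicographic comparison of the string forms.
        return min(str(v) for v in valid)


print(pick_first_timepoint([7, 0, 1, 3]))
print(pick_first_timepoint(["D7", "D0", "D3"]))
print(pick_first_timepoint(["D10", "D9"]))
```

The string fallback is lexicographic, so labels such as `D10` sort before `D9`. Numeric timepoint columns avoid that pitfall.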


## 5. Run analysis over all ROIs

Execute the `analyze_roi` function for each input file using `joblib.Parallel`.

**Note:** This cell may take a significant amount of time depending on the number of ROIs, data size, number of resolutions, and number of cores used. **Ensure `analyze_roi` is defined in an imported `.py` module if `n_jobs > 1`.**
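
Each file has to be paired with its metadata row before it is analyzed. The sketch below reuses the regular expression from `get_roi_string_from_path` and matches on the end of the metadata key rather than on any substring, which is an assumption about the key format (full file stem, as the log output suggests). `find_metadata` and the sample entries are hypothetical.

```python
import os
import re
from typing import Dict, Optional


def roi_string_from_path(path: str) -> str:
    """Extract e.g. 'ROI_D7_M1_01_21' from '.../IMC_241218_Alun_ROI_D7_M1_01_21.txt'."""
    base, _ = os.path.splitext(os.path.basename(path))
    match = re.search(r"(ROI_[A-Za-z0-9_]+)$", base)
    return match.group(1) if match else base


def find_metadata(roi_string: str, metadata_map: Dict[str, dict]) -> Optional[dict]:
    """Return the entry whose key equals the ROI string or ends with '_' + ROI string."""
    for key, entry in metadata_map.items():
        if key == roi_string or key.endswith("_" + roi_string):
            return entry
    return None


# Hypothetical metadata keyed by full file stem.
metadata_map = {
    "IMC_241218_Alun_ROI_D7_M1_01_21": {"Injury_Day": 7},
    "IMC_241218_Alun_ROI_Sam1_01_2": {"Injury_Day": 0},
}
roi = roi_string_from_path("/data/IMC_241218_Alun_ROI_Sam1_01_2.txt")
print(roi, find_metadata(roi, metadata_map))

# A plain substring test would wrongly pair 'ROI_D7_M1_01_2' with '..._01_21'.
print("ROI_D7_M1_01_2" in "IMC_241218_Alun_ROI_D7_M1_01_21")
print(find_metadata("ROI_D7_M1_01_2", metadata_map))
```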

In [7]:
# ------------------------------------------------------------------
# 1) Collect correlation matrices from "first-timepoint" ROIs
# ------------------------------------------------------------------
import os
import numpy as np

correlation_matrices = []
all_channels_set = set()  # Keep track of all channels encountered

for fp in imc_files:
    roi_key = get_roi_string_from_path(fp)
    
    # --- MODIFIED METADATA LOOKUP ---
    md = {}  # Default to empty dict if no match found
    found_key = None
    for map_key, metadata_entry in metadata_map.items():
        # Check if the extracted ROI string is PART of the metadata map key
        if roi_key in map_key:
            md = metadata_entry
            found_key = map_key  # Keep track of the key we matched
            break  # Stop after finding the first match

    if not found_key:
        print(f"   WARNING: No metadata found for ROI pattern '{roi_key}' in file {os.path.basename(fp)}")
    # --- END MODIFICATION ---

    # Now check the timepoint using the potentially found metadata
    if md.get(timepoint_col) != first_timepoint:
        continue  # Skip this file if timepoint doesn't match or metadata wasn't found

    print(f"→ Processing ROI for consensus correlation: {roi_key} (Matched metadata key: {found_key})")

    print(f"→ Computing correlation matrix for ROI {roi_key}")
   
    _, _, raw_df, roi_chs = load_and_validate_roi_data(
        file_path=fp,
        master_protein_channels=config['data']['master_protein_channels'],
        base_output_dir=config['paths']['output_dir'],                        # no on-disk ROI folders here
        metadata_cols=config['data']['metadata_cols']
    )

    if raw_df is None or roi_chs is None:
        print(f"   WARNING: Failed to load or validate data for ROI {roi_key}. Skipping.")
        continue

    roi_cofactors = calculate_asinh_cofactors_for_roi(
        roi_df=raw_df,
        channels_to_process=roi_chs,
        default_cofactor=config['data']['default_arcsinh_cofactor'],
        output_dir=os.path.join(config['paths']['output_dir'], roi_key),
        roi_string=roi_key
    )

    scaled_df, _ = apply_per_channel_arcsinh_and_scale(
        data_df=raw_df,
        channels=roi_chs,
        cofactors_map=roi_cofactors,                            # per-channel cofactors computed above
        default_cofactor=config['data']['default_arcsinh_cofactor']
    )
    
    if scaled_df.empty:
        print(f"   WARNING: Scaling failed for ROI {roi_key}. Skipping.")
        continue

    # Ensure we are only using the channels present in this specific ROI
    current_roi_channels = [ch for ch in config['data']['master_protein_channels'] if ch in scaled_df.columns]
    if not current_roi_channels:
        print(f"   WARNING: No master protein channels found in scaled data for ROI {roi_key}. Skipping.")
        continue

    # Build the Spearman correlation matrix for the channels present in this ROI.
    # Matrices are reindexed to a common channel set later, just before averaging.
    corr_mat = scaled_df[current_roi_channels].corr(method='spearman')
    correlation_matrices.append(corr_mat)
    all_channels_set.update(current_roi_channels)  # Add this ROI's channels to the global set

# ------------------------------------------------------------------
# 2) Compute the average correlation matrix and cluster it
# ------------------------------------------------------------------
if not correlation_matrices:
    print("ERROR: No correlation matrices were generated for first-timepoint ROIs.")
    reference_channel_order = []
else:
    # Convert set to a sorted list for consistent ordering
    consensus_channels = sorted(list(all_channels_set))

    # Reindex all matrices to the full set of channels, aligning them
    reindexed_matrices = [
        mat.reindex(index=consensus_channels, columns=consensus_channels)
        for mat in correlation_matrices
    ]

    # Stack matrices into a 3D numpy array and compute mean, ignoring NaNs
    # NaNs might appear if a channel was missing entirely in one ROI
    stacked_matrices = np.stack([mat.to_numpy() for mat in reindexed_matrices], axis=0)
    average_corr_matrix_np = np.nanmean(stacked_matrices, axis=0)

    # Convert back to DataFrame for easier handling with plotting/clustering functions
    average_corr_matrix_df = pd.DataFrame(average_corr_matrix_np, index=consensus_channels, columns=consensus_channels)

    # Handle potential NaNs remaining in the average matrix (e.g., if a channel was missing in *all* ROIs)
    # Option 1: Fill with 0 (uncorrelated) - check if appropriate
    average_corr_matrix_df = average_corr_matrix_df.fillna(0)

    print("\n--- Clustering Average Correlation Matrix ---")
    # Now, perform hierarchical clustering ONCE on this average matrix
    try:
        import seaborn as sns
        import scipy.cluster.hierarchy as sch
        import matplotlib.pyplot as plt  # Needed for figure context

        # Perform clustering using the average correlation matrix
        linkage = sch.linkage(
            sch.distance.pdist(average_corr_matrix_df.values),
            method=config['analysis']['clustering']['linkage'],  # Linkage method from config, e.g. 'ward'
        )

        # Get the order directly from the linkage
        dendrogram = sch.dendrogram(linkage, no_plot=True)
        reference_channel_order = [average_corr_matrix_df.columns[i] for i in dendrogram['leaves']]

        print(">>> Consensus Reference channel order computed from average correlation:")
        print(reference_channel_order)

    except ImportError:
        print("ERROR: Need seaborn and scipy to perform clustering on the average matrix.")
        print("Install them (`pip install seaborn scipy`) or adapt using a different clustering library.")
        reference_channel_order = consensus_channels  # Fallback to alphabetical
    except Exception as e:
        print(f"ERROR during final clustering of average matrix: {e}")
        reference_channel_order = consensus_channels  # Fallback

→ Processing ROI for consensus correlation: ROI_Sam1_01_2 (Matched metadata key: IMC_241218_Alun_ROI_Sam1_01_2)
→ Computing correlation matrix for ROI ROI_Sam1_01_2
   Derived roi_string (for outputs): ROI_Sam1_01_2
   Derived metadata_key (for metadata lookup): IMC_241218_Alun_ROI_Sam1_01_2
Loading data...
Loaded data with shape: (250000, 21)
Using 15 channels for analysis.

Calculating optimal Arcsinh cofactors for ROI...
--- Cofactor calculation finished in 10.91 seconds ---
   Optimal cofactors saved to: /home/noot/IMC/output_plots/ROI_Sam1_01_2/asinh_cofactors_ROI_Sam1_01_2.json

--- Applying Per-Channel Arcsinh (using optimal cofactors) and Scaling ---
   Applying arcsinh transformation with specific cofactors...
   Applying MinMaxScaler to transformed data...
--- Transformation and scaling finished in 0.06 seconds ---
→ Processing ROI for consensus correlation: ROI_Sam1_02_3 (Matched metadata key: IMC_241218_Alun_ROI_Sam1_02_3)
→ Computing correlation matrix for ROI ROI_Sam1_02_
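
The averaging step in the cell above can be sketched with NumPy alone: each ROI's matrix is placed into a NaN-filled array over the union of channels, and `np.nanmean` then averages only the entries that exist. The notebook itself uses pandas `reindex` for the alignment; `average_aligned` is a hypothetical helper and the correlation values are made up for illustration.

```python
import numpy as np


def average_aligned(matrices, labels_per_matrix):
    """Average square matrices over the union of their labels, ignoring gaps."""
    all_labels = sorted(set().union(*labels_per_matrix))
    pos = {label: i for i, label in enumerate(all_labels)}
    n = len(all_labels)
    # Start from NaN so that channels missing in an ROI do not affect the mean.
    stack = np.full((len(matrices), n, n), np.nan)
    for k, (mat, labels) in enumerate(zip(matrices, labels_per_matrix)):
        idx = [pos[label] for label in labels]
        stack[k][np.ix_(idx, idx)] = np.asarray(mat, dtype=float)
    return all_labels, np.nanmean(stack, axis=0)


# Two hypothetical ROIs; the first lacks the Ly6G channel.
roi_a = [[1.0, 0.2],
         [0.2, 1.0]]
roi_b = [[1.0, 0.6, 0.1],
         [0.6, 1.0, 0.3],
         [0.1, 0.3, 1.0]]
labels, avg = average_aligned(
    [roi_a, roi_b],
    [["CD31", "CD45"], ["CD31", "CD45", "Ly6G"]],
)
print(labels)
print(np.round(avg, 2))
```

Entries observed in only one ROI keep that ROI's value. An entry observed in no ROI would stay NaN, which is the case the `fillna(0)` call above handles.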

In [10]:
from run_roi_analysis import analyze_roi, prepare_reference_order

analysis_results = []
if config and imc_files:
    start_parallel_time = time.time()
    print(f"\n--- Starting main parallel analysis for {len(imc_files)} ROIs ({n_jobs} jobs) ---")
    # Run the parallel processing - Updated call signature
    analysis_results = Parallel(n_jobs=n_jobs, verbose=10)(
        delayed(analyze_roi)(
            i,
            file_path,
            len(imc_files),
            config,
            # Pass the necessary arguments:
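            # Look up metadata by substring, as in the consensus cell above:
            # metadata_map keys are full file stems, not bare ROI strings
            roi_metadata=next(
                (entry for key, entry in metadata_map.items()
                 if get_roi_string_from_path(file_path) in key),
                None,
            ),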
            reference_channel_order=reference_channel_order,  # Pass the calculated order (or None)
            first_timepoint_value=first_timepoint  # Pass the identified first timepoint value (or None)
            )
        for i, file_path in enumerate(imc_files)
    )

    print(f"\n--- Parallel processing finished in {time.time() - start_parallel_time:.2f} seconds ---")

    # --- Aggregate Results (Updated) ---
    successful_results = [r for r in analysis_results if isinstance(r, tuple) and len(r) == 2 and r[0] is not None]
    successful_rois = [r[0] for r in successful_results]
    failed_rois_count = len(imc_files) - len(successful_rois)

    print(f"\n--- Pipeline Summary ---")
    print(f"Successfully completed processing for {len(successful_rois)} ROIs.")
    if failed_rois_count > 0:
        print(f"Failed to process or fully complete {failed_rois_count} ROIs.")

else:
    print("\nSkipping parallel execution: Missing configuration or input files.")



--- Starting main parallel analysis for 25 ROIs (15 jobs) ---


[Parallel(n_jobs=15)]: Using backend LokyBackend with 15 concurrent workers.



Configured resolutions: [1, 0.1]
Loading and validating data...
   Derived roi_string (for outputs): ROI_D1_M1_01_9
   Derived metadata_key (for metadata lookup): IMC_241218_Alun_ROI_D1_M1_01_9
Loading data...
Loaded data with shape: (250000, 21)
Using 15 channels for analysis.
--- Loading finished in 0.43 seconds ---

Calculating optimal cofactors...

Calculating optimal Arcsinh cofactors for ROI...

Configured resolutions: [1, 0.1]
Loading and validating data...
   Derived roi_string (for outputs): ROI_D3_M1_03_17
   Derived metadata_key (for metadata lookup): IMC_241218_Alun_ROI_D3_M1_03_17
Loading data...
Loaded data with shape: (250000, 21)
Using 15 channels for analysis.
--- Loading finished in 0.25 seconds ---

Calculating optimal cofactors...

Calculating optimal Arcsinh cofactors for ROI...

Configured resolutions: [1, 0.1]
Loading and validating data...
   Derived roi_string (for outputs): ROI_D3_M1_01_15
   Derived metadata_key (for metadata lookup): IMC_241218_Alun_ROI_D3_

   --- Community spatial grid saved to: community_spatial_expression_grid_ROI_D1_M2_03_14_res_1.png

   Running UMAP and plotting communities...
   --- Community spatial grid saved to: community_spatial_expression_grid_ROI_D3_M2_02_19_res_1.png

   Running UMAP and plotting communities...
   --- Community spatial grid saved to: community_spatial_expression_grid_ROI_D1_M1_02_10_res_1.png

   Running UMAP and plotting communities...
   --- Community spatial grid saved to: community_spatial_expression_grid_ROI_D3_M2_03_20_res_1.png

   Running UMAP and plotting communities...
   --- Community spatial grid saved to: community_spatial_expression_grid_ROI_D3_M1_03_17_res_1.png

   Running UMAP and plotting communities...
   --- Community spatial grid saved to: community_spatial_expression_grid_ROI_D1_M1_03_11_res_1.png

   Running UMAP and plotting communities...
   --- Community spatial grid saved to: community_spatial_expression_grid_ROI_D7_M1_01_21_res_1.png

   Running UMAP and plotting communities...
      UMAP coordinates saved to: umap_coords_diff_profiles_ROI_D7_M1_03_23_res_1.csv
   Generating UMAP scatter plot: umap_community_scatter_protein_markers_diff_profiles_ROI_D7_M1_03_23_res_1.svg
   Plotting 11112 communities primarily identified by protein markers.
   UMAP community scatter plot saved to: /home/noot/IMC/output_plots/ROI_D7_M1_03_23/resolution_1/umap_community_scatter_protein_markers_diff_profiles_ROI_D7_M1_03_23_res_1.svg

   Generating combined scaled-pixel/avg-comm co-expression matrix...
   Saving combined co-expression matrix to: coexpression_matrix_scaled_vs_avg_ROI_D7_M1_03_23_res_1.svg
   --- Community spatial grid saved to: community_spatial_expression_grid_ROI_D1_M1_01_9_res_1.png

   Running UMAP and plotting communities...
   --- Community spatial grid saved to: community_spatial_expression_grid_ROI_D1_M2_02_13_res_1.png

   Running UMAP and plotting communities...
   --- Community spatial grid saved to: community_spatial_expression_grid_ROI_D1_M2_01_12_res_1.png

   Running UMAP and plotting communities...
   --- Community spatial grid saved to: community_spatial_expression_grid_ROI_D3_M1_02_16_res_1.png

   Running UMAP and plotting communities...
   --- Community spatial grid saved to: community_spatial_expression_grid_ROI_D7_M1_02_22_res_1.png

   Running UMAP and plotting communities...
      UMAP coordinates saved to: umap_coords_diff_profiles_ROI_D1_M2_03_14_res_1.csv
   Generating UMAP scatter plot: umap_community_scatter_protein_markers_diff_profiles_ROI_D1_M2_03_14_res_1.svg
   Plotting 9531 communities primarily identified by protein markers.
   --- Community spatial grid saved to: community_spatial_expression_grid_ROI_D3_M2_01_18_res_1.png

   Running UMAP and plotting communities...
   UMAP community scatter plot saved to: /home/noot/IMC/output_plots/ROI_D1_M2_03_14/resolution_1/umap_community_scatter_protein_markers_diff_profiles_ROI_D1_M2_03_14_res_1.svg

   Generating combined scaled-pixel/avg-comm co-expression matrix...
   Saving combined co-expression matrix to: coexpression_matrix_scaled_vs_avg_ROI_D1_M2_03_14_res_1.svg
   --- Community spatial grid saved to: community_spatial_expression_grid_ROI_D3_M1_01_15_res_1.png

   Running UMAP and plotting communities...
      UMAP coordinates saved to: umap_coords_diff_profiles_ROI_D3_M2_02_19_res_1.csv
   Generating UMAP scatter plot: umap_community_scatter_protein_markers_diff_profiles_ROI_D3_M2_02_19_res_1.svg
   Plotting 11140 communities primarily identified by protein markers.
   UMAP community scatter plot saved to: /home/noot/IMC/output_plots/ROI_D3_M2_02_19/resolution_1/umap_community_scatter_protein_markers_diff_profiles_ROI_D3_M2_02_19_res_1.svg

   Generating combined scaled-pixel/avg-comm co-expression matrix...
   Saving combined co-expression matrix to: coexpression_matrix_scaled_vs_avg_ROI_D3_M2_02_19_res_1.svg
      UMAP coordinates saved to: umap_coords_diff_profiles_ROI_D1_M1_02_10_res_1.csv
   Generating UMAP scatter plot: umap_community_scatter_protein_markers_diff_profiles_ROI_D1_M1_02_10_res_1.svg
   Plotting 9236 communities primarily identified by protein markers.
   UMAP community scatter plot saved to: /home/noot/IMC/output_plots/ROI_D1_M1_02_10/resolution_1/umap_community_s

TerminatedWorkerError: A worker process managed by the executor was unexpectedly terminated. This could be caused by a segmentation fault while calling the function or by an excessive memory usage causing the Operating System to kill the worker.

The exit codes of the workers are {SIGKILL(-9)}

## 6. Summarize Results
Count the number of successfully processed ROIs.

In [8]:
if analysis_results:
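    # Same success criterion as the aggregation in section 5: a 2-tuple whose first element is set
    successful_rois = [r[0] for r in analysis_results if isinstance(r, tuple) and len(r) == 2 and r[0] is not None]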
    failed_rois_count = len(analysis_results) - len(successful_rois)

    print(f"\n--- Pipeline Summary ---")
    print(f"Total ROIs processed: {len(analysis_results)}")
    print(f"Successfully completed: {len(successful_rois)}")
    if failed_rois_count > 0:
        print(f"Failed or partially failed: {failed_rois_count} (Check logs above for details).")
else:
    print("\nNo analysis was performed.")



--- Pipeline Summary ---
Total ROIs processed: 25
Successfully completed: 0
Failed or partially failed: 25 (Check logs above for details).
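
The success criterion can be kept in one place so that every summary reports the same counts. The sketch assumes, based on the aggregation code in section 5, that `analyze_roi` returns a 2-tuple whose first element is the ROI string on success. `summarize_results` and the sample results are hypothetical.

```python
from typing import Any, List, Sequence, Tuple


def summarize_results(results: Sequence[Any], n_inputs: int) -> Tuple[List[Any], int]:
    """Split worker results into successful ROI identifiers and a failure count."""
    successful = [
        r[0] for r in results
        if isinstance(r, tuple) and len(r) == 2 and r[0] is not None
    ]
    # Inputs that never produced a result (e.g. a killed worker) count as failures.
    return successful, n_inputs - len(successful)


# Illustrative results: two successes, one None, one tuple flagged as failed.
results = [("ROI_A", {}), None, (None, "error"), ("ROI_B", {})]
ok, failed = summarize_results(results, n_inputs=5)
print(f"Successfully completed: {len(ok)}; failed or missing: {failed}")
```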


## 7. Next Steps

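Once per-ROI processing completes for every ROI, the necessary outputs (community profiles, pixel results) are saved to the output directory structure. In the run recorded above, the workers were terminated with `SIGKILL`, which the error message attributes to a segmentation fault or excessive memory usage. If memory is the cause, lowering `processing: parallel_jobs` is one way to reduce peak usage before rerunning.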

Proceed to the **Experiment-Level Analysis Notebook** (`run_experiment_analysis.ipynb`) to aggregate these results and perform comparative analyses across conditions/timepoints.