# IMC ROI Analysis Pipeline - Stage 1: Per-ROI Processing

**Goal:** This notebook processes individual Imaging Mass Cytometry (IMC) region-of-interest (ROI) files (`.txt` format). For each ROI, it performs:
1.  Data loading and validation.
2.  Optimal Arcsinh cofactor calculation.
3.  Per-channel Arcsinh transformation and scaling.
4.  Generation of resolution-independent visualizations (Pixel correlation clustermap, Raw vs. Scaled comparison).
5.  Iteration over the Leiden resolutions listed in `config.yaml` (`analysis: clustering: resolution_params`):
    *   Spatial Leiden clustering on pixel data.
    *   Calculation of community profiles (average scaled expression per community).
    *   Differential expression analysis between adjacent communities (optional).
    *   Generation of resolution-dependent visualizations (Community correlation map, UMAP, Co-expression matrix).
6.  Saves processed data (scaled pixel results, community profiles, etc.) and visualizations into ROI-specific output directories.
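Steps 2 and 3 can be illustrated with a minimal sketch. It assumes the conventional cytometry form `asinh(x / c)` with a per-channel cofactor `c` (5.0 is the `default_arcsinh_cofactor` in the configuration output further down). The scaling performed by `apply_per_channel_arcsinh_and_scale` is not shown in this notebook, so the min-max rescaling to [0, 1] is an assumption, and `arcsinh_transform` / `min_max_scale` are hypothetical names.

```python
import numpy as np


def arcsinh_transform(raw: np.ndarray, cofactor: float) -> np.ndarray:
    """Arcsinh transform commonly applied to mass cytometry counts: asinh(x / cofactor)."""
    return np.arcsinh(raw / cofactor)


def min_max_scale(values: np.ndarray) -> np.ndarray:
    """Assumed scaling step: rescale to [0, 1]; a constant channel maps to all zeros."""
    lo, hi = values.min(), values.max()
    if hi == lo:
        return np.zeros_like(values, dtype=float)
    return (values - lo) / (hi - lo)


# One channel's raw counts, transformed with the default cofactor and then scaled.
counts = np.array([0.0, 5.0, 50.0, 500.0])
transformed = arcsinh_transform(counts, cofactor=5.0)
scaled = min_max_scale(transformed)
```

A smaller cofactor stretches the low-count range more strongly; at `x = c` the transform equals `asinh(1)`, roughly 0.88.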

**Methodology:** The notebook calls functions from the `src.roi_pipeline` modules and uses `joblib` to process ROIs in parallel.

**Input:**
*   Raw IMC `.txt` files located in the directory specified in `config.yaml` (`paths: data_dir`).
*   Configuration settings from `config.yaml`.

**Output:**
*   A structured output directory (specified in `config.yaml`, `paths: output_dir`) containing subdirectories for each processed ROI.
*   Within each ROI directory:
    *   Cofactor information (`cofactors_*.json`).
    *   Resolution-independent plots (`pixel_channel_correlation_*.svg`, `spatial_raw_vs_scaled_matrix_*.svg`).
    *   Subdirectories for each processed resolution (e.g., `resolution_0_3/`).
        *   Community profiles (`community_profiles_scaled_*.csv`).
        *   Differential expression results (optional) (`community_diff_profiles_*.csv`, `community_primary_channels_*.csv`).
        *   UMAP coordinates (optional) (`umap_coords_*.csv`).
        *   Final pixel results with community assignments (`pixel_analysis_results_final_*.csv`).
        *   Resolution-dependent plots (`community_channel_correlation_*.svg`, `umap_community_scatter_*.svg`, `coexpression_matrix_scaled_vs_avg_*.svg`).
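The `resolution_*` directory names come from the formatting expression inside `analyze_roi`. The sketch below reproduces it as a standalone helper (`resolution_dirname` is a hypothetical name). Because values are rounded to three decimals before trailing zeros are stripped, resolutions that differ only beyond the third decimal share a directory, and values below 0.0005 all become `resolution_0`.

```python
def resolution_dirname(resolution) -> str:
    """Mirror the notebook's naming, e.g. 0.3 -> 'resolution_0_3', 0.001 -> 'resolution_0_001'."""
    if isinstance(resolution, float):
        # Round to 3 decimals, drop trailing zeros and a dangling '.', then swap '.' for '_'.
        resolution_str = f"{resolution:.3f}".rstrip("0").rstrip(".").replace(".", "_")
    else:
        resolution_str = str(resolution)
    return f"resolution_{resolution_str}"
```

Note that the plot filenames written by `_generate_resolution_dependent_visualizations` embed the raw value (`_res_0.3`), whereas the CSV filenames written in `analyze_roi` use this formatted string (`_res_0_3`).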

**Next Steps:** The outputs generated by this notebook (specifically the `community_profiles_scaled_*.csv` and `pixel_analysis_results_final_*.csv` files) serve as the primary inputs for the **Experiment-Level Analysis Notebook**.

In [1]:
# Imports
import yaml
import pandas as pd
import numpy as np
import os
import glob
import time
import sys
import multiprocessing
import traceback
import gc
from typing import List, Tuple, Optional, Dict, Any
from joblib import Parallel, delayed

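# --- Import Pipeline Modules ---
# Encapsulated logic resides in these modules.
# Default the UMAP flag so later cells still run if the imports below fail.
umap_available = False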
try:
    from src.roi_pipeline.imc_data_utils import (
        load_and_validate_roi_data,
        calculate_optimal_cofactors_for_roi,
        apply_per_channel_arcsinh_and_scale,
    )
    from src.roi_pipeline.pixel_analysis_core import (
        run_spatial_leiden,
        calculate_and_save_profiles,
        calculate_differential_expression # Keep import even if DiffEx is optional via config
    )
    from src.roi_pipeline.pixel_visualization import (
        plot_correlation_clustermap,
        plot_umap_scatter, # Requires umap-learn
        plot_coexpression_matrix,
        plot_raw_vs_scaled_spatial_comparison
    )
    # Attempt to import UMAP, set flag
    try:
        import umap
        umap_available = True
    except ImportError:
        print("WARNING: package 'umap-learn' not found. UMAP visualization will be skipped.")
        umap_available = False

    print("Successfully imported pipeline modules.")
except ImportError as e:
    print("ERROR: Failed to import pipeline modules. Ensure 'src' is in the Python path.")
    print(f"Details: {e}")
    # Optionally raise error or exit if imports fail
    # raise e


Successfully imported pipeline modules.


In [9]:
os.getcwd()

'/home/noot/IMC'

## 1. Load Configuration

Load settings from the central `config.yaml` file.
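`load_config` validates only a subset of the keys that later cells index directly with `[]`. A missing one (for example `processing: visualization`, read at the top of the resolution-independent plotting helper) raises `KeyError` partway through a run instead of at load time. A minimal sketch of a fuller check follows; `REQUIRED_KEYS` and `find_missing_keys` are hypothetical names, and the key list was collected by reading the code in this notebook. The `analysis: umap` keys are needed only when differential expression is enabled, so they are left out.

```python
from typing import Any, Dict, List

# Hypothetical list of dotted key paths that this notebook indexes with [].
REQUIRED_KEYS = [
    "paths.data_dir",
    "paths.output_dir",
    "data.master_protein_channels",
    "data.default_arcsinh_cofactor",
    "data.metadata_cols",
    "analysis.clustering.n_neighbors",
    "analysis.clustering.seed",
    "processing.plot_dpi",
    "processing.visualization",
]


def find_missing_keys(config: Dict[str, Any], dotted_keys: List[str]) -> List[str]:
    """Return every dotted key path that cannot be resolved in the nested config dict."""
    missing = []
    for dotted in dotted_keys:
        node: Any = config
        for part in dotted.split("."):
            if not isinstance(node, dict) or part not in node:
                missing.append(dotted)
                break
            node = node[part]
    return missing
```

After `config = load_config(CONFIG_PATH)`, calling `find_missing_keys(config, REQUIRED_KEYS)` returns an empty list when every listed key is present.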

In [2]:
CONFIG_PATH = "config.yaml" # Or allow user input

def load_config(config_path: str) -> Optional[Dict]:
    """Loads the pipeline configuration from a YAML file."""
    try:
        with open(config_path, 'r') as f:
            config = yaml.safe_load(f)
        print(f"Configuration loaded successfully from: {config_path}")
        # Basic validation (can be expanded)
        if not isinstance(config, dict) or not all(k in config for k in ['paths', 'data', 'analysis', 'processing']):
            print(f"ERROR: Config file {config_path} is missing required top-level keys (paths, data, analysis, processing) or is not a valid dictionary.")
            return None
        # Validate essential sub-keys
        if not config.get('paths',{}).get('data_dir') or not config.get('paths',{}).get('output_dir'):
             print("ERROR: Config missing paths -> data_dir or paths -> output_dir")
             return None
        if not config.get('data', {}).get('master_protein_channels'):
             print("ERROR: Config missing data -> master_protein_channels")
             return None
        print("Config basic validation passed.")
        return config
    except FileNotFoundError:
        print(f"ERROR: Configuration file not found at {config_path}")
        return None
    except yaml.YAMLError as e:
        print(f"ERROR: Failed to parse configuration file {config_path}: {e}")
        return None
    except Exception as e:
        print(f"ERROR: An unexpected error occurred while loading configuration: {e}")
        return None

# Load the config
config = load_config(CONFIG_PATH)

# Display some key config values (optional)
if config:
    print("\nKey Configuration Parameters:")
    print(f"  Data Directory: {config.get('paths', {}).get('data_dir')}")
    print(f"  Output Directory: {config.get('paths', {}).get('output_dir')}")
    print(f"  Metadata File: {config.get('paths', {}).get('metadata_file')}")
    print(f"  Master Protein Channels: {config.get('data', {}).get('master_protein_channels')}")
    print(f"  Default Cofactor: {config.get('data', {}).get('default_arcsinh_cofactor')}")
    print(f"  Differential Expression: {config.get('analysis', {}).get('differential_expression', {}).get('run_differential_expression', False)}")
    print(f"  Non-protein Markers for UMAP: {config.get('analysis', {}).get('differential_expression', {}).get('non_protein_markers_for_umap', [])}")
    print(f"  UMAP Parameters: {config.get('analysis', {}).get('umap', {}).get('n_neighbors', 'Not Set')} neighbors, {config.get('analysis', {}).get('umap', {}).get('min_dist', 'Not Set')} min dist, {config.get('analysis', {}).get('umap', {}).get('n_components', 'Not Set')} components")
    print(f"  Clustering Seed: {config.get('analysis', {}).get('clustering', {}).get('seed', 'Not Set')}")
    print(f"  Spatial Clustering Parameters: {config.get('analysis', {}).get('clustering', {}).get('n_neighbors', 'Not Set')} neighbors, {config.get('analysis', {}).get('clustering', {}).get('resolution_params', 'Not Set')} resolution")    

    # Add more relevant parameters as needed
else:
    print("\nStopping notebook execution due to configuration loading error.")
    # Consider raising an error or using sys.exit() if running non-interactively
    # raise ValueError("Failed to load configuration.")

Configuration loaded successfully from: config.yaml
Config basic validation passed.

Key Configuration Parameters:
  Data Directory: /Users/noot/Documents/IMC/data/241218_IMC_Alun/
  Output Directory: /Users/noot/Documents/IMC/output_plots/
  Metadata File: /Users/noot/Documents/IMC/data/Data_annotations_Karen/Metadata-Table 1.csv
  Master Protein Channels: ['CD45(Y89Di)', 'Ly6G(Pr141Di)', 'CD11b(Nd143Di)', 'CD140a(Nd148Di)', 'CD140b(Eu151Di)', 'CD31(Sm154Di)', 'CD34(Er166Di)', 'CD206(Tm169Di)', 'CD44(Yb171Di)']
  Default Cofactor: 5.0
  Differential Expression: False
  Non-protein Markers for UMAP: ['80ArAr(ArAr80Di)', '130Ba(Ba130Di)', '131Xe(Xe131Di)', '190BCKG(BCKG190Di)', 'DNA1(Ir191Di)', 'DNA2(Ir193Di)']
  UMAP Parameters: 15 neighbors, 0.1 min dist, 8 components
  Clustering Seed: 42
  Spatial Clustering Parameters: 65 neighbors, [0.3, 0.001] resolution


## 2. Define the Core ROI Processing Function (`analyze_roi`)

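This section defines two plotting helpers (`_generate_resolution_independent_visualizations` and `_generate_resolution_dependent_visualizations`) and the orchestrating function `analyze_roi`, which runs the entire analysis workflow for a *single* ROI file using the functions imported from the pipeline modules.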

In [10]:
def _generate_resolution_independent_visualizations(roi_raw_data: pd.DataFrame,
                                                 scaled_pixel_expression: pd.DataFrame,
                                                 roi_channels: List[str],
                                                 roi_cofactors: Dict[str, float],
                                                 roi_output_dir: str, # Main ROI output dir
                                                 roi_string: str,
                                                 config: Dict) -> Optional[List[str]]: # Return the channel order
    """Generates plots that do not depend on Leiden resolution. Returns channel order from pixel clustermap."""
    print("\nGenerating resolution-independent visualizations...")
    start_time_viz = time.time()
    cfg_processing = config['processing']
    cfg_viz = cfg_processing['visualization']
    ordered_channels_from_pixel_corr = None # Initialize

    # --- Pixel-Level Correlation Clustermap ---
    print("   Generating pixel-level correlation clustermap...")
    if not scaled_pixel_expression.empty and not scaled_pixel_expression.isnull().values.any():
        try:
            pixel_correlation_matrix = scaled_pixel_expression.corr(method='spearman')
            pixel_corr_heatmap_path = os.path.join(roi_output_dir, f"pixel_channel_correlation_heatmap_spearman_{roi_string}.svg") # Changed extension
            # Capture the returned order
            ordered_channels_from_pixel_corr = plot_correlation_clustermap(
                 correlation_matrix=pixel_correlation_matrix,
                 channels=roi_channels,
                 title=f'Pixel Channel Correlation (Spearman, Asinh Scaled) - {roi_string}',
                 output_path=pixel_corr_heatmap_path,
                 plot_dpi=cfg_processing['plot_dpi']
            )
        except Exception as e:
            print(f"   WARNING: Failed to generate pixel correlation map: {e}")
    else:
         print("   Skipping pixel correlation analysis: Scaled pixel data is empty or contains NaNs.")
    # Fallback if order couldn't be determined
    if ordered_channels_from_pixel_corr is None:
        ordered_channels_from_pixel_corr = roi_channels # Use original order
        print("   Warning: Could not determine channel order from pixel correlation, using original order.")

    # --- Raw vs Scaled Spatial Expression Matrix ---
    print("   Generating Raw vs Scaled Spatial Expression Matrix...")
    if not roi_raw_data.empty and not scaled_pixel_expression.empty:
        try:
            raw_vs_scaled_plot_path = os.path.join(roi_output_dir, f"spatial_raw_vs_scaled_matrix_{roi_string}.svg")
            # Using the ordered channels if available
            plot_raw_vs_scaled_spatial_comparison(
                roi_raw_data=roi_raw_data,
                scaled_pixel_expression=scaled_pixel_expression,
                roi_channels=ordered_channels_from_pixel_corr, # Use ordered channels
                config=config,
                output_path=raw_vs_scaled_plot_path,
                roi_string=roi_string
            )
        except Exception as e:
            print(f"   WARNING: Failed to generate Raw vs Scaled spatial matrix: {e}")
            traceback.print_exc() # Print stack trace for debugging
    else:
        print("   Skipping Raw vs Scaled spatial matrix: Raw or Scaled data is empty.")

    print(f"--- Resolution-independent visualizations finished in {time.time() - start_time_viz:.2f} seconds ---")
    return ordered_channels_from_pixel_corr # Return the order for downstream use

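The pixel clustermap uses Spearman correlation, which depends only on the ranks of each channel's values, so any strictly increasing per-channel transform leaves it unchanged. Assuming the scaling step is also increasing (as min-max rescaling would be), the pixel correlation matrix, and therefore the channel order returned by the clustermap, does not depend on the chosen cofactors. A quick check on synthetic Poisson counts (not real IMC data):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Synthetic count-like data for three channels; Poisson draws include tied values.
raw = pd.DataFrame(
    rng.poisson(lam=[2.0, 10.0, 40.0], size=(200, 3)).astype(float),
    columns=["A", "B", "C"],
)

# Arcsinh with cofactor 5, followed by an assumed per-channel min-max rescaling.
transformed = np.arcsinh(raw / 5.0)
scaled = (transformed - transformed.min()) / (transformed.max() - transformed.min())

# Ranks (including ties) are preserved, so the Spearman matrices coincide.
spearman_raw = raw.corr(method="spearman")
spearman_scaled = scaled.corr(method="spearman")
```

A Pearson correlation, by contrast, would generally change under the same transform.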
In [11]:
def _generate_resolution_dependent_visualizations(pixel_results_df: pd.DataFrame, # Resolution specific df, contains coords, community, scaled values, mapped avg values
                                               scaled_pixel_expression: pd.DataFrame, # Original scaled pixel data
                                               scaled_community_profiles: pd.DataFrame,
                                               diff_expr_profiles: Optional[pd.DataFrame],
                                               distinguishing_channel_map: Optional[pd.Series],
                                               roi_channels: List[str], # Original channel list
                                               ordered_channels: List[str], # Channel list ordered by pixel corr
                                               roi_cofactors: Dict[str, float],
                                               resolution_output_dir: str, # Resolution specific dir
                                               roi_string: str,
                                               resolution_param: float,
                                               config: Dict):
    """Generates plots that depend on the specific Leiden resolution."""
    print(f"\nGenerating resolution-dependent visualizations (Resolution: {resolution_param})...")
    start_time_viz = time.time()
    cfg_processing = config['processing']
    cfg_analysis = config['analysis']
    cfg_viz = cfg_processing['visualization']
    res_suffix = f"_res_{resolution_param}"

    # --- Community-Level Correlation Clustermap (Use original roi_channels) ---
    print("   Generating community-level correlation clustermap...")
    if not scaled_community_profiles.empty:
        try:
            community_correlation_matrix = scaled_community_profiles.corr(method='spearman')
            comm_corr_heatmap_path = os.path.join(resolution_output_dir, f"community_channel_correlation_heatmap_spearman_{roi_string}{res_suffix}.svg") # Changed extension
            # We don't necessarily want this ordered by pixel correlation, so use original roi_channels
            plot_correlation_clustermap(
                 correlation_matrix=community_correlation_matrix,
                 channels=roi_channels, # Use original list here
                 title=f'Community Corr (Spearman, Avg. Scaled) - {roi_string} (Res: {resolution_param})',
                 output_path=comm_corr_heatmap_path,
                 plot_dpi=cfg_processing['plot_dpi']
            ) # We don't need the order returned here
        except Exception as e:
            print(f"   WARNING: Failed to generate community correlation map: {e}")
    else:
        print("   Skipping community correlation analysis: Scaled community profiles empty.")

    # --- UMAP on Differential Profiles & Plot (Uses roi_channels for filtering) ---
    umap_coords = None # Define umap_coords before the block
    if umap_available and diff_expr_profiles is not None and not diff_expr_profiles.empty:
        print("\n   Running UMAP and plotting communities...")
        try:
            non_protein_markers = cfg_analysis['differential_expression'].get('non_protein_markers_for_umap', [])
            protein_marker_channels_for_umap = [
                ch for ch in diff_expr_profiles.columns
                if ch in roi_channels and ch not in non_protein_markers # Filter based on original channels
            ]
            if not protein_marker_channels_for_umap:
                print("      Skipping UMAP: No protein marker channels found.")
            else:
                diff_data_for_umap = diff_expr_profiles[protein_marker_channels_for_umap].copy()
                communities_in_order = diff_expr_profiles.index.tolist()
                if diff_data_for_umap.isnull().values.any() or np.isinf(diff_data_for_umap.values).any():
                     print("      Warning: NaN/Inf values found. Replacing with 0.")
                     diff_data_for_umap = diff_data_for_umap.fillna(0).replace([np.inf, -np.inf], 0)

                n_communities = len(diff_data_for_umap)
                umap_n_neighbors = min(cfg_analysis['umap']['n_neighbors'], n_communities - 1) if n_communities > 1 else 1
                current_umap_n_components = max(2, cfg_analysis['umap']['n_components'])

                if n_communities > umap_n_neighbors and n_communities >= current_umap_n_components:
                     try:
                         umap_reducer = umap.UMAP(
                             n_neighbors=umap_n_neighbors,
                             min_dist=cfg_analysis['umap']['min_dist'],
                             n_components=current_umap_n_components,
                             metric=cfg_analysis['umap'].get('metric', 'euclidean'), # umap-learn's default metric
                             random_state=cfg_analysis['clustering']['seed']
                         )
                         embedding = umap_reducer.fit_transform(diff_data_for_umap.values)
                         umap_component_names = [f'UMAP{i+1}' for i in range(current_umap_n_components)]
                         umap_coords = pd.DataFrame(embedding, index=communities_in_order, columns=umap_component_names)
                         umap_coords_path = os.path.join(resolution_output_dir, f"umap_coords_diff_profiles_{roi_string}{res_suffix}.csv") # Keep CSV for coords
                         umap_coords.to_csv(umap_coords_path)
                         print(f"      UMAP coordinates saved to: {os.path.basename(umap_coords_path)}")

                         if distinguishing_channel_map is not None and not distinguishing_channel_map.empty:
                             umap_scatter_path = os.path.join(resolution_output_dir, f"umap_community_scatter_protein_markers_diff_profiles_{roi_string}{res_suffix}.svg") # Changed extension
                             plot_umap_scatter(
                                 umap_coords=umap_coords,
                                 community_top_channel_map=distinguishing_channel_map,
                                 protein_marker_channels=protein_marker_channels_for_umap,
                                 roi_string=f"{roi_string} (Res: {resolution_param})",
                                 output_path=umap_scatter_path,
                                 plot_dpi=cfg_processing['plot_dpi']
                             )
                         else:
                             print("      Skipping UMAP scatter plot: Missing distinguishing channel map.")

                     except Exception as umap_err:
                          print(f"      ERROR during UMAP embedding or plotting: {umap_err}")
                          umap_coords = None
                else:
                     print(f"      Skipping UMAP embedding: Not enough communities ({n_communities}) vs neighbors/components.")
        except Exception as e:
            print(f"   WARNING: Failed during UMAP step: {e}")
    elif not umap_available:
        print("\n   Skipping UMAP visualization: umap-learn package not installed.")
    else:
        print("\n   Skipping UMAP visualization: Differential profiles empty or not calculated.")

    # --- Combined Co-expression Matrix Plot (Use ordered_channels) ---
    print("\n   Generating combined scaled-pixel/avg-comm co-expression matrix...")
    if not scaled_community_profiles.empty:
        avg_value_cols_map = {}
        try:
            for channel in roi_channels:
                avg_col_name = f'{channel}_asinh_scaled_avg'
                pixel_results_df[avg_col_name] = pixel_results_df['community'].map(scaled_community_profiles[channel]).fillna(0)
                avg_value_cols_map[channel] = avg_col_name

            coexp_matrix_path = os.path.join(resolution_output_dir, f"coexpression_matrix_scaled_vs_avg_{roi_string}{res_suffix}.svg") # Changed extension
            plot_coexpression_matrix(
                scaled_pixel_expression=scaled_pixel_expression,
                pixel_results_df_with_avg=pixel_results_df,
                ordered_channels=ordered_channels, # Pass the ordered list here
                roi_string=f"{roi_string} (Res: {resolution_param})",
                config=config,
                output_path=coexp_matrix_path
            )
        except Exception as e:
             print(f"   WARNING: Failed to generate combined co-expression matrix: {e}")
    else:
        print("   Skipping combined co-expression matrix: Scaled community profiles not available.")

    print(f"--- Resolution-dependent visualizations finished in {time.time() - start_time_viz:.2f} seconds ---")

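The UMAP guard in `_generate_resolution_dependent_visualizations` decides whether an embedding is attempted. Restated as a pure function (`umap_sizing` is a hypothetical name), it shows that with the configured values (15 neighbors, 8 components), any resolution that yields fewer than 8 communities skips UMAP. UMAP also runs only when differential expression is enabled, which the configuration output above reports as `False`.

```python
from typing import Optional, Tuple


def umap_sizing(n_communities: int, cfg_n_neighbors: int,
                cfg_n_components: int) -> Optional[Tuple[int, int]]:
    """Return (n_neighbors, n_components) as the notebook computes them, or None if UMAP is skipped."""
    # Cap neighbors at n_communities - 1 (or 1 for a single community).
    n_neighbors = min(cfg_n_neighbors, n_communities - 1) if n_communities > 1 else 1
    # The notebook always requests at least 2 components.
    n_components = max(2, cfg_n_components)
    if n_communities > n_neighbors and n_communities >= n_components:
        return n_neighbors, n_components
    return None
```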

In [12]:
def analyze_roi(file_idx: int, file_path: str, total_files: int, config: Dict, umap_available_flag: bool):
    """Orchestrates the analysis pipeline for a single ROI file."""
    print(f"\n================ Analyzing ROI {file_idx+1}/{total_files}: {os.path.basename(file_path)} ================")
    start_roi_time = time.time()

    # --- 1. Load & Validate ---
    roi_string, roi_output_dir, roi_raw_data, roi_channels = load_and_validate_roi_data(
        file_path=file_path,
        master_protein_channels=config['data']['master_protein_channels'],
        base_output_dir=config['paths']['output_dir'],
        metadata_cols=config['data']['metadata_cols']
    )
    if roi_raw_data is None: return None # Skip file on load error

    # --- 2. Calculate Cofactors ---
    roi_cofactors = calculate_optimal_cofactors_for_roi(
        roi_df=roi_raw_data,
        channels_to_process=roi_channels,
        default_cofactor=config['data']['default_arcsinh_cofactor'],
        output_dir=roi_output_dir,
        roi_string=roi_string
    )

    # --- 3. Preprocess Data ---
    scaled_pixel_expression, used_cofactors = apply_per_channel_arcsinh_and_scale(
        data_df=roi_raw_data,
        channels=roi_channels,
        cofactors_map=roi_cofactors,
        default_cofactor=config['data']['default_arcsinh_cofactor']
    )
    if scaled_pixel_expression is None: return None # Skip file on preprocess error

    # --- 4. Resolution-Independent Visualizations ---
    # This also determines channel order based on pixel correlation
    ordered_channels = _generate_resolution_independent_visualizations(
        roi_raw_data=roi_raw_data,
        scaled_pixel_expression=scaled_pixel_expression,
        roi_channels=roi_channels,
        roi_cofactors=roi_cofactors, # Pass used_cofactors if preferred
        roi_output_dir=roi_output_dir,
        roi_string=roi_string,
        config=config
    )
    if ordered_channels is None: ordered_channels = roi_channels # Fallback

    # --- 5. Loop Through Resolutions ---
    resolution_params = config['analysis']['clustering'].get('resolution_params', [0.5])
    success_flag = False
    for resolution in resolution_params:
        pixel_community_df = None; pixel_graph = None; community_partition = None
        current_pixel_results_df = None; scaled_community_profiles = None
        diff_expr_profiles = None; primary_channel_map = None
        try:
            print(f"\n===== Processing Resolution: {resolution} for ROI: {roi_string} =====")
            res_start_time = time.time()
            resolution_str = f"{resolution:.3f}".rstrip('0').rstrip('.').replace('.', '_') if isinstance(resolution, float) else str(resolution)
            resolution_output_dir = os.path.join(roi_output_dir, f"resolution_{resolution_str}")
            os.makedirs(resolution_output_dir, exist_ok=True)

            # 5a. Cluster Pixels
            pixel_coordinates = roi_raw_data[['X', 'Y']].copy().loc[scaled_pixel_expression.index]
            pixel_community_df, pixel_graph, community_partition, _ = run_spatial_leiden(
                 analysis_df=pixel_coordinates,
                 protein_channels=roi_channels,
                 scaled_expression_data_for_weights=scaled_pixel_expression.values,
                 n_neighbors=config['analysis']['clustering']['n_neighbors'],
                 resolution_param=resolution,
                 seed=config['analysis']['clustering']['seed'],
                 verbose=True
            )
            if pixel_community_df is None: continue # Skip res if clustering fails

            # Create dataframe for this resolution's results
            current_pixel_results_df = roi_raw_data[['X', 'Y']].join(scaled_pixel_expression).join(pixel_community_df[['community']])
            # Add raw values back if needed for specific analyses (optional)
            # for ch in roi_channels: current_pixel_results_df[ch] = roi_raw_data.loc[current_pixel_results_df.index, ch]

            # 5b. Analyze Communities
            scaled_community_profiles = calculate_and_save_profiles(
                 results_df=current_pixel_results_df,
                 valid_channels=roi_channels,
                 roi_output_dir=resolution_output_dir,
                 roi_string=f"{roi_string}_res_{resolution_str}"
            )
            if scaled_community_profiles is None: continue # Skip res if profiles fail

            # Optional DiffEx
            if config['analysis'].get('differential_expression', {}).get('run_differential_expression', False):
                diff_expr_profiles, primary_channel_map = calculate_differential_expression(
                    results_df=current_pixel_results_df,
                    community_profiles=scaled_community_profiles,
                    graph=pixel_graph,
                    valid_channels=roi_channels
                )
                # Save DiffEx results
                if diff_expr_profiles is not None:
                    diff_expr_profiles.to_csv(os.path.join(resolution_output_dir, f"community_diff_profiles_{roi_string}_res_{resolution_str}.csv"))
                if primary_channel_map is not None:
                     primary_channel_map.to_csv(os.path.join(resolution_output_dir, f"community_primary_channels_{roi_string}_res_{resolution_str}.csv"), header=True)

            # 5c. Resolution-Dependent Visualizations
            # Keyword names must match the helper's signature: it takes `distinguishing_channel_map`
            # and reads the notebook-level `umap_available` flag itself.
            _generate_resolution_dependent_visualizations(
                 pixel_results_df=current_pixel_results_df,
                 scaled_pixel_expression=scaled_pixel_expression,
                 scaled_community_profiles=scaled_community_profiles,
                 diff_expr_profiles=diff_expr_profiles,
                 distinguishing_channel_map=primary_channel_map,
                 roi_channels=roi_channels,
                 ordered_channels=ordered_channels,
                 roi_cofactors=roi_cofactors, # Pass used_cofactors if preferred
                 resolution_output_dir=resolution_output_dir,
                 roi_string=roi_string,
                 resolution_param=resolution,
                 config=config
            )

            # 5d. Save Final Pixel Results for this resolution
            final_results_save_path = os.path.join(resolution_output_dir, f"pixel_analysis_results_final_{roi_string}_res_{resolution_str}.csv")
            save_cols = ['X', 'Y', 'community'] + roi_channels # Example: save coords, community, scaled values
            if primary_channel_map is not None:
                current_pixel_results_df['primary_channel'] = current_pixel_results_df['community'].map(primary_channel_map).fillna('Unknown')
                save_cols.append('primary_channel')
            current_pixel_results_df[save_cols].to_csv(final_results_save_path, index=True) # Save relevant columns
            print(f"   Final pixel results saved: {os.path.basename(final_results_save_path)}")

            success_flag = True # Mark success if at least one resolution finishes
            print(f"===== Resolution {resolution} finished in {time.time() - res_start_time:.2f} seconds =====")

        except Exception as resolution_e:
             print("!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!")
             print(f"   ERROR during processing resolution {resolution} for ROI {roi_string}: {str(resolution_e)}")
             print("!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!")
             traceback.print_exc() # `traceback` is imported in the first cell
        finally:
            # Clean up memory-intensive objects for this resolution
            del pixel_community_df, pixel_graph, community_partition, current_pixel_results_df
            del scaled_community_profiles, diff_expr_profiles, primary_channel_map
            gc.collect()

    if not success_flag:
         print(f"--- WARNING: Analysis failed for all resolutions for ROI: {roi_string} ---")
         return None # Indicate failure for this ROI

    print(f"--- Successfully finished processing ROI: {roi_string} in {time.time() - start_roi_time:.2f} seconds ---")
    return roi_string # Indicate success

print("Core analysis function `analyze_roi` defined.")


Core analysis function `analyze_roi` defined.
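Step 5b builds the community profiles, which the introduction defines as the average scaled expression per community, and the co-expression step later maps those averages back onto each pixel with `Series.map`. A toy pandas sketch of both operations, assuming `calculate_and_save_profiles` computes a plain per-community mean (its implementation is not shown here):

```python
import pandas as pd

# Toy pixel table: two channels of scaled values plus a Leiden community label per pixel.
pixels = pd.DataFrame({
    "X": [0, 1, 2, 3],
    "Y": [0, 0, 1, 1],
    "CD45(Y89Di)": [0.2, 0.4, 0.9, 0.7],
    "CD31(Sm154Di)": [0.1, 0.3, 0.0, 0.2],
    "community": [0, 0, 1, 1],
})
channels = ["CD45(Y89Di)", "CD31(Sm154Di)"]

# Community profile: mean scaled expression of each channel within each community.
profiles = pixels.groupby("community")[channels].mean()

# Broadcast each community's average back onto its pixels; unmatched labels become 0.
for ch in channels:
    pixels[f"{ch}_asinh_scaled_avg"] = pixels["community"].map(profiles[ch]).fillna(0)
```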


## 3. Find Input Files
Locate all `.txt` files in the specified data directory.
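File discovery is a non-recursive `glob` whose result is sorted so that ROI indices stay stable between runs. Note that `glob` returns an empty list, rather than raising, when the directory does not exist; an empty result therefore often points to a path that is wrong on the current machine. (The configured `data_dir` above is a `/Users/...` path, while the `os.getcwd()` cell reported `/home/noot/IMC`, which would be consistent with that.) A minimal sketch with a hypothetical `find_roi_files` helper:

```python
import glob
import os
import tempfile


def find_roi_files(data_dir: str, pattern: str = "*.txt") -> list:
    """Return matching files directly inside data_dir, sorted for a stable processing order."""
    return sorted(glob.glob(os.path.join(data_dir, pattern)))


# Demonstrate on a throwaway directory: only .txt files match, in sorted order.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["ROI_b.txt", "ROI_a.txt", "notes.csv"]:
        open(os.path.join(tmp, name), "w").close()
    found = [os.path.basename(p) for p in find_roi_files(tmp)]
```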

In [6]:
imc_files = []
if config:
    data_dir = config['paths']['data_dir']
    try:
        imc_files = sorted(glob.glob(os.path.join(data_dir, "*.txt"))) # Sort for consistency
        if not imc_files:
            print(f"ERROR: No .txt files found in data directory: {data_dir}")
        else:
            print(f"\nFound {len(imc_files)} IMC data files to process:")
            # Print first few files
            for f in imc_files[:min(5, len(imc_files))]: print(f"  - {os.path.basename(f)}")
            if len(imc_files) > 5: print("  ...")
    except Exception as e:
         print(f"ERROR finding input files in {data_dir}: {e}")
         imc_files = [] # Ensure it's empty on error
else:
    print("Skipping file search due to missing configuration.")


ERROR: No .txt files found in data directory: /Users/noot/Documents/IMC/data/241218_IMC_Alun/


## 4. Setup Parallel Processing
Determine the number of CPU cores to use from the configuration (`processing: parallel_jobs`): `-1` uses all cores, `-2` uses all but one, and so on; `0` or a non-integer value falls back to a single core.
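These rules can be expressed as a small pure function that mirrors the cell below (`resolve_n_jobs` is a hypothetical name). joblib's `Parallel` already interprets negative `n_jobs` the same way (`n_cpus + 1 + n_jobs` for values below -1), so the cell mainly adds the clamp to the available core count and the fallback to 1.

```python
def resolve_n_jobs(parallel_jobs, cpu_count: int) -> int:
    """Map the `parallel_jobs` config value to a worker count, following the notebook's rules."""
    if not isinstance(parallel_jobs, int):
        return 1  # Non-integer values fall back to serial execution.
    if parallel_jobs == -1:
        return cpu_count  # All cores.
    if parallel_jobs <= -2:
        return max(1, cpu_count + parallel_jobs + 1)  # -2 -> all but one, -3 -> all but two, ...
    if parallel_jobs > 0:
        return min(parallel_jobs, cpu_count)  # Never more workers than cores.
    return 1  # 0 falls back to a single core.
```

For example, `-2` on a 16-core machine gives 15 workers.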

In [7]:
n_jobs = 1
if config:
    try:
        parallel_jobs_config = config['processing']['parallel_jobs']
        cpu_count = multiprocessing.cpu_count()
        if isinstance(parallel_jobs_config, int):
            if parallel_jobs_config == -1:
                n_jobs = cpu_count
            elif parallel_jobs_config <= -2:
                n_jobs = max(1, cpu_count + parallel_jobs_config + 1)
            elif parallel_jobs_config > 0:
                n_jobs = min(parallel_jobs_config, cpu_count)
            else: # 0 or invalid
                n_jobs = 1
        else: n_jobs = 1 # Default for invalid type
        print(f"\nConfigured to use {n_jobs} cores for parallel processing.")
    except KeyError:
         print("\nWarning: 'parallel_jobs' not found in config. Defaulting to 1 core.")
         n_jobs = 1
    except Exception as e:
         print(f"\nWarning: Error determining parallel jobs: {e}. Defaulting to 1 core.")
         n_jobs = 1
else:
    print("Skipping parallel setup due to missing configuration.")



Configured to use 15 cores for parallel processing.


## 5. Run analysis over all ROIs

Execute the `analyze_roi` function for each input file using `joblib.Parallel`.

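**Note:** This cell may take a significant amount of time depending on the number of ROIs, the data size, the number of resolutions, and the number of cores used. **With `n_jobs > 1`, joblib's default `loky` backend serializes notebook-defined functions with `cloudpickle`, so `analyze_roi` can stay in the notebook; defining it in an imported `.py` module remains the more robust choice.**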

In [None]:
analysis_results = []
if config and imc_files:
    start_parallel_time = time.time()
    print(f"\nStarting parallel execution ({n_jobs} jobs)...")
    # Run the parallel processing
    analysis_results = Parallel(n_jobs=n_jobs, verbose=10)(
        delayed(analyze_roi)(
            i,
            file_path,
            len(imc_files),
            config,
            umap_available # Pass umap flag
            )
        for i, file_path in enumerate(imc_files)
    )

    print(f"\n--- Parallel processing finished in {time.time() - start_parallel_time:.2f} seconds ---")
else:
    print("\nSkipping parallel execution: Missing configuration or input files.")



Starting parallel execution (9 jobs)...


[Parallel(n_jobs=9)]: Using backend LokyBackend with 9 concurrent workers.



--- Loading and Validating: IMC_241218_Alun_ROI_D1_M2_02_13.txt ---
ROI Identifier: ROI_D1_M2_02_13
Loading data...

--- Loading and Validating: IMC_241218_Alun_ROI_D1_M1_02_10.txt ---
ROI Identifier: ROI_D1_M1_02_10
Loading data...

--- Loading and Validating: IMC_241218_Alun_ROI_D1_M2_01_12.txt ---
ROI Identifier: ROI_D1_M2_01_12
Loading data...

--- Loading and Validating: IMC_241218_Alun_ROI_D3_M1_01_15.txt ---
ROI Identifier: ROI_D3_M1_01_15
Loading data...

--- Loading and Validating: IMC_241218_Alun_ROI_D3_M1_03_17.txt ---
ROI Identifier: ROI_D3_M1_03_17
Loading data...

--- Loading and Validating: IMC_241218_Alun_ROI_D1_M1_01_9.txt ---
ROI Identifier: ROI_D1_M1_01_9
Loading data...

--- Loading and Validating: IMC_241218_Alun_ROI_D1_M1_03_11.txt ---
ROI Identifier: ROI_D1_M1_03_11
Loading data...

--- Loading and Validating: IMC_241218_Alun_ROI_D3_M1_02_16.txt ---
ROI Identifier: ROI_D3_M1_02_16
Loading data...

--- Loading and Validating: IMC_241218_Alun_ROI_D1_M2_03_14.txt -

## 6. Summarize Results
Count how many ROIs completed successfully and how many failed (a `None` result marks a failed ROI).
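The counts alone do not say *which* ROIs failed. Because `joblib.Parallel` returns results in the same order as its inputs, each result can be paired with the file it came from. A sketch, assuming a hypothetical helper `summarize_results` and that a failed ROI yields `None`:

```python
from pathlib import Path


def summarize_results(results, file_paths):
    """Split per-ROI results into succeeded / failed file names.

    Assumes results[i] corresponds to file_paths[i] (joblib preserves
    input order) and that a failed ROI produced None.
    """
    if len(results) != len(file_paths):
        raise ValueError("results and file_paths must have the same length")
    succeeded, failed = [], []
    for path, result in zip(file_paths, results):
        (failed if result is None else succeeded).append(Path(path).name)
    return succeeded, failed
```

Usage: `succeeded, failed = summarize_results(analysis_results, imc_files)`, then print `failed` to see which input files need attention.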

In [None]:
if analysis_results:
    # A None result is treated as a failed ROI
    successful_rois = [r for r in analysis_results if r is not None]
    failed_rois_count = len(analysis_results) - len(successful_rois)

    print("\n--- Pipeline Summary ---")
    print(f"Total ROIs submitted: {len(analysis_results)}")
    print(f"Successfully completed: {len(successful_rois)}")
    if failed_rois_count > 0:
        print(f"Failed or partially failed: {failed_rois_count} (check the logs above for details).")
else:
    print("\nNo analysis was performed.")


## 7. Next Steps

The per-ROI processing is complete. The necessary outputs (community profiles, pixel results) have been saved to the output directory structure.

Proceed to the **Experiment-Level Analysis Notebook** (`run_experiment_analysis.ipynb`) to aggregate these results and perform comparative analyses across conditions/timepoints.
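Before switching notebooks, it can help to confirm which ROI/resolution combinations actually produced community profiles. A sketch, assuming the output layout described at the top of this notebook (`<output_dir>/<roi_dir>/resolution_<x>/community_profiles_scaled_*.csv`); `find_community_profiles` is a hypothetical helper, and the experiment-level notebook may locate these files differently:

```python
from pathlib import Path


def find_community_profiles(output_dir):
    """Collect community profile CSVs written by the per-ROI stage.

    Assumes <output_dir>/<roi_dir>/resolution_<x>/community_profiles_scaled_*.csv.
    Returns a sorted list of (roi_dir_name, resolution_dir_name, path).
    """
    root = Path(output_dir)
    found = []
    for csv_path in root.glob("*/resolution_*/community_profiles_scaled_*.csv"):
        resolution_dir = csv_path.parent
        roi_dir = resolution_dir.parent
        found.append((roi_dir.name, resolution_dir.name, csv_path))
    return sorted(found)
```

Usage: `find_community_profiles(config['paths']['output_dir'])`; an ROI missing from the result either failed or was not processed at that resolution.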

In [18]:
# Explore current directory
import os

# Get current working directory
current_dir = f"{os.getcwd()}/data/241218_IMC_Alun"
print(f"Current working directory: {current_dir}")

# List files in the current directory
print("\nFiles in current directory:")
for item in os.listdir(current_dir):
    if os.path.isfile(os.path.join(current_dir, item)):
        print(f"  - {item} (File)")
    elif os.path.isdir(os.path.join(current_dir, item)):
        print(f"  - {item} (Directory)")
        
# Show directory structure (limited depth)
print("\nDirectory structure:")
for root, dirs, files in os.walk(current_dir, topdown=True, maxdepth=2):
    level = root.replace(current_dir, '').count(os.sep)
    indent = ' ' * 4 * level
    print(f"{indent}{os.path.basename(root)}/")
    sub_indent = ' ' * 4 * (level + 1)
    for f in files[:5]:  # Limit to first 5 files
        print(f"{sub_indent}{f}")
    if len(files) > 5:
        print(f"{sub_indent}... ({len(files)-5} more files)")


Current working directory: /home/noot/IMC/data/241218_IMC_Alun

Files in current directory:
  - IMC_241218_Alun_ROI_Sam2_03_8.txt (File)
  - IMC_241218_Alun_ROI_D3_M1_03_17.txt (File)
  - IMC_241218_Alun_ROI_D1_M2_02_13.txt (File)
  - IMC_241218_Alun_ROI_D3_M2_03_20.txt (File)
  - IMC_241218_Alun_ROI_D3_M2_01_18.txt (File)
  - IMC_241218_Alun_ROI_Sam1_01_2.txt (File)
  - IMC_241218_Alun_ROI_D1_M1_01_9.txt (File)
  - IMC_241218_Alun_ROI_D7_M2_02_25.txt (File)
  - IMC_241218_Alun_ROI_D1_M2_01_12.txt (File)
  - IMC_241218_Alun.mcd (File)
  - IMC_241218_Alun_ROI_D3_M1_02_16.txt (File)
  - IMC_241218_Alun_ROI_D1_M1_03_11.txt (File)
  - IMC_241218_Alun_ROI_D7_M1_03_23.txt (File)
  - IMC_241218_Alun_ROI_Test01_1.txt (File)
  - IMC_241218_Alun_ROI_Sam2_02_7.txt (File)
  - IMC_241218_Alun_ROI_D7_M1_02_22.txt (File)
  - IMC_241218_Alun_ROI_D7_M1_01_21.txt (File)
  - IMC_241218_Alun_ROI_D3_M2_02_19.txt (File)
  - Logbook_PorpigliaLab_IMC_Alun_241218.log (File)
  - IMC_241218_Alun_kidney.jpg (File

TypeError: walk() got an unexpected keyword argument 'maxdepth'
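The listing above mixes the ROI exports (`.txt`) with the `.mcd` acquisition file, a `.log` file, and a `.jpg` image. The `ERROR: No .txt files found` message at the start of this section does not by itself say whether the configured directory is missing or merely contains no `.txt` files. A sketch that separates the two cases and keeps only `.txt` files, assuming a hypothetical helper `list_roi_txt_files` (the pipeline's own file discovery may filter differently):

```python
from pathlib import Path


def list_roi_txt_files(data_dir):
    """Return sorted .txt files directly inside data_dir (non-recursive).

    Hypothetical helper: raises FileNotFoundError with the resolved path
    when data_dir does not exist, so a wrong path in config.yaml is not
    confused with an empty directory.
    """
    root = Path(data_dir).expanduser().resolve()
    if not root.is_dir():
        raise FileNotFoundError(f"Data directory does not exist: {root}")
    # Case-insensitive suffix check; subdirectories and other file types are skipped
    return sorted(p for p in root.iterdir() if p.is_file() and p.suffix.lower() == ".txt")
```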