# Spatial Analysis of Cell Proximity to Interfaces

This notebook analyzes the spatial distribution of different cell types relative to defined biological interfaces. The primary goal is to determine if certain cell types are enriched or depleted near specific types of interfaces (e.g., 'hub positive' vs. 'hub negative').

The workflow consists of several key stages:
1.  **Function Definitions**: A collection of R functions for performing the core statistical analysis, including distance calculations, empirical Bayes shrinkage, and meta-analysis.
2.  **Data Loading & Preprocessing**: Loading cell spatial coordinates and interface geometries from external files.
3.  **Distance Calculation**: For each cell, calculating the distance to the nearest interface and classifying it by interface type.
4.  **Statistical Analysis**: Applying a meta-analysis across multiple samples to get robust estimates of cell type proportions at different distances from the interfaces.
5.  **Visualization**: Generating plots to visualize the results and compare cell distributions between different interface types.

### **1. Objective**

The primary goal of this analysis is to quantify and compare the spatial distribution of specific cell types (and broader lineages) relative to biological interfaces (Tumor-Stromal boundaries). It specifically seeks to determine if certain cell populations are enriched or depleted at specific distances from "Hub-positive" versus "Hub-negative" interfaces.

### **2. Data Processing Pipeline**

* **Input Data:** The method uses single-cell spatial data (cell coordinates, cell type annotations) and geometric data defining the interfaces (lines separating tumor and stroma).
* **Distance Calculation:** For every cell in a sample, the algorithm calculates the Euclidean distance to the nearest interface line.
* **Sign Assignment:** Distances are signed to distinguish between the two sides of the interface:
    * **Negative Distances:** Assigned to cells in "Stromal-enriched" regions.
    * **Positive Distances:** Assigned to cells in the opposing (Tumor) regions.
* **Binning:** Cells are grouped into **5µm spatial bins** spanning from -100µm (deep stroma) to +100µm (deep tumor).
* **Counting:** The number of cells of each specific type (e.g., `Tcd8-CXCL13`) or lineage (e.g., `Myeloid`) is counted within each bin for each interface type.
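
The sign assignment and binning steps above can be sketched in a few lines of base R. This is a minimal illustration on made-up numbers: the vectors `dist_um` and `region`, and the label `"Tumor"`, are hypothetical stand-ins for the per-cell distance and region annotation.

```r
# Toy inputs (hypothetical values, for illustration only)
dist_um <- c(3.2, 12.0, 47.5, 99.9, 130.0)   # unsigned distance to nearest interface (um)
region  <- c("Stromal-enriched", "Tumor", "Stromal-enriched", "Tumor", "Tumor")

# Negative on the stromal side, positive otherwise
signed_dist <- ifelse(region == "Stromal-enriched", -dist_um, dist_um)

# 40 bins of 5 um from -100 to +100; cells outside that range get NA
breaks   <- seq(-100, 100, by = 5)
dist_bin <- cut(signed_dist, breaks = breaks, include.lowest = TRUE)

data.frame(signed_dist, dist_bin)
```

Cells that fall outside the range receive `NA` and can be dropped before counting.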

### **3. Statistical Methodology**

We use a hierarchical statistical framework to handle noise and biological variability across samples.

#### **Step A: Empirical Bayes Shrinkage (Per Sample)**

Raw proportions in spatial bins can be noisy, especially for rare cell types or bins with few total cells. To address this, the method uses **Empirical Bayes shrinkage**:

* It models cell counts using a Binomial distribution.
* It estimates a **Beta prior** (parameters $\alpha$, $\beta$) from the data itself using the method of moments.
* It calculates "shrunken" estimates of proportions, pulling extreme or unstable estimates from small sample sizes toward the global mean. This results in more robust estimates of cell abundance in each spatial bin for each sample.
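
Written out, the shrinkage step takes the following form. The notation ($k_i$ for the count of the cell type of interest and $n_i$ for the total cell count in bin $i$) is introduced here for illustration; the formulas follow the `estimate_beta_prior` and `empirical_bayes_summary` functions defined in this notebook.

```latex
\hat p_i = \frac{k_i}{n_i}, \qquad
\bar p = \operatorname{mean}_i(\hat p_i), \qquad
s^2 = \operatorname{var}_i(\hat p_i), \qquad
\bar n = \operatorname{mean}_i(n_i)

\sigma^2 = s^2 - \frac{\bar p\,(1 - \bar p)}{\bar n}, \qquad
\nu = \frac{\bar p\,(1 - \bar p)}{\sigma^2} - 1, \qquad
\alpha = \bar p\,\nu, \qquad
\beta = (1 - \bar p)\,\nu

\tilde p_i = \frac{k_i + \alpha}{n_i + \alpha + \beta}, \qquad
\operatorname{Var}(\tilde p_i) =
  \frac{(k_i + \alpha)(n_i - k_i + \beta)}
       {(n_i + \alpha + \beta)^2\,(n_i + \alpha + \beta + 1)}
```

When $\sigma^2 \le 0$, the code falls back to $\alpha = \beta = 0$, which leaves the raw proportions unshrunk.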

#### **Step B: Meta-Analysis (Global Aggregation)**

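To combine results across multiple patients/samples, the method uses an **Inverse Variance Weighted (IVW) meta-analysis**. Per-sample estimates are first shrunk with the `ashr` (Adaptive Shrinkage) package, and the resulting posterior means are then averaged with weights proportional to the inverse of their posterior variances.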

* This aggregates the per-sample estimates and variances into a single global estimate and variance for each spatial bin.
* This approach ensures that samples with more precise estimates (more cells) contribute more to the final result.
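
As a simplified illustration of inverse-variance weighting for a single spatial bin, the sketch below combines three hypothetical per-sample estimates. It uses the textbook fixed-effect form and omits the `ashr` shrinkage step; the function name `ivw_combine` and all numbers are assumptions made for this example, not part of the notebook's pipeline.

```r
# Fixed-effect inverse-variance weighting (textbook form, no ashr step)
ivw_combine <- function(est, var) {
    w <- 1 / var                          # precision of each sample
    list(
        estimate = sum(w * est) / sum(w), # precision-weighted mean
        variance = 1 / sum(w)             # variance of the pooled estimate
    )
}

# Hypothetical per-sample proportions and variances for one bin
est <- c(0.10, 0.20, 0.15)
v   <- c(0.01, 0.04, 0.02)
res <- ivw_combine(est, v)
res
```

The sample with the smallest variance (the first one) receives the largest weight, so the pooled estimate sits closer to 0.10 than the unweighted mean does.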

#### **Step C: Hypothesis Testing**

To compare conditions (e.g., Hub+ vs. Hub- interfaces), the method performs statistical tests on the aggregated meta-analysis results:

* **Welch's t-test:** Used to compare the means of the two groups at each spatial bin.
* **Log2 Fold Change (LFC):** Calculated to quantify the magnitude of difference between the groups.
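* **FDR Correction:** P-values are adjusted for multiple comparisons across all cell types and spatial bins using the Benjamini-Hochberg False Discovery Rate (FDR) procedure (`p.adjust(p, method = 'fdr')`). Significance is defined as an adjusted p-value below 0.01 (`padj_global < 0.01`).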
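
The three steps above can be sketched on toy summary statistics using base R only. The helper name `welch_from_summary` and every input value are hypothetical; the formulas are the standard Welch statistic with Welch-Satterthwaite degrees of freedom.

```r
# Welch's t-test and log2 fold change from summary statistics
welch_from_summary <- function(mu1, var1, n1, mu2, var2, n2) {
    se <- sqrt(var1 / n1 + var2 / n2)
    t  <- (mu1 - mu2) / se
    df <- (var1 / n1 + var2 / n2)^2 /
          ((var1 / n1)^2 / (n1 - 1) + (var2 / n2)^2 / (n2 - 1))
    c(p = 2 * pt(-abs(t), df), lfc = log2(mu1 / mu2))
}

# Hypothetical group means for three spatial bins (equal variances and n = 8 per group)
mu1 <- c(0.20, 0.10, 0.05)
mu2 <- c(0.10, 0.10, 0.04)
res <- mapply(welch_from_summary, mu1, 0.002, 8, mu2, 0.002, 8)

# Benjamini-Hochberg adjustment across the three bins
padj <- p.adjust(res["p", ], method = "fdr")
rbind(res, padj)
```

Each column of the result corresponds to one bin, with rows for the raw p-value, the log2 fold change, and the adjusted p-value.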

### **4. Specific Analyses Performed**

1. **Cell State Analysis:** Analyzes specific cell subsets (e.g., `Tcd8-CXCL13`, `Myeloid-ISG`) to see how specific phenotypes pattern spatially.
2. **Lineage Analysis:** Groups cells into broad categories (TNKILC, Myeloid, Stromal, Epi, etc.) to see macro-level tissue architectural changes.
3. **Interface Comparisons:**
    * **Hub+ vs. Hub-:** Compares interfaces defined by CXCL13+ (Hub) vs. CXCL13- (Non-Hub) status.

### **5. Visualization**

The results are visualized as **line traces** where:

* **X-axis:** Distance from the interface (Stromal side to Epithelial side).
* **Y-axis:** Percentage of all cells (or percentage of lineage) that belong to the cell type of interest.
* **Ribbons/Error Bars:** Represent the 95% Confidence Interval derived from the meta-analysis variance.
* **Asterisks:** Indicate spatial bins where the difference between conditions is statistically significant.

## 1. Setup: Load Libraries

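First, we load the R packages needed for the analysis: libraries for data manipulation (`data.table`, `dplyr`, `purrr`), spatial analysis (`sf`, `geos`), parallel computing (`furrr`, `future`), and plotting (`ggplot2`, `ggthemes`, `patchwork`). Later cells also call `readr`, `ashr`, `cowplot`, and `tibble` through the `::` operator and attach `tidyverse`, so these packages must be installed as well.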

In [None]:
suppressPackageStartupMessages({
    library(data.table)
    library(sf)
    library(purrr)
    library(ggplot2)
    library(ggthemes)
    library(geos)
    library(glue)
    library(furrr)
    library(future)
    library(dplyr)
    library(patchwork)
})

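# Set up parallel processing to speed up computations.
# Note: `multicore` relies on forked processes, which are not available on Windows;
# there `future` falls back to sequential execution (`multisession` is an alternative).
plan(multicore)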

# Helper function to set plot dimensions
fig.size <- function(h, w) {
    options(repr.plot.height = h, repr.plot.width = w)
}

## 2. Function Definitions

This section contains all the custom functions used throughout the analysis pipeline.

### `summarize_cells_by_interface_proximity`

This is the core data processing function. For a given sample, it takes cell coordinates and interface geometries as input. It then performs the following steps:
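- Calculates the distance for each cell to its nearest interface.
- Annotates each cell with the type of that nearest interface.
- Assigns a sign to the distance: negative for cells in a 'Stromal-enriched' region, positive otherwise.
- Bins the cells into 5µm distance intervals from -100µm to +100µm.
- Keeps only cells whose tile status (`cxcl_pos_tile`) matches the nearest interface type: `CXCL_pos` cells near 'CXCLpos tumor & CXCLpos stroma' interfaces and `CXCL_neg` cells near 'CXCLneg tumor & CXCLneg stroma' interfaces.
- Returns a named list of matrices, where each matrix contains the counts of cell types within each distance bin for a specific interface type.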
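
The last step, turning per-cell bin and type labels into one bin-by-type count matrix per interface type, can be sketched with base R alone. The toy data frame and its labels are hypothetical, and the notebook's own function uses `data.table` with `dcast` rather than `table`.

```r
# Toy per-cell table (hypothetical values)
toy <- data.frame(
    interface = c("A", "A", "A", "B", "B"),
    dist_bin  = factor(c("(-5,0]", "(0,5]", "(0,5]", "(-5,0]", "(-5,0]"),
                       levels = c("(-5,0]", "(0,5]")),
    type      = factor(c("T", "T", "Myeloid", "T", "T"),
                       levels = c("Myeloid", "T"))
)

# One bins x types count matrix per interface type; the factor levels keep
# empty bins and absent cell types as explicit zeros
counts_by_interface <- lapply(split(toy, toy$interface), function(d) {
    unclass(table(d$dist_bin, d$type))
})
counts_by_interface$B
```

Keeping empty bins as zero rows means every matrix has the same row layout, which simplifies combining samples later.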

In [None]:
find_midpoint <- function(interval_string) {
  # 1. Remove parentheses and brackets using gsub
  # The pattern "\\(|\\[|\\)|\\]" matches any one of "(", "[", ")" or "]".
  # Each character is escaped with \\ because it is special in a regex.
  cleaned_string <- gsub("\\(|\\[|\\)|\\]", "", interval_string)
  
  # 2. Split the string by the comma
  # strsplit returns a list, so we take the first element [[1]]
  num_strings <- strsplit(cleaned_string, ",")[[1]]
  
  # 3. Convert character vector to numbers and calculate the mean
  midpoint <- mean(as.numeric(num_strings))
  
  return(midpoint)
}

In [None]:
tile_metadata = readr::read_rds('../Tessera tiles/Tessera processed results/tile_metadata_2025-07-22.rds') #'tile_metadata_2025-03-27.rds')
head(tile_metadata)

# Plot the number of hub+ and hub- cells around hub+ and hub- interfaces, the number of cells around MMRp interfaces - Supplementary Figure 4 

In [None]:
summarize_cells_by_interface_proximity_2 = function(cells, interfaces) {
    # retains all cells around hub+ and hub- interfaces.
    
    ## Get distances and closest interface type
    pts = st_as_sf(cells[, .(X, Y)], coords = c('X', 'Y'))
    geos_pts = geos::as_geos_geometry(pts$geometry)
    geos_lines = geos::as_geos_geometry(interfaces$x[1:nrow(interfaces)])
    
    nearest_interfaces_idx = geos::geos_nearest(geos_pts, geos_lines)
    
    cells$closest_interface_type = interfaces$Type_of_Interface[nearest_interfaces_idx]
    
    cells$dist_interface = geos::geos_distance(geos_pts, geos_lines[nearest_interfaces_idx])
    
    ## Assign sign to distances
    cells$dist_interface_signed = fifelse(
        cells$tessera_annotation == 'Stromal-enriched',
        -cells$dist_interface,
        cells$dist_interface
    )
    
    ## Assign cells to 5um bins
    dist_breaks = seq(-100, 100, by = 5)
    cells$dist_bin = cut(cells$dist_interface_signed, breaks = dist_breaks, include.lowest = TRUE)

    # --- ROBUST SUMMARIZATION --

    # cells = cells %>% filter(
    #     (closest_interface_type == 'CXCLpos tumor & CXCLpos stroma' & cxcl_pos_tile == 'CXCL_pos') | (closest_interface_type == 'CXCLneg tumor & CXCLneg stroma' & cxcl_pos_tile == 'CXCL_neg')        
    # )
    
    cells_in_range = cells[!is.na(dist_bin)]
    
    if (nrow(cells_in_range) == 0) {
        warning("No cells found within the -100 to 100µm distance range.")
        return(list())
    }

    all_interface_types = unique(cells$closest_interface_type)
    cells_in_range[, closest_interface_type := factor(closest_interface_type, levels = all_interface_types)]

    counts_long = cells_in_range[, .N, by = .(closest_interface_type, dist_bin, type_lvl3)]

    counts_wide = dcast(counts_long,
                        closest_interface_type + dist_bin ~ type_lvl3,
                        value.var = "N",
                        fill = 0,
                        drop = FALSE)

    result_list = split(counts_wide, by = "closest_interface_type")

    result_list = lapply(result_list, function(dt) {
        row_names = dt$dist_bin
        count_cols = setdiff(names(dt), c("closest_interface_type", "dist_bin"))
        mat = as.matrix(dt[, ..count_cols])
        rownames(mat) = row_names
        return(mat)
    })

    return(result_list)
}


# function definitions

In [None]:
summarize_cells_by_interface_proximity = function(cells, interfaces) {
    ## Get distances and closest interface type
    pts = st_as_sf(cells[, .(X, Y)], coords = c('X', 'Y'))
    geos_pts = geos::as_geos_geometry(pts$geometry)
    geos_lines = geos::as_geos_geometry(interfaces$x[1:nrow(interfaces)])
    
    nearest_interfaces_idx = geos::geos_nearest(geos_pts, geos_lines)
    
    cells$closest_interface_type = interfaces$Type_of_Interface[nearest_interfaces_idx]
    
    cells$dist_interface = geos::geos_distance(geos_pts, geos_lines[nearest_interfaces_idx])
    
    ## Assign sign to distances
    cells$dist_interface_signed = fifelse(
        cells$tessera_annotation == 'Stromal-enriched',
        -cells$dist_interface,
        cells$dist_interface
    )
    
    ## Assign cells to 5um bins
    dist_breaks = seq(-100, 100, by = 5)
    cells$dist_bin = cut(cells$dist_interface_signed, breaks = dist_breaks, include.lowest = TRUE)

    # --- ROBUST SUMMARIZATION --

    cells = cells %>% filter(
        (closest_interface_type == 'CXCLpos tumor & CXCLpos stroma' & cxcl_pos_tile == 'CXCL_pos') | (closest_interface_type == 'CXCLneg tumor & CXCLneg stroma' & cxcl_pos_tile == 'CXCL_neg')        
    )
    
    cells_in_range = cells[!is.na(dist_bin)]
    
    if (nrow(cells_in_range) == 0) {
        warning("No cells found within the -100 to 100µm distance range.")
        return(list())
    }

    all_interface_types = unique(cells$closest_interface_type)
    cells_in_range[, closest_interface_type := factor(closest_interface_type, levels = all_interface_types)]

    counts_long = cells_in_range[, .N, by = .(closest_interface_type, dist_bin, type_lvl3)]

    counts_wide = dcast(counts_long,
                        closest_interface_type + dist_bin ~ type_lvl3,
                        value.var = "N",
                        fill = 0,
                        drop = FALSE)

    result_list = split(counts_wide, by = "closest_interface_type")

    result_list = lapply(result_list, function(dt) {
        row_names = dt$dist_bin
        count_cols = setdiff(names(dt), c("closest_interface_type", "dist_bin"))
        mat = as.matrix(dt[, ..count_cols])
        rownames(mat) = row_names
        return(mat)
    })

    return(result_list)
}


### Empirical Bayes Functions

This group of functions implements an empirical Bayes statistical framework. The core idea is to improve estimates for individual groups (here, distance bins) by "borrowing strength" from all other groups. This shrinks noisy estimates from bins with little data towards a more stable global average.
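
To make the effect of shrinkage concrete, the sketch below applies a fixed Beta prior to toy counts. The counts and the Beta(2, 18) prior are hypothetical; in the notebook the prior parameters are estimated from the data rather than chosen by hand.

```r
# Hypothetical counts: cells of interest (k) out of all cells (n) in four bins
k <- c(0, 2, 30, 55)
n <- c(3, 10, 200, 500)

# Assumed prior for illustration: Beta(2, 18), i.e. prior mean 0.10
alpha <- 2
beta  <- 18

raw      <- k / n
shrunken <- (k + alpha) / (n + alpha + beta)
round(cbind(n, raw, shrunken), 3)
```

Bins with few cells move most strongly toward the prior mean, while bins with hundreds of cells stay close to their raw proportion.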

In [None]:
# Estimates the parameters (alpha, beta) for a Beta prior distribution using the
# method of moments. This prior is used for modeling proportions (binomial data).
estimate_beta_prior <- function(k, n) {
    if (length(k) <= 1) {
        return(list(alpha = 0, beta = 0))
    }

    valid_bins <- n > 0
    k_valid <- k[valid_bins]
    n_valid <- n[valid_bins]
    
    if (length(k_valid) <= 1) {
        return(list(alpha = 0, beta = 0))
    }

    p_hat <- k_valid / n_valid
    mean_p <- mean(p_hat)
    var_p <- var(p_hat)
    mean_n <- mean(n_valid)
    
    var_true <- var_p - mean_p * (1 - mean_p) / mean_n
    
    if (is.na(var_true) || var_true <= 0) {
        return(list(alpha = 0, beta = 0))
    }
    
    nu <- mean_p * (1 - mean_p) / var_true - 1
    list(alpha = mean_p * nu, beta = (1 - mean_p) * nu)
}

# Calculates summary statistics for count data, optionally applying Empirical Bayes shrinkage.
empirical_bayes_summary <- function(k, n, bin_lvls, model = c("mle", "binomial", "poisson")) {
    model <- match.arg(model)
    if (length(k) != length(n)) stop("Input lengths must match.")
    
    est <- k / n
    
    if (model == "binomial") {
        prior <- estimate_beta_prior(k, n)
        est <- (k + prior$alpha) / (n + prior$alpha + prior$beta)
        var <- ((k + prior$alpha) * (n - k + prior$beta)) /
               ((n + prior$alpha + prior$beta)^2 * (n + prior$alpha + prior$beta + 1))
               
    } else {
       stop("Only binomial model is fully implemented in this notebook version.")
    }
    
    df = data.table(
        dist_bin = factor(bin_lvls, bin_lvls),
        model = model,
        count = k,
        size = n,
        estimate = est,
        variance = var,
        alpha = prior$alpha,
        beta = prior$beta
    )

    df[, p := exp(pnorm(estimate / sqrt(variance), lower.tail = FALSE, log.p = TRUE))]
    df[, padj := p.adjust(p, method = "holm")]  # "holm" is p.adjust's default, made explicit here
    df[, asterisk := ifelse(padj < 0.01, "*", "")]
    
    return(df)
}

### Meta-Analysis and Statistical Functions

These functions handle the statistical aggregation and testing across multiple samples.

In [None]:
# `meta_ashr`: Performs a meta-analysis using the `ashr` package to combine estimates.
meta_ashr <- function(p_vec, var_vec) {
    ash_fit = ashr::ash(betahat = p_vec, sebetahat = sqrt(var_vec), method = "fdr", mixcompdist = 'normal')
    w = prop.table(1 / (ash_fit$result$PosteriorSD^2 + 1e-8))
    data.table(
        estimate = sum(w * ash_fit$result$PosteriorMean),
        variance = sum(w * ash_fit$result$PosteriorSD^2)
    )
}

# `get_stats`: The main driver for the meta-analysis. It takes a list of count matrices,
# calculates empirical Bayes summaries for each, and then combines them using `meta_ashr`.
get_stats = function(counts_list, .types) {
    df_list = imap(counts_list, function(counts, .id) {
        empirical_bayes_summary(
            rowSums(counts[, .types, drop = FALSE]),
            rowSums(counts),
            rownames(counts),
            'binomial'
        )
    })
    
    df = bind_rows(df_list, .id = 'SampleID')[
        , .(SampleID, dist_bin, estimate, variance)
    ][
        , meta_ashr(estimate, variance), dist_bin
    ]
    
    df[, p := exp(pnorm(estimate / sqrt(variance), lower.tail = FALSE, log.p = TRUE))]
    df[, padj := p.adjust(p, method = "holm")]  # "holm" is p.adjust's default, made explicit here
    df[, asterisk := case_when(
        is.na(padj) ~ '',
        padj < 0.01 ~ "*",
        TRUE ~ ''
    )]
    
    df[]
}

# `t_test_and_lfc`: Calculates a Welch's t-test and log2 fold change from summary statistics.
t_test_and_lfc <- function(mu1, var1, n1, mu2, var2, n2) {
  se_diff <- sqrt(var1 / n1 + var2 / n2)
  t_stat <- (mu1 - mu2) / se_diff
  df_num <- (var1 / n1 + var2 / n2)^2
  df_denom <- ((var1 / n1)^2) / (n1 - 1) + ((var2 / n2)^2) / (n2 - 1)
  df <- df_num / df_denom
  p_value <- 2 * pt(-abs(t_stat), df)
  lfc <- log2((mu1) / (mu2 ))
  
  return(list(
    p_value = p_value,
    log2_fold_change = lfc
  ))
}

### Utility and Plotting Functions

In [None]:
# `standardize_matrix_columns`: A utility to ensure all matrices in a list have the same columns.
standardize_matrix_columns <- function(mat_list) {
    all_cols <- sort(unique(unlist(lapply(mat_list, colnames))))
    
    lapply(mat_list, function(mat) {
        missing_cols <- setdiff(all_cols, colnames(mat))
        if (length(missing_cols) > 0) {
            zeros <- matrix(0, nrow = nrow(mat), ncol = length(missing_cols),
                            dimnames = list(rownames(mat), missing_cols))
            mat <- cbind(mat, zeros)
        }
        mat[, all_cols, drop = FALSE]
    })
}

# `interface_plot`: A plotting function to visualize the results for a single sample.
interface_plot = function(counts, .types, est_model=c('binomial', 'poisson', 'mle')) {
    est_model <- match.arg(est_model)
    df = empirical_bayes_summary(
        rowSums(counts[, .types, drop = FALSE]),
        rowSums(counts),
        rownames(counts),
        est_model
    ) 

    ymax = 100 * max(df$estimate + 1.96 * sqrt(df$variance))
    
    ggplot(df, aes(dist_bin, 100 * estimate)) + 
        geom_vline(xintercept = c(20.5), size = 2, linetype = 1, color = 'grey') + 
        geom_point(aes(size = size)) + 
        geom_errorbar(aes(ymin = 100 * (estimate - 1.96 * sqrt(variance)), ymax = 100 * (estimate + 1.96 * sqrt(variance))), width = 0) + 
        geom_hline(yintercept = 0) + 
        geom_line(data = . %>% dplyr::mutate(dist_bin = as.numeric(dist_bin))) + 
        theme_bw(base_size = 16) + 
        theme(axis.text.x = element_text(angle = 90, hjust = 1)) + 
        labs(y = '% of all cells', x = 'Distance Window', size = '# Cells', subtitle = 'mean & 95% CI, *padj<0.01', title = paste(.types, collapse = '; ')) + 
        geom_text(aes(y = 100 * (estimate + 1.96 * sqrt(variance)), label = asterisk), size = 6, vjust = 0) + 
        annotate("text", x = 0.5, y = ymax + .05, label = 'Stromal Side', hjust = 0, size = 6) + 
        annotate("text", x = 40.5, y = ymax + .05, label = 'Epithelial Side', hjust = 1, size = 6) + 
        NULL
}

# `run_global_hub_analysis`: The main analysis wrapper that runs the entire pipeline for all cell types.
run_global_hub_analysis <- function(types_list, counts_list) {
  n_hubPos = sum(grepl('hubPos', names(counts_list)))
  n_hubNeg = sum(grepl('hubNeg', names(counts_list)))

  results_by_type <- purrr::imap(types_list, function(.x, .y) {
    .types <- .y
    df_hubPos <- get_stats(counts_list[grepl('hubPos', names(counts_list))], .types)
    df_hubNeg <- get_stats(counts_list[grepl('hubNeg', names(counts_list))], .types)
    df <- bind_rows(list(hubPos = df_hubPos, hubNeg = df_hubNeg), .id = 'Status')
    df_stat <- dcast(df, dist_bin ~ Status, value.var = c('estimate', 'variance'))[
      , c('p', 'log2_fold_change') := t_test_and_lfc(estimate_hubPos, variance_hubPos, n_hubPos, estimate_hubNeg, variance_hubNeg, n_hubNeg), dist_bin
    ]
    return(list(
      hubPos_data = df_hubPos,
      hubNeg_data = df_hubNeg,
      stats_data = df_stat
    ))
  })
  
  transposed_results <- purrr::transpose(results_by_type)
  all_hubPos_df <- dplyr::bind_rows(transposed_results$hubPos_data, .id = "cell_type")
  all_hubNeg_df <- dplyr::bind_rows(transposed_results$hubNeg_data, .id = "cell_type")
  summary_stats <- dplyr::bind_rows(transposed_results$stats_data, .id = "cell_type")
  
  summary_stats[, padj_global := p.adjust(p, method = 'fdr')]
  summary_stats[, height := max(estimate_hubPos + 1.96 * sqrt(variance_hubPos), estimate_hubNeg + 1.96 * sqrt(variance_hubNeg)), by = .(cell_type, dist_bin)]
  summary_stats[, asterisk := fifelse(padj_global < 0.01, "*", "")]
  
  return(list(
    summary_stats = summary_stats,
    hubPos_results = all_hubPos_df,
    hubNeg_results = all_hubNeg_df
  ))
}

#' @title Run a Global Analysis Comparing MMR Status
#' @description This master function automates the entire statistical comparison.
run_global_mmr_analysis <- function(types_list, counts_list, mmr_map) {
  
  # Dynamically calculate sample sizes to make the analysis robust
  n_msi <- length(unique(mmr_map[MMRstatus == 'MMRd']$SampleID))
  n_mss <- length(unique(mmr_map[MMRstatus == 'MMRp']$SampleID))
  
  # Iterate over the simplified cell type names (`type_lvl3`)
  results_by_type <- purrr::imap(types_list, function(.x, .y) {
    .types <- .y # Use the name of the list element (the correct type) for subsetting
    
    # Run meta-analysis for each group
    df_MSI <- get_stats(counts_list[mmr_map[MMRstatus == 'MMRd']$SampleID], .types)
    df_MSS <- get_stats(counts_list[mmr_map[MMRstatus == 'MMRp']$SampleID], .types)
    
    # Combine results for direct comparison
    df <- bind_rows(list(MSI = df_MSI, MSS = df_MSS), .id = 'Status')
    
    # Reshape and run Welch's t-test on the meta-analyzed estimates
    df_stat <- dcast(df, dist_bin ~ Status, value.var = c('estimate', 'variance'))[
      , c('p', 'log2_fold_change') := t_test_and_lfc(estimate_MSI, variance_MSI, n_msi, estimate_MSS, variance_MSS, n_mss), dist_bin
    ]
    
    return(list(MSI_data = df_MSI, MSS_data = df_MSS, stats_data = df_stat))
  })
  
  # Restructure the list of lists into a more usable format
  transposed_results <- purrr::transpose(results_by_type)
  all_MSI_df <- dplyr::bind_rows(transposed_results$MSI_data, .id = "cell_type")
  all_MSS_df <- dplyr::bind_rows(transposed_results$MSS_data, .id = "cell_type")
  summary_stats <- dplyr::bind_rows(transposed_results$stats_data, .id = "cell_type")
  
  # Perform global FDR correction across all p-values from all tests
  summary_stats[, padj_global := p.adjust(p, method = 'fdr')]
  summary_stats[, asterisk := fifelse(padj_global < 0.01, "*", "")]
  summary_stats[, height := max(estimate_MSI + 1.96 * sqrt(variance_MSI), estimate_MSS + 1.96 * sqrt(variance_MSS)), by = .(cell_type, dist_bin)]
  
  # Return the final, tidy list of results
  return(list(summary_stats = summary_stats, MSI_results = all_MSI_df, MSS_results = all_MSS_df))
}

## 3. Data Loading and Preprocessing

Here, we load the main cell data and the interface data. We perform some initial cleaning on the cell types, simplifying them into a `type_lvl3` category for the main analysis.
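
The simplification and lineage grouping can be illustrated on a toy annotation table with base R. The labels `Epi-stem` and `Epi-goblet` are hypothetical examples of epithelial states, and the data frame `ann` is invented for this sketch.

```r
# Toy annotation table (hypothetical rows)
ann <- data.frame(
    type_lvl1 = c("Epi", "Epi", "Myeloid", "TNKILC"),
    type_lvl2 = c("Epi-stem", "Epi-goblet", "Myeloid-ISG", "Tcd8-CXCL13")
)

# Collapse every epithelial state into a single 'Epi' label
ann$type_lvl3 <- gsub("Epi.*", "Epi", ann$type_lvl2)

# Named list: lineage -> unique simplified cell types
u <- unique(ann[, c("type_lvl1", "type_lvl3")])
lineage_list <- split(u$type_lvl3, u$type_lvl1)
lineage_list
```

Each list element is named after a lineage and holds the simplified cell types that belong to it.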

In [None]:
tiles_to_omit = read.csv('../Tessera tiles/Tessera processed results/tiles_to_exclude_from_interface_analysis.csv') %>%
    filter(tiles_to_exclude_from_interface_analysis != '') %>%
    pull(agg_id)
length(tiles_to_omit)
head(tiles_to_omit)

In [None]:
# Load cell data
cells = readr::read_rds('../Tessera tiles/Tessera processed results/tile_metadata_2025-07-22.rds') 
cells$type_lvl1[cells$type_lvl2 == 'Mast'] = 'Mast' 


# Simplify cell type annotations
cells <- cells %>%
    filter(!agg_id %in% tiles_to_omit) %>%
    mutate(type_lvl2 = case_when(type_lvl2 == 'Myeloid-ISGlow' ~ 'Myeloid-ISG', .default = type_lvl2)) %>%
    mutate(type_lvl3 = type_lvl2) %>%
    #mutate(type_lvl3 = gsub(type_lvl2, pattern = '-prolif', replacement = '')) %>% # |high|low|-PD1
    mutate(type_lvl3 = gsub(type_lvl3, pattern = 'Epi.*', replacement = 'Epi')) %>% 
    select(c('PatientID', 'SampleID', 'MMRstatus', 'X', 'Y', 'tessera_annotation', 'type_lvl3', 'type_lvl1', 'type_lvl2', 'cell_id', 'cxcl_pos_tile'))

glimpse(cells)

# Create a list for grouping cell types by lineage
lineage_list <- cells %>% 
    select(type_lvl1, type_lvl3) %>% 
    distinct %>%
    {split(.$type_lvl3, .$type_lvl1)}

In [None]:
lineage_list[['Myeloid']]

In [None]:
cells %>% 
    rename(merged_states = type_lvl3) %>% 
    group_by(PatientID, SampleID, MMRstatus, tessera_annotation, type_lvl1, type_lvl2, merged_states, cxcl_pos_tile) %>%
    summarize(n = n()) %>% 
    ungroup %>%
    fwrite(., file = 'figs/table_of_cell_states_per_tessera_region.csv')

In [None]:
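# Attach tidyverse (tidyr, forcats, ...) for pivot_wider/pivot_longer and fct_recode used below.
# library() stops with an error if the package is missing, whereas require() only warns.
library(tidyverse)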
cells %>% 
    rename(merged_states = type_lvl3) %>% 
    group_by(PatientID, SampleID, MMRstatus, tessera_annotation, type_lvl1, type_lvl2, merged_states) %>%
    summarize(n = n()) %>% 
    ungroup %>%
    mutate(states = interaction(type_lvl2, tessera_annotation, sep = ' from ')) %>%
    select(states, n, PatientID, SampleID) %>%
    pivot_wider(data = ., names_from = states, values_from = n) %>%
    fwrite(., file = 'figs/table_of_cell_states_per_tessera_region_2.csv')

In [None]:
# Load interface data for each MMRd sample
ids = unique(cells$SampleID[cells$MMRstatus == 'MMRd'])
# List the interface files once, then pick the file whose path matches each sample ID
interface_files = list.files(path = '../Tessera tiles/Spatial objects for tumor-stromal interfaces in all MERFISH samples/', pattern = '_tumor_stromal_interfaces.rds', full.names = TRUE)
interfaces = map(ids, function(.id) {
    fname = normalizePath(interface_files[grepl(interface_files, pattern = .id)])
    readRDS(fname)
})
names(interfaces) = ids

glimpse(interfaces[[1]])

# Plot lengths of interface

In [None]:
all_ids = unique(cells$SampleID)
all_ids
# List the interface files once, then pick the file whose path matches each sample ID
all_interface_files = list.files(path = '../Tessera tiles/Spatial objects for tumor-stromal interfaces in all MERFISH samples/', pattern = '_tumor_stromal_interfaces.rds', full.names = TRUE)
all_interfaces = map(all_ids, function(.id) {
    fname = normalizePath(all_interface_files[grepl(all_interface_files, pattern = .id)])
    readRDS(fname)
})
names(all_interfaces) = all_ids

In [None]:
lengthsOfInterfaces = all_interfaces %>%
    rbindlist(ignore.attr=TRUE) %>%
    st_as_sf(sf_column_name = 'x') %>%
    filter(!st_is_empty(.)) %>%
    mutate(len = st_length(x)) %>%
    as.data.frame() %>%
    mutate(Type_of_Interface = case_when(
       Type_of_Interface == 'CXCLneg tumor & CXCLneg stroma' ~ 'Hub-',
       Type_of_Interface == 'CXCLpos tumor & CXCLpos stroma' ~ 'Hub+',
        .default = 'Heterotypic'
    )) %>%
    group_by(Type_of_Interface, SampleID) %>%
    summarize(len = sum(len)) %>%
    ungroup 

In [None]:
fig.size(h = 2, w = 3)
lengthsOfInterfaces_plot = ggplot(lengthsOfInterfaces %>%
    as.data.frame %>%
    left_join(., cells %>% select(SampleID, PatientID, MMRstatus) %>% distinct, by = 'SampleID') %>%
    group_by(Type_of_Interface, PatientID, MMRstatus) %>%
    summarize(len = sum(len))) +
    geom_col(aes(y = PatientID, x = len, fill = Type_of_Interface ), position = 'fill') +
    facet_wrap(~MMRstatus, ncol = 2, scales = 'free_y') +
    labs(x = 'Proportion of total interface', y = 'Patient', fill = 'Type of\nInterface') +
    cowplot::theme_half_open(7) + 
    scale_fill_manual(
        name = 'Interface Type: ', 
        values = c('Hub+' = '#D55E00', 'Hub-' = '#009E73', 'Heterotypic' = 'lightyellow'), 
        labels = c('Hub+' = 'Hub-inside', 'Hub-' = 'Hub-outside', 'Heterotypic' = 'Hub-border')
    ) + 
theme(axis.text.x = element_text(angle = 90), legend.position = 'bottom') +
guides(fill = guide_legend(ncol = 2)) +
NULL
lengthsOfInterfaces_plot
ggsave(plot = lengthsOfInterfaces_plot, filename = 'figs/lengthsOfInterfaces.pdf', width =3, height = 2, units = 'in')

In [None]:
lengthsOfInterfaces_plot

In [None]:
lengthsOfInterfaces %>%
    as.data.frame %>%
    left_join(., cells %>% select(SampleID, PatientID, MMRstatus) %>% distinct, by = 'SampleID') %>%
    group_by(Type_of_Interface, PatientID, MMRstatus) %>%
    summarize(len = sum(len)) %>%
    ungroup %>%
    group_by(MMRstatus, Type_of_Interface) %>%
    summarize(mean_len = mean(len)/1e4, sd_len = sd(len)/1e4) %>% # convert microns to cm
    ungroup

In [None]:
.temp = lengthsOfInterfaces %>%
    as.data.frame %>%
    left_join(., cells %>% select(SampleID, PatientID, MMRstatus) %>% distinct, by = 'SampleID') %>%
    group_by(Type_of_Interface, PatientID, MMRstatus) %>%
    summarize(len = sum(len)) %>%
    ungroup %>%
    group_by(PatientID, MMRstatus) %>%
    mutate(total_len = sum(len)) %>%
    mutate(prop_len = len/total_len) %>%
    ungroup %>%
    filter(Type_of_Interface == 'Hub+') %>%
    select(MMRstatus, prop_len) 
MMRd_interface = .temp %>% filter(MMRstatus == 'MMRd') %>% pull(prop_len)
MMRp_interface = .temp %>% filter(MMRstatus == 'MMRp') %>% pull(prop_len)
t.test(MMRd_interface, MMRp_interface)
# range (min, max) of the percentage of Hub+ interface per patient, by MMR status
(range(MMRd_interface) * 100) %>% round(digits = 1)
(range(MMRp_interface) * 100) %>% round(digits = 1)


## 4. Main Analysis: Calculate Distances and Bin Counts

This is the main computational step. We use `future_map` to run the `summarize_cells_by_interface_proximity` function in parallel for each MMRd sample. This generates a list where each element corresponds to a sample and contains the binned cell counts for its different interface types.

In [None]:
options(future.globals.maxSize = 1e10)
ids = unique(cells$SampleID[cells$MMRstatus == 'MMRd'])
system.time({
    counts_list = future_map(ids, function(.id) {
        summarize_cells_by_interface_proximity(cells[SampleID == .id], interfaces[[.id]])    
    }, .options = furrr::furrr_options(seed=TRUE))
    names(counts_list) = ids
})

## 5. Post-processing: Stratify and Standardize Data

After calculating the counts, we separate them based on the interface type ('hub positive' vs. 'hub negative'). We then use the `standardize_matrix_columns` utility function to ensure that all count matrices have the exact same set of cell type columns, which is essential for the downstream meta-analysis.
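
The column standardization can be illustrated on two toy matrices. The helper `pad` and all values are hypothetical; the sketch mirrors the idea of padding missing cell type columns with zeros and ordering the columns identically.

```r
# Two toy count matrices with different cell type columns (hypothetical values)
m1 <- matrix(c(1, 2, 3, 4), nrow = 2,
             dimnames = list(c("bin1", "bin2"), c("B", "T")))
m2 <- matrix(c(5, 6), nrow = 2,
             dimnames = list(c("bin1", "bin2"), "Myeloid"))

all_cols <- sort(unique(c(colnames(m1), colnames(m2))))

# Add a zero column for every missing cell type, then reorder the columns
pad <- function(mat) {
    missing_cols <- setdiff(all_cols, colnames(mat))
    if (length(missing_cols) > 0) {
        zeros <- matrix(0, nrow = nrow(mat), ncol = length(missing_cols),
                        dimnames = list(rownames(mat), missing_cols))
        mat <- cbind(mat, zeros)
    }
    mat[, all_cols, drop = FALSE]
}

padded <- lapply(list(s1 = m1, s2 = m2), pad)
padded$s2
```

After padding, both matrices share the same columns in the same order, so they can be indexed by the same cell type names.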

In [None]:
# Separate lists for hub positive and hub negative interfaces
hubPos_counts_list = lapply(counts_list, function(x){return(x[['CXCLpos tumor & CXCLpos stroma']])})
names(hubPos_counts_list) = paste0(names(counts_list), '_hubPos')

hubNeg_counts_list = lapply(counts_list, function(x){return(x[['CXCLneg tumor & CXCLneg stroma']])})
names(hubNeg_counts_list) = paste0(names(counts_list), '_hubNeg')

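# Combine them back into a single list and standardize columns
counts_list = c(hubPos_counts_list, hubNeg_counts_list)
# Drop entries for samples that lack one of the interface types (NULL after the lookup above)
counts_list = purrr::compact(counts_list)
counts_list = standardize_matrix_columns(counts_list)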

## Prepare a table of cell counts in spatial bins

In [None]:
allCounts = lapply(counts_list, function(x) x %>%
        as.data.frame() %>%
        tibble::rownames_to_column('bin')) %>%
    rbindlist(idcol = 'sample_interface') %>%
    # Split 'SampleID_hubPos' / 'SampleID_hubNeg' at the last underscore so that
    # sample IDs containing underscores are kept intact
    mutate(sample = sub('_[^_]*$', '', sample_interface)) %>%
    mutate(interface = sub('.*_', '', sample_interface)) %>%
    mutate(interface = interface %>% as.factor %>% fct_recode('Hub-inside' = 'hubPos', 'Hub-outside' = 'hubNeg')) %>%
    select(!sample_interface) %>%
    mutate(midpoint = unlist(lapply(bin, find_midpoint)))
allCounts$total_counts_per_bin = allCounts %>% select(!c(bin, sample, interface, midpoint)) %>% rowSums
# all_of() makes explicit that the column names come from an external character vector
allCounts$total_TNKILC_per_bin = allCounts %>% select(all_of(lineage_list[['TNKILC']])) %>% rowSums
allCounts$interface %>% unique
allCounts %>%
    pivot_longer(cols = !c(bin, sample, interface, midpoint, total_counts_per_bin, total_TNKILC_per_bin)) %>%
    mutate(name = gsub(pattern = 'Epi.*', replacement = 'Epi', x = name)) %>%
    group_by(name, bin, sample, interface, midpoint) %>%
    summarize(value = sum(value)) %>%
    pivot_wider(values_from = value, names_from = name) %>%
    write.csv(., 'figs/counts_in_bins.csv')

In [None]:
allCounts %>% select(!c(bin, sample, interface, midpoint)) %>% names

In [None]:
allCounts %>%
    group_by(midpoint, interface, sample) %>%
    summarize(percent_of_all_cells = 100*`Tcd8-CXCL13`/total_counts_per_bin) %>%
    group_by(midpoint, interface) %>%
    summarize(percent_of_all_cells = mean(percent_of_all_cells)) %>%
    ggplot() +
        geom_line(aes(x = midpoint, y = percent_of_all_cells, color = interface)) +
        facet_wrap(~interface)
allCounts %>%
    group_by(midpoint, interface, sample) %>%
    summarize(percent_of_TNKILC = 100*`Tcd8-CXCL13`/total_TNKILC_per_bin) %>%
    group_by(midpoint, interface) %>%
    summarize(percent_of_TNKILC = mean(percent_of_TNKILC)) %>%
    ggplot() +
        geom_line(aes(x = midpoint, y = percent_of_TNKILC, color = interface)) +
        facet_wrap(~interface)

In [None]:
allCounts %>%
    group_by(midpoint, interface, sample) %>%
    summarize(percent_of_all_cells = 100*`Tcd8-GZMK`/total_counts_per_bin) %>%
    group_by(midpoint, interface) %>%
    summarize(percent_of_all_cells = mean(percent_of_all_cells)) %>%
    ggplot() +
        geom_line(aes(x = midpoint, y = percent_of_all_cells, color = interface)) +
        facet_wrap(~interface)
allCounts %>%
    group_by(midpoint, interface, sample) %>%
    summarize(percent_of_TNKILC = 100*`Tcd8-GZMK`/total_TNKILC_per_bin) %>%
    group_by(midpoint, interface) %>%
    summarize(percent_of_TNKILC = mean(percent_of_TNKILC)) %>%
    ggplot() +
        geom_line(aes(x = midpoint, y = percent_of_TNKILC, color = interface)) +
        facet_wrap(~interface)

In [None]:
interface_plot = function(counts, .types, est_model=c('binomial', 'poisson', 'mle')) {
    est_model <- match.arg(est_model)
    df = empirical_bayes_summary(
        rowSums(counts[, .types, drop = FALSE]),
        rowSums(counts),
        rownames(counts),
        est_model
    ) 

    ## get max y value for plotting 
    ymax = 100 * max(df$estimate + 1.96 * sqrt(df$variance))
    
    p1 = ggplot(df, aes(dist_bin, 100 * estimate)) + 
        geom_vline(xintercept = 20.5, linewidth = 2, linetype = 1, color = 'grey') + 
        geom_point(aes(size = size)) + 
        geom_errorbar(aes(ymin = 100 * (estimate - 1.96 * sqrt(variance)), ymax = 100 * (estimate + 1.96 * sqrt(variance))), width = 0) + 
        geom_hline(yintercept = 0) + 
        geom_line(data = . %>% dplyr::mutate(dist_bin = as.numeric(dist_bin))) + 
        theme_bw(base_size = 16) + 
        theme(axis.text.x = element_text(angle = 90, hjust = 1)) + 
        labs(y = '% of all cells', x = 'Distance Window', size = '# Cells', subtitle = 'mean & 95% CI, *padj<0.01', title = paste(.types, collapse = '; ')) + 
        geom_text(aes(y = 100 * (estimate + 1.96 * sqrt(variance)), label = asterisk), size = 6, vjust = 0) + 
        annotate("text", x = 0.5, y = ymax + .05, label = 'Stromal Side', hjust = 0, size = 6) + 
        annotate("text", x = 40.5, y = ymax + .05, label = 'Epithelial Side', hjust = 1, size = 6) + 
        NULL
    return(p1)
}

## per patient plots

In [None]:
require(patchwork)
pdf('figs/per_patient_plots_TNKILC_as_prop_of_all_cells.pdf', height = 18, width = 32)
for (state in lineage_list[['TNKILC']]) {
    # grep() matches by pattern, so a state name also selects any column whose name contains it
    .types = grep(state, colnames(counts_list$C110_hubPos), value = TRUE) #grep('PD1', colnames(counts_list$C110_hubPos), value = TRUE)
    # One panel per sample for hub-positive interfaces
    p1 = imap(hubPos_counts_list, function(counts, .id) {
        interface_plot(counts, .types, 'binomial') + labs(title = glue('{.id} (Hub+)'))
    }) %>% wrap_plots() + plot_annotation(title = paste0(.types, collapse = ', '),
                                          theme = theme(plot.title = element_text(size = 20, face = "bold", color = "darkblue")))
    # One panel per sample for hub-negative interfaces
    p2 = imap(hubNeg_counts_list, function(counts, .id) {
        interface_plot(counts, .types, 'binomial') + labs(title = glue('{.id} (Hub-)'))
    }) %>% wrap_plots() + plot_annotation(title = paste0(.types, collapse = ', '),
                                          theme = theme(plot.title = element_text(size = 20, face = "bold", color = "darkblue")))
    print(p1)
    print(p2)
}
dev.off()

In [None]:
.types = grep('PD1', colnames(counts_list$C110_hubPos), value = TRUE)
.types

In [None]:
fig.size(18, 32)
require(patchwork)
# imap(counts_list[mmr_map[MMRstatus == 'MMRp']$SampleID], function(counts, .id) {    
imap(hubPos_counts_list, function(counts, .id) {    
    interface_plot(counts, .types, 'binomial') + labs(title = glue('{.id} (Hub+)'))    
}) %>% wrap_plots() + plot_annotation(title = paste0(.types, collapse =  ', ')) 

In [None]:
fig.size(18, 32)
require(patchwork)
# imap(counts_list[mmr_map[MMRstatus == 'MMRp']$SampleID], function(counts, .id) {    
imap(hubNeg_counts_list, function(counts, .id) {    
    interface_plot(counts, .types, 'binomial') + labs(title = glue('{.id} (Hub-)'))    
}) %>% wrap_plots() + plot_annotation(title = paste0(.types, collapse =  ', '))

## 6. Global Analysis Across All Cell Types

Now we run the main analysis function, `run_global_hub_analysis`. This function iterates through every cell type, performs the meta-analysis comparing hub-positive and hub-negative interfaces, calculates statistics, and returns a set of clean data frames ready for plotting.
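
`run_global_hub_analysis` is defined earlier in the notebook and is not reproduced here. The plot subtitles later in this section describe the pooling as an "IVW meta-analysis", so the following is a generic sketch of fixed-effect inverse-variance-weighted pooling for a single cell type in a single distance bin. The helper name `ivw_pool` and the input numbers are assumptions for illustration, not the notebook's implementation.

```r
# Generic fixed-effect inverse-variance-weighted (IVW) pooling.
# `ivw_pool` is a hypothetical helper, not a function from this notebook.
ivw_pool <- function(estimates, variances) {
    stopifnot(length(estimates) == length(variances), all(variances > 0))
    w <- 1 / variances                      # precise samples get larger weights
    pooled <- sum(w * estimates) / sum(w)   # weighted mean of per-sample estimates
    pooled_var <- 1 / sum(w)                # variance of the pooled estimate
    c(estimate = pooled, variance = pooled_var)
}

# Made-up per-sample proportions and variances for one bin
est <- c(0.10, 0.14, 0.08)
v <- c(0.0004, 0.0016, 0.0004)
res <- ivw_pool(est, v)
print(res)
```

Samples with a smaller variance (typically those with more cells in the bin) pull the pooled estimate toward their own value, which is the main reason to prefer this over a plain average of per-sample proportions.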

In [None]:
# Create a list of cell types to iterate over
cellTypes = cells %>% 
    select(type_lvl2, type_lvl3) %>% 
    filter(!type_lvl2 %in% c('Plasma',  'Mast')) %>% # 'Epi',
    distinct
type_list <- lapply(split(cellTypes$type_lvl2, cellTypes$type_lvl3), unique)

# Run the full analysis
final_results <- run_global_hub_analysis(type_list, counts_list)

# Show a glimpse of the main summary table
glimpse(final_results$summary_stats)

## 7. Visualization

In this final section, we generate plots to visualize the results. We create faceted plots that group cell types by their major lineage (e.g., T-cells, Myeloid cells) to compare their distribution profiles between hub-positive and hub-negative interfaces.

In [None]:
MSS_results = fread('input_data/MSS_results.csv')  # THIS COMES FROM THE OUTPUT OF A PREVIOUS NOTEBOOK Step1_MSI_vs_MSS_interfaces.ipynb - RUN THAT FIRST!
head(MSS_results)

In [None]:
# Prepare data for plotting by combining hubPos, hubNeg, and MSS results (MSS restricted to the cell types analyzed here)
df = bind_rows(list(hubPos = final_results$hubPos_results, 
                    hubNeg = final_results$hubNeg_results,
                    MSS = MSS_results %>% filter(cell_type %in% unique(c(final_results$hubPos_results$cell_type, final_results$hubNeg_results$cell_type)))
                   ), .id = 'Status') 

# Calculate y-axis limits for plotting
df <- df %>%
    group_by(cell_type) %>%
    mutate(ymax = 100 * max(estimate + 1.96 * sqrt(variance))) %>%
    ungroup



In [None]:
head(df)

## Supplementary Table 4D: Spatial patterning of all cell states, expressed as a proportion of all cells in each spatial bin

In [None]:
df %>% 
rename(Interface = Status) %>%
mutate(Interface = Interface %>% as.factor %>% fct_recode('Hub-inside' = 'hubPos', 'Hub-outside' = 'hubNeg', 'MMRp' = 'MSS')) %>%
mutate(estimate = 100*estimate) %>%
select(!c(asterisk, ymax)) %>%
# estimate is now in percent, so the CI half-width is scaled by 100 too (the variance column stays on the proportion scale)
mutate(`Lower Confidence Limit` = estimate - 100 * 1.96 * sqrt(variance)) %>%
mutate(`Upper Confidence Limit` = estimate + 100 * 1.96 * sqrt(variance)) %>%
rename(`Adjusted p-value` = padj) %>%
rename(`Spatial bin around the interface` = dist_bin) %>%
# p-values are set to NA for the MMRp rows
mutate(p = ifelse(test = Interface == 'MMRp', yes = NA, no = p)) %>%
mutate(`Adjusted p-value` = ifelse(test = Interface == 'MMRp', yes = NA, no = `Adjusted p-value`)) %>%
write.csv('figs/cellstates/table_of_cell_states_as_prop_of_all_cells.csv')
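
As a short worked example of how a 95% interval on the proportion scale converts to percent, the sketch below uses made-up numbers for one bin. The point is that both the estimate and its standard error are multiplied by 100, so the percent-scale limits are simply 100 times the proportion-scale limits.

```r
# Made-up estimate and variance for one bin, both on the proportion scale
estimate <- 0.12
variance <- 0.0004          # standard error = sqrt(variance) = 0.02

se <- sqrt(variance)
ci_prop <- c(lower = estimate - 1.96 * se,
             upper = estimate + 1.96 * se)

# Converting to percent scales the estimate and the half-width together
ci_pct <- 100 * ci_prop
print(ci_pct)
```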


## Plot all lineages on a single page for supplementary figure

In [None]:
# 1. Define and create the list first
cell_type_list <- list(#,
    TNKILC = lineage_list[['TNKILC']], #c("Tcd8-gdlike", "Tcd8-gdlike-PD1", "Tcd8-GZMK", "Tplzf-gdlike", "Tcd4-CXCL13", "Tcd4-TFH", "Tcd4-IL7R", "NK-CD16", "NK-XCL1", "ILC3"), # "Tcd8-CXCL13", "Tcd8-HOBIT",  "Tcd4-Treg",
    Myeloid = lineage_list[['Myeloid']], #c("Myeloid-Macro-MMP9-APOE", "Myeloid-Macro",
    #             "Myeloid-Mono-VEGFA", "Myeloid-inflamm", "Myeloid-Mono-S100-VCAN",
    #             "Myeloid-Mono-CSF1R", "Myeloid-DC-pDC_ASDC", "Myeloid-Macro-C1Q", "Myeloid-Macro-SEPP1-LYVE1",
    #              "Myeloid-Granulo", "Myeloid-DC1",
    #             "Myeloid-DCmreg", "Myeloid-DC2", "Myeloid-DC2-C1Q", "Myeloid-ISG-high"), # "Myeloid-ISG",
    Strom = lineage_list[['Strom']], #c("Fibro-BMP", "Fibro-CCL2", "Fibro-StemNiche", "Fibro-MMP3", "Fibro-CXCL14", "Fibro-GREM1", "Fibro-myo", 
            #  "SmoothMuscle", "Pericyte", "Endo-art", "Endo-cap", "Endo", "Endo-tip", "Endo-ven", "Endo-lymph", "Schwann"),
    B = lineage_list[['B']]
#   Plasma = lineage_list[['Plasma']],
#   Mast = lineage_list[['Mast']]
)

# 2. Now, pipe the created list into the other functions
order_of_cell_types <- cell_type_list %>%
    unlist() %>%
    str_wrap(string = ., width = 10, whitespace_only = FALSE)

# 3. View the final output
order_of_cell_types

In [None]:
midpoints_of_bins = final_results$summary_stats$dist_bin %>%
    unique %>%
    lapply(., find_midpoint) %>%
    unlist()
names(midpoints_of_bins) = unique(final_results$summary_stats$dist_bin)
midpoints_of_bins <- setNames(names(midpoints_of_bins), midpoints_of_bins)
print(midpoints_of_bins)
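
The lookup built above maps each interval label to its numeric midpoint. `find_midpoint` is defined earlier in the notebook; the sketch below uses a hypothetical stand-in, `interval_midpoint`, to show one way such a label can be parsed and how the named vector is oriented for `fct_recode` (names are the new labels, values are the old ones).

```r
# Hypothetical stand-in for the notebook's `find_midpoint`
interval_midpoint <- function(label) {
    # Strip brackets, then split on the comma: "(-25,-20]" -> c("-25", "-20")
    bounds <- strsplit(gsub('\\[|\\]|\\(|\\)', '', label), ',')[[1]]
    mean(as.numeric(bounds))
}

bins <- c('[-100,-95]', '(-25,-20]', '(0,5]', '(95,100]')
mids <- vapply(bins, interval_midpoint, numeric(1))

# Names = midpoints (new labels), values = bin strings (old labels)
lookup <- setNames(names(mids), mids)
print(lookup)
```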

In [None]:
anyNA(df$dist_bin)
anyNA(final_results$summary_stats$dist_bin)

In [None]:
df %>%
    mutate(dist_bin = factor(dist_bin)) %>%
    mutate(midpoint = forcats::fct_recode(dist_bin, !!!midpoints_of_bins)) %>%
    mutate(midpoint = as.vector(midpoint)) %>%
    mutate(midpoint = as.numeric(midpoint)) %>%
    filter(is.na(midpoint))

In [None]:
final_results$summary_stats = final_results$summary_stats %>%
    mutate(dist_bin = factor(dist_bin)) %>%
    mutate(midpoint = forcats::fct_recode(dist_bin, !!!midpoints_of_bins)) %>%
    mutate(midpoint = as.vector(midpoint)) %>%
    mutate(midpoint = as.numeric(midpoint))

df = df %>%
    mutate(dist_bin = factor(dist_bin)) %>%
    mutate(midpoint = forcats::fct_recode(dist_bin, !!!midpoints_of_bins)) %>%
    mutate(midpoint = as.vector(midpoint)) %>%
    mutate(midpoint = as.numeric(midpoint))

final_results$summary_stats %>%
    select(dist_bin, midpoint) %>%
    distinct

In [None]:
final_results$summary_stats$cell_type %>% unique %>% str_wrap(string = ., width = 10, whitespace_only = FALSE)

In [None]:
df %>% head

In [None]:
final_results$summary_stats = final_results$summary_stats %>% left_join(., cells %>% select(type_lvl1, type_lvl3) %>% distinct %>% rename(cell_type = type_lvl3))
head(final_results$summary_stats)

In [None]:
df = df %>% left_join(., cells %>% select(type_lvl1, type_lvl3) %>% distinct %>% rename(cell_type = type_lvl3))

## Supplementary Figure 4E

In [None]:
cell_states_with_more_than_2_signif_bins = df %>% 
    mutate(cell_type = str_wrap(string = cell_type, width = 10, whitespace_only = FALSE)) %>%
    filter(cell_type %in% order_of_cell_types) %>%
    mutate(cell_type = factor(cell_type, ordered = TRUE, levels = order_of_cell_types)) %>%
    mutate(dist_bin = fct_reorder(dist_bin, midpoint)) %>% 
    group_by(cell_type) %>%
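    summarize(n_signif = sum(asterisk != '', na.rm = TRUE)) %>%
    # NOTE: the significance filters below are disabled, so every cell state in order_of_cell_types is returned
    #filter(n_signif >= 2) %>%
    #filter(!cell_type %in% c("Tcd8-CXCL13", "Tcd8-GZMK", "Tcd4-CXCL13", "Tcd4-Treg", 'Myeloid-ISG', 'DCmreg', 'Tcd8-\ngdlike-PD1')) %>%
    pull(cell_type)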
cell_states_with_more_than_2_signif_bins

In [None]:
fig.size(h = 11, w = 8)
options(repr.plot.res = 400)
require(ggh4x)
require(ggragged)
manual_breaks <- c(
    "[-100,-95]", "(-75,-70]", "(-50,-45]", "(-25,-20]",
    "(0,5]", "(25,30]", "(50,55]", "(75,80]", "(95,100]"
)

# Prepare the data for the geom_text layers beforehand for clarity
text_data_asterisk <- final_results$summary_stats %>% 
    #filter(cell_type %in% cell_states_with_more_than_2_signif_bins) %>%
    na.omit() %>%
    mutate(cell_type = str_wrap(string = cell_type, width = 10, whitespace_only = FALSE)) %>%
    filter(cell_type %in% order_of_cell_types) %>%
    mutate(cell_type = factor(cell_type, ordered = TRUE, levels = order_of_cell_types)) %>%
    mutate(dist_bin = fct_reorder(dist_bin, midpoint)) 

text_data_epi <- df %>% 
    #filter(cell_type %in% cell_states_with_more_than_2_signif_bins) %>%
    na.omit() %>%
    mutate(cell_type = str_wrap(string = cell_type, width = 10, whitespace_only = FALSE)) %>%
    filter(cell_type %in% order_of_cell_types) %>%
    mutate(cell_type = factor(cell_type, ordered = TRUE, levels = order_of_cell_types)) %>%
    select(cell_type, ymax) %>% 
    distinct() %>% 
    mutate(ymax = ymax + ymax/3, x = 32, label = '  Epi') #'Epi')

text_data_stroma <- df %>% 
    #filter(cell_type %in% cell_states_with_more_than_2_signif_bins) %>%
    na.omit() %>%
    mutate(cell_type = str_wrap(string = cell_type, width = 10, whitespace_only = FALSE)) %>%
    filter(cell_type %in% order_of_cell_types) %>%
    mutate(cell_type = factor(cell_type, ordered = TRUE, levels = order_of_cell_types)) %>% 
    select(cell_type, ymax) %>% 
    distinct() %>% 
    mutate(ymax = ymax + ymax/3, x = 8, label = '    Stroma') #    Stroma')
# ROBUST METHOD: Calculate the intercept based on the factor levels
levels_list <- levels(fct_reorder(df$dist_bin, df$midpoint))
zero_crossing_index <- which(levels_list == "(0,5]") - 0.5 
# If "(0,5]" isn't the exact string, adjust to match your first positive bin level.
# Alternatively, if you want it exactly between the 4th and 5th bin:
# zero_crossing_index <- 4.5
# Create the plot
supp_fig = df %>% 
    #filter(cell_type %in% cell_states_with_more_than_2_signif_bins) %>%
    mutate(cell_type = str_wrap(string = cell_type, width = 10, whitespace_only = FALSE)) %>%
    filter(cell_type %in% order_of_cell_types) %>%
    mutate(cell_type = factor(cell_type, ordered = TRUE, levels = order_of_cell_types)) %>%
    mutate(dist_bin = fct_reorder(dist_bin, midpoint)) %>%
ggplot(
    data = ., 
    aes(dist_bin, 100 * estimate, color = Status, fill = Status)
) + 
    # Place the interface marker using its numeric position on the discrete axis
    geom_vline(xintercept = zero_crossing_index, color = 'red', linetype = "dashed", linewidth = 0.5) +
    geom_line(aes(group = Status), alpha = 1, key_glyph = 'point', linewidth = 0.25) +
    #geom_ribbon(aes(group = Status, 
    #                ymin = 100 * (estimate - 1.96 * sqrt(variance)), ymax = 100 * (estimate + 1.96 * sqrt(variance))), show.legend = FALSE, alpha = 0.5, color = NA) + 
    geom_point(shape = '.') + 
    geom_errorbar(aes(ymin = 100 * (estimate - 1.96 * sqrt(variance)), ymax = 100 * (estimate + 1.96 * sqrt(variance))), show.legend = FALSE, alpha = 0.5, linewidth = 0.25) + 
    labs(y = 'Percent of all cells', x = 'Distance from the interface (\U03BCm)', size = '# Cells', subtitle = 'IVW meta-analysis; mean & 95% CI, *padj<0.01') +     
    geom_text(inherit.aes = FALSE,
        data = text_data_asterisk,
        aes(dist_bin, y = 100 * height, label = asterisk), 
        size = 1.5, vjust = .2, show.legend = FALSE, color = 'black'
    ) + 
    geom_text(inherit.aes = FALSE,
        data = text_data_epi, 
        aes(label = label, y = ymax, x = x), 
        color = 'black', size = 2
    ) +
    geom_text(inherit.aes = FALSE,
        data = text_data_stroma, 
        aes(label = label, y = ymax, x = x), 
        color = 'black', size = 2
    ) +
    scale_color_manual(
        name = 'Interface Type: ', 
        values = c('hubPos' = '#D55E00', 'hubNeg' = '#009E73', 'MSS' = 'grey'), 
        labels = c('hubPos' = 'Hub-inside MMRd', 'hubNeg' = 'Hub-outside MMRd', 'MSS' = 'MMRp')
    ) + 
    scale_fill_manual(
        name = 'Interface Type: ', 
        values = c('hubPos' = '#D55E00', 'hubNeg' = '#009E73', 'MSS' = 'grey'), 
        labels = c('hubPos' = 'Hub-inside MMRd', 'hubNeg' = 'Hub-outside MMRd', 'MSS' = 'MMRp')
    ) + 
    guides(color = guide_legend(override.aes = list(size = 2, shape = 16))) + 
    facet_wrap2(~cell_type, scales = 'free_y', axes = "all", remove_labels = "x") +
    cowplot::theme_half_open(7) + 
    theme(
        panel.spacing = unit(0, "cm"), 
        axis.text.x = element_text(angle = 90, hjust = 1, size = 7),
        #aspect.ratio = 0.5, 
        strip.background = element_rect(fill = NA), 
        strip.text = element_text(size = 7, color = 'black'), # face = 'bold', 
        title = element_text(size = 7), 
        legend.position = 'top', 
        legend.text = element_text(size = 7)
    ) +
    # nrow is an argument of guide_legend(), not an aesthetic, so it goes outside override.aes
    guides(fill = guide_legend(nrow = 1, override.aes = list(shape = 16)),
           color = guide_legend(nrow = 1, override.aes = list(shape = 16))) +
    scale_x_discrete(breaks = manual_breaks) +
    NULL

supp_fig
ggsave(filename = 'figs/supplementary_fig_line_traces_all_states.pdf', 
       plot = supp_fig, width = 8, height = 11, units = 'in')
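
The dashed interface line in the figure is positioned by index rather than by label, because levels on a discrete axis sit at integer positions 1, 2, 3, and so on. The base-R sketch below illustrates the arithmetic with a shortened, made-up set of bin labels; it uses `match()` instead of `which()` so that a missing label yields `NA` and can be checked explicitly.

```r
# Shortened, made-up set of bin labels in axis order
levels_list <- c('(-15,-10]', '(-10,-5]', '(-5,0]', '(0,5]', '(5,10]')

# Position of the first positive bin on the discrete axis
first_positive <- match('(0,5]', levels_list)
if (is.na(first_positive)) stop("first positive bin label not found in levels")

# Halfway between the last negative bin and the first positive bin
zero_crossing_index <- first_positive - 0.5
print(zero_crossing_index)
```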

## Supp fig - TNKILC

In [None]:
options(repr.plot.res = 400)
require(ggh4x)
require(ggragged)
manual_breaks <- c(
    "[-100,-95]", "(-75,-70]", "(-50,-45]", "(-25,-20]",
    "(0,5]", "(25,30]", "(50,55]", "(75,80]", "(95,100]"
)
cell_states_with_more_than_2_signif_bins = lineage_list[['TNKILC']] 
cell_states_with_more_than_2_signif_bins = cell_states_with_more_than_2_signif_bins[!cell_states_with_more_than_2_signif_bins %in% c('Tcd8-CXCL13', 'Tcd8-GZMK', 'Tcd4-CXCL13', 
                                                                                                                                     'Tcd4-Treg', 'Tplzf-gdlike', 'Tcd8-gdlike-PD1')]                                            
cell_states_with_more_than_2_signif_bins = cell_states_with_more_than_2_signif_bins %>% str_wrap(width = 15, whitespace_only = FALSE) %>% sort %>% rev
# Prepare the data for the geom_text layers beforehand for clarity
text_data_asterisk <- final_results$summary_stats %>% 
    mutate(cell_type = str_wrap(string = cell_type, width = 15, whitespace_only = FALSE)) %>%
    filter(cell_type %in% cell_states_with_more_than_2_signif_bins) %>%
    na.omit() %>%
    mutate(cell_type = factor(cell_type, ordered = FALSE, levels = cell_states_with_more_than_2_signif_bins)) %>%
    mutate(dist_bin = fct_reorder(dist_bin, midpoint)) 

text_data_epi <- df %>% 
    mutate(cell_type = str_wrap(string = cell_type, width = 15, whitespace_only = FALSE)) %>%
    filter(cell_type %in% cell_states_with_more_than_2_signif_bins) %>%
    na.omit() %>%
    mutate(cell_type = factor(cell_type, ordered = FALSE, levels = cell_states_with_more_than_2_signif_bins)) %>%
    select(cell_type, ymax) %>% 
    distinct() %>% 
    mutate(ymax = ymax + ymax/3, x = 32, label = '  Epi') #'Epi')

text_data_stroma <- df %>% 
    mutate(cell_type = str_wrap(string = cell_type, width = 15, whitespace_only = FALSE)) %>%
    filter(cell_type %in% cell_states_with_more_than_2_signif_bins) %>%
    na.omit() %>%
    #mutate(cell_type = factor(cell_type, ordered = TRUE, levels = cell_states_with_more_than_2_signif_bins)) %>% 
    select(cell_type, ymax) %>% 
    distinct() %>% 
    mutate(ymax = ymax + ymax/3, x = 8, label = '    Stroma') #    Stroma')
# ROBUST METHOD: Calculate the intercept based on the factor levels
levels_list <- levels(fct_reorder(df$dist_bin, df$midpoint))
zero_crossing_index <- which(levels_list == "(0,5]") - 0.5 
# If "(0,5]" isn't the exact string, adjust to match your first positive bin level.
# Alternatively, if you want it exactly between the 4th and 5th bin:
# zero_crossing_index <- 4.5
# Create the plot
supp_fig_TNKILC = df %>% 
    mutate(cell_type = str_wrap(string = cell_type, width = 15, whitespace_only = FALSE)) %>%
    filter(cell_type %in% cell_states_with_more_than_2_signif_bins) %>%
    mutate(cell_type = factor(cell_type, ordered = FALSE, levels = cell_states_with_more_than_2_signif_bins)) %>%
    mutate(dist_bin = fct_reorder(dist_bin, midpoint)) %>%
ggplot(
    data = ., 
    aes(dist_bin, 100 * estimate, color = Status, fill = Status)
) + 
    # Place the interface marker using its numeric position on the discrete axis
    geom_vline(xintercept = zero_crossing_index, color = 'red', linetype = "dashed", linewidth = 0.5) +
    geom_line(aes(group = Status), alpha = 1, key_glyph = 'point', linewidth = 0.25) +
    #geom_ribbon(aes(group = Status, 
    #                ymin = 100 * (estimate - 1.96 * sqrt(variance)), ymax = 100 * (estimate + 1.96 * sqrt(variance))), show.legend = FALSE, alpha = 0.5, color = NA) + 
    geom_point(shape = '.') + 
    geom_errorbar(aes(ymin = 100 * (estimate - 1.96 * sqrt(variance)), ymax = 100 * (estimate + 1.96 * sqrt(variance))), show.legend = FALSE, alpha = 0.5, linewidth = 0.25) + 
    labs(y = 'Percent of all cells', x = 'Distance from the interface (\U03BCm)', size = '# Cells', subtitle = 'IVW meta-analysis; mean & 95% CI, *padj<0.01') +     
    geom_text(inherit.aes = FALSE,
        data = text_data_asterisk,
        aes(dist_bin, y = 100 * height, label = asterisk), 
        size = 1.5, vjust = .2, show.legend = FALSE, color = 'black'
    ) + 
    geom_text(inherit.aes = FALSE,
        data = text_data_epi, 
        aes(label = label, y = ymax, x = x), 
        color = 'black', size = 2
    ) +
    geom_text(inherit.aes = FALSE,
        data = text_data_stroma, 
        aes(label = label, y = ymax, x = x), 
        color = 'black', size = 2
    ) +
    scale_color_manual(
        name = 'Interface Type: ', 
        values = c('hubPos' = '#D55E00', 'hubNeg' = '#009E73', 'MSS' = 'grey'), 
        labels = c('hubPos' = 'Hub-inside MMRd', 'hubNeg' = 'Hub-outside MMRd', 'MSS' = 'MMRp')
    ) + 
    scale_fill_manual(
        name = 'Interface Type: ', 
        values = c('hubPos' = '#D55E00', 'hubNeg' = '#009E73', 'MSS' = 'grey'), 
        labels = c('hubPos' = 'Hub-inside MMRd', 'hubNeg' = 'Hub-outside MMRd', 'MSS' = 'MMRp')
    ) + 
    guides(color = guide_legend(override.aes = list(size = 2, shape = 16))) + 
    facet_wrap2(~cell_type, scales = 'free_y', axes = "all", remove_labels = "x", nrow = 2) +
    cowplot::theme_half_open(7) + 
    theme(
        panel.spacing = unit(0, "cm"), 
        axis.text.x = element_text(angle = 90, hjust = 1, size = 7),
        aspect.ratio = 0.5, 
        strip.background = element_rect(fill = NA), 
        strip.text = element_text(size = 7, color = 'black'), # face = 'bold', 
        title = element_text(size = 7), 
        legend.position = 'top', 
        legend.text = element_text(size = 7)
    ) +
    # nrow is an argument of guide_legend(), not an aesthetic, so it goes outside override.aes
    guides(fill = guide_legend(nrow = 1, override.aes = list(shape = 16)),
           color = guide_legend(nrow = 1, override.aes = list(shape = 16))) +
    scale_x_discrete(breaks = manual_breaks) +
    ggtitle('TNKILC lineage') +
    NULL
fig.size(h = 3, w = 6.5)
supp_fig_TNKILC

ggsave(filename = 'figs/supplementary_fig_line_traces_TNKILC.pdf', 
       plot = supp_fig_TNKILC, width = 6.5, height = 3, units = 'in')

## Supp fig - Myeloid

In [None]:
cell_states_with_more_than_2_signif_bins

final_results$summary_stats %>% 
    mutate(cell_type = str_wrap(string = cell_type, width = 15, whitespace_only = FALSE)) %>%
    filter(cell_type %in% cell_states_with_more_than_2_signif_bins) %>%
    na.omit() %>%
    mutate(cell_type = factor(cell_type, ordered = TRUE, levels = cell_states_with_more_than_2_signif_bins)) %>%
    mutate(dist_bin = fct_reorder(dist_bin, midpoint)) 

In [None]:
fig.size(h = 11, w = 8)
options(repr.plot.res = 400)
require(ggh4x)
require(ggragged)
manual_breaks <- c(
    "[-100,-95]", "(-75,-70]", "(-50,-45]", "(-25,-20]",
    "(0,5]", "(25,30]", "(50,55]", "(75,80]", "(95,100]"
)
cell_states_with_more_than_2_signif_bins = c(lineage_list[['Myeloid']]) %>% str_wrap(string = ., width = 20, whitespace_only = FALSE)
cell_states_with_more_than_2_signif_bins = cell_states_with_more_than_2_signif_bins[!cell_states_with_more_than_2_signif_bins %in% c('Myeloid-ISG', 'Myeloid-DCmreg')]                                            
# Prepare the data for the geom_text layers beforehand for clarity
text_data_asterisk <- final_results$summary_stats %>% 
    mutate(cell_type = str_wrap(string = cell_type, width = 20, whitespace_only = FALSE)) %>%
    filter(cell_type %in% cell_states_with_more_than_2_signif_bins) %>%
    na.omit() %>%
    mutate(cell_type = factor(cell_type, ordered = TRUE, levels = cell_states_with_more_than_2_signif_bins)) %>%
    mutate(dist_bin = fct_reorder(dist_bin, midpoint)) 

text_data_epi <- df %>% 
    mutate(cell_type = str_wrap(string = cell_type, width = 20, whitespace_only = FALSE)) %>%
    filter(cell_type %in% cell_states_with_more_than_2_signif_bins) %>%
    na.omit() %>%
    mutate(cell_type = factor(cell_type, ordered = TRUE, levels = cell_states_with_more_than_2_signif_bins)) %>%
    select(cell_type, ymax) %>% 
    distinct() %>% 
    mutate(ymax = ymax + ymax/3, x = 32, label = '  \nEpi tiles') #'Epi')

text_data_stroma <- df %>% 
    mutate(cell_type = str_wrap(string = cell_type, width = 20, whitespace_only = FALSE)) %>%
    filter(cell_type %in% cell_states_with_more_than_2_signif_bins) %>%
    na.omit() %>%
    select(cell_type, ymax) %>% 
    distinct() %>% 
    mutate(ymax = ymax + ymax/3, x = 8, label = '    \nStromal tiles') #    Stroma')
# ROBUST METHOD: Calculate the intercept based on the factor levels
levels_list <- levels(fct_reorder(df$dist_bin, df$midpoint))
zero_crossing_index <- which(levels_list == "(0,5]") - 0.5 
# If "(0,5]" isn't the exact string, adjust to match your first positive bin level.
# Alternatively, if you want it exactly between the 4th and 5th bin:
# zero_crossing_index <- 4.5
# Create the plot
supp_fig_myeloid = df %>% 
    mutate(cell_type = str_wrap(string = cell_type, width = 20, whitespace_only = FALSE)) %>%
    filter(cell_type %in% cell_states_with_more_than_2_signif_bins) %>%
    #filter(cell_type %in% order_of_cell_types) %>%
    mutate(cell_type = factor(cell_type, ordered = TRUE, levels = cell_states_with_more_than_2_signif_bins)) %>%
    mutate(dist_bin = fct_reorder(dist_bin, midpoint)) %>%
ggplot(
    data = ., 
    aes(dist_bin, 100 * estimate, color = Status, fill = Status)
) + 
    # Place the interface marker using its numeric position on the discrete axis
    geom_vline(xintercept = zero_crossing_index, color = 'red', linetype = "dashed", linewidth = 0.5) +
    geom_line(aes(group = Status), alpha = 1, key_glyph = 'point', linewidth = 0.25) +
    #geom_ribbon(aes(group = Status, 
    #                ymin = 100 * (estimate - 1.96 * sqrt(variance)), ymax = 100 * (estimate + 1.96 * sqrt(variance))), show.legend = FALSE, alpha = 0.5, color = NA) + 
    geom_point(shape = '.') + 
    geom_errorbar(aes(ymin = 100 * (estimate - 1.96 * sqrt(variance)), ymax = 100 * (estimate + 1.96 * sqrt(variance))), show.legend = FALSE, alpha = 0.5, linewidth = 0.25) + 
    labs(y = 'Percent of all cells', x = 'Distance from the interface (\U03BCm)', size = '# Cells', subtitle = 'IVW meta-analysis; mean & 95% CI, *padj<0.01') +     
    geom_text(inherit.aes = FALSE,
        data = text_data_asterisk,
        aes(dist_bin, y = 100 * height, label = asterisk), 
        size = 1.5, vjust = .2, show.legend = FALSE, color = 'black'
    ) + 
    geom_text(inherit.aes = FALSE,
        data = text_data_epi, 
        aes(label = label, y = ymax, x = x), 
        color = 'black', size = 2
    ) +
    geom_text(inherit.aes = FALSE,
        data = text_data_stroma, 
        aes(label = label, y = ymax, x = x), 
        color = 'black', size = 2
    ) +
    scale_color_manual(
        name = 'Interface Type: ', 
        values = c('hubPos' = '#D55E00', 'hubNeg' = '#009E73', 'MSS' = 'grey'), 
        labels = c('hubPos' = 'Hub-inside MMRd', 'hubNeg' = 'Hub-outside MMRd', 'MSS' = 'MMRp')
    ) + 
    scale_fill_manual(
        name = 'Interface Type: ', 
        values = c('hubPos' = '#D55E00', 'hubNeg' = '#009E73', 'MSS' = 'grey'), 
        labels = c('hubPos' = 'Hub-inside MMRd', 'hubNeg' = 'Hub-outside MMRd', 'MSS' = 'MMRp')
    ) + 
    guides(color = guide_legend(override.aes = list(size = 2, shape = 16))) + 
    facet_wrap2(~cell_type, scales = 'free_y', axes = "all", remove_labels = "x", ncol = 4) +
    cowplot::theme_half_open(7) + 
    theme(
        panel.spacing = unit(0, "cm"), 
        axis.text.x = element_text(angle = 90, hjust = 1, size = 7),
        #aspect.ratio = 0.5, 
        strip.background = element_rect(fill = NA), 
        strip.text = element_text(size = 7, color = 'black'), # face = 'bold', 
        title = element_text(size = 7), 
        legend.position = 'top', 
        legend.text = element_text(size = 7)
    ) +
    # nrow is an argument of guide_legend(), not an aesthetic, so it goes outside override.aes
    guides(fill = guide_legend(nrow = 1, override.aes = list(shape = 16)),
           color = guide_legend(nrow = 1, override.aes = list(shape = 16))) +
    scale_x_discrete(breaks = manual_breaks) +
    ggtitle('Myeloid lineage subsets') +
    NULL
fig.size(h = 3.8, w = 6.5)
supp_fig_myeloid
ggsave(filename = 'figs/supplementary_fig_line_traces_Myeloid.pdf', 
       plot = supp_fig_myeloid, width = 6.5, height = 3.8, units = 'in')

## Supp fig - Strom

### Endo

In [None]:
lineage_list[['Strom']]

In [None]:
endo = lineage_list[['Strom']] %>% grep(pattern = 'Endo|Pericyte', value = TRUE)
endo
fibro = lineage_list[['Strom']] %>% grep(pattern = 'Fibro|Smooth|Schwann', value = TRUE)
fibro

In [None]:
fig.size(h = 11, w = 8)
options(repr.plot.res = 400)
require(ggh4x)
require(ggragged)
manual_breaks <- c(
    "[-100,-95]", "(-75,-70]", "(-50,-45]", "(-25,-20]",
    "(0,5]", "(25,30]", "(50,55]", "(75,80]", "(95,100]"
)
cell_states_with_more_than_2_signif_bins = endo %>% str_wrap(string = ., width = 15, whitespace_only = FALSE)

# Prepare the data for the geom_text layers beforehand for clarity
text_data_asterisk <- final_results$summary_stats %>% 
    mutate(cell_type = str_wrap(string = cell_type, width = 15, whitespace_only = FALSE)) %>%
    filter(cell_type %in% cell_states_with_more_than_2_signif_bins) %>%
    na.omit() %>%
    #filter(cell_type %in% order_of_cell_types) %>%  # disabled: order_of_cell_types is wrapped at width 10, cell_type here at width 15
    mutate(cell_type = factor(cell_type, ordered = TRUE, levels = cell_states_with_more_than_2_signif_bins)) %>%
    mutate(dist_bin = fct_reorder(dist_bin, midpoint)) 

text_data_epi <- df %>% 
    mutate(cell_type = str_wrap(string = cell_type, width = 15, whitespace_only = FALSE)) %>%
    filter(cell_type %in% cell_states_with_more_than_2_signif_bins) %>%
    na.omit() %>%
    #filter(cell_type %in% order_of_cell_types) %>%
    mutate(cell_type = factor(cell_type, ordered = TRUE, levels = cell_states_with_more_than_2_signif_bins)) %>%
    select(cell_type, ymax) %>% 
    distinct() %>% 
    mutate(ymax = ymax + ymax/3, x = 32, label = '  Epi') #'Epi')

text_data_stroma <- df %>% 
    mutate(cell_type = str_wrap(string = cell_type, width = 15, whitespace_only = FALSE)) %>%
    filter(cell_type %in% cell_states_with_more_than_2_signif_bins) %>%
    na.omit() %>%
    #mutate(cell_type = factor(cell_type, ordered = TRUE, levels = cell_states_with_more_than_2_signif_bins)) %>% 
    select(cell_type, ymax) %>% 
    distinct() %>% 
    mutate(ymax = ymax + ymax/3, x = 8, label = '    Stroma') #    Stroma')
# ROBUST METHOD: Calculate the intercept based on the factor levels
levels_list <- levels(fct_reorder(df$dist_bin, df$midpoint))
zero_crossing_index <- which(levels_list == "(0,5]") - 0.5 
# If "(0,5]" isn't the exact string, adjust to match your first positive bin level.
# Alternatively, if you want it exactly between the 4th and 5th bin:
# zero_crossing_index <- 4.5
# Create the plot
supp_fig_endo = df %>% 
    mutate(cell_type = str_wrap(string = cell_type, width = 15, whitespace_only = FALSE)) %>%
    filter(cell_type %in% cell_states_with_more_than_2_signif_bins) %>%
    #filter(cell_type %in% order_of_cell_types) %>%
    mutate(cell_type = factor(cell_type, ordered = TRUE, levels = cell_states_with_more_than_2_signif_bins)) %>%
    mutate(dist_bin = fct_reorder(dist_bin, midpoint)) %>%
ggplot(
    data = ., 
    aes(dist_bin, 100 * estimate, color = Status, fill = Status)
) + 
    # Place the interface marker using its numeric position on the discrete axis
    geom_vline(xintercept = zero_crossing_index, color = 'red', linetype = "dashed", linewidth = 0.5) +
    geom_line(aes(group = Status), alpha = 1, key_glyph = 'point', linewidth = 0.25) +
    #geom_ribbon(aes(group = Status, 
    #                ymin = 100 * (estimate - 1.96 * sqrt(variance)), ymax = 100 * (estimate + 1.96 * sqrt(variance))), show.legend = FALSE, alpha = 0.5, color = NA) + 
    geom_point(shape = '.') + 
    geom_errorbar(aes(ymin = 100 * (estimate - 1.96 * sqrt(variance)), ymax = 100 * (estimate + 1.96 * sqrt(variance))), show.legend = FALSE, alpha = 0.5, linewidth = 0.25) + 
    labs(y = 'Percent of all cells', x = 'Distance from the interface (\U03BCm)', size = '# Cells', subtitle = 'IVW meta-analysis; mean & 95% CI, *padj<0.01') +     
    geom_text(inherit.aes = FALSE,
        data = text_data_asterisk,
        aes(dist_bin, y = 100 * height, label = asterisk), 
        size = 1.5, vjust = .2, show.legend = FALSE, color = 'black'
    ) + 
    geom_text(inherit.aes = FALSE,
        data = text_data_epi, 
        aes(label = label, y = ymax, x = x), 
        color = 'black', size = 2
    ) +
    geom_text(inherit.aes = FALSE,
        data = text_data_stroma, 
        aes(label = label, y = ymax, x = x), 
        color = 'black', size = 2
    ) +
    scale_color_manual(
        name = 'Interface Type: ', 
        values = c('hubPos' = '#D55E00', 'hubNeg' = '#009E73', 'MSS' = 'grey'), 
        labels = c('hubPos' = 'Hub-inside MMRd', 'hubNeg' = 'Hub-outside MMRd', 'MSS' = 'MMRp')
    ) + 
    scale_fill_manual(
        name = 'Interface Type: ', 
        values = c('hubPos' = '#D55E00', 'hubNeg' = '#009E73', 'MSS' = 'grey'), 
        labels = c('hubPos' = 'Hub-inside MMRd', 'hubNeg' = 'Hub-outside MMRd', 'MSS' = 'MMRp')
    ) + 
    guides(color = guide_legend(override.aes = list(size = 2, shape = 16))) + 
    facet_wrap2(~cell_type, scales = 'free_y', axes = "all", remove_labels = "x", nrow = 2) +
    cowplot::theme_half_open(7) + 
    theme(
        panel.spacing = unit(0, "cm"), 
        axis.text.x = element_text(angle = 90, hjust = 1, size = 7),
        aspect.ratio = 0.5, 
        strip.background = element_rect(fill = NA), 
        strip.text = element_text(size = 7, color = 'black'), # face = 'bold', 
        title = element_text(size = 7), 
        legend.position = 'top', 
        legend.text = element_text(size = 7)
    ) +
    guides(fill = guide_legend(nrow = 1, override.aes = list(shape = 16))) +
    guides(color = guide_legend(nrow = 1, override.aes = list(size = 2, shape = 16))) +
    scale_x_discrete(breaks = manual_breaks) +
    ggtitle('Endothelial cells') +
    NULL
fig.size(h = 6.5, w = 6.5)
supp_fig_endo
ggsave(filename = 'figs/supplementary_fig_line_traces_Endo.pdf', 
       plot = supp_fig_endo, width = 6.5, height = 6.5, units = 'in')
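
The dashed interface line in these cells is placed at `which(levels_list == "(0,5]") - 0.5`, while the Epi-lineage plots hard-code `20.5`. The base-R sketch below shows why the two agree. It assumes that distances are cut into 5-micron bins spanning -100 to 100 with `include.lowest = TRUE`; this is inferred from the `manual_breaks` labels, not stated in the notebook.

```r
# Assumed binning: 5-micron windows from -100 to 100 (hypothetical, inferred from axis labels)
breaks <- seq(-100, 100, by = 5)
levels_list <- levels(cut(c(-50, 50), breaks = breaks, include.lowest = TRUE))

# A discrete ggplot2 axis places level i at x = i, so the boundary just before
# the first positive bin lies half a step to its left.
first_positive <- which(levels_list == "(0,5]")
stopifnot(length(first_positive) == 1)
zero_crossing_index <- first_positive - 0.5

length(levels_list)    # 40 bins
zero_crossing_index    # 20.5
```

If the bin width or range changes, the computed index follows the new levels, whereas a hard-coded `20.5` would need to be updated by hand.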

## Fibroblasts

In [None]:
fig.size(h = 11, w = 8)
options(repr.plot.res = 400)
require(ggh4x)
require(ggragged)
manual_breaks <- c(
    "[-100,-95]", "(-75,-70]", "(-50,-45]", "(-25,-20]",
    "(0,5]", "(25,30]", "(50,55]", "(75,80]", "(95,100]"
)
cell_states_with_more_than_2_signif_bins = fibro %>% str_wrap(string = ., width = 15, whitespace_only = FALSE)

# Prepare the data for the geom_text layers beforehand for clarity
text_data_asterisk <- final_results$summary_stats %>% 
    mutate(cell_type = str_wrap(string = cell_type, width = 15, whitespace_only = FALSE)) %>%
    filter(cell_type %in% cell_states_with_more_than_2_signif_bins) %>%
    na.omit() %>%
    filter(cell_type %in% order_of_cell_types) %>%
    mutate(cell_type = factor(cell_type, ordered = TRUE, levels = cell_states_with_more_than_2_signif_bins)) %>%
    mutate(dist_bin = fct_reorder(dist_bin, midpoint)) 

text_data_epi <- df %>% 
    mutate(cell_type = str_wrap(string = cell_type, width = 15, whitespace_only = FALSE)) %>%
    filter(cell_type %in% cell_states_with_more_than_2_signif_bins) %>%
    na.omit() %>%
    #filter(cell_type %in% order_of_cell_types) %>%
    mutate(cell_type = factor(cell_type, ordered = TRUE, levels = cell_states_with_more_than_2_signif_bins)) %>%
    select(cell_type, ymax) %>% 
    distinct() %>% 
    mutate(ymax = ymax + ymax/3, x = 32, label = '  Epi')

text_data_stroma <- df %>% 
    mutate(cell_type = str_wrap(string = cell_type, width = 15, whitespace_only = FALSE)) %>%
    filter(cell_type %in% cell_states_with_more_than_2_signif_bins) %>%
    na.omit() %>%
    #mutate(cell_type = factor(cell_type, ordered = TRUE, levels = cell_states_with_more_than_2_signif_bins)) %>% 
    select(cell_type, ymax) %>% 
    distinct() %>% 
    mutate(ymax = ymax + ymax/3, x = 8, label = '    Stroma')
# Locate the interface (distance 0) on the discrete x-axis. Factor levels sit at
# integer positions 1, 2, ..., so the boundary before the first positive bin is
# that bin's index minus 0.5.
levels_list <- levels(fct_reorder(df$dist_bin, df$midpoint))
zero_crossing_index <- which(levels_list == "(0,5]") - 0.5 
# Fail early if "(0,5]" is not the exact label of the first positive bin.
stopifnot(length(zero_crossing_index) == 1)
# Create the plot
supp_fig_fibro = df %>% 
    mutate(cell_type = str_wrap(string = cell_type, width = 15, whitespace_only = FALSE)) %>%
    filter(cell_type %in% cell_states_with_more_than_2_signif_bins) %>%
    #filter(cell_type %in% order_of_cell_types) %>%
    mutate(cell_type = factor(cell_type, ordered = TRUE, levels = cell_states_with_more_than_2_signif_bins)) %>%
    mutate(dist_bin = fct_reorder(dist_bin, midpoint)) %>%
ggplot(
    data = ., 
    aes(dist_bin, 100 * estimate, color = Status, fill = Status)
) + 
    # Dashed line at the interface, positioned by numeric index on the discrete axis
    geom_vline(xintercept = zero_crossing_index, color = 'red', linetype = "dashed", linewidth = 0.5) +
    geom_line(aes(group = Status), alpha = 1, key_glyph = 'point', linewidth = 0.25) +
    #geom_ribbon(aes(group = Status, 
    #                ymin = 100 * (estimate - 1.96 * sqrt(variance)), ymax = 100 * (estimate + 1.96 * sqrt(variance))), show.legend = FALSE, alpha = 0.5, color = NA) + 
    geom_point(shape = '.') + 
    geom_errorbar(aes(ymin = 100 * (estimate - 1.96 * sqrt(variance)), ymax = 100 * (estimate + 1.96 * sqrt(variance))), show.legend = FALSE, alpha = 0.5, linewidth = 0.25) + 
    labs(y = 'Percent of all cells', x = 'Distance from the interface (\U03BCm)', size = '# Cells', subtitle = 'IVW meta-analysis; mean & 95% CI, *padj<0.01') +     
    geom_text(inherit.aes = FALSE,
        data = text_data_asterisk,
        aes(dist_bin, y = 100 * height, label = asterisk), 
        size = 1.5, vjust = .2, show.legend = FALSE, color = 'black'
    ) + 
    geom_text(inherit.aes = FALSE,
        data = text_data_epi, 
        aes(label = label, y = ymax, x = x), 
        color = 'black', size = 2
    ) +
    geom_text(inherit.aes = FALSE,
        data = text_data_stroma, 
        aes(label = label, y = ymax, x = x), 
        color = 'black', size = 2
    ) +
    scale_color_manual(
        name = 'Interface Type: ', 
        values = c('hubPos' = '#D55E00', 'hubNeg' = '#009E73', 'MSS' = 'grey'), 
        labels = c('hubPos' = 'Hub-inside MMRd', 'hubNeg' = 'Hub-outside MMRd', 'MSS' = 'MMRp')
    ) + 
    scale_fill_manual(
        name = 'Interface Type: ', 
        values = c('hubPos' = '#D55E00', 'hubNeg' = '#009E73', 'MSS' = 'grey'), 
        labels = c('hubPos' = 'Hub-inside MMRd', 'hubNeg' = 'Hub-outside MMRd', 'MSS' = 'MMRp')
    ) + 
    guides(color = guide_legend(override.aes = list(size = 2, shape = 16))) + 
    facet_wrap2(~cell_type, scales = 'free_y', axes = "all", remove_labels = "x", nrow = 3) +
    cowplot::theme_half_open(7) + 
    theme(
        panel.spacing = unit(0, "cm"), 
        axis.text.x = element_text(angle = 90, hjust = 1, size = 7),
        aspect.ratio = 0.5, 
        strip.background = element_rect(fill = NA), 
        strip.text = element_text(size = 7, color = 'black'), # face = 'bold', 
        title = element_text(size = 7), 
        legend.position = 'top', 
        legend.text = element_text(size = 7)
    ) +
    guides(fill = guide_legend(nrow = 1, override.aes = list(shape = 16))) +
    guides(color = guide_legend(nrow = 1, override.aes = list(size = 2, shape = 16))) +
    scale_x_discrete(breaks = manual_breaks) +
    ggtitle('Fibroblasts and other stromal cells') +
    NULL
fig.size(h = 6.5, w = 6.5)
supp_fig_fibro
ggsave(filename = 'figs/supplementary_fig_line_traces_Fibro.pdf', 
       plot = supp_fig_fibro, width = 6.5, height = 6.5, units = 'in')

In [None]:
fig.size(h = 8, w = 6.5)
supp_fig_strom = wrap_plots(
    supp_fig_endo + 
        facet_wrap2(~cell_type, scales = 'free_y', axes = "all", remove_labels = "x", ncol = 4),
    supp_fig_fibro + 
        facet_wrap2(~cell_type, scales = 'free_y', axes = "all", remove_labels = "x", ncol = 4),
    design = 'A\nA\nB\nB\nB', guides = 'keep'
) + 
    plot_annotation(tag_levels = 'A') & 
    theme(legend.position = 'top', aspect.ratio = NULL)
supp_fig_strom

pdf(file = 'figs/supp_fig_4_strom.pdf', height = 8, width = 6.5)
print(supp_fig_strom)  # explicit print so the plot is drawn on the open PDF device
dev.off()

## Supp fig - B/Plasma/Mast

In [None]:
lineage_list[['B']]
lineage_list[['Plasma']]


In [None]:
fig.size(h = 11, w = 8)
options(repr.plot.res = 400)
require(ggh4x)
require(ggragged)
manual_breaks <- c(
    "[-100,-95]", "(-75,-70]", "(-50,-45]", "(-25,-20]",
    "(0,5]", "(25,30]", "(50,55]", "(75,80]", "(95,100]"
)
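# Wrap the names the same way as cell_type below so that the %in% filters match
cell_states_with_more_than_2_signif_bins = c(lineage_list[['B']], lineage_list[['Plasma']]) %>% 
    str_wrap(string = ., width = 15, whitespace_only = FALSE)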
# Prepare the data for the geom_text layers beforehand for clarity
text_data_asterisk <- final_results$summary_stats %>% 
    mutate(cell_type = str_wrap(string = cell_type, width = 15, whitespace_only = FALSE)) %>%
    filter(cell_type %in% cell_states_with_more_than_2_signif_bins) %>%
    na.omit() %>%
    filter(cell_type %in% order_of_cell_types) %>%
    mutate(cell_type = factor(cell_type, ordered = TRUE, levels = cell_states_with_more_than_2_signif_bins)) %>%
    mutate(dist_bin = fct_reorder(dist_bin, midpoint)) 

text_data_epi <- df %>% 
    mutate(cell_type = str_wrap(string = cell_type, width = 15, whitespace_only = FALSE)) %>%
    filter(cell_type %in% cell_states_with_more_than_2_signif_bins) %>%
    na.omit() %>%
    #filter(cell_type %in% order_of_cell_types) %>%
    mutate(cell_type = factor(cell_type, ordered = TRUE, levels = cell_states_with_more_than_2_signif_bins)) %>%
    select(cell_type, ymax) %>% 
    distinct() %>% 
    mutate(ymax = ymax + ymax/3, x = 32, label = '  Epi')

text_data_stroma <- df %>% 
    mutate(cell_type = str_wrap(string = cell_type, width = 15, whitespace_only = FALSE)) %>%
    filter(cell_type %in% cell_states_with_more_than_2_signif_bins) %>%
    na.omit() %>%
    #mutate(cell_type = factor(cell_type, ordered = TRUE, levels = cell_states_with_more_than_2_signif_bins)) %>% 
    select(cell_type, ymax) %>% 
    distinct() %>% 
    mutate(ymax = ymax + ymax/3, x = 8, label = '    Stroma')
# Locate the interface (distance 0) on the discrete x-axis. Factor levels sit at
# integer positions 1, 2, ..., so the boundary before the first positive bin is
# that bin's index minus 0.5.
levels_list <- levels(fct_reorder(df$dist_bin, df$midpoint))
zero_crossing_index <- which(levels_list == "(0,5]") - 0.5 
# Fail early if "(0,5]" is not the exact label of the first positive bin.
stopifnot(length(zero_crossing_index) == 1)
# Create the plot
supp_fig_B = df %>% 
    mutate(cell_type = str_wrap(string = cell_type, width = 15, whitespace_only = FALSE)) %>%
    filter(cell_type %in% cell_states_with_more_than_2_signif_bins) %>%
    #filter(cell_type %in% order_of_cell_types) %>%
    mutate(cell_type = factor(cell_type, ordered = TRUE, levels = cell_states_with_more_than_2_signif_bins)) %>%
    mutate(dist_bin = fct_reorder(dist_bin, midpoint)) %>%
ggplot(
    data = ., 
    aes(dist_bin, 100 * estimate, color = Status, fill = Status)
) + 
    # Dashed line at the interface, positioned by numeric index on the discrete axis
    geom_vline(xintercept = zero_crossing_index, color = 'red', linetype = "dashed", linewidth = 0.5) +
    geom_line(aes(group = Status), alpha = 1, key_glyph = 'point', linewidth = 0.25) +
    #geom_ribbon(aes(group = Status, 
    #                ymin = 100 * (estimate - 1.96 * sqrt(variance)), ymax = 100 * (estimate + 1.96 * sqrt(variance))), show.legend = FALSE, alpha = 0.5, color = NA) + 
    geom_point(shape = '.') + 
    geom_errorbar(aes(ymin = 100 * (estimate - 1.96 * sqrt(variance)), ymax = 100 * (estimate + 1.96 * sqrt(variance))), show.legend = FALSE, alpha = 0.5, linewidth = 0.25) + 
    labs(y = 'Percent of all cells', x = 'Distance from the interface (\U03BCm)', size = '# Cells', subtitle = 'IVW meta-analysis; mean & 95% CI, *padj<0.01') +     
    geom_text(inherit.aes = FALSE,
        data = text_data_asterisk,
        aes(dist_bin, y = 100 * height, label = asterisk), 
        size = 1.5, vjust = .2, show.legend = FALSE, color = 'black'
    ) + 
    geom_text(inherit.aes = FALSE,
        data = text_data_epi, 
        aes(label = label, y = ymax, x = x), 
        color = 'black', size = 2
    ) +
    geom_text(inherit.aes = FALSE,
        data = text_data_stroma, 
        aes(label = label, y = ymax, x = x), 
        color = 'black', size = 2
    ) +
    scale_color_manual(
        name = 'Interface Type: ', 
        values = c('hubPos' = '#D55E00', 'hubNeg' = '#009E73', 'MSS' = 'grey'), 
        labels = c('hubPos' = 'Hub-inside MMRd', 'hubNeg' = 'Hub-outside MMRd', 'MSS' = 'MMRp')
    ) + 
    scale_fill_manual(
        name = 'Interface Type: ', 
        values = c('hubPos' = '#D55E00', 'hubNeg' = '#009E73', 'MSS' = 'grey'), 
        labels = c('hubPos' = 'Hub-inside MMRd', 'hubNeg' = 'Hub-outside MMRd', 'MSS' = 'MMRp')
    ) + 
    guides(color = guide_legend(override.aes = list(size = 2, shape = 16))) + 
    facet_wrap2(~cell_type, scales = 'free_y', axes = "all", remove_labels = "x", nrow = 1) +
    cowplot::theme_half_open(7) + 
    theme(
        panel.spacing = unit(0, "cm"), 
        axis.text.x = element_text(angle = 90, hjust = 1, size = 7),
        aspect.ratio = 0.5, 
        strip.background = element_rect(fill = NA), 
        strip.text = element_text(size = 7, color = 'black'), # face = 'bold', 
        title = element_text(size = 7), 
        legend.position = 'top', 
        legend.text = element_text(size = 7)
    ) +
    guides(fill = guide_legend(nrow = 1, override.aes = list(shape = 16))) +
    guides(color = guide_legend(nrow = 1, override.aes = list(size = 2, shape = 16))) +
    scale_x_discrete(breaks = manual_breaks) +
    ggtitle('B lineage') +
    NULL

fig.size(h = 2, w = 6.5)
supp_fig_B
ggsave(filename = 'figs/supplementary_fig_line_traces_B.pdf', 
       plot = supp_fig_B, width = 6.5, height = 2, units = 'in')
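
The figure subtitles read "IVW meta-analysis; mean & 95% CI", and every `geom_errorbar()` layer draws `estimate ± 1.96 * sqrt(variance)`. As a rough illustration of where such an `estimate` and `variance` can come from, the sketch below pools hypothetical per-sample proportions with textbook fixed-effect inverse-variance weighting. The notebook's own meta-analysis functions are defined earlier and may differ (for example, by applying empirical Bayes shrinkage first), so this is an assumption-laden toy rather than the method used here.

```r
# Hypothetical per-sample proportions of one cell type in one distance bin,
# with their sampling variances (illustrative numbers, not from the data)
p <- c(0.10, 0.14, 0.12)
v <- c(0.0004, 0.0004, 0.0002)

# Fixed-effect inverse-variance weighting: precise samples count for more
w <- 1 / v
estimate <- sum(w * p) / sum(w)
variance <- 1 / sum(w)

# Same interval the geom_errorbar() layers draw, expressed in percent
ci <- 100 * (estimate + c(lower = -1.96, upper = 1.96) * sqrt(variance))
round(c(estimate = 100 * estimate, ci), 2)   # estimate 12, lower 10.04, upper 13.96
```

Because this is a normal-approximation interval, the lower bound can fall below 0% for rare cell types, which is worth keeping in mind when reading panels with free y-axes.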

In [None]:
fig.size(h = 7, w = 6.5)
options(repr.plot.res = 400)
require(ggh4x)
require(ggragged)
# Prepare the data for the geom_text layers beforehand for clarity
text_data_asterisk <- final_results$summary_stats %>% 
    na.omit() %>%
    mutate(cell_type = str_wrap(string = cell_type, width = 10, whitespace_only = FALSE)) %>%
    filter(cell_type %in% order_of_cell_types) %>%
    mutate(cell_type = factor(cell_type, ordered = TRUE, levels = order_of_cell_types)) 

text_data_epi <- df %>% 
    na.omit() %>%
    mutate(cell_type = str_wrap(string = cell_type, width = 10, whitespace_only = FALSE)) %>%
    filter(cell_type %in% order_of_cell_types) %>%
    mutate(cell_type = factor(cell_type, ordered = TRUE, levels = order_of_cell_types)) %>%
    select(cell_type, ymax) %>% 
    distinct() %>% 
    mutate(ymax = ymax + ymax/3, x = 32, label = 'Epi')

text_data_stroma <- df %>% 
    na.omit() %>%
    mutate(cell_type = str_wrap(string = cell_type, width = 10, whitespace_only = FALSE)) %>%
    filter(cell_type %in% order_of_cell_types) %>%
    mutate(cell_type = factor(cell_type, ordered = TRUE, levels = order_of_cell_types)) %>% 
    select(cell_type, ymax) %>% 
    distinct() %>% 
    mutate(ymax = ymax + ymax/3, x = 8, label = '    Stroma')

# Create the plot
supp_fig1 = df %>% 
    filter(type_lvl1 == 'TNKILC') %>%
    mutate(cell_type = str_wrap(string = cell_type, width = 10, whitespace_only = FALSE)) %>%
    filter(cell_type %in% order_of_cell_types) %>%
    mutate(cell_type = factor(cell_type, ordered = TRUE, levels = order_of_cell_types)) %>%
    mutate(dist_bin = fct_reorder(dist_bin, midpoint)) %>%
ggplot(
    data = ., 
    aes(midpoint, 100 * estimate, color = Status, fill = Status)
) + 
    geom_vline(xintercept = c(0), color = 'red') + 
    geom_line(aes(group = Status), alpha = 1, key_glyph = 'point', linewidth = 0.25) +
    #geom_ribbon(aes(group = Status, 
    #                ymin = 100 * (estimate - 1.96 * sqrt(variance)), ymax = 100 * (estimate + 1.96 * sqrt(variance))), show.legend = FALSE, alpha = 0.5, color = NA) + 
    geom_point(shape = '.') + 
    geom_errorbar(aes(ymin = 100 * (estimate - 1.96 * sqrt(variance)), ymax = 100 * (estimate + 1.96 * sqrt(variance))), show.legend = FALSE, alpha = 0.5, linewidth = 0.25) + 
    # geom_hline(yintercept = 0) + 
    # geom_line(data = . %>% 
    #           filter(cell_type != 'Epi') %>%
    #           dplyr::mutate(dist_bin = (dist_bin)), show.legend = FALSE, aes(group = Status)) + 
    labs(y = 'Percent of all cells', x = 'Distance from the interface (\U03BCm)', size = '# Cells', subtitle = 'IVW meta-analysis; mean & 95% CI, *padj<0.01') +     
    geom_text(inherit.aes = FALSE,
        data = text_data_asterisk %>%
    filter(type_lvl1 == 'TNKILC') ,
        aes(midpoint, y = 100 * height, label = asterisk), 
        size = 1.5, vjust = .2, show.legend = FALSE, color = 'black'
    ) + 
    # geom_text(
    #     data = text_data_epi, 
    #     aes(label = label, y = ymax, x = x), 
    #     color = 'black', size = 1
    # ) +
    # geom_text(
    #     data = text_data_stroma, 
    #     aes(label = label, y = ymax, x = x), 
    #     color = 'black', size = 1
    # ) +
    scale_color_manual(
        name = 'Interface Type: ', 
        values = c('hubPos' = '#D55E00', 'hubNeg' = '#009E73', 'MSS' = 'grey'), 
        labels = c('hubPos' = 'Hub+ MMRd', 'hubNeg' = 'Hub- MMRd', 'MSS' = 'MMRp')
    ) + 
    scale_fill_manual(
        name = 'Interface Type: ', 
        values = c('hubPos' = '#D55E00', 'hubNeg' = '#009E73', 'MSS' = 'grey'), 
        labels = c('hubPos' = 'Hub+ MMRd', 'hubNeg' = 'Hub- MMRd', 'MSS' = 'MMRp')
    ) + 
    guides(color = guide_legend(override.aes = list(size = 2, shape = 16))) + 
    facet_wrap2(~cell_type, scales = 'free_y', axes = "all", remove_labels = "x") +
    cowplot::theme_half_open(7) + 
    theme(
        panel.spacing = unit(0, "cm"), 
        axis.text.x = element_text(angle = 90, hjust = 1, size = 7),
        #aspect.ratio = 0.5, 
        strip.background = element_rect(fill = NA), 
        strip.text = element_text(size = 5, color = 'black'), # face = 'bold', 
        title = element_text(size = 10), 
        legend.position = 'top', 
        legend.text = element_text(size = 10)
    ) +
    guides(fill = guide_legend(nrow = 1, override.aes = list(shape = 16))) +
    guides(color = guide_legend(nrow = 1, override.aes = list(size = 2, shape = 16))) +
    ggtitle('TNKILC') +
    #facet_ragged_rows(rows = vars(type_lvl1), cols = vars(cell_type)) +
    NULL

supp_fig1
# ggsave(filename = 'supplementary_fig_line_traces_all_states.pdf', 
#        plot = supp_fig, width = 6.5, height = 7, units = 'in')

In [None]:
fig.size(h = 7, w = 6.5)
# Create the plot
supp_fig2 = df %>% 
    filter(type_lvl1 == 'Strom') %>%
    mutate(cell_type = str_wrap(string = cell_type, width = 10, whitespace_only = FALSE)) %>%
    filter(cell_type %in% order_of_cell_types) %>%
    mutate(cell_type = factor(cell_type, ordered = TRUE, levels = order_of_cell_types)) %>%
    mutate(dist_bin = fct_reorder(dist_bin, midpoint)) %>%
ggplot(
    data = ., 
    aes(midpoint, 100 * estimate, color = Status, fill = Status)
) + 
    geom_vline(xintercept = c(0), color = 'red') + 
    geom_line(aes(group = Status), alpha = 1, key_glyph = 'point', linewidth = 0.25) +
    #geom_ribbon(aes(group = Status, 
    #                ymin = 100 * (estimate - 1.96 * sqrt(variance)), ymax = 100 * (estimate + 1.96 * sqrt(variance))), show.legend = FALSE, alpha = 0.5, color = NA) + 
    geom_point(shape = '.') + 
    geom_errorbar(aes(ymin = 100 * (estimate - 1.96 * sqrt(variance)), ymax = 100 * (estimate + 1.96 * sqrt(variance))), show.legend = FALSE, alpha = 0.5, linewidth = 0.25) + 
    # geom_hline(yintercept = 0) + 
    # geom_line(data = . %>% 
    #           filter(cell_type != 'Epi') %>%
    #           dplyr::mutate(dist_bin = (dist_bin)), show.legend = FALSE, aes(group = Status)) + 
    labs(y = 'Percent of all cells', x = 'Distance from the interface (\U03BCm)', size = '# Cells', subtitle = 'IVW meta-analysis; mean & 95% CI, *padj<0.01') +     
    geom_text(inherit.aes = FALSE,
        data = text_data_asterisk %>%
    filter(type_lvl1 == 'Strom') ,
        aes(midpoint, y = 100 * height, label = asterisk), 
        size = 1.5, vjust = .2, show.legend = FALSE, color = 'black'
    ) + 
    # geom_text(
    #     data = text_data_epi, 
    #     aes(label = label, y = ymax, x = x), 
    #     color = 'black', size = 1
    # ) +
    # geom_text(
    #     data = text_data_stroma, 
    #     aes(label = label, y = ymax, x = x), 
    #     color = 'black', size = 1
    # ) +
    scale_color_manual(
        name = 'Interface Type: ', 
        values = c('hubPos' = '#D55E00', 'hubNeg' = '#009E73', 'MSS' = 'grey'), 
        labels = c('hubPos' = 'Hub+ MMRd', 'hubNeg' = 'Hub- MMRd', 'MSS' = 'MMRp')
    ) + 
    scale_fill_manual(
        name = 'Interface Type: ', 
        values = c('hubPos' = '#D55E00', 'hubNeg' = '#009E73', 'MSS' = 'grey'), 
        labels = c('hubPos' = 'Hub+ MMRd', 'hubNeg' = 'Hub- MMRd', 'MSS' = 'MMRp')
    ) + 
    guides(color = guide_legend(override.aes = list(size = 2, shape = 16))) + 
    facet_wrap2(~cell_type, scales = 'free_y', axes = "all", remove_labels = "x") +
    cowplot::theme_half_open(7) + 
    theme(
        panel.spacing = unit(0, "cm"), 
        axis.text.x = element_text(angle = 90, hjust = 1, size = 7),
        #aspect.ratio = 0.5, 
        strip.background = element_rect(fill = NA), 
        strip.text = element_text(size = 5, color = 'black'), # face = 'bold', 
        title = element_text(size = 10), 
        legend.position = 'top', 
        legend.text = element_text(size = 10)
    ) +
    guides(fill = guide_legend(nrow = 1, override.aes = list(shape = 16))) +
    guides(color = guide_legend(nrow = 1, override.aes = list(size = 2, shape = 16))) +
    ggtitle('Stromal cells') +
    #facet_ragged_rows(rows = vars(type_lvl1), cols = vars(cell_type)) +
    NULL

supp_fig2

In [None]:
fig.size(h = 7, w = 6.5)
# Create the plot
supp_fig3 = df %>% 
    filter(type_lvl1 == 'Myeloid') %>%
    mutate(cell_type = str_wrap(string = cell_type, width = 10, whitespace_only = FALSE)) %>%
    filter(cell_type %in% order_of_cell_types) %>%
    mutate(cell_type = factor(cell_type, ordered = TRUE, levels = order_of_cell_types)) %>%
    mutate(dist_bin = fct_reorder(dist_bin, midpoint)) %>%
ggplot(
    data = ., 
    aes(midpoint, 100 * estimate, color = Status, fill = Status)
) + 
    geom_vline(xintercept = c(0), color = 'red') + 
    geom_line(aes(group = Status), alpha = 1, key_glyph = 'point', linewidth = 0.25) +
    #geom_ribbon(aes(group = Status, 
    #                ymin = 100 * (estimate - 1.96 * sqrt(variance)), ymax = 100 * (estimate + 1.96 * sqrt(variance))), show.legend = FALSE, alpha = 0.5, color = NA) + 
    geom_point(shape = '.') + 
    geom_errorbar(aes(ymin = 100 * (estimate - 1.96 * sqrt(variance)), ymax = 100 * (estimate + 1.96 * sqrt(variance))), show.legend = FALSE, alpha = 0.5, linewidth = 0.25) + 
    # geom_hline(yintercept = 0) + 
    # geom_line(data = . %>% 
    #           filter(cell_type != 'Epi') %>%
    #           dplyr::mutate(dist_bin = (dist_bin)), show.legend = FALSE, aes(group = Status)) + 
    labs(y = 'Percent of all cells', x = 'Distance from the interface (\U03BCm)', size = '# Cells', subtitle = 'IVW meta-analysis; mean & 95% CI, *padj<0.01') +     
    geom_text(inherit.aes = FALSE,
        data = text_data_asterisk %>%
    filter(type_lvl1 == 'Myeloid') ,
        aes(midpoint, y = 100 * height, label = asterisk), 
        size = 1.5, vjust = .2, show.legend = FALSE, color = 'black'
    ) + 
    # geom_text(
    #     data = text_data_epi, 
    #     aes(label = label, y = ymax, x = x), 
    #     color = 'black', size = 1
    # ) +
    # geom_text(
    #     data = text_data_stroma, 
    #     aes(label = label, y = ymax, x = x), 
    #     color = 'black', size = 1
    # ) +
    scale_color_manual(
        name = 'Interface Type: ', 
        values = c('hubPos' = '#D55E00', 'hubNeg' = '#009E73', 'MSS' = 'grey'), 
        labels = c('hubPos' = 'Hub+ MMRd', 'hubNeg' = 'Hub- MMRd', 'MSS' = 'MMRp')
    ) + 
    scale_fill_manual(
        name = 'Interface Type: ', 
        values = c('hubPos' = '#D55E00', 'hubNeg' = '#009E73', 'MSS' = 'grey'), 
        labels = c('hubPos' = 'Hub+ MMRd', 'hubNeg' = 'Hub- MMRd', 'MSS' = 'MMRp')
    ) + 
    guides(color = guide_legend(override.aes = list(size = 2, shape = 16))) + 
    facet_wrap2(~cell_type, scales = 'free_y', axes = "all", remove_labels = "x") +
    cowplot::theme_half_open(7) + 
    theme(
        panel.spacing = unit(0, "cm"), 
        axis.text.x = element_text(angle = 90, hjust = 1, size = 7),
        #aspect.ratio = 0.5, 
        strip.background = element_rect(fill = NA), 
        strip.text = element_text(size = 5, color = 'black'), # face = 'bold', 
        title = element_text(size = 10), 
        legend.position = 'top', 
        legend.text = element_text(size = 10)
    ) +
    guides(fill = guide_legend(nrow = 1, override.aes = list(shape = 16))) +
    guides(color = guide_legend(nrow = 1, override.aes = list(size = 2, shape = 16))) +
    ggtitle('Myeloid cells') +
    #facet_ragged_rows(rows = vars(type_lvl1), cols = vars(cell_type)) +
    NULL

supp_fig3

In [None]:
fig.size(h = 7, w = 6.5)
# Create the plot
supp_fig4 = df %>% 
    filter(type_lvl1 == 'B') %>%
    mutate(cell_type = str_wrap(string = cell_type, width = 10, whitespace_only = FALSE)) %>%
    filter(cell_type %in% order_of_cell_types) %>%
    mutate(cell_type = factor(cell_type, ordered = TRUE, levels = order_of_cell_types)) %>%
    mutate(dist_bin = fct_reorder(dist_bin, midpoint)) %>%
ggplot(
    data = ., 
    aes(midpoint, 100 * estimate, color = Status, fill = Status)
) + 
    geom_vline(xintercept = c(0), color = 'red') + 
    geom_line(aes(group = Status), alpha = 1, key_glyph = 'point', linewidth = 0.25) +
    #geom_ribbon(aes(group = Status, 
    #                ymin = 100 * (estimate - 1.96 * sqrt(variance)), ymax = 100 * (estimate + 1.96 * sqrt(variance))), show.legend = FALSE, alpha = 0.5, color = NA) + 
    geom_point(shape = '.') + 
    geom_errorbar(aes(ymin = 100 * (estimate - 1.96 * sqrt(variance)), ymax = 100 * (estimate + 1.96 * sqrt(variance))), show.legend = FALSE, alpha = 0.5, linewidth = 0.25) + 
    # geom_hline(yintercept = 0) + 
    # geom_line(data = . %>% 
    #           filter(cell_type != 'Epi') %>%
    #           dplyr::mutate(dist_bin = (dist_bin)), show.legend = FALSE, aes(group = Status)) + 
    labs(y = 'Percent of all cells', x = 'Distance from the interface (\U03BCm)', size = '# Cells', subtitle = 'IVW meta-analysis; mean & 95% CI, *padj<0.01') +     
    geom_text(inherit.aes = FALSE,
        data = text_data_asterisk %>%
    filter(type_lvl1 == 'B') ,
        aes(midpoint, y = 100 * height, label = asterisk), 
        size = 1.5, vjust = .2, show.legend = FALSE, color = 'black'
    ) + 
    # geom_text(
    #     data = text_data_epi, 
    #     aes(label = label, y = ymax, x = x), 
    #     color = 'black', size = 1
    # ) +
    # geom_text(
    #     data = text_data_stroma, 
    #     aes(label = label, y = ymax, x = x), 
    #     color = 'black', size = 1
    # ) +
    scale_color_manual(
        name = 'Interface Type: ', 
        values = c('hubPos' = '#D55E00', 'hubNeg' = '#009E73', 'MSS' = 'grey'), 
        labels = c('hubPos' = 'Hub+ MMRd', 'hubNeg' = 'Hub- MMRd', 'MSS' = 'MMRp')
    ) + 
    scale_fill_manual(
        name = 'Interface Type: ', 
        values = c('hubPos' = '#D55E00', 'hubNeg' = '#009E73', 'MSS' = 'grey'), 
        labels = c('hubPos' = 'Hub+ MMRd', 'hubNeg' = 'Hub- MMRd', 'MSS' = 'MMRp')
    ) + 
    guides(color = guide_legend(override.aes = list(size = 2, shape = 16))) + 
    facet_wrap2(~cell_type, scales = 'free_y', axes = "all", remove_labels = "x") +
    cowplot::theme_half_open(7) + 
    theme(
        panel.spacing = unit(0, "cm"), 
        axis.text.x = element_text(angle = 90, hjust = 1, size = 7),
        #aspect.ratio = 0.5, 
        strip.background = element_rect(fill = NA), 
        strip.text = element_text(size = 5, color = 'black'), # face = 'bold', 
        title = element_text(size = 10), 
        legend.position = 'top', 
        legend.text = element_text(size = 10)
    ) +
    guides(fill = guide_legend(nrow = 1, override.aes = list(shape = 16))) +
    guides(color = guide_legend(nrow = 1, override.aes = list(size = 2, shape = 16))) +
    ggtitle('B cells') +
    #facet_ragged_rows(rows = vars(type_lvl1), cols = vars(cell_type)) +
    NULL

supp_fig4

In [None]:
# supp_fig1 + supp_fig2 + supp_fig3 + supp_fig4

### Plot: Epi lineage

In [None]:
head(df)

In [None]:
cells %>% 
    filter(type_lvl1 == 'Epi') %>%
    pull(type_lvl3) %>%
    unique

In [None]:
fig.size(h = 9, w = 16)
options(repr.plot.res = 300)
lineage = 'Epi'
# Despite its name, `tnkilc_order` holds the Epi cell states in this cell
tnkilc_order = cells %>% 
    filter(type_lvl1 == 'Epi') %>%
    pull(type_lvl3) %>%
    unique
# Prepare the data for the geom_text layers beforehand for clarity
text_data_asterisk <- final_results$summary_stats %>% 
    filter(cell_type %in% lineage_list[[lineage]]) %>%
    mutate(cell_type = factor(cell_type, ordered = TRUE, levels = tnkilc_order))

text_data_epi <- df %>% 
    filter(cell_type %in% lineage_list[[lineage]]) %>%
    mutate(cell_type = factor(cell_type, ordered = TRUE, levels = tnkilc_order)) %>% 
    select(cell_type, ymax) %>% 
    distinct() %>% 
    mutate(ymax = ymax + ymax/3, x = 32, label = 'Epi')

text_data_stroma <- df %>% 
    filter(cell_type %in% lineage_list[[lineage]]) %>% 
    mutate(cell_type = factor(cell_type, ordered = TRUE, levels = tnkilc_order)) %>% 
    select(cell_type, ymax) %>% 
    distinct() %>% 
    mutate(ymax = ymax + ymax/3, x = 8, label = '    Stroma')

# Create the plot
all_epi_states = df %>% 
           filter(cell_type %in% lineage_list[[lineage]]) %>%
           mutate(cell_type = factor(cell_type, ordered = TRUE, levels = tnkilc_order)) %>%
          mutate(dist_bin = fct_reorder(dist_bin, midpoint)) %>%
ggplot(
    data = ., 
    aes(dist_bin, 100 * estimate, color = Status)
) + 
    geom_vline(xintercept = c(20.5), linewidth = 2, linetype = 1, color = 'grey') + 
    geom_point() + 
    geom_errorbar(aes(ymin = 100 * (estimate - 1.96 * sqrt(variance)), 
                      ymax = 100 * (estimate + 1.96 * sqrt(variance))), 
                  width = 0, show.legend = FALSE) + 
    geom_hline(yintercept = 0) + 
    geom_line(data = . %>% dplyr::mutate(dist_bin = (dist_bin)), show.legend = FALSE, aes(group = Status)) + 
    cowplot::theme_half_open(10) + 
    theme(axis.text.x = element_text(angle = 90, hjust = 1)) + 
    labs(y = 'Percent of all cells', x = 'Distance Window', size = '# Cells', 
         subtitle = 'IVW meta-analysis; mean & 95% CI, *padj<0.01', title = lineage) + 
    geom_text(
        data = text_data_asterisk,
        aes(y = 100 * height, label = asterisk), 
        size = 6, vjust = .2, show.legend = FALSE, color = 'black'
    ) + 
    geom_text(
        data = text_data_epi, 
        aes(label = label, y = ymax, x = x), 
        color = 'black', size = 3
    ) +
    geom_text(
        data = text_data_stroma, 
        aes(label = label, y = ymax, x = x), 
        color = 'black', size = 3
    ) +
    scale_color_manual(
        name = 'Interface Type: ', 
        values = c('hubPos' = '#D55E00', 'hubNeg' = '#009E73', 'MSS' = 'grey'), 
        labels = c('hubPos' = 'Hub+ MMRd', 'hubNeg' = 'Hub- MMRd', 'MSS' = 'MMRp')
    ) + 
    guides(color = guide_legend(override.aes = list(size = 2, shape = 16))) + 
    facet_wrap(~cell_type, scales = 'free') +
    theme(
        aspect.ratio = 0.5, 
        axis.text.x = element_text(size = 4), 
        strip.background = element_rect(fill = NA), 
        strip.text = element_text(size = 10, face = 'bold', color = 'black'), 
        title = element_text(size = 10), 
        legend.position = 'top', 
        legend.text = element_text(size = 10)
    ) +
    guides(fill = guide_legend(override.aes = list(nrow = 1, shape = 16))) +
    NULL
all_epi_states
ggsave(all_epi_states, filename = glue::glue('figs/cellstates/Epi/', 'all_states', '.pdf'), 
       height = 4, width = 4, create.dir = TRUE)


lapply(tnkilc_order, function(mytype){

# Prepare the data for the geom_text layers beforehand for clarity
text_data_asterisk <- final_results$summary_stats %>% 
    filter(cell_type == mytype) %>%
    mutate(cell_type = factor(cell_type, ordered = TRUE, levels = tnkilc_order))

text_data_epi <- df %>% 
    filter(cell_type == mytype) %>%
    mutate(cell_type = factor(cell_type, ordered = TRUE, levels = tnkilc_order)) %>% 
    select(cell_type, ymax) %>% 
    distinct() %>% 
    mutate(ymax = ymax + ymax/3, x = 32, label = 'Epi')

text_data_stroma <- df %>% 
    filter(cell_type == mytype) %>%
    mutate(cell_type = factor(cell_type, ordered = TRUE, levels = tnkilc_order)) %>% 
    select(cell_type, ymax) %>% 
    distinct() %>% 
    mutate(ymax = ymax + ymax/3, x = 8, label = '    Stroma')
    
# Create the plot
p1 = df %>% 
           filter(cell_type == mytype) %>%
           mutate(cell_type = factor(cell_type, ordered = TRUE, levels = tnkilc_order)) %>%
          mutate(dist_bin = fct_reorder(dist_bin, midpoint)) %>%
ggplot(
    data = ., 
    aes(dist_bin, 100 * estimate, color = Status)
) + 
    geom_vline(xintercept = c(20.5), linewidth = 2, linetype = 1, color = 'grey') + 
    geom_point() + 
    geom_errorbar(aes(ymin = 100 * (estimate - 1.96 * sqrt(variance)), 
                      ymax = 100 * (estimate + 1.96 * sqrt(variance))), width = 0, show.legend = FALSE) + 
    geom_hline(yintercept = 0) + 
    geom_line(data = . %>% dplyr::mutate(dist_bin = (dist_bin)), show.legend = FALSE, aes(group = Status)) + 
    cowplot::theme_half_open(10) + 
    theme(axis.text.x = element_text(angle = 90, hjust = 1)) + 
    labs(y = 'Percent of all cells', x = 'Distance Window', size = '# Cells', 
         subtitle = 'IVW meta-analysis; mean & 95% CI, *padj<0.01', title = lineage) + 
    geom_text(
        data = text_data_asterisk,
        aes(y = 100 * height, label = asterisk), 
        size = 6, vjust = .2, show.legend = FALSE, color = 'black'
    ) + 
    geom_text(
        data = text_data_epi, 
        aes(label = label, y = ymax, x = x), 
        color = 'black', size = 3
    ) +
    geom_text(
        data = text_data_stroma, 
        aes(label = label, y = ymax, x = x), 
        color = 'black', size = 3
    ) +
    scale_color_manual(
        name = 'Interface Type: ', 
        values = c('hubPos' = '#D55E00', 'hubNeg' = '#009E73', 'MSS' = 'grey'), 
        labels = c('hubPos' = 'Hub+ MMRd', 'hubNeg' = 'Hub- MMRd', 'MSS' = 'MMRp')
    ) + 
    guides(color = guide_legend(override.aes = list(size = 2, shape = 16))) + 
    theme(
        aspect.ratio = 0.5, 
        axis.text.x = element_text(size = 4), 
        strip.background = element_rect(fill = NA), 
        strip.text = element_text(size = 10, face = 'bold', color = 'black'), 
        title = element_text(size = 10), 
        legend.position = 'right', 
        legend.text = element_text(size = 10)
    ) +
    guides(fill = guide_legend(override.aes = list(nrow = 1, shape = 16))) +
    ggtitle(mytype) +
    NULL

ggsave(p1, filename = glue::glue('figs/cellstates/Epi/', mytype, '.pdf'), height = 4, width = 4, create.dir = TRUE)})

### Plot: T-cell, NK, and ILC Lineages
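
The error bars in the plots of this section are drawn as `estimate ± 1.96 * sqrt(variance)` and scaled to percent. The sketch below reproduces that arithmetic on a toy table. The `toy` data frame and its numbers are made up for illustration and are not results from this analysis.

```r
# Hypothetical pooled estimates on the proportion scale (illustrative values only)
toy <- data.frame(
  estimate = c(0.020, 0.050),
  variance = c(0.000004, 0.000025)
)

# Normal-approximation 95% interval: estimate +/- z * standard error
z <- qnorm(0.975)                  # approximately 1.96
toy$se <- sqrt(toy$variance)
toy$lower_pct <- 100 * (toy$estimate - z * toy$se)
toy$upper_pct <- 100 * (toy$estimate + z * toy$se)

print(toy)
```

Because this is a symmetric normal approximation on the proportion scale, the lower bound can fall below zero for rare cell types with large variance.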

In [None]:
head(df)

In [None]:
rev(lineage_list[['TNKILC']] %>% sort)

In [None]:
fig.size(h = 2.5, w = 6.5)
options(repr.plot.res = 300)
lineage = 'TNKILC' # ,
tnkilc_order = lineage_list[[lineage]] #c('Tcd4-IL7R','NK-CD16','Tcd8-gdlike','Tcd8-HOBIT', 'NK-XCL1','ILC3','Tcd4-TFH') %>% sort %>% rev

# Prepare the data for the geom_text layers beforehand for clarity
text_data_asterisk <- final_results$summary_stats %>% 
    filter(cell_type %in% lineage_list[[lineage]]) %>%
    mutate(cell_type = factor(cell_type, ordered = TRUE, levels = tnkilc_order))

text_data_epi <- df %>% 
    filter(cell_type %in% lineage_list[[lineage]]) %>%
    mutate(cell_type = factor(cell_type, ordered = TRUE, levels = tnkilc_order)) %>% 
    select(cell_type, ymax) %>% 
    distinct() %>% 
    mutate(ymax = ymax + ymax/3, x = 32, label = 'Epi')

text_data_stroma <- df %>% 
    filter(cell_type %in% lineage_list[[lineage]]) %>% 
    mutate(cell_type = factor(cell_type, ordered = TRUE, levels = tnkilc_order)) %>% 
    select(cell_type, ymax) %>% 
    distinct() %>% 
    mutate(ymax = ymax + ymax/3, x = 8, label = '    Stroma')

# Create the plot
df %>% 
           filter(cell_type %in% lineage_list[[lineage]]) %>%
           mutate(cell_type = factor(cell_type, ordered = TRUE, levels = tnkilc_order)) %>%
          mutate(dist_bin = fct_reorder(dist_bin, midpoint)) %>%
ggplot(
    data = ., 
    aes(dist_bin, 100 * estimate, color = Status)
) + 
    geom_vline(xintercept = c(20.5), linewidth = 2, linetype = 1, color = 'grey') + 
    geom_point() + 
    geom_errorbar(aes(ymin = 100 * (estimate - 1.96 * sqrt(variance)), ymax = 100 * (estimate + 1.96 * sqrt(variance))), width = 0, show.legend = FALSE) + 
    geom_hline(yintercept = 0) + 
    geom_line(data = . %>% dplyr::mutate(dist_bin = (dist_bin)), show.legend = FALSE, aes(group = Status)) + 
    cowplot::theme_half_open(10) + 
    theme(axis.text.x = element_text(angle = 90, hjust = 1)) + 
    labs(y = 'Percent of all cells', x = 'Distance Window', size = '# Cells', subtitle = 'IVW meta-analysis; mean & 95% CI, *padj<0.01', title = lineage) + 
    geom_text(
        data = text_data_asterisk,
        aes(y = 100 * height, label = asterisk), 
        size = 6, vjust = .2, show.legend = FALSE, color = 'black'
    ) + 
    geom_text(
        data = text_data_epi, 
        aes(label = label, y = ymax, x = x), 
        color = 'black', size = 3
    ) +
    geom_text(
        data = text_data_stroma, 
        aes(label = label, y = ymax, x = x), 
        color = 'black', size = 3
    ) +
    scale_color_manual(
        name = 'Interface Type: ', 
        values = c('hubPos' = '#D55E00', 'hubNeg' = '#009E73', 'MSS' = 'grey'), 
        labels = c('hubPos' = 'Hub-inside MMRd', 'hubNeg' = 'Hub-outside MMRd', 'MSS' = 'MMRp')
    ) + 
    guides(color = guide_legend(override.aes = list(size = 2, shape = 16))) + 
    facet_wrap(~cell_type, scales = 'free') +
    theme(
        aspect.ratio = 0.5, 
        axis.text.x = element_text(size = 4), 
        strip.background = element_rect(fill = NA), 
        strip.text = element_text(size = 10, face = 'bold', color = 'black'), 
        title = element_text(size = 10), 
        legend.position = 'top', 
        legend.text = element_text(size = 10)
    ) +
    guides(fill = guide_legend(override.aes = list(nrow = 1, shape = 16))) +
    NULL

tnkilc_plots = lapply(tnkilc_order, function(mytype){

# Prepare the data for the geom_text layers beforehand for clarity
text_data_asterisk <- final_results$summary_stats %>% 
    filter(cell_type == mytype) %>%
    mutate(cell_type = factor(cell_type, ordered = TRUE, levels = tnkilc_order))

text_data_epi <- df %>% 
    filter(cell_type == mytype) %>%
    mutate(cell_type = factor(cell_type, ordered = TRUE, levels = tnkilc_order)) %>% 
    select(cell_type, ymax) %>% 
    distinct() %>% 
    mutate(ymax = ymax + ymax/3, x = 32, label = 'Epi')

text_data_stroma <- df %>% 
    filter(cell_type == mytype) %>%
    mutate(cell_type = factor(cell_type, ordered = TRUE, levels = tnkilc_order)) %>% 
    select(cell_type, ymax) %>% 
    distinct() %>% 
    mutate(ymax = ymax + ymax/3, x = 8, label = '    Stroma')
    
# Create the plot
p1 = df %>% 
           filter(cell_type == mytype) %>%
           mutate(cell_type = factor(cell_type, ordered = TRUE, levels = tnkilc_order)) %>%
          mutate(dist_bin = fct_reorder(dist_bin, midpoint)) %>%
ggplot(
    data = ., 
    aes(dist_bin, 100 * estimate, color = Status)
) + 
    geom_vline(xintercept = c(20.5), linewidth = 2, linetype = 1, color = 'grey') + 
    geom_point() + 
    geom_errorbar(aes(ymin = 100 * (estimate - 1.96 * sqrt(variance)), ymax = 100 * (estimate + 1.96 * sqrt(variance))), width = 0, show.legend = FALSE) + 
    geom_hline(yintercept = 0) + 
    geom_line(data = . %>% dplyr::mutate(dist_bin = (dist_bin)), show.legend = FALSE, aes(group = Status)) + 
    cowplot::theme_half_open(7) + 
    theme(axis.text.x = element_text(angle = 90, hjust = 1)) + 
    labs(y = 'Percent of all cells', x = 'Distance Window', size = '# Cells', subtitle = 'IVW meta-analysis; mean & 95% CI, *padj<0.01', title = lineage) + 
    geom_text(
        data = text_data_asterisk,
        aes(y = 100 * height, label = asterisk), 
        size = 6, vjust = .2, show.legend = FALSE, color = 'black'
    ) + 
    geom_text(
        data = text_data_epi, 
        aes(label = label, y = ymax, x = x), 
        color = 'black', size = 3
    ) +
    geom_text(
        data = text_data_stroma, 
        aes(label = label, y = ymax, x = x), 
        color = 'black', size = 3
    ) +
    scale_color_manual(
        name = 'Interface Type: ', 
        values = c('hubPos' = '#D55E00', 'hubNeg' = '#009E73', 'MSS' = 'grey'), 
        labels = c('hubPos' = 'Hub-inside MMRd', 'hubNeg' = 'Hub-outside MMRd', 'MSS' = 'MMRp')
    ) + 
    guides(color = guide_legend(override.aes = list(size = 2, shape = 16))) + 
    theme(
        #aspect.ratio = 0.5, 
        axis.text.x = element_text(size = 7), 
        strip.background = element_rect(fill = NA), 
        strip.text = element_text(size = 10, face = 'bold', color = 'black'), 
        title = element_text(size = 7), 
        legend.position = 'right', 
        legend.text = element_text(size = 7)
    ) +
    guides(fill = guide_legend(override.aes = list(nrow = 1, shape = 16))) +
    ggtitle(mytype) +
    NULL
# Save before returning; statements placed after return() are never executed
ggsave(p1, filename = glue::glue('figs/cellstates/TNKILC/', mytype, '.pdf'), height = 4, width = 4, create.dir = TRUE)
return(p1)
})


### Supplement - T cell states
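
The panels in this section thin the x-axis labels to every fifth distance bin and always keep the last bin. A minimal sketch of that rule as a reusable function follows. The name `sparse_breaks` and the toy bin labels are assumptions made for illustration and do not appear elsewhere in the notebook.

```r
# Hypothetical helper: keep every `by`-th label plus the final label
sparse_breaks <- function(labels, by = 5) {
  if (length(labels) == 0) return(character(0))
  idx <- seq(1, length(labels), by = by)
  # unique() drops the last label if the regular sequence already ends on it
  unique(c(labels[idx], labels[length(labels)]))
}

# Toy stand-in for levels(plot_data$dist_bin): 40 ordered bin labels
toy_levels <- as.character(seq(-100, 95, by = 5))
breaks <- sparse_breaks(toy_levels)
print(breaks)
```

The result can be passed to `scale_x_discrete(breaks = ...)` in the same way as `x_axis_breaks`.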

In [None]:

library(ggh4x)  # provides facet_wrap2(); library() fails loudly if the package is missing
lineage = 'TNKILC'
tnkilc_order = c('Tcd4-IL7R','NK-CD16','Tcd8-gdlike','Tcd8-HOBIT', 'NK-XCL1','ILC3','Tcd4-TFH') %>% sort %>% rev
# "Tcd8-CXCL13", "Tcd8-HOBIT", "Tcd8-gdlike", "Tcd8-gdlike-PD1", "Tplzf-gdlike", "Tcd8-GZMK",
# "Tcd4-CXCL13","Tcd4-Treg", "Tcd4-IL7R", "Tcd4-TFH"
# --- Prepare data for the plot ---
plot_data <- df %>%
    filter(cell_type %in% tnkilc_order) %>%
    mutate(dist_bin = fct_reorder(dist_bin, midpoint)) %>%
    mutate(cell_type = factor(cell_type, ordered = TRUE, levels = tnkilc_order)) %>%
    mutate(Status = case_when(Status == 'MSS' ~ 'MMRp',.default =  Status))

# --- Prepare breaks for the x-axis ---
# 1. Get all possible x-axis labels from the data in the correct order
all_x_labels <- levels(plot_data$dist_bin)

# 2. Create a base vector containing every 5th label
x_axis_breaks_sparse <- all_x_labels[seq(1, length(all_x_labels), by = 5)]

# 3. Combine the sparse breaks with the last label and remove any duplicates
x_axis_breaks <- unique(c(x_axis_breaks_sparse, tail(all_x_labels, 1)))

# Prepare the data for the geom_text layers beforehand for clarity
text_data_asterisk <- final_results$summary_stats %>%
    filter(cell_type %in% tnkilc_order) %>%
    mutate(cell_type = factor(cell_type, ordered = TRUE, levels = tnkilc_order))

text_data_epi <- df %>%
    filter(cell_type %in% tnkilc_order) %>%
    mutate(cell_type = factor(cell_type, ordered = TRUE, levels = tnkilc_order)) %>%
    select(cell_type, ymax) %>%
    distinct() %>%
    mutate(ymax = ymax + ymax/3, x = 32, label = 'Epi')

text_data_stroma <- df %>%
    filter(cell_type %in% tnkilc_order) %>%
    mutate(cell_type = factor(cell_type, ordered = TRUE, levels = tnkilc_order)) %>%
    select(cell_type, ymax) %>%
    distinct() %>%
    mutate(ymax = ymax + ymax/3, x = 8, label = '    Stroma')

# Create the plot
panel_for_suppFig4_tnkilc = ggplot(
    data = plot_data, 
    aes(dist_bin, 100 * estimate, color = Status)
) +
    geom_vline(xintercept = c(20.5), linewidth = 0.5, linetype = 'dashed', color = 'red') +
    geom_point(size = 0.25) +
    geom_errorbar(
        aes(ymin = 100 * (estimate - 1.96 * sqrt(variance)), ymax = 100 * (estimate + 1.96 * sqrt(variance))), 
        width = 0, show.legend = FALSE, linewidth = 0.25
    ) +
    geom_line(
        data = . %>% dplyr::mutate(dist_bin = as.numeric(dist_bin)), 
        show.legend = FALSE, linewidth = 0.25
    ) +
    cowplot::theme_half_open(8) +
    theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
    labs(y = 'Percent of all cells', 
         x = expression(paste("Distance from interface (", mu, "m)")),
        # title = 'T cell states'
        ) +
    geom_text(
        data = text_data_asterisk,
        aes(y = 100 * height, label = asterisk),
        size = 3, vjust = .2, show.legend = FALSE, color = 'black'
    ) +
    geom_text(
        data = text_data_epi,
        aes(label = label, y = ymax, x = x),
        color = 'black', size = 2.5
    ) +
    geom_text(
        data = text_data_stroma,
        aes(label = label, y = ymax, x = x),
        color = 'black', size = 2.5
    ) +
    scale_color_manual(
        name = ' Interface Type: ',
        values = c('hubPos' = '#D55E00', 'hubNeg' = '#009E73', 'MMRp' = 'lightgrey'),
        labels = c('hubPos' = 'Hub-inside MMRd', 'hubNeg' = 'Hub-outside MMRd', 'MMRp' = 'MMRp')
    ) +
    
    scale_x_discrete(breaks = x_axis_breaks) +
    
    guides(color = guide_legend(override.aes = list(size = 4, shape = 16))) +
    facet_wrap2(~cell_type, scales = 'free_y', nrow = 2, axes = 'all', remove_labels = "x") +
    theme(axis.title = element_text(size = 10),
          axis.text = element_text(size = 7),
          strip.background = element_rect(fill = NA),
          strip.text = element_text(size = 8, face = 'bold', color = 'black'),
          title = element_text(size = 10),
          legend.title = element_text(size = 9, face = 'bold'),
          legend.text = element_text(size = 10),
          legend.position = 'top'#,
         # aspect.ratio = 1
    ) +
    guides(fill = guide_legend(override.aes = list(nrow = 1, shape = 16))) +
    ggtitle(label = '', subtitle = '') + #IVW meta-analysis; mean & 95% CI, *padj<0.01 
    NULL

# Set the final figure dimensions
fig.size(h = 3, w = 6.5)
options(repr.plot.res = 300)

panel_for_suppFig4_tnkilc #+ ggtitle('T/NK/ILC states')
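# Saved under its own file name so the main-figure panel written to figs/figure5/tnkilc.pdf does not overwrite it
ggsave(filename = 'figs/figure5/tnkilc_supp.pdf', plot = panel_for_suppFig4_tnkilc, width = 6.5, height = 4, units = 'in', create.dir = TRUE)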

### Main - T cell states

In [None]:

library(ggh4x)  # provides facet_wrap2(); library() fails loudly if the package is missing
lineage = 'TNKILC'
tnkilc_order = c("Tcd8-CXCL13", "Tcd8-gdlike-PD1", "Tplzf-gdlike", "Tcd8-GZMK", "Tcd4-CXCL13","Tcd4-Treg") %>% sort #%>% rev
# --- Prepare data for the plot ---
plot_data <- df %>%
    filter(cell_type %in% tnkilc_order) %>%
    mutate(dist_bin = fct_reorder(dist_bin, midpoint)) %>%
    mutate(cell_type = factor(cell_type, ordered = TRUE, levels = tnkilc_order)) %>%
    mutate(Status = case_when(Status == 'MSS' ~ 'MMRp',.default =  Status))

# --- Prepare breaks for the x-axis ---
# 1. Get all possible x-axis labels from the data in the correct order
all_x_labels <- levels(plot_data$dist_bin)

# 2. Create a base vector containing every 5th label
x_axis_breaks_sparse <- all_x_labels[seq(1, length(all_x_labels), by = 5)]

# 3. Combine the sparse breaks with the last label and remove any duplicates
x_axis_breaks <- unique(c(x_axis_breaks_sparse, tail(all_x_labels, 1)))

# Prepare the data for the geom_text layers beforehand for clarity
text_data_asterisk <- final_results$summary_stats %>%
    filter(cell_type %in% tnkilc_order) %>%
    mutate(cell_type = factor(cell_type, ordered = TRUE, levels = tnkilc_order))

text_data_epi <- df %>%
    filter(cell_type %in% tnkilc_order) %>%
    mutate(cell_type = factor(cell_type, ordered = TRUE, levels = tnkilc_order)) %>%
    select(cell_type, ymax) %>%
    distinct() %>%
    mutate(ymax = ymax + ymax/3, x = 32, label = 'Epi')

text_data_stroma <- df %>%
    filter(cell_type %in% tnkilc_order) %>%
    mutate(cell_type = factor(cell_type, ordered = TRUE, levels = tnkilc_order)) %>%
    select(cell_type, ymax) %>%
    distinct() %>%
    mutate(ymax = ymax + ymax/3, x = 8, label = '    Stroma')

# Create the plot
panel_for_fig_5_tnkilc = ggplot(
    data = plot_data, 
    aes(dist_bin, 100 * estimate, color = Status)
) +
    geom_vline(xintercept = c(20.5), linewidth = 0.5, linetype = 'dashed', color = 'red') +
    geom_point(size = 0.25) +
    geom_errorbar(
        aes(ymin = 100 * (estimate - 1.96 * sqrt(variance)), ymax = 100 * (estimate + 1.96 * sqrt(variance))), 
        width = 0, show.legend = FALSE, linewidth = 0.25
    ) +
    geom_line(
        data = . %>% dplyr::mutate(dist_bin = as.numeric(dist_bin)), 
        show.legend = FALSE, linewidth = 0.25
    ) +
    cowplot::theme_half_open(8) +
    theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
    labs(y = 'Percent of all cells', 
         x = expression(paste("Distance from interface (", mu, "m)")),
        # title = 'T cell states'
        ) +
    geom_text(
        data = text_data_asterisk,
        aes(y = 100 * height, label = asterisk),
        size = 3, vjust = .2, show.legend = FALSE, color = 'black'
    ) +
    geom_text(
        data = text_data_epi,
        aes(label = label, y = ymax, x = x),
        color = 'black', size = 2.5
    ) +
    geom_text(
        data = text_data_stroma,
        aes(label = label, y = ymax, x = x),
        color = 'black', size = 2.5
    ) +
    scale_color_manual(
        name = ' Interface Type: ',
        values = c('hubPos' = '#D55E00', 'hubNeg' = '#009E73', 'MMRp' = 'lightgrey'),
        labels = c('hubPos' = 'Hub-inside MMRd', 'hubNeg' = 'Hub-outside MMRd', 'MMRp' = 'MMRp')
    ) +
    
    scale_x_discrete(breaks = x_axis_breaks) +
    
    guides(color = guide_legend(override.aes = list(size = 4, shape = 16))) +
    facet_wrap2(~cell_type, scales = 'free_y', nrow = 2, axes = 'all', remove_labels = "x") +
    theme(axis.title = element_text(size = 7),
          axis.text = element_text(size = 7),
          strip.background = element_rect(fill = NA),
          strip.text = element_text(size = 7, face = 'bold', color = 'black'),
          title = element_text(size = 7),
          legend.title = element_text(size = 7, face = 'bold'),
          legend.text = element_text(size = 7),
          legend.position = 'top'#,
         # aspect.ratio = 1
    ) +
    guides(fill = guide_legend(override.aes = list(nrow = 1, shape = 16))) +
    ggtitle(label = '', subtitle = '') + #IVW meta-analysis; mean & 95% CI, *padj<0.01 
    NULL

# Set the final figure dimensions
fig.size(h = 3, w = 4.5)
options(repr.plot.res = 300)

panel_for_fig_5_tnkilc #+ ggtitle('T/NK/ILC states')
ggsave(filename = 'figs/figure5/tnkilc.pdf', plot = panel_for_fig_5_tnkilc , width = 4.5, height = 3, units = 'in', create.dir = TRUE)

### Plot: Myeloid Lineage

In [None]:
lineage_list[['Myeloid']]

## Individual plots

In [None]:
fig.size(h = 20, w = 20)
options(repr.plot.res = 300)
lineage = 'Myeloid'
# Facet order for the myeloid states (variable name is reused from the T/NK/ILC cells)
tnkilc_order = lineage_list[[lineage]]

# Prepare the data for the geom_text layers beforehand for clarity
text_data_asterisk <- final_results$summary_stats %>% 
    filter(cell_type %in% lineage_list[[lineage]]) %>%
    mutate(cell_type = factor(cell_type, ordered = TRUE, levels = tnkilc_order))

text_data_epi <- df %>% 
    filter(cell_type %in% lineage_list[[lineage]]) %>%
    mutate(cell_type = factor(cell_type, ordered = TRUE, levels = tnkilc_order)) %>% 
    select(cell_type, ymax) %>% 
    distinct() %>% 
    mutate(ymax = ymax + ymax/3, x = 32, label = 'Epi')

text_data_stroma <- df %>% 
    filter(cell_type %in% lineage_list[[lineage]]) %>% 
    mutate(cell_type = factor(cell_type, ordered = TRUE, levels = tnkilc_order)) %>% 
    select(cell_type, ymax) %>% 
    distinct() %>% 
    mutate(ymax = ymax + ymax/3, x = 8, label = '    Stroma')

# Create the plot
df %>% 
           filter(cell_type %in% lineage_list[[lineage]]) %>%
           mutate(cell_type = factor(cell_type, ordered = TRUE, levels = tnkilc_order)) %>%
          mutate(dist_bin = fct_reorder(dist_bin, midpoint)) %>%
ggplot(
    data = ., 
    aes(dist_bin, 100 * estimate, color = Status)
) + 
    geom_vline(xintercept = c(20.5), linewidth = 2, linetype = 1, color = 'grey') + 
    geom_point() + 
    geom_errorbar(aes(ymin = 100 * (estimate - 1.96 * sqrt(variance)), ymax = 100 * (estimate + 1.96 * sqrt(variance))), width = 0, show.legend = FALSE) + 
    geom_hline(yintercept = 0) + 
    geom_line(data = . %>% dplyr::mutate(dist_bin = (dist_bin)), show.legend = FALSE, aes(group = Status)) + 
    cowplot::theme_half_open(10) + 
    theme(axis.text.x = element_text(angle = 90, hjust = 1)) + 
    labs(y = 'Percent of all cells', x = 'Distance Window', size = '# Cells', subtitle = 'IVW meta-analysis; mean & 95% CI, *padj<0.01', title = lineage) + 
    geom_text(
        data = text_data_asterisk,
        aes(y = 100 * height, label = asterisk), 
        size = 6, vjust = .2, show.legend = FALSE, color = 'black'
    ) + 
    geom_text(
        data = text_data_epi, 
        aes(label = label, y = ymax, x = x), 
        color = 'black', size = 3
    ) +
    geom_text(
        data = text_data_stroma, 
        aes(label = label, y = ymax, x = x), 
        color = 'black', size = 3
    ) +
    scale_color_manual(
        name = 'Interface Type: ', 
        values = c('hubPos' = '#D55E00', 'hubNeg' = '#009E73', 'MSS' = 'grey'), 
        labels = c('hubPos' = 'Hub-inside MMRd', 'hubNeg' = 'Hub-outside MMRd', 'MSS' = 'MMRp')
    ) + 
    guides(color = guide_legend(override.aes = list(size = 2, shape = 16))) + 
    facet_wrap(~cell_type, scales = 'free') +
    theme(
        aspect.ratio = 0.5, 
        axis.text.x = element_text(size = 4), 
        strip.background = element_rect(fill = NA), 
        strip.text = element_text(size = 10, face = 'bold', color = 'black'), 
        title = element_text(size = 10), 
        legend.position = 'top', 
        legend.text = element_text(size = 10)
    ) +
    guides(fill = guide_legend(override.aes = list(nrow = 1, shape = 16))) +
    NULL

tnkilc_plots = lapply(tnkilc_order, function(mytype){

# Prepare the data for the geom_text layers beforehand for clarity
text_data_asterisk <- final_results$summary_stats %>% 
    filter(cell_type == mytype) %>%
    mutate(cell_type = factor(cell_type, ordered = TRUE, levels = tnkilc_order))

text_data_epi <- df %>% 
    filter(cell_type == mytype) %>%
    mutate(cell_type = factor(cell_type, ordered = TRUE, levels = tnkilc_order)) %>% 
    select(cell_type, ymax) %>% 
    distinct() %>% 
    mutate(ymax = ymax + ymax/3, x = 32, label = 'Epi')

text_data_stroma <- df %>% 
    filter(cell_type == mytype) %>%
    mutate(cell_type = factor(cell_type, ordered = TRUE, levels = tnkilc_order)) %>% 
    select(cell_type, ymax) %>% 
    distinct() %>% 
    mutate(ymax = ymax + ymax/3, x = 8, label = '    Stroma')
    
# Create the plot
p1 = df %>% 
           filter(cell_type == mytype) %>%
           mutate(cell_type = factor(cell_type, ordered = TRUE, levels = tnkilc_order)) %>%
          mutate(dist_bin = fct_reorder(dist_bin, midpoint)) %>%
ggplot(
    data = ., 
    aes(dist_bin, 100 * estimate, color = Status)
) + 
    geom_vline(xintercept = c(20.5), linewidth = 2, linetype = 1, color = 'grey') + 
    geom_point() + 
    geom_errorbar(aes(ymin = 100 * (estimate - 1.96 * sqrt(variance)), ymax = 100 * (estimate + 1.96 * sqrt(variance))), width = 0, show.legend = FALSE) + 
    geom_hline(yintercept = 0) + 
    geom_line(data = . %>% dplyr::mutate(dist_bin = (dist_bin)), show.legend = FALSE, aes(group = Status)) + 
    cowplot::theme_half_open(7) + 
    theme(axis.text.x = element_text(angle = 90, hjust = 1)) + 
    labs(y = 'Percent of all cells', x = 'Distance Window', size = '# Cells', subtitle = 'IVW meta-analysis; mean & 95% CI, *padj<0.01', title = lineage) + 
    geom_text(
        data = text_data_asterisk,
        aes(y = 100 * height, label = asterisk), 
        size = 6, vjust = .2, show.legend = FALSE, color = 'black'
    ) + 
    geom_text(
        data = text_data_epi, 
        aes(label = label, y = ymax, x = x), 
        color = 'black', size = 3
    ) +
    geom_text(
        data = text_data_stroma, 
        aes(label = label, y = ymax, x = x), 
        color = 'black', size = 3
    ) +
    scale_color_manual(
        name = 'Interface Type: ', 
        values = c('hubPos' = '#D55E00', 'hubNeg' = '#009E73', 'MSS' = 'grey'), 
        labels = c('hubPos' = 'Hub-inside MMRd', 'hubNeg' = 'Hub-outside MMRd', 'MSS' = 'MMRp')
    ) + 
    guides(color = guide_legend(override.aes = list(size = 2, shape = 16))) + 
    theme(
        #aspect.ratio = 0.5, 
        axis.text.x = element_text(size = 7), 
        strip.background = element_rect(fill = NA), 
        strip.text = element_text(size = 10, face = 'bold', color = 'black'), 
        title = element_text(size = 7), 
        legend.position = 'right', 
        legend.text = element_text(size = 7)
    ) +
    guides(fill = guide_legend(override.aes = list(nrow = 1, shape = 16))) +
    ggtitle(mytype) +
    NULL
# Save before returning; statements placed after return() are never executed
ggsave(p1, filename = glue::glue('figs/cellstates/Myeloid/', mytype, '.pdf'), height = 4, width = 4, create.dir = TRUE)
return(p1)
})


## Main figure
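
The main-figure cell shortens myeloid labels by stripping the `Myeloid-` prefix from DC, Macro, and Mono states while leaving other names such as `Myeloid-ISG` unchanged. The sketch below folds the three `gsub()` calls into one pattern. The function name `shorten_myeloid_label` is an assumption, and `'Myeloid-Mono'` is a made-up label used only to exercise the third branch.

```r
# Hypothetical helper: drop 'Myeloid-' only when it precedes DC, Macro, or Mono
shorten_myeloid_label <- function(x) {
  sub("^Myeloid-(DC|Macro|Mono)", "\\1", x)
}

toy_types <- c("Myeloid-DCmreg", "Myeloid-Macro-C1Q", "Myeloid-ISG", "Myeloid-Mono")
shortened <- shorten_myeloid_label(toy_types)
print(shortened)
```

Applying one shared function to `df` and to `final_results$summary_stats` would keep the facet labels of the plot data and the annotation layers in sync.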

In [None]:
library(ggh4x)  # provides facet_wrap2(); library() fails loudly if the package is missing
lineage = 'Myeloid'
myeloid_order <- str_wrap(c('Myeloid-ISG', 'DCmreg'), width = 20, whitespace_only = FALSE) # 'Macro-SEPP1-LYVE1', 'Macro-MMP9-APOE', 
#c('Myeloid-ISG', 'Myeloid-Macro-SEPP1-LYVE1', 
#                   'Myeloid-Macro-MMP9-APOE', 'Myeloid-Macro', 'Myeloid-Macro-C1Q',
#                   'Myeloid-ISGhigh', 'Myeloid-DCmreg', 'Myeloid-DC-pDC_ASDC') #unique(final_results$summary_stats$cell_type) %>% grep(pattern = 'Myeloid', value = TRUE)

# --- Prepare data for the plot ---
plot_data <- df %>%
    mutate(dist_bin = fct_reorder(dist_bin, midpoint)) %>%
    mutate(cell_type = gsub(cell_type, pattern = '^Myeloid-DC', replacement = 'DC')) %>%
    mutate(cell_type = gsub(cell_type, pattern = '^Myeloid-Macro', replacement = 'Macro')) %>%
    mutate(cell_type = gsub(cell_type, pattern = '^Myeloid-Mono', replacement = 'Mono')) %>%
    mutate(cell_type = str_wrap(cell_type, width = 20, whitespace_only = FALSE)) %>%
    filter(cell_type %in% myeloid_order) %>%
    mutate(cell_type = factor(cell_type, ordered = TRUE, levels = myeloid_order)) %>%
    mutate(Status = case_when(Status == 'MSS' ~ 'MMRp',.default =  Status))

plot_data$cell_type %>% unique
# --- Prepare breaks for the x-axis ---
# 1. Get all possible x-axis labels from the data in the correct order
all_x_labels <- levels(plot_data$dist_bin)

# 2. Create a base vector containing every 5th label
x_axis_breaks_sparse <- all_x_labels[seq(1, length(all_x_labels), by = 5)]

# 3. Combine the sparse breaks with the last label and remove any duplicates
x_axis_breaks <- unique(c(x_axis_breaks_sparse, tail(all_x_labels, 1)))

# Prepare the data for the geom_text layers beforehand for clarity
text_data_asterisk <- final_results$summary_stats %>%
    mutate(cell_type = gsub(cell_type, pattern = 'groups|\\.Hub\\+', replacement = '')) %>%
    mutate(cell_type = gsub(cell_type, pattern = '^Myeloid-DC', replacement = 'DC')) %>%
    mutate(cell_type = gsub(cell_type, pattern = '^Myeloid-Macro', replacement = 'Macro')) %>%
    mutate(cell_type = gsub(cell_type, pattern = '^Myeloid-Mono', replacement = 'Mono')) %>%
    mutate(cell_type = str_wrap(cell_type, width = 20, whitespace_only = FALSE)) %>%
    filter(cell_type %in% myeloid_order) %>%
    mutate(cell_type = factor(cell_type, ordered = TRUE, levels = myeloid_order))

text_data_epi <- df %>%
    mutate(cell_type = gsub(cell_type, pattern = 'groups|\\.Hub\\+', replacement = '')) %>%
    mutate(cell_type = gsub(cell_type, pattern = '^Myeloid-DC', replacement = 'DC')) %>%
    mutate(cell_type = gsub(cell_type, pattern = '^Myeloid-Macro', replacement = 'Macro')) %>%
    mutate(cell_type = gsub(cell_type, pattern = '^Myeloid-Mono', replacement = 'Mono')) %>%
    mutate(cell_type = str_wrap(cell_type, width = 20, whitespace_only = FALSE)) %>%
    filter(cell_type %in% myeloid_order) %>%
    mutate(cell_type = factor(cell_type, ordered = TRUE, levels = myeloid_order)) %>%
    mutate(Status = case_when(Status == 'MSS' ~ 'MMRp',.default =  Status)) %>%
    select(cell_type, ymax) %>%
    distinct() %>%
    mutate(ymax = ymax + ymax/3, x = 32, label = 'Epi')

text_data_stroma <- df %>%
    mutate(cell_type = gsub(cell_type, pattern = 'groups|\\.Hub\\+', replacement = '')) %>%
    mutate(cell_type = gsub(cell_type, pattern = '^Myeloid-DC', replacement = 'DC')) %>%
    mutate(cell_type = gsub(cell_type, pattern = '^Myeloid-Macro', replacement = 'Macro')) %>%
    mutate(cell_type = gsub(cell_type, pattern = '^Myeloid-Mono', replacement = 'Mono')) %>%
    mutate(cell_type = str_wrap(cell_type, width = 20, whitespace_only = FALSE)) %>%
    filter(cell_type %in% myeloid_order) %>%
    mutate(cell_type = factor(cell_type, ordered = TRUE, levels = myeloid_order)) %>%
    mutate(Status = case_when(Status == 'MSS' ~ 'MMRp',.default =  Status)) %>%
    select(cell_type, ymax) %>%
    distinct() %>%
    mutate(ymax = ymax + ymax/3, x = 8, label = '    Stroma')

# Create the plot
panel_for_fig_5_myeloid = ggplot(
    data = plot_data, 
    aes(dist_bin, 100 * estimate, color = Status)
) +
    geom_vline(xintercept = c(20.5), linewidth = 0.5, linetype = 'dashed', color = 'red') +
    geom_point(size = 0.25) +
    geom_errorbar(
        aes(ymin = 100 * (estimate - 1.96 * sqrt(variance)), ymax = 100 * (estimate + 1.96 * sqrt(variance))), 
        width = 0, show.legend = FALSE, linewidth = 0.25
    ) +
    geom_line(
        data = . %>% dplyr::mutate(dist_bin = as.numeric(dist_bin)), 
        show.legend = FALSE, linewidth = 0.25
    ) +
    cowplot::theme_half_open(8) +
    theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
    labs(y = 'Percent of all cells', 
         x = expression(paste("Distance from interface (", mu, "m)"))#,
        # title = 'Myeloid cell states'
        ) +
    geom_text(
        data = text_data_asterisk,
        aes(y = 100 * height, label = asterisk),
        size = 3, vjust = .2, show.legend = FALSE, color = 'black'
    ) +
    geom_text(
        data = text_data_epi,
        aes(label = label, y = ymax, x = x),
        color = 'black', size = 2.5
    ) +
    geom_text(
        data = text_data_stroma,
        aes(label = label, y = ymax, x = x),
        color = 'black', size = 2.5
    ) +
    scale_color_manual(
        name = 'IVW meta-analysis; mean & 95% CI, *padj<0.01  Interface Type: ',
        values = c('hubPos' = '#D55E00', 'hubNeg' = '#009E73', 'MMRp' = 'lightgrey'),
        labels = c('hubPos' = 'Hub+', 'hubNeg' = 'Hub-', 'MMRp' = 'MMRp')
    ) +
    
    scale_x_discrete(breaks = x_axis_breaks) +
    
    guides(color = guide_legend(override.aes = list(size = 4, shape = 16))) +
    facet_wrap2(~cell_type, scales = 'free_y', nrow = 2, axes = 'all', remove_labels = "x") +
    theme(axis.title = element_text(size = 10, face = 'plain'),
          axis.text = element_text(size = 7, face = 'plain'),
          strip.background = element_rect(fill = NA),
          strip.text = element_text(size = 8, face = 'bold', color = 'black'),
          title = element_text(size = 10, face = 'bold'),
          legend.title = element_text(size = 9, face = 'bold'),
          legend.text = element_text(size = 10),
          legend.position = 'none'
    ) +
    guides(fill = guide_legend(override.aes = list(nrow = 1, shape = 16))) +
    NULL

# Set the final figure dimensions
fig.size(h = 2, w = 3)
options(repr.plot.res = 300)

panel_for_fig_5_myeloid
ggsave(filename = 'figs/figure5/myeloid.pdf', plot = panel_for_fig_5_myeloid, width = 3, height = 2, units = 'in', create.dir = TRUE) # width = 7, height = 2

## Supplement

In [None]:

require(ggh4x)
lineage = 'Myeloid'
# NOTE: the name `tnkilc_order` appears to be carried over from a T/NK/ILC panel;
# in this cell it holds the order of the myeloid states
tnkilc_order = lineage_list[['Myeloid']]
tnkilc_order = tnkilc_order[!tnkilc_order %in% c('Myeloid-ISG', 'DCmreg')]
# --- Prepare data for the plot ---
plot_data <- df %>%
    filter(cell_type %in% tnkilc_order) %>%
    mutate(dist_bin = fct_reorder(dist_bin, midpoint)) %>%
    mutate(cell_type = factor(cell_type, ordered = TRUE, levels = tnkilc_order)) %>%
    mutate(Status = case_when(Status == 'MSS' ~ 'MMRp',.default =  Status))

# --- Prepare breaks for the x-axis ---
# 1. Get all possible x-axis labels from the data in the correct order
all_x_labels <- levels(plot_data$dist_bin)

# 2. Create a base vector containing every 5th label
x_axis_breaks_sparse <- all_x_labels[seq(1, length(all_x_labels), by = 5)]

# 3. Combine the sparse breaks with the last label and remove any duplicates
x_axis_breaks <- unique(c(x_axis_breaks_sparse, tail(all_x_labels, 1)))

# Prepare the data for the geom_text layers beforehand for clarity
text_data_asterisk <- final_results$summary_stats %>%
    filter(cell_type %in% tnkilc_order) %>%
    mutate(cell_type = factor(cell_type, ordered = TRUE, levels = tnkilc_order))

text_data_epi <- df %>%
    filter(cell_type %in% tnkilc_order) %>%
    mutate(cell_type = factor(cell_type, ordered = TRUE, levels = tnkilc_order)) %>%
    select(cell_type, ymax) %>%
    distinct() %>%
    mutate(ymax = ymax + ymax/3, x = 32, label = 'Epi')

text_data_stroma <- df %>%
    filter(cell_type %in% tnkilc_order) %>%
    mutate(cell_type = factor(cell_type, ordered = TRUE, levels = tnkilc_order)) %>%
    select(cell_type, ymax) %>%
    distinct() %>%
    mutate(ymax = ymax + ymax/3, x = 8, label = '    Stroma')

# Create the plot
panel_for_suppFig4_myeloid = ggplot(
    data = plot_data, 
    aes(dist_bin, 100 * estimate, color = Status)
) +
    geom_vline(xintercept = c(20.5), linewidth = 0.5, linetype = 'dashed', color = 'red') +
    geom_point(size = 0.25) +
    geom_errorbar(
        aes(ymin = 100 * (estimate - 1.96 * sqrt(variance)), ymax = 100 * (estimate + 1.96 * sqrt(variance))), 
        width = 0, show.legend = FALSE, linewidth = 0.25
    ) +
    geom_line(
        data = . %>% dplyr::mutate(dist_bin = as.numeric(dist_bin)), 
        show.legend = FALSE, linewidth = 0.25
    ) +
    cowplot::theme_half_open(8) +
    theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
    labs(y = 'Percent of all cells', 
         x = expression(paste("Distance from interface (", mu, "m)"))#,
        # title = 'Myeloid cell states'
        ) +
    geom_text(
        data = text_data_asterisk,
        aes(y = 100 * height, label = asterisk),
        size = 3, vjust = .2, show.legend = FALSE, color = 'black'
    ) +
    geom_text(
        data = text_data_epi,
        aes(label = label, y = ymax, x = x),
        color = 'black', size = 2.5
    ) +
    geom_text(
        data = text_data_stroma,
        aes(label = label, y = ymax, x = x),
        color = 'black', size = 2.5
    ) +
    scale_color_manual(
        name = ' Interface Type: ',
        values = c('hubPos' = '#D55E00', 'hubNeg' = '#009E73', 'MMRp' = 'lightgrey'),
        labels = c('hubPos' = 'Hub-inside MMRd', 'hubNeg' = 'Hub-outside MMRd', 'MMRp' = 'MMRp')
    ) +
    
    scale_x_discrete(breaks = x_axis_breaks) +
    
    guides(color = guide_legend(override.aes = list(size = 4, shape = 16))) +
    facet_wrap2(~cell_type, scales = 'free_y', axes = 'all', remove_labels = "x") +
    theme(axis.title = element_text(size = 10),
          axis.text = element_text(size = 7),
          strip.background = element_rect(fill = NA),
          strip.text = element_text(size = 8, face = 'bold', color = 'black'),
          title = element_text(size = 10),
          legend.title = element_text(size = 9, face = 'bold'),
          legend.text = element_text(size = 10),
          legend.position = 'top'#,
         # aspect.ratio = 1
    ) +
    guides(fill = guide_legend(override.aes = list(nrow = 1, shape = 16))) +
    ggtitle(label = '', subtitle = '') + #IVW meta-analysis; mean & 95% CI, *padj<0.01 
    NULL

# Set the final figure dimensions
fig.size(h = 3, w = 6.5)
options(repr.plot.res = 300)

panel_for_suppFig4_myeloid
ggsave(filename = 'figs/figure5/myeloid_supp.pdf', plot = panel_for_suppFig4_myeloid , width = 6.5, height = 5, units = 'in', create.dir = TRUE)

### Plot B, Plasma and Mast cells

In [None]:
fig.size(h = 9, w = 16)
options(repr.plot.res = 300)
lineage = 'B|Plasma|Mast'
ggplot(df %>% filter(cell_type %in% unlist(lineage_list[grepl(names(lineage_list), pattern = 'B|Plasma|Mast')])), aes(dist_bin, 100 * estimate, color = Status)) + 
    geom_vline(xintercept = c(20.5), linewidth = 2, linetype = 1, color = 'grey') +
    geom_point() + 
    geom_errorbar(aes(ymin = 100 * (estimate - 1.96 * sqrt(variance)), ymax = 100 * (estimate + 1.96 * sqrt(variance))), width = 0, show.legend = FALSE) + 
    geom_hline(yintercept = 0) + 
    geom_line(data = . %>% dplyr::mutate(dist_bin = as.numeric(dist_bin)), show.legend = FALSE) + 
    cowplot::theme_half_open(10) + 
    theme(axis.text.x = element_text(angle = 90, hjust = 1)) + 
    labs(y = 'Percent of all cells', x = 'Distance Window', size = '# Cells', subtitle = 'IVW meta-analysis; mean & 95% CI, *padj<0.01', title = lineage) + #, title = paste(.types, collapse = '; ')) + 
    geom_text(
        data = final_results$summary_stats %>% filter(cell_type %in% unlist(lineage_list[grepl(names(lineage_list), pattern = 'B|Plasma|Mast')])),
        aes(y = 100 * height, label = asterisk), size = 6, vjust = .2, show.legend = FALSE,
        color = 'black'
    ) + 
    geom_text(data = df  %>% filter(cell_type %in% unlist(lineage_list[grepl(names(lineage_list), pattern = 'B|Plasma|Mast')])) %>% select(cell_type, ymax) %>% distinct() %>% mutate(ymax = ymax + ymax/3, x = 32, label = 'Epi'), 
              aes(label = label, y = ymax, x = x), color = 'black', size = 3) +
    geom_text(data = df  %>% filter(cell_type %in% unlist(lineage_list[grepl(names(lineage_list), pattern = 'B|Plasma|Mast')])) %>% select(cell_type, ymax) %>% distinct() %>% mutate(ymax = ymax + ymax/3, x = 8, label = '    Stroma'), 
              aes(label = label, y = ymax, x = x), color = 'black', size = 3) +
    #annotate("text", x = 0.5, y = ymax + .05, label = 'Stromal Side', hjust = 0, size = 6) + 
    #annotate("text", x = 40.5, y = ymax + .05, label = 'Epithelial Side', hjust = 1, size = 6) + 
    scale_color_manual(values = c('hubPos' = '#D55E00', 'hubNeg' = '#009E73'), 
    name = 'Interface Type: ',
    labels = c('hubPos' = 'Hub+', 'hubNeg' = 'Hub-')) + 
    guides(color = guide_legend(override.aes = list(size = 2, shape = 16))) + 
    facet_wrap(~cell_type, scales = 'free') +
    theme(aspect.ratio = 0.5, axis.text.x = element_text(size = 4), strip.background = element_rect(fill = NA), strip.text = element_text(size = 10, face = 'bold', color = 'black'), title = element_text(size = 10), legend.position = 'top', legend.text=element_text(size=10)) +
    guides(fill = guide_legend(override.aes = list(nrow = 1, shape = 16))) +
    NULL

In [None]:
bplasma_order = df %>% filter(cell_type %in% unlist(lineage_list[grepl(names(lineage_list), pattern = 'B|Plasma|Mast')])) %>% pull(cell_type) %>% unique

lapply(bplasma_order, function(mytype){

# Prepare the data for the geom_text layers beforehand for clarity
# (factor levels use bplasma_order so that cell_type does not become NA)
text_data_asterisk <- final_results$summary_stats %>%
    filter(cell_type == mytype) %>%
    mutate(cell_type = factor(cell_type, ordered = TRUE, levels = bplasma_order))

text_data_epi <- df %>%
    filter(cell_type == mytype) %>%
    mutate(cell_type = factor(cell_type, ordered = TRUE, levels = bplasma_order)) %>%
    select(cell_type, ymax) %>%
    distinct() %>%
    mutate(ymax = ymax + ymax/3, x = 32, label = 'Epi')

text_data_stroma <- df %>%
    filter(cell_type == mytype) %>%
    mutate(cell_type = factor(cell_type, ordered = TRUE, levels = bplasma_order)) %>%
    select(cell_type, ymax) %>%
    distinct() %>%
    mutate(ymax = ymax + ymax/3, x = 8, label = '    Stroma')

# Create the plot
p1 = df %>%
           filter(cell_type == mytype) %>%
           mutate(cell_type = factor(cell_type, ordered = TRUE, levels = bplasma_order)) %>%
ggplot(
    data = ., 
    aes(dist_bin, 100 * estimate, color = Status)
) + 
    geom_vline(xintercept = c(20.5), linewidth = 2, linetype = 1, color = 'grey') +
    geom_point() + 
    geom_errorbar(aes(ymin = 100 * (estimate - 1.96 * sqrt(variance)), ymax = 100 * (estimate + 1.96 * sqrt(variance))), width = 0, show.legend = FALSE) + 
    geom_hline(yintercept = 0) + 
    geom_line(data = . %>% dplyr::mutate(dist_bin = (dist_bin)), show.legend = FALSE, aes(group = Status)) + 
    cowplot::theme_half_open(10) + 
    theme(axis.text.x = element_text(angle = 90, hjust = 1)) + 
    labs(y = 'Percent of all cells', x = 'Distance Window', size = '# Cells', subtitle = 'IVW meta-analysis; mean & 95% CI, *padj<0.01', title = lineage) + 
    geom_text(
        data = text_data_asterisk,
        aes(y = 100 * height, label = asterisk), 
        size = 6, vjust = .2, show.legend = FALSE, color = 'black'
    ) + 
    geom_text(
        data = text_data_epi, 
        aes(label = label, y = ymax, x = x), 
        color = 'black', size = 3
    ) +
    geom_text(
        data = text_data_stroma, 
        aes(label = label, y = ymax, x = x), 
        color = 'black', size = 3
    ) +
    scale_color_manual(
        name = 'Interface Type: ', 
        values = c('hubPos' = '#D55E00', 'hubNeg' = '#009E73', 'MSS' = 'grey'), 
        labels = c('hubPos' = 'Hub+ MMRd', 'hubNeg' = 'Hub- MMRd', 'MSS' = 'MMRp')
    ) + 
    guides(color = guide_legend(override.aes = list(size = 2, shape = 16))) + 
    theme(
        aspect.ratio = 0.5, 
        axis.text.x = element_text(size = 4), 
        strip.background = element_rect(fill = NA), 
        strip.text = element_text(size = 10, face = 'bold', color = 'black'), 
        title = element_text(size = 10), 
        legend.position = 'right', 
        legend.text = element_text(size = 10)
    ) +
    guides(fill = guide_legend(override.aes = list(nrow = 1, shape = 16))) +
    ggtitle(mytype) +
    NULL

ggsave(p1, filename = glue::glue('figs/cellstates/B_Plasma_Mast/', mytype, '.pdf'), height = 4, width = 4, create.dir = TRUE)})

### Plot Stromal cells

In [None]:
fig.size(h = 9, w = 16)
options(repr.plot.res = 300)
lineage = 'Strom'
strom_order = c("Fibro-BMP", "Fibro-CCL2", "Fibro-StemNiche", "Fibro-MMP3", "Fibro-CXCL14", "Fibro-GREM1", "Fibro-myo", "SmoothMuscle", "Pericyte", "Endo-art", "Endo-cap", "Endo", "Endo-tip", "Endo-ven", "Endo-lymph", "Schwann")

# Prepare the data for the geom_text layers beforehand for clarity
text_data_asterisk <- final_results$summary_stats %>% 
    filter(cell_type %in% lineage_list[[lineage]]) %>%
    mutate(cell_type = factor(cell_type, ordered = TRUE, levels = strom_order))

text_data_epi <- df %>% 
    filter(cell_type %in% lineage_list[[lineage]]) %>%
    mutate(cell_type = factor(cell_type, ordered = TRUE, levels = strom_order)) %>% 
    select(cell_type, ymax) %>% 
    distinct() %>% 
    mutate(ymax = ymax + ymax/3, x = 32, label = 'Epi')

text_data_stroma <- df %>% 
    filter(cell_type %in% lineage_list[[lineage]]) %>% 
    mutate(cell_type = factor(cell_type, ordered = TRUE, levels = strom_order)) %>% 
    select(cell_type, ymax) %>% 
    distinct() %>% 
    mutate(ymax = ymax + ymax/3, x = 8, label = '    Stroma')

# Create the plot
ggplot(
    data = df %>% 
           filter(cell_type %in% lineage_list[[lineage]]) %>%
           mutate(cell_type = factor(cell_type, ordered = TRUE, levels = strom_order)), 
    aes(dist_bin, 100 * estimate, color = Status)
) + 
    geom_vline(xintercept = c(20.5), linewidth = 2, linetype = 1, color = 'grey') +
    geom_point() + 
    geom_errorbar(aes(ymin = 100 * (estimate - 1.96 * sqrt(variance)), ymax = 100 * (estimate + 1.96 * sqrt(variance))), width = 0, show.legend = FALSE) + 
    geom_hline(yintercept = 0) + 
    geom_line(data = . %>% dplyr::mutate(dist_bin = as.numeric(dist_bin)), show.legend = FALSE) + 
    cowplot::theme_half_open(10) + 
    theme(axis.text.x = element_text(angle = 90, hjust = 1)) + 
    labs(y = 'Percent of all cells', x = 'Distance Window', size = '# Cells', subtitle = 'IVW meta-analysis; mean & 95% CI, *padj<0.01', title = lineage) + 
    geom_text(
        data = text_data_asterisk,
        aes(y = 100 * height, label = asterisk), 
        size = 6, vjust = .2, show.legend = FALSE, color = 'black'
    ) + 
    geom_text(
        data = text_data_epi, 
        aes(label = label, y = ymax, x = x), 
        color = 'black', size = 3
    ) +
    geom_text(
        data = text_data_stroma, 
        aes(label = label, y = ymax, x = x), 
        color = 'black', size = 3
    ) +
    scale_color_manual(
        name = 'Interface Type: ', 
        values = c('hubPos' = '#D55E00', 'hubNeg' = '#009E73'), 
        labels = c('hubPos' = 'Hub+', 'hubNeg' = 'Hub-')
    ) + 
    guides(color = guide_legend(override.aes = list(size = 2, shape = 16))) + 
    facet_wrap(~cell_type, scales = 'free') +
    theme(
        aspect.ratio = 0.5, 
        axis.text.x = element_text(size = 4), 
        strip.background = element_rect(fill = NA), 
        strip.text = element_text(size = 10, face = 'bold', color = 'black'), 
        title = element_text(size = 10), 
        legend.position = 'top', 
        legend.text = element_text(size = 10)
    ) +
    guides(fill = guide_legend(override.aes = list(nrow = 1, shape = 16))) +
    NULL


lapply(strom_order, function(mytype){

# Prepare the data for the geom_text layers beforehand for clarity
# (factor levels use strom_order so that cell_type does not become NA)
text_data_asterisk <- final_results$summary_stats %>%
    filter(cell_type == mytype) %>%
    mutate(cell_type = factor(cell_type, ordered = TRUE, levels = strom_order))

text_data_epi <- df %>%
    filter(cell_type == mytype) %>%
    mutate(cell_type = factor(cell_type, ordered = TRUE, levels = strom_order)) %>%
    select(cell_type, ymax) %>%
    distinct() %>%
    mutate(ymax = ymax + ymax/3, x = 32, label = 'Epi')

text_data_stroma <- df %>%
    filter(cell_type == mytype) %>%
    mutate(cell_type = factor(cell_type, ordered = TRUE, levels = strom_order)) %>%
    select(cell_type, ymax) %>%
    distinct() %>%
    mutate(ymax = ymax + ymax/3, x = 8, label = '    Stroma')

# Create the plot
p1 = df %>%
           filter(cell_type == mytype) %>%
           mutate(cell_type = factor(cell_type, ordered = TRUE, levels = strom_order)) %>%
ggplot(
    data = ., 
    aes(dist_bin, 100 * estimate, color = Status)
) + 
    geom_vline(xintercept = c(20.5), linewidth = 2, linetype = 1, color = 'grey') +
    geom_point() + 
    geom_errorbar(aes(ymin = 100 * (estimate - 1.96 * sqrt(variance)), ymax = 100 * (estimate + 1.96 * sqrt(variance))), width = 0, show.legend = FALSE) + 
    geom_hline(yintercept = 0) + 
    geom_line(data = . %>% dplyr::mutate(dist_bin = (dist_bin)), show.legend = FALSE, aes(group = Status)) + 
    cowplot::theme_half_open(10) + 
    theme(axis.text.x = element_text(angle = 90, hjust = 1)) + 
    labs(y = 'Percent of all cells', x = 'Distance Window', size = '# Cells', subtitle = 'IVW meta-analysis; mean & 95% CI, *padj<0.01', title = lineage) + 
    geom_text(
        data = text_data_asterisk,
        aes(y = 100 * height, label = asterisk), 
        size = 6, vjust = .2, show.legend = FALSE, color = 'black'
    ) + 
    geom_text(
        data = text_data_epi, 
        aes(label = label, y = ymax, x = x), 
        color = 'black', size = 3
    ) +
    geom_text(
        data = text_data_stroma, 
        aes(label = label, y = ymax, x = x), 
        color = 'black', size = 3
    ) +
    scale_color_manual(
        name = 'Interface Type: ', 
        values = c('hubPos' = '#D55E00', 'hubNeg' = '#009E73', 'MSS' = 'grey'), 
        labels = c('hubPos' = 'Hub+ MMRd', 'hubNeg' = 'Hub- MMRd', 'MSS' = 'MMRp')
    ) + 
    guides(color = guide_legend(override.aes = list(size = 2, shape = 16))) + 
    theme(
        aspect.ratio = 0.5, 
        axis.text.x = element_text(size = 4), 
        strip.background = element_rect(fill = NA), 
        strip.text = element_text(size = 10, face = 'bold', color = 'black'), 
        title = element_text(size = 10), 
        legend.position = 'right', 
        legend.text = element_text(size = 10)
    ) +
    guides(fill = guide_legend(override.aes = list(nrow = 1, shape = 16))) +
    ggtitle(mytype) +
    NULL

ggsave(p1, filename = glue::glue('figs/cellstates/Strom/', mytype, '.pdf'), height = 4, width = 4, create.dir = TRUE)})

### Plot: Key States of Interest

In [None]:
fig.size(h = 6, w = 10)
options(repr.plot.res = 500)
states_of_interest = c('Myeloid-ISG', 'Tcd4-Treg', 'Tcd8-CXCL13', 'Tcd8-HOBIT', 'Fibro-BMP', 'Myeloid-DCmreg')
ggplot(df %>% filter(cell_type %in% states_of_interest), aes(dist_bin, 100 * estimate, color = Status)) + 
    geom_vline(xintercept = c(20.5), linewidth = 2, linetype = 1, color = 'grey') +
    geom_point() + 
    geom_errorbar(aes(ymin = 100 * (estimate - 1.96 * sqrt(variance)), ymax = 100 * (estimate + 1.96 * sqrt(variance))), width = 0, show.legend = FALSE) + 
    geom_hline(yintercept = 0) + 
    geom_line(data = . %>% dplyr::mutate(dist_bin = as.numeric(dist_bin)), show.legend = FALSE) + 
    cowplot::theme_half_open(7) + 
    theme(axis.text.x = element_text(angle = 90, hjust = 1)) + 
    labs(y = 'Percent of all cells', x = 'Distance Window', size = '# Cells', subtitle = 'IVW meta-analysis; mean & 95% CI, *padj<0.01', title = '') + #, title = paste(.types, collapse = '; ')) + 
    geom_text(
        data = final_results$summary_stats %>% filter(cell_type %in% states_of_interest),
        aes(y = 100 * height, label = asterisk), size = 6, vjust = .2, show.legend = FALSE,
        color = 'black'
    ) + 
    geom_text(data = df  %>% filter(cell_type %in% states_of_interest) %>% select(cell_type, ymax) %>% distinct() %>% mutate(ymax = ymax + ymax/3, x = 32, label = 'Epi'), 
              aes(label = label, y = ymax, x = x), color = 'black', size = 3) +
    geom_text(data = df  %>% filter(cell_type %in% states_of_interest) %>% select(cell_type, ymax) %>% distinct() %>% mutate(ymax = ymax + ymax/3, x = 8, label = '    Stroma'), 
              aes(label = label, y = ymax, x = x), color = 'black', size = 3) +
    scale_color_manual(values = c('hubPos' = '#D55E00', 'hubNeg' = '#009E73'), 
    name = 'Interface Type: ',
    labels = c('hubPos' = 'Hub+', 'hubNeg' = 'Hub-')) + 
    guides(color = guide_legend(override.aes = list(size = 2, shape = 16))) + 
    facet_wrap(~cell_type, scales = 'free', nrow = 2) +
    theme(aspect.ratio = 0.5, axis.text.x = element_text(size = 4), strip.background = element_rect(fill = NA), strip.text = element_text(size = 10, face = 'bold', color = 'black'), title = element_text(size = 10), legend.position = 'top', legend.text=element_text(size=10)) +
    guides(fill = guide_legend(override.aes = list(nrow = 1, shape = 16))) +
    NULL

## 8. Examine the spatial patterning of lineages

### `summarize_lineages_by_interface_proximity`

This is the core data processing function. For a given sample, it takes cell coordinates and interface geometries as input and performs the following steps:
- Calculates the distance from each cell to its nearest interface.
- Annotates each cell with the type of that nearest interface.
- Signs the distance: cells in 'Stromal-enriched' regions get negative values, all other cells positive values.
- Bins the cells into 5 µm intervals spanning -100 to 100 µm; cells outside this range are dropped.
- Keeps only cells whose tile hub status (`cxcl_pos_tile`) matches the type of their nearest interface.
- Returns a named list of matrices, one per interface type, each holding the counts of every cell type in every distance bin.
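
The signing and binning steps can be illustrated on a handful of made-up distances. This is a minimal base-R sketch, not the notebook's function: the vectors `dist` and `region` are hypothetical stand-ins for `dist_interface` and `tessera_annotation`, and the label `'Epithelial-enriched'` is assumed purely for illustration.

```r
# Toy inputs (hypothetical values, not taken from the dataset)
dist   <- c(2.5, 7.0, 12.0, 98.0, 150.0)   # unsigned distance to nearest interface (um)
region <- c('Stromal-enriched', 'Epithelial-enriched', 'Stromal-enriched',
            'Epithelial-enriched', 'Stromal-enriched')

# Stromal-side cells get negative distances; all others stay positive
signed <- ifelse(region == 'Stromal-enriched', -dist, dist)

# 5 um bins spanning -100 to 100 um; values outside that range become NA
breaks   <- seq(-100, 100, by = 5)
dist_bin <- cut(signed, breaks = breaks, include.lowest = TRUE)

data.frame(signed, dist_bin)
```

With these breaks there are 40 bins, 20 on each side of zero. When every bin appears on the x-axis, the interface therefore falls between the 20th and 21st positions, which is consistent with the vertical line drawn at `xintercept = 20.5` in the plots.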

In [None]:
summarize_lineages_by_interface_proximity = function(cells, interfaces) {
    ## Get distances and closest interface type
    pts = st_as_sf(cells[, .(X, Y)], coords = c('X', 'Y'))
    geos_pts = geos::as_geos_geometry(pts$geometry)
    geos_lines = geos::as_geos_geometry(interfaces$x[1:nrow(interfaces)])
    
    nearest_interfaces_idx = geos::geos_nearest(geos_pts, geos_lines)
    
    cells$closest_interface_type = interfaces$Type_of_Interface[nearest_interfaces_idx]
    cells$dist_interface = geos::geos_distance(geos_pts, geos_lines[nearest_interfaces_idx])
    
    ## Assign sign to distances
    cells$dist_interface_signed = fifelse(
        cells$tessera_annotation == 'Stromal-enriched',
        -cells$dist_interface,
        cells$dist_interface
    )
    
    ## Assign cells to 5um bins
    dist_breaks = seq(-100, 100, by = 5)
    cells$dist_bin = cut(cells$dist_interface_signed, breaks = dist_breaks, include.lowest = TRUE)

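    # --- Robust summarization ---

    # Keep only cells whose tile hub status matches the type of the nearest interface
    # (data.table subsetting; rows where the condition evaluates to NA are dropped)
    cells = cells[
        (closest_interface_type == 'CXCLpos tumor & CXCLpos stroma' & cxcl_pos_tile == 'CXCL_pos') |
        (closest_interface_type == 'CXCLneg tumor & CXCLneg stroma' & cxcl_pos_tile == 'CXCL_neg')
    ]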
    
    cells_in_range = cells[!is.na(dist_bin)]
    
    if (nrow(cells_in_range) == 0) {
        warning("No cells found within the -100 to 100µm distance range.")
        return(list())
    }

    all_interface_types = unique(cells$closest_interface_type)
    cells_in_range[, closest_interface_type := factor(closest_interface_type, levels = all_interface_types)]

    counts_long = cells_in_range[, .N, by = .(closest_interface_type, dist_bin, type_lvl1)]

    counts_wide = dcast(counts_long,
                        closest_interface_type + dist_bin ~ type_lvl1,
                        value.var = "N",
                        fill = 0,
                        drop = FALSE)

    result_list = split(counts_wide, by = "closest_interface_type")

    result_list = lapply(result_list, function(dt) {
        row_names = dt$dist_bin
        count_cols = setdiff(names(dt), c("closest_interface_type", "dist_bin"))
        mat = as.matrix(dt[, ..count_cols])
        rownames(mat) = row_names
        return(mat)
    })

    return(result_list)
}

### 9. Main Analysis: Calculate Distances and Bin Counts

This is the main computational step. We use `future_map` to run the `summarize_lineages_by_interface_proximity` function in parallel for each sample. This generates a list where each element corresponds to a sample and contains the binned cell counts for its different interface types.

In [None]:
options(future.globals.maxSize = 1e10)

system.time({
    counts_list = future_map(ids, function(.id) {
        summarize_lineages_by_interface_proximity(cells[SampleID == .id], interfaces[[.id]])    
    }, .options = furrr::furrr_options(seed=TRUE))
    names(counts_list) = ids
})

### 10. Post-processing: Stratify and Standardize Data

After calculating the counts, we separate them based on the interface type ('hub positive' vs. 'hub negative'). We then use the `standardize_matrix_columns` utility function to ensure that all count matrices have the exact same set of cell type columns, which is essential for the downstream meta-analysis.
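
`standardize_matrix_columns` is defined elsewhere in the notebook and is not reproduced here. As a rough sketch of the idea, assuming it pads each matrix with zero-filled columns for cell types that a sample lacks, something like the following would do; `pad_columns_sketch` is a hypothetical name and may differ from the actual implementation.

```r
pad_columns_sketch <- function(mats) {
    # Union of column names across all matrices (NULL entries contribute nothing)
    all_cols <- sort(unique(unlist(lapply(mats, colnames))))
    lapply(mats, function(m) {
        if (is.null(m)) return(NULL)  # sample has no interface of this type
        missing_cols <- setdiff(all_cols, colnames(m))
        if (length(missing_cols) > 0) {
            pad <- matrix(0, nrow = nrow(m), ncol = length(missing_cols),
                          dimnames = list(rownames(m), missing_cols))
            m <- cbind(m, pad)
        }
        m[, all_cols, drop = FALSE]   # same column order in every matrix
    })
}

# Two toy count matrices with different cell-type columns
a <- matrix(1:4, nrow = 2, dimnames = list(c('(-5,0]', '(0,5]'), c('Epi', 'Strom')))
b <- matrix(5:6, nrow = 2, dimnames = list(c('(-5,0]', '(0,5]'), 'Myeloid'))
out <- pad_columns_sketch(list(s1_hubPos = a, s2_hubPos = b))
colnames(out$s1_hubPos)
```

Padding with zeros rather than dropping columns keeps a cell type in the analysis even when it is absent from some samples.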

In [None]:
# Separate lists for hub positive and hub negative interfaces
hubPos_counts_list = lapply(counts_list, function(x){return(x[['CXCLpos tumor & CXCLpos stroma']])})
names(hubPos_counts_list) = paste0(names(counts_list), '_hubPos')

hubNeg_counts_list = lapply(counts_list, function(x){return(x[['CXCLneg tumor & CXCLneg stroma']])})
names(hubNeg_counts_list) = paste0(names(counts_list), '_hubNeg')

# Combine them back into a single list and standardize columns
counts_list = c(hubPos_counts_list, hubNeg_counts_list)
counts_list = standardize_matrix_columns(counts_list)

### 11. Global Analysis Across All Cell Types

Now we run the main analysis function, `run_global_hub_analysis`. This function iterates through every cell type, performs the meta-analysis comparing hub-positive and hub-negative interfaces, calculates statistics, and returns a set of clean data frames ready for plotting.
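
The plot subtitles describe the pooled values as an "IVW meta-analysis". The body of `run_global_hub_analysis` is not shown in this section, so the following is only a sketch of fixed-effect inverse-variance weighting, assuming each sample contributes a proportion estimate and its variance. The function name `ivw_pool_sketch` and the input numbers are hypothetical.

```r
ivw_pool_sketch <- function(estimates, variances) {
    w <- 1 / variances                       # inverse-variance weights
    pooled <- sum(w * estimates) / sum(w)    # weighted mean across samples
    pooled_var <- 1 / sum(w)                 # variance of the weighted mean
    c(estimate = pooled,
      variance = pooled_var,
      lower = pooled - 1.96 * sqrt(pooled_var),
      upper = pooled + 1.96 * sqrt(pooled_var))
}

# Three hypothetical per-sample proportions for one cell type in one distance bin
res <- ivw_pool_sketch(estimates = c(0.10, 0.20, 0.30),
                       variances = c(0.01, 0.01, 0.04))
res
```

Samples with smaller variance pull the pooled estimate toward their own value, so here the result sits closer to 0.10 and 0.20 than to 0.30. The `estimate` and `variance` columns used in the plotting cells have the same form as this output, which is why the error bars are drawn as `estimate ± 1.96 * sqrt(variance)`.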

In [None]:
# Create a list of cell types (lineages) to iterate over
cellTypes = cells %>%
    select(type_lvl1) %>%
    distinct()

type_list <- lapply(split(cellTypes$type_lvl1, cellTypes$type_lvl1), unique)

# Run the full analysis
final_results <- run_global_hub_analysis(type_list, counts_list)

# Display the glimpse of the main summary table
glimpse(final_results$summary_stats)

### 12. Visualization

In this final section, we generate plots to visualize the results. We create faceted plots that group cell types by their major lineage (e.g., T-cells, Myeloid cells) to compare their distribution profiles between hub-positive and hub-negative interfaces.

In [None]:
# Prepare data for plotting by combining the hubPos and hubNeg results with the precomputed MSS (MMRp) lineage results
df = bind_rows(list(hubPos = final_results$hubPos_results,
                    MSS = fread('input_data/MSS_lineages.csv'), 
                    hubNeg = final_results$hubNeg_results), .id = 'Status') 

# Calculate y-axis limits for plotting
df <- df %>%
    group_by(cell_type) %>%
    mutate(ymax = 100 * max(estimate + 1.96 * sqrt(variance))) %>%
    ungroup

# Create a list for grouping cell types by lineage
# (at this level each lineage maps only to itself)
lineage_list <- cells %>%
    select(type_lvl1) %>%
    distinct() %>%
    {split(.$type_lvl1, .$type_lvl1)}

### 13. Plot: All Lineages

In [None]:
sample_n(df, 20)

In [None]:
df %>% select(Status, cell_type) %>% distinct

In [None]:
fig.size(h = 9, w = 16)
options(repr.plot.res = 300)

# Define the desired order for the facets
lineage_order <- c("TNKILC", "Epi", "Strom", "Myeloid", "B", "Plasma", "Mast")

# Prepare the data for the geom_text layers beforehand for clarity
text_data_asterisk <- final_results$summary_stats %>% 
    mutate(cell_type = factor(cell_type, ordered = TRUE, levels = lineage_order))

text_data_epi <- df %>% 
    mutate(cell_type = factor(cell_type, ordered = TRUE, levels = lineage_order)) %>% 
    select(cell_type, ymax) %>% 
    distinct() %>% 
    mutate(ymax = ymax + ymax/3, x = 32, label = 'Epi')

text_data_stroma <- df %>% 
    mutate(cell_type = factor(cell_type, ordered = TRUE, levels = lineage_order)) %>% 
    select(cell_type, ymax) %>% 
    distinct() %>% 
    mutate(ymax = ymax + ymax/3, x = 8, label = '    Stroma')

# Create the plot
df %>% mutate(cell_type = factor(cell_type, ordered = TRUE, levels = lineage_order)) %>%
ggplot(
    aes(dist_bin, 100 * estimate, color = Status)
) + 
    geom_vline(xintercept = c(20.5), linewidth = 2, linetype = 1, color = 'grey') +
    geom_point() + 
    geom_errorbar(aes(ymin = 100 * (estimate - 1.96 * sqrt(variance)), ymax = 100 * (estimate + 1.96 * sqrt(variance))), width = 0, show.legend = FALSE) + 
    geom_hline(yintercept = 0) + 
    geom_line(aes(group = Status), show.legend = FALSE) + 
    cowplot::theme_half_open(10) + 
    theme(axis.text.x = element_text(angle = 90, hjust = 1)) + 
    labs(y = 'Percent of all cells', x = 'Distance Window', size = '# Cells', subtitle = 'IVW meta-analysis; mean & 95% CI, *padj<0.01', title = 'Lineages') + 
    geom_text(
        data = text_data_asterisk,
        aes(y = 100 * height, label = asterisk), 
        size = 6, vjust = .2, show.legend = FALSE, color = 'black'
    ) + 
    geom_text(
        data = text_data_epi, 
        aes(label = label, y = ymax, x = x), 
        color = 'black', size = 3
    ) +
    geom_text(
        data = text_data_stroma, 
        aes(label = label, y = ymax, x = x), 
        color = 'black', size = 3
    ) +
    scale_color_manual(
        name = 'Interface Type: ', 
        values = c('hubPos' = '#D55E00', 'hubNeg' = '#009E73', 'MSS' = 'grey'), 
        labels = c('hubPos' = 'Hub+', 'hubNeg' = 'Hub-', 'MSS' = 'MMRp')
    ) + 
    guides(color = guide_legend(override.aes = list(size = 2, shape = 16))) + 
    facet_wrap(~cell_type, scales = 'free') +
    theme(
        aspect.ratio = 0.5, 
        axis.text.x = element_text(size = 4), 
        strip.background = element_rect(fill = NA), 
        strip.text = element_text(size = 10, face = 'bold', color = 'black'), 
        title = element_text(size = 10), 
        legend.position = 'top', 
        legend.text = element_text(size = 10)
    ) +
    guides(fill = guide_legend(override.aes = list(nrow = 1, shape = 16))) +
    NULL

In [None]:
require(ggh4x)
# Define the desired order for the facets
myeloid_order <- c('Epi', 'Strom', 'Myeloid', 'TNKILC') #c("TNKILC", "Epi", "Strom", "Myeloid", "B", "Plasma", "Mast")

# --- Prepare data for the plot ---
plot_data <- df %>%
    filter(cell_type %in% myeloid_order) %>%
    mutate(midpoint = unlist(lapply(dist_bin, find_midpoint))) %>%
    mutate(dist_bin = fct_reorder(dist_bin, midpoint)) %>%
    mutate(cell_type = factor(cell_type, ordered = TRUE, levels = myeloid_order)) %>%
    mutate(Status = case_when(Status == 'MSS' ~ 'MMRp',.default =  Status))

# --- Prepare breaks for the x-axis ---
# 1. Get all possible x-axis labels from the data in the correct order
all_x_labels <- levels(plot_data$dist_bin)

# 2. Create a base vector containing every 5th label
x_axis_breaks_sparse <- all_x_labels[seq(1, length(all_x_labels), by = 5)]

# 3. Combine the sparse breaks with the last label and remove any duplicates
x_axis_breaks <- unique(c(x_axis_breaks_sparse, tail(all_x_labels, 1)))

# Prepare the data for the geom_text layers beforehand for clarity
text_data_asterisk <- final_results$summary_stats %>%
    filter(cell_type %in% myeloid_order) %>%
    mutate(cell_type = factor(cell_type, ordered = TRUE, levels = myeloid_order))

text_data_epi <- df %>%
    filter(cell_type %in% myeloid_order) %>%
    mutate(cell_type = factor(cell_type, ordered = TRUE, levels = myeloid_order)) %>%
    select(cell_type, ymax) %>%
    distinct() %>%
    mutate(ymax = ymax + ymax/3, x = 32, label = 'Epi')

text_data_stroma <- df %>%
    filter(cell_type %in% myeloid_order) %>%
    mutate(cell_type = factor(cell_type, ordered = TRUE, levels = myeloid_order)) %>%
    select(cell_type, ymax) %>%
    distinct() %>%
    mutate(ymax = ymax + ymax/3, x = 8, label = '    Stroma')

# Create the plot
panel_for_fig_5_lineages = ggplot(
    data = plot_data, 
    aes(dist_bin, 100 * estimate, color = Status)
) +
    geom_vline(xintercept = c(20.5), linewidth = 0.5, linetype = 'dashed', color = 'red') +
    geom_point(size = 0.25) +
    geom_errorbar(
        aes(ymin = 100 * (estimate - 1.96 * sqrt(variance)), ymax = 100 * (estimate + 1.96 * sqrt(variance))), 
        width = 0, show.legend = FALSE, linewidth = 0.25
    ) +
    geom_line(
        data = . %>% dplyr::mutate(dist_bin = as.numeric(dist_bin)), 
        show.legend = FALSE, linewidth = 0.25
    ) +
    cowplot::theme_half_open(8) +
    theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
    labs(y = 'Percent of all cells', 
         x = expression(paste("Distance from interface (", mu, "m)"))#,
         #title = 'Cell Lineages'#,
         #subtitle = 'IVW meta-analysis; mean & 95% CI, *padj<0.01'
        ) +
    geom_text(
        data = text_data_asterisk,
        aes(y = 100 * height, label = asterisk),
        size = 3, vjust = .2, show.legend = FALSE, color = 'black'
    ) +
    geom_text(
        data = text_data_epi,
        aes(label = label, y = ymax, x = x),
        color = 'black', size = 2.5
    ) +
    geom_text(
        data = text_data_stroma,
        aes(label = label, y = ymax, x = x),
        color = 'black', size = 2.5
    ) +
    scale_color_manual(
        name = 'Interface Type: ',
        values = c('hubPos' = '#D55E00', 'hubNeg' = '#009E73', 'MMRp' = 'lightgrey'),
        labels = c('hubPos' = 'Hub-inside MMRd', 'hubNeg' = 'Hub-outside MMRd', 'MMRp' = 'MMRp')
    ) +
    
    scale_x_discrete(breaks = x_axis_breaks) +
    
    guides(color = guide_legend(override.aes = list(size = 4, shape = 16))) +
    facet_wrap2(~cell_type, scales = 'free_y', nrow = 1, axes = 'all', remove_labels = "x") +
    theme(axis.title = element_text(size = 8, face = 'plain'),
          axis.text = element_text(size = 7, face = 'plain'),
          strip.background = element_rect(fill = NA),
          strip.text = element_text(size = 8, face = 'bold', color = 'black'),
          title = element_text(size = 8, face = 'bold'),
          legend.title = element_text(size = 10, face = 'bold'),
          legend.text = element_text(size = 10),
          legend.position = 'top'
    ) +
    guides(fill = guide_legend(override.aes = list(nrow = 1, shape = 16))) +
    NULL

# Set the final figure dimensions
fig.size(h = 2, w = 7)
options(repr.plot.res = 300)
panel_for_fig_5_lineages
ggsave(filename = 'figs/figure5/selected_lineages.pdf', plot = panel_for_fig_5_lineages, width = 7, height = 2, units = 'in', create.dir = TRUE)

# Supplementary table 4b:

In [None]:
df %>% 
rename(Interface = Status) %>%
mutate(Interface = Interface %>% as.factor %>% fct_recode('Hub-inside' = 'hubPos', 'Hub-outside' = 'hubNeg', 'MMRp' = 'MSS')) %>%
mutate(estimate = 100*estimate) %>%
select(!c(asterisk, ymax)) %>%
# estimate is now in percent while variance is still on the proportion scale,
# so the half-width of the interval is scaled by 100 to match
mutate(`Lower Confidence Limit` = estimate - 100 * 1.96 * sqrt(variance)) %>%
mutate(`Upper Confidence Limit` = estimate + 100 * 1.96 * sqrt(variance)) %>%
rename(`Adjusted p-value` = padj) %>%
rename(`Spatial bin around the interface` = dist_bin) %>%
# p-values are not reported for MMRp interfaces
mutate(p = ifelse(test = Interface == 'MMRp', yes = NA, no = p)) %>%
mutate(`Adjusted p-value` = ifelse(test = Interface == 'MMRp', yes = NA, no = `Adjusted p-value`)) %>%
write.csv('figs/cellstates/table_of_cell_lineages_as_prop_of_all_cells.csv')
unique(df$Status)

# Align panel for Figure 5

In [None]:
require(patchwork)

figure5 = (panel_for_fig_5_lineages ) + # + ggtitle('Lineages')
    (panel_for_fig_5_myeloid + 
    facet_wrap2(~cell_type, scales = 'free_y', nrow = 2, axes = 'all', remove_labels = "x") +
    scale_color_manual(
        name = 'Interface Type: ',
        values = c('hubPos' = '#D55E00', 'hubNeg' = '#009E73', 'MSS' = 'lightgrey'),
        labels = c('hubPos' = 'Hub-inside MMRd', 'hubNeg' = 'Hub-outside MMRd', 'MSS' = 'MMRp')
    )) +
    (panel_for_fig_5_tnkilc + 
    facet_wrap2(~cell_type, scales = 'free_y', nrow = 2, axes = 'all', remove_labels = "x") +
    scale_color_manual(
        name = 'Interface Type: ',
        values = c('hubPos' = '#D55E00', 'hubNeg' = '#009E73', 'MSS' = 'lightgrey'),
        labels = c('hubPos' = 'Hub-inside MMRd', 'hubNeg' = 'Hub-outside MMRd', 'MSS' = 'MMRp')
    )) +
    plot_layout(ncol = 1, 
                design = 'AAAA\nBCCC\nBCCC', 
                guides = 'collect') & theme(plot.margin = unit(c(0,0.1,0,0), "cm"), 
                                           legend.position = 'top', legend.justification = "center",
                                           axis.title = element_text(size = 8, face = 'plain'),
          axis.text = element_text(size = 7, face = 'plain'),
          strip.background = element_rect(fill = NA),
          strip.text = element_text(size = 7, face = 'bold', color = 'black'),
          title = element_text(size = 7, face = 'bold'),
          legend.title = element_text(size = 10, face = 'bold'),
          legend.text = element_text(size = 10)
    )
fig.size(w = 6.5, h = 4.15)
figure5
ggsave(figure5, filename = 'figs/figure5/figure5_line_traces.pdf', width = 6.5, height = 4.15, units = 'in')

In [None]:
fig.size(h = 9, w = 16)
options(repr.plot.res = 300)

# Define the desired order for the facets
lineage_order <- c("TNKILC", "Epi", "Strom", "Myeloid", "B", "Plasma", "Mast")

lapply(lineage_order, function(lineage){
# Prepare the data for the geom_text layers beforehand for clarity
text_data_asterisk <- final_results$summary_stats %>% 
    filter(cell_type==lineage)

text_data_epi <- df %>% 
    filter(cell_type==lineage) %>%
    select(cell_type, ymax) %>% 
    distinct() %>% 
    mutate(ymax = ymax + ymax/3, x = 32, label = 'Epi')

text_data_stroma <- df %>% 
    filter(cell_type==lineage) %>%
    select(cell_type, ymax) %>% 
    distinct() %>% 
    mutate(ymax = ymax + ymax/3, x = 8, label = '    Stroma')

# Create the plot
p1 = df %>% 
   filter(cell_type == lineage) %>%
ggplot(
    aes(dist_bin, 100 * estimate, color = Status)
) + 
    geom_vline(xintercept = 20.5, linewidth = 2, linetype = 1, color = 'grey') + 
    geom_point() + 
    geom_errorbar(aes(ymin = 100 * (estimate - 1.96 * sqrt(variance)), ymax = 100 * (estimate + 1.96 * sqrt(variance))), width = 0, show.legend = FALSE) + 
    geom_hline(yintercept = 0) + 
    geom_line(aes(group = Status), show.legend = FALSE) + 
    cowplot::theme_half_open(10) + 
    theme(axis.text.x = element_text(angle = 90, hjust = 1)) + 
    labs(y = 'Percent of all cells', x = 'Distance Window', size = '# Cells', subtitle = 'IVW meta-analysis; mean & 95% CI, *padj<0.01', title = 'Lineages') + 
    geom_text(
        data = text_data_asterisk,
        aes(y = 100 * height, label = asterisk), 
        size = 6, vjust = .2, show.legend = FALSE, color = 'black'
    ) + 
    geom_text(
        data = text_data_epi, 
        aes(label = label, y = ymax, x = x), 
        color = 'black', size = 3
    ) +
    geom_text(
        data = text_data_stroma, 
        aes(label = label, y = ymax, x = x), 
        color = 'black', size = 3
    ) +
    scale_color_manual(
        name = 'Interface Type: ', 
        values = c('hubPos' = '#D55E00', 'hubNeg' = '#009E73', 'MSS' = 'grey'), 
        labels = c('hubPos' = 'Hub-inside MMRd', 'hubNeg' = 'Hub-outside MMRd', 'MSS' = 'MMRp')
    ) + 
    guides(color = guide_legend(override.aes = list(size = 2, shape = 16))) + 
    theme(
        aspect.ratio = 0.5, 
        axis.text.x = element_text(size = 4), 
        strip.background = element_rect(fill = NA), 
        strip.text = element_text(size = 10, face = 'bold', color = 'black'), 
        title = element_text(size = 10), 
        legend.position = 'top', 
        legend.text = element_text(size = 10)
    ) +
    guides(fill = guide_legend(override.aes = list(nrow = 1, shape = 16))) +
    ggtitle(lineage) +
    NULL
    print(p1)
ggsave(p1, filename = glue::glue('figs/cellstates/lineages/', lineage, '.pdf'), width =4, height = 4, create.dir = TRUE)})

# Supplement: all lineages in single row

In [None]:
final_results$summary_stats = final_results$summary_stats %>%
    mutate(dist_bin = factor(dist_bin)) %>%
    mutate(midpoint = forcats::fct_recode(dist_bin, !!!midpoints_of_bins)) %>%
    mutate(midpoint = as.vector(midpoint)) %>%
    mutate(midpoint = as.numeric(midpoint))

df = df %>%
    mutate(dist_bin = factor(dist_bin)) %>%
    mutate(midpoint = forcats::fct_recode(dist_bin, !!!midpoints_of_bins)) %>%
    mutate(midpoint = as.vector(midpoint)) %>%
    mutate(midpoint = as.numeric(midpoint))

final_results$summary_stats %>%
    select(dist_bin, midpoint) %>%
    distinct

In [None]:
fig.size(h = 2, w = 6.5)
options(repr.plot.res = 400)
require(ggh4x)

order_of_cell_lineages = c('B', 'Plasma', 'Mast')
# Prepare the data for the geom_text layers beforehand for clarity
text_data_asterisk <- final_results$summary_stats %>% 
    filter(cell_type %in% order_of_cell_lineages) %>%
    na.omit() %>%
    mutate(cell_type = factor(cell_type, ordered = TRUE, levels = order_of_cell_lineages)) 

text_data_epi <- df %>% 
    na.omit() %>%
    filter(cell_type %in% order_of_cell_lineages) %>%
    mutate(cell_type = factor(cell_type, ordered = TRUE, levels = order_of_cell_lineages)) %>%
    select(cell_type, ymax) %>% 
    distinct() %>% 
    mutate(ymax = ymax + ymax/3, x = 32, label = '\nEpi')

text_data_stroma <- df %>% 
    na.omit() %>%
    filter(cell_type %in% order_of_cell_lineages) %>%
    mutate(cell_type = factor(cell_type, ordered = TRUE, levels = order_of_cell_lineages)) %>%
    select(cell_type, ymax) %>% 
    distinct() %>% 
    mutate(ymax = ymax + ymax/3, x = 8, label = '\nStroma')

# Create the plot
supp_fig_lineages = df %>% 
    filter(cell_type %in% order_of_cell_lineages) %>%
    mutate(cell_type = factor(cell_type, ordered = TRUE, levels = order_of_cell_lineages)) %>%
ggplot(
    data = ., 
    aes(midpoint, 100 * estimate, color = Status, fill = Status)
) + 
    geom_vline(xintercept = c(0), color = 'red') + 
    geom_line(aes(group = Status), alpha = 1, key_glyph = 'point', linewidth = 0.5) +
    #geom_ribbon(aes(group = Status, 
    #                ymin = 100 * (estimate - 1.96 * sqrt(variance)), ymax = 100 * (estimate + 1.96 * sqrt(variance))), show.legend = FALSE, alpha = 0.5, color = NA) + 
    geom_point(shape = '.') + 
    geom_errorbar(aes(ymin = 100 * (estimate - 1.96 * sqrt(variance)), ymax = 100 * (estimate + 1.96 * sqrt(variance))), show.legend = FALSE, alpha = 0.5, linewidth = 0.5) + 
    # geom_hline(yintercept = 0) + 
    # geom_line(data = . %>% 
    #           filter(cell_type != 'Epi') %>%
    #           dplyr::mutate(dist_bin = (dist_bin)), show.legend = FALSE, aes(group = Status)) + 
    labs(y = 'Percent of\nall cells', x = 'Distance from the interface (\U03BCm)', size = '# Cells', subtitle = 'IVW meta-analysis; mean & 95% CI, *padj<0.01') + 
    geom_text(inherit.aes = FALSE,
        data = text_data_asterisk,
        aes(midpoint, y = 100 * height, label = asterisk), 
        size = 1.5, vjust = .2, show.legend = FALSE, color = 'black'
    ) + 
    # geom_text(
    #     data = text_data_epi, 
    #     aes(label = label, y = ymax, x = x), 
    #     color = 'black', size = 1
    # ) +
    # geom_text(
    #     data = text_data_stroma, 
    #     aes(label = label, y = ymax, x = x), 
    #     color = 'black', size = 1
    # ) +
    scale_color_manual(
        name = 'Interface Type: ', 
        values = c('hubPos' = '#D55E00', 'hubNeg' = '#009E73', 'MSS' = 'grey'), 
        labels = c('hubPos' = 'Hub-inside MMRd', 'hubNeg' = 'Hub-outside MMRd', 'MSS' = 'MMRp')
    ) + 
    scale_fill_manual(
        name = 'Interface Type: ', 
        values = c('hubPos' = '#D55E00', 'hubNeg' = '#009E73', 'MSS' = 'grey'), 
        labels = c('hubPos' = 'Hub-inside MMRd', 'hubNeg' = 'Hub-outside MMRd', 'MSS' = 'MMRp')
    ) + 
    guides(color = guide_legend(override.aes = list(size = 2, shape = 16))) + 
    facet_wrap2(~cell_type, nrow = 1, scales = 'free_y', axes = "all", remove_labels = "x") +
    cowplot::theme_half_open(7) + 
    theme(
        panel.spacing = unit(0, "cm"), 
        axis.text.x = element_text(angle = 90, hjust = 1, size = 7),
        #aspect.ratio = 0.5, 
        strip.background = element_rect(fill = NA), 
        strip.text = element_text(size = 7, color = 'black'), # face = 'bold', 
        title = element_text(size = 10), 
        legend.position = 'top', 
        legend.text = element_text(size = 10)
    ) +
    # nrow is an argument of guide_legend(), not an aesthetic to override
    guides(fill = guide_legend(nrow = 1, override.aes = list(shape = 16)),
           color = guide_legend(nrow = 1, override.aes = list(shape = 16))) +
    NULL
supp_fig_lineages
ggsave(filename = 'figs/supplementary_fig_line_traces_lineages.pdf', plot = supp_fig_lineages, width = 6.5, height = 2, units = 'in')

In [None]:
require(patchwork)

## Immune lineage supplementary figure

In [None]:
fig.size(h = 10, w = 6.5)
supp_fig_immune = wrap_plots(supp_fig_lineages, (supp_fig_myeloid + 
    facet_wrap2(~cell_type, scales = 'free_y', axes = "all", remove_labels = "x", nrow = 4))
           , supp_fig_TNKILC, supp_fig_B, design = 'A\nB\nB\nB\nB\nB\nC\nC\nD', guides = 'keep') + plot_annotation(tag_levels = 'A') & theme(text = element_text(size = 7), legend.position = 'top', aspect.ratio = NULL) 
supp_fig_immune

pdf(file = 'figs/supp_fig_4_immune.pdf', height = 10, width = 6.5)
supp_fig_immune
dev.off()

In [None]:
# # fig.size(h = 9, w = 6.5)

# # complete_fig = (supp_fig_lineages + labs(y = 'Percent of all cells')) + #  title = 'Lineages'
# #     (supp_fig + theme(legend.position = 'none') + labs(subtitle = '')) + # + labs(title = 'Cell states') 
# #     plot_layout(guides = 'keep', tag_level = 'new', design = 'A\nB\nB\nB\nB\nB\nB\nB\nB\nB') +
# #     plot_annotation(tag_levels = 'A')
# # complete_fig
# pdf('complete_supp_fig_cell_states.pdf', height = 9, width = 6)
# complete_fig
# dev.off()

# 9. TNKILC cell states as a proportion of all TNKILCs - compare hubPos vs hubNeg interfaces

In [None]:
head(cells)
# Create a list for grouping cell types by lineage
lineage_list <- cells %>% 
    select(type_lvl1, type_lvl3) %>% 
    distinct %>%
    {split(.$type_lvl3, .$type_lvl1)}

In [None]:
options(future.globals.maxSize = 10e9)

system.time({
    counts_list = future_map(ids, function(.id) {
        summarize_cells_by_interface_proximity(cells[SampleID == .id & type_lvl1 == 'TNKILC'], interfaces[[.id]])    
    }, .options = furrr::furrr_options(seed=TRUE))
    names(counts_list) = ids
})

## 5. Post-processing: Stratify and Standardize Data

After calculating the counts, we separate them based on the interface type ('hub positive' vs. 'hub negative'). We then use the `standardize_matrix_columns` utility function to ensure that all count matrices have the exact same set of cell type columns, which is essential for the downstream meta-analysis.
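
As a minimal sketch of what this standardization involves, the hypothetical helper below (`pad_matrix_columns`, not the notebook's own `standardize_matrix_columns`, whose implementation is defined elsewhere) pads each count matrix with zero-filled columns for any cell type it lacks and then puts the columns in a common order. It assumes every list element is a matrix with row and column names; the toy matrices and their values are made up for illustration.

```r
# Hypothetical helper for illustration only; the notebook's
# standardize_matrix_columns() may differ in its details.
pad_matrix_columns <- function(mat_list) {
    # Union of all cell type columns seen in any matrix
    all_cols <- sort(unique(unlist(lapply(mat_list, colnames))))
    lapply(mat_list, function(m) {
        missing <- setdiff(all_cols, colnames(m))
        if (length(missing) > 0) {
            # Cell types absent from this matrix get a count of zero in every bin
            pad <- matrix(0, nrow = nrow(m), ncol = length(missing),
                          dimnames = list(rownames(m), missing))
            m <- cbind(m, pad)
        }
        # Same column order everywhere
        m[, all_cols, drop = FALSE]
    })
}

# Toy input: the second matrix lacks the 'NK-CD16' column
toy <- list(
    s1_hubPos = matrix(1:4, nrow = 2,
                       dimnames = list(c('(-5,0]', '(0,5]'), c('Tcd4-Treg', 'NK-CD16'))),
    s1_hubNeg = matrix(5:6, nrow = 2,
                       dimnames = list(c('(-5,0]', '(0,5]'), 'Tcd4-Treg'))
)
toy_std <- pad_matrix_columns(toy)
```

After padding, the matrices can be added or compared column by column without any name matching.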

In [None]:
# Separate lists for hub positive and hub negative interfaces
hubPos_counts_list = lapply(counts_list, function(x){return(x[['CXCLpos tumor & CXCLpos stroma']])})
names(hubPos_counts_list) = paste0(names(counts_list), '_hubPos')

hubNeg_counts_list = lapply(counts_list, function(x){return(x[['CXCLneg tumor & CXCLneg stroma']])})
names(hubNeg_counts_list) = paste0(names(counts_list), '_hubNeg')

# Combine them back into a single list and standardize columns
counts_list = c(hubPos_counts_list, hubNeg_counts_list)
counts_list = standardize_matrix_columns(counts_list)

In [None]:
names(counts_list)

In [None]:
interface_plot = function(counts, .types, est_model=c('binomial', 'poisson', 'mle')) {
    est_model <- match.arg(est_model)
    df = empirical_bayes_summary(
        rowSums(counts[, .types, drop = FALSE]),
        rowSums(counts),
        rownames(counts),
        est_model
    ) 

    ## get max y value for plotting 
    ymax = 100 * max(df$estimate + 1.96 * sqrt(df$variance))
    
    p1 = ggplot(df, aes(dist_bin, 100 * estimate)) + 
        geom_vline(xintercept = 20.5, linewidth = 2, linetype = 1, color = 'grey') + 
        geom_point(aes(size = size)) + 
        geom_errorbar(aes(ymin = 100 * (estimate - 1.96 * sqrt(variance)), ymax = 100 * (estimate + 1.96 * sqrt(variance))), width = 0) + 
        geom_hline(yintercept = 0) + 
        geom_line(data = . %>% dplyr::mutate(dist_bin = as.numeric(dist_bin))) + 
        theme_bw(base_size = 16) + 
        theme(axis.text.x = element_text(angle = 90, hjust = 1)) + 
        labs(y = '% of all TNKILC', x = 'Distance Window', size = '# Cells', subtitle = 'mean & 95% CI, *padj<0.01', title = paste(.types, collapse = '; ')) + 
        geom_text(aes(y = 100 * (estimate + 1.96 * sqrt(variance)), label = asterisk), size = 6, vjust = 0) + 
        annotate("text", x = 0.5, y = ymax + .05, label = 'Stromal Side', hjust = 0, size = 6) + 
        annotate("text", x = 40.5, y = ymax + .05, label = 'Epithelial Side', hjust = 1, size = 6) + 
        NULL
    return(p1)
}

# Per patient plots

In [None]:
lineage_list[['TNKILC']]

In [None]:
# for (state in lineage_list[['TNKILC']]){
# .types = grep(state, colnames(counts_list$C110_hubPos), value = TRUE)
# fig.size(18, 32)
# require(patchwork)
# # imap(counts_list[mmr_map[MMRstatus == 'MMRp']$SampleID], function(counts, .id) {    
# p1 = imap(hubPos_counts_list, function(counts, .id) {    
#     interface_plot(counts, .types, 'binomial') + labs(title = glue('{.id} (Hub+)'))    
# }) %>% wrap_plots() + plot_annotation(title = paste0(.types, collapse =  ', '))
#     p2 = imap(hubNeg_counts_list, function(counts, .id) {    
#     interface_plot(counts, .types, 'binomial') + labs(title = glue('{.id} (Hub-)'))    
# }) %>% wrap_plots() + plot_annotation(title = paste0(.types, collapse =  ', '))
#     print(p1 + p2)
# }

In [None]:
require(patchwork)
pdf('figs/per_patient_plots_TNKILC_prop.pdf', height = 18, width = 32)
for (state in lineage_list[['TNKILC']]){
.types = grep(state, colnames(counts_list$C110_hubPos), value = TRUE) #grep('PD1', colnames(counts_list$C110_hubPos), value = TRUE)
p1 = imap(hubPos_counts_list, function(counts, .id) {    
    interface_plot(counts, .types, 'binomial') + labs(title = glue('{.id} (Hub+)'))    
}) %>% wrap_plots() + plot_annotation(title = paste0(.types, collapse =  ', '), 
                                      theme = theme(plot.title = element_text(size = 20, face = "bold", color = "darkblue")))
p2 = imap(hubNeg_counts_list, function(counts, .id) {    
    interface_plot(counts, .types, 'binomial') + labs(title = glue('{.id} (Hub-)'))    
}) %>% wrap_plots() + plot_annotation(title = paste0(.types, collapse =  ', '),  theme = theme(plot.title = element_text(size = 20, face = "bold", color = "darkblue")))
print(p1)
print(p2)
}
dev.off()

In [None]:
state = lineage_list[['TNKILC']][2]
state
.types = grep(state, colnames(counts_list$C110_hubPos), value = TRUE) #grep('PD1', colnames(counts_list$C110_hubPos), value = TRUE)
.types

In [None]:
fig.size(18, 32)
require(patchwork)
# imap(counts_list[mmr_map[MMRstatus == 'MMRp']$SampleID], function(counts, .id) {    
imap(hubPos_counts_list, function(counts, .id) {    
    interface_plot(counts, .types, 'binomial') + labs(title = glue('{.id} (Hub+)'))    
}) %>% wrap_plots() + plot_annotation(title = paste0(.types, collapse =  ', '), 
                                      theme = theme(plot.title = element_text(size = 20, face = "bold", color = "darkblue")))

In [None]:
fig.size(18, 32)
require(patchwork)
# imap(counts_list[mmr_map[MMRstatus == 'MMRp']$SampleID], function(counts, .id) {    
imap(hubNeg_counts_list, function(counts, .id) {    
    interface_plot(counts, .types, 'binomial') + labs(title = glue('{.id} (Hub-)'))    
}) %>% wrap_plots() + plot_annotation(title = paste0(.types, collapse =  ', '),  theme = theme(plot.title = element_text(size = 20, face = "bold", color = "darkblue")))

## 6. Global Analysis Across All Cell Types

Now we run the main analysis function, `run_global_hub_analysis`. This function iterates through every cell type, performs the meta-analysis comparing hub-positive and hub-negative interfaces, calculates statistics, and returns a set of clean data frames ready for plotting.
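
The plot subtitles in this notebook refer to an "IVW meta-analysis". As a rough sketch of that idea, and not the actual body of `run_global_hub_analysis` (which is defined elsewhere), the code below pools per-sample proportions with fixed-effect inverse-variance weights and compares two pooled estimates with a z-test. The helper names `ivw_pool` and `compare_pooled`, the fixed-effect assumption, and the toy numbers are all assumptions made for illustration; multiple-testing adjustment is not shown.

```r
# Fixed-effect inverse-variance weighted (IVW) pooling of per-sample estimates.
# Hypothetical helper, for illustration only.
ivw_pool <- function(estimates, variances) {
    w <- 1 / variances
    list(
        estimate = sum(w * estimates) / sum(w),  # precision-weighted mean
        variance = 1 / sum(w)                    # variance of the pooled estimate
    )
}

# Two-sided z-test for the difference between two independent pooled estimates
compare_pooled <- function(a, b) {
    z <- (a$estimate - b$estimate) / sqrt(a$variance + b$variance)
    2 * pnorm(-abs(z))
}

# Toy per-sample proportions of one cell type in one distance bin
hub_pos <- ivw_pool(estimates = c(0.12, 0.15, 0.10),
                    variances = c(0.0004, 0.0009, 0.0004))
hub_neg <- ivw_pool(estimates = c(0.05, 0.07),
                    variances = c(0.0004, 0.0004))
p_value <- compare_pooled(hub_pos, hub_neg)
```

Samples with smaller variance (typically those with more cells in the bin) contribute more to the pooled estimate, which is the main reason to prefer this over a plain average across samples.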

In [None]:
# Create a list of cell types to iterate over
cellTypes = cells[type_lvl1 == 'TNKILC'] %>% 
    select(type_lvl2, type_lvl3) %>% 
    distinct
type_list <- lapply(split(cellTypes$type_lvl2, cellTypes$type_lvl3), unique)

# Run the full analysis
final_results <- run_global_hub_analysis(type_list, counts_list)

# Display the glimpse of the main summary table
glimpse(final_results$summary_stats)

## 7. Visualization

In this section, we generate plots to visualize the results. Each TNKILC cell state is plotted separately so that its distribution profile can be compared between hub-positive and hub-negative interfaces, with MMRp (MSS) samples included as a reference.
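
The error bars in these plots are drawn at `estimate ± 1.96 * sqrt(variance)`, converted to percent. The short sketch below reproduces that arithmetic on a made-up two-row table (the column names match those used in the plotting code; the values are illustrative only) to make explicit that the whole interval, not just the estimate, is multiplied by 100.

```r
# Toy results table; values are invented for illustration
toy_results <- data.frame(
    Status   = c('hubPos', 'hubNeg'),
    estimate = c(0.10, 0.04),      # proportions
    variance = c(0.0004, 0.0001)   # variances on the proportion scale
)

# Normal-approximation 95% interval, expressed in percent
half_width <- 1.96 * sqrt(toy_results$variance)
toy_results$lower_pct <- 100 * (toy_results$estimate - half_width)
toy_results$upper_pct <- 100 * (toy_results$estimate + half_width)
```

Because the variance stays on the proportion scale, converting only the estimate to percent before subtracting the half-width would give intervals that are far too narrow.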

In [None]:
MSS_results = fread('input_data/MSS_results_TNKILCs_as_prop_of_lineage.csv') 
head(MSS_results %>% filter(cell_type == 'Tcd4-Treg'))

In [None]:
# Prepare data for plotting by combining hubPos, hubNeg, and MMRp (MSS) results
df = bind_rows(list(hubPos = final_results$hubPos_results, 
                    hubNeg = final_results$hubNeg_results,
                    MSS = MSS_results %>% filter(cell_type %in% unique(c(final_results$hubPos_results$cell_type, final_results$hubNeg_results$cell_type)))
                   ), .id = 'Status') 

# Calculate y-axis limits for plotting
df <- df %>%
    group_by(cell_type) %>%
    mutate(ymax = 100 * max(estimate + 1.96 * sqrt(variance))) %>%
    ungroup

# Create a list for grouping cell types by lineage
lineage_list <- cells %>% 
    select(type_lvl1, type_lvl3) %>% 
    distinct %>%
    {split(.$type_lvl3, .$type_lvl1)}

In [None]:
df %>% filter(cell_type == 'Tcd4-Treg') %>% arrange(desc(estimate))

### Plot: T-cell, NK, and ILC Lineages

In [None]:
fig.size(h = 9, w = 16)
options(repr.plot.res = 300)
lineage = 'TNKILC'
tnkilc_order = lineage_list[['TNKILC']] #c("Tcd8-CXCL13", "Tcd8-HOBIT", "Tcd8-gdlike", "Tcd8-gdlike-PD1", "Tcd8-GZMK", "Tplzf-gdlike", "Tcd4-CXCL13", "Tcd4-TFH", "Tcd4-Treg", "Tcd4-IL7R", "NK-CD16", "NK-XCL1", "ILC3")

# Prepare the data for the geom_text layers beforehand for clarity
text_data_asterisk <- final_results$summary_stats %>% 
    filter(cell_type %in% lineage_list[[lineage]]) %>%
    mutate(cell_type = factor(cell_type, ordered = TRUE, levels = tnkilc_order))

text_data_epi <- df %>% 
    filter(cell_type %in% lineage_list[[lineage]]) %>%
    mutate(cell_type = factor(cell_type, ordered = TRUE, levels = tnkilc_order)) %>% 
    select(cell_type, ymax) %>% 
    distinct() %>% 
    mutate(ymax = ymax + ymax/3, x = 32, label = 'Epi')

text_data_stroma <- df %>% 
    filter(cell_type %in% lineage_list[[lineage]]) %>% 
    mutate(cell_type = factor(cell_type, ordered = TRUE, levels = tnkilc_order)) %>% 
    select(cell_type, ymax) %>% 
    distinct() %>% 
    mutate(ymax = ymax + ymax/3, x = 8, label = '    Stroma')





lapply(tnkilc_order, function(mytype){

# Prepare the data for the geom_text layers beforehand for clarity
text_data_asterisk <- final_results$summary_stats %>% 
    filter(cell_type == mytype) %>%
    mutate(cell_type = factor(cell_type, ordered = TRUE, levels = tnkilc_order))

text_data_epi <- df %>% 
    filter(cell_type == mytype) %>%
    mutate(cell_type = factor(cell_type, ordered = TRUE, levels = tnkilc_order)) %>% 
    select(cell_type, ymax) %>% 
    distinct() %>% 
    mutate(ymax = ymax + ymax/3, x = 32, label = 'Epi')

text_data_stroma <- df %>% 
    filter(cell_type == mytype) %>%
    mutate(cell_type = factor(cell_type, ordered = TRUE, levels = tnkilc_order)) %>% 
    select(cell_type, ymax) %>% 
    distinct() %>% 
    mutate(ymax = ymax + ymax/3, x = 8, label = '    Stroma')
    
# Create the plot
p1 = df %>% 
           filter(cell_type == mytype) %>%
           mutate(cell_type = factor(cell_type, ordered = TRUE, levels = tnkilc_order)) %>%
ggplot(
    data = ., 
    aes(dist_bin, 100 * estimate, color = Status)
) + 
    geom_vline(xintercept = 20.5, linewidth = 1, linetype = 1, color = 'red') + 
    geom_point() + 
    geom_errorbar(aes(ymin = 100 * (estimate - 1.96 * sqrt(variance)), ymax = 100 * (estimate + 1.96 * sqrt(variance))), width = 0, show.legend = FALSE) + 
    geom_hline(yintercept = 0) + 
    geom_line(aes(group = Status), show.legend = FALSE) + 
    cowplot::theme_half_open(10) + 
    theme(axis.text.x = element_text(angle = 90, hjust = 1)) + 
    labs(y = 'Percent of all TNKILCs', x = 'Distance Window', size = '# Cells', subtitle = 'IVW meta-analysis; mean & 95% CI, *padj<0.01', title = lineage) + 
    geom_text(
        data = text_data_asterisk,
        aes(y = 100 * height, label = asterisk), 
        size = 6, vjust = .2, show.legend = FALSE, color = 'black'
    ) + 
    geom_text(
        data = text_data_epi, 
        aes(label = label, y = ymax, x = x), 
        color = 'black', size = 3
    ) +
    geom_text(
        data = text_data_stroma, 
        aes(label = label, y = ymax, x = x), 
        color = 'black', size = 3
    ) +
    scale_color_manual(
        name = 'Interface Type: ', 
        values = c('hubPos' = '#D55E00', 'hubNeg' = '#009E73', 'MSS' = 'grey'), 
        labels = c('hubPos' = 'Hub-inside MMRd', 'hubNeg' = 'Hub-outside MMRd', 'MSS' = 'MMRp')
    ) + 
    guides(color = guide_legend(override.aes = list(size = 2, shape = 16))) + 
    theme(
        aspect.ratio = 0.5, 
        axis.text.x = element_text(size = 4), 
        strip.background = element_rect(fill = NA), 
        strip.text = element_text(size = 10, face = 'bold', color = 'black'), 
        title = element_text(size = 10), 
        legend.position = 'right', 
        legend.text = element_text(size = 10)
    ) +
    guides(fill = guide_legend(override.aes = list(nrow = 1, shape = 16))) +
    ggtitle(mytype) +
    NULL

ggsave(p1, filename = glue::glue('figs/cellstates/TNKILC/As_proportion_of_TNKILCs/', mytype, '.pdf'), height = 4, width = 4, create.dir = TRUE)})

## Supplementary Figure 7

In [None]:
fig.size(h = 9, w = 6.5)
# Create the plot
manual_breaks <- c(
    "[-100,-95]", "(-75,-70]", "(-50,-45]", "(-25,-20]",
    "(0,5]", "(25,30]", "(50,55]", "(75,80]", "(95,100]"
)

supp_fig_7 = df %>% 
    # filter(Status == 'MSS') %>%
    filter(cell_type %in% lineage_list[[lineage]]) %>%
    # Sort the facets alphabetically by cell type
    mutate(cell_type = factor(cell_type, levels = sort(unique(cell_type)))) %>%
    ggplot(
        data = ., 
        aes(dist_bin, 100 * estimate, color = Status, fill = Status)
    ) + 
    geom_vline(xintercept = 20.5, color = 'red', linetype = "dashed", linewidth = 0.5) +
    geom_line(aes(group = Status), alpha = 1, key_glyph = 'point', linewidth = 0.25) +
    geom_point(shape = '.') + 
    geom_errorbar(aes(ymin = 100 * (estimate - 1.96 * sqrt(variance)), ymax = 100 * (estimate + 1.96 * sqrt(variance))), show.legend = FALSE, alpha = 0.5, linewidth = 0.25) + 
    labs(y = 'Percent of all TNKILCs', x = 'Distance from the interface (\U03BCm)', size = '# Cells', subtitle = 'IVW meta-analysis; mean & 95% CI, *padj<0.01') +      
    geom_text(inherit.aes = FALSE,
        data = text_data_asterisk,
        aes(dist_bin, y = 100 * height, label = asterisk), 
        size = 1.5, vjust = .2, show.legend = FALSE, color = 'black'
    ) + 
    geom_text(inherit.aes = FALSE,
        data = text_data_epi, 
        aes(label = label, y = ymax, x = x), 
        color = 'black', size = 2
    ) +
    geom_text(inherit.aes = FALSE,
        data = text_data_stroma, 
        aes(label = label, y = ymax, x = x), 
        color = 'black', size = 2
    ) +
    guides(color = guide_legend(override.aes = list(size = 2, shape = 16))) + 
    facet_wrap2(~cell_type, scales = 'free_y', axes = "all", remove_labels = "x") +
    cowplot::theme_half_open(7) + 
    scale_color_manual(
        name = 'Interface Type: ', 
        values = c('hubPos' = '#D55E00', 'hubNeg' = '#009E73', 'MSS' = 'grey'), 
        labels = c('hubPos' = 'Hub-inside MMRd', 'hubNeg' = 'Hub-outside MMRd', 'MSS' = 'MMRp')
    ) + 
    scale_fill_manual(
        name = 'Interface Type: ', 
        values = c('hubPos' = '#D55E00', 'hubNeg' = '#009E73', 'MSS' = 'grey'), 
        labels = c('hubPos' = 'Hub-inside MMRd', 'hubNeg' = 'Hub-outside MMRd', 'MSS' = 'MMRp')
    ) + 
    theme(
        #aspect.ratio = 0.5, 
        axis.text.x = element_text(size = 7, angle = 90), 
        strip.background = element_rect(fill = NA), 
        strip.text = element_text(size = 10, face = 'bold', color = 'black'), 
        title = element_text(size = 10), 
        legend.position = 'top', 
        legend.text = element_text(size = 10)
    ) +
    guides(fill = guide_legend(override.aes = list(nrow = 1, shape = 16))) +
    scale_x_discrete(breaks = manual_breaks) +
    NULL
fig.size(h = 8, w = 6.5)
supp_fig_7
supp_fig_7 %>% ggsave(filename = 'figs/supplementary_figure_7_TNKILC_as_lineage_proportion.pdf', height = 8, width = 6.5)

In [None]:
df %>% 
rename(Interface = Status) %>%
mutate(Interface = Interface %>% as.factor %>% fct_recode('Hub-inside' = 'hubPos', 'Hub-outside' = 'hubNeg', 'MMRp' = 'MSS')) %>%
mutate(estimate = 100*estimate) %>%
select(!c(asterisk, ymax)) %>%
# estimate is in percent here, so the half-width is scaled by 100 as well
mutate(`Lower Confidence Limit` = estimate - 100 * 1.96 * sqrt(variance)) %>%
mutate(`Upper Confidence Limit` = estimate + 100 * 1.96 * sqrt(variance)) %>%
rename(`Adjusted p-value` = padj) %>%
rename(`Spatial bin around the interface` = dist_bin) %>%
mutate(cell_type = gsub(cell_type, pattern = 'Tcd8-gdlike-PD1', replacement = 'Tcd8-CXCL13-LAG3')) %>%
write.csv('figs/cellstates/TNKILC/As_proportion_of_TNKILCs/table_of_TNKILCs_as_prop_of_TNKILC_compartment.csv')

# Compare analysis

In [None]:
propOfTNKILC = read.csv('figs/cellstates/TNKILC/As_proportion_of_TNKILCs/table_of_TNKILCs_as_prop_of_TNKILC_compartment.csv') %>%
    mutate(analysis = 'propOfTNKILC')
propOfAllCells = read.csv('figs/cellstates/table_of_cell_states_as_prop_of_all_cells.csv') %>%
    mutate(analysis = 'propOfAllCells')


In [None]:
glimpse(propOfTNKILC)
glimpse(propOfAllCells)

In [None]:
head(propOfTNKILC)

In [None]:
temp1 = propOfTNKILC %>%
    mutate(estimate_propOfTNKILC = estimate) %>%
    select(Interface, analysis, cell_type, Spatial.bin.around.the.interface, estimate_propOfTNKILC) %>%
    filter(cell_type %in% c('Tcd8-GZMK', 'Tcd8-CXCL13')) %>%
    pivot_wider(names_from = cell_type, values_from = estimate_propOfTNKILC) %>%
    mutate(ratio_propOfTNKILC = `Tcd8-CXCL13`/`Tcd8-GZMK`) 
temp1 %>%
    head

In [None]:
temp2 = propOfAllCells %>%
    mutate(estimate_propOfAllCells = estimate) %>%
    select(Interface, analysis, cell_type, Spatial.bin.around.the.interface, estimate_propOfAllCells) %>%
    filter(cell_type %in% c('Tcd8-GZMK', 'Tcd8-CXCL13')) %>%
    pivot_wider(names_from = cell_type, values_from = estimate_propOfAllCells) %>%
    mutate(ratio_propOfAllCells = `Tcd8-CXCL13`/`Tcd8-GZMK`) 
temp2 %>%
    head

In [None]:
temp2 %>%
    select(Interface, Spatial.bin.around.the.interface, ratio_propOfAllCells) %>%
    full_join(., temp1 %>% select(Interface, Spatial.bin.around.the.interface, ratio_propOfTNKILC), by = c('Interface', 'Spatial.bin.around.the.interface')) %>%
    head

In [None]:
fig.size(h = 3, w = 5)
temp2 %>%
    select(Interface, Spatial.bin.around.the.interface, ratio_propOfAllCells) %>%
    full_join(., temp1 %>% select(Interface, Spatial.bin.around.the.interface, ratio_propOfTNKILC), by = c('Interface', 'Spatial.bin.around.the.interface')) %>%
    na.omit() %>%
    filter(ratio_propOfAllCells < Inf & ratio_propOfTNKILC < Inf) %>%
    select(ratio_propOfAllCells, ratio_propOfTNKILC) %>%
    cor

In [None]:
fig.size(h = 3, w = 5)
temp2 %>%
    select(Interface, Spatial.bin.around.the.interface, ratio_propOfAllCells) %>%
    full_join(., temp1 %>% select(Interface, Spatial.bin.around.the.interface, ratio_propOfTNKILC), by = c('Interface', 'Spatial.bin.around.the.interface')) %>%
    ggplot() +
         geom_point(aes(x = ratio_propOfTNKILC, y = ratio_propOfAllCells)) +
         facet_wrap(~Interface)

# 9. TNKILC cell states as a proportion of all TNKILCs - only show MMRd specimens

In [None]:
head(cells)
# Create a list for grouping cell types by lineage
lineage_list <- cells %>% 
    select(type_lvl1, type_lvl3) %>% 
    distinct %>%
    {split(.$type_lvl3, .$type_lvl1)}

In [None]:
#' @title Calculate Cell Counts in Distance Bins from an Interface
#' @description This function takes spatial coordinates of cells and interface lines,
#'   calculates the signed distance of each cell to the nearest interface, and
#'   groups cells into discrete distance bins. It returns a matrix of cell
#'   type counts per bin for a single sample.
#' @param cells A data.table containing cell information, including 'X'/'Y' coordinates,
#'   cell type ('type_lvl3'), and a region annotation ('tessera_annotation').
#' @param interfaces An sf object containing interface geometries (e.g., LINESTRINGs).
#' @return A matrix where rows are distance bins (e.g., "(-5,0]") and columns
#'   are cell types ('type_lvl3'), with values representing cell counts.
get_bins = function(cells, interfaces) {
    # Convert data.frame coordinates to a spatial 'sf' object
    pts = st_as_sf(cells[, .(X, Y)], coords = c('X', 'Y'))
    
    # Use 'geos' for high-performance spatial operations
    geos_pts = geos::as_geos_geometry(pts$geometry)
    geos_lines = geos::as_geos_geometry(interfaces$x[1:nrow(interfaces)])
    
    # Find the nearest interface for each cell
    nearest_interfaces = geos::geos_nearest(geos_pts, geos_lines)
    
    # Calculate the distance to that nearest interface
    cells$dist_interface = geos::geos_distance(geos_pts, geos_lines[nearest_interfaces])
    
    # Assign a sign to the distance based on tissue region (stroma vs. other)
    cells$dist_interface_signed = case_when(
        cells$tessera_annotation == 'Stromal-enriched' ~ -cells$dist_interface,
        TRUE ~ cells$dist_interface
    )
    
    # Bin cells into 5µm distance intervals
    cells$dist_bin = cut(cells$dist_interface_signed, seq(-100, 100, by = 5), include.lowest = TRUE)

    # Create the final count matrix
    counts = cells[
        !is.na(dist_bin)
    ] %>%
        with(table(dist_bin, type_lvl3)) %>%
        data.table() %>%
        dcast(dist_bin ~ type_lvl3, value.var = 'N') %>%
        dplyr::mutate(dist_bin = factor(dist_bin, levels(cells$dist_bin))) %>%
        arrange(dist_bin) %>%
        tibble::column_to_rownames('dist_bin') %>%
        as.matrix()
    
    return(counts)
}
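
The sign-and-bin step inside `get_bins` can be illustrated in isolation. The following is a minimal sketch using base R only, on invented toy values; the `'Epithelial-enriched'` label is an assumption made for illustration (the function itself only distinguishes `'Stromal-enriched'` from everything else).

```r
# Toy sketch: sign distances by region and bin them (base R only).
# Column names mirror the notebook; all values are invented.
toy <- data.frame(
    dist_interface     = c(2.5, 7.0, 12.0, 3.0, 150.0),
    tessera_annotation = c('Stromal-enriched', 'Stromal-enriched',
                           'Epithelial-enriched', 'Epithelial-enriched',
                           'Epithelial-enriched'),  # assumed label
    type_lvl3          = c('A', 'B', 'A', 'A', 'B')
)

# Stromal cells get negative distances, all other cells positive distances
toy$dist_interface_signed <- ifelse(
    toy$tessera_annotation == 'Stromal-enriched',
    -toy$dist_interface,
    toy$dist_interface
)

# Same binning as get_bins: 5-unit bins from -100 to 100; values outside become NA
toy$dist_bin <- cut(toy$dist_interface_signed, seq(-100, 100, by = 5), include.lowest = TRUE)

# Bin-by-type count table (NA bins are dropped by table())
toy_counts <- table(toy$dist_bin, toy$type_lvl3)
```

With 40 bins, the interface (distance 0) falls between the 20th and 21st bins, which is why the plots in this notebook draw the reference line at `x = 20.5`.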

In [None]:
options(future.globals.maxSize = 1e10)

# Load interface geometry files for each sample
ids = unique(cells$SampleID)
interface_dir = '../Tessera tiles/Spatial objects for tumor-stromal interfaces in all MERFISH samples/'
interface_files = list.files(path = interface_dir, pattern = '_tumor_stromal_interfaces.rds', full.names = TRUE)
interfaces = map(ids, function(.id) {
    # Pick the file whose path contains this sample ID
    fname = normalizePath(interface_files[grepl(.id, interface_files)])
    readRDS(fname)
})
names(interfaces) = ids

glimpse(interfaces[[1]])

system.time({
    counts_list = future_map(ids, function(.id) {
        get_bins(cells[SampleID == .id & type_lvl1 == 'TNKILC'], interfaces[[.id]])    
    }, .options = furrr::furrr_options(seed=TRUE))
    names(counts_list) = ids
})

In [None]:
counts_list %>% names
counts_list[[1]] %>% head


## 5. Post-processing: Stratify and Standardize Data

After calculating the counts, we load the sample-level MMR status map (MMRd vs. MMRp), which is used below to stratify the per-sample count matrices. All count matrices must share the exact same set of cell type columns, which is essential for the downstream meta-analysis; the `standardize_matrix_columns` utility function can be used to enforce this.
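
As an illustration of what column standardization involves, here is a hypothetical sketch. It is not the notebook's own `standardize_matrix_columns` (whose definition is not shown in this section); the name `standardize_columns_sketch` and the toy matrices are assumptions. The idea is to take the union of all column names and pad missing columns with zeros.

```r
# Hypothetical sketch of column standardization across a list of count matrices.
standardize_columns_sketch <- function(mat_list) {
    # Union of all cell type columns seen in any matrix
    all_cols <- sort(unique(unlist(lapply(mat_list, colnames))))
    lapply(mat_list, function(m) {
        # Start from an all-zero matrix with the full column set
        out <- matrix(0, nrow = nrow(m), ncol = length(all_cols),
                      dimnames = list(rownames(m), all_cols))
        # Copy over the columns this matrix actually has
        out[, colnames(m)] <- m
        out
    })
}

# Two toy matrices with different cell type columns
m1 <- matrix(1:4, nrow = 2, dimnames = list(c('bin1', 'bin2'), c('A', 'B')))
m2 <- matrix(5:6, nrow = 2, dimnames = list(c('bin1', 'bin2'), c('C')))
std <- standardize_columns_sketch(list(s1 = m1, s2 = m2))
```

After this step every matrix has the same columns in the same order, so subsetting by cell type name works uniformly across samples.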

In [None]:
# Load mapping file for MMR status
mmr_map = readr::read_rds('../Tessera tiles/Tessera processed results/tile_metadata_2025-07-22.rds')  %>%
    select(c('PatientID', 'SampleID', 'MMRstatus')) %>%
    distinct()
mmr_map

In [None]:
names(counts_list)
counts_list[[1]]

In [None]:
mmr_map

In [None]:
names(counts_list)

In [None]:
# counts_list2 = vector(mode = 'list', length = length(unique(mmr_map$PatientID))) %>% 
#     lapply(., function(x)return(matrix(0, nrow = nrow(counts_list[[1]]), ncol = ncol(counts_list[[1]]))))
           
# names(counts_list2) = unique(mmr_map$PatientID)

# counts_list2 = lapply(names(counts_list2), 
       
#        function(x){
#            x %>% print
#             samples = mmr_map$SampleID[mmr_map$PatientID == x]
#             for (i in 1:length(samples)){
#                 temp = counts_list2[[x]] + counts_list[[samples[i]]]
#             }
#            return(temp)
#         })

In [None]:
#' @title Estimate Beta Prior Parameters from Data (Robustly)
#' @description Implements the method of moments to estimate the `alpha` and
#'   `beta` parameters of a Beta distribution that best fits the observed
#'   distribution of proportions. This version includes checks for edge cases
#'   like small sample sizes or zero-count bins to prevent errors.
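#' @param k Vector of counts for the cell type(s) of interest, one value per bin.
#' @param n Vector of total cell counts per bin (same length as `k`).
#' @return A list containing `alpha` and `beta`. Returns `alpha=0`, `beta=0` if a
#'   prior cannot be estimated, which defaults the analysis to standard MLE.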
estimate_beta_prior <- function(k, n) {
    # Handle cases with insufficient data to estimate a prior
    if (length(k) <= 1) return(list(alpha = 0, beta = 0))

    # Filter out bins with zero cells to avoid division-by-zero errors
    valid_bins <- n > 0
    if (sum(valid_bins) <= 1) return(list(alpha = 0, beta = 0))
    k_valid <- k[valid_bins]
    n_valid <- n[valid_bins]
    
    # Method of Moments calculation
    p_hat <- k_valid / n_valid
    mean_p <- mean(p_hat)
    var_p <- var(p_hat)
    mean_n <- mean(n_valid)
    var_true <- var_p - mean_p * (1 - mean_p) / mean_n
    
    # Handle numerical artifacts where estimated variance is not positive
    if (is.na(var_true) || var_true <= 0) return(list(alpha = 0, beta = 0))
    
    # Solve for the nu parameter
    nu <- mean_p * (1 - mean_p) / var_true - 1
    
    # ROBUSTNESS FIX: If nu is not positive, alpha and beta would not define a
    # valid Beta prior. Fall back to the non-informative prior (MLE).
    if (nu <= 0) {
        return(list(alpha = 0, beta = 0))
    }
    
    # Solve for alpha and beta
    list(alpha = mean_p * nu, beta = (1 - mean_p) * nu)
}


#' @title Calculate Empirical Bayes Summaries for Count Data
#' @description Uses an empirical Bayes approach to "shrink" noisy estimates from
#'   bins with little data towards a more stable global average.
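#' @param k Vector of counts for the cell type(s) of interest, one value per bin.
#' @param n Vector of total cell counts per bin (same length as `k`).
#' @param bin_lvls Character vector of bin labels, in plotting order.
#' @param model Label recorded in the output ("mle", "binomial", or "poisson");
#'   the estimates themselves always use the Beta-binomial posterior.
#' @return A data.table with detailed statistics for each bin.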
empirical_bayes_summary <- function(k, n, bin_lvls, model = "binomial") {
    model <- match.arg(model, c("mle", "binomial", "poisson"))
    if (length(k) != length(n)) stop("Input vectors 'k' and 'n' must have the same length.")
    
    prior <- estimate_beta_prior(k, n)
    est <- (k + prior$alpha) / (n + prior$alpha + prior$beta)
    var <- ((k + prior$alpha) * (n - k + prior$beta)) /
           ((n + prior$alpha + prior$beta)^2 * (n + prior$alpha + prior$beta + 1))
    
    df = data.table(
        dist_bin = factor(bin_lvls, levels = bin_lvls),
        model = model, count = k, size = n, estimate = est, variance = var,
        alpha = prior$alpha, beta = prior$beta
    )

    df[, p := exp(pnorm(estimate / sqrt(variance), lower.tail = FALSE, log.p = TRUE))]
    df[, padj := p.adjust(p)]
    df[, asterisk := ifelse(padj < 0.01, "*", "")]
    
    return(df)
}

#' @title Perform Meta-Analysis for a Given Cell Type
#' @description Orchestrates the analysis across multiple samples for a single cell type.
get_stats = function(counts_list, .types) {
    df_list = imap(counts_list, function(counts, .id) {
        empirical_bayes_summary(
            rowSums(counts[, .types, drop = FALSE]),
            rowSums(counts),
            rownames(counts),
            'binomial'
        )
    })
    
    df = bind_rows(df_list, .id = 'SampleID')[
        , .(SampleID, dist_bin, estimate, variance)
    ][
        , meta_ashr(estimate, variance), dist_bin
    ]
    
    df[, p := exp(pnorm(estimate / sqrt(variance), lower.tail = FALSE, log.p = TRUE))]
    df[, padj := p.adjust(p)]
    df[, asterisk := case_when(is.na(padj) ~ '', padj < 0.01 ~ "*", TRUE ~ '')]
    
    return(df[])
}

#' @title Perform Meta-Analysis using Adaptive Shrinkage
#' @description Uses `ashr` to combine effect estimates, weighting by precision.
meta_ashr <- function(p_vec, var_vec) {
    ash_fit = ashr::ash(betahat = p_vec, sebetahat = sqrt(var_vec), method = "fdr", mixcompdist = 'normal')
    w = prop.table(1 / (ash_fit$result$PosteriorSD^2 + 1e-8))
    data.table(
        estimate = sum(w * ash_fit$result$PosteriorMean),
        variance = sum(w * ash_fit$result$PosteriorSD^2)
    )
}
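
#' @title Plot Proportions Across Distance Bins for a Single Sample
#' @description Computes empirical Bayes estimates for the selected cell types
#'   in one count matrix and plots them with 95% confidence intervals.
interface_plot = function(counts, .types, est_model=c('binomial', 'poisson', 'mle')) {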
    est_model <- match.arg(est_model)
    df = empirical_bayes_summary(
        rowSums(counts[, .types, drop = FALSE]),
        rowSums(counts),
        rownames(counts),
        est_model
    ) 

    ## get max y value for plotting 
    ymax = 100 * max(df$estimate + 1.96 * sqrt(df$variance))
    
    p1 = ggplot(df, aes(dist_bin, 100 * estimate)) + 
        geom_vline(xintercept = c(20.5), linewidth = 2, linetype = 1, color = 'grey') + 
        geom_point(aes(size = size)) + 
        geom_errorbar(aes(ymin = 100 * (estimate - 1.96 * sqrt(variance)), ymax = 100 * (estimate + 1.96 * sqrt(variance))), width = 0) + 
        geom_hline(yintercept = 0) + 
        geom_line(data = . %>% dplyr::mutate(dist_bin = as.numeric(dist_bin))) + 
        theme_bw(base_size = 16) + 
        theme(axis.text.x = element_text(angle = 90, hjust = 1)) + 
        labs(y = '% of TNKILC cells', x = 'Distance Window', size = '# Cells', subtitle = 'mean & 95% CI, *padj<0.01', title = paste(.types, collapse = '; ')) + 
        geom_text(aes(y = 100 * (estimate + 1.96 * sqrt(variance)), label = asterisk), size = 6, vjust = 0) + 
        annotate("text", x = 0.5, y = ymax + .05, label = 'Stromal Side', hjust = 0, size = 6) + 
        annotate("text", x = 40.5, y = ymax + .05, label = 'Epithelial Side', hjust = 1, size = 6) + 
        NULL
    return(p1)
}
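
To make the shrinkage step concrete, here is a small worked sketch on invented counts. It follows the same method-of-moments and Beta posterior formulas as `estimate_beta_prior` and `empirical_bayes_summary` above, written out inline with base R; the numbers and the variable names `alpha_hat` and `beta_hat` are illustrative only.

```r
# Toy sketch of empirical Bayes shrinkage (base R only; numbers are invented).
k <- c(2, 5, 9, 1, 6)       # cells of the type of interest per bin
n <- c(10, 10, 10, 10, 10)  # all cells per bin

# Method-of-moments Beta prior, using the same formulas as the notebook
p_hat     <- k / n
mean_p    <- mean(p_hat)
var_true  <- var(p_hat) - mean_p * (1 - mean_p) / mean(n)
nu        <- mean_p * (1 - mean_p) / var_true - 1
alpha_hat <- mean_p * nu
beta_hat  <- (1 - mean_p) * nu

# Posterior mean and variance of Beta(alpha_hat + k, beta_hat + n - k)
a <- k + alpha_hat
b <- n - k + beta_hat
post_mean <- a / (a + b)
post_var  <- a * b / ((a + b)^2 * (a + b + 1))

# Compare raw and shrunken proportions side by side
round(cbind(raw = p_hat, shrunk = post_mean), 3)
```

Each shrunken estimate is a weighted average of the raw proportion and the prior mean, with weights `n` and `nu`, so bins with few cells are pulled more strongly toward the overall mean.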

# Per patient plots

In [None]:
lineage_list[['TNKILC']]

In [None]:
names(counts_list)

In [None]:
mmr_map %>% arrange(desc(MMRstatus))

In [None]:
mmr_map[MMRstatus == 'MMRd']$PatientID
names(counts_list)

In [None]:
library(patchwork)
#pdf('figs/per_patient_plots_TNKILC_prop.pdf', height = 18, width = 32)
for (state in lineage_list[['TNKILC']]){
    .types = grep(state, colnames(counts_list$C110), value = TRUE)
    print(.types)
    p1 = imap(counts_list[mmr_map[MMRstatus == 'MMRd']$SampleID], function(counts, .id) {    
        interface_plot(counts, .types, 'binomial') + labs(title = glue('{.id} (MMRd)'))    
    }) %>% wrap_plots() + plot_annotation(title = paste0(.types, collapse =  ', '))
    
    p2 = imap(counts_list[mmr_map[MMRstatus == 'MMRp']$SampleID], function(counts, .id) {    
        interface_plot(counts, .types, 'binomial') + labs(title = glue('{.id} (MMRp)'))    
    }) %>% wrap_plots() + plot_annotation(title = paste0(.types, collapse =  ', '))
    
    print(p1)
    print(p2)
}
#dev.off()

## 6. Global Analysis Across All Cell Types

Now we run the main analysis function, `run_global_mmr_analysis`. This function iterates through every cell type, performs the meta-analysis separately for MMRd and MMRp samples, compares the two groups in each distance bin, and returns a set of clean data frames ready for plotting.
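
The pooling idea behind the meta-analysis can be sketched with plain precision weights. This is a simplified stand-in, not the notebook's method: `meta_ashr` first shrinks the per-sample estimates with `ashr::ash` and then weights by posterior precision, whereas the sketch below weights the raw (invented) estimates directly.

```r
# Plain inverse-variance pooling across samples for one distance bin.
est <- c(0.10, 0.20, 0.40)     # per-sample proportions (invented)
v   <- c(0.001, 0.004, 0.010)  # per-sample variances (invented)

# Normalized precision weights: more precise samples count more
w <- prop.table(1 / v)

pooled_estimate <- sum(w * est)
# Weighted average of the variances, the same form used in `meta_ashr`
pooled_variance <- sum(w * v)
# For comparison, the classical fixed-effect variance of the pooled estimate
classical_variance <- 1 / sum(1 / v)
```

The weighted-average variance is larger than the classical fixed-effect variance whenever more than one sample contributes, so intervals built from it are wider.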

In [None]:
#' @title Run a Global Analysis Comparing MMR Status
#' @description This master function automates the entire statistical comparison.
run_global_mmr_analysis <- function(types_list, counts_list, mmr_map) {
  
  # Dynamically calculate sample sizes to make the analysis robust
  n_msi <- length(unique(mmr_map[MMRstatus == 'MMRd']$SampleID))
  n_mss <- length(unique(mmr_map[MMRstatus == 'MMRp']$SampleID))
  
  # Iterate over the simplified cell type names (`type_lvl3`)
  results_by_type <- purrr::imap(types_list, function(.x, .y) {
    .types <- .y # Use the name of the list element (the correct type) for subsetting
    
    # Run meta-analysis for each group
    df_MSI <- get_stats(counts_list[mmr_map[MMRstatus == 'MMRd']$SampleID], .types)
    df_MSS <- get_stats(counts_list[mmr_map[MMRstatus == 'MMRp']$SampleID], .types)
    
    # Combine results for direct comparison
    df <- bind_rows(list(MSI = df_MSI, MSS = df_MSS), .id = 'Status')
    
    # Reshape and run Welch's t-test on the meta-analyzed estimates
    df_stat <- dcast(df, dist_bin ~ Status, value.var = c('estimate', 'variance'))[
      , c('p', 'log2_fold_change') := t_test_and_lfc(estimate_MSI, variance_MSI, n_msi, estimate_MSS, variance_MSS, n_mss), dist_bin
    ]
    
    return(list(MSI_data = df_MSI, MSS_data = df_MSS, stats_data = df_stat))
  })
  
  # Restructure the list of lists into a more usable format
  transposed_results <- purrr::transpose(results_by_type)
  all_MSI_df <- dplyr::bind_rows(transposed_results$MSI_data, .id = "cell_type")
  all_MSS_df <- dplyr::bind_rows(transposed_results$MSS_data, .id = "cell_type")
  summary_stats <- dplyr::bind_rows(transposed_results$stats_data, .id = "cell_type")
  
  # Perform global FDR correction across all p-values from all tests
  summary_stats[, padj_global := p.adjust(p, method = 'fdr')]
  summary_stats[, asterisk := fifelse(padj_global < 0.01, "*", "")]
  summary_stats[, height := max(estimate_MSI + 1.96 * sqrt(variance_MSI), estimate_MSS + 1.96 * sqrt(variance_MSS)), by = .(cell_type, dist_bin)]
  
  # Return the final, tidy list of results
  return(list(summary_stats = summary_stats, MSI_results = all_MSI_df, MSS_results = all_MSS_df))
}
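
`t_test_and_lfc` is called above, but its definition is not shown in this part of the notebook. The following is a hypothetical sketch of what such a helper could look like. It assumes that the `variance` inputs are squared standard errors of the group-level estimates and that the sample counts enter only through the Welch-Satterthwaite degrees of freedom; the actual helper may differ on both points.

```r
# Hypothetical sketch of a Welch-style comparison from summary statistics.
# Assumes var1 and var2 are squared standard errors of est1 and est2.
t_test_and_lfc_sketch <- function(est1, var1, n1, est2, var2, n2) {
    # Welch t statistic for the difference of the two estimates
    t_stat <- (est1 - est2) / sqrt(var1 + var2)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df <- (var1 + var2)^2 / (var1^2 / (n1 - 1) + var2^2 / (n2 - 1))
    # Two-sided p-value
    p <- 2 * pt(abs(t_stat), df = df, lower.tail = FALSE)
    list(p = p, log2_fold_change = log2(est1 / est2))
}

# Invented example: group 1 has twice the proportion of group 2
res <- t_test_and_lfc_sketch(0.30, 0.002, 6, 0.15, 0.001, 8)
```

Returning a two-element list lets the result be assigned to two columns at once with `c('p', 'log2_fold_change') := ...` in data.table, as done in `run_global_mmr_analysis`.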

In [None]:
# Create a list of cell types to iterate over
cellTypes = cells[type_lvl1 == 'TNKILC'] %>% 
    select(type_lvl2, type_lvl3) %>% 
    distinct
type_list <- lapply(split(cellTypes$type_lvl2, cellTypes$type_lvl3), unique)

# Run the full analysis
final_results <- run_global_mmr_analysis(type_list, counts_list, mmr_map)

# Display a glimpse of the main summary table
glimpse(final_results$summary_stats)

## 7. Visualization

In this final section, we generate plots to visualize the results. We create per-state and faceted plots for the TNKILC lineage to compare distribution profiles between MMRd (MSI) and MMRp (MSS) samples.

In [None]:
# Prepare a combined data frame for plotting
df_plot <- bind_rows(list(MSI = final_results$MSI_results, MSS = final_results$MSS_results), .id = 'Status')

df_plot = df_plot %>%
    group_by(cell_type) %>%
    mutate(ymax = 100 * max(estimate + 1.96 * sqrt(variance))) %>%
    ungroup
head(df_plot)

# Create a list for grouping cell types by lineage for faceted plots
lineage_list <- cells %>% select(type_lvl1, type_lvl3) %>% distinct %>% {split(.$type_lvl3, .$type_lvl1)}

head(final_results$summary_stats)


In [None]:
df_plot %>% 
    group_by(Status, dist_bin) %>%
    summarize(n = sum(100*estimate)) %>%
    pivot_wider(names_from = Status, values_from = n)

### Plot: T-cell, NK, and ILC Lineages

In [None]:
fig.size(h = 9, w = 16)
options(repr.plot.res = 300)
lineage = 'TNKILC'
tnkilc_order = lineage_list[['TNKILC']] #c("Tcd8-CXCL13", "Tcd8-HOBIT", "Tcd8-gdlike", "Tcd8-gdlike-PD1", "Tcd8-GZMK", "Tplzf-gdlike", "Tcd4-CXCL13", "Tcd4-TFH", "Tcd4-Treg", "Tcd4-IL7R", "NK-CD16", "NK-XCL1", "ILC3")

# Prepare the data for the geom_text layers beforehand for clarity
text_data_asterisk <- final_results$summary_stats %>% 
    filter(cell_type %in% lineage_list[[lineage]]) %>%
    mutate(cell_type = factor(cell_type, ordered = TRUE, levels = tnkilc_order))

text_data_epi <- df_plot %>% 
    filter(cell_type %in% lineage_list[[lineage]]) %>%
    mutate(cell_type = factor(cell_type, ordered = TRUE, levels = tnkilc_order)) %>% 
    select(cell_type, ymax) %>% 
    distinct() %>% 
    mutate(ymax = ymax + ymax/3, x = 32, label = 'Epi')

text_data_stroma <- df_plot %>% 
    filter(cell_type %in% lineage_list[[lineage]]) %>% 
    mutate(cell_type = factor(cell_type, ordered = TRUE, levels = tnkilc_order)) %>% 
    select(cell_type, ymax) %>% 
    distinct() %>% 
    mutate(ymax = ymax + ymax/3, x = 8, label = '    Stroma')


fig.size(4,4)
lapply(tnkilc_order, function(mytype){

# Prepare the data for the geom_text layers beforehand for clarity
text_data_asterisk <- final_results$summary_stats %>% 
    filter(cell_type == mytype) %>%
    mutate(cell_type = factor(cell_type, ordered = TRUE, levels = tnkilc_order))

text_data_epi <- df_plot %>% 
    filter(cell_type == mytype) %>%
    mutate(cell_type = factor(cell_type, ordered = TRUE, levels = tnkilc_order)) %>% 
    select(cell_type, ymax) %>% 
    distinct() %>% 
    mutate(ymax = ymax + ymax/3, x = 32, label = 'Epi')

text_data_stroma <- df_plot %>% 
    filter(cell_type == mytype) %>%
    mutate(cell_type = factor(cell_type, ordered = TRUE, levels = tnkilc_order)) %>% 
    select(cell_type, ymax) %>% 
    distinct() %>% 
    mutate(ymax = ymax + ymax/3, x = 8, label = '    Stroma')
    
# Create the plot
p1 = df_plot %>% 
           filter(cell_type == mytype) %>%
           mutate(cell_type = factor(cell_type, ordered = TRUE, levels = tnkilc_order)) %>%
ggplot(
    data = ., 
    aes(dist_bin, 100 * estimate, color = Status)
) + 
    geom_vline(xintercept = c(20.5), linewidth = 1, linetype = 1, color = 'red') + 
    geom_point() + 
    geom_errorbar(aes(ymin = 100 * (estimate - 1.96 * sqrt(variance)), ymax = 100 * (estimate + 1.96 * sqrt(variance))), width = 0, show.legend = FALSE) + 
    geom_hline(yintercept = 0) + 
    geom_line(data = . %>% dplyr::mutate(dist_bin = (dist_bin)), show.legend = FALSE, aes(group = Status)) + 
    cowplot::theme_half_open(10) + 
    theme(axis.text.x = element_text(angle = 90, hjust = 1)) + 
    labs(y = 'Percent of TNKILC cells', x = 'Distance Window', size = '# Cells', subtitle = 'IVW meta-analysis; mean & 95% CI, *padj<0.01', title = lineage) + 
    geom_text(
        data = text_data_asterisk,
        aes(y = 100 * height, label = asterisk), 
        size = 6, vjust = .2, show.legend = FALSE, color = 'black'
    ) + 
    geom_text(
        data = text_data_epi, 
        aes(label = label, y = ymax, x = x), 
        color = 'black', size = 3
    ) +
    geom_text(
        data = text_data_stroma, 
        aes(label = label, y = ymax, x = x), 
        color = 'black', size = 3
    ) +
    # scale_color_manual(
    #     name = 'Interface Type: ', 
    #     values = c('hubPos' = '#D55E00', 'hubNeg' = '#009E73', 'MSS' = 'grey'), 
    #     labels = c('hubPos' = 'Hub-inside MMRd', 'hubNeg' = 'Hub-outside MMRd', 'MSS' = 'MMRp')
    # ) + 
    guides(color = guide_legend(override.aes = list(size = 2, shape = 16))) + 
    theme(
        aspect.ratio = 0.5, 
        axis.text.x = element_text(size = 4), 
        strip.background = element_rect(fill = NA), 
        strip.text = element_text(size = 10, face = 'bold', color = 'black'), 
        title = element_text(size = 10), 
        legend.position = 'right', 
        legend.text = element_text(size = 10)
    ) +
    guides(fill = guide_legend(override.aes = list(nrow = 1, shape = 16))) +
    ggtitle(mytype) +
    NULL
print(p1)
ggsave(p1, filename = glue::glue('figs/cellstates/TNKILC/As_proportion_of_TNKILCs_by_Status/', mytype, '.pdf'), height = 4, width = 4, create.dir = TRUE)})

## Supplementary Figure 7

In [None]:
cell_type_counts = rbindlist(lapply(counts_list, FUN = function(x){dist_bin = rownames(x); 
                                                                   x = cbind(dist_bin, x)
                                                                   as.data.frame(x)}), 
                             idcol = 'SampleID') %>%
    pivot_longer(cols = all_of(lineage_list[['TNKILC']])) %>%
    left_join(., mmr_map) %>%
    rename(MSIstatus = MMRstatus, cell_type = name, counts = value) %>%
    mutate(MSIstatus = as.factor(MSIstatus)) %>%
    mutate(MSIstatus = fct_recode(MSIstatus, 'MSS' = 'MMRp', 'MSI' = 'MMRd')) %>%
    filter(MSIstatus == 'MSI') %>%
    mutate(counts = as.integer(counts)) %>%
    group_by(dist_bin, cell_type) %>%
    summarize(counts = sum(counts)) %>%
    ungroup
head(cell_type_counts)

In [None]:
head(df_plot %>% left_join(., cell_type_counts))

In [None]:
fig.size(h = 9, w = 6.5)
# Create the plot
manual_breaks <- c(
    "[-100,-95]", "(-75,-70]", "(-50,-45]", "(-25,-20]",
    "(0,5]", "(25,30]", "(50,55]", "(75,80]", "(95,100]"
)

supp_fig_7 = df_plot %>% 
    # filter(Status == 'MSS') %>%
    filter(cell_type %in% lineage_list[[lineage]]) %>%
    # Sort cell types alphabetically for faceting
    mutate(cell_type = factor(cell_type, levels = sort(unique(cell_type)))) %>%
    ggplot(
        data = ., 
        aes(dist_bin, 100 * estimate, color = Status, fill = Status)
    ) + 
    geom_vline(xintercept = c(20.5), color = 'red', linetype = "dashed", linewidth = 0.5) +
    geom_line(aes(group = Status), alpha = 1, key_glyph = 'point', linewidth = 0.25) +
    geom_point(shape = '.') + 
    geom_errorbar(aes(ymin = 100 * (estimate - 1.96 * sqrt(variance)), ymax = 100 * (estimate + 1.96 * sqrt(variance))), show.legend = FALSE, alpha = 0.5, linewidth = 0.25) + 
    labs(y = 'Percent of TNKILC cells', x = 'Distance from the interface (\U03BCm)', size = '# Cells', subtitle = 'IVW meta-analysis; mean & 95% CI, *padj<0.01') +      
    geom_text(inherit.aes = FALSE,
        data = text_data_asterisk,
        aes(dist_bin, y = 100 * height, label = asterisk), 
        size = 1.5, vjust = .2, show.legend = FALSE, color = 'black'
    ) + 
    geom_text(inherit.aes = FALSE,
        data = text_data_epi, 
        aes(label = label, y = ymax, x = x), 
        color = 'black', size = 2
    ) +
    geom_text(inherit.aes = FALSE,
        data = text_data_stroma, 
        aes(label = label, y = ymax, x = x), 
        color = 'black', size = 2
    ) +
    guides(color = guide_legend(override.aes = list(size = 2, shape = 16))) + 
    facet_wrap2(~cell_type, scales = 'free_y', axes = "all", remove_labels = "x") +
    cowplot::theme_half_open(7) + 
    # scale_color_manual(
    #     name = 'Interface Type: ', 
    #     values = c('hubPos' = '#D55E00', 'hubNeg' = '#009E73', 'MSS' = 'grey'), 
    #     labels = c('hubPos' = 'Hub-inside MMRd', 'hubNeg' = 'Hub-outside MMRd', 'MSS' = 'MMRp')
    # ) + 
    # scale_fill_manual(
    #     name = 'Interface Type: ', 
    #     values = c('hubPos' = '#D55E00', 'hubNeg' = '#009E73', 'MSS' = 'grey'), 
    #     labels = c('hubPos' = 'Hub-inside MMRd', 'hubNeg' = 'Hub-outside MMRd', 'MSS' = 'MMRp')
    # ) + 
    guides(color = guide_legend(override.aes = list(size = 2, shape = 16))) + 
    # facet_wrap(~cell_type, scales = 'free') + # REMOVED: This was overwriting facet_wrap2
    theme(
        #aspect.ratio = 0.5, 
        axis.text.x = element_text(size = 7, angle = 90), 
        strip.background = element_rect(fill = NA), 
        strip.text = element_text(size = 10, face = 'bold', color = 'black'), 
        title = element_text(size = 10), 
        legend.position = 'top', 
        legend.text = element_text(size = 10)
    ) +
    guides(fill = guide_legend(override.aes = list(nrow = 1, shape = 16))) +
    scale_x_discrete(breaks = manual_breaks) +
    NULL
fig.size(h = 8, w = 6.5)
supp_fig_7

In [None]:
df_plot %>% 
    mutate(midpoint = unlist(lapply(.$dist_bin, find_midpoint))) %>%
    left_join(., cell_type_counts) %>%
    filter(Status == 'MSI') %>%
    filter(cell_type %in% lineage_list[[lineage]]) %>%
    # Sort cell types alphabetically
    mutate(cell_type = factor(cell_type, levels = sort(unique(cell_type)))) %>%
    mutate(dist_bin = factor(dist_bin)) %>%
    mutate(dist_bin = fct_reorder(dist_bin, midpoint)) %>%
    pull(dist_bin) %>%
    levels
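
`find_midpoint` is used above but is not defined in this part of the notebook. A minimal sketch of such a helper, assuming bin labels follow the `cut()` format (for example `"(-5,0]"` or `"[-100,-95]"`), might look like this; the name `find_midpoint_sketch` is hypothetical.

```r
# Hypothetical helper: numeric midpoint of a cut()-style bin label.
find_midpoint_sketch <- function(bin_label) {
    # Strip brackets and parentheses, then split on the comma
    cleaned <- gsub('\\[|\\]|\\(|\\)', '', as.character(bin_label))
    bounds <- as.numeric(strsplit(cleaned, ',')[[1]])
    # Midpoint is the average of the lower and upper bound
    mean(bounds)
}

# Order bin labels by midpoint rather than alphabetically
labels <- c("(0,5]", "[-100,-95]", "(-5,0]")
mids <- sapply(labels, find_midpoint_sketch)
ordered_labels <- labels[order(mids)]
```

Ordering by midpoint avoids the alphabetical ordering that `factor()` applies to bin labels by default, which would otherwise scramble the x-axis.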

In [None]:
supp_fig_7 = df_plot %>% 
    filter(cell_type %in% lineage_list[[lineage]]) %>%
    mutate(cell_type = gsub(cell_type, pattern = 'Tcd8-gdlike-PD1', replacement = 'Tcd8-CXCL13-LAG3')) %>%
    mutate(midpoint = unlist(lapply(.$dist_bin, find_midpoint))) %>%
    left_join(., cell_type_counts) %>%
    filter(Status == 'MSI') %>%
    # Order cell types in reverse alphabetical order
    mutate(cell_type = factor(cell_type, levels = rev(sort(unique(cell_type))))) %>%
    mutate(dist_bin = factor(dist_bin)) %>%
    mutate(dist_bin = fct_reorder(dist_bin, midpoint)) %>%
    ggplot(
        data = ., 
        aes(dist_bin, 100 * estimate, color = Status, fill = Status)
    ) + 
    geom_vline(xintercept = c(20.5), color = 'red', linetype = "dashed", linewidth = 0.5) +
    geom_line(aes(group = Status), alpha = 1, key_glyph = 'point', linewidth = 0.25) +
    #geom_point(aes(size = log10(counts))) + 
    geom_point(shape = '.') +
    geom_text(inherit.aes = FALSE,
        data = rbind(text_data_epi %>% mutate(label = 'Epi\ntiles'), text_data_stroma %>% mutate(label = '    Stromal\ntiles')) %>% 
            mutate(ymax = 35) %>%
            mutate(cell_type = gsub(cell_type, pattern = 'Tcd8-gdlike-PD1', replacement = 'Tcd8-CXCL13-LAG3')) %>%
            mutate(cell_type = factor(cell_type, levels = rev(sort(unique(cell_type))))),
        aes(label = label, y = ymax, x = x), 
        color = 'black', size = 2
    ) +
    geom_errorbar(aes(ymin = 100 * (estimate - 1.96 * sqrt(variance)), ymax = 100 * (estimate + 1.96 * sqrt(variance))), show.legend = FALSE, alpha = 0.5, linewidth = 0.25) + 
    labs(y = 'Percent of TNKILC cells',
         x = 'Distance from the interface (\U03BCm)',
         title = 'TNKILC states as proportion of all TNKILCs', 
         subtitle = 'IVW meta-analysis; mean & 95% CI') +      
    guides(color = guide_legend(override.aes = list(size = 2, shape = 16))) + 
    facet_wrap2(~cell_type, scales = 'fixed', axes = "all", remove_labels = "x") +
    cowplot::theme_half_open(7) + 
    scale_color_manual(
        name = 'Interface Type: ', 
        values = c('MSI' = 'darkred'), #c('hubPos' = '#D55E00', 'hubNeg' = '#009E73', 'MSS' = 'grey'), 
        labels = c('MSI' = 'MMRd')
    ) + 
    scale_fill_manual(
        name = 'Interface Type: ', 
        values = c('MSI' = 'darkred'), #c('hubPos' = '#D55E00', 'hubNeg' = '#009E73', 'MSS' = 'grey'), 
        labels = c('MSI' = 'MMRd')
    ) + 
    guides(color = guide_legend(override.aes = list(size = 2, shape = 16))) + 
    theme(
        #aspect.ratio = 0.5, 
        axis.text.x = element_text(size = 7, angle = 90), 
        strip.background = element_rect(fill = NA), 
        strip.text = element_text(size = 8, face = 'bold', color = 'black'), 
        title = element_text(size = 10), 
        legend.position = 'top', 
        legend.text = element_text(size = 10)
    ) +
    guides(fill = guide_legend(override.aes = list(nrow = 1, shape = 16))) +
    scale_x_discrete(breaks = manual_breaks) +
    NULL
fig.size(h = 8, w = 6.5)
supp_fig_7
supp_fig_7 %>% ggsave(filename = 'figs/supplementary_figure_7_TNKILC_as_lineage_proportion.pdf', height = 8, width = 6.5)

In [None]:
supp_fig_7 = df_plot %>% 
    mutate(midpoint = unlist(lapply(.$dist_bin, find_midpoint))) %>%
    left_join(., cell_type_counts) %>%
    filter(Status == 'MSI') %>%
    filter(cell_type %in% lineage_list[[lineage]]) %>%
    mutate(cell_type = factor(cell_type, levels = sort(unique(cell_type)))) %>%
    mutate(dist_bin = factor(dist_bin)) %>%
    mutate(dist_bin = fct_reorder(dist_bin, midpoint)) %>%
    ggplot(
        data = ., 
        aes(dist_bin, 100 * estimate, fill = cell_type)
    ) + 
    geom_vline(xintercept = c(20.5), color = 'red', linetype = "dashed", linewidth = 0.5) +
    geom_col(position = 'fill', color = 'black') + 
    labs(y = 'Percent of TNKILC cells', x = 'Distance from the interface (\U03BCm)') +      
    guides(color = guide_legend(override.aes = list(size = 2, shape = 16))) + 
    cowplot::theme_half_open(7) +
    guides(color = guide_legend(override.aes = list(size = 2, shape = 16))) + 
    theme(
        #aspect.ratio = 0.5, 
        axis.text.x = element_text(size = 7, angle = 90), 
        strip.background = element_rect(fill = NA), 
        strip.text = element_text(size = 10, face = 'bold', color = 'black'), 
        title = element_text(size = 10), 
        legend.position = 'top', 
        legend.text = element_text(size = 8)
    ) +
    guides(fill = guide_legend(override.aes = list(nrow = 1, shape = 16))) +
    scale_x_discrete(breaks = manual_breaks) +
    ggthemes::scale_fill_tableau('Tableau 20', name = '') + # TNKILC\nstate 
    NULL
fig.size(h = 5, w = 5)
supp_fig_7

In [None]:
# df %>% 
# rename(Interface = Status) %>%
# mutate(Interface = Interface %>% as.factor %>% fct_recode('Hub-inside' = 'hubPos', 'Hub-outside' = 'hubNeg')) %>%
# mutate(estimate = 100*estimate) %>%
# select(!c(asterisk, ymax)) %>%
# mutate(`Lower Confidence Limit` = estimate - 1.96 * sqrt(variance)) %>%
# mutate(`Upper Confidence Limit` = estimate + 1.96 * sqrt(variance)) %>%
# rename(`Adjusted p-value` = padj) %>%
# rename(`Spatial bin around the interface` = dist_bin) %>%
# mutate(cell_type = gsub(cell_type, pattern = 'Tcd8-gdlike-PD1', replacement = 'Tcd8-CXCL13-LAG3')) %>%
# write.csv('figs/cellstates/TNKILC/As_proportion_of_TNKILCs/table_of_TNKILCs_as_prop_of_TNKILC_compartment.csv')

In [None]:
sessionInfo()