# COSMX FOV REPORT FOR SLIDE {{ COSMX_SLIDE_NAME }}
* **Notebook version:** v0.0.1
* **Created by:** NIHR Imperial BRC Genomics Facility
* **Maintained by:** NIHR Imperial BRC Genomics Facility
* **Docker image path:** [Dockerfile](https://github.com/imperial-genomics-facility/igf-dockerfiles/tree/main/cosmx/Dockerfile_v1)
* **Notebook code path:** [Templates](https://github.com/imperial-genomics-facility/igf-dockerfiles/tree/main/cosmx/)
* **Created on:** {{ DATE_TAG }}
* **Contact us:** [NIHR Imperial BRC Genomics Facility - Contact us](https://www.imperial.ac.uk/medicine/research-and-impact/facilities/genomics-facility/contact-us/)
* **License:** [Apache License 2.0](https://github.com/imperial-genomics-facility/igf-dockerfiles/blob/main/LICENSE)


* **Project name:** {{ COSMX_PROJECT_NAME }}
* **Panel type:** {{ PANEL_NAME }}
* **COSMX slide name:** {{ COSMX_SLIDE_NAME }}

## Intro
Field of View (FOV) quality control is crucial in CosMx spatial transcriptomics experiments because technical effects can significantly compromise data integrity and lead to misleading biological interpretations. While most FOVs typically perform comparably, some may experience reduced overall gene expression or selective signal loss from specific genes, potentially causing cells within an affected FOV to artificially cluster as the same cell type due to technical artifacts rather than true biological similarity. These quality issues can stem from various factors including poor tissue or section quality, high autofluorescence, and inadequate fiducials, making early detection and exclusion of problematic FOVs essential for reliable downstream analyses.

The FOV QC process uses a two-step approach implemented in dedicated R functions:
* First, a permissive signal strength assessment identifies FOVs with greater than 60% signal loss across most of their spatial span by comparing total counts in grid squares to similar regions in other FOVs.
* Second, a more sophisticated bias detection method examines gene expression profiles at the reporter level rather than the gene level, since FOV quality issues are often linked to specific fluorescent reporters.


## Source
Bruker blog post: [FOV QC from single-cell gene expression in spatial dataset](https://nanostring-biostats.github.io/CosMx-Analysis-Scratch-Space/posts/fov-qc/)

## Technical notes from the blog post
```
Technical details:

We place a 7x7 grid across each FOV. For each grid square, we find the 10 most similar squares in other FOVs, with “similar” being based on the square’s expression profile (we also only accept one neighbor per other FOV).

Then we score FOVs for signal loss. For each square, we compare its total counts to its comparator squares. For each reporter bit, this gives us 49 contrasts. If most (75%) of an FOV’s squares have low total counts compared to comparators, we flag the FOV.

To score FOVs for bias, we use a similar approach. For each reporter, we take the genes using the bit, and we contrast their expression in the square vs. in the average of the 10 most similar squares elsewhere. When an FOV’s grid squares consistently underexpress the relevant gene set, we flag the FOV.
```
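
The grid step described above can be sketched in a few lines of R. This is only an illustration on simulated data, not the blog post's implementation: the helper name `assign_grid_square`, the use of the cells' bounding box as the grid extent, and all simulated values are assumptions.

```r
# Hypothetical helper: assign each cell of one FOV to a square of an n x n grid.
# Assumption: the grid spans the bounding box of the cells' coordinates.
assign_grid_square <- function(x, y, n = 7) {
  # Bin each coordinate into n equal-width intervals over its observed range
  xbin <- cut(x, breaks = n, labels = FALSE)
  ybin <- cut(y, breaks = n, labels = FALSE)
  # Combine the two bin indices into one square index between 1 and n^2
  (ybin - 1) * n + xbin
}

# Simulated cell positions and total counts for a single FOV
set.seed(1)
x <- runif(500)
y <- runif(500)
totalcounts <- rpois(500, lambda = 100)

square <- assign_grid_square(x, y)
# Total counts per grid square, the quantity compared against comparator squares
counts_per_square <- tapply(totalcounts, square, sum)
```

Each square's total would then be contrasted with the totals of its most similar squares in other FOVs, as the quoted notes describe.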

In [None]:
INPUT_PATH <- "{{ COSMX_FLATFILE_PATH }}"

slidenames <- c("{{ COSMX_SLIDE_NAME }}")

In [None]:
## source the necessary functions:
source("https://raw.githubusercontent.com/Nanostring-Biostats/CosMx-Analysis-Scratch-Space/Main/_code/FOV%20QC/FOV%20QC%20utils.R")
## load barcodes:
allbarcodes <- readRDS(url("https://github.com/Nanostring-Biostats/CosMx-Analysis-Scratch-Space/raw/Main/_code/FOV%20QC/barcodes_by_panel.RDS"))

In [None]:
barcode_hash <- list(
    "(1.0) (1.0) Human RNA Immuno-oncology" = "Hs_IO",
    "(1.0) (1.0) Human RNA Universal Cell Characterization" = "Hs_UCC",
    "(1.1) (1.1) Human RNA 6k Discovery" = "Hs_6k",
    "(1.0) (1.0) Mouse RNA Neuroscience" = "Mm_Neuro",
    "(1.0) (1.0) Mouse RNA Universal Cell Characterization" = "Mm_UCC",
    "(1.0) (1.0) Human RNA Whole Transcriptome" = "Hs_WTX"
)

In [None]:
panel_name <- "{{ PANEL_NAME }}"

In [None]:
panel_short_code <- barcode_hash[[panel_name]]
if (is.null(panel_short_code)) {
    stop(paste0("Unknown slide panel found: ", panel_name))
}

In [None]:
barcodemap <- allbarcodes[[panel_short_code]]

In [None]:
# necessary libraries
library(data.table) # for more memory efficient data frames
library(Matrix) # for sparse matrices like our counts matrix
library(viridis) # for color palettes
library(uwot) # for UMAP
library(irlba) # for fast truncated SVD / PCA

### lists to collect the counts matrices and metadata, one per slide
countlist <- vector(mode='list', length=length(slidenames)) 
metadatalist <- vector(mode='list', length=length(slidenames)) 

for(i in seq_along(slidenames)){
   
  slidename <- slidenames[i] 

  msg <- paste0("Loading slide ", slidename, ", ", i, "/", length(slidenames), ".")
  #message(msg)    
  # slide-specific files:
  thisslidesfiles <- dir(paste0(INPUT_PATH, "/", slidename))
  
  # load in metadata:
  thisslidesmetadata <- thisslidesfiles[grepl("metadata\\_file", thisslidesfiles)]
  tempdatatable <- data.table::fread(paste0(INPUT_PATH, "/", slidename, "/", thisslidesmetadata))
  
  # numeric slide ID 
  slide_ID_numeric <- tempdatatable[1,]$slide_ID 
    
  # load in counts as a data table:
  thisslidescounts <- thisslidesfiles[grepl("exprMat\\_file", thisslidesfiles)]
  
  countsfile <- paste0(INPUT_PATH, "/", slidename, "/", thisslidescounts)
  nonzero_elements_perchunk <- 5*10**7
  ### Safely read in the dense (0-filled ) counts matrices in chunks.
  ### Note: the file is gzip compressed, so we don't know a priori the number of chunks needed.
  lastchunk <- FALSE 
  skiprows <- 0
  chunkid <- 1
  
  required_cols <- fread(countsfile, select=c("fov", "cell_ID"))
  stopifnot("columns 'fov' and 'cell_ID' are required, but not found in the counts file" = 
              all(c("cell_ID", "fov") %in% colnames(required_cols)))
  number_of_cells <- nrow(required_cols)
  
  number_of_cols <-  ncol(fread(countsfile, nrows = 2))
  number_of_chunks <- ceiling(as.numeric(number_of_cols) * number_of_cells / (nonzero_elements_perchunk))
  chunk_size <- floor(number_of_cells / number_of_chunks)
  sub_counts_matrix <- vector(mode='list', length=number_of_chunks)
   
  #pb <- txtProgressBar(min = 0, max = number_of_chunks, initial = 0, char = "=",
  #                     width = NA, title, label, style = 3, file = "")
  cellcount <- 0
  while(lastchunk==FALSE){
    read_header <- FALSE
    if(chunkid==1){
      read_header <- TRUE
    }
    
    countsdatatable <- data.table::fread(countsfile,
                                         nrows=chunk_size,
                                         skip=skiprows + (chunkid > 1),
                                         header=read_header)
    if(chunkid == 1){
      header <- colnames(countsdatatable)
    } else {
      colnames(countsdatatable) <- header
    }
     
    cellcount <- nrow(countsdatatable) + cellcount     
    if(cellcount == number_of_cells) lastchunk <- TRUE
    skiprows <- skiprows + chunk_size
    slide_fov_cell_counts <- paste0("c_",slide_ID_numeric, "_", countsdatatable$fov, "_", countsdatatable$cell_ID)
    sub_counts_matrix[[chunkid]] <- as(countsdatatable[,-c("fov", "cell_ID"),with=FALSE], "sparseMatrix") 
    rownames(sub_counts_matrix[[chunkid]]) <- slide_fov_cell_counts 
    #setTxtProgressBar(pb, chunkid)
    chunkid <- chunkid + 1
  }
  
  #close(pb)   
  
  countlist[[i]] <- do.call(rbind, sub_counts_matrix) 
  
  # ensure that cell-order in counts matches cell-order in metadata   
  slide_fov_cell_metadata <- paste0("c_",slide_ID_numeric, "_", tempdatatable$fov, "_", tempdatatable$cell_ID)
  countlist[[i]] <- countlist[[i]][match(slide_fov_cell_metadata, rownames(countlist[[i]])),] 
  metadatalist[[i]] <- tempdatatable 
  
  # track common genes and common metadata columns across slides
  if(i==1){
    sharedgenes <- colnames(countlist[[i]]) 
    sharedcolumns <- colnames(tempdatatable)
  }  else {
    sharedgenes <- intersect(sharedgenes, colnames(countlist[[i]]))
    sharedcolumns <- intersect(sharedcolumns, colnames(tempdatatable))
  }
    
}
# reduce to shared metadata columns and shared genes
for(i in seq_along(slidenames)){
  metadatalist[[i]] <- metadatalist[[i]][, ..sharedcolumns]
  countlist[[i]] <- countlist[[i]][, sharedgenes]
}
counts <- do.call(rbind, countlist)
metadata <- rbindlist(metadatalist)

# add to metadata: add a global non-slide-specific FOV ID:
## uncomment this line for merging multi slide dataset
#metadata$FOV <- paste0("s", metadata$slide_ID, "f", metadata$fov)
## comment this line for merging multi slide dataset
metadata$FOV <- paste0("FOV_", metadata$fov)

# remove cell_ID metadata column, which only identifies cell within slides, not across slides:
metadata$cell_ID <- NULL

# isolate negative control matrices:
negcounts <- counts[, grepl("Negative", colnames(counts))]
falsecounts <- counts[, grepl("SystemControl", colnames(counts))]

# reduce counts matrix to only genes:
counts <- counts[, !grepl("Negative", colnames(counts)) & !grepl("SystemControl", colnames(counts))]
atomxdata <- list(counts = counts,
                  metadata = metadata,
                  negcounts = negcounts,
                  falsecounts = falsecounts)
# cell metadata, assembled above (in the blog vignette this is output by 0.-loading-flat-files.Rmd)
metadata <- atomxdata$metadata
xy <- as.matrix(metadata[, c("CenterX_global_px", "CenterY_global_px")])
rownames(xy) <- metadata$cell_id
# rescale to mm:
thisinstrument_nanometers_per_pixel <- 120.280945
# The above value holds for most instruments. Your RunSummary file specifies the value for your instrument. 
xy <- xy * thisinstrument_nanometers_per_pixel / 1000000
colnames(xy) <- paste0(c("x", "y"), "_mm")

counts <- atomxdata$counts
negcounts <- atomxdata$negcounts
falsecounts <- atomxdata$falsecounts
# require 20 counts per cell 
count_threshold <- 20
flag <- metadata$nCount_RNA < count_threshold

The table below counts the cells in each state of `flag`: `TRUE` for cells with fewer than 20 total counts, `FALSE` for cells that pass.

In [None]:
table(flag)

### Cell area distribution
The following histogram shows the distribution of cell areas in this dataset.

* The peak of the distribution shows the most common cell area.
* The spread shows the minimum and maximum cell sizes detected.
* A roughly normal distribution suggests consistent cell segmentation.
* A right-skewed distribution (long tail toward larger areas) is common in biological data.
* Multiple peaks might indicate different cell types or segmentation artifacts.
* Very large or very small areas might represent:
  * Segmentation errors (cells that are too small or artificially merged)
  * Debris or artifacts
  * Genuinely different cell types

This histogram is often used to identify thresholds for filtering out poorly segmented cells. For example, you might exclude cells that are extremely small (likely debris) or extremely large (likely merged cells or artifacts). The distribution helps you choose reasonable cutoff values for downstream analysis.
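
As one possible way to turn the histogram into cutoffs, the sketch below derives thresholds from quantiles of the area distribution. It runs on simulated areas, and the quantile levels (0.1% and 99.9%) are illustrative assumptions. This notebook itself uses a fixed upper threshold instead.

```r
# Simulated, right-skewed cell areas (illustrative values only)
set.seed(42)
area <- rlnorm(10000, meanlog = log(5000), sdlog = 0.5)

# Assumed heuristic: treat the extreme 0.1% tails as suspect
lower <- unname(quantile(area, 0.001))
upper <- unname(quantile(area, 0.999))

too_small <- area < lower  # candidates for debris
too_large <- area > upper  # candidates for merged cells or artifacts

# Inspect the cutoffs against the distribution
hist(area, breaks = 100, xlab = "Cell Area", main = "")
abline(v = c(lower, upper), col = "red")
```

Any such cutoff should be checked visually against the histogram before it is used for filtering.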

In [None]:
## set plot size
options(repr.plot.width=15, repr.plot.height=10)
# what's the distribution of areas?
hist(metadata$Area, breaks = 100, xlab = "Cell Area", main = "")
# based on the above, set a threshold:
area_threshold <- 30000
abline(v = area_threshold, col = "red")

### Number of flagged cells based on area

In [None]:
# flag cells based on area:
flag <- flag | (metadata$Area > area_threshold)
table(flag)

### List of QC failed FOVs

In [None]:
## run the method:
fovqcresult <- runFOVQC(counts = counts, xy = xy, fov = metadata$FOV, barcodemap = barcodemap, max_prop_loss = 0.6)


### Flagged FOVs

This visualization creates a spatial plot showing which FOVs (Fields of View) have been flagged as problematic during the quality control process. It helps identify whether problematic FOVs are clustered together (suggesting localized technical issues like poor tissue preparation) or randomly distributed. Passing FOVs are shown in <span style="color:steelblue">blue</span> and failing FOVs in <span style="color:red">red</span>.

In [None]:
## set plot size
options(repr.plot.width=18, repr.plot.height=12)
# map of flagged FOVs:
mapFlaggedFOVs(fovqcresult, shownames = TRUE)

#### List FOVs flagged for any reason (loss of signal or bias)

In [None]:
# list FOVs flagged for any reason, whether loss of signal or bias:
fovqcresult$flaggedfovs

#### List FOVs flagged for loss of signal

In [None]:
fovqcresult$flaggedfovs_fortotalcounts

#### List FOVs flagged for bias
This identifies FOVs where certain barcode bits (fluorescent reporters) are systematically under-expressed compared to biologically similar regions in other FOVs.

* The algorithm examines each of the 4 colors (B, G, Y, R) within each reporter cycle.
* For each FOV's 7×7 grid squares, it compares expression of genes using specific barcode bits to the 10 most similar grid squares from other FOVs.
* An FOV gets flagged for bias if:
  * At least 2 out of 4 colors from the same reporter cycle show significant bias (≥50% of that reporter cycle's bits are flagged)
  * The bias exceeds the `max_prop_loss` threshold (set to 0.6 in this notebook, i.e. 60% signal loss)
  * The effect is statistically significant (p < 0.01)
  * At least 50% of the FOV's grid squares agree on the direction of the bias
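
The criteria above can be read as a simple decision rule. The sketch below applies that rule to a made-up per-bit summary for a single FOV. The column names, all the values, and the conversion of `max_prop_loss` to a log2 cutoff are assumptions for illustration, and the actual `runFOVQC` internals may differ.

```r
# Hypothetical per-bit summary for one FOV (all values invented for illustration)
bits <- data.frame(
  cycle  = rep(c("reportercycle1", "reportercycle2"), each = 4),
  color  = rep(c("B", "G", "Y", "R"), times = 2),
  log2fc = c(-1.6, -1.5, -0.2, 0.1, -0.3, 0.0, -1.4, 0.2),
  pvalue = c(0.001, 0.004, 0.4, 0.7, 0.2, 0.9, 0.002, 0.6),
  prop_squares_agree = c(0.8, 0.7, 0.5, 0.4, 0.5, 0.5, 0.9, 0.5)
)

max_prop_loss <- 0.6
# Assumption: a 60% loss leaves 40% of the signal, so the cutoff is log2(0.4)
log2fc_cutoff <- log2(1 - max_prop_loss)

# A bit is flagged when the loss is large, significant, and consistent across squares
bits$bit_flagged <- bits$log2fc < log2fc_cutoff &
  bits$pvalue < 0.01 &
  bits$prop_squares_agree >= 0.5

# The FOV is flagged when at least 2 of the 4 colors in any one cycle are flagged
flagged_per_cycle <- tapply(bits$bit_flagged, bits$cycle, sum)
fov_flagged_for_bias <- any(flagged_per_cycle >= 2)
```

In this toy input, two colors of `reportercycle1` pass all three per-bit checks, so the FOV would be flagged. The single flagged color in `reportercycle2` would not be enough on its own.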

In [None]:
fovqcresult$flaggedfovs_forbias

### FOV signal loss pattern

The following spatial visualization shows signal loss patterns across FOVs by displaying how total gene expression counts in each region compare to similar regions elsewhere. It helps identify FOVs where cells systematically express fewer total transcripts than expected from comparable regions in other FOVs. <span style="color:#FFD700">Yellow</span> rectangles with <b>thick borders</b> highlight FOVs that failed the total counts QC check, that is, FOVs flagged for more than 60% signal loss.

Each cell is plotted as a small dot colored according to its log2 fold-change in total counts compared to similar spatial regions:

* Dark blue: Severe signal loss (log2 fold-change below -2)
* Blue: Moderate signal loss (log2 fold-change around -1)
* Grey: Normal signal (log2 fold-change around 0)
* Red: Higher signal (log2 fold-change around +1)
* Dark red: Much higher signal (log2 fold-change above +2)
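
To relate the color scale to the 60% signal loss threshold, the short sketch below converts a few proportions of lost signal into log2 fold-changes. The chosen proportions are illustrative.

```r
# Proportion of signal lost relative to comparator regions
prop_loss <- c(0, 0.5, 0.6, 0.75)

# Losing a fraction p of the signal leaves (1 - p), so the fold-change is log2(1 - p)
log2fc <- log2(1 - prop_loss)

data.frame(prop_loss = prop_loss, log2fc = round(log2fc, 2))
```

A 50% loss corresponds to a log2 fold-change of -1, a 60% loss to about -1.32, and a 75% loss to -2.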

In [None]:
FOVSignalLossSpatialPlot(fovqcresult, shownames = TRUE) 

### Spatial plots of flagged reporter positions
This plot shows how specific barcode bits (fluorescent reporters) are performing across different spatial regions within FOVs, helping identify technical quality issues. It helps to validate FOV QC results by showing whether apparent expression differences are due to technical FOV effects or genuine biological spatial patterns.

Color Coding:

* Blue colors indicate underexpression (negative fold-change)
* Red colors indicate overexpression (positive fold-change)
* Gray indicates no change
* The scale typically ranges from -1 to +1 log2 fold-change

In [None]:
# spatial plots of flagged reporter positions:
par(mfrow = c(2,2))
FOVEffectsSpatialPlots(res = fovqcresult, outdir = NULL, bits = "flagged_reportercycles") 

### FOV bias
This heatmap shows the bias in barcode bit expression across all FOVs and reporter bits.

* Rows: Each FOV in the experiment
* Columns: Each barcode bit (reporter cycle + color combination, e.g., "reportercycle1B", "reportercycle1G", etc.)
* Cell Values: Log2 fold-change showing how much each FOV under- or over-expresses genes using specific barcode bits compared to similar regions in other FOVs

Color Scheme:

* Dark blue: Severe underexpression (log2 fold-change below -2)
* Blue: Moderate underexpression
* White: Normal expression (log2 fold-change of 0)
* Red: Moderate overexpression
* Dark red: Severe overexpression (log2 fold-change above +2)

The heatmap makes it easy to spot whether failures are:

* Clustered within specific reporter cycles (technical issue)
* Scattered randomly (potentially biological variation)
* Affecting multiple FOVs in similar patterns

In [None]:
## set plot size
options(repr.plot.width=25, repr.plot.height=18)
FOVEffectsHeatmap(fovqcresult)

### List of affected genes
A list of all the unique gene names whose expression is compromised in one or more flagged FOVs.

In [None]:
# genes from all barcode bits that were flagged in any FOV:
print(unique(fovqcresult$flagged_fov_x_gene[,"gene"]))