# Overview

## Setup

### Installation
See `https://github.com/sagar87/spatial-data/tree/main` for installation instructions. It is highly recommended to install the package in an isolated environment, e.g. one created with conda or mamba.

### Basic Structure

Spatialdata is organized into modules, each of which groups functions for a specific purpose:

- `pp` (preprocessing): takes care of preprocessing such as normalizing channel intensities or transforming the imaging data into an expression matrix.
- `pl` (plotting): takes care of all things plotting, such as looking at the distribution of marker intensities or plotting cell types in the spatial context.
- `ext` (external): contains external tools which can perform different tasks, such as cell segmentation or cell type prediction.
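
How such namespaced modules can hang off a single data object is easiest to see in a toy example. The following is a minimal sketch of the general pattern only, not the package's implementation; `ImageContainer`, `Preprocessing`, and `scale` are hypothetical names invented for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ImageContainer:
    """Hypothetical stand-in for the object that holds the image data."""
    values: tuple

    @property
    def pp(self):
        # Each access returns a small helper bound to this container.
        return Preprocessing(self)


class Preprocessing:
    """Hypothetical 'pp' namespace that groups preprocessing functions."""

    def __init__(self, obj):
        self._obj = obj

    def scale(self, factor):
        # Return a new container instead of mutating, so calls can be chained.
        scaled = tuple(v * factor for v in self._obj.values)
        return replace(self._obj, values=scaled)


data = ImageContainer(values=(1, 2, 3))
result = data.pp.scale(2).pp.scale(10)
print(result.values)  # (20, 40, 60)
```

Because every method returns a new container, several steps can be written as one chained expression, and the original object stays unchanged.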

The example below illustrates what a standard analysis workflow could look like.

## Tutorial

### Step 1: Loading the data

In [5]:
import pandas as pd
import numpy as np
from skimage.io import imread
import spatial_data
import matplotlib.pyplot as plt

In [9]:
# change this part according to the sample you want to look at
# TODO: this path needs to be changed
image_path = "/home/meyerben/codex/BNHL_TMA/cropped_automated_3k/166_3_H3_LK.tif"
image = imread(image_path)
image.shape

(56, 3000, 3000)

The shape shows that the image has 56 channels and measures 3000 × 3000 pixels. To assign the channels correctly, we also need to read in the file that lists all markers in channel order.

In [10]:
# TODO: change this path
marker_path = "/home/meyerben/codex/BNHL_TMA/MarkerList.txt"
markers = list(pd.read_csv(marker_path, header=None)[0])
markers[:5]

['DAPI', 'Helios', 'CD10', 'TCF7/TCF1', 'PD-L1']
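
Since channels are matched to markers purely by position, a quick consistency check can catch a mismatched marker list early. The sketch below assumes a channel-first layout, as in the shape shown above; `check_markers` is a hypothetical helper and not part of the package.

```python
def check_markers(shape, markers):
    """Return a marker -> channel index mapping, or raise if counts differ.

    Assumes a channel-first layout (channels, height, width).
    """
    n_channels = shape[0]
    if len(markers) != n_channels:
        raise ValueError(
            f"{len(markers)} markers given for {n_channels} channels"
        )
    if len(set(markers)) != len(markers):
        raise ValueError("marker names must be unique")
    return {name: index for index, name in enumerate(markers)}


# Toy example: a 5-channel image with the first five markers from the list.
toy_shape = (5, 3000, 3000)
toy_markers = ["DAPI", "Helios", "CD10", "TCF7/TCF1", "PD-L1"]
channel_of = check_markers(toy_shape, toy_markers)
print(channel_of["DAPI"])  # 0
```

With the real data, the same check would be called with `image.shape` and the full `markers` list.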

### Step 2: Cell segmentation

The first step in analysing highly multiplexed fluorescence images typically consists of segmenting the cells. In this example, we will use StarDist to segment the cell nuclei based on the nuclear channel (DAPI).

In [11]:
# reading the image and the marker list into a spatialproteomics object
sdata = spatial_data.load_image_data(image, channel_coords=markers)
sdata

In [12]:
# performing segmentation
sdata = sdata.ext.stardist()
sdata

Found model '2D_versatile_fluo' for 'StarDist2D'.


Loading network weights from 'weights_best.h5'.
Loading thresholds from 'thresholds.json'.
Using default values: prob_thresh=0.479071, nms_thresh=0.3.


100%|██████████| 144/144 [06:28<00:00,  2.70s/it]
utils.py (494): The return type of `Dataset.dims` will be changed to return a set of dimension names in future, in order to be more consistent with `DataArray.dims`. To access a mapping from dimension names to lengths, please use `Dataset.sizes`.
Found centroid-0 in _obs. Skipping.
Found centroid-1 in _obs. Skipping.


The image is now segmented into cells, and the cell masks are stored in the `sdata` object.
To refine the segmentation masks, we filter out cells that are too big or too small.
We then expand the masks by two pixels in each direction to better capture the cytosol and membrane of each cell.
All of these operations are part of the preprocessing (`pp`) module.
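
To make the area filter concrete, here is a minimal NumPy sketch of the idea on a toy label image. It illustrates the technique only and is not how the `pp` module implements it; `filter_labels_by_area` and the toy thresholds are assumptions made for this example.

```python
import numpy as np


def filter_labels_by_area(labels, min_area, max_area):
    """Set labels whose pixel count is outside (min_area, max_area) to 0.

    `labels` is an integer mask where 0 is background and each cell has
    its own positive label.
    """
    areas = np.bincount(labels.ravel())       # areas[i] = pixels with label i
    keep = (areas > min_area) & (areas < max_area)
    keep[0] = False                           # background is never a cell
    return np.where(keep[labels], labels, 0)


# Toy mask: label 1 covers 1 pixel, label 2 covers 4, label 3 covers 9.
toy = np.zeros((6, 6), dtype=int)
toy[0, 0] = 1
toy[1:3, 1:3] = 2
toy[3:6, 3:6] = 3

filtered = filter_labels_by_area(toy, min_area=2, max_area=8)
print(np.unique(filtered))  # [0 2]
```

Expanding the surviving masks is a separate step. One existing option outside this package is `skimage.segmentation.expand_labels`, which grows each label by a given distance without letting neighbouring labels overlap.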

In [13]:
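# TODO: continue here once mask growing is merged; `grow_segmentation_masks`
# is not available yet, and calling it raises the AttributeError shown below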
# sdata = sdata.pp.add_observations('area').pp.filter_by_obs(col='area', func=lambda x: (x>75) & (x<300)).pp.grow_segmentation_masks()
sdata

Found _obs in image container. Concatenating.


AttributeError: 'PreprocessingAccessor' object has no attribute 'grow_segmentation_masks'

Next, we quantify the expression of each marker in each cell.
There are several ways to aggregate the pixel values within a cell's segmentation mask;
for this example, we simply take the mean intensity.
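
As a rough illustration of mean-intensity quantification, the following NumPy sketch averages each channel over the pixels belonging to each cell label. The function name `mean_intensity_per_cell` is hypothetical, and the package's own routine may differ in its details.

```python
import numpy as np


def mean_intensity_per_cell(image, labels):
    """Return (cell_ids, means), with means of shape (n_cells, n_channels).

    `image` is channel-first (channels, height, width); `labels` is an
    integer mask (height, width) with 0 as background. Row i of `means`
    belongs to the cell with label cell_ids[i].
    """
    flat = labels.ravel()
    counts = np.bincount(flat)                    # pixels per label
    cell_ids = np.flatnonzero(counts)             # labels that actually occur
    cell_ids = cell_ids[cell_ids != 0]            # drop the background label
    sums = np.stack(
        [np.bincount(flat, weights=channel.ravel()) for channel in image],
        axis=1,
    )                                             # (n_labels, n_channels)
    return cell_ids, sums[cell_ids] / counts[cell_ids, None]


# Toy data: two channels, two cells, last column is background.
labels = np.array([[1, 1, 0],
                   [2, 2, 0]])
image = np.array([
    [[2.0, 4.0, 9.0], [1.0, 1.0, 9.0]],   # channel 0
    [[0.0, 0.0, 9.0], [6.0, 8.0, 9.0]],   # channel 1
])
cell_ids, expression = mean_intensity_per_cell(image, labels)
print(expression)
```

The result is an expression matrix with one row per cell and one column per channel; here cell 1 has mean intensities 3 and 0, and cell 2 has 1 and 7.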


sdata.pp.