# Get intensity data frames from images and masks

This script loads raw (stitched) images and their respective segmentation masks into the notebook and checks their dimensions.
If the dimensions match, the images are measured and the results are saved to data frames. For each cell, the minimum, mean, and maximum intensity are saved, as well as the cell area (in pixels) and the cell centroid.

If the dimensions of an image and the mask are not identical, the cells cannot be measured.
For the data in this project, this happened a few times. In every case, the image was only missing padding at the bottom, so rows of black pixels had to be appended below the last image row.
This script can add that padding if necessary.
If the dimensions differ for any other reason, the padding code in this script is not appropriate for fixing the problem.
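
The bottom padding described above can be sketched on a toy array as follows. This is a minimal illustration, not the code used in this project: the helper name `pad_bottom_to_match` and the toy shapes are assumptions made for the example, and it assumes 2D single-channel images.

```python
import numpy as np

def pad_bottom_to_match(image, target_shape):
    """Hypothetical helper: append rows of zeros to the bottom of a 2D image
    until it has the same number of rows as target_shape."""
    missing_rows = target_shape[0] - image.shape[0]
    if missing_rows < 0 or image.shape[1] != target_shape[1]:
        # Only missing bottom rows can be fixed this way
        raise ValueError("Shapes differ by more than missing bottom rows.")
    # Pad 0 rows on top, missing_rows at the bottom, and no columns
    return np.pad(image, ((0, missing_rows), (0, 0)), mode="constant", constant_values=0)

# Toy image with 2 rows, to be matched to a toy mask shape with 4 rows
image = np.arange(6, dtype=np.uint16).reshape(2, 3)
padded = pad_bottom_to_match(image, (4, 3))
print(padded.shape, padded.dtype)
```

`np.pad` keeps the dtype of the input image, so the padded image stays comparable to the unpadded ones.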

After creating the data frames, they are saved to the data folder.

### Usage
Please run this script cell-by-cell. No modifications should be required. However, please make sure that the directory structure is the same.

### Input
This script is initialised with the variable roi, which sets the name of the sample image to be measured.
The name of the image available as example data is already assigned to the roi variable.

This script requires a labelled segmentation mask as input, which should be saved in "00_Data/NucleiCytoMasks". It also takes stitched preprocessed multiplexed stained images as input from the folder "00_Data/PP_Images".
All images and masks should be the exact same shape.

The input files should be .tif files.

### Output
This script saves one data frame for each biomarker-stained input image as a CSV file to "00_Data/IntensityDataFrames/", in a subfolder named after the roi.

### Load packages and define input

In [10]:
import numpy as np # v. 1.26.4
import skimage # v. 0.22.0
from skimage import io, morphology, measure  # v. 0.22.0
import os
import pandas as pd # v. 2.2.0
from skimage.measure import label, regionprops, regionprops_table # v. 0.22.0

In [11]:
"""
Here, the roi variable is loaded. This decides which sample is chosen and which input data is loaded.
For testing this notebook, please don't change this variable. 
The current name is the name of the example data available in this repository.
"""

# Get the parent of the current working directory. This is not the directory this file is in
# (02_CreateIntensityDfs) but the one above it.
cwd = os.path.dirname(os.getcwd())
roi = "R1B1ROI1" # CHANGE THIS if you want to compute this for other images. (But don't now :) )  

### Load data

In [13]:
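# Load mask as image
maskdir = os.path.join(cwd, "00_Data/NucleiCytoMasks")
# In mask directory, find the mask file that belongs to this roi
masklist = sorted(m for m in os.listdir(maskdir) if m.startswith(roi))
if len(masklist) != 1:
    # Stop early instead of trying to read a directory or an ambiguous file
    raise FileNotFoundError("Expected exactly one mask starting with {} in {}, found {}".format(roi, maskdir, len(masklist)))
maskpath = os.path.join(maskdir, masklist[0])

# Read in mask with scikit-image
mask = io.imread(maskpath)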

In [14]:
# Load images into dictionary
path = os.path.join(cwd, "00_Data/PP_Images", roi) # path for images
filelist = os.listdir(path) # image file names

# Create dictionary
images = {}
for f in filelist:
    if not f.endswith(".tif"): # different file
        continue
        
    if "DAPI" in f: # dapi
        dapi = io.imread(os.path.join(path, f))
        continue
        
    markername = f.split("_")[-2].split("-")[1].lower() # getting marker name from filename
    images[markername] = io.imread(os.path.join(path, f)) # save image into dict
    
print("Images of the markers {} loaded for the image {}".format(images.keys(), roi))


Images of the markers dict_keys(['collageni', 'cd45ro', 'lyve1', 'cd138', 'cytokeratin', 'fibulin2', 'cd45', 'podoplanin', 'cd3', 'cd4', 'cd38', 'actin']) loaded for the image R1B1ROI1
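
The marker name is taken from the file name: the second-to-last field separated by "_" is split at "-", and the second part is lowercased. The sketch below applies the same parsing to a made-up file name. The name `R1B1ROI1_Cycle1-CD45_stitched.tif` and the helper `marker_from_filename` are hypothetical and only illustrate the pattern the parsing expects; the real file names in "00_Data/PP_Images" may look different.

```python
def marker_from_filename(filename):
    """Same parsing as in the loading cell: take the second-to-last "_" field,
    split it at "-", and return the second part in lower case."""
    return filename.split("_")[-2].split("-")[1].lower()

# HYPOTHETICAL file name, made up for illustration only
example = "R1B1ROI1_Cycle1-CD45_stitched.tif"
marker = marker_from_filename(example)
print(marker)
```

A file name with fewer than two "_"-separated fields, or without a "-" in the second-to-last field, raises an IndexError with this parsing.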


### Check image and mask shape compatibility
Correct if necessary by padding the image.

In [15]:
# Check shapes & correct if necessary
# The measurement only works if every image has exactly the same shape as the mask.
correctit = True # Only enable this if you are confident that the images just lack bottom padding and are not shifted relative to the mask
maskshape = mask.shape

for img in images:
    if images[img].shape != maskshape:
        print("The shape for image {} is wrong. Image shape: {}. Mask shape: {}".format(img, images[img].shape, maskshape))
        
        if images[img].shape[0] < maskshape[0] and images[img].shape[1] == maskshape[1]:
            # The image has fewer rows than the mask (image height too small): the mask has bottom padding that the image lacks
            if correctit:
                diff = maskshape[0] - images[img].shape[0]
                
                # Add as many black (zero) rows to the bottom of the image as there is padding missing.
                # np.pad keeps the dtype of the image, whereas stacking float rows would convert it to float.
                paddedimg = np.pad(images[img], ((0, diff), (0, 0)), mode="constant", constant_values=0)
                    
                # Save new padded image in images dictionary
                images[img] = paddedimg
                if images[img].shape == maskshape:
                    print("Shape has been corrected.")
                    
    else:
        print("{} is in the correct shape.".format(img))
                    
                
    

collageni is in the correct shape.
cd45ro is in the correct shape.
lyve1 is in the correct shape.
cd138 is in the correct shape.
cytokeratin is in the correct shape.
fibulin2 is in the correct shape.
cd45 is in the correct shape.
podoplanin is in the correct shape.
cd3 is in the correct shape.
cd4 is in the correct shape.
cd38 is in the correct shape.
actin is in the correct shape.


### Create data frames and save to folder

In [20]:
# Create intensities df and save to folder
outdir = os.path.join(cwd, "00_Data/IntensityDataFrames", roi)
os.makedirs(outdir, exist_ok=True) # Create the output folder if it does not exist yet
for marker in images:
    print("Trying", marker)
    try:
        # Measure images with regionprops_table
        intensities = pd.DataFrame(regionprops_table(mask, images[marker], properties=["intensity_mean", "intensity_min", "intensity_max", "area", "centroid"]))
        dfname = os.path.join(outdir, marker + "df.csv")
        intensities.to_csv(dfname) # Save to folder
    except Exception as e:
        # Report the actual error; a shape mismatch between image and mask is the most likely cause
        print("Didn't work ({}). Compare the shape of the image {} with the shape of the mask again.".format(e, marker))
        continue
    
print("Done. All data frames saved to {}".format(outdir))



Trying collageni
Trying cd45ro
Trying lyve1
Trying cd138
Trying cytokeratin
Trying fibulin2
Trying cd45
Trying podoplanin
Trying cd3
Trying cd4
Trying cd38
Trying actin
Done. All data frames saved to /Users/fschulz/Documents/Uni/LundUniversity/Year2/Thesis/00_Data/IntensityDataFrames/R1B1ROI1
