
# Correlation Strategy Part II - Local

  _This Jupyter Notebook belongs to the article : **CLEMSite, a software for automated phenotypic screens using light microscopy and FIB-SEM.** Please, cite the original paper if you make use of anything present in this notebook._
  
  In the experimental setup presented in the article, MatTek dishes were used to automatically correlate cells from light microscopy (LM) with electron microscopy (EM). In Part I we developed a global transformation used to drive the microscope to a specific cell. In this part we try to increase the accuracy by refining the position within the local region around the target.
  
  
 ## Content:
* [Why Local?](#why-local)
* [Folders and data structure](#folders-and-data-structure)
* [Sampling description](#sampling-description)
* [Registration workflow](#registration-workflow)
* [Results table](#results-table)
* [Conclusions](#conclusions)
* [How to improve accuracy](#accuracy)

## Why Local? <a class="anchor" id="why-local"></a>

<img src="diagram.png">

 After the global transformation and the position of each cell in SEM stage coordinates have been determined, the SEM stage is moved to the position of the first target cell. This position will be the center of the region to be acquired. Because the coincidence point method burns a mark into the surface, we first move 50 µm in the x direction away from the target, so that the mark does not damage the surface above the target cell. After restoring the original conditions (zero beam shift, focus and stigmation), the coincidence point calculation is performed.
 
  If a map was created, the application then recalculates the original position. First, it extracts a list of the landmarks closest to the target position. With this list, it applies the landmark acquisition and detection routine again, for at least 4 landmarks. The default for the experiments was initially set to 6 landmarks and later raised to 8 (the appropriate number depends on the quality of the sample surface). In principle, this could be extended to any N closest landmarks. After the selection and re-detection of landmarks, a new affine transformation is computed and the original position is overwritten.

There are several reasons why this operation was performed:
- If the sample was cracked into two or more parts, or was deformed so that its surface does not fit a plane, a global transformation does not provide enough precision close to the target, which increases the error.
- During operation and over time, samples move inside the microscope because of thermal drift and other factors (e.g. FIB/SEM lenses losing calibration with use).
- If the original coincidence point is far from the coincidence point of the current sample (which happens when a high tilt is present), the z value changes considerably, which affects the final position.
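
The local recalculation described above can be sketched as follows. This is a minimal illustration and not the CLEMSite implementation: the function names, the synthetic 600 µm landmark grid and the example transform are assumptions made for the example. It selects the N landmarks closest to the target and fits a 2D affine transformation to them by least squares.

```python
import numpy as np

def closest_landmarks(target, landmarks, n=6):
    """Indices of the n landmarks closest to target (Euclidean distance)."""
    diff = np.asarray(landmarks, dtype=float) - np.asarray(target, dtype=float)
    return np.argsort(np.linalg.norm(diff, axis=1))[:n]

def fit_affine(src, dst):
    """Least-squares 2D affine fit; needs at least 3 non-collinear point pairs.

    Returns a 2x3 matrix M = [A | t] such that dst ≈ src @ A.T + t.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    X = np.hstack([src, np.ones((len(src), 1))])       # homogeneous coordinates
    params, *_ = np.linalg.lstsq(X, dst, rcond=None)   # shape (3, 2)
    return params.T

def apply_affine(M, pts):
    """Apply a 2x3 affine matrix to one point or to an array of points."""
    pts = np.atleast_2d(np.asarray(pts, dtype=float))
    return pts @ M[:, :2].T + M[:, 2]

# Synthetic example (hypothetical values): LM landmarks on a 600 µm grid,
# SEM landmarks obtained with a known rotation and shift.
lm = np.array([[x * 600.0, y * 600.0] for x in range(4) for y in range(4)])
true_M = np.array([[0.0, -1.0, 100.0],
                   [1.0,  0.0, -50.0]])
sem = apply_affine(true_M, lm)

target_lm = np.array([700.0, 800.0])
idx = closest_landmarks(target_lm, lm, n=6)
M_local = fit_affine(lm[idx], sem[idx])
target_sem = apply_affine(M_local, target_lm)[0]
```

With real data the re-detected landmark positions are noisy, so the fit no longer reproduces each landmark exactly, and using more than the minimum of 3 points averages out part of the detection error.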


## Folders and data structure <a class="anchor" id="folders-and-data-structure"></a>

The data gathered from the experiment contains a lot of information and is structured as shown here:
```bash
        ├───AUTOMATION_COMPLETE
            ├───COPB1_Liquid
                ├───14_November_2018_COPB1
                │   ├───14112018_automation_no_spots_LM
                │   │   ├───renamed
                │   │   │    ├───prescan
                │   │   │    │      ├───field--X01--Y16_0066
                │   │   │    │      ...
                │   │   │    ├───hr 
                │   │   │    ├───center_golgi.json
                │   │   ...
                │   ├───part1_SEM      
                │   │   ├───SEM_project
                │   │   │   ├───2L_field--X01--Y16_0023___0010
                │   │   │   ....
                │   │   ├───SEM_scans
                │   │   ├───generated_coordinates_and_files
        
```  
Each project has a name (e.g. 14_November_2018_COPB1). Inside there is one folder for the LM data, ending in \_LM (all the light microscopy acquisition and processing), and one or several SEM folders ending in \_SEM. There is one SEM folder for each run in the FIB/SEM microscope: for example, part1_SEM is the first run and part2_SEM corresponds to a second run.

Inside the LM folder: 
   - **Original data**: the _4927_ folder holds the original acquired data, and _4927--cp_ holds the images and features processed with CellProfiler. Here we also find all the scripts used during the acquisition and the Jupyter notebook where the analysis and selection are stored (e.g. _AUTOMATION_notebook-14Nov.ipynb_ and _AUTOMATION\_notebook-14Nov.html_).
   - **Renamed data**: the folder called _renamed_ holds all the preprocessing for landmark acquisition and validation of the selection. The most important file is _selected\_cells.csv_, which stores the selected cells and their coordinates. The files center_golgi.json and center_nuclei.json contain the image coordinates of the centroids of the Golgi and of the cell nuclei, respectively. In addition, there are 2 subfolders:
        - _prescan_ - contains the prescan images used by CellProfiler, renamed to a simpler scheme
        - _hr_ - contains all the files acquired in the second round on the confocal: for each cell, the low-resolution images of the grid, DAPI and Golgi, and the high-resolution scan with the stack.
        
      These two subfolders repeat the information in 4927, but organize the images in a way that CLEMSite can understand. CLEMSite_LM includes a plugin that performs the renaming; to use it, follow the instructions in the CLEMSite manual.
       The renamed data is structured according to a code that links the cell acquired in SEM to the cell acquired in LM. For example, _X00--Y01_ corresponds to one specific position in the LM, and the acquisition in the SEM is named accordingly, in this case something like _3M_X00--Y01_01_.
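
As a small illustration of this naming code, the sketch below extracts the X/Y field indices from LM and SEM names so that both can be matched. The regular expression and the helper name `field_code` are assumptions based only on the example names shown in this notebook, not part of CLEMSite.

```python
import re

# Matches the grid position code, e.g. "X01--Y16" (assumed naming pattern)
FIELD_CODE = re.compile(r"X(\d+)--Y(\d+)")

def field_code(name):
    """Return the (X, Y) field indices embedded in a name, or None if absent."""
    m = FIELD_CODE.search(name)
    return (int(m.group(1)), int(m.group(2))) if m else None

lm_name = "field--X01--Y16_0066"              # LM prescan folder
sem_name = "2L_field--X01--Y16_0023___0010"   # SEM project folder
same_field = field_code(lm_name) == field_code(sem_name)
```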
       
Inside the SEM folder:
   - **generated_coordinates_and_files** - Contains files with all the coordinates and project mappings.
   - **SEM_scans** - Contains all the images and files generated during the scan of the grid surface
   - **SEM_project**- Each folder contains the acquisition (the 3D volume) where:
       - cross_ref folders are used to compute the crossing information locally
       - intermediate files generated before the acquisition, e.g. autofocus on the cross section and trench detection
       - The folder ending with **\_\_acq** contains the acquisition data and all files generated during it (logs and tracking from RunChecker)
       

## Sampling description <a class="anchor" id="sampling-description"></a>
   
   From a total of 4 studies (**13JulySPOTS**, **19NovemberSPOTS**, **14NovemberCOPB1**, **21NovemberCOPB1**), we randomly select 10 cells from each. The strategy is to compare a manual registration with the computed one. Given the scales involved, the manual registration itself is likely to carry an error of around 2-3 micrometers (see "Enhanced registration by using local area features" below), but we currently have no way to estimate the human error on experimental samples. We therefore take the human registration as ground truth, that is, we assume that enough local features were identified visually to produce a precise overlay of the light microscopy and SEM images.
   
   Since the surface of each SEM sample looks very different and the preservation of the grid for identification varies from one experiment to another, we report the results per experiment instead of providing a single global RMSE.

## Registration workflow <a class="anchor" id="registration-workflow"></a>

_Note: do not confuse this registration workflow with the registration between LM and EM landmarks performed while the microscope is running. The latter estimates the target position from the N closest landmarks and yields the position where the microscope starts its acquisition. The SEM image is taken with this position at its center._

We need to evaluate the error of that position. To do so, we register the LM image to the SEM image and then compute the distance from the organelle centroid to the center of the SEM image. This distance, in micrometers, is the targeting error from which the RMSE is computed. We do not use the transformation matrix used by the microscope: the evaluation is independent and based only on local features of the image.
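
The error measure can be sketched as below. This is a hedged example: the helper names, the example centroid and the rounded pixel size are hypothetical, and taking the image center as `(size - 1) / 2` is a convention chosen here, not something fixed by the notebook.

```python
import math

def targeting_error_um(centroid_px, image_shape, pixel_size_um):
    """Distance in µm from a registered centroid (x, y in SEM pixels)
    to the center of the SEM image."""
    height, width = image_shape
    cx, cy = (width - 1) / 2.0, (height - 1) / 2.0   # assumed center convention
    dx = (centroid_px[0] - cx) * pixel_size_um
    dy = (centroid_px[1] - cy) * pixel_size_um
    return math.hypot(dx, dy)

def rmse(errors):
    """Root mean square of a list of per-cell distances."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))

# Hypothetical centroid, 10 pixels right of the center of a 1024x1024 image
err = targeting_error_um((521.5, 511.5), (1024, 1024), pixel_size_um=0.3)
summary = rmse([3.0, 4.0])
```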


In [1]:
import glob 
import os,re
import sys

# Auxiliary function: list the entries of a folder whose path matches a regular expression
def filterPick(folder, myString):
    myList = glob.glob(os.path.join(folder, '*'))  # portable path separator
    pattern = re.compile(myString)
    return [x for x in myList if pattern.search(x)]

In [2]:
# Step 1: define folders
project_folder = ".\\data\\19_November_SPOTS"
project_sample = "3K_field--X02--Y08_0025"
sample_path = os.path.join(project_folder,project_sample)
sample_path

'.\\data\\19_November_SPOTS\\3K_field--X02--Y08_0025'

In [3]:
# Find relevant folders
LM_images_path = os.path.join(sample_path,"field--X02--Y08_0025")
SEM_image_path = filterPick(sample_path,"sFOV.*tif")[0]  # Find the filename with sFOV

#### 1. Generate LM position masks for the whole experiment:
First, we are going to create masks that mark the position of the cell in light microscopy:

In [4]:
# Generate Masks 
import json

with open(os.path.join(project_folder, "masks", "center_golgi.json")) as json_file:
    data = json.load(json_file)
    
masks_folder = os.path.join(project_folder,"masks")

In [5]:
import cv2
import numpy as np

prescan_image_size = (680,680,3)
# We are going to generate masks with the centroids for 680x680 images
center_x = data['Location_Center_X'] # or   data['Mean_Golgi_AreaShape_Center_X']
center_y = data['Location_Center_Y'] # or   data['Mean_Golgi_AreaShape_Center_Y']

for key, cx in center_x.items():
    cy = center_y[key]
    im = np.zeros(prescan_image_size,dtype=np.uint8) # We know the prescan resolution was 680
    coords = (int(np.round(cx)),int(np.round(cy)))
    im = cv2.circle(im, coords , 2, (0, 255, 255), 2)  # small yellow circle (BGR) at the centroid
    im[coords[1],coords[0],1] = 0  # clear the green channel at the exact centroid pixel
    cv2.imwrite(os.path.join(masks_folder, key+'_.tif'),im)

#### 2. Extract SEM information

The file used for registration is the one with the tag _sFOV_. We can use the _bFOV_ image if there are not enough visual landmarks to do the local registration.

The information about pixel size can be read from the _.tif_ metadata using the following function: 

In [6]:
import xml.etree.ElementTree as ET
import numpy as np
from tifftest import TiffFile,TiffWriter


def getInfoHeaderAtlas(tifname):
    xml_info = ""
    data = {}
    with TiffFile(tifname) as tif:
        for page in tif:
            for tag in page.tags.values():
                if (tag.name == '51023' or tag.name =='fibics_xml'):
                    xml_info = tag.value
    if (not xml_info):
        raise ValueError("NO INFO HEADER for ATLAS picture")
    root = ET.fromstring(xml_info)
    first_tag = []
    second_tag = []
    third_tag = []
    for child in root:
        m = re.match('Scan', child.tag)
        m2 = re.match('.*Stage.*', child.tag)
        m3 = re.match('.*Image$', child.tag)
        if m:
            first_tag = m.group(0)
        elif m2:
            second_tag = m2.group(0)
        elif m3:
            third_tag = m3.group(0)
    #### Scan
    if (first_tag):
        child = root.findall(first_tag)
        for el in child[0]:
            if (el.tag == 'Ux'):
                data['PixelSize'] = float(el.text)
            elif (el.tag == 'Dwell'):
                data['DwellTime'] = float(el.text)
            elif (el.tag == 'FOV_X'):
                data['FOV_X'] = float(el.text)
            elif (el.tag == 'FOV_Y'):
                data['FOV_Y'] = float(el.text)
            elif (el.tag == 'Focus'):
                data['WD'] = float(el.text)
    ######## Stage
    if (second_tag):
        child = root.findall(second_tag)
        for el in child[0]:
            if (el.tag == 'X'):
                data['PositionX'] = float(el.text)
            elif (el.tag == 'Y'):
                data['PositionY'] = float(el.text)
            elif (el.tag == 'Z'):
                data['PositionZ'] = float(el.text)
    ######## Image
    if (third_tag):
        child = root.findall(third_tag)
        for el in child[0]:
            if (el.tag == 'Detector'):
                data['Detector'] = el.text
            elif (el.tag == 'Aperture'):
                data['Aperture'] = el.text
            elif (el.tag == 'Width'):
                data['Width'] = int(el.text)
            elif (el.tag == 'Height'):
                data['Height'] = int(el.text)
            elif (el.tag == 'Brightness'):
                data['Brightness'] = float(el.text)
            elif (el.tag == 'Contrast'):
                data['Contrast'] = float(el.text)
            elif (el.tag == 'Beam'):
                data['Beam'] = el.text
    return data

In [7]:
dataAtlas = getInfoHeaderAtlas(SEM_image_path) # Get the header information
dataAtlas

{'DwellTime': 10000.0,
 'FOV_X': 307.154449462891,
 'FOV_Y': 307.154449462891,
 'PixelSize': 0.299955517053604,
 'WD': 0.00501363817602396,
 'PositionX': -40007.2460528197,
 'PositionY': -68152.6367931032,
 'PositionZ': 41901.7124354794,
 'Width': 1024,
 'Height': 1024,
 'Beam': 'SEM',
 'Aperture': '1,5 kV | 700 pA [An]',
 'Detector': 'SESI',
 'Contrast': 19.9325046539307,
 'Brightness': 51.5462379455566}

#### 3. Extract LM information
Do the same for LM: extract the pixel size information.

In [8]:
def getInfoTiffOME(tifname):
    """ Get info from tiff header """
    pixel_size = 0
    res_unit = 0
    xml_info = []
    with TiffFile(tifname) as tif:
        for page in tif:
            for tag in page.tags.values():
                if (tag.name == 'image_description'):
                    xml_info = tag.value
                if (tag.name == 'resolution_unit'):
                    res_unit = tag.value
                if (tag.name == 'x_resolution'):  # we assume the same x and y resolution
                    res_size = tag.value
                if (tag.name == 'image_length'):
                    im_length = tag.value
    if (int(res_unit) == 3):  # resolution given in pixels per centimeter
        tpx = float(res_size[0]) / float(res_size[1])  # pixels per 1 cm
        pixelsize = 1.0 / tpx  # centimeters per pixel
        pixel_size = pixelsize * 10  # change to millimeters

    return (xml_info, pixel_size)

def getInfoHeader(fname):
    xml_info, pixel_size = getInfoTiffOME(fname)
    if(not xml_info):
        return
    try:
        root = ET.fromstring(xml_info)
    except ET.ParseError as err:
        return

    first_tag = []
    second_tag = []
    for child in root:
        m = re.match('.*Image.*', child.tag)
        if m:
            first_tag = m.group(0)
    if (first_tag):
        data = {}
        for child in root.findall(first_tag):
            for gch in child:
                m = re.match('.*Pixels.*', gch.tag)
                if m:
                    second_tag = m.group(0)
    if (second_tag):
        child = root.findall(first_tag + '//' + second_tag)
        for gch in child[0]:
            planetag = re.match('.*Plane.*', gch.tag)
        child2 = root.findall(first_tag + '//' + second_tag + '//' + planetag.group(0))
        for gch in child2[0]:
            stagepositiontag = re.match('.*StagePosition.*', gch.tag)
        child2 = root.findall(
            first_tag + '//' + second_tag + '//' + planetag.group(0) + '//' + stagepositiontag.group(0))
        mydict = child2[0].attrib;
        data['PositionX'] = float(mydict['PositionX']) * 1e6;
        data['PositionY'] = float(mydict['PositionY']) * 1e6;
        data['PositionZ'] = float(mydict['PositionZ']) * 1e6;
        mydict = child[0].attrib;

        data['PixelSize'] = pixel_size*1e3; # in micrometers!!  # float(mydict['PhysicalSizeX'])*1e-3;
        #                data['PhysicalSizeY'] = mydict['PhysicalSizeY']
        #                data['PhysicalSizeZ'] = mydict['PhysicalSizeZ']
        data['PyxelType'] = mydict['PixelType']

    return data

In [9]:
prescans = filterPick(LM_images_path,"prescan.*tif")
dataLM = getInfoHeader(prescans[0])
dataLM

{'PositionX': 57633.19803085,
 'PositionY': 40164.59857634,
 'PositionZ': -2.6609923,
 'PixelSize': 0.37990264549510777,
 'PyxelType': 'uint8'}

#### 4. Auto adjust Brightness and Contrast of images
Now we prepare the images. First, adjust the brightness and contrast of the SEM image so that salient features (like cells) are easier to see:

In [10]:
# Data preparation SEM
# Auto brightness and contrast
%matplotlib inline
from matplotlib import pyplot as plt
from skimage import exposure

im =  cv2.imread(SEM_image_path,0)
im_adapted = exposure.equalize_adapthist(im, clip_limit=0.005)
im_adapted = im_adapted**2+0.012
cv2.imwrite(SEM_image_path[:-20]+"_adapted.tif",im_adapted)

True

Now we can calculate the scale ratio between SEM and LM:

In [16]:
from skimage import exposure
ratio = dataLM['PixelSize']/dataAtlas['PixelSize']
# 680x680 is the shape of the prescan
new_shape = int(np.round(680*ratio)), int(np.round(680*ratio))
print(new_shape)

(861, 861)


We use the open source software GIMP to do the manual registration. We decided on a simple **rigid** transform that allows only one uniform scaling, one horizontal flip, shifts and rotations. By restricting the allowed transformations instead of giving complete freedom, we expect to introduce less bias, because the image cannot be warped to fit the data.

Why GIMP? There are alternative software packages for registration, like ecCLEM (http://icy.bioimageanalysis.org/plugin/ec-clem/) or several registration plugins in FIJI. These tools usually compute an affine transform from landmarks clicked on the two images. However, the following procedure in GIMP proved fast (one image can be aligned in 10-15 minutes).

#### Instructions in GIMP to do a manual registration

- Get the SEM image sFOV and drag it into GIMP 2.10
- Add the mask image (if there are several masks corresponding to the code Xnn\_\_Ynn, look for the mask that matches reg\_t.jpg)
- Add the images corresponding to the prefix prescan (golgi, nuclei, transmitted light)
- Create a layer group and add the mask and prescan images inside
- Sort it: mask first, nuclei second, golgi third, transmitted light fourth and adjust transparencies with the bar on top.
- Chain all images inside the layer group
- Scale the layer group from 680x680 to 861x861 (Right Mouse click on image ->Scale Layer) 
- Adjust brightness and contrast of all the images:
    - for the prescan images use Colors->Brightness & Contrast->Edit these settings as levels-> Auto Input levels
- Flip Horizontally (Layer->Transform->Flip Horizontally)
- Shift and rotate until:
    - Imperfections and shapes in the transmitted light match with the SEM image 
    - Sometimes the fluorescence staining (nuclei and golgi) matches the contours of visible cells
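
The manual steps above amount to a scale, a horizontal flip, a rotation and a shift. The sketch below applies that chain to a point of the 680x680 prescan to obtain SEM pixel coordinates. It is only an illustration: `lm_to_sem_px` is a hypothetical helper, and the rotation angle, its sign convention and the shift are assumed values that would be read off from the alignment done in GIMP.

```python
import math

def lm_to_sem_px(pt, lm_size=680, scaled_size=861, angle_deg=0.0, shift=(0.0, 0.0)):
    """Map an LM prescan pixel (x, y) to SEM pixel coordinates.

    Order: uniform scale, horizontal flip, rotation about the layer center,
    then shift. angle_deg and shift are hypothetical alignment values.
    """
    s = scaled_size / lm_size
    x, y = pt[0] * s, pt[1] * s            # scale the layer, e.g. 680 -> 861
    x = scaled_size - x                    # horizontal flip
    c = scaled_size / 2.0                  # rotation center of the layer
    a = math.radians(angle_deg)
    xr = c + (x - c) * math.cos(a) - (y - c) * math.sin(a)
    yr = c + (x - c) * math.sin(a) + (y - c) * math.cos(a)
    return xr + shift[0], yr + shift[1]

# The layer center is unaffected by flip and rotation, so only the shift moves it
center = lm_to_sem_px((340, 340), angle_deg=30.0, shift=(81.5, 81.5))
# A point on the left edge ends up on the right edge after the flip
left_edge = lm_to_sem_px((0, 340))
```

Applying this mapping to the mask centroid gives its position in the SEM image, from which the distance to the SEM image center can be measured.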




 <img src="example.png"  width="512">


## Results table for manual registration <a class="anchor" id="results-table"></a>

<table>
  <tr>
    <th>Sample</th>
    <th>Error for global transformation (µm)</th>
    <th>Error for local transformation (µm, estimated)</th>
    <th>Error for local transformation (µm, experimental, n=10)</th>
  </tr>
  <tr>
    <td style="text-align:left">13 Jul</td>
    <td style="width:200px">6.44 +/-4.3, n=53</td>
    <td style="width:200px"> 4.53 +/- 3.4</td>
    <td style="width:200px">4 +/- 1.85</td>
    
  </tr>
  <tr>
    <td style="text-align:left">19 Nov</td>
    <td style="width:200px">9.622 +/-5.1, n=46</td>
    <td style="width:200px">4.26 +/-3.09</td>
    <td style="width:200px">4.94 +/-3.5</td>
    
  </tr>
  <tr>
    <td style="text-align:left">14 Nov</td>
    <td style="width:200px">18.76 +/-11.5, n=33</td>
    <td style="width:200px">12.73 +/-10</td> 
    <td style="width:200px">12 +/- 4.32 </td>
    
  </tr>
  <tr>
    <td style="text-align:left">21 Nov</td>
    <td style="width:200px">20.56 +/-13.54, n=47</td>
    <td style="width:200px">14.98 +/-13.45</td>
    <td style="width:200px">9.86 +/- 6.47</td>
  </tr>
  <tr>
    <td style="text-align:left">Average and SD</td>
    <td style="width:200px">13.21 +/-6.16, n=179</td>
    <td style="width:200px">8.71 +/- 5.17 , n=179</td>
    <td style="width:200px">7.7+/-4.36     , n=40</td>
  </tr>
    </table>
    
 


## Conclusions <a class="anchor" id="conclusions"></a>

 We registered a total of 40 cells from 4 different datasets, sampling 10 cells per dataset. We used rigid registration between features in the SEM image of the surface (the image taken just before acquisition, with the target in the center) and the LM fluorescence images. When available, we also used transmitted light. Rigid registration does not include shearing, which is needed in many cases because the SEM surface was tilted. However, to save time, we simply overlaid the images with shifts and rotations, fitting the closest local features.
 
  The datasets _14Nov_ and _21Nov_ showed a large error, mostly caused by errors in landmark detection. Examining the SEM surface of both samples showed that identifying the landmarks accurately was not simple. In the _14Nov_ sample, the marks were too faint and covered with paint. In the _21Nov_ sample, the surface was cracked into several pieces:
 <img src="errors.png">

 It is important to note that we were able to predict the targeting error accurately before the acquisition. This helps us decide whether the error is so large that the sample must be discarded, or whether we continue the acquisitions with an enlarged milling area and FOV.


## How to improve accuracy <a class="anchor" id="accuracy"></a>

#### Reduce distance between landmarks
 MatTek dishes impose a constraint: a square in the grid has a side of 600 µm, which means that acquiring just 20 landmarks requires covering an area about 2.4 mm across (√20 ≈ 4.5, so roughly 4 landmarks per side). Scanning for landmarks over such a large area works against the benefits explained in the previous points (limiting thermal and mechanical drift, avoiding local areas with deformations). With other grids (like sapphire grids, ibidi or even self-made glass coverslips) where the distance between landmarks is smaller, we could obtain more landmarks in a closer range, thus increasing the accuracy of the local transform. That would, of course, require adapting the detection software.
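
The arithmetic behind this estimate can be written down as a short sketch. The helper name and the 200 µm spacing are hypothetical and only serve to compare against the 600 µm MatTek grid.

```python
import math

def landmark_block(n_landmarks, spacing_um):
    """Approximate a square block of grid crossings holding n_landmarks.

    Returns (landmarks per side, side length of the block in µm).
    """
    per_side = round(math.sqrt(n_landmarks))
    return per_side, per_side * spacing_um

mattek = landmark_block(20, 600)   # MatTek grid, 600 µm squares
denser = landmark_block(20, 200)   # hypothetical grid with 200 µm spacing
```

For the same number of landmarks, the area to scan shrinks linearly with the grid spacing along each side.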

<img src="other_grids.png">


#### Enhanced registration by using local area features
 Using predefined landmarks has the downside that the landmarks can be very far from the target. Experienced CLEM users tend to use local features present in both the SEM and LM images to do an overlay (as we did in the manual registration procedure). Can we automate this procedure?
 One suggested approach is to find the contours of cells in SEM at high kV (5 or 10 kV) and match them to the contours detected with transmitted light or fluorescence. Overlaying the contours can then provide high targeting accuracy.

<img src="advanced_registration.png">

We tested this with 5 images using ITK for registration (optimizing a mutual information metric, with the transform restricted to affine). 3 of the 5 images were registered automatically. For the other 2, the optimization did not converge and produced a wrong alignment of the two images.
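
To illustrate the similarity measure involved, here is a minimal NumPy sketch of a histogram-based mutual information estimate. It is not the ITK pipeline used in the test: the function name, the bin count and the synthetic images are assumptions. It shows why the metric suits multimodal pairs such as LM and SEM: an image with inverted contrast still scores high when aligned, while a misaligned copy scores low.

```python
import numpy as np

def mutual_information(a, b, bins=32):
    """Plug-in estimate of the mutual information (in nats) of two images."""
    hist, _, _ = np.histogram2d(np.ravel(a), np.ravel(b), bins=bins)
    pxy = hist / hist.sum()                  # joint intensity distribution
    px = pxy.sum(axis=1, keepdims=True)      # marginal of a
    py = pxy.sum(axis=0, keepdims=True)      # marginal of b
    nz = pxy > 0                             # skip empty bins to avoid log(0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

rng = np.random.default_rng(0)
img = rng.random((64, 64))                   # synthetic stand-in for an image

mi_aligned = mutual_information(img, img)
mi_inverted = mutual_information(img, 1.0 - img)              # inverted contrast, aligned
mi_shifted = mutual_information(img, np.roll(img, 5, axis=1)) # misaligned copy
```

A registration routine searches for the transform parameters that maximize this value (or, equivalently, minimize its negative), which is where the convergence problems mentioned above can arise.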

**Advantages**:
   - In theory it is possible to achieve the maximum accuracy: a precision of 2-3 micrometers has been shown, or even better when other features, such as beads, are used as landmarks.
 
**Disadvantages**:
   - It depends on local features, and there is no guarantee that such features are present or successfully detected in the region where a cell is located.
   - The appearance of cells on the sample surface in SEM depends strongly on sample preparation.
   - It might not work in very confluent configurations.
   - Current state-of-the-art registration requires tuning parameters for each specific registration, so it is not easy to automate for light microscopy images acquired under different conditions. A registration procedure has to be set up and adapted to each experiment, e.g. depending on which channels are available and whether the cell contours are well defined.