# Visualizing Overlays of Clusters on Widefield Images

During an analysis, it is often useful to overlay clustered localizations on top of widefield images to verify that the clustering was performed correctly. One may also wish to navigate through the clusters and annotate them manually, one by one.

In this notebook, we will demonstrate how to do this with the OverlayClusters and AlignToWidefield multiprocessors.

In [1]:
# Import the essential bstore libraries
%pylab
from bstore import processors as proc
from bstore import multiprocessors as mp
import pandas as pd

# This is part of Python 3.4 and greater and not part of B-Store
from pathlib import Path

Using matplotlib backend: Qt4Agg
Populating the interactive namespace from numpy and matplotlib


## Before starting: Get the test data
You can get the test data for this tutorial from the B-Store test repository at https://github.com/kmdouglass/bstore_test_files. Clone or download the files and change the path below so that it points to the folder *multiprocessor_test_files/align_to_widefield* within this repository.

In [2]:
dataDirectory = Path('../../bstore_test_files/multiprocessor_test_files/align_to_widefield/') # ../ means go up one directory level
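A wrong `dataDirectory` only shows up later as a confusing error when the files are opened. The sketch below checks the folder up front with `pathlib`. The helper `missing_entries` is hypothetical and not part of B-Store.

```python
from pathlib import Path


def missing_entries(directory, names):
    """Return the entries in `names` that do not exist inside `directory`.

    Raises FileNotFoundError if `directory` itself does not exist.
    """
    directory = Path(directory)
    if not directory.is_dir():
        raise FileNotFoundError('Data directory not found: {}'.format(directory.resolve()))
    return [name for name in names if not (directory / name).exists()]


# Hypothetical usage with the file names used in this tutorial:
# missing_entries(dataDirectory, ['locResults_A647_Pos0.csv',
#                                 'locResults_A647_Pos0_processed.csv'])
# An empty list means every file was found.
```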

# Step 1: Load the data

This example demonstrates how to use the [OverlayClusters](http://b-store.readthedocs.io/en/latest/bstore.html#bstore.multiprocessors.OverlayClusters) multiprocessor in B-Store's analysis tools. This multiprocessor takes as input:

1. a Pandas DataFrame containing clustered localization information;
2. (optional) a Pandas DataFrame containing the statistics belonging to each cluster;
3. (optional) a widefield image to overlay the clusters onto.

If no `stats` DataFrame is supplied, a basic one will be calculated. If no widefield image is supplied, then the clusters will be displayed on a blank 2D space.
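For intuition, a basic per-cluster stats table can be built with a Pandas `groupby`. This is a simplified sketch, not B-Store's own calculation. Its column names mirror three of the columns in the example stats file, and the toy data is made up.

```python
import pandas as pd


def basic_cluster_stats(locs, clusterCol='cluster_id'):
    """Center and size of each cluster, indexed by cluster ID.

    A simplified stand-in written for illustration; B-Store's own default
    stats may contain different or additional columns.
    """
    return locs.groupby(clusterCol).agg(
        x_center=('x', 'mean'),
        y_center=('y', 'mean'),
        number_of_localizations=('x', 'count'),
    )


# Hypothetical toy localizations; -1 labels noise, as in this tutorial's data
toy = pd.DataFrame({'x': [0.0, 2.0, 100.0, 50.0],
                    'y': [0.0, 4.0, 100.0, 10.0],
                    'cluster_id': [0, 0, 1, -1]})
print(basic_cluster_stats(toy))
```

The result is indexed by `cluster_id`, the same layout the tutorial obtains later by loading the stats file with `index_col = 'cluster_id'`.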

The DataFrame containing the localizations **MUST** have a column that specifies cluster IDs as integers. If the localizations have not been clustered, you could use the [Cluster processor](http://b-store.readthedocs.io/en/latest/bstore.html#bstore.processors.Cluster) or any other clustering algorithm to do so.
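You can check the integer cluster ID requirement before calling the multiprocessor. The sketch below uses `pandas.api.types.is_integer_dtype`. The helper name `has_integer_cluster_ids` is hypothetical, and the default column name `cluster_id` matches the example data.

```python
import pandas as pd
from pandas.api.types import is_integer_dtype


def has_integer_cluster_ids(df, clusterCol='cluster_id'):
    """True if `df` has a column named `clusterCol` with an integer dtype."""
    return clusterCol in df.columns and is_integer_dtype(df[clusterCol])


good = pd.DataFrame({'x': [1.0, 2.0], 'cluster_id': [0, -1]})
floats = pd.DataFrame({'x': [1.0, 2.0], 'cluster_id': [0.0, -1.0]})
print(has_integer_cluster_ids(good), has_integer_cluster_ids(floats))  # True False

# Float IDs that hold whole numbers can be cast back to integers
floats['cluster_id'] = floats['cluster_id'].astype(int)
print(has_integer_cluster_ids(floats))  # True
```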

The example data contains all three of the above datasets, so we'll load all three now.

In [3]:
locsFile  = dataDirectory / Path('locResults_A647_Pos0.csv')
statsFile = dataDirectory / Path('locResults_A647_Pos0_processed.csv')
wfFile    = dataDirectory / Path('HeLaS_Control_53BP1_IF_FISH_A647_WF1/HeLaS_Control_53BP1_IF_FISH_A647_WF1_MMStack_Pos0.ome.tif')

with open(str(locsFile), 'r') as f:
    locs = pd.read_csv(f)
    
with open(str(statsFile), 'r') as f:
    # Note that we set the cluster_id to the index column!
    stats = pd.read_csv(f, index_col = 'cluster_id')
    
with open(str(wfFile), 'rb') as f:
    img = plt.imread(f)

In [4]:
locs.head()

|   | x | y | z | frame | photons | loglikelihood | background | sigma | length | cluster_id |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2731.223376 | 88151.099508 | 0.0 | 500 | 173518.9 | 97.537365 | 6690.266 | 126.724696 | 115 | 0 |
| 1 | 2793.6 | 65219.0 | 0.0 | 500 | 1330.0 | 92.823 | 65.583 | 138.91 | 1 | -1 |
| 2 | 9385.288184 | 97538.576682 | 0.0 | 500 | 4763.67 | 61.1316 | 290.942 | 133.074 | 5 | 1 |
| 3 | 10362.332473 | 72860.705094 | 0.0 | 500 | 5023.6 | 80.984667 | 319.252 | 149.726667 | 3 | 2 |
| 4 | 12256.993051 | 70377.657241 | 0.0 | 500 | 18711.7 | 90.9649 | 906.081 | 143.543 | 10 | 3 |


In [5]:
stats.head()

| cluster_id | x_center | y_center | number_of_localizations | eccentricity | convex_hull | radius_of_gyration |
|---|---|---|---|---|---|---|
| -1 | 53280.718198 | 50983.564479 | 12187 | 1.301732 | 9.430623 | 39925.833664 |
| 0 | 2729.491474 | 88159.770103 | 935 | 1.452702 | 4601.6319 | 13.136815 |
| 1 | 9371.397574 | 97529.475573 | 61 | 2.12371 | 3862.3021 | 22.195811 |
| 2 | 10326.880968 | 72878.546954 | 230 | 4.845439 | 18257.826 | 59.531482 |
| 3 | 12268.029688 | 70368.264307 | 51 | 2.679195 | 5170.2054 | 26.197739 |


In [6]:
plt.imshow(img, cmap = 'gray_r')
plt.show()

If all goes well, you should see the first five rows of the `locs` and `stats` DataFrames. After running the cell above, the widefield image of telomeres in HeLa cell nuclei should appear in a separate window.

# Step 2: Set up the stats DataFrame for annotation

The `OverlayClusters` multiprocessor allows you to annotate clusters with a label, such as `True`, `False`, or an integer between 0 and 9. This lets you, for example, filter clusters manually before further analyses. To do this, you need to add a column to the `stats` DataFrame that will hold the annotation for each cluster.

This step is optional, so you may skip it if you like.

In [7]:
# Use AddColumn processor from B-Store to add the column
adder = proc.AddColumn('annotation', defaultValue = True)
stats = adder(stats)
stats.head()

| cluster_id | x_center | y_center | number_of_localizations | eccentricity | convex_hull | radius_of_gyration | annotation |
|---|---|---|---|---|---|---|---|
| -1 | 53280.718198 | 50983.564479 | 12187 | 1.301732 | 9.430623 | 39925.833664 | True |
| 0 | 2729.491474 | 88159.770103 | 935 | 1.452702 | 4601.6319 | 13.136815 | True |
| 1 | 9371.397574 | 97529.475573 | 61 | 2.12371 | 3862.3021 | 22.195811 | True |
| 2 | 10326.880968 | 72878.546954 | 230 | 4.845439 | 18257.826 | 59.531482 | True |
| 3 | 12268.029688 | 70368.264307 | 51 | 2.679195 | 5170.2054 | 26.197739 | True |


You can see that the stats DataFrame now has an annotation column with each value set to `True`.
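If B-Store is not at hand, plain Pandas can add a constant column. This sketch assumes, without confirmation from the B-Store docs, that the result matches what `AddColumn('annotation', defaultValue = True)` produced above. It works on a copy of a small made-up table so that the input is left unchanged.

```python
import pandas as pd

# Hypothetical miniature stats table indexed by cluster_id
stats_demo = pd.DataFrame({'number_of_localizations': [12187, 935, 61]},
                          index=pd.Index([-1, 0, 1], name='cluster_id'))

# Add a boolean annotation column to a copy, defaulting every cluster to True
annotated = stats_demo.copy()
annotated['annotation'] = True
print(annotated)
```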

Let's do some initial filtering on this DataFrame. Many of the clusters are noise and do not correspond to the telomeric signal; these typically contain fewer than 50 localizations. We can mark them now using Pandas boolean indexing with `.loc`, setting their annotation to `False`.

In [8]:
# Set the annotation of clusters with fewer than 50 localizations to False
stats.loc[stats['number_of_localizations'] < 50, 'annotation'] = False
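The size threshold does not touch the noise label `-1`. In the stats table above it holds 12187 localizations, so its annotation stays `True`. The sketch below uses made-up data in the same layout as `stats` and `locs`. It counts the rejected clusters and selects the localizations of the accepted ones, dropping `-1` explicitly. Whether you want to exclude noise this way is a judgment call, not something this tutorial requires.

```python
import pandas as pd

# Hypothetical toy data mirroring the structure of `stats` and `locs`
stats_demo = pd.DataFrame({'number_of_localizations': [12187, 935, 30, 61]},
                          index=pd.Index([-1, 0, 1, 2], name='cluster_id'))
stats_demo['annotation'] = True
stats_demo.loc[stats_demo['number_of_localizations'] < 50, 'annotation'] = False

locs_demo = pd.DataFrame({'cluster_id': [-1, 0, 0, 1, 2, -1]})

# Number of clusters rejected by the size threshold
n_rejected = int((~stats_demo['annotation']).sum())

# IDs of accepted clusters, explicitly dropping the noise label -1
keep = stats_demo['annotation'].to_numpy() & (stats_demo.index != -1)
kept_ids = stats_demo.index[keep]
kept_locs = locs_demo[locs_demo['cluster_id'].isin(kept_ids)]
print(n_rejected, kept_ids.tolist())
```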

# Step 3: Overlay the clusters on top of the widefield image

Running the cell below will open up a window showing two views. On the left, you will see the full widefield image displayed with white dots on top. These dots are the centers of the clusters in the stats DataFrame. A yellow circle will indicate the current cluster.

On the right, you will see a zoomed-in view of the current cluster. The localizations in this cluster are shown as teal circles. Green circles denote the centers of other clusters that are not currently being analyzed, and magenta dots denote noise localizations (their `cluster_id` is -1).

Press `g` and `b` to navigate forward and backward through the clusters.

In [9]:
overlay = mp.OverlayClusters(annotateCol = 'annotation', filterCol='annotation', pixelSize = 108)
overlay(locs, stats, img)



Setting the `filterCol` parameter to the name of the annotation column removes all of the clusters that we marked as `False` above from the visualization. If you set it to `None`, you will see every cluster in the DataFrame.
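The `pixelSize = 108` argument relates localization coordinates to widefield pixels. This sketch assumes, without confirmation from the B-Store docs, that coordinates are in nanometers and `pixelSize` is nanometers per pixel. It also ignores any half-pixel origin convention the multiprocessor might use. The two centers are copied from the stats table above.

```python
import pandas as pd


def to_pixel_coords(stats, pixelSize, xCol='x_center', yCol='y_center'):
    """Convert cluster centers (assumed to be in nm) into widefield pixel units."""
    return stats[[xCol, yCol]] / pixelSize


# Two cluster centers taken from the stats table in this tutorial
centers = pd.DataFrame({'x_center': [2729.491474, 9371.397574],
                        'y_center': [88159.770103, 97529.475573]},
                       index=pd.Index([0, 1], name='cluster_id'))
print(to_pixel_coords(centers, pixelSize=108))
```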

# Step 4: Correcting the shift between clusters and the widefield image

As you navigate, you should notice a constant offset between the widefield image and the clusters. This offset can be corrected with the [AlignToWidefield](http://b-store.readthedocs.io/en/latest/bstore.html#bstore.multiprocessors.AlignToWidefield) multiprocessor. This multiprocessor creates a 2D histogram from the localizations and computes its cross-correlation with an upsampled version of the widefield image to determine the global offset between the two.
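A generic FFT cross-correlation shows how a global offset falls out of a correlation peak. This is a minimal sketch, not B-Store's implementation. It works at whole-pixel precision on two images of the same shape. AlignToWidefield, as described above, histograms the localizations and upsamples the widefield image first, which is what allows finer-than-pixel offsets.

```python
import numpy as np


def estimate_shift(reference, moving):
    """Integer (row, col) shift that, applied to `moving` with np.roll,
    best aligns it onto `reference`; taken from the peak of their
    circular cross-correlation computed with FFTs."""
    # Subtract the means so that a constant background does not dominate
    ref = reference - reference.mean()
    mov = moving - moving.mean()
    xcorr = np.fft.ifft2(np.fft.fft2(ref) * np.conj(np.fft.fft2(mov))).real
    peak = np.array(np.unravel_index(np.argmax(xcorr), xcorr.shape))
    # Peaks past the midpoint correspond to negative shifts (circular wrap-around)
    shape = np.array(xcorr.shape)
    wrapped = peak > shape // 2
    peak[wrapped] -= shape[wrapped]
    return tuple(int(p) for p in peak)


# Toy demo: a random image and a copy rolled by (3, -5) pixels
rng = np.random.default_rng(0)
reference = rng.random((64, 64))
moving = np.roll(reference, shift=(3, -5), axis=(0, 1))
print(estimate_shift(reference, moving))  # → (-3, 5)
```

The returned shift undoes the roll: `np.roll(moving, (-3, 5), axis=(0, 1))` reproduces `reference`.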

To use this multiprocessor, we pass it the localizations belonging to the filtered clusters and the widefield image.

In [10]:
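# Keep only localizations whose cluster is annotated True in stats;
# localizations in clusters set to False are removed. The noise "cluster" (-1)
# has more than 50 localizations, so this filter keeps it.
# Filtering is not strictly necessary, but it sometimes improves the alignment.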
filteredLocs = locs.loc[locs['cluster_id'].isin(stats[stats['annotation'] == True].index)]

# Now compute the offset with the filtered localizations
aligner = mp.AlignToWidefield()
dx, dy = aligner(filteredLocs, img)

print('x-offset: {0}, y-offset: {1}'.format(dx, dy))

x-offset: -172.8, y-offset: -194.4
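The offsets can be compared with the widefield pixel size to judge how large they are. This sketch assumes, without confirmation from the B-Store docs, that they share the units of the localizations, with `pixelSize = 108` as passed to `OverlayClusters` above. The two numbers are the ones printed by the cell above.

```python
pixelSize = 108          # value passed to OverlayClusters in this tutorial
dx, dy = -172.8, -194.4  # offsets printed by AlignToWidefield above

# Express the offsets in units of widefield pixels
dx_px, dy_px = dx / pixelSize, dy / pixelSize
print('x-offset: {0:.2f} px, y-offset: {1:.2f} px'.format(dx_px, dy_px))
# prints "x-offset: -1.60 px, y-offset: -1.80 px"
```

Under this assumption, the misregistration is a little under two widefield pixels in each direction. That is large enough to see in the zoomed view but would be easy to miss in the full image.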


We can now apply these corrections with the `xShift` and `yShift` parameters of `OverlayClusters`. The coordinates in `locs` are not modified by this operation; only the positions of the localizations in the visualization are shifted.
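The same non-destructive idea can be reproduced in Pandas when shifted coordinates are needed outside the viewer. Work on a copy and leave the original DataFrame alone. The helper `shifted_copy` is hypothetical. Adding the shift rather than subtracting it is an assumed sign convention, so check the direction against the overlay before relying on it.

```python
import pandas as pd


def shifted_copy(locs, xShift, yShift, xCol='x', yCol='y'):
    """Return a copy of `locs` with shifted coordinates; `locs` is untouched.
    Adding (not subtracting) the shift is an assumed sign convention."""
    out = locs.copy()
    out[xCol] = out[xCol] + xShift
    out[yCol] = out[yCol] + yShift
    return out


# Hypothetical single localization, shifted by the offsets found above
locs_demo = pd.DataFrame({'x': [100.0], 'y': [200.0]})
moved = shifted_copy(locs_demo, -172.8, -194.4)
print(locs_demo, moved, sep='\n')
```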

In [11]:
overlay = mp.OverlayClusters(annotateCol = 'annotation', filterCol='annotation', pixelSize = 108,
                             xShift = dx, yShift = dy)
overlay(locs, stats, img)



Now when you navigate through the clusters you should see that they overlap quite well.

# Step 5: Annotating the clusters

If you specify an annotation column with the `annotateCol` parameter of `OverlayClusters`, you can use the keyboard to annotate each cluster and move to the next one. The following keys add annotations:

- **Space bar** : set the value in the stats column for this cluster to `True`
- **r** : set the value in the stats column for this cluster to `False` ('r' is for 'reject')
- **0-9** : set the value in the stats column to an integer between 0 and 9
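The key bindings listed above amount to a small lookup from key to annotation value. The helper below is hypothetical and not part of B-Store. It assumes keys arrive as single-character strings, with a space for the space bar, and returns `None` for keys that do not annotate, such as the `g` and `b` navigation keys.

```python
def annotation_for_key(key):
    """Map a key press to an annotation value, following the key list above.
    Hypothetical helper; keys are assumed to be single-character strings."""
    if key == ' ':
        return True            # space bar: accept the cluster
    if key == 'r':
        return False           # 'r': reject the cluster
    if len(key) == 1 and key in '0123456789':
        return int(key)        # digits: integer label 0-9
    return None                # any other key leaves the annotation alone


# Hypothetical use: stats.loc[currentClusterID, 'annotation'] = value
for k in [' ', 'r', '7', 'g']:
    print(repr(k), annotation_for_key(k))
```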

# Step 6: Saving the results

Once you are finished, you may save the results of the annotation by saving the `stats` DataFrame using any of the Pandas save functions, such as `to_csv()`.

In [12]:
filename = 'annotated_data.csv'
# Passing the file name lets Pandas handle line endings; the cluster_id index is saved as the first column
stats.to_csv(filename)
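To resume later, read the file back with `index_col = 'cluster_id'`, just as the stats were loaded at the start. The sketch below checks the round trip in memory with `io.StringIO` instead of a file on disk, using a small made-up table.

```python
import io
import pandas as pd

# Hypothetical annotated stats table indexed by cluster_id
stats_demo = pd.DataFrame({'number_of_localizations': [12187, 935, 30],
                           'annotation': [True, True, False]},
                          index=pd.Index([-1, 0, 1], name='cluster_id'))

buffer = io.StringIO()        # stands in for a file on disk
stats_demo.to_csv(buffer)     # cluster_id is written as the first column
buffer.seek(0)
reloaded = pd.read_csv(buffer, index_col='cluster_id')
print(reloaded.dtypes)
```

A column containing only `True` and `False` comes back as booleans. If the annotations mix booleans with the integer labels 0-9, the column is likely to be read back as text (object dtype), so checking `reloaded.dtypes` after loading is a cheap safeguard.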

# Summary

1. The `OverlayClusters` multiprocessor may be used to overlay clustered localizations on widefield images.
2. The same multiprocessor may be used to annotate clusters manually.
3. If the localizations are shifted relative to the widefield image, use the `AlignToWidefield` multiprocessor to correct this global shift.