Created on Tue Feb 05 11:34:21 2019 <br>
@author: zoldi.miklos
# <h1><center> Alignment of STORM images taken with two different filter cubes II </center></h1>
<h1><center> Finding the "best alignment" </center></h1>



Alignment issue: the result of the align.py script (the offset needed to shift the localization points (LPs) acquired with one filter cube onto the reference LPs acquired with the other) depends strongly on the clustering parameters. One could set them empirically by testing several parameter sets and choosing the one that looks acceptable. However, the right choice would differ between experiments, because labeling density, the number of imaged frames and the quality of the STORM image all vary. <br><br>
__Is there an objective function which could choose the optimal clustering parameters for alignment?__ 


In [9]:
#import modules
import align_miki_for_test_v1 as align  #align.py modified so that it can be imported by this notebook;
                                        #it must be in the same directory as the .ipynb file!
import os
import itertools
import pandas as pd
import numpy as np
import scipy.spatial                    #provides cKDTree ('import scipy' alone may not load submodules)
from bokeh.io import show, output_notebook, reset_output
from bokeh.plotting import figure, output_file, save
from bokeh.layouts import gridplot
from bokeh.models import Arrow, NormalHead
from bokeh.palettes import Category10
from bokeh.models.annotations import Label

In [5]:
#set plotting output:
reset_output()
output_notebook() #comment these out to show plots in a separate window instead of inline

## 1) Alignment results are highly dependent on clustering parameters 

Example alignments of one file using different DBSCAN parameters (minimum number of neighbors = [5, 10, 15, 20], epsilon = [100, 150, 200] nm) <br> <br>
The image shows that the calculated offsets vary noticeably between parameter sets. The mean of all offsets is marked by the red circle, and the alignment closest to it (the "best" one) is highlighted with its clustering parameters (here 5 neighbors within 150 nm) and the number of clusters found in each channel. <br><br>

__NOTE:__ Of course, without a ground truth reference, one cannot state which alignment is optimal. Choosing the one closest to the mean may not be good at all: it can happen that all alignments produce bad results except the one that gives a totally different result. But if the alignments are more or less OK, I think it is reasonable to assume that the one closest to the mean never deviates too much, and is hence acceptable.
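The per-file selection rule (pick the parameter set whose offset is closest to the mean of all offsets) can be sketched without align.py. This is a minimal illustration, assuming the offsets of one file are already collected in a dict keyed by the (neighbors, epsilon) criteria; the function name `closest_to_mean` and the example offsets are hypothetical, not results from the data.

```python
import numpy as np

def closest_to_mean(offsets_by_crit):
    """Return the criteria, offset and mean offset of the offset closest to the mean.

    offsets_by_crit: dict mapping (neighbors, epsilon) criteria to an offset
    vector (hypothetical input format, for illustration only).
    """
    crits = list(offsets_by_crit)
    offsets = np.array([offsets_by_crit[c] for c in crits], dtype=float)
    mean_offset = offsets.mean(axis=0)
    # Euclidean distance of each offset from the mean offset
    distances = np.linalg.norm(offsets - mean_offset, axis=1)
    best = int(np.argmin(distances))
    return crits[best], offsets[best], mean_offset

# made-up 3d offsets for one file; (10, 100) is an outlier
example = {
    (5, 100): (10.0, 20.0, 0.0),
    (5, 150): (12.0, 22.0, 0.0),
    (10, 100): (30.0, 40.0, 0.0),
}
crit, offset, mean_offset = closest_to_mean(example)
print(crit)  # (5, 150)
```

The single outlying offset pulls the mean towards itself, which is exactly the weakness the note above describes: if the outlier were the correct alignment, this rule would not find it.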

In [38]:
#read the list of file pairs (same format as the input of align.py)
csv_file = "L:/Miki/Scripts/Original_scripts_from_others/quad-cube-storm-exploration/out.csv"  
dt = pd.read_csv(csv_file, sep=';', dtype=str)
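The analysis below reads only the columns `first`, `second` and `channel` of this table. Because `os.path.isfile` returns `False` for a missing file rather than raising an exception, a check for existing pairs has to test the returned value. A minimal sketch, assuming only these three columns matter here; `existing_pairs` and the demo table are hypothetical:

```python
import os
import tempfile
import pandas as pd

def existing_pairs(file_list):
    """Return only the rows whose 'first' and 'second' files both exist."""
    # os.path.isfile returns a bool, so the result itself has to be tested
    exists = file_list['first'].map(os.path.isfile) & file_list['second'].map(os.path.isfile)
    return file_list[exists]

# hypothetical demo: one pair of existing (temporary) files and one missing pair
tmp_dir = tempfile.mkdtemp()
a = os.path.join(tmp_dir, 'a.txt')
b = os.path.join(tmp_dir, 'b.txt')
for path in (a, b):
    open(path, 'w').close()
demo = pd.DataFrame({
    'first': [a, os.path.join(tmp_dir, 'missing_1.txt')],
    'second': [b, os.path.join(tmp_dir, 'missing_2.txt')],
    'channel': ['ch_A', 'ch_A'],  # placeholder channel name
})
print(len(existing_pairs(demo)))  # 1
```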

In [26]:
#parameter values to test (the same values are used for both channels)
crit_neighbors = [5,10,15,20]   #DBSCAN min_samples
crit_distances = [100,150,200]  #DBSCAN eps (nm)

#constant values
border_size = 200.0
center_zoom_factor = 0.25
min_cluster_area_1 = 0
max_cluster_area_1 = float(np.inf)
min_cluster_area_2 = 0
max_cluster_area_2 = float(np.inf)
use_mean_vector_offset = True
no_plots = True

In [40]:
#function to find the clustering criteria giving the "best" alignment
from collections import Counter, defaultdict

def test_criteria_per_file(crit_neighbors=crit_neighbors, crit_distances=crit_distances):
    '''For every file pair in dt, performs the alignment with each combination of
    DBSCAN criteria (min_samples, eps), finds the criteria whose offset is closest
    to the mean offset of that file, plots these closest offsets for all files and
    returns the two "best" criteria together with the dictionaries they are based on.'''
    
    #results from all files
    closest_crit_all = []               #per file: criteria whose offset is closest to the mean
    closest_offset_all = []             #per file: the corresponding offset
    distance_sums = defaultdict(float)  #per criteria: summed distance from the mean offset
    distance_counts = defaultdict(int)  #per criteria: number of files summed over
    
    for index, row in dt.iterrows():
        #validates file pairs (os.path.isfile returns False instead of raising an exception)
        if not (os.path.isfile(row['first']) and os.path.isfile(row['second'])):
#             print("could not find pairs for: " + row['first'] + '\n')
            continue
                 
        postfix = '-' + row['channel'].replace('/', '_').replace('\\', '_')
        out_name = row['first'] + '-' + postfix
        
        #results of the current file
        criteria = []
        offsets = []
        clNr1_all = []
        clNr2_all = []
        
        #reads the dataframes once here and passes them on to align.perform_alignment,
        #so that each file is read just once for all the criteria being tested
        try:
            dt_1, dt_2, _ = align.prepare(row['first'], row['second'], row['channel'],
                                          center_zoom_factor)
        except Exception:
#             print("could not perform alignment on: ", row['first'],
#                   row['second'], row['channel'], "align.py possibly not found")
            continue
        
        #iterates over all combinations of criteria (same order as two nested loops)
        for i, j in itertools.product(crit_neighbors, crit_distances):
            parameters = {
                'file_1': row['first'],
                'prep_1': dt_1,
                'file_2': row['second'],
                'prep_2': dt_2,
                'common_channel_name': row['channel'],
                'dbscan_eps_1': float(j),
                'dbscan_min_samples_1': int(i),
                'dbscan_eps_2': float(j),
                'dbscan_min_samples_2': int(i),
                'border_size': border_size,
                'center_zoom_factor': center_zoom_factor,
                'min_cluster_area_1': min_cluster_area_1,
                'max_cluster_area_1': max_cluster_area_1,
                'min_cluster_area_2': min_cluster_area_2,
                'max_cluster_area_2': max_cluster_area_2,
                'use_mean_vector_offset': use_mean_vector_offset,
                'out_base_name': out_name,
                'no_plots': no_plots}
            
            try:
                offset, cl_1, cl_2 = align.perform_alignment(parameters)
            except Exception:
#                 print('correct align.py module was not found')
                continue
            
            criteria.append((i, j))
            offsets.append(offset)
            #number of clusters per channel (noise, key = -1, is counted too)
            clNr1_all.append(len(cl_1.keys()))
            clNr2_all.append(len(cl_2.keys()))
        
        #skips the file if none of the criteria produced an alignment
        if not offsets:
            continue
        
        #mean of the offsets generated by the different criteria (3d)
        offsets_arr = np.asarray(offsets, dtype=float)
        mean_offset = offsets_arr.mean(axis=0)
        #Euclidean distance of every offset from the mean offset
        distances = np.linalg.norm(offsets_arr - mean_offset, axis=1)
        #criteria that generated the offset closest to the mean
        closest = int(np.argmin(distances))
        closest_crit = criteria[closest]
        closest_crit_all.append(closest_crit)
        closest_offset = offsets[closest]
        closest_offset_all.append(closest_offset)
        
        #accumulates the distances per criteria; keying by criteria (not by list index)
        #keeps files where some criteria failed from shifting the values
        for crit, distance in zip(criteria, distances):
            distance_sums[crit] += distance
            distance_counts[crit] += 1
         
        #plot all offsets of all criteria per file
        #(uncomment to use; it makes sense only for a single file, to see how different
        #criteria result in different shifts)
#         p1 = figure(title=parameters['file_1'], match_aspect=True, toolbar_location='right')
#         #show mean offset
#         p1.circle(x=mean_offset[0], y=mean_offset[1], size=20,
#                   fill_color='red', line_color='black', line_width=3)
#         for offset, crit, cl_1, cl_2, color in zip (offsets, criteria,
#             clNr1_all, clNr2_all, itertools.cycle(Category10[10])):
#             p1.circle(x=[0, offset[0]],
#                       y=[0, offset[1]],
#                       color=color)
#             p1.add_layout(Arrow(end=NormalHead(fill_color=color), x_start=0,
#                         y_start=0, x_end=offset[0], y_end=offset[1]))
#             #label only the closest offset 
#             if crit == closest_crit:
#                 p1.add_layout(Label(text="crit: "+str(crit), x=offset[0],
#                                     y=offset[1], x_offset=5, y_offset=5,
#                                     border_line_color='red', border_line_width=3.0))
#                 p1.add_layout(Label(text="clNr: "+str(cl_1)+","+str(cl_2), x=offset[0],
#                                     y=offset[1], x_offset=5, y_offset=-15))
            
# #         output_file(str(parameters['file_1'])+"---arrows.html", mode='inline')
# #         save(gridplot([[p1]], sizing_mode='stretch_both', merge_tools=False))
#         show(gridplot([[p1]], merge_tools=False))
    
    if not closest_crit_all:
        print("no alignment could be performed on any of the files")
        return None, None, {}, {}
    
    #best criteria 1: its offset was most often the closest to the mean offset
    count_dict = dict(Counter(closest_crit_all))
    best1 = max(count_dict, key=count_dict.get)
    #best criteria 2: on average, its offsets had the smallest distance from the mean offset
    distances_dict = dict((crit, distance_sums[crit] / distance_counts[crit])
                          for crit in distance_sums)
    best2 = min(distances_dict, key=distances_dict.get)
    
    #plot the closest offsets of all files
    p2 = figure(title="range of best offsets for all files", match_aspect=True)
    #mean of the closest offsets of all files
    mean_offsets = np.mean(np.array(closest_offset_all), axis=0)
    p2.circle(x=mean_offsets[0], y=mean_offsets[1], size=20,
              fill_color='red', line_color='black', line_width=3)
    for offset, crit, color in zip(closest_offset_all, closest_crit_all,
                                   itertools.cycle(Category10[10])):
        p2.circle(x=[0, offset[0]], y=[0, offset[1]], color=color)
        p2.add_layout(Arrow(end=NormalHead(fill_color=color), x_start=0,
                            y_start=0, x_end=offset[0], y_end=offset[1]))
        p2.add_layout(Label(text="crit: "+str(crit), x=offset[0],
                            y=offset[1], x_offset=5, y_offset=5))
            
#     output_file(filename="_all_best_offsets.html", mode='inline')
#     save(gridplot([[p2]], sizing_mode='stretch_both', merge_tools=False))
    show(gridplot([[p2]], merge_tools=False))
    
    print("best criteria 1 (closest to mean offsets most of the time): {}\n".format(best1))
    print("best criteria 2 (on average, closest to mean offsets): {}\n".format(best2))
    return best1, best2, count_dict, distances_dict

In [24]:
best1, best2, closest, distances = test_criteria_per_file(crit_neighbors=crit_neighbors, crit_distances=crit_distances)

For different files, the closest/"best" alignment varies. 
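The two summary criteria printed below combine the per-file results in different ways: criterion 1 counts how often each parameter set was the closest to the mean offset, and criterion 2 averages each parameter set's distance from the mean offset over the files. A sketch on made-up distances (the function `best_criteria` and its input structure are hypothetical) shows that the two rules need not agree:

```python
from collections import Counter, defaultdict

def best_criteria(per_file_results):
    """Combine per-file results into the two "best" criteria.

    per_file_results: list with one dict per file, mapping (neighbors, epsilon)
    criteria to the distance of that criteria's offset from the file's mean
    offset (hypothetical structure, for illustration only).
    """
    closest_counts = Counter()
    distance_sums = defaultdict(float)
    distance_counts = defaultdict(int)
    for distances in per_file_results:
        # criteria whose offset was closest to the mean offset in this file
        closest_counts[min(distances, key=distances.get)] += 1
        for crit, d in distances.items():
            distance_sums[crit] += d
            distance_counts[crit] += 1
    mean_distances = {c: distance_sums[c] / distance_counts[c] for c in distance_sums}
    best1 = max(closest_counts, key=closest_counts.get)  # most often closest
    best2 = min(mean_distances, key=mean_distances.get)  # smallest mean distance
    return best1, best2

# made-up distances for three files, for illustration only
files = [
    {(5, 150): 2.0, (10, 150): 3.0, (10, 200): 9.0},
    {(5, 150): 1.0, (10, 150): 2.0, (10, 200): 4.0},
    {(5, 150): 20.0, (10, 150): 3.0, (10, 200): 2.5},
]
print(best_criteria(files))  # ((5, 150), (10, 150))
```

Here (5, 150) wins most files but fails badly on one, so the averaged distance favours (10, 150), which is never the closest but is always near the mean.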

In [41]:
best1, best2, closest, distances = test_criteria_per_file(crit_neighbors=crit_neighbors, crit_distances=crit_distances)

   no suitable cluster roi found, skipping "L:/Miki/Other_projects/Glutamaterg/THC_Cagliari/180606_gluerg_CA3_STORM/stormtxts_to_merge\BB236_VGluT_sl01_934_CB1_647_1116_Bsn_568_s006F_list-2018-06-20-19-03-41_S73.bin_Zmod.txt"


best criteria 1 (closest to mean offsets most of the time):  (10, 200) 

best criteria 2 (on average, closest to mean offsets):  (10, 150) 

