# Preproc2P - final experiment

This notebook sets up the final experiment: combining every covariance function with every likelihood and fitting each combination to every dataset.
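
As a rough illustration of what "all combinations" means here, the sketch below enumerates a small experiment grid with `itertools.product`. The string labels in `covariances` are hypothetical placeholders; the notebook itself works with kernel objects, likelihood classes, and Neurofinder dataset names.

```python
import itertools

# Hypothetical placeholder labels standing in for kernel objects,
# likelihood classes and dataset names.
covariances = ["white_noise", "rbf"]
likelihoods = ["LinearGainLikelihood", "PoissonInputPhotomultiplierLikelihood"]
datasets = ["neurofinder.02.00", "neurofinder.02.01"]

# One model fit per (likelihood, covariance, dataset) triple.
experiment_grid = list(itertools.product(likelihoods, covariances, datasets))

for likelihood, covariance, dataset in experiment_grid:
    print(dataset, likelihood, covariance)
```

The number of fits grows as the product of the three list lengths, which is why the cells below often restrict one or more of the lists.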

## Datasets

The datasets are the publicly available [Neurofinder datasets](http://neurofinder.codeneuro.org/).


## The covariance functions

These functions encode our prior assumptions about the spatial covariance of the illumination incident on the sample. The assumptions, ordered from weakest to strongest, are as follows:

- No prior
- Spatially smooth function (RBF covariance)
- Smooth + Symmetric (XY or circular)
- Smooth + Symmetric (XY or circular) + Anti-correlated at distance

The influence of the prior can be enforced to a variable degree.
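
To make the "spatially smooth" assumption concrete, here is a minimal NumPy sketch of an RBF (squared-exponential) covariance evaluated on a coarse pixel grid. The function name `rbf_covariance`, the 5x5 grid, and the lengthscale value are illustrative assumptions; the notebook itself builds its kernels with `gpytorch` and `preprocKernels`.

```python
import numpy as np

def rbf_covariance(points, lengthscale):
    """Squared-exponential covariance between all pairs of 2-D points."""
    diff = points[:, None, :] - points[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    return np.exp(-0.5 * sq_dist / lengthscale ** 2)

# Pixel coordinates on a coarse 5x5 grid, scaled to the unit square.
axis = np.linspace(0.0, 1.0, 5)
grid = np.stack(np.meshgrid(axis, axis, indexing="ij"), axis=-1).reshape(-1, 2)

# Nearby pixels get covariance close to 1, distant pixels close to 0.
K = rbf_covariance(grid, lengthscale=0.3)
```

A larger lengthscale corresponds to a stronger smoothness assumption, since it keeps distant pixels correlated.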

## Likelihood functions

These functions describe the relationship between an observation and the corresponding sample of the Gaussian process output. They are as follows:

- LinearGainLikelihood
- PoissonInputPhotomultiplierLikelihood
- PoissonInputUnderamplifiedPhotomultiplierLikelihood
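
For intuition only, the sketch below simulates one simple photodetector noise model: Poisson photon counts passed through a linear gain with additive Gaussian read noise. This is an assumed generic model, not the implementation of the classes in `preprocLikelihoods`, and all parameter values (`rate`, `gain`, `offset`, `read_noise`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameters, chosen only for illustration.
rate = 5.0        # mean photon count per pixel per frame
gain = 2.0        # detector output per photon
offset = 10.0     # constant baseline added by the electronics
read_noise = 0.5  # standard deviation of additive Gaussian noise

# Poisson photon input followed by a linear gain stage.
photons = rng.poisson(rate, size=100_000)
observed = gain * photons + offset + rng.normal(0.0, read_noise, size=photons.shape)

# Under this model: mean = gain * rate + offset,
# variance = gain**2 * rate + read_noise**2.
sample_mean = observed.mean()
sample_var = observed.var()
```

Under this assumed model the variance grows linearly with the mean intensity, with a slope set by the gain.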

In [1]:
from preprocExperimentSetup import *
# File system management
import os
import errno
import zipfile

# Get current git hash to ensure reproducible results
import git
git_cur_repo = git.Repo(search_parent_directories=True)
git_cur_sha = git_cur_repo.head.object.hexsha


In [2]:
# Define which datasets we want to work with
data_dir='/nfs/data/gergo/Neurofinder_update/'
all_dataset_names = [
#                      'neurofinder.00.00', 
#                      'neurofinder.00.01', 
#                      'neurofinder.00.00.test',
#                      'neurofinder.00.01.test',
#                      'neurofinder.01.00', 
#                      'neurofinder.01.00.test',
#                      'neurofinder.01.01.test',                     
                     'neurofinder.02.00', 
                     'neurofinder.02.01',
                     'neurofinder.02.00.test',
                     'neurofinder.02.01.test',
#                      'neurofinder.03.00', 
#                      'neurofinder.03.00.test',
#                      'neurofinder.04.00',
#                      'neurofinder.04.01',
#                      'neurofinder.04.00.test',
#                      'neurofinder.04.01.test'
                    ]

# Imputing datasets

For each dataset, determine the location of missing pixels. These are often the result of computational buffering inconsistencies during fluorescence microscopy and are detectable as lines of "0" values. Then create two datasets: one where the missing data is marked with NaN values, and another where the missing data is imputed via local convolution of nearby non-missing pixels.
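
The idea can be sketched on a single frame as follows. This is a minimal NumPy illustration under stated assumptions, not the `imputeDataset` implementation: the helper name `impute_zero_rows`, the box-shaped window, and the rule "a row is missing if all of its pixels are 0" are all hypothetical choices.

```python
import numpy as np

def impute_zero_rows(frame, half_width=1):
    """Return (NaN-marked frame, frame with all-zero rows filled by a local mean)."""
    frame = frame.astype(float)
    missing = np.zeros(frame.shape, dtype=bool)
    missing[np.all(frame == 0, axis=1), :] = True

    # Version 1: missing pixels marked with NaN.
    with_nans = frame.copy()
    with_nans[missing] = np.nan

    # Version 2: normalised box convolution that ignores missing pixels.
    valid = (~missing).astype(float)
    padded_vals = np.pad(frame * valid, half_width)
    padded_valid = np.pad(valid, half_width)
    num = np.zeros_like(frame)
    den = np.zeros_like(frame)
    h, w = frame.shape
    for dy in range(2 * half_width + 1):
        for dx in range(2 * half_width + 1):
            num += padded_vals[dy:dy + h, dx:dx + w]
            den += padded_valid[dy:dy + h, dx:dx + w]
    local_mean = np.divide(num, den, out=np.zeros_like(num), where=den > 0)

    imputed = frame.copy()
    imputed[missing] = local_mean[missing]
    return with_nans, imputed

frame = np.full((5, 4), 10.0)
frame[2, :] = 0.0  # simulate one dropped scan line
with_nans, imputed = impute_zero_rows(frame)
```

Dividing by the count of valid neighbours, rather than by the window size, keeps the zeros of the missing line from pulling the imputed values down.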

In [None]:
for dataset_name in all_dataset_names:
    
    print(dataset_name)
    
    # Extract the raw zip archive if it has not been unpacked yet
    if not os.path.exists(data_dir+dataset_name+'/'):
        with zipfile.ZipFile(data_dir+dataset_name + ".zip","r") as zip_ref:
            zip_ref.extractall(path = data_dir)
            print("Successfully extracted raw zip")
    
    
    imputeDataset(dataset_name, max_T=2000, data_dir=data_dir, 
                        stamp='', force_redo=True, device='cpu',
                        returnNans = False)
    torch.cuda.empty_cache()
    
    print("Successfully found and imputed missing pixels")

# Extracting background pixels training data

To avoid signal-induced time variation, we use heuristics to try to obtain background pixel samples that are evenly distributed throughout the field of view of each dataset.
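
One possible heuristic of this kind is sketched below: split the field of view into tiles and keep the pixel with the lowest temporal variance in each tile, so that every region contributes a sample and strongly fluctuating pixels are skipped. This is a hypothetical illustration, not the heuristic used by `extractTrainingData`; the function name `pick_background_pixels` and the tile size are assumptions.

```python
import numpy as np

def pick_background_pixels(movie, tile=4):
    """Pick the lowest-temporal-variance pixel in each tile of the field of view."""
    n_frames, height, width = movie.shape
    variance = movie.var(axis=0)
    picks = []
    for y0 in range(0, height, tile):
        for x0 in range(0, width, tile):
            block = variance[y0:y0 + tile, x0:x0 + tile]
            dy, dx = np.unravel_index(np.argmin(block), block.shape)
            picks.append((y0 + dy, x0 + dx))
    return np.array(picks)

rng = np.random.default_rng(0)
movie = rng.normal(loc=100.0, scale=1.0, size=(50, 8, 8))
movie[:, 1, 1] += 20.0 * rng.random(50)  # one "active" pixel with strong fluctuations

# One (row, column) pair per tile; the active pixel is never selected.
background_pixels = pick_background_pixels(movie, tile=4)
```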

In [3]:
# dataset_name = 'neurofinder.02.00'
# imgsImputed, imgsNans = imputeDataset(dataset_name=dataset_name, max_T=500, data_dir=data_dir, 
#                             stamp='', force_redo=False, device='cpu', returnNans=True)
# float(torch.isnan(imgsNans.contiguous().view(-1,1)).sum())/float(imgsNans.numel())*100

32.16450653076171

In [None]:
stamp_training_data = '_gitsha_' + git_cur_sha + '_rPC_1_origPMgain_useNans'

for dataset_name in all_dataset_names:    
    print(dataset_name)

    # Create training data with given parameters
    trainingData = extractTrainingData(dataset_name, 
                                       max_T=500, 
                                       data_dir=data_dir, 
                                       remove_PCs = 1, 
                                       normalize_Fano = False,
                                       use_imputed_data = False,
                                       stamp=stamp_training_data, 
                                       force_redo='trainingData_only',
                                       device='cpu')

In [None]:
stamp_training_data = '_gitsha_' + git_cur_sha + '_rPC_0_origPMgain_useNans'

for dataset_name in all_dataset_names:    
    print(dataset_name)

    # Create training data with given parameters
    trainingData = extractTrainingData(dataset_name, 
                                       max_T=500, 
                                       data_dir=data_dir, 
                                       remove_PCs = None, 
                                       normalize_Fano = False,
                                       use_imputed_data = False,
                                       stamp=stamp_training_data, 
                                       force_redo='trainingData_only',
                                       device='cpu')

In [None]:
stamp_training_data = '_gitsha_' + git_cur_sha + '_rPC_1_normPMgain_useNans'

for dataset_name in all_dataset_names:    
    print(dataset_name)

    # Create training data with given parameters
    trainingData = extractTrainingData(dataset_name, 
                                       max_T=500, 
                                       data_dir=data_dir, 
                                       remove_PCs = 1, 
                                       normalize_Fano = True,
                                       use_imputed_data = False,
                                       stamp=stamp_training_data, 
                                       force_redo='trainingData_only',
                                       device='cpu')

In [None]:
stamp_training_data = '_gitsha_' + git_cur_sha + '_rPC_0_normPMgain_useNans'

for dataset_name in all_dataset_names:    
    print(dataset_name)

    # Create training data with given parameters
    trainingData = extractTrainingData(dataset_name, 
                                       max_T=500, 
                                       data_dir=data_dir, 
                                       remove_PCs = None, 
                                       normalize_Fano = True,
                                       use_imputed_data = False,
                                       stamp=stamp_training_data, 
                                       force_redo='trainingData_only',
                                       device='cpu')

# Set up and run model fitting

In [None]:
import nbimporter
from examineTrainingData import *

In [None]:
# Loading the appropriate training data sample
stamp_git = '_gitsha_' + '2bd0d720de0995be6b0f1795304839f9877cb6c3'
stamp_training_type = '_rPC_1_origPMgain_useNans'


dataset_name = 'neurofinder.00.00'
trainingData = loadTrainingData(
        dataset_name = dataset_name,
        data_dir = data_dir,
        stamp = stamp_git + stamp_training_type
    )

In [None]:
# Downsample the training data before use
trainingDataUniform = downsampleTrainingData(trainingData, filter_width= 35, targetCoverage=0.01)
stamp_trainingCoverage = '_targetCoverage_01' 

In [None]:
# import importlib
# import preprocExperimentSetup
# importlib.reload(preprocExperimentSetup)
# from preprocExperimentSetup import *

In [None]:
# Train the model

import copy
import torch
import gpytorch
import preprocKernels
import preprocLikelihoods

# Set up all combinations
import itertools


# Priors
all_priors = [
{
    'mean' : gpytorch.means.ZeroMean(),
    'kernel' : preprocKernels.WhiteNoiseKernelBugfix(variances=torch.tensor([10.]))
             },
# {
#     'mean' : gpytorch.means.ConstantMean(),
#     'kernel' : k1
# }
]

# Likelihoods
all_likelihood_classes = [
   #preprocLikelihoods.LinearGainLikelihood,
   #preprocLikelihoods.PoissonInputPhotomultiplierLikelihood,
   preprocLikelihoods.PoissonInputUnderamplifiedPhotomultiplierLikelihood
]

#cur_stamp = '_'+datetime.datetime.fromtimestamp(time.time()).strftime('%Y%m%dT%H%M%S')



for likelihood_class, prior_model_base in itertools.product(all_likelihood_classes, all_priors):
    
    print(dataset_name, likelihood_class, prior_model_base)
    
    
    # Set up current prior and likelihood
    prior_model = {
        'mean' : copy.deepcopy(prior_model_base['mean']),
        'kernel' : copy.deepcopy(prior_model_base['kernel'])
    }
    
    likelihood_model = likelihood_class()
    
    # Clean cuda cache if being used
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
    
    # Run the training on the downsampled data, which is what
    # stamp_trainingCoverage in the stamp refers to
    stamp_modelGridType = '_grid_25_5'
    mll = trainModel(dataset_name, trainingDataUniform, prior_model, likelihood_model, device='cpu',
                     data_dir=data_dir,
                     stamp=stamp_git + stamp_training_type + stamp_trainingCoverage + stamp_modelGridType,
                     n_iter=30, x_batchsize=2**13, y_batchsize=200, manual_seed=2713,
                     verbose=2,
                     model_grid_size=25,
                     model_interp_point_number=5)

In [None]:
import torch

# Set up all combinations
import itertools


# Priors
all_priors = [
{
    'mean' : gpytorch.means.ZeroMean(),
    'kernel' : preprocKernels.WhiteNoiseKernelBugfix(variances=torch.tensor([10.]))
             },
# {
#     'mean' : gpytorch.means.ConstantMean(),
#     'kernel' : k1
# }
]

# Likelihoods
all_likelihood_classes = [
   preprocLikelihoods.LinearGainLikelihood,
   preprocLikelihoods.PoissonInputPhotomultiplierLikelihood,
   preprocLikelihoods.PoissonInputUnderamplifiedPhotomultiplierLikelihood
]

#cur_stamp = '_'+datetime.datetime.fromtimestamp(time.time()).strftime('%Y%m%dT%H%M%S')

cur_stamp = git_cur_sha + '_02_normPMGain_test_05'

for likelihood_class, prior_model_base, dataset_name in itertools.product(all_likelihood_classes, all_priors, all_dataset_names):
    
    print(dataset_name, likelihood_class, prior_model_base)
    
    # Create training data
    trainingData = extractTrainingData(dataset_name, max_T=500, 
                                       data_dir='/nfs/data3/gergo/Neurofinder_update/', 
                                       remove_PCs = None, normalize_Fano = True,
                                       stamp=cur_stamp, force_redo='trainingData_only')
    
    # Set up current prior and likelihood
    prior_model = {
        'mean' : copy.deepcopy(prior_model_base['mean']),
        'kernel' : copy.deepcopy(prior_model_base['kernel'])
    }
    
    likelihood_model = likelihood_class()
    
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
    
    # Run the training
    mll=trainModel(dataset_name, trainingData, prior_model, likelihood_model, device='cuda:2',
               data_dir='/nfs/data3/gergo/Neurofinder_update/',
               stamp = cur_stamp,
               n_iter = 30, x_batchsize=2**13, y_batchsize = 200, manual_seed=2713,
               verbose = 2)
    
    
    


In [None]:
import torch

# Set up all combinations
import itertools

# Datasets
all_dataset_names = ['neurofinder.00.00', 'neurofinder.01.00', 'neurofinder.02.00', 'neurofinder.03.00', 'neurofinder.04.00']

# Priors
all_priors = [
{
    'mean' : gpytorch.means.ZeroMean(),
    'kernel' : preprocKernels.WhiteNoiseKernelBugfix(variances=torch.tensor([10.]))
             },
# {
#     'mean' : gpytorch.means.ConstantMean(),
#     'kernel' : k1
# }
]

# Likelihoods
all_likelihood_classes = [
   #preprocLikelihoods.LinearGainLikelihood,
   preprocLikelihoods.PoissonInputPhotomultiplierLikelihood,
   #preprocLikelihoods.PoissonInputUnderamplifiedPhotomultiplierLikelihood
]

#cur_stamp = '_'+datetime.datetime.fromtimestamp(time.time()).strftime('%Y%m%dT%H%M%S')

cur_stamp = '_02_origPMGain_test_05'

for likelihood_class, prior_model_base, dataset_name in itertools.product(all_likelihood_classes, all_priors, all_dataset_names):
    
    print(dataset_name, likelihood_class, prior_model_base)
    
    # Create training data
    trainingData = extractTrainingData(dataset_name, max_T=500, 
                                       data_dir='/nfs/data3/gergo/Neurofinder_update/', 
                                       remove_PCs = None, normalize_Fano = False,
                                       stamp=cur_stamp, force_redo='trainingData_only')
    
    # Set up current prior and likelihood
    prior_model = {
        'mean' : copy.deepcopy(prior_model_base['mean']),
        'kernel' : copy.deepcopy(prior_model_base['kernel'])
    }
    
    likelihood_model = likelihood_class()
    
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
    
    # Run the training
    mll=trainModel(dataset_name, trainingData, prior_model, likelihood_model, device='cuda:2',
               data_dir='/nfs/data3/gergo/Neurofinder_update/',
               stamp = cur_stamp,
               n_iter = 30, x_batchsize=2**13, y_batchsize = 200, manual_seed=2713,
               verbose = 2)
    
    
    


# Design complex covariance function as prior
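
Before the `gpytorch` construction below, the following NumPy sketch illustrates two of the ingredients: a "Mexican hat" profile, which is anti-correlated at intermediate distances, and radial symmetrisation about the centre of the field of view. It is a sketch under stated assumptions, not the `preprocKernels` implementation: the function names, the Ricker form of the profile, and the lengthscale are illustrative choices.

```python
import numpy as np

def mexican_hat(d, lengthscale):
    """Ricker ("Mexican hat") profile: positive nearby, negative at mid range."""
    s = (d / lengthscale) ** 2
    return (1.0 - s) * np.exp(-0.5 * s)

def radially_symmetrised(points, center, lengthscale):
    """Covariance that depends only on each point's distance from `center`."""
    radius = np.linalg.norm(points - center, axis=-1)
    return mexican_hat(radius[:, None] - radius[None, :], lengthscale)

# Pixel coordinates on a coarse 6x6 grid, scaled to the unit square.
axis = np.linspace(0.0, 1.0, 6)
grid = np.stack(np.meshgrid(axis, axis, indexing="ij"), axis=-1).reshape(-1, 2)

K_radial = radially_symmetrised(grid, center=np.array([0.5, 0.5]), lengthscale=0.2)
```

Pixels at the same distance from the centre are perfectly correlated under this construction, which is the kind of assumption one might make for illumination fall-off through a circular objective.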

In [None]:
from preprocExperimentSetup import *
# File system management
import os
import errno
import zipfile


import torch
import math
import gpytorch

import preprocUtils
import preprocRandomVariables
import preprocLikelihoods
import preprocModels
import preprocKernels

In [None]:
#Additive covariance

# ---------------------------------------------------------
# Linearly symmetrised kernel

a1=gpytorch.kernels.RBFKernel()
a2=gpytorch.kernels.RBFKernel()
# a1=preprocKernels.MexicanHatKernel()
# a2=preprocKernels.MexicanHatKernel()
a1.register_parameter('log_lengthscale', preprocUtils.toTorchParam(7.0),
                                prior = gpytorch.priors.SmoothedBoxPrior(4.5, 8., sigma=0.1))
del a1.active_dims
a1.register_buffer('active_dims', torch.tensor([0], dtype=torch.long))
a2.register_parameter('log_lengthscale', preprocUtils.toTorchParam(7.0),
                                prior = gpytorch.priors.SmoothedBoxPrior(4.5, 8., sigma=0.1))
del a2.active_dims
a2.register_buffer('active_dims', torch.tensor([1], dtype=torch.long))

linsymm_kernel_scanmirrors = preprocKernels.SymmetriseKernelLinearly(
            a1+a2, 
            center=torch.tensor([0.5, 0.5]) 
        )
linsymm_kernel_scanmirrors.center.requires_grad = False


# ---------------------------------------------------------
# Radially symmetrised kernel

b1 = preprocKernels.MexicanHatKernel()
b1.register_parameter('log_lengthscale', preprocUtils.toTorchParam(6.0),
                                prior = gpytorch.priors.SmoothedBoxPrior(4.5, 8., sigma=0.1))
radsymm_kernel_objective = preprocKernels.SymmetriseKernelRadially(
            b1, 
            center=torch.tensor([0.5, 0.5]) 
        )
radsymm_kernel_objective.center.requires_grad = False


# ---------------------------------------------------------
# Short lengthscale aberrations kernel

base_kernel_aberrations = gpytorch.kernels.RBFKernel()
base_kernel_aberrations.register_parameter('log_lengthscale', preprocUtils.toTorchParam(2.0),
                                prior = gpytorch.priors.SmoothedBoxPrior(1., 3.5, sigma=0.1))



# ---------------------------------------------------------
# ---------------------------------------------------------
# Creating the additive kernel function

expert_covariance_function = preprocKernels.ScaleKernel(
    (
        preprocKernels.ScaleKernel(
            (
                preprocKernels.ScaleKernel(
                    linsymm_kernel_scanmirrors,
                    1., fix_scale=False     
                ) + (

                preprocKernels.ScaleKernel(
                    radsymm_kernel_objective,
                    1., fix_scale=False      
                )) 
            ),
            1./math.exp(1.), fix_scale = True
            
        ) + (
            
            
        preprocKernels.ScaleKernel(
                base_kernel_aberrations,
                1., fix_scale=True      
        )
        )
    ),
    1./math.exp(1.), fix_scale=False
)


str(expert_covariance_function)


In [None]:
import torch

# Set up all combinations
import itertools

# Datasets
all_dataset_names = ['neurofinder.00.00', 'neurofinder.01.00', 'neurofinder.02.00', 'neurofinder.03.00', 'neurofinder.04.00']

# Priors
all_priors = [
{
    'mean' : gpytorch.means.ConstantMean(),
    'kernel' : expert_covariance_function
}
]

# Likelihoods
all_likelihood_classes = [
   preprocLikelihoods.LinearGainLikelihood,
   preprocLikelihoods.PoissonInputPhotomultiplierLikelihood,
   #preprocLikelihoods.PoissonInputUnderamplifiedPhotomultiplierLikelihood
]

#cur_stamp = '_'+datetime.datetime.fromtimestamp(time.time()).strftime('%Y%m%dT%H%M%S')

cur_stamp = '_05_origPMGain_test_05_finegrid_50_3interp'

for likelihood_class, prior_model_base, dataset_name in itertools.product(all_likelihood_classes, all_priors, all_dataset_names):
    
    print(dataset_name, likelihood_class, prior_model_base)
    
    # Create training data
    trainingData = extractTrainingData(dataset_name, max_T=500, 
                                       data_dir='/nfs/data3/gergo/Neurofinder_update/', 
                                       remove_PCs = None, normalize_Fano = False,
                                       stamp=cur_stamp, force_redo='trainingData_only')
    
    # Set up current prior and likelihood
    prior_model = {
        'mean' : copy.deepcopy(prior_model_base['mean']),
        'kernel' : copy.deepcopy(prior_model_base['kernel'])
    }
    
    likelihood_model = likelihood_class()
    
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
    
    # Run the training
    mll=trainModel(dataset_name, trainingData, prior_model, likelihood_model, device='cuda:2',
               data_dir='/nfs/data3/gergo/Neurofinder_update/',
               stamp = cur_stamp,
               n_iter = 30, x_batchsize=2**13, y_batchsize = 200, manual_seed=2713,
               verbose = 1)
    
    
    


In [None]:
import torch

# Set up all combinations
import itertools

# Datasets
all_dataset_names = ['neurofinder.00.00', 'neurofinder.01.00', 'neurofinder.02.00', 'neurofinder.03.00', 'neurofinder.04.00']

# Priors
all_priors = [
{
    'mean' : gpytorch.means.ConstantMean(),
    'kernel' : expert_covariance_function
}
]

# Likelihoods
all_likelihood_classes = [
   #preprocLikelihoods.LinearGainLikelihood,
   preprocLikelihoods.PoissonInputPhotomultiplierLikelihood,
   #preprocLikelihoods.PoissonInputUnderamplifiedPhotomultiplierLikelihood
]

#cur_stamp = '_'+datetime.datetime.fromtimestamp(time.time()).strftime('%Y%m%dT%H%M%S')

cur_stamp = '_02_normPMGain_test_05'

for likelihood_class, prior_model_base, dataset_name in itertools.product(all_likelihood_classes, all_priors, all_dataset_names):
    
    print(dataset_name, likelihood_class, prior_model_base)
    
    # Create training data
    trainingData = extractTrainingData(dataset_name, max_T=500, 
                                       data_dir='/nfs/data3/gergo/Neurofinder_update/', 
                                       remove_PCs = None, normalize_Fano = True,
                                       stamp=cur_stamp, force_redo='trainingData_only')
    
    # Set up current prior and likelihood
    prior_model = {
        'mean' : copy.deepcopy(prior_model_base['mean']),
        'kernel' : copy.deepcopy(prior_model_base['kernel'])
    }
    
    likelihood_model = likelihood_class()
    
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
    
    # Run the training
    mll=trainModel(dataset_name, trainingData, prior_model, likelihood_model, device='cuda:1',
               data_dir='/nfs/data3/gergo/Neurofinder_update/',
               stamp = cur_stamp,
               n_iter = 30, x_batchsize=2**13, y_batchsize = 200, manual_seed=2713,
               verbose = 1)
    
    
    


In [None]:
import torch

# Set up all combinations
import itertools

# Datasets
all_dataset_names = ['neurofinder.00.00', 'neurofinder.01.00', 'neurofinder.02.00', 'neurofinder.03.00', 'neurofinder.04.00']

# Priors
all_priors = [
{
    'mean' : gpytorch.means.ConstantMean(),
    'kernel' : expert_covariance_function
}
]

# Likelihoods
all_likelihood_classes = [
   #preprocLikelihoods.LinearGainLikelihood,
   #preprocLikelihoods.PoissonInputPhotomultiplierLikelihood,
   preprocLikelihoods.PoissonInputUnderamplifiedPhotomultiplierLikelihood
]

#cur_stamp = '_'+datetime.datetime.fromtimestamp(time.time()).strftime('%Y%m%dT%H%M%S')

cur_stamp = '_02_origPMGain_test_05'

for likelihood_class, prior_model_base, dataset_name in itertools.product(all_likelihood_classes, all_priors, all_dataset_names):
    
    print(dataset_name, likelihood_class, prior_model_base)
    
    # Create training data
    trainingData = extractTrainingData(dataset_name, max_T=500, 
                                       data_dir='/nfs/data3/gergo/Neurofinder_update/', 
                                       remove_PCs = None, normalize_Fano = False,
                                       stamp=cur_stamp, force_redo='trainingData_only')
    
    # Set up current prior and likelihood
    prior_model = {
        'mean' : copy.deepcopy(prior_model_base['mean']),
        'kernel' : copy.deepcopy(prior_model_base['kernel'])
    }
    
    likelihood_model = likelihood_class()
    
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
    
    # Run the training
    mll=trainModel(dataset_name, trainingData, prior_model, likelihood_model, device='cuda:1',
               data_dir='/nfs/data3/gergo/Neurofinder_update/',
               stamp = cur_stamp,
               n_iter = 30, x_batchsize=2**13, y_batchsize = 200, manual_seed=2713,
               verbose = 1)
    
    
    


In [None]:
import torch

# Set up all combinations
import itertools

# Datasets
all_dataset_names = ['neurofinder.00.00', 'neurofinder.01.00', 'neurofinder.02.00', 'neurofinder.03.00', 'neurofinder.04.00']

# Priors
all_priors = [
{
    'mean' : gpytorch.means.ConstantMean(),
    'kernel' : expert_covariance_function
}
]

# Likelihoods
all_likelihood_classes = [
   #preprocLikelihoods.LinearGainLikelihood,
   #preprocLikelihoods.PoissonInputPhotomultiplierLikelihood,
   preprocLikelihoods.PoissonInputUnderamplifiedPhotomultiplierLikelihood
]

#cur_stamp = '_'+datetime.datetime.fromtimestamp(time.time()).strftime('%Y%m%dT%H%M%S')

cur_stamp = '_02_normPMGain_test_05'

for likelihood_class, prior_model_base, dataset_name in itertools.product(all_likelihood_classes, all_priors, all_dataset_names):
    
    print(dataset_name, likelihood_class, prior_model_base)
    
    # Create training data
    trainingData = extractTrainingData(dataset_name, max_T=500, 
                                       data_dir='/nfs/data3/gergo/Neurofinder_update/', 
                                       remove_PCs = None, normalize_Fano = True,
                                       stamp=cur_stamp, force_redo='trainingData_only')
    
    # Set up current prior and likelihood
    prior_model = {
        'mean' : copy.deepcopy(prior_model_base['mean']),
        'kernel' : copy.deepcopy(prior_model_base['kernel'])
    }
    
    likelihood_model = likelihood_class()
    
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
    
    # Run the training
    mll=trainModel(dataset_name, trainingData, prior_model, likelihood_model, device='cuda:1',
               data_dir='/nfs/data3/gergo/Neurofinder_update/',
               stamp = cur_stamp,
               n_iter = 30, x_batchsize=2**13, y_batchsize = 200, manual_seed=2713,
               verbose = 1)
    
    
    
