Status: my new code (which eliminates some of the FFTs) takes 16 seconds instead of 63 on my laptop, for a 2x2 tiled-up input (I tiled the input to get more reproducible timings with a larger work size).

Without exploiting symmetries it takes 23 seconds. I had to write C code to speed up the generation of the reflected/transposed FFT matrices - without that it was barely any faster than just computing the rfftn for all aa,bb.

Now, 2/3rds of the time is spent in fft2, so that is the bottleneck. If I insist on a square array then I could halve that, but that would be a limitation. I could, I suppose, make it a *recommendation* that allows the code to run faster. I could certainly take advantage of that for my own work.

For the original dataset size, on my Mac Pro, my code takes 6.5s (single-threaded) to back-project plane 4, compared to Matlab's 2.2s (multithreaded).

On my Mac Pro, full backprojection takes 96s single-threaded or 13s parallel(!), of which 1s is spent on the merging at the end (which I should be able to speed up too). Matlab with multithreading took 28s, so I have achieved a >2x speedup. More would have been nice, but it's still fairly respectable.


NOTE: my C code can't cope with an array that has been transposed (probably because it assumes adjacent strides in x?). I should probably fix that, though I doubt it costs much to just .copy() the transposed array, which is what I do at the moment. In the case of square inputs I should really make the transpose the final operation anyway. However, it looks as if a decent chunk of the FFT time is actually being spent in the other FFTs (for the reduced arrays) anyway!

### Performance investigation

Actual thread execution time seems to grow considerably with the number of threads, i.e. efficiency falls. I am not sure how to work out the cause of that. I could go back to working on dummy data (no transfers between processes) and see if that makes a difference to *that* in particular. (I think I may have looked only at the dead time overheads, which are also an issue.)
I looked at user and system CPU time, and profiled with Instruments. It looks like 20% of the time is spent in madvise (MacBook, 2 threads). I am not sure exactly why or where that is happening, but it seems to be related to Python memory management in some way. I should check whether it grows with the number of threads on the Mac Pro, and whether it is the same when I use dummy work blocks rather than passing to subprocesses.

-> revisit this now I am using mmap rather than pickle - hopefully much of this is now fixed.
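
For reference, the memmap pattern looks roughly like this. It is only a sketch: the file name and array contents are made up, and a temporary directory stands in for the real per-matrix directory.

```python
import os
import shutil
import tempfile
import numpy as np

tmp_dir = tempfile.mkdtemp()                       # hypothetical scratch location
path = os.path.join(tmp_dir, 'H00.array')
block = np.arange(24, dtype='float32').reshape(2, 3, 4)

# Writer: create the file and copy the data into it
writer = np.memmap(path, dtype='float32', mode='w+', shape=block.shape)
writer[:] = block
writer.flush()
del writer

# Reader (e.g. a worker process): map the file read-only.
# Only the path and shape need to be passed between processes, not the data.
reader = np.memmap(path, dtype='float32', mode='r', shape=block.shape)
roundtrip_ok = bool(np.array_equal(reader, block))
print(roundtrip_ok)

del reader
shutil.rmtree(tmp_dir)
```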

### Performance improvements to make

In the case of square arrays, move the transpose to be the final operation (since it's probably faster than reversing an array, although it may impact subsequent FFT performance?).

fft2 returns a double-precision array (on my MacBook, at least) for float32 input. I would much prefer it to return complex64. I could well believe that doing it this way is a performance hit (larger memory footprint). Can I improve on this? I suppose I could call through to C code that calls FFTW, for example.
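
One option that might avoid the upcast, assuming SciPy 1.4 or later is installed: the `scipy.fft` module (unlike the older `scipy.fftpack` interface) keeps single-precision input in single precision. Whether it is actually faster for this workload is untested here; the sketch only checks the output dtype and memory footprint, using an arbitrary array size.

```python
import numpy as np
import scipy.fft

rng = np.random.default_rng(0)
psf32 = rng.random((50, 60)).astype('float32')   # stand-in for one PSF slice
fshape = (128, 128)                              # arbitrary padded shape

# scipy.fft preserves precision: float32 in, complex64 out
f32 = scipy.fft.fft2(psf32, s=fshape)
print(f32.dtype, f32.nbytes)

# Double-precision reference, for comparing the memory footprint
f64 = scipy.fft.fft2(psf32.astype('float64'), s=fshape)
print(f64.dtype, f64.nbytes)
```

This would remove the need for the trailing `.astype('complex64')`, at the cost of changing which FFT backend is used.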

Code now supports a third dimension for the camera images (and object z plane), so that we can implement PIV. At the moment it just iterates - performance should be improved by only calculating FT(H) once.


In [None]:
from __future__ import print_function   # Must be the first statement in the cell

import numpy as np
import numexpr as ne
import scipy.ndimage, scipy.optimize, scipy.io
import scipy.fftpack   # Used explicitly below (scipy.fftpack.fft2)
from scipy.ndimage import convolve
from scipy.signal import convolve2d, fftconvolve
from scipy.optimize import Bounds
import os, sys, time, warnings
import matplotlib.pyplot as plt
%matplotlib inline
import tifffile
import h5py
import multiprocessing
from functools import partial
from joblib import Parallel, delayed
import cProfile, pstats
import glob, csv
from tqdm import tqdm_notebook as tqdm
from numba import jit
sys.path.insert(0, 'py_symmetry')
import py_symmetry as jps
from skimage.transform import PiecewiseAffineTransform, warp

# I don't know if these are necessary, but it has been suggested that low-level threading
# does not interact well with the joblib Parallel feature.
os.environ['MKL_NUM_THREADS'] = '1'
os.environ['OMP_NUM_THREADS'] = '1'
os.environ['MKL_DYNAMIC'] = 'FALSE'

try:
    os.mkdir('perf_diags')
except OSError:
    pass  # Probably the directory already exists

In [None]:
matPath = 'PSFmatrix/PSFmatrix_M22.2NA0.5MLPitch125fml3125from-110to110zspacing4Nnum19lambda520n1.33.mat'

if False:
    warnings.warn('WARNING: Switched to faster matrix for testing')
    matPath = 'PSFmatrix/PSFmatrix_M40NA0.95MLPitch150fml3000from-26to0zspacing2Nnum15lambda520n1.0.mat'
elif True:
    warnings.warn('WARNING: Switched to faster and closer-spaced matrix for testing')
    matPath = 'PSFmatrix/PSFmatrix_M40NA0.95MLPitch150fml3000from-13to0zspacing0.5Nnum15lambda520n1.0.mat'   

In [None]:
mmapPath = os.path.splitext(matPath)[0]
try:
    os.mkdir(mmapPath)
except OSError:
    pass  # Probably the directory already exists

_HPathFormat = mmapPath+'/H{z:02d}.array'
_HtPathFormat = mmapPath+'/Ht{z:02d}.array'
_HReducedShape = []
_HtReducedShape = []
if True:
    # Load the matrices from the .mat file.
    # This is slow since they must be decompressed and are rather large! (9.5GB each, in single-precision FP)
    with h5py.File(matPath, 'r') as f:
        print('Load CAindex')
        sys.stdout.flush()
        _CAindex = f['CAindex'][()].astype('int')
        
        print('Load H')
        sys.stdout.flush()
        _H = f['H'][()].astype('float32')
        Nnum = _H.shape[2]
        aabbRange = int((Nnum+1)/2)
        for cc in tqdm(range(_H.shape[0]), desc='memmap H'):
            HCC =  _H[cc, :aabbRange, :aabbRange, _CAindex[0,cc]-1:_CAindex[1,cc], _CAindex[0,cc]-1:_CAindex[1,cc]]
            _HReducedShape.append(HCC.shape)
            a = np.memmap(_HPathFormat.format(z=cc), dtype='float32', mode='w+', shape=HCC.shape)
            a[:,:,:,:] = HCC[:,:,:,:]
            del a
        #del _H        # H is needed for old code
        
        print('Load Ht')
        sys.stdout.flush()
        _Ht = f['Ht'][()].astype('float32')
        for cc in tqdm(range(_Ht.shape[0]), desc='memmap Ht'):
            HtCC =  _Ht[cc, :aabbRange, :aabbRange, _CAindex[0,cc]-1:_CAindex[1,cc], _CAindex[0,cc]-1:_CAindex[1,cc]]
            _HtReducedShape.append(HtCC.shape)
            a = np.memmap(_HtPathFormat.format(z=cc), dtype='float32', mode='w+', shape=HtCC.shape)
            a[:,:,:,:] = HtCC[:,:,:,:]
            del a
        #del _Ht        # Ht is needed for old code

In [None]:
class HMatrix:
    def __init__(self, HPathFormat, HtPathFormat, HReducedShape, numZ=None, zStart=0):
        self.HPathFormat = HPathFormat
        self.HtPathFormat = HtPathFormat
        self.HReducedShape = HReducedShape   # Same for Ht
        if numZ is not None:
            self.numZ = numZ
        else:
            self.numZ = len(HReducedShape)
        self.zStart = zStart
        
    def Hcc(self, cc, transpose):
        if transpose:
            pathFormat = self.HtPathFormat
        else:
            pathFormat = self.HPathFormat
        result = np.memmap(pathFormat.format(z=cc+self.zStart), dtype='float32', mode='r', shape=self.HReducedShape[cc+self.zStart])
        return result
    
    def IterableBRange(self, cc):
        return range(self.HReducedShape[cc+self.zStart][0])
    
    def PSFShape(self, cc):
        return (self.HReducedShape[cc+self.zStart][2], self.HReducedShape[cc+self.zStart][3])
        
    def Nnum(self, cc):
        return self.HReducedShape[cc+self.zStart][0]*2-1

In [None]:
# Load the input image
LFmovie = tifffile.imread('Data/02_Rectified/exampleData/20131219WORM2_small_full_neg_X1_N15_cropped_uncompressed.tif')
LFmovie = LFmovie.transpose()[np.newaxis,:,:]

LFIMG = LFmovie[0].astype('float32')
if True:
    # Actual (cropped) image loaded from disk
    inputImage = LFIMG
else:
    inputImage = np.tile(LFIMG,(2,2))

## Objects stored in the .mat file

### Optical parameters from GUI: [? means I am not sure if or where it is stored]

M<br>
NA<br>
d    "fml" in GUI (stored here in units of m)<br>
pixelPitch is "ML pitch" / "Nnum" (stored here in units of m)<br>
? n<br>
? wavelength<br>

### User parameters from GUI:

OSR<br>
zspacing<br>
? z-min<br>
? z-max<br>
Nnum<br>


### Misc parameter:

fobj (can presumably be deduced from mag, NA etc?)<br>

### The actual arrays:

H:             shape (56, 19, 19, 343, 343), type "f4"<br>
Ht:            shape (56, 19, 19, 343, 343), type "f4"<br>

### Information about object space:

x1objspace:    x pixel positions in object space (19 elements across one lenslet)<br>
x2objspace:    y pixel positions in object space (19 elements across one lenslet)<br>
x3objspace:    z pixel positions in object space (56 z planes)<br>
x1space:       x pixel positions in lenslet space (19 elements across one lenslet)<br>
x2space:       y pixel positions in lenslet space (19 elements across one lenslet)<br>

### Not sure what these are exactly:

CAindex:       shape (2, 56) - something about the start and end index of the PSF array, for each z plane.<br>
CP:            shape (343, 1)<br>
MLARRAY:       shape (1141, 1141), type "|V16"<br>
objspace:      shape (56, 1, 1)<br>
settingPSF:    You would think this contains the GUI parameters, but e.g. print(f['settingPSF']['M'].value) gives a strange 3x1 array [50, 50, 46, 50] etc...?<br>
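
One possible explanation (an assumption, not checked against the .mat file itself): MATLAB strings saved in HDF5-based .mat files come back as arrays of character codes, and 50, 50, 46, 50 are the ASCII codes for "22.2", which would match the M22.2 in the PSF matrix filename. A quick sketch of decoding such an array:

```python
import numpy as np

# Character codes as they might come back from f['settingPSF']['M']
codes = np.array([50, 50, 46, 50], dtype='uint16')

# Interpret each code as a character and join them into a string
decoded = ''.join(chr(int(c)) for c in codes.ravel())
print(decoded)
```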


In [None]:
# Note: I am a little unsure how to interpret the arrays I have loaded from the .mat.
# From looking at how H and CAindex are accessed, it looks as if the shapes I have loaded
# are the reversal of the shape ordering as expected in Matlab.
# I suppose that makes sense given that matlab is column-major in its array accesses.
# The data has been loaded from disk in the order it is *stored*,
# and I therefore need to flip around all the matlab array index ordering 
# (e.g. matlabArray(1,2,3) becomes pythonArray[3,2,1])
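
The index reversal described above can be checked with a small stand-in array. This sketch uses zero-based indices throughout, and the shape is arbitrary: it lays the data out in memory the way Matlab would (column-major) and then reads that memory back as a row-major array, which is what happens when loading from disk.

```python
import numpy as np

matlab_shape = (2, 3, 4)
m = np.arange(24).reshape(matlab_shape)      # m[i, j, k] plays the role of matlabArray

# Memory order as Matlab stores it (column-major, first index varies fastest)
raw = m.flatten(order='F')

# Reading the same memory as a row-major array gives the reversed shape
p = raw.reshape(matlab_shape[::-1])
print(p.shape)

# Element (i, j, k) in Matlab ordering is element [k, j, i] in Python
i, j, k = 1, 2, 3
print(m[i, j, k] == p[k, j, i])
```

Note that this is the same as a full transpose, i.e. `p` equals `m.transpose()`.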

In [None]:
import resource

def noProgressBar(work, **kwargs):
    # Dummy function to be used in place of tqdm when we don't want to show a progress bar
    return work    

def cpuTime(kind):
    rus = resource.getrusage(resource.RUSAGE_SELF)    
    ruc = resource.getrusage(resource.RUSAGE_CHILDREN)
    if (kind == 'self'):
        return np.array([rus.ru_utime, rus.ru_stime])
    elif (kind == 'children'):
        return np.array([ruc.ru_utime, ruc.ru_stime])
    else:
        return np.array([rus.ru_utime+ruc.ru_utime, rus.ru_stime+ruc.ru_stime])

In [None]:
from scipy._lib._version import NumpyVersion
from numpy.fft import fft, fftn, rfft, rfftn, irfftn
_rfft_mt_safe = (NumpyVersion(np.__version__) >= '1.9.0.dev-e24486e')

def _next_regular(target):
    """
    Find the next regular number greater than or equal to target.
    Regular numbers are composites of the prime factors 2, 3, and 5.
    Also known as 5-smooth numbers or Hamming numbers, these are the optimal
    size for inputs to FFTPACK.

    Target must be a positive integer.
    """
    if target <= 6:
        return target

    # Quickly check if it's already a power of 2
    if not (target & (target-1)):
        return target

    match = float('inf')  # Anything found will be smaller
    p5 = 1
    while p5 < target:
        p35 = p5
        while p35 < target:
            # Ceiling integer division, avoiding conversion to float
            # (quotient = ceil(target / p35))
            quotient = -(-target // p35)

            # Quickly find next power of 2 >= quotient
            try:
                p2 = 2**((quotient - 1).bit_length())
            except AttributeError:
                # Fallback for Python <2.7
                p2 = 2**(len(bin(quotient - 1)) - 2)

            N = p2 * p35
            if N == target:
                return N
            elif N < match:
                match = N
            p35 *= 3
            if p35 == target:
                return p35
        if p35 < match:
            match = p35
        p5 *= 5
        if p5 == target:
            return p5
    if p5 < match:
        match = p5
    return match

def _centered(arr, newsize):
    # Return the center newsize portion of the array.
    currsize = np.array(arr.shape)
    newsize = np.asarray(newsize)
    if (len(currsize) > len(newsize)):
        newsize = np.append([currsize[0]], newsize)
    startind = (currsize - newsize) // 2
    endind = startind + newsize
    myslice = [slice(startind[k], endind[k]) for k in range(len(endind))]
    return arr[tuple(myslice)]

def tempMul(bb,fshape,result):
    result *= np.exp(-1j * bb * 2*np.pi / fshape[0] * np.arange(result.shape[0],dtype='complex64'))[:,np.newaxis]
    return result

def expand2(result, bb, aa, Nnum, fshape):
    return np.tile(result, (Nnum,1))

def expand(reducedF, bb, aa, Nnum, fshape):
    result = np.tile(reducedF, (1,int(Nnum/2+1)))
    result = result[:,:int(fshape[1]/2+1)]
    result *= np.exp(-1j * aa * 2*np.pi / fshape[1] * np.arange(result.shape[1],dtype='complex64'))
    result = expand2(result, bb, aa, Nnum, fshape)
    return tempMul(bb,fshape,result)


def special_rfftn(in1, bb, aa, Nnum, fshape):
    # Compute the fft of elements in1[bb::Nnum,aa::Nnum], after in1 has been zero-padded out to fshape
    # We exploit the fact that fft(masked-in1) is fft(arr[::Nnum,::Nnum]) replicated Nnum times.
    reducedShape = ()
    for d in fshape:
        assert((d % Nnum) == 0)
        reducedShape = reducedShape + (int(d/Nnum),)
        
    assert(in1.ndim == 2)
    reduced = in1[bb::Nnum,aa::Nnum]

    # Compute an array giving rfft(mask(in1))
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        reducedF = scipy.fftpack.fft2(reduced, reducedShape).astype('complex64')
    return expand(reducedF, bb, aa, Nnum, fshape)

def convolutionShape(in1, in2, Nnum):
    # Logic copied from fftconvolve source code
    s1 = np.array(in1.shape)
    s2 = np.array(in2.shape)
    if (len(s1) == 3):   # Cope with case where we are processing multiple reconstructions in parallel
        s1 = s1[1:]
    shape = s1 + s2 - 1
    if False:
        # TODO: I haven't worked out if/how I can do this yet.
        # This is the original code in fftconvolve, which says:
        # Speed up FFT by padding to optimal size for FFTPACK
        fshape = [_next_regular(int(d)) for d in shape]
    else:
        fshape = [int(np.ceil(d/float(Nnum)))*Nnum for d in shape]
    fslice = tuple([slice(0, int(sz)) for sz in shape])
    return (fshape, fslice, s1)
    
def special_fftconvolve_part1(in1, bb, aa, Nnum, in2):
    assert(len(in1.shape) == 2)
    assert(len(in2.shape) == 2)
    (fshape, fslice, s1) = convolutionShape(in1, in2, Nnum)
    # Pre-1.9 NumPy FFT routines are not threadsafe - this code requires numpy 1.9 or greater
    assert(_rfft_mt_safe)
    fa = special_rfftn(in1, bb, aa, Nnum, fshape)
    return (fa, fshape, fslice, s1)

def special_fftconvolve_part3b(fab, fshape, fslice, s1):
    assert(len(fab.shape) == 2)
    ret = irfftn(fab, fshape)[fslice].copy()
    return _centered(ret, s1)

def special_fftconvolve_part3(fab, fshape, fslice, s1):
    if (len(fab.shape) == 2):
        return special_fftconvolve_part3b(fab, fshape, fslice, s1)
    else:
        results = []
        for n in range(fab.shape[0]):
            results.append(special_fftconvolve_part3(fab[n], fshape, fslice, s1))
        return np.array(results)

def special_fftconvolve(in1, bb, aa, Nnum, in2, accum, fb=None):
    '''
    in1 consists of subapertures of size Nnum x Nnum pixels.
    We are being asked to convolve only pixel (bb,aa) within each subaperture, i.e.
        tempSlice = np.zeros(in1.shape, dtype=in1.dtype)
        tempSlice[bb::Nnum, aa::Nnum] = in1[bb::Nnum, aa::Nnum]
    This allows us to take a significant shortcut in computing the FFT for in1.
    '''
    (fa, fshape, fslice, s1) = special_fftconvolve_part1(in1, bb, aa, Nnum, in2)
    if fb is None:
        fb = rfftn(in2, fshape)
    if accum is None:
        accum = fa*fb
    else:
        accum += fa*fb
    return (accum, fshape, fslice, s1)
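
The shortcut in special_rfftn rests on an identity: if only the pixels at offset (bb, aa) within each Nnum x Nnum block are nonzero, the full FFT is the FFT of the subsampled array tiled Nnum times along each axis, multiplied by a linear phase ramp. A minimal sketch that checks this against a direct FFT (sizes and offsets are arbitrary, and the full complex FFT is used rather than the half-size rfft layout, to keep it short):

```python
import numpy as np

rng = np.random.default_rng(0)
Nnum = 5
shape = (20, 30)                 # both dimensions are multiples of Nnum
bb, aa = 1, 3                    # offset of the selected pixel within each block

img = rng.random(shape)

# Direct route: zero everything except pixel (bb, aa) in each block, then FFT
masked = np.zeros_like(img)
masked[bb::Nnum, aa::Nnum] = img[bb::Nnum, aa::Nnum]
direct = np.fft.fft2(masked)

# Shortcut: small FFT of the subsampled array, tiled, times a phase ramp
reduced_f = np.fft.fft2(img[bb::Nnum, aa::Nnum])
tiled = np.tile(reduced_f, (Nnum, Nnum))
ky = np.arange(shape[0])[:, np.newaxis]
kx = np.arange(shape[1])[np.newaxis, :]
phase = np.exp(-2j * np.pi * (bb * ky / shape[0] + aa * kx / shape[1]))
shortcut = tiled * phase

max_err = np.max(np.abs(direct - shortcut))
print('max difference:', max_err)
```

The small FFT works on an array Nnum*Nnum times smaller than the padded image, which is where the saving comes from.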

In [None]:
def forwardProjectForZ_old(HCC, realspaceCC):
    singleJob = (len(realspaceCC.shape) == 2)
    if singleJob:   # Cope with both a single 2D plane and an array of multiple 2D planes to process independently
        realspaceCC = realspaceCC[np.newaxis,:,:]
    # Iterate over each lenslet pixel
    Nnum = HCC.shape[1]
    TOTALprojection = np.zeros(realspaceCC.shape, dtype='float32')
    for bb in tqdm(range(Nnum), leave=False, desc='Forward-project - y'):
        for aa in tqdm(range(Nnum), leave=False, desc='Forward-project - x'):
            # Extract the part of H that represents this lenslet pixel
            Hs = HCC[bb, aa]
            for n in range(realspaceCC.shape[0]):
                # Create a workspace representing just the voxels cc,bb,aa behind each lenslet (the rest is 0)
                tempspace = np.zeros((realspaceCC[n].shape[0], realspaceCC[n].shape[1]), dtype='float32')
                tempspace[bb::Nnum, aa::Nnum] = realspaceCC[n, bb::Nnum, aa::Nnum]  # ???? what to do about index ordering?
                # Compute how those voxels project onto the sensor, and accumulate
                TOTALprojection[n] += fftconvolve(tempspace, Hs, 'same')
    if singleJob:
        return TOTALprojection[0]
    else:
        return TOTALprojection
    
def backwardProjectForZ_old(HtCC, projection):
    singleJob = (len(projection.shape) == 2)
    if singleJob:   # Cope with both a single 2D plane and an array of multiple 2D planes to process independently
        projection = projection[np.newaxis,:,:]
    # Iterate over each lenslet pixel
    Nnum = HtCC.shape[1]
    tempSliceBack = np.zeros(projection.shape, dtype='float32')        
    for aa in tqdm(range(Nnum), leave=False, desc='y'):
        for bb in range(Nnum):
            # Extract the part of Ht that represents this lenslet pixel
            Hts = HtCC[bb, aa]
            for n in range(projection.shape[0]):
                # Create a workspace representing just the voxels cc,bb,aa behind each lenslet (the rest is 0)
                tempSlice = np.zeros(projection[n].shape, dtype='float32')
                tempSlice[bb::Nnum, aa::Nnum] = projection[n, bb::Nnum, aa::Nnum]
                # Compute how those voxels back-project from the sensor
                tempSliceBack[n] += fftconvolve(tempSlice, Hts, 'same')
    if singleJob:
        return tempSliceBack[0]
    else:
        return tempSliceBack

def backwardProjectACC_original(Ht, projection, CAindex, planes=None):
    Backprojection = np.zeros((Ht.shape[0], projection.shape[0], projection.shape[1]), dtype='float32')
    # Iterate over each z plane
    if planes is None:
        planes = range(Ht.shape[0])
    for cc in tqdm(planes, desc='Back-project - z'):
        HtCC =  Ht[cc, :, :, CAindex[0,cc]-1:CAindex[1,cc], CAindex[0,cc]-1:CAindex[1,cc]]
        Backprojection[cc] = backwardProjectForZ_old(HtCC, projection)
    return Backprojection

In [None]:
def deconvRL(hMatrix, Htf, maxIter, Xguess, logPrint=True):
    # Note:
    #  Htf is the *initial* backprojection of the camera image
    #  Xguess is the initial guess for the object
    for i in tqdm(range(maxIter), desc='RL deconv'):
        t0 = time.time()
        HXguess = forwardProjectACC(hMatrix, Xguess, logPrint=logPrint)
        HXguessBack = backwardProjectACC(hMatrix, HXguess, logPrint=logPrint)
        errorBack = Htf / HXguessBack
        Xguess = Xguess * errorBack
        Xguess[np.where(np.isnan(Xguess))] = 0
        ttime = time.time() - t0
        print('iter %d | %d, took %.1f secs. Max val %f' % (i+1, maxIter, ttime, np.max(Xguess)))
    return Xguess
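
To see the update rule in deconvRL in isolation, here is a toy version in which a small dense matrix stands in for the forward projection and its transpose for the back-projection. Everything here (the matrix, sizes, iteration count and the function name) is made up for illustration; it is a sketch of the same multiplicative update, not of the real projector.

```python
import numpy as np

def multiplicative_deconv(H, f, n_iter=200):
    # Htf plays the role of the initial back-projection of the camera image
    Htf = H.T.dot(f)
    x = Htf.copy()                      # initial guess, as in the notebook
    for _ in range(n_iter):
        HtHx = H.T.dot(H.dot(x))        # forward-project, then back-project
        # Guard against division by zero instead of patching NaNs afterwards
        ratio = np.divide(Htf, HtHx, out=np.zeros_like(Htf), where=HtHx > 0)
        x = x * ratio
    return x

rng = np.random.default_rng(0)
H = rng.random((12, 6))                 # non-negative toy system matrix
x_true = rng.random(6) + 0.1
f = H.dot(x_true)                       # noise-free measurement

x_est = multiplicative_deconv(H, f)
residual_start = np.linalg.norm(H.dot(H.T.dot(f)) - f)
residual_end = np.linalg.norm(H.dot(x_est) - f)
print(residual_start, residual_end)
```

Because every factor in the update is non-negative, the estimate stays non-negative throughout.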

In [None]:
# Note: H.shape in python is (<num z planes>, Nnum, Nnum, <psf size>, <psf size>),
#                       e.g. (56, 19, 19, 343, 343)

class Projector(object):
    # Note: the variable names in this class mostly imply we are doing the back-projection
    # (e.g. Ht, 'projection', etc.). However, the same code also does forward-projection!
    def __init__(self, projection, HtCCBB, Nnum):
        # Note: H and Hts are not stored as class variables.
        # I had a lot of trouble with them and multithreading,
        # and eventually settled on having them in shared memory.
        # As I encapsulate more stuff in this class, I could bring them back as class variables...

        self.cpuTime = np.zeros(2)
        
        # Nnum: number of pixels across a lenslet array (after rectification)
        self.Nnum = Nnum
        
        # This next chunk of logic copied from fftconvolve source code.
        # s1, s2: shapes of the input arrays
        # fshape: shape of the (full, possibly padded) result array in Fourier space
        # fslice: slicing tuple specifying the actual result size that should be returned
        self.s1 = np.array(projection.shape)
        self.s2 = np.array(HtCCBB[0].shape)
        shape = self.s1 + self.s2 - 1
        if False:
            # TODO: I haven't worked out if/how I can do this yet.
            # This is the original code in fftconvolve, which says:
            # Speed up FFT by padding to optimal size for FFTPACK
            self.fshape = [_next_regular(int(d)) for d in shape]
        else:
            self.fshape = [int(np.ceil(d/float(Nnum)))*Nnum for d in shape]
        self.fslice = tuple([slice(0, int(sz)) for sz in shape])
        
        # rfslice: slicing tuple to crop down full fft array to the shape that would be output from rfftn
        self.rfslice = (slice(0,self.fshape[0]), slice(0,int(self.fshape[1]/2)+1))
        return
    
    def MirrorXArray(self, Hts, fHtsFull):
        padLength = self.fshape[0] - Hts.shape[0]
        if False:
            fHtsFull = fHtsFull.conj() * np.exp((1j * (1+padLength) * 2*np.pi / self.fshape[0]) * np.arange(self.fshape[0],dtype='complex64')[:,np.newaxis])
            fHtsFull[:,1::] = fHtsFull[:,1::][:,::-1]
            return fHtsFull
        else:
            temp = np.exp((1j * (1+padLength) * 2*np.pi / self.fshape[0]) * np.arange(self.fshape[0])).astype('complex64')
            if True:
                result = jps.mirrorX(fHtsFull, temp)
            else:
                result = np.empty(fHtsFull.shape, dtype=fHtsFull.dtype)
                result[:,0] = fHtsFull[:,0].conj()*temp
                for i in range(1,fHtsFull.shape[1]):
                    result[:,i] = (fHtsFull[:,fHtsFull.shape[1]-i].conj()*temp)
            return result

    def MirrorYArray(self, Hts, fHtsFull):
        padLength = self.fshape[1] - Hts.shape[1]
        if False:
            fHtsFull = fHtsFull.conj() * np.exp(1j * (1+padLength) * 2*np.pi / self.fshape[1] * np.arange(self.fshape[1],dtype='complex64'))
            fHtsFull[1::] = fHtsFull[1::][::-1]
            return fHtsFull
        else:
            temp = np.exp((1j * (1+padLength) * 2*np.pi / self.fshape[1]) * np.arange(self.fshape[1])).astype('complex64')
            if True:
                result = jps.mirrorY(fHtsFull, temp)
            else:
                result = np.empty(fHtsFull.shape, dtype=fHtsFull.dtype)
                result[0] = fHtsFull[0].conj()*temp
                for i in range(1,fHtsFull.shape[0]):
                    result[i] = (fHtsFull[fHtsFull.shape[0]-i].conj()*temp)
            return result
        
    def convolvePart3(self, projection, bb, aa, Hts, fHtsFull, mirrorX, accum):
        # TODO: to make this work, I need the full matrix for fHts and then I need to slice it 
        # to the correct shape when I call through to special_fftconvolve here. Is fshape what I need?
        cpu0 = cpuTime('both')
        (accum,_,_,_) = special_fftconvolve(projection,bb,aa,self.Nnum,Hts,accum,fb=fHtsFull[self.rfslice])
        self.cpuTime += cpuTime('both')-cpu0
        if mirrorX:
            fHtsFull = self.MirrorXArray(Hts, fHtsFull)
            cpu0 = cpuTime('both')
            (accum,_,_,_) = special_fftconvolve(projection,self.Nnum-bb-1,aa,self.Nnum,Hts[::-1,:],accum,fb=fHtsFull[self.rfslice]) 
            self.cpuTime += cpuTime('both')-cpu0
        return accum

    def convolvePart2(self, projection, bb, aa, Hts, fHtsFull, mirrorY, mirrorX, accum):
        accum = self.convolvePart3(projection,bb,aa,Hts,fHtsFull,mirrorX,accum)
        if mirrorY:
            fHtsFull = self.MirrorYArray(Hts, fHtsFull)
            accum = self.convolvePart3(projection,bb,self.Nnum-aa-1,Hts[:,::-1],fHtsFull,mirrorX,accum)
        return accum

    def fft2(self, mat, shape):
        # Perform a 'float' FFT on the matrix we are passed.
        # It would probably be faster if there was a way to perform the FFT natively on the 'float' type,
        # but scipy does not seem to support that option
        #
        # With my Mac Pro install, we hit a FutureWarning within scipy.
        # This wrapper just suppresses that warning.
        with warnings.catch_warnings():
            warnings.simplefilter("ignore")
            return scipy.fftpack.fft2(mat, shape).astype('complex64')       
        
    def convolve(self, projection, bb, aa, Hts, accum):
        cent = int(self.Nnum/2)

        mirrorX = (bb != cent)
        mirrorY = (aa != cent)
        transpose = ((aa != bb) and (aa != (self.Nnum-bb-1)))
            
        # TODO: it would speed things up if I could avoid computing the full fft for Hts.
        # However, it's not immediately clear to me how to fill out the full fftn array from rfftn
        # in the case of a 2D transform.
        # For 1D it's the reversed conjugate, but for 2D it's more complicated than that.
        # It's possible that it's actually nontrivial, in spite of the fact that
        # you can get away without it when only computing fft/ifft for real arrays.
        fHtsFull = self.fft2(Hts, self.fshape)
        accum = self.convolvePart2(projection,bb,aa,Hts,fHtsFull,mirrorY,mirrorX, accum)
        if transpose:
            if (self.fshape[0] == self.fshape[1]):
                # For a square array, the FFT of the transpose is just the transpose of the FFT.
                # The copy() is because my C code currently can't cope with
                # a transposed array (non-contiguous strides in x)
                fHtsFull = fHtsFull.transpose().copy()    
            else:
                # For a non-square array, we have to compute the FFT for the transpose.
                fHtsFull = self.fft2(Hts.transpose(), self.fshape)

            # Note that mx,my need to be swapped following the transpose
            accum = self.convolvePart2(projection,aa,bb,Hts.transpose(),fHtsFull,mirrorX,mirrorY, accum) 
        return accum
    
def _projectForZY(cc, bb, source, hMatrix, backwards, Hccbb=None):
    f = open('perf_diags/%d_%d.txt'%(cc,bb), "w")
    t1 = time.time()
    singleJob = (len(source.shape) == 2)
    if singleJob:   # Cope with both a single 2D plane and an array of multiple 2D planes to process independently
        source = source[np.newaxis,:,:]
    result = [None] * source.shape[0]
    if Hccbb is None:
        Hccbb = hMatrix.Hcc(cc, transpose=backwards)[bb]
        Nnum = hMatrix.Nnum(cc)
    else:
        Nnum = Hccbb.shape[0]
    projector = Projector(source[0], Hccbb, Nnum)
    projector.cpuTime = np.zeros(2)
    for aa in range(bb,int((Nnum+1)/2)):
        for n in range(source.shape[0]):
            result[n] = projector.convolve(source[n], bb, aa, Hccbb[aa], result[n])
    t2 = time.time()
    f.write('%d\t%f\t%f\t%f\t%f\t%f\n' % (os.getpid(), t1, t2, t2-t1, projector.cpuTime[0], projector.cpuTime[1]))
    f.close()
    if singleJob:
        return (result[0], cc, bb, t2-t1)
    else:
        return (np.array(result), cc, bb, t2-t1)
    
def projectForZ(Hcc, cc, source):
    result = None
    Nnum = Hcc.shape[1]
    r = range(int((Nnum+1)/2))
    for bb in tqdm(r, leave=False, desc='Project - y'):
        (thisResult, _, _, _) = _projectForZY(cc, bb, source, None, False, Hcc[bb])
        if (result is None):
            result = thisResult
        else:
            result += thisResult
    # Actually, for forward projection we don't need to do this separately for every z,
    # but it's easier to do it for symmetry (and this function is not used in performance-critical code anyway)
    (fshape, fslice, s1) = convolutionShape(source, Hcc[0,0], Nnum)
    return special_fftconvolve_part3(result, fshape, fslice, s1)

def projectForZ2(hMatrix, backwards, cc, source):
    result = None
    for bb in tqdm(hMatrix.IterableBRange(cc), leave=False, desc='Project - y'):
        (thisResult, _, _, _) = _projectForZY(cc, bb, source, hMatrix, backwards)
        if (result is None):
            result = thisResult
        else:
            result += thisResult
    # Actually, for forward projection we don't need to do this separately for every z,
    # but it's easier to do it for symmetry (and this function is not used in performance-critical code anyway)
    (fshape, fslice, s1) = convolutionShape(source, np.empty(hMatrix.PSFShape(cc)), hMatrix.Nnum(cc))
    return special_fftconvolve_part3(result, fshape, fslice, s1)
    
# Test the backprojection code against a slower definitive version
# (this code is here for now because this is where I have been working on stuff, but it could move)
# TODO: would be a better test if I use the hMatrix form of projectForZ
testHtCC = np.random.random((5,5,30,30)).astype(np.float32)
testHtCC = _Ht[13,int(_Ht.shape[1]/2)-2:int(_Ht.shape[1]/2)+3,int(_Ht.shape[2]/2)-2:int(_Ht.shape[2]/2)+3,_CAindex[0,13]-1:_CAindex[1,13], _CAindex[0,13]-1:_CAindex[1,13]]
for fd in [False, True]:
    for shape in [(200,200), (200,300), (300,200)]:
        # Test both square and non-square, since they use different code
        testProjection = np.random.random(shape).astype(np.float32)
        if fd:
            testResultOld = forwardProjectForZ_old(testHtCC, testProjection)
            testResultNew = projectForZ(testHtCC, 0, testProjection)
        else:
            testResultOld = backwardProjectForZ_old(testHtCC, testProjection)
            testResultNew = projectForZ(testHtCC, 0, testProjection)
        comparison = np.max(np.abs(testResultOld - testResultNew))
        print('test result (should be <<1): %e' % comparison)
        if (comparison > 1e-4):
            print(" -> WARNING: disagreement detected")
        else:
            print(" -> OK")
        
print('Done')
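
The mirroring in MirrorXArray can be checked independently of the C code. For a real array zero-padded to a larger FFT size, reversing it along axis 0 corresponds in Fourier space to taking the conjugate, reversing the frequency index along the other axis, and applying a phase ramp that depends on the pad length. A minimal sketch with arbitrary sizes, using plain NumPy in place of jps.mirrorX:

```python
import numpy as np

rng = np.random.default_rng(1)
psf = rng.random((7, 9))          # stand-in for one real-valued PSF slice
fshape = (12, 15)                 # padded FFT shape

f_full = np.fft.fft2(psf, fshape)

# Direct route: FFT of the array reversed along axis 0
direct = np.fft.fft2(psf[::-1, :], fshape)

# Shortcut: conjugate, negate the frequency index on axis 1, apply phase ramp
pad_length = fshape[0] - psf.shape[0]
phase = np.exp(2j * np.pi * (1 + pad_length) * np.arange(fshape[0]) / fshape[0])
neg_l = (-np.arange(fshape[1])) % fshape[1]      # index of frequency -l
mirrored = np.conj(f_full[:, neg_l]) * phase[:, np.newaxis]

mirror_err = np.max(np.abs(direct - mirrored))
print('max difference:', mirror_err)
```

The same argument applies to the other axis, with the roles of the two axes swapped.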

In [None]:
# Slower test that exercises the projection code with the new HMatrix object
if False:
    testHCC = _H[13]
    testHtCC = _Ht[13]
    testHMatrix = HMatrix(_HPathFormat, _HtPathFormat, _HReducedShape, numZ=1, zStart=13)
    for fd in [False, True]:
        for shape in [(200,200), (200,300), (300,200)]:
            # Test both square and non-square, since they use different code
            testProjection = np.random.random(shape).astype(np.float32)
            if fd:
                testResultOld = forwardProjectForZ_old(testHCC, testProjection)
                testResultNew = projectForZ2(testHMatrix, False, 0, testProjection)
            else:
                testResultOld = backwardProjectForZ_old(testHtCC, testProjection)
                testResultNew = projectForZ2(testHMatrix, True, 0, testProjection)
            comparison = np.max(np.abs(testResultOld - testResultNew))
            print('test result (fd=%d) (should be <<1): %e' % (fd, comparison))
            if (comparison > 1e-4):
                print(" -> WARNING: disagreement detected")
            else:
                print(" -> OK")

    print('Done')
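
As a sanity check on the symmetry bookkeeping, the following sketch replays the (bb, aa) loop of _projectForZY together with the mirror/transpose logic of Projector.convolve, but only counts which lenslet pixels get visited instead of doing any FFTs. The function name is made up, and it assumes an odd Nnum as in the matrices used here.

```python
import numpy as np

def visited_pixels(Nnum):
    counts = np.zeros((Nnum, Nnum), dtype=int)
    cent = Nnum // 2
    half = (Nnum + 1) // 2

    def part3(bb, aa, mirror_x):
        counts[bb, aa] += 1
        if mirror_x:
            counts[Nnum - bb - 1, aa] += 1

    def part2(bb, aa, mirror_y, mirror_x):
        part3(bb, aa, mirror_x)
        if mirror_y:
            part3(bb, Nnum - aa - 1, mirror_x)

    n_base = 0   # number of (bb, aa) pairs that need their own PSF FFT
    for bb in range(half):
        for aa in range(bb, half):
            n_base += 1
            mirror_x = (bb != cent)
            mirror_y = (aa != cent)
            transpose = (aa != bb) and (aa != Nnum - bb - 1)
            part2(bb, aa, mirror_y, mirror_x)
            if transpose:
                # Mirror flags are swapped after the transpose
                part2(aa, bb, mirror_x, mirror_y)
    return counts, n_base

counts, n_base = visited_pixels(19)
print(counts.min(), counts.max(), n_base)
```

Every pixel should be visited exactly once. The base pair count is the best case for square padded arrays, since non-square arrays need one extra FFT for each transposed pair.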

In [None]:
def backwardProjectACC(hMatrix, projection, planes=None, numjobs=multiprocessing.cpu_count(), progress=tqdm, logPrint=True):
    singleJob = (len(projection.shape) == 2)
    if singleJob:   # Cope with both a single 2D plane and an array of multiple 2D planes to process independently
        projection = projection[np.newaxis,:,:]
    if planes is None:
        planes = range(hMatrix.numZ)
    if progress is None:
        progress = noProgressBar        

    ru1 = cpuTime('both')

    Backprojection = np.zeros((hMatrix.numZ, projection.shape[0], projection.shape[1], projection.shape[2]), dtype='float32')
        
    # Set up the work to iterate over each z plane
    work = []
    for cc in planes:
        for bb in hMatrix.IterableBRange(cc):
            work.append((cc, bb, projection, hMatrix, True))

    # Run the multithreaded work
    t0 = time.time()
    results = Parallel(n_jobs=numjobs)\
            (delayed(_projectForZY)(*args) for args in progress(work, desc='Back-project - z', leave=False))
    ru2 = cpuTime('both')

    # Gather together and sum the results for each z plane
    t1 = time.time()
    fourierZPlanes = [None]*hMatrix.numZ
    elapsedTime = 0
    for (result, cc, bb, t) in results:
        elapsedTime += t
        if fourierZPlanes[cc] is None:
            fourierZPlanes[cc] = result
        else:
            fourierZPlanes[cc] += result
    
    # Compute the FFT for each z plane
    for cc in planes:
        # A bit complicated here to set up the correct inputs for convolutionShape...
        (fshape, fslice, s1) = convolutionShape(projection, np.empty(hMatrix.PSFShape(cc)), hMatrix.Nnum(cc))
        Backprojection[cc] = special_fftconvolve_part3(fourierZPlanes[cc], fshape, fslice, s1)        
    t2 = time.time()

    # Save some diagnostics
    if logPrint:
        print('work elapsed wallclock time %f'%(t1-t0))
        print('work elapsed thread time %f'%elapsedTime)
        print('work delta rusage:', ru2-ru1)
        print('FFTs took %f'%(t2-t1))
    
    with open('overall.txt', 'w') as f:
        f.write('%f\t%f\t%f\t%f\t%f\t%f\n' % (t0, t1, t1-t0, t2-t1, (ru2-ru1)[0], (ru2-ru1)[1]))

    if singleJob:
        return Backprojection[:,0]
    else:
        return Backprojection

def forwardProjectACC(hMatrix, realspace, planes=None, numjobs=multiprocessing.cpu_count(), progress=tqdm, logPrint=True):
    singleJob = (len(realspace.shape) == 3)
    if singleJob:   # Cope with both a single 2D plane and an array of multiple 2D planes to process independently
        realspace = realspace[:,np.newaxis,:,:]
    if planes is None:
        planes = range(hMatrix.numZ)
    if progress is None:
        progress = noProgressBar        

    # Set up the work to iterate over each z plane
    work = []
    for cc in planes:
        for bb in hMatrix.IterableBRange(cc):
            work.append((cc, bb, realspace[cc], hMatrix, False))

    # Run the multithreaded work
    t0 = time.time()
    results = Parallel(n_jobs=numjobs)\
                (delayed(_projectForZY)(*args) for args in progress(work, desc='Forward-project - z', leave=False))

    # Gather together and sum all the results
    t1 = time.time()
    fourierProjection = [None]*hMatrix.numZ
    elapsedTime = 0
    for (result, cc, bb, t) in results:
        elapsedTime += t
        if fourierProjection[cc] is None:
            fourierProjection[cc] = result
        else:
            fourierProjection[cc] += result

    # Compute and accumulate the FFT for each z plane
    TOTALprojection = None
    for cc in planes:
        # A bit complicated here to set up the correct inputs for convolutionShape...
        (fshape, fslice, s1) = convolutionShape(realspace[cc], np.empty(hMatrix.PSFShape(cc)), hMatrix.Nnum(cc))
        thisProjection = special_fftconvolve_part3(fourierProjection[cc], fshape, fslice, s1)        
        if TOTALprojection is None:
            TOTALprojection = thisProjection
        else:
            TOTALprojection += thisProjection
    t2 = time.time()
            
    # Print out some diagnostics
    if (logPrint):
        print('work elapsed wallclock time %f'%(t1-t0))
        print('work elapsed thread time %f'%elapsedTime)
        print('FFTs took %f'%(t2-t1))
        
    if singleJob:
        return TOTALprojection[0]
    else:
        return TOTALprojection

if False:
    # Temporary call to test parallelization
    temp = backwardProjectACC(hMatrix, inputImage, planes=[0], numjobs=3)
    
if False:
    # Temporary code to test running with an image pair
    # This is maybe not a comprehensive test, but it runs with two different (albeit proportional)
    # images and checks that the result matches the result for two totally independent calls on a single array.
    hMatrix = HMatrix(_HPathFormat, _HtPathFormat, _HReducedShape)
    candidate = np.tile(inputImage[np.newaxis,0,0], (2,1,1))
    candidate[1] *= 1.4
    temp = backwardProjectACC(hMatrix, candidate, planes=None, numjobs=1)
    dualRoundtrip = forwardProjectACC(hMatrix, temp, planes=None)

    temp = backwardProjectACC(hMatrix, candidate[0], planes=None, numjobs=1)
    firstRoundtrip = forwardProjectACC(hMatrix, temp, planes=None, numjobs=1)    
    comparison = np.max(np.abs(firstRoundtrip - dualRoundtrip[0]))
    print('test result (should be <<1): %e' % comparison)
    if (comparison > 1e-6):
        print(" -> WARNING: disagreement detected")
    else:
        print(" -> OK")
    
    temp = backwardProjectACC(hMatrix, candidate[1], planes=None, numjobs=1)
    secondRoundtrip = forwardProjectACC(hMatrix, temp, planes=None, numjobs=1)    
    comparison = np.max(np.abs(secondRoundtrip - dualRoundtrip[1]))
    print('test result (should be <<1): %e' % comparison)
    if (comparison > 1e-6):
        print(" -> WARNING: disagreement detected")
    else:
        print(" -> OK")

In [None]:
def AnalyzeTestResults():
    with open('overall.txt') as f:
        csv_reader = csv.reader(f, delimiter='\t')
        for row in csv_reader:
            pass
    startTime = float(row[0])
    endTime = float(row[1])
    userTime = float(row[4])
    sysTime = float(row[5])

    rows = []
    for fn in glob.glob('perf_diags/*_*.txt'):
        with open(fn) as f:
            csv_reader = csv.reader(f, delimiter='\t')
            for row in csv_reader:
                pass
            rows.append(row)
    rows = np.array(rows).astype('float').transpose()
    firstPid = np.min(rows[0])
    rows[0] -= firstPid
    rows[1:3] -= startTime
    rows = rows[:,np.argsort(rows[1],kind='mergesort')]
    rows = rows[:,rows[0].argsort(kind='mergesort')]

    deadTimeStart = 0
    deadTimeMid = 0
    deadTimeEnd = 0
    threadWorkTime = 0
    thisThreadStartTime = 0
    longestThreadRunTime = 0
    longestThreadRunPid = -1
    latestStartTime = 0
    userTimeBreakdown = 0
    sysTimeBreakdown = 0
    for i in range(rows.shape[1]):
        pid = rows[0,i]
        t0 = rows[1,i]
        t1 = rows[2,i]
        userTimeBreakdown += rows[4,i]
        sysTimeBreakdown += rows[5,i]
        if (i == 0):
            deadTimeStart += t0
            thisThreadStartTime = t0
            latestStartTime = t0
        else:
            if (pid == rows[0,i-1]):
                deadTimeMid += t0 - rows[2,i-1]
            else:
                latestStartTime = max(latestStartTime, t0)
                thisThreadRunTime = rows[2,i-1]-thisThreadStartTime  # For previous pid
                if (thisThreadRunTime > longestThreadRunTime):
                    longestThreadRunPid = rows[0,i-1]
                    longestThreadRunTime = thisThreadRunTime
                thisThreadStartTime = t0
                deadTimeStart += t0
                deadTimeEnd += (endTime-startTime) - rows[2,i-1]
        threadWorkTime += t1-t0
        plt.plot([t0, t1], [pid, pid])
        plt.plot(t0, pid, 'x')
    thisThreadRunTime = t1-thisThreadStartTime
    if (thisThreadRunTime > longestThreadRunTime):
        longestThreadRunPid = pid
        longestThreadRunTime = thisThreadRunTime
    deadTimeEnd += (endTime-startTime) - rows[2,-1]
    print('Elapsed time', endTime-startTime)
    print('Longest thread run time', longestThreadRunTime, 'pid', int(longestThreadRunPid))
    print('Latest start time', latestStartTime)
    print('Thread work time', threadWorkTime)
    print('Dead time', deadTimeStart, deadTimeMid, deadTimeEnd)
    print(' Total', deadTimeStart + deadTimeMid + deadTimeEnd)
    print('User cpu time', userTime)
    print('System cpu time', sysTime)
    print('User cpu time for subset', userTimeBreakdown)
    print('System cpu time for subset', sysTimeBreakdown)

    with open('stats.txt', 'a') as f:
        f.write('%f\t%f\t%f\t%f\t%f\t%f\t%f\t%f\t%f\t%f\n' % (numJobsForTesting, endTime-startTime, threadWorkTime, \
                        longestThreadRunTime, latestStartTime, deadTimeStart, deadTimeMid, deadTimeEnd, userTime, sysTime))

    plt.xlim(0, endTime-startTime)
    plt.ylim(-0.5,np.max(rows[0])+0.5)
    plt.show()
    
if False:
    for numJobsForTesting in range(1,13):
        ru1 = cpuTime('both')
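        # backwardProjectACC now takes an HMatrix object rather than the separate Ht arguments
        temp = backwardProjectACC(HMatrix(_HPathFormat, _HtPathFormat, _HReducedShape), inputImage, numjobs=numJobsForTesting, planes=None)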
        ru2 = cpuTime('both')
        print('overall delta rusage:', ru2-ru1)
        AnalyzeTestResults()
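
The dead-time bookkeeping in `AnalyzeTestResults` is interleaved with plotting and the longest-thread tracking, so here is a minimal sketch of just that part, under some assumptions. `dead_time_breakdown` is a hypothetical helper (not part of the code above), and it assumes each work item has already been reduced to a `(pid, t0, t1)` tuple with times measured relative to the start of the run.

```python
from collections import defaultdict

def dead_time_breakdown(records, elapsed):
    """Hypothetical helper: records is an iterable of (pid, t0, t1) tuples,
    with times relative to the start of the run; elapsed is the total run time."""
    by_pid = defaultdict(list)
    for pid, t0, t1 in records:
        by_pid[pid].append((t0, t1))
    start = mid = end = 0.0
    for tasks in by_pid.values():
        tasks.sort()
        start += tasks[0][0]                # idle before this worker's first task
        end += elapsed - tasks[-1][1]       # idle after this worker's last task
        for (_, prev_end), (next_start, _) in zip(tasks, tasks[1:]):
            mid += next_start - prev_end    # idle between consecutive tasks
    return start, mid, end

# Made-up example: worker 0 runs two tasks, worker 1 runs one long task
example = [(0, 1.0, 3.0), (0, 4.0, 6.0), (1, 2.0, 9.0)]
print(dead_time_breakdown(example, elapsed=10.0))
```

Grouping by pid first avoids relying on the rows being pre-sorted, at the cost of holding all records in memory, which should be negligible for this kind of diagnostic.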

In [None]:
def decomment(csvfile):
    for row in csvfile:
        raw = row.split('#')[0].strip()
        if raw: yield raw

def AnalyzeTestResults2(fn):
    rows = []
    with open(fn) as f:
        csv_reader = csv.reader(decomment(f), delimiter='\t')
        for row in csv_reader:
            rows.append(row)
    rows = np.array(rows).astype(float).transpose()   # np.float alias has been removed from recent NumPy

    plt.plot(rows[0], rows[2]/rows[2,0], label='work time')
    plt.plot(rows[0], np.sum(rows[5:8], axis=0)/(rows[0]*rows[1]), label='dead time')
    plt.plot(rows[0], rows[5]/(rows[0]*rows[1]), label='dead start')
    plt.plot(rows[0], rows[1]/(rows[1,0]/rows[0]), label='runtime excess')
    plt.ylim(0,2.5)
    plt.legend(loc=2)
    plt.show()

plt.title('Dummy work on empty arrays')
AnalyzeTestResults2('stats-dummy.txt')
plt.title('Real work')
AnalyzeTestResults2('stats-realwork.txt')
plt.title('Smaller memory footprint - no improvement')
AnalyzeTestResults2('stats-no-H.txt')
plt.title('New code')
AnalyzeTestResults2('stats-new-code.txt')
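
A small self-contained check of how the `decomment` filter behaves when fed to `csv.reader`. This is only a sketch: the sample rows are made up and held in an in-memory buffer rather than one of the real stats files.

```python
import csv
import io

def decomment(lines):
    # Drop everything after a '#' and skip lines that are then empty
    for line in lines:
        raw = line.split('#')[0].strip()
        if raw:
            yield raw

# Made-up sample: a header comment, a blank line and a trailing comment
sample = io.StringIO("# numJobs\telapsed\n1\t2.5\n\n2\t1.4  # rerun\n")
rows = list(csv.reader(decomment(sample), delimiter='\t'))
print(rows)
```

Note that `strip()` also removes leading and trailing tabs, so a row whose first or last field is empty would lose that field. That does not matter for the stats files as written by `AnalyzeTestResults`, which always fills every column.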

# Test a single backprojection and compare against definitive version

In [None]:
planesToProcess = None
if False:
    t0 = time.time()
    Htf = backwardProjectACC_original(Ht, inputImage, CAindex, planes=planesToProcess)
    print('Original code took %f'%(time.time()-t0))
elif True:
    # Profile my code (single-threaded) on a cropped version of Prevedel's data
    myStats = cProfile.run('Htf = backwardProjectACC(HMatrix(_HPathFormat, _HtPathFormat, _HReducedShape), inputImage, planes=planesToProcess, numjobs=1)', 'mystats')
    p = pstats.Stats('mystats')
    p.strip_dirs().sort_stats('cumulative').print_stats(40)
else:
    # Profile my code (single-threaded) in the sort of scenario I would expect to run it in for my PIV experiments
    tempInputImage = np.zeros((2,Nnum*20,Nnum*20))
    myStats = cProfile.run('temp = backwardProjectACC(HMatrix(_HPathFormat, _HtPathFormat, _HReducedShape), tempInputImage, planes=planesToProcess, numjobs=1)', 'mystats')
    p = pstats.Stats('mystats')
    p.strip_dirs().sort_stats('cumulative').print_stats(40)

    

In [None]:
# Compare against definitive version generated from Matlab
if planesToProcess is not None:
    print('WARNING: the following test is not valid because not all planes were processed')
definitive = tifffile.imread('Data/03_Reconstructed/exampleData/definitive_worm_crop_X15_backproject.tif')
definitive = np.transpose(definitive, axes=(0,2,1))
comparison = np.max(np.abs(definitive[4] - Htf[4]*10))
print('Compare against matlab result (should be <1.0): %f' % comparison)
if (comparison > 1.0):
    print(" -> WARNING: disagreement detected")
else:
    print(" -> OK")

#tifffile.imsave('Htf_backproject4.tif', np.transpose(Htf*1e2, axes=(0,2,1)))

# Test a full deconvolution and compare against definitive version

In [None]:
Xguess = Htf.copy()
maxIter = 8
deconvolvedResult = deconvRL(HMatrix(_HPathFormat, _HtPathFormat, _HReducedShape), Htf, maxIter, Xguess)

In [None]:
# Compare against definitive version generated from Matlab
definitive = tifffile.imread('Data/03_Reconstructed/exampleData/definitive_worm_crop_X15_iter8.tif')
definitive = np.transpose(definitive, axes=(0,2,1))
comparison = np.max(np.abs(definitive - deconvolvedResult*1e3))
print('Compare against matlab result (should be <1.0): %f' % comparison)
if (comparison > 1.0):
    print(" -> WARNING: disagreement detected")
else:
    print(" -> OK")

#tifffile.imsave('iter8.tif', np.transpose(Xguess*1e3, axes=(0,2,1)))

# Solve for flow field (single-plane toy example)

In [None]:
# Generate two identical images of the same synthetic object,
# which for now consists of a cloud of random gaussian spots
from scipy.ndimage import gaussian_filter   # scipy.ndimage.filters is a deprecated namespace
if False:
    numSpots = 100
    imageSize = 240
    sigma = 8
    controlPointSpacing = 30    
elif False:
    numSpots = 400
    imageSize = 120
    sigma = 2
    controlPointSpacing = 30
else:
    numSpots = 1000
    imageSize = 180
    sigma = 2
    controlPointSpacing = 30
syntheticImageExtendSize = 30

syntheticObjectExt = np.zeros((1, imageSize+syntheticImageExtendSize, imageSize))
syntheticObjectExt[0, (np.random.random(numSpots)*syntheticObjectExt.shape[1]).astype('int'), \
                      (np.random.random(numSpots)*syntheticObjectExt.shape[2]).astype('int')] = 1
syntheticObjectExt = gaussian_filter(syntheticObjectExt, sigma=(0,sigma,sigma))
plt.imshow(syntheticObjectExt[0])

In [None]:
# Set up the PSF that we will use

# First check we're using the expected PSF - the plane choices used here are intended to work with this PSF.
assert(matPath == 'PSFmatrix/PSFmatrix_M40NA0.95MLPitch150fml3000from-13to0zspacing0.5Nnum15lambda520n1.0.mat')

zPlaneToModel = _H.shape[0]-1   # Modelling native focal plane
zPlaneToModel = 7   # Modelling some way from the native focal plane, which should perform fairly well
zPlaneToModel = _H.shape[0]-3   # Modelling close to native focal plane. This has artefacts - prev one is fairly artefact-free
zPlaneToModel = _H.shape[0]-2


pivHMatrix = HMatrix(_HPathFormat, _HtPathFormat, _HReducedShape, numZ=1, zStart=zPlaneToModel)

In [None]:
if False:
    shiftType = 'piv'
    source = 'synthetic'
    actualImageExtendSize = syntheticImageExtendSize
    # Allowing an x search range is fairer, but it makes little difference for vertical flow
    xMotionPermitted = False
    xSearchRange = 0
    ySearchRange = 10
else:
    shiftType = 'piv'
    source = 'piv'
    actualImageExtendSize = 0
    xMotionPermitted = True
    xSearchRange = 8
    ySearchRange = 8


def forwardProjectACC_PIV(hMatrix, obj, shiftDescription):
    # Compute the AB images obtained from the single object we are provided with
    # (with the B image being of the object shifted according to shiftDescription).
    # We give each image half the intensity in order to conserve energy.
    dualObject = np.tile(obj[:,np.newaxis,:,:] / 2.0, (1,2,1,1))
    dualObject[:,1,:,:] = ShiftObject(dualObject[:,1,:,:], shiftDescription)
    return forwardProjectACC(hMatrix, dualObject, logPrint=False, progress=None)

def dualBackwardProjectACC_PIV(hMatrix, dualProjection, shiftDescription):
    # Compute the reverse transform given the AB images (B image shifted according to shiftDescription).
    # First we do the reverse transformation on both images
    dualObject = backwardProjectACC(hMatrix, dualProjection, logPrint=False, progress=None)
    # Now we reverse the shift on the B object
    dualObject[:,1,:,:] = ShiftObject(dualObject[:,1,:,:], -shiftDescription)
    # Now, ideally the objects would match, but of course in practice there will be discrepancies,
    # especially if we are not using the correct shiftDescription.
    # We return both objects unmerged here. To match the transpose of the forward operation,
    # the caller (see fusedBackwardProjectACC_PIV) adds the two objects and divides by 2.
    return dualObject

def fusedBackwardProjectACC_PIV(hMatrix, dualProjection, shiftDescription):
    dualObject = dualBackwardProjectACC_PIV(hMatrix, dualProjection, shiftDescription)
    result = np.sum(dualObject, axis=1) / 2.0     # Merge the two backprojections
    return result

def deconvRL_PIV_OLD(hMatrix, imageAB, maxIter, Xguess, shiftDescription):
    # I believed this to be the RL algorithm in the way I have written it in the past.
    # However, this gives different results to Prevedel's implementation
    # (mine seems to converge more slowly).
    # TODO: I should look into this and see if I've just made a mistake or if they are actually different.
    
    # Xguess is our single combined guess of the object
    Xguess = Xguess.copy()    # Because we will be updating it, and caller may not always be expecting that
    for i in tqdm(range(maxIter), desc='RL deconv'):
        t0 = time.time()
        relativeBlurDual = imageAB / forwardProjectACC_PIV(hMatrix, Xguess, shiftDescription)
        Xguess *= fusedBackwardProjectACC_PIV(hMatrix, relativeBlurDual, shiftDescription)
        Xguess[np.where(np.isnan(Xguess))] = 0
        t1 = time.time() - t0
    return Xguess

def deconvRL_PIV(hMatrix, imageAB, maxIter, shiftDescription):
    # Note:
    #  Htf is the *initial* backprojection of the camera image
    #  Xguess is the initial guess for the object
    Htf = fusedBackwardProjectACC_PIV(hMatrix, imageAB, shiftDescription)
    Xguess = Htf.copy()
    for i in tqdm(range(maxIter), desc='RL deconv'):
        t0 = time.time()
        HXguess = forwardProjectACC_PIV(hMatrix, Xguess, shiftDescription)
        HXguessBack = fusedBackwardProjectACC_PIV(hMatrix, HXguess, shiftDescription)
        errorBack = Htf / HXguessBack
        Xguess = Xguess * errorBack
        Xguess[np.where(np.isnan(Xguess))] = 0
        t1 = time.time() - t0
    return Xguess

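def RollNoninteger(obj, amount, axis=0):
    # Circular shift by a non-integer amount, linearly interpolating between the two
    # neighbouring integer shifts. We round down (rather than truncating towards zero)
    # so that 0 <= frac < 1 for negative amounts as well.
    intAmount = int(np.floor(amount))
    frac = amount - intAmount
    result1 = np.roll(obj, intAmount, axis=axis)
    result2 = np.roll(obj, intAmount+1, axis=axis)
    return result1 * (1-frac) + result2 * frac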
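
To make the multiplicative update in `deconvRL_PIV` easier to reason about, here is a toy sketch with the projection operators replaced by a small explicit matrix. Everything here is an assumption for illustration: `H`, `b` and `rl_backprojected` are hypothetical stand-ins, with `H @ x` playing the role of `forwardProjectACC_PIV` and `H.T @ y` the role of `fusedBackwardProjectACC_PIV`. The update `x <- x * (H^T b) / (H^T H x)` has the same form as what is sometimes called ISRA in the literature, which may be relevant to the difference noted in `deconvRL_PIV_OLD`, though I have not verified that against Prevedel's code.

```python
import numpy as np

def rl_backprojected(H, b, num_iter):
    # Hypothetical toy version of the update in deconvRL_PIV, using an explicit matrix
    Htb = H.T @ b                  # initial backprojection (called Htf in the notebook)
    x = Htb.copy()                 # initial guess for the object
    for _ in range(num_iter):
        x = x * (Htb / (H.T @ (H @ x)))
        x[np.isnan(x)] = 0         # guard against 0/0, as in the notebook
    return x

# Made-up 3x3 blurring matrix and non-negative object
H = np.array([[0.6, 0.4, 0.0],
              [0.2, 0.6, 0.2],
              [0.0, 0.4, 0.6]])
x_true = np.array([1.0, 0.0, 2.0])
b = H @ x_true

initial_residual = np.linalg.norm(H @ (H.T @ b) - b)
x = rl_backprojected(H, b, 200)
final_residual = np.linalg.norm(H @ x - b)
print(initial_residual, final_residual)
```

Because every factor in the update is non-negative, the estimate stays non-negative throughout, which is the main practical attraction of this family of multiplicative schemes.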

In [None]:
if (shiftType == 'uniform') or (shiftType == 'uniformSK'):
    if shiftType == 'uniform':
        def ShiftObject(obj, shiftYX):
            # Transform a 3D object according to the flow information provided in shiftYX
            # For now I just consider a uniform translation in xy
            # 
            # TODO: We need to worry about conserving energy during the shift. 
            # For now I will do a circular shift in order to avoid having to worry about this!
            result = RollNoninteger(obj, shiftYX[0,0], axis=len(obj.shape)-2)
            return RollNoninteger(result, shiftYX[0,1], axis=len(obj.shape)-1)
    else:
        # A lot of code duplication here, but it's just an experiment for now
        def ShiftObject(obj, shiftYX):
            # Generate control points in the corners of the image
            src_cols = np.arange(0, obj.shape[-1]+1, obj.shape[-1])
            src_rows = np.arange(0, obj.shape[-2]+1, obj.shape[-2])
            src_rows, src_cols = np.meshgrid(src_rows, src_cols)
            src = np.dstack([src_cols.flat, src_rows.flat])[0]
            dst = src + shiftYX[0]
            tform = PiecewiseAffineTransform()
            tform.estimate(src, dst)
            # Annoyingly, skimage insists that a float input is scaled between 0 and 1, so I must rescale here
            maxVal = np.max(np.abs(obj))
            if len(obj.shape) == 3:
                result = np.zeros(obj.shape)
                for cc in range(obj.shape[0]):
                    result[cc] = warp(obj[cc]/maxVal, tform) * maxVal
                return result
            else:
                return warp(obj/maxVal, tform) * maxVal
    
    def ExampleShiftDescriptionForObject(obj):
        return np.array([[-10, 20]])
    
    def VelocityShapeForObject(obj):
        return (2,)

    def IWCentresForObject(obj):
        return np.array([[int(obj.shape[-2]/2), int(obj.shape[-1]/2)]])

else:
    # Arbitrary motion described in terms of an array of control points at IWCentresForObject
    assert(shiftType == 'piv')
    def IWCentresForObject(obj):
        startPos = 0
        # Reusing the code from the skimage example, since that actually does what we need:
        src_cols = np.arange(startPos, obj.shape[-1]+1, controlPointSpacing)
        src_rows = np.arange(startPos, obj.shape[-2]+1-actualImageExtendSize, controlPointSpacing)
        src_rows, src_cols = np.meshgrid(src_rows, src_cols)
        return np.dstack([src_cols.flat, src_rows.flat])[0]

    def VelocityShapeForObject(obj):
        return IWCentresForObject(obj).shape
    
    def ExampleShiftDescriptionForObject(obj):
        peakVelocity = 7
        iwPos = IWCentresForObject(obj)
        shiftDescription = np.zeros(VelocityShapeForObject(obj))
        width = obj.shape[-1]
        for n in range(iwPos.shape[0]):
            quadraticProfile = ((width/2)**2 - (iwPos[n,0]-width/2)**2)
            quadraticProfile = quadraticProfile / ((width/2)**2) * peakVelocity
            shiftDescription[n,1] = quadraticProfile
        if xMotionPermitted:
            return shiftDescription
        else:
            return shiftDescription[:,1:2]

    def ExtraDuplicateRow(shifts, add=None):
        assert(len(shifts.shape) == 2)
        rowLength = int(np.sqrt(shifts.shape[0]))
        shifts = np.reshape(shifts, (rowLength, rowLength, shifts.shape[1]))
        toAppend = shifts[:,-1:,:].copy()
        if add is not None:
            toAppend += add
        result = np.append(shifts, toAppend, axis=1)
        return result.reshape(result.shape[0]*result.shape[1], result.shape[2])

    def ShiftObject(obj, shiftYX):
        # Transform a 3D object according to the flow information provided in shiftYX
        # I use a piecewise affine transformation that should approximately correspond to
        # what I use for PIV analysis
        src = IWCentresForObject(obj)
        if (src.shape[0] != shiftYX.shape[0]):
            print(src.shape, shiftYX.shape, obj.shape)
            assert(src.shape[0] == shiftYX.shape[0])
        
        if (actualImageExtendSize > 0):
            src = ExtraDuplicateRow(src, add=np.array([0, actualImageExtendSize]))
            if xMotionPermitted:
                dst = src + ExtraDuplicateRow(shiftYX)
            else:
                dst = src.copy().astype(shiftYX.dtype)
                dst[:,1] = dst[:,1] + ExtraDuplicateRow(shiftYX)[:,0]
        else:
            dst = src.copy().astype(shiftYX.dtype) + shiftYX
            
        tform = PiecewiseAffineTransform()
        tform.estimate(src, dst)
        # Annoyingly, skimage insists that a float input is scaled between 0 and 1, so I must rescale here
        maxVal = np.max(np.abs(obj))
        if len(obj.shape) == 3:
            result = np.zeros(obj.shape)
            for cc in range(obj.shape[0]):
                result[cc] = warp(obj[cc]/maxVal, tform) * maxVal
            return result
        else:
            assert(len(obj.shape) == 2)
            return warp(obj/maxVal, tform) * maxVal
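
The 'uniform' variant of `ShiftObject` above relies on a fractional circular shift. A minimal sketch of that interpolation on a 1D spike, assuming linear blending between the two neighbouring integer rolls; `roll_fractional` is a hypothetical name used only for this illustration, and it rounds down with `np.floor` so that the blend weight stays in [0, 1) for negative shifts too.

```python
import numpy as np

def roll_fractional(arr, amount, axis=0):
    # Split the shift into an integer part (rounded down) and a fraction in [0, 1)
    whole = int(np.floor(amount))
    frac = amount - whole
    # Blend the two neighbouring integer circular shifts
    return (1 - frac) * np.roll(arr, whole, axis=axis) + frac * np.roll(arr, whole + 1, axis=axis)

spike = np.array([0.0, 1.0, 0.0, 0.0])
print(roll_fractional(spike, 0.25))    # weight split between positions 1 and 2
print(roll_fractional(spike, -0.25))   # weight split between positions 0 and 1
```

Since the two blend weights always sum to one and the shift is circular, the total intensity is conserved, which sidesteps the energy-conservation worry mentioned in the TODO.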

In [None]:
if source == 'synthetic':
    # Generate a synthetic shift in the B image
    dualObject = np.tile(syntheticObjectExt[:,np.newaxis,:,:], (1,2,1,1)) * 1e3   # alternative scaling: 1e7
    if False:
        warnings.warn('Loading previously-saved dualObject')
        dualObject = np.load('dualObject5.npy')
    
    shiftDescription = ExampleShiftDescriptionForObject(dualObject)
    dualObject[:,1,:,:] = ShiftObject(dualObject[:,1,:,:], shiftDescription)

    # Since I am only using a local minimizer, we need to start with a decent guess as to the flow.
    # I think that's ok though: we should have that from a PIV estimate on the with-artefacts AB images
    #initialShiftGuess = np.zeros(VelocityShapeForObject(dualObject))
    initialShiftGuess = shiftDescription + np.random.random(shiftDescription.shape) * 4.0
else:
    assert(source == 'piv')
    pivImagePair = tifffile.imread('piv-raw-data/038298.tif')[24:26,:15*20,:15*16].astype('float64')
    # Note: frames 57-58 (wrong pair) would be an option to investigate bigger motion (~16px) with imperfect AB matches
    #              64-65 (correct pair) are another example of small movement (0-3px)
    dualObject = pivImagePair[np.newaxis]
    # For now, I just guess an initial shift of zero
    shiftDescription = np.zeros(VelocityShapeForObject(dualObject)).astype('float64')
    initialShiftGuess = shiftDescription.copy()
    
    
lb = []
ub = []
if xMotionPermitted:
    for n in range(shiftDescription.shape[0]):
        lb.extend([shiftDescription[n,0]-xSearchRange, shiftDescription[n,1]-ySearchRange])
        ub.extend([shiftDescription[n,0]+xSearchRange, shiftDescription[n,1]+ySearchRange])
else:
    for n in range(shiftDescription.shape[0]):
        lb.extend([shiftDescription[n,0]-ySearchRange])
        ub.extend([shiftDescription[n,0]+ySearchRange])
shiftSearchBounds = scipy.optimize.Bounds(lb, ub, True)

plt.subplot(1, 2, 1)
plt.imshow(dualObject[0,0])
plt.subplot(1, 2, 2)
plt.imshow(dualObject[0,1])
plt.show()

In [None]:
# Code used for investigations in which I directly warp the input object/images,
# without any use of light field PSFs and deconvolution

def ScoreShift2(candidateShiftYX, method, imageAB, hMatrix=None, shiftHistory=None, scaling=1.0, log=True, comparator=None):
    return ScoreShift3(candidateShiftYX, method, imageAB, hMatrix, shiftHistory, scaling, log, comparator)[0]

def ScoreShift3(candidateShiftYX, method, imageAB, hMatrix=None, shiftHistory=None, scaling=1.0, log=True, comparator=None):
    # Our input parameters get flattened, so we need to reshape them to Nx2 like my code is expecting
    # 'scaling' is useful for optimizers that insist on initial very small step sizes
    if xMotionPermitted:
        candidateShiftYX = candidateShiftYX.reshape(int(candidateShiftYX.shape[0]/2),2) * scaling
    else:
        candidateShiftYX = candidateShiftYX.reshape(candidateShiftYX.shape[0],1) * scaling
    # Sanity check and reminder that we have a 2xMxN AB image pair
    assert(len(imageAB.shape) == 3)  
    assert(imageAB.shape[0] == 2)
        
    if log:
        print('======== Score shift ========', candidateShiftYX.T)

    if method == 'joint':
        # Perform the joint deconvolution to recover a single object
        res = deconvRL_PIV(hMatrix, imageAB, maxIter=8, shiftDescription=candidateShiftYX)
        # Evaluate how well the forward-projected result matches the actual camera images, using SSD
        candidateImageAB = forwardProjectACC_PIV(hMatrix, res, candidateShiftYX)
        assert(len(candidateImageAB.shape) == 3)  # Temp test: surely this must be the case, but I am doubting myself now (after merging naive and joint code together...)
    else:
        # Just warp the raw A image manually (storing it in the B slot) and look at how it compares with the real B image
        assert(method == 'naive')
        candidateImageAB = imageAB.copy()
        # A bit of dimensional gymnastics here, because ShiftObject expects an *object*,
        # i.e. a 3D volume, whereas in this case we just have a 2D image
        candidateImageAB[1,:,:] = ShiftObject(candidateImageAB[np.newaxis,0,:,:], candidateShiftYX)[0]  
    # Sanity check and reminder that we have a 2xMxN AB image pair
    assert(len(candidateImageAB.shape) == 3)  
    assert(candidateImageAB.shape[0] == 2)

    imageToScore = candidateImageAB[1, 1:-1-actualImageExtendSize, 1:-1-actualImageExtendSize]
    referenceImage = imageAB[1, 1:-1-actualImageExtendSize, 1:-1-actualImageExtendSize]
    # TODO: I should think about whether this is the correct thing to do.
    # The A images will be identical in the case of the 'naive' method (direct warping),
    # but for the 'joint' method my intention is that this should be a useful normalization.
    # I need to think more about that though (and comment an explanation, if nothing else!)
    renormHack = np.average(candidateImageAB[0]) / np.average(imageAB[0])
    ssdScore = np.sum((imageToScore/renormHack - referenceImage)**2)

    if comparator is not None:
        maxLoc = np.argmax(np.abs(imageToScore - comparator)[1:-1,1:-1])
        maxVal =    np.max(np.abs(imageToScore - comparator)[1:-1,1:-1])
        plt.imshow((imageToScore - comparator)[170:,150:])
        plt.colorbar()
        plt.title('BRel (max %e)'%maxVal)
        print('Max val %f at %d (image scale %d)' % (maxVal, maxLoc, np.max(comparator)))
        plt.show()

    if shiftHistory is not None:
        shiftHistory.Update(candidateShiftYX, ssdScore)
        if log:
            if shiftHistory.PlotHistory(onlyPlotEvery=20):
                if method == 'joint':
                    print(res.shape)
                    dualObject = np.tile(res[:,np.newaxis,:,:] / 2.0, (1,2,1,1))
                    # Use the candidate shift being scored (not the global shiftDescription)
                    dualObject[:,1,:,:] = ShiftObject(dualObject[:,1,:,:], candidateShiftYX)
                    print(dualObject.shape)
                    ShowDualObjectAndFlow(dualObject, candidateShiftYX)
                else:
                    ShowDualObjectAndFlow(candidateImageAB, candidateShiftYX)
    if log:
        print('return', ssdScore)
    return (ssdScore, renormHack, np.average(candidateImageAB[0]), np.average(imageAB), candidateImageAB[1])

def ShowDualObjectAndFlow(dualObject, shiftDescription, otherObject=None, otherObject2=None):
    plt.subplot(1, 2, 1)
    if (len(dualObject.shape) == 4):
        assert(dualObject.shape[1] == 2)
        plt.imshow(dualObject[0,0])
        plt.subplot(1, 2, 2)
        plt.imshow(dualObject[0,1])
    else:
        assert(len(dualObject.shape) == 3)  # It's actually a dual image not an object
        assert(dualObject.shape[0] == 2)
        plt.imshow(dualObject[1])
    iwPos = IWCentresForObject(dualObject)
    if not xMotionPermitted:
        for n in range(iwPos.shape[0]):
            plt.plot([iwPos[n,0], iwPos[n,0]], \
                     [iwPos[n,1], iwPos[n,1] - shiftDescription[n,0]/2], color='red')
    else:
        for n in range(iwPos.shape[0]):
            plt.plot([iwPos[n,0], iwPos[n,0] - shiftDescription[n,0]/2], \
                     [iwPos[n,1], iwPos[n,1] - shiftDescription[n,1]/2], color='red')
    plt.xlim(0, dualObject.shape[-1])
    plt.ylim(dualObject.shape[-2], 0)
    plt.show()
    if otherObject is not None:
        plt.imshow(otherObject[0])
        plt.show()        
    if otherObject2 is not None:
        plt.imshow(otherObject2[0])
        plt.show()   
        
def CheckConvergence(funcToCall, convergedShift, args):
    initialScore = funcToCall(convergedShift.flatten(), *args)
    print('initial score', initialScore)
    for du in [0.5, -0.5, 1.5, -1.5]:
        for n in [7, 8, 12, 13]:
            temp = convergedShift.copy()
            temp[n] += du
            score = funcToCall(temp, *args)
            print('offset score', score)
            if (score < initialScore):
                print(n, du, 'BETTER!')

def ReportOnOptimizerConvergence(shiftHistory, obj):
    bestShift = shiftHistory.BestShift()
    print(shiftHistory.BestScore())
    print('np.array([', end='')
    for n in bestShift.flatten():
        print('%f, '%n, end='')
    print('])')
    CheckConvergence(ScoreShift2, bestShift.flatten(), ('naive', obj, None, None, 1.0, False))                
                
class ShiftHistory:
    def __init__(self):
        self.Reset()

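    def __copy__(self):
        result = ShiftHistory()
        # Copy the list containers so that later Update() calls on one history do not affect the other
        result.shiftHistory = list(self.shiftHistory)
        result.scoreHistory = list(self.scoreHistory)
        result.counter = self.counter
        return result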

    def Reset(self):
        self.scoreHistory = []
        self.shiftHistory = []
        self.counter = 0
    
    def Update(self, shift, score):
        self.shiftHistory.append(shift)
        self.scoreHistory.append(score)
        self.counter = self.counter + 1
        
    def BestScore(self):
        return np.min(self.scoreHistory)

    def BestShift(self):
        return self.shiftHistory[np.argmin(self.scoreHistory)]

    def PlotHistory(self, onlyPlotEvery=1):
        if ((self.counter%onlyPlotEvery) == 0) and (len(self.shiftHistory) > 0):
            print('best score so far:', np.min(self.scoreHistory))
            # Plot one of the shifts
            shiftShape = self.shiftHistory[0].shape
            selectedItem = np.minimum(int(np.sqrt(shiftShape[0])/2), shiftShape[0]-1)
            selectedShift = np.array(self.shiftHistory)[:, selectedItem, -1]
            plt.plot(selectedShift)
            plt.show()
            # Plot scores, with a suitable y axis scaling to see the interesting parts.
            # We limit the y axis to avoid stupid guesses distorting the plot.
            improvement = self.scoreHistory[0] - np.min(self.scoreHistory)
            plt.ylim(np.min(self.scoreHistory), self.scoreHistory[0]+2*improvement)
            plt.plot(self.scoreHistory)
            plt.show()

            with open('scores.txt', 'a') as f:
                f.write('%f\t' % self.scoreHistory[-1])
                for n in self.shiftHistory[-1]:
                    if xMotionPermitted:
                        f.write('%f\t%f\t' % (n[0], n[1]))
                    else:
                        f.write('%f\t' % (n[0]))
                f.write('\n')
            return True
        else:
            return False        
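
The record-everything pattern used by `ShiftHistory` (log every trial point and its score, then look up the best one afterwards) can be tried in isolation. What follows is a minimal sketch, assuming a hypothetical toy objective `toy_score` and a hypothetical `EvaluationLog` class standing in for `ShiftHistory`; neither name exists in this notebook.

```python
import numpy as np

class EvaluationLog:
    """Hypothetical stand-in for ShiftHistory: records every (point, score) pair."""
    def __init__(self):
        self.points = []
        self.scores = []

    def update(self, point, score):
        # Store a copy so that later in-place changes by the caller cannot alter the record
        self.points.append(np.array(point, dtype=float, copy=True))
        self.scores.append(float(score))

    def best(self):
        idx = int(np.argmin(self.scores))
        return self.points[idx], self.scores[idx]

def toy_score(shift, log=None):
    # Hypothetical objective (not ScoreShift2) with its minimum at (1.0, 0.0)
    score = float(np.sum((np.asarray(shift) - np.array([1.0, 0.0]))**2))
    if log is not None:
        log.update(shift, score)
    return score

log = EvaluationLog()
for trial in [(0.0, 0.0), (1.5, 0.0), (1.0, 0.1), (2.0, 1.0)]:
    toy_score(trial, log)
best_point, best_score = log.best()
print(best_point, best_score)
```

Because the log is filled as a side effect of each evaluation, it remains usable even if the loop that drives the evaluations is interrupted part-way through.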

In [None]:
# Generate synthetic light-field-recovered AB images (doing it the naive way, not using my new joint deconvolution)
# Run the imaging cycle on each of the AB images individually (i.e. introduce artefacts into them)
dualObjectRecovered = dualObject.copy()
for n in [0, 1]:
    cameraImage = forwardProjectACC(pivHMatrix, dualObject[:,n,:,:], logPrint=False)
    backProjected = backwardProjectACC(pivHMatrix, cameraImage, logPrint=False)
    
    # With the shifted images, we have problems with true zeroes in regions that have no features remaining.
    # To avoid this, I apply a very small nonzero background so that the deconvolution doesn't fail.
    backProjected = np.maximum(backProjected, 1e-5*np.max(backProjected))
    
    dualObjectRecovered[:,n,:,:] = deconvRL(pivHMatrix, backProjected, maxIter=8, Xguess=backProjected, logPrint=False)
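
The background floor matters because Richardson-Lucy updates are multiplicative: a pixel whose estimate is exactly zero stays at zero on every iteration, and a zero in the forward-projected estimate leads to a division by zero. Below is a minimal 1D sketch of that behaviour, assuming a plain textbook Richardson-Lucy iteration with a hypothetical 3-tap blur. The names `blur` and `richardson_lucy` and the kernel are illustrative only, and this is not the notebook's `deconvRL`.

```python
import numpy as np

def blur(signal, kernel):
    # The kernel is symmetric, so this one function serves as both the forward and the transpose operator
    return np.convolve(signal, kernel, mode='same')

def richardson_lucy(measured, kernel, guess, n_iter=8):
    estimate = guess.astype(float).copy()
    for _ in range(n_iter):
        predicted = blur(estimate, kernel)
        ratio = measured / predicted
        # Multiplicative update: the new estimate is the old one times a correction factor
        estimate = estimate * blur(ratio, kernel)
    return estimate

kernel = np.array([0.25, 0.5, 0.25])
truth = np.zeros(16)
truth[8] = 1.0
measured = blur(truth, kernel)

# Apply the same kind of small nonzero floor as in the cell above
floored = np.maximum(measured, 1e-5 * np.max(measured))
recovered = richardson_lucy(floored, kernel, guess=floored)
print(np.all(np.isfinite(recovered)), np.argmax(recovered))

# A true zero in the starting guess cannot recover, whatever the data says,
# because zero times any finite correction factor is still zero
stuck_guess = floored.copy()
stuck_guess[8] = 0.0
stuck = richardson_lucy(floored, kernel, guess=stuck_guess)
print(stuck[8])
```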

In [None]:
print('Original object')
iwPos = IWCentresForObject(dualObject)
ShowDualObjectAndFlow(dualObject, shiftDescription)
print('Recovered from light field images (plane %d)' % zPlaneToModel)
ShowDualObjectAndFlow(dualObjectRecovered, shiftDescription)

In [None]:
if True:
    # If I want to give the algorithm the best possible starting point,
    # I can give it the actual true shift values as its starting point
    # (but it still may iterate away from that...)
    warnings.warn("WARNING: starting guess is actually the correct flow description")
    startShiftForOptimizer = shiftDescription.copy()
else:
    startShiftForOptimizer = initialShiftGuess.copy()

    
def OptimizeToRecoverFlowField(method, imageAB, hMatrix, shiftDescription, initialShiftGuess):
    imageAB = imageAB.copy()    # This is just paranoia - I don't think it should get manipulated
    print('True shift:', shiftDescription.T)

    if False:
        plt.imshow(imageAB[0,:,:])
        plt.show()
        plt.imshow(imageAB[1,:,:])
        plt.show()

    if False:
        print('Score for correct shift:', ScoreShift2(shiftDescription.flatten(), method, imageAB, hMatrix))
        print('Score for initial guess:', ScoreShift2(initialShiftGuess.flatten(), method, imageAB, hMatrix))

    if True:
        optimizationAlgorithm = 'Powell'
        options = {'xtol': 1e-2}
    elif True:
        optimizationAlgorithm = 'L-BFGS-B'
        options = {'eps': 5e-03, 'gtol': 1e-6}
    else:
        optimizationAlgorithm = 'Nelder-Mead'
        # Nelder-Mead is derivative-free and has no 'eps' (finite-difference step) option
        options = {'xatol': 1e-2, 'adaptive': True}

    # Optimize to obtain the best-matching shift
    shiftHistory = ShiftHistory()
    try:
        # minimize() works on a 1D parameter vector, so pass the starting guess flattened
        shift = scipy.optimize.minimize(ScoreShift2, np.ravel(initialShiftGuess), bounds=shiftSearchBounds,
                                        args=(method, imageAB, hMatrix, shiftHistory),
                                        method=optimizationAlgorithm, options=options)
    except KeyboardInterrupt:
        # Catch keyboard interrupts so that we still return whatever shiftHistory we have built up so far.
        print('KEYBOARD INTERRUPT DURING OPTIMIZATION')
        return shiftHistory
    print('Optimizer finished:', str(shift.message), 'Final shift:', shift.x.T)
    return shiftHistory
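
The way the history object travels through `args`, so that the objective records every evaluation itself, can be tried on a toy problem. This is a minimal sketch, assuming a hypothetical quadratic objective `toy_score` and a plain list in place of `ShiftHistory`. The bounds and `xtol` value are illustrative, and bounded Powell needs a reasonably recent SciPy.

```python
import numpy as np
import scipy.optimize

def toy_score(shift, target, history):
    # Hypothetical smooth objective; every evaluation is appended to `history`
    score = float(np.sum((shift - target)**2))
    history.append((shift.copy(), score))
    return score

target = np.array([1.0, -0.5])
history = []
result = scipy.optimize.minimize(toy_score, x0=np.zeros(2), args=(target, history),
                                 method='Powell', bounds=[(-3, 3), (-3, 3)],
                                 options={'xtol': 1e-2})

# The best point seen so far can be recovered from the history alone,
# which is what makes returning the history after an interrupt useful
best_shift, best_score = min(history, key=lambda item: item[1])
print(result.message, result.x, len(history), best_score)
```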

In [None]:
# Perform the reconstruction using direct shift-matching of the raw input images (for real experimental SPIM-PIV images)
if True:
    shiftHistoryRaw = OptimizeToRecoverFlowField('naive', dualObject[0], None, shiftDescription, startShiftForOptimizer)

In [None]:
ReportOnOptimizerConvergence(shiftHistoryRaw, dualObject[0])
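
The perturbation test behind `CheckConvergence` is easy to try on a toy problem: if nudging a single parameter of a supposedly converged result lowers the score, the optimizer stopped early. This is a minimal sketch, assuming a hypothetical helper `find_improving_perturbations` and a hypothetical objective `toy_score`, neither of which is part of this notebook.

```python
import numpy as np

def find_improving_perturbations(score_func, converged, indices, offsets):
    """Return the base score and a list of (index, offset, score) for every
    single-parameter nudge that scores better than the converged point."""
    converged = np.asarray(converged, dtype=float).ravel()
    base_score = score_func(converged)
    improvements = []
    for offset in offsets:
        for idx in indices:
            trial = converged.copy()
            trial[idx] += offset
            score = score_func(trial)
            if score < base_score:
                improvements.append((idx, offset, score))
    return base_score, improvements

def toy_score(params):
    # Hypothetical objective with its true minimum at all-ones
    return float(np.sum((params - 1.0)**2))

# A "converged" result in which parameter 2 is stuck 1.0 away from the optimum
stuck = np.array([1.0, 1.0, 2.0, 1.0])
base, better = find_improving_perturbations(toy_score, stuck, indices=[0, 2],
                                            offsets=[0.5, -0.5, 1.5, -1.5])
print(base, better)
```

An empty list is only weak evidence of convergence, since only a few parameters and step sizes are tested. A non-empty list is a clear sign that the optimizer stopped short of a local minimum.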

In [None]:
# Perform the reconstruction using direct shift-matching of the light-field-deconvolved images
if True:
    shiftHistoryNaive = OptimizeToRecoverFlowField('naive', dualObjectRecovered[0], None, shiftDescription, startShiftForOptimizer)

In [None]:
ReportOnOptimizerConvergence(shiftHistoryNaive, dualObjectRecovered[0])

In [None]:
# Perform the reconstruction using my new joint algorithm
if True:
    # Generate a camera image pair from the object.
    # The B image is determined with the help of the chosen shift transform.
    imageAB = forwardProjectACC_PIV(pivHMatrix, dualObject[:,0,:,:], shiftDescription)
    # Run the joint optimizer to find the shift value for an input frame pair
    shiftHistoryJoint = OptimizeToRecoverFlowField('joint', imageAB, pivHMatrix, shiftDescription, startShiftForOptimizer)    

In [None]:
if False:
    np.save('shiftHistory7c.npy', shiftHistoryJoint.shiftHistory)
    np.save('scoreHistory7c.npy', shiftHistoryJoint.scoreHistory)
    np.save('dualObject7c.npy', dualObject)

In [None]:
ReportOnOptimizerConvergence(shiftHistoryJoint, imageAB)

In [None]:
# Look at how the scores are evolving during the Powell iterations
vals = np.array(shiftHistoryNaive.shiftHistory)
scores = np.array(shiftHistoryNaive.scoreHistory)
#iwOfInterest = 5*7+3
iwOfInterest = 5*7+6    # Looking at border control point for shiftHistoryNaive
x = []
y = []
y2 = []

if False:
    (_,_,_,_,comp) = ScoreShift3(vals[3020].flatten(), 'naive', objectToUse[0], log=False)    
    for n in [3054]:#range(3030, 3055):
#    for n in range(222, 224):
        recalculatedVals = ScoreShift2(vals[n].flatten(), 'naive', objectToUse[0], log=False, comparator=comp)
        recalculatedVals2 = ScoreShiftByDirectWarping(vals[n].flatten(), objectToUse[0], log=False)

        print(vals[n].flatten()[iwOfInterest], scores[n])
#        print(vals[n].flatten()[iwOfInterest], scores[n])
     
    plt.plot(np.array(vals[3043:3047])[:,iwOfInterest,0], scores[3043:3047], 'x')
    plt.show()
    plt.plot(np.array(vals[3040:3050])[:,iwOfInterest,0], 'x')
    plt.show()

if False:
    for d in np.arange(5.011, 5.015, 0.0005):
        sh = vals[3020].flatten()
        sh[iwOfInterest] = d
        sc = ScoreShift2(sh, 'naive', objectToUse[0], log=False, comparator=None)#comp)
        plt.plot(d, sc, '.', color='red')

    plt.show()

if False:
    # Compare the results from different control points
    sh = vals[3020].flatten()
    sh[iwOfInterest] = 5.011
    (_,_,_,_,comp) = ScoreShift3(sh, 'naive', objectToUse[0], log=False)    
    for d in [5.012, 5.0135, 5.0145]:
        sh[iwOfInterest] = d
        sc = ScoreShift2(sh, 'naive', objectToUse[0], log=False, comparator=comp)
        print('score', sc)


if True:
    #for iw in [iwOfInterest]:
    for iw in range(49):
    #for iw in [0]:
        # Start each control point with an empty run, so that points left over
        # from the previous control point cannot leak into this one's plots.
        x = []
        y = []
        for n in range(0,vals.shape[0]-1):
            if (vals[n,iw,0] != vals[n+1,iw,0]):
                x.append(vals[n,iw,0])
                y.append(scores[n])
            else:
                if (len(x) > 0):
                    if (len(x) > 2):
                        plt.plot(x, y, 'x')
                        plt.title('%d,%d %d(%d)'%(iw//7,iw%7, n, len(x)))
                        plt.show()
                    x = []
                    y = []
                    y2 = []
                nStart = n
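
The loop above picks out stretches of the history in which one parameter keeps changing between consecutive evaluations, which is what a one-parameter-at-a-time line search looks like. The same segmentation can be written as a small vectorized helper. This is a minimal sketch, assuming a hypothetical function `runs_where_parameter_changes` that is not part of this notebook.

```python
import numpy as np

def runs_where_parameter_changes(values, min_length=3):
    """Return (start, stop) index pairs, stop exclusive, for stretches of
    consecutive evaluations in which `values` changes at every step."""
    values = np.asarray(values)
    changing = values[1:] != values[:-1]    # changing[n] is True if step n -> n+1 alters the parameter
    # Pad with False at both ends so that every run has a well-defined start and end,
    # including a run that is still in progress at the end of the history
    padded = np.concatenate(([False], changing, [False])).astype(int)
    edges = np.diff(padded)
    starts = np.flatnonzero(edges == 1)
    stops = np.flatnonzero(edges == -1)
    return [(int(a), int(b)) for a, b in zip(starts, stops) if b - a >= min_length]

# Hypothetical history of one parameter over ten evaluations
history = [0, 0, 1, 2, 3, 3, 3, 4, 5, 5]
print(runs_where_parameter_changes(history))
print(runs_where_parameter_changes(history, min_length=1))
```

For a run `(a, b)`, the values to plot for control point `iw` would be `vals[a:b, iw, 0]` against `scores[a:b]`.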

                


In [None]:
# To understand how the optimizer is behaving, scan the search space rather than optimizing

# This gives a nice clear quadratic minimum, although biased to about 1.5 (true shift 1.0).
# I should remember that I don't expect a perfect result in the native focal plane.
# Very clear quadratic minimum at 1.0 when I use z plane index 7. This is good news!

actualShift = (1,0)
# Generate a camera image pair based on a chosen shift transform
imageAB = forwardProjectACC_PIV(thisH, obj, actualShift)
scores = []
for dx in range(-2,4,1):
    scores.append([dx, ScoreShift(np.array([dx,0]), imageAB)])
scores = np.array(scores)
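
With integer scan steps, a minimum near 1.5 cannot be read directly off the sampled points. When the scan looks quadratic, as described in the comments above, a parabola fit gives a sub-step estimate of where the minimum lies. This is a minimal sketch, assuming a hypothetical helper `quadratic_minimum` and made-up noiseless scores rather than real `ScoreShift` output.

```python
import numpy as np

def quadratic_minimum(positions, scores):
    """Fit score = a*x^2 + b*x + c and return the vertex position -b/(2a)."""
    a, b, c = np.polyfit(positions, scores, 2)
    if a <= 0:
        raise ValueError('Fitted parabola has no minimum (a <= 0)')
    return -b / (2 * a)

# Hypothetical scan results: a noiseless parabola centred on 1.5, sampled at integer shifts
dx = np.arange(-2, 4)
toy_scores = 3.0 * (dx - 1.5)**2 + 10.0
fitted_minimum = quadratic_minimum(dx, toy_scores)
print(fitted_minimum)
```

With the `scores` array built above, the equivalent call would be `quadratic_minimum(scores[:,0], scores[:,1])`.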

In [None]:
plt.plot(scores[:,0], scores[:,1], 'x')