Status: my new code (which eliminates some of the FFTs) takes 16 seconds instead of 63 on my laptop, for a 2x2 tiled-up input [I tiled the input to get a larger work size and so more reproducible timings].

Without exploiting symmetries it takes 23 seconds. I had to write C code to speed up the generation of the reflected/transposed FFT matrices - without that it was barely any faster than just computing the rfftn for all aa,bb.

Now two-thirds of the time is spent in fft2, so that is the bottleneck. If I insisted on a square array then I could halve that, but that would be a limitation. I could, I suppose, make it a *recommendation* that allows the code to run faster. I could certainly take advantage of that for my own work.

For the original dataset size, on my mac pro, my code takes 6.5s (single-threaded) to back-project plane 4, compared to Matlab's 2.2s (multithreaded). 

On my mac pro, full backprojection takes 96s single-threaded or 13s parallel(!), of which 1s is on the merging at the end (which I should be able to speed up too). Matlab with multithreading took 28s, so I have achieved a >2x speedup. More would have been nice, but it's still fairly respectable.


NOTE: my C code can't cope with an array that has been transposed (probably because it assumes elements are adjacent in memory along x?). I should probably fix that, though I doubt there is much of a performance cost in just calling .copy() on the transposed array, which is what I do at the moment. I should really be moving the transpose to be the final operation (in the case of square inputs) anyway. However, it looks as if a decent chunk of the FFT time is actually being spent in the other FFTs (for the reduced arrays) anyway!
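
The stride issue in the note above can be seen directly from numpy. This is a minimal sketch, assuming (as guessed above) that the C code expects the last axis to be contiguous in memory; the array here is a small stand-in, not the real FFT data.

```python
import numpy as np

a = np.arange(12, dtype=np.complex64).reshape(3, 4)
t = a.transpose()                 # a view: same memory, strides swapped

print(a.strides, t.strides)       # the transposed view no longer has unit stride in x
print(t.flags['C_CONTIGUOUS'])

# Same effect as .copy() here: a fresh C-ordered buffer with the transposed layout
t_copy = np.ascontiguousarray(t)
print(t_copy.flags['C_CONTIGUOUS'], t_copy.strides)
```

Checking `arr.strides[-1] == arr.itemsize` before calling into the C code would be one way to copy only when it is actually needed.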

### Performance investigation

Actual thread execution time seems to grow considerably with the number of threads, i.e. efficiency falls. I am not sure how to work out what the cause of that is. I could go back to working on dummy data (no transfers between processes) and see if that makes a difference to *that* in particular. (I think I may previously have looked only at the dead-time overheads, which are also an issue.)
I looked at user and system CPU time, and with Instruments. It looks like 20% of the time is spent in madvise (MacBook, 2 threads). I am not sure exactly why or where that is happening. It seems to be related to Python memory management in some way. I should check whether that grows with the number of threads on the Mac Pro, and whether it is the same when I use dummy work blocks rather than passing to subprocesses.

### Performance improvements to make

In the case of square arrays, move the transpose to be the final operation (since it's probably faster than reversing an array, although it may impact subsequent FFT performance?).
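
The square-array shortcut relies on the FFT of a transposed array being the transpose of the FFT, which only lines up when the padded shape is square. A minimal sketch using `numpy.fft`; the helper name `fft_of_transpose` is made up for illustration and is not part of the notebook's code.

```python
import numpy as np

def fft_of_transpose(h, fshape):
    """Return fft2(h.T, fshape), reusing fft2(h, fshape) when fshape is square."""
    f = np.fft.fft2(h, fshape)
    if fshape[0] == fshape[1]:
        # Zero-padding to a square commutes with the transpose,
        # so the transposed spectrum is just a view of f.
        return f.T
    # Non-square padding: there is no shortcut, so transform again.
    return np.fft.fft2(h.T, fshape)

rng = np.random.default_rng(0)
h = rng.random((5, 7))
square = fft_of_transpose(h, (8, 8))
nonsquare = fft_of_transpose(h, (8, 12))
print(np.allclose(square, np.fft.fft2(h.T, (8, 8))))
```

The returned `f.T` is a non-contiguous view, so any code that needs contiguous memory still has to copy it.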

fft2 returns a double-precision complex array (on the MacBook, at least) for float32 input. I would much prefer it to return complex64. I could well believe it is a performance hit to do it this way (larger memory footprint). Can I improve on this? I suppose I could call through to C code that calls FFTW, for example.
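
One possible option that avoids dropping down to C, assuming SciPy 1.4 or later is installed: the `scipy.fft` module keeps single-precision input in single precision. This sketch only checks the output dtype and the size of the rounding difference; it does not measure speed, which would need timing on the real array sizes.

```python
import numpy as np
import scipy.fft

x = np.random.default_rng(0).random((64, 64)).astype(np.float32)

f_single = scipy.fft.fft2(x)                      # float32 in, complex64 out
f_double = scipy.fft.fft2(x.astype(np.float64))   # double-precision reference

print(f_single.dtype, f_double.dtype)
max_err = np.max(np.abs(f_single - f_double))
print('max abs difference:', max_err)
```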

Code now supports a third dimension for the camera images (and object z plane), so that we can implement PIV. At the moment it just iterates - performance should be improved by only calculating FT(H) once.
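
A minimal sketch of the "calculate FT(H) only once" idea, using plain `numpy.fft` on a stack of images rather than the notebook's own projector code. The function name `convolve_stack` and the array sizes are assumptions for illustration only.

```python
import numpy as np

def convolve_stack(images, psf):
    """Full linear convolution of every image in a (n, y, x) stack with one 2D psf."""
    fshape = (images.shape[1] + psf.shape[0] - 1,
              images.shape[2] + psf.shape[1] - 1)
    f_psf = np.fft.rfft2(psf, fshape)      # transformed once, shared by all images
    f_img = np.fft.rfft2(images, fshape)   # transforms the last two axes of the stack
    # Broadcasting multiplies every image spectrum by the same psf spectrum
    return np.fft.irfft2(f_img * f_psf, fshape)

rng = np.random.default_rng(1)
images = rng.random((3, 8, 9))
psf = rng.random((4, 5))
stacked = convolve_stack(images, psf)
print(stacked.shape)
```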


In [None]:
import numpy as np
import numexpr as ne
import scipy.ndimage, scipy.optimize, scipy.io, scipy.fftpack
from scipy.ndimage import convolve
from scipy.signal import convolve2d, fftconvolve
import os, sys, time, warnings
import matplotlib.pyplot as plt
%matplotlib inline
import tifffile
import h5py
import multiprocessing
from functools import partial
from joblib import Parallel, delayed
import cProfile, pstats
import glob, csv
from tqdm import tqdm_notebook as tqdm
from numba import jit
sys.path.insert(0, 'py_symmetry')
import py_symmetry as jps

# I don't know if these are necessary, but it has been suggested that low-level threading
# does not interact well with the joblib Parallel feature.
# (To be sure they take effect, they should probably be set before numpy is first imported.)
os.environ['MKL_NUM_THREADS'] = '1'
os.environ['OMP_NUM_THREADS'] = '1'
os.environ['MKL_DYNAMIC'] = 'FALSE'

In [None]:
matPath = 'PSFmatrix/PSFmatrix_M22.2NA0.5MLPitch125fml3125from-110to110zspacing4Nnum19lambda520n1.33.mat'

warnings.warn('WARNING: Switched to faster matrix for testing')
matPath = 'PSFmatrix/PSFmatrix_M40NA0.95MLPitch150fml3000from-26to0zspacing2Nnum15lambda520n1.0.mat'

In [None]:
HPathFormat = os.path.splitext(matPath)[0]+'/H%02d.array'
HtPathFormat = os.path.splitext(matPath)[0]+'/Ht%02d.array'
HReducedShape = []
HtReducedShape = []
if True:
    # Load the matrices from the .mat file.
    # This is slow since they must be decompressed and are rather large! (9.5GB each, in single-precision FP)
    with h5py.File(matPath, 'r') as f:
        print('Load CAindex')
        sys.stdout.flush()
        CAindex = f['CAindex'][()].astype('int')    # [()] reads the whole dataset (.value was removed in h5py 3)
        
        print('Load H')
        sys.stdout.flush()
        H = f['H'][()].astype('float32')
        Nnum = H.shape[2]
        aabbRange = int((Nnum+1)/2)
        for cc in tqdm(range(H.shape[0]), desc='memmap H'):
            HCC =  H[cc, :aabbRange, :aabbRange, CAindex[0,cc]-1:CAindex[1,cc], CAindex[0,cc]-1:CAindex[1,cc]]
            HReducedShape.append(HCC.shape)
            a = np.memmap(HPathFormat%cc, dtype='float32', mode='w+', shape=HCC.shape)
            a[:,:,:,:] = HCC[:,:,:,:]
            del a
        #del H        # Needed for old code
        
        print('Load Ht')
        sys.stdout.flush()
        Ht = f['Ht'][()].astype('float32')
        for cc in tqdm(range(Ht.shape[0]), desc='memmap Ht'):
            HtCC =  Ht[cc, :aabbRange, :aabbRange, CAindex[0,cc]-1:CAindex[1,cc], CAindex[0,cc]-1:CAindex[1,cc]]
            HtReducedShape.append(HtCC.shape)
            a = np.memmap(HtPathFormat%cc, dtype='float32', mode='w+', shape=HtCC.shape)
            a[:,:,:,:] = HtCC[:,:,:,:]
            del a
        #del Ht        # Needed for old code

In [None]:
# Load the input image
LFmovie = tifffile.imread('Data/02_Rectified/exampleData/20131219WORM2_small_full_neg_X1_N15_cropped_uncompressed.tif')
LFmovie = LFmovie.transpose()[np.newaxis,:,:]

LFIMG = LFmovie[0].astype('float32')
if True:
    # Actual (cropped) image loaded from disk
    inputImage = LFIMG
else:
    inputImage = np.tile(LFIMG,(2,2))

## Objects stored in the .mat file

### Optical parameters from GUI: [? means I am not sure if or where it is stored]

M<br>
NA<br>
d    "fml" in GUI (stored here in units of m)<br>
pixelPitch is "ML pitch" / "Nnum" (stored here in units of m)<br>
? n<br>
? wavelength<br>

### User parameters from GUI:

OSR<br>
zspacing<br>
? z-min<br>
? z-max<br>
Nnum<br>


### Misc parameter:

fobj (can presumably be deduced from mag, NA etc?)<br>

### The actual arrays:

H:             shape (56, 19, 19, 343, 343), type "f4"<br>
Ht:            shape (56, 19, 19, 343, 343), type "f4"<br>

### Information about object space:

x1objspace:    x pixel positions in object space (19 elements across one lenslet)<br>
x2objspace:    y pixel positions in object space (19 elements across one lenslet)<br>
x3objspace:    z pixel positions in object space (56 z planes)<br>
x1space:       x pixel positions in lenslet space (19 elements across one lenslet)<br>
x2space:       y pixel positions in lenslet space (19 elements across one lenslet)<br>

### Not sure what these are exactly:

CAindex:       shape (2, 56) - something about the start and end index of the PSF array, for each z plane.<br>
CP:            shape (343, 1)<br>
MLARRAY:       shape (1141, 1141), type "|V16"<br>
objspace:      shape (56, 1, 1)<br>
settingPSF:    You would think this contains the GUI parameters, but e.g. print(f['settingPSF']['M'].value) gives a strange 3x1 array [50, 50, 46, 50] etc...?<br>
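
The values quoted for `settingPSF` look like character codes rather than numbers: 50 is the code for `'2'` and 46 is the code for `'.'`. A small sketch of decoding them, on the assumption (not checked against the file) that the GUI parameters were saved as character arrays; the helper name `decode_matlab_chars` is made up for illustration.

```python
import numpy as np

def decode_matlab_chars(codes):
    """Join an array of character codes into a Python string."""
    return ''.join(chr(int(c)) for c in np.asarray(codes).ravel())

decoded = decode_matlab_chars([50, 50, 46, 50])
print(decoded)           # 22.2
print(float(decoded))
```

The result matches the `M22.2` in the first PSF matrix filename, which supports the character-array interpretation.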


In [None]:
# Note: I am a little unsure how to interpret the arrays I have loaded from the .mat.
# From looking at how H and CAindex are accessed, it looks as if the shapes I have loaded
# are the reversal of the shape ordering as expected in Matlab.
# I suppose that makes sense given that matlab is column-major in its array accesses.
# The data has been loaded from disk in the order it is *stored*,
# and I therefore need to flip around all the matlab array index ordering 
# (e.g. matlabArray(1,2,3) becomes pythonArray[3,2,1])
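
The index reversal described in these comments can be checked with a small stand-in array. In this sketch a Fortran-ordered numpy array plays the role of the MATLAB array, and its raw bytes are reinterpreted in C order with the shape reversed. Indices are zero-based, unlike the one-based MATLAB example above.

```python
import numpy as np

matlab_shape = (2, 3, 4)
# Column-major array standing in for the MATLAB array
m = np.arange(24, dtype=np.float32).reshape(matlab_shape, order='F')
raw = m.tobytes(order='F')     # the bytes in the order they are stored

# Read the same bytes back row-major, with the shape reversed
p = np.frombuffer(raw, dtype=np.float32).reshape(matlab_shape[::-1])

# m[i, j, k] in the column-major view is p[k, j, i] in the row-major view
print(p.shape, m[1, 2, 3] == p[3, 2, 1])
```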

In [None]:
import resource

def noProgressBar(work, **kwargs):
    # Dummy function to be used in place of tqdm when we don't want to show a progress bar
    return work    

def cpuTime(kind):
    rus = resource.getrusage(resource.RUSAGE_SELF)    
    ruc = resource.getrusage(resource.RUSAGE_CHILDREN)
    if (kind == 'self'):
        return np.array([rus.ru_utime, rus.ru_stime])
    elif (kind == 'children'):
        return np.array([ruc.ru_utime, ruc.ru_stime])
    else:
        return np.array([rus.ru_utime+ruc.ru_utime, rus.ru_stime+ruc.ru_stime])

In [None]:
from numpy.lib import NumpyVersion    # public equivalent of the private scipy._lib._version import
from numpy.fft import fft, fftn, rfft, rfftn, irfftn
_rfft_mt_safe = (NumpyVersion(np.__version__) >= '1.9.0.dev-e24486e')

def _next_regular(target):
    """
    Find the next regular number greater than or equal to target.
    Regular numbers are composites of the prime factors 2, 3, and 5.
    Also known as 5-smooth numbers or Hamming numbers, these are the optimal
    size for inputs to FFTPACK.

    Target must be a positive integer.
    """
    if target <= 6:
        return target

    # Quickly check if it's already a power of 2
    if not (target & (target-1)):
        return target

    match = float('inf')  # Anything found will be smaller
    p5 = 1
    while p5 < target:
        p35 = p5
        while p35 < target:
            # Ceiling integer division, avoiding conversion to float
            # (quotient = ceil(target / p35))
            quotient = -(-target // p35)

            # Quickly find next power of 2 >= quotient
            try:
                p2 = 2**((quotient - 1).bit_length())
            except AttributeError:
                # Fallback for Python <2.7
                p2 = 2**(len(bin(quotient - 1)) - 2)

            N = p2 * p35
            if N == target:
                return N
            elif N < match:
                match = N
            p35 *= 3
            if p35 == target:
                return p35
        if p35 < match:
            match = p35
        p5 *= 5
        if p5 == target:
            return p5
    if p5 < match:
        match = p5
    return match

def _centered(arr, newsize):
    # Return the center newsize portion of the array.
    currsize = np.array(arr.shape)
    newsize = np.asarray(newsize)
    if (len(currsize) > len(newsize)):
        newsize = np.append([currsize[0]], newsize)
    startind = (currsize - newsize) // 2
    endind = startind + newsize
    myslice = [slice(startind[k], endind[k]) for k in range(len(endind))]
    return arr[tuple(myslice)]

def tempMul(bb,fshape,result):
    result *= np.exp(-1j * bb * 2*np.pi / fshape[0] * np.arange(result.shape[0],dtype='complex64'))[:,np.newaxis]
    return result

def expand2(result, bb, aa, Nnum, fshape):
    return np.tile(result, (Nnum,1))

def expand(reducedF, bb, aa, Nnum, fshape):
    result = np.tile(reducedF, (1,int(Nnum/2+1)))
    result = result[:,:int(fshape[1]/2+1)]
    result *= np.exp(-1j * aa * 2*np.pi / fshape[1] * np.arange(result.shape[1],dtype='complex64'))
    result = expand2(result, bb, aa, Nnum, fshape)
    return tempMul(bb,fshape,result)


def special_rfftn(in1, bb, aa, Nnum, fshape):
    # Compute the fft of elements in1[bb::Nnum,aa::Nnum], after in1 has been zero-padded out to fshape
    # We exploit the fact that fft(masked-in1) is fft(arr[::Nnum,::Nnum]) replicated Nnum times.
    reducedShape = ()
    for d in fshape:
        assert((d % Nnum) == 0)
        reducedShape = reducedShape + (int(d/Nnum),)
        
    assert(in1.ndim == 2)
    reduced = in1[bb::Nnum,aa::Nnum]

    # Compute an array giving rfft(mask(in1))
    reducedF = scipy.fftpack.fft2(reduced, reducedShape).astype('complex64')
    return expand(reducedF, bb, aa, Nnum, fshape)

def convolutionShape(in1, in2, Nnum):
    # Logic copied from fftconvolve source code
    s1 = np.array(in1.shape)
    s2 = np.array(in2.shape)
    if (len(s1) == 3):
        s1 = s1[1:]
    shape = s1 + s2 - 1
    if False:
        # TODO: I haven't worked out if/how I can do this yet.
        # This is the original code in fftconvolve, which says:
        # Speed up FFT by padding to optimal size for FFTPACK
        fshape = [_next_regular(int(d)) for d in shape]
    else:
        fshape = [int(np.ceil(d/float(Nnum)))*Nnum for d in shape]
    fslice = tuple([slice(0, int(sz)) for sz in shape])
    return (fshape, fslice, s1)
    
def special_fftconvolve_part1(in1, bb, aa, Nnum, in2):
    (fshape, fslice, s1) = convolutionShape(in1, in2, Nnum)
    # Pre-1.9 NumPy FFT routines are not threadsafe - this code requires numpy 1.9 or greater
    assert(_rfft_mt_safe)
    fa = special_rfftn(in1, bb, aa, Nnum, fshape)
    return (fa, fshape, fslice, s1)

def special_fftconvolve_part3(fab, fshape, fslice, s1):
    ret = irfftn(fab, fshape)[fslice].copy()
    return _centered(ret, s1)

def special_fftconvolveNew(in1, bb, aa, Nnum, in2, accum, fb=None):
    '''
    in1 consists of subapertures of size Nnum x Nnum pixels.
    We are being asked to convolve only pixel (bb,aa) within each subaperture, i.e.
        tempSlice = np.zeros(in1.shape, dtype=in1.dtype)
        tempSlice[bb::Nnum, aa::Nnum] = in1[bb::Nnum, aa::Nnum]
    This allows us to take a significant shortcut in computing the FFT for in1.
    '''
    (fa, fshape, fslice, s1) = special_fftconvolve_part1(in1, bb, aa, Nnum, in2)
    if fb is None:
        fb = rfftn(in2, fshape)
    if accum is None:
        accum = fa*fb
    else:
        accum += fa*fb
    return (accum, fshape, fslice, s1)
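
The shortcut used by `special_rfftn` and `expand` can be checked on a small array. This is a simplified sketch: it uses the full `fft2` rather than the half-spectrum, and it skips the zero-padding to `fshape`, so it only demonstrates the tile-and-phase identity itself. The function names are made up for illustration.

```python
import numpy as np

def masked_fft2_direct(in1, bb, aa, Nnum):
    """FFT of in1 with everything except pixels [bb::Nnum, aa::Nnum] set to zero."""
    masked = np.zeros_like(in1)
    masked[bb::Nnum, aa::Nnum] = in1[bb::Nnum, aa::Nnum]
    return np.fft.fft2(masked)

def masked_fft2_shortcut(in1, bb, aa, Nnum):
    """Same result from one small FFT, tiled Nnum times and multiplied by a phase ramp."""
    S0, S1 = in1.shape                       # both must be multiples of Nnum
    reducedF = np.fft.fft2(in1[bb::Nnum, aa::Nnum])
    tiled = np.tile(reducedF, (Nnum, Nnum))
    k0 = np.arange(S0)[:, np.newaxis]
    k1 = np.arange(S1)[np.newaxis, :]
    # The (bb, aa) offset of the sampled pixels becomes a linear phase in Fourier space
    phase = np.exp(-2j * np.pi * (bb * k0 / S0 + aa * k1 / S1))
    return tiled * phase

rng = np.random.default_rng(2)
in1 = rng.random((15, 20))
direct = masked_fft2_direct(in1, 2, 3, 5)
shortcut = masked_fft2_shortcut(in1, 2, 3, 5)
print(np.allclose(direct, shortcut))
```

The small FFT works on an array `Nnum` times smaller along each axis, which is where the saving comes from.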

In [None]:
def forwardProjectForZ_old(HCC, realspaceCC):
    singleJob = (len(realspaceCC.shape) == 2)
    if singleJob:   # Cope with both a single 2D plane and an array of multiple 2D planes to process independently
        realspaceCC = realspaceCC[np.newaxis,:,:]
    # Iterate over each lenslet pixel
    Nnum = HCC.shape[1]
    TOTALprojection = np.zeros(realspaceCC.shape, dtype='float32')
    for bb in tqdm(range(Nnum), leave=False, desc='Forward-project - y'):
        for aa in tqdm(range(Nnum), leave=False, desc='Forward-project - x'):
            # Extract the part of H that represents this lenslet pixel
            Hs = HCC[bb, aa]
            for n in range(realspaceCC.shape[0]):
                # Create a workspace representing just the voxels cc,bb,aa behind each lenslet (the rest is 0)
                tempspace = np.zeros((realspaceCC[n].shape[0], realspaceCC[n].shape[1]), dtype='float32')
                tempspace[bb::Nnum, aa::Nnum] = realspaceCC[n, bb::Nnum, aa::Nnum]  # ???? what to do about index ordering?
                # Compute how those voxels project onto the sensor, and accumulate
                TOTALprojection[n] += fftconvolve(tempspace, Hs, 'same')
    if singleJob:
        return TOTALprojection[0]
    else:
        return TOTALprojection
    
def backwardProjectForZ_old(HtCC, projection):
    singleJob = (len(projection.shape) == 2)
    if singleJob:   # Cope with both a single 2D plane and an array of multiple 2D planes to process independently
        projection = projection[np.newaxis,:,:]
    # Iterate over each lenslet pixel
    Nnum = HtCC.shape[1]
    tempSliceBack = np.zeros(projection.shape, dtype='float32')        
    for aa in tqdm(range(Nnum), leave=False, desc='y'):
        for bb in range(Nnum):
            # Extract the part of Ht that represents this lenslet pixel
            Hts = HtCC[bb, aa]
            for n in range(projection.shape[0]):
                # Create a workspace representing just the voxels cc,bb,aa behind each lenslet (the rest is 0)
                tempSlice = np.zeros(projection[n].shape, dtype='float32')
                tempSlice[bb::Nnum, aa::Nnum] = projection[n, bb::Nnum, aa::Nnum]
                # Compute how those voxels back-project from the sensor
                tempSliceBack[n] += fftconvolve(tempSlice, Hts, 'same')
    if singleJob:
        return tempSliceBack[0]
    else:
        return tempSliceBack
    
def backwardProjectACC_original(Ht, projection, CAindex, planes=None):
    # One output plane per z plane in Ht (x3length was not defined anywhere in this notebook)
    Backprojection = np.zeros((Ht.shape[0], projection.shape[0], projection.shape[1]), dtype='float32')
    # Iterate over each z plane
    if planes is None:
        planes = range(Ht.shape[0])
    for cc in tqdm(planes, desc='Back-project - z'):
        HtCC =  Ht[cc, :, :, CAindex[0,cc]-1:CAindex[1,cc], CAindex[0,cc]-1:CAindex[1,cc]]
        Backprojection[cc] = backwardProjectForZ_old(HtCC, projection)
    return Backprojection

In [None]:
def deconvRL(Htf, maxIter, Xguess, thisCAindex=CAindex):
    for i in tqdm(range(maxIter), desc='RL deconv'):
        t0 = time.time()
        HXguess = forwardProjectACC(H, Xguess, thisCAindex)
        HXguessBack = backwardProjectACC(Ht, HXguess, thisCAindex)
        errorBack = Htf / HXguessBack
        Xguess = Xguess * errorBack
        Xguess[np.isnan(Xguess)] = 0
        ttime = time.time() - t0
        print('iter %d | %d, took %.1f secs. Max val %f' % (i+1, maxIter, ttime, np.max(Xguess)))
    return Xguess
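
For reference, the update that `deconvRL` applies on each iteration can be written out as below. This is read off the code above, on the assumption that `Htf` holds the back-projection $H^{T} f$ of the measured image $f$; the multiplication and division are element-wise.

$$
x^{(k+1)} = x^{(k)} \odot \frac{H^{T} f}{H^{T}\left(H\, x^{(k)}\right)}
$$

Here $H x^{(k)}$ is `HXguess`, $H^{T}(H x^{(k)})$ is `HXguessBack`, and any NaN produced by a 0/0 division is reset to zero before the next iteration.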

In [None]:
# Note: H.shape in python is (<num z planes>, Nnum, Nnum, <psf size>, <psf size>),
#                       e.g. (56, 19, 19, 343, 343)
useSymmetries = True

class Projector(object):
    def __init__(self, projection, HtCCBB, Nnum):
        # Note: H and Hts are not stored as class variables.
        # I had a lot of trouble with them and multithreading,
        # and eventually settled on having them in shared memory.
        # As I encapsulate more stuff in this class, I could bring them back as class variables...

        self.cpuTime = np.zeros(2)
        
        # Nnum: number of pixels across a lenslet array (after rectification)
        self.Nnum = Nnum
        
        # This next chunk of logic copied from fftconvolve source code.
        # s1, s2: shapes of the input arrays
        # fshape: shape of the (full, possibly padded) result array in Fourier space
        # fslice: slicing tuple specifying the actual result size that should be returned
        self.s1 = np.array(projection.shape)
        self.s2 = np.array(HtCCBB[0].shape)
        shape = self.s1 + self.s2 - 1
        if False:
            # TODO: I haven't worked out if/how I can do this yet.
            # This is the original code in fftconvolve, which says:
            # Speed up FFT by padding to optimal size for FFTPACK
            self.fshape = [_next_regular(int(d)) for d in shape]
        else:
            self.fshape = [int(np.ceil(d/float(Nnum)))*Nnum for d in shape]
        self.fslice = tuple([slice(0, int(sz)) for sz in shape])
        
        # rfslice: slicing tuple to crop down full fft array to the shape that would be output from rfftn
        self.rfslice = (slice(0,self.fshape[0]), slice(0,int(self.fshape[1]/2)+1))
        return
    
    def MirrorXArray(self, Hts, fHtsFull):
        padLength = self.fshape[0] - Hts.shape[0]
        if False:
            fHtsFull = fHtsFull.conj() * np.exp((1j * (1+padLength) * 2*np.pi / self.fshape[0]) * np.arange(self.fshape[0],dtype='complex64')[:,np.newaxis])
            fHtsFull[:,1::] = fHtsFull[:,1::][:,::-1]
            return fHtsFull
        else:
            temp = np.exp((1j * (1+padLength) * 2*np.pi / self.fshape[0]) * np.arange(self.fshape[0])).astype('complex64')
            if True:
                result = jps.mirrorX(fHtsFull, temp)
            else:
                result = np.empty(fHtsFull.shape, dtype=fHtsFull.dtype)
                result[:,0] = fHtsFull[:,0].conj()*temp
                for i in range(1,fHtsFull.shape[1]):
                    result[:,i] = (fHtsFull[:,fHtsFull.shape[1]-i].conj()*temp)
            return result

    def MirrorYArray(self, Hts, fHtsFull):
        padLength = self.fshape[1] - Hts.shape[1]
        if False:
            fHtsFull = fHtsFull.conj() * np.exp(1j * (1+padLength) * 2*np.pi / self.fshape[1] * np.arange(self.fshape[1],dtype='complex64'))
            fHtsFull[1::] = fHtsFull[1::][::-1]
            return fHtsFull
        else:
            temp = np.exp((1j * (1+padLength) * 2*np.pi / self.fshape[1]) * np.arange(self.fshape[1])).astype('complex64')
            if True:
                result = jps.mirrorY(fHtsFull, temp)
            else:
                result = np.empty(fHtsFull.shape, dtype=fHtsFull.dtype)
                result[0] = fHtsFull[0].conj()*temp
                for i in range(1,fHtsFull.shape[0]):
                    result[i] = (fHtsFull[fHtsFull.shape[0]-i].conj()*temp)
            return result
        
    def convolvePart3(self, projection, bb, aa, Hts, fHtsFull, mirrorX, accum):
        # TODO: to make this work, I need the full matrix for fHts and then I need to slice it 
        # to the correct shape when I call through to special_fftconvolve here. Is fshape what I need?
        cpu0 = cpuTime('both')
        (accum,_,_,_) = special_fftconvolveNew(projection,bb,aa,self.Nnum,Hts,accum,fb=fHtsFull[self.rfslice])
        self.cpuTime += cpuTime('both')-cpu0
        if mirrorX:
            fHtsFull = self.MirrorXArray(Hts, fHtsFull)
            cpu0 = cpuTime('both')
            (accum,_,_,_) = special_fftconvolveNew(projection,self.Nnum-bb-1,aa,self.Nnum,Hts[::-1,:],accum,fb=fHtsFull[self.rfslice]) 
            self.cpuTime += cpuTime('both')-cpu0
        return accum

    def convolvePart2(self, projection, bb, aa, Hts, fHtsFull, mirrorY, mirrorX, accum):
        accum = self.convolvePart3(projection,bb,aa,Hts,fHtsFull,mirrorX,accum)
        if mirrorY:
            fHtsFull = self.MirrorYArray(Hts, fHtsFull)
            accum = self.convolvePart3(projection,bb,self.Nnum-aa-1,Hts[:,::-1],fHtsFull,mirrorX,accum)
        return accum

    def convolve(self, projection, bb, aa, Hts, accum):
        cent = int(self.Nnum/2)

        if useSymmetries:
            # Full symmetry
            mirrorX = (bb != cent)
            mirrorY = (aa != cent)
            transpose = ((aa != bb) and (aa != (self.Nnum-bb-1)))
        else:
            mirrorX = False
            mirrorY = False
            transpose = False
            
        # TODO: it would speed things up if I could avoid computing the full fft for Hts.
        # However, it's not immediately clear to me how to fill out the full fftn array from rfftn
        # in the case of a 2D transform.
        # For 1D it's the reversed conjugate, but for 2D it's more complicated than that.
        # (It's possible that it's actually nontrivial, in spite of the fact that
        # you can get away without it when only computing fft/ifft for real arrays.)
        if useSymmetries:
            fHtsFull = scipy.fftpack.fft2(Hts, self.fshape).astype('complex64')
        else:
            # scipy.fftpack has no rfft2, so use the numpy version here
            fHtsFull = np.fft.rfft2(Hts, self.fshape).astype('complex64')
        accum = self.convolvePart2(projection,bb,aa,Hts,fHtsFull,mirrorY,mirrorX, accum)
        if transpose:
            if (self.fshape[0] == self.fshape[1]):
                # For a square array, the FFT of the transpose is just the transpose of the FFT.
                # The copy() is because my C code currently can't cope with
                # a transposed array (non-contiguous strides in x)
                fHtsFull = fHtsFull.transpose().copy()    
            else:
                # For a non-square array, we have to compute the FFT for the transpose.
                fHtsFull = scipy.fftpack.fft2(Hts.transpose(), self.fshape).astype('complex64')

            # Note that mx,my need to be swapped following the transpose
            accum = self.convolvePart2(projection,aa,bb,Hts.transpose(),fHtsFull,mirrorX,mirrorY, accum) 
        return accum
    
def backwardProjectForZY(cc, bb, projection, HtPathFormat, HtCCshape, HtCCBB=None):
    f = open('%d_%d.txt'%(cc,bb), "w")
    t1 = time.time()
    singleJob = (len(projection.shape) == 2)
    if singleJob:   # Cope with both a single 2D plane and an array of multiple 2D planes to process independently
        projection = projection[np.newaxis,:,:]
    tempSliceBack = [None] * projection.shape[0]
    if HtCCBB is None:
        HtCCBB = np.memmap(HtPathFormat%cc, dtype='float32', mode='r', shape=HtCCshape)[bb]
        Nnum = HtCCBB.shape[0]*2-1
    else:
        Nnum = HtCCBB.shape[0]
    projector = Projector(projection[0], HtCCBB, Nnum)
    fshape = projector.fshape    # TODO: these three are just for back-compatibility - should tidy up once all is settled
    fslice = projector.fslice
    s1 = projector.s1
    projector.cpuTime = np.zeros(2)
    for aa in range(bb,int((Nnum+1)/2)):
        for n in range(projection.shape[0]):
            tempSliceBack[n] = projector.convolve(projection[n], bb, aa, HtCCBB[aa], tempSliceBack[n])
    t2 = time.time()
    f.write('%d\t%f\t%f\t%f\t%f\t%f\n' % (os.getpid(), t1, t2, t2-t1, projector.cpuTime[0], projector.cpuTime[1]))
    f.close()
    if singleJob:
        return (tempSliceBack[0], fshape, fslice, s1, cc, bb, t2-t1)
    else:
        return (np.array(tempSliceBack), fshape, fslice, s1, cc, bb, t2-t1)
    
def backwardProjectForZ(HtCC, cc, projection):
    tempSliceBack = None
    Nnum = HtCC.shape[1]
    if useSymmetries:
        r = range(int((Nnum+1)/2))
    else:
        r = range(Nnum)
    for bb in tqdm(r, leave=False, desc='Backward-project - y'):
        (result, fshape, fslice, s1, _, _, _) = backwardProjectForZY(cc, bb, projection, None, None, HtCC[bb])
        if (tempSliceBack is None):
            tempSliceBack = result
        else:
            tempSliceBack += result
    return special_fftconvolve_part3(tempSliceBack, fshape, fslice, s1)

def forwardProjectForZY(cc, bb, realspaceCC, HPathFormat, HCCshape, HCCBB=None):
    t1 = time.time()
    singleJob = (len(realspaceCC.shape) == 2)
    if singleJob:   # Cope with both a single 2D plane and an array of multiple 2D planes to process independently
        realspaceCC = realspaceCC[np.newaxis,:,:]
    result = [None] * realspaceCC.shape[0]
    if HCCBB is None:
        HCCBB = np.memmap(HPathFormat%cc, dtype='float32', mode='r', shape=HCCshape)[bb]
        Nnum = HCCBB.shape[0]*2-1
    else:
        Nnum = HCCBB.shape[0]
    projector = Projector(realspaceCC[0], HCCBB, Nnum)
    fshape = projector.fshape    # TODO: these three are just for back-compatibility - should tidy up once all is settled
    fslice = projector.fslice
    s1 = projector.s1
    for aa in range(bb,int((Nnum+1)/2)):
        for n in range(realspaceCC.shape[0]):
            result[n] = projector.convolve(realspaceCC[n], bb, aa, HCCBB[aa], result[n])
    t2 = time.time()
    if singleJob:
        return (result[0], fshape, fslice, s1, cc, bb, t2-t1)
    else:
        return (np.array(result), fshape, fslice, s1, cc, bb, t2-t1)
    
def forwardProjectForZ(HCC, cc, realspaceCC):
    TOTALprojection = None
    Nnum = HCC.shape[1]
    if useSymmetries:
        r = range(int((Nnum+1)/2))
    else:
        r = range(Nnum)
    for bb in tqdm(r, leave=False, desc='Forward-project - y'):
        (result, fshape, fslice, s1, _, _, _) = forwardProjectForZY(cc, bb, realspaceCC, None, None, HCC[bb])
        if (TOTALprojection is None):
            TOTALprojection = result
        else:
            TOTALprojection += result
    # Actually we only need to do this once, not separately for every z,
    # but for now I'll leave this here for symmetry with backwardProjectForZ
    # since it's not a bottleneck
    return special_fftconvolve_part3(TOTALprojection, fshape, fslice, s1)
    

# Temp tests for dual input
if False:
    testProjection = np.random.random(shape).astype(np.float32)
    testHtCC = Ht[13,int(Ht.shape[1]/2)-2:int(Ht.shape[1]/2)+3,int(Ht.shape[2]/2)-2:int(Ht.shape[2]/2)+3,CAindex[0,13]-1:CAindex[1,13], CAindex[0,13]-1:CAindex[1,13]]
    testResultOld = backwardProjectForZ_old(testHtCC, np.tile(testProjection[np.newaxis,:,:], (2,1,1)))

    
# Test the backprojection code against a slower definitive version
# (this code is here for now because this is where I have been working on stuff, but it could move)
# Note that on the first run, you need to run several cells after this one, before this here will run
# Random stand-in for the test PSF (immediately overridden by the real data on the next line)
testHtCC = np.random.random((5,5,30,30)).astype(np.float32)
testHtCC = Ht[13,int(Ht.shape[1]/2)-2:int(Ht.shape[1]/2)+3,int(Ht.shape[2]/2)-2:int(Ht.shape[2]/2)+3,CAindex[0,13]-1:CAindex[1,13], CAindex[0,13]-1:CAindex[1,13]]
for fd in [False, True]:
    for shape in [(200,200), (200,300)]:
        # Test both square and non-square, since they use different code
        testProjection = np.random.random(shape).astype(np.float32)
        if fd:
            testResultOld = forwardProjectForZ_old(testHtCC, testProjection)
            testResultNew = forwardProjectForZ(testHtCC, 0, testProjection)
        else:
            testResultOld = backwardProjectForZ_old(testHtCC, testProjection)
            testResultNew = backwardProjectForZ(testHtCC, 0, testProjection)
        comparison = np.max(np.abs(testResultOld - testResultNew))
        print('test result (should be <<1): %e' % comparison)
        if (comparison > 1e-4):
            print(" -> WARNING: disagreement detected")
        else:
            print(" -> OK")
        
print('Done')
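
On the TODO in `Projector.convolve` about filling out the full 2D transform from `rfftn`: for real input the 2D spectrum satisfies F[k0, k1] = conj(F[-k0 mod N0, -k1 mod N1]), so the missing columns can be rebuilt by conjugating and reversing along both axes. A minimal sketch with `numpy.fft`; the function name is made up, and whether this beats simply calling `fft2` has not been timed.

```python
import numpy as np

def full_fft2_from_rfft2(rf, fshape):
    """Rebuild the full fft2 of a real array from its rfft2 half-spectrum."""
    N0, N1 = fshape
    nh = N1 // 2 + 1                        # number of columns that rfft2 returns
    full = np.empty((N0, N1), dtype=rf.dtype)
    full[:, :nh] = rf
    # Column c >= nh comes from column N1 - c, with the row index negated modulo N0
    rows = (-np.arange(N0)) % N0
    cols = N1 - np.arange(nh, N1)
    full[:, nh:] = np.conj(rf[rows][:, cols])
    return full

rng = np.random.default_rng(3)
h = rng.random((5, 6))
for fshape in [(8, 10), (7, 9)]:
    rebuilt = full_fft2_from_rfft2(np.fft.rfft2(h, fshape), fshape)
    print(fshape, np.allclose(rebuilt, np.fft.fft2(h, fshape)))
```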

In [None]:
def backwardProjectACC(Ht, projection, CAindex, planes=None, numjobs=multiprocessing.cpu_count(), progress=tqdm, logPrint=True):
    singleJob = (len(projection.shape) == 2)
    if singleJob:   # Cope with both a single 2D plane and an array of multiple 2D planes to process independently
        projection = projection[np.newaxis,:,:]
    if planes is None:
        planes = range(Ht.shape[0])
    if progress is None:
        progress = noProgressBar        

    ru1 = cpuTime('both')

    Backprojection = np.zeros((Ht.shape[0], projection.shape[0], projection.shape[1], projection.shape[2]), dtype='float32')
        
    # Set up the work to iterate over each z plane
    work = []
    for cc in planes:
        Nnum = Ht[cc].shape[1]
        if useSymmetries:
            bbRange = range(int((Nnum+1)/2))
        else:
            bbRange = range(Nnum)
        for bb in bbRange:
            work.append((cc, bb, projection, HtPathFormat, HtReducedShape[cc]))

    # Run the work in parallel (joblib farms it out to worker processes)
    t0 = time.time()
    results = Parallel(n_jobs=numjobs)\
            (delayed(backwardProjectForZY)(*args) for args in progress(work, desc='Back-project - z', leave=False))
    ru2 = cpuTime('both')

    # Gather together and sum the results for each z plane
    t1 = time.time()
    fourierZPlanes = [None]*Ht.shape[0]
    elapsedTime = 0
    for (result, fshape, fslice, s1, cc, bb, t) in results:
        elapsedTime += t
        if fourierZPlanes[cc] is None:
            fourierZPlanes[cc] = result
        else:
            fourierZPlanes[cc] += result
    
    # Compute the inverse FFT for each z plane
    for cc in planes:
        HtCC =  Ht[cc, :, :, CAindex[0,cc]-1:CAindex[1,cc], CAindex[0,cc]-1:CAindex[1,cc]]
        (fshape, fslice, s1) = convolutionShape(projection, HtCC[0,0], Ht.shape[2])
        Backprojection[cc] = special_fftconvolve_part3(fourierZPlanes[cc], fshape, fslice, s1)        
    t2 = time.time()
    

    # Save some diagnostics
    if logPrint:
        print('work elapsed wallclock time %f'%(t1-t0))
        print('work elapsed thread time %f'%elapsedTime)
        print('work delta rusage:', ru2-ru1)
        print('FFTs took %f'%(t2-t1))
    
    with open('overall.txt', 'w') as f:
        f.write('%f\t%f\t%f\t%f\t%f\t%f\n' % (t0, t1, t1-t0, t2-t1, (ru2-ru1)[0], (ru2-ru1)[1]))

    if singleJob:
        return Backprojection[:,0]
    else:
        return Backprojection

def forwardProjectACC(H, realspace, CAindex, planes=None, numjobs=multiprocessing.cpu_count(), progress=tqdm, logPrint=True):
    singleJob = (len(realspace.shape) == 3)
    if singleJob:   # Cope with both a single 2D plane and an array of multiple 2D planes to process independently
        realspace = realspace[:,np.newaxis,:,:]
    if planes is None:
        planes = range(realspace.shape[0])
    if progress is None:
        progress = noProgressBar        

    # Set up the work to iterate over each z plane
    work = []
    for cc in planes:
        Nnum = H[cc].shape[1]
        bbRange = range((Nnum+1)//2)
        for bb in bbRange:
            work.append((cc, bb, realspace[cc], HPathFormat, HReducedShape[cc]))

    # Run the multithreaded work
    t0 = time.time()
    results = Parallel(n_jobs=numjobs)\
                (delayed(forwardProjectForZY)(*args) for args in progress(work, desc='Forward-project - z', leave=False))

    # Gather together and sum all the results
    t1 = time.time()
    fourierProjection = [None]*H.shape[0]
    elapsedTime = 0
    for (result, fshape, fslice, s1, cc, bb, t) in results:
        elapsedTime += t
        if fourierProjection[cc] is None:
            fourierProjection[cc] = result
        else:
            fourierProjection[cc] += result

    # Compute and accumulate the FFT for each z plane
    TOTALprojection = None
    for cc in planes:
        HCC =  H[cc, :, :, CAindex[0,cc]-1:CAindex[1,cc], CAindex[0,cc]-1:CAindex[1,cc]]
        (fshape, fslice, s1) = convolutionShape(realspace[:,0,:,:], HCC[0,0], H.shape[2])
        thisProjection = special_fftconvolve_part3(fourierProjection[cc], fshape, fslice, s1)        
        if TOTALprojection is None:
            TOTALprojection = thisProjection
        else:
            TOTALprojection += thisProjection
    t2 = time.time()
            
    # Print out some diagnostics
    if (logPrint):
        print('work elapsed wallclock time %f'%(t1-t0))
        print('work elapsed thread time %f'%elapsedTime)
        print('FFTs took %f'%(t2-t1))
        
    if singleJob:
        return TOTALprojection[0]
    else:
        return TOTALprojection

if False:
    # Temporary call to test parallelization
    temp = backwardProjectACC(Ht, inputImage, CAindex, planes=[0], numjobs=3)
    
if False:
    # Temporary code to test running with an image pair
    # This is maybe not a comprehensive test, but it runs with two different (albeit proportional)
    # images and checks that the result matches the result of two totally independent calls, each on a single array.
    candidate = np.tile(inputImage[np.newaxis,0,0], (2,1,1))
    candidate[1] *= 1.4
    temp = backwardProjectACC(Ht, candidate, CAindex, planes=None)
    dualRoundtrip = forwardProjectACC(H, temp, CAindex, planes=None)

    temp = backwardProjectACC(Ht, candidate[0], CAindex, planes=None, numjobs=1)
    firstRoundtrip = forwardProjectACC(H, temp, CAindex, planes=None, numjobs=1)    
    comparison = np.max(np.abs(firstRoundtrip - dualRoundtrip[0]))
    print('test result (should be <<1): %e' % comparison)
    if (comparison > 1e-6):
        print(" -> WARNING: disagreement detected")
    else:
        print(" -> OK")
    
    temp = backwardProjectACC(Ht, candidate[1], CAindex, planes=None, numjobs=1)
    secondRoundtrip = forwardProjectACC(H, temp, CAindex, planes=None, numjobs=1)    
    comparison = np.max(np.abs(secondRoundtrip - dualRoundtrip[1]))
    print('test result (should be <<1): %e' % comparison)
    if (comparison > 1e-6):
        print(" -> WARNING: disagreement detected")
    else:
        print(" -> OK")

In [None]:
def AnalyzeTestResults():
    with open('overall.txt') as f:
        csv_reader = csv.reader(f, delimiter='\t')
        for row in csv_reader:
            pass
    startTime = float(row[0])
    endTime = float(row[1])
    userTime = float(row[4])
    sysTime = float(row[5])

    rows = []
    for fn in glob.glob('*_*.txt'):
        with open(fn) as f:
            csv_reader = csv.reader(f, delimiter='\t')
            for row in csv_reader:
                pass
            rows.append(row)
    rows = np.array(rows).astype('float').transpose()
    firstPid = np.min(rows[0])
    rows[0] -= firstPid
    rows[1:3] -= startTime
    rows = rows[:,np.argsort(rows[1],kind='mergesort')]
    rows = rows[:,rows[0].argsort(kind='mergesort')]

    deadTimeStart = 0
    deadTimeMid = 0
    deadTimeEnd = 0
    threadWorkTime = 0
    thisThreadStartTime = 0
    longestThreadRunTime = 0
    longestThreadRunPid = -1
    latestStartTime = 0
    userTimeBreakdown = 0
    sysTimeBreakdown = 0
    for i in range(rows.shape[1]):
        pid = rows[0,i]
        t0 = rows[1,i]
        t1 = rows[2,i]
        userTimeBreakdown += rows[4,i]
        sysTimeBreakdown += rows[5,i]

        # print(pid, t0, t1)
        if (i == 0):
            deadTimeStart += t0
            thisThreadStartTime = t0
            latestStartTime = t0
        else:
            if (pid == rows[0,i-1]):
                deadTimeMid += t0 - rows[2,i-1]
            else:
                latestStartTime = max(latestStartTime, t0)
                thisThreadRunTime = rows[2,i-1]-thisThreadStartTime  # For previous pid
                if (thisThreadRunTime > longestThreadRunTime):
                    longestThreadRunPid = rows[0,i-1]
                    longestThreadRunTime = thisThreadRunTime
                thisThreadStartTime = t0
                deadTimeStart += t0
                deadTimeEnd += (endTime-startTime) - rows[2,i-1]
        threadWorkTime += t1-t0
        plt.plot([t0, t1], [pid, pid])
        plt.plot(t0, pid, 'x')
    thisThreadRunTime = t1-thisThreadStartTime
    if (thisThreadRunTime > longestThreadRunTime):
        print('Final thread', thisThreadRunTime, t1, thisThreadStartTime)
        longestThreadRunPid = pid
        longestThreadRunTime = thisThreadRunTime
    deadTimeEnd += (endTime-startTime) - rows[2,-1]
    print('Elapsed time', endTime-startTime)
    print('Longest thread run time', longestThreadRunTime, 'pid', int(longestThreadRunPid))
    print('Latest start time', latestStartTime)
    print('Thread work time', threadWorkTime)
    print('Dead time', deadTimeStart, deadTimeMid, deadTimeEnd)
    print(' Total', deadTimeStart + deadTimeMid + deadTimeEnd)
    print('User cpu time', userTime)
    print('System cpu time', sysTime)
    print('User cpu time for subset', userTimeBreakdown)
    print('System cpu time for subset', sysTimeBreakdown)

    with open('stats.txt', 'a') as f:
        f.write('%f\t%f\t%f\t%f\t%f\t%f\t%f\t%f\t%f\t%f\n' % (numJobsForTesting, endTime-startTime, threadWorkTime, \
                        longestThreadRunTime, latestStartTime, deadTimeStart, deadTimeMid, deadTimeEnd, userTime, sysTime))

    plt.xlim(0, endTime-startTime)
    plt.ylim(-0.5,np.max(rows[0])+0.5)
    plt.show()
    
if False:
    for numJobsForTesting in [3]:#range(1,4):#13):
        ru1 = cpuTime('both')
        temp = backwardProjectACC(Ht, inputImage, CAindex, numjobs=numJobsForTesting, planes=None)
        ru2 = cpuTime('both')
        print('overall delta rusage:', ru2-ru1)
    AnalyzeTestResults()
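
The dead-time bookkeeping in AnalyzeTestResults can be expressed more compactly by grouping the job intervals by pid. The following is a minimal sketch on made-up intervals, not a replacement for the function above; `summariseTimeline` and its input layout are assumptions for illustration.

```python
from collections import defaultdict

def summariseTimeline(jobs, elapsed):
    # jobs: list of (pid, start, end), with times relative to the start of the batch
    # elapsed: wallclock duration of the whole batch
    byPid = defaultdict(list)
    for pid, start, end in jobs:
        byPid[pid].append((start, end))
    work = deadStart = deadMid = deadEnd = 0.0
    for intervals in byPid.values():
        intervals.sort()
        deadStart += intervals[0][0]              # idle before this worker's first job
        deadEnd += elapsed - intervals[-1][1]     # idle after this worker's last job
        for (s0, e0), (s1, e1) in zip(intervals, intervals[1:]):
            deadMid += s1 - e0                    # idle between consecutive jobs
        work += sum(e - s for s, e in intervals)
    efficiency = work / (len(byPid) * elapsed)
    return work, deadStart, deadMid, deadEnd, efficiency

# Made-up example: two workers, three jobs, batch lasting 5 seconds
exampleJobs = [(0, 0.5, 2.0), (0, 2.5, 4.0), (1, 1.0, 3.0)]
summary = summariseTimeline(exampleJobs, 5.0)
```

For each worker, work plus the three dead-time terms adds up to the elapsed time, which gives a quick sanity check on the logged timestamps.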

In [None]:
def decomment(csvfile):
    for row in csvfile:
        raw = row.split('#')[0].strip()
        if raw: yield raw

def AnalyzeTestResults2(fn):
    rows = []
    with open(fn) as f:
        csv_reader = csv.reader(decomment(f), delimiter='\t')
        for row in csv_reader:
            rows.append(row)
    rows = np.array(rows).astype(float).transpose()   # np.float has been removed from recent NumPy releases

    plt.plot(rows[0], rows[2]/rows[2,0], label='work time')
    plt.plot(rows[0], np.sum(rows[5:8], axis=0)/(rows[0]*rows[1]), label='dead time')
    plt.plot(rows[0], rows[5]/(rows[0]*rows[1]), label='dead start')
    plt.plot(rows[0], rows[1]/(rows[1,0]/rows[0]), label='runtime excess')
    plt.ylim(0,2.5)
    plt.legend(loc=2)
    plt.show()

plt.title('Dummy work on empty arrays')
AnalyzeTestResults2('stats-dummy.txt')
plt.title('Real work')
AnalyzeTestResults2('stats-realwork.txt')
plt.title('Smaller memory footprint - no improvement')
AnalyzeTestResults2('stats-no-H.txt')
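
To make the three normalisations plotted by AnalyzeTestResults2 explicit, here is a small sketch on made-up numbers (they are not measurements). The column layout is simplified relative to stats.txt, and "runtime excess" is only meaningful if the first row is the single-job run, which is assumed here.

```python
import numpy as np

# Made-up illustrative values, one row per run.
# Columns: numJobs, elapsed, threadWorkTime, deadStart, deadMid, deadEnd
exampleStats = np.array([
    [1, 100.0,  96.0,  2.0, 1.0, 1.0],
    [2,  60.0, 108.0,  6.0, 2.0, 4.0],
    [4,  40.0, 132.0, 16.0, 4.0, 8.0],
]).T
numJobs, elapsed, workTime = exampleStats[0], exampleStats[1], exampleStats[2]
deadTime = exampleStats[3:6].sum(axis=0)

# Total thread work time relative to the first run (1.0 means no growth)
workGrowth = workTime / workTime[0]
# Fraction of the available worker time (numJobs * elapsed) spent idle
deadFraction = deadTime / (numJobs * elapsed)
# Elapsed time relative to ideal scaling from the first (single-job) run
runtimeExcess = elapsed / (elapsed[0] / numJobs)
```

With perfect scaling, workGrowth and runtimeExcess would stay at 1 and deadFraction at 0; separating the three helps distinguish slower threads from idle threads.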

# Test a single backprojection and compare against definitive version

In [None]:
planesToProcess = None#[4]
t0 = time.time()
if False:
    Htf = backwardProjectACC_original(Ht, inputImage, CAindex, planes=planesToProcess)
else:
    # cProfile.run returns None; the profile is saved to the file 'mystats' and loaded below
    cProfile.run('Htf = backwardProjectACC(Ht, inputImage, CAindex, planes=planesToProcess, numjobs=1)', 'mystats')
    p = pstats.Stats('mystats')
    p.strip_dirs().sort_stats('cumulative').print_stats(40)

print('iter 0 took %.1f secs' % (time.time()-t0))

In [None]:
# Compare against definitive version generated from Matlab
if planesToProcess is not None:
    print('WARNING: the following test is not valid because not all planes were processed')
definitive = tifffile.imread('Data/03_Reconstructed/exampleData/definitive_worm_crop_X15_backproject.tif')
definitive = np.transpose(definitive, axes=(0,2,1))
comparison = np.max(np.abs(definitive[4] - Htf[4]*10))
print('Compare against matlab result (should be <1.0): %f' % comparison)
if (comparison > 1.0):
    print(" -> WARNING: disagreement detected")
else:
    print(" -> OK")

#tifffile.imsave('Htf_backproject4.tif', np.transpose(Htf*1e2, axes=(0,2,1)))

# Test a full deconvolution and compare against definitive version

In [None]:
Xguess = Htf.copy()
maxIter = 8
deconvolvedResult = deconvRL(Htf, maxIter, Xguess)

In [None]:
# Compare against definitive version generated from Matlab
definitive = tifffile.imread('Data/03_Reconstructed/exampleData/definitive_worm_crop_X15_iter8.tif')
definitive = np.transpose(definitive, axes=(0,2,1))
comparison = np.max(np.abs(definitive - deconvolvedResult*1e3))
print('Compare against matlab result (should be <1.0): %f' % comparison)
if (comparison > 1.0):
    print(" -> WARNING: disagreement detected")
else:
    print(" -> OK")

#tifffile.imsave('iter8.tif', np.transpose(Xguess*1e3, axes=(0,2,1)))

# Solve for flow field (single-plane toy example)

In [None]:
# Generate a single synthetic object (the image pair is derived from it later),
# which for now consists of 10 random Gaussian spots
from scipy.ndimage import gaussian_filter   # scipy.ndimage.filters is a deprecated namespace
numSpots = 10
obj = np.zeros((1, 200,200))
obj[0, (np.random.random(numSpots)*obj.shape[1]).astype('int'), (np.random.random(numSpots)*obj.shape[2]).astype('int')] = 1
obj = np.pad(obj, ((0,0),(20,20),(20,20)), 'constant')
obj = gaussian_filter(obj, sigma=(0,8,8))
plt.imshow(obj[0])

In [None]:
def forwardProjectACC_PIV(H, obj, CAindex, shiftDescription):
    # Compute the AB images obtained from the single object we are provided with
    # (with the B image being of the object shifted by shiftDescription).
    # We give each image half the intensity in order to conserve energy.
    dualObject = np.tile(obj[:,np.newaxis,:,:] / 2.0, (1,2,1,1))
    dualObject[:,1,:,:] = ShiftObject(dualObject[:,1,:,:], shiftDescription)
    return forwardProjectACC(H, dualObject, CAindex, logPrint=False, progress=None)

def dualBackwardProjectACC_PIV(Ht, dualProjection, CAindex, shiftDescription):
    # Compute the reverse transform given the AB images (B image shifted by shiftDescription).
    # First we do the reverse transformation on both images
    dualObject = backwardProjectACC(Ht, dualProjection, CAindex, logPrint=False, progress=None)
    # Now we reverse the shift on the B object
    # (np.asarray so that the negation also works if a tuple or list is passed in)
    dualObject[:,1,:,:] = ShiftObject(dualObject[:,1,:,:], -np.asarray(shiftDescription))
    # Ideally the two objects would now match, but in practice there will be discrepancies,
    # especially if we are not using the correct shiftDescription.
    # We return both objects unmerged. To match the transpose of the forward operation,
    # the caller must add the two objects and divide by 2 (see fusedBackwardProjectACC_PIV).
    return dualObject

def fusedBackwardProjectACC_PIV(Ht, dualProjection, CAindex, shiftDescription):
    dualObject = dualBackwardProjectACC_PIV(Ht, dualProjection, CAindex, shiftDescription)
    result = np.sum(dualObject, axis=1) / 2.0     # Merge the two backprojections
    return result

def deconvRL_PIV(H, Ht, imageAB, CAindex, maxIter, Xguess, shiftDescription):
    # Xguess is our single combined guess of the object
    Xguess = Xguess.copy()    # Because we will be updating it, and caller may not always be expecting that
    for i in tqdm(range(maxIter), desc='RL deconv'):
        t0 = time.time()
        relativeBlurDual = imageAB / forwardProjectACC_PIV(H, Xguess, CAindex, shiftDescription)
        Xguess *= fusedBackwardProjectACC_PIV(Ht, relativeBlurDual, CAindex, shiftDescription)
        Xguess[np.isnan(Xguess)] = 0
        t1 = time.time() - t0
    return Xguess

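def RollNoninteger(obj, amount, axis=0):
    # Circular shift by a non-integer amount, linearly interpolating between
    # the two neighbouring integer shifts.
    # Use floor rather than int(): int() truncates towards zero, which for a negative
    # non-integer amount gives frac < 0 and so extrapolates instead of interpolating.
    intAmount = int(np.floor(amount))
    frac = amount - intAmount
    result1 = np.roll(obj, intAmount, axis=axis)
    result2 = np.roll(obj, intAmount+1, axis=axis)
    return result1 * (1-frac) + result2 * frac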
    
def ShiftObject(obj, shiftYX):
    # Transform a 3D object according to the flow information provided in shiftYX
    # For now I just consider a uniform translation in xy
    # 
    # TODO: We need to worry about conserving energy during the shift. 
    # For now I will do a circular shift in order to avoid having to worry about this!
    result = RollNoninteger(obj, shiftYX[0], axis=obj.ndim-2)
    return RollNoninteger(result, shiftYX[1], axis=obj.ndim-1)
  
# Generate a synthetic shift in the B image
shiftDescription = np.array([-10,20])
dualObject = np.tile(obj[:,np.newaxis,:,:], (1,2,1,1))
dualObject[:,1,:,:] = ShiftObject(dualObject[:,1,:,:], shiftDescription)
plt.imshow(dualObject[0,0])
plt.show()
plt.imshow(dualObject[0,1])
plt.show()
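
A sub-pixel circular shift by linear interpolation can be checked on a tiny 1D example. The helper below is a hypothetical standalone sketch (`rollFractional` is not part of the notebook's code); it splits the shift with floor so that the fractional weight stays in [0, 1) for negative shifts too, and it checks that the total intensity and the centroid behave as expected.

```python
import numpy as np

def rollFractional(a, amount, axis=0):
    # Hypothetical helper: circular shift by a non-integer amount,
    # blending the two neighbouring integer shifts.
    lower = int(np.floor(amount))
    frac = amount - lower          # always in [0, 1)
    return np.roll(a, lower, axis=axis) * (1 - frac) + np.roll(a, lower + 1, axis=axis) * frac

# A single spike at index 3, shifted by -1.25 samples
spike = np.zeros(8)
spike[3] = 1.0
shifted = rollFractional(spike, -1.25)

total = shifted.sum()                            # intensity is conserved
centroid = np.sum(np.arange(8) * shifted)        # centroid moves from 3 to 1.75
```

Because the two weights always sum to 1, a circular shift of this kind conserves the total intensity, which is the property the TODO in ShiftObject is concerned with.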

In [None]:
# Alternative: zPlaneToModel = H.shape[0]-1   # Modelling native focal plane
zPlaneToModel = 7   # Modelling some way from the native focal plane
thisH = H[zPlaneToModel:zPlaneToModel+1]
thisHt = Ht[zPlaneToModel:zPlaneToModel+1]
pivCAindex = CAindex[:,zPlaneToModel:zPlaneToModel+1]
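
Richardson-Lucy relies on the backward projector being the transpose of the forward projector, as noted in the comments in dualBackwardProjectACC_PIV. One way to check this numerically is a dot-product test: for random x and y, sum(forward(x)*y) should equal sum(x*backward(y)). The sketch below uses toy operators (a circular convolution with a random kernel plus an integer circular shift) as stand-ins; `toyForward`, `toyBackward` and `toyPsf` are assumptions for illustration, not the real projectors.

```python
import numpy as np

rng = np.random.default_rng(0)
toyPsf = rng.random((5, 5))     # hypothetical PSF, stands in for H

def circConv(x, k):
    # Circular convolution via FFT (kernel zero-padded to the image shape)
    return np.real(np.fft.ifft2(np.fft.fft2(x) * np.fft.fft2(k, s=x.shape)))

def circCorr(y, k):
    # Transpose of circConv for a real kernel: circular correlation
    return np.real(np.fft.ifft2(np.fft.fft2(y) * np.conj(np.fft.fft2(k, s=y.shape))))

def toyForward(x, shiftYX):
    # A image from half the object, B image from half the shifted object
    a = circConv(x / 2.0, toyPsf)
    b = circConv(np.roll(x / 2.0, shiftYX, axis=(0, 1)), toyPsf)
    return np.stack([a, b])

def toyBackward(y, shiftYX):
    # Back-project both images, undo the shift on B, then add and divide by 2
    a = circCorr(y[0], toyPsf)
    b = np.roll(circCorr(y[1], toyPsf), (-shiftYX[0], -shiftYX[1]), axis=(0, 1))
    return (a + b) / 2.0

x = rng.random((32, 32))
y = rng.random((2, 32, 32))
shiftYX = (-3, 7)
lhs = np.sum(toyForward(x, shiftYX) * y)
rhs = np.sum(x * toyBackward(y, shiftYX))
```

If the same test were run with the real forwardProjectACC_PIV and fusedBackwardProjectACC_PIV, a mismatch between lhs and rhs would point to a scaling or shift-direction inconsistency between the two.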

In [None]:
def ScoreShift(candidateShiftYX, imageAB):
    print('======== Score shift ========', candidateShiftYX)
    thisH = H[zPlaneToModel:zPlaneToModel+1]
    thisHt = Ht[zPlaneToModel:zPlaneToModel+1]
    initialEstimate = fusedBackwardProjectACC_PIV(thisHt, imageAB, pivCAindex, candidateShiftYX)
    res = deconvRL_PIV(thisH, thisHt, imageAB, pivCAindex, maxIter=6, Xguess=initialEstimate, shiftDescription=candidateShiftYX)
    candidateImageAB = forwardProjectACC_PIV(thisH, res, pivCAindex, candidateShiftYX)
    # Evaluate the match using SSD
    renormHack = np.average(candidateImageAB) / np.average(imageAB)
    ssdScore = np.sum((candidateImageAB/renormHack - imageAB)**2)
    print('return', ssdScore, 'for', candidateShiftYX)
    with open('scores.txt', 'a') as f:
        f.write('%f\t%f\t%f\n' % (ssdScore, candidateShiftYX[0], candidateShiftYX[1]))

    return ssdScore

if False:
    # Run the actual optimizer to find the shift value for an input frame pair
    actualShifts = [(1,0)]#[(10,4), (0,0), (-4,17), (2,-7)]
    for actualShift in actualShifts:
        # Generate a camera image pair based on a chosen shift transform
        imageAB = forwardProjectACC_PIV(thisH, obj, pivCAindex, actualShift)
        # Optimize to obtain the best-matching shift
        shift = scipy.optimize.minimize(ScoreShift, (0,0), args=(imageAB,), options={'eps': 1e-03})   # args must be a tuple
        print('shift', shift['x'], 'for actual shift', actualShift)

In [None]:
# To understand how the optimizer is behaving, scan the search space rather than optimizing
actualShift = (1,0)
# Generate a camera image pair based on a chosen shift transform
imageAB = forwardProjectACC_PIV(thisH, obj, pivCAindex, actualShift)
scores = []
for dx in range(-2,4,1):
    scores.append([dx, ScoreShift(np.array([dx,0]), imageAB)])
scores = np.array(scores)
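
If the scanned scores are close to quadratic near the minimum, a parabola fitted to the scan gives a sub-pixel estimate of the best shift without running the optimizer. A minimal sketch on synthetic scores (the numbers are made up, and `parabolaMinimum` is a hypothetical helper):

```python
import numpy as np

def parabolaMinimum(x, y):
    # Least-squares fit of y = a*x^2 + b*x + c, returning the x of the vertex
    a, b, c = np.polyfit(x, y, 2)
    if a <= 0:
        raise ValueError('fitted parabola has no minimum')
    return -b / (2 * a)

# Synthetic scan with its minimum at a shift of 1.0 (made-up scores)
trialShifts = np.arange(-2, 4)
fakeScores = 3.0 * (trialShifts - 1.0)**2 + 5.0
estimatedShift = parabolaMinimum(trialShifts, fakeScores)
```

With the real scan this would be called as parabolaMinimum(scores[:,0], scores[:,1]); the estimate is only as good as the quadratic approximation over the scanned range.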

In [None]:
# With the native focal plane, this gives a nice clear quadratic minimum, although biased to about 1.5 (true shift 1.0).
# I should remember that I don't expect a perfect result in the native focal plane!
# That said, the very small shift values being trialled by the optimizer on my mac pro do not bode well.
# This needs more investigation (and I should log each trial that is made).
# With z plane index 7, there is a very clear quadratic minimum at 1.0. This is good news!

# Next steps:
# 1. Investigate why the mac pro optimizer did not converge to the correct minimum.
# 2. See if there's a way to allow optimizer to stop when converged to 0.1 accuracy(!), and/or investigate which is the best optimizer
# 3. Write code to compare against PIV performed on independent reconstructions
# 4. Expand the PIV code to more than one z plane
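
For step 3, a conventional baseline is to estimate the shift from the peak of the cross-correlation between two independently reconstructed images. The sketch below is a deliberately simple version under stated assumptions: integer-pixel accuracy only, a single global (circular) shift, and a random test image in place of real reconstructions. `estimateIntegerShift` is a hypothetical helper; real PIV would typically use interrogation windows and sub-pixel peak fitting.

```python
import numpy as np

def estimateIntegerShift(imageA, imageB):
    # Circular cross-correlation via FFT; the peak lies at the displacement of B relative to A
    spectrum = np.conj(np.fft.fft2(imageA)) * np.fft.fft2(imageB)
    xcorr = np.real(np.fft.ifft2(spectrum))
    peak = np.unravel_index(np.argmax(xcorr), xcorr.shape)
    # Map peak indices in the upper half of each axis to negative shifts
    return tuple(int(p) if p <= n // 2 else int(p) - n for p, n in zip(peak, xcorr.shape))

rng = np.random.default_rng(1)
imageA = rng.random((64, 64))                       # stand-in for a reconstruction
imageB = np.roll(imageA, (-10, 20), axis=(0, 1))    # same image, circularly shifted in (y, x)
estimatedShiftYX = estimateIntegerShift(imageA, imageB)
```

The returned tuple is in (y, x) order, matching the convention used for shiftDescription.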

In [None]:
plt.plot(scores[:,0], scores[:,1], 'x')