NOTE: my C code can't cope with an array that has been transposed (probably because it assumes adjacent strides in x). I should probably fix that, though I doubt that calling .copy() on the transposed array, which is what I do at the moment, is a performance issue. In the case of square inputs, I should really make the transpose the final operation anyway. However, it looks as if a decent chunk of the FFT time is actually being spent in the other FFTs (for the reduced arrays) anyway!
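
To illustrate the stride issue: a transposed NumPy array is a view with swapped strides, so code that assumes unit stride along the last axis will misread it. The sketch below uses a small made-up array (not the real FFT data) and shows that `np.ascontiguousarray`, like `.copy()`, produces a fresh C-ordered buffer.

```python
import numpy as np

# Small illustrative array (hypothetical; stands in for an FFT result)
a = np.arange(6, dtype='complex64').reshape(2, 3)
t = a.transpose()                 # a view: no data is moved

print(t.flags['C_CONTIGUOUS'])    # contiguity flag of the transposed view
print(a.strides, t.strides)       # the strides are simply swapped

c = np.ascontiguousarray(t)       # copies into a C-ordered buffer, like t.copy()
print(c.flags['C_CONTIGUOUS'], c.strides)
```

The copy costs one pass over the array, which is probably small next to the FFTs themselves.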

### Performance investigation

Actual thread execution time seems to grow considerably with the number of threads, i.e. efficiency falls. It's less than 50% efficient by the time I hit 12 threads (on an 8-core machine). I am not sure how to work out the cause of that. I could go back to working on dummy data (no transfers between processes) and see if that makes a difference to *that* in particular. (I think I may have looked only at the dead-time overheads, which are also an issue.)
I looked at user and system CPU time, and with Instruments. It looks like 20% of the time is spent in madvise (MacBook, 2 threads). I am not sure exactly why or where that is happening. It seems to be related to Python memory management in some way. I should check whether that grows with the number of threads on the Mac Pro, and whether it is the same when I use dummy work blocks rather than passing to subprocesses.

-> revisit this now I am using mmap rather than pickle - hopefully some of this is now fixed.
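
One way to put a number on this, assuming "efficiency" means the summed per-job execution time with a single worker divided by the same sum with n workers. This is an assumed definition; `parallel_efficiency` is a hypothetical helper, and the numbers below are placeholders, not measurements.

```python
def parallel_efficiency(thread_time_1, thread_time_n):
    """Summed per-job execution time with 1 worker, divided by that with n workers."""
    return thread_time_1 / float(thread_time_n)

# Placeholder numbers purely for illustration (NOT measurements from this notebook)
eff = parallel_efficiency(100.0, 210.0)
print('efficiency: %.0f%%' % (100 * eff))
```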

### Performance improvements to make

Consider making the transpose the final operation in the case of square arrays (since it's probably faster than reversing an array, although it may impact subsequent FFT performance?).
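
The square-array restriction comes from the fact that the FFT of a transposed array equals the transpose of the FFT only when the padded FFT shape is square. A minimal check with `numpy.fft` and a made-up PSF (the real code uses pyfftw and the stored PSFs):

```python
import numpy as np

rng = np.random.default_rng(0)
psf = rng.random((5, 7))          # made-up, non-square PSF

# Square padded FFT shape: the transpose commutes with the FFT
square = (8, 8)
lhs = np.fft.fft2(psf.T, square)
rhs = np.fft.fft2(psf, square).T
print(np.allclose(lhs, rhs))

# Non-square padded shape: the shapes no longer match,
# so the FFT of the transposed PSF has to be computed separately
nonsquare = (8, 12)
shape_a = np.fft.fft2(psf.T, nonsquare).shape
shape_b = np.fft.fft2(psf, nonsquare).T.shape
print(shape_a, shape_b)
```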

In [None]:
import numpy as np
import numexpr as ne
import scipy.ndimage, scipy.optimize, scipy.io
from scipy.ndimage.filters import convolve
from scipy.signal import convolve2d, fftconvolve
from scipy.optimize import Bounds
import os, sys, time, warnings
import matplotlib.pyplot as plt
%matplotlib inline
import tifffile
import h5py
import multiprocessing
from functools import partial
from joblib import Parallel, delayed
import cProfile, pstats
import glob, csv
from tqdm import tqdm_notebook as tqdm
from numba import jit
sys.path.insert(0, 'py_symmetry')
import py_symmetry as jps
from skimage.transform import PiecewiseAffineTransform, warp
import j_py_sad_correlation as jpsad

from __future__ import print_function

# I don't know if these are necessary, but it has been suggested that low-level threading
# does not interact well with the joblib Parallel feature.
os.environ['MKL_NUM_THREADS'] = '1'
os.environ['OMP_NUM_THREADS'] = '1'
os.environ['MKL_DYNAMIC'] = 'FALSE'

try:
    os.mkdir('perf_diags')
except OSError:
    pass  # Probably the directory already exists

# My code expects these to start as None, and I want to do this right at the top
# to minimize the risk of accidentally resetting them to None after I've spent hours
# accumulating valuable data in them!
shiftHistoryJoint = None
shiftHistoryRaw = None
shiftHistoryNaive = None

In [None]:
matPath = 'PSFmatrix/PSFmatrix_M22.2NA0.5MLPitch125fml3125from-110to110zspacing4Nnum19lambda520n1.33.mat'

if False:
    warnings.warn('WARNING: Switched to faster matrix for testing')
    matPath = 'PSFmatrix/PSFmatrix_M40NA0.95MLPitch150fml3000from-26to0zspacing2Nnum15lambda520n1.0.mat'
elif True:
    warnings.warn('WARNING: Switched to faster and closer-spaced matrix for testing')
    matPath = 'PSFmatrix/PSFmatrix_M40NA0.95MLPitch150fml3000from-13to0zspacing0.5Nnum15lambda520n1.0.mat'   

In [None]:
mmapPath = os.path.splitext(matPath)[0]
try:
    os.mkdir(mmapPath)
except OSError:
    pass  # Probably the directory already exists

_HPathFormat = mmapPath+'/H{z:02d}.array'
_HtPathFormat = mmapPath+'/Ht{z:02d}.array'
_HReducedShape = []
_HtReducedShape = []
if True:
    # Load the matrices from the .mat file.
    # This is slow since they must be decompressed and are rather large! (9.5GB each, in single-precision FP)
    with h5py.File(matPath, 'r') as f:
        print('Load CAindex')
        sys.stdout.flush()
        _CAindex = f['CAindex'][()].astype('int')   # [()] reads the whole dataset (.value was removed in h5py 3)
        
        print('Load H')
        sys.stdout.flush()
        _H = f['H'][()].astype('float32')   # [()] reads the whole dataset (.value was removed in h5py 3)
        Nnum = _H.shape[2]
        aabbRange = int((Nnum+1)/2)
        for cc in tqdm(range(_H.shape[0]), desc='memmap H'):
            HCC =  _H[cc, :aabbRange, :aabbRange, _CAindex[0,cc]-1:_CAindex[1,cc], _CAindex[0,cc]-1:_CAindex[1,cc]]
            _HReducedShape.append(HCC.shape)
            a = np.memmap(_HPathFormat.format(z=cc), dtype='float32', mode='w+', shape=HCC.shape)
            a[:,:,:,:] = HCC[:,:,:,:]
            del a
        #del _H        # H is needed for old code
        
        print('Load Ht')
        sys.stdout.flush()
        _Ht = f['Ht'][()].astype('float32')   # [()] reads the whole dataset (.value was removed in h5py 3)
        for cc in tqdm(range(_Ht.shape[0]), desc='memmap Ht'):
            HtCC =  _Ht[cc, :aabbRange, :aabbRange, _CAindex[0,cc]-1:_CAindex[1,cc], _CAindex[0,cc]-1:_CAindex[1,cc]]
            _HtReducedShape.append(HtCC.shape)
            a = np.memmap(_HtPathFormat.format(z=cc), dtype='float32', mode='w+', shape=HtCC.shape)
            a[:,:,:,:] = HtCC[:,:,:,:]
            del a
        #del _Ht        # Ht is needed for old code
    np.save(mmapPath+'/HReducedShape.npy', _HReducedShape)
    np.save(mmapPath+'/HtReducedShape.npy', _HtReducedShape)
else:
    _HReducedShape = np.load(mmapPath+'/HReducedShape.npy')
    _HtReducedShape = np.load(mmapPath+'/HtReducedShape.npy')
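
A raw `np.memmap` file stores neither shape nor dtype, which is why the reduced shapes are saved separately above. A minimal round-trip sketch, using a temporary directory and a made-up array rather than the real PSF data:

```python
import os
import tempfile
import numpy as np

block = np.arange(24, dtype='float32').reshape(2, 3, 4)   # made-up stand-in for one z plane

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, 'H00.array')

    # Write: mode 'w+' creates the file with exactly the requested size
    w = np.memmap(path, dtype='float32', mode='w+', shape=block.shape)
    w[...] = block
    w.flush()
    del w

    # Read: shape and dtype must be supplied again, since the file is just raw bytes
    r = np.memmap(path, dtype='float32', mode='r', shape=block.shape)
    roundtrip_ok = bool(np.array_equal(r, block))
    file_size = os.path.getsize(path)
    del r

print(roundtrip_ok, file_size)
```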

In [None]:
import pyfftw
pyfftw.interfaces.cache.enable()
pyfftw.interfaces.cache.set_keepalive_time(10.0)

if False:
    # Old FFT code
    def myFFT2(mat, shape):
        # Perform a 'float' FFT on the matrix we are passed.
        # It would probably be faster if there was a way to perform the FFT natively on the 'float' type,
        # but scipy does not seem to support that option
        #
        # With my Mac Pro install, we hit a FutureWarning within scipy.
        # This wrapper just suppresses that warning.
        with warnings.catch_warnings():
            warnings.simplefilter("ignore")
            return scipy.fftpack.fft2(mat, shape).astype('complex64')
else:
    # New FFT code based on FFTW.
    # It may be possible to speed this up slightly by using their "proper" interface
    # rather than the scipy-like interface (see pyfftw documentation)
    def myFFT2(mat, shape):
        return pyfftw.interfaces.scipy_fftpack.fft2(mat, shape)

    def myIFFT2(mat, shape):
        return pyfftw.interfaces.numpy_fft.irfft2(mat, shape)
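
Note that `myFFT2` returns the full complex spectrum, while `irfft2` expects only the non-redundant half along the last axis, so the spectrum has to be cropped before the inverse transform. A sketch of that relationship, using `numpy.fft` as a stand-in for the pyfftw interfaces and made-up data:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((6, 8))                       # made-up real-valued image

full = np.fft.fft2(x)                        # full spectrum, shape (6, 8)
half = full[:, :x.shape[1] // 2 + 1]         # crop to the rfft2 layout, shape (6, 5)
matches_rfft = np.allclose(half, np.fft.rfft2(x))

# irfft2 needs the original shape, since it cannot be inferred from the half-spectrum
back = np.fft.irfft2(half, s=x.shape)
print(matches_rfft, np.allclose(back, x))
```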

In [None]:
class HMatrix:
    def __init__(self, HPathFormat, HtPathFormat, HReducedShape, numZ=None, zStart=0):
        self.HPathFormat = HPathFormat
        self.HtPathFormat = HtPathFormat
        self.HReducedShape = HReducedShape   # Same for Ht
        if numZ is not None:
            self.numZ = numZ
        else:
            self.numZ = len(HReducedShape)
        self.zStart = zStart
        self.Hcache = dict()
        self.cacheHits = 0
        self.cacheMisses = 0
        self.cacheSize = 0
  
    def Hcc(self, cc, useHt):
        if useHt:
            pathFormat = self.HtPathFormat
        else:
            pathFormat = self.HPathFormat
        result = np.memmap(pathFormat.format(z=cc+self.zStart), dtype='float32', mode='r', shape=tuple(self.HReducedShape[cc+self.zStart]))
        return result

    def fH_uncached(self, cc, bb, aa, useHt, transposePSF, fshape):
        if transposePSF:
            return myFFT2(self.Hcc(cc, useHt)[bb, aa].transpose(), fshape)
        else:
            return myFFT2(self.Hcc(cc, useHt)[bb, aa], fshape)
    
    def fH(self, cc, bb, aa, useHt, transposePSF, fshape):
        key = '%d,%d,%d,%d,%d'%(cc, bb, aa, int(useHt), int(transposePSF))
        if not key in self.Hcache:
            result = self.fH_uncached(cc, bb, aa, useHt, transposePSF, fshape)
            self.Hcache[key] = result
            self.cacheSize += result.nbytes
            self.cacheMisses += 1
        else:
            self.cacheHits += 1
        return self.Hcache[key]
    
    def IterableBRange(self, cc):
        return range(self.HReducedShape[cc+self.zStart][0])
    
    def PSFShape(self, cc):
        return (self.HReducedShape[cc+self.zStart][2], self.HReducedShape[cc+self.zStart][3])
        
    def Nnum(self, cc):
        return self.HReducedShape[cc+self.zStart][0]*2-1
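
Only the first (Nnum+1)/2 rows and columns of PSFs are stored; the remaining lenslet pixels are reached by mirroring and transposing. The sketch below (assuming odd Nnum; `covered_positions` is a hypothetical helper, not part of the notebook) enumerates the positions reached from each stored (bb, aa) with aa >= bb and counts how often each lenslet pixel is visited.

```python
from collections import Counter

def covered_positions(Nnum):
    """Count visits to each (row, col) when expanding stored PSFs by symmetry (odd Nnum)."""
    half = (Nnum + 1) // 2
    counts = Counter()
    for bb in range(half):
        for aa in range(bb, half):
            # Mirroring; the sets collapse to one element on the centre row/column
            rows = {bb, Nnum - 1 - bb}
            cols = {aa, Nnum - 1 - aa}
            for r in rows:
                for c in cols:
                    counts[(r, c)] += 1
                    if aa != bb:
                        # Transposed partner, only needed off the diagonal
                        counts[(c, r)] += 1
    return counts

counts = covered_positions(19)
print(len(counts), set(counts.values()))
```

If every pixel is visited exactly once, the stored subset is enough to reconstruct the full Nnum x Nnum set.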

In [None]:
# Load the input image
LFmovie = tifffile.imread('Data/02_Rectified/exampleData/20131219WORM2_small_full_neg_X1_N15_cropped_uncompressed.tif')
LFmovie = LFmovie.transpose()[np.newaxis,:,:]

LFIMG = LFmovie[0].astype('float32')
if True:
    # Actual (cropped) image loaded from disk
    inputImage = LFIMG
else:
    inputImage = np.tile(LFIMG,(2,2))

## Objects stored in the .mat file

### Optical parameters from GUI: [? means I am not sure if or where it is stored]

M<br>
NA<br>
d    "fml" in GUI (stored here in units of m)<br>
pixelPitch is "ML pitch" / "Nnum" (stored here in units of m)<br>
? n<br>
? wavelength<br>

### User parameters from GUI:

OSR<br>
zspacing<br>
? z-min<br>
? z-max<br>
Nnum<br>


### Misc parameter:

fobj (can presumably be deduced from mag, NA etc?)<br>

### The actual arrays:

H:             shape (56, 19, 19, 343, 343), type "f4"<br>
Ht:            shape (56, 19, 19, 343, 343), type "f4"<br>

### Information about object space:

x1objspace:    x pixel positions in object space (19 elements across one lenslet)<br>
x2objspace:    y pixel positions in object space (19 elements across one lenslet)<br>
x3objspace:    z pixel positions in object space (56 z planes)<br>
x1space:       x pixel positions in lenslet space (19 elements across one lenslet)<br>
x2space:       y pixel positions in lenslet space (19 elements across one lenslet)<br>

### Not sure what these are exactly:

CAindex:       shape (2, 56) - something about the start and end index of the PSF array, for each z plane.<br>
CP:            shape (343, 1)<br>
MLARRAY:       shape (1141, 1141), type "|V16"<br>
objspace:      shape (56, 1, 1)<br>
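settingPSF:    You would think this contains the GUI parameters, but e.g. print(f['settingPSF']['M'].value) gives a strange 3x1 array [50, 50, 46, 50] etc...? (Those values are the ASCII codes for the characters '22.2', so the parameters may be stored as strings.)<br>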


In [None]:
# Note: I am a little unsure how to interpret the arrays I have loaded from the .mat.
# From looking at how H and CAindex are accessed, it looks as if the shapes I have loaded
# are the reversal of the shape ordering as expected in Matlab.
# I suppose that makes sense given that matlab is column-major in its array accesses.
# The data has been loaded from disk in the order it is *stored*,
# and I therefore need to flip around all the matlab array index ordering 
# (e.g. matlabArray(1,2,3) becomes pythonArray[3,2,1], before adjusting for 0-based indexing)
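
The index reversal can be checked with a small made-up array: data laid out column-major (as MATLAB stores it) and then read back row-major ends up with its axes reversed. This is a NumPy-only sketch and does not touch the real .mat file.

```python
import numpy as np

matlab_shape = (2, 3, 4)
# m[i, j, k] plays the role of the MATLAB array (0-based indices here)
m = np.arange(24).reshape(matlab_shape, order='F')

# The values in the order they sit in storage (column-major)
raw = m.ravel(order='F')

# Reading the same storage row-major gives the reversed shape
p = raw.reshape(matlab_shape[::-1])

print(p.shape)
print(p[3, 2, 1] == m[1, 2, 3])
print(np.array_equal(p, m.transpose()))
```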

In [None]:
import resource

def noProgressBar(work, **kwargs):
    # Dummy function to be used in place of tqdm when we don't want to show a progress bar
    return work    

def cpuTime(kind):
    rus = resource.getrusage(resource.RUSAGE_SELF)    
    ruc = resource.getrusage(resource.RUSAGE_CHILDREN)
    if (kind == 'self'):
        return np.array([rus.ru_utime, rus.ru_stime])
    elif (kind == 'children'):
        return np.array([ruc.ru_utime, ruc.ru_stime])
    else:
        return np.array([rus.ru_utime+ruc.ru_utime, rus.ru_stime+ruc.ru_stime])

In [None]:
from scipy._lib._version import NumpyVersion
from numpy.fft import fft, fftn, rfft, rfftn, irfftn
_rfft_mt_safe = (NumpyVersion(np.__version__) >= '1.9.0.dev-e24486e')

def _next_regular(target):
    """
    Find the next regular number greater than or equal to target.
    Regular numbers are composites of the prime factors 2, 3, and 5.
    Also known as 5-smooth numbers or Hamming numbers, these are the optimal
    size for inputs to FFTPACK.

    Target must be a positive integer.
    """
    if target <= 6:
        return target

    # Quickly check if it's already a power of 2
    if not (target & (target-1)):
        return target

    match = float('inf')  # Anything found will be smaller
    p5 = 1
    while p5 < target:
        p35 = p5
        while p35 < target:
            # Ceiling integer division, avoiding conversion to float
            # (quotient = ceil(target / p35))
            quotient = -(-target // p35)

            # Quickly find next power of 2 >= quotient
            try:
                p2 = 2**((quotient - 1).bit_length())
            except AttributeError:
                # Fallback for Python <2.7
                p2 = 2**(len(bin(quotient - 1)) - 2)

            N = p2 * p35
            if N == target:
                return N
            elif N < match:
                match = N
            p35 *= 3
            if p35 == target:
                return p35
        if p35 < match:
            match = p35
        p5 *= 5
        if p5 == target:
            return p5
    if p5 < match:
        match = p5
    return match

def _centered(arr, newsize):
    # Return the center newsize portion of the array.
    currsize = np.array(arr.shape)
    newsize = np.asarray(newsize)
    if (len(currsize) > len(newsize)):
        newsize = np.append([currsize[0]], newsize)
    startind = (currsize - newsize) // 2
    endind = startind + newsize
    myslice = [slice(startind[k], endind[k]) for k in range(len(endind))]
    return arr[tuple(myslice)]

def tempMul(bb,fshape,result):
    result *= np.exp(-1j * bb * 2*np.pi / fshape[-2] * np.arange(result.shape[-2],dtype='complex64'))[...,np.newaxis]
    return result

def expand2(result, bb, aa, Nnum, fshape):
    tileFactor = (1,) * (len(result.shape)-2) + (Nnum, 1)
    return np.tile(result, tileFactor)

def expand(reducedF, bb, aa, Nnum, fshape):
    tileFactor = (1,) * (len(reducedF.shape)-1) + (int(Nnum/2+1),)
    result = np.tile(reducedF, tileFactor)
    result = result[...,:int(fshape[-1]/2+1)]
    result *= np.exp(-1j * aa * 2*np.pi / fshape[-1] * np.arange(result.shape[-1],dtype='complex64'))
    result = expand2(result, bb, aa, Nnum, fshape)
    return tempMul(bb,fshape,result)

def special_rfftn(in1, bb, aa, Nnum, fshape):
    # Compute the fft of elements in1[bb::Nnum,aa::Nnum], after in1 has been zero-padded out to fshape
    # We exploit the fact that fft(masked-in1) is fft(in1[bb::Nnum,aa::Nnum]) tiled Nnum times
    # along each axis, multiplied by a phase ramp that accounts for the (bb,aa) offset.
    reducedShape = ()
    for d in fshape:
        assert((d % Nnum) == 0)
        reducedShape = reducedShape + (int(d/Nnum),)
    reduced = in1[...,bb::Nnum,aa::Nnum]

    # Compute an array giving rfft(mask(in1))
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        reducedF = myFFT2(reduced, reducedShape)
    return expand(reducedF, bb, aa, Nnum, fshape)

def convolutionShape(in1, in2Shape, Nnum):
    # Logic copied from fftconvolve source code
    s1 = np.array(in1.shape)
    s2 = np.array(in2Shape)
    if (len(s1) == 3):   # Cope with case where we are processing multiple reconstructions in parallel
        s1 = s1[1:]
    shape = s1 + s2 - 1
    if False:
        # TODO: I haven't worked out if/how I can do this yet.
        # This is the original code in fftconvolve, which says:
        # Speed up FFT by padding to optimal size for FFTPACK
        fshape = [_next_regular(int(d)) for d in shape]
    else:
        fshape = [int(np.ceil(d/float(Nnum)))*Nnum for d in shape]
    fslice = tuple([slice(0, int(sz)) for sz in shape])
    return (fshape, fslice, s1)
    
def _special_fftconvolve_part1(in1, bb, aa, Nnum, in2Shape):
    assert((len(in1.shape) == 2) or (len(in1.shape) == 3))
    assert(len(in2Shape) == 2)
    (fshape, fslice, s1) = convolutionShape(in1, in2Shape, Nnum)
    # Pre-1.9 NumPy FFT routines are not threadsafe - this code requires numpy 1.9 or greater
    assert(_rfft_mt_safe)
    fa = special_rfftn(in1, bb, aa, Nnum, fshape)
    return (fa, fshape, fslice, s1)

def special_fftconvolve_part3b(fab, fshape, fslice, s1):
    assert(len(fab.shape) == 2)
    ret = myIFFT2(fab, fshape)[fslice].copy()
    return _centered(ret, s1)

def special_fftconvolve_part3(fab, fshape, fslice, s1):
    # TODO: This gymnastics is probably unnecessary now I call ifft2 rather than fftn
    if (len(fab.shape) == 2):
        return special_fftconvolve_part3b(fab, fshape, fslice, s1)
    else:
        results = []
        for n in range(fab.shape[0]):
            results.append(special_fftconvolve_part3(fab[n], fshape, fslice, s1))
        return np.array(results)

def _special_fftconvolve(in1, bb, aa, Nnum, in2Shape, accum, fb=None):
    '''
    in1 consists of subapertures of size Nnum x Nnum pixels.
    We are being asked to convolve only pixel (bb,aa) within each subaperture, i.e.
        tempSlice = np.zeros(in1.shape, dtype=in1.dtype)
        tempSlice[bb::Nnum, aa::Nnum] = in1[bb::Nnum, aa::Nnum]
    This allows us to take a significant shortcut in computing the FFT for in1.
    '''
    (fa, fshape, fslice, s1) = _special_fftconvolve_part1(in1, bb, aa, Nnum, in2Shape)
    assert(fa.dtype == np.complex64)   # Keep an eye out for any reversion to double-precision
    assert(fb.dtype == np.complex64)   # Keep an eye out for any reversion to double-precision

    if accum is None:
        accum = fa*fb
    else:
        accum += fa*fb
    assert(accum.dtype == np.complex64)   # Keep an eye out for any reversion to double-precision
    return (accum, fshape, fslice, s1)
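
The shortcut behind `special_rfftn` can be checked directly on a small made-up array: the FFT of the masked array equals the FFT of the subsampled array, tiled Nnum times along each axis and multiplied by a phase ramp. This sketch uses the full `fft2` from `numpy.fft` and skips the zero-padding and half-spectrum cropping that the real code does.

```python
import numpy as np

rng = np.random.default_rng(0)
Nnum, bb, aa = 3, 1, 2
x = rng.random((12, 9))                      # both dimensions are multiples of Nnum

# Direct route: zero everything except pixel (bb, aa) of each lenslet, then FFT
masked = np.zeros_like(x)
masked[bb::Nnum, aa::Nnum] = x[bb::Nnum, aa::Nnum]
direct = np.fft.fft2(masked)

# Shortcut: FFT of the much smaller subsampled array, tiled back up
reducedF = np.fft.fft2(x[bb::Nnum, aa::Nnum])
tiled = np.tile(reducedF, (Nnum, Nnum))

# Phase ramp for the (bb, aa) offset of the samples within each lenslet
ky = np.arange(x.shape[0])[:, np.newaxis]
kx = np.arange(x.shape[1])[np.newaxis, :]
phase = np.exp(-2j * np.pi * (bb * ky / x.shape[0] + aa * kx / x.shape[1]))
shortcut = tiled * phase

print(np.allclose(direct, shortcut))
```

The small FFT operates on an array Nnum*Nnum times smaller than the masked one, which is where the saving comes from.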

In [None]:
def forwardProjectForZ_old(HCC, realspaceCC):
    singleJob = (len(realspaceCC.shape) == 2)
    if singleJob:   # Cope with both a single 2D plane and an array of multiple 2D planes to process independently
        realspaceCC = realspaceCC[np.newaxis,:,:]
    # Iterate over each lenslet pixel
    Nnum = HCC.shape[1]
    TOTALprojection = np.zeros(realspaceCC.shape, dtype='float32')
    for bb in tqdm(range(Nnum), leave=False, desc='Forward-project - y'):
        for aa in tqdm(range(Nnum), leave=False, desc='Forward-project - x'):
            # Extract the part of H that represents this lenslet pixel
            Hs = HCC[bb, aa]
            for n in range(realspaceCC.shape[0]):
                # Create a workspace representing just the voxels cc,bb,aa behind each lenslet (the rest is 0)
                tempspace = np.zeros((realspaceCC[n].shape[0], realspaceCC[n].shape[1]), dtype='float32')
                tempspace[bb::Nnum, aa::Nnum] = realspaceCC[n, bb::Nnum, aa::Nnum]  # ???? what to do about index ordering?
                # Compute how those voxels project onto the sensor, and accumulate
                TOTALprojection[n] += fftconvolve(tempspace, Hs, 'same')
    if singleJob:
        return TOTALprojection[0]
    else:
        return TOTALprojection
    
def backwardProjectForZ_old(HtCC, projection, progress=tqdm):
    singleJob = (len(projection.shape) == 2)
    if singleJob:   # Cope with both a single 2D plane and an array of multiple 2D planes to process independently
        projection = projection[np.newaxis,:,:]
    # Iterate over each lenslet pixel
    Nnum = HtCC.shape[1]
    tempSliceBack = np.zeros(projection.shape, dtype='float32')        
    for aa in progress(range(Nnum), leave=False, desc='y'):
        for bb in range(Nnum):
            # Extract the part of Ht that represents this lenslet pixel
            Hts = HtCC[bb, aa]
            for n in range(projection.shape[0]):
                # Create a workspace representing just the voxels cc,bb,aa behind each lenslet (the rest is 0)
                tempSlice = np.zeros(projection[n].shape, dtype='float32')
                tempSlice[bb::Nnum, aa::Nnum] = projection[n, bb::Nnum, aa::Nnum]
                # Compute how those voxels back-project from the sensor
                tempSliceBack[n] += fftconvolve(tempSlice, Hts, 'same')
    if singleJob:
        return tempSliceBack[0]
    else:
        return tempSliceBack

def backwardProjectACC_original(Ht, projection, CAindex, progress=tqdm, planes=None):
    if progress is None:
        progress = noProgressBar        
    Backprojection = np.zeros((Ht.shape[0], projection.shape[0], projection.shape[1]), dtype='float32')
    # Iterate over each z plane
    if planes is None:
        planes = range(Ht.shape[0])
    for cc in progress(planes, desc='Back-project - z'):
        HtCC =  Ht[cc, :, :, CAindex[0,cc]-1:CAindex[1,cc], CAindex[0,cc]-1:CAindex[1,cc]]
        Backprojection[cc] = backwardProjectForZ_old(HtCC, projection, progress=progress)
    return Backprojection

In [None]:
def deconvRL(hMatrix, Htf, maxIter, Xguess, logPrint=True):
    # Note:
    #  Htf is the *initial* backprojection of the camera image
    #  Xguess is the initial guess for the object
    for i in tqdm(range(maxIter), desc='RL deconv'):
        t0 = time.time()
        HXguess = forwardProjectACC(hMatrix, Xguess, logPrint=logPrint)
        HXguessBack = backwardProjectACC(hMatrix, HXguess, logPrint=logPrint)
        errorBack = Htf / HXguessBack
        Xguess = Xguess * errorBack
        Xguess[np.where(np.isnan(Xguess))] = 0
        ttime = time.time() - t0
        print('iter %d | %d, took %.1f secs. Max val %f' % (i+1, maxIter, ttime, np.max(Xguess)))
    return Xguess
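
The update in `deconvRL` multiplies the current guess by the ratio of the initial back-projection to the back-projected forward projection. A toy version of that same update is sketched below, with a small made-up dense matrix standing in for the light-field projection operators (`H.dot` for the forward projection, `H.T.dot` for the back-projection) and noise-free data.

```python
import numpy as np

rng = np.random.default_rng(0)
H = rng.random((8, 5)) + 0.1        # made-up positive system matrix
x_true = rng.random(5) + 0.5        # made-up object
f = H.dot(x_true)                   # noise-free "camera image"

Htf = H.T.dot(f)                    # initial back-projection of the camera image
x = np.ones(5)                      # initial guess
residual_start = np.linalg.norm(H.dot(x) - f)

for _ in range(200):
    HX = H.dot(x)                   # forward-project the current guess
    HXback = H.T.dot(HX)            # back-project that forward projection
    x = x * (Htf / HXback)          # same multiplicative update as deconvRL

residual_end = np.linalg.norm(H.dot(x) - f)
print(residual_start, residual_end)
```

Because the update is multiplicative, a non-negative initial guess stays non-negative throughout.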

In [None]:
# Note: H.shape in python is (<num z planes>, Nnum, Nnum, <psf size>, <psf size>),
#                       e.g. (56, 19, 19, 343, 343)

class _Projector(object):
    # Note: the variable names in this class mostly imply we are doing the back-projection
    # (e.g. Ht, 'projection', etc.). However, the same code also does forward-projection!
    def __init__(self, projection, hMatrix, cc):
        # Note: H and Hts are not stored as class variables.
        # I had a lot of trouble with them and multithreading,
        # and eventually settled on having them in shared memory.
        # As I encapsulate more stuff in this class, I could bring them back as class variables...

        self.cpuTime = np.zeros(2)
        
        # Nnum: number of pixels across a lenslet array (after rectification)
        self.Nnum = hMatrix.Nnum(cc)
        
        # This next chunk of logic copied from fftconvolve source code.
        # s1, s2: shapes of the input arrays
        # fshape: shape of the (full, possibly padded) result array in Fourier space
        # fslice: slicing tuple specifying the actual result size that should be returned
        self.s1 = np.array(projection.shape)
        self.s2 = np.array(hMatrix.PSFShape(cc))
        shape = self.s1 + self.s2 - 1
        if False:
            # TODO: I haven't worked out if/how I can do this yet.
            # This is the original code in fftconvolve, which says:
            # Speed up FFT by padding to optimal size for FFTPACK
            self.fshape = [_next_regular(int(d)) for d in shape]
        else:
            # Use self.Nnum rather than relying on a global Nnum happening to be defined
            self.fshape = [int(np.ceil(d/float(self.Nnum)))*self.Nnum for d in shape]
        self.fslice = tuple([slice(0, int(sz)) for sz in shape])
        
        # rfslice: slicing tuple to crop down full fft array to the shape that would be output from rfftn
        self.rfslice = (slice(0,self.fshape[0]), slice(0,int(self.fshape[1]/2)+1))
        return
    
    def _MirrorXArray(self, fHtsFull):
        padLength = self.fshape[0] - self.s2[0]
        if False:
            fHtsFull = fHtsFull.conj() * np.exp((1j * (1+padLength) * 2*np.pi / self.fshape[0]) * np.arange(self.fshape[0],dtype='complex64')[:,np.newaxis])
            fHtsFull[:,1::] = fHtsFull[:,1::][:,::-1]
            return fHtsFull
        else:
            temp = np.exp((1j * (1+padLength) * 2*np.pi / self.fshape[0]) * np.arange(self.fshape[0])).astype('complex64')
            if True:
                result = jps.mirrorX(fHtsFull, temp)
            else:
                result = np.empty(fHtsFull.shape, dtype=fHtsFull.dtype)
                result[:,0] = fHtsFull[:,0].conj()*temp
                for i in range(1,fHtsFull.shape[1]):
                    result[:,i] = (fHtsFull[:,fHtsFull.shape[1]-i].conj()*temp)
            return result

    def _MirrorYArray(self, fHtsFull):
        padLength = self.fshape[1] - self.s2[1]
        if False:
            fHtsFull = fHtsFull.conj() * np.exp(1j * (1+padLength) * 2*np.pi / self.fshape[1] * np.arange(self.fshape[1],dtype='complex64'))
            fHtsFull[1::] = fHtsFull[1::][::-1]
            return fHtsFull
        else:
            temp = np.exp((1j * (1+padLength) * 2*np.pi / self.fshape[1]) * np.arange(self.fshape[1])).astype('complex64')
            if True:
                result = jps.mirrorY(fHtsFull, temp)
            else:
                result = np.empty(fHtsFull.shape, dtype=fHtsFull.dtype)
                result[0] = fHtsFull[0].conj()*temp
                for i in range(1,fHtsFull.shape[0]):
                    result[i] = (fHtsFull[fHtsFull.shape[0]-i].conj()*temp)
            return result
        
    def _convolvePart3(self, projection, bb, aa, fHtsFull, mirrorX, accum):
        # TODO: to make this work, I need the full matrix for fHts and then I need to slice it 
        # to the correct shape when I call through to special_fftconvolve here. Is fshape what I need?
        cpu0 = cpuTime('both')
        (accum,_,_,_) = _special_fftconvolve(projection,bb,aa,self.Nnum,self.s2,accum,fb=fHtsFull[self.rfslice])
        self.cpuTime += cpuTime('both')-cpu0
        if mirrorX:
            fHtsFull = self._MirrorXArray(fHtsFull)
            cpu0 = cpuTime('both')
            (accum,_,_,_) = _special_fftconvolve(projection,self.Nnum-bb-1,aa,self.Nnum,self.s2,accum,fb=fHtsFull[self.rfslice]) 
            self.cpuTime += cpuTime('both')-cpu0
        return accum

    def _convolvePart2(self, projection, bb, aa, fHtsFull, mirrorY, mirrorX, accum):
        accum = self._convolvePart3(projection,bb,aa,fHtsFull,mirrorX,accum)
        if mirrorY:
            fHtsFull = self._MirrorYArray(fHtsFull)
            accum = self._convolvePart3(projection,bb,self.Nnum-aa-1,fHtsFull,mirrorX,accum)
        return accum

    def _convolve(self, projection, hMatrix, cc, bb, aa, backwards, accum):
        cent = int(self.Nnum/2)

        mirrorX = (bb != cent)
        mirrorY = (aa != cent)
        transpose = ((aa != bb) and (aa != (self.Nnum-bb-1)))
            
        # TODO: it would speed things up if I could avoid computing the full fft for Hts.
        # However, it's not immediately clear to me how to fill out the full fftn array from rfftn
        # in the case of a 2D transform.
        # For 1D it's the reversed conjugate, but for 2D it's more complicated than that.
        # It's possible that it's actually nontrivial, in spite of the fact that
        # you can get away without it when only computing fft/ifft for real arrays)
        fHtsFull = hMatrix.fH(cc, bb, aa, backwards, False, self.fshape)
        accum = self._convolvePart2(projection,bb,aa,fHtsFull,mirrorY,mirrorX, accum)
        if transpose:
            if (self.fshape[0] == self.fshape[1]):
                # For a square array, the FFT of the transpose is just the transpose of the FFT.
                # The copy() is because my C code currently can't cope with
                # a transposed array (non-contiguous strides in x)
                fHtsFull = fHtsFull.transpose().copy()    
            else:
                # For a non-square array, we have to compute the FFT for the transpose.
                fHtsFull = hMatrix.fH(cc, bb, aa, backwards, True, self.fshape)

            # Note that mx,my need to be swapped following the transpose
            accum = self._convolvePart2(projection,aa,bb,fHtsFull,mirrorX,mirrorY, accum) 
        assert(accum.dtype == np.complex64)   # Keep an eye out for any reversion to double-precision
        return accum
    
def _projectForZY(cc, bb, source, hMatrix, backwards):
    f = open('perf_diags/%d_%d.txt'%(cc,bb), "w")
    t1 = time.time()
    singleJob = (len(source.shape) == 2)
    if singleJob:   # Cope with both a single 2D plane and an array of multiple 2D planes to process independently
        source = source[np.newaxis,:,:]
    result = None
    projector = _Projector(source[0], hMatrix, cc)
    projector.cpuTime = np.zeros(2)
    # Use the projector's Nnum rather than relying on a global Nnum happening to be defined
    for aa in range(bb,int((projector.Nnum+1)/2)):
        result = projector._convolve(source, hMatrix, cc, bb, aa, backwards, result)
    t2 = time.time()
    assert(result.dtype == np.complex64)   # Keep an eye out for any reversion to double-precision
    f.write('%d\t%f\t%f\t%f\t%f\t%f\n' % (os.getpid(), t1, t2, t2-t1, projector.cpuTime[0], projector.cpuTime[1]))
    f.close()
    if singleJob:
        return (result[0], cc, bb, t2-t1)
    else:
        return (np.array(result), cc, bb, t2-t1)
    
def projectForZ2(hMatrix, backwards, cc, source):
    result = None
    for bb in tqdm(hMatrix.IterableBRange(cc), leave=False, desc='Project - y'):
        (thisResult, _, _, _) = _projectForZY(cc, bb, source, hMatrix, backwards)
        if (result is None):
            result = thisResult
        else:
            result += thisResult
    # Actually, for forward projection we don't need to do this separately for every z,
    # but it's easier to do it for symmetry (and this function is not used in performance-critical code anyway)
    (fshape, fslice, s1) = convolutionShape(source, hMatrix.PSFShape(cc), hMatrix.Nnum(cc))
    return special_fftconvolve_part3(result, fshape, fslice, s1)
    
# Test the backprojection code against a slower definitive version
# (this code is here for now because this is where I have been working on stuff, but it could move)
# TODO: would be a better test if I use the hMatrix form of projectForZ
#testHtCC = np.random.random((5,5,30,30)).astype(np.float32)
#testHtCC = _Ht[13,int(_Ht.shape[1]/2)-2:int(_Ht.shape[1]/2)+3,int(_Ht.shape[2]/2)-2:int(_Ht.shape[2]/2)+3,_CAindex[0,13]-1:_CAindex[1,13], _CAindex[0,13]-1:_CAindex[1,13]]
testHCC = _H[13]
testHtCC = _Ht[13]

for fd in [False, True]:
    for shape in [(200,200), (200,300), (300,200)]:
        # Test both square and non-square, since they use different code
        testHMatrix = HMatrix(_HPathFormat, _HtPathFormat, _HReducedShape, numZ=1, zStart=13)  # Needs to be in here because caching is confused by changing the image shape
        testProjection = np.random.random(shape).astype(np.float32)
        if fd:
            testResultOld = forwardProjectForZ_old(testHCC, testProjection)
            testResultNew = projectForZ2(testHMatrix, False, 0, testProjection)
        else:
            testResultOld = backwardProjectForZ_old(testHtCC, testProjection)
            testResultNew = projectForZ2(testHMatrix, True, 0, testProjection)
        comparison = np.max(np.abs(testResultOld - testResultNew))
        print('test result (should be <<1): %e' % comparison)
        if (comparison > 1e-4):
            print(" -> WARNING: disagreement detected")
        else:
            print(" -> OK")
        
print('Done')
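
The mirroring done by `_MirrorXArray` relies on an identity for real arrays: the padded FFT of a PSF flipped along the first axis is the conjugate of the original FFT, multiplied by a phase ramp along that axis, with the other frequency axis reversed (keeping index 0 in place). A check of that identity using `numpy.fft` and a made-up PSF (not the `jps.mirrorX` C code):

```python
import numpy as np

rng = np.random.default_rng(1)
psf = rng.random((5, 7))                     # made-up real PSF
fshape = (8, 10)                             # padded FFT shape
F = np.fft.fft2(psf, fshape)

padLength = fshape[0] - psf.shape[0]
phase = np.exp(2j * np.pi * (1 + padLength) / fshape[0]
               * np.arange(fshape[0]))[:, np.newaxis]

mirrored = F.conj() * phase
# Reverse the second frequency axis, leaving the zero-frequency column where it is
mirrored[:, 1:] = mirrored[:, 1:][:, ::-1].copy()

# Reference: flip the PSF in real space and take the FFT again
direct = np.fft.fft2(psf[::-1, :], fshape)
print(np.allclose(mirrored, direct))
```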

In [None]:
def backwardProjectACC(hMatrix, projection, planes=None, numjobs=multiprocessing.cpu_count(), progress=tqdm, logPrint=True):
    singleJob = (len(projection.shape) == 2)
    if singleJob:   # Cope with both a single 2D plane and an array of multiple 2D planes to process independently
        projection = projection[np.newaxis,:,:]
    if planes is None:
        planes = range(hMatrix.numZ)
    if progress is None:
        progress = noProgressBar        

    ru1 = cpuTime('both')

    Backprojection = np.zeros((hMatrix.numZ, projection.shape[0], projection.shape[1], projection.shape[2]), dtype='float32')
        
    # Set up the work to iterate over each z plane
    work = []
    for cc in planes:
        for bb in hMatrix.IterableBRange(cc):
            work.append((cc, bb, projection, hMatrix, True))

    # Run the multithreaded work
    t0 = time.time()
    results = Parallel(n_jobs=numjobs)\
            (delayed(_projectForZY)(*args) for args in progress(work, desc='Back-project - z', leave=False))
    ru2 = cpuTime('both')

    # Gather together and sum the results for each z plane
    t1 = time.time()
    fourierZPlanes = [None]*hMatrix.numZ
    elapsedTime = 0
    for (result, cc, bb, t) in results:
        elapsedTime += t
        if fourierZPlanes[cc] is None:
            fourierZPlanes[cc] = result
        else:
            fourierZPlanes[cc] += result
    
    # Compute the FFT for each z plane
    for cc in planes:
        # A bit complicated here to set up the correct inputs for convolutionShape...
        (fshape, fslice, s1) = convolutionShape(projection, hMatrix.PSFShape(cc), hMatrix.Nnum(cc))
        Backprojection[cc] = special_fftconvolve_part3(fourierZPlanes[cc], fshape, fslice, s1)        
    t2 = time.time()
    assert(Backprojection.dtype == np.float32)   # Keep an eye out for any reversion to double-precision
   
    # Save some diagnostics
    if logPrint:
        print('work elapsed wallclock time %f'%(t1-t0))
        print('work elapsed thread time %f'%elapsedTime)
        print('work delta rusage:', ru2-ru1)
        print('FFTs took %f'%(t2-t1))
    
    with open('overall.txt', 'w') as f:
        f.write('%f\t%f\t%f\t%f\t%f\t%f\n' % (t0, t1, t1-t0, t2-t1, (ru2-ru1)[0], (ru2-ru1)[1]))

    if singleJob:
        return Backprojection[:,0]
    else:
        return Backprojection

def forwardProjectACC(hMatrix, realspace, planes=None, numjobs=multiprocessing.cpu_count(), progress=tqdm, logPrint=True):
    singleJob = (len(realspace.shape) == 3)
    if singleJob:   # Cope with both a single 2D plane and an array of multiple 2D planes to process independently
        realspace = realspace[:,np.newaxis,:,:]
    if planes is None:
        planes = range(hMatrix.numZ)
    if progress is None:
        progress = noProgressBar        

    # Set up the work to iterate over each z plane
    work = []
    for cc in planes:
        for bb in hMatrix.IterableBRange(cc):
            work.append((cc, bb, realspace[cc], hMatrix, False))

    # Run the multithreaded work
    t0 = time.time()
    results = Parallel(n_jobs=numjobs)\
                (delayed(_projectForZY)(*args) for args in progress(work, desc='Forward-project - z', leave=False))

    # Gather together and sum all the results
    t1 = time.time()
    fourierProjection = [None]*hMatrix.numZ
    elapsedTime = 0
    for (result, cc, bb, t) in results:
        elapsedTime += t
        if fourierProjection[cc] is None:
            fourierProjection[cc] = result
        else:
            fourierProjection[cc] += result

    # Compute and accumulate the FFT for each z plane
    TOTALprojection = None
    for cc in planes:
        # A bit complicated here to set up the correct inputs for convolutionShape...
        (fshape, fslice, s1) = convolutionShape(realspace[cc], hMatrix.PSFShape(cc), hMatrix.Nnum(cc))
        thisProjection = special_fftconvolve_part3(fourierProjection[cc], fshape, fslice, s1)        
        if TOTALprojection is None:
            TOTALprojection = thisProjection
        else:
            TOTALprojection += thisProjection
    t2 = time.time()
    assert(TOTALprojection.dtype == np.float32)   # Keep an eye out for any reversion to double-precision
            
    # Print out some diagnostics
    if (logPrint):
        print('work elapsed wallclock time %f'%(t1-t0))
        print('work elapsed thread time %f'%elapsedTime)
        print('FFTs took %f'%(t2-t1))
        
    if singleJob:
        return TOTALprojection[0]
    else:
        return TOTALprojection

if False:
    # Temporary call to test parallelization
    temp = backwardProjectACC(hMatrix, inputImage, planes=[0], numjobs=3)
    
if True:
    # Temporary code to test running with an image pair
    # This is maybe not a comprehensive test, but it runs with two different (albeit proportional)
    # images and checks that the result matches the result of two totally independent calls, each on a single array.
    hMatrix = HMatrix(_HPathFormat, _HtPathFormat, _HReducedShape)
    candidate = np.tile(inputImage[np.newaxis,:,:], (2,1,1))
    candidate[1] *= 1.4
    planesToUse = None
    planesToUse = range(3,4)
    if planesToUse is None:
        numPlanesToUse = hMatrix.numZ
    else:
        numPlanesToUse = len(planesToUse)
    print('Running (%d planes x2)'%numPlanesToUse)
    
    t1 = time.time()
    temp = backwardProjectACC(hMatrix, candidate, planes=planesToUse, numjobs=1, progress=None)
    print('New method took', time.time()-t1)

    print('Running (%d planes x10)'%numPlanesToUse)
    t1 = time.time()
    temp = backwardProjectACC(hMatrix, np.tile(candidate, (5,1,1)), planes=planesToUse, numjobs=1, progress=None)
    print('New method took', time.time()-t1)

    # This is just backprojecting the first image, rather than the image pair,
    # so is doing half the amount of work
    t1 = time.time()
    temp2 = backwardProjectACC_original(_H, candidate[0], _CAindex, planes=planesToUse, progress=None)
    print('Old method took', time.time()-t1)


    dualRoundtrip = forwardProjectACC(hMatrix, temp, planes=planesToUse)

    temp = backwardProjectACC(hMatrix, candidate[0], planes=planesToUse, numjobs=1)
    firstRoundtrip = forwardProjectACC(hMatrix, temp, planes=planesToUse, numjobs=1)    
    comparison = np.max(np.abs(firstRoundtrip - dualRoundtrip[0]))
    print('test result (should be <<1): %e' % comparison)
    if (comparison > 1e-6):
        print(" -> WARNING: disagreement detected")
    else:
        print(" -> OK")
    
    temp = backwardProjectACC(hMatrix, candidate[1], planes=planesToUse, numjobs=1)
    secondRoundtrip = forwardProjectACC(hMatrix, temp, planes=planesToUse, numjobs=1)    
    comparison = np.max(np.abs(secondRoundtrip - dualRoundtrip[1]))
    print('test result (should be <<1): %e' % comparison)
    if (comparison > 1e-6):
        print(" -> WARNING: disagreement detected")
    else:
        print(" -> OK")
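
For reference, here is a self-contained sketch of the same kind of batch-versus-independent check. It assumes a plain circular FFT blur (`blur_stack` is a hypothetical stand-in, not the real `backwardProjectACC`) and only illustrates the idea: a stack of two proportional images processed in one call should match two separate calls, and the second result should be 1.4 times the first.

```python
import numpy as np

def blur_stack(images, kernel):
    # Hypothetical stand-in projector: circular FFT convolution applied
    # independently to each 2D plane of an (N, H, W) stack
    shape = images.shape[-2:]
    k = np.fft.rfft2(kernel, s=shape, axes=(-2, -1))
    spectrum = np.fft.rfft2(images, axes=(-2, -1)) * k
    return np.fft.irfft2(spectrum, s=shape, axes=(-2, -1))

rng = np.random.default_rng(1)
image = rng.random((32, 32)).astype(np.float32)
kernel = rng.random((5, 5)).astype(np.float32)

# Two proportional images, as in the test cell above
pair = np.tile(image[np.newaxis], (2, 1, 1))
pair[1] *= 1.4

batched = blur_stack(pair, kernel)
separate = [blur_stack(p[np.newaxis], kernel)[0] for p in pair]
worst = max(np.max(np.abs(batched[i] - separate[i])) for i in range(2))
print('test result (should be <<1): %e' % worst)
```

This only exercises linearity and batch independence, so (like the test above) it would not catch an error that affects both code paths in the same way.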

In [None]:
def AnalyzeTestResults():
    with open('overall.txt') as f:
        csv_reader = csv.reader(f, delimiter='\t')
        for row in csv_reader:
            pass
    startTime = float(row[0])
    endTime = float(row[1])
    userTime = float(row[4])
    sysTime = float(row[5])

    rows = []
    for fn in glob.glob('perf_diags/*_*.txt'):
        with open(fn) as f:
            csv_reader = csv.reader(f, delimiter='\t')
            for row in csv_reader:
                pass
            rows.append(row)
    rows = np.array(rows).astype('float').transpose()
    firstPid = np.min(rows[0])
    rows[0] -= firstPid
    rows[1:3] -= startTime
    rows = rows[:,np.argsort(rows[1],kind='mergesort')]
    rows = rows[:,rows[0].argsort(kind='mergesort')]

    deadTimeStart = 0
    deadTimeMid = 0
    deadTimeEnd = 0
    threadWorkTime = 0
    thisThreadStartTime = 0
    longestThreadRunTime = 0
    longestThreadRunPid = -1
    latestStartTime = 0
    userTimeBreakdown = 0
    sysTimeBreakdown = 0
    for i in range(rows.shape[1]):
        pid = rows[0,i]
        t0 = rows[1,i]
        t1 = rows[2,i]
        userTimeBreakdown += rows[4,i]
        sysTimeBreakdown += rows[5,i]
        if (i == 0):
            deadTimeStart += t0
            thisThreadStartTime = t0
            latestStartTime = t0
        else:
            if (pid == rows[0,i-1]):
                deadTimeMid += t0 - rows[2,i-1]
            else:
                latestStartTime = max(latestStartTime, t0)
                thisThreadRunTime = rows[2,i-1]-thisThreadStartTime  # For previous pid
                if (thisThreadRunTime > longestThreadRunTime):
                    longestThreadRunPid = rows[0,i-1]
                    longestThreadRunTime = thisThreadRunTime
                thisThreadStartTime = t0
                deadTimeStart += t0
                deadTimeEnd += (endTime-startTime) - rows[2,i-1]
        threadWorkTime += t1-t0
        plt.plot([t0, t1], [pid, pid])
        plt.plot(t0, pid, 'x')
    thisThreadRunTime = t1-thisThreadStartTime
    if (thisThreadRunTime > longestThreadRunTime):
        longestThreadRunPid = pid
        longestThreadRunTime = thisThreadRunTime
    deadTimeEnd += (endTime-startTime) - rows[2,-1]
    print('Elapsed time', endTime-startTime)
    print('Longest thread run time', longestThreadRunTime, 'pid', int(longestThreadRunPid))
    print('Latest start time', latestStartTime)
    print('Thread work time', threadWorkTime)
    print('Dead time', deadTimeStart, deadTimeMid, deadTimeEnd)
    print(' Total', deadTimeStart + deadTimeMid + deadTimeEnd)
    print('User cpu time', userTime)
    print('System cpu time', sysTime)
    print('User cpu time for subset', userTimeBreakdown)
    print('System cpu time for subset', sysTimeBreakdown)

    with open('stats.txt', 'a') as f:
        f.write('%f\t%f\t%f\t%f\t%f\t%f\t%f\t%f\t%f\t%f\n' % (numJobsForTesting, endTime-startTime, threadWorkTime, \
                        longestThreadRunTime, latestStartTime, deadTimeStart, deadTimeMid, deadTimeEnd, userTime, sysTime))

    plt.xlim(0, endTime-startTime)
    plt.ylim(-0.5,np.max(rows[0])+0.5)
    plt.show()
    
if False:
    for numJobsForTesting in range(1,13):
        ru1 = cpuTime('both')
        temp = backwardProjectACC(HMatrix(_HPathFormat, _HtPathFormat, _HReducedShape), inputImage, numjobs=numJobsForTesting, planes=None)
        ru2 = cpuTime('both')
        print('overall delta rusage:', ru2-ru1)
        AnalyzeTestResults()
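
The dead-time bookkeeping in `AnalyzeTestResults` can be hard to follow because it walks a sorted array and tracks the previous row by hand. The sketch below shows the same three quantities (idle time before a process starts, between its work blocks, and after its last block) computed per process from a list of intervals. The function name and the `(pid, start, end)` tuple layout are assumptions made for this illustration, not the layout of the `perf_diags` files.

```python
from collections import defaultdict

def dead_time(intervals, total_elapsed):
    """Summarise idle time from (pid, start, end) tuples.

    Times are assumed to be relative to the start of the run.
    Returns (dead_start, dead_mid, dead_end, work), each summed over processes.
    """
    by_pid = defaultdict(list)
    for pid, start, end in intervals:
        by_pid[pid].append((start, end))

    dead_start = dead_mid = dead_end = work = 0.0
    for spans in by_pid.values():
        spans.sort()
        dead_start += spans[0][0]                    # idle before the first block
        dead_end += total_elapsed - spans[-1][1]     # idle after the last block
        for (_, prev_end), (next_start, _) in zip(spans, spans[1:]):
            dead_mid += next_start - prev_end        # idle between consecutive blocks
        work += sum(end - start for start, end in spans)
    return dead_start, dead_mid, dead_end, work

example = [(1, 0.5, 2.0), (1, 2.5, 4.0), (2, 1.0, 3.5)]
summary = dead_time(example, total_elapsed=5.0)
print(summary)
```

A useful sanity check is that the dead time plus the work time equals the number of processes multiplied by the elapsed time.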

In [None]:
def decomment(csvfile):
    for row in csvfile:
        raw = row.split('#')[0].strip()
        if raw: yield raw

def AnalyzeTestResults2(fn):
    rows = []
    with open(fn) as f:
        csv_reader = csv.reader(decomment(f), delimiter='\t')
        for row in csv_reader:
            rows.append(row)
    rows = np.array(rows).astype(float).transpose()   # np.float was removed in NumPy 1.24

    plt.plot(rows[0], rows[2]/rows[2,0], label='work time')
    plt.plot(rows[0], np.sum(rows[5:8], axis=0)/(rows[0]*rows[1]), label='dead time')
    plt.plot(rows[0], rows[5]/(rows[0]*rows[1]), label='dead start')
    plt.plot(rows[0], rows[1]/(rows[1,0]/rows[0]), label='runtime excess')
    plt.ylim(0,2.5)
    plt.legend(loc=2)
    plt.show()

plt.title('Dummy work on empty arrays')
AnalyzeTestResults2('stats-dummy.txt')
plt.title('Real work')
AnalyzeTestResults2('stats-realwork.txt')
plt.title('Smaller memory footprint - no improvement')
AnalyzeTestResults2('stats-no-H.txt')
plt.title('New code')
AnalyzeTestResults2('stats-new-code.txt')

# Test a single backprojection and compare against definitive version

In [None]:
testHMatrix = HMatrix(_HPathFormat, _HtPathFormat, _HReducedShape)

In [None]:
print(testHMatrix.cacheMisses, testHMatrix.cacheHits, testHMatrix.cacheSize)
planesToProcess = None#range(6)
if False:
    t0 = time.time()
    Htf = backwardProjectACC_original(_Ht, inputImage, _CAindex, planes=planesToProcess, progress=None)
    print('Original code took %f'%(time.time()-t0))
elif True:
    # Profile my code (single-threaded) on a cropped version of Prevedel's data
    myStats = cProfile.run('Htf = backwardProjectACC(testHMatrix, inputImage, planes=planesToProcess, numjobs=1, progress=None)', 'mystats')
    p = pstats.Stats('mystats')
    p.strip_dirs().sort_stats('cumulative').print_stats(40)
elif False:
    # Profile my code (single-threaded) in the sort of scenario I would expect to run it in for my PIV experiments
    tempInputImage = np.zeros((2,Nnum*20,Nnum*20))
    myStats = cProfile.run('temp = backwardProjectACC(testHMatrix, tempInputImage, planes=planesToProcess, numjobs=1)', 'mystats')
    p = pstats.Stats('mystats')
    p.strip_dirs().sort_stats('cumulative').print_stats(40)

In [None]:
# Compare against definitive version generated from Matlab
try:
    if planesToProcess is not None:
        print('WARNING: the following test is not valid because not all planes were processed')
    if False:
        definitive = tifffile.imread('Data/03_Reconstructed/exampleData/definitive_worm_crop_X15_backproject.tif')
        definitive = np.transpose(definitive, axes=(0,2,1))
        comparison = np.max(np.abs(definitive[4] - Htf[4]*10))
    else:
        definitive = np.load('semi-definitive.npy')
        comparison = np.max(np.abs(definitive - Htf))
    print('Compare against matlab result (should be <1.0): %f' % comparison)
    if (comparison > 1.0):
        print(" -> WARNING: disagreement detected")
    else:
        print(" -> OK")
except NameError:
    warnings.warn('Cannot compare - previous cell was probably not run')

# Test a full deconvolution and compare against definitive version

In [None]:
if False:
    Xguess = Htf.copy()
    maxIter = 8
    deconvolvedResult = deconvRL(HMatrix(_HPathFormat, _HtPathFormat, _HReducedShape), Htf, maxIter, Xguess)
else:
    deconvolvedResult = 1   # Just set the variable to something so the next cell doesn't fail to run

In [None]:
# Compare against definitive version generated from Matlab
definitive = tifffile.imread('Data/03_Reconstructed/exampleData/definitive_worm_crop_X15_iter8.tif')
definitive = np.transpose(definitive, axes=(0,2,1))
comparison = np.max(np.abs(definitive - deconvolvedResult*1e3))
print('Compare against matlab result (should be <1.0): %f' % comparison)
if (comparison > 1.0):
    print(" -> WARNING: disagreement detected")
else:
    print(" -> OK")

#tifffile.imsave('iter8.tif', np.transpose(Xguess*1e3, axes=(0,2,1)))

# Solve for flow field (single-plane toy example)

In [None]:
# Generate two identical images of the same synthetic object,
# which for now consists of a cloud of random gaussian spots
from scipy.ndimage import gaussian_filter   # scipy.ndimage.filters is a deprecated namespace
if False:
    numSpots = 100
    imageSize = 240
    sigma = 8
    controlPointSpacing = 30    
elif False:
    numSpots = 400
    imageSize = 120
    sigma = 2
    controlPointSpacing = 30
elif True:
    numSpots = 1000
    imageSize = 180
    sigma = 2
    controlPointSpacing = 30
    previouslySavedSynthetic = '2019-06-24 14.02.53 syntheticInput.npy'
elif True:
    numSpots = 250
    imageSize = 90
    sigma = 1
    controlPointSpacing = 15
    previouslySavedSynthetic = '2019-06-24 14.32.19 syntheticInput.npy'

syntheticImageExtendSize = 30

syntheticObjectExt = np.zeros((1, imageSize+syntheticImageExtendSize, imageSize))
syntheticObjectExt[0, (np.random.random(numSpots)*syntheticObjectExt.shape[1]).astype('int'), \
                      (np.random.random(numSpots)*syntheticObjectExt.shape[2]).astype('int')] = 1
syntheticObjectExt = gaussian_filter(syntheticObjectExt, sigma=(0,sigma,sigma))
plt.imshow(syntheticObjectExt[0])
if False:
    import datetime
    fn = datetime.datetime.now().strftime("%Y-%m-%d %H.%M.%S syntheticInput.npy")
    np.save(fn, syntheticObjectExt[0])
else:
    syntheticObjectExt[0] = np.load(previouslySavedSynthetic)

In [None]:
# Set up the PSF that we will use

# First check we're using the expected PSF - the plane choices used here are intended to work with this PSF.
assert(matPath == 'PSFmatrix/PSFmatrix_M40NA0.95MLPitch150fml3000from-13to0zspacing0.5Nnum15lambda520n1.0.mat')

zPlaneToModel = _H.shape[0]-1   # Modelling native focal plane
zPlaneToModel = 22   # Modelling some way from the native focal plane, which should perform fairly well
zPlaneToModel = _H.shape[0]-3   # Modelling close to native focal plane. This has artefacts - prev one is fairly artefact-free
zPlaneToModel = _H.shape[0]-2

pivHMatrix = HMatrix(_HPathFormat, _HtPathFormat, _HReducedShape, numZ=1, zStart=zPlaneToModel)

In [None]:
if True:
    shiftType = 'piv'
    source = 'synthetic'
    actualImageExtendSize = syntheticImageExtendSize
    # Allowing an x search range is fairer, but it makes little difference for vertical flow
    xMotionPermitted = False
    xSearchRange = 0
    ySearchRange = 10
else:
    shiftType = 'piv-zeroedge'
    source = 'piv'
    actualImageExtendSize = 0
    xMotionPermitted = True
    xSearchRange = 8
    ySearchRange = 8

In [None]:
def forwardProjectACC_PIV(hMatrix, obj, shiftDescription):
    # Compute the AB images obtained from the single object we are provided with
    # (with the B image being of the object shifted by shiftDescription).
    # We give each image half the intensity in order to conserve energy.
    dualObject = np.tile(obj[:,np.newaxis,:,:] / 2.0, (1,2,1,1))
    dualObject[:,1,:,:] = ShiftObject(dualObject[:,1,:,:], shiftDescription)
    return forwardProjectACC(hMatrix, dualObject, logPrint=False, progress=None)

def dualBackwardProjectACC_PIV(hMatrix, dualProjection, shiftDescription):
    # Compute the reverse transform given the AB images (B image shifted by shiftDescription).
    # First we do the reverse transformation on both images
    dualObject = backwardProjectACC(hMatrix, dualProjection, logPrint=False, progress=None)
    # Now we reverse the shift on the B object
    dualObject[:,1,:,:] = ShiftObject(dualObject[:,1,:,:], -shiftDescription)
    # Now, ideally the objects would match, but of course in practice there will be discrepancies,
    # especially if we are not using the correct shiftDescription.
    # To make the operation match the transpose of the forward operation, the two objects
    # still need to be summed and divided by 2. That is done in fusedBackwardProjectACC_PIV, not here.
    return dualObject

def fusedBackwardProjectACC_PIV(hMatrix, dualProjection, shiftDescription):
    dualObject = dualBackwardProjectACC_PIV(hMatrix, dualProjection, shiftDescription)
    result = np.sum(dualObject, axis=1) / 2.0     # Merge the two backprojections
    return result

def deconvRL_PIV_OLD(hMatrix, imageAB, maxIter, Xguess, shiftDescription):
    # I believed this to be the RL algorithm in the way I have written it in the past.
    # However, this gives different results to Prevedel's implementation
    # (mine seems to converge more slowly).
    # TODO: I should look into this and see if I've just made a mistake or if they are actually different.
    
    # Xguess is our single combined guess of the object
    Xguess = Xguess.copy()    # Because we will be updating it, and caller may not always be expecting that
    for i in tqdm(range(maxIter), desc='RL deconv'):
        t0 = time.time()
        relativeBlurDual = imageAB / forwardProjectACC_PIV(hMatrix, Xguess, shiftDescription)
        Xguess *= fusedBackwardProjectACC_PIV(hMatrix, relativeBlurDual, shiftDescription)
        Xguess[np.where(np.isnan(Xguess))] = 0
        t1 = time.time() - t0
    return Xguess

def deconvRL_PIV(hMatrix, imageAB, maxIter, shiftDescription):
    # Note:
    #  Htf is the *initial* backprojection of the camera image
    #  Xguess is the initial guess for the object
    Htf = fusedBackwardProjectACC_PIV(hMatrix, imageAB, shiftDescription)
    Xguess = Htf.copy()
    print('Deconv')
    for i in noProgressBar(range(maxIter), desc='RL deconv'):
        t0 = time.time()
        HXguess = forwardProjectACC_PIV(hMatrix, Xguess, shiftDescription)
        HXguessBack = fusedBackwardProjectACC_PIV(hMatrix, HXguess, shiftDescription)
        errorBack = Htf / HXguessBack
        Xguess = Xguess * errorBack
        Xguess[np.where(np.isnan(Xguess))] = 0
        t1 = time.time() - t0
    return Xguess

def RollNoninteger(obj, amount, axis=0):
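    # Use floor rather than int(), which truncates towards zero:
    # this keeps frac in [0, 1) for negative shifts as well as positive ones
    intAmount = int(np.floor(amount))
    frac = amount - intAmount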
    result1 = np.roll(obj, intAmount, axis=axis)
    result2 = np.roll(obj, intAmount+1, axis=axis)
    return result1 * (1-frac) + result2 * frac


# Some replacement functions to use for testing (effective PSF is a delta function, 1:1 mapping from image to object)
def forwardProjectTrivial(hMatrix, obj, shiftDescription):
    # Compute the AB images obtained from the single object we are provided with
    # (with the B image being of the object shifted by shiftDescription).
    # We give each image half the intensity in order to conserve energy.
    dualObject = np.tile(obj[:,np.newaxis,:,:] / 2.0, (1,2,1,1))
    dualObject[:,1,:,:] = ShiftObject(dualObject[:,1,:,:], shiftDescription)
    return dualObject[0]

def dualBackwardProjectTrivial(hMatrix, dualProjection, shiftDescription):
    dualObject = dualProjection[np.newaxis].copy()
    dualObject[:,1,:,:] = ShiftObject(dualObject[:,1,:,:], -shiftDescription)
    return dualObject

def fusedBackwardProjectTrivial(hMatrix, dualProjection, shiftDescription):
    dualObject = dualBackwardProjectTrivial(hMatrix, dualProjection, shiftDescription)
    result = np.sum(dualObject, axis=1) / 2.0     # Merge the two backprojections
    return result

def deconvRLTrivial(hMatrix, imageAB, maxIter, shiftDescription):
    # Note:
    #  Htf is the *initial* backprojection of the camera image
    #  Xguess is the initial guess for the object
    return fusedBackwardProjectTrivial(hMatrix, imageAB, shiftDescription)
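
To help with the TODO in `deconvRL_PIV_OLD`, here is a minimal sketch that writes out the two update rules side by side for a small dense matrix. Everything here is an assumption for illustration: `make_blur_matrix` is a hypothetical circular Gaussian blur standing in for the light field projector, and both loops start from the backprojection `H.T @ y`. The first update is the classic Richardson-Lucy form used in `deconvRL_PIV_OLD`; the second is the form used in `deconvRL_PIV`, which appears to correspond to what is usually called ISRA (image space reconstruction algorithm), although that identification has not been checked against Prevedel's code.

```python
import numpy as np

def make_blur_matrix(n, sigma=1.5):
    # Hypothetical projector: circular Gaussian blur as a dense n x n matrix
    idx = np.arange(n)
    diff = np.abs(idx[:, None] - idx[None, :])
    dist = np.minimum(diff, n - diff)             # wrap-around distance
    H = np.exp(-0.5 * (dist / sigma) ** 2)
    return H / H.sum(axis=0, keepdims=True)       # each column sums to 1

def rl_classic(H, y, n_iter):
    # x <- x * H^T (y / (H x)), as in deconvRL_PIV_OLD
    x = H.T @ y
    for _ in range(n_iter):
        x = x * (H.T @ (y / (H @ x)))
    return x

def rl_backprojected(H, y, n_iter):
    # x <- x * (H^T y) / (H^T H x), as in deconvRL_PIV
    Hty = H.T @ y
    x = Hty.copy()
    for _ in range(n_iter):
        x = x * (Hty / (H.T @ (H @ x)))
    return x

H = make_blur_matrix(64)
truth = np.full(64, 0.01)                         # small positive background
truth[[10, 30, 33, 50]] += [1.0, 2.0, 1.5, 0.7]   # a few point sources
y = H @ truth                                     # noise-free measurement

start_ssd = np.sum((H @ (H.T @ y) - y) ** 2)
final_ssd = {}
for name, update in (("classic", rl_classic), ("backprojected", rl_backprojected)):
    estimate = update(H, y, 50)
    final_ssd[name] = np.sum((H @ estimate - y) ** 2)
    print(name, final_ssd[name])
print("starting guess", start_ssd)
```

The two rules are different algorithms, so different convergence rates would not on their own indicate a mistake in either implementation.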

In [None]:
if (shiftType == 'uniform') or (shiftType == 'uniformSK'):
    if shiftType == 'uniform':
        def ShiftObject(obj, shiftYX):
            # Transform a 3D object according to the flow information provided in shiftYX
            # For now I just consider a uniform translation in xy
            # 
            # TODO: We need to worry about conserving energy during the shift. 
            # For now I will do a circular shift in order to avoid having to worry about this!
            result = RollNoninteger(obj, shiftYX[0,0], axis=len(obj.shape)-2)
            return RollNoninteger(result, shiftYX[0,1], axis=len(obj.shape)-1)
    else:
        # A lot of code duplication here, but it's just an experiment for now
        def ShiftObject(obj, shiftYX):
            # Generate control points in the corners of the image
            src_cols = np.arange(0, obj.shape[-1]+1, obj.shape[-1])
            src_rows = np.arange(0, obj.shape[-2]+1, obj.shape[-2])
            src_rows, src_cols = np.meshgrid(src_rows, src_cols)
            src = np.dstack([src_cols.flat, src_rows.flat])[0]
            dst = src + shiftYX[0]
            tform = PiecewiseAffineTransform()
            tform.estimate(src, dst)
            # Annoyingly, skimage insists that a float input is scaled between 0 and 1, so I must rescale here
            maxVal = np.max(np.abs(obj))
            if len(obj.shape) == 3:
                result = np.zeros(obj.shape)
                for cc in range(obj.shape[0]):
                    result[cc] = warp(obj[cc]/maxVal, tform, mode='edge') * maxVal
                return result
            else:
                return warp(obj/maxVal, tform, mode='edge') * maxVal
    
    def ExampleShiftDescriptionForObject(obj):
        return np.array([[-10, 20]])
    
    def VelocityShapeForObject(obj):
        return (2,)

    def IWCentresForObject(obj):
        return np.array([[int(obj.shape[-2]/2), int(obj.shape[-1]/2)]])

else:
    # Arbitrary motion described in terms of an array of control points at IWCentresForObject
    assert((shiftType == 'piv') or (shiftType == 'piv-zeroedge'))
    def IWCentresForObject(obj, st=shiftType):
        startPos = 0
        # Reusing the code from the skimage example, since that actually does what we need:
        if st == 'piv-zeroedge':
            src_cols = np.arange(controlPointSpacing, obj.shape[-1], controlPointSpacing)
            src_rows = np.arange(controlPointSpacing, obj.shape[-2]-actualImageExtendSize, controlPointSpacing)
        else:
            src_cols = np.arange(startPos, obj.shape[-1]+1, controlPointSpacing)
            src_rows = np.arange(startPos, obj.shape[-2]+1-actualImageExtendSize, controlPointSpacing)
        src_rows, src_cols = np.meshgrid(src_rows, src_cols)
        return np.dstack([src_cols.flat, src_rows.flat])[0]

    def VelocityShapeForObject(obj):
        return IWCentresForObject(obj).shape
    
    def ExampleShiftDescriptionForObject(obj):
        peakVelocity = 7
        iwPos = IWCentresForObject(obj)
        shiftDescription = np.zeros(VelocityShapeForObject(obj))
        width = obj.shape[-1]
        for n in range(iwPos.shape[0]):
            quadraticProfile = ((width/2.)**2 - (iwPos[n,0]-width/2.)**2)
            quadraticProfile = quadraticProfile / ((width/2.)**2) * peakVelocity
            shiftDescription[n,1] = quadraticProfile
        if xMotionPermitted:
            return shiftDescription
        else:
            return shiftDescription[:,1:2]

    def ExtraDuplicateRow(shifts, add=None):
        assert(len(shifts.shape) == 2)
        rowLength = int(np.sqrt(shifts.shape[0]))
        shifts = np.reshape(shifts, (rowLength, rowLength, shifts.shape[1]))
        toAppend = shifts[:,-1:,:].copy()
        if add is not None:
            toAppend += add
        result = np.append(shifts, toAppend, axis=1)
        return result.reshape(result.shape[0]*result.shape[1], result.shape[2])

    def AddZeroEdgePadding(obj, src, shiftYX):
        paddedSrc = IWCentresForObject(obj, st='piv')
        paddedShifts = np.zeros(paddedSrc.shape)
        for i in range(src.shape[0]):
            match = False
            for j in range(paddedSrc.shape[0]):
                if (src[i] == paddedSrc[j]).all():
                    match = True
                    paddedShifts[j] = shiftYX[i]
            assert(match)
        return paddedSrc, paddedShifts
        
    def ShiftObject(obj, shiftYX):
        # Transform a 3D object according to the flow information provided in shiftYX
        # I use a piecewise affine transformation that should approximately correspond to
        # what I use for PIV analysis
        src = IWCentresForObject(obj)
        if (src.shape[0] != shiftYX.shape[0]):
            print(src.shape, shiftYX.shape, obj.shape)
            assert(src.shape[0] == shiftYX.shape[0])
            
        if (shiftType == 'piv-zeroedge'):
            (src, shiftYX) = AddZeroEdgePadding(obj, src, shiftYX)
        
        if (actualImageExtendSize > 0):
            src = ExtraDuplicateRow(src, add=np.array([0, actualImageExtendSize]))
            if xMotionPermitted:
                dst = src + ExtraDuplicateRow(shiftYX)
            else:
                dst = src.copy().astype(shiftYX.dtype)
                dst[:,1] = dst[:,1] + ExtraDuplicateRow(shiftYX)[:,0]
        else:
            dst = src.copy().astype(shiftYX.dtype) + shiftYX
            
        tform = PiecewiseAffineTransform()
        tform.estimate(src, dst)
        # Annoyingly, skimage insists that a float input is scaled between 0 and 1, so I must rescale here
        maxVal = np.max(np.abs(obj))
        if len(obj.shape) == 3:
            result = np.zeros(obj.shape)
            for cc in range(obj.shape[0]):
                result[cc] = warp(obj[cc]/maxVal, tform, mode='edge') * maxVal
            return result
        else:
            assert(len(obj.shape) == 2)
            return warp(obj/maxVal, tform, mode='edge') * maxVal
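
The control-point layout and the parabolic test flow built by `IWCentresForObject` and `ExampleShiftDescriptionForObject` can be checked in isolation. The sketch below is a simplified restatement under stated assumptions: the helper names are hypothetical, the image-extension handling is left out, and the profile is written in the equivalent form `peak * (1 - ((x - w/2) / (w/2))**2)`.

```python
import numpy as np

def control_point_grid(height, width, spacing):
    # Regular grid of control points covering the image, including both edges.
    # Returns an (N, 2) array of (x, y) positions, x varying slowest.
    xs = np.arange(0, width + 1, spacing)
    ys = np.arange(0, height + 1, spacing)
    yy, xx = np.meshgrid(ys, xs)
    return np.column_stack([xx.ravel(), yy.ravel()])

def parabolic_y_shift(points, width, peak):
    # Quadratic profile across x: zero at the left/right edges, `peak` at the centre
    half = width / 2.0
    return peak * (1.0 - ((points[:, 0] - half) / half) ** 2)

points = control_point_grid(180, 180, 30)
shifts = parabolic_y_shift(points, 180, 7.0)
print(points.shape, shifts.min(), shifts.max())
```

With a 180-pixel image and 30-pixel spacing this gives a 7 x 7 grid, with the largest y shift on the centre column and no shift on the two outer columns.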

In [None]:
if source == 'synthetic':
    # Generate a synthetic shift in the B image
    dualObject = np.tile(syntheticObjectExt[:,np.newaxis,:,:], (1,2,1,1)) *1e3#* 1e7
    if False:
        warnings.warn('Loading previously-saved dualObject')
        dualObject = np.load('dualObject5.npy')
    
    shiftDescription = ExampleShiftDescriptionForObject(dualObject)
    dualObject[:,1,:,:] = ShiftObject(dualObject[:,1,:,:], shiftDescription)

    # Since I am only using a local minimizer, we need to start with a decent guess as to the flow.
    # I think that's ok though: we should have that from a PIV estimate on the with-artefacts AB images
    #initialShiftGuess = np.zeros(VelocityShapeForObject(dualObject))
    initialShiftGuess = shiftDescription + np.random.random(shiftDescription.shape) * 4.0
else:
    assert(source == 'piv')
    pivImagePair = tifffile.imread('piv-raw-data/038298.tif')[24:26,:15*20,:15*16].astype('float64')
    # Note: frames 57-58 (wrong pair) would be an option to investigate bigger motion (~16px) with imperfect AB matches
    #              64-65 (correct pair) are another example of small movement (0-3px)
    dualObject = pivImagePair[np.newaxis]
    # For now, I just guess an initial shift of zero
    shiftDescription = np.zeros(VelocityShapeForObject(dualObject)).astype('float64')
    initialShiftGuess = shiftDescription.copy()
    
    
lb = []
ub = []
if xMotionPermitted:
    for n in range(shiftDescription.shape[0]):
        lb.extend([shiftDescription[n,0]-xSearchRange, shiftDescription[n,1]-ySearchRange])
        ub.extend([shiftDescription[n,0]+xSearchRange, shiftDescription[n,1]+ySearchRange])
else:
    for n in range(shiftDescription.shape[0]):
        lb.extend([shiftDescription[n,0]-ySearchRange])
        ub.extend([shiftDescription[n,0]+ySearchRange])
shiftSearchBounds = scipy.optimize.Bounds(lb, ub, keep_feasible=True)

plt.subplot(1, 2, 1)
plt.imshow(dualObject[0,0])
plt.subplot(1, 2, 2)
plt.imshow(dualObject[0,1])
plt.show()

In [None]:
# Code used for investigations in which I directly warp the input object/images,
# without any use of light field PSFs and deconvolution

def ScoreShift2(candidateShiftYX, method, imageAB, hMatrix=None, shiftHistory=None, scaling=1.0, log=True, comparator=None, maxIter=8):
    return ScoreShift3(candidateShiftYX, method, imageAB, hMatrix, shiftHistory, scaling, log, comparator, maxIter=maxIter)[0]

def ScoreShift3(candidateShiftYX, method, imageAB, hMatrix=None, shiftHistory=None, scaling=1.0, log=True, comparator=None, maxIter=8):
    # Our input parameters get flattened, so we need to reshape them to Nx2 like my code is expecting
    # 'scaling' is useful for optimizers that insist on initial very small step sizes
    if xMotionPermitted:
        candidateShiftYX = candidateShiftYX.reshape(int(candidateShiftYX.shape[0]/2),2) * scaling
    else:
        candidateShiftYX = candidateShiftYX.reshape(candidateShiftYX.shape[0],1) * scaling
    # Sanity check and reminder that we have a 2xMxN AB image pair
    assert(len(imageAB.shape) == 3)  
    assert(imageAB.shape[0] == 2)
        
    if log:
#        print('======== Score shift ========', candidateShiftYX.T)
#        print('======== Score shift ========')
        pass

    if method == 'joint':
        # Perform the joint deconvolution to recover a single object
        res = deconvRL_PIV(hMatrix, imageAB, maxIter=maxIter, shiftDescription=candidateShiftYX)
        # Evaluate how well the forward-projected result matches the actual camera images, using SSD
        candidateImageAB = forwardProjectACC_PIV(hMatrix, res, candidateShiftYX)
    elif method == 'joint-test-trivial':
        # Debugging method in which I use trivial projectors that behave like a delta function PSF
        res = deconvRLTrivial(hMatrix, imageAB, maxIter=maxIter, shiftDescription=candidateShiftYX)
        candidateImageAB = forwardProjectTrivial(hMatrix, res, candidateShiftYX)
    else:
        # Just warp the raw B image manually and look at how the two images compare
        assert(method == 'naive')
        candidateImageAB = imageAB.copy()
        # A bit of dimensional gymnastics here, because ShiftObject expects an *object*,
        # i.e. a 3D volume, whereas in this case we just have a 2D image
        candidateImageAB[1,:,:] = ShiftObject(candidateImageAB[np.newaxis,0,:,:], candidateShiftYX)[0]  
        res = None  # So that we have something to return
    # Sanity check and reminder that we have a 2xMxN AB image pair
    assert(len(candidateImageAB.shape) == 3)  
    assert(candidateImageAB.shape[0] == 2)

    imageToScore = candidateImageAB[:, 1:-1-actualImageExtendSize, 1:-1-actualImageExtendSize]
    referenceImage = imageAB[:, 1:-1-actualImageExtendSize, 1:-1-actualImageExtendSize]
    # Score by comparing the A and B images to the ones we are optimizing on.
    # Note: in some simulated or naive cases, the A camera images will always be a perfect match,
    # but for the real case the joint solution will be a compromise for both the A and B camera images.
    #
    # I have tried to renormalize to aid comparison between the images - based on the relative intensity
    # of the candidate and observed A images. I chose the A images because they will be identical in the case
    # of the 'naive' method (direct warping). However, for the 'joint' method they won't be.
    # TODO: I need to think more about whether this normalization is necessary and appropriate.
    # (I think I introduced it in the hope of fixing a problem,
    # but lack of normalization wasn't the fundamental issue in the end)
    renormHack = np.average(candidateImageAB[0]) / np.average(imageAB[0])
    ssdScore = np.sum((imageToScore/renormHack - referenceImage)**2)

    if comparator is not None:
        maxLoc = np.argmax(np.abs(imageToScore - comparator)[1:-1,1:-1])
        maxVal =    np.max(np.abs(imageToScore - comparator)[1:-1,1:-1])
        print('showing B image diffs')
        plt.imshow((imageToScore[1] - comparator)[170:,150:])
        plt.colorbar()
        plt.title('BRel (max %e)'%maxVal)
        print('Max val %f at %d (image scale %d)' % (maxVal, maxLoc, np.max(comparator)))
        plt.show()

    if shiftHistory is not None:
        shiftHistory.Update(candidateShiftYX, ssdScore)
        if log:
            if shiftHistory.PlotHistory(onlyPlotEvery=50):
                if method == 'joint':
                    dualObject = np.tile(res[:,np.newaxis,:,:] / 2.0, (1,2,1,1))
                    # Warp the B copy using the candidate shift that is being scored and plotted
                    dualObject[:,1,:,:] = ShiftObject(dualObject[:,1,:,:], candidateShiftYX)
                    ShowDualObjectAndFlow(dualObject, candidateShiftYX)
                else:
                    ShowDualObjectAndFlow(candidateImageAB, candidateShiftYX)
                print('Last trial shift: ', candidateShiftYX.T)

    if log:
        #print('return %e' % ssdScore)
        pass
    return (ssdScore, renormHack, np.average(candidateImageAB[0]), np.average(imageAB), candidateImageAB, res)

def ShowDualObjectAndFlow(dualObject, shiftDescription, otherObject=None, otherObject2=None, destFilename=None, suppressDark=0, histogram=0):
    plt.subplot(1, 2, 1)
    if (len(dualObject.shape) == 4):
        assert(dualObject.shape[1] == 2)
        plt.imshow(dualObject[0,0])
        plt.subplot(1, 2, 2)
        plt.imshow(dualObject[0,1])
        windowSource = dualObject[0,0]
    else:
        assert(len(dualObject.shape) == 3)  # It's actually a dual image not an object
        assert(dualObject.shape[0] == 2)
        plt.imshow(dualObject[1])
        windowSource = dualObject[0]
    iwPos = IWCentresForObject(dualObject)
    velocities = []
    for n in range(iwPos.shape[0]):
        halfSpacing = int(controlPointSpacing/2)
        aWindow = windowSource[np.maximum(iwPos[n,1]-halfSpacing,0):np.minimum(iwPos[n,1]+halfSpacing,dualObject.shape[-2]),
                               np.maximum(iwPos[n,0]-halfSpacing,0):np.minimum(iwPos[n,0]+halfSpacing,dualObject.shape[-1])]
        if (aWindow.sum() > suppressDark):
            if not xMotionPermitted:
                # Single-component shift: drawn as a line along the y axis
                velocities.append(shiftDescription[n,0])
                plt.plot([iwPos[n,0], iwPos[n,0]],
                         [iwPos[n,1], iwPos[n,1] - shiftDescription[n,0]/2.], color='red')
            else:
                # Two-component shift: record its magnitude and draw the vector
                velocities.append(np.sqrt(shiftDescription[n,0]**2 + shiftDescription[n,1]**2))
                plt.plot([iwPos[n,0], iwPos[n,0] - shiftDescription[n,0]/2.],
                         [iwPos[n,1], iwPos[n,1] - shiftDescription[n,1]/2.], color='red')
    plt.xlim(0, dualObject.shape[-1])
    plt.ylim(dualObject.shape[-2], 0)
    if destFilename is not None:
        plt.savefig(destFilename, dpi=200)
    plt.show()
    if (histogram > 0):
        plt.hist(velocities, range=(0,histogram), bins=20)
        plt.show()
    if otherObject is not None:
        plt.imshow(otherObject[0])
        plt.show()        
    if otherObject2 is not None:
        plt.imshow(otherObject2[0])
        plt.show()   
    return np.array(velocities)
        
def CheckConvergence(funcToCall, convergedShift, args):
    initialScore = funcToCall(convergedShift.flatten(), *args)
    print('initial score %e' % initialScore)
    for du in [0.5, -0.5, 1.5, -1.5]:
        for n in [7, 8, 12, 13]:
            temp = convergedShift.copy()
            temp[n] += du
            score = funcToCall(temp, *args)
            print('offset score %e' % score)
            if (score < initialScore):
                print(n, du, 'BETTER! (by %f%%)' % ((initialScore-score)/score*100))

def ReportOnOptimizerConvergence(shiftHistory, method, obj, hMatrix=None):
    if shiftHistory is None:
        print('ReportOnOptimizerConvergence returning - called with shiftHistory=None')
        return
    bestShift = shiftHistory.BestShift()
    print('Best score: %e' % shiftHistory.BestScore())
    print('Best shift: np.array([', end='')
    for n in bestShift.flatten():
        print('%f, '%n, end='')
    print('])')
    CheckConvergence(ScoreShift2, bestShift.flatten(), (method, obj, hMatrix, None, 1.0, False))
    return bestShift
                
class ShiftHistory:
    def __init__(self):
        self.Reset()

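    def __copy__(self):
        # Copy the lists themselves, so that calling Update() on the copy does not also modify the original
        result = ShiftHistory()
        result.shiftHistory = list(self.shiftHistory)
        result.scoreHistory = list(self.scoreHistory)
        result.counter = self.counter
        return result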

    def Reset(self):
        self.scoreHistory = []
        self.shiftHistory = []
        self.counter = 0
    
    def Update(self, shift, score):
        self.shiftHistory.append(shift)
        self.scoreHistory.append(score)
        self.counter = self.counter + 1
        
    def BestScore(self):
        return np.min(self.scoreHistory)

    def BestShift(self):
        return self.shiftHistory[np.argmin(self.scoreHistory)]

    def PlotHistory(self, onlyPlotEvery=1):
        if ((self.counter%onlyPlotEvery) == 0) and (len(self.shiftHistory) > 0):
            print('best score so far: %e' % np.min(self.scoreHistory))
            # Plot one of the shifts
            shiftShape = self.shiftHistory[0].shape
            selectedItem = np.minimum(int(np.sqrt(shiftShape[0])/2), shiftShape[0]-1)
            selectedShift = np.array(self.shiftHistory)[:, selectedItem, -1]
            plt.plot(selectedShift)
            plt.show()
            # Plot scores, with a suitable y axis scaling to see the interesting parts.
            # We limit the y axis to avoid stupid guesses distorting the plot.
            improvement = self.scoreHistory[0] - np.min(self.scoreHistory)
            plt.ylim(np.min(self.scoreHistory), self.scoreHistory[0]+2*improvement)
            plt.plot(self.scoreHistory)
            plt.show()
            # Plot an indication of which values are being updated on which iteration
            for n in range(1, len(self.scoreHistory)):
                changes = np.array(np.where((self.shiftHistory[n] == self.shiftHistory[n-1]).flatten() == False))
                if (changes.size > 0):
                    plt.plot(n, changes, 'x', color='red')
            plt.show()

            with open('scores.txt', 'a') as f:
                f.write('%f\t' % self.scoreHistory[-1])
                for n in self.shiftHistory[-1]:
                    if xMotionPermitted:
                        f.write('%f\t%f\t' % (n[0], n[1]))
                    else:
                        f.write('%f\t' % (n[0]))
                f.write('\n')
            return True
        else:
            return False        
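
The scoring step in the function above can be summarised in isolation. The following is a minimal sketch rather than the notebook's own code: `renormalized_ssd` and its `margin` parameter are hypothetical names, and a fixed symmetric margin stands in for the crop based on `actualImageExtendSize`.

```python
import numpy as np

def renormalized_ssd(candidateAB, observedAB, margin=1):
    # Hypothetical helper (not part of the notebook).
    # Both inputs are 2xMxN AB image pairs; margin must be >= 1.
    candidateAB = np.asarray(candidateAB, dtype=float)
    observedAB = np.asarray(observedAB, dtype=float)
    assert candidateAB.shape == observedAB.shape
    assert candidateAB.shape[0] == 2
    # Rescale using the relative mean intensity of the two A images
    renorm = np.average(candidateAB[0]) / np.average(observedAB[0])
    crop = (slice(None), slice(margin, -margin), slice(margin, -margin))
    ssd = np.sum((candidateAB[crop] / renorm - observedAB[crop])**2)
    return ssd, renorm

rng = np.random.default_rng(0)
observed = rng.random((2, 16, 16)) + 1.0
# A candidate that differs from the observation only by a global intensity factor
score, renorm = renormalized_ssd(3.0 * observed, observed)
print('score', score, 'renorm', renorm)
```

A candidate that differs from the observed pair only by a constant factor scores (numerically) zero, which is the property the renormalization is intended to provide.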

In [None]:
# Generate synthetic light-field-recovered AB images (doing it the naive way, not using my new joint deconvolution)
# Run the imaging cycle on each of the AB images individually (i.e. introduce artefacts into them)
dualObjectRecovered = dualObject.copy()
for n in [0, 1]:
    cameraImage = forwardProjectACC(pivHMatrix, dualObject[:,n,:,:], logPrint=False)
    backProjected = backwardProjectACC(pivHMatrix, cameraImage, logPrint=False)
    
    # With the shifted images, we have problems with true zeroes in regions that have no features remaining.
    # To avoid this, I apply a very small nonzero background so that the deconvolution doesn't fail.
    backProjected = np.maximum(backProjected, 1e-5*np.max(backProjected))
    
    dualObjectRecovered[:,n,:,:] = deconvRL(pivHMatrix, backProjected, maxIter=8, Xguess=backProjected, logPrint=False)
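
To illustrate why a small nonzero background helps, here is a hedged 1D sketch. The internals of `deconvRL` are not shown in this section, so this uses a textbook multiplicative Richardson-Lucy update as a stand-in (`richardson_lucy_1d` is a hypothetical name), on the assumption that the failure comes from dividing by an exactly-zero blurred estimate.

```python
import numpy as np

def richardson_lucy_1d(observed, psf, guess, n_iter=8):
    # Textbook multiplicative RL update in 1D (hypothetical stand-in, not deconvRL)
    estimate = np.array(guess, dtype=float)
    psf_flipped = psf[::-1]
    for _ in range(n_iter):
        blurred = np.convolve(estimate, psf, mode='same')
        ratio = observed / blurred   # 0/0 gives NaN wherever the blurred estimate is exactly zero
        estimate = estimate * np.convolve(ratio, psf_flipped, mode='same')
    return estimate

psf = np.array([0.25, 0.5, 0.25])
truth = np.zeros(32)
truth[10] = 1.0
observed = np.convolve(truth, psf, mode='same')   # true zeros away from the single feature

# Without a floor: the zero regions produce NaNs
with np.errstate(divide='ignore', invalid='ignore'):
    without_floor = richardson_lucy_1d(observed, psf, guess=observed)

# With the same style of floor as used in the cell above
floored = np.maximum(observed, 1e-5 * np.max(observed))
with_floor = richardson_lucy_1d(floored, psf, guess=floored)

print('NaNs without floor:', np.isnan(without_floor).any())
print('All finite with floor:', np.isfinite(with_floor).all())
```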

In [None]:
print('Original object')
iwPos = IWCentresForObject(dualObject)
ShowDualObjectAndFlow(dualObject, shiftDescription)
print('Recovered from light field images (plane %d)' % zPlaneToModel)
ShowDualObjectAndFlow(dualObjectRecovered, shiftDescription)
plt.imsave('syntheticInput.tif', dualObject[0,0])
plt.imsave('syntheticInputB.tif', dualObject[0,1])

plt.imsave('syntheticA.tif', dualObjectRecovered[0,0])
plt.imsave('syntheticB.tif', dualObjectRecovered[0,1])

In [None]:
def CalcFlowUsingVanillaPIV(imagePair, iwPos):
    # Calculate the flow based on traditional window-based PIV analysis
    shiftDescriptionPIV = np.zeros(iwPos.shape)
    smallIWSize = controlPointSpacing
    largeIWSize = 2 * controlPointSpacing
    for n in range(iwPos.shape[0]):
        halfSmall = int(smallIWSize/2)
        halfLarge = int(largeIWSize/2)
        a = imagePair[0, iwPos[n,1]-halfSmall:iwPos[n,1]+halfSmall,
                         iwPos[n,0]-halfSmall:iwPos[n,0]+halfSmall]
        b = imagePair[1, iwPos[n,1]-halfLarge:iwPos[n,1]+halfLarge,
                         iwPos[n,0]-halfLarge:iwPos[n,0]+halfLarge]
        sad_using_c_code = jpsad.sad_correlation(a, b)
        zeroPoint = np.array([1,1])*int((largeIWSize-smallIWSize)/2)
        shiftDescriptionPIV[n] = -(np.array(np.unravel_index(sad_using_c_code.argmin(), sad_using_c_code.shape))[::-1]-zeroPoint)
    if xMotionPermitted:
        return shiftDescriptionPIV
    else:
        return shiftDescriptionPIV[:,1:]

if source == 'synthetic':
    thresh = 0
else:
    # Experimental PIV images
    thresh = 6e5
    
shiftDescriptionPIVRaw = CalcFlowUsingVanillaPIV(dualObject[0], iwPos)
ShowDualObjectAndFlow(dualObject, shiftDescriptionPIVRaw*3, suppressDark=thresh)
shiftDescriptionPIVReconstructed = CalcFlowUsingVanillaPIV(dualObjectRecovered[0], iwPos)
ShowDualObjectAndFlow(dualObjectRecovered, shiftDescriptionPIVReconstructed*3, suppressDark=thresh)

#knownGood = np.array([4.179692, 2.422054, 2.277945, -3.442326, 2.588265, -1.628299, -0.701406, -0.351879, 1.371176, -0.887983, -0.329903, -1.814173, -0.039174, -1.546245, 4.530734, 6.433376, -3.923658, 0.765194, -3.139235, 11.998391, 0.159123, 4.712269, 0.000467, 1.881291, -1.076288, 2.571115, -0.752311, 6.346806, 0.705785, 1.599266, 0.526477, 0.520794, 1.613503, -0.944800, -4.052433, 0.938896, -9.285762, 9.058932, -1.427279, 3.465706, -2.667963, 6.611281, -2.711337, 6.691590, 1.149587, 6.157966, 5.232333, 7.419857, 3.860831, 0.035423, 1.096535, -0.919879, 1.315764, 0.783761, -1.649745, 1.829128, -1.506330, 2.903395, -2.640247, 6.333610, -2.974801, 5.116153, -1.640844, 6.727836, 6.771447, 5.497423, 7.318270, 3.963571, -1.879130, 0.376258, -2.545277, 3.033343, -1.405359, 2.988077, -3.664550, 3.713645, -2.404847, 3.906314, -0.068660, 0.731329, 3.443943, 1.132651, 8.621877, 2.114740, 4.915054, 3.548191, 3.346262, 5.315995, -2.250714, 3.869669, -3.046189, 2.948226, -1.592374, 0.569959, -0.875566, 0.699708, -0.309011, -0.220754, 2.785740, 4.885732, 1.892708, 2.223612, 5.977712, 3.248553, 1.007629, 3.325284, -0.803304, 5.829418, -2.376718, 0.837404, 0.982720, 0.897666, -1.992212, 4.365900, -0.497122, 3.024971, -0.809540, 2.668115, 3.179223, -4.673659, 3.866968, 0.850031, 2.411868, 0.370574, 3.250005, 2.128673]).reshape(shiftDescription.shape)
#ShowDualObjectAndFlow(dualObject, knownGood*3, suppressDark=thresh)
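
The window-matching idea in `CalcFlowUsingVanillaPIV` can be sketched in pure NumPy. This is not the `jpsad` C implementation, whose details are not shown here: `sad_displacement` is a hypothetical helper that slides the small window over the large one, sums absolute differences, and reports the best offset relative to the centred position. It returns a (dy, dx) displacement and does not reproduce the sign and (x, y) ordering conventions used in the notebook.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def sad_displacement(small, large):
    # Hypothetical NumPy stand-in for a sum-of-absolute-differences correlation
    windows = sliding_window_view(large, small.shape)
    sad = np.abs(windows - small).sum(axis=(-2, -1))
    best = np.array(np.unravel_index(sad.argmin(), sad.shape))
    # Offset at which the small window sits centrally within the large one
    zeroPoint = (np.array(large.shape) - np.array(small.shape)) // 2
    return best - zeroPoint, sad

rng = np.random.default_rng(1)
frameA = rng.random((32, 32))
# Second frame: the same content moved 2 px down and 1 px left
frameB = np.roll(frameA, shift=(2, -1), axis=(0, 1))
a = frameA[12:20, 12:20]   # 8x8 interrogation window
b = frameB[8:24, 8:24]     # 16x16 search window with the same centre
displacement, sad = sad_displacement(a, b)
print('recovered (dy, dx):', displacement)
```

In this test case the recovered displacement is (2, -1), matching the roll applied to the second frame.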

In [None]:
# Temp: visualize flow as computed for the PIV dataset using the two runs I had going on my mac pro
correct = np.array([4.179692, 2.422054, 2.277945, -3.442326, 2.588265, -1.628299, -0.701406, -0.351879, 1.371176, -0.887983, -0.329903, -1.814173, -0.039174, -1.546245, 4.530734, 6.433376, -3.923658, 0.765194, -3.139235, 11.998391, 0.159123, 4.712269, 0.000467, 1.881291, -1.076288, 2.571115, -0.752311, 6.346806, 0.705785, 1.599266, 0.526477, 0.520794, 1.613503, -0.944800, -4.052433, 0.938896, -9.285762, 9.058932, -1.427279, 3.465706, -2.667963, 6.611281, -2.711337, 6.691590, 1.149587, 6.157966, 5.232333, 7.419857, 3.860831, 0.035423, 1.096535, -0.919879, 1.315764, 0.783761, -1.649745, 1.829128, -1.506330, 2.903395, -2.640247, 6.333610, -2.974801, 5.116153, -1.640844, 6.727836, 6.771447, 5.497423, 7.318270, 3.963571, -1.879130, 0.376258, -2.545277, 3.033343, -1.405359, 2.988077, -3.664550, 3.713645, -2.404847, 3.906314, -0.068660, 0.731329, 3.443943, 1.132651, 8.621877, 2.114740, 4.915054, 3.548191, 3.346262, 5.315995, -2.250714, 3.869669, -3.046189, 2.948226, -1.592374, 0.569959, -0.875566, 0.699708, -0.309011, -0.220754, 2.785740, 4.885732, 1.892708, 2.223612, 5.977712, 3.248553, 1.007629, 3.325284, -0.803304, 5.829418, -2.376718, 0.837404, 0.982720, 0.897666, -1.992212, 4.365900, -0.497122, 3.024971, -0.809540, 2.668115, 3.179223, -4.673659, 3.866968, 0.850031, 2.411868, 0.370574, 3.250005, 2.128673])
correct.shape = (63,2)
joint_startedWithCorrect = np.array([[-3.46956446e-03, 9.61095429e-01, 4.44415111e-01, -1.53527275e+00, 6.77153487e+00, -7.44146838e-01, 7.12031474e-01, -1.55401929e-02, 6.28613798e-03, -2.20445130e+00, 2.43331717e+00, 1.77874576e-01, -1.95849488e+00, -8.39130055e-01, 2.22057487e-01, 3.90672635e-01, -4.60806967e-01, -6.64075497e+00, -8.61081872e+00, -3.43562023e+00, -5.25693558e+00, -3.31455813e+00, 2.50891016e+00, 3.24509438e+00, 3.43248569e+00, 1.81482624e+00, 1.10590141e+01, -1.26329974e+00, -8.86442327e-01, -2.43612245e+00, -8.36238900e+00, -1.63782961e+00, 7.49045984e+00, 6.76073641e+00, -1.32362267e+01, 2.66819084e-05, -1.69414204e+00, -5.65765566e+00, -1.57664842e+00, 1.23941049e-01, 5.14833019e+00, 7.20529915e+00, 3.31522269e+00, 6.73114159e+00, -4.61407200e+00, -2.97814883e+00, -1.37762656e+00, -1.30854667e+00, 1.42302575e+00, -9.49780114e+00, -8.79775153e+00, 7.30119133e+00, -5.74808622e-01, -1.33123648e+00, -1.30922297e+00, 1.20205418e-04, -9.54930869e-02, -2.01473122e+00, -2.29875358e-01, -6.98139516e-01, 6.65668593e+00, 3.14743091e+00, 4.61573888e+00], [3.16409852e-03, -1.21057022e+00, -1.72096836e+00, 5.94088666e+00, -4.37610627e+00, -4.42214068e+00, -1.76644728e+00, 1.11310822e+01, -9.76478955e-01, 1.93427152e-02, 9.47330916e-01, 4.03117276e+00, 2.12529200e+00, 8.32911533e+00, 4.54480084e+00, -4.27711971e-02, -1.61438429e+00, 1.63542459e+00, 1.00859262e+01, 5.05902506e+00, 1.00863257e+01, 6.58395411e+00, 4.98800108e+00, 1.08599663e+01, -7.30342801e-02, -8.44590403e-01, -8.67462769e+00, 1.81321632e+00, 1.52006223e+00, 1.10191056e+01, 2.66531654e+00, 8.49432347e+00, 5.32421002e+00, 3.92532685e+00, 2.16013667e+00, 3.34837423e+00, 2.22416192e+00, 4.54662917e+00, 3.01834475e+00, 1.12590205e+00, -1.25525510e+00, 2.63074618e+00, 3.28669257e+00, 4.05507128e+00, 6.10703573e+00, 4.12589471e+00, 1.49870093e+00, 5.34812442e-01, -9.01271928e-01, 1.04613605e+01, 5.23274857e+00, 1.84320015e+00, 3.08904788e+00, 4.64561095e+00, 2.20611416e-03, -1.63987044e+00, 
-6.83487990e+00, 7.02743302e+00, 3.10466764e+00, -3.26659385e+00, -3.96403584e-01, -5.86124215e-01, 3.43776836e+00]]).T

#ShowDualObjectAndFlow(dualObjectRecovered, (correct)*4, suppressDark=thresh)

velocityMultiplier = 2
ShowDualObjectAndFlow(dualObject, correct*velocityMultiplier, suppressDark=thresh)
ShowDualObjectAndFlow(dualObjectRecovered, joint_startedWithCorrect*velocityMultiplier, suppressDark=thresh)
ShowDualObjectAndFlow(dualObjectRecovered, shiftDescriptionPIVReconstructed*velocityMultiplier, suppressDark=thresh)


magnitudesJoint = ShowDualObjectAndFlow(dualObjectRecovered, (joint_startedWithCorrect - correct)*velocityMultiplier, suppressDark=thresh)/velocityMultiplier

magnitudesNaive = ShowDualObjectAndFlow(dualObjectRecovered, (shiftDescriptionPIVReconstructed - correct)*velocityMultiplier, suppressDark=thresh)/velocityMultiplier

#magnitudesPIV = ShowDualObjectAndFlow(dualObjectRecovered, (shiftDescriptionPIVRaw - correct)*velocityMultiplier, suppressDark=thresh)/velocityMultiplier


plt.hist([magnitudesJoint, magnitudesNaive], range=(0,30), bins=20, label=['joint', 'naive'])
plt.legend()
plt.xlabel('|Error|')
plt.ylabel('Count')
plt.show()




#plt.plot()

In [None]:
if True:
    # If I want to give the algorithm the best possible starting point,
    # I can give it the actual true shift values as its starting point
    # (but it still may iterate away from that...)
    warnings.warn("WARNING: starting guess is actually the correct flow description")
    startShiftForOptimizer = shiftDescription.copy()
else:
    startShiftForOptimizer = 0.0 * initialShiftGuess.copy()
    
def OptimizeToRecoverFlowField(method, imageAB, hMatrix, shiftDescription, initialShiftGuess, shiftHistory=None):
    imageAB = imageAB.copy()    # This is just paranoia - I don't think it should get manipulated
    print('True shift:', shiftDescription.T)

    if shiftHistory is not None:
        warnings.warn('Overriding initial shift guess with best shift from history')
        initialShiftGuess = shiftHistory.BestShift()
        
    if False:
        plt.imshow(imageAB[0,:,:])
        plt.show()
        plt.imshow(imageAB[1,:,:])
        plt.show()

    if False:
        print('Score for correct shift:', ScoreShift2(shiftDescription.flatten(), method, imageAB, hMatrix))
        print('Score for initial guess:', ScoreShift2(initialShiftGuess.flatten(), method, imageAB, hMatrix))

    if True:
        optimizationAlgorithm = 'Powell'
        options = {'xtol': 1e-2}
    elif True:
        optimizationAlgorithm = 'L-BFGS-B'
        options = {'eps': 5e-03, 'gtol': 1e-6}
    else:
        optimizationAlgorithm = 'Nelder-Mead'
        # (Nelder-Mead has no 'eps' option)
        options = {'xatol': 1e-2, 'adaptive': True}

    if shiftHistory is None:
        shiftHistory = ShiftHistory()

    # Optimize to obtain the best-matching shift
    try:
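        # scipy.optimize.minimize expects a 1D starting point, so flatten the shift array
        shift = scipy.optimize.minimize(ScoreShift2, np.asarray(initialShiftGuess).flatten(), bounds=shiftSearchBounds,
                                        args=(method, imageAB, hMatrix, shiftHistory),
                                        method=optimizationAlgorithm, options=options)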
        print('Optimizer finished:', str(shift.message), 'Final shift:', shift.x.T)
    except KeyboardInterrupt:
        # Catch keyboard interrupts so that we still return whatever shiftHistory we have built up so far.
        print('KEYBOARD INTERRUPT DURING OPTIMIZATION')
    return shiftHistory
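
The pattern used in `OptimizeToRecoverFlowField`, logging every evaluation so that the best result survives an interrupted run, can be shown on a toy problem. This is a sketch under stated assumptions: `EvalLog` is a hypothetical stand-in for `ShiftHistory`, `toy_score` is a simple quadratic rather than `ScoreShift2`, and the optimizer runs with default Powell options and no bounds.

```python
import numpy as np
import scipy.optimize

class EvalLog:
    # Hypothetical stand-in for ShiftHistory: records every trial point and its score
    def __init__(self):
        self.xs = []
        self.scores = []

    def update(self, x, score):
        self.xs.append(np.array(x, copy=True))
        self.scores.append(float(score))

    def best(self):
        return self.xs[int(np.argmin(self.scores))]

def toy_score(x, target, log):
    # Toy objective (sum of squared differences from a known target)
    score = np.sum((x - target)**2)
    log.update(x, score)
    return score

target = np.array([1.5, -2.0, 0.25])
log = EvalLog()
try:
    result = scipy.optimize.minimize(toy_score, np.zeros(3), args=(target, log), method='Powell')
except KeyboardInterrupt:
    # The log still holds everything evaluated before the interrupt
    print('Interrupted after %d evaluations' % len(log.scores))
print('Best point found:', log.best())
```

Because the log is updated inside the objective function, it does not depend on the optimizer returning normally.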

In [None]:
# Perform the reconstruction using direct shift-matching of the raw input images (real experimental SPIM images)
if False:
    shiftHistoryRaw = OptimizeToRecoverFlowField('naive', dualObject[0], None, shiftDescription, startShiftForOptimizer)

# Note: if continuing a previously-interrupted run then we can do this to pick up roughly where we left off.
# i.e. provide BestShift() for the two shift-related input parameters, and pass the existing shift history as the final (optional) parameter
#    shiftHistoryRaw = OptimizeToRecoverFlowField('naive', dualObject[0], None, shiftHistoryRaw.BestShift(), shiftHistoryRaw.BestShift(), shiftHistoryRaw)

In [None]:
try:
    bestShift = ReportOnOptimizerConvergence(shiftHistoryRaw, 'naive', dualObject[0])
except NameError:
    warnings.warn('History probably not available')

In [None]:
# Perform the reconstruction using direct shift-matching of the light-field-deconvolved images
if True:
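    # Passing shiftHistoryNaive resumes from an earlier run, so the variable must already exist (None for a fresh start)
    shiftHistoryNaive = OptimizeToRecoverFlowField('naive', dualObjectRecovered[0], None, shiftDescription, startShiftForOptimizer, shiftHistoryNaive)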

In [None]:
try:
    ReportOnOptimizerConvergence(shiftHistoryNaive, 'naive', dualObjectRecovered[0])
except NameError:
    warnings.warn('History probably not available')

In [None]:
# Perform the reconstruction using my new joint algorithm
if True:
    # Generate a camera image pair from the object.
    if source == 'piv':
        # The camera AB images are determined by separate forward projection of the AB spim images in dualObject
        imageAB = forwardProjectACC(pivHMatrix, dualObject)
    else:
        # The synthetic B image is determined with the help of the chosen shift transform.
        imageAB = forwardProjectACC_PIV(pivHMatrix, dualObject[:,0], shiftDescription)

    if False:
        # Try starting using the solution obtained by direct warping of AB image pair,
        # to see if that yields a better minimum than the one I had found so far
        startShiftForOptimizer = np.array([4.179692, 2.422054, 2.277945, -3.442326, 2.588265, -1.628299, -0.701406, -0.351879, 1.371176, -0.887983, -0.329903, -1.814173, -0.039174, -1.546245, 4.530734, 6.433376, -3.923658, 0.765194, -3.139235, 11.998391, 0.159123, 4.712269, 0.000467, 1.881291, -1.076288, 2.571115, -0.752311, 6.346806, 0.705785, 1.599266, 0.526477, 0.520794, 1.613503, -0.944800, -4.052433, 0.938896, -9.285762, 9.058932, -1.427279, 3.465706, -2.667963, 6.611281, -2.711337, 6.691590, 1.149587, 6.157966, 5.232333, 7.419857, 3.860831, 0.035423, 1.096535, -0.919879, 1.315764, 0.783761, -1.649745, 1.829128, -1.506330, 2.903395, -2.640247, 6.333610, -2.974801, 5.116153, -1.640844, 6.727836, 6.771447, 5.497423, 7.318270, 3.963571, -1.879130, 0.376258, -2.545277, 3.033343, -1.405359, 2.988077, -3.664550, 3.713645, -2.404847, 3.906314, -0.068660, 0.731329, 3.443943, 1.132651, 8.621877, 2.114740, 4.915054, 3.548191, 3.346262, 5.315995, -2.250714, 3.869669, -3.046189, 2.948226, -1.592374, 0.569959, -0.875566, 0.699708, -0.309011, -0.220754, 2.785740, 4.885732, 1.892708, 2.223612, 5.977712, 3.248553, 1.007629, 3.325284, -0.803304, 5.829418, -2.376718, 0.837404, 0.982720, 0.897666, -1.992212, 4.365900, -0.497122, 3.024971, -0.809540, 2.668115, 3.179223, -4.673659, 3.866968, 0.850031, 2.411868, 0.370574, 3.250005, 2.128673])

    # Run the joint optimizer to find the shift value for an input frame pair
    shiftHistoryJoint = OptimizeToRecoverFlowField('joint', imageAB, pivHMatrix, shiftDescription, startShiftForOptimizer, shiftHistoryJoint)

In [None]:
if False:
    np.save('shiftHistoryJoint.npy', shiftHistoryJoint.shiftHistory)
    np.save('scoreHistoryJoint.npy', shiftHistoryJoint.scoreHistory)
    np.save('dualObjectJoint.npy', dualObject)

In [None]:
try:
    ReportOnOptimizerConvergence(shiftHistoryJoint, 'joint', imageAB, pivHMatrix)
except NameError:
    warnings.warn('History probably not available')

In [None]:
# Look at how the scores are evolving during the Powell iterations

if True:
    vals = np.array(shiftHistoryJoint.shiftHistory)
    scores = np.array(shiftHistoryJoint.scoreHistory)
else:
    # Load from files previously saved using:
    #   np.save('scoreHistory.npy', np.array(shiftHistoryJoint.scoreHistory))
    #   np.save('shiftHistory.npy', np.array(shiftHistoryJoint.shiftHistory))
    vals = np.load('/Users/jonny/Desktop/shiftHistory.npy')
    scores = np.load('/Users/jonny/Desktop/scoreHistory.npy')
iwOfInterest = 5*7+3
iwOfInterest = 5*7+6  # Looking at border control point for shiftHistoryNaive
x = []
y = []
y2 = []

if False:
    # Compare the results from different control points
    sh = vals[3020].flatten()
    sh[iwOfInterest] = 5.011
    (_,_,_,_,comp) = ScoreShift3(sh, 'naive', objectToUse, log=False)    
    for d in [5.012, 5.0135, 5.0145]:
        sh[iwOfInterest] = d
        sc = ScoreShift2(sh, 'naive', objectToUse[0], log=False, comparator=comp)
        print('score', sc)

if True:
    for iw in [iwOfInterest]:
#    for iw in range(49):
        for n in range(0,vals.shape[0]-1):
            if (vals[n,iw,0] != vals[n+1,iw,0]):
                x.append(vals[n,iw,0])
                y.append(scores[n])
            else:
                if (len(x) > 0):
                    if (len(x) > 2):
                        plt.plot(x, y, 'x')
                        plt.title('%d,%d %d(%d)'%(iw/7,iw%7, n, len(x)))
                        plt.show()
                    x = []
                    y = []
                    y2 = []
                nStart = n
    if (len(x) > 2):
        plt.plot(x, y, 'x')
        plt.title('%d,%d %d(%d)'%(iw/7,iw%7, n, len(x)))
        plt.show()
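
The loop above groups consecutive evaluations in which one parameter keeps changing, i.e. the stretches where the optimizer is searching along that parameter. A plot-free sketch of the same grouping logic follows; `line_search_runs` and `minLength` are hypothetical names, and the input data is made up for illustration.

```python
import numpy as np

def line_search_runs(paramValues, scores, minLength=3):
    # Hypothetical helper mirroring the loop above for a single parameter
    runs = []
    x, y = [], []
    for n in range(len(paramValues) - 1):
        if paramValues[n] != paramValues[n+1]:
            # The parameter is about to change: record this trial
            x.append(paramValues[n])
            y.append(scores[n])
        else:
            # The parameter has stopped changing: close off the current run
            if len(x) >= minLength:
                runs.append((np.array(x), np.array(y)))
            x, y = [], []
    if len(x) >= minLength:
        runs.append((np.array(x), np.array(y)))
    return runs

# Made-up history for one parameter, with a toy quadratic score
paramValues = np.array([0.0, 0.0, 1.0, 2.0, 3.0, 3.0, 3.0, 5.0, 6.0, 6.0])
scores = (paramValues - 2.0)**2
runs = line_search_runs(paramValues, scores)
for (x, y) in runs:
    print('values', x, 'scores', y)
```

Like the loop above, this records the value before each change, so the final point of each run is not included, and runs shorter than three points are discarded.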



In [None]:
# Code useful for understanding how two images differ, since I have been having
# a lot of problems related to warp(), where tiny changes in shifts make a difference to the result
# (these are largely due to edge effects of one type or another)
def ShowDifferences(im1, im2, fullIm1, sh):
    diff = im1-im2
    print(diff.shape)
    maxLoc = np.argmax(np.abs(diff))
    print('Largest difference', np.max(np.abs(diff)), 'loc', maxLoc,
          maxLoc%diff.shape[1], int(maxLoc/diff.shape[1]))
    plt.imshow(diff)
    iwPos = IWCentresForObject(dualObject, st='piv')
    if False:
        for n in range(iwPos.shape[0]):
            plt.plot(iwPos[n,0], iwPos[n,1], 'x', color='red')
    elif True:
        src = IWCentresForObject(fullIm1[np.newaxis])
        assert(src.shape[0] == sh.shape[0])
        if (shiftType == 'piv-zeroedge'):
            (src, sh) = AddZeroEdgePadding(fullIm1[np.newaxis], src, sh)
            print('padded')
        for n in range(sh.shape[0]):
            plt.plot([iwPos[n,0], iwPos[n,0]+sh[n,0]*1e9], \
                     [iwPos[n,1], iwPos[n,1]+sh[n,1]*1e9], color='red')
            if not sh[n,0] == 0:
                print('x', [iwPos[n,0], iwPos[n,0]+sh[n,0]*1e9])
                print('y', [iwPos[n,1], iwPos[n,1]+sh[n,1]*1e9])
    plt.xlim(-10,60)
    plt.ylim(80,-10)
    
    plt.show()

In [None]:
# To understand how the optimizer is behaving, scan the search space rather than optimizing

# Generate a camera image pair from the object.
# The B image is determined with the help of the chosen shift transform.
imageAB = forwardProjectACC_PIV(pivHMatrix, dualObject[:,0,:,:], shiftDescription)
# Scan one control point's shift over a range of values, recording the joint score for each trial
shiftHistorySearch = ShiftHistory()
candidateShiftYX = startShiftForOptimizer.copy()
for dx in range(-4,5,1):
#    candidateShiftYX[4*11+3] = dx
#    [5147746000.0, 4475782000.0, 3882488600.0, 3441547500.0, 3223274800.0, 3499643600.0, 4056565800.0, 4845896000.0, 5856099300.0]
    candidateShiftYX[3*11+5] = dx
    ScoreShift2(candidateShiftYX.flatten(), 'joint', imageAB, pivHMatrix, shiftHistorySearch, log=False)


In [None]:
plt.plot(shiftHistorySearch.scoreHistory)

In [None]:
# Temporary: plotting velocity curves because I don't seem to have that code on my laptop
oneLens = np.array([10.862958, -1.443047, 0.332139, 4.405934, -0.153823, -1.755634, 4.066662, 1.899461, 4.774937, 4.288873, 6.098334, 2.156575, 3.283804, 4.029672, 13.566075, 1.801321, 15.448385, 6.394007, 7.178400, 7.106718, 12.273373, -14.666837, 5.111868, 7.882884, 8.721239, 6.867755, 5.641365, 3.340085, 5.588366, 9.117049, 2.237782, -1.251710, 23.535857, 6.499464, 10.420452, 69.905325, -13.358900, 1.926929, 0.526439, 2.005965, 32.774536, 17.063480, 7.802566, -50.389157, 9.439552, -19.231901, -0.933729, -0.905371, -0.275877]).reshape((7,7))
oneLensNaive = []

twoLens = np.array([1.547096, -0.756477, 1.007048, -0.561828, -0.637234, -0.029966, -0.121392, 2.424037, 2.237319, 2.821429, 4.160425, 4.127186, 4.236659, 5.169546, 7.444215, 7.127218, 7.198190, 5.474373, 6.800737, 7.369968, -2.075165, 5.773002, 5.458451, 7.014281, 7.210561, 6.977147, 6.512999, 8.365205, 16.284319, 7.652777, 5.594855, 7.790051, 5.770755, 6.528609, 5.977414, 0.228023, 1.916074, 4.929982, 4.181229, 3.963583, 5.231549, 5.995688, -31.022106, 37.960189, -12.175012, -98.216332, -51.273363, 27.641667, 47.129537]).reshape((7,7))
twoLensNaive = np.array([-0.107739, -0.104656, -0.620940, 0.090302, -0.113247, -0.088426, -0.019108, 1.940174, 1.725361, 6.280136, 2.387378, 2.177253, 2.064151, 2.577545, 26.687781, 1.039631, 12.572762, 4.809212, 7.036026, 13.494221, -15.283873, 4.558655, 7.806684, 10.972292, 6.193278, 12.258869, 9.973479, -7.100600, 16.527190, 10.775738, 4.459501, 3.930470, 1.990099, 8.893538, 15.374002, -1.771851, 4.710975, 1.144725, 9.085837, 3.763832, 1.091623, 0.591693, 6.000000, 6.000000, 6.000000, 6.000000, 6.000000, 6.000000, 6.000000]).reshape((7,7))
# This is when starting with initial values of zero - slightly different, but not significantly worse:
twoLensNaive2 = np.array([-0.051553, -0.065844, -0.612783, 0.116099, -0.028148, -0.169299, 0.049512, 1.905477, 1.603818, 7.074075, 2.340491, 2.159303, 2.307213, 2.529498, 27.071832, 0.968882, 12.458521, 4.884523, 7.116204, 10.276534, 2.921531, 4.574736, 7.607359, 11.039444, 7.017050, 12.942378, 8.733590, -4.089224, 16.927030, 10.635423, 8.563132, 3.564145, 1.755013, 9.761919, 14.704044, 2.132546, -3.809558, 0.647797, 9.273659, 3.640461, 1.220848, 0.116608, 8.569018, 8.569018, 8.569018, 8.569018, 8.569018, 8.569018, 8.569018]).reshape((7,7))

oneLensDenser = np.array([-2.313951, 3.275988, -1.644593, 0.457744, 2.147303, -3.169811, -10.748788, 8.166339, 3.943775, 5.205276, 2.117024, 3.686519, 3.675942, 5.991241, -4.256257, 7.061367, 5.557434, 5.541006, 8.694494, 8.640133, 2.737254, 4.108934, 3.756802, 5.282122, 3.612750, 3.468806, 5.793371, 6.974820, 10.620995, 5.383104, 7.767305, 14.500105, 7.411558, 9.273295, -6.232036, 6.822856, 19.625258, 4.494330, 6.437424, 32.533403, 7.687231, 13.015959, -85.315085, 10.075143, -1.329189, -0.484777, 11.570230, 0.530341, 1.088226]).reshape((7,7))
# This one may not have converged yet:
oneLensDenserNaive = np.array([8.251714, 0.387290, -2.420769, -0.426543, 1.785727, -0.610097, 2.794295, 8.791857, 0.559260, 7.891976, 0.333506, 1.765845, 1.967897, 4.581975, -0.752497, 12.003070, 12.364237, 3.017486, 12.380995, 13.466941, -0.231960, -4.632797, 12.006906, 0.070485, 11.010837, 0.266687, 13.347549, -0.412399, 11.485217, 12.216079, 0.106411, 17.199250, 1.174043, 2.937269, -15.987668, 11.027095, 11.027095, 11.027095, 11.027095, 11.027095, 11.027095, 11.027095, 7.138206, 7.138206, 7.138206, 7.138206, 7.138206, 7.138206, 8.138206]).reshape((7,7))

vPoints = 7*(1-((np.arange(1,6)-3)**2/9.0))
x = np.arange(0,7,0.01)
xPoints = np.arange(1,6) * 30
vCurve = 7*(1-((x-3)**2/9))
plt.plot(x*30, vCurve, color='black', label='true')
errs = []
errsNaive = []
for n in range(1,6):
    l = None
    if n == 1:
        l = 'joint'
    plt.plot(xPoints, twoLens[1:6,n], 'x', color='green', label=l)
    if n == 1:
        l = 'naive'
    plt.plot(xPoints, twoLensNaive[1:6,n], '+', color='red', label=l)
    errs.append(twoLens[1:6,n]-vPoints)
    errsNaive.append(twoLensNaive[1:6,n]-vPoints)
errs = np.array(errs).flatten()
errsNaive = np.array(errsNaive).flatten()
print('joint stdev', np.std(errs))
print('naive stdev', np.std(errsNaive))
plt.xlim(0,6*30)
plt.ylim(0,15)
plt.xlabel('position (px)')
plt.ylabel('velocity (px)')
plt.legend()
plt.savefig('TwoLensParabola.png', dpi=200)
plt.show()

for n in range(7):
    plt.plot(np.arange(1,6), oneLensDenser[1:6,n], 'x', color='green')
    plt.plot(np.arange(1,6), oneLensDenserNaive[1:6,n], '+', color='red')    
plt.xlim(0,6)
plt.ylim(0,15)
plt.show()

ShowDualObjectAndFlow(dualObjectRecovered, twoLensNaive.reshape(49,1)*3, destFilename='NaiveSyntheticFlow.png')
ShowDualObjectAndFlow(dualObjectRecovered, twoLens.reshape(49,1)*3, destFilename='JointSyntheticFlow.png')
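
The comparison against the parabolic profile above can be reduced to a small standalone sketch. `parabolic_profile` and its parameter names are hypothetical; the defaults reproduce the `vCurve` expression, and the "measured" values are made up purely to show the error calculation.

```python
import numpy as np

def parabolic_profile(position, peak=7.0, centre=3.0, halfWidth=3.0):
    # Same functional form as vCurve above; parameter names are hypothetical
    return peak * (1 - ((position - centre)**2 / halfWidth**2))

controlPoints = np.arange(1, 6)
expected = parabolic_profile(controlPoints)

# Made-up measurements for illustration only (not results from the notebook)
measured = expected + np.array([0.5, -0.5, 0.0, 0.5, -0.5])
errors = measured - expected
print('expected', expected)
print('stdev of error', np.std(errors))
```

Note that `np.std` computes the population standard deviation (`ddof=0`) by default, which is also what the `joint stdev` and `naive stdev` printouts above report.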