# RadialAnalyzer.ipynb
This notebook will walk you through the basics of time-resolved analysis of X-ray scattering data at the CXI instrument at LCLS. It uses the preprocessed data generated for each shot over many runs, which has been stored in a `results` folder somewhere. Because it relies only on the preprocessed data, it can be used independently of the LCLS servers, assuming you have the data saved somewhere locally.

This preprocessed data consists of individual shot readouts of many of the instruments available at CXI, including X-ray beam parameters, laser intensities, timetool correction values, and downsampled detector images. The steps in this notebook walk you through how we examine this data to monitor the health of the machine, reject outliers, and normalize and timebin the data to look for pump-probe signal.

# Import Libraries

In [None]:
# Magic iPython command to enable plotting
%matplotlib inline

experiment='cxilu9218'
pullDataFromUser='igabalsk' 
RESULTSPATH=('/cds/data/psdm/%s/%s/results/%s' % (experiment[0:3],experiment,pullDataFromUser)).strip()

from scipy.ndimage import gaussian_filter1d
import numpy as np
import matplotlib.pyplot as plt

# Load point data from pkl (preferred)

In [None]:
import os
import pickle
import h5py

def load_obj(filename, extension='.pkl'):
    '''Load one results file. `filename` excludes the extension; '.h5' files are read into a dict of arrays.'''
    try:
        if extension=='.h5':
            output = {}
            with h5py.File(filename + '.h5','r') as f:
                for key in f.keys():
                    output[key] = f[key][()]  # read the whole dataset into memory
            print(filename+" remembered!")
            return output
        else:
            with open(filename + '.pkl', 'rb') as f:
                print(filename+" remembered!")
                return pickle.load(f)
    except IOError:
        print("IOError: Did you load the correct file? %s" % filename)
        raise

def save_obj(obj, filename ):
    with open(filename + '.pkl', 'wb') as f:
        pickle.dump(obj, f, pickle.HIGHEST_PROTOCOL)

# These keys will be omitted from run combination and only read in for first runNumber
exclude_keys = ['Qs','phis','bin_sizes','qphirois']

def combineRuns(runNumbers, path=RESULTSPATH, prefix='all',extension='.pkl'):
    '''
    runNumbers : list of run numbers to combine
    path : directory that contains point data .pkl files
    prefix : one of ['all','filtered']. Choose 'all' for qphirois, 'filtered' for radial rois only
    '''
    detArrays = {}
    for idx,run in enumerate(runNumbers):
        if idx == 0:
            detArrays = load_obj(path+'/%sData-run-%d' % (prefix,run),extension=extension)
        else:
            try:
                detArrays0 = load_obj(path+'/%sData-run-%d' % (prefix,run),extension=extension)
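                # Iterate over a copy of the keys, since keys may be dropped inside the loop
                for key in list(detArrays.keys()):
                    if key in exclude_keys:
                        continue
                    try:
                        # Append this run's shots to the running total (shots are along axis 0)
                        detArrays[key] = np.append( detArrays[key], detArrays0[key], axis=0 )
                    except KeyError:
                        print('Dropping key %s since it is not in run %d' % (key,run))
                        detArrays.pop(key, None)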
            except IOError as ioe:
                print(str(ioe))
                continue
    return detArrays

In [None]:
runNumbers = [45,46,47,48,49,50,51,52,53,54,55]
runNumbersRange = '[%d - %d]' % (min(runNumbers),max(runNumbers))

detArrays = combineRuns(runNumbers,prefix='filtered',extension='.h5')

# TASK 0: Taking a first look at the data
The `combineRuns()` function returns a Python dictionary object. Dictionary objects contain keys (which can be strings or any other hashable object) and values (which can be single numbers or entire arrays, or any other Python object). For us, the `detArrays` dictionary has just strings for keys which specify the name of the datasets and Numpy arrays for the values. To get a sense of the types of data available to us, let's take a look at the names of the datasets and the shapes of the Numpy arrays in `detArrays`.
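To make the shapes concrete, here is a minimal sketch of what `combineRuns()` does, using two made-up "runs" instead of real files (the arrays are placeholders, not real data): every dataset keeps shots along axis 0, so appending runs along axis 0 adds up their shot counts.

```python
import numpy as np

# Two hypothetical runs: dicts mapping dataset names to arrays with shots along axis 0
run_a = {'xrayEnergy': np.ones(1000), 'rois': np.ones((1000, 100))}
run_b = {'xrayEnergy': np.ones(500), 'rois': np.ones((500, 100))}

# Append each dataset along the shot axis, as combineRuns() does with np.append(..., axis=0)
combined = {}
for key in run_a:
    combined[key] = np.append(run_a[key], run_b[key], axis=0)

n_shots = combined['xrayEnergy'].shape[0]   # 1000 + 500 shots
n_qbins = combined['rois'].shape[1]         # per-shot data (here, 100 Q bins) is unchanged
```

Scalar point data such as `'xrayEnergy'` are 1D, while per-shot vectors such as `'rois'` are 2D with shots as the first dimension.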

**TASK:** Write a piece of code that prints out all of the keys in `detArrays` and the shape of the array associated with each key. Based on the output, figure out how many shots are in the combined dataset.

In [None]:
# Your code here





# EXAMPLE: Histogram X-ray energies
On each LCLS shot, a whole host of values are read out from the various instruments and saved to disk. The data from instruments that read out a single value on each shot are referred to as "point data." We monitor these point data during beamtime to track the health of the machine. Mostly this consists of histogramming the point data values for individual runs and comparing the distribution of values to an expected distribution.

Here, let's look at one of the most important point data: the pulse energy of the X-ray beam. The `detArrays` key for this is `'xrayEnergy'`, and the value is read out in millijoules (mJ).
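One simple way to compare a run against expectations, sketched here on synthetic numbers rather than real data, is to reduce each run's point data to a few robust statistics (NaN fraction, median, and percentiles) that can be tracked from run to run. The function name and the choice of statistics are our own illustration, not part of the beamtime tooling.

```python
import numpy as np

def summarize_point_data(values):
    '''Robust summary of one point-data channel; NaN readouts are counted, then ignored.'''
    finite = values[np.isfinite(values)]
    return {
        'nan_fraction': 1.0 - finite.size / float(values.size),
        'median': np.median(finite),
        'p05': np.percentile(finite, 5),
        'p95': np.percentile(finite, 95),
    }

# Synthetic pulse energies in mJ (made up), with every 100th readout missing
rng = np.random.RandomState(0)
energies = rng.normal(1.0, 0.15, size=10000)
energies[::100] = np.nan

stats = summarize_point_data(energies)
```

Computing this per run and plotting, say, the median against run number is a quick way to spot drifts that a single combined histogram can hide.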

In [None]:
plt.figure()
plt.hist(detArrays['xrayEnergy'][~np.isnan(detArrays['xrayEnergy'])],bins=100)
plt.xlabel('xrayEnergy (mJ)')
plt.ylabel('Counts')
plt.title('Runs %s' % runNumbersRange)
plt.show()

# EXAMPLE: Radial rois vs. Q
Now let's take a look at the actual detector images. We downsample the (large) detector images by identifying Regions Of Interest (ROIs) and averaging the pixel values in each ROI on each shot. Here, we have used 100 bins in $Q$, such that the $i^{th}$ ROI is the average of all pixels that fall between $Q_i$ and $Q_i+dQ$.
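A minimal sketch of this downsampling, assuming you already have a per-pixel $Q$ map (the toy map below is a made-up radial ramp, not the real CXI detector geometry): each pixel is assigned to a $Q$ bin with `np.digitize`, and `np.bincount` sums the pixel values and pixel counts per bin.

```python
import numpy as np

def radial_rois(image, q_map, q_edges):
    '''Average the pixels of `image` in each Q bin; bin i covers q_edges[i] <= Q < q_edges[i+1].'''
    nbins = q_edges.size - 1
    idx = np.digitize(q_map.ravel(), q_edges) - 1          # Q-bin index of every pixel
    inside = (idx >= 0) & (idx < nbins)                    # drop pixels outside the Q range
    sums = np.bincount(idx[inside], weights=image.ravel()[inside], minlength=nbins)
    npix = np.bincount(idx[inside], minlength=nbins)
    with np.errstate(invalid='ignore', divide='ignore'):
        return sums / npix                                  # NaN where a bin has no pixels

# Toy detector: Q grows linearly with distance from the center (hypothetical scale)
y, x = np.mgrid[-50:50, -50:50]
q_map = 0.05 * np.hypot(x, y)
image = np.exp(-q_map)                     # made-up scattering pattern
q_edges = np.linspace(0.0, 2.5, 101)       # 100 Q bins
rois = radial_rois(image, q_map, q_edges)
```

Near the center, a few bins contain no pixels and come out as NaN, which is one reason the notebook uses `np.nanmean` and friends.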

To verify that we're looking at the molecule we think we are, let's plot the average scattering pattern on the detector as a function of $Q$ and compare it to the signal we expect from theory.
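The comparison in the cells below boils down to two steps: interpolating the theory onto the experimental $Q$ grid with `np.interp`, and scaling both curves to unit mean over $2 < Q < 2.5$ so that only their shapes are compared. Here is the same idea on made-up curves that differ only by an overall scale, so after normalization their ratio should be 1.

```python
import numpy as np

def normalize_in_window(q, y, qmin=2.0, qmax=2.5):
    '''Scale y to unit mean over qmin < q < qmax (the same window normalizedPlot() uses).'''
    window = (q > qmin) & (q < qmax)
    return y / np.nanmean(y[window])

# Made-up curves: "theory" on a fine grid, "experiment" on 100 bins with an arbitrary scale
q_theory = np.linspace(0.1, 5.0, 1000)
i_theory = 1.0 / (1.0 + q_theory**2)
q_exp = np.linspace(0.5, 4.0, 100)
i_exp = 3.7 / (1.0 + q_exp**2)

# Put the theory on the experimental Q grid, then compare Q*I(Q) shapes
i_theory_on_exp = np.interp(q_exp, q_theory, i_theory)
ratio = normalize_in_window(q_exp, q_exp * i_exp) / normalize_in_window(q_exp, q_exp * i_theory_on_exp)
```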

In [None]:
Q = detArrays['Qs']
print(Q.shape)

plt.figure()
meanSig = np.nanmean(detArrays['rois'],axis=0)
plt.plot(Q,Q*meanSig)
plt.xlabel('Q')
plt.ylabel('Q * I(Q)')
plt.show()

### Compare to the theoretical signal

In [None]:
import h5py

def normalizedPlot(x,y,s=1,**kwargs):
    plt.plot(x, y/np.nanmean(y[(x>2)&(x<2.5)])*s,**kwargs)

def loadfile(filename):
    mol = {}
    with h5py.File(filename,'r') as f:
        for name, data in f.items():
            mol[name]=f[name][()]
    return mol

molecule = 'CS2'
mol = loadfile('/cds/home/i/igabalsk/xray/diffraction_simulation/isotropic_scattering_%s.h5' % molecule)

interpolatedMol = np.interp(Q, mol['QQ_1d'], mol['isotropic_scattering_1d'])

normalizedPlot(Q,Q*meanSig,s=1)
normalizedPlot(Q,Q*interpolatedMol)
plt.xlabel('Q')
plt.ylabel('Q * I(Q)')
plt.show()

You may notice an excess of scattering at high $Q$. Our working explanation is that this excess scattering into large angles comes from Compton scattering: our current theory includes only elastic Thomson scattering and neglects inelastic Compton scattering, which becomes more important as the X-ray energy increases.

# EXAMPLE: Histogram of timetool pixel position
The timetool is the instrument that gives you the time delay between the laser and X-rays on a shot-by-shot basis. In this instrument, the X-rays are overlapped with a broad chirped laser pulse and sent through a material that attenuates the laser. The arrival of the X-rays abruptly increases the absorption of the attenuator, and thus the back end of the laser pulse is attenuated more. Since the pulse is chirped, that means there will be an edge in the spectrum of the transmitted chirped pulse that can be measured in a spectrometer. The edge position in the spectrum is directly related to the arrival time of the X-rays relative to the laser pulse. 
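As a toy model of this measurement (not the LCLS timetool algorithm), assume the transmitted spectrum drops by a fixed fraction beyond the pixel where the X-rays arrived. The edge can then be located crudely as the steepest descent of a smoothed spectrum. All numbers below are made up.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def find_edge(spectrum, sigma=5.0):
    '''Crude edge estimate: the pixel where the smoothed spectrum falls most steeply.'''
    smoothed = gaussian_filter1d(np.asarray(spectrum, dtype=float), sigma)
    return int(np.argmin(np.gradient(smoothed)))   # most negative slope

# Toy transmitted spectrum: broad pulse envelope, 30% extra absorption from pixel 612 on
pixels = np.arange(1024)
envelope = np.exp(-((pixels - 512) / 300.0)**2)
transmission = np.where(pixels < 612, 1.0, 0.7)
rng = np.random.RandomState(1)
spectrum = envelope * transmission + rng.normal(0.0, 0.005, pixels.size)

edge = find_edge(spectrum)
```

If the X-rays arrive at a different time, the step moves to a different pixel, which is why the edge position encodes the relative arrival time.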

The online analysis of the timetool performs a fit for the edge location, width, and amplitude. These are all written out as point data, and we use them extensively both to diagnose problems during beamtime and to get sufficient time resolution in our post-analysis. Let's look at these timetool values.
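To illustrate what a fit for edge location, width, and amplitude means, here is a sketch that fits an error-function step to a synthetic spectrum with `scipy.optimize.curve_fit`. The model (a smooth step on a flat background, with the width expressed as the FWHM of the step's derivative) is our own assumption for illustration; the online LCLS analysis uses its own fitting procedure, which we don't reproduce here.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d
from scipy.optimize import curve_fit
from scipy.special import erf

def edge_model(x, position, fwhm, amplitude, offset):
    '''Downward step of height `amplitude`; its derivative is a Gaussian with the given FWHM.'''
    sigma = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    return offset - 0.5 * amplitude * (1.0 + erf((x - position) / (np.sqrt(2.0) * sigma)))

# Synthetic edge with made-up parameters: position (px), FWHM (px), amplitude, offset
x = np.arange(1024, dtype=float)
true_params = (480.0, 40.0, 0.2, 1.0)
rng = np.random.RandomState(2)
y = edge_model(x, *true_params) + rng.normal(0.0, 0.01, x.size)

# Nonlinear fits need a sensible starting point: take it from a smoothed copy of the data
smoothed = gaussian_filter1d(y, 10)
p0 = (float(np.argmin(np.gradient(smoothed))), 20.0, smoothed.max() - smoothed.min(), smoothed.max())
popt, pcov = curve_fit(edge_model, x, y, p0=p0)
position, fwhm, amplitude, offset = popt
perr = np.sqrt(np.diag(pcov))   # rough 1-sigma uncertainties of the fitted parameters
```

On a garbage spectrum, a fit like this still returns numbers, but the width and amplitude tend to come out unusual, which is what the cuts in the tasks below exploit.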

The first example is the position of the edge on the spectrometer detector. This is read out in raw pixels and must eventually be calibrated to convert to time delay. The `detArrays` key for this value is `'ttfltpos'`.

In [None]:
plt.figure()
plt.hist(detArrays['ttfltpos'][np.abs(detArrays['ttfltpos'])<2000],bins=2000,label='TT Position');
plt.xlim([0,1000])
plt.ylim([0,9e3])
plt.ylabel('counts');
plt.xlabel('pixel pos');
plt.title('Runs %s'%runNumbersRange)
plt.legend()
plt.show()

# TASK 1: Histogram of the FWHM of the fitted timetool edge
Note that in the above plot, there is a central peak but also some side lobes. These side lobes sometimes result from an X-ray shot being dropped or from some other malfunction in the apparatus. This causes the timetool to fit an edge location to a garbage spectrum. These bad shots must be filtered out.

In addition to fitting the edge position on the timetool, the algorithm also fits the edge width. This edge width is saved under the `'ttfltposfwhm'` key. An edge width that is out of the ordinary could indicate that the apparatus malfunctioned and the fit failed, so we can filter our shots based on this key.

**TASK:** Generate a histogram of the timetool edge width, and suggest a range of good values for the width.

In [None]:
# Your code here






# TASK 2: Histogram of the amplitude of the timetool signal
Similarly to the edge width, the edge amplitude can also be a useful indicator of whether or not the timetool fit was good. See if you can figure out which `detArrays` key corresponds to this dataset.

**TASK:** Generate a histogram of the timetool edge amplitude, and suggest a range of good values. (Hint: you may need to adjust the y-axis to see the structure of this plot).

In [None]:
# Your code here





# TASK 3: Laser on/laser off shots
We do not pump the sample with the laser on every shot. We mix in a certain fraction of "laser off" shots using a tool creatively named the "goose trigger" (think of the trigger going "duck, duck, duck, goose, ..." to the laser). These shots give us regular static detector images, so we can monitor whether the background or the conditions in the sample chamber are changing. We can then use these laser-off shots to subtract the static scattering pattern from the shots where the laser was on and the sample was dynamic.
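A minimal sketch of the laser-off subtraction on synthetic data. The one-in-five goose pattern, the static pattern, and the 0.4% pump effect are all made up: each shot is normalized by its total signal (as the notebook does later), the laser-off shots are averaged into a reference pattern, and the laser-on average is compared to it as $(I - I_{off})/I_{off}$.

```python
import numpy as np

# Made-up goose pattern: every 5th shot has the laser off
n_shots, n_q = 1000, 50
laser_on = (np.arange(n_shots) % 5) != 4

rng = np.random.RandomState(3)
static = np.linspace(2.0, 1.0, n_q)                               # made-up static pattern
rois = static * rng.normal(1.0, 0.01, (n_shots, n_q))             # 1% per-bin noise
pump_effect = 1.0 + 0.004 * np.sin(np.linspace(0.0, 3.0 * np.pi, n_q))
rois[laser_on] *= pump_effect                                      # made-up 0.4% change when pumped

# Normalize each shot by its total signal, then average laser-on and laser-off shots separately
normed = rois / rois.sum(axis=1, keepdims=True)
off_mean = normed[~laser_on].mean(axis=0)
on_mean = normed[laser_on].mean(axis=0)
rel_diff = (on_mean - off_mean) / off_mean                         # (I - I_off) / I_off
```

Because both averages are normalized the same way, any slow drift in the static background affects the on and off shots alike and largely cancels in the difference.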

The `detArrays` key that indicates whether the laser was on is `'laserOn'`. Other keys appear to record the same thing; this is an artifact of the beamtime, during which the laser-on event code was written to one address or another. The keys for the UV laser intensity are `'uvint0'` and `'uvint1'`, corresponding to two separate laser diodes.

**TASK:** Make a single plot that histograms the laser intensity values separately for laser on and laser off shots. Try it for each of the UV laser diodes. Does one of the diodes tell you which shots were on/off?

In [None]:
# Your code here





# TASK 4: Outlier rejection
As the last few tasks have illustrated, not every LCLS shot translates into a good data point for time-resolved analysis. Some shots purposely do not have a pump laser pulse, and some shots are untrustworthy due to random apparatus malfunctions. We need to pick out a subset of the total shots that we think contain good information based on the point data values. We can do this by building up a boolean index that indicates whether we want to keep a particular shot or throw it away.
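The pattern is to start from an all-`True` array and AND in one condition per point-data key. Here is a sketch on a tiny made-up dataset; the helper `range_mask()` and both ranges are hypothetical, chosen only for illustration and not as recommended cuts.

```python
import numpy as np

def range_mask(values, low, high):
    '''True where low < value < high; NaN readouts compare as False, so they are rejected.'''
    with np.errstate(invalid='ignore'):
        return (values > low) & (values < high)

# Made-up point data for 8 shots
data = {
    'xrayEnergy': np.array([0.9, 0.5, 1.1, np.nan, 1.0, 0.8, 1.2, 0.7]),
    'ttfltposfwhm': np.array([90., 95., 300., 100., 85., 98., 92., 5.]),
}

keep = np.ones(8, dtype=bool)
keep &= range_mask(data['xrayEnergy'], 0.6, np.inf)
print('after xrayEnergy:   %.3f of shots kept' % keep.mean())
keep &= range_mask(data['ttfltposfwhm'], 50.0, 150.0)   # hypothetical range
print('after ttfltposfwhm: %.3f of shots kept' % keep.mean())
```

Since `keep.mean()` on a boolean array is the fraction of `True` entries, printing it after each cut shows which filter removes the most shots.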

Let's filter on a list of point data values:
- X-ray Energy
- X-ray On
- Laser On
- Timetool Position
- Timetool FWHM
- Timetool Amplitude

We've already decided on some good ranges for most of these values. I'll get you started with the X-ray Energy and X-ray On filters, and you can go from there.

**TASK:** Using the starter code below, build a boolean index array called `goodIdx` that indicates the shots we want to keep. After each additional filter, print out the fraction of good shots still left.

In [None]:
# Starter code
goodIdx = ( detArrays['xrayEnergy']>.6 )
goodIdx = goodIdx & (detArrays['xrayOn'].astype(bool))

# Your code here






# CHECK: Histogram of the timetool positions after outlier rejection
Does this look like a reasonable distribution of timetool values, based on all the filtering you've done?

In [None]:
plt.figure()
plt.hist(detArrays['ttfltpos'][goodIdx],bins=500);
plt.xlim([0,2000])
plt.xlabel('fltpos')
plt.ylabel('counts')
plt.show()

# EXAMPLE: Timebinning the shots
Now that we have rejected outliers, we are ready to timebin our shots. This is the crucial step in any time-resolved analysis since it allows us to get good enough X-ray counting statistics at each time delay to infer the specific geometry of the molecule.

This part contains a fair amount of logic that is specific to the apparatus and the timetool calibration, so it'll just be an example. However, make sure you understand this part, and don't be afraid to ask lots of questions.

To summarize, two quantities together determine the shot-by-shot time delay: `'stageencoder'` and `'ttfltpos'`. We've already seen the timetool edge position earlier. The `'stageencoder'` value is the position, in mm, of a delay stage on the pump laser. A larger value means the laser pulse is delayed more with respect to the X-rays, which corresponds to a smaller pump-probe delay. The timetool has its own delay stage that keeps the timetool edge roughly centered on the spectrometer, so combining the delay from the stage with the delay measured by the timetool gives the shot-by-shot pump-probe delay.
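The nominal stage delay used in the next cell is $\Delta t = -2(x - x_0)/c$, where $x$ is the stage position, $x_0$ the nominal time zero, and $c = 3\times10^{-4}$ mm/fs. We read the factor of 2 as the pump beam traversing the stage displacement twice (as with a retroreflecting delay stage); that is our interpretation of the code, not something stated here. A quick numeric check: moving the stage by 0.15 mm changes the delay by 1 ps.

```python
import numpy as np

C_MM_PER_FS = 3e-4   # speed of light in mm/fs, the value used in the notebook
POS0 = 56.35         # nominal time zero (mm) from the next cell

def stage_to_delay_fs(pos_mm, pos0_mm=POS0):
    '''Nominal pump-probe delay (fs) from the laser delay-stage position (mm).'''
    return -2.0 * (np.asarray(pos_mm) - pos0_mm) / C_MM_PER_FS

# Time zero, stage moved 0.15 mm toward smaller values, and 0.15 mm toward larger values
delays = stage_to_delay_fs([56.35, 56.20, 56.50])
```

The minus sign encodes the statement above: a larger stage value delays the laser more and so gives a smaller (here negative) pump-probe delay.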

Let's first just look at the nominal time delays defined solely by the `'stageencoder'` values. 

In [None]:
pos = detArrays['stageencoder'][goodIdx]
goodRois = detArrays['rois'][goodIdx,:]
offIdx = ~detArrays['laserOn'].astype(bool)
offRois =  detArrays['rois'][offIdx,:]

pos0 = 56.35 # LU92 nominal time zero
posfs = -2*(pos-pos0) / (3e-4) # 3e-4 is speed of light in mm/fs
unpos = np.sort(np.unique( np.round(pos,decimals=2)))

plt.figure()
plt.plot(unpos)
plt.ylabel('unique delays (mm)')
plt.xlabel('idx')
plt.show()

# OPTIONAL: Weight individual shots by various quantities
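Here we have the opportunity to weight each shot by particular quantities such as the X-ray energy. For simplicity, we do not apply any such weighting for now. The only per-shot operation below is to normalize each shot's ROIs by that shot's total signal summed over all Q bins, which removes shot-to-shot fluctuations in the overall scattered intensity.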

In [None]:
def makeWeights(rois, goodIdx):
    '''Return the ROIs of the good shots, each normalized by that shot's total signal over all Q bins.'''
    goodrois = rois[goodIdx,:]
    gsum = np.nansum(goodrois, axis=-1)      # total signal on each shot
    return goodrois / gsum[:, np.newaxis]    # broadcast each shot's sum across its Q bins

weightMe = makeWeights(detArrays['rois'], goodIdx)

# EXAMPLE: Rebin the shot-by-shot data into time bins (without timetool correction)
We need to create timebins into which we can then place individual shots. As a first pass, let's ignore the timetool correction, and just use the nominal time delays based on delay stage position as the centers of our timebins.

Remember that earlier we created a list of unique delay stage positions by rounding the `'stageencoder'` values to two decimal places and finding the unique values. We can now place the bin edges at the midpoints between adjacent centers, with the outermost edges placed half a spacing beyond the first and last centers.
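For sorted centers, the midpoint construction can also be written without a loop. This is a sketch of a vectorized alternative (the name `bins_from_centers` is ours); it should give the same edges as `createBinsFromCenters()` for increasing centers, and like that function it needs at least two centers.

```python
import numpy as np

def bins_from_centers(centers):
    '''Edges at midpoints between sorted centers; outer edges mirror the first/last half-gaps.'''
    centers = np.asarray(centers, dtype=float)
    mid = 0.5 * (centers[1:] + centers[:-1])        # interior edges
    first = centers[0] - (mid[0] - centers[0])      # half a spacing below the first center
    last = centers[-1] + (centers[-1] - mid[-1])    # half a spacing above the last center
    return np.concatenate(([first], mid, [last]))

edges = bins_from_centers([56.0, 56.1, 56.3])
```

Note that the bins need not be equal width: each edge only depends on its two neighboring centers.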

Once we have the bins and weights, the `for` loop at the bottom timebins the shots. For each radial (or Q) bin, we make two histograms over the timebins: one that counts the shots in each timebin, and one in which each shot is weighted by its normalized signal in that Q bin. Dividing the weighted histogram by the counts gives the average signal per shot in each timebin.

Take some time to convince yourself that this indeed gives us the timebinned X-rays in each Q bin.
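One way to convince yourself: on made-up data, the weighted histogram divided by the counts matches the per-bin mean computed directly by selecting each bin's shots.

```python
import numpy as np

rng = np.random.RandomState(4)
delay = rng.uniform(0.0, 3.0, 1000)                   # made-up per-shot delays
signal = 2.0 * delay + rng.normal(0.0, 0.1, 1000)     # made-up per-shot signal in one Q bin
edges = np.array([0.0, 1.0, 2.0, 3.0])

counts, _ = np.histogram(delay, bins=edges)                    # shots per timebin
summed, _ = np.histogram(delay, bins=edges, weights=signal)    # summed signal per timebin
binned_mean = summed / counts                                   # average signal per shot

# The same averages computed directly by selecting each bin's shots
which = np.digitize(delay, edges) - 1
direct = np.array([signal[which == i].mean() for i in range(edges.size - 1)])
```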

In [None]:
def createBinsFromCenters(centers):
    bins = []
    nc = centers.size
    for idx,c in enumerate(centers):
        if idx == 0:
            dc = np.abs( c - centers[idx+1])/2.
            bins.append(c-dc)
            bins.append(c+dc)
        elif idx == nc-1:
            dc = np.abs( c - centers[idx-1])/2.
            bins.append(c+dc)
        else:
            dc = np.abs( c - centers[idx+1])/2.
            bins.append(c+dc)
    return np.array(bins)

# Find the bin centers and edges using the function above
bins = createBinsFromCenters( unpos )
centers = unpos
centersfs = -2*(np.array(centers)-pos0) / (3e-4) # same stage-to-delay conversion as posfs above
nb = bins.size
nr = goodRois.shape[1]

radialHist = np.zeros((nb-1,nr))
radialAvg = np.zeros((nb-1,nr))

# This is where the magic happens
counts,edges = np.histogram( pos,bins=bins)
for ir in range(nr):

    radialHist[:,ir],edges = np.histogram( pos,bins=bins, weights=weightMe[:,ir])
    radialAvg[:,ir] = radialHist[:,ir] / counts
    
# Plot the number of shots in each timebin    
plt.figure()
plt.plot(centers, counts,'.-')
plt.xlabel('delay pos')
plt.ylabel('counts')
plt.title('Counts in each timebin')
plt.show()

# EXAMPLE: Update cutoff to reflect bad points above
Note that some bins have very few counts in them. This comes from the fact that when the delay stage moves between positions, the X-rays are still firing. Since the instantaneous position of the delay stage is read out on every shot, this means that there will be timebins in between the delays we want that have very few shots. We need to get rid of these timebins to visualize our data.

Based on the plot above, any cutoff above 100 counts should do the trick. We have chosen 1000 as our cutoff for the number of shots in our timebin. Since we're still using relatively large timebins with no timetool correction, this just serves to get rid of the bad points mentioned above. However, once we start timetool correcting, the counts cutoff will become a meaningful decision we need to make.

In [None]:
cutoff = 1000
plot2d= (radialAvg)[counts>cutoff,:]

# EXAMPLE: Plotting the time-resolved signal
You've made it! We're ready to plot the difference signal between laser-off and laser-on shots.

We're doing a few things here:
- Gaussian filtering in Q to smooth out shot noise in Q bins
- Subtracting the average laser-off ("goose") pattern from each timebin and dividing by it, giving the relative difference $(I - I_{off})/I_{off}$
- Selecting a scale for our colorbar to estimate the percent difference (this helps us estimate the excitation fraction)
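To see what the Gaussian filter in $Q$ buys, here is a sketch on a made-up difference signal with synthetic per-bin noise, filtered with the same width (4 $Q$ bins) as `gf()` in the next cell. The filter suppresses bin-to-bin noise, at the cost of slightly blurring real features that are narrow in $Q$.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

rng = np.random.RandomState(5)
q = np.linspace(0.5, 4.0, 100)                    # 100 Q bins, as in this notebook
true_dI = 0.004 * np.sin(2.0 * q)                 # made-up smooth difference signal
noisy = true_dI + rng.normal(0.0, 0.002, q.size)  # made-up per-bin shot noise

smoothed = gaussian_filter1d(noisy, 4)            # same width as gf() below

# Root-mean-square deviation from the true signal before and after smoothing
rms_before = np.sqrt(np.mean((noisy - true_dI)**2))
rms_after = np.sqrt(np.mean((smoothed - true_dI)**2))
```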

Familiarize yourself with the mechanics here. We will do this again as an exercise with the timetool-corrected dataset, but the mechanics are otherwise the same.

In [None]:
plot2d= (radialAvg)[counts>cutoff,:]
rcent = centers[counts>cutoff]
rcentfs = -2*(rcent-pos0) / (3e-4)
roio = np.nansum(offRois,-1)
subAll = np.mean(((offRois.T)/(roio.T)).T,0) # goose subtraction

gf = lambda x: gaussian_filter1d(x,4,axis=-1)
plot2d = (gf(plot2d)-gf(subAll)) / gf(subAll)
plot2d = gaussian_filter1d(plot2d,1,axis=0)

dv = 4e-3
plt.figure(figsize=(10,8))
plt.pcolormesh(Q, rcentfs, plot2d, vmin = -dv, vmax = dv )
cbar = plt.colorbar()
cbar.ax.tick_params(length=5,width=2,labelsize=15)
plt.xlabel('Q (iA)',fontsize=20)
plt.ylabel('delay (fs)',fontsize=20)
plt.title('(I - I(off))/I(off)',fontsize=20)
ax = plt.gca()
ax.tick_params(axis='both',length=5,width=2,labelsize=15)
plt.show()

### Visualizing the same data as above, but now with each timebin getting its own line

In [None]:
fig = plt.figure(figsize=(9,6))
for idx,delay in enumerate(rcentfs):
    plt.plot(Q, plot2d[idx,:], label='%.2f ps'% (delay/1000),linewidth=2 )
plt.ylim([-.015,.015])
plt.xlabel('Q',fontsize=20)
plt.ylabel('dI/I',fontsize=20)
plt.title('Runs %s'%runNumbersRange,fontsize=20)
ax = plt.gca()
ax.tick_params(axis='both',length=5,width=2,labelsize=15)
plt.legend(fontsize=15,bbox_to_anchor=(1, 1), loc='upper left',ncol=2)
plt.show()

# EXAMPLE: Timetool correction
We've looked at timebins using the nominal delays between the pump and probe. However, the X-ray laser has significant jitter in its timing, so to get fine shot-by-shot timing we must use the timetool correction. We do this using the calibrated relationship between `'ttfltpos'` and the actual time delay.

We previously performed a quadratic fit to a calibration run, which gave the fit coefficients stored below in `ttpoly`. We now combine the nominal stage delays with the shot-by-shot timetool correction computed by the `ttcorr()` function below.
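`ttcorr()` evaluates the quadratic explicitly; NumPy's `np.polyval` takes the same highest-power-first coefficient list, so either works. A quick check with hypothetical edge positions, including the ns-to-fs conversion (1 ns = $10^6$ fs) used in the next cell:

```python
import numpy as np

ttpoly = [-9.36209506e-10, 3.76314033e-06, -1.63476074e-03]   # LU92 quadratic fit, from the next cell

def ttcorr(ttpos, ttpoly):
    return ttpoly[0]*ttpos**2 + ttpoly[1]*ttpos + ttpoly[2]

ttpos = np.array([200.0, 500.0, 800.0])        # hypothetical edge positions (pixels)
corr_explicit = ttcorr(ttpos, ttpoly) * 1.0e6  # calibration output is in ns; 1 ns = 1e6 fs
corr_polyval = np.polyval(ttpoly, ttpos) * 1.0e6
```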

In [None]:
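# Earlier linear calibrations, kept for reference only (ttcorr() below needs three coefficients):
# ttpoly = [2.95684259e-06, -1.43969413e-03] # LV11
# ttpoly = [2.95684259e-06, -1.43969413e-03] # LU92
ttpoly = [-9.36209506e-10,  3.76314033e-06, -1.63476074e-03] # LU92 quadratic fit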
def ttcorr(ttpos,ttpoly):
    return ttpoly[0]*ttpos**2+ttpoly[1]*ttpos+ttpoly[2] # quadratic fit to previous calibration

ttpos = detArrays['ttfltpos'][goodIdx]
truepos = -2*(pos-pos0) / (3e-4)  - ttcorr(ttpos,ttpoly)*1.0e6 # convert calibration from ns to fs

plt.figure(figsize=(9,6))
plt.hist(truepos,bins=1000)
ax = plt.gca()
ax.tick_params(axis='both',length=5,width=2,labelsize=15)
plt.xlabel('pump-probe delay (fs)',fontsize=20)
plt.ylabel('frames in timebin',fontsize=20)
plt.title('Day 1: Runs %s'%runNumbersRange,fontsize=20)
plt.show()

# TASK 5: Selecting finer timebins
Let's now select a set of much finer timebins that are evenly spaced in time. We can use much of the same infrastructure as when we were timebinning based on coarse nominal delays. 

**TASK:** Based on the histogram of counts in the very fine timebins in the plot above, choose a range of time delays and a timebin size. Implement this set of timebin centers as a Numpy array called `usecenters`.

In [None]:
usecenters =  # this is a Numpy array of timebin centers
bins = createBinsFromCenters( np.array(usecenters) )
centersfs = usecenters

nb=bins.size

radialHist = np.zeros((nb-1,nr))
radialAvg = np.zeros((nb-1,nr))
counts,edges = np.histogram( truepos,bins=bins)
for ir in range(nr):
    radialHist[:,ir],edges = np.histogram( truepos,bins=bins, weights=weightMe[:,ir])
    radialAvg[:,ir] = radialHist[:,ir] / counts

# TASK 6: Choosing a new counts cutoff
We now have MUCH finer timebins than we did before. Because of this, the number of counts in each timebin will be much lower. We need to choose a good counts cutoff to go with these new timebins.

There are two competing interests here:
- Keeping more timebins allows us to see longer time delays, and potentially more long-time physical processes
- Restricting ourselves to bins with many counts improves the shot noise in each bin, reducing our noise in the bins we do keep
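The second point is the usual shot-noise scaling: with $N$ shots in a timebin, the uncertainty of the bin's mean falls as $1/\sqrt{N}$, so quadrupling the counts only halves the noise. A quick simulation with a made-up per-shot scatter:

```python
import numpy as np

rng = np.random.RandomState(6)
per_shot_sigma = 0.05     # made-up shot-to-shot scatter of one normalized Q bin

bin_noise = {}
for n in (100, 400, 1600):
    # Simulate 2000 timebins with n shots each and measure the scatter of their means
    means = rng.normal(1.0, per_shot_sigma, size=(2000, n)).mean(axis=1)
    bin_noise[n] = means.std()
    print('%5d shots: %.5f (1/sqrt(N) predicts %.5f)' % (n, bin_noise[n], per_shot_sigma / np.sqrt(n)))
```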

It is up to us to make a judgement call on this. There are no rules written in stone here.

**TASK:** Choose a new cutoff that reflects your best guess based on the two competing interests mentioned above.

In [None]:
cutoff = 0 # Adjust this based on your best judgement
plt.figure(figsize=(9,6))
plt.plot(centersfs, counts,'.-',linewidth=2,markersize=10)
plt.ylim([cutoff,np.max(counts)+500])
plt.xlabel('binned delay (fs)',fontsize=20)
plt.ylabel('counts in bin',fontsize=20)
plt.title('Day 1: Runs %s'%runNumbersRange,fontsize=20)
ax = plt.gca()
ax.tick_params(axis='both',length=5,width=2,labelsize=15)
plt.show()

# TASK 7: Putting it all together
We are now ready to look at our timetool-corrected difference signal. 

**TASK:** Write code to plot the timetool-corrected difference signal using your fine timebins defined above.

*Hint: This code should be virtually identical to the code used to generate the difference signal plot using nominal timebins.*

In [None]:
# Your code here



