# Exercise: HPLC experiment

Credits: [ESRF/BM29 beamline](https://www.esrf.fr/home/UsersAndScience/Experiments/MX/About_our_beamlines/bm29.html)

## Introduction

Process data from a [High-performance Liquid Chromatography (HPLC)](https://en.wikipedia.org/wiki/High-performance_liquid_chromatography) experiment performed on [ESRF/BM29 BioSAXS beamline](https://www.esrf.fr/home/UsersAndScience/Experiments/MX/About_our_beamlines/bm29.html).

<div align="center"><img src="img/BM29_picture.jpg" width="40%" alt="BM29 picture" /><img src="img/BM29_setup.jpg" width="40%" alt="BM29 setup" /></div>

The sample is the [Bovine Serum Albumin (BSA)](https://en.wikipedia.org/wiki/Bovine_serum_albumin) protein, used here as a standard sample:

<img src="img/Bovine_serum_albumin_3v03_crystal_structure.jpg" width="100px" alt="BSA"/>

The buffer and sample are exposed to X-rays while passing through a capillary.
Images are recorded over time (400 in this experiment) and an azimuthal integration is performed for each image with [pyFAI](http://www.silx.org/doc/pyFAI/latest/).

<div align="center"><img src="img/saxs_setup.jpg" alt="SAXS setup"/>
<img src="img/azimuthal_integration.png" alt="Azimuthal integration" width="400px" /></div>


This results in 400 curves of integrated intensities **I** for 1000 values of **q**.
Those **I** values are stored as a 2D dataset of shape (400, 1000) in the `intensities.npy` file.
The **q** values are stored in the `q.txt` file.
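
As a minimal sketch of the two file formats involved, the following round-trips a small synthetic array through a `.npy` file and a text file. The file names, shapes and values are hypothetical stand-ins, not the experiment's data.

```python
import tempfile
from pathlib import Path

import numpy as np

# Synthetic stand-ins for the real files (hypothetical names and values)
demo_intensities = np.arange(12, dtype=float).reshape(3, 4)  # 3 curves, 4 q values
demo_q = np.linspace(0.1, 0.4, 4)

with tempfile.TemporaryDirectory() as tmp:
    npy_path = Path(tmp) / "demo_intensities.npy"
    txt_path = Path(tmp) / "demo_q.txt"

    np.save(npy_path, demo_intensities)  # binary NumPy format, keeps shape and dtype
    np.savetxt(txt_path, demo_q)         # plain text, one value per line

    loaded_intensities = np.load(npy_path)
    loaded_q = np.loadtxt(txt_path)

print(loaded_intensities.shape)  # (3, 4)
print(loaded_q.shape)            # (4,)
```

Checking `.shape` right after loading is a cheap way to confirm that rows correspond to curves and columns to **q** values.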

At first, only the buffer passes through the capillary, then sample + buffer, and finally buffer again.

The goal is to extract the intensity contributed by the sample.
The steps are:

1. Separate integrated intensities corresponding to sample + buffer from those corresponding to buffer only
2. Estimate the buffer and the sample + buffer intensities by averaging the selected integrated intensities
3. Remove the buffer background from sample + buffer
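
The steps above can be sketched on a tiny synthetic dataset where the answer is known in advance. All names and values below are made up for illustration, and the frame ranges are assumed to be known, which is not the case for the real data.

```python
import numpy as np

# Synthetic data: 6 frames x 4 q values, sample present in frames 2 and 3 only
demo_buffer_curve = np.array([10.0, 8.0, 6.0, 4.0])
demo_sample_curve = np.array([5.0, 3.0, 2.0, 1.0])
demo_frames = np.tile(demo_buffer_curve, (6, 1))
demo_frames[2:4] += demo_sample_curve

# Step 1: separate the frames (here the ranges are known by construction)
demo_buffer_frames = np.concatenate([demo_frames[:2], demo_frames[4:]])
demo_sample_buffer_frames = demo_frames[2:4]

# Step 2: average each group over its frames (one value per q)
demo_buffer_avg = demo_buffer_frames.mean(axis=0)
demo_sample_buffer_avg = demo_sample_buffer_frames.mean(axis=0)

# Step 3: remove the buffer background
demo_recovered = demo_sample_buffer_avg - demo_buffer_avg
print(demo_recovered)  # [5. 3. 2. 1.]
```

The subtraction recovers exactly the curve that was added to the sample frames, which is the behaviour expected from the real processing, up to noise.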

## Part I

In [None]:
import numpy as np

### Load data

Load intensities **I** from the `intensities.npy` file and **q** values from the `q.txt` file.

In [None]:
intensities = # TODO
q = # TODO

### Plot data

In [None]:
%matplotlib widget
# This requires ipympl
# Or for non-interactive plots: %matplotlib inline

from matplotlib import pyplot as plt

In [None]:
# Plot the intensities
import matplotlib.colors as colors

fig = plt.figure()
# Recent versions of matplotlib accept the norm as a string:
plt.imshow(intensities, norm="log", aspect="auto")
# With older versions, pass a norm object instead:
# plt.imshow(intensities, norm=colors.LogNorm(), aspect="auto")
plt.colorbar()
plt.xlabel('q index')
plt.ylabel('Curve index')
plt.title('Each row of this 2D array (image) is a 1D curve')


In [None]:
# Plot the curves 0 and 310
fig = plt.figure()
plt.plot(q, intensities[0], label="Curve #0 (first row of the intensities array) - no sample")
plt.plot(q, intensities[310], label="Curve #310 (311th row of the intensities array) - with sample")
plt.yscale("log")  # Use logarithmic scale for y axis
plt.xlabel("q")
plt.ylabel("Intensity")
plt.title("Intensity vs q")
plt.legend()

## Part II

### Average of all azimuthal integrations

Compute the average intensity over all curves in `intensities` for each value of `q`.
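
As a reminder of how the `axis` argument of NumPy reductions works, here is a small sketch on a made-up 2 x 3 array (names and values are illustrative only): reducing along `axis=0` collapses the rows, while `axis=1` collapses the columns.

```python
import numpy as np

demo = np.array([[1.0, 2.0, 3.0],
                 [3.0, 4.0, 1.0]])  # 2 rows (curves) x 3 columns (q values)

per_column = demo.mean(axis=0)  # collapses the rows: one value per column
per_row = demo.sum(axis=1)      # collapses the columns: one value per row

print(per_column)  # [2. 3. 2.]
print(per_row)     # [6. 8.]
```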

In [None]:
intensities_mean = # TODO

In [None]:
fig = plt.figure()
plt.plot(q, intensities_mean)
plt.xlabel("q")
plt.ylabel("I")
plt.yscale("log")
plt.title("Average intensity")  # Add a title to the plot

Note: This average is not meaningful because it mixes the buffer and sample + buffer curves, which should be separated first.

### Summed intensity of each azimuthal integration

Compute the sum of each row of the `intensities` data.

In [None]:
intensities_per_frame = # TODO

In [None]:
fig = plt.figure()
plt.plot(intensities_per_frame)
plt.xlabel("Frame ID")
plt.ylabel("I")

## Part III

### Separate sample + buffer from buffer only

Select buffer and sample + buffer intensities by using a threshold over `intensities_per_frame`.
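
Thresholding of this kind is usually done with a boolean mask. The sketch below uses made-up frames and a hypothetical threshold of 5.0; a suitable value for the real data has to be read off the plot of `intensities_per_frame`.

```python
import numpy as np

demo_frames = np.array([[1.0, 1.0],
                        [1.0, 2.0],
                        [5.0, 6.0],
                        [1.0, 1.5]])
demo_totals = demo_frames.sum(axis=1)  # one summed intensity per frame

demo_threshold = 5.0  # hypothetical value, chosen by inspecting demo_totals
low_mask = demo_totals < demo_threshold  # boolean array, one entry per frame

low_frames = demo_frames[low_mask]    # rows where the mask is True
high_frames = demo_frames[~low_mask]  # rows where the mask is False

print(low_frames.shape, high_frames.shape)  # (3, 2) (1, 2)
```

Indexing a 2D array with a 1D boolean mask selects whole rows, so each selected frame keeps all of its **q** values.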

In [None]:
buffer = # TODO

In [None]:
sample_buffer = # TODO

In [None]:
print("buffer shape:", buffer.shape, "sample_buffer shape:", sample_buffer.shape)

### Average sample + buffer and buffer intensities

Compute the average of azimuthal integrations of `buffer` for each `q`.

In [None]:
buffer_mean = # TODO

Do the same for `sample_buffer`.

In [None]:
sample_buffer_mean = # TODO

In [None]:
fig = plt.figure()
plt.plot(q, buffer_mean, "black", q, sample_buffer_mean, "red")
plt.title("buffer and sample + buffer average")
plt.xlabel("q")
plt.ylabel("I")
plt.yscale("log")

### Remove buffer background

Compute the difference between `sample_buffer_mean` and `buffer_mean`.

In [None]:
sample = # TODO

In [None]:
fig = plt.figure()
plt.plot(q, sample)
plt.yscale("log")

## Solution

<details><summary>...</summary>

    
```python
# Part I

import numpy as np

# Load data
intensities = np.load("intensities.npy")
q = np.loadtxt("q.txt")

# Part II

# Average of all azimuthal integrations
intensities_mean = np.mean(intensities, axis=0)

# Summed intensity of each azimuthal integration
intensities_per_frame = np.sum(intensities, axis=1)

# Part III

# Separate sample + buffer from buffer only
# Option 1: with thresholds on the summed intensity per frame
buffer_mask = intensities_per_frame < 32500
buffer = intensities[buffer_mask]
sample_buffer_mask = intensities_per_frame > 33000
sample_buffer = intensities[sample_buffer_mask]
# Option 2: with slicing on the frame index (overrides option 1; use one or the other)
buffer = intensities[:200]
sample_buffer = intensities[270:340]

# Average sample + buffer and buffer intensities
buffer_mean = np.mean(buffer, axis=0)
sample_buffer_mean = np.mean(sample_buffer, axis=0)

# Remove buffer background
sample = sample_buffer_mean - buffer_mean
```
    
</details>