# Analysis of DPPC X-ray Reflectometry Data

This example notebook shows how to use the SurfMono class, from the surf_monolayer module, to analyse surface-active molecules such as lipids.

In this example, the lipid DPPC is studied at the air-water interface. The X-ray reflectivity data were measured at Diamond Light Source and shared openly.

The SurfMono class constrains the model based on the following relationship. 

$$1-\phi_s = \frac{SLD_td_tb_h}{SLD_hd_hb_t}$$

where, $\phi_s$ is the fractional solvent volume in the head layer, $t$ and $h$ indicate tail or head layer, and $SLD$, $d$, and $b$, identify the scattering length density, thickness and scattering length respectively.
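
As a rough illustration of this relationship, the sketch below evaluates $\phi_s$ for a single set of layer parameters. The function name `head_solvent_fraction`, the head SLD and the other input values are assumptions made up for this example; they are not taken from SurfMono or from the fit.

```python
def head_solvent_fraction(sld_t, d_t, b_t, sld_h, d_h, b_h):
    """Return phi_s from 1 - phi_s = (SLD_t d_t b_h) / (SLD_h d_h b_t).

    Only ratios appear, so the two SLDs just need to share units,
    as do the two thicknesses and the two scattering lengths.
    """
    return 1.0 - (sld_t * d_t * b_h) / (sld_h * d_h * b_t)


# Hypothetical values, chosen only to exercise the formula
phi_s = head_solvent_fraction(sld_t=8.0, d_t=16.0, b_t=6897e-6,
                              sld_h=14.0, d_h=12.0, b_h=4674e-6)
print('phi_s = {:.3f}'.format(phi_s))
```

A physically sensible result lies between 0 and 1; values outside that range suggest that the chosen thicknesses and SLDs are not compatible with the constraint.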

This example shows only one contrast. However, the same class can also analyse multiple contrasts, with the constraint that they all share the same underlying model (e.g. the surface excess should be the same).

In [None]:
# Standard libraries to import
# (a __future__ import has to be the first statement in the cell)
from __future__ import division
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import rcParams, rc
from scipy.stats import pearsonr, norm
from IPython.display import Markdown as md

# The refnx library, and associated classes
import refnx
from refnx.reflect import structure, ReflectModel, Structure
from refnx.dataset import ReflectDataset
from refnx.analysis import Transform, CurveFitter, Objective

# The SurfMono class to constrain the monolayer model.
import surf_monolayer as sm

The DPPC monolayer model is built below. The SurfMono class constrains the number density of the lipid heads and tails such that they are held constant throughout the fitting process. This allows the volume fraction of solvent in the head region to be greater than 0, while keeping the volume fraction of solvent in the tail region at 0 throughout.

In [None]:
# Reading dataset into refnx format
dataset = ReflectDataset('dppc_water_xrr.dat')

# Scattering length of the lipid head group 
# (found from summing the electrons in the head group 
# and multiplying by the classical radius of an electron)
head_sl = [4674e-6]
# Scattering length of the lipid tail group 
tail_sl = [6897e-6]
# Solvent SLD
solvent_sld = [9.45]
# SLD of air 
super_sld = [0]
# Some initial values for the head and tail thicknesses & APM
initial_thicknesses = [12.5, 16.6]
initial_apm = 69.

# Calling the SurfMono class to initialise it
dppc = sm.SurfMono(head_sl, tail_sl, solvent_sld, super_sld, 
                   initial_thicknesses, initial_apm, name='dppc')
# Getting the structure from the SurfMono class
dppc.get_structures()

# Creating a ReflectModel class object, and setting an initial scale
model_dppc = ReflectModel(dppc.structures['con0'])
model_dppc.scale.setp(vary=True, bounds=(0.005, 10))
# The background is held constant at a value determined from a previous fit
model_dppc.bkg.setp(3.52703e-10, vary=False)

# The Objective object is created, and the data is transformed so that 
# the fitting is in rq4 space
objective = Objective(model_dppc, dataset, transform=Transform('YX4'))
# A differential evolution algorithm is used to obtain a best fit
fitter = CurveFitter(objective)
# Note: no seed is set here, so repeated runs may give slightly different results
res = fitter.fit('differential_evolution')

This is where the Markov Chain Monte Carlo (MCMC) sampling begins. This allows the parameter probability density functions to be determined. 
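
To give a feel for how MCMC turns a likelihood into samples from a probability density function, the sketch below runs a single-walker random-walk Metropolis sampler on a one-dimensional Gaussian target. This is a deliberately simpler stand-in and not the ensemble sampler that refnx uses; the target density, step size, seed and function names are all assumptions made for this illustration.

```python
import numpy as np


def log_prob(x, mu=3.0, sigma=0.5):
    """Log of an unnormalised Gaussian target density."""
    return -0.5 * ((x - mu) / sigma) ** 2


def metropolis(n_steps, x0=0.0, step=0.5, seed=0):
    """Random-walk Metropolis sampler for a single parameter."""
    rng = np.random.default_rng(seed)
    chain = np.empty(n_steps)
    x, lp = x0, log_prob(x0)
    for i in range(n_steps):
        proposal = x + step * rng.normal()
        lp_new = log_prob(proposal)
        # Accept with probability min(1, p_new / p_old)
        if np.log(rng.uniform()) < lp_new - lp:
            x, lp = proposal, lp_new
        chain[i] = x
    return chain


chain = metropolis(20000)
# Discard the early samples as burn-in before summarising the chain
samples = chain[1000:]
print('mean = {:.2f}, std = {:.2f}'.format(samples.mean(), samples.std()))
```

A histogram of `samples` approximates the target density, in the same way that the histograms of the flatchain columns approximate the parameter pdfs later in this notebook.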

In [None]:
# The first 200*200 samples are discarded as burn-in
fitter.sample(200)
fitter.sampler.reset()

In [None]:
# The collection is across 500*200 samples
# Note: no random_state is passed, so the chain is not exactly reproducible
res = fitter.sample(500, nthin=1)

The 1D probability density functions of each of the parameters are then plotted, and 2D pdfs are used to show the correlations that are present between the different parameters.

In [None]:
print(objective)

In [None]:
# Store the current parameter values so they can be restored after plotting
saved_params = np.array(objective.parameters)
# list() keeps the draws available for reuse (pgen may return a one-shot generator)
choose = list(objective.pgen(ngen=50))
for pvec in choose:
    objective.setp(pvec)
    calc = model_dppc(dataset.x, x_err=dataset.x_err) * np.power(dataset.x, 4)
    plt.plot(dataset.x, calc, color='k', linewidth=1, alpha=0.05)
# Restore the stored parameter values before plotting the model curve
objective.setp(saved_params)
data = dataset.y * np.power(dataset.x, 4)
data_err = dataset.y_err * np.power(dataset.x, 4)
plt.errorbar(dataset.x, data, yerr=data_err, linestyle='', marker='x', markersize=5,
             markeredgecolor='k', markerfacecolor='none', ecolor='k')
plt.plot(dataset.x, model_dppc(dataset.x, x_err=dataset.x_err)*dataset.x**4, color='b',
         linewidth=2)
plt.ylabel('$Rq^4$/Å$^{-4}$')
plt.yscale('log')
plt.xlabel('$q$/Å$^{-1}$')
plt.tight_layout()
plt.show()
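
The $Rq^4$ representation used in the plot above multiplies the reflectivity by $q^4$, which removes the steep $q^{-4}$ decay and makes the features of the curve easier to see. A minimal sketch of that scaling is given below; the helper name `to_rq4` and the synthetic data are assumptions for illustration, and the uncertainties are scaled by the same factor as the reflectivity.

```python
import numpy as np


def to_rq4(q, r, r_err=None):
    """Scale reflectivity (and optionally its uncertainty) by q**4."""
    q4 = np.power(q, 4)
    if r_err is None:
        return r * q4
    return r * q4, r_err * q4


# Synthetic data following an idealised R ~ q**-4 decay
q = np.linspace(0.05, 0.5, 10)
r = 1e-8 / q**4
r_err = 0.1 * r
rq4, rq4_err = to_rq4(q, r, r_err)
print(rq4)
```

For this idealised curve the scaled reflectivity is flat, so any structure left in real $Rq^4$ data reflects the layer model rather than the overall decay.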

In [None]:
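# Reset to the stored parameter values for the reference profile
objective.setp(saved_params)
z, true_sld = dppc.structures['con0'].sld_profile()
# Draw a fresh set of posterior samples for the SLD profiles
for pvec in objective.pgen(ngen=50):
    objective.setp(pvec)
    zs, sld = dppc.structures['con0'].sld_profile()
    plt.plot(zs, sld, color='k', linewidth=1, alpha=0.05)
# Restore the stored parameter values after plotting the samples
objective.setp(saved_params)
plt.plot(z, true_sld, color='b', linewidth=2)
plt.xlabel('$z$/Å')
plt.ylabel('SLD/$10^{-6}Å^{-2}$')
plt.show()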

In [None]:
import corner

# Raw strings avoid invalid escape sequences such as '\s' and '\p'
variables = ['Scale', '$d_t$', '$SLD_t$', r'$\sigma_t$', '$d_h$', r'$\sigma_h$', r'$\phi_s$']

fig = corner.corner(fitter.sampler.flatchain, labels=variables)
fig.show()
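
The strength of a pairwise correlation seen in a corner plot can be quantified with the Pearson correlation coefficient. The sketch below does this with `scipy.stats.pearsonr` on two synthetic columns; the arrays `d_h_fake` and `phi_s_fake` are made-up stand-ins for flatchain columns, not results from this fit.

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(1)
# Synthetic stand-ins for two flatchain columns: the second depends on the first
d_h_fake = rng.normal(12.0, 1.0, size=5000)
phi_s_fake = 0.05 * d_h_fake + rng.normal(0.0, 0.02, size=5000)

# r close to +1 or -1 indicates a strong linear correlation
r, p_value = pearsonr(d_h_fake, phi_s_fake)
print('Pearson r = {:.2f}'.format(r))
```

With the real chain, the same call could be applied to any two columns of `fitter.sampler.flatchain`.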

There is a clear correlation between the thickness of the head layer and the fraction of solvent in this layer. As a result, it is useful to consider the SLD of the solvated head layer, SLD$_\text{sh}$, which is defined as follows:

$$ \text{SLD}_\text{sh} = \frac{\text{SLD}_\text{t}d_\text{t}b_\text{h}}{d_\text{h}b_\text{t}} + \text{SLD}_\text{s}\phi_\text{s}. $$

The first task is to unpack the flatchain.

In [None]:
# tail thickness
d_t = fitter.sampler.flatchain[:,1]
# tail sld
sld_t = fitter.sampler.flatchain[:,2]
# head thickness
d_h = fitter.sampler.flatchain[:,4]
# fractional solvation 
phi_s = fitter.sampler.flatchain[:,6]

The SLD$_\text{sh}$ can then be calculated.

In [None]:
# head-layer sld calculation
# (the solvent term uses the solvent SLD defined when the model was built)
sld_sh = (sld_t * d_t * head_sl[0] / (d_h * tail_sl[0])) + (solvent_sld[0] * phi_s)

# building the new chain to plot
compare = np.zeros((len(fitter.sampler.flatchain), 4))
compare[:,0] = sld_t
compare[:,1] = d_t
compare[:,2] = sld_sh
compare[:,3] = d_h

# plotting and showing the corner plot
variables=['$SLD_t$', '$d_t$', '$SLD_{sh}$', '$d_h$']
fig = corner.corner(compare, show_titles=False, labels=variables, bins=20)
fig.show()
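
The head-layer SLD calculation above can also be written as a small standalone function, which makes the limiting cases easy to check. The function name `solvated_head_sld` and the three sample values are assumptions made for this sketch; they stand in for flatchain columns and are not fitted values.

```python
import numpy as np


def solvated_head_sld(sld_t, d_t, d_h, phi_s, b_h, b_t, sld_s):
    """Return SLD_sh = SLD_t d_t b_h / (d_h b_t) + SLD_s phi_s.

    All SLDs are assumed to share the same units (e.g. 10^-6 Å^-2).
    """
    sld_t, d_t, d_h, phi_s = map(np.asarray, (sld_t, d_t, d_h, phi_s))
    return sld_t * d_t * b_h / (d_h * b_t) + sld_s * phi_s


# Three hypothetical samples standing in for flatchain columns
sld_sh_fake = solvated_head_sld(sld_t=[8.0, 8.2, 8.4], d_t=[16.0, 15.8, 15.6],
                                d_h=[12.0, 12.5, 13.0], phi_s=[0.40, 0.45, 0.50],
                                b_h=4674e-6, b_t=6897e-6, sld_s=9.45)
print(sld_sh_fake)
```

With no solvent, equal thicknesses and equal scattering lengths, the function returns the tail SLD unchanged, which is a quick sanity check on the expression.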

The area per molecule (APM) can be determined from SLD$_\text{t}$ and $d_\text{t}$ by the following relationship:

$$ \text{APM} = \frac{b_\text{t}}{\text{SLD}_\text{t}d_\text{t}}$$

The APM can be found for each of the MCMC samples, which gives its probability density function.

The average and standard deviation are then found by fitting a Gaussian function to the pdf.
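
The sketch below walks through this calculation on synthetic samples, including the unit conversion for the SLD. The arrays `sld_t_fake` and `d_t_fake` are made-up stand-ins for the flatchain columns, so the numbers it prints are illustrative only.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
# Synthetic stand-ins for the tail SLD (10^-6 Å^-2) and thickness (Å) samples
sld_t_fake = rng.normal(8.5, 0.1, size=10000)
d_t_fake = rng.normal(16.0, 0.2, size=10000)
b_t = 6897e-6  # tail scattering length in Å, as defined earlier in the notebook

# The factor of 1e-6 converts the SLD to Å^-2 so that the APM is in Å^2
apm_fake = b_t / (sld_t_fake * 1e-6 * d_t_fake)

# For a Gaussian, the maximum-likelihood fit is the sample mean and
# the population standard deviation (ddof=0)
mu, sigma = norm.fit(apm_fake)
print('APM = {:.2f} +/- {:.2f} Å^2'.format(mu, sigma))
```

Because `norm.fit` reduces to the sample mean and standard deviation, the quoted uncertainty is only a good summary when the histogram is close to Gaussian.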

In [None]:
# calculation of apm
apm = tail_sl[0]/(sld_t*1e-6*d_t)

# plotting of apm pdf
# density=True replaces the normed argument, which was removed from matplotlib
plt.hist(apm, bins=20, histtype='step', color='black', lw=1, density=True)
plt.xlabel(r'APM/Å$\mathregular{^{2}}$')
plt.ylabel('prob(APM|{data},I)')
plt.show()

# fitting of a Gaussian to the pdf and printing the average and standard deviation of the APM
mu_apm, sigma_apm = norm.fit(apm)
md('APM = {:.3f}+/-{:.3f} Å$^2$'.format(mu_apm, sigma_apm))

The volume of the head group (V$_\text{h}$) follows from the constraint given at the start of this notebook. Substituting $\text{SLD}_\text{h} = b_\text{h}/\text{V}_\text{h}$ into that constraint and rearranging gives:

$$ \text{V}_\text{h} = \frac{d_\text{h}b_\text{t}(1-\phi_\text{s})}{\text{SLD}_\text{t}d_\text{t}}$$

V$_\text{h}$ can be found for each of the MCMC samples, which gives its probability density function.

The average and standard deviation are then found by fitting a Gaussian function to the pdf.

In [None]:
# Calculation of Vh (sld_t is converted from 10^-6 Å^-2 to Å^-2)
vh = d_h*tail_sl[0]*(1-phi_s) / (sld_t*1e-6*d_t)

# plotting of Vh pdf
# density=True replaces the normed argument, which was removed from matplotlib
plt.hist(vh, bins=20, histtype='step', color='black', lw=1, density=True)
plt.xlabel(r'V$\mathregular{_h}$/Å$\mathregular{^{3}}$')
plt.ylabel(r'prob(V$\mathregular{_h}$|{data},I)')
plt.show()

# fitting of a Gaussian to the pdf and printing the average and standard deviation of the Vh
mu_vh, sigma_vh = norm.fit(vh)
md('$V_h$ = {:.1f}+/-{:.1f} Å$^3$'.format(mu_vh, sigma_vh))

The volume of the tail group (V$_\text{t}$) can be determined from the following relationship:

$$ \text{V}_\text{t} =  \frac{\text{b}_\text{t}}{\text{SLD}_\text{t}}$$

V$_\text{t}$ can be found for each of the MCMC samples, which gives its probability density function.

The average and standard deviation are then found by fitting a Gaussian function to the pdf.

In [None]:
# calculation of Vt
vt = tail_sl[0]/(sld_t*1e-6)

# plotting of Vt pdf
# density=True replaces the normed argument, which was removed from matplotlib
plt.hist(vt, bins=20, histtype='step', color='black', lw=1, density=True)
plt.xlabel(r'V$\mathregular{_t}$/Å$\mathregular{^{3}}$')
plt.ylabel(r'prob(V$\mathregular{_t}$|{data},I)')
plt.show()

# fitting of a Gaussian to the pdf and printing the average and standard deviation of the Vt
mu_vt, sigma_vt = norm.fit(vt)
md('$V_t$ = {:.1f}+/-{:.1f} Å$^3$'.format(mu_vt, sigma_vt))

Finally, the volume of the whole lipid (V$_\text{l}$) can be determined as the sum of the head and tail volumes.

V$_\text{l}$ can be found for each of the MCMC samples, which gives its probability density function.

The average and standard deviation are then found by fitting a Gaussian function to the pdf.

In [None]:
# calculation of Vl
vl = vt + vh

# plotting of Vl pdf
# density=True replaces the normed argument, which was removed from matplotlib
plt.hist(vl, bins=20, histtype='step', color='black', lw=1, density=True)
plt.xlabel(r'V$\mathregular{_l}$/Å$\mathregular{^{3}}$')
plt.ylabel(r'prob(V$\mathregular{_l}$|{data},I)')
plt.show()

# fitting of a Gaussian to the pdf and printing the average and standard deviation of the Vl
mu_vl, sigma_vl = norm.fit(vl)
md('$V_l$ = {:.1f}+/-{:.1f} Å$^3$'.format(mu_vl, sigma_vl))