# Getting started

## Fitting a neutron reflectometry dataset

We start off with all the relevant imports we'll need.

In [None]:
%matplotlib inline
from __future__ import print_function, division

import os.path
import numpy as np
import matplotlib.pyplot as plt
import scipy

import refnx
from refnx.dataset import ReflectDataset, Data1D
from refnx.analysis import Transform, CurveFitter, Objective, Model, Parameter
from refnx.reflect import SLD, Slab, ReflectModel

It's important to note down the versions of the software that you're using, in order for the analysis to be reproducible.

In [None]:
print('refnx: %s\nscipy: %s\nnumpy: %s' % (refnx.version.version, scipy.version.version, np.version.version))

The dataset we're going to use as an example is distributed with every install. The following cell determines its location.

In [None]:
pth = os.path.dirname(refnx.__file__)
DATASET_NAME = 'c_PLP0011859_q.txt'
file_path = os.path.join(pth, 'analysis', 'test', DATASET_NAME)

Now we load the dataset into a `ReflectDataset`.

In [None]:
data = ReflectDataset(file_path)

Now we create a series of `SLD` objects representing each of the materials.

In [None]:
si = SLD(2.07, name='Si')
sio2 = SLD(3.47, name='SiO2')
film = SLD(2.0, name='film')
d2o = SLD(6.36, name='d2o')

We create `Slab`s from these `SLD`s to represent each layer in the system.

In [None]:
# first number is thickness, second number is roughness
# a native oxide layer
sio2_layer = sio2(30, 3)

# the film of interest
film_layer = film(250, 3)

# layer for the solvent
d2o_layer = d2o(0, 3)

Now we specify which parameters are going to vary in a fit, and what the limits are on those parameters:

In [None]:
sio2_layer.thick.setp(bounds=(15, 50), vary=True)
sio2_layer.rough.setp(bounds=(1, 15), vary=True)

film_layer.thick.setp(bounds=(200, 300), vary=True)
film_layer.sld.real.setp(bounds=(0.1, 3), vary=True)
film_layer.rough.setp(bounds=(1, 15), vary=True)

d2o_layer.rough.setp(vary=True, bounds=(1, 15))

A `Structure` is composed from a series of `Component`s. In this case all the components are `Slab`s.

In [None]:
structure = si | sio2_layer | film_layer | d2o_layer

A `Slab` has the following parameters, which are all accessible as attributes:

 - `Slab.thick`
 - `Slab.sld.real`
 - `Slab.sld.imag`
 - `Slab.rough`

In [None]:
print(sio2_layer.parameters)

A `ReflectModel` is made from the `Structure`. `ReflectModel` performs resolution smearing, applies a scaling factor and adds a Q-independent constant background. The `ReflectModel` is responsible for calculating the generative model.

In [None]:
model = ReflectModel(structure, bkg=3e-6)
model.scale.setp(bounds=(0.6, 1.2), vary=True)
model.bkg.setp(bounds=(1e-9, 9e-6), vary=True)
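
To make the roles of the scale factor and the constant background concrete, here is a minimal sketch that applies them to the Fresnel reflectivity of a single sharp Si/D2O interface. It is not how `ReflectModel` is implemented: it ignores the oxide and film layers, roughness and resolution smearing, and the function names `fresnel_reflectivity` and `scaled_reflectivity` are hypothetical.

```python
import cmath
import math


def fresnel_reflectivity(q, sld_fronting, sld_backing):
    """Reflectivity of one sharp interface.

    q is in 1/Angstrom, SLDs are in units of 1e-6 per square Angstrom.
    """
    kz_front = q / 2.0
    delta_sld = (sld_backing - sld_fronting) * 1e-6
    # wavevector in the backing medium; it is imaginary below the critical edge
    kz_back = cmath.sqrt(kz_front ** 2 - 4.0 * math.pi * delta_sld)
    r = (kz_front - kz_back) / (kz_front + kz_back)
    return abs(r) ** 2


def scaled_reflectivity(q, scale=1.0, bkg=0.0):
    # Si fronting and D2O backing, using the SLD values from this tutorial
    return scale * fresnel_reflectivity(q, 2.07, 6.36) + bkg


for q in (0.01, 0.05, 0.2):
    print(q, scaled_reflectivity(q, scale=1.0, bkg=3e-6))
```

Below the critical edge the interface reflects totally, so the scale factor sets the plateau height. At high Q the reflectivity falls to the order of the background, which is why the two parameters are usually allowed to vary.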

An `Objective` is made from a `Model` and a dataset (here the `ReflectDataset` loaded earlier). Here we use a `Transform` to fit as logY vs X.

In [None]:
objective = Objective(model, data, transform=Transform('logY'))
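
As an illustration of what fitting as logY vs X involves, the sketch below takes the base-10 logarithm of each reflectivity value and propagates its uncertainty to first order. This is standard error propagation; treating it as what `Transform('logY')` does internally is an assumption, and `log_y_transform` is a hypothetical helper.

```python
import math


def log_y_transform(y, y_err):
    """Return (log10(y), propagated uncertainty) for one data point."""
    y_t = math.log10(y)
    # d(log10 y)/dy = 1 / (y ln 10), so the uncertainty scales by that factor
    err_t = abs(y_err / y) / math.log(10)
    return y_t, err_t


reflectivity = [1.0, 1e-3, 1e-6]
uncertainty = [0.01, 1e-4, 5e-7]
transformed = [log_y_transform(r, e) for r, e in zip(reflectivity, uncertainty)]

for (y_t, err_t) in transformed:
    print(y_t, err_t)
```

Working in log space stops the few high-reflectivity points near the critical edge from dominating the residuals of a curve that spans several decades.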

An `Objective` can calculate statistics for the fitting system.

In [None]:
print(objective.chisqr())
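
For reference, the conventional chi-squared statistic is the sum of squared, uncertainty-weighted residuals. The sketch below assumes that definition; this tutorial does not spell out the exact expression behind `objective.chisqr()`, and when a `Transform` is in use the transformed values would presumably be the ones compared. `chi_squared` is a hypothetical helper.

```python
def chi_squared(y, y_model, y_err):
    """Sum of squared residuals, each weighted by its uncertainty."""
    return sum(((yi - mi) / ei) ** 2 for yi, mi, ei in zip(y, y_model, y_err))


y_obs = [1.0, 2.0, 3.0]
y_calc = [1.1, 1.9, 3.3]
y_unc = [0.1, 0.1, 0.1]

# residuals of one, one and three standard deviations
print(chi_squared(y_obs, y_calc, y_unc))
```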

The final setup step is to create a `CurveFitter` from the `Objective`. These objects do the fitting/sampling. Let's do an initial fit with differential evolution.

In [None]:
fitter = CurveFitter(objective)
fitter.fit('differential_evolution')
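
Differential evolution is a population-based global optimiser that only needs parameter bounds, which is why limits were set on every varying parameter. The sketch below is a bare-bones version of the classic rand/1/bin scheme applied to a toy cost function. It is an illustration only, not the implementation that `CurveFitter` calls, and every name in it is hypothetical.

```python
import random


def differential_evolution(func, bounds, pop_size=20, generations=200,
                           mutation=0.8, crossover=0.9, seed=1):
    rng = random.Random(seed)
    dim = len(bounds)
    # start from a population spread uniformly inside the bounds
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    cost = [func(p) for p in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # pick three distinct members other than member i
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            forced = rng.randrange(dim)  # at least one entry is always mutated
            trial = []
            for d, (lo, hi) in enumerate(bounds):
                if d == forced or rng.random() < crossover:
                    value = pop[a][d] + mutation * (pop[b][d] - pop[c][d])
                    value = min(max(value, lo), hi)  # clip to the bounds
                else:
                    value = pop[i][d]
                trial.append(value)
            trial_cost = func(trial)
            # keep the trial only if it is no worse than the current member
            if trial_cost <= cost[i]:
                pop[i], cost[i] = trial, trial_cost
    best = min(range(pop_size), key=cost.__getitem__)
    return pop[best], cost[best]


def misfit(p):
    # toy cost with its minimum at thickness=250, roughness=3
    return (p[0] - 250.0) ** 2 + (p[1] - 3.0) ** 2


best, best_cost = differential_evolution(misfit, [(200, 300), (1, 15)])
print(best, best_cost)
```

Because the population starts spread across the whole bounded region, the result does not depend on a good initial guess, which makes this a reasonable first fit before sampling.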

An `Objective` has a `plot` method, which gives a quick visualisation. You need matplotlib installed to create a graph.

In [None]:
objective.plot()
plt.xlabel('Q')
plt.ylabel('logR')
plt.legend()

`Structure` has a `sld_profile` method to return the SLD profile. Let's also plot that.

In [None]:
plt.plot(*structure.sld_profile())

Let's see the results of the fit. For differential evolution, the parameter uncertainties are obtained by estimating the Hessian/covariance matrix.

In [None]:
print(objective)

Now let's do an MCMC sampling of the curvefitting system. First we do 200 samples which we then discard (burn). These samples are discarded because the initial chain might not be representative of an equilibrated system (i.e. distributed around the mean with the correct covariance).

In [None]:
fitter.sample(200)
fitter.reset()

Now do a production run, only saving 1 in 100 samples. Thinning like this reduces the autocorrelation between saved samples. We save 10 steps, giving a total of 10 * 200 samples (200 walkers is the default). The number of burn/sample steps is reduced here for brevity.

In [None]:
res = fitter.sample(10, nthin=100, pool=4)
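
The effect of burning and thinning can be pictured as slicing a chain of steps. The sketch below does this after the fact on a plain Python list standing in for a chain. The sampler above thins while it runs rather than afterwards, so this is only an illustration of which steps survive, and `burn_and_thin` is a hypothetical helper.

```python
def burn_and_thin(chain, nburn, nthin):
    """Drop the first nburn steps, then keep every nthin-th step."""
    return chain[nburn::nthin]


# a stand-in chain in which each entry is just its own step number
chain = list(range(1000))
kept = burn_and_thin(chain, nburn=200, nthin=100)
print(kept)
```

Each kept step would carry one sample per walker, so the number of retained samples is the number of kept steps multiplied by the number of walkers.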

In the final output of the sampling each varying parameter is given a set of statistics. `Parameter.value` is the median of the chain samples. `Parameter.stderr` is half the width of the [15, 85] percentile interval, which approximates one standard deviation.

In [None]:
print(objective)
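
The two summary statistics described above can be sketched for a single parameter as follows. The percentile limits are taken from the text, and the linear interpolation between sorted samples is an assumption about how the percentiles are evaluated. `percentile` and `summarise` are hypothetical helpers.

```python
def percentile(samples, pct):
    """Linearly interpolated percentile of a list of numbers (0 <= pct <= 100)."""
    ordered = sorted(samples)
    position = (len(ordered) - 1) * pct / 100.0
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


def summarise(samples):
    """Return (median, half the width of the [15, 85] percentile interval)."""
    value = percentile(samples, 50)
    stderr = 0.5 * (percentile(samples, 85) - percentile(samples, 15))
    return value, stderr


# evenly spaced stand-in samples, so each percentile is easy to read off
samples = [float(v) for v in range(101)]
value, stderr = summarise(samples)
print(value, stderr)
```

Using the median and a percentile interval rather than the mean and standard deviation keeps the summary robust when the posterior is skewed or has long tails.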

A corner plot shows the covariance between parameters.

In [None]:
objective.corner();

Once we've done the sampling we can look at the spread in the model curves that are consistent with the data. In this example there isn't much spread.

In [None]:
objective.plot(samples=100);

In a similar manner we can look at the spread in SLD profiles consistent with the data. The `objective.pgen` generator yields parameter sets from the chain.

In [None]:
# but first we'll save the parameters in an array.
saved_params = np.array(objective.parameters)

z, true_sld = structure.sld_profile()

for pvec in objective.pgen(ngen=500):
    objective.setp(pvec)
    zs, sld = structure.sld_profile()
    plt.plot(zs, sld, color='k', alpha=0.05)

# put back saved_params
objective.setp(saved_params)

plt.plot(z, true_sld, lw=1, color='r')
plt.ylim(2.2, 6)

## Fitting data to a user-defined model

Here we demonstrate a fit to a user-defined model. This line example is taken from the [emcee documentation](http://emcee.readthedocs.io/en/stable/user/line.html) and the reader is referred to that link for a more detailed explanation. The error bars are underestimated, and the modelling will account for that.

First we synthesise some data:

In [None]:
np.random.seed(123)

# Choose the "true" parameters.
m_true = -0.9594
b_true = 4.294
f_true = 0.534

N = 50
x = np.sort(10*np.random.rand(N))
yerr = 0.1+0.5*np.random.rand(N)
y = m_true*x+b_true
y += np.abs(f_true*y) * np.random.randn(N)
y += yerr * np.random.randn(N)

To use *refnx* we first need to create a dataset.

In [None]:
from refnx.dataset import Data1D
data = Data1D(data=(x, y, yerr))

Then we need to set up a generative model.

In [None]:
def line(x, params, *args, **kwds):
    p_arr = np.array(params)
    return p_arr[0] + x * p_arr[1]

# the model needs parameters
p = Parameter(1, 'b', vary=True, bounds=(0, 10))
p |= Parameter(-2, 'm', vary=True, bounds=(-5, 0.5))

model = Model(p, fitfunc=line)

Now we create an objective from the model and the data. We use an extra parameter, `lnsigma`, to describe the underestimated error bars.

In [None]:
lnf = Parameter(0, 'lnf', vary=True, bounds=(-10, 1))
objective = Objective(model, data, lnsigma=lnf)
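
To show how a parameter like `lnsigma` can account for underestimated error bars, the sketch below follows the form used in the emcee line example cited above: the variance of each point is inflated by a term proportional to the square of the model value. That refnx uses exactly this expression is an assumption, and `log_likelihood` is a hypothetical helper.

```python
import math


def log_likelihood(y, y_model, y_err, lnsigma=None):
    """Gaussian log-likelihood with an optional fractional variance inflation."""
    total = 0.0
    for yi, mi, ei in zip(y, y_model, y_err):
        variance = ei ** 2
        if lnsigma is not None:
            # extra variance proportional to the model value squared
            variance += math.exp(2.0 * lnsigma) * mi ** 2
        total += (yi - mi) ** 2 / variance + math.log(2.0 * math.pi * variance)
    return -0.5 * total


# a point lying four quoted standard deviations from the model
plain = log_likelihood([3.0], [1.0], [0.5])
inflated = log_likelihood([3.0], [1.0], [0.5], lnsigma=0.0)
print(plain, inflated)
```

When the quoted uncertainties are too small, inflating the variance raises the likelihood of an otherwise reasonable model, while the logarithmic term penalises inflating it more than the data require.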

In [None]:
objective.plot();

Create a `CurveFitter` from the objective.

In [None]:
fitter = CurveFitter(objective)
fitter.fit('differential_evolution')

In [None]:
objective.plot();

Now we'll do some MCMC sampling:

In [None]:
fitter.sample(500);

Burn 100 steps and thin by 50. The number of burn/sample steps is reduced for brevity.

In [None]:
from refnx.analysis import process_chain

process_chain(objective, fitter.chain, nburn=100, nthin=50, flatchain=True)
print(objective)

In [None]:
objective.plot(samples=300);