# Speeding Up Analysis


In this tutorial, we'll explore how to speed up analysis of LAMMPS trajectory data with `lammpsio`. We will analyze a dump file from a simulation of a single 100-mer polymer in which bonded beads interact via the FENE potential:

$$
u(r) = \begin{cases}
-\dfrac{1}{2} k R_0^2 \ln\left(1 - \left(\dfrac{r}{R_0}\right)^2\right) + 4\varepsilon\left[\left(\dfrac{\sigma}{r}\right)^{12} - \left(\dfrac{\sigma}{r}\right)^6\right] + \varepsilon, & r < R_0 \\[6pt]
\infty, & r \ge R_0.
\end{cases}
$$

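In the Kremer-Grest model (and in LAMMPS `bond_style fene`), the Lennard-Jones term in the bond potential is applied only for $r < 2^{1/6}\sigma$ and is zero beyond that distance. Non-bonded beads interact via the Weeks-Chandler-Anderson (WCA) potential: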

$$
u_{ij}(r) = \begin{cases}
4 \varepsilon \left[ \left(\dfrac{\sigma_{ij}}{r} \right)^{12} - \left(\dfrac{\sigma_{ij}}{r}\right)^6 + \dfrac{1}{4} \right], & r \le 2^{1/6} \sigma_{ij} \\[6pt]
0, & r > 2^{1/6} \sigma_{ij}.
\end{cases}
$$
We've chosen standard Kremer-Grest model parameters of $\sigma=1$, $\varepsilon=1$, $R_0=1.5$, and $k=30$.
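To make the cutoffs concrete, here is a minimal NumPy sketch that evaluates both potentials with the Kremer-Grest parameters above. The function names `wca_energy` and `bond_energy` are hypothetical (they are not part of `lammpsio` or LAMMPS), and the sketch assumes the repulsive term inside the bond is truncated at $2^{1/6}\sigma$, as in LAMMPS `bond_style fene`.

```python
import numpy

# Kremer-Grest parameters used in this tutorial (reduced units)
SIGMA = 1.0
EPSILON = 1.0
R0 = 1.5
K = 30.0


def wca_energy(r, sigma=SIGMA, epsilon=EPSILON):
    """Purely repulsive WCA energy; zero beyond r = 2^(1/6) sigma."""
    r = numpy.asarray(r, dtype=float)
    sr6 = (sigma / r) ** 6
    u = 4.0 * epsilon * (sr6 * sr6 - sr6 + 0.25)
    return numpy.where(r <= 2.0 ** (1.0 / 6.0) * sigma, u, 0.0)


def bond_energy(r, k=K, r0=R0, sigma=SIGMA, epsilon=EPSILON):
    """FENE attraction plus WCA repulsion; infinite once the bond reaches R0."""
    r = numpy.asarray(r, dtype=float)
    # log(1 - (r/R0)^2) is -inf or nan for r >= R0; those values are masked below
    with numpy.errstate(divide="ignore", invalid="ignore"):
        fene = -0.5 * k * r0**2 * numpy.log(1.0 - (r / r0) ** 2)
    u = fene + wca_energy(r, sigma, epsilon)
    return numpy.where(r < r0, u, numpy.inf)
```

For example, `bond_energy(1.0)` is roughly 20.8 in reduced units, while `bond_energy(1.5)` is infinite because the bond cannot stretch to $R_0$.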

We've included the [LAMMPS input](lammps_input.in) and [initial configuration](init.data) files if you want to run the simulation yourself and follow along!

First, we import `lammpsio` and load the corresponding dump file.

```python
import lammpsio
import numpy
import numba

traj = lammpsio.DumpFile("traj.lammpstrj")
```

## Calculating the radius of gyration
We will calculate the radius of gyration, a key measure of polymer size:

$$
R_g^2 = \dfrac{1}{N}\sum_{i=1}^{N}\left|\vec{R}_i-\vec{R}_{cm}\right|^2
$$

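where $\vec{R}_{cm}$ is the position of the polymer's center of mass:

$$
\vec{R}_{cm} = \dfrac{\sum_{j=1}^{N} M_j \vec{R}_j}{\sum_{j=1}^{N} M_j},
$$

$\vec{R}_i$ and $\vec{R}_j$ are bead position vectors, $M_j$ is the mass of bead $j$, and $N$ is the number of beads in the polymer. The implementations below assume all beads have the same mass, so $\vec{R}_{cm}$ reduces to the average bead position, $\frac{1}{N}\sum_{j=1}^{N}\vec{R}_j$. `lammpsio` makes it easy to extract this data from LAMMPS dump files!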
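The two formulas can be written directly in NumPy. The sketch below uses hypothetical helper names (`center_of_mass`, `radius_of_gyration_sqr`, not `lammpsio` functions) and accepts optional per-bead masses for the center of mass; when no masses are given, it falls back to the plain average position, which matches the equal-mass case used in this tutorial.

```python
import numpy


def center_of_mass(pos, masses=None):
    """Mass-weighted center of mass of an (N, 3) array of unwrapped positions."""
    pos = numpy.asarray(pos, dtype=float)
    if masses is None:
        # Equal masses: the center of mass is the plain average position
        return pos.mean(axis=0)
    return numpy.average(pos, axis=0, weights=masses)


def radius_of_gyration_sqr(pos, masses=None):
    """R_g^2 = (1/N) * sum_i |R_i - R_cm|^2, following the definition above."""
    pos = numpy.asarray(pos, dtype=float)
    rcm = center_of_mass(pos, masses)
    return numpy.mean(numpy.sum((pos - rcm) ** 2, axis=1))


# Two beads on the x axis, 2 units apart
pos = numpy.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]])

rcm_equal = center_of_mass(pos)                     # halfway between the beads, x = 1.0
rg_sqr_equal = radius_of_gyration_sqr(pos)          # each bead is 1 unit from R_cm
rcm_heavy = center_of_mass(pos, masses=[1.0, 3.0])  # shifted toward the heavier bead, x = 1.5
```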

### Pure Python
We'll start by calculating $R_g^2$ with a pure Python implementation. To limit computational resources, we have shortened the simulation considerably, so the value computed here is not a converged estimate of $R_g^2$. With a sufficiently long simulation, the same analysis would give an accurate value.

```python
def compute_rg(pos, N):
    # Compute center of mass
    rcm = [0.0, 0.0, 0.0]
    for i in range(N):
        rcm[0] += pos[i][0]
        rcm[1] += pos[i][1]
        rcm[2] += pos[i][2]
    rcm[0] /= N
    rcm[1] /= N
    rcm[2] /= N

    # Compute radius of gyration squared
    rg_sqr = 0
    for i in range(N):
        diff_x = pos[i][0] - rcm[0]
        diff_y = pos[i][1] - rcm[1]
        diff_z = pos[i][2] - rcm[2]
        rg_sqr += diff_x*diff_x + diff_y*diff_y + diff_z*diff_z
    return rg_sqr / N
```

```python
%%timeit -n 100 -r 3
rg_sqr = []
for snapshot in traj:
    # Unwrap positions using the image flags and the box length along each axis
    pos = snapshot.position + snapshot.image * (snapshot.box.high - snapshot.box.low)
    N = snapshot.N
    rg_sqr.append(compute_rg(pos, N))
```

```
32 ms ± 507 μs per loop (mean ± std. dev. of 3 runs, 100 loops each)
```
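The loop above unwraps each bead with its periodic image flags before computing $R_g^2$, so that a chain crossing the box boundary is not split in two. Here is a minimal sketch of that step for an orthorhombic box, using made-up coordinates rather than the tutorial's trajectory; `unwrap` is a hypothetical helper name. The unwrapped position is the wrapped position plus the image flag times the box length along each axis.

```python
import numpy


def unwrap(position, image, low, high):
    """Unwrap positions in an orthorhombic periodic box.

    position: (N, 3) wrapped coordinates
    image:    (N, 3) integer image flags
    low/high: (3,) lower and upper box bounds
    """
    box_length = numpy.asarray(high, dtype=float) - numpy.asarray(low, dtype=float)
    # Broadcasting multiplies each image flag by the box length along its axis
    return numpy.asarray(position, dtype=float) + numpy.asarray(image) * box_length


# Hypothetical 10 x 10 x 10 box centered at the origin
low = numpy.array([-5.0, -5.0, -5.0])
high = numpy.array([5.0, 5.0, 5.0])

# The second bead crossed the +x boundary, so it was wrapped to x = -4.5 with image flag +1
position = numpy.array([[4.5, 0.0, 0.0], [-4.5, 0.0, 0.0]])
image = numpy.array([[0, 0, 0], [1, 0, 0]])

# After unwrapping, the second bead sits at x = 5.5, one unit from the first bead
unwrapped = unwrap(position, image, low, high)
```

For a cubic box centered at the origin, the box length `high - low` equals `2 * high[0]`; using `high - low` also covers boxes that are not centered at the origin or not cubic.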


### NumPy
This version replaces the explicit Python loops with NumPy's vectorized array operations, which run in compiled code.

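```python
def compute_rg(pos):
    # Center of mass (all beads have equal mass)
    rcm = numpy.mean(pos, axis=0)
    # Mean squared distance of the beads from the center of mass
    rg_sqr = numpy.mean(numpy.sum((pos - rcm)**2, axis=1))
    return rg_sqr
```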

```python
%%timeit -n 100 -r 3
rg_sqr = numpy.zeros(len(traj))
for i, snapshot in enumerate(traj):
    # Unwrap positions using the image flags and the box length along each axis
    pos = snapshot.position + snapshot.image * (snapshot.box.high - snapshot.box.low)
    rg_sqr[i] = compute_rg(pos)
```

```
23.7 ms ± 235 μs per loop (mean ± std. dev. of 3 runs, 100 loops each)
```


This vectorized version cuts the run time by about 25% compared with the pure Python loop (23.7 ms vs. 32 ms).
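One way to check that the vectorized function computes the same quantity as the loop is to compare them on random coordinates. The sketch below also uses the identity that, for equal masses, $R_g^2$ equals the sum of the per-axis population variances of the positions, so `pos.var(axis=0).sum()` gives the same number. The names `rg_sqr_loop` and `rg_sqr_numpy` are hypothetical renamed copies of the two approaches so that they can coexist.

```python
import numpy


def rg_sqr_loop(pos):
    """Explicit loops, mirroring the pure Python approach."""
    n = len(pos)
    rcm = [0.0, 0.0, 0.0]
    for i in range(n):
        for d in range(3):
            rcm[d] += pos[i][d] / n
    total = 0.0
    for i in range(n):
        for d in range(3):
            diff = pos[i][d] - rcm[d]
            total += diff * diff
    return total / n


def rg_sqr_numpy(pos):
    """Vectorized version, mirroring the NumPy approach."""
    rcm = numpy.mean(pos, axis=0)
    return numpy.mean(numpy.sum((pos - rcm) ** 2, axis=1))


rng = numpy.random.default_rng(seed=0)
pos = rng.normal(size=(100, 3))  # random stand-in for a 100-bead polymer

loop_value = rg_sqr_loop(pos)
numpy_value = rg_sqr_numpy(pos)
var_value = pos.var(axis=0).sum()  # sum of per-axis population variances (ddof=0)

print(numpy.isclose(loop_value, numpy_value), numpy.isclose(numpy_value, var_value))  # True True
```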

### Numba
Alternatively, we can use just-in-time (JIT) compilation with [numba](https://numba.readthedocs.io/en/stable/index.html) to speed up calculations! We'll take the `compute_rg` function from our first implementation and add the `@numba.njit` decorator to enable JIT compilation. Numba compiles the function to machine code the first time it is called and reuses the compiled version on later calls.

```python
@numba.njit
def compute_rg(pos, N):
    # Compute center of mass
    rcm = [0.0, 0.0, 0.0]
    for i in range(N):
        rcm[0] += pos[i][0]
        rcm[1] += pos[i][1]
        rcm[2] += pos[i][2]
    rcm[0] /= N
    rcm[1] /= N
    rcm[2] /= N

    # Compute radius of gyration squared
    rg_sqr = 0
    for i in range(N):
        diff_x = pos[i][0] - rcm[0]
        diff_y = pos[i][1] - rcm[1]
        diff_z = pos[i][2] - rcm[2]
        rg_sqr += diff_x*diff_x + diff_y*diff_y + diff_z*diff_z
    return rg_sqr / N
```

```python
%%timeit -n 100 -r 3
rg_sqr = []
for snapshot in traj:
    # Unwrap positions using the image flags and the box length along each axis
    pos = snapshot.position + snapshot.image * (snapshot.box.high - snapshot.box.low)
    N = snapshot.N
    rg_sqr.append(compute_rg(pos, N))
```

```
23 ms ± 390 μs per loop (mean ± std. dev. of 3 runs, 100 loops each)
```


Compared with the pure Python loop, Numba gives about the same speedup as NumPy (23 ms for Numba and 23.7 ms for NumPy, vs. 32 ms for pure Python).
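Each timed loop also reads every snapshot from the dump file, so part of the measured time is likely file I/O rather than the $R_g^2$ arithmetic itself. If all frames have the same number of beads, one option is to collect the unwrapped positions into a single `(n_frames, N, 3)` array once and then compute every frame's $R_g^2$ in one vectorized call. The sketch below assumes such an array already exists and uses random data in place of positions read with `lammpsio`; `rg_sqr_all_frames` is a hypothetical name.

```python
import numpy


def rg_sqr_all_frames(positions):
    """R_g^2 for every frame of an (n_frames, N, 3) array of unwrapped positions."""
    # Center of mass of each frame, kept as shape (n_frames, 1, 3) for broadcasting
    rcm = positions.mean(axis=1, keepdims=True)
    # Squared distance of each bead from its frame's center of mass, averaged over beads
    return numpy.sum((positions - rcm) ** 2, axis=2).mean(axis=1)


rng = numpy.random.default_rng(seed=1)
positions = rng.normal(size=(5, 100, 3))  # stand-in for 5 frames of a 100-bead chain

rg_sqr = rg_sqr_all_frames(positions)
print(rg_sqr.shape)  # (5,)
```

In a real analysis, you could fill such an array while iterating over `traj` once (for example with `numpy.stack`), then time `rg_sqr_all_frames` separately from the file reading.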

## Summary

`lammpsio` makes it simple to load and analyze LAMMPS dump files in Python. As shown above, tools from the Python ecosystem can noticeably speed up your analysis: NumPy and Numba each reduced the run time by roughly 25% here. Many other tools can also be combined with `lammpsio` to optimize your specific workflows!