# The Radial Distribution Function for Flying Discs

In this lab we will investigate the internal _structure_ of particles moving on a 2-dimensional surface.
To do so we first use an _analogue simulation_ where small magnetic and repulsive discs move on a flat surface due to air flowing from the sides as well as from underneath to reduce friction (see side view in the figure below). The particles are confined by repulsive, magnetic bars, located on the four sides of the square surface.

During simulation, a camera is placed above the surface and three videos with three different numbers of particles, _N_, are provided.
This allows us to study how the structure and packing change with surface concentration.
While the videos are pretty to watch, we want to extract exactly how the particles move.
For this we analyse all frames in the movies with image recognition software that allows us to extract particle positions (_xy_ coordinates) over time.
The extracted coordinates are then used to calculate the so-called
[_radial distribution function (RDF)_](https://en.wikipedia.org/wiki/Radial_distribution_function), $g(r)$.
The RDF is a very interesting property as it describes, on average, how a molecular system is organised, and can be used to extract further thermodynamic information.

![alternate text](figs/experiment.png)

## Learning outcomes
- Gain understanding of molecular structure and the _radial distribution function_, $g(r)$.
- Analyse and interpret $g(r)$ in solid, liquid, and gaseous states.
- Read and write movies and HDF files from/to disk.
- Use image recognition to track particle positions over time.

## Flow of events

This outlines the steps we need to take to analyse the pre-recorded videos:

1. Split pre-recorded videos into individual images (requires `ffmpeg`).
0. Use image recognition to find particle positions and save to trajectory file (`.h5` format). This software
   is also used to track particles in "real" experiments, see [here](http://soft-matter.github.io/trackpy/dev/index.html).
0. Calculate distance histogram.
0. Calculate the radial distribution function, $g(r)$.

## Quick guide to Jupyter Notebooks
- Double click on a cell to edit it.
- Run code in a cell by pressing `shift+return`.
- For getting help on a function, place the cursor inside the `()` brackets and press `shift+tab-tab`.
- More on text formatting, equations etc.
[here](http://jupyter.cs.brynmawr.edu/hub/dblank/public/Jupyter%20Notebook%20Users%20Manual.ipynb).

---

In [None]:
# load modules required for the analysis
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import os
import subprocess
import pims
import trackpy as tp
import base64
from scipy.spatial import distance
from IPython.display import HTML

plt.rcParams.update({"font.size": 16, "figure.figsize": [8.0, 6.0]})


# this function is used to visualize videos in the notebook
def video(file, mimetype="mp4"):
    """Show given video file"""
    # read-only binary mode is sufficient; the context manager closes the file
    with open(file, "rb") as f:
        video_encoded = base64.b64encode(f.read())
    return HTML(
        data="""<video alt="test" controls>
                <source src="data:video/{1};base64,{0}" type="video/{1}" />
             </video>""".format(video_encoded.decode("ascii"), mimetype)
    )

---
## Movie files
The following three videos of particle simulations were recorded using a smart phone and these will be the basis of the following analysis.
Here we create an array containing the filenames of the movies and two parameters that will be used for the image analysis:

- `diameter`: the diameter of the particles
- `percentile`: minimum relative brightness of the particles to distinguish them from the background 
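
To get a feel for what a percentile cut-off means, the sketch below applies `np.percentile` to a small synthetic image. It only illustrates the idea of a brightness threshold and does not reproduce how `trackpy` uses the parameter internally; the array `image` and the helper `brightness_threshold` are made up for this example.

```python
import numpy as np


def brightness_threshold(image, percentile):
    """Return the pixel value below which `percentile` percent of the pixels fall."""
    return np.percentile(image, percentile)


# synthetic 10x10 image: dark background (value 10) with one bright 2x2 spot
image = np.full((10, 10), 10.0)
image[4:6, 4:6] = 200.0

threshold = brightness_threshold(image, 60)
bright = image > threshold  # boolean mask of pixels brighter than the cut-off
print("threshold:", threshold, "bright pixels:", bright.sum())
```

Raising the percentile makes the cut-off stricter, so fewer background pixels can be mistaken for particles.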

In [None]:
movies = [
    {
        "file": "movies/N20-light.mp4",
        "percentile": 60,
        "diameter": 53,
    },  # dictionary for movie 0
    {
        "file": "movies/N40-light.mp4",
        "percentile": 60,
        "diameter": 53,
    },  # dictionary for movie 1
    {
        "file": "movies/N55-light.mp4",
        "percentile": 60,
        "diameter": 53,
    },  # dictionary for movie 2
]  # more movies can be added as needed...

### Select one of the videos 
Use the indexes from 0 to 2 to select one of the `dictionaries` containing filename, percentile and diameter for a system of N particles.

In [None]:
def select_movie(movies, index):
    movie = movies[index]
    return movie, movie["file"]


movie, moviefile = select_movie(
    movies, 1
)  # select movie file here by index, starting from 0

Here we visualize the movie.

In [None]:
video(moviefile)

### Split movie into individual files

Here we use the command line tool `ffmpeg` to split the movie into individual images as these are easier for the `trackpy` module to handle.<br>
First we check if the image directory already exists; if so, the images have already been generated and we can skip this step.
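
Since this step depends on an external program, it can help to check that `ffmpeg` is installed before running it. The following is a minimal sketch using `shutil.which` from the standard library; the helper name `have_tool` is an assumption made for this example.

```python
import shutil


def have_tool(name):
    """Return True if an executable called `name` is found on the PATH."""
    return shutil.which(name) is not None


if not have_tool("ffmpeg"):
    print("ffmpeg was not found; install it before splitting the movie")
```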

In [None]:
def split_movie(moviefile):
    imgdir = os.path.splitext(moviefile)[0]
    if not os.path.exists(imgdir):
        os.makedirs(imgdir)
        subprocess.run(
            [
                "ffmpeg",
                "-i",
                moviefile,
                "-f",
                "image2",
                "-vcodec",
                "mjpeg",
                f"{imgdir}/img-%03d.jpg",
                "-v",
                "0",
            ],
            check=True,
        )
    return imgdir


imgdir = split_movie(moviefile)

---

## Image recognition and particle positions

In this section we extract particle positions (_xy_) from the videos using image recognition software.

Before analysing all frames found in the 10-15 sec movie, let's check if the (slow) feature extraction
works for a single frame. The function below loads all frames, converts them to grey-scale, and runs feature detection on the first frame to verify the settings.
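
The grey-scale conversion is a weighted sum of the red, green and blue channels. The sketch below applies the same weights as `as_grey` to a tiny hand-made image so the effect can be inspected; the helper `to_grey` and the test array `rgb` are assumptions for this illustration only.

```python
import numpy as np

# weights for the red, green and blue channels (same values as in as_grey)
WEIGHTS = np.array([0.2125, 0.7154, 0.0721])


def to_grey(rgb):
    """Convert a (height, width, 3) RGB array to a 2D grey-scale array."""
    return rgb[..., :3] @ WEIGHTS


# hypothetical 1x3 image: one red, one green and one white pixel
rgb = np.array([[[255, 0, 0], [0, 255, 0], [255, 255, 255]]], dtype=float)
grey = to_grey(rgb)
print(grey.shape)
print(grey)
```

The weights sum to one, so a white pixel keeps its full brightness, while green contributes far more to the grey value than red or blue.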

In [None]:
@pims.pipeline
def as_grey(frame):
    red = frame[:, :, 0]
    green = frame[:, :, 1]
    blue = frame[:, :, 2]
    return 0.2125 * red + 0.7154 * green + 0.0721 * blue


def load_and_test_locate(imgdir, movie):
    color_frames = pims.ImageSequence(imgdir + "/img*.jpg")
    frames = as_grey(color_frames)
    print("read", len(frames), "frames.")
    f = tp.locate(
        frames[0],
        diameter=movie["diameter"],
        invert=True,
        percentile=movie["percentile"],
    )
    print(f.tail())
    # example: separate light and heavy particles by mass
    # heavy = f[f['mass'] > 50000]
    # light = f[f['mass'] < 50000]
    tp.annotate(f, frames[0])
    plt.show()
    return frames


frames = load_and_test_locate(imgdir, movie)

### Extract particle positions from all frames

Assuming that the recognition settings are OK, let's loop over all frames, extract features, and save to a `.h5` trajectory file.
The function skips this process if the trajectory file already exists on disk.

_Warning: this is a slow process!_

In [None]:
def extract_trajectory(moviefile, frames, movie):
    trjfile = os.path.splitext(moviefile)[0] + ".h5"
    if os.path.isfile(trjfile):
        print(f"opening existing trajectory file: {trjfile}")
    else:
        with tp.PandasHDFStore(trjfile) as s:
            for cnt, image in enumerate(frames, 1):
                print("frame %d/%d." % (cnt, len(frames)), end=" ")
                features = tp.locate(
                    image,
                    diameter=movie["diameter"],
                    percentile=movie["percentile"],
                    invert=True,
                )
                print("number of particles =", len(features))
                s.put(features[["x", "y", "mass", "frame"]])
    return trjfile


trjfile = extract_trajectory(moviefile, frames, movie)

### Read trajectory file and calculate distances between all points

In this section we calculate the distances between all pairs of particles in each frame. These are then binned into a histogram to give the probability of observing a particular separation.<br>
At the same time we sample the distribution for _ideal_ particles by simply generating random positions and performing the same analysis as for the "real" particles.
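
As a small illustration of the pair-distance step, the sketch below runs `scipy.spatial.distance.pdist` on three hand-picked positions and bins the result with `np.histogram`. The coordinates, bin count and range are assumptions chosen to keep the numbers easy to follow.

```python
import numpy as np
from scipy.spatial import distance

# three hypothetical particle positions (x, y) in pixels
positions = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])

# condensed vector holding the N(N-1)/2 unique pair distances
pair_distances = distance.pdist(positions)
print(pair_distances)

# bin the distances; density=True normalises the histogram area to one
counts, edges = np.histogram(pair_distances, bins=5, range=(0, 10), density=True)
print(counts)
```

For $N$ particles each frame contributes $N(N-1)/2$ distances, which is why the full analysis accumulates a large number of samples.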

In [None]:
def compute_distance_histograms(trjfile):
    dist = np.empty(0)  # empty 1D array; pair distances from each frame are appended below
    with tp.PandasHDFStore(trjfile) as s:
        for frame in s:
            dist = np.append(dist, distance.pdist(frame[["x", "y"]]))
        data = s.dump()
        xmin, xmax = min(data.x), max(data.x)
        ymin, ymax = min(data.y), max(data.y)
        x = np.random.randint(xmin, xmax + 1, 4000)
        y = np.random.randint(ymin, ymax + 1, 4000)
        udist = distance.pdist(np.array([x, y]).T)
        hist = plt.hist(
            dist,
            bins=150,
            density=True,
            range=[0, 700],
            histtype="step",
            color="black",
            label="real",
        )
        uhist = plt.hist(
            udist,
            bins=150,
            density=True,
            range=[0, 700],
            histtype="step",
            color="red",
            label="ideal",
        )
        plt.legend(loc=0, frameon=False)
        plt.xlabel("distance (pixels)")
        plt.ylabel("probability")
        plt.show()
    return hist, uhist


hist, uhist = compute_distance_histograms(trjfile)

### Radial Distribution Function, $g(r)$

We have now calculated the distance distribution, `hist`, for the particles in the movie, as well as the distribution for uniformly distributed (ideal) particles, `uhist`. The radial distribution function is simply the ratio between the two.
This means that if the particles behaved ideally (which they do not), $g(r)$ would be unity for all separations, $r$. After plotting, the final RDF is saved to disk.
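
The ratio can be checked on synthetic data: if both sets of positions are drawn uniformly at random, the resulting $g(r)$ should fluctuate around one. The sketch below is one possible way to do this, assuming a hypothetical helper `rdf_from_positions`, a 100 x 100 pixel box and 500 random points per set. Unlike the notebook code it uses `np.histogram`, reports bin centres, and marks bins with an empty ideal histogram as NaN.

```python
import numpy as np
from scipy.spatial import distance


def rdf_from_positions(real_xy, ideal_xy, bins=50, rmax=100.0):
    """Estimate g(r) as the ratio of two normalised pair-distance histograms."""
    real_hist, edges = np.histogram(
        distance.pdist(real_xy), bins=bins, range=(0, rmax), density=True
    )
    ideal_hist, _ = np.histogram(
        distance.pdist(ideal_xy), bins=bins, range=(0, rmax), density=True
    )
    r = 0.5 * (edges[:-1] + edges[1:])  # bin centres
    # leave g undefined (NaN) where the ideal histogram is empty
    g = np.full_like(real_hist, np.nan)
    np.divide(real_hist, ideal_hist, out=g, where=ideal_hist > 0)
    return r, g


# two independent sets of uniformly distributed (ideal) positions
rng = np.random.default_rng(seed=1)
ideal_a = rng.uniform(0, 100, size=(500, 2))
ideal_b = rng.uniform(0, 100, size=(500, 2))
r, g = rdf_from_positions(ideal_a, ideal_b)
print(np.nanmean(g))
```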

In [None]:
def compute_and_save_rdf(hist, uhist, moviefile):
    r = hist[1][: len(hist[0])]
    g = hist[0] / uhist[0]
    plt.plot(r, g, "k-")
    plt.xlabel("$r$ (pixels)")
    plt.ylabel("$g(r)$")
    plt.title("Radial distribution function (RDF)")
    rdffile = os.path.splitext(moviefile)[0] + ".rdf.dat"
    np.savetxt(rdffile, np.array([r, g]).T, header="rdf from " + moviefile)
    return r, g


r, g = compute_and_save_rdf(hist, uhist, moviefile)

### Plot all RDFs found on disk

In [None]:
def plot_all_rdfs(movies):
    for d in movies:
        name = os.path.splitext(d["file"])[0]
        rdffile = name + ".rdf.dat"
        if os.path.isfile(rdffile):
            r, g = np.loadtxt(rdffile, unpack=True)
            plt.plot(r, g, "-", label=os.path.basename(name), lw=2)
    plt.legend(loc=0, frameon=False)
    plt.xlabel("$r$")
    plt.ylabel("$g(r)$")
    plt.xlim([40, 300])
    plt.ylim([0, 4])


plot_all_rdfs(movies)

---
## Questions

Please fill in answers below and use this Notebook, exported or printed as a PDF, as your final report.

### Why does $g(r)$ deviate from unity at large separations?

Your answer here; insert code blocks and output as needed.

### What is the particle size and the system's volume fraction (area occupied by the particles / total area)?

Your answer here; insert code blocks and output as needed.


### It seems as if there's a small maximum in $g(r)$ at short separations. Is this real and, if so, how is this possible for repulsive particles?

Your answer here; insert code blocks and output as needed.


### Convert $g(r)$ to the potential of mean force and plot this (Hint: use numpy's function `np.log()` as in `pmf=-np.log(g)`).

Your answer here; insert code blocks and output as needed.


### Repeat the full analysis but for a more concentrated system. Discuss differences.

Hint: the functions defined above (`select_movie`, `split_movie`, `load_and_test_locate`, `extract_trajectory`, `compute_distance_histograms`, `compute_and_save_rdf`) can be reused with a different movie index.

Your answer here; insert code blocks and output as needed.


---
## Outlook

There are always many more things we can investigate.
In the above we studied structure through $g(r)$ which is a static, equilibrium property that does not depend on time.
In the "experiment", however, we also have access to kinetics, and the following code snippet shows how we may track particles over time.
With this we could e.g. calculate the velocity distribution to get an idea of the "temperature" which here is controlled by the airflow.

Below you will find some preliminary work in this regard and you may use it to further investigate the system.
Note that this is _not_ mandatory and only here for inspiration!

In [None]:
def track_particles(movies, index):
    movie = movies[index]
    moviefile = movie["file"]
    trjfile = os.path.splitext(moviefile)[0] + ".h5"
    with tp.PandasHDFStore(trjfile) as s:
        data = s.dump()
    t = tp.link_df(data, search_range=20, memory=3)
    tp.plot_traj(t)
    return t


t = track_particles(movies, 0)
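
As a possible starting point for the velocity distribution mentioned above, the sketch below computes per-step speeds from a table of linked trajectories. It assumes a DataFrame with the columns `x`, `y`, `frame` and `particle`, as produced by `tp.link_df`; the helper name `speeds`, the example table and the values of `fps` and `mpp` are made up for this illustration.

```python
import numpy as np
import pandas as pd


def speeds(traj, fps, mpp):
    """Return per-step speeds from a linked trajectory table.

    traj: DataFrame with columns x, y (pixels), frame and particle
    fps:  frames per second
    mpp:  length units per pixel
    """
    traj = traj.sort_values(["particle", "frame"])
    grouped = traj.groupby("particle")
    dx = grouped["x"].diff()  # displacement between consecutive observations
    dy = grouped["y"].diff()
    dt = grouped["frame"].diff() / fps  # elapsed time, also across skipped frames
    v = np.hypot(dx, dy) * mpp / dt
    return v.dropna()  # the first observation of each particle has no speed


# hypothetical trajectories: particle 0 moves 5 pixels per frame, particle 1 rests
traj = pd.DataFrame(
    {
        "particle": [0, 0, 0, 1, 1],
        "frame": [0, 1, 2, 0, 1],
        "x": [0, 3, 6, 10, 10],
        "y": [0, 4, 8, 10, 10],
    }
)
v = speeds(traj, fps=24, mpp=0.1)
print(v)
```

A histogram of these speeds, for example with `plt.hist(v, bins=50, density=True)`, would then give a first impression of the velocity distribution.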

In [None]:
def plot_msd(movies):
    for d in movies:
        name = os.path.splitext(d["file"])[0]
        trjfile = name + ".h5"
        with tp.PandasHDFStore(trjfile) as s:
            data = s.dump()
        t = tp.link_df(data, search_range=20, memory=3)
        em = tp.emsd(t, mpp=20.0 / 600.0, fps=24)
        plt.plot(em.index, em, "o", label=os.path.basename(name))
    plt.legend(loc=0, frameon=False)
    plt.xscale("log")
    plt.yscale("log")
    plt.ylabel(r"$\langle \Delta r^2 \rangle$ [cm$^2$]")
    plt.xlabel("lag time $t$ [s]")
    plt.show()


plot_msd(movies)