# Introduction

In this example we will be computing the *radial distribution function*, $g(r)$.

To calculate the radial distribution function, we will follow these steps:

1. Load data
2. Create `freud` RDF object
3. Compute the RDF
4. Extract data from the `freud` RDF object
5. (optional) Plot data

# Required Packages

We will use the following Python packages. Please install them before continuing:

* `freud`
* `numpy`
* `bokeh` (optional, for plotting)
* `matplotlib` (optional, for plotting)

In [None]:
import numpy as np

# Set up freud

Now we need to import `freud` into our environment. Here we import specific modules rather than the entire package. If you prefer, you can instead use

    import freud

and refer to the modules through the package name.

In [2]:
from freud import parallel, box, density
parallel.setNumThreads(4)

We imported the following modules:

* `parallel` - allows you to set the number of threads you will use during calculations. Yes, `freud` is parallelized!
    - By default, `freud` uses the maximum number of threads available on your machine. Here it is set to 4; adjust this for your machine. If you are working on a shared machine, consider using a moderate number of threads.
    - Don't forget to benchmark your calculations, or refer to the `freud` benchmarks to determine the optimal number of cores to use. More is not always better!
* `box` - module used to create the simulation boxes used in `freud` calculations
* `density` - module containing density-related calculations, including the radial distribution function
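
The benchmarking advice above can be sketched with a small timing helper. This is only an illustration: `best_time` is a hypothetical helper, not part of `freud`, and the workload timed here is a stand-in so the snippet runs on its own.

```python
import time

def best_time(fn, repeats=5):
    """Return the fastest wall-clock time (seconds) over `repeats` calls to fn."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return min(timings)

# Stand-in workload. In this notebook you would instead time something like
#   lambda: rdf.compute(fbox, l_pos, l_pos)
# once for each thread count set with parallel.setNumThreads(n).
elapsed = best_time(lambda: sum(i * i for i in range(100_000)))
print("best of 5: {:.4f} s".format(elapsed))
```

Taking the minimum of several runs reduces the influence of other processes on the measurement, which matters on a shared machine.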

# Loading Data

`freud` makes no assumptions about how your data is stored and does not provide readers for specific file formats. Everything passed into `freud` must be a `np.ndarray` of the type that the method requires (see the [documentation](https://freud.readthedocs.io/) for what each method expects). This example uses NumPy binary (`.npy`) files.
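
If you do not have the example files, the sketch below builds arrays with the layout this notebook appears to rely on: a structured array with `Lx` and `Ly` fields for the boxes, and a `(frames, particles, 3)` position array with `z = 0` for a 2D system. The field names are inferred from the indexing used later (`l_box["Lx"]`); the `float32` dtype, the sizes, and the random positions are assumptions for illustration only.

```python
import numpy as np

n_frames, n_particles = 3, 5
box_length = 10.0

# Structured array: one record per frame with named box lengths,
# so that box_data[i]["Lx"] works as in the cells below.
box_data = np.zeros(n_frames, dtype=[("Lx", np.float32), ("Ly", np.float32)])
box_data["Lx"] = box_length
box_data["Ly"] = box_length

# Positions: (frame, particle, xyz). The z column stays 0 for a 2D system.
rng = np.random.default_rng(0)
pos_data = np.zeros((n_frames, n_particles, 3), dtype=np.float32)
pos_data[:, :, :2] = rng.uniform(-box_length / 2, box_length / 2,
                                 size=(n_frames, n_particles, 2))

print(box_data[-1]["Lx"], pos_data[-1].shape, pos_data.dtype)
```

Arrays like these can be written with `np.save` and read back with `np.load`, which is how the data is loaded later in this notebook.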

# Create the RDF Object

Each module in `freud` provides classes that perform a particular type of computation. To perform a computation, you first create an instance of the relevant class. Please refer to the documentation for the arguments required by the class you wish to use.

Let's create the RDF object. For your convenience, we use the help function to show what needs to be supplied:

In [3]:
help(density.RDF)

Help on class RDF in module freud._freud:

class RDF(builtins.object)
 |  Computes RDF for supplied data.
 |  
 |  The RDF (:math:`g \left( r \right)`) is computed and averaged for a given
 |  set of reference points in a sea of data points. Providing the same points
 |  calculates them against themselves. Computing the RDF results in an RDF
 |  array listing the value of the RDF at each given :math:`r`, listed in the r
 |  array.
 |  
 |  The values of :math:`r` to compute the RDF are set by the values of rmin,
 |  rmax, dr in the constructor. rmax sets the maximum distance at which to
 |  calculate the :math:`g \left( r \right)`, rmin sets the minimum distance
 |  at which to calculate the :math:`g \left( r \right)`, and dr determines
 |  the step size for each bin.
 |  
 |  .. moduleauthor:: Eric Harper <harperic@umich.edu>
 |  
 |  .. note::
 |      2D: :py:class:`freud.density.RDF` properly handles 2D boxes.
 |      The points must be passed in as :code:`[x, y, 0]`.
 |      Failin

The docstring lists `rmin`, `rmax`, and `dr` as constructor parameters. Here we set only `rmax`, the maximum distance at which to calculate the RDF, and `dr`, the width of each RDF histogram bin. We create an RDF object with `rmax = 10` and `dr = 0.1`:

In [4]:
rdf = density.RDF(rmax=10.0, dr=0.1)
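
To see what these two parameters imply for the output, here is a minimal sketch of the resulting bins. It assumes the bins start at `r = 0` and are each `dr` wide; the exact binning inside `freud` may differ, so treat this as a mental model rather than the library's implementation.

```python
import numpy as np

rmax, dr = 10.0, 0.1

# Assumption: bins start at r = 0 and each bin is dr wide.
n_bins = int(round(rmax / dr))
edges = np.linspace(0.0, rmax, n_bins + 1)
centers = 0.5 * (edges[:-1] + edges[1:])

print(n_bins, centers[0], centers[-1])
```

Under this assumption there are 100 bins, with centers running from about 0.05 to about 9.95.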

# Compute the RDF

Now we compute the RDF for the last frame of the trajectory. The comments in the code describe each step.

In [5]:
data_path = "ex_data/phi065"
box_data = np.load("{}/box_data.npy".format(data_path))
pos_data = np.load("{}/pos_data.npy".format(data_path))
n_frames = pos_data.shape[0]

# compute the rdf for the last frame
# read box, position data
l_box = box_data[-1]
l_pos = pos_data[-1]
# create the freud box object
fbox = box.Box(Lx=l_box["Lx"], Ly=l_box["Ly"], is2D=True)
# compute
rdf.compute(fbox, l_pos, l_pos)

<freud._freud.RDF at 0x7fc3602024a8>
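
For intuition about what this call calculates, the sketch below is a brute-force NumPy version of a 2D RDF in a periodic rectangular box. It is not how `freud` implements it. The function `rdf_2d` is hypothetical, it assumes `rmax` is less than half the shortest box length, and it scales poorly with particle number, so use it only to check small systems.

```python
import numpy as np

def rdf_2d(points, Lx, Ly, rmax, dr):
    """Brute-force g(r) for one 2D frame in a periodic Lx-by-Ly box."""
    xy = np.asarray(points, dtype=np.float64)[:, :2]
    n = len(xy)
    # All pair separations, wrapped with the minimum-image convention.
    delta = xy[:, None, :] - xy[None, :, :]
    box_lengths = np.array([Lx, Ly])
    delta -= box_lengths * np.round(delta / box_lengths)
    dist = np.sqrt((delta ** 2).sum(axis=-1))
    # Drop self-pairs (the diagonal); each i-j pair is kept twice on purpose.
    dist = dist[~np.eye(n, dtype=bool)]
    n_bins = int(round(rmax / dr))
    edges = np.linspace(0.0, rmax, n_bins + 1)
    counts, _ = np.histogram(dist, bins=edges)
    # Expected count in each annulus for uncorrelated points at this density.
    density = n / (Lx * Ly)
    shell_area = np.pi * (edges[1:] ** 2 - edges[:-1] ** 2)
    g = counts / (n * density * shell_area)
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, g

# Uncorrelated random points should give g(r) scattered around 1.
rng = np.random.default_rng(1)
pts = np.zeros((1000, 3))
pts[:, :2] = rng.uniform(-10.0, 10.0, size=(1000, 2))
r_ref, g_ref = rdf_2d(pts, Lx=20.0, Ly=20.0, rmax=5.0, dr=0.5)
print(np.round(g_ref, 2))
```

The normalization divides the pair counts in each annulus by the count expected for uncorrelated points at the same density, which is why a structureless system gives values near 1.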

# I computed the RDF, now what?

Next steps: First, you need to get the data out of `freud`. Second, you need to visualize the data.

## Getting your data out of freud

The RDF data has been computed, so let's get the arrays.

In [6]:
# get the center of the histogram bins
r = rdf.getR()
# get the value of the histogram bins
y = rdf.getRDF()

## Visualizing your data

Remember, `freud` makes no assumptions about your data or how you want to visualize it. We will use `bokeh`.

In [7]:
# set up bokeh
from bokeh.io import output_notebook
from bokeh.layouts import gridplot
from bokeh.plotting import figure, show

output_notebook()

def default_bokeh(p):
    p.title.text_font_size = "18pt"
    p.title.align = "center"

    p.xaxis.axis_label_text_font_size = "14pt"
    p.yaxis.axis_label_text_font_size = "14pt"

    p.xaxis.major_tick_in = 10
    p.xaxis.major_tick_out = 0
    p.xaxis.minor_tick_in = 5
    p.xaxis.minor_tick_out = 0

    p.yaxis.major_tick_in = 10
    p.yaxis.major_tick_out = 0
    p.yaxis.minor_tick_in = 5
    p.yaxis.minor_tick_out = 0

    p.xaxis.major_label_text_font_size = "12pt"
    p.yaxis.major_label_text_font_size = "12pt"

In [8]:
# create bokeh plot
p = figure(title="RDF", x_axis_label='r', y_axis_label='g(r)')
p.line(r, y, legend="g(r)", line_width=2)
default_bokeh(p)
show(p)

# Using *accumulate* for averaged data

Frequently, we want to average our data over many simulation frames. The `freud` library does this with `accumulate` methods: 

In [None]:
data_path = "ex_data/phi065"
box_data = np.load("{}/box_data.npy".format(data_path))
pos_data = np.load("{}/pos_data.npy".format(data_path))
n_frames = pos_data.shape[0]

# reset the rdf so the frame computed earlier is not included in the average
rdf.resetRDF()
# compute the rdf for all frames except the first (your syntax will vary based on your reader)
for i in range(1, n_frames):
    # read box, position data
    l_box = box_data[i]
    l_pos = pos_data[i]
    # create the freud box object
    fbox = box.Box(Lx=l_box["Lx"], Ly=l_box["Ly"], is2D=True)
    # accumulate
    rdf.accumulate(fbox, l_pos, l_pos)

# get the center of the histogram bins
r = rdf.getR()
# get the value of the histogram bins
y = rdf.getRDF()
# create bokeh figure
p = figure(title="RDF", x_axis_label='r', y_axis_label='g(r)')
p.circle(r, y, legend="g(r)", line_width=2)
p.line(r, y, legend="g(r)", line_width=2)
default_bokeh(p)
show(p)
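
Conceptually, accumulating means summing per-frame results and dividing by the number of frames. The toy class below illustrates that idea and why a reset matters. It is an assumption about the general pattern, with hypothetical names, and not `freud`'s actual implementation.

```python
import numpy as np

class RunningAverage:
    """Toy accumulator: averages per-frame arrays over many frames."""

    def __init__(self, n_bins):
        self._total = np.zeros(n_bins)
        self._frames = 0

    def reset(self):
        self._total[:] = 0.0
        self._frames = 0

    def accumulate(self, frame_values):
        self._total += frame_values
        self._frames += 1

    def compute(self, frame_values):
        # Single-frame result: discard history, then add one frame.
        self.reset()
        self.accumulate(frame_values)

    def result(self):
        return self._total / self._frames

avg = RunningAverage(n_bins=3)
for frame in ([1.0, 2.0, 3.0], [3.0, 2.0, 1.0]):
    avg.accumulate(np.array(frame))
print(avg.result())   # [2. 2. 2.]
```

In this model, skipping the reset before a new accumulation loop would mix earlier frames into the new average.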

# What's the difference?

Let's plot the single-frame and averaged results together and find out:

In [None]:
# reset the rdf; required if not using compute
rdf.resetRDF()
# compute the rdf for all frames except the first (your syntax will vary based on your reader)
for i in range(1, n_frames):
    # read box, position data
    l_box = box_data[i]
    l_pos = pos_data[i]
    # create the freud box object
    fbox = box.Box(Lx=l_box["Lx"], Ly=l_box["Ly"], is2D=True)
    # accumulate
    rdf.accumulate(fbox, l_pos, l_pos)

# get the center of the histogram bins
r_avg = rdf.getR()
# get the value of the histogram bins
y_avg = rdf.getRDF()

# compute the rdf for the last frame
# read box, position data
l_box = box_data[-1]
l_pos = pos_data[-1]
# create the freud box object
fbox = box.Box(Lx=l_box["Lx"], Ly=l_box["Ly"], is2D=True)
# compute
rdf.compute(fbox, l_pos, l_pos)
# get the center of the histogram bins
r = rdf.getR()
# get the value of the histogram bins
y = rdf.getRDF()
# create bokeh plot
p0 = figure(title="RDF", x_axis_label='r', y_axis_label='g(r)')
p0.circle(r, y, legend="Compute")
p0.line(r, y, legend="Compute", line_width=2)
p0.square(r_avg, y_avg, legend="Accumulate", fill_color=None, line_color="red")
p0.line(r_avg, y_avg, legend="Accumulate", line_dash=[4,4], line_width=2, line_color="red")

default_bokeh(p0)

p1 = figure(title="RDF", x_axis_label='r', y_axis_label='g(r)')
p1.line(r, y, legend="Compute", line_width=2)
p1.line(r_avg, y_avg, legend="Accumulate", line_width=2, line_color="red")

default_bokeh(p1)

grid = gridplot([p0, p1], ncols=2, plot_width=400, plot_height=400)

show(grid)

# Wait a second, there's no difference?!

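Right you are, but there should be. The two curves coincide because `r_avg` and `y_avg` still refer to memory owned by the `rdf` object, and the later call to `compute` overwrote that memory with the single-frame result. Copying the arrays with `np.copy` before calling `compute` preserves the averaged data: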

In [None]:
# reset the rdf; required if not using compute
rdf.resetRDF()
# compute the rdf for all frames except the first (your syntax will vary based on your reader)
for i in range(1, n_frames):
    # read box, position data
    l_box = box_data[i]
    l_pos = pos_data[i]
    # create the freud box object
    fbox = box.Box(Lx=l_box["Lx"], Ly=l_box["Ly"], is2D=True)
    # accumulate
    rdf.accumulate(fbox, l_pos, l_pos)

# USE NP.COPY!
# get the center of the histogram bins
r_avg = np.copy(rdf.getR())
# get the value of the histogram bins
y_avg = np.copy(rdf.getRDF())

# compute the rdf for the last frame
# read box, position data
l_box = box_data[-1]
l_pos = pos_data[-1]
# create the freud box object
fbox = box.Box(Lx=l_box["Lx"], Ly=l_box["Ly"], is2D=True)
# compute
rdf.compute(fbox, l_pos, l_pos)
# get the center of the histogram bins
r = rdf.getR()
# get the value of the histogram bins
y = rdf.getRDF()
# create bokeh plot
p0 = figure(title="RDF", x_axis_label='r', y_axis_label='g(r)')
p0.circle(r, y, legend="Compute")
p0.line(r, y, legend="Compute", line_width=2)
p0.square(r_avg, y_avg, legend="Accumulate", fill_color=None, line_color="red")
p0.line(r_avg, y_avg, legend="Accumulate", line_dash=[4,4], line_width=2, line_color="red")

default_bokeh(p0)

p1 = figure(title="RDF", x_axis_label='r', y_axis_label='g(r)')
p1.line(r, y, legend="Compute", line_width=2)
p1.line(r_avg, y_avg, legend="Accumulate", line_width=2, line_color="red")

default_bokeh(p1)

grid = gridplot([p0, p1], ncols=2, plot_width=400, plot_height=400)

show(grid)

# Pointers vs. Copies

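By default, `freud` returns NumPy arrays that point at its internal memory rather than copies. This is done for speed, but it means a later computation can silently overwrite arrays you retrieved earlier, which is the problem seen above. Be sure to copy your data out (for example with `np.copy`) whenever you need to keep it. A more in-depth discussion can be found in the [NumPy arrays from pointers example notebook](CopyVsPointers.ipynb).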
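
The same effect can be reproduced in pure NumPy. In the sketch below, the hypothetical `Calculator` class stands in for a `freud` compute object that hands out its internal buffer; it is meant only to illustrate the behavior, not to mirror `freud`'s internals.

```python
import numpy as np

class Calculator:
    """Toy object that hands out its internal buffer, as a stand-in for freud."""

    def __init__(self):
        self._buffer = np.zeros(3)

    def compute(self, value):
        self._buffer[:] = value   # overwrite in place, same memory

    def get(self):
        return self._buffer       # no copy: caller sees the live buffer

calc = Calculator()
calc.compute(1.0)
view = calc.get()             # shares memory with the internal buffer
saved = np.copy(calc.get())   # independent snapshot

calc.compute(2.0)
print(view)    # [2. 2. 2.]  (changed by the second compute)
print(saved)   # [1. 1. 1.]
print(np.shares_memory(view, calc.get()), np.shares_memory(saved, calc.get()))
```

`np.shares_memory` is a quick way to check whether an array you hold is still tied to another array's memory.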