Commit

Merge pull request #348 from glotzerlab/docs/add_tutorial
Docs/add tutorial
vyasr committed Oct 2, 2019
2 parents cb6cc69 + a89e839 commit 0e0e3b8
Showing 17 changed files with 5,611 additions and 4 deletions.
1 change: 1 addition & 0 deletions ChangeLog.md
@@ -19,6 +19,7 @@ and this project adheres to
* C++ BondHistogramCompute class encapsulates logic of histogram-based methods.
* NeighborLists and query arguments are now accepted on equal footing by compute methods that involve neighbor finding.
* 2D PMFTs accept quaternions as well as angles for their orientations.
* Extensive new documentation including tutorial for new users and reference sections on crucial topics.
* Added NeighborQuery support to ClusterProperties, GaussianDensity, Voronoi.

### Changed
10 changes: 6 additions & 4 deletions doc/source/conf.py
@@ -297,10 +297,12 @@
# If true, do not generate a @detailmenu in the "Top" node's menu.
# texinfo_no_detailmenu = False

-intersphinx_mapping = {'python': ('https://docs.python.org/3', None),
-                       'numpy': ('https://docs.scipy.org/doc/numpy', None),
-                       'matplotlib': ('https://matplotlib.org', None),
-                       }
+intersphinx_mapping = {
+    'python': ('https://docs.python.org/3', None),
+    'numpy': ('https://docs.scipy.org/doc/numpy', None),
+    'matplotlib': ('https://matplotlib.org', None),
+    'hoomd': ('https://hoomd-blue.readthedocs.io/en/stable/', None),
+}

autodoc_mock_import = ["numpy"]

1 change: 1 addition & 0 deletions doc/source/credits.rst
@@ -92,6 +92,7 @@ Vyas Ramasubramani - **Lead developer**
* Enabled usage of quaternions in place of angles for orientations in 2D PMFT calculations.
* Wrote new freud 2.0 compute APIs based on neighbor\_query objects and neighbors as either dictionaries or NeighborLists.
* Rewrote MatchEnv code to fit freud 2.0 API, splitting it into 3 separate calculations and rewriting internals using NeighborQuery objects.
* Wrote tutorial and reference sections of documentation.
* Unified util and common packages.

Bradley Dice - **Lead developer**
67 changes: 67 additions & 0 deletions doc/source/datainputs.rst
@@ -0,0 +1,67 @@
.. _datainputs:

=====================================
Reading Simulation Data for **freud**
=====================================

The **freud** package is designed for maximum flexibility by making minimal assumptions about its data.
However, users accustomed to the more restrictive patterns of most other tools may find this flexibility confusing.
In particular, it may not be obvious how to provide data from specific simulation formats.
This page describes how various types of data can be converted into a form suitable for **freud**.

To simplify the examples below, we will assume in all cases that the user wishes to compute a :class:`radial distribution function <freud.density.RDF>` and that the following code has already been run:

.. code-block:: python

    import freud
    rdf = freud.density.RDF(bins=50, rmax=5)

GSD Trajectories
================

Using the GSD Python API, GSD files can be integrated directly with **freud**, as shown in :ref:`gettingstarted`.

.. code-block:: python

    import gsd.hoomd
    traj = gsd.hoomd.open('trajectory.gsd', 'rb')
    for frame in traj:
        rdf.accumulate((frame.configuration.box, frame.particles.position))

XYZ Files
=========

XYZ files are among the simplest data outputs.
As a result, while they are extremely easy to parse, they are also typically lacking in information.
In particular, they usually contain no information about the system box, so this must already be known by the user.
Assuming knowledge of the box used in the simulation, a LAMMPS XYZ file could be used as follows:

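.. code-block:: python

    import numpy as np

    box = freud.box.Box.cube(10)  # placeholder: XYZ files do not store the box
    N = int(np.genfromtxt('trajectory.xyz', max_rows=1))  # first line is N
    traj = np.genfromtxt('trajectory.xyz', skip_header=2,
                         invalid_raise=False)[:, 1:4].reshape(-1, N, 3)
    frame_start = 0  # first frame to include, e.g. after equilibration
    for frame in traj[frame_start:]:
        rdf.accumulate((box, frame))

Note that various readers exist for XYZ files, but because the format is so simple, this example parses the file manually.
The first line of the file holds the number of particles, which we use to reshape the remaining contents of the file into a NumPy array with one entry per frame.
With ``invalid_raise=False``, :func:`numpy.genfromtxt` skips (with a warning) the count and comment lines that begin each later frame, provided they do not have the same number of columns as the particle lines.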
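
Because each frame of an XYZ file repeats its own count and comment lines, an alternative is to stream the file one frame at a time instead of relying on ``invalid_raise`` to discard those lines.
The following is a minimal sketch under stated assumptions, not part of **freud**: the helper ``read_xyz_frames`` is hypothetical, it assumes every frame consists of a count line, a comment line, and then ``N`` lines of ``type x y z``, and it reads from an in-memory ``io.StringIO`` file so that it runs on its own.

.. code-block:: python

    import io

    import numpy as np


    def read_xyz_frames(f):
        """Yield one (N, 3) float32 array of positions per frame of an XYZ stream.

        Hypothetical helper: assumes each frame is a particle-count line, a
        comment line, and then N lines of the form ``type x y z``.
        """
        while True:
            header = f.readline()
            if not header.strip():
                return  # end of file (or a trailing blank line)
            n = int(header)
            f.readline()  # skip the comment line
            rows = [[float(v) for v in f.readline().split()[1:4]] for _ in range(n)]
            yield np.array(rows, dtype=np.float32)


    example = io.StringIO("2\nframe 0\nA 0.0 0.0 0.0\nA 1.0 0.0 0.0\n"
                          "2\nframe 1\nA 0.5 0.0 0.0\nA 1.5 0.0 0.0\n")
    frames = list(read_xyz_frames(example))

For a real file, ``open('trajectory.xyz')`` would take the place of the ``io.StringIO`` object, and each yielded frame could be paired with the known box, e.g. ``rdf.accumulate((box, frame))``.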

DCD Files
=========

DCD files are among the most familiar simulation outputs due to their longevity.
As a result, numerous high-quality DCD readers already exist.
Here, we provide an example using `MDAnalysis <https://www.mdanalysis.org/>`_ to read the data, but we could just as easily make use of another reader such as `MDTraj <http://mdtraj.org/1.6.2/api/generated/mdtraj.load_dcd.html#mdtraj.load_dcd>`_ or `pytraj <https://amber-md.github.io/pytraj/latest/read_and_write.html>`_.

.. code-block:: python

    import MDAnalysis

    reader = MDAnalysis.coordinates.DCD.DCDReader('trajectory.dcd')
    for frame in reader:
        rdf.accumulate((
            freud.box.Box.from_matrix(frame.triclinic_dimensions),
            frame.positions))
2 changes: 2 additions & 0 deletions doc/source/examples.rst
Original file line number Diff line number Diff line change
@@ -1,3 +1,5 @@
.. _examples:

========
Examples
========
Expand Down
59 changes: 59 additions & 0 deletions doc/source/gettingstarted.rst
@@ -0,0 +1,59 @@
.. _gettingstarted:

================
Getting Started
================

Once you have :doc:`installed freud <installation>`, you can start using **freud** with any simulation data that you have on hand.
As an example, we'll assume that you have run a simulation using `HOOMD-blue <http://glotzerlab.engin.umich.edu/hoomd-blue/>`_ and used the :class:`hoomd.dump.gsd` command to output the trajectory into a file ``trajectory.gsd``.
The `GSD file format <https://gsd.readthedocs.io/en/stable/>`_ provides its own convenient Python file reader that offers access to data in the form of NumPy arrays, making it immediately suitable for calculation with **freud**.

We start by opening the trajectory with the GSD reader, which provides the data of each frame as NumPy arrays:

.. code-block:: python

    import gsd.hoomd
    traj = gsd.hoomd.open('trajectory.gsd', 'rb')

We can now immediately calculate important quantities.
Here, we will compute the radial distribution function :math:`g(r)` using the :class:`freud.density.RDF` compute class.
Since the radial distribution function is in practice computed as a histogram, we must specify the number of histogram bins and the largest interparticle distance to include in our calculation.
To do so, we simply instantiate the class with the appropriate parameters and then perform a computation on the given data:

.. code-block:: python

    import freud
    rdf = freud.density.RDF(bins=50, rmax=5)
    rdf.compute((traj[-1].configuration.box, traj[-1].particles.position))
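
To make the histogram description concrete, the following brute-force NumPy sketch shows roughly what an RDF calculation does for a single frame.
It is an illustration only, not **freud**'s implementation: the function ``naive_rdf`` is hypothetical, and it assumes a cubic periodic box centered on the origin, a single particle type, an :math:`O(N^2)` pair search, and ``r_max`` no larger than half the box length.
Each bin's pair count is divided by the count expected for uncorrelated (ideal-gas) particles at the same density.

.. code-block:: python

    import numpy as np


    def naive_rdf(points, L, bins, r_max):
        """Return (bin_centers, g_r) for points in a cubic periodic box of side L."""
        n = len(points)
        # All pairwise displacements, wrapped with the minimum image convention.
        d = points[:, None, :] - points[None, :, :]
        d -= L * np.round(d / L)
        # Keep each unordered pair once (upper triangle, excluding i == j).
        r = np.linalg.norm(d, axis=-1)[np.triu_indices(n, k=1)]
        counts, edges = np.histogram(r, bins=bins, range=(0.0, r_max))
        # Expected number of unique pairs per spherical shell for an ideal gas.
        shell_volumes = 4.0 / 3.0 * np.pi * (edges[1:] ** 3 - edges[:-1] ** 3)
        ideal_counts = 0.5 * n * (n - 1) / L ** 3 * shell_volumes
        centers = 0.5 * (edges[1:] + edges[:-1])
        return centers, counts / ideal_counts


    rng = np.random.default_rng(0)
    L = 10.0
    points = rng.uniform(-L / 2, L / 2, size=(500, 3))  # an ideal-gas-like frame
    r, g = naive_rdf(points, L, bins=50, r_max=5.0)

Because these random points are uncorrelated, ``g`` should fluctuate around 1; for simulation data, ``points`` would be the positions of one frame and ``L`` its box length.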

We can now access the data through properties of the ``rdf`` object; for example, we might plot the data using `Matplotlib <https://matplotlib.org/>`_:

.. code-block:: python

    import matplotlib.pyplot as plt
    fig, ax = plt.subplots()
    ax.plot(rdf.R, rdf.RDF)

You will note that in the above example, we computed :math:`g(r)` using only the final frame of the simulation trajectory.
However, in many cases, radial distributions and other similar quantities may be noisy in simulations due to the natural fluctuations present.
In general, what we are interested in are *time-averaged* quantities once a system has equilibrated.
To perform such a calculation, we can easily modify our original calculation to take advantage of **freud**'s *accumulation* features.
Assuming that you have some method for identifying the frames you wish to include in your sample, our original code snippet would be modified as follows:


.. code-block:: python

    import freud
    rdf = freud.density.RDF(bins=50, rmax=5)
    for frame in traj:
        rdf.accumulate((frame.configuration.box, frame.particles.position))

You can then access the data exactly as we did previously.
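
Conceptually, ``compute`` starts a fresh histogram while ``accumulate`` adds another frame to a running total.
The class below is a hypothetical, pure-Python illustration of that bookkeeping, assuming the accumulated result is reported as a per-frame average; it is not **freud**'s code and does not compute an RDF itself.

.. code-block:: python

    class RunningHistogram:
        """Hypothetical illustration of compute (reset) vs. accumulate (add)."""

        def __init__(self, bins):
            self.bins = bins
            self.reset()

        def reset(self):
            self._totals = [0.0] * self.bins
            self._frames = 0

        def accumulate(self, frame_histogram):
            # Add one frame's histogram to the running totals.
            for i, value in enumerate(frame_histogram):
                self._totals[i] += value
            self._frames += 1
            return self

        def compute(self, frame_histogram):
            # A single-frame calculation discards previously accumulated data.
            self.reset()
            return self.accumulate(frame_histogram)

        @property
        def average(self):
            return [total / self._frames for total in self._totals]


    hist = RunningHistogram(bins=3)
    for frame in ([1, 2, 3], [3, 2, 1]):
        hist.accumulate(frame)
    print(hist.average)  # [2.0, 2.0, 2.0]
    hist.compute([0, 4, 8])
    print(hist.average)  # [0.0, 4.0, 8.0]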

And that's it!
You now know enough to start making use of **freud**.
If you'd like a complete walkthrough, please see the :ref:`tutorial`.
The tutorial walks through many of the core concepts in **freud** in greater detail, starting with the basics of the simulation systems we analyze and describing the details of the neighbor finding logic in **freud**.
To see specific features of **freud** in action, look through the :ref:`examples`.
More detailed documentation on specific classes and functions can be found in the :doc:`API documentation <modules>`.
4,792 changes: 4,792 additions & 0 deletions doc/source/images/PeriodicBoundaryConditions.ai


Binary file added doc/source/images/PeriodicBoundaryConditions.png
11 changes: 11 additions & 0 deletions doc/source/index.rst
@@ -7,9 +7,20 @@ Table of Contents
   :maxdepth: 2
   :caption: Getting Started

   introduction
   installation
   gettingstarted
   tutorial
   examples

.. toctree::
   :maxdepth: 2
   :caption: Reference

   querying
   optimizing
   datainputs

.. toctree::
   :maxdepth: 2
   :caption: API
14 changes: 14 additions & 0 deletions doc/source/introduction.rst
@@ -0,0 +1,14 @@
============
Introduction
============

The **freud** library is a Python package for analyzing particle simulations.
The package is designed to directly use numerical arrays of data, making it easy to use for a wide range of use-cases.
The most common use-case of **freud** is for computing quantities from molecular dynamics simulation trajectories, but it can be used for analyzing any type of particle simulation.
By operating directly on numerical array data, **freud** allows users to parse custom simulation outputs into a suitable structure for input, rather than relying on specific file types or data structures.

The core of **freud** is analysis of periodic systems, which are represented by the :class:`freud.box.Box` class.
The :class:`freud.box.Box` supports arbitrary triclinic systems for maximum flexibility, and is used throughout the package to ensure consistent treatment of these systems.
The package's many methods are encapsulated in various *compute classes*, which perform computations and populate class attributes for access.
Of particular note are the various computations based on nearest neighbor finding in order to characterize particle environments.
Such methods are simplified and accelerated through a centralized neighbor finding interface defined in the :class:`freud.locality.NeighborQuery` family of classes in the :mod:`freud.locality` module.
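
A triclinic box can be described by three side lengths and three tilt factors.
The sketch below assumes the HOOMD-blue convention (``Lx``, ``Ly``, ``Lz``, ``xy``, ``xz``, ``yz``), which :class:`freud.box.Box` also uses, and shows how those parameters define a matrix whose columns are the box vectors and how a displacement can be wrapped back into the box through fractional coordinates.
The helper names are hypothetical, and this simple wrap is not guaranteed to produce the true minimum image for strongly tilted boxes.

.. code-block:: python

    import numpy as np


    def box_matrix(Lx, Ly, Lz, xy=0.0, xz=0.0, yz=0.0):
        """Return a 3x3 matrix whose columns are the box vectors a1, a2, a3."""
        return np.array([[Lx, xy * Ly, xz * Lz],
                         [0.0, Ly, yz * Lz],
                         [0.0, 0.0, Lz]])


    def wrap(vectors, matrix):
        """Wrap displacement vectors (shape (3,) or (n, 3)) back into the box."""
        frac = np.linalg.solve(matrix, np.atleast_2d(vectors).T)  # fractional coords
        frac -= np.round(frac)  # each fractional component now lies in [-0.5, 0.5]
        return (matrix @ frac).T


    M = box_matrix(10.0, 10.0, 10.0, xy=0.5)
    wrapped = wrap(np.array([9.0, 0.0, 0.0]), M)  # close to [[-1.0, 0.0, 0.0]]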
67 changes: 67 additions & 0 deletions doc/source/optimizing.rst
@@ -0,0 +1,67 @@
.. _optimizing:

===========================
Using **freud** Efficiently
===========================

The **freud** library is designed to be both fast and easy-to-use.
In many cases, the library's performance is good enough that users don't need to worry about their usage patterns.
However, in highly performance-critical applications (such as real-time visualization or on-the-fly calculations mid-simulation), users can benefit from knowing the best ways to make use of **freud**.
This page provides some guidance on this topic.

Reusing Locality Information
============================

Perhaps the most powerful method users have at their disposal for speeding up calculations is proper reuse of the data structures in :mod:`freud.locality`.
As one example, consider using **freud** to calculate multiple neighbor-based quantities for the same set of data points.
It is important to recognize that each time such a calculation is performed using a ``(box, points)`` :class:`tuple`, the compute class internally rebuilds a neighbor-finding accelerator such as a :class:`freud.locality.AABBQuery` object and then uses it to find neighbors:

.. code-block:: python

    # Behind the scenes, freud is essentially running
    # freud.locality.AABBQuery(box, points).query(points, dict(r_max=5))
    # and feeding the result to the RDF calculation.
    rdf = freud.density.RDF(bins=50, rmax=5).compute((box, points))

If users anticipate performing many such calculations on the same system of points, they can amortize the cost of rebuilding the :class:`AABBQuery <freud.locality.AABBQuery>` object by constructing it once and then passing it to multiple computes:

.. code-block:: python

    import freud
    import numpy as np

    # Now, let's instead reuse the object for a pair of calculations:
    nq = freud.locality.AABBQuery(box=box, points=points)
    rdf = freud.density.RDF(bins=50, rmax=5).compute(nq)

    nbins = 100
    rmax = 4
    # One identity quaternion per point.
    orientations = np.array([[1, 0, 0, 0]] * len(points))
    pmft = freud.pmft.PMFTXYZ(rmax, rmax, rmax, nbins, nbins, nbins)
    pmft.compute(nq, orientations=orientations)

This reuse can significantly improve performance in, e.g., visualization contexts where users may wish to calculate a :class:`bond order diagram <freud.environment.BondOrder>` and an :class:`RDF <freud.density.RDF>` at each frame, perhaps for integration with a visualization toolkit like `Ovito <http://ovito.org/>`_.
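
The savings come from separating the cost of building a spatial data structure from the cost of querying it.
As a minimal sketch of that build-once, query-many pattern, the class below builds a simple cell list for a cubic periodic box and then answers two radius queries against it.
A cell list is a different accelerator from the bounding-volume tree behind :class:`freud.locality.AABBQuery`; the class and method names here are hypothetical, and the sketch assumes ``r_max`` does not exceed the cell width.

.. code-block:: python

    import itertools
    from collections import defaultdict

    import numpy as np


    class CellList:
        """Hypothetical cell list for a cubic periodic box of side L centered on the origin."""

        def __init__(self, points, L, cell_width):
            self.points = np.asarray(points, dtype=float)
            self.L = L
            # Number of cells per side; each cell is at least cell_width wide.
            self.n = max(1, int(L // cell_width))
            # Built once: maps a cell index (ix, iy, iz) to the indices of its points.
            self.cells = defaultdict(list)
            for i, cell in enumerate(self._cell_of(self.points)):
                self.cells[tuple(cell)].append(i)

        def _cell_of(self, pts):
            frac = (np.asarray(pts, dtype=float) + self.L / 2) / self.L  # into [0, 1)
            return np.floor(frac * self.n).astype(int) % self.n

        def query(self, query_points, r_max):
            """Return sorted (query_index, point_index) pairs closer than r_max.

            Assumes r_max <= L / n and r_max < L / 2, so the 27 cells around each
            query point contain every neighbor. A query point that coincides with
            a stored point is paired with it (distance 0).
            """
            query_points = np.asarray(query_points, dtype=float)
            pairs = []
            for q, (pos, cell) in enumerate(zip(query_points, self._cell_of(query_points))):
                # A set avoids visiting a cell twice when there are < 3 cells per side.
                stencil = {tuple((cell + offset) % self.n)
                           for offset in itertools.product((-1, 0, 1), repeat=3)}
                for c in stencil:
                    for i in self.cells.get(c, ()):
                        d = self.points[i] - pos
                        d -= self.L * np.round(d / self.L)  # minimum image
                        if np.dot(d, d) < r_max ** 2:
                            pairs.append((q, i))
            return sorted(pairs)


    rng = np.random.default_rng(1)
    L = 10.0
    points = rng.uniform(-L / 2, L / 2, size=(200, 3))
    cells = CellList(points, L, cell_width=2.0)  # pay the construction cost once...
    near = cells.query(points, r_max=1.0)        # ...then reuse it for several queries
    far = cells.query(points, r_max=2.0)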

A slightly different use-case would be the calculation of multiple quantities based on *exactly the same set of neighbors*.
If the user in fact expects to perform computations with the exact same pairs of neighbors (for example, to compute :py:class:`freud.order.Steinhardt` for multiple :math:`l` values), then the user can further speed up the calculation by precomputing the entire :py:class:`freud.locality.NeighborList` and storing it for future use.

.. code-block:: python

    r_max = 3
    nq = freud.locality.AABBQuery(box=box, points=points)
    nlist = nq.query(points, dict(r_max=r_max))
    ql_arrays = []
    for l in range(3, 6):
        ql = freud.order.Steinhardt(l=l)
        ql_arrays.append(ql.compute((box, points), neighbors=nlist).order)

Notably, if the user calls a compute method with ``compute(neighbor_query=(box, points), neighbors=nlist)``, then, unlike in the examples above, **freud** **will not construct** a :py:class:`freud.locality.NeighborQuery` internally, because the full set of neighbors is completely specified by the :class:`NeighborList <freud.locality.NeighborList>`.
In all these cases, **freud** does the minimal work possible to find neighbors, so judicious use of these data structures can substantially accelerate your code.
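
When several quantities share exactly the same neighbors, the pair search can be done once and each later calculation becomes a cheap pass over stored arrays.
The NumPy sketch below stores the pairs as parallel index and distance arrays (a layout chosen for illustration, loosely analogous to a :class:`NeighborList <freud.locality.NeighborList>`) and derives two per-particle quantities from them without searching again.
The function ``find_pairs`` is hypothetical and uses a brute-force search in a cubic periodic box.

.. code-block:: python

    import numpy as np


    def find_pairs(points, L, r_max):
        """Return (i, j, distance) arrays for all i != j pairs closer than r_max.

        Brute-force search in a cubic periodic box of side L; this is the
        expensive step that is done only once.
        """
        d = points[None, :, :] - points[:, None, :]
        d -= L * np.round(d / L)  # minimum image convention
        r = np.linalg.norm(d, axis=-1)
        i, j = np.nonzero((r < r_max) & ~np.eye(len(points), dtype=bool))
        return i, j, r[i, j]


    points = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0],
                       [0.0, 1.5, 0.0], [4.0, 0.0, 0.0]])
    i, j, dist = find_pairs(points, L=10.0, r_max=2.0)

    # Both quantities reuse the stored pairs; no second neighbor search is needed.
    coordination = np.bincount(i, minlength=len(points))
    mean_distance = (np.bincount(i, weights=dist, minlength=len(points))
                     / np.maximum(coordination, 1))

In the **freud** example above, the stored pairs play the role of ``nlist``, and the per-:math:`l` Steinhardt calculations play the role of these two reductions.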

Proper Data Inputs
==================

Minor speedups may also be gained from passing properly structured data to **freud**.
The package was originally designed for analyzing particle simulation trajectories, which are typically stored in single-precision binary formats.
As a result, the **freud** library also operates in single precision and therefore converts all inputs to single precision.
However, NumPy will typically work in double precision by default, so depending on how data is streamed to **freud**, the package may be performing numerous data copies in order to ensure that all its data is in single-precision.
To avoid this problem, make sure to specify the appropriate data types (`numpy.float32 <https://docs.scipy.org/doc/numpy/user/basics.types.html>`_) when constructing your NumPy arrays.
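
The copies in question are ordinary NumPy dtype conversions.
The snippet below illustrates the general mechanism with plain NumPy, assuming (for illustration) that an input conversion behaves like ``np.asarray(..., dtype=np.float32)``: converting a ``float64`` array allocates a new array, whereas a ``float32`` array is passed through unchanged.

.. code-block:: python

    import numpy as np

    rng = np.random.default_rng(0)
    positions64 = rng.uniform(-5.0, 5.0, size=(1000, 3))  # NumPy defaults to float64
    positions32 = positions64.astype(np.float32)  # convert once, up front

    # Converting float64 data allocates a new array each time it happens...
    converted = np.asarray(positions64, dtype=np.float32)
    print(np.shares_memory(converted, positions64))  # False

    # ...whereas data that is already float32 is reused without copying.
    passed_through = np.asarray(positions32, dtype=np.float32)
    print(passed_through is positions32)  # True

Creating arrays as ``float32`` from the start, e.g. ``np.zeros((N, 3), dtype=np.float32)``, avoids paying that conversion cost on every call.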
