Merge pull request #348 from glotzerlab/docs/add_tutorial

Docs/add tutorial
glotzerlab · Oct 2, 2019 · 0e0e3b8 · 0e0e3b8
2 parents cb6cc69 + a89e839
commit 0e0e3b8
Showing 17 changed files with 5,611 additions and 4 deletions.
diff --git a/ChangeLog.md b/ChangeLog.md
@@ -19,6 +19,7 @@ and this project adheres to
 * C++ BondHistogramCompute class encapsulates logic of histogram-based methods.
 * NeighborLists and query arguments are now accepted on equal footing by compute methods that involve neighbor finding.
 * 2D PMFTs accept quaternions as well as angles for their orientations.
+* Extensive new documentation including tutorial for new users and reference sections on crucial topics.
 * Added NeighborQuery support to ClusterProperties, GaussianDensity, Voronoi.
 
 ### Changed

diff --git a/doc/source/conf.py b/doc/source/conf.py
@@ -297,10 +297,12 @@
 # If true, do not generate a @detailmenu in the "Top" node's menu.
 # texinfo_no_detailmenu = False
 
-intersphinx_mapping = {'python': ('https://docs.python.org/3', None),
-                       'numpy': ('https://docs.scipy.org/doc/numpy', None),
-                       'matplotlib': ('https://matplotlib.org', None),
-                       }
+intersphinx_mapping = {
+    'python': ('https://docs.python.org/3', None),
+    'numpy': ('https://docs.scipy.org/doc/numpy', None),
+    'matplotlib': ('https://matplotlib.org', None),
+    'hoomd': ('https://hoomd-blue.readthedocs.io/en/stable/', None),
+}
 
 autodoc_mock_import = ["numpy"]
 

diff --git a/doc/source/credits.rst b/doc/source/credits.rst
@@ -92,6 +92,7 @@ Vyas Ramasubramani - **Lead developer**
 * Enabled usage of quaternions in place of angles for orientations in 2D PMFT calculations.
 * Wrote new freud 2.0 compute APIs based on neighbor\_query objects and neighbors as either dictionaries or NeighborLists.
 * Rewrote MatchEnv code to fit freud 2.0 API, splitting it into 3 separate calculations and rewriting internals using NeighborQuery objects.
+* Wrote tutorial and reference sections of documentation.
 * Unified util and common packages.
 
 Bradley Dice - **Lead developer**

diff --git a/doc/source/datainputs.rst b/doc/source/datainputs.rst
@@ -0,0 +1,67 @@
+.. _datainputs:
+
+=====================================
+Reading Simulation Data for **freud**
+=====================================
+
+The **freud** package is designed for maximum flexibility by making minimal assumptions about its data.
+However, users accustomed to the more restrictive patterns of most other tools may find this flexibility confusing.
+In particular, it may not be obvious how to provide data from a specific simulation source.
+This page describes how various types of data may be converted into a form suitable for **freud**.
+
+To simplify the examples below, we will assume in all cases that the user wishes to compute a :class:`radial distribution function <freud.density.RDF>` and that the following code has already been run:
+
+.. code-block:: python
+
+    import freud
+    rdf = freud.density.RDF(bins=50, rmax=5)
+
+
+GSD Trajectories
+================
+
+Using the GSD Python API, GSD files can be easily integrated with **freud**, as shown in :ref:`gettingstarted`:
+
+.. code-block:: python
+
+    import gsd.hoomd
+    traj = gsd.hoomd.open('trajectory.gsd', 'rb')
+
+    for frame in traj:
+        rdf.accumulate((frame.configuration.box, frame.particles.position))
+
+
+XYZ Files
+=========
+
+XYZ files are among the simplest data outputs.
+As a result, while they are extremely easy to parse, they are also typically lacking in information.
+In particular, they usually contain no information about the system box, so this must already be known by the user.
+Assuming knowledge of the box used in the simulation, a LAMMPS XYZ file could be used as follows:
+
+.. code-block:: python
+
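+    import numpy as np
+
+    # The first line gives the number of particles per frame.
+    N = int(np.genfromtxt('trajectory.xyz', max_rows=1))
+    traj = np.genfromtxt(
+        'trajectory.xyz', skip_header=2,
+        invalid_raise=False)[:, 1:4].reshape(-1, N, 3)
+
+    # XYZ files carry no box information; L=20 is a placeholder value.
+    box = freud.box.Box.cube(L=20)
+    for frame in traj:
+        rdf.accumulate((box, frame))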
+
+Note that various readers exist for XYZ files, but because the format is so simple we read the file manually in this example.
+The first line is the number of particles, so we read it first and then use it to reshape the contents of the rest of the file into a NumPy array.
+
+DCD Files
+=========
+
+DCD files are among the most familiar simulation outputs due to their longevity.
+As a result, numerous high-quality DCD readers also already exist.
+Here, we provide an example using `MDAnalysis <https://www.mdanalysis.org/>`_ to read the data, but we could just as easily make use of another reader such as `MDTraj <http://mdtraj.org/1.6.2/api/generated/mdtraj.load_dcd.html#mdtraj.load_dcd>`_ or `pytraj <https://amber-md.github.io/pytraj/latest/read_and_write.html>`_.
+
+.. code-block:: python
+
+    import MDAnalysis
+    reader = MDAnalysis.coordinates.DCD.DCDReader('trajectory.dcd')
+    for frame in reader:
+        rdf.accumulate((
+            freud.box.Box.from_matrix(frame.triclinic_dimensions),
+            frame.positions))
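The manual XYZ reading described in datainputs.rst can also be sketched without any third-party reader. The following is a minimal, standard-library-only illustration, not part of this pull request: the helper name `read_xyz_frames` is hypothetical, and it assumes the conventional XYZ layout (a particle-count line, a comment line, then one `type x y z` line per particle, repeated for each frame).

```python
import io


def read_xyz_frames(handle):
    """Parse a multi-frame XYZ stream into a list of frames.

    Each frame is a list of (x, y, z) float tuples. Assumes the standard
    XYZ layout: a particle-count line, a comment line, then one
    "type x y z" line per particle.
    """
    frames = []
    while True:
        header = handle.readline()
        if not header.strip():
            break  # end of file (or a trailing blank line)
        num_particles = int(header)
        handle.readline()  # skip the comment line
        frame = []
        for _ in range(num_particles):
            fields = handle.readline().split()
            # Column 0 is the particle type; columns 1-3 are x, y, z.
            frame.append(tuple(float(v) for v in fields[1:4]))
        frames.append(frame)
    return frames


# Tiny two-frame example held in memory for illustration.
example = io.StringIO(
    "2\nframe 0\nA 0.0 0.0 0.0\nA 1.0 0.5 0.25\n"
    "2\nframe 1\nA 0.1 0.0 0.0\nA 1.1 0.5 0.25\n"
)
frames = read_xyz_frames(example)
print(len(frames), len(frames[0]), frames[1][0])
```

Each parsed frame could then be converted to a NumPy array and paired with a known box before being handed to a compute class, since the XYZ file itself carries no box information.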
diff --git a/doc/source/examples.rst b/doc/source/examples.rst
@@ -1,3 +1,5 @@
+.. _examples:
+
 ========
 Examples
 ========

diff --git a/doc/source/gettingstarted.rst b/doc/source/gettingstarted.rst
@@ -0,0 +1,59 @@
+.. _gettingstarted:
+
+================
+Getting Started
+================
+
+Once you have `installed freud <installation.rst>`_, you can start using **freud** with any simulation data that you have on hand.
+As an example, we'll assume that you have run a simulation using `HOOMD-blue <http://glotzerlab.engin.umich.edu/hoomd-blue/>`_ and used the :class:`hoomd.dump.gsd` command to output the trajectory into a file ``trajectory.gsd``.
+The `GSD file format <https://gsd.readthedocs.io/en/stable/>`_ provides its own convenient Python file reader that offers access to data in the form of NumPy arrays, making it immediately suitable for calculation with **freud**.
+
+We start by opening the trajectory file, whose frames expose their data as NumPy arrays:
+
+.. code-block:: python
+
+    import gsd.hoomd
+    traj = gsd.hoomd.open('trajectory.gsd', 'rb')
+
+
+We can now immediately calculate important quantities.
+Here, we will compute the radial distribution function :math:`g(r)` using the :class:`freud.density.RDF` compute class.
+Since the radial distribution function is in practice computed as a histogram, we must specify the number of histogram bins and the largest interparticle distance to include in our calculation.
+To do so, we simply instantiate the class with the appropriate parameters and then perform a computation on the given data:
+
+.. code-block:: python
+
+    import freud
+    rdf = freud.density.RDF(bins=50, rmax=5)
+    rdf.compute((traj[-1].configuration.box, traj[-1].particles.position))
+
+We can now access the data through properties of the ``rdf`` object; for example, we might plot the data using `Matplotlib <https://matplotlib.org/>`_:
+
+.. code-block:: python
+
+    import matplotlib.pyplot as plt
+    fig, ax = plt.subplots()
+    ax.plot(rdf.R, rdf.RDF)
+
+You will note that in the above example, we computed :math:`g(r)` only using the final frame of the simulation trajectory.
+However, in many cases, radial distributions and other similar quantities may be noisy in simulations due to the natural fluctuations present.
+In general, what we are interested in are *time-averaged* quantities once a system has equilibrated.
+To perform such a calculation, we can easily modify our original calculation to take advantage of **freud**'s *accumulation* features.
+Assuming that you have some method for identifying the frames you wish to include in your sample, our original code snippet would be modified as follows:
+
+
+.. code-block:: python
+
+    import freud
+    rdf = freud.density.RDF(bins=50, rmax=5)
+    for frame in traj:
+        rdf.accumulate((frame.configuration.box, frame.particles.position))
+
+You can then access the data exactly as we previously did.
+
+And that's it!
+You now know enough to start making use of **freud**.
+If you'd like a complete walkthrough, please look at the :ref:`tutorial`.
+The tutorial walks through many of the core concepts in **freud** in greater detail, starting with the basics of the simulation systems we analyze and describing the details of the neighbor finding logic in **freud**.
+To see specific features of **freud** in action, look through the :ref:`examples`.
+More detailed documentation on specific classes and functions can be found in the `API documentation <modules>`_.
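To make the histogram and time-averaging ideas in gettingstarted.rst concrete, here is a minimal pure-Python sketch of a histogram-based g(r). It is an illustration only and is not how **freud** implements `freud.density.RDF`: the function name `rdf_histogram` is hypothetical, and the sketch assumes a cubic periodic box, a brute-force pair loop, `r_max` below half the box length, and a fixed particle count and box across frames so that per-frame results can simply be averaged.

```python
import math
import random


def rdf_histogram(positions, box_length, bins, r_max):
    """Brute-force g(r) for one frame in a cubic periodic box.

    Illustrative only: O(N^2) pair loop, cubic box, r_max < box_length / 2.
    """
    n = len(positions)
    dr = r_max / bins
    counts = [0] * bins
    for i in range(n):
        for j in range(i + 1, n):
            d2 = 0.0
            for a, b in zip(positions[i], positions[j]):
                d = a - b
                d -= box_length * round(d / box_length)  # minimum image
                d2 += d * d
            r = math.sqrt(d2)
            if r < r_max:
                # Each unordered pair contributes to both particles.
                counts[min(int(r / dr), bins - 1)] += 2
    density = n / box_length ** 3
    g = []
    for k, count in enumerate(counts):
        # Volume of the spherical shell covered by bin k.
        shell = 4.0 / 3.0 * math.pi * (((k + 1) * dr) ** 3 - (k * dr) ** 3)
        g.append(count / (n * density * shell))
    centers = [(k + 0.5) * dr for k in range(bins)]
    return centers, g


# Time average over several frames of an ideal gas (uniform random points),
# for which g(r) should fluctuate around 1.
random.seed(0)
box_length, n_particles, n_frames = 10.0, 200, 3
bins, r_max = 8, 4.0
g_sum = [0.0] * bins
for _ in range(n_frames):
    frame = [[random.uniform(0, box_length) for _ in range(3)]
             for _ in range(n_particles)]
    centers, g = rdf_histogram(frame, box_length, bins, r_max)
    g_sum = [total + value for total, value in zip(g_sum, g)]
g_avg = [total / n_frames for total in g_sum]
print([round(value, 2) for value in g_avg])
```

Averaging over frames reduces the statistical noise in each bin, which is the motivation for the accumulation pattern described in the guide; the bins at small r remain the noisiest because they contain the fewest pairs.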
diff --git a/doc/source/images/PeriodicBoundaryConditions.ai b/doc/source/images/PeriodicBoundaryConditions.ai
diff --git a/doc/source/images/PeriodicBoundaryConditions.png b/doc/source/images/PeriodicBoundaryConditions.png
diff --git a/doc/source/index.rst b/doc/source/index.rst
@@ -7,9 +7,20 @@ Table of Contents
    :maxdepth: 2
    :caption: Getting Started
 
+   introduction
    installation
+   gettingstarted
+   tutorial
    examples
 
+.. toctree::
+   :maxdepth: 2
+   :caption: Reference
+
+   querying
+   optimizing
+   datainputs
+
 .. toctree::
    :maxdepth: 2
    :caption: API

diff --git a/doc/source/introduction.rst b/doc/source/introduction.rst
@@ -0,0 +1,14 @@
+============
+Introduction
+============
+
+The **freud** library is a Python package for analyzing particle simulations.
+The package is designed to directly use numerical arrays of data, making it easy to use for a wide range of use-cases.
+The most common use-case of **freud** is for computing quantities from molecular dynamics simulation trajectories, but it can be used for analyzing any type of particle simulation.
+By operating directly on numerical array data, **freud** allows users to parse custom simulation outputs into a suitable structure for input, rather than relying on specific file types or data structures.
+
+The core of **freud** is analysis of periodic systems, which are represented through the freud :class:`freud.box.Box` class.
+The :class:`freud.box.Box` supports arbitrary triclinic systems for maximum flexibility, and is used throughout the package to ensure consistent treatment of these systems.
+The package's many methods are encapsulated in various *compute classes*, which perform computations and populate class attributes for access.
+Of particular note are the various computations based on nearest neighbor finding in order to characterize particle environments.
+Such methods are simplified and accelerated through a centralized neighbor finding interface defined in the :class:`freud.locality.NeighborQuery` family of classes in the :mod:`freud.locality` module of freud.
diff --git a/doc/source/optimizing.rst b/doc/source/optimizing.rst
@@ -0,0 +1,67 @@
+.. _optimizing:
+
+===========================
+Using **freud** Efficiently
+===========================
+
+The **freud** library is designed to be both fast and easy to use.
+In many cases, the library's performance is good enough that users don't need to worry about their usage patterns.
+However, in highly performance-critical applications (such as real-time visualization or on-the-fly calculations mid-simulation), users can benefit from knowing the best ways to make use of **freud**.
+This page provides some guidance on this topic.
+
+Reusing Locality Information
+============================
+
+Perhaps the most powerful method users have at their disposal for speeding up calculations is proper reuse of the data structures in :mod:`freud.locality`.
+As one example, consider using **freud** to calculate multiple neighbor-based quantities for the same set of data points.
+It is important to recognize that each time such a calculation is performed using a ``(box, points)`` :class:`tuple`, the compute class internally rebuilds a neighbor-finding accelerator such as a :class:`freud.locality.AABBQuery` object and then uses it to find neighbors:
+
+.. code-block:: python
+
+    # Behind the scenes, freud is essentially running
+    # freud.locality.AABBQuery(box, points).query(points, dict(r_max=5))
+    # and feeding the result to the RDF calculation.
+    rdf = freud.density.RDF(bins=50, rmax=5).compute((box, points))
+
+
+If users anticipate performing many such calculations on the same system of points, they can amortize the cost of rebuilding the :class:`AABBQuery <freud.locality.AABBQuery>` object by constructing it once and then passing it to multiple computes:
+
+.. code-block:: python
+
+    # Now, let's instead reuse the object for a pair of calculations:
+    nq = freud.locality.AABBQuery(box=box, points=points)
+    rdf = freud.density.RDF(bins=50, rmax=5).compute(nq)
+
+    nbins = 100
+    rmax = 4
+    orientations = np.array([[1, 0, 0, 0]] * len(points))
+    pmft = freud.pmft.PMFTXYZ(rmax, rmax, rmax, nbins, nbins, nbins)
+    pmft.compute(nq, orientations=orientations)
+
+This reuse can significantly improve performance in e.g. visualization contexts where users may wish to calculate a :class:`bond order diagram <freud.environment.BondOrder>` and an :class:`RDF <freud.density.RDF>` at each frame, perhaps for integration with a visualization toolkit like `Ovito <http://ovito.org/>`_.
+
+A slightly different use-case would be the calculation of multiple quantities based on *exactly the same set of neighbors*.
+If the user in fact expects to perform computations with the exact same pairs of neighbors (for example, to compute :py:class:`freud.order.Steinhardt` for multiple :math:`l` values), then the user can further speed up the calculation by precomputing the entire :py:class:`freud.locality.NeighborList` and storing it for future use.
+
+.. code-block:: python
+
+    r_max = 3
+    nq = freud.locality.AABBQuery(box=box, points=points)
+    nlist = nq.query(points, dict(r_max=r_max)).toNeighborList()
+    q6_arrays = []
+    for l in range(3, 6):
+        ql = freud.order.Steinhardt(l=l)
+        q6_arrays.append(ql.compute((box, points), neighbors=nlist).order)
+
+
+Notably, when a compute method is called with a ``(box, points)`` tuple *and* a precomputed ``neighbors=nlist`` argument, as in the example above, **freud** **will not construct** a :py:class:`freud.locality.NeighborQuery` internally because the full set of neighbors is completely specified by the :class:`NeighborList <freud.locality.NeighborList>`.
+In all these cases, **freud** does the minimal work possible to find neighbors, so judicious use of these data structures can substantially accelerate your code.
+
+Proper Data Inputs
+==================
+
+Minor speedups may also be gained from passing properly structured data to **freud**.
+The package was originally designed for analyzing particle simulation trajectories, which are typically stored in single-precision binary formats.
+As a result, the **freud** library also operates in single precision and therefore converts all inputs to single-precision.
+However, NumPy will typically work in double precision by default, so depending on how data is streamed to **freud**, the package may be performing numerous data copies in order to ensure that all its data is in single-precision.
+To avoid this problem, make sure to specify the appropriate data types (`numpy.float32 <https://docs.scipy.org/doc/numpy/user/basics.types.html>`_) when constructing your NumPy arrays.
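The cost of a precision mismatch can be seen with NumPy alone. The sketch below does not call **freud**. It uses `np.asarray(..., dtype=np.float32)` as a stand-in for any conversion to single precision, and the array names are illustrative. Double-precision input forces a converted copy, whereas input that is already `float32` is passed through untouched.

```python
import numpy as np

# Positions as NumPy creates them by default: double precision.
points64 = np.random.default_rng(0).random((1000, 3))

# The same data stored in single precision from the start.
points32 = points64.astype(np.float32)

# np.asarray returns its input unchanged when the dtype already matches,
# but has to allocate a converted copy when it does not.
converted_from_64 = np.asarray(points64, dtype=np.float32)
converted_from_32 = np.asarray(points32, dtype=np.float32)

print(np.shares_memory(converted_from_64, points64))  # False: a copy was made
print(converted_from_32 is points32)                  # True: no copy needed
```

Creating arrays as `float32` once, when the data is first loaded, avoids repeating such a conversion on every compute call.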