Paper: Analyzing Particle Systems for Machine Learning and Data Visualization with freud #471

Merged
merged 34 commits into from Jul 3, 2019
Commits
ef81be9
Added freud paper.
bdice May 17, 2019
e22e89a
Update 36_bradley_dice.rst
vyasr May 22, 2019
40ae61b
Remove newlines.
bdice May 22, 2019
c431562
Merge pull request #1 from bdice/vyas_intro_rewrite
bdice May 22, 2019
98fb5aa
Add citation for UMAP.
bdice May 24, 2019
d1b450d
Added link to GSD, removed TODO, fixed word.
bdice May 30, 2019
efc4446
Add example reference to ML techniques in Harper 2019.
bdice May 30, 2019
4a2a5d7
Minor text edits.
bdice Jun 11, 2019
2d056ff
Initial revision of paper intro
vyasr Jun 13, 2019
af919a0
Merge pipeline concept into a more cohesive intro
vyasr Jun 14, 2019
dc07b14
Add citations.
bdice Jun 14, 2019
bc669a6
Add citation.
bdice Jun 14, 2019
1c5bdcd
Reword sentence.
bdice Jun 14, 2019
11e5c5e
Add fresnel example and figure, update visualization section.
bdice Jun 17, 2019
4fb7088
Remove URLs, prefer DOIs.
bdice Jun 17, 2019
21465a8
Revisions of ML section.
bdice Jun 17, 2019
e42dd1c
Update conclusions, abstract, benchmarks
vyasr Jun 19, 2019
15011f4
Add brief discussion of data input format
vyasr Jun 19, 2019
455e2b5
Clarify fcc data.
bdice Jun 19, 2019
a5e24c0
Merge pull request #4 from bdice/reviewer_comments
bdice Jun 19, 2019
f2f71b9
Additional code comments.
bdice Jun 19, 2019
902f95c
Merge pull request #5 from bdice/reviewer_comments
bdice Jun 19, 2019
a0d2968
Use constant density benchmark, clarify input data.
bdice Jun 21, 2019
a73c579
Update citations.
bdice Jun 21, 2019
5f92849
Merge pull request #6 from bdice/reviewer_comments2
bdice Jun 21, 2019
1b8cec4
Minor wording changes.
bdice Jun 22, 2019
56a5535
Merge branch 'bradley_dice' of https://github.com/bdice/scipy_proceed…
bdice Jun 22, 2019
2a65144
Add data citation.
bdice Jun 25, 2019
45963b6
Update acknowledgments.
bdice Jun 25, 2019
9b6745e
Fixed Harper et al. and math format typo.
bdice Jun 26, 2019
6d35b30
Describe data inputs from binary and text-based files.
bdice Jun 26, 2019
9b06c4b
Change introduction of freud's central use case.
bdice Jun 26, 2019
5809fc0
Rename packages, references.
bdice Jun 28, 2019
fb2edca
Clarify reading from simulation engine output files.
bdice Jun 28, 2019
Minor text edits.

bdice committed Jun 11, 2019
commit 4a2a5d7990c8f67074b851b48979d690021bc97a
@@ -56,19 +56,19 @@ Introduction
These features contrast the assumptions of most analysis tools designed for biomolecular simulations and materials science.
:label:`fig:scales`

-With the popularity of "off-the-shelf" molecular dynamics engines capable of running parameterized simulations, it is now simpler than ever to simulate complex systems ranging from large biomolecules and coarse-grained models to reconfigurable materials and colloidal self-assembly.
+With the availability of "off-the-shelf" molecular dynamics engines capable of running parameterized simulations, it is now possible to simulate complex systems ranging from large biomolecules and coarse-grained models to reconfigurable materials and colloidal self-assembly.
Various tools have arisen to facilitate the analysis of these simulations, many of which are immediately interoperable with the most popular simulation tools.
-The ``freud`` library differentiates itself from other molecular dynamics analysis packages through its focus on colloidal and nano-scale systems.
-Due to their immense diversity and adaptability, colloidal materials are a powerful model system for exploring soft matter physics as well as a viable platform for harnessing photonic :cite:`Cersonsky2018a`, plasmonic :cite:`Tan2011BuildingDNA`, and other useful structurally-derived properties.
+The ``freud`` library differentiates itself from other analysis packages through its focus on colloidal and nano-scale systems.
+Due to their diversity and adaptability, colloidal materials are a powerful model system for exploring soft matter physics as well as a viable platform for harnessing photonic :cite:`Cersonsky2018a`, plasmonic :cite:`Tan2011BuildingDNA`, and other useful structurally-derived properties.

In colloidal systems, features like particle anisotropy play an important role in creating complex crystal structures, some of which have no atomic analogues :cite:`Damasceno2012`.
Design spaces encompassing wide ranges of particle morphology :cite:`Damasceno2012` and interparticle interactions :cite:`Adorf2018` have been studied, yielding phase diagrams filled with complex behavior.
The ``freud`` library is targeted towards studying such systems, providing a unique feature set that is tailored to capturing the important properties that characterize colloidal systems.
For example, the multi-dimensional Potential of Mean Force and Torque allows users to understand the effects of particle anisotropy on entropic self-assembly :cite:`VanAnders2014c,VanAnders2014d,Karas2016,Harper2015,Anderson2017`.
Additionally, ``freud`` has tools for identifying and clustering particles by their local crystal environments :cite:`Teich2019`.
-The ``freud`` library's extraordinary scalability is exemplified by its use in computing correlation functions on systems of over a million particles, calculations that were used to elucidate the elusive hexatic phase transition in two-dimensional systems of hard polygons :cite:`Anderson2017`.
+The ``freud`` library's scalability is exemplified by its use in computing correlation functions on systems of over a million particles, calculations that were used to elucidate the elusive hexatic phase transition in two-dimensional systems of hard polygons :cite:`Anderson2017`.

-The outputs of molecular simulations are usually stored as a file of particle positions, with some metadata like particle types.
+The outputs of molecular simulations are usually stored as a file of particle positions, with some metadata like particle types, periodic box dimensions, and bond topologies.
However, these outputs are typically not immediately useful.
Physical invariants of a system such as translational or rotational invariance are difficult to learn from raw arrays of particle positions, making machine learning libraries hard to apply for tasks such as classification or regression.
Data visualizations, on the other hand, rely on position arrays for drawing particles but frequently must be coupled with analysis tools in order to provide interpretable views of the system that allow researchers to identify regions, e.g. defects and well-ordered domains, of self-assembled structures.
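As a minimal illustration of the invariance problem described above (this is not code from the paper or from ``freud``), the sketch below rotates and translates a small point set. The raw position array changes, while a simple descriptor built from sorted pairwise distances does not. The helper name ``sorted_pair_distances`` is hypothetical, and periodic boundaries are ignored for simplicity.

.. code-block:: python

    import numpy as np


    def sorted_pair_distances(points):
        """Return all pairwise distances of a point set, sorted ascending."""
        diffs = points[:, None, :] - points[None, :, :]
        dists = np.linalg.norm(diffs, axis=-1)
        upper = np.triu_indices(len(points), k=1)  # each pair counted once
        return np.sort(dists[upper])


    rng = np.random.default_rng(0)
    points = rng.random((8, 3))

    # Rigid motion: rotate about the z axis, then translate.
    theta = 0.7
    rotation = np.array([
        [np.cos(theta), -np.sin(theta), 0.0],
        [np.sin(theta), np.cos(theta), 0.0],
        [0.0, 0.0, 1.0],
    ])
    moved = points @ rotation.T + np.array([1.0, -2.0, 0.5])

    # The raw coordinates differ, but the distance-based descriptor agrees.
    raw_changed = not np.allclose(points, moved)
    descriptor_unchanged = np.allclose(
        sorted_pair_distances(points), sorted_pair_distances(moved)
    )

A model trained on ``points`` directly would have to learn that ``moved`` describes the same structure, whereas a model trained on an invariant descriptor gets that property for free.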
@@ -112,13 +112,13 @@ The ``scipy`` package is one such example, where ``freud`` wraps ``scipy``'s beh
Enforcing periodicity with triclinic boxes where the sides are tilted (and thus not orthogonal to one another) can be tricky, necessitating ``freud``'s implementation for determining Voronoi tessellations in both 2D and 3D periodic systems.

Similarly, the mean-squared displacement module (``freud.msd``) utilizes fast Fourier transforms from ``numpy`` or ``scipy`` to accelerate its computations.
-The resulting MSD data help to identify how particles' dynamics change over time, e.g. from ballistic to diffusive as systems solidify.
+The resulting MSD data help to identify how particles' dynamics change over time, e.g. from ballistic to diffusive.
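To make the role of the FFT concrete, here is a hedged sketch of a windowed MSD for a single particle's unwrapped trajectory, using only ``numpy``. It is not the ``freud.msd`` implementation. It uses the standard identity that the MSD at a given lag equals an average of squared positions minus twice the position autocorrelation, where the autocorrelation is obtained from a zero-padded FFT. The function names ``msd_fft`` and ``msd_direct`` are hypothetical.

.. code-block:: python

    import numpy as np


    def msd_direct(positions):
        """Reference O(N^2) windowed MSD for one trajectory of shape (N, d)."""
        n_frames = len(positions)
        msd = np.zeros(n_frames)
        for lag in range(1, n_frames):
            displacements = positions[lag:] - positions[:-lag]
            msd[lag] = np.mean(np.sum(displacements**2, axis=1))
        return msd


    def msd_fft(positions):
        """Windowed MSD in O(N log N) using an FFT-based autocorrelation."""
        n_frames = len(positions)
        lags = np.arange(n_frames)
        counts = n_frames - lags  # number of time windows per lag

        # Autocorrelation sum_k r(k) . r(k + lag); zero padding to 2N
        # prevents the circular FFT from wrapping around.
        transformed = np.fft.fft(positions, n=2 * n_frames, axis=0)
        power = np.sum(np.abs(transformed) ** 2, axis=1)
        correlation = np.fft.ifft(power)[:n_frames].real / counts

        # Average of |r(k)|^2 + |r(k + lag)|^2 over windows, via prefix sums.
        squared = np.sum(positions**2, axis=1)
        prefix = np.concatenate(([0.0], np.cumsum(squared)))
        squared_terms = (
            prefix[n_frames - lags] + prefix[-1] - prefix[lags]
        ) / counts

        return squared_terms - 2.0 * correlation


    # Random walk as a stand-in for a diffusive particle trajectory.
    rng = np.random.default_rng(42)
    trajectory = np.cumsum(rng.normal(size=(200, 3)), axis=0)
    msd_reference = msd_direct(trajectory)
    msd_fast = msd_fft(trajectory)

Both functions return the same curve up to floating-point error. For a system of many particles, the per-particle results would additionally be averaged.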

Machine Learning
----------------

A common challenge in molecular sciences is identifying crystal structures.
-Recently, several approaches have been developed that use machine learning for detecting phases :cite:`Schoenholz2015,Spellings2018,Fulford2019,Steinhardt1983,Lechner2008`.
+Recently, several approaches have been developed that use machine learning for detecting ordered phases :cite:`Schoenholz2015,Spellings2018,Fulford2019,Steinhardt1983,Lechner2008`.
The Steinhardt order parameters are often used as a structural fingerprint, and are derived from rotationally invariant combinations of spherical harmonics.
In the example below, we create face-centered cubic (fcc), body-centered cubic (bcc), and simple cubic (sc) crystals with added Gaussian noise, and use Steinhardt order parameters with a support vector machine to train a simple crystal structure identifier.
Steinhardt order parameters characterize the spherical arrangement of neighbors around a central particle, and combining values of
@@ -218,8 +218,8 @@ These descriptors have been used with TensorFlow for supervised and unsupervised

.. [#] https://github.com/glotzerlab/pythia
-Another useful module for machine learning with ``freud`` is ``freud.cluster``, for pre- or post-processing data that must consider 2D or 3D periodicity.
-For example, finding clusters using the right cutoff distance can identify crystalline grains, which can help with building a training set for machine learning models.
+Another useful module for machine learning with ``freud`` is ``freud.cluster``, which uses a distance-based cutoff to locate clusters of particles while accounting for 2D or 3D periodicity.
+Locating clusters in this way can identify crystalline grains, helpful for building a training set for machine learning models.
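The idea of cutoff-based clustering under periodic boundaries can be sketched in a few lines of ``numpy``. This is an illustrative stand-in, not the ``freud.cluster`` API. It assumes an orthorhombic box, a cutoff smaller than half the shortest box length, and an O(N^2) pair search, and the function name ``periodic_clusters`` is hypothetical. Particles closer than the cutoff under the minimum image convention are merged with a union-find structure.

.. code-block:: python

    import numpy as np


    def periodic_clusters(points, box_lengths, cutoff):
        """Label points so that pairs closer than ``cutoff`` share a label.

        Distances use the minimum image convention in an orthorhombic box.
        """
        points = np.asarray(points, dtype=float)
        box = np.asarray(box_lengths, dtype=float)
        n_points = len(points)
        parent = list(range(n_points))

        def find(i):
            # Follow parent links to the root, compressing the path.
            while parent[i] != i:
                parent[i] = parent[parent[i]]
                i = parent[i]
            return i

        for i in range(n_points):
            delta = points[i + 1:] - points[i]
            delta -= box * np.round(delta / box)  # wrap to nearest image
            close = np.flatnonzero(np.linalg.norm(delta, axis=1) < cutoff)
            for j in close + i + 1:
                root_i, root_j = find(i), find(int(j))
                if root_i != root_j:
                    parent[root_j] = root_i

        roots = [find(i) for i in range(n_points)]
        _, labels = np.unique(roots, return_inverse=True)
        return labels


    # Two particles straddle the periodic boundary in x; two sit mid-box.
    points = [[0.2, 5.0], [9.9, 5.0], [5.0, 5.0], [5.5, 5.0]]
    labels = periodic_clusters(points, box_lengths=[10.0, 10.0], cutoff=1.0)

The first two particles are 9.7 apart in raw coordinates but only 0.3 apart across the boundary, so they receive the same label. A clustering that ignored periodicity would split them.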

Visualization
-------------
@@ -249,6 +249,7 @@ plato

``plato`` is an open-source graphics package that expresses a common interface for defining two- or three-dimensional scenes which can be rendered as an interactive Jupyter widget or saved to a high-resolution image using one of several backends (``pythreejs``, ``matplotlib``, ``fresnel``, POVray [#]_, and Blender [#]_, among others).
This conversation was marked as resolved by bdice

stsievert Jun 26, 2019

Contributor

Why should Blender and POVray be capitalized and in regular font, but the other libraries be in lowercase and in monospace font?

Library names are proper nouns, so I think they should be capitalized and in regular font.

bdice Jun 26, 2019

Author Contributor

@stsievert I've tried to operate under a rule of sticking to the name format given most prominently in the docs (I agree that I made a mistake, Matplotlib should be capitalized but I'm not sure about pythreejs, see footnote). I'm honestly a little torn about this -- we prefer to stylize freud in lowercase, monospace (with capitalization only if it begins a sentence, which is typically avoided via phrasing like "the freud library"). Because of our general pattern of naming packages after historical people, we want to emphasize that our package freud is not to be confused with the person Sigmund Freud. This also applies to some other software developed by our research group such as plato and fresnel.

I feel relatively strongly about retaining the stylized names for packages developed by our group because we have a stake in the branding and want to avoid confusion about the name wherever possible. But is it reasonable to stylize only the names of software packages developed by our group and adhere to capitalized, regular font for all other packages? How do you suggest we could reconcile this? We could switch to lowercase regular font for the aforementioned packages and capitalized regular font for all others (depending on your view of the footnote, where I don't have strong opinions).

Footnote: There are some packages which use stylized names (e.g. not capitalized like proper nouns) in most instances, such as pythreejs. Notice that the home page of their docs use lowercase (sometimes bold, but not monospace) in every instance outside of the footer, which has "PyThreejs Development Team."

stsievert Jun 26, 2019

Contributor

> we have a stake in the branding

Agreed, do as you wish for the libraries developed by the Glotzer lab.

> that our package freud is not to be confused with the person Sigmund Freud.

I believe that referring to the library as "the Freud library" would avoid confusion. I think that's enough context to distinguish between the historical figure and the software library (if the other context clues aren't enough).

> stylize only the names of software packages developed by our group and adhere to capitalized, regular font for all other packages? How do you suggest we could reconcile this?

I think that's fine. freud is a new package; there's no precedent for how it's referenced and users aren't used to seeing it. Pandas and Scikit-learn have been around for a while.

Some examples of other software introductions: Pandas was introduced as "pandas" in the original paper and is now referred to as "Pandas" (1, 2). NumPy and SciPy were/will be introduced as "NumPy" and "SciPy" in the original papers (NumPy book, to-be-published SciPy paper), and referred to as "NumPy" (3, 4). Scikit-learn was introduced as "Scikit-learn" in the Scikit-learn paper and it's referred to as "scikit-learn" (5) or "Scikit-learn" (6).

Below is an example of how to render particles from a HOOMD-blue snapshot, colored by the density of their local environment :cite:`Anderson2008,Glaser2015`.
The result is shown in figure :ref:`fig:platopythreejs`.

.. [#] https://www.povray.org/
.. [#] https://www.blender.org/
@@ -281,7 +282,7 @@ Below is an example of how to render particles from a HOOMD-blue snapshot, color
fresnel
=======

-``fresnel`` [#]_ is a GPU-accelerated ray tracer designed for particle simulations, with customizable material types and scene lighting, as well as support for a set of common anisotropic shapes simulations.
+``fresnel`` [#]_ is a GPU-accelerated ray tracer designed for particle simulations, with customizable material types and scene lighting, as well as support for a set of common anisotropic shapes.
Its feature set is especially well suited for publication-quality graphics.
Its use of ray tracing also means that an image's rendering time scales with the image size, instead of the number of particles -- a desirable feature for extremely large simulations.
An example of ``fresnel`` integration is available online.
@@ -301,7 +302,7 @@ OVITO

OVITO is a GUI application with features for particle selection, making movies, and support for many trajectory formats :cite:`Stukowski2010`.
OVITO has several built-in analysis functions (e.g. Polyhedral Template Matching), which complement the methods in ``freud``.
-The Python scripting functionality built into OVITO enables the use of ``freud`` modules, demonstrated in the code below.
+The Python scripting functionality built into OVITO enables the use of ``freud`` modules, demonstrated in the code below and shown in figure :ref:`fig:ovitoselection`.

.. code-block:: python
@@ -342,7 +343,7 @@ Conclusions
-----------

The ``freud`` library offers a unique set of high-performance algorithms designed to accelerate the study of nanoscale and colloidal systems.
-We have demonstrated several ways in which these tools for particle analysis can be used in conjunction with other popular packages for machine learning and data visualization.
+We have demonstrated several ways in which these tools for particle analysis can be used in conjunction with other packages for machine learning and data visualization.
We hope these examples are of use to the computational molecular science community and spark new ideas for analysis and scientific exploration.

Getting ``freud``