Paper: Analyzing Particle Systems for Machine Learning and Data Visualization with freud #471

Merged
merged 34 commits into from Jul 3, 2019
Commits
ef81be9
Added freud paper.
bdice May 17, 2019
e22e89a
Update 36_bradley_dice.rst
vyasr May 22, 2019
40ae61b
Remove newlines.
bdice May 22, 2019
c431562
Merge pull request #1 from bdice/vyas_intro_rewrite
bdice May 22, 2019
98fb5aa
Add citation for UMAP.
bdice May 24, 2019
d1b450d
Added link to GSD, removed TODO, fixed word.
bdice May 30, 2019
efc4446
Add example reference to ML techniques in Harper 2019.
bdice May 30, 2019
4a2a5d7
Minor text edits.
bdice Jun 11, 2019
2d056ff
Initial revision of paper intro
vyasr Jun 13, 2019
af919a0
Merge pipeline concept into a more cohesive intro
vyasr Jun 14, 2019
dc07b14
Add citations.
bdice Jun 14, 2019
bc669a6
Add citation.
bdice Jun 14, 2019
1c5bdcd
Reword sentence.
bdice Jun 14, 2019
11e5c5e
Add fresnel example and figure, update visualization section.
bdice Jun 17, 2019
4fb7088
Remove URLs, prefer DOIs.
bdice Jun 17, 2019
21465a8
Revisions of ML section.
bdice Jun 17, 2019
e42dd1c
Update conclusions, abstract, benchmarks
vyasr Jun 19, 2019
15011f4
Add brief discussion of data input format
vyasr Jun 19, 2019
455e2b5
Clarify fcc data.
bdice Jun 19, 2019
a5e24c0
Merge pull request #4 from bdice/reviewer_comments
bdice Jun 19, 2019
f2f71b9
Additional code comments.
bdice Jun 19, 2019
902f95c
Merge pull request #5 from bdice/reviewer_comments
bdice Jun 19, 2019
a0d2968
Use constant density benchmark, clarify input data.
bdice Jun 21, 2019
a73c579
Update citations.
bdice Jun 21, 2019
5f92849
Merge pull request #6 from bdice/reviewer_comments2
bdice Jun 21, 2019
1b8cec4
Minor wording changes.
bdice Jun 22, 2019
56a5535
Merge branch 'bradley_dice' of https://github.com/bdice/scipy_proceed…
bdice Jun 22, 2019
2a65144
Add data citation.
bdice Jun 25, 2019
45963b6
Update acknowledgments.
bdice Jun 25, 2019
9b6745e
Fixed Harper et al. and math format typo.
bdice Jun 26, 2019
6d35b30
Describe data inputs from binary and text-based files.
bdice Jun 26, 2019
9b06c4b
Change introduction of freud's central use case.
bdice Jun 26, 2019
5809fc0
Rename packages, references.
bdice Jun 28, 2019
fb2edca
Clarify reading from simulation engine output files.
bdice Jun 28, 2019
Initial revision of paper intro

vyasr committed Jun 13, 2019
commit 2d056ff2260d08912b345e28af1c448cd3e27e33
@@ -56,25 +56,30 @@ Introduction
These features contrast the assumptions of most analysis tools designed for biomolecular simulations and materials science.
:label:`fig:scales`

With the availability of "off-the-shelf" molecular dynamics engines capable of running parameterized simulations, it is now possible to simulate complex systems ranging from large biomolecules and coarse-grained models to reconfigurable materials and colloidal self-assembly.
The availability of "off-the-shelf" molecular dynamics engines has made simulating complex systems possible across many scientific fields.
Simulations of systems ranging from large biomolecules to colloids are now common, allowing researchers to ask new questions about reconfigurable materials and develop coarse-graining approaches to access increasing timescales.
.. With the availability of "off-the-shelf" molecular dynamics engines capable of running parameterized simulations, it is now possible to simulate complex systems ranging from large biomolecules and coarse-grained models to reconfigurable materials and colloidal self-assembly.
Various tools have arisen to facilitate the analysis of these simulations, many of which are immediately interoperable with the most popular simulation tools.
The ``freud`` library differentiates itself from other analysis packages through its focus on colloidal and nano-scale systems.
Due to their diversity and adaptability, colloidal materials are a powerful model system for exploring soft matter physics as well as a viable platform for harnessing photonic :cite:`Cersonsky2018a`, plasmonic :cite:`Tan2011BuildingDNA`, and other useful structurally-derived properties.
The ``freud`` library is one such analysis package that differentiates itself from others through its focus on colloidal and nano-scale systems.

Due to their diversity and adaptability, colloidal materials are a powerful model system for exploring soft matter physics.
Such materials are also a viable platform for harnessing photonic :cite:`Cersonsky2018a`, plasmonic :cite:`Tan2011BuildingDNA`, and other useful structurally-derived properties.
In colloidal systems, features like particle anisotropy play an important role in creating complex crystal structures, some of which have no atomic analogues :cite:`Damasceno2012`.
Design spaces encompassing wide ranges of particle morphology :cite:`Damasceno2012` and interparticle interactions :cite:`Adorf2018` have been studied, yielding phase diagrams filled with complex behavior.
The ``freud`` library is targeted towards studying such systems, providing a unique feature set that is tailored to capturing the important properties that characterize colloidal systems.
For example, the multi-dimensional Potential of Mean Force and Torque allows users to understand the effects of particle anisotropy on entropic self-assembly :cite:`VanAnders2014c,VanAnders2014d,Karas2016,Harper2015,Anderson2017`.
Additionally, ``freud`` has tools for identifying and clustering particles by their local crystal environments :cite:`Teich2019`.

The ``freud`` Python package targets these systems by avoiding trajectory management and the analysis of chemically bonded structures, the province of most other analysis platforms **ADD CITATIONS**, and instead providing a unique feature set that is tailored to capturing the important properties that characterize colloidal systems.
In particular, ``freud`` excels at performing analyses based on characterizing local particle environments, which makes it a powerful tool for tasks such as calculating order parameters to track crystallization or finding prenucleation clusters.
Among the unique methods present in ``freud`` are the potential of mean force and torque, which allows users to understand the effects of particle anisotropy on entropic self-assembly :cite:`VanAnders2014c,VanAnders2014d,Karas2016,Harper2015,Anderson2017`, and various tools for identifying and clustering particles by their local crystal environments :cite:`Teich2019`.
All such tasks are accelerated by ``freud``'s extremely fast neighbor finding routines and are automatically parallelized, making it an ideal tool for researchers performing peta- or exascale simulations of particle systems.
The ``freud`` library's scalability is exemplified by its use in computing correlation functions on systems of over a million particles, calculations that were used to elucidate the elusive hexatic phase transition in two-dimensional systems of hard polygons :cite:`Anderson2017`.
More details on the use of ``freud`` can be found in **CITE ARXIV PAPER**.

In this paper, we will focus in particular on the usage of ``freud`` for data visualization and machine learning.
While direct visualization of simulation trajectories can provide insights into the behavior of a system, integrating higher-order analyses is often necessary to provide interpretable visualizations that allow researchers to identify meaningful features like defects and ordered domains of self-assembled structures.
Such analyses can also reduce the 6N-dimensional space of particle positions and orientations into a tractable set of features that can be fed into machine learning algorithms.
While most existing analysis libraries like MDAnalysis **CITE** are tightly coupled to the files typically output by simulation engines and the system representations embedded in these files, ``freud`` decouples file parsing and trajectory representation from analysis tasks.
This UNIX-like philosophy allows ``freud`` to integrate naturally into visualization or machine learning pipelines using popular tools like TensorFlow, ``scikit-learn``, ``scipy``, or ``matplotlib``, and it enables a wide range of forward-thinking applications for ``freud``, from Jupyter notebook integration to versatile, complex 3D renderings.

The outputs of molecular simulations are usually stored as a file of particle positions, with some metadata like particle types, periodic box dimensions, and bond topologies.
However, these outputs are typically not immediately useful.
Physical invariants of a system such as translational or rotational invariance are difficult to learn from raw arrays of particle positions, making machine learning libraries hard to apply for tasks such as classification or regression.
Data visualizations, on the other hand, rely on position arrays for drawing particles but frequently must be coupled with analysis tools in order to provide interpretable views of the system that allow researchers to identify regions, e.g. defects and well-ordered domains, of self-assembled structures.
Existing analysis libraries like MDAnalysis rely heavily on file-based inputs, making it challenging to couple their analysis methods into an existing workflow using popular tools like TensorFlow, ``scikit-learn``, ``scipy``, or ``matplotlib``.
By contrast, ``freud``'s use of NumPy arrays for input and output allows for seamless integration with machine learning and data visualization tasks.
This UNIX-like philosophy enables a wide range of forward-thinking applications for ``freud``, from Jupyter notebook integration to versatile, complex 3D renderings.
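
The point about invariances can be made concrete with a minimal sketch. It assumes only NumPy and does not use ``freud`` itself: the helper ``sorted_pair_distances`` is a hypothetical stand-in for a real descriptor, and periodic boundary conditions are ignored for simplicity. A rigid rotation and translation changes every entry of the raw position array, while the derived feature vector stays the same, which is the property a downstream machine learning model would otherwise have to learn from data.

```python
import numpy as np


def sorted_pair_distances(positions):
    """Return the sorted pairwise distances of an (N, 3) position array.

    Hypothetical illustrative descriptor (not part of freud); it ignores
    periodic boundary conditions.
    """
    # Differences between every pair of particles, shape (N, N, 3).
    deltas = positions[:, np.newaxis, :] - positions[np.newaxis, :, :]
    distances = np.linalg.norm(deltas, axis=-1)
    # Keep each unordered pair once (upper triangle, excluding the diagonal).
    i, j = np.triu_indices(len(positions), k=1)
    return np.sort(distances[i, j])


# Random positions standing in for one simulation frame (assumed data).
rng = np.random.default_rng(seed=0)
positions = rng.random((8, 3))

# Rigid motion: rotate about the z axis by an arbitrary angle, then translate.
theta = 0.7
rotation = np.array([
    [np.cos(theta), -np.sin(theta), 0.0],
    [np.sin(theta), np.cos(theta), 0.0],
    [0.0, 0.0, 1.0],
])
shift = np.array([1.5, -2.0, 0.25])
moved = positions @ rotation.T + shift

# The raw arrays differ, but the distance-based features agree.
raw_changed = not np.allclose(positions, moved)
features_match = np.allclose(
    sorted_pair_distances(positions), sorted_pair_distances(moved)
)
print("raw positions changed:", raw_changed)
print("features unchanged:", features_match)
```

Both checks should report ``True``. Because the feature vector is an ordinary NumPy array, it could be passed directly to tools such as ``scikit-learn`` or ``matplotlib``, which is the kind of array-based workflow the text describes.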

Analysis Pipelines
------------------