Paper: PMDA – Parallel Molecular Dynamics Analysis #476

Merged
merged 116 commits into from Jul 3, 2019
Changes from 1 commit
116 commits
9b3524f
generate folder and rst, bib files
VOD555 May 17, 2019
a3bced1
add abstract
VOD555 May 17, 2019
e166e7b
skeleton for PMDA paper
orbeckst May 20, 2019
b46f485
add brief description of pmda.parallel.ParallelAnalysisBase
VOD555 May 20, 2019
210309a
new pmda.bib
VOD555 May 20, 2019
ef27ec2
correct pmda.bib
VOD555 May 20, 2019
3cbbcce
change abstract
VOD555 May 20, 2019
6a74028
change sign
VOD555 May 20, 2019
7e653d4
reduced size of bib file
orbeckst May 20, 2019
74a1de2
updated title and citations
orbeckst May 20, 2019
df1cf48
minor edits and comments on workflow
orbeckst May 20, 2019
4f29605
equal contributions: Max and Shujie
orbeckst May 20, 2019
d4ddcda
Merge pull request #16 from VOD555/2019
orbeckst May 20, 2019
b758b1b
manually merged shujie_fan.rst into fan.rst
orbeckst May 20, 2019
7e47bf8
renamed paper to pmda.rst
orbeckst May 20, 2019
c74f224
remove time record part in method section
VOD555 May 21, 2019
36f4020
change name
VOD555 May 21, 2019
0af8f45
remove Timing
VOD555 May 21, 2019
b0816db
fix title overline
VOD555 May 21, 2019
58f5d55
add user-defined parallel task 1
VOD555 May 21, 2019
df6fa01
add self-defined analysis task 2
VOD555 May 21, 2019
f1871e4
Rearrange With pmda.parallel.ParallelAnalysisBase
VOD555 May 21, 2019
f237713
references for MD applications
orbeckst May 21, 2019
7b6710e
first two introductory paragraphs + XSEDE ack
orbeckst May 21, 2019
1e5fe9d
ref prev work
orbeckst May 21, 2019
a324a01
introduction v1
orbeckst May 21, 2019
b1b046e
Merge pull request #21 from Becksteinlab/introduction
orbeckst May 21, 2019
ec2ebab
add to intro: contains library of analysis classes
orbeckst May 21, 2019
156bd06
moved code availability to end
orbeckst May 21, 2019
edac61c
Methods: reference numpy
orbeckst May 21, 2019
1ec5e5d
consistent adornments for headings
orbeckst May 21, 2019
c04b36b
more methods defs
orbeckst May 22, 2019
b5526ed
methods (#23): time series vs reduction
orbeckst May 22, 2019
23bf0d6
add benchmark introduction
VOD555 May 22, 2019
49bcbe3
merge changes
VOD555 May 22, 2019
43d8b80
remove timeit part
VOD555 May 22, 2019
1669922
methods: PMDA schema
orbeckst May 22, 2019
862e7ef
Merge branch '2019' of github.com:Becksteinlab/scipy_proceedings into…
orbeckst May 22, 2019
e09ad83
methods: implementation
orbeckst May 22, 2019
78961f1
better line breaking of code
orbeckst May 22, 2019
dbccb05
Update pmda.rst
richardjgowers May 22, 2019
ff7412d
methods: performance evaluation
orbeckst May 22, 2019
c07f8aa
updated examples and usage section
orbeckst May 22, 2019
a3cbbdf
code fix for Rgyr
orbeckst May 22, 2019
4d8b8b1
add efficiency and speedup plot for rdf and rms
VOD555 May 22, 2019
acf619b
add total time comparison
VOD555 May 22, 2019
d994849
add links to data repo to Methods
orbeckst May 22, 2019
0875aac
Merge branch '2019' of github.com:Becksteinlab/scipy_proceedings into…
orbeckst May 22, 2019
8dbc6da
combine total time, efficiency, speedup for rdf
VOD555 May 22, 2019
302a9a3
Merge branch '2019' of github.com:Becksteinlab/scipy_proceedings into…
VOD555 May 22, 2019
c29a2b7
combine total, efficiency, speed up for rms
VOD555 May 22, 2019
818833c
update acknowledgements
kain88-de May 22, 2019
9ca29f6
add fig for wait, compute, io times of rms
VOD555 May 22, 2019
71c2a42
add fig for wait compute io rdf, remove the unfixed wait compute io f…
VOD555 May 22, 2019
01fef5c
add Table with benchmark environments
orbeckst May 22, 2019
18f13c9
Merge branch '2019' of github.com:Becksteinlab/scipy_proceedings into…
orbeckst May 22, 2019
24e7a8f
fix wait compute io plot for rms
VOD555 May 22, 2019
19111b5
Merge branch '2019' of github.com:Becksteinlab/scipy_proceedings into…
orbeckst May 22, 2019
5bdbb88
add figures to Results
orbeckst May 22, 2019
1009f73
fix new fig captions
orbeckst May 22, 2019
c2a9eef
add graph for rdf's prepare, conclude universe time
VOD555 May 22, 2019
4af4bc0
add graph for rms' prepare, conclude and universe time
VOD555 May 22, 2019
e79a444
corrected how detailed timing information was obtained
orbeckst May 22, 2019
d858536
Merge branch '2019' of github.com:Becksteinlab/scipy_proceedings into…
orbeckst May 22, 2019
98531c9
Results: completed RMSD section
orbeckst May 22, 2019
0c2bee0
corrected water g(r): OO
orbeckst May 22, 2019
e8d45d0
results: finished RDF
orbeckst May 23, 2019
13557fd
wrote conclusions
orbeckst May 23, 2019
43cec73
final spell check
orbeckst May 23, 2019
5ad7e8e
small improvements
orbeckst May 23, 2019
3e2953d
abstract fix
orbeckst May 23, 2019
1ba6eac
more abstract fix
orbeckst May 23, 2019
3701f39
tense fix: use past tense for result
orbeckst May 23, 2019
b85f750
made optional/required methods clearer
orbeckst May 23, 2019
d0c8eba
maded AnalysisFromFunction() a bit clearer
orbeckst May 23, 2019
2b9ef60
more tense fixes (RMSD results)
orbeckst May 23, 2019
8198feb
Merge branch '2019' into patch-1
orbeckst May 23, 2019
731c99d
Merge pull request #25 from richardjgowers/patch-1
orbeckst May 23, 2019
b4f1b62
Merge pull request #26 from kain88-de/patch-1
orbeckst May 23, 2019
606ed6d
load booktabs package explicitly
orbeckst May 23, 2019
3746e85
consistently italicized multiprocessing and distributed
orbeckst May 23, 2019
069cfe6
more conservative description of RMSD speed-up
orbeckst May 23, 2019
24f8444
add DOI for test trajectories
orbeckst May 30, 2019
7f3c52f
break code inside column
orbeckst May 30, 2019
f775cff
improved AnalysisFromFunction example
orbeckst May 31, 2019
050947c
fix number of cores and nodes for ssd distributed
VOD555 Jun 7, 2019
2cf16d0
fixed: Table: only 1 node for distributed/SSD
orbeckst Jun 7, 2019
b282591
Merge branch '2019' of github.com:Becksteinlab/scipy_proceedings into…
orbeckst Jun 7, 2019
ace8281
fixed definition of speed-up S(M)
orbeckst Jun 13, 2019
cb46d91
add zenodo DOI for data/script repository
orbeckst Jun 13, 2019
4220250
fixed typo found by reviewer @cyrush
orbeckst Jun 13, 2019
76416eb
data is a plural noun: fixed
orbeckst Jun 13, 2019
97a90d1
added particle type indices to make g(r) equation clearer
orbeckst Jun 13, 2019
783dda4
methods updates
orbeckst Jun 13, 2019
4af3190
installation details
orbeckst Jun 14, 2019
f72c0b6
re-arranged performance evaluation section
orbeckst Jun 14, 2019
bf5ee59
float juggling to make figures appear sooner
orbeckst Jun 14, 2019
d9e1a11
cleaned up bib file
orbeckst Jun 14, 2019
07def91
updated software versions
orbeckst Jun 14, 2019
67b7afb
add detail for RMSD calculation
orbeckst Jun 14, 2019
c31013d
text improvements in RMSD Task results section
orbeckst Jun 14, 2019
0fe1c37
add errorbars to wait compute io plot
VOD555 Jun 19, 2019
f7c2003
Merge branch '2019' of github.com:Becksteinlab/scipy_proceedings into…
VOD555 Jun 19, 2019
a978b2d
add errorbars to pre_con_uni plots
VOD555 Jun 19, 2019
f44abfe
add error bars total efficiency speedup plots
VOD555 Jun 19, 2019
28cbff8
add stacked graphs with percentage times
VOD555 Jun 19, 2019
13f4579
modify color of lines on graphs
VOD555 Jun 19, 2019
d78a7e9
updated text for plots with error bars
orbeckst Jun 19, 2019
108ecf5
Merge branch '2019' of github.com:Becksteinlab/scipy_proceedings into…
orbeckst Jun 19, 2019
2da50d9
integrate stacked fraction of time plots
orbeckst Jun 20, 2019
615ae24
add reviewer @cyrush to Acknowledgements for the idea of the stacked …
orbeckst Jun 20, 2019
a32203d
updated computational details for RDF calculation
orbeckst Jun 20, 2019
b82547f
fix some typo
VOD555 Jun 20, 2019
aa19461
updated text for errors of speedup and efficiency
VOD555 Jun 20, 2019
a18166d
fix typo
VOD555 Jul 2, 2019
ade200a
fix typo
VOD555 Jul 2, 2019
add self-defined analysis task 2

VOD555 committed May 21, 2019
commit df6fa01458f2b2b2583afe68ff157b3f8a3a826d
@@ -128,19 +128,12 @@ The parallel analysis algorithms are performed on ``Universe`` and tuple of ``At
self._indices = [ag.indices
                 for ag in atomgroups]
``run()`` performs the split-apply-combine parallel analysis. The trajectory is split into ``n_blocks`` blocks by :code:`make_balanced_slices`, using the first frame ``start``, the final frame ``stop``, and the step length ``step`` (this corresponds to the *split* step). :code:`make_balanced_slices` is a function defined in ``pmda.util``. It generates blocks in such a way that they contain equal numbers of frames whenever possible while ensuring that no block is empty. The start and stop frames for each block are stored in a list ``slices``. ``n_jobs`` is the number of jobs to start; this argument is ignored when the *distributed* scheduler is used. After the additional preparation defined in :code:`_prepare`, the analysis jobs (the *apply* step, defined in :code:`_dask_helper()`) for each block are delayed with the :code:`delayed()` function in dask. Finally, the results from all blocks are gathered and combined in the :code:`_conclude()` function.

``timeit`` is a context manager defined in ``pmda.util`` (to be used with the ``with`` statement) that records the execution time of the enclosed context block in its attribute ``elapsed``. Here, we record the times for `prepare`, `compute`, `I/O`, `conclude`, `universe`, `wait` and `total`. These timing results are finally stored in the attributes of the class ``pmda.parallel.Timing``.
``run()`` performs the split-apply-combine parallel analysis. The trajectory is split into ``n_blocks`` blocks by :code:`make_balanced_slices`, using the first frame ``start``, the final frame ``stop``, and the step length ``step`` (this corresponds to the *split* step). :code:`make_balanced_slices` is a function defined in ``pmda.util``. It generates blocks in such a way that they contain equal numbers of frames whenever possible while ensuring that no block is empty. The start and stop frames for each block are stored in a list ``slices``. ``n_jobs`` is the number of jobs to start; this argument is ignored when the *distributed* scheduler is used. After the additional preparation defined in :code:`_prepare`, the analysis jobs (the *apply* step, defined in :code:`_dask_helper()`) for each block are delayed with the :code:`delayed()` function in dask. The results from all blocks are gathered and reshaped into a sensible new attribute such as ``self.results`` (the name may differ between analysis classes) by the :code:`_conclude()` function.
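
To illustrate the *split* step, a minimal sketch of the balanced-slicing idea is shown below. It is a simplified, hypothetical stand-in for :code:`make_balanced_slices` in ``pmda.util`` (not the actual implementation) and assumes at least as many frames as blocks: frames are distributed so that block sizes differ by at most one frame.

.. code-block:: python

    def balanced_slices_sketch(n_frames, n_blocks,
                               start=0, step=1):
        # distribute n_frames over n_blocks so that block
        # sizes differ by at most one frame
        base, extra = divmod(n_frames, n_blocks)
        slices = []
        frame = start
        for i in range(n_blocks):
            length = base + (1 if i < extra else 0)
            stop = frame + length * step
            slices.append(slice(frame, stop, step))
            frame = stop
        return slices

    # 10 frames over 3 blocks -> block sizes 4, 3, 3
    print(balanced_slices_sketch(10, 3))

The actual ``run()`` method (excerpt below) combines such slices with dask's :code:`delayed()` to schedule one job per block.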

.. code-block:: python

    def run(self, start=None, stop=None, step=None,
            n_jobs=1, n_blocks=None):
        # Get the indices of the start, stop
        # and step frames.
        start, stop, step = \
            self._trajectory.check_slice_indices(
                start, stop, step)
        n_frames = len(range(start, stop, step))
        slices = make_balanced_slices(n_frames,
                                      n_blocks, start=start,
@@ -166,11 +159,9 @@ The parallel analysis algorithms are performed on ``Universe`` and tuple of ``At
.. code-block:: python

    def _dask_helper(self, bslice, indices, top, traj):
        u = mda.Universe(top, traj)
        agroups = [u.atoms[idx] for idx in indices]

        res = []
        times_io = []
        times_compute = []
        for i in range(bslice.start,
                       bslice.stop, bslice.step):
            ts = u.trajectory[i]
@@ -189,11 +180,6 @@ Accumulation of frames within a block happens in the :code:`_reduce` function. I
return res
Results and Discussion
======================

===========
Basic Usage
===========
@@ -266,7 +252,7 @@ We can wrap rgyr() in ``pmda.custom.AnalysisFromFunction`` to build a paralleled
parallel_rgyr = pmda.custom.AnalysisFromFunction(
    rgyr, u, protein)
Run the analysis on 8 cores and show the timeseries of the results stored in ``parallel_rgyr.results``:
Run the analysis on 4 cores and show the timeseries of the results stored in ``parallel_rgyr.results``:

.. code-block:: python
@@ -276,6 +262,66 @@ Run the analysis on 8 cores and show the timeseries of the results stored in ``p
With pmda.parallel.ParallelAnalysisBase
+++++++++++++++++++++++++++++++++++++++

In more general cases, one can write a parallel analysis class with the help of ``pmda.parallel.ParallelAnalysisBase``. To build a new analysis class, one should

1. (Required) Define the single frame analysis function ``_single_frame``,
2. (Required) Define the function ``_conclude`` that combines the final results,
3. (Optional) Define the additional preparation function ``_prepare``,
4. (Optional) Define the accumulation function ``_reduce`` for frames within the same block, if the result is not time-series data,
5. Derive a class from ``pmda.parallel.ParallelAnalysisBase`` that uses these functions.

As an example, we show how one can build a class to calculate the radius of gyration of a protein given in the ``AtomGroup`` ``protein``. The new class is derived from ``pmda.parallel.ParallelAnalysisBase``. The conclusion function reshapes ``self._results``, which stores the results from all blocks.


.. code-block:: python

    import numpy as np
    from pmda.parallel import ParallelAnalysisBase

    class RGYR(ParallelAnalysisBase):
        def __init__(self, protein):
            universe = protein.universe
            super(RGYR, self).__init__(universe,
                                       (protein, ))

        def _prepare(self):
            self.rgyr = None

        def _conclude(self):
            self.rgyr = np.vstack(self._results)
The inputs for ``_single_frame`` are fixed. ``ts`` contains the current time step and ``atomgroups`` is a tuple of atom groups that are updated to the current frame. The current frame number, time, and radius of gyration are returned as the single frame results. Here we can use the default ``_reduce``, which simply appends each single frame result to a list (an illustrative sketch is shown after the following code block).

.. code-block:: python

    def _single_frame(self, ts, atomgroups):
        protein = atomgroups[0]
        return (ts.frame, ts.time,
                protein.radius_of_gyration())
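
For illustration only, an append-style reduction of this kind, added as a method of the ``RGYR`` class, could be written as the following minimal sketch; the actual default ``_reduce`` is provided by ``pmda.parallel.ParallelAnalysisBase``.

.. code-block:: python

    @staticmethod
    def _reduce(res, result_single_frame):
        # append-style reduction: collect each frame's
        # result in the growing list for this block
        res.append(result_single_frame)
        return res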
The usage of this class is the same as that of the function that we defined with ``pmda.custom.AnalysisFromFunction``.

.. code-block:: python

    import MDAnalysis as mda

    u = mda.Universe(top, traj)
    protein = u.select_atoms('protein')

    parallel_rgyr = RGYR(protein)
    parallel_rgyr.run(n_jobs=4, n_blocks=4)
    print(parallel_rgyr.rgyr)
=========
Benchmark
=========

Method
======

``timeit`` is a context manager defined in ``pmda.util`` (to be used with the ``with`` statement) that records the execution time of the enclosed context block in its attribute ``elapsed``. Here, we record the times for `prepare`, `compute`, `I/O`, `conclude`, `universe`, `wait` and `total`. These timing results are finally stored in the attributes of the class ``pmda.parallel.Timing``.
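
A minimal sketch of such a timing context manager is shown below; it is purely illustrative and is not the actual implementation of ``pmda.util.timeit``.

.. code-block:: python

    import time

    class timeit_sketch(object):
        # measure the wall-clock time of the enclosed block
        # and store it in the attribute ``elapsed``
        def __enter__(self):
            self._start_time = time.time()
            return self

        def __exit__(self, exc_type, exc_val, exc_tb):
            self.elapsed = time.time() - self._start_time
            # do not suppress exceptions
            return False

    with timeit_sketch() as timer:
        total = sum(x * x for x in range(10**6))
    print(timer.elapsed)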


Results and Discussion
======================


Conclusions