Commit
Improving documentation.
- Updated README.
- New section about HPC with ML4Chem.

All this is still work in progress.
muammar committed Mar 11, 2020
1 parent f48c1d5 commit bf6a55e
Showing 5 changed files with 178 additions and 11 deletions.
8 changes: 4 additions & 4 deletions LICENSE
@@ -1,6 +1,6 @@
ML4Chem: Machine Learning for Chemistry and Materials (ML4Chem) Copyright (c)
2019, The Regents of the University of California, through Lawrence Berkeley
National Laboratory (subject to receipt of any required approvals from the U.S.
Dept. of Energy). All rights reserved.

Redistribution and use in source and binary forms, with or without
@@ -40,4 +40,4 @@ for such Enhancements, then you hereby grant the following license: a
non-exclusive, royalty-free perpetual license to install, use, modify,
prepare derivative works, incorporate into other computer software,
distribute, and sublicense such enhancements or derivative works thereof,
in binary and source code form.
28 changes: 21 additions & 7 deletions README.md
@@ -29,8 +29,17 @@ A list of features and ML algorithms are shown below.
- Distributed training in a data parallel paradigm aka mini-batches.
- Scalability and distributed computations are powered by Dask.
- Real-time tools to track status of your computations.
- Easy scaling up/down.
- Easy access to intermediate quantities: `NeuralNetwork.get_activations(X, numpy=True)` or `VAE.get_latent_space(X)` (see the sketch below).
- [Messagepack serialization](https://msgpack.org/index.html).
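
A rough, hypothetical sketch of the intermediate-quantities access mentioned
above (the `model`, `vae`, and `X` objects are placeholders that would come
from your own ML4Chem training and featurization pipeline):

```python
# Hypothetical usage sketch; assumes a trained NeuralNetwork/VAE and
# featurized inputs X prepared elsewhere with ML4Chem.
activations = model.get_activations(X, numpy=True)  # hidden-layer outputs as NumPy arrays
latent = vae.get_latent_space(X)                     # latent representation from a trained VAE
```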

## Notes

This package is under heavy development and may break at times until it
stabilizes. It is still in its infancy, so if you find an error, please
report it so that it can be fixed. We also very much welcome pull requests if
you find any part of ML4Chem that should be improved.

## Citing

@@ -49,24 +58,28 @@ doi = "10.26434/chemrxiv.11952516.v1"

## Documentation

To get started, read the documentation at
[https://ml4chem.dev](https://ml4chem.dev). It is arranged in a way that you
can go through the theory as well as some code snippets to understand how to
use this software. Additionally, you can dive through the [module
index](https://ml4chem.dev/genindex.html) to get more information about
different classes and functions of ML4Chem. If you think the documentation
needs improvement, do not hesitate to say so in a bug report, and help out if
you feel like it.


## Visualizations

![](https://raw.githubusercontent.com/muammar/ml4chem/master/docs/source/_static/dask_dashboard.png)

## Copyright

Note: This package is under development.
License: BSD 3-clause "New" or "Revised" License.

## Copyright
```
ML4Chem: Machine Learning for Chemistry and Materials (ML4Chem) Copyright (c)
2019, The Regents of the University of California, through Lawrence Berkeley
National Laboratory (subject to receipt of any required approvals from the U.S.
Dept. of Energy). All rights reserved.
If you have questions about your rights to use or distribute this software,
@@ -80,3 +93,4 @@ its behalf a paid-up, nonexclusive, irrevocable, worldwide license in the
Software to reproduce, distribute copies to the public, prepare derivative
works, and perform publicly and display publicly, and to permit other to do
so.
```
139 changes: 139 additions & 0 deletions docs/source/hpc.rst
@@ -0,0 +1,139 @@
===================
Introduction
===================

ML4Chem uses `Dask <https://docs.dask.org/en/latest/>`_, a flexible library
for parallel computing in Python. Dask makes it easy to scale computations up
and down without much effort.

In this part of the documentation, we cover how ML4Chem can be run on a
laptop or workstation and how to scale up to HPC clusters. Dask has a simple
but flexible structure:


#. A scheduler is in charge of receiving and coordinating tasks.
#. Tasks can be registered lazily as delayed objects or submitted immediately
   as futures (see the sketch below).
#. When the scheduler receives a task, it sends it to workers that carry out
   the computation and keep the result in memory.
#. Results can subsequently be used for further computations or gathered back
   to the local process.
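
For illustration only (this is plain Dask, not part of ML4Chem), a minimal
sketch of the two submission styles mentioned above, futures and delayed
tasks, could look like this::

    from dask import delayed
    from dask.distributed import Client, LocalCluster

    def square(x):
        # A toy task standing in for an expensive computation.
        return x * x

    cluster = LocalCluster(n_workers=2, threads_per_worker=1)
    client = Client(cluster)

    # Futures: submitted immediately; the result stays in worker memory
    # until gathered back.
    future = client.submit(square, 3)
    print(future.result())  # 9

    # Delayed: the task graph is built lazily and only runs on compute().
    lazy = delayed(square)(4)
    print(lazy.compute())   # 16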


=====================
Scale Down
=====================

Running computations with ML4Chem on a personal workstation or laptop is very
easy thanks to Dask. The :code:`LocalCluster` class uses local resources to
carry out computations. This is useful when prototyping and building your
pipeline without wasting time waiting for HPC resources in a crowded cluster
facility.

ML4Chem can run with :code:`LocalCluster` objects, for which the scripts have
to contain the following::

    from dask.distributed import Client, LocalCluster

    cluster = LocalCluster(n_workers=8, threads_per_worker=2)
    client = Client(cluster)

In the snippet above, we imported :code:`Client`, which connects to the
scheduler created by the :code:`LocalCluster` class. The scheduler will have
8 workers with 2 threads each. As tasks are created, the :code:`Client` sends
them to the :code:`LocalCluster`, where they are computed and their results
are kept in memory.

A typical script for running training in ML4Chem looks as follows::


    from ase.io import Trajectory
    from dask.distributed import Client, LocalCluster
    from ml4chem.atomistic import Potentials
    from ml4chem.atomistic.features import Gaussian
    from ml4chem.atomistic.models.neuralnetwork import NeuralNetwork
    from ml4chem.utils import logger


    def train():
        # Load the images with ASE
        images = Trajectory("cu_training.traj")

        # Arguments for fingerprinting the images
        normalized = True

        # Arguments for building the model
        n = 10
        activation = "relu"

        # Arguments for training the potential
        convergence = {"energy": 5e-3}
        epochs = 100
        lr = 1.0e-2
        weight_decay = 0.0
        regularization = 0.0

        calc = Potentials(
            features=Gaussian(
                cutoff=6.5, normalized=normalized, save_preprocessor="model.scaler"
            ),
            model=NeuralNetwork(hiddenlayers=(n, n), activation=activation),
            label="cu_training",
        )

        optimizer = ("adam", {"lr": lr, "weight_decay": weight_decay})
        calc.train(
            training_set=images,
            epochs=epochs,
            regularization=regularization,
            convergence=convergence,
            optimizer=optimizer,
        )


    if __name__ == "__main__":
        logger(filename="cu_training.log")
        cluster = LocalCluster()
        client = Client(cluster)
        train()

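While the script runs, the Dask dashboard can be used to follow the status of
the computations in real time (this is a standard Dask feature rather than
something specific to ML4Chem)::

    # Printed after the Client is created; typically http://127.0.0.1:8787
    print(client.dashboard_link)

Open the printed address in a browser to watch tasks, workers, and memory
usage while training proceeds.
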
=====================
Scale Up
=====================

Once you have finished prototyping and feel ready to scale up, the snippet
above can be trivially extended to work with high performance computing (HPC)
systems. Dask offers a package called :code:`dask_jobqueue` that enables
sending computations to HPC batch systems such as SLURM, LSF, PBS, and others
(for more information, see `<https://jobqueue.dask.org/en/latest/index.html>`_).

To scale up in ML4Chem with Dask, you only have to slightly change the
snippet above as follows::


    if __name__ == "__main__":
        from dask_jobqueue import SLURMCluster

        logger(filename="cu_training.log")

        cluster = SLURMCluster(
            cores=24,
            processes=24,
            memory="100GB",
            walltime="24:00:00",
            queue="dirac1",
        )
        print(cluster)
        print(cluster.job_script())
        cluster.scale(jobs=4)
        client = Client(cluster)
        train()

We removed the :code:`LocalCluster` and instead used the :code:`SLURMCluster`
class to submit our computations to a SLURM batch system. As you can see, the
:code:`cluster` is now a :code:`SLURMCluster` requesting jobs with 24 cores
and 24 processes, 100GB of RAM, a wall time of 1 day, and the :code:`dirac1`
queue. Then, we scale up by requesting 4 such jobs from the HPC cluster, for
a total of 96 processes. This :code:`cluster` is passed to the
:code:`Client`, and our training is now scaled up. No more input is
needed :).
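
If you prefer not to fix the number of jobs ahead of time, Dask clusters also
support adaptive scaling, in which workers are requested and released
according to the current load. A minimal sketch, assuming the same
:code:`cluster` object as above (check the dask_jobqueue documentation for
the exact keywords supported by your version)::

    # Let Dask grow and shrink the worker pool between 1 and 4 workers.
    cluster.adapt(minimum=1, maximum=4)
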
6 changes: 6 additions & 0 deletions docs/source/index.rst
@@ -25,6 +25,12 @@

models

.. toctree::
   :maxdepth: 1
   :caption: Computing

   hpc

.. toctree::
   :maxdepth: 1
   :caption: Visualization
8 changes: 8 additions & 0 deletions docs/source/ml4chem.atomistic.models.rst
@@ -60,6 +60,14 @@ ml4chem.atomistic.models.neuralnetwork module
   :undoc-members:
   :show-inheritance:

ml4chem.atomistic.models.rt module
----------------------------------

.. automodule:: ml4chem.atomistic.models.rt
   :members:
   :undoc-members:
   :show-inheritance:

ml4chem.atomistic.models.se3net module
--------------------------------------

