# Improving Performance

## Overview

### Questions

- How can I write custom actions to be as efficient as possible?

### Objectives

- Mention high-performance libraries that can help speed up computation.
- Demonstrate using the local snapshot API for increased performance.

## Boilerplate Code

In [None]:
from numbers import Number

import numpy as np

import hoomd
import hoomd.md as md


cpu = hoomd.device.CPU()
sim = hoomd.Simulation(cpu)

# Create a simple cubic configuration of particles
N = 15  # particles per box direction
box_L = 50  # box dimension

snap = hoomd.Snapshot(cpu.communicator)
snap.configuration.box = [box_L] * 3 + [0, 0, 0]
snap.particles.N = N ** 3
x, y, z = np.meshgrid(
    *(np.linspace(-box_L / 2, box_L / 2, N, endpoint=False),) * 3)
positions = np.array((x.ravel(), y.ravel(), z.ravel())).T
snap.particles.position[:] = positions
snap.particles.types = ['A']
snap.particles.typeid[:] = 0

sim.create_state_from_snapshot(snap)

sim.state.thermalize_particle_momenta(hoomd.filter.All(), 1., seed=109)

lj = md.pair.LJ(nlist=md.nlist.Cell())
lj.params[('A', 'A')] = {'epsilon': 1.,
                         'sigma': 1.}
lj.r_cut[('A', 'A')] = 2.5
integrator = md.Integrator(methods=[md.methods.NVE(hoomd.filter.All())],
                           forces=[lj],
                           dt=0.005)

thermo = md.compute.ThermodynamicQuantities(hoomd.filter.All())
logger = hoomd.logging.Logger(flags=['scalar'])
logger.add(thermo, ['kinetic_energy', 'potential_energy'])
logger['total_energy'] = (
    lambda: thermo.kinetic_energy + thermo.potential_energy,
    'scalar')

table = hoomd.write.Table(100, logger, max_header_len=1)

sim.operations += integrator
sim.operations += thermo
sim.operations += table

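
# A variant that returns a normally distributed random value each time
# it is evaluated; it is used below as the energy to insert.
class GaussianVariant(hoomd.variant.Variant):
    def __init__(self, mean, std):
        hoomd.variant.Variant.__init__(self)
        self.mean = mean
        self.std = std

    def __call__(self, timestep):
        return np.random.normal(self.mean, self.std)


energy = GaussianVariant(10.0, 0.001)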

## General Guidelines

Custom actions often do not need to be as performant as possible. When
they do, there are several things to consider. The first step of any
optimization, however, is to profile, and the second step is to profile
again. Profiling is necessary to find the true bottlenecks of a given
program or function.
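As a minimal sketch of profiling with Python's built-in `cProfile` and `pstats` modules, the snippet below profiles a deliberately slow function. `slow_sum_of_squares` is a hypothetical stand-in; in practice you would profile a call such as `sim.run(...)` or your action's `act` method instead.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n):
    """Hypothetical hot spot: a pure-Python loop over n values."""
    total = 0
    for i in range(n):
        total += i * i
    return total


# Collect timing data only around the code of interest.
profiler = cProfile.Profile()
profiler.enable()
result = slow_sum_of_squares(200_000)
profiler.disable()

# Sort the collected statistics by cumulative time and print the top entries.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream).sort_stats("cumulative")
stats.print_stats(5)
print(stream.getvalue())
```

In a notebook, the IPython `%prun` magic reports the same kind of information for a single statement.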

With that in mind, one of the easiest and most obvious optimizations is
to use NumPy, SciPy, and other core scientific Python libraries
efficiently. Efficient use of these packages is beyond the scope of this
tutorial, but using NumPy broadcasting (instead of Python `for` loops)
and built-in functions can make a big difference.
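For instance, the per-particle kinetic energy can be written either as nested Python loops or as a single broadcast NumPy expression. The arrays below are random stand-ins for particle masses and velocities, not data taken from a HOOMD-blue simulation.

```python
import numpy as np

rng = np.random.default_rng(seed=42)
n_particles = 10_000
mass = rng.uniform(0.5, 2.0, size=n_particles)  # shape (N,)
velocity = rng.normal(size=(n_particles, 3))    # shape (N, 3)


def kinetic_energy_loop(mass, velocity):
    """Per-particle kinetic energy computed with Python for loops (slow)."""
    energies = np.zeros(len(mass))
    for i in range(len(mass)):
        for d in range(3):
            energies[i] += 0.5 * mass[i] * velocity[i, d] ** 2
    return energies


def kinetic_energy_vectorized(mass, velocity):
    """Same quantity using broadcasting and a built-in reduction (fast).

    mass[:, np.newaxis] has shape (N, 1) and broadcasts against the
    (N, 3) velocity array; np.sum then reduces over the x, y, z axis.
    """
    return 0.5 * np.sum(mass[:, np.newaxis] * velocity ** 2, axis=1)


loop_result = kinetic_energy_loop(mass, velocity)
vectorized_result = kinetic_energy_vectorized(mass, velocity)
```

Both functions return the same values; timing them (for example with `%timeit`) typically shows the vectorized version to be much faster because the loop runs in compiled NumPy code rather than in the Python interpreter.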

When this fails or is insufficient, Cython or Numba can be used to
compile the slow parts of the code while keeping them directly callable
from Python. Cython is its own language, similar to Python with some
C-like constructs. Numba uses just-in-time compilation on standard
Python functions that use a supported subset of Python features.
Compiled backends in other languages can be used as well, as long as
they can be called from Python.
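As a sketch of the Numba route, the hypothetical function `count_neighbors` below counts the particles within a cutoff of a point in a cubic periodic box using explicit loops, the style Numba compiles well, and is compiled with `numba.njit` when Numba is installed. The fallback decorator that leaves the function as plain Python when Numba is missing is an assumption added only so the snippet runs without Numba.

```python
import math

import numpy as np

try:
    from numba import njit
except ImportError:  # Numba not installed: run the function as plain Python.
    def njit(func):
        return func


@njit
def count_neighbors(positions, point, r_cut, box_length):
    """Count particles within r_cut of point in a cubic periodic box."""
    r_cut_sq = r_cut * r_cut
    count = 0
    for i in range(positions.shape[0]):
        dist_sq = 0.0
        for d in range(3):
            delta = positions[i, d] - point[d]
            # Minimum image convention for a cubic box of side box_length.
            delta -= box_length * math.floor(delta / box_length + 0.5)
            dist_sq += delta * delta
        if dist_sq < r_cut_sq:
            count += 1
    return count


positions = np.array([
    [0.0, 0.0, 0.0],    # 24.5 away from the point: outside the cutoff
    [-23.0, 0.0, 0.0],  # 1.5 away: inside
    [24.0, 0.0, 0.0],   # 1.5 away through the periodic boundary: inside
])
point = np.array([-24.5, 0.0, 0.0])
n_inside = count_neighbors(positions, point, 2.5, 50.0)  # 2
```

The first call to a Numba-compiled function includes compilation time, so its speed-up shows only on repeated calls.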

These guidelines apply to any scientific Python code. We now turn to
HOOMD-blue specifics. When accessing state information, consider using
local snapshots (i.e. `hoomd.State.cpu_local_snapshot` and
`hoomd.State.gpu_local_snapshot`). Local snapshots provide faster access
to the simulation's state information because they neither copy data
nor gather it across MPI ranks. They also support in-place modification,
which makes setting data faster as well. A full explanation of local
snapshots will be given in a future tutorial.
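The local snapshot API itself needs a running HOOMD-blue simulation, so the sketch below uses plain NumPy as a stand-in to show why avoiding copies matters: writing through a view changes the underlying buffer directly, while writing to a copy leaves the original untouched, so the whole array would have to be written back. The `velocity` array is a hypothetical stand-in for per-particle data, not a HOOMD-blue object.

```python
import numpy as np

# Stand-in for per-particle data owned by the simulation.
velocity = np.zeros((1000, 3))

# A view shares memory with the original array: no data is copied.
velocity_view = velocity[:]
velocity_view[5] = [1.0, 2.0, 3.0]  # modifies `velocity` in place

# A copy allocates new memory: changes do not reach the original.
velocity_copy = velocity.copy()
velocity_copy[6] = [4.0, 5.0, 6.0]  # `velocity` is unchanged

print(np.shares_memory(velocity, velocity_view))  # True
print(np.shares_memory(velocity, velocity_copy))  # False
```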

Further, when accessing object properties such as
`hoomd.md.pair.LJ.energies`, store the data in a variable (e.g.
`energies = lj.energies`) if it is needed in multiple places. This
avoids recalculating the quantity, or gathering it across MPI ranks,
multiple times.
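To make the cost visible, the hypothetical `ExpensiveForce` class below mimics a property such as `energies` that does work on every access (in HOOMD-blue this may mean computing or gathering data); a counter records how often that work happens with and without storing the result in a local variable.

```python
import numpy as np


class ExpensiveForce:
    """Hypothetical stand-in for an object whose property is costly to read."""

    def __init__(self, n_particles):
        self._n_particles = n_particles
        self.n_evaluations = 0

    @property
    def energies(self):
        # Every access redoes the (pretend) expensive work.
        self.n_evaluations += 1
        return np.full(self._n_particles, -0.5)


force = ExpensiveForce(n_particles=1000)

# Repeated access: the property is evaluated three times.
total = force.energies.sum()
minimum = force.energies.min()
maximum = force.energies.max()
evaluations_repeated = force.n_evaluations

# Cached access: evaluate once and reuse the local variable.
force.n_evaluations = 0
energies = force.energies
total, minimum, maximum = energies.sum(), energies.min(), energies.max()
evaluations_cached = force.n_evaluations
```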

## Improve InsertEnergyUpdater

As an example, we will improve the performance of the
`InsertEnergyUpdater` from the previous tutorial. Specifically, we will
change it to use `cpu_local_snapshot` to update the particle velocity.
This example is slightly more complicated than the previous version.


In [2]:
class InsertEnergyUpdater(hoomd.custom.Action):
    def __init__(self, energy):
        # Use the property setter so plain numbers are wrapped in a
        # hoomd.variant.Constant.
        self.energy = energy
        
    @property
    def energy(self):
        return self._energy
    
    @energy.setter
    def energy(self, new_energy):
        if isinstance(new_energy, Number):
            self._energy = hoomd.variant.Constant(new_energy)
        elif isinstance(new_energy, hoomd.variant.Variant):
            self._energy = new_energy
        else:
            raise ValueError(
                "energy must be a variant or real number.")

    def attach(self, simulation):
        self._state = simulation.state
        self._comm = simulation.device.communicator

    def detach(self):
        del self._state
        del self._comm
    
    def act(self, timestep):
        # With multiple MPI ranks, this assumes every rank draws the same
        # random numbers (e.g. the same NumPy seed on each rank);
        # otherwise the ranks may disagree on which particle to modify.
        part_tag = np.random.randint(self._state.N_particles)
        direction = self._get_direction()
        energy = self.energy(timestep)
        with self._state.cpu_local_snapshot as snap:
            # Restrict the computation to the MPI rank containing the
            # particle, if applicable. Checking the number of ranks
            # first lets us skip searching the local tag array when
            # running on a single rank.
            if (self._comm.num_ranks <= 1
                    or part_tag in snap.particles.tag):
                i = snap.particles.rtag[part_tag]
                mass = snap.particles.mass[i]
                magnitude = np.sqrt(2 * energy / mass)
                velocity = direction * magnitude
                old_velocity = snap.particles.velocity[i]
                new_velocity = old_velocity + velocity
                snap.particles.velocity[i] = new_velocity
            
    @staticmethod
    def _get_direction():
        theta, z = np.random.rand(2)
        theta *= 2 * np.pi
        z = 2 * (z - 0.5)
        return np.array([
            np.sqrt(1 - (z * z)) * np.cos(theta),
            np.sqrt(1 - (z * z)) * np.sin(theta),
            z
        ])
    

def create_insert_energy_updater(trigger, *args, **kwargs):
    return hoomd.update.CustomUpdater(
        action=InsertEnergyUpdater(*args, **kwargs),
        trigger=trigger)
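
`_get_direction` draws a unit vector uniformly distributed on the sphere by sampling the azimuthal angle uniformly on $[0, 2\pi)$ and $z$ uniformly on $[-1, 1)$; uniform $z$ corresponds to uniform surface area. The standalone sketch below reimplements the same construction as a hypothetical vectorized helper, `random_directions`, and checks that the results are unit vectors whose components average to roughly zero.

```python
import numpy as np


def random_directions(n, rng):
    """Draw n unit vectors uniformly distributed on the sphere.

    Same construction as InsertEnergyUpdater._get_direction, vectorized:
    the azimuthal angle is uniform on [0, 2*pi) and z is uniform on [-1, 1).
    """
    theta = 2 * np.pi * rng.random(n)
    z = 2 * (rng.random(n) - 0.5)
    radial = np.sqrt(1 - z * z)  # radius of the circle at height z
    return np.column_stack(
        (radial * np.cos(theta), radial * np.sin(theta), z))


rng = np.random.default_rng(seed=0)
directions = random_directions(100_000, rng)
norms = np.linalg.norm(directions, axis=1)
mean_direction = directions.mean(axis=0)
```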

In [3]:
# Create and add our custom updater
energy_inserter = create_insert_energy_updater(
    trigger=100, energy=energy)

sim.operations += energy_inserter
sim.run(1000)

 kinetic_energy  potential_energy   total_energy  
   5869.76346       -44.92613        5824.83733   
   7038.32376       -260.58602       6777.73774   
   8185.01720       -402.79031       7782.22689   
   9374.63016       -488.57309       8886.05708   
  10404.02432       -512.89721       9891.12711   
  11344.17493       -489.22638      10854.94855   
  12380.28109       -482.09776      11898.18333   
  13335.80749       -531.34413      12804.46336   
  14402.39507       -517.49419      13884.90088   
  15363.17855       -456.17111      14907.00745   


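As the output shows, the updater still works. Even at a system size of
1000 particles, this version is already significantly faster than the
previous one, because modifying a particle through the local snapshot
takes $O(1)$ time, whereas working with a full snapshot copies (and,
under MPI, gathers) the data of every particle.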
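The difference in time complexity can be illustrated without HOOMD-blue. The hypothetical functions below update one particle's velocity either by copying the entire array, modifying the copy, and writing it all back (analogous to getting and setting a full snapshot), or by writing a single row in place (analogous to a local snapshot). Both produce the same result, but the first does work proportional to the number of particles.

```python
import timeit

import numpy as np


def update_via_full_copy(velocity, index, delta_v):
    """O(N): copy all particles out, modify one, copy everything back."""
    snapshot = velocity.copy()
    snapshot[index] += delta_v
    velocity[:] = snapshot


def update_in_place(velocity, index, delta_v):
    """O(1): modify a single particle directly in the existing buffer."""
    velocity[index] += delta_v


delta_v = np.array([0.1, 0.0, 0.0])
for n_particles in (1_000, 100_000):
    velocity = np.zeros((n_particles, 3))
    t_copy = timeit.timeit(
        lambda: update_via_full_copy(velocity, 7, delta_v), number=200)
    t_inplace = timeit.timeit(
        lambda: update_in_place(velocity, 7, delta_v), number=200)
    print(f"N={n_particles}: full copy {t_copy:.2e} s, "
          f"in place {t_inplace:.2e} s")
```

The exact timings depend on the machine, but the full-copy time grows with `n_particles` while the in-place time stays roughly constant.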

This concludes the tutorial on custom actions in Python. For
more information see the API documentation.