<p style="font-size:32px; font-weight: bolder; text-align: center"> Molecular dynamics and sampling </p>

This notebook provides a hands-on counterpart to the "Molecular dynamics and sampling" lecture for the MOOC "Path Integrals in Atomistic Modeling". If you haven't done so already, check the [getting started](0-getting_started.ipynb) notebook to make sure that the software infrastructure is up and running. 

The different sections in this notebook match the parts this lecture is divided into:

1. [Thermodynamics and phase-space sampling](#thermo-and-sampling)
2. [Molecular dynamics and integrators](#integrators)
3. [Efficiency of sampling](#sampling-efficiency)
4. [Langevin dynamics](#langevin)

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import ase, ase.io
import chemiscope
import pimdmooc
pimdmooc.add_ipi_paths()

<a id="thermo-and-sampling"> </a>

# Thermodynamics and phase-space sampling

We consider a (classical) harmonic oscillator with frequency $\omega$ and unit mass, in the constant-temperature ensemble at inverse temperature $\beta$.

In [None]:
def pot_sho(q, omega=1):
    """ The potential for a simple harmonic oscillator with frequency omega and unit mass"""
    return omega**2*q**2/2
def kin_sho(p):
    """ The kinetic energy for a particle with unit mass"""
    return p**2/2
def ham_sho(p, q, omega=1):
    """ The Hamiltonian for the simple harmonic oscillator """
    return kin_sho(p) + pot_sho(q, omega)

In [None]:
beta = 1
omega0 = 2

## Integration on a grid

First, we compute the partition function and observables by explicit integration on a grid, using the simple rectangle rule.
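
Written out, with grid points $p_i$, $q_j$ and spacings $\Delta p$, $\Delta q$, the rectangle rule approximates the integral by a plain sum. The classical partition function is taken here without any normalization prefactor, matching the code that follows:

$$
Z = \int \mathrm{d}p\,\mathrm{d}q\; e^{-\beta H(p,q)} \approx \sum_{i,j} e^{-\beta H(p_i,q_j)}\,\Delta p\,\Delta q
$$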

In [None]:
ngrid = 16

pgrid = np.linspace(-10,10,ngrid)
dp = pgrid[1]-pgrid[0]

qgrid = np.linspace(-10,10,ngrid)
dq = qgrid[1]-qgrid[0]

pqgrid = np.meshgrid(pgrid, qgrid)

In [None]:
ham_grid = ham_sho(pqgrid[0], pqgrid[1], omega=omega0)

In [None]:
Z = np.exp(-ham_grid*beta).sum()*dp*dq
print("Partition function: ", Z)

Note that the partition function (and the probability) can be factorized into a momentum part and a position part.
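
The factorization follows because the Hamiltonian is a sum of a term that depends only on $p$ and one that depends only on $q$, $H(p,q) = K(p) + V(q)$:

$$
Z = \int \mathrm{d}p\, e^{-\beta K(p)} \int \mathrm{d}q\, e^{-\beta V(q)} = Z_p Z_q, \qquad P(p,q) = \frac{e^{-\beta K(p)}}{Z_p}\,\frac{e^{-\beta V(q)}}{Z_q}
$$

The same identity holds exactly for the rectangle sums on a product grid, so the difference computed in the next cell should vanish up to floating-point rounding.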

In [None]:
pot_grid = pot_sho(qgrid, omega=omega0)
kin_grid = kin_sho(pgrid)
Zp = np.exp(-kin_grid*beta).sum()*dp
Zq = np.exp(-pot_grid*beta).sum()*dq
Z - Zp*Zq

The mean potential and kinetic energy can be computed as a weighted mean, and again one can equally well compute it on just the relevant variables

In [None]:
pot_mean = (pot_sho(pqgrid[1], omega=omega0)* np.exp(-ham_grid*beta)).sum()*dp*dq/Z
print("Average potential: ", pot_mean)

kin_mean = (kin_sho(pqgrid[0])* np.exp(-ham_grid*beta)).sum()*dp*dq/Z
print("Average kinetic:   ", kin_mean)

In [None]:
pot_mean_q = (pot_grid*np.exp(-pot_sho(qgrid, omega=omega0)*beta)).sum()*dq/Zq
pot_mean - pot_mean_q

In [None]:
kin_mean_p = (kin_grid*np.exp(-kin_sho(pgrid)*beta)).sum()*dp/Zp
kin_mean - kin_mean_p

<p style="color:blue; font-weight:bold"> What is the expected value of $Z$, $\langle V \rangle$, $\langle K \rangle$? Experiment with different `ngrid` parameters to see how many grid points you need to converge the values to roughly 1%. </p>

## Stochastic integration

Naively, we can generate uniform random samples over a large interval, and compute the average by Monte Carlo integration.
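
With $N$ points $q_i$ drawn uniformly from the interval, both the numerator and the denominator of the thermal average are estimated as sample means, and the length of the interval cancels in the ratio:

$$
\langle V \rangle = \frac{\int \mathrm{d}q\, V(q)\, e^{-\beta V(q)}}{\int \mathrm{d}q\, e^{-\beta V(q)}} \approx \frac{\frac{1}{N}\sum_i V(q_i)\, e^{-\beta V(q_i)}}{\frac{1}{N}\sum_i e^{-\beta V(q_i)}}
$$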

In [None]:
nmc = 1000

In [None]:
qmc = np.random.uniform(-10,10,size=(nmc))

In [None]:
pot_mc = pot_sho(qmc, omega=omega0)
prob_mc = np.exp(-pot_mc * beta)
pot_mean = (prob_mc*pot_mc).mean() / prob_mc.mean()
print("Average potential: ", pot_mean)

Most of the samples are "wasted" over low-probability regions

In [None]:
plt.hist(prob_mc, bins=100)
plt.xlabel("weight"); plt.ylabel("counts");

<p style="color:blue; font-weight:bold">Repeat the calculation with different numbers of random samples, to get a feeling for the statistical uncertainty and the convergence behavior. What would happen if you reduced the range of the sampling interval to a narrower region around $0$?</p>

You could also rather easily wrap the generation and evaluation in a function to compute more quantitatively the uncertainty over multiple executions.
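
A minimal sketch of such a wrapper is given below. It is self-contained, so it redefines the harmonic potential inline rather than reusing `pot_sho`; the function names `mc_mean_potential` and `mc_uncertainty`, and the choice of a seeded `numpy` generator, are illustrative assumptions and not part of the course material.

```python
import numpy as np

def mc_mean_potential(nmc, beta=1.0, omega=2.0, qmax=10.0, rng=None):
    """One uniform-sampling Monte Carlo estimate of <V> for the harmonic oscillator."""
    rng = np.random.default_rng() if rng is None else rng
    q = rng.uniform(-qmax, qmax, size=nmc)      # uniform samples in [-qmax, qmax]
    pot = omega**2 * q**2 / 2                   # harmonic potential, unit mass
    weight = np.exp(-beta * pot)                # unnormalized Boltzmann weight
    return (weight * pot).mean() / weight.mean()

def mc_uncertainty(nmc, nrepeat=100, seed=0, **kwargs):
    """Repeats the estimate nrepeat times; returns the mean and the spread of the estimates."""
    rng = np.random.default_rng(seed)
    estimates = np.array(
        [mc_mean_potential(nmc, rng=rng, **kwargs) for _ in range(nrepeat)]
    )
    return estimates.mean(), estimates.std(ddof=1)

# the spread of independent estimates measures the statistical uncertainty
for nmc in (100, 1000, 10000):
    mean, std = mc_uncertainty(nmc)
    print(f"nmc = {nmc:6d}: <V> = {mean:.4f} +/- {std:.4f}")
```

Passing a different `qmax` (for instance `mc_uncertainty(1000, qmax=2.0)`) is one way to explore the effect of the sampling interval.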

## Importance sampling

To avoid wasting samples on low-probability regions, we can generate a sequence of configurations that are distributed according to the target probability. 
This uses a Metropolis Monte Carlo scheme, which is not explained in the course. The [original publication](http://doi.org/10.1063/1.1699114) is a classic, and very accessible.

In short, the algorithm works by first _proposing_ a change to the configuration, in a way that is symmetric $u(q_0 \rightarrow q_1) = u(q_1\rightarrow q_0)$. Here we take a random step between $-\Delta q$ and $\Delta q$. 

The probabilities in the initial and final state are then compared, and an _acceptance_ criterion is applied to actually update the position, or to keep the system in $q_0$. The overall probability of making a move is the product of the proposal and acceptance probabilities, $p(q_0\rightarrow q_1) = u(q_0\rightarrow q_1) a(q_0\rightarrow q_1)$.
The criterion is designed to satisfy the detailed-balance condition $P(q_0) p(q_0\rightarrow q_1) = P(q_1) p(q_1\rightarrow q_0)$.
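
Written for this one-dimensional example, the Metropolis choice of acceptance probability is

$$
a(q_0\rightarrow q_1) = \min\left[1, \frac{P(q_1)}{P(q_0)}\right] = \min\left[1, e^{-\beta\left(V(q_1)-V(q_0)\right)}\right]
$$

To check detailed balance, take $P(q_1) \le P(q_0)$ (the opposite case follows by swapping the labels). Then $a(q_0\rightarrow q_1) = P(q_1)/P(q_0)$ and $a(q_1\rightarrow q_0) = 1$, so

$$
P(q_0)\, u(q_0\rightarrow q_1)\, \frac{P(q_1)}{P(q_0)} = P(q_1)\, u(q_0\rightarrow q_1) = P(q_1)\, u(q_1\rightarrow q_0)\cdot 1
$$

Only the ratio of probabilities enters, so the partition function is never needed.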

In [None]:
def metropolis_step(q0, step=1):
    """Performs one step in a Monte Carlo procedure, following the Metropolis scheme, cf.
    N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller, 
    "Equation of State Calculations by Fast Computing Machines," 
    Journal of Chemical Physics 21(6), 1087–1092 (1953).
    """
    
    # NB: this implementation recomputes the potential at each step, which is 
    # very wasteful - potential could be stored and reused between steps
    pot0 = pot_sho(q0, omega=omega0)
    
    # generates a random displacement (with symmetric probability)
    q1 = q0 + np.random.uniform(-1,1)*step
        
    pot1 = pot_sho(q1, omega=omega0)
    
    # computes the ratio of initial and final probabilities        
    pratio = np.exp((pot0-pot1)*beta)
    
    # accepts or rejects the move to enforce a detailed balance condition
    if (pratio > np.random.uniform(0,1)):
        return q1
    else:
        return q0

In [None]:
q = 0
nstep = 1000
mcstep = 1

traj_q = np.zeros(nstep)
for i in range(nstep):    
    q = metropolis_step(q, step=mcstep)
    traj_q[i] = q

The position fluctuates around equilibrium, and is distributed according to $P(q)$. Note that strictly speaking one should discard the first few steps as they are needed to reach equilibrium.   

In [None]:
plt.plot(traj_q)
plt.xlabel("step"); plt.ylabel("q / a.u.");

In [None]:
plt.hist(traj_q[10:], bins=100)
plt.xlabel("q / a.u.")
plt.ylabel("counts")

In [None]:
pot_mean = pot_sho(traj_q, omega = omega0).mean()
print("Average potential: ", pot_mean)

<p style="color:blue; font-weight:bold">
Change the magnitude of the step (variable `mcstep`) to 0.01 and to 100. 
What consideration can you make on the efficiency of the probability sampling process?
</p>
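
One way to make the comparison quantitative is to track the fraction of accepted moves. The sketch below is a self-contained variant of the loop above that also stores the potential between steps instead of recomputing it. The name `run_metropolis`, its arguments and the seeded generator are illustrative assumptions.

```python
import numpy as np

def run_metropolis(nstep, step, beta=1.0, omega=2.0, q0=0.0, seed=0):
    """Runs a Metropolis trajectory; returns the positions and the acceptance fraction."""
    rng = np.random.default_rng(seed)
    q = q0
    pot = omega**2 * q**2 / 2            # potential of the current state, reused between steps
    traj = np.zeros(nstep)
    naccept = 0
    for i in range(nstep):
        # symmetric proposal: uniform displacement in [-step, step]
        q_new = q + rng.uniform(-1, 1) * step
        pot_new = omega**2 * q_new**2 / 2
        # Metropolis acceptance on the ratio of Boltzmann weights
        if np.exp((pot - pot_new) * beta) > rng.uniform(0, 1):
            q, pot = q_new, pot_new
            naccept += 1
        traj[i] = q                      # a rejected move repeats the old position
    return traj, naccept / nstep

for step in (0.01, 1.0, 100.0):
    traj, acceptance = run_metropolis(1000, step)
    print(f"step = {step:6.2f}: acceptance = {acceptance:.3f}, std(q) = {traj.std():.3f}")
```

The acceptance fraction alone does not tell the whole story: it is worth looking at how far the trajectory actually travels in each case.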

This topic will be investigated further in [section 3](#sampling-efficiency)

<a id="integrators"></a>

# Molecular dynamics and integrators

This section requires use of i-PI, so make sure you have it installed and have familiarized yourself with how to run it in the [getting started](0-getting_started.ipynb) section. 

Here we will modify an existing i-PI input to run constant-energy simulations of a small box of liquid water, based on the q-TIP4P/f forcefield ([original paper](http://doi.org/10.1063/1.3167790)), then run short trajectories and inspect the output.

We will first go to the appropriate folder, create a copy of the template and edit it.  You will need to open a terminal and execute

```
$ cd pimd-mooc/1-md_sampling
$ cp template_integrator.xml input.xml
```

Edit the `input.xml` file. You can use `vi` in the terminal, or open the file with the file editing interface of Jupyter/Jupyterlab. 

First, we are going to set up a rather "by the book" simulation - a conservative time step for liquid water is of the order of 0.5 fs. Look for the time step specification and edit it so it reads `<timestep units='femtosecond'> 0.5  </timestep>`. You should also set the output prefix to a memorable name - it is recommended to use `<output prefix='md-ts_0.5'>` to be compatible with the postprocessing in this notebook.

Then launch i-PI and the driver - either using two terminals or putting i-PI in the background.

```
$ i-pi input.xml &> log &
$ i-pi-driver -u -h driver -m qtip4pf 
```

We can now load the output and plot it. The potential and kinetic energy fluctuate wildly (over an energy scale of a significant fraction of a Hartree), while the conserved quantity, i.e. the total energy $H$, stays essentially constant on the same scale.

In [None]:
ts_05 = pimdmooc.read_ipi_output('1-md_sampling/md-ts_0.5.out')

In [None]:
plt.plot(ts_05["time"], ts_05["potential"], label="V")
plt.plot(ts_05["time"], ts_05["conserved"], label="H")
plt.plot(ts_05["time"], ts_05["kinetic_md"], label="K")
plt.xlabel("time / ps"); plt.ylabel("energy / a.u."); 
plt.legend()

Now modify `input.xml` to run a simulation with timestep of 1.0 fs, 1.5 fs and 2.0 fs, running each time a separate simulation; make sure to also change the output `prefix` accordingly - use e.g. `'md-ts_2.0'` as format, or adjust the plotting cells below to reflect your naming scheme. 

_NB: wait for each calculation to be finished before launching another one - otherwise, you will have to give different names to each of the socket files_
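
Editing the file by hand several times is error-prone, so the changes can also be scripted. The sketch below uses Python's standard `xml.etree.ElementTree` on a hypothetical, heavily trimmed stand-in for the template: the real `template_integrator.xml` contains many more entries, and its exact layout is an assumption here. `make_input` is an illustrative helper, not part of i-PI.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed stand-in for template_integrator.xml:
# only the two entries edited in this exercise are shown.
template = """<simulation>
  <output prefix='simulation'/>
  <system>
    <motion mode='dynamics'>
      <dynamics mode='nve'>
        <timestep units='femtosecond'> 0.25 </timestep>
      </dynamics>
    </motion>
  </system>
</simulation>"""

def make_input(xml_text, timestep_fs):
    """Returns a copy of the input with the time step and output prefix replaced."""
    root = ET.fromstring(xml_text)
    # output prefix following the naming scheme used in this notebook
    root.find("output").set("prefix", f"md-ts_{timestep_fs:.1f}")
    # first <timestep> element found anywhere in the tree
    ts = next(root.iter("timestep"))
    ts.set("units", "femtosecond")
    ts.text = f" {timestep_fs} "
    return ET.tostring(root, encoding="unicode")

inputs = {ts: make_input(template, ts) for ts in (1.0, 1.5, 2.0)}
print(inputs[2.0])
```

To apply this to the actual file one would read its text first and write each result to a separate input file. Note that `ElementTree` discards XML comments by default, so the rewritten file will not be identical to a hand-edited one.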

After running the simulations, we can inspect the behavior of the conserved quantity to check for the accuracy of integration

In [None]:
ts_10 = pimdmooc.read_ipi_output('1-md_sampling/md-ts_1.0.out')
ts_15 = pimdmooc.read_ipi_output('1-md_sampling/md-ts_1.5.out')
ts_20 = pimdmooc.read_ipi_output('1-md_sampling/md-ts_2.0.out')

In [None]:
plt.plot(ts_20["time"], ts_20["conserved"], label=r"$\Delta t = 2.0$ fs")
plt.plot(ts_15["time"], ts_15["conserved"], label=r"$\Delta t = 1.5$ fs")
plt.plot(ts_10["time"], ts_10["conserved"], label=r"$\Delta t = 1.0$ fs")
plt.plot(ts_05["time"], ts_05["conserved"], label=r"$\Delta t = 0.5$ fs")
plt.xlabel("time / ps"); plt.ylabel("energy / a.u."); 
plt.ylim(-0.04,-0.02)
plt.legend()

In [None]:
plt.plot(ts_20["time"], ts_20["potential"], label=r"$\Delta t = 2.0$ fs")
plt.plot(ts_15["time"], ts_15["potential"], label=r"$\Delta t = 1.5$ fs")
plt.plot(ts_10["time"], ts_10["potential"], label=r"$\Delta t = 1.0$ fs")
plt.plot(ts_05["time"], ts_05["potential"], label=r"$\Delta t = 0.5$ fs")
plt.xlabel("time / ps"); plt.ylabel("energy / a.u."); 
plt.ylim(-0.3,-0.1)
plt.legend()

<p style="color:blue; font-weight:bold">
Note the sharp change in behavior of the conserved quantity. Plot also the potential and the kinetic energy separately. Is the signal as clear?
</p>
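
To put a number on the change in behavior, one option is to fit a straight line to the conserved quantity and report the slope (drift) together with the fluctuations around the fit. The sketch below runs on synthetic data so that it is self-contained; `drift_and_fluctuations` is an illustrative name, and in the notebook one would pass arrays such as `ts_05["time"]` and `ts_05["conserved"]` instead.

```python
import numpy as np

def drift_and_fluctuations(time, conserved):
    """Linear drift (slope) of the conserved quantity and the spread around the fit."""
    slope, intercept = np.polyfit(time, conserved, 1)
    residual = conserved - (slope * time + intercept)
    return slope, residual.std()

# synthetic stand-in for a simulation output: small drift plus noise
rng = np.random.default_rng(0)
time = np.linspace(0.0, 1.0, 1001)
conserved = -0.03 + 1e-3 * time + 1e-5 * rng.standard_normal(time.size)

slope, fluct = drift_and_fluctuations(time, conserved)
print(f"drift: {slope:.2e} per unit time, fluctuations: {fluct:.2e}")
```

A linear fit is only meaningful while the trajectory is stable; it says little about a run in which the energy diverges.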

You can also spot the most severe problems by plotting the trajectories. Many MD problems show up quite dramatically when looking at the motion of the atoms!

In [None]:
trajectory_05 = pimdmooc.read_ipi_xyz("1-md_sampling/md-ts_0.5.pos_0.xyz")

In [None]:
chemiscope.show(frames = trajectory_05, properties = dict(
   time = ts_05["time"][::10],
   potential = ts_05["potential"][::10],
   conserved = ts_05["conserved"][::10]
))

_NB: molecules appear to be spread all over the place because they can move outside of the periodic boundaries (show the unit cell by selecting the appropriate option in the visualization menu)_

In [None]:
# try to fold the atoms back into the supercell, and replot above
for f in trajectory_05:
    f.wrap(pbc=[1,1,1])    

For the large-timestep trajectory the integration blows up and the positions quickly become NaN, so we count the valid frames and only visualize those.

In [None]:
trajectory_20 = pimdmooc.read_ipi_xyz("1-md_sampling/md-ts_2.0.pos_0.xyz")
n_ok = 0
for f in trajectory_20:
    if not np.isnan(f.positions.sum()):
        n_ok+=1

In [None]:
chemiscope.show(frames = trajectory_20[:n_ok], properties = dict(
   time = ts_20["time"][::10][:n_ok],
   potential = ts_20["potential"][::10][:n_ok],
   conserved = ts_20["conserved"][::10][:n_ok]
))

<a id="sampling-efficiency"></a>

# Efficiency of sampling

<a id="langevin"> </a>

# Langevin dynamics

## Installation

The notebooks associated with this course rely on some basic Python packages. If executing the following cell returns any errors, you should install the corresponding packages, e.g. using 

```
pip install -U numpy matplotlib ase chemiscope
```

`pimdmooc.py` is a small utility package that is present in the root folder of this repository.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import ase, ase.io
import chemiscope
import pimdmooc
pimdmooc.add_ipi_paths()

You should also have [i-PI](https://ipi-code.org) installed, and accessible from the path. This might require some more effort: if everything is configured correctly the following cell should return a prompt, and a message stating that 

```Simulation has already run for total_steps, will not even start. Modify total_steps or step counter to continue.```

In [None]:
!i-pi 0-getting_started/do_nothing.xml

... and the following cell should generate a help string explaining the syntax for running `i-pi-driver`

In [None]:
!i-pi-driver

If you get an error stating that the command has not been found, please open a terminal and follow these instructions.

_NB: this will install i-PI in your home folder, which will make these notebooks work out-of-the-box. If you know what you are doing, you can personalize the installation as long as `i-pi` and `i-pi-driver` are accessible from the path._

1. clone the i-PI repository

``` 
$ git clone https://github.com/i-pi/i-pi.git 
```

2. add the i-pi folder to the default path

```
$ echo ". ~/i-pi/env.sh" >> ~/.bashrc
```

3. compile the driver files 

```
$ cd i-pi/drivers/f90/ && make
```

4. try to execute again the cells above

## Running i-PI

i-PI works according to a client-server protocol, in which i-PI acts as the server and runs the advanced MD simulation, while the evaluation of energies and forces is delegated to an external code that acts as the client.

![a scheme of the i-PI client-server model](figures/ipi-scheme.png)

Thus, to run i-PI you need to launch (at least) two processes: `i-pi` and a driver -- here we use a minimalistic FORTRAN tool that can compute energy and forces according to a number of simple potential energy models. 

1. open *two* terminals
2. in the first terminal launch `i-pi`

```
$ cd pimd-mooc/0-getting_started
$ i-pi input.xml
```

3. in the second terminal, launch the driver

```
$ i-pi-driver -u -h driver -m pswater -v
```

_NB: `i-pi-driver` does not need input files nor generate outputs, so you can run it from any folder_

Both programs run in verbose mode, so you can see the communication logs between the two. 

```
 @SOCKET:   Client asked for connection from . Now hand-shaking.
 @SOCKET:   Handshaking was successful. Added to the client list.
 @SOCKET: 21/09/17-15:49:35 Assigning [ none] request id    0 to client with last-id None (  0/  1 : )
 @SOCKET: 21/09/17-15:49:35 Assigning [match] request id    0 to client with last-id    0 (  0/  1 : )
 # Average timings at MD step       0. t/step: 4.08595e-02
 @SOCKET: 21/09/17-15:49:35 Assigning [match] request id    0 to client with last-id    0 (  0/  1 : )
 # Average timings at MD step       1. t/step: 3.96178e-02
 @SOCKET: 21/09/17-15:49:35 Assigning [match] request id    0 to client with last-id    0 (  0/  1 : )
```

and 

```
  Message from server: STATUS
  Message from server: POSDATA
  Message from server: STATUS
  Message from server: GETFORCE
  Message from server: STATUS
  Message from server: STATUS
  Message from server: POSDATA
  Message from server: STATUS
  Message from server: GETFORCE
  Message from server: STATUS
```

You can try to kill the driver with `CTRL+C` and see what happens. `i-pi` should stop and wait for another client to connect: if you launch `i-pi-driver` again, the simulation will continue. This robust management of multiple clients allows a trivial level of parallelism when an advanced MD simulation requires the calculation of multiple replicas, as we will see in the following exercises.

_NB: if `i-pi` exits abruptly (e.g. by closing down a shell) it will leave a UNIX domain socket file in `/tmp/`, named `/tmp/ipi_NAME`. If you launch i-PI again it will exit with an error message similar to_

```
Error opening unix socket. Check if a file /tmp/ipi_driver exists, and remove it if unused.
```

_needless to say: if you are reasonably confident this has been left around by a previous run, follow the instructions and remove the file._

### i-PI input file format

i-PI uses XML-formatted input files, which describe how the simulation is set up, where to get energy and forces from, and how to output the results of the simulation. In this course we will mostly use prepared input files, where the meaning of the parameters for each specific application will be explained only in relation to the concepts being covered. If you want to learn more about i-PI, you can visit the [website](https://ipi-code.org) or read the [documentation](https://ipi-code.org/i-pi/).

In [None]:
!cat 0-getting_started/input.xml   # the meaning of most of these options will become clear as the course progresses

## Analyzing the results

After you have run your simulations, you can look into the output files, load them and visualize them straight from the notebooks. i-PI does not have a pre-defined output format, and each run can be configured to output multiple files with different content and strides. 

The `<properties>` outputs contain properties of the system as a whole, such as the timestamp, temperature or potential energy of the system. The header of the file contains a summary of its content. In this simple example, the output is printed to `simulation.out`. _NB: Unless explicitly specified, *all* quantities read and output by i-PI are in Hartree atomic units_
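
Since energies come out in Hartree unless other units are requested, a small conversion helper can be handy when comparing with values quoted elsewhere. The sketch below hard-codes rounded conversion factors; the names `FROM_ATOMIC_UNITS` and `from_atomic_units` are illustrative and not part of i-PI or `pimdmooc`.

```python
# Rounded conversion factors from Hartree atomic units
FROM_ATOMIC_UNITS = {
    "eV": 27.211386,        # energy: 1 Hartree in electronvolt
    "kJ/mol": 2625.4996,    # energy: 1 Hartree in kJ/mol
    "K": 315775.02,         # energy divided by the Boltzmann constant
    "fs": 0.024188843,      # time: 1 atomic unit of time in femtoseconds
    "angstrom": 0.52917721, # length: 1 Bohr radius in angstrom
}

def from_atomic_units(value, unit):
    """Converts a value in atomic units to the requested unit."""
    return value * FROM_ATOMIC_UNITS[unit]

print(from_atomic_units(0.001, "kJ/mol"))   # one milli-Hartree in kJ/mol
print(from_atomic_units(1.0, "fs"))         # one atomic unit of time in fs
```

The same helper works elementwise on `numpy` arrays, so a whole column of an output file can be converted in one call.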

In [None]:
!head 0-getting_started/simulation.out

In [None]:
simulation_data = np.loadtxt("0-getting_started/simulation.out")

In [None]:
plt.plot(simulation_data[:,0], simulation_data[:,1])
plt.xlabel("time / ps"); plt.ylabel("temperature / K")

Atomic positions and properties are dumped to files based on the settings given in a `<trajectory>` tag. 
Here we use a combined visualizer called `chemiscope`, but obviously you can use alternative tools for this purpose.

In [None]:
trajectory_data = pimdmooc.read_ipi_xyz("0-getting_started/simulation.pos_0.xyz")

`chemiscope` allows you to visualize simultaneously properties and configurations

In [None]:
chemiscope.show(frames=trajectory_data, 
                properties=dict(
                    time=simulation_data[::10,0], 
                    temperature=simulation_data[::10,1],
                    potential=simulation_data[::10,3]
                               )
               )

Given that it is sometimes not trivial to get a Jupyter widget to load properly, if you have problems opening the visualization above you can also export it as a `.json` file that can be loaded on [chemiscope.org](https://chemiscope.org).

In [None]:
chemiscope.write_input("0-example.json.gz",
                frames=trajectory_data, 
                properties=dict(
                    time=simulation_data[::10,0], 
                    temperature=simulation_data[::10,1],
                    potential=simulation_data[::10,3]
                               )
               )
