The purpose of this tutorial is to give an introduction to Hamiltonian Monte Carlo and deep neural networks. We will begin by going through the basic concepts from mechanics and probability theory that are required for understanding the subject.



## 1. Neural networks

## 2. Convolutional neural networks / deep learning (CNN)

## 3. Training neural networks: back propagation

## 4. CLASSICAL / NEWTONIAN MECHANICS
In classical or Newtonian mechanics, the elementary relation between an object and a force exerted on it is described by Newton's second law:

$$F=ma,$$

which relates the force $F$ exerted on an object to its mass $m$ and acceleration $a$. Force is measured in newtons (N), and one newton can be interpreted in the following way: "One newton is the force required to cause an acceleration of $1 \frac{m}{s^2}$ on an object with a mass of 1 kg". To give an example, say we have an object with a mass of 10 kg and we wish to accelerate it at $5\frac{m}{s^2}$. We thus need to apply the force:

$$F = 10\cdot 5 \;\frac{kg\cdot m}{s^2} = 50 N,$$

on the object. So far, we have been talking about a single object or particle, but what if we have a system of particles or objects? How can we model the dynamics of such a system? We simply sum up all the forces exerted on the system (that is, on all of its particles), so the dynamics of a group of $n$ particles is described by the equation:

$$\sum_{i=1}^n F_i = \sum_{i=1}^n m_i a_i.$$

This equation includes the internal and external forces on the system as well as the constraint forces. Our task now is to use this equation to explain the dynamics of the system. One can quickly agree, however, that this is a rather difficult problem: we have a multitude of particles on our hands, with many different forces acting on them. It is thus not easy to explain the dynamics of such a system using Newtonian mechanics. This is where Lagrangian and Hamiltonian mechanics come in: they are equivalent to Newtonian mechanics in describing the dynamics of the system, but they often offer a much easier way to do so. They also make some properties of the dynamics, such as conserved quantities, easier to see than in the Newtonian formulation.
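As a small illustration of Newton's second law and its summed form, the sketch below evaluates $\sum_i m_i a_i$ for motion along one axis. The function name `net_force` and the three-particle values are made up for this example; only the 10 kg case comes from the text above.

```python
def net_force(masses, accelerations):
    """Return sum_i m_i * a_i in newtons (1-D motion, SI units assumed)."""
    if len(masses) != len(accelerations):
        raise ValueError("need one acceleration per mass")
    return sum(m * a for m, a in zip(masses, accelerations))


# Single object from the text: 10 kg accelerated at 5 m/s^2.
single = net_force([10.0], [5.0])

# Hypothetical system of three particles moving along the same axis.
system = net_force([1.0, 2.0, 3.0], [2.0, -1.0, 0.5])

print(f"single object: {single} N")
print(f"three particles: {system} N")
```

Even in this toy setting the accelerations of all particles have to be known in advance, which is exactly what becomes impractical for larger systems.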

### Example of Newtonian mechanics: Atwood machine

To give an example of Newtonian mechanics, we will consider the classical example of the Atwood machine (1784, by George Atwood). The Atwood machine is a laboratory experiment for verifying the mechanical laws of motion under constant acceleration.

In a basic Atwood machine example, we consider two objects with masses $m_1$ kg and $m_2$ kg, connected by an inextensible massless string over an ideal massless pulley. In this example, we wish to investigate the acceleration $a$ of the 'object-string-object'-system.

First of all, let's find the forces acting on the system. We assume, for the sake of simplicity, that the string exerts a constant force $T$ on each of the two objects. We also take the acceleration of the system to be positive ($a > 0$) when the object with mass $m_1$ is falling and negative ($a < 0$) when it is rising. In addition to the string force $T$, the forces acting on the objects are their weights due to gravity, that is:

$$W_1 = m_1 g\;\;\;\;\;\;\;\;\; W_2 = m_2 g,$$

where $g = 9.8 \;\frac{m}{s^2}$ is the acceleration due to gravity. So the total forces acting on the two objects are: 

$$W_1 - T= m_1 g - T = m_1 a\;\;\;\;\;\;\;\;\; T - W_2= T-m_2 g = m_2 a,$$

and if we add these two equations we get: 

$$m_1 g -T + T-m_2 g = m_1 a + m_2 a,$$

from which we easily get that the acceleration of the 'object-string-object'-system is: 

$$a = g\frac{m_1-m_2}{m_1+m_2}.$$

To make this more concrete with real numbers, let $m_1 = 1.1$ kg and $m_2 = 1$ kg. We thus get that the acceleration of the system is: 

$$a = 9.8 \;\frac{m}{s^2}\frac{(1.1 - 1) kg}{(1.1 + 1) kg} \approx 0.048 \times 9.8 \frac{m}{s^2} \approx 0.47 \frac{m}{s^2}.$$
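The calculation above can be reproduced with a few lines of Python. This is a minimal sketch: the function names are chosen for this example, and the tension formula is not stated in the text but follows from substituting $a$ back into $T - m_2 g = m_2 a$.

```python
def atwood_acceleration(m1, m2, g=9.8):
    """Acceleration of the Atwood machine, positive when m1 moves down."""
    if m1 <= 0 or m2 <= 0:
        raise ValueError("masses must be positive")
    return g * (m1 - m2) / (m1 + m2)


def atwood_tension(m1, m2, g=9.8):
    """String tension from T - m2*g = m2*a, i.e. T = m2 * (g + a)."""
    return m2 * (g + atwood_acceleration(m1, m2, g))


a = atwood_acceleration(1.1, 1.0)
T = atwood_tension(1.1, 1.0)
print(f"a = {a:.2f} m/s^2")
print(f"T = {T:.2f} N")
```

Swapping the two masses flips the sign of $a$, in line with the sign convention chosen above, and equal masses give $a = 0$.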

To get a better feel for the Atwood machine, please refer to the simulation by Andrew Duffy: [Atwood machine](http://physics.bu.edu/~duffy/HTML5/Atwoods_machine.html).

## 5. LAGRANGIAN MECHANICS

In mechanics, we are interested in the motion of objects: how fast a car drives, how the earth orbits the sun, the oscillating motion of a pendulum, etc. From a simple applied point of view, Lagrangian mechanics is just a different way to approach a given mechanical problem. Let us take as an example the motion of a pendulum. Our goal is to describe how the pendulum will move.

In Newtonian mechanics (the “normal” mechanics taught in high school), we would start by drawing a diagram with arrows for all the forces acting on the pendulum. We can then find how the pendulum moves by using Newton’s second law, $F=ma$. More generally, in Newtonian mechanics we take Newton’s three laws as fundamental laws of nature and try to derive everything else from them. The approach is centered around forces, since these are ultimately used to figure out the trajectories.

In Lagrangian mechanics, things work differently. To obtain the same result, we start by calculating the kinetic and potential energy of the pendulum. Instead of Newton’s three laws, we assume that there is another fundamental law of nature: the principle of least action. According to this principle, we can calculate the motion by minimizing a certain quantity, called the action, that is related to the two forms of energy mentioned above. Lagrangian mechanics therefore is centered around energies. Forces are no longer needed to determine the motion of objects.

Both variants of course lead to the same trajectory for the pendulum. It can be shown more generally that these two formalisms are equivalent. However, each is better suited to certain types of problems.
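To make the principle of least action more tangible, here is a rough numerical sketch. It uses a ball thrown straight up in uniform gravity rather than the pendulum, because its Newtonian trajectory is a simple parabola. The discretization of the action, the sine-shaped deformation, and all function names are assumptions made for this illustration only.

```python
import math


def discrete_action(path, dt, m=1.0, g=9.8):
    """Approximate S = integral of (T - V) dt for heights sampled every dt."""
    action = 0.0
    for y0, y1 in zip(path[:-1], path[1:]):
        velocity = (y1 - y0) / dt
        kinetic = 0.5 * m * velocity ** 2
        potential = m * g * 0.5 * (y0 + y1)  # V = m*g*y at the segment midpoint
        action += (kinetic - potential) * dt
    return action


def thrown_ball_path(total_time, steps, g=9.8, bump=0.0):
    """Heights of a ball that starts at y=0 and is back at y=0 at total_time.

    bump=0 gives the Newtonian parabola; other values add a sine-shaped
    deformation that leaves both end points (numerically) unchanged.
    """
    v0 = 0.5 * g * total_time  # launch speed that returns the ball in time
    path = []
    for k in range(steps + 1):
        t = total_time * k / steps
        newtonian = v0 * t - 0.5 * g * t ** 2
        path.append(newtonian + bump * math.sin(math.pi * t / total_time))
    return path


total_time, steps = 1.0, 200
dt = total_time / steps
actions = {}
for bump in (-0.2, -0.05, 0.0, 0.05, 0.2):
    path = thrown_ball_path(total_time, steps, bump=bump)
    actions[bump] = discrete_action(path, dt)
    print(f"bump = {bump:+.2f}  action = {actions[bump]:.4f}")
```

Among the paths tried here, the undeformed Newtonian path (`bump = 0`) has the smallest action, and the action grows as the deformation gets larger.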

Besides being more convenient for solving some problems, there is a much deeper reason to introduce Lagrangian mechanics. It turns out that many of the fundamental laws of physics can be described by such a principle of least action. A well-known example is the so-called “Lagrangian” of the Standard Model of particle physics; the action we try to minimize in the principle of least action is the integral of this Lagrangian.

As in Newtonian mechanics, we begin the Lagrangian formulation from the equation (with vector quantities): 

$$\sum_{i=1}^n F_i = \sum_{i=1}^n m_i a_i,$$

which describes the dynamics of a system of $n$ particles. Furthermore, we decompose the forces into two components, the applied forces $F_i^{(a)}$ and forces due to constraints $f_i$, and so we get: 

$$\sum_{i=1}^n \left(F_i^{(a)} + f_i\right) = \sum_{i=1}^n m_i a_i = \sum_{i=1}^n \dot{p}_i ,$$

where we have introduced the momentum $p_i=m_iv_i$, with $v_i$ the velocity of particle $i$. That is, for constant masses $m_i$,

$$\dot{p}_i = \frac{d p_i}{d t}=\frac{d (m_iv_i)}{d t}=m_i\frac{d v_i}{d t}=m_ia_i.$$

Thus we have for the equation of motion of the system that: 

$$\sum_{i=1}^n \left(F_i^{(a)} + f_i - \dot{p}_i \right) = \textbf{0}.$$

An example of an applied force $F_i^{(a)}$ is a push on the system by an external agent. An example of a force of constraint $f_i$ is the normal force of a surface, which forces particle $i$ to stay on a plane. Another, perhaps easier, example is rigid-body motion: the whole system experiences a force from an external agent, so each particle $i$ experiences a force, and each particle is constrained to keep its position relative to the other particles of the system.

Next, we introduce more notation. The symbols $r_1, r_2, ..., r_n$ refer to the position vectors of the $n$ particles, and time is denoted by $t$. The constraints imposed on the system of $n$ particles are represented by equations of the form:

$$f(r_1, r_2, r_3, ..., t)=0,$$

which are called holonomic constraints. If we have $k$ of these holonomic constraint equations, we can use them to eliminate $k$ of the $3n$ position coordinates of the particle system (each particle has three coordinates), and we are left with $3n-k$ coordinates that can be chosen **independently** of each other. These remaining $3n-k$ coordinates (denoted $q_j$) are called generalized coordinates, and they implicitly contain the constraints of the system defined by the $k$ holonomic equations. That is, we now have:

$$
\begin{matrix} 
r_1 = r_1(q_1, q_2, ..., q_{3n-k}, t) \\
\vdots \\
r_n = r_n(q_1, q_2, ..., q_{3n-k}, t).
\end{matrix}
$$
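As an illustration of such a mapping, consider a planar pendulum (an example chosen here, not taken from the derivation above): one particle ($n = 1$) with two holonomic constraints, $z = 0$ and $x^2 + y^2 - \ell^2 = 0$, which leaves $3 \cdot 1 - 2 = 1$ generalized coordinate, the angle $\theta$. The sketch below maps $\theta$ to the position and checks that the length constraint holds; the function names are hypothetical.

```python
import math


def pendulum_position(theta, length=1.0):
    """Map the generalized coordinate theta to Cartesian (x, y).

    theta is the angle from the downward vertical; z = 0 is enforced
    simply by never introducing a z coordinate.
    """
    return (length * math.sin(theta), -length * math.cos(theta))


def length_constraint(x, y, length=1.0):
    """Holonomic constraint f(r) = x^2 + y^2 - length^2 (zero when satisfied)."""
    return x * x + y * y - length * length


residuals = []
for theta in (0.0, 0.5, 1.0, 2.0):
    x, y = pendulum_position(theta, length=2.0)
    residuals.append(length_constraint(x, y, length=2.0))
    print(f"theta = {theta:.1f}  r = ({x:+.3f}, {y:+.3f})  f(r) = {residuals[-1]:+.1e}")
```

Whatever value of $\theta$ is chosen, the constraint is satisfied up to rounding error, which is what it means for the generalized coordinate to contain the constraint implicitly.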

Next we introduce the concept of a virtual displacement, which refers to a change in the configuration of the system resulting from an arbitrary infinitesimal change of the coordinates $\delta r_i$, consistent with the forces and constraints imposed on the system at the given time instant $t$. Since Newton's second law holds for each particle separately, $F_i^{(a)} + f_i - \dot{p}_i = \textbf{0}$, we can take the dot product of each particle's equation of motion with its $\delta r_i$ and sum over the particles to get:

$$\sum_{i=1}^n \left(F_i^{(a)} + f_i - \dot{p}_i \right)\cdot \delta r_i = 0.$$

By restricting ourselves to systems for which the virtual work of the forces of constraint vanishes, $\sum_{i=1}^n f_i \cdot \delta r_i = 0$, we get:

$$\sum_{i=1}^n \left(F_i^{(a)}- \dot{p}_i \right)\cdot \delta r_i = 0,$$

which is often called D'Alembert's principle. After rewriting this equation in terms of the generalized coordinates $q_j$ through a series of algebraic manipulations and substitutions (for details see e.g. Goldstein), D'Alembert's principle becomes:

$$\sum_{j} \left\{ \frac{d}{dt} \left[ \frac{\partial}{\partial\dot{q}_j}\left(\sum_i \frac12 m_iv_i^2\right) \right] - \frac{\partial}{\partial q_j}\left(\sum_i \frac12 m_iv_i^2\right) -Q_j \right\} \delta q_j=0,$$

where 

$$\dot{q}_j = \frac{d q_j}{dt}\;\;\;\text{and}\;\;\;Q_j = \sum_i F_i^{(a)} \cdot \frac{\partial r_i}{\partial q_j}.$$
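To get a feeling for the generalized force $Q_j$, the following sketch evaluates it for a planar pendulum under gravity (an example assumed here for illustration). The only applied force is the weight $(0, -mg)$, the single generalized coordinate is the angle $\theta$ from the downward vertical, and $\partial r / \partial \theta$ is approximated by a central finite difference. All names are hypothetical.

```python
import math


def pendulum_position(theta, length):
    """Cartesian position (x, y) for the angle theta from the downward vertical."""
    return (length * math.sin(theta), -length * math.cos(theta))


def generalized_force(theta, mass=1.0, length=1.0, g=9.8, h=1e-6):
    """Q = F . dr/dtheta for the weight F = (0, -m*g)."""
    x_plus, y_plus = pendulum_position(theta + h, length)
    x_minus, y_minus = pendulum_position(theta - h, length)
    dx_dtheta = (x_plus - x_minus) / (2 * h)  # central difference
    dy_dtheta = (y_plus - y_minus) / (2 * h)
    force = (0.0, -mass * g)
    return force[0] * dx_dtheta + force[1] * dy_dtheta


theta = 0.7
numeric = generalized_force(theta)
analytic = -1.0 * 9.8 * 1.0 * math.sin(theta)  # -m*g*l*sin(theta)
print(f"numeric Q  = {numeric:.6f}")
print(f"analytic Q = {analytic:.6f}")
```

For an angular coordinate the generalized force has units of torque rather than force; here it is the familiar restoring torque $-mg\ell\sin\theta$.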

The symbol $Q_j$ is known as the generalized force. We now recognize that the two sums involving the particle masses are the total kinetic energy $T=\sum_i \frac12 m_i v_i^2$ of the system, so we can write D'Alembert's principle as:

$$\sum_{j} \left\{ \frac{d}{dt} \left[ \frac{\partial T}{\partial\dot{q}_j} \right] - \frac{\partial T}{\partial q_j} -Q_j \right\} \delta q_j=0.$$

If one has experience with the calculus of variations, the above equations should start to look familiar. Since the generalized coordinates $q_j$ are independent, the virtual displacements $\delta q_j$ can be chosen arbitrarily, so for the above equation to hold each term in the sum must vanish, that is:

$$\frac{d}{dt} \left[ \frac{\partial T}{\partial\dot{q}_j} \right] - \frac{\partial T}{\partial q_j} -Q_j=0\;\;\;\forall j .$$

Furthermore, if all the applied forces are derivable from a scalar potential energy function $V$, that is: 

$$F_i^{(a)}=-\nabla_i V,$$

then we have that: 

$$Q_j = \sum_i F_i^{(a)} \cdot \frac{\partial r_i}{\partial q_j} = -\sum_i \nabla_i V\cdot \frac{\partial r_i}{\partial q_j} = -\frac{\partial V}{\partial q_j},$$

and, also assuming that $V$ depends neither on time $t$ nor on the generalized velocities $\dot{q}_j$ (an assumption that is valid in many real applications), the individual terms in the sum take the form:

$$\frac{d}{dt} \left[ \frac{\partial (T-V)}{\partial\dot{q}_j} \right] - \frac{\partial (T-V)}{\partial q_j} =0\;\;\;\forall j ,$$

or that 

$$\frac{d}{dt} \left( \frac{\partial L}{\partial\dot{q}_j} \right) - \frac{\partial L}{\partial q_j} =0\;\;\;\forall j ,$$

which are referred to as **Lagrange's equations** of motion, with the Lagrangian function $L=T-V$, that is, the difference between the kinetic and potential energy of the system.

So what have we achieved? We have transformed the equations describing the mechanics of the system into a different yet equivalent form, expressed explicitly in terms of the energies of the system. In the Newtonian formulation, we were dealing with complicated equations involving forces. Now, we express the same dynamics in terms of energies. We thus only need to know the Lagrangian of the physical system we are interested in, and we can then use Lagrange's equations of motion to find its dynamics. In many cases in physics, this makes the investigation of the system's mechanics a lot easier. Lagrange's equations can also be derived using the variational approach; we will not cover it in this tutorial, but it can be found in many books on mechanics (e.g. Goldstein).

### Example of Lagrangian mechanics: Atwood machine
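As a rough sketch of how Lagrange's equations could be applied to the Atwood machine, take the downward displacement $x$ of the object with mass $m_1$ as the single generalized coordinate. With this choice (an assumption of this sketch), $T = \frac12 (m_1 + m_2)\dot{x}^2$ and, up to a constant, $V = -(m_1 - m_2) g x$. The code below evaluates the derivatives in Lagrange's equation by finite differences and solves for the acceleration; the function names and step size are hypothetical.

```python
def atwood_lagrangian(x, v, m1=1.1, m2=1.0, g=9.8):
    """L = T - V, with x the downward displacement of m1 and v = dx/dt."""
    kinetic = 0.5 * (m1 + m2) * v ** 2
    potential = -m1 * g * x + m2 * g * x  # m1 descends by x, m2 rises by x
    return kinetic - potential


def acceleration_from_lagrangian(lagrangian, x, v, h=1e-3):
    """Solve d/dt(dL/dv) - dL/dx = 0 for the acceleration.

    Assumes one generalized coordinate and no explicit time dependence.
    """
    dL_dx = (lagrangian(x + h, v) - lagrangian(x - h, v)) / (2 * h)
    d2L_dv2 = (lagrangian(x, v + h) - 2 * lagrangian(x, v)
               + lagrangian(x, v - h)) / h ** 2
    d2L_dxdv = (lagrangian(x + h, v + h) - lagrangian(x + h, v - h)
                - lagrangian(x - h, v + h) + lagrangian(x - h, v - h)) / (4 * h ** 2)
    # Chain rule: d/dt(dL/dv) = d2L_dv2 * acceleration + d2L_dxdv * v
    return (dL_dx - d2L_dxdv * v) / d2L_dv2


a = acceleration_from_lagrangian(atwood_lagrangian, x=0.3, v=0.2)
print(f"a = {a:.2f} m/s^2")
```

The result agrees with $a = g\frac{m_1-m_2}{m_1+m_2}$ from the Newtonian treatment, without the string tension ever appearing in the calculation.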


## 6. HAMILTONIAN MECHANICS


### Example of Hamiltonian mechanics: Atwood machine

## 7. Markov chains

## 8. Monte Carlo

## 9. Markov chain Monte Carlo (MCMC)

## 10. Short intro on Bayesian modeling

## 11. Hamiltonian Monte Carlo (HMC)

## 12. Training deep CNNs via HMC

## 13. Python implementation