# Setup

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import tqdm.auto as tqdm
import torch
%matplotlib widget

In [None]:
def grab(x: torch.Tensor) -> np.ndarray:
    """Convert a torch Tensor to numpy array"""
    return x.detach().cpu().numpy()

In [None]:
def wrap(x):
    """Wrap angle into range [-pi, pi)"""
    return (x + np.pi) % (2*np.pi) - np.pi

# Action
We will consider a simple family of theories on a space of two angles $(\theta_1, \theta_2)$. The general form of the action is
$$
S(\theta_1, \theta_2; \alpha, \beta) := -\beta \cos(\theta_1 - \theta_2) - \alpha \cos(\theta_1) + \alpha \cos(\theta_2)
$$

In [None]:
def action(th, *, alpha, beta):
    """family of actions on two angles"""
    assert th.shape[-1] == 2
    th1, th2 = th[...,0], th[...,1]
    return (
        -beta * torch.cos(th1 - th2) - alpha * torch.cos(th1)
        + alpha * torch.cos(th2)
    )

def make_action(alpha, beta):
    return lambda th: action(th, alpha=alpha, beta=beta)

# some target parameters
beta_target = 3.0
alpha_target = 1.0
target_action = make_action(alpha_target, beta_target)

It will be useful to have samples from the target distribution. There are many possible ways to build this ensemble. Here we just do crude importance sampling, with a single resampling step according to the computed weights.

In [None]:
def sample_inds(weights):
    """resample indices according to weights"""
    p = np.copy(weights)
    p /= np.sum(p)
    return np.random.choice(len(weights), p=p, size=len(weights))

def sample(batch_size, action, *, beta0):
    """importance sampling to get ground truth data"""
    shape = (batch_size,)
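    # proposal: th1 uniform, th1 - th2 ~ VonMises(0, beta0)
    dist = torch.distributions.VonMises(0.0, beta0)
    delta = dist.sample(shape)
    # proposal "action" S0 = -log q (up to a constant), so that w = exp(-S) / q
    S0 = -dist.log_prob(delta)
    th1 = 2*np.pi*torch.rand(size=shape)
    th2 = (th1 - delta) % (2*np.pi)
    th = torch.stack([th1, th2], dim=-1)
    logw = -action(th) + S0
    # normalize the weights so they sum to one
    logw -= torch.logsumexp(logw, dim=0)
    weight = np.exp(grab(logw))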
    # resample
    inds = sample_inds(weight)
    return th[inds]
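
Because the ensemble comes from a single resampling step, its quality depends on how evenly the importance weights are spread. One common diagnostic is the effective sample size (ESS). The following is a minimal sketch; `effective_sample_size` is a hypothetical helper that is not part of the notebook, and it takes unnormalized log-weights as a NumPy array.

```python
import numpy as np

def effective_sample_size(logw):
    """Kish effective sample size from unnormalized log-weights (hypothetical helper)."""
    logw = np.asarray(logw, dtype=float)
    # subtract the max before exponentiating for numerical stability
    w = np.exp(logw - np.max(logw))
    return np.sum(w) ** 2 / np.sum(w ** 2)

# equal weights: every sample counts, so the ESS equals the number of samples
ess_uniform = effective_sample_size(np.zeros(100))
# one dominant weight: the ensemble is effectively a single sample
ess_peaked = effective_sample_size(np.array([0.0, -50.0, -50.0, -50.0]))
```

Inside `sample`, this could be applied to `grab(logw)` before resampling. An ESS much smaller than `batch_size` signals that the resampled ensemble contains many duplicates, and that `beta0` may be poorly matched to the target.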

**EXERCISE:** Implement a more principled sampling function, like MCMC, rejection sampling, or inverse CDF sampling.

Set up some utilities to plot distributions of samples or analytic action over the two-dimensional plane of angles.

In [None]:
def make_th_grid(steps):
    """Grid of bin-center angles on [-pi, pi]^2, shape (steps-1, steps-1, 2)"""
    th = torch.linspace(-np.pi, np.pi, steps=steps)
    th = (th[1:]+th[:-1])/2
    th = torch.stack(torch.meshgrid([th, th], indexing='ij'), axis=-1)
    return th

def plot_dist(action, *, ax, nsteps=60):
    """Contour plot of the unnormalized density exp(-S) on the angle grid"""
    th = make_th_grid(nsteps)
    S = action(th)
    th = grab(th)
    ax.contourf(th[...,0], th[...,1], np.exp(-grab(S)))

def plot_samples(th, *, ax, nbins=60):
    """2D histogram of samples after wrapping the angles into [-pi, pi)"""
    bins = np.linspace(-np.pi, np.pi, num=nbins+1)
    th = wrap(grab(th))
    ax.hist2d(th[...,0], th[...,1], bins=bins)

# Action coefficients

For this simple theory, we can expand any action (our target, or intermediate learned actions) in a Fourier basis:
$$
\tilde{S}(k_1, k_2) \sim \int_0^{2\pi} \frac{d\theta_1}{2\pi} \frac{d\theta_2}{2\pi} e^{-i k \cdot \theta} S(\theta).
$$
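These coefficients play a role similar to Wilson coefficients in a systematic expansion. For the family of actions above, only the modes $k = \pm(1, -1)$, $(\pm 1, 0)$, and $(0, \pm 1)$ are nonzero. Tracking the coefficients gives a way to see how we move through the (infinite-dimensional) space of distributions.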

In [None]:
def measure_coeffs_grid(S):
    """extract Wilson-like coeffs using the Fourier transform"""
    pass

def measure_coeffs(action):
    """extract Wilson-like coeffs from action function"""
    pass
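
One possible way to fill in these stubs is a discrete Fourier transform of the action on a uniform periodic grid. The sketch below is an illustration under stated assumptions, not the notebook's intended solution: `fourier_coeffs_grid` and `fourier_coeffs` are hypothetical names, the grid is $\theta_j = 2\pi j / n$, the normalization is chosen so that the result approximates the integral above with prefactor one, and `action_np` is a NumPy port of `action` so that the block runs on its own.

```python
import numpy as np

def action_np(th, *, alpha, beta):
    """NumPy port of the notebook's `action`, so this block runs on its own."""
    th1, th2 = th[..., 0], th[..., 1]
    return -beta * np.cos(th1 - th2) - alpha * np.cos(th1) + alpha * np.cos(th2)

def fourier_coeffs_grid(S):
    """Discrete estimate of the Fourier coefficients of S (hypothetical helper).

    S is sampled on the uniform periodic grid theta_j = 2*pi*j/n. Entry [k1, k2]
    holds mode (k1, k2); negative modes sit at negative indices, as usual for the FFT.
    """
    n1, n2 = S.shape
    return np.fft.fft2(S) / (n1 * n2)

def fourier_coeffs(action_fn, n=32):
    """Evaluate action_fn on an n x n grid over [0, 2*pi)^2 and transform it."""
    th = 2 * np.pi * np.arange(n) / n
    grid = np.stack(np.meshgrid(th, th, indexing="ij"), axis=-1)
    return fourier_coeffs_grid(action_fn(grid))

# coefficients of the target action (alpha = 1, beta = 3)
coeffs = fourier_coeffs(lambda th: action_np(th, alpha=1.0, beta=3.0))
c_coupling = coeffs[1, -1]  # mode (1, -1), equal to -beta/2
c_field_1 = coeffs[1, 0]    # mode (1, 0), equal to -alpha/2
c_field_2 = coeffs[0, 1]    # mode (0, 1), equal to +alpha/2
```

With this normalization the coupling term $-\beta \cos(\theta_1 - \theta_2)$ contributes $-\beta/2$ to each of the modes $\pm(1, -1)$. To use the notebook's torch `action` instead, the grid would need to be converted to a tensor and the result passed through `grab`.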

# Annealing / trivializing flow
Let's first look at the path through the space of distributions described by annealing / the trivializing flow. This is just linear interpolation in the parameters $(\alpha, \beta)$:

In [None]:
# TODO: measure coeffs of target action
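
Because the Fourier transform is linear in $S$, scaling $(\alpha, \beta)$ by $t$ scales every coefficient by $t$, so the annealing path is a straight line in coefficient space. A minimal sketch, assuming the path runs from zero couplings at $t = 0$ to the target at $t = 1$, and using a plain FFT normalization that may differ from the notebook's `measure_coeffs`:

```python
import numpy as np

alpha_target, beta_target = 1.0, 3.0  # target parameters from the notebook

def action_np(th1, th2, alpha, beta):
    """NumPy port of the notebook's action, taking the two angles separately."""
    return -beta * np.cos(th1 - th2) - alpha * np.cos(th1) + alpha * np.cos(th2)

n = 16
th = 2 * np.pi * np.arange(n) / n
th1, th2 = np.meshgrid(th, th, indexing="ij")

ts = np.linspace(0.0, 1.0, num=5)
c_coupling, c_field = [], []
for t in ts:
    # annealed action: parameters scaled linearly with t (assumed schedule)
    S_t = action_np(th1, th2, t * alpha_target, t * beta_target)
    c = np.fft.fft2(S_t) / n**2
    c_coupling.append(c[1, -1].real)  # mode (1, -1)
    c_field.append(c[1, 0].real)      # mode (1, 0)
c_coupling = np.array(c_coupling)
c_field = np.array(c_field)
```

Both coefficients grow linearly in $t$, reaching $-\beta/2$ and $-\alpha/2$ respectively at the target.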

In [None]:
def plot_coeffs(ts, coeffs, x='a1', y='b2', *, ax, cmap, marker='.', label=None):
    pass

# Diffusion

The diffusion path requires implementing the **Langevin SDE**. We can use a simple Euler-Maruyama integrator, starting from samples from the target distribution to simulate the forward process.

In [None]:
def forward(th, *, g=1.5, nsteps=1000, save_freq=10):
    pass
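
The stub above could be filled in along the following lines. This is a sketch under assumptions the notebook does not confirm: the forward process is taken to be driftless, $d\theta = g\, dW$, run up to a hypothetical final time `t_final`, and the function works on NumPy arrays (for example the output of `grab`) rather than torch tensors. The name `forward_sketch` and the `t_final` and `rng` parameters are not from the notebook.

```python
import numpy as np

def forward_sketch(th, *, g=1.5, nsteps=1000, save_freq=10, t_final=1.0, rng=None):
    """Euler-Maruyama for d(theta) = g dW on the torus (assumed driftless).

    Returns the saved times and snapshots, including the initial state.
    """
    rng = np.random.default_rng() if rng is None else rng
    dt = t_final / nsteps
    th = np.array(th, dtype=float)
    ts, hist = [0.0], [th.copy()]
    for step in range(1, nsteps + 1):
        # Gaussian increment with variance g^2 * dt per angle
        th = th + g * np.sqrt(dt) * rng.normal(size=th.shape)
        # wrap back into [-pi, pi)
        th = (th + np.pi) % (2 * np.pi) - np.pi
        if step % save_freq == 0:
            ts.append(step * dt)
            hist.append(th.copy())
    return np.array(ts), np.stack(hist)

rng = np.random.default_rng(0)
th0 = np.zeros((2000, 2))  # all samples start at the origin: a maximally peaked ensemble
ts, hist = forward_sketch(th0, rng=rng)
# mean of cos(theta_1): 1 at t = 0, shrinking toward 0 as the ensemble spreads out
order = np.cos(hist[..., 0]).mean(axis=1)
```

For this driftless process the $k$-th Fourier mode of the density decays as $\exp(-k^2 g^2 t / 2)$, so how close the endpoint is to uniform depends on the product $g^2 \, t_\text{final}$. With the defaults, 101 snapshots are saved.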

In [None]:
# TODO: forward from samples, measure coeffs

As the forward process proceeds, **noise is added** until we converge towards the **uniform distribution**.

In [None]:
fig, axes = plt.subplots(1, 4, figsize=(8, 3), tight_layout=True)
inds = [0, 25, 50, 100]
# TODO: plot samples

Compared to the annealing path, diffusion takes a **non-linear path in the space of couplings**. It terminates at (or close to) the uniform distribution with zero couplings.

In [None]:
fig, ax = plt.subplots(1,1, figsize=(3, 3), tight_layout=True)
# TODO: plot coeffs

# Normalizing flow

Finally, we implement a simple **hard-coded flow** (no machine learning yet!). To evaluate the flow, we just use a simple Euler integrator. The coefficients of the flow are hand-tuned to approximately reproduce the target distribution.

In [None]:
def velocity(th, t):
    th1, th2 = th[...,0], th[...,1]
    return (
        5*t*(1-t) * torch.stack([
            -torch.sin(th1 - th2), -torch.sin(th2 - th1)], axis=-1)
        + t**2 * torch.stack([-torch.sin(th1), torch.sin(th2)], axis=-1)
    )

def flow(th, nsteps=1000, save_freq=10):
    pass
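
A possible shape for the `flow` stub is an explicit Euler loop over $t \in [0, 1]$. The sketch below assumes the flow starts from uniform samples at $t = 0$, which the notebook does not state explicitly, and uses NumPy arrays; `velocity_np` is a port of `velocity` so that the block runs on its own, and `flow_sketch` is a hypothetical name.

```python
import numpy as np

def velocity_np(th, t):
    """NumPy port of the notebook's `velocity`, so this block runs on its own."""
    th1, th2 = th[..., 0], th[..., 1]
    return (
        5 * t * (1 - t) * np.stack([-np.sin(th1 - th2), -np.sin(th2 - th1)], axis=-1)
        + t**2 * np.stack([-np.sin(th1), np.sin(th2)], axis=-1)
    )

def flow_sketch(th, nsteps=1000, save_freq=10):
    """Explicit Euler integration of d(theta)/dt = velocity(theta, t) from t=0 to t=1.

    Returns the saved times and snapshots, including the initial state.
    """
    dt = 1.0 / nsteps
    th = np.array(th, dtype=float)
    ts, hist = [0.0], [th.copy()]
    for step in range(nsteps):
        t = step * dt
        # Euler update using the velocity at the start of the step
        th = th + dt * velocity_np(th, t)
        # wrap back into [-pi, pi)
        th = (th + np.pi) % (2 * np.pi) - np.pi
        if (step + 1) % save_freq == 0:
            ts.append((step + 1) * dt)
            hist.append(th.copy())
    return np.array(ts), np.stack(hist)

rng = np.random.default_rng(0)
th0 = rng.uniform(-np.pi, np.pi, size=(2000, 2))  # assumed uniform prior at t = 0
ts, hist = flow_sketch(th0)
# mean of cos(theta_1 - theta_2): near 0 for uniform samples, larger once the angles align
align = np.cos(hist[..., 0] - hist[..., 1]).mean(axis=1)
```

The velocity vanishes at $t = 0$, so the first steps barely move the samples. The alignment observable then grows as the $\sin(\theta_1 - \theta_2)$ term pulls the two angles together.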

In [None]:
# TODO: measure coeffs throughout flow

In [None]:
fig, axes = plt.subplots(1, 4, figsize=(8, 3), tight_layout=True)
inds = [0, 25, 50, 100]
# TODO: plot flow samples

**EXERCISE:** Compute the probability density of the flow by integrating the divergence of the flow field. Compare this against the sample density above.
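
As a hint, stated here as a standard result rather than as the notebook's solution: along a trajectory $\dot\theta = v(\theta, t)$ of the flow, the log-density changes at a rate set by the divergence of the velocity field,
$$
\frac{d}{dt} \log p_t(\theta(t)) = -\nabla \cdot v(\theta(t), t), \qquad \log p_1(\theta(1)) = \log p_0(\theta(0)) - \int_0^1 dt \, \nabla \cdot v(\theta(t), t).
$$
Here $p_0$ is the density of the initial samples. If these are assumed uniform on the torus, then $\log p_0 = -2 \log(2\pi)$.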

In [None]:
fig, ax = plt.subplots(1,1, figsize=(3, 3), tight_layout=True)
# TODO: plot coeffs over time