This example involves a Radial Basis Function equipped with the Gaussian kernel

$$\varphi(r)=e^{-\varepsilon^2 r^2},$$

where $r$ is the distance from a given point to a distinguished center. We place the center at $(0,0)$ and fix the shape parameter at $\varepsilon=5$.
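To get a feel for how quickly this kernel decays with $\varepsilon=5$, the formula can be evaluated at a few radii. This is a minimal sketch in plain Python; the helper name `gaussian_rbf` is hypothetical and is not used elsewhere in the worksheet.

```python
import math


def gaussian_rbf(r: float, eps: float = 5.0) -> float:
    """Evaluate phi(r) = exp(-(eps * r)^2) for a scalar radius r."""
    return math.exp(-(eps * r) ** 2)


# The kernel equals 1 at the center and decays rapidly away from it.
for r in (0.0, 0.5, 1.0):
    print(f"phi({r}) = {gaussian_rbf(r):.3e}")
```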

This worksheet shows what happens when we implement the Gaussian kernel "directly", as a function of the radius $r$, and then apply automatic differentiation to it.

In [63]:
from typing import Callable
import torch
from torch import nn, Tensor
eps = 5.

In addition to the imports above, the following line enables anomaly detection in PyTorch's autograd, so that any backward computation producing NaN values raises an error:

In [64]:
torch.autograd.set_detect_anomaly(True)

<torch.autograd.anomaly_mode.set_detect_anomaly at 0x7fce7d8a7c90>
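Anomaly detection can also be scoped to a single block with the context manager `torch.autograd.detect_anomaly()`. The following sketch is not part of the original worksheet. It uses a deliberately artificial computation, $x/x$ at $x=0$, to illustrate that the check applies to the backward pass rather than the forward pass.

```python
import torch

x = torch.zeros(1, requires_grad=True)
message = None

try:
    # Anomaly detection is active only inside this block.
    with torch.autograd.detect_anomaly():
        y = x / x             # forward pass: 0/0 gives nan, but nothing is raised here
        y.sum().backward()    # backward pass: a nan gradient triggers a RuntimeError
except RuntimeError as err:
    message = str(err)

print(message)
```

Scoping the check this way avoids enabling it globally.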

We now define a function that computes the radius $r$, that is, the Euclidean distance from each input point to each center, together with the Gaussian kernel itself.

In [65]:
def compute_radii(x: Tensor, centers: Tensor) -> Tensor:
    x = x.unsqueeze(1)  # Shape: (batch_size, 1, d)
    centers = centers.unsqueeze(0)  # Shape: (1, num_centers, d)
    
    squared_distances = torch.sum((x - centers)**2, dim=2)  # Shape: (batch_size, num_centers)
    distances = torch.sqrt(squared_distances)
    
    return distances

def gaussian_kernel(eps: float | Tensor):
    def fn(x: Tensor, y: Tensor) -> Tensor:
        radii = compute_radii(x, y)
        return torch.exp(-(eps * radii) ** 2)
    return fn

*Note*: instead of the function `compute_radii()`, one could use the built-in function `torch.cdist()`. At the time of writing, however, that function raises an error when computing nested gradients, as reported in [this](https://github.com/pytorch/pytorch/issues/83510) GitHub issue. The proposed implementation returns the same numerical results while avoiding this issue.
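The claim that both approaches agree numerically can be checked on the forward pass. The sketch below restates `compute_radii()` in compact form and compares it with `torch.cdist()` on random inputs; the point and center counts are arbitrary choices made for illustration.

```python
import torch
from torch import Tensor


def compute_radii(x: Tensor, centers: Tensor) -> Tensor:
    # Broadcast (batch_size, 1, d) against (1, num_centers, d).
    diff = x.unsqueeze(1) - centers.unsqueeze(0)
    return torch.sqrt(torch.sum(diff ** 2, dim=2))


torch.manual_seed(0)
points = torch.rand(4, 2)    # 4 points in the plane
centers = torch.rand(3, 2)   # 3 centers in the plane

manual = compute_radii(points, centers)
builtin = torch.cdist(points, centers)

print(manual.shape)
print(torch.allclose(manual, builtin, atol=1e-5))
```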

Now, we can explicitly define the `RBF` module:

In [66]:
class RBF_Direct_Implementation(nn.Module):
    def __init__(self,
                 centers: Tensor,
                 rbf_kernel: Callable[[Tensor, Tensor], Tensor]):

        super(RBF_Direct_Implementation, self).__init__()

        self.centers = centers.clone().detach()
        self.output_layer = nn.Linear(
            in_features=centers.shape[0], out_features=1, bias=False)
        self.kernel = rbf_kernel

    def forward(self, x: Tensor):
        kernel_values = self.kernel(x, self.centers)
        result = self.output_layer(kernel_values)
        return result

We create an instance of this module with the center and shape parameter prescribed at the beginning of the worksheet, and set its single output weight to 1 so that the module evaluates exactly $\varphi(r)$:

In [67]:
phi_rbf = RBF_Direct_Implementation(centers=torch.zeros((1,2)), 
                                    rbf_kernel=gaussian_kernel(eps))

with torch.no_grad():
    phi_rbf.output_layer.weight = torch.nn.Parameter(torch.Tensor([1]).reshape(1,1))

We also explicitly create the same function $\varphi(r)$ just for verification purposes:

In [69]:
def phi_rbf_compare(x: Tensor, y: Tensor):
    return torch.exp(-eps**2*(x ** 2 + y ** 2))

We create a $3\times 3$ grid of points on $[0,1]^2$:

In [70]:
xy = torch.cartesian_prod(torch.linspace(0, 1, 3, requires_grad=True),
                          torch.linspace(0, 1, 3, requires_grad=True))
x, y = xy[:, 0], xy[:, 1]
print(x)
print(y)

tensor([0.0000, 0.0000, 0.0000, 0.5000, 0.5000, 0.5000, 1.0000, 1.0000, 1.0000],
       grad_fn=<SelectBackward0>)
tensor([0.0000, 0.5000, 1.0000, 0.0000, 0.5000, 1.0000, 0.0000, 0.5000, 1.0000],
       grad_fn=<SelectBackward0>)


We verify that the output of our module matches that of the explicitly created function:

In [72]:
phi_rbf_compare(x, y).reshape(-1,1) - phi_rbf(xy)

tensor([[0.],
        [0.],
        [0.],
        [0.],
        [0.],
        [0.],
        [0.],
        [0.],
        [0.]], grad_fn=<SubBackward0>)
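Rather than inspecting the differences by eye, the comparison can be reduced to a single tolerance-based check with `torch.allclose()`. The sketch below is self-contained and bypasses the module: it assumes the same grid, the center $(0,0)$ and $\varepsilon=5$, and compares the radius-based evaluation against the closed form in $x$ and $y$.

```python
import torch

eps = 5.0
xy = torch.cartesian_prod(torch.linspace(0, 1, 3), torch.linspace(0, 1, 3))

# Radius-based evaluation: distance to the center (0, 0), then the kernel.
radii = torch.sqrt(torch.sum(xy ** 2, dim=1))
via_radius = torch.exp(-(eps * radii) ** 2)

# Closed form written directly in the coordinates x and y.
closed_form = torch.exp(-eps ** 2 * (xy[:, 0] ** 2 + xy[:, 1] ** 2))

print(torch.allclose(via_radius, closed_form))
```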

The implementation appears to be correct. We can now apply `autograd` to our module and compute its derivatives with respect to $x$ and $y$:

In [73]:
u = phi_rbf(torch.cat((x.unsqueeze(1),y.unsqueeze(1)), dim=1))
grad_x, grad_y = torch.autograd.grad(u, (x,y), 
                                     grad_outputs=torch.ones_like(u), 
                                     create_graph=True)

  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/mnt/sdb1/Proyectos/tfm-experiments/.venv/lib64/python3.11/site-packages/ipykernel_launcher.py", line 17, in <module>
    app.launch_new_instance()
  File "/mnt/sdb1/Proyectos/tfm-experiments/.venv/lib64/python3.11/site-packages/traitlets/config/application.py", line 1053, in launch_instance
    app.start()
  File "/mnt/sdb1/Proyectos/tfm-experiments/.venv/lib64/python3.11/site-packages/ipykernel/kernelapp.py", line 737, in start
    self.io_loop.start()
  File "/mnt/sdb1/Proyectos/tfm-experiments/.venv/lib64/python3.11/site-packages/tornado/platform/asyncio.py", line 195, in start
    self.asyncio_loop.run_forever()
  File "/usr/lib64/python3.11/asyncio/base_events.py", line 608, in run_forever
    self._run_once()
  File "/usr/lib64/python3.11/asyncio/base_events.py", line 1936, in _run_once
    handle._run()
  File "/usr/lib64/python3.11/asyncio/events.py", line 84, in _r

RuntimeError: Function 'SqrtBackward0' returned nan values in its 0th output.

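Even without much PyTorch experience, the message is clear: the backward pass of a square root produced a NaN. The culprit is the very first grid point, which coincides with our center $(0,0)$. The distance between the two points is zero, and the square root is not differentiable at zero: its derivative $\frac{1}{2\sqrt{s}}$ is unbounded as $s \to 0^+$. At that point the gradient arriving from the exponential is zero as well, so the chain rule produces an indeterminate form that evaluates to NaN.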
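The failure can be isolated in a few lines. The following minimal sketch is not part of the original worksheet, and the variable `s` is a hypothetical stand-in for the squared distance. It evaluates the kernel exactly at the center with anomaly detection switched off, so the NaN appears in the gradient instead of raising an error.

```python
import torch

eps = 5.0
s = torch.zeros(1, requires_grad=True)  # squared distance at the center

# Disable anomaly detection locally so that the nan is returned, not raised.
with torch.autograd.set_detect_anomaly(False):
    r = torch.sqrt(s)                    # forward value is fine: r = 0
    phi = torch.exp(-(eps * r) ** 2)     # forward value is fine: phi = 1
    (grad_s,) = torch.autograd.grad(
        phi, s, grad_outputs=torch.ones_like(phi))

print(phi)
print(grad_s)
```

The forward pass gives a valid kernel value. Only the gradient with respect to the squared distance is corrupted, and that is what the anomaly detector reports.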