# Estimating the Local Lipschitz Constant of a Neural Network Using Jacobinet

## Introduction
In this tutorial, we will estimate the local Lipschitz constant of a neural network using the Jacobian matrix and explore how this constant relates to the network's robustness. A neural network's Lipschitz constant bounds the rate at which its outputs can change with respect to small input perturbations. Understanding and controlling this constant is critical for:

- Adversarial robustness: Ensuring the network resists small, intentional perturbations.
- Stability: Preventing large output changes due to minor input variations.
- Generalization: Lower Lipschitz constants are often associated with better performance on unseen data.

We will use the *Jacobinet* library (based on Keras) to calculate the Jacobian and maximize the $L_p$ norm of the gradient 
to estimate the Lipschitz constant. This provides a lower bound for the Lipschitz constant, a key metric in robustness evaluation.


- When running this notebook on Colab, we first need to install *jacobinet*.
- If you run this notebook locally, do it inside the environment in which you [installed *jacobinet*](https://ducoffeM.github.io/jacobinet/main/install.html).

In [None]:
# On Colab: install the library
on_colab = "google.colab" in str(get_ipython())
if on_colab:
    import sys  # noqa: avoid having this import removed by pycln

    # install dev version for dev doc, or release version for release doc
    !{sys.executable} -m pip install -U pip
    !{sys.executable} -m pip install git+https://github.com/ducoffeM/jacobinet@main#egg=jacobinet
    # install the backend used in this notebook (torch)
    !{sys.executable} -m pip install "torch"
    !{sys.executable} -m pip install "keras"

    # extra libraries used in this notebook
    !{sys.executable} -m pip install "numpy"
    !{sys.executable} -m pip install "torchattacks"

In [None]:
import os

# Keras 3 defaults to the TensorFlow backend; this notebook relies on torch
# (torch.cat, .cpu(), torchattacks), so select it *before* importing keras.
os.environ["KERAS_BACKEND"] = "torch"

# Set this environment variable *before* importing torch, otherwise it has no effect.
# Ideally, we'd only set this if torch.backends.mps.is_available() is True,
# but checking that requires importing torch first, which would make this setting too late.
# So we preemptively enable the MPS fallback just in case MPS is available.
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"

## 1. Building the Neural Network
We start by defining a simple feedforward network that maps a 10-dimensional input to 2 outputs: two hidden dense layers of 10 units, each followed by a ReLU activation, and a linear output layer.

In [None]:
import keras
from keras.layers import Activation, Dense, Input
from keras.models import Model


def build_model():
    input_ = Input((10,))
    x = Dense(10, name="Dense1")(input_)
    x = Activation("relu", name="ReLU1")(x)
    x = Dense(10)(x)
    x = Activation("relu")(x)
    output = Dense(2, name="Output")(x)
    return Model(input_, output)


model = build_model()
model.summary()
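
Since every layer of `build_model()` is either affine or ReLU, the network is piecewise linear and its Jacobian at a point has a closed form, $J(x) = W_3 D_2 W_2 D_1 W_1$, where $D_k$ is the diagonal 0/1 mask of active ReLU units. The NumPy sketch below checks this against finite differences. It is an illustration only: the random weights are hypothetical stand-ins for the trained Keras weights, and it uses the column-vector convention ($W x$), whereas Keras `Dense` kernels are stored as (inputs, outputs), i.e. transposed.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical weights with the same shapes as build_model(): 10 -> 10 -> 10 -> 2
W1, b1 = rng.normal(size=(10, 10)), rng.normal(size=10)
W2, b2 = rng.normal(size=(10, 10)), rng.normal(size=10)
W3, b3 = rng.normal(size=(2, 10)), rng.normal(size=2)


def forward(x):
    h1 = np.maximum(W1 @ x + b1, 0.0)
    h2 = np.maximum(W2 @ h1 + b2, 0.0)
    return W3 @ h2 + b3


def jacobian(x):
    """Chain rule: J(x) = W3 @ D2 @ W2 @ D1 @ W1, with D_k the ReLU activation masks."""
    z1 = W1 @ x + b1
    D1 = np.diag((z1 > 0).astype(float))
    z2 = W2 @ np.maximum(z1, 0.0) + b2
    D2 = np.diag((z2 > 0).astype(float))
    return W3 @ D2 @ W2 @ D1 @ W1


def jacobian_fd(x, h=1e-6):
    """Central finite differences, one input coordinate at a time."""
    cols = [(forward(x + h * e) - forward(x - h * e)) / (2 * h) for e in np.eye(x.size)]
    return np.stack(cols, axis=1)


x = rng.random(10)
J = jacobian(x)
print(J.shape)  # (2, 10)
print(np.allclose(J, jacobian_fd(x), atol=1e-4))
```

Within a linear region the masks $D_k$ are constant, so the Jacobian (and hence its norm) only changes when the input crosses a ReLU boundary.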

## 2. Computing the Jacobian with Jacobinet

We will now compute the Jacobian using Jacobinet's `clone_to_backward` function, which turns the forward model into a *backward model*.
This model returns the gradient of the outputs with respect to the input.

In [None]:
import jacobinet
import numpy as np
from jacobinet import clone_to_backward

# Output-space vector g (all ones) that is propagated back through the network
gradient_placeholder = keras.Variable(np.ones((1, 2), dtype="float32"))

# Compute backward model for Jacobian calculation
backward_model = clone_to_backward(model, gradient=gradient_placeholder)

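### Explanation:

The Jacobian $J(x)$ collects the gradients of each output w.r.t. the input.
`clone_to_backward` builds a model that propagates the output-space vector `gradient_placeholder` (written $g$ below) back through the network, which amounts to the vector-Jacobian product $g^\top J(x)$. With $g$ set to all ones, this is the gradient of the sum of the two outputs with respect to the input.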
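
The backward pass can be sketched without Jacobinet: propagating $g$ from the output back through each layer (a transposed weight matrix for Dense layers, the ReLU mask for activations) yields $g^\top J(x)$ without ever forming $J(x)$. This NumPy sketch (hypothetical random weights, not Jacobinet's implementation) checks the result against the explicit Jacobian for $g = \mathbf{1}$, the same all-ones vector as `gradient_placeholder`.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical weights, column-vector convention: 10 -> 10 -> 10 -> 2
W1, b1 = rng.normal(size=(10, 10)), rng.normal(size=10)
W2, b2 = rng.normal(size=(10, 10)), rng.normal(size=10)
W3 = rng.normal(size=(2, 10))  # output bias does not affect the Jacobian


def backward(x, g):
    """Propagate an output-space vector g back to the input, layer by layer (reverse mode)."""
    z1 = W1 @ x + b1
    z2 = W2 @ np.maximum(z1, 0.0) + b2
    v = W3.T @ g       # through the output Dense layer
    v = v * (z2 > 0)   # through the second ReLU
    v = W2.T @ v       # through the second Dense layer
    v = v * (z1 > 0)   # through the first ReLU
    return W1.T @ v    # through the first Dense layer: g^T J(x) as a vector


def full_jacobian(x):
    z1 = W1 @ x + b1
    z2 = W2 @ np.maximum(z1, 0.0) + b2
    D1 = np.diag((z1 > 0).astype(float))
    D2 = np.diag((z2 > 0).astype(float))
    return W3 @ D2 @ W2 @ D1 @ W1


x = rng.random(10)
g = np.ones(2)  # same role as gradient_placeholder
vjp = backward(x, g)
print(np.allclose(vjp, g @ full_jacobian(x)))
```

Reverse mode only needs one pass per output vector $g$, which is why propagating a single vector is much cheaper than building the full Jacobian when the input is large.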

## 3. Estimating the Lipschitz Constant
To estimate the local Lipschitz constant, we take the L2 norm (p=2) of the gradient returned by the backward model.

In [None]:
from jacobinet import get_lipschitz_model

# Create a Lipschitz model using the L2 norm (p=2)
lipschitz_model = get_lipschitz_model(backward_model, p=2)
lipschitz_model.summary()
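
Assuming the Lipschitz model returns the $L_2$ norm of the back-propagated vector $g^\top J(x)$, that value is related to the $L_2$ operator norm of $J(x)$ (its largest singular value) by $\|g^\top J(x)\|_2 \le \|g\|_2\,\|J(x)\|_2$. The NumPy sketch below checks this inequality on a hypothetical random Jacobian.

```python
import numpy as np

rng = np.random.default_rng(2)
J = rng.normal(size=(2, 10))  # hypothetical Jacobian J(x) of a 10 -> 2 network at one point
g = np.ones(2)                # the all-ones output vector used in this notebook

grad_norm = np.linalg.norm(g @ J, ord=2)  # L2 norm of the back-propagated vector g^T J(x)
spectral_norm = np.linalg.norm(J, ord=2)  # largest singular value: the L2 operator norm of J(x)

# Operator-norm inequality: ||g^T J||_2 <= ||g||_2 * ||J||_2
print(grad_norm, spectral_norm)
print(grad_norm <= np.linalg.norm(g) * spectral_norm)
```

Dividing the estimate by $\|g\|_2$ (here $\sqrt{2}$) therefore gives a lower bound on $\|J(x)\|_2$ itself.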

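### Visualizing the Model Structure

The graph of the Lipschitz model can be rendered with `keras.utils.plot_model` (this requires `pydot` and Graphviz to be installed).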

In [None]:
import keras.utils
from IPython.display import HTML

dot_img_file_lipschitz = "./model_dense_lipschitz.png"
keras.utils.plot_model(
    lipschitz_model, to_file=dot_img_file_lipschitz, show_shapes=True, show_layer_names=True
)
HTML(
    '<div style="text-align: center;"><img src="{}" width="400"/></div>'.format(
        dot_img_file_lipschitz
    )
)

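## 4. Evaluating the Lipschitz Constant on Random Data
We now evaluate the Lipschitz model at a random input point. The returned value is the local gradient norm at that single point, so it is only a lower bound on the Lipschitz constant.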

In [None]:
data = np.asarray(np.random.rand(10)[None], dtype="float32")  # Generate random input data

# Compute the lower bound of the Lipschitz constant
lipschitz_constant = lipschitz_model(data)
print(f"The Lipschitz constant is at least: {lipschitz_constant}")

In [None]:
# Current best lower bound; the PGD search below tries to exceed it
lipschitz_threshold = lipschitz_constant[0]
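
Before turning to PGD, a simple baseline is random search: every sampled point gives a valid lower bound, and keeping the running maximum can only tighten it. The sketch below uses a hypothetical toy ReLU network in NumPy (not `lipschitz_model`), the same all-ones $g$, and inputs drawn from $[0, 1)^{10}$ like `data`.

```python
import numpy as np

rng = np.random.default_rng(3)
# Hypothetical toy network (column-vector convention), 10 -> 10 -> 10 -> 2
W1, b1 = rng.normal(size=(10, 10)), rng.normal(size=10)
W2, b2 = rng.normal(size=(10, 10)), rng.normal(size=10)
W3 = rng.normal(size=(2, 10))
g = np.ones(2)


def local_estimate(x):
    """L2 norm of g^T J(x) for the toy ReLU network, computed by a backward pass."""
    z1 = W1 @ x + b1
    z2 = W2 @ np.maximum(z1, 0.0) + b2
    v = (W3.T @ g) * (z2 > 0)
    v = (W2.T @ v) * (z1 > 0)
    return np.linalg.norm(W1.T @ v)


# Every point gives a valid lower bound; the running maximum can only tighten it
samples = rng.random((1000, 10))
estimates = np.array([local_estimate(x) for x in samples])
best = np.maximum.accumulate(estimates)
print(estimates[0], best[-1])
```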

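## 5. Maximizing the Lp Norm with Adversarial Attacks (PGD)
A single random point may give a loose bound, so we use Projected Gradient Descent (PGD) from *torchattacks* to iteratively perturb the input and search for a larger value, tightening the lower bound of the Lipschitz constant. Because PGD is a classification attack, the Lipschitz model is wrapped in a two-class module: class 0 scores how far the estimate lies *below* the current best value, and class 1 how far it lies *above* it. An untargeted attack on label 0 therefore pushes the input toward larger estimates, and every successful attack raises the best value.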

In [None]:
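import torch
import torch.nn as nn
import torchattacks


class LipAttack(nn.Module):
    """Two-class wrapper around the Lipschitz model for use with torchattacks.

    Class 0 scores relu(threshold - L(x)) and class 1 scores relu(L(x) - threshold),
    so predicting class 1 means the local estimate L(x) exceeds the current threshold.
    """

    def __init__(self, keras_model, threshold):
        super().__init__()
        self.keras_model = keras_model
        self.lipschitz_threshold = keras.Variable(threshold)

    def forward(self, x):
        x = self.keras_model(x)
        return torch.cat(
            [
                keras.ops.relu(self.lipschitz_threshold - x),
                keras.ops.relu(x - self.lipschitz_threshold),
            ],
            -1,
        )


# Wrap lipschitz_model, starting from the value found on random data
torch_lip_model = LipAttack(lipschitz_model, lipschitz_threshold.cpu().detach().numpy())

# torchattacks expects integer class indices; an untargeted attack on class 0 pushes toward class 1
label = torch.tensor([0])

# Apply PGD attack for different numbers of iterations.
# torchattacks clamps its iterates to [0, 1], so eps=10.0 lets PGD explore the whole unit box.
adv_data = torch.as_tensor(data)
for steps in [10, 20, 40, 100, 200, 500, 1000]:
    lip_attack = torchattacks.PGD(torch_lip_model, eps=10.0, steps=steps)
    adv_data_ = lip_attack(adv_data, label)
    # Evaluate the new candidate (not the previous best point)
    lipschitz_constant_adv = lipschitz_model(adv_data_)

    # Keep the candidate only if its estimate exceeds the current threshold
    if torch_lip_model(adv_data_).argmax().item() == 1:
        adv_data = adv_data_
        torch_lip_model.lipschitz_threshold.assign(lipschitz_constant_adv[0].cpu().detach().numpy())
    print(f"Lipschitz constant after {steps} PGD steps: {lipschitz_constant_adv}")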
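
The update that PGD repeats can be sketched in a few lines: take a step of size $\alpha$ along the sign of the gradient, then project back onto the $L_\infty$ ball of radius $\varepsilon$ around the starting point and onto the valid input box $[0, 1]$ (the same clamp that torchattacks' PGD applies). In this NumPy sketch the objective is a hypothetical smooth stand-in, $\phi(x) = -\|x - c\|_2^2$, whose constrained maximizer is $c$ clipped to the box; in the notebook, the objective is the attack loss built on the Lipschitz estimate instead.

```python
import numpy as np


def pgd_ascent(grad_fn, x0, eps, alpha, steps, lo=0.0, hi=1.0):
    """Projected sign-gradient ascent in an L-infinity ball of radius eps around x0, clipped to [lo, hi]."""
    x = x0.copy()
    for _ in range(steps):
        x = x + alpha * np.sign(grad_fn(x))  # ascent step along the gradient sign
        x = x0 + np.clip(x - x0, -eps, eps)  # project onto the eps-ball around x0
        x = np.clip(x, lo, hi)               # project onto the valid input box
    return x


# Hypothetical stand-in objective phi(x) = -||x - c||^2, maximized at c (clipped to the box)
c = np.array([0.2, 0.9, 1.5, -0.3])


def grad_phi(x):
    return -2.0 * (x - c)


x0 = np.full(4, 0.5)
x_star = pgd_ascent(grad_phi, x0, eps=1.0, alpha=0.01, steps=200)
print(np.round(x_star, 2))
```

With sign steps, the iterate ends up oscillating within $\alpha$ of the maximizer, so the step size bounds the achievable precision regardless of the number of steps.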

## 6. Theoretical Insights: Why the Lipschitz Constant Matters

### 1. Adversarial Robustness
A smaller Lipschitz constant means that the network's output changes less when small perturbations are applied to the input: by definition, $\|f(x + \delta) - f(x)\| \le L\,\|\delta\|$. This makes the network more resistant to **adversarial attacks**, where maliciously crafted input perturbations attempt to mislead the model.
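
One concrete consequence: if $L$ is an $L_2$ Lipschitz constant of $f$, each logit difference $f_i - f_j$ is $\sqrt{2}L$-Lipschitz (because $\|e_i - e_j\|_2 = \sqrt{2}$), so the predicted class cannot change for any perturbation with $\|\delta\|_2 < M / (\sqrt{2}L)$, where $M$ is the gap between the top two logits. The sketch below computes that radius; the function name is hypothetical. Note that this guarantee needs a true *upper* bound on $L$: the lower bounds computed in this notebook cannot certify robustness, they only show how sensitive the model can be.

```python
import numpy as np


def certified_l2_radius(logits, lipschitz_constant):
    """Radius within which the predicted class cannot change, given an L2 Lipschitz constant of f."""
    top2 = np.sort(logits)[-2:]
    margin = top2[1] - top2[0]
    # Each logit difference f_i - f_j is sqrt(2) * L Lipschitz, since ||e_i - e_j||_2 = sqrt(2)
    return margin / (np.sqrt(2.0) * lipschitz_constant)


radius = certified_l2_radius(np.array([3.0, 1.0]), lipschitz_constant=2.0)
print(radius)  # ≈ 0.7071 (margin 2 divided by 2 * sqrt(2))
```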

### 2. Stability and Generalization
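Networks with lower Lipschitz constants are less sensitive to noise or variations in the input data, and several generalization analyses tie smaller Lipschitz constants to better performance on unseen data. Constraining the constant is also sometimes used to make training better behaved, since it bounds how large the input gradients can become.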

### 3. Mathematical Context
The Lipschitz constant $L$ of $f$ with respect to the $L_p$ norm is formally defined as:


$$L = \sup_{x \neq y} \frac{\|f(x) - f(y)\|}{\|x - y\|} $$
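
Any single pair $(x, y)$ already gives a lower bound on $L$, since $L$ is a supremum over all pairs. For a linear map $f(x) = A x$, the exact value is known ($L = \|A\|_2$, the largest singular value, for $p = 2$), so a small NumPy sketch with a hypothetical random matrix $A$ can check that sampled ratios never exceed it.

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.normal(size=(2, 10))       # hypothetical linear map f(x) = A x
L_true = np.linalg.norm(A, ord=2)  # for a linear map, L (in L2) is the largest singular value

x, y = rng.random((2, 500, 10))    # 500 random pairs of inputs
ratios = np.linalg.norm((x - y) @ A.T, axis=1) / np.linalg.norm(x - y, axis=1)
print(ratios.max(), L_true)        # every ratio is a lower bound on L_true
```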


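For a differentiable network, $L$ can be expressed through the **Jacobian matrix** $J(x)$, which contains all first-order partial derivatives of the network's outputs with respect to its inputs. On a convex input domain $\mathcal{X}$ (taking $\mathcal{X}$ to be a neighborhood of a point gives the *local* Lipschitz constant):

$$ L = \sup_{x \in \mathcal{X}} \|J(x)\|_p $$


where $\|J(x)\|_p$ is the operator norm of the Jacobian induced by the $L_p$ norm (for $p = 2$, its largest singular value). For ReLU networks, which are differentiable almost everywhere, the supremum is taken over the points where the Jacobian exists. Evaluating this norm at any point therefore gives a lower bound on $L$, and maximizing it over $x$ tightens that bound, a key metric for evaluating the robustness of the neural network.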
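
The reason a single Jacobian evaluation gives a lower bound follows directly from the definition of $L$: for any direction $v$ at a point $x$ where $f$ is differentiable,

$$
\|J(x)\,v\|_p = \lim_{t \to 0^+} \frac{\|f(x + t v) - f(x)\|_p}{t} \le \lim_{t \to 0^+} \frac{L\,\|t v\|_p}{t} = L\,\|v\|_p ,
$$

and taking the supremum over unit vectors $\|v\|_p = 1$ gives $\|J(x)\|_p \le L$.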


## Conclusion

In this tutorial, we used Jacobinet to build a backward model of a small network, evaluated the $L_2$ norm of its gradient at a random input, and then maximized that norm with PGD to tighten a lower bound on the Lipschitz constant. The Lipschitz constant is a key measure of a network's robustness and is often linked to its generalization.