<a rel="license" href="http://creativecommons.org/licenses/by/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by/4.0/88x31.png" /></a><br /><span xmlns:dct="http://purl.org/dc/terms/" property="dct:title"><b>Training Binary Neural Networks by Integer Linear Programming</b></span> by <a xmlns:cc="http://creativecommons.org/ns#" href="http://mate.unipv.it/gualandi" property="cc:attributionName" rel="cc:attributionURL">Stefano Gualandi</a> is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</a>.<br />Based on a work at <a xmlns:dct="http://purl.org/dc/terms/" href="https://github.com/mathcoding/opt4ds" rel="dct:source">https://github.com/mathcoding/opt4ds</a>.

**REMARK:** As usual, to install Gurobi on a Colab notebook, run the following command.

In [None]:
# Run if on Colab
# %pip install gurobipy

# Training (Binary) Neural Networks by Integer Linear Programming
In this notebook, we show how to write an ILP model to train (binary) neural networks. We start by considering a basic perceptron, then we define multilayer neural networks, and, finally, we focus on binary neural networks. 
Notice that this approach does not scale to large inputs, but it helps to reason about the optimal architecture of neural networks on small examples.

Let the pair $\mathcal{X} = (X, y)$ be the input training dataset, where $X\in \mathbb{R}^{n \times m}$ and $y\in \mathbb{R}^n$.
Each row $x_i \in \mathbb{R}^m$ of the matrix $X$ represents an input data point, which is mapped to the $i$-th label $y_i$.
The dataset is partitioned into a training set $\mathcal{T} \subset \mathcal{X}$ and a validation set $\mathcal{S} \subset \mathcal{X}$.

A basic perceptron is defined by the following parametric function $F_w : \mathbb{R}^{n \times m} \rightarrow \{-1, +1\}^n$ with parameter vector $w$:

$$
    \hat{y}= F_w(X), \text{ where } X \in \mathbb{R}^{n\times m}, \hat{y} \in \{-1, +1\}^n, w \in \mathbb{R}^m
$$

The parametric function is defined as:

$$
    F_w(X) = \text{sign}( X \cdot w ) 
$$

or componentwise:

$$
    f_w(x_i) = \text{sign}( x_i^T \cdot w ) 
$$

where the sign() function is +1 for positive input, and -1 otherwise.
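A direct NumPy transcription of $F_w$ is sketched below. Note that `np.sign` returns 0 for a zero input, so the sketch defines its own `sign` helper (a name chosen here for illustration) that follows the convention above and maps 0 to -1.

```python
import numpy as np

def sign(z):
    """+1 for strictly positive entries, -1 otherwise (0 is mapped to -1)."""
    return np.where(np.asarray(z) > 0, 1, -1)

def perceptron(X, w):
    """F_w(X) = sign(X w): one prediction in {-1, +1} for each row of X."""
    return sign(np.asarray(X) @ np.asarray(w))

X = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
print(perceptron(X, (1, 1)).tolist())  # [-1, -1, -1, 1]
```

With $w = (1, 1)$ the two mixed inputs give $x_i^T w = 0$, and it is precisely the tie-breaking rule $\text{sign}(0) = -1$ that makes this weight vector reproduce the **and** data used in the next section.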

Using the training dataset $\mathcal{T}$, we are interested in finding the *best* parameters $w^*$:

$$
    w^* = \operatorname*{arg\,min}_{w \in \mathbb{R}^m} \| y - \hat{y} \| = \operatorname*{arg\,min}_{w \in \mathbb{R}^m} \| y - F_w(X) \|, \quad \text{ with } (X,y) \in \mathcal{T}
$$

Later, we evaluate the perceptron by computing its accuracy on the validation set $\mathcal{S}$, that is, the fraction of correctly classified points:
$$
    \text{accuracy} = \frac{1}{|\mathcal{S}|} \sum_{(x_i, y_i) \in \mathcal{S}} \mathbf{1}\left[ y_i = f_{w^*}(x_i) \right]
$$
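Since labels and predictions take values in $\{-1, +1\}$, each term $|y_i - f_{w^*}(x_i)|$ is either 0 (correct prediction) or 2 (mistake). Hence the fraction of correct predictions can be computed either by counting matching positions or as $1 - \frac{1}{2n}\sum_i |y_i - f_{w^*}(x_i)|$. The NumPy sketch below computes it both ways; the function names are illustrative.

```python
import numpy as np

def accuracy(y, y_hat):
    """Fraction of positions where the +/-1 prediction matches the label."""
    y, y_hat = np.asarray(y), np.asarray(y_hat)
    return float(np.mean(y == y_hat))

def accuracy_from_l1(y, y_hat):
    """Same quantity via the L1 distance: each mistake contributes |y - y_hat| = 2."""
    y, y_hat = np.asarray(y), np.asarray(y_hat)
    return 1.0 - float(np.sum(np.abs(y - y_hat))) / (2 * len(y))

y = [-1, -1, -1, 1]
y_hat = [-1, 1, -1, 1]
print(accuracy(y, y_hat), accuracy_from_l1(y, y_hat))  # 0.75 0.75
```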

Next, we show how to model the problem of finding the best $w^*$ as an Integer Linear Program.
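One common way to write this as a mixed-integer linear program is sketched here under stated assumptions, not as the reference model of this notebook. It uses a binary variable $z_i$ that equals 1 when point $i$ is allowed to be misclassified. Assume bounded weights $|w_j| \le \bar{w}$, a unit margin for the positive class (which, for real weights, is obtained by rescaling $w$), and a big-M constant $M \ge 1 + \bar{w} \max_i \|x_i\|_1$:

$$
\begin{aligned}
\min_{w,\, z} \quad & \sum_{i=1}^{n} z_i \\
\text{s.t.} \quad & x_i^T w \ge 1 - M z_i, \quad \forall i : y_i = +1, \\
& x_i^T w \le M z_i, \quad \forall i : y_i = -1, \\
& -\bar{w} \le w_j \le \bar{w}, \quad j = 1, \dots, m, \\
& z_i \in \{0, 1\}, \quad i = 1, \dots, n.
\end{aligned}
$$

With $z_i = 0$ the constraints force $f_w(x_i) = y_i$ (a positive score for $y_i = +1$, a non-positive one for $y_i = -1$, matching $\text{sign}(0) = -1$); with $z_i = 1$ they are relaxed. Since each misclassified point contributes exactly 2 to $\|y - F_w(X)\|_1$, counting errors through the $z_i$ targets the same objective, measured with the 1-norm.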

## Learning basic logical operators
From a didactic perspective, learning the three basic logical operators **and**, **or**, and **xor** is an excellent exercise. Let us start with the **and** operator, whose truth table is given below:

| i | $x'_1(i)$ | $x'_2(i)$ | $y'(i)$ |
|--|----|----|---|
| 1 | 0 | 0 | 0 |
| 2 | 0 | 1 | 0 |
| 3 | 1 | 0 | 0 |
| 4 | 1 | 1 | 1 |

Moreover, we can reparametrize the data into the values $\{-1, +1\}$ by using the transformation $x(i) = 2x'(i) - 1$, as follows:

| i | $x_1(i)$ | $x_2(i)$ | $y(i)$ |
|--|----|----|---|
| 1 | -1 | -1 | -1 |
| 2 | -1 | +1 | -1 |
| 3 | +1 | -1 | -1 |
| 4 | +1 | +1 | +1 |

We can then use the values of $x_1$ and $x_2$ to define the training matrix $X\in \mathbb{R}^{4 \times 2}$ and vector $y \in \mathbb{R}^4$.

To start, we use the truth table as both the training and the validation dataset.


In [None]:
# AND function
Xand = [(-1,-1), (-1, 1), (1,-1), (1,1)]
Yand = [-1, -1, -1, 1]

# OR function
Xor = [(-1,-1), (-1, 1), (1,-1), (1,1)]
Yor = [-1, 1, 1, 1]

In [None]:
from gurobipy import Model, GRB, quicksum
from random import randint, seed
seed(13)

def LogicalNN(Xs, Ys):
    # Main ILP model
    model = Model()

    # TODO: complete the model
    # ...

    # Return the prediction function of the trained model
    # (placeholder: a random guess in {-1, +1})
    return lambda x: 1 - 2*randint(0,1)
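Before writing the ILP model, it can help to have a ground truth for these tiny instances. The sketch below is *not* an ILP: it simply enumerates every weight vector in $\{-1, 0, +1\}^m$ (a restriction assumed here only to keep the search finite) and keeps the first one with the most correct predictions, so that the solution of an ILP model can be checked against it. The function name `brute_force_perceptron` is hypothetical.

```python
from itertools import product

import numpy as np

def sign(z):
    # +1 for positive input, -1 otherwise (including 0)
    return np.where(np.asarray(z) > 0, 1, -1)

def brute_force_perceptron(Xs, Ys, values=(-1, 0, 1)):
    """Enumerate all w in values^m; return (best w, number of correct predictions)."""
    X, y = np.asarray(Xs), np.asarray(Ys)
    best_w, best_correct = None, -1
    for w in product(values, repeat=X.shape[1]):
        correct = int(np.sum(sign(X @ np.array(w)) == y))
        if correct > best_correct:  # keep the first maximizer found
            best_w, best_correct = w, correct
    return best_w, best_correct

Xand = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
Yand = [-1, -1, -1, 1]
Yor = [-1, 1, 1, 1]
print(brute_force_perceptron(Xand, Yand))  # ((1, 1), 4)
print(brute_force_perceptron(Xand, Yor))   # ((0, 1), 3)
```

The **or** case exposes a limitation of the model $\text{sign}(x_i^T w)$, which has no bias term: the inputs $(-1, +1)$ and $(+1, -1)$ would require both $w_2 - w_1 > 0$ and $w_1 - w_2 > 0$, so no real $w$ classifies all four rows correctly, and 3 out of 4 is the best possible.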

### Noisy input data
To consider a more realistic setting, let us suppose that the input is affected by noise, that is, for instance:
$$
    y = F_w(X + \epsilon)
$$
where $\epsilon \in \mathbb{R}^{n \times m}$ is noise drawn from an unknown distribution (e.g., uniform, normal, or lognormal).

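The following function adds zero-mean Gaussian noise, with standard deviation `sigma`, to each coordinate of a data point.

In [None]:
from numpy.random import normal

def AddNoise(X, sigma=0.1):
    # X is a single data point (e.g. a tuple of coordinates); sigma is the noise standard deviation
    return list(map(lambda x: x + normal(0, sigma), X))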

In [None]:
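# To increase the size of the input dataset, draw T independent noisy copies of each point
# (T*[...] would repeat the same noisy points T times)
T = 10
Xs = [AddNoise(x, 0.1) for _ in range(T) for x in Xand]
Ys = T*Yand
print(Xs[:5], Ys[:5])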
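Noise also interacts with the tie-breaking rule of $\text{sign}$. The bias-free weights $w = (1, 1)$ reproduce **and** on the clean data only because $x_i^T w = 0$ exactly on the two mixed inputs; after perturbation these products become small positive or negative numbers, so roughly half of those points flip. The self-contained sketch below estimates this effect; the function name is hypothetical, and it uses `numpy.random.default_rng` (instead of the `normal` import above) so that the result is reproducible.

```python
import numpy as np

def sign(z):
    # +1 for positive input, -1 otherwise (including 0)
    return np.where(np.asarray(z) > 0, 1, -1)

def noisy_and_accuracy(w, T=1000, sigma=0.1, seed=0):
    """Accuracy of sign(X w) on T independently perturbed copies of the and data."""
    rng = np.random.default_rng(seed)
    X = np.array([(-1, -1), (-1, 1), (1, -1), (1, 1)], dtype=float)
    y = np.array([-1, -1, -1, 1])
    # Fresh Gaussian noise for every copy of every point
    Xs = np.tile(X, (T, 1)) + rng.normal(0.0, sigma, size=(4 * T, 2))
    Ys = np.tile(y, T)
    return float(np.mean(sign(Xs @ np.asarray(w, dtype=float)) == Ys))

print(noisy_and_accuracy((1, 1)))             # close to 0.75
print(noisy_and_accuracy((1, 1), sigma=0.0))  # 1.0 on the clean data
```

This is one reason to prefer solutions that keep $|x_i^T w|$ away from 0 when the data are noisy.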

**EXERCISE 1:** Adapt the previous script so that it is trained on the noisy input data.

## The **xor** logical function
The simple perceptron is unable to learn the **xor** function correctly because **xor** is not linearly separable. One possibility to overcome this limitation is to change the structure of the parametric function $F_w$.

For instance, we could use the following function, where $W \in \mathbb{R}^{m \times n_h}$ is now a matrix of parameters, $U \in \mathbb{R}^{n_h}$ is a new vector of parameters, and $n_h$ is the number of hidden neurons:
$$
    \hat{y} = F_{W,U}(X) = \text{sign}(\text{sign}(X W)\, U) 
$$

Notice that now we have a *hidden* layer of unknowns given by the inner $\text{sign}$ function. The product between the weights $U$ and the output of the inner sign function introduces bilinear terms (products of two decision variables), which must be carefully linearized in order to use Integer Linear Programming to solve the training problem.
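One standard big-M linearization is sketched below; it is an illustration under assumptions, not necessarily the model intended in this notebook. Assume a known constant $M$ with $|u_k| \le M$ and $|x_i^T W_{\cdot k}| \le M$ (where $W_{\cdot k}$ is the $k$-th column of $W$), and a small tolerance $\varepsilon > 0$ used to approximate strict positivity. Each hidden activation is written through a binary variable $b_{ik}$ as $h_{ik} = \text{sign}(x_i^T W_{\cdot k}) = 2 b_{ik} - 1$, and each product $u_k b_{ik}$ is replaced by a new continuous variable $p_{ik}$:

$$
\begin{aligned}
& x_i^T W_{\cdot k} \ge \varepsilon - (M + \varepsilon)(1 - b_{ik}), \qquad x_i^T W_{\cdot k} \le M\, b_{ik}, \\
& -M\, b_{ik} \le p_{ik} \le M\, b_{ik}, \qquad u_k - M(1 - b_{ik}) \le p_{ik} \le u_k + M(1 - b_{ik}), \\
& u_k\, h_{ik} = u_k (2 b_{ik} - 1) = 2\, p_{ik} - u_k, \qquad b_{ik} \in \{0, 1\}.
\end{aligned}
$$

With $b_{ik} = 1$ the constraints force $x_i^T W_{\cdot k} \ge \varepsilon > 0$ and $p_{ik} = u_k$; with $b_{ik} = 0$ they force $x_i^T W_{\cdot k} \le 0$ (so $\text{sign}$ returns -1, as in the convention above) and $p_{ik} = 0$. The output pre-activation $\sum_k u_k h_{ik} = \sum_k (2 p_{ik} - u_k)$ is then linear in the variables $p$ and $u$.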

In [None]:
# XOR function
Xxor = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
Yxor = [-1, 1, 1, -1]

In [None]:
def MLP(Xs, Ys, nh):
    # Main ILP model
    model = Model()

    # TODO: complete the model
    # ...

    # Return the prediction function built from the final weights W, U
    # (placeholder: a random guess in {-1, +1})
    return lambda x: 1 - 2*randint(0,1)
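When implementing big-M constraints in `MLP`, a sign error is easy to make. A quick plain-Python sanity check, independent of any solver, is to verify that for $b \in \{0, 1\}$ and any $u \in [-M, M]$ the inequalities $-Mb \le p \le Mb$ and $u - M(1-b) \le p \le u + M(1-b)$ leave exactly one feasible value, $p = u \cdot b$. The function `product_bounds` is hypothetical and only mirrors those inequalities.

```python
def product_bounds(u, b, M):
    """Interval of p allowed by the big-M linearization of p = u * b."""
    lower = max(-M * b, u - M * (1 - b))
    upper = min(M * b, u + M * (1 - b))
    return lower, upper

M = 3
for b in (0, 1):
    for u in range(-M, M + 1):          # every integer u in [-M, M]
        lower, upper = product_bounds(u, b, M)
        assert lower == upper == u * b  # the interval collapses to u * b
        # with h = 2b - 1 in {-1, +1}, the bilinear term u * h equals 2p - u
        assert u * (2 * b - 1) == 2 * (u * b) - u
print("big-M linearization of u * b verified for M =", M)
```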

Then, design a validation test, for instance by checking the predictions of the trained network on the four **xor** inputs.
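A validation test needs the forward pass of the trained network. Assuming the trained weights are available as a matrix $W$ of shape $m \times n_h$ and a vector $U$ of length $n_h$ (names and shapes chosen here for illustration), a NumPy sketch of $\text{sign}(\text{sign}(X W)\, U)$ and of the resulting accuracy is:

```python
import numpy as np

def sign(z):
    # +1 for positive input, -1 otherwise (including 0)
    return np.where(np.asarray(z) > 0, 1, -1)

def mlp_forward(X, W, U):
    """Predictions sign(sign(X W) U) for every row of X."""
    H = sign(np.asarray(X) @ np.asarray(W))  # n x nh hidden activations in {-1, +1}
    return sign(H @ np.asarray(U))           # n predictions in {-1, +1}

def validate(X, y, W, U):
    """Fraction of rows where the network output matches the label."""
    return float(np.mean(mlp_forward(X, W, U) == np.asarray(y)))

Xxor = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
Yxor = [-1, 1, 1, -1]
# Hypothetical weights: a single hidden unit computing "and", passed through unchanged
W = [[1], [1]]
U = [1]
print(mlp_forward(Xxor, W, U).tolist(), validate(Xxor, Yxor, W, U))  # [-1, -1, -1, 1] 0.25
```

These hand-picked weights are deliberately not a solution; the point is only to show the shapes involved and how a trained $(W, U)$ would be scored.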

**EXERCISE 2:** Modify your script in such a way that the weights belong only to the set of values $\{-1, 0, +1\}$. Note that a value of 0 is equivalent to removing the corresponding link (and thus simplifies the network).

**EXERCISE 3:** Can you *computationally prove* what the smallest Binary Neural Network that computes the **xor** function exactly is? How many weights do you need in total?

## Classification of MNIST digits
Starting from the solution used to model the **xor** function, you can build a parametric function whose parameters are fitted to classify images.

You can use the same structure used for the **xor** function, but with a different number of neurons in the hidden layer:
$$
    \hat{y} = F_{W,U}(X) = \text{sign}(\text{sign}(X W)\, U) 
$$

Alternatively, you can propose a different type of solution.

You can download the dataset as follows.

In [None]:
# Uncomment and execute
# !wget https://mate.unipv.it/gualandi/opt4ds/all_three_four.csv

In [None]:
# Uncomment and execute
# !wget https://mate.unipv.it/gualandi/opt4ds/all_nine_four.csv

## Parsing the dataset
To parse the dataset, you can use the following function:

In [None]:
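import numpy as np

def Parse(filename):
    """Read a ';'-separated file: a header line, then label;pixel_1;...;pixel_m per row."""
    Xs, Ys = [], []
    with open(filename, 'r') as fh:
        fh.readline()  # skip the header line
        for row in fh:
            line = row.strip().split(';')
            if line == ['']:  # ignore empty lines
                continue
            Ys.append(int(line[0]))
            Xs.append(list(map(int, line[1:])))

    # np.array (instead of the discouraged np.matrix) keeps each row 1-D
    return np.array(Xs), np.array(Ys)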
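Assuming every data row has the same number of `;`-separated integer fields after a single header line (the format the parser above expects), `numpy.loadtxt` gives an equivalent, shorter reader. The name `parse_with_loadtxt` is hypothetical; `ndmin=2` keeps the result two-dimensional even when the file holds a single data row.

```python
import io

import numpy as np

def parse_with_loadtxt(source):
    """Return (X, y): pixels as an n x m int array, labels as a length-n int array."""
    data = np.loadtxt(source, delimiter=';', skiprows=1, dtype=int, ndmin=2)
    return data[:, 1:], data[:, 0]

# Tiny in-memory example in the assumed format: header, then label;pixel;pixel;...
sample = io.StringIO("label;p1;p2;p3\n3;0;255;12\n4;7;0;0\n")
X, y = parse_with_loadtxt(sample)
print(X.shape, y.tolist())  # (2, 3) [3, 4]
```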

In [None]:
# Adjust the path to where the file was downloaded (e.g. 'all_three_four.csv')
Xs, Ys = Parse('all_three_four.csv')

In [None]:
print('dimension of matrix X:', Xs.shape)

In [None]:
print('dimension of y:', Ys.shape)

## Plotting digits
To plot a digit you can use the following snippet.

In [None]:
import matplotlib.pyplot as plt

def DrawDigit(A):
    plt.imshow(A.reshape((28, 28)), cmap='binary')
    plt.show()

In [None]:
DrawDigit(Xs[2])

In [None]:
print(Ys[0], Ys[2])
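The networks above output values in $\{-1, +1\}$, whereas `Parse` returns the labels exactly as read from the file. Assuming the first column stores the digit itself (e.g. 3 or 4, as the file names suggest), the labels should be mapped to $\pm 1$ before training and before calling `AccuracyMLP`; otherwise `Ys == y_hat` is never true. For a binary network it is also natural to binarize the pixels; the threshold below is an assumption, not a value from the original material, and both helper names are hypothetical.

```python
import numpy as np

def labels_to_pm1(Ys, positive_digit):
    """+1 for the chosen digit, -1 for every other label."""
    return np.where(np.asarray(Ys) == positive_digit, 1, -1)

def binarize_pixels(Xs, threshold=0):
    """Map each pixel to +1 if it exceeds the threshold, -1 otherwise."""
    return np.where(np.asarray(Xs) > threshold, 1, -1)

print(labels_to_pm1([3, 4, 4, 3], positive_digit=4).tolist())  # [-1, 1, 1, -1]
print(binarize_pixels([[0, 128, 255]]).tolist())                # [[-1, 1, 1]]
```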

## Evaluating a NN
To evaluate the accuracy of a Binary NN you can run the following function.

In [None]:
def AccuracyMLP(Xs, Ys, F):
    y_hat = np.array([F(x) for x in Xs])

    n = len(Ys)
    return (np.sum(Ys == y_hat))/n*100

In [None]:
# nh = number of neurons (activations) in the hidden layer
def TrainingMLP(Xs, Ys, nh=1):
    # Main ILP model
    model = Model()
    # TO COMPLETE with your model

    return lambda x: 1 - 2*randint(0,1)

# Number of internal states
nh = 2

# REMARK: TrainingMLP returns the prediction function F
F = TrainingMLP(Xs, Ys, nh)

# Evaluate accuracy (be careful with the matrix dimensions)
acc = AccuracyMLP(Xs, Ys, F)
print('accuracy:', round(acc, 2))

## BNN Classification Challenge
For this challenge, you have to design a binary neural network, that is, a neural network whose weights are all either +1 or -1, that is able to solve a binary classification problem defined on pairs of MNIST digits.

For training, you will have two datasets: the first contains images of the digits 3 and 4, and the second contains images of the digits 4 and 9.

For the design phase, you should use a small number of input data points. Later, you can decide whether to build a *light* model that can be trained on many data points, or a *heavy* model that can use only a few data points but is more general.
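For the design phase, one simple way to pick a small training set is to draw the same number of images from each class without replacement. The helper below is a hypothetical sketch based on `numpy.random.default_rng`; the seed makes the selection reproducible.

```python
import numpy as np

def balanced_subsample(Xs, Ys, k_per_class, seed=0):
    """Draw k_per_class rows (without replacement) for every label in Ys."""
    rng = np.random.default_rng(seed)
    Xs, Ys = np.asarray(Xs), np.asarray(Ys)
    idx = np.concatenate([
        rng.choice(np.flatnonzero(Ys == label), size=k_per_class, replace=False)
        for label in np.unique(Ys)
    ])
    return Xs[idx], Ys[idx]

# Toy data: 6 rows labelled 3 and 4 rows labelled 4
Xs = np.arange(20).reshape(10, 2)
Ys = np.array([3] * 6 + [4] * 4)
Xsmall, Ysmall = balanced_subsample(Xs, Ys, k_per_class=2)
print(Xsmall.shape, sorted(Ysmall.tolist()))  # (4, 2) [3, 3, 4, 4]
```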

The evaluation will be carried out on a hidden mixed dataset.

*REMARK*: You have to submit your solution by Thursday, June 11, 2025, by sending an email containing your Python solution script.

By participating in this (optional) challenge, you will earn extra points toward the final exam grade.