## LCPB 23-24 exercise 4 (Restricted Boltzmann Machine)

- Andrea Semenzato 2130973
- Pietro Bernardi 2097494
- Tomàs Mezquita 2109239
- Mariam Chokheli 2122278


We want to study the performance of an RBM and, by looking at its learned weights and biases,
better understand the correlations in the data (from file x_RBM_2024_exercise.dat, $N=10^4$
configurations with $L=10$ bits). Use an RBM with $M=3$ hidden units.

### Point 1

Increase the number of contrastive divergence steps from n=1 to n=5

Starting from the code provided in the notebook, we added the loop below inside the loop over the data index `k`. It repeats the backward and forward passes between the visible and hidden layers `CDK` times, with `CDK = 5` as requested.

```python
h = activate(v[k], w, b, GAP)
hf = h
for cdk_it in range(CDK):
    vf = activate(hf, w.T, a, GAP) #changed to hf from h
    hf = activate(vf, w, b, GAP)
```
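
For readers without the original notebook, the chain above can be sketched in a self-contained form. The body of `activate` is not shown in this report, so `sample_layer` below is a hypothetical stand-in that assumes bits in $\{0,1\}$ and a sigmoid activation, and it drops the `GAP` argument; the sizes and initial parameters are likewise illustrative only.

```python
import numpy as np

def sample_layer(x, w, bias, rng):
    """Sample a {0,1} layer given the opposite layer (assumed stand-in for `activate`)."""
    p = 1.0 / (1.0 + np.exp(-(x @ w + bias)))  # sigmoid of the local field
    return (rng.random(p.shape) < p).astype(int)

def cd_k(v0, w, a, b, k, rng):
    """Run k >= 1 Gibbs steps starting from the data vector v0."""
    h0 = sample_layer(v0, w, b, rng)        # hidden sample driven by the data
    hk = h0
    for _ in range(k):
        vk = sample_layer(hk, w.T, a, rng)  # backward pass: hidden -> visible
        hk = sample_layer(vk, w, b, rng)    # forward pass: visible -> hidden
    return h0, vk, hk

rng = np.random.default_rng(0)
L, M = 10, 3                                # sizes used in this exercise
w = rng.normal(0.0, 0.1, size=(L, M))       # illustrative initial weights
a, b = np.zeros(L), np.zeros(M)             # visible and hidden biases
v0 = rng.integers(0, 2, size=L)             # a made-up input vector
h0, vk, hk = cd_k(v0, w, a, b, k=5, rng=rng)
```

In this sketch, `h0` plays the role of `h` in the snippet above, while `vk` and `hk` correspond to `vf` and `hf` after the last iteration.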

### Point 2
Compute the log-likelihood $\mathcal{L}$ during the training, at every epoch, or every minibatch update if it
reaches a maximum already in the first epoch. Use “t” as an index of this “time”, indicating the unit
in the figures.

We computed the log-likelihood both once per epoch and once per minibatch update. The results for $M=3$ and $CD=5$ are shown in Figure 1.

<div align="center">
    <img src="./ex_img/loglike_cd5_m3.png" width=800>
    Figure 1
</div>

The log-likelihood was computed via these functions:

##### getEnergy
This function computes the energy of a joint configuration $(v,h)$, where $a$ is the visible bias, $b$ is the hidden bias, and $W$ is the $L\times M$ weight matrix:
$$
    E(v,h) = -a^T v - b^T h - v^T W h
$$
```python
import numpy as np

def getEnergy(h, v, w, a, b):
    # a : visible bias (used in the backward pass, hidden -> visible)
    # b : hidden bias (used in the forward pass, visible -> hidden)
    e0 = np.dot(np.matmul(v.T, w), h)  # interaction term v^T W h
    e1 = np.dot(a.T, v)                # visible bias term
    e2 = np.dot(b.T, h)                # hidden bias term
    return -1.*(e0+e1+e2)
```
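
As a sanity check on the energy formula, here is a small hand-verifiable example. It is only a sketch: `energy_matrix` is a hypothetical helper (not part of the notebook) that evaluates the same expression for every pair of visible and hidden configurations at once, and the parameter values are made up.

```python
import itertools as it
import numpy as np

def energy_matrix(V, H, w, a, b):
    """E[n, j] = -a.V[n] - b.H[j] - V[n]^T w H[j] for every pair of rows."""
    return -((V @ a)[:, None] + (H @ b)[None, :] + V @ w @ H.T)

# Tiny case: 2 visible units, 1 hidden unit (illustrative values)
w = np.array([[0.5], [-0.3]])
a = np.array([0.1, 0.2])
b = np.array([-0.4])
V = np.array([[1, 0], [1, 1]])                    # two visible vectors
H = np.array(list(it.product((0, 1), repeat=1)))  # all hidden states: [[0], [1]]
E = energy_matrix(V, H, w, a, b)
```

For instance, for $v=(1,0)$ and $h=(1)$ the three terms are $a^T v = 0.1$, $b^T h = -0.4$ and $v^T W h = 0.5$, so $E = -(0.1 - 0.4 + 0.5) = -0.2$, which is the entry `E[0, 1]`.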

##### getMeanEnergyData
This function computes $\langle E\rangle_{data}$ via:
$$
    \langle E\rangle_{data} = \frac{1}{N}\sum_{n} \langle E(v_n,h)\rangle = \frac{1}{N}\sum_{n} \left(\frac{\sum_h E(v_n,h)\, e^{-E(v_n,h)}}{\sum_h e^{-E(v_n,h)}}\right)
$$
where $N$ is the number of input vectors being averaged over: $N=10^4$ for the per-epoch estimate, and the minibatch size for the per-minibatch estimate.

```python
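import itertools as it
import numpy as np

# w, a, b : model parameters
# v : input vectors
# k_start, k_end : range of input vectors to consider
# N : total number of input vectors, assumed to be defined at module level
def getMeanEnergyData(w, a, b, v, k_start=0, k_end=N, M=3):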
    # getting all the h configurations
    hs = list(it.product((0,1), repeat=M))
    e = 0
    # checking to not overshoot the number of input vectors
    k_end_limit = k_end
    if k_end >= N:
        k_end_limit = N
    for k in range(k_start,k_end_limit,1):
        # foreach input vector v[k]
        e_num = 0
        e_den = 0
        for h in hs:
            en = getEnergy(h, v[k], w, a, b)
            boltz = np.exp(-1.0*en)
            e_num += (en*boltz)
            e_den += boltz
        # now we have the <E(v_n,h)>
        e += (e_num/e_den)
    return e/(k_end_limit-k_start)
```
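
The same conditional average can also be written in vectorized form. The sketch below is an alternative, not the code used for the figures: `mean_energy_data` is a hypothetical name, and it subtracts the row maximum before exponentiating, a standard trick that avoids overflow in $e^{-E}$ when the energies become large in magnitude.

```python
import itertools as it
import numpy as np

def mean_energy_data(V, w, a, b):
    """Average over the rows of V of <E(v_n, h)> under p(h | v_n)."""
    M = w.shape[1]
    H = np.array(list(it.product((0, 1), repeat=M)))          # all 2**M hidden states
    E = -((V @ a)[:, None] + (H @ b)[None, :] + V @ w @ H.T)  # shape (N, 2**M)
    logits = -E
    logits -= logits.max(axis=1, keepdims=True)               # stabilize the exponentials
    p = np.exp(logits)
    p /= p.sum(axis=1, keepdims=True)                         # Boltzmann weights p(h | v_n)
    return float((p * E).sum(axis=1).mean())

# Illustrative call with made-up parameters and data
rng = np.random.default_rng(0)
V = rng.integers(0, 2, size=(5, 10))
w = rng.normal(0.0, 0.1, size=(10, 3))
value = mean_energy_data(V, w, np.zeros(10), np.zeros(3))
```

Passing a slice such as `V[k_start:k_end]` would give the per-minibatch estimate, assuming the data are stored as a 2D array with one configuration per row.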

##### getPartitionFunction
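This function computes the partition function $Z = \sum_{v,h} e^{-E(v,h)}$ by summing over all $2^{L+M}$ joint configurations of the $L$ visible and $M$ hidden units ($2^{13} = 8192$ terms for $L=10$, $M=3$).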
```python
def getPartitionFunction(w, a, b, L=10, M=3):
    # generating the configurations
    confs = list(it.product((0,1), repeat=(L+M)))
    e_cfg = 0
    for cfg in confs:
        v = np.array(cfg[:L])
        h = np.array(cfg[L:])
        e_cfg += np.exp(-1.0*getEnergy(h, v, w, a, b))
    # now we have the sum of all terms, hence:
    return e_cfg
```
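
This report does not show how these quantities are combined into $\mathcal{L}$, so the following is only a hedged sketch of one standard choice: the exact per-sample log-likelihood of an RBM, $\mathcal{L} = \frac{1}{N}\sum_n \ln \sum_h e^{-E(v_n,h)} - \ln Z$, evaluated with a log-sum-exp for numerical stability. It may differ from the estimator used for the figures, and all function names are hypothetical.

```python
import itertools as it
import numpy as np

def log_sum_exp(x, axis):
    """Numerically stable log(sum(exp(x))) along one axis."""
    m = x.max(axis=axis, keepdims=True)
    return (m + np.log(np.exp(x - m).sum(axis=axis, keepdims=True))).squeeze(axis)

def all_states(n):
    """All 2**n binary configurations, one per row."""
    return np.array(list(it.product((0, 1), repeat=n)))

def neg_energy(V, H, w, a, b):
    """-E for every pair (row of V, row of H)."""
    return (V @ a)[:, None] + (H @ b)[None, :] + V @ w @ H.T

def log_partition(w, a, b):
    """ln Z by brute-force enumeration of all 2**(L+M) configurations."""
    L, M = w.shape
    return log_sum_exp(neg_energy(all_states(L), all_states(M), w, a, b).ravel(), axis=0)

def log_likelihood(V, w, a, b):
    """Mean over the rows of V of ln p(v_n), with h summed out exactly."""
    M = w.shape[1]
    log_unnorm = log_sum_exp(neg_energy(V, all_states(M), w, a, b), axis=1)
    return float(log_unnorm.mean() - log_partition(w, a, b))

# Consistency check with made-up parameters: p(v) must sum to 1 over all visible states
rng = np.random.default_rng(0)
L, M = 10, 3
w = rng.normal(0.0, 0.5, size=(L, M))
a, b = rng.normal(size=L), rng.normal(size=M)
log_p = log_sum_exp(neg_energy(all_states(L), all_states(M), w, a, b), axis=1) - log_partition(w, a, b)
```

Brute-force enumeration is only feasible because $L+M$ is small here; the number of terms doubles with every added unit.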


The following figure shows how the model's parameters are adjusted during the training procedure for a model with $M=3$, over $100$ epochs:
<div align="center">
    <img src="./ex_img/anim_m3_cd5.gif" width=600>
    Figure 2
</div>

### Point 3
Try RBMs with different numbers of hidden units: M=1, 2, 3 (done above), 4, 5, and 6.

The log-likelihoods for $M=1,2,4,5,6$ are reported below, alongside the weights learned after $100$ epochs:

<div align="center">
<table>
    <tr>
        <td><img src="./ex_img/loglike_cd5_m1.png" width=600></td>
        <td><img src="./ex_img/m1_cd5/img_100.png" width=400>
        <br>Weights after $100$ epochs for $CD=5, M=1$.
        </td>
    </tr>
    <tr>
        <td><img src="./ex_img/loglike_cd5_m2.png" width=600></td>
        <td><img src="./ex_img/m2_cd5/img_100.png" width=400>
        <br>Weights after $100$ epochs for $CD=5, M=2$.
        </td>
    </tr>
    <tr>
        <td><img src="./ex_img/loglike_cd5_m4.png" width=600></td>
        <td><img src="./ex_img/m4_cd5/img_100.png" width=400>
        <br>Weights after $100$ epochs for $CD=5, M=4$.
        </td>
    </tr>
    <tr>
        <td><img src="./ex_img/loglike_cd5_m5.png" width=600></td>
        <td><img src="./ex_img/m5_cd5/img_100.png" width=400>
        <br>Weights after $100$ epochs for $CD=5, M=5$.
        </td>
    </tr>
    <tr>
        <td><img src="./ex_img/loglike_cd5_m6.png" width=600></td>
        <td><img src="./ex_img/m6_cd5/img_100.png" width=400>
        <br>Weights after $100$ epochs for $CD=5, M=6$.
        </td>
    </tr>
</table>
    Figure 3
</div>

We observed that as $M$ increases, the log-likelihood tends to stabilize at lower values, and for $M > 4$ it also exhibits large fluctuations.

### Point 4
For $M=3$, plot $\mathcal{L}$ as a function of “t”, comparing the two contrastive divergence cases (n=1 and
n=5). Then, for n=1, plot $\mathcal{L}$ as a function of “t”, comparing the two cases with different M.

<div align="center">
    <img src="./ex_img/ll_m3_cd15.png" width=600>
    Figure 4
</div>

The result of the comparison is shown in Figure 4. The log-likelihoods for the two contrastive divergence cases look almost the same, both when computed per epoch and when computed per minibatch. We then compared, for $CD=1$, the case $M=3$ against each of the other values of $M$.

In the plots of Figure 5, the $M=3$ curve is always drawn in blue.

<div align="center">
    <table>
        <tr>
            <td><img src="./ex_img/ll_m1_m3_cd1.png" width=600></td>
        </tr>
        <tr>
            <td><img src="./ex_img/ll_m2_m3_cd1.png" width=600></td>
        </tr>
        <tr>
            <td><img src="./ex_img/ll_m4_m3_cd1.png" width=600></td>
        </tr>
        <tr>
            <td><img src="./ex_img/ll_m5_m3_cd1.png" width=600></td>
        </tr>
        <tr>
            <td><img src="./ex_img/ll_m6_m3_cd1.png" width=600></td>
        </tr>
    </table>
    Figure 5
</div>

### Point 5
From the weights learned by the RBM, guess the structure of the data.