---
---
# Born Machine through MPS
## Algorithm
---

#### Legend:
* <font color='blue'>blue</font> means there are still some doubts about the procedure;
* <font color='green'>green</font> means the point is discussed in more detail in the *Further Details* section.


***Goal:*** Obtain a wavefunction ${\psi}$ expressed as an MPS (Matrix Product State) network such that its probability distribution
$$ P(v) = \frac{|\psi(v)|^2}{Z}, \qquad Z = \sum_{v\in V} |\psi(v)|^2 $$
resembles the unknown probability distribution underlying the data it is trained on.

This is done by minimizing the Negative Log-Likelihood (NLL) of the training data.\
<font color='green'>Minimizing the NLL is equivalent to minimizing the Kullback-Leibler divergence between the data distribution and the model distribution.</font>
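A minimal sketch of the quantities above, assuming real-valued tensors, binary variables $v_k\in\{0,1\}$ and open boundary bonds of size 1 (so $|\psi(v)|^2 = \psi(v)^2$). The helper names `random_mps`, `amplitude`, `partition_function` and `nll` are hypothetical. $Z$ is contracted one site at a time, so the cost is linear in the number of sites instead of exponential.

```python
import itertools

import numpy as np


def random_mps(n_sites, bond_dim, phys_dim=2, seed=0):
    """Random real MPS: tensor k has shape (D_left, phys_dim, D_right), boundary bonds of size 1."""
    rng = np.random.default_rng(seed)
    dims = [1] + [bond_dim] * (n_sites - 1) + [1]
    return [rng.normal(size=(dims[k], phys_dim, dims[k + 1])) for k in range(n_sites)]


def amplitude(mps, v):
    """psi(v): product of the matrices A^k[:, v_k, :] selected by the configuration v."""
    m = np.ones((1, 1))
    for tensor, vk in zip(mps, v):
        m = m @ tensor[:, vk, :]
    return m[0, 0]


def partition_function(mps):
    """Z = sum_v psi(v)^2, contracted one site at a time instead of summing 2^N terms."""
    env = np.ones((1, 1))
    for tensor in mps:
        env = np.einsum("ab,asc,bsd->cd", env, tensor, tensor)
    return env[0, 0]


def nll(mps, data):
    """Negative log-likelihood: -(1/|T|) sum_{v in T} log P(v), with P(v) = psi(v)^2 / Z."""
    log_z = np.log(partition_function(mps))
    return -np.mean([np.log(amplitude(mps, v) ** 2) - log_z for v in data])


mps = random_mps(n_sites=6, bond_dim=3)
configs = list(itertools.product([0, 1], repeat=6))
print(sum(amplitude(mps, v) ** 2 for v in configs) / partition_function(mps))  # ~1.0
print(nll(mps, configs[:10]))
```

On a chain this short the brute-force sum over all $2^6$ configurations is still feasible, which makes it a convenient check of the site-by-site contraction.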

1. ***Initialize*** MPS <font color='green'>randomly</font>;

2. ***<font color='green'>Canonicalize</font>*** the Tensor Network;

3. At each step:

    ***Compute the derivative***
    
    3.1 <font color='green'>Merge two adjacent tensors</font> into a rank-4 tensor:
    
    <img src="./imgs/algorithm_merge.svg">
    
    ***Update Network***
    
    3.2 <font color='blue'>Update the merged tensor</font>  $A^{k,k+1}$
    
    3.3 Unfold the merged rank-4 tensor through SVD, <font color='green'>keeping mixed canonicalization of the network</font>
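Steps 3.1 and 3.3 can be sketched as follows (NumPy, real tensors of shape `(D_left, d, D_right)`); step 3.2 is left as a placeholder. `merge` and `split` are hypothetical names, and the optional `max_bond` truncation is an assumption, a common way to cap the bond dimension, not something these notes specify.

```python
import numpy as np


def merge(a_k, a_k1):
    """Step 3.1: contract the shared bond of A^k and A^{k+1} into a rank-4 tensor (D_l, d, d, D_r)."""
    return np.einsum("asb,btc->astc", a_k, a_k1)


def split(merged, max_bond=None, cutoff=1e-12):
    """Step 3.3: unfold the rank-4 tensor with an SVD, M = U diag(s) V^T.

    A^k = U is left-orthogonal; the singular values go into A^{k+1} = diag(s) V^T,
    so the canonical center moves one site to the right (left-to-right sweep).
    """
    dl, d1, d2, dr = merged.shape
    u, s, vh = np.linalg.svd(merged.reshape(dl * d1, d2 * dr), full_matrices=False)
    keep = max(1, int(np.sum(s > cutoff * s[0])))  # drop numerically zero singular values
    if max_bond is not None:
        keep = min(keep, max_bond)  # optional truncation of the bond dimension
    a_k = u[:, :keep].reshape(dl, d1, keep)
    a_k1 = (np.diag(s[:keep]) @ vh[:keep, :]).reshape(keep, d2, dr)
    return a_k, a_k1


rng = np.random.default_rng(0)
a, b = rng.normal(size=(3, 2, 4)), rng.normal(size=(4, 2, 3))
m = merge(a, b)  # rank-4 tensor of shape (3, 2, 2, 3)
# ... step 3.2 would modify m here ...
a_new, b_new = split(m)
print(np.allclose(merge(a_new, b_new), m))  # True: only numerically zero values were dropped
```

Without truncation the split is exact; with `max_bond` set, the discarded singular values measure the approximation error introduced at that bond.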

***Generation of Samples***

4. <font color='blue'>Procedure still unclear</font>
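Step 4 is still open in these notes. One option that works for any MPS, assumed here rather than taken from the paper, is exact sequential sampling: draw the first variable from its marginal, then the second conditioned on the first, and so on. Each conditional is obtained by contracting the already fixed sites on the left with a precomputed right environment that sums $|\psi|^2$ over the remaining sites. Function names are hypothetical and real tensors are assumed.

```python
import numpy as np


def random_mps(n_sites, bond_dim, phys_dim=2, seed=0):
    """Random real MPS with tensors of shape (D_left, phys_dim, D_right)."""
    rng = np.random.default_rng(seed)
    dims = [1] + [bond_dim] * (n_sites - 1) + [1]
    return [rng.normal(size=(dims[k], phys_dim, dims[k + 1])) for k in range(n_sites)]


def right_environments(mps):
    """envs[k] sums psi^2 over sites k..N-1; envs[N] = [[1]] and envs[0] = [[Z]]."""
    envs = [np.ones((1, 1))]
    for tensor in reversed(mps):
        envs.append(np.einsum("asc,bsd,cd->ab", tensor, tensor, envs[-1]))
    return envs[::-1]


def sample(mps, rng, envs=None):
    """Draw one configuration v from P(v) = psi(v)^2 / Z, one site at a time."""
    envs = right_environments(mps) if envs is None else envs
    left = np.ones((1, 1))  # product of the matrices already fixed by v_0..v_{k-1}
    config = []
    for k, tensor in enumerate(mps):
        # unnormalized P(v_k = s | v_0..v_{k-1}) = (left A[:, s, :]) envs[k+1] (left A[:, s, :])^T
        cands = [left @ tensor[:, s, :] for s in range(tensor.shape[1])]
        weights = np.array([(c @ envs[k + 1] @ c.T)[0, 0] for c in cands])
        weights = np.clip(weights, 0.0, None)  # guard against tiny negative round-off
        s = int(rng.choice(len(weights), p=weights / weights.sum()))
        config.append(s)
        left = cands[s] / np.linalg.norm(cands[s])  # rescaling does not change the conditionals
    return tuple(config)


mps = random_mps(n_sites=4, bond_dim=3)
rng = np.random.default_rng(1)
envs = right_environments(mps)
print([sample(mps, rng, envs) for _ in range(5)])
```

Because every configuration is drawn directly from $P(v)$, the samples are independent, with no Markov chain involved.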

***Additional Tasks***

5. <font color='blue'>Reconstruction task</font>

6. <font color='blue'>Can we denoise images?</font>

### <font color='blue'>Still Unanswered</font>

##### ***3.2 How does the update work exactly?***
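The notes leave this open. One plausible reading, stated here as an assumption, is plain gradient descent on the NLL with respect to the merged tensor. Writing $\mathcal{L} = \log Z - \frac{1}{|T|}\sum_{v\in T}\log\psi(v)^2$ for a training set $T$, the gradient is
$$\frac{\partial \mathcal{L}}{\partial A^{k,k+1}} = \frac{Z'}{Z} - \frac{2}{|T|}\sum_{v\in T}\frac{\psi'(v)}{\psi(v)},$$
where $'$ denotes the derivative with respect to $A^{k,k+1}$. The sketch below (real tensors, hypothetical function name `nll_and_grad`, arbitrary learning rate) computes $Z'$ with general left/right environments; when the MPS is mixed-canonicalized around the merged tensor those environments are identities and $Z' = 2A^{k,k+1}$.

```python
import numpy as np


def nll_and_grad(mps, k, merged, data):
    """NLL and its gradient w.r.t. the merged tensor A^{k,k+1} (shape (D_l, d, d, D_r)).

    `merged` stands in for mps[k] and mps[k + 1]; real tensors are assumed.
    """
    # environments of psi^2 to the left of site k and to the right of site k + 1
    env_l = np.ones((1, 1))
    for t in mps[:k]:
        env_l = np.einsum("ab,asc,bsd->cd", env_l, t, t)
    env_r = np.ones((1, 1))
    for t in reversed(mps[k + 2:]):
        env_r = np.einsum("asc,bsd,cd->ab", t, t, env_r)
    z = np.einsum("ab,astc,bstd,cd->", env_l, merged, merged, env_r)
    # Z' = dZ/dA; reduces to 2 * merged when the MPS is mixed canonical around (k, k+1)
    dz = 2 * np.einsum("ab,bstd,cd->astc", env_l, merged, env_r)
    loss, grad = np.log(z), dz / z
    for v in data:
        left = np.ones((1, 1))
        for t, s in zip(mps[:k], v[:k]):
            left = left @ t[:, s, :]
        right = np.ones((1, 1))
        for t, s in zip(reversed(mps[k + 2:]), reversed(v[k + 2:])):
            right = t[:, s, :] @ right
        psi = (left @ merged[:, v[k], v[k + 1], :] @ right)[0, 0]
        dpsi = np.zeros_like(merged)  # psi'(v): nonzero only on the slice selected by (v_k, v_{k+1})
        dpsi[:, v[k], v[k + 1], :] = np.outer(left[0], right[:, 0])
        loss -= np.log(psi ** 2) / len(data)
        grad -= 2 * dpsi / (psi * len(data))
    return loss, grad


rng = np.random.default_rng(0)
dims = [1, 2, 2, 2, 1]
mps = [rng.normal(size=(dims[i], 2, dims[i + 1])) for i in range(4)]
data = [(0, 1, 1, 0), (1, 1, 0, 0), (0, 0, 0, 1)]
k, lr = 1, 0.01
merged = np.einsum("asb,btc->astc", mps[k], mps[k + 1])
loss, grad = nll_and_grad(mps, k, merged, data)
updated = merged - lr * grad  # one gradient-descent step on A^{k,k+1}
```

After the step, the updated tensor would be unfolded back into $A^k$ and $A^{k+1}$ through SVD (step 3.3).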


### <font color='green'>Further Details</font>

##### ***1 Do we need to apply some sort of normalization?***
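Probably not for correctness: $P(v)$ divides $|\psi(v)|^2$ by $Z$, so any global rescaling of the network cancels out, and updating the TN changes its normalization anyway. The scale of the initial entries may still matter numerically, though: $Z$ is a contraction of one transfer matrix per site, so it can grow or shrink exponentially with the number of sites. In this sense, the normalization of the Tensor Network may have an impact similar to _weight initialization_ in Neural Networks.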
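A small check of both points, assuming real tensors with i.i.d. Gaussian entries of standard deviation `std` (hypothetical helper names): rescaling one tensor leaves $P(v)$ unchanged, while $\log Z$ of a randomly initialized MPS grows with the number of sites, so $Z$ itself grows exponentially. `log_z` renormalizes the running contraction at every site so that it stays finite.

```python
import numpy as np


def random_mps(n_sites, bond_dim, std=1.0, seed=0):
    """Random real MPS with i.i.d. N(0, std^2) entries and tensors of shape (D_left, 2, D_right)."""
    rng = np.random.default_rng(seed)
    dims = [1] + [bond_dim] * (n_sites - 1) + [1]
    return [std * rng.normal(size=(dims[k], 2, dims[k + 1])) for k in range(n_sites)]


def log_z(mps):
    """log Z, rescaling the environment at each site so the contraction itself cannot overflow."""
    env, log_norm = np.ones((1, 1)), 0.0
    for t in mps:
        env = np.einsum("ab,asc,bsd->cd", env, t, t)
        norm = np.linalg.norm(env)
        env, log_norm = env / norm, log_norm + np.log(norm)
    return log_norm + np.log(env[0, 0])


def log_prob(mps, v):
    """log P(v) = log psi(v)^2 - log Z."""
    m = np.ones((1, 1))
    for t, s in zip(mps, v):
        m = m @ t[:, s, :]
    return np.log(m[0, 0] ** 2) - log_z(mps)


mps = random_mps(n_sites=8, bond_dim=4)
scaled = [t.copy() for t in mps]
scaled[3] *= 10.0  # rescale one tensor: psi -> 10 psi, Z -> 100 Z
v = (0, 1, 1, 0, 1, 0, 0, 1)
print(np.isclose(log_prob(mps, v), log_prob(scaled, v)))  # True: P(v) is unchanged
for n in (10, 50, 100):
    print(n, log_z(random_mps(n, bond_dim=10)))  # grows with n for unit-variance entries
```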

##### ***2 Canonicalization***
See canonicalization.ipynb
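As a self-contained reminder, a minimal sketch of mixed canonicalization by QR sweeps; this is one common way to do it and not necessarily the one used in canonicalization.ipynb. `mixed_canonicalize` is a hypothetical name and tensors are assumed real with shape `(D_left, d, D_right)`. After the sweeps, every tensor left of `center` is left-orthogonal, every tensor right of it is right-orthogonal, and $Z$ equals the squared Frobenius norm of the center tensor.

```python
import numpy as np


def mixed_canonicalize(mps, center):
    """Copy of `mps` in mixed canonical form around site `center`."""
    mps = [t.copy() for t in mps]
    # sweep left -> right: sites 0..center-1 become left-orthogonal
    for k in range(center):
        dl, d, dr = mps[k].shape
        q, r = np.linalg.qr(mps[k].reshape(dl * d, dr))
        mps[k] = q.reshape(dl, d, q.shape[1])
        mps[k + 1] = np.einsum("ab,bsc->asc", r, mps[k + 1])  # push R into the next site
    # sweep right -> left: sites N-1..center+1 become right-orthogonal
    for k in range(len(mps) - 1, center, -1):
        dl, d, dr = mps[k].shape
        q, r = np.linalg.qr(mps[k].reshape(dl, d * dr).T)
        mps[k] = q.T.reshape(q.shape[1], d, dr)
        mps[k - 1] = np.einsum("asb,bc->asc", mps[k - 1], r.T)  # push R^T into the previous site
    return mps


rng = np.random.default_rng(0)
dims = [1, 2, 4, 4, 2, 1]
mps = [rng.normal(size=(dims[i], 2, dims[i + 1])) for i in range(5)]
can = mixed_canonicalize(mps, center=2)
print(np.sum(can[2] ** 2))  # all the norm now sits on the center: this is Z
```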

##### ***3.1 Why should we do the merge?***
(From main paper, last paragraph of page 12)

Most probably, to keep the network canonicalized after each learning update without re-canonicalizing the whole network. The procedure is:
* The MPS is mixed-canonicalized around $A^{k}$;
* Merge tensors $A^{k}$ and $A^{k+1}$ into a rank-4 tensor $A^{k,k+1}$;

<img src="./imgs/algorithm_mixedca.svg">

* Update the components of tensor $A^{k,k+1}$;

* Unfold $A^{k,k+1}$ through SVD, $A^{k,k+1} = U\Lambda V^\dagger$, and set $A^k = U$ (left-orthogonal) and $A^{k+1} = \Lambda V^\dagger$ (which now carries the norm of the network);

* Merge $A^{k+1}$ with $A^{k+2}$: the MPS is already mixed-canonicalized around the new merged tensor, which is the next one to update.
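The procedure above can be sketched as one left-to-right sweep (NumPy, real tensors, no truncation, hypothetical names `right_canonicalize` and `sweep`; the `update` argument stands in for step 3.2). The recorded squared norm of each merged tensor equals the current $Z$ at every bond, which is what makes the scheme convenient: the normalization becomes local to the tensor being updated.

```python
import numpy as np


def right_canonicalize(mps):
    """QR sweep from the right: sites 1..N-1 become right-orthogonal, site 0 carries the norm."""
    mps = [t.copy() for t in mps]
    for k in range(len(mps) - 1, 0, -1):
        dl, d, dr = mps[k].shape
        q, r = np.linalg.qr(mps[k].reshape(dl, d * dr).T)
        mps[k] = q.T.reshape(q.shape[1], d, dr)
        mps[k - 1] = np.einsum("asb,bc->asc", mps[k - 1], r.T)
    return mps


def sweep(mps, update=lambda merged: merged):
    """One left-to-right sweep of merge (3.1) -> update (3.2) -> SVD unfold (3.3)."""
    mps = right_canonicalize(mps)
    merged_norms = []
    for k in range(len(mps) - 1):
        merged = np.einsum("asb,btc->astc", mps[k], mps[k + 1])
        merged_norms.append(np.sum(merged ** 2))  # equals the current Z: the center is on (k, k+1)
        merged = update(merged)
        dl, d1, d2, dr = merged.shape
        u, s, vh = np.linalg.svd(merged.reshape(dl * d1, d2 * dr), full_matrices=False)
        mps[k] = u.reshape(dl, d1, -1)  # left-orthogonal
        mps[k + 1] = (np.diag(s) @ vh).reshape(-1, d2, dr)  # new canonical center
    return mps, merged_norms


rng = np.random.default_rng(0)
dims = [1, 2, 3, 3, 2, 1]
mps = [rng.normal(size=(dims[i], 2, dims[i + 1])) for i in range(5)]
new_mps, norms = sweep(mps)
print(norms)  # the same value (Z) at every bond when the update is the identity
```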

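##### ***3.3 After updating and unfolding, one of the two tensors is probably no longer orthogonal: should we apply canonicalization again?***
Yes, but no separate pass is needed: the SVD unfolding takes care of it. $A^k$ is built from the left singular vectors and is therefore orthogonal by construction, while the non-orthogonal $A^{k+1}$ simply becomes the new center of the mixed canonical form. See <font color='green'>3.1</font>.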

##### ***Minimization of KL-Divergence***
Let:

$P(x|\vartheta^*)$ be the true distribution (the one we want to learn);

$P(x|\vartheta)$ be our estimate.

By definition of the Kullback-Leibler divergence:
$$D_{KL}\left[P(x|\vartheta^*)||P(x|\vartheta)\right] := E_{x\sim P(x|\vartheta^*)}\left[\log\frac{P(x|\vartheta^*)}{P(x|\vartheta)}\right]$$
Applying the properties of logarithms:
$$E_{x\sim P(x|\vartheta^*)}\left[\log\frac{P(x|\vartheta^*)}{P(x|\vartheta)}\right] = E_{x\sim P(x|\vartheta^*)}\left[\log P(x|\vartheta^*)-\log P(x|\vartheta)\right]$$
Applying the linearity of the expected value:
$$E_{x\sim P(x|\vartheta^*)}\left[\log P(x|\vartheta^*)-\log P(x|\vartheta)\right] = E_{x\sim P(x|\vartheta^*)}\left[\log P(x|\vartheta^*)\right] - E_{x\sim P(x|\vartheta^*)}\left[\log P(x|\vartheta)\right]$$
Hence:
$$D_{KL}\left[P(x|\vartheta^*)||P(x|\vartheta)\right] = E_{x\sim P(x|\vartheta^*)}\left[\log P(x|\vartheta^*)\right] - E_{x\sim P(x|\vartheta^*)}\left[\log P(x|\vartheta)\right]$$

Considering just the second term, and estimating the expectation with the $N$ training samples $x_i$, which are drawn from the true distribution:
$$ - E_{x\sim P(x|\vartheta^*)}\left[\log P(x|\vartheta)\right] \approx -\frac{1}{N}\sum_{i=1}^{N}\log P(x_i|\vartheta) = \mathcal{L}(\vartheta) $$

Hence: 
$$D_{KL}\left[P(x|\vartheta^*)||P(x|\vartheta)\right] \approx E_{x\sim P(x|\vartheta^*)}\left[\log P(x|\vartheta^*)\right] + \mathcal{L}(\vartheta)$$

The first term does not depend on $\vartheta$, so minimizing the NLL $\mathcal{L}(\vartheta)$ minimizes the KL divergence.
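A quick numerical check of the decomposition, using two hypothetical discrete distributions over 8 outcomes in place of $P(x|\vartheta^*)$ and $P(x|\vartheta)$: the KL divergence splits into the $\vartheta$-independent term $E[\log P(x|\vartheta^*)]$ plus the cross-entropy $-E[\log P(x|\vartheta)]$, and the NLL on samples from the true distribution approximates the cross-entropy.

```python
import numpy as np

rng = np.random.default_rng(0)
p_true = rng.dirichlet(np.ones(8))   # stand-in for P(x|theta*) over 8 outcomes
p_model = rng.dirichlet(np.ones(8))  # stand-in for P(x|theta)

kl = np.sum(p_true * np.log(p_true / p_model))
neg_entropy = np.sum(p_true * np.log(p_true))      # E_{P*}[log P*]: does not depend on theta
cross_entropy = -np.sum(p_true * np.log(p_model))  # -E_{P*}[log P]
print(np.isclose(kl, neg_entropy + cross_entropy))  # True

# the NLL on samples drawn from P* estimates the cross-entropy term
samples = rng.choice(8, size=200_000, p=p_true)
nll = -np.mean(np.log(p_model[samples]))
print(kl, neg_entropy + nll)  # close to each other; the gap shrinks as the sample size grows
```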


(We can also cite the original [paper](https://projecteuclid.org/journals/annals-of-mathematical-statistics/volume-22/issue-1/On-Information-and-Sufficiency/10.1214/aoms/1177729694.full) by Kullback and Leibler.)