# CPD Options

The cpd function offers several options. Some of them may improve performance or precision, and others give insight into the tensor at hand. In the source code, cpd is defined as follows:

>def cpd(T, r, energy=99.9, maxiter=200, cg_lower=2, cg_upper=10, tol=1e-6, init='smart_random', display='none', full_output=False):

We will go through all of these parameters now. Let's start by importing the necessary modules and creating the same tensor used in the previous notebook.

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import TensorFox as tf

In [2]:
# Create and print the tensor.
T = np.zeros((2,2,2))
for i in range(0,2):
    for j in range(0,2):
        for k in range(0,2):
            T[i,j,k] = i+j+k
            
tf.showtens(T)

[[0. 1.]
 [1. 2.]]

[[1. 2.]
 [2. 3.]]



# The *Display* Option

There are three choices for the *display* option: 'none' (default), 'partial' and 'full'. This option controls what the user sees during the computations. Previously we kept the default and there was no output whatsoever. The 'partial' option shows useful information about the main stages of the computation. The 'full' option shows everything the 'partial' option shows, plus information about each iteration.

In [3]:
# Compute the CPD of T with partial display.
r = 2
Lambda, X, Y, Z = tf.cpd(T, r, display='partial')

-------------------------------------------------------
Computing HOSVD of T
    No compression detected
-------------------------------------------------------
Computing truncation
    No truncation detected
-------------------------------------------------------
Type of initialization: smart random
    Initial guess relative error = 0.11883
-------------------------------------------------------
Computing truncated CPD of T
-------------------------------------------------------
Computing refinement of solution
Final results
    Number of steps = 39
    Relative error = 6.70808778737211e-05
    Accuracy =  99.99 %


In [4]:
# Compute the CPD of T with full display.
Lambda, X, Y, Z = tf.cpd(T, r, display='full')

-------------------------------------------------------
Computing HOSVD of T
    No compression detected
-------------------------------------------------------
Computing truncation
    No truncation detected
-------------------------------------------------------
Type of initialization: smart random
    Initial guess relative error = 0.11883
-------------------------------------------------------
Computing truncated CPD of T
    Iteration | Rel Error |  Damp  | #CG iterations 
        1     |  0.11733  |  2.0  |  3
        2     |  0.11613  |  2.0  |  5
        3     |  0.11407  |  1.0  |  5
        4     |  0.10997  |  0.5  |  6
        5     |  0.09863  |  0.25  |  6
        6     |  0.06893  |  0.125  |  6
        7     |  0.04131  |  0.0625  |  6
        8     |  0.02915  |  0.0625  |  6
        9     |  0.02262  |  0.0312  |  6
        10     |  0.01817  |  0.0156  |  6
        11     |  0.01440  |  0.00781  |  6
        12     |  0.01142  |  0.00391  |  6
        13     |  0.009

# Initialization

The iteration process needs a starting point. This starting point depends on the 'init' option, which has three possible choices: 'smart_random' (default), 'random' and 'fixed'. The 'smart_random' option generates a random CPD of rank $r$ using an original strategy that gives the starting point a small relative error, so it is already close to the objective tensor. The 'random' option generates a CPD of rank $r$ with entries drawn from the normal distribution. The relative error in this case is usually close to $1$. Finally, the 'fixed' option always generates the same CPD for the same $r$ and the same dimensions. This is useful if the user wants to change the code and compare performance.

As the previous outputs show, the initialization used was 'smart_random', and the relative error between this initialization and the objective tensor is $0.11883$, so the initial guess is already fairly close to the objective tensor. From this starting point the CPD was computed in $39$ steps, with a relative error of approximately $6.70809 \cdot 10^{-5}$. Let's see what we get from the other options.

In [5]:
# Compute the CPD of T with random initialization.
Lambda, X, Y, Z = tf.cpd(T, r, init='random', display='partial')

-------------------------------------------------------
Computing HOSVD of T
    No compression detected
-------------------------------------------------------
Computing truncation
    No truncation detected
-------------------------------------------------------
Type of initialization: random
    Initial guess relative error = 1.1368
-------------------------------------------------------
Computing truncated CPD of T
-------------------------------------------------------
Computing refinement of solution
Final results
    Number of steps = 39
    Relative error = 7.685957683455833e-05
    Accuracy =  99.99 %


In [6]:
# Compute the CPD of T with fixed initialization.
Lambda, X, Y, Z = tf.cpd(T, r, init='fixed', display='partial')

-------------------------------------------------------
Computing HOSVD of T
    No compression detected
-------------------------------------------------------
Computing truncation
    No truncation detected
-------------------------------------------------------
Type of initialization: fixed
    Initial guess relative error = 0.9998
-------------------------------------------------------
Computing truncated CPD of T
-------------------------------------------------------
Computing refinement of solution
Final results
    Number of steps = 28
    Relative error = 0.14631275295795318
    Accuracy =  85.37 %
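
The 'random' and 'fixed' options described above can be imitated with NumPy alone. The following is only a sketch under stated assumptions: the helper names `random_factors` and `cpd_to_tensor` are hypothetical, the factors are taken to be matrices with $r$ columns, and using a fixed seed is just one possible way to obtain a reproducible starting point. It is not TensorFox's actual implementation.

```python
import numpy as np

def random_factors(dims, r, seed=None):
    """Hypothetical helper: one factor matrix per mode, standard normal entries."""
    rng = np.random.default_rng(seed)
    return [rng.standard_normal((d, r)) for d in dims]

def cpd_to_tensor(X, Y, Z):
    """Sum of r rank-one terms built from the columns of X, Y and Z."""
    return np.einsum('il,jl,kl->ijk', X, Y, Z)

# Same tensor as in the notebook: T[i,j,k] = i + j + k.
T = np.fromfunction(lambda i, j, k: i + j + k, (2, 2, 2))
r = 2

# 'random'-like start: a different initial guess on every call.
X, Y, Z = random_factors(T.shape, r)
err_random = np.linalg.norm(T - cpd_to_tensor(X, Y, Z)) / np.linalg.norm(T)
print('random start, relative error =', err_random)

# 'fixed'-like start: the same seed gives the same factors every time.
first = random_factors(T.shape, r, seed=0)
second = random_factors(T.shape, r, seed=0)
print('fixed start is reproducible:',
      all(np.array_equal(a, b) for a, b in zip(first, second)))
```

A reproducible start makes it possible to attribute a change in the number of steps or in the final error to a change in the code rather than to a different initial guess.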


# *Maxiter* and *Tol*

As the names suggest, 'maxiter' is the maximum number of iterations permitted, while 'tol' is the tolerance parameter, which gives a criterion to stop iterating. The two values are related in the sense that we should increase 'maxiter' when we decrease 'tol'. In this little example it might not matter, but for larger tensors we may want to increase precision by decreasing 'tol'. The algorithm may then reach the maximum number of iterations before meeting the tolerance, so we should increase 'maxiter' to let it keep iterating.
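
The interplay between the two parameters can be seen in a toy iteration that has nothing to do with tensors. This is a minimal sketch: the function `iterate` is hypothetical, and stopping when the step size falls below `tol` is an assumed criterion chosen for illustration, not necessarily the one TensorFox uses.

```python
def iterate(x0, tol=1e-6, maxiter=200):
    """Toy fixed-point iteration x <- 0.8 * x, which converges to 0.

    Returns the last iterate, the number of iterations performed and a
    flag telling whether the tolerance was met before the cap.
    """
    x = x0
    for it in range(1, maxiter + 1):
        x_new = 0.8 * x
        step = abs(x_new - x)
        x = x_new
        if step < tol:
            return x, it, True      # stopped by the tolerance
    return x, maxiter, False        # stopped by the iteration cap

results = {}
for tol, maxiter in [(1e-6, 60), (1e-10, 60), (1e-10, 200)]:
    results[(tol, maxiter)] = iterate(1.0, tol=tol, maxiter=maxiter)
    x, its, converged = results[(tol, maxiter)]
    print(f'tol={tol:g}, maxiter={maxiter}: {its} iterations, converged={converged}')
```

With the smaller tolerance and the smaller cap, the loop runs out of iterations before the tolerance is met. Raising the cap lets it finish.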

Let's decrease 'tol' and see if we get a better approximation for the CPD. We will use 'tol' = 1e-10 with the default initialization and partial display.

In [7]:
# Compute the CPD of T with tol = 1e-10.
Lambda, X, Y, Z = tf.cpd(T, r, tol=1e-10, display='partial')

-------------------------------------------------------
Computing HOSVD of T
    No compression detected
-------------------------------------------------------
Computing truncation
    No truncation detected
-------------------------------------------------------
Type of initialization: smart random
    Initial guess relative error = 0.11883
-------------------------------------------------------
Computing truncated CPD of T
-------------------------------------------------------
Computing refinement of solution
Final results
    Number of steps = 39
    Relative error = 6.708088336228233e-05
    Accuracy =  99.99 %


The relative error barely changed. The previous error was about $6.7080878 \cdot 10^{-5}$ and this one is about $6.7080883 \cdot 10^{-5}$, so the two agree to six significant digits and the tighter tolerance brought no real improvement. This indicates that the default tolerance is already good enough for this problem.

Sometimes the tolerance parameter may not behave as expected. Decreasing it makes the algorithm perform more iterations, but it can also make the algorithm follow a different path in the space of tensors. This path is sometimes worse, in which case the user may get worse results. This is just bad luck, and when it happens we can increase 'maxiter' or simply repeat the computation (which generates another initialization, maybe a better one).

# Energy

Consider a matrix $A \in \mathbb{R}^{m \times n}$ (assume $m \geq n$) and its reduced SVD 

$$A = U \Sigma V^T = [U_1 \ldots U_n] \cdot \text{diag}(\sigma_1, \ldots, \sigma_n) \cdot [V_1 \ldots V_n]^T.$$ 

It is common to truncate $\Sigma$ in order to obtain the *truncated SVD* of $A$ given by 

$$\tilde{A} = [U_1 \ldots U_p] \cdot \text{diag}(\sigma_1, \ldots, \sigma_p) \cdot [V_1 \ldots V_p]^T,$$
where $p < n$.

There are several applications of this procedure that we won't discuss here. We just want to mention that the sum $\sigma_1^2 + \ldots + \sigma_p^2$ is called the *energy* of $\tilde{A}$. The more energy the truncation has, the closer it is to $A$. On the other hand, less energy means more truncation, which means fewer dimensions to take into account, and this translates into less computational time. As you can see, there is a trade-off between proximity and dimensionality. We want to truncate as much as possible while keeping the truncation close enough to $A$. 
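
The trade-off can be checked numerically on a small matrix. The sketch below uses only NumPy. The matrix is random and purely illustrative, and the helper name `truncate` is hypothetical. It relies on the standard fact that, in the Frobenius norm, the squared error of the truncated SVD equals the energy that was discarded.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))          # illustrative matrix with m >= n

# Reduced SVD: U is 6x4, s holds the 4 singular values, Vt is 4x4.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

def truncate(p):
    """Truncated SVD keeping the p largest singular values."""
    return U[:, :p] @ np.diag(s[:p]) @ Vt[:p, :]

total_energy = np.sum(s**2)              # equals the squared Frobenius norm of A
energies, rel_errors = [], []
for p in range(1, len(s) + 1):
    energies.append(np.sum(s[:p]**2))
    rel_errors.append(np.linalg.norm(A - truncate(p)) / np.linalg.norm(A))
    print(f'p={p}: energy = {100 * energies[-1] / total_energy:.2f} %, '
          f'relative error = {rel_errors[-1]:.4f}')
```

Each extra singular value raises the retained energy and lowers the relative error, at the cost of one more dimension to carry.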

With the 'energy' parameter the user can impose the minimum energy permitted at the truncation stage. For example, if we set 'energy' = $95$, the program will search for the truncation with the lowest energy above $95$. In this context the value means $95 \%$, i.e., the truncation retains at least $95 \%$ of the energy of the original tensor. 

This is valid whenever $1 \leq $ 'energy' $ < 100$. If the user chooses a value between $0$ and $1$ (both exclusive), then the program uses another strategy: it searches for a truncation with more than half of the original dimensions and with relative error smaller than 'energy'. This procedure starts with the smallest tensors, i.e., the program tries to truncate as much as it can, provided that the relative error is smaller than 'energy'. The default is 'energy' = $99.9$.

There is also an advanced use for 'energy'. If the user knows which truncation to use, it can be passed as a list of three numbers, the dimensions of the truncation.

Since the example shown here is too simple, truncating it is just not possible. In the next sections we will see a problem that needs to be truncated in order to be computed in a reasonable time.
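
For the percentage case ($1 \leq $ 'energy' $ < 100$), the search can be pictured on the singular values of a single matrix. This is only a sketch of the idea: the function name `smallest_rank_for_energy` and the sample singular values are hypothetical, and the actual program works on the tensor rather than on one matrix.

```python
import numpy as np

def smallest_rank_for_energy(singular_values, energy_percent):
    """Smallest p whose truncation keeps at least energy_percent of the energy."""
    s = np.asarray(singular_values, dtype=float)
    cumulative = np.cumsum(s**2)                  # energy kept for p = 1, 2, ...
    percent = 100 * cumulative / cumulative[-1]   # same, as a percentage of the total
    # argmax returns the first position where the condition holds.
    return int(np.argmax(percent >= energy_percent)) + 1

s = [10.0, 5.0, 1.0, 0.1]                         # illustrative singular values
for energy in (50, 95, 99.9):
    print(f'energy = {energy}: keep p = {smallest_rank_for_energy(s, energy)}')
```

A higher 'energy' can only keep the same number of dimensions or more, which is the proximity versus dimensionality trade-off in action.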

# Full Output

Apart from the CPD terms $\Lambda, X, Y, Z$, there is more information we can use from the computations performed by the function *cpd*. By default the option 'full_output' is False, but the user may set it to True to access the following information:


-  **T_approx** is the approximated tensor in coordinate format, as a multidimensional matrix $m \times n \times p$.


-  **rel_error** $\displaystyle = \frac{\|T - T_{approx}\|}{\|T\|}$


- **step_sizes_trunc** is the array with the sizes of the steps taken at each iteration of the function *dGN* at the truncation stage.


- **step_sizes_refine** is the array with the sizes of the steps taken at each iteration of the function *dGN* at the refinement stage.


- **errors_trunc** is the array with the absolute errors obtained at each iteration of the function *dGN* at the truncation stage.


- **errors_refine** is the array with the absolute errors obtained at each iteration of the function *dGN* at the refinement stage.


- **stop_trunc** is a number indicating what stopping condition made the *dGN* function stop iterating at the truncation stage.


- **stop_refine** is a number indicating what stopping condition made the *dGN* function stop iterating at the refinement stage.
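
To make the first two items of the list concrete, here is a sketch of how a tensor in coordinate format and its relative error can be obtained from CPD terms with NumPy. Everything in it is an assumption made for illustration: the factors are built by hand (they are not the output of *cpd*), they are taken to be matrices whose columns are the rank-one components, and three terms are used so that the reconstruction is exact.

```python
import numpy as np

# Same tensor as in the notebook: T[i,j,k] = i + j + k.
T = np.fromfunction(lambda i, j, k: i + j + k, (2, 2, 2))

ones = np.array([1.0, 1.0])
e = np.array([0.0, 1.0])        # e[i] = i for i = 0, 1

# Hand-built terms: T = e x 1 x 1 + 1 x e x 1 + 1 x 1 x e.
Lambda = np.ones(3)
X = np.column_stack([e, ones, ones])
Y = np.column_stack([ones, e, ones])
Z = np.column_stack([ones, ones, e])

# Coordinate format: sum over l of Lambda[l] * X[i,l] * Y[j,l] * Z[k,l].
T_approx = np.einsum('l,il,jl,kl->ijk', Lambda, X, Y, Z)

# Relative error with the Frobenius norm, as in the formula for rel_error.
rel_error = np.linalg.norm(T - T_approx) / np.linalg.norm(T)
print('relative error =', rel_error)
```

The same two lines that compute `T_approx` and `rel_error` apply to any factors with matching shapes, so they can serve as an independent check of the values reported by the program.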

# cg_lower and cg_upper

These parameters deal with a more mathematical piece of the algorithm. The number of iterations at each conjugate gradient call can affect the overall performance by a considerable factor. Therefore it is important to find a balance that controls the number of iterations without losing efficiency. Let $cg\_maxiter(i)$ be the maximum number of iterations permitted for the conjugate gradient at the $i$-th Gauss-Newton iteration. After several tests, it was observed that it is very efficient to take $cg\_maxiter(i)$ as a random integer in the interval

$$\left[ 2 + \lceil\sqrt{i}\rceil, \ \ 10 + i \right].$$

The numbers $2$ and $10$ may be changed by the user. More generally, we are considering the interval

$$\left[ cg\_lower + \lceil\sqrt{i}\rceil, \ \ cg\_upper + i \right].$$

The values $cg\_lower$ and $cg\_upper$ by default are $2$ and $10$, respectively.
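
The rule above can be sketched in a few lines. The function name `sample_cg_maxiter` is hypothetical and a uniform draw over the interval is an assumption, since the text only says the integer is random. The Gauss-Newton iterations are counted from $i = 1$.

```python
import math
import numpy as np

def sample_cg_maxiter(i, cg_lower=2, cg_upper=10, rng=None):
    """Random integer in [cg_lower + ceil(sqrt(i)), cg_upper + i], both ends included."""
    rng = np.random.default_rng() if rng is None else rng
    low = cg_lower + math.ceil(math.sqrt(i))
    high = cg_upper + i
    # Generator.integers excludes the upper end, hence the + 1.
    return int(rng.integers(low, high + 1))

rng = np.random.default_rng(0)
for i in (1, 2, 5, 10, 50):
    low, high = 2 + math.ceil(math.sqrt(i)), 10 + i
    print(f'i={i}: interval [{low}, {high}], sampled {sample_cg_maxiter(i, rng=rng)}')
```

The lower end grows like $\sqrt{i}$ and the upper end grows like $i$, so later Gauss-Newton iterations are allowed more conjugate gradient iterations on average.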