# CPD Options

The cpd function has several options. Some of them can improve performance or precision, and others give insight into the tensor at hand. In the source code, cpd is defined this way:

>def cpd(T, r, energy=0.05, maxiter=200, tol=1e-4, init='smart_random', display='none'):

We will go through these parameters now. Let's start by importing the necessary modules and creating the same tensor as in the previous notebook.

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import TensorFox as tf

In [2]:
# Create and print the tensor.
T = np.zeros((2,2,2))
for i in range(0,2):
    for j in range(0,2):
        for k in range(0,2):
            T[i,j,k] = i+j+k
            
tf.showtens(T)

[[0. 1.]
 [1. 2.]]

[[1. 2.]
 [2. 3.]]



# The *Display* Option

There are three choices for the *display* option: 'none' (default), 'partial' and 'full'. These options control what the user sees during the computations. Previously we left the default option, so there was no output whatsoever. The 'partial' option shows useful information about the principal stages of the computation. The 'full' option shows everything the 'partial' option shows, plus information about each iteration.

In [3]:
# Compute the CPD of T with partial display.
r = 2
Lambda, X, Y, Z, T_approx, rel_err, step_sizes_trunc, step_sizes_ref, errors_trunc, errors_ref = tf.cpd(T, r, display='partial')

------------------------------------------------------------------------------
Starting computation of the HOSVD of T.
------------------------------------------------------------------------------
No compression detected.
------------------------------------------------------------------------------
Starting truncation.
No truncation detected.
------------------------------------------------------------------------------
Initialization: smart random
Relative error of initial guess = 0.11882705024255454
------------------------------------------------------------------------------
Starting damped Gauss-Newton method.
------------------------------------------------------------------------------
Starting refinement.
------------------------------------------------------------------------------
Number of steps = 46
Final Relative error = 2.2288621390821184e-05


In [4]:
# Compute the CPD of T with full display.
Lambda, X, Y, Z, T_approx, rel_err, step_sizes_trunc, step_sizes_ref, errors_trunc, errors_ref = tf.cpd(T, r, display='full')

------------------------------------------------------------------------------
Starting computation of the HOSVD of T.
------------------------------------------------------------------------------
No compression detected.
------------------------------------------------------------------------------
Starting truncation.
No truncation detected.
------------------------------------------------------------------------------
Initialization: smart random
Relative error of initial guess = 0.11882705024255454
------------------------------------------------------------------------------
Starting damped Gauss-Newton method.
Iteration | Step Size | Rel Error | Line Search
    1     |  3.35168  |  0.11176  |  Success
    2     |  0.05219  |  0.10959  |  Success
    3     |  0.07330  |  0.10702  |  Success
    4     |  0.41478  |  0.08993  |  Success
    5     |  0.45392  |  0.04825  |  Success
    6     |  0.61860  |  0.05210  |  Fail
    7     |  0.12039  |  0.01372  |  Success
    8     |  0.

# Initialization

The iteration process needs a starting point. This starting point depends on the 'init' option, which has three possible choices: 'smart_random' (default), 'random' and 'fixed'. The 'smart_random' option generates a random CPD of rank $r$ with an original strategy that gives the starting point a small relative error, so it is already close to the objective tensor. The 'random' option generates a CPD of rank $r$ with entries drawn from the normal distribution. The relative error in this case is usually close to $1$. Finally, the 'fixed' option always generates the same CPD for the same $r$ and the same dimensions. This is useful if the user wants to change the code and compare performance.
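
To make the notion of "relative error of the initial guess" concrete, here is a minimal NumPy sketch. It assumes the relative error is the Frobenius norm of the difference divided by the Frobenius norm of $T$, which may differ in detail from what TensorFox computes internally. The helper names `cpd_to_tensor` and `relative_error` are hypothetical and are not part of TensorFox.

```python
import numpy as np

def cpd_to_tensor(Lambda, X, Y, Z):
    """Rebuild a third-order tensor from weights and factor matrices.

    Entry (i, j, k) is sum_l Lambda[l] * X[i, l] * Y[j, l] * Z[k, l].
    """
    return np.einsum('l,il,jl,kl->ijk', Lambda, X, Y, Z)

def relative_error(T, T_approx):
    """Frobenius-norm relative error (assumed definition)."""
    return np.linalg.norm(T - T_approx) / np.linalg.norm(T)

# Same tensor as above: T[i, j, k] = i + j + k.
i, j, k = np.indices((2, 2, 2))
T = (i + j + k).astype(float)

# A rank-2 starting point with normally distributed entries,
# in the spirit of the 'random' option.
r = 2
rng = np.random.default_rng(0)
Lambda = np.ones(r)
X, Y, Z = (rng.standard_normal((2, r)) for _ in range(3))

T0 = cpd_to_tensor(Lambda, X, Y, Z)
initial_error = relative_error(T, T0)
print("Relative error of this random guess:", initial_error)
```

The value printed depends on the random draw, so it should not be compared digit by digit with the outputs in this notebook.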

As we can see in the previous outputs, the initialization used was 'smart_random', and the relative error of the initial guess was approximately $0.1188$. From there we reached a CPD in $46$ steps, with a relative error of approximately $2.2288 \cdot 10^{-5}$. Let's see what we get from the other options.

In [5]:
# Compute the CPD of T with random initialization.
Lambda, X, Y, Z, T_approx, rel_err, step_sizes_trunc, step_sizes_ref, errors_trunc, errors_ref = tf.cpd(T, r, init='random', display='partial')

------------------------------------------------------------------------------
Starting computation of the HOSVD of T.
------------------------------------------------------------------------------
No compression detected.
------------------------------------------------------------------------------
Starting truncation.
No truncation detected.
------------------------------------------------------------------------------
Initialization: random
Relative error of initial guess = 1.098724120998715
------------------------------------------------------------------------------
Starting damped Gauss-Newton method.
------------------------------------------------------------------------------
Starting refinement.
------------------------------------------------------------------------------
Number of steps = 44
Final Relative error = 2.0570294223419894e-05


In [6]:
# Compute the CPD of T with fixed initialization.
Lambda, X, Y, Z, T_approx, rel_err, step_sizes_trunc, step_sizes_ref, errors_trunc, errors_ref = tf.cpd(T, r, init='fixed', display='partial')

------------------------------------------------------------------------------
Starting computation of the HOSVD of T.
------------------------------------------------------------------------------
No compression detected.
------------------------------------------------------------------------------
Starting truncation.
No truncation detected.
------------------------------------------------------------------------------
Initialization: fixed
Relative error of initial guess = 1.000170243379101
------------------------------------------------------------------------------
Starting damped Gauss-Newton method.
------------------------------------------------------------------------------
Starting refinement.
------------------------------------------------------------------------------
Number of steps = 44
Final Relative error = 2.4940541716982312e-05


# *Maxiter* and *Tol*

As the names suggest, 'maxiter' is the maximum number of iterations permitted, while 'tol' is the tolerance parameter, which provides the stopping criterion. The two values are related in the sense that we should increase 'maxiter' when we decrease 'tol'. In this little example it might not matter, but for larger tensors we may want to increase precision by decreasing 'tol'. In that case the algorithm can hit the maximum number of iterations before reaching the tolerance, so we should increase 'maxiter' to let it keep iterating.

Let's decrease 'tol' and see if we get better approximations for the CPD. We will use 'tol' = 1e-10 with the default initialization and partial display.
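
The interplay between the two parameters can be seen in a generic iterative loop. The sketch below is not TensorFox's stopping rule, which the text does not spell out. It is a toy fixed-point iteration ($x \leftarrow \cos x$) that stops when the update is smaller than `tol` or when `maxiter` is reached. The function `iterate` is a hypothetical helper.

```python
import math

def iterate(step, x0, maxiter=200, tol=1e-4):
    """Apply x <- step(x) until |x_new - x| < tol or maxiter is reached.

    Returns the last iterate, the number of iterations performed, and a flag
    telling whether the tolerance was met.
    """
    x = x0
    for it in range(1, maxiter + 1):
        x_new = step(x)
        if abs(x_new - x) < tol:
            return x_new, it, True
        x = x_new
    return x, maxiter, False

# Loose tolerance: stops well before maxiter.
loose = iterate(math.cos, 1.0, maxiter=200, tol=1e-4)

# Tight tolerance with a small iteration budget: the budget runs out first.
tight_capped = iterate(math.cos, 1.0, maxiter=20, tol=1e-10)

# Same tight tolerance with a larger budget: the tolerance is met.
tight = iterate(math.cos, 1.0, maxiter=200, tol=1e-10)

for name, (x, n, ok) in [("loose", loose), ("tight, capped", tight_capped), ("tight", tight)]:
    print(f"{name}: x = {x:.10f}, iterations = {n}, converged = {ok}")
```

The second call illustrates why a smaller 'tol' usually calls for a larger 'maxiter': the loop ends because the budget is exhausted, not because the tolerance was reached.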

In [7]:
# Compute the CPD of T with tol = 1e-10.
Lambda, X, Y, Z, T_approx, rel_err, step_sizes_trunc, step_sizes_ref, errors_trunc, errors_ref = tf.cpd(T, r, tol=1e-10, display='partial')

------------------------------------------------------------------------------
Starting computation of the HOSVD of T.
------------------------------------------------------------------------------
No compression detected.
------------------------------------------------------------------------------
Starting truncation.
No truncation detected.
------------------------------------------------------------------------------
Initialization: smart random
Relative error of initial guess = 0.11882705024255454
------------------------------------------------------------------------------
Starting damped Gauss-Newton method.
------------------------------------------------------------------------------
Starting refinement.
------------------------------------------------------------------------------
Number of steps = 52
Final Relative error = 1.3761033160356665e-05


With just six more iterations we reduced the relative error by more than a third. Remember that the previous error was $2.2288 \cdot 10^{-5}$.

Sometimes the tolerance parameter may not behave as expected. Decreasing this value makes the algorithm perform more iterations, but also makes it follow a different path in the space of tensors. This path can sometimes be worse, in which case the user may get worse results. This is just bad luck, and when it happens we can increase 'maxiter' or simply repeat the computation (which will generate another initialization, maybe a better one).

# Energy

Consider a matrix $A \in \mathbb{R}^{m \times n}$ with $m \geq n$ and its reduced SVD 

$$A = U \Sigma V^T = [U_1 \ldots U_n] \cdot \text{diag}(\sigma_1, \ldots, \sigma_n) \cdot [V_1 \ldots V_n]^T.$$ 

It is common to truncate $\Sigma$ in order to obtain the *truncated SVD* of $A$ given by 

$$\tilde{A} = [U_1 \ldots U_p] \cdot \text{diag}(\sigma_1, \ldots, \sigma_p) \cdot [V_1 \ldots V_p]^T,$$
where $p < n$.

This procedure has several applications that we won't discuss here. We just want to mention that the sum $\sigma_1^2 + \ldots + \sigma_p^2$ is called the *energy* of $\tilde{A}$. The more energy the truncation has, the closer it is to $A$. On the other hand, less energy means more truncation, which means fewer dimensions to take into account, and this translates to less computational time. As you can see, there is a trade-off between proximity and dimensionality. We want to truncate as much as possible while keeping the truncation close enough to $A$. 
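
The trade-off can be checked numerically for a matrix. The sketch below uses only NumPy on an arbitrary random matrix (an assumption made for illustration, not data from this notebook). For each $p$ it prints the fraction of energy retained and the relative error of the truncated SVD in the Frobenius norm. The two are tied by the identity $\|A - \tilde{A}\|_F^2 / \|A\|_F^2 = 1 - (\sigma_1^2 + \ldots + \sigma_p^2)/(\sigma_1^2 + \ldots + \sigma_n^2)$.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))  # arbitrary example matrix, m >= n

# Reduced SVD: U is 6x4, s holds the 4 singular values, Vt is 4x4.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Fraction of the total energy retained by the first p singular values.
energy_fraction = np.cumsum(s**2) / np.sum(s**2)

errors = []
for p in range(1, len(s) + 1):
    # Truncated SVD keeping the p largest singular values.
    A_p = (U[:, :p] * s[:p]) @ Vt[:p, :]
    err = np.linalg.norm(A - A_p) / np.linalg.norm(A)
    errors.append(err)
    print(f"p = {p}: energy retained = {energy_fraction[p - 1]:.4f}, relative error = {err:.4f}")
```

Each extra singular value raises the retained energy and lowers the error, at the cost of one more dimension to carry through the computation.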

With the 'energy' parameter the user can impose the least energy permitted at the truncation stage. For example, if we set 'energy' = $95$, the program will search for the truncation with the lowest energy bigger than $95$. In this context it means $95 \%$, i.e., the truncation retains at least $95 \%$ of the energy of the original tensor. 

This is valid whenever $1 \leq $ 'energy' $ < 100$. If the user chooses a value between $0$ and $1$ (both exclusive), then the program uses another strategy. It searches for a truncation with more than half of the original dimensions and with relative error smaller than 'energy'. The search starts with the smallest tensors, i.e., the program truncates as much as it can, provided that the relative error stays smaller than 'energy'. This second approach performed better in practice, so the default is 'energy' = $0.05$.
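
The two modes can be imitated for a matrix, using its singular values. The function `choose_rank` below is a hypothetical helper written for illustration only. It is a matrix analogue of the rules described above, not TensorFox's implementation, which works on the three dimensions of the tensor.

```python
import numpy as np

def choose_rank(s, energy):
    """Pick a truncation size p from singular values s (hypothetical helper)."""
    s = np.asarray(s, dtype=float)
    n = len(s)
    # Fraction of the energy retained when keeping the first p singular values.
    fraction = np.cumsum(s**2) / np.sum(s**2)

    if 1 <= energy < 100:
        # Percentage mode: smallest p retaining more than `energy` percent.
        for p in range(1, n + 1):
            if 100 * fraction[p - 1] > energy:
                return p
    elif 0 < energy < 1:
        # Error mode: keep more than half of the dimensions and return the
        # smallest p whose relative error is below `energy`.
        for p in range(n // 2 + 1, n + 1):
            rel_err = np.sqrt(max(0.0, 1 - fraction[p - 1]))
            if rel_err < energy:
                return p
    else:
        raise ValueError("energy must satisfy 0 < energy < 100")
    # No truncation satisfies the request: keep everything.
    return n

s = [10.0, 5.0, 1.0, 0.1]
print(choose_rank(s, 95))     # percentage mode
print(choose_rank(s, 0.05))   # error mode with the default-like value
print(choose_rank(s, 0.001))  # error mode with a stricter bound
```

With these singular values, two of them already carry more than $95 \%$ of the energy, while the error mode never goes below three because of the "more than half of the dimensions" constraint.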

There is also an advanced use for 'energy'. If the user already knows which truncation to use, it can be passed as a list with three numbers, the dimensions of the truncation.

Since the example shown here is too simple, truncating it is just not possible. In the next sections we will see a problem that needs to be truncated in order to be computed in a reasonable time.