# Alternating least squares for Canonical Polyadic (CP) Decomposition

```
Copyright 2022 National Technology & Engineering Solutions of Sandia,
LLC (NTESS). Under the terms of Contract DE-NA0003525 with NTESS, the
U.S. Government retains certain rights in this software.
```

The function `cp_als` computes an estimate of the best rank-R CP model of a tensor X using the well-known alternating least-squares algorithm (see, e.g., Kolda and Bader, SIAM Review, 2009, for more information). The input X can be almost any type of tensor including a `tensor`, `sptensor`, `ktensor`, or `ttensor`. The output CP model is a `ktensor`.

In [119]:
import os
import sys
import pyttb as ttb
import numpy as np

## Load some data

### TODO: Choose whether to include sample datasets from MATLAB TTB
```
We use the well-known *amino acids data set* from Andersson and Bro. It contains fluorescence measurements of 5 samples containing 3 amino acids: Tryptophan, Tyrosine, and Phenylalanine. Each amino acid corresponds to a rank-one component. The tensor is of size 5 x 51 x 201 from 5 samples, 51 excitations, and 201 emissions. Further details can be found here: http://www.models.life.ku.dk/Amino_Acid_fluo. Please cite the following paper for this data: Rasmus Bro, PARAFAC: Tutorial and applications, Chemometrics and Intelligent Laboratory Systems, 1997, 38, 149-171. This dataset can be found in the `doc` directory.

We will just use random data for now.
```

In [120]:
# Pick the size and rank
R = 3
np.random.seed(0)  # Set seed for reproducibility
X = ttb.tensor(np.random.rand(6,8,10), shape=(6,8,10))

## Basic call to the method, specifying the data tensor and its rank

This uses a *random* initial guess. At each iteration, it reports the *fit* `f`, which is defined as
```
f = 1 - sqrt( X.norm()**2 + M.norm()**2 - 2*<X,M> ) / X.norm()
```
and is loosely the proportion of the data described by the CP model, i.e., a fit of 1 is perfect.
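
To make the fit concrete, here is a minimal NumPy sketch that evaluates it as the relative residual `1 - ||X - M|| / ||X||` for a dense 3-way array. The helpers `cp_full` and `fit` are illustrative names, not part of pyttb, and the sketch assumes the CP model is given as plain weights and factor matrices.

```python
import numpy as np

def cp_full(weights, factors):
    """Assemble a dense 3-way array from CP weights and factor matrices."""
    A, B, C = factors
    return np.einsum("r,ir,jr,kr->ijk", weights, A, B, C)

def fit(X, M):
    """Relative-residual fit: 1 - ||X - M|| / ||X|| (Frobenius norms)."""
    norm_x = np.linalg.norm(X)
    inner = np.sum(X * M)  # inner product <X, M>
    # ||X - M||^2 in expanded form; clip tiny negatives caused by round-off.
    resid_sq = max(norm_x**2 + np.linalg.norm(M) ** 2 - 2.0 * inner, 0.0)
    return 1.0 - np.sqrt(resid_sq) / norm_x

rng = np.random.default_rng(0)
factors = [rng.random((n, 3)) for n in (6, 8, 10)]
M = cp_full(np.ones(3), factors)
noisy = M + 0.1 * rng.standard_normal(M.shape)

fit_exact = fit(M, M)      # model reproduces the data, so the fit is (numerically) 1
fit_noisy = fit(noisy, M)  # noise lowers the fit below 1
direct = 1.0 - np.linalg.norm(noisy - M) / np.linalg.norm(noisy)
```

The expanded form avoids building `X - M` explicitly, which matters when `X` is sparse or `M` is kept in factored form; `direct` is included only as a cross-check.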

In [121]:
# Compute a solution with final ktensor stored in M1
np.random.seed(0) # Set seed for reproducibility
short_tutorial = 10  # Cut off solve early for demo
M1 = ttb.cp_als(X, R, maxiters=short_tutorial)

CP_ALS:
 Iter 0: f = 5.184789e-01 f-delta = 5.2e-01
 Iter 1: f = 5.291340e-01 f-delta = 1.1e-02
 Iter 2: f = 5.314778e-01 f-delta = 2.3e-03
 Iter 3: f = 5.322637e-01 f-delta = 7.9e-04
 Iter 4: f = 5.329675e-01 f-delta = 7.0e-04
 Iter 5: f = 5.337616e-01 f-delta = 7.9e-04
 Iter 6: f = 5.346022e-01 f-delta = 8.4e-04
 Iter 7: f = 5.354130e-01 f-delta = 8.1e-04
 Iter 8: f = 5.361564e-01 f-delta = 7.4e-04
 Iter 9: f = 5.368456e-01 f-delta = 6.9e-04
 Final f = 5.368456e-01


Because the result was assigned to a single variable, `M1` is a *tuple* containing:
1. `M1[0]`: the solution as a `ktensor`. 
2. `M1[1]`: the initial guess as a `ktensor` that was generated at runtime since no initial guess was provided. 
3. `M1[2]`: a dictionary containing runtime information with keys:
    * `params`: parameters used by `cp_als`
    * `iters`: number of iterations performed
    * `normresidual`: the norm of the residual, `sqrt( X.norm()**2 + M.norm()**2 - 2*<X,M> )`
    * `fit`: the fit `f` described above

In [122]:
print(f"M1[2]['params']: {M1[2]['params']}")
print(f"M1[2]['iters']: {M1[2]['iters']}")
print(f"M1[2]['normresidual']: {M1[2]['normresidual']}")
print(f"M1[2]['fit']: {M1[2]['fit']}")

M1[2]['params']: (0.0001, 10, 1, [0, 1, 2])
M1[2]['iters']: 9
M1[2]['normresidual']: 5.866999501964479
M1[2]['fit']: 0.5368456335608366


## Run again with a different initial guess, output the initial guess.

In [123]:
np.random.seed(1) # Set seed for reproducibility
M2bad, Minit, _ = ttb.cp_als(X, R, maxiters=short_tutorial)

CP_ALS:
 Iter 0: f = 5.177965e-01 f-delta = 5.2e-01
 Iter 1: f = 5.349302e-01 f-delta = 1.7e-02
 Iter 2: f = 5.377023e-01 f-delta = 2.8e-03
 Iter 3: f = 5.384685e-01 f-delta = 7.7e-04
 Iter 4: f = 5.391323e-01 f-delta = 6.6e-04
 Iter 5: f = 5.398404e-01 f-delta = 7.1e-04
 Iter 6: f = 5.406120e-01 f-delta = 7.7e-04
 Iter 7: f = 5.414577e-01 f-delta = 8.5e-04
 Iter 8: f = 5.424087e-01 f-delta = 9.5e-04
 Iter 9: f = 5.434735e-01 f-delta = 1.1e-03
 Final f = 5.434735e-01


## Increase the maximum number of iterations
Note that the previous run stopped after only 10 iterations, before reaching the specified convergence tolerance. Let's increase the maximum number of iterations and try again, using the same initial guess.

In [124]:
less_short_tutorial = 10*short_tutorial
M2 = ttb.cp_als(X, R, maxiters=less_short_tutorial, init=Minit)

CP_ALS:
 Iter 0: f = 5.177965e-01 f-delta = 5.2e-01
 Iter 1: f = 5.349302e-01 f-delta = 1.7e-02
 Iter 2: f = 5.377023e-01 f-delta = 2.8e-03
 Iter 3: f = 5.384685e-01 f-delta = 7.7e-04
 Iter 4: f = 5.391323e-01 f-delta = 6.6e-04
 Iter 5: f = 5.398404e-01 f-delta = 7.1e-04
 Iter 6: f = 5.406120e-01 f-delta = 7.7e-04
 Iter 7: f = 5.414577e-01 f-delta = 8.5e-04
 Iter 8: f = 5.424087e-01 f-delta = 9.5e-04
 Iter 9: f = 5.434735e-01 f-delta = 1.1e-03
 Iter 10: f = 5.445673e-01 f-delta = 1.1e-03
 Iter 11: f = 5.455431e-01 f-delta = 9.8e-04
 Iter 12: f = 5.463202e-01 f-delta = 7.8e-04
 Iter 13: f = 5.469128e-01 f-delta = 5.9e-04
 Iter 14: f = 5.473693e-01 f-delta = 4.6e-04
 Iter 15: f = 5.477353e-01 f-delta = 3.7e-04
 Iter 16: f = 5.480438e-01 f-delta = 3.1e-04
 Iter 17: f = 5.483164e-01 f-delta = 2.7e-04
 Iter 18: f = 5.485659e-01 f-delta = 2.5e-04
 Iter 19: f = 5.487997e-01 f-delta = 2.3e-04
 Iter 20: f = 5.490210e-01 f-delta = 2.2e-04
 Iter 21: f = 5.492315e-01 f-delta = 2.1e-04
 Iter 22: f 

## Compare the two solutions
Use the `ktensor` `score()` member function to compare the two solutions. A score of 1 indicates a perfect match.

In [125]:
M1_ktns = M1[0]
M2_ktns = M2[0]
score = M1_ktns.score(M2_ktns)

Here, `score()` returned a tuple `score` with the score as the first element:

In [126]:
score[0]

0.2315335558892375

See the `ktensor` documentation for more information about the return values of `score()`.
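
For intuition about what such a score measures, the following is a simplified NumPy sketch of a factor match score. It is an assumption-laden illustration, not pyttb's implementation: it ignores the component weights, multiplies column cosines across modes, and matches components greedily. The name `simple_score` is hypothetical.

```python
import numpy as np

def simple_score(factors_a, factors_b):
    """Greedy factor match score between two CP models of equal rank.

    Simplified illustration: component weights are ignored.
    """
    rank = factors_a[0].shape[1]
    congruence = np.ones((rank, rank))
    for A, B in zip(factors_a, factors_b):
        A_n = A / np.linalg.norm(A, axis=0)  # unit-length columns
        B_n = B / np.linalg.norm(B, axis=0)
        congruence *= A_n.T @ B_n            # cosine between every column pair
    total = 0.0
    work = congruence.copy()
    for _ in range(rank):
        # Pick the best remaining pair, then remove its row and column.
        i, j = np.unravel_index(np.argmax(work), work.shape)
        total += work[i, j]
        work[i, :] = -np.inf
        work[:, j] = -np.inf
    return total / rank

rng = np.random.default_rng(0)
factors = [rng.random((n, 3)) for n in (6, 8, 10)]
permuted = [F[:, [2, 0, 1]] for F in factors]            # same model, reordered
other = [rng.random((n, 3)) for n in (6, 8, 10)]         # unrelated model

score_perm = simple_score(factors, permuted)
score_other = simple_score(factors, other)
```

Reordering the components does not change the model, so `score_perm` is 1 up to round-off, while an unrelated model scores lower.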

## Rerun with the same initial guess
Using the same initial guess (and all other parameters) gives the exact same solution.

In [127]:
M2alt = ttb.cp_als(X, R, maxiters=less_short_tutorial, init=Minit)
M2alt_ktns = M2alt[0]
score = M2_ktns.score(M2alt_ktns) # Score of 1 indicates the same solution
print(f"Score: {score[0]}.")

CP_ALS:
 Iter 0: f = 5.177965e-01 f-delta = 5.2e-01
 Iter 1: f = 5.349302e-01 f-delta = 1.7e-02
 Iter 2: f = 5.377023e-01 f-delta = 2.8e-03
 Iter 3: f = 5.384685e-01 f-delta = 7.7e-04
 Iter 4: f = 5.391323e-01 f-delta = 6.6e-04
 Iter 5: f = 5.398404e-01 f-delta = 7.1e-04
 Iter 6: f = 5.406120e-01 f-delta = 7.7e-04
 Iter 7: f = 5.414577e-01 f-delta = 8.5e-04
 Iter 8: f = 5.424087e-01 f-delta = 9.5e-04
 Iter 9: f = 5.434735e-01 f-delta = 1.1e-03
 Iter 10: f = 5.445673e-01 f-delta = 1.1e-03
 Iter 11: f = 5.455431e-01 f-delta = 9.8e-04
 Iter 12: f = 5.463202e-01 f-delta = 7.8e-04
 Iter 13: f = 5.469128e-01 f-delta = 5.9e-04
 Iter 14: f = 5.473693e-01 f-delta = 4.6e-04
 Iter 15: f = 5.477353e-01 f-delta = 3.7e-04
 Iter 16: f = 5.480438e-01 f-delta = 3.1e-04
 Iter 17: f = 5.483164e-01 f-delta = 2.7e-04
 Iter 18: f = 5.485659e-01 f-delta = 2.5e-04
 Iter 19: f = 5.487997e-01 f-delta = 2.3e-04
 Iter 20: f = 5.490210e-01 f-delta = 2.2e-04
 Iter 21: f = 5.492315e-01 f-delta = 2.1e-04
 Iter 22: f 

## Changing the output frequency
Use the `printitn` option to change how often progress is printed.

In [128]:
M2alt2 = ttb.cp_als(X, R, maxiters=less_short_tutorial, init=Minit, printitn=20)

CP_ALS:
 Iter 0: f = 5.177965e-01 f-delta = 5.2e-01
 Iter 20: f = 5.490210e-01 f-delta = 2.2e-04
 Iter 40: f = 5.528948e-01 f-delta = 3.1e-04
 Iter 60: f = 5.570314e-01 f-delta = 9.3e-05
 Final f = 5.570314e-01


## Suppress all output
Set `printitn` to zero to suppress all output.

In [130]:
# TODO this will pass when issue #235 is resolved
M2alt2 = ttb.cp_als(X, R, printitn=0) # No output

ZeroDivisionError: integer division or modulo by zero
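
While `printitn=0` fails as shown, one possible workaround is to capture standard output around the call. This is a sketch under the assumption that the solver reports progress through Python's `print` (that is, `sys.stdout`); `noisy_solver` is a hypothetical stand-in so the example runs without pyttb.

```python
import contextlib
import io

def noisy_solver():
    """Hypothetical stand-in for a solver that prints progress."""
    print("CP_ALS:")
    print(" Iter 0: f = 5.0e-01")
    return "model"

buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    result = noisy_solver()  # nothing reaches the console inside this block

captured = buffer.getvalue()  # the progress text is still available if needed
```

In practice the call inside the `with` block would be the `ttb.cp_als(...)` call whose output should be hidden.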

## Use HOSVD initial guess
Use the `'nvecs'` option to use the leading mode-n singular vectors as the initial guess.

In [131]:
M3 = ttb.cp_als(X, R, init='nvecs', printitn=20)
s = M2[0].score(M3[0])
print(f"score(M2,M3) = {s[0]}")


CP_ALS:
 Iter 0: f = 5.380727e-01 f-delta = 5.4e-01
 Iter 9: f = 5.533586e-01 f-delta = 7.4e-05
 Final f = 5.533586e-01
score(M2,M3) = 0.5193080815612473
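
To illustrate what "leading mode-n singular vectors" means, here is a minimal NumPy sketch that builds such an initial guess for a dense array. It is not pyttb's `nvecs` routine; the helper names are illustrative, and the column ordering of the unfolding is a simplification that does not affect the left singular vectors.

```python
import numpy as np

def unfold(X, mode):
    """Mode-n unfolding: rows indexed by `mode`, columns by all other modes."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def leading_left_singular_vectors(X, mode, rank):
    """First `rank` left singular vectors of the mode-n unfolding."""
    U, _, _ = np.linalg.svd(unfold(X, mode), full_matrices=False)
    return U[:, :rank]

rng = np.random.default_rng(0)
X = rng.random((6, 8, 10))
R = 3
# One factor matrix per mode, each with orthonormal columns.
init_factors = [leading_left_singular_vectors(X, n, R) for n in range(X.ndim)]
```

Unlike a random guess, this initialization is deterministic for a given tensor, so repeated runs start from the same point (up to sign conventions of the SVD).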


## Change the order of the dimensions in CP

In [132]:
M4, _, info = ttb.cp_als(X,3,dimorder=[1,2,0], init='nvecs', printitn=20)
s = M2[0].score(M4)
print(f"score(M2,M4) = {s[0]}")

CP_ALS:
 Iter 0: f = 5.435056e-01 f-delta = 5.4e-01
 Iter 10: f = 5.562656e-01 f-delta = 7.8e-05
 Final f = 5.562656e-01
score(M2,M4) = 0.4148247168226822


In the last example, we also collected the third output argument `info`, which contains runtime information. The field `info['iters']` holds the total number of iterations. The field `info['params']` holds the parameters used to run the method. Unless the initialization method is 'random', passing the parameters back to the method will yield the exact same results.

In [133]:
# TODO this will pass when issue #236 is resolved
M4alt, _, info = ttb.cp_als(X,3,**info['params'])
s = M4alt.score(M4)
print(f"score(M4alt,M4) = {s[0]}")

TypeError: pyttb.cp_als.cp_als() argument after ** must be a mapping, not tuple
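
As long as `params` is returned as a tuple, one possible workaround is to map it back to keyword arguments by hand. The field order below is an assumption inferred from the tuple printed earlier, `(0.0001, 10, 1, [0, 1, 2])`, and should be checked against your pyttb version; `params_to_kwargs` is a hypothetical helper.

```python
# Assumed field order, inferred from the printed tuple; verify before relying on it.
PARAM_NAMES = ("stoptol", "maxiters", "printitn", "dimorder")

def params_to_kwargs(params):
    """Convert a params tuple into a keyword-argument dictionary."""
    if len(params) != len(PARAM_NAMES):
        raise ValueError(f"expected {len(PARAM_NAMES)} entries, got {len(params)}")
    return dict(zip(PARAM_NAMES, params))

kwargs = params_to_kwargs((0.0001, 10, 1, [0, 1, 2]))
# The result could then be unpacked in a call such as ttb.cp_als(X, 3, **kwargs).
```

Note that the tuple shown does not include the initialization, so `init` would still need to be passed explicitly to reproduce a run.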

## Change the tolerance
It's also possible to loosen or tighten the tolerance on the change in the fit. You may need to increase the number of iterations for it to converge.

In [134]:
M5 = ttb.cp_als(X, 3, init='nvecs', stoptol=1e-12, printitn=100)

CP_ALS:
 Iter 0: f = 5.380727e-01 f-delta = 5.4e-01
 Iter 100: f = 5.535944e-01 f-delta = 8.6e-09
 Iter 200: f = 5.535946e-01 f-delta = 1.7e-11
 Iter 246: f = 5.535946e-01 f-delta = 9.8e-13
 Final f = 5.535946e-01
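
The logs in this tutorial are consistent with a stopping rule of the form "stop once the change in fit falls below `stoptol`". The sketch below replays that rule on the first ten fit values printed for the run started from `Minit`; the function name and the choice to skip the check at iteration 0 are assumptions, not pyttb's code.

```python
def first_converged(fits, stoptol):
    """Index of the first iteration whose change in fit is below stoptol, else None."""
    previous = 0.0
    for iteration, f in enumerate(fits):
        # Assumption: the check is skipped at iteration 0, where the change equals f itself.
        if iteration > 0 and abs(f - previous) < stoptol:
            return iteration
        previous = f
    return None

# Fit values copied from the iteration log of the run initialized with Minit.
fits = [0.5177965, 0.5349302, 0.5377023, 0.5384685, 0.5391323,
        0.5398404, 0.5406120, 0.5414577, 0.5424087, 0.5434735]

loose = first_converged(fits, stoptol=1e-3)    # change drops below 1e-3 at iteration 3
default = first_converged(fits, stoptol=1e-4)  # never below 1e-4 within these iterations
```

A looser tolerance stops earlier with a lower fit, while a tighter one may need a larger `maxiters` to be reached at all.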


## Control sign ambiguity of factor matrices
The default behavior of `cp_als` is to make a call to `fixsigns()` to fix the sign ambiguity of the factor matrices. You can turn off this behavior by passing the `fixsigns` parameter value of `False` when calling `cp_als`.

In [143]:
X = ttb.ktensor(factor_matrices=[np.array([[1.,1.],[1.,-10.]]),np.array([[1.,1.],[1.,-10.]])], weights=np.array([1.,1.]))
M1 = ttb.cp_als(X, 2, printitn=1, init=ttb.ktensor(X.factor_matrices))
print(M1[0]) # default behavior, fixsigns called
M2 = ttb.cp_als(X, 2, printitn=1, init=ttb.ktensor(X.factor_matrices), fixsigns=False)
print(M2[0]) # fixsigns not called

CP_ALS:
 Iter 0: f = 1.000000e+00 f-delta = 1.0e+00
 Iter 1: f = 1.000000e+00 f-delta = 0.0e+00
 Final f = 1.000000e+00
ktensor of shape (2, 2)
weights=[101.   2.]
factor_matrices[0] =
[[-0.09950372  0.70710678]
 [ 0.99503719  0.70710678]]
factor_matrices[1] =
[[-0.09950372  0.70710678]
 [ 0.99503719  0.70710678]]
CP_ALS:
 Iter 0: f = 1.000000e+00 f-delta = 1.0e+00
 Iter 1: f = 1.000000e+00 f-delta = 0.0e+00
 Final f = 1.000000e+00
ktensor of shape (2, 2)
weights=[101.   2.]
factor_matrices[0] =
[[ 0.09950372  0.70710678]
 [-0.99503719  0.70710678]]
factor_matrices[1] =
[[ 0.09950372  0.70710678]
 [-0.99503719  0.70710678]]
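
The sign ambiguity arises because negating the same column in two factor matrices leaves the model unchanged. A minimal NumPy sketch using the factor matrices from the example above (the helper `full` is an illustrative name, not a pyttb function):

```python
import numpy as np

A = np.array([[1.0, 1.0], [1.0, -10.0]])
B = np.array([[1.0, 1.0], [1.0, -10.0]])
weights = np.array([1.0, 1.0])

def full(weights, A, B):
    """Dense matrix represented by a 2-way CP model."""
    return (A * weights) @ B.T

flip = np.array([1.0, -1.0])  # negate the second column

# Flipping the column in both modes cancels out; flipping it in one mode does not.
same_model = np.allclose(full(weights, A, B), full(weights, A * flip, B * flip))
changed = not np.allclose(full(weights, A, B), full(weights, A * flip, B))
```

Both printed solutions above therefore describe the same tensor; fixing the signs only selects one representative.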


## Recommendations
* Run multiple times with different guesses and select the solution with the best fit.
* Try different ranks and choose the solution that is the best descriptor for your data based on the combination of the fit and the interpretation of the factors, e.g., by visualizing the results.
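
The first recommendation can be sketched as a multi-start loop. The example below uses a bare-bones NumPy alternating least-squares routine for dense 3-way arrays so that it runs without pyttb; it omits normalization and other safeguards and is not pyttb's implementation. With pyttb, the loop body would instead call `ttb.cp_als` and compare the `fit` entries of the returned dictionaries.

```python
import numpy as np

def khatri_rao(U, V):
    """Column-wise Kronecker product; rows ordered by (row of U, row of V)."""
    return np.einsum("ir,jr->ijr", U, V).reshape(-1, U.shape[1])

def unfold(X, mode):
    """Mode-n unfolding with the remaining modes kept in their original order."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def als(X, rank, seed, maxiters=50, stoptol=1e-4):
    """Plain CP-ALS for a dense 3-way array, started from a seeded random guess."""
    rng = np.random.default_rng(seed)
    factors = [rng.random((n, rank)) for n in X.shape]
    norm_x = np.linalg.norm(X)
    fit_old = 0.0
    for _ in range(maxiters):
        for n in range(3):
            others = [factors[m] for m in range(3) if m != n]
            kr = khatri_rao(others[0], others[1])
            # Least-squares update of mode n with the other two modes held fixed.
            factors[n] = np.linalg.lstsq(kr, unfold(X, n).T, rcond=None)[0].T
        M = np.einsum("ir,jr,kr->ijk", *factors)
        fit = 1.0 - np.linalg.norm(X - M) / norm_x
        if abs(fit - fit_old) < stoptol:
            break
        fit_old = fit
    return factors, fit

rng = np.random.default_rng(0)
X = rng.random((6, 8, 10))

# Run from several random starting points and keep the best-fitting model.
runs = [als(X, 3, seed) for seed in range(5)]
best_factors, best_fit = max(runs, key=lambda run: run[1])
```

The same loop extends to the second recommendation by also iterating over candidate ranks, although a higher rank generally increases the fit, so the factors still need to be inspected.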