# Ecosystem of a Tensor: N-dimensional arrays, their descriptions and meta information
### Last modification (08.06.2018).


**Note:** this tutorial assumes that you are familiar with the notion of N-dimensional arrays and their efficient representations. The related material can be found in our previous tutorials: [tutorial_1](https://github.com/hottbox/hottbox-tutorials/blob/master/1_N-dimensional_arrays_and_Tensor_class.ipynb) and [tutorial_2](https://github.com/hottbox/hottbox-tutorials/blob/master/2_Efficient_representations_of_tensors.ipynb).


**Requirements:** ``hottbox==0.1.3``

**Authors:** 
Ilya Kisil (ilyakisil@gmail.com); 

In [1]:
import numpy as np
from hottbox.core import Tensor

In [2]:
def show_meta_information(tensor, data=True, shapes=True, modes=True, state=True):
    """ Quick util for showing relevant information for this tutorial
    
    Parameters
    ----------
    tensor : Tensor
    data : bool
        If True, show data array
    shapes : bool
        If True, show current shape and normal shape
    modes : bool
        If True, show mode information
    state : bool    
        If True, show state information
    """
    print(tensor)
    
    if data:
        print("\n\tThe underlying data array is:")
        print(tensor.data)
    
    if shapes:
        print("\n\tIs this tensor in normal state: {}".format(tensor.in_normal_state))
        print("Current shape of the data array: {}".format(tensor.shape))
        print("Normal shape of the data array: {}".format(tensor.ft_shape))
    
    if modes:
        print("\n\tInformation about its modes:")
        for i, tensor_mode in enumerate(tensor.modes):
            print("#{}: {}".format(i, tensor_mode))

    if state:
        print("\n\tInformation about its current state:")    
        tensor.show_state()
        
def print_sep_line():
    print("\n==========================="
          "============================="
          "===========================\n")

Recall that the collected raw data, in the form of an N-dimensional array, represents different characteristics. Here are a couple of examples:

![different_tensors](./images/different-tensors.png)

N-dimensional arrays of data can be represented in various forms. By applying numerical methods (algorithms for tensor decompositions) to the raw data we can obtain, for example, the Kruskal or Tucker representation. At the same time, simple data rearrangement procedures (e.g. folding, unfolding) of the raw data also yield different representations.

![different_representations](./images/different-forms-of-data.png)

Each dimension of an N-dimensional array is associated with a certain property, **mode**, of the raw data. At the same time, this characteristic is described by certain features. The relation between these properties defines the **state** of this N-dimensional array. In other words, modes and state can be seen as the meta information about the tensor.

**Mode** of the tensor is defined by the name of the property it represents and the features that describe this property.

**State** of the tensor is defined by the transformations applied to the data array. 

**Normal state** of the tensor is the state in which the underlying raw data array is in its original form. This means that it has not been folded, unfolded or rotated.

Thus, the tensor is described by two different shapes: 
1. Shape of the data array in the current state of the tensor
2. Normal shape (full shape) - shape of the data array in the normal state.

Each transformation can be characterised by the mode order and the type of reshaping. This information is enough to revert the transformation applied to the data array.

Transformations such as folding or unfolding do not change the original properties of the underlying data array, but they change the relationship between these properties.
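The claim that the mode order and the normal shape are enough to revert a transformation can be sketched with plain NumPy. This is only an illustration: `SketchState`, `unfold_sketch` and `fold_sketch` are hypothetical names introduced here, not part of **hottbox**, and only row-major unfolding is covered.

```python
import numpy as np
from collections import namedtuple

# Hypothetical stand-in for the state record; not hottbox's own class.
SketchState = namedtuple("SketchState", ["normal_shape", "mode_order"])


def unfold_sketch(array, mode):
    """Row-major unfolding along `mode`; returns the matrix and its state."""
    rest = [m for m in range(array.ndim) if m != mode]
    state = SketchState(normal_shape=array.shape, mode_order=([mode], rest))
    matrix = np.moveaxis(array, mode, 0).reshape(array.shape[mode], -1)
    return matrix, state


def fold_sketch(matrix, state):
    """Revert `unfold_sketch` using only the information stored in `state`."""
    mode = state.mode_order[0][0]
    order = state.mode_order[0] + state.mode_order[1]
    moved_shape = [state.normal_shape[m] for m in order]
    return np.moveaxis(matrix.reshape(moved_shape), 0, mode)


original = np.arange(24).reshape(2, 3, 4)
unfolded, state = unfold_sketch(original, mode=1)
restored = fold_sketch(unfolded, state)

print(unfolded.shape)                      # (3, 8)
print(np.array_equal(restored, original))  # True
```

Note that `fold_sketch` never looks at the original array: everything it needs comes from the recorded state.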

![data_modes_state](./images/data-modes-state.png)

By default, an object of the **Tensor** class is created in **normal state** with generic mode names that describe the properties of the dimensions of the data array.

In [3]:
data_array = np.arange(24).reshape(2, 3, 4)

tensor = Tensor(data_array)

show_meta_information(tensor)

This tensor is of order 3 and consists of 24 elements.
Sizes and names of its modes are (2, 3, 4) and ['mode-0', 'mode-1', 'mode-2'] respectively.

	The underlying data array is:
[[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]

 [[12 13 14 15]
  [16 17 18 19]
  [20 21 22 23]]]

	Is this tensor in normal state: True
Current shape of the data array: (2, 3, 4)
Normal shape of the data array: (2, 3, 4)

	Information about its modes:
#0: Mode(name='mode-0', index=None)
#1: Mode(name='mode-1', index=None)
#2: Mode(name='mode-2', index=None)

	Information about its current state:
State(normal_shape=(2, 3, 4), rtype='Init', mode_order=([0], [1], [2]))


## Meta information after applying data transformations

Next, we will show changes in the meta information of the tensor when different transformations are applied to it. 

**Note:** at the moment, only one data transformation can be applied at a time. This will be generalised in future releases of **hottbox** and will be outlined in the [CHANGELOG](https://github.com/hottbox/hottbox/blob/master/CHANGELOG.md).

### Unfolding of the data

In [4]:
tensor.unfold(mode=1)

show_meta_information(tensor)

This tensor is of order 2 and consists of 24 elements.
Sizes and names of its modes are (3, 8) and ['mode-1', 'mode-0_mode-2'] respectively.

	The underlying data array is:
[[ 0  1  2  3 12 13 14 15]
 [ 4  5  6  7 16 17 18 19]
 [ 8  9 10 11 20 21 22 23]]

	Is this tensor in normal state: False
Current shape of the data array: (3, 8)
Normal shape of the data array: (2, 3, 4)

	Information about its modes:
#0: Mode(name='mode-0', index=None)
#1: Mode(name='mode-1', index=None)
#2: Mode(name='mode-2', index=None)

	Information about its current state:
State(normal_shape=(2, 3, 4), rtype='T', mode_order=([1], [0, 2]))


### Folding of the data

In [5]:
tensor.fold()

show_meta_information(tensor)

This tensor is of order 3 and consists of 24 elements.
Sizes and names of its modes are (2, 3, 4) and ['mode-0', 'mode-1', 'mode-2'] respectively.

	The underlying data array is:
[[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]

 [[12 13 14 15]
  [16 17 18 19]
  [20 21 22 23]]]

	Is this tensor in normal state: True
Current shape of the data array: (2, 3, 4)
Normal shape of the data array: (2, 3, 4)

	Information about its modes:
#0: Mode(name='mode-0', index=None)
#1: Mode(name='mode-1', index=None)
#2: Mode(name='mode-2', index=None)

	Information about its current state:
State(normal_shape=(2, 3, 4), rtype='Init', mode_order=([0], [1], [2]))


### Vectorisation of the data

In [6]:
tensor.vectorise()

show_meta_information(tensor)

This tensor is of order 1 and consists of 24 elements.
Sizes and names of its modes are (24,) and ['mode-0_mode-1_mode-2'] respectively.

	The underlying data array is:
[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23]

	Is this tensor in normal state: False
Current shape of the data array: (24,)
Normal shape of the data array: (2, 3, 4)

	Information about its modes:
#0: Mode(name='mode-0', index=None)
#1: Mode(name='mode-1', index=None)
#2: Mode(name='mode-2', index=None)

	Information about its current state:
State(normal_shape=(2, 3, 4), rtype='T', mode_order=([0, 1, 2],))


As we can see, the applied transformations rearrange the values of the underlying data array. They also change the relations between mode names and modify the state of the tensor. However, the normal shape and the information about the original modes remain the same.
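The mode names printed in the summaries above appear to follow directly from the mode order: names of modes that were merged into one dimension are joined with an underscore. A minimal sketch of that rule, inferred from the printed outputs rather than taken from the **hottbox** source (`unfolded_mode_names` is a hypothetical helper):

```python
def unfolded_mode_names(mode_names, mode_order):
    """Join the names of the modes grouped together by `mode_order`."""
    return ["_".join(mode_names[i] for i in group) for group in mode_order]


names = ["mode-0", "mode-1", "mode-2"]

normal = unfolded_mode_names(names, ([0], [1], [2]))    # normal state
unfolded = unfolded_mode_names(names, ([1], [0, 2]))    # unfolded along mode 1
vectorised = unfolded_mode_names(names, ([0, 1, 2],))   # vectorised

print(normal)      # ['mode-0', 'mode-1', 'mode-2']
print(unfolded)    # ['mode-1', 'mode-0_mode-2']
print(vectorised)  # ['mode-0_mode-1_mode-2']
```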

## Different reshaping conventions

In computing, row-major order and column-major order are methods for storing multidimensional arrays in linear storage such as random access memory. For example, for the array
$$
\mathbf{A} = 
\begin{bmatrix}
 a_{11} & a_{12} & a_{13}\\ 
 a_{21} & a_{22} & a_{23} 
\end{bmatrix}
$$
the two possible ways are:

![data_ordering](./images/C_Fortran_ordering.png)
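The two orderings can be reproduced with NumPy's `order` argument. In this small sketch the entries of the array encode their own position (12 stands for $a_{12}$), which is a choice made here purely for readability.

```python
import numpy as np

# Entries encode their position: 12 stands for a_{12}, and so on.
A = np.array([[11, 12, 13],
              [21, 22, 23]])

row_major = A.ravel(order="C")      # rows are stored one after another
column_major = A.ravel(order="F")   # columns are stored one after another

print(row_major)     # [11 12 13 21 22 23]
print(column_major)  # [11 21 12 22 13 23]
```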

Therefore, there are two conventions for reshaping (unfolding/folding/vectorising) data.
Both of them are available in **hottbox**. They produce arrays of the same shape, but with the values permuted. The state of the tensor records which convention has been applied and uses it to revert the transformation.
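As a rough NumPy-only illustration of "same shape, permuted values", the sketch below unfolds one array along mode 1 under both orderings. The helper `unfold_with_order` is hypothetical, and the mapping of the two orderings onto the `rtype` values used by **hottbox** is an assumption based on the outputs printed in this tutorial (`"T"` behaving like `order="C"`, `"K"` like `order="F"`).

```python
import numpy as np


def unfold_with_order(array, mode, order):
    """Unfold `array` along `mode`, flattening the remaining modes in `order`."""
    moved = np.moveaxis(array, mode, 0)
    return np.reshape(moved, (array.shape[mode], -1), order=order)


data = np.arange(24).reshape(2, 3, 4)
row_major = unfold_with_order(data, mode=1, order="C")
column_major = unfold_with_order(data, mode=1, order="F")

print(row_major.shape == column_major.shape)  # True
print(row_major[0])     # [ 0  1  2  3 12 13 14 15]
print(column_major[0])  # [ 0 12  1 13  2 14  3 15]
```

Each row holds the same set of values in both cases; only their position within the row differs.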

### Row and column major unfolding

In [7]:
data_array = np.arange(24).reshape(2, 3, 4)

tensor_1 = Tensor(data_array)
tensor_2 = Tensor(data_array)

tensor_1.unfold(mode=1, rtype="T")
tensor_2.unfold(mode=1, rtype="K")

print("\tRow-major unfolding")
show_meta_information(tensor_1, shapes=False, modes=False)

print_sep_line()

print("\tColumn-major unfolding")
show_meta_information(tensor_2, shapes=False, modes=False)

	Row-major unfolding
This tensor is of order 2 and consists of 24 elements.
Sizes and names of its modes are (3, 8) and ['mode-1', 'mode-0_mode-2'] respectively.

	The underlying data array is:
[[ 0  1  2  3 12 13 14 15]
 [ 4  5  6  7 16 17 18 19]
 [ 8  9 10 11 20 21 22 23]]

	Information about its current state:
State(normal_shape=(2, 3, 4), rtype='T', mode_order=([1], [0, 2]))


	Column-major unfolding
This tensor is of order 2 and consists of 24 elements.
Sizes and names of its modes are (3, 8) and ['mode-1', 'mode-0_mode-2'] respectively.

	The underlying data array is:
[[ 0 12  1 13  2 14  3 15]
 [ 4 16  5 17  6 18  7 19]
 [ 8 20  9 21 10 22 11 23]]

	Information about its current state:
State(normal_shape=(2, 3, 4), rtype='K', mode_order=([1], [0, 2]))


### Row and column major folding

In [8]:
tensor_1.fold()
tensor_2.fold()
print("\tReverting Row-major unfolding")
show_meta_information(tensor_1, shapes=False, modes=False)

print_sep_line()

print("\tReverting Column-major unfolding")
show_meta_information(tensor_2, shapes=False, modes=False)

	Reverting Row-major unfolding
This tensor is of order 3 and consists of 24 elements.
Sizes and names of its modes are (2, 3, 4) and ['mode-0', 'mode-1', 'mode-2'] respectively.

	The underlying data array is:
[[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]

 [[12 13 14 15]
  [16 17 18 19]
  [20 21 22 23]]]

	Information about its current state:
State(normal_shape=(2, 3, 4), rtype='Init', mode_order=([0], [1], [2]))


	Reverting Column-major unfolding
This tensor is of order 3 and consists of 24 elements.
Sizes and names of its modes are (2, 3, 4) and ['mode-0', 'mode-1', 'mode-2'] respectively.

	The underlying data array is:
[[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]

 [[12 13 14 15]
  [16 17 18 19]
  [20 21 22 23]]]

	Information about its current state:
State(normal_shape=(2, 3, 4), rtype='Init', mode_order=([0], [1], [2]))


As we can see, the different approaches to reshaping the underlying data affect only the data array itself, whereas other properties remain the same. Similarly to unfolding along different modes, the **state** of the tensor keeps track of this transformation as well. 

**Note:** the same type of unfolding and folding should be applied to the data array, in order not to mix up the values that describe different properties of the tensor. But don't worry about it, since this is handled automatically under the hood.
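To see why the conventions must match, here is a NumPy-only sketch (not **hottbox** code) that unfolds an array in column-major order and then folds it back twice: once with the matching order and once with the other one.

```python
import numpy as np

data = np.arange(24).reshape(2, 3, 4)
mode = 1
moved_shape = (3, 2, 4)  # shape of `data` with `mode` moved to the front

# Column-major unfolding along mode 1
unfolded = np.reshape(np.moveaxis(data, mode, 0), (3, -1), order="F")

# Folding back with the same and with a different convention
matched = np.moveaxis(np.reshape(unfolded, moved_shape, order="F"), 0, mode)
mismatched = np.moveaxis(np.reshape(unfolded, moved_shape, order="C"), 0, mode)

print(np.array_equal(matched, data))     # True
print(np.array_equal(mismatched, data))  # False
print(mismatched[0, 0])                  # [ 0 12  1 13]
```

The mismatched result still has the right shape and contains all 24 values, so the error is easy to miss without the state keeping track of the convention.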

## Creating Tensor with custom meta information

The **state** and the list of **modes** are created at the initialisation of the **Tensor** object: 
1. **State** of the tensor is created. By default, this step assumes that data is passed in normal shape (was not folded or unfolded before).
2. List of **modes** is created based on **state**. By default, it extracts from **state** the number of modes to be created and assigns default names to each of them.

**hottbox** provides flexibility for this procedure. The **Tensor** can be created with custom names for the modes and in a state that is not inferred from the provided data. 

If both customisations are passed to the **Tensor** constructor, then the list of mode names depends on the provided state. If only mode names are provided, then their number should be consistent with the number of dimensions of the data array.

Defining a custom state is a little trickier, but there is nothing to be scared of. Since **state** and **modes** are crucial parts of the **Tensor** ecosystem, a custom state should be specified with caution, even though there is quite a bit of input validation involved, which will point you in the right direction if something was not specified correctly.
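To give a feeling for what "specified correctly" can mean, the sketch below runs a few consistency checks on a custom state dictionary with the keys used in this tutorial (`mode_order`, `normal_shape`, `rtype`). The function `check_custom_state` is hypothetical and the checks are assumptions about what a sensible state looks like; the validation actually performed by **hottbox** may differ.

```python
import numpy as np


def check_custom_state(data, custom_state):
    """Illustrative consistency checks; hottbox's own validation may differ."""
    normal_shape = custom_state["normal_shape"]
    mode_order = custom_state["mode_order"]

    if int(np.prod(normal_shape)) != data.size:
        raise ValueError("normal_shape does not match the number of elements")
    if len(mode_order) != data.ndim:
        raise ValueError("mode_order needs one group per dimension of the data")
    mentioned = sorted(i for group in mode_order for i in group)
    if mentioned != list(range(len(normal_shape))):
        raise ValueError("mode_order must mention every mode exactly once")
    # Only the rtype values that appear in this tutorial are accepted here.
    if custom_state["rtype"] not in ("Init", "T", "K"):
        raise ValueError("unexpected rtype")
    return True


data_2d = np.arange(24).reshape(2, 12)
good_state = dict(mode_order=([0], [1, 2]), normal_shape=(2, 3, 4), rtype="T")
bad_state = dict(mode_order=([0], [1, 2]), normal_shape=(2, 3, 5), rtype="T")

print(check_custom_state(data_2d, good_state))  # True
try:
    check_custom_state(data_2d, bad_state)
except ValueError as error:
    print("Rejected:", error)
```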

**Note:** The usefulness of custom mode names is not fully exploited in **hottbox** at the moment, but we are working on that.

In [9]:
I, J, K = 2, 3, 4

# Provided with a 3D array
data_3d = np.arange(I*J*K).reshape(I, J, K)

# Provided with a 3D array that has been unfolded
data_2d = np.arange(I*J*K).reshape(I, (J*K))

### Custom mode names

In [10]:
tensor_1 = Tensor(data_3d, mode_names=["Frequency", "Time", "Subject"])

show_meta_information(tensor_1, data=False, shapes=False, state=False)

This tensor is of order 3 and consists of 24 elements.
Sizes and names of its modes are (2, 3, 4) and ['Frequency', 'Time', 'Subject'] respectively.

	Information about its modes:
#0: Mode(name='Frequency', index=None)
#1: Mode(name='Time', index=None)
#2: Mode(name='Subject', index=None)


### Custom state: different mode order

In [11]:
custom_state_1 = dict(mode_order=([0], [1, 2]),
                      normal_shape=(2, 3, 4),
                      rtype="T"
                     )
custom_state_2 = dict(mode_order=([1], [0, 2]),
                      normal_shape=(2, 3, 4),
                      rtype="T"
                     )

tensor_1 = Tensor(data_2d, custom_state=custom_state_1)
tensor_2 = Tensor(data_2d, custom_state=custom_state_2)

show_meta_information(tensor_1, modes=False, shapes=False, state=False)

print_sep_line()

show_meta_information(tensor_2, modes=False, shapes=False, state=False)

This tensor is of order 2 and consists of 24 elements.
Sizes and names of its modes are (2, 12) and ['mode-0', 'mode-1_mode-2'] respectively.

	The underlying data array is:
[[ 0  1  2  3  4  5  6  7  8  9 10 11]
 [12 13 14 15 16 17 18 19 20 21 22 23]]


This tensor is of order 2 and consists of 24 elements.
Sizes and names of its modes are (2, 12) and ['mode-1', 'mode-0_mode-2'] respectively.

	The underlying data array is:
[[ 0  1  2  3  4  5  6  7  8  9 10 11]
 [12 13 14 15 16 17 18 19 20 21 22 23]]


In [12]:
tensor_1.fold()
tensor_2.fold()

show_meta_information(tensor_1, modes=False, shapes=False, state=False)

print_sep_line()

show_meta_information(tensor_2, modes=False, shapes=False, state=False)

This tensor is of order 3 and consists of 24 elements.
Sizes and names of its modes are (2, 3, 4) and ['mode-0', 'mode-1', 'mode-2'] respectively.

	The underlying data array is:
[[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]

 [[12 13 14 15]
  [16 17 18 19]
  [20 21 22 23]]]


This tensor is of order 3 and consists of 24 elements.
Sizes and names of its modes are (2, 3, 4) and ['mode-0', 'mode-1', 'mode-2'] respectively.

	The underlying data array is:
[[[ 0  1  2  3]
  [ 8  9 10 11]
  [16 17 18 19]]

 [[ 4  5  6  7]
  [12 13 14 15]
  [20 21 22 23]]]


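**Note:** this example is for illustration purposes only, since the second custom state does not satisfy the condition that holds for a true unfolding, namely that the first dimension of the unfolded array equals the size of the mode it was unfolded along. Here the two differ:

```python
# custom_state_2 claims unfolding along mode 1 (size 3), but data_2d has only 2 rows
unfolded_along = custom_state_2["mode_order"][0][0]
data_2d.shape[0] != custom_state_2["normal_shape"][unfolded_along]  # True: 2 != 3
```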

### Custom state: different reshaping type

In [13]:
custom_state_1 = dict(mode_order=([0], [1, 2]),
                      normal_shape=(2, 3, 4),
                      rtype="T"
                     )
custom_state_2 = dict(mode_order=([0], [1, 2]),
                      normal_shape=(2, 3, 4),
                      rtype="K"
                     )

tensor_1 = Tensor(data_2d, custom_state=custom_state_1)
tensor_2 = Tensor(data_2d, custom_state=custom_state_2)

show_meta_information(tensor_1, modes=False, shapes=False, state=False)

print_sep_line()

show_meta_information(tensor_2, modes=False, shapes=False, state=False)

This tensor is of order 2 and consists of 24 elements.
Sizes and names of its modes are (2, 12) and ['mode-0', 'mode-1_mode-2'] respectively.

	The underlying data array is:
[[ 0  1  2  3  4  5  6  7  8  9 10 11]
 [12 13 14 15 16 17 18 19 20 21 22 23]]


This tensor is of order 2 and consists of 24 elements.
Sizes and names of its modes are (2, 12) and ['mode-0', 'mode-1_mode-2'] respectively.

	The underlying data array is:
[[ 0  1  2  3  4  5  6  7  8  9 10 11]
 [12 13 14 15 16 17 18 19 20 21 22 23]]


In [14]:
tensor_1.fold()
tensor_2.fold()

show_meta_information(tensor_1, modes=False, shapes=False, state=False)

print_sep_line()

show_meta_information(tensor_2, modes=False, shapes=False, state=False)

This tensor is of order 3 and consists of 24 elements.
Sizes and names of its modes are (2, 3, 4) and ['mode-0', 'mode-1', 'mode-2'] respectively.

	The underlying data array is:
[[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]

 [[12 13 14 15]
  [16 17 18 19]
  [20 21 22 23]]]


This tensor is of order 3 and consists of 24 elements.
Sizes and names of its modes are (2, 3, 4) and ['mode-0', 'mode-1', 'mode-2'] respectively.

	The underlying data array is:
[[[ 0  3  6  9]
  [ 1  4  7 10]
  [ 2  5  8 11]]

 [[12 15 18 21]
  [13 16 19 22]
  [14 17 20 23]]]


### Custom state: different normal shape

In [15]:
custom_state_1 = dict(mode_order=([0], [1, 2]),
                      normal_shape=(2, 3, 4),
                      rtype="T"
                     )
custom_state_2 = dict(mode_order=([0], [1, 2]),
                      normal_shape=(2, 4, 3),
                      rtype="T"
                     )

tensor_1 = Tensor(data_2d, custom_state=custom_state_1)
tensor_2 = Tensor(data_2d, custom_state=custom_state_2)

show_meta_information(tensor_1, modes=False, shapes=False, state=False)

print_sep_line()

show_meta_information(tensor_2, modes=False, shapes=False, state=False)

This tensor is of order 2 and consists of 24 elements.
Sizes and names of its modes are (2, 12) and ['mode-0', 'mode-1_mode-2'] respectively.

	The underlying data array is:
[[ 0  1  2  3  4  5  6  7  8  9 10 11]
 [12 13 14 15 16 17 18 19 20 21 22 23]]


This tensor is of order 2 and consists of 24 elements.
Sizes and names of its modes are (2, 12) and ['mode-0', 'mode-1_mode-2'] respectively.

	The underlying data array is:
[[ 0  1  2  3  4  5  6  7  8  9 10 11]
 [12 13 14 15 16 17 18 19 20 21 22 23]]


In [16]:
tensor_1.fold()
tensor_2.fold()

show_meta_information(tensor_1, modes=False, shapes=False, state=False)

print_sep_line()

show_meta_information(tensor_2, modes=False, shapes=False, state=False)

This tensor is of order 3 and consists of 24 elements.
Sizes and names of its modes are (2, 3, 4) and ['mode-0', 'mode-1', 'mode-2'] respectively.

	The underlying data array is:
[[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]

 [[12 13 14 15]
  [16 17 18 19]
  [20 21 22 23]]]


This tensor is of order 3 and consists of 24 elements.
Sizes and names of its modes are (2, 4, 3) and ['mode-0', 'mode-1', 'mode-2'] respectively.

	The underlying data array is:
[[[ 0  1  2]
  [ 3  4  5]
  [ 6  7  8]
  [ 9 10 11]]

 [[12 13 14]
  [15 16 17]
  [18 19 20]
  [21 22 23]]]


### Custom state and mode names

In [17]:
I, J, K = 2, 3, 4
data_2d = np.arange(I*J*K).reshape(J, (I*K))

custom_state = dict(mode_order=([1], [0, 2]),
                    normal_shape=(3, 2, 4),
                    rtype="T"
                   )
tensor_1 = Tensor(data_2d, custom_state, mode_names=["Frequency", "Time", "Subject"])
show_meta_information(tensor_1, shapes=False)

print_sep_line()

tensor_1.fold()
show_meta_information(tensor_1, shapes=False)

This tensor is of order 2 and consists of 24 elements.
Sizes and names of its modes are (3, 8) and ['Time', 'Frequency_Subject'] respectively.

	The underlying data array is:
[[ 0  1  2  3  4  5  6  7]
 [ 8  9 10 11 12 13 14 15]
 [16 17 18 19 20 21 22 23]]

	Information about its modes:
#0: Mode(name='Frequency', index=None)
#1: Mode(name='Time', index=None)
#2: Mode(name='Subject', index=None)

	Information about its current state:
State(normal_shape=(3, 2, 4), rtype='T', mode_order=([1], [0, 2]))


This tensor is of order 3 and consists of 24 elements.
Sizes and names of its modes are (3, 2, 4) and ['Frequency', 'Time', 'Subject'] respectively.

	The underlying data array is:
[[[ 0  1  2  3]
  [12 13 14 15]]

 [[ 4  5  6  7]
  [16 17 18 19]]

 [[ 8  9 10 11]
  [20 21 22 23]]]

	Information about its modes:
#0: Mode(name='Frequency', index=None)
#1: Mode(name='Time', index=None)
#2: Mode(name='Subject', index=None)

	Information about its current state:
State(normal_shape=(3, 2, 4), r