# Introduction

This brief tutorial aims to explain how the Python module ``utils/rename_tensors.py`` can be used to rename the ``Labels`` names of ``TensorMap`` and ``TensorBlock`` objects.

The purpose of these functions is to allow for greater transferability and interoperability between equistore-based workflows, such as those produced by individuals or groups in different fields and with different naming conventions. This need has already been raised as an [issue in the equistore repository](https://github.com/lab-cosmo/equistore/issues/61).

It is worth pointing out that ``utils/rename_tensors.py`` only goes part of the way to solving the problem. It is only a Python-side implementation of the solution and is not rigorously tested. Furthermore, it works by creating **new** ``TensorBlock`` and ``TensorMap`` objects whose ``Labels`` carry the new names, instead of actually **renaming** the ``Labels`` of the existing objects. Creating new objects altogether, instead of changing existing ones, has memory implications, as the data stored within the ``TensorBlocks`` and ``TensorMaps`` being "renamed" is essentially copied.

This is therefore a temporary solution to the problem. Ideally, the real solution will be implemented Rust-side, change the ``Labels`` names of the existing objects, be rigorously tested, and be more elegant both under-the-hood and to the user.

This current Python-side implementation features a single "rename" method, for which the user needs to explicitly specify the whole name structure. The eventual user-facing API would ideally also have more specific methods such as ``TensorMap.rename_keys()``, ``TensorMap.rename_properties()``, ``TensorMap.rename_components()``, ``TensorMap.rename_gradient_components()`` etc.

Nevertheless, this script does a good enough job for now. In this tutorial we will "rename" some example ``TensorBlocks`` and ``TensorMaps``. Hopefully, while a more permanent solution is worked on, this can be useful enough to save others' time.

# Imports

In [1]:
from pprint import pprint

from equistore import io, TensorMap, TensorBlock
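# Import explicitly the four functions used in this tutorial
from utils.rename_tensors import (
    get_tensorblock_name_structure, get_tensormap_name_structure,
    rename_tensorblock, rename_tensormap,
)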

# Example Data

We will load an example ``TensorMap`` that has been saved to file, ***data/tensormap_periodic.npz***, and use this to illustrate how renaming can be performed.

For context, though not essential for understanding the tutorial, this example ``TensorMap`` is a descriptor for a small dataset of ~100 periodic crystal structures, generated using the Rascaline ``SphericalExpansion`` calculator. For the illustrative purposes of this tutorial, the descriptor was generated with both 'cell' and 'positions' gradients.

In [2]:
tensormap = io.load('data/tensormap_periodic.npz')
tensormap

TensorMap with 70 blocks
keys: ['spherical_harmonics_l' 'species_center' 'species_neighbor']
                  0                   1                1
                  1                   1                1
                  2                   1                1
               ...
                  2                   8                8
                  3                   8                8
                  4                   8                8

# Revealing the ``Labels`` Name Structure of a ``TensorMap``

The first step to renaming a ``TensorMap`` is to find out its current ``Labels`` name structure. This can be done with the ``rename_tensors.get_tensormap_name_structure()`` function. This returns a nested dictionary specifying the **complete** name structure. We will 'pretty print' the ``dict`` output of this function and inspect the structure as follows:

In [3]:
pprint(get_tensormap_name_structure(tensormap))

{'components': [('spherical_harmonics_m',)],
 'gradients': {'cell': {'components': [('direction_1',),
                                       ('direction_2',),
                                       ('spherical_harmonics_m',)],
                        'properties': ('n',),
                        'samples': ('sample',)},
               'positions': {'components': [('direction',),
                                            ('spherical_harmonics_m',)],
                             'properties': ('n',),
                             'samples': ('sample', 'structure', 'atom')}},
 'keys': ('spherical_harmonics_l', 'species_center', 'species_neighbor'),
 'properties': ('n',),
 'samples': ('structure', 'center')}


Upon inspection you can see the names of the various ``Labels`` objects that label the data in the ``TensorMap``. Remember that a ``TensorMap`` is a collection of one or more ``TensorBlock``s, each indexed by a key. This descriptor contains 70 ``TensorBlocks``, and the names of the keys that index them are ``('spherical_harmonics_l', 'species_center', 'species_neighbor')``.

Each ``TensorBlock`` is composed of 'samples', 'components', and 'properties' dimensions. Here, each ``TensorBlock`` also has associated ``Gradient TensorBlock``s for **both** 'cell' and 'positions' gradients, as these were calculated when the descriptor was generated.

Every ``Gradient TensorBlock``, like a normal ``TensorBlock``, has 'samples', 'components', and 'properties' dimensions. There are some important aspects to how these are named.

* The 'properties' Labels names of the ``TensorBlock`` are/should be the **same** as those of the associated ``Gradient TensorBlock``s.
    * i.e. above you can see that the properties for the radial channels are called ``'n'`` for the ``TensorBlock``s and associated ``Gradient TensorBlocks``.
* Where there are $N$ components names for the ``TensorBlock``, **the final $N$ names** of the components for the associated ``Gradient TensorBlock``s are/should be the same.
    * i.e. the ``TensorBlock``s have $N=1$ name for the components, ``'spherical_harmonics_m'``, so the final components name for the ``Gradient TensorBlock``s is also ``'spherical_harmonics_m'``. The preceding names are different: for the 'cell' gradients these correspond to pairs of distortions along the Cartesian axes (i.e. xx, xy, xz, yx, yy, yz, zx, zy, or zz); and for the 'positions' gradients this corresponds to displacement along a single Cartesian axis (i.e. x, y, or z).
* Names of samples of the ``TensorBlock`` compared to its associated ``Gradient TensorBlocks`` are different.
    * Here, each ``TensorBlock`` sample is labelled by the combination of 'structure' (i.e. the periodic crystal structure) and 'center' (i.e. an atom in that structure). For the ``Gradient TensorBlock``s, the names of the samples are different. For 'cell' gradients, the samples are named just 'sample', i.e. they track how the values of a given ('structure', 'center') combination change upon cell distortion for each ('direction_1', 'direction_2') pair and each 'spherical_harmonics_m'. For 'positions' gradients the samples are named ('sample', 'structure', 'atom'), which tracks how the values for a given sample (i.e. a 'structure' and 'center' combination) and 'spherical_harmonics_m' change upon displacement of an 'atom' in the 'structure'.
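
The first two naming rules in the list above can be checked directly on a name-structure ``dict`` before attempting a rename. The following is only a minimal sketch: ``check_name_structure`` is a hypothetical helper written for this tutorial (it is not part of ``utils/rename_tensors.py``), and it assumes the nested layout printed above.

```python
def check_name_structure(names):
    """Return a list of problems found in a name-structure dict.

    Hypothetical helper, assuming the nested layout shown in this tutorial:
    'samples', 'components', 'properties', and an optional 'gradients' dict
    keyed by gradient parameter (e.g. 'cell', 'positions').
    """
    problems = []
    block_components = list(names["components"])
    n_components = len(block_components)

    for parameter, gradient in names.get("gradients", {}).items():
        # Rule 1: properties names must match those of the parent block
        if tuple(gradient["properties"]) != tuple(names["properties"]):
            problems.append(f"{parameter}: properties names differ from the block")

        # Rule 2: the final N gradient components names must match the block's
        gradient_components = list(gradient["components"])
        tail = gradient_components[len(gradient_components) - n_components:]
        if tail != block_components:
            problems.append(f"{parameter}: trailing components names differ from the block")

    return problems


# Trimmed-down name structure with only 'positions' gradients
names = {
    "components": [("spherical_harmonics_m",)],
    "gradients": {
        "positions": {
            "components": [("direction",), ("spherical_harmonics_m",)],
            "properties": ("n",),
            "samples": ("sample", "structure", "atom"),
        },
    },
    "properties": ("n",),
    "samples": ("structure", "center"),
}

consistent = check_name_structure(names)

# Rename the block properties only, leaving the gradient properties untouched
inconsistent = check_name_structure(dict(names, properties=("radial_channel_n",)))

print(consistent)    # []
print(inconsistent)  # ['positions: properties names differ from the block']
```

The third rule (samples names differ between a block and its gradients) is not checked here, since the text above describes it as an observation rather than a constraint.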

# Renaming a ``TensorMap``

The best way to use these functions is to copy the output from the ``get_tensormap_name_structure()`` function and edit the names as desired.

The cell below features this copied-and-pasted ``dict`` from above, but with some changes, for no good reason at all except to illustrate the functionality:

* ``TensorMap`` keys names changed:
    * ``('spherical_harmonics_l', 'species_center', 'species_neighbor')`` to ``('l', 'species_center', 'species_neighbor')``
* ``TensorBlock`` samples names changed:
    * ``('structure', 'center')`` to ``('crystal', 'center')``

In [4]:
# Change keys and samples names
new_names = {
 'components': [('spherical_harmonics_m',)],
 'gradients': {'cell': {'components': [('direction_1',),
                                       ('direction_2',),
                                       ('spherical_harmonics_m',)],
                        'properties': ('n',),
                        'samples': ('sample',)},
               'positions': {'components': [('direction',),
                                            ('spherical_harmonics_m',)],
                             'properties': ('n',),
                             'samples': ('sample', 'structure', 'atom')}},
 'keys': ('l', 'species_center', 'species_neighbor'),
 'properties': ('n',),
 'samples': ('crystal', 'center')
}

# "Rename" (i.e. create a new) tensormap with the desired names
new_tensormap = rename_tensormap(tensormap, new_names)

pprint(get_tensormap_name_structure(new_tensormap))

{'components': [('spherical_harmonics_m',)],
 'gradients': {'cell': {'components': [('direction_1',),
                                       ('direction_2',),
                                       ('spherical_harmonics_m',)],
                        'properties': ('n',),
                        'samples': ('sample',)},
               'positions': {'components': [('direction',),
                                            ('spherical_harmonics_m',)],
                             'properties': ('n',),
                             'samples': ('sample', 'structure', 'atom')}},
 'keys': ('l', 'species_center', 'species_neighbor'),
 'properties': ('n',),
 'samples': ('crystal', 'center')}
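
As an alternative to copying and pasting the printed ``dict``, the name structure can be edited programmatically. The sketch below assumes only that the name structure is a plain nested ``dict`` as printed above; ``with_renamed`` is a hypothetical helper, not a function provided by ``rename_tensors``, and the input is a trimmed-down stand-in for the output of ``get_tensormap_name_structure()``.

```python
import copy


def with_renamed(names, **changes):
    """Return a deep copy of ``names`` with some top-level entries replaced.

    Hypothetical helper: the input dict is left untouched.
    """
    new_names = copy.deepcopy(names)
    for field, value in changes.items():
        if field not in new_names:
            raise KeyError(f"unknown name-structure field: {field!r}")
        new_names[field] = value
    return new_names


# Trimmed-down stand-in for the output of get_tensormap_name_structure()
current_names = {
    "components": [("spherical_harmonics_m",)],
    "keys": ("spherical_harmonics_l", "species_center", "species_neighbor"),
    "properties": ("n",),
    "samples": ("structure", "center"),
}

edited_names = with_renamed(
    current_names,
    keys=("l", "species_center", "species_neighbor"),
    samples=("crystal", "center"),
)
```

The resulting ``edited_names`` could then be passed to ``rename_tensormap()`` in place of a hand-edited ``dict``. Working on a deep copy keeps the original structure available for comparison.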


Now, knowing that the properties names should be consistent between ``TensorBlock``s and ``Gradient TensorBlock``s, let's try to change one and not the other. An error is raised.

In [5]:
new_names['properties'] = ('radial_channel_n',)

new_tensormap = rename_tensormap(tensormap, new_names)

AssertionError: The names of TensorBlock properties Labels should be the same as those for the Gradient TensorBlocks. You have passed ['radial_channel_n'] as the names of the properties of the TensorBlock, and [('n',), ('n',)] as the names of the properties for the Gradient TensorBlocks.

We would need to change all the names of the properties consistently, as follows:

In [6]:
# Change keys, samples, and properties names
new_names = {
 'components': [('spherical_harmonics_m',)],
 'gradients': {'cell': {'components': [('direction_1',),
                                       ('direction_2',),
                                       ('spherical_harmonics_m',)],
                        'properties': ('radial_channel_n',),
                        'samples': ('sample',)},
               'positions': {'components': [('direction',),
                                            ('spherical_harmonics_m',)],
                             'properties': ('radial_channel_n',),
                             'samples': ('sample', 'structure', 'atom')}},
 'keys': ('l', 'species_center', 'species_neighbor'),
 'properties': ('radial_channel_n',),
 'samples': ('crystal', 'center')
}

# "Rename" (i.e. create a new) tensormap with the desired names
new_tensormap = rename_tensormap(tensormap, new_names)

pprint(get_tensormap_name_structure(new_tensormap))

{'components': [('spherical_harmonics_m',)],
 'gradients': {'cell': {'components': [('direction_1',),
                                       ('direction_2',),
                                       ('spherical_harmonics_m',)],
                        'properties': ('radial_channel_n',),
                        'samples': ('sample',)},
               'positions': {'components': [('direction',),
                                            ('spherical_harmonics_m',)],
                             'properties': ('radial_channel_n',),
                             'samples': ('sample', 'structure', 'atom')}},
 'keys': ('l', 'species_center', 'species_neighbor'),
 'properties': ('radial_channel_n',),
 'samples': ('crystal', 'center')}


# Renaming a ``TensorBlock``

This can be done in a very similar way to renaming a ``TensorMap``, except the ``rename_tensors.rename_tensorblock()`` function is used instead. In fact, under-the-hood, ``rename_tensormap()`` works by iterating over blocks and calling ``rename_tensorblock()``.

In [7]:
tensorblock = tensormap.block(0)

pprint(get_tensorblock_name_structure(tensorblock))

{'components': [('spherical_harmonics_m',)],
 'gradients': {'cell': {'components': [('direction_1',),
                                       ('direction_2',),
                                       ('spherical_harmonics_m',)],
                        'properties': ('n',),
                        'samples': ('sample',)},
               'positions': {'components': [('direction',),
                                            ('spherical_harmonics_m',)],
                             'properties': ('n',),
                             'samples': ('sample', 'structure', 'atom')}},
 'properties': ('n',),
 'samples': ('structure', 'center')}


Here we will change only the 'properties' names, from ``'n'`` to ``'radial_channel_n'``:

In [8]:
# Change properties names
new_names_tensorblock = {'components': [('spherical_harmonics_m',)],
                         'gradients': {'cell': {'components': [('direction_1',),
                                                               ('direction_2',),
                                                               ('spherical_harmonics_m',)],
                                                'properties': ('radial_channel_n',),
                                                'samples': ('sample',)},
                                       'positions': {'components': [('direction',),
                                                                    ('spherical_harmonics_m',)],
                                                     'properties': ('radial_channel_n',),
                                                     'samples': ('sample', 'structure', 'atom')}},
                         'properties': ('radial_channel_n',),
                         'samples': ('structure', 'center')}

# "Rename" (i.e. create a new) tensorblock with the desired names
new_tensorblock = rename_tensorblock(tensorblock, new_names_tensorblock)

pprint(get_tensorblock_name_structure(new_tensorblock))

{'components': [('spherical_harmonics_m',)],
 'gradients': {'cell': {'components': [('direction_1',),
                                       ('direction_2',),
                                       ('spherical_harmonics_m',)],
                        'properties': ('radial_channel_n',),
                        'samples': ('sample',)},
               'positions': {'components': [('direction',),
                                            ('spherical_harmonics_m',)],
                             'properties': ('radial_channel_n',),
                             'samples': ('sample', 'structure', 'atom')}},
 'properties': ('radial_channel_n',),
 'samples': ('structure', 'center')}
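
As noted at the start of this section, ``rename_tensormap()`` works by iterating over blocks and calling ``rename_tensorblock()``. The toy sketch below illustrates that delegation pattern only. ``ToyBlock``, ``ToyMap``, and the two functions are hypothetical stand-ins invented for illustration; they do not reproduce the equistore classes or the actual code in ``utils/rename_tensors.py``.

```python
from dataclasses import dataclass


@dataclass
class ToyBlock:
    """Stand-in for a block: two sets of names plus some data."""
    samples_names: tuple
    properties_names: tuple
    values: list


@dataclass
class ToyMap:
    """Stand-in for a map: keys names plus a list of blocks."""
    keys_names: tuple
    blocks: list


def rename_toy_block(block, names):
    # A *new* block is built; the data is copied rather than renamed in place
    return ToyBlock(
        samples_names=tuple(names["samples"]),
        properties_names=tuple(names["properties"]),
        values=list(block.values),
    )


def rename_toy_map(toy_map, names):
    # Everything except 'keys' describes the blocks, so pass it down
    block_names = {field: value for field, value in names.items() if field != "keys"}
    new_blocks = [rename_toy_block(block, block_names) for block in toy_map.blocks]
    return ToyMap(keys_names=tuple(names["keys"]), blocks=new_blocks)


toy_map = ToyMap(
    keys_names=("spherical_harmonics_l",),
    blocks=[
        ToyBlock(("structure", "center"), ("n",), [1.0, 2.0]),
        ToyBlock(("structure", "center"), ("n",), [3.0]),
    ],
)

renamed_map = rename_toy_map(
    toy_map,
    {"keys": ("l",), "samples": ("crystal", "center"), "properties": ("radial_channel_n",)},
)
```

The original ``toy_map`` is unchanged and every block's data exists twice afterwards, which is the memory cost mentioned in the introduction.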


# Summary

To re-emphasize, this is a clunky but functional way of renaming ``TensorMap`` and ``TensorBlock`` objects. The Rust-side and more-permanent implementation will edit the existing objects, not create new ones, and have specific methods to rename separate ``Labels``, i.e. for samples, components, properties, gradient components, etc.