# User guide on the StructureData class

The atomistic `StructureData` class is an enhanced version of `orm.StructureData`, which is implemented in `aiida-core`.
The relevant changes are:
- the introduction of the `properties` attribute, used to store all the properties associated with the crystal structure;
- the removal of the kind-based definition of the atoms, which is *no longer supported*, in favour of a code-agnostic, site-based definition of the properties;
- the StructureData node is now a pure *data container*: there are no methods to modify it after its creation, i.e. it is *immutable* even before the node is stored in the AiiDA database.

The reasons for these decisions are explained in the following sections.


<div style="border:2px solid #f7d117; padding: 10px; margin: 10px 0;">
    <strong>Site-based definition of properties:</strong> this simplifies the definition of multiple properties and respects the philosophy of code-agnostic structure data. The kinds can be determined using the built-in `get_kinds()` method of the StructureData. It is also possible to provide a user-defined set of kinds via *tags*.
</div>

## Properties
Properties are divided into three main domains: *global*, *intra-site*, and *inter-site*, e.g.:

global:
  - cell
  - periodic boundary conditions (PBC)

intra-site:
  - positions
  - symbols 
  - masses
  - electronic charge
  - magnetization (to be added)
  - Hubbard U parameters (to be added)

inter-site:
  - Hubbard V parameters (to be added)

Some of these properties are related to the sites/atoms (e.g. atomic positions, symbols, electronic charge) and some are related to the whole structure (e.g. PBC, cell). Each property therefore has a `domain` attribute, which can be "intra-site", "inter-site", or "global".
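
As a rough illustration of the three domains, the plain-Python sketch below groups property names by domain. The `DOMAINS` mapping and the `group_by_domain` helper are hypothetical names introduced for this example only; they are not part of the `aiida-atomistic` API, where the domain is read from the `domain` attribute of each property.

```python
# Hypothetical mapping, written by hand from the lists above.
DOMAINS = {
    "cell": "global",
    "pbc": "global",
    "positions": "intra-site",
    "symbols": "intra-site",
    "mass": "intra-site",
    "charge": "intra-site",
}


def group_by_domain(property_names):
    """Return a dictionary {domain: [property names]} for the given names."""
    grouped = {}
    for name in property_names:
        grouped.setdefault(DOMAINS[name], []).append(name)
    return grouped


grouped = group_by_domain(["cell", "pbc", "positions", "symbols"])
print(grouped)
```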

## Custom properties
The possibility of having user-defined custom properties is discussed in another section (to be added).

## The first StructureData instance
One of the principles of the new StructureData is that it is "just" a container for the information about a given structure. This means that instances of this class are immutable: after initialization, it is not possible to change the stored properties.

Properties should be contained in a dictionary:

In [1]:
from aiida import orm, load_profile
load_profile()

from aiida_atomistic.data.structure import StructureData

In [2]:
properties_dict = {
    "cell":{"value":[[3.5, 0.0, 0.0], [0.0, 3.5, 0.0], [0.0, 0.0, 3.5]]},
    "pbc":{"value":[True,True,True]},
    "positions":{"value":[[0.0, 0.0, 0.0],[1.5, 1.5, 1.5]]},
    "symbols":{"value":["Li","Li"]},
    }

where the value of each property is stored under the key `value` of the corresponding dictionary.
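
If the raw values are already available as plain Python objects, the nesting under `value` can be produced with a small helper. The function `wrap_values` below is a hypothetical convenience written for this guide, not something provided by `aiida-atomistic`.

```python
def wrap_values(**raw_values):
    """Wrap each raw value in a {"value": ...} dictionary."""
    return {name: {"value": value} for name, value in raw_values.items()}


properties_dict = wrap_values(
    cell=[[3.5, 0.0, 0.0], [0.0, 3.5, 0.0], [0.0, 0.0, 3.5]],
    pbc=[True, True, True],
    positions=[[0.0, 0.0, 0.0], [1.5, 1.5, 1.5]],
    symbols=["Li", "Li"],
)
print(properties_dict["symbols"])
```

The resulting dictionary has the same layout as the hand-written one above.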


In [3]:
print(f"The whole list of currently supported properties is: \n{StructureData().properties.get_supported_properties()}")

The whole list of currently supported properties is: 
['kinds', 'symbols', 'pbc', 'custom', 'positions', 'mass', 'charge', 'cell']


To initialise a StructureData node, it is then sufficient to do:

In [4]:
structure = StructureData(properties = properties_dict)
structure

<StructureData: uuid: 46793b2c-77a2-4e07-b896-98def304e99e (unstored)>

We can inspect the properties by accessing the corresponding attribute (tab completion is enabled):

In [5]:
print(f"The cell property class: \n{structure.properties.cell}\n")
print(f"The cell property value: \n{structure.properties.cell.value}\n")
print(f"The cell property domain: \n{structure.properties.cell.domain}\n")

The cell property class: 
parent=<StructureData: uuid: 46793b2c-77a2-4e07-b896-98def304e99e (unstored)> domain='global' value=[[3.5, 0.0, 0.0], [0.0, 3.5, 0.0], [0.0, 0.0, 3.5]]

The cell property value: 
[[3.5, 0.0, 0.0], [0.0, 3.5, 0.0], [0.0, 0.0, 3.5]]

The cell property domain: 
global



In [6]:
print(f"The positions property class: \n{structure.properties.positions}\n")
print(f"The positions property value: \n{structure.properties.positions.value}\n")
print(f"The positions property domain: \n{structure.properties.positions.domain}\n")

The positions property class: 
parent=<StructureData: uuid: 46793b2c-77a2-4e07-b896-98def304e99e (unstored)> domain='intra-site' value=[[0.0, 0.0, 0.0], [1.5, 1.5, 1.5]]

The positions property value: 
[[0.0, 0.0, 0.0], [1.5, 1.5, 1.5]]

The positions property domain: 
intra-site



In [7]:
print(f"Stored properties are: \n{structure.properties.get_stored_properties()}")

Stored properties are: 
['kinds', 'symbols', 'pbc', 'positions', 'mass', 'cell']


## StructureData as a data container - immutability

We already anticipated that the StructureData is just a data container, i.e. it is immutable. This is a safety measure needed to
avoid the unpredictable behaviour of step-by-step data manipulation, which may moreover introduce inconsistencies among the various properties.
This way, a single consistency check over the whole set of defined properties, performed at initialization, is sufficient.
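
To give an idea of what such an initial check might look like, here is a minimal sketch operating on a plain properties dictionary. The function name `check_consistency` and the specific rules (3x3 cell, three PBC flags, one entry per site for each site-based property) are assumptions made for illustration; the checks actually performed by `StructureData` may differ.

```python
def check_consistency(properties):
    """Raise ValueError if the properties dictionary is not self-consistent."""
    cell = properties["cell"]["value"]
    if len(cell) != 3 or any(len(vector) != 3 for vector in cell):
        raise ValueError("cell must be a 3x3 matrix")
    if len(properties["pbc"]["value"]) != 3:
        raise ValueError("pbc must contain three flags")

    # Every site-based property must have one entry per site.
    n_sites = len(properties["positions"]["value"])
    for name in ("symbols", "mass", "charge"):
        if name in properties and len(properties[name]["value"]) != n_sites:
            raise ValueError(f"{name} must have {n_sites} entries")


good = {
    "cell": {"value": [[3.5, 0.0, 0.0], [0.0, 3.5, 0.0], [0.0, 0.0, 3.5]]},
    "pbc": {"value": [True, True, True]},
    "positions": {"value": [[0.0, 0.0, 0.0], [1.5, 1.5, 1.5]]},
    "symbols": {"value": ["Li", "Li"]},
}
check_consistency(good)  # passes silently

# One symbol for two positions: the check should complain.
bad = dict(good, symbols={"value": ["Li"]})
message = None
try:
    check_consistency(bad)
except ValueError as exc:
    message = str(exc)
print(message)
```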

The StructureData is a *read-only* type of Data.

In [8]:
structure.properties.cell.value = [[1,2,3],[1,2,3],[1,2,3]]

ValidationError: 1 validation error for Cell
value
  Instance is frozen [type=frozen_instance, input_value=[[1, 2, 3], [1, 2, 3], [1, 2, 3]], input_type=list]
    For further information visit https://errors.pydantic.dev/2.6/v/frozen_instance

In [9]:
structure.properties.cell = [[1,2,3],[1,2,3],[1,2,3]]

AttributeError: property of 'PropertyCollector' object has no setter
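
The two errors above come from two different layers: the property object itself is frozen, and the collector exposes it through an attribute without a setter. The same pattern can be reproduced with the standard library alone. The classes `Cell` and `PropertyCollector` below are simplified stand-ins written for this guide, not the real implementation (which, judging from the error message, relies on pydantic).

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class Cell:
    """Stand-in for a frozen property object."""

    value: tuple
    domain: str = "global"


class PropertyCollector:
    """Stand-in exposing the property through a read-only attribute."""

    def __init__(self, cell):
        self._cell = cell

    @property
    def cell(self):
        return self._cell


collector = PropertyCollector(
    Cell(value=((3.5, 0.0, 0.0), (0.0, 3.5, 0.0), (0.0, 0.0, 3.5)))
)

errors = []
try:
    collector.cell.value = ((1, 2, 3),) * 3  # the dataclass is frozen
except FrozenInstanceError as exc:
    errors.append(type(exc).__name__)
try:
    collector.cell = Cell(value=((1, 2, 3),) * 3)  # the property has no setter
except AttributeError as exc:
    errors.append(type(exc).__name__)
print(errors)
```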

### The  `to_dict()` method

A crucial aspect of the new `StructureData` is that it is immutable even if the node is not stored, i.e. the API does not support on-the-fly or interactive modifications (attempting them raises errors). This helps to avoid unexpected
behaviour coming from a step-by-step definition of the structure, e.g. inconsistencies between property definitions that are not cross-checked again.

To change a structure, one has to define a new `StructureData` instance from scratch.
To make this easier, we provide a `to_dict` method, which can be used to generate the properties dictionary:

In [10]:
structure.to_dict()

{'cell': {'value': [[3.5, 0.0, 0.0], [0.0, 3.5, 0.0], [0.0, 0.0, 3.5]]},
 'pbc': {'value': [True, True, True]},
 'positions': {'value': [[0.0, 0.0, 0.0], [1.5, 1.5, 1.5]]},
 'symbols': {'value': ['Li', 'Li']},
 'mass': {'value': [6.941, 6.941]},
 'kinds': {'value': ['Li0', 'Li0']}}
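
When a new properties dictionary is derived from an existing one, nested lists such as the atomic positions are shared between the two unless they are copied explicitly. The sketch below uses a plain dictionary in the same format and `copy.deepcopy` from the standard library. Whether `to_dict()` already returns an independent copy is not stated in this guide, so the explicit copy is shown only as a defensive habit.

```python
import copy

original = {
    "positions": {"value": [[0.0, 0.0, 0.0], [1.5, 1.5, 1.5]]},
    "symbols": {"value": ["Li", "Li"]},
}

# A deep copy duplicates the nested lists, so edits stay local to the copy.
modified = copy.deepcopy(original)
modified["positions"]["value"][1] = [1.75, 1.75, 1.75]

print(original["positions"]["value"][1])
print(modified["positions"]["value"][1])
```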

Below is an example where we change the dimensionality of the structure: we elongate the cell along Z and update the pbc property accordingly.

In [11]:
new_properties_dict = structure.to_dict()
new_properties_dict["pbc"] = {"value":[True,True,False]}
new_properties_dict["cell"]["value"][2] = [0,0,15]

new_structure = StructureData(properties=new_properties_dict)

print(f"The cell property value: \n{new_structure.properties.cell.value}\n")

The cell property value: 
[[3.5, 0.0, 0.0], [0.0, 3.5, 0.0], [0.0, 0.0, 15.0]]

## The `get_kinds()` method

It is possible to get a list of kinds using the `get_kinds` method.
This generates the predicted kinds for each property (the "intra-site" ones)
and then combines them into the global list of distinct kinds.
The default threshold used for each property can be found under its `default_kind_threshold` attribute.

This method should be used in plugins which require a kind-based definition of properties, e.g. aiida-quantumespresso.


In [12]:
unit_cell = [[3.5, 0.0, 0.0], [0.0, 3.5, 0.0], [0.0, 0.0, 3.5]]
atomic_positions = [[0.0, 0.0, 0.0],[1.5, 1.5, 1.5]]
symbols = ["Li"]*2
mass = [6.941,6.941]
charge = [1,0]

properties = {
    "cell":{"value":unit_cell},
    "pbc":{"value":[True,True,True]},
    "positions":{"value":atomic_positions,},
    "symbols":{"value":symbols},
    "mass":{"value":mass,},
    "charge":{"value":charge}
    }

structure = StructureData(
        properties=properties
        )
kinds = structure.get_kinds()

print(kinds)

{'kinds': {'value': ['Li0', 'Li1']}, 'mass': {'value': [6.941, 6.941]}, 'charge': {'value': [1.0, 0.0]}}


Currently, the kind labels are not guaranteed to be numbered from zero, i.e. we may have a label "Li1" even if there is only one kind (but more than one symbol).
This should be fixed soon, and it does not affect the usefulness of the method.

#### Specification of default threshold for the kinds

It is possible to specify a custom threshold for a given property, if needed.
See the following example:

In [13]:
structure.properties.charge.default_kind_threshold

0.1

In [14]:
kinds = structure.get_kinds(custom_thr={"charge":2})

print(kinds)

{'kinds': {'value': ['Li0', 'Li0']}, 'mass': {'value': [6.941, 6.941]}, 'charge': {'value': [0.0, 0.0]}}
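
The internal algorithm of `get_kinds` is not described in this guide. One simple way to picture the role of a threshold is to bin each site value into intervals whose width equals the threshold, and to assign the same kind to sites that share a symbol and fall into the same bins. The function `sketch_kinds` below is a hypothetical illustration of that idea on plain lists, not the package implementation; in particular, the binning rule and the thresholds passed in the example are assumptions.

```python
import math


def sketch_kinds(symbols, site_properties, thresholds, exclude=()):
    """Assign a kind label to each site by binning its property values."""
    labels = {}    # (symbol, bins) -> kind label
    counters = {}  # symbol -> number of kinds created so far
    kinds = []
    for index, symbol in enumerate(symbols):
        # Bin every non-excluded property with its own threshold.
        bins = tuple(
            math.floor(values[index] / thresholds[name])
            for name, values in sorted(site_properties.items())
            if name not in exclude
        )
        key = (symbol, bins)
        if key not in labels:
            count = counters.get(symbol, 0)
            labels[key] = f"{symbol}{count}"
            counters[symbol] = count + 1
        kinds.append(labels[key])
    return kinds


symbols = ["Li", "Li"]
site_properties = {"charge": [1.0, 0.0], "mass": [6.941, 6.941]}

kinds_fine = sketch_kinds(symbols, site_properties, {"charge": 0.1, "mass": 0.001})
kinds_coarse = sketch_kinds(symbols, site_properties, {"charge": 2, "mass": 0.001})
print(kinds_fine)
print(kinds_coarse)
```

In this sketch, a small charge threshold keeps the two sites in different kinds, while a threshold larger than their charge difference merges them.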


#### Specification of `kind_tags`

We can assign a tag to each atom in order to override the result of the `get_kinds` method. If we define a tag for
each atom of the structure, the method returns the property values unchanged,
together with the desired tags.

In [15]:
kinds = structure.get_kinds(kind_tags=["Li1","Li2"])

print(kinds)

{'kinds': {'value': ['Li1', 'Li2']}, 'mass': {'value': [6.941, 6.941]}, 'charge': {'value': [1.0, 0.0]}}


It is also possible to exclude a property when determining the kinds (for example, because the plugin ignores it):

In [16]:
kinds = structure.get_kinds(exclude=["charge"])

# get_kinds returns a dictionary; the excluded property is absent from its keys
print(list(kinds))

['kinds', 'mass']


It is possible to combine the `to_dict` and `get_kinds` methods to obtain a ready-to-use dictionary that also contains the automatically generated kinds:

In [17]:
new_properties = structure.to_dict(generate_kinds=True, kinds_exclude=['mass'], kinds_thresholds={"charge":1.5})
new_properties

{'cell': {'value': [[3.5, 0.0, 0.0], [0.0, 3.5, 0.0], [0.0, 0.0, 3.5]]},
 'pbc': {'value': [True, True, True]},
 'positions': {'value': [[0.0, 0.0, 0.0], [1.5, 1.5, 1.5]]},
 'symbols': {'value': ['Li', 'Li']},
 'mass': {'value': [6.941, 6.941]},
 'charge': {'value': [0.0, 0.0]},
 'kinds': {'value': ['Li0', 'Li0']}}

In [18]:
structure_with_kinds = StructureData(properties=new_properties)

In [19]:
structure_with_kinds.properties.kinds

Kinds(parent=<StructureData: uuid: 8197a2fa-07d0-4bf0-8839-53def3899fb6 (unstored)>, domain='intra-site', value=['Li0', 'Li0'])

## Store and load again

In [20]:
structure_with_kinds.store()

<StructureData: uuid: 8197a2fa-07d0-4bf0-8839-53def3899fb6 (pk: 1)>

In [21]:
loaded_structure_kinds = orm.load_node(structure_with_kinds.pk)

In [22]:
loaded_structure_kinds.properties.kinds

Kinds(parent=<StructureData: uuid: 8197a2fa-07d0-4bf0-8839-53def3899fb6 (pk: 1)>, domain='intra-site', value=['Li0', 'Li0'])