# Describing Collections

[Documentation for `pyrelate`](https://msg-byu.github.io/pyrelate/)

```python
import sys
sys.path.append("../")  # make the local pyrelate checkout importable
from pyrelate.collection import AtomsCollection
import pyrelate.descriptors as descriptors

tut = AtomsCollection('tutorial', 'tutorial_store')
tut.read('tutorial_data/', 28, f_format='lammps-dump-text', prefix='tutorial')
```

With a collection created and the atoms read in, we can start calculating descriptors for the collection.

To calculate descriptors, use the collection's `describe()` method, which computes either built-in or custom descriptors.

`pyrelate` comes with three built-in descriptors:
* Smooth Overlap of Atomic Positions (SOAP)
* Averaged SOAP Representation (ASR)
* Local Environment Representation (LER)

**Note that both ASR and LER depend on SOAP, which must be calculated first**

To use a built-in descriptor, specify its name and pass all of the arguments it needs as keyword arguments:

```python
kwargs = {
    "all": "of",
    "the": "args",
    "for": "descriptor",
}

AtomsCollection.describe(descriptor="name_of_descriptor", fcn=None, needs_store=False, **kwargs)
```

## SOAP

```python
soapargs = {
    'rcut': 5.,
    'lmax': 9,
    'nmax': 9
}
tut.describe('soap', **soapargs)
```



**Note: SOAP can take a long time even for a small collection; describing the seven Atoms objects in this tutorial took roughly 40 minutes.**

### Get

To retrieve your results from the store, you may use the `get` function.

`AtomsCollection.get(descriptor="name_of_descriptor", idd=None, **kwargs)` 

You can get the result for a single Atoms object by passing its atoms ID (aid):

```python
tut.get('soap', 'tutorial_ni.p453.out', **soapargs)
```

```
array([[ 3.9545154e-07,  3.0695119e-07,  5.2894811e-06, ...,
         6.7849243e-08, -6.2533509e-08,  6.1898120e-08],
       [ 4.0618261e-07,  3.2279854e-07,  5.4191169e-06, ...,
         1.4569958e-06, -1.2224010e-06,  1.0451884e-06],
       [ 4.9571639e-07,  4.8575771e-07,  6.4149663e-06, ...,
         1.6219198e-06, -1.3667036e-06,  1.1908161e-06],
       ...,
       [ 4.0978921e-07,  3.2543298e-07,  5.4599109e-06, ...,
         4.9699520e-07, -3.7059939e-07,  2.8500213e-07],
       [ 3.9885427e-07,  3.0491498e-07,  5.3432996e-06, ...,
         3.1472084e-08, -2.7238446e-08,  2.4076277e-08],
       [ 4.8816372e-07,  4.8480155e-07,  6.3109742e-06, ...,
         7.4221140e-07, -7.5615566e-07,  7.9348428e-07]], dtype=float32)
```
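
Each row of this array is, presumably, the SOAP vector of one atom. As a sanity check on the row length, the sketch below assumes the common single-species SOAP power-spectrum layout: nmax(nmax+1)/2 symmetric pairs of radial basis functions for each of the lmax+1 angular channels. Under that assumption, `nmax = lmax = 9` gives 450 features per atom, which matches the `(450,)` ASR shape printed in the ASR section. `soap_feature_count` is a hypothetical helper, not part of pyrelate.

```python
def soap_feature_count(nmax: int, lmax: int) -> int:
    """Length of a single-species SOAP power spectrum (assumed layout)."""
    # Symmetric pairs of radial basis functions (n <= n'), one block per l = 0..lmax
    radial_pairs = nmax * (nmax + 1) // 2
    return radial_pairs * (lmax + 1)


print(soap_feature_count(nmax=9, lmax=9))  # 450
```

Multi-species systems add cross-species terms, so the count grows beyond this formula there.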

Or, by leaving out the aid, get the results for all Atoms objects in the collection that were computed with those specific parameters:

```python
tut.get('soap', **soapargs)
```
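
Results are looked up by descriptor name, aid, and the exact parameters used, so `get` needs the same keyword arguments that were passed to `describe`. The toy, in-memory model below is a hypothetical illustration of that lookup behavior, not pyrelate's implementation; `ToyCollection` and its internals are invented for this sketch.

```python
from typing import Any, Callable, Dict, Optional, Tuple

Key = Tuple[str, str, Tuple[Tuple[str, Any], ...]]


class ToyCollection:
    """Hypothetical stand-in: a collection with an in-memory, keyed results store."""

    def __init__(self, atoms_by_aid: Dict[str, Any]):
        self.atoms_by_aid = atoms_by_aid
        self.results: Dict[Key, Any] = {}

    @staticmethod
    def _key(descriptor: str, aid: str, params: Dict[str, Any]) -> Key:
        # Sort parameters so the key does not depend on keyword order
        return (descriptor, aid, tuple(sorted(params.items())))

    def describe(self, descriptor: str, fcn: Callable[..., Any], **params: Any) -> None:
        # Compute and store one result per Atoms object
        for aid, atoms in self.atoms_by_aid.items():
            self.results[self._key(descriptor, aid, params)] = fcn(atoms, **params)

    def get(self, descriptor: str, aid: Optional[str] = None, **params: Any) -> Any:
        if aid is not None:
            # Raises KeyError unless this exact descriptor/aid/parameter set was computed
            return self.results[self._key(descriptor, aid, params)]
        # No aid: a dict with the result for every Atoms object
        return {a: self.results[self._key(descriptor, a, params)] for a in self.atoms_by_aid}


toy = ToyCollection({"a1": [1.0, 2.0], "a2": [4.0]})
toy.describe("total", lambda atoms, scale: scale * sum(atoms), scale=2.0)
print(toy.get("total", "a1", scale=2.0))  # 6.0
print(toy.get("total", scale=2.0))        # {'a1': 6.0, 'a2': 8.0}
try:
    toy.get("total", "a1")  # 'scale' omitted, so the key does not match
except KeyError:
    print("not found: parameters must match those given to describe")
```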

## ASR

Both ASR and LER depend on results previously computed by another descriptor (SOAP). Each has a parameter named `res_needed` that names this dependency and defaults to `'soap'`; set `res_needed` to another descriptor name (for example, a different SOAP implementation) to change it. In addition to the arguments specific to ASR or LER, you must pass the parameters that were used to compute the SOAP results they will draw on.
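
The sketch below illustrates that dependency pattern, with a plain dictionary standing in for the store. It assumes, based on the name Averaged SOAP Representation, that ASR is the mean of the per-atom SOAP vectors; `asr_sketch`, `key`, and the dictionary store are hypothetical, not pyrelate's API. The lookup fails if the SOAP results for those exact parameters are missing, which is why SOAP must be computed first.

```python
import numpy as np


def key(descriptor, aid, **params):
    # Hypothetical store key: descriptor, aid, and parameters sorted by name
    return (descriptor, aid, tuple(sorted(params.items())))


def asr_sketch(aid, store, res_needed="soap", **soap_params):
    # Look up the previously computed per-atom SOAP matrix (n_atoms x n_features)
    soap = store[key(res_needed, aid, **soap_params)]
    # Average over atoms to get one vector per Atoms object (assumed definition of ASR)
    return soap.mean(axis=0)


soapargs = {"rcut": 5.0, "lmax": 9, "nmax": 9}
rng = np.random.default_rng(0)
# Fake SOAP results for one hypothetical Atoms object with 20 atoms
store = {key("soap", "example_aid", **soapargs): rng.random((20, 450))}

asr = asr_sketch("example_aid", store, **soapargs)
print(asr.shape)  # (450,)
```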

```python
# Run this cell in IPython/Jupyter to see the docstring
?descriptors.asr
```

```python
tut.describe('asr', **soapargs)
```



As before, retrieve your results with `get()`. Passing an atoms ID returns the descriptor for that Atoms object:

```python
aid = tut.aids()[0]
res = tut.get('asr', aid, **soapargs)
print(res.shape)
```

```
(450,)
```


If no atoms ID is passed, `get` instead returns a dictionary with the results for every Atoms object in the collection:

```python
res = tut.get('asr', **soapargs)
# print("Dictionary of length", len(res))
# print(res)
```

## LER

```python
# Run this cell in IPython/Jupyter to see the docstring
?descriptors.ler
```

```python
lerargs = {
    'eps': 0.05,
    'collection': tut,
}
tut.describe('ler', **soapargs, **lerargs)

aid = tut.aids()[0]
tut.get('ler', aid, **soapargs, **lerargs)
# tut.get('ler', **soapargs, **lerargs)
```



```
array([0.99378613, 0.00283164, 0.00176977, 0.00161246])
```
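
The four LER entries above sum to 1 to the printed precision. That is consistent with LER being a vector of fractions, for example the fraction of atoms assigned to each distinct local environment found across the collection with tolerance `eps`; this interpretation is an assumption here, so check the `descriptors.ler` docstring. The quick check below uses only the printed values:

```python
import numpy as np

# LER values copied from the output above
ler = np.array([0.99378613, 0.00283164, 0.00176977, 0.00161246])

print(ler.sum())                   # approximately 1.0
print(np.isclose(ler.sum(), 1.0))  # True
```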

### Clear

Sometimes you may want to remove certain results from your store, in which case you can use the `clear` method:

`AtomsCollection.clear(descriptor=None, idd=None, **kwargs)`

Depending on the arguments, `clear` can:
- remove a result for a specific Atoms object
    - `tut.clear("ler", aid, **soapargs, **lerargs)`
- remove the results computed with specific parameters for all Atoms objects in the collection
    - `tut.clear("ler", **soapargs, **lerargs)`
- remove all results for a certain type of descriptor
    - `tut.clear("ler")`
- remove all results in the store
    - `tut.clear()`
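
These four modes amount to filtering the stored results by descriptor, aid, and parameters. The sketch below models that with a plain dictionary; the key layout, `make_key`, and `toy_clear` are hypothetical illustrations, not pyrelate's implementation, and this toy requires an exact match on the full parameter set.

```python
from typing import Any, Dict, Optional, Tuple

Key = Tuple[str, str, Tuple[Tuple[str, Any], ...]]


def make_key(descriptor: str, aid: str, **params: Any) -> Key:
    return (descriptor, aid, tuple(sorted(params.items())))


def toy_clear(store: Dict[Key, Any], descriptor: Optional[str] = None,
              aid: Optional[str] = None, **params: Any) -> None:
    """Delete every entry matching the given filters (None or omitted means 'any')."""
    wanted = tuple(sorted(params.items()))
    for key in list(store):  # copy the keys so entries can be deleted while iterating
        desc_k, aid_k, params_k = key
        if descriptor is not None and desc_k != descriptor:
            continue
        if aid is not None and aid_k != aid:
            continue
        if params and params_k != wanted:
            continue
        del store[key]


store = {
    make_key("ler", "a1", eps=0.05): [1.0],
    make_key("ler", "a2", eps=0.05): [1.0],
    make_key("ler", "a1", eps=0.10): [1.0],
    make_key("asr", "a1", rcut=5.0): [0.5],
}
toy_clear(store, "ler", "a1", eps=0.05)  # one result for one Atoms object
toy_clear(store, "ler", eps=0.05)        # those parameters, all Atoms objects
print(len(store))  # 2
toy_clear(store, "ler")                  # every LER result
toy_clear(store)                         # everything
print(len(store))  # 0
```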

## Custom Descriptor

Collections can also use custom descriptors. Define a function that computes the descriptor, then pass it to `describe` as the `fcn` parameter, along with a descriptive name under which the results will be stored.

The first parameter of a custom descriptor function must be the ASE `Atoms` object being described.

```python
def custom_descriptor1(atoms, arg1, arg2, arg3):
    # The aid is stored in the per-atom "aid" array; take the first entry
    aid = atoms.get_array("aid")[0]
    # Toy output: the length of the aid string followed by the arguments
    return [len(aid), arg1, arg2, arg3]

custom_descriptor1args = {
    'arg1': 1.,
    'arg2': 2.,
    'arg3': 3.
}

tut.describe('custom_descriptor1', fcn=custom_descriptor1, **custom_descriptor1args)

aid = tut.aids()[0]
tut.get('custom_descriptor1', aid, **custom_descriptor1args)
```



```
[20, 1.0, 2.0, 3.0]
```
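
Because a custom descriptor is an ordinary function, you can check it before handing it to `describe`. The sketch below calls `custom_descriptor1` on a hypothetical `StandInAtoms` class that implements only `get_array`, assuming, as the function does, that every atom carries the same aid string in its "aid" array; with a real collection you would pass one of its ASE `Atoms` objects instead. The 20 in the output above is simply the length of the aid string (for example, `tutorial_ni.p453.out` has 20 characters).

```python
import numpy as np


class StandInAtoms:
    """Hypothetical minimal stand-in for an ASE Atoms object with an 'aid' array."""

    def __init__(self, aid: str, n_atoms: int):
        self._arrays = {"aid": np.array([aid] * n_atoms)}

    def get_array(self, name):
        return self._arrays[name]


def custom_descriptor1(atoms, arg1, arg2, arg3):
    aid = atoms.get_array("aid")[0]
    return [len(aid), arg1, arg2, arg3]


atoms = StandInAtoms("tutorial_ni.p453.out", n_atoms=4)
print(custom_descriptor1(atoms, arg1=1., arg2=2., arg3=3.))  # [20, 1.0, 2.0, 3.0]
```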

If the custom descriptor depends on another descriptor (as LER depends on SOAP), its second parameter must be the results store, which it can use to look up those earlier results; otherwise no store is passed in automatically. You may also include an optional `res_needed` parameter naming the previously computed results to draw from.

```python
# Parameters with defaults must follow those without, so res_needed goes last
def custom_descriptor2(atoms, store, arg1, arg2, arg3, arg4, res_needed="custom_descriptor1"):
    # The aid is attached to the ASE Atoms object in an array labelled "aid"
    aid = atoms.get_array("aid")[0]
    # Look up custom_descriptor1's result computed with the same arguments
    cd1 = store.get(res_needed, aid, arg1=arg1, arg2=arg2, arg3=arg3)
    return [x * arg4 for x in cd1]

custom_descriptor2args = {
    'arg4': 10.
}

tut.describe('custom_descriptor2', fcn=custom_descriptor2, **custom_descriptor1args, **custom_descriptor2args)

aid = tut.aids()[0]
tut.get('custom_descriptor2', aid, **custom_descriptor1args, **custom_descriptor2args)
```



```
[200.0, 10.0, 20.0, 30.0]
```
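
A dependent descriptor can also be exercised outside `describe` by supplying stand-ins. The sketch below assumes only what `custom_descriptor2` uses: an atoms object with `get_array("aid")` and a store with a `get(descriptor, aid, **params)` method. `StandInAtoms` and `StandInStore` are hypothetical helpers, the stored value mirrors the `custom_descriptor1` output shown earlier, and `res_needed` is placed after the required parameters so the definition is valid Python.

```python
import numpy as np


class StandInAtoms:
    """Hypothetical stand-in for an ASE Atoms object carrying an 'aid' array."""

    def __init__(self, aid, n_atoms):
        self._arrays = {"aid": np.array([aid] * n_atoms)}

    def get_array(self, name):
        return self._arrays[name]


class StandInStore:
    """Hypothetical results store keyed by descriptor, aid, and sorted parameters."""

    def __init__(self):
        self._results = {}

    def put(self, descriptor, aid, result, **params):
        self._results[(descriptor, aid, tuple(sorted(params.items())))] = result

    def get(self, descriptor, aid, **params):
        return self._results[(descriptor, aid, tuple(sorted(params.items())))]


def custom_descriptor2(atoms, store, arg1, arg2, arg3, arg4, res_needed="custom_descriptor1"):
    aid = atoms.get_array("aid")[0]
    cd1 = store.get(res_needed, aid, arg1=arg1, arg2=arg2, arg3=arg3)
    return [x * arg4 for x in cd1]


aid = "tutorial_ni.p453.out"
store = StandInStore()
# Pretend custom_descriptor1 already ran with these arguments
store.put("custom_descriptor1", aid, [20, 1.0, 2.0, 3.0], arg1=1., arg2=2., arg3=3.)

atoms = StandInAtoms(aid, n_atoms=4)
print(custom_descriptor2(atoms, store, arg1=1., arg2=2., arg3=3., arg4=10.))
# [200.0, 10.0, 20.0, 30.0]
```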

## How Files are Stored

The descriptions are stored in a directory structure with tiers like this:

    main_store => descriptors => aids => descriptions.pkl

Each filename joins the descriptor, the aid, and the parameters into a unique name of the form
`descriptor__aid___arg1_1___arg2_2___arg3_3.pkl` (in the example below, the parameters appear in alphabetical order)

For example,

`asr__tutorial_ni.p453.out___lmax_9___nmax_9___rcut_5.0.pkl`
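
The sketch below reconstructs that example filename and its location in the store. It assumes the layout seen in the example: a double underscore after the descriptor, a triple underscore before each `name_value` parameter, parameters sorted by name, and a `main_store/descriptor/aid/` directory path. `result_filename` and `result_path` are hypothetical helpers for locating files by hand, not part of pyrelate.

```python
from pathlib import Path


def result_filename(descriptor: str, aid: str, **params) -> str:
    # Parameters are sorted by name, each rendered as ___name_value
    parts = "".join(f"___{name}_{value}" for name, value in sorted(params.items()))
    return f"{descriptor}__{aid}{parts}.pkl"


def result_path(store: str, descriptor: str, aid: str, **params) -> Path:
    # main_store / descriptor / aid / filename
    return Path(store) / descriptor / aid / result_filename(descriptor, aid, **params)


soapargs = {"rcut": 5., "lmax": 9, "nmax": 9}
print(result_filename("asr", "tutorial_ni.p453.out", **soapargs))
# asr__tutorial_ni.p453.out___lmax_9___nmax_9___rcut_5.0.pkl
print(result_path("tutorial_store", "asr", "tutorial_ni.p453.out", **soapargs))
```

Comparing the names in the matching store directory against this pattern is one way to spot which parameter a `get` call is missing.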

If `get` cannot find your results, check the filenames in your store; you may be missing a required parameter.