# Describing Collections

[Documentation for gblearn](https://msg-byu.github.io/gblearn/)

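```python
import sys
sys.path.append("../")  # make the local gblearn checkout importable
from gblearn.collection import AtomsCollection

tut = AtomsCollection('tutorial', 'tutorial_store')
# 28 is the atomic number (Z) of the nickel atoms in the LAMMPS dump files
tut.read('tutorial_data/', 28, f_format='lammps-dump-text', prefix='tutorial')
```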

With a collection created and the atoms read in, we can start calculating descriptors for this collection.

To calculate descriptors, we use the collection's `describe()` method, which computes either built-in or custom descriptors.

gblearn comes with three built-in descriptors:
* Smooth Overlap of Atomic Positions (SOAP)
* Averaged SOAP Representation (ASR)
* Local Environment Representation (LER)

**Note that both ASR and LER depend on SOAP, which must be calculated first**

To use one of these built-in descriptors, specify the descriptor's name along with all of the arguments that descriptor needs:

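```python
# Placeholder keyword arguments; replace them with the parameters
# the chosen descriptor actually needs
kwargs = {
    "all": "of",
    "the": "args",
    "for": "descriptor",
}

AtomsCollection.describe(descriptor="name_of_descriptor", fcn=None, needs_store=False, **kwargs)
```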
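The keyword arguments matter beyond the computation itself: the same values must be supplied again when the results are retrieved. The sketch below is a hypothetical, in-memory stand-in (`ToyCollection` is not part of gblearn, and it skips the `needs_store` mechanism) that only illustrates why: the parameters form part of the key under which each result is stored.

```python
# Hypothetical, in-memory stand-in for a descriptor store; NOT gblearn's code.
# It only illustrates why get() needs the same keyword arguments that were
# passed to describe(): the arguments are part of the lookup key.


class ToyCollection:
    def __init__(self, structures):
        # structures: mapping of aid -> object the descriptor function accepts
        self.structures = structures
        self._results = {}

    @staticmethod
    def _key(descriptor, aid, params):
        # Sort the parameters so that argument order does not matter
        return (descriptor, aid, tuple(sorted(params.items())))

    def describe(self, descriptor, fcn, **kwargs):
        for aid, atoms in self.structures.items():
            self._results[self._key(descriptor, aid, kwargs)] = fcn(atoms, **kwargs)

    def get(self, descriptor, aid=None, **kwargs):
        if aid is not None:
            return self._results[self._key(descriptor, aid, kwargs)]
        # No aid given: return a dictionary with a result for every structure
        return {a: self._results[self._key(descriptor, a, kwargs)]
                for a in self.structures}


toy = ToyCollection({"a": [1, 2, 3], "b": [4, 5]})
toy.describe("scaled_len", fcn=lambda atoms, scale: scale * len(atoms), scale=2.0)
print(toy.get("scaled_len", "a", scale=2.0))  # 6.0
print(toy.get("scaled_len", scale=2.0))       # {'a': 6.0, 'b': 4.0}
# toy.get("scaled_len", "a", scale=3.0) raises KeyError: no result for scale=3.0
```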

## SOAP

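```python
soapargs = {
    'rcut': 5.,  # cutoff radius of each atom's local environment
    'lmax': 9,   # maximum angular momentum (spherical harmonic degree)
    'nmax': 9    # number of radial basis functions
}
tut.describe('soap', **soapargs)
```

```
100%|██████████| 7/7 [54:04<00:00, 463.53s/it]
```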


**Note: SOAP can take a long time even for a small collection; the seven structures above took about 54 minutes.**

To retrieve your results from the store, use the `get()` method:

`AtomsCollection.get(descriptor="name_of_descriptor", aid=None, **kwargs)` 

You can get the result for a single atoms ID (`aid`); for SOAP this is a matrix with one SOAP vector per atom:

```python
tut.get('soap', 'tutorial_ni.p453.out', **soapargs)
```

```
array([[ 3.9545154e-07,  3.0695119e-07,  5.2894811e-06, ...,
         6.7849243e-08, -6.2533509e-08,  6.1898120e-08],
       [ 4.0618261e-07,  3.2279854e-07,  5.4191169e-06, ...,
         1.4569958e-06, -1.2224010e-06,  1.0451884e-06],
       [ 4.9571639e-07,  4.8575771e-07,  6.4149663e-06, ...,
         1.6219198e-06, -1.3667036e-06,  1.1908161e-06],
       ...,
       [ 4.0978921e-07,  3.2543298e-07,  5.4599109e-06, ...,
         4.9699520e-07, -3.7059939e-07,  2.8500213e-07],
       [ 3.9885427e-07,  3.0491498e-07,  5.3432996e-06, ...,
         3.1472084e-08, -2.7238446e-08,  2.4076277e-08],
       [ 4.8816372e-07,  4.8480155e-07,  6.3109742e-06, ...,
         7.4221140e-07, -7.5615566e-07,  7.9348428e-07]], dtype=float32)
```

Or, by leaving out the `aid` argument, you get a dictionary keyed by aid with the results for every atoms object in the collection (for those specific parameters):

```python
tut.get('soap', **soapargs)
```

```
{'tutorial_ni.p453.out': array([[ 3.9545154e-07,  3.0695119e-07,  5.2894811e-06, ...,
          6.7849243e-08, -6.2533509e-08,  6.1898120e-08],
        [ 4.0618261e-07,  3.2279854e-07,  5.4191169e-06, ...,
          1.4569958e-06, -1.2224010e-06,  1.0451884e-06],
        [ 4.9571639e-07,  4.8575771e-07,  6.4149663e-06, ...,
          1.6219198e-06, -1.3667036e-06,  1.1908161e-06],
        ...,
        [ 4.0978921e-07,  3.2543298e-07,  5.4599109e-06, ...,
          4.9699520e-07, -3.7059939e-07,  2.8500213e-07],
        [ 3.9885427e-07,  3.0491498e-07,  5.3432996e-06, ...,
          3.1472084e-08, -2.7238446e-08,  2.4076277e-08],
        [ 4.8816372e-07,  4.8480155e-07,  6.3109742e-06, ...,
          7.4221140e-07, -7.5615566e-07,  7.9348428e-07]], dtype=float32),
 'tutorial_ni.p454.out': array([[ 2.5282091e-07,  4.0842362e-08,  3.6461845e-06, ...,
          3.3861991e-06, -2.4801186e-06,  1.8458626e-06],
        [ 4.1881850e-07,  3.3282748e-07,  5.5675955e-06, ...,
          5.0248364e
```

## ASR

Because ASR and LER both depend on SOAP, we must set `needs_store=True` and also pass in the SOAP arguments, along with any other parameters that ASR or LER needs.

```python
tut.describe('asr', needs_store=True, **soapargs)
```

```
100%|██████████| 7/7 [00:02<00:00,  2.97it/s]
```
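As its name suggests, ASR condenses the per-atom SOAP matrix of a structure into a single vector by averaging. The sketch below shows that operation in plain Python, assuming a simple per-component mean over atoms; gblearn's own implementation may differ in details such as normalization, so treat it as an illustration rather than a replacement for `describe('asr', ...)`.

```python
# Simplified illustration, ASSUMING ASR is the per-component mean of the
# per-atom SOAP vectors; gblearn's implementation may differ in details.


def average_soap(soap_matrix):
    """Average a list of per-atom SOAP vectors into a single vector."""
    n_atoms = len(soap_matrix)
    if n_atoms == 0:
        raise ValueError("need at least one atom")
    n_features = len(soap_matrix[0])
    return [sum(row[j] for row in soap_matrix) / n_atoms
            for j in range(n_features)]


# Toy 3-atom, 2-feature "SOAP matrix" (not real SOAP output)
toy_soap = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(average_soap(toy_soap))  # [3.0, 4.0]
```

With the NumPy array returned by `tut.get('soap', aid, **soapargs)`, the same mean is simply `soap.mean(axis=0)`.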


As with SOAP, you can retrieve the results with `get()`, which returns a single descriptor vector when you pass an atoms ID:

```python
aid = tut.aids()[0]
print(tut.get('asr', aid, **soapargs))
```

```
[ 4.06990978e-07  3.20442922e-07  5.43192073e-06 -1.06669168e-05
  4.38659372e-05 -1.00162506e-04  2.10025188e-04 -3.18376784e-04
  5.57616760e-04  2.65993947e-07  4.24794098e-06 -8.18151511e-06
  3.40951956e-05 -7.74203436e-05  1.62708835e-04 -2.46608513e-04
  4.31749329e-04  7.25846112e-05 -1.42850433e-04  5.86487062e-04
 -1.34108320e-03  2.80982489e-03 -4.25869599e-03  7.45516224e-03
  2.83002999e-04 -1.15680508e-03  2.64837500e-03 -5.54809161e-03
  8.41810275e-03 -1.47246178e-02  4.74163750e-03 -1.08439885e-02
  2.27284934e-02 -3.44583318e-02  6.03067912e-02  2.48308890e-02
 -5.19794151e-02  7.88250864e-02 -1.38003394e-01  1.08887225e-01
 -1.65111274e-01  2.89045036e-01  2.50606388e-01 -4.38149214e-01
  7.66907573e-01  9.34396397e-12  1.26992306e-11  1.86424418e-10
  3.93179413e-11  1.21046240e-09 -6.90125068e-10  3.04601144e-09
 -4.87766094e-09  1.50424395e-09  2.88894741e-10  3.91186916e-10
  2.87811774e-09  5.50696655e-10  1.14250565e-08 -9.49494527e-09
  1.81424422e-08 -2.49739
```

If no atoms ID is passed, `get()` instead returns a dictionary with the results for every atoms object in the collection:

```python
res = tut.get('asr', **soapargs)
print("Dictionary of length ", len(res))
print(res)
```

```
Dictionary of length  7
{'tutorial_ni.p453.out': array([ 4.06990978e-07,  3.20442922e-07,  5.43192073e-06, -1.06669168e-05,
        4.38659372e-05, -1.00162506e-04,  2.10025188e-04, -3.18376784e-04,
        5.57616760e-04,  2.65993947e-07,  4.24794098e-06, -8.18151511e-06,
        3.40951956e-05, -7.74203436e-05,  1.62708835e-04, -2.46608513e-04,
        4.31749329e-04,  7.25846112e-05, -1.42850433e-04,  5.86487062e-04,
       -1.34108320e-03,  2.80982489e-03, -4.25869599e-03,  7.45516224e-03,
        2.83002999e-04, -1.15680508e-03,  2.64837500e-03, -5.54809161e-03,
        8.41810275e-03, -1.47246178e-02,  4.74163750e-03, -1.08439885e-02,
        2.27284934e-02, -3.44583318e-02,  6.03067912e-02,  2.48308890e-02,
       -5.19794151e-02,  7.88250864e-02, -1.38003394e-01,  1.08887225e-01,
       -1.65111274e-01,  2.89045036e-01,  2.50606388e-01, -4.38149214e-01,
        7.66907573e-01,  9.34396397e-12,  1.26992306e-11,  1.86424418e-10,
        3.93179413e-11,  1.21046240e-09, -6.9012506
```
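Because one ASR vector is returned per structure, the dictionary is a natural starting point for a machine-learning feature matrix. The hypothetical helper `to_feature_matrix` below (not part of gblearn) fixes a row order by sorting the aids, so rows can later be lined up with per-structure labels; the input here is a toy stand-in for `tut.get('asr', **soapargs)`.

```python
# Hypothetical helper: turn an {aid: vector} dictionary, like the one
# returned by get() without an aid, into an aligned feature matrix.


def to_feature_matrix(results):
    """Return (aids, rows) with rows in the same order as aids."""
    aids = sorted(results)  # deterministic ordering of structures
    rows = [list(results[aid]) for aid in aids]
    return aids, rows


# Toy stand-in for tut.get('asr', **soapargs)
toy_results = {"b.out": [0.3, 0.4], "a.out": [0.1, 0.2]}
aids, X = to_feature_matrix(toy_results)
print(aids)  # ['a.out', 'b.out']
print(X)     # [[0.1, 0.2], [0.3, 0.4]]
```

Keeping `aids` next to `X` is the design choice that matters: dictionary order is not a contract, and an explicit list of IDs makes the mapping from rows to structures unambiguous.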

## LER

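In addition to the SOAP arguments, LER takes an `eps` parameter and the collection itself (`collection`).

```python
lerargs = {
    'eps': 0.05,
    'collection': tut,
}
tut.describe('ler', needs_store=True, **soapargs, **lerargs)

aid = tut.aids()[0]
tut.get('ler', aid, **soapargs, **lerargs)
# Omit the aid to get a dictionary of LER vectors for every atoms object:
# tut.get('ler', **soapargs, **lerargs)
```

```
100%|██████████| 7/7 [00:26<00:00,  3.79s/it]
```

```
array([0.99378613, 0.00283164, 0.00176977, 0.00161246])
```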
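The four LER values above sum to 1 (0.99378613 + 0.00283164 + 0.00176977 + 0.00161246 = 1.00000000), which is consistent with LER being the fraction of atoms found in each distinct local environment. The sketch below illustrates that idea in a deliberately simplified form: the hypothetical `toy_ler` greedily groups per-atom vectors whose Euclidean distance to a group's first member is below `eps`, then reports the fraction of atoms in each group. It is not gblearn's algorithm, which takes the whole `collection` as an argument and may use a different similarity measure.

```python
import math


def toy_ler(vectors, eps):
    """Greedily group vectors lying within eps of a group's first member and
    return the fraction of vectors in each group (simplified sketch)."""
    representatives = []  # first vector seen for each group
    counts = []
    for v in vectors:
        for i, rep in enumerate(representatives):
            if math.dist(v, rep) < eps:
                counts[i] += 1
                break
        else:
            # No existing group is close enough: start a new one
            representatives.append(v)
            counts.append(1)
    total = len(vectors)
    return [c / total for c in counts]


toy_vectors = [[0.0, 0.0], [0.01, 0.0], [1.0, 1.0], [0.0, 0.02]]
print(toy_ler(toy_vectors, eps=0.05))  # [0.75, 0.25]
```

Unlike this per-structure toy, a collection-wide classification would give every structure a vector of the same length, with each entry referring to the same environment type.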

## Custom Descriptor

Collections can also compute custom descriptors. Define a function that computes the descriptor, then pass it to `describe()` as the `fcn` parameter, along with a descriptive name under which the results will be stored.

The first parameter of a custom descriptor function must be the ASE `Atoms` object being described; the remaining parameters are supplied as keyword arguments to `describe()`.

```python
def custom_descriptor1(atoms, arg1, arg2, arg3):
    # The aid is attached to the ASE Atoms object in an array labelled "aid"
    aid = atoms.get_array("aid")[0]
    # Toy descriptor: the length of the aid string followed by the arguments
    return [len(aid), arg1, arg2, arg3]

custom_descriptor1args = {
    'arg1': 1.,
    'arg2': 2.,
    'arg3': 3.
}

tut.describe('custom_descriptor1', fcn=custom_descriptor1, **custom_descriptor1args)

aid = tut.aids()[0]
tut.get('custom_descriptor1', aid, **custom_descriptor1args)
```

```
100%|██████████| 7/7 [00:00<00:00, 53.78it/s]
```

```
[20, 1.0, 2.0, 3.0]
```
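Any function with this call shape can serve as a descriptor. As a further sketch, the hypothetical `mean_position` below returns the mean Cartesian position of the atoms using ASE's `Atoms.get_positions()`. A minimal `FakeAtoms` stand-in (also hypothetical) lets it run without ASE; with gblearn you would instead call `tut.describe('mean_position', fcn=mean_position, scale=1.0)`.

```python
# A custom descriptor with the same call shape as custom_descriptor1:
# the atoms object first, then keyword arguments.


def mean_position(atoms, scale):
    """Return the mean Cartesian position of the atoms, multiplied by scale."""
    positions = atoms.get_positions()  # (n_atoms, 3) for an ASE Atoms object
    n = len(positions)
    return [scale * sum(p[k] for p in positions) / n for k in range(3)]


# Minimal stand-in so the sketch runs without ASE (not part of gblearn or ASE)
class FakeAtoms:
    def __init__(self, positions):
        self._positions = positions

    def get_positions(self):
        return self._positions


print(mean_position(FakeAtoms([[0.0, 0.0, 0.0], [2.0, 4.0, 6.0]]), scale=1.0))
# [1.0, 2.0, 3.0]
```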

If the custom descriptor depends on another descriptor (as LER depends on SOAP), its second parameter must be a results store, which the function can use to look up the stored values of that other descriptor. Remember to pass `needs_store=True` to `describe()` in this case.

```python
def custom_descriptor2(atoms, store, arg1, arg2, arg3, arg4):
    # The aid is attached to the ASE Atoms object in an array labelled "aid",
    # so this is how to recover it from the object
    aid = atoms.get_array("aid")[0]
    # Look up the stored custom_descriptor1 result computed with the same arguments
    cd1 = store.get("custom_descriptor1", aid, arg1=arg1, arg2=arg2, arg3=arg3)
    return [x * arg4 for x in cd1]

custom_descriptor2args = {
    'arg4': 10.
}

tut.describe('custom_descriptor2', fcn=custom_descriptor2, needs_store=True, **custom_descriptor1args, **custom_descriptor2args)

aid = tut.aids()[0]
tut.get('custom_descriptor2', aid, **custom_descriptor1args, **custom_descriptor2args)
```

```
100%|██████████| 7/7 [00:00<00:00, 108.79it/s]
```

```
[200.0, 10.0, 20.0, 30.0]
```
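The same pattern works with the built-in descriptors. The hypothetical `asr_norm` below reads the stored ASR vector through `store.get(...)`, exactly as `custom_descriptor2` reads `custom_descriptor1`, and returns its Euclidean norm. `FakeAtoms` and `FakeStore` are stand-ins so the sketch runs on its own; with gblearn you would call `tut.describe('asr_norm', fcn=asr_norm, needs_store=True, **soapargs)`.

```python
import math


def asr_norm(atoms, store, rcut, lmax, nmax):
    """Euclidean norm of the stored ASR vector for this structure."""
    aid = atoms.get_array("aid")[0]
    # The SOAP parameters must match those used when ASR was computed,
    # otherwise the store lookup will not find the result
    asr = store.get("asr", aid, rcut=rcut, lmax=lmax, nmax=nmax)
    return [math.sqrt(sum(x * x for x in asr))]


# Stand-ins so the sketch runs on its own (not part of gblearn)
class FakeAtoms:
    def get_array(self, name):
        return ["toy.out"]


class FakeStore:
    def get(self, descriptor, aid, **params):
        # Pretend every structure's ASR vector is [3.0, 4.0]
        return [3.0, 4.0]


print(asr_norm(FakeAtoms(), FakeStore(), rcut=5.0, lmax=9, nmax=9))  # [5.0]
```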

## How Files are Stored

The computed descriptions are stored in a directory structure with tiers like this:

    main_store => descriptors => aids => descriptions.pkl

Each filename combines the descriptor name, the aid, and the parameters (in alphabetical order, as in the example below) into a unique filename of the form
`descriptor__aid___arg1_1___arg2_2___arg3_3.pkl`

For example,

`asr__tutorial_ni.p453.out___lmax_9___nmax_9___rcut_5.0.pkl`
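The naming scheme can be reproduced with a short helper. The hypothetical `store_filename` below was reconstructed from the example filename above, not taken from gblearn's source: each parameter becomes `_name_value`, parameters are sorted by name, and all pieces are joined with `__`, which produces the `___` that appears before each parameter.

```python
def store_filename(descriptor, aid, **params):
    """Rebuild a result filename from the descriptor, aid and parameters.

    Reconstructed from the example filename in this tutorial; not taken
    from gblearn's source.
    """
    # Sorted "_name_value" pieces joined by "__" give "___" before each parameter
    pieces = [descriptor, aid] + [f"_{k}_{v}" for k, v in sorted(params.items())]
    return "__".join(pieces) + ".pkl"


print(store_filename("asr", "tutorial_ni.p453.out", rcut=5., lmax=9, nmax=9))
# asr__tutorial_ni.p453.out___lmax_9___nmax_9___rcut_5.0.pkl
```

Note that the value is formatted with `str()`, so `rcut=5.` and `rcut=5` would map to different filenames (`rcut_5.0` versus `rcut_5`).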

If `get()` cannot find your results, check the filenames in your store: you may be missing a required parameter, since every parameter passed to `describe()` must also be passed to `get()`.
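To see which parameter combinations are actually stored for a descriptor, you can scan the store directory. The hypothetical helper `stored_parameter_sets` below walks a store with `os.walk` and, assuming filenames follow the layout of the `asr__tutorial_ni.p453.out___lmax_9___nmax_9___rcut_5.0.pkl` example, returns the parameter portion of each matching filename. Which directory to pass (for example `'tutorial_store'`, the second argument given to `AtomsCollection` earlier) is also an assumption.

```python
import os


def stored_parameter_sets(store_root, descriptor):
    """List the distinct parameter strings stored for a descriptor.

    Assumes filenames like descriptor__aid___name_value___name_value.pkl;
    the parameter values come back as strings.
    """
    found = set()
    for _dirpath, _dirnames, filenames in os.walk(store_root):
        for fname in filenames:
            if not (fname.startswith(descriptor + "__") and fname.endswith(".pkl")):
                continue
            stem = fname[:-len(".pkl")]
            # Everything after the first "___" is the parameter portion
            _, sep, param_part = stem.partition("___")
            if sep:
                found.add(param_part)
    return sorted(found)


# Example (assumed store location): stored_parameter_sets('tutorial_store', 'asr')
```

Comparing the returned strings with the keyword arguments you pass to `get()` usually reveals a missing or mistyped parameter (for example `rcut_5` versus `rcut_5.0`).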