In [54]:
%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


# Describing the Atomic Systems Held in Your AtomsCollection

[Documentation for `pyrelate`](https://msg-byu.github.io/pyrelate/)

In [55]:
import sys
sys.path.append("../")
from pyrelate.collection import AtomsCollection
import pyrelate.descriptors as descriptors

tut = AtomsCollection('tutorial', 'tutorial_store')
tut.read(['tutorial_data/ni.p453.out', 'tutorial_data/ni.p454.out', 'tutorial_data/ni.p455.out'], 28, prefix='tutorial')

100%|██████████| 3/3 [00:03<00:00,  1.24s/it]


In [56]:
tut.aids()

['tutorial_ni.p453.out', 'tutorial_ni.p454.out', 'tutorial_ni.p455.out']

## Trim

See documentation

In [None]:
? AtomsCollection.trim

## Describe function

With a collection created and the atoms read in, we can start calculating atomic descriptors for the ASE Atoms objects in the collection.

To calculate atomic descriptors, we use the collection's `describe()` function, which can compute built-in or custom descriptors.

`pyrelate` has one built-in atomic descriptor, the Smooth Overlap of Atomic Positions (SOAP), which uses the pycsoap implementation. You can also write your own descriptor functions for use with `pyrelate`, using the built-in `soap` function as a pattern.
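As a rough illustration of that pattern, a custom descriptor could look like the sketch below. It assumes, without confirmation from the `pyrelate` source, that a descriptor function receives the ASE Atoms object as its first argument, followed by the keyword arguments given to `describe()`. The function `centroid_distance`, its `scale` parameter, and the stand-in `FakeAtoms` class are hypothetical; check the built-in `soap` function for the actual signature.

```python
import math


class FakeAtoms:
    """Hypothetical stand-in for an ASE Atoms object (only .positions is mimicked)."""

    def __init__(self, positions):
        self.positions = positions


def centroid_distance(atoms, scale=1.0, **kwargs):
    """Hypothetical descriptor: one row per atom holding its scaled distance to the centroid."""
    n = len(atoms.positions)
    centroid = [sum(p[i] for p in atoms.positions) / n for i in range(3)]
    return [[scale * math.dist(p, centroid)] for p in atoms.positions]


atoms = FakeAtoms([(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)])
rows = centroid_distance(atoms, scale=1.0)
print(rows)  # one single-column row per atom
```

With real data, a function of this kind would presumably be handed to `describe()` through the `fcn` parameter, in the same way `soap` is in this tutorial.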

To use a built-in descriptor, specify the name of the descriptor and all the arguments it needs.

```python
# Placeholder keyword arguments: replace with the arguments your descriptor needs
kwargs = {
    "all": "of",
    "the": "args",
    "for": "descriptor",
}

AtomsCollection.describe(descriptor, aid=None, fcn=None, override=False, **kwargs)
```

Using the `aid` parameter, you can choose which atomic structures to describe. If you leave it as `None`, all atomic systems in the AtomsCollection will be described.

Using the `fcn` parameter, you specify which descriptor function to use.

Using the `override` parameter, you can choose to redo a description that was previously calculated and stored, overwriting the result in the Store.

### SOAP

In [57]:
from pyrelate.descriptors import soap
soapargs = {
    'rcut': 5.,
    'lmax': 9,
    'nmax': 9
}

In [58]:
tut.describe('soap', aid="tutorial_ni.p453.out", fcn=soap, **soapargs)

# Describe every structure in the AtomsCollection; the result computed above is not recomputed
tut.describe('soap', fcn=soap, **soapargs)

100%|██████████| 1/1 [02:52<00:00, 172.62s/it]
100%|██████████| 3/3 [17:59<00:00, 359.68s/it]


**Note: SOAP is slow even for a small collection; the two calls above took roughly 3 and 18 minutes.**
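Because SOAP is this slow, skipping results that are already stored matters in practice. The sketch below illustrates the idea with an in-memory dictionary. `TinyStore` and `slow_descriptor` are hypothetical stand-ins, not `pyrelate`'s Store or its `describe()` implementation, and keying results by the sorted keyword arguments is an assumption made for this example.

```python
class TinyStore:
    """Hypothetical in-memory stand-in for the Store; not pyrelate's implementation."""

    def __init__(self):
        self.results = {}
        self.computations = 0

    def describe(self, aids, descriptor, fcn, override=False, **kwargs):
        for aid in aids:
            # The key combines structure id, descriptor name, and sorted arguments.
            key = (aid, descriptor, tuple(sorted(kwargs.items())))
            if key in self.results and not override:
                continue  # already stored: skip the expensive call
            self.results[key] = fcn(aid, **kwargs)
            self.computations += 1


def slow_descriptor(aid, **kwargs):
    """Placeholder for an expensive calculation such as SOAP."""
    return len(aid) * kwargs.get("rcut", 1.0)


store = TinyStore()
args = {"rcut": 5.0, "lmax": 9, "nmax": 9}
store.describe(["a"], "soap", slow_descriptor, **args)                 # computes 1
store.describe(["a", "b", "c"], "soap", slow_descriptor, **args)       # computes 2 more
store.describe(["a"], "soap", slow_descriptor, override=True, **args)  # recomputes 1
print(store.computations)  # 4
```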

### Retrieving Description Results

The part of the `Store` that holds the results from your atomic description is organized like so:

[store name] --> "Descriptions" --> [aID] --> [descriptor name]

The function for retrieving your results from the Store is `get_description()`. To retrieve a result, you need only the parameters that were used to generate the description in the first place (**WARNING**: do not pass the `fcn` argument when retrieving your results).

So to retrieve your SOAP results you could do:

In [59]:
soap_res = tut.get_description("tutorial_ni.p453.out", "soap", **soapargs)
soap_res.shape

(49054, 450)
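The 450 columns are consistent with the usual size of a single-species SOAP power spectrum, which keeps one entry per unordered pair of radial indices and per angular channel. The sketch below assumes that counting convention (it is not taken from the pycsoap source), and `soap_vector_length` is a hypothetical helper.

```python
def soap_vector_length(nmax, lmax):
    """Assumed count: nmax*(nmax+1)/2 radial pairs times (lmax+1) angular channels."""
    return nmax * (nmax + 1) // 2 * (lmax + 1)


print(soap_vector_length(nmax=9, lmax=9))  # 450, matching the second dimension above
```

Under the same reading, the first dimension (49054) would be the number of atoms in the structure, with one row per atom.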

There is an optional `metadata` parameter that you could use to fetch the metadata of exactly what parameters were used in storing the description, along with anything else the description method stores alongside the description.

In [60]:
res, meta = tut.get_description("tutorial_ni.p453.out", "soap", metadata=True, **soapargs)
meta

{'desc_args': {'rcut': 5.0, 'lmax': 9, 'nmax': 9}}
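To make the layout and the parameter matching concrete, here is a minimal sketch using a nested dictionary that mirrors [store name] --> "Descriptions" --> [aID] --> [descriptor name]. The `lookup` function is hypothetical and only imitates the behaviour described above. In particular, returning `None` on a miss is an assumption of this sketch, not documented `pyrelate` behaviour.

```python
store = {
    "Descriptions": {
        "tutorial_ni.p453.out": {
            "soap": [
                {
                    "desc_args": {"rcut": 5.0, "lmax": 9, "nmax": 9},
                    "result": "P matrix placeholder",
                },
            ]
        }
    }
}


def lookup(store, aid, descriptor, metadata=False, **kwargs):
    """Hypothetical lookup: return the entry whose stored arguments equal kwargs."""
    for entry in store["Descriptions"].get(aid, {}).get(descriptor, []):
        if entry["desc_args"] == kwargs:
            meta = {"desc_args": entry["desc_args"]}
            return (entry["result"], meta) if metadata else entry["result"]
    return None  # assumption: a miss yields None in this sketch


hit = lookup(store, "tutorial_ni.p453.out", "soap", metadata=True, rcut=5.0, lmax=9, nmax=9)
miss = lookup(store, "tutorial_ni.p453.out", "soap", rcut=4.0, lmax=9, nmax=9)
print(hit[1], miss)
```

The second call changes `rcut`, so it does not match the stored entry. This is why retrieval needs the same parameters that were used when the description was generated.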

## Process function

In addition to atomic descriptors, there are what we can call "derived descriptors", which are ways to post-process a collection of atomic descriptions into an alternate representation. One example is taking the SOAP *P* matrices of many different atomic structures and processing them into a single "feature matrix" that can be used in machine learning. In the `pyrelate` vocabulary, we call these post-processing techniques "methods".

The `process()` function in pyrelate handles applying these processing methods to previously calculated atomic descriptor results stored in the AtomsCollection's Store object.

`pyrelate` comes with three built-in processing methods:
* Averaged SOAP Representation (ASR)
* summation
* Local Environment Representation (LER)

In [None]:
? AtomsCollection.process #run cell to see 'process' docstring

All of these processing methods can be used to process atomic descriptor results from a collection of ASE Atoms objects into a single representation. Therefore, these descriptors are *collection specific*, meaning that if your AtomsCollection has a different group of atomic systems, you will get a different representation. 


In [7]:
#run cell to see docstrings
? descriptors.asr 

? descriptors.ler

? descriptors.sum

In [61]:
asr_res = tut.process('asr', based_on=("soap",soapargs))
# Note: you see here that we do not include the fcn parameter indicating what function to use. This is because 
# for the methods/descriptors in 'descriptors.py', if your method name or descriptor name is the same as the
# function name in that file, it will automatically use that function from 'descriptors.py'.

asr_res.shape #should have 3 rows (one for each atomic system SOAP result) and a lot of columns

(3, 450)
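The (3, 450) shape is what you would expect if each structure's per-atom SOAP matrix is averaged over its atoms into a single row. The sketch below shows that averaging on toy data in plain Python. `averaged_rows` is a hypothetical helper, and the code is an illustration of the idea rather than `pyrelate`'s `asr` implementation.

```python
def averaged_rows(per_atom_matrix):
    """Average a per-atom matrix (list of equal-length rows) into a single row."""
    n_atoms = len(per_atom_matrix)
    n_cols = len(per_atom_matrix[0])
    return [sum(row[j] for row in per_atom_matrix) / n_atoms for j in range(n_cols)]


# Toy "SOAP" matrices: three structures with different atom counts, two columns each.
collection = {
    "s1": [[1.0, 2.0], [3.0, 4.0]],
    "s2": [[0.0, 0.0], [2.0, 2.0], [4.0, 4.0]],
    "s3": [[5.0, 1.0]],
}
feature_matrix = [averaged_rows(m) for m in collection.values()]
print(feature_matrix)  # [[2.0, 3.0], [2.0, 2.0], [5.0, 1.0]]
```

Each structure contributes exactly one row regardless of how many atoms it has, which is what makes the result usable as a feature matrix.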

In [62]:
ler_args = {
    "soap_fcn" : soap,
    "eps" : 0.1,
    "dissim_args" : {"gamma":4000},
}

In [68]:
ler_res = tut.process("ler", based_on=("soap", soapargs), **ler_args)
ler_res.shape

(3, 57)
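The 57 columns can be read as the number of distinct local environments found across the whole collection, with each row giving the fraction of one structure's atoms in each of them. The sketch below is a heavily simplified illustration of that idea. It uses a greedy pass with plain Euclidean distance and a threshold `eps`, whereas `pyrelate`'s `ler` uses a SOAP dissimilarity (see `dissim_args` above). The functions `unique_environments` and `ler_like` are hypothetical.

```python
import math


def unique_environments(all_rows, eps):
    """Greedy pass: a row starts a new class if it is farther than eps from every class so far."""
    uniques = []
    for row in all_rows:
        if all(math.dist(row, u) > eps for u in uniques):
            uniques.append(row)
    return uniques


def ler_like(collection, eps):
    """Fraction of each structure's atoms assigned to each unique environment class."""
    all_rows = [row for matrix in collection.values() for row in matrix]
    uniques = unique_environments(all_rows, eps)
    result = []
    for matrix in collection.values():
        counts = [0] * len(uniques)
        for row in matrix:
            nearest = min(range(len(uniques)), key=lambda k: math.dist(row, uniques[k]))
            counts[nearest] += 1
        result.append([c / len(matrix) for c in counts])
    return result


collection = {
    "s1": [[0.0, 0.0], [0.05, 0.0], [1.0, 1.0]],
    "s2": [[1.0, 1.0], [1.0, 1.02]],
    "s3": [[0.0, 0.0], [3.0, 3.0]],
}
ler = ler_like(collection, eps=0.1)
print(len(ler), len(ler[0]))  # 3 rows, one column per unique environment
```

Because the set of unique environments is built from every structure in the collection, adding or removing a structure can change the number of columns. This is one way to see why such representations are collection specific.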

### Retrieving Method Results

The part of the `Store` that holds the results from your processing methods is organized like so:

[store name] --> "Collections" --> [collection name] --> [method name]

The function for retrieving these results from the Store is `get_collection_result()`. To retrieve a result, you need the parameters that were used to generate the atomic description in the first place, *and* the parameters used for the method (**WARNING**: do not pass the `fcn` argument when retrieving your results).

So to retrieve your ASR or LER results you could do:

In [64]:
asr_res = tut.get_collection_result('asr', based_on=('soap', soapargs))

ler_res = tut.get_collection_result('ler', based_on=('soap', soapargs), **ler_args)
ler_res.shape

(3, 57)

## Clear

Sometimes you may want to remove certain results from your store, in which case you can use the `clear` or `clear_all` functions. 

`AtomsCollection.clear(self, descriptor=None, aid=None, collection_name=None, method=None, based_on=None, **kwargs)`


`clear` is a versatile function. With it you may:
- remove a specific description result
    - `tut.clear(descriptor="soap", aid="tutorial_ni.p453.out", **soapargs) #clears single SOAP result with given parameters`
- remove all results for a specific descriptor
    - `tut.clear(descriptor="soap") #clears all soap results from store`
- remove a specific collection result
    - `tut.clear(collection_name="tutorial", method="ler", based_on=("soap", soapargs), **ler_args) #clears the ler result generated with given parameters`
- remove collection results created with specific method for given collection
    - `tut.clear(collection_name="tutorial", method="ler") #clears all collection results processed with ler from the collection named tutorial`
    - `tut.clear(method="ler") #same as above, if the collection you are calling function from has the same name as the collection that generated the results`

**Note:** you may omit the `collection_name` parameter if the results you want were generated by the AtomsCollection you are calling `clear` from. In that case, the collection name is taken from that AtomsCollection object.


You may also clear all the results in the store:
- `tut.clear_all()`

In [69]:
import shutil

def _delete_store(store):
    # Tutorial cleanup: permanently removes the store directory and everything in it
    shutil.rmtree(store.root)

_delete_store(tut.store)