# Using Rascal to Calculate SOAP Vectors of Small Molecules

This notebook is an introductory how-to on calculating the SOAP vectors of small molecules and training a model for their atomization energies on these vectors. For more information on the derivation, utility, and calculation of SOAP vectors, please refer to (among others):
- [On representing chemical environments (Bartók 2013)](https://journals.aps.org/prb/abstract/10.1103/PhysRevB.87.184115)
- [Comparing molecules and solids across structural and alchemical space (De 2016)](https://pubs.rsc.org/en/content/articlepdf/2016/cp/c6cp00415f)

Beyond libRascal, the packages used in this tutorial are: [time](https://docs.python.org/2/library/time.html), [json](https://docs.python.org/2/library/json.html), [tqdm](https://tqdm.github.io/), [numpy](https://numpy.org/), [matplotlib](https://matplotlib.org/), and [ase](https://wiki.fysik.dtu.dk/ase/index.html).

In [1]:
%matplotlib notebook
%reload_ext autoreload
%autoreload 2
from tutorial_utils import *
readme_button()

Button(description='Show README', style=ButtonStyle())

Output()

In [2]:
mySOAP=SOAP_representation()

ToggleButtons(description='Input File: ', options=('CSD-500.xyz', 'small_molecules-1000.xyz'), style=ToggleBut…

ToggleButtons(description='SOAP Presets: ', options=('Power Spectrum', 'Full Power Spectrum', 'Radial Spectrum…

Dropdown(description='Soap Type', options=('PowerSpectrum', 'RadialSpectrum'), style=DescriptionStyle(descript…

FloatSlider(value=3.5, description='Interaction Cutoff', max=5.0, min=2.0, step=0.15, style=SliderStyle(descri…

IntSlider(value=2, description='Max Radial', max=10, style=SliderStyle(description_width='initial'))

IntSlider(value=1, description='Max Angular', max=10, style=SliderStyle(description_width='initial'))

FloatSlider(value=0.5, description='Gaussian Sigma Constant', max=1.0, step=0.05, style=SliderStyle(descriptio…

Dropdown(description='Gaussian Sigma Type', options=('Constant',), style=DescriptionStyle(description_width='i…

FloatSlider(value=0.0, description='Cutoff Smooth Width', max=1.0, step=0.05, style=SliderStyle(description_wi…

IntSlider(value=500, description='Number of Frames', max=500, style=SliderStyle(description_width='initial'))
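
The widget defaults above can also be written down as a plain dictionary, which makes a run easier to reproduce. The sketch below is illustrative only: the key names are derived from the widget labels and are an assumption about what the underlying representation expects, `check_hypers` is a hypothetical helper, and lower bounds that the widget output does not show are taken to be 0.

```python
# Hyperparameters mirroring the widget defaults shown above.
# Key names are assumed from the widget labels, not taken from a confirmed API.
hypers = {
    "soap_type": "PowerSpectrum",
    "interaction_cutoff": 3.5,
    "max_radial": 2,
    "max_angular": 1,
    "gaussian_sigma_type": "Constant",
    "gaussian_sigma_constant": 0.5,
    "cutoff_smooth_width": 0.0,
}

# (min, max) ranges as displayed by the sliders; a minimum that is not
# displayed is assumed to be 0.
bounds = {
    "interaction_cutoff": (2.0, 5.0),
    "max_radial": (0, 10),
    "max_angular": (0, 10),
    "gaussian_sigma_constant": (0.0, 1.0),
    "cutoff_smooth_width": (0.0, 1.0),
}


def check_hypers(hypers, bounds):
    """Return the names of hyperparameters that fall outside their slider range."""
    return [name for name, (lo, hi) in bounds.items()
            if not lo <= hypers[name] <= hi]


print(check_hypers(hypers, bounds))  # an empty list means every value is in range
```

Keeping the values in one dictionary means the same settings can be logged next to the results of each run.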

In [3]:
mySOAP.train_model(verbose=True)

First, I am going to separate my dataset:
	Training Set: 400 pts (80%)
	Testing Set: 100 pts (20%)
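
The 80/20 split reported above can be reproduced with a shuffled index list. This is a minimal sketch, not the code inside `train_model`; `split_indices`, its `seed` argument, and the use of a random shuffle are assumptions made for illustration.

```python
import random


def split_indices(n_frames, train_fraction=0.8, seed=0):
    """Shuffle frame indices and split them into training and testing sets."""
    indices = list(range(n_frames))
    random.Random(seed).shuffle(indices)  # seeded so the split is repeatable
    n_train = int(round(train_fraction * n_frames))
    return indices[:n_train], indices[n_train:]


train_idx, test_idx = split_indices(500)
print(len(train_idx), len(test_idx))  # 400 100
```
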

Now we will compute the SOAP representation of our training frames.
In this run, computing the SOAP vectors took 0.00303307 seconds/frame

Next we find the kernel for our training model.
(This step may take a few minutes for larger training sets.)
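
Because atomization energies are one number per molecule, the kernel used for the regression needs one row and one column per frame. One common way to get there from per-atom SOAP vectors is to average them within each frame and take a normalized dot product. The sketch below assumes that approach; `structure_kernel`, `atom_to_frame`, and the exponent `zeta` are hypothetical names, and the tutorial's own kernel may be built differently.

```python
import numpy as np


def structure_kernel(features, atom_to_frame, n_frames, zeta=2):
    """Average per-atom features within each frame, then build a
    normalized dot-product kernel raised to the power zeta."""
    sums = np.zeros((n_frames, features.shape[1]))
    np.add.at(sums, atom_to_frame, features)  # accumulate atoms into their frame
    counts = np.bincount(atom_to_frame, minlength=n_frames)[:, None]
    means = sums / counts
    K = means @ means.T
    norms = np.sqrt(np.diag(K))
    return (K / np.outer(norms, norms)) ** zeta


# Toy data: 7 atomic environments with 4 features each, spread over 3 frames.
rng = np.random.default_rng(0)
features = rng.random((7, 4))
atom_to_frame = np.array([0, 0, 0, 1, 1, 2, 2])
K = structure_kernel(features, atom_to_frame, n_frames=3)
print(K.shape)  # (3, 3): one row and column per frame, not per atom
```
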

We will adjust the diagonals of our kernel so that it is properly scaled.

Now we can take this kernel to compute the weights of our KRR.


ValueError: solve1: Input operand 1 has a mismatch in its core dimension 0, with gufunc signature (m,m),(m)->(m) (size 400 is different from 51385)
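
The error above comes from the linear solve: the signature `(m,m),(m)->(m)` requires the kernel to be square with as many rows as there are target values, and here the second operand has 400 entries while the matrix has dimension 51385. One plausible reading, not confirmed by the output, is that the kernel was built over atomic environments while the energies are per frame. A minimal sketch of the weight computation with an explicit shape check follows; `krr_weights` and the `regularization` value are assumptions for illustration, not the tutorial's implementation.

```python
import numpy as np


def krr_weights(K, y, regularization=1e-8):
    """Solve (K + regularization * I) w = y for the KRR weights."""
    K = np.asarray(K, dtype=float)
    y = np.asarray(y, dtype=float)
    if K.ndim != 2 or K.shape[0] != K.shape[1]:
        raise ValueError(f"kernel must be square, got shape {K.shape}")
    if y.shape[0] != K.shape[0]:
        raise ValueError(
            f"kernel has {K.shape[0]} rows but there are {y.shape[0]} targets"
        )
    # Adding a small constant to the diagonal keeps the system well conditioned.
    K_reg = K + regularization * np.eye(K.shape[0])
    return np.linalg.solve(K_reg, y)


# Toy example: Gaussian kernel on 5 random points with 5 target values.
rng = np.random.default_rng(0)
points = rng.random((5, 3))
sq_dist = ((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)
K = np.exp(-0.5 * sq_dist)
y = rng.random(5)
weights = krr_weights(K, y)
print(weights.shape)  # (5,)
```

Checking the shapes before calling the solver turns the low-level message into one that names the two sizes directly.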

In [None]:
mySOAP.plot_prediction()
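
Alongside a plot of predicted against reference energies, it helps to report a couple of summary errors on the test set. The sketch below assumes the predictions and reference values are available as arrays; `prediction_errors` is a hypothetical helper and is not part of `tutorial_utils`.

```python
import numpy as np


def prediction_errors(y_true, y_pred):
    """Return the mean absolute error and root-mean-square error."""
    residual = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    return {
        "mae": float(np.mean(np.abs(residual))),
        "rmse": float(np.sqrt(np.mean(residual ** 2))),
    }


# Toy values, chosen only to exercise the function.
errors = prediction_errors([1.0, 2.0, 3.0], [1.0, 2.5, 2.0])
print(errors)
```
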

## Making an Interactive Map

In [None]:
# package to visualize the structures in the notebook
# https://github.com/arose/nglview#released-version
import nglview

In [None]:
# `frames` is assumed to be the list of ase.Atoms objects loaded earlier
iwdg = nglview.show_asetraj(frames)
# set up the visualization
iwdg.add_unitcell()
iwdg.add_spacefill()
iwdg.remove_ball_and_stick()
iwdg.camera = 'orthographic'
iwdg.parameters = {"clipDist": 0}
iwdg.center()
iwdg.update_spacefill(radiusType='covalent',
                      scale=0.6,
                      color_scheme='element')
# resize the viewer to 600 x 400 pixels
iwdg._remote_call('setSize', target='Widget',
                  args=['%dpx' % (600,), '%dpx' % (400,)])
# delay between frames during playback
iwdg.player.delay = 200.0

In [None]:
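# `X` is assumed to be an (n_frames, 2) array of map coordinates, one row per frame
link_ngl_wdgt_to_ax_pos(plt.gca(), X, iwdg)
plt.scatter(X[:, 0], X[:, 1], s=3)
# display the viewer as the cell output
iwdg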