# Own dataset example
This example shows how to generate the three B2R2 representations (computed by `get_b2r2_n`, `get_b2r2_l` and `get_b2r2_a`) for your own dataset.

You need to provide the nuclear charges and coordinates of every reactant and product molecule. Additional molecules, such as reagents, can be included by listing them together with the reactants.
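A minimal sketch of the nested input layout, assuming plain NumPy arrays: the outer list runs over reactions, the inner list over the molecules of that reaction, and each molecule contributes one array of nuclear charges and one `(n_atoms, 3)` coordinate array. The reaction (ethylene + H2 to ethane), the water reagent and the helper `placeholder` are hypothetical, and the all-zero coordinates only illustrate array shapes; they are not meaningful geometries.

```python
import numpy as np

def placeholder(charges):
    """Hypothetical helper: return (charges, zero coordinates) of matching shape."""
    charges = np.array(charges)
    return charges, np.zeros((len(charges), 3))

ethylene_z, ethylene_r = placeholder([6, 6, 1, 1, 1, 1])
hydrogen_z, hydrogen_r = placeholder([1, 1])
ethane_z, ethane_r = placeholder([6, 6, 1, 1, 1, 1, 1, 1])

# Outer list: one entry per reaction. Inner list: one array per molecule.
ncharges_reactants = [[ethylene_z, hydrogen_z]]
coords_reactants = [[ethylene_r, hydrogen_r]]
ncharges_products = [[ethane_z]]
coords_products = [[ethane_r]]

# A reagent (hypothetical water molecule) is appended to the reactant
# sublist of the reaction it belongs to.
water_z, water_r = placeholder([8, 1, 1])
ncharges_reactants[0].append(water_z)
coords_reactants[0].append(water_r)
```

In a real dataset the coordinates would be the actual molecular geometries, since the representations are built from interatomic distances.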

In [1]:
from src import b2r2

## Load dataset
Here I use the qml package to load nuclear charges and coordinates from xyz files, but qml is not required: any loader works as long as it provides the nuclear charges and coordinates.
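If qml is not available, a small reader is enough for simple files. The sketch below is a hypothetical helper `read_xyz` (not part of qml or b2r2), assuming standard single-frame XYZ files: an atom count on the first line, a comment line, then one `symbol x y z` line per atom. The symbol table only covers the elements listed in it.

```python
import io
import numpy as np

# Small symbol -> nuclear charge table; extend it for other elements.
SYMBOL_TO_Z = {"H": 1, "C": 6, "N": 7, "O": 8, "F": 9}

def read_xyz(handle):
    """Parse a single-frame XYZ file into (nuclear_charges, coordinates)."""
    lines = handle.read().splitlines()
    n_atoms = int(lines[0].split()[0])
    charges = np.empty(n_atoms, dtype=int)
    coords = np.empty((n_atoms, 3), dtype=float)
    # Line 0 holds the atom count, line 1 a free-form comment.
    for i, line in enumerate(lines[2:2 + n_atoms]):
        fields = line.split()
        charges[i] = SYMBOL_TO_Z[fields[0]]
        coords[i] = [float(v) for v in fields[1:4]]
    return charges, coords

example = """3
water
O 0.000 0.000 0.119
H 0.000 0.763 -0.477
H 0.000 -0.763 -0.477
"""
z, r = read_xyz(io.StringIO(example))
```

For a file on disk, pass an open handle, e.g. `with open(path) as f: z, r = read_xyz(f)`.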

In [2]:
import qml

In [3]:
from glob import glob

In [4]:
reactants = sorted(glob("data/GDB7-20-TS/xyz/reactant_*.xyz"))

In [5]:
products = sorted(glob("data/GDB7-20-TS/xyz/product_*.xyz"))

In [6]:
len(reactants)

11961

In [7]:
len(products)

11961
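Reactions are paired by position, so the i-th reactant file must match the i-th product file. Sorting is lexicographic (`reactant_10` sorts before `reactant_2`), which is harmless as long as both lists use the same IDs. A minimal check, assuming file names of the form `reactant_<id>.xyz` and `product_<id>.xyz` (the helper `check_pairing` is hypothetical):

```python
import os

def check_pairing(reactant_files, product_files):
    """Return True if the i-th reactant and product files share the same ID.

    Assumes names 'reactant_<id>.xyz' / 'product_<id>.xyz'; the '<id>.xyz'
    part after the prefix is compared as a plain string.
    """
    if len(reactant_files) != len(product_files):
        return False
    for r, p in zip(reactant_files, product_files):
        r_id = os.path.basename(r)[len("reactant_"):]
        p_id = os.path.basename(p)[len("product_"):]
        if r_id != p_id:
            return False
    return True
```

In the notebook this would be called as `check_pairing(reactants, products)` right after globbing.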

In [8]:
# Use only the first 100 reactions for this demo

In [9]:
reactants = reactants[:100]

In [10]:
products = products[:100]

In [11]:
mols_reactants = [qml.Compound(x) for x in reactants]

Each reaction here has a single reactant and a single product, but the framework is designed to handle several of each. Therefore, wrap the molecules of each reaction in a sublist, even when it contains only one molecule.

In [12]:
ncharges_reactants = [[x.nuclear_charges] for x in mols_reactants]

In [13]:
coords_reactants = [[x.coordinates] for x in mols_reactants]

In [14]:
mols_products = [qml.Compound(x) for x in products]

In [15]:
ncharges_products = [[x.nuclear_charges] for x in mols_products]

In [16]:
coords_products = [[x.coordinates] for x in mols_products]
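Before computing representations, it can help to sanity-check the nested lists. The sketch below is a hypothetical helper `validate_reactions` (not part of b2r2) that checks the outer lengths, that every coordinate array has shape `(n_atoms, 3)`, and that atoms are conserved between reactants and products. The conservation check is an assumption that fits plain reactant/product pairs like this dataset; it will flag reactions where reagents were added to the reactant side only.

```python
import numpy as np

def validate_reactions(ncharges_reactants, ncharges_products,
                       coords_reactants, coords_products):
    """Return a list of (reaction_index, message) problems found in the inputs."""
    n = len(ncharges_reactants)
    if not (len(ncharges_products) == len(coords_reactants)
            == len(coords_products) == n):
        return [(None, "outer lists have different lengths")]
    problems = []
    for i in range(n):
        sides = [(ncharges_reactants[i], coords_reactants[i]),
                 (ncharges_products[i], coords_products[i])]
        for charges, coords in sides:
            if len(charges) != len(coords):
                problems.append((i, "different number of molecules in charges and coordinates"))
                continue
            for z, r in zip(charges, coords):
                if np.shape(r) != (len(z), 3):
                    problems.append((i, "coordinate array shape does not match atom count"))
        # Atom conservation: compare the sorted charges of both sides.
        zr = np.sort(np.concatenate(ncharges_reactants[i]))
        zp = np.sort(np.concatenate(ncharges_products[i]))
        if not np.array_equal(zr, zp):
            problems.append((i, "atoms are not conserved"))
    return problems
```

An empty list means no problems were found, e.g. `validate_reactions(ncharges_reactants, ncharges_products, coords_reactants, coords_products) == []`.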

In [17]:
import numpy as np

In [18]:
unique_ncharges = np.unique(np.concatenate([x[0] for x in
                                            ncharges_reactants]))

In [19]:
unique_ncharges

array([1, 6, 7, 8])
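The cell above takes `x[0]`, i.e. only the first reactant of each reaction. That is enough here because every reaction has exactly one reactant and the products contain the same atoms. With several molecules per reaction, or reagents whose elements appear nowhere else, a safer sketch flattens every molecule on both sides (the helper `unique_elements` is hypothetical):

```python
import numpy as np

def unique_elements(*ncharge_lists):
    """Sorted unique nuclear charges over all molecules of all reactions."""
    all_charges = [z for reactions in ncharge_lists
                   for molecules in reactions
                   for z in molecules]
    return np.unique(np.concatenate(all_charges))
```

Usage: `unique_ncharges = unique_elements(ncharges_reactants, ncharges_products)`.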

## Get representations

In [20]:
b2r2_n = b2r2.get_b2r2_n(ncharges_reactants, ncharges_products,
                         coords_reactants, coords_products,
                         elements=unique_ncharges, Rcut=3)

In [21]:
b2r2_n.shape

(100, 100)

In [22]:
b2r2_l = b2r2.get_b2r2_l(ncharges_reactants, ncharges_products,
                         coords_reactants, coords_products,
                         elements=unique_ncharges, Rcut=3)

In [23]:
b2r2_l.shape

(100, 400)

In [24]:
b2r2_a = b2r2.get_b2r2_a(ncharges_reactants, ncharges_products,
                         coords_reactants, coords_products,
                         elements=unique_ncharges, Rcut=3)

In [25]:
b2r2_a.shape

(100, 1000)
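The printed shapes suggest a simple pattern for the four elements found here: 100 columns for `b2r2_n`, 100 per element for `b2r2_l` (4 × 100), and 100 per element pair including same-element pairs for `b2r2_a` (4 · 5 / 2 = 10 pairs, 10 × 100). The sketch below encodes that pattern; it is an inference from these outputs, not taken from the b2r2 documentation, and the block size of 100 is assumed to follow from `Rcut=3` and the library's default grid.

```python
def expected_feature_sizes(n_elements, block_size):
    """Feature lengths implied by the shapes printed above (an inference)."""
    n_pairs = n_elements * (n_elements + 1) // 2  # element pairs incl. same-element
    return {"b2r2_n": block_size,
            "b2r2_l": block_size * n_elements,
            "b2r2_a": block_size * n_pairs}
```

If the pattern holds, adding a fifth element (e.g. fluorine) would give 500 columns for `b2r2_l` and 1500 for `b2r2_a`, while `b2r2_n` would stay at 100.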