## Gather QM9 Data

This notebook extracts a small set of structures from the QM9 dataset for use in our analysis: a few molecules of each size (number of atoms).

In [1]:
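```python
import collections
import json
from schnetpack import datasets

# Note (assumption): this uses the older, pre-2.0 schnetpack dataset API.
# download=True fetches QM9 into data/qm9.db if it is not already there.
qm9data = datasets.QM9('data/qm9.db', download=True)
len(qm9data)
```

```
133885
```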

Let's define the parameters for the structures we want to extract:

In [2]:
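```python
num_of_each_size = 3  # Number of structures to keep for each molecule size
max_size = 1000  # Maximum number of atoms (including hydrogen) to consider
# (QM9 molecules have at most 29 atoms, so 1000 imposes no limit in practice)
```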

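Now let's go through QM9 and pick out the structures we want to use for testing, keeping the first `num_of_each_size` structures of each size (number of atoms, including hydrogen).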
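Stripped of the dataset access, the selection rule is "group indices by atom count and cap each group". A minimal sketch on a plain list of atom counts, so it runs without schnetpack; the function `select_per_size` and the toy `sizes` list are hypothetical and only illustrate the logic:

```python
import collections


def select_per_size(sizes, num_of_each_size, max_size):
    """Return {size: [indices]}, keeping the first `num_of_each_size`
    indices for each size and skipping systems larger than `max_size`.

    Assumes sizes[i] is the number of atoms in structure i.
    """
    selected = collections.defaultdict(list)
    for idx, size in enumerate(sizes):
        if size > max_size:
            continue  # too large, skip before creating a bucket
        bucket = selected[size]
        if len(bucket) < num_of_each_size:
            bucket.append(idx)
    return dict(selected)


# Hypothetical atom counts for seven structures
sizes = [5, 4, 5, 3, 5, 5, 12]
print(select_per_size(sizes, num_of_each_size=3, max_size=10))
# → {5: [0, 2, 4], 4: [1], 3: [3]}
```

Sizes appear in the order they are first encountered (dicts preserve insertion order), which is also why the per-size counts printed by the cell below are not sorted.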

In [3]:
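```python
test_set = collections.defaultdict(list)  # size -> list of dataset indices

for idx in range(len(qm9data)):
    system = qm9data.get_atoms(idx=idx)  # ASE Atoms object for this molecule
    if len(system) > max_size:
        continue  # skip molecules with more than max_size atoms

    indices = test_set[len(system)]
    if len(indices) < num_of_each_size:
        indices.append(idx)

# Print how many structures were collected for each size
for size, values in test_set.items():
    print(size, ": ", len(values))
```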

```
5 :  3
4 :  3
3 :  2
8 :  3
6 :  3
7 :  3
11 :  3
9 :  3
10 :  3
14 :  3
12 :  3
17 :  3
15 :  3
13 :  3
16 :  3
18 :  3
20 :  3
19 :  3
21 :  3
23 :  3
22 :  3
26 :  3
24 :  3
25 :  3
27 :  3
29 :  3
```
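Because the loop scans the entire dataset, a size with fewer than `num_of_each_size` entries simply has fewer structures in QM9: the output above shows only two 3-atom structures and no 28-atom structures at all. A small sketch for surfacing such gaps automatically; `find_gaps` is a hypothetical helper, and it assumes sizes are expected to be contiguous between the smallest and largest size found:

```python
def find_gaps(test_set, num_of_each_size):
    """Return (underfilled, missing) for a {size: [indices]} mapping.

    underfilled: {size: count} for sizes with fewer than num_of_each_size entries
    missing: sizes between the smallest and largest key with no entry at all
    """
    if not test_set:
        return {}, []
    sizes = sorted(test_set)
    underfilled = {
        size: len(test_set[size])
        for size in sizes
        if len(test_set[size]) < num_of_each_size
    }
    # `in` does not create keys, so this is safe on a defaultdict too
    missing = [s for s in range(sizes[0], sizes[-1] + 1) if s not in test_set]
    return underfilled, missing


# Toy input with made-up indices, mirroring the pattern in the output above
example = {3: [0, 1], 4: [2, 3, 4], 5: [5, 6, 7], 7: [8, 9, 10]}
print(find_gaps(example, num_of_each_size=3))
# → ({3: 2}, [6])
```

Based on the printed counts, calling `find_gaps(test_set, num_of_each_size)` on the real subset should report `({3: 2}, [28])`.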


Finally, let's save the dictionary to JSON so it can be used in our experiments.

In [4]:
```python
with open('data/qm9_subset.json', 'w') as subset:
    json.dump(test_set, subset)
```
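One caveat: JSON object keys are always strings, so `json.dump` stores the integer sizes as `"5"`, `"4"`, and so on. A minimal sketch of restoring integer keys when the subset is read back; `restore_int_keys` and `load_subset` are hypothetical helpers, and the default path is simply the one written above:

```python
import json


def restore_int_keys(raw):
    """Convert the string keys produced by JSON back to integer atom counts."""
    return {int(size): indices for size, indices in raw.items()}


def load_subset(path="data/qm9_subset.json"):
    """Load the saved subset with integer keys (path assumed from the cell above)."""
    with open(path) as f:
        return restore_int_keys(json.load(f))


# In-memory round trip showing what happens to the keys
encoded = json.dumps({5: [0, 1, 2], 3: [7, 9]})
print(encoded)  # → {"5": [0, 1, 2], "3": [7, 9]}
print(restore_int_keys(json.loads(encoded)))  # → {5: [0, 1, 2], 3: [7, 9]}
```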