# 01: QM9 Loader Tutorial
In this notebook, we demonstrate how to use this repository to load and analyze data from the `QM9` dataset, which can be downloaded [here](http://quantum-machine.org/datasets/).

In [None]:
%load_ext autoreload
%autoreload 2

In [None]:
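import os

# If the notebook is launched from inside the tutorials directory, move up one
# level so that relative paths such as 'data/qm9_test_data' resolve.
complete_path = os.getcwd()
if 'tutorials' in complete_path:
    os.chdir("..")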

In [None]:
import matplotlib.pyplot as plt
import networkx as nx

In [None]:
from crescendo.datasets.qm9 import QMXDataset

## Load the data
Here, we load the internally stored testing data from the local `data` directory, but the user can always modify this path to point to their own copy of the `QM9` database. Note that the `debug=10` flag below constrains `load` to a maximum of 10 data points, which speeds up the runtime. For a real analysis of the entire database, one should leave `debug` at its default value of `-1`.

In [None]:
path = 'data/qm9_test_data'
qm9_dat = QMXDataset(debug=10)

The `load` method accepts several useful flags for loading subsets of `QM9`:
* `max_heavy_atoms`: the maximum number of heavy atoms (C, N, O and F) allowed. Default is 9, which corresponds to the entire `QM9` dataset.
* `keep_zwitter`: if `True`, will keep [zwitterionic](https://en.wikipedia.org/wiki/Zwitterion) compounds. Default is `False`.
* `canonical`: if `True`, will load the canonical `SMILES` string as opposed to the normal one. Default is `True`.
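
To make the meaning of the first two flags concrete, here is a minimal, self-contained sketch of the kind of filtering they describe. It is not the implementation used by `load`: the helper names (`count_heavy_atoms`, `looks_zwitterionic`, `passes_filter`) are hypothetical, the element lists are assumed to be lists of element-symbol strings, and the zwitterion test is a deliberately naive check for both a positively and a negatively charged bracket atom in the `SMILES` string.

```python
import re

# Heavy atoms in QM9 are the non-hydrogen elements C, N, O and F.
HEAVY_ELEMENTS = {"C", "N", "O", "F"}


def count_heavy_atoms(elements):
    """Count the C, N, O and F atoms in a list of element symbols."""
    return sum(1 for symbol in elements if symbol in HEAVY_ELEMENTS)


def looks_zwitterionic(smiles):
    """Naive check: both a positive and a negative bracket atom are present."""
    bracket_atoms = re.findall(r"\[([^\]]+)\]", smiles)
    has_positive = any("+" in atom for atom in bracket_atoms)
    has_negative = any("-" in atom for atom in bracket_atoms)
    return has_positive and has_negative


def passes_filter(elements, smiles, max_heavy_atoms=9, keep_zwitter=False):
    """Mimic the documented defaults: at most 9 heavy atoms, no zwitterions."""
    if count_heavy_atoms(elements) > max_heavy_atoms:
        return False
    if not keep_zwitter and looks_zwitterionic(smiles):
        return False
    return True


# Hypothetical example molecules, written by hand (not read from the dataset).
methane_elements = ["C", "H", "H", "H", "H"]
methane_smiles = "C"
glycine_elements = ["N", "C", "C", "O", "O", "H", "H", "H", "H", "H"]
glycine_zwitterion_smiles = "[NH3+]CC([O-])=O"

print(passes_filter(methane_elements, methane_smiles))
print(passes_filter(glycine_elements, glycine_zwitterion_smiles))
print(passes_filter(glycine_elements, glycine_zwitterion_smiles, keep_zwitter=True))
```

With the defaults, methane is kept and the glycine zwitterion is dropped; passing `keep_zwitter=True` keeps it, and lowering `max_heavy_atoms` below 5 would drop it again regardless.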

In [None]:
qm9_dat.load(path)

## Examining the raw data
The raw data loaded into the `qm9_dat` object is stored in the `qm9_dat.raw` dictionary, which consists of `QM9SmilesDatum` objects keyed by their QM9 IDs. We can do quite a few things with the data, including calling analysis methods such as
* `has_n_membered_ring`
* `is_aromatic`
* `has_double_bond`

as well as a `to_graph` method:

In [None]:
g = qm9_dat.raw[100001].to_graph()

In [None]:
print(qm9_dat.raw[100001].elements)
print(qm9_dat.raw[100001].smiles)

In [None]:
nx.draw(g.to_networkx(), with_labels=True)

In [None]:
g.ndata['features']
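
Because `raw` is a plain dictionary, subsets of the data can also be selected with ordinary Python. The sketch below is one possible way to do this, under stated assumptions: `select_ids` and `smiles_has_double_bond` are hypothetical helpers (not part of the repository), the only attribute used is `.smiles` (assumed to be a string, as printed above), and `toy_raw` is a hand-written stand-in with made-up IDs so that the cell runs without the dataset. The check relies on `=` denoting a double bond in `SMILES` notation and is much cruder than the built-in `has_double_bond` method.

```python
from types import SimpleNamespace


def select_ids(raw, predicate):
    """Return the sorted keys of `raw` whose datum satisfies `predicate`."""
    return sorted(key for key, datum in raw.items() if predicate(datum))


def smiles_has_double_bond(datum):
    """Naive SMILES test: '=' denotes an explicit double bond."""
    return "=" in datum.smiles


# Stand-in for qm9_dat.raw: hypothetical IDs and molecules, for illustration only.
toy_raw = {
    1: SimpleNamespace(smiles="C"),        # methane
    2: SimpleNamespace(smiles="C=O"),      # formaldehyde
    3: SimpleNamespace(smiles="CC(C)=O"),  # acetone
    4: SimpleNamespace(smiles="CCO"),      # ethanol
}

double_bonded = select_ids(toy_raw, smiles_has_double_bond)
print(double_bonded)  # prints [2, 3]
```

In this notebook, passing `qm9_dat.raw` in place of `toy_raw` should work the same way, provided each datum exposes `.smiles` as a string.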