Repository accompanying my NormConf 2022 talk.
In this repository you can retrace the design ideas from my talk with an end-to-end experiment, comparing Graph Convolution to Graph Attention on two small molecule classification datasets.
If you notice slight differences between the code here and the screenshots in my talk: I refactored `download.py` and `plot.ipynb` to make the repository self-contained and re-ran all experiments to test the repository.
For this demo, I used Weights & Biases to log metrics, but it should be straightforward to make the code work with other experiment trackers such as Neptune or Comet.
To help you retrace the ideas from my talk, I added comments starting with "🤔" in the code and the `config.yaml`. Any questions, remarks, or feedback can be logged as an issue on GitHub or sent via e-mail.
The example task we solve in this repository is a comparison of two popular models for graph classification, Graph Convolution and Graph Attention. To enable a fair comparison, we train and evaluate the models on two small molecule datasets (Proteins, Enzymes, available here) and across multiple seeds.
The workflow follows the design ideas from my talk and comprises three steps:
- Train and test both models on both datasets for multiple random seeds and log the metrics to a cloud service (`main.py`).
- Download all the metrics into one big `.csv` file for later processing (`download.py`).
- Analyze the experiment with different plots and a results table in LaTeX (`plot.ipynb`).
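As a taste of the analysis step, the per-seed aggregation behind such a results table can be sketched in plain Python. The rows and metric values below are made up for illustration; the actual notebook works on the downloaded `.csv`:

```python
import statistics

# Hypothetical rows, shaped like the runs download.py collects:
# one row per (model, dataset, seed) run.
rows = [
    {"Model": "Graph Convolution", "Dataset": "Proteins", "seed": 91, "test_acc": 0.71},
    {"Model": "Graph Convolution", "Dataset": "Proteins", "seed": 17, "test_acc": 0.69},
    {"Model": "Graph Convolution", "Dataset": "Proteins", "seed": 44, "test_acc": 0.73},
]

# Aggregate the metric across seeds into a "mean ± std" table cell.
accs = [r["test_acc"] for r in rows]
print(f"{statistics.mean(accs):.2f} ± {statistics.stdev(accs):.2f}")  # 0.71 ± 0.02
```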
Note that the actual task is unimportant to get my ideas across and just serves as an excuse to log actual metrics. However, if it gets you interested in graph learning, all the better 😇
You can get the whole thing running in just three steps.
I strongly recommend setting everything up with conda. Tested with Python 3.9.
- Install PyTorch (tested with `pytorch=1.12.1`).
- Install PyTorch Geometric (matching your PyTorch version)[^1].
- Install the remaining dependencies below with `pip`[^2]:
```shell
pip install hydra-core torchmetrics wandb pandas seaborn ipykernel
```
One of the dependencies of this repository is Hydra, which turns both `main.py` (to train and test the models and log metrics) and `download.py` (to download the metrics from the cloud) into fully configurable programs. You can adjust settings either in the `config.yaml` file or via the command line; e.g., to set a different seed, simply run

```shell
python main.py seed=42
```
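To make the override mechanics concrete, here is a toy pure-Python illustration of how a `key=value` argument replaces a default config value. This is not Hydra's actual implementation (which also handles nesting, type coercion, and validation), just a sketch of the idea:

```python
# Toy illustration of Hydra-style `key=value` overrides -- not Hydra itself.
defaults = {"seed": 0, "hparams": {"Dataset": "Proteins"}}

def apply_overrides(cfg: dict, args: list[str]) -> dict:
    """Apply command-line overrides like 'seed=42' to a config dict."""
    cfg = dict(cfg)  # shallow copy; enough for this flat example
    for arg in args:
        key, value = arg.split("=", 1)
        cfg[key] = int(value) if value.isdigit() else value
    return cfg

cfg = apply_overrides(defaults, ["seed=42"])
print(cfg["seed"])  # 42
```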
A particularly powerful Hydra feature is the multi-run, which executes a script over all combinations of the given parameters. This is useful here because it lets us compute the experiment matrix
| | Graph Convolution | Graph Attention |
| --- | --- | --- |
| Proteins | x | x |
| Enzymes | x | x |
for multiple seeds with one command:

```shell
python main.py -m hparams.Dataset=Enzymes,Proteins hparams.Model="Graph Convolution","Graph Attention" hparams.seed=91,17,44
```
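For intuition, the multi-run expands into the full Cartesian product of the listed values, which can be sketched in plain Python:

```python
from itertools import product

# The sweep the Hydra multi-run performs: every (dataset, model, seed) combination.
datasets = ["Enzymes", "Proteins"]
models = ["Graph Convolution", "Graph Attention"]
seeds = [91, 17, 44]

runs = [
    {"Dataset": d, "Model": m, "seed": s}
    for d, m, s in product(datasets, models, seeds)
]
print(len(runs))  # 12 runs: 2 datasets x 2 models x 3 seeds
```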
Make sure to set the config parameters `wandb.entity` and `wandb.project` to enable logging to your Weights & Biases instance.
In general, I recommend separating local settings (such as the wandb settings) from your experiment's hyperparameters, which is why the latter are specified under `hparams` in the `config.yaml`. As a result, we can pass these parameters collectively to `wandb.init`:

```python
wandb.init(
    ...
    config=dict(cfg.hparams),
    ...
)
```

which reduces the config-related code to a bare minimum.
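A minimal sketch of how such a split could look in the `config.yaml` (the field values here are placeholders for illustration, not the repository's actual settings):

```yaml
wandb:                 # local settings: where to log, not part of the experiment
  entity: my-entity    # placeholder
  project: my-project  # placeholder
hparams:               # everything under here is logged via wandb.init
  Dataset: Proteins
  Model: "Graph Convolution"
  seed: 91
```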
Footnotes

[^1]: On a Mac M1, it should be sufficient to run `pip install torch-scatter torch-sparse torch-cluster torch-geometric -f https://data.pyg.org/whl/torch-1.12.0+cpu.html`. Make sure the wheel matches your torch version.

[^2]: I also installed `black[jupyter]` to get Black-formatted code, but it is not strictly a dependency of the experiment code.