Repository accompanying my NormConf 2022 talk.
In this repository you can retrace the design ideas from my talk with an end-to-end experiment, comparing Graph Convolution to Graph Attention on two small molecule classification datasets.
If you notice slight differences between the code here and the screenshots in my talk: I refactored `download.py` and `plot.ipynb` to make the repository self-contained and re-ran all experiments to test the repository.
For this demo, I used Weights & Biases to log metrics, but it should be straightforward to make the code work with other experiment trackers such as Neptune or Comet.
To help you retrace the ideas from my talk, I added comments starting with "🤔" in the code and the `config.yaml`. Any questions, remarks, or feedback can be logged as an issue on GitHub or sent via e-mail.
The example task we solve in this repository is a comparison of two popular models for graph classification, Graph Convolution and Graph Attention. To enable a fair comparison, we train and evaluate the models on two small molecule datasets (Proteins, Enzymes, available here) and across multiple seeds.
The workflow follows the design ideas from my talk and comprises three steps:
- Train and test both models on both datasets for multiple random seeds and log the metrics to a cloud service (`main.py`).
- Download all the metrics into one big `.csv` file for later processing (`download.py`).
- Analyze the experiment with different plots and a results table in LaTeX (`plot.ipynb`).
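As a taste of the analysis step, the per-seed aggregation behind such a results table can be sketched in plain Python. The rows and metric values below are made up for illustration; the actual notebook works on the downloaded `.csv`:

```python
import statistics

# Hypothetical rows, shaped like the runs download.py collects:
# one row per (model, dataset, seed) run.
rows = [
    {"Model": "Graph Convolution", "Dataset": "Proteins", "seed": 91, "test_acc": 0.71},
    {"Model": "Graph Convolution", "Dataset": "Proteins", "seed": 17, "test_acc": 0.69},
    {"Model": "Graph Convolution", "Dataset": "Proteins", "seed": 44, "test_acc": 0.73},
]

# Aggregate the metric across seeds into a "mean ± std" table cell.
accs = [r["test_acc"] for r in rows]
print(f"{statistics.mean(accs):.2f} ± {statistics.stdev(accs):.2f}")  # 0.71 ± 0.02
```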
Note that the actual task is unimportant to get my ideas across and just serves as an excuse to log actual metrics. However, if it gets you interested in graph learning, all the better 😇
You can get the whole thing running in just three steps.
I strongly recommend setting everything up with conda. Tested with Python 3.9.
- Install PyTorch (tested with `pytorch=1.12.1`).
- Install PyTorch Geometric (matching your PyTorch version)[^1].
- Install the remaining dependencies below with `pip`[^2]:
```shell
pip install hydra-core torchmetrics wandb pandas seaborn ipykernel
```
One of the dependencies of this repository is Hydra, which turns both `main.py` (to train and test the models and log metrics) and `download.py` (to download the metrics from the cloud) into fully configurable programs. You can adjust settings either in the `config.yaml` file or via the command line; e.g., to set a different seed, simply run

```shell
python main.py seed=42
```
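To make the override mechanics concrete, here is a toy pure-Python illustration of how a `key=value` argument replaces a default config value. This is not Hydra's actual implementation (which also handles nesting, type coercion, and validation), just a sketch of the idea:

```python
# Toy illustration of Hydra-style `key=value` overrides -- not Hydra itself.
defaults = {"seed": 0, "hparams": {"Dataset": "Proteins"}}

def apply_overrides(cfg: dict, args: list[str]) -> dict:
    """Apply command-line overrides like 'seed=42' to a config dict."""
    cfg = dict(cfg)  # shallow copy; enough for this flat example
    for arg in args:
        key, value = arg.split("=", 1)
        cfg[key] = int(value) if value.isdigit() else value
    return cfg

cfg = apply_overrides(defaults, ["seed=42"])
print(cfg["seed"])  # 42
```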
A particularly powerful Hydra feature is the multi-run, which executes a script over all combinations of the given parameters. This is useful here because it lets us compute the experiment matrix
| | Graph Convolution | Graph Attention |
| --- | --- | --- |
| Proteins | x | x |
| Enzymes | x | x |
for multiple seeds with one command:

```shell
python main.py -m hparams.Dataset=Enzymes,Proteins hparams.Model="Graph Convolution","Graph Attention" hparams.seed=91,17,44
```
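For intuition, the multi-run expands into the full Cartesian product of the listed values, which can be sketched in plain Python:

```python
from itertools import product

# The sweep the Hydra multi-run performs: every (dataset, model, seed) combination.
datasets = ["Enzymes", "Proteins"]
models = ["Graph Convolution", "Graph Attention"]
seeds = [91, 17, 44]

runs = [
    {"Dataset": d, "Model": m, "seed": s}
    for d, m, s in product(datasets, models, seeds)
]
print(len(runs))  # 12 runs: 2 datasets x 2 models x 3 seeds
```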
Make sure to set the config parameters `wandb.entity` and `wandb.project` to enable logging to your Weights & Biases instance.
In general, I recommend separating local settings (such as the wandb settings) from your experiment's hyperparameters, which is why the latter are specified under `hparams` in the `config.yaml`. As a result, we can pass these parameters collectively to `wandb.init`:

```python
wandb.init(
    ...
    config=dict(cfg.hparams),
    ...
)
```

which reduces the config-related code to a bare minimum.
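A minimal sketch of how such a split could look in the `config.yaml` (the field values here are placeholders for illustration, not the repository's actual settings):

```yaml
wandb:                 # local settings: where to log, not part of the experiment
  entity: my-entity    # placeholder
  project: my-project  # placeholder
hparams:               # everything under here is logged via wandb.init
  Dataset: Proteins
  Model: "Graph Convolution"
  seed: 91
```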
Footnotes

[^1]: On a Mac M1, it should be sufficient to run `pip install torch-scatter torch-sparse torch-cluster torch-geometric -f https://data.pyg.org/whl/torch-1.12.0+cpu.html`. Make sure the wheel matches your torch version.

[^2]: I also installed `black[jupyter]` to get Black-formatted code, but it is not strictly a dependency of the experiment code.