kliment-olechnovic/gnn-custom-dataset-example

A real-world example of creating custom datasets in PyTorch Geometric

This repository is intended purely to demonstrate how to make a graph dataset for PyTorch Geometric from graph vertices and edges stored in CSV files.

The demonstration is a node-prediction GNN training and evaluation example that uses very little code and data.
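As a rough illustration of the CSV-to-graph step, the sketch below reads a vertex table and an edge table with the Python standard library and arranges them in the layout that PyTorch Geometric expects: a feature matrix with one row per vertex, and an edge index stored as two parallel lists of source and target indices. The column names (`id`, `feature_a`, `feature_b`, `src`, `dst`, `weight`), the sample data, and the `load_graph` helper are assumptions made for this example; the CSV files and scripts in this repository may be organized differently.

```python
import csv
import io

# Hypothetical CSV contents; the column names are assumptions for illustration.
VERTICES_CSV = """id,feature_a,feature_b
0,0.5,-1.2
1,1.5,0.3
2,-0.7,0.0
"""

EDGES_CSV = """src,dst,weight
0,1,0.8
1,0,0.8
1,2,0.4
2,1,0.4
"""


def load_graph(vertices_file, edges_file):
    """Read vertex and edge tables into plain lists in a PyG-style layout."""
    vertex_rows = list(csv.DictReader(vertices_file))
    # Map vertex ids (as written in the file) to consecutive row indices.
    index_of = {row["id"]: i for i, row in enumerate(vertex_rows)}
    feature_names = [name for name in vertex_rows[0] if name != "id"]
    x = [[float(row[name]) for name in feature_names] for row in vertex_rows]

    sources, targets, edge_attr = [], [], []
    for row in csv.DictReader(edges_file):
        sources.append(index_of[row["src"]])
        targets.append(index_of[row["dst"]])
        edge_attr.append([float(row["weight"])])

    # COO layout: first list holds source indices, second holds target indices.
    edge_index = [sources, targets]
    return x, edge_index, edge_attr


x, edge_index, edge_attr = load_graph(
    io.StringIO(VERTICES_CSV), io.StringIO(EDGES_CSV)
)
```

With PyTorch and PyTorch Geometric installed, these lists could then be converted with `torch.tensor(...)` (using `dtype=torch.long` for `edge_index`) and passed to `torch_geometric.data.Data(x=..., edge_index=..., edge_attr=...)`.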

Usage

The main way to use this repository is to read the "*.py" scripts.

The scripts can also be executed, for example:

# train GNN
python run_training.py

# test a trained GNN model saved after epoch 5
python run_evaluation.py testing_data ./output_saved_trained_models/epoch5.pth

# test multiple saved trained GNN models
find "./output_saved_trained_models/" -type f | sort -V | xargs python ./run_evaluation.py testing_data

# use a single trained model to predict vertex values for a single graph, and save the predictions to a file
python run_inference_for_one_graph.py \
  ./output_saved_trained_models/epoch15.pth \
  ./input_graph_CSV_files/data/1A22/1A22_sr1_vertices_in.csv \
  ./input_graph_CSV_files/data/1A22/1A22_sr1_edges.csv \
  ./output_vertex_predictions.txt

System requirements

Using Miniconda (or Anaconda)

Example of installing prerequisites with Miniconda:

# download and install the latest Miniconda
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh

source ~/miniconda3/bin/activate

# install PyTorch using instructions from 'https://pytorch.org/get-started/locally/'
conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia

# install PyTorch Geometric using instructions from 'https://pytorch-geometric.readthedocs.io/en/latest/install/installation.html'
pip install torch_geometric
# the wheel index URL below must match the installed PyTorch and CUDA versions (here: PyTorch 2.1.0 with CUDA 12.1)
pip install pyg_lib torch_scatter torch_sparse torch_cluster torch_spline_conv -f https://data.pyg.org/whl/torch-2.1.0+cu121.html

# install Pandas, to use for reading CSV files
conda install pandas

Using Apptainer

Alternatively, an Apptainer container can be built from the definition file supplied in the repository:

# build Apptainer container
apptainer build container.sif container.def

# enter the container
apptainer shell container.sif

Note that the Apptainer container may not be able to use the host's GPU.

Remarks about data

The included input graphs are already prepared for GNN training and application. The graph preparation code is not included, but below are the main recommendations for the graphs to work with the provided training and inference code.

Graph connectivity

The graphs should have bidirectional connections and self-connections. That is, in any '*_edges.csv' file:

  • if there is an (i -> j) edge, there should also be a (j -> i) edge with the same weight
  • there should be an (i -> i) edge with an appropriate weight for every vertex id i
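The two connectivity requirements can be enforced on an edge list before it is written to a '*_edges.csv' file. The following is a minimal sketch, not the graph preparation code used for the included data: the `complete_edges` helper and the default self-loop weight of 1.0 are assumptions for illustration, and an appropriate self-loop weight depends on how the edge weights are defined.

```python
def complete_edges(num_vertices, edges, self_loop_weight=1.0):
    """Return a sorted list of (i, j, weight) triples that contains every
    reverse edge and a self-connection for each vertex.

    edges: iterable of (i, j, weight) triples.
    self_loop_weight: assumed placeholder weight for missing (i -> i) edges.
    """
    weights = {}
    for i, j, weight in edges:
        # Store the edge in both directions; refuse mismatched weights.
        for key in ((i, j), (j, i)):
            if weights.setdefault(key, weight) != weight:
                raise ValueError(f"conflicting weights for edge {key}")
    # Add a self-connection for every vertex that does not have one yet.
    for v in range(num_vertices):
        weights.setdefault((v, v), self_loop_weight)
    return sorted((i, j, w) for (i, j), w in weights.items())


# The (0 -> 1) edge has no reverse edge in the input, and no vertex has a self-loop.
completed = complete_edges(3, [(0, 1, 0.8), (1, 2, 0.4), (2, 1, 0.4)])
```

Raising an error on conflicting weights, rather than silently picking one, keeps the "same weight" requirement explicit.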

Normalization of vertex and edge feature values

All vertex and edge feature values should be normalized globally, that is, using statistics shared across all graphs rather than computed per graph. For example, they can be converted to z-scores using mean and standard deviation values that are known beforehand or derived from all the graphs used in training:

z_score = (x - mean) / standard_deviation
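A minimal sketch of this global normalization for a single feature is shown below. It pools the feature values from all training graphs, computes one mean and one standard deviation, and reuses them for every graph, including graphs seen only at inference time. The helper names and sample values are hypothetical, and the use of the population standard deviation is an assumption; the sample standard deviation would work equally well as long as it is applied consistently.

```python
import statistics


def fit_global_stats(training_graphs):
    """Compute one mean and standard deviation over all training graphs.

    training_graphs: list of graphs, each given as a list of values
    of the same feature.
    """
    pooled = [value for graph in training_graphs for value in graph]
    mean = statistics.fmean(pooled)
    std = statistics.pstdev(pooled)  # population standard deviation (assumed)
    if std == 0:
        raise ValueError("feature is constant; z-scores are undefined")
    return mean, std


def to_z_scores(values, mean, std):
    """Apply z_score = (x - mean) / standard_deviation to each value."""
    return [(value - mean) / std for value in values]


# Hypothetical feature values from two training graphs.
training_graphs = [[1.0, 2.0, 3.0], [4.0, 5.0]]
mean, std = fit_global_stats(training_graphs)

# The same global statistics are applied to every graph.
normalized = [to_z_scores(graph, mean, std) for graph in training_graphs]
```

Saving `mean` and `std` alongside the trained model makes it possible to normalize new graphs in the same way at inference time.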
