A real-world example of creating custom datasets in PyTorch Geometric

This repository is intended purely to demonstrate how to make a graph dataset for PyTorch Geometric from graph vertices and edges stored in CSV files.

The demonstration is a node-prediction GNN training/evaluation example that uses a very small amount of code and data.
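
As a rough illustration of the CSV-to-graph step, the sketch below parses a vertex table and an edge table with Python's standard `csv` module and builds the three arrays that a PyTorch Geometric `Data` object is typically constructed from. The column names (`id`, `feature_a`, `feature_b`, `id1`, `id2`, `weight`), the sample values, and the helper `load_graph_from_csv` are hypothetical and chosen only for illustration; the actual layout is defined by the files in 'input_graph_CSV_files' and the code in 'custom_dataset_from_graph_csv_files.py'.

```python
import csv
import io

# Hypothetical CSV contents; the real files in this repository may use
# different column names and more feature columns.
VERTICES_CSV = """id,feature_a,feature_b
0,0.5,-1.0
1,1.5,0.0
2,-0.5,2.0
"""

EDGES_CSV = """id1,id2,weight
0,1,0.25
1,0,0.25
1,2,0.75
2,1,0.75
"""


def load_graph_from_csv(vertices_file, edges_file):
    """Return (x, edge_index, edge_attr) as plain Python lists.

    Assumes vertex ids are the integers 0..N-1, so that a vertex id can be
    used directly as a row index into x.
    """
    # One row of float features per vertex, ordered by vertex id.
    vertex_rows = sorted(csv.DictReader(vertices_file), key=lambda r: int(r["id"]))
    x = [[float(r["feature_a"]), float(r["feature_b"])] for r in vertex_rows]

    # Edges in COO layout: one list of source ids, one list of target ids.
    sources, targets, edge_attr = [], [], []
    for r in csv.DictReader(edges_file):
        sources.append(int(r["id1"]))
        targets.append(int(r["id2"]))
        edge_attr.append([float(r["weight"])])
    return x, [sources, targets], edge_attr


x, edge_index, edge_attr = load_graph_from_csv(
    io.StringIO(VERTICES_CSV), io.StringIO(EDGES_CSV)
)
print(len(x), "vertices,", len(edge_index[0]), "edges")
```

With PyTorch and PyTorch Geometric installed, these lists could then be wrapped as `Data(x=torch.tensor(x), edge_index=torch.tensor(edge_index, dtype=torch.long), edge_attr=torch.tensor(edge_attr))`.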

Usage

The main way to use this repository is to read the "*.py" scripts.

The scripts can also be executed, for example:

# train GNN
python run_training.py

# test a trained GNN model saved after epoch 5
python run_evaluation.py testing_data ./output_saved_trained_models/epoch5.pth

# test multiple saved trained GNN models
find "./output_saved_trained_models/" -type f | sort -V | xargs python ./run_evaluation.py testing_data

# use a single trained model to predict vertex values for a single graph, and save the predictions to a file
python run_inference_for_one_graph.py \
  ./output_saved_trained_models/epoch15.pth \
  ./input_graph_CSV_files/data/1A22/1A22_sr1_vertices_in.csv \
  ./input_graph_CSV_files/data/1A22/1A22_sr1_edges.csv \
  ./output_vertex_predictions.txt

System requirements

Using Miniconda (or Anaconda)

Example of installing prerequisites with Miniconda:

# download and install the latest Miniconda
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh

source ~/miniconda3/bin/activate

# install PyTorch using instructions from 'https://pytorch.org/get-started/locally/'
conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia

# install PyTorch Geometric using instructions from 'https://pytorch-geometric.readthedocs.io/en/latest/install/installation.html'
pip install torch_geometric
pip install pyg_lib torch_scatter torch_sparse torch_cluster torch_spline_conv -f https://data.pyg.org/whl/torch-2.1.0+cu121.html

# install Pandas, to use for reading CSV files
conda install pandas

Using Apptainer

Alternatively, an Apptainer container can be built from the definition file ('container.def') supplied among the files:

# build Apptainer container
apptainer build container.sif container.def

# enter the container
apptainer shell container.sif

Note that the Apptainer container may not be able to use the host's GPU.

Remarks about data

The included input graphs are already prepared for GNN training and application. The graph preparation code is not included, but below are the main recommendations for the graphs to work with the provided training and inference code.

Graph connectivity

The graphs should have bidirectional connections and self-connections. That is, in any '*_edges.csv' file:

if there is an (i -> j) edge, there should also be a (j -> i) edge with the same weight
there should be an (i -> i) edge with an appropriate weight for every vertex id i
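
The two rules above can be checked before training. The following is a minimal sketch, not part of the repository: the function `find_connectivity_problems` and the `(source, target, weight)` tuple layout are assumptions made for illustration, and weights are compared for exact equality, which is reasonable only if both directions were written from the same value.

```python
def find_connectivity_problems(vertex_ids, edges):
    """List violations of the two connectivity rules.

    `edges` is an iterable of (source, target, weight) tuples
    (a hypothetical in-memory form of an '*_edges.csv' file).
    """
    weights = {(i, j): w for i, j, w in edges}
    problems = []

    for (i, j), w in weights.items():
        if i == j:
            continue
        # Rule 1: every (i -> j) edge needs a (j -> i) edge with the same weight.
        if (j, i) not in weights:
            problems.append(f"missing reverse edge ({j} -> {i})")
        elif weights[(j, i)] != w and i < j:
            # Report each mismatched pair once, from the i < j direction.
            problems.append(f"weight mismatch between ({i} -> {j}) and ({j} -> {i})")

    # Rule 2: every vertex needs a self-connection (i -> i).
    for i in vertex_ids:
        if (i, i) not in weights:
            problems.append(f"missing self-connection ({i} -> {i})")

    return problems


# Vertex 2 has no self-connection and the (1 -> 2) edge has no reverse edge.
example_edges = [(0, 1, 0.5), (1, 0, 0.5), (1, 2, 0.3), (0, 0, 1.0), (1, 1, 1.0)]
problems = find_connectivity_problems([0, 1, 2], example_edges)
for problem in problems:
    print(problem)
```

An empty list means the graph satisfies both rules. Which self-connection weight is appropriate depends on the data, so the sketch only reports missing edges and does not add them.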

Normalization of vertex and edge feature values

All vertex and edge feature values should be normalized globally, that is, using statistics shared by all graphs rather than computed per graph. For example, they can be converted to z-scores using mean and standard deviation values that are known beforehand or derived from all the graphs used in training:

z_score = (x - mean) / standard_deviation
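
The formula above can be applied with global statistics as in the following sketch. The feature values are made up, the helper `to_z_scores` is hypothetical, and the use of the population standard deviation (`statistics.pstdev`) is an assumption; the source does not say which estimator was used to prepare the included graphs.

```python
import statistics

# Hypothetical values of one feature in three training graphs.
training_graphs = [
    [1.0, 2.0, 3.0],
    [4.0, 5.0],
    [6.0, 7.0, 8.0, 9.0],
]

# Global statistics: pool the values from all training graphs.
pooled = [value for graph in training_graphs for value in graph]
mean = statistics.fmean(pooled)
standard_deviation = statistics.pstdev(pooled)  # population SD (an assumption)


def to_z_scores(values, mean, standard_deviation):
    """Apply the same global mean and SD to any graph's feature values."""
    return [(x - mean) / standard_deviation for x in values]


# Every training graph is normalized with the same two numbers.
normalized_graphs = [to_z_scores(g, mean, standard_deviation) for g in training_graphs]

# A graph seen only at inference time reuses the training statistics.
new_graph = to_z_scores([5.0, 10.0], mean, standard_deviation)
```

Reusing the training mean and standard deviation at inference time keeps feature values on the scale the model was trained on, which per-graph normalization would not guarantee.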
