Data Augmentation in Graph Neural Networks: The Role of Generated Synthetic Graphs

Citation

If you use this code, please cite our work:

@misc{bas2024dataaugmentationgraphneural,
  title={Data Augmentation in Graph Neural Networks: The Role of Generated Synthetic Graphs}, 
  author={Sumeyye Bas and Kiymet Kaya and Resul Tugay and Sule Gunduz Oguducu},
  year={2024},
  eprint={2407.14765},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2407.14765}
}

Overview

This repository is organized into two main parts:

Graph Generation
Graph Classification

Some models utilize NetworkX graphs, while others are built on PyTorch Geometric graphs. Conversions between these formats may be necessary depending on the use case.
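As a rough illustration of what such a conversion involves: PyTorch Geometric stores edges as a 2 x E edge_index in COO layout, with each undirected edge listed once per direction. PyTorch Geometric ships torch_geometric.utils.from_networkx and to_networkx for real conversions; the dependency-free sketch below only mimics the layout with plain lists. Both helper names are hypothetical and are not part of this repository.

```python
def adjacency_to_edge_index(adj):
    """Turn an undirected adjacency dict into a COO-style [sources, targets] pair.

    Assumes `adj` is symmetric: if v is a neighbor of u, then u is a neighbor of v.
    """
    # Relabel nodes to consecutive integers, in sorted order for determinism.
    index = {node: i for i, node in enumerate(sorted(adj))}
    sources, targets = [], []
    for u in sorted(adj):
        for v in sorted(adj[u]):
            sources.append(index[u])
            targets.append(index[v])
    return [sources, targets]


def edge_index_to_adjacency(edge_index, num_nodes):
    """Inverse direction: rebuild an adjacency dict from the COO pair."""
    adj = {i: set() for i in range(num_nodes)}
    for u, v in zip(*edge_index):
        adj[u].add(v)
        adj[v].add(u)
    return adj


# A triangle graph with string node labels (illustrative data only).
triangle = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
edge_index = adjacency_to_edge_index(triangle)
round_trip = edge_index_to_adjacency(edge_index, num_nodes=3)
```

Node labels are lost in the integer relabeling, so keep the `index` mapping if the original names matter downstream.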

The graph_analysis.ipynb notebook can be used to obtain statistical information on any graph dataset.
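The kind of summary such an analysis typically reports can be sketched without any graph library. The function below is an illustrative assumption, not the notebook's code: it computes node count, edge count, average degree, and density for one undirected graph given as an edge list.

```python
def graph_summary(num_nodes, edges):
    """Basic statistics for a simple undirected graph given as (u, v) pairs."""
    # Ignore edge direction, duplicate edges, and self-loops.
    unique = {tuple(sorted(e)) for e in edges if e[0] != e[1]}
    num_edges = len(unique)
    avg_degree = 2 * num_edges / num_nodes if num_nodes else 0.0
    max_edges = num_nodes * (num_nodes - 1) / 2
    density = num_edges / max_edges if max_edges else 0.0
    return {
        "nodes": num_nodes,
        "edges": num_edges,
        "avg_degree": avg_degree,
        "density": density,
    }


# A 4-cycle with one duplicated edge, which is counted only once.
summary = graph_summary(4, [(0, 1), (1, 2), (2, 3), (3, 0), (1, 0)])
```

Averaging these per-graph values over a dataset gives a quick way to compare real and generated graphs.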

Graph Generation

We employ two different generation models based on graph size:

For graphs with fewer than 100 nodes, we use GraphRNN.
For larger graphs, we use GRAN.
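
The size rule above amounts to a one-line dispatch. In this sketch the function name and the threshold parameter are hypothetical; only the 100-node cutoff and the two model names come from the text. Sending graphs with exactly 100 nodes to GRAN is an assumption that follows from reading "fewer than 100" literally.

```python
def choose_generator(num_nodes, threshold=100):
    """Return the name of the generation model for a graph of the given size."""
    # Fewer than `threshold` nodes: GraphRNN; otherwise: GRAN.
    return "GraphRNN" if num_nodes < threshold else "GRAN"


choices = [choose_generator(n) for n in (35, 99, 100, 480)]
```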

You can find the notebooks we used for GraphRNN and GRAN.

GRAN

Original repository: GRAN GitHub Repository

We modified several files to customize data loading and splitting for our experiments:

Data Splits: Edit utils/data_helper.py to adjust dataset splits.
Configuration: Update config/collab_sample.yaml to set parameters for your experiments.
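
For readers unsure what adjusting a dataset split involves, a generic reproducible split looks roughly like the sketch below. This is not the code in utils/data_helper.py; the function name, the train_ratio and seed parameters, and their defaults are all assumptions made for illustration.

```python
import random


def split_graphs(graphs, train_ratio=0.8, seed=0):
    """Shuffle a list of graphs reproducibly and cut it into train and test parts."""
    order = list(range(len(graphs)))
    # A dedicated Random instance keeps the split independent of global RNG state.
    random.Random(seed).shuffle(order)
    cut = int(len(graphs) * train_ratio)
    train = [graphs[i] for i in order[:cut]]
    test = [graphs[i] for i in order[cut:]]
    return train, test


# Integers stand in for graph objects here.
train, test = split_graphs(list(range(10)), train_ratio=0.8, seed=0)
```

Fixing the seed matters when comparing augmented and non-augmented runs, since both should then see the same held-out graphs.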

GraphRNN

Original repository: GraphRNN GitHub Repository

We modified the original files to read data from graphs.pt and to handle the data splits required for our experiments. The changed files are:

Graph Classification

Graph classification algorithms are implemented to evaluate the generated synthetic graphs. The datasets folder contains one subfolder per experiment. Currently, it includes experiments on the COLLAB dataset, with variations b (raw data), c (w/ Real), exp_1 (w/ Gen.) and e (test data). The graphs for each variation must be stored in a file named graphs.pt in the corresponding directory.

Experiments exp1 and exp2 include augmented data. You can modify the config.txt file to adjust hyperparameters and set up different experiments. Test data is always read from folder e.
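
The folder convention described here can be expressed as a small path helper. The datasets/COLLAB root, the function name, and the exact folder spellings are assumptions for illustration; check them against the repository layout before relying on them.

```python
from pathlib import Path

TEST_FOLDER = "e"  # per the text, test data always lives in folder e


def graph_files(root, train_folder):
    """Return (train_file, test_file) paths, each pointing at a graphs.pt file."""
    root = Path(root)
    return root / train_folder / "graphs.pt", root / TEST_FOLDER / "graphs.pt"


# Hypothetical root directory; "c" is the w/ Real variation named above.
train_file, test_file = graph_files("datasets/COLLAB", "c")
```

Each file would then typically be read with torch.load, assuming it was written with torch.save.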

Datasets

The datasets folder includes multiple subfolders, each corresponding to a different experimental setup. Currently, it contains the experimental files for the COLLAB dataset. The configurations (b, c, e) follow the definitions given in the paper.

Configuration

To adjust the hyperparameters or experimental settings, modify the config.txt file according to your needs.
