Category: A1 Team name: Loris; Dataset: Graphland Benchmark and WikiCS dataset #192

Open
Loris697 wants to merge 16 commits into geometric-intelligence:main from Loris697:integrate-graphland

Conversation

Loris697 (Collaborator) commented Oct 12, 2025

Checklist

  • My pull request has a clear and explanatory title.
  • My pull request passes the Linting test.
  • I added appropriate unit tests and I made sure the code passes all unit tests. (refer to comment below)
  • My PR follows PEP8 guidelines. (refer to comment below)
  • My code is properly documented, using numpy docs conventions, and I made sure the documentation renders properly.
  • I linked to issues and PRs that are relevant to this PR.

Pull Request: Integration of GraphLand Benchmark Datasets

This pull request integrates the datasets from the GraphLand benchmark into the repository. Specifically, I have added:

  1. ✅ A dataset class that subclasses torch_geometric.data.InMemoryDataset.
  2. ✅ A dataloader that implements the AbstractLoader class.
  3. ✅ A Zenodo download class that fetches the datasets.
  4. ✅ A dedicated YAML configuration file for each dataset.
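To illustrate item 3, here is a minimal sketch of what a Zenodo download helper could look like. It is not the class added in this PR: the function names, the record ID, and the file name are hypothetical, and the URL pattern is an assumption about Zenodo's public file links.

```python
from pathlib import Path
from urllib.parse import quote
from urllib.request import urlretrieve

# Assumed URL pattern for a file attached to a public Zenodo record.
ZENODO_FILE_URL = "https://zenodo.org/records/{record_id}/files/{filename}?download=1"


def zenodo_file_url(record_id, filename):
    """Build the download URL for one file of a Zenodo record (hypothetical helper)."""
    return ZENODO_FILE_URL.format(record_id=record_id, filename=quote(filename))


def download_from_zenodo(record_id, filename, root):
    """Download the file into `root` unless it is already cached there."""
    target = Path(root) / filename
    if not target.exists():
        target.parent.mkdir(parents=True, exist_ok=True)
        urlretrieve(zenodo_file_url(record_id, filename), str(target))
    return target
```

Skipping the download when the file already exists keeps repeated dataset instantiations cheap, which matches how InMemoryDataset caches its raw files.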

GraphLand is a benchmark of 14 graph datasets for node property prediction across a wide range of industrial applications. It allows graph ML models to be evaluated on graphs of different sizes, structures, and feature sets in a unified environment. GraphLand also focuses on previously unexplored research questions, such as the extent to which realistic temporal distribution shifts in transductive and inductive settings affect the performance of graph ML models.

In addition, this pull request introduces a configuration file and a dataloader for the Wiki-CS dataset. Its nodes represent computer science articles, its edges are the hyperlinks between them, and its 10 classes correspond to distinct subfields of computer science.

Reference:

Gleb Bazhenov, Oleg Platonov, Liudmila Prokhorenkova (2025). GraphLand: A Landscape of Benchmark Datasets for Graph Machine Learning. arXiv:2409.14500. https://arxiv.org/abs/2409.14500
Peter Mernyei, Catalina Cangea (2022). Wiki-CS: A Wikipedia-Based Benchmark for Graph Neural Networks. arXiv:2007.02901. https://arxiv.org/abs/2007.02901

Additional context

While implementing this integration, I identified some limitations in the current framework that may require further development:

  • Dataset organization: The framework does not currently support subfolders in datasets/graph/ for organizing YAML configuration files. I suggest adding this functionality to improve maintainability as the number of datasets grows.
  • Missing labels (semi-supervised setting): The GraphLand datasets contain nodes with missing labels, and TopoBench does not currently support semi-supervised settings. As a workaround, I added a drop_missing_y flag that removes nodes without labels. A more robust solution would be to handle missing labels during split creation.
  • Missing values in node features: The datasets include missing values in node features. I implemented the default imputation strategy used in the GraphLand paper (most-frequent imputation). However, to avoid data leakage, imputation should ideally be applied after splitting (fit on the training split, then transform the other splits). TopoBench does not currently support this.
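The last two points can be sketched in plain Python. This is an illustration under simplifying assumptions, not the code in this PR: features are lists of rows, missing entries are None or NaN, and all function names are hypothetical. The first helper mimics what a drop_missing_y-style flag does; the other two fit the most-frequent value on the training rows only and then apply it to any split.

```python
import math
from collections import Counter


def is_missing(value):
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def drop_missing_labels(features, labels):
    """Keep only nodes that have a label; also return their original indices."""
    kept = [i for i, y in enumerate(labels) if not is_missing(y)]
    return [features[i] for i in kept], [labels[i] for i in kept], kept


def fit_most_frequent(train_rows):
    """Learn one fill value per feature column from the training rows only."""
    fill = []
    for col in range(len(train_rows[0])):
        observed = [row[col] for row in train_rows if not is_missing(row[col])]
        fill.append(Counter(observed).most_common(1)[0][0] if observed else None)
    return fill


def impute(rows, fill):
    """Replace missing entries with the fill values learned on the training split."""
    return [[fill[c] if is_missing(v) else v for c, v in enumerate(row)] for row in rows]


# Toy data: five nodes, two features, one node without a label.
features = [[1.0, None], [1.0, 5.0], [2.0, 5.0], [None, 7.0], [3.0, 7.0]]
labels = [0, 1, None, 1, 0]

x, y, kept = drop_missing_labels(features, labels)
train_rows, test_rows = x[:2], x[2:]
fill = fit_most_frequent(train_rows)      # statistics come from training rows only
test_imputed = impute(test_rows, fill)    # test rows never influence the fill values
```

In a real graph, dropping nodes also requires filtering and reindexing the edges; the returned `kept` indices are what such a remapping would be built from.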

Loris697 changed the title from "Category: A1 Team name: Loris; Dataset: Graphland Benchmark" to "Category: A1 Team name: Loris; Dataset: Graphland Benchmark and WikiCS dataset" on Oct 16, 2025
levtelyatnikov added the category-a1 label (Submission to TDL Challenge 2025: Mission A, Category 1.) on Nov 24, 2025
gbg141 (Collaborator) commented Nov 26, 2025

Hi @Loris697! Did you fill out the required Google Form with the information about your PR? We can't find an entry assigned to it.

Thank you!

Loris697 (Collaborator, Author) commented

Hi @gbg141, my apologies. I was sure I had already completed the required Google Form. I’ve filled it out now.
