Category: A1 Team name: Loris; Dataset: Graphland Benchmark and WikiCS dataset by Loris697 · Pull Request #192 · geometric-intelligence/TopoBench

Loris697 · 2025-10-12T14:46:19Z

Checklist

My pull request has a clear and explanatory title.
My pull request passes the Linting test.
I added appropriate unit tests and I made sure the code passes all unit tests. (refer to comment below)
My PR follows PEP8 guidelines. (refer to comment below)
My code is properly documented, using numpy docs conventions, and I made sure the documentation renders properly.
I linked to issues and PRs that are relevant to this PR.

Pull Request: Integration of GraphLand Benchmark Datasets

This pull request integrates the datasets from the GraphLand benchmark into the repository. Specifically, I have added:

✅ A dataset class that extends torch_geometric.data.InMemoryDataset.
✅ A dataloader that implements the AbstractLoader class.
✅ A Zenodo download class that fetches the datasets.
✅ A dedicated YAML configuration file for each dataset.
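
For readers unfamiliar with the download step, here is a minimal standard-library sketch of what fetching a file from Zenodo could look like. It is not the code in this PR: the helper names `zenodo_file_url` and `fetch` are hypothetical, and the URL pattern is an assumption based on Zenodo's public record pages.

```python
import shutil
import urllib.request
from pathlib import Path
from urllib.parse import quote


def zenodo_file_url(record_id, filename):
    """Build a direct-download URL for one file of a Zenodo record.

    Assumption: the record exposes files under /records/<id>/files/<name>.
    """
    return f"https://zenodo.org/records/{record_id}/files/{quote(filename)}?download=1"


def fetch(url, dest_dir, filename):
    """Download `url` to dest_dir/filename, reusing an existing copy if present."""
    dest = Path(dest_dir) / filename
    if dest.exists():
        return dest  # already downloaded, skip the network round trip
    dest.parent.mkdir(parents=True, exist_ok=True)
    tmp = dest.with_name(dest.name + ".part")
    with urllib.request.urlopen(url) as response, open(tmp, "wb") as out:
        shutil.copyfileobj(response, out)
    tmp.replace(dest)  # rename only after the download completed
    return dest
```

Writing to a `.part` file first means an interrupted download is never mistaken for a cached dataset on the next run.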

GraphLand is a benchmark of 14 graph datasets for predicting node properties across a wide range of industrial applications. It lets you evaluate graph ML models on graphs of different sizes, structures, and feature sets in a unified environment. GraphLand also focuses on previously unexplored research questions, such as the extent to which realistic temporal distribution shifts in transductive and inductive settings affect the performance of graph ML models.

This pull request also introduces a configuration file and a dataloader for the Wiki-CS dataset. Its nodes represent computer science articles, its edges are the hyperlinks between them, and its 10 classes correspond to distinct subfields of computer science.

Reference:

Gleb Bazhenov, Oleg Platonov, Liudmila Prokhorenkova (2025). GraphLand: A Landscape of Benchmark Datasets for Graph Machine Learning. arXiv:2409.14500. https://arxiv.org/abs/2409.14500
Peter Mernyei, Catalina Cangea (2022). Wiki-CS: A Wikipedia-Based Benchmark for Graph Neural Networks. https://arxiv.org/abs/2007.02901

Additional context

While implementing this integration, I identified some limitations in the current framework that may require further development:

Dataset organization: The framework does not currently support subfolders in datasets/graph/ for organizing YAML configuration files. I suggest adding this functionality to improve maintainability as the number of datasets grows.
Missing labels (semi-supervised setting): The GraphLand datasets contain nodes with missing labels, and TopoBench does not currently support semi-supervised settings. As a workaround, I added a drop_missing_y flag that removes nodes without labels. A more robust solution would be to handle missing labels during split creation.
Missing values in node features: The datasets include missing values in node features. I implemented the default imputation strategy used in the GraphLand paper (most-frequent imputation). However, to avoid data leakage, imputation should ideally be applied after splitting (fit on the training set, then transform the test set). This is not currently supported in TopoBench.
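
The last two suggestions can be illustrated with a small pure-Python sketch. It is not TopoBench code: the function names are hypothetical, missing entries are assumed to be `None` or NaN, and features are assumed to be a list of rows. It filters unlabeled nodes out of each split instead of deleting them from the graph, and it learns the most-frequent fill values from training rows only.

```python
import math
from collections import Counter


def is_missing(value):
    """Treat None and NaN as missing (an assumption about the encoding)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def labeled_split(labels, train_idx, val_idx, test_idx):
    """Keep only labeled nodes in each split; the graph itself is untouched."""
    def keep(idx):
        return [i for i in idx if not is_missing(labels[i])]
    return keep(train_idx), keep(val_idx), keep(test_idx)


def fit_most_frequent(features, train_idx):
    """Learn one fill value per feature column from training rows only."""
    fill = []
    for col in range(len(features[0])):
        observed = [features[i][col] for i in train_idx
                    if not is_missing(features[i][col])]
        # Fall back to 0.0 when a column has no observed training value.
        fill.append(Counter(observed).most_common(1)[0][0] if observed else 0.0)
    return fill


def transform(features, fill):
    """Replace missing entries in every row with the training-set fill values."""
    return [[fill[c] if is_missing(v) else v for c, v in enumerate(row)]
            for row in features]


nan = float("nan")
features = [[1.0, nan], [1.0, 5.0], [nan, 5.0], [2.0, 7.0], [nan, 7.0]]
labels = [0, None, 1, 0, nan]
train_idx, val_idx, test_idx = [0, 1, 2], [3], [4]

fill = fit_most_frequent(features, train_idx)  # statistics come from training rows
imputed = transform(features, fill)            # applied to all rows
splits = labeled_split(labels, train_idx, val_idx, test_idx)
```

Because the fill values never see validation or test rows, the held-out nodes cannot influence preprocessing, which is the leakage concern raised above.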


gbg141 · 2025-11-26T01:58:34Z

Hi @Loris697! Did you fill out the required Google Form with the information of your PR? We don't find an entry assigned to your PR.

Thank you!

Loris697 · 2025-11-26T10:14:00Z

Hi @gbg141, my apologies. I was sure I had already completed the required Google Form. I’ve filled it out now.

Loris697 added 12 commits October 8, 2025 15:45

Introducing WikiCS dataset

221869c

Complete introduction of wikics

5ba98db

Introducing all the datasets from GraphLand

9c608c2

Introducing node features imputing

7f4090a

Introducing nodes with missing y

Implementing Linting changes

c7383f0

Formatting import

6e20dea

Correcting whitespace

cb25b67

Fix string too long

b30d75b

Fix formatting

2d0a80b

Fix error web-topics configuration

1f7c779

Fix wiki_ccs configuration

61ee598

Modifying test loaders to avoid overflow of memory and time

66bea5d

Loris697 changed the title ~~Category: A1 Team name: Loris; Dataset: Graphland Benchmark~~ Category: A1 Team name: Loris; Dataset: Graphland Benchmark and WikiCS dataset Oct 16, 2025

Loris697 added 4 commits October 16, 2025 16:07

Comment ZINC

6f1e72c

Changing test on datasets

c07c396

Deleting other dataset from test

47bb68c

Not using all the dataset in the testing phase

81d855a

levtelyatnikov added the category-a1 Submission to TDL Challenge 2025: Mission A, Category 1. label Nov 24, 2025

anilkeshwani mentioned this pull request Mar 22, 2026

Fix GraphLand categorical feature encoding and update num_features #289

Open

5 tasks

Loris697 wants to merge 16 commits into geometric-intelligence:main from Loris697:integrate-graphland


3 participants
