GraphGym incompatible with custom PyG torch_geometric.data.Dataset datasets #6474

Open
Sann5 wants to merge 3 commits into base: master

@Sann5 Sann5 commented Jan 19, 2023

GraphGym assumes that any PyG dataset you use is a torch_geometric.data.InMemoryDataset. This is problematic: if, like me, you want to do model selection on a custom-made torch_geometric.data.Dataset, you can't, at least not out of the box.

Specifically, the issue comes from torch_geometric/graphgym/loader.py, which accesses the dataset._data attribute in several places; torch_geometric.data.Dataset datasets have no such attribute. I replaced these accesses with dataset.<a method>() calls to get information about the dataset.
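
To illustrate the idea of avoiding dataset._data, here is a minimal sketch, not the code from this PR. It assumes only that both dataset types support len() and integer indexing. InMemoryLike, OnDiskLike, and summarize are hypothetical stand-ins written in plain Python, and graphs are represented as dicts purely for illustration (real PyG graphs are Data objects).

```python
from typing import Any, Dict, List


class InMemoryLike:
    """Hypothetical stand-in: holds every graph in memory and exposes `_data`."""

    def __init__(self, graphs: List[Dict[str, Any]]) -> None:
        self._graphs = graphs
        self._data = graphs  # private attribute that on-disk datasets lack

    def __len__(self) -> int:
        return len(self._graphs)

    def __getitem__(self, idx: int) -> Dict[str, Any]:
        return self._graphs[idx]


class OnDiskLike:
    """Hypothetical stand-in: builds each graph on demand and has no `_data`."""

    def __init__(self, num_graphs: int) -> None:
        self._num_graphs = num_graphs

    def __len__(self) -> int:
        return self._num_graphs

    def __getitem__(self, idx: int) -> Dict[str, Any]:
        if not 0 <= idx < self._num_graphs:
            raise IndexError(idx)
        return {"num_nodes": 3 + idx, "y": idx % 2}


def summarize(dataset) -> Dict[str, int]:
    """Collect dataset facts using only len() and indexing, never `_data`."""
    num_graphs = len(dataset)
    total_nodes = sum(dataset[i]["num_nodes"] for i in range(num_graphs))
    labels = {dataset[i]["y"] for i in range(num_graphs)}
    return {
        "num_graphs": num_graphs,
        "total_nodes": total_nodes,
        "num_labels": len(labels),
    }
```

Because summarize relies only on the shared interface, the same call works for either stand-in. The trade-off is that it touches every graph once, which can be slow for large on-disk datasets.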

Additionally, since the data-loader calling strategy depended on the dataset._data attribute, I designed a new one. With these changes, both types of PyG datasets can be used: I am now able to run GraphGym on my own dataset as well as on the ones used in the examples. I tried to retain as much of the original code as possible to avoid unforeseen issues. Please let me know if anything needs explanation.

Cheers!

How to reproduce the error:

  1. Clone PyG, or make a new branch.
  2. Add a register for a torch_geometric.data.Dataset dataset to /pytorch_geometric/graphgym/custom_graphgym/loader.
  3. Place your data wherever you please.
  4. Adjust the path in your config to the location of the data.
  5. Run pytorch_geometric/graphgym/main.py.

Here are my examples of some data, a register, and the config I used.
example_files 2.zip

codecov bot commented Jan 19, 2023

Codecov Report

Merging #6474 (f24ed0b) into master (5d777e7) will increase coverage by 0.21%.
The diff coverage is 81.81%.

@@            Coverage Diff             @@
##           master    #6474      +/-   ##
==========================================
+ Coverage   85.09%   85.31%   +0.21%     
==========================================
  Files         402      402              
  Lines       21672    21726      +54     
==========================================
+ Hits        18442    18535      +93     
+ Misses       3230     3191      -39     
Impacted Files                          Coverage Δ
torch_geometric/graphgym/loader.py      42.58% <81.81%> (+2.32%) ⬆️
torch_geometric/deprecation.py          100.00% <0.00%> (ø)
torch_geometric/nn/kge/base.py          95.83% <0.00%> (ø)
torch_geometric/nn/aggr/base.py         95.65% <0.00%> (ø)
torch_geometric/nn/pool/asap.py         92.00% <0.00%> (ø)
torch_geometric/data/__init__.py        100.00% <0.00%> (ø)
torch_geometric/nn/kge/transe.py        100.00% <0.00%> (ø)
torch_geometric/loader/cluster.py       95.06% <0.00%> (ø)
torch_geometric/graphgym/config.py      95.07% <0.00%> (ø)
torch_geometric/nn/models/captum.py     100.00% <0.00%> (ø)
... and 40 more

Sann5 commented Jan 19, 2023

I would like to point out a bug that this PR does not address. Suppose you set up the data, the register, and the config as described above. If you then run an experiment on one of the other PyG datasets (not your custom dataset) with a different config, for example graphgym/configs/pyg/example_graph.yaml, you will get an error. This happens because the register (example_register.py) is still imported, so pytorch_geometric/graphgym/main.py tries to call the register function with the dataset: name specified in example_graph.yaml, but no such location or data exists. The simple workaround is to delete the register before running the experiment, but it's annoying to do so every time you want to use a dataset that is not compatible with your register function.
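
One possible way around this, sketched below under stated assumptions, is to have the register function decline names it does not recognize instead of trying to open them. The sketch assumes that GraphGym tries each registered loader in turn and falls back to its built-in loading when every loader returns None. LOADERS, register_loader, load_my_custom, load_dataset, and the dataset name my_custom_dataset are all hypothetical stand-ins in plain Python, not GraphGym's actual code.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical stand-in for GraphGym's loader registry.
LOADERS: Dict[str, Callable[[str, str, str], Optional[List[str]]]] = {}


def register_loader(key: str):
    """Decorator that records a loader function under `key`."""
    def decorator(func):
        LOADERS[key] = func
        return func
    return decorator


@register_loader("my_custom")
def load_my_custom(format: str, name: str, dataset_dir: str) -> Optional[List[str]]:
    # Decline anything that is not ours rather than looking for its files.
    if format != "PyG" or name != "my_custom_dataset":
        return None
    # Placeholder for constructing the real custom dataset.
    return [f"{dataset_dir}/{name}/graph_{i}" for i in range(3)]


def load_dataset(format: str, name: str, dataset_dir: str):
    """Assumed dispatch: the first loader returning non-None wins."""
    for func in LOADERS.values():
        dataset = func(format, name, dataset_dir)
        if dataset is not None:
            return dataset
    # Assumed fallback to the built-in datasets.
    return f"built-in:{name}"
```

With this guard in place, the register can stay in the loader directory while configs that name other datasets pass through to the fallback.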

rusty1s changed the title from "GraphGym incompatible with custom PyG torch_geometric.data.Dataset datasets." to "GraphGym incompatible with custom PyG torch_geometric.data.Dataset datasets" on Jan 21, 2023