Explainability Benchmark Dataset Framework #6104

rfdavid · 2022-12-01T04:46:42Z

This PR implements the overall framework and API for generating benchmark datasets (#5817). Any feedback is appreciated, especially on the proposed architecture.

Implementing a new graph generator

Create a class inside generators that inherits from GraphGenerator.
Implement generate_base_graph using the provided methods (generate_feature and attach_motif).
See BAGraph dataset for GNNExplainer #6072 and Explainability Dataset Task: ERGraph #6073 for example implementations.
The Motif generator currently used here will be replaced by Custom motif generation #6179.
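
The steps above can be sketched roughly as follows. This is a simplified, torch-free illustration, not the code in this PR: the GraphGenerator base class here is a stand-in written only for the example, RingGraph is a hypothetical generator, and the attach_motif signature and the way a motif is wired to the base graph are assumptions.

```python
import random
from abc import ABC, abstractmethod


class GraphGenerator(ABC):
    """Stand-in for the PR's base class (simplified, no torch dependency)."""

    def __init__(self, num_nodes: int, seed: int = 0):
        self.num_nodes = num_nodes
        self.rng = random.Random(seed)
        self.edges = []  # list of (src, dst) pairs
        self.x = []      # one feature row per node
        self.generate_base_graph()

    def generate_feature(self, num_features: int = 10) -> None:
        # Constant all-ones features, mirroring the torch.ones call in the diff.
        self.x = [[1.0] * num_features for _ in range(self.num_nodes)]

    def attach_motif(self, motif_edges, motif_num_nodes: int) -> None:
        # Assumption: append the motif's nodes after the existing ones and
        # connect the motif's first node to a randomly chosen existing node.
        offset = self.num_nodes
        anchor = self.rng.randrange(offset)
        self.edges += [(u + offset, v + offset) for u, v in motif_edges]
        self.edges.append((anchor, offset))
        self.num_nodes += motif_num_nodes

    @abstractmethod
    def generate_base_graph(self) -> None:
        ...


class RingGraph(GraphGenerator):
    """Hypothetical generator: a ring base graph with one triangle attached."""

    def generate_base_graph(self) -> None:
        n = self.num_nodes
        self.edges = [(i, (i + 1) % n) for i in range(n)]
        self.attach_motif([(0, 1), (1, 2), (2, 0)], motif_num_nodes=3)
        # Features are generated last so that they cover the motif nodes too.
        self.generate_feature(num_features=4)


graph = RingGraph(num_nodes=6)
print(graph.num_nodes, len(graph.edges), len(graph.x))  # 9 10 9
```

In this sketch a new generator only has to describe its base topology; feature generation and motif attachment are inherited.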

Example for the final user

motif = Motif('house')
generator = BAGraph(num_nodes=300, num_motifs=80, motif=motif)
dataset = ExplainerDataset(generator)

TODO:

Finish graph generator (provide all necessary methods: feature generator, label generator)
Add tests
Add to Change log
Documentation

RexYing

Thanks for the great PR :) We also had a great discussion, and I'm just noting down some of the comments I made.

torch_geometric/datasets/generators/graph_generator.py

torch_geometric/datasets/generators/motif.py

torch_geometric/datasets/generators/graph_generator.py

rfdavid · 2022-12-02T15:49:42Z

Thanks for the great PR :) We also had a great discussion, and I'm just noting down some of the comments I made.

It was a great pleasure for me, thank you so much for the insights. All points are clear, I'll address them and commit my changes soon.


codecov · 2022-12-05T05:12:36Z

Codecov Report

Merging #6104 (9b191d1) into master (bc47556) will decrease coverage by 1.87%.
The diff coverage is 83.87%.

❗ Current head 9b191d1 differs from pull request most recent head 9c4bba3. Consider uploading reports for the commit 9c4bba3 to get more accurate results

@@            Coverage Diff             @@
##           master    #6104      +/-   ##
==========================================
- Coverage   86.43%   84.55%   -1.88%     
==========================================
  Files         371      372       +1     
  Lines       20877    20848      -29     
==========================================
- Hits        18046    17629     -417     
- Misses       2831     3219     +388

Impacted Files	Coverage Δ
torch_geometric/resolver.py	`82.75% <82.75%> (ø)`
torch_geometric/nn/resolver.py	`88.70% <100.00%> (-1.07%)`	⬇️
torch_geometric/nn/models/dimenet_utils.py	`0.00% <0.00%> (-75.52%)`	⬇️
torch_geometric/nn/models/dimenet.py	`14.90% <0.00%> (-52.76%)`	⬇️
torch_geometric/profile/profile.py	`33.33% <0.00%> (-25.39%)`	⬇️
torch_geometric/nn/conv/utils/typing.py	`83.75% <0.00%> (-15.00%)`	⬇️
torch_geometric/nn/pool/asap.py	`92.10% <0.00%> (-7.90%)`	⬇️
torch_geometric/nn/dense/linear.py	`85.40% <0.00%> (-7.84%)`	⬇️
torch_geometric/nn/inits.py	`67.85% <0.00%> (-7.15%)`	⬇️
torch_geometric/transforms/add_self_loops.py	`94.44% <0.00%> (-5.56%)`	⬇️
... and 30 more



RexYing

Thanks, overall it looks great now. I'll let Zecheng or Charles have a look in case I missed anything.

zechengz

Thank you :) ! In general the PR looks good to me. Left some comments.

zechengz · 2022-12-08T03:04:20Z

torch_geometric/datasets/generators/graph_generator.py

+    def generate_feature(self, num_features: int = 10) -> None:
+        self._x = torch.ones((self.num_nodes, num_features), dtype=torch.float)
+
+    def attach_motif(self, num_motifs=80) -> None:


Suggested change

-    def attach_motif(self, num_motifs=80) -> None:
+    def attach_motif(self, num_motifs: int = 80):

It would also be appreciated if we could add a docstring here.

zechengz · 2022-12-08T03:06:16Z

torch_geometric/datasets/generators/graph_generator.py

+        self.generate_base_graph()
+
+    @property
+    def data(self):


Suggested change

-    def data(self):
+    def data(self) -> Data:

torch_geometric/datasets/generators/graph_generator.py

zechengz · 2022-12-08T03:18:39Z

torch_geometric/datasets/generators/graph_generator.py

+
+class GraphGenerator:
+    r"""Base class for generating benchmark datasets. It contains
+        `generate_feature` and `attach_motif` methods used to generate


Suggested change

-        `generate_feature` and `attach_motif` methods used to generate
+        :meth:`generate_feature` and :meth:`attach_motif` to generate

zechengz · 2022-12-08T03:35:25Z

torch_geometric/datasets/generators/graph_generator.py

+            Motif object to be attached to the base graph.
+        seed (int, Optional): seed number for the generator.
+    """
+    def __init__(self, num_nodes: int = 300, motif: Optional[Callable] = None,


I think motif here should not be a Callable?

zechengz · 2022-12-08T03:54:24Z

torch_geometric/datasets/generators/graph_generator.py

+        currently attached in random order.
+
+    Args:
+        num_nodes (int): The number of nodes used to attach the motifs.


Suggested change

-        num_nodes (int): The number of nodes used to attach the motifs.
+        num_nodes (int): Number of nodes in the base graph.

Does the base graph indicate the graph before attaching the motifs?

yes, exactly.

zechengz · 2022-12-08T04:08:54Z

torch_geometric/datasets/generators/graph_generator.py

+            self.num_nodes += self.motif.num_nodes
+
+        self._expl_mask = torch.zeros(self.num_nodes, dtype=torch.bool)
+        self._expl_mask[torch.arange(self.motif.num_nodes * num_motifs,


Just curious: why do we set the explanation mask to True for nodes starting from self.motif.num_nodes * num_motifs, with a step of self.motif.num_nodes?

This is a very good question, and I don't have an answer. I'm following the current implementation in the ba_shapes.py dataset, which generates a base graph with 300 nodes plus 80 houses (80 * 5 nodes) and sets the explanation mask to True starting from index 400.
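
To make the numbers in this exchange concrete, here is a small arithmetic sketch. It uses plain Python range in place of torch.arange. The start value comes from the diff and the step from the question above; the stop value (the total node count) is an assumption, since the remaining arguments are cut off in the quoted diff.

```python
# Numbers quoted in the reply above: a 300-node base graph plus 80 five-node houses.
base_nodes = 300
num_motifs = 80
motif_nodes = 5

total_nodes = base_nodes + num_motifs * motif_nodes
start = motif_nodes * num_motifs  # start index as written in the diff

# Assumption: the range stops at the total node count; the step is the
# motif size, as described in the review question.
marked = list(range(start, total_nodes, motif_nodes))

print(total_nodes, start, len(marked))  # 700 400 60
print(marked[:3], marked[-1])           # [400, 405, 410] 695
```

Under these assumptions, only 60 indices are marked, one every five nodes from 400 onward. If the motif nodes are appended directly after the 300 base nodes (also an assumption), they would occupy indices 300 to 699, so a start of 400 would fall inside the motif range rather than at its beginning, which seems to be what the question is getting at.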

test/explain/test_graph_generator.py

torch_geometric/datasets/explainer_dataset.py

BlazStojanovic

Thank you for your amazing work @rfdavid! 👍🏻

I just have one caveat regarding the return type of the benchmark datasets: I am of the opinion that it should be Explanation instead of Data (sorry for not chiming in on the discussion earlier). This would make more sense because these benchmark datasets really return ground-truth explanations, and it would make the interface to explanation evaluation clearer. I'm also interested in what others think about this (mainly @RexYing).

torch_geometric/datasets/generators/graph_generator.py

rusty1s · 2022-12-12T08:59:52Z

Hey @rfdavid, thank you for this awesome PR. Really like it. I still made some modifications in order to separate concerns. For example, I think that ExplainerDataset should take care of attaching motifs, not the underlying graph generator. As such, I split the logic into MotifGenerator and GraphGenerator, which come together in the ExplainerDataset. I also added automatic resolving of generator names such that one can do motif_generator="house". Hope the changes are okay with you.
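
The separation of concerns described in this comment might look roughly like the sketch below. It is a torch-free illustration and not the merged implementation: the class bodies, the MOTIF_REGISTRY dictionary, the resolve_motif helper, the PathGraph generator, and the ExplainerDataset constructor signature are all hypothetical. Only the class names MotifGenerator, GraphGenerator, and ExplainerDataset and the motif_generator="house" usage come from the comment.

```python
import random


class MotifGenerator:
    """Hypothetical stand-in: calling it returns (num_nodes, edge_list)."""

    def __call__(self):
        raise NotImplementedError


class HouseMotif(MotifGenerator):
    def __call__(self):
        # Five-node "house": a square 0-1-2-3 with roof node 4 on top.
        return 5, [(0, 1), (1, 2), (2, 3), (3, 0), (0, 4), (1, 4)]


class PathGraph:
    """Hypothetical base-graph generator; it knows nothing about motifs."""

    def __init__(self, num_nodes: int):
        self.num_nodes = num_nodes

    def __call__(self):
        return self.num_nodes, [(i, i + 1) for i in range(self.num_nodes - 1)]


MOTIF_REGISTRY = {"house": HouseMotif}


def resolve_motif(query):
    # Accept either a ready-made generator or a registered name.
    if isinstance(query, MotifGenerator):
        return query
    return MOTIF_REGISTRY[query]()


class ExplainerDataset:
    """Combines both generators and records which nodes belong to a motif."""

    def __init__(self, graph_generator, motif_generator, num_motifs, seed=0):
        rng = random.Random(seed)
        motif = resolve_motif(motif_generator)
        num_nodes, edges = graph_generator()
        base_nodes = num_nodes
        node_mask = [False] * num_nodes
        for _ in range(num_motifs):
            m_nodes, m_edges = motif()
            # Shift motif node ids past the existing nodes, then link the
            # motif's first node to a random node of the base graph.
            edges += [(u + num_nodes, v + num_nodes) for u, v in m_edges]
            edges.append((rng.randrange(base_nodes), num_nodes))
            node_mask += [True] * m_nodes
            num_nodes += m_nodes
        self.num_nodes, self.edges, self.node_mask = num_nodes, edges, node_mask


dataset = ExplainerDataset(PathGraph(10), motif_generator="house", num_motifs=2)
print(dataset.num_nodes, len(dataset.edges), sum(dataset.node_mask))  # 20 23 10
```

The point of the design is that the base-graph generator and the motif generator stay independent of each other, and only the dataset class knows how to combine them and which nodes form the ground-truth explanation.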

This PR implements BA graphs following the framework implemented in #6104. Depends on #6104.

- BAGraph uses `barabasi_albert_graph` logic from `utils.py` to generate the base graph and then call `generate_feature` and `attach_motif` from GraphGenerator
- ExplainerDataset is the main interface to call the generator and return the dataset object

#### Example of using BA Shapes

```
motif = Motif('house')
generator = BAGraph(num_nodes=300, num_motifs=80, motif=motif, seed=1234)
dataset = ExplainerDataset(generator)
```

The following files are NOT part of this repository and will be removed since it is part of #6104. I included those to facilitate the reproducibility of the feature:

- `torch_geometric/datasets/explainer_dataset.py`
- `torch_geometric/datasets/generators/graph_generator.py`
- `torch_geometric/datasets/generators/motif.py`

This PR is part of the task defined in #5817.

Co-authored-by: Matthias Fey <matthias.fey@tu-dortmund.de>

@rfdavid

This task is part of [GNN Explainability Dataset Generation](#5817)

Implementation of the base class for motif generation. It depends on [the framework and API to generate benchmark datasets](#6104).

The base class follows some of the structure from @rfdavid PRs. However, I changed a bit since I believe using Data is a better and cleaner approach to generate structures and create wrappers for other structures in PyG or NetworkX.

Once I have some feedback about the `MotifGenerator`, I will go ahead and

- [x] update the `GraphGenerator.attach_motif()`
- [x] add tests
- [x] documentation.

Co-authored-by: Emanuel Seemann <3380606+seemanne@users.noreply.github.com>
Co-authored-by: Matthias Fey <matthias.fey@tu-dortmund.de>

rfdavid · 2022-12-12T22:30:45Z

Hey @rfdavid, thank you for this awesome PR. Really like it. I still made some modifications in order to separate concerns. For example, I think that ExplainerDataset should take care of attaching motifs, not the underlying graph generator. As such, I split the logic into MotifGenerator and GraphGenerator, which come together in the ExplainerDataset. I also added automatic resolving of generator names such that one can do motif_generator="house". Hope the changes are okay with you.

Thank you, @rusty1s! That's way better. I hope my previous implementation was helpful somehow. I'll follow the pattern from your development for the next implementations.

rusty1s · 2022-12-13T09:18:26Z

Yes, it was great :)

rfdavid added 2 commits November 30, 2022 22:30

Benchmark dataset (overall framework and API)

60f71ac

added generate_base_graph

3ed03ac

rfdavid mentioned this pull request Dec 1, 2022

BAGraph dataset for GNNExplainer #6072

Merged

rusty1s assigned rfdavid Dec 1, 2022

rusty1s added the feature, 0 - Priority P0, dataset, and explain labels Dec 1, 2022

rusty1s changed the title ~~Benchmark dataset~~ Explainability Benchmark Dataset Framework Dec 1, 2022

rusty1s self-assigned this Dec 1, 2022

RexYing requested changes Dec 1, 2022


rfdavid added 6 commits December 2, 2022 16:19

desgin pattern changes and added tests

8d90589

design pattern changes and added tests

95adbb0

Merge branch 'benchmark_dataset' of github.com:rfdavid/pytorch_geomet…

b03edb7

…ric into benchmark_dataset

Merge branch 'master' into benchmark_dataset

6d24f5a

Merge branch 'benchmark_dataset' of github.com:rfdavid/pytorch_geomet…

a7446db

…ric into benchmark_dataset

added seed option

7e4d600

added seed support, tests and class descriptions

3f65984

- Added seed in graph generator - Added documentation in graph and motif classes - Simplified GraphGenerator base class - Fixed test error - Added tests - Modified Changelog

rfdavid added a commit to rfdavid/pytorch_geometric that referenced this pull request Dec 7, 2022

added files from depeneding branch

d96404c

- Added files from pyg-team#6104 - Change ba_graph.py and graph_generator.py to use the Benchmark Dataset Framework API

RexYing requested a review from zechengz December 7, 2022 16:36

RexYing approved these changes Dec 7, 2022


added reproducibility test

3ea51b3

zechengz reviewed Dec 8, 2022


cuent mentioned this pull request Dec 8, 2022

Custom motif generation #6179

Merged

3 tasks

BlazStojanovic reviewed Dec 8, 2022


torch_geometric/datasets/generators/graph_generator.py

rfdavid added 3 commits December 8, 2022 23:05

Merge branch 'master' into benchmark_dataset

7ec8815

Code refactoring and annotations

4ca1e9c

Change Data to Explanation class

a22e66c

rfdavid added 3 commits December 9, 2022 14:44

Merge branch 'master' into benchmark_dataset

1d1b492

Added TODO to remove class from test

878329f

Merge branch 'master' into benchmark_dataset

93a3788

shenoynikhil mentioned this pull request Dec 10, 2022

Explainability Dataset Task: ERGraph #6073

Merged

2 tasks

rfdavid and others added 3 commits December 11, 2022 23:36

Merge branch 'master' into benchmark_dataset

69dfac1

update

72f0403

move

35d1e4a

rusty1s approved these changes Dec 12, 2022


rusty1s added 4 commits December 12, 2022 09:41

doc-string

fa6b90a

update

3b56d4f

update

9b191d1

update

9c4bba3

rusty1s merged commit e3c50c1 into pyg-team:master Dec 12, 2022
