-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Explainability Benchmark Dataset Framework #6104
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the great pr:) We also had a great discussion and I just note down some comments I made.
It was a great pleasure for me, thank you so much for the insights. All points are clear, I'll address them and commit my changes soon. |
…ric into benchmark_dataset
…ric into benchmark_dataset
Codecov Report
@@ Coverage Diff @@
## master #6104 +/- ##
==========================================
- Coverage 86.43% 84.55% -1.88%
==========================================
Files 371 372 +1
Lines 20877 20848 -29
==========================================
- Hits 18046 17629 -417
- Misses 2831 3219 +388
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
- Added seed in graph generator - Added documentation in graph and motif classes - Simplified GraphGenerator base class - Fixed test error - Added tests - Modified Changelog
- Added files from pyg-team#6104 - Change ba_graph.py and graph_generator.py to use the Benchmark Dataset Framework API
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks overall it looks great now. I'll let Zecheng or Charles to have a look in case I miss anything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you :) ! In general the PR looks good to me. Left some comments.
def generate_feature(self, num_features: int = 10) -> None: | ||
self._x = torch.ones((self.num_nodes, num_features), dtype=torch.float) | ||
|
||
def attach_motif(self, num_motifs=80) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def attach_motif(self, num_motifs=80) -> None: | |
def attach_motif(self, num_motifs: int = 80): |
And it will be appreciated if we can add some doc string here.
self.generate_base_graph() | ||
|
||
@property | ||
def data(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def data(self): | |
def data(self) -> Data: |
|
||
class GraphGenerator: | ||
r"""Base class for generating benchmark datasets. It contains | ||
`generate_feature` and `attach_motif` methods used to generate |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
`generate_feature` and `attach_motif` methods used to generate | |
:meth:`generate_feature` and :meth:`attach_motif` to generate |
Motif object to be attached to the base graph. | ||
seed (int, Optional): seed number for the generator. | ||
""" | ||
def __init__(self, num_nodes: int = 300, motif: Optional[Callable] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think motif
here should not be a Callable
?
currently attached in random order. | ||
|
||
Args: | ||
num_nodes (int): The number of nodes used to attach the motifs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
num_nodes (int): The number of nodes used to attach the motifs. | |
num_nodes (int): Number of nodes in the base graph. |
Does the base graph indicate the graph before attaching the motifs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, exactly.
self.num_nodes += self.motif.num_nodes | ||
|
||
self._expl_mask = torch.zeros(self.num_nodes, dtype=torch.bool) | ||
self._expl_mask[torch.arange(self.motif.num_nodes * num_motifs, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious why we assign the nodes start from self.motif.num_nodes * num_motifs
with the step self.motif.num_nodes
with the explain mask True
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a very good question, and I don't have an answer. I'm following the current implementation on ba_shapes.py
dataset, which generates a base graph with 300 nodes + 80 houses (80 * 5 nodes), and assign the explain mask True
, starting from 400.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your amazing work @rfdavid! 👍🏻
I just have one caveat regarding the return type of benchmark datasets, I am of the opinion that it should be Explanation
instead of Data
(sorry for not chiming into the discussion earlier). This would make more sense because these benchmark datasets really return ground truth explanations, and it will make interfacing to evaluation of explanations more clear. Also interested what others think about this (mainly @RexYing)
Hey @rfdavid, thank you for this awesome PR. Really like it. I still made some modifications in order to separate concerns. For example, I think that |
This PR implements BA graphs following the framework implemented in #6104. Depends on #6104. - BAGraph uses `barabasi_albert_graph` logic from `utils.py` to generate the base graph and then call `generate_feature` and `attach_motif` from GraphGenerator - ExplainerDataset is the main interface to call the generator and return the dataset object #### Example of using BA Shapes ``` motif = Motif('house') generator = BAGraph(num_nodes=300, num_motifs=80, motif=motif, seed=1234) dataset = ExplainerDataset(generator) ``` The following files are NOT part of this repository and will be removed since it is part of #6104. I included those to facilitate the reproducibility of the feature: - `torch_geometric/datasets/explainer_dataset.py` - `torch_geometric/datasets/generators/graph_generator.py` - `torch_geometric/datasets/generators/motif.py` This PR is part of the task defined in #5817. Co-authored-by: Matthias Fey <matthias.fey@tu-dortmund.de>
This task is part of [GNN Explainability Dataset Generation](#5817) Implementation of the base class for motif generation. It depends on [the framework and API to generate benchmark datasets](#6104). The base class follows some of the structure from @rfdavid PRs. However, I changed a bit since I believe using Data is a better and cleaner approach to generate structures and create wrappers for other structures in PyG or NetworkX. Once I have some feedback about the `MotifGenerator`, I will go ahead and - [x] update the `GraphGenerator.attach_motif()` - [x] add tests - [x] documentation. Co-authored-by: Emanuel Seemann <3380606+seemanne@users.noreply.github.com> Co-authored-by: Matthias Fey <matthias.fey@tu-dortmund.de>
Thank you, @rusty1s! That's way better. I hope my previous implementation was helpful somehow. I'll follow the pattern from your development for the next implementations. |
Yes, it was great :) |
This PR implements benchmark dataset overall framework and API for generating benchmark datasets (#5817). Any feedback is appreciated, especially regarding the proposed architecture.
Implementing a new graph generator
generate_base_graph
using the provided methods (generate_feature
andattach_motif
)BAGraph
dataset forGNNExplainer
#6072 and Explainability Dataset Task:ERGraph
#6073 to check how it was implementedExample for the final user
TODO: