Explainability Benchmark Dataset Framework #6104

Merged (23 commits) on Dec 12, 2022

Contributor

@rfdavid rfdavid commented Dec 1, 2022

This PR implements the overall framework and API for generating benchmark datasets (#5817). Any feedback is appreciated, especially regarding the proposed architecture.

Implementing a new graph generator

Example for the end user

motif = Motif('house')
generator = BAGraph(num_nodes=300, num_motifs=80, motif=motif)
dataset = ExplainerDataset(generator)

TODO:

  • Finish graph generator (provide all necessary methods: feature generator, label generator)
  • Add tests
  • Add to changelog
  • Documentation

@rusty1s rusty1s changed the title from "Benchmark dataset" to "Explainability Benchmark Dataset Framework" Dec 1, 2022
@rusty1s rusty1s self-assigned this Dec 1, 2022
Contributor

@RexYing RexYing left a comment

Thanks for the great PR :) We also had a great discussion, and I've just noted down some of the comments I made.

@rfdavid
Contributor Author

rfdavid commented Dec 2, 2022

> Thanks for the great PR :) We also had a great discussion, and I've just noted down some of the comments I made.

It was a great pleasure for me, thank you so much for the insights. All points are clear, I'll address them and commit my changes soon.

@codecov

codecov bot commented Dec 5, 2022

Codecov Report

Merging #6104 (9b191d1) into master (bc47556) will decrease coverage by 1.87%.
The diff coverage is 83.87%.

❗ Current head 9b191d1 differs from pull request most recent head 9c4bba3. Consider uploading reports for the commit 9c4bba3 to get more accurate results

@@            Coverage Diff             @@
##           master    #6104      +/-   ##
==========================================
- Coverage   86.43%   84.55%   -1.88%     
==========================================
  Files         371      372       +1     
  Lines       20877    20848      -29     
==========================================
- Hits        18046    17629     -417     
- Misses       2831     3219     +388     
| Impacted Files | Coverage Δ |
|---|---|
| torch_geometric/resolver.py | 82.75% <82.75%> (ø) |
| torch_geometric/nn/resolver.py | 88.70% <100.00%> (-1.07%) ⬇️ |
| torch_geometric/nn/models/dimenet_utils.py | 0.00% <0.00%> (-75.52%) ⬇️ |
| torch_geometric/nn/models/dimenet.py | 14.90% <0.00%> (-52.76%) ⬇️ |
| torch_geometric/profile/profile.py | 33.33% <0.00%> (-25.39%) ⬇️ |
| torch_geometric/nn/conv/utils/typing.py | 83.75% <0.00%> (-15.00%) ⬇️ |
| torch_geometric/nn/pool/asap.py | 92.10% <0.00%> (-7.90%) ⬇️ |
| torch_geometric/nn/dense/linear.py | 85.40% <0.00%> (-7.84%) ⬇️ |
| torch_geometric/nn/inits.py | 67.85% <0.00%> (-7.15%) ⬇️ |
| torch_geometric/transforms/add_self_loops.py | 94.44% <0.00%> (-5.56%) ⬇️ |
| ... and 30 more | |

- Added seed in graph generator
- Added documentation in graph and motif classes
- Simplified GraphGenerator base class
- Fixed test error
- Added tests
- Modified Changelog
rfdavid added a commit to rfdavid/pytorch_geometric that referenced this pull request Dec 7, 2022
- Added files from pyg-team#6104
- Change ba_graph.py and graph_generator.py to use the
  Benchmark Dataset Framework API
@RexYing RexYing requested a review from zechengz December 7, 2022 16:36
Contributor

@RexYing RexYing left a comment

Thanks, overall it looks great now. I'll let Zecheng or Charles have a look in case I missed anything.

Member

@zechengz zechengz left a comment

Thank you :) ! In general the PR looks good to me. Left some comments.

    def generate_feature(self, num_features: int = 10) -> None:
        self._x = torch.ones((self.num_nodes, num_features), dtype=torch.float)

    def attach_motif(self, num_motifs=80) -> None:
Member

Suggested change
- def attach_motif(self, num_motifs=80) -> None:
+ def attach_motif(self, num_motifs: int = 80):

It would also be appreciated if we could add a docstring here.

        self.generate_base_graph()

    @property
    def data(self):
Member

Suggested change
- def data(self):
+ def data(self) -> Data:

class GraphGenerator:
    r"""Base class for generating benchmark datasets. It contains
    `generate_feature` and `attach_motif` methods used to generate
Member

Suggested change
- `generate_feature` and `attach_motif` methods used to generate
+ :meth:`generate_feature` and :meth:`attach_motif` to generate

Motif object to be attached to the base graph.
seed (int, Optional): seed number for the generator.
"""
def __init__(self, num_nodes: int = 300, motif: Optional[Callable] = None,
Member

I think motif here should not be a Callable?

currently attached in random order.

Args:
num_nodes (int): The number of nodes used to attach the motifs.
Member

Suggested change
- num_nodes (int): The number of nodes used to attach the motifs.
+ num_nodes (int): Number of nodes in the base graph.

Does "base graph" refer to the graph before the motifs are attached?

Contributor Author

Yes, exactly.

        self.num_nodes += self.motif.num_nodes

        self._expl_mask = torch.zeros(self.num_nodes, dtype=torch.bool)
        self._expl_mask[torch.arange(self.motif.num_nodes * num_motifs,
Member

Just curious: why do we set the explanation mask to True for the nodes starting at self.motif.num_nodes * num_motifs, with a step of self.motif.num_nodes?

Contributor Author

This is a very good question, and I don't have an answer. I'm following the current implementation in the ba_shapes.py dataset, which generates a base graph with 300 nodes plus 80 houses (80 * 5 nodes) and sets the explanation mask to True starting from node 400.
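
To make the index arithmetic in this thread concrete, here is a minimal sketch in plain Python, where a list of booleans stands in for the `torch.bool` tensor. The stop value of the range is an assumption: the snippet under review is cut off after the start argument, so the sketch assumes the range runs to the total node count with a step equal to the motif size, as described in the question.

```python
# Sizes taken from the discussion: 300 base nodes plus 80 house motifs of 5 nodes each.
base_num_nodes = 300
num_motifs = 80
motif_num_nodes = 5

total_num_nodes = base_num_nodes + num_motifs * motif_num_nodes

# Start index as in the reviewed snippet: motif_num_nodes * num_motifs.
start = motif_num_nodes * num_motifs

# ASSUMPTION: the range stops at the total node count and steps by the motif size.
marked = list(range(start, total_num_nodes, motif_num_nodes))

# Plain-Python stand-in for the boolean explanation mask.
expl_mask = [False] * total_num_nodes
for idx in marked:
    expl_mask[idx] = True

print(total_num_nodes, start, len(marked))  # 700 400 60
```

Under these assumptions the start index (400) is the total number of motif nodes rather than the size of the base graph (300), which is what prompts the question.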

@cuent cuent mentioned this pull request Dec 8, 2022
3 tasks
Contributor

@BlazStojanovic BlazStojanovic left a comment

Thank you for your amazing work @rfdavid! 👍🏻

I just have one caveat regarding the return type of benchmark datasets: I think it should be Explanation instead of Data (sorry for not chiming in on the discussion earlier). This would make more sense because these benchmark datasets really return ground-truth explanations, and it would make the interface to the evaluation of explanations clearer. I'm also interested in what others think about this (mainly @RexYing).

@rusty1s
Member

rusty1s commented Dec 12, 2022

Hey @rfdavid, thank you for this awesome PR. Really like it. I still made some modifications in order to separate concerns. For example, I think that ExplainerDataset should take care of attaching motifs, not the underlying graph generator. As such, I split the logic into MotifGenerator and GraphGenerator, and they all come together in the ExplainerDataset. I also added automatic resolving of generator names such that one can do motif_generator="house". Hope the changes are okay with you.
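
The separation of concerns described in this comment can be illustrated with a small pure-Python sketch. All `Toy*` classes, the `MOTIFS` registry, and `resolve_motif` are hypothetical stand-ins written for this illustration; they are not the PyG classes or their signatures, and only the division of responsibilities and the `motif_generator="house"` idea come from the comment.

```python
import random
from typing import Dict, List, Tuple, Type, Union

Edge = Tuple[int, int]


class ToyMotifGenerator:
    """Hypothetical stand-in: describes a small motif as an edge list."""
    num_nodes: int = 0
    edges: List[Edge] = []


class ToyHouseMotif(ToyMotifGenerator):
    # A 5-node "house": a square (0-1-2-3) with a roof node (4).
    num_nodes = 5
    edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 4), (1, 4)]


class ToyGraphGenerator:
    """Hypothetical stand-in: builds only the base graph, knows nothing about motifs."""

    def __init__(self, num_nodes: int) -> None:
        self.num_nodes = num_nodes

    def __call__(self) -> List[Edge]:
        # A simple path graph keeps the sketch short.
        return [(i, i + 1) for i in range(self.num_nodes - 1)]


# Illustrative registry used to resolve string names to motif classes.
MOTIFS: Dict[str, Type[ToyMotifGenerator]] = {"house": ToyHouseMotif}


def resolve_motif(motif: Union[str, ToyMotifGenerator]) -> ToyMotifGenerator:
    """Accept either an instance or a registered name such as "house"."""
    if isinstance(motif, ToyMotifGenerator):
        return motif
    if motif not in MOTIFS:
        raise ValueError(f"Unknown motif generator: {motif!r}")
    return MOTIFS[motif]()


class ToyExplainerDataset:
    """Hypothetical stand-in: combines base graph and motifs, and owns the attaching."""

    def __init__(self, graph_generator: ToyGraphGenerator,
                 motif_generator: Union[str, ToyMotifGenerator],
                 num_motifs: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        motif = resolve_motif(motif_generator)
        self.edges = list(graph_generator())
        self.num_nodes = graph_generator.num_nodes
        self.node_mask = [False] * self.num_nodes  # True for motif nodes
        for _ in range(num_motifs):
            offset = self.num_nodes
            # Shift motif node ids so they follow the existing nodes.
            self.edges += [(u + offset, v + offset) for u, v in motif.edges]
            # Connect the motif to a randomly chosen base-graph node.
            anchor = rng.randrange(graph_generator.num_nodes)
            self.edges.append((anchor, offset))
            self.num_nodes += motif.num_nodes
            self.node_mask += [True] * motif.num_nodes


dataset = ToyExplainerDataset(ToyGraphGenerator(num_nodes=300),
                              motif_generator="house", num_motifs=80)
print(dataset.num_nodes, sum(dataset.node_mask))  # 700 400
```

In this arrangement a new base graph or a new motif can be added without touching the attaching logic, which is the benefit the comment points to.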

@rusty1s rusty1s merged commit e3c50c1 into pyg-team:master Dec 12, 2022
rusty1s added a commit that referenced this pull request Dec 12, 2022
This PR implements BA graphs following the framework implemented in
#6104.
Depends on #6104.

- BAGraph uses the `barabasi_albert_graph` logic from `utils.py` to generate
the base graph and then calls `generate_feature` and `attach_motif` from
GraphGenerator
- ExplainerDataset is the main interface for calling the generator and
returning the dataset object
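
For readers unfamiliar with Barabási-Albert graphs, the following is a minimal pure-Python sketch of preferential attachment. It is not the `barabasi_albert_graph` implementation from `utils.py`; the function name `toy_barabasi_albert`, its parameters, and the example sizes are assumptions made for illustration.

```python
import random
from typing import List, Tuple


def toy_barabasi_albert(num_nodes: int, num_edges: int,
                        seed: int = 0) -> List[Tuple[int, int]]:
    """Grow a graph in which each new node links to `num_edges` existing
    nodes, chosen with probability proportional to their current degree."""
    if not 1 <= num_edges < num_nodes:
        raise ValueError("require 1 <= num_edges < num_nodes")
    rng = random.Random(seed)
    edges: List[Tuple[int, int]] = []
    # Every node appears here once per incident edge, so sampling uniformly
    # from this list samples nodes proportionally to their degree.
    endpoints: List[int] = []
    # The first new node links to all `num_edges` initial nodes.
    targets = list(range(num_edges))
    for source in range(num_edges, num_nodes):
        edges.extend((source, target) for target in targets)
        endpoints.extend(targets)
        endpoints.extend([source] * num_edges)
        # Pick distinct targets for the next node, biased by degree.
        chosen = set()
        while len(chosen) < num_edges:
            chosen.add(rng.choice(endpoints))
        targets = sorted(chosen)
    return edges


edges = toy_barabasi_albert(num_nodes=300, num_edges=5, seed=1234)
print(len(edges))  # (300 - 5) * 5 = 1475
```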

#### Example of using BA Shapes

```
motif = Motif('house')
generator = BAGraph(num_nodes=300, num_motifs=80, motif=motif, seed=1234)
dataset = ExplainerDataset(generator)
```
The following files are NOT part of this repository and will be removed
since they are part of #6104. I included them to facilitate the
reproducibility of the feature:

- `torch_geometric/datasets/explainer_dataset.py`
- `torch_geometric/datasets/generators/graph_generator.py`
- `torch_geometric/datasets/generators/motif.py`

This PR is part of the task defined in #5817.

Co-authored-by: Matthias Fey <matthias.fey@tu-dortmund.de>
rusty1s added a commit that referenced this pull request Dec 12, 2022
This task is part of [GNN Explainability Dataset
Generation](#5817)

Implementation of the base class for motif generation. It depends on
[the framework and API to generate benchmark
datasets](#6104).

The base class follows some of the structure from @rfdavid's PRs. However,
I changed it a bit, since I believe using Data is a better and cleaner
approach to generate structures and create wrappers for other structures
in PyG or NetworkX.

Once I have some feedback about the `MotifGenerator`, I will go ahead
and

- [x] update the `GraphGenerator.attach_motif()`
- [x] add tests 
- [x] documentation.

Co-authored-by: Emanuel Seemann <3380606+seemanne@users.noreply.github.com>
Co-authored-by: Matthias Fey <matthias.fey@tu-dortmund.de>
@rfdavid
Contributor Author

rfdavid commented Dec 12, 2022

> Hey @rfdavid, thank you for this awesome PR. Really like it. I still made some modifications in order to separate concerns. For example, I think that ExplainerDataset should take care of attaching motifs, not the underlying graph generator. As such, I split the logic into MotifGenerator and GraphGenerator, and they all come together in the ExplainerDataset. I also added automatic resolving of generator names such that one can do motif_generator="house". Hope the changes are okay with you.

Thank you, @rusty1s! That's way better. I hope my previous implementation was helpful somehow. I'll follow the pattern from your development for the next implementations.

@rusty1s
Member

rusty1s commented Dec 13, 2022

Yes, it was great :)

5 participants