[Explainability Evaluation] - GNN model (Un)Faithfulness #6090
Conversation
Codecov Report
```
@@            Coverage Diff             @@
##           master    #6090      +/-   ##
==========================================
- Coverage   86.46%   84.55%    -1.91%
==========================================
  Files         378      379       +1
  Lines       21106    21100       -6
==========================================
- Hits        18249    17842     -407
- Misses       2857     3258     +401
```
Thanks for adding this. Let's add some tests.
Node and graph faithfulness merged into one function; updates for examples; adding test file
…o explain_metric_faithfulness
# Conflicts:
#	examples/explain/explain_metric_faithfulness_graph.py
#	examples/explain/explain_metric_faithfulness_node.py
#	torch_geometric/explain/metrics/faithfulness.py
for more information, see https://pre-commit.ci
…o explain_metric_faithfulness
# Conflicts:
#	torch_geometric/explain/metrics/faithfulness.py
for more information, see https://pre-commit.ci
Thank you very much @ZeynepP for your work, and sorry for the late PR review!
I've left some comments on how to make the functionality of `faithfulness.py` more general, and how to make the usability more consistent across all the metrics.
High-level comments
- Instead of trying to implement `faithfulness` just as in the paper, let's think about how to make it as good and as general as possible! (See notes about faithfulness and perturbations.)
- I don't think we need two examples for this metric. The behaviour of node and graph faithfulness should be similar enough that a single example suffices to understand both.
- We want the metrics to be as similar as possible to each other, so let's try to define faithfulness as `graph_faithfulness(explainer: Explainer, explanation: Explanation, topk: int, index: Optional[Union[int, Tensor]] = None, **kwargs)` (note that the defaulted `index` argument has to come after the non-defaulted `topk`). Alongside a more unified interface, this has the additional benefit of having the `Explainer` (along with all of its configs) available for various checks and branches you might want to do.
- Let's try to leverage as much of the existing functionality of `Explainer` and `Explanation` as possible; this will make the implementation of `faithfulness` much easier and clearer (see my comments for details). On this note, I'll try to update `Explainer` and `Explanation` to make your job easier. This will include updating `get_explanation_subgraph` and `get_complement_subgraph` so that they include all updated attributes needed for prediction, plus different utils for properly (and consistently across the codebase) obtaining thresholded masks. I will follow up with a PR, so you can see what new functionality is available.
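A minimal sketch of what such a unified metric could compute, using plain-Python stand-ins rather than the real `Explainer`/`Explanation` classes (the toy callable `model`, the list-based features, and the scalar output are all assumptions for illustration; the real signature would operate on PyG objects):

```python
def graph_faithfulness(model, x, node_importance, topk):
    """Hypothetical sketch: keep only the top-k most important entries of
    the input, re-run the model, and compare against the full prediction.
    `model` is any callable mapping a feature list to a scalar prediction."""
    ranked = sorted(range(len(node_importance)),
                    key=lambda i: node_importance[i], reverse=True)
    keep = set(ranked[:topk])
    # Zero out everything the explanation considers unimportant.
    x_masked = [xi if i in keep else 0.0 for i, xi in enumerate(x)]
    p_full = model(x)
    p_masked = model(x_masked)
    # A faithful explanation changes the prediction very little.
    return 1.0 - abs(p_full - p_masked)
```

With a toy model such as `lambda x: sum(x)`, an explanation that ranks exactly the non-zero features highest yields a score of 1.0, since masking then leaves the prediction unchanged.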
Notes about faithfulness and perturbations
You are correct that there are really two types of faithfulness, which makes all of this very confusing.
- There is `node_faithfulness` (or just faithfulness) as defined in https://arxiv.org/pdf/2106.09078.pdf. This is the case with node perturbations.
- Then you have `graph_explanation_faithfulness`, which is from the paper https://arxiv.org/pdf/2208.09339.pdf. This one is a bit easier to implement, as it just imposes a top-k map over features and calls it a day.
- They also measure faithfulness in different ways: (1) measures the average difference between model predictions, (2) measures the KL divergence.
- My suggestion: let's combine all of this functionality into a single function; moreover, let's expand it in different ways.
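The two measurement styles can be sketched side by side; this is a pure-Python illustration (the function names and the `eps` smoothing are my own, not taken from either paper):

```python
import math

def softmax(logits):
    """Convert raw logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def prediction_difference(p_full, p_masked):
    """(1) Average absolute difference between the two predictions."""
    return sum(abs(a - b) for a, b in zip(p_full, p_masked)) / len(p_full)

def kl_divergence(p_full, p_masked, eps=1e-12):
    """(2) KL(p_full || p_masked); eps guards against log(0)."""
    return sum(p * math.log((p + eps) / (q + eps))
               for p, q in zip(p_full, p_masked))
```

Both scores are zero when masking leaves the prediction distribution unchanged, which is exactly the property a unified function can exploit: the measurement style becomes just another argument.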
Here are my implementation considerations:
- Focus on explaining a single instance at a time (this you have already done).
- The instance can be anything the `Explainer` can explain; this means node-, edge-, and graph-level tasks (also many nodes and edges).
- The basic idea behind faithfulness is simple, we make small perturbations and see if these have a significant effect on the predictions:
- Users might want to measure faithfulness both in regression and classification settings
- Users might want to perturb node features, edge features, graph connectivity, we can provide them with functionality for all of these
- Users might want to test different thresholding (i.e., `topk` at different levels or `hard_mask` at different thresholds)
- My ultimate suggestion is to start by implementing a `faithfulness` function that takes into account all of the above and returns a single score, but is flexible enough to encompass all previous definitions of faithfulness. We can later wrap the function to produce specific faithfulness variants (e.g. node feature faithfulness, etc.).
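The wrapping idea might look like this: one general `faithfulness` that accepts a perturbation strategy, specialised into named variants with `functools.partial` (all names here are hypothetical sketches, not the final API):

```python
from functools import partial

def faithfulness(model, x, explanation_mask, perturb):
    """General sketch: mask the input with the explanation, optionally
    perturb it, and compare the resulting prediction against the full one."""
    x_expl = [xi * mi for xi, mi in zip(x, explanation_mask)]
    p_full = model(x)
    p_expl = model(perturb(x_expl))
    return 1.0 - abs(p_full - p_expl)

# Specific variants become thin wrappers around the general function;
# here the node-feature variant simply uses a no-op perturbation.
node_feature_faithfulness = partial(faithfulness, perturb=lambda x: x)
```

The benefit of this design is that every previously published definition becomes a particular choice of `perturb` and comparison, rather than a separate code path.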
Tell me what you think, I'd love to chat more and guide you through this (quite challenging) task! :)
Force-pushed from 7569215 to 49cc613.
Force-pushed from 663dd8d to 49cc613.
I did go with graph (un)faithfulness for now, to be able to finally merge this. @BlazStojanovic, let me know if you think node (un)faithfulness is also necessary.
Here is the implementation of the faithfulness metric, with examples for node and graph classification. I made some implementation decisions that we can discuss. I did not implement `edge_mask`; again, something to discuss.
#5961