-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ENH] Add ability to check validity of PAGs #73
Conversation
Codecov Report
@@ Coverage Diff @@
## main #73 +/- ##
==========================================
- Coverage 84.42% 79.23% -5.19%
==========================================
Files 34 34
Lines 2555 2736 +181
Branches 687 750 +63
==========================================
+ Hits 2157 2168 +11
- Misses 251 421 +170
Partials 147 147
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
@adam2392 I made this draft PR since I am uncertain about a couple of things:
|
Thanks for opening up the PR to get early feedback!
The location of the file is fine.
No, you have to follow the theorem linked and then the definition of the MAG. Just cuz you have or don't have circle edges does not imply anything.
I believe theorem 2 says o-> into -> and -o into ->, so directed, so I don't know if you did the first step correctly; the -o case is not handled. Take a look at the docstring of PAG to see how we implement these edge patterns. Orienting to a DAG with no unshielded colliders means for circle triangles (circle triplets that are shielded) they can be oriented arbitrarily since a collider is possible but can't be detected due to the shieldedness. For circle triplets that are unshielded, they can be oriented arbitrarily as long as it's not a collider. So it's a combo of directed and bidirected edges. Maybe you can find the tetrad code that implements this for inspiration. This step may require some generic graph traversal... this issue might actually be harder than I thought, but it's just a standard graph BFS/DFS traversal if you're familiar with graph algorithms. E.g. see the other algorithms, and what they do. If this part is an issue, lmk. You can proceed as if tho there is a function that does the above. |
I found the tetrad code for turning a PAG into an MAG!
I meant to ask if this check is enough as a pre-cursor to the actual validity check. I am assuming that a graph cannot be a PAG if it doesn't have any circle edges, and if so, it is automatically classified as an invalid PAG.
I am! (I took multiple classes on graphs and I have done a past project where I implemented some of the graph traversal algorithms for robotics application.) I still expect to encounter some difficulties though. I will look through the tetrad code and let you know if I am unable to understand what to do exactly after that. |
Ok so I looked it up and it turns out that a graph can be a PAG even if it doesn't have any circle edges. |
pywhy_graphs/algorithms/pag.py
Outdated
for u, v in cedges: | ||
if (v, u) in dedges: | ||
to_remove.append((u, v)) | ||
elif (v, u) not in cedges: | ||
to_reorient.append((u, v)) | ||
elif (v, u) in cedges and (v, u) not in to_replace: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The not in to_replace
does not scale well. You should use a different data structure to check inclusion (e.g. set, or dict)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense.
pywhy_graphs/algorithms/pag.py
Outdated
if (v, u) in dedges: | ||
to_remove.append((u, v)) | ||
elif (v, u) not in cedges: | ||
to_reorient.append((u, v)) | ||
elif (v, u) in cedges and (v, u) not in to_replace: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would add some in-line comment to describe this sequence of if/else statements.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure.
Feel free to ping me when/if there's any issues or you need a review. |
@adam2392 sorry for the delay but I have been stuck with a work emergency. Will get back to this issue from Wednesday.
Occasionally I have a problem understanding some theoretical things and I feel like GitHub is not the place clarify them. Is there a better way to communicate with you (like email or slack?) or would like me to ask all of those question here as well? |
Sounds good no rush!
Feel free to summarize them here, in case other devs following e.g. @jaron-lee. If it warrants a call, then we can hop on discord. Are you on the discord for pywhy? |
I am not. But I will join now. Did not know a discord server existed for pywhy (I was expecting a slack channel but I like Discord more anyway :-)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest adding sub-functions to is_valid_PAG
to make it more modular.
@robertness you're right. I have been planning to make major chunks of the implementation modular (starting from the meek rules). |
pywhy_graphs/algorithms/pag.py
Outdated
@@ -908,3 +908,232 @@ def _check_ts_node(node): | |||
) | |||
if node[1] > 0: | |||
raise ValueError(f"All lag points should be 0, or less. You passed in {node}.") | |||
|
|||
|
|||
def orient_edges(graph: CPDAG) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably don't just copy/paste the code directly. Prolly should be renamed, etc.
E.g. orient_edges
is sort of a bad name, cuz it is in the pag.py
file and it works on a CPDAG. Perhaps, you rename it to a private function called _apply_meek_rules
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure.
@adam2392 I am trying to import the function and use it in a script like this:
Do you know why? |
You have to expose it to the import module by adding an entry to |
pywhy_graphs/algorithms/pag.py
Outdated
@@ -551,6 +552,7 @@ def pds( | |||
---------- | |||
.. footbibliography:: | |||
""" | |||
print("HAHAHAHAHHA") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume this is just a debugging tool but should be removed before committing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure!
@adam2392 I think I have constructed the MAG correctly (I think the rule 4 is correctly implemented). Now how do I check the validity of an MAG? Can you explain it to me in a detailed way? |
Signed-off-by: Aryan Roy <aryanroy5678@gmail.com>
Signed-off-by: Aryan Roy <aryanroy5678@gmail.com>
Signed-off-by: Aryan Roy <aryanroy5678@gmail.com>
Signed-off-by: Aryan Roy <aryanroy5678@gmail.com>
Signed-off-by: Aryan Roy <aryanroy5678@gmail.com>
Signed-off-by: Aryan Roy <aryanroy5678@gmail.com>
Signed-off-by: Aryan Roy <aryanroy5678@gmail.com>
I think the easier thing might be to start with a few tests where this should work and where it shouldn't work. You can start w/ small PAGs and use tetrad GUI to construct examples that you then add into a unit-test. |
Ah! Thank you for the suggestion @adam2392. I will have to setup tetrad before that but this will make my job much easier. |
@adam2392 I am facing a weird issue. In the middle of the function, I construct a temporary CPDAG. When I print out the edges of the CPDAG, this is what I get:
But when I draw the graph, I get two different graphs on different runs (both are wrong too): Note that the edge list remains the same in both the graphs. |
Signed-off-by: Aryan Roy <aryanroy5678@gmail.com>
Do you have some reproducing MVP code for this? This seems like a bug in the Feel free to attend weekly meetings in the |
For future, it might be helpful to keep the code a bit shorter to make the error apparent. I took a look tho and it seems like there's probably just a bug in the CPDAG drawing. It is drawing extra edges it should not be. Are we able to replicate this with just two/three nodes? |
Signed-off-by: Aryan Roy <aryanroy5678@gmail.com>
Signed-off-by: Aryan Roy <aryanroy5678@gmail.com>
@adam2392 I have three questions for you.
|
@adam2392 As for your question, yes, I am able to replicate the bug with just two three nodes. |
The function implements it correctly and checks for more than just collider status. pywhy-graphs/pywhy_graphs/networkx/algorithms/causal/m_separation.py Lines 151 to 165 in 8658c5b
Try looking up d-separation materials and reading up on why descendants open up collider paths.
Why not just use @jaron-lee does
This is not true because the PAG reconstructed from the MAG may not match the PAG you fed in. I think if you implement a few positive/negative test cases starting w/ what is easy, this will be easier to discuss. Do you have such unit tests yet? Do you mind pushing them up to the PR? |
Can you open a GH issue related to this? and paste the MVP code sample? |
@adam2392 I didn't know of this function. I will use it now.
How come? I was under the belief that when you say 'reconstruct' the PAG, you want me to re-trace the steps followed throughout the function backwards. If I do that correctly, won't the reconstructed PAG always match the input PAG? And if not this, then what do you mean by reconstruct?
I do not have such test cases yet. Do you know where I can find test cases that have pre-tagged valid and non-valid PAGs? |
Signed-off-by: Aryan Roy <aryanroy5678@gmail.com>
If you use Tetrad and draw an arbitrary graph with circle endpoints and then run the check via the GUI (I think it's called check valid graph coloring or something like that), you will be able to construct various cases. |
Signed-off-by: Aryan Roy <aryanroy5678@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great start on the unit-tests.
You can further think of unit-tests by taking a look at each of the FCI rules and draw in tetrad a graph that is sort of opposite those rules, and then you can add a separate unit-test checking each one.
E.g.
X <-o Y <-o Z <-o A
is invalid and so is
X <- Y <- Z <-o A; Z <-o B
because Z <-o A should be Z<-A due to R1 of FCI. And you can repeat this exercise for each rule having each one be a different unit-test, so it's easy to review and debug
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: this might not cover everything in the algorithm, but then this is a great step in the right direction
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@adam2392 It took me a while to figure out Tetrad.
How many different tests do you want me to add?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will probably help you develop the algorithm, so probably 1 test for each FCI rule. The is_valid_pag algorithm will by construction return False if there are any rules left to be applied, so a separate test for each rule will help you as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And it will help us narrow down the possible runtime error-surfaces. For graph stuff, it's hard to test every single edge case, but good to test things that we know are definitely wrong/correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@adam2392 is this enough?
Signed-off-by: Aryan Roy <aryanroy5678@gmail.com>
# A o--> B o--> C o--o D o--o F | ||
assert not is_valid_PAG(pag) | ||
|
||
pag.remove_edge("B", "C") | ||
pag.add_edge("B", "C", pag.circle_edge_name) | ||
pag.add_edge("C", "B", pag.directed_edge_name) | ||
|
||
# A o--> B <--o C o--o D o--o F | ||
assert is_valid_PAG(pag) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would have each as a separate test, so that way the graph is explicitly constructed to be the example/counter-example.
It's hard to review/read the test rn becuz one has to keep track of the edge changes that occur.
Signed-off-by: Aryan Roy <aryanroy5678@gmail.com>
@adam2392 this branch has become too convoluted for me to keep everything straight. Do you mind if I delete this and start fresh on a new branch? |
Yes that is a perfectly fine and acceptable practice |
Closes #67
Changes proposed in this pull request:
Before submitting
section of the
CONTRIBUTING
docs.Writing docstrings section of the
CONTRIBUTING
docs.After submitting