Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENH] Add ability to check validity of PAGs #73

Closed
wants to merge 28 commits into from

Conversation

aryan26roy
Copy link
Collaborator

Closes #67

Changes proposed in this pull request:

  • Adds a function to check the validity of provided PAGs.

Before submitting

  • I've read and followed all steps in the Making a pull request
    section of the CONTRIBUTING docs.
  • I've updated or added any relevant docstrings following the syntax described in the
    Writing docstrings section of the CONTRIBUTING docs.
  • If this PR fixes a bug, I've added a test that will fail without my fix.
  • If this PR adds a new feature, I've added tests that sufficiently cover my new functionality.

After submitting

  • All GitHub Actions jobs for my pull request have passed.

@codecov-commenter
Copy link

codecov-commenter commented Apr 7, 2023

Codecov Report

Merging #73 (006c8e3) into main (8658c5b) will decrease coverage by 5.19%.
The diff coverage is 7.10%.

@@            Coverage Diff             @@
##             main      #73      +/-   ##
==========================================
- Coverage   84.42%   79.23%   -5.19%     
==========================================
  Files          34       34              
  Lines        2555     2736     +181     
  Branches      687      750      +63     
==========================================
+ Hits         2157     2168      +11     
- Misses        251      421     +170     
  Partials      147      147              
Impacted Files Coverage Δ
pywhy_graphs/algorithms/pag.py 54.92% <7.10%> (-36.10%) ⬇️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@aryan26roy
Copy link
Collaborator Author

@adam2392 I made this draft PR since I am uncertain about a couple of things:

  • Is this the right place to put this function? (I thought of making it a method of the PAG class but you said you intend to do away with these specific classes, so chose to put it in this file instead.)
  • Right now I am only checking if the provided PAG has circle edges, is this enough?
  • I implemented the first step of the theorem 2 from the linked paper. The second step requires getting rid of all unshielded colliders, for this I have to add edges between several node. What should these edges be? Directed? Undirected? Bidirected? If they are directed, in what direction?

@adam2392
Copy link
Collaborator

adam2392 commented Apr 7, 2023

Thanks for opening up the PR to get early feedback!

  • Is this the right place to put this function? (I thought of making it a method of the PAG class but you said you intend to do away with these specific classes, so chose to put it in this file instead.)

The location of the file is fine.

  • Right now I am only checking if the provided PAG has circle edges, is this enough?

No, you have to follow the theorem linked and then the definition of the MAG. Just cuz you have or don't have circle edges does not imply anything.

  • I implemented the first step of the theorem 2 from the linked paper. The second step requires getting rid of all unshielded colliders, for this I have to add edges between several node. What should these edges be? Directed? Undirected? Bidirected? If they are directed, in what direction?

I believe theorem 2 says o-> into -> and -o into ->, so directed, so I don't know if you did the first step correctly; the -o case is not handled. Take a look at the docstring of PAG to see how we implement these edge patterns.

Orienting to a DAG with no unshielded colliders means for circle triangles (circle triplets that are shielded) they can be oriented arbitrarily since a collider is possible but can't be detected due to the shieldedness. For circle triplets that are unshielded, they can be oriented arbitrarily as long as it's not a collider. So it's a combo of directed and bidirected edges. Maybe you can find the tetrad code that implements this for inspiration.

This step may require some generic graph traversal... this issue might actually be harder than I thought, but it's just a standard graph BFS/DFS traversal if you're familiar with graph algorithms. E.g. see the other algorithms, and what they do.

If this part is an issue, lmk. You can proceed as if tho there is a function that does the above.

@aryan26roy
Copy link
Collaborator Author

aryan26roy commented Apr 8, 2023

I found the tetrad code for turning a PAG into an MAG!

No, you have to follow the theorem linked and then the definition of the MAG. Just cuz you have or don't have circle edges does not imply anything.

I meant to ask if this check is enough as a pre-cursor to the actual validity check. I am assuming that a graph cannot be a PAG if it doesn't have any circle edges, and if so, it is automatically classified as an invalid PAG.

This step may require some generic graph traversal... this issue might actually be harder than I thought, but it's just a standard graph BFS/DFS traversal if you're familiar with graph algorithms. E.g. see the other algorithms, and what they do.

I am! (I took multiple classes on graphs and I have done a past project where I implemented some of the graph traversal algorithms for robotics application.) I still expect to encounter some difficulties though. I will look through the tetrad code and let you know if I am unable to understand what to do exactly after that.

@aryan26roy
Copy link
Collaborator Author

Ok so I looked it up and it turns out that a graph can be a PAG even if it doesn't have any circle edges.

for u, v in cedges:
if (v, u) in dedges:
to_remove.append((u, v))
elif (v, u) not in cedges:
to_reorient.append((u, v))
elif (v, u) in cedges and (v, u) not in to_replace:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The not in to_replace does not scale well. You should use a different data structure to check inclusion (e.g. set, or dict)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense.

Comment on lines 939 to 943
if (v, u) in dedges:
to_remove.append((u, v))
elif (v, u) not in cedges:
to_reorient.append((u, v))
elif (v, u) in cedges and (v, u) not in to_replace:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would add some in-line comment to describe this sequence of if/else statements.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure.

@adam2392
Copy link
Collaborator

Feel free to ping me when/if there's any issues or you need a review.

@aryan26roy
Copy link
Collaborator Author

@adam2392 sorry for the delay but I have been stuck with a work emergency. Will get back to this issue from Wednesday.

Feel free to ping me when/if there's any issues or you need a review.

Occasionally I have a problem understanding some theoretical things and I feel like GitHub is not the place clarify them. Is there a better way to communicate with you (like email or slack?) or would like me to ask all of those question here as well?

@adam2392
Copy link
Collaborator

@adam2392 sorry for the delay but I have been stuck with a work emergency. Will get back to this issue from Wednesday.

Sounds good no rush!

Occasionally I have a problem understanding some theoretical things and I feel like GitHub is not the place clarify them. Is there a better way to communicate with you (like email or slack?) or would like me to ask all of those question here as well?

Feel free to summarize them here, in case other devs following e.g. @jaron-lee. If it warrants a call, then we can hop on discord. Are you on the discord for pywhy?

@aryan26roy
Copy link
Collaborator Author

Feel free to summarize them here, in case other devs following e.g. @jaron-lee. If it warrants a call, then we can hop on discord. Are you on the discord for pywhy?

I am not. But I will join now. Did not know a discord server existed for pywhy (I was expecting a slack channel but I like Discord more anyway :-))

Copy link
Collaborator

@robertness robertness left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest adding sub-functions to is_valid_PAG to make it more modular.

@aryan26roy
Copy link
Collaborator Author

@robertness you're right. I have been planning to make major chunks of the implementation modular (starting from the meek rules).

@@ -908,3 +908,232 @@ def _check_ts_node(node):
)
if node[1] > 0:
raise ValueError(f"All lag points should be 0, or less. You passed in {node}.")


def orient_edges(graph: CPDAG) -> None:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably don't just copy/paste the code directly. Prolly should be renamed, etc.

E.g. orient_edges is sort of a bad name, cuz it is in the pag.py file and it works on a CPDAG. Perhaps, you rename it to a private function called _apply_meek_rules

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure.

@aryan26roy
Copy link
Collaborator Author

@adam2392 I am trying to import the function and use it in a script like this:
pywhy_graphs.algorithms.is_valid_PAG(pag)
The output is this error:

AttributeError: module 'pywhy_graphs.algorithms' has no attribute 'is_valid_PAG'

Do you know why?

@adam2392
Copy link
Collaborator

@adam2392 I am trying to import the function and use it in a script like this: pywhy_graphs.algorithms.is_valid_PAG(pag) The output is this error:

AttributeError: module 'pywhy_graphs.algorithms' has no attribute 'is_valid_PAG'

Do you know why?

You have to expose it to the import module by adding an entry to __all__ inside the algorithms/pag.py file at the top.

@@ -551,6 +552,7 @@ def pds(
----------
.. footbibliography::
"""
print("HAHAHAHAHHA")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this is just a debugging tool but should be removed before committing

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure!

@aryan26roy
Copy link
Collaborator Author

@adam2392 I think I have constructed the MAG correctly (I think the rule 4 is correctly implemented). Now how do I check the validity of an MAG? Can you explain it to me in a detailed way?

Signed-off-by: Aryan Roy <aryanroy5678@gmail.com>
Signed-off-by: Aryan Roy <aryanroy5678@gmail.com>
Signed-off-by: Aryan Roy <aryanroy5678@gmail.com>
Signed-off-by: Aryan Roy <aryanroy5678@gmail.com>
Signed-off-by: Aryan Roy <aryanroy5678@gmail.com>
Signed-off-by: Aryan Roy <aryanroy5678@gmail.com>
Signed-off-by: Aryan Roy <aryanroy5678@gmail.com>
Signed-off-by: Aryan Roy <aryanroy5678@gmail.com>
Signed-off-by: Aryan Roy <aryanroy5678@gmail.com>
Signed-off-by: Aryan Roy <aryanroy5678@gmail.com>
@adam2392
Copy link
Collaborator

I think the easier thing might be to start with a few tests where this should work and where it shouldn't work. You can start w/ small PAGs and use tetrad GUI to construct examples that you then add into a unit-test.

@aryan26roy
Copy link
Collaborator Author

Ah! Thank you for the suggestion @adam2392. I will have to setup tetrad before that but this will make my job much easier.

@aryan26roy
Copy link
Collaborator Author

@adam2392 I am facing a weird issue. In the middle of the function, I construct a temporary CPDAG. When I print out the edges of the CPDAG, this is what I get:

{'directed': OutEdgeView([('w', 'z'), ('z', 'x'), ('x', 'xy'), ('xy', 'w')]), 'undirected': EdgeView([])}

But when I draw the graph, I get two different graphs on different runs (both are wrong too):

weird2

weird

Note that the edge list remains the same in both the graphs.

Signed-off-by: Aryan Roy <aryanroy5678@gmail.com>
@adam2392
Copy link
Collaborator

adam2392 commented May 2, 2023

@adam2392 I am facing a weird issue. In the middle of the function, I construct a temporary CPDAG. When I print out the edges of the CPDAG, this is what I get:
But when I draw the graph, I get two different graphs on different runs (both are wrong too):
Note that the edge list remains the same in both the graphs.

Do you have some reproducing MVP code for this? This seems like a bug in the draw function then.

Feel free to attend weekly meetings in the causal discovery channel on discord too

@aryan26roy
Copy link
Collaborator Author

@adam2392 I am sharing the MVP code for this through pastebin as it is too long for a github comment.
I am planning to attend the last one but something else came in the way. I will be there for the next one!

@adam2392
Copy link
Collaborator

adam2392 commented May 4, 2023

@adam2392 I am sharing the MVP code for this through pastebin as it is too long for a github comment.
I am planning to attend the last one but something else came in the way. I will be there for the next one!

For future, it might be helpful to keep the code a bit shorter to make the error apparent.

I took a look tho and it seems like there's probably just a bug in the CPDAG drawing. It is drawing extra edges it should not be. Are we able to replicate this with just two/three nodes?

Signed-off-by: Aryan Roy <aryanroy5678@gmail.com>
Signed-off-by: Aryan Roy <aryanroy5678@gmail.com>
@aryan26roy
Copy link
Collaborator Author

@adam2392 I have three questions for you.

  • The paper says for m-seperation you check if every collider in the path has a descendant in the set 'z' or not. While the function you linked checks if the collider itself is in 'z' or not. Which is the correct check?
  • To check for m-seperation what I did was instead of making random sets to use as 'z' I made a set of all the colliders in the graph. Now all the colliders in any path are necessarily in this set and all the non-colliders are necessarily not. I think this is a better way doing it, what do you think?
  • You said that I will have to reconstruct the PAG from the MAG. Is that really important? Because my code does not track the changes made to the graph during the application of the meek rules. Isn't checking the validity of the MAG enough?

@aryan26roy
Copy link
Collaborator Author

@adam2392 As for your question, yes, I am able to replicate the bug with just two three nodes.

@adam2392
Copy link
Collaborator

adam2392 commented May 8, 2023

@adam2392 I have three questions for you.

  • The paper says for m-seperation you check if every collider in the path has a descendant in the set 'z' or not. While the function you linked checks if the collider itself is in 'z' or not. Which is the correct check?

The function implements it correctly and checks for more than just collider status.

# Consider if *-> node <-* is opened due to conditioning on collider,
# or descendant of collider
if node in an_z:
if has_directed:
# add <- edges to backward deque
for x, _ in G_directed.in_edges(nbunch=node):
if x not in backward_visited:
backward_deque.append(x)
# add <-> edge to backward deque
if has_bidirected:
for nbr in G_bidirected.neighbors(node):
if nbr not in forward_visited:
forward_deque.append(nbr)

Try looking up d-separation materials and reading up on why descendants open up collider paths.

  • To check for m-seperation what I did was instead of making random sets to use as 'z' I made a set of all the colliders in the graph. Now all the colliders in any path are necessarily in this set and all the non-colliders are necessarily not. I think this is a better way doing it, what do you think?

Why not just use minimal_m_separator, which performs searches to determine if a minimal m-separator exists?

@jaron-lee does minimal_m_separator work to just check if two nodes have an m-separating set? It just does the extra work to make sure it is also minimal right? But if two nodes are m-separated, then minimal_m_separator should return some set always?

  • You said that I will have to reconstruct the PAG from the MAG. Is that really important? Because my code does not track the changes made to the graph during the application of the meek rules. Isn't checking the validity of the MAG enough?

This is not true because the PAG reconstructed from the MAG may not match the PAG you fed in.

I think if you implement a few positive/negative test cases starting w/ what is easy, this will be easier to discuss. Do you have such unit tests yet? Do you mind pushing them up to the PR?

@adam2392
Copy link
Collaborator

adam2392 commented May 8, 2023

@adam2392 As for your question, yes, I am able to replicate the bug with just two three nodes.

Can you open a GH issue related to this? and paste the MVP code sample?

@aryan26roy
Copy link
Collaborator Author

Why not just use minimal_m_separator, which performs searches to determine if a minimal m-separator exists?

@adam2392 I didn't know of this function. I will use it now.

This is not true because the PAG reconstructed from the MAG may not match the PAG you fed in.

How come? I was under the belief that when you say 'reconstruct' the PAG, you want me to re-trace the steps followed throughout the function backwards. If I do that correctly, won't the reconstructed PAG always match the input PAG? And if not this, then what do you mean by reconstruct?

I think if you implement a few positive/negative test cases starting w/ what is easy, this will be easier to discuss. Do you have such unit tests yet? Do you mind pushing them up to the PR?

I do not have such test cases yet. Do you know where I can find test cases that have pre-tagged valid and non-valid PAGs?

Signed-off-by: Aryan Roy <aryanroy5678@gmail.com>
@adam2392
Copy link
Collaborator

I do not have such test cases yet. Do you know where I can find test cases that have pre-tagged valid and non-valid PAGs?

If you use Tetrad and draw an arbitrary graph with circle endpoints and then run the check via the GUI (I think it's called check valid graph coloring or something like that), you will be able to construct various cases.

Signed-off-by: Aryan Roy <aryanroy5678@gmail.com>
Copy link
Collaborator

@adam2392 adam2392 May 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great start on the unit-tests.

You can further think of unit-tests by taking a look at each of the FCI rules and draw in tetrad a graph that is sort of opposite those rules, and then you can add a separate unit-test checking each one.

E.g.

X <-o Y <-o Z <-o A

is invalid and so is

X <- Y <- Z <-o A; Z <-o B

because Z <-o A should be Z<-A due to R1 of FCI. And you can repeat this exercise for each rule having each one be a different unit-test, so it's easy to review and debug

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: this might not cover everything in the algorithm, but then this is a great step in the right direction

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adam2392 It took me a while to figure out Tetrad.

How many different tests do you want me to add?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will probably help you develop the algorithm, so probably 1 test for each FCI rule. The is_valid_pag algorithm will by construction return False if there are any rules left to be applied, so a separate test for each rule will help you as well.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And it will help us narrow down the possible runtime error-surfaces. For graph stuff, it's hard to test every single edge case, but good to test things that we know are definitely wrong/correct.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adam2392 is this enough?

Signed-off-by: Aryan Roy <aryanroy5678@gmail.com>
Comment on lines 702 to 710
# A o--> B o--> C o--o D o--o F
assert not is_valid_PAG(pag)

pag.remove_edge("B", "C")
pag.add_edge("B", "C", pag.circle_edge_name)
pag.add_edge("C", "B", pag.directed_edge_name)

# A o--> B <--o C o--o D o--o F
assert is_valid_PAG(pag)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would have each as a separate test, so that way the graph is explicitly constructed to be the example/counter-example.

It's hard to review/read the test rn becuz one has to keep track of the edge changes that occur.

Signed-off-by: Aryan Roy <aryanroy5678@gmail.com>
Signed-off-by: Aryan Roy <aryanroy5678@gmail.com>
@aryan26roy
Copy link
Collaborator Author

@adam2392 this branch has become too convoluted for me to keep everything straight. Do you mind if I delete this and start fresh on a new branch?

@adam2392
Copy link
Collaborator

Yes that is a perfectly fine and acceptable practice

@aryan26roy aryan26roy closed this Nov 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Checking the validity of a constructed PAG
5 participants