[ENH] Add ability to check validity of PAGs #73

aryan26roy · 2023-04-07T06:28:19Z

Closes #67

Changes proposed in this pull request:

Adds a function to check the validity of provided PAGs.

Before submitting

I've read and followed all steps in the Making a pull request
section of the CONTRIBUTING docs.
I've updated or added any relevant docstrings following the syntax described in the
Writing docstrings section of the CONTRIBUTING docs.
If this PR fixes a bug, I've added a test that will fail without my fix.
If this PR adds a new feature, I've added tests that sufficiently cover my new functionality.

After submitting

All GitHub Actions jobs for my pull request have passed.

codecov-commenter · 2023-04-07T06:32:45Z

Codecov Report

Merging #73 (006c8e3) into main (8658c5b) will decrease coverage by 5.19%.
The diff coverage is 7.10%.

@@            Coverage Diff             @@
##             main      #73      +/-   ##
==========================================
- Coverage   84.42%   79.23%   -5.19%     
==========================================
  Files          34       34              
  Lines        2555     2736     +181     
  Branches      687      750      +63     
==========================================
+ Hits         2157     2168      +11     
- Misses        251      421     +170     
  Partials      147      147

Impacted Files	Coverage Δ
pywhy_graphs/algorithms/pag.py	`54.92% <7.10%> (-36.10%)`	⬇️

aryan26roy · 2023-04-07T06:33:08Z

@adam2392 I made this draft PR since I am uncertain about a couple of things:

Is this the right place to put this function? (I thought of making it a method of the PAG class but you said you intend to do away with these specific classes, so chose to put it in this file instead.)
Right now I am only checking if the provided PAG has circle edges, is this enough?
I implemented the first step of Theorem 2 from the linked paper. The second step requires getting rid of all unshielded colliders; for this I have to add edges between several nodes. What should these edges be? Directed? Undirected? Bidirected? If they are directed, in what direction?

adam2392 · 2023-04-07T15:31:05Z

Thanks for opening up the PR to get early feedback!

Is this the right place to put this function? (I thought of making it a method of the PAG class but you said you intend to do away with these specific classes, so chose to put it in this file instead.)

The location of the file is fine.

Right now I am only checking if the provided PAG has circle edges, is this enough?

No, you have to follow the theorem linked and then the definition of the MAG. Just cuz you have or don't have circle edges does not imply anything.

I implemented the first step of Theorem 2 from the linked paper. The second step requires getting rid of all unshielded colliders; for this I have to add edges between several nodes. What should these edges be? Directed? Undirected? Bidirected? If they are directed, in what direction?

I believe theorem 2 says o-> into -> and -o into ->, so directed, so I don't know if you did the first step correctly; the -o case is not handled. Take a look at the docstring of PAG to see how we implement these edge patterns.

Orienting to a DAG with no unshielded colliders means for circle triangles (circle triplets that are shielded) they can be oriented arbitrarily since a collider is possible but can't be detected due to the shieldedness. For circle triplets that are unshielded, they can be oriented arbitrarily as long as it's not a collider. So it's a combo of directed and bidirected edges. Maybe you can find the tetrad code that implements this for inspiration.

This step may require some generic graph traversal... this issue might actually be harder than I thought, but it's just a standard graph BFS/DFS traversal if you're familiar with graph algorithms. E.g. see the other algorithms, and what they do.

If this part is an issue, lmk. You can proceed as though there is a function that does the above.
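
The constraint described in this comment (circle marks may be oriented freely as long as no new unshielded collider appears) can be checked mechanically. The following is a minimal sketch, assuming the oriented graph is stored as a plain set of (parent, child) tuples. `unshielded_colliders` is a hypothetical helper written for illustration; it is not part of pywhy-graphs or Tetrad.

```python
from itertools import combinations


def unshielded_colliders(directed_edges):
    """Return triples (a, c, b) where a -> c <- b and a, b are non-adjacent.

    `directed_edges` is a set of (parent, child) tuples. This plain-Python
    representation is an assumption made for illustration only.
    """
    parents = {}
    adjacent = set()
    for u, v in directed_edges:
        parents.setdefault(v, set()).add(u)
        adjacent.add(frozenset((u, v)))

    found = []
    for child, pars in parents.items():
        # Every pair of parents forms a collider at `child`; it is
        # unshielded only if the two parents share no edge.
        for a, b in combinations(sorted(pars), 2):
            if frozenset((a, b)) not in adjacent:
                found.append((a, child, b))
    return found
```

An orientation of the circle edges would be acceptable for this step only if the function returns an empty list for the resulting graph.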

aryan26roy · 2023-04-08T06:37:16Z

I found the tetrad code for turning a PAG into an MAG!

No, you have to follow the theorem linked and then the definition of the MAG. Just cuz you have or don't have circle edges does not imply anything.

I meant to ask if this check is enough as a precursor to the actual validity check. I am assuming that a graph cannot be a PAG if it doesn't have any circle edges, and if so, it is automatically classified as an invalid PAG.

This step may require some generic graph traversal... this issue might actually be harder than I thought, but it's just a standard graph BFS/DFS traversal if you're familiar with graph algorithms. E.g. see the other algorithms, and what they do.

I am! (I took multiple classes on graphs and I have done a past project where I implemented some of the graph traversal algorithms for robotics application.) I still expect to encounter some difficulties though. I will look through the tetrad code and let you know if I am unable to understand what to do exactly after that.

aryan26roy · 2023-04-08T06:42:43Z

Ok so I looked it up and it turns out that a graph can be a PAG even if it doesn't have any circle edges.

adam2392 · 2023-04-10T01:29:54Z

pywhy_graphs/algorithms/pag.py

    for u, v in cedges:
        if (v, u) in dedges:
            to_remove.append((u, v))
        elif (v, u) not in cedges:
            to_reorient.append((u, v))
+        elif (v, u) in cedges and (v, u) not in to_replace:


The not in to_replace does not scale well. You should use a different data structure to check inclusion (e.g. set, or dict)

Makes sense.
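
A minimal sketch of the suggestion above, assuming circle edges are given as a list of (u, v) tuples. `unique_circle_pairs` and its variable names are hypothetical and do not reproduce the code in the PR; the point is only that membership tests against a `set` take constant time on average, while `in` on a list scans every element.

```python
def unique_circle_pairs(cedges):
    """Collect each two-way circle pair once, using sets for membership tests."""
    cedge_set = set(cedges)  # average O(1) lookups instead of O(n) list scans
    seen = set()
    pairs = []
    for u, v in cedges:
        # Keep (u, v) only if the reverse edge exists and was not recorded yet.
        if (v, u) in cedge_set and (v, u) not in seen:
            seen.add((u, v))
            pairs.append((u, v))
    return pairs
```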

adam2392 · 2023-04-10T01:30:50Z

pywhy_graphs/algorithms/pag.py

        if (v, u) in dedges:
            to_remove.append((u, v))
        elif (v, u) not in cedges:
            to_reorient.append((u, v))
+        elif (v, u) in cedges and (v, u) not in to_replace:


I would add some in-line comment to describe this sequence of if/else statements.

adam2392 · 2023-04-10T01:31:54Z

Feel free to ping me when/if there's any issues or you need a review.

aryan26roy · 2023-04-11T09:46:41Z

@adam2392 sorry for the delay but I have been stuck with a work emergency. Will get back to this issue from Wednesday.

Feel free to ping me when/if there's any issues or you need a review.

Occasionally I have a problem understanding some theoretical things and I feel like GitHub is not the place to clarify them. Is there a better way to communicate with you (like email or Slack?) or would you like me to ask all of those questions here as well?

adam2392 · 2023-04-11T19:28:22Z

@adam2392 sorry for the delay but I have been stuck with a work emergency. Will get back to this issue from Wednesday.

Sounds good no rush!

Occasionally I have a problem understanding some theoretical things and I feel like GitHub is not the place to clarify them. Is there a better way to communicate with you (like email or Slack?) or would you like me to ask all of those questions here as well?

Feel free to summarize them here, in case other devs are following, e.g. @jaron-lee. If it warrants a call, then we can hop on Discord. Are you on the Discord for pywhy?

aryan26roy · 2023-04-12T16:59:04Z

Feel free to summarize them here, in case other devs are following, e.g. @jaron-lee. If it warrants a call, then we can hop on Discord. Are you on the Discord for pywhy?

I am not. But I will join now. Did not know a discord server existed for pywhy (I was expecting a slack channel but I like Discord more anyway :-))

robertness

I suggest adding sub-functions to is_valid_PAG to make it more modular.

aryan26roy · 2023-04-14T18:59:18Z

@robertness you're right. I have been planning to make major chunks of the implementation modular (starting from the meek rules).

adam2392 · 2023-04-14T19:06:53Z

pywhy_graphs/algorithms/pag.py

@@ -908,3 +908,232 @@ def _check_ts_node(node):
        )
    if node[1] > 0:
        raise ValueError(f"All lag points should be 0, or less. You passed in {node}.")
+
+
+def orient_edges(graph: CPDAG) -> None:


Probably don't just copy/paste the code directly. Prolly should be renamed, etc.

E.g. orient_edges is sort of a bad name, cuz it is in the pag.py file and it works on a CPDAG. Perhaps, you rename it to a private function called _apply_meek_rules

aryan26roy · 2023-04-14T19:14:31Z

@adam2392 I am trying to import the function and use it in a script like this:
pywhy_graphs.algorithms.is_valid_PAG(pag)
The output is this error:

AttributeError: module 'pywhy_graphs.algorithms' has no attribute 'is_valid_PAG'

Do you know why?

adam2392 · 2023-04-14T19:16:40Z

@adam2392 I am trying to import the function and use it in a script like this: pywhy_graphs.algorithms.is_valid_PAG(pag) The output is this error:

AttributeError: module 'pywhy_graphs.algorithms' has no attribute 'is_valid_PAG'

Do you know why?

You have to expose it to the import module by adding an entry to __all__ inside the algorithms/pag.py file at the top.

pywhy_graphs/algorithms/pag.py
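
The effect of `__all__` can be reproduced in isolation. The sketch below assumes the package's `__init__` re-exports the module with a star import (an assumption about the package layout, not confirmed in this thread). `fake_pag` is a hypothetical stand-in module; the real `pag.py` is not reproduced here.

```python
import sys
import types

# Hypothetical stand-in for a module that forgot to list a new function.
source = '''
__all__ = ["pds"]

def pds():
    return "pds"

def is_valid_PAG():
    return True
'''

fake_pag = types.ModuleType("fake_pag")
exec(source, fake_pag.__dict__)
sys.modules["fake_pag"] = fake_pag

# Simulate what a package __init__ does with a star import.
namespace = {}
exec("from fake_pag import *", namespace)

# Only names listed in __all__ are copied by the star import.
exported = sorted(name for name in namespace if not name.startswith("__"))
print(exported)
```

Under that assumption, a function missing from `__all__` never reaches the package namespace, which would produce the `AttributeError` quoted above.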

jaron-lee · 2023-04-15T21:05:41Z

pywhy_graphs/algorithms/pag.py

@@ -551,6 +552,7 @@ def pds(
    ----------
    .. footbibliography::
    """
+    print("HAHAHAHAHHA")


I assume this is just a debugging tool but should be removed before committing

aryan26roy · 2023-04-23T10:39:35Z

@adam2392 I think I have constructed the MAG correctly (I think the rule 4 is correctly implemented). Now how do I check the validity of an MAG? Can you explain it to me in a detailed way?

Signed-off-by: Aryan Roy <aryanroy5678@gmail.com>

adam2392 · 2023-04-27T15:49:18Z

I think the easier thing might be to start with a few tests where this should work and where it shouldn't work. You can start w/ small PAGs and use tetrad GUI to construct examples that you then add into a unit-test.

aryan26roy · 2023-04-27T16:15:20Z

Ah! Thank you for the suggestion @adam2392. I will have to set up Tetrad before that but this will make my job much easier.

aryan26roy · 2023-04-29T07:15:18Z

@adam2392 I am facing a weird issue. In the middle of the function, I construct a temporary CPDAG. When I print out the edges of the CPDAG, this is what I get:

{'directed': OutEdgeView([('w', 'z'), ('z', 'x'), ('x', 'xy'), ('xy', 'w')]), 'undirected': EdgeView([])}

But when I draw the graph, I get two different graphs on different runs (both are wrong too):

Note that the edge list remains the same in both the graphs.

adam2392 · 2023-05-02T14:30:24Z

@adam2392 I am facing a weird issue. In the middle of the function, I construct a temporary CPDAG. When I print out the edges of the CPDAG, this is what I get:
But when I draw the graph, I get two different graphs on different runs (both are wrong too):
Note that the edge list remains the same in both the graphs.

Do you have some reproducing MVP code for this? This seems like a bug in the draw function then.

Feel free to attend weekly meetings in the causal discovery channel on discord too

aryan26roy · 2023-05-04T08:34:25Z

@adam2392 I am sharing the MVP code for this through pastebin as it is too long for a GitHub comment.
I was planning to attend the last one but something else got in the way. I will be there for the next one!

adam2392 · 2023-05-04T14:45:51Z

@adam2392 I am sharing the MVP code for this through pastebin as it is too long for a GitHub comment.
I was planning to attend the last one but something else got in the way. I will be there for the next one!

For future, it might be helpful to keep the code a bit shorter to make the error apparent.

I took a look tho and it seems like there's probably just a bug in the CPDAG drawing. It is drawing extra edges it should not be. Are we able to replicate this with just two/three nodes?

aryan26roy · 2023-05-08T14:30:25Z

@adam2392 I have three questions for you.

The paper says that for m-separation you check whether every collider on the path has a descendant in the set 'z', while the function you linked checks whether the collider itself is in 'z'. Which is the correct check?
To check for m-separation, instead of making random sets to use as 'z', I made a set of all the colliders in the graph. Now all the colliders on any path are necessarily in this set and all the non-colliders are necessarily not. I think this is a better way of doing it; what do you think?
You said that I will have to reconstruct the PAG from the MAG. Is that really important? Because my code does not track the changes made to the graph during the application of the Meek rules. Isn't checking the validity of the MAG enough?

aryan26roy · 2023-05-08T14:37:56Z

@adam2392 As for your question, yes, I am able to replicate the bug with just two or three nodes.

adam2392 · 2023-05-08T14:47:47Z

@adam2392 I have three questions for you.

The paper says that for m-separation you check whether every collider on the path has a descendant in the set 'z', while the function you linked checks whether the collider itself is in 'z'. Which is the correct check?

The function implements it correctly and checks for more than just collider status.

pywhy-graphs/pywhy_graphs/networkx/algorithms/causal/m_separation.py

Lines 151 to 165 in 8658c5b

    
    # Consider if *-> node <-* is opened due to conditioning on collider,
    # or descendant of collider
    if node in an_z:
        if has_directed:
            # add <- edges to backward deque
            for x, _ in G_directed.in_edges(nbunch=node):
                if x not in backward_visited:
                    backward_deque.append(x)
        # add <-> edge to backward deque
        if has_bidirected:
            for nbr in G_bidirected.neighbors(node):
                if nbr not in forward_visited:
                    forward_deque.append(nbr)

Try looking up d-separation materials and reading up on why descendants open up collider paths.
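
As a small illustration of the point above: a collider is opened when it, or any of its descendants, is in the conditioning set, which is the same as asking whether the collider belongs to the set containing Z and all ancestors of Z. The sketch below assumes that `an_z` in the quoted snippet holds such a set (the name suggests this, but the thread does not show how it is built). `ancestors_inclusive` and the `parents` dictionary are hypothetical names used for illustration.

```python
def ancestors_inclusive(parents, z):
    """Return z together with every ancestor of a node in z.

    `parents` maps each node to the set of its direct parents.
    """
    result = set(z)
    stack = list(z)
    while stack:
        node = stack.pop()
        for p in parents.get(node, ()):
            if p not in result:
                result.add(p)
                stack.append(p)
    return result


# X -> C <- Y and C -> D: conditioning on the descendant D opens the collider C.
parents = {"C": {"X", "Y"}, "D": {"C"}}
an_z = ancestors_inclusive(parents, {"D"})
collider_is_open = "C" in an_z
```

Here `collider_is_open` is true even though `C` itself is not in the conditioning set, because `C` is an ancestor of `D`.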

To check for m-separation, instead of making random sets to use as 'z', I made a set of all the colliders in the graph. Now all the colliders on any path are necessarily in this set and all the non-colliders are necessarily not. I think this is a better way of doing it; what do you think?

Why not just use minimal_m_separator, which performs searches to determine if a minimal m-separator exists?

@jaron-lee does minimal_m_separator work to just check if two nodes have an m-separating set? It just does the extra work to make sure it is also minimal right? But if two nodes are m-separated, then minimal_m_separator should return some set always?

You said that I will have to reconstruct the PAG from the MAG. Is that really important? Because my code does not track the changes made to the graph during the application of the Meek rules. Isn't checking the validity of the MAG enough?

This is not true because the PAG reconstructed from the MAG may not match the PAG you fed in.

I think if you implement a few positive/negative test cases starting w/ what is easy, this will be easier to discuss. Do you have such unit tests yet? Do you mind pushing them up to the PR?

adam2392 · 2023-05-08T14:49:20Z

@adam2392 As for your question, yes, I am able to replicate the bug with just two or three nodes.

Can you open a GH issue related to this? and paste the MVP code sample?

aryan26roy · 2023-05-14T06:23:15Z

Why not just use minimal_m_separator, which performs searches to determine if a minimal m-separator exists?

@adam2392 I didn't know of this function. I will use it now.

This is not true because the PAG reconstructed from the MAG may not match the PAG you fed in.

How come? I was under the belief that when you say 'reconstruct' the PAG, you want me to re-trace the steps followed throughout the function backwards. If I do that correctly, won't the reconstructed PAG always match the input PAG? And if not this, then what do you mean by reconstruct?

I think if you implement a few positive/negative test cases starting w/ what is easy, this will be easier to discuss. Do you have such unit tests yet? Do you mind pushing them up to the PR?

I do not have such test cases yet. Do you know where I can find test cases that have pre-tagged valid and non-valid PAGs?

adam2392 · 2023-05-15T01:22:30Z

I do not have such test cases yet. Do you know where I can find test cases that have pre-tagged valid and non-valid PAGs?

If you use Tetrad and draw an arbitrary graph with circle endpoints and then run the check via the GUI (I think it's called check valid graph coloring or something like that), you will be able to construct various cases.

adam2392 · 2023-05-16T14:10:29Z

pywhy_graphs/algorithms/tests/test_pag.py

Great start on the unit-tests.

You can further think of unit-tests by taking a look at each of the FCI rules and draw in tetrad a graph that is sort of opposite those rules, and then you can add a separate unit-test checking each one.

E.g.

X <-o Y <-o Z <-o A

is invalid and so is

X <- Y <- Z <-o A; Z <-o B

because Z <-o A should be Z <- A due to R1 of FCI. You can repeat this exercise for each rule, with each one in a different unit-test, so it's easy to review and debug.

Note: this might not cover everything in the algorithm, but then this is a great step in the right direction

@adam2392 It took me a while to figure out Tetrad.

How many different tests do you want me to add?

It will probably help you develop the algorithm, so probably 1 test for each FCI rule. The is_valid_pag algorithm will by construction return False if there are any rules left to be applied, so a separate test for each rule will help you as well.

And it will help us narrow down the possible runtime error-surfaces. For graph stuff, it's hard to test every single edge case, but good to test things that we know are definitely wrong/correct.

@adam2392 is this enough?

adam2392 · 2023-05-18T17:02:25Z

pywhy_graphs/algorithms/tests/test_pag.py

+    # A o--> B o--> C o--o D o--o F
+    assert not is_valid_PAG(pag)
+
+    pag.remove_edge("B", "C")
+    pag.add_edge("B", "C", pag.circle_edge_name)
+    pag.add_edge("C", "B", pag.directed_edge_name)
+
+    # A o--> B <--o C o--o D o--o F
+    assert is_valid_PAG(pag)


I would have each as a separate test, so that way the graph is explicitly constructed to be the example/counter-example.

It's hard to review/read the test rn becuz one has to keep track of the edge changes that occur.

aryan26roy · 2023-11-14T06:09:28Z

@adam2392 this branch has become too convoluted for me to keep everything straight. Do you mind if I delete this and start fresh on a new branch?

adam2392 · 2023-11-14T06:43:55Z

Yes that is a perfectly fine and acceptable practice

adam2392 reviewed Apr 10, 2023

robertness reviewed Apr 13, 2023

adam2392 reviewed Apr 14, 2023

adam2392 reviewed Apr 15, 2023

adam2392 reviewed Apr 15, 2023

jaron-lee reviewed Apr 15, 2023

aryan26roy added 10 commits April 23, 2023 16:14

Add a function to check validity of PAGs

f7cc90a

Signed-off-by: Aryan Roy <aryanroy5678@gmail.com>

Linting

e729b83

Signed-off-by: Aryan Roy <aryanroy5678@gmail.com>

Reverting some automatic changes

6705eec

Signed-off-by: Aryan Roy <aryanroy5678@gmail.com>

Completed the first step of converting a PAG into an MAG

2eb890e

Signed-off-by: Aryan Roy <aryanroy5678@gmail.com>

linting

8842c6c

Signed-off-by: Aryan Roy <aryanroy5678@gmail.com>

Commented out circle edge check

a9fea9e

Signed-off-by: Aryan Roy <aryanroy5678@gmail.com>

add undirected edges

6a81f8c

Signed-off-by: Aryan Roy <aryanroy5678@gmail.com>

Added comments and using sets to check inclusion

0952d71

Signed-off-by: Aryan Roy <aryanroy5678@gmail.com>

Added more code to generate MAG

9fe0d2e

Signed-off-by: Aryan Roy <aryanroy5678@gmail.com>

linting

aaba274

Signed-off-by: Aryan Roy <aryanroy5678@gmail.com>

Added functions to check validity of MAG

eb280e2

Signed-off-by: Aryan Roy <aryanroy5678@gmail.com>

Merge branch 'main' into valid_pag

9b956fa

aryan26roy added 2 commits May 7, 2023 15:24

Added check for m-seperation

9697f33

Signed-off-by: Aryan Roy <aryanroy5678@gmail.com>

Completed m-seperation check

c13d202

Signed-off-by: Aryan Roy <aryanroy5678@gmail.com>

Using minimal_m_seperator

006c8e3

Signed-off-by: Aryan Roy <aryanroy5678@gmail.com>

add some unit tests

0bddbd3

Signed-off-by: Aryan Roy <aryanroy5678@gmail.com>

adam2392 reviewed May 16, 2023

Add more tests

74f46a1

Signed-off-by: Aryan Roy <aryanroy5678@gmail.com>

adam2392 reviewed May 18, 2023

aryan26roy added 2 commits May 19, 2023 10:22

Seperate the tests

8741a88

Signed-off-by: Aryan Roy <aryanroy5678@gmail.com>

Fix bug

bd43e5a

Signed-off-by: Aryan Roy <aryanroy5678@gmail.com>

adam2392 mentioned this pull request May 19, 2023

A function to determine whether an inducing path exists between two nodes #70

Closed

adam2392 mentioned this pull request Jul 6, 2023

Convert MAG to PAG #85

Open

aryan26roy mentioned this pull request Jul 17, 2023

[ENH] Function for converting PAG to MAG #86

Closed

5 tasks

aryan26roy mentioned this pull request Jul 28, 2023

Check the Validity of an MAG #89

Closed

aryan26roy closed this Nov 14, 2023
