Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENH] Add the ability to convert a PAG to MAG #93

Merged
merged 24 commits into from
Oct 30, 2023

Conversation

aryan26roy
Copy link
Collaborator

Closes #92

Changes proposed in this pull request:

  • A function that converts a PAG to an MAG.

Before submitting

  • I've read and followed all steps in the Making a pull request
    section of the CONTRIBUTING docs.
  • I've updated or added any relevant docstrings following the syntax described in the
    Writing docstrings section of the CONTRIBUTING docs.
  • If this PR fixes a bug, I've added a test that will fail without my fix.
  • If this PR adds a new feature, I've added tests that sufficiently cover my new functionality.

After submitting

  • All GitHub Actions jobs for my pull request have passed.

Aryan Roy and others added 2 commits July 17, 2023 12:44
Signed-off-by: Aryan Roy <aryanroy@Aryans-MacBook-Air.local>
@codecov
Copy link

codecov bot commented Aug 22, 2023

Codecov Report

Merging #93 (7cfaa6b) into main (9f3e202) will decrease coverage by 0.61%.
The diff coverage is 60.60%.

@@            Coverage Diff             @@
##             main      #93      +/-   ##
==========================================
- Coverage   77.35%   76.74%   -0.61%     
==========================================
  Files          42       42              
  Lines        3413     3543     +130     
  Branches      984     1032      +48     
==========================================
+ Hits         2640     2719      +79     
- Misses        570      603      +33     
- Partials      203      221      +18     
Files Coverage Δ
pywhy_graphs/algorithms/pag.py 80.26% <60.60%> (-10.76%) ⬇️

... and 1 file with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@aryan26roy
Copy link
Collaborator Author

@adam2392 I can't find the DagToPag class in dodiscover. Could you point me to it's implementation?

@adam2392
Copy link
Collaborator

@adam2392 I can't find the DagToPag class in dodiscover. Could you point me to it's implementation?

I would just use Tetrad to draw up a few graphs; it has the nice functionality of checking if the PAG is valid (see cmu-phil/tetrad#1556). If you read the orientation rules, by breaking any of them, then you can violate the PAG.

The DAGtoPAG class does not exist in dodiscover. Sorry that was just a shorthand for saying you can use the FCI algo with a graph as the Oracle CI test.

@aryan26roy
Copy link
Collaborator Author

@adam2392 converting a PAG to MAG requires implementing the Meek rules. Is there an implementation already available for it? I remember seeing something like that in some pywhy repo.

Signed-off-by: Aryan Roy <aryanroy5678@gmail.com>
@adam2392
Copy link
Collaborator

What does the PAG to MAG algo look like?

@aryan26roy
Copy link
Collaborator Author

aryan26roy commented Aug 27, 2023

@adam2392
Copy link
Collaborator

Does THM 2 in the paper use these? Can you maybe summarize the algorithm here or in a docstring? Idr where the meek rules come in or why.

@aryan26roy
Copy link
Collaborator Author

The tetrad implementation does exactly what THM 2 asks in step 1. The step 2 involves removing any unshielded colliders. So I just assumed Meek rules do just that. So now I have two questions:

  • What do the Meek rules do exactly?
  • Is there a better way of getting rid of unshielded colliders?

@adam2392
Copy link
Collaborator

You should put this into the PR description for making communication easier to reference. Apologies, but if you can help me see where the confusion lies, it will help in answering your questions.

"Let H be the graph resulting from the following procedure applied to the PAG, P:

(1) orient the circles on o-> edges in P as tails, and orient the circles on --o edges in P as arrowheads (that is, turn all o-> edges and all --o edges into directed edges ->); and
(2) orient $P^C$ into a DAG with no unshielded colliders.
"

The first part doesn't require anything except changing edges in the graph?
The second part just means you orient the circle-edge subgraph edges. Looking at the tetrad implementation, I see this involves the MEEK rules. So the Meek rules are just R1-R4 of the PC algorithm.

The tetrad implementation does exactly what THM 2 asks in step 1. The step 2 involves removing any unshielded colliders. So I just assumed Meek rules do just that. So now I have two questions:

  • What do the Meek rules do exactly?

Read up on the PC algorithm and FCI algorithm. The FCI algorithm is described in the Zhang 2008 paper. A version of the PC algorithm is described in https://arxiv.org/pdf/1302.4972.pdf.

  • Is there a better way of getting rid of unshielded colliders?

Not sure what the question is here.

@aryan26roy
Copy link
Collaborator Author

aryan26roy commented Aug 29, 2023

You should put this into the PR description for making communication easier to reference.

Sure, will do this from next time.

Not sure what the question is here.

I was wondering if there was a pre-implemented way of getting rid of unshielded colliders in pywhy-graphs?

Signed-off-by: Aryan Roy <aryanroy5678@gmail.com>
@adam2392
Copy link
Collaborator

I was wondering if there was a pre-implemented way of getting rid of unshielded colliders in pywhy-graphs?

This isn't a "standard algorithm", so not that I'm aware of. But the idea of getting rid of unshielded colliders stems from the fact that unshielded colliders provide conditional independence (CI) constraints.

DAGs encode CI constraints via d-separation. However, many DAGs can have the same CI constraints. Thus the literature invented MAGs. However, many MAGs can have the same CI constraints as well. Thus the literature invented PAGs. So the algorithm doesn't specify how to do it, just that you should be able to take the Thm2 step 2 and orient randomly as long as there isn't an unshielded collider.

E.g.

A o-o B o-o C, with A and C not connected should not get oriented into A o-> B <-o C.

Signed-off-by: Aryan Roy <aryanroy5678@gmail.com>
@aryan26roy aryan26roy mentioned this pull request Sep 8, 2023
Signed-off-by: Aryan Roy <aryanroy5678@gmail.com>
Copy link
Collaborator

@adam2392 adam2392 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just left a short comment cuz I had some time during lunch, but lmk when you want me to properly review this.

pywhy_graphs/algorithms/pag.py Outdated Show resolved Hide resolved
@aryan26roy
Copy link
Collaborator Author

@adam2392 When I generated a MAG - PAG pair, I go the following edges for the MAG:

C <- A -> B -> D; C -> D <- A;

Now sometimes, my implementation of the PAG to MAG function produces this exact graph, but sometimes it produces these as well:

[('A', 'D'), ('A', 'C'), ('C', 'D'), ('B', 'D'), ('B', 'A')]

and

[('A', 'D'), ('A', 'B'), ('C', 'D'), ('C', 'A'), ('B', 'D')]

I remember you said something about there being multiple right MAGs (the MAGs would have to be markov equivalent is what you said, iirc). Are these alternate graphs right? Or is my implementation wrong?

@adam2392
Copy link
Collaborator

adam2392 commented Oct 6, 2023

@adam2392 When I generated a MAG - PAG pair, I go the following edges for the MAG:

C <- A -> B -> D; C -> D <- A;

Now sometimes, my implementation of the PAG to MAG function produces this exact graph, but sometimes it produces these as well:

[('A', 'D'), ('A', 'C'), ('C', 'D'), ('B', 'D'), ('B', 'A')]

and

[('A', 'D'), ('A', 'B'), ('C', 'D'), ('C', 'A'), ('B', 'D')]

Aren't these the same skeleton? Just not in the same order?

@aryan26roy
Copy link
Collaborator Author

The deference is hard to notice. If you look at the first result, the edge A -> B is B -> A and in the second result, A -> C is the other way, i.e C -> A.

@adam2392
Copy link
Collaborator

adam2392 commented Oct 6, 2023

Ah I see. So the question is whether or not the two graphs are Markov equivalent. Zhang has a theorem stating when two MAGS are markov equivalent. You can check those three conditions easily by hand for a small graph.

@aryan26roy
Copy link
Collaborator Author

I could not find a theorem but I did find these:

Proposition 2. Two MAGs over the same set of vertices are Markov equivalent if and only if
(e1) They have the same adjacencies;
(e2) They have the same unshielded colliders;
(e3) If a path p is a discriminating path for a vertex V in both graphs, then V is a collider on the path in one graph if and only if it is a
collider on the path in the other.

and

Definition 5 (Markov equivalence). Two MAGs G1,G2 (with the same set of vertices) are Markov equivalent if for any three disjoint sets of vertices X,Y,Z, X and Y are m-separated by Z in G1 if and only if X and Y are m-separated by Z in G2.

Is this what you are talking about? If not, can you point me to the exact place where it is defined?

@adam2392
Copy link
Collaborator

adam2392 commented Oct 6, 2023

Proposition 1 is the one that you can check

@aryan26roy
Copy link
Collaborator Author

Assuming that you meant proposition 2, can you explain what exactly point 3 means in the text?

@adam2392
Copy link
Collaborator

adam2392 commented Oct 6, 2023

Discriminating path is a graphical property defined in Zhang I think(?) lmk.

They should also provide an example. I would take a look and walk thru examples and counter examples.

There is also an implementation in pywhy-graphs if that helps the intuitive understanding.

@aryan26roy
Copy link
Collaborator Author

Alright, from what I see they seem Markov equivalent.

@aryan26roy
Copy link
Collaborator Author

aryan26roy commented Oct 7, 2023

How many tests should I add to finish this PR? It doesn't seem like I will be able to construct test cases to hit every line in this.

@adam2392
Copy link
Collaborator

adam2392 commented Oct 9, 2023

How many tests should I add to finish this PR? It doesn't seem like I will be able to construct test cases to hit every line in this.

You don't have to test every line in the _apply_meek_rules*, but you should be able to construct a test-case for each line in pag_to_mag. As of right now, there seem to be some CI failures.

@aryan26roy
Copy link
Collaborator Author

@adam2392 I have added a GH issue, is this ok to merge?

@adam2392
Copy link
Collaborator

@adam2392 I have added a GH issue, is this ok to merge?

I will review the code.

Meanwhile, there's still some CI errors, so will need to resolve those.

Thanks for the patience!

@aryan26roy
Copy link
Collaborator Author

Meanwhile, there's still some CI errors, so will need to resolve those.

The errors are for networkx : ERROR: Package 'networkx' requires a different Python: 3.9.13 not in '>=3.10'



def test_pag_to_mag():
pag = PAG()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As before, for unit-tests, these are commonly read by other developers, so it's useful to add some comments specifying what the test is actually testing for vs other tests. You can even split this into multiple unit-test functions w/ more descriptive naming if that's easier.

For example: test_pag_to_mag_when_graph_is_confounded_and...

Copy link
Collaborator

@adam2392 adam2392 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few nits:

The docstring needs to get cleaned up a bit. The unit-tests could also be better documented. It's hard to parse from an outside perspective what the test is doing

@aryan26roy
Copy link
Collaborator Author

I added more context to the unit tests and cleaned up the docstring a bit. Do you want me to do something more?

dependabot bot and others added 17 commits October 30, 2023 15:08
Bumps [actions/checkout](https://github.com/actions/checkout) from 3 to 4.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](actions/checkout@v3...v4)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Add is_maximal function
* add DAG to MAG function

---------

Signed-off-by: Aryan Roy <aryanroy5678@gmail.com>
Co-authored-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Aryan Roy <aryanroy5678@gmail.com>
Signed-off-by: Aryan Roy <aryanroy5678@gmail.com>
Signed-off-by: Aryan Roy <aryanroy5678@gmail.com>
Signed-off-by: Aryan Roy <aryanroy5678@gmail.com>
Signed-off-by: Aryan Roy <aryanroy5678@gmail.com>
Signed-off-by: Aryan Roy <aryanroy5678@gmail.com>
Signed-off-by: Aryan Roy <aryanroy5678@gmail.com>
Signed-off-by: Aryan Roy <aryanroy5678@gmail.com>
Signed-off-by: Aryan Roy <aryanroy5678@gmail.com>
Signed-off-by: Aryan Roy <aryanroy5678@gmail.com>
Signed-off-by: Aryan Roy <aryanroy5678@gmail.com>
Co-authored-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Aryan Roy <50577809+aryan26roy@users.noreply.github.com>
Co-authored-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Aryan Roy <50577809+aryan26roy@users.noreply.github.com>
Signed-off-by: Aryan Roy <aryanroy5678@gmail.com>
Signed-off-by: Aryan Roy <aryanroy5678@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
@adam2392 adam2392 merged commit 608206e into py-why:main Oct 30, 2023
23 of 29 checks passed
@adam2392
Copy link
Collaborator

Thanks @aryan26roy !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Convert PAG to MAG
3 participants