Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENH] MixedEdgeGraph for enablement of causal graphs #5947

Open
wants to merge 19 commits into
base: main
Choose a base branch
from

Conversation

adam2392
Copy link
Contributor

@adam2392 adam2392 commented Aug 27, 2022

Closes: #5811

Summary of changes

  1. MixedEdgeGraph is a graph that accepts arbitrary combinations of "base" networkx graphs (Graph, DiGraph). These internal graphs can be used to then represent any arbitrary types of edges. Each internal graph is linked to a str of characters that semantically represent the type of edge the user wants. E.g. 'bidirected': nx.Graph links the word 'bidirected' with a nx.Graph object to represent the set of bidirected edges.
  2. nx.algorithms.causal submodule is a set of general-purpose causal algorithms that operate over mixed edge graphs. The most canonical example is the generalization of d-separation to that of "m-separation". It's pretty much the same thing, except now there are bidirected edges.

In general, I realize this is a big PR, so once the high-level changes are deemed desirable by the networkx team, I can just simply break up the PR into smaller chunks to make reviewing easier at the "detailed" level.

Relevant Examples of MixedEdgeGraph Usage

The most prominent explicit example demonstrating how MixedEdgeGraph is used is in Pywhy-graphs
:

  • The ADMG class (bidirected, directed edges and sometimes undirected): link
  • The PAG class (directed, bidirected, undirected and directed edges with circle endpoints): link

Here is an example spun up in the pywhy-graphs docs.

Note (11/20/23): We will most likely rename these graphs to something like DiBiUnGraph and DiBiUnCiGraph to reflect the same design philosophy networkx took.

Some notes on how mixed-edge graph API differs from regular networkx graphs. Iterating through the graph is not exactly the same, and I don't have a great way of consolidating. Iterating through a NetworkX graph iterates through the edges. However, when we iterate through a Mixed-edge-graph, what do we then return? Edge types and then edges, or edges and their edge-types simultaneously?

Misc.

See: https://github.com/py-why/dowhy/wiki/Networkx-Proposals-For-MixedEdgeGraph-Class-To-Enable-Causal-Graphs for a high level API design overview.

See: #5811 for discussion on the inclusion of graphs with mixed edges for the sake of causality based procedures.

cc: @dschult

cc folks invested in py-why: the result of this thread will affect future API and operations in pywhy. @amit-sharma @robertness @darthtrevino @petergtz @bloebp @emrekiciman @jaron-lee

@adam2392
Copy link
Contributor Author

Kay I got the CI passing. I think this is ready for discussion @dschult . Lmk if perhaps a call would be easier.

@adam2392
Copy link
Contributor Author

adam2392 commented Sep 7, 2022

Hi @dschult just wanted to follow up to see if we could converse on some of the high level design choices made here and if they're in line w/ what you think would work and be acceptable from maintainers end. Lmk if you need more time tho. I totally understand since this is a pretty large PR.

We are actively building up causal algorithms in py-why/dodiscover and also specific causal graphs in py-why/pywhy-graphs, which all rely on the underlying MixedEdgeGraph presented here. So preferably can converge on the API, so I'm refactoring fewer LOC later :p

@dschult
Copy link
Member

dschult commented Oct 12, 2022

Thanks for the gentle ping. I still am time-crunched but took a quick look. Would it reduce maintenance burden to make MixedEdgeGraph a subclass of Graph? Then much of the node functionality would already be there.

The general approach looks good to me.

@adam2392
Copy link
Contributor Author

Sure completely understand!

Re subclassing Graph: The issue I have there is how to handle the factory methods. My understanding is with #5850 we are at least improving this pattern. E.g. mainly how to handle:

  • adjlist_outer_dict_factory
  • adjlist_inner_dict_factory

Should I override the The Graph class uses a dict-of-dict-of-dict data structure. structure with a dict-of-dict-of-dict of edge_type dict data structure?

Or for now... should I just make the inheritance pretty ugly for the sake of getting it to work and at least subclassing the node functionality?

@dschult
Copy link
Member

dschult commented Oct 17, 2022

Well -- I've looked through your code a little more carefully. And I have to say that I don't see that subclassing will help very much. So many of the methods would have to be changed, it would probably be harder to read than the current version.

I'll go through in more detail. But I think this is a good framework and a good design choice.
Thanks!

@MridulS
Copy link
Member

MridulS commented Sep 22, 2023

@adam2392 Thanks for this! I was revisiting this PR and I was also skimming through https://github.com/py-why/pywhy-graphs (it looks nice!). Do you think given pywhy-graphs exists should we try to merge the MixedEdgeGraph graph upstream in NetworkX? Personally I would love to see to get this in NetworkX but I also don't want to slow down the pace of development if merging this inside NetworkX means pywhy-graphs wouldn't be able to quickly experiment with new stuff.

@adam2392
Copy link
Contributor Author

@MridulS thanks for following up!

Yes we would be interested in getting some form of this into networkX! Right now tho there's various class functions that don't work "exactly the way" I think networkx does but perhaps this can be improved in some iterations with networkx dev team input.

I think if the API or internals change and it affects pywhy-graphs that's fine. ESP rn it's in early stages of experimentation.

What's the best way to proceed here?

@MridulS
Copy link
Member

MridulS commented Sep 24, 2023

What's the best way to proceed here?

I think first step would be either cleaning up this PR or creating a new one. I also see that pywhy-graphs uses type annotations everywhere, we aren't there yet. Would that be a blocker to get MixedEdgeGraph into networkx?

@adam2392
Copy link
Contributor Author

No it wouldn't be a blocker. We can remove the type annotations. I can adjust the PR here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can remove this

@adam2392
Copy link
Contributor Author

Kay @MridulS lmk wdyt? This contains our WIP example, documentation and related algorithms for the new class MixedEdgeGraph, which we would utilize in pywhy-graphs.

Re MixedEdgeGraph usage in pywhy-graphs: Just for transparency, we are planning on refactoring the pywhy-graphs graphs to contain instead of things like ADMG and CPDAG to have graph classes:

  • DiBiUnGraph: directed, bidirected and undirected edge graph (i.e. ADMG, or CPDAG, MAG, )
  • DiBiUnCiGraph: the above, with also circular endpoint edges (i.e. PAG)

This is more inline w/ networkx, which puts responsibility of checking if a graph is of a certain type (e.g. acyclic) on a function instead. We would then implement these specific functions. However, as far as I can tell, this won't require any changes to nx.MixedEdgeGraph. But any significant changes to MixedEdgeGraph might impact our plan downstream.

adam2392 and others added 12 commits September 25, 2023 14:14
Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
* improve m-separation property runtime and efficiency

Signed-off-by: Jaron Lee <jaron2005@gmail.com>

* Add unit tests and coverage of error statements

Co-authored-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
@dschult
Copy link
Member

dschult commented Sep 25, 2023

I tried to rebase this PR on the current main so it can pass the tests and github, workflows, etc.
The pytest test-finding routine was finding duplicates in causal and mixed_edge and balking. So I ended up removing the causal directory because it looked like it was an early version of the tools in mixed_edge directory. If that isn't correct, either open a new PR or let me know which files go where.

It looks like there might still be some errors/updates where the code tries to import from pywhy, etc.
Can you try to pull this down to your local repo and then work with it? I thought it would be easier for me to rebase than for you to go through that. Hopefully I didn't cause more trouble than it was worth.

Signed-off-by: Adam Li <adam2392@gmail.com>
@adam2392
Copy link
Contributor Author

Thanks @dschult !

I fixed the doc-test. I think this works now at least locally for me. I'll take a pause here to allow you all to take a look?

@adam2392
Copy link
Contributor Author

@MridulS and @dschult sorry just to clarify, I should leave this as is until you both have a chance to look things over, correct? Just wanted to make sure I am not missing anything. Thanks!

@MridulS
Copy link
Member

MridulS commented Sep 28, 2023

Yes @adam2392. I'll try to go over this PR this week. Thanks again for the updates :)

Copy link
Member

@dschult dschult left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like the right direction for moving forward. That is, it looks good!
I think you can go ahead and make other changes and improvements. It is a pretty big change in terms of lines of code, so it's hard to review, but I think it is generally fine. I have added a few minor suggestions below.

Comment on lines 35 to 40
# Using the ``MixedEdgeGraph``, we can represent a causal graph
# with two different kinds of edges. To create the graph, we
# use networkx ``nx.DiGraph`` class to represent directed edges,
# and ``nx.Graph`` class to represent edges without directions (i.e.
# bidirected edges). The edge types are then specified, so the mixed edge
# graph object knows which graphs are associated with which types of edges.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these comment lines repeat what is described above in the doc_string. This should probably be removed and any new ideas from here should be transferred to the doc_string. (maybe? push back if you think otherwise)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. I re-wrote this part to just have a short summary.

examples/mixededge/plot_mixed_edge_graph.py Outdated Show resolved Hide resolved
bidirected_edge_name="bidirected",
):
"""Return the moral graph from an ancestral graph in :math:`O(|V|^2)`.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a paragraph reminding people what the moral graph is and what an ancestral graph is?
Also, you should move the complexity statement out of the first line (that is used for short text near any link in the docs to this function. That first line should have a blank line after it. Then the paragraph.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done lmk wdyt.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yay -- progress!

Some comments on the paragraph:
How can the undirected moral graph have a v-structure (which is directed) (it sounds like it is from the same graph)?
Is it correct that the v-structure appears in the ancestral graph? You don't actually say what the ancestral graph is -- just that it contains many types of edges. How are those edges determined (from ancestors?)
Also, the moral and ancestral graph must be related in some way, but the paragraph doesn't say how they are related. :}

Thanks!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay tried to address this in a better way. Lmk if this still could be improved!

Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
@adam2392
Copy link
Contributor Author

adam2392 commented Nov 20, 2023

Apologies @dschult and @MridulS for the delay. I've been backlogged on stuff to do. I just addressed some of Dan's comments.

I also updated the PR description to note some nuances that I forgot to describe when comparing networkx graphs and mixed-edge-graph.

Btw totally happy to break up this PR, but just wanted clarity on how to best move forward first, slash if this is even something networkx wants (i.e. I don't want to put in the time to refactor code for PR sake if the PR is not desired).

Signed-off-by: Adam Li <adam2392@gmail.com>
@adam2392
Copy link
Contributor Author

I am wondering if it would be better for G.adj[node] to return a dictionary of nodes with edge_type being the data, rather than returning a nested dictionary of the form {edge_type: G.get_graph(edge_type).adj}?

If we return a dictionary of nodes with always the edge type, this is more in-line w/ how networkx currently works. The only change is there is essentially always an extra "node attribute" that is returned that indicates the edge type. On the other hand, if we return the nested dictionary, it would always preserve the edge type semantics.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants