Replies: 7 comments 30 replies
-
Can this be accomplished without subclassing/adding new dedicated data structures? The existing graph classes are extremely flexible, and usually sufficient to represent different graph types. For example, NetworkX has a whole set of algorithms for working with bipartite graphs, but there is no BipartiteGraph data structure (similarly dags, etc.). This is (IMO) very convenient because users don't have to learn new interfaces - they can use the data structures they already know to represent their problem however they like! I'm certainly not against adding more algorithms for causal inference! Personally I would prefer an approach similar to bipartite graphs; i.e. there can be a |
Beta Was this translation helpful? Give feedback.
-
In addition to Adam's work, I'd point out this implementation of mixed graphs in the causal learn package. |
Beta Was this translation helpful? Give feedback.
-
@rossbar and @dschult in line with our discussion above, I have sketched out a basic API proposal that I think aims to have both the generality that networkx aims to have, as well as a concrete proposal for some core algorithm(s) for the The goal for the pywhy team is to make sure networkx is kept in the loop, such that we don't waste any work and make sure the general interface is PRable to networkx and then pywhy can be responsible for the more nuanced causal-graph operations. Let us know your thoughts? |
Beta Was this translation helpful? Give feedback.
-
I've looked through the proposal you've put together -- and I understand it is (rightly) a rough view -- but I'm glad to see multiple function signatures, and APIs being fleshed out. I am encouraged by the direction you are taking. I have a few quick comments below. But they should not be considered complete. The discussion will continue for sure. Comments: Writing to a dot file should be straightfoward so long as you have a way to report all the nodes and all the edges with attributes. Actually, you may not need all the attributes. As for finding a minimal separating set, that sounds like a cut-set (which we have a way to get) but we'll have to check if they are the same thing. I think your way to finesse bidirectional edges in m_separation using phantom nodes is slick. |
Beta Was this translation helpful? Give feedback.
-
Hi @adam2392, The approach you are using in the link above looks like a copy of the graph.py code with doc_strings and all and then some items have changed. It is really hard to tell what has changed and what hasn't. To go with this route, maybe it would be helpful to make a subclass of the Graph class so you can see what has changed and what hasn't. I didn't see much if any check that the nodes in each of the graphs are the same throughout. I think that would be just fine -- you can rely on the user to not mess up the data structure. But if you are going to do that, then you should probably build this without the machinery of EdgeViews and the rest. Just stick to G.nodes is a dict-of-dict and G.adj is a collection of dict-of-dict-of-dicts. Then worry about making thing read-only views later. I am thinking that you are creating a collection of graphs -- one for each edge type. The node manipulations are applied uniformly to all the graphs. <Typo: add_nodes_from is currently doing an add_node on each graph instead of add_nodes_from> The edge manipulations are not as clear to me. has_edge, add_edge, add_edges_from take a require edge_type to indicate which graph to update. So, perhaps the reporting of edges should also... Then you'd want a reporting option that includes all types of edges. Maybe the edges method should include a list of edge_types to report on -- and the edges function makes a long list of edges collected from all the different graphs. Similarly with for nbr in (nbr for edge_type, nbrs in G.adj.items() for nbr in nbrs if edge_type in requested_edge_type):
... Should edges work that way too? G.edges returns a dict keyed by edge_type to the edges view from the corresponding graph? for edge in (edge for edge_type, edges in G.edges.items() for edge in edges if edge_type in requested_edge_type):
... I'm just thinking out loud -- so probably are issues with this you/we would need to work through. I think the def degree(self, requested_edge_types, ...):
for node in self:
for edge_type in requested_edge_types:
deg = sum(len(nbrs) for nbrs in G.adj[edge_type][node])
yield node, deg These examples don't provide all the bells and whistles of G.edges -- like whether it returns data, etc; and G.degree -- like weighted degree, etc. But that could be added later. And it avoids the whole How are you going to denote the edge type when reporting the edge? (u, v, data, edge_type)? I'm not sure this was what you were looking for, but it is some ideas based on what you've got so far about how we might handle reporting |
Beta Was this translation helpful? Give feedback.
-
Your example is a good start to a tutorial as well. :} Another comment: We might consider G to be a multigraph since there can be multiple edges between the same nodes -- so long as they are of distinct types. I'm not sure what words should describe this... it's not really a multigraph or a graph.... maybe a mixededgegraph??? I guess the important restriction right now is that reduction to each type of edge produces a Graph or DiGraph. Maybe its too early to figure out how to distinguish between these labels. For example, I can envision someone wanting a multigraph for each of the directed and bidirected reductions. We'll know more when we get to the algorithms. How would the d-separation algorithm take advantage of this structure? |
Beta Was this translation helpful? Give feedback.
-
Hi @dschult just wanted to lightly ping here to follow up. |
Beta Was this translation helpful? Give feedback.
-
Hi,
I've become quite acquainted with the networkx API and recently began working with the pywhy/dowhy team on the discussion of how to enable a centralized API for causal-graph operations to enable i) structural learning algorithms and ii) causal ID from graphs and iii) other future pipelines.
Initially, we had a brief discussion on representing mixed-edge graphs. At a high level, I believe a networkx-like API is the best path forward considering networkx has cemented itself for "graph-related" things in Python. However, causal graphs have some issues in easily subclassing networkx. Therefore, we run into a dual problem of i) how to represent causal graphs robustly and ii) how to then leverage them to implement algorithms in causal inference algorithms.
Motivation for adding causal graphs in networkx
This would be an ideal centralized place since the API can then be consistent, and then pywhy / causal inference would not need to re-invent the wheel.
I understand that networkx is not interested in representing mixed-edge graphs currently because it would involve a large overhaul. However, I think that causal inference is growing so much that it could serve as a potentially interesting submodule within networkx that works with certain graph algorithms (e.g. d-separated), but not necessarily the others. Similar to how some algos work with undirected but not directed graphs (and vice versa).
This would then not only i) consolidate the causal graph API, but ii) also engage causal inference community into networkx.
Possible solution
In https://github.com/adam2392/causal-networkx, I've implemented my take on causal-graphs. I would specifically take a look at the files in
graphs/
such as the ADMG.The pywhy team and I were wondering if there is interest in taking causal-graph implementations directly in networkx?
Note I've implemented mixed edge graphs by representing different edges using different networkx graphs. Rather than use edge attributes because it seemed more natural and robust to me.
Misc.
DAGs work fine, but that only covers a small portion of causal inference, whereas to fully encompass the literature, we would need representations of ADMGs, CPDAGs and PAGs as well.
Beta Was this translation helpful? Give feedback.
All reactions