Description
I was running this program (the newest version from a few days ago) on a dataset of roughly 6000 x 800 and found that, between depth levels, the program was taking a very long time to reach the next depth level. Extrapolating, I found it was going to take 331 days for just one depth level. I studied the code and eventually traced the bottleneck to `GeneralGraph.remove_edge(edge1)`, which rebuilds the entire edge list in its final line, `self.reconstitute_dpath(self.get_graph_edges())`. `get_graph_edges` is the problem, as it traverses all of `self.graph` to build an edge list.
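Roughly speaking, the shape of the hot path is the following (a paraphrase of my reading of the code, not a verbatim copy):

```python
# Paraphrase of the hot path (illustration only, not verbatim library code):
# every single edge deletion ends with a full rebuild of the directed-path
# structure, and get_graph_edges() walks the whole graph to produce the edge
# list that rebuild consumes.
class GeneralGraph:  # excerpt for illustration only
    def remove_edge(self, edge):
        # ... per-edge adjacency bookkeeping (cheap) ...
        self.reconstitute_dpath(self.get_graph_edges())  # full rebuild, every call
```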
I fixed the problem as follows: I created a new function `remove_edge_only(self, edge: Edge)`, which is the same as `remove_edge` except that it omits the last line, `self.reconstitute_dpath(self.get_graph_edges())`. I also added a function

```python
def clear_dpath(self):
    self.dpath = np.zeros((self.num_vars, self.num_vars), np.dtype(int))
```
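For concreteness, here is a minimal sketch of what I mean by `remove_edge_only`; the body is elided because it is just the existing `remove_edge` code with the final rebuild dropped:

```python
from causallearn.graph.Edge import Edge


class GeneralGraph:  # excerpt; only the proposed method is sketched
    def remove_edge_only(self, edge: Edge):
        # ... identical bookkeeping to the existing remove_edge ...
        # except that the trailing
        #     self.reconstitute_dpath(self.get_graph_edges())
        # call is omitted, so callers can batch the rebuild themselves
        ...
```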
Then, at SkeletonDiscovery.py:136, I call `remove_edge_only` instead of `remove_edge`, and at the end of that loop I call

```python
cg.G.clear_dpath()
cg.G.reconstitute_dpath(cg.G.get_graph_edges())
```

so the edge list is rebuilt only once after all edge deletions instead of after every individual deletion (see the schematic at the end of this description). This brought the runtime down to seconds and produced identical causal graphs on smaller datasets. Since this isn't my project, is there some kind of approval process I need to go through to submit suggested changes like this, or is this something you would rather implement yourselves?
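For reference, the overall shape of the change is just moving the rebuild out of the deletion loop; in schematic form (the loop and the edge-selection logic stand in for the existing SkeletonDiscovery code, which is otherwise unchanged):

```python
# Schematic only: edges_marked_for_deletion is a stand-in for the existing
# per-depth deletion logic in SkeletonDiscovery.
for edge1 in edges_marked_for_deletion:
    cg.G.remove_edge_only(edge1)          # was: cg.G.remove_edge(edge1)

# rebuild the directed-path matrix exactly once, after all deletions
cg.G.clear_dpath()
cg.G.reconstitute_dpath(cg.G.get_graph_edges())
```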