Optimizing DFS in Data.Graph #882
Comments
Sounds great. Please write the most comprehensive comments you can about how everything works.
Looking into this further, removing the second set of trees doesn't appear to be very beneficial for What I thought was happening: trees from So there is an intermediate structure either way. This also explains why So I'll probably send a PR to only remove the first set of trees first, because that would be simpler with clear gains (as seen above), and we can think about the rest later.
Now that the obvious optimization is done, I want to explain the situation with the second set of trees because I haven't made much progress there. We currently have:

    dfs :: Graph -> [Vertex] -> [Tree Vertex]
    dfs g vs0 = run (bounds g) $ go vs0
      where
        go :: [Vertex] -> SetM s (Forest Vertex)
        go [] = pure []
        go (v:vs) = do
          visited <- contains v
          if visited
            then go vs
            else do
              include v
              as <- go (g!v)
              bs <- go vs
              pure $ Node v as : bs

Which is nice when the trees themselves are needed. But suppose we only want the preorder of this list of trees. Currently we create this forest first and then flatten it. Why should we settle for this, when it's possible to build the preorder list directly via dfs?

    dfs_preOrder :: Graph -> [Vertex] -> [Vertex]
    dfs_preOrder g vs0 = run (bounds g) $ go vs0 (pure [])
      where
        go :: [Vertex] -> SetM s [Vertex] -> SetM s [Vertex]
        go [] acc = acc
        go (v:vs) acc = do
          visited <- contains v
          if visited
            then go vs acc
            else do
              include v
              as <- go (g!v) (go vs acc)
              pure $ v : as

This is also true for reverse postorder, which is the topological sort of a graph.

    dfs_revPostOrder :: Graph -> [Vertex] -> [Vertex]
    dfs_revPostOrder g vs0 = run (bounds g) $ go vs0 []
      where
        go :: [Vertex] -> [Vertex] -> SetM s [Vertex]
        go [] acc = pure acc
        go (v:vs) acc = do
          visited <- contains v
          if visited
            then go vs acc
            else do
              include v
              acc' <- go (g!v) acc
              go vs (v : acc')

There are likely other useful direct constructions too. In general, it is possible to construct any result as long as we obey the dfs rules of marking the current vertex as visited, visiting its subtrees, and then visiting the rest. But so far I've been unable to come up with a general
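The snippets in this comment rely on containers internals (SetM, run, contains, include) that are not exported, so they cannot be tried outside the library. As a rough sketch of the same two traversals, the version below swaps SetM for a plain ST computation over an STUArray of visited flags. The names Seen, newSeen, preOrderDirect and revPostOrderDirect are made up for illustration and are not part of Data.Graph; the final definition only compares the results with the existing tree-based API.

```haskell
import Control.Monad.ST (ST, runST)
import Data.Array (bounds, (!))
import Data.Array.ST (STUArray, newArray, readArray, writeArray)
import Data.Graph (Graph, Vertex, buildG, dff, topSort, vertices)
import Data.Tree (flatten)

-- Hypothetical stand-in for the internal SetM: one "visited" flag per vertex.
type Seen s = STUArray s Vertex Bool

newSeen :: Graph -> ST s (Seen s)
newSeen g = newArray (bounds g) False

-- Preorder built directly. The last argument is the work still to be done
-- once the current list of vertices is exhausted.
preOrderGo :: Graph -> Seen s -> [Vertex] -> ST s [Vertex] -> ST s [Vertex]
preOrderGo _ _ [] rest = rest
preOrderGo g seen (v : vs) rest = do
  visited <- readArray seen v
  if visited
    then preOrderGo g seen vs rest
    else do
      writeArray seen v True
      as <- preOrderGo g seen (g ! v) (preOrderGo g seen vs rest)
      pure (v : as)

preOrderDirect :: Graph -> [Vertex] -> [Vertex]
preOrderDirect g vs0 =
  runST (newSeen g >>= \seen -> preOrderGo g seen vs0 (pure []))

-- Reverse postorder: each vertex is consed onto the accumulator only after
-- everything reachable from it has been visited.
revPostOrderGo :: Graph -> Seen s -> [Vertex] -> [Vertex] -> ST s [Vertex]
revPostOrderGo _ _ [] acc = pure acc
revPostOrderGo g seen (v : vs) acc = do
  visited <- readArray seen v
  if visited
    then revPostOrderGo g seen vs acc
    else do
      writeArray seen v True
      acc' <- revPostOrderGo g seen (g ! v) acc
      revPostOrderGo g seen vs (v : acc')

revPostOrderDirect :: Graph -> [Vertex] -> [Vertex]
revPostOrderDirect g vs0 =
  runST (newSeen g >>= \seen -> revPostOrderGo g seen vs0 [])

-- Small example graph, chosen arbitrarily for this sketch.
example :: Graph
example = buildG (0, 5) [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4), (5, 0)]

-- Both direct traversals should agree with the tree-based public API.
agreesWithTrees :: Bool
agreesWithTrees =
  preOrderDirect example (vertices example) == concatMap flatten (dff example)
    && revPostOrderDirect example (vertices example) == topSort example
```

Neither function allocates a Tree node; the only intermediate structure is the result list itself.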
None of those are lazy either. Unlike

    dfs_preOrder :: Graph -> [Vertex] -> [Vertex]
    dfs_preOrder g vs0 = run (bounds g) $ go vs0 (pure [])
      where
        go :: [Vertex] -> SetM s [Vertex] -> SetM s [Vertex]
        go [] acc = acc
        go (v:vs) acc = do
          visited <- contains v
          if visited
            then go vs acc
            else do
              include v
              as <- SetM $ unsafeInterleaveST . runSetM (go (g!v) (go vs acc))
              pure $ v : as

I don't know how well that will perform, however. Your question is still a good one: can we expose tools that will let the user determine the shape of a strict computation with marking? I suspect so. One good first step would be to expose
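For readers who, like the next commenter, have not used unsafeInterleaveST before, here is a minimal standalone sketch of the lazy preorder idea. It assumes a plain ST computation with an STUArray of visited flags in place of the internal SetM, and the names lazyGo and lazyPreOrder are hypothetical. The deferred action holds all remaining work, so at most one suspended step exists at a time and the marking still happens in DFS order as the list is consumed. This relies on nothing else touching the array, so treat it as an illustration rather than a vetted implementation.

```haskell
import Control.Monad.ST (ST, runST)
import Control.Monad.ST.Unsafe (unsafeInterleaveST)
import Data.Array (bounds, (!))
import Data.Array.ST (STUArray, newArray, readArray, writeArray)
import Data.Graph (Graph, Vertex, buildG, dff, vertices)
import Data.Tree (flatten)

-- Same shape as the strict preorder, except that the work after emitting v
-- is suspended until the tail of the result list is demanded.
lazyGo :: Graph -> STUArray s Vertex Bool -> [Vertex] -> ST s [Vertex] -> ST s [Vertex]
lazyGo _ _ [] rest = rest
lazyGo g seen (v : vs) rest = do
  visited <- readArray seen v
  if visited
    then lazyGo g seen vs rest
    else do
      writeArray seen v True
      -- Defer the children of v and everything after them.
      as <- unsafeInterleaveST (lazyGo g seen (g ! v) (lazyGo g seen vs rest))
      pure (v : as)

lazyPreOrder :: Graph -> [Vertex] -> [Vertex]
lazyPreOrder g vs0 =
  runST (newArray (bounds g) False >>= \seen -> lazyGo g seen vs0 (pure []))

-- Small example graph, chosen arbitrarily for this sketch.
example :: Graph
example = buildG (0, 5) [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4), (5, 0)]

-- Fully forcing the lazy list should give the preorder of the dff forest.
matchesForest :: Bool
matchesForest =
  lazyPreOrder example (vertices example) == concatMap flatten (dff example)
```

With this version, an expression such as take 1 (lazyPreOrder example [0]) should only need to mark vertex 0 before returning, whereas the strict variants traverse everything reachable first.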
That's right, but my concern has been just with avoiding the extra trees. Even if we construct the full output list, it is what was asked for being directly computed. Of course, if we can make it lazy that would be useful. I'm not too familiar with unsafe magic, so I'll have to figure out your code above.
The primary benefit would be internal, for
dfs is the core function in Data.Graph that all other algorithms (topSort, scc, bcc, etc.) are based on. It takes a Graph and generates a [Tree Vertex].

dfs makes use of generate, prune and chop to generate a [Tree Vertex] and chop it into the returned [Tree Vertex].

Other functions further transform this [Tree Vertex]; for example, topSort flattens the [Tree Vertex] into a [Vertex].

These intermediate trees are unnecessary. The trees from generate are immediately chopped into other trees. In topSort, the trees returned by dfs are immediately turned into a list. There is no reason we need to create these trees in memory.

I propose combining the existing generate, prune, and chop functions into a new core function that eliminates the intermediate trees from generate and also allows us to choose what to build from the DFS.

Then we can define, for example

Benchmarks on a random graph with 10,000 vertices and 100,000 edges:

Other functions such as scc and bcc should also benefit; I've not implemented them yet.

Thoughts? I can clean up my code into a PR if this sounds good.