
Is graphFromEdges too lazy? #917

Open
meooow25 opened this issue Jan 30, 2023 · 4 comments

Comments

@meooow25
Contributor

graphFromEdges is one way to construct a Graph in Data.Graph, and the line in it which actually constructs the graph looks like

graph = array bounds0 [(,) v (mapMaybe key_vertex ks) | (,) v (_, _, ks) <- edges1]

This is very lazy. The elements of the array (lists of vertices) are lazy, and the lists themselves are only generated when required. I doubt building up all these thunks is good for us. So I checked whether construction times improve if we are strict, and they do: on the largest graph, from 371 ms ± 18 ms with 107 MB allocated down to 323 ms ± 30 ms with 77 MB allocated.
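
For concreteness, here is a minimal sketch of the kind of strictness I mean (a standalone toy with made-up names, not the actual graphFromEdges code): force each adjacency list, spine and elements, while the array is being built, so the array holds evaluated lists rather than thunks.

import Data.Array (Array, array)

-- Force the spine and the elements of a list.
forceList :: [a] -> ()
forceList []     = ()
forceList (x:xs) = x `seq` forceList xs

-- Build an adjacency array; each adjacency list is forced when array
-- consumes its association pair, so no list thunks survive construction.
strictAdjArray :: (Int, Int) -> [(Int, [Int])] -> Array Int [Int]
strictAdjArray bnds es =
    array bnds [ forceList adj `seq` (v, adj) | (v, adj) <- es ]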

Now I've tried to think of situations where lazily constructing a graph is useful.
The only case I can think of is a user who constructs a large graph through graphFromEdges and then runs dfs on a subset of vertices they already know is disconnected from the rest of the graph; that way they avoid paying the cost of constructing the full graph. But this seems far-fetched.
All other functions, like dff, topSort, scc, and bcc, will always evaluate the full graph.

So, is there any other scenario where lazily constructing a graph is useful?

And if not, should we make it strict?
This would improve construction times and also make it consistent with buildG, the other way to build a graph in Data.Graph. The second reason alone might be a good enough reason to do this.

@treeowl
Contributor

treeowl commented Jan 30, 2023

I'm a bit skeptical about this one, though I won't rule it out entirely. Would it be possible to speed it up without being eager about those binary searches? I would start with these lines:

    edges1          = zipWith (,) [0..] sorted_edges

    graph           = array bounds0 [(,) v (mapMaybe key_vertex ks) | (,) v (_,    _, ks) <- edges1]
    key_map         = array bounds0 [(,) v k                       | (,) v (_,    k, _ ) <- edges1]
    vertex_map      = array bounds0 edges1

We build three arrays using the same list, edges1. I think this might be a spot where it'll pay to get lower level with arrays, and build all three "in parallel". Another option (which I'm guessing will be slower and harder to optimize) would be to build vertex_map and then build the other two arrays based on it.

@meooow25
Contributor Author

That's interesting, I'll test it out. One thing I have tried is replacing array with listArray, which avoids the zipWith (,) [0..] and makes it a little faster.
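
Roughly, that change looks something like this (a sketch that reuses the surrounding graphFromEdges bindings bounds0, key_vertex, and sorted_edges; listArray fills in index order, which matches sorted_edges, so edges1 is no longer needed):

    graph           = listArray bounds0 [mapMaybe key_vertex ks | (_, _, ks) <- sorted_edges]
    key_map         = listArray bounds0 [k | (_, k, _) <- sorted_edges]
    vertex_map      = listArray bounds0 sorted_edges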

@treeowl
Contributor

treeowl commented Jan 30, 2023

More concretely, use this building block:

array3 :: Ix i => (i, i) -> [(i, a, b, c)] -> (Array i a, Array i b, Array i c)

You'll probably need a bunch of inline pragmas to make things work nicely, including likely one on that zip.
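
One possible shape for it, as a rough sketch (filling three STArrays in one pass and freezing them in place; not meant as the final implementation):

import Control.Monad.ST (ST, runST)
import Data.Array (Array, Ix)
import Data.Array.ST (STArray, newArray_, writeArray)
import Data.Array.Unsafe (unsafeFreeze)

-- Fill three mutable arrays in a single pass over the input list,
-- then freeze them in place without copying.
array3 :: Ix i => (i, i) -> [(i, a, b, c)] -> (Array i a, Array i b, Array i c)
array3 bnds xs = runST $ do
    as <- newSTArray bnds
    bs <- newSTArray bnds
    cs <- newSTArray bnds
    mapM_ (\(ix, x, y, z) -> do
             writeArray as ix x
             writeArray bs ix y
             writeArray cs ix z)
          xs
    (,,) <$> unsafeFreeze as <*> unsafeFreeze bs <*> unsafeFreeze cs
  where
    -- Local signature just to pin the mutable array type to STArray.
    newSTArray :: Ix i => (i, i) -> ST s (STArray s i e)
    newSTArray = newArray_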

@treeowl
Contributor

treeowl commented Jan 30, 2023

(Or maybe you can avoid the zip with an accumulator? I dunno.)
