Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement complement for undirected graphs #237

Merged
merged 6 commits into from Nov 6, 2019

Conversation

gabrielelana
Copy link
Contributor

Regarding the last tests we don't know if it's ok for Arbitrary to generate graphs with loops

Fixes #21

Proudly made by @sphaso and @gabrielelana at "Open Source Saturday Milano"

Open Source Saturday

@snowleopard
Copy link
Owner

snowleopard commented Sep 22, 2019

@gabrielelana @sphaso Thank you! :)

Could you estimate the complexity of your implementation in terms of time and memory, as well as the size of the resulting expression? I suspect it can only handle very small graphs because of the repeated use of expensive removeEdge, but I may be wrong.

Of course, having an expensive algorithm is much better than having none, so I'm still happy to merge your PR, but I'd like to document the complexity.

we don't know if it's ok for Arbitrary to generate graphs with loops

I think the answer is Yes because we don't rule out self-loops.

It is not entirely clear how complement should handle self-loops. What about just not touching them at all? This allows you to keep the nice property complement . complement = id.

@sphaso
Copy link

sphaso commented Sep 23, 2019

Hi @snowleopard , thanks for getting back :)

I suspect it can only handle very small graphs because of the repeated use of expensive removeEdge, but I may be wrong.

If you deem removeEdge too expensive there are alternative strategies we considered: simply using \\ on the edgeList was one, going through the symmetricRelation sets was another. Our first idea was to XOR the adjancency matrix with a 1 mask, but we went for what we believed was the most "idiomatic" solution

we don't know if it's ok for Arbitrary to generate graphs with loops

I think the answer is Yes because we don't rule out self-loops.

We were indeed a bit puzzled by loops, because then one might start thinking about multigraphs, and the property complement . complement = id won't pass for those either. The only way to handle loops I can think of, would be to re-apply them on the clique, but that can't work for other types of multigraphs.

As you said, an expensive algorithm is better than none, so we're happy to analyze the complexity of complement and keep these as food for thought depending on how you feel about it.

@snowleopard
Copy link
Owner

If you deem removeEdge too expensive

It's linear w.r.t. to the size of the graph expression, which is quite expensive:

http://hackage.haskell.org/package/algebraic-graphs-0.4/docs/Algebra-Graph.html#v:removeEdge

simply using \\ on the edgeList

I think filtering existing edges may indeed be faster, although it would be nice to do some benchmarking. Could you compare your approach with \\?

We were indeed a bit puzzled by loops, because then one might start thinking about multigraphs

The basic Graph data type corresponds to "classic" graphs represented by a pair (V, E), where self-loops are allowed, so we should be able to handle self-loops.

The only way to handle loops I can think of, would be to re-apply them on the clique, but that can't work for other types of multigraphs.

Why don't you simply leave self-loops as is, without removing them? As far as I can see, this is the simplest option.

As you said, an expensive algorithm is better than none, so we're happy to analyze the complexity of complement

Great, please do! And some benchmarking would be nice too: I'm curious how your current implementation based on removeEdge compares to the one based on \\.

@sphaso
Copy link

sphaso commented Sep 28, 2019

Why don't you simply leave self-loops as is, without removing them? As far as I can see, this is the simplest option.

I feel there might be a critical piece of information regarding this library I'm missing. The problem I see is that:

  • Arbitrary Graph produces graphs with self-loops
  • clique does not (e.g. clique [x,y,z] == 'edges' [(x,y), (x,z), (y,z)])

So the situation is: I want complement . complement = id to be a valid property even for graphs with self-loops. However, if I use clique as the reference graph for the complement, I don't have the self-loops in the first place.
That's what I meant by "re-apply". If the pseudo code for the complement function is clique edges \\ current edges, I would have to take the self-loops from current edges and re-apply them after the difference operation.

We'll work on the benchmark ASAP.

@sphaso
Copy link

sphaso commented Oct 5, 2019

Hi @snowleopard ,
we finally ran the benchmark and you were right! Data.List.\\ is at least one order of magnitude faster than removeEdge.

Here you can find the benchmark code.
Here you can find the results.

For easier reference, these are the two differing implementations. We took some time to calculate the complexity of the faster one as well as handling loops:

complement :: Ord a => Graph a -> Graph a
complement g@(UG _) = overlay (edges loops) $ foldr (uncurry removeEdge) (cliqueG g) previousEdges
 where cliqueG = clique . vertexList
       previousEdges = edgeList g
       loops = filter (uncurry (==)) previousEdges

-- Complexity: /O(E^2+V)/ time, /O(E+V)/ memory where
-- E is the number of edges and V is the number of vertices
-- The quadratic bound is due to `edges`.
complement' :: Ord a => Graph a -> Graph a
complement' g@(UG _) = overlay (vertices allVertices) (edges complementEdges)
 where cliqueG = clique . vertexList
       allVertices = vertexList g
       previousEdges = edgeList g
       loops = filter (uncurry (==)) previousEdges
       complementEdges = loops ++ (edgeList (cliqueG g) \\ previousEdges)

We were curious about the time complexity of edges and we ran a second benchmark on it. The benchmark results seem to show that edges is actually linear with regard to time.
Here the benchmark code.
Here the results.

What do you think?

If you like our implementation we'll move it to the current PR.
Would you like to keep the benchmarks as well?

@snowleopard
Copy link
Owner

we finally ran the benchmark and you were right! Data.List.\ is at least one order of magnitude faster than removeEdge.

@sphaso Great, thank you for doing this!

We were curious about the time complexity of edges and we ran a second benchmark on it. The benchmark results seem to show that edges is actually linear with regard to time.

Yes, edges is indeed linear with respect to the length of the given list, as stated in the documentation:

http://hackage.haskell.org/package/algebraic-graphs-0.4/docs/Algebra-Graph.html#v:edges

Did you expect its complexity to be non-linear?

If you like our implementation we'll move it to the current PR.

Yes, let's use the faster implementation based on \\.

Responding to your earlier comment: yes, I think what you call "re-applying" will work. Conceptually, it's not very complex: you complement all non-loop edges and make sure to keep the original self-loops in the final graph.

@snowleopard
Copy link
Owner

Would you like to keep the benchmarks as well?

Regarding benchmarks: there is a separate repository for benchmarks, which you can find here:

https://github.com/haskell-perf/graphs

@sphaso
Copy link

sphaso commented Oct 7, 2019

Yes, edges is indeed linear with respect to the length of the given list [...] Did you expect its complexity to be non-linear?

When estimating the complexity of complement we looked at the docs of edges inside the Undirected module where it's stated:

Complexity: /O(L^2)/ time, /O(L)/ memory and size

We'll adjust our estimates of complement accordingly. Would you like us to amend this comment as well?

@snowleopard
Copy link
Owner

@sphaso Oh, that's just wrong. If you correct it to "Complexity: O(L) time, memory and size, where L is the length of the given list." right in this PR that would be great!

@gabrielelana
Copy link
Contributor Author

@snowleopard we ported the faster implementation and amended the complexity of edges, can you check if everything is ok?

@snowleopard
Copy link
Owner

@gabrielelana Many thanks! The implementation looks good, but see my comments regarding the docs; they should be easy to address.

@sphaso
Copy link

sphaso commented Oct 31, 2019

@snowleopard I should have addressed all the points you mentioned in the latest comments. Good catch on the quadratic complexity!

@snowleopard
Copy link
Owner

@sphaso Thanks for the fixes! The result doesn't seem completely right though, please see my comments.

@sphaso
Copy link

sphaso commented Nov 5, 2019

@snowleopard I pushed a new commit that takes care of at least one concern, see my answer above for the other (time \ memory estimates, for which I already updated it as m^2). Regarding tests, the fact that we're now preserving self-loops rendered two properties false. Good thing we swapped those fixed values :)
Namely:
complement (edge x y) == (vertices [x, y] :: UGI)
is no longer true if x == y.

@snowleopard
Copy link
Owner

Namely:
complement (edge x y) == (vertices [x, y] :: UGI)
is no longer true if x == y.

@sphaso Indeed :-) Thanks!

@sphaso
Copy link

sphaso commented Nov 6, 2019

@snowleopard hopefully I got everything right this time :) thanks!

@snowleopard snowleopard merged commit c0e9706 into snowleopard:master Nov 6, 2019
@snowleopard
Copy link
Owner

@sphaso Thank you, merged! :)

By the way, feel free to add yourself and @gabrielelana to the list of contributors:

https://github.com/snowleopard/alga/blob/master/AUTHORS.md

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement graph complement
3 participants