Succinct graph representations #1041

Merged
merged 28 commits into from
Mar 8, 2021

Conversation

vigna

@vigna vigna commented Feb 18, 2021

This PR adds to jgrapht-unimi-dsi two graph representations based on succinct data structures.

The WebGraph adapters already provide ways to use succinct representation (e.g., EFGraph), but the implementations in this PR are modeled after the sparse representations of JGraphT—nodes and arcs are represented by integers and numbered starting from zero (they should be usable from Python).

Unfortunately, JGraphT's architecture clashes a bit with the succinct representation. For example, getEdgeSource() and getEdgeTarget() each have to make the same expensive call, so the call is made twice. The result is that while the graphs are about 5 times smaller, access is 5 times slower, too.

UPDATE: My claims about speed are quite wrong.

The problem is that I (stupidly) quickly tested enumeration time for the whole arc set, which is not a good idea because in these graphs the arc set is trivial.

More accurate testing shows that the directed version is 50% slower than SparseIntDirectedGraph when enumerating successors, and it is 2-3 times faster when checking adjacency (but see #1042: the speed test in this case might be meaningless). These figures will vary with the density of the graph.

The undirected implementation is unfortunately significantly slower (5 times slower when enumerating successors). Adjacency, however, is about 150 times faster. Is there some reason why adjacency in SparseIntUndirectedGraph is so slow?

@d-michail
Member

By adjacency I guess that you mean getAllEdges(source, target). The problem is that we do not keep the list of neighbors sorted, which would be needed in order to perform binary search. This is one of my TODO items.

@d-michail
Member

I will try to do this after #1029 gets merged.

@vigna
Author

vigna commented Feb 20, 2021

By adjacency I guess that you mean getAllEdges(source, target). The problem is that we do not keep the list of neighbors sorted, which would be needed in order to perform binary search. This is one of my TODO items.

Yes, or even just containsEdge(source, target). Maybe it's a linear search now? That would explain the timings.

I got a few interesting ideas and I'm going to partially rewrite this PR. My goal is to have size 4-5 times smaller than the sparse implementation and faster accessors. Let's see whether I can get there :).

@d-michail
Member

Yes, it is currently linear in the size of the neighborhood. containsEdge is similar or maybe just a wrapper for getAllEdges(), which means that both are linear.
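
The difference between a linear scan and a binary search over a neighbor list can be sketched as follows. This is a minimal illustration, assuming a simple array-based adjacency layout; the class SortedAdjacencySketch and its methods are hypothetical and are not the code of SparseIntUndirectedGraph or of any other JGraphT class.

```java
import java.util.Arrays;

/** Hypothetical array-based adjacency structure; not JGraphT's implementation. */
final class SortedAdjacencySketch {
    private final int[] offsets;   // neighbors of v live in [offsets[v], offsets[v + 1])
    private final int[] neighbors; // concatenated neighbor lists, each one sorted

    SortedAdjacencySketch(int numVertices, int[][] edges) {
        offsets = new int[numVertices + 1];
        for (int[] e : edges) offsets[e[0] + 1]++;
        for (int v = 0; v < numVertices; v++) offsets[v + 1] += offsets[v];
        neighbors = new int[edges.length];
        int[] next = Arrays.copyOf(offsets, numVertices);
        for (int[] e : edges) neighbors[next[e[0]]++] = e[1];
        // Sorting each list once is what makes binary search possible later.
        for (int v = 0; v < numVertices; v++) {
            Arrays.sort(neighbors, offsets[v], offsets[v + 1]);
        }
    }

    /** Linear in the size of the neighborhood of source. */
    boolean containsEdgeLinear(int source, int target) {
        for (int i = offsets[source]; i < offsets[source + 1]; i++) {
            if (neighbors[i] == target) return true;
        }
        return false;
    }

    /** Logarithmic in the size of the neighborhood; relies on the sorted lists. */
    boolean containsEdgeBinary(int source, int target) {
        return Arrays.binarySearch(neighbors, offsets[source], offsets[source + 1], target) >= 0;
    }
}
```

Both methods return the same answers; only the cost per query differs, which matters most for high-degree vertices.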

@vigna
Author

vigna commented Feb 20, 2021

OK, this is probably the most sensible approach. There is an implementation mimicking the sparse one which is quite slow, and one using pairs as edges that is an order of magnitude faster. I'm still doing some speed tests and I have to review the docs, but it looks pretty usable. The footprint is 3 to 10 times smaller than that of the sparse implementation, depending on density.
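
One plausible reading of why pair-typed edges help is sketched below. The assumptions are mine: an edge index must be decoded by reading a compressed sequence, modeled here by a counter on a plain long[], whereas a pair already carries both endpoints. EdgeAccessSketch and PairEdge are hypothetical names and are not the classes added by this PR.

```java
/** Illustrative only: counts how often the (assumed costly) sequence is read. */
final class EdgeAccessSketch {
    /** Hypothetical stand-in for a pair-typed edge that stores both endpoints. */
    static final class PairEdge {
        final int source;
        final int target;

        PairEdge(int source, int target) {
            this.source = source;
            this.target = target;
        }
    }

    private final long[] encoded; // each entry is (source << shift) | target
    private final int shift;
    private final long mask;
    int expensiveReads = 0;       // number of reads of the encoded sequence so far

    EdgeAccessSketch(int shift, long[] encoded) {
        this.shift = shift;
        this.mask = (1L << shift) - 1;
        this.encoded = encoded;
    }

    private long read(int k) {
        expensiveReads++;
        return encoded[k];
    }

    // With integer edges, each accessor has to decode the edge on its own.
    int getEdgeSource(int edge) {
        return (int) (read(edge) >>> shift);
    }

    int getEdgeTarget(int edge) {
        return (int) (read(edge) & mask);
    }

    // With pair edges, one read yields both endpoints; later accesses are field reads.
    PairEdge edgeAsPair(int k) {
        long e = read(k);
        return new PairEdge((int) (e >>> shift), (int) (e & mask));
    }
}
```

In this sketch, asking for both endpoints of an integer edge costs two reads, while a pair edge costs one read when it is created and none afterwards.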

@vigna
Author

vigna commented Feb 22, 2021

OK, I think this is ready to merge if you like it. I have written extensive Javadocs as the tradeoffs between the two different kinds of implementations might not be trivial to understand.

I don't know how difficult that might be, but having a bridge to Python for implementations using IntIntPair for edges might be very interesting: the succinct implementations using that instead of Integer are an order of magnitude faster.

@vigna
Author

vigna commented Feb 23, 2021

BTW, do you guys think there's some value in a constructor for directed graphs that encodes only outgoing arcs? The space would be halved, but of course you wouldn't get incoming arcs, similarly to the forward-only constructor of the WebGraph adapters.

@d-michail
Member

Yes, I did this recently for the SparseIntDirectedGraph.

@vigna
Author

vigna commented Feb 23, 2021

Ok, I'll do it ASAP.
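
The forward-only trade-off discussed above can be sketched like this. It is a minimal illustration under my own assumptions: plain int arrays instead of a succinct encoding, and a hypothetical class ForwardOnlySketch with a boolean flag, which is not the constructor signature used by JGraphT or by the WebGraph adapters.

```java
/** Hypothetical directed graph that may or may not store incoming arcs. */
final class ForwardOnlySketch {
    private final int[][] successors;
    private final int[][] predecessors; // null when incoming arcs are not stored

    ForwardOnlySketch(int n, int[][] arcs, boolean storeIncoming) {
        successors = build(n, arcs, 0, 1);
        predecessors = storeIncoming ? build(n, arcs, 1, 0) : null;
    }

    /** Groups arcs[i][value] by arcs[i][key], preserving the input order. */
    private static int[][] build(int n, int[][] arcs, int key, int value) {
        int[] degree = new int[n];
        for (int[] a : arcs) degree[a[key]]++;
        int[][] lists = new int[n][];
        for (int v = 0; v < n; v++) lists[v] = new int[degree[v]];
        int[] filled = new int[n];
        for (int[] a : arcs) lists[a[key]][filled[a[key]]++] = a[value];
        return lists;
    }

    int[] successorsOf(int v) {
        return successors[v];
    }

    int[] predecessorsOf(int v) {
        if (predecessors == null) {
            throw new UnsupportedOperationException("incoming arcs are not stored");
        }
        return predecessors[v];
    }

    /** Number of stored adjacency entries: one per arc, or two with incoming support. */
    int storedEntries() {
        int total = 0;
        for (int[] list : successors) total += list.length;
        if (predecessors != null) {
            for (int[] list : predecessors) total += list.length;
        }
        return total;
    }
}
```

The forward-only instance stores one entry per arc instead of two, at the price of rejecting any query about incoming arcs.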

@vigna
Author

vigna commented Feb 24, 2021

I also exploited your new constructors that take suppliers of streams of edges.

BTW, is there any reason why sparse representations are not serializable? I've been reliably storing and loading such instances without problems (just adding implements Serializable) to estimate their footprint.

Since it takes some time to build them I think it would be a useful feature. And people could easily publish graphs using that format.
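
The store-and-load round trip described above can be sketched with standard Java serialization. This is only an illustration: TinyGraph is a hypothetical stand-in for a sparse graph class, and the serialized size is at best a rough proxy for the in-memory footprint.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

final class SerializedFootprintSketch {
    /** Hypothetical graph-like class; the only addition is "implements Serializable". */
    static final class TinyGraph implements Serializable {
        private static final long serialVersionUID = 1L;
        final int[] offsets;
        final int[] neighbors;

        TinyGraph(int[] offsets, int[] neighbors) {
            this.offsets = offsets;
            this.neighbors = neighbors;
        }
    }

    /** Serializes the object in memory; the array length approximates its size. */
    static byte[] store(Serializable graph) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(graph);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Rebuilds an object from bytes produced by store(). */
    static Object load(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Writing to a file instead of a byte array only requires swapping the underlying streams, which is what would make publishing prebuilt graphs straightforward.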

@vigna
Author

vigna commented Mar 3, 2021

Is there anything more I should do?

Once this is released, I was thinking about making part of the LAW graph database (say, graphs with <2B nodes) available in this format. It would make it possible to easily test JGraphT on large graphs even with relatively little memory.

@d-michail
Member

Looks good. I will wait a bit to see if John has any comments.

Member

@jsichi jsichi left a comment

Just a few doc nits. Now we can proudly say, "JGraphT Sux!"

* <var>k</var>-th element of the sequence and some bit shifting (the encoding
* <var>x</var><var>n</var> + <var>y</var> would be slightly more compact, but much slower to
* decode). Since we know the list of cumulative outdegrees, we know which range of indices
* corresponds to the edges outgoing from each vertex. If we need to now whether
Member

typo: "know"
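
The encoding discussed in the Javadoc excerpt above can be sketched as follows. This is a minimal illustration, assuming that an edge (x, y) is stored as (x << s) | y with s the number of bits needed for a vertex index, and using a plain sorted long[] in place of whatever succinct sequence the PR actually uses. The class EdgeEncodingSketch and its method names are hypothetical.

```java
import java.util.Arrays;

/** Illustrative only: a plain long[] stands in for a succinct monotone sequence. */
final class EdgeEncodingSketch {
    private final int shift;                 // bits reserved for the target y
    private final long mask;                 // the low `shift` bits set
    private final long[] encoded;            // sorted values (x << shift) | y
    private final int[] cumulativeOutdegree; // entry v = number of edges with source < v

    EdgeEncodingSketch(int n, int[][] edges) {
        // Number of bits needed to represent any vertex index in [0, n).
        shift = 32 - Integer.numberOfLeadingZeros(Math.max(n - 1, 1));
        mask = (1L << shift) - 1;
        encoded = new long[edges.length];
        cumulativeOutdegree = new int[n + 1];
        for (int k = 0; k < edges.length; k++) {
            encoded[k] = ((long) edges[k][0] << shift) | edges[k][1];
            cumulativeOutdegree[edges[k][0] + 1]++;
        }
        Arrays.sort(encoded); // orders edges by source, then by target
        for (int v = 0; v < n; v++) cumulativeOutdegree[v + 1] += cumulativeOutdegree[v];
    }

    // Decoding needs only a shift or a mask, with no division by n.
    int source(int k) {
        return (int) (encoded[k] >>> shift);
    }

    int target(int k) {
        return (int) (encoded[k] & mask);
    }

    /** Index of the first edge leaving v; its edges end before firstEdge(v + 1). */
    int firstEdge(int v) {
        return cumulativeOutdegree[v];
    }

    int outDegree(int v) {
        return cumulativeOutdegree[v + 1] - cumulativeOutdegree[v];
    }

    /** Searches only the index range that belongs to x. */
    boolean containsEdge(int x, int y) {
        long key = ((long) x << shift) | y;
        return Arrays.binarySearch(encoded, firstEdge(x), firstEdge(x + 1), key) >= 0;
    }
}
```

The alternative encoding x * n + y mentioned in the Javadoc would need a division and a remainder to decode, which is the slowdown the excerpt alludes to.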

* {@link SparseIntUndirectedGraph}).
*
* <p>
* {@linkplain org.jgrapht.GraphIterables#outgoingEdgesOf(Object) Enumeration of edges} is is very
Member

typo "is is"

* are very fast and happen in almost constant time.
*
* <p>
* {@link SuccinctDirectedGraph} is a much slower implementation with a similar footprint using
Member

I think this is supposed to reference SuccinctIntDirectedGraph instead?

@vigna
Author

vigna commented Mar 4, 2021

Thank you for the check! All fixed.

BTW, I once talked with a guy who wouldn't use the library because of the pun 😂.

@d-michail d-michail merged commit c523396 into jgrapht:master Mar 8, 2021
@d-michail
Member

Just merged, thanks!

@vigna
Author

vigna commented Mar 8, 2021

Great! Any planned release date for 1.5.1?

@jkinable
Collaborator

Great! Any planned release date for 1.5.1?

yes, yesterday :)

@vigna
Author

vigna commented Mar 21, 2021

I don't know if this is the intended behavior, or if this might be an Ivy problem, but adding the artifact jgrapht to my ivy.xml dependencies does not bring in any jar. I have to manually point Ivy to, say, jgrapht-unimi-dsi.

@d-michail
Member

Yes, you need to explicitly request the module that you want to use. The core library is jgrapht-core.

@jkinable
Collaborator

@vigna @d-michail
Similar to the webgraphs in #1002 , can we include a demo/tutorial on the usage/purpose of succinct graphs? Key points:

  • When to use a succinct graph representation: advantages/disadvantages and some examples of practical applications
  • What is the advantage of a succinct graph representation over existing graph implementations, in particular, how does a succinct graph representation compare against (1) the 'standard' jgrapht graphs, (2) the specialized implementations in the jgrapht-opt module, (3) the webgraph representation. It should be clear for the user which implementation he/she should pick.

Prior to this PR, I personally hadn't heard about 'succinct' graphs. This might be used in a very specific field? We should try to make this accessible to 'standard' users. Also, from the wiki I understand that succinct graphs are incredibly space efficient. The only (?) reason to make graph storage this efficient is when you intend to store a massive graph (massive in terms of nr of edges/vertices). The same question applies here: do the JGraphT algorithms perform well enough on those massive graphs? Here we can obviously limit ourselves to algorithms you would reasonably expect to execute on those graphs, e.g. you would not compute an exact TSP on a graph with 20MM vertices.

@vigna
Author

vigna commented Mar 23, 2021

I tried to explain all this in the Javadoc, but we can rephrase this somewhere else. The point is that we already have succinct representations in WebGraph, but these are targeted more at JGraphT and in particular at the Python bridge. We plan to distribute all our datasets with fewer than 2^31 arcs in serialized succinct JGraphT form directly, so users can just load and use them (less friction than with adapters).

Succinct data structures are fairly recent, with progress in implementations starting in the mid-2000s. For example, Facebook's graph and text index are all stored using partitioned Elias-Fano, a succinct data structure (you can see some public code here, but what they actually use is more complex). Lucene has a succinct Elias-Fano encoder.

The reason to use succinct graphs is simply that you can analyze much larger graphs in core memory. Access is asymptotically the same as for a redundant format, with constant factors that can be large or small depending on the implementation.

@vigna
Author

vigna commented Mar 26, 2021

We're having a look at the wiki with Dimitrios. Where do you think it would be sensible to put information about this (and the WebGraph adapters)?

@jsichi
Member

jsichi commented Apr 7, 2021

The correct place would be in the user guide (edit docs/guide-templates/UserOverview.md):

https://jgrapht.org/guide/UserOverview#graph-adapters

A new page (with code examples) linked from here would be best.

@vigna vigna mentioned this pull request May 10, 2021
@syoon2 syoon2 mentioned this pull request Apr 12, 2024
7 tasks
4 participants