Succinct graph representations #1041

vigna · 2021-02-18T23:23:26Z

This PR adds to jgrapht-unimi-dsi two graph representations based on succinct data structures.

The WebGraph adapters already provide ways to use succinct representations (e.g., EFGraph), but the implementations in this PR are modeled after the sparse representations of JGraphT: nodes and arcs are represented by integers and numbered starting from zero (they should be usable from Python).

Unfortunately, JGraphT's architecture clashes a bit with the succinct representation. For example, getEdgeSource() and getEdgeTarget() have to make the same expensive call twice. The result is that while the graphs are about 5 times smaller, access is 5 times slower, too.

UPDATE: My claims about speed are quite wrong.

The problem is that I (stupidly) quickly tested enumeration time for the whole arc set, which is not a good idea because in these graphs the arc set is trivial.

More accurate testing shows that the directed version is 50% slower than SparseIntDirectedGraph when enumerating successors, and it is 2-3 times faster when checking adjacency (but see #1042; the speed test in this case might be meaningless). These figures will vary with the density of the graph.

The undirected implementation is unfortunately significantly slower (5 times slower when enumerating successors). Adjacency, however, is about 150 times faster. Is there some reason why adjacency in SparseIntUndirectedGraph is so slow?

d-michail · 2021-02-20T09:02:44Z

By adjacency I guess that you mean getAllEdges(source, target). The problem is that we do not keep the list of neighbors sorted, in order to perform binary search. This is one of my TODO items.

d-michail · 2021-02-20T09:13:13Z

I will try to do this after #1029 gets merged.

vigna · 2021-02-20T09:18:22Z

> By adjacency I guess that you mean getAllEdges(source, target). The problem is that we do not keep the list of neighbors sorted, in order to perform binary search. This is one of my TODO items.

Yes, or even just containsEdge(source, target). Maybe it's a linear search now? That would explain the timings.

I got a few interesting ideas and I'm going to partially rewrite this PR. My goal is to have size 4-5 times smaller than the sparse implementation and faster accessors. Let's see whether I can get there :).

d-michail · 2021-02-20T09:21:08Z

Yes, it is currently linear in the size of the neighborhood. containsEdge is similar or maybe just a wrapper for getAllEdges(), which means that both are linear.
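To make the difference discussed above concrete, here is a minimal sketch of the two lookup strategies on a plain array of neighbors. The class and method names are hypothetical and this is not the JGraphT code; it only assumes that a vertex's neighbors are available as an `int[]`.

```java
import java.util.Arrays;

/** Hypothetical illustration of adjacency tests; not JGraphT code. */
final class AdjacencySketch {
    /** Linear scan: works on an unsorted neighbor list, time proportional to the degree. */
    static boolean containsLinear(int[] neighbors, int target) {
        for (int v : neighbors) {
            if (v == target) {
                return true;
            }
        }
        return false;
    }

    /** Binary search: requires the neighbor list to be sorted, time logarithmic in the degree. */
    static boolean containsSorted(int[] sortedNeighbors, int target) {
        return Arrays.binarySearch(sortedNeighbors, target) >= 0;
    }
}
```

Keeping the lists sorted costs something at construction time, which is a one-off expense for an immutable sparse graph.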

vigna · 2021-02-20T13:40:45Z

OK, this is probably the most sensible approach. There is an implementation mimicking the sparse one which is quite slow, and one using pairs as edges that is an order of magnitude faster. I'm still doing some speed tests and I have to review the docs but it looks pretty usable. Footprint is 3 to 10 times less than the sparse implementation, depending on density.
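A rough sketch of why pair-based edges can help, under stated assumptions: the class below is a hypothetical stand-in that stores arcs in plain arrays (cumulative outdegrees plus a target list) and counts a simulated "expensive" lookup. It is not the code in this PR. With an integer edge handle, each endpoint query repeats the lookup; with a pair, one lookup yields both endpoints.

```java
/** Hypothetical stand-in for a succinct graph; not the code in this PR. */
final class EdgeHandleSketch {
    private final long[] cumulativeOutdegrees; // entry v = number of arcs leaving vertices < v
    private final int[] targets;               // target of arc k, arcs sorted by source
    int decodeCalls;                           // counts the simulated expensive lookups

    EdgeHandleSketch(long[] cumulativeOutdegrees, int[] targets) {
        this.cumulativeOutdegrees = cumulativeOutdegrees;
        this.targets = targets;
    }

    /** Simulated expensive lookup: recovers both endpoints of arc k as {source, target}. */
    int[] decode(int k) {
        decodeCalls++;
        // Find the vertex v with cumulativeOutdegrees[v] <= k < cumulativeOutdegrees[v + 1].
        int lo = 0;
        int hi = cumulativeOutdegrees.length - 1;
        while (hi - lo > 1) {
            int mid = (lo + hi) >>> 1;
            if (cumulativeOutdegrees[mid] <= k) {
                lo = mid;
            } else {
                hi = mid;
            }
        }
        return new int[] { lo, targets[k] };
    }

    /** Integer edge handle: every endpoint query pays for a full lookup. */
    int getEdgeSource(int k) {
        return decode(k)[0];
    }

    int getEdgeTarget(int k) {
        return decode(k)[1];
    }
}
```

Calling `getEdgeSource(k)` and then `getEdgeTarget(k)` performs two lookups, whereas a caller holding the result of a single `decode(k)` reads both endpoints for the price of one.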

vigna · 2021-02-22T08:28:53Z

OK, I think this is ready to merge if you like it. I have written extensive Javadocs as the tradeoffs between the two different kinds of implementations might not be trivial to understand.

I don't know how difficult it might be, but having a bridge to Python for implementations using IntIntPair for edges might be very interesting: the succinct implementations using that instead of Integer are an order of magnitude faster.

vigna · 2021-02-23T09:02:41Z

BTW, do you guys think there's some value in a constructor for directed graphs that encodes only outgoing arcs? The space would be halved, but of course you wouldn't get incoming arcs, similarly to the forward-only constructor of the WebGraph adapters.

d-michail · 2021-02-23T09:08:40Z

Yes, I did this recently for the SparseIntDirectedGraph.

vigna · 2021-02-23T09:22:38Z

Ok, I'll do it ASAP.

vigna · 2021-02-24T11:47:50Z

I also exploited your new constructors that take suppliers of streams of edges.

BTW, is there any reason why sparse representations are not serializable? I've been reliably storing and loading such instances without problems (just adding implements Serializable) to estimate their footprint.

Since it takes some time to build them I think it would be a useful feature. And people could easily publish graphs using that format.
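As an illustration of the store-and-load round trip described above, here is a minimal sketch using standard Java serialization. `TinyGraph` is a hypothetical stand-in for a graph class that implements `Serializable`; it is not a JGraphT type. The length of the serialized byte array also gives a rough footprint estimate.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

/** Hypothetical illustration of serializing a graph-like object; not JGraphT code. */
final class SerializationSketch {
    /** Stand-in for a graph class that implements Serializable. */
    static final class TinyGraph implements Serializable {
        private static final long serialVersionUID = 1L;
        final int[] sources;
        final int[] targets;

        TinyGraph(int[] sources, int[] targets) {
            this.sources = sources;
            this.targets = targets;
        }
    }

    /** Serializes the object to a byte array. */
    static byte[] store(Serializable object) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(object);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Reads back an object previously written by store(). */
    static Object load(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Replacing the byte array streams with file streams gives the publish-and-load workflow mentioned in the comment.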

vigna · 2021-03-03T15:20:53Z

Is there anything more I should do?

Once this is released, I was thinking about making part of the LAW graph database (say, graphs with <2B nodes) available in this format. It would make it possible to easily test JGraphT on large graphs even with relatively little memory.

d-michail · 2021-03-03T16:20:07Z

Looks good. I will wait a bit to see if John has any comments.

jsichi

Just a few doc nits. Now we can proudly say, "JGraphT Sux!"

jsichi · 2021-03-04T07:09:59Z

jgrapht-unimi-dsi/src/main/java/org/jgrapht/sux4j/AbstractSuccinctDirectedGraph.java

+ * <var>k</var>-th element of the sequence and some bit shifting (the encoding
+ * <var>x</var><var>n</var> + <var>y</var> would be slightly more compact, but much slower to
+ * decode). Since we know the list of cumulative outdegrees, we know which range of indices
+ * corresponds to the edges outgoing from each vertex. If we need to now whether


typo: "know"
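For readers following the Javadoc excerpt above, here is a small sketch of the trade-off it mentions. It assumes that "bit shifting" means packing an edge (x, y) as x shifted left by the number of bits needed for a vertex, with y in the low bits; the class and method names are hypothetical. The alternative xn + y wastes no code space when n is not a power of two, but decoding needs a division and a remainder instead of a shift and a mask.

```java
/** Hypothetical illustration of two ways to pack an edge (x, y) into a long; not the PR's code. */
final class EdgeEncodingSketch {
    /** Number of bits needed to represent any vertex in [0, n). */
    static int bitsFor(long n) {
        return n <= 1 ? 0 : 64 - Long.numberOfLeadingZeros(n - 1);
    }

    // Shift-based encoding: decoding is a shift and a mask.
    static long encodeShift(long x, long y, int bits) {
        return (x << bits) | y;
    }

    static long sourceShift(long code, int bits) {
        return code >>> bits;
    }

    static long targetShift(long code, int bits) {
        return code & ((1L << bits) - 1);
    }

    // Multiplicative encoding x * n + y: slightly more compact, decoding needs division.
    static long encodeMul(long x, long y, long n) {
        return x * n + y;
    }

    static long sourceMul(long code, long n) {
        return code / n;
    }

    static long targetMul(long code, long n) {
        return code % n;
    }
}
```

With n = 1000 vertices, the shift-based form reserves 10 bits for y, so codes range up to about 1000 · 1024, while the multiplicative form stays below 1000 · 1000.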

jsichi · 2021-03-04T07:11:57Z

jgrapht-unimi-dsi/src/main/java/org/jgrapht/sux4j/SuccinctIntUndirectedGraph.java

+ * {@link SparseIntUndirectedGraph}).
+ *
+ * <p>
+ * {@linkplain org.jgrapht.GraphIterables#outgoingEdgesOf(Object) Enumeration of edges} is is very


typo "is is"

jsichi · 2021-03-04T07:21:54Z

jgrapht-unimi-dsi/src/main/java/org/jgrapht/sux4j/SuccinctDirectedGraph.java

+ * are very fast and happen in almost constant time.
+ *
+ * <p>
+ * {@link SuccinctDirectedGraph} is a much slower implementation with a similar footprint using


I think this is supposed to reference SuccinctIntDirectedGraph instead?

vigna · 2021-03-04T08:01:34Z

Thank you for the check! All fixed.

BTW, I once talked with a guy who wouldn't use the library because of the pun 😂.

d-michail · 2021-03-08T08:37:33Z

Just merged, thanks!

vigna · 2021-03-08T18:23:51Z

Great! Any planned release date for 1.5.1?

jkinable · 2021-03-21T00:23:29Z

> Great! Any planned release date for 1.5.1?

yes, yesterday :)

vigna · 2021-03-21T07:54:13Z

I don't know if this is the intended behavior, and I don't know if this might be an Ivy problem, but adding the artifact jgrapht to my ivy.xml dependencies does not bring in any jar. I have to manually point Ivy to, say, jgrapht-unimi-dsi.

d-michail · 2021-03-21T10:15:37Z

Yes, you need to explicitly request the module that you want to use. The core library is jgrapht-core.

jkinable · 2021-03-22T18:51:45Z

@vigna @d-michail
Similar to the webgraphs in #1002, can we include a demo/tutorial on the usage/purpose of succinct graphs? Key points:

- When to use a succinct graph representation: advantages/disadvantages and some examples of practical applications
- What is the advantage of a succinct graph representation over existing graph implementations, in particular, how does a succinct graph representation compare against (1) the 'standard' jgrapht graphs, (2) the specialized implementations in the jgrapht-opt module, (3) the webgraph representation. It should be clear for the user which implementation he/she should pick.

Prior to this PR, I personally hadn't heard about 'succinct' graphs. This might be used in a very specific field? We should try to make this accessible to 'standard' users. Also, from the wiki I understand that succinct graphs are incredibly space efficient. The only (?) reason to make graph storage this efficient is when you intend to store a massive graph (massive in terms of nr of edges/vertices). The same question applies here: do the JGraphT algorithms perform well enough on those massive graphs? Here we can obviously limit ourselves to algorithms you would reasonably expect to execute on those graphs, e.g. you would not compute an exact TSP on a graph with 20MM vertices.

vigna · 2021-03-23T22:56:14Z

I tried to explain all this in the Javadoc, but we can rephrase this somewhere else. The point is that we already have succinct representations in WebGraph, but these are more targeted at JGraphT and in particular at the Python bridge. We plan to distribute all our datasets with less than 2^31 arcs in serialized succinct JGraphT form directly, so users can just load and use them (less friction than with adapters).

Succinct data structures are fairly recent, with progress in implementations starting in the mid-2000s. For example, Facebook's graph and text index are all stored using partitioned Elias-Fano, a succinct data structure (you can see some public code here, but what they actually use is more complex). Lucene has a succinct Elias-Fano encoder.

The reason to use succinct graphs is simply that you can analyze much larger graphs in core memory. Access is asymptotically the same as that of a redundant format, with constant factors that can be large or small depending on the implementation.

vigna · 2021-03-26T10:15:13Z

We're having a look at the wiki with Dimitrios. Where do you think it would be sensible to put information about this (and the WebGraph adapters)?

jsichi · 2021-04-07T07:51:12Z

The correct place would be in the user guide (edit docs/guide-templates/UserOverview.md):

https://jgrapht.org/guide/UserOverview#graph-adapters

A new page (with code examples) linked from here would be best.

vigna added 7 commits February 18, 2021 09:46

Moved Sux4J files to unimi-dsi

b010ef2

Moved Sux4J files to unimi-dsi

83aba60

Fixed dependencies

8c23534

Fixed copyright

29288dc

Package info

cf99e53

Merge remote-tracking branch 'origin/master' into sux4j

30ff570

IntIntPair-based implementation

3968be3

Imported modifications from sux4j-test

3186973

vigna added 3 commits February 20, 2021 13:42

Imported modifications from sux4j-test

52a282b

Fixed javadoc

b82ffd4

Tweaking

de93f41

vigna force-pushed the sux4j branch from eddd21d to de93f41 Compare February 20, 2021 16:26

vigna added 9 commits February 20, 2021 16:43

Faster iteration

7b0500b

Improved iteration

d67ae25

Improved iteration

fb8bb54

Fixed checkstyle violations

9ed84b8

Fixed checkstyle violations

13e1c65

Now we implement serializable

f99dba9

Improved docs and enumeration

4cc4eb8

Improved docs

f01db34

Improved docs

be52580

vigna added 3 commits February 22, 2021 09:41

Improved docs

d3263c0

Made indexing function taking and returning longs

7b08537

Fixed javadoc

407ba04

Made statement about size precise

e27efe4

vigna added 2 commits February 24, 2021 11:09

Merge remote-tracking branch 'origin/master' into sux4j

703d8d1

Added constructors with choice of support for incoming arcs and constructors accepting a supplier of streams of edges

b5edc80

Fixed Javadoc

edd7fb2

jsichi approved these changes Mar 4, 2021

Fixed Javadoc

17dad55

d-michail merged commit c523396 into jgrapht:master Mar 8, 2021

vigna mentioned this pull request May 10, 2021

Dsi docs #1086

Merged

syoon2 mentioned this pull request Apr 12, 2024

Add Missing License Headers #1215

Open

7 tasks
