Centrality betweenness in Sage #18137

nathanncohen · 2015-04-07T19:39:44Z

I hate it that we do not appear in comparisons like the following, just because we are slower than the worst library :-P

http://graph-tool.skewed.de/performance

With this branch we can compute the betweenness centrality in Sage with a decent speed.

Nathann

P.S.: The version of the code that deals with rationals instead of floats has been removed because it is much slower (60x in some cases), and because I did not see how to make the two coexist without copy/pasting most of the code.

CC: @dcoudert @sagetrac-borassi

Component: graph theory

Author: Nathann Cohen

Branch/Commit: 2db68fb

Reviewer: David Coudert

Issue created by migration from https://trac.sagemath.org/ticket/18137


nathanncohen · 2015-04-07T19:40:31Z

Branch: u/ncohen/18137

nathanncohen · 2015-04-07T19:40:31Z

New commits:

`7b63297`	`trac #18137: Add new centrality module`
`0402abf`	`trac #18137: keep only the 'double' version, get rid of the rationals`

nathanncohen · 2015-04-07T19:40:31Z

Commit: 0402abf

dcoudert · 2015-04-07T20:08:10Z

comment:2

Hello,

I have only small remarks:

- Wouldn't it be slightly faster to use arrays of bint rather than bitsets ? It would use more memory, but since you want to be fast, any saving is important.
- You could add a doctest to compare the result of your implementation with networkx
- You could pre-compute the ((n-1)*(n-2)), although it's a minor improvement.
- You could add cdef double x
- Variables k and d are not used.
- You wrote from centrality import centrality_betweenness. Shouldn't it be from sage.graphs.centrality import centrality_betweenness ?

nathanncohen · 2015-04-07T20:22:58Z

comment:3

Hello,

Wouldn't it be slightly faster to use arrays of bint rather than bitsets ? It would use more memory, but since you want to be fast, any saving is important.

It would not be much faster, because most of what this array would contain is zeroes (bint is an int in memory). Plus the bottleneck is float computation in this case :-/
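To make the memory side of this trade-off concrete, here is a minimal Python sketch (not the Cython code from the branch). The `Bitset` class and the choice of 1000 vertices are illustrative assumptions; `array('i')` stands in for a C array of `int`, which is what a `bint` occupies in memory.

```python
from array import array


class Bitset:
    """Hypothetical minimal bitset: one bit per vertex, packed into bytes."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("bitset capacity must be greater than 0")
        self.capacity = capacity
        self.bits = bytearray((capacity + 7) // 8)

    def add(self, i):
        # Set bit i: byte index is i // 8, bit position is i % 8.
        self.bits[i >> 3] |= 1 << (i & 7)

    def __contains__(self, i):
        return bool(self.bits[i >> 3] & (1 << (i & 7)))


n = 1000
seen_bits = Bitset(n)
seen_ints = array("i", [0]) * n  # one C int per vertex, like an array of bint

seen_bits.add(10)
seen_ints[10] = 1

print("bitset bytes:   ", len(seen_bits.bits))
print("int array bytes:", seen_ints.itemsize * len(seen_ints))
```

The bitset needs one bit per vertex while the int array needs a full machine integer per vertex, so the array trades memory (and cache footprint) for simpler indexing.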

You could add a doctest to compare the result of your implementation with networkx

There is one, isn't there? In centrality.pyx. The one with a "tolerance" flag.

You could pre-compute the ((n-1)*(n-2)), although it's a minor improvement.

I don't think that it is worth it. Save a linear number of multiplications after all this work, really?.. :-P

You could add cdef double x

Done.

Variables k and d are not used.

Done.

You wrote from centrality import centrality_betweenness. Shouldn't it be from sage.graphs.centrality import centrality_betweenness ?

They are in the same folder, so it works. It is even more robust as a result, as we can move them wherever we want and the path does not change.

I also changed a Cython flag which checks for exceptions when doing float divisions.

Nathann

sagetrac-git · 2015-04-07T20:23:19Z

Changed commit from 0402abf to 47291d7

sagetrac-git · 2015-04-07T20:23:19Z

Branch pushed to git repo; I updated commit sha1. New commits:

`47291d7`	`trac #18137: Review`

fchapoton · 2015-04-08T06:21:49Z

comment:5

there is numerical noise, add tolerance, see patchbot report.

sagetrac-git · 2015-04-08T07:15:55Z

Changed commit from 47291d7 to 421dc01

sagetrac-git · 2015-04-08T07:15:55Z

Branch pushed to git repo; I updated commit sha1. New commits:

`421dc01`	`trac #18137: Numerical noise`

nathanncohen · 2015-04-08T07:16:42Z

comment:8

Another thing that the patchbot saves us from :-P

Thanks,

Nathann

videlec · 2015-04-08T11:55:21Z

comment:10

The advantage of using rationals is that it was exact! Here you are using floats but without any guarantee on the result. Aren't you? Do you have an estimate of the error depending on the number of vertices/edges? One solution would be to use ball arithmetic, which also produces a bound on the error (see the recently added arb package). Or interval arithmetic (but that is slower).

Vincent

dcoudert · 2015-04-08T12:35:42Z

comment:11

Although Nathann would prefer not to, we could have 2 versions of the code, the fast one as default, and a slower exact one.
David.

nathanncohen · 2015-04-08T12:51:34Z

comment:12

Hello,

The advantage of using rationals is that it was exact!

And this is the very reason why I wrote both implementations.

I am not so sure that it is a very big problem, however, as the algorithm will not add noise to noise like it can happen for PDE computations.

The current version of centrality_betweenness, from NetworkX shipped with Sage, also computes on floats (in Python):

    sage: import networkx
    sage: networkx.algorithms.centrality.betweenness._accumulate_basic??

I wanted to check how Boost does it, but I was not able to locate the source code (God, how can anyone read those files???).

(15 minutes later)

Here it is! Line 338 of:

http://boost.cvs.sourceforge.net/viewvc/boost/boost/boost/graph/betweenness_centrality.hpp?annotate=1.2.6.1

So the answer is that "it depends on dependency_type", which is... a template.

For igraph it is apparently a double too:
https://github.com/igraph/igraph/blob/master/src/centrality.c#L1685
https://github.com/igraph/igraph/blob/master/src/centrality.c#L1804

For graph-tool (the last of the libraries compared on the link in the ticket's description) it is apparently a double too, though I cannot be sure, as I could not find the get_betweenness function:

https://git.skewed.de/count0/graph-tool/blob/master/src/graph_tool/centrality/__init__.py#L326

Sooooooo please don't just limit your argumentation to "not exact=BAD". I care about this, and for this reason I implemented both (which definitely took more than a couple of minutes as you can imagine), but I do believe that for this kind of computations working on floats is not that bad, for I know when the divisions occur and, well, we do not mind much.

I would personally be very happy to have both in Sage, with an easy flag to switch from one implementation to the other. If you just check out the first of my commits you will see that only one variable needs to be changed for the doubles to become rationals. My trouble is that using Cython's preprocessor instructions requires running sage -b, and we do not want that.

I would also like to NOT have the same code copy/pasted twice, and to not pay for the 'if' inside of the loops.

I would be happy to have both if there is a free (in terms of computations) way to handle both at once, and a cheap (in terms of lines of code) way to have both.

So far I did not find any way out, and I thought that the best was to have what everybody seems interested in: computations on double (we can also turn them into 'long double' if necessary).

Nathann

P.S.: I uploaded a commit with both versions so that it will be available somewhere (and not on my computer only) if we ever need that implementation. I did that on purpose, to have it archived somewhere.
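To illustrate the "one code path, two number types" idea discussed in this comment, here is a minimal pure-Python sketch of a Brandes-style accumulation. It is not the Cython code from the branch: the function name `betweenness`, the `number` parameter and the adjacency-dict input format are all assumptions made for the example. The numeric type is chosen once, outside the loops, so there is no `if` in the inner loop.

```python
from collections import deque
from fractions import Fraction


def betweenness(adj, number=float, normalized=True):
    """Betweenness of an undirected graph given as {vertex: [neighbours]}.

    A neighbour listed twice stands for a parallel edge. `number` is the
    type used for the accumulation: float (fast) or Fraction (exact).
    """
    n = len(adj)
    score = {v: number(0) for v in adj}
    for s in adj:
        # BFS from s: count shortest paths and record predecessors.
        dist = {s: 0}
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
                if dist[w] == dist[u] + 1:
                    sigma[w] += sigma[u]
                    preds[w].append(u)
        # Accumulate dependencies from the farthest vertices back to s.
        # These divisions are the only place where rounding can occur.
        delta = {v: number(0) for v in adj}
        for w in reversed(order):
            for u in preds[w]:
                delta[u] += number(sigma[u]) / number(sigma[w]) * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was visited twice (once per endpoint as source).
    scale = number((n - 1) * (n - 2)) if normalized and n > 2 else number(2)
    return {v: c / scale for v, c in score.items()}


path = {0: [1], 1: [0, 2], 2: [1]}
print(betweenness(path))                   # double precision
print(betweenness(path, number=Fraction))  # exact rationals
```

Passing `Fraction` gives exact results at a cost in speed, while `float` mirrors what the libraries cited above appear to do; comparing the two outputs on the same graph is one way to measure the rounding error experimentally.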

videlec · 2015-04-08T13:02:51Z

comment:13

Replying to @nathanncohen:

Sooooooo please don't just limit your argumentation to "not exact=BAD".

Do not oversimplify. My argumentation was "not exact => extra care needed". Floats are wonderful because they are very fast.

And this is the very reason why I wrote both implementations.

Would be interesting to investigate (experimentally) the error propagation.

I am not so sure that it is a very big problem, however, as the algorithm will not add noise to noise like it can happen for PDE computations.

Already summing (a lot of) floating point numbers creates problems. A simple (not so dramatic) example:

    sage: sum(1.r/n for n in range(1,10000000))
    16.69531126585727
    sage: sum(1.r/n for n in range(9999999,0,-1))
    16.695311265859964

If you mix that with division, it is of course even worse.
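The order dependence shown above can be reproduced in plain Python. The sketch below is an illustration only: `naive_sum` is a hypothetical helper, written as an explicit loop because recent Python versions may use compensated summation inside the built-in `sum`, and the range is shortened to keep it quick. `math.fsum` serves as the reference since it tracks intermediate partial sums to avoid loss of precision.

```python
import math


def naive_sum(values):
    """Left-to-right accumulation in double precision, rounding at each step."""
    total = 0.0
    for x in values:
        total += x
    return total


terms = [1.0 / n for n in range(1, 1_000_000)]
forward = naive_sum(terms)             # largest terms first
backward = naive_sum(reversed(terms))  # smallest terms first
reference = math.fsum(terms)           # accurately rounded sum of the same floats

print("forward  - reference:", forward - reference)
print("backward - reference:", backward - reference)
```

Both deviations are tiny relative to the sum, which is consistent with the point made on both sides of this exchange: the error exists, but for sums of this shape it stays far below the magnitude of the result.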

I care about this, and for this reason I implemented both (which definitely took more than a couple of minutes as you can imagine), but I do believe that for this kind of computations working on floats is not that bad, for I know when the divisions occur and, well, we do not mind much.

I also believe so, but it would be better if we were sure and the documentation mentioned it. Something like: if you have a graph with m vertices and n edges, then the worst case is an error of function(m,n).

Vincent

nathanncohen · 2015-04-08T13:20:49Z

comment:14

Hello !

I agree that float operations introduce errors, but I do not know how to evaluate them. I expect the relative error to stay very, very small in those cases, and in the graphs that are of interest for the networks community.

Would you know a trick to have both implementations available in the code (without recompilation)? I do not think that we can have 'real templates' in Cython, can we?

Nathann

nathanncohen · 2015-04-08T15:44:28Z

comment:15

Okay. Here it is. It cost me the last four hours.

Nathann

sagetrac-git · 2015-04-08T15:44:56Z

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

`2f2fbd4`	`trac #18137: Add new centrality module`

sagetrac-git · 2015-04-08T15:44:56Z

Changed commit from 421dc01 to 2f2fbd4

videlec · 2015-04-08T15:47:40Z

comment:17

Replying to @nathanncohen:

Okay. Here it is. It cost me the last four hours.

Youhou! You initiated me to the world of Cython templating!

I am having a careful look right now.

sagetrac-borassi · 2015-04-09T14:50:02Z

comment:44

Sorry, this is the "mistake due to lack of experience". I thought "positive review" meant that I was happy with the code, but now I understand it is much more. I think it's better to leave this issue to more experienced people.

nathanncohen · 2015-04-09T14:53:53Z

comment:45

No proooooob!!! If you have some spare time you can read our manual a bit. Reviewing a ticket is not very complicated and the 'technical checks' do not take more than a couple of minutes once you get used to them. And of course you can ask us any question if the manual isn't clear ;-)

Nathann

dcoudert · 2015-04-09T17:48:05Z

comment:46

One issue:

    sage: G = Graph()
    sage: G.centrality_betweenness()
    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    ...
    ValueError: bitset capacity must be greater than 0

Did you know that your code is also working for multi-graphs?

    sage: G = Graph(multiedges=True)
    sage: G.add_edge(0,1)
    sage: G.add_edge(0,1)
    sage: G.add_edge(1,2)
    sage: G.add_edge(2,3)
    sage: G.add_edge(0,3)
    sage: G.centrality_betweenness(exact=1)
    {0: 2/9, 1: 2/9, 2: 1/9, 3: 1/9}

sagetrac-git · 2015-04-10T06:52:58Z

Changed commit from c84caef to c9f83e7

sagetrac-git · 2015-04-10T06:52:58Z

Branch pushed to git repo; I updated commit sha1. New commits:

`c9f83e7`	`trac #18137: Trivial cases... As always.`

nathanncohen · 2015-04-10T06:55:12Z

comment:48

One issue:

Right. Fixed.

Did you know that your code is also working for multi-graphs?

Yeah, that was good news! At some point I wondered whether I should add a 'scream if not simple' somewhere, then figured out that it worked fine. It also extends the definition in the most natural way, i.e. by considering a path as a set of edges instead of a set of vertices.

And it also works for loops :-PPPP

Nathann
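The "path as a set of edges" reading can be checked by counting shortest paths on David's multi-graph example. This is a minimal Python sketch, not the code from the branch; `count_shortest_paths` and the edge-list input are assumptions made for the illustration.

```python
from collections import deque


def count_shortest_paths(edges, source):
    """Number of shortest paths from source to each reachable vertex.

    Parallel edges are kept as repeated adjacency entries, so two paths
    that differ only in which parallel edge they use are counted twice.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj.get(u, []):
            if w not in dist:
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            # A loop (w == u) never satisfies this test, so it is ignored.
            if dist[w] == dist[u] + 1:
                sigma[w] += sigma[u]
    return sigma


edges = [(0, 1), (0, 1), (1, 2), (2, 3), (0, 3)]
sigma = count_shortest_paths(edges, 0)
print(sigma)
```

From vertex 0 there are three shortest paths to vertex 2: two through vertex 1 (one per parallel edge) and one through vertex 3. Vertex 1 therefore carries 2/3 of that pair and vertex 3 carries 1/3, which, assuming the normalisation divides by (n-1)(n-2)/2 = 3, agrees with the 2/9 and 1/9 reported above.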

dcoudert · 2015-04-10T15:29:52Z

Reviewer: David Coudert

dcoudert · 2015-04-10T15:29:52Z

comment:49

Good.

vbraun · 2015-04-12T11:57:04Z

comment:50

I'm getting this on 32-bit. You should probably add an # abs tol

    sage -t --long src/sage/graphs/graph.py
    **********************************************************************
    File "src/sage/graphs/graph.py", line 4662, in sage.graphs.graph.Graph.?
    Failed example:
        (graphs.ChvatalGraph()).centrality_betweenness(
          normalized=False) # abs tol abs 1e10
    Expected:
        {0: 3.833333333333333, 1: 3.833333333333333, 2: 3.333333333333333,
         3: 3.333333333333333, 4: 3.833333333333333, 5: 3.833333333333333,
         6: 3.333333333333333, 7: 3.333333333333333, 8: 3.333333333333333,
         9: 3.333333333333333, 10: 3.333333333333333,
         11: 3.333333333333333}
    Got:
        {0: 3.833333333333333,
         1: 3.833333333333333,
         2: 3.333333333333333,
         3: 3.333333333333333,
         4: 3.833333333333333,
         5: 3.833333333333333,
         6: 3.333333333333333,
         7: 3.3333333333333335,
         8: 3.333333333333333,
         9: 3.333333333333333,
         10: 3.333333333333333,
         11: 3.333333333333333}
    **********************************************************************
    1 item had failures:
       1 of  66 in sage.graphs.graph.Graph.?
        [652 tests, 1 failure, 15.40 s]
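The failure above comes down to a single value differing in the last digit. As an illustration of what a tolerance-based comparison does, here is a plain-Python sketch; it is not Sage's doctest machinery, and `close_dicts` and the default tolerance of 1e-10 are assumptions made for the example.

```python
import math


def close_dicts(got, expected, abs_tol=1e-10):
    """True if both dicts have the same keys and values within abs_tol."""
    return got.keys() == expected.keys() and all(
        math.isclose(got[k], expected[k], rel_tol=0.0, abs_tol=abs_tol)
        for k in expected
    )


# Two of the values from the report above.
expected = {6: 3.333333333333333, 7: 3.333333333333333}
got = {6: 3.333333333333333, 7: 3.3333333333333335}

print(got == expected)             # exact comparison
print(close_dicts(got, expected))  # comparison with an absolute tolerance
```

An exact comparison rejects the result because of one unit in the last place, whereas any small absolute tolerance accepts it, which is the behaviour a tolerance marker on the doctest is meant to provide.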

sagetrac-git · 2015-04-12T12:59:16Z

Changed commit from c9f83e7 to dddf502

sagetrac-git · 2015-04-12T12:59:16Z

Branch pushed to git repo; I updated commit sha1. New commits:

`dddf502`	`trac #18137: broken doctest`

sagetrac-git · 2015-04-12T13:01:26Z

Changed commit from dddf502 to 2db68fb

sagetrac-git · 2015-04-12T13:01:26Z

Branch pushed to git repo; I updated commit sha1 and set ticket back to needs_review. This was a forced push. New commits:

`2db68fb`	`trac #18137: broken doctest`

vbraun · 2015-04-14T19:43:06Z

Changed branch from public/18137 to 2db68fb

nathanncohen mannequin added this to the sage-6.6 milestone Apr 7, 2015

nathanncohen mannequin added c: graph theory labels Apr 7, 2015

nathanncohen mannequin added the s: needs review label Apr 7, 2015

fchapoton added s: needs work and removed s: needs review labels Apr 8, 2015

nathanncohen mannequin added s: needs review and removed s: needs work labels Apr 8, 2015

sagetrac-borassi mannequin added s: needs review and removed s: positive review labels Apr 9, 2015

dcoudert added s: needs work and removed s: needs review labels Apr 9, 2015

dcoudert added s: positive review and removed s: needs work labels Apr 10, 2015

vbraun added s: needs work and removed s: positive review labels Apr 12, 2015

nathanncohen mannequin added s: positive review and removed s: needs work labels Apr 12, 2015

sagetrac-git mannequin added s: needs review and removed s: positive review labels Apr 12, 2015

nathanncohen mannequin added s: positive review and removed s: needs review labels Apr 12, 2015

vbraun removed the s: positive review label Apr 14, 2015

vbraun closed this as completed in c33fb36 Apr 14, 2015
