Weighted betweenness depends on edge ordering #950

Open
szhorvat opened this Issue Jun 22, 2016 · 4 comments

Projects

None yet

2 participants

@szhorvat
Contributor
szhorvat commented Jun 22, 2016 edited

I came across this thread today where someone shows that the betweenness returned by igraph depends on the ordering of edges.

http://stats.stackexchange.com/q/180550/4764

Notes:

  • this is not a multigraph, i.e. there's at most one edge connecting a directed pair
  • there are self loops, but removing them does't change the behaviour
  • the problem is there in the latest development version; I reproduced it using the Mathematica interface
  • the result isn't changed if we use bigints
  • the graph is weighted

I'll copy the code here for completeness:

#1st run, with unordered edges
edges <- read.csv("example.csv", col.names=c("src", "dest"), colClasses = "character")

social.graph <- graph.edgelist(as.matrix(edges), directed=T)
social.graph <- graph.adjacency(get.adjacency(social.graph), weighted=TRUE)

E(social.graph)$weight <- 1 / E(social.graph)$weight

set.seed(1)
between1 <- betweenness(social.graph)

#2nd run, with now ordered edges
edges <- edges[order(edges[,1], edges[,2]),]

social.graph <- graph.edgelist(as.matrix(edges), directed=T)
social.graph <- graph.adjacency(get.adjacency(social.graph), weighted=TRUE)

E(social.graph)$weight <- 1 / E(social.graph)$weight
set.seed(1)

between2 <- betweenness(social.graph)

print(merge(data.frame(between1=between1, names=names(between1)),
            data.frame(between2=between2, names=names(between2))))

And the data, example.csv:

SRC, DEST
1960,3918
1398,1356
1352,1350
1284,1320
1350,1960
3918,1960
1350,1352
1320,1284
1352,1358
1405,1439
1350,1358
1345,1350
1336,1387
1336,1284
1345,1405
1356,1960
1356,1960
1405,1345
1284,1336
1350,1284
1398,1960
1398,1405
1439,1284
1405,1398
1398,1350
1352,1341
1345,1341
1387,1345
1352,1356
1352,1345
1352,1398
1284,1439
1352,1356
1352,1320
1352,3918
1341,1320
1320,1341
1352,1387
1387,1336
1398,1960
1341,1960
1387,1341
1405,1352
1341,1356
1960,1341
3918,1341
1320,1960
1320,3918
1345,3918
1320,1960
1960,1352
1352,1960
1341,1345
1398,1284
1960,1320
1352,1405
1284,1284
3918,3918
1336,1960
1398,1356
1352,1960
1358,1345
3918,1320
1358,1352
1320,3918
1960,3918
1341,1320
1352,1336
1352,1284
1398,1387
1352,1405
1345,1336
1352,1960
3918,1398
1960,1320
1320,1341
1341,1398
1960,1352
1398,1284
1341,1350
1341,3918
1960,1341
1405,1439
1336,1405
1336,1356
1336,3918
1960,1320
1352,1398
1320,1398
1345,1352
1345,1358
1405,1356
1439,1284
1350,1341
1405,1960
1405,1352
1350,1345
1356,1405
1960,1398
1960,1352
1960,1356
1960,1356
1284,1398
1358,1398
1352,1320
1356,1341
1960,1356
1398,1960
3918,1284
1439,1405
1960,1398
1439,1405
1336,1439
1387,1398
1320,1336
1304,1320
1320,1345
1960,1352
1960,3918
1439,1336
1960,1345
1336,1352
1336,1352
1336,1320
1398,1341
1341,1387
1398,3918
1398,1320
1356,1352
1398,1352
1398,1358
1439,1398
1398,1405
1439,1352
1336,1345
1284,1398
1284,1350
1284,1352
1284,3918
1352,1284
1320,1352
1439,1356
3918,1960
1284,1439
1356,1398
1320,1352
1320,1352
3918,1320
1352,1387
1352,1439
1341,1960
1304,3918
1352,1320
1345,1341
3918,1352
1356,1345
1345,1356
1341,1350
1439,1352
1352,1350
1345,1320
1350,1387
1398,1387
1345,1341
1345,1350
1387,1350
1960,1350
1350,1398
1387,1387
1387,1352
1341,1320
3918,1345
1320,1341
1341,1352
1405,1398
1387,1336
1356,1336
1336,1284
1336,1387
1960,1336
1405,1352
1960,1320
1960,1405
1352,1960
1352,1439
1341,1345
1352,1439
1352,1336
1352,1341
1398,1341
1345,1387
1398,1439
1960,1352
1350,1341
1387,1398
1320,1960
1352,1320
@szhorvat
Contributor
szhorvat commented Jun 27, 2016 edited

Just wanted to note that the result NetworkX gives agrees with the second column in the output:

   names   between1   between2
1   1284  0.5833333  0.5833333
2   1304  0.0000000  0.0000000
3   1320 21.7500000 21.7500000
4   1336  1.7500000  1.7500000
5   1341 15.0833333 14.0833333
6   1345  2.0000000  2.0000000
7   1350  0.7500000  0.7500000
8   1352 74.2500000 75.2500000
9   1356  0.0000000  0.0000000
10  1358  0.0000000  0.0000000
11  1387  1.8333333  1.8333333
12  1398 16.0000000 16.0000000
13  1405  0.5833333  0.5833333
14  1439  1.0000000  1.0000000
15  1960 34.0000000 34.0000000
16  3918  0.0000000  0.0000000
@ntamas
Member
ntamas commented Jun 27, 2016

I was wondering whether this could be due to roundoff errors; in one of the cases, the path length calculations could be performed in an order where igraph correctly identifies two alternative paths between nodes A and B as being exactly equal in length, while in the other case the two path lengths differ by a very small amount (e.g., 0.0000000000001) due to floating-point roundoff errors, therefore igraph ignores one of them (since it is not the shortest path between A and B). The fact that NetworkX agrees with one solution does not necessarily mean that the solution is correct; it only means that NetworkX's implementation happens to perform floating-point operations in an identical or similar order that yields approximately the same roundoff errors.

Note that if you replace E(social.graph)$weight <- 1 / E(social.graph)$weight with E(social.graph)$weight <- round(10000000 / E(social.graph)$weight) to force all the weights to be integers, then the bug goes away.

If this is the case, then in theory we could work around it by allowing the user to pass an "epsilon" value to the betweenness function; path lengths that differ only by epsilon or less in absolute value could then be considered as identical in length. However, this is still an ugly workaround, and providing a default value for epsilon will only hide the issue of roundoff errors from the user but not really fix it (unless there is a theoretically sound way of determining the right epsilon, given a graph). I'd rather put a warning in the documentation and ask the user to quantize the weights to integers to ensure a consistent result.

@szhorvat
Contributor

When viewed in this light, I wouldn't call it a bug any more. I agree that a warning in the documentation is better than adding an epsilon parameter.

@szhorvat
Contributor

It really is a pathological case because these are exactly the kinds of fractions which tend to add up to integers: 1/3 + 1/3 + 1/3 = 1. And they may fail to add up to an exact one. In fact rounding makes it more likely that they won't, unless it's done carefully. The result from rounded weights can also be imprecise (although it is stable with respect to vertex reordering).

A more careful rounding may be needed. Before inverting the weights, they were integers. The largest one was 5. So we should use the least common multiple of 1,2,3,4,5, i.e. 60, in the rounding, like this:

round(60 / E(social.graph)$weight)

10000000 is not divisible by 3 so it may not avoid the problem.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment