# Weighted betweenness depends on edge ordering #950

Open
opened this Issue Jun 22, 2016 · 4 comments

None yet

### 2 participants

Contributor
commented Jun 22, 2016 edited
 I came across this thread today where someone shows that the betweenness returned by igraph depends on the ordering of edges. http://stats.stackexchange.com/q/180550/4764 Notes: this is not a multigraph, i.e. there's at most one edge connecting a directed pair there are self loops, but removing them does't change the behaviour the problem is there in the latest development version; I reproduced it using the Mathematica interface the result isn't changed if we use bigints the graph is weighted I'll copy the code here for completeness: ``````#1st run, with unordered edges edges <- read.csv("example.csv", col.names=c("src", "dest"), colClasses = "character") social.graph <- graph.edgelist(as.matrix(edges), directed=T) social.graph <- graph.adjacency(get.adjacency(social.graph), weighted=TRUE) E(social.graph)\$weight <- 1 / E(social.graph)\$weight set.seed(1) between1 <- betweenness(social.graph) #2nd run, with now ordered edges edges <- edges[order(edges[,1], edges[,2]),] social.graph <- graph.edgelist(as.matrix(edges), directed=T) social.graph <- graph.adjacency(get.adjacency(social.graph), weighted=TRUE) E(social.graph)\$weight <- 1 / E(social.graph)\$weight set.seed(1) between2 <- betweenness(social.graph) print(merge(data.frame(between1=between1, names=names(between1)), data.frame(between2=between2, names=names(between2)))) `````` And the data, `example.csv`: ``````SRC, DEST 1960,3918 1398,1356 1352,1350 1284,1320 1350,1960 3918,1960 1350,1352 1320,1284 1352,1358 1405,1439 1350,1358 1345,1350 1336,1387 1336,1284 1345,1405 1356,1960 1356,1960 1405,1345 1284,1336 1350,1284 1398,1960 1398,1405 1439,1284 1405,1398 1398,1350 1352,1341 1345,1341 1387,1345 1352,1356 1352,1345 1352,1398 1284,1439 1352,1356 1352,1320 1352,3918 1341,1320 1320,1341 1352,1387 1387,1336 1398,1960 1341,1960 1387,1341 1405,1352 1341,1356 1960,1341 3918,1341 1320,1960 1320,3918 1345,3918 1320,1960 1960,1352 1352,1960 1341,1345 1398,1284 1960,1320 1352,1405 1284,1284 3918,3918 1336,1960 1398,1356 1352,1960 1358,1345 3918,1320 1358,1352 1320,3918 1960,3918 1341,1320 1352,1336 1352,1284 1398,1387 1352,1405 1345,1336 1352,1960 3918,1398 1960,1320 1320,1341 1341,1398 1960,1352 1398,1284 1341,1350 1341,3918 1960,1341 1405,1439 1336,1405 1336,1356 1336,3918 1960,1320 1352,1398 1320,1398 1345,1352 1345,1358 1405,1356 1439,1284 1350,1341 1405,1960 1405,1352 1350,1345 1356,1405 1960,1398 1960,1352 1960,1356 1960,1356 1284,1398 1358,1398 1352,1320 1356,1341 1960,1356 1398,1960 3918,1284 1439,1405 1960,1398 1439,1405 1336,1439 1387,1398 1320,1336 1304,1320 1320,1345 1960,1352 1960,3918 1439,1336 1960,1345 1336,1352 1336,1352 1336,1320 1398,1341 1341,1387 1398,3918 1398,1320 1356,1352 1398,1352 1398,1358 1439,1398 1398,1405 1439,1352 1336,1345 1284,1398 1284,1350 1284,1352 1284,3918 1352,1284 1320,1352 1439,1356 3918,1960 1284,1439 1356,1398 1320,1352 1320,1352 3918,1320 1352,1387 1352,1439 1341,1960 1304,3918 1352,1320 1345,1341 3918,1352 1356,1345 1345,1356 1341,1350 1439,1352 1352,1350 1345,1320 1350,1387 1398,1387 1345,1341 1345,1350 1387,1350 1960,1350 1350,1398 1387,1387 1387,1352 1341,1320 3918,1345 1320,1341 1341,1352 1405,1398 1387,1336 1356,1336 1336,1284 1336,1387 1960,1336 1405,1352 1960,1320 1960,1405 1352,1960 1352,1439 1341,1345 1352,1439 1352,1336 1352,1341 1398,1341 1345,1387 1398,1439 1960,1352 1350,1341 1387,1398 1320,1960 1352,1320 ``````
Contributor
commented Jun 27, 2016 edited
 Just wanted to note that the result NetworkX gives agrees with the second column in the output: `````` names between1 between2 1 1284 0.5833333 0.5833333 2 1304 0.0000000 0.0000000 3 1320 21.7500000 21.7500000 4 1336 1.7500000 1.7500000 5 1341 15.0833333 14.0833333 6 1345 2.0000000 2.0000000 7 1350 0.7500000 0.7500000 8 1352 74.2500000 75.2500000 9 1356 0.0000000 0.0000000 10 1358 0.0000000 0.0000000 11 1387 1.8333333 1.8333333 12 1398 16.0000000 16.0000000 13 1405 0.5833333 0.5833333 14 1439 1.0000000 1.0000000 15 1960 34.0000000 34.0000000 16 3918 0.0000000 0.0000000 ``````
Member
commented Jun 27, 2016
 I was wondering whether this could be due to roundoff errors; in one of the cases, the path length calculations could be performed in an order where igraph correctly identifies two alternative paths between nodes A and B as being exactly equal in length, while in the other case the two path lengths differ by a very small amount (e.g., 0.0000000000001) due to floating-point roundoff errors, therefore igraph ignores one of them (since it is not the shortest path between A and B). The fact that NetworkX agrees with one solution does not necessarily mean that the solution is correct; it only means that NetworkX's implementation happens to perform floating-point operations in an identical or similar order that yields approximately the same roundoff errors. Note that if you replace `E(social.graph)\$weight <- 1 / E(social.graph)\$weight` with `E(social.graph)\$weight <- round(10000000 / E(social.graph)\$weight)` to force all the weights to be integers, then the bug goes away. If this is the case, then in theory we could work around it by allowing the user to pass an "epsilon" value to the betweenness function; path lengths that differ only by epsilon or less in absolute value could then be considered as identical in length. However, this is still an ugly workaround, and providing a default value for `epsilon` will only hide the issue of roundoff errors from the user but not really fix it (unless there is a theoretically sound way of determining the right epsilon, given a graph). I'd rather put a warning in the documentation and ask the user to quantize the weights to integers to ensure a consistent result.
Contributor
 When viewed in this light, I wouldn't call it a bug any more. I agree that a warning in the documentation is better than adding an epsilon parameter.
Contributor
 It really is a pathological case because these are exactly the kinds of fractions which tend to add up to integers: 1/3 + 1/3 + 1/3 = 1. And they may fail to add up to an exact one. In fact rounding makes it more likely that they won't, unless it's done carefully. The result from rounded weights can also be imprecise (although it is stable with respect to vertex reordering). A more careful rounding may be needed. Before inverting the weights, they were integers. The largest one was 5. So we should use the least common multiple of 1,2,3,4,5, i.e. 60, in the rounding, like this: ``````round(60 / E(social.graph)\$weight) `````` 10000000 is not divisible by 3 so it may not avoid the problem.