PRPACK and ARPACK returns different results for undirected weighted graphs #972

Open
ntamas opened this Issue Nov 30, 2016 · 4 comments

Projects

None yet

2 participants

@ntamas
Member
ntamas commented Nov 30, 2016

This is an example in R from Daniel Piacek, but it seems to be related to the C core. Sorry for the missing images, use your imagination ;)

Hello to everyone,

First of all, I would like to thank the authors of these amazing package. I use it extensively in my > master thesis.
I am sorry for duplicate if there already exists same thread.

I have a huge network with couple of millions of edges and would like to calculate personalized PageRank. My network is attributed and I would like to use edge weights to calculate PageRank. I can either consider my network as directed or undirected, but in both cases my network is multigraph. I have encountered several 'strange' results when calculating weighted PageRank.

Here I will try to present it on simple networks without 'personalization'.

1a.) Example with directed symmetric network, with no multiple edges and no weights.

A = matrix(c(0,1,1,1,0,0,1,0,0),byrow=T,nrow=3)
g = graph.adjacency(A)

Then I compute PageRank with page_rank function and get the same result for both "PRPACK" and "ARPACK" algo. I can also compute via matrix inverse as follows:

D = diag(1/apply(A,1,sum),nrow(A)); Beta = (1-0.85)*rep(1/nrow(A),nrow(A))
(pr.in = Beta%*%solve(diag(1,nrow(A))-0.85*D%*%A))

And as expected I get the same PageRank score. In addition, if I consider network as undirected I get the same scores, as expected.

1b.) The same example with edge weights:

set.seed(1)
E(g)$weight = round(runif(ecount(g)),1)

And again, using page_rank I get the same scores for both algorithms. Or I can get the same scores with weight matrix W as:

W = matrix(c(0,0.3,0.4,0.6,0,0,0.9,0,0),byrow=T,nrow=3)
D = diag(1/apply(W,1,sum),nrow(W)); Beta = (1-0.85)*rep(1/nrow(W),nrow(W))
(pr.in = Beta%*%solve(diag(1,nrow(W))-0.85*D%*%W))

However, if I consider this network being undirected, I get different scores, namely:

g_UN = as.undirected(g,'each') # do not collapse
(prUn1 = page_rank(g,algo='PRPACK', directed = F)$vector)
(prUn2 = page_rank(g,algo='ARPACK', directed = F)$vector)
(prUn3 = page_rank(g_UN, algo='PRPACK')$vector)
(prUn4 = page_rank(g_UN, algo='ARPACK')$vector)

Here scores given by 'ARPACK' are different to those given by 'PRPACK'. Moreover, with 'ARPACK' I get same scores for both g_UN(undirected network) and for directed=F. However 'PRPACK' gives different scores.
I can replicate results for 'ARPACK' with weight matrix, where I take sum/average of edge weights:

W_UN = matrix(c(0,0.9,1.3,0.9,0,0,1.3,0,0),byrow=T,nrow=3)
D = diag(1/apply(W_UN,1,sum),nrow(W_UN))
(pr.in = Beta%*%solve(diag(1,nrow(W_UN))-0.85*D%*%W_UN))

2a.) Example with directed multigraph.

A2 = matrix(c(0,2,1,1,0,0,1,1,0),byrow=T,nrow=3)
g2 = graph.adjacency(A2)

Without considering edge weights both algorithms give the same scores, which can be also found as

 D = diag(1/apply(A2,1,sum),nrow(A2))
 (pr.in = Beta%*%solve(diag(1,nrow(A2))-0.85*D%*%A2))

2b.) Same graph as undirected.
Here again both algorithms give the same score for both directed=F and as.undirected(g2,'each'), or:

A2_UN = matrix(c(0,3,2,3,0,1,2,1,0),byrow=T,nrow=3)
D = diag(1/apply(A2_UN,1,sum),nrow(A2_UN))
(pr.in = Beta%*%solve(diag(1,nrow(A2_UN))-0.85*D%*%A2_UN))

gives the same scores.

2c.) Same graph with edge weights-directed.

set.seed(1)
E(g2)$weight = round(runif(ecount(g2)),1)

And again I get different scores

page_rank(g2,algo='PRPACK')$vector
page_rank(g2,algo='ARPACK')$vector

Where 'ARPACK' gives the same scores as with weight matrix W where edge weights for multiple
edges(considering direction) are summed

W2 = matrix(c(0,0.7,0.6,0.9,0,0,0.2,0.9,0),byrow=T,nrow=3)
D = diag(1/apply(W2,1,sum),nrow(W2))
(pr.in = Beta%*%solve(diag(1,nrow(W2))-0.85*D%*%W2))

2d.) Same graph with edge weights-undirected.

g2_UN = as.undirected(g2,'each') # do not collapse

Again, 'PRPACK' and 'ARPACK' give different scores. Moreover, 'PRPACK' differs for

(prUn1 = page_rank(g2,algo='PRPACK',directed = F)$vector)
(prUN2 = page_rank(g2_UN,algo='PRPACK')$vector)

while 'ARPACK' gives same scores for both. Or with weight matrix W I can replicate 'ARPACK' by summing edge weights:

W2_UN = matrix(c(0,1.6,0.8,1.6,0,0.9,0.8,0.9,0),nrow=3,byrow = T)
D = diag(1/apply(W2_UN,1,sum),nrow(W2_UN))
(pr.in = Beta%*%solve(diag(1,nrow(W2_UN))-0.85*D%*%W2_UN)) 
@szhorvat
Contributor

Reproduced with Mathematica, confirming that it's in the C core.

But these are multigraphs (multiple undirected edges between the same vertices) and lots of things are shaky with multigraphs in igraph ...

@ntamas
Member
ntamas commented Nov 30, 2016

The strange thing is that I also tried this with simplify(as.undirected(g2), edge.attr.comb="sum") in R and the result seemed to be inconsistent as well - not sure why.

@szhorvat
Contributor

Isn't simplify(as.undirected(g2), edge.attr.comb="sum") the same as as.undirected(g2)? as.undirected already sums the weights. I cannot reproduce the problem with any simple graph, only with multigraphs.

@ntamas
Member
ntamas commented Nov 30, 2016

This is what I get now:

> A = matrix(c(0,1,1,1,0,0,1,0,0),byrow=T,nrow=3)
> g = graph.adjacency(A)
> set.seed(1)
> E(g)$weight = round(runif(ecount(g)),1)
> g_UN = simplify(as.undirected(g), edge.attr.comb="sum")
> is.simple(g_UN)
[1] TRUE
> page_rank(g,algo='PRPACK', directed = F)$vector
[1] 0.4870632 0.2470592 0.2658776
> page_rank(g,algo='ARPACK', directed = F)$vector
[1] 0.4864865 0.2191646 0.2943489
> page_rank(g_UN,algo='PRPACK', directed = F)$vector
[1] 0.4864865 0.2191646 0.2943489
> page_rank(g_UN,algo='ARPACK', directed = F)$vector
[1] 0.4864865 0.2191646 0.2943489

So, it seems like PRPACK seems to be doing something wrong if the original graph is simple and directed but it is instructed to treat it as undirected. I have an idea about what could be wrong; let me check.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment