PRPACK and ARPACK returns different results for undirected weighted graphs #972

Open
opened this Issue Nov 30, 2016 · 4 comments

None yet

2 participants

Member
commented Nov 30, 2016
 This is an example in R from Daniel Piacek, but it seems to be related to the C core. Sorry for the missing images, use your imagination ;) Hello to everyone, First of all, I would like to thank the authors of these amazing package. I use it extensively in my > master thesis. I am sorry for duplicate if there already exists same thread. I have a huge network with couple of millions of edges and would like to calculate personalized PageRank. My network is attributed and I would like to use edge weights to calculate PageRank. I can either consider my network as directed or undirected, but in both cases my network is multigraph. I have encountered several 'strange' results when calculating weighted PageRank. Here I will try to present it on simple networks without 'personalization'. 1a.) Example with directed symmetric network, with no multiple edges and no weights. ``````A = matrix(c(0,1,1,1,0,0,1,0,0),byrow=T,nrow=3) g = graph.adjacency(A) `````` Then I compute PageRank with page_rank function and get the same result for both "PRPACK" and "ARPACK" algo. I can also compute via matrix inverse as follows: ``````D = diag(1/apply(A,1,sum),nrow(A)); Beta = (1-0.85)*rep(1/nrow(A),nrow(A)) (pr.in = Beta%*%solve(diag(1,nrow(A))-0.85*D%*%A)) `````` And as expected I get the same PageRank score. In addition, if I consider network as undirected I get the same scores, as expected. 1b.) The same example with edge weights: ``````set.seed(1) E(g)\$weight = round(runif(ecount(g)),1) `````` And again, using page_rank I get the same scores for both algorithms. Or I can get the same scores with weight matrix W as: ``````W = matrix(c(0,0.3,0.4,0.6,0,0,0.9,0,0),byrow=T,nrow=3) D = diag(1/apply(W,1,sum),nrow(W)); Beta = (1-0.85)*rep(1/nrow(W),nrow(W)) (pr.in = Beta%*%solve(diag(1,nrow(W))-0.85*D%*%W)) `````` However, if I consider this network being undirected, I get different scores, namely: ``````g_UN = as.undirected(g,'each') # do not collapse (prUn1 = page_rank(g,algo='PRPACK', directed = F)\$vector) (prUn2 = page_rank(g,algo='ARPACK', directed = F)\$vector) (prUn3 = page_rank(g_UN, algo='PRPACK')\$vector) (prUn4 = page_rank(g_UN, algo='ARPACK')\$vector) `````` Here scores given by 'ARPACK' are different to those given by 'PRPACK'. Moreover, with 'ARPACK' I get same scores for both g_UN(undirected network) and for directed=F. However 'PRPACK' gives different scores. I can replicate results for 'ARPACK' with weight matrix, where I take sum/average of edge weights: ``````W_UN = matrix(c(0,0.9,1.3,0.9,0,0,1.3,0,0),byrow=T,nrow=3) D = diag(1/apply(W_UN,1,sum),nrow(W_UN)) (pr.in = Beta%*%solve(diag(1,nrow(W_UN))-0.85*D%*%W_UN)) `````` 2a.) Example with directed multigraph. ``````A2 = matrix(c(0,2,1,1,0,0,1,1,0),byrow=T,nrow=3) g2 = graph.adjacency(A2) `````` Without considering edge weights both algorithms give the same scores, which can be also found as `````` D = diag(1/apply(A2,1,sum),nrow(A2)) (pr.in = Beta%*%solve(diag(1,nrow(A2))-0.85*D%*%A2)) `````` 2b.) Same graph as undirected. Here again both algorithms give the same score for both directed=F and as.undirected(g2,'each'), or: ``````A2_UN = matrix(c(0,3,2,3,0,1,2,1,0),byrow=T,nrow=3) D = diag(1/apply(A2_UN,1,sum),nrow(A2_UN)) (pr.in = Beta%*%solve(diag(1,nrow(A2_UN))-0.85*D%*%A2_UN)) `````` gives the same scores. 2c.) Same graph with edge weights-directed. ``````set.seed(1) E(g2)\$weight = round(runif(ecount(g2)),1) `````` And again I get different scores ``````page_rank(g2,algo='PRPACK')\$vector page_rank(g2,algo='ARPACK')\$vector `````` Where 'ARPACK' gives the same scores as with weight matrix W where edge weights for multiple edges(considering direction) are summed ``````W2 = matrix(c(0,0.7,0.6,0.9,0,0,0.2,0.9,0),byrow=T,nrow=3) D = diag(1/apply(W2,1,sum),nrow(W2)) (pr.in = Beta%*%solve(diag(1,nrow(W2))-0.85*D%*%W2)) `````` 2d.) Same graph with edge weights-undirected. ``````g2_UN = as.undirected(g2,'each') # do not collapse `````` Again, 'PRPACK' and 'ARPACK' give different scores. Moreover, 'PRPACK' differs for ``````(prUn1 = page_rank(g2,algo='PRPACK',directed = F)\$vector) (prUN2 = page_rank(g2_UN,algo='PRPACK')\$vector) `````` while 'ARPACK' gives same scores for both. Or with weight matrix W I can replicate 'ARPACK' by summing edge weights: ``````W2_UN = matrix(c(0,1.6,0.8,1.6,0,0.9,0.8,0.9,0),nrow=3,byrow = T) D = diag(1/apply(W2_UN,1,sum),nrow(W2_UN)) (pr.in = Beta%*%solve(diag(1,nrow(W2_UN))-0.85*D%*%W2_UN)) ``````
Contributor
 Reproduced with Mathematica, confirming that it's in the C core. But these are multigraphs (multiple undirected edges between the same vertices) and lots of things are shaky with multigraphs in igraph ...
Member
commented Nov 30, 2016
 The strange thing is that I also tried this with `simplify(as.undirected(g2), edge.attr.comb="sum")` in R and the result seemed to be inconsistent as well - not sure why.
Contributor
 Isn't `simplify(as.undirected(g2), edge.attr.comb="sum")` the same as `as.undirected(g2)`? `as.undirected` already sums the weights. I cannot reproduce the problem with any simple graph, only with multigraphs.
Member
commented Nov 30, 2016
 This is what I get now: ``````> A = matrix(c(0,1,1,1,0,0,1,0,0),byrow=T,nrow=3) > g = graph.adjacency(A) > set.seed(1) > E(g)\$weight = round(runif(ecount(g)),1) > g_UN = simplify(as.undirected(g), edge.attr.comb="sum") > is.simple(g_UN) [1] TRUE > page_rank(g,algo='PRPACK', directed = F)\$vector [1] 0.4870632 0.2470592 0.2658776 > page_rank(g,algo='ARPACK', directed = F)\$vector [1] 0.4864865 0.2191646 0.2943489 > page_rank(g_UN,algo='PRPACK', directed = F)\$vector [1] 0.4864865 0.2191646 0.2943489 > page_rank(g_UN,algo='ARPACK', directed = F)\$vector [1] 0.4864865 0.2191646 0.2943489 `````` So, it seems like PRPACK seems to be doing something wrong if the original graph is simple and directed but it is instructed to treat it as undirected. I have an idea about what could be wrong; let me check.