Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PageRank: ARPACK method sometimes fails and produces negative values #1641

Open
szhorvat opened this issue Jan 30, 2021 · 14 comments
Open

PageRank: ARPACK method sometimes fails and produces negative values #1641

szhorvat opened this issue Jan 30, 2021 · 14 comments
Labels
todo Triaged for implementation in some unspecified future version

Comments

@szhorvat
Copy link
Member

szhorvat commented Jan 30, 2021

Describe the bug
igraph_pagerank() with the ARPACK method sometimes fails, produces an eigenvalue that is not 1, and produces negative results or even non-finite values (not sure if NaN or Infinity).

To reproduce
This is easy to reproduce with fuzzing: create a directed random graph with at least 8 vertices, and run the function many times. Be sure to check if the eigenvalue differs from 1. Eventually you will find a case where it does. I see no correlation with the graph structure that causes the failure. Here are some examples:

One:
image

Two:
image

Three:
image

Four:
image

The failure is non-deterministic and is somehow related to the random (!!) way that igraph_pagerank initialized ARPACK's starting vector.

Example in Mathematica:

Do[
 Check[
  g = RandomGraph[BernoulliGraphDistribution[8, p] (* G(8,p) model *), DirectedEdges -> True];
  r = IGPageRank[g, Method -> "Arnoldi"];, Break[]],
 {p, 0.1, 1, 1./5000} (* loop and scan range of p from 0.1 to 1, but this is really reproducible with any p *)
]

Output:

IGraphM::warning: The PageRank eigenvalue should be 1, actual eigenvalue is -0.425. Possible convergence problem.

PageRank values:

{0.0367487,  7.48975*10^14, 0.191093, -0.628403, -0.058798, -7.48975*10^14, -0.110246, -0.407911}

Graph:

image

For completeness, this specific graph's edge list in C format:

{0, 2, 1, 3, 1, 7, 2, 4, 3, 5, 5, 1, 6, 0, 6, 1, 6, 3, 7, 5}

Python format:

[[0, 2], [1, 3], [1, 7], [2, 4], [3, 5], [5, 1], [6, 0], [6, 1], [6, 3], [7, 5]]
@szhorvat
Copy link
Member Author

szhorvat commented Jan 30, 2021

@ntamas I discovered this by finally implementing a warning in the Mathematica interface for when the eigenvalue is not 1, and fuzzing a bit to see if the tolerance I put in was good enough. If you don't already check this in the Python interface, it would be a good idea to do it ...

I chose to check this allowing for up to 5 binary digits of difference, using fabs(value - 1.0) > 32*DBL_EPSILON. 32 seems to be more than enough to avoid false positives. 8 is not enough.

@szhorvat szhorvat added the todo Triaged for implementation in some unspecified future version label Jan 30, 2021
@szhorvat
Copy link
Member Author

szhorvat commented Jan 30, 2021

The above test was with the vendored ARPACK. It also happens with an external ARPACK 3.8.0, but less frequently. However, the tolerance of 5 binary digits is not sufficient with the external ARPACK, and it also tends to run into convergence problems (ARPACK itself reports no convergence). I decided to switch to 7 digits (which is also Mathematica's default tolerance when comparing double values).

@szhorvat
Copy link
Member Author

szhorvat commented Feb 3, 2021

The result ARPACK returns here is not actually an eigenvector / eigenvalue pair at all. I am not sure why ARPACK won't detect the problem. I would have assumed that it attempts to verify it results.

@ntamas
Copy link
Member

ntamas commented Feb 4, 2021

Most of the examples that you have shown above have sinks or sources; the Arnoldi method might have a problem with these, although in theory it shouldn't of course. PRPACK works around this by decomposing the graph to strongly connected components, calculating the PageRank scores in each of them, and then using some clever math to merge the PageRank scores of all SCCs so the result is consistent. In general, I would say that PRPACK is probably much more reliable than ARPACK exactly because of this.

Example #2 is intriguing, but it is also not strongly connected; the leftmost vertex and the one that is connected to with a mutual edge form a strongly connected component so the PageRank "juice" mostly flows to these two nodes from the rest of the graph.

When testing the eigenvector, I was wondering whether it would make sense to use a more sophisticated comparison between floats that tests both absolute and relative errors instead of simply comparing the differences with a tolerance.

@szhorvat
Copy link
Member Author

szhorvat commented Feb 4, 2021

When testing the eigenvector, I was wondering whether it would make sense to use a more sophisticated comparison between floats that tests both absolute and relative errors instead of simply comparing the differences with a tolerance.

Yes, that is absolutely necessary, to handle both zeros (relative tol doesn't work) and non-zeros (absolute tol not good).

Luckily, for PageRank we only need to test the eigenVALUE, as we know it should be one. So this function is fine. In the Mathematica interface, I do not return the eigenvalue, but I added a check which show s warning if it differs from 1 in more than 5 binary digits (see the PageRank test for something similar). With this amount of tolerance, I have not yet run into any case when it shows a warning even though nothing is wrong (e.g. when the eigenvalue is barely off, such as 0.999999999999)

Most of the examples that you have shown above have sinks or sources; the Arnoldi method might have a problem with these,

ARPACK does not see the sinks. When we are in a sink, we can jump to any other vertex. Effectively, each sink is connected with an out-edge to all other vertices.

Of course, it is possible that it is precisely this (overwhelming amount of effective connections) that is throwing ARPACK off.

@szhorvat
Copy link
Member Author

szhorvat commented Feb 4, 2021

To clarify, I was suggesting verifying ARPACK's result in other cases than PageRank, such as with eigenvector centrality.

@szhorvat
Copy link
Member Author

szhorvat commented Apr 8, 2022

For some context, this problem mostly affects the old version of ARPACK that is bundled with igraph. Newer versions (such as ARPACK-NG 3.8.0) are much less susceptible to this problem, but still not completely immune to it. Reproducing the problem is not easy, so no, it was not reported upstream.

Now for IGraph/M specifically:

It does come with the old, bundled version of ARPACK, so this is a real problem. However, for PageRank specifically, we also have the PRPACK method which is better than the ARPACK-based one in almost every situation (i.e. all reasonable damping factors, except maybe those very close to 1.0). Thus I suggest just using the default, PRPACK-based method.

ARPACK is used in other functions though, such as eigenvector centrality or leading eigenvector based community detection. During practical work, I have not encountered this problem with those functions, but I know that it can theoretically appear. Did you run into it? A potential solution would be to link IGraph/M with a newer ARPACK, but doing that on all platforms is quite difficult ... it's not something I can promise.

Investigating the use of FEAST for the core igraph C library is something interesting, but also a lot of work, so I don't expect it would happen for the next release (the TODO list for that is already too long). But it's definitely something to look at. We can continue discussing that in the other issue, #551.

Using Mathematica's own ARPACK (or FEAST) is not something I'm keen to try, as they don't support this through any documented API. It would be a fragile hack. Still, I guess it wouldn't hurt to ask Wolfram whether they'd be willing to make this possible/easier, but I don't expect a positive response.

@szhorvat
Copy link
Member Author

szhorvat commented Apr 9, 2022

Reproducing the problem is not easy, so no, it was not reported upstream.

Why?

Lack of time. If you have the time to debug this, of course contributions are welcome.

  • We need to find an example that sometimes fails. It will only fails sometimes, as ARPACK uses a random starting condition.
  • From this, we need to distill a small program that uses ARPACK without igraph. Whether igraph even uses ARPACK correctly will come into question. There are multiple parameters one can adjust, what are the best choices?
  • Suppose we have an ARPACK-only (no igraph) example that fails for some rare random starting conditions. Is this behaviour expected? Does ARPACK guarantee convergence at all? This is not clear to me. Perhaps the answer is to just re-try from another random starting point—and this does indeed work in practice with ARPACK 3.8.0, but not with our vendored ARPACK.

Newer versions (such as ARPACK-NG 3.8.0) are much less susceptible to this problem

Can you bisect what is the commit that made the behaviour better? Maybe you can add a custom patch for the dependency.

Again, a lot of work that will likely require an understanding of ARPACK's workings. ARPACK is written in Fortran. We use a version translated to C with f2c. f2c is no longer maintained and only works on F77 code. Newer ARPACK uses some F90. This is why upgrading it is problematic.

hus I suggest just using the default, PRPACK-based method.

Will that fix this and related bugs?

PRPACK is the default for PageRank calculations in all high-level interfaces of igraph, including IGraph/M. You will only encounter this issue if you explicitly choose the ARPACK methods. As I recall, that only makes sense for damping factors which are very close to 1.0, as PRPACK does very well for everything else.

Similar issues may be present in other functions which use ARPACK. Since PRPACK is a PageRank calculation library, it can't be used for those other functions (such as eigenvector centrality).


You can see the common theme here now: lack of time and resources. This would require a very large resource investment for a relatively small gain. At this time, there are other issues which affect more uses cases in more severe ways, so they are of higher priority.


@ValZapod Did you encounter this bug in practice? If you have run into any problems during practical work, let us know, as that helps prioritize issues.

@szhorvat
Copy link
Member Author

szhorvat commented Apr 9, 2022

Ultimately, the best solution for IGraph/M is to link to a newer ARPACK, not the old one bundled with igraph. Why is this difficult?

  • ARPACK is written in Fortran, so I need a Fortran compiler on all plaforms (Linux / macOS / Windows). That will be a pain on Windows. It will require compiling with gfortran, which brings its own runtime dependencies.
  • ARPACK requires BLAS/LAPACK, so it brings yet another complex set of libraries. The most common implementation is OpenBLAS, but we saw several problems with it on Windows. I'm not sure how easy it is to link to Mathematica's internal BLAS/LAPACK. I don't think Wolfram would support this ... Even if they do, I believe Mathematica uses an ILP64 BLAS on 64-bit, and igraph is not yet prepared for that.

@szhorvat
Copy link
Member Author

It seems that the fix for #2090 made this failure rarer, but it still occurs sometimes. Interestingly, when it does occur, in most cases ARPACK finds an eigenvalue of -0.85 which is certainly related to the default damping factor of 0.85. Once again, this is tested with the vendored ARPACK.

@ntamas
Copy link
Member

ntamas commented Jun 13, 2022

What does it mean some? Can the .F90 files be excluded?

No, I don't think so; I've attempted a conversion a few months ago and it seemed to me like the new codebase uses certain language constructs that f2c does not understand. I gave up due to my limited understanding of ARPACK and Fortran in general.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
todo Triaged for implementation in some unspecified future version
Projects
None yet
Development

No branches or pull requests

3 participants
@ntamas @szhorvat and others