
Algebraic connectivity, Fiedler vector and spectral ordering #1134

Merged
59 commits merged into networkx:master on May 12, 2014

Conversation

ysitu
Contributor

@ysitu ysitu commented May 1, 2014

This addresses #1123.

This basically involves finding the second smallest eigenvalue of the Laplacian matrix of a graph and the corresponding eigenvector. Supposedly, this should be handled by SciPy, but I included one extra algorithm so that for each of my eight test problems, at least one algorithm solves it within 10 seconds.

Algorithms

To be clear up front, NumPy alone, or anything based on dense matrices, will not do the job: it becomes a memory bomb when used on large graphs. SciPy provides two sparse solvers, eigsh and lobpcg. eigsh is the implicitly restarted Lanczos iteration from ARPACK. lobpcg stands for "locally optimal block preconditioned conjugate gradient". I know about the "PCG" part but not the "LOB" part.

The algorithm not from SciPy is TraceMin. I am using a variant specifically designed for finding Fiedler vectors. TraceMin relies on a linear system solver. PCG is an obvious option, but personally I prefer sparse direct solvers: they are robust, and their comfort zone easily covers the graph sizes within NetworkX's reach. SciPy bundles SuperLU as splu. There are better options, but they are shut out by licensing issues—UMFPACK, CHOLMOD and PARDISO all fall into this category.
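For reference, a minimal sketch (not the code in this PR) of how the two SciPy solvers can be pointed at a graph Laplacian; the graph, tolerance, and random seed below are placeholders:

```python
import networkx as nx
import numpy as np
from scipy.sparse.linalg import eigsh, lobpcg

G = nx.path_graph(1000)                    # placeholder graph
L = nx.laplacian_matrix(G).astype(float)   # sparse Laplacian

# ARPACK (implicitly restarted Lanczos): ask for the two smallest eigenpairs.
vals, vecs = eigsh(L, k=2, which='SM')
algebraic_connectivity = vals[1]           # second smallest eigenvalue
fiedler_vector = vecs[:, 1]                # its eigenvector

# LOBPCG: start from a random block constrained (via Y) to stay orthogonal
# to the constant vector, which is the eigenvector of the zero eigenvalue.
n = L.shape[0]
X = np.random.default_rng(0).standard_normal((n, 1))
Y = np.ones((n, 1))
vals2, vecs2 = lobpcg(L, X, Y=Y, largest=False, tol=1e-8, maxiter=2000)
# vals2[0] approximates the algebraic connectivity, vecs2[:, 0] the Fiedler vector.
```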

Benchmarking

Long-n (n = 32768, m = 63472, weighted)
----------------------------------------------------------------------
eigsh               (did not converge) 2191.91
lobpcg              0.0302404279791517   38.62
TraceMin_pcg        0.0302404277402188   49.09
TraceMin_chol       0.0302404277401874    0.95   (Cholesky      0.058)
TraceMin_lu         0.0302404277402692    1.11   (LU            0.145)

Random4-n (n = 32768, m = 131056, weighted)
----------------------------------------------------------------------
eigsh                 3900.84878990967   29.43
lobpcg                3900.84878990965    4.03
TraceMin_pcg          3900.84878997654   26.30
TraceMin_chol         3900.84878997652   76.62   (Cholesky     21.657)
TraceMin_lu           3900.84878997653  819.56   (LU          615.625)

Square-n (n = 32761, m = 65160, weighted)
----------------------------------------------------------------------
eigsh                 3.88833036807499  680.97
lobpcg                3.88833036795032  253.81
TraceMin_pcg          3.88833036815728   49.98
TraceMin_chol         3.88833036815718    1.62   (Cholesky      0.097)
TraceMin_lu           3.88833036815734    1.92   (LU            0.249)

ak6 (n = 32774, m = 49159)
----------------------------------------------------------------------
eigsh               (did not converge) 1826.40
lobpcg            5.70035769998342e-08  102.51
TraceMin_pcg      5.72572519105717e-08   80.71
TraceMin_chol     5.72572519439957e-08    0.62   (Cholesky      0.014)
TraceMin_lu       5.72572486979079e-08    0.70   (LU            0.095)

gl6 (n = 32786, m = 93145)
----------------------------------------------------------------------
eigsh              0.00026223266873431   94.72
lobpcg            0.000262232668724498    4.54
TraceMin_pcg      0.000262232668735754    5.78
TraceMin_chol     0.000262232668735774    1.55   (Cholesky      0.283)
TraceMin_lu       0.000262232668735766    3.06   (LU            1.637)

gw6 (n = 32768, m = 93184)
----------------------------------------------------------------------
eigsh                0.152240934977433    1.16
lobpcg               0.152240934977427    2.60
TraceMin_pcg         0.152240934977428    2.56
TraceMin_chol        0.152240934977426   11.75   (Cholesky      7.699)
TraceMin_lu          0.152240934977426  345.57   (LU          331.141)

netgen-6 (n = 8000, m = 14999, weighted)
----------------------------------------------------------------------
eigsh                 74.0188255406067    4.29
lobpcg                74.0188255406006    0.73
TraceMin_pcg          74.0188255457315    1.28
TraceMin_chol         74.0188255457307    0.55   (Cholesky      0.126)
TraceMin_lu           74.0188255457307    1.93   (LU            1.299)

wlm6 (n = 8194, m = 179238)
----------------------------------------------------------------------
eigsh              0.00829854600893233   13.50
lobpcg             0.00829854600893219    2.75
TraceMin_pcg       0.00829854600893260    2.92
TraceMin_chol      0.00829854600893253    1.41   (Cholesky      0.089)
TraceMin_lu        0.00829854600893275    1.73   (LU            0.344)

The second column is the computed eigenvalue, and the third column is the time in seconds. The numbers in parentheses are the Cholesky/LU factorization times. Cholesky factorization relies on the GPL-licensed scikits.sparse wrapper for CHOLMOD; for this reason, the related code is not in this PR. I suppose that scikits.sparse could be stripped down to use only the LGPL portion of CHOLMOD and become LGPL itself, but I am not familiar enough with Cython to do that work.

From the data, it is evident that eigsh is not nearly as robust as the rest, although it can be fast at times. lobpcg and TraceMin_chol/lu both have good and bad cases, while TraceMin_pcg sits somewhere in between. It should be noted that for lobpcg and TraceMin_pcg, I am using only Jacobi preconditioning. (In separate tests, passing the Cholesky factorization as the preconditioner to lobpcg improves its bad cases by at least an order of magnitude but produces incorrect results in one case.) lobpcg is very sensitive to the initial guess and can fail to converge if it is not good enough. I produce an estimate of the Fiedler vector using the reverse Cuthill–McKee ordering, as sketched below. The same estimate does not seem to help TraceMin.
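A hedged sketch of that kind of RCM-based starting vector (the exact construction in the PR may differ): rank each vertex by its position in the reverse Cuthill–McKee ordering of the Laplacian, center the ranks, and normalize.

```python
import numpy as np
from scipy.sparse.csgraph import reverse_cuthill_mckee

def rcm_fiedler_guess(L):
    """Crude Fiedler vector estimate from the RCM ordering of the Laplacian L."""
    order = reverse_cuthill_mckee(L.tocsr())  # permutation array over the vertices
    n = L.shape[0]
    x = np.empty(n)
    x[order] = np.arange(n)                   # position of each vertex in the ordering
    x -= x.mean()                             # orthogonal to the constant vector
    return x / np.linalg.norm(x)
```

Passed as the initial block to lobpcg, together with a simple Jacobi preconditioner built from the diagonal of L, a guess like this is the kind of starting point described above.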

The effectiveness of Cholesky/LU factorization depends heavily on the structure of the matrix and on the matrix reordering algorithm. For example, both gl6 and gw6 are three-dimensional grid graphs, but gw6 has a larger bisection width and thus takes a long time to factorize. The METIS library used by CHOLMOD is much more effective at discovering small bisections than the multiple minimum degree algorithm used by SuperLU; as a result, CHOLMOD is more than 40 times faster at factorizing gw6. (As a separate issue, it may be of interest to some users if NetworkX included graph ordering and partitioning functionality.)
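To make the reordering effect concrete, here is a rough, hypothetical illustration using SuperLU's own column-permutation options on a small 3-D grid Laplacian (shifted to be nonsingular); the graph size is a placeholder, not one of the benchmark problems above.

```python
import time
import networkx as nx
from scipy.sparse import identity
from scipy.sparse.linalg import splu

G = nx.grid_graph(dim=(20, 20, 20))             # small stand-in for a 3-D grid graph
L = nx.laplacian_matrix(G).astype(float)
A = (L + 1e-2 * identity(L.shape[0])).tocsc()   # shift: the Laplacian itself is singular

for spec in ('COLAMD', 'MMD_AT_PLUS_A', 'NATURAL'):
    t0 = time.perf_counter()
    lu = splu(A, permc_spec=spec)
    fill = lu.L.nnz + lu.U.nnz                  # fill-in grows with a poor ordering
    print('%-15s %7.2fs  nnz(L)+nnz(U) = %d' % (spec, time.perf_counter() - t0, fill))
```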

ysitu added 30 commits April 26, 2014 11:47
@argriffing

what else is left to do

Looks good. I think the tracemin implementation could be vectorized more, but that doesn't need to be in this PR. By the way, applying Laplacian system solvers (Lx = b) to graph algorithms is trendy, so maybe more of these efficient algorithms could be added separately later.

@ysitu
Contributor Author

ysitu commented May 6, 2014

@chebee7i The remaining question is whether I should restore the Cholesky solver from scikits.sparse.

@argriffing

The question is whether to paste scikits.sparse code into networkx so that if users have CHOLMOD separately installed then they can use it through this wrapper?

@ysitu
Contributor Author

ysitu commented May 6, 2014

@argriffing Importing from scikits.sparse is good enough.
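For context, the kind of optional import being discussed might look like the sketch below. The module and attribute names follow the scikits.sparse CHOLMOD wrapper as commonly described and should be treated as assumptions; the fallback uses SciPy's bundled SuperLU.

```python
# Hypothetical optional-dependency pattern: prefer CHOLMOD via scikits.sparse
# if the user has it installed, otherwise fall back to SuperLU from SciPy.
try:
    from scikits.sparse.cholmod import cholesky   # GPL-licensed wrapper, optional

    def make_solver(A):
        factor = cholesky(A.tocsc())
        return lambda b: factor(b)                # solve A x = b with the Cholesky factor
except ImportError:
    from scipy.sparse.linalg import splu

    def make_solver(A):
        lu = splu(A.tocsc())
        return lu.solve                           # solve A x = b with the LU factors
```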

@chebee7i
Member

chebee7i commented May 6, 2014

TLDR: I vote for including it.

My personal opinion is that open source software packages are too fussy in deciding whether to use other open source packages. If a mistake is made (with respect to licensing), it's not as if it was done in bad faith. So if someone upstream makes an issue of it and has weight to throw around, then we change it. No big deal. We are not lawyers, and the legal status of linking and derivative works, especially as it relates to Python's run-time compiled modules, is still undecided. Given our project's relevance, we are also probably one of the least likely projects to face litigation.

IMO, simply typing: import X where X is some GPL or LGPL module does not mean that NetworkX is "linking" to X or is a derivative work of X. The meaning of X is determined at runtime. I can, on my computer, put anything I want in my PYTHONPATH with the name X that supports the same public API that NetworkX is expecting---and that thing I put in my PYTHONPATH need not be the GPL'd or LGPL'd package X. So it doesn't logically follow that "import X" means we are "using" GPL/LGPL'd code. This remains true even when we distribute wheel files for NetworkX (of course, this would get messier if the NetworkX wheel included, say, compiled Cython libraries).

There is an interesting discussion in this thread; here are two views similar to what I just described:

Grant Edwards: https://groups.google.com/forum/#!msg/comp.lang.python/zYZGRRqs_Mk/cMJkm9gfi-0J
Alex Martelli: https://groups.google.com/d/msg/comp.lang.python/zYZGRRqs_Mk/kxt6qtLjR_oJ

@ysitu
Contributor Author

ysitu commented May 7, 2014

@argriffing I wonder what you mean by "vectorization" here. I suppose that it is not the same thing as what you get from icc/ifort.

BTW, for the orthonormalization step in TraceMIN I had wanted to use scipy.linalg.orth. But I think that it is a memory bomb for tall, skinny input. Can you help confirm?

@argriffing

I wonder what you mean by "vectorization" here.

@ysitu I just meant avoiding the Python loops.

@argriffing

I had wanted to use scipy.linalg.orth. But I think that it is a memory bomb

I haven't checked this in detail, but given that its first line https://github.com/scipy/scipy/blob/master/scipy/linalg/decomp_svd.py#L201 calls svd without specifying extra options, and that svd without extra options computes full MxM and NxN matrices https://github.com/scipy/scipy/blob/master/scipy/linalg/decomp_svd.py#L15, I would agree that it appears to unnecessarily bomb the memory. Maybe add a SciPy issue, PR, or test? (fixed)
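A small sketch of the memory issue and the usual workaround (the sizes below are arbitrary): an economy-size QR or a thin SVD keeps the factors n-by-k, whereas calling svd with full matrices on a tall, skinny block allocates an n-by-n array.

```python
import numpy as np
from scipy.linalg import qr, svd

n, k = 100000, 4
X = np.random.default_rng(0).standard_normal((n, k))

Q, _ = qr(X, mode='economic')           # Q is n-by-k: fine for tall, skinny input
U, s, Vh = svd(X, full_matrices=False)  # thin SVD: U is also n-by-k
# svd(X) with full_matrices=True would build an n-by-n U:
# 100000 x 100000 doubles is roughly 80 GB.
```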

@argriffing

I've added a PR to fix the orth problem in scipy/scipy#3626

@ysitu
Contributor Author

ysitu commented May 10, 2014

I have no further changes planned. Here are the latest benchmarking results (omitting ARPACK and SuperLU to save testing time):

Long-n (n = 32768, m = 63472, weighted)
----------------------------------------------------------------------
lobpcg              0.0302404277260102   27.56
TraceMIN_pcg         0.030240428135769   22.76
TraceMIN_chol        0.030240427731946    0.97

Random4-n (n = 32768, m = 131056, weighted)
----------------------------------------------------------------------
lobpcg                3900.84878990957    4.25
TraceMIN_pcg          3900.84878997912   11.25
TraceMIN_chol         3900.84878991507   38.24

Square-n (n = 32761, m = 65160, weighted)
----------------------------------------------------------------------
lobpcg                3.88833036817791  129.01
TraceMIN_pcg          3.88833036816306   10.04
TraceMIN_chol         3.88833036813151    1.31

ak6 (n = 32774, m = 49159)
----------------------------------------------------------------------
lobpcg            5.70036654894323e-08   71.02
TraceMIN_pcg      5.79241219647608e-08   42.79
TraceMIN_chol     5.70164761757119e-08    0.67

gl6 (n = 32786, m = 93145)
----------------------------------------------------------------------
lobpcg            0.000262232668737997    4.02
TraceMIN_pcg      0.000262232668735729    2.96
TraceMIN_chol     0.000262232668735724    1.50

gw6 (n = 32768, m = 93184)
----------------------------------------------------------------------
lobpcg               0.152240934977426    2.76
TraceMIN_pcg         0.152240934977427    1.39
TraceMIN_chol        0.152240934977427   11.69

netgen-6 (n = 8000, m = 14999, weighted)
----------------------------------------------------------------------
lobpcg                74.0188255406044    0.71
TraceMIN_pcg          74.0188255429854    0.74
TraceMIN_chol         74.0188255417018    0.48

wlm6 (n = 8194, m = 179238)
----------------------------------------------------------------------
lobpcg             0.00829854600893217    2.83
TraceMIN_pcg       0.00829854600893189    2.00
TraceMIN_chol      0.00829854600893205    1.51

@jtorrents jtorrents added this to the networkx-1.9 milestone May 11, 2014
@jtorrents
Member

Great work @ysitu! This is very, very nice. I've tagged this as 1.9 because it seems that it's ready for merging, but I didn't look at it closely. Any more comments?

@argriffing

Any more comments?

It looks well tested so I assume it works, and I can vouch that it provides functions that are useful for spectral graph theory. I hope it is merged soon!

@chebee7i
Member

Merging...

chebee7i added a commit that referenced this pull request May 12, 2014
Algebraic connectivity, Fiedler vector and spectral ordering
@chebee7i chebee7i merged commit 3eb0369 into networkx:master May 12, 2014
@ysitu ysitu deleted the fiedler_nochol branch May 12, 2014 05:15
@ysitu ysitu added the 1.9 label May 13, 2014