Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Page Rank algorithm implementation for Graphs #27480

Closed
rajat1433 mannequin opened this issue Mar 13, 2019 · 99 comments
Closed

Page Rank algorithm implementation for Graphs #27480

rajat1433 mannequin opened this issue Mar 13, 2019 · 99 comments

Comments

@rajat1433
Copy link
Mannequin

rajat1433 mannequin commented Mar 13, 2019

Page Rank computes the ranking of the nodes of the graph based on the structure of the incoming links. It is a useful metrics in graphs and can be quite useful. (https://towardsdatascience.com/graphs-and-paths-pagerank-54f180a1aa0a)

Below is a link to a thesis on this algorithm
http://www.sagemath.org/files/thesis/augeri-thesis-2008.pdf

It would be good to have its implementation in the Sage's graph module.

CC: @dcoudert

Component: graph theory

Keywords: pagerank

Author: Rajat Mittal

Branch: 07a884b

Reviewer: David Coudert

Issue created by migration from https://trac.sagemath.org/ticket/27480

@rajat1433 rajat1433 mannequin added this to the sage-8.7 milestone Mar 13, 2019
@rajat1433 rajat1433 mannequin self-assigned this Mar 13, 2019
@rajat1433
Copy link
Mannequin Author

rajat1433 mannequin commented Mar 13, 2019

comment:2

I have a brief idea about it(used it in my college project) and can implement it in Sage modules.

@rajat1433
Copy link
Mannequin Author

rajat1433 mannequin commented Mar 14, 2019

comment:3

I just found that networkx library already has Page Rank algorithm implemented in Python. Should I implement it in Sage freshly or should I make a function to use networkx implementation ?
Or maybe we can write a faster cython implemetation of it.
Need some suggetsions!

@dcoudert
Copy link
Contributor

comment:4

networkx has several methods to compute Page rank: a pure Python, one using numpy, another using scipy, etc.

The optional package igraph also has an implementation of pagerank (see
igraph documentation for pagerank
(to install igraph, do sage -i igraph and sage -i python_igraph).

The best thing to do is to create a method including a parameter algorithm, and that will call the different algorithms. So algorithm could be None``, 'networkx', 'numpy', 'scipy', 'igraph'`.

@rajat1433
Copy link
Mannequin Author

rajat1433 mannequin commented Mar 14, 2019

comment:5

When it is None, should an implementation in sage be used? And if so should I do an implementation
in python or cython?

@dcoudert
Copy link
Contributor

comment:6

When it's None, the method should choose the best available implementation. And the best implementation could depend on the size or density of the (di)graph. For instance, for very large graphs, methods based on matrix multiplication might not be appropriate (memory consumption), while such methods might be very efficient for small graphs. Some measurements are needed to decide.

Before deciding if a new implementation is needed, we must know if what we can easily get is fast enough or not.

@rajat1433
Copy link
Mannequin Author

rajat1433 mannequin commented Mar 15, 2019

Branch: u/gh-rajat1433/27480_page_rank

@rajat1433
Copy link
Mannequin Author

rajat1433 mannequin commented Mar 15, 2019

Dependencies: #27496

@rajat1433
Copy link
Mannequin Author

rajat1433 mannequin commented Mar 16, 2019

comment:9

I have installed igraph in my sage module. But however its pagerank algorithm is not able to work. It keeps on throwing a segmentation error. I use igraph_graph to convert sage graph into igraph. All other algorithms of igraph works fine. Following error message I get:

Cython backtrace
----------------

29	../sysdeps/unix/sysv/linux/waitpid.c: No such file or directory.
Traceback (most recent call last):
  File "<string>", line 56, in <module>
  File "/usr/lib/python3/dist-packages/Cython/Debugger/libcython.py", line 689, in invoke
    for arg in string_to_argv(args):
TypeError: argument 1 must be str, not bytes
Saved trace to /home/rajat/.sage/crash_logs/crash_yLpPiW.log
------------------------------------------------------------------------
Unhandled SIGSEGV: A segmentation fault occurred.
This probably occurred because a *compiled* module has a bug
in it and is not properly wrapped with sig_on(), sig_off().
Python will now terminate.

Regarding networkx,numpy and scipy, I have tested them on their runtime. Numpy does it by matrix multiplication and solving for eigenvalues, networkx has pure python implementation and scipy does matrix multiplication iteratively.

Following are the runtimes I got:

sage: G=nx.gnp_random_graph(40,0.01,directed=True)
sage: %timeit nx.pagerank_numpy(G)
1000 loops, best of 3: 883 µs per loop
sage: %timeit nx.pagerank_scipy(G)
1000 loops, best of 3: 1.7 ms per loop
sage: %timeit nx.pagerank(G)
1000 loops, best of 3: 1.7 ms per loop

sage: G=nx.gnp_random_graph(60,0.01,directed=True)
sage: %timeit nx.pagerank(G)
100 loops, best of 3: 2.87 ms per loop
sage: %timeit nx.pagerank_scipy(G)
1000 loops, best of 3: 1.87 ms per loop
sage: %timeit nx.pagerank_numpy(G)
1000 loops, best of 3: 1.58 ms per loop

sage: G=nx.gnp_random_graph(70,0.01,directed=True)
sage: %timeit nx.pagerank(G)
100 loops, best of 3: 3.66 ms per loop
sage: %timeit nx.pagerank_numpy(G)
100 loops, best of 3: 2.22 ms per loop
sage: %timeit nx.pagerank_scipy(G)
100 loops, best of 3: 1.99 ms per loop

sage: G=nx.gnp_random_graph(80,0.01,directed=True)
sage: %timeit nx.pagerank(G)
100 loops, best of 3: 4.66 ms per loop
sage: %timeit nx.pagerank_scipy(G)
1000 loops, best of 3: 1.93 ms per loop
sage: %timeit nx.pagerank_numpy(G)
100 loops, best of 3: 3.3 ms per loop

sage: G=nx.gnp_random_graph(100,0.01,directed=True)
sage: %timeit nx.pagerank(G)
100 loops, best of 3: 7.07 ms per loop
sage: %timeit nx.pagerank_scipy(G)
100 loops, best of 3: 2.29 ms per loop
sage: %timeit nx.pagerank_numpy(G)
100 loops, best of 3: 4.63 ms per loop

sage: G=nx.gnp_random_graph(400,0.01,directed=True)
sage: %timeit nx.pagerank_numpy(G)
10 loops, best of 3: 175 ms per loop
sage: %timeit nx.pagerank_scipy(G)
100 loops, best of 3: 3.9 ms per loop
sage: %timeit nx.pagerank(G)
10 loops, best of 3: 53.2 ms per loop

sage: G=nx.gnp_random_graph(4000,0.01,directed=True)
sage: %timeit nx.pagerank(G)
1 loop, best of 3: 1.81 s per loop
sage: %timeit nx.pagerank_scipy(G)
1 loop, best of 3: 206 ms per loop
sage: %timeit nx.pagerank_numpy(G)
1 loop, best of 3: 1min 1s per loop

As per my analysis upto around 60 vertices numpy is fast but scipy is fastest after that.

@dcoudert
Copy link
Contributor

comment:10

I don't know what's the problem with igraph. I have installed igraph and python_igraph on my OSX laptop and I can do

sage: G = graphs.RandomGNP(1000, .01)
sage: I = G.igraph_graph()
sage: %timeit I.pagerank()
100 loops, best of 3: 1.92 ms per loop

Try recompiling sage (make build or sage -b). If not working, you can ask for help on sage-devel. Explain what you did to install igraph, that other algorithms of igraph are working fine for you, and what's the error message you get.

@dcoudert
Copy link
Contributor

comment:11

Note that a random graph on 40 nodes with probability 0.01 has in average 8 edges... so your experiments are not very conclusive on small graphs. Nonetheless, scipy seems the most efficient version.

@rajat1433
Copy link
Mannequin Author

rajat1433 mannequin commented Mar 16, 2019

comment:12

More experiments with dense small graphs also suggest numpy to be best for small graphs

sage: G=nx.gnp_random_graph(40,0.5,directed=True)
sage:  **%timeit nx.pagerank_numpy(G)
The slowest run took 86.64 times longer than the fastest. This could mean that an intermediate result is being cached.
1 loop, best of 3: 2.32 ms per loop**
sage: %timeit nx.pagerank_scipy(G)
The slowest run took 60.81 times longer than the fastest. This could mean that an intermediate result is being cached.
100 loops, best of 3: 2.41 ms per loop
sage: %timeit nx.pagerank(G)
100 loops, best of 3: 12.5 ms per loop


sage: G=nx.gnp_random_graph(60,0.5,directed=True)
sage: %timeit nx.pagerank(G)
10 loops, best of 3: 27 ms per loop
sage: %timeit nx.pagerank_scipy(G)
100 loops, best of 3: 3.47 ms per loop
sage: **%timeit nx.pagerank_numpy(G)
100 loops, best of 3: 3.13 ms per loop**

@embray
Copy link
Contributor

embray commented Mar 25, 2019

comment:13

Ticket retargeted after milestone closed (if you don't believe this ticket is appropriate for the Sage 8.8 release please retarget manually)

@embray embray modified the milestones: sage-8.7, sage-8.8 Mar 25, 2019
@rajat1433
Copy link
Mannequin Author

rajat1433 mannequin commented Mar 29, 2019

Changed dependencies from #27496 to #27496, #27502

@rajat1433
Copy link
Mannequin Author

rajat1433 mannequin commented Mar 31, 2019

comment:15

Page rank for Ipython with weighted edges doesn't seem to be working:
However there is a parameter called weights in Pagerank method of Igraph , I looked at its code and doc but it is not clear to me if it is using it.

https://github.com/igraph/python-igraph/blob/master/src/graphobject.c

https://igraph.org/python/doc/igraph.GraphBase-class.html#personalized_pagerank

sage: G = Graph(6)
sage: I = G.igraph_graph()
sage: I.add_edge(0,0,weight=40)
igraph.Edge(<igraph.Graph object at 0x7f338814e148>, 0, {'weight': 40})
sage: I.add_edge(1,2,weight=50)
igraph.Edge(<igraph.Graph object at 0x7f338814e148>, 1, {'weight': 50})
sage: I.add_edge(2,3,weight=60)
igraph.Edge(<igraph.Graph object at 0x7f338814e148>, 2, {'weight': 60})
sage: I.add_edge(0,3,weight=70)
igraph.Edge(<igraph.Graph object at 0x7f338814e148>, 3, {'weight': 70})
sage: I.add_edge(3,4,weight=80)
igraph.Edge(<igraph.Graph object at 0x7f338814e148>, 4, {'weight': 80})
sage: I.add_edge(4,5,weight=20)
igraph.Edge(<igraph.Graph object at 0x7f338814e148>, 5, {'weight': 20})
sage: I.pagerank(weights='weight')
[0.1380494948975056,
 0.11517272680482316,
 0.19683228912731204,
 0.237940473238224,
 0.196832289127312,
 0.11517272680482313]
sage: I.pagerank()
[0.1380494948975056,
 0.11517272680482316,
 0.19683228912731204,
 0.237940473238224,
 0.196832289127312,
 0.11517272680482313]

@rajat1433
Copy link
Mannequin Author

rajat1433 mannequin commented Mar 31, 2019

comment:16

Following experiments show igraph's results to be the best due to its c-language code.

sage: G = graphs.RandomGNP(40, 0.2)
sage: n = G.networkx_graph()
sage: %timeit nx.pagerank_numpy(n)
1000 loops, best of 3: 1.06 ms per loop
sage: %timeit nx.pagerank_scipy(n)
The slowest run took 37.63 times longer than the fastest. This could mean that an intermediate result is being cached.
100 loops, best of 3: 2.31 ms per loop
sage:  %timeit nx.pagerank(n)
100 loops, best of 3: 11.3 ms per loop
sage: i = G.igraph_graph()
sage: %timeit i.pagerank()
10000 loops, best of 3: **23.3 µs** per loop


sage: G = graphs.RandomGNP(40, 0.8)
sage: n = G.networkx_graph()
sage: %timeit nx.pagerank_numpy(n)
1000 loops, best of 3: 1.33 ms per loop
sage: %timeit nx.pagerank_scipy(n)
100 loops, best of 3: 2.16 ms per loop
sage:  %timeit nx.pagerank(n)
10 loops, best of 3: 20 ms per loop
sage: i = G.igraph_graph()
sage: %timeit i.pagerank()
10000 loops, best of 3: **39.1 µs** per loop


sage: G = graphs.RandomGNP(4000, 0.5)
sage: n = G.networkx_graph()
sage: i = G.igraph_graph()
sage: %timeit i.pagerank()
1 loop, best of 3: **1.03 s** per loop
sage: n = G.networkx_graph()
sage: %timeit nx.pagerank_numpy(n)
1 loop, best of 3: 1min 9s per loop
sage: %timeit nx.pagerank_scipy(n)
1 loop, best of 3: 7.12 s per loop
sage:  %timeit nx.pagerank(n)
....: 
....: 
Killed

@dcoudert
Copy link
Contributor

comment:17

The code of the pagerank algorithm is here https://github.com/igraph/igraph/blob/master/src/centrality.c

@rajat1433
Copy link
Mannequin Author

rajat1433 mannequin commented Mar 31, 2019

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Mar 31, 2019

Branch pushed to git repo; I updated commit sha1. New commits:

b2c82ecFixes cycle basis for multiedges
1a68ce1Add doctests
142e50fMerge branch 'develop' of git://trac.sagemath.org/sage into develop
8836f0dcheckpoint1
a2f38d9merged
8433c2amerged
b57b730pagerank method
ecbb4a7removed xtra lines

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Mar 31, 2019

Commit: ecbb4a7

@rajat1433
Copy link
Mannequin Author

rajat1433 mannequin commented Mar 31, 2019

@rajat1433
Copy link
Mannequin Author

rajat1433 mannequin commented Mar 31, 2019

Changed commit from ecbb4a7 to none

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Mar 31, 2019

Commit: 142e50f

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Mar 31, 2019

Branch pushed to git repo; I updated commit sha1. New commits:

b2c82ecFixes cycle basis for multiedges
1a68ce1Add doctests
142e50fMerge branch 'develop' of git://trac.sagemath.org/sage into develop

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Apr 9, 2019

Changed commit from 30d493d to 804fe60

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Apr 9, 2019

Branch pushed to git repo; I updated commit sha1. New commits:

804fe60improved note

@rajat1433
Copy link
Mannequin Author

rajat1433 mannequin commented Apr 9, 2019

comment:65

Done the changes!

@dcoudert
Copy link
Contributor

dcoudert commented Apr 9, 2019

comment:66

LGTM.

@vbraun
Copy link
Member

vbraun commented Apr 11, 2019

Changed branch from u/gh-rajat1433/27480_page_rank_algo to 804fe60

@vbraun
Copy link
Member

vbraun commented Apr 13, 2019

Changed branch from 804fe60 to u/gh-rajat1433/27480_page_rank_algo

@vbraun
Copy link
Member

vbraun commented Apr 13, 2019

comment:68
**********************************************************************
File "src/sage/graphs/generic_graph.py", line 9536, in sage.graphs.generic_graph.GenericGraph.?
Failed example:
    G.pagerank(alpha=0.50, algorithm="igraph")
Exception raised:
    Traceback (most recent call last):
      File "/var/lib/buildbot/slave/sage_git/build/local/lib/python2.7/site-packages/sage/doctest/forker.py", line 671, in _run
        self.compile_and_execute(example, compiler, test.globs)
      File "/var/lib/buildbot/slave/sage_git/build/local/lib/python2.7/site-packages/sage/doctest/forker.py", line 1095, in compile_and_execute
        exec(compiled, globs)
      File "<doctest sage.graphs.generic_graph.GenericGraph.?[2]>", line 1, in <module>
        G.pagerank(alpha=RealNumber('0.50'), algorithm="igraph")
      File "/var/lib/buildbot/slave/sage_git/build/local/lib/python2.7/site-packages/sage/graphs/generic_graph.py", line 9668, in pagerank
        raise PackageNotFoundError("igraph")
    PackageNotFoundError: the package 'igraph' was not found. You can install it by running 'sage -i igraph' in a shell
**********************************************************************
File "src/sage/graphs/generic_graph.py", line 9581, in sage.graphs.generic_graph.GenericGraph.?
Failed example:
    G.pagerank(algorithm="igraph")
Exception raised:
    Traceback (most recent call last):
      File "/var/lib/buildbot/slave/sage_git/build/local/lib/python2.7/site-packages/sage/doctest/forker.py", line 671, in _run
        self.compile_and_execute(example, compiler, test.globs)
      File "/var/lib/buildbot/slave/sage_git/build/local/lib/python2.7/site-packages/sage/doctest/forker.py", line 1095, in compile_and_execute
        exec(compiled, globs)
      File "<doctest sage.graphs.generic_graph.GenericGraph.?[10]>", line 1, in <module>
        G.pagerank(algorithm="igraph")
      File "/var/lib/buildbot/slave/sage_git/build/local/lib/python2.7/site-packages/sage/graphs/generic_graph.py", line 9668, in pagerank
        raise PackageNotFoundError("igraph")
    PackageNotFoundError: the package 'igraph' was not found. You can install it by running 'sage -i igraph' in a shell
**********************************************************************
1 item had failures:
   2 of 890 in sage.graphs.generic_graph.GenericGraph.?
    [3393 tests, 2 failures, 36.96 s]
**********************************************************************

@vbraun vbraun reopened this Apr 13, 2019
@dcoudert
Copy link
Contributor

comment:69

@Rajat: add to each doctest with igraph # optional - python_igraph, like this

-            sage: G.pagerank(alpha=0.50, algorithm="igraph")
+            sage: G.pagerank(alpha=0.50, algorithm="igraph")  # optional - python_igraph
             {0: 0.25, 1: 0.25, 2: 0.24999999999999997, 3: 0.24999999999999997}

and

-            sage: G.pagerank(algorithm="igraph")
+            sage: G.pagerank(algorithm="igraph")  # optional - python_igraph

@Volker: sorry for this.

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Apr 13, 2019

Changed commit from 804fe60 to f10629a

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Apr 13, 2019

Branch pushed to git repo; I updated commit sha1. New commits:

f10629aadded optional igraph parameter

@rajat1433
Copy link
Mannequin Author

rajat1433 mannequin commented Apr 13, 2019

comment:71

Added the optional igraph thing.
Understood the importance of it as it may fail tests on systems on which igraph is not installed..

@rajat1433
Copy link
Mannequin Author

rajat1433 mannequin commented Apr 14, 2019

comment:73
$ ./sage -t src/sage/graphs/generic_graph.py
Running doctests with ID 2019-04-14-09-52-00-5ed75243.
Git branch: u/gh-rajat1433/27480_page_rank_algo
Using --optional=dochtml,igraph,memlimit,mpir,python2,python_igraph,sage
Doctesting 1 file.
sage -t --warn-long 24.3 src/sage/graphs/generic_graph.py
    [3402 tests, 27.42 s]
----------------------------------------------------------------------
All tests passed!
----------------------------------------------------------------------
Total time for all tests: 28.3 seconds
    cpu time: 19.0 seconds
    cumulative wall time: 27.4 seconds

On my system all tests passes but patchbot has some failed examples don't know why...

@dcoudert
Copy link
Contributor

comment:74

This is numerical noise due to floating point arithmetic. See http://doc.sagemath.org/html/en/developer/coding_basics.html#special-markup-to-influence-doctests

A solution is to add # abs tol 1e-9 to all doctests.

-            sage: G.pagerank(algorithm="Numpy")
+            sage: G.pagerank(algorithm="Numpy") # abs tol 1e-9

...

-            sage: G.pagerank()
+            sage: G.pagerank() # abs tol 1e-9

also for igraph

-            sage: G.pagerank(alpha=0.50, algorithm="igraph")  # optional - python_igraph
+            sage: G.pagerank(alpha=0.50, algorithm="igraph") # optional - python_igraph # abs tol 1e-9

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Apr 14, 2019

Branch pushed to git repo; I updated commit sha1. New commits:

07a884bfloating point flag included

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Apr 14, 2019

Changed commit from f10629a to 07a884b

@dcoudert
Copy link
Contributor

comment:76

LGTM

@vbraun
Copy link
Member

vbraun commented Apr 15, 2019

Changed branch from u/gh-rajat1433/27480_page_rank_algo to 07a884b

@fchapoton
Copy link
Contributor

comment:78

There is an issue with igraph on the arando patchbot with 8.8.b4 and 8.8.b5:

**********************************************************************
File "src/sage/graphs/generic_graph.py", line 9663, in sage.graphs.generic_graph.GenericGraph.?
Failed example:
    G.pagerank(alpha=0.50, algorithm="igraph")  # optional - python_igraph
Expected:
    {0: 0.25, 1: 0.25, 2: 0.24999999999999997, 3: 0.24999999999999997}
Got:
    {0: 0.25, 1: 0.25, 2: 0.25, 3: 0.25}
**********************************************************************
1 item had failures:
   1 of 860 in sage.graphs.generic_graph.GenericGraph.?
    [3497 tests, 1 failure, 50.96 s]
----------------------------------------------------------------------
sage -t --long src/sage/graphs/generic_graph.py  # 1 doctest failed

@fchapoton
Copy link
Contributor

Changed commit from 07a884b to none

@rajat1433
Copy link
Mannequin Author

rajat1433 mannequin commented May 11, 2019

comment:79

I think I forgot to add # abs tol 1e-9 to this test.

Thanks David for opening #27811 for fixing it

Replying to @fchapoton:

There is an issue with igraph on the arando patchbot with 8.8.b4 and 8.8.b5:

**********************************************************************
File "src/sage/graphs/generic_graph.py", line 9663, in sage.graphs.generic_graph.GenericGraph.?
Failed example:
    G.pagerank(alpha=0.50, algorithm="igraph")  # optional - python_igraph
Expected:
    {0: 0.25, 1: 0.25, 2: 0.24999999999999997, 3: 0.24999999999999997}
Got:
    {0: 0.25, 1: 0.25, 2: 0.25, 3: 0.25}
**********************************************************************
1 item had failures:
   1 of 860 in sage.graphs.generic_graph.GenericGraph.?
    [3497 tests, 1 failure, 50.96 s]
----------------------------------------------------------------------
sage -t --long src/sage/graphs/generic_graph.py  # 1 doctest failed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants