# graph_mate

## Running PageRank on Wikipedia articles

***

This notebook replicates the benchmarks performed by Memgraph and published on [Hacker News](https://news.ycombinator.com/item?id=33716570).

First, we configure logging so that the library's progress output is visible in the notebook.

In [21]:
import logging

logging.basicConfig(format="%(message)s")
logging.getLogger().setLevel(logging.NOTSET)

Next, we import the graph_mate library.

In [22]:
import graph_mate as gm

Load the Wikipedia articles graph from disk. The graph is stored in edge list format, where each line represents one edge, written as `<source_id> <target_id>`.

The file has been converted from the Wikipedia Articles graph exported from Memgraph Lab. It is available as a [Gist](https://gist.github.com/s1ck/97e23af14b2e117fa47c713addef7517). Download the file and place it in the same folder as this notebook.
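
To make the edge list format concrete, here is a minimal sketch in plain Python that parses a tiny, made-up three-edge graph. The helper `read_edge_list` and the sample data are hypothetical and only illustrate the `<source_id> <target_id>` layout; this is not how graph_mate reads the file. Deriving the node count as the largest id plus one is also an assumption that only holds when ids are dense and start at 0.

```python
import io

# Hypothetical sample: one "<source_id> <target_id>" pair per line.
sample = io.StringIO("0 1\n0 2\n2 1\n")


def read_edge_list(lines):
    """Parse whitespace-separated (source, target) pairs, skipping blank lines."""
    edges = []
    for line in lines:
        parts = line.split()
        if not parts:
            continue
        edges.append((int(parts[0]), int(parts[1])))
    return edges


edges = read_edge_list(sample)

# Assumption: node ids are dense and start at 0.
node_count = max(max(source, target) for source, target in edges) + 1
print(edges, node_count)
```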

In [23]:
g = gm.DiGraph.load('wikipedia-articles.el', file_format=gm.FileFormat.EdgeList)

page_size = 4096, cpu_count = 16, chunk_size = 180224
Read 310227 edges in 0.02s (117.29 MB/s)
Creating directed graph
Computed degrees in 871.992µs
Computed prefix sum in 140.675µs
Computed target array in 2.495239ms
Finalized offset array in 20.068µs
Created outgoing csr in 5.68147ms.
Computed degrees in 1.349712ms
Computed prefix sum in 173.496µs
Computed target array in 1.870223ms
Finalized offset array in 41.257µs
Created incoming csr in 4.607016ms.
Created directed graph (node_count = 78181, edge_count = 310227)
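
The log above names the steps of building a compressed sparse row (CSR) structure: degrees, prefix sum, target array, and offset array. The following is a rough sketch of those steps in plain Python on a made-up four-edge graph. The function `build_csr` is hypothetical and single-threaded; it illustrates the general CSR technique and is not graph_mate's implementation.

```python
def build_csr(node_count, edges):
    """Return (offsets, targets) describing each node's outgoing neighbors."""
    # Step 1: count the out-degree of every node.
    degrees = [0] * node_count
    for source, _ in edges:
        degrees[source] += 1

    # Step 2: an exclusive prefix sum turns degrees into start offsets.
    offsets = [0] * (node_count + 1)
    for node in range(node_count):
        offsets[node + 1] = offsets[node] + degrees[node]

    # Step 3: write each target into the next free slot of its source's slice.
    targets = [0] * len(edges)
    next_slot = offsets[:-1].copy()
    for source, target in edges:
        targets[next_slot[source]] = target
        next_slot[source] += 1

    return offsets, targets


edges = [(0, 1), (0, 2), (2, 1), (1, 0)]
offsets, targets = build_csr(3, edges)


def neighbors(node):
    """Slice the target array using two adjacent offsets."""
    return targets[offsets[node]:offsets[node + 1]]


print(offsets, targets, neighbors(0))
```

An incoming CSR can be built with the same function by swapping source and target in every edge, which matches the two "Created ... csr" lines in the log.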


Now we can run PageRank on the graph with the same configuration that Memgraph uses.

In [24]:
pr_result = g.page_rank(max_iterations=100, tolerance=1e-5)
print(pr_result)
print(f"Computation took {pr_result.micros / 1000} ms")

Finished iteration 0 with an error of 0.844404 in 597.305µs
Finished iteration 1 with an error of 0.017353 in 459.055µs
Finished iteration 2 with an error of 0.000000 in 441.342µs


PageRankResult { scores: "[... 78181 values]", ran_iterations: 3, error: 0.0, took: 3.347ms }
Computation took 3.347 ms
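
To show what `max_iterations` and `tolerance` control, here is a minimal power-iteration sketch of PageRank in plain Python. Everything in it is an assumption made for illustration: the damping factor of 0.85, the even redistribution of rank from nodes without outgoing edges, and the use of the L1 distance between consecutive score vectors as the error. graph_mate's implementation may differ in each of these, so the numbers will not match the output above.

```python
def page_rank(node_count, edges, max_iterations=100, tolerance=1e-5, damping=0.85):
    """Power-iteration PageRank; returns (scores, ran_iterations, error)."""
    out_degree = [0] * node_count
    incoming = [[] for _ in range(node_count)]
    for source, target in edges:
        out_degree[source] += 1
        incoming[target].append(source)

    scores = [1.0 / node_count] * node_count
    base = (1.0 - damping) / node_count
    error = 0.0
    ran_iterations = 0

    for _ in range(max_iterations):
        # Assumption: rank held by nodes without outgoing edges is spread evenly.
        dangling = sum(scores[v] for v in range(node_count) if out_degree[v] == 0)
        next_scores = []
        for v in range(node_count):
            pulled = sum(scores[u] / out_degree[u] for u in incoming[v])
            next_scores.append(base + damping * (pulled + dangling / node_count))

        # Assumption: the error is the L1 distance between consecutive iterations.
        error = sum(abs(new - old) for new, old in zip(next_scores, scores))
        scores = next_scores
        ran_iterations += 1
        if error < tolerance:
            break  # converged before reaching max_iterations

    return scores, ran_iterations, error


# Made-up graph: node 2 is linked from both other nodes.
edges = [(0, 1), (0, 2), (1, 2), (2, 0)]
scores, ran_iterations, error = page_rank(3, edges)
print(scores, ran_iterations, error)
```

The loop stops at whichever comes first: the error dropping below `tolerance` or the iteration count reaching `max_iterations`.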


In [25]:
import numpy as np

# Fetch the scores once and reuse them for both lookups.
scores = pr_result.scores()
node = np.argmax(scores, axis=0)
rank = scores[node]
print(f"Node id={node} has the highest rank of {rank}")

Node id=222 has the highest rank of 5.088145553600043e-05
