In [1]:
import sys
sys.path.append("..")  # make the parent directory importable so the graphs package is found
from graphs.directedgraph import DirectedGraph
from pagerank import PageRank

# Examples from
# https://towardsdatascience.com/pagerank-3c568a7d2332

In [12]:
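def print_ranks(ranks, start_idx=1, labels=None):
    """Print one line per page; the rank values are read from ranks[0]."""
    for i in range(len(ranks[0])):
        # Number pages from start_idx unless explicit labels are given.
        page_num = i + start_idx if labels is None else labels[i]
        print(f"Page {page_num} has rank {ranks[0][i]}")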

#### Example 1
<img src="examples/web1.png" alt="Drawing" style="width: 400px;"/>

In [3]:
# Example 1
W1 = DirectedGraph.importDGFromFile("examples/web1.txt")

ranks = PageRank(W1, max_iter=100, damping_factor=0.85)[0]
print_ranks(ranks)

Page 1 has rank 0.06063679544843559
Page 2 has rank 0.11210456182712417
Page 3 has rank 0.15622587122253376
Page 4 has rank 0.19368002942452084
Page 5 has rank 0.22537658605542885
Page 6 has rank 0.2519761560219569


A page's rank is the sum of the rank passed on by the pages that link to it, so each page in the path accumulates rank from its predecessors and the rank increases for pages further along the chain.
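
To make the accumulation concrete, here is a minimal power-iteration sketch in plain Python. It is not the `PageRank` function imported above. `pagerank_sketch` and its dict-of-lists input are hypothetical names introduced for illustration. It assumes that Example 1 is a chain 1 -> 2 -> ... -> 6 and that the rank of a page with no outgoing links is spread evenly over all pages, which is one common convention.

```python
def pagerank_sketch(out_links, damping_factor=0.85, max_iter=100):
    """Power-iteration PageRank over a dict {page: [pages it links to]}."""
    pages = list(out_links)
    n = len(pages)
    ranks = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(max_iter):
        # Assumption: rank held by pages without out-links is shared by all pages.
        dangling = sum(ranks[p] for p in pages if not out_links[p])
        base = (1.0 - damping_factor) / n + damping_factor * dangling / n
        new_ranks = {p: base for p in pages}
        for p in pages:
            for q in out_links[p]:
                # Each page splits its damped rank equally among its out-links.
                new_ranks[q] += damping_factor * ranks[p] / len(out_links[p])
        ranks = new_ranks
    return ranks


# Assumed structure of Example 1: a chain 1 -> 2 -> ... -> 6.
chain = {1: [2], 2: [3], 3: [4], 4: [5], 5: [6], 6: []}
chain_ranks = pagerank_sketch(chain)
for page, rank in chain_ranks.items():
    print(f"Page {page} has rank {rank:.4f}")
```

Under these assumptions the printed ranks should rise from page 1 to page 6, the same pattern as in the output above. The exact digits may differ from the notebook's `PageRank`, depending on how it initializes and iterates.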

#### Example 2
<img src="examples/web2.png" alt="Drawing" style="width: 400px;"/>

In [4]:
# Example 2
W2 = DirectedGraph.importDGFromFile("examples/web2.txt")

ranks = PageRank(W2, max_iter=100, damping_factor=0.85)[0]
print_ranks(ranks)

Page 1 has rank 0.19975076413268358
Page 2 has rank 0.20011609488624932
Page 3 has rank 0.1997284469185977
Page 4 has rank 0.20030745364464617
Page 5 has rank 0.20009724041782445


Because the pages form a cycle, the ranks converge toward an even distribution across all pages.
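
A short check of why the even distribution is stable, assuming the graph is a single directed cycle of $N$ pages in which every page has exactly one outgoing link. The notation is introduced here for illustration: $d$ is the damping factor, $r_i^{(t)}$ is the rank of page $i$ after $t$ iterations, and the index $i-1$ wraps around the cycle. With the usual damped update, a uniform rank vector maps to itself:

```latex
r_i^{(t+1)} = \frac{1-d}{N} + d\, r_{i-1}^{(t)}, \qquad
r_i^{(t)} = \frac{1}{N} \;\Rightarrow\;
r_i^{(t+1)} = \frac{1-d}{N} + \frac{d}{N} = \frac{1}{N}.
```

For $N = 5$ this fixed point is $0.2$ per page, which is the value the printed ranks approach.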

#### Example 3
<img src="examples/web3.png" alt="Drawing" style="width: 400px;"/>

In [5]:
# Example 3
W3 = DirectedGraph.importDGFromFile("examples/web3.txt")

ranks = PageRank(W3, max_iter=100, damping_factor=0.85)[0]
print_ranks(ranks)

Page 1 has rank 0.17531353698216132
Page 2 has rank 0.3248115225274708
Page 3 has rank 0.32431128449041224
Page 4 has rank 0.17556365599995635


If each link provides the same "rank weight", then the pages in the middle, which receive more links, accumulate more weight than the pages at the ends.
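
The printed values can be checked by hand under an assumption about the figure: that web3 is a four-page path 1, 2, 3, 4 in which neighbouring pages link to each other in both directions. With $d = 0.85$ and rank $r_i$ for page $i$ (notation introduced here), symmetry gives $r_1 = r_4$ and $r_2 = r_3$. Page 1 is linked only from page 2, which splits its rank over two outgoing links:

```latex
\begin{aligned}
r_1 &= \frac{1-d}{4} + d\,\frac{r_2}{2}, \qquad r_1 + r_2 = \tfrac{1}{2} \\
r_1 &= 0.0375 + 0.425\,(0.5 - r_1) \;\Rightarrow\; 1.425\, r_1 = 0.25 \\
r_1 &= r_4 \approx 0.1754, \qquad r_2 = r_3 \approx 0.3246
\end{aligned}
```

These values are close to the ranks printed above, which supports the assumed structure but does not confirm it.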

#### Example 4
<img src="examples/web4.png" alt="Drawing" style="width: 400px;"/>

In [8]:
# Example 4
W4 = DirectedGraph.importDGFromFile("examples/web4.txt")

ranks = PageRank(W4, max_iter=100, damping_factor=0.85)[0]
print_ranks(ranks)

Page 1 has rank 0.280336452211701
Page 2 has rank 0.15867496075940418
Page 3 has rank 0.13892049488816371
Page 4 has rank 0.1083272913349879
Page 5 has rank 0.18398493804218383
Page 6 has rank 0.06064407879456792
Page 7 has rank 0.06911178396899141


Pages 1 and 5 both have 4 links pointing to them. So why does page 1 have the highest PageRank? Two of the pages that link to page 5 have a very low rank, so they cannot pass much rank on to page 5.

Pages 6 and 7 have a low PageRank because they are at the edge of the graph and only have one page that links to them. There's just not enough rank for them.
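
The effect of low-ranked sources can be sketched with a small helper. `incoming_rank` is a hypothetical function written for this illustration, and the numbers below are made up rather than taken from `web4.txt`. It assumes that each source page divides its damped rank equally among its outgoing links.

```python
def incoming_rank(sources, damping_factor=0.85):
    """Damped rank a page receives from (rank, out_degree) pairs of its in-neighbours."""
    return damping_factor * sum(rank / out_degree for rank, out_degree in sources)


# Hypothetical numbers, not taken from web4.txt: both pages have four in-links.
strong_sources = [(0.15, 2), (0.15, 2), (0.15, 2), (0.15, 2)]
mixed_sources = [(0.15, 2), (0.15, 2), (0.06, 2), (0.06, 2)]

received_strong = incoming_rank(strong_sources)
received_mixed = incoming_rank(mixed_sources)
print(f"four well-ranked sources: {received_strong:.4f}")
print(f"two of them low-ranked:   {received_mixed:.4f}")
```

Both pages have the same number of incoming links, but the one whose sources carry more rank receives more, which is the pattern described for pages 1 and 5.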

#### Example 5
<img src="examples/web5.png" alt="Drawing" style="width: 400px;"/>

In [14]:
# Example 5
W5 = DirectedGraph.importDGFromFile("examples/web5.txt")
labels = [2076, 2564, 4785, 5016, 5793, 6338, 6395, 9484, 9994]

ranks = PageRank(W5, max_iter=100, damping_factor=0.85)[0]
print_ranks(ranks, labels=labels)

Page 2076 has rank 0.08333626972816052
Page 2564 has rank 0.09220610223700115
Page 4785 has rank 0.09220610223700115
Page 5016 has rank 0.09220610223700115
Page 5793 has rank 0.11836683713520596
Page 6338 has rank 0.09220610223700115
Page 6395 has rank 0.11836683713520596
Page 9484 has rank 0.19273880991821696
Page 9994 has rank 0.11836683713520596


Page 9484 has the highest PageRank because it receives a large share of rank from the pages that link to it and has no outgoing links through which to pass that rank on.

From this observation, we could guess that pages with many incoming links and no outgoing links tend to have a higher PageRank.