# Graph Learning
## Lab 2: PageRank

In this lab, you will learn to compute, use and interpret various [PageRank](https://en.wikipedia.org/wiki/PageRank) scores.

<div style="border: 1px solid white; padding: 10px; display: inline-block; max-width: 98%; box-sizing: border-box; word-wrap: break-word;">
<b>Done in pairs:</b>
<br>
Rafaela de Carvalho Machado Pinheiro
<br>
Bárbara Barsi Duarte Batista da Silva
</div>

## Import

In [None]:
from IPython.display import SVG

In [None]:
import numpy as np
from scipy import sparse
import matplotlib.pyplot as plt

In [None]:
from sknetwork.data import load_netset, linear_graph, miserables
from sknetwork.linalg import normalize
from sknetwork.ranking import PageRank
from sknetwork.visualization import visualize_graph

## Data

We will work on the following graphs (see the [NetSet](https://netset.telecom-paris.fr/) collection for details):
* Openflights (graph)
* WikiVitals (directed graph)
* Cinema (bipartite graph)

In [None]:
openflights = load_netset('openflights')
wikivitals = load_netset('wikivitals')
cinema = load_netset('cinema')

## 1. Graphs

The PageRank corresponds to the stationary distribution of a random walk with restart probability $1-\alpha$. Unless otherwise specified, we take the default value $\alpha = 0.85$ and the restart probability distribution is uniform over the set of nodes.
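
As a minimal sketch of this definition, the scores can be approximated by power iteration on a small dense matrix. This is an illustration only, not the solver used by scikit-network; the function name `pagerank_power_iteration` and the parameter `n_iter` are assumptions made for this example.

```python
import numpy as np

def pagerank_power_iteration(adjacency, alpha=0.85, n_iter=200):
    """Approximate PageRank with uniform restart by power iteration (dense sketch)."""
    adjacency = np.asarray(adjacency, dtype=float)
    n = adjacency.shape[0]
    out_weights = adjacency.sum(axis=1)
    # Row-normalize to get the transition matrix P; rows of sink nodes stay at zero.
    norm = np.where(out_weights > 0, out_weights, 1.0)
    transition = adjacency / norm[:, None]
    restart = np.ones(n) / n
    scores = restart.copy()
    for _ in range(n_iter):
        # Follow a link with probability alpha, restart with probability 1 - alpha.
        scores = alpha * scores @ transition + (1 - alpha) * restart
        scores /= scores.sum()  # renormalize in case some mass was lost in sinks
    return scores

# Undirected path on 5 nodes: 0 - 1 - 2 - 3 - 4
path = np.diag(np.ones(4), k=1)
path = path + path.T
scores = pagerank_power_iteration(path)
print(scores.round(4))
```

On this small path, the neighbors of the two end nodes get the highest scores, which is the same pattern studied on the linear graph below.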

## Linear graph

Consider a linear graph:

In [None]:
n = 10

In [None]:
dataset = linear_graph(n, True)
adjacency = dataset.adjacency
position = dataset.position

In [None]:
image = visualize_graph(adjacency, position, names=np.arange(n))
SVG(image)

## To do

* What are the two best ranked nodes? Try with different values of $\alpha$ and interpret the results.
* What is the exact PageRank vector when $\alpha=1$ (no restarts)? Justify your answer.

In [None]:
alpha = 0.85

pagerank = PageRank(damping_factor=alpha, solver='lanczos')

In [None]:
scores = pagerank.fit_predict(adjacency)

In [None]:
def scores_test(adjacency, final_alpha=0.9, alpha_step=0.2):
    """Compute PageRank scores for damping factors 0, alpha_step, ... below final_alpha."""
    scores_array = []
    alphas = np.arange(0.0, final_alpha, alpha_step)

    for alpha in alphas:
        pagerank = PageRank(damping_factor=alpha, solver='lanczos')
        scores_array.append(pagerank.fit_predict(adjacency))

    return scores_array, alphas

In [None]:
def plot_scores(scores_array, alphas):
    scores = np.array(scores_array)
    num_groups = len(scores)
    num_nodes = scores.shape[1]

    bar_width = 0.15
    index = np.arange(num_nodes)

    fig, ax = plt.subplots(figsize=(12, 6))

    for i in range(num_groups):
        offset = i * bar_width
        ax.bar(index + offset, scores[i], bar_width, label=f'α = {alphas[i]:.2f}')

    ax.set_xlabel('Nodes')
    ax.set_xticks(index + bar_width * (num_groups - 1) / 2) 
    ax.set_xticklabels([f'{i}' for i in range(num_nodes)]) 
    ax.set_ylabel('Scores (PageRank)')
    ax.set_title('Node scores for different values of α')
    ax.legend()
    ax.grid(True, axis='y', linestyle='--', alpha=0.6)

    plt.tight_layout()
    plt.show()

In [None]:
image = visualize_graph(adjacency, position, names=np.arange(n), scores=scores)
SVG(image)

In [None]:
scores_array, alphas = scores_test(adjacency)
plot_scores(scores_array, alphas)


<div style="border: 1px solid white; padding: 10px; display: inline-block; max-width: 98%; box-sizing: border-box; word-wrap: break-word;">

The two best ranked nodes are 1 and 8. They are the only neighbors of the end nodes 0 and 9, so the walk always moves from an end node to them, whereas every other interior node receives only half of the score of each of its neighbors. The bar chart above shows the score of each node for values of α from 0 to 0.8.

With α = 0, the walk restarts at every step and all scores are equal to 1/n. As α increases, the walk follows the links more often: the end nodes 0 and 9 lose score and nodes 1 and 8 gain the most. As α approaches 1, the scores tend to a distribution proportional to the degrees, so the gap between nodes 1 and 8 and the other interior nodes vanishes.
  
</div>

In [None]:
# PageRank only accepts a damping factor in [0, 1), so we use a value close to 1 (here 0.9999)
# to approximate a random walk that never restarts at a random node.

alpha = 0.9999
pagerank = PageRank(damping_factor=alpha, solver='lanczos')
scores = pagerank.fit_predict(adjacency)
print(scores)


<div style="border: 1px solid white; padding: 10px; display: inline-block; max-width: 98%; box-sizing: border-box; word-wrap: break-word; ">
    When the damping factor $\alpha$ is equal to 1, the equation
    $\pi^{(\alpha)} = \alpha \pi^{(\alpha)} P + (1 - \alpha) \frac{1^T}{n}$
    reduces to the standard left eigenvector equation
    $\pi = \pi P$.
    <br><br>
    This means that the PageRank vector $\pi$ is exactly the stationary distribution of the transition matrix $P$, that is, the left eigenvector associated with eigenvalue 1. There are no random jumps or restarts in the Markov process: the walker simply follows the existing links indefinitely, so the scores reflect only the structure of the underlying graph.
    <br><br>
    For an undirected graph, this stationary distribution is proportional to the degrees, $\pi_i = d_i / \sum_j d_j$. Here the two end nodes have degree 1 and the eight interior nodes have degree 2, so the degrees sum to 18 and the exact vector is $\pi = (\frac{1}{18}, \frac{1}{9}, \ldots, \frac{1}{9}, \frac{1}{18})$: interior nodes get more weight than the border ones. The vector computed above with $\alpha = 0.9999$,
    <br><br>
    $$\pi \approx (0.0555563,\ 0.1111117,\ 0.11111104,\ 0.11111059,\ 0.11111037,\ 0.11111037,\ 0.11111059,\ 0.11111104,\ 0.1111117,\ 0.0555563)^T$$
    <br><br>
    agrees with this left eigenvector of $P$, up to the small effect of the remaining restart probability.
  </div>
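
The stationarity of the degree-proportional vector can be checked numerically. The sketch below rebuilds the path graph with plain NumPy (an assumption made to keep the example self-contained; it does not use `linear_graph`) and verifies that $\pi P = \pi$.

```python
import numpy as np

n = 10
# Adjacency matrix of the undirected path 0 - 1 - ... - 9
adjacency = np.diag(np.ones(n - 1), k=1)
adjacency = adjacency + adjacency.T

degrees = adjacency.sum(axis=1)
transition = adjacency / degrees[:, None]  # P = D^{-1} A

# Candidate stationary distribution: proportional to the degrees
pi = degrees / degrees.sum()

print(pi)                                # 1/18 at both ends, 1/9 elsewhere
print(np.allclose(pi @ transition, pi))  # checks that pi is a left eigenvector of P
```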

## Les Misérables


In [None]:
datasetLM = miserables(True)

In [None]:
adjacencyLM = datasetLM.adjacency
positionLM = datasetLM.position
namesLM = datasetLM.names

In [None]:
image = visualize_graph(adjacencyLM, positionLM, namesLM, scale=2)
SVG(image)

## To do

* Display the graph of Les Misérables with PageRank scores.
* List the 10 best ranked characters.
* Compare with:
    1. the 10 nodes of highest degrees
    2. the 10 nodes of highest weights
* Try different values of $\alpha$ and interpret the results.

In [None]:
pagerankLM = PageRank(damping_factor=0.85)

In [None]:
scoresLM = pagerankLM.fit_predict(adjacencyLM)

In [None]:
# scores in log scale appear more clearly
imageLM = visualize_graph(adjacencyLM, positionLM, namesLM, scores=np.log(scoresLM), scale=2)
SVG(imageLM)

In [None]:
from sknetwork.utils import get_degrees, get_weights

In [None]:
# Top 10 best ranked nodes
ranked_indices = np.argsort(scoresLM)[-10:][::-1]

for rank, i in enumerate(ranked_indices, 1):
    print(f"{rank}. {datasetLM.names[i]} (score: {scoresLM[i]:.4f})")

In [None]:
# Top 10 weights
weights = get_weights(adjacencyLM)
weights_indices = np.argsort(weights)[-10:][::-1] 
for rank, i in enumerate(weights_indices, 1):
    print(f"{rank}. {datasetLM.names[i]} (weight: {weights[i]:.2f})")

In [None]:
# Top 10 degrees
degrees = get_degrees(adjacencyLM)

degrees_indices = np.argsort(degrees)[-10:][::-1]

for rank, i in enumerate(degrees_indices, 1):
    print(f"{rank}. {datasetLM.names[i]} (degree: {degrees[i]:.2f})")

### Compare lists
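
One simple way to compare the top-10 lists is to use set operations on the index arrays. The sketch below uses hypothetical toy values (not the Les Misérables results), and the helper `top_k` is a name introduced for this example.

```python
import numpy as np

def top_k(values, k):
    """Indices of the k largest values, in decreasing order."""
    return np.argsort(-np.asarray(values))[:k]

# Hypothetical toy values for 6 nodes
toy_scores = np.array([0.30, 0.05, 0.20, 0.10, 0.25, 0.10])
toy_degrees = np.array([4, 1, 5, 3, 2, 2])

top_scores = set(top_k(toy_scores, 3).tolist())
top_degrees = set(top_k(toy_degrees, 3).tolist())

print("in both lists:", sorted(top_scores & top_degrees))
print("only in the PageRank top:", sorted(top_scores - top_degrees))
print("only in the degree top:", sorted(top_degrees - top_scores))
```

The same pattern applies to `ranked_indices`, `weights_indices` and `degrees_indices` computed above.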



In [None]:
# Tests for different values of alpha
scores_array, alphas = scores_test(adjacencyLM)

for alpha, scores_alpha in zip(alphas, scores_array):
    print(f"\nAlpha = {alpha:.2f}")
    print("\n Top 5 best ranked nodes")

    # Top 5 best ranked nodes for this value of alpha
    ranked_indices = np.argsort(scores_alpha)[-5:][::-1]
    for rank, i in enumerate(ranked_indices, 1):
        print(f"{rank}. {datasetLM.names[i]} (score: {scores_alpha[i]:.4f})")

    print("="*30)

# Weights and degrees do not depend on alpha, so they are listed once
print("\n Top 5 weights")
weights = get_weights(adjacencyLM)
weights_indices = np.argsort(weights)[-5:][::-1]
for rank, i in enumerate(weights_indices, 1):
    print(f"{rank}. {datasetLM.names[i]} (weight: {weights[i]:.2f})")

print("\n Top 5 degrees")
degrees = get_degrees(adjacencyLM)
degrees_indices = np.argsort(degrees)[-5:][::-1]
for rank, i in enumerate(degrees_indices, 1):
    print(f"{rank}. {datasetLM.names[i]} (degree: {degrees[i]:.2f})")

## Openflights


In [None]:
datasetOF = openflights

In [None]:
adjacencyOF = datasetOF.adjacency
positionOF = datasetOF.position
namesOF = datasetOF.names

In [None]:
# hide the edges for better visualization
image = visualize_graph(adjacencyOF, positionOF, width=800, height=400, display_node_weight=True, display_edges=False)
SVG(image)

## To do

* Display the same world map with PageRank scores (in log scale).
* List the 10 best ranked airports, and compare with the 10 airports of highest traffic.
* Display the world map with Personalized PageRank scores, starting from Tokyo international airport.
* List the corresponding 10 best ranked airports.

In [None]:
pagerankOF = PageRank()

In [None]:
scoresOF = pagerankOF.fit_predict(adjacencyOF)

In [None]:
image = visualize_graph(adjacencyOF, positionOF, scores=np.log(scoresOF), node_order=np.argsort(scoresOF), 
                  width=800, height=400, display_node_weight=True, display_edges=False)
SVG(image)

In [None]:
# Top 10 best ranked nodes
ranked_indices = np.argsort(scoresOF)[-10:][::-1]
for rank, i in enumerate(ranked_indices, 1):
    print(f"{rank}. {datasetOF.names[i]} (score: {scoresOF[i]:.4f})")

In [None]:
# Top 10 airports by traffic
weights = get_weights(adjacencyOF)
weights_indices = np.argsort(weights)[-10:][::-1]
for rank, i in enumerate(weights_indices, 1):
    print(f"{rank}. {datasetOF.names[i]} (weight: {weights[i]:.2f})")

<div style="border: 1px solid white; padding: 10px; display: inline-block; max-width: 98%; box-sizing: border-box; word-wrap: break-word;">
Although 5 airports appear in both lists, being well ranked and having high traffic do not mean the same thing. The best ranked airports are the most important for the connectivity of the graph, as they receive flights from and send flights to other important airports. The high-traffic (high-weight) airports, in contrast, are high-volume hubs, regardless of how important their connections are.
</div>

In [None]:
# find Tokyo airport index
tokyo_index = [i for i, name in enumerate(namesOF) if 'tokyo' in name.lower()]
print(f"Index: {tokyo_index}, Name: {namesOF[tokyo_index]}")


In [None]:
# use the last matching airport as the source of the random walk
tokyo_node = int(tokyo_index[-1])
perso_scores = pagerankOF.fit_predict(adjacencyOF, {tokyo_node: 1})

In [None]:
image = visualize_graph(adjacencyOF, positionOF, scores=np.log(perso_scores), node_order=np.argsort(perso_scores), 
                  width=800, height=400, display_node_weight=True, display_edges=False)
SVG(image)

In [None]:
# Top 10 best ranked nodes with personalized scores
# (11 entries are listed because the source airport itself is expected to rank first)
ranked_indices = np.argsort(perso_scores)[-11:][::-1]
for rank, i in enumerate(ranked_indices, 1):
    print(f"{rank}. {datasetOF.names[i]} (score: {perso_scores[i]:.4f})")

## 2. Directed graphs

## Wikipedia Vitals

In [None]:
datasetWV = wikivitals

In [None]:
adjacencyWV = datasetWV.adjacency
namesWV = datasetWV.names

## To do

* List the 10 best ranked articles of Wikipedia Vitals.
* Compare with the 10 nodes of highest out-degrees and the 10 nodes of highest in-degrees. Interpret the results.
* Which article of Wikipedia Vitals is in the top-20 of PageRank but not in the top-20 of in-degrees?

In [None]:
# Top 10 best ranked nodes
pagerankWV = PageRank()
scoresWV = pagerankWV.fit_predict(adjacencyWV)

ranked_indices = np.argsort(scoresWV)[-10:][::-1]
for rank, i in enumerate(ranked_indices, 1):
    print(f"{rank}. {datasetWV.names[i]} (score: {scoresWV[i]:.4f})")

In [None]:
# The out-degree of node i is the i-th entry of A 1 (row sums), and its in-degree is the i-th entry of A^T 1 (column sums).
# Both are given by get_degrees from sknetwork.utils, with transpose=True for the in-degrees.

out_degrees = get_degrees(adjacencyWV)
in_degrees = get_degrees(adjacencyWV, transpose=True)

out_degrees_indices = np.argsort(out_degrees)[-10:][::-1]
print("\nTop 10 nodes of highest out-degree")
for rank, i in enumerate(out_degrees_indices, 1):
    print(f"{rank}. {datasetWV.names[i]} (out-degree: {out_degrees[i]:.2f})")

print("\nTop 10 nodes of highest in-degree")
in_degrees_indices = np.argsort(in_degrees)[-10:][::-1]
for rank, i in enumerate(in_degrees_indices, 1):
    print(f"{rank}. {datasetWV.names[i]} (in-degree: {in_degrees[i]:.2f})")

<div style="border: 1px solid white; padding: 10px; display: inline-block; max-width: 98%; box-sizing: border-box; word-wrap: break-word;">
Nodes with high out-degrees are pages that contain many hyperlinks to other Wikipedia pages. Their content tends to be general rather than specific to one topic, which leads to more references to other articles. On the other hand, nodes with high in-degrees are pages that many other pages link to: they are mentioned in other articles because of their importance or relevance in diverse contexts.
</div>
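
To make the two notions concrete, the sketch below computes out-degrees and in-degrees as row and column sums of a small adjacency matrix. The 4-node graph is hypothetical and only serves as an illustration.

```python
import numpy as np

# Hypothetical directed graph on 4 nodes: adjacency[i, j] = 1 if i links to j
adjacency = np.array([
    [0, 1, 1, 1],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 0],
])

ones = np.ones(adjacency.shape[0], dtype=int)
out_degrees = adjacency @ ones    # row sums: links leaving each node
in_degrees = adjacency.T @ ones   # column sums: links pointing to each node

print("out-degrees:", out_degrees)
print("in-degrees:", in_degrees)
```

Here node 0 plays the role of a general page with many outgoing links, while node 2 is the page that most others link to.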

In [None]:
# Top 20 best ranked articles
ranked_indices = np.argsort(scoresWV)[-20:][::-1]
scores_list = [datasetWV.names[i] for i in ranked_indices]

# Top 20 articles of highest in-degree
in_degrees_indices = np.argsort(in_degrees)[-20:][::-1]
in_degrees_list = [datasetWV.names[i] for i in in_degrees_indices]

print("Pages in the top 20 of PageRank but not in the top 20 of in-degrees")
print(set(scores_list) - set(in_degrees_list)) 

## To do

* List the 20 closest articles to **Picasso** in Wikipedia Vitals. Who is the best ranked painter other than Picasso?
* List the 20 closest articles to both **Cat** and **Dog** in Wikipedia Vitals.
* In both cases, do the same using the difference between the Personalized PageRank score and the PageRank score. Interpret the results.

In [None]:
# Find Picasso index
picasso_index = [i for i, name in enumerate(namesWV) if 'pablo picasso' in name.lower()]
print(f"Index: {picasso_index}, Name: {namesWV[picasso_index]}")

pp_perso_scores = pagerankWV.fit_predict(adjacencyWV, {picasso_index[0]: 1})
ranked_indices_pp_perso = np.argsort(pp_perso_scores)[-20:][::-1]

print("\nTop 20 closest articles to Pablo Picasso")
for rank, i in enumerate(ranked_indices_pp_perso, 1):
    print(f"{rank}. {datasetWV.names[i]} (score: {pp_perso_scores[i]:.4f})")


In [None]:
dog_index = [i for i, name in enumerate(namesWV) if 'dog' == name.lower()]
print(f"Index: {dog_index}, Name: {namesWV[dog_index]}")

dog_perso_scores = pagerankWV.fit_predict(adjacencyWV, {dog_index[0]: 1})
ranked_indices_dog = np.argsort(dog_perso_scores)[-20:][::-1]

print("\nTop 20 closest articles to dog\n")
for rank, i in enumerate(ranked_indices_dog, 1):
    print(f"{rank}. {datasetWV.names[i]} (score: {dog_perso_scores[i]:.4f})")


In [None]:
# Top 20 scores for cat and dog

dog_index = [i for i, name in enumerate(namesWV) if 'dog' == name.lower()]
cat_index = [i for i, name in enumerate(namesWV) if 'cat' == name.lower()]
print(f"Index: {dog_index}, Name: {namesWV[dog_index]}")
print(f"Index: {cat_index}, Name: {namesWV[cat_index]}")

cd_perso_scores = pagerankWV.fit_predict(adjacencyWV, {dog_index[0]: 1, cat_index[0]: 1})
ranked_indices_cd = np.argsort(cd_perso_scores)[-20:][::-1]

print("\n Top 20 closest articles to cat and dog\n")
for rank, i in enumerate(ranked_indices_cd, 1):
    print(f"{rank}. {datasetWV.names[i]} (score: {cd_perso_scores[i]:.4f})")

In [None]:
scores_dif_pp = pp_perso_scores - scoresWV
ranked_indices_pp_dif = np.argsort(scores_dif_pp)[-20:][::-1]

print("Top 20 closest articles to Pablo Picasso (using difference between personalized and standard PageRank)\n")

for rank, i in enumerate(ranked_indices_pp_dif, 1):
    print(f"{rank}. {datasetWV.names[i]} (score: {scores_dif_pp[i]:.4f})")

In [None]:
scores_dif_cd = cd_perso_scores - scoresWV
ranked_indices_cd_dif =  np.argsort(scores_dif_cd)[-20:][::-1]

print("Top 20 closest articles to both cat and dog (using difference between personalized and standard pagerank)\n")
for rank, i in enumerate(ranked_indices_cd_dif, 1):
    print(f"{rank}. {datasetWV.names[i]} (score: {scores_dif_cd[i]:.4f})")

<div style="border: 1px solid white; padding: 10px; display: inline-block; max-width: 98%; box-sizing: border-box; word-wrap: break-word;">
Restricting the restart (teleportation) vector to a few nodes biases the ranking toward the articles related to these topics. However, globally high-ranked nodes can still score well in Personalized PageRank even when they are not in the restart vector: because many articles link to them, the walk reaches them from almost anywhere.

To remove this bias, we subtract the standard PageRank scores from the Personalized PageRank scores, which gives a clearer view of the articles that are most specifically related to the topics of interest.
</div>
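
The effect of the subtraction can be illustrated on a toy graph. The sketch below is a dense power iteration written for this example (the function `personalized_pagerank` and the 5-node graph are assumptions, not scikit-network code). Node 0 is a hub that every node links to, and the walk restarts from node 4.

```python
import numpy as np

def personalized_pagerank(adjacency, restart, alpha=0.85, n_iter=200):
    """Approximate PageRank with an arbitrary restart distribution (dense sketch)."""
    adjacency = np.asarray(adjacency, dtype=float)
    out_weights = adjacency.sum(axis=1)
    norm = np.where(out_weights > 0, out_weights, 1.0)
    transition = adjacency / norm[:, None]
    restart = np.asarray(restart, dtype=float)
    restart = restart / restart.sum()
    scores = restart.copy()
    for _ in range(n_iter):
        scores = alpha * scores @ transition + (1 - alpha) * restart
        scores /= scores.sum()
    return scores

# Hypothetical directed graph: node 0 is a hub linked to by every other node,
# nodes 1-2 and nodes 3-4 form two pairs that link to each other.
adjacency = np.array([
    [0, 1, 1, 1, 1],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [1, 0, 0, 0, 1],
    [1, 0, 0, 1, 0],
])

global_scores = personalized_pagerank(adjacency, restart=np.ones(5))
perso_scores = personalized_pagerank(adjacency, restart=[0, 0, 0, 0, 1])
difference = perso_scores - global_scores

print("personalized ranking:", np.argsort(-perso_scores))
print("difference ranking:  ", np.argsort(-difference))
```

In this toy graph the hub keeps a very high personalized score, while the difference ranking puts the source node 4 and its partner node 3 ahead of it.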

## To do

* List 5 representative articles of each category.

In [None]:
# there are 11 categories
labels = datasetWV.labels
names_labels = datasetWV.names_labels

In [None]:
print(names_labels)

<div style="border: 1px solid white; padding: 10px; display: inline-block; max-width: 98%; box-sizing: border-box; word-wrap: break-word;">
To identify representative articles of each category, we compute a Personalized PageRank whose restart distribution (weights) is concentrated on the nodes of that category.

The resulting scores reflect the importance of nodes in the whole graph, but with a focus on the target category. After computing the scores, we keep only the nodes that carry the label of the category, so that the top-ranked articles actually belong to it.
</div>
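
The two steps described above (restart on one category, then filter by label) can be sketched as follows. The labels and scores are hypothetical toy values standing in for the real data, and `top_in_category` is a helper name introduced for this example.

```python
import numpy as np

def top_in_category(scores, labels, category, k=5):
    """Indices of the k best scored nodes among those labeled `category`."""
    members = np.flatnonzero(labels == category)
    order = np.argsort(-scores[members])
    return members[order[:k]]

# Hypothetical labels for 8 nodes (two categories, 0 and 1)
toy_labels = np.array([0, 0, 1, 1, 0, 1, 0, 1])

# Step 1: restart distribution, uniform over the members of category 0
restart = (toy_labels == 0).astype(float)
restart /= restart.sum()

# Step 2: filter the ranking by label
# (toy scores standing in for the output of a Personalized PageRank)
toy_scores = np.array([0.05, 0.20, 0.10, 0.15, 0.25, 0.05, 0.12, 0.08])
top_category_0 = top_in_category(toy_scores, toy_labels, category=0, k=2)
top_category_1 = top_in_category(toy_scores, toy_labels, category=1, k=2)

print(restart)
print(top_category_0, top_category_1)
```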

In [None]:
labels_dict = {l: [] for l in names_labels}

for i, label in enumerate(names_labels):
    # restart distribution restricted to the articles of category i
    selected_pages = {p: 1 for p in np.where(labels == i)[0]}
    label_scores = pagerankWV.fit_predict(adjacencyWV, weights=selected_pages)

    # scan articles by decreasing score and keep the first 5 of the category
    ranked_indices_label = np.argsort(-label_scores)
    for p in ranked_indices_label:
        if p in selected_pages:
            labels_dict[label].append(namesWV[p])
        if len(labels_dict[label]) >= 5:
            break


print("\nTop 5 articles for each category:\n")
for label, pages in labels_dict.items():
    print(f"{label}: {', '.join(pages)}")
    print()

## 3. Bipartite graphs

## Cinema

In [None]:
datasetC = cinema

In [None]:
biadjacency = datasetC.biadjacency
movies = datasetC.names_row
actors = datasetC.names_col

## To do

List the 5 closest actors and the 5 closest movies to **Catherine Deneuve**.

<div style="border: 1px solid white; padding: 10px; display: inline-block; max-width: 98%; box-sizing: border-box; word-wrap: break-word;">
To do this, we can use a personalized PageRank again, to start from Catherine Deneuve and then find the 5 closest movies and actors.
</div>

In [None]:
# Index for Catherine Deneuve
index = [i for i, name in enumerate(actors) if 'catherine deneuve' in name.lower()]
catherine_index = int(index[0])
print(f"Index: {catherine_index}, Name: {actors[catherine_index]}")

In [None]:
# dedicated PageRank with the default damping factor (0.85)
pagerankC = PageRank()
scoresC = pagerankC.fit_predict(biadjacency, weights_col={catherine_index: 1})

In [None]:
scores_actors = pagerankC.scores_col_
scores_movies = pagerankC.scores_row_ # same as scoresC

print("5 closest movies:")
ranked_indices = np.argsort(scores_movies)[-5:][::-1]
for rank, i in enumerate(ranked_indices, 1):
    print(f"{rank}. {movies[i]} (score: {scores_movies[i]:.4f})")

# 6 entries are listed because Catherine Deneuve herself is expected to rank first
print("\n5 closest actors:")
ranked_indices = np.argsort(scores_actors)[-6:][::-1]
for rank, i in enumerate(ranked_indices, 1):
    print(f"{rank}. {actors[i]} (score: {scores_actors[i]:.4f})")

## 4. Directed graphs as bipartite graphs

Directed graphs can be represented as bipartite graphs by duplicating each node, one as source of edges and the other as destination of edges. The biadjacency matrix of the bipartite graph is simply the adjacency matrix of the directed graph. 

The PageRank scores obtained with the bipartite graph differ from those obtained with the directed graph: they correspond to the **forward-backward** random walk in the directed graph, edges being alternately followed in forward and backward directions.
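
The forward-backward walk can be made explicit on a small example. The sketch below uses a hypothetical 3-node directed graph and plain NumPy: it builds the bipartite adjacency matrix from the directed adjacency matrix and checks that two steps of the walk on the bipartite graph correspond to one forward step followed by one backward step.

```python
import numpy as np

# Hypothetical directed graph on 3 nodes: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0
adjacency = np.array([
    [0, 1, 1],
    [0, 0, 1],
    [1, 0, 0],
], dtype=float)
n = adjacency.shape[0]

# Bipartite view: sources are nodes 0..n-1, destinations are nodes n..2n-1,
# and the biadjacency matrix is the adjacency matrix of the directed graph.
bipartite = np.block([
    [np.zeros((n, n)), adjacency],
    [adjacency.T, np.zeros((n, n))],
])

forward = adjacency / adjacency.sum(axis=1)[:, None]       # source -> destination
backward = adjacency.T / adjacency.T.sum(axis=1)[:, None]  # destination -> source
forward_backward = forward @ backward                      # source -> source in two steps

# Two steps of the random walk on the bipartite graph, restricted to sources
walk = bipartite / bipartite.sum(axis=1)[:, None]
two_steps = (walk @ walk)[:n, :n]

print(forward_backward)
print(np.allclose(two_steps, forward_backward))
```

The example assumes that every node has at least one outgoing and one incoming link, so that both normalizations are well defined.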

## Wikipedia Vitals

In [None]:
datasetWV_bi = wikivitals

In [None]:
adjacencyWV_bi = datasetWV_bi.adjacency
namesWV_bi = datasetWV_bi.names

## To do

Do the same experiments as above and compare both rankings:
* List the 10 best ranked articles of Wikipedia Vitals. 
* List the 20 closest articles to **Picasso** in Wikipedia Vitals. 
* List the 20 closest articles to both **Cat** and **Dog** in Wikipedia Vitals.

In [None]:
# PageRank with the default damping factor, on the graph seen as bipartite
scoresWV_bi = pagerankWV.fit_predict(adjacencyWV_bi, force_bipartite=True)
ranked_indices_bi = np.argsort(scoresWV_bi)[-10:][::-1]

for rank, i in enumerate(ranked_indices_bi, 1):
    print(f"{rank}. {namesWV_bi[i]} (score: {scoresWV_bi[i]:.4f})")


In [None]:
# Find Picasso index

picasso_index = [i for i, name in enumerate(namesWV_bi) if 'pablo picasso' in name.lower()]
print(f"Index: {picasso_index}, Name: {namesWV_bi[picasso_index]}")

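# restart from Picasso on the source side, with the graph seen as bipartite
pp_scores_bi = pagerankWV.fit_predict(adjacencyWV_bi, weights_row={picasso_index[0]: 1}, force_bipartite=True)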
ranked_indices_pp_bi = np.argsort(pp_scores_bi)[-20:][::-1]

print("\nTop 20 closest articles to Pablo Picasso")
for rank, i in enumerate(ranked_indices_pp_bi, 1):
    print(f"{rank}. {datasetWV_bi.names[i]} (score: {pp_scores_bi[i]:.4f})")


In [None]:
# Top 20 scores for cat and dog

dog_index = [i for i, name in enumerate(namesWV_bi) if 'dog' == name.lower()]
cat_index = [i for i, name in enumerate(namesWV_bi) if 'cat' == name.lower()]

print(f"Index: {dog_index}, Name: {namesWV_bi[dog_index]}")
print(f"Index: {cat_index}, Name: {namesWV_bi[cat_index]}")

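# restart from Cat and Dog on the source side, with the graph seen as bipartite
cd_scores_bi = pagerankWV.fit_predict(adjacencyWV_bi, weights_row={dog_index[0]: 1, cat_index[0]: 1}, force_bipartite=True)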
ranked_indices_cd_bi = np.argsort(cd_scores_bi)[-20:][::-1]

print("\n Top 20 closest articles to cat and dog\n")
for rank, i in enumerate(ranked_indices_cd_bi, 1):
    print(f"{rank}. {datasetWV_bi.names[i]} (score: {cd_scores_bi[i]:.4f})")

<div style="border: 1px solid white; padding: 10px; display: inline-block; max-width: 98%; box-sizing: border-box; word-wrap: break-word;">
When comparing the results of standard or personalized PageRank with those obtained by forcing the graph to be treated as bipartite, we observe a fundamental shift in the nature of the ranking. On the original graph, PageRank uses the full connectivity of the network, and influence propagates freely across all nodes.
<br>
When we force the bipartite structure, the influence of each node must pass through the opposite set, with links followed alternately forward and backward. As a result, the nodes that emerge as important tend to be those that act as strong connectors between the two sets, rather than those that are globally influential. This highlights nodes that are structurally critical to the exchange between the two sets, and it mainly changes the general ranking of the Wikipedia Vitals dataset.
</div>