# SD212: Graph mining
## Lab 4: PageRank

In this lab, you will learn how to compute, use and interpret various PageRank metrics.

## Import

In [2]:
import networkx as nx

The documentation is available [here](https://networkx.readthedocs.io/en/stable/).

In [3]:
import numpy as np

In [4]:
from scipy import sparse as sp

In [5]:
import warnings
warnings.filterwarnings('ignore')

In [6]:
import matplotlib.pyplot as plt

**Note:** The following magic command provides interactive plots (e.g., for zooming in/out).<br>
Any new figure must be declared with the command ```plt.figure()```.

In [7]:
# %matplotlib inline
%matplotlib notebook

## Data

You will need the following datasets (the same as in previous labs, no need to download them again):
* [Les Misérables](http://perso.telecom-paristech.fr/~bonald/graphs/miserables.graphml.gz)<br>  Graph connecting the characters of the [novel by Victor Hugo](https://fr.wikisource.org/wiki/Les_Misérables) when they appear in the same chapter. The graph is undirected and weighted. Weights correspond to the number of chapters in which characters appear together.
* [Openflights](http://perso.telecom-paristech.fr/~bonald/graphs/openflights.graphml.gz)<br>
Graph of the main international flights. Nodes are airports. The graph is undirected (all flights are bidirectional). Weights correspond to the number of daily flights between airports. Extracted from [Openflights](http://openflights.org).
* [Wikipedia for schools](http://perso.telecom-paristech.fr/~bonald/graphs/wikipedia_schools.graphml.gz)<br> Graph of the hyperlinks between a subset of the pages of the English Wikipedia. The graph is directed and unweighted.
More information [here](https://en.wikipedia.org/wiki/Wikipedia:Wikipedia_for_Schools)
* [Actors-Movies](http://perso.telecom-paristech.fr/~bonald/graphs/actors_movies.graphml.gz)<br>
Graph between actors and movies (a link exists if the actor stars in the movie). The graph is bipartite (see the field 'type' of each node). 
Extracted from [YaGo](http://yago-knowledge.org).



## 1. PageRank

The PageRank corresponds to the stationary distribution of a random walk with restart probability $1-\alpha$. Unless otherwise specified, we take the default value $\alpha = 0.85$ and the restart probability distribution uniform over the set of nodes.
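To make this definition concrete, here is a minimal sketch of the random walk with restart computed by power iteration in plain Python. The function name `pagerank_power_iteration` and the dictionary-of-successors representation are assumptions made for illustration (they are not part of `networkx`), and every node is assumed to appear as a key of the dictionary.

```python
def pagerank_power_iteration(successors, alpha=0.85, n_iter=100):
    """PageRank by power iteration; `successors` maps each node to its out-neighbors."""
    nodes = list(successors)
    n = len(nodes)
    pi = {u: 1 / n for u in nodes}
    for _ in range(n_iter):
        # Restart: with probability 1 - alpha, jump to a uniformly chosen node.
        new_pi = {u: (1 - alpha) / n for u in nodes}
        for u in nodes:
            # A node without successors (a sink) restarts uniformly.
            targets = successors[u] or nodes
            share = alpha * pi[u] / len(targets)
            for v in targets:
                new_pi[v] += share
        pi = new_pi
    return pi

# Directed 3-cycle: by symmetry, every node gets the same score.
cycle = {0: [1], 1: [2], 2: [0]}
scores = pagerank_power_iteration(cycle)
print(scores)
```

The scores always sum to 1, since each iteration redistributes the whole probability mass.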

## Toy graph

Consider the following graph:

In [8]:
toy_graph = nx.DiGraph()
edgelist = [(7,5),(5,3),(3,7),(4,5),(4,6),(7,4),(1,6),(9,8),(7,8),(6,1),(2,9),(8,2)]
toy_graph.add_edges_from(edgelist)

In [9]:
plt.figure()
nx.draw(toy_graph, with_labels = True)


## To do

* Guess the best ranked node of this graph.
* Check the PageRank of each node using the ```pagerank``` function of ```networkx``` and interpret the results.
* Add edges 2->5 and 6->3 and observe the impact on the top-3 nodes.


In [10]:
nx.pagerank(toy_graph)

{1: 0.1578013863403067,
 2: 0.15340048063270742,
 3: 0.060108035941243115,
 4: 0.03586490754882651,
 5: 0.05110749325856301,
 6: 0.16603807389373965,
 7: 0.06775849722262377,
 8: 0.16086300246509883,
 9: 0.1470581226968908}

In [11]:
edgelist = [(2,5),(6,3)]
toy_graph.add_edges_from(edgelist)
plt.figure()
nx.draw(toy_graph, with_labels = True)


In [12]:
nx.pagerank(toy_graph)

{1: 0.055177090077234855,
 2: 0.11949146088380552,
 3: 0.1754230695911956,
 4: 0.06363615072841398,
 5: 0.1414647348577573,
 6: 0.09061278609150875,
 7: 0.16577642307830376,
 8: 0.1209687215133555,
 9: 0.06744956317842467}

## Path graph

Now consider a path graph:

In [13]:
n = 10
path_graph = nx.path_graph(n)
pos = [(i,0) for i in range(n)]

In [14]:
plt.figure(figsize=(5,1))
nx.draw(path_graph,pos,with_labels = True)


## To do

* What are the two best ranked nodes? Try with different values of $\alpha$ and interpret the results.
* What is the exact PageRank vector when $\alpha=1$?

In [32]:
nx.pagerank?

In [15]:
nx.pagerank(path_graph)

{0: 0.06438475238742992,
 1: 0.1161980205543576,
 2: 0.10934610989479174,
 3: 0.10579068620906135,
 4: 0.10428043095435946,
 5: 0.10428043095435947,
 6: 0.10579068620906133,
 7: 0.10934610989479172,
 8: 0.1161980205543576,
 9: 0.06438475238742991}

In [16]:
nx.pagerank(path_graph, 1, max_iter = 100000, tol = 1e-15)

{0: 0.05555555555555509,
 1: 0.1111111111111118,
 2: 0.11111111111111038,
 3: 0.11111111111111144,
 4: 0.11111111111111088,
 5: 0.11111111111111088,
 6: 0.11111111111111144,
 7: 0.11111111111111038,
 8: 0.1111111111111118,
 9: 0.05555555555555509}
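With $\alpha = 1$ there is no restart, and the stationary distribution of a random walk on a connected undirected graph is proportional to the node degrees. The short check below, written in plain Python under that assumption, gives the same values as the output above: $1/18$ for the two end nodes and $2/18$ for the others.

```python
n = 10
# Degrees in a path graph: 1 at both ends, 2 elsewhere.
degrees = [1 if i in (0, n - 1) else 2 for i in range(n)]
total = sum(degrees)  # twice the number of edges
pi = [d / total for d in degrees]
print(pi[0], pi[1])
```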

## Les Misérables


In [17]:
miserables = nx.read_graphml("../miserables.graphml", node_type = int)

In [18]:
names = nx.get_node_attributes(miserables, 'name')
weights = nx.get_edge_attributes(miserables, 'weight')
width = [weights[e] / 2 for e in miserables.edges()]

In [19]:
plt.figure(figsize = (8,8))
nx.draw(miserables, labels = names, width = width, edge_color = 'grey')


## To do

* List the 10 best ranked nodes of Les Misérables.
* Compare with the 10 nodes of highest degrees and the 10 nodes of highest weights.
* Try different values of $\alpha$ and interpret the results.

In [20]:
name_func = lambda x: (miserables.nodes[x[0]]["name"], x[1])
list(map(name_func, sorted(list(nx.pagerank(miserables, 0.99).items()), key = lambda x: x[1], reverse = True)[:10]))

[('Valjean', 0.09783227844142484),
 ('Marius', 0.062411069926085576),
 ('Enjolras', 0.05312291480847467),
 ('Courfeyrac', 0.04885465832522683),
 ('Cosette', 0.0414131491796308),
 ('Combeferre', 0.039511939698859785),
 ('Bossuet', 0.038378862771582214),
 ('Thenardier', 0.03732178848532264),
 ('Gavroche', 0.03320438588218726),
 ('Fantine', 0.02871443229996302)]

In [21]:
list(map(name_func, sorted(list(miserables.degree), key = lambda x: x[1], reverse = True)[:10]))

[('Valjean', 36),
 ('Gavroche', 22),
 ('Marius', 19),
 ('Javert', 17),
 ('Thenardier', 16),
 ('Fantine', 15),
 ('Enjolras', 15),
 ('Courfeyrac', 13),
 ('Bossuet', 13),
 ('Bahorel', 12)]

In [22]:
def get_weights(adjacency):
    '''Return the weight of each node, i.e., the sum of the weights of its edges (row sums).'''
    n = adjacency.shape[0]
    return adjacency.dot(np.ones(n, dtype = int))

weights = get_weights(nx.to_scipy_sparse_matrix(miserables))
list(map(name_func, sorted(enumerate(weights), key = lambda x: x[1], reverse = True)[:10]))

[('Valjean', 158),
 ('Marius', 104),
 ('Enjolras', 91),
 ('Courfeyrac', 84),
 ('Cosette', 68),
 ('Combeferre', 68),
 ('Bossuet', 66),
 ('Thenardier', 61),
 ('Gavroche', 56),
 ('Fantine', 47)]
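The three rankings above repeat the same sort-and-slice pattern. It could be factored into a small helper; the name `top_k` and its parameters are hypothetical and only meant as a sketch.

```python
def top_k(scores, k=10, names=None):
    """Return the k highest-scoring (label, score) pairs, best first."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]
    if names is None:
        return ranked
    # Replace node identifiers by their names when a mapping is given.
    return [(names[u], score) for u, score in ranked]

# Toy data standing in for a PageRank dictionary and a name dictionary.
scores = {0: 0.2, 1: 0.5, 2: 0.3}
labels = {0: 'a', 1: 'b', 2: 'c'}
print(top_k(scores, k=2, names=labels))  # [('b', 0.5), ('c', 0.3)]
```

In the notebook, it could be called as `top_k(nx.pagerank(miserables), 10, names)` or `top_k(dict(miserables.degree), 10, names)`, with `names` the dictionary built earlier from the `'name'` attribute.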

## Openflights


In [23]:
openflights = nx.read_graphml("../openflights.graphml", node_type = int)

In [24]:
# Get positions
pos_x = nx.get_node_attributes(openflights,'pos_x')
pos_y = nx.get_node_attributes(openflights,'pos_y')
pos = {u: (pos_x[u], pos_y[u]) for u in openflights}

In [25]:
plt.figure(figsize=(8,4))
plt.axis('off')
nx.draw_networkx_nodes(openflights, pos, node_size = 10, node_color = 'b')
nx.draw_networkx_edges(openflights, pos, alpha = .1, edge_color = 'grey')
plt.show()


## To do

* List and display the 10 best ranked airports.
* Compare with the 10 airports of highest traffic.

In [26]:
name_func = lambda x: (openflights.nodes[x[0]]["name"], x[1])
list(map(name_func, sorted(list(nx.pagerank(openflights).items()), key = lambda x: x[1], reverse = True)[:10]))

[('Hartsfield Jackson Atlanta International Airport', 0.005145403593166541),
 ('Dallas Fort Worth International Airport', 0.00472740026056956),
 ('Amsterdam Airport Schiphol', 0.004457275429288883),
 ("Chicago O'Hare International Airport", 0.004342573162802293),
 ('Denver International Airport', 0.004295311108107729),
 ('Charles de Gaulle International Airport', 0.004158427799666545),
 ('Atatürk International Airport', 0.003977465015359743),
 ('OR Tambo International Airport', 0.0038079835293439797),
 ('Los Angeles International Airport', 0.0038031273482594784),
 ('Domodedovo International Airport', 0.00379643421251896)]

In [27]:
weights = get_weights(nx.to_scipy_sparse_matrix(openflights))
list(map(name_func, sorted(enumerate(weights), key = lambda x: x[1], reverse = True)[:10]))

[('Amsterdam Airport Schiphol', 492),
 ('Charles de Gaulle International Airport', 413),
 ('Dublin Airport', 408),
 ('Munich International Airport', 386),
 ('Hartsfield Jackson Atlanta International Airport', 382),
 ('London Gatwick Airport', 376),
 ('Frankfurt am Main International Airport', 369),
 ('Atatürk International Airport', 354),
 ('Dallas Fort Worth International Airport', 334),
 ('Beijing Capital International Airport', 332)]

## Wikipedia for Schools

In [28]:
wikipedia = nx.read_graphml("../wikipedia_schools.graphml", node_type = int)

## To do

* List the 10 best ranked pages of Wikipedia for Schools.
* Compare with the 10 nodes of highest out-degrees and the 10 nodes of highest in-degrees.
* Which article of Wikipedia for Schools is in the top-10 in terms of PageRank but not in the top-10 in terms of in-degree?

In [29]:
name_func = lambda x: (wikipedia.nodes[x[0]]["name"], x[1])
list(map(name_func, sorted(list(nx.pagerank(wikipedia).items()), key = lambda x: x[1], reverse = True)[:10]))

[('United States', 0.00956607151876177),
 ('France', 0.006422592507473661),
 ('Europe', 0.006339657306010129),
 ('United Kingdom', 0.00623504934004644),
 ('English language', 0.004864986027022299),
 ('Germany', 0.004824155845362229),
 ('World War II', 0.004724450220056358),
 ('England', 0.0044743032210142735),
 ('Latin', 0.004424050210791028),
 ('India', 0.004035681392814871)]

In [30]:
# Out-degrees: row sums of the unweighted adjacency matrix
weights = get_weights(nx.to_scipy_sparse_matrix(wikipedia, weight = None))
list(map(name_func, sorted(enumerate(weights), key = lambda x: x[1], reverse = True)[:10]))

[('United States', 294),
 ('Driving on the left or right', 255),
 ('List of countries', 244),
 ('List of circulating currencies', 236),
 ('List of sovereign states', 216),
 ('Africa', 212),
 ('List of countries by system of government', 207),
 ('Lebanon', 192),
 ('Interpol', 191),
 ('Armenia', 186)]

In [31]:
# In-degrees: row sums of the transposed unweighted adjacency matrix
weights = get_weights(nx.to_scipy_sparse_matrix(wikipedia, weight = None).T)
list(map(name_func, sorted(enumerate(weights), key = lambda x: x[1], reverse = True)[:10]))

[('United States', 1551),
 ('United Kingdom', 972),
 ('France', 959),
 ('Europe', 933),
 ('World War II', 751),
 ('England', 751),
 ('Germany', 743),
 ('India', 611),
 ('English language', 598),
 ('London', 587)]
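The last question can be answered with a set difference. The sketch below hard-codes the two top-10 lists printed above; in the notebook, they would come directly from the ranking cells.

```python
pagerank_top10 = ['United States', 'France', 'Europe', 'United Kingdom',
                  'English language', 'Germany', 'World War II', 'England',
                  'Latin', 'India']
in_degree_top10 = ['United States', 'United Kingdom', 'France', 'Europe',
                   'World War II', 'England', 'Germany', 'India',
                   'English language', 'London']

# Pages ranked in the PageRank top-10 but absent from the in-degree top-10.
only_in_pagerank = sorted(set(pagerank_top10) - set(in_degree_top10))
print(only_in_pagerank)  # ['Latin']
```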

## 2. Personalized PageRank

The personalized PageRank provides local ranking with respect to some target node(s). 

## To do

Using the ```pagerank``` function of ```networkx```:
* List the 5 closest characters to **Cosette** in Les Misérables.
* List the 10 closest airports to **Charles de Gaulle** in Openflights.
* List the 20 closest pages to **Picasso** in Wikipedia for Schools. What is the best ranked painter other than Picasso?
* List the 20 closest pages to both **Cat** and **Dog** in Wikipedia for Schools.
* List the 10 closest actors to **Marion Cotillard** in the Actors-Movies graph.

In [33]:
# Get nodes from string (case sensitive)
def get_node_from_string(graph, word):
    nodes = {}
    for u in graph:
        if graph.nodes[u]['name'].find(word) >= 0:
            nodes[u] = graph.nodes[u]['name']
    return nodes

my_dict = get_node_from_string(miserables, 'Cosette')

In [34]:
# Sort a dictionary by values (decreasing order)
sorted(my_dict, key = my_dict.get, reverse = True)

[26]

In [55]:
def find_closest(graph, nodes, N):
    perso = {u:1/len(nodes) for u in nodes}
    pi = nx.pagerank(graph, personalization = perso)
    return sorted([(graph.nodes[u]['name'], pi[u]) for u in  sorted(pi, key = pi.get, reverse = True)[:N + len(nodes)]], key = lambda x: x[1], reverse = True)

find_closest(miserables, [26], 5)

[('Cosette', 0.2049227042244718),
 ('Valjean', 0.15001512556798233),
 ('Marius', 0.09928905925063694),
 ('Thenardier', 0.035848340561201696),
 ('Enjolras', 0.03076643191042199),
 ('Javert', 0.0293848167822739)]

In [56]:
node = get_node_from_string(openflights, 'Charles de Gaulle')
print(node)
find_closest(openflights, list(node), 10)

{622: 'Charles de Gaulle International Airport'}


[('Charles de Gaulle International Airport', 0.1613301886784415),
 ('Amsterdam Airport Schiphol', 0.007783471190170047),
 ('Brussels Airport', 0.00640892713798806),
 ('Munich International Airport', 0.006300989932369418),
 ('London Gatwick Airport', 0.006298967227618662),
 ('Atatürk International Airport', 0.0060775729203760885),
 ('Dublin Airport', 0.005952505787498728),
 ('Frankfurt am Main International Airport', 0.005947401879773409),
 ('Barcelona International Airport', 0.004819159668542008),
 ('Leonardo da Vinci–Fiumicino Airport', 0.004776377088008445),
 ('Addis Ababa Bole International Airport', 0.004642886306426803)]

In [57]:
node = get_node_from_string(wikipedia, 'Picasso')
print(node)
find_closest(wikipedia, list(node), 20)

{1185: 'Pablo Picasso'}


[('Pablo Picasso', 0.15161790147789264),
 ('France', 0.014235355148177944),
 ('Germany', 0.012025388201103274),
 ('World War II', 0.0116012290295659),
 ('World War I', 0.011101931514288977),
 ('Spain', 0.01079708820526127),
 ('Paris', 0.010012974366911738),
 ('Poland', 0.009764448843802521),
 ('Art', 0.009069805203132403),
 ('Sculpture', 0.008916710238216368),
 ('United States dollar', 0.00808436171501112),
 ('Neoclassicism', 0.007829089699984765),
 ('Chicago', 0.007823018426642611),
 ('Madrid', 0.007803441359064105),
 ('Barcelona', 0.007639044580162011),
 ('Fascism', 0.0075474343567553),
 ('United States', 0.007501553684342383),
 ('Cubism', 0.0074880528842814565),
 ('Paul McCartney', 0.007210945462616337),
 ('Henri Matisse', 0.0072046773393911415),
 ('Europe', 0.006006168666936557)]

In [58]:
nodes = [2515, 1408]
find_closest(wikipedia, nodes, 20)

[('Dog', 0.07808177342883052),
 ('Cat', 0.07716337609195725),
 ('Animal', 0.00883023335409618),
 ('Scientific classification', 0.00795930076137549),
 ('United States', 0.007272895042412954),
 ('Mammal', 0.007249707613790685),
 ('Chordate', 0.006318242075466015),
 ('English language', 0.005451954276215336),
 ('Germany', 0.0050001163908077),
 ('Insect', 0.004981162278638164),
 ('Europe', 0.004897965822117865),
 ('United Kingdom', 0.004731059236308883),
 ('Africa', 0.004469249649543645),
 ('Bird', 0.004455469901481943),
 ('France', 0.004449295503229965),
 ('Latin', 0.004199755039002363),
 ('Evolution', 0.004173748371728191),
 ('Russia', 0.00415004493415713),
 ('North America', 0.004079435515592613),
 ('Carolus Linnaeus', 0.003827391042669227),
 ('Retina', 0.0038038896596876916),
 ('Rabies', 0.003743953233495532)]

In [39]:
get_node_from_string(wikipedia, 'Cat')

{73: 'Cathedral of Magdeburg',
 297: 'Roman Catholic Church',
 386: 'Catherine II of Russia',
 1073: 'Cattle',
 2354: 'Canterbury Cathedral',
 2461: "St Paul's Cathedral",
 2515: 'Cat',
 2838: 'Durham Cathedral',
 2984: 'Black-footed Cat',
 3298: "Cat's Eye Nebula",
 3299: 'Cat Stevens',
 3300: 'Cataract',
 3301: 'Cataract surgery',
 3302: 'Catherine Cranston',
 3303: 'Catholic social teaching',
 3649: 'The Cat in the Hat',
 3834: 'Felix the Cat',
 4464: 'Sears Catalog Home',
 4520: 'The Catcher in the Rye',
 4521: 'The Catlins'}
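The substring search returns every page whose name contains the word, so the target node still has to be picked by hand. An exact-match lookup is a possible alternative; the helper name `get_node_from_name` is hypothetical, and it works on a plain dictionary mapping nodes to names.

```python
def get_node_from_name(names, name):
    """Return the nodes whose name equals `name` exactly."""
    return [u for u, label in names.items() if label == name]

# A few entries copied from the output above.
sample_names = {73: 'Cathedral of Magdeburg', 2515: 'Cat', 3299: 'Cat Stevens'}
print(get_node_from_name(sample_names, 'Cat'))  # [2515]
```

In the notebook, the dictionary could be obtained with `nx.get_node_attributes(wikipedia, 'name')`.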

## 3. Sparse matrices

You will now implement your own PageRank algorithm using sparse matrices.

The Personalized PageRank vector $\pi$ follows from $N$ iterations of the update:
$$
\pi \gets \alpha  P^T \pi + (1-\alpha)\mu,
$$
where $P$ is the transition matrix of the random walk and $\mu$ the personalization vector (summing to 1) used to restart the random walk.

Note that $\mu$ is also:
* the initial value for $\pi$,
* the restart distribution in the presence of sinks (i.e., the corresponding rows of $P$ are taken equal to $\mu$)

## To do

* Write a function returning the transition matrix $P$ (pay attention to the sinks!)
* Write a function returning the vector $\pi$ after $N$ iterations of the update.
* Write your own pagerank function, with the same parameters as that of  ```networkx``` (see below).
* Compare your function to that of ```networkx``` on Wikipedia for Schools (results, speed).

In [47]:
def get_transition_matrix(adjacency, restart = None):
    '''
    adjacency: scipy CSR matrix
        adjacency matrix of the graph
    restart: numpy array summing to 1
        If None, set to uniform distribution.
    '''
    n = adjacency.shape[0]
    if restart is None:
        restart = np.ones(n)/n
    else:
        restart = restart/np.sum(restart)
        
    sum_adj = adjacency.dot(np.ones(n))
    P = adjacency.astype(float).toarray()
    
    for i in range(n):
        if sum_adj[i] == 0:
            P[i] = restart
        else:
            P[i] = P[i]/sum_adj[i]
  
    return sp.csr_matrix(P)
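The function above converts the adjacency matrix to a dense array, which requires memory quadratic in the number of nodes. A possible alternative, sketched here under the same convention (sink rows equal to the restart distribution), keeps the sink rows empty and redistributes the probability mass of the sinks at each iteration. The name `pagerank_sparse` and its parameters are illustrative assumptions.

```python
import numpy as np
from scipy import sparse as sp

def pagerank_sparse(adjacency, alpha=0.85, restart=None, n_iter=20):
    """Power iteration that never densifies the adjacency matrix."""
    adjacency = sp.csr_matrix(adjacency, dtype=float)
    n = adjacency.shape[0]
    if restart is None:
        mu = np.ones(n) / n
    else:
        mu = np.asarray(restart, dtype=float) / np.sum(restart)

    out_weights = np.asarray(adjacency.sum(axis=1)).ravel()
    is_sink = out_weights == 0
    # Inverse out-weights, with 0 for sinks so that their rows stay empty.
    inverse = np.zeros(n)
    inverse[~is_sink] = 1 / out_weights[~is_sink]
    P = sp.diags(inverse).dot(adjacency)  # row-normalized, sink rows are zero

    pi = mu.copy()
    for _ in range(n_iter):
        sink_mass = pi[is_sink].sum()  # probability mass sitting on sinks
        # Sinks send their mass to mu, as if their rows of P were equal to mu.
        pi = alpha * (P.T.dot(pi) + sink_mass * mu) + (1 - alpha) * mu
    return pi

# Small directed graph where node 2 is a sink.
A = np.array([[0, 1, 1], [0, 0, 1], [0, 0, 0]])
print(pagerank_sparse(A))
```

Both formulations apply the same update, so they should agree up to rounding errors; only the storage of the sink rows differs.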

In [48]:
def pagerank(graph, alpha = 0.85, personalization = None, N = 20):
    '''
    graph: networkx Graph (or DiGraph)
        Graph 
    alpha:
        damping factor    
    personalization: dictionary
        initial / restart distribution
    N: int
        number of iterations
    '''
    adjacency = nx.to_scipy_sparse_matrix(graph)
    
    nodes = list(graph.nodes)
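    
    # Restart distribution: uniform if no personalization is given;
    # nodes missing from the dictionary get probability 0
    if personalization is None:
        mu = np.ones(len(nodes)) / len(nodes)
    else:
        mu = np.array([personalization.get(u, 0) for u in nodes], dtype = float)
        mu = mu / np.sum(mu)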
    
    P = get_transition_matrix(adjacency, mu)
    pi = mu
    
    for i in range(N):
        pi = alpha * P.T.dot(pi) + (1-alpha) * mu
    
    pagerank = {nodes[i]: pi[i] for i in range(len(nodes))}
    
    return pagerank
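The function above runs a fixed number of iterations. A variant could stop as soon as two successive iterates are close enough. The sketch below uses the L1 distance between iterates; the function name and the parameters `tol` and `max_iter` are assumptions chosen for illustration.

```python
import numpy as np

def power_iteration(P, mu, alpha=0.85, tol=1e-10, max_iter=1000):
    """Iterate the PageRank update until the L1 change falls below tol.

    Returns the vector and the number of iterations performed.
    """
    pi = mu.copy()
    for n_iter in range(1, max_iter + 1):
        new_pi = alpha * P.T.dot(pi) + (1 - alpha) * mu
        if np.abs(new_pi - pi).sum() < tol:
            return new_pi, n_iter
        pi = new_pi
    return pi, max_iter

# Directed 3-cycle: the uniform distribution is already stationary.
P_cycle = np.array([[0., 1., 0.], [0., 0., 1.], [1., 0., 0.]])
uniform = np.ones(3) / 3
pi_cycle, n_steps = power_iteration(P_cycle, uniform)
print(pi_cycle, n_steps)
```

`P` can be a NumPy array or a scipy sparse matrix, since both support `P.T.dot`.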

In [49]:
def find_closest(graph, nodes, N):
    perso = {u:1/len(nodes) for u in nodes}
    pi = pagerank(graph, personalization = perso)
    return sorted([(graph.nodes[u]['name'], pi[u]) for u in  sorted(pi, key = pi.get, reverse = True)[:N + len(nodes)]], key = lambda x: x[1], reverse = True)

find_closest(miserables, [26], 5)

[('Cosette', 0.204941703185777),
 ('Valjean', 0.15006089781790874),
 ('Marius', 0.09930671387983639),
 ('Thenardier', 0.035871368425922194),
 ('Enjolras', 0.030759667017942544),
 ('Javert', 0.02939454618581316)]

In [43]:
nodes = [2515, 1408]
find_closest(wikipedia, nodes, 20)

[('Dog', 0.07808265557860444),
 ('Cat', 0.07716450936634511),
 ('Animal', 0.008881931706710432),
 ('Scientific classification', 0.007999610167679723),
 ('Mammal', 0.007273691562597403),
 ('United States', 0.007255473932885001),
 ('Chordate', 0.0063406644472288765),
 ('English language', 0.005436949686351699),
 ('Insect', 0.00499845435433172),
 ('Germany', 0.004980415132058265),
 ('Europe', 0.004879724003496881),
 ('United Kingdom', 0.004707989337199911),
 ('Bird', 0.00447972301043151),
 ('Africa', 0.004465384404134162),
 ('France', 0.004418109915582344),
 ('Latin', 0.004209915601920218),
 ('Evolution', 0.004199567455406228),
 ('Russia', 0.0041403079412654495),
 ('North America', 0.0040817307447589225),
 ('Carolus Linnaeus', 0.0038455776083798463),
 ('Retina', 0.003806740436086695),
 ('Rabies', 0.003744874004762114)]

In [46]:
from time import time
perso = {26:1}

t = 0
for i in range(100):
    to = time()
    pi = nx.pagerank(miserables, personalization = perso)
    t += time() -to
print("Compute time for nx pagerank", t/100)

t = 0
for i in range(100):
    to = time()
    pi = pagerank(miserables, personalization = perso)
    t += time() -to
print("Compute time for our pagerank", t/100)


Compute time for nx pagerank 0.023122541904449463
Compute time for our pagerank 0.0019045257568359375
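For short intervals, `time.perf_counter` is generally a better clock than `time.time`. The timing loop could be wrapped in a small helper such as the one below; the name `average_time` is hypothetical.

```python
from time import perf_counter

def average_time(func, repeats=100):
    """Average wall-clock time of func() over `repeats` calls, in seconds."""
    start = perf_counter()
    for _ in range(repeats):
        func()
    return (perf_counter() - start) / repeats

# Stand-in workload; any function without arguments can be timed this way.
elapsed = average_time(lambda: sum(range(10000)), repeats=10)
print("Average time:", elapsed)
```

In the notebook, the same helper could time both implementations on Wikipedia for Schools, e.g. `average_time(lambda: nx.pagerank(wikipedia), repeats=10)` against `average_time(lambda: pagerank(wikipedia), repeats=10)`.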



## 4. Forward-Backward PageRank

The forward-backward PageRank is based on a random walk that follows edges alternately in the forward and backward directions. In directed graphs, the resulting ranking may differ significantly from that of the standard PageRank.
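By analogy with the update of Section 3, one iteration can be written as follows, assuming that $P_+$ and $P_-$ (notation introduced here for illustration) denote the transition matrices of the random walk along the edges and against the edges, respectively:
$$
\pi \gets \alpha P_-^T P_+^T \pi + (1-\alpha)\mu.
$$
This is only one possible convention: the restart with probability $1-\alpha$ is applied once per forward-backward pair of moves rather than after each move.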

## To do
* Complete the functions below to get the (Personalized) Forward-Backward PageRank vector.
* What is the closest page to **Picasso** in Wikipedia for Schools for the Personalized Forward-Backward PageRank?
* List the 20 closest pages to both **Cat** and **Dog** in Wikipedia for Schools for the Personalized Forward-Backward PageRank.

In [50]:
def forward_backward_pagerank(graph, alpha = 0.85, personalization = None, N = 20):
    '''
    graph: networkx Graph (or DiGraph)
        Graph 
    alpha:
        damping factor    
    personalization: dictionary
        initial / restart distribution
    N: int
        number of iterations
    '''
    adjacency = nx.to_scipy_sparse_matrix(graph)
    
    nodes = list(graph.nodes)
    mu = np.ones(len(nodes))/len(nodes)
    
    if personalization:
        for i in range(len(nodes)):
            mu[i] = personalization.get(nodes[i], 0)
    
    #mu = np.array([personalization[u] for u in nodes])
    
    P_plus = get_transition_matrix(adjacency, mu)
    P_minus = get_transition_matrix(adjacency.T, mu)
    
    pi = mu
    
    for i in range(N):
        pi = P_plus.T.dot(pi)
        pi = alpha * P_minus.T.dot(pi) + (1-alpha) * mu
    
    pagerank = {nodes[i]: pi[i] for i in range(len(nodes))}
    
    return pagerank

In [59]:
def find_closest(graph, nodes, N):
    perso = {u:1/len(nodes) for u in nodes}
    pi = forward_backward_pagerank(graph, personalization = perso)
    return sorted([(graph.nodes[u]['name'], pi[u]) for u in  sorted(pi, key = pi.get, reverse = True)[:N + len(nodes)]], key = lambda x: x[1], reverse = True)

find_closest(wikipedia, [1185], 5)

[('Pablo Picasso', 0.1532385511876167),
 ('History of painting', 0.003131273482448992),
 ('Western painting', 0.002578382312125069),
 ('United States', 0.0019464829985527802),
 ('Bucharest', 0.0016269958457686047),
 ('Vincent van Gogh', 0.0015610255875454367)]

In [60]:
nodes = [2515, 1408]
find_closest(wikipedia, nodes, 20)

[('Dog', 0.07904454222158257),
 ('Cat', 0.0779137022463273),
 ('Africa', 0.0017203411417366097),
 ('United States', 0.0016654843417511077),
 ('Fauna of Australia', 0.0016566344714256274),
 ('Europe', 0.0016481901147421623),
 ('Driving on the left or right', 0.0014609882265471299),
 ('Slovenia', 0.0014372747974193657),
 ('List of countries', 0.001384390677396117),
 ('List of circulating currencies', 0.0013374440826929668),
 ('Human', 0.0012545232959576464),
 ('List of sovereign states', 0.0012331884545080686),
 ('List of countries by system of government', 0.0011861807335144252),
 ('Laika', 0.001153932944933762),
 ('Argentina', 0.0011473365022465448),
 ('Lebanon', 0.001114034497235171),
 ('Carnivore', 0.0011072895497398314),
 ('Interpol', 0.0010997129970119856),
 ('History of science', 0.0010916228012378248),
 ('Democratic Republic of the Congo', 0.0010829494853652577),
 ('History of Earth', 0.001070959430052048),
 ('Armenia', 0.0010594406399625062)]