In [31]:
import json
import networkx as nx
from operator import itemgetter

class WikiPage:
    def __init__(self, url, title, snippet):
        self.url = url
        self.title = title
        self.snippet = snippet

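##### Parser #####

def parse_wiki_json(file_path, use_only_wiki_page_nodes=True):
    """Build a directed link graph and a url -> WikiPage lookup from a JSON dump."""
    graph = nx.DiGraph()
    wiki_pages_dict = dict()

    # Close the file after reading; the snippets contain non-ASCII text.
    with open(file_path, encoding="utf-8") as f:
        nodes = json.load(f)

    # Every crawled page becomes a node.
    for node in nodes:
        url = node["url"]
        wiki_pages_dict[url] = WikiPage(url, node["title"], node["info"])
        graph.add_node(url)

    # One edge per outgoing link; when use_only_wiki_page_nodes is True,
    # links to pages that were not crawled are skipped.
    for node in nodes:
        for url in node["out_urls"]:
            if (url in wiki_pages_dict) or (not use_only_wiki_page_nodes):
                graph.add_edge(node["url"], url)

    return graph, wiki_pages_dict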

##### Ranking things #####

def print_wiki_page(url, rank, wiki_pages_dict):
    if url in wiki_pages_dict:
        wiki_page = wiki_pages_dict[url]
        print("%s[rank=%s]\n%s\n%s\n" % (wiki_page.title, rank, wiki_page.url, wiki_page.snippet))
    else:
        print("%s[rank=%s]\n%s\n%s\n" % ("...", rank, url, "..."))

def print_top_ranks(ranks, wiki_pages_dict):
    # Sort (url, score) pairs by score, highest first, and print the top 10.
    top_to_bottom_ranks = sorted(ranks.items(), key=itemgetter(1), reverse=True)

    for (url, rank) in top_to_bottom_ranks[:10]:
        print_wiki_page(url, rank, wiki_pages_dict)

def print_pagerank_results(graph, wiki_pages_dict, alpha, tag):
    print("PageRank results [%s]:\n" % tag)
    # alpha is the damping factor: the probability of following a link
    # instead of jumping to a random page.
    print_top_ranks(nx.pagerank(graph, alpha=alpha), wiki_pages_dict)
    
def analyze_wiki_graph_with_pagerank(graph, wiki_pages_dict):
    # Print default PageRank results
    print_pagerank_results(graph, wiki_pages_dict, 0.85, "default")

    # Print PageRank results for different alphas
    alphas = [0.95, 0.5, 0.3]    
    for alpha in alphas:
        tag = "alpha = %s" % alpha
        print_pagerank_results(graph, wiki_pages_dict, alpha, tag)
        
def analyze_wiki_graph_with_hits(graph, wiki_pages_dict):
    hubs, authorities = nx.hits(graph, max_iter=500)
    average = { url: (value + authorities[url]) / 2 for url, value in hubs.items() }
    print("HITS results [hubs]\n")
    print_top_ranks(hubs, wiki_pages_dict)
    print("HITS results [authorities]\n")
    print_top_ranks(authorities, wiki_pages_dict)
    print("HITS results [average]\n")
    print_top_ranks(average, wiki_pages_dict)
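`parse_wiki_json` reads `url`, `title`, `info` and `out_urls` from each JSON record. To make the effect of `use_only_wiki_page_nodes` concrete, here is a stdlib-only sketch that applies the same edge rule to a made-up three-page dump; the helper `collect_edges` and the sample data are hypothetical, not part of the notebook.

```python
import json

# Hypothetical three-record input in the shape parse_wiki_json() reads.
SAMPLE_JSON = """
[
  {"url": "A", "title": "Page A", "info": "...", "out_urls": ["B", "X"]},
  {"url": "B", "title": "Page B", "info": "...", "out_urls": ["A", "C", "Y"]},
  {"url": "C", "title": "Page C", "info": "...", "out_urls": []}
]
"""

def collect_edges(records, use_only_wiki_page_nodes=True):
    """Return the (source, target) pairs the parser would add to the graph."""
    known_urls = {record["url"] for record in records}
    edges = []
    for record in records:
        for target in record["out_urls"]:
            # Same rule as parse_wiki_json: keep the edge if the target was
            # crawled, or if filtering is switched off.
            if target in known_urls or not use_only_wiki_page_nodes:
                edges.append((record["url"], target))
    return edges

records = json.loads(SAMPLE_JSON)
print(collect_edges(records, use_only_wiki_page_nodes=True))
# [('A', 'B'), ('B', 'A'), ('B', 'C')]
print(collect_edges(records, use_only_wiki_page_nodes=False))
# [('A', 'B'), ('A', 'X'), ('B', 'A'), ('B', 'C'), ('B', 'Y')]
```

With filtering off, the uncrawled targets `X` and `Y` also become graph nodes, and they have no outgoing edges of their own.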

In [32]:
print("=== BUILDING GRAPH ONLY BETWEEN WIKI PAGE NODES ===\n")
graph, info_dict = parse_wiki_json("wiki_links.json", use_only_wiki_page_nodes=True)
analyze_wiki_graph_with_pagerank(graph, info_dict)

=== BUILDING GRAPH ONLY BETWEEN WIKI PAGE NODES ===

PageRank results [default]:

World War II[rank=0.025849148224995167]
https://en.wikipedia.org/wiki/World_War_II
World War II (often abbreviated to WWII or WW2), also known as the Second World War, was a global war that lasted from 1939 to 1945, although related conflicts began earlier. The vast majority of the world's countries—including all of the great powers—eve...

New York City[rank=0.01101301659274179]
https://en.wikipedia.org/wiki/New_York_City
The City of New York, often called New York City or simply New York, is the most populous city in the United States.[9] With an estimated 2017 population of 8,622,698[7] distributed over a land area of about 302.6 square miles (784 km2),[10][11] New York ...

Paris[rank=0.007194827382879238]
https://en.wikipedia.org/wiki/Paris
Paris (French pronunciation: ​[paʁi] ( listen)) is the capital and most populous city in France, with an administrative-limits area of 105 square kilometres (41 s

World War II[rank=0.008124281431376135]
https://en.wikipedia.org/wiki/World_War_II
World War II (often abbreviated to WWII or WW2), also known as the Second World War, was a global war that lasted from 1939 to 1945, although related conflicts began earlier. The vast majority of the world's countries—including all of the great powers—eve...

New York City[rank=0.003989651613354816]
https://en.wikipedia.org/wiki/New_York_City
The City of New York, often called New York City or simply New York, is the most populous city in the United States.[9] With an estimated 2017 population of 8,622,698[7] distributed over a land area of about 302.6 square miles (784 km2),[10][11] New York ...

Paris[rank=0.0020964907015691273]
https://en.wikipedia.org/wiki/Paris
Paris (French pronunciation: ​[paʁi] ( listen)) is the capital and most populous city in France, with an administrative-limits area of 105 square kilometres (41 square miles) and an official population of 2,206,488 (2015).[5] The city is a co

___Conclusions___: Changing $\alpha$ does not change the top 3 results (probably because their ranks stand out clearly from the rest). Positions 4 to 10, however, shift slightly as $\alpha$ changes.
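To see why the top of the list is stable while the scores themselves move, here is a minimal pure-Python power-iteration PageRank on a made-up five-node graph. The function name `pagerank`, the node labels and the edges are illustrative, not from the notebook. Like `nx.pagerank` with default settings, it spreads the rank of pages without out-links uniformly over all pages; it is a sketch, not networkx's implementation.

```python
def pagerank(edges, nodes, alpha=0.85, tol=1e-10, max_iter=1000):
    """Power-iteration PageRank; dangling-node mass is spread uniformly."""
    n = len(nodes)
    out_links = {node: [] for node in nodes}
    for source, target in edges:
        out_links[source].append(target)
    ranks = {node: 1.0 / n for node in nodes}
    for _ in range(max_iter):
        # Rank held by pages with no out-links is redistributed to everyone.
        dangling = sum(ranks[node] for node in nodes if not out_links[node])
        # Random-jump share plus the redistributed dangling share.
        new_ranks = {node: (1.0 - alpha) / n + alpha * dangling / n for node in nodes}
        # Each page passes alpha * its rank evenly along its out-links.
        for source in nodes:
            targets = out_links[source]
            for target in targets:
                new_ranks[target] += alpha * ranks[source] / len(targets)
        delta = sum(abs(new_ranks[node] - ranks[node]) for node in nodes)
        ranks = new_ranks
        if delta < n * tol:
            break
    return ranks

# Made-up graph: four pages link to H, and H links back to a.
nodes = ["H", "a", "b", "c", "d"]
edges = [("a", "H"), ("b", "H"), ("c", "H"), ("d", "H"), ("H", "a")]
for alpha in (0.85, 0.5, 0.3):
    ranks = pagerank(edges, nodes, alpha=alpha)
    print(alpha, round(ranks["H"], 3))
```

On this graph `H` stays on top for every $\alpha > 0$, but its score drops from about 0.48 ($\alpha = 0.85$) to 0.40 ($\alpha = 0.5$) and 0.34 ($\alpha = 0.3$); at $\alpha = 0$ every page gets exactly 1/5. A lower $\alpha$ gives more weight to random jumps and flattens the distribution, which leaves a clearly dominant page in place while pages with close scores can swap.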

In [28]:
print("=== BUILDING GRAPH USING ALL NODES AND ALL EDGES ===")
graph, info_dict = parse_wiki_json("wiki_links.json", use_only_wiki_page_nodes=False)
analyze_wiki_graph_with_pagerank(graph, info_dict)

=== BUILDING GRAPH USING ALL NODES AND ALL EDGES ===
PageRank results [default]:

...[rank=8.32295165367056e-05]
https://en.wikipedia.org/wiki/United_States
...

World War II[rank=4.870370331995472e-05]
https://en.wikipedia.org/wiki/World_War_II
World War II (often abbreviated to WWII or WW2), also known as the Second World War, was a global war that lasted from 1939 to 1945, although related conflicts began earlier. The vast majority of the world's countries—including all of the great powers—eve...

...[rank=4.143651987487163e-05]
https://en.wikipedia.org/wiki/France
...

...[rank=3.817706147720798e-05]
https://en.wikipedia.org/wiki/Mathematics
...

...[rank=3.574484291728748e-05]
https://en.wikipedia.org/wiki/United_Kingdom
...

New York City[rank=3.2299184144809654e-05]
https://en.wikipedia.org/wiki/New_York_City
The City of New York, often called New York City or simply New York, is the most populous city in the United States.[9] With an estimated 2017 population of 8,622,698[7] di

___Conclusions___: Changing $\alpha$ changes nothing here, because every page has a small rank and no rank clearly stands out. In addition, the list seems to consist of the most popular pages (i.e., the pages that other pages link to most often, or that are easiest to reach from any other page).
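With all nodes included, every linked URL becomes a node, so the total PageRank mass of 1 is spread over many more nodes; that is presumably why the ranks here are so much smaller than in the wiki-pages-only graph. The "most-linked pages" hypothesis can be checked directly by ranking pages by in-degree and comparing with the PageRank top 10. A minimal sketch; the helper `top_by_in_degree` and the sample edges are hypothetical, and the function accepts any iterable of `(source, target)` pairs, such as `graph.edges()` of a `networkx.DiGraph`.

```python
from collections import Counter

def top_by_in_degree(edges, k=10):
    """Rank targets by how many distinct pages link to them."""
    # Count each (source, target) pair once, as a DiGraph would.
    in_degree = Counter(target for _, target in set(edges))
    # Highest in-degree first; ties broken alphabetically for stable output.
    return sorted(in_degree.items(), key=lambda item: (-item[1], item[0]))[:k]

# Made-up edges: three pages link to "US", two to "France", one to "Paris".
edges = [("A", "US"), ("B", "US"), ("C", "US"),
         ("A", "France"), ("B", "France"), ("C", "Paris")]
print(top_by_in_degree(edges, k=2))  # [('US', 3), ('France', 2)]
```

PageRank is not the same as in-degree (a link from a highly ranked page counts for more), so a close but imperfect match between the two lists would be expected if the hypothesis holds.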

In [29]:
print("=== BUILDING GRAPH ONLY BETWEEN WIKI PAGE NODES ===\n")
graph, info_dict = parse_wiki_json("wiki_links.json", use_only_wiki_page_nodes=True)
analyze_wiki_graph_with_hits(graph, info_dict)

=== BUILDING GRAPH ONLY BETWEEN WIKI PAGE NODES ===

HITS results [hubs]

United States[rank=0.001155832205297175]
https://en.wikipedia.org/wiki/United_states
Coordinates: 40°N 100°W﻿ / ﻿40°N 100°W﻿ / 40; -100...

List of recurring The Simpsons characters[rank=0.001141421004976474]
https://en.wikipedia.org/wiki/List_of_recurring_The_Simpsons_characters
The Simpsons includes a large array of supporting characters: co-workers, teachers, family friends, extended relatives, townspeople, local celebrities, fictional characters within the show, and even animals. The writers originally intended many of these c...

List of recurring The Simpsons characters[rank=0.001141421004976474]
https://en.wikipedia.org/wiki/Jimbo_Jones
The Simpsons includes a large array of supporting characters: co-workers, teachers, family friends, extended relatives, townspeople, local celebrities, fictional characters within the show, and even animals. The writers originally intended many of these c...

Airline[rank=0

___Conclusions___:
1. HITS [hubs] puts at the top the pages that contain many links to other pages;
2. HITS [average] and HITS [authorities] largely resemble the PageRank results.
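The hubs result follows from how HITS is defined: a page's authority score is the sum of the hub scores of the pages linking to it, and its hub score is the sum of the authority scores of the pages it links to. Below is a minimal pure-Python sketch of that mutual iteration on a made-up graph, where `List` plays the role of a list-style page that links out a lot. The function `hits` here is illustrative, not `nx.hits`; on well-behaved graphs it should reach the same sum-normalized scores.

```python
def hits(edges, nodes, max_iter=500, tol=1e-10):
    """Iterative HITS; scores normalized to sum to 1 (like nx.hits, normalized=True)."""
    out_links = {node: [] for node in nodes}
    in_links = {node: [] for node in nodes}
    for source, target in set(edges):
        out_links[source].append(target)
        in_links[target].append(source)
    hubs = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(max_iter):
        # Authority: sum of hub scores of the pages linking in.
        authorities = {node: sum(hubs[s] for s in in_links[node]) for node in nodes}
        norm = sum(authorities.values()) or 1.0
        authorities = {node: value / norm for node, value in authorities.items()}
        # Hub: sum of authority scores of the pages linked to.
        new_hubs = {node: sum(authorities[t] for t in out_links[node]) for node in nodes}
        norm = sum(new_hubs.values()) or 1.0
        new_hubs = {node: value / norm for node, value in new_hubs.items()}
        delta = sum(abs(new_hubs[node] - hubs[node]) for node in nodes)
        hubs = new_hubs
        if delta < tol:
            break
    return hubs, authorities

# Made-up graph: "List" links to three pages, the others link sparsely.
nodes = ["List", "Other", "A", "B", "C"]
edges = [("List", "A"), ("List", "B"), ("List", "C"), ("Other", "A"), ("A", "B")]
hubs, authorities = hits(edges, nodes)
print(max(hubs, key=hubs.get))  # List
```

`List` ends up as the top hub simply because it points at every page that receives links, which matches the list-style pages (for example, the Simpsons character list) near the top of the hubs output.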

In [30]:
print("=== BUILDING GRAPH USING ALL NODES AND ALL EDGES ===")
graph, info_dict = parse_wiki_json("wiki_links.json", use_only_wiki_page_nodes=False)
analyze_wiki_graph_with_hits(graph, info_dict)

=== BUILDING GRAPH USING ALL NODES AND ALL EDGES ===
HITS results [hubs]

History of Western civilization[rank=0.004372299909070321]
https://en.wikipedia.org/wiki/History_of_Western_civilization
Western civilization traces its roots back to Europe and the Mediterranean. It is linked to the Roman Empire and with Medieval Western Christendom which emerged from the Middle Ages to experience such transformative episodes as the Renaissance, the Reform...

United States[rank=0.0030728305857598247]
https://en.wikipedia.org/wiki/United_states
Coordinates: 40°N 100°W﻿ / ﻿40°N 100°W﻿ / 40; -100...

New York City[rank=0.002209641967709587]
https://en.wikipedia.org/wiki/New_York_City
The City of New York, often called New York City or simply New York, is the most populous city in the United States.[9] With an estimated 2017 population of 8,622,698[7] distributed over a land area of about 302.6 square miles (784 km2),[10][11] New York ...

New York (state)[rank=0.0019379334377949697]
https://en.wik

___Conclusions___:
1) HITS [authorities] is the closest to the PageRank results; HITS [hubs] and HITS [average] each contain a couple of pages from the PageRank top 3.
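The overlap described in these conclusions can be quantified instead of judged by eye. A minimal sketch, assuming score dictionaries of the form returned by `nx.pagerank(graph)` and by `nx.hits(graph, max_iter=500)` (url -> score); the helper names `top_k` and `top_k_overlap` and the sample scores are made up for illustration.

```python
from operator import itemgetter

def top_k(ranks, k):
    """URLs of the k highest-scoring pages, best first."""
    return [url for url, _ in sorted(ranks.items(), key=itemgetter(1), reverse=True)[:k]]

def top_k_overlap(ranks_a, ranks_b, k=10):
    """Fraction of the top-k pages of ranks_a that are also in the top-k of ranks_b."""
    shared = set(top_k(ranks_a, k)) & set(top_k(ranks_b, k))
    return len(shared) / k

# Hypothetical scores, not results from the notebook.
pagerank_ranks = {"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}
authority_ranks = {"a": 0.4, "c": 0.35, "d": 0.2, "b": 0.05}
print(top_k(pagerank_ranks, 3), top_k(authority_ranks, 3))
print(top_k_overlap(pagerank_ranks, authority_ranks, k=3))
```

Here the two top-3 lists share `a` and `c`, giving an overlap of 2/3; the same call on the real PageRank, authorities, hubs and average dictionaries would put a number on "closest to PageRank".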