# Political Blogs and Network Centrality Project

Data used: blogs.gml | University of Michigan School of Information

This project builds and evaluates a network of political blogs that link to one another, either mutually or in one direction. The goal is to measure the centrality of each blog using the PageRank and HITS algorithms. In HITS, hub and authority scores are defined recursively. A page earns a high authority score if it is pointed to by pages with high hub scores. A page earns a high hub score if it points to many pages with high authority scores.
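The recursive hub/authority definition can be made concrete with a few lines of plain Python. This is a minimal sketch of HITS power iteration on a hypothetical four-node graph. The names `hits` and `toy` and the node labels are illustrative and do not come from the notebook. On each pass it rescales both score vectors to sum to 1, so each set of scores sums to 1, as the output of `nx.hits` does with its default `normalized=True`.

```python
def hits(out_links, iterations=100):
    """Minimal HITS power iteration on an adjacency dict {node: [targets]}."""
    # Collect every node, including ones that only appear as link targets
    nodes = set(out_links)
    for targets in out_links.values():
        nodes.update(targets)
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority of n = sum of hub scores of the pages that link to n
        auth = {n: 0.0 for n in nodes}
        for src, targets in out_links.items():
            for dst in targets:
                auth[dst] += hub[src]
        # Hub of n = sum of authority scores of the pages n links to
        hub = {n: sum(auth[dst] for dst in out_links.get(n, [])) for n in nodes}
        # Rescale each vector to sum to 1 so the values stay bounded
        a_total = sum(auth.values()) or 1.0
        h_total = sum(hub.values()) or 1.0
        auth = {n: v / a_total for n, v in auth.items()}
        hub = {n: v / h_total for n, v in hub.items()}
    return hub, auth


# Hypothetical toy network: a -> c, b -> c, b -> d
toy = {"a": ["c"], "b": ["c", "d"], "c": [], "d": []}
hubs, authorities = hits(toy)
print(max(hubs, key=hubs.get))                # b (links to both authorities)
print(max(authorities, key=authorities.get))  # c (linked to by both hubs)
```

Node `b` becomes the strongest hub because it points to both pages that receive links. Node `c` becomes the strongest authority because both hubs point to it.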

In [1]:
import networkx as nx
import pandas as pd

In [2]:
G2 = nx.read_gml('blogs.gml')

#### Compute PageRank with a damping factor of 0.85 (with the remaining 15% probability the random surfer jumps to a random node, so it does not get stuck in one part of the network)

In [3]:
def PR():
    pr = nx.pagerank(G2, alpha=0.85)
    return pr
p = PR()
p['realclearpolitics.com'] # PageRank score of realclearpolitics

0.004636694781649094
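The damping factor is easiest to see in a stripped-down power iteration. The sketch below is a simplified, hypothetical re-implementation, not networkx's code. With probability `alpha`, the random surfer follows an out-link. With probability `1 - alpha`, it jumps to a uniformly random node. Rank sitting on dangling nodes (nodes with no out-links) is spread uniformly, which matches `nx.pagerank`'s default behavior when no `personalization` or `dangling` argument is given.

```python
def pagerank(out_links, alpha=0.85, iterations=100):
    """Minimal PageRank power iteration on an adjacency dict {node: [targets]}."""
    nodes = set(out_links)
    for targets in out_links.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Teleportation: with probability 1 - alpha, jump to any node uniformly
        new_rank = {v: (1.0 - alpha) / n for v in nodes}
        # Rank held by dangling nodes is redistributed uniformly
        dangling = sum(rank[v] for v in nodes if not out_links.get(v))
        for v in nodes:
            new_rank[v] += alpha * dangling / n
        # Link-following: with probability alpha, follow one out-link at random
        for src, targets in out_links.items():
            if targets:
                share = alpha * rank[src] / len(targets)
                for dst in targets:
                    new_rank[dst] += share
        rank = new_rank
    return rank


# Hypothetical toy network: a -> c, b -> c, c -> a
toy = {"a": ["c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(toy, alpha=0.85)
print(max(ranks, key=ranks.get))  # c
print(round(ranks["b"], 4))       # 0.05
```

Node `b` has no incoming links, so its only rank comes from teleportation, `(1 - 0.85) / 3 = 0.05`. Without that teleport term, a surfer caught in the `a`/`c` loop would never reach `b` again.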

#### Top five blogs by PageRank

In [4]:
def top_five():
    pr = PR()
    # Sort blog names by PageRank score (highest first) and keep the top five
    ranking = sorted(pr, key=pr.get, reverse=True)[:5]
    return ranking
top_five()

['dailykos.com',
 'atrios.blogspot.com',
 'instapundit.com',
 'blogsforbush.com',
 'talkingpointsmemo.com']
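`top_five()` sorts every node just to keep five. A hedged alternative, assuming only the top k scores are ever needed, is the standard-library `heapq.nlargest`. Here `top_k` is a hypothetical helper name. Applied to this notebook, `top_k(p)` should return the same list as `top_five()`, because `nlargest` is documented as equivalent to the sorted-and-sliced form.

```python
import heapq


def top_k(scores, k=5):
    """Return the k keys of `scores` with the largest values, highest first."""
    # heapq.nlargest(k, it, key=...) is documented as equivalent to
    # sorted(it, key=..., reverse=True)[:k], but keeps only k items in a heap
    return heapq.nlargest(k, scores, key=scores.get)


# Hypothetical scores standing in for the dict returned by nx.pagerank
scores = {"a.com": 0.012, "b.com": 0.030, "c.com": 0.004, "d.com": 0.021}
print(top_k(scores, k=2))  # ['b.com', 'd.com']
```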

#### Apply the HITS algorithm to the network to find the five nodes with the highest hub scores

In [5]:
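def hub_5_scores():
    # nx.hits returns a (hubs, authorities) pair of dicts keyed by node
    hubs, authorities = nx.hits(G2)
    # Sort nodes by hub score (highest first) and keep the top five
    results = sorted(hubs, key=hubs.get, reverse=True)[:5]
    return results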
hub_5_scores()

['politicalstrategy.org',
 'madkane.com/notable.html',
 'liberaloasis.com',
 'stagefour.typepad.com/commonprejudice',
 'bodyandsoul.typepad.com']

Interpret these results as the five best hubs in the network. These blogs link out to many high-authority blogs, which makes them good sources of useful links.
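To see *why* a blog scores as a good hub, it helps to list the pages it links to, ranked by authority. This is a minimal sketch. The helper `explain_hub` and the toy data are hypothetical and are not part of the notebook.

```python
def explain_hub(node, out_links, authority, k=3):
    """Return up to k pages that `node` links to, ranked by authority score."""
    targets = out_links.get(node, [])
    # A hub is only as strong as the authorities it points to
    return sorted(targets, key=lambda t: authority.get(t, 0.0), reverse=True)[:k]


# Hypothetical data: one hub linking to three pages of varying authority
out_links = {"hub.org": ["x.com", "y.com", "z.com"]}
authority = {"x.com": 0.1, "y.com": 0.5, "z.com": 0.3}
print(explain_hub("hub.org", out_links, authority, k=2))  # ['y.com', 'z.com']
```

Assuming `G2` is a directed graph, the same helper could be pointed at the notebook's data with `explain_hub('politicalstrategy.org', {n: list(G2.successors(n)) for n in G2}, nx.hits(G2)[1])`.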

#### Apply the HITS algorithm to the network to find the five nodes with the highest authority scores

In [6]:
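def auth_5_scores():
    # nx.hits returns a (hubs, authorities) pair of dicts keyed by node
    hubs, authorities = nx.hits(G2)
    # Sort nodes by authority score (highest first) and keep the top five
    results = sorted(authorities, key=authorities.get, reverse=True)[:5]
    return results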
auth_5_scores()

['dailykos.com',
 'talkingpointsmemo.com',
 'atrios.blogspot.com',
 'washingtonmonthly.com',
 'talkleft.com']

Interpret these results as the five most authoritative blogs. These are the sites most often linked to by high-hub blogs (i.e., the most referenced and most useful destinations). HITS measures links, not actual visitor traffic.
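Authority and PageRank both reward incoming links, so a quick sanity check is to compare the two top-five lists. The snippet below hard-codes the outputs shown in this notebook and recomputes nothing. It only reports which blogs appear in both lists.

```python
# Top-five lists copied from the PageRank and authority outputs in this notebook
pagerank_top5 = ["dailykos.com", "atrios.blogspot.com", "instapundit.com",
                 "blogsforbush.com", "talkingpointsmemo.com"]
authority_top5 = ["dailykos.com", "talkingpointsmemo.com", "atrios.blogspot.com",
                  "washingtonmonthly.com", "talkleft.com"]

# Keep PageRank order while testing membership in the authority list
shared = [blog for blog in pagerank_top5 if blog in authority_top5]
print(shared)  # ['dailykos.com', 'atrios.blogspot.com', 'talkingpointsmemo.com']
```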