#Data

In [1]:
%%writefile textrank.txt
3: BC−HurricaineGilbert, 09−11 339
4: BC−Hurricaine Gilbert, 0348
5: Hurricaine Gilbert heads toward Dominican Coast
6: By Ruddy Gonzalez
7: Associated Press Writer
8: Santo Domingo, Dominican Republic (AP)
9: Hurricaine Gilbert Swept towrd the Dominican Republic Sunday, and the Civil Defense alerted its heavily populated south coast to prepare for high winds, heavy rains, and high seas.
10: The storm was approaching from the southeast with sustained winds of 75 mph gusting to 92 mph.
11: "There is no need for alarm," Civil Defense Director Eugenio Cabral said in a television alert shortly after midnight Saturday.
12: Cabral said residents of the province of Barahona should closely follow Gilbert’s movement.
13: An estimated 100,000 people live in the province, including 70,000 in the city of Barahona, about 125 miles west of Santo Domingo.
14: Tropical storm Gilbert formed in the eastern Carribean and strenghtened into a hurricaine Saturday night.
15: The National Hurricaine Center in Miami reported its position at 2 a.m. Sunday at latitude 16.1 north, longitude 67.5 west, about 140 miles south of Ponce, Puerto Rico, and 200 miles southeast of Santo Domingo.
16: The National Weather Service in San Juan, Puerto Rico, said Gilbert was moving westard at 15 mph with a "broad area of cloudiness and heavy weather" rotating around the center of the storm.
17: The weather service issued a flash flood watch for Puerto Rico and the Virgin Islands until at least 6 p.m. Sunday.
18: Strong winds associated with the Gilbert brought coastal flooding, strong southeast winds, and up to 12 feet to Puerto Rico’s south coast.
19: There were no reports on casualties.
20: San Juan, on the north coast, had heavy rains and gusts Saturday, but they subsided during the night.
21: On Saturday, Hurricane Florence was downgraded to a tropical storm, and its remnants pushed inland from the U.S. Gulf Coast.
22: Residents returned home, happy to find little damage from 90 mph winds and sheets of rain.
23: Florence, the sixth named storm of the 1988 Atlantic storm season, was the second hurricane.
24: The first, Debby, reached minimal hurricane strength briefly before hitting the Mexican coast last month.

Writing textrank.txt


#Set environment variable for PySpark

In [2]:
import os
import sys
# Point SPARK_HOME at the local Spark install. Only the last assignment takes effect,
# so keep the line that matches your machine.
spark_home = os.environ['SPARK_HOME'] = '/Users/liang/Downloads/spark-1.3.0-bin-hadoop2.4/'
spark_home = os.environ['SPARK_HOME'] = '/Users/jshanahan/Dropbox/Lectures-UC-Berkeley-ML-Class-2015/spark-1.5.0-bin-hadoop2.6/'
if not spark_home:
    raise ValueError('SPARK_HOME environment variable is not set')
sys.path.insert(0,os.path.join(spark_home,'python'))
sys.path.insert(0,os.path.join(spark_home,'python/lib/py4j-0.8.2.1-src.zip'))
execfile(os.path.join(spark_home,'python/pyspark/shell.py'))

Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 1.3.0
      /_/

Using Python version 2.7.10 (default, May 28 2015 17:04:42)
SparkContext available as sc, HiveContext available as sqlCtx.


#TextRank algorithm

In [3]:
from math import log

def computeContribs(sentences, rank):
    # split this sentence's rank among its neighbors in proportion to edge weight
    sumOfWeights = sum([s[1] for s in sentences])
    for sentence in sentences:
        yield (sentence[0], rank * sentence[1] / sumOfWeights)

# each input line is "id: text"; each record becomes (id, list of words)
def parseSentences(line):
    fields = line.split(':')
    return(fields[0], fields[1].replace(',','').split())

# edge weight = number of shared words, normalized by the log of both sentence lengths
def generateLink(sentence1,sentence2):
    w = len(set(sentence1[1]) & set(sentence2[1]))/(log(len(sentence1[1]))+log(len(sentence2[1])))
    return(sentence1[0],(sentence2[0],w))


text = sc.textFile("./textrank.txt").map(parseSentences)
links = text.cartesian(text).filter(lambda(s1,s2): s1[0]!=s2[0]) \
            .map(lambda(s1,s2):generateLink(s1,s2)) \
            .groupByKey() \
            .filter(lambda (Idx,links): sum([s[1] for s in links])>0) \
            .cache()
# every sentence starts with a rank of 1.0
ranks = links.map(lambda (sentence, neighbors): (sentence, 1.0))
for iteration in xrange(5):
    #Emit scores to all neighbor nodes 
    contribs = links.join(ranks).flatMap(
        lambda (sentence, (sentences, rank)): computeContribs(sentences, rank))
    #sum up 
    ranks = contribs.reduceByKey(lambda x,y: x+y).mapValues(lambda rank: rank * 0.85 + 0.15)
print ranks.collect()


[(u'19', 0.23286450789357083), (u'18', 1.3101198610073745), (u'8', 0.3597458157428781), (u'3', 0.15), (u'6', 0.15), (u'9', 1.6621648426556026), (u'4', 0.3910419617637807), (u'7', 0.15), (u'11', 0.6701184724682888), (u'24', 0.655934562135146), (u'10', 1.378954687951134), (u'13', 1.0000606932676237), (u'12', 0.839961809498709), (u'20', 1.1108549804269938), (u'5', 0.5583873090105679), (u'15', 1.388140218659445), (u'14', 1.3506522859181915), (u'21', 1.3034037101082727), (u'22', 0.8728040835480523), (u'17', 1.1616119338531525), (u'23', 0.8686459446416938), (u'16', 1.8845323194495136)]
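To make the update rule easier to follow without a Spark cluster, here is a minimal pure-Python sketch of the same idea: the overlap-based edge weight from `generateLink` and the damped rank update from the loop above. The names `similarity`, `textrank`, and `toy` are illustrative and not part of the notebook, and the handling of a zero denominator is an assumption added here.

```python
from math import log

def similarity(words1, words2):
    """Shared distinct words, normalized by the log of both sentence lengths."""
    overlap = len(set(words1) & set(words2))
    denom = log(len(words1)) + log(len(words2))
    # Assumption: sentences are non-empty, and two one-word sentences
    # (denominator 0) are treated as unrelated.
    return overlap / denom if denom > 0 else 0.0

def textrank(sentences, iterations=5, damping=0.85):
    """sentences maps an id to its list of words; returns id -> score."""
    links = {}
    for i, words_i in sentences.items():
        edges = [(j, similarity(words_i, words_j))
                 for j, words_j in sentences.items() if j != i]
        # Like the filter in the Spark job: drop nodes with no weighted edge.
        if sum(w for _, w in edges) > 0:
            links[i] = edges
    ranks = dict((i, 1.0) for i in links)
    for _ in range(iterations):
        contribs = {}
        for i, edges in links.items():
            if i not in ranks:  # mirrors the inner join of links and ranks
                continue
            total = sum(w for _, w in edges)
            for j, w in edges:
                contribs[j] = contribs.get(j, 0.0) + ranks[i] * w / total
        ranks = dict((j, c * damping + (1 - damping)) for j, c in contribs.items())
    return ranks

# Hypothetical three-sentence input: two related sentences and one unrelated.
toy = {
    'a': 'storm hits the coast'.split(),
    'b': 'storm nears the coast'.split(),
    'c': 'markets rise today'.split(),
}
scores = textrank(toy)
```

On this toy input the two storm sentences keep a score of about 1.0, while the unrelated sentence receives only zero-weight contributions and settles at the 0.15 floor. That is consistent with the 0.15 entries in the output above.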


In [3]:
"""
From this paper: https://web.eecs.umich.edu/~mihalcea/papers/mihalcea.emnlp04.pdf

External dependencies: nltk, numpy, networkx

Based on https://gist.github.com/voidfiles/1646117
"""

import io
import nltk
import itertools
from operator import itemgetter
import networkx as nx
import os

#apply syntactic filters based on POS tags
def filter_for_tags(tagged, tags=['NN', 'JJ', 'NNP']):
    return [item for item in tagged if item[1] in tags]

def normalize(tagged):
    return [(item[0].replace('.', ''), item[1]) for item in tagged]

def unique_everseen(iterable, key=None):
    "List unique elements, preserving order. Remember all elements ever seen."
    # unique_everseen('AAAABBBCCDAABBB') --> A B C D
    # unique_everseen('ABBCcAD', str.lower) --> A B C D
    seen = set()
    seen_add = seen.add
    if key is None:
        for element in itertools.ifilterfalse(seen.__contains__, iterable):
            seen_add(element)
            yield element
    else:
        for element in iterable:
            k = key(element)
            if k not in seen:
                seen_add(k)
                yield element

def lDistance(firstString, secondString):
    "Function to find the Levenshtein distance between two words/sentences - gotten from http://rosettacode.org/wiki/Levenshtein_distance#Python"
    if len(firstString) > len(secondString):
        firstString, secondString = secondString, firstString
    distances = range(len(firstString) + 1)
    for index2, char2 in enumerate(secondString):
        newDistances = [index2 + 1]
        for index1, char1 in enumerate(firstString):
            if char1 == char2:
                newDistances.append(distances[index1])
            else:
                newDistances.append(1 + min((distances[index1], distances[index1+1], newDistances[-1])))
        distances = newDistances
    return distances[-1]

def buildGraph(nodes):
    "nodes - list of hashables that represents the nodes of the graph"
    gr = nx.Graph() #initialize an undirected graph
    gr.add_nodes_from(nodes)
    nodePairs = list(itertools.combinations(nodes, 2))

    #add edges to the graph (weighted by Levenshtein distance)
    for pair in nodePairs:
        firstString = pair[0]
        secondString = pair[1]
        levDistance = lDistance(firstString, secondString)
        gr.add_edge(firstString, secondString, weight=levDistance)

    return gr

def extractKeyphrases(text):
    # tokenize the text using nltk
    wordTokens = nltk.word_tokenize(text)

    # assign POS tags to the words in the text
    tagged = nltk.pos_tag(wordTokens)
    textlist = [x[0] for x in tagged]
    
    tagged = filter_for_tags(tagged)
    tagged = normalize(tagged)

    unique_word_set = unique_everseen([x[0] for x in tagged])
    word_set_list = list(unique_word_set)

    # build a graph whose nodes are the unique candidate words
    graph = buildGraph(word_set_list)
    # PageRank with networkx defaults (damping factor alpha=0.85), using the 'weight' edge attribute
    calculated_page_rank = nx.pagerank(graph, weight='weight')
    # most important words first (descending order of importance)
    keyphrases = sorted(calculated_page_rank, key=calculated_page_rank.get, reverse=True)
    # the number of keyphrases returned will be relative to the size of the text (a third of the number of vertices)
    aThird = len(word_set_list) // 3
    keyphrases = keyphrases[0:aThird+1]
    # take keyphrases with multiple words into consideration as done in the paper - if two words are adjacent 
    # in the text and are selected as keywords, join them together
    modifiedKeyphrases = set([])
    dealtWith = set([]) #keeps track of individual keywords that have been joined to form a keyphrase
    i = 0
    j = 1
    while j < len(textlist):
        firstWord = textlist[i]
        secondWord = textlist[j]
        if firstWord in keyphrases and secondWord in keyphrases:
            keyphrase = firstWord + ' ' + secondWord
            modifiedKeyphrases.add(keyphrase)
            dealtWith.add(firstWord)
            dealtWith.add(secondWord)
        else:
            if firstWord in keyphrases and firstWord not in dealtWith: 
                modifiedKeyphrases.add(firstWord)
            #if this is the last word in the text, and it is a keyword,
            #it definitely has no chance of being a keyphrase at this point    
            if j == len(textlist)-1 and secondWord in keyphrases and secondWord not in dealtWith:
                modifiedKeyphrases.add(secondWord)      
        i = i + 1
        j = j + 1
        
    return modifiedKeyphrases

def extractSentences(text):
    sent_detector = nltk.data.load('tokenizers/punkt/english.pickle')
    sentenceTokens = sent_detector.tokenize(text.strip())
    graph = buildGraph(sentenceTokens)

    calculated_page_rank = nx.pagerank(graph, weight='weight')

    #most important sentences first (descending order of importance)
    sentences = sorted(calculated_page_rank, key=calculated_page_rank.get, reverse=True)

    #return a 100 word summary
    summary = ' '.join(sentences)
    summaryWords = summary.split()
    summaryWords = summaryWords[0:100]
    summary = ' '.join(summaryWords)

    return summary

def writeFiles(summary, keyphrases, fileName):
    "outputs the keyphrases and summaries to appropriate files"
    print "Generating output to " + 'TextRank-master/keywords/' + fileName
    keyphraseFile = io.open('TextRank-master/keywords/' + fileName, 'w')
    for keyphrase in keyphrases:
        keyphraseFile.write(keyphrase + '\n')
    keyphraseFile.close()

    print "Generating output to " + 'TextRank-master/summaries/' + fileName
    summaryFile = io.open('TextRank-master/summaries/' + fileName, 'w')
    summaryFile.write(summary)
    summaryFile.close()

    print "-"

def mainRunner():
    nltk.download()
    #retrieve each of the articles
    articles = os.listdir("TextRank-master/articles")
    for article in articles:
        print 'Reading articles/' + article
        articleFile = io.open('TextRank-master/articles/' + article, 'r')
        text = articleFile.read()
        keyphrases = extractKeyphrases(text)
        summary = extractSentences(text)
        writeFiles(summary, keyphrases, article)

# If you have not downloaded the NLTK data packages, run nltk.download() first


In [8]:
!pwd  #/Users/jshanahan/Dropbox/nativex-internal/publications/WSDM-2016/Notebooks/TextRank

#mainRunner()

/Users/jshanahan/Dropbox/nativex-internal/publications/WSDM-2016/Notebooks/TextRank


In [None]:
#nltk.download()

In [None]:
#/Users/jshanahan/Dropbox/nativex-internal/publications/WSDM-2016/Notebooks/TextRank

mainRunner()

In [None]:
wordTokens = nltk.word_tokenize("assign POS tags to the words in the text")

#assign POS tags to the words in the text
tagged = nltk.pos_tag(wordTokens)

In [4]:
articles = os.listdir("TextRank-master/articles")
for article in articles:
    print 'Reading articles/' + article
    articleFile = io.open('TextRank-master/articles/' + article, 'r')
    text = articleFile.read()
    keyphrases = extractKeyphrases(text)
    summary = extractSentences(text)
    writeFiles(summary, keyphrases, article)
        

Reading articles/1.txt
Generating output to TextRank-master/keywords/1.txt
Generating output to TextRank-master/summaries/1.txt
-
Reading articles/10.txt
Generating output to TextRank-master/keywords/10.txt
Generating output to TextRank-master/summaries/10.txt
-
Reading articles/2.txt
Generating output to TextRank-master/keywords/2.txt
Generating output to TextRank-master/summaries/2.txt
-
Reading articles/3.txt
Generating output to TextRank-master/keywords/3.txt
Generating output to TextRank-master/summaries/3.txt
-
Reading articles/4.txt
Generating output to TextRank-master/keywords/4.txt
Generating output to TextRank-master/summaries/4.txt
-
Reading articles/5.txt
Generating output to TextRank-master/keywords/5.txt
Generating output to TextRank-master/summaries/5.txt
-
Reading articles/6.txt
Generating output to TextRank-master/keywords/6.txt
Generating output to TextRank-master/summaries/6.txt
-
Reading articles/7.txt
Generating output to TextRank-master/keywords/7.txt
Generating o

In [13]:
text="""LINCOLNSHIRE, IL With next-generation video game systems such as the Xbox One and the Playstation 4 hitting 
stores later this month, the console wars got even hotter today as electronics manufacturer Zenith announced 
the release of its own console, the Gamespace Pro, which arrives in stores Nov. 19. “With its sleek silver-and-gray box, 
double-analog-stick controllers, ability to play CDs, and starting price of $374.99, the Gamespace Pro is our way of 
saying, ‘Move over, Sony and Microsoft, Zenith is now officially a player in the console game,’” said 
Zenith CEO Michael Ahn at a Gamespace Pro press event, showcasing the system’s launch titles MoonChaser: 
Radiation, Cris Collinsworth’s Pigskin 2013, and survival-horror thriller InZomnia. “With over nine launch 
titles, 3D graphics, and the ability to log on to the internet using our Z-Connect technology, Zenith is finally 
poised to make some big waves in the video game world.” According to Zenith representatives, over 650 units have 
already been preordered."""

In [17]:
articleFile = io.open('TextRank-master/articles/' + "1.txt", 'r')
text = articleFile.read()
wordTokens = nltk.word_tokenize(text) 

#assign POS tags to the words in the text
tagged = nltk.pos_tag(wordTokens)
textlist = [x[0] for x in tagged]


In [18]:
tagged


[(u'\ufeffLINCOLNSHIRE', 'NN'),
 (u',', ','),
 (u'IL', 'NNP'),
 (u'With', 'IN'),
 (u'next-generation', 'JJ'),
 (u'video', 'NN'),
 (u'game', 'NN'),
 (u'systems', 'NNS'),
 (u'such', 'JJ'),
 (u'as', 'IN'),
 (u'the', 'DT'),
 (u'Xbox', 'NNP'),
 (u'One', 'NNP'),
 (u'and', 'CC'),
 (u'the', 'DT'),
 (u'Playstation', 'NNP'),
 (u'4', 'CD'),
 (u'hitting', 'NN'),
 (u'stores', 'NNS'),
 (u'later', 'RBR'),
 (u'this', 'DT'),
 (u'month', 'NN'),
 (u',', ','),
 (u'the', 'DT'),
 (u'console', 'NN'),
 (u'wars', 'NNS'),
 (u'got', 'VBD'),
 (u'even', 'RB'),
 (u'hotter', 'JJR'),
 (u'today', 'NN'),
 (u'as', 'IN'),
 (u'electronics', 'NNS'),
 (u'manufacturer', 'NN'),
 (u'Zenith', 'NNP'),
 (u'announced', 'VBD'),
 (u'the', 'DT'),
 (u'release', 'NN'),
 (u'of', 'IN'),
 (u'its', 'PRP$'),
 (u'own', 'JJ'),
 (u'console', 'NN'),
 (u',', ','),
 (u'the', 'DT'),
 (u'Gamespace', 'NNP'),
 (u'Pro', 'NNP'),
 (u',', ','),
 (u'which', 'WDT'),
 (u'arrives', 'VBZ'),
 (u'in', 'IN'),
 (u'stores', 'NNS'),
 (u'Nov.', 'NNP'),
 (u'19', 'CD'
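
The first token in the output above starts with `\ufeff`, a byte order mark that was read as part of the text. One possible way to drop it, assuming the article files are UTF-8 encoded, is to open them with the `utf-8-sig` codec. The sketch below uses a hypothetical temporary file rather than the real `1.txt`.

```python
import io
import os
import tempfile

# Hypothetical stand-in for an article file that starts with a UTF-8 BOM.
path = os.path.join(tempfile.mkdtemp(), "article_with_bom.txt")
with io.open(path, "wb") as f:
    f.write(b"\xef\xbb\xbfLINCOLNSHIRE, IL With next-generation video game systems")

# A plain utf-8 read keeps the BOM as the first character of the text.
with io.open(path, "r", encoding="utf-8") as f:
    raw_text = f.read()

# utf-8-sig strips the BOM when it is present.
with io.open(path, "r", encoding="utf-8-sig") as f:
    clean_text = f.read()
```

With the mark removed, the first token would be `LINCOLNSHIRE` rather than `\ufeffLINCOLNSHIRE`, so the same word appearing later in a text would not be counted as a different node.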

In [19]:
textlist


[u'\ufeffLINCOLNSHIRE',
 u',',
 u'IL',
 u'With',
 u'next-generation',
 u'video',
 u'game',
 u'systems',
 u'such',
 u'as',
 u'the',
 u'Xbox',
 u'One',
 u'and',
 u'the',
 u'Playstation',
 u'4',
 u'hitting',
 u'stores',
 u'later',
 u'this',
 u'month',
 u',',
 u'the',
 u'console',
 u'wars',
 u'got',
 u'even',
 u'hotter',
 u'today',
 u'as',
 u'electronics',
 u'manufacturer',
 u'Zenith',
 u'announced',
 u'the',
 u'release',
 u'of',
 u'its',
 u'own',
 u'console',
 u',',
 u'the',
 u'Gamespace',
 u'Pro',
 u',',
 u'which',
 u'arrives',
 u'in',
 u'stores',
 u'Nov.',
 u'19',
 u'.',
 u'\u201cWith',
 u'its',
 u'sleek',
 u'silver-and-gray',
 u'box',
 u',',
 u'double-analog-stick',
 u'controllers',
 u',',
 u'ability',
 u'to',
 u'play',
 u'CDs',
 u',',
 u'and',
 u'starting',
 u'price',
 u'of',
 u'$',
 u'374.99',
 u',',
 u'the',
 u'Gamespace',
 u'Pro',
 u'is',
 u'our',
 u'way',
 u'of',
 u'saying',
 u',',
 u'\u2018Move',
 u'over',
 u',',
 u'Sony',
 u'and',
 u'Microsoft',
 u',',
 u'Zenith',
 u'is',
 u'now

In [20]:
tagged = filter_for_tags(tagged)
tagged 

[(u'\ufeffLINCOLNSHIRE', 'NN'),
 (u'IL', 'NNP'),
 (u'next-generation', 'JJ'),
 (u'video', 'NN'),
 (u'game', 'NN'),
 (u'such', 'JJ'),
 (u'Xbox', 'NNP'),
 (u'One', 'NNP'),
 (u'Playstation', 'NNP'),
 (u'hitting', 'NN'),
 (u'month', 'NN'),
 (u'console', 'NN'),
 (u'today', 'NN'),
 (u'manufacturer', 'NN'),
 (u'Zenith', 'NNP'),
 (u'release', 'NN'),
 (u'own', 'JJ'),
 (u'console', 'NN'),
 (u'Gamespace', 'NNP'),
 (u'Pro', 'NNP'),
 (u'Nov.', 'NNP'),
 (u'sleek', 'JJ'),
 (u'silver-and-gray', 'NN'),
 (u'box', 'NN'),
 (u'double-analog-stick', 'JJ'),
 (u'ability', 'NN'),
 (u'price', 'NN'),
 (u'Gamespace', 'NNP'),
 (u'Pro', 'NNP'),
 (u'way', 'NN'),
 (u'\u2018Move', 'NN'),
 (u'Sony', 'NNP'),
 (u'Microsoft', 'NNP'),
 (u'Zenith', 'NNP'),
 (u'player', 'NN'),
 (u'console', 'JJ'),
 (u'game', 'NN'),
 (u'\u2019\u201d', 'NNP'),
 (u'Zenith', 'NNP'),
 (u'CEO', 'NNP'),
 (u'Michael', 'NNP'),
 (u'Ahn', 'NNP'),
 (u'Gamespace', 'NNP'),
 (u'Pro', 'NNP'),
 (u'press', 'NN'),
 (u'event', 'NN'),
 (u'system\u2019s', 'NN'),


In [21]:
tagged = normalize(tagged)
tagged

[(u'\ufeffLINCOLNSHIRE', 'NN'),
 (u'IL', 'NNP'),
 (u'next-generation', 'JJ'),
 (u'video', 'NN'),
 (u'game', 'NN'),
 (u'such', 'JJ'),
 (u'Xbox', 'NNP'),
 (u'One', 'NNP'),
 (u'Playstation', 'NNP'),
 (u'hitting', 'NN'),
 (u'month', 'NN'),
 (u'console', 'NN'),
 (u'today', 'NN'),
 (u'manufacturer', 'NN'),
 (u'Zenith', 'NNP'),
 (u'release', 'NN'),
 (u'own', 'JJ'),
 (u'console', 'NN'),
 (u'Gamespace', 'NNP'),
 (u'Pro', 'NNP'),
 (u'Nov', 'NNP'),
 (u'sleek', 'JJ'),
 (u'silver-and-gray', 'NN'),
 (u'box', 'NN'),
 (u'double-analog-stick', 'JJ'),
 (u'ability', 'NN'),
 (u'price', 'NN'),
 (u'Gamespace', 'NNP'),
 (u'Pro', 'NNP'),
 (u'way', 'NN'),
 (u'\u2018Move', 'NN'),
 (u'Sony', 'NNP'),
 (u'Microsoft', 'NNP'),
 (u'Zenith', 'NNP'),
 (u'player', 'NN'),
 (u'console', 'JJ'),
 (u'game', 'NN'),
 (u'\u2019\u201d', 'NNP'),
 (u'Zenith', 'NNP'),
 (u'CEO', 'NNP'),
 (u'Michael', 'NNP'),
 (u'Ahn', 'NNP'),
 (u'Gamespace', 'NNP'),
 (u'Pro', 'NNP'),
 (u'press', 'NN'),
 (u'event', 'NN'),
 (u'system\u2019s', 'NN'),
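
The two filtering steps can be checked on a handful of tagged tokens without running the tagger. In the sketch below the two helpers are restated so the block runs on its own (with a tuple as the default for `tags`), and `sample` is hand-copied from the tagger output shown earlier.

```python
def filter_for_tags(tagged, tags=('NN', 'JJ', 'NNP')):
    # Keep only singular nouns, adjectives, and singular proper nouns.
    return [item for item in tagged if item[1] in tags]

def normalize(tagged):
    # Remove periods so that 'Nov.' and 'Nov' become the same word.
    return [(word.replace('.', ''), tag) for word, tag in tagged]

sample = [('video', 'NN'), ('game', 'NN'), ('systems', 'NNS'), ('such', 'JJ'),
          ('as', 'IN'), ('stores', 'NNS'), ('Nov.', 'NNP'), ('19', 'CD')]
kept = normalize(filter_for_tags(sample))
```

Plural nouns such as `systems` and `stores` are tagged `NNS`, which is not in the default tag list, so they never become candidate keywords.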
 

In [22]:
unique_word_set = unique_everseen([x[0] for x in tagged])
word_set_list = list(unique_word_set)

#build a graph whose nodes are the unique candidate words
graph = buildGraph(word_set_list)

#PageRank with networkx defaults (damping factor alpha=0.85), using the 'weight' edge attribute
calculated_page_rank = nx.pagerank(graph, weight='weight')

#most important words first (descending order of importance)
keyphrases = sorted(calculated_page_rank, key=calculated_page_rank.get, reverse=True)
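
This cell stops at the ranked word list. In `extractKeyphrases` the remaining steps keep roughly the top third of the ranked words and join keywords that are adjacent in the text. The sketch below is a simplified two-pass variant of that merging step, not the original loop; `merge_adjacent`, `tokens`, and `ranked` are hypothetical names and data.

```python
def merge_adjacent(tokens, keywords):
    """Join keywords that are next to each other in tokens into two-word phrases."""
    phrases = set()
    used = set()
    # First pass: every adjacent pair of keywords becomes a phrase.
    for first, second in zip(tokens, tokens[1:]):
        if first in keywords and second in keywords:
            phrases.add(first + ' ' + second)
            used.update([first, second])
    # Second pass: keywords that never formed a pair are kept on their own.
    for token in tokens:
        if token in keywords and token not in used:
            phrases.add(token)
    return phrases

# Hypothetical token sequence and ranking, for illustration only.
tokens = ['Zenith', 'announced', 'the', 'Gamespace', 'Pro', 'today']
ranked = ['Gamespace', 'Zenith', 'Pro', 'today', 'release', 'month']

# Same cut as in extractKeyphrases: a third of the words, plus one.
keywords = set(ranked[:len(ranked) // 3 + 1])
phrases = merge_adjacent(tokens, keywords)
```

Here `Gamespace` and `Pro` are adjacent keywords and are merged, while `Zenith` has no keyword neighbor and stays a single-word keyphrase.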

In [26]:
word_set_list

[u'\ufeffLINCOLNSHIRE',
 u'IL',
 u'next-generation',
 u'video',
 u'game',
 u'such',
 u'Xbox',
 u'One',
 u'Playstation',
 u'hitting',
 u'month',
 u'console',
 u'today',
 u'manufacturer',
 u'Zenith',
 u'release',
 u'own',
 u'Gamespace',
 u'Pro',
 u'Nov',
 u'sleek',
 u'silver-and-gray',
 u'box',
 u'double-analog-stick',
 u'ability',
 u'price',
 u'way',
 u'\u2018Move',
 u'Sony',
 u'Microsoft',
 u'player',
 u'\u2019\u201d',
 u'CEO',
 u'Michael',
 u'Ahn',
 u'press',
 u'event',
 u'system\u2019s',
 u'launch',
 u'MoonChaser',
 u'Radiation',
 u'Cris',
 u'Collinsworth\u2019s',
 u'Pigskin',
 u'survival-horror',
 u'thriller',
 u'InZomnia',
 u'\u201cWith',
 u'internet',
 u'Z-Connect',
 u'technology',
 u'big',
 u'world\u201d']

In [35]:
keyphrases

[u'double-analog-stick',
 u'survival-horror',
 u'silver-and-gray',
 u'next-generation',
 u'\ufeffLINCOLNSHIRE',
 u'Collinsworth\u2019s',
 u'manufacturer',
 u'Playstation',
 u'technology',
 u'MoonChaser',
 u'Gamespace',
 u'Microsoft',
 u'Z-Connect',
 u'Radiation',
 u'system\u2019s',
 u'InZomnia',
 u'thriller',
 u'internet',
 u'Pigskin',
 u'ability',
 u'hitting',
 u'Michael',
 u'release',
 u'console',
 u'world\u201d',
 u'launch',
 u'player',
 u'Zenith',
 u'\u201cWith',
 u'IL',
 u'\u2019\u201d',
 u'CEO',
 u'press',
 u'\u2018Move',
 u'Xbox',
 u'sleek',
 u'video',
 u'today',
 u'event',
 u'price',
 u'such',
 u'Nov',
 u'game',
 u'way',
 u'box',
 u'month',
 u'Ahn',
 u'big',
 u'Pro',
 u'Cris',
 u'own',
 u'Sony',
 u'One']
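
The ranking above appears to follow word length closely, with the longest tokens first and three-letter words last. A likely reason is that `buildGraph` uses the Levenshtein distance as the edge weight: a long word is far from every other word, so it accumulates a large total edge weight, and PageRank on an undirected weighted graph tends to reward that. The sketch below restates the distance function in a form that also runs on Python 3 and sums the edge weights for a few words from the list. `levenshtein`, `words`, and `strength` are illustrative names.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    if len(a) > len(b):
        a, b = b, a
    previous = list(range(len(a) + 1))
    for j, ch_b in enumerate(b):
        current = [j + 1]
        for i, ch_a in enumerate(a):
            if ch_a == ch_b:
                current.append(previous[i])
            else:
                current.append(1 + min(previous[i], previous[i + 1], current[-1]))
        previous = current
    return previous[-1]

words = ['double-analog-stick', 'console', 'game', 'Sony', 'One']

# Total weight of the edges touching each word in a complete graph over `words`.
strength = dict(
    (w, sum(levenshtein(w, other) for other in words if other != w))
    for w in words
)
by_strength = sorted(words, key=strength.get, reverse=True)
```

In this small sample the longest word has the largest total edge weight. If the goal is to rank words by relatedness rather than by dissimilarity, a co-occurrence window, as described in the TextRank paper, may be a better source of edges.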

In [30]:
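# nodes_iter is a method, so iterating over it without calling it raises
# "TypeError: 'instancemethod' object is not iterable"; iterate over graph.nodes() instead
for node in graph.nodes():
    print node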

In [31]:
sent_detector = nltk.data.load('tokenizers/punkt/english.pickle')
sentenceTokens = sent_detector.tokenize(text.strip())
graph = buildGraph(sentenceTokens)

calculated_page_rank = nx.pagerank(graph, weight='weight')


In [32]:
sentenceTokens

[u'\ufeffLINCOLNSHIRE, IL With next-generation video game systems such as the Xbox One and the Playstation 4 hitting stores later this month, the console wars got even hotter today as electronics manufacturer Zenith announced the release of its own console, the Gamespace Pro, which arrives in stores Nov. 19.',
 u'\u201cWith its sleek silver-and-gray box, double-analog-stick controllers, ability to play CDs, and starting price of $374.99, the Gamespace Pro is our way of saying, \u2018Move over, Sony and Microsoft, Zenith is now officially a player in the console game,\u2019\u201d said Zenith CEO Michael Ahn at a Gamespace Pro press event, showcasing the system\u2019s launch titles MoonChaser: Radiation, Cris Collinsworth\u2019s Pigskin 2013, and survival-horror thriller InZomnia.',
 u'\u201cWith over nine launch titles, 3D graphics, and the ability to log on to the internet using our Z-Connect technology, Zenith is finally poised to make some big waves in the video game world.\u201d Acc

In [33]:
text


u'\ufeffLINCOLNSHIRE, IL With next-generation video game systems such as the Xbox One and the Playstation 4 hitting stores later this month, the console wars got even hotter today as electronics manufacturer Zenith announced the release of its own console, the Gamespace Pro, which arrives in stores Nov. 19. \u201cWith its sleek silver-and-gray box, double-analog-stick controllers, ability to play CDs, and starting price of $374.99, the Gamespace Pro is our way of saying, \u2018Move over, Sony and Microsoft, Zenith is now officially a player in the console game,\u2019\u201d said Zenith CEO Michael Ahn at a Gamespace Pro press event, showcasing the system\u2019s launch titles MoonChaser: Radiation, Cris Collinsworth\u2019s Pigskin 2013, and survival-horror thriller InZomnia. \u201cWith over nine launch titles, 3D graphics, and the ability to log on to the internet using our Z-Connect technology, Zenith is finally poised to make some big waves in the video game world.\u201d According to Z

In [34]:
calculated_page_rank

{u'\u201cWith its sleek silver-and-gray box, double-analog-stick controllers, ability to play CDs, and starting price of $374.99, the Gamespace Pro is our way of saying, \u2018Move over, Sony and Microsoft, Zenith is now officially a player in the console game,\u2019\u201d said Zenith CEO Michael Ahn at a Gamespace Pro press event, showcasing the system\u2019s launch titles MoonChaser: Radiation, Cris Collinsworth\u2019s Pigskin 2013, and survival-horror thriller InZomnia.': 0.363650754848674,
 u'\u201cWith over nine launch titles, 3D graphics, and the ability to log on to the internet using our Z-Connect technology, Zenith is finally poised to make some big waves in the video game world.\u201d According to Zenith representatives, over 650 units have already been preordered.': 0.31264271392367987,
 u'\ufeffLINCOLNSHIRE, IL With next-generation video game systems such as the Xbox One and the Playstation 4 hitting stores later this month, the console wars got even hotter today as electro
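
Given a score dictionary like the one above, `extractSentences` builds the summary by sorting the sentences by score, joining them, and cutting the result at a fixed number of words. The sketch below shows that step on hypothetical data; `build_summary` and `toy_scores` are illustrative names.

```python
def build_summary(scores, max_words=100):
    """Join sentences from highest to lowest score and keep the first max_words words."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    words = ' '.join(ranked).split()
    return ' '.join(words[:max_words])

# Hypothetical sentences and scores, for illustration only.
toy_scores = {
    'Zenith announced a console.': 0.36,
    'Preorders are open.': 0.31,
    'It arrives in November.': 0.33,
}
short_summary = build_summary(toy_scores, max_words=6)
```

Two properties of this approach are worth keeping in mind: sentences appear in score order rather than document order, and the word cut can end the summary in the middle of a sentence.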