## Defining a class to analyze a document 


 - Define a new class called `Analyzer` as follows: 
 
    - It has two attributes:
        - `text`: stores the document text 
        - `vocab`: a dictionary that stores the count of each token  
  
    - It has an `__init__` function which 
       - takes `doc`, a document (i.e. a string), as input and saves `doc` to the `text` attribute
       - calls the `tokenize` function that you defined in Q1 with `doc` as the input, and saves the returned dictionary to the `vocab` attribute 

    - It has a function named `sentiment(positives, negatives)` which 
        - calls the `get_sentiment` function you defined in Q2 to analyze the `text` attribute with the supplied `positives` and `negatives` word lists
        - returns an overall sentiment score

    - It has a function named `topN` which 
        - takes a number `N` as an input
        - finds the most frequent `N` words and their counts in the document
        - returns the top `N` words and their counts as a list of tuples
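
The `topN` behaviour described above can be sketched independently of the class. This is a minimal illustration, assuming `vocab` is a plain token-to-count dictionary; the sample counts and the helper name `top_n` are hypothetical and not part of the assignment. It uses `collections.Counter.most_common`, which returns `(token, count)` tuples ordered from most to least frequent.

```python
from collections import Counter

# Hypothetical token counts standing in for the `vocab` attribute.
vocab = {"of": 4, "in": 4, "at": 3, "is": 2, "the": 2}

def top_n(vocab, n):
    """Return the n most frequent tokens as a list of (token, count) tuples."""
    # Counter accepts a mapping of counts directly; most_common(n) sorts by count, descending.
    return Counter(vocab).most_common(n)

print(top_n(vocab, 3))
```

With `most_common`, tokens that share the same count keep the order in which they were inserted into the dictionary, so a different tie-breaking rule needs an explicit sort key.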
        

In [19]:
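class Analyzer:
    """Stores a document and its token counts for simple text analysis."""

    def __init__(self, doc):
        # Keep the raw document text.
        self.text = doc
        # Token -> count dictionary produced by tokenize() from Q1.
        self.vocab = tokenize(doc)

    def sentiment(self, positives, negatives):
        # get_sentiment() is defined in Q2; its last returned element is used as the overall score.
        return get_sentiment(self.text, positives, negatives)[-1]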
    
    def topN(self, N):
        # Sort (token, count) pairs by count, highest first.
        # Ties are broken by token in reverse alphabetical order.
        ranked = sorted(self.vocab.items(),
                        key=lambda item: (item[1], item[0]),
                        reverse=True)
        # Keep only the first N pairs.
        return ranked[:N]
    

In [20]:
# Code to test your class

with open("positive-words.txt") as f:
    positives = f.read().split()
with open("negative-words.txt") as f:
    negatives = f.read().split()


doc ='''"What investors don’t like is uncertainty," said Jason Draho, 
        head of asset allocation Americas at UBS Global Wealth Management, 
        in a phone interview, pointing to a selloff that’s left 
        few corners of financial markets unscathed in January.

        Even with a sharp rally late Friday, the interest rate-sensitive Nasdaq 
        Composite Index COMP, +3.13% remained in correction territory, defined 
        as a fall of at least 10% from its most recent record close. Worse, 
        the Russell 2000 index of small-capitalization stocks RUT, +1.93% is 
        in a bear market, down at least 20% from its Nov. 8 peak.
    '''
analyzer = Analyzer(doc)

print("Vocabulary: \n", analyzer.vocab, "\n")
print("Sentiment: ", analyzer.sentiment(positives, negatives), "\n")
print("Top 3 words: ",analyzer.topN(3))

Vocabulary: 
 {'what': 1, 'investors': 1, 'don’t': 1, 'like': 1, 'is': 2, 'uncertainty': 1, 'said': 1, 'jason': 1, 'draho': 1, 'head': 1, 'of': 4, 'asset': 1, 'allocation': 1, 'americas': 1, 'at': 3, 'ubs': 1, 'global': 1, 'wealth': 1, 'management': 1, 'in': 4, 'phone': 1, 'interview': 1, 'pointing': 1, 'to': 1, 'selloff': 1, 'that’s': 1, 'left': 1, 'few': 1, 'corners': 1, 'financial': 1, 'markets': 1, 'unscathed': 1, 'january': 1, 'even': 1, 'with': 1, 'sharp': 1, 'rally': 1, 'late': 1, 'friday': 1, 'the': 2, 'interest': 1, 'rate-sensitive': 1, 'nasdaq': 1, 'composite': 1, 'index': 2, 'comp': 1, '3.13': 1, 'remained': 1, 'correction': 1, 'territory': 1, 'defined': 1, 'as': 1, 'fall': 1, 'least': 2, '10': 1, 'from': 2, 'its': 2, 'most': 1, 'recent': 1, 'record': 1, 'close': 1, 'worse': 1, 'russell': 1, '2000': 1, 'small-capitalization': 1, 'stocks': 1, 'rut': 1, '1.93': 1, 'bear': 1, 'market': 1, 'down': 1, '20': 1, 'nov': 1, 'peak': 1} 

Sentiment:  -1 

Top 3 words:  [('of', 4), ('in