# Large Scale Sentiment Analysis 
## Summary and Demo


### Main Purpose/Contribution

* Automatically generate a lexicon for sentiment analysis

    + No human curation is needed
    + A richer dictionary allows deeper analysis
    
* Address the weak coherence that arises when the lexicon is expanded through synonyms and antonyms

    + Relations between synonyms are usually represented as a graph
    + The farther a synonym is from the seed word, the less relevant it is
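
The distance effect can be illustrated on a toy synonym graph. The following is a minimal sketch, not the publication's method: the `SYNONYMS` dictionary is invented for illustration (it is not WordNet output), `hop_distances` and `relevance` are hypothetical helpers, and the halving rule in `relevance` is an assumed decay chosen only to make the trend visible.

```python
from collections import deque

# Toy synonym graph, invented for illustration (not real WordNet output).
SYNONYMS = {
    "good": ["fine", "nice"],
    "fine": ["good", "thin"],
    "nice": ["good", "pleasant"],
    "thin": ["fine", "slender"],
    "pleasant": ["nice"],
    "slender": ["thin"],
}

def hop_distances(graph, seed, max_depth=5):
    """Breadth-first search: minimum synonym hops from seed, up to max_depth."""
    distances = {seed: 0}
    queue = deque([seed])
    while queue:
        word = queue.popleft()
        if distances[word] == max_depth:
            continue  # depth limit reached, do not expand this word
        for neighbour in graph.get(word, []):
            if neighbour not in distances:
                distances[neighbour] = distances[word] + 1
                queue.append(neighbour)
    return distances

def relevance(distance, decay=0.5):
    """Assumed scoring rule: relevance halves with every extra hop."""
    return decay ** distance

for word, hops in sorted(hop_distances(SYNONYMS, "good").items(), key=lambda kv: kv[1]):
    print(f"{word:10s} hops={hops} relevance={relevance(hops):.3f}")
```

In this toy graph, "slender" is reachable from "good" through "fine" and "thin", yet it carries little of the seed's sentiment, which is the kind of drift a distance-based score is meant to penalize.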
    

### Sentiment Lexicon Generation

* Path-based analysis
        
    + Explore all possible paths
    + Determine the shortest path
    
* Generate the sentiment dictionary from a list of seed words

    + The graph depth (number of hops from the seed words) is predetermined (5 in the publication).
    + Fetch high-ranking synonyms and antonyms from WordNet
    + Significance score
        
        - Determining factor: distance from the seed word (based on WordNet).
        - The greater the distance, the lower the score and the lower the relevance

    + Count the sentiment polarity flips along every path and keep only the paths that satisfy the flip threshold

        - Assumption: the more flips a path contains, the less reliable it is
        - After the graph is constructed, re-enumerate all paths to count the flips
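
The flip filter can be sketched on a small signed graph. This is an illustrative sketch under stated assumptions rather than the code used in the demo: the `EDGES` data is invented, a flip is assumed to occur on every antonym edge, and `enumerate_paths`, `reliable_paths`, and the `max_flips=1` threshold are hypothetical names and values.

```python
# Toy signed graph, invented for illustration:
# +1 marks a synonym edge, -1 marks an antonym edge.
EDGES = {
    "good": [("nice", 1), ("bad", -1)],
    "nice": [("pleasant", 1)],
    "bad": [("evil", 1), ("good", -1)],
    "evil": [("virtuous", -1)],
}

def enumerate_paths(edges, seed, max_depth=5):
    """Yield (path, flips) for every cycle-free path of 1 to max_depth hops from seed."""
    stack = [([seed], 0)]
    while stack:
        path, flips = stack.pop()
        if len(path) > 1:
            yield path, flips
        if len(path) - 1 == max_depth:
            continue  # depth limit reached, do not extend further
        for neighbour, sign in edges.get(path[-1], []):
            if neighbour not in path:  # skip cycles
                # Assumption: an antonym edge counts as one polarity flip.
                stack.append((path + [neighbour], flips + (1 if sign == -1 else 0)))

def reliable_paths(edges, seed, max_flips=1, max_depth=5):
    """Keep only the paths whose flip count does not exceed max_flips."""
    return [(p, f) for p, f in enumerate_paths(edges, seed, max_depth) if f <= max_flips]

for path, flips in reliable_paths(EDGES, "good"):
    polarity = 1 if flips % 2 == 0 else -1  # polarity relative to a positive seed
    print(" -> ".join(path), f"flips={flips} polarity={polarity:+d}")
```

With this threshold the path ending in "virtuous" is dropped because it crosses two antonym edges, while the single-flip paths to "bad" and "evil" are kept with negative polarity.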
        

In [1]:
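import lexicon_builder
from WordGraph import Graph

# Seed entry; the last element is the polarity (+1 positive, -1 negative).
seeded_word_list = [('good', 0, 1)]

# Build the word connections reachable from the seed list.
connections = lexicon_builder.build_connections(0, seeded_word_list, [])

# Wrap the connections in a graph keyed by word tuples.
g = Graph(connections)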

In [2]:
import pprint
pretty_print = pprint.PrettyPrinter()
pretty_print.pprint(g.graph)

                                 ('goodness', 4.0, 1)},
             ('goodness', 3.0, 1): {('bad', 1.0, -1),
                                    ('bad', 3.0, -1),
                                    ('badness', 1.0, -1),
                                    ('badness', 3.0, -1),
                                    ('evil', 4.0, -1),
                                    ('evilness', 4.0, -1),
                                    ('good', 1.0, 1),
                                    ('goodness', 1.0, 1)},
             ('goodness', 4.0, 1): {('bad', 3.0, -1),
                                    ('badness', 3.0, -1),
                                    ('evil', 1.0, -1),
                                    ('evil', 4.0, -1),
                                    ('evilness', 1.0, -1),
                                    ('evilness', 4.0, -1),
                                    ('good', 1.0, 1),
                                    ('goodness', 1.0, 1)},
             ('gravely', 1.0, -1): {('ba