# Event Handling and Textual Analysis

This week we will continue to work with the Twitter data, develop the GUI further, and perform some basic textual analysis.  In-depth text analysis is well outside the scope of this class, but we can leverage a Python library to get a feel for what textual analysis might look like.  Really, the goal is to generate a few concrete marks for our points, demonstrate how easy it is to leverage an outside library, and have a good reason for a slightly more involved GUI.

# Event Handling in Folium
Last week you handled GUI events when you supported opening a JSON file with tweets.  This week, we will add the ability to do that same thing for points on a [Folium](https://folium.readthedocs.org/en/latest/quickstart.html) map.  

I am intentionally not telling you how to add a marker.  The link above points to the Folium quickstart, which describes the additional argument needed to get a popup.  In working through this assignment, I already had a Tweet class with an attribute that stored the entire tweet.  I used the `text` key to create a new attribute, `text`, so that I could pass `tweet.text` to the code that generates a mark with a popup.  The result was this:

<img src="images/popup.png" />

Just like in PyQt, the user triggers an event and the Folium event handler (implemented in JavaScript) shows the popup.  I will note that adding popup functionality, with 500 markers, does slow the code down a bit.  Performance optimization is also something we will not worry about in this class.
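The `text` attribute described above could be sketched as follows.  This is only an illustration: the class layout, the `tweet` attribute name, and the sample record are assumptions, not the class used in the course repository, and the marker itself is still left for you to work out.

```python
class Tweet(object):
    """Hypothetical container for one tweet loaded from the JSON file."""

    def __init__(self, tweet_dict):
        # Keep the entire tweet so the other keys remain available later.
        self.tweet = tweet_dict
        # Pull the message body out of the `text` key for use in a popup.
        self.text = tweet_dict['text']


# Made-up record for illustration only.
sample = Tweet({'text': 'Sunny day in Phoenix', 'lang': 'en'})
print(sample.text)
```

Storing the text as its own attribute keeps the map code simple, since it only ever needs `tweet.text`.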

# Natural Language Toolkit
The [Natural Language Toolkit](http://www.nltk.org) (NLTK) is a library for working with human-generated, textual data.  In addition to language analytical functionality, NLTK also includes over 50 corpora of written examples.  These are essential data sets that support textual analysis.  As a brief aside, a corpus of classified language examples allows for the utilization of a large number of machine learning techniques.  The corpus can be split into a training set and a test set.  The former is used to teach the computer about characteristics within a dataset that result in a given classification.  The latter is used to test how well the computer has learned.  By way of example, imagine that we had 50,000 tweets and each word within each tweet was classified as being either positive or negative.  We could teach the computer to identify those words within sentences and make decisions about the sentiment of the entire sentence.  This would be accurate to some percentage.  
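As a rough illustration of the training/test split described above, here is a minimal sketch in plain Python.  The function name, the 80/20 ratio, and the ten labeled words are all made up for the example; a real corpus would be far larger.

```python
import random


def train_test_split(examples, test_fraction=0.2, seed=0):
    """Shuffle labeled examples and split them into (training, test) lists."""
    shuffled = list(examples)
    # A seeded generator makes the split repeatable between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Made-up labeled words standing in for a real classified corpus.
labeled = [('happy', 'Positive'), ('sad', 'Negative'), ('great', 'Positive'),
           ('awful', 'Negative'), ('love', 'Positive'), ('hate', 'Negative'),
           ('good', 'Positive'), ('bad', 'Negative'), ('nice', 'Positive'),
           ('ugly', 'Negative')]

training, test = train_test_split(labeled)
print(len(training), len(test))
```

The training list would be used to fit a model and the held-out test list to measure how often its predictions are correct.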

Building and training a Machine Learning (ML) algorithm like that is well outside of the scope of this class.  Luckily, NLTK has already done this work for us.  The library has a corpus that will allow us to classify, as Negative, Positive, or Neutral, all of the text within our tweets.  Using this information we can generate a sentiment mark for every point and start to explore (Exploratory Spatial Data Analysis or ESDA) the distribution of tweeted sentiment across the Phoenix metro area.  One quick note: The sample size that we are working with is minuscule and we are only using a few simple spatial statistics - do not draw any conclusions from the results.

## Installation
Just like with other packages, we need to execute `pip install nltk`.  It is important to utilize pip; the conda nltk was not working well for me on OS X.  The installation should be relatively quick.  Once this is done, we need to download one of the language corpora.

NLTK ships with a download tool.  To utilize it, execute the code in the cell below.  Alternatively, open a new Python session and copy/paste the code.

In [1]:
import nltk
nltk.download()

showing info http://www.nltk.org/nltk_data/


True

You should see the NLTK download window open.  In the top blue tabs, click on 'Corpora' and navigate to the `opinion_lexicon` entry.  Once highlighted, click download (lower left, above the URL text box).  This should take just a second or two to download and install the corpus.

<img src="images/opinion.png" />

## Test your installation
To test your installation, execute (or copy/paste) the code below:

In [2]:
from nltk.sentiment.util import demo_liu_hu_lexicon as classifier
classifier('I am sad')
classifier('I am happy')
classifier('I am neither happy nor sad')

Negative
Positive
Neutral


The above function utilizes the lexicon, determines the sentiment for the sentence, and prints the result to screen.  This is not really a complex machine learning algorithm, but rather a relatively straightforward rule, much like a very small [decision tree](http://scikit-learn.org/stable/modules/tree.html): count the positive and negative words in the sentence and compare the totals.  The code below is a modification of the logic that returns the class instead of printing it.  This will be important for your assignment, as we need to be able to add an attribute to the Tweet that contains sentiment.  

In [4]:
from nltk.corpus import opinion_lexicon
from nltk.tokenize import treebank

# Load the lexicon once.  Testing `word in opinion_lexicon.positive()` for
# every word scans the whole word list each time and is very slow.
positive_words = set(opinion_lexicon.positive())
negative_words = set(opinion_lexicon.negative())
tokenizer = treebank.TreebankWordTokenizer()

def classifier(sentence):
    """Return 'Positive', 'Negative', or 'Neutral' for the sentence."""
    pos_words = 0
    neg_words = 0
    tokenized_sent = [word.lower() for word in tokenizer.tokenize(sentence)]

    # Count how many words appear in each half of the lexicon
    for word in tokenized_sent:
        if word in positive_words:
            pos_words += 1
        elif word in negative_words:
            neg_words += 1

    # The larger count decides the class; a tie is Neutral
    if pos_words > neg_words:
        return 'Positive'
    elif pos_words < neg_words:
        return 'Negative'
    else:
        return 'Neutral'
    
# Example Usage:
a = classifier('I am sad')
b = classifier('I am happy')
c = classifier('I am neither happy nor sad')
print(a,b,c)

Negative Positive Neutral
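Once a function like this returns a label, it is easy to apply it to many texts and tally the results.  The sketch below uses a tiny stand-in lexicon (an assumption for illustration, not the NLTK corpus) so that it runs without any download; the sample texts and the `toy_classifier` name are also made up.

```python
from collections import Counter

# Tiny stand-in lexicon (an assumption for illustration, NOT the NLTK corpus).
POSITIVE = {'happy', 'great', 'love'}
NEGATIVE = {'sad', 'awful', 'hate'}


def toy_classifier(sentence):
    """Count lexicon hits and compare the totals, like `classifier` above."""
    words = sentence.lower().split()
    pos = sum(1 for word in words if word in POSITIVE)
    neg = sum(1 for word in words if word in NEGATIVE)
    if pos > neg:
        return 'Positive'
    if pos < neg:
        return 'Negative'
    return 'Neutral'


# Made-up texts standing in for tweet bodies.
texts = ['I love this great city', 'Traffic is awful', 'Heading to lunch']
labels = [toy_classifier(text) for text in texts]
counts = Counter(labels)
print(labels)
print(counts['Positive'], counts['Negative'], counts['Neutral'])
```

With the real lexicon loaded, swapping `toy_classifier` for `classifier` should be the only change needed.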


# Embedding MatPlotLib in Qt GUIs
Last week you created a few MatPlotLib plots.  It would be awesome to get those plots into our GUI.  Below, I am providing you with a code example containing a new class.  This class is a MatPlotLib plot window with a single button that creates a random plot.  

Try running the code below to experiment with the window.  In completing the assignment, you will need to modify this code to launch the dialog when a particular menu item is selected and embed a G-Function in the MatPlotLib window.

In [5]:
import sys
from PyQt4 import QtGui

from matplotlib.backends.backend_qt4agg import FigureCanvasQTAgg as FigureCanvas
# NavigationToolbar2QTAgg is deprecated; NavigationToolbar2QT is its replacement
from matplotlib.backends.backend_qt4agg import NavigationToolbar2QT as NavigationToolbar
import matplotlib.pyplot as plt

import random

class Window(QtGui.QDialog):
    def __init__(self, parent=None):
        super(Window, self).__init__(parent)

        # a figure instance to plot on
        self.figure = plt.figure()

        # this is the Canvas Widget that displays the `figure`
        # it takes the `figure` instance as a parameter to __init__
        self.canvas = FigureCanvas(self.figure)

        # this is the Navigation widget
        # it takes the Canvas widget and a parent
        self.toolbar = NavigationToolbar(self.canvas, self)

        # Just some button connected to `plot` method
        self.button = QtGui.QPushButton('Plot')
        self.button.clicked.connect(self.plot)

        # set the layout
        layout = QtGui.QVBoxLayout()
        layout.addWidget(self.toolbar)
        layout.addWidget(self.canvas)
        layout.addWidget(self.button)
        self.setLayout(layout)

    def plot(self):
        ''' plot some random stuff '''
        # random data
        data = [random.random() for i in range(10)]

        # discard the old graph (ax.hold is deprecated in newer MatPlotLib)
        self.figure.clear()

        # create an axis
        ax = self.figure.add_subplot(111)

        # plot data
        ax.plot(data, '*-')

        # refresh canvas
        self.canvas.draw()


app = QtGui.QApplication(sys.argv)

main = Window()
main.show()

sys.exit(app.exec_())


SystemExit: 0

To exit: use 'exit', 'quit', or Ctrl-D.


The `SystemExit: 0` message above is fine to ignore for now; it appears because `sys.exit` is called when the window closes.

# Week 14 Deliverables (E11) - Due 4/26/16
This is the final assignment!

For this week make sure that you have completed the following:
    
* Fork Assignment 11 to your own github repository.
    * You can access assignment 11 [HERE](https://github.com/Geospatial-Python/assignment_11)
* Clone the repository locally

## Deliverables
1. Extend the Tweet class to support performing sentiment analysis using the above function.  Think about whether the function should be a method in the class or patched in.  The sentiment should be included as a mark for each point.
1. Extend the GUI menubar to support visualizing `All Tweets`, `Positive Tweets`, `Negative Tweets`, and `Neutral Tweets`.  When a group is selected, redraw the map with only those tweets.  The tweets on the map will be the currently active set/subset of tweets.
1. Add popup capability for each tweet.  On click, display the tweet text.  Consider (this is optional) altering the icon on the map.
1. Add a toolbar button or menubar entries to compute the mean nearest neighbor distance and compute a G function with the currently active (visualized) set of tweets.
1. In the case of the G Function, once computed, open a new window with the MatPlotLib plot embedded.
1. Include three screen captures showing (1) all tweets, (2) a subset of tweets, (3) the dialog containing the G-Function plot.
1. Catch up on late assignments!

Note that when testing, I found 500 classified tweets to be quite large.  Feel free to work with a subset initially and only run the 500 tweets when generating screen captures.
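As a reminder of what the two statistics in the deliverables measure, here is a minimal pure-Python sketch.  It assumes points are plain `(x, y)` tuples and uses straight-line distance on whatever coordinates are supplied; the function names, the four sample points, and the distance steps are made up, and your own point pattern code from earlier weeks may be organized differently.

```python
import math


def nearest_neighbor_distances(points):
    """Return, for each (x, y) point, the distance to its closest other point."""
    distances = []
    for i, (x1, y1) in enumerate(points):
        nearest = min(math.hypot(x1 - x2, y1 - y2)
                      for j, (x2, y2) in enumerate(points) if i != j)
        distances.append(nearest)
    return distances


def g_function(points, steps):
    """Fraction of points whose nearest neighbor lies within each distance."""
    nn = nearest_neighbor_distances(points)
    return [sum(1 for d_nn in nn if d_nn <= d) / float(len(nn)) for d in steps]


# Made-up coordinates for illustration only.
points = [(0, 0), (0, 1), (3, 0), (3, 4)]
nn = nearest_neighbor_distances(points)
mean_nn = sum(nn) / len(nn)

steps = [0.5, 1, 3, 4]
g = g_function(points, steps)
print(mean_nn)
print(g)
```

The `steps` and `g` lists are the x and y values that would be handed to `ax.plot` inside the embedded MatPlotLib dialog.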