# Lab 6b
## The Bonusing

You've made it. Congratulations.

At this point you know how to use flow control and iteration to programmatically interact with a streaming API, to process data, and to create interactive web maps out of said data. That's pretty impressive, and you can and should feel good about that!

Even though some of the questions you answered were 'canned,' they were teaching you the core tenets of scripting: using programming to manipulate, process, and test spatial data and algorithms. Sure, Twitter is worthless, but there are other streaming APIs. And, yes, turtles are a bit silly, but the idea of a random (or semi-random) walk through a road network? Same principles.

### So, now, let's have a bit of fun.

None of this lab is required. If you hand in the notebook for lab 6a (or equivalent scripts) with questions 1 and 2 answered, you're done. You will receive full credit.

But, what if you want more? What if your urge to program is insatiable? What if, for example, [Aunt May were the herald of Galactus](https://www.cbr.com/the-comic-book-fools-of-april-aunt-may-herald-of-galactus/)?

## Bonus 1: A 'heat' map, +2 pts.

A brief note on data binding: Although you are (most likely) using the Streaming API, the data in your maps is 'static.' What I mean is that once you create the map, you don't add data to it. There are ways to add data on the fly. In Esri environments, you do so with the [GeoEvent Server](http://www.esri.com/arcgis/products/geoevent-server). In other environments, you can use JavaScript frameworks such as [angular.js](https://angularjs.org/). Feel free to *experiment* with either. I might look at these guides [here](https://codehandbook.org/creating-a-web-app-using-angularjs-python-mongodb/) and [here](https://medium.com/@peregringaret/a-different-stack-angular-flask-mongodb-780b44e10afd), which use [flask](http://flask.pocoo.org/) and [mongodb](https://www.mongodb.com/) to create a web 'stack.'

Is all of that too much? No worries! That's why it's optional. Some of you are going to dive into python and swim within its majesty; others of you are going to learn the basics in order to create and deploy specific solutions or to test particular questions. **Either or both are fine!** The point of this class has never been to make you a programmer, but rather to teach you to *think computationally*, so that you can pursue programming and automation *as far as you need to*. You **can** go out and learn Flask on your own if you need to now, and that's awesome!

Ok, with that out of the way:

This is pretty straightforward. Heat Maps are (usually) meaningless, especially when they aren't [controlled for population](https://xkcd.com/1138/), but they are also pretty. Additionally, they'll teach you how to do some spatial interpolation programmatically, so that's cool.

Your task:

Create a slippy heatmap (one over which I can pan and zoom, so use Folium or some other library) from *at least* 10,000 geolocated tweets (this means you'll have to harvest tweets for a while before you build the heatmap).

In [41]:
import json
import os
import pickle

import folium
import numpy
import numpy as np
import pandas as pd
import tweepy
from folium.plugins import HeatMap
from geopy.distance import great_circle
from shapely.geometry import MultiPoint
from sklearn.cluster import DBSCAN

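#Set up Tweepy
#The credentials were removed from this notebook; fill in your own before running
CK = ''  #consumer key
CS = ''  #consumer secret
AK = ''  #access token
AS = ''  #access token secret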
auth = tweepy.OAuthHandler(CK, CS)
auth.set_access_token(AK, AS)
twitterApi = tweepy.API(auth)

#List of lat/long coords, needed for the heatmap
points = []
#Temporary list of tweets, the stream pushes data to this list
tweets = []
#List of all tweets that have coords (incl. user, text, cords)
tweetData = []
#A counter for incrementing the stream
val = 0

#Appends tweet info to the in-memory lists: points gets lat/long only, tweetData keeps user and text too
def writeToList(user, text, lat, long) :
    points.append([lat, long])
    tweetData.append([user, text, lat, long])

#Parse the tweet data and check for coords. If coords, push it to the data lists
def parseTweets(tweetList) :
    for data in tweetList :
        try :
            all_data = json.loads(data)
            user = all_data['user']['screen_name']
            text = all_data['text']
            if all_data['coordinates'] :
                #GeoJSON order is [longitude, latitude]
                long = all_data['coordinates']['coordinates'][0]
                lat = all_data['coordinates']['coordinates'][1]
                writeToList(user, text, lat, long)
        except (ValueError, KeyError, TypeError) :
            #Skip messages that aren't tweets (e.g. limit notices) or are malformed
            pass

#A stream listener
#Appends each raw message to the tweets list; parsing happens later in parseTweets
#Note: tweepy.StreamListener exists in Tweepy 3.x; it was merged into tweepy.Stream in 4.0
class CustomStreamListener(tweepy.StreamListener):

    #Handle a refused connection (420 = rate limited) so the reconnect delay doesn't keep growing
    def on_error(self, status_code):
        if status_code == 420:
            #returning False in on_error disconnects the stream
            return False

    def on_data(self, data):
        global val
        #Stop after 1000 messages so the main loop can parse this batch
        if val >= 1000 :
            return False
        tweets.append(data)
        val += 1
        return True
        
#Starts a stream of tweets covering the whole globe
#Loops until at least one message has been collected (val > 0)
def streamTweets() :
    global val
    val = 0
    while not val :
        try:
            stream = tweepy.Stream(auth=twitterApi.auth, listener=CustomStreamListener())
            #Bounding box is [west, south, east, north]
            stream.filter(locations=[-180, -90, 180, 90])
        except Exception as e:
            print(e)
            print('Trying to continue')
            continue

#Save the tweetData to a pickle file so we don't need to get new data every time
#https://pythontips.com/2013/08/02/what-is-pickle-in-python/
def saveTweets(tweetList) :
    file_Name = "tweetData"
    #The with block closes the file even if pickle.dump raises
    with open(file_Name,'wb') as fileObject :
        pickle.dump(tweetList,fileObject)

#get the centermost point from a list of points 
#Thanks to http://geoffboeing.com/2014/08/clustering-to-reduce-spatial-data-set-size/
def get_centermost_point(cluster):
    #Build the centroid geometry once instead of once per coordinate
    centroid_geom = MultiPoint(cluster).centroid
    centroid = (centroid_geom.x, centroid_geom.y)
    centermost_point = min(cluster, key=lambda point: great_circle(point, centroid).m)
    return tuple(centermost_point)

#Simplify points. Take points, run some spatial analysis and simplify them 
#Clusters points that are within 1.5 km of each other and keeps one point per cluster
#Thanks to http://geoffboeing.com/2014/08/clustering-to-reduce-spatial-data-set-size/
def simplify(points) :
    ptArray = np.array(points)
    kms_per_radian = 6371.0088
    #Cluster radius: 1.5 km, expressed in radians for the haversine metric
    epsilon = 1.5 / kms_per_radian
    db = DBSCAN(eps=epsilon, min_samples=1, algorithm='ball_tree', metric='haversine').fit(np.radians(ptArray))
    cluster_labels = db.labels_
    #With min_samples=1 every point joins a cluster, so there is no noise label (-1)
    num_clusters = len(set(cluster_labels))
    clusters = pd.Series([ptArray[cluster_labels == n] for n in range(num_clusters)])
    print('Simplified points to %s clusters' % num_clusters)
    #Keep the first point of each cluster as its representative
    cluster_points = []
    for row in clusters :
        if len(row) > 0 :
            cluster_points.append((row[0]).tolist())
    return cluster_points

#MAIN
if not (os.path.isfile('tweetData'))  :
    print("Collecting Tweets . . . ")
    while len(points) < 100000 :
        tweets = []
        streamTweets()
        parseTweets(tweets)
        print("%s geolocated tweets collected" % len(points))

    print("Tweet collection complete, saving tweetData . . .")
    try :
        saveTweets(tweetData)
        print("Tweets saved")
    except (OSError, pickle.PicklingError) as e :
        print("Error saving tweets (%s), continuing . . . " % e)
else :
    print("Found tweet data! Continuing . . .")
    #The with block closes the file once the data is loaded
    with open('tweetData','rb') as fileObject :
        tweetData = pickle.load(fileObject)
    print("There are %s Points in the found file. Continuing . . ." % len(tweetData))
    print("Parsing points . . .")
    points = []
    for row in tweetData :
        points.append([row[2],row[3]])


tweetPoints = simplify(points)

hmap = folium.Map(
    location=[25, 0], 
    tiles='CartoDB dark_matter',
    control_scale = True, 
    zoom_start=2
    )
hmap.add_child(HeatMap(tweetPoints, radius = 5, blur = 5))
hmap


Found tweet data! Continuing . . .
There are 100079 Points in the found file. Continuing . . .
Parsing points . . .
Simplified points to 29152 clusters
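
A note on the `epsilon` value in `simplify`: the haversine metric works on coordinates in radians and returns a distance in radians on a unit sphere, which is why the 1.5 km radius is divided by the Earth's radius in km. The sketch below checks that conversion by hand. `haversine_radians` and the two sample points are made up for this illustration and are not part of the lab code or of any library.

```python
import math

KMS_PER_RADIAN = 6371.0088  # mean Earth radius in km, same constant as in simplify()

def haversine_radians(p1, p2):
    """Great-circle distance between two (lat, lon) points in degrees, returned in radians."""
    lat1, lon1 = map(math.radians, p1)
    lat2, lon2 = map(math.radians, p2)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * math.asin(math.sqrt(h))

epsilon = 1.5 / KMS_PER_RADIAN  # 1.5 km expressed in radians

# Two hypothetical points 0.01 degrees of latitude apart (a bit over 1 km)
p = (40.00, -105.00)
q = (40.01, -105.00)
d = haversine_radians(p, q)

print("distance: %.3f km" % (d * KMS_PER_RADIAN))
print("within epsilon:", d <= epsilon)
```

Multiplying a haversine distance by `KMS_PER_RADIAN` converts it back to kilometres, so the same constant works in both directions.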


## Bonus 2: A map of feelings, +2 pts.

Some of you used [NLTK](http://www.nltk.org/) to analyze our old friend H.P. Lovecraft. If you didn't, that's fine. NLTK is, as the name suggests, a Natural Language processing ToolKit. It's not the only one, and you are free to find and use another one, but I recommend it for this task.

You know those tweets? Who cares *how many* there are, let's talk about our ***feelings***. What you need to do now is run a sentiment analysis on your tweets. Categorize them by positive or negative emotions **and then create a heat map of how people feel according to their tweets**. 

Your task:

Create an interactive heatmap where the colors correspond not to number of tweets, but overall emotion of tweets from that area. In other words, interpolate according to the results of your sentiment analysis. While you can choose your own color ramp, I might recommend something like red for positive and blue for negative.

This must be based on **at least** 50,000 tweets.
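
One possible way to get from tweets to 'emotion per area' is to score each tweet and then average the scores inside grid cells. The sketch below assumes rows shaped like the `tweetData` list from Bonus 1 (`[user, text, lat, long]`). The word lists, `toy_score`, and `mean_sentiment_by_cell` are hypothetical stand-ins so the example runs without downloads. With NLTK, `SentimentIntensityAnalyzer().polarity_scores(text)['compound']` from `nltk.sentiment.vader` could be passed as `score` instead (it needs the `vader_lexicon` resource downloaded first).

```python
from collections import defaultdict

# Hypothetical stand-in lexicon, for illustration only
POSITIVE = {"love", "great", "happy", "good"}
NEGATIVE = {"hate", "awful", "sad", "bad"}

def toy_score(text):
    """Return a score in [-1, 1]: (positive - negative) / matched words, 0 if none match."""
    words = text.lower().split()
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return 0.0 if pos + neg == 0 else (pos - neg) / (pos + neg)

def mean_sentiment_by_cell(rows, score=toy_score, cell_size=1.0):
    """Average sentiment per grid cell; rows are [user, text, lat, long]."""
    totals = defaultdict(lambda: [0.0, 0])  # cell -> [sum of scores, tweet count]
    for user, text, lat, lon in rows:
        # Floor division snaps each coordinate to the south-west corner of its cell
        cell = (int(lat // cell_size), int(lon // cell_size))
        totals[cell][0] += score(text)
        totals[cell][1] += 1
    return {cell: total / count for cell, (total, count) in totals.items()}

rows = [
    ["a", "I love this great city", 40.2, -105.3],
    ["b", "what an awful sad day", 40.7, -105.9],
    ["c", "bad traffic", 51.5, -0.1],
]
cells = mean_sentiment_by_cell(rows)
for cell, mean in sorted(cells.items()):
    print(cell, mean)
```

The per-cell means can then drive whatever color ramp you choose. A fixed grid is only one option; interpolating a surface from the scored points is closer to what a heat map does.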


## Bonus 3: Let's bring it all together, +4 pts

Ok, are you ready? This is going to be fun, but I'm going to throw a bunch of curveballs at you.

You're still creating a slippy map, but this time:
1. It must be based on at least 100,000 tweets.
2. The **color** must be set by the sentiment of the tweet (red for positive, blue for negative).
3. The **intensity** (brightness, alpha value, etc. - how precisely you execute this is up to you; although I might look at [VBA maps](http://andywoodruff.com/blog/value-by-alpha-maps/)) must be set by *the relationship of tweets to population*.

Let me explain that third part. For this map, you can use explicit areal units if you want to (rather than creating them in a surface, which is what a heat map does). You can take census blocks if you like, or create hexagons, etc. However, your units cannot be larger than **counties** (or, if you are creating squares/hexes, 1,000 square miles).

Within those units, I want you to create a 'tweets per population' figure, as in how many geolocated tweets are there per person within that space.

So now you have the average sentiment of tweets emanating from an area **and** how many tweets (per population). 

Use the sentiments to set the color and the normalized number of tweets to set the intensity/vibrancy/brightness of each unit.
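
As a rough sketch of that last step, the snippet below turns a unit's mean sentiment into a hue and its tweets-per-person rate into an opacity. All names and numbers here (`tweets_per_capita`, `value_by_alpha`, the three sample units) are hypothetical, and linear scaling against the busiest unit is just one assumption about how to normalize.

```python
def tweets_per_capita(tweet_count, population):
    """Geolocated tweets per person; 0 for unpopulated units."""
    return tweet_count / population if population > 0 else 0.0

def value_by_alpha(sentiment, rate, max_rate):
    """Map mean sentiment to a color and a per-capita rate to an opacity.

    Returns (hex_color, alpha): red for positive, blue for negative,
    alpha in [0, 1] scaled linearly against max_rate.
    """
    color = "#ff0000" if sentiment >= 0 else "#0000ff"
    alpha = min(rate / max_rate, 1.0) if max_rate > 0 else 0.0
    return color, alpha

# Hypothetical units: name -> (mean sentiment, tweet count, population)
units = {
    "A": (0.4, 500, 100000),
    "B": (-0.2, 50, 100000),
    "C": (0.1, 0, 0),
}
rates = {name: tweets_per_capita(n, pop) for name, (_, n, pop) in units.items()}
max_rate = max(rates.values())
styles = {name: value_by_alpha(s, rates[name], max_rate) for name, (s, _, _) in units.items()}

for name, (color, alpha) in styles.items():
    print(name, color, round(alpha, 2))
```

In a Folium choropleth drawn from GeoJSON, a pair like this could plausibly feed the `fillColor` and `fillOpacity` style properties of each polygon. Rates are often heavily skewed, so a log or quantile scale may read better than the linear one used here.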

## THIS IS HARD!
Seriously, I struggled to find the solution to all this. If you can't get it, **don't fret**. There's a reason this is the final ultra bonus; think of it like the last boss of the internet. _I know you can do it._
