In [1]:
def HelloClass():
    return "Hello Class!"

print(HelloClass())

Hello Class!


# Lab 5 

I know it seems like just yesterday you were struggling with "what is a notebook? what is github? how do I `import arcgis`??" But, that's the nature of the quarter system and here we are, ready to start putting things all together.

If you remember the core goals of this class, they're to help you programmatically:
1. Get some data.
2. Format, process, analyze, etc. said data.
3. Visualize it in some way.

We've done so primarily through the ArcGIS API for Python **and** more recently, GeoPandas and Folium. In this lab, we're going to **acquire** data from Twitter (a robust, well developed API that spits out a lot of data), process it into a format that's easy to map (in this case, I recommend a csv), and then add it to a series of interactive maps. 

As this is your second-to-last lab, I'm going to leave open precisely how you do a lot of this; however, I'm also going to be working more directly with each group throughout the process. For example, I recommend you use Folium for your visualization, but you can use the ArcGIS API if you prefer.

**In fact, you can and should mix and match whatever _tools_ work best for your group.** After this class, you'll encounter many problems - use the core ideas and tools you've encountered in this class to cobble together clever solutions! Iteration! Flow control! Using libraries and reading documentation are the keys to solving a vast, vast variety of geospatial problems.


### Building our environment

As always, let's get our virtual environment set up before we start. This time, I'm going to install packages from a different channel (conda-forge) so that we can install everything all at once. I do so with the `-c` flag in the lines below.

`conda create -n lab5 python=3.6`

After that's done, activate your environment and then install the necessary packages. 

`conda install -c conda-forge geopandas jupyter folium fiona tweepy geopy`

I'm assuming you *aren't* going to use the ArcGIS API for Python. If you *are*, you can install it in your created environment with:

`conda install -c esri arcgis` *(See how you switch 'channels' to grab different packages?)*

You should recognize most of the packages above, as we've used them before; I'm adding in tweepy (a library for interfacing with Twitter's API) and geopy (a handy library for geocoding).

### Let's look at some tweets!

Now, *in the future*, if you want to do this, you'll have to create your own Twitter account and apply to become a developer. Their Standard APIs are still free to use and test, but they make you apply and try to sell you their more robust ones. You can [read about that all here](https://developer.twitter.com/en/apply-for-access).

But, for this lab, I'm going to let you use some of my own API keys. To access the Twitter API, you need a set of **keys and access tokens**. There are steps at the [developers page](https://developer.twitter.com) that walk you through this, but *for now* know that you'll need four things: a Consumer Key, a Consumer Secret Key, an Access Token, and an Access Token Secret.

We talked a bit about this in our API lecture - these are basically like log-ins and passwords that let the API owner keep track of who is accessing their information, how much of it, and when. 

Below, I've supplied each group with a set of these tokens. Please note: **storing keys like this in a GitHub repository is TERRIBLE PRACTICE**. You always want to remove private information before you upload, but these are basic-level keys for openly available information, and it's just easiest for this class right now.
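For your own projects later on, a safer habit is to keep keys out of the notebook entirely and read them from environment variables at run time. Here's a minimal sketch; the variable names (`TWITTER_CONSUMER_KEY` and friends) and the helper `load_twitter_keys` are hypothetical, so rename them however you like.

```python
import os

# Hypothetical environment-variable names; set them in your shell before starting Jupyter,
# e.g. `export TWITTER_CONSUMER_KEY=...` (macOS/Linux) or `set TWITTER_CONSUMER_KEY=...` (Windows)
KEY_NAMES = [
    'TWITTER_CONSUMER_KEY',
    'TWITTER_CONSUMER_SECRET',
    'TWITTER_ACCESS_TOKEN',
    'TWITTER_ACCESS_SECRET',
]

def load_twitter_keys(names=KEY_NAMES):
    """Return the four credentials as a tuple, failing loudly if any are missing."""
    missing = [name for name in names if not os.environ.get(name)]
    if missing:
        raise KeyError('Missing environment variables: ' + ', '.join(missing))
    return tuple(os.environ[name] for name in names)
```

You'd then write `ConsumerKey, ConsumerSecret, AccessKey, AccessSecret = load_twitter_keys()` and pass those into `tweepy.OAuthHandler` exactly as in the next cell. A clear "missing variable" error up front is easier to debug than an authentication failure from the API later.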

Let's check if everything is working. In the following cell, assign each key to a variable, and then pass those variables into a Tweepy API object.

In [1]:
import tweepy

ConsumerKey = 'yj50FC54j6ONZqxS5IzGnLdbF'
ConsumerSecret = '1CDAlx6bzMO9XDw31NOLFk11vTFWaG0z1bG2i33k6Co3n2XBzu'
AccessKey = '19347325-XIq3vbfAE8ZARoBEmXtmyHINnEVoFDu2nO90WgFF1'
AccessSecret = 'GjDd67wr650A1GrPk5uZYtyXUlSxqjvbLx86wfqYQgk34'

auth = tweepy.OAuthHandler(ConsumerKey, ConsumerSecret)
auth.set_access_token(AccessKey, AccessSecret)

api = tweepy.API(auth)


print(api.user_timeline(id='marxbot1', count=1)) #This simply pulls the last tweet from an account

[Status(_api=<tweepy.api.API object at 0x0000024FA15BC828>, _json={'created_at': 'Sat May 05 15:33:39 +0000 2018', 'id': 992789430262886400, 'id_str': '992789430262886400', 'text': '@alogicalfallacy A change in commodity A may therefore be imagined that all Catholics can be popes together.', 'truncated': False, 'entities': {'hashtags': [], 'symbols': [], 'user_mentions': [{'screen_name': 'alogicalfallacy', 'name': 'Jim Thatcher', 'id': 19347325, 'id_str': '19347325', 'indices': [0, 16]}], 'urls': []}, 'source': '<a href="http://twitter.com/marxbot1" rel="nofollow">Marxbot1</a>', 'in_reply_to_status_id': None, 'in_reply_to_status_id_str': None, 'in_reply_to_user_id': 19347325, 'in_reply_to_user_id_str': '19347325', 'in_reply_to_screen_name': 'alogicalfallacy', 'user': {'id': 830247048414842880, 'id_str': '830247048414842880', 'name': 'Marxbot1', 'screen_name': 'marxbot1', 'location': '', 'description': "I'm a markov chain based bot trained on Marx's works. I'll tweet on my own or respon

### If that ran successfully, you should have a giant mess of text.

That's the data that accompanies a single tweet. Interesting, huh? Check out the [reference docs](http://tweepy.readthedocs.io/en/v3.5.0/api.html#) for tweepy and spend some time experimenting if you want.

Here, I'll pull the same tweet as above, but this time I'm **only** going to print out the text property of the Status object and then check for some location information.



In [2]:
#Note: This calls the api object that we created in the previous cell.
#THAT MEANS THE PREVIOUS CELL HAS TO RUN BEFORE THIS ONE.

#Also note: user_timeline returns a list, so first I pick which item I want ([0]), then I pull a property from it.
#I store that one tweet so we only ask the API for it once.
tweet = api.user_timeline(id='marxbot1', count=1)[0]

print(tweet.text)

#Now, let's see if there's some lat and long associated with the tweet
print(tweet.geo)
print(tweet.coordinates)

@alogicalfallacy A change in commodity A may therefore be imagined that all Catholics can be popes together.
None
None


### (Un)fortunately, most tweets don't actually have location information associated with them. 

There's been *a lot* written about this, and estimates of the share of tweets carrying location information vary from under 5% to 20% or so. Additionally, it's been argued that upwards of 60% of tweets *can* have some location inferred due to language use, topic, etc.

That's all interesting (and please do email me for citations if you so desire); however, it's also kind of beside the point here. We're interested in learning how to interact with APIs and process data; we can argue about the ephemerality of said data another day.

Let's query some topic of interest and see if we can find some spatial data.


In [3]:
#We're going to set up a couple of tricks here:
#we're going to recreate our api with a few new settings.

import json
# I'm going to use Python's built-in json library to parse the text
#This will make it easier to call - you'll see it below


CK = 'yj50FC54j6ONZqxS5IzGnLdbF'
CS = '1CDAlx6bzMO9XDw31NOLFk11vTFWaG0z1bG2i33k6Co3n2XBzu'
AK = '19347325-XIq3vbfAE8ZARoBEmXtmyHINnEVoFDu2nO90WgFF1'
AS = 'GjDd67wr650A1GrPk5uZYtyXUlSxqjvbLx86wfqYQgk34'

auth = tweepy.OAuthHandler(CK, CS)
auth.set_access_token(AK, AS)

#By setting these values to True, our code will automatically wait when it hits its rate limits
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)

#Now I'm going to set up a custom stream listener
# I'll inherit all of the properties of the tweepy StreamListener
# But, I'm going to override one particular method...
class CustomStreamListener(tweepy.StreamListener):
    def on_data(self, data):
        #Here is where I use the json library to load the twitter data
        Data = json.loads(data)
        #The stream also sends non-tweet messages (e.g. rate-limit notices) with no 'user'; skip them
        if 'user' not in Data:
            return True
        Author = Data['user']['screen_name']
        text = Data['text']
        print(Author)
        print(text)
        #'place' can be null, so check it before asking for its name
        if Data.get('place'):
            print(Data['place']['full_name'])
        print()
        return True #keep the stream open
    
    
while True:
    try:
        stream = tweepy.Stream(auth=api.auth, listener=CustomStreamListener())
        #This next line puts a bounding box roughly around Seattle/Tacoma.
        #You start in the southwest and then go to the northeast
        #The format is longitude, then latitude... 
        stream.filter(locations=[-122.626, 47.113, -121.754,47.87])
    except Exception as e:
        print(e)
        print('Trying to continue')
        continue

#Note: As written, this will run indefinitely. Use the stop button to stop it.
#How might you write in a loop to only get a certain number of tweets?

florencevincent
She did it, so can you!
Washington, USA

LLHitz
@iandenning85 I think ur cool curve has done this if it makes u feel any better https://t.co/4vJ2IRdR3r
Seattle, WA

kevinokeefe
Maybe @Retroist, @colinokeefe, @j_sulz and @EJWalters are right about people paying for niche focused, well done an… https://t.co/0yHp45szyr
Seattle, WA

lmxtledesma12
Kinda upset that I’m missing the rodeo this year :/
Bremerton, WA

JSKChavez_
“Silly muthafucka who raised you
A ni**a with a pussy how disgraceful
I have my hittas come and duck tape you
And y… https://t.co/bHFq8gD37d
University Place, WA

preinsko
#Maddow That BS moving trying to get USPS to raise rates on Amazon in my opinion was an impeachable offense! I hate… https://t.co/ROClJ5StoR
Sammamish, WA

epalicki
@AllNewDom @tresdcomics To be fair, I’m sure the REAL real reason is that the studio overspent on marketing the movie to academy voters.
Seattle, WA



KeyboardInterrupt: 

### If that all worked, you now have a listener that will pull tweets from a bounded area you define.

**Cool**. Well, I think so. But, even though we're now pulling tweets *from* a location, you aren't saving their spatial data... *quite yet*.

That's where the lab actually begins.
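A detail worth knowing from Twitter's location-filtering documentation: if a tweet has no exact `coordinates`, it is matched whenever its `place` bounding box *overlaps* yours, so a tweet tagged with a large place (like "Washington, USA" above) can come through even if it wasn't sent from Seattle/Tacoma. If that matters for your map, you can re-check points yourself. The sketch below uses a hypothetical helper, `in_bbox`, with the box written in the same order as `stream.filter(locations=...)`: southwest corner first, longitude before latitude.

```python
# Same order as stream.filter(locations=...): SW longitude, SW latitude, NE longitude, NE latitude
SEATTLE_TACOMA = [-122.626, 47.113, -121.754, 47.87]

def in_bbox(lon, lat, bbox=SEATTLE_TACOMA):
    """Return True if (lon, lat) falls inside the bounding box (edges included)."""
    west, south, east, north = bbox
    return west <= lon <= east and south <= lat <= north

print(in_bbox(-122.33, 47.61))  # a point in downtown Seattle
print(in_bbox(47.61, -122.33))  # the same point with lat/lon swapped by mistake
```

The second call is the classic bug: Twitter (and GeoJSON) put longitude first, while Folium and most geocoders hand you latitude first.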

### Question 1: Where the tweets at?

Using the example code above **and** the hints below, start pulling the spatial information from the tweets in question. Create a 'file' that contains a tweet's author (account name), its text, and the location from which it came (in latitude and longitude). This 'file' can be in a number of formats (geojson, txt, csv, etc.). *I strongly recommend you use csv*.

Bear in mind, there are *a few* ways you can pull location information. You can find the [Twitter API documentation here](https://developer.twitter.com/en/docs/tutorials/filtering-tweets-by-location).
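Before reaching for a geocoder, look at what the raw tweet JSON already carries. In Twitter's (v1.1) tweet format, an exact location, when present, is a GeoJSON point in `coordinates`, stored **longitude first**; a tagged `place` comes with a `bounding_box` polygon. Here's one possible approach, not the only one: the helper `tweet_latlon` (our own name, not part of tweepy) prefers the exact point and otherwise falls back to the center of the place's box. The sample tweet is made up and trimmed to the relevant fields.

```python
import json

def tweet_latlon(data):
    """Return (lat, lon) for a decoded tweet dict, or None if it has no usable location."""
    # 1) Exact location: a GeoJSON point stored as [longitude, latitude]
    point = data.get('coordinates')
    if point and point.get('type') == 'Point':
        lon, lat = point['coordinates']
        return lat, lon
    # 2) Tagged place: average the corners of its bounding-box polygon
    place = data.get('place')
    if place and place.get('bounding_box'):
        corners = place['bounding_box']['coordinates'][0]
        lon = sum(corner[0] for corner in corners) / len(corners)
        lat = sum(corner[1] for corner in corners) / len(corners)
        return lat, lon
    # 3) Nothing usable here: geocode a place name yourself, or skip the tweet
    return None

# A made-up, trimmed tweet with a place but no exact coordinates
sample = json.loads(
    '{"coordinates": null, "place": {"full_name": "Kent, WA", "bounding_box": '
    '{"type": "Polygon", "coordinates": '
    '[[[-122.3, 47.3], [-122.3, 47.5], [-122.1, 47.5], [-122.1, 47.3]]]}}}'
)
print(tweet_latlon(sample))  # roughly (47.4, -122.2)
```

Keep in mind that the center of a large place's box (a whole state, say) can be a long way from where the tweet was actually sent; returning `None` leaves room for the geocoder below, or for skipping the tweet.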

Some tweets will come from a 'location' that is a named place. In order to handle those, you will need to geocode the information. The function below takes a string and returns latitude and longitude. Start there.

I'm going to import a few libraries. That's because I think they'll be highly useful for you. As is often the case with python, there are many ways to go about this - I am simply suggesting the way that I have found most easily comprehensible.

In [5]:
from geopy import geocoders
import tweepy
import csv
import json
# I'm going to use Python's built-in json library to parse the text
#This will make it easier to call - you'll see it below

#We're going to set up a couple of tricks here:
#we're going to recreate our api with a few new settings.

CK = 'yj50FC54j6ONZqxS5IzGnLdbF'
CS = '1CDAlx6bzMO9XDw31NOLFk11vTFWaG0z1bG2i33k6Co3n2XBzu'
AK = '19347325-XIq3vbfAE8ZARoBEmXtmyHINnEVoFDu2nO90WgFF1'
AS = 'GjDd67wr650A1GrPk5uZYtyXUlSxqjvbLx86wfqYQgk34'

auth = tweepy.OAuthHandler(CK, CS)
auth.set_access_token(AK, AS)

#By setting these values to True, our code will automatically wait when it hits its rate limits
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)

#Define the helper functions BEFORE the stream loop: the loop below never finishes
#on its own, so anything written after it won't run until you interrupt the cell.

#This function is a rudimentary geocoder - YOU CAN IMPROVE IT
def geo(location):
    #Recent geopy versions require a user_agent; 'lab5_tweet_mapper' is just a placeholder name
    #Nominatim's usage policy also asks for at most 1 request per second
    g = geocoders.Nominatim(user_agent='lab5_tweet_mapper') #I use Nominatim, there are many others
    loc = g.geocode(location)
    if loc is None: #No match was found for this string
        return None
    return loc.latitude, loc.longitude

#This function is a rudimentary CSV creator - YOU CAN IMPROVE IT
def WriteCSV(user, text, lat, long):
    #newline='' avoids blank rows on Windows; utf-8 copes with emoji in tweet text
    with open('tweets.csv', 'a', newline='', encoding='utf-8') as f:
        write = csv.writer(f)
        write.writerow([user, text, lat, long])

#Now I'm going to set up a custom stream listener
# I'll inherit all of the properties of the tweepy StreamListener
# But, I'm going to override one particular method...
class CustomStreamListener(tweepy.StreamListener):
    def on_data(self, data):
        #Here is where I use the json library to load the twitter data
        Data = json.loads(data)
        #Skip non-tweet messages (e.g. rate-limit notices), which have no 'user'
        if 'user' not in Data:
            return True
        Author = Data['user']['screen_name']
        text = Data['text']
        print(Author)
        print(text)
        #'place' can be null, so check it before asking for its name
        if Data.get('place'):
            print(Data['place']['full_name'])
        print()
        return True #keep the stream open


while True:
    try:
        stream = tweepy.Stream(auth=api.auth, listener=CustomStreamListener())
        #This next line puts a bounding box roughly around Seattle/Tacoma.
        #You start in the southwest and then go to the northeast
        #The format is longitude, then latitude... 
        stream.filter(locations=[-122.626, 47.113, -121.754, 47.87])
    except Exception as e:
        print(e)
        print('Trying to continue')
        continue

#Note: As written, this will run indefinitely. Use the stop button to stop it.
#How might you write in a loop to only get a certain number of tweets?


cuppykait
My apartment complex sent out an email to tell everyone to be safe and just to stay inside if you can 😅 at least I… https://t.co/Yp1nMKmax8
Edmonds, WA

RealFoxD
@TEastNBA I am three weeks older than Tom Brady and still cannot fathom how a guy who's my age can even stand uprig… https://t.co/f8gpAv5KNd
Kent, WA



KeyboardInterrupt: 

### Question 2: Tweets on a map.

Now that you have a 'file' (or a script that will extract author, text, and location from tweets), let's make a map.

Using Folium, ArcGIS API for Python, GeoPandas, or Arcpy, create a map from your file. Make sure you accumulate enough tweets (let's say 100 or so) before you create the map.

Next week, we'll get into how to update the map on the fly and make it more interactive; for now, just make sure you can query some tweets, parse the data, and put that data into a GIS of some form.

A brief note on data binding: Although you are (most likely) using the Streaming API, the data in your maps is 'static.' What I mean is that once you create the map, you don't add data to it. There are ways to add data on the fly. In Esri environments, you do so with the [GeoEvent Server](http://www.esri.com/arcgis/products/geoevent-server). In other environments, you can use JavaScript frameworks such as [angular.js](https://angularjs.org/). Feel free to *experiment* with either. You might look at these guides [here](https://codehandbook.org/creating-a-web-app-using-angularjs-python-mongodb/) and [here](https://medium.com/@peregringaret/a-different-stack-angular-flask-mongodb-780b44e10afd), which use [flask](http://flask.pocoo.org/) and [mongodb](https://www.mongodb.com/) to create a web 'stack.'

Is all of that too much? No worries! That's why it's optional. Some of you are going to dive into python and swim within its majesty; others of you are going to learn the basics in order to create and deploy specific solutions or to test particular questions. **Either or both are fine!** The point of this class has never been to make you a programmer, but rather to teach you to *think computationally*, so that you can pursue programming and automation *as far as you need to*. You **can** go out and learn Flask on your own if you need to now, and that's awesome!

Ok, with that out of the way, let's get onto your next question...

### Question 3 - Cluster Map


Write a script that prompts the user for a keyword. Then, it monitors that keyword until 100 geocodable tweets have been gathered. Finally, it creates and saves an HTML map named after the chosen keyword (\[keyword].html) in which the tweets are clustered at small scales and, at large scales, appear as points that, when clicked, give the username and text of the tweet.

This is *similar* to the solution to **Question 2**; however, it adds a number of additional tasks.

In [None]:
#I will give you one hint here; 
# this is not the only thing you need to import, but you'll want this:
from folium.plugins import MarkerCluster
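One way to "monitor until 100 geocodable tweets" is to keep the counting separate from the streaming. This plain-Python sketch (the class `TweetCollector` is hypothetical) stores rows only when they have a location and has `add` return `False` once it's full. That mirrors tweepy 3.x's convention, where returning `False` from `on_data` tells the stream to disconnect.

```python
class TweetCollector:
    """Collect up to `limit` (user, text, lat, lon) rows for one keyword."""

    def __init__(self, keyword, limit=100):
        self.keyword = keyword
        self.limit = limit
        self.rows = []

    @property
    def full(self):
        return len(self.rows) >= self.limit

    @property
    def html_name(self):
        # The assignment asks for the map to be saved as [keyword].html
        return self.keyword + '.html'

    def add(self, user, text, latlon):
        """Store a tweet if it has a location; return False once we have enough."""
        if latlon is not None and not self.full:
            lat, lon = latlon
            self.rows.append((user, text, lat, lon))
        return not self.full

# Quick demo with made-up tweets and a tiny limit
collector = TweetCollector('rain', limit=2)
print(collector.add('a', 'wet', (47.6, -122.3)))     # stored, keep going
print(collector.add('b', 'no location', None))       # skipped, keep going
print(collector.add('c', 'soggy', (47.2, -122.4)))   # stored, now full
print(collector.html_name)
```

Inside your listener's `on_data` you could `return collector.add(Author, text, latlon)`, with the keyword coming from `input()` and passed along as `stream.filter(track=[keyword])`. Note that the `while True` retry loop used earlier would simply reconnect after a clean disconnect, so you'd also want to `break` once `collector.full` is true.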


### Question 4 - Heatmap

Acknowledging that heatmaps, especially non-normalized ones, are [bogus](https://xkcd.com/1138/), they can still be cool to make. So, take your script above (asks for a keyword, monitors for that keyword until 100 entries have been geocoded, makes a map) and this time instead of clustering points, create a heatmap and save it as \[keyword].html.


In [None]:
#I'll give you a couple of hints here:
from folium.plugins import HeatMap
import pandas

#Your tweets may not be a csv! Your file may not be called tweets!
#But, this will handle some (though not all) encoding errors
tweets = 'tweets.csv'
data = pandas.read_csv(tweets, names = ['users', 'tweet', 'latitude', 'longitude'], encoding='latin-1')

#Now, you need to loop through and add these rows to a heat map!
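`folium.plugins.HeatMap` takes its data as a list of `[lat, lon]` points (optionally `[lat, lon, weight]`), so most of the work is building that list from your rows. If you'd rather skip pandas, here's a csv-module sketch; `heat_points` is a hypothetical helper, and it assumes the same four columns (user, text, latitude, longitude) that `WriteCSV` writes, skipping rows whose coordinates aren't numbers.

```python
import csv
import io

def heat_points(csv_file):
    """Read user,text,lat,lon rows and return [[lat, lon], ...], skipping unusable rows."""
    points = []
    for row in csv.reader(csv_file):
        if len(row) < 4:
            continue  # blank or malformed line
        try:
            lat, lon = float(row[2]), float(row[3])
        except ValueError:
            continue  # e.g. an empty or 'None' coordinate from a failed geocode
        points.append([lat, lon])
    return points

# A tiny in-memory stand-in for tweets.csv (made-up rows; note the quoted comma)
sample = io.StringIO('alice,"hi, rain",47.61,-122.33\nbob,no location,None,None\n')
print(heat_points(sample))
```

With a real file, open it with `newline=''` and whatever encoding you wrote it with, pass the file object to `heat_points`, then call `HeatMap(points).add_to(m)` on your `folium.Map` `m` and `m.save(keyword + '.html')`.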



### An impressive, difficult, and fascinating bonus question, + 4 pts, partial credit possible

Some of you used [NLTK](http://www.nltk.org/) to analyze our old friend H.P. Lovecraft. If you didn't, that's fine. NLTK is, as the name suggests, a Natural Language processing ToolKit. It's not the only one, and you are free to find and use another one, but I recommend it for this task.

You know those tweets? Who cares *how many* there are, let's talk about our ***feelings***. What you need to do now is run a sentiment analysis on your tweets. Categorize them by positive or negative emotions **and then create a heat map of how people feel according to their tweets**. 

Your task:

Create an interactive heatmap where the colors correspond not to number of tweets, but overall emotion of tweets from that area. In other words, interpolate according to the results of your sentiment analysis. While you can choose your own color ramp, I might recommend something like red for positive and blue for negative.

This must be based on **at least** 1,000 tweets.
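"Overall emotion of tweets from that area" implies aggregating scores by location. One simple stand-in for true interpolation, swapped in here rather than required, is to snap each tweet to a square grid cell and average the sentiment per cell. The sketch assumes each tweet already has a score in [-1, 1] (NLTK's VADER `compound` score, for example, falls in that range); the helper names and the 0.05-degree `cell_size` are arbitrary choices, and the color function follows the red-positive, blue-negative ramp suggested above.

```python
import math
from collections import defaultdict

def mean_sentiment_by_cell(tweets, cell_size=0.05):
    """Average sentiment per square grid cell.

    tweets: iterable of (lat, lon, score) tuples, score in [-1, 1].
    Returns {(i, j): mean_score}, where cell (i, j) spans latitudes
    i*cell_size to (i+1)*cell_size and longitudes j*cell_size to (j+1)*cell_size.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for lat, lon, score in tweets:
        cell = (math.floor(lat / cell_size), math.floor(lon / cell_size))
        totals[cell] += score
        counts[cell] += 1
    return {cell: totals[cell] / counts[cell] for cell in totals}

def sentiment_color(score):
    """Map a score in [-1, 1] to a hex color: blue (negative), white (neutral), red (positive)."""
    score = max(-1.0, min(1.0, score))
    fade = int(round(255 * (1 - abs(score))))  # 255 at 0 (white), 0 at either extreme
    if score >= 0:
        return '#ff{0:02x}{0:02x}'.format(fade)  # fade green and blue, leaving red
    return '#{0:02x}{0:02x}ff'.format(fade)      # fade red and green, leaving blue

# Two made-up tweets that land in the same cell
cells = mean_sentiment_by_cell([(47.61, -122.33, 0.5), (47.62, -122.34, -0.1)])
for cell, mean in cells.items():
    print(cell, round(mean, 2), sentiment_color(mean))
```

Because each cell's bounds follow directly from its indices, you could draw it on a Folium map as, for example, a `folium.Rectangle` filled with `sentiment_color(mean)`. Cells holding only a tweet or two will swing to the extremes, so consider requiring a minimum count per cell.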
