# Mapping Geotagged Tweets -- Teacher Version
#### This exercise is part of the *Teaching Privacy* curriculum, which you can find at https://teachingprivacy.org.

**Teacher Notes: Full background for the exercise can be found in <a href="https://teachingprivacy.org/module-1-youre-leaving-footprints#mapping" target="_blank">Teaching Privacy Module 1: You're Leaving Footprints</a>. Note that, although the location is slightly obscure, these solutions are public and your students may find them.**


## Part 1: Install the Tweepy Library


To access the Twitter API, you will use the Tweepy library. To install it, try running the following in your terminal:

**pip install tweepy**

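If this does not work, check the README at https://github.com/tweepy/tweepy for the most up-to-date installation instructions. This notebook was written for Tweepy 3.x, and the documentation links below point to version 3.5.0. Tweepy 4.0 replaced `TweepError` with `TweepyException`. If the import below fails with an `ImportError`, install a 3.x release instead with **pip install "tweepy<4"**.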

*For more on getting and using **pip**, check out <a href="https://pip.pypa.io/en/stable/" target="_blank">the documentation</a> or <a href="https://www.w3schools.com/python/python_pip.asp" target="_blank">the w3schools tutorial</a>.*

Run the cell below to import the module.

```python
import tweepy
from tweepy import TweepError  # available in Tweepy 3.x; Tweepy 4.0 replaced it with TweepyException
import json
```

## Part 2: Create Twitter App and Get Access Tokens

### Create a Twitter App

1. Go to https://apps.twitter.com and click 'Sign In'. If you don't have a Twitter account or don't want to use your current Twitter account, you will need to create one.
2. Click on 'Create New App'.
3. Give your app a Name, Description, and Website. For the website, you can use a placeholder (such as https://teachingprivacy.org).

### Obtain Twitter API Keys 

When using APIs that require tokens and keys for authentication, it is common practice to keep your keys in a separate file (here, a JSON file) rather than in your code, to protect yourself and your app's users. Never post your key file in a public repository, and *never* share your keys.


Create a new text file named **twitter_keys.json** with the following format:

```json
{
    "api_key": "",
    "api_secret": "",
    "access_token": "",
    "access_token_secret": ""
}
```

1. Open the app you created in the previous step and go to the 'Keys and Access Tokens' tab.
2. Copy each key and token into the matching field of your JSON file.
   - When you first create your app, you will need to click 'Create my access token'.
   - Make sure you paste each value *inside* the quotation marks.
3. Run the cell below to load your keys into the **keys** variable.

```python
# Load the four credentials from the JSON file into a dictionary named keys
keys_file = 'twitter_keys.json'
with open(keys_file) as file:
    keys = json.load(file)
```
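Before calling the API, it can help to confirm that the file loaded correctly and that none of the four fields was left empty. Below is a minimal sketch in plain Python. `check_keys` and `REQUIRED_KEYS` are hypothetical names, not part of Tweepy, and the example dictionary is made up.

```python
import json

# The four field names used in twitter_keys.json above
REQUIRED_KEYS = ("api_key", "api_secret", "access_token", "access_token_secret")

def check_keys(keys):
    """Return a list of problems found in a loaded keys dictionary (empty if none)."""
    problems = []
    for name in REQUIRED_KEYS:
        if name not in keys:
            problems.append(f"missing field: {name}")
        elif not str(keys[name]).strip():
            problems.append(f"empty field: {name}")
    return problems

# Made-up example: one field left blank, one field missing entirely
example = json.loads('{"api_key": "abc", "api_secret": "", "access_token": "xyz"}')
print(check_keys(example))
```

In the notebook you could run `check_keys(keys)` right after loading the file. An empty list means every field is present and filled in. It does not mean the keys themselves are valid, which only the API can confirm.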

Run the cell below to check whether you've correctly set up the keys.

```python
try:
    # Authenticate with the app's keys, then with your account's access token
    auth = tweepy.OAuthHandler(keys["api_key"], keys["api_secret"])
    auth.set_access_token(keys["access_token"], keys["access_token_secret"])
    api = tweepy.API(auth)
    print("You have correctly set up your API keys. Your username is:", api.auth.get_username())
except TweepError as e:
    print("Tweepy found an error. Revisit your twitter_keys.json file and make sure you have the correct keys.")
    print("Error details:", e)
```

```
You have correctly set up your API keys. Your username is: ImKarloss
```


## Part 3: Use the Twitter API with Tweepy to Gather Tweet Metadata

Now that you're authenticated with the Twitter API, it's time to get acquainted with it.

With help from <a href="http://tweepy.readthedocs.io/en/v3.5.0/" target="_blank">the Tweepy documentation</a>, find the 200 most recent tweets for, say, Twitter user @stevewoz, in the cell below.

**Teacher Note: The outputs below used @jack (Jack Dorsey) in summer 2018. We instead suggest students look at Steve Wozniak first because he more frequently allows geotags from more interesting locations.**

*Hint: Look for a method to return the user timeline under 'API Reference'. http://docs.tweepy.org/en/v3.5.0/api.html#timeline-methods*

```python
# Fetch up to 200 of @jack's most recent tweets
tweets = api.user_timeline(screen_name="jack", count=200)
```

In the cell below, identify the data type of `tweets`, the object returned by the previous cell.

```python
type(tweets)
```

```
tweepy.models.ResultSet
```

The cell above should show `tweepy.models.ResultSet`, which is a list of `Status` objects, one per tweet. Confirm this in the cell below by indexing the first tweet and checking its type.

```python
first_tweet = tweets[0]
type(first_tweet)
```

```
tweepy.models.Status
```

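RESTful APIs typically send data in JSON format, the same format as our keys file. Tweepy keeps each tweet's original JSON, already parsed into a Python dictionary, in the `_json` attribute of its `Status` object. Use that attribute in the cell below to get the first tweet as a dictionary.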

- *For some background, see this Stack Overflow post about the `_json` attribute: https://stackoverflow.com/questions/27900451/convert-tweepy-status-object-into-json*
- *If you have not used dictionaries before, you can check out these videos: https://www.youtube.com/watch?v=daefaLgNkw0 or https://www.youtube.com/watch?v=XCcpzWs-CI4*

```python
first_tweet_dict = first_tweet._json
first_tweet_dict
```

```
{'contributors': None,
 'coordinates': None,
 'created_at': 'Thu Aug 16 00:02:50 +0000 2018',
 'entities': {'hashtags': [],
  'symbols': [],
  'urls': [],
  'user_mentions': [{'id': 7445912,
    'id_str': '7445912',
    'indices': [0, 7],
    'name': 'Pablo Defendini',
    'screen_name': 'pablod'}]},
 'favorite_count': 154,
 'favorited': False,
 'geo': None,
 'id': 1029881129266900992,
 'id_str': '1029881129266900992',
 'in_reply_to_screen_name': 'pablod',
 'in_reply_to_status_id': 1029880867894775810,
 'in_reply_to_status_id_str': '1029880867894775810',
 'in_reply_to_user_id': 7445912,
 'in_reply_to_user_id_str': '7445912',
 'is_quote_status': False,
 'lang': 'en',
 'place': None,
 'retweet_count': 16,
 'retweeted': False,
 'source': '<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>',
 'text': '@pablod We take responsibility, and enforce accordingly',
 'truncated': False,
 'user': {'contributors_enabled': False,
  'created_at': 'Tue Mar 21 20:50:14 +0
```

Looking at the cell above, you should see that we got a nested dictionary: some of its values are themselves dictionaries or lists. It mirrors the structure of JSON, but it is a Python dictionary, not a JSON string or file.
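To make the difference between a dictionary and JSON text concrete, the standard `json` module converts one into the other. This small sketch uses a made-up dictionary, not a real tweet.

```python
import json

# A made-up dictionary with the same kind of nesting as a tweet's _json
tweet_dict = {'id': 1, 'place': None, 'entities': {'hashtags': []}}

# json.dumps produces JSON *text* (a str); note that None becomes null
as_text = json.dumps(tweet_dict)
print(type(tweet_dict).__name__, type(as_text).__name__)  # dict str
print(as_text)

# json.loads parses the text back into an equal dictionary
round_trip = json.loads(as_text)
print(round_trip == tweet_dict)  # True
```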

Explore the result to identify where the tweet's location is listed and under which keys. Use the cell below to print the first tweet's location.

*Hint: Not all tweets have locations embedded. Look at the value stored under the first tweet's 'place' key.*
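Tweet dictionaries are deep enough that scanning them by eye is tedious. One option is a small recursive helper that reports every path at which a given key appears. This is a sketch: `find_key_paths` is a hypothetical helper, and the sample dictionary is made up, although it uses real tweet field names such as 'place', 'user', and 'quoted_status'.

```python
def find_key_paths(obj, target, path=()):
    """Yield the key path (a tuple) to every occurrence of `target` in nested dicts/lists."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            if key == target:
                yield path + (key,)
            yield from find_key_paths(value, target, path + (key,))
    elif isinstance(obj, list):
        for index, item in enumerate(obj):
            yield from find_key_paths(item, target, path + (index,))

# Made-up sample: a tweet quoting another tweet that does have a place
sample = {
    'place': None,
    'user': {'location': 'somewhere'},
    'quoted_status': {'place': {'full_name': 'Ohio, USA'}},
}
print(list(find_key_paths(sample, 'place')))
```

On the real data you could call `list(find_key_paths(first_tweet_dict, 'place'))` to see where 'place' appears before deciding which one to read.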

```python
first_tweet_location = first_tweet_dict['place']
print('This tweet was tweeted from:', first_tweet_location)
```

```
This tweet was tweeted from: None
```


## Part 4: Gather Tweet Locations

In the cell below, find the locations for all the tweets we obtained. 

*Hint: Not all tweets are geotagged, so figure out how to only append actual tweet locations to the list, and ignore those with no location.*

```python
locations = []              # human-readable place names, e.g. 'Ohio, USA'
tweets_with_location = []   # the full 'place' dictionary for each geotagged tweet
for tweet in tweets:
    place = tweet._json['place']
    # Tweets without a geotag have None here, so skip them
    if place is not None:
        tweets_with_location.append(place)
        locations.append(place['full_name'])
locations
```

```
['Missouri, USA',
 'Illinois, USA',
 'Ohio, USA',
 'West Virginia, USA',
 'San Francisco, CA',
 'San Francisco, CA',
 'San Francisco, CA',
 'San Francisco, CA',
 'San Francisco, CA',
 'San Francisco, CA',
 'San Francisco, CA',
 'San Francisco, CA',
 'San Francisco, CA',
 'San Francisco, CA',
 'San Francisco, CA',
 'San Francisco, CA',
 'San Francisco, CA',
 'San Francisco, CA',
 'San Francisco, CA',
 'San Francisco, CA',
 'San Francisco, CA',
 'San Francisco, CA',
 'San Francisco, CA',
 'San Francisco, CA',
 'San Francisco, CA',
 'San Francisco, CA',
 'San Francisco, CA',
 'San Francisco, CA',
 'San Francisco, CA',
 'San Francisco, CA',
 'San Francisco, CA',
 'San Francisco, CA',
 'San Francisco, CA',
 'San Francisco, CA',
 'San Francisco, CA',
 'San Francisco, CA',
 'San Francisco, CA',
 'San Francisco, CA',
 'San Francisco, CA',
 'San Francisco, CA',
 'San Francisco, CA',
 'San Francisco, CA',
 'San Francisco, CA',
 'San Francisco, CA',
 'San Francisco, CA',
 'San Francisco, CA',
 'S
```
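A long list with many repeats is easier to read as counts. `collections.Counter` from the standard library tallies repeated entries. This sketch uses a short made-up list in the same shape as `locations`, so it does not overwrite the notebook's own variable.

```python
from collections import Counter

# Made-up sample with the same shape as the locations list above
sample_locations = ['Missouri, USA', 'Ohio, USA', 'San Francisco, CA',
                    'San Francisco, CA', 'San Francisco, CA']

counts = Counter(sample_locations)
# most_common() lists places from most to least frequent
for place_name, n in counts.most_common():
    print(f"{n:3d}  {place_name}")
```

Calling `Counter(locations).most_common()` on the real list shows at a glance which places dominate a user's geotags.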

## Part 5: Install Plotting Libraries 

We will use the **geoplotlib** library to visualize tweet locations. Geoplotlib requires two other libraries, **NumPy** and **pyglet**, so you will need to install those too by running the following three *separate* commands in your terminal:

**pip install numpy**<br>
**pip install pyglet**<br>
**pip install geoplotlib**

When you're done, run the cell below to import geoplotlib.

```python
import geoplotlib
```

## Part 6: Visualize Tweet Locations

Now that we've stored the locations of the user's tweets, we can create a visualization.

For each geotagged tweet, Twitter describes the tagged place with a bounding box made of four corner points. Each corner is stored as a [longitude, latitude] pair, with longitude first. For each tweet, store the first corner of its bounding box in a list named `coords`.


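```python
coords = []
for tweet in tweets_with_location:
    # Each entry is a place dictionary; keep the first [longitude, latitude] corner of its bounding box
    coords.append(tweet['bounding_box']['coordinates'][0][0])
```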
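Taking the first corner puts every dot at one edge of the place's bounding box. To put the dot in the middle instead, you can average the four corners. This is an alternative to the exercise's approach, not what the exercise asks for. In the sketch below, `bounding_box_center` is a hypothetical helper and the sample place is made up.

```python
def bounding_box_center(place):
    """Return the (longitude, latitude) average of a place's bounding-box corners."""
    corners = place['bounding_box']['coordinates'][0]  # outer ring of the polygon
    lons = [corner[0] for corner in corners]  # pairs are [longitude, latitude]
    lats = [corner[1] for corner in corners]
    return sum(lons) / len(lons), sum(lats) / len(lats)

# Made-up place dictionary in the shape of a tweet's 'place' field
sample_place = {
    'full_name': 'Example City',
    'bounding_box': {
        'type': 'Polygon',
        'coordinates': [[[-122.5, 37.7], [-122.5, 37.8], [-122.4, 37.8], [-122.4, 37.7]]],
    },
}
center_lon, center_lat = bounding_box_center(sample_place)
print(round(center_lon, 2), round(center_lat, 2))
```

For a large place such as a whole state, the center is still only a rough stand-in for where the tweet was actually sent. A place tag says which area the tweet was tagged with, not a precise point.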

Geoplotlib's `utils.DataAccessObject` class wraps a dictionary (or pandas DataFrame) of data columns. It is the data type the library uses to draw its maps.

Create a dictionary with three keys: 'lat' for latitude, 'lon' for longitude, and 'name' for the place name. Geoplotlib looks up coordinates under the keys 'lat' and 'lon', so use those exact names. Each value should be a list of the corresponding values, which you can build from the `coords` and `locations` lists you created earlier.

Next, pass that dictionary to `geoplotlib.utils.DataAccessObject` to create the data object, then pass the result to `geoplotlib.dot` to create a dot density map.

*Hint: After creating the dot density map, you'll need to call geoplotlib.show() to open a window with the map.*

```python
# coords holds [longitude, latitude] pairs, so index 1 is latitude and index 0 is longitude
lat = [coordinate[1] for coordinate in coords]
lon = [coordinate[0] for coordinate in coords]
name = locations
loc = {'lat': lat, 'lon': lon, 'name': name}
geo_loc = geoplotlib.utils.DataAccessObject(loc)
geoplotlib.dot(geo_loc)   # add a dot density layer
geoplotlib.show()         # open a window with the map
```
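The map's latitude and longitude come from `coords` (Part 6), while the names come from `locations` (Part 4). They line up only because both were built from `tweets_with_location` in the same order. The sketch below builds all three columns in a single pass instead, so they cannot drift apart. `places_to_columns` is a hypothetical helper, and the sample places are made up. The commented-out lines show how its result could feed the same geoplotlib calls used above.

```python
def places_to_columns(places):
    """Build a {'lat', 'lon', 'name'} column dictionary from Twitter place dictionaries."""
    columns = {'lat': [], 'lon': [], 'name': []}
    for place in places:
        # First bounding-box corner, stored as [longitude, latitude]
        lon, lat = place['bounding_box']['coordinates'][0][0]
        columns['lat'].append(lat)
        columns['lon'].append(lon)
        columns['name'].append(place['full_name'])
    return columns

# Made-up place dictionaries in the shape of tweets_with_location
sample_places = [
    {'full_name': 'Example Town',
     'bounding_box': {'coordinates': [[[-84.8, 38.4], [-84.8, 42.3], [-80.5, 42.3], [-80.5, 38.4]]]}},
    {'full_name': 'Example City',
     'bounding_box': {'coordinates': [[[-122.5, 37.7], [-122.5, 37.8], [-122.4, 37.8], [-122.4, 37.7]]]}},
]
print(places_to_columns(sample_places))

# With the real data, the result plugs into the same calls as above:
# geo_loc = geoplotlib.utils.DataAccessObject(places_to_columns(tweets_with_location))
# geoplotlib.dot(geo_loc)
# geoplotlib.show()
```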

## Part 7: Optional Extension

This exercise will, of course, get different results depending on the Twitter user you look at. Try your own Twitter account or the account of your favorite celebrity! Some users will have no geotagged tweets, while others may only tweet from a single city.

## Part 8: Brainstorm and Reflect

*If you don't use Twitter, answer the questions below as though you did.*

- How do you decide whether to include the location for a Tweet?<br>
- Who do you picture as the audience when you're tweeting? Why would you want (or at least allow) those people to know your location? *Do* you actually want them to?<br>
- What could someone do with location data from, say, 100 of your tweets, that they couldn't do with just one?<br>
  - If someone never put their address on the Internet, how might you go about figuring out where they live?<br>
  - How could someone use your tweets in combination with other online data to figure something out?<br>
- What could someone do with location data from many people's tweets at once?

**Teacher Notes: Student answers should reflect an understanding that:**

- Whoever you're posting the location data for, there are many other entities that can collect it. (Including businesses/advertisers, people you don't know, criminals...)<br>
- What you're putting out there is more than the sum of its parts. The location of one tweet just tells you where that person was when they tweeted it, but having many geolocations (with time stamps) allows you to make inferences about the person's habits and where they frequently go.<br>
  - For example, frequent late-night postings from the same location probably indicate where the person lives. Frequent postings from a location that matches the address of a McSandwichQueen might indicate a love for fast food.<br>
- Data from multiple users can show additional patterns.<br>
  - Examples might include knowing who lives together or hangs out together; in combination with demographic information, knowing what types of people tend to visit a particular business; figuring out where a newsworthy event is taking place.
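The late-night inference in the first example above takes only a few lines of code, which can make the point concrete for students. Below is a minimal sketch, assuming you already have (created_at, place name) pairs from a user's tweets. `likely_home`, its parameters, and the sample data are all hypothetical. Twitter's `created_at` timestamps are in UTC (note the `+0000` offset), so the sketch shifts them by an assumed local UTC offset before deciding what counts as late at night.

```python
from collections import Counter
from datetime import datetime, timedelta

# Format of Twitter's created_at field, e.g. 'Thu Aug 16 00:02:50 +0000 2018'
TWITTER_TIME_FORMAT = '%a %b %d %H:%M:%S %z %Y'

def likely_home(tweets, utc_offset_hours=0, night_start=22, night_end=6):
    """Return the place name tweeted from most often late at night, or None.

    tweets: list of (created_at, place_name) pairs (hypothetical input format).
    utc_offset_hours: assumed offset of the user's local time from UTC.
    """
    night_places = Counter()
    for created_at, place_name in tweets:
        utc_time = datetime.strptime(created_at, TWITTER_TIME_FORMAT)
        local_hour = (utc_time + timedelta(hours=utc_offset_hours)).hour
        # "Late night" wraps past midnight: from night_start until night_end
        if local_hour >= night_start or local_hour < night_end:
            night_places[place_name] += 1
    if not night_places:
        return None
    return night_places.most_common(1)[0][0]

# Made-up sample data; 'Place A' and 'Place B' are placeholders
sample_tweets = [
    ('Thu Aug 16 06:30:00 +0000 2018', 'Place A'),  # 11:30 p.m. at UTC-7
    ('Fri Aug 17 07:10:00 +0000 2018', 'Place A'),  # 12:10 a.m. at UTC-7
    ('Fri Aug 17 19:00:00 +0000 2018', 'Place B'),  # noon at UTC-7
    ('Sat Aug 18 05:45:00 +0000 2018', 'Place B'),  # 10:45 p.m. at UTC-7
]
print(likely_home(sample_tweets, utc_offset_hours=-7))  # Place A
```

With `utc_offset_hours=0`, the same data returns 'Place B', which shows students that getting the time zone right matters for this kind of inference.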
