# Workshop on Spatial Analysis of Twitter

This workshop demonstrates how to acquire Twitter data using the search API and conduct simple spatial analyses on the data.

This workshop requires Anaconda3 (64-bit Python 3.7) to be installed on your computer.

## 1. Preparation

Install packages needed for this workshop.

**Note: GEOG-389 students, please skip the installation steps.**

In [None]:
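# Install packages for getting Twitter data and mapping.
# The code in this workshop uses the tweepy 3.x API (api.search, api.retweets, api.trends_place),
# which was renamed in tweepy 4, so pin the version below 4.
!pip install "tweepy<4"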

In [None]:
# -y answers the confirmation prompt so the cell does not hang in the notebook
!conda install -y basemap

In [None]:
!pip install folium

Import the packages needed for this workshop.

In [None]:
# Uncomment the following two lines if there is an error loading basemap
#import os
#os.environ['PROJ_LIB'] = 'C:/ProgramData/Anaconda3/Library/share/'


import tweepy
import pandas as pd
from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt



Go to this website to create an app and get its keys and tokens: https://developer.twitter.com/en/apps

In [None]:
# Paste your own keys and tokens here. Never share real credentials in a public notebook.
consumer_key = 'YOUR_CONSUMER_KEY'
consumer_secret = 'YOUR_CONSUMER_SECRET'
access_token = 'YOUR_ACCESS_TOKEN'
access_token_secret = 'YOUR_ACCESS_TOKEN_SECRET'
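
Hard-coded credentials are easy to leak when a notebook is shared. As an alternative, here is a minimal sketch that reads the four values from environment variables. The variable names (`TWITTER_CONSUMER_KEY` and so on) and the helper `load_twitter_credentials` are assumptions made for illustration, not something Twitter or tweepy requires.

```python
import os

# Hypothetical environment variable names; choose any names you like.
CREDENTIAL_NAMES = [
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
]

def load_twitter_credentials(env=None):
    """Return the four credentials as a dict, or raise if any is missing."""
    if env is None:
        env = os.environ
    missing = [name for name in CREDENTIAL_NAMES if not env.get(name)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in CREDENTIAL_NAMES}

# Demonstration with a fake environment so the sketch runs anywhere.
fake_env = {name: "placeholder-" + str(i) for i, name in enumerate(CREDENTIAL_NAMES)}
creds = load_twitter_credentials(fake_env)
print(sorted(creds))
```

In the notebook, `creds["TWITTER_CONSUMER_KEY"]` and the other values could then be assigned to `consumer_key`, `consumer_secret`, `access_token`, and `access_token_secret`.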

In [None]:
# Set up for Twitter authentication
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)

In [None]:
# Set up the tweepy API; wait_on_rate_limit=True makes tweepy pause until the rate limit resets
api = tweepy.API(auth, wait_on_rate_limit=True)

**Note**: if you don't want to disturb your followers with a meaningless tweet, don't run the code block below that posts a tweet.

---

## 2. Programmatic Manipulation of Twitter

Now, your working environment is ready for Twitter analysis.

Let's first try a few simple Twitter operations programmatically.

The full functionality of the Twitter API and Tweepy is documented in:

- [Twitter APIs](https://developer.twitter.com/en/docs.html)
- [Tweepy documentation](http://docs.tweepy.org/en/v3.5.0/)

In [None]:
# Post a tweet from Python
test_tweet = api.update_status("DRILL: I'm creating a robot to tweet")

Delete the tweet you just posted.

In [None]:
api.destroy_status(test_tweet.id_str)

### Get the retweets of a tweet

> **How do you know the ID of a tweet?**

In [None]:
# Request up to 10 retweets of the tweet with this ID (the API returns at most 100)
retweets_workshop = api.retweets(1122963268996354049, 10)
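
On the question of how to find a tweet's ID: the ID is the number at the end of the tweet's URL, after `/status/`. The sketch below extracts it with a regular expression. The helper name `tweet_id_from_url` and the user name `some_user` in the example URL are placeholders made up for illustration; only the ID itself comes from the cell above.

```python
import re

def tweet_id_from_url(url):
    """Extract the numeric tweet ID from a tweet URL."""
    match = re.search(r"/status(?:es)?/(\d+)", url)
    if match is None:
        raise ValueError("No tweet ID found in URL: " + url)
    return int(match.group(1))

# 'some_user' is a placeholder account name.
example_url = "https://twitter.com/some_user/status/1122963268996354049"
tweet_id = tweet_id_from_url(example_url)
print(tweet_id)
```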

Get the name of the user who posted the first retweet.

In [None]:
retweets_workshop[0]._json['user']['name']

List the screen name, display name, and profile location of every user who retweeted the tweet.

Note: the Twitter API returns at most the first 100 retweets.

In [None]:
[[tweet.user.screen_name, tweet.user.name, tweet.user.location] for tweet in retweets_workshop]

### Current trends around the world

Get the list of locations for which trends are available.

In [None]:
city_ls = api.trends_available()

In [None]:
df_city = pd.DataFrame(city_ls)

In [None]:
df_city.head()

Look up the entry for San Francisco, which includes its WOEID (Where On Earth ID).

In [None]:
df_city[df_city['name']=='San Francisco']

Get the trends in San Francisco by passing its WOEID.

In [None]:
# Use San Francisco (WOEID 2487956) as an example
trend_sf = api.trends_place(2487956)

Print the trends in JSON format

In [None]:
# print trends in San Francisco
trend_sf

Organize the San Francisco trends into a table (dataframe).

In [None]:
trend_ls = [[trend['name'], trend['url'], trend['tweet_volume']] for trend in trend_sf[0]['trends']]

df_trend = pd.DataFrame(trend_ls,columns=['name','url','tweet_volume'])

In [None]:
# Sort the trends by tweet volume in descending order
df_trend.sort_values("tweet_volume", inplace = True, ascending = False)

# Print the top 10 trends ranked by tweet volume
df_trend.head(10)
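
Twitter reports `tweet_volume` as null for trends where no volume is available, so some rows in the table can have a missing volume. The sketch below shows one way to drop such trends before ranking. It uses a small made-up response with the same shape as `trend_sf`; the trend names, URLs, and volumes are invented for illustration.

```python
# Made-up sample shaped like the trends response (a list holding one dict with a "trends" list).
sample_response = [{
    "trends": [
        {"name": "#ExampleA", "url": "http://example.com/a", "tweet_volume": 12000},
        {"name": "#ExampleB", "url": "http://example.com/b", "tweet_volume": None},
        {"name": "#ExampleC", "url": "http://example.com/c", "tweet_volume": 45000},
    ]
}]

# Keep only trends that report a volume, then rank them from high to low.
with_volume = [t for t in sample_response[0]["trends"] if t["tweet_volume"] is not None]
ranked = sorted(with_volume, key=lambda t: t["tweet_volume"], reverse=True)
ranked_names = [t["name"] for t in ranked]
print(ranked_names)
```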

---

## 3. Acquiring Tweets using the Search API

> **Stopped working here. I suggest enlarging the bounding box to the contiguous U.S. to gather more tweets.**
> **Please write some descriptions above the code so that people know what they are doing.**

### 3.1 Search using keywords

In [None]:
# Define the search term and the date_since date as variables.
# The parameter accepts only a date, not a specific time.
search_words = "#breakfast"
date_since = "2019-4-20"
# date_until = "2019-4-28"

In [None]:
# Set up a tweepy cursor and fetch 5 tweets that match the preset parameters.
# The cursor can be iterated only once, so store the results in a list for reuse.
tweets = list(tweepy.Cursor(api.search,
              q=search_words,
              lang="en",
              since=date_since).items(5))
[tweet.text for tweet in tweets]

In [None]:
for tweet in tweets:
    print(tweet.text)
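
The items of a tweepy cursor are consumed as they are iterated, much like a Python generator, so results that are needed more than once should be stored in a list first. The sketch below illustrates the behaviour with a plain generator standing in for the cursor; `fake_cursor` is a made-up stand-in and makes no Twitter calls.

```python
def fake_cursor(n):
    """Stand-in for a cursor: yields n items, one pass only."""
    for i in range(n):
        yield "tweet " + str(i)

tweets = fake_cursor(3)
first_pass = [t for t in tweets]   # consumes every item
second_pass = [t for t in tweets]  # nothing is left to iterate

# Storing the items in a list allows repeated iteration.
tweets_list = list(fake_cursor(3))
third_pass = [t for t in tweets_list]
fourth_pass = [t for t in tweets_list]

print(len(first_pass), len(second_pass), len(third_pass), len(fourth_pass))
```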

In [None]:
new_search = search_words + " -filter:retweets"
new_search

In [None]:
tweets = tweepy.Cursor(api.search,
                       q=new_search,
                       lang="en",
                       since=date_since).items(5)

[tweet.text for tweet in tweets]

In [None]:
tweets = tweepy.Cursor(api.search,
                       q=new_search, # q means search query
                       lang="en",
                       since=date_since).items(5)

users_locs = [[tweet.user.screen_name, tweet.user.location] for tweet in tweets]
users_locs

In [None]:
tweet_text = pd.DataFrame(data=users_locs, columns=['user', 'location'])
tweet_text

### 3.2 Search using keywords and locations

Query for the keyword "GIS" within 50 miles of central Oahu, Hawaii.

In [None]:
new_search = "GIS"
# Alternative query that excludes retweets:
#new_search = "rain -filter:retweets"

tweets = tweepy.Cursor(api.search,
                   q=new_search,
                   #bounding_box = [-124.848974, 24.396308, -66.885444, 49.384358], # contiguous U.S.
                   geocode = "21.473,-157.9868,50mi",
                   lang="en").items(400)

users_locs = [[tweet.user.screen_name, tweet.text, tweet.user.location,tweet.place] for tweet in tweets]
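
The `geocode` parameter takes a single string of the form `"latitude,longitude,radius"`, where the radius ends in `mi` or `km`. A small helper can build that string and catch a wrong unit early. The function name `make_geocode` is an assumption made for this sketch and is not part of tweepy.

```python
def make_geocode(lat, lon, radius, unit="mi"):
    """Build a 'lat,lon,radius<unit>' string for the search API's geocode parameter."""
    if unit not in ("mi", "km"):
        raise ValueError("unit must be 'mi' or 'km'")
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        raise ValueError("latitude or longitude out of range")
    return "{},{},{}{}".format(lat, lon, radius, unit)

# Same search area as the Oahu query above.
oahu_geocode = make_geocode(21.473, -157.9868, 50)
print(oahu_geocode)
```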

##### Convert the search results into a dataframe

In [None]:
locationinfo = pd.DataFrame(data=users_locs, 
                    columns=['user', "text","location","place"])

In [None]:
locationinfo.head()

In [None]:
print(len(locationinfo[locationinfo['place'].notna()]),"/",len(locationinfo))

In [None]:
tweet_loc = locationinfo.loc[locationinfo['place'].notna()].copy()

In [None]:
tweet_loc['place_name'] = tweet_loc.place.apply(lambda s:s.name)

In [None]:
# Use positional indexing: after filtering, the label 1 may no longer exist in the index
tweet_loc.place.iloc[0].bounding_box.coordinates[0]

In [None]:
tweet_loc['bounding_box'] = tweet_loc.place.apply(lambda s:s.bounding_box.coordinates[0])

In [None]:
tweet_loc.head()

In [None]:
tweet_loc['point']  = tweet_loc['bounding_box'].apply(lambda s: [(s[0][1]+s[2][1])/2,(s[0][0]+s[2][0])/2])

In [None]:
tweet_loc['lat']  = tweet_loc['bounding_box'].apply(lambda s: (s[0][1]+s[2][1])/2)

In [None]:
tweet_loc['lon']  = tweet_loc['bounding_box'].apply(lambda s: (s[0][0]+s[2][0])/2)
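
The cells above place each tweet at the center of its place's bounding box, taking the midpoint of two opposite corners. Each corner is stored as `[longitude, latitude]`, so the order must be swapped to get `[latitude, longitude]` for mapping. The sketch below does the same calculation on a made-up rectangle by averaging all four corners; the coordinates and the helper name `bbox_center` are invented for illustration.

```python
def bbox_center(ring):
    """Return [lat, lon] of the center of a bounding box given as [lon, lat] corners."""
    lons = [corner[0] for corner in ring]
    lats = [corner[1] for corner in ring]
    return [sum(lats) / len(lats), sum(lons) / len(lons)]

# Made-up rectangle, corners listed as [lon, lat].
sample_ring = [[-158.0, 21.0], [-157.0, 21.0], [-157.0, 22.0], [-158.0, 22.0]]
center = bbox_center(sample_ring)
print(center)
```

For a rectangle, averaging all four corners gives the same point as the midpoint of two opposite corners.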

In [None]:
tweet_loc.head()
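
Because each point is the center of a place's bounding box rather than an exact position, it can be worth checking how far a point lies from the search center. The sketch below computes great-circle distance with the haversine formula. The function name, the Earth radius constant, and the approximate Honolulu coordinates are assumptions for illustration.

```python
import math

EARTH_RADIUS_MI = 3958.8  # approximate mean Earth radius in miles

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_MI * math.asin(math.sqrt(a))

# Search center used in the Oahu query, and an approximate location for Honolulu.
search_center = (21.473, -157.9868)
honolulu = (21.3069, -157.8583)
distance = haversine_miles(search_center[0], search_center[1], honolulu[0], honolulu[1])
print(distance <= 50)
```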

### Make an interactive map using the folium package

In [None]:
import folium

In [None]:
oahu = folium.Map(location = [21.473,-157.9868],zoom_start = 10)

In [None]:
for i, row in tweet_loc.iterrows():
    folium.Marker(row.point,popup = row.text).add_to(oahu)

In [None]:
display(oahu)

### Make a traditional (static) map using the basemap package

In [None]:
f, ax1 = plt.subplots(1, figsize=(15, 10))

# Named m rather than map to avoid shadowing Python's built-in map()
m = Basemap(llcrnrlon=-158.36, llcrnrlat=21.21, urcrnrlon=-157.59, urcrnrlat=21.8, epsg=4269, ax=ax1)
# https://www.bdmweather.com/2018/04/python-m-arcgisimage-basemap-options/

m.arcgisimage(service='ESRI_StreetMap_World_2D', xpixels=2000, verbose=True)

#ct.plot(color='white', edgecolor='black', linewidth = .1,ax=ax1)
ax1.plot(tweet_loc['lon'],tweet_loc['lat'],'b*',markersize=5)


plt.show()