# Workshop on Spatial Analysis of Twitter

This workshop demonstrates how to acquire Twitter data using the search API and conduct simple spatial analyses on the data.

This workshop requires Anaconda3 (64-bit Python 3.7) installed on your computer.

You can access this website at https://bit.ly/2GQ13hl

## 1. Preparation

Install packages needed for this workshop.

In [None]:
!pip install tweepy

In [None]:
!pip install folium

Import the packages needed for this tutorial.

In [None]:
# Run the following lines if there is an error loading basemap
#import os
#os.environ['PROJ_LIB'] = '~your anaconda 3 path/Anaconda3/Library/share/'

import tweepy
import pandas as pd
#from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt

Go to this website to create an app and get its keys and tokens: https://developer.twitter.com/en/apps

In [None]:
# Paste your own keys and tokens here. Do not publish real credentials.
consumer_key = 'YOUR_CONSUMER_KEY'
consumer_secret = 'YOUR_CONSUMER_SECRET'
access_token = 'YOUR_ACCESS_TOKEN'
access_token_secret = 'YOUR_ACCESS_TOKEN_SECRET'
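
Pasting keys directly into a notebook makes it easy to share them by accident. One alternative is to read them from environment variables. The following is a minimal sketch, not part of the original workshop; the variable names (`TWITTER_CONSUMER_KEY` and so on) and the helper `load_credentials` are hypothetical and can be renamed freely.

```python
import os

# Hypothetical environment variable names; pick whatever suits your setup.
CREDENTIAL_VARS = {
    "consumer_key": "TWITTER_CONSUMER_KEY",
    "consumer_secret": "TWITTER_CONSUMER_SECRET",
    "access_token": "TWITTER_ACCESS_TOKEN",
    "access_token_secret": "TWITTER_ACCESS_TOKEN_SECRET",
}


def load_credentials(env=os.environ):
    """Return the four credentials as a dict, raising if any is missing."""
    missing = [var for var in CREDENTIAL_VARS.values() if not env.get(var)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[var] for name, var in CREDENTIAL_VARS.items()}
```

After setting the variables in your shell, `creds = load_credentials()` returns a dictionary whose values can be assigned to `consumer_key`, `consumer_secret`, `access_token`, and `access_token_secret`.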

In [None]:
# Set up for Twitter authentication
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)

In [None]:
# Set up the tweepy API and wait automatically whenever the rate limit is reached
api = tweepy.API(auth, wait_on_rate_limit=True)

---

## 2. Programmatic Manipulation of Twitter

Now, your working environment is ready for Twitter analysis.

Let's first try a few simple operations in Twitter in a programmatic way.

The full functionality of the Twitter API and Tweepy is documented at:

- [Twitter APIs](https://developer.twitter.com/en/docs.html)
- [Tweepy documentation](http://docs.tweepy.org/en/v3.5.0/)

First, let's post a message on Twitter.

**Note**: if you don't want to disturb your followers with a meaningless tweet, don't run the following block of code.

In [None]:
# Post a tweet from Python
test_tweet = api.update_status("DRILL: I'm creating a robot to tweet")

Delete the tweet you just posted.

In [None]:
api.destroy_status(test_tweet.id_str)

### Get the retweets of a tweet

https://twitter.com/geog_uhm

In [None]:
# Request up to 10 retweets of the tweet with this ID
retweets_workshop = api.retweets(1122963268996354049,10)

Get the name of the user who posted the first retweet.

In [None]:
retweets_workshop[0]._json['user']['name']

Print all retweets of the tweet.

Note: The Twitter API returns at most the first 100 retweets.

In [None]:
[[tweet.user.screen_name, tweet.user.name, tweet.user.location] for tweet in retweets_workshop]

### Current trends in the world

Get the list of cities where trends are available

In [None]:
city_ls = api.trends_available()

Convert the list (in JSON format) into a dataframe (like a table).

In [None]:
df_city = pd.DataFrame(city_ls)

Print the list of cities where trends are available

In [None]:
df_city

In [None]:
len(df_city)

Look up San Francisco in the table to find its WOEID (Where On Earth ID).

In [None]:
df_city[df_city['name']=='San Francisco']

Retrieve the trends in San Francisco using its WOEID.

In [None]:
# Use San Francisco (WOEID 2487956) as an example
trend_sf = api.trends_place(2487956)

Print the trends in JSON format

In [None]:
# print trends in San Francisco
trend_sf

In [None]:
# print first 5 trends
trend_sf[0]['trends'][0:5]

Organize the San Francisco trends in a table (dataframe)

In [None]:
trend_ls = [[trend['name'], trend['url'], trend['tweet_volume']] for trend in trend_sf[0]['trends']]

df_trend = pd.DataFrame(trend_ls,columns=['name','url','tweet_volume'])

In [None]:
# Sort the trends by tweet volume in descending order
df_trend.sort_values("tweet_volume", inplace = True, ascending = False)

# Print the top 10 trends ranked by tweet volume
df_trend.head(10)
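
The trends endpoint reports `tweet_volume` as null for trends without an available count, so a ranking has to decide where those entries go. pandas places missing values last by default when sorting. The sketch below shows the same ordering rule in plain Python; the trend names and volumes are made-up sample data, and `rank_by_volume` is a hypothetical helper.

```python
# Made-up sample records shaped like the entries in trend_sf[0]['trends'].
sample_trends = [
    {"name": "#ExampleA", "tweet_volume": None},
    {"name": "#ExampleB", "tweet_volume": 52000},
    {"name": "#ExampleC", "tweet_volume": 13000},
]


def rank_by_volume(trends):
    """Sort trends by tweet_volume, largest first, with missing volumes last."""
    return sorted(
        trends,
        # The first key element pushes None volumes to the end;
        # the second orders the remaining trends from largest to smallest.
        key=lambda t: (t["tweet_volume"] is None, -(t["tweet_volume"] or 0)),
    )


ranked = rank_by_volume(sample_trends)
print([t["name"] for t in ranked])
```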

---

## 3. Acquiring Tweets using the Search API

### 3.1 Search using keywords

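Get the top trend. Because the dataframe was sorted, use `iloc` to select the first row by position rather than by its original index label.

In [None]:
df_trend.name.iloc[0]

Use the trend as the keyword for searching.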

In [None]:
# Define the search term and the date_since date as variables.
# The parameter accepts only a date, not a specific time.
search_words = df_trend.name.iloc[0]
date_since = "2019-4-27"
# date_until = "2019-4-28"
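
A hard-coded date goes stale quickly, and the standard search API only covers roughly the most recent week of tweets. One option is to compute the date relative to today. This is a minimal sketch using only the standard library; `days_ago` is a hypothetical helper, not part of Tweepy.

```python
from datetime import date, timedelta


def days_ago(n, today=None):
    """Return the date n days before today as a YYYY-MM-DD string."""
    today = today or date.today()
    return (today - timedelta(days=n)).isoformat()


# A fixed "today" is passed here only to make the example reproducible.
print(days_ago(1, today=date(2019, 4, 28)))
```

In the notebook, `date_since = days_ago(1)` would then search from yesterday onward.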

Search _n_ tweets using the keyword (the top trend). The search returns tweets containing the keyword from anywhere in the world.

In [None]:
# Set up a tweepy cursor and retrieve 5 tweets matching the preset parameters
tweets = tweepy.Cursor(api.search,
                       q=search_words,
                       lang="en",
                       since=date_since).items(5)
[tweet.text for tweet in tweets]

Because many retweets simply repeat the original tweets, we can add a filter that excludes retweets and keeps only the original tweets.

In [None]:
new_search = search_words + " -filter:retweets"
new_search
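
Trend names can contain several words, in which case it may help to wrap them in double quotes so the search matches the exact phrase. The sketch below collects both options in one place; `build_query` and its parameters are hypothetical names introduced for illustration.

```python
def build_query(keyword, exclude_retweets=True, exact_phrase=False):
    """Build a search query string from a keyword and optional operators."""
    query = keyword.strip()
    if exact_phrase:
        # Double quotes ask the search to match the words as a phrase.
        query = f'"{query}"'
    if exclude_retweets:
        # Same operator as used in the cell above.
        query += " -filter:retweets"
    return query


print(build_query("Messi"))
print(build_query("Game of Thrones", exact_phrase=True))
```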

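Now you can see that only original tweets are retrieved.

In [None]:
# Store the results in a list so they can be reused in the following cells;
# a cursor iterator is exhausted after it has been looped over once.
tweets = list(tweepy.Cursor(api.search,
                            q=new_search,
                            lang="en",
                            since=date_since).items(5))

[tweet.text for tweet in tweets]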

Display the usernames and user locations.

In [None]:
users_locs = [[tweet.user.screen_name, tweet.user.location] for tweet in tweets]
users_locs

Organize the retrieved tweets in a table.

In [None]:
tweet_text = pd.DataFrame(data=users_locs, columns=['user', 'location'])
tweet_text

### 3.2 Search using keywords and locations

Query for a keyword within a 200-mile radius of San Francisco.

## <font color='red'><strong>Note: please wait for instructions before running the following code. If everyone runs it at the same time, our IP address may be blocked.</strong></font>

In [None]:
new_search = "Messi -filter:retweets"
#new_search = " -filter:retweets"

# use cursor to send your request with parameters
tweets = tweepy.Cursor(api.search,
                   q=new_search,
                   #bounding_box = [-124.848974, 24.396308, -66.885444, 49.384358], # contiguous U.S.
                   geocode = "37.7749,-122.4194,200mi",
                   lang="en").items(100)

# Store the results as a list
search_result = [[tweet.user.screen_name, tweet.text, tweet.user.location,tweet.place] for tweet in tweets]
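
The `geocode` parameter above is a single string of the form `latitude,longitude,radius`, where the radius carries a `mi` or `km` unit. A small helper can assemble it and catch swapped coordinates early. This is a sketch under those assumptions; `geocode_string` is a hypothetical name.

```python
def geocode_string(lat, lon, radius, unit="mi"):
    """Format a 'lat,lon,radius' string, e.g. '37.7749,-122.4194,200mi'."""
    if unit not in ("mi", "km"):
        raise ValueError("unit must be 'mi' or 'km'")
    if not -90 <= lat <= 90:
        raise ValueError("latitude must be between -90 and 90")
    if not -180 <= lon <= 180:
        raise ValueError("longitude must be between -180 and 180")
    return f"{lat},{lon},{radius}{unit}"


# San Francisco with a 200-mile radius, as in the cell above.
print(geocode_string(37.7749, -122.4194, 200))
```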

Convert the retrieved tweets into a dataframe.

In [None]:
df_result = pd.DataFrame(data=search_result, 
                    columns=['user', "text","location","place"])

Preview the first 5 tweets

In [None]:
df_result.head()

### 3.3 Check how many tweets are geotagged

In [None]:
print(len(df_result[df_result['place'].notna()]),"/",len(df_result))

#### Copy tweets with geotags to a new dataframe called "geotags"

In [None]:
geotags = df_result.loc[df_result['place'].notna()].copy()

#### Extract the place names and view where the first 5 tweets are from

In [None]:
geotags['place_name'] = geotags.place.apply(lambda s:s.name)

In [None]:
geotags.head()

#### Inspect the place information before parsing it into the dataframe

In [None]:
geotags.place[min(geotags.index)]

In [None]:
geotags.place[min(geotags.index)].bounding_box

#### Check the bounding box information

In [None]:
geotags.place[min(geotags.index)].bounding_box.coordinates[0]

#### Generate a column called bounding_box to store the bounding box information

In [None]:
geotags['bounding_box'] = geotags.place.apply(lambda s:s.bounding_box.coordinates[0])

In [None]:
geotags.head()

#### Compute the center latitude and longitude of each bounding box, then check the dataframe

In [None]:
geotags['point']  = geotags['bounding_box'].apply(lambda s: [(s[0][1]+s[2][1])/2,(s[0][0]+s[2][0])/2])

In [None]:
geotags['lat']  = geotags['bounding_box'].apply(lambda s: (s[0][1]+s[2][1])/2)

In [None]:
geotags['lon']  = geotags['bounding_box'].apply(lambda s: (s[0][0]+s[2][0])/2)
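
The three cells above repeat the same arithmetic: each corner of the bounding box is a `[longitude, latitude]` pair, and the center is the midpoint of two opposite corners. The sketch below wraps that in one function. It is a variant, not the workshop's exact method: it takes the minimum and maximum over all corners so it does not depend on corner order. The sample box is made up.

```python
def bbox_center(corners):
    """Return (lat, lon) of the center of a box given [lon, lat] corners."""
    lons = [corner[0] for corner in corners]
    lats = [corner[1] for corner in corners]
    center_lat = (min(lats) + max(lats)) / 2
    center_lon = (min(lons) + max(lons)) / 2
    return center_lat, center_lon


# Made-up box: four [lon, lat] corners around part of the Bay Area.
sample_box = [[-123.0, 37.0], [-122.0, 37.0], [-122.0, 38.0], [-123.0, 38.0]]
print(bbox_center(sample_box))
```

In the notebook this could be applied with `geotags['bounding_box'].apply(bbox_center)` to obtain both coordinates in one pass.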

In [None]:
geotags.head()

---

## 4. Spatial visualization using the folium package

Import the folium package to create an interactive map.

In [None]:
import folium

Create a basemap.

In [None]:
#oahu = folium.Map(location = [21.473,-157.9868],zoom_start = 10)
maptweet = folium.Map()

Add the tweets to the basemap.

In [None]:
for i, row in geotags.iterrows():
    folium.Marker(row.point,popup = row.text).add_to(maptweet)

Zoom the map to the extent of the tweets.

In [None]:
maptweet.fit_bounds([[min(geotags.lat),min(geotags.lon)],[max(geotags.lat),max(geotags.lon)]])
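
The bounds passed to `fit_bounds` are a south-west corner followed by a north-east corner, each as `[lat, lon]`. Calling `min` or `max` on an empty column raises an error, which happens when no geotagged tweets were found. The sketch below computes the same bounds defensively; `bounds_from_points` is a hypothetical helper and the sample points are made up.

```python
def bounds_from_points(points):
    """Return [[south, west], [north, east]] for [lat, lon] points, or None."""
    if not points:
        return None  # nothing to zoom to
    lats = [point[0] for point in points]
    lons = [point[1] for point in points]
    return [[min(lats), min(lons)], [max(lats), max(lons)]]


# Made-up [lat, lon] points.
sample_points = [[37.5, -122.5], [38.0, -121.0], [36.5, -122.0]]
print(bounds_from_points(sample_points))
```

In the notebook, the result of `bounds_from_points(list(geotags.point))` could be passed to `maptweet.fit_bounds` only when it is not `None`.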

In [None]:
display(maptweet)