## Spatial Data Science (GIS6307/GEO4930)


<br>
Instructor: Yi Qiang (qiangy@usf.edu)<br>
Teaching Assistant: Jinwen Xu (jinwenxu@usf.edu)

---

# Workshop on Spatial Analysis of Twitter

This workshop will help you to get started with the acquisition, processing, and analysis of Twitter data using data science techniques. Specifically, you will learn:

- Streaming real-time tweets using Twitter Developer APIs.
- Processing the raw tweets into an analyzable form.
- Basic mapping, spatial analysis and natural language processing for Twitter data.

### Prerequisite
- Install Anaconda in your computer.
- Activation of Twitter Developer Account and approved **Elevated Access** before the workshop.
- Basic programming skills are recommended, but not required.




## 1. Install Python Libraries

We will need two new libraries [tweepy](https://www.tweepy.org/) and [folium](https://plotly.com/python/) for this lab. Please do the following steps to install these two libarries.

1. Please open Anaconda Prompt, and use the command `conda activate geo` to activate the "geo" environment that you created in the previous lab. 

2. Install tweepy using the following command:

    `conda install -c conda-forge tweepy`
    
    If the above command doesn't work, please try the following one:
    
    `conda install -c conda-forge folium`
    
    Click 'y' and then 'Enter' when asked to proceed.
    
3. Install contextily using the following command:

    `conda install -c plotly plotly`
    
4. Run the following code to import the installed libraries and other needed libraries. If the code runs through, the libraries are installed successfully.

In [3]:
# Run the following lines if there is an error loading basemap
# import os
# os.environ['PROJ_LIB'] = '~your anaconda 3 path/Anaconda3/Library/share/'

import tweepy
import folium
import pandas as pd

#from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt

## 2. Set-Up the Connection to Twitter APIs

Go to Twitter Developer Portal (https://developer.twitter.com/en/apps). Click the App you created in account activation.

If you haven't created an App when you created the account, you can create one in the project.

Click `Sign up`.

![](../image/twitter/signup3.jpg)

Turn on `OAuth 1.0a` and keep `OAuth 2.0` off. 

![](../image/twitter/OAuth.jpg)

Select "Read and write and Direct message". You can use "http://127.0.0.1:8080" as Callback URL. Add any website as the website URL (e.g. your personal website or https://google.com).

![](../image/twitter/setting.jpg)

Copy the API keys you have saved when you activated your Developer account, and paste them to replace "......" below. If you can't find them, you can **regenerate** the keys and tokens in your Developer Portal. 

![](../image/twitter/keys2.jpg)

Generate your Access Token and Secret and paste them below.

In [13]:
# paste your API keys and tokens to replace ......
# API_key = 'aOLqeAcCKvMRhKLZBIKADW2rW'
# API_key_secret = 'b1lOWTIy3Vfb5C2OS4S9wXP3BYzugrPs50i4oIqWmn1tcdORaT'
# access_token = '1508096832806731780-wIo5pKdrys6o1qoX7LETloH1tRirkS'
# access_token_secret = 'U4g0XQiGxGXnwN2Y5T2ePqKdxhS91jcHy2Zc5jqBEJ9V8'

API_key = 'gmDPt4mHl24zKIWeabwu1ebDs'
API_key_secret = '2Wx9lIi37wAH8oCHyS6Lw5dzboif7B8xxjMP3lnaGsKCIbtoEC'
access_token = '1343615702809374721-nBEV6K4qlSvkUFWpxhjn2Q3Gf5WpaL'
access_token_secret = 'z2dTOARM5FpdIMqOP9IzRwM9bzCvFNh9UhgvOD8WkquTc'

Set up for Twitter authentication.

In [14]:
auth = tweepy.OAuthHandler(API_key, API_key_secret)
auth.set_access_token(access_token, access_token_secret)

Set up tweepy API and set rate limit to be true.

In [15]:
api = tweepy.API(auth, wait_on_rate_limit=True)

---

## 3. Simple Operations with Twitter APIs

Now, your working environment is set up for Twitter analysis. Let's first try a few simple operations to acquire Twitter data in a programmatic way.

The full functionalities of Twitter API and Tweepy can be found in:

- [Twitter APIs](https://developer.twitter.com/en/docs.html)
- [Tweepy documentation](http://docs.tweepy.org/en/v4.8.0/)

### 3.1 Posting/Deleting a Tweet

First, let's post a message in your Twitter account.

**Note**: if you don't want to disturb your followers with a meanless tweet, don't run the following block of code.

In [16]:
# Post a tweet from Python
test_tweet = api.update_status("DRILL: I'm creating a robot to tweet!")

Check your Twitter account, and you'll see the above message is posted.

![](../image/twitter/tweet.jpg)


Tweets are encoded in a JSON (JavaScript Object Notation) format. You can run the following code to check the content of the tweet you just posted.

In [17]:
test_tweet._json

{'created_at': 'Thu Mar 31 01:21:40 +0000 2022',
 'id': 1509340116334133249,
 'id_str': '1509340116334133249',
 'text': "DRILL: I'm creating a robot to tweet!",
 'truncated': False,
 'entities': {'hashtags': [], 'symbols': [], 'user_mentions': [], 'urls': []},
 'source': '<a href="https://google.com" rel="nofollow">GIS6307</a>',
 'in_reply_to_status_id': None,
 'in_reply_to_status_id_str': None,
 'in_reply_to_user_id': None,
 'in_reply_to_user_id_str': None,
 'in_reply_to_screen_name': None,
 'user': {'id': 1343615702809374721,
  'id_str': '1343615702809374721',
  'name': 'alex',
  'screen_name': 'alex32348132',
  'location': '',
  'description': 'me',
  'url': None,
  'entities': {'description': {'urls': []}},
  'protected': False,
  'followers_count': 0,
  'friends_count': 9,
  'listed_count': 2,
  'created_at': 'Mon Dec 28 17:52:11 +0000 2020',
  'favourites_count': 1,
  'utc_offset': None,
  'time_zone': None,
  'geo_enabled': False,
  'verified': False,
  'statuses_count': 1,
  'l

You can run the following code to delete the tweet you just posted.

In [18]:
api.destroy_status(test_tweet.id_str)

Status(_api=<tweepy.api.API object at 0x00000245F4D7F6D0>, _json={'created_at': 'Thu Mar 31 01:21:40 +0000 2022', 'id': 1509340116334133249, 'id_str': '1509340116334133249', 'text': "DRILL: I'm creating a robot to tweet!", 'truncated': False, 'entities': {'hashtags': [], 'symbols': [], 'user_mentions': [], 'urls': []}, 'source': '<a href="https://google.com" rel="nofollow">GIS6307</a>', 'in_reply_to_status_id': None, 'in_reply_to_status_id_str': None, 'in_reply_to_user_id': None, 'in_reply_to_user_id_str': None, 'in_reply_to_screen_name': None, 'user': {'id': 1343615702809374721, 'id_str': '1343615702809374721', 'name': 'alex', 'screen_name': 'alex32348132', 'location': '', 'description': 'me', 'url': None, 'entities': {'description': {'urls': []}}, 'protected': False, 'followers_count': 0, 'friends_count': 9, 'listed_count': 2, 'created_at': 'Mon Dec 28 17:52:11 +0000 2020', 'favourites_count': 1, 'utc_offset': None, 'time_zone': None, 'geo_enabled': False, 'verified': False, 'statuse

### 3.2 Getting Trending Tweets

Get the list of cities where trends are available

In [19]:
city_ls = api.available_trends()

Convert the list (in JSON format) into a dataframe (i.e. a table). Print the number of cities where trends are available.

In [20]:
df_city = pd.DataFrame(city_ls)

print (str(len(df_city)) + " cities have trends.")

467 cities have trends.


Preview 10 cities where trends are available

In [21]:
df_city.head(10)

Unnamed: 0,name,placeType,url,parentid,country,woeid,countryCode
0,Worldwide,"{'code': 19, 'name': 'Supername'}",http://where.yahooapis.com/v1/place/1,0,,1,
1,Winnipeg,"{'code': 7, 'name': 'Town'}",http://where.yahooapis.com/v1/place/2972,23424775,Canada,2972,CA
2,Ottawa,"{'code': 7, 'name': 'Town'}",http://where.yahooapis.com/v1/place/3369,23424775,Canada,3369,CA
3,Quebec,"{'code': 7, 'name': 'Town'}",http://where.yahooapis.com/v1/place/3444,23424775,Canada,3444,CA
4,Montreal,"{'code': 7, 'name': 'Town'}",http://where.yahooapis.com/v1/place/3534,23424775,Canada,3534,CA
5,Toronto,"{'code': 7, 'name': 'Town'}",http://where.yahooapis.com/v1/place/4118,23424775,Canada,4118,CA
6,Edmonton,"{'code': 7, 'name': 'Town'}",http://where.yahooapis.com/v1/place/8676,23424775,Canada,8676,CA
7,Calgary,"{'code': 7, 'name': 'Town'}",http://where.yahooapis.com/v1/place/8775,23424775,Canada,8775,CA
8,Vancouver,"{'code': 7, 'name': 'Town'}",http://where.yahooapis.com/v1/place/9807,23424775,Canada,9807,CA
9,Birmingham,"{'code': 7, 'name': 'Town'}",http://where.yahooapis.com/v1/place/12723,23424975,United Kingdom,12723,GB


Get the record of Tampa. The `woeid` is a unique ID for each place.

In [22]:
df_city[df_city['name']=='Tampa']

Unnamed: 0,name,placeType,url,parentid,country,woeid,countryCode
394,Tampa,"{'code': 7, 'name': 'Town'}",http://where.yahooapis.com/v1/place/2503863,23424977,United States,2503863,US


Store the `woeid` of Tampa in tampa_id.

In [23]:
tampa_id = df_city[df_city['name']=='Tampa']['woeid']

Return the trends in Tampa.

Note: you need to convert the city_id from a pandas series object into an integer.

In [24]:
# make Tampa as an example
trends_tampa = api.get_place_trends(int(tampa_id))

Print the trends in JSON format

In [25]:
# print the top 20 trends in Tampa
trends_tampa[0:20]

[{'trends': [{'name': '#AsItWas',
    'url': 'http://twitter.com/search?q=%23AsItWas',
    'promoted_content': None,
    'query': '%23AsItWas',
    'tweet_volume': 26807},
   {'name': 'Jared',
    'url': 'http://twitter.com/search?q=Jared',
    'promoted_content': None,
    'query': 'Jared',
    'tweet_volume': 49608},
   {'name': '#BoycottDisney',
    'url': 'http://twitter.com/search?q=%23BoycottDisney',
    'promoted_content': None,
    'query': '%23BoycottDisney',
    'tweet_volume': 16662},
   {'name': 'Ivanka',
    'url': 'http://twitter.com/search?q=Ivanka',
    'promoted_content': None,
    'query': 'Ivanka',
    'tweet_volume': 65324},
   {'name': 'Booming',
    'url': 'http://twitter.com/search?q=Booming',
    'promoted_content': None,
    'query': 'Booming',
    'tweet_volume': 34743},
   {'name': '#AEWDynamite',
    'url': 'http://twitter.com/search?q=%23AEWDynamite',
    'promoted_content': None,
    'query': '%23AEWDynamite',
    'tweet_volume': 34950},
   {'name': 'Brady

Organize the Tampa trends in a dataframe.

In [26]:
trend_ls = [[trend['name'], trend['url'], trend['tweet_volume']] for trend in trends_tampa[0]['trends']]

df_trends = pd.DataFrame(trend_ls,columns=['name','url','tweet_volume'])

Sort the trends by tweet volumn in a descending order and print the top 10 trends with the most tweeting volumne.

In [27]:
# Sort the trends by tweet volumn in a descending order
df_trends.sort_values("tweet_volume", inplace = True, ascending = False)

# Print the top 10 trends ranked by tweet volumne
df_trends.head(10)

Unnamed: 0,name,url,tweet_volume
36,Hunter Biden,http://twitter.com/search?q=%22Hunter+Biden%22,365784.0
3,Ivanka,http://twitter.com/search?q=Ivanka,65324.0
35,Jonathan,http://twitter.com/search?q=Jonathan,60114.0
47,McCarthy,http://twitter.com/search?q=McCarthy,52965.0
1,Jared,http://twitter.com/search?q=Jared,49608.0
32,Tyga,http://twitter.com/search?q=Tyga,42018.0
37,The Get Down,http://twitter.com/search?q=%22The+Get+Down%22,41424.0
5,#AEWDynamite,http://twitter.com/search?q=%23AEWDynamite,34950.0
4,Booming,http://twitter.com/search?q=Booming,34743.0
29,Blac Chyna,http://twitter.com/search?q=%22Blac+Chyna%22,32959.0


The table shows the popular topics people are tweeting about in Tampa.

---

## 4. Acquiring Tweets using the Search API

### 4.1 Search Tweets using Keywords

In this step, you will use Python program to search tweets that contain a specific keyword. "Will Smith" was quite a hot topic when I created the tutorial. Next, I will search for Tweets that contain "Will Smith".

In [28]:
tweets = api.search_tweets("Will Smith",count=100)

print("Total retweet retrieved: "+ str(len(tweets)))

Total retweet retrieved: 100


Store the user name, user location, posting time, and tweet text in a Pandas dataframe.

In [29]:
tweets_pd = pd.DataFrame([[tweet.user.name, tweet.user.location,tweet.created_at, tweet.text] for tweet in tweets], 
                         columns = ['user_name','user_loc','creation_time','text'])

tweets_pd

Unnamed: 0,user_name,user_loc,creation_time,text
0,Black Trans Lives Matter,"the country, TX",2022-03-31 01:22:17+00:00,RT @mrLdavis: Me digging up dirt on all the ce...
1,KB,"Macon, GA",2022-03-31 01:22:17+00:00,RT @cornskiii: that slap from will smith was l...
2,PODEMOSARTS,paris,2022-03-31 01:22:17+00:00,RT @LaBasePublico: Hoy hemos hablado de las mú...
3,Jeff Rowe,,2022-03-31 01:22:17+00:00,@FoxNews Take Will Smith TOO the CLEANERS BRO...
4,Spider-Flakk,"Barcelona, Catalunya",2022-03-31 01:22:16+00:00,RT @lamba_aarav: Will Smith In and As The Batm...
...,...,...,...,...
95,rshei 🪷,the sippp,2022-03-31 01:21:57+00:00,RT @Courtstrology: Will Smith has a Libra stel...
96,wichopunkass,fresno,2022-03-31 01:21:57+00:00,RT @AndyKindler: Will Smith slapped Chris Rock...
97,D🌷,"Toronto, Ontario",2022-03-31 01:21:57+00:00,RT @thedigitaldash_: most of us watched the sl...
98,mahir,Deutschland,2022-03-31 01:21:57+00:00,"RT @NoahPasternak: ""Garfield"" creator Jim Davi..."


The `search_tweets` funciton can retrieve max 100 tweets at one time. If you want to get more tweets, you can use the `cursor`.

The following code can retrieve more tweets containing a keyword (e.g. "Will Smith"). We still set a limit of 100 to save your search quota for the following steps.

> Note: You can only retrieve a limited number of tweets per 15 minutes. If the retrieved tweets exceed the limit, the program will pause for some time. If you can't wait, you can doublepress `i` on your keyboard to interrupt the process.

In [None]:
# create an empty list
tweet_ls = []

# Number of tweets to be retrieved.
num = 100

# Use a cursor to get all tweets containing the keyword
for tweet in tweepy.Cursor(api.search_tweets, q="Will Smith", count = num, result_type="recent", include_entities=True,lang="en").items(num):
    tweet_ls.append([tweet.user.name, tweet.user.location,tweet.created_at, tweet.text])


# Store the retrieved tweets in a dataframe
tweets_pd_full = pd.DataFrame(tweet_ls, 
                         columns = ['user_name','user_loc','creation_time','text'])

# Print the dataframe
tweets_pd_full

Rate limit reached. Sleeping for: 654


### 4.2 Removing Retweets
The retrieved tweets include original tweets and retweets. The textual content of retweets are almost identical. You can set up a filter to eliminate the retweets and keep only the original tweets.

In [23]:
new_search = "Will Smith" + " -filter:retweets"
new_search

'Will Smith -filter:retweets'

Now, you can see only original tweets are retrieved.

Print to see the list.

In [None]:
# create an empty list
tweet_ls = []

# intiate a counter
n = 0
# Number of tweets to be retrieved.
num = 200

# Use a cursor to get all tweets containing the keyword
for tweet in tweepy.Cursor(api.search_tweets, q = new_search, count = num, result_type="recent", include_entities=True,lang="en").items(num):
    tweet_ls.append([tweet.user.name, tweet.user.location,tweet.created_at, tweet.text])
    n = n+1


# Store the retrieved tweets in a dataframe
tweets_pd_full = pd.DataFrame(tweet_ls, 
                         columns = ['user_name','user_loc','creation_time','text'])

# Print the dataframe
tweets_pd_full

Rate limit reached. Sleeping for: 753
Rate limit reached. Sleeping for: 759
Rate limit reached. Sleeping for: 752
Rate limit reached. Sleeping for: 747


### 4.3 Search Tweets using locations

Query for a popular trend keyword in Tampa (200 miles range)

First, let's check what are the top 10 trending topics in the selected city (Tampa).

In [133]:
df_trend.head(10)

Unnamed: 0,name,url,tweet_volume
30,Beyoncé,http://twitter.com/search?q=Beyonc%C3%A9,203721.0
31,Dune,http://twitter.com/search?q=Dune,150948.0
34,Encanto,http://twitter.com/search?q=Encanto,120818.0
42,Zendaya,http://twitter.com/search?q=Zendaya,117653.0
48,Duke,http://twitter.com/search?q=Duke,116909.0
45,World Cup,http://twitter.com/search?q=%22World+Cup%22,111669.0
36,Ariana,http://twitter.com/search?q=Ariana,95088.0
14,billie,http://twitter.com/search?q=billie,68328.0
17,Carolina,http://twitter.com/search?q=Carolina,64003.0
37,Bond,http://twitter.com/search?q=Bond,61730.0


The following code may take a few minutes to run to collect the tweets, depending on the number of tweets.

In [134]:
# new_search = "#new_search -filter:retweets"
#new_search = " -filter:retweets"

# use cursor to send your request with parameters
tweets = tweepy.Cursor(api.search_tweets,
                   q="beach",
                   geocode = "27.9506,-82.4572,20000mi",
                   lang="en").items(100)

# restore the results as a list
search_result = [[tweet.user.screen_name, tweet.text, tweet.user.location,tweet.place] for tweet in tweets]

Convert the searched tweets into a dataframe

In [135]:
search_result

[['vaenergy',
  '@SethQuick5 @mkobach great history, great foodie town, 1 hour to mountains, 1 1/2 hour to the beach, 2 hours to Was… https://t.co/R84RdcotZL',
  'Key Largo FL',
  None],
 ['gwardhome',
  'Bumbling Buffon Beach Cosplay Lawyer @DWUhlfelderLaw starts campaign page @DanielUhlfelder .\nTwice the trolling twi… https://t.co/Zpi1dytICK',
  'Plantation, FL',
  Place(_api=<tweepy.api.API object at 0x00000254C7BF6160>, id='7df9a00dcf914d5e', url='https://api.twitter.com/1.1/geo/id/7df9a00dcf914d5e.json', place_type='city', name='Plantation', full_name='Plantation, FL', country_code='US', country='United States', contained_within=[], bounding_box=BoundingBox(_api=<tweepy.api.API object at 0x00000254C7BF6160>, type='Polygon', coordinates=[[[-80.330201, 26.088262], [-80.1968332, 26.088262], [-80.1968332, 26.160753], [-80.330201, 26.160753]]]), attributes={})],
 ['EdPiotrowski',
  'The ferris wheel at Broadway at the Beach still shines brightly in the colors of the Ukrainian flag to 

In [136]:
df_result = pd.DataFrame(data=search_result, 
                    columns=['user', "text","location","place"])

Preview the first 5 tweets

In [137]:
df_result.head()

Unnamed: 0,user,text,location,place
0,vaenergy,"@SethQuick5 @mkobach great history, great food...",Key Largo FL,
1,gwardhome,Bumbling Buffon Beach Cosplay Lawyer @DWUhlfel...,"Plantation, FL",Place(_api=<tweepy.api.API object at 0x0000025...
2,EdPiotrowski,The ferris wheel at Broadway at the Beach stil...,"Myrtle Beach, SC",
3,claudefla01,Spectacular beach front condo with panoramic o...,miami beach,
4,jakelkapri,"Ugh, quick beach run won’t hurt 😩","Orlando, fl",


Preview the first 5 tweets with geotags

In [138]:
df_result[df_result['place'].notna()].head()

Unnamed: 0,user,text,location,place
1,gwardhome,Bumbling Buffon Beach Cosplay Lawyer @DWUhlfel...,"Plantation, FL",Place(_api=<tweepy.api.API object at 0x0000025...
9,What2WearWhere,@megbraffdesigns What a fab opening! Don't m...,New York,Place(_api=<tweepy.api.API object at 0x0000025...
10,NWJS_jobs,"See our latest Vero Beach, FL job opening. htt...",,Place(_api=<tweepy.api.API object at 0x0000025...
12,donlexofficial,@majorleaguedjz 🅿️ushing 🅿️iano to the world 🌍...,"Miami, FL",Place(_api=<tweepy.api.API object at 0x0000025...
40,D3zMix,Last night in #pcb so of course we have to mak...,FL USA,Place(_api=<tweepy.api.API object at 0x0000025...


### 3.3 check how many tweets are geotagged

In [77]:
all_tweets = len(df_result[df_result['place'].notna()]) # all retrieved tweets
geo_tweets = len(df_result) # tweets that actually have geotags.

print("%s out of the %s retrieved tweets actually have geotags" % (all_tweets, geo_tweets))

21 out of the 100 retrieved tweets actually have geotags


#### Copy tweets with geotags to a new dataframe called "geotags"

In [78]:
geotags = df_result.loc[df_result['place'].notna()].copy()

#### get their place and view where first 5 tweets are from

In [79]:
geotags['place_name'] = geotags.place.apply(lambda s:s.name)

In [80]:
geotags.head()

Unnamed: 0,user,text,location,place,place_name
6,D3zMix,@alt_lyfe heavy out here in #pcb #coyoteugly ...,FL USA,Place(_api=<tweepy.api.API object at 0x0000025...,Panama City Beach
7,divinemoira,Swing through life as fearlessly as you did wh...,,Place(_api=<tweepy.api.API object at 0x0000025...,Sarasota
22,divinemoira,Dance with me… @ Lido Beach Resort https://t.c...,,Place(_api=<tweepy.api.API object at 0x0000025...,Sarasota
24,paulleary,"At Palm Beach Pride, 30 LGBTQ couples who marr...","Miami, FL",Place(_api=<tweepy.api.API object at 0x0000025...,Fort Lauderdale
27,MrsGinaC,Delicious blackened sea food platter and muscl...,"Oswego, IL",Place(_api=<tweepy.api.API object at 0x0000025...,Aunt Kate's


#### Check place information and parse them into dataframe

In [81]:
geotags.place[min(geotags.index)]

Place(_api=<tweepy.api.API object at 0x00000254C2A7A3D0>, id='9ebd5acfac2301ba', url='https://api.twitter.com/1.1/geo/id/9ebd5acfac2301ba.json', place_type='city', name='Panama City Beach', full_name='Panama City Beach, FL', country_code='US', country='United States', contained_within=[], bounding_box=BoundingBox(_api=<tweepy.api.API object at 0x00000254C2A7A3D0>, type='Polygon', coordinates=[[[-85.95802, 30.1650609], [-85.7860766, 30.1650609], [-85.7860766, 30.266595], [-85.95802, 30.266595]]]), attributes={})

#### Print the bounding box of the geotag

In [82]:
geotags.place[min(geotags.index)].bounding_box

BoundingBox(_api=<tweepy.api.API object at 0x00000254C2A7A3D0>, type='Polygon', coordinates=[[[-85.95802, 30.1650609], [-85.7860766, 30.1650609], [-85.7860766, 30.266595], [-85.95802, 30.266595]]])

###### Print the coordinates of the bounding box

In [83]:
geotags.place[min(geotags.index)].bounding_box.coordinates[0]

[[-85.95802, 30.1650609],
 [-85.7860766, 30.1650609],
 [-85.7860766, 30.266595],
 [-85.95802, 30.266595]]

#### Generate a column called bounding_box to restore bounding box information

In [84]:
geotags['bounding_box'] = geotags.place.apply(lambda s:s.bounding_box.coordinates[0])

In [85]:
geotags.head()

Unnamed: 0,user,text,location,place,place_name,bounding_box
6,D3zMix,@alt_lyfe heavy out here in #pcb #coyoteugly ...,FL USA,Place(_api=<tweepy.api.API object at 0x0000025...,Panama City Beach,"[[-85.95802, 30.1650609], [-85.7860766, 30.165..."
7,divinemoira,Swing through life as fearlessly as you did wh...,,Place(_api=<tweepy.api.API object at 0x0000025...,Sarasota,"[[-82.588866, 27.293114], [-82.477281, 27.2931..."
22,divinemoira,Dance with me… @ Lido Beach Resort https://t.c...,,Place(_api=<tweepy.api.API object at 0x0000025...,Sarasota,"[[-82.588866, 27.293114], [-82.477281, 27.2931..."
24,paulleary,"At Palm Beach Pride, 30 LGBTQ couples who marr...","Miami, FL",Place(_api=<tweepy.api.API object at 0x0000025...,Fort Lauderdale,"[[-80.20811, 26.080935], [-80.0902351, 26.0809..."
27,MrsGinaC,Delicious blackened sea food platter and muscl...,"Oswego, IL",Place(_api=<tweepy.api.API object at 0x0000025...,Aunt Kate's,"[[-81.31005883828144, 29.949614861868813], [-8..."


#### Parse the latitude and longitude hidden in the bounding box, finally check the dataframe

Store the centroids in a column 'point'

In [86]:
geotags['point']  = geotags['bounding_box'].apply(lambda s: [(s[0][1]+s[2][1])/2,(s[0][0]+s[2][0])/2])

Store the latitude of the centroids in the column 'lat'

In [87]:
geotags['lat']  = geotags['bounding_box'].apply(lambda s: (s[0][1]+s[2][1])/2)

Store the longitude of the centroids in the column 'lon'

In [88]:
geotags['lon']  = geotags['bounding_box'].apply(lambda s: (s[0][0]+s[2][0])/2)

Print to see the dataframe again.

You'll see the centroids, latitude, and longitude are added as columns in the dataframe.

Note: the point column is an redundancy of the lat and lon columns. We create all these columns just for demonstration of mapping in the next step.

In [89]:
geotags.head()

Unnamed: 0,user,text,location,place,place_name,bounding_box,point,lat,lon
6,D3zMix,@alt_lyfe heavy out here in #pcb #coyoteugly ...,FL USA,Place(_api=<tweepy.api.API object at 0x0000025...,Panama City Beach,"[[-85.95802, 30.1650609], [-85.7860766, 30.165...","[30.215827949999998, -85.8720483]",30.215828,-85.872048
7,divinemoira,Swing through life as fearlessly as you did wh...,,Place(_api=<tweepy.api.API object at 0x0000025...,Sarasota,"[[-82.588866, 27.293114], [-82.477281, 27.2931...","[27.3411215, -82.5330735]",27.341121,-82.533074
22,divinemoira,Dance with me… @ Lido Beach Resort https://t.c...,,Place(_api=<tweepy.api.API object at 0x0000025...,Sarasota,"[[-82.588866, 27.293114], [-82.477281, 27.2931...","[27.3411215, -82.5330735]",27.341121,-82.533074
24,paulleary,"At Palm Beach Pride, 30 LGBTQ couples who marr...","Miami, FL",Place(_api=<tweepy.api.API object at 0x0000025...,Fort Lauderdale,"[[-80.20811, 26.080935], [-80.0902351, 26.0809...","[26.150368, -80.14917255]",26.150368,-80.149173
27,MrsGinaC,Delicious blackened sea food platter and muscl...,"Oswego, IL",Place(_api=<tweepy.api.API object at 0x0000025...,Aunt Kate's,"[[-81.31005883828144, 29.949614861868813], [-8...","[29.949614861868813, -81.31005883828144]",29.949615,-81.310059


---

## 4. Spatial visualization using folium package

Import the folium package to create an interactive map.

In [90]:
import folium

Create a basemap.

In [91]:
#oahu = folium.Map(location = [21.473,-157.9868],zoom_start = 10)
maptweet = folium.Map()

Add the tweets into the basemap

In [92]:
for i, row in geotags.iterrows():
    folium.Marker(row.point,popup = row.text).add_to(maptweet)

Zoom closer into the tweets

In [93]:
maptweet.fit_bounds([[min(geotags.lat),min(geotags.lon)],[max(geotags.lat),max(geotags.lon)]])

In [94]:
display(maptweet)