# 13.15 Geocoding and Mapping
* Collect streaming tweets, then plot their locations on an interactive map
* **Twitter disables precise location info (latitude/longitude) by default** (users must opt in to allowing Twitter to track locations) 
* A large percentage of tweets include the user’s home location information
    * Sometimes invalid or fictitious 
* Map markers will show the sender's `location` and tweet text

### [**geopy** library](https://github.com/geopy/geopy)
* Setup in Section 13.6
* **Geocoding**&mdash;translate locations into **latitude** and **longitude**
* **geopy** supports dozens of **geocoding web services**, many with **free or lite tiers**
* We’ll use the **OpenMapQuest geocoding service**

### OpenMapQuest Geocoding API
* Sign-up instructions in Section 13.6
* Convert locations, such as **Boston, MA** into their **latitudes** and **longitudes**, such as **42.3602534** and **-71.0582912**, for plotting on maps
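
A minimal sketch of what a geocoding lookup produces, assuming a geocoder object with a geopy-style `geocode` method that returns an object with `latitude` and `longitude` attributes, or `None` when the location cannot be resolved. `StubGeocoder` and `to_coordinates` are hypothetical names used only for illustration; the stub stands in for a real service so the example runs without an API key or network access.

```python
from types import SimpleNamespace


class StubGeocoder:
    """Hypothetical offline stand-in for a geopy geocoder."""

    # coordinates for Boston, MA, as given in the text above
    _known = {'Boston, MA': (42.3602534, -71.0582912)}

    def geocode(self, location):
        """Return an object with latitude/longitude, or None if unknown."""
        coordinates = self._known.get(location)
        if coordinates is None:
            return None
        return SimpleNamespace(latitude=coordinates[0],
                               longitude=coordinates[1])


def to_coordinates(geocoder, location):
    """Return a (latitude, longitude) tuple, or None for a bad location."""
    result = geocoder.geocode(location)
    return (result.latitude, result.longitude) if result else None


geocoder = StubGeocoder()
print(to_coordinates(geocoder, 'Boston, MA'))
print(to_coordinates(geocoder, 'Somewhere over the rainbow'))
```

Because `to_coordinates` only relies on the `geocode` method, a real geocoder object could presumably be passed in place of the stub.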


### [**folium library**](https://github.com/python-visualization/folium) and Leaflet.js JavaScript Mapping Library
* Setup in Section 13.6
* Uses the **Leaflet.js JavaScript mapping library** to display maps in a web page
* Folium saves maps as HTML files that you can view in your web browser

## 13.15.1 Getting and Mapping the Tweets
* We’ll use utility functions from our **`tweetutilities.py`** file and class **`LocationListener`** in **`locationlistener.py`**

### Collections Required By LocationListener
* a list (`tweets`) to store the data from the tweets we collect 
* a dictionary (`counts`) to track the total number of tweets we collect and the number that have location data

In [None]:
tweets = [] 

counts = {'total_tweets': 0, 'locations': 0}

### Creating the LocationListener 
* Collect 50 tweets about `'football'`
* `LocationListener` will use utility function `get_tweet_content` (located in `tweetutilities.py`; discussed in Section 13.15.2) to place in a dictionary the `username`, tweet `text` and user `location` from each tweet

In [None]:
import keys

import tweepy

from locationlistener import LocationListener

location_listener = LocationListener(
    keys.bearer_token, counts_dict=counts, tweets_list=tweets,
    topic='football', limit=50)

### Redirect sys.stderr to sys.stdout

In [None]:
import sys

sys.stderr = sys.stdout

### Delete Existing StreamRules

In [None]:
rules = location_listener.get_rules().data

if rules:  # data is None when no rules exist
    rule_ids = [rule.id for rule in rules]
    location_listener.delete_rules(rule_ids)

### Create a StreamRule
* Rule to get tweets in English (`lang:en`) about football 

In [None]:
location_listener.add_rules(
    tweepy.StreamRule('football lang:en'))

### Configure and Start the Stream of Tweets
* start streaming the tweets
    * expansion `'author_id'` gets information about the user who sent the tweet, including the `username`
    * `user_fields` argument specifies that the user information should include the account’s `'location'` 
    * `tweet_fields` argument specifies additional information to include with each tweet, in this case the tweet’s language (`lang`)


In [None]:
location_listener.filter(expansions=['author_id'], 
    user_fields=['location'], tweet_fields=['lang'])

### Displaying the Location Statistics
* check how many tweets we processed, how many had locations and the percentage that had locations

In [None]:
counts['total_tweets']

In [None]:
counts['locations']

In [None]:
print(f'{counts["locations"] / counts["total_tweets"]:.1%}')
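
If the stream is interrupted before any tweets are counted, `counts['total_tweets']` is still 0 and the division raises a `ZeroDivisionError`. A minimal sketch of a guard, where `format_percentage` is a hypothetical helper and the sample counts are made up for illustration:

```python
def format_percentage(part, total):
    """Return part/total formatted as a percentage, or 'n/a' if total is 0."""
    if total == 0:
        return 'n/a'
    return f'{part / total:.1%}'


sample_counts = {'total_tweets': 80, 'locations': 50}  # made-up values
print(format_percentage(sample_counts['locations'],
                        sample_counts['total_tweets']))
```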

### Geocoding the Locations
* Use `get_geocodes` utility function (from `tweetutilities.py`; discussed in Section 13.15.2) to geocode the location of each tweet stored in the list of tweets

In [None]:
from tweetutilities import get_geocodes

bad_locations = get_geocodes(tweets)

* For each tweet with a valid location, the `get_geocodes` function adds the new keys `'latitude'` and `'longitude'` to that tweet’s dictionary in the `tweets` list — these will be used to plot map markers on our interactive map

### Displaying the Bad Location Statistics

In [None]:
bad_locations

In [None]:
print(f'{bad_locations / counts["locations"]:.1%}')

### Cleaning the Data
* Before we plot the tweet locations on a map, let’s use a pandas `DataFrame` to clean the data
* When you create a `DataFrame` from the `tweets` list, it will contain the value `NaN` for the `'latitude'` and `'longitude'` of any tweet that does not have a valid location
* `NaN` cannot be plotted on a map, so remove any rows containing `NaN` by calling the `DataFrame`’s `dropna` method

In [None]:
import pandas as pd

In [None]:
df = pd.DataFrame(tweets)

In [None]:
df = df.dropna()
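
Called with no arguments, `dropna` removes a row if any column in it is `NaN`. As a sketch of a narrower variation (not what the cell above does), the `subset` argument limits the check to the coordinate columns. The tweet dictionaries here are made up, and `sample_df` and `clean_df` are illustrative names:

```python
import pandas as pd

# made-up tweet dictionaries; the second one was never geocoded
sample_tweets = [
    {'username': 'user1', 'text': 'football tonight',
     'location': 'Boston, MA',
     'latitude': 42.3602534, 'longitude': -71.0582912},
    {'username': 'user2', 'text': 'love football',
     'location': 'the moon'},
]

sample_df = pd.DataFrame(sample_tweets)
print(sample_df[['username', 'latitude', 'longitude']])  # user2 has NaN

# keep only the rows that have both coordinates
clean_df = sample_df.dropna(subset=['latitude', 'longitude'])
print(len(clean_df))
```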

### Creating a Map with Folium
Create a folium Map on which we’ll plot the tweet locations

In [None]:
import folium

In [None]:
usmap = folium.Map(location=[39.8283, -98.5795], 
    tiles='Stamen Terrain', zoom_start=5, detect_retina=True)

* `location` keyword argument specifies a sequence containing latitude and longitude coordinates for the **map’s center point** 
    * The values in this snippet are the **geographic center of the continental United States**
    * In many places worldwide, the term `'football'` describes the sport we call soccer in the U.S., so some of the tweets we plot may be outside the U.S.
    * You can zoom using the **+** and **–** buttons at the map’s top-left, or you can drag the map with the mouse (that is, pan) to see anywhere in the world
* `zoom_start` keyword argument specifies the map’s initial zoom level; lower values show more of the world
* `detect_retina` keyword argument enables folium to detect high-resolution screens to use higher-resolution maps from `OpenStreetMap.org`

### Creating Popup Markers for the Tweet Locations
* Create `folium` `Popup` objects containing each tweet’s text and add them to the `Map`
* `DataFrame` method `itertuples` creates a named tuple from each row containing properties corresponding to each `DataFrame` column

In [None]:
for t in df.itertuples():
    text = ': '.join([t.username, t.text])
    popup = folium.Popup(text, parse_html=True)
    marker = folium.Marker((t.latitude, t.longitude), 
                           popup=popup)
    marker.add_to(usmap)

* Creates a string (`text`) containing the user’s `username` and tweet `text` 
* Creates a `folium` `Popup` to display the `text`
* Creates a `folium` `Marker`
    * tuple to specify the `Marker`’s latitude and longitude
    * `popup` keyword argument associates the tweet’s `Popup` object with the new `Marker`
* Calls the `Marker`’s `add_to` method to specify the `Map` that will display the `Marker`
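
To see what `itertuples` yields without creating a map, here is a minimal sketch on a one-row `DataFrame` of made-up data. It builds the same popup text and coordinate tuple as the loop above but only prints them; `sample_df`, `popup_text` and `position` are illustrative names.

```python
import pandas as pd

sample_df = pd.DataFrame([
    {'username': 'user1', 'text': 'football tonight',
     'latitude': 42.3602534, 'longitude': -71.0582912},
])

for t in sample_df.itertuples():
    # each column is available as an attribute of the named tuple
    popup_text = ': '.join([t.username, t.text])
    position = (t.latitude, t.longitude)
    print(popup_text, position)
```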

### Saving the Map
* Call the `Map`’s `save` method to store the map in an HTML file, which you can then double-click to open in your web browser

In [None]:
usmap.save('tweet_map.html')

In [None]:
usmap # displays the map in the notebook

## 13.15.2 Utility Functions in `tweetutilities.py` 
### `get_tweet_content` Utility Function 
* Receives a **`StreamResponse` object (`response`)** and creates a **dictionary** containing the **tweet’s `username`, `text` and `location`**

```python
def get_tweet_content(response):
    """Return dictionary with data from tweet."""
    fields = {}
    fields['username'] = response.includes['users'][0].username
    fields['text'] = response.data.text
    fields['location'] = response.includes['users'][0].location

    return fields
```
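
A minimal sketch of what the function returns, using `types.SimpleNamespace` objects as a hypothetical stand-in for a real `StreamResponse`. The function is repeated so the example runs on its own, and all tweet values are made up.

```python
from types import SimpleNamespace


def get_tweet_content(response):
    """Return dictionary with data from tweet (as defined above)."""
    fields = {}
    fields['username'] = response.includes['users'][0].username
    fields['text'] = response.data.text
    fields['location'] = response.includes['users'][0].location
    return fields


# hypothetical stand-in for a StreamResponse; all values are made up
fake_user = SimpleNamespace(username='user1', location='Boston, MA')
fake_response = SimpleNamespace(
    data=SimpleNamespace(text='Great football game!'),
    includes={'users': [fake_user]})

print(get_tweet_content(fake_response))
```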

### `get_geocodes` Utility Function 
* Receives a list of dictionaries containing tweets and **geocodes their locations**
* If geocoding is successful for a tweet, adds the **latitude** and **longitude** to the tweet’s **dictionary in `tweet_list`**
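* Requires class **`OpenMapQuest`** from the **geopy** library; the function body also uses the `keys` and `time` modules, so `tweetutilities.py` must import them as well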

```python
from geopy import OpenMapQuest
```

```python
def get_geocodes(tweet_list):
    """Get the latitude and longitude for each tweet's location.
    Returns the number of tweets with invalid location data."""
    print('Getting coordinates for tweet locations...')
    geo = OpenMapQuest(api_key=keys.mapquest_key)  # geocoder
    bad_locations = 0  

    for tweet in tweet_list:
        processed = False
        delay = .1  # used if OpenMapQuest times out to delay next call
        while not processed:
            try:  # get coordinates for tweet['location']
                geo_location = geo.geocode(tweet['location'])
                processed = True
            except Exception:  # timed out, so wait before trying again
                print('OpenMapQuest service timed out. Waiting.')
                time.sleep(delay)
                delay += .1

        if geo_location:  
            tweet['latitude'] = geo_location.latitude
            tweet['longitude'] = geo_location.longitude
        else:  
            bad_locations += 1  # tweet['location'] was invalid
    
    print('Done geocoding')
    return bad_locations

```

### `get_geocodes` Utility Function (cont.)
* Creates the **`OpenMapQuest` object** we’ll use to geocode locations
* Initializes **`bad_locations`** which we use to keep track of the number of invalid locations in the tweet objects we collected
* Attempts to **geocode the current tweet’s location**
* Prints a message that it’s done geocoding and returns the `bad_locations` value
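
The retry loop in `get_geocodes` repeats until a call succeeds, so a failure that never clears (such as a rejected API key) would keep it looping. A minimal sketch of the same wait-and-retry idea with an upper bound on attempts; `call_with_retries`, its parameters and the `flaky` demonstration function are hypothetical and not part of `tweetutilities.py`:

```python
import time


def call_with_retries(func, max_attempts=5, delay=0.1, delay_step=0.1):
    """Call func until it succeeds, sleeping a little longer after each
    failure. Re-raises the last exception after max_attempts failures."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == max_attempts:
                raise  # give up instead of looping forever
            time.sleep(delay)
            delay += delay_step


# demonstration with a function that fails twice, then succeeds
calls = {'count': 0}


def flaky():
    calls['count'] += 1
    if calls['count'] < 3:
        raise TimeoutError('simulated timeout')
    return 'ok'


print(call_with_retries(flaky, delay=0.01, delay_step=0.01))
```

In `get_geocodes`, the function passed in would be a call such as `lambda: geo.geocode(tweet['location'])`.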

## 13.15.3 Class `LocationListener`
```python
# locationlistener.py
"""Receives tweets matching a search string and stores a list of
dictionaries containing each tweet's username/text/location."""
import tweepy
from tweetutilities import get_tweet_content

class LocationListener(tweepy.StreamingClient):
    """Handles incoming Tweet stream to get location data."""
```

```python
    def __init__(self, bearer_token, counts_dict, 
                 tweets_list, topic, limit=10):
        """Configure the LocationListener."""
        self.tweets_list = tweets_list
        self.counts_dict = counts_dict
        self.topic = topic
        self.TWEET_LIMIT = limit
        super().__init__(bearer_token, wait_on_rate_limit=True)
```

```python
    def on_response(self, response):
        """Called when Twitter pushes a new tweet to you."""

        # get each tweet's username, text and location
        tweet_data = get_tweet_content(response)  

        # ignore retweets and tweets that do not contain the topic
        if (tweet_data['text'].startswith('RT') or
            self.topic.lower() not in tweet_data['text'].lower()):
            return

        self.counts_dict['total_tweets'] += 1 # it's an original tweet

        # ignore tweets with no location 
        if not tweet_data.get('location'):  
            return

        self.counts_dict['locations'] += 1 # user account has location
        self.tweets_list.append(tweet_data) # store the tweet
        print(f"{tweet_data['username']}: {tweet_data['text']}\n")
        
        # if TWEET_LIMIT is reached, terminate streaming
        if self.counts_dict['locations'] == self.TWEET_LIMIT:
            self.disconnect()
```

## 13.15.3 Class `LocationListener` (cont.)
* `__init__` receives 
    * the `bearer_token` 
    * `counts_dict`, the dictionary that we use to keep track of the total number of tweets processed and the number that have locations
    * `tweets_list`, in which we store the dictionaries returned by the `get_tweet_content` utility function
    * `topic`, a string representing the topic so we can confirm that its text is contained in the tweet text
    * `limit`, the number of tweets with locations to collect (10 by default)

## 13.15.3 Class `LocationListener` (cont.)
* In method `on_response`
    * The call to `get_tweet_content` gets each tweet’s username, text and location
    * The first `if` statement ignores the tweet if it is a retweet or if the text does not include the topic we’re searching for
    * The next statement adds 1 to the value of the `'total_tweets'` key in `counts_dict` to track the number of original tweets
    * The second `if` statement ignores tweets that have no location data
    * The method then adds 1 to the value of the `counts_dict` dictionary’s `'locations'` key to indicate that we found a tweet with a location
    * It appends the `tweet_data` dictionary to the `tweets_list`
    * It displays the tweet’s username and text so you can see that the app is making progress
    * The final `if` statement checks whether the `TWEET_LIMIT` has been reached and, if so, disconnects from the stream
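
The retweet and topic checks can be tried on their own, without a live stream. A minimal sketch that pulls the same condition into a hypothetical helper, `is_original_on_topic`, which is not part of `locationlistener.py`; the sample tweet texts are made up:

```python
def is_original_on_topic(text, topic):
    """Return True if text is not a retweet and mentions topic."""
    if text.startswith('RT'):
        return False  # retweets start with 'RT'
    return topic.lower() in text.lower()


print(is_original_on_topic('Football is back!', 'football'))
print(is_original_on_topic('RT @user1: Football is back!', 'football'))
print(is_original_on_topic('Nice weather today', 'football'))
```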

------
&copy;1992&ndash;2020 by Pearson Education, Inc. All Rights Reserved. This content is based on Chapter 5 of the book [**Intro to Python for Computer Science and Data Science: Learning to Program with AI, Big Data and the Cloud**](https://amzn.to/2VvdnxE).

DISCLAIMER: The authors and publisher of this book have used their 
best efforts in preparing the book. These efforts include the 
development, research, and testing of the theories and programs 
to determine their effectiveness. The authors and publisher make 
no warranty of any kind, expressed or implied, with regard to these 
programs or to the documentation contained in these books. The authors 
and publisher shall not be liable in any event for incidental or 
consequential damages in connection with, or arising out of, the 
furnishing, performance, or use of these programs.                  