# 13.15 Geocoding and Mapping
* Collect streaming tweets, then plot their locations on an interactive map
* **Twitter disables precise location info (latitude/longitude) by default** (users must opt in to allowing Twitter to track locations) 
* A large percentage of tweets include the user’s home location information
    * This information is sometimes invalid or fictitious
* Map markers will show `location` from each tweet’s `User` object

### [**geopy** library](https://github.com/geopy/geopy)
* **Geocoding**&mdash;translate locations into **latitude** and **longitude**
* **geopy** supports dozens of **geocoding web services**, many with **free or lite tiers**
* We’ll use the **OpenMapQuest geocoding service**

### OpenMapQuest Geocoding API
* Converts locations, such as **Boston, MA**, into their **latitudes** and **longitudes**, such as **42.3602534** and **-71.0582912**, for plotting on maps
* Currently allows **15,000 transactions per month** on its free tier
* [Sign up](https://developer.mapquest.com/)
* Once logged in, go to https://developer.mapquest.com/user/me/apps
    * Click **Create a New Key**
    * Fill in the `App Name` field with a name of your choosing
    * Leave the `Callback URL` empty
    * Click `Create App` to create an API key
* Click your app’s name in the web page to see your **consumer key**
* In `keys.py`, replace **YourKeyHere** in the `mapquest_key` line with your consumer key
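
As a sketch, the relevant line of `keys.py` might look like the following. Only the name `mapquest_key` and the `YourKeyHere` placeholder come from the text above; the environment-variable fallback and the name `MAPQUEST_KEY` are assumptions added for illustration, as one way to keep the real key out of source control.

```python
# keys.py (sketch)
import os

# MAPQUEST_KEY is a hypothetical environment variable name. If it is not
# set, the placeholder remains and must be replaced by hand.
mapquest_key = os.environ.get('MAPQUEST_KEY', 'YourKeyHere')
```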

### [**folium library**](https://github.com/python-visualization/folium) and Leaflet.js JavaScript Mapping Library
* We’ll use folium to create the maps
* Uses the **Leaflet.js JavaScript mapping library** to display maps in a web page
* Folium saves maps as HTML files that you can view in your web browser
* Install folium:

```
pip install folium
```

### Maps from OpenStreetMap.org
By default, **Leaflet.js** uses **open source maps** from **`OpenStreetMap.org`**
* To use these maps, **OpenStreetMap requires the following copyright notice**:
> `Map data © OpenStreetMap contributors`

* OpenStreetMap also states:
> _You must make it clear that the data is available under the Open Database License. This can be achieved by providing a “License” or “Terms” link which links to `www.openstreetmap.org/copyright` or `www.opendatacommons.org/licenses/odbl`._

## 13.15.1 Getting and Mapping the Tweets
* We’ll use utility functions from our **`tweetutilities.py`** file and class **`LocationListener`** in **`locationlistener.py`**


### Get the API Object
* We get the `API` object by calling the `get_API` utility function in `tweetutilities.py`

In [None]:
from tweetutilities import get_API

In [None]:
api = get_API()

### Collections Required By `LocationListener`
* `LocationListener` requires two collections
    * A **list (`tweets`)** to store the tweets we collect 
    * A **dictionary (`counts`)** to track the total number of tweets we collect and the number that have location data

In [None]:
tweets = [] 

In [None]:
counts = {'total_tweets': 0, 'locations': 0}

<hr style="height:2px; border:none; color:#AAA; background-color:#AAA;">

### Creating the LocationListener 

In [None]:
from locationlistener import LocationListener

In [None]:
location_listener = LocationListener(api, counts_dict=counts, 
    tweets_list=tweets, topic='football', limit=50)

* **`LocationListener`** uses our **utility function `get_tweet_content`** to extract the screen name, tweet text and location from each tweet and place that data in a dictionary

<hr style="height:2px; border:none; color:#AAA; background-color:#AAA;">

### Configure and Start the `Stream` of Tweets

In [None]:
import tweepy

In [None]:
stream = tweepy.Stream(auth=api.auth, listener=location_listener)

In [None]:
stream.filter(track=['football'], languages=['en'], is_async=False)

### Displaying the Location Statistics 

In [None]:
counts['total_tweets']

In [None]:
counts['locations']

In [None]:
print(f'{counts["locations"] / counts["total_tweets"]:.1%}')
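
If the stream ends before any original tweets arrive, `counts['total_tweets']` is 0 and the division above raises a `ZeroDivisionError`. A small guard avoids that. The helper name `percent_of` and the sample counts are hypothetical; neither is part of `tweetutilities.py`.

```python
def percent_of(part, whole):
    """Return part/whole formatted as a percentage, or 'n/a' if whole is 0."""
    if whole == 0:
        return 'n/a'
    return f'{part / whole:.1%}'

# made-up counts standing in for the dictionary filled by LocationListener
sample_counts = {'total_tweets': 63, 'locations': 50}
print(percent_of(sample_counts['locations'], sample_counts['total_tweets']))
```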

### Geocoding the Locations 
* We use our `get_geocodes` utility function
* The **OpenMapQuest geocoding service** times out when it cannot handle your request immediately
* When that happens, **`get_geocodes`** notifies you, waits, then retries the request

In [None]:
from tweetutilities import get_geocodes

In [None]:
bad_locations = get_geocodes(tweets)

### Displaying the Bad Location Statistics

In [None]:
bad_locations

In [None]:
print(f'{bad_locations / counts["locations"]:.1%}')

### Cleaning the Data 
* Use a pandas `DataFrame` to clean the data
* `DataFrame` will contain **`NaN`** for the **`latitude`** and **`longitude`** of any tweet that did not have a valid location
* Remove any such rows via `DataFrame`’s **`dropna` method**

In [None]:
import pandas as pd

In [None]:
df = pd.DataFrame(tweets)

In [None]:
df = df.dropna()
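
To see which rows `dropna` discards, it can help to picture the same filter in plain Python. This is only an illustration of the effect on made-up sample tweets, not how pandas implements it: dictionaries that never received `latitude` and `longitude` keys become rows containing `NaN`, and those are the rows that are dropped.

```python
# made-up sample data in the shape produced by get_tweet_content and get_geocodes
sample_tweets = [
    {'screen_name': 'user1', 'text': 'Football tonight',
     'location': 'Boston, MA',
     'latitude': 42.3602534, 'longitude': -71.0582912},
    {'screen_name': 'user2', 'text': 'Love football',
     'location': 'somewhere'},  # geocoding failed, so no coordinates
]

# keep only the tweets that were geocoded successfully
required = ('latitude', 'longitude')
geocoded = [t for t in sample_tweets if all(key in t for key in required)]
print(len(geocoded))
```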

### Creating a Map with Folium

In [None]:
import folium

In [None]:
usmap = folium.Map(location=[39.8283, -98.5795],  # center of U.S.
                   tiles='Stamen Terrain',
                   zoom_start=4, detect_retina=True)

* **`location`** &mdash; sequence containing latitude and longitude of map center point
    * [Geographic center of the continental United States](http://bit.ly/CenterOfTheUS) 
* **`zoom_start`** &mdash; map’s initial zoom level
* **`detect_retina`** &mdash; enables folium to use higher-resolution maps

### Creating Folium `Popup` Objects for the Tweet Locations
* **`itertuples`** creates a named tuple from each row of the **`DataFrame`**
* Each **tuple** contains an **attribute** for each **`DataFrame` column**
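
The rows that `itertuples` yields behave like named tuples from the standard library, so each column is available by name. The sketch below uses `collections.namedtuple` with made-up values to show the access pattern; the type name `Row` is an arbitrary choice for this example.

```python
from collections import namedtuple

# field names mirror the DataFrame columns used in this section
Row = namedtuple('Row', ['screen_name', 'text', 'latitude', 'longitude'])
row = Row('user1', 'Football tonight', 42.3602534, -71.0582912)

# access each value by column name, as this section's marker loop does
label = ': '.join([row.screen_name, row.text])
print(label)
print((row.latitude, row.longitude))
```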

In [None]:
for t in df.itertuples():
    text = ': '.join([t.screen_name, t.text])
    popup = folium.Popup(text, parse_html=True)
    marker = folium.Marker((t.latitude, t.longitude), 
                           popup=popup)
    marker.add_to(usmap)

### Saving the Map with Map’s **`save`** Method 

In [None]:
usmap.save('tweet_map.html')

### Displaying the Map in Jupyter 
* Evaluating the `usmap` object in Jupyter displays the interactive map

In [None]:
usmap

## 13.15.2 Utility Functions in `tweetutilities.py` 
### `get_tweet_content` Utility Function 
* Receives a **`Status` object (tweet)** and creates a **dictionary** containing the **tweet’s `screen_name`, `text` and `location`**
* For the tweet’s text, we try to use the `full_text` of the tweet’s `extended_tweet`, falling back to the `text` property when the tweet has no `extended_tweet`

```python
def get_tweet_content(tweet, location=False):
    """Return dictionary with data from tweet (a Status object)."""
    fields = {}
    fields['screen_name'] = tweet.user.screen_name

    # get the tweet's text
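    try:  # Tweepy exposes extended_tweet as a dictionary when it is present
        fields['text'] = tweet.extended_tweet['full_text']
    except AttributeError:  # no extended_tweet, so use the regular text
        fields['text'] = tweet.text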

    if location:
        fields['location'] = tweet.user.location

    return fields

```
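
To try the function without a live stream, stand-in objects can play the role of `Status`. The sketch below is a self-contained variant of the function that reads `extended_tweet` as a dictionary (an assumption based on how Tweepy 3.x exposes it) and catches only `AttributeError`. It uses `types.SimpleNamespace` to fake just the attributes the function reads; real `Status` objects carry many more.

```python
from types import SimpleNamespace

def get_tweet_content(tweet, location=False):
    """Return dictionary with data from tweet (a Status-like object)."""
    fields = {'screen_name': tweet.user.screen_name}
    try:  # prefer the untruncated text when it is present
        fields['text'] = tweet.extended_tweet['full_text']
    except AttributeError:  # no extended_tweet attribute
        fields['text'] = tweet.text
    if location:
        fields['location'] = tweet.user.location
    return fields

# fake tweets: one short, one carrying an extended_tweet
user = SimpleNamespace(screen_name='user1', location='Boston, MA')
short_tweet = SimpleNamespace(user=user, text='Football!')
long_tweet = SimpleNamespace(
    user=user, text='Football is...',
    extended_tweet={'full_text': 'Football is back tonight'})

print(get_tweet_content(short_tweet, location=True))
print(get_tweet_content(long_tweet))
```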

### `get_geocodes` Utility Function 
* Receives a list of dictionaries containing tweets and **geocodes their locations**
* If geocoding is successful for a tweet, adds the **latitude** and **longitude** to the tweet’s **dictionary in `tweet_list`**
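* Requires class **`OpenMapQuest`** from the **geopy module**
* Also uses the `keys` module (for `mapquest_key`) and the standard `time` module (for `sleep`), so `tweetutilities.py` must import both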

```python
from geopy import OpenMapQuest
```

```python
def get_geocodes(tweet_list):
    """Get the latitude and longitude for each tweet's location.
    Returns the number of tweets with invalid location data."""
    print('Getting coordinates for tweet locations...')
    geo = OpenMapQuest(api_key=keys.mapquest_key)  # geocoder
    bad_locations = 0  

    for tweet in tweet_list:
        processed = False
        delay = .1  # used if OpenMapQuest times out to delay next call
        while not processed:
            try:  # get coordinates for tweet['location']
                geo_location = geo.geocode(tweet['location'])
                processed = True
            except Exception:  # e.g., timed out, so wait before trying again
                print('OpenMapQuest service timed out. Waiting.')
                time.sleep(delay)
                delay += .1

        if geo_location:  
            tweet['latitude'] = geo_location.latitude
            tweet['longitude'] = geo_location.longitude
        else:  
            bad_locations += 1  # tweet['location'] was invalid
    
    print('Done geocoding')
    return bad_locations

```

### `get_geocodes` Utility Function (cont.)
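* Creates the **`OpenMapQuest` object** we’ll use to geocode locations
* Initializes **`bad_locations`**, which we use to keep track of the number of invalid locations in the tweet objects we collected
* For each tweet, attempts to **geocode the tweet’s location**, waiting a little longer before each retry if the request fails
* If geocoding succeeds, stores the `latitude` and `longitude` in the tweet’s dictionary; otherwise, increments `bad_locations`
* Prints a message that it’s done geocoding and returns the `bad_locations` value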
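
Because `get_geocodes` retries until a request succeeds, a persistent failure such as an invalid API key would keep it looping. One possible refinement caps the number of attempts while keeping the growing delay. The helper `geocode_with_retry` and the stand-in geocoder below are hypothetical; neither is in `tweetutilities.py`.

```python
import time

def geocode_with_retry(geocode, location, max_tries=5, delay=0.1):
    """Call geocode(location), retrying with a growing delay on errors.
    Returns the result, or None if every attempt failed."""
    for attempt in range(max_tries):
        try:
            return geocode(location)
        except Exception:  # a real version might catch only timeout errors
            time.sleep(delay)
            delay += 0.1  # wait a little longer before the next attempt
    return None

# stand-in geocoder that fails twice, then succeeds
calls = {'n': 0}

def flaky_geocode(location):
    calls['n'] += 1
    if calls['n'] < 3:
        raise TimeoutError('simulated timeout')
    return (42.3602534, -71.0582912)

print(geocode_with_retry(flaky_geocode, 'Boston, MA', delay=0.01))
```

In `get_geocodes`, a `None` result from such a helper would be counted as a bad location, so a caller that needs to tell exhausted retries apart from an invalid location would have to track that separately.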

## 13.15.3 Class `LocationListener`
```python
# locationlistener.py
"""Receives tweets matching a search string and stores a list of
dictionaries containing each tweet's screen_name/text/location."""
import tweepy
from tweetutilities import get_tweet_content

class LocationListener(tweepy.StreamListener):
    """Handles incoming Tweet stream to get location data."""
```

```python
    def __init__(self, api, counts_dict, tweets_list, topic, limit=10):
        """Configure the LocationListener."""
        self.tweets_list = tweets_list
        self.counts_dict = counts_dict
        self.topic = topic
        self.TWEET_LIMIT = limit
        super().__init__(api)  # call superclass's init
```

```python
    def on_status(self, status):
        """Called when Twitter pushes a new tweet to you."""
        # get each tweet's screen_name, text and location
        tweet_data = get_tweet_content(status, location=True)  

        # ignore retweets and tweets that do not contain the topic
        if (tweet_data['text'].startswith('RT') or
            self.topic.lower() not in tweet_data['text'].lower()):
            return

        self.counts_dict['total_tweets'] += 1  # original tweet

        # ignore tweets with no location 
        if not status.user.location:  
            return

        self.counts_dict['locations'] += 1  # tweet with location
        self.tweets_list.append(tweet_data)  # store the tweet
        print(f'{status.user.screen_name}: {tweet_data["text"]}\n')
        
        # if TWEET_LIMIT is reached, return False to terminate streaming
        return self.counts_dict['locations'] != self.TWEET_LIMIT

```

## 13.15.3 Class `LocationListener` (cont.)
* Method `on_status`:
    * Calls `get_tweet_content` to get the screen name, text and location of each tweet.
    * Ignores the tweet if it is a retweet or if the text does not include the topic we’re searching for—we’ll use only original tweets containing the search string.
    * Adds 1 to the value of the `'total_tweets'` key in the `counts` dictionary to track the number of original tweets we process.
    * Ignores tweets that have no location data.
    * Adds 1 to the value of the `'locations'` key in the `counts` dictionary to indicate that we found a tweet with a location.
    * Appends to the `tweets_list` the `tweet_data` dictionary that `get_tweet_content` returned.
    * Displays the tweet’s screen name and text.
    * Checks whether the `TWEET_LIMIT` has been reached and, if so, returns `False` to terminate the stream.
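
The filtering rule in `on_status` can be checked in isolation. The sketch below pulls it into a hypothetical function, `is_original_on_topic`, which is not part of `locationlistener.py`, so it can be exercised without a live stream.

```python
def is_original_on_topic(text, topic):
    """Return True if text is not a retweet and mentions topic
    (case-insensitive)."""
    return not text.startswith('RT') and topic.lower() in text.lower()

print(is_original_on_topic('Football season is here', 'football'))
print(is_original_on_topic('RT @user1: football season', 'football'))
print(is_original_on_topic('Great game last night', 'football'))
```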

------
&copy;1992&ndash;2020 by Pearson Education, Inc. All Rights Reserved. This content is based on Chapter 13 of the book [**Intro to Python for Computer Science and Data Science: Learning to Program with AI, Big Data and the Cloud**](https://amzn.to/2VvdnxE).

DISCLAIMER: The authors and publisher of this book have used their 
best efforts in preparing the book. These efforts include the 
development, research, and testing of the theories and programs 
to determine their effectiveness. The authors and publisher make 
no warranty of any kind, expressed or implied, with regard to these 
programs or to the documentation contained in these books. The authors 
and publisher shall not be liable in any event for incidental or 
consequential damages in connection with, or arising out of, the 
furnishing, performance, or use of these programs.                  