# Geolocated tweet listener

This notebook defines and deploys a listener for geolocated tweets: those that have either a latitude/longitude coordinate or a Twitter "Place" (e.g., "Denver, CO") with associated bounding box information. Geolocated tweets are captured at random, written to a local file, and then sent to an S3 bucket for storage.

In [None]:
from tweepy.streaming import StreamListener
from tweepy import OAuthHandler
from tweepy import Stream
import time
import pandas as pd
import os
import json
import boto3

The `S3Listener` class does most of the work. It inherits from `tweepy.streaming.StreamListener`, which is available in tweepy releases before 4.0, and was adapted from https://github.com/Ccantey/GeoSearch-Tweepy

In [None]:
class S3Listener(StreamListener):
    """A listener that siphons tweets to an S3 bucket."""
    def __init__(self, tweet_dir, bucketname, sleep_sec=5):
        """Initialize a listener

        Args:
            tweet_dir (str): local directory to store tweets
            bucketname (str): s3 bucket name to push tweets
            sleep_sec (int): number of seconds to wait between pushes
        """
        super().__init__()
        self.s3 = boto3.resource('s3')
        self.bucketname = bucketname
        self.tweet_dir = tweet_dir
        self.sleep_sec = sleep_sec
        if not os.path.exists(tweet_dir):
            os.makedirs(tweet_dir)

    def on_status(self, status):
        """Instructions for managing an incoming tweet.

        If a tweet has a lat/lon or "place", push to S3.
        """
        time.sleep(self.sleep_sec)
        has_geo = status.geo is not None
        has_place = status.place is not None
        if has_geo or has_place:
            self._tweet_to_s3(status)

    def on_exception(self, exception):
        """Print exceptions when they arise."""
        print(exception)
        return

    def _tweet_to_s3(self, status):
        """Send a tweet to S3."""
        id_str = status.id_str + ".json"
        destfile = os.path.join(self.tweet_dir, id_str)
        with open(destfile, 'w') as outfile:
            json.dump(status._json, outfile)
        self.s3.meta.client.upload_file(destfile,
                                        self.bucketname,
                                        destfile)
        os.remove(destfile)

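The geolocation check in `on_status` can be tried without a live stream. The sketch below applies the same test to plain dictionaries shaped like a tweet's JSON payload. The `is_geolocated` helper and the sample payloads are hypothetical and exist only for illustration; they are not part of the listener.

```python
def is_geolocated(tweet):
    """Return True if a tweet payload has a point location or a place.

    Hypothetical helper mirroring the check in S3Listener.on_status,
    applied to a dict instead of a tweepy Status object.
    """
    has_geo = tweet.get("geo") is not None
    has_place = tweet.get("place") is not None
    return has_geo or has_place


# Made-up payloads containing only the fields the check reads.
with_point = {
    "id_str": "1",
    "geo": {"type": "Point", "coordinates": [39.74, -104.99]},
    "place": None,
}
with_place = {"id_str": "2", "geo": None, "place": {"full_name": "Denver, CO"}}
without_location = {"id_str": "3", "geo": None, "place": None}

# Keep only the IDs of payloads that pass the check.
samples = (with_point, with_place, without_location)
kept = [t["id_str"] for t in samples if is_geolocated(t)]
print(kept)
```

Only payloads that pass this check would reach `_tweet_to_s3`; the rest are dropped.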
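The next cell handles authentication with the Twitter API, which is required to open the stream. The user provides their Twitter credentials in a `creds.csv` file with the columns `consumer_key`, `consumer_secret`, `access_token`, and `access_token_secret`. Access to the S3 bucket is handled separately by `boto3`, which looks up AWS credentials from its standard configuration.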

In [None]:
creds = pd.read_csv('creds.csv')
consumer_key = creds.consumer_key.values[0]
consumer_secret = creds.consumer_secret.values[0]
access_token = creds.access_token.values[0]
access_token_secret = creds.access_token_secret.values[0]

auth = OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
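The attribute accesses above imply that `creds.csv` has one column per credential and a single row of values. A minimal sketch of that assumed layout follows, read from an in-memory string with placeholder values. It uses the standard-library `csv` module rather than pandas so that it runs without extra dependencies; the values are made up.

```python
import csv
import io

# Assumed layout of creds.csv: a header row plus one row of values.
# The values here are placeholders, not real credentials.
example_csv = io.StringIO(
    "consumer_key,consumer_secret,access_token,access_token_secret\n"
    "my-key,my-secret,my-token,my-token-secret\n"
)

# Read the first (and only) data row as a dict keyed by column name.
example_creds = next(csv.DictReader(example_csv))

for name in example_creds:
    print(name)
```

Keeping the credentials in a separate file like this keeps them out of the notebook itself.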

Below, a stream object is created and its `filter` method is called with a bounding box. The call blocks and runs until it fails or is interrupted, storing geolocated tweets in the S3 bucket.

In [None]:
listener = S3Listener('data', 'earthlab-geolocated-tweets')
stream = Stream(auth, listener)

In [None]:
# Bounding box as [west, south, east, north], roughly the contiguous US.
stream.filter(locations=[-125, 25, -65, 48])
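The `locations` argument is a bounding box given as longitude/latitude pairs with the southwest corner first, so the four numbers above span roughly the contiguous United States. The sketch below unpacks the same values and runs a simple point-in-box test. The `in_bbox` helper and the sample coordinates are hypothetical, and the test is only an illustration of the box's extent, not Twitter's own matching logic.

```python
# Same values as the filter call: [west, south, east, north].
bbox = [-125, 25, -65, 48]


def in_bbox(lon, lat, box):
    """Return True if (lon, lat) lies inside the bounding box.

    Hypothetical helper for illustration; assumes the box does not
    cross the antimeridian.
    """
    west, south, east, north = box
    return west <= lon <= east and south <= lat <= north


# Approximate coordinates for Denver, CO and London, UK.
print(in_bbox(-104.99, 39.74, bbox))
print(in_bbox(-0.13, 51.51, bbox))
```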