# References

## [Tweepy Documentation v3.6.0](http://docs.tweepy.org/en/v3.6.0/index.html)

- [Streaming With Tweepy](http://docs.tweepy.org/en/v3.6.0/streaming_how_to.html)

The Twitter streaming API is used to download Twitter messages in real time. It is useful for obtaining a high volume of tweets, or for creating a live feed using a site stream or user stream. See the [Twitter Streaming API Documentation](https://developer.twitter.com/en/docs/tweets/filter-realtime/overview).

The streaming API is quite different from the REST API: the REST API is used to pull data from Twitter, whereas the streaming API pushes messages to a persistent session. This allows the streaming API to download more data in real time than could be done using the REST API.

In Tweepy, an instance of **tweepy.Stream** establishes a streaming session and routes messages to a **StreamListener** instance. The **on_data** method of a stream listener receives all messages and calls functions according to the message type. The default **StreamListener** can classify the most common Twitter messages and route them to appropriately named methods, but these methods are only stubs.

Therefore, using the streaming API involves three steps:

1. Create a class inheriting from StreamListener
2. Using that class create a Stream object
3. Connect to the Twitter API using the Stream.
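Because `tweepy.Stream` needs live credentials and a network connection, the sketch below imitates the three steps in plain Python instead of calling Tweepy. `StubListener` and `FakeStream` are hypothetical stand-ins: `StubListener.on_data` is a simplified imitation of how Tweepy 3.x's `StreamListener` parses each raw JSON message and forwards statuses to `on_status`, and `FakeStream.filter` replays canned messages rather than connecting to Twitter.

```python
import json


class StubListener:
    """Hypothetical, simplified stand-in for tweepy.StreamListener."""

    def on_data(self, raw_data):
        # Parse the raw JSON message and route it by its distinguishing key.
        data = json.loads(raw_data)
        if "in_reply_to_status_id" in data:  # every status carries this key
            # Real Tweepy passes a Status object; this sketch passes the dict.
            return self.on_status(data)
        if "delete" in data:
            return self.on_delete(data["delete"]["status"])
        return True  # other message types are ignored in this sketch

    def on_status(self, status):
        return True  # stub: subclasses override this

    def on_delete(self, notice):
        return True  # stub


class FakeStream:
    """Hypothetical stand-in for tweepy.Stream that replays canned messages."""

    def __init__(self, listener, messages):
        self.listener = listener
        self.messages = messages

    def filter(self, track):
        # The real Stream sends `track` to Twitter, which filters server-side;
        # here every canned message is simply handed to the listener.
        for raw in self.messages:
            if self.listener.on_data(raw) is False:
                break  # a handler returning False disconnects the stream


# Step 1: create a class inheriting from the listener and override on_status.
class PrintingListener(StubListener):
    def __init__(self):
        self.seen = []

    def on_status(self, status):
        self.seen.append(status["text"])
        return True


# Step 2: create a Stream object using an instance of that class.
messages = [
    '{"id": 1, "text": "Hello Atlanta", "in_reply_to_status_id": null}',
    '{"delete": {"status": {"id": 1, "user_id": 42}}}',
    '{"id": 2, "text": "Social good in ATL", "in_reply_to_status_id": null}',
]
listener = PrintingListener()
stream = FakeStream(listener, messages)

# Step 3: "connect" by starting the stream with filter().
stream.filter(track=["atlanta"])
print(listener.seen)  # ['Hello Atlanta', 'Social good in ATL']
```

With real Tweepy 3.x, steps 2 and 3 become `tweepy.Stream(auth, listener)` followed by `.filter(track=[...])`, as in the code cells further down; the delete notice never reaches `on_status`.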

## [Twitter Developer platform Docs](https://developer.twitter.com/en/docs)

Twitter’s developer platform offers several tools and APIs. Twitter’s basic REST and Streaming APIs enable free access to numerous endpoints. 

The data provided by Twitter APIs are made up of data objects and their attributes rendered in JavaScript Object Notation (JSON). **To learn more about Tweet metadata, see this** [introduction to Tweet JSON objects](https://developer.twitter.com/en/docs/tweets/data-dictionary/overview/intro-to-tweet-json).

The following documentation provides 'data dictionaries' to help you understand the many attributes that make up Twitter Tweets, Users and other objects.

### Tweet object

Tweets are the basic atomic building block of all things Twitter. [Click here](https://developer.twitter.com/en/docs/tweets/data-dictionary/overview/tweet-object) **to learn more about the Tweet object and its data fields**.

### User object

The user object contains public Twitter account metadata and describes the author of the Tweet. See the user object page of the data dictionary to learn more.

### Twitter entities

The entities object encompasses common Tweet elements such as hashtags, URLs, mentions, and even polls. [Click here](https://developer.twitter.com/en/docs/tweets/data-dictionary/overview/entities-object) to learn more.

### Twitter extended entities

The extended entities object is the go-to object for working with native Twitter media. Native media includes the types of media you can 'attach' while composing a Tweet: up to four photos, or a single video, or a single animated GIF. See the extended entities page of the data dictionary to learn more.

### Geospatial objects

When posting Tweets, users have the option to geotag their Tweet with an exact location or a Twitter Place. User accounts can also have geospatial metadata associated with them. [Click here](https://developer.twitter.com/en/docs/tweets/data-dictionary/overview/geo-objects) to learn more.
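To see how these objects nest inside one Tweet, the sketch below parses a small Tweet payload with Python's `json` module. The payload is made up and heavily trimmed: real Tweets carry many more fields, and `extended_entities` and `coordinates` appear only when a Tweet has native media or an exact geotag, which is why the code uses `.get()` with defaults for them.

```python
import json

# Hypothetical, trimmed Tweet payload (real Tweets have many more fields).
raw_tweet = """
{
  "id_str": "1",
  "text": "Volunteering in #Atlanta with @example",
  "user": {"screen_name": "example_user"},
  "entities": {
    "hashtags": [{"text": "Atlanta", "indices": [16, 24]}],
    "user_mentions": [{"screen_name": "example", "indices": [30, 38]}],
    "urls": []
  },
  "extended_entities": {"media": [{"type": "photo"}]},
  "coordinates": {"type": "Point", "coordinates": [-84.39, 33.75]},
  "place": {"full_name": "Atlanta, GA"}
}
"""

tweet = json.loads(raw_tweet)

# Tweet object: top-level fields
text = tweet["text"]
# User object: the author of the Tweet
author = tweet["user"]["screen_name"]
# Entities: hashtags and mentions parsed out of the text
hashtags = [h["text"] for h in tweet["entities"]["hashtags"]]
mentions = [m["screen_name"] for m in tweet["entities"]["user_mentions"]]
# Extended entities: native media, present only when media is attached
media_types = [m["type"] for m in tweet.get("extended_entities", {}).get("media", [])]
# Geo objects: GeoJSON order is [longitude, latitude]
coords = tweet.get("coordinates")
lon, lat = coords["coordinates"] if coords else (None, None)
place = (tweet.get("place") or {}).get("full_name")

print(author, hashtags, mentions, media_types, (lat, lon), place)
```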

## GitHub

### Tweepy

[tweepy/tweepy/streaming.py](https://github.com/tweepy/tweepy/blob/78d2883a922fa5232e8cdfab0c272c24b8ce37c4/tweepy/streaming.py)  

[tweepy/examples/streaming.py](https://github.com/tweepy/tweepy/blob/78d2883a922fa5232e8cdfab0c272c24b8ce37c4/examples/streaming.py)

## Stack Overflow

[What is the difference between on_data and on_status in the tweepy library?](https://stackoverflow.com/questions/31054656/what-is-the-difference-between-on-data-and-on-status-in-the-tweepy-library)

`on_data()` handles every message type: statuses, deletes, events, direct messages, friends, limits, disconnects and warnings, whereas `on_status()` handles only statuses.

If you're only concerned with tweets, use `on_status()`. It gives you the tweets, already parsed into `Status` objects, without the other message types, and the choice has no effect on your rate limit because the stream delivers the same messages either way. If you want the detailed, raw information, use `on_data()`; that's rarely needed unless you're doing heavy analysis.
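The difference comes from how `on_data` routes messages: it receives every raw JSON message, looks for a distinguishing key, and forwards only statuses to `on_status`. The function below is a hypothetical re-creation of that key check. The key order and handler names follow Tweepy 3.x's `StreamListener.on_data`, but `route` itself is not part of Tweepy.

```python
import json

# (key that identifies the message, Tweepy 3.x handler it is routed to)
ROUTES = [
    ("in_reply_to_status_id", "on_status"),  # every status has this key
    ("delete", "on_delete"),
    ("event", "on_event"),
    ("direct_message", "on_direct_message"),
    ("friends", "on_friends"),
    ("limit", "on_limit"),
    ("disconnect", "on_disconnect"),
    ("warning", "on_warning"),
]


def route(raw_data):
    """Return the name of the handler a raw stream message would reach."""
    data = json.loads(raw_data)
    for key, handler in ROUTES:
        if key in data:
            return handler
    return "unknown"


print(route('{"id": 7, "text": "hi", "in_reply_to_status_id": null}'))  # on_status
print(route('{"limit": {"track": 12}}'))                                # on_limit
```

Overriding `on_data` means your code sees all of these raw strings; overriding only `on_status` means just the first route reaches your code.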

In [None]:
import tweepy
import csv

In [None]:
# assign consumer and access variables imported from config.py
import config

consumer_key = config.twitter_anidata_consumer_key
consumer_secret = config.twitter_anidata_consumer_secret
access_token = config.twitter_anidata_access_token
access_token_secret = config.twitter_anidata_access_token_secret

# create an OAuthHandler instance
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth) 
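The cell above imports a `config.py` module that is not shown. A minimal sketch of what it might contain, assuming it simply defines the four attribute names used above as strings; the values are placeholders, and the real file should be kept out of version control.

```python
# config.py (hypothetical contents; replace the placeholders with your app's keys)
twitter_anidata_consumer_key = "YOUR_CONSUMER_KEY"
twitter_anidata_consumer_secret = "YOUR_CONSUMER_SECRET"
twitter_anidata_access_token = "YOUR_ACCESS_TOKEN"
twitter_anidata_access_token_secret = "YOUR_ACCESS_TOKEN_SECRET"
```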

In [None]:
import json

import tweepy


class TwitterAuthenticator():
    '''
    Class for authenticating a Twitter app.

    The consumer and access variables must have been assigned previously
    (see the config.py cell above).
    '''

    def authenticate_twitter_app(self):
        auth = tweepy.OAuthHandler(consumer_key, consumer_secret)  # create the authentication object
        auth.set_access_token(access_token, access_token_secret)   # set access token and secret
        return auth


class TwitterListenerToFile(tweepy.StreamListener):
    '''
    Class for listening to a Twitter stream and routing output to a file.

    Inherits from Tweepy's StreamListener and overrides on_status.
    A listener handles tweets received from the stream: the on_data
    method of Tweepy's StreamListener parses each status message and
    passes it to on_status.
    '''

    def __init__(self, fetched_tweets_filename, api=None):
        # call the parent constructor so that self.api is set;
        # StreamListener.on_data needs it to parse statuses
        super().__init__(api)
        self.fetched_tweets_filename = fetched_tweets_filename

    # override tweepy.StreamListener to add logic to on_status
    def on_status(self, status):
        try:
            print(status.text)
            # status is a tweepy Status object, not a string: write the raw
            # JSON dict it was parsed from, one tweet per line
            with open(self.fetched_tweets_filename, 'a') as tf:
                tf.write(json.dumps(status._json) + '\n')
        except BaseException as e:
            print("Error on_status: %s" % str(e))
        return True  # returning True keeps the stream running

    def on_error(self, status_code):
        if status_code == 420:
            # returning False disconnects the stream when Twitter signals rate limiting
            return False
        print(status_code)


class TwitterStreamer():
    """
    Class for streaming live tweets.

    A number of Twitter streams are available through Tweepy. Most cases will
    use filter, the user_stream, or the sitestream.
    """

    def __init__(self):
        self.twitter_authenticator = TwitterAuthenticator()

    def stream_tweets(self, fetched_tweets_filename, search_terms):
        # handle Twitter authentication and the connection to the Twitter streaming API
        listener = TwitterListenerToFile(fetched_tweets_filename)  # create an object of class TwitterListenerToFile
        auth = self.twitter_authenticator.authenticate_twitter_app()
        stream = tweepy.Stream(auth, listener)

        # filter the tweets using the filter method provided by the Stream class
        stream.filter(track=search_terms)
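If each tweet is stored as one JSON document per line, for example by writing `json.dumps(status._json) + '\n'` in `on_status` (`_json` holds the raw dictionary a Tweepy 3.x `Status` was parsed from), the output file follows the JSON Lines convention and can be read back with a short loop. The sketch below uses an in-memory `io.StringIO` buffer and made-up tweet dicts in place of tweets.json; the helpers `append_tweet` and `read_tweets` are hypothetical names.

```python
import io
import json


def append_tweet(fp, tweet_dict):
    # One JSON document per line, like json.dumps(status._json) + "\n".
    fp.write(json.dumps(tweet_dict) + "\n")


def read_tweets(fp):
    # Parse each non-empty line independently.
    return [json.loads(line) for line in fp if line.strip()]


buf = io.StringIO()  # stand-in for open("tweets.json", "a")
append_tweet(buf, {"id_str": "1", "text": "Hello Atlanta"})
append_tweet(buf, {"id_str": "2", "text": "Social good today"})

buf.seek(0)  # rewind, as reopening the file for reading would
tweets = read_tweets(buf)
print([t["text"] for t in tweets])  # ['Hello Atlanta', 'Social good today']
```

Calling `json.load` on the whole file would fail here, because the file holds many JSON documents rather than one; reading line by line avoids that.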
        
        


In [None]:
if __name__ == "__main__":
    
    # Define search terms
    search_terms = ['atlanta', 'social good']
    
    # Define output file name
    fetched_tweets_filename = "tweets.json"
    
    # Create a TwitterStreamer object
    twitter_streamer = TwitterStreamer()
    
    twitter_streamer.stream_tweets(fetched_tweets_filename, search_terms)
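Tweepy 3.x joins the `track` list with commas before sending it, and Twitter's documentation for `track` treats commas as logical ORs and spaces within a phrase as logical ANDs (ignoring case and word order). So `search_terms = ['atlanta', 'social good']` should match tweets containing "atlanta", or containing both "social" and "good". The helper below is a rough, hypothetical client-side approximation of that rule; Twitter's real matching runs on the server and also handles URLs, mentions, and punctuation in its own way.

```python
import re


def matches_track(text, track):
    """Rough approximation of the streaming `track` rule (hypothetical helper)."""
    # Lowercase the tweet and split it into word-like tokens (crude tokenization).
    tokens = set(re.findall(r"\w+", text.lower()))
    # Any phrase may match (OR); every term within a phrase must be present (AND).
    return any(all(term in tokens for term in phrase.lower().split())
               for phrase in track)


search_terms = ['atlanta', 'social good']
print(matches_track("Good news for social workers", search_terms))  # True
print(matches_track("Atlanta traffic", search_terms))               # True
print(matches_track("A good day", search_terms))                    # False
```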
    

# Capture the Stream

Note: `if not hasattr(status, 'retweeted_status'):` filters out retweets more reliably than `if status.retweeted == False:`. Empirically, the `retweeted` check let some suspected retweets into the results; in particular, a popular retweet about Tom Brady's Super Bowl history kept creeping in with the `retweeted` check, but not with the `hasattr` check. Rather than relying only on this observation, the Tweet object data dictionary explains it: `retweeted` indicates whether the Tweet has been retweeted by the *authenticating* user, so it is normally `False` on a stream, whereas retweets are distinguished from ordinary Tweets by the presence of a `retweeted_status` attribute.

Also note that merely using `if not status.retweeted_status` raises an `AttributeError` when the `retweeted_status` attribute does not exist.
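The two checks can be compared offline with stand-in objects. The sketch below uses `types.SimpleNamespace` to mimic Tweepy `Status` objects (an assumption for illustration; real `Status` objects carry many more attributes). `getattr` with a default gives an equivalent test that never raises `AttributeError`.

```python
from types import SimpleNamespace


def is_retweet(status):
    # A retweet carries a retweeted_status attribute; ordinary tweets lack it.
    return hasattr(status, "retweeted_status")


original = SimpleNamespace(text="Go Falcons", retweeted=False)
# Someone else's retweet: `retweeted` is still False, because it only reports
# whether the *authenticating* user retweeted the Tweet.
retweet = SimpleNamespace(text="RT @someone: Go Falcons", retweeted=False,
                          retweeted_status=original)

print(retweet.retweeted)                         # False
print(is_retweet(original), is_retweet(retweet))  # False True
# Equivalent check that never raises AttributeError:
print(getattr(retweet, "retweeted_status", None) is not None)  # True
```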

In [None]:
### Create a StreamListener ###

# override tweepy.StreamListener to add logic to on_status

class MyStreamListener(tweepy.StreamListener):

    def on_status(self, status):
        # https://developer.twitter.com/en/docs/tweets/data-dictionary/overview/tweet-object
        #if status.retweeted == False:
            #print("retweeted = ", status.retweeted, ": ", status.user.screen_name, ": ", status.text)
        
        #https://stackoverflow.com/questions/610883/how-to-know-if-an-object-has-an-attribute-in-python
        if not hasattr(status, 'retweeted_status'):
            print("retweeted = ", status.retweeted, ": ", status.user.screen_name, ": ", status.text)
            
        
    def on_error(self, status_code):
        if status_code == 420:
            #returning False in on_error disconnects the stream
            return False

        # returning non-False reconnects the stream, with backoff.
        
# Create a Stream object that routes messages to an instance of MyStreamListener

myStreamListener = MyStreamListener()
myStream = tweepy.Stream(auth=api.auth, listener=myStreamListener)

# Start a Stream; language filtering is passed to filter(), not to the Stream constructor

myStream.filter(track=['atlanta'], languages=['en'])
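The comment in `on_error` notes that returning anything other than `False` lets Tweepy reconnect with backoff. Twitter's streaming guidelines recommended exponential backoff: start at 5 seconds and double up to 320 seconds for HTTP errors, and start at 60 seconds and double for HTTP 420 errors. The helper below is a hypothetical sketch of such a schedule, not Tweepy's internal reconnect code.

```python
def backoff_delays(attempts, start, cap=None):
    """Exponential backoff: start, 2*start, 4*start, ..., optionally capped."""
    delays = []
    delay = start
    for _ in range(attempts):
        delays.append(delay if cap is None else min(delay, cap))
        delay *= 2
    return delays


print(backoff_delays(8, start=5, cap=320))  # [5, 10, 20, 40, 80, 160, 320, 320]
print(backoff_delays(4, start=60))          # [60, 120, 240, 480]
```

Each HTTP 420 lengthens the wait, which is why the listener above chooses to disconnect on 420 instead of reconnecting.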

# Store the Data
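One possible way to store the captured tweets is a CSV file with one row per tweet, using the `csv` module imported at the top of the notebook. The sketch below is hypothetical: the column choice, the `write_rows` helper, and the stand-in tweet dicts are illustrative, and an in-memory buffer replaces a real file. Inside `on_status` you would build each row from attributes such as `status.id_str`, `status.user.screen_name`, and `status.text`.

```python
import csv
import io

FIELDS = ["id_str", "screen_name", "text"]  # hypothetical column choice


def write_rows(fp, tweets):
    # DictWriter quotes fields containing commas or quotes automatically.
    writer = csv.DictWriter(fp, fieldnames=FIELDS)
    writer.writeheader()
    for t in tweets:
        writer.writerow({"id_str": t["id_str"],
                         "screen_name": t["user"]["screen_name"],
                         "text": t["text"]})


tweets = [
    {"id_str": "1", "user": {"screen_name": "a"}, "text": "Hello, Atlanta"},
    {"id_str": "2", "user": {"screen_name": "b"}, "text": 'She said "hi"'},
]

buf = io.StringIO()  # stand-in for open("tweets.csv", "w", newline="")
write_rows(buf, tweets)

buf.seek(0)
rows = list(csv.DictReader(buf))
print(rows[0]["text"])  # Hello, Atlanta
```

When writing to a real file, open it with `newline=""` as the `csv` module documentation recommends, so line endings are not translated twice.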