# Getting Tweets From Twitter Streaming API

In this notebook we use the Tweepy library to access the Twitter API. Tweepy is a Python library that simplifies access to the API. To get started, you have to apply for a developer account to obtain your keys and access tokens. For more details, see this page: https://towardsdatascience.com/tweepy-for-beginners-24baf21f2c25

In [81]:
import tweepy

In [82]:
twitter_keys = {
    'consumer_key':        '',
    'consumer_secret':     '',
    'access_token_key':    '',
    'access_token_secret': ''
}
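
Hard-coding credentials in a notebook makes it easy to leak them when the notebook is shared. As a minimal sketch, the same dictionary could be filled from environment variables. The variable names below (`TWITTER_CONSUMER_KEY` and so on) and the helper `load_twitter_keys` are hypothetical, chosen only for illustration.

```python
import os

# Hypothetical environment variable names; adjust them to your own setup.
ENV_NAMES = {
    'consumer_key':        'TWITTER_CONSUMER_KEY',
    'consumer_secret':     'TWITTER_CONSUMER_SECRET',
    'access_token_key':    'TWITTER_ACCESS_TOKEN',
    'access_token_secret': 'TWITTER_ACCESS_TOKEN_SECRET',
}

def load_twitter_keys(environ=os.environ):
    """Build the twitter_keys dictionary from environment variables."""
    # Collect every variable that is unset or empty so the error names them all.
    missing = [name for name in ENV_NAMES.values() if not environ.get(name)]
    if missing:
        raise KeyError('Missing environment variables: ' + ', '.join(missing))
    return {key: environ[name] for key, name in ENV_NAMES.items()}
```

With the variables set in your shell, `twitter_keys = load_twitter_keys()` would replace the cell above.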

Let's try a simple experiment: we will fetch the tweets from your home timeline.

In [83]:
# Set up access to the API
auth = tweepy.OAuthHandler(twitter_keys['consumer_key'], twitter_keys['consumer_secret'])
auth.set_access_token(twitter_keys['access_token_key'], twitter_keys['access_token_secret'])

api = tweepy.API(auth)

# Call the home timeline and print each tweet's text
public_tweets = api.home_timeline()
for tweet in public_tweets:
    print(tweet.text)

Drought and frosts in Brazil may be starting a major price correction that shifts the coffee market upwards for yea… https://t.co/1gymoEsTUx
RT @trudymorgancole: #hfxfullycommitted was an amazing show by an incredible performer -- if you have a chance to see it before it closes,…
RT @DowntownHalifax: We've teamed up again with @PrismaticArts on another giveaway! 2 lucky winners will receive 2 tickets to any performan…
Pinterest has hired Martin Galvin, GroupM’s U.K. commercial strategy director, to lead its commercial partnerships… https://t.co/vgSgIAObU0
Holiday savings for customers start now! Check out how we're kicking off Black Friday-worthy deals, in addition to… https://t.co/oZyL1no79h
RT @TwitCoast: Breton Lalama began preparing for his latest role—a starring turn in @NeptuneTheatre's season-opening play Fully Committed,…
RT @Forbes: Here's everything we know so far about the Apple AirPods 3: https://t.co/odYSJ6BWUt https://t.co/74k15ZsChb
هذا العالم سيفنى بسبب جرعة زائدة من ال

Now let's see what a single tweet looks like.

In [None]:
public_tweets[0]

Oh, that's hard to examine.

For better readability, try this:

In [85]:
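import json

status = public_tweets[1]

# status._json holds the raw tweet as a dictionary; serialise it to a JSON string
json_str = json.dumps(status._json)

# deserialise the string back into a Python object
parsed = json.loads(json_str)

# pretty-print with indentation and alphabetically sorted keys
print(json.dumps(parsed, indent=4, sort_keys=True))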

{
    "contributors": null,
    "coordinates": null,
    "created_at": "Mon Oct 04 14:15:42 +0000 2021",
    "entities": {
        "hashtags": [
            {
                "indices": [
                    21,
                    39
                ],
                "text": "hfxfullycommitted"
            }
        ],
        "symbols": [],
        "urls": [],
        "user_mentions": [
            {
                "id": 45576341,
                "id_str": "45576341",
                "indices": [
                    3,
                    19
                ],
                "name": "Trudy Morgan-Cole",
                "screen_name": "trudymorgancole"
            }
        ]
    },
    "favorite_count": 0,
    "favorited": false,
    "geo": null,
    "id": 1445029867150577667,
    "id_str": "1445029867150577667",
    "in_reply_to_screen_name": null,
    "in_reply_to_status_id": null,
    "in_reply_to_status_id_str": null,
    "in_reply_to_user_id": null,
    "in_reply_to_user_id_s

This also helps you understand the structure of a tweet, so you can decide which fields you need in your application.
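
As a rough sketch of picking out only the fields you need, the helper below keeps a handful of keys from a tweet dictionary such as `status._json`. The function name `extract_fields` and the choice of fields are assumptions made for illustration; the sample dictionary is a trimmed copy of the output above.

```python
def extract_fields(tweet):
    """Keep a few fields from a tweet dictionary (for example status._json)."""
    entities = tweet.get('entities') or {}
    return {
        'id_str': tweet.get('id_str'),
        'created_at': tweet.get('created_at'),
        # Flatten the nested entity objects into plain lists of strings.
        'hashtags': [h['text'] for h in entities.get('hashtags', [])],
        'mentions': [m['screen_name'] for m in entities.get('user_mentions', [])],
    }

# A trimmed-down dictionary copied from the printed tweet.
sample = {
    'created_at': 'Mon Oct 04 14:15:42 +0000 2021',
    'id_str': '1445029867150577667',
    'entities': {
        'hashtags': [{'indices': [21, 39], 'text': 'hfxfullycommitted'}],
        'user_mentions': [{'id': 45576341, 'screen_name': 'trudymorgancole'}],
    },
}

print(extract_fields(sample))
```

Using `.get` with defaults keeps the helper from failing on tweets that lack a given key.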

## Getting More Data

We've seen a simple example; now we want to collect real data and store it in a file for later processing. Define a file and open it for writing ('w') or appending ('a').

In [86]:
fName = 'sampledata.txt' # We'll store the tweets in a text file.
f = open(fName, 'w', encoding='utf-8') # explicit encoding, independent of the platform default

### Next, we have to import a few more libraries.

**StreamListener**: the on_data method of Tweepy’s StreamListener passes status data to the on_status method. Our class StdOutListener inherits from StreamListener and overrides on_status. Note that StreamListener belongs to Tweepy 3.x; Tweepy 4 merged it into Stream. <br>
**OAuthHandler**: as the name implies, this object handles authentication. We pass it our consumer key and secret. <br>
**Stream**: establishes a streaming session and routes messages to the StreamListener instance.  <br>
**time**: provides various time-related functions. <br>
**jsonpickle**: a Python library for serialization and deserialization of complex Python objects to and from JSON. JSON is a format that encodes objects as a string. Serialization converts an object into that string, and deserialization is the inverse operation (it converts the string back into an object).
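
To make the serialization round trip concrete, here is a small sketch. It uses the standard-library `json` module rather than jsonpickle so that it runs without extra dependencies; for a plain dictionary like this one, the idea is the same. The `record` dictionary is an illustrative sample, not real stream output.

```python
import json

# A small dictionary standing in for a tweet.
record = {'id_str': '1445029867150577667', 'favorite_count': 0, 'geo': None}

encoded = json.dumps(record)   # serialization: dictionary -> JSON string
decoded = json.loads(encoded)  # deserialization: JSON string -> dictionary

print(type(encoded).__name__)  # str
print(decoded == record)       # True
```

Python's `None` becomes `null` in the JSON string and is restored to `None` on the way back, which is why the round trip compares equal.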

In [87]:
from tweepy.streaming import StreamListener
from tweepy import OAuthHandler
from tweepy import Stream
import time
import jsonpickle

Here we create a class with methods that handle incoming tweets, errors, and timeouts.

In [88]:
import sys  # needed for sys.stderr in on_timeout

class StdOutListener(StreamListener):
    """A listener that handles tweets received from the stream."""

    def __init__(self):
        super().__init__()
        self.max_tweets = 500  # stop after this many tweets
        self.tweet_count = 0

    def on_status(self, status):
        # Write one JSON document per line to the file opened earlier
        f.write(jsonpickle.encode(status._json, unpicklable=False) + '\n')
        print(status.user.id, '-', status.text)
        self.tweet_count += 1
        if self.tweet_count >= self.max_tweets:
            print("completed")
            return False  # returning False disconnects the stream

    def on_error(self, status_code):
        if status_code == 420:
            # 420 signals rate limiting; returning False disconnects the stream
            return False

    def on_timeout(self):
        # sys.stderr is the standard error stream
        print('Timeout...', file=sys.stderr)
        return True  # Don't kill the stream

Add keywords to filter the stream. Without keywords, `stream.sample()` (instead of `stream.filter`) returns a random sample of tweets.

In [89]:
filterkw = ["موسم الرياض", "كورونا"]  # Arabic for "Riyadh Season" and "Corona"

In [None]:
listener = StdOutListener()
auth = OAuthHandler(twitter_keys['consumer_key'], twitter_keys['consumer_secret'])
auth.set_access_token(twitter_keys['access_token_key'], twitter_keys['access_token_secret'])
stream = Stream(auth, listener)
stream.filter(track=filterkw)  # blocks until the listener returns False
f.close()  # make sure the collected tweets are flushed to disk
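
Since the listener writes one JSON document per line, the file can be read back line by line for later processing. The helper below is a minimal sketch of that step; the name `read_tweets` is an assumption, and it presumes the file was written exactly as in `on_status` above.

```python
import json

def read_tweets(path):
    """Yield one tweet dictionary per non-empty line of the stored file."""
    with open(path, encoding='utf-8') as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)
```

For example, `for tweet in read_tweets('sampledata.txt'): print(tweet['text'])` would print the text of every stored tweet. A generator is used so that large files do not need to fit in memory at once.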