In [53]:
import pandas as pd

## Part 1

You have to write a Python script that fetches all the tweets (as many as the Twitter API allows) posted by the midas@IIITD Twitter handle and dumps the responses into a JSON Lines file.

We import the necessary libraries: tweepy (for retrieving tweets) and jsonlines (for storing the tweet data).

In [54]:
import tweepy
import jsonlines

In [55]:
consumer_key = ''  
consumer_secret = ''
access_token = ''
access_token_secret = ''
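
The empty strings above are placeholders for real credentials. One way to avoid hard-coding them in the notebook is to read them from environment variables. The following is a minimal sketch; the variable names (`TWITTER_CONSUMER_KEY` and so on) and the `load_credentials` helper are assumptions made for illustration, not part of tweepy or of the original script.

```python
import os

# Hypothetical environment variable names; pick whatever suits your setup.
ENV_NAMES = {
    "consumer_key": "TWITTER_CONSUMER_KEY",
    "consumer_secret": "TWITTER_CONSUMER_SECRET",
    "access_token": "TWITTER_ACCESS_TOKEN",
    "access_token_secret": "TWITTER_ACCESS_TOKEN_SECRET",
}


def load_credentials(env=os.environ):
    """Return the four credentials as a dict, raising if any is missing or empty."""
    missing = [var for var in ENV_NAMES.values() if not env.get(var)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {name: env[var] for name, var in ENV_NAMES.items()}
```

With the variables set, `creds = load_credentials()` followed by `consumer_key = creds["consumer_key"]` (and likewise for the other three) would replace the assignments above.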

Next, we import the authentication handler.

In [56]:
from tweepy import OAuthHandler

Here, we define a function that retrieves the tweets of any user, given their screen name. We authenticate by creating an OAuthHandler instance and build an API client from it. We then start with an empty list, tweets_data. To fetch all available tweets, we paginate with a Cursor object over api.user_timeline, which returns tweets from the user's timeline; tweet_mode is set to 'extended' to get the full, untruncated text of each tweet. Each fetched tweet is appended to tweets_data, which the function returns. Note that the user timeline endpoint only exposes roughly the 3,200 most recent tweets of an account, so "all" here means as many as the API allows.

In [57]:
def get_all_tweets(screen_name):
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)  # App-level authorization
    auth.set_access_token(access_token, access_token_secret)  # User-level authorization
    api = tweepy.API(auth)  # Create the API client from the auth handler
    tweets_data = []  # Empty list to collect the tweets
    # Cursor pages through the user's timeline until no tweets are left
    for status in tweepy.Cursor(api.user_timeline, screen_name=screen_name, tweet_mode='extended').items():
        tweets_data.append(status)  # Add the fetched tweet to the list
    return tweets_data  # List of Status objects, one per tweet

In [58]:
tweets_data = get_all_tweets("midasIIITD")
#Calling the function and storing the result in tweets_data list
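
The Cursor object hides the paging logic. Conceptually, timeline paging works by repeatedly asking for tweets whose ID is at or below a moving `max_id`. The sketch below illustrates that idea on a fake in-memory timeline; `fetch_page` and `iterate_timeline` are hypothetical helpers written for this example, and the code is not tweepy's actual implementation.

```python
def fetch_page(timeline, max_id=None, count=3):
    """Stand-in for one timeline request: newest first, only ids <= max_id."""
    eligible = [t for t in timeline if max_id is None or t["id"] <= max_id]
    return eligible[:count]


def iterate_timeline(timeline, count=3):
    """Yield every tweet by requesting successive pages, as a cursor would."""
    max_id = None
    while True:
        page = fetch_page(timeline, max_id=max_id, count=count)
        if not page:
            break  # An empty page means the timeline is exhausted
        yield from page
        # The next page starts just below the oldest id seen so far
        max_id = page[-1]["id"] - 1


# Fake timeline, newest first, standing in for data held by the API
fake_timeline = [{"id": i} for i in range(10, 0, -1)]
ids = [t["id"] for t in iterate_timeline(fake_timeline)]
print(ids)  # [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
```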

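Now, we write the contents of tweets_data to a JSON Lines file. Each tweepy Status object keeps the raw API response as a dictionary in its _json attribute, which is what we write.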

In [60]:
#Writes all the tweet statuses to jsonline file
with jsonlines.open('tweets_data.jsonl', 'w') as writer:
    for tweet in tweets_data:
        writer.write(tweet._json)
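
For readers unfamiliar with the format: a JSON Lines file holds one complete JSON object per line. The sketch below reproduces the write and read steps with only the standard library, on two hand-made records (not real tweets), to show what the jsonlines package does in essence.

```python
import json
import os
import tempfile

# Hand-made sample records; the second contains an embedded newline
records = [
    {"id": 1, "full_text": "first"},
    {"id": 2, "full_text": "second\nline"},
]

path = os.path.join(tempfile.mkdtemp(), "sample.jsonl")

# Write: one JSON object per line; json.dumps escapes the embedded newline
with open(path, "w", encoding="utf-8") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")

# Read: parse each non-empty line independently
with open(path, encoding="utf-8") as f:
    loaded = [json.loads(line) for line in f if line.strip()]

print(loaded == records)  # True
```

Because each line is parsed on its own, a file like this can be processed one tweet at a time without loading everything into memory.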

## Part 2

The other part of your script should be able to parse these JSONline files to display the
following for every tweet in a tabular format.

● The text of the tweet.

● Date and time of the tweet.

● The number of favorites/likes.

● The number of retweets.

● The number of images present in the tweet. If there is no image, return None.

We initialise an empty list, tweet_details, and read the contents of the JSON Lines file into it.

In [61]:
#Opens the jsonline file to retrieve the stored tweet statuses
tweet_details = []
with jsonlines.open('tweets_data.jsonl') as reader:
    for obj in reader:
        tweet_details.append(obj)

We initialise an empty dictionary, required_tweet_details, to store the required features of each tweet. We loop over every collected tweet and store its features in a temporary dictionary, temp_dict. We use try and except to determine whether the tweet has images. We then add temp_dict to required_tweet_details, keyed by a running count.

In [104]:
required_tweet_details = {}
count = 1
for tweet in tweet_details:
    temp_dict = {}
    temp_dict['date_and_time'] = tweet['created_at'] 
    temp_dict['text'] = tweet['full_text']
    temp_dict['no_of_favorites'] = tweet['favorite_count']
    temp_dict['no_of_retweets'] = tweet['retweet_count']
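    try:
        # 'extended_entities' is absent when the tweet has no media
        media = tweet['extended_entities']['media']
        # Count only photos; videos and GIFs also appear in 'media'
        images = sum(1 for m in media if m['type'] == 'photo')
        temp_dict['images_in_tweet'] = images or None
    except KeyError:
        temp_dict['images_in_tweet'] = None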
    required_tweet_details[count] = temp_dict
    count += 1
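
The image count can also be written without try and except by using `dict.get` with defaults. This is a sketch under the assumption that each media entry carries a `type` field whose value is `'photo'` for images; `count_images` is a hypothetical helper name, and the sample tweets are hand-made dictionaries containing only the fields used here.

```python
def count_images(tweet):
    """Return the number of photos attached to a tweet dict, or None if there are none."""
    media = tweet.get("extended_entities", {}).get("media", [])
    photos = [m for m in media if m.get("type") == "photo"]
    return len(photos) or None  # 0 photos is reported as None


# Hand-made sample tweets, not real API responses
no_media = {"full_text": "plain text"}
two_photos = {"extended_entities": {"media": [{"type": "photo"}, {"type": "photo"}]}}
one_video = {"extended_entities": {"media": [{"type": "video"}]}}

print(count_images(no_media), count_images(two_photos), count_images(one_video))
# None 2 None
```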

In [115]:
required_tweet_details[1] #Example of the stored tweet details

{'date_and_time': 'Sun Apr 07 06:55:19 +0000 2019',
 'text': 'Other queries: "none of the Tweeter Apis give the correct count of favorites tested for most of them, all give the wrong count. same is true for retweet. this mostly happens if the no. of likes, retweet is very large. So, what shld be done?"\nAns: Just use the count given by API.',
 'no_of_favorites': 3,
 'no_of_retweets': 2,
 'images_in_tweet': None}
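
The date_and_time value is stored as a plain string. If it needs to be sorted or filtered by time, it can be parsed into a timezone-aware datetime. A minimal sketch using the string from the example above; the format string is an assumption derived from that one sample.

```python
from datetime import datetime

created_at = "Sun Apr 07 06:55:19 +0000 2019"

# %a and %b expect English day and month abbreviations (the default C locale)
parsed = datetime.strptime(created_at, "%a %b %d %H:%M:%S %z %Y")

print(parsed.isoformat())  # 2019-04-07T06:55:19+00:00
```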

To present the collected data in tabular format, we create a pandas DataFrame from required_tweet_details. Because the keys of the outer dictionary become columns, we then transpose the DataFrame so that each tweet is a row.

In [116]:
dfObj = pd.DataFrame(required_tweet_details)
dfObj = dfObj.transpose()
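
An alternative that builds the rows directly is `pd.DataFrame.from_dict` with `orient='index'`. The sketch below uses two hand-made rows in the same shape as required_tweet_details rather than the real data.

```python
import pandas as pd

# Two hand-made rows in the same shape as required_tweet_details
sample = {
    1: {"text": "first", "no_of_favorites": 3, "no_of_retweets": 2},
    2: {"text": "second", "no_of_favorites": 0, "no_of_retweets": 1},
}

# orient='index' treats each outer key as a row label, so no transpose is needed
df = pd.DataFrame.from_dict(sample, orient="index")

print(df.shape)  # (2, 3)
```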

This is the data in the required format.

In [117]:
dfObj.head()

| | date_and_time | images_in_tweet | no_of_favorites | no_of_retweets | text |
|---|---|---|---|---|---|
| 1 | Sun Apr 07 06:55:19 +0000 2019 | None | 3 | 2 | Other queries: "none of the Tweeter Apis give ... |
| 2 | Sun Apr 07 06:53:38 +0000 2019 | None | 3 | 1 | Other queries: "do we have to make two differe... |
| 3 | Sun Apr 07 05:32:27 +0000 2019 | None | 4 | 1 | Other queries: "If using Twitter api, it does ... |
| 4 | Sun Apr 07 05:29:40 +0000 2019 | None | 6 | 1 | Response to some queries asked by students on ... |
| 5 | Sat Apr 06 17:11:29 +0000 2019 | None | 0 | 2 | RT @kdnuggets: Top 8 #Free Must-Read #Books on... |
