# Problem Statement

Write a Python script that fetches all the tweets (as many as the Twitter
API allows) posted by the midas@IIITD Twitter handle and dumps the responses into a JSON Lines file.<br>
The second part of the script should parse this JSON Lines file and display the
following for every tweet in a tabular format.

<ul style="list-style-type:disc;">
  <li>The text of the tweet.</li>
  <li>Date and time of the tweet.</li>
  <li>The number of favorites/likes.</li>
  <li>The number of retweets.</li>
  <li>The number of images present in the tweet. If there are no images, return None.</li>
</ul>

# Solution

In [41]:
import tweepy           #Tweepy handles OAuth and wraps the Twitter API
import csv              #csv module to store the tweets in tabular format (csv)
import sys              #system-specific parameters and functions
import jsonlines        #library that simplifies working with JSON Lines / ndjson data
import pandas as pd     #pandas is used to read the csv back and display it

In [42]:
#Authorization credentials (fill in your own Twitter API keys below)

In [43]:
consumer_key = ""
consumer_secret = ""
access_key = ""
access_secret = ""

In [44]:
def get_all_tweets(screen_name):
    #authorize with Twitter and create the API client
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_key, access_secret)
    api = tweepy.API(auth)
    alltweets = []
    #first request: the most recent tweets, 200 per request
    new_tweets = api.user_timeline(screen_name=screen_name, count=200, tweet_mode='extended')
    alltweets.extend(new_tweets)
    #id of the oldest tweet fetched so far, minus one
    oldest = alltweets[-1].id - 1
    #keep requesting older tweets until a request comes back empty
    while len(new_tweets) > 0:
        #tweet_mode='extended' is needed here too, otherwise these tweets have no full_text
        new_tweets = api.user_timeline(screen_name=screen_name, count=200, max_id=oldest, tweet_mode='extended')
        alltweets.extend(new_tweets)
        oldest = alltweets[-1].id - 1
        print("...%s tweets downloaded so far" % (len(alltweets)))
    #dump the raw JSON of every tweet into a JSON Lines file
    fp = "tweets.jsonl"
    with jsonlines.open(fp, mode='w') as writer:
        for x in alltweets:
            writer.write(x._json)
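
The `max_id` paging used in `get_all_tweets` can be tried out without API credentials. The sketch below is only an illustration: `fetch_page` and `fake_fetch` are hypothetical stand-ins for `api.user_timeline`, and the ten-item timeline is made-up test data, not real tweets.

```python
def fetch_all(fetch_page, page_size=200):
    """Collect every item by walking backwards through a timeline with max_id.

    fetch_page(count, max_id) is a hypothetical callable standing in for
    api.user_timeline; it must return dicts with an integer "id", newest first.
    """
    collected = []
    max_id = None
    while True:
        page = fetch_page(count=page_size, max_id=max_id)
        if not page:  # an empty page means the timeline is exhausted
            break
        collected.extend(page)
        # ask only for items strictly older than the oldest one seen so far
        max_id = page[-1]["id"] - 1
    return collected


# Made-up data source: ids 10 down to 1, newest first.
FAKE_TIMELINE = [{"id": i} for i in range(10, 0, -1)]


def fake_fetch(count, max_id=None):
    # Mimic the max_id semantics: return items whose id is <= max_id.
    eligible = [t for t in FAKE_TIMELINE if max_id is None or t["id"] <= max_id]
    return eligible[:count]


tweets = fetch_all(fake_fetch, page_size=4)
print(len(tweets))
```

Checking for an empty page before touching `page[-1]` also means an account with no tweets does not raise an `IndexError`.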

In [45]:
#function to convert the JSON Lines records to a csv file
def table1(reader):
    outtweets = []
    for t in reader:
        #not all tweets carry media; record "None" for those that have none
        if "extended_entities" in t:
            pr = len(t['extended_entities']['media'])
        else:
            pr = "None"
        #skip records without full_text (tweets not fetched in extended mode)
        if "full_text" not in t:
            continue
        outtweets.append([t['created_at'], t['full_text'], t['favorite_count'], t['retweet_count'], pr])

    #newline='' avoids blank rows on Windows; utf-8 keeps non-ASCII tweet text intact
    with open('%s_tweets.csv' % "midas", 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow(["Post Date and Time","Text","Favorite_count","Retweet_count","Images_count"])
        writer.writerows(outtweets)
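
`table1` counts every entry under `extended_entities.media`, which can also include videos and GIFs. A stricter image count might look like the sketch below. It assumes each media entry carries a `type` field whose value is `"photo"` for images, as in Twitter API v1.1 tweet objects; `count_images` is a hypothetical helper name and the sample dicts are made up.

```python
def count_images(tweet):
    """Return the number of photo attachments in a tweet dict, or None if there are none."""
    media = tweet.get("extended_entities", {}).get("media", [])
    # Assumption: entries look like {"type": "photo" | "video" | "animated_gif", ...}
    photos = [m for m in media if m.get("type") == "photo"]
    return len(photos) or None


# Made-up sample records, not real API responses.
with_media = {"extended_entities": {"media": [{"type": "photo"}, {"type": "photo"}, {"type": "video"}]}}
without_media = {"full_text": "no attachments here"}

print(count_images(with_media))
print(count_images(without_media))
```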
    

In [46]:
if __name__ == '__main__':
    #pass in the username of the account you want to download
    get_all_tweets("midasiiitd")
    #the raw tweets are dumped to tweets.jsonl; the csv is written later by table1
    

...343 tweets downloaded so far
...343 tweets downloaded so far


In [47]:
reader = jsonlines.open('tweets.jsonl', mode='r')

table1(reader)
reader.close()
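
A JSON Lines file is just one JSON object per line, so it can also be read with the standard library alone. The following is a minimal sketch of that idea, not a replacement for `jsonlines`; `read_jsonl` is a hypothetical helper and the in-memory sample stands in for `tweets.jsonl`.

```python
import io
import json


def read_jsonl(handle):
    """Yield one parsed object per non-empty line of a JSON Lines stream."""
    for line in handle:
        line = line.strip()
        if line:  # tolerate blank lines between records
            yield json.loads(line)


# In-memory stand-in for an opened tweets.jsonl file (made-up records).
sample = io.StringIO(
    '{"id": 1, "full_text": "hello"}\n'
    '\n'
    '{"id": 2, "full_text": "world"}\n'
)
records = list(read_jsonl(sample))
print(len(records))
```

With a real file, the same generator could be fed the handle returned by `open('tweets.jsonl', encoding='utf-8')`.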

In [49]:
df = pd.read_csv("midas_tweets.csv",header=0)
df.head(150)


Unnamed: 0,Post Date and Time,Text,Favorite_count,Retweet_count,Images_count
0,Wed Apr 10 04:51:26 +0000 2019,RT @IIITDelhi: Applications open for MTech (CB...,0,1,
1,Tue Apr 09 16:45:07 +0000 2019,RT @IIITDelhi: We are delighted to share that ...,0,13,
2,Tue Apr 09 05:04:27 +0000 2019,RT @Harvard: Professor Jelani Nelson founded A...,0,35,
3,Tue Apr 09 05:04:11 +0000 2019,RT @emnlp2019: For anyone interested in submit...,0,16,
4,Mon Apr 08 19:38:09 +0000 2019,RT @multimediaeval: Announcing the 2019 MediaE...,0,15,
5,Mon Apr 08 07:08:12 +0000 2019,"Many Congratulations to @midasIIITD student, S...",18,2,1
6,Mon Apr 08 03:27:42 +0000 2019,@midasIIITD thanks all students who have appea...,5,0,1
7,Sun Apr 07 14:17:29 +0000 2019,"@himanchalchandr Meanwhile, complete CV/NLP ta...",0,0,
8,Sun Apr 07 14:17:09 +0000 2019,@sayangdipto123 Submit as per the guideline ag...,0,0,
9,Sun Apr 07 11:43:24 +0000 2019,We request all students whose interview are sc...,1,1,
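
The `Post Date and Time` column holds Twitter's raw `created_at` string. If real datetime values are wanted, for sorting or filtering, one option is to parse it with the standard library, as sketched below. `TWITTER_DATE_FORMAT` and `parse_created_at` are names introduced here for illustration, and `%a`/`%b` assume an English locale.

```python
from datetime import datetime

# Format of the created_at field as it appears in the table above.
TWITTER_DATE_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def parse_created_at(value):
    """Parse a Twitter created_at string into a timezone-aware datetime."""
    return datetime.strptime(value, TWITTER_DATE_FORMAT)


parsed = parse_created_at("Wed Apr 10 04:51:26 +0000 2019")
print(parsed.isoformat())
```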
