## **Installing Libraries**

Tweepy: Python library for accessing the Twitter API.

Pandas: For building the tweet table and handling CSV files.

In [1]:
!pip install tweepy==3.3.0
!pip install pandas

Collecting tweepy==3.3.0
  Downloading https://files.pythonhosted.org/packages/cd/8b/dfeeb1057d37af810161f8206aa5a6e6e9351dd87ea6f84e4f87d7e3c8a3/tweepy-3.3.0-py2.py3-none-any.whl
Installing collected packages: tweepy
  Found existing installation: tweepy 3.6.0
    Uninstalling tweepy-3.6.0:
      Successfully uninstalled tweepy-3.6.0
Successfully installed tweepy-3.3.0


## **Importing Libraries**

API: For calling the Twitter API.

OAuthHandler: For authenticating with Twitter (log-in).

Cursor: For looping over paginated results.

In [0]:
from tweepy import API
from tweepy import OAuthHandler 
from tweepy import Cursor
import pandas as pd
from IPython.display import HTML, display
import json
import os

## **Twitter Client:**
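Sets up authenticated access to Twitter. The four credentials (consumer_key, consumer_secret, access_token, access_secret) are read from global variables that are set in the final cell.

In [0]:
def twitter_client():
  """
    auth: Authentication is handled by the tweepy.OAuthHandler class.
    client: A wrapper for the API provided by Twitter.
    wait_on_rate_limit: Sleep instead of failing when the rate limit is hit.
  """
  auth = OAuthHandler(consumer_key, consumer_secret)
  auth.set_access_token(access_token, access_secret)
  client = API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)
  print("Account Verified!")
  return client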
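
As an alternative to typing the credentials into the notebook, they could be read from environment variables. The following is only a sketch: the variable names (`TWITTER_CONSUMER_KEY` and so on) and the `load_credentials` helper are hypothetical and not part of the original notebook.

```python
import os

# Hypothetical environment variable names; pick whatever fits your setup.
ENV_NAMES = {
    "consumer_key": "TWITTER_CONSUMER_KEY",
    "consumer_secret": "TWITTER_CONSUMER_SECRET",
    "access_token": "TWITTER_ACCESS_TOKEN",
    "access_secret": "TWITTER_ACCESS_SECRET",
}


def load_credentials(env=None):
    """Return the four credentials as a dict, or raise if any is missing."""
    env = os.environ if env is None else env
    missing = [name for name in ENV_NAMES.values() if not env.get(name)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {key: env[name] for key, name in ENV_NAMES.items()}
```

The returned values could then be assigned to the `consumer_key`, `consumer_secret`, `access_token` and `access_secret` globals that `twitter_client` expects, which keeps secrets out of the saved notebook.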

## **Get Tweets:**
Fetches all tweets from a user's timeline page by page, then iterates over each status in every page and writes it as one JSON object per line to a JSONL file.

In [0]:
def get_tweets(username="midasIIITD", fName="./tweets_midas.jsonl"):
  """
    username: Account whose timeline is used for data extraction.
    fName: Path of the file where tweets will be stored.
    client: A wrapper for the API provided by Twitter.
    tweet_mode='extended': Requests the full, untruncated tweet text.
    Cursor: Iterates over paginated results from the API.
    pages: All pages of the timeline, fetched once so the total is known.
  """
  client = twitter_client()
  # Fetch every page once. Re-running the Cursor inside the loop to count
  # pages would download the whole timeline again on each iteration.
  pages = list(Cursor(client.user_timeline, screen_name=username, tweet_mode='extended').pages())
  with open(fName, "w") as f:
    for count, page in enumerate(pages, start=1):
      for status in page:
        f.write(json.dumps(status._json) + "\n")
      completed = (count / len(pages)) * 100
      print(8*'*', ' Fetching Tweets, Completed : ', '{0:.2f}'.format(completed), ' ', 8*'*')
  print(8*'*', ' Fetching Tweets Completed ', 8*'*')
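
The JSONL layout used above (one JSON object per line) can be illustrated without calling the API. This is a minimal sketch with made-up records; `write_jsonl` and `read_jsonl` are hypothetical helper names, and an in-memory buffer stands in for the file.

```python
import io
import json


def write_jsonl(records, handle):
    """Write each record as one JSON document followed by a newline."""
    for record in records:
        handle.write(json.dumps(record) + "\n")


def read_jsonl(handle):
    """Parse one JSON document per non-empty line."""
    return [json.loads(line) for line in handle if line.strip()]


# Made-up records standing in for status._json dictionaries.
sample = [
    {"full_text": "first tweet", "retweet_count": 1},
    {"full_text": "second\ntweet", "retweet_count": 0},
]

buffer = io.StringIO()
write_jsonl(sample, buffer)
buffer.seek(0)
restored = read_jsonl(buffer)
```

Because `json.dumps` escapes newlines inside strings, a tweet that spans several lines still occupies exactly one line of the file, which is what makes line-by-line reading safe.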

## **Get Text:**
From a tweet's JSON object, obtains the string stored under full_text.
Returns the text of the tweet.

In [0]:
def get_text(tweet):
  """
    full_text: The text of the tweet (empty string if the key is missing).
  """
  full_text = tweet.get('full_text', '')
  return full_text

## **Get Retweets:**
From a tweet's JSON object, obtains the value stored under retweet_count. Returns the number of retweets of the tweet.

In [0]:
def get_retweets(tweet):
  """
    rt_count: Number of retweets of the tweet (0 if the key is missing).
  """
  rt_count = tweet.get('retweet_count', 0)
  return rt_count

## **Get Likes:**
From a tweet's JSON object, obtains the value stored under favorite_count. Returns the number of favorites of the tweet.

In [0]:
def get_likes(tweet):
  """
    nb_fav: Number of favorites of the tweet (0 if the key is missing).
  """
  nb_fav = tweet.get('favorite_count', 0)
  return nb_fav

## **Get Date Time**
From a tweet's JSON object, obtains the value stored under created_at. Returns the date and the time of the tweet as two strings.

In [0]:
def get_date_time(tweet):
  """
    date_time: Raw timestamp in "Day Mon dd hh:mm:ss +zzzz yyyy" format.
    date: Date in "Mon dd yyyy" format.
    time: Time in hh:mm:ss format.
  """
  date_time = tweet.get('created_at', '')
  # Slice out "Mon dd" and the trailing " yyyy", skipping the time and offset.
  date = date_time[4:-20] + date_time[-5:]
  time = date_time[-19:-11]
  return date, time
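
The slicing above depends on fixed character positions. A sketch of the same split using `datetime.strptime` is shown below; the format string is an assumption based on the timestamp layout described in the docstring, `split_created_at` is a hypothetical name, and the sample timestamp is reconstructed from the first row of the output further down.

```python
from datetime import datetime

# Assumed layout of created_at: "Day Mon dd hh:mm:ss +zzzz yyyy".
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def split_created_at(created_at):
    """Return (date, time) strings parsed from a created_at timestamp."""
    parsed = datetime.strptime(created_at, CREATED_AT_FORMAT)
    return parsed.strftime("%b %d %Y"), parsed.strftime("%H:%M:%S")


date, time = split_created_at("Mon Mar 25 13:01:57 +0000 2019")
```

Parsing raises a `ValueError` on a malformed timestamp instead of silently returning a wrong slice, and the parsed `datetime` can be sorted or converted to another time zone. Note that `%a` and `%b` are locale-dependent, so this assumes an English locale.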

## **Get Nb Images**
From a tweet's JSON object, obtains the media list stored under entities. The count is incremented for every media item whose type is photo.

Returns the number of images in the tweet, or None if the tweet has no media.

In [0]:
def get_nb_images(tweet):
  """
    count: Number of photos in the tweet.
    entities: Dictionary of the tweet's entities (media, urls, etc.).
    media: A list containing details of each media item (including its type, which is necessary for checking whether it is a photo).
  """
  count = 0
  entities = tweet.get('entities', {})
  if 'media' in entities:
    media = entities['media']
    for media_item in media:
      if media_item['type'] == "photo":
        count += 1
    return count
  return None
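
A possible refinement, offered only as a sketch: in Twitter's v1.1 tweet objects, the complete list of attached photos is, to my understanding, carried under `extended_entities`, while `entities.media` holds a single item. If that matches the payloads you receive, a variant like the hypothetical `count_photos` below would prefer `extended_entities` and fall back to `entities`. The sample dictionaries are made up for illustration.

```python
def count_photos(tweet):
    """Count photo media, preferring extended_entities when it is present."""
    media = tweet.get("extended_entities", {}).get("media")
    if media is None:
        media = tweet.get("entities", {}).get("media")
    if media is None:
        return None  # same convention as get_nb_images: no media at all
    return sum(1 for item in media if item.get("type") == "photo")


# Made-up tweet fragments for illustration.
two_photos = {
    "entities": {"media": [{"type": "photo"}]},
    "extended_entities": {"media": [{"type": "photo"}, {"type": "photo"}]},
}
one_photo = {"entities": {"media": [{"type": "photo"}]}}
no_media = {"entities": {"urls": []}}
```

Here `count_photos(two_photos)` counts both photos, whereas a function that reads only `entities` would report one.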

## **DF Format:**
Returns an empty pandas dataframe with the columns Text, Date and Time, No. of Favorites, No. of Retweets and No. of Images, together with the list of column names.

In [0]:
def df_format():
  """
    column_names: List of column names for the dataframe.
    df_form: Empty pandas dataframe with those columns.
  """
  column_names = ['Text', 'Date and Time', 'No. of Favorites', 'No. of Retweets', 'No. of Images']
  df_form = pd.DataFrame(columns=column_names)
  return df_form, column_names

## **Fill Dataset:**
Appends one row, holding the details of a single tweet, to the dataframe with the columns Text, Date and Time, No. of Favorites, No. of Retweets and No. of Images, and returns the dataframe.

In [0]:
def fill_dataset(df, column_name, text, rt_count, nb_likes, date, time, nb_images):
  """
    df: Dataframe to be appended with the new row.
    column_name: List of columns of the dataframe (values below follow this order).
    text: Text of the tweet.
    rt_count: Retweet count.
    nb_likes: Number of favorites.
    date, time: Date and time.
    nb_images: Number of images.
  """
  # DataFrame.append was removed in pandas 2.0; assigning to the next
  # integer label adds the row on old and new pandas versions alike.
  df.loc[len(df)] = [text, date + " " + time, nb_likes, rt_count, nb_images]
  return df
  

## **Clean Hashtags**
Returns the text without hashtags.

In [0]:
def clean_hashtags(text):
  """
    Returns text stripped of words that start with '#'.
  """
  return " ".join(word for word in text.split() if not word.startswith('#'))

## **Clean URLS**
Returns the text without any URLs.

In [0]:
def clean_url(text):
  """
    Returns text without words that start with 'http' or 'www'.
  """
  return " ".join(word for word in text.split() if not word.startswith(('http', 'www')))
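
Both cleaning functions follow the same pattern: split on whitespace, drop words with an unwanted prefix, and join the rest with single spaces. A minimal generalisation, using a hypothetical `strip_tokens` helper and a made-up sample string, looks like this:

```python
def strip_tokens(text, prefixes):
    """Drop every whitespace-separated word that starts with one of the prefixes."""
    prefixes = tuple(prefixes)  # str.startswith accepts a tuple of options
    return " ".join(word for word in text.split() if not word.startswith(prefixes))


sample = "Hello #MIDAS see https://t.co/x and www.example.org now"
cleaned = strip_tokens(sample, ("#", "http", "www"))
```

One side effect worth knowing: `split()` followed by `" ".join(...)` also collapses runs of whitespace, including line breaks, into single spaces.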

## **Clean Text**
Returns the text without any line breaks. If required, hashtags, URLs and the '@' of user references can also be removed.

In [0]:
def clean_text(text, remove_at=False, remove_hash=False, remove_url=False):
  """
    text: Text to be cleaned.
    remove_at: True if user references are to be replaced with only the user id (the '@' becomes a space). By default false.
    remove_hash: True if hashtags are to be removed. By default false.
    remove_url: True if URLs are to be removed. By default false.
    delim: Characters that are replaced with a space.
  """
  delim = '\n'
  if remove_at:
    delim += '@'
  if remove_hash:
    text = clean_hashtags(text)
  if remove_url:
    text = clean_url(text)
  # Walk the text character by character, blanking out every delimiter.
  return "".join(" " if char in delim else char for char in text)


## **Create Dataset:**
Creates the submission CSV (submision.csv), which holds the tweets and their details in a table with the columns Text, Date and Time, No. of Favorites, No. of Retweets and No. of Images, with tweets ordered from newest to oldest.

In [0]:
def create_dataset(fName="./tweets_midas.jsonl", remove_at=False, remove_hash=False, remove_url=False):
  """
    fName: File containing all the tweets.
    remove_at: True if user references are to be replaced with only the user id (without @). By default false.
    remove_hash: True if hashtags are to be removed. By default false.
    remove_url: True if URLs are to be removed. By default false.
    column_names: List of columns of the dataframe.
    rt_count: Retweet count.
    nb_likes: Number of favorites.
    date, time: Date and time.
    nb_images: Number of images.
    line: A single line of the file, holding one tweet as JSON.
  """
  dataset_csv, column_names = df_format()
  # The with block closes the file automatically when it ends.
  with open(fName, 'r') as f:
    for line in f:
      tweet = json.loads(line)
      text = get_text(tweet)
      text = clean_text(text, remove_at, remove_hash, remove_url)
      rt_count = get_retweets(tweet)
      nb_likes = get_likes(tweet)
      date, time = get_date_time(tweet)
      nb_images = get_nb_images(tweet)
      dataset_csv = fill_dataset(dataset_csv, column_names, text, rt_count, nb_likes, date, time, nb_images)
  dataset_csv.to_csv("./submision.csv", index=False)
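
For comparison, the same JSONL-to-CSV flow can be sketched with only the standard library, streaming one row at a time instead of growing a dataframe. This is an illustrative simplification, not the notebook's method: `tweets_to_csv` is a hypothetical name, the raw created_at value is written unchanged, no text cleaning is applied, and the input records are made up.

```python
import csv
import io
import json

COLUMNS = ['Text', 'Date and Time', 'No. of Favorites', 'No. of Retweets', 'No. of Images']


def tweets_to_csv(jsonl_handle, csv_handle):
    """Read one tweet per line and write one CSV row per tweet."""
    writer = csv.DictWriter(csv_handle, fieldnames=COLUMNS)
    writer.writeheader()
    for line in jsonl_handle:
        tweet = json.loads(line)
        entities = tweet.get('entities', {})
        photos = [m for m in entities.get('media', []) if m.get('type') == 'photo']
        writer.writerow({
            'Text': tweet.get('full_text', ''),
            'Date and Time': tweet.get('created_at', ''),
            'No. of Favorites': tweet.get('favorite_count', 0),
            'No. of Retweets': tweet.get('retweet_count', 0),
            # None is written as an empty cell, mirroring get_nb_images.
            'No. of Images': len(photos) if 'media' in entities else None,
        })


# Made-up input: two tweets, one with a photo and a comma in its text.
source = io.StringIO(
    json.dumps({"full_text": "Hello, world", "favorite_count": 7, "retweet_count": 3,
                "entities": {"media": [{"type": "photo"}]}}) + "\n" +
    json.dumps({"full_text": "No media here", "favorite_count": 1, "retweet_count": 0,
                "entities": {}}) + "\n"
)
target = io.StringIO()
tweets_to_csv(source, target)
rows = list(csv.DictReader(io.StringIO(target.getvalue())))
```

The `csv` module quotes fields that contain commas, so tweet text such as "Hello, world" survives the round trip as a single column.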
  

## **Display Tweets:**
Displays all tweets as a table with the columns Text, Date and Time, No. of Favorites, No. of Retweets and No. of Images.

In [0]:
def display_tweets(fName='submision.csv'):
  """
    fName: CSV file containing all the tweets.
  """
  if os.path.exists(fName):
    # None disables column-width truncation so the full tweet text is shown
    # (pandas versions before 1.0 used -1 for this).
    pd.set_option('display.max_colwidth', None)
    disp = pd.read_csv(fName)
    display(HTML(disp.to_html(classes='table table-striped')))
  else:
    print("File Error: CSV file doesn't exist. Please run get_tweets and then create_dataset, and try again.")

## **Fetch And Display Tweets:**
Calls respective functions for fetching and printing tweets.

Note: Hashtags, URLs and the '@' of user references can be removed by changing the boolean arguments (remove_hash, remove_url, remove_at) passed to create_dataset.

In [17]:
if __name__=='__main__':
  """
    remove_at:True if user references are to be replaced with only the user id (without @).By default false.
    remove_hash:True if hashtags are to be removed.By default false.
    remove_url:True if urls are to be removed.By default false.
    username: Account whose Timeline is used for data extraction.
    fName: Name of file where tweets will be stored (With Location).
  """
  print('Enter Consumer Key')
  consumer_key=input()
  print('Enter Consumer Secret')
  consumer_secret=input()
  print('Enter Access Token')
  access_token=input()
  print('Enter Access Secret')
  access_secret=input()
  get_tweets(username="midasIIITD",fName="./tweets_midas.jsonl")
  create_dataset(fName="./tweets_midas.jsonl",remove_at=False,remove_hash=False,remove_url=False)
  display_tweets(fName="./submision.csv")
  print(8*'*',"Process Finished",8*'*')

Enter Consumer Key
Enter Consumer Secret
Enter Access Token
Enter Access Secret
Account Verified!
********  Fetching Tweets, Completed :  6.25   ********
********  Fetching Tweets, Completed :  12.50   ********
********  Fetching Tweets, Completed :  18.75   ********
********  Fetching Tweets, Completed :  25.00   ********
********  Fetching Tweets, Completed :  31.25   ********
********  Fetching Tweets, Completed :  37.50   ********
********  Fetching Tweets, Completed :  43.75   ********
********  Fetching Tweets, Completed :  50.00   ********
********  Fetching Tweets, Completed :  56.25   ********
********  Fetching Tweets, Completed :  62.50   ********
********  Fetching Tweets, Completed :  68.75   ********
********  Fetching Tweets, Completed :  75.00   ********
********  Fetching Tweets, Completed :  81.25   ********
********  Fetching Tweets, Completed :  87.50   ********
********  Fetching Tweets, Completed :  93.75   ********
********  Fetching Tweets, Completed :  100.00  

Unnamed: 0,Text,Date and Time,No. of Favorites,No. of Retweets,No. of Images
0,Congratulations @midasIIITD students Simra Shahid @Simcyy and Nilay Shrivastava @NilayShri on getting selected for a research internship at Adobe in this summer. #MIDAS #Achievment #Research #Summer #Internship https://t.co/WdF663EB5y,Mar 25 2019 13:01:57,7,1,1.0
1,"The last date for submitting a solution for the @midasIIITD internship task is 26th March midnight. We will not accept solutions submitted after the deadline. Thus, if you have not submitted your solution yet then kindly do so before the deadline. #Summer #Research #Internship",Mar 24 2019 18:44:01,8,3,
2,RT @IIITDelhi: @IIITDelhi invites application from Foreign Nationals/PIOs/NRIs/OCIs for admission to its https://t.co/u6kIuhIrGM and https:…,Mar 24 2019 18:26:02,0,4,
3,"One more week is left to submit the workshop proposal to @IEEEBigMM19. Looking forward to your submission. For any query, contact workshop co-chair Dr. Chiranjoy Chattopadhyay @cchatto or TPC co-chair @RatnRajiv",Mar 24 2019 11:34:27,4,0,
4,RT @IEEEBigMM19: We are honored to have Dr. Chang Wen Chen (Chinese University Hong Kong @CUHKofficial ) and Dr. Roger Zimmermann (National…,Mar 24 2019 06:23:37,0,5,
5,"RT @IEEEBigMM19: Distinguished researchers Dr. Rajiv Ratn Shah @RatnRajiv (@IIITDelhi , @midasIIITD ), Dr. Jianquan Liu (NEC Japan) and Dr.…",Mar 24 2019 06:23:14,0,3,
6,@IEEEBigMM19 is also available on Facebook now. LIKE its Facebook page https://t.co/B3Q0zmmzXb to get the regular updates. Check more details at https://t.co/w9ZymoPisk #IEEE #BigMM19 #Big #Multimedia #Singapore,Mar 20 2019 08:19:24,1,1,
7,"RT @IEEEBigMM19: BigMM 2019 : IEEE BigMM 2019 – Call for Workshop Proposals https://t.co/I4vqf8FE6K … When: Sep 11, 2019 - Sep 13, 201…",Mar 20 2019 02:40:07,0,5,
8,"BigMM 2019 : IEEE BigMM 2019 – Call for Workshop Proposals https://t.co/oUq2G0UgKN When: Sep 11, 2019 - Sep 13, 2019 Where: Singapore Submission Deadline: Apr 1, 2019 Notification Due: Apr 10, 2019 #IEEE #BigMM #Workshop #Proposal #Singapore #Multimedia",Mar 18 2019 02:27:47,6,3,
9,"Congratulations @midasIIITD team, Rohan, Pradyumn, Ramit, @debanjanbhucs, @MADAIguy and @RatnRajiv for getting their paper titled, ""SNAP-BATNET: Cascading Author Profiling and Social Network Graphs for Suicide Ideation Detection on Social Media"", accepted at @NAACLHLT SRW.",Mar 17 2019 14:22:04,15,4,


******** Process Finished ********


## **Without Hashtags**

In [19]:
if __name__=='__main__':
  """
    remove_at:True if user references are to be replaced with only the user id (without @).By default false.
    remove_hash:True if hashtags are to be removed.By default false.
    remove_url:True if urls are to be removed.By default false.
    username: Account whose Timeline is used for data extraction.
    fName: Name of file where tweets will be stored (With Location).
  """
  # Credentials are read at run time; never hard-code API keys in a notebook.
  print('Enter Consumer Key')
  consumer_key=input()
  print('Enter Consumer Secret')
  consumer_secret=input()
  print('Enter Access Token')
  access_token=input()
  print('Enter Access Secret')
  access_secret=input()
  get_tweets(username="midasIIITD",fName="./tweets_midas.jsonl")
  create_dataset(fName="./tweets_midas.jsonl",remove_at=False,remove_hash=True,remove_url=False)
  display_tweets(fName="./submision.csv")
  print(8*'*',"Process Finished",8*'*')

Enter Consumer Key
Enter Consumer Secret
Enter Access Token
Enter Access Secret
Account Verified!
********  Fetching Tweets, Completed :  6.25   ********
********  Fetching Tweets, Completed :  12.50   ********
********  Fetching Tweets, Completed :  18.75   ********
********  Fetching Tweets, Completed :  25.00   ********
********  Fetching Tweets, Completed :  31.25   ********
********  Fetching Tweets, Completed :  37.50   ********
********  Fetching Tweets, Completed :  43.75   ********
********  Fetching Tweets, Completed :  50.00   ********
********  Fetching Tweets, Completed :  56.25   ********
********  Fetching Tweets, Completed :  62.50   ********
********  Fetching Tweets, Completed :  68.75   ********
********  Fetching Tweets, Completed :  75.00   ********
********  Fetching Tweets, Completed :  81.25   ********
********  Fetching Tweets, Completed :  87.50   ********
********  Fetching Tweets, Completed :  93.75   ********
********  Fetching Tweets, Completed :  100.00  

Unnamed: 0,Text,Date and Time,No. of Favorites,No. of Retweets,No. of Images
0,Congratulations @midasIIITD students Simra Shahid @Simcyy and Nilay Shrivastava @NilayShri on getting selected for a research internship at Adobe in this summer. https://t.co/WdF663EB5y,Mar 25 2019 13:01:57,7,1,1.0
1,"The last date for submitting a solution for the @midasIIITD internship task is 26th March midnight. We will not accept solutions submitted after the deadline. Thus, if you have not submitted your solution yet then kindly do so before the deadline.",Mar 24 2019 18:44:01,8,3,
2,RT @IIITDelhi: @IIITDelhi invites application from Foreign Nationals/PIOs/NRIs/OCIs for admission to its https://t.co/u6kIuhIrGM and https:…,Mar 24 2019 18:26:02,0,4,
3,"One more week is left to submit the workshop proposal to @IEEEBigMM19. Looking forward to your submission. For any query, contact workshop co-chair Dr. Chiranjoy Chattopadhyay @cchatto or TPC co-chair @RatnRajiv",Mar 24 2019 11:34:27,4,0,
4,RT @IEEEBigMM19: We are honored to have Dr. Chang Wen Chen (Chinese University Hong Kong @CUHKofficial ) and Dr. Roger Zimmermann (National…,Mar 24 2019 06:23:37,0,5,
5,"RT @IEEEBigMM19: Distinguished researchers Dr. Rajiv Ratn Shah @RatnRajiv (@IIITDelhi , @midasIIITD ), Dr. Jianquan Liu (NEC Japan) and Dr.…",Mar 24 2019 06:23:14,0,3,
6,@IEEEBigMM19 is also available on Facebook now. LIKE its Facebook page https://t.co/B3Q0zmmzXb to get the regular updates. Check more details at https://t.co/w9ZymoPisk,Mar 20 2019 08:19:24,1,1,
7,"RT @IEEEBigMM19: BigMM 2019 : IEEE BigMM 2019 – Call for Workshop Proposals https://t.co/I4vqf8FE6K … When: Sep 11, 2019 - Sep 13, 201…",Mar 20 2019 02:40:07,0,5,
8,"BigMM 2019 : IEEE BigMM 2019 – Call for Workshop Proposals https://t.co/oUq2G0UgKN When: Sep 11, 2019 - Sep 13, 2019 Where: Singapore Submission Deadline: Apr 1, 2019 Notification Due: Apr 10, 2019",Mar 18 2019 02:27:47,6,3,
9,"Congratulations @midasIIITD team, Rohan, Pradyumn, Ramit, @debanjanbhucs, @MADAIguy and @RatnRajiv for getting their paper titled, ""SNAP-BATNET: Cascading Author Profiling and Social Network Graphs for Suicide Ideation Detection on Social Media"", accepted at @NAACLHLT SRW.",Mar 17 2019 14:22:04,15,4,


******** Process Finished ********
