# Scraping Data

Data scraping means extracting data from an online platform. There are several ways to scrape data, and they all boil down to two types:

1. Using APIs to access the data in the platform's database
2. Reading the source code of the platform (the website's HTML)

Let me explain each briefly.

### 1. Using APIs

* Here, the online platform you want to access provides a gateway for developers who wish to use its data for learning or for building new solutions.

* Companies give developers this leeway so that they can increase user traffic or gain innovative new solutions to offer their users.
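
As a rough illustration of the API route, an endpoint typically returns structured data such as JSON, which can be read without any HTML parsing. The sketch below parses a made-up response body. The field names (`data`, `id`, `text`) and the sample payload are assumptions for illustration, not any real platform's schema.

```python
import json

# Hypothetical response body, standing in for what an API endpoint might return.
SAMPLE_RESPONSE = '{"data": [{"id": "1", "text": "hello"}, {"id": "2", "text": "network is down"}]}'


def extract_posts(body):
    """Return (id, text) pairs from a JSON response body."""
    payload = json.loads(body)
    return [(item["id"], item["text"]) for item in payload.get("data", [])]


posts = extract_posts(SAMPLE_RESPONSE)
print(posts)
```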

### 2. Extracting platform source code

* Here you take the HTML code of a website and parse it to get the data you want.

* This gets harder when you have more than one website to scrape, especially when the sites were built by different developers with different coding styles.
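
To make the parsing step concrete, here is a minimal sketch using Python's standard `html.parser` module. The page source, the `post` class name, and the `PostParser` helper are all made up for illustration; a real site would need its own selectors, which is exactly why scraping several sites is harder.

```python
from html.parser import HTMLParser

# Hypothetical page source; a real page would be downloaded first.
SAMPLE_HTML = """
<html><body>
  <h1>Latest posts</h1>
  <p class="post">Network is slow today</p>
  <p class="note">Sponsored</p>
  <p class="post">Bundles expired early</p>
</body></html>
"""


class PostParser(HTMLParser):
    """Collect the text of every <p class="post"> element."""

    def __init__(self):
        super().__init__()
        self.in_post = False
        self.posts = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == "p" and ("class", "post") in attrs:
            self.in_post = True
            self.posts.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_post = False

    def handle_data(self, data):
        if self.in_post:
            self.posts[-1] += data


parser = PostParser()
parser.feed(SAMPLE_HTML)
print(parser.posts)
```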



### Scraping Twitter

* Here we will use APIs to access Twitter data.
* Twitter provides several APIs for different kinds of data. To use these APIs you need access tokens and keys for authentication, which is why you need an approved Twitter developer account.
* There are several libraries for working with the Twitter APIs. I have used two of them:
    1. Tweepy
    2. GetOldTweets3
* I worked with both because each has a limitation that makes it hard to gather the kind of data you are looking for.
* For Tweepy the limitations are:
    1. You can only get data from the last 30 days.
    2. You can't get more than 300 tweets per query, so you need to query for 300 and wait 15 minutes before you query again.
* For GetOldTweets3 the limitation is:
    1. Although you can get a large number of tweets, including old ones, some attributes are missing from the tweet objects it returns.
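
The second Tweepy limitation describes a fetch-then-wait cycle. The sketch below shows that cycle in isolation. `fetch_batch` is a hypothetical stand-in for whatever call returns tweets, and the batch size of 300 and the 15-minute wait are simply the figures from the list above, not values read from the API.

```python
import time


def fetch_in_batches(fetch_batch, total, batch_size=300, wait_seconds=15 * 60, sleep=time.sleep):
    """Collect up to `total` items, pausing between batches.

    `fetch_batch(n)` is a hypothetical callable returning at most n items.
    """
    collected = []
    while len(collected) < total:
        batch = fetch_batch(min(batch_size, total - len(collected)))
        if not batch:
            break  # nothing more to fetch
        collected.extend(batch)
        if len(collected) < total:
            sleep(wait_seconds)  # wait out the rate-limit window
    return collected


# Demo with a fake fetcher and a recording "sleep" so it runs instantly.
waits = []
items = fetch_in_batches(lambda n: list(range(n)), total=700, sleep=waits.append)
print(len(items), len(waits))
```

Passing the sleep function as a parameter keeps the helper easy to try out without actually waiting.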

***Import Libraries to use***

In [1]:
# Import Twitter API libraries
import tweepy as tp
import GetOldTweets3 as got

# Import libraries for data reading
import pandas as pd

# For reading the file that holds the access codes and tokens
import yaml

***Read access codes and tokens to authenticate the Twitter API***

In [2]:
# Twitter API access token and consumer key, with their secrets, are read from a YAML file.
# Keep the secret keys private; never publish them.
with open(r"secret.yml") as file:
    secret_list = yaml.safe_load(file)  # safe_load is enough for a plain key/value file
    
#Access the Twitter API
auth = tp.OAuthHandler(secret_list["consumer_key"], secret_list["consumer_secret"])
auth.set_access_token(secret_list["access_token"], secret_list["access_secret"])
api = tp.API(auth, wait_on_rate_limit=True)
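
Before authenticating, it can help to confirm that the secrets file actually contains the four entries the cell above reads. This is a small sketch of such a check; the `missing_secrets` helper is hypothetical, and the example dictionary holds fake values.

```python
# The four keys read from the secrets dictionary in the authentication cell.
REQUIRED_KEYS = ("consumer_key", "consumer_secret", "access_token", "access_secret")


def missing_secrets(secrets):
    """Return the required keys that are absent or empty in `secrets`."""
    return [key for key in REQUIRED_KEYS if not secrets.get(key)]


# Example with a deliberately incomplete (fake) dictionary.
example = {"consumer_key": "abc", "consumer_secret": "def", "access_token": ""}
print(missing_secrets(example))
```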

***Set up tweet query with GetOldTweets3***

In [4]:
tweet_query = "@AIRTEL_KE"
count = 200000

In [5]:
#Set the criteria for searching the tweets
tweetCriteria = got.manager.TweetCriteria().setQuerySearch(tweet_query)\
                                            .setSince("2020-01-01")

#Query for the tweets
tweets = got.manager.TweetManager.getTweets(tweetCriteria)


In [None]:
# Create a list holding lists with tweet details we want
tweets_lst = [[tw.id, tw.date, tw.text, tw.username, tw.retweets, tw.favorites, tw.geo, tw.mentions, tw.hashtags] for tw in tweets]

In [None]:
# Confirm that we received the number of tweets requested
len(tweets_lst)

136766

In [None]:
# Create a dataframe of the tweets we queried
tweets_df = pd.DataFrame(tweets_lst, columns=["ID", "Date", "Post", "Username","Retweets", "Favorites", "Geo", "Mentions", "Hashtags"])
tweets_df.sample(10)

Unnamed: 0,ID,Date,Post,Username,Retweets,Favorites,Geo,Mentions,Hashtags
95494,1246075645395963904,2020-04-03 14:02:39+00:00,#BeSmartBeSafe ^Caro,AIRTEL_KE,0,0,,,#BeSmartBeSafe
40560,1277336948709998592,2020-06-28 20:23:55+00:00,Na tusisahau @AIRTEL_KE banaa tho hao wataona ...,KamauTheSecond,0,2,,@AIRTEL_KE,
62755,1264964584009617410,2020-05-25 17:00:33+00:00,,SirJeremyKE,0,0,,,
16469,1288361388860211200,2020-07-29 06:31:06+00:00,Okay.Please share via dm the disconnected numb...,AIRTEL_KE,0,0,,,
95173,1246391239362174980,2020-04-04 10:56:42+00:00,@AIRTEL_KE Bought your mifi last saturday at 4...,nancy_mwongeli,0,0,,@AIRTEL_KE,
12591,1290212818944311296,2020-08-03 09:08:01+00:00,Checking. ^Jamo,AIRTEL_KE,0,0,,,
133214,1214957281747652609,2020-01-08 17:09:23+00:00,"Hello Thomas, Amazing bundles and unliminet bu...",AIRTEL_KE,0,0,,,
43789,1294543754511020033,2020-08-15 07:57:37+00:00,"JLCPCB Prototype For $2/5pcs, 24 Hours Quick T...",JLCPCB,43,375,,,
103530,1239965622563307521,2020-03-17 17:23:36+00:00,Always here to assist.^Caro,AIRTEL_KE,0,0,,,
89048,1250885624216903683,2020-04-16 20:35:47+00:00,"The number is in excess of 1 digit,please dial...",AIRTEL_KE,0,0,,,


In [None]:
# Filter the tweets that mention @AIRTEL_KE since those are the tweets with questions and queries.
airtel_mention_df = tweets_df[tweets_df["Mentions"].str.contains("@AIRTEL_KE") | tweets_df["Mentions"].str.contains("@airtel_ke")]
print(airtel_mention_df.shape)

# To avoid having to repeat the querying process again, we save the results we got
airtel_mention_df.to_csv(path_or_buf="AirtelMentions1.csv")
airtel_mention_df.sample(20)

(33677, 9)


Unnamed: 0,ID,Date,Post,Username,Retweets,Favorites,Geo,Mentions,Hashtags
104885,1239202406522634242,2020-03-15 14:50:51+00:00,Re: @Safaricom @JTLKenya @AIRTEL_KE who price ...,Muriu,0,1,,@safaricom @JTLKenya @AIRTEL_KE,
124277,1222932631286841346,2020-01-30 17:20:34+00:00,Try @AIRTEL_KE. You'll never regret. #SwitchTo...,Njokiwainaina3,0,0,,@AIRTEL_KE,#SwitchToAirtel
104879,1239206781559222272,2020-03-15 15:08:14+00:00,Hello @AIRTEL_KE mbona network inashinda ikika...,Niqy_Steamerman,0,0,,@AIRTEL_KE,
18921,1287091657456914432,2020-07-25 18:25:38+00:00,@AIRTEL_KE i was at your main office today in ...,denisyulempole,0,0,,@AIRTEL_KE,
91293,1249349397864947712,2020-04-12 14:51:22+00:00,@AIRTEL_KE please your network is very poor,Shadrackmwanza,0,0,,@AIRTEL_KE,
95416,1246131144858521601,2020-04-03 17:43:11+00:00,"@AIRTEL_KE your network is very poor, sasa tut...",fredie_wambua,0,1,,@AIRTEL_KE,#ukweliusemwe
39897,1277824520921833475,2020-06-30 04:41:21+00:00,@AIRTEL_KE what's wrong with your network sinc...,edochomo,0,0,,@AIRTEL_KE,
120199,1226177428512542726,2020-02-08 16:14:14+00:00,@AIRTEL_KE I am seriously having issues with m...,georgekamotho,0,0,,@AIRTEL_KE,#ShittyService
130518,1217054115953553409,2020-01-14 12:01:27+00:00,@AIRTEL_KE the way you guys are eating my bund...,NixonLumbugu,0,0,,@AIRTEL_KE,
82312,1254460245742686208,2020-04-26 17:20:03+00:00,@AIRTEL_KE Hey please refresh my line havin ne...,Digneez,0,0,,@AIRTEL_KE,
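
The filter above checks two exact spellings of the handle, so a mention written as `@Airtel_KE` would be missed. Below is a sketch of a case-insensitive check in plain Python. The `mentions_airtel` helper and the sample strings are made up for illustration.

```python
import re

# Match the handle in any casing, but not longer handles such as @airtel_kenya.
MENTION = re.compile(r"@airtel_ke\b", re.IGNORECASE)


def mentions_airtel(mentions):
    """True when the mentions string includes @AIRTEL_KE in any casing."""
    return bool(MENTION.search(mentions or ""))


samples = ["@AIRTEL_KE", "@Airtel_KE @SafaricomPLC", "@safaricom", None]
print([mentions_airtel(s) for s in samples])
```

In pandas, `Series.str.contains` accepts `case=False` and `na=False`, which should give a similar effect directly on the `Mentions` column.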


In [None]:
# Get the list we already created from the earlier query.
airtel_mention_df = pd.read_csv("AirtelMentions1.csv")
airtel_mention_df.drop(columns=['Unnamed: 0'], inplace=True)
airtel_mention_df.sample(20)

Unnamed: 0,ID,Date,Post,Username,Retweets,Favorites,Geo,Mentions,Hashtags
22314,1244577940446367744,2020-03-30 10:51:18+00:00,@AIRTEL_KE Hello work on your network strength...,Manu_Onyango,0,0,,@AIRTEL_KE,
6617,1282456879164198914,2020-07-12 23:28:41+00:00,@AIRTEL_KE Okay your bundles are depleted so f...,Chep15_,0,0,,@AIRTEL_KE,
14230,1265217098990735361,2020-05-26 09:43:57+00:00,"Dear @AIRTEL_KE, you claim to be competing wit...",Chipmunk254,0,2,,@AIRTEL_KE @SafaricomPLC,
19015,1252908401752973312,2020-04-22 10:33:35+00:00,Pigia @AIRTEL_KE customer care,Stevemulwa9,0,0,,@AIRTEL_KE,
9624,1277351813570793483,2020-06-28 21:22:59+00:00,Been a while haven't used @AIRTEL_KE for inter...,otikenne,0,0,,@AIRTEL_KE,
7911,1280828066260975631,2020-07-08 11:36:22+00:00,@AIRTEL_KE kwani izo night data yenu zinafanya...,SimonChae2,0,0,,@AIRTEL_KE,
3446,1289167680201789442,2020-07-31 11:55:01+00:00,@AIRTEL_KE hey Airtel. Network seems to be unu...,RitaOgada,0,0,,@AIRTEL_KE,
13857,1266043770337927169,2020-05-28 16:28:51+00:00,"Hey @AIRTEL_KE , I attribute this package with...",chumba_boaz,0,0,,@AIRTEL_KE,
11778,1271145042221113347,2020-06-11 18:19:29+00:00,@AIRTEL_KE @AIRTEL_KE check on my 2g data conn...,Antony23832972,0,0,,@AIRTEL_KE @AIRTEL_KE,
2717,1290711301203910656,2020-08-04 18:08:49+00:00,What's the customer service number for @AIRTEL...,UncleJayDwayne,0,0,,@AIRTEL_KE,


In [None]:
# Search for the replies to each tweet: take the tweet's author and ID, fetch every tweet
# addressed to that author after that ID, and keep the ones that reply to the tweet.

# airtel_replies = []  # Holds all our posts with their replies, one dictionary per reply

# Loop through our dataframe of tweets, taking the tweet ID of each row along with its position
# for x, Id in enumerate(airtel_mention_df["ID"]):
#     tweet_id = Id
#     name = airtel_mention_df.Username.iloc[x]  # Username of the current tweet
#     replies = []  # Tweets whose "in_reply_to_status_id_str" equals our current tweet ID
#     print(x)
#
#     # Retrieve all tweets addressed to this username that were posted after our tweet
#     for tweet in tp.Cursor(api.search, q='to:'+name, since_id=tweet_id, timeout=999999).items():
#         # Keep only the tweets that reply to our tweet ID
#         if hasattr(tweet, 'in_reply_to_status_id_str'):
#             if (tweet.in_reply_to_status_id_str == tweet_id):
#                 replies.append(tweet)
#
#     # Loop through the replies to create a dictionary that holds both the tweet and its reply
#     for tweet in replies:
#         row = {'ID': tweet_id, 'Date': airtel_mention_df.Date.iloc[x], 'Username': name,
#                'Post': airtel_mention_df.Post.iloc[x], 'Replier': tweet.user.screen_name,
#                'Mentions': airtel_mention_df.Mentions.iloc[x], 'Hashtags': airtel_mention_df.Hashtags.iloc[x],
#                'Reply_date': tweet.created_at, 'Reply': tweet.text.replace('\n', ' '),
#                'Reply_mentions': ' '.join(m['screen_name'] for m in tweet.entities['user_mentions']),
#                'Reply_Hashtags': ' '.join(h['text'] for h in tweet.entities['hashtags'])}
#         airtel_replies.append(row)


### Alternatively

The code above takes a long time, especially when you have a large list of tweets to find replies for. I therefore split it into independent functions that can be called when required.

The idea is this: if we fetch every tweet posted since a tweet from yesterday, the result also includes every tweet posted today. Since replies are tweets as well, fetching everything since yesterday will surely include some replies to tweets posted today.

The functions therefore work on a dataframe sorted with the oldest tweet first. We fetch all tweets from Airtel posted since that tweet and add them to a reference list. The list grows whenever we meet a tweet that has no reply in it, which prompts us to query for all tweets posted after that tweet.

In short, for each tweet we check whether the reference list contains a reply; if it doesn't, we query for tweets posted after it and check again.
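
The process described above can be sketched with plain dictionaries instead of tweet objects. Everything here is illustrative: `collect_replies`, the `fetch_since` callable, and the toy data are assumptions standing in for the real API calls, and the cache is keyed by reply ID so duplicates are never stored.

```python
def collect_replies(posts, fetch_since):
    """Match posts to replies, fetching newer tweets only on a cache miss.

    posts: list of dicts with an "id" key, sorted oldest first.
    fetch_since(post_id): hypothetical callable returning reply dicts
        (keys "id" and "in_reply_to") posted after `post_id`.
    """
    cache = {}    # reply id -> reply: the reference list, without duplicates
    matched = []  # (post id, reply id) pairs
    fetches = 0
    for post in posts:
        hits = [r for r in cache.values() if r["in_reply_to"] == post["id"]]
        if not hits:
            # No known reply yet: query once, then check the larger cache again.
            for reply in fetch_since(post["id"]):
                cache.setdefault(reply["id"], reply)
            fetches += 1
            hits = [r for r in cache.values() if r["in_reply_to"] == post["id"]]
        matched.extend((post["id"], r["id"]) for r in hits)
    return matched, fetches


# Toy data: replies 11 and 12 answer posts 1 and 2; post 3 has no reply.
ALL_REPLIES = [{"id": 11, "in_reply_to": 1}, {"id": 12, "in_reply_to": 2}]
posts = [{"id": 1}, {"id": 2}, {"id": 3}]
pairs, fetches = collect_replies(posts, lambda since: [r for r in ALL_REPLIES if r["id"] > since])
print(pairs, fetches)
```

In this toy run the second post is answered from the cache, so only the first and third posts trigger a fetch.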

In [None]:
def retriver(name, tweet_id, tweetsData):
    """Find the tweets posted by AIRTEL_KE since the customer tweeted the question.

    New tweets are added to the reference list `tweetsData`, skipping any
    that are already present, and the updated list is returned.
    """
    # Get tweets by AIRTEL_KE since the current tweet_id.
    # The cursor fetches lazily, so the iteration itself has to sit inside the try block.
    try:
        for tweet in tp.Cursor(api.user_timeline, id='AIRTEL_KE', since_id=tweet_id, timeout=999999).items():
            # Skip tweets that are already in our reference tweet list "tweetsData"
            if tweet not in tweetsData:
                tweetsData.append(tweet)
    except Exception:
        print('failed to get data')

    # Return our updated reference tweet list
    return tweetsData

In [None]:
# Testing our function to make sure it returns what we expect
# Data_tweets=[]
# ts = retriver('ntvkenya', '1294919890839773184', Data_tweets)
# ts[0].id

In [None]:
def get_replies(Data_tweets, df, tweet_id):
    airtel_replies = []  # Dictionaries pairing our current tweet with each of its replies

    # Look up the row of the current tweet here instead of relying on loop variables defined outside the function
    post = df.loc[df["ID"].astype(str) == str(tweet_id)].iloc[0]

    # Collect the tweets whose 'in_reply_to_status_id_str' equals the ID of our current tweet.
    # Both sides are compared as strings because the IDs may have been loaded as integers.
    replies = [tweet for tweet in Data_tweets
               if getattr(tweet, 'in_reply_to_status_id_str', None) == str(tweet_id)]

    # Each row holds the current tweet and one reply.
    # Hence a tweet with 2 replies produces 2 rows with the same tweet but different replies.
    for tweet in replies:
        row = {'ID': tweet_id, 'Date': post["Date"], 'Username': post["Username"],
               'Post': post["Post"], 'Replier': tweet.user.screen_name,
               'Mentions': post["Mentions"], 'Hashtags': post["Hashtags"],
               'Reply_date': tweet.created_at, 'Reply': tweet.text.replace('\n', ' '),
               'Reply_mentions': ' '.join(m['screen_name'] for m in tweet.entities['user_mentions']),
               'Reply_Hashtags': ' '.join(h['text'] for h in tweet.entities['hashtags'])}
        airtel_replies.append(row)

    # Return the list of replies, each in a dictionary matched with our current tweet
    return airtel_replies
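
Scanning the whole reference list for every tweet gets slow as the list grows. One possible alternative, sketched below, is to group the fetched tweets by the ID they reply to once and then look replies up by key. The `index_by_parent` helper is hypothetical, and `SimpleNamespace` objects stand in for tweet objects with only the attributes used here.

```python
from collections import defaultdict
from types import SimpleNamespace


def index_by_parent(tweets):
    """Group tweets by the ID of the tweet they reply to (as a string)."""
    index = defaultdict(list)
    for tweet in tweets:
        parent = getattr(tweet, "in_reply_to_status_id_str", None)
        if parent is not None:
            index[str(parent)].append(tweet)
    return index


# Stand-ins for tweet objects; only the attributes used here are modelled.
tweets = [
    SimpleNamespace(id=21, in_reply_to_status_id_str="7", text="Checking."),
    SimpleNamespace(id=22, in_reply_to_status_id_str=None, text="Promo"),
    SimpleNamespace(id=23, in_reply_to_status_id_str="7", text="Resolved."),
]
index = index_by_parent(tweets)
print([t.id for t in index.get("7", [])])
```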

In [None]:
# This will hold all the tweets and their replies in dictionary format for each row
repliesData = [] 

# We sort our dataframe of tweets using the ID column in ascending order to get the oldest tweets first
sort_df = airtel_mention_df.sort_values(by = 'ID')

# Here we shall hold all our tweets that we suspect might be replies to our tweets
Data_tweets=[]

In [None]:
# Loop through the sorted dataframe to get replies for each tweet, starting with the oldest
for x, Id in enumerate(sort_df["ID"]):
    tweet_id = Id
    name = sort_df.Username.iloc[x]
    print(x)
    print(len(Data_tweets))

    # Check whether our reference tweet list already holds a reply to the current tweet.
    # IDs are compared as strings so the check works whether they were loaded as integers or strings.
    present = any(tw.in_reply_to_status_id_str == str(tweet_id) for tw in Data_tweets)
    print(present)

    try:
        if present:
            print('good')
        else:
            # We have no reply to the current tweet yet, so call retriver to fetch all tweets posted after it
            Data_tweets = retriver(name, tweet_id, Data_tweets)
            print("Run retriver")
        # Call get_replies to combine our tweet with its replies
        repliesData.extend(get_replies(Data_tweets, sort_df, tweet_id))
    except Exception:
        print('failed')

    # Save after every tweet to prevent losing the scraped data if the code crashes
    airtelData_df = pd.DataFrame(repliesData)
    airtelData_df.to_csv(path_or_buf="AirtelData.csv")

0
0
False
Run retriver
1
3242
False
Run retriver
2
3242
False
Run retriver
3
3242
False
Run retriver
4
3242
False
Run retriver
5
3242
False
Run retriver
6
3242
False
Run retriver
7
3242
False
Run retriver
8
3242
False
Run retriver
9
3242
False
Run retriver
10
3242
False
Run retriver
11
3242
False
Run retriver
12
3242
False
Run retriver
13
3242
False
Run retriver
14
3242
False
Run retriver
15
3242
False
Run retriver
16
3242
False
Run retriver
17
3242
False
Run retriver
18
3242
False
Run retriver
19
3242
False
Run retriver
20
3242
False
Run retriver
21
3242
False
Run retriver
22
3242
False
Run retriver
23
3242
False
Run retriver
24
3242
False
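
The loop above rewrites the whole CSV file on every iteration, which becomes slower as the results grow. A possible alternative is to append only the new rows each time. The sketch below uses the standard `csv` module; the `append_rows` helper, the column names, and the temporary file are illustrative assumptions.

```python
import csv
import os
import tempfile


def append_rows(path, rows, fieldnames):
    """Append dict rows to a CSV file, writing the header only once."""
    is_new = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        if is_new:
            writer.writeheader()
        writer.writerows(rows)


# Demo in a temporary directory with made-up rows.
demo_path = os.path.join(tempfile.mkdtemp(), "checkpoint.csv")
append_rows(demo_path, [{"ID": "1", "Reply": "Checking."}], ["ID", "Reply"])
append_rows(demo_path, [{"ID": "2", "Reply": "Resolved."}], ["ID", "Reply"])
with open(demo_path, newline="", encoding="utf-8") as handle:
    saved = list(csv.DictReader(handle))
print(len(saved))
```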


In [None]:
# Creating a dataframe from our list of dictionaries
airtelData_df = pd.DataFrame(repliesData)

#Store our final results in a csv file
airtelData_df.to_csv(path_or_buf="AirtelData.csv")

In [None]:
airtelData_df.sample(20)

In [None]:
# Test how the attributes are formed in a tweet object returned by our query
# test = api.get_status('1243081255102615552')
# test.entities

In [23]:
# Get one tweet from Barack Obama posted since 2015-09-10
tC = got.manager.TweetCriteria().setUsername("barackobama").setSince("2015-09-10")\
                                            .setMaxTweets(1)
twts = got.manager.TweetManager.getTweets(tC)

# Inspect what GetOldTweets3 returns for each tweet.
# A Tweet object cannot be looped over (doing so raises "TypeError: 'Tweet' object is not iterable"),
# so we list its attributes with vars() instead.
for twe in twts:
    print(vars(twe))

In [29]:
# Looking at the first and last tweet IDs to see the difference
print(sort_df.ID.head(1))
print(sort_df.ID.tail(1))

199993    1169605566479687680
Name: ID, dtype: object
3    1295448699070554119
Name: ID, dtype: object


In [25]:
# A view of how a tweet object looks and its attributes
api.get_status('1169605566479687680')

Status(_api=<tweepy.api.API object at 0x0000022441941B00>, _json={'created_at': 'Thu Sep 05 13:37:51 +0000 2019', 'id': 1169605566479687680, 'id_str': '1169605566479687680', 'text': 'Please clarify this because I have visited your shop in Narok and they are saying the lines are not working… https://t.co/PNmw0GfW7G', 'truncated': True, 'entities': {'hashtags': [], 'symbols': [], 'user_mentions': [], 'urls': [{'url': 'https://t.co/PNmw0GfW7G', 'expanded_url': 'https://twitter.com/i/web/status/1169605566479687680', 'display_url': 'twitter.com/i/web/status/1…', 'indices': [109, 132]}]}, 'source': '<a href="http://twitter.com/download/android" rel="nofollow">Twitter for Android</a>', 'in_reply_to_status_id': None, 'in_reply_to_status_id_str': None, 'in_reply_to_user_id': None, 'in_reply_to_user_id_str': None, 'in_reply_to_screen_name': None, 'user': {'id': 3026331367, 'id_str': '3026331367', 'name': 'Ronoh K Clinton 🇰🇪', 'screen_name': 'RonohClinton', 'location': 'Kenya', 'description': 'Di

In [30]:
# 1295448699070554119 - 1169605566479687680 # Difference between our latest and oldest tweet IDs. Tweet IDs are not consecutive, so this is the size of the ID range, not a count of tweets.

125843132590866439
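
Tweet IDs of this size are "snowflake" IDs: the upper bits hold a millisecond timestamp rather than a running count, which is why the difference above is far larger than any number of tweets. A minimal sketch of decoding that timestamp follows. The helper name is made up, and the epoch offset `1288834974657` is the commonly documented snowflake constant.

```python
from datetime import datetime, timedelta, timezone

TWITTER_EPOCH_MS = 1288834974657  # offset used by snowflake tweet IDs
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def tweet_id_to_datetime(tweet_id):
    """Approximate creation time (UTC) encoded in a snowflake tweet ID."""
    ms = (int(tweet_id) >> 22) + TWITTER_EPOCH_MS
    return UNIX_EPOCH + timedelta(milliseconds=ms)


# One of the AIRTEL_KE tweet IDs that appears in this notebook.
print(tweet_id_to_datetime(1298578564808310789))
```

For this ID the decoded time falls on 26 August 2020 at 11:10:30 UTC, which agrees with the `created_at` value Twitter reports for the same tweet.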

In [None]:
# This is the code I need you to run on a stronger, faster machine with a stable internet connection.
# It walks through every possible tweet ID from the oldest tweet in our list to the latest one.
# Note: the range covers about 1.26e17 IDs and most of them do not belong to any tweet,
# so with one API call per ID the loop is not expected to finish in practice.
status_ids = set(airtel_mention_df["ID"].astype(str))  # All tweet IDs in the ID column, as strings
dataAirtel = []

# Loop through Twitter and request each ID from the oldest tweet to our latest tweet
for tweet_num in range(1169605566479687680, 1295448699070554119):
    try:
        tweet = api.get_status(str(tweet_num))  # Get the tweet with the current ID
    except Exception:
        continue  # No accessible tweet with this ID

    # Check if the 'in_reply_to_status_id_str' value is among our tweet IDs
    if tweet.in_reply_to_status_id_str in status_ids:

        # Get the rows from our dataframe whose ID is the tweet being replied to
        df = airtel_mention_df.loc[airtel_mention_df['ID'].astype(str) == tweet.in_reply_to_status_id_str]

        # Turn each row into a list and combine its values with the reply in a dictionary
        for rw in df.values.tolist():
            row = {'ID': rw[0], 'Date': rw[1], 'Username': rw[3],
                   'Post': rw[2], 'Mentions': rw[7], 'Hashtags': rw[8],
                   'Replier': tweet.user.screen_name,
                   'Reply_date': tweet.created_at, 'Reply': tweet.text.replace('\n', ' '),
                   'Reply_mentions': ' '.join(m['screen_name'] for m in tweet.entities['user_mentions']),
                   'Reply_Hashtags': ' '.join(h['text'] for h in tweet.entities['hashtags'])}
            dataAirtel.append(row)
    


In [4]:
for x in [1298578564808310789, 1298674508895784961, 1298679927026450433]:
    status = api.get_status(str(x))
    print("User : ", x, "\n")
    print("=====================================================================================================================")
    print(status)

User :  1298578564808310789 

Status(_api=<tweepy.api.API object at 0x0000022707885A58>, _json={'created_at': 'Wed Aug 26 11:10:30 +0000 2020', 'id': 1298578564808310789, 'id_str': '1298578564808310789', 'text': 'Turn your home into your office and enjoy lightning fast 4G internet speeds with your 4G Pocket WiFi. Get yours tod… https://t.co/PIg4bk0co5', 'truncated': True, 'entities': {'hashtags': [], 'symbols': [], 'user_mentions': [], 'urls': [{'url': 'https://t.co/PIg4bk0co5', 'expanded_url': 'https://twitter.com/i/web/status/1298578564808310789', 'display_url': 'twitter.com/i/web/status/1…', 'indices': [117, 140]}]}, 'source': '<a href="https://mobile.twitter.com" rel="nofollow">Twitter Web App</a>', 'in_reply_to_status_id': None, 'in_reply_to_status_id_str': None, 'in_reply_to_user_id': None, 'in_reply_to_user_id_str': None, 'in_reply_to_screen_name': None, 'user': {'id': 191765987, 'id_str': '191765987', 'name': 'Airtel Kenya', 'screen_name': 'AIRTEL_KE', 'location': 'Kenya', 'des

User :  1298674508895784961 

Status(_api=<tweepy.api.API object at 0x0000022707885A58>, _json={'created_at': 'Wed Aug 26 17:31:45 +0000 2020', 'id': 1298674508895784961, 'id_str': '1298674508895784961', 'text': "@AIRTEL_KE I wouldn't recommend Airtel Smartbox. Spent whole day today on call with your Customer Care, and no one… https://t.co/8QA4yNlIwG", 'truncated': True, 'entities': {'hashtags': [], 'symbols': [], 'user_mentions': [{'screen_name': 'AIRTEL_KE', 'name': 'Airtel Kenya', 'id': 191765987, 'id_str': '191765987', 'indices': [0, 10]}], 'urls': [{'url': 'https://t.co/8QA4yNlIwG', 'expanded_url': 'https://twitter.com/i/web/status/1298674508895784961', 'display_url': 'twitter.com/i/web/status/1…', 'indices': [116, 139]}]}, 'source': '<a href="http://twitter.com/download/android" rel="nofollow">Twitter for Android</a>', 'in_reply_to_status_id': 1298578564808310789, 'in_reply_to_status_id_str': '1298578564808310789', 'in_reply_to_user_id': 191765987, 'in_reply_to_user_id_str': '191

User :  1298679927026450433 

Status(_api=<tweepy.api.API object at 0x0000022707885A58>, _json={'created_at': 'Wed Aug 26 17:53:17 +0000 2020', 'id': 1298679927026450433, 'id_str': '1298679927026450433', 'text': '@denisi2 Hi denilson, kindly note that the Airtel Smartbox comes with a 4G sim card specifically meant for the rout… https://t.co/Tvv83t40bF', 'truncated': True, 'entities': {'hashtags': [], 'symbols': [], 'user_mentions': [{'screen_name': 'denisi2', 'name': 'denilson', 'id': 253230615, 'id_str': '253230615', 'indices': [0, 8]}], 'urls': [{'url': 'https://t.co/Tvv83t40bF', 'expanded_url': 'https://twitter.com/i/web/status/1298679927026450433', 'display_url': 'twitter.com/i/web/status/1…', 'indices': [117, 140]}]}, 'source': '<a href="https://mobile.twitter.com" rel="nofollow">Twitter Web App</a>', 'in_reply_to_status_id': 1298674508895784961, 'in_reply_to_status_id_str': '1298674508895784961', 'in_reply_to_user_id': 253230615, 'in_reply_to_user_id_str': '253230615', 'in_reply_

In [15]:
twReplies = []
statusUpdate = tp.Cursor(api.search,q='to:'+"Qheem1", since_id = "1291639643067252742" , max_id = "1291687668670177280", timeout=999999).items()
for tw in statusUpdate:
    print(tw)
    if hasattr(tw, 'in_reply_to_status_id_str'):        
        if (tw.in_reply_to_status_id_str=="1291639643067252742"):
            twReplies.append(tw)

# Build one row per reply, using the loop variable twt rather than the leftover tw from the loop above
for twt in twReplies:
    row = {'ID':"1291639643067252742",'Username':"Qheem1",'Replier': twt.user.screen_name,
           'Reply_date':twt.created_at, 'Reply': twt.text.replace('\n', ' '),
           'Reply_mentions':' '.join(m['screen_name'] for m in twt.entities['user_mentions']),
           'Reply_Hashtags':' '.join(h['text'] for h in twt.entities['hashtags'])}
    print(row)

In [9]:
print(statusUpdate)
for tw in statusUpdate:
    print(tw)
    if hasattr(tw, 'in_reply_to_status_id_str'):        
        if (tw.in_reply_to_status_id_str=="1298674508895784961"):
            row = {'ID':"1298674508895784961",'Username':"denisi2",'Replier': tw.user.screen_name, 
                   'Reply_date':tw.created_at, 'Reply': tw.text.replace('\n', ' '),
                   'Reply_mentions':' '.join(x['screen_name'] for x in tw.entities['user_mentions']),
                   'Reply_Hashtags':' '.join(x['text'] for x in tw.entities['hashtags'])}
            print(row)

<tweepy.cursor.ItemIterator object at 0x00000227079D1748>


In [None]:
# Scratch notes: tweet IDs, some with their created_at timestamps
# 1291641918288519169
# 1298381125803540480 'Tue Aug 25 22:05:57 +0000 2020'
# 1298369088251736064 'Tue Aug 25 21:18:07 +0000 2020'
# 1298361922698477569 'Tue Aug 25 20:49:39 +0000 2020'