# Getting The Tweets

The following code connects to a Twitter account and pulls down all tweets sent using #vinb on the evening of Wednesday 24 June 2015. The line "from keys import keys" in the first code block imports a dictionary called keys from a file called keys.py. That dictionary holds the OAuth credentials and has not been included in the repository; as the names suggest, the consumer secret and access token secret should be kept secret. Check the <a href="http://docs.tweepy.org/en/v3.2.0/">documentation</a> for <a href="https://github.com/tweepy/tweepy">Tweepy</a> for details on authenticating and searching Twitter using your own account.
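For reference, a keys.py file compatible with the first code block might look like the sketch below. The four key names are the ones that block reads. The values are placeholders (assumptions, not real credentials) to be replaced with the credentials from your own Twitter app.

```python
# keys.py: hypothetical layout. Replace the placeholder strings with your own
# credentials and keep this file out of version control.
keys = {
    'consumer_key': 'YOUR_CONSUMER_KEY',
    'consumer_secret': 'YOUR_CONSUMER_SECRET',
    'access_token': 'YOUR_ACCESS_TOKEN',
    'access_token_secret': 'YOUR_ACCESS_TOKEN_SECRET',
}
```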

In [None]:
import tweepy
import sys
from keys import keys

CONSUMER_KEY = keys['consumer_key']
CONSUMER_SECRET = keys['consumer_secret']
ACCESS_TOKEN = keys['access_token']
ACCESS_TOKEN_SECRET = keys['access_token_secret']

#OAuth process, using the keys and tokens above
auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)

#Creation of the actual interface, using authentication
api = tweepy.API(auth)

In [189]:
import datetime
import time

f = open("vinb2.txt", "w")  # create a new file to write to
f.write("User\tText\tDate\tRetweet Count\tReply To\n")  # write a header row to the file

# The first time I ran this code I typed:
# starttime = datetime.datetime(21, 06, 24, 20, 40, 00)
# which tried to read every tweet posted since 21 AD...


# Twitter stores created_at in UTC, while Ireland is on Irish Standard Time (UTC+1) in June,
# so created_at reads one hour behind local time: 20:40 is used below for a 21:40 local start.
# Tweets about the show end at 23:07 and the show ended at 00:00 (the last show-related tweet,
# at 23:07, refers to a statement Brendan Griffen made at c. 23:40 in his closing remarks).
starttime = datetime.datetime(2015, 6, 24, 20, 40, 0)  # capture tweets between these two times
endtime = datetime.datetime(2015, 6, 25, 0, 0, 0)


# Create a cursor that pulls down every #vinb tweet posted on 24 June (Twitter treats the
# 'until' date as exclusive). Reading all of them will exceed the Twitter API rate limits,
# which the loop below handles by sleeping.
vinb = tweepy.Cursor(api.search, q='vinb', since="2015-06-24", until="2015-06-25").items()

while True:
    try:
        tweet = next(vinb)  # get the next tweet from the #vinb search results (newest first)
        if tweet.created_at < starttime:  # results run backwards in time, so stop once we pass the start
            break
        if tweet.created_at >= endtime:  # skip anything after the end of the window
            continue
        if not tweet.text.startswith('RT'):  # ignore retweets
            # replace tabs and line breaks in the text with spaces so each tweet stays on one line for pandas
            text = unicode(tweet.text).encode("utf-8")
            text = text.replace("\t", " ").replace("\r", " ").replace("\n", " ")

            # join the username, text, creation time, retweet count and reply-to user, separated by tabs for pandas
            out_text = "\t".join([unicode(tweet.user.screen_name).encode("utf-8"), text,
                                  str(tweet.created_at), str(tweet.retweet_count),
                                  str(tweet.in_reply_to_screen_name)]) + "\n"
            f.write(out_text)

    # A TweepError can be caused by hitting the rate limit. If so, sleep for 15 minutes
    # to let the limit reset, then continue.
    except tweepy.TweepError:
        time.sleep(60 * 15)
        continue
    # Exit if we somehow manage to read every tweet for the day!
    except StopIteration:
        break
    

f.close()  # close the output file

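The time-window logic of the loop above can be sketched without network access as follows. This is a Python 3 sketch under stated assumptions: created_at holds naive UTC datetimes (as the comparison with the naive starttime implies), Irish summer time is modelled as a fixed UTC+1 offset, and search results arrive newest first. `Tweet`, `local_to_naive_utc` and `tweets_in_window` are hypothetical names, and the sample tweets are made up. The sketch checks both ends of the window.

```python
from collections import namedtuple
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC+1 offset for Irish summer time (IST), rather than
# a full time zone database lookup.
IST = timezone(timedelta(hours=1))

# Hypothetical stand-in for the tweepy Status objects the cursor returns.
Tweet = namedtuple("Tweet", ["created_at", "text"])


def local_to_naive_utc(local_dt):
    """Convert an aware local datetime to a naive UTC datetime, so it can be
    compared with created_at values that are assumed to be naive UTC."""
    return local_dt.astimezone(timezone.utc).replace(tzinfo=None)


def tweets_in_window(tweets, start, end):
    """Yield non-retweets with start <= created_at < end.

    Assumes `tweets` arrives newest first, so iteration can stop at the
    first tweet older than `start`."""
    for tweet in tweets:
        if tweet.created_at < start:
            break  # every remaining tweet is older still
        if tweet.created_at >= end:
            continue  # too recent; keep moving back in time
        if tweet.text.startswith("RT"):
            continue  # skip retweets
        yield tweet


# 21:40 Irish time on 24 June 2015 becomes 20:40 UTC.
start = local_to_naive_utc(datetime(2015, 6, 24, 21, 40, tzinfo=IST))
end = datetime(2015, 6, 25, 0, 0)  # the notebook's endtime, taken as UTC

# Made-up sample tweets, newest first.
sample = [
    Tweet(datetime(2015, 6, 25, 0, 5), "posted after the end time"),
    Tweet(datetime(2015, 6, 24, 22, 0), "RT @someone: a retweet"),
    Tweet(datetime(2015, 6, 24, 21, 0), "an original tweet"),
    Tweet(datetime(2015, 6, 24, 20, 30), "posted before the start time"),
]

print(start)  # 2015-06-24 20:40:00
print([t.text for t in tweets_in_window(sample, start, end)])  # ['an original tweet']
```

Stopping at the first tweet older than the start, rather than filtering the whole result set, is what keeps the number of API requests bounded: once the cursor has moved past the start of the programme, nothing further can fall inside the window.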

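The cleaning and tab-joining steps can likewise be sketched as two small functions, here in Python 3 with the hypothetical names `clean_field` and `tsv_line`. One choice is an assumption of this sketch: a tweet that is not a reply gets an empty field rather than the string "None" that `str(None)` produces. Reading the result back with `csv.QUOTE_NONE` keeps quote characters inside tweets literal instead of treating them as field delimiters.

```python
import csv
import io

# Header row matching the one the notebook writes to vinb2.txt.
FIELDS = ["User", "Text", "Date", "Retweet Count", "Reply To"]


def clean_field(value):
    """Render a value as text, replacing tabs and line breaks (\\t, \\r, \\n)
    with spaces so each tweet occupies exactly one line. None becomes ''."""
    text = "" if value is None else str(value)
    for ch in ("\t", "\r", "\n"):
        text = text.replace(ch, " ")
    return text


def tsv_line(user, text, created_at, retweet_count, reply_to):
    """Join one tweet's fields with tabs, ending with a newline."""
    fields = [user, text, created_at, retweet_count, reply_to]
    return "\t".join(clean_field(f) for f in fields) + "\n"


# Write a header and one made-up tweet to an in-memory file, then read it back.
buf = io.StringIO()
buf.write("\t".join(FIELDS) + "\n")
buf.write(tsv_line("someuser", 'line one\nsays "hi"\tagain', "2015-06-24 21:00:00", 0, None))
buf.seek(0)

rows = list(csv.reader(buf, delimiter="\t", quoting=csv.QUOTE_NONE))
print(rows[1])  # ['someuser', 'line one says "hi" again', '2015-06-24 21:00:00', '0', '']
```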

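The sleep-and-retry handling of `tweepy.TweepError` can be factored into a generic generator, sketched below in Python 3. `iterate_with_retry` and `FlakyIterator` are hypothetical names; in the notebook, `retry_on` would be `tweepy.TweepError`. The 15-minute default matches the length of Twitter's rate-limit windows. A TweepError is not always a rate-limit error, so a persistent failure (for example bad credentials) would make this loop sleep indefinitely, just as the notebook's loop would. The pattern also assumes the iterator can be resumed after it raises, which `FlakyIterator` does by design and which the notebook's loop likewise assumes of the tweepy cursor; a plain Python generator cannot be resumed that way.

```python
import time


def iterate_with_retry(iterator, retry_on, wait_seconds=15 * 60, sleep=time.sleep):
    """Yield items from `iterator`. When next() raises one of the exception
    types in `retry_on`, sleep for `wait_seconds` and try again; stop when
    the iterator is exhausted. `sleep` is injectable for testing."""
    while True:
        try:
            item = next(iterator)
        except StopIteration:
            return
        except retry_on:
            sleep(wait_seconds)  # give the rate-limit window time to reset
            continue
        yield item


class FlakyIterator:
    """Hypothetical iterator that fails once, like a rate-limited request,
    and then returns its items normally."""

    def __init__(self, items):
        self.items = list(items)
        self.failed_once = False

    def __iter__(self):
        return self

    def __next__(self):
        if not self.failed_once:
            self.failed_once = True
            raise RuntimeError("simulated rate limit")
        if not self.items:
            raise StopIteration
        return self.items.pop(0)


waits = []  # record sleep calls instead of actually sleeping
result = list(iterate_with_retry(FlakyIterator([1, 2, 3]), RuntimeError, sleep=waits.append))
print(result, waits)  # [1, 2, 3] [900]
```
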
In [193]:
import pandas as pd

test = pd.read_csv("vinb2.txt", sep="\t", header=0, encoding='utf-8')

len(test)

902

In [194]:
test.tail()

|     | User | Text | Date | Retweet Count | Reply To |
|-----|------|------|------|---------------|----------|
| 897 | popcornhack | @DeirdreWalsh1 Me too. Switched to TV3 for Vin... | 2015-06-24 20:55:51 | 0 | DeirdreWalsh1 |
| 898 | MoranPaul52 | Festival in Killarney cancelled due to "unfore... | 2015-06-24 20:54:44 | 4 | |
| 899 | kevosullivan07 | @niallboylan4fm that's like #vinb pulling ppl ... | 2015-06-24 20:54:14 | 0 | kevosullivan07 |
| 900 | Cunionsandphey | #MammaMia is right! It's a pile of shite! Brin... | 2015-06-24 20:43:23 | 0 | |
| 901 | GleneagleHotel | Turn over to @TV3Ireland and watch @vincentbro... | 2015-06-24 20:40:41 | 1 | |
