In [1]:
import GetOldTweets3 as got
# Remember, if you do not have GOT3 installe, you first need to do it using !pip command

The cell below contains simplified version of the script you can use to verify if the output is correct.
In this cell you have to remember to manually change the following:
- setMaxTweets(9999) --> limitation on number of tweets to be scrapped. If you get an error, use smaller number.
- setSince('2015-04-16') --> use the yyyy-mm-dd format
- setUntil('2015-04-17') --> unless the city is small and with few tweets, trying to scrape more than one day will result in an error (421: too many requests)
- setNear('52.52, 13.40') --> use decimal notation; to double-check you can switch latitude with longitute to verify the output
- setWithin("50km") --> you can increase or decrease the radius, with setNear coordinates as a centre of the circle

In [3]:
# It creates, and later overwrites 'output.csv' file; Remember to force encoding to utf-8 to correctly represent diacritical marks
with open('output.csv', 'w', encoding="utf-8") as f:
    
    #This is the basic GetOldTweets3 script with the variables in ()
    tweetCriteria = got.manager.TweetCriteria().setMaxTweets(9999).setSince('2014-04-20').setUntil('2014-04-20').setNear('52.52, 13.40').setWithin("50km");
    tweets = got.manager.TweetManager.getTweets(tweetCriteria);
    
    #Verification if the tweet.text is long enough (10 characters by default - you may change this)
    for tweet in tweets:
        if len(tweet.text) > 10:
            #Format of the data saved in the .csv file
            f.write(f"{tweet.id},{tweet.date},{tweet.username},{tweet.text.replace(',', '')}'\n'")

The script in the next cells allows you to see the number of tweets scrapped to the file and the tweets without opening the file.

In [6]:
# We ignore errors (non-unicode signs, like some emojis)
with open('output.csv', errors='ignore', encoding="utf-8") as f:
    row_count = sum(1 for row in f)
    print(row_count)

2262


In [7]:
with open('output.csv', errors='ignore', encoding="utf-8") as f:
    print(f.read())

456582457651777537,2014-04-16 23:58:35+00:00,marieandree615,@JonTenney Can i get a retweet from VP Andrew on this fine #scandalFinaleEve ? From: Ur biggest fan in Germany! :-) &lt;3 #ScandalDeutschland'
'456581447831154689,2014-04-16 23:54:34+00:00,halasaXO,prague: a sneak peek before full album spam of our fab trip! what happens in prague... follows you to berlin ;) xo'
'456580162939658240,2014-04-16 23:49:28+00:00,djdancortex,"@dinalohan: @lindsaylohan @2BrokeGirls_CBS loved all 3 of you amazing lil actresses !!!xo mommy " Lindsay marries xo'
'456578633453150208,2014-04-16 23:43:23+00:00,thoratomas,When 2 become 1 @Monster Ronson's Ichiban Karaoke http://instagram.com/p/m3pyYHmyPz/'
'456576884063494144,2014-04-16 23:36:26+00:00,Collide341,Por si les quedo dudas, en el primer mundo el guardarropas del boliche es gratis'
'456574968474505218,2014-04-16 23:28:49+00:00,49Sakine,Çıkar konuşunca, vicdan susar. Cemil Meriç"'
'456574148060254208,2014-04-16 23:25:34+00:00,inselberliner,They di

And now the GOT3 function I created:
- by passing the values in the function () you can run the script for a given date and location,
- it will create separate .csv file,
- if you get an error, change the 'setMaxTweets(9999)' value in the function script
- you can also change the default radius of 50km, if the city is bigger or smaller
- the other variable that may need to be changed is the 'len(tweet.text)'

In [2]:
def GOTcity(start_date, stop_date, location, file_name):
    """Dates in 'yyyy-mm-dd' format; Location in decimals '52.52, 13.40', filename in 'Berlin20200410.csv' format"""
    with open(file_name, 'w', encoding="utf-8") as f:
        tweetCriteria = got.manager.TweetCriteria().setMaxTweets(9999).setSince(start_date).setUntil(stop_date).setNear(location).setWithin("25km");
        tweets = got.manager.TweetManager.getTweets(tweetCriteria);
        for tweet in tweets:
            if len(tweet.text) > 10:
                f.write(f"{tweet.id},{tweet.date},{tweet.username},{tweet.text.replace(',', '')}'\n'")

In [8]:
# Lesbos example; note that when I will run it on my computer, I will change the radius to 5 km, because otherwise I would scrap the tweets from the Turkish coast
GOTcity('2015-10-20', '2015-10-21', '37.07, 37.38', 'Gaziantep20151020.csv')