### Data Gathering
#### Getting tweets for the 2020 Democratic Primary about male vs. female candidates
To collect tweets from the roughly one-month period between the Iowa caucuses (February 3, 2020) and the day Elizabeth Warren dropped out of the race (March 5, 2020), I used the GetOldTweets3 module. Tweets were collected week by week and are organized by candidate. I chose four candidates: the two male and two female candidates who were most prominent during that period.

#### Install and import the required packages and define the data-gathering functions

In [2]:
!pip install --user GetOldTweets3



In [1]:
import json
import os
import datetime
import GetOldTweets3 as got

In [2]:
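def download_query_tweets(query, date_since, date_until, max_tweets=1000):
    """Return up to max_tweets tweets matching query as a list of dictionaries."""
    print(f"Downloading tweets for query: '{query}' from {date_since} to {date_until}")
    # Build the search criteria: query text, date range, and result cap.
    tweetCriteria = got.manager.TweetCriteria().setQuerySearch(query)\
                                               .setSince(date_since)\
                                               .setUntil(date_until)\
                                               .setMaxTweets(max_tweets)

    tweets = got.manager.TweetManager.getTweets(tweetCriteria)
    # Convert each Tweet object to a plain dict so it can be serialized to JSON.
    list_of_tweets = [tweet.__dict__ for tweet in tweets]
    return list_of_tweets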

In [3]:
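def jsonconverter(o):
    # Fallback for json.dumps: serialize datetime objects as strings.
    if isinstance(o, datetime.datetime):
        return str(o)
    # Raise for any other type instead of silently writing null.
    raise TypeError(f"Object of type {type(o).__name__} is not JSON serializable")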
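
To illustrate what the converter does, here is a minimal sketch of `json.dumps` calling a `default` function when it meets a `datetime` value. The converter is repeated so the snippet runs on its own, and the record and its field names are made up for the example; they are not taken from the GetOldTweets3 output.

```python
import datetime
import json


def jsonconverter(o):
    # json.dumps calls this only for objects it cannot serialize itself.
    if isinstance(o, datetime.datetime):
        return str(o)
    raise TypeError(f"Object of type {type(o).__name__} is not JSON serializable")


# Hypothetical record shaped like a tweet dictionary (illustrative field names).
sample = {"username": "example_user", "date": datetime.datetime(2020, 2, 3, 12, 30, 0)}

# Without default=jsonconverter this call would raise a TypeError.
line = json.dumps(sample, default=jsonconverter)

# After a round trip the date comes back as a plain string, not a datetime.
restored = json.loads(line)
print(restored["date"])
```

Because the date is stored as a string, any later analysis that needs real timestamps has to parse it again.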

#### Gather data about candidates
Each candidate is searched with three queries: the candidate's Twitter username and the name the candidate is commonly known by (part of the candidate's full name), in both lowercase and capitalized form. The results are saved in a folder named after the candidate's last name and the week the tweets come from. Tweets are again gathered with GetOldTweets3.

In [6]:
DATA_DIR = 'data/WarrenWk1'

if not os.path.exists(DATA_DIR):
    os.makedirs(DATA_DIR)

since = "2020-02-03"
until = "2020-02-09"
queries = ["@ewarren", 'warren', 'Warren']

for query in queries:
    tweet_list = download_query_tweets(query, since, until)
    
    outfilename = "{}/{}_{}_to_{}.json".format(DATA_DIR, query.replace(' ','_'), since, until)
    
    print('Downloaded {} tweets...\n'.format(len(tweet_list)))
    with open(outfilename,'w') as out:
        for tweet in tweet_list:
            out.write(json.dumps(tweet, default=jsonconverter) + '\n')

Downloading tweets for query: '@ewarren' from 2020-02-03 to 2020-02-09
Downloaded 1000 tweets...

Downloading tweets for query: 'warren' from 2020-02-03 to 2020-02-09
Downloaded 1000 tweets...

Downloading tweets for query: 'Warren' from 2020-02-03 to 2020-02-09
Downloaded 1000 tweets...
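
Each output file holds one JSON object per line, so despite the `.json` extension it cannot be read with a single `json.load` call. The sketch below shows one way to read such a file back. `load_tweets` is a hypothetical helper name, and the stand-in file and its fields are placeholders written only so the example runs without the downloaded data.

```python
import json
import os
import tempfile


def load_tweets(path):
    """Read a file with one JSON object per line and return a list of dicts."""
    with open(path) as infile:
        # Skip blank lines so a trailing newline does not cause an error.
        return [json.loads(line) for line in infile if line.strip()]


# Write a tiny stand-in file in the same one-object-per-line layout.
# The field names are placeholders, not the GetOldTweets3 schema.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.json")
with open(demo_path, "w") as out:
    out.write(json.dumps({"id": 1, "text": "first"}) + "\n")
    out.write(json.dumps({"id": 2, "text": "second"}) + "\n")

tweets = load_tweets(demo_path)
print(len(tweets))
```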



In [6]:
DATA_DIR = 'data/KlobucharWk1'

if not os.path.exists(DATA_DIR):
    os.makedirs(DATA_DIR)

since = "2020-02-03"
until = "2020-02-09"
queries = ['@amyklobuchar','klobuchar', 'Klobuchar']

for query in queries:
    tweet_list = download_query_tweets(query, since, until)
    
    outfilename = "{}/{}_{}_to_{}.json".format(DATA_DIR, query.replace(' ','_'), since, until)
    
    print('Downloaded {} tweets...\n'.format(len(tweet_list)))
    with open(outfilename,'w') as out:
        for tweet in tweet_list:
            out.write(json.dumps(tweet, default=jsonconverter) + '\n')

Downloading tweets for query: '@amyklobuchar' from 2020-02-03 to 2020-02-09
Downloaded 1000 tweets...

Downloading tweets for query: 'klobuchar' from 2020-02-03 to 2020-02-09
Downloaded 1000 tweets...

Downloading tweets for query: 'Klobuchar' from 2020-02-03 to 2020-02-09
Downloaded 1000 tweets...



In [7]:
DATA_DIR = 'data/SandersWk1'

if not os.path.exists(DATA_DIR):
    os.makedirs(DATA_DIR)

since = "2020-02-03"
until = "2020-02-09"
queries = ['@BernieSanders', 'bernie', 'Bernie']

for query in queries:
    tweet_list = download_query_tweets(query, since, until)
    
    outfilename = "{}/{}_{}_to_{}.json".format(DATA_DIR, query.replace(' ','_'), since, until)
    
    print('Downloaded {} tweets...\n'.format(len(tweet_list)))
    with open(outfilename,'w') as out:
        for tweet in tweet_list:
            out.write(json.dumps(tweet, default=jsonconverter) + '\n')

Downloading tweets for query: '@BernieSanders' from 2020-02-03 to 2020-02-09
Downloaded 1000 tweets...

Downloading tweets for query: 'bernie' from 2020-02-03 to 2020-02-09
Downloaded 1000 tweets...

Downloading tweets for query: 'Bernie' from 2020-02-03 to 2020-02-09
Downloaded 1000 tweets...



In [8]:
DATA_DIR = 'data/BidenWk1'

if not os.path.exists(DATA_DIR):
    os.makedirs(DATA_DIR)

since = "2020-02-03"
until = "2020-02-09"
queries = ['@JoeBiden', 'Biden', 'biden']

for query in queries:
    tweet_list = download_query_tweets(query, since, until)
    
    outfilename = "{}/{}_{}_to_{}.json".format(DATA_DIR, query.replace(' ','_'), since, until)
    
    print('Downloaded {} tweets...\n'.format(len(tweet_list)))
    with open(outfilename,'w') as out:
        for tweet in tweet_list:
            out.write(json.dumps(tweet, default=jsonconverter) + '\n')

Downloading tweets for query: '@JoeBiden' from 2020-02-03 to 2020-02-09
Downloaded 1000 tweets...

Downloading tweets for query: 'Biden' from 2020-02-03 to 2020-02-09
Downloaded 1000 tweets...

Downloading tweets for query: 'biden' from 2020-02-03 to 2020-02-09
Downloaded 1000 tweets...



In [7]:
DATA_DIR = 'data/WarrenWk2'

if not os.path.exists(DATA_DIR):
    os.makedirs(DATA_DIR)

since = "2020-02-10"
until = "2020-02-16"
queries = ["@ewarren", 'warren', 'Warren']

for query in queries:
    tweet_list = download_query_tweets(query, since, until)
    
    outfilename = "{}/{}_{}_to_{}.json".format(DATA_DIR, query.replace(' ','_'), since, until)
    
    print('Downloaded {} tweets...\n'.format(len(tweet_list)))
    with open(outfilename,'w') as out:
        for tweet in tweet_list:
            out.write(json.dumps(tweet, default=jsonconverter) + '\n')

Downloading tweets for query: '@ewarren' from 2020-02-10 to 2020-02-16
Downloaded 1000 tweets...

Downloading tweets for query: 'warren' from 2020-02-10 to 2020-02-16
Downloaded 1000 tweets...

Downloading tweets for query: 'Warren' from 2020-02-10 to 2020-02-16
Downloaded 1000 tweets...



In [8]:
DATA_DIR = 'data/KlobucharWk2'

if not os.path.exists(DATA_DIR):
    os.makedirs(DATA_DIR)

since = "2020-02-10"
until = "2020-02-16"
queries = ['@amyklobuchar','klobuchar', 'Klobuchar']

for query in queries:
    tweet_list = download_query_tweets(query, since, until)
    
    outfilename = "{}/{}_{}_to_{}.json".format(DATA_DIR, query.replace(' ','_'), since, until)
    
    print('Downloaded {} tweets...\n'.format(len(tweet_list)))
    with open(outfilename,'w') as out:
        for tweet in tweet_list:
            out.write(json.dumps(tweet, default=jsonconverter) + '\n')

Downloading tweets for query: '@amyklobuchar' from 2020-02-10 to 2020-02-16
Downloaded 1000 tweets...

Downloading tweets for query: 'klobuchar' from 2020-02-10 to 2020-02-16
Downloaded 1000 tweets...

Downloading tweets for query: 'Klobuchar' from 2020-02-10 to 2020-02-16
Downloaded 1000 tweets...



In [9]:
DATA_DIR = 'data/SandersWk2'

if not os.path.exists(DATA_DIR):
    os.makedirs(DATA_DIR)

since = "2020-02-10"
until = "2020-02-16"
queries = ['@BernieSanders', 'bernie', 'Bernie']

for query in queries:
    tweet_list = download_query_tweets(query, since, until)
    
    outfilename = "{}/{}_{}_to_{}.json".format(DATA_DIR, query.replace(' ','_'), since, until)
    
    print('Downloaded {} tweets...\n'.format(len(tweet_list)))
    with open(outfilename,'w') as out:
        for tweet in tweet_list:
            out.write(json.dumps(tweet, default=jsonconverter) + '\n')

Downloading tweets for query: '@BernieSanders' from 2020-02-10 to 2020-02-16
Downloaded 1000 tweets...

Downloading tweets for query: 'bernie' from 2020-02-10 to 2020-02-16
Downloaded 1000 tweets...

Downloading tweets for query: 'Bernie' from 2020-02-10 to 2020-02-16
Downloaded 1000 tweets...



In [5]:
DATA_DIR = 'data/BidenWk2'

if not os.path.exists(DATA_DIR):
    os.makedirs(DATA_DIR)

since = "2020-02-10"
until = "2020-02-16"
queries = ['@JoeBiden', 'Biden', 'biden']

for query in queries:
    tweet_list = download_query_tweets(query, since, until)
    
    outfilename = "{}/{}_{}_to_{}.json".format(DATA_DIR, query.replace(' ','_'), since, until)
    
    print('Downloaded {} tweets...\n'.format(len(tweet_list)))
    with open(outfilename,'w') as out:
        for tweet in tweet_list:
            out.write(json.dumps(tweet, default=jsonconverter) + '\n')

Downloading tweets for query: '@JoeBiden' from 2020-02-10 to 2020-02-16
Downloaded 1000 tweets...

Downloading tweets for query: 'Biden' from 2020-02-10 to 2020-02-16
Downloaded 1000 tweets...

Downloading tweets for query: 'biden' from 2020-02-10 to 2020-02-16
Downloaded 1000 tweets...



In [6]:
DATA_DIR = 'data/WarrenWk3'

if not os.path.exists(DATA_DIR):
    os.makedirs(DATA_DIR)

since = "2020-02-17"
until = "2020-02-23"
queries = ["@ewarren", 'warren', 'Warren']

for query in queries:
    tweet_list = download_query_tweets(query, since, until)
    
    outfilename = "{}/{}_{}_to_{}.json".format(DATA_DIR, query.replace(' ','_'), since, until)
    
    print('Downloaded {} tweets...\n'.format(len(tweet_list)))
    with open(outfilename,'w') as out:
        for tweet in tweet_list:
            out.write(json.dumps(tweet, default=jsonconverter) + '\n')

Downloading tweets for query: '@ewarren' from 2020-02-17 to 2020-02-23
Downloaded 1000 tweets...

Downloading tweets for query: 'warren' from 2020-02-17 to 2020-02-23
Downloaded 1000 tweets...

Downloading tweets for query: 'Warren' from 2020-02-17 to 2020-02-23
Downloaded 1000 tweets...



In [7]:
DATA_DIR = 'data/KlobucharWk3'

if not os.path.exists(DATA_DIR):
    os.makedirs(DATA_DIR)

since = "2020-02-17"
until = "2020-02-23"
queries = ['@amyklobuchar','klobuchar', 'Klobuchar']

for query in queries:
    tweet_list = download_query_tweets(query, since, until)
    
    outfilename = "{}/{}_{}_to_{}.json".format(DATA_DIR, query.replace(' ','_'), since, until)
    
    print('Downloaded {} tweets...\n'.format(len(tweet_list)))
    with open(outfilename,'w') as out:
        for tweet in tweet_list:
            out.write(json.dumps(tweet, default=jsonconverter) + '\n')

Downloading tweets for query: '@amyklobuchar' from 2020-02-17 to 2020-02-23
Downloaded 1000 tweets...

Downloading tweets for query: 'klobuchar' from 2020-02-17 to 2020-02-23
Downloaded 1000 tweets...

Downloading tweets for query: 'Klobuchar' from 2020-02-17 to 2020-02-23
Downloaded 1000 tweets...



In [8]:
DATA_DIR = 'data/SandersWk3'

if not os.path.exists(DATA_DIR):
    os.makedirs(DATA_DIR)

since = "2020-02-17"
until = "2020-02-23"
queries = ['@BernieSanders', 'bernie', 'Bernie']

for query in queries:
    tweet_list = download_query_tweets(query, since, until)
    
    outfilename = "{}/{}_{}_to_{}.json".format(DATA_DIR, query.replace(' ','_'), since, until)
    
    print('Downloaded {} tweets...\n'.format(len(tweet_list)))
    with open(outfilename,'w') as out:
        for tweet in tweet_list:
            out.write(json.dumps(tweet, default=jsonconverter) + '\n')

Downloading tweets for query: '@BernieSanders' from 2020-02-17 to 2020-02-23
Downloaded 1000 tweets...

Downloading tweets for query: 'bernie' from 2020-02-17 to 2020-02-23
Downloaded 1000 tweets...

Downloading tweets for query: 'Bernie' from 2020-02-17 to 2020-02-23
Downloaded 1000 tweets...



In [10]:
DATA_DIR = 'data/BidenWk3'

if not os.path.exists(DATA_DIR):
    os.makedirs(DATA_DIR)

since = "2020-02-17"
until = "2020-02-23"
queries = ['@JoeBiden', 'Biden', 'biden']

for query in queries:
    tweet_list = download_query_tweets(query, since, until)
    
    outfilename = "{}/{}_{}_to_{}.json".format(DATA_DIR, query.replace(' ','_'), since, until)
    
    print('Downloaded {} tweets...\n'.format(len(tweet_list)))
    with open(outfilename,'w') as out:
        for tweet in tweet_list:
            out.write(json.dumps(tweet, default=jsonconverter) + '\n')

Downloading tweets for query: '@JoeBiden' from 2020-02-17 to 2020-02-23
Downloaded 1000 tweets...

Downloading tweets for query: 'Biden' from 2020-02-17 to 2020-02-23
Downloaded 1000 tweets...

Downloading tweets for query: 'biden' from 2020-02-17 to 2020-02-23
Downloaded 1000 tweets...



In [11]:
DATA_DIR = 'data/WarrenWk4'

if not os.path.exists(DATA_DIR):
    os.makedirs(DATA_DIR)

since = "2020-02-24"
until = "2020-03-01"
queries = ["@ewarren", 'warren', 'Warren']

for query in queries:
    tweet_list = download_query_tweets(query, since, until)
    
    outfilename = "{}/{}_{}_to_{}.json".format(DATA_DIR, query.replace(' ','_'), since, until)
    
    print('Downloaded {} tweets...\n'.format(len(tweet_list)))
    with open(outfilename,'w') as out:
        for tweet in tweet_list:
            out.write(json.dumps(tweet, default=jsonconverter) + '\n')

Downloading tweets for query: '@ewarren' from 2020-02-24 to 2020-03-01
Downloaded 1000 tweets...

Downloading tweets for query: 'warren' from 2020-02-24 to 2020-03-01
Downloaded 1000 tweets...

Downloading tweets for query: 'Warren' from 2020-02-24 to 2020-03-01
Downloaded 1000 tweets...



In [12]:
DATA_DIR = 'data/KlobucharWk4'

if not os.path.exists(DATA_DIR):
    os.makedirs(DATA_DIR)

since = "2020-02-24"
until = "2020-03-01"
queries = ['@amyklobuchar','klobuchar', 'Klobuchar']

for query in queries:
    tweet_list = download_query_tweets(query, since, until)
    
    outfilename = "{}/{}_{}_to_{}.json".format(DATA_DIR, query.replace(' ','_'), since, until)
    
    print('Downloaded {} tweets...\n'.format(len(tweet_list)))
    with open(outfilename,'w') as out:
        for tweet in tweet_list:
            out.write(json.dumps(tweet, default=jsonconverter) + '\n')

Downloading tweets for query: '@amyklobuchar' from 2020-02-24 to 2020-03-01
Downloaded 1000 tweets...

Downloading tweets for query: 'klobuchar' from 2020-02-24 to 2020-03-01
Downloaded 1000 tweets...

Downloading tweets for query: 'Klobuchar' from 2020-02-24 to 2020-03-01
Downloaded 1000 tweets...



In [13]:
DATA_DIR = 'data/SandersWk4'

if not os.path.exists(DATA_DIR):
    os.makedirs(DATA_DIR)

since = "2020-02-24"
until = "2020-03-01"
queries = ['@BernieSanders', 'bernie', 'Bernie']

for query in queries:
    tweet_list = download_query_tweets(query, since, until)
    
    outfilename = "{}/{}_{}_to_{}.json".format(DATA_DIR, query.replace(' ','_'), since, until)
    
    print('Downloaded {} tweets...\n'.format(len(tweet_list)))
    with open(outfilename,'w') as out:
        for tweet in tweet_list:
            out.write(json.dumps(tweet, default=jsonconverter) + '\n')

Downloading tweets for query: '@BernieSanders' from 2020-02-24 to 2020-03-01
Downloaded 1000 tweets...

Downloading tweets for query: 'bernie' from 2020-02-24 to 2020-03-01
Downloaded 1000 tweets...

Downloading tweets for query: 'Bernie' from 2020-02-24 to 2020-03-01
Downloaded 1000 tweets...



In [14]:
DATA_DIR = 'data/BidenWk4'

if not os.path.exists(DATA_DIR):
    os.makedirs(DATA_DIR)

since = "2020-02-24"
until = "2020-03-01"
queries = ['@JoeBiden', 'Biden', 'biden']

for query in queries:
    tweet_list = download_query_tweets(query, since, until)
    
    outfilename = "{}/{}_{}_to_{}.json".format(DATA_DIR, query.replace(' ','_'), since, until)
    
    print('Downloaded {} tweets...\n'.format(len(tweet_list)))
    with open(outfilename,'w') as out:
        for tweet in tweet_list:
            out.write(json.dumps(tweet, default=jsonconverter) + '\n')

Downloading tweets for query: '@JoeBiden' from 2020-02-24 to 2020-03-01
Downloaded 1000 tweets...

Downloading tweets for query: 'Biden' from 2020-02-24 to 2020-03-01
Downloaded 1000 tweets...

Downloading tweets for query: 'biden' from 2020-02-24 to 2020-03-01
Downloaded 1000 tweets...



In [15]:
DATA_DIR = 'data/WarrenWk5'

if not os.path.exists(DATA_DIR):
    os.makedirs(DATA_DIR)

since = "2020-03-02"
until = "2020-03-05"
queries = ["@ewarren", 'warren', 'Warren']

for query in queries:
    tweet_list = download_query_tweets(query, since, until)
    
    outfilename = "{}/{}_{}_to_{}.json".format(DATA_DIR, query.replace(' ','_'), since, until)
    
    print('Downloaded {} tweets...\n'.format(len(tweet_list)))
    with open(outfilename,'w') as out:
        for tweet in tweet_list:
            out.write(json.dumps(tweet, default=jsonconverter) + '\n')

Downloading tweets for query: '@ewarren' from 2020-03-02 to 2020-03-05
Downloaded 1000 tweets...

Downloading tweets for query: 'warren' from 2020-03-02 to 2020-03-05
Downloaded 1000 tweets...

Downloading tweets for query: 'Warren' from 2020-03-02 to 2020-03-05
Downloaded 1000 tweets...



In [4]:
DATA_DIR = 'data/KlobucharWk5'

if not os.path.exists(DATA_DIR):
    os.makedirs(DATA_DIR)

since = "2020-03-02"
until = "2020-03-05"
queries = ['@amyklobuchar','klobuchar', 'Klobuchar']

for query in queries:
    tweet_list = download_query_tweets(query, since, until)
    
    outfilename = "{}/{}_{}_to_{}.json".format(DATA_DIR, query.replace(' ','_'), since, until)
    
    print('Downloaded {} tweets...\n'.format(len(tweet_list)))
    with open(outfilename,'w') as out:
        for tweet in tweet_list:
            out.write(json.dumps(tweet, default=jsonconverter) + '\n')

Downloading tweets for query: '@amyklobuchar' from 2020-03-02 to 2020-03-05
Downloaded 1000 tweets...

Downloading tweets for query: 'klobuchar' from 2020-03-02 to 2020-03-05
Downloaded 1000 tweets...

Downloading tweets for query: 'Klobuchar' from 2020-03-02 to 2020-03-05
Downloaded 1000 tweets...



In [5]:
DATA_DIR = 'data/SandersWk5'

if not os.path.exists(DATA_DIR):
    os.makedirs(DATA_DIR)

since = "2020-03-02"
until = "2020-03-05"
queries = ['@BernieSanders', 'bernie', 'Bernie']

for query in queries:
    tweet_list = download_query_tweets(query, since, until)
    
    outfilename = "{}/{}_{}_to_{}.json".format(DATA_DIR, query.replace(' ','_'), since, until)
    
    print('Downloaded {} tweets...\n'.format(len(tweet_list)))
    with open(outfilename,'w') as out:
        for tweet in tweet_list:
            out.write(json.dumps(tweet, default=jsonconverter) + '\n')

Downloading tweets for query: '@BernieSanders' from 2020-03-02 to 2020-03-05
Downloaded 1000 tweets...

Downloading tweets for query: 'bernie' from 2020-03-02 to 2020-03-05
Downloaded 1000 tweets...

Downloading tweets for query: 'Bernie' from 2020-03-02 to 2020-03-05
Downloaded 1000 tweets...



In [6]:
DATA_DIR = 'data/BidenWk5'

if not os.path.exists(DATA_DIR):
    os.makedirs(DATA_DIR)

since = "2020-03-02"
until = "2020-03-05"
queries = ['@JoeBiden', 'Biden', 'biden']

for query in queries:
    tweet_list = download_query_tweets(query, since, until)
    
    outfilename = "{}/{}_{}_to_{}.json".format(DATA_DIR, query.replace(' ','_'), since, until)
    
    print('Downloaded {} tweets...\n'.format(len(tweet_list)))
    with open(outfilename,'w') as out:
        for tweet in tweet_list:
            out.write(json.dumps(tweet, default=jsonconverter) + '\n')

Downloading tweets for query: '@JoeBiden' from 2020-03-02 to 2020-03-05
Downloaded 1000 tweets...

Downloading tweets for query: 'Biden' from 2020-03-02 to 2020-03-05
Downloaded 1000 tweets...

Downloading tweets for query: 'biden' from 2020-03-02 to 2020-03-05
Downloaded 1000 tweets...
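
The twenty cells above differ only in the candidate, the week number, and the date range, so the same downloads could be driven by a pair of loops. The following is a sketch of that idea, not the code that produced the output above. `build_jobs` and `run_jobs` are hypothetical helper names, and the download function is passed in as a parameter so the loop logic can be checked without network access.

```python
import json
import os

# Search terms per candidate, copied from the cells above.
CANDIDATE_QUERIES = {
    "Warren": ["@ewarren", "warren", "Warren"],
    "Klobuchar": ["@amyklobuchar", "klobuchar", "Klobuchar"],
    "Sanders": ["@BernieSanders", "bernie", "Bernie"],
    "Biden": ["@JoeBiden", "Biden", "biden"],
}

# (since, until) pairs for weeks 1 to 5, copied from the cells above.
WEEKS = [
    ("2020-02-03", "2020-02-09"),
    ("2020-02-10", "2020-02-16"),
    ("2020-02-17", "2020-02-23"),
    ("2020-02-24", "2020-03-01"),
    ("2020-03-02", "2020-03-05"),
]


def build_jobs(base_dir="data"):
    """Return one (data_dir, query, since, until) tuple per download."""
    jobs = []
    for week_number, (since, until) in enumerate(WEEKS, start=1):
        for last_name, queries in CANDIDATE_QUERIES.items():
            data_dir = os.path.join(base_dir, f"{last_name}Wk{week_number}")
            for query in queries:
                jobs.append((data_dir, query, since, until))
    return jobs


def run_jobs(jobs, download, converter=str):
    """Run each download and write one JSON object per line."""
    for data_dir, query, since, until in jobs:
        os.makedirs(data_dir, exist_ok=True)
        tweet_list = download(query, since, until)
        filename = f"{query.replace(' ', '_')}_{since}_to_{until}.json"
        with open(os.path.join(data_dir, filename), "w") as out:
            for tweet in tweet_list:
                out.write(json.dumps(tweet, default=converter) + "\n")
```

In the notebook this would be called as `run_jobs(build_jobs(), download_query_tweets, jsonconverter)`. One caveat applies to both this sketch and the original cells: on a case-insensitive file system, the output files for a lowercase query and its capitalized twin (for example `warren` and `Warren`) resolve to the same path, so the later download overwrites the earlier one.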

