## Query Logic
- no retweets
- at least 50 characters
- contains at least one birth control keyword
- in the US? (not yet decided)
- English language

`query = (keyword OR keyword OR keyword OR "key phrase") lang:en -is:retweet -is:reply`
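The query above can be assembled from a keyword list. A minimal sketch, where `build_query` is a hypothetical helper. The `exclude_replies` flag switches between `-is:reply` (as written above) and no reply filter. The 50-character minimum is not part of the query and has to be applied after download. The 1024-character `max_length` is an assumed limit for Academic Research access; check the query-building docs for your access level.

```python
def build_query(keywords, exclude_replies=True, max_length=1024):
    """Build a query such as ("iud" OR "mirena") lang:en -is:retweet -is:reply.

    max_length is an ASSUMED query-length limit; adjust it to your access level.
    """
    # Quote every keyword so multi-word phrases are matched as exact phrases
    quoted = ['"' + kw + '"' for kw in keywords]
    query = '(' + ' OR '.join(quoted) + ') lang:en -is:retweet'
    if exclude_replies:
        query += ' -is:reply'
    if len(query) > max_length:
        raise ValueError(f'query is {len(query)} characters; limit is {max_length}')
    return query


print(build_query(['iud', 'birth control pill']))
# ("iud" OR "birth control pill") lang:en -is:retweet -is:reply
```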

Return: tweet ID, tweet text, parent ID (or whether the tweet is a reply), date, and matching keyword
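The search endpoint does not report which keyword caused a match, so the "matching keyword" field has to be recomputed locally. The same pass can enforce the 50-character minimum. A sketch with a hypothetical helper `match_keywords`: matching is case-insensitive and requires the keyword not to be embedded in a longer word. This only approximates Twitter's own tokenization, so counts may differ slightly from what the API matched.

```python
import re


def match_keywords(text, category_keywords, min_length=50):
    """Return (category, keyword) pairs found in text, or [] if text is too short."""
    if len(text) < min_length:
        return []
    lowered = text.lower()
    matches = []
    for category, keywords in category_keywords.items():
        for kw in keywords:
            # (?<!\w) and (?!\w) stop e.g. "bc pill" from matching inside "abc pills"
            pattern = r'(?<!\w)' + re.escape(kw.lower()) + r'(?!\w)'
            if re.search(pattern, lowered):
                matches.append((category, kw))
    return matches


# Made-up example text, for illustration only
example = "Day three with my new IUD and the cramps are finally easing off a bit"
print(match_keywords(example, {'IUD': ['iud', 'mirena'], 'Pill': ['bc pill']}))
# [('IUD', 'iud')]
```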

`tweet_fields = created_at, id, text, conversation_id, public_metrics`

I could run this query and then, for each result, search for up to 10 tweets that share its conversation ID.
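A sketch of that follow-up step, using the v2 `conversation_id:` search operator. Both helpers are hypothetical. The default of 10 reflects the note above and assumes the full-archive endpoint's lower bound for `max_results` is 10; the field list matches the one used in the collection cell.

```python
def unique_conversation_ids(json_response):
    """Conversation IDs in one search response, in first-seen order."""
    seen = []
    for tweet in json_response.get('data', []):
        cid = tweet.get('conversation_id')
        if cid and cid not in seen:
            seen.append(cid)
    return seen


def conversation_query_params(conversation_id, max_results=10,
                              fields='created_at,conversation_id,public_metrics'):
    """Parameters for a follow-up search restricted to one conversation."""
    return {'query': f'conversation_id:{conversation_id}',
            'tweet.fields': fields,
            'max_results': max_results}
```

With the notebook's `search_url`, `headers`, and `connect_to_endpoint`, each ID could then be fetched with `connect_to_endpoint(search_url, headers, conversation_query_params(cid))`, sleeping between calls as the main loop does.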

## References
- https://developer.twitter.com/en/docs/twitter-api/tweets/search/integrate/build-a-query
- https://developer.twitter.com/en/docs/tutorials/building-high-quality-filters
- https://developer.twitter.com/en/docs/tutorials/getting-historical-tweets-using-the-full-archive-search-endpoint
- https://developer.twitter.com/en/portal/dashboard

## Instructions
- Get your academic developer license and the associated keys and bearer token
- Set up a new app in the Twitter developer portal (where you can monitor your rate usage and how many tweets you've downloaded)
- Set your environment variables in your terminal
- Run this script
- If the script hits your rate limit (300 full-archive requests per 15 minutes), wait for the window to reset, paste in the last pagination token, and restart
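The wait-and-restart step could be partly automated. A sketch with two hypothetical helpers. `seconds_until_reset` assumes the response carries an `x-rate-limit-reset` header holding a Unix timestamp (it falls back to a full 15-minute window otherwise); using it would require a variant of `connect_to_endpoint` that returns the `requests` response on status 429 instead of raising. `last_saved_token` reads the `pagination_tokens.txt` history file the collection loop appends to. Because the loop records each token before requesting its page, the last line is the token to resume from.

```python
import os
import time


def seconds_until_reset(headers, now=None, buffer=5):
    """Seconds to sleep until the rate-limit window resets, plus a small buffer."""
    now = time.time() if now is None else now
    reset = headers.get('x-rate-limit-reset')
    if reset is None:
        return 15 * 60  # no header: wait out a whole 15-minute window
    return max(0, int(reset) - int(now)) + buffer


def last_saved_token(token_file):
    """Return the last non-empty line of the token history file, or None."""
    if not os.path.exists(token_file):
        return None
    with open(token_file) as f:
        tokens = [line.strip() for line in f if line.strip()]
    return tokens[-1] if tokens else None
```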

In [14]:
category_keywords_dict = {'IUD': ['intrauterine device',
                                  'iud',
                                  'intrauterine system',
                                  'uterine jewelry',
                                  'copper-t',
                                  'cu-iud',
                                  'paragard',
                                  'paraguard',
                                  'mirena',
                                  'kyleena',
                                  'liletta',
                                  'skyla'],
                          'Implant': ['nexplanon',
                                     'implanon', 
                                     'arm implant',
                                     'bc implant',
                                     'b.c. implant',
                                     'birth control implant',
                                     'contraception implant',
                                     'contraceptive implant',
                                     'birth control rod',
                                     'contraception rod',
                                     'contraceptive rod'],
                          'Pill': ['oc pill',
                                   'o.c. pill',
                                   'bc pill',
                                   'b.c. pill',
                                   'chc pill',
                                   'coc pill',
                                   'birth control pill',
                                   'contraceptive pill',
                                   'contraception pill',
                                   'oral contraceptive',
                                   'oral contraception',
                                   'oral birth control',
                                   'combined hormonal contraceptive',
                                   'combined bcp',
                                   'combined ocp',
                                   'combined b.c.p.',
                                   'progestin only pill',
                                   'progestin bcp',
                                   'progestin ocp']}

category_keywords_dict_small = {'IUD': ['iud',
                                        'paragard',
                                        'paraguard',
                                        'kyleena'],
                                'Implant': ['nexplanon', 
                                            'implanon', 
                                            'bc implant',
                                            'birth control implant',
                                            'contraception implant',
                                            'contraceptive implant'],
                                'Pill': ['bc pill',
                                         'birth control pill',
                                         'contraceptive pill',
                                         'contraception pill',
                                         'oral contraceptive',
                                         'oral contraception',
                                         'oral birth control']}
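Since the small dictionary is meant to be a subset of the full one, a quick consistency check can catch a misspelled category key or a keyword missing from the full list. A sketch with a hypothetical helper: `keywords_missing_from(category_keywords_dict_small, category_keywords_dict)` returns an empty list only if every category name and keyword in the small dictionary also appears in the full one.

```python
def keywords_missing_from(small, full):
    """List (category, keyword) pairs in `small` that do not appear in `full`."""
    missing = []
    for category, keywords in small.items():
        full_keywords = set(full.get(category, []))  # unknown category -> empty set
        for kw in keywords:
            if kw not in full_keywords:
                missing.append((category, kw))
    return missing


print(keywords_missing_from({'Implant': ['nexplanon']}, {'Imlant': ['nexplanon']}))
# [('Implant', 'nexplanon')]
```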

In [15]:
api_key = ''
api_secret_key = ''
bearer_token = ''

In [16]:
output_directory_path = '/Volumes/Passport-1/data/birth-control/twitter/new-from-api'

In [37]:
import requests
import os
import json
import time


def create_headers(bearer_token):
    # Full-archive search authenticates with the app's bearer token
    headers = {"Authorization": "Bearer {}".format(bearer_token)}
    return headers


def connect_to_endpoint(url, headers, params):
    # Request the url argument rather than relying on a global variable
    response = requests.request("GET", url, headers=headers, params=params)
    if response.status_code != 200:
        # e.g. 429 = rate limit hit, 400 = malformed query, 401 = bad token
        raise Exception(response.status_code, response.text)
    return response.json()


# To set your environment variables, run the following lines IN THE TERMINAL:
# export 'BEARER_TOKEN'=''
# export 'CONSUMER_SECRET'=''
# export 'CONSUMER_KEY'=''

# bearer_token = os.environ.get("BEARER_TOKEN")


# SMALL TEST QUERY
# Optional params: start_time,end_time,since_id,until_id,max_results,next_token,
# expansions,tweet.fields,media.fields,poll.fields,place.fields,user.fields
# query_params = {'query': '(from:twitterdev -is:retweet) OR #twitterdev','tweet.fields': 'author_id'}
# query_params = {'query': 'from:maria_antoniak lang:en -is:retweet -is:reply',
#                 'tweet.fields': 'created_at,conversation_id,public_metrics',
#                 'max_results': 10}

# SET THE QUERY PARAMETERS
bearer_token = os.environ.get("BEARER_TOKEN")  # set in the terminal as described above; never hard-code the token
search_url = "https://api.twitter.com/2/tweets/search/all"
target_bc = 'Pill'
target_keywords = ['"' + w + '"' for w in category_keywords_dict_small[target_bc]]
# is:reply (not -is:reply) restricts this run to replies, which are saved under /replies/
query = '(' + ' OR '.join(target_keywords) + ')' + ' lang:en -is:retweet is:reply'
max_results = 500
start_time = '2007-01-01T17:00:00Z'
end_time = '2021-01-01T17:00:00Z'
fields = 'created_at,conversation_id,public_metrics'
i = 0

# RUN THE FIRST QUERY
headers = create_headers(bearer_token)
parameters = {'query': query,
              'tweet.fields': fields,
              'max_results': max_results,
              'start_time': start_time,
              'end_time': end_time}
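json_response = connect_to_endpoint(search_url, headers, parameters)
os.makedirs(output_directory_path + '/replies', exist_ok=True)  # make sure the output folder exists
with open(output_directory_path + '/replies/replies.' + target_bc + '.' + str(i) + '.json', 'w') as outfile:
    json.dump(json_response, outfile)
# 'next_token' is absent when every result fits on the first page
pagination_token = json_response['meta'].get('next_token')
i += 1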

# UNCOMMENT THIS IF RESTARTING AFTER INTERRUPTION (INCLUDE THE MOST RECENT PARAMETERS)
# pagination_token = ''
# i = 

# REPEAT THE QUERY UNTIL YOU REACH THE END OF THE RESULTS
while pagination_token:

    print(i, pagination_token)

    # SAVE OUR HISTORY SO FAR
    with open(output_directory_path + '/replies/replies.' + target_bc + '.pagination_tokens.txt', 'a') as outfile:
        outfile.write(pagination_token + '\n')

    # GET THE QUERY RESULTS AND SAVE THEM
    parameters = {'query': query,
                  'tweet.fields': fields,
                  'max_results': max_results,
                  'next_token': pagination_token,
                  'start_time': start_time,
                  'end_time': end_time}
    json_response = connect_to_endpoint(search_url, headers, parameters)
    with open(output_directory_path + '/replies/replies.' + target_bc + '.' + str(i) + '.json', 'w') as outfile:
        json.dump(json_response, outfile)

    # PREPARE FOR THE NEXT QUERY
    pagination_token = None
    if 'next_token' in json_response['meta']:
        pagination_token = json_response['meta']['next_token']
        i += 1
    time.sleep(2)

    # OPTIONAL: skip tokens already recorded in the history file written above
    # pagination_history = [line.strip() for line in open(output_directory_path + '/replies/replies.' + target_bc + '.pagination_tokens.txt')]
    # if pagination_token in pagination_history:
    #     print('Already ran this query! Trying next pagination token: ' + pagination_history[-1])
    #     pagination_token = pagination_history[-1]

1 b26v89c19zqg8o3foseuuuq2gyz2bonpo74xbtw5qzou5
2 b26v89c19zqg8o3foserwxx1xivdxhkuivzornt5o0hod
3 b26v89c19zqg8o3fosbsz8t37jugxhtiq4y7j4cahy83h
4 b26v89c19zqg8o3fos8uw5mfxe6nio7cm2hh87al9yp31
5 b26v89c19zqg8o3fos8ssbgarncfc8h930lfju877a9a5
6 b26v89c19zqg8o3fos5vj6o9zd3h7hb87s1rddc57sxvh
7 b26v89c19zqg8o3fos5sle0ifhxw3kxmc7e3papj9r4hp
8 b26v89c19zqg8o3fo7md2z69s24f4p9k1regt1pl7q1z1
9 b26v89c19zqg8o3fo7jezldhx0zg9wxfnzdfcxhegw399
10 b26v89c19zqg8o3fo7jcgw8r1ite720q965v3o73xs8zh
11 b26v89c19zqg8o3fo7gesny077m8xbrwoglm8zgh1jkvx
12 b26v89c19zqg8o3fo7dffxj283p05ph2qyow473o8tawt
13 b26v89c19zqg8o3fo7agi6ao58i79gp5ld1vnr34zrou5
14 b26v89c19zqg8o3fo77h5ogpts1v5qifudwrgkkqkuhkt
15 b26v89c19zqg8o3fo74gya0m625yh1ddhe6v2awwpixdp
16 b26v89c19zqg8o3fo71hlpych9uj5lwqdy0qk38umtda5
17 b26v89c19zqg8o3fo71dthw8neylf3ahfctgxi3zro0ot
18 b26v89c19zqg8o3fo6yfpr8phxljji1qogr69aw8ty7wd
19 b26v89c19zqg8o3fo6vfiy4h5h60pdv3i6736y9l4314t
20 b26v89c19zqg8o3fnmc00apix0votnjjtfykc9dpif62l
21 b26v89c19zqg8o3fnm908ix6tq
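Once collection finishes, the numbered page files can be merged back into a single list of tweets. A sketch with a hypothetical helper `load_saved_pages`. It reads files named `<prefix>.<i>.json` in numeric order (so page 10 comes after page 9, not after page 1) and ignores the `.pagination_tokens.txt` history file. For this run it would be called as `load_saved_pages(output_directory_path + '/replies', 'replies.' + target_bc)`.

```python
import glob
import json
import os
import re


def load_saved_pages(directory, prefix):
    """Concatenate the 'data' lists from <directory>/<prefix>.<i>.json files."""
    pattern = re.compile(re.escape(prefix) + r'\.(\d+)\.json$')
    pages = []
    for path in glob.glob(os.path.join(glob.escape(directory), glob.escape(prefix) + '.*.json')):
        m = pattern.search(os.path.basename(path))
        if m:
            pages.append((int(m.group(1)), path))
    tweets = []
    for _, path in sorted(pages):  # numeric page order
        with open(path) as f:
            tweets.extend(json.load(f).get('data', []))  # pages without 'data' add nothing
    return tweets
```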

In [49]:
len(json_response['data'])

10

In [10]:
print(json.dumps(json_response, indent=4, sort_keys=True))

{
    "data": [
        {
            "conversation_id": "1387547993755332608",
            "created_at": "2021-04-28T23:23:15.000Z",
            "id": "1387547993755332608",
            "public_metrics": {
                "like_count": 0,
                "quote_count": 0,
                "reply_count": 0,
                "retweet_count": 0
            },
            "text": "mom bought me grapefruit &amp; forgot i can\u2019t eat it because of my BC implant, this is the peak of depression \ud83c\udf4a"
        },
        {
            "conversation_id": "1387546664978853890",
            "created_at": "2021-04-28T23:17:58.000Z",
            "id": "1387546664978853890",
            "public_metrics": {
                "like_count": 0,
                "quote_count": 0,
                "reply_count": 0,
                "retweet_count": 0
            },
            "text": "Man Nexplanon really fucked me up \ud83d\ude44"
        },
        {
            "conversation_id": "13875464291958865

In [13]:
json_response['meta']['next_token']

'b26v89c19zqg8o3fosttqhcx7b9254xtio25qgktofvct'
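To get from raw responses like the one printed above to the fields listed under Query Logic, each tweet can be flattened into a row. A sketch with a hypothetical helper `tweet_rows`. It treats a tweet as a reply when its `id` differs from its `conversation_id` (a heuristic: the conversation ID is the ID of the tweet that started the thread). It also unescapes HTML entities, since the text in the output above contains `&amp;`.

```python
import html


def tweet_rows(json_response):
    """Flatten one search response into a list of dicts, one per tweet."""
    rows = []
    for tweet in json_response.get('data', []):
        metrics = tweet.get('public_metrics', {})
        rows.append({
            'id': tweet['id'],
            'created_at': tweet.get('created_at'),
            'conversation_id': tweet.get('conversation_id'),
            # heuristic: a thread's first tweet has id == conversation_id
            'is_reply': tweet.get('conversation_id') != tweet['id'],
            'text': html.unescape(tweet.get('text', '')),  # '&amp;' -> '&'
            'like_count': metrics.get('like_count', 0),
            'reply_count': metrics.get('reply_count', 0),
        })
    return rows
```

The rows can be passed straight to `pandas.DataFrame(rows)` if a table is more convenient.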