# CACTUS Week 1
## Import essential libraries

In [43]:
# Code thanks to https://python.plainenglish.io/using-python-and-twitter-api-2-to-get-user-details-437e442b4be9


# For sending GET requests to the API
import requests
# For saving access tokens and for file management when creating and adding to the dataset
import os
# For dealing with json responses we receive from the API
import json
# For displaying the data after
import pandas as pd
# For saving the response data in CSV format
import csv
# For parsing the dates received from twitter in readable formats
import datetime
import dateutil.parser
import unicodedata
# To add wait time between requests
import time

1. To be able to send your first request to the Twitter API, you need to have a developer account.
2. Next, create a project and connect an App through the developer portal.
3. Go to the developer portal dashboard
4. Sign in with your developer account
5. Create a new project, give it a name, a use-case based on the goal you want to achieve, and a description.
6. If everything is successful, you should see a page containing your keys and tokens. We will use one of these to access the API: look for the BEARER TOKEN. See https://miro.medium.com/max/2400/1*Y20zm9Vf1k5uRMRTMkHRkQ.png

7. The next step is to create an auth() function that will have the “Bearer Token” from the app we just created.
8. Since this Bearer Token is sensitive information, do not share it with anyone, including teammates who work on the same code.
9. So, we will save the token in an “environment variable”.
10. Finally, we will create our auth() function, which retrieves the token from the environment.

In [44]:
os.environ['TOKEN'] = ''  # Paste your bearer token between the quotes; do not commit or share the real value
def auth():
    return os.getenv('TOKEN')
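
An empty or missing token only shows up later as a failed request. The variant below is a minimal sketch that fails early instead; the name `auth_checked` and its `env_var` parameter are hypothetical and not part of the original notebook.

```python
import os


def auth_checked(env_var="TOKEN"):
    """Return the bearer token stored in an environment variable.

    Hypothetical helper: raises instead of returning None or '' so that a
    missing token is noticed before any request is sent.
    """
    token = os.getenv(env_var)
    if not token:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your bearer token first."
        )
    return token
```

Calling `auth_checked()` in place of `auth()` keeps the rest of the notebook unchanged.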

## Create Headers
Next, we will define a function that will take our bearer token, pass it for authorization and return headers we will use to access the API.

In [45]:
def create_headers(bearer_token):
    headers = {"Authorization": "Bearer {}".format(bearer_token)}
    return headers

def bearer_oauth(r):
    """
    Method required by bearer token authentication.
    Reads the global bearer_token, which must be set before the first request.
    """
    r.headers["Authorization"] = f"Bearer {bearer_token}"
    r.headers["User-Agent"] = "v2UserLookupPython"
    return r
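
Because `bearer_oauth` depends on a global variable, it fails with a `NameError` if a request is sent before `bearer_token` is assigned. One possible alternative, sketched here under the assumption that you are free to change how the callable is built, is a small factory that captures the token explicitly. The name `make_bearer_auth` is hypothetical.

```python
def make_bearer_auth(token):
    """Return a callable that adds bearer-token headers to a request.

    Hypothetical helper: the token is captured in a closure, so no global
    variable is needed.
    """
    def bearer_oauth(r):
        r.headers["Authorization"] = f"Bearer {token}"
        r.headers["User-Agent"] = "v2UserLookupPython"
        return r

    return bearer_oauth
```

The returned callable can be passed wherever the notebook passes `bearer_oauth`, for example `requests.request("GET", url, auth=make_bearer_auth(auth()))`.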

# Create URL
Now that we can access the API, we will build the request for the endpoint we are going to use and the parameters we want to pass.

In [46]:
# Template kept from the referenced article; the create_url defined further down replaces it.
def create_url(keyword):

    search_url = "https://api.twitter.com/2/users/" #Change to the endpoint you want to collect data from
    # https://developer.twitter.com/en/docs/twitter-api/tweets/search/api-reference/get-tweets-search-recent
    #change params based on the endpoint you are using
    query_params = {'query': keyword,
                    'tweet.fields': 'id,text,author_id,in_reply_to_user_id,geo,conversation_id,created_at,lang,public_metrics,referenced_tweets,reply_settings,source',
                    'user.fields': 'id,name,username,created_at,description,public_metrics,verified',
                    'place.fields': 'full_name,id,country,country_code,geo,name,place_type',
                    'next_token': {}}
    return (search_url, query_params)


def create_url(user_names_list, user_fields ):
    # Specify the usernames that you want to lookup below
    # You can enter up to 100 comma-separated values.
    user_names = ','.join(user_names_list) if len(user_names_list)>1 else user_names_list[0]
    
    usernames = f"usernames={user_names}"
    url = "https://api.twitter.com/2/users/by?{}&{}".format(usernames, user_fields)
    print(url)
    return url

def create_url_id(id):
    url = "https://api.twitter.com/2/users/{}/tweets".format(id)
    print(url)
    return url

The defined function above contains two pieces:

## search_url:

The link to the "endpoint" we want to access. An endpoint identifies what we want the API to do. E.g., if we want a user's profile details, the endpoint is "user lookup"; if we want the posts by a user, the endpoint is that user's tweets.

Twitter’s API has a lot of different endpoints. You can look them up here: https://miro.medium.com/max/700/1*1oJExGGK151WfQJ6LIikww.png

Right now, this notebook calls the user lookup endpoint (`/2/users/by`) and the user tweets endpoint (`/2/users/{id}/tweets`).

## query_params:

The parameters that the endpoint offers and that we can use to customize the request we want to send. E.g., for the "user lookup" endpoint, the query parameter is the username (screen name) of each user we want.

1. Some parameters select what the request is about, e.g. which users to look up:
```usernames={user_names}```

2. Some fields are optional, e.g., you can filter what subset of the full data you want. Only the user data, only the tweet data, or only the place data.

```"user.fields=description,created_at,public_metrics"```

3. One field lets you "turn the page" when there are hundreds or thousands of results, because the response returns results in pages of limited size (up to 500 at a time for full-archive search, fewer for other endpoints). The "next_token" value returned with one page lets you request the next page of results.
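
The parameters above can also be assembled with the standard library rather than by string formatting. This is a minimal sketch, not the notebook's own method: `build_user_lookup_url` is a hypothetical name, and it takes the user fields as a list instead of the pre-formatted `user.fields=...` string used elsewhere in this notebook.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.twitter.com/2/users/by"


def build_user_lookup_url(user_names_list, user_fields):
    """Build a user lookup URL from a list of usernames and a list of fields.

    Hypothetical helper. The 100-username limit comes from the comment in
    create_url above.
    """
    if not 1 <= len(user_names_list) <= 100:
        raise ValueError("Pass between 1 and 100 usernames.")
    query = urlencode(
        {
            "usernames": ",".join(user_names_list),
            "user.fields": ",".join(user_fields),
        },
        safe=",",  # keep the commas readable instead of encoding them as %2C
    )
    return f"{BASE_URL}?{query}"


print(build_user_lookup_url(["pankajtiwari2", "NASA"], ["id", "description"]))
```

Using `urlencode` means any unusual characters in a value are escaped for you, which plain `format` calls do not do.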



# Connect to Endpoint
Now that we have the URL, headers, and parameters we want, we will create a function that will put all of this together and connect to the endpoint.
The function below will send the “GET” request and if everything is correct (response code 200), it will return the response in “JSON” format.
Note: next_token is set to “None” by default since we only care about it if it exists.

In [47]:
def connect_to_endpoint(url, headers, params, next_token = None):
    params['next_token'] = next_token   #params object received from create_url function
    response = requests.request("GET", url, headers = headers, params = params)
    print("Endpoint Response Code: " + str(response.status_code))
    if response.status_code != 200:
        raise Exception(response.status_code, response.text)
    return response.json()

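# This definition replaces the one above; it authenticates through bearer_oauth instead of a headers dict.
def connect_to_endpoint(url):
    response = requests.request("GET", url, auth=bearer_oauth)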
    print(response.status_code)
    if response.status_code != 200:
        raise Exception(
            "Request returned an error: {} {}".format(
                response.status_code, response.text
            )
        )
    return response.json()
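
Neither function above follows the pages of a long result by itself. The loop below is a rough sketch of how that could look, assuming each response carries the token for the next page under `meta.next_token`, as paginated Twitter API v2 responses do. The names `collect_pages` and `fetch_page` are hypothetical; `fetch_page` stands for any function that takes a token (or `None` for the first page) and returns the parsed JSON, and the request parameter used to send the token back depends on the endpoint.

```python
def collect_pages(fetch_page, max_pages=5):
    """Gather the 'data' items from up to max_pages pages of results.

    fetch_page is a caller-supplied function: token -> parsed JSON dict.
    """
    items = []
    token = None  # None requests the first page
    for _ in range(max_pages):
        page = fetch_page(token)
        items.extend(page.get("data", []))
        # Assumption: the token for the next page lives under meta.next_token
        token = page.get("meta", {}).get("next_token")
        if token is None:
            break  # no further pages
    return items
```

The `max_pages` cap is a deliberate safety limit so that a large account does not use up the request quota in one run.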


# Putting it all Together
Now that we have all the functions we need, let's test putting them all together to create our first request!

In the next cell, we will set up our inputs: the bearer_token, the list of usernames to look up, and the user fields we want returned.



In [48]:
#Inputs for the request
bearer_token = auth()
users_list = ['pankajtiwari2','NASA']

user_fields  = "user.fields=id,description,created_at,public_metrics,verified,url"




Now we will create the URL and get the response from the API.

The Twitter API returns its response in JavaScript Object Notation “JSON” format.

To work with the response and break it down, we will use the JSON encoder and decoder for Python, which we imported earlier. You can find more information about the library here: https://docs.python.org/3/library/json.html

If the response code printed by the code below is 200, then the request was successful.

In [49]:
url = create_url(users_list,user_fields)
json_response = connect_to_endpoint(url)

https://api.twitter.com/2/users/by?usernames=pankajtiwari2,NASA&user.fields=id,description,created_at,public_metrics,verified,url
200
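
A code other than 200 is often 429, which means the rate limit was reached. The helper below is a small sketch of how the imported `time` module could be used in that case. It assumes the response includes an `x-rate-limit-reset` header holding a Unix timestamp; the function name `seconds_until_reset` is hypothetical.

```python
import time


def seconds_until_reset(headers, now=None):
    """Return how many seconds to wait before the rate limit resets.

    headers: a mapping of response headers, e.g. response.headers.
    now: current Unix time; defaults to time.time().
    """
    now = time.time() if now is None else now
    reset_at = headers.get("x-rate-limit-reset")
    if reset_at is None:
        return 0.0  # header absent, nothing to wait for
    return max(0.0, float(reset_at) - now)
```

With a `requests` response, one way to use it is `time.sleep(seconds_until_reset(response.headers))` before retrying the same URL.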


Let's print the response in a readable format using the json library's functions.

In [50]:
print(json.dumps(json_response, indent=4, sort_keys=True))

{
    "data": [
        {
            "created_at": "2010-07-18T22:49:46.000Z",
            "description": "living in past. want to change the world and create an utopia",
            "id": "168281471",
            "name": "Pankaj Kumar",
            "public_metrics": {
                "followers_count": 63,
                "following_count": 86,
                "listed_count": 0,
                "tweet_count": 155
            },
            "url": "",
            "username": "pankajtiwari2",
            "verified": false
        },
        {
            "created_at": "2007-12-19T20:20:32.000Z",
            "description": "There's space for everybody. \u2728",
            "id": "11348282",
            "name": "NASA",
            "public_metrics": {
                "followers_count": 50135139,
                "following_count": 178,
                "listed_count": 96987,
                "tweet_count": 67298
            },
            "url": "https://t.co/9NkQJKAnuU",
            "userna

# Exploring the JSON response

Now let's break down the returned JSON response.
The response is read as a Python dictionary whose keys contain either data or more dictionaries. The top-level key we use here is:

## Data
A list of dictionaries, where each dictionary holds the data for one user. Here is an example of how to retrieve the time the first user's account was created:

In [51]:
json_response['data'][0]['created_at']

'2010-07-18T22:49:46.000Z'
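
The timestamp comes back as a string. To compare or reformat dates, it can be parsed into a `datetime` object. The sketch below uses only the standard library and assumes every timestamp has the same shape as the one shown above; `parse_twitter_time` is a hypothetical name. The `dateutil.parser` module imported earlier offers a more forgiving parser.

```python
from datetime import datetime, timezone


def parse_twitter_time(value):
    """Parse a timestamp such as '2010-07-18T22:49:46.000Z' as UTC."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")
    # The trailing 'Z' means UTC, so attach that timezone explicitly
    return parsed.replace(tzinfo=timezone.utc)


created = parse_twitter_time("2010-07-18T22:49:46.000Z")
print(created.year, created.month, created.day)
```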

# Write to CSV file

In [52]:
df = pd.DataFrame(json_response['data'])
df.to_csv('handles_to_ids.csv')
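
In the CSV written above, the nested `public_metrics` dictionary ends up as text inside a single column. If separate columns are preferred, one option is to flatten each record first. This is a sketch using the `csv` module imported earlier; the helper names `flatten_user` and `users_to_csv` are hypothetical, and the sample record reuses values from the response printed above.

```python
import csv
import io


def flatten_user(user):
    """Copy a user record, expanding public_metrics into separate keys."""
    row = {key: value for key, value in user.items() if key != "public_metrics"}
    for name, value in user.get("public_metrics", {}).items():
        row[f"public_metrics.{name}"] = value
    return row


def users_to_csv(users, handle):
    """Write flattened user records to an open text file handle."""
    rows = [flatten_user(user) for user in users]
    fieldnames = sorted({key for row in rows for key in row})
    writer = csv.DictWriter(handle, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)


sample = [
    {
        "id": "11348282",
        "username": "NASA",
        "public_metrics": {"followers_count": 50135139, "tweet_count": 67298},
    }
]
buffer = io.StringIO()  # stands in for a real file opened with newline=""
users_to_csv(sample, buffer)
print(buffer.getvalue())
```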

# Get timelines for the ids in df

In [53]:
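tweet_dataset = pd.DataFrame()
for user_id in df['id']:  # user_id avoids shadowing the built-in id()
    url = create_url_id(user_id)
    json_response = connect_to_endpoint(url)
    print(json_response)
    # The response may omit 'data' when a user has no tweets, so fall back to an empty list
    tweets_df = pd.DataFrame(json_response.get('data', []))
    tweet_dataset = pd.concat([tweet_dataset, tweets_df], ignore_index=True)

tweet_dataset.to_csv('alltweets.csv')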

https://api.twitter.com/2/users/168281471/tweets
200
{'data': [{'id': '1459900571746406401', 'text': 'RT @RjSriram15: @AmitShah @narendramodi and many @BJP4India leaders had this question in past on why they’re not able to consolidate Hindu…'}, {'id': '1457192363726557192', 'text': 'I just published Bitcoins Vs Dollars https://t.co/7kRS36afiI #bitcoin #Dollar #cryptocurrency'}, {'id': '1454682383416901633', 'text': 'I just published Extracting History of #American Revolutionary War Using #NLP https://t.co/RKXujyqf7v \n\n#python #Data'}, {'id': '1448478046097342466', 'text': 'RT @Nabu: Too much knowledge for the entire humanity? How we are going to handle and its Impacts. by @pankajtiwari2 (my photo) https://t.co…'}, {'id': '1446640472621867008', 'text': 'RT @DrMichaelHENG1: Too much knowledge for the entire humanity? How we are going to handle and its Impacts. by @pankajtiwari2 https://t.co/…'}, {'id': '1446640363834200066', 'text': "RT @neo4j: In this week's #twin4j, @pankajtiwari2 ut