# Hate Speech on Twitter
## Deliverable 01
## Amir ElTabakh
## 3/3/2022

This document primarily focuses on accessing the Twitter API, the privileges we have to work with, extended privileges for when we are ready to move forward with our analysis, how we can query Twitter searches, and identifying the features we have to work with.

### Agenda
- Identifying the privileges we have to work with and extended privileges
- Accessing the twitter API with Python.
- Identifying the features/variables available through the Twitter API, and defining them.
- Practice filtering for specific hashtags, such as "#ChineseVirus"

## The Twitter API
Refer to the [Getting Started](https://developer.twitter.com/en/docs/platform-overview) page on the Twitter Developer Platform site so sign up for the twitter API. This shit be smooth sailing. There are three products we can access through the Twitter API. We will be using the most basic product for now called Essential.

#### Essential
Free and immediate access to the Twitter API. No application is required.

- 1 environment per project (irrelevant)
- 500K tweets per month/project
- Cost: free

#### Elevated
Higher levels of access to the Twitter API for free with an approved application.

- 3 environments per project (irrelevant)
- 2M tweets per month/project
- Cost: free

#### Academic Research
For academics who have a research project that requires, or would benefit from, studying Twitter’s conversational data. Access is free. An application is required.

- 1 environments per project (irrelevant)
- 10M tweets per month/project
- Cost: free
- For non-commercial use only

At the time of this writing I use only Essential privileges, however I have applied for Elevated privileges. My application will require 48 hours to process.

### Search Tweets

We will primarily be utilizing the [Search Tweets](https://developer.twitter.com/en/docs/twitter-api/tweets/search/introduction) features. 

Searching for Tweets is an important feature used to surface Twitter conversations about a specific topic or event. While this functionality is present in Twitter, these endpoints provide greater flexibility and power when filtering for and ingesting Tweets so you can find relevant data for your research more easily; build out near-real-time ‘listening’ applications; or generally explore, analyze, and/or act upon Tweets related to a topic of interest. 

Twitter offers two endpoints that allow you to search for Tweets: Recent search and full-archive search. Both of these REST endpoints share a common design and features, including their use of a single search query to filter for Tweets around a specific topic. These search queries are created with a set of operators that match on Tweet and user attributes, such as message keywords, hashtags, and URLs. Operators can be combined into queries with boolean logic and parentheses to help refine the queries matching behavior. 

Once you’ve set up your query and start receiving Tweets, these endpoints support navigating the results both by time and Tweet ID ranges. This is designed to support two common use cases: 

- **Get historical**: Requests are for a period of interest, with no focus on the real-time nature of the data. A single request is made, and all matching data is delivered using pagination as needed. This is the default mode for Search Tweets.
- **Polling or listening**: Requests are made in a "any new Tweets since my last request?" mode. Requests are made on a continual basis, and typically there is a use case focused on near real-time 'listening' for Tweets of interest.

Many operators and query limits are exclusive to Academic Research access, meaning that you must use keys and tokens from an App within a Project with Academic Research access to utilize the additional functionality. You can learn more about this in the endpoint sections below. Both the recent search and the full-archive search endpoints returned Tweets contribute to the monthly Tweet cap.

Now let's go over the two endpoints provided.

#### Recent search
The recent search endpoint allows you to programmatically access filtered public Tweets posted over the last week, and is available to all developers who have a developer account and are using keys and tokens from an App within a Project.

You can authenticate your requests with OAuth 1.0a User Context, OAuth 2.0 App-Only, or OAuth 2.0 Authorization Code with PKCE. However, if you would like to receive private metrics, or a breakdown of organic and promoted metrics within your Tweet results, you will have to use OAuth 1.0a User Context or OAuth 2.0 Authorization Code with PKCE, and pass user Access Tokens that are associated with the user that published the given content. 

This endpoint can deliver up to 100 Tweets per request in reverse-chronological order, and pagination tokens are provided for paging through large sets of matching Tweets. 

When using a Project with Essential or Elevated access, you can use the basic set of operators and can make queries up to 512 characters long. When using a Project with Academic Research access, you have access to additional operators and can make queries up to 1024 characters long. 

#### Full-archive search
*Academic Research access only*

The v2 full-archive search endpoint is only available to Projects with Academic Research access. The endpoint allows you to programmatically access public Tweets from the complete archive dating back to the first Tweet in March 2006, based on your search query.

You can authenticate your requests to this endpoint using OAuth 2.0 App-Only, and the App Access Token must come from an App that is within a Project that has Academic Research access. Since you cannot make a request on behalf of other users (OAuth 1.0a User Context or OAuth 2.0 Authorization Code with PKCE) with this endpoint, you will not be able to pull private metrics. 

This endpoint can deliver up to 500 Tweets per request in reverse-chronological order, and pagination tokens are provided for paging through large sets of matching Tweets. 

Since this endpoint is only available to those that have been approved for Academic Research access, you have access to the full set of search operators and can make queries up to 1024 characters long.

## Accessing the Twitter API with Python

Let's get our hands dirty with some code. I have applied to a Twitter Developer Account and I have my API key and API key secret stored in a seperate document, I will import it below. But first, lets introduce [Tweepy](https://www.tweepy.org/). Tweepy is an easy-to-use Python library for accessing the Twitter API. Let's pip install it so it is accessible in our environment. Note that you only have to pip install tweepy (or any library) once, but you will have to import it in every instance you need it.

In [1]:
# pip install tweepy (use either or)
#!pip install tweep
!python -m pip install git+https://github.com/tweepy/tweepy@master

Collecting git+https://github.com/tweepy/tweepy@master
  Cloning https://github.com/tweepy/tweepy (to revision master) to c:\users\amira\appdata\local\temp\pip-req-build-9uo9gm4y


  ERROR: Error [WinError 2] The system cannot find the file specified while executing command git version
ERROR: Cannot find command 'git' - do you have 'git' installed and in your PATH?


In [2]:
# importing dependencies
import tweepy as tw
import pandas as pd

# Twitter Keys, tokens, and secrets are saved in seperate config file on my local device
from config import consumer_key, consumer_secret, api_key, api_secret, access_token, access_secret, bearer_token

Now that we have access to and have imported our API key and API key secret, we can initialize the tweepy OAuthHandler with the API key and the API secret and use it to get an instance of tweepy API class using which we’ll be making requests to the Twitter API.

In [3]:
# authenticate
auth = tw.OAuthHandler(api_key, api_secret)
api = tw.API(auth, wait_on_rate_limit=True)

A search query is simply a string telling the Twitter API what kind of tweets you want to search for. Imagine using the search bar on Twitter itself without the API. For example, if you want to search for tweets with "#chinesevirus", you’d simply type #chinesevirus in the Twitter search bar and it’ll show you those tweets.

Under the hood, if we’re using a search query with Twitter API, it actually returns the results from what you’d get had you searched for it directly on Twitter. The difference here is we can query for thousands, or millions of tweets, access other metadata about the tweets, and get to work analyzing and generating products with the data.

In [4]:
search_query = "#covid19 -filter:retweets"

Here we set up our search_query to fetch tweets with `#chinesevirus` but also filter out the retweets. You can customize your query based on your requirements. For more, refer to [this guide](https://developer.twitter.com/en/docs/twitter-api/v1/tweets/search/guides/standard-operators).

We can query exact phrases, we could add an 'or' clause, a 'minus' clause to query strings not containing a string, we can query hashtags, tweets sent from specific twitter accounts, a tweet authored tin reply to an account, tweets mentioning accounts, we can filter for or against tweets marked as potentially sensitive, tweets containing images and/or videos, tweets containing URLs (you can filter for strings inside the URL as well), you can filter for tweets sent before a date or since a date, you can filter for tweets containing positive attitudes or negative attitudes, and you can filter for tweets containing questions.

^ A lot I know, but these may come in handy when figuring out how we want to target our tweets.

Now that we can access the API, we will build the request for the endpoint we are going to use and the parameters we want to pass.

In [5]:
# get tweets from the API
tweets = tw.Cursor(api.search_tweets,
              q=search_query,
              lang="en",
              since="2022-02-25").items(20)

# store the API responses in a list
tweets_copy = []
for tweet in tweets:
    tweets_copy.append(tweet)
    
print("Total Tweets fetched:", len(tweets_copy))

Unexpected parameter: since


Forbidden: 403 Forbidden
453 - You currently have Essential access which includes access to Twitter API v2 endpoints only. If you need access to this endpoint, you’ll need to apply for Elevated access via the Developer Portal. You can learn more here: https://developer.twitter.com/en/docs/twitter-api/getting-started/about-twitter-api#v2-access-leve

## A second approach

In [6]:
# For sending GET requests from the API
import requests
# For saving access tokens and for file management when creating and adding to the dataset
import os
# For dealing with json responses we receive from the API
import json
# For displaying the data after
import pandas as pd
# For saving the response data in CSV format
import csv
# For parsing the dates received from twitter in readable formats
import datetime
import dateutil.parser
import unicodedata
#To add wait time between requests
import time

from config import consumer_key, consumer_secret, api_key, api_secret, access_token, access_secret, bearer_token

In [7]:
os.environ['TWITTER_BEARER_TOKEN'] = bearer_token

In [8]:
def auth():
    return os.getenv('TOKEN')

def create_headers(bearer_token):
    headers = {"Authorization": "{}".format(bearer_token)}
    return headers

In [9]:
def create_url(keyword, start_date, end_date, max_results = 10):
    
    search_url = "https://api.twitter.com/2/tweets/search/all" #Change to the endpoint you want to collect data from

    #change params based on the endpoint you are using
    query_params = {'query': keyword,
                    'start_time': start_date,
                    'end_time': end_date,
                    'max_results': max_results,
                    #'expansions': 'author_id,in_reply_to_user_id,geo.place_id',
                    #'tweet.fields': 'id,text,author_id,in_reply_to_user_id,geo,conversation_id,created_at,lang,public_metrics,referenced_tweets,reply_settings,source',
                    #'user.fields': 'id,name,username,created_at,description,public_metrics,verified',
                    #'place.fields': 'full_name,id,country,country_code,geo,name,place_type',
                    'next_token': {}}
    return (search_url, query_params)

In [10]:
def connect_to_endpoint(url, headers, params, next_token = None):
    params['next_token'] = next_token   #params object received from create_url function
    response = requests.request("GET", url, headers = headers, params = params)
    print("Endpoint Response Code: " + str(response.status_code))
    if response.status_code != 200:
        raise Exception(response.status_code, response.text)
    return response.json()

In [11]:
bearer_token = os.environ['TWITTER_BEARER_TOKEN']
headers = create_headers(bearer_token)
keyword = "#covid19 -filter:retweets"
start_time = "2021-03-01T00:00:00.000Z"
end_time = "2021-03-31T00:00:00.000Z"
max_results = 15

In [12]:
url = create_url(keyword, start_time, end_time, max_results)
json_response = connect_to_endpoint(url[0], headers, params = url[1])

Endpoint Response Code: 401


Exception: (401, '{\n  "title": "Unauthorized",\n  "type": "about:blank",\n  "status": 401,\n  "detail": "Unauthorized"\n}')

## A third approach

In [13]:
from requests_oauthlib import OAuth1Session
import os
import json
import requests

from config import consumer_key, consumer_secret, api_key, api_secret, access_token, access_secret

# Be sure to add replace the text of the with the text you wish to Tweet. You can also add parameters to post polls, quote Tweets, Tweet with reply settings, and Tweet to Super Followers in addition to other features.
payload = {"text": "Hello world!"}

# Get request token
request_token_url = "https://api.twitter.com/oauth/request_token?oauth_callback=oob&x_auth_access_type=write"
oauth = OAuth1Session(consumer_key, client_secret=consumer_secret)

try:
    fetch_response = oauth.fetch_request_token(request_token_url)
except ValueError:
    print(
        "There may have been an issue with the consumer_key or consumer_secret you entered."
    )

resource_owner_key = api_key
resource_owner_secret = api_secret
print("Got OAuth token: %s" % resource_owner_key)

There may have been an issue with the consumer_key or consumer_secret you entered.
Got OAuth token: LJdkEEuWMI7aqVTFV4e8e9R7q


## A fourth approach

In [14]:
twitter = OAuth1Session(consumer_key,
                            client_secret=consumer_secret,
                            resource_owner_key=api_key,
                            resource_owner_secret=api_secret)

url = 'https://api.twitter.com/1/account/settings.json'
twitter.get(url)

<Response [401]>

## A fifth approach
Lets try tweepy again.

In [15]:
import tweepy
from config import consumer_key, consumer_secret, api_key, api_secret, access_token, access_secret

# Authentication
consumerKey = consumer_key
consumerSecret = consumer_secret
accessToken = access_token
accessTokenSecret = access_secret

auth = tweepy.OAuthHandler(consumerKey, consumerSecret)
auth.set_access_token(accessToken, accessTokenSecret)
api = tweepy.API(auth)

#Sentiment Analysis
def percentage(part,whole):
    return 100 * float(part)/float(whole) 

keyword = "#covid19 -filter:retweets"
noOfTweet = 10


tweets = tweepy.Cursor(api.search_tweets(q=keyword)).items(noOfTweet)

Unauthorized: 401 Unauthorized
32 - Could not authenticate you.

## Questions Moving Forward

1. Temporal Trends: what time frame are we looking at? The research publication gathered tweets one week before, and one week after Trump made his first tweet referring to Covid-19 as the "Chinese Virus".
2. To build on point 1, if we want to access tweets archived prior to one week ago, we will need to apply for the Academic Research product. The product is free, and will grant us greater flexibility with our work. Of course, I plan to use this privilege reasonably, and make only small requests until we've further developed our project.
3. What notable sentiments are we targetting in the tweets? It is one thing to gather all tweets with the hashtag "#chinesevirus" or "#covid19". The article provided considers a tweet anti-Asian if one of the four conditions were met,
- 1. Was opposed to or hostile toward the region, the people, or culture of Asia;
- 2. Demonstrated a general fear, mistrust, and hatred of Asian ethnic groups;
- 3. Supported restrictions on Asian immigration
- 4. Used derogatory language or condoned punishments toward Asian countries or their people

Continuing on point 3, what sentiments are we targeting? From there we can establish methods to extract those sentiments from the tweets and filter for/against them.

4. Sentiment analysis is a natural language processing technique in Python. We can grade the sentiment of a string of text on a scale of [-1, 1], -1 indicating a negative sentiment, and 1 indicating a positive sentiment, while 0 indicated neutrality. Let's keep this in mind moving forward.