# Case Study: Analyzing Twitter Data on AWS DynamoDB

## Author: Dr. Esma Yildirim

Twitter data is a good example of an unstructured dataset. The Twitter V2 API lets us pull data about users, tweets, geolocations, etc. in JSON format. To get access to the data, you first need a developer account.

Go to developer.twitter.com and sign up for a new account. As part of the account creation process you will be given a pair of access keys and a bearer token. Save them in a file called `keys.py` using the following format:

```
bearer_token = 'AA...'
consumer_key='rB...'
consumer_secret='Wh...'
```
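
Keeping the keys in a source file is convenient for a tutorial, but an environment variable is a common alternative. The following is a minimal sketch, assuming the token is exported as `BEARER_TOKEN`; the helper name `load_bearer_token` and its `env` and `fallback` parameters are hypothetical and only serve the illustration.

```python
import os


def load_bearer_token(env=None, fallback=None):
    """Return the token from BEARER_TOKEN, else the fallback value."""
    env = os.environ if env is None else env
    token = env.get("BEARER_TOKEN") or fallback
    if not token:
        raise RuntimeError("No bearer token: set BEARER_TOKEN or fill in keys.py")
    return token


# Example with an explicit mapping instead of the real environment
print(load_bearer_token(env={"BEARER_TOKEN": "AA..."}))
```

In a notebook, the value from `keys.py` could be passed as `fallback`, so either source works.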

## V2 API

The V2 API is the only API available when you have `essential` access to Twitter data. It is a set of web services that respond to HTTP requests; the data is returned in the HTTP response message. The following example gets information about user accounts given a comma-separated list of usernames. In this example we want data about the Twitter accounts TwitterDev, TwitterAPI, nasa and my own account EyeLight24509. The `user_fields` variable sets the `user.fields` query parameter, a comma-separated list of the fields I want the data to include, such as `description`, `created_at`, `location` and `entities`.

- created_at: date and time the account was created
- location: the location shown in the account profile, if the user provided one
- description: the profile description, if the user added one
- entities: a combined dataset that includes the hashtags, URLs and @-mentions found in the profile text

There are other fields that you can pull. The complete API listing is available in your developer account under `Products`:

https://developer.twitter.com/en/portal/products
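
The request described above is an ordinary URL with two query parameters. As a small sketch of how that URL can be assembled with the standard library instead of string formatting (the helper name `build_user_lookup_url` is hypothetical):

```python
from urllib.parse import urlencode


def build_user_lookup_url(usernames, user_fields):
    """Build a user lookup URL from lists of usernames and user fields."""
    base = "https://api.twitter.com/2/users/by"
    query = urlencode(
        {"usernames": ",".join(usernames), "user.fields": ",".join(user_fields)},
        safe=",",  # keep the commas readable instead of encoding them as %2C
    )
    return f"{base}?{query}"


url = build_user_lookup_url(["TwitterDev", "nasa"], ["description", "created_at"])
print(url)
```

Building the query from lists makes it easy to add or drop fields without editing a long string by hand.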


In [1]:
%pip install requests
import requests
import os
import json
import keys

# To set your environment variables in your terminal run the following line:
# export 'BEARER_TOKEN'='<your_bearer_token>'
#bearer_token = os.environ.get("BEARER_TOKEN")
bearer_token = keys.bearer_token

def create_url():
    # Specify the usernames that you want to lookup below
    # You can enter up to 100 comma-separated values.
    usernames = "usernames=TwitterDev,TwitterAPI,nasa,EyeLight24509"
    user_fields = "user.fields=description,created_at,location,entities"
    # User fields are adjustable, options include:
    # created_at, description, entities, id, location, name,
    # pinned_tweet_id, profile_image_url, protected,
    # public_metrics, url, username, verified, and withheld
    url = "https://api.twitter.com/2/users/by?{}&{}".format(usernames, user_fields)
    return url

# This function adds header info to the HTTP request to authorize it with the bearer_token
def bearer_oauth(r):
    """
    Method required by bearer token authentication.
    """

    r.headers["Authorization"] = f"Bearer {bearer_token}"
    r.headers["User-Agent"] = "v2UserLookupPython"
    return r

# There are different types of HTTP requests. GET pulls data from the Twitter server.
# It returns data in JSON format.
def connect_to_endpoint(url):
    response = requests.request("GET", url, auth=bearer_oauth,)
    print(response.status_code)
    if response.status_code != 200:
        raise Exception(
            "Request returned an error: {} {}".format(
                response.status_code, response.text
            )
        )
    return response.json()

# the main function creates the HTTP request URL and calls connect function to pull data
# it then prints it.
def main():
    url = create_url()
    json_response = connect_to_endpoint(url)
    print(json.dumps(json_response, indent=4, sort_keys=True))


if __name__ == "__main__":
    main()

200
{
    "data": [
        {
            "created_at": "2013-12-14T04:35:55.000Z",
            "description": "The voice of the #TwitterDev team and your official source for updates, news, and events, related to the #TwitterAPI.",
            "entities": {
                "description": {
                    "hashtags": [
                        {
                            "end": 28,
                            "start": 17,
                            "tag": "TwitterDev"
                        },
                        {
                            "end": 116,
                            "start": 105,
                            "tag": "TwitterAPI"
                        }
                    ]
            

## Exercise 1: Creating a Followers and Friends Table in AWS DynamoDB

In this exercise, we will first get the list of followers and friends for a specific user account, then create DynamoDB tables to load this data into the cloud, and finally run queries using Python or the DynamoDB console. First let's analyze the Twitter response to a `followers` and a `following` request.

The followers and friends requests return a list of items with `created_at`, `id`, `username` and `name` fields. 

Let's create two tables in the DynamoDB console: `friends` and `followers`
- The only required fields when you create a table are those of the primary key: the partition key and the sort key
- `friends` table partition and sort key: `user_id`, `friend_user_id`
- `followers` table partition and sort key: `user_id`, `follower_user_id`
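
The same two tables can also be created from Python rather than the console. The sketch below is one possible way to do it, assuming both keys are stored as numbers and that on-demand billing (`PAY_PER_REQUEST`) is acceptable; the helper names `table_definition` and `create_tables` are hypothetical.

```python
def table_definition(table_name, sort_key):
    """Keyword arguments for DynamoDB create_table, with numeric keys (assumption)."""
    return {
        "TableName": table_name,
        "KeySchema": [
            {"AttributeName": "user_id", "KeyType": "HASH"},  # partition key
            {"AttributeName": sort_key, "KeyType": "RANGE"},  # sort key
        ],
        "AttributeDefinitions": [
            {"AttributeName": "user_id", "AttributeType": "N"},
            {"AttributeName": sort_key, "AttributeType": "N"},
        ],
        "BillingMode": "PAY_PER_REQUEST",  # assumption: on-demand capacity
    }


def create_tables():
    """Create both tables; needs AWS credentials, so it is not called here."""
    import boto3  # imported lazily so the definitions can be inspected offline

    dynamodb = boto3.resource("dynamodb")
    for name, sort_key in [("friends", "friend_user_id"),
                           ("followers", "follower_user_id")]:
        table = dynamodb.create_table(**table_definition(name, sort_key))
        table.wait_until_exists()


print(table_definition("friends", "friend_user_id")["KeySchema"])
```

Only the key attributes are declared; all other attributes are schemaless and appear when items are written.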

The rest of the fields are added in Python code:


In [13]:
import requests
import os
import json
import keys
import boto3

# To set your environment variables in your terminal run the following line:
# export 'BEARER_TOKEN'='<your_bearer_token>'
bearer_token = keys.bearer_token


def create_friend_url(user_id):
    return "https://api.twitter.com/2/users/{}/following".format(user_id)

def create_follower_url(user_id):
    url = "https://api.twitter.com/2/users/{}/followers".format(user_id)
    url += "?max_results=10" # only 10 followers
    return url


def get_params():
    return {"user.fields": "created_at"}


def bearer_oauth(r):
    """
    Method required by bearer token authentication.
    """

    r.headers["Authorization"] = f"Bearer {bearer_token}"
    r.headers["User-Agent"] = "v2FollowersLookupPython"
    return r


def connect_to_endpoint(url, params):
    response = requests.request("GET", url, auth=bearer_oauth, params=params)
    print(response.status_code)
    if response.status_code != 200:
        raise Exception(
            "Request returned an error: {} {}".format(
                response.status_code, response.text
            )
        )
    return response.json()


def friend_main():
    dynamodb = boto3.resource('dynamodb')
    table = dynamodb.Table('friends')
    user_id = 1565198864 # my id
    url = create_friend_url(user_id)
    params = get_params()
    json_response = connect_to_endpoint(url, params)
    dumps = json.dumps(json_response, indent=4, sort_keys=True)
    print(dumps)
    #type(json_response['data'])
    sresponse = json_response['data']
    for i in range(0, len(sresponse )):
        item = {'user_id': user_id, # partition key
                'friend_user_id': int(sresponse[i]['id']), # sort key
                'created_at':sresponse[i]['created_at'],
                'friend_name':sresponse[i]['name'],
                'friend_username':sresponse[i]['username']
        }
        table.put_item(Item=item)
        print(item)
        
def follower_main():
    dynamodb = boto3.resource('dynamodb')
    table = dynamodb.Table('followers')
    user_id = 11348282 # nasa
    url = create_follower_url(user_id)
    params = get_params()
    json_response = connect_to_endpoint(url, params)
    dumps = json.dumps(json_response, indent=4, sort_keys=True)
    print(dumps)
    #type(json_response['data'])
    sresponse = json_response['data']
    for i in range(0, len(sresponse )):
        item = {'user_id': user_id, # partition key
                'follower_user_id': int(sresponse[i]['id']), # sort key
                'created_at':sresponse[i]['created_at'],
                'follower_name':sresponse[i]['name'],
                'follower_username':sresponse[i]['username']
        }
        table.put_item(Item=item)
        print(item)
#friend_main()
follower_main()



## Exercise 2: Upload the followers and friends of US senators


In this exercise we will construct a list of user ids from a CSV file, pull the followers and friends of US senators from Twitter, and upload them to our previous tables.

Using pandas, the Twitter ids of the senators are extracted from the `senators.csv` file, and the previous code is updated to upload their followers and friends to DynamoDB. (`senators.csv` is downloaded from https://github.com/pdeitel/IntroToPython)

Since there is a limit of 15 requests per 15 minutes, we sleep for 15 minutes each time the request count reaches a multiple of 15.
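
The pacing rule can be separated from the download code. The following is a minimal sketch of that idea, assuming the same limit of 15 requests per 15-minute window; the generator name `throttled` and its `sleep` parameter are hypothetical, and `sleep` is injectable only so the behavior can be checked without waiting.

```python
import time


def throttled(items, limit=15, window_seconds=15 * 60, sleep=time.sleep):
    """Yield items, pausing for window_seconds before each new batch of `limit` items."""
    for index, item in enumerate(items):
        if index and index % limit == 0:
            sleep(window_seconds)
        yield item


# Demonstration with a fake sleep that only records the pauses
pauses = []
processed = list(throttled(range(32), sleep=pauses.append))
print(len(processed), pauses)
```

A loop such as `for senator_id in throttled(ids):` would then issue one request per iteration without keeping its own counter.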


In [7]:
%pip install pandas
import requests
import os
import json
import keys
import pandas as pd
import boto3
import time

bearer_token = keys.bearer_token

def create_friend_url(user_id):
    url = "https://api.twitter.com/2/users/{}/following".format(user_id)
    url += "?max_results=100" # up to 100 friends per request
    return url
def create_follower_url(user_id):
    url = "https://api.twitter.com/2/users/{}/followers".format(user_id)
    url += "?max_results=100" # up to 100 followers per request
    return url


def get_params():
    return {"user.fields": "created_at"}


def bearer_oauth(r):
    """
    Method required by bearer token authentication.
    """

    r.headers["Authorization"] = f"Bearer {bearer_token}"
    r.headers["User-Agent"] = "v2FollowersLookupPython"
    return r


def connect_to_endpoint(url, params):
    response = requests.request("GET", url, auth=bearer_oauth, params=params)
    print(response.status_code)
    if response.status_code != 200:
        raise Exception(
            "Request returned an error: {} {}".format(
                response.status_code, response.text
            )
        )
    return response.json()


def friend_main():
    dynamodb = boto3.resource('dynamodb')
    table = dynamodb.Table('friends')
    senators_df = pd.read_csv('senators.csv')
    print(senators_df)
    rcount = 1
    for senatorID in senators_df.TwitterID:
        print("REQUEST COUNT:", rcount)
        if rcount % 15 == 0:
            time.sleep(15 * 60 + 1) # 15 minutes
        rcount += 1
        url = create_friend_url(senatorID)
        params = get_params()
        json_response = connect_to_endpoint(url, params)
        dumps = json.dumps(json_response, indent=4, sort_keys=True)
        print(dumps)
        #type(json_response['data'])
        sresponse = json_response['data']
        for i in range(0, len(sresponse )):
            item = {'user_id': int(senatorID), # partition key (plain int: pandas yields numpy integers)
                    'friend_user_id': int(sresponse[i]['id']), # sort key
                    'created_at':sresponse[i]['created_at'],
                    'friend_name':sresponse[i]['name'],
                    'friend_username':sresponse[i]['username']
            }
            table.put_item(Item=item)
            print(item)
def follower_main():
    dynamodb = boto3.resource('dynamodb')
    table = dynamodb.Table('followers')
    senators_df = pd.read_csv('senators.csv')
    print(senators_df)
    rcount = 1
    for senatorID in senators_df.TwitterID:
        print("REQUEST COUNT:", rcount)
        if rcount % 15 == 0:
            time.sleep(15 * 60 + 1) # 15 minutes
        rcount += 1
        url = create_follower_url(senatorID)
        params = get_params()
        json_response = connect_to_endpoint(url, params)
        dumps = json.dumps(json_response, indent=4, sort_keys=True)
        print(dumps)
        #type(json_response['data'])
        sresponse = json_response['data']
        for i in range(0, len(sresponse )):
            item = {'user_id': int(senatorID), # partition key (plain int: pandas yields numpy integers)
                    'follower_user_id': int(sresponse[i]['id']), # sort key
                    'created_at':sresponse[i]['created_at'],
                    'follower_name':sresponse[i]['name'],
                    'follower_username':sresponse[i]['username']
            }
            table.put_item(Item=item)
            print(item)
friend_main()
#follower_main()


## Exercise 3: Querying the followers and friends tables

### 3.1. Let's display all the followers and friends of Senator X Y. 

To do that we need to provide a `KeyConditionExpression` and, depending on what we would like to display, a `ProjectionExpression`. Let's say we only want to display their names.
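
A single DynamoDB query call returns at most 1 MB of data, so a senator with many stored followers may need several calls. The sketch below shows one way to follow `LastEvaluatedKey` until all pages are read; the helper name `query_all` is hypothetical, and `FakeTable` with its two pages of data is a made-up stand-in so the loop can run without AWS access.

```python
def query_all(table, **query_kwargs):
    """Collect the items from every page of a DynamoDB query."""
    items = []
    while True:
        response = table.query(**query_kwargs)
        items.extend(response.get("Items", []))
        last_key = response.get("LastEvaluatedKey")
        if not last_key:
            return items
        # Resume the next call where the previous page stopped
        query_kwargs["ExclusiveStartKey"] = last_key


class FakeTable:
    """Stand-in for a boto3 Table that serves two pages (hypothetical data)."""

    def __init__(self):
        self.pages = [
            {"Items": [{"follower_name": "A"}], "LastEvaluatedKey": {"user_id": 1}},
            {"Items": [{"follower_name": "B"}]},
        ]

    def query(self, **kwargs):
        return self.pages.pop(0)


print(query_all(FakeTable(), ProjectionExpression="follower_name"))
```

With a real table, the same keyword arguments used for `table.query` (key condition, projection, attribute names) can be passed to `query_all` unchanged.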

### 3.2 List people who are both a follower and a friend. 

Once we have the lists of followers and friends, a nested for loop that checks whether their ids are equal can list the people who are both friends and followers.
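
For long lists, the same comparison can be done with a set intersection instead of a nested loop. This is a small sketch with made-up items shaped like the query results; the function name `mutual_ids` is hypothetical.

```python
def mutual_ids(friends, followers):
    """Return the ids present in both lists, sorted ascending."""
    friend_ids = {item["friend_user_id"] for item in friends}
    follower_ids = {item["follower_user_id"] for item in followers}
    return sorted(friend_ids & follower_ids)


# Hypothetical items shaped like the DynamoDB query results
friends = [{"friend_user_id": 3, "friend_name": "C"},
           {"friend_user_id": 1, "friend_name": "A"}]
followers = [{"follower_user_id": 1, "follower_name": "A"},
             {"follower_user_id": 2, "follower_name": "B"}]
print(mutual_ids(friends, followers))
```

The nested loop compares every pair, while the set version looks each id up once, which matters once the lists grow to thousands of entries.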

### 3.3 Pull up all the details about these people as we did in the first example from Twitter. 

Once we have the user ids of the people, we can use the first example program to find out information about them. 
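
The user lookup request accepts up to 100 comma-separated values, so a longer list of ids has to be split into several requests. A minimal sketch of that splitting step, with a hypothetical helper name `id_batches`:

```python
def id_batches(user_ids, batch_size=100):
    """Split ids into comma-separated strings of at most batch_size ids each."""
    ids = [str(user_id) for user_id in user_ids]
    return [",".join(ids[i:i + batch_size]) for i in range(0, len(ids), batch_size)]


batches = id_batches(range(1, 251))
print(len(batches), batches[2].count(",") + 1)
```

Each string in `batches` can be passed as the `ids` value of one lookup request.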


In [17]:
from pprint import pprint
import json
import requests
import keys
import boto3
from boto3.dynamodb.conditions import Key

bearer_token = keys.bearer_token

def create_user_url(userid_string):
    # Specify the user ids that you want to look up below
    # You can enter up to 100 comma-separated values.
    userids = "ids=" + userid_string
    user_fields = "user.fields=description,created_at,location,entities"
    # User fields are adjustable, options include:
    # created_at, description, entities, id, location, name,
    # pinned_tweet_id, profile_image_url, protected,
    # public_metrics, url, username, verified, and withheld
    url = "https://api.twitter.com/2/users?{}&{}".format(userids, user_fields)
    return url

def bearer_oauth(r):
    """
    Method required by bearer token authentication.
    """

    r.headers["Authorization"] = f"Bearer {bearer_token}"
    r.headers["User-Agent"] = "v2UserLookupPython"
    return r

# There are different types of HTTP requests. GET pulls data from the Twitter server.
# It returns data in JSON format.
def connect_to_endpoint(url):
    response = requests.request("GET", url, auth=bearer_oauth,)
    print(response.status_code)
    if response.status_code != 200:
        raise Exception(
            "Request returned an error: {} {}".format(
                response.status_code, response.text
            )
        )
    return response.json()

def query_and_project_followers(user_id, dynamodb=None):
    if not dynamodb:
        dynamodb = boto3.resource('dynamodb')
    table = dynamodb.Table('followers')
    #print(f"Get follower name")

    # Expression attribute names can only reference items in the projection expression.
    response = table.query(
        ProjectionExpression="#fn, #fi",
        ExpressionAttributeNames={"#fn": "follower_name", "#fi":"follower_user_id"},
        KeyConditionExpression=
            Key('user_id').eq(user_id)
    )
    return response['Items']


def query_and_project_friends(user_id, dynamodb=None):
    if not dynamodb:
        dynamodb = boto3.resource('dynamodb')
    table = dynamodb.Table('friends')
    #print(f"Get follower name")

    # Expression attribute names can only reference items in the projection expression.
    response = table.query(
        ProjectionExpression="#fn,#fi",
        ExpressionAttributeNames={"#fn": "friend_name", "#fi":"friend_user_id"},
        KeyConditionExpression=
            Key('user_id').eq(user_id)
    )
    return response['Items']

if __name__ == '__main__':
   # query_year = 1992
   # query_range = ('A', 'L')
    senator_user_id = 21111098
    senator_name = "Richard Shelby"
   # print(f"Followers of Senator {senator_name}:")
    followers = query_and_project_followers(senator_user_id)
   # for follower in followers:
    #    print(f"\n{follower['follower_name']}")
    
   # print(f"Friends of Senator {senator_name}:")
    friends = query_and_project_friends(senator_user_id)
 #   for friend in friends:
  #      print(f"\n{friend['friend_name']}")
    print("BOTH FRIENDS AND FOLLOWERS") 
    userid_string = ""
    for friend in friends:
        for follower in followers:
            if friend['friend_user_id'] == follower['follower_user_id']:
                print(f"\n{friend['friend_name']}")
                userid_string += str(follower['follower_user_id']) + ','
    userid_string = userid_string.rstrip(',') # drop the trailing comma; safe when no match was found
    print(userid_string)


    url = create_user_url(userid_string)
    json_response = connect_to_endpoint(url)
    print(json.dumps(json_response, indent=4, sort_keys=True))



BOTH FRIENDS AND FOLLOWERS

Aidan Quigley
2729686346
200
{
    "data": [
        {
            "created_at": "2014-08-13T15:22:58.000Z",
            "description": "Reporter covering how Congress spends money @cqnow/@rollcall Yankees/Jets fan past: @insidedefense @vtdigger aidanquigley@cqrollcall.com",
            "entities": {
                "description": {
                    "mentions": [
                        {
                            "end": 50,
                            "start": 44,
                            "username": "cqnow"
                        },
                        {
                            "end": 60,
                            "start": 51,
                            "username": "rollcall"
                        },
                        {
                            "end": 98,
                            "start": 84,
                            "username": "insidedefense"
                        },
                        {
                       

## Exercise 4: Getting user tweet and user mention data

In this exercise, we will be looking into tweet texts and what people are tweeting about the senators. We will create tweet and user mention tables on DynamoDB.

### Step 4.1. Getting senators' tweets and storing them on DynamoDB

- First create a table in DynamoDB called `tweets`, with partition key `user_id` and sort key `created_at`, using the DynamoDB console. `user_id` can be a number, while `created_at` can be a string.

- Parse the `senators.csv` file just like we did in Exercise 2, pull 100 tweets per request and store them in the `tweets` table.

- Pull `created_at` and `text` fields.
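
Writing 100 tweets one `put_item` call at a time works, but boto3 tables also offer a batch writer that groups the writes. The sketch below assumes the `tweets` key layout described above; the function name `store_tweets` is hypothetical, and `FakeTable` is a made-up stand-in so the example runs without AWS access.

```python
def store_tweets(table, user_id, tweets):
    """Write one item per tweet through the table's batch writer; return the count."""
    count = 0
    # overwrite_by_pkeys de-duplicates items that share the same primary key
    with table.batch_writer(overwrite_by_pkeys=["user_id", "created_at"]) as batch:
        for tweet in tweets:
            batch.put_item(Item={
                "user_id": int(user_id),            # partition key
                "created_at": tweet["created_at"],  # sort key
                "text": tweet["text"],
            })
            count += 1
    return count


class FakeTable:
    """Minimal stand-in for a boto3 Table (hypothetical, for illustration only)."""

    def __init__(self):
        self.items = []

    def batch_writer(self, **kwargs):
        return self

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        return False

    def put_item(self, Item):
        self.items.append(Item)


table = FakeTable()
sample = [{"created_at": "2022-01-01T00:00:00.000Z", "text": "hello"}]
print(store_tweets(table, 42, sample), table.items)
```

Because `created_at` is the sort key, two tweets by the same user with an identical timestamp would occupy the same item; this holds for single `put_item` calls as well.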



In [3]:
import requests
import os
import json
import keys
import pandas as pd
import boto3
import time

# To set your environment variables in your terminal run the following line:
# export 'BEARER_TOKEN'='<your_bearer_token>'
bearer_token = keys.bearer_token


def create_url(user_id):
    # Replace with user ID below
    #user_id = 2244994945
    url = "https://api.twitter.com/2/users/{}/tweets".format(user_id)
    url += '?max_results=100'
    return url


def get_params():
    # Tweet fields are adjustable.
    # Options include:
    # attachments, author_id, context_annotations,
    # conversation_id, created_at, entities, geo, id,
    # in_reply_to_user_id, lang, non_public_metrics, organic_metrics,
    # possibly_sensitive, promoted_metrics, public_metrics, referenced_tweets,
    # source, text, and withheld
    return {"tweet.fields": "created_at,text"}


def bearer_oauth(r):
    """
    Method required by bearer token authentication.
    """

    r.headers["Authorization"] = f"Bearer {bearer_token}"
    r.headers["User-Agent"] = "v2UserTweetsPython"
    return r


def connect_to_endpoint(url, params):
    response = requests.request("GET", url, auth=bearer_oauth, params=params)
    print(response.status_code)
    if response.status_code != 200:
        raise Exception(
            "Request returned an error: {} {}".format(
                response.status_code, response.text
            )
        )
    return response.json()


def main():
    dynamodb = boto3.resource('dynamodb')
    table = dynamodb.Table('tweets')
    senators_df = pd.read_csv('senators.csv')
    #print(senators_df)
    rcount = 1
    for senatorID in senators_df.TwitterID:
        print("REQUEST COUNT:", rcount)
        if rcount % 15 == 0:
            time.sleep(15 * 60 + 1) # 15 minutes
        rcount += 1
        url = create_url(senatorID)
        params = get_params()
        json_response = connect_to_endpoint(url, params)
        dumps = json.dumps(json_response, indent=4, sort_keys=True)
        print(dumps)
        #type(json_response['data'])
        sresponse = json_response['data']
        for i in range(0, len(sresponse )):
            item = {'user_id': int(senatorID), # partition key (plain int: pandas yields numpy integers)
                    'created_at': str(sresponse[i]['created_at']), # sort key
                    'text':sresponse[i]['text'],
            }
            table.put_item(Item=item)
            print(item)

if __name__ == "__main__":
    main()




### Step 4.2 Getting senator mention tweets and storing them on DynamoDB

- First create a table in DynamoDB called `mentions`, with partition key `user_id` and sort key `created_at`, using the DynamoDB console. `user_id` can be a number, while `created_at` can be a string.

- Parse the `senators.csv` file just like we did in Exercise 2, pull 100 tweets per request and store them in the `mentions` table.

- Pull `created_at`, `text` and `author_id` fields.
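
Some accounts may have no recent mentions. The sketch below assumes that such a response can arrive without a `data` key (carrying only `meta`), and converts a response into table items without failing in that case; the function name `mention_items` and the sample response are hypothetical.

```python
def mention_items(user_id, json_response):
    """Turn a mentions response into table items; empty list when 'data' is absent."""
    return [
        {
            "user_id": int(user_id),                 # partition key
            "created_at": str(tweet["created_at"]),  # sort key
            "text": tweet["text"],
            "author_id": tweet["author_id"],
        }
        for tweet in json_response.get("data", [])
    ]


# Assumed shape of a response with no results
print(mention_items(1, {"meta": {"result_count": 0}}))
```

The returned list can be written to the `mentions` table item by item, and an empty list simply results in no writes.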

In [None]:
import requests
import os
import json
import keys
import pandas as pd
import boto3
import time

# To set your environment variables in your terminal run the following line:
# export 'BEARER_TOKEN'='<your_bearer_token>'
bearer_token = keys.bearer_token


def create_url(user_id):
    # Replace with user ID below
    #user_id = 2244994945
    url = "https://api.twitter.com/2/users/{}/mentions".format(user_id)
    url += '?max_results=100'
    return url


def get_params():
    # Tweet fields are adjustable.
    # Options include:
    # attachments, author_id, context_annotations,
    # conversation_id, created_at, entities, geo, id,
    # in_reply_to_user_id, lang, non_public_metrics, organic_metrics,
    # possibly_sensitive, promoted_metrics, public_metrics, referenced_tweets,
    # source, text, and withheld
    return {"tweet.fields": "created_at,text,author_id"}


def bearer_oauth(r):
    """
    Method required by bearer token authentication.
    """

    r.headers["Authorization"] = f"Bearer {bearer_token}"
    r.headers["User-Agent"] = "v2UserMentionsPython"
    return r


def connect_to_endpoint(url, params):
    response = requests.request("GET", url, auth=bearer_oauth, params=params)
    print(response.status_code)
    if response.status_code != 200:
        raise Exception(
            "Request returned an error: {} {}".format(
                response.status_code, response.text
            )
        )
    return response.json()


def main():
    dynamodb = boto3.resource('dynamodb')
    table = dynamodb.Table('mentions')
    senators_df = pd.read_csv('senators.csv')
    #print(senators_df)
    rcount = 1
    for senatorID in senators_df.TwitterID:
        print("REQUEST COUNT:", rcount)
        if rcount % 15 == 0:
            time.sleep(15 * 60 + 1) # 15 minutes
        rcount += 1
        url = create_url(senatorID)
        params = get_params()
        json_response = connect_to_endpoint(url, params)
        dumps = json.dumps(json_response, indent=4, sort_keys=True)
        print(dumps)
        #type(json_response['data'])
        sresponse = json_response['data']
        for i in range(0, len(sresponse )):
            item = {'user_id': int(senatorID), # partition key (plain int: pandas yields numpy integers)
                    'created_at': str(sresponse[i]['created_at']), # sort key
                    'text':sresponse[i]['text'],
                    'author_id':sresponse[i]['author_id']
            }
            table.put_item(Item=item)
            print(item)

if __name__ == "__main__":
    main()




## Exercise 5: Emotion Analysis of Senator user tweets and mention tweets

In this exercise, we will analyze the text of the tweets and user mentions and measure `polarity` and `subjectivity`. Python's `textblob` module has objects that can measure polarity and subjectivity in a text. Polarity is a number between -1 and 1: a positive value means the sentiment of the text is mostly positive, a negative value means it is mostly negative, and a value of 0 indicates neutral sentiment. Subjectivity is a number between 0 and 1, where 0 refers to being objective and 1 refers to being subjective.
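
The reading of the polarity score described above can be written as a small helper. This is only a sketch: the function names `sentiment_label` and `analyze` are hypothetical, and `analyze` needs the `textblob` package, so it is defined but not called here.

```python
def sentiment_label(polarity):
    """Map a polarity score in [-1, 1] to a coarse label."""
    if polarity > 0:
        return "positive"
    if polarity < 0:
        return "negative"
    return "neutral"


def analyze(text):
    """Return (polarity, subjectivity, label) for a text using TextBlob."""
    from textblob import TextBlob  # imported lazily; requires textblob to be installed

    sentiment = TextBlob(text).sentiment
    return sentiment.polarity, sentiment.subjectivity, sentiment_label(sentiment.polarity)


print(sentiment_label(0.35), sentiment_label(-0.1), sentiment_label(0.0))
```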

But first, we need to clean the data in multiple steps. A tweet contains special characters, mentions, hashtags and links, all of which need to be removed before we can analyze the text.
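
The cleaning steps can be illustrated one at a time. The sketch below is a step-by-step variant that uses only the standard library `re` and `html` modules; the function name `clean_text` and the sample tweet are made up for illustration.

```python
import html
import re


def clean_text(tweet):
    """Remove links, mentions and punctuation from a tweet, one step at a time."""
    text = html.unescape(tweet.replace("\n", " "))  # newlines and HTML entities
    text = re.sub(r"http\S+", "", text)             # links
    text = re.sub(r"@\w+", "", text)                # @mentions
    text = re.sub(r"[^\w\s]", "", text)             # punctuation, including '#'
    return re.sub(r"\s+", " ", text).strip()        # collapse repeated spaces


sample = "Great &amp; fair hearing with @SenatorX today!\n#budget https://t.co/abc"
print(clean_text(sample))
```

Links are removed before punctuation so that the `:` and `/` characters inside a URL do not leave fragments behind.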

After the results are in, we will create a bar graph to compare the senators' tweets and user mentions.

But first, let's read the tweets from DynamoDB using user_id of the senators. 



In [2]:
#import statements (re and html ship with Python, so no pip install is needed)
import requests
import os
import json
import keys
import boto3 # aws library
import time
import re # regex library
import html
import pandas as pd
from boto3.dynamodb.conditions import Key
from textblob import TextBlob
import matplotlib.pyplot as plt



def query_tweets(user_id, dynamodb=None):
    if not dynamodb:
        dynamodb = boto3.resource('dynamodb')
    table = dynamodb.Table('tweets')

    # Expression attribute names can only reference items in the projection expression.
    response = table.query(
        ProjectionExpression="#uid,#tx",
        ExpressionAttributeNames={"#uid": "user_id", "#tx":"text"},
        KeyConditionExpression=
            Key('user_id').eq(user_id)
    )
    return response['Items']

def clean_tweet(tweet):
    x = tweet.replace("\n"," ") # replace newline characters with spaces
    tweet = html.unescape(x)
    tweet = re.sub(r"(@[A-Za-z0-9_]+)|[^\w\s]|#|http\S+", "", tweet) # remove mentions, punctuation, hashtag signs and links
    return tweet

def visualize(emotion_dict):
   
    pol = []
    sub = []
    for key in emotion_dict.keys():
        pol.append(emotion_dict[key][0])
        sub.append(emotion_dict[key][1])
    data = {'Polarity': pol,
        'Subjectivity': sub
       }
    df = pd.DataFrame(data,columns=['Polarity','Subjectivity'], index = emotion_dict.keys())

    df.plot.barh()

    plt.title('Senator Tweets')
    plt.ylabel('Senators')
    plt.xlabel('Emotions')
    plt.show()

def main():
    # read senators.csv to get the user_ids
    senators_df = pd.read_csv('senators.csv')
    emotion_dict = {}
    senator_names = senators_df.Name
    senator_ids = senators_df.TwitterID
    for i in range(len(senator_ids)):
        senatorID = int(senator_ids[i])
        senatorName = str(senator_names[i])
        senator_tweets = ""
        response = query_tweets(senatorID)
        #clean each tweet for this senator
        for j in range(0, len(response)):
            tweet = response[j]["text"]
            tweet = clean_tweet(tweet)
            senator_tweets += tweet + " " # keep a space between consecutive tweets
            #print(senatorID, ":", senator_tweets)
        #calculate polarity and subjectivity
        blob = TextBlob(senator_tweets)
        print("SenatorID:",senatorID, ":","Polarity:", blob.sentiment.polarity, "Subjectivity:", blob.sentiment.subjectivity)
        if(float(blob.sentiment.polarity) > 0.0 or float(blob.sentiment.polarity) < 0.0):
            emotion_dict[senatorName] = (blob.sentiment.polarity, blob.sentiment.subjectivity)
    print(emotion_dict)
    visualize(emotion_dict)

         
if __name__ == "__main__":
    main()



Here is a sample bar graph that compares senator tweet emotions:

![Senator_Tweets](senator_tweets.png)



Now let's find out which senators are viewed most favorably and how objective the opinions about them are. We will run the same analysis on the `mentions` table.

In [None]:
#import statements (re and html ship with Python, so no pip install is needed)
import requests
import os
import json
import keys
import boto3 # aws library
import time
import re # regex library
import html
import pandas as pd
from boto3.dynamodb.conditions import Key
from textblob import TextBlob
import matplotlib.pyplot as plt

def query_mentions(user_id, dynamodb=None):
    if not dynamodb:
        dynamodb = boto3.resource('dynamodb')
    table = dynamodb.Table('mentions')

    # Expression attribute names can only reference items in the projection expression.
    response = table.query(
        ProjectionExpression="#uid,#tx",
        ExpressionAttributeNames={"#uid": "user_id", "#tx":"text"},
        KeyConditionExpression=
            Key('user_id').eq(user_id)
    )
    return response['Items']

def clean_tweet(tweet):
    x = tweet.replace("\n"," ") # replace newline characters with spaces
    tweet = html.unescape(x)
    tweet = re.sub(r"(@[A-Za-z0-9_]+)|[^\w\s]|#|http\S+", "", tweet) # remove mentions, punctuation, hashtag signs and links
    return tweet

def visualize(emotion_dict):
   
    pol = []
    sub = []
    for key in emotion_dict.keys():
        pol.append(emotion_dict[key][0])
        sub.append(emotion_dict[key][1])
    data = {'Polarity': pol,
        'Subjectivity': sub
       }
    df = pd.DataFrame(data,columns=['Polarity','Subjectivity'], index = emotion_dict.keys())

    df.plot.barh()

    plt.title('Senator Mentions')
    plt.ylabel('Senators')
    plt.xlabel('Emotions')
    plt.show()

def main():
    # read senators.csv to get the user_ids
    senators_df = pd.read_csv('senators.csv')
    mention_emotion_dict = {}
    senator_names = senators_df.Name
    senator_ids = senators_df.TwitterID
    for i in range(len(senator_ids)):
        senatorID = int(senator_ids[i])
        senatorName = str(senator_names[i])
        senator_tweets = ""
        response = query_mentions(senatorID)
        #clean each tweet for this senator
        for j in range(0, len(response)):
            tweet = response[j]["text"]
            tweet = clean_tweet(tweet)
            senator_tweets += tweet + " " # keep a space between consecutive tweets
            #print(senatorID, ":", senator_tweets)
        #calculate polarity and subjectivity
        blob = TextBlob(senator_tweets)
        #print("SenatorID:",senatorID, ":","Polarity:", blob.sentiment.polarity, "Subjectivity:", blob.sentiment.subjectivity)
        if(blob.sentiment.polarity > 0.0 or blob.sentiment.polarity < 0.0): # to be deleted
            mention_emotion_dict[senatorName] = (blob.sentiment.polarity, blob.sentiment.subjectivity)
    print(mention_emotion_dict)
    visualize(mention_emotion_dict)

         
if __name__ == "__main__":
    main()



Here is a sample bar graph that compares senator mention emotions:

![Senator_Mentions](senator_mentions.png)
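
To compare the two analyses directly, the dictionaries printed by the tweet and mention cells can be lined up per senator. The sketch below is one possible way to do that, assuming both dictionaries map a senator name to a `(polarity, subjectivity)` tuple; the function name `compare_scores` and all sample values are made up for illustration.

```python
def compare_scores(tweet_scores, mention_scores):
    """Pair tweet and mention scores per senator, most favorably mentioned first."""
    rows = []
    for name in tweet_scores.keys() & mention_scores.keys():
        own_polarity, _ = tweet_scores[name]
        mention_polarity, mention_subjectivity = mention_scores[name]
        rows.append((name, own_polarity, mention_polarity, mention_subjectivity))
    # Sort by the polarity of the mentions, highest first
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Hypothetical results; real values come from the two analysis cells
tweet_scores = {"Senator A": (0.2, 0.5), "Senator B": (0.1, 0.4)}
mention_scores = {"Senator A": (-0.1, 0.6), "Senator B": (0.3, 0.7),
                  "Senator C": (0.0, 0.1)}
for row in compare_scores(tweet_scores, mention_scores):
    print(row)
```

Senators that appear in only one of the dictionaries are skipped, since there is nothing to compare them against.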


