# Twitter Text Report
#### Joshua Malone

#### The Cleveland Cavaliers recently started their season on Wednesday, October 19th, and as well as had their home opener on Sunday October 23rd. With big expectations this season for the team, my goal is to observe what people are saying about the cavs after their 2-1 start to the season, but more specifically look at where people are tweeting about them from. With so many ways to watch games nowadays, people can watch from anywhere, and not just in their local market, so my goal is to find out if the majority of the people tweeting about the Cavs are from the greater Cleveland area, or from fans all across the country. Recently the Cavs have played the Washington Wizards, Chicago Bulls, and upcoming the Orlando Magic, so besides Cleveland, i expect those 3 cities to be the highest in tweets about the Cavs.

In [45]:
import pandas as pd
import json
import requests
import urllib

In [46]:
bearer_token = pd.read_csv("Twitter_token_9-22.txt", header = 0)

In [47]:
bearer_token['Bearer_Token'].iloc[0]

'AAAAAAAAAAAAAAAAAAAAAEgIhAEAAAAAoMBdF0ktPf8mblPwPaRJ8U23ZPE%3DTXIZQHLnHZuySkMjGjzlybC08OAQGb4B2Yt7IHpbMKTbI9Kda4'

In [48]:
header = {'Authorization':'Bearer {}'.format(bearer_token['Bearer_Token'].iloc[0])}

In [49]:
endpoint = 'https://api.twitter.com/2/tweets/search/recent'

#### Below, I am creating a query that will search for cavs related content. I have names of players, team nicknames and names of the teams the Cavs have recently played. Additionally, after I used the query the first time, there was a lot of adidas related content because of what is going on with they brand and certain celeberties right now, so I excluded anything adidas related since it was overwhelming.

In [50]:
query_param = urllib.parse.quote('cleveland (cavs OR cavaliers OR preseason OR nba) (cavs OR evanmobley OR daruisgarland OR isaacokoro OR jarrettallen OR nbacavs OR donovanmitchell OR cavswizards OR cavsmagic OR cavsbulls) lang:en -adidas -@adidas -#adidas -is:retweet')

#### In the following 4 cells, I am using parameters to get the information about the tweets I want. This will include location that the user lists (if provided), the amount of retweets, likes, etc, when the tweet was created, and the username of the user that tweeted.

In [51]:
expansions = 'author_id'

In [52]:
user_fields = 'username,location'

In [53]:
tweet_fields = 'public_metrics,created_at'

In [54]:
query_url = endpoint + '?query={}&tweet.fields={}&expansions={}&user.fields={}&max_results=100'.format(query_param, tweet_fields, expansions, user_fields)

#### Generating the URL to search

In [55]:
query_url

'https://api.twitter.com/2/tweets/search/recent?query=cleveland%20%28cavs%20OR%20cavaliers%20OR%20preseason%20OR%20nba%29%20%28cavs%20OR%20evanmobley%20OR%20daruisgarland%20OR%20isaacokoro%20OR%20jarrettallen%20OR%20nbacavs%20OR%20donovanmitchell%20OR%20cavswizards%20OR%20cavsmagic%20OR%20cavsbulls%29%20lang%3Aen%20-adidas%20-%40adidas%20-%23adidas%20-is%3Aretweet&tweet.fields=public_metrics,created_at&expansions=author_id&user.fields=username,location&max_results=100'

In [56]:
response = requests.get(query_url, headers = header)

In [57]:
response_dict = json.loads(response.text)

In [63]:
response_df = pd.DataFrame(response_dict['data'])

In [70]:
response_df_locs = pd.DataFrame(response_dict['includes']['users'])

In [64]:
response_dict.keys()

dict_keys(['data', 'includes', 'meta'])

#### Below is my first DataFrame, this includes the username, id, name and location. Location is what Im the most concerned about here, as my goal is to see where people are located that are tweeting about the cavs.

In [168]:
response_df_locs

Unnamed: 0,username,id,location,name
0,faidley_david,1543976349798391808,Laurel Highlands PA,DF727
1,KosichJohn,551434032,Cleveland,John Kosich
2,fox8news,16243550,Cleveland,fox8news
3,ChrisFedor,123777284,"Cleveland, OH",Chris Fedor
4,AmNotEvan,714293764676796417,Cleveland | he/him,Evan Dammarell
...,...,...,...,...
79,SeniorsWinSBs,1073146200352833536,O-H-I-O,Mr. Football
80,IamGFranklin,148770371,Cleveland Ohio,G. A. Franklin
81,postupvideos,1479800838667358215,"Fort Lauderdale, FL",PostUpVideos.com
82,itBMe216,3363167751,Unknown crack alley,add my name


#### Below is my 2nd DataFrame, which includes a lot more than the first, here, we are most focused on creation time, and the text of what the tweet actually says.

In [72]:
response_df.head()

Unnamed: 0,author_id,public_metrics,edit_history_tweet_ids,created_at,id,text
0,1543976349798391808,"{'retweet_count': 0, 'reply_count': 0, 'like_c...",[1585031635623632896],2022-10-25T22:12:45.000Z,1585031635623632896,I would hope Cleveland fans would spend their ...
1,551434032,"{'retweet_count': 0, 'reply_count': 0, 'like_c...",[1585029013856296972],2022-10-25T22:02:19.000Z,1585029013856296972,"6 years ago tonight, a magical night in Clevel..."
2,16243550,"{'retweet_count': 0, 'reply_count': 1, 'like_c...",[1585028976086765568],2022-10-25T22:02:10.000Z,1585028976086765568,Basketball returned to Cleveland over the week...
3,123777284,"{'retweet_count': 4, 'reply_count': 2, 'like_c...",[1585024191371177984],2022-10-25T21:43:10.000Z,1585024191371177984,"Going back to last year, #Cavs coach J.B. Bick..."
4,714293764676796417,"{'retweet_count': 0, 'reply_count': 1, 'like_c...",[1585014227533602819],2022-10-25T21:03:34.000Z,1585014227533602819,Mike Brown is in Cleveland and this time it is...


In [73]:
response_dict['meta']['next_token']

'b26v89c19zqg8o3fpzel4no9h0ohyqus0k2vqznwmf0xp'

In [74]:
next_query_url = query_url + "&next_token={}".format(response_dict['meta']['next_token'])

In [75]:
next_response = requests.get(next_query_url, headers = header)

In [76]:
next_response_dict = json.loads(response.text)

In [77]:
next_response_dict['meta']

{'newest_id': '1585031635623632896',
 'oldest_id': '1584599036417671168',
 'result_count': 100,
 'next_token': 'b26v89c19zqg8o3fpzel4no9h0ohyqus0k2vqznwmf0xp'}

#### Below is a function created for gathering the tweets

In [78]:
def twt_recent_search (query, num_pages, header):
    response_list = []
    next_token = ''
    for i in range(0, num_pages):
        if i > 0:
            this_query = query + "&next_token={}".format(next_token)
        else:
            this_query = query
        
        this_response = requests.get(this_query, headers = header)
        print(this_response.status_code)
        this_response_dict = json.loads(this_response.text)
        response_list.append(this_response_dict)
        next_token = this_response_dict['meta']['next_token']
        
    return response_list

In [124]:
my_responses = twt_recent_search(query_url, 5, header)

200
200
200
200
200


In [125]:
results_1 = pd.DataFrame.from_records(my_responses)

In [126]:
data_list = list(results_1['data'])

In [127]:
data_lists_of_dfs = [pd.DataFrame(x) for x in data_list]

In [128]:
data_df = pd.concat(data_lists_of_dfs)

#### Below is a DataFrame similar to one of the previous ones, but this is generated from the function created above

In [129]:
data_df

Unnamed: 0,edit_history_tweet_ids,created_at,public_metrics,author_id,text,id
0,[1585056484735741954],2022-10-25T23:51:29.000Z,"{'retweet_count': 0, 'reply_count': 0, 'like_c...",1543652260055601156,@ESPNCleveland WE ARE TIRED OF CLEVELAND SPORT...,1585056484735741954
1,[1585054496132333569],2022-10-25T23:43:35.000Z,"{'retweet_count': 0, 'reply_count': 0, 'like_c...",4888287942,@Abubakerrr1 @Bgmegamanzero1 I was a Laker fan...,1585054496132333569
2,[1585054468986765312],2022-10-25T23:43:28.000Z,"{'retweet_count': 0, 'reply_count': 1, 'like_c...",1543652260055601156,"@AnthonyAlford92 ""UNINTELLIGENT CLEVELAND SPO...",1585054468986765312
3,[1585049472308252673],2022-10-25T23:23:37.000Z,"{'retweet_count': 0, 'reply_count': 0, 'like_c...",1509354741553319942,Heed the reed bech \n/tU😀OE3WA056YF\nHEREESSS2...,1585049472308252673
4,[1585049116710576128],2022-10-25T23:22:12.000Z,"{'retweet_count': 0, 'reply_count': 0, 'like_c...",163927613,"Three games into the season, the fact that Don...",1585049116710576128
...,...,...,...,...,...,...
95,[1584203402589806592],2022-10-23T15:21:38.000Z,"{'retweet_count': 0, 'reply_count': 0, 'like_c...",57031415,Nba Props /Spread \n\nCleveland Cavs -3.5\nPho...,1584203402589806592
96,[1584203031524282368],2022-10-23T15:20:10.000Z,"{'retweet_count': 0, 'reply_count': 0, 'like_c...",121491617,"If MIN, MEM or NO moved to the East, how would...",1584203031524282368
97,[1584200689030692865],2022-10-23T15:10:51.000Z,"{'retweet_count': 0, 'reply_count': 0, 'like_c...",1418612989272465412,"Last Sunday, the football team from Cleveland ...",1584200689030692865
98,[1584198863862525953],2022-10-23T15:03:36.000Z,"{'retweet_count': 82, 'reply_count': 15, 'like...",19263978,"Cleveland, we're home.\n\n🕖 7:00PM ET\n📺 @Ball...",1584198863862525953


In [136]:
my_responses2 = twt_recent_search(query_url, 5, header)

200
200
200
200
200


In [137]:
results2 = pd.DataFrame.from_records(my_responses2)

In [138]:
data_list2 = list(results2['includes'])

In [139]:
data_lists_of_dfs2 = [pd.DataFrame(x) for x in data_list2]

In [140]:
data_df2 = pd.concat(data_lists_of_dfs2)

#### Again, below is a DataFrame similar to one of the previous ones, but this is generated from the function created above

In [166]:
data_df2

Unnamed: 0,users
0,"{'name': 'Curtis Brown', 'id': '15436522600556..."
1,"{'name': 'Dre Day', 'id': '4888287942', 'usern..."
2,"{'name': 'AUSTIN_CARR.EXE', 'id': '15093547415..."
3,"{'name': 'Josh Poloha', 'id': '163927613', 'us..."
4,"{'name': 'Jay Filaye', 'id': '1233981215474143..."
...,...
89,"{'location': 'Paterson, New Jersey', 'id': '57..."
90,"{'location': 'Valenzuela', 'id': '121491617', ..."
91,"{'location': 'Cleveland, OH', 'id': '141861298..."
92,"{'location': 'Cleveland, OH', 'id': '19263978'..."


In [142]:
data_df_tweets = pd.DataFrame(data_df['text'])

#### Below is the text of the actual tweet (I couldnt understand how to get it to display the entire output, I dont understand why it does this)

In [171]:
data_df_tweets[:10]

Unnamed: 0,text
0,@ESPNCleveland WE ARE TIRED OF CLEVELAND SPORT...
1,@Abubakerrr1 @Bgmegamanzero1 I was a Laker fan...
2,"@AnthonyAlford92 ""UNINTELLIGENT CLEVELAND SPO..."
3,Heed the reed bech \n/tU😀OE3WA056YF\nHEREESSS2...
4,"Three games into the season, the fact that Don..."
5,@clickthatfollow @reesh0001 @TherealMiiC @funn...
6,Here is a link to purchase tickets for the Boy...
7,@DevaronPerry I’m from California and can’t wa...
8,I would hope Cleveland fans would spend their ...
9,"6 years ago tonight, a magical night in Clevel..."


#### Again, I couldnt understand how to pull out the location since it is not listed for every user, so it will not have its own singular DataFrame, however I will still use the DataFrame from earlier that includes the locations to come up with my results

In [160]:
data_df_locs_test = [['location'] for x in data_df2['users']]

#### Based off the dataframes I came up with, most users tweeting about the cavs are located in the area. To my surprise, the poeple outside of the city of Cleveland that are talking about the cavs, arent located in the cities of the teams that the cavs have recently played, it moreso seems like just general fans of the NBA discussing what they think about teams.