# Overwatch Halloween Conversation Exploration
The following report analyses the Overwatch community response to the new Halloween event in-game. Blizzard, the company that owns Overwatch, typically runs an event for most large holidays. This Halloween is special because it is the first event for the holiday within the next game in the series, Overwatch 2.  

The game has faced a lot of scrutiny because of many factors like company culture issues and scandals, and the lack of variation between the first game and the sequel.  

This report is going to look at the response from the community toward the event. Also, it is going to analyze if there is possibly any work that Blizzard could do to help benefit the game using the platform.


## Importing Libraries, Generating Query, Getting Response

In [174]:
import requests
import json5
import urllib as url
import pandas as pd
import numpy as np

In [175]:
bearer_token = pd.read_csv('tokens.txt', header = 0)['Bearer_Token'].iloc[0]
header = {'Authorization':'Bearer {}'.format(bearer_token)}
endpoint = 'https://api.twitter.com/2/tweets/search/recent'

The query is trying to gather the sentiment of the community surrounding the event. By using the keyword Overwatch and 2, this ensures that the query filters to our specific event. Then, including the keywords Halloween and event further allows the query to grab the tweets that are needed for this report. The keyword 2 is omitted to make sure that tweets are gathered even if the text does not contain the full title of the game.

In [176]:
query = url.parse.quote('(overwatch halloween OR overwatch 2 halloween OR overwatch 2 event OR overwatch 2 meta OR overwatch halloween OR overwatch event) lang:en -is:retweet')
expansions = url.parse.quote('attachments.media_keys,author_id')
media_fields = url.parse.quote('alt_text,url,public_metrics')
user_fields = url.parse.quote('username,verified')
tweet_fields = url.parse.quote('attachments,author_id,created_at')
search_recent_url = endpoint + '?query={}&expansions={}&user.fields={}&media.fields={}&tweet.fields={}&max_results=100'.format(query,expansions,user_fields,media_fields,tweet_fields)

Pagination is required to gather more than one page of tweets for the query. This is done below to grab 20 pages of 100 tweets each, and then store the results in a dictionary.

In [177]:
def pagination(query, headerVals, pages):
    response_list = []
    next_token = ''
    for i in range(pages):
        if i > 0:
            this_query = query + '&next_token={}'.format(next_token)
        else:
            this_query = query
            
        response = requests.get(query, headers = header)
        response_dict = json5.loads(response.text)
        response_list.append(response_dict)
        next_token = response_dict['meta']['next_token']
    return response_list

page_dict = pagination(search_recent_url, header, 20)

## Data Organization, Exploration, and Analysis

The below is the code to generate the data frame that stores the tweets. There are 20 entries because of the 20 iterations of pagination. Each row in the data frame holds a list of dictionaries that hold 100 tweets each. The total tweet count is 2000.

In [178]:
results = pd.DataFrame.from_records(page_dict)

In [231]:
usernames = []
verified = []

for entry in results['includes']:
    for name in entry['users']:
        usernames.append(name['username'])
        verified.append(name['verified'])
 
userInfo = pd.concat([pd.Series(usernames), pd.Series(verified)], axis=1)
userInfo = userInfo.rename(columns={0: 'Username', 1: 'Verified'})
userInfo.groupby(['Verified']).count()

Unnamed: 0_level_0,Username
Verified,Unnamed: 1_level_1
False,1980


The above result is interesting. This result indicates that the conversation surrounding the Overwatch Halloween event is completely had by non-verified users. If I were a social media manager for Blizzard, I am not sure if this would be good or bad for the game. 
The event is circulating throughout large groups of non-verified people, and this means that the game has a large pool of people that are interested without being financially motivated like an influencer or company account. This can be good because the game is circulating by word of mouth without much financial input.
On the other side, a social media manager may want to see verified users speak about the game on the platform. Typically, verified users have a wider reach, and Blizzard would be interested in some influencer-like individuals talking about their game.
Overall, I think this exploration would be beneficial for Blizzard to see on a longer time scale. That scale could include from the time any teasers for the event came out, until the end of the event. Then, the company could understand if there is a need for more verified accounts to weigh in on their events to hopefully reach a bigger audience.

The next code snipped will gather all of the text fields for each of the tweets. The length of the resulting list should match the number of tweets requested from the API. This data is gathered from the includes field. 

In [232]:
tweet_text = []
tweet_length = []
for entry in results['data']:
    for tweet in entry:
        string = tweet['text'].encode('utf-16', 'surrogatepass').decode('utf-16')
        tweet_text.append(string)
        tweet_length.append(len(string))
len(tweet_text)

2000

In [197]:
tweetInfo = pd.concat([pd.Series(tweet_text), pd.Series(tweet_length)], axis=1)
tweetInfo = tweetInfo.rename(columns={0: 'Tweet', 1: 'Length'})
tweetLength = tweetInfo.sort_values(by=['Length'], ascending=False)
tweetLength.head()

Unnamed: 0,Tweet,Length
1702,"Community Overwatch 2 Night!! UFEA-ers, it's t...",303
601,"Community Overwatch 2 Night!! UFEA-ers, it's t...",303
201,"Community Overwatch 2 Night!! UFEA-ers, it's t...",303
501,"Community Overwatch 2 Night!! UFEA-ers, it's t...",303
1502,"Community Overwatch 2 Night!! UFEA-ers, it's t...",303


The above data frame includes all tweets sorted by their length. The longest tweet is 303 characters, including the links at the end, as well as new line characters. There are many repeating values which will be spoke about further. Overall, longer tweets could have a negative impact on the conversation because of the need for brevity in Social Media.

In [212]:
tweetLength['Tweet'][1702]

'Community Overwatch 2 Night!! UFEA-ers, it\'s time for some FUN community night play! We are going to be running a "Open Bracket" Overwatch 2 mini-tournament tonight at 6:00pm PT / 9:00pm ET. \n\nEvent Link: https://t.co/Zj6Ll4lc4l \n\nView on our Twitch Page: https://t.co/7WBmaP3r2P https://t.co/kL4YtqT4y9'

In [203]:
tweetLength['Tweet'][601] == tweetLength['Tweet'][1702] == tweetLength['Tweet'][201] == tweetLength['Tweet'][501]

True

In [209]:
len(tweetLength.loc[tweetLength['Length'] == 303])

20

The query at the beginning of the report excludes re-tweets from the returning object. It is interesting that about 20 of the tweets in this conversation are the longest tweets in the set, and also that they are all the same text values. This is a data issue in this report that was not expected.  
I do not know if repeating values are typically good or bad in a conversational dataset. There is one view where a repeating value represents a sentiment that is *stronger* than another because it is being repeated, another is that the repeated sentence is just spam in the conversation. In this situation, I think the repeating values may be negative spam.  
The text related to the repeated values seems to be an ad, and if that is the case, it does not provide a deeper insight into the conversation happening on twitter surrounding the Overwatch 2 event.
This may be an issue with the dataset if there are more instances of repeating tweets.

In [229]:
print(len(tweetInfo.Tweet.unique()))
print(len(userInfo.Username.unique()))

101
100


This snippet of code above confirms the issue with the tweet dataset. There are only *101* unique tweets out of a dataset of *2000* tweets with pagination. An equally important finding is that there are only 100 unique users tweeting about the event according to the dataset that was gathered.
There is something interesting going on with the conversation surrounding the Overwatch 2 Halloween event. The next steps would be to explore why there is such a low number of unique tweets, as well as why there are so little unique users talking about the event.
The last exploration that should be done regarding this problem is to look at when the tweets were created.

In [272]:
from dateutil import parser
from datetime import datetime

results['data'][0][0]['created_at']
created_vals = []

for entry in results['data']:
    for value in entry:
        created_vals.append(value['created_at'])
print(created_vals[1])
print(created_vals[-1])

2022-10-25T22:45:24.000Z
2022-10-25T22:07:00.000Z


Finally, from the above code we can see that the tweets are only collected within about a 45 minute time frame. This is important because of our previous findings. It is an issue that the dataset is not unique, but if we are only capturing a 45 minute conversation, this is slightly less worrisome. This does not mean the issue is non-existent, and it should definitely be investigated by Blizzard to see if they can help steer conversations toward the benefit of the game.

## Conclusion

The above tweet exploration performed on the Overwatch 2 Halloween Event derived valuable insights and experience for myself, but would be equally as insightful for a Blizzard social media employee. The dataset queried tweets across a 45 minute timeframe, but these tweets were not unique from one another and were mostly from non-verified users. A social media manager would see this and possibly attempt to drive more verified users to the conversation.