# Twitter Webscraping and SWGOH

## Background

This mini-project of mine involves using Twitter's API to scrape tweets about an EA mobile game, Star Wars Galaxy of Heroes (SWGOH, for short). The game involves battling against other players using a team of Star Wars characters the player must obtain through farming or in-app purchases. The game is free to play with a variety of microtransactions offered.


As a player myself, I often saw a good amount of animosity toward Capital Games and Electronic Arts over the state of the game, the quality of new additions to the app, and the pricing of in-game transactions. Given this, I was curious whether I could quantify this sentiment using tweets that mention the game. I hope to expand this sentiment analysis to the EA forums at a later date and, provided I can find the data, I'd like to perform a market analysis of EA's revenue from the game and attempt to tie it back to general player sentiment.

## Technical Stuff

I first signed up for a Twitter developer account, which allowed me to scrape a limited number of tweets using the API. The tweets are returned as JSON, which I unpack into a list.

My API access only allowed me to retrieve about 50 tweets, from up to a week before the script was run. This really was not enough data for a proper sentiment analysis, especially since many of the tweets were automated reminders about in-game events. To work around this, I have been running the script once a week for some time now and saving each week's tweets to a CSV file. Once this file holds a good amount of data, I plan to run a more computationally intensive sentiment analysis.
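The weekly accumulation step could look roughly like the sketch below. It is not the script I actually run: the function name `append_new_tweets`, the column list `FIELDS`, and the idea of de-duplicating on the tweet `id` are all assumptions made for illustration, since consecutive weekly runs can overlap.

```python
import csv
import os

# Assumed column layout, not the actual schema of my CSV file.
FIELDS = ["id", "created_at", "text"]

def append_new_tweets(csv_path, tweets):
    """Append tweets whose id is not yet in csv_path; return how many were added."""
    file_has_rows = os.path.exists(csv_path) and os.path.getsize(csv_path) > 0

    # Collect the ids that were saved by earlier runs.
    seen_ids = set()
    if file_has_rows:
        with open(csv_path, newline="", encoding="utf-8") as f:
            seen_ids = {row["id"] for row in csv.DictReader(f)}

    new_rows = [t for t in tweets if str(t["id"]) not in seen_ids]

    # Open in append mode so previous weeks are preserved.
    with open(csv_path, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS, extrasaction="ignore")
        if not file_has_rows:
            writer.writeheader()
        writer.writerows(new_rows)
    return len(new_rows)
```

Calling it once per run with the tweet dictionaries from the API response keeps the file growing without repeating tweets that two runs have in common.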

In the meantime, after scraping the recent tweets, I'm using VADER from the Natural Language Toolkit (NLTK) to perform some surface-level sentiment analysis on them. I'll be performing tokenization, lemmatization, stemming, and stopword removal on these corpora.



In [1]:
# For sending GET requests from the API
import requests
# For saving access tokens and for file management when creating and adding to the dataset
import os
# For dealing with json responses we receive from the API
import json
# For displaying the data after
import pandas as pd
# For saving the response data in CSV format
import csv
# For parsing the dates received from twitter in readable formats
import datetime
import dateutil.parser
import unicodedata
#To add wait time between requests
import time

In [5]:
os.environ['TOKEN'] = "<YOUR_BEARER_TOKEN>"  # placeholder: keep the real bearer token out of the notebook
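One way to avoid typing the token into a cell at all is to set it in the shell before starting the notebook and only read it here. The helper below is a small sketch of that idea; `load_bearer_token` is a hypothetical name, and `TOKEN` is simply the variable name this notebook already uses.

```python
import os

def load_bearer_token(var_name="TOKEN"):
    """Read the bearer token from the environment and fail early if it is missing."""
    token = os.environ.get(var_name)
    if not token:
        raise RuntimeError(
            f"Set the {var_name} environment variable before running the notebook."
        )
    return token
```

Failing early with a clear message is easier to debug than an authorization error from the API several cells later.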

In [6]:
def auth():
    return os.getenv('TOKEN')

def create_headers(bearer_token):
    headers = {"Authorization": "Bearer {}".format(bearer_token)}
    return headers

def create_url(keyword, max_results = 50):
    
    search_url = "https://api.twitter.com/2/tweets/search/recent" #Change to the endpoint you want to collect data from

    #change params based on the endpoint you are using
    query_params = {'query': keyword,
                    #'start_time': start_date,
                    #'end_time': end_date,
                    'max_results': max_results,
                    'expansions': 'author_id,in_reply_to_user_id,geo.place_id',
                    'tweet.fields': 'id,text,author_id,in_reply_to_user_id,geo,conversation_id,created_at,lang,public_metrics,referenced_tweets,reply_settings,source',
                    'user.fields': 'id,name,username,created_at,description,public_metrics,verified',
                    'place.fields': 'full_name,id,country,country_code,geo,name,place_type',
                    'next_token': {}}
    return (search_url, query_params)

def connect_to_endpoint(url, headers, params, next_token = None):
    params['next_token'] = next_token   #params object received from create_url function
    response = requests.request("GET", url, headers = headers, params = params)
    print("Endpoint Response Code: " + str(response.status_code))
    if response.status_code != 200:
        raise Exception(response.status_code, response.text)
    return response.json()
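`connect_to_endpoint` accepts a `next_token`, but the cells in this notebook only request a single page. A paging loop could be sketched as follows, assuming the response carries the token under `meta.next_token` as the v2 search endpoints do. `collect_pages` and its `fetch_page` argument are hypothetical names; `fetch_page` is any callable that takes a token (or `None`) and returns the parsed JSON.

```python
import time

def collect_pages(fetch_page, max_pages=5, pause_seconds=0.0):
    """Follow meta.next_token until it disappears or max_pages is reached."""
    tweets = []
    next_token = None
    for _ in range(max_pages):
        page = fetch_page(next_token)
        # A page with no matches may omit the "data" key entirely.
        tweets.extend(page.get("data", []))
        next_token = page.get("meta", {}).get("next_token")
        if next_token is None:
            break
        # Optional pause between requests to stay under rate limits.
        time.sleep(pause_seconds)
    return tweets
```

With the functions defined above, `fetch_page` could be `lambda tok: connect_to_endpoint(search_url, headers, query_params, tok)`, where `search_url` and `query_params` are the two values returned by `create_url`.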

In [36]:
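#Inputs for the request
bearer_token = auth()
headers = create_headers(bearer_token)
keyword = "SWGOH lang:en"
#start_time and end_time are currently unused: the matching params are commented out in create_url.
#As written, end_time is earlier than start_time, so they would need correcting before being sent.
start_time = "2023-03-03T00:00:00.000Z"
end_time = "2021-08-29T00:00:00.000Z"
max_results = 50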
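If the `start_time` and `end_time` parameters are ever switched back on, the two timestamps could be derived from the current time instead of being typed by hand. The sketch below is an assumption-laden illustration: `recent_window` is a hypothetical helper, the six-day lookback is chosen to stay inside the roughly one-week window mentioned earlier, and the 30-second margin on the end time is a guess at a safe gap from "now".

```python
from datetime import datetime, timedelta, timezone

def recent_window(now, days_back=6, margin_seconds=30):
    """Return (start_time, end_time) as ISO 8601 UTC strings, with start before end."""
    start = now - timedelta(days=days_back)
    end = now - timedelta(seconds=margin_seconds)
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    return start.strftime(fmt), end.strftime(fmt)

# Example call using the current UTC time.
start_time, end_time = recent_window(datetime.now(timezone.utc))
```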

In [45]:
url = create_url(keyword, max_results)  # pass max_results explicitly instead of relying on the default
json_response = connect_to_endpoint(url[0], headers, url[1])

Endpoint Response Code: 200


In [46]:
# Keep the text of up to 50 tweets from the response
tweets = [tweet["text"] for tweet in json_response["data"][:50]]
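Only the tweet text is kept here, although the request also asks for fields such as `created_at` and `public_metrics`. For the CSV dataset it may be worth keeping a few of them. The sketch below shows one possible flattening; `tweets_to_rows` and the chosen columns are assumptions, not what the notebook currently stores.

```python
def tweets_to_rows(response):
    """Flatten the tweets in a search response into simple dictionaries."""
    rows = []
    for tweet in response.get("data", []):
        metrics = tweet.get("public_metrics", {})
        rows.append({
            "id": tweet.get("id"),
            "created_at": tweet.get("created_at"),
            "text": tweet.get("text", ""),
            "like_count": metrics.get("like_count"),
        })
    return rows
```

The resulting list can be passed straight to `pd.DataFrame(rows)` to get named columns instead of a single unnamed one.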



In [47]:
tweets

['StarkSG is LIVE!! join us on twitch for SWGOH GAC, strategy discussion, and more! https://t.co/YBkjz6cxW3',
 'RT @bpimpanellie: late night swgoh chcanery! Can we win over impossible odds?  ....The Long Journey to 1000 FOLLOWS !!!!! https://t.co/Futt…',
 'late night swgoh chcanery! Can we win over impossible odds?  ....The Long Journey to 1000 FOLLOWS !!!!! https://t.co/FuttE9ANeb',
 '@ea_capitalgames how do I switch emails for my SWGOH account? Says I need to talk to customer service.',
 'My Ewoks finally did it…. Now I just gotta get C-3PO and Shaak Ti up to par….  #SWGoH https://t.co/WHmdrr2gTs',
 'I’ll be happy when I finish farming the inquisitors… #SWGoH https://t.co/w8ZLhN8JwH',
 'This last GAC is turning into a who can win with the least effort contest #SWGOH https://t.co/VzNhRsXYFI',
 "I'm loving this @spngl46 @akairistock2 @kotobukirakira @natsumarutomoka @brotherhandmade @yuzukilemon @swgoh_dokkan @surfdonuts @wmdv0VD4rsfmYZl @Reon_web3univ https://t.co/PzquP98AlC",
 'The n

In [48]:
pd.DataFrame(tweets)

Unnamed: 0,0
0,StarkSG is LIVE!! join us on twitch for SWGOH ...
1,RT @bpimpanellie: late night swgoh chcanery! C...
2,late night swgoh chcanery! Can we win over imp...
3,@ea_capitalgames how do I switch emails for my...
4,My Ewoks finally did it…. Now I just gotta get...
5,I’ll be happy when I finish farming the inquis...
6,This last GAC is turning into a who can win wi...
7,I'm loving this @spngl46 @akairistock2 @kotobu...
8,The next Contraband Cargo will be on 2023-03-0...
9,Supreme Leader Kylo Ren Ultimate Ability Unloc...


In [49]:
import nltk

In [50]:
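# Importing PorterStemmer from the nltk library
from nltk.stem import PorterStemmer
pst = PorterStemmer()
pst.stem("waiting")  # quick check on a single word

# Note: stem() expects one word, so a whole tweet is treated as a single token.
# In practice this lowercases the text and at most trims a suffix from its last
# characters (see the output below).
tweets = [pst.stem(x) for x in tweets]

# Importing the WordNet lemmatizer from nltk
#nltk.download("wordnet")  # uncomment on first run to fetch the WordNet data
from nltk.stem import WordNetLemmatizer
lem = WordNetLemmatizer()

# Same caveat: lemmatize() works on single words, so whole tweets pass through unchanged.
tweets = [lem.lemmatize(x) for x in tweets]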
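Because the stemmer works on single words, applying it to each word separately is what actually stems a tweet. A minimal sketch, assuming plain whitespace splitting is good enough as a tokenizer at this stage (`stem_tweet` is a hypothetical helper name):

```python
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()

def stem_tweet(text):
    """Stem each whitespace-separated word instead of the tweet as one string."""
    return " ".join(stemmer.stem(word) for word in text.split())
```

The same per-word pattern applies to `WordNetLemmatizer.lemmatize`. Whether stemming helps at all before VADER is a separate question, since VADER looks words up in its own lexicon and stemmed forms may no longer match.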



In [51]:
tweets

['starksg is live!! join us on twitch for swgoh gac, strategy discussion, and more! https://t.co/ybkjz6cxw3',
 'rt @bpimpanellie: late night swgoh chcanery! can we win over impossible odds?  ....the long journey to 1000 follows !!!!! https://t.co/futt…',
 'late night swgoh chcanery! can we win over impossible odds?  ....the long journey to 1000 follows !!!!! https://t.co/futte9aneb',
 '@ea_capitalgames how do i switch emails for my swgoh account? says i need to talk to customer service.',
 'my ewoks finally did it…. now i just gotta get c-3po and shaak ti up to par….  #swgoh https://t.co/whmdrr2gt',
 'i’ll be happy when i finish farming the inquisitors… #swgoh https://t.co/w8zlhn8jwh',
 'this last gac is turning into a who can win with the least effort contest #swgoh https://t.co/vznhrsxyfi',
 "i'm loving this @spngl46 @akairistock2 @kotobukirakira @natsumarutomoka @brotherhandmade @yuzukilemon @swgoh_dokkan @surfdonuts @wmdv0vd4rsfmyzl @reon_web3univ https://t.co/pzqup98alc",
 'the ne

In [52]:
import re  #the standard-library module handles this pattern; the third-party regex package is not required
regex = re.compile('[^a-zA-Z ]')
#sub() takes the replacement first and the input string second
tweets = [regex.sub('', x) for x in tweets]
tweets

['starksg is live join us on twitch for swgoh gac strategy discussion and more httpstcoybkjzcxw',
 'rt bpimpanellie late night swgoh chcanery can we win over impossible odds  the long journey to  follows  httpstcofutt',
 'late night swgoh chcanery can we win over impossible odds  the long journey to  follows  httpstcofutteaneb',
 'eacapitalgames how do i switch emails for my swgoh account says i need to talk to customer service',
 'my ewoks finally did it now i just gotta get cpo and shaak ti up to par  swgoh httpstcowhmdrrgt',
 'ill be happy when i finish farming the inquisitors swgoh httpstcowzlhnjwh',
 'this last gac is turning into a who can win with the least effort contest swgoh httpstcovznhrsxyfi',
 'im loving this spngl akairistock kotobukirakira natsumarutomoka brotherhandmade yuzukilemon swgohdokkan surfdonuts wmdvvdrsfmyzl reonwebuniv httpstcopzqupalc',
 'the next contraband cargo will be on  swgoh events httpstcoctuuvsxhcl',
 'supreme leader kylo ren ultimate ability unlock
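As the output shows, stripping every non-letter character fuses links and handles into tokens such as `httpstco...`, which carry no sentiment. One option is to remove URLs and @-mentions before the character filter. The sketch below is only an illustration of that ordering; `clean_tweet` and the three patterns are assumptions, and whether mentions should be dropped at all is a judgment call.

```python
import re

URL_RE = re.compile(r"https?://\S+")       # links such as https://t.co/...
MENTION_RE = re.compile(r"@\w+")           # @handles
NON_LETTER_RE = re.compile(r"[^a-zA-Z ]")  # same filter as the cell above

def clean_tweet(text):
    """Drop URLs and mentions first, then everything that is not a letter or space."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = NON_LETTER_RE.sub("", text)
    # Collapse the runs of spaces left behind by the removals.
    return " ".join(text.split())
```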

In [53]:
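from nltk.corpus import stopwords
from nltk import word_tokenize
#nltk.download("stopwords")  # uncomment on first run to fetch the stopword list
a = set(stopwords.words('english'))  # English stopwords, used for filtering in the next cell
tweets = [word_tokenize(x.lower()) for x in tweets]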






In [57]:
#remove stopwords

for i in range(len(tweets)):
    tweets[i] = [x for x in tweets[i] if x not in a]

tweets

[['starksg',
  'live',
  'join',
  'us',
  'twitch',
  'swgoh',
  'gac',
  'strategy',
  'discussion',
  'httpstcoybkjzcxw'],
 ['rt',
  'bpimpanellie',
  'late',
  'night',
  'swgoh',
  'chcanery',
  'win',
  'impossible',
  'odds',
  'long',
  'journey',
  'follows',
  'httpstcofutt'],
 ['late',
  'night',
  'swgoh',
  'chcanery',
  'win',
  'impossible',
  'odds',
  'long',
  'journey',
  'follows',
  'httpstcofutteaneb'],
 ['eacapitalgames',
  'switch',
  'emails',
  'swgoh',
  'account',
  'says',
  'need',
  'talk',
  'customer',
  'service'],
 ['ewoks',
  'finally',
  'got',
  'ta',
  'get',
  'cpo',
  'shaak',
  'ti',
  'par',
  'swgoh',
  'httpstcowhmdrrgt'],
 ['ill',
  'happy',
  'finish',
  'farming',
  'inquisitors',
  'swgoh',
  'httpstcowzlhnjwh'],
 ['last',
  'gac',
  'turning',
  'win',
  'least',
  'effort',
  'contest',
  'swgoh',
  'httpstcovznhrsxyfi'],
 ['im',
  'loving',
  'spngl',
  'akairistock',
  'kotobukirakira',
  'natsumarutomoka',
  'brotherhandmade',
  'yu
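One thing to watch with stopword removal ahead of VADER: NLTK's English stopword list includes negations such as "not", "no", and "nor", and VADER uses negation as a signal when scoring. A variant of the filter that keeps them could look like this sketch, where `remove_stopwords` and the `NEGATIONS` set are hypothetical and the stopword set is passed in rather than loaded from NLTK:

```python
# Words to keep even if they appear in the stopword list (an assumed, minimal set).
NEGATIONS = {"not", "no", "nor"}

def remove_stopwords(tokens, stop_words, keep=NEGATIONS):
    """Drop stopwords from a token list but preserve the words in keep."""
    return [tok for tok in tokens if tok in keep or tok not in stop_words]
```

In this notebook it would be called as `remove_stopwords(tweets[i], a)` inside the loop above.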

In [62]:
# rejoining each set of words so that VADER can analyze properly

for i in range(len(tweets)):
    tweets[i]=" ".join(tweets[i])
    
tweets

['starksg live join us twitch swgoh gac strategy discussion httpstcoybkjzcxw',
 'rt bpimpanellie late night swgoh chcanery win impossible odds long journey follows httpstcofutt',
 'late night swgoh chcanery win impossible odds long journey follows httpstcofutteaneb',
 'eacapitalgames switch emails swgoh account says need talk customer service',
 'ewoks finally got ta get cpo shaak ti par swgoh httpstcowhmdrrgt',
 'ill happy finish farming inquisitors swgoh httpstcowzlhnjwh',
 'last gac turning win least effort contest swgoh httpstcovznhrsxyfi',
 'im loving spngl akairistock kotobukirakira natsumarutomoka brotherhandmade yuzukilemon swgohdokkan surfdonuts wmdvvdrsfmyzl reonwebuniv httpstcopzqupalc',
 'next contraband cargo swgoh events httpstcoctuuvsxhcl',
 'supreme leader kylo ren ultimate ability unlock guide smash th httpstcogpuwynzuda via youtube mobilegame swgoh starwars galaxyofheroes youtub',
 'swgoh grand arena championship v gac star wars news galaxy heroes amp saturday httpstc

In [63]:
import nltk

#nltk.download("vader_lexicon")

from nltk.sentiment.vader import SentimentIntensityAnalyzer

In [64]:
vds = SentimentIntensityAnalyzer()
text = tweets[0]

print(text)

print("Sentiment Analysis of text")
print(vds.polarity_scores(text))

starksg live join us twitch swgoh gac strategy discussion httpstcoybkjzcxw
Sentiment Analysis of text
{'neg': 0.0, 'neu': 0.804, 'pos': 0.196, 'compound': 0.296}
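The `compound` value is the easiest of the four scores to turn into a label. A common convention from the VADER documentation treats scores of 0.05 and above as positive and -0.05 and below as negative; the sketch below applies it, with `label_sentiment` being a hypothetical helper and the threshold left adjustable.

```python
def label_sentiment(scores, threshold=0.05):
    """Map a VADER score dictionary to 'positive', 'negative', or 'neutral'."""
    compound = scores["compound"]
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"
```

Applied to the score printed above (compound 0.296), this returns `'positive'`.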


In [65]:
for i in tweets:
    print("Tweet: ", i)
    print("Score: ", vds.polarity_scores(i))
    print("********")

Tweet:  starksg live join us twitch swgoh gac strategy discussion httpstcoybkjzcxw
Score:  {'neg': 0.0, 'neu': 0.804, 'pos': 0.196, 'compound': 0.296}
********
Tweet:  rt bpimpanellie late night swgoh chcanery win impossible odds long journey follows httpstcofutt
Score:  {'neg': 0.0, 'neu': 0.759, 'pos': 0.241, 'compound': 0.5859}
********
Tweet:  late night swgoh chcanery win impossible odds long journey follows httpstcofutteaneb
Score:  {'neg': 0.0, 'neu': 0.725, 'pos': 0.275, 'compound': 0.5859}
********
Tweet:  eacapitalgames switch emails swgoh account says need talk customer service
Score:  {'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound': 0.0}
********
Tweet:  ewoks finally got ta get cpo shaak ti par swgoh httpstcowhmdrrgt
Score:  {'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound': 0.0}
********
Tweet:  ill happy finish farming inquisitors swgoh httpstcowzlhnjwh
Score:  {'neg': 0.243, 'neu': 0.435, 'pos': 0.322, 'compound': 0.2263}
********
Tweet:  last gac turning win least effort