# Part 1: Data Collection (Ubisoft - External)
- Collecting data that mentions Ubisoft across the various subreddits we are trying to surf from
- Adding prerequisites for the data being collected
    - Post must be more 20 words 
    - Account must be more than 1 week old
    - Account must have more than 10 karma
    - Posts will be collected from the past 1 year


## (1) Import necessary libraries

In [1]:
%pip install praw

Note: you may need to restart the kernel to use updated packages.


In [8]:
import praw
import pandas as pd
import datetime as dt
import csv

import os 
import sys
import csv
from dotenv import load_dotenv

import itertools



In [3]:
# bring in env variables 
load_dotenv()

CLIENT_ID = os.getenv("CLIENT_ID")
CLIENT_SECRET = os.getenv("CLIENT_SECRET")
USER_AGENT = os.getenv("USER_AGENT")
USERNAME = os.getenv("USERNAME")
PASSWORD = os.getenv("PASSWORD")

print("Env variables loaded")

Env variables loaded


In [4]:
# initialise connection with reddit
reddit = praw.Reddit(client_id=CLIENT_ID, 
                     client_secret=CLIENT_SECRET, 
                     user_agent=USER_AGENT, 
                     username=USERNAME, 
                     password=PASSWORD)

Version 7.7.1 of praw is outdated. Version 7.8.1 was released 3 days ago.


# (2) Data Scraping 
Within this subsection, we aim to focus mainly on the <strong>EXTERNAL</strong> factors as to why Ubisoft might have failed as a company. We will begin by doing a targeted, unbiased search of all things related to Ubisoft and their games within specific subreddits using neutral keywords. The subreddits that we will be focusing on are 

- r/Ubisoft 
- r/AssassinsCreed
- r/Rainbow6
- r/TheDivision
- r/GhostRecon
- r/ForHonorGame
- r/WatchDogs
- r/Gaming
- r/Games
- r/PCGaming
- r/VideoGameNews
- r/Steam
- r/CrackWatch (for checking how well Ubisoft monitors piracy + bug fixing issues)

The keywords that we will be using to target this search would be 

- (Ubisoft) Ubisoft, Ubi Soft, ubisoft, ubi, ubisofts, Ubisofts, ubis
- (Locations) Ubisoft [geographic location]
- (Platforms) Uplay, UConnect, Ubisoft Connect, Ubisoft Store, Login, PC, Playstation, Xbox, Nintendo Switch, Luna
- (Events) Ubisoft Forward, E3, Showcase, EA + Ubisoft
- (Games/Acronyms) AC, Assassin's Creed, Far Cry, FC5, FC6, Tom Clancy, R6S, R6 Siege, R6 Extraction, Tom Clancy's Rainbow Six, Tom Clancy's The Division, Watch Dogs, Ghost Recon, Just Dance, Prince of Persia, Splinter Cell (this will mainly be used in the r/gaming and more general subreddits), Skull and Bones, Beyond Good and Evil, Beyond Good and Evil 2, Riders Republic, Immortals Fenyx Rising, AC Valhalla, AC Origins, 
- (Game Features) Updates, DLC, Expansion, Patch, season pass, Open World, RPG, multiplayer, speed



In [9]:
# initialise variables 
# 1. subreddits to be scraped 
# 2. neutral keyword to better target searches within posts 

ubisoft_subreddits = ['Ubisoft', 
              "assassinscreed", 
              "Rainbow6", 
              "GhostRecon", 
              "thedivision", 
              "farcry", 
              "farcry5"
              "farcry6", 
              "watch_dogs", 
              "forhonor",
              "Splintercell", 
              "PrinceOfPersia", 
              "JustDance", 
              "Steep", 
              "TrialsGames", 
              "anno", 
              "FenyxRising",
            #   "TheSettlers", 
            #   "beyondgoodandevil", 
              "SkullAndBones",
              "ACValhalla",
              "AssassinsCreedOdyssey",
              "AssassinsCreedOrigins"
              ]

gaming_subreddits = ['gaming',
                     'pcgaming',
                     'PS4',
                     'XboxOne',
                     'NintendoSwitch',
                     'Steam',
                     'PS5',
                     'XboxSeriesX',
                     'CrackWatch']

company_keywords = [
    'Ubisoft', 'ubisoft', 'Ubi Soft', 'ubi soft', 'ubi', 'ubisofts', 'Ubisofts', 'ubis'
]

locations = [
    'Ubisoft Montreal', 'Ubisoft Toronto', 'Ubisoft Paris', 'Ubisoft Shanghai', 'Ubisoft Singapore'
]

platform_keywords = [
    'Uplay', 'UConnect', 'Ubisoft Connect', 'Ubisoft Store', 'Login', 'PC', 'Playstation', 'Xbox', 'Nintendo Switch', 'Luna'
]

event_keywords = [
    'Ubisoft Forward', 'E3', 'Showcase', 'EA + Ubisoft', 'EA and Ubisoft'
]

game_keywords = [
    'AC', "Assassin's Creed", 'Far Cry', 'FC5', 'FC6', 'Tom Clancy', 'R6S', 'R6 Siege', 'R6 Extraction',
    "Tom Clancy's Rainbow Six", "Tom Clancy's The Division", 'Watch Dogs', 'Ghost Recon', 'Just Dance',
    'Prince of Persia', 'Splinter Cell', 'Skull and Bones', 'Beyond Good and Evil', 'Beyond Good and Evil 2',
    'Riders Republic', 'Immortals Fenyx Rising', 'AC Valhalla', 'AC Origins', "Assassin's Creed Valhalla",
    "Assassin's Creed Origins"
]

feature_keywords = [
    'Updates', 'DLC', 'Expansion', 'Patch', 'Season Pass', 'Open World', 'RPG', 'Multiplayer', 'Speed'
]





In [10]:

# this is how the search query will be created -> difference combinations of the aforementioned keywords
company_phrase = ' OR '.join(f'"{word}"' for word in company_keywords)
game_phrase = ' OR '.join(f'"{word}"' for word in game_keywords)
feature_phrase = ' OR '.join(f'"{word}"' for word in feature_keywords)
search_query = f'({company_phrase}) AND ({game_phrase})'

print(search_query)


("Ubisoft" OR "ubisoft" OR "Ubi Soft" OR "ubi soft" OR "ubi" OR "ubisofts" OR "Ubisofts" OR "ubis") AND ("AC" OR "Assassin's Creed" OR "Far Cry" OR "FC5" OR "FC6" OR "Tom Clancy" OR "R6S" OR "R6 Siege" OR "R6 Extraction" OR "Tom Clancy's Rainbow Six" OR "Tom Clancy's The Division" OR "Watch Dogs" OR "Ghost Recon" OR "Just Dance" OR "Prince of Persia" OR "Splinter Cell" OR "Skull and Bones" OR "Beyond Good and Evil" OR "Beyond Good and Evil 2" OR "Riders Republic" OR "Immortals Fenyx Rising" OR "AC Valhalla" OR "AC Origins" OR "Assassin's Creed Valhalla" OR "Assassin's Creed Origins")


In [11]:
def create_query(keyword_list):
    return ' OR '.join(f'"{keyword}"' for keyword in keyword_list)

company_query = create_query(company_keywords + locations)
game_query = create_query(game_keywords)
feature_query = create_query(feature_keywords)
event_query = create_query(event_keywords)
platform_query = create_query(platform_keywords)


In [12]:
LIMITS_SUBREDDIT = 1000
# Total expected submissions
total_expected_submissions = LIMITS_SUBREDDIT * len(ubisoft_subreddits + gaming_subreddits)

def search_submissions(praw_instance, subreddits, query, limit_per_subreddit):
    collected_submissions = []
    for subreddit_name in subreddits:
        subreddit = praw_instance.subreddit(subreddit_name)
        print(f"Searching submissions in r/{subreddit_name}...")
        try:
            submissions = subreddit.search(query=query, sort='new', limit=limit_per_subreddit, syntax='lucene')
            for submission in submissions:
                submission_data = {
                    'id': submission.id,
                    'title': submission.title,
                    'selftext': submission.selftext,
                    'created_utc': submission.created_utc,
                    'subreddit': submission.subreddit.display_name,
                    'url': submission.url,
                    'author': str(submission.author)
                }
                collected_submissions.append(submission_data)
        except Exception as e:
            print(f"An error occurred in r/{subreddit_name}: {e}")
    return collected_submissions

def collect_comments_from_submissions(praw_instance, submission_ids):
    collected_comments = []
    for submission_id in submission_ids:
        submission = praw_instance.submission(id=submission_id)
        submission.comments.replace_more(limit=None)
        for comment in submission.comments.list():
            comment_data = {
                'id': comment.id,
                'body': comment.body,
                'created_utc': comment.created_utc,
                'submission_id': submission_id,
                'subreddit': comment.subreddit.display_name,
                'author': str(comment.author)
            }
            collected_comments.append(comment_data)
    return collected_comments










In [13]:
submissions = search_submissions(reddit, (gaming_subreddits + ubisoft_subreddits), search_query, LIMITS_SUBREDDIT)
print(f"Total submissions collected: {len(submissions)}")

submissions_df = pd.DataFrame(submissions)
submissions_df.drop_duplicates(subset='id', inplace=True)
submissions_df.to_csv('focused_reddit_data.csv', index=False)

Searching submissions in r/gaming...
Searching submissions in r/pcgaming...
Searching submissions in r/PS4...
Searching submissions in r/XboxOne...
Searching submissions in r/NintendoSwitch...
Searching submissions in r/Steam...
Searching submissions in r/PS5...
Searching submissions in r/XboxSeriesX...
Searching submissions in r/CrackWatch...
Searching submissions in r/Ubisoft...
Searching submissions in r/assassinscreed...
Searching submissions in r/Rainbow6...
Searching submissions in r/GhostRecon...
Searching submissions in r/thedivision...
Searching submissions in r/farcry...
Searching submissions in r/farcry5farcry6...
An error occurred in r/farcry5farcry6: Redirect to /subreddits/search
Searching submissions in r/watch_dogs...
Searching submissions in r/forhonor...
Searching submissions in r/Splintercell...
Searching submissions in r/PrinceOfPersia...
Searching submissions in r/JustDance...
Searching submissions in r/Steep...
Searching submissions in r/TrialsGames...
Searching s

FileNotFoundError: [Errno 2] No such file or directory: 'focused_reddit_data.csv'