# Analyzing Subreddits for Ubisoft Internal Matters
This section analyzes Reddit posts to gather discussions about Ubisoft's internal company culture, company policies, support, customer service, and management practices. The goal is to understand public perception of Ubisoft and the employee experiences that posters report.

Search topics (each keyword is matched against posts that also mention Ubisoft):
<ol>
<li>ubisoft company</li>
<li>ubisoft culture</li>
<li>ubisoft support</li>
<li>ubisoft customer service</li>
<li>ubisoft manage</li>
</ol>


# Part 1: Import libraries

In [3]:
import praw
import pandas as pd
import numpy as np
import datetime as dt
import csv
import pickle   # save and load files

# topic modelling 
import re    # regular expression
import nltk     # natural language processing
from nltk.corpus import stopwords    # stop words

# import spacy    # lemmatization
# import gensim      # topic modelling
# import gensim.corpora as corpora    # corpus and dictionary
# from gensim.utils import simple_preprocess  # tokenization    
# from gensim.models import CoherenceModel    # model evaluation

from wordcloud import WordCloud    # word cloud
from sklearn.feature_extraction.text import CountVectorizer    # count vectorizer       
from sklearn.decomposition import LatentDirichletAllocation as LDA # LDA    


# Enable logging for gensim - optional
import logging
logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', level=logging.ERROR)

import warnings
warnings.filterwarnings("ignore",category=DeprecationWarning)

# sentiment analysis
from nltk.sentiment.vader import SentimentIntensityAnalyzer
nltk.download('vader_lexicon')

# network analysis
import networkx as nx
from collections import defaultdict  

# plotting tools
import pyLDAvis     # topic modelling visualization
import pyLDAvis.lda_model   # topic modelling visualization
# import pyLDAvis.gensim_models   # topic modelling visualization
import seaborn as sns   # visualization
import matplotlib.pyplot as plt  # visualization
%matplotlib inline  

import os 
import sys
from dotenv import load_dotenv

[nltk_data] Downloading package vader_lexicon to
[nltk_data]     C:\Users\school\AppData\Roaming\nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!


# Part 2: Data Collection 
- Collect posts that mention Ubisoft from the target subreddits
- Prerequisites for collected data:
    - Post must be more than 20 words long
    - Account must be more than 1 week old
    - Account must have more than 10 karma
    - Post must be from the past year
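
The collection loop in this notebook only checks that a post mentions Ubisoft, so the four prerequisites above still need an explicit filter. The following is a minimal sketch of one, not the notebook's own method. The function name `meets_prerequisites` and the constants are hypothetical. It assumes that account age is measured at collection time, that karma means link karma plus comment karma, and that each object exposes `selftext`, `created_utc`, and an `author` with `created_utc`, `link_karma`, and `comment_karma`, as PRAW submissions and redditors normally do. Suspended accounts may lack these attributes, which this sketch does not handle.

```python
import datetime as dt
from types import SimpleNamespace

# Thresholds taken from the prerequisite list (names are hypothetical)
MIN_WORDS = 20
MIN_ACCOUNT_AGE = dt.timedelta(weeks=1)
MIN_KARMA = 10
MAX_POST_AGE = dt.timedelta(days=365)


def meets_prerequisites(submission, now=None):
    """Return True if a submission passes all four prerequisites."""
    now = now or dt.datetime.now(dt.timezone.utc)
    author = submission.author
    if author is None:  # deleted accounts have no author to check
        return False
    posted = dt.datetime.fromtimestamp(submission.created_utc, tz=dt.timezone.utc)
    joined = dt.datetime.fromtimestamp(author.created_utc, tz=dt.timezone.utc)
    karma = author.link_karma + author.comment_karma  # assumption: total karma
    return (
        len(submission.selftext.split()) > MIN_WORDS
        and now - joined > MIN_ACCOUNT_AGE
        and karma > MIN_KARMA
        and now - posted <= MAX_POST_AGE
    )


# Stand-in objects so the sketch runs without a Reddit connection
now = dt.datetime(2024, 10, 10, tzinfo=dt.timezone.utc)
author = SimpleNamespace(
    created_utc=(now - dt.timedelta(days=30)).timestamp(),
    link_karma=8,
    comment_karma=5,
)
post = SimpleNamespace(
    author=author,
    created_utc=(now - dt.timedelta(days=2)).timestamp(),
    selftext="word " * 25,
)
print(meets_prerequisites(post, now=now))
```

In the collection loop, such a check could sit next to the existing `'ubisoft' in ...` test. Note that reading author attributes triggers an extra API request per author, which slows collection.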

In [4]:
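# Load credentials from a local .env file.
# Note: load_dotenv() does not override variables that already exist, and
# Windows already defines USERNAME (the logged-in user). If the Reddit login
# fails, use a distinct variable name or call load_dotenv(override=True).
load_dotenv()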

CLIENT_ID = os.getenv("CLIENT_ID")
CLIENT_SECRET = os.getenv("CLIENT_SECRET")
USER_AGENT = os.getenv("USER_AGENT")
USERNAME = os.getenv("USERNAME")
PASSWORD = os.getenv("PASSWORD")

print("Env variables loaded")

Env variables loaded
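
The cell above prints its confirmation message even if a variable is missing, because `os.getenv` returns `None` rather than raising an error. A small check like the one below can catch that before connecting. This is a sketch only: `missing_env_vars` and `REQUIRED_VARS` are hypothetical names, and the variable list simply mirrors the five names read above.

```python
import os

# The five variable names read by the notebook
REQUIRED_VARS = ("CLIENT_ID", "CLIENT_SECRET", "USER_AGENT", "USERNAME", "PASSWORD")


def missing_env_vars(names=REQUIRED_VARS, environ=None):
    """Return the names that are unset or empty in the given environment mapping."""
    environ = os.environ if environ is None else environ
    return [name for name in names if not environ.get(name)]


missing = missing_env_vars()
if missing:
    print("Missing credentials:", ", ".join(missing))
else:
    print("All credentials present")
```

Passing a plain dictionary as `environ` makes the helper easy to test without touching the real environment.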


In [12]:
# initialise connection with reddit
reddit = praw.Reddit(client_id=CLIENT_ID, 
                     client_secret=CLIENT_SECRET, 
                     user_agent=USER_AGENT, 
                     username=USERNAME, 
                     password=PASSWORD)

## Scrape subreddits for 5 keywords related to Ubisoft internal matters (currently only r/ubisoft; the 7-subreddit list is commented out)

In [17]:
total_count = 0
topic_dict = {
    "id":[],
    "author": [],
    "created": [],
    "title":[],
    "score":[],
    "comms_num": [],
    "body":[],
    "url":[]
}
# subreddits = ['gaming', 'pcgaming', 'videogames', 'Ubisoft', 'assassinscreed', 'Rainbow6', 'farcry']
subreddits = ["ubisoft"]

# Combine subreddits into one subreddit instance
subreddit = reddit.subreddit('+'.join(subreddits))

# Define the keywords you want to search for
keywords = ['company', 'culture', 'support', 'customer service', 'manage']

# Loop over each keyword
for keyword in keywords:
    print(f"Searching for posts mentioning: {keyword} and 'Ubisoft'")
    post_count = 0

    for submission in subreddit.search(keyword, limit=1000, sort='top'):
        # Check if 'Ubisoft' is mentioned in the title or body of the post
        if 'ubisoft' in submission.title.lower() or 'ubisoft' in submission.selftext.lower():
            
            # Add the post to the dictionary
            topic_dict["id"].append(submission.id)
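            # store the username string rather than the Redditor object (None if deleted)
            topic_dict["author"].append(submission.author.name if submission.author else None)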
            topic_dict["created"].append(submission.created)
            topic_dict["title"].append(submission.title)
            topic_dict["score"].append(submission.score)
            topic_dict["comms_num"].append(submission.num_comments)
            topic_dict["body"].append(submission.selftext)
            topic_dict["url"].append(submission.url)

            post_count += 1
            total_count += 1
            print(f"Added {post_count} post mentioning keyword {keyword}" )

print(f"Data collection complete. A total of {total_count} posts were collected.")

Searching for posts mentioning: company and 'Ubisoft'
Added 1 post mentioning keyword company
Added 2 post mentioning keyword company
Added 3 post mentioning keyword company
Added 4 post mentioning keyword company
Added 5 post mentioning keyword company
Added 6 post mentioning keyword company
Added 7 post mentioning keyword company
Added 8 post mentioning keyword company
Added 9 post mentioning keyword company
Added 10 post mentioning keyword company
Added 11 post mentioning keyword company
Added 12 post mentioning keyword company
Added 13 post mentioning keyword company
Added 14 post mentioning keyword company
Added 15 post mentioning keyword company
Added 16 post mentioning keyword company
Added 17 post mentioning keyword company
Added 18 post mentioning keyword company
Added 19 post mentioning keyword company
Added 20 post mentioning keyword company
Added 21 post mentioning keyword company
Added 22 post mentioning keyword company
Added 23 post mentioning keyword company
Added 24 pos
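
Because the loop runs one search per keyword, a post that matches several keywords may be appended more than once. The sketch below shows one way to keep only the first row per post `id` in a dictionary of parallel lists such as `topic_dict`. The helper name `dedupe_by_id` and the sample ids are hypothetical and are not taken from the collected data.

```python
def dedupe_by_id(columns):
    """Return a copy of a dict of parallel lists, keeping the first row for each 'id'."""
    seen = set()
    keep = []  # row indices to retain, in original order
    for index, post_id in enumerate(columns["id"]):
        if post_id not in seen:
            seen.add(post_id)
            keep.append(index)
    return {name: [values[i] for i in keep] for name, values in columns.items()}


# Illustrative data: the post "a1" matched two different keywords
sample = {
    "id": ["a1", "b2", "a1"],
    "title": ["First post", "Second post", "First post"],
}
print(dedupe_by_id(sample))
```

Once the data is in a DataFrame, `topics_data.drop_duplicates(subset="id")` achieves the same result.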

In [18]:
topics_data = pd.DataFrame(topic_dict)  
topics_data

Unnamed: 0,id,author,created,title,score,comms_num,body,url
0,1fw0e08,MERKAT44,1.728052e+09,China's Tencent is considering buying Ubisoft:...,804,1161,The Guillemot family and Tencent are in talks ...,https://i.redd.it/6r4yfr1a1rsd1.jpeg
1,1fqjl48,OutlawGaming01,1.727429e+09,A Japanese gamer’s perspective on Assassin’s C...,518,1439,Yasuke being a legit samurai has never really ...,https://www.reddit.com/r/ubisoft/comments/1fqj...
2,1fyeiss,X-X-XIII,1.728325e+09,"From Loyal Fan to Loyal Hater, The Gamers Pers...",349,514,I grew up playing Ubisoft games when I got my ...,https://i.redd.it/imqtkrdildtd1.jpeg
3,mwfo2k,Comprehensive_Part42,1.619128e+09,I quit my job of 4 years at Ubisoft Customer S...,179,37,\#HoldUbisoftAccountable\n\nI worked for over ...,https://www.reddit.com/r/ubisoft/comments/mwfo...
4,1fx78t6,Raidenski,1.728184e+09,Ubisoft Is Reportedly Planning To Release 10 A...,140,206,When the news broke out that Assassin's Creed:...,https://www.thegamer.com/ubisoft-is-reportedly...
...,...,...,...,...,...,...,...,...
644,1e2nacz,JewelerAdditional751,1.720911e+09,Can't buy far cry 6 on ubisoft connect,2,1,https://preview.redd.it/az8zjoj19dcd1.png?widt...,https://www.reddit.com/r/ubisoft/comments/1e2n...
645,iwcecx,oniwod,1.600602e+09,i can't see which uplay account is linked to m...,2,0,i wanted to change the epic games account i li...,https://www.reddit.com/r/ubisoft/comments/iwce...
646,fndgdu,primeasoarus,1.584936e+09,UBISOFT ARE THIEVES,2,5,So imma try to shorten this as much as possibl...,https://www.reddit.com/r/ubisoft/comments/fndg...
647,kkxjn8,AppDude27,1.609045e+09,Ubisoft Should Try Making Their Own Version of...,2,1,Ubisoft does an amazing job at making open wor...,https://www.reddit.com/r/ubisoft/comments/kkxj...


In [None]:
'''
Fixing the date column

Reddit stores post times as UNIX timestamps (seconds since 1970-01-01 UTC).
Rather than converting each entry by hand, or with a site like
www.unixtimestamp.com, a short Python function automates the conversion.

We define the function, apply it to the "created" column, and add the result
to the dataset as a new "timestamp" column:
'''

def get_date(created):
    # fromtimestamp() without a tz argument returns the machine's local time
    return dt.datetime.fromtimestamp(created)

_timestamp = topics_data["created"].apply(get_date)

topics_data = topics_data.assign(timestamp=_timestamp)
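
Called without a timezone, `dt.datetime.fromtimestamp` returns a naive datetime in the local timezone of the machine running the notebook, so the same post can show different times on different machines. A timezone-aware variant is sketched below. The name `get_date_utc` is hypothetical, and the sample timestamp is an illustrative value of the same magnitude as the `created` column, not an exact entry from the dataset.

```python
import datetime as dt


def get_date_utc(created):
    """Convert a UNIX timestamp (seconds) to a timezone-aware UTC datetime."""
    return dt.datetime.fromtimestamp(created, tz=dt.timezone.utc)


print(get_date_utc(1728052000).isoformat())  # 2024-10-04T14:26:40+00:00
```

The function accepts floats as well as integers, so it can be applied to the `created` column in the same way as `get_date`.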

In [None]:

topics_data.to_csv('reddit_ubisoft_internal_posts.csv', index=False)