Animal Crossing: New Horizons for computational linguists

Importing the necessary modules.
Along the way I learned how to activate the kernel and install the modules that were missing.

In [7]:
#!/usr/bin/env python3
import praw #installed with the command pip install praw
import pandas as pd #installed with the command conda install pandas
import datetime as dt
import csv
import requests
import nltk
from bs4 import BeautifulSoup
from collections import Counter
from nltk.corpus import stopwords
from nltk.sentiment.vader import SentimentIntensityAnalyzer
from nltk.tokenize import word_tokenize

nltk.download("vader_lexicon")

[nltk_data] Downloading package vader_lexicon to
[nltk_data]     /home/ksl763/nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!


True

Creating the praw client to access Reddit.
The credentials are redacted for privacy.

In [8]:
reddit = praw.Reddit(client_id='',
                     client_secret='',
                     user_agent='',
                     username='',
                     password='')

subreddit = reddit.subreddit('AnimalCrossing')

A Villager class keeps all the relevant information organized. Each Villager object holds the villager's name, gender, personality, species, reputation, and number of mentions. The reputation will be calculated later using a sentiment analyzer.

In [9]:
class Villager:
    def __init__(self, name, gender, personality, species, reputation, mentions):
        self.name = name
        self.gender = gender
        self.personality = personality
        self.species = species
        self.reputation = reputation
        self.mentions = mentions

A Sentiment class holds the sentiment totals for the different genders, personality types, and species. The reputation will be calculated later using a sentiment analyzer.

In [10]:
class Sentiment:
    def __init__(self, reputation, mentions):
        self.reputation = reputation
        self.mentions = mentions

Extracting data on all villagers from the Animal Crossing wiki page. One problem was that the website uses the Unicode male/female signs for gender, which were not friendly to work with in Python. So I learned a little bit about character codes and compared the symbol against its Unicode escape ('\u2642' is the male sign) to turn it into a "Male"/"Female" string. Finally, a master dictionary is created with all the villagers, using the villagers' names as keys and the "Villager" objects as values. In addition, separate dictionaries hold the sentiment totals by species, gender, and personality type, and a Counter tracks mentions.

In [11]:
url="https://animalcrossing.fandom.com/wiki/Villager_list_(New_Horizons)"
html_content = requests.get(url).text
soup = BeautifulSoup(html_content, "html.parser") #name the parser explicitly to avoid the bs4 warning

#Dictionary to hold the data for each villager
#Key is the villager's name
#Value is the Villager object, which carries the sentiment total and mentions
characters = {}

#Dictionaries to hold sentiment data on gender, personality, and species respectively
#Key is each type of gender, personality, and species respectively
#Value is a Sentiment object with the total score and mentions for that type
Gender = {}
Personality = {}
Species = {}
Mentions = Counter()

for tr in soup.find_all("table")[2].find_all("tr")[1:]:  #skip the header row
    cells = tr.text.splitlines()  #split the row text once and reuse it
    name = cells[1]
    #in cells[3] the gender sign sits at index 1 and the personality starts at index 3
    if cells[3][1] == '\u2642':  #U+2642 is the male sign
        gender = "Male"
    else:
        gender = "Female"
    personality = cells[3][3:]
    species = cells[4].replace(" ", "")
    characters[name] = Villager(name, gender, personality, species, 0, 0)

    #every category starts with a zero score and zero mentions
    Gender[gender] = Sentiment(0,0)
    Personality[personality] = Sentiment(0,0)
    Species[species] = Sentiment(0,0)
    Mentions[name] = 0
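
The symbol-to-string step can also be pulled out into a small lookup. This is only a sketch: the helper name `parse_gender` and the sample strings are my own assumptions, not the wiki's actual markup. The one firm fact is that U+2642 is the male sign and U+2640 is the female sign.

```python
# Hypothetical helper: map the Unicode gender signs to plain strings.
GENDER_SYMBOLS = {
    "\u2642": "Male",    # MALE SIGN
    "\u2640": "Female",  # FEMALE SIGN
}

def parse_gender(cell_text):
    """Return 'Male', 'Female', or None if no gender sign is present."""
    for ch in cell_text:
        if ch in GENDER_SYMBOLS:
            return GENDER_SYMBOLS[ch]
    return None

# Made-up sample strings, for illustration only.
print(parse_gender(" \u2642 Lazy"))   # Male
print(parse_gender(" \u2640 Peppy"))  # Female
print(parse_gender("Unknown"))        # None
```

Returning None for an unrecognized cell, instead of falling back to "Female", would make a badly parsed row easy to spot.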

This will be our sentiment analyzer, built on NLTK's VADER. Documentation for VADER can be found here: https://www.nltk.org/_modules/nltk/sentiment/vader.html. Each comment that includes a villager's name is scored with the analyzer, and the number of mentions is counted as well. In the end, the total sentiment score is divided by the total number of mentions for each villager, gender, personality type, and species. The score used is VADER's compound score, which ranges from -1 to 1 (negative to positive).

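PS I was unable to get any scores for villagers with two-word names. The tokenizer splits each comment into single-word tokens, so a two-word name never matches a key in the dictionary, and those villagers end up with a score of 0. Also, this process has TERRIBLE run-time. The matching itself is not exponential: each dictionary lookup takes constant time, so the work grows roughly linearly with the number of tokens. There are just tons of comments to fetch and score, and I could not think of a faster way of going about it.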

In [12]:
sid = SentimentIntensityAnalyzer()

#collects comments from the top 500 submissions of the subreddit
example = subreddit.top(limit=500)

for submission in example:
    submission.comments.replace_more(limit=0)  #drop the "load more comments" placeholders
    for comment in submission.comments:  #iterates over top-level comments only
        tokens = word_tokenize(comment.body)
        for token in tokens:
            if token in characters:
                villager = characters[token]
                #score the comment once and reuse the value for all four totals
                score = sid.polarity_scores(comment.body)["compound"]

                villager.mentions += 1
                villager.reputation += score

                Gender[villager.gender].mentions += 1
                Gender[villager.gender].reputation += score

                Personality[villager.personality].mentions += 1
                Personality[villager.personality].reputation += score

                Species[villager.species].mentions += 1
                Species[villager.species].reputation += score
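
One possible way around the two-word name problem is to search the raw comment text with a regular expression instead of comparing single tokens. This is a sketch of a different technique from the one used above, not what the notebook actually does. The function names and the sample names and comment are assumptions for illustration.

```python
import re

def build_name_pattern(names):
    """Compile one regex that matches any of the names as a whole word or phrase."""
    # Longest names first, so a longer name wins over a shorter one it contains.
    ordered = sorted(names, key=len, reverse=True)
    alternation = "|".join(re.escape(n) for n in ordered)
    # Lookarounds instead of \b so names ending in punctuation still match.
    return re.compile(r"(?<!\w)(?:" + alternation + r")(?!\w)")

def names_mentioned(text, pattern):
    """Return the set of names found in text; each name is counted once."""
    return set(pattern.findall(text))

# Sample names and comment, for illustration only.
names = ["Raymond", "Tom", "Agent S", "Wart Jr."]
pattern = build_name_pattern(names)
print(sorted(names_mentioned("Agent S and Raymond moved in, Tom left.", pattern)))
```

Note that this counts a name once per comment, while the token loop counts every occurrence, so the mention totals would not be directly comparable. The match is also case-sensitive, like the original.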

Helper function that computes the mean sentiment score for each entry. The input is a dictionary whose values are objects holding a total sentiment score and a total number of mentions. The function divides the total score by the mentions for each entry, skips entries with no mentions, and returns a Counter with the entry as key and the mean sentiment score (rounded to four decimals) as value.

In [13]:
def SentimentScoreCalculator(d):
    result = Counter()
    
    for a in d.keys():
        if d[a].mentions != 0:
            result[a] = round(d[a].reputation/d[a].mentions,4)
    
    return result
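
Here is a small worked example of the averaging step on made-up totals. It is a self-contained sketch: `mean_scores` is a stand-in name that mirrors the helper above, and the numbers are invented.

```python
from collections import Counter

class Sentiment:
    """Running total of sentiment scores plus the number of mentions."""
    def __init__(self, reputation, mentions):
        self.reputation = reputation
        self.mentions = mentions

def mean_scores(d):
    """Average reputation per entry, skipping entries with zero mentions."""
    result = Counter()
    for key, entry in d.items():
        if entry.mentions != 0:
            result[key] = round(entry.reputation / entry.mentions, 4)
    return result

# Invented totals, for illustration only.
totals = {
    "Lazy": Sentiment(1.5, 3),     # 1.5 / 3 = 0.5
    "Cranky": Sentiment(-0.4, 2),  # -0.4 / 2 = -0.2
    "Peppy": Sentiment(0, 0),      # never mentioned, so it is skipped
}
scores = mean_scores(totals)
print(dict(scores))      # {'Lazy': 0.5, 'Cranky': -0.2}
print(scores["Peppy"])   # 0
```

A Counter returns 0 for a missing key, which is why a villager who was never mentioned still shows a reputation of 0 when the scores are looked up by name.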

In [14]:
Reputations = SentimentScoreCalculator(characters)
Gender_scores = SentimentScoreCalculator(Gender)
Personality_scores = SentimentScoreCalculator(Personality)
Species_scores = SentimentScoreCalculator(Species)

Writing all the data we have to a CSV file.

In [15]:
#Learned how to convert all this text data into a presentable CSV file
with open('ACNH.csv', mode='w', newline='') as csv_file: #newline='' lets the csv module handle line endings
    fieldnames = ['Name', 'Gender', 'Personality', 'Species', 'Reputation', 'Mentions']
    writer = csv.DictWriter(csv_file, fieldnames=fieldnames)
    
    writer.writeheader()
    for c in characters.keys():
        writer.writerow({"Name": characters[c].name,
                         "Gender": characters[c].gender,
                         "Personality": characters[c].personality,
                         "Species": characters[c].species,
                         "Reputation": Reputations[c],
                         "Mentions": characters[c].mentions})

Extracting some data:
Top 10 Villagers
Worst 10 Villagers

In [23]:
#print(Reputations.most_common(10))
#print(Gender_scores.most_common())
#print(Personality_scores.most_common())
#print(Species_scores.most_common())

In [17]:
from operator import itemgetter
import heapq
import collections

def least_common_values(array, to_find=None):
    counter = collections.Counter(array)
    if to_find is None:
        return sorted(counter.items(), key=itemgetter(1), reverse=False)
    return heapq.nsmallest(to_find, counter.items(), key=itemgetter(1))
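
For comparison, a Counter can also give the lowest-scoring entries directly by slicing the result of `most_common()` from the end. The sketch below uses invented scores and checks that the slice agrees with the `heapq.nsmallest` approach.

```python
import heapq
from collections import Counter
from operator import itemgetter

# Invented scores, for illustration only.
scores = Counter({"A": 0.8, "B": -0.3, "C": 0.1, "D": -0.6})

n = 2
# most_common() sorts from highest to lowest; slicing backwards from the end
# yields the n lowest entries, lowest first.
worst_two = scores.most_common()[:-n - 1:-1]
print(worst_two)  # [('D', -0.6), ('B', -0.3)]

# Same result with heapq.nsmallest, keyed on the score.
print(heapq.nsmallest(n, scores.items(), key=itemgetter(1)))
```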

In [18]:
#least_common_values(Reputations)

In [21]:
for a in characters.keys():
    Mentions[a] = characters[a].mentions
    
Mentions.most_common(10)

[('Tom', 511),
 ('Raymond', 218),
 ('Roald', 156),
 ('Kyle', 134),
 ('Pietro', 134),
 ('Rodney', 134),
 ('Dom', 127),
 ('Sherb', 120),
 ('Bob', 111),
 ('Bill', 106)]