Machine learning used to classify the race of characters from Lord of the Rings
README.md
SparkScript.py
characters_data.csv
characters_no_ainur.csv
characters_no_surnames.csv
lotr-names-html.html
remove_ainur.R
scrape_names_from_html.R


An approach to classifying the races of characters from Lord of the Rings using their names as features and a naive Bayes classifier

Overview

As a huge fan of the Lord of the Rings and Tolkien's work, I wanted to find a way of applying machine learning to data from the legendarium. While looking for an interesting problem, I had the idea of exploring the relationship between a character's name and that character's race.

In this report, I discuss an approach for predicting the races of characters from Lord of the Rings using a naive Bayes classifier and several natural language processing techniques. The dataset consists of 789 observations (characters), each labeled with its race.
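To make the idea concrete, here is a minimal pure-Python sketch of a multinomial naive Bayes classifier over name fragments. It is not the PySpark code from SparkScript.py. The feature choice (character bigrams with `^` and `$` padding), the Laplace smoothing constant `alpha`, and the `NaiveBayes` class itself are assumptions made for illustration, and the six training names are a toy sample rather than the 789-row dataset.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(name, n=2):
    """Split a name into overlapping character n-grams, padded with '^' and '$'."""
    padded = "^" + name.lower() + "$"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NaiveBayes:
    """Hypothetical multinomial naive Bayes with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.class_counts = Counter()               # race -> number of names
        self.feature_counts = defaultdict(Counter)  # race -> n-gram -> count
        self.vocab = set()                          # all n-grams seen in training

    def fit(self, names, races):
        for name, race in zip(names, races):
            self.class_counts[race] += 1
            for gram in char_ngrams(name):
                self.feature_counts[race][gram] += 1
                self.vocab.add(gram)
        return self

    def predict(self, name):
        total = sum(self.class_counts.values())
        best_race, best_score = None, -math.inf
        for race, count in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(count / total)
            denom = (sum(self.feature_counts[race].values())
                     + self.alpha * len(self.vocab))
            for gram in char_ngrams(name):
                seen = self.feature_counts[race][gram]
                score += math.log((seen + self.alpha) / denom)
            if score > best_score:
                best_race, best_score = race, score
        return best_race


# Toy training sample (assumption: not taken from the repository's CSV files).
names = ["Thorin", "Balin", "Dwalin", "Legolas", "Elrond", "Galadriel"]
races = ["Dwarf", "Dwarf", "Dwarf", "Elf", "Elf", "Elf"]
model = NaiveBayes().fit(names, races)
print(model.predict("Thorin"))
```

Working in log space avoids numeric underflow when many small probabilities are multiplied, and the smoothing term keeps an unseen bigram from zeroing out a class.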

Data fields

  • name: the name of the character
  • race: the race of the character. There are five possible races: Man, Ainur, Elf, Dwarf and Hobbit.
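As a rough illustration of the two fields, the sketch below reads name and race pairs with Python's standard `csv` module. It assumes the CSV has a header row with columns named `name` and `race`, which may not match the actual files. The rows in `SAMPLE_CSV` and the helper `load_characters` are hypothetical and only stand in for a file such as characters_no_surnames.csv.

```python
import csv
import io
from collections import Counter

# Illustrative rows only (assumption); the real data lives in the repo's CSV files.
SAMPLE_CSV = """name,race
Frodo,Hobbit
Gimli,Dwarf
Legolas,Elf
Aragorn,Man
"""


def load_characters(handle):
    """Read (name, race) pairs from a CSV file object that has a header row."""
    reader = csv.DictReader(handle)
    return [(row["name"].strip(), row["race"].strip()) for row in reader]


characters = load_characters(io.StringIO(SAMPLE_CSV))

# Count how many characters belong to each race.
race_counts = Counter(race for _, race in characters)
print(race_counts)
```

To read a file on disk instead, pass an open file object, for example `open(path, newline="", encoding="utf-8")`, in place of the `io.StringIO` wrapper.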

Tools used

  • Spark (PySpark): for the analysis.
  • R: for scraping, transforming and preparing the data.

Repository

This repo holds the Python script used for the analysis, the R scripts used for scraping and transforming the data, the original scraped data, and several CSV files with the final data (the one used in the analysis is characters_no_surnames.csv).

Report

An approach to classifying the races of characters from Lord of the Rings using their names as features and a naive Bayes classifier