TITLE: Types of Minecraft Players That Should Be Recruited to Maximize Data for Research

INTRODUCTION:
A research group in Computer Science at UBC, led by Frank Wood, is collecting data about how people play video games. For this study, the group set up a Minecraft server and recorded how players played the game. They collected basic information about each player, such as name, age, and gender, as well as the player's experience level with the game, whether they subscribed to a game-related newsletter, and their total hours played.

This report aims to answer the question: "Can a player's experience level with Minecraft predict the number of hours they played, as recorded in players.csv?"

The dataset used to answer this question is contained in the file "players.csv". It has 196 rows (observations) and 7 columns (variables):
- experience: a character variable describing the player's level of expertise with Minecraft. There are 5 possible values: "Amateur", "Beginner", "Regular", "Pro", and "Veteran".
- subscribe: a logical variable indicating whether the player has subscribed to a game-related newsletter. Values are TRUE (subscribed) or FALSE (not subscribed).
- hashedEmail: a character variable containing the player's email address after a hashing algorithm was applied to protect the player's privacy.
- played_hours: a double variable giving the number of hours the player has spent playing the game. Observed values range from 0 to 223.1.
- name: a character variable containing the player's first name.
- gender: a character variable whose possible values are "Male", "Female", "Non-binary", "Two-Spirited", "Agender", "Prefer not to say", and "Other".
- Age: a double variable giving the player's age; one value is missing (NA).
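The properties listed above (the distinct experience levels, the range of played_hours, and the number of missing ages) can be checked with a few tidyverse calls. The following is a minimal sketch, assuming the tidyverse is installed. The tibble `players_demo` and its values are made up for illustration and are not rows from players.csv; only the column names match the real data.

```r
library(tidyverse)

# Hypothetical stand-in data: same column names as players.csv,
# but the values are invented so the example runs on its own.
players_demo <- tibble(
  experience = c("Pro", "Veteran", "Amateur", "Regular"),
  played_hours = c(12.5, 0, 1.2, 0.4),
  Age = c(20, 18, 25, NA)
)

# Distinct values that appear in the experience column
experience_levels <- players_demo %>%
  distinct(experience) %>%
  pull(experience)

# Range of played_hours and count of missing ages
column_summary <- players_demo %>%
  summarise(
    min_hours = min(played_hours),
    max_hours = max(played_hours),
    missing_age = sum(is.na(Age))
  )

print(experience_levels)
print(column_summary)
```

Running the same calls on the real `players` data frame, once it is loaded, would be one way to confirm the figures quoted in the list.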

One issue I notice right away is that some variables are not stored as the data types needed to wrangle the data properly. For example, the "experience" column should be a factor rather than a character vector. Additionally, the column names are inconsistent: one uses an underscore (played_hours), one joins two words without a separator (hashedEmail), and one begins with an uppercase letter (Age).
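As an illustration of the two fixes described above, the sketch below converts "experience" to a factor and renames the inconsistent columns to lowercase snake_case. It is a sketch under stated assumptions, not necessarily the approach taken later in this report: `players_demo` is made-up stand-in data, the new names `hashed_email` and `age` are hypothetical choices, and the ordering of the factor levels from "Beginner" to "Veteran" is an assumption that the source data does not confirm.

```r
library(tidyverse)

# Hypothetical stand-in data with the same column names as players.csv
players_demo <- tibble(
  experience = c("Pro", "Veteran", "Amateur", "Regular", "Beginner"),
  hashedEmail = c("a1", "b2", "c3", "d4", "e5"),
  played_hours = c(12.5, 0, 1.2, 0.4, 3.0),
  Age = c(20, 18, 25, NA, 30)
)

# Assumed ordering of experience levels (not confirmed by the data source)
experience_order <- c("Beginner", "Amateur", "Regular", "Pro", "Veteran")

players_clean <- players_demo %>%
  # rename() takes new_name = old_name
  rename(hashed_email = hashedEmail, age = Age) %>%
  # store experience as a factor with an explicit level order
  mutate(experience = factor(experience, levels = experience_order))

print(players_clean)
```

Setting the levels explicitly, rather than relying on the default alphabetical order, keeps later plots and summaries in a meaningful order if the assumed ranking is appropriate.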

METHODS & RESULTS: 

In [5]:
#loading the tidyverse package
library(tidyverse)

In [7]:
#loading the data
players <- read_csv("data/players.csv")
head(players)

Rows: 196 Columns: 7
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (4): experience, hashedEmail, name, gender
dbl (2): played_hours, Age
lgl (1): subscribe

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.


experience,subscribe,hashedEmail,played_hours,name,gender,Age
<chr>,<lgl>,<chr>,<dbl>,<chr>,<chr>,<dbl>
Pro,True,f6daba428a5e19a3d47574858c13550499be23603422e6a0ee9728f8b53e192d,30.3,Morgan,Male,9
Veteran,True,f3c813577c458ba0dfef80996f8f32c93b6e8af1fa939732842f2312358a88e9,3.8,Christian,Male,17
Veteran,False,b674dd7ee0d24096d1c019615ce4d12b20fcbff12d79d3c5a9d2118eb7ccbb28,0.0,Blake,Male,17
Amateur,True,23fe711e0e3b77f1da7aa221ab1192afe21648d47d2b4fa7a5a659ff443a0eb5,0.7,Flora,Female,21
Regular,True,7dc01f10bf20671ecfccdac23812b1b415acd42c2147cb0af4d48fcce2420f3e,0.1,Kylie,Male,21
Amateur,True,f58aad5996a435f16b0284a3b267f973f9af99e7a89bee0430055a44fa92f977,0.0,Adrian,Female,17


In [None]:
#first need to change the data type for the variables