<h1> 'Kinds' of Players That Contribute to a Large Amount of Data </h1>

<h3> Introduction </h3>

A research group in Computer Science at UBC, led by Frank Wood, is collecting data on how people play video games. The group runs a Minecraft server and records players' actions as they play. The lab has several questions of interest. This analysis focuses on one of them: which factors predict whether a player subscribes to a game-related newsletter.

This analysis asks whether a player's age and total hours played can predict whether they are subscribed to the newsletter. It uses the `players.csv` dataset.

The data were provided by Wood's research group. The dataset contains 196 players, each described by 7 variables:

1. `experience`: The player's experience level (one of Beginner, Amateur, Regular, Pro, or Veteran).
2. `subscribe`: Whether the player is subscribed to a game-related newsletter or not.
3. `hashedEmail`: The player's hashed email address.
4. `played_hours`: The total number of hours the player has played on the Minecraft server.
5. `name`: The player's first name.
6. `gender`: The player's gender.
7. `Age`: The age of the player.

<h3> Methods </h3>

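This analysis loads and cleans the `players.csv` data, keeps only the variables relevant to the question, and sets up a classification model that uses `Age` and `played_hours` to predict `subscribe`.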

First, we load in the necessary packages to conduct our analysis.

In [2]:
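library(tidyverse)   # reading and wrangling data (readr, dplyr, ...)
library(repr)        # controls how outputs are displayed in Jupyter
library(tidymodels)  # modelling tools (recipes, parsnip, workflows, ...)
options(repr.matrix.max.rows = 6)  # print at most 6 rows of any data frame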

The `players.csv` file from Wood's research group has been uploaded to the GitHub repository, so we can read it from the `data` folder. We also remove the `hashedEmail` column, which is not used in the analysis.

In [5]:
players <- read_csv('data/players.csv') |>
    select(-hashedEmail)
head(players)

Rows: 196 Columns: 7
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (4): experience, hashedEmail, name, gender
dbl (2): played_hours, Age
lgl (1): subscribe

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.


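| experience | subscribe | played_hours | name | gender | Age |
|---|---|---|---|---|---|
| `<chr>` | `<lgl>` | `<dbl>` | `<chr>` | `<chr>` | `<dbl>` |
| Pro | True | 30.3 | Morgan | Male | 9 |
| Veteran | True | 3.8 | Christian | Male | 17 |
| Veteran | False | 0.0 | Blake | Male | 17 |
| Amateur | True | 0.7 | Flora | Female | 21 |
| Regular | True | 0.1 | Kylie | Male | 21 |
| Amateur | True | 0.0 | Adrian | Female | 17 |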
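The column-specification message printed by `read_csv()` suggests declaring the column types explicitly. Below is a minimal sketch of doing that with readr's `cols()`, matching the types readr guessed for `players.csv`. It assumes readr 2.0 or later, which reads literal CSV text wrapped in `I()`. The two-row CSV (`toy_csv`) and the names `player_types` and `players_toy` are hypothetical stand-ins with illustrative values, not the real file. With the real data, the same `col_types` argument could be passed to `read_csv('data/players.csv', ...)`.

```r
library(readr)

# Hypothetical two-row stand-in for players.csv (illustrative values only)
toy_csv <- I(paste(
  "experience,subscribe,hashedEmail,played_hours,name,gender,Age",
  "Pro,TRUE,a1b2c3,30.3,Morgan,Male,9",
  "Veteran,FALSE,d4e5f6,0.0,Blake,Male,17",
  sep = "\n"
))

# Declare each column's type by name, matching the specification readr guessed;
# supplying col_types also suppresses the column-specification message
player_types <- cols(
  experience   = col_character(),
  subscribe    = col_logical(),
  hashedEmail  = col_character(),
  played_hours = col_double(),
  name         = col_character(),
  gender       = col_character(),
  Age          = col_double()
)

players_toy <- read_csv(toy_csv, col_types = player_types)
str(players_toy)
```

Naming the columns in `cols()` ties each type to a column name, so the specification still works if the columns appear in a different order in the file.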


Because we are using age and hours played to predict whether a player subscribes to the newsletter, we keep only the `subscribe`, `played_hours`, and `Age` columns and then remove any rows with missing (`NA`) values.

In [6]:
players_clean <- players |>
    select(subscribe, played_hours, Age) |>
    na.omit()

head(players_clean)

| subscribe | played_hours | Age |
|---|---|---|
| `<lgl>` | `<dbl>` | `<dbl>` |
| True | 30.3 | 9 |
| True | 3.8 | 17 |
| False | 0.0 | 17 |
| True | 0.7 | 21 |
| True | 0.1 | 21 |
| True | 0.0 | 17 |
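The order of `select()` and `na.omit()` in the cell above matters. `na.omit()` drops a row if *any* remaining column is missing. Selecting the columns first therefore avoids discarding players whose only missing value is in a column the analysis does not use. The sketch below illustrates this with dplyr on a small hypothetical tibble (`toy`, not drawn from `players.csv`).

```r
library(dplyr)

# Hypothetical toy data: row 2 is missing only `gender`, row 3 is missing `Age`
toy <- tibble(
  subscribe    = c(TRUE, FALSE, TRUE),
  played_hours = c(1.5, 0.2, 4.0),
  gender       = c("Male", NA, "Female"),
  Age          = c(20, 18, NA)
)

# Removing NAs before selecting drops rows 2 and 3
na_first <- toy |> na.omit() |> select(subscribe, played_hours, Age)

# Selecting first means the missing `gender` no longer matters: only row 3 is dropped
select_first <- toy |> select(subscribe, played_hours, Age) |> na.omit()

nrow(na_first)
nrow(select_first)
```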


Next, we set up a classification model to see whether age and hours played can predict newsletter subscription.

In [None]:
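# Convert the logical outcome to a factor, as tidymodels classifiers require
players_clean <- mutate(players_clean, subscribe = as.factor(subscribe))
# Predict `subscribe` from hours played and age, standardizing both predictors
players_recipe <- recipe(subscribe ~ played_hours + Age, data = players_clean) |>
    step_center(all_predictors()) |>
    step_scale(all_predictors())
# Assumed model choice: a K-nearest-neighbours classifier with K to be tuned
players_spec <- nearest_neighbor(weight_func = "rectangular", neighbors = tune()) |>
    set_engine("kknn") |>
    set_mode("classification")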
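`step_center()` and `step_scale()` standardize each predictor to mean 0 and standard deviation 1. The sketch below checks this on a small hypothetical data frame (`toy`, with illustrative values only) by prepping a recipe with the same steps. It then compares the baked columns with a hand-computed standardization, `(x - mean(x)) / sd(x)`. The names `toy`, `toy_recipe`, `baked`, `manual_hours`, and `manual_age` are introduced here for illustration.

```r
library(recipes)

# Hypothetical stand-in for the cleaned player data (illustrative values only)
toy <- data.frame(
  subscribe    = as.factor(c(TRUE, FALSE, TRUE, TRUE, FALSE)),
  played_hours = c(30.3, 0.0, 3.8, 0.7, 12.5),
  Age          = c(9, 17, 17, 21, 25)
)

# Same preprocessing steps as players_recipe; prep() estimates the means and SDs
toy_recipe <- recipe(subscribe ~ played_hours + Age, data = toy) |>
  step_center(all_predictors()) |>
  step_scale(all_predictors()) |>
  prep()

# bake() with new_data = NULL returns the processed training data
baked <- bake(toy_recipe, new_data = NULL)

# Standardize by hand: subtract the mean, then divide by the standard deviation
manual_hours <- (toy$played_hours - mean(toy$played_hours)) / sd(toy$played_hours)
manual_age   <- (toy$Age - mean(toy$Age)) / sd(toy$Age)

all.equal(baked$played_hours, manual_hours)
all.equal(baked$Age, manual_age)
```

Standardizing matters for a distance-based model such as K-nearest neighbours. Without it, `played_hours` and `Age`, which are on different scales, would not contribute comparably to the distances.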