# Do Age and Engagement Play a Role in Predicting Whether a Player Subscribes to a Game-Related Newsletter?

#### Sophia Margarita Mokretsova

### (1) Data Description

A UBC computer science research group collected data from a Minecraft server, recording how players interact with the game along with descriptive information about each player. Analyzing players' actions and characteristics makes it possible to identify which individuals should be targeted to increase popularity. A descriptive summary of the collected data is shown below.
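As a minimal sketch of the kind of summary statistics reported for this dataset, the example below computes the minimum, maximum, median, mean, and standard deviation of two numeric columns in a single `summarize()` call using `across()`. The tibble `toy_players` and its values are hypothetical and invented for illustration; they are not taken from `players.csv`.

```r
library(dplyr)

# Hypothetical toy data (invented values, NOT from players.csv)
toy_players <- tibble(
  age = c(17, 21, NA, 30),
  played_hours = c(0.5, 12.0, 3.2, NA)
)

# Compute each statistic for both columns, ignoring missing values.
# Output columns are named "<statistic>_<column>", e.g. "mean_age".
toy_summary <- toy_players |>
  summarize(across(
    c(age, played_hours),
    list(
      minimum = \(x) min(x, na.rm = TRUE),
      maximum = \(x) max(x, na.rm = TRUE),
      median = \(x) median(x, na.rm = TRUE),
      mean = \(x) mean(x, na.rm = TRUE),
      standard_deviation = \(x) sd(x, na.rm = TRUE)
    ),
    .names = "{.fn}_{.col}"
  ))

toy_summary
```

Using `across()` avoids repeating the same five calls per variable, at the cost of slightly denser syntax; writing each statistic out explicitly is equally valid.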

In [None]:
#Load in pertinent libraries
library(tidyverse)
library(tidymodels)
library(repr)

#Store the URL of the players dataset hosted on GitHub
players_url <- "https://raw.githubusercontent.com/phia06-ubc/Individual_Project_Planning_23/refs/heads/main/players.csv"

#Read the players dataset, using the first row as the column headers
players <- read_csv(players_url, col_names = TRUE)

#Rename columns for clarity and consistency (names are assigned by position, so this assumes the CSV column order is unchanged)
colnames(players) <- c("experience", "subscribed", "hashed_email", "played_hours", "name", "gender", "age")

#Convert numerical vectors to double-precision numeric vectors (dbl)
players <- players |>
    mutate(played_hours = as.numeric(played_hours), age = as.numeric(age))

#Convert categorical vectors to factors
players <- players |>
    mutate(experience = as.factor(experience), subscribed = as.factor(subscribed), gender = as.factor(gender))

#Use summarize to compute summary statistics for age, ignoring missing values
players_age_summarized <- players |>
    summarize(minimum_age = min(age, na.rm = TRUE),
              maximum_age = max(age, na.rm = TRUE),
              median_age = median(age, na.rm = TRUE),
              mean_age = mean(age, na.rm = TRUE),
              standard_deviation_age = sd(age, na.rm = TRUE))

#Use summarize to compute summary statistics for hours played, ignoring missing values
players_hours_summarized <- players |>
    summarize(minimum_hours = min(played_hours, na.rm = TRUE),
              maximum_hours = max(played_hours, na.rm = TRUE),
              median_hours = median(played_hours, na.rm = TRUE),
              mean_hours = mean(played_hours, na.rm = TRUE),
              standard_deviation_hours = sd(played_hours, na.rm = TRUE))

#Count number of observations
observations <- nrow(players)

#Count number of variables
variables <- ncol(players)

players_age_summarized
players_hours_summarized
observations
variables

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
── Attaching packages ────────────────────────────────────── tidymodels 1.1.1 ──

✔ broom        1.0.6     ✔ rsample