In [1]:
library(tidyverse)
library(repr)
# Load helper functions only if the script exists in the working directory
if (file.exists("cleanup.R")) source("cleanup.R")

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors


## **Introduction**

Running an online game server means constantly balancing player experience with limited technical resources. If too many people are online at once, the server can lag or even crash; if very few people use the server, time and money are wasted maintaining unused capacity. At UBC, a research group led by Frank Wood operates a public Minecraft server to study how people play and interact in virtual worlds. To plan recruitment, schedule studies and provision hardware effectively, the team needs to understand what kinds of players join the server and how engaged those players become over time.

In this project, we focus on the players.csv dataset, which contains one row per player. Each row corresponds to an anonymized participant on the Minecraft research server and includes information describing that player’s participation in the project. Broadly, the variables in players.csv record each player’s experience level, subscription status, total hours played, and basic demographic details (gender and age). Because this dataset summarizes behaviour at the player level, it is well suited for studying differences between “more engaged” and “less engaged” players.

Our analysis is based on Question 1 from the project description. In this question, we investigate how well we can use the information stored in players.csv to predict a player’s overall level of engagement with the server. Concretely, we treat one summary measure of usage (our response variable) as a proxy for engagement, and we use the remaining variables in players.csv as predictors. This allows us to ask: given what we know about a player in this dataset, can we predict how engaged that player is with the Minecraft server? Answering this question can help the research team identify which kinds of players are most likely to become active participants and may guide future recruitment or design decisions.
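The response/predictor framing above can be sketched in a few lines. This is only an illustration under stated assumptions: the report does not name the response variable, so `played_hours` is assumed here to be the usage measure; the identifier columns `hashedEmail` and `name` are dropped because they label players rather than describe them; and the six rows come from the data preview later in the report, with placeholder strings standing in for the real hashes. The names `players_preview`, `model_data`, `response`, and `predictors` are hypothetical.

```r
library(dplyr)

# Six preview rows from players.csv; hashedEmail values are placeholders,
# not the real hashes
players_preview <- tibble::tibble(
    experience   = c("Pro", "Veteran", "Veteran", "Amateur", "Regular", "Amateur"),
    subscribe    = c(TRUE, TRUE, FALSE, TRUE, TRUE, TRUE),
    hashedEmail  = paste0("placeholder_hash_", 1:6),
    played_hours = c(30.3, 3.8, 0.0, 0.7, 0.1, 0.0),
    name         = c("Morgan", "Christian", "Blake", "Flora", "Kylie", "Adrian"),
    gender       = c("Male", "Male", "Male", "Female", "Male", "Female"),
    Age          = c(9, 17, 17, 21, 21, 17)
)

# Assumption: played_hours is the engagement proxy (the response).
# Drop identifier columns and move the response to the front.
model_data <- players_preview %>%
    select(-hashedEmail, -name) %>%
    relocate(played_hours)

response   <- "played_hours"
predictors <- setdiff(names(model_data), response)

print(predictors)
```

If a different column turns out to be the response, only the `response` string and the `relocate()` call would need to change.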

In the rest of the report, we describe how we clean and wrangle players.csv, explore the relationships between engagement and other player characteristics, and build a predictive model to address Question 1. We then evaluate the model’s performance and discuss what our findings suggest about player behaviour on the Minecraft research server.

## 1. Data Description

### players.csv

This dataset contains one row per unique player and records each player's experience level, subscription status, hashed email, total hours played, name, gender, and age. It has 196 observations and 7 variables, listed below:

| **Variable**   | **Type** | **Description** |
|----------------|----------|-----------------|
| `experience`   | chr      | Player's experience level, for example: Amateur, Beginner, Regular, Pro, Veteran. |
| `subscribe`    | lgl      | Whether the player has an active subscription. |
| `hashedEmail`  | chr      | Hashed email address, used as the player's identifier. |
| `played_hours` | dbl      | Total time the player has played, in hours. |
| `name`         | chr      | Player's name. |
| `gender`       | chr      | Player's gender. |
| `Age`          | dbl      | Player's age in years. |
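Because `experience` and `gender` are stored as character columns, it may help to convert them to factors before summarising or modelling. The sketch below is one possible way to do that, not necessarily the approach used in this report. It uses six rows copied from the data preview instead of the full file, and it assumes the five experience levels named in the table are the complete set. The names `players_preview`, `experience_levels`, `players_typed`, and `experience_counts` are hypothetical.

```r
library(dplyr)

# Six preview rows from players.csv (identifier columns omitted)
players_preview <- tibble::tibble(
    experience   = c("Pro", "Veteran", "Veteran", "Amateur", "Regular", "Amateur"),
    subscribe    = c(TRUE, TRUE, FALSE, TRUE, TRUE, TRUE),
    played_hours = c(30.3, 3.8, 0.0, 0.7, 0.1, 0.0),
    gender       = c("Male", "Male", "Male", "Female", "Male", "Female"),
    Age          = c(9, 17, 17, 21, 21, 17)
)

# Assumption: these five levels are the full set of experience categories
experience_levels <- c("Amateur", "Beginner", "Regular", "Pro", "Veteran")

players_typed <- players_preview %>%
    mutate(
        experience = factor(experience, levels = experience_levels),
        gender     = as.factor(gender)
    )

# Count players per level; .drop = FALSE keeps levels that have no rows
experience_counts <- players_typed %>%
    count(experience, .drop = FALSE)

print(experience_counts)
```

Declaring the levels explicitly means a category that is absent from a subset (here, Beginner) still appears in counts with a value of zero.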

## 2. Question

## **Exploratory Data Analysis and Visualization**

In [2]:
players_data <- read_csv("players.csv")
head(players_data)

nrow(players_data)
ncol(players_data)

Rows: 196 Columns: 7
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (4): experience, hashedEmail, name, gender
dbl (2): played_hours, Age
lgl (1): subscribe

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.


| experience | subscribe | hashedEmail | played_hours | name | gender | Age |
|------------|-----------|-------------|--------------|------|--------|-----|
| `<chr>` | `<lgl>` | `<chr>` | `<dbl>` | `<chr>` | `<chr>` | `<dbl>` |
| Pro | TRUE | f6daba428a5e19a3d47574858c13550499be23603422e6a0ee9728f8b53e192d | 30.3 | Morgan | Male | 9 |
| Veteran | TRUE | f3c813577c458ba0dfef80996f8f32c93b6e8af1fa939732842f2312358a88e9 | 3.8 | Christian | Male | 17 |
| Veteran | FALSE | b674dd7ee0d24096d1c019615ce4d12b20fcbff12d79d3c5a9d2118eb7ccbb28 | 0.0 | Blake | Male | 17 |
| Amateur | TRUE | 23fe711e0e3b77f1da7aa221ab1192afe21648d47d2b4fa7a5a659ff443a0eb5 | 0.7 | Flora | Female | 21 |
| Regular | TRUE | 7dc01f10bf20671ecfccdac23812b1b415acd42c2147cb0af4d48fcce2420f3e | 0.1 | Kylie | Male | 21 |
| Amateur | TRUE | f58aad5996a435f16b0284a3b267f973f9af99e7a89bee0430055a44fa92f977 | 0.0 | Adrian | Female | 17 |


In [3]:
glimpse(players_data)
summary(players_data)

Rows: 196
Columns: 7
$ experience   <chr> "Pro", "Veteran", "Veteran", "Amateur", "Regular", "Amate…
$ subscribe    <lgl> TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE, FALSE, TRUE, T…
$ hashedEmail  <chr> "f6daba428a5e19a3d47574858c13550499be23603422e6a0ee9728f8…
$ played_hours <dbl> 30.3, 3.8, 0.0, 0.7, 0.1, 0.0, 0.0, 0.0, 0.1, 0.0, 1.6, 0…
$ name         <chr> "Morgan", "Christian", "Blake", "Flora", "Kylie", "Adrian…
$ gender       <chr> "Male", "Male", "Male", "Female", "Male", "Female", "Fema…
$ Age          <dbl> 9, 17, 17, 21, 21, 17, 19, 21, 47, 22, 23, 17, 25, 22, 17…


  experience        subscribe       hashedEmail         played_hours    
 Length:196         Mode :logical   Length:196         Min.   :  0.000  
 Class :character   FALSE:52        Class :character   1st Qu.:  0.000  
 Mode  :character   TRUE :144       Mode  :character   Median :  0.100  
                                                       Mean   :  5.846  
                                                       3rd Qu.:  0.600  
                                                       Max.   :223.100  
                                                                        
     name              gender               Age       
 Length:196         Length:196         Min.   : 9.00  
 Class :character   Class :character   1st Qu.:17.00  
 Mode  :character   Mode  :character   Median :19.00  
                                       Mean   :21.14  
                                       3rd Qu.:22.75  
                                       Max.   :58.00  
                               

In [4]:
players_data %>%
    summarise(
        mean_played_hours = mean(played_hours, na.rm = TRUE),
        mean_age = mean(Age, na.rm = TRUE),
        median_played_hours = median(played_hours, na.rm = TRUE),
        total_players = n(),
        # subscribe is logical, so summing it counts the TRUE values directly
        subscribe_count = sum(subscribe, na.rm = TRUE),
        percentage_subscribe = subscribe_count / total_players * 100
    ) %>%
    round(2)

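| mean_played_hours | mean_age | median_played_hours | total_players | subscribe_count | percentage_subscribe |
|-------------------|----------|---------------------|---------------|-----------------|----------------------|
| `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` |
| 5.85 | 21.14 | 0.1 | 196 | 144 | 73.47 |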
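The mean of 5.85 hours sits far above the median of 0.1 hours, and the maximum is 223.1 hours, so a small number of heavy players appear to pull the average up. One way to probe this further is to check how many players have no recorded play time and to compare the mean and median within each experience level. The sketch below illustrates the idea on the six preview rows only, so its numbers do not describe the full dataset; `players_preview`, `zero_share`, and `by_experience` are hypothetical names, and on the real data `players_data` would take the place of `players_preview`.

```r
library(dplyr)

# Six preview rows from players.csv (only the columns needed here)
players_preview <- tibble::tibble(
    experience   = c("Pro", "Veteran", "Veteran", "Amateur", "Regular", "Amateur"),
    played_hours = c(30.3, 3.8, 0.0, 0.7, 0.1, 0.0)
)

# Share of players with no recorded play time
zero_share <- mean(players_preview$played_hours == 0)

# Mean and median hours per experience level, largest mean first
by_experience <- players_preview %>%
    group_by(experience) %>%
    summarise(
        n_players    = n(),
        mean_hours   = mean(played_hours),
        median_hours = median(played_hours),
        .groups = "drop"
    ) %>%
    arrange(desc(mean_hours))

print(zero_share)
print(by_experience)
```

Reporting the median next to the mean keeps a single very active player from dominating the group comparison.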


## **Discussion**

## **References**