# Title

# Introduction

### Background
Understanding how players engage with games and related services has become an important research area in computer science, particularly in the development of interactive AI systems. Modern game environments provide rich, complex worlds where players make decisions, communicate, and interact with their surroundings. These environments are increasingly used as testbeds for developing artificial intelligence systems that can understand speech, follow instructions, and act autonomously.

We want to answer the following general question: <br>
***What player characteristics and behaviours are most predictive of subscribing to a game-related newsletter?*** <br>
More specifically: <br>
**Can a player's experience level, age, and total play time be used to predict whether they subscribe to a game-related newsletter?**

The `players.csv` dataset contains 196 observations of 7 variables describing player demographics, in-game behaviour, and subscription status to a game-related newsletter. The data were collected by a research group in Computer Science at UBC through the PLAICraft Minecraft server, which automatically records player actions and attributes as participants navigate the game world.

Below is a summary of all variables:

| Variable | Type (as read by `read_csv`) | Description | Example |
|----------|------------------------------|-------------|---------|
| experience | Character | Player's experience level in Minecraft (Beginner, Amateur, Regular, Veteran, or Pro) | "Regular" |
| subscribe | Logical | Whether the player subscribes to a game-related newsletter | TRUE / FALSE |
| hashedEmail | Character | Hashed email address, for privacy protection | "f6daba42..." |
| played_hours | Numeric | Total time the player has spent in the game (hours) | 30.3 |
| name | Character | Player's in-game name | "Morgan" |
| gender | Character | Player's self-identified gender | "Male" / "Female" / "Other" |
| Age | Numeric | Player's age (years) | 17 |

We can observe that `experience` is stored as a character variable. However, because its levels have a natural order (Beginner < Amateur < Regular < Veteran < Pro), it is easier to work with it as an ordinal value (e.g., Beginner = 1, Amateur = 2, ..., Pro = 5). Furthermore, the column names use inconsistent styles: `Age` is capitalized, `hashedEmail` is in camelCase, and the remaining variables are lowercase (with `played_hours` in snake_case).
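As a minimal illustration of the ordinal encoding described above, the base-R snippet below maps a few example labels to integers by fixing the order of the factor levels. The vector `example_experience` is a hypothetical input, not rows from the dataset. One caveat worth checking after encoding: any label that is not in the level set (for example, a typo) silently becomes `NA` instead of raising an error.

```r
# Level order used for the ordinal encoding (Beginner = 1, ..., Pro = 5)
experience_levels <- c("Beginner", "Amateur", "Regular", "Veteran", "Pro")

# Hypothetical example labels
example_experience <- c("Pro", "Beginner", "Regular", "Veteran")

# factor() assigns integer codes in the order given by `levels`;
# as.integer() exposes those codes
encoded <- as.integer(factor(example_experience, levels = experience_levels))
print(encoded)
# [1] 5 1 3 4

# A label outside the level set ("pro" in lowercase) becomes NA
with_typo <- as.integer(factor(c("Pro", "pro"), levels = experience_levels))
print(with_typo)
# [1]  5 NA
```

Counting `NA`s in the encoded column (e.g. `sum(is.na(encoded))`) is a cheap way to confirm that every raw label matched one of the expected levels.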

# Methods & Results

- Describe the methods you used to perform your analysis from beginning to end that narrates the analysis code.
- Your report should include code which:
    - loads data 
    - wrangles and cleans the data to the format necessary for the planned analysis
    - performs a summary of the data set that is relevant for exploratory data analysis related to the planned analysis
    - creates a visualization of the dataset that is relevant for exploratory data analysis related to the planned analysis
    - performs the data analysis
    - creates a visualization of the analysis
    - note: all figures should have a figure number and a legend

First, we'll need to load in the following libraries to perform our data analysis:

```r
### Run this cell
library(tidyverse)
library(repr)
library(tidymodels)
options(repr.matrix.max.rows = 6)
```

```r
# Reading the data
players <- read_csv("data/players.csv")

head(players)
summary(players)
```

```
Rows: 196 Columns: 7
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (4): experience, hashedEmail, name, gender
dbl (2): played_hours, Age
lgl (1): subscribe

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
```

| experience | subscribe | hashedEmail | played_hours | name | gender | Age |
|------------|-----------|-------------|--------------|------|--------|-----|
| `<chr>` | `<lgl>` | `<chr>` | `<dbl>` | `<chr>` | `<chr>` | `<dbl>` |
| Pro | TRUE | f6daba428a5e19a3d47574858c13550499be23603422e6a0ee9728f8b53e192d | 30.3 | Morgan | Male | 9 |
| Veteran | TRUE | f3c813577c458ba0dfef80996f8f32c93b6e8af1fa939732842f2312358a88e9 | 3.8 | Christian | Male | 17 |
| Veteran | FALSE | b674dd7ee0d24096d1c019615ce4d12b20fcbff12d79d3c5a9d2118eb7ccbb28 | 0.0 | Blake | Male | 17 |
| Amateur | TRUE | 23fe711e0e3b77f1da7aa221ab1192afe21648d47d2b4fa7a5a659ff443a0eb5 | 0.7 | Flora | Female | 21 |
| Regular | TRUE | 7dc01f10bf20671ecfccdac23812b1b415acd42c2147cb0af4d48fcce2420f3e | 0.1 | Kylie | Male | 21 |
| Amateur | TRUE | f58aad5996a435f16b0284a3b267f973f9af99e7a89bee0430055a44fa92f977 | 0.0 | Adrian | Female | 17 |

```
  experience        subscribe       hashedEmail         played_hours    
 Length:196         Mode :logical   Length:196         Min.   :  0.000  
 Class :character   FALSE:52        Class :character   1st Qu.:  0.000  
 Mode  :character   TRUE :144       Mode  :character   Median :  0.100  
                                                       Mean   :  5.846  
                                                       3rd Qu.:  0.600  
                                                       Max.   :223.100  

     name              gender               Age       
 Length:196         Length:196         Min.   : 9.00  
 Class :character   Class :character   1st Qu.:17.00  
 Mode  :character   Mode  :character   Median :19.00  
                                       Mean   :21.14  
                                       3rd Qu.:22.75  
                                       Max.   :58.00  
```
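The `summary()` output shows that the response variable is imbalanced: 52 players have `subscribe = FALSE` and 144 have `subscribe = TRUE`. A quick calculation from those counts gives the accuracy of a trivial classifier that always predicts the majority class, which is a useful reference point: a predictive model is only informative if it clearly beats this value. The sketch below simply reuses the counts printed above; the variable names are our own.

```r
# Counts of the response variable, copied from the summary() output above
subscribe_counts <- c("FALSE" = 52, "TRUE" = 144)

# Proportion of players in each class
class_props <- subscribe_counts / sum(subscribe_counts)
print(round(class_props, 3))

# Accuracy of always predicting the majority class ("TRUE"): 144 / 196, about 73.5%
majority_baseline <- max(class_props)
```

With the data loaded, the same proportions can also be computed directly from the data frame, e.g. `players |> count(subscribe) |> mutate(prop = n / sum(n))`.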

```r
# Ordinal encoding of the experience category (Beginner = 1, ..., Pro = 5)
experience_levels <- c("Beginner", "Amateur", "Regular", "Veteran", "Pro")

players <- players |>
    mutate(
        experience = as.numeric(factor(experience, levels = experience_levels)),
        subscribe = as.factor(subscribe),  # classification target must be a factor
        gender = as.factor(gender)
    ) |>
    # Rename columns by name (not position) to a consistent snake_case style
    rename(
        subscribed = subscribe,
        hashed_email = hashedEmail,
        hours_played = played_hours,
        player_name = name,
        age = Age
    )

head(players)
```

| experience | subscribed | hashed_email | hours_played | player_name | gender | age |
|------------|------------|--------------|--------------|-------------|--------|-----|
| `<dbl>` | `<fct>` | `<chr>` | `<dbl>` | `<chr>` | `<fct>` | `<dbl>` |
| 5 | TRUE | f6daba428a5e19a3d47574858c13550499be23603422e6a0ee9728f8b53e192d | 30.3 | Morgan | Male | 9 |
| 4 | TRUE | f3c813577c458ba0dfef80996f8f32c93b6e8af1fa939732842f2312358a88e9 | 3.8 | Christian | Male | 17 |
| 4 | FALSE | b674dd7ee0d24096d1c019615ce4d12b20fcbff12d79d3c5a9d2118eb7ccbb28 | 0.0 | Blake | Male | 17 |
| 2 | TRUE | 23fe711e0e3b77f1da7aa221ab1192afe21648d47d2b4fa7a5a659ff443a0eb5 | 0.7 | Flora | Female | 21 |
| 3 | TRUE | 7dc01f10bf20671ecfccdac23812b1b415acd42c2147cb0af4d48fcce2420f3e | 0.1 | Kylie | Male | 21 |
| 2 | TRUE | f58aad5996a435f16b0284a3b267f973f9af99e7a89bee0430055a44fa92f977 | 0.0 | Adrian | Female | 17 |
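One possible exploratory visualization for this question is a scatter plot of `age` against `hours_played`, coloured by `subscribed`, using the cleaned column names produced above. Because play time is heavily right-skewed in the `summary()` output (median 0.1 h, mean about 5.85 h, maximum 223.1 h), this sketch plots it as `log10(hours_played + 1)` so that the many players with near-zero hours remain distinguishable; that transformation is our own design choice, not part of the original analysis. The helper name `plot_age_vs_hours()` and the figure title are hypothetical.

```r
library(ggplot2)

# Hypothetical helper: exploratory scatter plot of age vs. play time,
# coloured by newsletter subscription status
plot_age_vs_hours <- function(df) {
    ggplot(df, aes(x = age, y = log10(hours_played + 1), colour = subscribed)) +
        geom_point(alpha = 0.6) +
        labs(
            x = "Age (years)",
            y = "log10(Hours played + 1)",
            colour = "Subscribed to newsletter",
            title = "Figure 1: Age vs. play time, coloured by subscription status"
        ) +
        theme(text = element_text(size = 12))
}

# Usage with the cleaned data frame from the cell above:
# plot_age_vs_hours(players)
```

Adding `+ 1` before the logarithm keeps players with exactly 0 hours on the plot, since `log10(0)` is `-Inf`.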


```r
# Fix the random seed so that subsequent random steps are reproducible
set.seed(4321)
```
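Since `tidymodels` is already loaded, one possible way to address the research question is a K-nearest-neighbours (KNN) classifier that predicts `subscribed` from `experience`, `age`, and `hours_played`. The sketch below is a hypothetical helper, `tune_knn_subscription()`, and not the report's final method. It assumes the cleaned `players` data frame from the earlier cell, the `kknn` package as the model engine, a stratified 75/25 train/test split, and 5-fold cross-validation over odd values of K. Predictors are centred and scaled because KNN relies on distances, and `hours_played` (up to 223.1) would otherwise dominate `experience` (1 to 5).

```r
library(tidymodels)  # also requires the kknn package for the "kknn" engine

# Hypothetical helper: tune K for a KNN classifier of newsletter subscription.
# Assumes `df` has a factor column `subscribed` and numeric columns
# `experience`, `age` and `hours_played` (as produced by the cleaning step).
tune_knn_subscription <- function(df, k_grid = seq(1, 25, by = 2), folds = 5) {
    df <- df |>
        select(subscribed, experience, age, hours_played) |>
        drop_na()

    # Stratified split keeps the TRUE/FALSE ratio similar in both sets
    data_split <- initial_split(df, prop = 0.75, strata = subscribed)
    train_set <- training(data_split)
    test_set <- testing(data_split)

    # Centre and scale predictors so no variable dominates the distance metric
    knn_recipe <- recipe(subscribed ~ experience + age + hours_played, data = train_set) |>
        step_center(all_predictors()) |>
        step_scale(all_predictors())

    knn_spec <- nearest_neighbor(weight_func = "rectangular", neighbors = tune()) |>
        set_engine("kknn") |>
        set_mode("classification")

    knn_wf <- workflow() |>
        add_recipe(knn_recipe) |>
        add_model(knn_spec)

    # Cross-validated accuracy for each candidate K
    cv_accuracy <- knn_wf |>
        tune_grid(
            resamples = vfold_cv(train_set, v = folds, strata = subscribed),
            grid = tibble(neighbors = k_grid)
        ) |>
        collect_metrics() |>
        filter(.metric == "accuracy")

    best_k <- cv_accuracy |>
        slice_max(mean, n = 1, with_ties = FALSE) |>
        pull(neighbors)

    # Refit with the chosen K on the full training set, then evaluate once on the test set
    final_fit <- knn_wf |>
        finalize_workflow(tibble(neighbors = best_k)) |>
        fit(data = train_set)

    test_accuracy <- predict(final_fit, test_set) |>
        bind_cols(test_set) |>
        accuracy(truth = subscribed, estimate = .pred_class) |>
        pull(.estimate)

    list(cv_accuracy = cv_accuracy, best_k = best_k, test_accuracy = test_accuracy)
}

# Usage, after the cleaning cell and set.seed(4321):
# knn_results <- tune_knn_subscription(players)
# knn_results$best_k
# knn_results$test_accuracy
```

Because the classes are imbalanced, it is worth comparing the resulting test accuracy with the accuracy of always predicting the majority class, and plotting `cv_accuracy` (mean accuracy against `neighbors`) as a figure of the analysis.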

# Discussion

- Summarize what you found
- Discuss whether this is what you expected to find
- Discuss what impact could such findings have
- Discuss what future questions could this lead to

# References
(optional)