# **Predicting Subscription Status Based on Player Characteristics**

### Data Science 100 Project - Group 10

- Kyle Nguyen (76276393)
- Jiayin Wang (47186200)
- Clianta Anindya (78508892)


### **Introduction**

#### **(1) Background Information**
In recent years, the gaming industry has increasingly relied on data to better understand player behavior and improve user engagement.
One area of interest is predicting which players are most likely to subscribe to game-related newsletters, as this can inform targeted recruitment strategies, optimize resource allocation, and improve community building around gaming platforms.

Our group is addressing Question 1 from the project brief: **What player characteristics and behaviours are most predictive of subscribing to a game-related newsletter, and how do these features differ between various player types?**

To explore this, we focus specifically on whether a player's age (**_Age_**), total hours played (**_played_hours_**), and session count can predict whether that player subscribed to the newsletter. These variables are drawn from the **_players.csv_** and **_sessions.csv_** data sets, which contain data on 196 unique players and 1535 sessions, respectively.

Understanding how these player characteristics relate to newsletter subscription behavior will help stakeholders improve how they engage different player types, and may also provide insights into motivations behind long-term or more invested gameplay.

#### **(2) Questions**
- **Broad question**: What player characteristics and behaviours are most predictive of subscribing to a game-related newsletter, and how do these features differ between various player types?
- **Specific question**: Can **_Age_**, **_played_hours_**, and **_session_count_** predict **_subscribe_** in the data set?

#### **(3) Description of data sets used**
This project is based on data collected by a research group at UBC, led by Professor Frank Wood. The group operates a Minecraft research server (_plaicraft.ai_) where they track how users interact within the virtual world.
We use the **players.csv** and **sessions.csv** data.

The **players.csv** data set has 196 rows, each representing one player, uniquely identified by _hashedEmail_.
There are 7 variables in the **players.csv** data set:
* _experience_: character, indicating player's experience level
* _hashedEmail_: character, unique identifier for each player
* _name_: character, player's name
* _gender_: character, player's gender
* _played_hours_: double, total hours the player has played
* _Age_: double, player's age
* _subscribe_: logical, whether the player subscribes to the newsletter
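
Because _subscribe_ is stored as a logical, a classification workflow in tidymodels would typically need it converted to a factor first. The following is a minimal sketch of that conversion, not necessarily the exact wrangling used later in this report. The `players_toy` data frame and its values are made up for illustration; only the column names come from **players.csv**.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in for the players data (values are invented)
players_toy <- tibble(
  played_hours = c(30.3, 0.7, 3.8),
  Age          = c(19, 21, 17),
  subscribe    = c(TRUE, FALSE, TRUE)
)

# Convert the logical response into a factor so it can serve as
# the outcome of a classification model
players_typed <- players_toy %>%
  mutate(subscribe = as.factor(subscribe))

# Inspect the resulting factor levels
levels(players_typed$subscribe)
```

Keeping the conversion in a single `mutate()` step makes it easy to reuse on the full data frame once it has been loaded.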

### **Methods and Results**

In [1]:
# Run this cell first to load the libraries before continuing
library(repr)
library(tidyverse)
library(tidymodels)
options(repr.matrix.max.rows = 6)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
── Attaching packages ────────────────────────────────────── tidymodels 1.1.1 ──

✔ broom        1.0.6     ✔ rsample

In [2]:
# Read players.csv into the players data frame
players <- read_csv("data/players.csv")

Rows: 196 Columns: 7
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (4): experience, hashedEmail, name, gender
dbl (2): played_hours, Age
lgl (1): subscribe

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.


In [3]:
# Select the predictor columns along with the response variable (subscribe)
players_selected_columns <- select(players, played_hours, Age, subscribe)
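
The session count named in the specific question is not a column of **players.csv**, so it would have to be derived from **sessions.csv**. The sketch below shows one way this could be done, assuming each row of **sessions.csv** is one session and carries the same _hashedEmail_ identifier as **players.csv** (an assumption, since the sessions columns are not described above). The toy data frames and their values are hypothetical.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-ins for the two data sets (values are invented)
players_toy <- tibble(
  hashedEmail  = c("a1", "b2", "c3"),
  played_hours = c(5.0, 0.0, 1.2),
  Age          = c(20, 31, 17),
  subscribe    = c(TRUE, FALSE, TRUE)
)
sessions_toy <- tibble(hashedEmail = c("a1", "a1", "c3"))

# Count the sessions recorded for each player
session_counts <- sessions_toy %>%
  count(hashedEmail, name = "session_count")

# Attach the counts to the players; players with no recorded
# sessions get a count of 0 instead of NA
players_with_sessions <- players_toy %>%
  left_join(session_counts, by = "hashedEmail") %>%
  mutate(session_count = coalesce(session_count, 0L)) %>%
  select(Age, played_hours, session_count, subscribe)
```

A left join is used here so that players without any sessions stay in the data rather than being dropped.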

### **Discussions**

### **References**

