# Group 34: Predicting Subscription Status from Age and Play Time

In [17]:
library(tidyverse)
library(repr)
library(tidymodels)
library(readr)
library(ggplot2)
library(RColorBrewer)
options(repr.matrix.max.rows = 6)

## Introduction

As game modes and design have improved, video games have come to play a significant role in many people's daily lives. A research group in the Department of Computer Science at the University of British Columbia (UBC), led by Professor Frank Wood, has set up a Minecraft research server to collect data on how people interact within the game environment (1). The group collected two datasets: players.csv, which summarizes information about each player, and sessions.csv, which summarizes individual play sessions. Our report analyzes these data to investigate a specific question: can total played hours and age predict newsletter subscription in the players.csv dataset? We expect that players with more played hours and a younger age will be more likely to subscribe to the game-related newsletter.
<br>
<br>
The players.csv dataset contains 196 observations of 7 variables describing players on the Minecraft server. The two numeric variables are played hours (`played_hours`) and player age (`Age`). The four character variables are the player's experience level (`experience`), hashed email (`hashedEmail`, a user identifier), name (`name`), and gender (`gender`). The one logical variable, `subscribe`, records whether the player subscribed to a game-related newsletter.

## Methods

### Wrangling

In [20]:
url_players <- "https://raw.githubusercontent.com/lucy-diaz/DSCIprojectgroup34/refs/heads/main/players.csv"
players <- read_csv(url_players)
players

Rows: 196 Columns: 7
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (4): experience, hashedEmail, name, gender
dbl (2): played_hours, Age
lgl (1): subscribe

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.


experience,subscribe,hashedEmail,played_hours,name,gender,Age
<chr>,<lgl>,<chr>,<dbl>,<chr>,<chr>,<dbl>
Pro,TRUE,f6daba428a5e19a3d47574858c13550499be23603422e6a0ee9728f8b53e192d,30.3,Morgan,Male,9
Veteran,TRUE,f3c813577c458ba0dfef80996f8f32c93b6e8af1fa939732842f2312358a88e9,3.8,Christian,Male,17
Veteran,FALSE,b674dd7ee0d24096d1c019615ce4d12b20fcbff12d79d3c5a9d2118eb7ccbb28,0.0,Blake,Male,17
⋮,⋮,⋮,⋮,⋮,⋮,⋮
Amateur,FALSE,d572f391d452b76ea2d7e5e53a3d38bfd7499c7399db299bd4fedb06a46ad5bb,0.0,Dylan,Prefer not to say,17
Amateur,FALSE,f19e136ddde68f365afc860c725ccff54307dedd13968e896a9f890c40aea436,2.3,Harlow,Male,17
Pro,TRUE,d9473710057f7d42f36570f0be83817a4eea614029ff90cf50d8889cdd729d11,0.2,Ahmed,Other,
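
The message above suggests specifying the column types explicitly. A minimal sketch of that idea follows, assuming readr 2.0 or later (for literal data passed through `I()`). The three-row `csv_text` excerpt is a hypothetical stand-in for players.csv with the `hashedEmail` and `name` columns left out, and `players_typed` is an illustrative name that does not appear elsewhere in this report.

```r
library(readr)

# Hypothetical three-row excerpt standing in for players.csv
# (hashedEmail and name omitted for brevity; the last Age is missing)
csv_text <- "experience,subscribe,played_hours,gender,Age
Pro,TRUE,30.3,Male,9
Veteran,FALSE,0.0,Male,17
Pro,TRUE,0.2,Other,
"

# Declaring the types up front quiets the column specification message
# and reads the categorical columns as factors in one step
players_typed <- read_csv(
  I(csv_text),
  col_types = cols(
    experience = col_factor(),
    subscribe = col_logical(),
    played_hours = col_double(),
    gender = col_factor(),
    Age = col_double()
  )
)

players_typed
```

With the real file, the same `col_types` argument could be passed to the `read_csv(url_players)` call, extended with `col_character()` entries for `hashedEmail` and `name`.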


In [30]:
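# Convert the categorical columns, including the response variable, to factors
tidy_player <- players |>
  mutate(
    experience = factor(experience),
    gender = factor(gender),
    subscribe = factor(subscribe)
  )
tidy_player

# Keep only the response and the two predictors, then drop players with
# zero played hours and players whose age is missing
clean_player <- tidy_player |>
  select(subscribe, played_hours, Age) |>
  filter(played_hours != 0, !is.na(Age))
clean_player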


experience,subscribe,hashedEmail,played_hours,name,gender,Age
<fct>,<fct>,<chr>,<dbl>,<chr>,<fct>,<dbl>
Pro,TRUE,f6daba428a5e19a3d47574858c13550499be23603422e6a0ee9728f8b53e192d,30.3,Morgan,Male,9
Veteran,TRUE,f3c813577c458ba0dfef80996f8f32c93b6e8af1fa939732842f2312358a88e9,3.8,Christian,Male,17
Veteran,FALSE,b674dd7ee0d24096d1c019615ce4d12b20fcbff12d79d3c5a9d2118eb7ccbb28,0.0,Blake,Male,17
⋮,⋮,⋮,⋮,⋮,⋮,⋮
Amateur,FALSE,d572f391d452b76ea2d7e5e53a3d38bfd7499c7399db299bd4fedb06a46ad5bb,0.0,Dylan,Prefer not to say,17
Amateur,FALSE,f19e136ddde68f365afc860c725ccff54307dedd13968e896a9f890c40aea436,2.3,Harlow,Male,17
Pro,TRUE,d9473710057f7d42f36570f0be83817a4eea614029ff90cf50d8889cdd729d11,0.2,Ahmed,Other,


subscribe,played_hours,Age
<fct>,<dbl>,<dbl>
TRUE,30.3,9
TRUE,3.8,17
TRUE,0.7,21
⋮,⋮,⋮
TRUE,0.1,44
FALSE,0.3,22
FALSE,2.3,17
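
Because the wrangling step drops rows for two different reasons, it can help to record how many observations each filter removes and how the classes are balanced afterwards. The sketch below shows one way to do that on a hypothetical five-row `toy_player` tibble; the data and all object names in it are illustrative assumptions, not results from players.csv.

```r
library(dplyr)

# Hypothetical toy data with the same three columns as clean_player
toy_player <- tibble(
  subscribe = factor(c(TRUE, TRUE, FALSE, TRUE, FALSE)),
  played_hours = c(30.3, 3.8, 0.0, 0.2, 2.3),
  Age = c(9, 17, 17, NA, 17)
)

n_total <- nrow(toy_player)

# Rows removed because the player never played
n_zero_hours <- toy_player |>
  filter(played_hours == 0) |>
  nrow()

# Rows removed only because Age is missing (zero-hour rows already excluded)
n_missing_age <- toy_player |>
  filter(played_hours != 0, is.na(Age)) |>
  nrow()

# Same filtering logic as the wrangling cell
toy_clean <- toy_player |>
  filter(played_hours != 0, !is.na(Age))

# Number of remaining observations in each subscription class
class_counts <- toy_clean |>
  count(subscribe)

class_counts
```

Applied to `tidy_player`, the same counts would show how much of the original 196 observations the analysis retains, which matters when interpreting any classifier trained on `clean_player`.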


### Summary

### Visualization of Data Set

### Analysis

### Analysis Visualization 

## Discussion

### Summary of Findings

### Hypothesis vs Results

### Impact

### Future Analysis

### Reference
(1) The University of British Columbia. (n.d.). Retrieved March 29, 2025, from https://canvas.ubc.ca/courses/153254/assignments/2055150?module_item_id=7644030
