# Project Planning Stage (Individual)

```r
# Run this cell before continuing
library(tidyverse)
library(repr)
library(tidymodels)
library(GGally)
library(ISLR)
```

```
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
── Attaching packages ────────────────────────────────────── tidymodels 1.1.1 ──

✔ broom        1.0.6     ✔ rsample
```

# *(1) Data Description*: Players Data

- Number of Observations: 196
- Number of Variables: 7
- Name and Type of Variables: experience (chr), hashedEmail (chr), name (chr), gender (chr),
  played_hours (dbl), Age (dbl), subscribe (lgl)
- Variable Meanings:
  - experience: The player's level of experience with the game, separated into 5 ranks: beginner, amateur, regular, pro, and veteran
  - hashedEmail: A hashed (anonymized) form of each player's email address, which identifies the player
  - name: The name of the player
  - gender: The gender of the player (male, female, or other)
  - played_hours: The total number of hours each player spent playing the game
  - Age: The player's age in years
  - subscribe: Whether the player was subscribed to the game's channel (TRUE or FALSE)
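- Any Issues in Data: some entries in the data are empty (missing values)
- Issues that can't be directly seen: played_hours is heavily right-skewed. Three quarters of the players logged 0.6 hours or less (median 0.1), yet the mean is 5.846 hours and the maximum is 223.1 hours, so a few very active players pull the mean far above the typical value. The subscribe variable is also imbalanced, with 144 TRUE versus 52 FALSE.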
- How data was collected: The data were collected by a Computer Science research group led by Frank Wood, using PlaiCraft, a Minecraft server that records players' actions as they navigate through the game world
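
Because experience has a natural order (beginner up to veteran), it may be worth storing it as an ordered factor instead of a character column before plotting or modelling. Below is a minimal base-R sketch, assuming the five rank names listed above; since the raw CSV may capitalize them differently, the helper lower-cases its input first. `to_experience_factor()` is a hypothetical helper name and the example values are made up, not rows from players.csv.

```r
# Hypothetical helper: convert the character `experience` column into an
# ordered factor using the five ranks described above (lowest to highest).
to_experience_factor <- function(x) {
  levels_in_order <- c("beginner", "amateur", "regular", "pro", "veteran")
  # Lower-case and trim first in case the raw data capitalizes the ranks;
  # any value not in the list becomes NA.
  factor(tolower(trimws(x)), levels = levels_in_order, ordered = TRUE)
}

# Made-up example values, not taken from players.csv
exp_levels <- to_experience_factor(c("Pro", "beginner", "Veteran", "Regular"))
exp_levels
as.integer(exp_levels)   # rank positions: 4 1 5 3
exp_levels > "regular"   # comparisons follow the rank order
```

With an ordered factor, comparisons such as `exp_levels > "regular"` respect the ranking, and ggplot2 lists the ranks in this order rather than alphabetically.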

- Summary statistics for the players data:

```r
players <- read_csv("data/players.csv")
summary(players)
```

```
Rows: 196 Columns: 7
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (4): experience, hashedEmail, name, gender
dbl (2): played_hours, Age
lgl (1): subscribe

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

  experience        subscribe       hashedEmail         played_hours    
 Length:196         Mode :logical   Length:196         Min.   :  0.000  
 Class :character   FALSE:52        Class :character   1st Qu.:  0.000  
 Mode  :character   TRUE :144       Mode  :character   Median :  0.100  
                                                       Mean   :  5.846  
                                                       3rd Qu.:  0.600  
                                                       Max.   :223.100  

     name              gender               Age       
 Length:196         Length:196         Min.   : 9.00  
 Class :character   Class :character   1st Qu.:17.00  
 Mode  :character   Mode  :character   Median :19.00  
                                       Mean   :21.14  
                                       3rd Qu.:22.75  
                                       Max.   :58.00  
```
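
summary() prints an NA's count for numeric and logical columns only, and only when missing values are present; for character columns such as experience, name and gender it shows just Length, Class and Mode. Empty entries in those columns therefore would not appear in the output above. A minimal base-R sketch for counting them per column follows; `count_missing()` is a hypothetical helper name, and the toy data frame is made up for illustration.

```r
# Hypothetical helper: count missing entries in every column of a data frame.
# NA counts as missing everywhere; in character columns, empty or
# whitespace-only strings are counted as well.
count_missing <- function(df) {
  vapply(df, function(col) {
    missing <- is.na(col)
    if (is.character(col)) {
      missing <- missing | (!is.na(col) & trimws(col) == "")
    }
    sum(missing)
  }, integer(1))
}

# Tiny made-up data frame for illustration (not rows from players.csv)
toy <- data.frame(
  experience   = c("pro", "", "veteran"),
  gender       = c("male", NA, "female"),
  played_hours = c(1.5, NA, 0),
  stringsAsFactors = FALSE
)
count_missing(toy)
```

Calling `count_missing(players)` would give per-column counts for the real data. Note that read_csv() already turns empty cells into NA by default (its `na` argument defaults to `c("", "NA")`), so the empty-string check mainly matters for data read with base read.csv().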
                               

# *(1) Data Description*: Sessions Data

- Number of Observations: 1535
- Number of Variables: 5
- Name and Type of Variables: hashedEmail (chr), start_time (chr), end_time (chr),
  original_start_time (dbl), original_end_time (dbl)
- Variable Meanings:
  - hashedEmail: Each player's email identity
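  - start_time: The start date of each play session, stored as a character string (formatted as day/month/year)
  - end_time: The end date of each play session, stored as a character string (formatted as day/month/year)
  - original_start_time: The start time of the session as a number; its magnitude suggests a Unix timestamp in milliseconds
  - original_end_time: The end time of the session, in the same numeric format
- Any Issues in Data: original_start_time and original_end_time are extremely large numbers (around 1.7e12) that cannot be read as times without conversion, and original_end_time has 2 missing values (NA's)
- Issues that can't be directly seen: start_time and end_time are stored as character strings rather than dates, so session lengths cannot be calculated from them until they are parsed
- How data was collected: The data were collected by a Computer Science research group led by Frank Wood, using PlaiCraft, a Minecraft server that records players' actions as they navigate through the game world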
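
Assuming the large original_* values are milliseconds since 1 January 1970 (UTC), which is an interpretation based on their size rather than something documented here, they can be converted to date-times and turned into session lengths as in this base-R sketch. 1.712e12 ms is about 1.712e9 seconds after 1970, which falls in 2024. The helper names `ms_to_datetime()` and `session_minutes()` are hypothetical, and the inputs are illustrative numbers on the same scale as the summary, not real rows.

```r
# Assumption: original_*_time values are milliseconds since 1970-01-01 00:00 UTC.
ms_to_datetime <- function(ms) {
  as.POSIXct(ms / 1000, origin = "1970-01-01", tz = "UTC")
}

# Session length in minutes; a missing end time (the summary shows 2 NA's
# in original_end_time) simply yields NA for that session.
session_minutes <- function(start_ms, end_ms) {
  (end_ms - start_ms) / 60000
}

# Illustrative values on the same scale as the summary, not real rows
start_ms <- c(1.712e12, 1.719e12)
end_ms   <- c(1.712e12 + 45 * 60000, NA)

ms_to_datetime(start_ms)
session_minutes(start_ms, end_ms)  # 45 and NA
```

If this interpretation holds, ms_to_datetime(original_start_time) should agree with the start_time string in the same row (up to any timezone offset), which is a quick way to check it.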

Summary statistics for sessions data:

```r
sessions <- read_csv("data/sessions.csv")
summary(sessions)
```

```
Rows: 1535 Columns: 5
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (3): hashedEmail, start_time, end_time
dbl (2): original_start_time, original_end_time

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

 hashedEmail         start_time          end_time         original_start_time
 Length:1535        Length:1535        Length:1535        Min.   :1.712e+12  
 Class :character   Class :character   Class :character   1st Qu.:1.716e+12  
 Mode  :character   Mode  :character   Mode  :character   Median :1.719e+12  
                                                          Mean   :1.719e+12  
                                                          3rd Qu.:1.722e+12  
                                                          Max.   :1.727e+12  

 original_end_time  
 Min.   :1.712e+12  
 1st Qu.:1.716e+12  
 Median :1.719e+12  
 Mean   :1.719e+12  
 3rd Qu.:1.722e+12  
 Max.   :1.727e+12  
 NA's   :2          
```
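
Because read_csv() guessed start_time and end_time as chr, they need to be parsed before they can be sorted or compared as dates. A minimal base-R sketch, assuming each string begins with a day/month/year date as described above; `parse_dmy()` is a hypothetical helper name and the inputs are made up.

```r
# Assumption: start_time/end_time begin with a day/month/year date, e.g. "05/04/2024".
# Date formats ignore trailing characters, so any time of day after the date
# is dropped; strings that do not match the format become NA.
parse_dmy <- function(x) {
  as.Date(x, format = "%d/%m/%Y")
}

# Made-up strings for illustration
parsed <- parse_dmy(c("05/04/2024", "25/12/2024 18:30", "not a date"))
parsed   # "2024-04-05" "2024-12-25" NA
```

If the strings also carry a time of day, `as.POSIXct(x, format = "%d/%m/%Y %H:%M", tz = "UTC")` would keep it, but the exact layout should be checked against the file first.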