# DSCI 100 Final Project 

## Analysis of player activity patterns by time of day on a Minecraft research server
*Kaitlyn Uy 34370213 for DSCI 100-103 2025S*

## Load libraries (run this cell before continuing)
library(tidyverse)
library(repr)
library(infer)
library(cowplot)
options(repr.matrix.max.rows = 6)

## Introduction 

Minecraft, developed by Mojang Studios and released in 2011, is a 3D sandbox video game in which players can freely explore, build, and interact with the virtual environment in a world composed of blocks and entities. At the University of British Columbia, a research group in the Department of Computer Science led by Frank Wood has created a Minecraft server to collect data on player behaviour.

Understanding player activity patterns is important for effective server management. Because server resources are limited, the research team needs to anticipate periods of high demand to keep the experience smooth for players. Using historical session data, this project aims to identify the times of day when player activity is highest.

The question we aim to answer in this report is: *can the **hour of day** predict the number of concurrent active players on the Minecraft research server, based on play session data?* The response variable in this analysis is the number of concurrent active players, aggregated by hour. The primary explanatory variable we focus on is hour of day. Other potential variables for future research could include day of week (e.g. whether it is a weekend), month, or season. 
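
One way to make the response variable precise is sketched below. It assumes that "concurrent" is approximated as "active at some point during the hour", which is an assumption: the data only record session start and end times, not instantaneous presence. For each calendar day $d$ and hour $h \in \{0, \dots, 23\}$, $N_{d,h}$ counts the distinct players with at least one session overlapping that hour, and $\bar{N}_h$ averages it over the $D$ days covered by the data (days with no sessions in hour $h$ contribute $N_{d,h} = 0$):

$$
N_{d,h} = \left|\left\{\, p \;:\; \exists\, s \in S_p \text{ such that } t^{\mathrm{start}}_s < \tau_{d,h} + 1\,\mathrm{h} \text{ and } t^{\mathrm{end}}_s > \tau_{d,h} \,\right\}\right|,
\qquad
\bar{N}_h = \frac{1}{D} \sum_{d=1}^{D} N_{d,h}
$$

Here $S_p$ is the set of sessions of player $p$, and $\tau_{d,h}$ is the start of hour $h$ on day $d$. All of these symbols are introduced for illustration and do not come from the dataset.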

The dataset used in this analysis, **sessions.csv**, contains 1535 rows and 5 columns of player session information from the Minecraft research server. Each row represents a single play session, described by the following variables:

| Variable              | Type      | Description                                                              |
|-----------------------|-----------|--------------------------------------------------------------------------|
| `hashedEmail`         | Character | Anonymized (hashed) unique player identifier                             |
| `start_time`          | Character | Human-readable session start time (DD/MM/YYYY HH:MM)                     |
| `end_time`            | Character | Human-readable session end time (DD/MM/YYYY HH:MM)                       |
| `original_start_time` | Numeric   | Session start as Unix epoch time in milliseconds (raw server timestamp)  |
| `original_end_time`   | Numeric   | Session end as Unix epoch time in milliseconds (raw server timestamp)    |
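
Because `start_time` and `end_time` are read in as character strings, they must be parsed before any hourly analysis. The sketch below uses lubridate's `dmy_hm()` on a few hypothetical rows in the same DD/MM/YYYY HH:MM format. It assumes the times are in UTC (the server's time zone is not documented), drops sessions that fail to parse or have a non-positive duration, and shows how a millisecond epoch value can be converted by dividing by 1000.

```r
library(dplyr)
library(tibble)
library(lubridate)

# Hypothetical rows in the same text format as sessions.csv (illustration only)
sessions_raw <- tibble(
  hashedEmail = c("player_a", "player_b"),
  start_time  = c("30/06/2024 18:12", "25/07/2024 23:40"),
  end_time    = c("30/06/2024 18:24", "26/07/2024 00:15")
)

# Parse DD/MM/YYYY HH:MM strings into date-times.
# Assumption: times are interpreted as UTC; the server's time zone is not documented.
sessions_parsed <- sessions_raw %>%
  mutate(
    start = dmy_hm(start_time, tz = "UTC"),
    end   = dmy_hm(end_time, tz = "UTC"),
    duration_min = as.numeric(difftime(end, start, units = "mins"))
  ) %>%
  # Drop sessions that failed to parse or have a non-positive duration
  filter(!is.na(start), !is.na(end), duration_min > 0)

# The original_*_time columns hold epoch milliseconds; as_datetime() expects seconds
epoch_start <- as_datetime(1719770000000 / 1000, tz = "UTC")
```

The second hypothetical session crosses midnight, which is why durations are computed from full date-times rather than from the HH:MM parts alone.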


The dataset, collected from server logs, spans several months of server activity, with session lengths ranging from a few minutes to several hours. Because player data are anonymized, we cannot check whether the players come from a particular demographic, so the sample may not represent the target population. In addition, sessions that overlap one another, or that cross hourly or daily boundaries (e.g. a session from 23:40 to 00:15), must be handled carefully to produce accurate player counts: each session has to be counted in every hour it covers, and a player with overlapping sessions should be counted only once per hour.
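
A minimal sketch of that counting step is shown below, assuming sessions have already been parsed into UTC date-times (the three sessions are hypothetical). Each session is expanded into one row per clock hour it overlaps, and distinct players are then counted per hour. Subtracting one second from the end time means a session ending exactly on the hour is not counted in the following hour. This is one reasonable convention, not necessarily the one used in the final analysis.

```r
library(dplyr)
library(tidyr)
library(tibble)
library(lubridate)

# Hypothetical parsed sessions (illustration only); player "a" plays twice
sessions <- tibble(
  hashedEmail = c("a", "b", "a"),
  start = ymd_hm(c("2024-06-30 18:12", "2024-06-30 18:50", "2024-06-30 23:40"), tz = "UTC"),
  end   = ymd_hm(c("2024-06-30 18:24", "2024-06-30 20:05", "2024-07-01 00:15"), tz = "UTC")
)

# Expand each session into one row per clock hour it overlaps.
# end - 1 second: a session ending at exactly 19:00 is not counted in the 19:00 hour.
session_hours <- sessions %>%
  filter(end > start) %>%
  mutate(
    first_hour = floor_date(start, "hour"),
    last_hour  = floor_date(end - seconds(1), "hour")
  ) %>%
  rowwise() %>%
  mutate(hour_start = list(seq(first_hour, last_hour, by = "hour"))) %>%
  ungroup() %>%
  unnest(hour_start)

# Count distinct players per clock hour (overlapping sessions of one player count once)
hourly_counts <- session_hours %>%
  group_by(hour_start) %>%
  summarise(n_players = n_distinct(hashedEmail), .groups = "drop")
```

Here the 18:50 to 20:05 session contributes to the 18:00, 19:00 and 20:00 hours, and the session crossing midnight is split between two calendar days.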


### *Question: What **times of day** are most likely to have high numbers of active players on the Minecraft research server?*

## Methods and results 

sessions_df <- read_csv("data/sessions.csv") 
head(sessions_df) 

[1mRows: [22m[34m1535[39m [1mColumns: [22m[34m5[39m
[36m──[39m [1mColumn specification[22m [36m────────────────────────────────────────────────────────[39m
[1mDelimiter:[22m ","
[31mchr[39m (3): hashedEmail, start_time, end_time
[32mdbl[39m (2): original_start_time, original_end_time

[36mℹ[39m Use `spec()` to retrieve the full column specification for this data.
[36mℹ[39m Specify the column types or set `show_col_types = FALSE` to quiet this message.


| hashedEmail | start_time | end_time | original_start_time | original_end_time |
|---|---|---|---|---|
| `<chr>` | `<chr>` | `<chr>` | `<dbl>` | `<dbl>` |
| bfce39c89d6549f2bb94d8064d3ce69dc3d7e72b38f431d8aa0c4bf95ccee6bf | 30/06/2024 18:12 | 30/06/2024 18:24 | 1719770000000.0 | 1719770000000.0 |
| 36d9cbb4c6bc0c1a6911436d2da0d09ec625e43e6552f575d4acc9cf487c4686 | 17/06/2024 23:33 | 17/06/2024 23:46 | 1718670000000.0 | 1718670000000.0 |
| f8f5477f5a2e53616ae37421b1c660b971192bd8ff77e3398304c7ae42581fdc | 25/07/2024 17:34 | 25/07/2024 17:57 | 1721930000000.0 | 1721930000000.0 |
| bfce39c89d6549f2bb94d8064d3ce69dc3d7e72b38f431d8aa0c4bf95ccee6bf | 25/07/2024 03:22 | 25/07/2024 03:58 | 1721880000000.0 | 1721880000000.0 |
| 36d9cbb4c6bc0c1a6911436d2da0d09ec625e43e6552f575d4acc9cf487c4686 | 25/05/2024 16:01 | 25/05/2024 16:12 | 1716650000000.0 | 1716650000000.0 |
| bfce39c89d6549f2bb94d8064d3ce69dc3d7e72b38f431d8aa0c4bf95ccee6bf | 23/06/2024 15:08 | 23/06/2024 17:10 | 1719160000000.0 | 1719160000000.0 |


- describe the methods used to perform the analysis from beginning to end, narrating the analysis code.
- your report should include code which:
    - loads data
    - wrangles and cleans the data to the format necessary for the planned analysis
    - performs a summary of the data set that is relevant for exploratory data analysis related to the planned analysis
    - creates a visualization of the dataset that is relevant for exploratory data analysis related to the planned analysis
        - Use our visualization best practices to make high-quality plots (make sure to include labels, titles, units of measurement, etc)
        - Explain any insights you gain from these plots that are relevant to address your question
    - performs the data analysis. For your analysis, you should think about and provide a brief explanation of the following questions:
        - Why is this method appropriate?
        - Which assumptions are required, if any, to apply the method selected?
        - What are the potential limitations or weaknesses of the method selected?
        - How did you compare and select the model?
        - Note: you should also think about the following:
            - How are you going to process the data to apply the model? For example: Are you splitting the data? How? How many splits? What proportions will you use for the splits? At what stage will you split? Will there be a validation set? Will you use cross validation?
    - creates a visualization of the analysis
    - *note: all figures should have a figure number and a legend*
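
As a sketch of the summary and visualization steps, the code below assumes that a table of per-hour player counts already exists, with one row per clock hour in which at least one player was active (the values here are hypothetical). It fills inactive hours with zero, averages by hour of day across all days, and draws a labelled bar chart. The UTC time zone and the column names `hour_start` and `n_players` are assumptions made for illustration.

```r
library(dplyr)
library(tidyr)
library(tibble)
library(lubridate)
library(ggplot2)

# Hypothetical per-hour counts (illustration only): one row per active clock hour
hourly_counts <- tibble(
  hour_start = ymd_hm(c("2024-06-30 18:00", "2024-06-30 19:00",
                        "2024-07-01 18:00", "2024-07-01 02:00"), tz = "UTC"),
  n_players  = c(2, 1, 3, 1)
)

# Build every clock hour from the first day's 00:00 to the last day's 23:00
all_hours <- tibble(
  hour_start = seq(floor_date(min(hourly_counts$hour_start), "day"),
                   floor_date(max(hourly_counts$hour_start), "day") + hours(23),
                   by = "hour")
)

# Hours with no sessions become 0 so that quiet hours pull the average down
hourly_complete <- all_hours %>%
  left_join(hourly_counts, by = "hour_start") %>%
  mutate(n_players = replace_na(n_players, 0),
         hour_of_day = hour(hour_start))

# Summarise activity for each hour of the day across all days
hour_profile <- hourly_complete %>%
  group_by(hour_of_day) %>%
  summarise(mean_players = mean(n_players),
            max_players  = max(n_players),
            n_days       = n(),
            .groups = "drop")

# Bar chart of mean activity by hour of day
profile_plot <- ggplot(hour_profile, aes(x = hour_of_day, y = mean_players)) +
  geom_col() +
  scale_x_continuous(breaks = seq(0, 23, by = 3)) +
  labs(x = "Hour of day (UTC, 24-hour clock)",
       y = "Mean number of active players per hour",
       title = "Figure 1: Mean number of active players by hour of day")
```

Filling quiet hours with zero is a deliberate choice: averaging only over hours that had some activity would overstate demand at hours when the server is rarely used.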

## Discussion

- summarize what you found
- discuss whether this is what you expected to find?
- discuss what impact could such findings have?
- discuss what future questions could this lead to?



## References