# Project Planning Stage (Individual)

For this project, I will work with log data from a Minecraft server, covering both
the players and the sessions they played, to answer
the question:
“Can a player's experience and age predict their average session length in the session dataset?”

In [None]:
library(tidyverse)
library(lubridate)

players <- read_csv("players.csv")

sessions <- read_csv("sessions.csv")

glimpse(players)
glimpse(sessions)

In [None]:
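# Parse the day/month/year timestamps and compute each session's length in minutes
sessions_duration <- sessions |>
    mutate(
        start = dmy_hm(start_time),
        end = dmy_hm(end_time),
        duration_min = as.numeric(difftime(end, start, units = "mins")))

# Average session length and number of sessions for each player
avg_session <- sessions_duration |>
    group_by(hashedEmail) |>
    summarize(
        avg_session_mins = mean(duration_min, na.rm = TRUE),
        n_sessions = n(),
        .groups = "drop")

# Attach the per-player summaries, then drop players with no age or no usable sessions
data_analysis <- players |>
    left_join(avg_session, by = "hashedEmail") |>
    filter(!is.na(Age), !is.na(avg_session_mins))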

glimpse(data_analysis)

# Data Description:

For this report, I use one combined data set built from information about each player and information about their play sessions.

The "players" table has one row per player and includes variables such as age in years and experience level. The "sessions" table has one row per play session and records when that session started and ended. From these start and end times, I created the variable 'duration_min', which measures how long each session lasted in minutes.
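As a small illustration of the duration step, the sketch below parses two timestamps and takes their difference in minutes. The timestamp values are made up for this example; only the day/month/year hour:minute layout is taken from the `dmy_hm()` call used in the analysis.

```r
library(lubridate)

# Made-up timestamps (illustrative only), in day/month/year hour:minute form
start_time <- c("30/06/2024 18:12", "17/06/2024 23:33")
end_time   <- c("30/06/2024 18:24", "18/06/2024 00:05")

# Parse the strings into date-times
start <- dmy_hm(start_time)
end   <- dmy_hm(end_time)

# Difference between end and start, expressed as a plain number of minutes
duration_min <- as.numeric(difftime(end, start, units = "mins"))

duration_min
```

Because the strings are parsed into full date-times, a session that runs past midnight (the second one here) still gets a positive duration.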

From there, I summarized the sessions by player to get each player's average session length in minutes ('avg_session_mins') and the number of sessions they played ('n_sessions'). I then joined this summary to the "players" table to form the data frame 'data_analysis'. Players with a missing age or no recorded sessions were dropped, since there was nothing to calculate for them. Overall, most players in 'data_analysis' were teenagers or young adults (with a few outliers). Session lengths varied: there were many short sessions and a few longer ones.
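To make the summarize, join, and filter steps concrete, here is a minimal sketch on toy tables. The tables `toy_players` and `toy_sessions` and all of their values are hypothetical; only the column names mirror the ones used in the analysis.

```r
library(dplyr)

# Hypothetical toy tables (values invented for illustration)
toy_players <- tibble(
    hashedEmail = c("a", "b", "c"),
    Age = c(17, NA, 21))

toy_sessions <- tibble(
    hashedEmail = c("a", "a", "b"),
    duration_min = c(10, 30, 45))

# One row per player: average session length and number of sessions
toy_avg <- toy_sessions |>
    group_by(hashedEmail) |>
    summarize(
        avg_session_mins = mean(duration_min, na.rm = TRUE),
        n_sessions = n(),
        .groups = "drop")

# Player "b" is dropped for a missing age, player "c" for having no sessions
toy_analysis <- toy_players |>
    left_join(toy_avg, by = "hashedEmail") |>
    filter(!is.na(Age), !is.na(avg_session_mins))

toy_analysis
```

Only player "a" remains, because the left join leaves 'avg_session_mins' as NA for any player without sessions and the filter then removes that row.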

In [None]:
data_analysis |>
    summarize(
        mean_age = mean(Age, na.rm = TRUE),
        mean_avg_session = mean(avg_session_mins, na.rm = TRUE),
        mean_n_sessions = mean(n_sessions, na.rm = TRUE),
        mean_played_hours = mean(played_hours, na.rm = TRUE))

# Questions:

The *broad question* being addressed is: "How do player characteristics relate to how much time someone typically spends playing on this Minecraft server?"

*Specific question*: "Can a player's experience level and age help predict their average session length (minutes)?"
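One possible way to approach this question is a linear regression of average session length on experience level and age. The sketch below is only an illustration under assumptions: the data are made up, the column name `experience` and its levels are assumed rather than taken from the code above, and a linear model is one candidate method, not necessarily the one this project will use.

```r
# Made-up data for illustration only; `experience` is an assumed column name
toy <- data.frame(
    experience = factor(c("Amateur", "Pro", "Veteran", "Amateur",
                          "Pro", "Veteran", "Amateur", "Pro")),
    Age = c(15, 17, 22, 19, 25, 30, 16, 21),
    avg_session_mins = c(12, 35, 40, 18, 50, 44, 9, 28))

# Linear model with one categorical and one numeric predictor
fit <- lm(avg_session_mins ~ experience + Age, data = toy)

# Predicted average session length for a hypothetical new player
new_player <- data.frame(
    experience = factor("Pro", levels = levels(toy$experience)),
    Age = 20)
pred <- predict(fit, newdata = new_player)

pred
```

With the real data, the same formula could be fitted on 'data_analysis', ideally on a training split so that predictions can be checked against held-out players.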

This question fits the data well 