Overview
This project uses data collected from a Minecraft research server hosted by a UBC research group.
The dataset is divided into two files:
1. sessions.csv
2. players.csv

In [None]:
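# Load the packages used throughout the notebook
library(repr)
library(tidyverse)
library(tidymodels)

# Read both data files (paths are relative to the notebook's working directory)
sessions <- read_csv("sessions.csv")
players <- read_csv("players.csv")

# Preview the first five rows of each table
sessions |> head(5)
players |> head(5)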


Data Summary:

1. sessions.csv: Contains the users' hashed emails, plus each session's start time and end time recorded in two different formats.
2. players.csv: Contains each player's experience level, newsletter subscription status, hashed email, total hours played, name, gender, and age.

Potential Issues:

1. The data were collected from gameplay logs, which may fail to detect when a player was offline and still count that time toward total hours played.
2. Some columns may contain missing values, which need to be kept in mind and handled during the analysis.
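
The second issue can be checked directly by counting the missing values in each column. The snippet below is a minimal sketch on a small made-up table: `toy_players` and its values are invented for illustration, and the column names `age` and `gender` are assumptions about how players.csv is laid out (only `played_hours` appears in this notebook's code).

```r
library(tidyverse)

# Toy stand-in for players.csv; values are invented, and the
# `age` and `gender` column names are assumptions
toy_players <- tibble(
  played_hours = c(0.5, NA, 3.2),
  age          = c(17, 21, NA),
  gender       = c("Male", "Female", NA)
)

# Count missing values in every column
na_counts <- toy_players |>
  summarise(across(everything(), ~ sum(is.na(.x))))
na_counts
```

Running the same `summarise()` call on `players` and `sessions` should show which columns, if any, need attention before modelling.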


Broad Question:

What player characteristics and behaviours are most predictive of subscribing to a game-related newsletter, and how do
these features differ between various player types?

Specific Question:

Can a player’s total playtime, number of sessions, and average session length predict whether they subscribe to the 
newsletter?

Understanding the characteristics and playing style of the players who have subscribed to the newsletter will help in designing recruitment strategies and in understanding engagement. Players who are generally more active and consistent are probably more interested in things beyond the game.
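
Two of the predictors in the specific question, number of sessions and average session length, are not stored directly and would have to be derived from the session records. The snippet below is one possible sketch on made-up data. The column names `hashedEmail`, `start_time`, and `end_time`, and the day/month/year timestamp format, are assumptions and should be checked against the real sessions.csv.

```r
library(tidyverse)
library(lubridate)

# Toy stand-in for sessions.csv; column names and the
# "dd/mm/yyyy HH:MM" timestamp format are assumptions
toy_sessions <- tibble(
  hashedEmail = c("a", "a", "b"),
  start_time  = c("01/05/2024 10:00", "02/05/2024 18:30", "03/05/2024 09:15"),
  end_time    = c("01/05/2024 10:30", "02/05/2024 20:00", "03/05/2024 09:45")
)

session_features <- toy_sessions |>
  mutate(
    start = dmy_hm(start_time),   # parse text into date-times
    end   = dmy_hm(end_time),
    session_minutes = as.numeric(difftime(end, start, units = "mins"))
  ) |>
  group_by(hashedEmail) |>
  summarise(
    n_sessions           = n(),                                  # sessions per player
    mean_session_minutes = mean(session_minutes, na.rm = TRUE),  # average length
    .groups = "drop"
  )
session_features
```

A table like this could then be joined to the player data on the hashed email column, for example with `left_join()`, so that each player has total playtime, session count, and average session length in a single row.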

Now let's visualize the data in different ways to analyze it in more depth.

In [None]:
# Histogram of total hours played, zoomed in on the 0 to 3 hour range
time_players_plot <- ggplot(players, aes(x = played_hours)) +
  geom_histogram(position = "identity", binwidth = 0.1) +
  # coord_cartesian() zooms the view without dropping rows outside the range
  coord_cartesian(xlim = c(0, 3)) +
  labs(title = "Number of Players vs Total Time Played",
       x = "Total Time Played (hours)", y = "Number of Players")
time_players_plot