# **Individual Project Planning**

### Reading in Datasets

In [None]:
library(tidymodels)
library(tidyverse)

players <- read_csv("data/players.csv")
sessions <- read_csv("data/sessions.csv")

## (1) Data Description:

In [None]:
summary(players)
summary(sessions)

## Variables 
### Players Dataset:

|#| Variable Name | Type of Variable | Variable Meaning | Data Type |
|:--------:|:--------|:--------:|:--------|:--------:|
|1| `experience`  | Qualitative  | player’s experience level | chr |
|2| `subscribe`  | Qualitative  | whether the player is subscribed (True/False)  | lgl |
|3| `hashedEmail`  | Qualitative  | player’s hashed (anonymized) email  | chr |
|4| `played_hours`  | Quantitative  | total number of hours played in Minecraft  | dbl |
|5| `name`  | Qualitative  | player’s name  | chr |
|6|`gender` | Qualitative  | player’s gender  | chr |
|7| `Age`  | Quantitative  | player’s age  | int |

### Sessions Dataset:

|#| Variable Name | Type of Variable | Variable Meaning | Data Type |
|:--------:|:--------|:--------:|:--------|:--------:|
|1| `hashedEmail`  | Qualitative  | player’s hashed (anonymized) email  | chr |
|2| `start_time`  | Quantitative  | time when player started playing  | chr |
|3| `end_time`  | Quantitative  | time when player stopped playing  | chr |
|4| `date`  | Quantitative  | date of session  | chr |
|5| `original_start_time`  | Quantitative  | Data B3  | dbl |
|6|`original_end_time` | Quantitative  | Data B3  | dbl |



## Questions

Broad Question: We would like to know which "kinds" of players are most likely to contribute a large amount of data so that we can target those players in our recruiting efforts.

Specific Question: Can player experience predict the total time spent on the Minecraft server in the players dataset?

The players dataset contains information about each player, including their experience level and total hours played on the Minecraft server. These two variables map directly onto the research question: experience level is the explanatory (predictor) variable, and played hours is the response variable. By analyzing this data, we can determine whether playtime differs across experience levels, and whether experience level can be used to predict the amount of playtime.
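As a rough sketch of how this question could be explored, the snippet below compares playtime across experience levels. It uses a small made-up tibble (`toy_players`, with hypothetical values) in place of `players`, and assumes the columns are named `experience` and `played_hours`.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data standing in for `players`; values are made up for illustration.
toy_players <- tibble(
  experience   = c("Amateur", "Amateur", "Pro", "Pro", "Veteran"),
  played_hours = c(0.5, 1.5, 3.0, 5.0, 2.0)
)

# Group size, mean, and median playtime for each experience level.
hours_by_experience <- toy_players |>
  group_by(experience) |>
  summarize(
    n_players    = n(),
    mean_hours   = mean(played_hours, na.rm = TRUE),
    median_hours = median(played_hours, na.rm = TRUE)
  )

hours_by_experience
```

Running the same pipeline on `players` instead of `toy_players` would give a first look at whether playtime differs noticeably between experience levels.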

## Exploratory Data Analysis and Visualization

### Tidied Datasets

Players dataset: The `Age` column was read in as a decimal (double data type), but since ages are recorded as whole numbers, I converted it to an integer data type.

Sessions dataset: The `start_time` and `end_time` columns each held both a date and a time, so a single cell stored two values. I split them into separate `date`, `start_time`, and `end_time` columns so that each cell holds one value.

In [None]:

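# Age is recorded in whole years, so store it as an integer.
players <- players |>
  mutate(Age = as.integer(Age))

# Split each date-time string at the space into a date part and a time part.
# Both calls write to a column named `date`.
sessions <- sessions |>
  separate(col = start_time, into = c("date", "start_time"), sep = " ") |>
  separate(col = end_time, into = c("date", "end_time"), sep = " ")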

head(players)
head(sessions)
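
To see what `separate()` does to a single date-time string, here is a minimal sketch on one made-up row. The `dd/mm/yyyy hh:mm` format and the column names `start_date` and `end_date` are assumptions for illustration only. Giving the two date columns distinct names is one way to keep both dates when a session runs past midnight.

```r
library(tidyr)
library(tibble)

# Hypothetical single session that crosses midnight; values are made up.
toy_sessions <- tibble(
  start_time = "30/06/2024 23:50",
  end_time   = "01/07/2024 00:20"
)

# Split each date-time string at the space, keeping start and end dates apart.
toy_tidy <- toy_sessions |>
  separate(col = start_time, into = c("start_date", "start_time"), sep = " ") |>
  separate(col = end_time, into = c("end_date", "end_time"), sep = " ")

toy_tidy
```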