# Project Planning Stage (Individual)

## (1) Data Description:
- ### players.csv
    - The players.csv dataset contains one row for each unique player who has played on the Minecraft server. It covers 196 players (observations) and records 7 variables for each player, as summarized in the following table.

| **Variable**       | **Data Type**        | **R Type**             | **Summary Statistics**   |
|--------------------|----------------------|------------------------|--------------------------|
| subscribe          | Logical              | (lgl)                  | -                        |
| hashedEmail        | Character Vector     | (chr)                  | -                        |
| played_hours       | Double               | (dbl)                  | Mean/Median/Mode/Min/Max |
| name               | String               | (chr)                  | -                        |
| gender             | String               | (chr)                  | -                        |
| Age                | Double               | (dbl)                  | Mean/Median/Mode/Min/Max |
| experience         | Factor               | (fct)                  | -                        |

- This dataset has several potential issues:
  1. Some cells contain 'N/A' (missing) values, so those observations must either be skipped or have the missing value replaced with a substitute.
  2. Non-binary players are underrepresented, as they appear to be a small minority in the gender category.
  3. Older players are underrepresented, as most players are around 18 to 21 years old.
- Additionally, some unobserved factors include where the data was collected and why some players entered 'N/A' as an answer.
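
As a rough sketch of how the issues listed above could be checked, the snippet below counts missing values per column, drops rows with a missing `Age`, and tallies the gender categories. The tibble `toy_players` and all of its values are invented for illustration and are not taken from the real dataset; whether dropping or replacing missing values is the better choice depends on the analysis.

```r
library(dplyr)

# Hypothetical toy data standing in for the real file (values are invented)
toy_players <- tibble(
    played_hours = c(0.5, 12.0, NA, 3.2),
    gender = c("Male", "Female", "Non-binary", "Male"),
    Age = c(19, NA, 21, 18)
)

# Count missing values in every column
na_counts <- toy_players |>
    summarize(across(everything(), ~ sum(is.na(.x))))
na_counts

# One option: keep only the rows where Age is present
toy_complete <- toy_players |>
    filter(!is.na(Age))

# Tally each gender category to gauge how balanced the groups are
gender_counts <- toy_players |>
    count(gender)
gender_counts
```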

## (2) Questions:
- ### Broad Question
    - What player characteristics and behaviours are most predictive of subscribing to a game-related newsletter, and how do these features differ between various player types?
- ### Specific Question
    - Can a player's skill level (experience), age, and played hours predict whether they subscribe, based on players.csv?
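
One possible first look at the specific question is to compare the subscription rate and the average predictors across experience levels. The sketch below assumes column names matching the dataset (`experience`, `subscribe`, `played_hours`, `Age`), but `toy_players` and every value in it are hypothetical.

```r
library(dplyr)

# Hypothetical toy data; column names follow the dataset, values are invented
toy_players <- tibble(
    experience = c("Amateur", "Amateur", "Pro", "Pro", "Veteran"),
    subscribe = c(TRUE, FALSE, TRUE, TRUE, FALSE),
    played_hours = c(1.0, 0.0, 10.0, 4.0, 0.5),
    Age = c(17, 20, 22, 19, 25)
)

# Share of subscribers and average predictors within each experience level
subscribe_by_experience <- toy_players |>
    group_by(experience) |>
    summarize(
        n_players = n(),
        subscribe_rate = mean(subscribe),
        mean_hours = mean(played_hours, na.rm = TRUE),
        mean_age = mean(Age, na.rm = TRUE)
    )
subscribe_by_experience
```

Taking `mean()` of a logical column gives the proportion of `TRUE` values, which is why it can serve as a subscription rate here.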

## (3) Exploratory Data Analysis and Visualization:

In [None]:
#Required libraries
library(tidyverse)

In [None]:
# Loading dataset into R
players_data <- read_csv("data/players.csv")
players_data

In [None]:
# Minimal wrangling: convert the categorical variables to factors
players_tidy <- players_data |>
    mutate(experience = as_factor(experience), gender = as_factor(gender))
players_tidy

In [None]:
#Computing mean values for each quantitative variable
played_hours_mean <- players_tidy |> select(played_hours) |> map_dfr(mean, na.rm = TRUE) |> pull()
age_mean <- players_tidy |> select(Age) |> map_dfr(mean, na.rm = TRUE) |> pull()

played_hours_mean
age_mean
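
The per-variable means can also be collected into a single one-row table, which may be easier to report. This is only a sketch of one alternative using `summarize()` with `across()`; `toy_tidy` is a hypothetical stand-in for `players_tidy` with invented values.

```r
library(dplyr)

# Hypothetical stand-in for players_tidy (values are invented)
toy_tidy <- tibble(
    played_hours = c(1, 2, NA, 5),
    Age = c(18, 20, 22, NA)
)

# Mean of every numeric column, ignoring missing values, returned as one row
quantitative_means <- toy_tidy |>
    summarize(across(where(is.numeric), ~ mean(.x, na.rm = TRUE)))
quantitative_means
```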