1) Data Description  

**players.csv:** This file contains 7 variables holding demographic and account-related information for each unique player. There are 196 observations.
  
**Variables in players.csv:**  
1) experience: A character variable giving the player's experience level. It takes one of five values: Veteran, Pro, Regular, Amateur, or Beginner.  
2) subscribe: A logical variable indicating whether the player is subscribed (TRUE or FALSE).  
3) hashedEmail: A character variable containing the player's email in a hashed format, which anonymizes the email information.  
4) played_hours: A numerical variable (dbl) containing the total number of gameplay hours per player. The mean is 5.85 hours per player.  
5) name: A character variable containing the player's chosen name for the game.  
6) gender: A character variable containing the player's specified gender.  
7) Age: A numerical variable (dbl) containing the player's age. The mean age, computed over the non-missing values, is 21.14 years.  

**Errors in players.csv**  
Some players have missing Age values, which is a concern because it reduces the completeness of the data and could bias any analysis or predictions that rely on age as an important demographic factor.  
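
One way to quantify this concern is to count the missing values in each column and to see how they affect a summary such as the mean. The sketch below is only illustrative: `players_example` is a small hypothetical tibble standing in for players.csv, and its values are made up.

```r
library(tidyverse)

# Hypothetical stand-in for players.csv; the values are illustrative only.
players_example <- tibble(
  experience   = c("Pro", "Veteran", "Amateur", "Regular"),
  played_hours = c(30.3, 3.8, 0.0, 0.7),
  Age          = c(9, 17, NA, 21)
)

# Count the missing values in every column.
missing_counts <- players_example |>
  summarise(across(everything(), ~ sum(is.na(.x))))

missing_counts

# Without na.rm = TRUE, a single missing value makes the mean NA.
mean(players_example$Age)

# With na.rm = TRUE, the mean uses only the observed ages.
mean(players_example$Age, na.rm = TRUE)
```

Running the same `summarise()` call on the real `players_data` would report how many Age values are actually missing.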

**sessions.csv:** This file contains 5 variables and records individual gameplay sessions. Each row captures a single gameplay session linked to a player. There are 1535 observations.  

**Variables in sessions.csv**  
1) hashedEmail: A character variable containing the player's email in a hashed format, which anonymizes the email information.  
2) start_time: A character variable containing the gameplay session start time in DD/MM/YYYY HH:MM format.  
3) end_time: A character variable containing the gameplay session end time in DD/MM/YYYY HH:MM format.  
4) original_start_time: A numerical variable (dbl) containing the UNIX timestamp (in milliseconds) corresponding to the session start time.  
5) original_end_time: A numerical variable (dbl) containing the UNIX timestamp (in milliseconds) corresponding to the session end time.  
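
As a rough illustration of how the millisecond timestamps relate to calendar time, the sketch below divides by 1000 to get seconds and passes the result to `lubridate::as_datetime()`. The tibble `timestamps_example` is a hypothetical name. Its two values are copied from the printed preview of sessions.csv, which shows only six significant digits, so they are rounded versions of whatever is stored in the file.

```r
library(tidyverse)
library(lubridate)

# Hypothetical example; values are the rounded timestamps shown in the preview.
timestamps_example <- tibble(
  original_start_time = c(1.71977e12, 1.71867e12)
)

# UNIX timestamps in milliseconds -> seconds -> date-time in UTC.
converted <- timestamps_example |>
  mutate(start_utc = as_datetime(original_start_time / 1000))

converted
```

Because of the rounding in the preview, the converted values land near, but not exactly on, the times recorded in start_time.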

**Errors in sessions.csv**  
This dataset contains missing values in the end_time and original_end_time columns. The time data are also stored in two different formats (human-readable strings and UNIX timestamps), so they will need to be cleaned and converted to a date-time type before calculating useful information such as session duration.  
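
A minimal sketch of that cleaning step, assuming the strings follow the DD/MM/YYYY HH:MM pattern described above, could parse them with `lubridate::dmy_hm()` and take the difference in minutes. `sessions_example` and the shortened email values are hypothetical; the first two rows reuse times from the preview, and the missing end_time in the third row is made up to show how a gap propagates.

```r
library(tidyverse)
library(lubridate)

# Hypothetical stand-in for sessions.csv; the third end_time is set to NA
# purely to illustrate a session with a missing end.
sessions_example <- tibble(
  hashedEmail = c("player_a", "player_b", "player_c"),
  start_time  = c("30/06/2024 18:12", "17/06/2024 23:33", "25/07/2024 17:34"),
  end_time    = c("30/06/2024 18:24", "17/06/2024 23:46", NA)
)

sessions_clean <- sessions_example |>
  mutate(
    # Parse the day/month/year hour:minute strings into date-times.
    start_dt = dmy_hm(start_time),
    end_dt = dmy_hm(end_time),
    # Session length in minutes; NA when either endpoint is missing.
    duration_min = as.numeric(difftime(end_dt, start_dt, units = "mins"))
  )

sessions_clean
```

Sessions with a missing end time get an NA duration here, so they would need to be dropped or handled explicitly before summarising.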





In [4]:
library(tidyverse)

In [7]:
players_data <- read_csv("players.csv")

# Mean age and mean played hours, ignoring missing values
players_mean <- players_data |>
  summarise(
    mean_age = mean(Age, na.rm = TRUE),
    mean_played_hours = mean(played_hours, na.rm = TRUE)
  )

players_mean

Rows: 196 Columns: 7
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (4): experience, hashedEmail, name, gender
dbl (2): played_hours, Age
lgl (1): subscribe

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.


mean_age,mean_played_hours
<dbl>,<dbl>
21.13918,5.845918


In [8]:
sessions_data <- read_csv("sessions.csv")
sessions_data


Rows: 1535 Columns: 5
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (3): hashedEmail, start_time, end_time
dbl (2): original_start_time, original_end_time

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.


hashedEmail,start_time,end_time,original_start_time,original_end_time
<chr>,<chr>,<chr>,<dbl>,<dbl>
bfce39c89d6549f2bb94d8064d3ce69dc3d7e72b38f431d8aa0c4bf95ccee6bf,30/06/2024 18:12,30/06/2024 18:24,1.71977e+12,1.71977e+12
36d9cbb4c6bc0c1a6911436d2da0d09ec625e43e6552f575d4acc9cf487c4686,17/06/2024 23:33,17/06/2024 23:46,1.71867e+12,1.71867e+12
f8f5477f5a2e53616ae37421b1c660b971192bd8ff77e3398304c7ae42581fdc,25/07/2024 17:34,25/07/2024 17:57,1.72193e+12,1.72193e+12
bfce39c89d6549f2bb94d8064d3ce69dc3d7e72b38f431d8aa0c4bf95ccee6bf,25/07/2024 03:22,25/07/2024 03:58,1.72188e+12,1.72188e+12
36d9cbb4c6bc0c1a6911436d2da0d09ec625e43e6552f575d4acc9cf487c4686,25/05/2024 16:01,25/05/2024 16:12,1.71665e+12,1.71665e+12
bfce39c89d6549f2bb94d8064d3ce69dc3d7e72b38f431d8aa0c4bf95ccee6bf,23/06/2024 15:08,23/06/2024 17:10,1.71916e+12,1.71916e+12
fd6563a4e0f6f4273580e5fedbd8dda64990447aea5a33cbb5e894a3867ca44d,15/04/2024 07:12,15/04/2024 07:21,1.71317e+12,1.71317e+12
ad6390295640af1ed0e45ffc58a53b2d9074b0eea694b16210addd44d7c81f83,21/09/2024 02:13,21/09/2024 02:30,1.72688e+12,1.72689e+12
96e190b0bf3923cd8d349eee467c09d1130af143335779251492eb4c2c058a5f,21/06/2024 02:31,21/06/2024 02:49,1.71894e+12,1.71894e+12
36d9cbb4c6bc0c1a6911436d2da0d09ec625e43e6552f575d4acc9cf487c4686,16/05/2024 05:13,16/05/2024 05:52,1.71584e+12,1.71584e+12
