In [None]:
# Run this cell first: it loads the tidyverse packages (readr, dplyr, forcats, ggplot2) used in the cells below
library(tidyverse)

In the cell below, I read the dataset into the notebook.

In [None]:
players <- read_csv("https://raw.githubusercontent.com/nothingbutash/dsci-100-2024w2-group-006-2/refs/heads/main/players.csv")

I started making the dataset usable by converting the experience column into a factor, in case the different experience levels need to be identified or grouped later. Two renaming steps were also needed so that every column name follows the same lowercase snake_case style:
- I renamed the Age column to age to match the lowercase names of the other columns.
- I renamed the hashedEmail column to hashed_email, using an underscore instead of camel case.
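As a minimal sketch of these cleaning steps, the block below applies them to a tiny made-up tibble (the column values are hypothetical, not taken from players.csv). It uses `forcats::as_factor()`, which builds the factor levels in order of first appearance, and `dplyr::rename()`, which takes `new_name = old_name` pairs and keeps each column in its original position.

```r
library(dplyr)
library(forcats)
library(tibble)

# Hypothetical toy rows standing in for players.csv (values are made up)
toy_players <- tibble(
  experience = c("Regular", "Pro", "Regular"),
  hashedEmail = c("a1", "b2", "c3"),
  Age = c(17, 21, 35)
)

toy_clean <- toy_players |>
  # character column becomes a factor; levels follow order of first appearance
  mutate(experience = as_factor(experience)) |>
  # rename() uses new_name = old_name and leaves column positions unchanged
  rename(age = Age, hashed_email = hashedEmail)

names(toy_clean)              # "experience" "hashed_email" "age"
levels(toy_clean$experience)  # "Regular" "Pro"
```

Compared with creating new columns in `mutate()` and then dropping the old ones with `select()`, `rename()` does the same job in one step without moving the renamed columns to the end of the table.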

In [None]:
players <- players |>
    # convert experience to a factor, then give Age and hashedEmail consistent snake_case names
    mutate(experience = as_factor(experience)) |>
    rename(age = Age, hashed_email = hashedEmail)
players

In the cell below, I visualize the age distribution of the sample and calculate three summary statistics of age: the mean, the median, and the standard deviation. Missing ages are excluded from these calculations (na.rm = TRUE).

In [None]:
sample_stats <- players |>
    # calculate the mean, median, and standard deviation of age, ignoring missing values
    summarize(sample_mean = mean(age, na.rm = TRUE),
              sample_med = median(age, na.rm = TRUE),
              sample_sd = sd(age, na.rm = TRUE))
sample_stats

sample_distribution <- ggplot(players, aes(x = age)) +
    geom_histogram(binwidth = 1) +
    labs(x = "Age (Years)", y = "Number of People",
         title = "Age Distribution of Players")
sample_distribution
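The `na.rm = TRUE` arguments matter because a single missing age would otherwise turn each statistic into `NA`. The short base-R sketch below uses made-up ages (not values from the dataset) to show the difference; `summarize()` behaves the same way because it calls these same functions.

```r
# Made-up ages with one missing value, for illustration only
ages <- c(17, 21, NA, 35)

mean(ages)                  # NA: one missing value propagates to the result
sum(is.na(ages))            # 1 value is skipped when na.rm = TRUE
mean(ages, na.rm = TRUE)    # mean of 17, 21 and 35, i.e. 73 / 3
median(ages, na.rm = TRUE)  # 21
sd(ages, na.rm = TRUE)      # standard deviation of the three non-missing ages
```

Reporting `sum(is.na(age))` alongside the summary statistics makes it clear how many players were left out of them.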

In [None]:
# keep only the columns needed for the scatter plot (Age was renamed to age earlier)
players_data_selected <- players |>
    select(age, subscribe, played_hours)

players_plot <- players_data_selected |>
    ggplot(aes(x = age, y = played_hours, color = subscribe)) +
    geom_point() +
    labs(x = "Age of Player (yrs)", y = "Hours Played",
         title = "Age of player vs. average hours played in game")
players_plot

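# Limit the y-axis to 0-4.5 hours for a better view of most data points;
# ylim() removes the extreme outliers outside this range (ggplot2 warns about the removed rows)
players_plot_better_visual <- players_plot +
    ylim(0, 4.5) +
    labs(title = "Age of player vs. average hours played in game (y-axis limited to 0-4.5 hours)")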
players_plot_better_visual
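Because `ylim()` drops every point outside the limits before plotting, it can help to report how many players are hidden, and to compare against `coord_cartesian()`, which zooms the view without removing any rows. The sketch below uses a small made-up tibble shaped like `players_data_selected` (the values, including the logical `subscribe` column, are assumptions for illustration, not real data).

```r
library(dplyr)
library(ggplot2)

# Made-up rows shaped like players_data_selected (values are illustrative only)
toy <- tibble(
  age = c(17, 20, 22, 25, 30),
  played_hours = c(0.5, 1.2, 3.8, 30.0, 150.0),
  subscribe = c(TRUE, FALSE, TRUE, TRUE, FALSE)
)

# Count the points that ylim(0, 4.5) would remove (hours played cannot be negative)
n_hidden <- toy |>
  filter(played_hours > 4.5) |>
  nrow()
n_hidden  # 2

# Alternative: zoom the y-axis to 0-4.5 without dropping any rows from the data
zoomed_plot <- ggplot(toy, aes(x = age, y = played_hours, color = subscribe)) +
  geom_point() +
  coord_cartesian(ylim = c(0, 4.5)) +
  labs(x = "Age of Player (yrs)", y = "Hours Played")
```

Printing `zoomed_plot` gives a view similar to the `ylim()` version. The difference shows up with layers that summarize the data (for example a smoothing line), which `ylim()` would compute without the outliers while `coord_cartesian()` would still include them.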