
Video Games Dataset

This week's data comes courtesy of Liza Wood via Steam Spy. She recently published a blog post analyzing this video game data.

She was kind enough to provide a fairly clean dataset, and I have done some small additional cleanup, shown at the bottom of this page.

The data includes playtime, ownership, release date, publishing information, and, for some games, a Metascore! Lots of ways to slice and dice this data!


Please be advised that the average and median playtime are measured over the last two weeks only. As a result, there are many games where playtime is low or zero.
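Because so many games report zero playtime in that two-week window, it can help to check how large that share is before summarising. The sketch below uses a small made-up table (the `toy_games` rows are invented for illustration, not taken from the dataset) with the same `average_playtime` and `median_playtime` column names as the data dictionary.

```r
library(dplyr)
library(tibble)

# Hypothetical rows standing in for video_games; values are invented
toy_games <- tibble(
  game             = c("A", "B", "C", "D"),
  average_playtime = c(0, 120, 0, 45),
  median_playtime  = c(0, 90, 0, 45)
)

# Share of games with no recorded playtime in the two-week window
share_zero <- mean(toy_games$average_playtime == 0)

# Keep only games that were actually played in the window
played_recently <- toy_games %>%
  filter(average_playtime > 0)
```

Whether to drop or keep the zero rows depends on the question being asked: they are real observations, not missing values.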

Get the data!

video_games <- readr::read_csv("")

Data Dictionary


| variable | class | description |
|:---|:---|:---|
| number | double | Game number |
| game | character | Game title |
| release_date | character | Release date |
| price | double | Price in US dollars and cents |
| owners | character | Estimated number of people owning this game |
| developer | character | Group that developed the game |
| publisher | character | Group that published the game |
| average_playtime | double | Average playtime in minutes |
| median_playtime | double | Median playtime in minutes |
| metascore | double | Metascore rating |
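Since `owners` is stored as character, it usually needs converting before it can be plotted or modelled. The sketch below assumes the values are ranges written like `"10,000,000 .. 20,000,000"`; that format is an assumption, so check it against the actual column first. The `owners_low` and `owners_high` names are hypothetical.

```r
library(dplyr)
library(stringr)
library(readr)
library(tibble)

# Invented example values; the "low .. high" layout is an assumption
toy_owners <- tibble(owners = c("10,000,000 .. 20,000,000", "0 .. 20,000"))

owners_parsed <- toy_owners %>%
  mutate(
    # first and last space-separated words hold the bounds
    owners_low  = parse_number(word(owners, 1)),
    owners_high = parse_number(word(owners, -1))
  )
```

Keeping both bounds, and not collapsing them to a midpoint, preserves the fact that the figures are estimates.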

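library(tidyverse)

# clean dataset from lizawood's github
url <- ""

# read in raw data
# (clean_names() is assumed here: the snake_case column names used
#  in the cleaning step, such as publisher_s, imply it)
raw_df <- url %>% 
  read_csv() %>% 
  janitor::clean_names()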

# clean up some of the factors and playtime data
clean_df <- raw_df %>% 
  mutate(price = as.numeric(price),
         score_rank = word(score_rank_userscore_metascore, 1),
         average_playtime = word(playtime_median, 1),
         median_playtime = word(playtime_median, 2),
         median_playtime = str_remove(median_playtime, "\\("),
         median_playtime = str_remove(median_playtime, "\\)"),
         average_playtime = 60 * as.numeric(str_sub(average_playtime, 1, 2)) +
           as.numeric(str_sub(average_playtime, 4, 5)),
         median_playtime = 60 * as.numeric(str_sub(median_playtime, 1, 2)) +
           as.numeric(str_sub(median_playtime, 4, 5)),
         metascore = as.double(str_sub(score_rank_userscore_metascore, start = -4, end = -3))) %>% 
  select(-score_rank_userscore_metascore, -score_rank, -playtime_median) %>% 
  rename(publisher = publisher_s, developer = developer_s)
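
To make the playtime conversion easier to follow, here is a minimal sketch of the same logic on two invented strings. The `"HH:MM (HH:MM)"` layout (average first, median in parentheses) is inferred from the cleaning code above and is not confirmed against the raw file; `to_minutes()` is a hypothetical helper, not part of the original script.

```r
library(dplyr)
library(stringr)
library(tibble)

# Invented values in the layout the cleaning code appears to expect
toy <- tibble(playtime_median = c("05:30 (03:15)", "00:00 (00:00)"))

# Hypothetical helper: "HH:MM" -> total minutes
to_minutes <- function(x) {
  60 * as.numeric(str_sub(x, 1, 2)) + as.numeric(str_sub(x, 4, 5))
}

toy_clean <- toy %>%
  mutate(
    # first word is the average, second word is the median in parentheses
    average_playtime = to_minutes(word(playtime_median, 1)),
    median_playtime  = to_minutes(str_remove_all(word(playtime_median, 2), "[()]"))
  )
```

Note that reading only two characters for the hours means a value of 100 hours or more would be parsed incorrectly, both here and in the original cleaning step.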
