# An exploration into the video games of the last decade 

Questions: 

1. What are the top video game review sites 
2. Which are the must play video games of the last 10 years 
3. Which is the most successful gaming platform 
4. Who are the most successful developers 
5. Who are the most successful publishers 
6. Which are the most successful genres 

- How can we assess video games by: 
1. Platform 
2. Genre 
3. Popularity
4. Critical success 
5. Commercial success 
6. Date (Year, Month, Week, etc. )


## Introduction 

So I'm finally on the verge of purchasing a next generation console and I can't wait to start playing all the great games that I've missed out on since I stopped the hobby in 2013. 

However, the trouble is, a lot of remarkable games have come out but I only have the time to play a few. Moreover, I'm well aware that certain games are platform exclusives, so I'll have to decide on the platform that will provide me the best value. 

At first I figured these questions should be fairly easy to answer by visiting the various video game news sites Sure enough there are plenty of lists but each publisher seems to have a different curation of games. How do I decide on which list to trust and what games to play? Turns out that there are several approaches, each with it's own issues. 


- I could just follow the list provided by the site that ranks first on Google. However, this would give me a biased view since any single view on subjective matters such as the best video games to play is likely to be inferior to an aggregate view. Moreover, list may itself be contradictory with the scores provided by the reviewer For example, in [IGN's list](https://www.ign.com/articles/the-best-100-video-games-of-all-time), [Dark Souls](https://www.ign.com/articles/2011/09/30/dark-souls-review) which had received a 9 is ranked above [Person 5 Royal](https://www.ign.com/articles/persona-5-royal-review) which was rated a 10 by IGN. According to the site, the list reflects the views of the current staff which might be different to the ones who had originally reviewed the game and takes into account several qualitative factors such as how fun the game is to play today vs how great it was when it had launched, and to what extent the game is the very best in its class. This is not a bad approach to building a list but it does add significant bias. 

- I could compare multiple lists and only factor in the games that lie in the intersection of these lists. This approach should give me a less biased view, but to do this process manually would be painstaking.

- I could just view the top games on a review aggregation platform like [Metacritic](https://www.metacritic.com/browse/games/score/metascore/all/all/filtered) and filter it out by date and platform. On the surface, this should give me the least biased answers, but unfortunately this is not the case. [Metacritic] adds a hidden weight to each of the critics that they are aggregating, thus adding a hidden bias to the score. Isn't there a way for me to do better? 

Yes, there is indeed - Data Science. 



Top Gaming Review Sites 

Destructoid
GamesRadar+
Game Informer
Metacritic
Nintendo Life
GameSpot
Christ Centered Gamer
GameFAQS
Steam


YouTube
1. AngryJoeShow
2. Gameranx
3. ACG
4. Skill Up
5. Worth A Buy





## Gathering the Data 

To find the least biased answer, I should aggregate both critical and user scores from many different gaming sites. 

1. IGN
2. Gamespot 
3. Gamefaq
4. Kotaku
5. Polygon
6. Metacritic 
7. Pcgamer 
8. Pcworld 
9. Giantbomb 
10. Fextralife 
11. Eurogamer 
12. Gamingtarget
13. Gamesradar
14. vg247
15. Rockpapershotgun 
16. N4G
17. Destructoid 
18. Gameranx
19. Nintendolife 
20. Escapistmagazine 

The goal here is to find out the most trusted gaming sites. 



In [66]:
import requests 
from bs4 import BeautifulSoup
import re
import pandas as pd

In [110]:
def extract_gamespot_review(card):
    '''
    Extract a single review card into a Python dictionary 
    '''
    review = {}

    review['title'] = card.h4.text
    review['platform'] = card.span.text
    review['review_date'] = card.time['datetime']

    # Critic Score may or may not exist 
    review['critic_score'] = (
        len(card.select('.review-ring-score__score')) 
        and card.select('.review-ring-score__score')[0].text
    )

    review['critic_score_text'] = (
        len(card.select('.review-ring-score__text'))
        and card.select('.review-ring-score__text')[0].text
    )

    # VVIP: For further data extraction 
    base = 'https://www.gamespot.com'
    review['url'] =  base + card.a['href'] 

    # Metadata 
    review['comments'] = card.select('.card-metadata span')[0].text
    review['upvotes'] = card.select('.card-metadata span')[1].text

    review['img'] = card.img['src']
    return review

def extract_gamespot_reviews(soup, reviews): 
    for card in soup.find_all(class_='card-item'): 
        review = _extract_gamespot_review(card)
        reviews.append(review)
    return reviews
        

In [85]:
# TEST DRIVE
reviews = []

r = requests.get('https://www.gamespot.com/games/reviews')
soup = BeautifulSoup(r.text, 'lxml')

extract_gamespot_reviews(soup, reviews)

len(reviews)

21

In [86]:
reviews[0]

{'title': 'Soundfall Review - Not Quite My Tempo',
 'platform': 'PC',
 'datetime': 'Friday, May 27, 2022 3:39pm',
 'critic_score': '5',
 'critic_score_text': 'Mediocre',
 'url': 'https://www.gamespot.com/reviews/soundfall-review-not-quite-my-tempo/1900-6417887/',
 'comments': '0',
 'upvotes': '2',
 'img': 'https://www.gamespot.com/a/uploads/screen_petite/43/434805/3982898-8068884117-Sound.jpg'}

In [None]:
reviews = []

In [163]:
# Fetch a single page of reviews from Gamespot 
def fetch_gamespot_review_page(page_num):
    '''
    Fetch a page of reviews from gamespot.com 
    and return the page as an HTML parse tree. 
    '''
    url = f'https://www.gamespot.com/games/reviews/?page={page_num}'
    r = requests.get(url)
    page = BeautifulSoup(r.text, 'lxml')
    return page
    

In [164]:
# Fetch all pages of reviews from Gamespot 
def fetch_gamespot_reviews(progress=False): 
    '''
    Fetch pages of reviews from gamespot.com and
    return a collection of pages as HTML parse trees. 
    '''
    pages = [] 
    for i in range(1, 729): 
        page = fetch_gamespot_review_page(i)
        if progress:
            print(f'Downloading page {i} of {729} ....... {round(i/729*100)}% complete')
    return pages

In [111]:
review_2 = []
for i, soup in enumerate(soups): 
    extract_gamespot_reviews(soup, review_2)

In [112]:
len(review_2)

12127

In [113]:
df2 = pd.DataFrame(review_2)

In [114]:
df2

Unnamed: 0,title,platform,datetime,critic_score,critic_score_text,url,comments,upvotes,img
0,NHL 12 Review,PS3,"Monday, Sep 26, 2011 4:29pm",8,Great,https://www.gamespot.com/reviews/nhl-12-review...,0,0,https://www.gamespot.com/a/uploads/screen_peti...
1,Child of Eden Review,PS3,"Monday, Sep 26, 2011 9:03am",8.5,Great,https://www.gamespot.com/reviews/child-of-eden...,1,0,https://www.gamespot.com/a/uploads/screen_peti...
2,Rotastic Review,X360,"Friday, Sep 23, 2011 5:45pm",5,Mediocre,https://www.gamespot.com/reviews/rotastic-revi...,0,0,https://www.gamespot.com/a/uploads/screen_peti...
3,F1 2011 Review,PS3,"Friday, Sep 23, 2011 12:23pm",7.5,Good,https://www.gamespot.com/reviews/f1-2011-revie...,1,2,https://www.gamespot.com/a/uploads/screen_peti...
4,Bit.Trip Saga Review,3DS,"Thursday, Sep 22, 2011 6:03pm",8,Great,https://www.gamespot.com/reviews/bittrip-saga-...,1,2,https://www.gamespot.com/a/uploads/screen_peti...
...,...,...,...,...,...,...,...,...,...
12122,Under a Killing Moon Review,PC,"Wednesday, May 1, 1996 12:00am",6.9,Fair,https://www.gamespot.com/reviews/under-a-killi...,2,5,https://www.gamespot.com/a/uploads/screen_peti...
12123,Warhammer: Shadow of the Horned Rat Review,PC,"Wednesday, May 1, 1996 12:00am",6.8,Fair,https://www.gamespot.com/reviews/warhammer-sha...,3,9,https://www.gamespot.com/a/uploads/screen_peti...
12124,Whiplash Review,PC,"Wednesday, May 1, 1996 12:00am",6.1,Fair,https://www.gamespot.com/reviews/whiplash-revi...,4,3,https://www.gamespot.com/a/uploads/screen_peti...
12125,Worms Review,PC,"Wednesday, May 1, 1996 12:00am",7.6,Good,https://www.gamespot.com/reviews/worms-review/...,3,6,https://www.gamespot.com/a/uploads/screen_peti...


In [118]:
df2.head(30)

Unnamed: 0,title,platform,datetime,critic_score,critic_score_text,url,comments,upvotes,img
0,NHL 12 Review,PS3,"Monday, Sep 26, 2011 4:29pm",8.0,Great,https://www.gamespot.com/reviews/nhl-12-review...,0,0,https://www.gamespot.com/a/uploads/screen_peti...
1,Child of Eden Review,PS3,"Monday, Sep 26, 2011 9:03am",8.5,Great,https://www.gamespot.com/reviews/child-of-eden...,1,0,https://www.gamespot.com/a/uploads/screen_peti...
2,Rotastic Review,X360,"Friday, Sep 23, 2011 5:45pm",5.0,Mediocre,https://www.gamespot.com/reviews/rotastic-revi...,0,0,https://www.gamespot.com/a/uploads/screen_peti...
3,F1 2011 Review,PS3,"Friday, Sep 23, 2011 12:23pm",7.5,Good,https://www.gamespot.com/reviews/f1-2011-revie...,1,2,https://www.gamespot.com/a/uploads/screen_peti...
4,Bit.Trip Saga Review,3DS,"Thursday, Sep 22, 2011 6:03pm",8.0,Great,https://www.gamespot.com/reviews/bittrip-saga-...,1,2,https://www.gamespot.com/a/uploads/screen_peti...
5,F1 2011 Review,PC,"Thursday, Sep 22, 2011 11:19am",8.0,Great,https://www.gamespot.com/reviews/f1-2011-revie...,1,1,https://www.gamespot.com/a/uploads/screen_peti...
6,Bit.Trip Complete Review,WII,"Wednesday, Sep 21, 2011 7:24pm",8.0,Great,https://www.gamespot.com/reviews/bittrip-compl...,2,0,https://www.gamespot.com/a/uploads/screen_peti...
7,Renegade Ops Review,X360,"Wednesday, Sep 21, 2011 6:44pm",8.0,Great,https://www.gamespot.com/reviews/renegade-ops-...,4,3,https://www.gamespot.com/a/uploads/screen_peti...
8,Renegade Ops Review,PS3,"Wednesday, Sep 21, 2011 6:43pm",8.0,Great,https://www.gamespot.com/reviews/renegade-ops-...,1,0,https://www.gamespot.com/a/uploads/screen_peti...
9,TrackMania 2: Canyon Review,PC,"Wednesday, Sep 21, 2011 4:28am",8.0,Great,https://www.gamespot.com/reviews/trackmania-2-...,1,2,https://www.gamespot.com/a/uploads/screen_peti...


In [88]:
len(reviews)

3129

In [89]:
df = pd.DataFrame(reviews)

In [90]:
df.to_csv('gamespot_reviews.csv')

In [123]:
df.tail(100).iloc[::-1].reset_index()['title']

0                        Gears of War 3 Review
1            Call of Juarez: The Cartel Review
2                               F1 2011 Review
3        Dragon Quest Monsters: Joker 2 Review
4                     Radiant Silvergun Review
                        ...                   
95                      Disney Universe Review
96                 Pokemon Rumble Blast Review
97       Rune Factory: Tides of Destiny Review
98    Dragon Ball Z: Ultimate Tenkaichi Review
99    Dragon Ball Z: Ultimate Tenkaichi Review
Name: title, Length: 100, dtype: object

In [128]:
df['title'].isin(df2.head(100).reset_index()['title'].to_list())

0       False
1       False
2       False
3       False
4       False
        ...  
3124     True
3125     True
3126     True
3127     True
3128     True
Name: title, Length: 3129, dtype: bool

In [134]:
next = df2[~df2['title'].isin(df['title'])].reset_index().drop(columns='index')

In [141]:
data = pd.concat([df, next], ignore_index=True)

In [148]:
data['datetime']

0           Friday, May 27, 2022 3:39pm
1         Thursday, May 26, 2022 7:08pm
2         Thursday, May 26, 2022 1:33pm
3         Thursday, May 26, 2022 8:00am
4           Friday, May 20, 2022 2:09pm
                      ...              
15123    Wednesday, May 1, 1996 12:00am
15124    Wednesday, May 1, 1996 12:00am
15125    Wednesday, May 1, 1996 12:00am
15126    Wednesday, May 1, 1996 12:00am
15127    Wednesday, May 1, 1996 12:00am
Name: datetime, Length: 15128, dtype: object

In [159]:
data['datetime'] = pd.to_datetime(data['datetime'])

In [170]:
len(data)

15128

In [160]:
data.to_csv('gamespot_reviews_all.csv', index=False)

In [None]:
#regex = re.findall('(.+?)(Review|-)', text)

#### Convert Critic Score Text to Category

#The critic score text should be a category. Let's check what type of text is present. 
#categories = df[['critic_score_text', 'critic_score']].sort_values('critic_score')['critic_score_text'].value_counts(sort=False)

#df['critic_score_text'] = pd.Categorical(df['critic_score_text'], categories=categories.index, ordered=True)

