# ML using PS Vita games

### 1. Frame the problem and look at the big picture

The goal of this project is to build a system that combines reviews and ratings of games from different websites with a list of personal reviews in order to recommend games to gamers.

The solution will help gamers choose the next game to play.

This is a supervised learning problem, which we will approach with multivariate regression.
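
As a rough illustration of regression on several input features, the sketch below fits a linear model with ordinary least squares using NumPy. The two features (a critic score and a user rating) and all numbers are invented toy values, not data from this project, and the choice of `np.linalg.lstsq` is an assumption rather than the method the project will necessarily use.

```python
import numpy as np

# Toy data (invented): each row is [critic_score, user_rating] for one game
features = np.array([
    [1.0, 2.0],
    [2.0, 1.0],
    [3.0, 4.0],
    [4.0, 3.0],
    [5.0, 5.0],
])
# Toy targets built as 1 + 2*critic_score + 3*user_rating, so the fit is exact
targets = np.array([9.0, 8.0, 19.0, 18.0, 26.0])

# Prepend a column of ones so the model can learn an intercept
design = np.column_stack([np.ones(len(features)), features])

# Ordinary least squares: minimise ||design @ coef - targets||^2
coef, residuals, rank, singular_values = np.linalg.lstsq(design, targets, rcond=None)

# Predictions for the training rows
predictions = design @ coef
```

Because the toy targets are an exact linear function of the features, `coef` comes out close to `[1, 2, 3]` (intercept first). Real review data would not fit this cleanly.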

Performance will be measured using the root mean square error (RMSE).
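
For reference, the root mean square error is the square root of the average squared difference between predicted and actual values. A minimal sketch in plain Python follows; the function name `rmse` and the rating values are illustrative assumptions, not part of the project data.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error between two equal-length sequences of numbers."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Invented example ratings on a 1-5 star scale
actual_ratings = [4.0, 3.0, 5.0, 2.0]
predicted_ratings = [3.5, 3.0, 4.0, 2.5]
score = rmse(actual_ratings, predicted_ratings)
```

A lower value means predictions are closer to the actual ratings, and the result is in the same units as the ratings themselves.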

Assumptions:

* The reviews on sites like Amazon are all written by gamers.
* The star ratings in reviews reflect the quality of gameplay.

### 2. Get the data

We will write a small class that fetches a web page and parses the game list tables on it into DataFrames.

In [133]:
import requests
import pandas as pd
from bs4 import BeautifulSoup
    
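class HTMLTableParser:
    """Fetch a web page and convert each HTML table on it into a DataFrame."""

    def parse_url(self, url):
        # Fail early on HTTP errors instead of parsing an error page
        response = requests.get(url, timeout=30)
        response.raise_for_status()
        soup = BeautifulSoup(response.text, 'lxml')
        return [self.parse_html_table(table)
                for table in soup.find_all('table')]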
    
    def parse_html_table(self, table):
        n_columns = 0
        n_rows=0
        column_names = []
    
        # Find number of rows and columns
        # we also find the column titles if we can
        for row in table.find_all('tr'):
                
            # Determine the number of rows in the table
            td_tags = row.find_all('td')
            if len(td_tags) > 0:
                n_rows+=1
                if n_columns == 0:
                    # Set the number of columns for our table
                    n_columns = len(td_tags)
                        
            # Handle column names if we find them
            th_tags = row.find_all('th') 
            if len(th_tags) > 0 and len(column_names) == 0:
                for th in th_tags:
                    column_names.append(th.get_text())
    
        # Safeguard on Column Titles
        if len(column_names) > 0 and len(column_names) != n_columns:
            raise Exception("Column titles do not match the number of columns")
    
        columns = column_names if len(column_names) > 0 else range(0,n_columns)
        df = pd.DataFrame(columns = columns,
                            index= range(0,n_rows))
        row_marker = 0
        for row in table.find_all('tr'):
            cells = row.find_all('td')
            for column_marker, cell in enumerate(cells):
                df.iat[row_marker, column_marker] = cell.get_text()
            # Only rows that contain <td> cells occupy a row in the DataFrame
            if len(cells) > 0:
                row_marker += 1
                    
        # Convert to float if possible
        for col in df:
            try:
                df[col] = df[col].astype(float)
            except ValueError:
                pass
            
        return df 
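
The row-and-cell walk that the parser relies on can also be tried offline. The sketch below uses only the standard library's `html.parser` on an inline HTML string; the class name `TableRows`, the sample table, and the game names in it are hypothetical, and this is a simplified stand-in for the BeautifulSoup-based class above, not a replacement for it.

```python
from html.parser import HTMLParser

class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row being read, or None outside a row
        self._cell = None   # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Keep text only while inside a cell
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

# Hypothetical sample table, shaped like a small sales listing
SAMPLE_HTML = """
<table>
  <tr><th>Pos</th><th>Game</th><th>Global</th></tr>
  <tr><td>1</td><td>Example Game A</td><td>2.45</td></tr>
  <tr><td>2</td><td>Example Game B</td><td>1.71</td></tr>
</table>
"""

parser = TableRows()
parser.feed(SAMPLE_HTML)
header, *body = parser.rows
```

Here `header` holds the column titles and `body` holds one list of strings per data row, which mirrors how the class above separates `th` cells from `td` cells before filling the DataFrame.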

Let's check that the parser works.

In [134]:
DATA_SOURCE = "http://www.vgchartz.com/platform/43/playstation-vita/"

hp = HTMLTableParser()
tables_list = hp.parse_url(DATA_SOURCE)

# The sales table is the last one on the page; pop() removes it from the list and returns it
table = tables_list.pop()
table.head()

|   | Pos | Game | Year | Genre | Publisher | North America | Europe | Japan | Rest of World | Global |
|---|-----|------|------|-------|-----------|---------------|--------|-------|---------------|--------|
| 0 | 1.0 | MineCraft | 2014 | Misc | Sony Computer Entertainment Europe | 0.22 | 0.73 | 1.23 | 0.27 | 2.45 |
| 1 | 2.0 | Call of Duty Black Ops: Declassified | 2012 | Misc | Activision | 0.74 | 0.52 | 0.07 | 0.38 | 1.71 |
| 2 | 3.0 | Uncharted: Golden Abyss | 2011 | Action | Sony Computer Entertainment | 0.53 | 0.71 | 0.13 | 0.25 | 1.62 |
| 3 | 4.0 | Assassin's Creed | 2012 | Adventure | Ubisoft | 0.53 | 0.57 | 0.06 | 0.32 | 1.48 |
| 4 | 5.0 | LittleBigPlanet PS Vita | 2012 | Platform | Sony Computer Entertainment | 0.36 | 0.69 | 0.02 | 0.31 | 1.38 |


Store the data as a CSV file.

In [137]:
import os

GAMES_PATH = os.path.join("datasets", "PSVita")

def update_games_data(table, games_path=GAMES_PATH):
    """Write the table to psvita.csv, keeping the previous file as a backup."""
    os.makedirs(games_path, exist_ok=True)
    csv_path = os.path.join(games_path, "psvita.csv")
    backup_path = os.path.join(games_path, "psvita(backup).csv")
    if os.path.isfile(csv_path):
        # os.replace overwrites an older backup; os.rename fails in that case on Windows
        os.replace(csv_path, backup_path)
    table.to_csv(csv_path, encoding='utf-8', index=False)

In [138]:
update_games_data(table)
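
The backup-then-write behaviour can be tried without touching the real `datasets` folder. The sketch below repeats the same pattern with the standard library's `csv` module inside a temporary directory; the helper names `save_with_backup` and `read_rows` and the sample rows are hypothetical and only stand in for the DataFrame used above.

```python
import csv
import os
import tempfile

def save_with_backup(rows, csv_path):
    """Write rows to csv_path; an existing file is first moved to '<name>(backup).csv'."""
    folder = os.path.dirname(csv_path)
    if folder:
        os.makedirs(folder, exist_ok=True)
    if os.path.isfile(csv_path):
        root, ext = os.path.splitext(csv_path)
        # os.replace overwrites any older backup
        os.replace(csv_path, root + "(backup)" + ext)
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)

def read_rows(csv_path):
    """Read a CSV file back as a list of rows (lists of strings)."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        return list(csv.reader(f))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "PSVita", "psvita.csv")
    # First save creates the file; second save moves it to the backup name
    save_with_backup([["Game", "Global"], ["Example Game A", "2.45"]], path)
    save_with_backup([["Game", "Global"], ["Example Game A", "2.50"]], path)
    current = read_rows(path)
    backup = read_rows(os.path.join(tmp, "PSVita", "psvita(backup).csv"))
```

After the second call, `current` holds the newer rows and `backup` holds the rows from the first call. Only one generation of backup is kept, as in `update_games_data`.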