### Game data exploration

ARRM database files were copied from the ARRM install folder and converted from MDB to CSV using mdb-tools.
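The conversion itself is not part of this notebook. As a rough sketch of how it could be scripted, assuming the mdb-tools commands `mdb-tables` and `mdb-export` are on the PATH (the helper names and any file paths passed to them are hypothetical):

```python
import subprocess
from pathlib import Path


def list_tables(mdb_path):
    """Return the table names in an MDB file using `mdb-tables -1` (one name per line)."""
    out = subprocess.run(
        ["mdb-tables", "-1", str(mdb_path)],
        check=True, capture_output=True, text=True,
    ).stdout
    return [line for line in out.splitlines() if line.strip()]


def export_command(mdb_path, table):
    """Build the argv that dumps one table as CSV to stdout."""
    return ["mdb-export", str(mdb_path), table]


def export_all(mdb_path, out_dir):
    """Write every table of the MDB file to <out_dir>/<table>.csv."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for table in list_tables(mdb_path):
        with open(out_dir / f"{table}.csv", "w", encoding="utf-8") as fh:
            subprocess.run(export_command(mdb_path, table), check=True, stdout=fh)
```

Nothing runs at import time; `export_all` would be called with the path to the copied database file and the `arrm` output folder.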

Sources:
- ARRM database files
- SteamDB data
- what else?

Preferred features for a database:
- name/title
- filename (important for arcade titles)
- description
- platform
- release date/year
- genre(s)
- developer
- publisher
- number of players
- cooperative
- rating
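One way to make the feature list above concrete is a fixed set of target columns that every source table gets mapped onto. This is only a sketch: the column names and the `to_target_schema` helper are assumptions, not something defined elsewhere in this notebook.

```python
import pandas as pd

# Target columns derived from the feature list (the exact names are an assumption)
TARGET_COLUMNS = [
    "name", "filename", "description", "platform", "release_date",
    "genres", "developer", "publisher", "players", "cooperative", "rating",
]


def to_target_schema(df, rename_map):
    """Rename source columns, then keep only target columns (missing ones become NaN)."""
    renamed = df.rename(columns=rename_map)
    return renamed.reindex(columns=TARGET_COLUMNS)


# Toy frame with a few source-style column names
toy = pd.DataFrame({"name": ["Astro Chicken"], "release_year": [1989], "genre": ["Adventure"]})
unified = to_target_schema(toy, {"release_year": "release_date", "genre": "genres"})
```

Columns a source does not provide simply stay empty, which keeps the later merges uniform.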

In [1]:
import pandas as pd

## DAT Databases

In [2]:
dat_daphne = pd.read_csv('arrm/dat_database_daphne.csv')
dat_dos = pd.read_csv('arrm/dat_database_dos.csv')
dat_mame = pd.read_csv('arrm/dat_database_mame.csv')
dat_scumm_vm = pd.read_csv('arrm/dat_database_scummvm.csv')


In [3]:
dat_daphne.head()

Unnamed: 0,ID,gametitle_mame,filename_mame,cloneof_mame,manufacturer_mame,year_mame,genre_mame
0,51,Astron Belt,astron,,,-,
1,52,Badlands,badlands,,,-,
2,53,Bega's Battle,bega,,,-,
3,54,Cliff Hanger,cliff,,,-,
4,55,Cobra Command,cobra,,,-,


In [4]:
dat_daphne.isnull().sum()

ID                    0
gametitle_mame        0
filename_mame         0
cloneof_mame         34
manufacturer_mame    34
year_mame             0
genre_mame           34
dtype: int64

Analysis: The Daphne dataframe has only 34 rows, and cloneof, manufacturer and genre are null in every one of them. Let's skip this database.

In [5]:
dat_mame.head()

Unnamed: 0,ID,systemname_mame,name,filename,cloneof_mame,developer,release_date,genre
0,355065,advmame 0.94-RetroPie-260.dat,"PuckMan (Japan set 1, Probably Bootleg)",puckman,,Namco,1980/01/01,
1,355066,advmame 0.94-RetroPie-260.dat,PuckMan (Japan set 2),puckmana,puckman,Namco,1980/01/01,
2,355067,advmame 0.94-RetroPie-260.dat,PuckMan (Japan set 1 with speedup hack),puckmanf,puckman,Namco,1980/01/01,
3,355068,advmame 0.94-RetroPie-260.dat,Puckman (Falcom),puckmanh,puckman,hack,1980/01/01,
4,355069,advmame 0.94-RetroPie-260.dat,Pac-Man (Midway),pacman,puckman,[Namco] (Midway license),1980/01/01,


In [6]:
dat_mame.isnull().sum()

ID                      0
systemname_mame         0
name                    0
filename                0
cloneof_mame        88733
developer           11564
release_date        94049
genre              193224
dtype: int64

Analysis: The MAME list is huge (222251 rows). Genre is missing for 87% of the titles and release date for 42%, and there are no descriptions. However, the MAME filenames could be useful. <br>
Columns to keep: gametitle_mame, filename_mame, manufacturer_mame and year_mame (shown above as name, filename, developer and release_date).
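The percentages quoted for the MAME table can be reproduced from the null counts and the row count. A small sketch using the numbers printed in the output above:

```python
import pandas as pd

# Null counts copied from the dat_mame.isnull().sum() output
null_counts = pd.Series({
    "cloneof_mame": 88733,
    "developer": 11564,
    "release_date": 94049,
    "genre": 193224,
})
total_rows = 222251  # row count quoted in the analysis

# Share of missing values per column, in percent
missing_pct = (null_counts / total_rows * 100).round(1)
print(missing_pct)
```

On the live dataframe, `dat_mame.isnull().mean() * 100` gives the same shares without copying numbers by hand.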

In [7]:
dat_scumm_vm.head()

Unnamed: 0,ID,name,filename,cloneof,developer,release_year,genre
0,41893,Gabriel Knight - Sins of the Fathers,gk1,,Sierra,1993,Adventure
1,41894,The Beast Within - A Gabriel Knight Mystery,gk2,,Sierra,1995,Adventure
2,41895,Astro Chicken,astrochicken,,Sierra,1989,Adventure
3,41896,Donald Duck's Playground,ddp,,Sierra,1986,Educational
4,41897,3 Skulls of the Toltecs (CD DOS),toltecs,,Revistonic,1996,Adventure


In [8]:
dat_scumm_vm.isnull().sum()

ID                0
name              0
filename          0
cloneof         402
developer       145
release_year      0
genre           145
dtype: int64

SCUMM analysis: A bit of a niche dataset, but still useful. Developer and genre are missing for 35% of the titles, but on inspection those are mostly the more niche games or non-SCUMM games. Let's come back to this later.
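The inspection mentioned here is not shown in the notebook. A minimal sketch of how such rows could be pulled out, using a toy stand-in for `dat_scumm_vm` (the second row is invented for illustration):

```python
import pandas as pd

# Toy stand-in for dat_scumm_vm; "Some Obscure Game" is a made-up row
toy = pd.DataFrame({
    "name": ["Astro Chicken", "Some Obscure Game"],
    "developer": ["Sierra", None],
    "genre": ["Adventure", None],
})

# Rows where both developer and genre are missing
missing_meta = toy[toy["developer"].isnull() & toy["genre"].isnull()]
share_missing = len(missing_meta) / len(toy)
```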

In [9]:
dat_dos.head()

Unnamed: 0,ID,Name,Developer,Y,Genre,Description,Publisher
0,61191,Stunt Island,The Assembly Line,19920101T000000,"Flight Simulator,Vehicle Simulation",Stunt Island was marketed as The Stunt Flying ...,"Walt Disney Computer Software, Inc."
1,61192,Tommy's Yahtzee,Tommy's Toys,19860101T000000,Board / Party Game,Tommy's Yahtzee is a shareware implementation ...,Freeware
2,61193,Fantasy Empires,Silicon Knights,19931001T000000,Strategy,Build and control an Empire! In Fantasy Empire...,"Strategic Simulations, Inc."
3,61194,Pune,Hardware Not Included,19950101T000000,Action,Pune is a light cycle/snake-type game for mult...,Hardware Not Included
4,61195,Mortal Kombat,Probe Software Ltd.,19940525T000000,"Action,Fighting","Five Hundred years ago, an ancient and well re...","Acclaim Entertainment, Inc."


In [10]:
dat_dos.isnull().sum()

ID              0
Name            0
Developer       3
Y               0
Genre           1
Description    12
Publisher      31
dtype: int64

DOS analysis: A very nice set, and I love DOS games. Let's use it.
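The `Y` column stores dates as compact strings such as `19920101T000000`. A sketch of turning them into proper datetimes, using sample values copied from the output above:

```python
import pandas as pd

# Sample values from the Y column of dat_dos
y = pd.Series(["19920101T000000", "19860101T000000", "19940525T000000"])

# Parse the compact timestamp; unparseable values become NaT instead of raising
release_date = pd.to_datetime(y, format="%Y%m%dT%H%M%S", errors="coerce")
release_year = release_date.dt.year
```

Several of the sample dates fall on January 1, which may mean only the year is known for those titles; that is a guess, not something the data confirms.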

In [11]:
# tag each dataframe with its platform
dat_daphne['platform'] = 'daphne'
dat_dos['platform'] = 'dos'
dat_mame['platform'] = 'mame'
dat_scumm_vm['platform'] = 'scummvm'
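With a `platform` column on each frame, the tables can later be stacked into one. The sketch below uses toy rows with the column names from the `head()` outputs above; the rename map is an assumption about how the DOS columns would be harmonised.

```python
import pandas as pd

# Toy rows using the column names seen in the ScummVM and DOS tables
scumm = pd.DataFrame({"name": ["Astro Chicken"], "developer": ["Sierra"], "platform": ["scummvm"]})
dos = pd.DataFrame({"Name": ["Mortal Kombat"], "Developer": ["Probe Software Ltd."], "platform": ["dos"]})

# Harmonise the DOS column names before stacking
dos = dos.rename(columns={"Name": "name", "Developer": "developer"})

# Stack the frames; the platform column records where each row came from
combined = pd.concat([scumm, dos], ignore_index=True)
```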

## GameTDB

In [12]:
gamestdb = pd.read_csv('arrm/games_on_gametdb.csv')

In [13]:
gamestdb.head()

Unnamed: 0,N°,gametdb_id,gametdb_type,gametdb_region,gametdb_languages,gametdb_title_de,gametdb_title_en,gametdb_title_es,gametdb_title_fr,gametdb_title_it,...,gametdb_players,gametdb_genre,gametdb_rom,gametdb_rom_sans_ext,gametdb_platform,gametdb_title_cn,gametdb_title_tw,gametdb_synopsis_cn,gametdb_synopsis_tw,gametdb_rom_cleaned
0,880253,A22J,3DS,NTSC-J,JP,Bokujou Monogatari Futago no Mura+,Bokujou Monogatari Futago no Mura+,,Bokujou Monogatari Futago no Mura+,Bokujou Monogatari Futago no Mura+,...,1.0,,Bokujou Monogatari Futago no Mura+ (Japan) (JA...,Bokujou Monogatari Futago no Mura+ (Japan) (JA),3DS,Bokujou Monogatari Futago no Mura+,Bokujou Monogatari Futago no Mura+,,,Bokujou Monogatari Futago no Mura+
1,880254,A2AE,3DS,NTSC-U,EN,Pokémon Ultra Sun,Pokémon Ultra Sun,,Pokémon Ultra Sun,Pokémon Ultra Sun,...,1.0,"adventure,role-playing,action rpg",Pokémon Ultra Sun (USA) (EN).3ds,Pokémon Ultra Sun (USA) (EN),3DS,Pokémon Ultra Sun,Pokémon Ultra Sun,,,Pokémon Ultra Sun
2,880255,A2AJ,3DS,NTSC-J,JP,Pokémon Ultra Sun,Pokémon Ultra Sun,,Pokémon Ultra Sun,Pokémon Ultra Sun,...,1.0,"adventure,role-playing,action rpg",Pokémon Ultra Sun (Japan) (JA).3ds,Pokémon Ultra Sun (Japan) (JA),3DS,Pokémon Ultra Sun,Pokémon Ultra Sun,,,Pokémon Ultra Sun
3,880256,A2AK,3DS,NTSC-K,KR,Pokémon Ultra Sun,Pokémon Ultra Sun,,Pokémon Ultra Sun,Pokémon Ultra Sun,...,1.0,"role-playing,action rpg",Pokémon Ultra Sun (Korea) (KO).3ds,Pokémon Ultra Sun (Korea) (KO),3DS,Pokémon Ultra Sun,Pokémon Ultra Sun,,,Pokémon Ultra Sun
4,880257,A2AP,3DS,PAL,"EN,FR,DE,ES,IT",Pokémon Ultra Sun,Pokémon Ultra Sun,Pokémon UltraSol,Pokémon Ultra Sun,Pokémon Ultra Sun,...,1.0,"action,adventure,role-playing,fantasy,action rpg","Pokémon Ultra Sun (Europe) (EN,FR,DE,ES,IT).3ds","Pokémon Ultra Sun (Europe) (EN,FR,DE,ES,IT)",3DS,Pokémon Ultra Sun,Pokémon Ultra Sun,Return to Alola for an alternate adventure in ...,Return to Alola for an alternate adventure in ...,Pokémon Ultra Sun


In [14]:
gamestdb.columns

Index(['N°', 'gametdb_id', 'gametdb_type', 'gametdb_region',
       'gametdb_languages', 'gametdb_title_de', 'gametdb_title_en',
       'gametdb_title_es', 'gametdb_title_fr', 'gametdb_title_it',
       'gametdb_title_ja', 'gametdb_title_ko', 'gametdb_title_nl',
       'gametdb_title_pt', 'gametdb_title_ru', 'gametdb_synopsis_de',
       'gametdb_synopsis_en', 'gametdb_synopsis_es', 'gametdb_synopsis_fr',
       'gametdb_synopsis_it', 'gametdb_synopsis_ja', 'gametdb_synopsis_ko',
       'gametdb_synopsis_nl', 'gametdb_synopsis_pt', 'gametdb_synopsis_ru',
       'gametdb_developer', 'gametdb_publisher', 'gametdb_date',
       'gametdb_players', 'gametdb_genre', 'gametdb_rom',
       'gametdb_rom_sans_ext', 'gametdb_platform', 'gametdb_title_cn',
       'gametdb_title_tw', 'gametdb_synopsis_cn', 'gametdb_synopsis_tw',
       'gametdb_rom_cleaned'],
      dtype='object')

In [15]:
gamestdb.isnull().sum()

N°                          0
gametdb_id                  0
gametdb_type                0
gametdb_region           6395
gametdb_languages        2481
gametdb_title_de          951
gametdb_title_en          963
gametdb_title_es        41454
gametdb_title_fr          888
gametdb_title_it          952
gametdb_title_ja          251
gametdb_title_ko          834
gametdb_title_nl          958
gametdb_title_pt          963
gametdb_title_ru          963
gametdb_synopsis_de     33485
gametdb_synopsis_en     33655
gametdb_synopsis_es     41744
gametdb_synopsis_fr     33344
gametdb_synopsis_it     33586
gametdb_synopsis_ja     32046
gametdb_synopsis_ko     33453
gametdb_synopsis_nl     33576
gametdb_synopsis_pt     33651
gametdb_synopsis_ru     33652
gametdb_developer       29843
gametdb_publisher       17120
gametdb_date                0
gametdb_players          8118
gametdb_genre           22939
gametdb_rom               963
gametdb_rom_sans_ext        6
gametdb_platform            0
gametdb_ti

In [16]:
# show distinct gametdb_type values
gamestdb['gametdb_type'].unique()

array(['3DS', 'DS', 'PS3', 'Switch', 'Wii', 'WiiU'], dtype=object)

GameTDB analysis: nice dataset, but only 6 platforms are represented. We're still going to use it.
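`gametdb_genre` holds several genres in one comma-separated string. A sketch of splitting it into one genre per row, with sample strings taken from the output above plus one missing value:

```python
import pandas as pd

# Sample gametdb_genre values; None stands for a missing genre
genre = pd.Series(["adventure,role-playing,action rpg", None, "role-playing,action rpg"])

# Split each string into a list, then expand to one genre per row
genre_list = genre.str.split(",")
exploded = genre_list.explode().dropna()
counts = exploded.value_counts()
```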

## Launchbox

In [17]:
launchbox = pd.read_csv('arrm/launchbox.csv')

In [18]:
launchbox.head()

Unnamed: 0,N°,DatabaseID,Name,ReleaseYear,Overview,MaxPlayers,ReleaseType,Cooperative,VideoURL,CommunityRating,...,AlternateName_China,AlternateName_Europe,AlternateName_France,AlternateName_Germany,AlternateName_Japan,AlternateName_Korea,AlternateName_NorthAmerica,AlternateName_Spain,AlternateName_UnitedStates,AlternateName_World
0,4931214,17687.0,20th Century Video Almanac,1993.0,"In The Best of Our Century, we've taken multim...",1.0,Released,0,https://www.youtube.com/watch?v=dfFT6jqjFq0,3.279851,...,,,,,,,,,,
1,4931215,219683.0,3D Atlas,1994.0,The World Isn't Flat. Why Should Your Atlas Be...,1.0,Released,0,https://www.youtube.com/watch?v=RZ-pph55Ui4,3.5,...,,,,,,,,,,
2,4931216,25182.0,3DO Action Pak,1995.0,This is a four-game compilation pack that cont...,1.0,Released,0,,3.706667,...,,,,,,,,,,
3,4931217,417702.0,3DO de Shiru Miru Asobu Nakajima Miyuki,,,1.0,Unreleased,0,,3.666667,...,,,,,,,,,,
4,4931218,130908.0,3DO Demo Disc Program,,A white binder with blue silk-screened art. Th...,,,0,,3.372727,...,,,,,,,,,,


Launchbox analysis: huge list, some missing values. Prime candidate for the main dataset that other datasets will be added to.

## Mobygames

In [19]:
mobygames = pd.read_csv('arrm/mobygames.csv')

In [20]:
mobygames.head()

Unnamed: 0,numauto,platform_id,platform,game_id,name,release_date,mobygames_url
0,2781406,35,3do,96,Quarantine,,t1H5j95PeMaZo+Cw1R9vKw==hY9Nu9IGxGC1KpiwIcK0pX...
1,2781407,35,3do,111,Virtuoso,,pskb3XM01ASHZRWZmUS00g==9/zl7gK72ixL8RSmDQYsDq...
2,2781408,35,3do,165,Princess Maker 2,,IaammuI4ElexSjqcTb6zAg==LU77bhsLWm+TMdKjgrt5Wv...
3,2781409,35,3do,179,Star Control II,,5NOpZrNp7xOTX79DuN4weg==uPkHQmlKhUSynxAzrMmt4b...
4,2781410,35,3do,210,Cannon Fodder,,ICShm7ilii6jmpRScaapPw==QXCpV2CiXimgLBnmVrOM6c...


Mobygames analysis: Only platform names and game titles here (release_date is empty in the rows shown), but it is a huge list, so it is also a likely candidate for the main dataset.
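Before choosing between Launchbox and Mobygames as the base, it may help to see how much the title lists overlap. A sketch with toy frames (which titles appear in which toy frame is invented for illustration):

```python
import pandas as pd

# Toy frames; the membership of each title is made up
launchbox_toy = pd.DataFrame({"Name": ["3D Atlas", "Star Control II"]})
mobygames_toy = pd.DataFrame({"name": ["Star Control II", "Cannon Fodder"], "platform": ["3do", "3do"]})

# Outer merge on the title; _merge says which side each row came from
overlap = launchbox_toy.merge(
    mobygames_toy, left_on="Name", right_on="name", how="outer", indicator=True
)
summary = overlap["_merge"].value_counts()
```

Exact title matching will miss spelling variants, so the real overlap is likely higher than such a count suggests.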

## Recalbox

In [21]:
recalbox = pd.read_csv('arrm/recalbox_gamelist.csv')

In [22]:
recalbox.head()

Unnamed: 0,numauto_rom,nomjeu_rom,fichier_rom,description_rom,image_rom,rating_rom,annee_rom,developer_rom,publisher_rom,genre_rom,...,arcadesystemname_rom,gametime_rom,boxback_rom,temporary_rom,bezel_rom,ratio_rom,rotation_rom,extra1_rom,famille_rom,mode_rom
0,17605,10000000-in-1 [p1][!],./10000000-in-1 [p1][!].zip,,,,,,,,...,,0,,,,,,,,
1,17606,10-Yard Fight,./10-Yard Fight (U).zip,10-Yard Fight is a simple arcade game based on...,./miximages/10-Yard Fight (U).png,0.4,19851206T000000,Irem,Nintendo,Sports,...,,0,,,,,,,,
2,17607,10-Yard Fight,./10-Yard Fight.zip,10-Yard Fight is a simple arcade game based on...,./miximages/10-Yard Fight.png,0.4,19851206T000000,Irem,Nintendo,Sports,...,,0,,,,,,,,
3,17608,118-in-1 [p1][!],./118-in-1 [p1][!].zip,,./miximages/118-in-1 [p1][!].png,,,,,,...,,0,,,,,,,,
4,17609,11-in-1 Ball Games [p1][!],./11-in-1 Ball Games [p1][!].zip,,,,,,,,...,,0,,,,,,,,


Recalbox: Looks like it's (mostly) NES/Famicom titles. It has descriptions. Looks like the XML version has fewer missing values. Let's try joining the CSV and XML versions before merging them into main.
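A possible way to join the two Recalbox versions is to key both on the ROM file and let one fill the gaps of the other. This sketch assumes the XML version, once loaded, has the same column names; the values in `xml_version` are invented to show the fill.

```python
import pandas as pd

# CSV version: genre is missing for both rows
csv_version = pd.DataFrame({
    "fichier_rom": ["./10-Yard Fight.zip", "./118-in-1 [p1][!].zip"],
    "developer_rom": ["Irem", None],
    "genre_rom": [None, None],
}).set_index("fichier_rom")

# Hypothetical XML version with one gap filled in
xml_version = pd.DataFrame({
    "fichier_rom": ["./10-Yard Fight.zip", "./118-in-1 [p1][!].zip"],
    "developer_rom": ["Irem", None],
    "genre_rom": ["Sports", None],
}).set_index("fichier_rom")

# Keep CSV values where present, fall back to the XML values otherwise
joined = csv_version.combine_first(xml_version)
```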

## all_games.csv
(don't know where it's from)

In [23]:
all_games = pd.read_csv('arrm/all_games.csv')

In [24]:
all_games.head()

Unnamed: 0,name,platform,release_date,summary,meta_score,user_review
0,The Legend of Zelda: Ocarina of Time,Nintendo 64,"November 23, 1998","As a young boy, Link is tricked by Ganondorf, ...",99,9.1
1,Tony Hawk's Pro Skater 2,PlayStation,"September 20, 2000",As most major publishers' development efforts ...,98,7.4
2,Grand Theft Auto IV,PlayStation 3,"April 29, 2008",[Metacritic's 2008 PS3 Game of the Year; Also ...,98,7.7
3,SoulCalibur,Dreamcast,"September 8, 1999","This is a tale of souls and swords, transcendi...",98,8.4
4,Grand Theft Auto IV,Xbox 360,"April 29, 2008",[Metacritic's 2008 Xbox 360 Game of the Year; ...,98,7.9


In [25]:
all_games.isnull().sum()

name              0
platform          0
release_date      0
summary         114
meta_score        0
user_review       0
dtype: int64

all_games: Really nice list. I think it is Metacritic's top games. Barely any missing values (only 114 summaries). Definitely useful.
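The `release_date` column here is written out in words ("November 23, 1998"), so it needs parsing before it can be compared with the other sources. A sketch using sample values from the output above:

```python
import pandas as pd

# Sample release_date values from all_games
release_date = pd.Series(["November 23, 1998", "September 20, 2000", "April 29, 2008"])

# Parse "Month day, year"; anything that does not match becomes NaT
parsed = pd.to_datetime(release_date, format="%B %d, %Y", errors="coerce")
```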