# 1. Initial Data Exploration for the SQL Project

In this notebook we load the `games.csv` and `vgchartz-2024.csv` datasets and examine their initial structure.  
Goal: understand the columns and data types, and detect data-quality problems (null values, unusual formats).
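One way to make the null-value check concrete is a per-column summary of missing values, both as a count and as a percentage of rows. The following is a minimal sketch under that assumption: the helper name `null_report` and the demo frame are hypothetical, and only standard pandas calls (`isna`, `sum`, `sort_values`) are used.

```python
import pandas as pd


def null_report(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: null count and percentage per column, worst first."""
    counts = df.isna().sum()
    report = pd.DataFrame({
        "null_count": counts,
        "null_pct": (counts / len(df) * 100).round(2),
    })
    return report.sort_values("null_count", ascending=False)


# Tiny illustrative frame (not project data)
demo = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "critic_score": [9.4, None, None, 7.0],
    "jp_sales": [None, None, None, 0.5],
})
print(null_report(demo))
```

On the project data it would be called as `null_report(df_games)` or `null_report(df_sales)`, once both frames are loaded.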

In [7]:
import pandas as pd
import os

# Show every column when displaying a DataFrame
pd.set_option('display.max_columns', None)

### 1. Load the Data
The files are located in `../data/raw/`.

In [8]:
# Relative paths
path_games = '../data/raw/games.csv'
path_vgchartz = '../data/raw/vgchartz-2024.csv'

# Load the data
try:
    # games.csv may contain line breaks inside descriptions; pandas' default C engine
    # parses them correctly as long as those fields are enclosed in quotes.
    df_games = pd.read_csv(path_games)
    df_sales = pd.read_csv(path_vgchartz)

    print("✅ Data loaded successfully.")
except Exception as e:
    print(f"❌ Error loading the data: {e}")
    raise  # stop here instead of failing later with a NameError on df_games / df_sales

✅ Data loaded successfully.
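The loading cell relies on the C engine coping with line breaks inside quoted fields. A tiny in-memory CSV, read through `io.StringIO`, is one way to check that behaviour before trusting it on `games.csv`. The rows below are hypothetical and only mimic the shape of a `description` column.

```python
import io

import pandas as pd

# Hypothetical two-row CSV: the second description contains a real line break,
# but the whole field is wrapped in double quotes.
raw = (
    'id,title,description\n'
    '1,Game A,"Short description"\n'
    '2,Game B,"First line\nsecond line"\n'
)

df = pd.read_csv(io.StringIO(raw))  # default parser: engine="c"
print(df.shape)
print(repr(df.loc[1, "description"]))
```

If the quotes were missing, the line break would instead start a new record and the row count (or the column alignment) would no longer match, which is a quick symptom to look for in `df_games.shape`.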


### 2. Dataset Inspection: Games (Metadata)

In [13]:
print(f"Dimensiones df_games: {df_games.shape}")
df_games.head(3)

Dimensiones df_games: (13442, 16)


| | id | title | releaseDate | rating | genres | description | platforms | metascore | metascore_count | metascore_sentiment | userscore | userscore_count | userscore_sentiment | platform_metascores | developer | publisher |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1300001290 | The Legend of Zelda: Ocarina of Time | 1998-11-23 | E | Open-World Action | As a young boy, Link is tricked by Ganondorf, ... | Nintendo 64 | 99.0 | 22.0 | Universal acclaim | 91.0 | 10611 | Universal acclaim | 99 | Nintendo | Nintendo,Gradiente |
| 1 | 1300001928 | SoulCalibur | 1999-09-08 | T | 3D Fighting | [Xbox Live Arcade] Soulcalibur, the highest M... | Dreamcast,iOS (iPhone/iPad),Xbox 360 | 98.0 | 24.0 | Universal acclaim | 78.0 | 605 | Generally favorable | 987379 | Namco | Namco |
| 2 | 1300027043 | Grand Theft Auto IV | 2008-04-29 | M | Open-World Action | [Metacritic's 2008 PS3 Game of the Year; Also ... | PlayStation 3,Xbox 360,PC | 98.0 | 86.0 | Universal acclaim | 83.0 | 4781 | Generally favorable | 989890 | Rockstar North | Rockstar Games,Capcom |

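The preview shows that `platforms` (and likewise `publisher`) packs several values into one comma-separated string, for example `Dreamcast,iOS (iPhone/iPad),Xbox 360`. A minimal sketch of normalising that into one row per title and platform, assuming no platform name itself contains a comma, uses `str.split` plus `DataFrame.explode`. The two rows are copied from the preview above; the name `platforms_long` is hypothetical.

```python
import pandas as pd

# Two rows copied from the df_games.head() preview
games = pd.DataFrame({
    "title": ["The Legend of Zelda: Ocarina of Time", "SoulCalibur"],
    "platforms": ["Nintendo 64", "Dreamcast,iOS (iPhone/iPad),Xbox 360"],
})

# One row per (title, platform); strip stray spaces left around the commas
platforms_long = (
    games.assign(platform=games["platforms"].str.split(","))
    .explode("platform")
    .assign(platform=lambda d: d["platform"].str.strip())
    .loc[:, ["title", "platform"]]
    .reset_index(drop=True)
)
print(platforms_long)
```

A long table like this maps naturally onto a separate relation, while the original comma-separated column is awkward to filter on.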

In [10]:
df_games.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 13442 entries, 0 to 13441
Data columns (total 16 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   id                   13442 non-null  int64  
 1   title                13442 non-null  object 
 2   releaseDate          13410 non-null  object 
 3   rating               11328 non-null  object 
 4   genres               13442 non-null  object 
 5   description          13394 non-null  object 
 6   platforms            13436 non-null  object 
 7   metascore            13436 non-null  float64
 8   metascore_count      13436 non-null  float64
 9   metascore_sentiment  13436 non-null  object 
 10  userscore            13442 non-null  float64
 11  userscore_count      13442 non-null  int64  
 12  userscore_sentiment  11934 non-null  object 
 13  platform_metascores  13436 non-null  object 
 14  developer            13433 non-null  object 
 15  publisher            13440 non-null  object 
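`releaseDate` is stored as `object` (text), not as a date. A sketch of converting it, assuming the ISO `YYYY-MM-DD` format seen in the preview, is `pd.to_datetime` with `errors="coerce"`, followed by a check of which values were present but failed to parse. The value `"TBA"` below is a hypothetical malformed entry, not something observed in the data.

```python
import pandas as pd

# Illustrative values: two ISO dates from the preview, one null, one hypothetical bad string
release = pd.Series(["1998-11-23", "2008-04-29", None, "TBA"], name="releaseDate")

# Unparseable strings become NaT instead of raising
parsed = pd.to_datetime(release, format="%Y-%m-%d", errors="coerce")

# Present in the source but NaT after parsing (original nulls are expected to be NaT)
unparseable = release[release.notna() & parsed.isna()]
print(parsed.dtype)
print(unparseable.tolist())
```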

### 3. Dataset Inspection: Sales

In [11]:
print(f"Dimensiones df_sales: {df_sales.shape}")
df_sales.head(3)

Dimensiones df_sales: (64016, 14)


| | img | title | console | genre | publisher | developer | critic_score | total_sales | na_sales | jp_sales | pal_sales | other_sales | release_date | last_update |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | /games/boxart/full_6510540AmericaFrontccc.jpg | Grand Theft Auto V | PS3 | Action | Rockstar Games | Rockstar North | 9.4 | 20.32 | 6.37 | 0.99 | 9.85 | 3.12 | 2013-09-17 | NaN |
| 1 | /games/boxart/full_5563178AmericaFrontccc.jpg | Grand Theft Auto V | PS4 | Action | Rockstar Games | Rockstar North | 9.7 | 19.39 | 6.06 | 0.6 | 9.71 | 3.02 | 2014-11-18 | 2018-01-03 |
| 2 | /games/boxart/827563ccc.jpg | Grand Theft Auto: Vice City | PS2 | Action | Rockstar Games | Rockstar North | 9.6 | 16.15 | 8.41 | 0.47 | 5.49 | 1.78 | 2002-10-28 | NaN |

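A possible consistency check on the sales columns, under the assumption (not stated by the dataset) that `total_sales` should equal the sum of the four regional columns up to rounding. The rows are the three shown in the preview; the column name `gap` and the 0.05 tolerance are hypothetical choices.

```python
import pandas as pd

# The three rows from the df_sales.head(3) preview (sales columns only)
sales = pd.DataFrame({
    "title": ["Grand Theft Auto V", "Grand Theft Auto V", "Grand Theft Auto: Vice City"],
    "console": ["PS3", "PS4", "PS2"],
    "total_sales": [20.32, 19.39, 16.15],
    "na_sales": [6.37, 6.06, 8.41],
    "jp_sales": [0.99, 0.60, 0.47],
    "pal_sales": [9.85, 9.71, 5.49],
    "other_sales": [3.12, 3.02, 1.78],
})

regions = ["na_sales", "jp_sales", "pal_sales", "other_sales"]
# min_count=1: a row with every region missing sums to NaN instead of 0
regional_sum = sales[regions].sum(axis=1, min_count=1)

# Assumption: total_sales ~= regional sum; keep the difference for inspection
sales["gap"] = (sales["total_sales"] - regional_sum).round(2)
mismatch = sales[sales["gap"].abs() > 0.05]
print(sales[["title", "console", "total_sales", "gap"]])
print(f"Rows outside tolerance: {len(mismatch)}")
```

On the full `df_sales` the regional columns are much sparser than `total_sales` (for example 12,637 non-null `na_sales` versus 18,922 `total_sales`), so a large gap there may simply mean a missing regional breakdown rather than an error.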

In [12]:
df_sales.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 64016 entries, 0 to 64015
Data columns (total 14 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   img           64016 non-null  object 
 1   title         64016 non-null  object 
 2   console       64016 non-null  object 
 3   genre         64016 non-null  object 
 4   publisher     64016 non-null  object 
 5   developer     63999 non-null  object 
 6   critic_score  6678 non-null   float64
 7   total_sales   18922 non-null  float64
 8   na_sales      12637 non-null  float64
 9   jp_sales      6726 non-null   float64
 10  pal_sales     12824 non-null  float64
 11  other_sales   15128 non-null  float64
 12  release_date  56965 non-null  object 
 13  last_update   17879 non-null  object 
dtypes: float64(6), object(8)
memory usage: 6.8+ MB
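The preview also shows that `title` repeats across consoles (Grand Theft Auto V appears for both PS3 and PS4), so `title` alone cannot identify a row of `df_sales`. A minimal sketch of testing whether the pair (`title`, `console`) is unique, assuming that pair is the intended candidate key, uses `value_counts` and `DataFrame.duplicated`. The fourth row below is a hypothetical exact repeat added for illustration.

```python
import pandas as pd

# Illustrative rows: one title per console as in the preview,
# plus a hypothetical repeated (title, console) pair at index 3
sales = pd.DataFrame({
    "title": ["Grand Theft Auto V", "Grand Theft Auto V",
              "Grand Theft Auto: Vice City", "Grand Theft Auto V"],
    "console": ["PS3", "PS4", "PS2", "PS3"],
})

# How many rows each title occupies
rows_per_title = sales["title"].value_counts()

# keep=False marks every member of a duplicated group, not just the later copies
dup_pairs = sales[sales.duplicated(subset=["title", "console"], keep=False)]

print(rows_per_title)
print(dup_pairs)
```

If `dup_pairs` is empty on the real `df_sales`, (`title`, `console`) is a reasonable key; otherwise the repeated groups need to be inspected before the table is loaded anywhere that enforces uniqueness.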
