# 📊 Guided Project - Video Game Sales Analysis
## Part 1: Introduction and Data Loading

---

### 🎯 Objectives of this notebook:
1. Understand what an EDA (Exploratory Data Analysis) is
2. Learn to load data from CSV files
3. Perform a first inspection of the data
4. Identify the characteristics of the dataset

---

### 📚 What is an EDA?

**Exploratory Data Analysis (EDA)** is the process of:
- Examining the data before building models
- Discovering patterns, anomalies, and relationships
- Formulating hypotheses about the data
- Checking assumptions through visualizations and statistics

**Why is it important?**
- It helps us understand our data
- It detects quality problems (missing data, duplicates, errors)
- It guides decisions about cleaning and preparation
- It reveals valuable insights for the business

---
## 1. Import the required libraries

We start by importing the libraries we will use:
- **pandas**: for data manipulation and analysis
- **numpy**: for numerical operations

In [1]:
import pandas as pd
import numpy as np

---
## 2. Load the dataset

We are going to load the CSV file with video game sales data.

**About the dataset:**
- **Source**: Kaggle - Video Game Sales Dataset
- **Period**: Video game sales up to 2016
- **Contents**: Rank, name, platform, year, genre, publisher, and sales by region

In [2]:
# Load the dataset from the CSV file
df = pd.read_csv(r'C:\Users\Propietario\OneDrive\Escritorio\Data_analysis\Modulo_1\7.Proyectos guiados\vgsales.csv')
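
The absolute Windows path above only resolves on the machine where the notebook was written. A minimal sketch of a more portable load follows. It assumes `vgsales.csv` sits in a hypothetical `data/` folder next to the notebook (that folder is not part of the original project), and it falls back to a two-row in-memory stand-in, copied from the dataset, so the example runs on its own:

```python
import io
from pathlib import Path

import pandas as pd

# Hypothetical location: a data/ folder next to the notebook (an assumption).
csv_path = Path("data") / "vgsales.csv"

# Stand-in for the real file so this sketch runs anywhere.
sample_csv = io.StringIO(
    "Rank,Name,Platform,Year,Genre,Publisher,"
    "NA_Sales,EU_Sales,JP_Sales,Other_Sales,Global_Sales\n"
    "1,Wii Sports,Wii,2006.0,Sports,Nintendo,41.49,29.02,3.77,8.46,82.74\n"
    "2,Super Mario Bros.,NES,1985.0,Platform,Nintendo,29.08,3.58,6.81,0.77,40.24\n"
)

# Read the real file when it exists, otherwise use the stand-in.
source = csv_path if csv_path.exists() else sample_csv
df_demo = pd.read_csv(source)
print(df_demo.shape)
```

Building the path with `pathlib.Path` avoids hard-coding backslashes, so the same cell works on Windows, macOS, and Linux.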

---
## 3. First inspection: view the first rows

The `.head()` method shows the first 5 rows by default; here we pass `10` to see the first ten.
This gives us a quick idea of what our data looks like.

In [4]:
# Display the first 10 rows of the dataset
df.head(10)

Unnamed: 0,Rank,Name,Platform,Year,Genre,Publisher,NA_Sales,EU_Sales,JP_Sales,Other_Sales,Global_Sales
0,1,Wii Sports,Wii,2006.0,Sports,Nintendo,41.49,29.02,3.77,8.46,82.74
1,2,Super Mario Bros.,NES,1985.0,Platform,Nintendo,29.08,3.58,6.81,0.77,40.24
2,3,Mario Kart Wii,Wii,2008.0,Racing,Nintendo,15.85,12.88,3.79,3.31,35.82
3,4,Wii Sports Resort,Wii,2009.0,Sports,Nintendo,15.75,11.01,3.28,2.96,33.0
4,5,Pokemon Red/Pokemon Blue,GB,1996.0,Role-Playing,Nintendo,11.27,8.89,10.22,1.0,31.37
5,6,Tetris,GB,1989.0,Puzzle,Nintendo,23.2,2.26,4.22,0.58,30.26
6,7,New Super Mario Bros.,DS,2006.0,Platform,Nintendo,11.38,9.23,6.5,2.9,30.01
7,8,Wii Play,Wii,2006.0,Misc,Nintendo,14.03,9.2,2.93,2.85,29.02
8,9,New Super Mario Bros. Wii,Wii,2009.0,Platform,Nintendo,14.59,7.06,4.7,2.26,28.62
9,10,Duck Hunt,NES,1984.0,Shooter,Nintendo,26.93,0.63,0.28,0.47,28.31


### 🔍 What do we observe?


---
## 4. View the last rows

The `.tail()` method shows the last rows (5 by default).
It is useful for seeing the lowest-ranked games.

In [5]:
# Display the last 10 rows of the dataset
df.tail(10)

Unnamed: 0,Rank,Name,Platform,Year,Genre,Publisher,NA_Sales,EU_Sales,JP_Sales,Other_Sales,Global_Sales
16588,16591,Mega Brain Boost,DS,2008.0,Puzzle,Majesco Entertainment,0.01,0.0,0.0,0.0,0.01
16589,16592,Chou Ezaru wa Akai Hana: Koi wa Tsuki ni Shiru...,PSV,2016.0,Action,dramatic create,0.0,0.0,0.01,0.0,0.01
16590,16593,Eiyuu Densetsu: Sora no Kiseki Material Collec...,PSP,2007.0,Role-Playing,Falcom Corporation,0.0,0.0,0.01,0.0,0.01
16591,16594,Myst IV: Revelation,PC,2004.0,Adventure,Ubisoft,0.01,0.0,0.0,0.0,0.01
16592,16595,Plushees,DS,2008.0,Simulation,Destineer,0.01,0.0,0.0,0.0,0.01
16593,16596,Woody Woodpecker in Crazy Castle 5,GBA,2002.0,Platform,Kemco,0.01,0.0,0.0,0.0,0.01
16594,16597,Men in Black II: Alien Escape,GC,2003.0,Shooter,Infogrames,0.01,0.0,0.0,0.0,0.01
16595,16598,SCORE International Baja 1000: The Official Game,PS2,2008.0,Racing,Activision,0.0,0.0,0.0,0.0,0.01
16596,16599,Know How 2,DS,2010.0,Puzzle,7G//AMES,0.0,0.01,0.0,0.0,0.01
16597,16600,Spirits & Spells,GBA,2003.0,Platform,Wanadoo,0.01,0.0,0.0,0.0,0.01
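
As a quick illustration of how the row count argument works in `.head()` and `.tail()`, here is a small sketch on a made-up 20-row DataFrame (`toy` is illustrative and not part of the dataset):

```python
import pandas as pd

# Toy frame with ranks 1..20, used only for illustration.
toy = pd.DataFrame({"Rank": range(1, 21)})

default_head = toy.head()    # no argument: first 5 rows
top_ten = toy.head(10)       # explicit n: first 10 rows
last_three = toy.tail(3)     # last 3 rows

print(len(default_head), len(top_ten), last_three["Rank"].tolist())
# prints: 5 10 [18, 19, 20]
```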


---
## 5. General information about the dataset

The `.info()` method gives us:
- Total number of entries (rows)
- Data type of each column
- Count of non-null values
- Memory usage

In [6]:
# General information about the dataset
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 16598 entries, 0 to 16597
Data columns (total 11 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Rank          16598 non-null  int64  
 1   Name          16598 non-null  object 
 2   Platform      16598 non-null  object 
 3   Year          16327 non-null  float64
 4   Genre         16598 non-null  object 
 5   Publisher     16540 non-null  object 
 6   NA_Sales      16598 non-null  float64
 7   EU_Sales      16598 non-null  float64
 8   JP_Sales      16598 non-null  float64
 9   Other_Sales   16598 non-null  float64
 10  Global_Sales  16598 non-null  float64
dtypes: float64(6), int64(1), object(4)
memory usage: 1.4+ MB
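
Reading the non-null counts against the 16598 total, `Year` is missing 16598 - 16327 = 271 values and `Publisher` is missing 16598 - 16540 = 58. The same counts can be computed directly instead of subtracting by hand. A minimal sketch on a made-up frame (`toy` and its values are illustrative only):

```python
import numpy as np
import pandas as pd

# Toy frame with gaps in Year and Publisher, mimicking the pattern above.
toy = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Year": [2006.0, np.nan, 2010.0, np.nan],
    "Publisher": ["Nintendo", "Ubisoft", None, "Capcom"],
})

missing = toy.isna().sum()             # missing values per column
non_null = toy.notna().sum()           # the count that .info() reports
missing_pct = toy.isna().mean() * 100  # share of missing values, in percent

print(missing)
```

On the real DataFrame, `df.isna().sum()` would be the equivalent call.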


---
## 6. Dataset dimensions

Let's get information about the size of our dataset.

In [16]:
# Get the dimensions as a (rows, columns) tuple
df.shape

# Alternatively, unpack the tuple into two variables
num_filas, num_columnas = df.shape

print(f'Dataset dimensions: {num_filas} rows and {num_columnas} columns')
print(f'Number of records (rows): {num_filas}')
print(f'Number of variables (columns): {num_columnas}')
print(f'Total cells: {num_filas * num_columnas}')

Dataset dimensions: 16598 rows and 11 columns
Number of records (rows): 16598
Number of variables (columns): 11
Total cells: 182578
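
The total cell count is also available directly through the `.size` attribute, and `len()` returns the row count. A short sketch on a made-up 3x2 frame (`toy` is illustrative only):

```python
import pandas as pd

# Toy frame: 3 rows, 2 columns.
toy = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

rows, cols = toy.shape
print(rows, cols)   # prints: 3 2
print(len(toy))     # prints: 3 (row count)
print(toy.size)     # prints: 6 (rows * columns)
```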


---
## 8. Data types

Let's examine the data type of each column in detail.

In [11]:
# Show data types
df.dtypes

Rank              int64
Name             object
Platform         object
Year            float64
Genre            object
Publisher        object
NA_Sales        float64
EU_Sales        float64
JP_Sales        float64
Other_Sales     float64
Global_Sales    float64
dtype: object
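
Note that `Year` is stored as `float64` even though years are whole numbers. This is what pandas does when an integer column contains missing values, because `NaN` is a floating-point value. One possible way to handle it, shown here on a made-up Series rather than the real column, is pandas' nullable `Int64` dtype:

```python
import numpy as np
import pandas as pd

# Toy Year column with one missing value, so pandas infers float64.
years = pd.Series([2006.0, 1985.0, np.nan], name="Year")
print(years.dtype)       # float64

# The nullable integer dtype keeps whole numbers and marks gaps as <NA>.
years_int = years.astype("Int64")
print(years_int.dtype)   # Int64
```

Whether to convert at this stage or during cleaning is a design choice; this sketch only shows that the conversion preserves the missing value.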

---
## 9. Random view of the data

The `.sample()` method lets us view random rows.
This is useful for spot-checking rows from anywhere in the dataset, not just the top or bottom of the ranking.

In [12]:
# Show 10 random rows
df.sample(10)

Unnamed: 0,Rank,Name,Platform,Year,Genre,Publisher,NA_Sales,EU_Sales,JP_Sales,Other_Sales,Global_Sales
5111,5113,Prince of Persia: The Forgotten Sands,PSP,2010.0,Action,Ubisoft,0.09,0.18,0.01,0.1,0.37
95,96,Crash Bandicoot 2: Cortex Strikes Back,PS,1997.0,Platform,Sony Computer Entertainment,3.78,2.17,1.31,0.31,7.58
2354,2356,Mega Man 8 Anniversary Collector's Edition,PS,1996.0,Platform,Capcom,0.44,0.3,0.09,0.06,0.88
15371,15374,Jewel Link: Galactic Quest,DS,2012.0,Action,Avanquest Software,0.0,0.02,0.0,0.0,0.02
16386,16389,TrackMania Turbo,PC,2016.0,Action,Ubisoft,0.0,0.01,0.0,0.0,0.01
4693,4695,Tom Clancy's Rainbow Six: Lockdown,XB,2005.0,Shooter,Ubisoft,0.26,0.13,0.0,0.02,0.41
6660,6662,Just Dance Kids 2,PS3,2011.0,Misc,Ubisoft,0.1,0.1,0.0,0.04,0.25
6851,6853,Dragon Ball Z: Sagas,XB,2005.0,Fighting,Atari,0.18,0.05,0.0,0.01,0.24
15425,15428,Malice,PS2,2004.0,Platform,Evolved Games,0.01,0.01,0.0,0.0,0.02
15,16,Kinect Adventures!,X360,2010.0,Misc,Microsoft Game Studios,14.97,4.94,0.24,1.67,21.82
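
Because `.sample()` draws different rows on every run, the output above will not match a re-execution. If a repeatable sample is wanted, the `random_state` parameter fixes the seed. A minimal sketch on a made-up frame (`toy` and the seed value 42 are arbitrary choices for illustration):

```python
import pandas as pd

# Toy frame with ranks 1..100.
toy = pd.DataFrame({"Rank": range(1, 101)})

# The same seed yields the same rows on every call.
first = toy.sample(10, random_state=42)
second = toy.sample(10, random_state=42)

print(first.equals(second))  # prints: True
```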


---
## 10. Column names

Checking the exact column names is important to avoid errors.

In [None]:
# List all columns
df.columns
# We can also view them as a plain Python list with df.columns.tolist()


Index(['Rank', 'Name', 'Platform', 'Year', 'Genre', 'Publisher', 'NA_Sales',
       'EU_Sales', 'JP_Sales', 'Other_Sales', 'Global_Sales'],
      dtype='object')
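
One common source of column-name errors is stray whitespace that is invisible in a rendered table. The sketch below uses a made-up frame with deliberately padded names (not the real dataset, whose names look clean in the output above) to show how `.tolist()` exposes the problem and `.str.strip()` removes it:

```python
import pandas as pd

# Toy frame whose column names carry hidden leading/trailing spaces.
toy = pd.DataFrame(columns=["Rank", " Name", "Platform "])

# A plain list makes the padding visible through the quotes.
cols = toy.columns.tolist()
print(cols)  # prints: ['Rank', ' Name', 'Platform ']

# Strip whitespace from every column name.
toy.columns = toy.columns.str.strip()
clean_cols = toy.columns.tolist()
print(clean_cols)  # prints: ['Rank', 'Name', 'Platform']
```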