# Cleaning and Processing

**Goal:** Clean the extracted dataset and prepare the key columns for analysis.

# Data preparation for the analysis:
## Which video game genres have been most popular each year?

Once the data has been extracted from the API and stored in a **JSON** file, we clean it so we can work with it according to our needs. In this case we want to find out which video game genres have been most popular in each year. We have also checked that the **'genres'** column, which is the one we will mainly work with, has no null values. We will do the following:

- Keep the columns **'name'**, **'released'** and **'genres'**, plus **'added'** and **'playtime'**

In [1]:
# Import the libraries needed for the project.
import numpy as np  # used later for np.where and np.nan
import pandas as pd


In [2]:
# Load the data extracted from the API, stored in the JSON file.
df = pd.read_json('../data/raw/juegos_rawg.json')

# df.info() to check that the data was extracted correctly.
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 2000 entries, 0 to 1999
Data columns (total 41 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   slug                     2000 non-null   object 
 1   name                     2000 non-null   object 
 2   playtime                 2000 non-null   int64  
 3   platforms                2000 non-null   object 
 4   stores                   1989 non-null   object 
 5   released                 2000 non-null   object 
 6   tba                      2000 non-null   bool   
 7   background_image         1999 non-null   object 
 8   rating                   2000 non-null   float64
 9   rating_top               2000 non-null   int64  
 10  ratings                  2000 non-null   object 
 11  ratings_count            2000 non-null   int64  
 12  reviews_text_count       2000 non-null   int64  
 13  added                    2000 non-null   int64  
 14  metacritic               1056

In [3]:
# Overwrite the DataFrame, keeping only 'name', 'released', 'genres', 'added' and 'playtime'.
# .copy() avoids a SettingWithCopyWarning when new columns are added below.
df = df[['name', 'released', 'genres', 'added', 'playtime']].copy()
df

Unnamed: 0,name,released,genres,added,playtime
0,The Witcher 3: Wild Hunt,2015-05-18,"[{'id': 4, 'name': 'Action', 'slug': 'action'}...",21662,43
1,Life is Strange,2015-01-29,"[{'id': 3, 'name': 'Adventure', 'slug': 'adven...",15758,6
2,Fallout 4,2015-11-09,"[{'id': 4, 'name': 'Action', 'slug': 'action'}...",14082,38
3,Rocket League,2015-07-07,"[{'id': 1, 'name': 'Racing', 'slug': 'racing'}...",12769,21
4,Rise of the Tomb Raider,2015-11-10,"[{'id': 4, 'name': 'Action', 'slug': 'action'}]",12212,14
...,...,...,...,...,...
1995,Dark Hours: Prologue,2024-07-18,"[{'id': 3, 'name': 'Adventure', 'slug': 'adven...",67,2
1996,Age of Water: The First Voyage,2024-03-21,"[{'id': 59, 'name': 'Massively Multiplayer', '...",66,1
1997,Legacy of Kain™ Soul Reaver 1&2 Remastered,2024-12-10,"[{'id': 83, 'name': 'Platformer', 'slug': 'pla...",66,0
1998,SOUTH PARK: SNOW DAY!,2024-03-25,"[{'id': 3, 'name': 'Adventure', 'slug': 'adven...",65,4


### Add the `año` (year) column to the DataFrame, since we will need it later for filtering.

In [5]:
# Create the 'año' column with the year from 'released'. 'released' is stored as text,
# so it is parsed with pd.to_datetime; unparseable dates become NaT and are dropped.
df['año'] = pd.to_datetime(df['released'], errors='coerce').dt.year
df = df.dropna(subset=['año'])
df['año'] = df['año'].astype(int)
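The three lines above can be checked on a small, hypothetical DataFrame. This is a sketch, not the project's data. `errors="coerce"` turns an unparseable date into `NaT`, so `.dt.year` gives `NaN` for that row. The row is then dropped, and the column, which is float while it holds `NaN`, can be cast to `int`.

```python
import pandas as pd

# Hypothetical toy data: the second date cannot be parsed
toy = pd.DataFrame({"released": ["2015-05-18", "not a date", "2024-12-10"]})

# errors="coerce" turns unparseable values into NaT, so their year is NaN
toy["año"] = pd.to_datetime(toy["released"], errors="coerce").dt.year
# Drop rows without a valid year; .copy() avoids a SettingWithCopyWarning below
toy = toy.dropna(subset=["año"]).copy()
# The year column is float while it holds NaN; cast it to int once NaN is gone
toy["año"] = toy["año"].astype(int)
print(toy)
```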

In [6]:
df

Unnamed: 0,name,released,genres,added,playtime,año
0,The Witcher 3: Wild Hunt,2015-05-18,"[{'id': 4, 'name': 'Action', 'slug': 'action'}...",21662,43,2015
1,Life is Strange,2015-01-29,"[{'id': 3, 'name': 'Adventure', 'slug': 'adven...",15758,6,2015
2,Fallout 4,2015-11-09,"[{'id': 4, 'name': 'Action', 'slug': 'action'}...",14082,38,2015
3,Rocket League,2015-07-07,"[{'id': 1, 'name': 'Racing', 'slug': 'racing'}...",12769,21,2015
4,Rise of the Tomb Raider,2015-11-10,"[{'id': 4, 'name': 'Action', 'slug': 'action'}]",12212,14,2015
...,...,...,...,...,...,...
1995,Dark Hours: Prologue,2024-07-18,"[{'id': 3, 'name': 'Adventure', 'slug': 'adven...",67,2,2024
1996,Age of Water: The First Voyage,2024-03-21,"[{'id': 59, 'name': 'Massively Multiplayer', '...",66,1,2024
1997,Legacy of Kain™ Soul Reaver 1&2 Remastered,2024-12-10,"[{'id': 83, 'name': 'Platformer', 'slug': 'pla...",66,0,2024
1998,SOUTH PARK: SNOW DAY!,2024-03-25,"[{'id': 3, 'name': 'Adventure', 'slug': 'adven...",65,4,2024


In [7]:
# Create the 'genero' column with the first listed genre, used as each game's main genre (None if the list is empty)
df['genero'] = df['genres'].apply(lambda x: x[0]['name'] if x else None)
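Here is a minimal sketch of the same first-genre rule on hypothetical rows that mimic the `genres` lists of dictionaries shown in the output. The toy values are illustrative only. A game with several genres keeps only the first one. An empty list yields `None`, which pandas treats as missing.

```python
import pandas as pd

# Hypothetical toy data mimicking the 'genres' lists of dictionaries
toy = pd.DataFrame({
    "name": ["Game A", "Game B", "Game C"],
    "genres": [
        [{"id": 4, "name": "Action", "slug": "action"},
         {"id": 1, "name": "Racing", "slug": "racing"}],
        [{"id": 3, "name": "Adventure", "slug": "adventure"}],
        [],  # a game with no genres assigned
    ],
})

# Same rule as the notebook: keep the first genre, or None if the list is empty
toy["genero"] = toy["genres"].apply(lambda x: x[0]["name"] if x else None)
print(toy[["name", "genero"]])
```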


In [8]:
df

Unnamed: 0,name,released,genres,added,playtime,año,genero
0,The Witcher 3: Wild Hunt,2015-05-18,"[{'id': 4, 'name': 'Action', 'slug': 'action'}...",21662,43,2015,Action
1,Life is Strange,2015-01-29,"[{'id': 3, 'name': 'Adventure', 'slug': 'adven...",15758,6,2015,Adventure
2,Fallout 4,2015-11-09,"[{'id': 4, 'name': 'Action', 'slug': 'action'}...",14082,38,2015,Action
3,Rocket League,2015-07-07,"[{'id': 1, 'name': 'Racing', 'slug': 'racing'}...",12769,21,2015,Racing
4,Rise of the Tomb Raider,2015-11-10,"[{'id': 4, 'name': 'Action', 'slug': 'action'}]",12212,14,2015,Action
...,...,...,...,...,...,...,...
1995,Dark Hours: Prologue,2024-07-18,"[{'id': 3, 'name': 'Adventure', 'slug': 'adven...",67,2,2024,Adventure
1996,Age of Water: The First Voyage,2024-03-21,"[{'id': 59, 'name': 'Massively Multiplayer', '...",66,1,2024,Massively Multiplayer
1997,Legacy of Kain™ Soul Reaver 1&2 Remastered,2024-12-10,"[{'id': 83, 'name': 'Platformer', 'slug': 'pla...",66,0,2024,Platformer
1998,SOUTH PARK: SNOW DAY!,2024-03-25,"[{'id': 3, 'name': 'Adventure', 'slug': 'adven...",65,4,2024,Adventure


### Group the 10 years into **`2 ranges of 5 years`** each, to work with them.

In [9]:
# Right-closed bins: (2014, 2019] -> '2015-2019' and (2019, 2024] -> '2020-2024'
df['rango'] = pd.cut(
    df['año'],
    bins=[2014, 2019, 2024],
    labels=['2015-2019', '2020-2024']
)
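`pd.cut` uses right-closed intervals by default. The edges `[2014, 2019, 2024]` therefore produce `(2014, 2019]` and `(2019, 2024]`, and 2015 falls into the first bin even though 2014 is the first edge. The hypothetical years below check the edges. A year outside `(2014, 2024]` gets `NaN`.

```python
import pandas as pd

# Hypothetical years, including the bin edges and one out-of-range value
years = pd.Series([2014, 2015, 2019, 2020, 2024])

rangos_toy = pd.cut(
    years,
    bins=[2014, 2019, 2024],
    labels=["2015-2019", "2020-2024"],
)  # right=True by default: (2014, 2019] and (2019, 2024]
print(rangos_toy.tolist())
```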

In [10]:
df

Unnamed: 0,name,released,genres,added,playtime,año,genero,rango
0,The Witcher 3: Wild Hunt,2015-05-18,"[{'id': 4, 'name': 'Action', 'slug': 'action'}...",21662,43,2015,Action,2015-2019
1,Life is Strange,2015-01-29,"[{'id': 3, 'name': 'Adventure', 'slug': 'adven...",15758,6,2015,Adventure,2015-2019
2,Fallout 4,2015-11-09,"[{'id': 4, 'name': 'Action', 'slug': 'action'}...",14082,38,2015,Action,2015-2019
3,Rocket League,2015-07-07,"[{'id': 1, 'name': 'Racing', 'slug': 'racing'}...",12769,21,2015,Racing,2015-2019
4,Rise of the Tomb Raider,2015-11-10,"[{'id': 4, 'name': 'Action', 'slug': 'action'}]",12212,14,2015,Action,2015-2019
...,...,...,...,...,...,...,...,...
1995,Dark Hours: Prologue,2024-07-18,"[{'id': 3, 'name': 'Adventure', 'slug': 'adven...",67,2,2024,Adventure,2020-2024
1996,Age of Water: The First Voyage,2024-03-21,"[{'id': 59, 'name': 'Massively Multiplayer', '...",66,1,2024,Massively Multiplayer,2020-2024
1997,Legacy of Kain™ Soul Reaver 1&2 Remastered,2024-12-10,"[{'id': 83, 'name': 'Platformer', 'slug': 'pla...",66,0,2024,Platformer,2020-2024
1998,SOUTH PARK: SNOW DAY!,2024-03-25,"[{'id': 3, 'name': 'Adventure', 'slug': 'adven...",65,4,2024,Adventure,2020-2024


In [30]:
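# Save the processed data to a JSON file. This was already done, so the line can be commented out to avoid regenerating the file.
# orient="records" writes one object per row; index=False is rejected with the default orient.
df.to_json("../data/processed/popularidad_generos.json", orient="records")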
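`DataFrame.to_json` defaults to `orient="columns"`, and pandas raises a `ValueError` if `index=False` is passed with that orient. `orient="records"` is one way to write the rows without the index. The sketch below uses hypothetical rows and round-trips them through a string instead of a file. The string is wrapped in `io.StringIO` because recent pandas versions deprecate passing literal JSON to `read_json`.

```python
import io

import pandas as pd

# Hypothetical processed rows standing in for the notebook's DataFrame
toy = pd.DataFrame({
    "name": ["Game A", "Game B"],
    "año": [2015, 2024],
    "genero": ["Action", "Adventure"],
})

# orient="records" writes a list with one JSON object per row and no index
texto = toy.to_json(orient="records")
print(texto)

# Read it back from an in-memory buffer
leido = pd.read_json(io.StringIO(texto), orient="records")
```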

---

# Data preparation for the analysis:
## What is the relationship between a video game's popularity and its score?

To analyze the relationship between the popularity and the score of video games, we start from a subset of the original DataFrame that includes the following relevant columns:

- `name`: name of the video game
- `rating`: average score given by users
- `ratings_count`: number of ratings received
- `added`: number of times users have added the game to their lists or favorites
- `released`: release date
- `metacritic`: average score given by professional critics

In [None]:
# Load the original DataFrame into the variable df
df = pd.read_json("../data/raw/juegos_rawg.json")

In [32]:
def filtro_popularidad_puntuacion(df):
    # Select the columns relevant to the analysis; .copy() leaves the original df untouched
    df_filtrado = df[['name', 'rating', 'ratings_count', 'added', 'released', 'metacritic']].copy()
    return df_filtrado

# Handling missing values (`NaN`)

To avoid dropping video games from the analysis, we chose to **keep all records**. Instead of deleting rows, we apply the following strategy:

- **rating_clean**: if there is no user score (`NaN`), it is replaced with `-1` to indicate missing data.
- **added_clean**: if there is no popularity data, it is also replaced with `-1`.
- Three auxiliary boolean columns are added:
  - `has_rating`: whether the game has a user score
  - `has_added`: whether popularity data is available
  - `has_metacritic`: whether the game has a Metacritic score

**In later API calls, the `rating` and `added` columns contain no `NaN` values.** Nevertheless, we keep the function in case future API calls do return them.

In [33]:
def nan_juegos(df):
    # Note: modifies df in place (new columns) and also returns it

    # New columns indicating whether real data is available
    df['has_rating'] = df['rating'].notna()
    df['has_added'] = df['added'].notna()
    df['has_metacritic'] = df['metacritic'].notna()

    # Fill NaN with control values
    df['rating_clean'] = df['rating'].fillna(-1)
    df['added_clean'] = df['added'].fillna(-1)

    return df
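The `-1` placeholders only work if later calculations exclude them, which is what the `has_*` flags are for. The sketch below applies the same strategy to hypothetical rows containing `NaN`. It then computes a mean using only rows flagged as having a real rating. Including the `-1` would pull the average down.

```python
import numpy as np
import pandas as pd

# Hypothetical rows: one game lacks a user rating, another lacks 'added'
toy = pd.DataFrame({
    "name": ["Game A", "Game B", "Game C"],
    "rating": [4.5, np.nan, 3.0],
    "added": [1200, 300, np.nan],
    "metacritic": [88, np.nan, np.nan],
})

# Same strategy as nan_juegos: boolean flags plus -1 as a control value
toy["has_rating"] = toy["rating"].notna()
toy["has_added"] = toy["added"].notna()
toy["has_metacritic"] = toy["metacritic"].notna()
toy["rating_clean"] = toy["rating"].fillna(-1)
toy["added_clean"] = toy["added"].fillna(-1)

# Filter on the flag so the -1 placeholder does not skew the statistic
media_real = toy.loc[toy["has_rating"], "rating_clean"].mean()
print(media_real)
```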


In [34]:
# Load the original DataFrame into the variable df
df = pd.read_json("../data/raw/juegos_rawg.json")

In [35]:
df.head()

Unnamed: 0,slug,name,playtime,platforms,stores,released,tba,background_image,rating,rating_top,...,added_by_status.toplay,added_by_status.dropped,added_by_status.playing,esrb_rating.id,esrb_rating.name,esrb_rating.slug,esrb_rating.name_en,esrb_rating.name_ru,esrb_rating,community_rating
0,the-witcher-3-wild-hunt,The Witcher 3: Wild Hunt,43,"[{'platform': {'id': 4, 'name': 'PC', 'slug': ...","[{'store': {'id': 1, 'name': 'Steam', 'slug': ...",2015-05-18,False,https://media.rawg.io/media/games/618/618c2031...,4.65,5,...,832.0,991.0,892.0,4.0,Mature,mature,Mature,С 17 лет,,
1,life-is-strange-episode-1-2,Life is Strange,6,"[{'platform': {'id': 4, 'name': 'PC', 'slug': ...","[{'store': {'id': 1, 'name': 'Steam', 'slug': ...",2015-01-29,False,https://media.rawg.io/media/games/562/56255381...,4.12,5,...,367.0,659.0,152.0,4.0,Mature,mature,Mature,С 17 лет,,
2,fallout-4,Fallout 4,38,"[{'platform': {'id': 4, 'name': 'PC', 'slug': ...","[{'store': {'id': 1, 'name': 'Steam', 'slug': ...",2015-11-09,False,https://media.rawg.io/media/games/d82/d82990b9...,3.81,4,...,439.0,1362.0,322.0,4.0,Mature,mature,Mature,С 17 лет,,
3,rocket-league,Rocket League,21,"[{'platform': {'id': 4, 'name': 'PC', 'slug': ...","[{'store': {'id': 1, 'name': 'Steam', 'slug': ...",2015-07-07,False,https://media.rawg.io/media/games/8cc/8cce7c0e...,3.93,4,...,114.0,1676.0,542.0,1.0,Everyone,everyone,Everyone,Для всех,,
4,rise-of-the-tomb-raider,Rise of the Tomb Raider,14,"[{'platform': {'id': 4, 'name': 'PC', 'slug': ...","[{'store': {'id': 1, 'name': 'Steam', 'slug': ...",2015-11-10,False,https://media.rawg.io/media/games/b45/b45575f3...,4.04,4,...,400.0,431.0,158.0,,,,,,,


In [36]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 2000 entries, 0 to 1999
Data columns (total 41 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   slug                     2000 non-null   object 
 1   name                     2000 non-null   object 
 2   playtime                 2000 non-null   int64  
 3   platforms                2000 non-null   object 
 4   stores                   1989 non-null   object 
 5   released                 2000 non-null   object 
 6   tba                      2000 non-null   bool   
 7   background_image         1999 non-null   object 
 8   rating                   2000 non-null   float64
 9   rating_top               2000 non-null   int64  
 10  ratings                  2000 non-null   object 
 11  ratings_count            2000 non-null   int64  
 12  reviews_text_count       2000 non-null   int64  
 13  added                    2000 non-null   int64  
 14  metacritic               1056

In [37]:
df_pop_score = filtro_popularidad_puntuacion(df)  # Keep only the relevant columns
df_pop_score = nan_juegos(df_pop_score)  # Add the NaN control columns

In [38]:
df_pop_score.head()

Unnamed: 0,name,rating,ratings_count,added,released,metacritic,has_rating,has_added,has_metacritic,rating_clean,added_clean
0,The Witcher 3: Wild Hunt,4.65,6896,21662,2015-05-18,92.0,True,True,True,4.65,21662
1,Life is Strange,4.12,3706,15758,2015-01-29,83.0,True,True,True,4.12,15758
2,Fallout 4,3.81,3352,14082,2015-11-09,84.0,True,True,True,3.81,14082
3,Rocket League,3.93,2822,12769,2015-07-07,86.0,True,True,True,3.93,12769
4,Rise of the Tomb Raider,4.04,2741,12212,2015-11-10,86.0,True,True,True,4.04,12212


# Data availability summary

Before proceeding with the analysis, it is important to know how many games actually have data for the key variables:

- **`rating`**: average user score.
- **`added`**: number of times users have added a game.
- **`metacritic`**: average score from critics.

The following summary shows how many games have data and how many do not. This clarifies the real scope of the later analyses and helps avoid biases caused by missing values.


In [39]:
# Games affected by NaN

print(f"Total games analyzed: {len(df_pop_score)}\n")

print("User rating:")
print(f"- With data: {df_pop_score['has_rating'].sum()}")
print(f"- Without data: {len(df_pop_score) - df_pop_score['has_rating'].sum()}\n")

print("Popularity (added):")
print(f"- With data: {df_pop_score['has_added'].sum()}")
print(f"- Without data: {len(df_pop_score) - df_pop_score['has_added'].sum()}\n")

print("Metacritic:")
print(f"- With data: {df_pop_score['has_metacritic'].sum()}")
print(f"- Without data: {len(df_pop_score) - df_pop_score['has_metacritic'].sum()}")


Total games analyzed: 2000

User rating:
- With data: 2000
- Without data: 0

Popularity (added):
- With data: 2000
- Without data: 0

Metacritic:
- With data: 1056
- Without data: 944
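The same counts can be collected in a single table with `notna().sum()` and `isna().sum()`. The sketch below uses hypothetical values. `notna()` counts a `rating` of `0.0` as present. That value appears in this dataset for some games, so it is not caught by a missing-value check.

```python
import numpy as np
import pandas as pd

# Hypothetical data with some missing Metacritic scores
toy = pd.DataFrame({
    "rating": [4.65, 4.12, 0.0],
    "added": [21662, 15758, 67],
    "metacritic": [92.0, np.nan, np.nan],
})

columnas = ["rating", "added", "metacritic"]
# One row per column: how many values are present and how many are missing
resumen = pd.DataFrame({
    "with_data": toy[columnas].notna().sum(),
    "without_data": toy[columnas].isna().sum(),
})
print(resumen)
```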


In [None]:
# Save this clean DataFrame to a JSON file for later analysis
# (orient="records": index=False is not accepted with the default orient)
df_pop_score.to_json("../data/processed/popularidad_vs_puntuacion.json", orient="records")

---

# Data preparation for the analysis:
## Which platforms have the best average game ratings?

We want to analyze the relationship between platforms and the number of games available on each one, and to compare the ratings of games by platform. For this we are interested in the following columns:
- `platforms`: names of the platforms on which each game is available.
- `rating`: rating of each game.
- `released`: release date of each game.

From the `released` column we only need the year in which each game came out. We therefore create a new column from it called `año`, which holds only the year (as an integer).


In [11]:
# Load the original DataFrame into the variable df
df = pd.read_json("../data/raw/juegos_rawg.json")

In [12]:
# Keep only the release year; rows whose date cannot be parsed are dropped
df['año'] = pd.to_datetime(df['released'], errors='coerce').dt.year
df = df.dropna(subset=['año'])
df['año'] = df['año'].astype(int)
df

Unnamed: 0,slug,name,playtime,platforms,stores,released,tba,background_image,rating,rating_top,...,added_by_status.dropped,added_by_status.playing,esrb_rating.id,esrb_rating.name,esrb_rating.slug,esrb_rating.name_en,esrb_rating.name_ru,esrb_rating,community_rating,año
0,the-witcher-3-wild-hunt,The Witcher 3: Wild Hunt,43,"[{'platform': {'id': 4, 'name': 'PC', 'slug': ...","[{'store': {'id': 1, 'name': 'Steam', 'slug': ...",2015-05-18,False,https://media.rawg.io/media/games/618/618c2031...,4.65,5,...,991.0,892.0,4.0,Mature,mature,Mature,С 17 лет,,,2015
1,life-is-strange-episode-1-2,Life is Strange,6,"[{'platform': {'id': 4, 'name': 'PC', 'slug': ...","[{'store': {'id': 1, 'name': 'Steam', 'slug': ...",2015-01-29,False,https://media.rawg.io/media/games/562/56255381...,4.12,5,...,659.0,152.0,4.0,Mature,mature,Mature,С 17 лет,,,2015
2,fallout-4,Fallout 4,38,"[{'platform': {'id': 4, 'name': 'PC', 'slug': ...","[{'store': {'id': 1, 'name': 'Steam', 'slug': ...",2015-11-09,False,https://media.rawg.io/media/games/d82/d82990b9...,3.81,4,...,1362.0,322.0,4.0,Mature,mature,Mature,С 17 лет,,,2015
3,rocket-league,Rocket League,21,"[{'platform': {'id': 4, 'name': 'PC', 'slug': ...","[{'store': {'id': 1, 'name': 'Steam', 'slug': ...",2015-07-07,False,https://media.rawg.io/media/games/8cc/8cce7c0e...,3.93,4,...,1676.0,542.0,1.0,Everyone,everyone,Everyone,Для всех,,,2015
4,rise-of-the-tomb-raider,Rise of the Tomb Raider,14,"[{'platform': {'id': 4, 'name': 'PC', 'slug': ...","[{'store': {'id': 1, 'name': 'Steam', 'slug': ...",2015-11-10,False,https://media.rawg.io/media/games/b45/b45575f3...,4.04,4,...,431.0,158.0,,,,,,,,2015
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1995,dark-hours-prologue,Dark Hours: Prologue,2,"[{'platform': {'id': 4, 'name': 'PC', 'slug': ...","[{'store': {'id': 1, 'name': 'Steam', 'slug': ...",2024-07-18,False,https://media.rawg.io/media/screenshots/c16/c1...,0.00,0,...,,,,,,,,,0.0,2024
1996,age-of-water-the-first-voyage,Age of Water: The First Voyage,1,"[{'platform': {'id': 4, 'name': 'PC', 'slug': ...","[{'store': {'id': 1, 'name': 'Steam', 'slug': ...",2024-03-21,False,https://media.rawg.io/media/screenshots/721/72...,0.00,0,...,,,,,,,,,0.0,2024
1997,legacy-of-kaintm-soul-reaver-12-remastered,Legacy of Kain™ Soul Reaver 1&2 Remastered,0,"[{'platform': {'id': 4, 'name': 'PC', 'slug': ...","[{'store': {'id': 1, 'name': 'Steam', 'slug': ...",2024-12-10,False,https://media.rawg.io/media/games/e03/e03a08a3...,0.00,5,...,1.0,,4.0,Mature,mature,Mature,С 17 лет,,,2024
1998,south-park-snow-day,SOUTH PARK: SNOW DAY!,4,"[{'platform': {'id': 4, 'name': 'PC', 'slug': ...","[{'store': {'id': 1, 'name': 'Steam', 'slug': ...",2024-03-25,False,https://media.rawg.io/media/screenshots/f90/f9...,2.36,1,...,7.0,,,,,,,,,2024


The `platforms` column holds a list of dictionaries, which makes the platform names hard to extract.
To handle this, we create a function that extracts the platform names from each list and joins them into a single comma-separated string.

In [13]:
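def extraer_nombres_plataformas(plataformas):
    # Each item looks like {'platform': {'id': ..., 'name': ..., 'slug': ...}, ...}; keep only the name
    return ', '.join([item['platform']['name'] for item in plataformas])

# Replace each list with a comma-separated string such as "PC, Xbox One"
df['platforms'] = df['platforms'].apply(extraer_nombres_plataformas)
df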

Unnamed: 0,slug,name,playtime,platforms,stores,released,tba,background_image,rating,rating_top,...,added_by_status.dropped,added_by_status.playing,esrb_rating.id,esrb_rating.name,esrb_rating.slug,esrb_rating.name_en,esrb_rating.name_ru,esrb_rating,community_rating,año
0,the-witcher-3-wild-hunt,The Witcher 3: Wild Hunt,43,"PC, PlayStation 5, Xbox One, PlayStation 4, Xb...","[{'store': {'id': 1, 'name': 'Steam', 'slug': ...",2015-05-18,False,https://media.rawg.io/media/games/618/618c2031...,4.65,5,...,991.0,892.0,4.0,Mature,mature,Mature,С 17 лет,,,2015
1,life-is-strange-episode-1-2,Life is Strange,6,"PC, PlayStation 4, Xbox One, iOS, Android, mac...","[{'store': {'id': 1, 'name': 'Steam', 'slug': ...",2015-01-29,False,https://media.rawg.io/media/games/562/56255381...,4.12,5,...,659.0,152.0,4.0,Mature,mature,Mature,С 17 лет,,,2015
2,fallout-4,Fallout 4,38,"PC, PlayStation 5, Xbox One, PlayStation 4","[{'store': {'id': 1, 'name': 'Steam', 'slug': ...",2015-11-09,False,https://media.rawg.io/media/games/d82/d82990b9...,3.81,4,...,1362.0,322.0,4.0,Mature,mature,Mature,С 17 лет,,,2015
3,rocket-league,Rocket League,21,"PC, Xbox One, PlayStation 4, Nintendo Switch, ...","[{'store': {'id': 1, 'name': 'Steam', 'slug': ...",2015-07-07,False,https://media.rawg.io/media/games/8cc/8cce7c0e...,3.93,4,...,1676.0,542.0,1.0,Everyone,everyone,Everyone,Для всех,,,2015
4,rise-of-the-tomb-raider,Rise of the Tomb Raider,14,"PC, Xbox One, PlayStation 4, macOS","[{'store': {'id': 1, 'name': 'Steam', 'slug': ...",2015-11-10,False,https://media.rawg.io/media/games/b45/b45575f3...,4.04,4,...,431.0,158.0,,,,,,,,2015
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1995,dark-hours-prologue,Dark Hours: Prologue,2,PC,"[{'store': {'id': 1, 'name': 'Steam', 'slug': ...",2024-07-18,False,https://media.rawg.io/media/screenshots/c16/c1...,0.00,0,...,,,,,,,,,0.0,2024
1996,age-of-water-the-first-voyage,Age of Water: The First Voyage,1,PC,"[{'store': {'id': 1, 'name': 'Steam', 'slug': ...",2024-03-21,False,https://media.rawg.io/media/screenshots/721/72...,0.00,0,...,,,,,,,,,0.0,2024
1997,legacy-of-kaintm-soul-reaver-12-remastered,Legacy of Kain™ Soul Reaver 1&2 Remastered,0,"PC, PlayStation 5, Xbox One, PlayStation 4, Xb...","[{'store': {'id': 1, 'name': 'Steam', 'slug': ...",2024-12-10,False,https://media.rawg.io/media/games/e03/e03a08a3...,0.00,5,...,1.0,,4.0,Mature,mature,Mature,С 17 лет,,,2024
1998,south-park-snow-day,SOUTH PARK: SNOW DAY!,4,PC,"[{'store': {'id': 1, 'name': 'Steam', 'slug': ...",2024-03-25,False,https://media.rawg.io/media/screenshots/f90/f9...,2.36,1,...,7.0,,,,,,,,,2024
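In this dataset `platforms` has no missing values (2000 non-null), so the helper above works as is. If a future API load returned a game without a platform list, iterating over a non-list value would raise an error. The sketch below is a hypothetical, more defensive variant tested on simplified rows. It is not the notebook's function, and the empty-string fallback is an assumption.

```python
import pandas as pd

# Hypothetical, simplified rows mimicking the nested 'platforms' structure
toy = pd.DataFrame({
    "platforms": [
        [{"platform": {"name": "PC"}}, {"platform": {"name": "PlayStation 4"}}],
        [{"platform": {"name": "PC"}}],
        None,  # assumption: a game with no platform data
    ]
})


def extraer_nombres_plataformas_seguro(plataformas):
    """Hypothetical variant of the helper that tolerates missing values.

    Returns an empty string when there is no list to read.
    """
    if not isinstance(plataformas, list):
        return ""
    return ", ".join(item["platform"]["name"] for item in plataformas)


toy["platforms"] = toy["platforms"].apply(extraer_nombres_plataformas_seguro)
print(toy["platforms"].tolist())
```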


### We build an **`array`** with all the distinct platform strings, split them into individual names and put them in a **`set`** to remove repeated values. This gives the list of all platforms, which we convert to a **`list`** to make it easier to use.

In [61]:
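# Distinct comma-joined platform strings (a NumPy array)
df3 = df["platforms"].unique()
# Join them into one string and split it into individual platform names
lista_unica = ', '.join(df3).split(', ')
# Remove duplicates with a set, then convert back to a list
lista_unica = list(set(lista_unica))
print(lista_unica)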

['Xbox One', 'Xbox 360', 'Classic Macintosh', 'Nintendo Switch', 'Xbox Series S/X', 'PlayStation 3', 'Linux', 'PSP', 'PlayStation 5', 'Nintendo DS', 'Android', 'Web', 'Wii U', 'iOS', 'PS Vita', 'Nintendo 3DS', 'macOS', 'PC', 'PlayStation 4', 'Wii']
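The order of a `set` of strings is arbitrary and can change between Python sessions, so the printed order of `lista_unica` may vary. Here is a sketch of an alternative that gives a stable, sorted list: split each string, put one platform per row with `explode()`, and keep the distinct names. The input strings are hypothetical.

```python
import pandas as pd

# Hypothetical comma-joined platform strings, as produced by the earlier cell
toy_platforms = pd.Series(["PC, PlayStation 4, Xbox One", "PC", "Wii U, Wii"])

# One platform per row, then the distinct names in alphabetical order
plataformas_ordenadas = sorted(toy_platforms.str.split(", ").explode().unique())
print(plataformas_ordenadas)
```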


## Cleaning the number of games per platform

We create a table with the data we are going to work with: in this case `platforms` and the year the game was released.

In [62]:
df2=df[["platforms","año"]]
df2

Unnamed: 0,platforms,año
0,"PC, PlayStation 5, Xbox One, PlayStation 4, Xb...",2015
1,"PC, PlayStation 4, Xbox One, iOS, Android, mac...",2015
2,"PC, PlayStation 5, Xbox One, PlayStation 4",2015
3,"PC, Xbox One, PlayStation 4, Nintendo Switch, ...",2015
4,"PC, Xbox One, PlayStation 4, macOS",2015
...,...,...
1995,PC,2024
1996,PC,2024
1997,"PC, PlayStation 5, Xbox One, PlayStation 4, Xb...",2024
1998,PC,2024


We use `np.where(...)` to check whether the `platforms` column contains each platform in the list: if it does, it assigns 1; otherwise, it assigns 0.

In [63]:
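df2 = df2.copy()
# Split each platform string once so names are matched exactly
# (a plain substring search would also count "Wii U" games as "Wii")
plataformas_por_juego = df2["platforms"].str.split(", ")
for x in lista_unica:
    df2[x] = np.where(plataformas_por_juego.apply(lambda ps: isinstance(ps, list) and x in ps), 1, 0)
df2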

Unnamed: 0,platforms,año,Xbox One,Xbox 360,Classic Macintosh,Nintendo Switch,Xbox Series S/X,PlayStation 3,Linux,PSP,...,Android,Web,Wii U,iOS,PS Vita,Nintendo 3DS,macOS,PC,PlayStation 4,Wii
0,"PC, PlayStation 5, Xbox One, PlayStation 4, Xb...",2015,1,0,0,1,1,0,0,0,...,0,0,0,0,0,0,1,1,1,0
1,"PC, PlayStation 4, Xbox One, iOS, Android, mac...",2015,1,1,0,0,0,1,1,0,...,1,0,0,1,0,0,1,1,1,0
2,"PC, PlayStation 5, Xbox One, PlayStation 4",2015,1,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,1,1,0
3,"PC, Xbox One, PlayStation 4, Nintendo Switch, ...",2015,1,0,0,1,0,0,1,0,...,0,0,0,0,0,0,1,1,1,0
4,"PC, Xbox One, PlayStation 4, macOS",2015,1,0,0,0,0,0,0,0,...,0,0,0,0,0,0,1,1,1,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1995,PC,2024,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,1,0,0
1996,PC,2024,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,1,0,0
1997,"PC, PlayStation 5, Xbox One, PlayStation 4, Xb...",2024,1,0,0,1,1,0,0,0,...,0,0,0,0,0,0,0,1,1,0
1998,PC,2024,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,1,0,0
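Both `'Wii'` and `'Wii U'` appear in `lista_unica`. A substring check such as `str.contains("Wii")` therefore also flags Wii U games as Wii. The sketch below shows the pitfall on hypothetical strings. It also shows `Series.str.get_dummies(sep=", ")`, a pandas method that builds one 0/1 column per exact platform name in a single step. This is an alternative to the loop, not what the notebook uses, and its columns come out in alphabetical order.

```python
import pandas as pd

toy_platforms = pd.Series(["Wii U", "Wii, PC", "PC"])

# Substring search: "Wii" also matches "Wii U"
por_subcadena = toy_platforms.str.contains("Wii", regex=False).astype(int)

# Exact matching: one 0/1 column per platform name, splitting on ", "
dummies = toy_platforms.str.get_dummies(sep=", ")
print(por_subcadena.tolist())
print(dummies)
```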


Since the zeros are of no interest to us, we replace them with NaN to mark those values as empty, so they are not treated as real data.

In [64]:
df2.replace(0, np.nan, inplace=True)
df2

Unnamed: 0,platforms,año,Xbox One,Xbox 360,Classic Macintosh,Nintendo Switch,Xbox Series S/X,PlayStation 3,Linux,PSP,...,Android,Web,Wii U,iOS,PS Vita,Nintendo 3DS,macOS,PC,PlayStation 4,Wii
0,"PC, PlayStation 5, Xbox One, PlayStation 4, Xb...",2015,1.0,,,1.0,1.0,,,,...,,,,,,,1.0,1.0,1.0,
1,"PC, PlayStation 4, Xbox One, iOS, Android, mac...",2015,1.0,1.0,,,,1.0,1.0,,...,1.0,,,1.0,,,1.0,1.0,1.0,
2,"PC, PlayStation 5, Xbox One, PlayStation 4",2015,1.0,,,,,,,,...,,,,,,,,1.0,1.0,
3,"PC, Xbox One, PlayStation 4, Nintendo Switch, ...",2015,1.0,,,1.0,,,1.0,,...,,,,,,,1.0,1.0,1.0,
4,"PC, Xbox One, PlayStation 4, macOS",2015,1.0,,,,,,,,...,,,,,,,1.0,1.0,1.0,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1995,PC,2024,,,,,,,,,...,,,,,,,,1.0,,
1996,PC,2024,,,,,,,,,...,,,,,,,,1.0,,
1997,"PC, PlayStation 5, Xbox One, PlayStation 4, Xb...",2024,1.0,,,1.0,1.0,,,,...,,,,,,,,1.0,1.0,
1998,PC,2024,,,,,,,,,...,,,,,,,,1.0,,


# Cleaning ratings per platform

We create a table with the data we are going to work with: in this case `platforms`, the release year and the game's rating.

In [65]:
df3=df[["platforms","rating","año"]]
df3

Unnamed: 0,platforms,rating,año
0,"PC, PlayStation 5, Xbox One, PlayStation 4, Xb...",4.65,2015
1,"PC, PlayStation 4, Xbox One, iOS, Android, mac...",4.12,2015
2,"PC, PlayStation 5, Xbox One, PlayStation 4",3.81,2015
3,"PC, Xbox One, PlayStation 4, Nintendo Switch, ...",3.93,2015
4,"PC, Xbox One, PlayStation 4, macOS",4.04,2015
...,...,...,...
1995,PC,0.00,2024
1996,PC,0.00,2024
1997,"PC, PlayStation 5, Xbox One, PlayStation 4, Xb...",0.00,2024
1998,PC,2.36,2024


As before, `np.where(...)` checks whether the `platforms` column contains each platform in the list: if it does, it assigns the game's rating; otherwise, it assigns 0. We then replace the zeros with NaN.

In [66]:
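df3 = df3.copy()
# Match platform names exactly (a substring search would count "Wii U" games as "Wii")
plataformas_por_juego = df3["platforms"].str.split(", ")
for x in lista_unica:
    # Rating of the game on platform x, or 0 if the game is not on x
    df3[x] = np.where(plataformas_por_juego.apply(lambda ps: isinstance(ps, list) and x in ps), df3["rating"], 0)
df3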

Unnamed: 0,platforms,rating,año,Xbox One,Xbox 360,Classic Macintosh,Nintendo Switch,Xbox Series S/X,PlayStation 3,Linux,...,Android,Web,Wii U,iOS,PS Vita,Nintendo 3DS,macOS,PC,PlayStation 4,Wii
0,"PC, PlayStation 5, Xbox One, PlayStation 4, Xb...",4.65,2015,4.65,0.00,0.0,4.65,4.65,0.00,0.00,...,0.00,0.0,0.0,0.00,0.0,0.0,4.65,4.65,4.65,0.0
1,"PC, PlayStation 4, Xbox One, iOS, Android, mac...",4.12,2015,4.12,4.12,0.0,0.00,0.00,4.12,4.12,...,4.12,0.0,0.0,4.12,0.0,0.0,4.12,4.12,4.12,0.0
2,"PC, PlayStation 5, Xbox One, PlayStation 4",3.81,2015,3.81,0.00,0.0,0.00,0.00,0.00,0.00,...,0.00,0.0,0.0,0.00,0.0,0.0,0.00,3.81,3.81,0.0
3,"PC, Xbox One, PlayStation 4, Nintendo Switch, ...",3.93,2015,3.93,0.00,0.0,3.93,0.00,0.00,3.93,...,0.00,0.0,0.0,0.00,0.0,0.0,3.93,3.93,3.93,0.0
4,"PC, Xbox One, PlayStation 4, macOS",4.04,2015,4.04,0.00,0.0,0.00,0.00,0.00,0.00,...,0.00,0.0,0.0,0.00,0.0,0.0,4.04,4.04,4.04,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1995,PC,0.00,2024,0.00,0.00,0.0,0.00,0.00,0.00,0.00,...,0.00,0.0,0.0,0.00,0.0,0.0,0.00,0.00,0.00,0.0
1996,PC,0.00,2024,0.00,0.00,0.0,0.00,0.00,0.00,0.00,...,0.00,0.0,0.0,0.00,0.0,0.0,0.00,0.00,0.00,0.0
1997,"PC, PlayStation 5, Xbox One, PlayStation 4, Xb...",0.00,2024,0.00,0.00,0.0,0.00,0.00,0.00,0.00,...,0.00,0.0,0.0,0.00,0.0,0.0,0.00,0.00,0.00,0.0
1998,PC,2.36,2024,0.00,0.00,0.0,0.00,0.00,0.00,0.00,...,0.00,0.0,0.0,0.00,0.0,0.0,0.00,2.36,0.00,0.0


In [67]:
# From here on only the year and the per-platform ratings are needed
df3.drop(columns=["platforms", "rating"], inplace=True)


In [68]:
# 0 means "not available on this platform"; note that games with a rating of 0.00 also become NaN
df3.replace(0, np.nan, inplace=True)
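The replacement matters for averages because pandas skips `NaN` in `mean()` but counts zeros. The hypothetical platform column below shows the difference. The zero marks a game that is not on the platform.

```python
import numpy as np
import pandas as pd

# Hypothetical ratings for one platform column: 0 means "not on this platform"
col = pd.Series([4.0, 0.0, 2.0])

media_con_ceros = col.mean()                     # the zero drags the average down
media_sin_ceros = col.replace(0, np.nan).mean()  # NaN values are skipped
print(media_con_ceros, media_sin_ceros)
```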

We export the data to a JSON file to make it easier to use.

In [69]:
# Export the per-platform ratings built above (df3); orient="records" keeps 'año' in each row
df3.to_json("../data/processed/valoracion_plataforma.json", orient="records")

# Filtering the number of games

We create the table to view the number of games per platform.

This code groups the `df2` DataFrame by year and sums the values of each platform column. The result is the total number of games per platform for each year.

In [70]:
df_grouped = df2.groupby("año")[['PlayStation 4', 'Xbox One', 'Nintendo Switch', 'PC', 'Nintendo 3DS', 'Wii U',
                                 'PlayStation 5', 'PlayStation 3', 'PS Vita', 'Xbox Series S/X', 'Classic Macintosh',
                                 'macOS', 'Android', 'iOS', 'Linux', 'Wii', 'Nintendo DS', 'Web', 'PSP', 'Xbox 360']].sum()

print(df_grouped)


      PlayStation 4  Xbox One  Nintendo Switch     PC  Nintendo 3DS  Wii U  \
año                                                                          
2015          122.0      98.0             58.0  193.0           3.0    6.0   
2016          139.0     127.0             77.0  191.0           4.0    5.0   
2017          140.0     122.0             91.0  190.0           2.0    1.0   
2018          137.0     132.0            102.0  196.0           0.0    0.0   
2019          130.0     130.0            101.0  191.0           0.0    0.0   
2020          120.0     123.0             94.0  197.0           0.0    0.0   
2021          104.0     116.0             80.0  197.0           0.0    0.0   
2022           79.0      82.0             63.0  194.0           0.0    0.0   
2023           55.0      40.0             46.0  193.0           0.0    0.0   
2024           22.0      20.0             24.0  193.0           0.0    0.0   

      PlayStation 5  PlayStation 3  PS Vita  Xbox Series S/X  \
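Here is a minimal sketch of the same aggregation on hypothetical 0/1 flags. Summing the flags per year counts the games on each platform. After `groupby`, the year is the index. `reset_index()` turns it back into a column, so it is not lost when the table is exported row by row.

```python
import pandas as pd

# Hypothetical 0/1 platform flags per game, as built in df2
toy = pd.DataFrame({
    "año": [2015, 2015, 2016],
    "PC": [1, 1, 1],
    "PlayStation 4": [1, 0, 1],
})

# Number of games per platform and year
por_año = toy.groupby("año")[["PC", "PlayStation 4"]].sum()
print(por_año)

# reset_index() keeps the year as a field in each exported record
registros = por_año.reset_index().to_dict(orient="records")
```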

In [71]:
# reset_index() keeps the year (the index) as a column in the exported records
df_grouped.reset_index().to_json("../data/processed/juegos_plataforma.json", orient="records")

---

# Data preparation for the analysis:
## How has the average score of video games evolved over the years?

This notebook is part of a data analysis project that works with a `JSON` file extracted from the RAWG API.

The goal of this script is to prepare the data for later analysis. The `pandas` library is used to clean and structure the information, keeping only the video games that have:

- A valid release date (`released`)
- A score (`rating`)
- At least one assigned genre (`genres`)
- A release date between 2015 and 2024

The result is a clean DataFrame that includes columns such as the game's name, release date, average score and associated genres. It is ready for exploratory analysis or later visualizations.
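The four conditions above can be expressed as one boolean mask. Here is a minimal sketch on hypothetical rows, not the notebook's filtering cell. It assumes that an empty `genres` list counts as "no genre" (an empty list is not `NaN`, so `notna()` alone would keep it). It also assumes that the 2015-2024 range includes both ends.

```python
import pandas as pd

# Hypothetical raw rows, each one failing (or passing) a different condition
toy = pd.DataFrame({
    "name": ["A", "B", "C", "D", "E", "F"],
    "released": ["2015-05-18", None, "2013-01-01", "2020-03-10", "2024-12-10", "2019-07-07"],
    "rating": [4.65, 3.9, 4.0, None, 0.0, 3.93],
    "genres": [[{"name": "Action"}], [{"name": "RPG"}], [{"name": "Racing"}],
               [{"name": "Adventure"}], [], [{"name": "Racing"}]],
})

fechas = pd.to_datetime(toy["released"], errors="coerce")
# Assumption: only a non-empty list counts as "at least one genre"
con_generos = toy["genres"].apply(lambda g: isinstance(g, list) and len(g) > 0)

validos = toy[
    fechas.notna()
    & toy["rating"].notna()
    & con_generos
    & fechas.dt.year.between(2015, 2024)  # inclusive on both ends
]
print(validos["name"].tolist())
```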

In [131]:
df = pd.read_json('../data/raw/juegos_rawg.json')


In [132]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 2000 entries, 0 to 1999
Data columns (total 41 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   slug                     2000 non-null   object 
 1   name                     2000 non-null   object 
 2   playtime                 2000 non-null   int64  
 3   platforms                2000 non-null   object 
 4   stores                   1989 non-null   object 
 5   released                 2000 non-null   object 
 6   tba                      2000 non-null   bool   
 7   background_image         1999 non-null   object 
 8   rating                   2000 non-null   float64
 9   rating_top               2000 non-null   int64  
 10  ratings                  2000 non-null   object 
 11  ratings_count            2000 non-null   int64  
 12  reviews_text_count       2000 non-null   int64  
 13  added                    2000 non-null   int64  
 14  metacritic               1056

# Handling NaN values

The exploratory analysis showed no `NaN` values in the columns we are going to use. We still added a filtering cell so that the notebook is ready if later data loads do contain them.

In [133]:
# Keep only valid records (with release date, rating and genres)
df = df[df['released'].notna() & df['rating'].notna() & df['genres'].notna()]
df

Unnamed: 0,slug,name,playtime,platforms,stores,released,tba,background_image,rating,rating_top,...,added_by_status.toplay,added_by_status.dropped,added_by_status.playing,esrb_rating.id,esrb_rating.name,esrb_rating.slug,esrb_rating.name_en,esrb_rating.name_ru,esrb_rating,community_rating
0,the-witcher-3-wild-hunt,The Witcher 3: Wild Hunt,43,"[{'platform': {'id': 4, 'name': 'PC', 'slug': ...","[{'store': {'id': 1, 'name': 'Steam', 'slug': ...",2015-05-18,False,https://media.rawg.io/media/games/618/618c2031...,4.65,5,...,832.0,991.0,892.0,4.0,Mature,mature,Mature,С 17 лет,,
1,life-is-strange-episode-1-2,Life is Strange,6,"[{'platform': {'id': 4, 'name': 'PC', 'slug': ...","[{'store': {'id': 1, 'name': 'Steam', 'slug': ...",2015-01-29,False,https://media.rawg.io/media/games/562/56255381...,4.12,5,...,367.0,659.0,152.0,4.0,Mature,mature,Mature,С 17 лет,,
2,fallout-4,Fallout 4,38,"[{'platform': {'id': 4, 'name': 'PC', 'slug': ...","[{'store': {'id': 1, 'name': 'Steam', 'slug': ...",2015-11-09,False,https://media.rawg.io/media/games/d82/d82990b9...,3.81,4,...,439.0,1362.0,322.0,4.0,Mature,mature,Mature,С 17 лет,,
3,rocket-league,Rocket League,21,"[{'platform': {'id': 4, 'name': 'PC', 'slug': ...","[{'store': {'id': 1, 'name': 'Steam', 'slug': ...",2015-07-07,False,https://media.rawg.io/media/games/8cc/8cce7c0e...,3.93,4,...,114.0,1676.0,542.0,1.0,Everyone,everyone,Everyone,Для всех,,
4,rise-of-the-tomb-raider,Rise of the Tomb Raider,14,"[{'platform': {'id': 4, 'name': 'PC', 'slug': ...","[{'store': {'id': 1, 'name': 'Steam', 'slug': ...",2015-11-10,False,https://media.rawg.io/media/games/b45/b45575f3...,4.04,4,...,400.0,431.0,158.0,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1995,dark-hours-prologue,Dark Hours: Prologue,2,"[{'platform': {'id': 4, 'name': 'PC', 'slug': ...","[{'store': {'id': 1, 'name': 'Steam', 'slug': ...",2024-07-18,False,https://media.rawg.io/media/screenshots/c16/c1...,0.00,0,...,,,,,,,,,,0.0
1996,age-of-water-the-first-voyage,Age of Water: The First Voyage,1,"[{'platform': {'id': 4, 'name': 'PC', 'slug': ...","[{'store': {'id': 1, 'name': 'Steam', 'slug': ...",2024-03-21,False,https://media.rawg.io/media/screenshots/721/72...,0.00,0,...,4.0,,,,,,,,,0.0
1997,legacy-of-kaintm-soul-reaver-12-remastered,Legacy of Kain™ Soul Reaver 1&2 Remastered,0,"[{'platform': {'id': 4, 'name': 'PC', 'slug': ...","[{'store': {'id': 1, 'name': 'Steam', 'slug': ...",2024-12-10,False,https://media.rawg.io/media/games/e03/e03a08a3...,0.00,5,...,32.0,1.0,,4.0,Mature,mature,Mature,С 17 лет,,
1998,south-park-snow-day,SOUTH PARK: SNOW DAY!,4,"[{'platform': {'id': 4, 'name': 'PC', 'slug': ...","[{'store': {'id': 1, 'name': 'Steam', 'slug': ...",2024-03-25,False,https://media.rawg.io/media/screenshots/f90/f9...,2.36,1,...,27.0,7.0,,,,,,,,
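
`notna()` only detects missing values (`None`/`NaN`). A game whose `genres` field is an empty list `[]` would still pass the filter above, even though the criteria at the top of this section ask for at least one genre. The following is a minimal sketch of a stricter mask, assuming `genres` holds Python lists of dictionaries as loaded from the RAWG JSON; the sample rows (`muestra`) are made up for illustration and are not taken from the dataset:

```python
import pandas as pd

# Made-up sample: one valid row, one with an empty genre list,
# one with a missing rating and one with a missing release date and genres
muestra = pd.DataFrame({
    'released': ['2015-05-18', '2016-02-01', '2017-03-03', None],
    'rating': [4.65, 3.9, None, 4.0],
    'genres': [[{'slug': 'action'}], [], [{'slug': 'indie'}], None],
})

# Series.str.len() returns the length of each list (NaN for missing values),
# so fillna(0) > 0 is True only for non-empty lists
tiene_generos = muestra['genres'].str.len().fillna(0) > 0

mascara = (
    muestra['released'].notna()
    & muestra['rating'].notna()
    & tiene_generos
)
validos = muestra[mascara].copy()
print(validos)
```

Only the first sample row survives: the empty list is rejected by the length check rather than by `notna()`.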


# Convert and filter dates

- Converts the `released` column to `datetime`; with `errors='coerce'`, values that cannot be parsed become `NaT` instead of raising an error.

- Filters the DataFrame so that only games released between 2015 and 2024 (inclusive) remain.

- Displays the DataFrame (`df`).
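A small sketch of how `errors='coerce'` and the year filter interact, on made-up date strings (not from the dataset). It uses `Series.between`, which is inclusive on both ends by default and is equivalent to the pair of `>=`/`<=` comparisons in the cell below:

```python
import pandas as pd

# Made-up release dates: two in range, one unparseable, two out of range
released = pd.Series(['2015-05-18', '2024-12-10', 'fecha-desconocida', '2014-12-31', '2025-01-01'])

# errors='coerce' turns values that cannot be parsed into NaT instead of raising
fechas = pd.to_datetime(released, errors='coerce')

# A NaT date has a NaN year, and comparisons with NaN are False,
# so unparseable dates are dropped by the range filter as well
en_rango = fechas.dt.year.between(2015, 2024)
print(fechas[en_rango])
```

This means the range filter also acts as a safety net for dates that failed to parse, so no separate `notna()` check on `fecha_lanzamiento` is needed.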

In [134]:
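# Parse release dates; unparseable values become NaT
df['fecha_lanzamiento'] = pd.to_datetime(df['released'], errors='coerce')
# Keep releases from 2015 to 2024 (inclusive); .copy() avoids a SettingWithCopyWarning later
df = df[(df['fecha_lanzamiento'].dt.year >= 2015) & (df['fecha_lanzamiento'].dt.year <= 2024)].copy()
df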

Unnamed: 0,slug,name,playtime,platforms,stores,released,tba,background_image,rating,rating_top,...,added_by_status.dropped,added_by_status.playing,esrb_rating.id,esrb_rating.name,esrb_rating.slug,esrb_rating.name_en,esrb_rating.name_ru,esrb_rating,community_rating,fecha_lanzamiento
0,the-witcher-3-wild-hunt,The Witcher 3: Wild Hunt,43,"[{'platform': {'id': 4, 'name': 'PC', 'slug': ...","[{'store': {'id': 1, 'name': 'Steam', 'slug': ...",2015-05-18,False,https://media.rawg.io/media/games/618/618c2031...,4.65,5,...,991.0,892.0,4.0,Mature,mature,Mature,С 17 лет,,,2015-05-18
1,life-is-strange-episode-1-2,Life is Strange,6,"[{'platform': {'id': 4, 'name': 'PC', 'slug': ...","[{'store': {'id': 1, 'name': 'Steam', 'slug': ...",2015-01-29,False,https://media.rawg.io/media/games/562/56255381...,4.12,5,...,659.0,152.0,4.0,Mature,mature,Mature,С 17 лет,,,2015-01-29
2,fallout-4,Fallout 4,38,"[{'platform': {'id': 4, 'name': 'PC', 'slug': ...","[{'store': {'id': 1, 'name': 'Steam', 'slug': ...",2015-11-09,False,https://media.rawg.io/media/games/d82/d82990b9...,3.81,4,...,1362.0,322.0,4.0,Mature,mature,Mature,С 17 лет,,,2015-11-09
3,rocket-league,Rocket League,21,"[{'platform': {'id': 4, 'name': 'PC', 'slug': ...","[{'store': {'id': 1, 'name': 'Steam', 'slug': ...",2015-07-07,False,https://media.rawg.io/media/games/8cc/8cce7c0e...,3.93,4,...,1676.0,542.0,1.0,Everyone,everyone,Everyone,Для всех,,,2015-07-07
4,rise-of-the-tomb-raider,Rise of the Tomb Raider,14,"[{'platform': {'id': 4, 'name': 'PC', 'slug': ...","[{'store': {'id': 1, 'name': 'Steam', 'slug': ...",2015-11-10,False,https://media.rawg.io/media/games/b45/b45575f3...,4.04,4,...,431.0,158.0,,,,,,,,2015-11-10
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1995,dark-hours-prologue,Dark Hours: Prologue,2,"[{'platform': {'id': 4, 'name': 'PC', 'slug': ...","[{'store': {'id': 1, 'name': 'Steam', 'slug': ...",2024-07-18,False,https://media.rawg.io/media/screenshots/c16/c1...,0.00,0,...,,,,,,,,,0.0,2024-07-18
1996,age-of-water-the-first-voyage,Age of Water: The First Voyage,1,"[{'platform': {'id': 4, 'name': 'PC', 'slug': ...","[{'store': {'id': 1, 'name': 'Steam', 'slug': ...",2024-03-21,False,https://media.rawg.io/media/screenshots/721/72...,0.00,0,...,,,,,,,,,0.0,2024-03-21
1997,legacy-of-kaintm-soul-reaver-12-remastered,Legacy of Kain™ Soul Reaver 1&2 Remastered,0,"[{'platform': {'id': 4, 'name': 'PC', 'slug': ...","[{'store': {'id': 1, 'name': 'Steam', 'slug': ...",2024-12-10,False,https://media.rawg.io/media/games/e03/e03a08a3...,0.00,5,...,1.0,,4.0,Mature,mature,Mature,С 17 лет,,,2024-12-10
1998,south-park-snow-day,SOUTH PARK: SNOW DAY!,4,"[{'platform': {'id': 4, 'name': 'PC', 'slug': ...","[{'store': {'id': 1, 'name': 'Steam', 'slug': ...",2024-03-25,False,https://media.rawg.io/media/screenshots/f90/f9...,2.36,1,...,7.0,,,,,,,,,2024-03-25


# Derived variables
Two new columns are created: `año` (release year) and `n_generos` (number of genres per game).
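The `isinstance` guard in the next cell matters because `len()` raises `TypeError` on a missing value (`None` or `NaN`). A minimal sketch on a made-up sample (`muestra` is illustrative, not the real dataset) showing that the rule yields 0 for a non-list value instead of failing:

```python
import pandas as pd

# Made-up sample: two games with genre lists and one with a missing value
muestra = pd.DataFrame({
    'fecha_lanzamiento': pd.to_datetime(['2015-05-18', '2020-11-10', '2024-03-21']),
    'genres': [
        [{'slug': 'action'}, {'slug': 'role-playing-games-rpg'}],
        [{'slug': 'indie'}],
        None,
    ],
})

# Release year from the parsed date
muestra['año'] = muestra['fecha_lanzamiento'].dt.year
# Same rule as the notebook: count list elements, 0 for anything that is not a list
muestra['n_generos'] = muestra['genres'].apply(lambda g: len(g) if isinstance(g, list) else 0)
print(muestra[['año', 'n_generos']])
```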

In [135]:
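# Release year extracted from the parsed date
df['año'] = df['fecha_lanzamiento'].dt.year
# Number of genres per game (0 if the value is not a list)
df['n_generos'] = df['genres'].apply(lambda g: len(g) if isinstance(g, list) else 0)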

In [136]:
df

Unnamed: 0,slug,name,playtime,platforms,stores,released,tba,background_image,rating,rating_top,...,esrb_rating.id,esrb_rating.name,esrb_rating.slug,esrb_rating.name_en,esrb_rating.name_ru,esrb_rating,community_rating,fecha_lanzamiento,año,n_generos
0,the-witcher-3-wild-hunt,The Witcher 3: Wild Hunt,43,"[{'platform': {'id': 4, 'name': 'PC', 'slug': ...","[{'store': {'id': 1, 'name': 'Steam', 'slug': ...",2015-05-18,False,https://media.rawg.io/media/games/618/618c2031...,4.65,5,...,4.0,Mature,mature,Mature,С 17 лет,,,2015-05-18,2015,2
1,life-is-strange-episode-1-2,Life is Strange,6,"[{'platform': {'id': 4, 'name': 'PC', 'slug': ...","[{'store': {'id': 1, 'name': 'Steam', 'slug': ...",2015-01-29,False,https://media.rawg.io/media/games/562/56255381...,4.12,5,...,4.0,Mature,mature,Mature,С 17 лет,,,2015-01-29,2015,1
2,fallout-4,Fallout 4,38,"[{'platform': {'id': 4, 'name': 'PC', 'slug': ...","[{'store': {'id': 1, 'name': 'Steam', 'slug': ...",2015-11-09,False,https://media.rawg.io/media/games/d82/d82990b9...,3.81,4,...,4.0,Mature,mature,Mature,С 17 лет,,,2015-11-09,2015,2
3,rocket-league,Rocket League,21,"[{'platform': {'id': 4, 'name': 'PC', 'slug': ...","[{'store': {'id': 1, 'name': 'Steam', 'slug': ...",2015-07-07,False,https://media.rawg.io/media/games/8cc/8cce7c0e...,3.93,4,...,1.0,Everyone,everyone,Everyone,Для всех,,,2015-07-07,2015,3
4,rise-of-the-tomb-raider,Rise of the Tomb Raider,14,"[{'platform': {'id': 4, 'name': 'PC', 'slug': ...","[{'store': {'id': 1, 'name': 'Steam', 'slug': ...",2015-11-10,False,https://media.rawg.io/media/games/b45/b45575f3...,4.04,4,...,,,,,,,,2015-11-10,2015,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1995,dark-hours-prologue,Dark Hours: Prologue,2,"[{'platform': {'id': 4, 'name': 'PC', 'slug': ...","[{'store': {'id': 1, 'name': 'Steam', 'slug': ...",2024-07-18,False,https://media.rawg.io/media/screenshots/c16/c1...,0.00,0,...,,,,,,,0.0,2024-07-18,2024,2
1996,age-of-water-the-first-voyage,Age of Water: The First Voyage,1,"[{'platform': {'id': 4, 'name': 'PC', 'slug': ...","[{'store': {'id': 1, 'name': 'Steam', 'slug': ...",2024-03-21,False,https://media.rawg.io/media/screenshots/721/72...,0.00,0,...,,,,,,,0.0,2024-03-21,2024,3
1997,legacy-of-kaintm-soul-reaver-12-remastered,Legacy of Kain™ Soul Reaver 1&2 Remastered,0,"[{'platform': {'id': 4, 'name': 'PC', 'slug': ...","[{'store': {'id': 1, 'name': 'Steam', 'slug': ...",2024-12-10,False,https://media.rawg.io/media/games/e03/e03a08a3...,0.00,5,...,4.0,Mature,mature,Mature,С 17 лет,,,2024-12-10,2024,4
1998,south-park-snow-day,SOUTH PARK: SNOW DAY!,4,"[{'platform': {'id': 4, 'name': 'PC', 'slug': ...","[{'store': {'id': 1, 'name': 'Steam', 'slug': ...",2024-03-25,False,https://media.rawg.io/media/screenshots/f90/f9...,2.36,1,...,,,,,,,,2024-03-25,2024,2


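# Selection and transformation

- Builds a new DataFrame, reassigned to `df`, with renamed and transformed columns:

    - `nombre`: name of the video game.

    - `fecha_lanzamiento`: the parsed release date.

    - `puntuacion_media`: average rating (taken from `rating`).

    - `generos`: the list of dictionaries in `genres` converted to plain text made of the genre slugs, e.g. `'action, role-playing-games-rpg'`.

    - `año` and `n_generos`: carried over from the previous step; `rating` is also kept as a copy of `puntuacion_media`.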
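The inline lambda in the next cell assumes every `genres` value is a list of dictionaries with a `'slug'` key. The following is a hedged sketch of a more defensive version; `generos_a_texto` is a hypothetical helper, not part of the notebook, and the sample genre objects are made up (only their `name`/`slug` shape mirrors the data shown above):

```python
import pandas as pd

def generos_a_texto(lista):
    """Join the 'slug' of each genre dict into 'slug1, slug2'; '' if not a list."""
    # Hypothetical helper: the notebook uses an inline lambda without these guards
    if not isinstance(lista, list):
        return ''
    return ', '.join(g['slug'] for g in lista if isinstance(g, dict) and 'slug' in g)

# Made-up sample of genre lists, including a missing value
genres = pd.Series([
    [{'name': 'Action', 'slug': 'action'},
     {'name': 'RPG', 'slug': 'role-playing-games-rpg'}],
    [{'name': 'Adventure', 'slug': 'adventure'}],
    None,
])
print(genres.apply(generos_a_texto).tolist())
```

Storing genres as a plain string keeps the exported JSON simple; for per-genre counts by year, the string can be split back with `Series.str.split(', ')` and expanded to one row per genre with `DataFrame.explode`.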

In [137]:
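# Build the final DataFrame with the columns needed for the analysis
df = pd.DataFrame({
    'nombre': df['name'],
    'fecha_lanzamiento': df['fecha_lanzamiento'],
    'puntuacion_media': df['rating'],
    # Join the genre slugs of each game into one string, e.g. 'action, indie'
    'generos': df['genres'].apply(lambda lista: ', '.join(g['slug'] for g in lista)),
    'año': df['año'],
    'n_generos': df['n_generos'],
    'rating': df['rating'],
})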

# Viewing the result

- Displays the first 5 rows of the DataFrame, which should now contain the transformed data.

In [138]:
df.head()

|   | nombre | fecha_lanzamiento | puntuacion_media | generos | año | n_generos | rating |
|---|---|---|---|---|---|---|---|
| 0 | The Witcher 3: Wild Hunt | 2015-05-18 | 4.65 | action, role-playing-games-rpg | 2015 | 2 | 4.65 |
| 1 | Life is Strange | 2015-01-29 | 4.12 | adventure | 2015 | 1 | 4.12 |
| 2 | Fallout 4 | 2015-11-09 | 3.81 | action, role-playing-games-rpg | 2015 | 2 | 3.81 |
| 3 | Rocket League | 2015-07-07 | 3.93 | racing, indie, sports | 2015 | 3 | 3.93 |
| 4 | Rise of the Tomb Raider | 2015-11-10 | 4.04 | action | 2015 | 1 | 4.04 |


# Export to JSON

The DataFrame `df` is exported to a JSON file using:

- `orient="records"` → one dictionary per row.

- `lines=True` → each record is written on its own line (JSON Lines format) instead of inside a single JSON array.
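A minimal round-trip sketch of the records/JSON Lines format, on a made-up two-row sample (`muestra` is illustrative only). The `date_format='iso'` argument is an addition not used in the export cell; without it, pandas writes datetime columns as epoch milliseconds:

```python
import io

import pandas as pd

# Made-up sample with the same kind of columns as the final DataFrame
muestra = pd.DataFrame({
    'nombre': ['Juego A', 'Juego B'],
    'fecha_lanzamiento': pd.to_datetime(['2015-05-18', '2024-03-21']),
    'generos': ['action, indie', 'adventure'],
})

# orient='records' + lines=True writes one JSON object per line (JSON Lines);
# date_format='iso' stores dates as ISO 8601 strings instead of epoch milliseconds
texto = muestra.to_json(orient='records', lines=True, date_format='iso')
print(texto)

# A JSON Lines file must also be read back with lines=True
leido = pd.read_json(io.StringIO(texto), lines=True)
print(leido)
```

Any later notebook that loads `juegos_rawg_generos1.json` would likewise need `pd.read_json(..., lines=True)`.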

In [139]:
# orient='records' + lines=True: one JSON object per line, as described above
df.to_json("../data/processed/juegos_rawg_generos1.json", orient="records", lines=True)