### Henry Individual Project
**5_Sistema_Reco**  
Recommendation System  
**Author: Bioing. Urteaga Facundo Nahuel**  

**Summary:** This script comprises the following stages:

1. **Loading libraries**
2. **Loading data (.parquet files)**
3. **Pre-processing the DataFrames for later analysis**
4. **First model training (V1)**
5. **Second model training (V2)**
6. **Third model training (V3)**
7. **Fourth model training (V4)**
8. **Exporting the final DataFrame (.parquet)**

In [154]:
# 1. Load libraries

import numpy as np
import pandas as pd
from sklearn.neighbors import NearestNeighbors

In [254]:
# 2. Load DataFrames

df_games_tec = pd.read_parquet('df_games_tec.parquet')
df_games_genres = pd.read_parquet('df_games_genres.parquet')
df_games_specs = pd.read_parquet('df_games_specs.parquet')
df_games_tags = pd.read_parquet('df_games_tags.parquet')

In [255]:
df_games_genres.columns

Index(['item_id', 'genres', 'Utilities', 'Racing', 'Massively Multiplayer',
       'Sports', 'Action', 'Audio Production', 'Indie', 'Web Publishing',
       'RPG', 'Photo Editing', 'Casual', 'Software Training',
       'Animation &amp; Modeling', 'Design &amp; Illustration', 'Simulation',
       'Adventure', 'Early Access', 'Video Production', 'Education',
       'Accounting', 'Free to Play', 'Strategy'],
      dtype='object')

In [138]:
# 3. Pre-process the DataFrames for later analysis

df_games_names = df_games_tec[['item_id', 'app_name']]
df_games_genres = df_games_genres.drop(columns=['genres'])
df_games_specs = df_games_specs.drop(columns=['specs'])
df_games_tags = df_games_tags.drop(columns=['tags'])

# Join the game names with the genre dummies
merged_df_1 = pd.merge(df_games_names, df_games_genres, on='item_id', how='inner')

# Add the specs dummies
merged_df_2 = pd.merge(merged_df_1, df_games_specs, on='item_id', how='inner')

# Add the tags dummies
merged_df_final = pd.merge(merged_df_2, df_games_tags, on='item_id', how='inner')

In [4]:
len(merged_df_final.columns)

403
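
A quick way to check that chained inner merges neither drop nor duplicate games is to validate the key relationship. The sketch below uses small hypothetical frames (`names`, `genres`), not the project's data; `validate="one_to_one"` is a standard `pd.merge` argument that raises `pd.errors.MergeError` when a key is repeated.

```python
import pandas as pd

# Hypothetical toy frames: item_id 20 is duplicated and item_id 30 has no genre row.
names = pd.DataFrame({"item_id": [10, 20, 30], "app_name": ["A", "B", "C"]})
genres = pd.DataFrame({"item_id": [10, 20, 20], "Action": [1, 0, 0]})

# With duplicated keys the validated merge raises instead of silently adding rows.
try:
    pd.merge(names, genres, on="item_id", how="inner", validate="one_to_one")
    duplicate_keys_detected = False
except pd.errors.MergeError:
    duplicate_keys_detected = True

# After de-duplicating, the merge succeeds; comparing lengths shows how many games were dropped.
deduplicated = genres.drop_duplicates(subset="item_id")
merged = pd.merge(names, deduplicated, on="item_id", how="inner", validate="one_to_one")
rows_dropped = len(names) - len(merged)

print(duplicate_keys_detected, rows_dropped)
```

If rows are dropped or duplicated, the positional row numbers of the merged frame no longer match the labels of the original names table.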

In [5]:
games_dummies = merged_df_final.drop(columns=['item_id', 'app_name'])

In [6]:
# 4. First model training (V1)

n_neighbors=6

nneighbors = NearestNeighbors(n_neighbors = n_neighbors, metric = 'cosine').fit(games_dummies)
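
To make the idea concrete, here is a minimal sketch of the same technique on a hypothetical four-game catalogue (the `toy` frame and its labels are invented for illustration). Each row is a vector of 0/1 dummies, and `NearestNeighbors` with `metric='cosine'` ranks rows by how much their active labels overlap.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import NearestNeighbors

# Hypothetical toy catalogue: rows are games, columns are 0/1 label dummies.
toy = pd.DataFrame(
    {
        "Action":   [1, 1, 0, 0],
        "Shooter":  [1, 1, 0, 0],
        "Sports":   [0, 0, 1, 1],
        "Football": [0, 0, 1, 0],
    },
    index=["shooter_a", "shooter_b", "soccer", "tennis"],
)

matrix = toy.to_numpy(dtype=float)
knn = NearestNeighbors(n_neighbors=2, metric="cosine").fit(matrix)

# Query with the "soccer" row (position 2); the first neighbor is the query itself.
dist, idx = knn.kneighbors(matrix[[2]])
neighbor_names = toy.index[idx[0]].tolist()

print(neighbor_names)
print(dist[0])
```

The query row comes back as its own nearest neighbor (distance 0), which is why the notebook asks for `n_neighbors=6` and discards the first result to obtain five recommendations.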

In [7]:
index = 32108
game_eval = np.array(games_dummies.iloc[index]).reshape(1,-1)


dif, ind = nneighbors.kneighbors(game_eval)




In [8]:
print("Selected game")
print("="*80)
print(df_games_names.loc[ind[0][0], :])
print("Recommended games")
print("="*80)
df_games_names.loc[ind[0][1:], :]

Selected game
item_id                         90007.0
app_name    International Online Soccer
Name: 32108, dtype: object
Recommended games


       item_id   app_name
25506  389310.0  Tic-Toc-Tower
4169   365680.0  BRAWL
24577  425180.0  Fantasy Grounds - AAW Map Pack Vol 1
24611  425030.0  Fantasy Grounds - Baldur's Gate Portrait Pack
6127   413500.0  Rocket Fist
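
The lookup above indexes `df_games_names` by label using the positions returned by `kneighbors`. Those positions refer to rows of the merged feature matrix, so the two agree only if the inner merges kept every game in its original order. A hedged alternative is to read the names from the merged frame by position; `lookup_neighbors` and the toy frames below are hypothetical.

```python
import pandas as pd

def lookup_neighbors(merged: pd.DataFrame, positions) -> pd.DataFrame:
    """Return item_id and app_name for positional row numbers of `merged`."""
    return merged.iloc[list(positions)][["item_id", "app_name"]]

# Hypothetical toy frames: game 20 has no feature row, so the inner merge drops it.
names = pd.DataFrame({"item_id": [10, 20, 30], "app_name": ["A", "B", "C"]})
features = pd.DataFrame({"item_id": [10, 30], "Action": [1, 0]})
merged = pd.merge(names, features, on="item_id", how="inner")

# Position 1 of the merged frame is game "C" ...
by_position = lookup_neighbors(merged, [1])
# ... but label 1 of the original names table is game "B".
by_label = names.loc[[1]]

print(by_position)
print(by_label)
```

When no rows are dropped both lookups return the same game, so the positional version is safe either way.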


In [None]:
# 5. Second model training (V2)

In [80]:
# Identify categories in specs, genres and tags that, in my judgment, add no information to the algorithm

df_games_specs.columns

# From these columns I could keep only ['Mods','Online Multi-Player','Standing','Local Multi-Player','Room-Scale',
# 'Single-player', 'Windows Mixed Reality', 'Keyboard / Mouse','HTC Vive', 'Cross-Platform Multiplayer', 'Online Co-op', 'Seated',
# 'MMO','Co-op', 'Gamepad', 'Downloadable Content','Local Co-op','Multi-player']


Index(['item_id', 'Mods (require HL1)', 'Stats', 'Includes level editor',
       'In-App Purchases', 'Steam Cloud', 'Mods', 'Online Multi-Player',
       'Standing', 'Partial Controller Support', 'Local Multi-Player',
       'SteamVR Collectibles', 'Steam Achievements', 'Room-Scale',
       'Single-player', 'Windows Mixed Reality', 'Keyboard / Mouse',
       'HTC Vive', 'Cross-Platform Multiplayer', 'Online Co-op', 'Seated',
       'MMO', 'Commentary available', 'Mods (require HL2)', 'Game demo',
       'Steam Leaderboards', 'Co-op', 'Gamepad', 'Downloadable Content',
       'Steam Workshop', 'Oculus Rift', 'Local Co-op', 'Shared/Split Screen',
       'Includes Source SDK', 'Tracked Motion Controllers',
       'Valve Anti-Cheat enabled', 'Steam Turn Notifications', 'Multi-player',
       'Captions available', 'Steam Trading Cards', 'Full controller support'],
      dtype='object')

In [81]:
df_games_genres.columns

# From these columns I could remove "Early Access"

Index(['item_id', 'Audio Production', 'Massively Multiplayer', 'Free to Play',
       'Design &amp; Illustration', 'Software Training', 'Action',
       'Photo Editing', 'Video Production', 'Adventure', 'Utilities',
       'Accounting', 'Indie', 'Simulation', 'Casual', 'Racing',
       'Animation &amp; Modeling', 'Web Publishing', 'RPG', 'Education',
       'Sports', 'Strategy', 'Early Access'],
      dtype='object')

In [88]:
df_games_tags.columns[300:]

# From these columns I can remove "Early Access" and "Soundtrack"

Index(['Sokoban', 'Underwater', 'Steampunk', 'Funny', 'Multiplayer',
       'Tactical RPG', 'Sailing', 'Atmospheric', 'Horses', 'Split Screen',
       'GameMaker', 'Mars', 'Science', 'Great Soundtrack', 'Time Manipulation',
       'Card Game', '2.5D', 'Shooter', 'Web Publishing', 'Dark', 'Nudity',
       'Fighting', 'Dark Comedy', 'Turn-Based', 'Retro', 'Hunting', 'Parkour',
       'Survival Horror', '3D Platformer', 'Flight', 'Military', 'Lemmings',
       'Puzzle', 'Futuristic', 'Hardware', 'Zombies', 'Online Co-Op', 'War',
       'Female Protagonist', 'Bowling'],
      dtype='object')

In [None]:
# CHANGES FOR THE NEW VERSION OF THE RECOMMENDATION SYSTEM (V2)

#   * Drop the columns listed above
#   * Weight the groups: specs * 0.25, genres * 1, tags * 4
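
Why per-group weights matter for a cosine metric can be seen on two made-up games. The sketch below is only an illustration (`cosine_similarity` is a small helper written here, and the vectors are invented): scaling every column by the same factor leaves the similarity unchanged, while scaling groups differently shifts the result toward the heavier group.

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine similarity between two 1-D vectors."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Columns: [spec_1, spec_2, tag_1, tag_2]. The games share both specs but no tag.
game_a = np.array([1, 1, 1, 0])
game_b = np.array([1, 1, 0, 1])

# Group weights taken from the plan above: specs * 0.25, tags * 4.
weights = np.array([0.25, 0.25, 4, 4])

sim_unweighted = cosine_similarity(game_a, game_b)
sim_weighted = cosine_similarity(game_a * weights, game_b * weights)
sim_uniform = cosine_similarity(game_a * 4, game_b * 4)

print(sim_unweighted, sim_weighted, sim_uniform)
```

With these weights the shared specs barely count, so two games that differ only in their tags go from fairly similar to almost orthogonal.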

In [11]:
# Rebuild the recommendation system with these modifications.
# NOTE: reload the DataFrames first (re-run step 2); 'genres', 'specs' and 'tags' were already dropped in step 3.

# Keep only the columns of interest

df_games_names = df_games_tec[['item_id', 'app_name']]
df_games_genres = df_games_genres.drop(columns=['genres','Early Access'])
df_games_specs = df_games_specs[['item_id','Mods','Online Multi-Player','Standing','Local Multi-Player','Room-Scale',
    'Single-player', 'Windows Mixed Reality', 'Keyboard / Mouse','HTC Vive', 'Cross-Platform Multiplayer', 'Online Co-op', 'Seated',
    'MMO','Co-op', 'Gamepad', 'Downloadable Content','Local Co-op','Multi-player']].copy()  # copy so the assignment below does not act on a slice
df_games_tags = df_games_tags.drop(columns=['tags',"Early Access","Soundtrack"])

# Weight the groups (genres keep weight 1).
# The mask covers every column, item_id included, so this assumes no item_id is exactly 1.

df_games_specs[df_games_specs == 1] = 0.25
df_games_tags[df_games_tags == 1] = 4

# Join the game names with the genre dummies
merged_df_1 = pd.merge(df_games_names, df_games_genres, on='item_id', how='inner')

# Add the specs dummies
merged_df_2 = pd.merge(merged_df_1, df_games_specs, on='item_id', how='inner')

# Add the tags dummies
merged_df_final = pd.merge(merged_df_2, df_games_tags, on='item_id', how='inner')

In [None]:
# Row indices of known games, used later to evaluate the model's performance

# Counter-Strike: 32103
# Soccer game: 32108
# Baseball game: 6001
# Formula 1 game: 7013
# Worms: 7027

In [30]:
games_dummies = merged_df_final.drop(columns=['item_id', 'app_name'])

n_neighbors=6
nneighbors = NearestNeighbors(n_neighbors = n_neighbors, metric = 'cosine').fit(games_dummies)

index = 32103
game_eval = np.array(games_dummies.iloc[index]).reshape(1,-1)
dif, ind = nneighbors.kneighbors(game_eval)

print("Selected game")
print("="*80)
print(df_games_names.loc[ind[0][0], :])
print("Recommended games")
print("="*80)
df_games_names.loc[ind[0][1:], :]

Selected game
item_id               10.0
app_name    Counter-Strike
Name: 32103, dtype: object
Recommended games




       item_id   app_name
15178  720930.0  VR Toolbox: Cartoon Forest Props DLC
10421  510360.0  Recession
11782  683210.0  Apez
17550  624560.0  Medusa's Labyrinth VR
14391  738320.0  Idle Champions of the Forgotten Realms - Start...


In [29]:
# Print the columns that are non-zero for the game at row 30698
# (the loop variable is named col_idx to avoid shadowing the built-in iter)
for col_idx in range(len(df_games_specs.columns)):
    if df_games_specs.iloc[30698, col_idx] != 0:
        nombre_columna = df_games_specs.columns[col_idx]
        print(nombre_columna)

# By inspection, "Downloadable Content" must be removed from df_games_specs so the model stops recommending DLCs and other downloadable content.
# Recommendations can also come from any release year, which may be undesirable.

item_id
Single-player
Downloadable Content
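
The same inspection can be written without an explicit loop. This is a minimal sketch on a hypothetical two-row frame (`toy_specs`); `active_columns` is an illustrative helper, not part of the notebook, and it skips the `item_id` column so only real labels are listed.

```python
import pandas as pd

def active_columns(df: pd.DataFrame, position: int, id_col: str = "item_id") -> list:
    """List the label columns that are non-zero for the row at `position`."""
    row = df.drop(columns=[id_col]).iloc[position]
    return row.index[row != 0].tolist()

# Hypothetical toy specs table with 0/1 dummies.
toy_specs = pd.DataFrame(
    {
        "item_id": [10.0, 20.0],
        "Single-player": [1, 1],
        "Multi-player": [1, 0],
        "Downloadable Content": [0, 1],
    }
)

labels = active_columns(toy_specs, 1)
print(labels)
```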


In [None]:
# 6. Third model training (V3)

# CHANGES FOR THE NEW VERSION OF THE RECOMMENDATION SYSTEM (V3)

#   * Remove "Downloadable Content"
#   * Add and weight a column for the game's release period (roughly five-year bins)

In [40]:
df_games_tec.columns

Index(['app_name', 'item_id', 'publisher', 'release_date', 'price',
       'developer', 'release_year'],
      dtype='object')

In [None]:
df_games_tec["release_year"].value_counts()

In [156]:
# Group release years into periods of roughly five years so that the year's influence is more flexible. Then generate dummy variables.

df_games_release_lustrum = df_games_tec[['item_id', 'release_year']].copy()

# Define the period boundaries. pd.cut intervals are right-closed by default,
# so '2000_2005' covers 2000-2005 and '2005_2010' covers 2006-2010.
bins = [0, 1999, 2005, 2010, 2015, 9999]
labels = ['before_2000', '2000_2005', '2005_2010', '2010_2015', 'after_2015']

# Split the years into periods and create dummy variables (rows with a missing release_year get all zeros)
df_games_release_lustrum['release_lustrum'] = pd.cut(df_games_release_lustrum['release_year'], bins=bins, labels=labels)
df_games_release_lustrum = pd.get_dummies(df_games_release_lustrum, columns=['release_lustrum'])
df_games_release_lustrum = df_games_release_lustrum.multiply(1)  # make sure the dummies are numeric 0/1

# Drop the original 'release_year' column
df_games_release_lustrum.drop(columns=['release_year'], inplace=True)


In [132]:
df_games_release_lustrum["release_lustrum_2010_2015"].value_counts()

release_lustrum_2010_2015
0    21025
1    11107
Name: count, dtype: int64

In [133]:
# Show the new DataFrame
df_games_release_lustrum.head()

   item_id   release_lustrum_before_2000  release_lustrum_2000_2005  release_lustrum_2005_2010  release_lustrum_2010_2015  release_lustrum_after_2015
0  761140.0  0                            0                          0                          0                          1
1  643980.0  0                            0                          0                          0                          1
2  670290.0  0                            0                          0                          0                          1
3  767400.0  0                            0                          0                          0                          1
4  773570.0  0                            0                          0                          0                          0
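
To see exactly which years fall into which period, the binning can be run on a handful of boundary years. The sketch below reuses the notebook's `bins` and `labels`; the `years` series is made up for illustration. It relies on `pd.cut` building right-closed intervals by default and on `pd.get_dummies` emitting all zeros for a missing value.

```python
import numpy as np
import pandas as pd

bins = [0, 1999, 2005, 2010, 2015, 9999]
labels = ['before_2000', '2000_2005', '2005_2010', '2010_2015', 'after_2015']

# Hypothetical boundary years plus one missing value.
years = pd.Series([1999, 2000, 2005, 2006, 2015, 2016, np.nan])

# Intervals are (0, 1999], (1999, 2005], (2005, 2010], (2010, 2015], (2015, 9999].
periods = pd.cut(years, bins=bins, labels=labels)
dummies = pd.get_dummies(periods, dtype=int)

print(pd.DataFrame({"year": years, "period": periods}))
print(dummies)
```

A row whose dummies are all zero, like row 4 in the output above, is therefore consistent with a game that has no usable `release_year`.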


In [251]:
# 2. Load DataFrames

df_games_tec = pd.read_parquet('df_games_tec.parquet')
df_games_genres = pd.read_parquet('df_games_genres.parquet')
df_games_specs = pd.read_parquet('df_games_specs.parquet')
df_games_tags = pd.read_parquet('df_games_tags.parquet')

# Group release years into periods of roughly five years so that the year's influence is more flexible. Then generate dummy variables.

df_games_release_lustrum = df_games_tec[['item_id', 'release_year']].copy()

# Define the period boundaries
bins = [0, 1999, 2005, 2010, 2015, 9999]
labels = ['before_2000', '2000_2005', '2005_2010', '2010_2015', 'after_2015']

# Split the years into periods and create dummy variables
df_games_release_lustrum['release_lustrum'] = pd.cut(df_games_release_lustrum['release_year'], bins=bins, labels=labels)
df_games_release_lustrum = pd.get_dummies(df_games_release_lustrum, columns=['release_lustrum'])
df_games_release_lustrum = df_games_release_lustrum.multiply(1)

# Drop the original 'release_year' column
df_games_release_lustrum.drop(columns=['release_year'], inplace=True)



# Rebuild the recommendation system with these modifications
# (the DataFrames are reloaded at the top of this cell)

# Keep only the columns of interest

df_games_names = df_games_tec[['item_id', 'app_name']]
df_games_genres = df_games_genres.drop(columns=['genres','Early Access'])
df_games_specs = df_games_specs[['item_id','Online Multi-Player','Local Multi-Player','Room-Scale',
    'Single-player', 'Keyboard / Mouse', 'Cross-Platform Multiplayer', 'Online Co-op', 'Seated',
    'MMO','Co-op', 'Gamepad','Local Co-op','Multi-player']].copy()  # copy so the assignment below does not act on a slice
df_games_tags = df_games_tags.drop(columns=['tags',"Early Access","Soundtrack"])

# Weight the groups
df_games_release_lustrum[df_games_release_lustrum == 1] = 1 # Strong weight, since only one period column is 1 per game (a weight of 1 leaves the values unchanged)
df_games_specs[df_games_specs == 1] = 0.25
df_games_tags[df_games_tags == 1] = 1
df_games_genres[df_games_genres == 1] = 0.125

# Join the game names with the genre dummies
merged_df_1 = pd.merge(df_games_names, df_games_genres, on='item_id', how='inner')

# Add the specs dummies
merged_df_2 = pd.merge(merged_df_1, df_games_specs, on='item_id', how='inner')

# Add the release-period dummies
merged_df_3 = pd.merge(merged_df_2, df_games_release_lustrum, on='item_id', how='inner')

# Add the tags dummies
merged_df_final = pd.merge(merged_df_3, df_games_tags, on='item_id', how='inner')

In [233]:
# Row indices of known games, used later to evaluate the model's performance

# Counter-Strike: 32103
# Soccer game: 32108
# Baseball game: 6001
# Formula 1 game: 7013
# Worms: 7027

In [253]:
games_dummies = merged_df_final.drop(columns=['item_id', 'app_name'])

n_neighbors=6
nneighbors = NearestNeighbors(n_neighbors = n_neighbors, metric = 'cosine').fit(games_dummies)

index = 32103
game_eval = np.array(games_dummies.iloc[index]).reshape(1,-1)
dif, ind = nneighbors.kneighbors(game_eval)

print("Selected game")
print("="*80)
print(df_games_names.loc[ind[0][0], :])
print("Recommended games")
print("="*80)
df_games_names.loc[ind[0][1:], :]

Selected game
item_id               10.0
app_name    Counter-Strike
Name: 32103, dtype: object
Recommended games




       item_id   app_name
16812  645130.0  Space Dream VR
17924  627030.0  Fantasy Grounds - Pathfinder RPG - Rise of the...
10390  672570.0  The Western Hunter
19863  540711.0  Assetto Corsa - Porsche Pack III
14682  728461.0  NASCAR Heat 2 - October Value Pack


In [None]:
# 7. Fourth model training (V4)

# CHANGES FOR THE NEW VERSION OF THE RECOMMENDATION SYSTEM (V4)

#   * Divide each dummy variable by the game's total number of labels within its category (genres, specs and tags)
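
A small worked example of this normalization, on a hypothetical tags table: each row is divided by its own label count, so a game with four tags contributes 0.25 per tag while a game with a single tag contributes 1.0, and every group sums to at most 1 per game. `normalize_rows` and `toy_tags` are illustrative names.

```python
import pandas as pd

def normalize_rows(dummies: pd.DataFrame) -> pd.DataFrame:
    """Divide each row by its sum; rows with no labels become all zeros."""
    row_sums = dummies.sum(axis=1)
    # 0 / 0 yields NaN for rows without labels, which fillna turns back into 0.
    return dummies.div(row_sums, axis=0).fillna(0)

# Hypothetical tags: game 0 has four tags, game 1 has one, game 2 has none.
toy_tags = pd.DataFrame(
    {
        "Puzzle": [1, 1, 0],
        "Retro":  [1, 0, 0],
        "Funny":  [1, 0, 0],
        "Dark":   [1, 0, 0],
    }
)

normalized = normalize_rows(toy_tags)
print(normalized)
```

Because every group then carries the same total mass, a game with dozens of tags no longer dominates the distance simply by having more labels than its genres or specs.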

In [257]:
df_games_tags.columns

Index(['item_id', 'tags', 'Thriller', 'Philisophical', 'Superhero',
       'Massively Multiplayer', 'Diplomacy', 'Satire', 'Mature',
       'Side Scroller',
       ...
       'Word Game', 'Visual Novel', 'Experimental', 'Capitalism', 'Hex Grid',
       'Hacking', 'Video Production', 'Hunting', 'Turn-Based', 'Underwater'],
      dtype='object', length=341)

In [279]:
# 2. Load DataFrames

df_games_tec = pd.read_parquet('df_games_tec.parquet')
df_games_genres = pd.read_parquet('df_games_genres.parquet')
df_games_specs = pd.read_parquet('df_games_specs.parquet')
df_games_tags = pd.read_parquet('df_games_tags.parquet')

# Keep only the columns of interest

df_games_names = df_games_tec[['item_id', 'app_name']]
df_games_genres = df_games_genres.drop(columns=['genres','Early Access'])
df_games_specs = df_games_specs[['item_id','Online Multi-Player','Local Multi-Player','Room-Scale',
    'Single-player', 'Keyboard / Mouse', 'Cross-Platform Multiplayer', 'Online Co-op', 'Seated',
    'MMO','Co-op', 'Gamepad','Local Co-op','Multi-player']]
df_games_tags = df_games_tags.drop(columns=['tags',"Early Access","Soundtrack"])

# Select only the dummy-variable columns
df_games_genres_dummies = df_games_genres.drop(columns=['item_id'])
df_games_specs_dummies = df_games_specs.drop(columns=['item_id'])
df_games_tags_dummies = df_games_tags.drop(columns=['item_id'])

# Per row, count how many dummy variables are 1
suma_por_fila1 = df_games_genres_dummies.sum(axis=1)
suma_por_fila2 = df_games_specs_dummies.sum(axis=1)
suma_por_fila3 = df_games_tags_dummies.sum(axis=1)

# Divide each value in the row by the row total (a zero total yields NaN, handled below)
df_games_genres_dummies_dividido = df_games_genres_dummies.div(suma_por_fila1, axis=0)
df_games_specs_dummies_dividido = df_games_specs_dummies.div(suma_por_fila2, axis=0)
df_games_tags_dummies_dividido = df_games_tags_dummies.div(suma_por_fila3, axis=0)

# Replace NaN with 0 where the row sum is 0
df_games_genres_dummies_dividido.fillna(0, inplace=True)
df_games_specs_dummies_dividido.fillna(0, inplace=True)
df_games_tags_dummies_dividido.fillna(0, inplace=True)

# Re-attach the 'item_id' column to each normalized DataFrame
df_games_genres_v4 = pd.concat([df_games_genres[['item_id']], df_games_genres_dummies_dividido], axis=1)
df_games_specs_v4 = pd.concat([df_games_specs[['item_id']], df_games_specs_dummies_dividido], axis=1)
df_games_tags_v4 = pd.concat([df_games_tags[['item_id']], df_games_tags_dummies_dividido], axis=1)


In [280]:
# Group release years into periods of roughly five years so that the year's influence is more flexible. Then generate dummy variables.

df_games_release_lustrum = df_games_tec[['item_id', 'release_year']].copy()

# Define the period boundaries
bins = [0, 1999, 2005, 2010, 2015, 9999]
labels = ['before_2000', '2000_2005', '2005_2010', '2010_2015', 'after_2015']

# Split the years into periods and create dummy variables
df_games_release_lustrum['release_lustrum'] = pd.cut(df_games_release_lustrum['release_year'], bins=bins, labels=labels)
df_games_release_lustrum = pd.get_dummies(df_games_release_lustrum, columns=['release_lustrum'])
df_games_release_lustrum = df_games_release_lustrum.multiply(1)

# Drop the original 'release_year' column
df_games_release_lustrum.drop(columns=['release_year'], inplace=True)

# Rebuild the recommendation system with these modifications
# (the DataFrames are reloaded in the previous cell)

# Weight the groups
df_games_release_lustrum[df_games_release_lustrum == 1] = 2 # Strong weight, since only one period column is 1 per game
#df_games_specs_v4[df_games_specs_v4 == 1] = 0.25
#df_games_tags_v4[df_games_tags == 1] = 1
#df_games_genres_v4[df_games_genres == 1] = 0.125

# Join the game names with the normalized genre dummies
merged_df_1 = pd.merge(df_games_names, df_games_genres_v4, on='item_id', how='inner')

# Add the normalized specs dummies
merged_df_2 = pd.merge(merged_df_1, df_games_specs_v4, on='item_id', how='inner')

# Add the release-period dummies
merged_df_3 = pd.merge(merged_df_2, df_games_release_lustrum, on='item_id', how='inner')

# Add the normalized tags dummies
merged_df_final = pd.merge(merged_df_3, df_games_tags_v4, on='item_id', how='inner')

In [None]:
# Row indices of known games, used later to evaluate the model's performance

# Counter-Strike: 32103
# Soccer game: 32108
# Baseball game: 6001
# Formula 1 game: 7013
# Worms: 7027

In [284]:
games_dummies = merged_df_final.drop(columns=['item_id', 'app_name'])

n_neighbors=6
nneighbors = NearestNeighbors(n_neighbors = n_neighbors, metric = 'cosine').fit(games_dummies)

index = 7027
game_eval = np.array(games_dummies.iloc[index]).reshape(1,-1)
dif, ind = nneighbors.kneighbors(game_eval)

print("Selected game")
print("="*80)
print(df_games_names.loc[ind[0][0], :])
print("Recommended games")
print("="*80)
df_games_names.loc[ind[0][1:], :]



Selected game
item_id        327030.0
app_name    Worms W.M.D
Name: 7027, dtype: object
Recommended games


       item_id   app_name
23084  435460.0  NotCoD™
15373  637745.0  Rocksmith® 2014 Edition – Remastered – Marilyn...
23506  463090.0  Hatoful Boyfriend: Holiday Star Collector's Ed...
23083  438680.0  One Troll Army
12070  617810.0  Total War: WARHAMMER II - Mortal Empires


In [285]:
# 8. Export the final DataFrame in parquet format to implement Recommendation System V4

merged_df_final.to_parquet('df_sist_reco_v4.parquet')
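
One possible way to consume the exported table is to wrap the neighbor search in a function keyed by `item_id` rather than by row position. The sketch below is an assumption about how that could look, not the project's deployed code: `recommend` is a hypothetical helper, and it runs on an in-memory toy frame with the same layout (`item_id`, `app_name`, then feature columns) instead of reading `df_sist_reco_v4.parquet`.

```python
import pandas as pd
from sklearn.neighbors import NearestNeighbors

def recommend(features: pd.DataFrame, item_id, n_recommendations: int = 5) -> pd.DataFrame:
    """Return item_id and app_name of the games closest (cosine) to `item_id`."""
    features = features.reset_index(drop=True)
    matches = features.index[features["item_id"] == item_id]
    if len(matches) == 0:
        raise KeyError(f"item_id {item_id} not found")
    position = matches[0]

    matrix = features.drop(columns=["item_id", "app_name"]).to_numpy(dtype=float)
    # Ask for one extra neighbor because the query game is returned as well.
    n_neighbors = min(n_recommendations + 1, len(features))
    knn = NearestNeighbors(n_neighbors=n_neighbors, metric="cosine").fit(matrix)
    _, idx = knn.kneighbors(matrix[[position]])

    neighbors = [i for i in idx[0] if i != position][:n_recommendations]
    return features.loc[neighbors, ["item_id", "app_name"]]

# Hypothetical toy table standing in for the exported DataFrame.
toy = pd.DataFrame(
    {
        "item_id": [1, 2, 3, 4],
        "app_name": ["Shooter A", "Shooter B", "Soccer", "Tennis"],
        "Action": [1.0, 1.0, 0.0, 0.0],
        "Sports": [0.0, 0.0, 1.0, 1.0],
        "Football": [0.0, 0.0, 1.0, 0.0],
    }
)

result = recommend(toy, item_id=3, n_recommendations=1)
print(result)
```

Looking names up in the same frame that feeds the model keeps the recommendations aligned with their features even if some games were dropped during the merges.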
