### Comenzamos el algoritmo de ML para recomendacion de juegos

Nos centraremos en algo basico que brinde una solucion, como una matriz de similitud segun los generos de cada titulo, asi tendremos la fiabilidad de que los juegos seran de cierta manera similares.

In [129]:
import pandas as pd
import numpy as np


In [130]:
import json

data = []
with open('data/output_steam_games.json', 'r') as f:
    for line in f:
        try:
            obj = json.loads(line)
            data.append(obj)
        except json.JSONDecodeError as e:
            print("Error en línea:", line)


steam = pd.DataFrame(data)

print(steam.shape)



(120445, 13)


In [131]:
steam = steam.dropna(thresh=3)
print(steam.shape)
steam.columns

(32135, 13)


Index(['publisher', 'genres', 'app_name', 'title', 'url', 'release_date',
       'tags', 'reviews_url', 'specs', 'price', 'early_access', 'id',
       'developer'],
      dtype='object')

In [132]:
steam.head(1)
steam['price'] = steam['price'].apply(pd.to_numeric, errors='coerce')
steam.price = steam.price.fillna(0)
steam['genres'].fillna('[]', inplace=True)
steam['genres'] = steam['genres'].astype(str)
steam.dropna(inplace=True)

In [133]:
df_ml = steam[['price','title','genres']]
df_ml.set_index('title')
df_ml.reset_index
df_ml.head(1)

Unnamed: 0,price,title,genres
88310,4.99,Lost Summoner Kitty,"['Action', 'Casual', 'Indie', 'Simulation', 'S..."


In [134]:
df_ml['genres'] = df_ml['genres'].apply(lambda x: ','.join(eval(x)))

genres_encoded = df_ml['genres'].str.get_dummies(',')

df_ml = pd.concat([df_ml, genres_encoded], axis=1)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_ml['genres'] = df_ml['genres'].apply(lambda x: ','.join(eval(x)))


In [135]:
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.preprocessing import StandardScaler


scaler = StandardScaler()
df_ml['price_normalized'] = scaler.fit_transform(df_ml['price'].values.reshape(-1, 1))


features = pd.concat([genres_encoded, df_ml['price_normalized']], axis=1)

similarity_matrix = cosine_similarity(features)


In [136]:
ml_final = df_ml.reset_index(drop=True)

In [160]:
recommendations = []

for game_index in range(len(df_ml)):

    similar_games = list(enumerate(similarity_matrix[game_index]))
    sorted_similar_games = sorted(similar_games, key=lambda x: x[1], reverse=True)

    top_n = 5
    recommended_games = []
    for i, sim in sorted_similar_games[1:top_n+1]:
        recommended_games.append(ml_final['title'][i])


    recommendations.append({'titulo': ml_final['title'][game_index], 'juegos_recomendados': recommended_games})


recommendations_df = pd.DataFrame(recommendations)




                         titulo  \
0           Lost Summoner Kitty   
1                     Ironbound   
2       Real Pool 3D - Poolians   
3                       弹炸人2222   
4         Battle Royale Trainer   
...                         ...   
23617              Kebab it Up!   
23618            Colony On Mars   
23619  LOGistICAL: South Africa   
23620             Russian Roads   
23621       EXIT 2 - Directions   

                                     juegos_recomendados  
0      [Trivia Vault: Mixed Trivia, R.C. Bot Inc., A....  
1      [Shadow Hunter, Immortal Empire, MINDNIGHT, Vu...  
2      [Snooker-online multiplayer snooker game!, Mal...  
3      [Blood and Bacon, Luke Sidewalker, Cyber Utopi...  
4      [Rocket Craze 3D, Parkan 2, The Tomorrow War, ...  
...                                                  ...  
23617  [Gal-X-E, Dyna Bomb - Soundtrack OST, Tomato J...  
23618  [Just Deserts - Original Sound Track, BoomTown...  
23619  [iBomber Defense Pacific, Bumbledore, Bra

In [161]:
recommendations_df

Unnamed: 0,titulo,juegos_recomendados
0,Lost Summoner Kitty,"[Trivia Vault: Mixed Trivia, R.C. Bot Inc., A...."
1,Ironbound,"[Shadow Hunter, Immortal Empire, MINDNIGHT, Vu..."
2,Real Pool 3D - Poolians,"[Snooker-online multiplayer snooker game!, Mal..."
3,弹炸人2222,"[Blood and Bacon, Luke Sidewalker, Cyber Utopi..."
4,Battle Royale Trainer,"[Rocket Craze 3D, Parkan 2, The Tomorrow War, ..."
...,...,...
23617,Kebab it Up!,"[Gal-X-E, Dyna Bomb - Soundtrack OST, Tomato J..."
23618,Colony On Mars,"[Just Deserts - Original Sound Track, BoomTown..."
23619,LOGistICAL: South Africa,"[iBomber Defense Pacific, Bumbledore, Bravelan..."
23620,Russian Roads,"[Russian Roads, Car Mechanic Simulator 2015 - ..."


Ahora que tenemos las recomendaciones de cada titulo en el dataframe lo exportamos para luego usarlo en la api.

In [162]:
recommendations_df.to_csv('data_endpoints/recomendacion.csv')

In [153]:
import gc
gc.collect()

0