# MODELO DE APRENDIZAJE AUTOMÁTICO

Para el desarrollo de este modelo, usaremos la similitud del coseno, que es una herramienta útil en ciencia de datos para medir la similitud entre dos conjuntos de datos multidimensionales, como documentos de texto, imágenes o perfiles de usuarios.

###### -CountVectorizer: Esta clase se utiliza para convertir una colección de documentos de texto en una matriz de recuentos de términos. Básicamente, toma una lista de documentos de texto y los transforma en una representación numérica que los algoritmos de aprendizaje automático pueden entender. Cada fila de la matriz representa un documento y cada columna representa una palabra única en el vocabulario. La matriz cuenta cuántas veces aparece cada palabra en cada documento. 
###### -cosine_similarity: Esta función calcula la similitud del coseno entre vectores. En el contexto del aprendizaje automático, se usa comúnmente para medir la similitud entre dos conjuntos de características. Retorna una matriz de similitud donde cada fila y columna representan la similitud entre dos documentos.


### 1. Importar librerías

In [1]:
import pandas as pd
#scikit-learn (sklearn) - biblioteca de aprendizaje automático 
#
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

In [10]:
# Cargar el conjunto de datos
steam_games = pd.read_csv(r'C:\PI1_copia\data\DataLimpia\steam_games_cln.csv')

In [12]:
steam_games.head(10)

Unnamed: 0,genres,title,tags,specs,price,item_id,developer,year
0,"['Action', 'Casual', 'Indie', 'Simulation', 'S...",Lost Summoner Kitty,"['Strategy', 'Action', 'Indie', 'Casual', 'Sim...",['Single-player'],4.99,761140,Kotoshiro,2018.0
1,"['Free to Play', 'Indie', 'RPG', 'Strategy']",Ironbound,"['Free to Play', 'Strategy', 'Indie', 'RPG', '...","['Single-player', 'Multi-player', 'Online Mult...",0.0,643980,Secret Level SRL,2018.0
2,"['Casual', 'Free to Play', 'Indie', 'Simulatio...",Real Pool 3D - Poolians,"['Free to Play', 'Simulation', 'Sports', 'Casu...","['Single-player', 'Multi-player', 'Online Mult...",0.0,670290,Poolians.com,2017.0
3,"['Action', 'Adventure', 'Casual']",弹炸人2222,"['Action', 'Adventure', 'Casual']",['Single-player'],0.99,767400,彼岸领域,2017.0
4,"['Action', 'Adventure', 'Simulation']",Battle Royale Trainer,"['Action', 'Adventure', 'Simulation', 'FPS', '...","['Single-player', 'Steam Achievements']",3.99,772540,Trickjump Games Ltd,2018.0
5,"['Free to Play', 'Indie', 'Simulation', 'Sports']",SNOW - All Access Pro Pass,"['Free to Play', 'Indie', 'Simulation', 'Sports']","['Single-player', 'Multi-player', 'Online Mult...",18.99,774277,Poppermost Productions,2018.0
6,"['Free to Play', 'Indie', 'Simulation', 'Sports']",SNOW - All Access Legend Pass,"['Free to Play', 'Indie', 'Simulation', 'Sports']","['Single-player', 'Multi-player', 'Online Mult...",29.99,774278,Poppermost Productions,2018.0
7,"['Action', 'Adventure', 'Casual', 'Indie', 'RPG']",Army of Tentacles: (Not) A Cthulhu Dating Sim:...,"['Action', 'Adventure', 'RPG', 'Indie', 'Casual']","['Single-player', 'Steam Achievements']",10.99,770380,Stegalosaurus Game Development,2018.0
8,"['Casual', 'Indie']",Beach Rules,"['Casual', 'Indie', 'Pixel Graphics', 'Cute', ...",['Single-player'],3.99,768880,Copperpick Studio,2018.0
9,"['Casual', 'Indie', 'Simulation']",Planetarium 2 - Zen Odyssey,"['Indie', 'Casual', 'Simulation']",['Single-player'],2.99,765320,Ghulam Jewel,2018.0


### 2. Adecuación del dataframe

In [15]:
# Seleccionar las columnas necesarias
df = steam_games.loc[:, ["tags","specs", "item_id", "title"]]
df.head(20)

Unnamed: 0,tags,specs,item_id,title
0,"['Strategy', 'Action', 'Indie', 'Casual', 'Sim...",['Single-player'],761140,Lost Summoner Kitty
1,"['Free to Play', 'Strategy', 'Indie', 'RPG', '...","['Single-player', 'Multi-player', 'Online Mult...",643980,Ironbound
2,"['Free to Play', 'Simulation', 'Sports', 'Casu...","['Single-player', 'Multi-player', 'Online Mult...",670290,Real Pool 3D - Poolians
3,"['Action', 'Adventure', 'Casual']",['Single-player'],767400,弹炸人2222
4,"['Action', 'Adventure', 'Simulation', 'FPS', '...","['Single-player', 'Steam Achievements']",772540,Battle Royale Trainer
5,"['Free to Play', 'Indie', 'Simulation', 'Sports']","['Single-player', 'Multi-player', 'Online Mult...",774277,SNOW - All Access Pro Pass
6,"['Free to Play', 'Indie', 'Simulation', 'Sports']","['Single-player', 'Multi-player', 'Online Mult...",774278,SNOW - All Access Legend Pass
7,"['Action', 'Adventure', 'RPG', 'Indie', 'Casual']","['Single-player', 'Steam Achievements']",770380,Army of Tentacles: (Not) A Cthulhu Dating Sim:...
8,"['Casual', 'Indie', 'Pixel Graphics', 'Cute', ...",['Single-player'],768880,Beach Rules
9,"['Indie', 'Casual', 'Simulation']",['Single-player'],765320,Planetarium 2 - Zen Odyssey


In [16]:
# Convertir 'item_id' a tipo entero
df["item_id"] = df["item_id"].astype(int)

In [17]:
# Limpiar la columna 'tags', ya que en este caso se ve más adecuada para recomendar juegos similares
df['tags'] = df['tags'].apply(lambda x: str(x).replace('[', '').replace(']', '').replace("'", ''))

In [18]:
df.head()

Unnamed: 0,tags,specs,item_id,title
0,"Strategy, Action, Indie, Casual, Simulation",['Single-player'],761140,Lost Summoner Kitty
1,"Free to Play, Strategy, Indie, RPG, Card Game,...","['Single-player', 'Multi-player', 'Online Mult...",643980,Ironbound
2,"Free to Play, Simulation, Sports, Casual, Indi...","['Single-player', 'Multi-player', 'Online Mult...",670290,Real Pool 3D - Poolians
3,"Action, Adventure, Casual",['Single-player'],767400,弹炸人2222
4,"Action, Adventure, Simulation, FPS, Shooter, T...","['Single-player', 'Steam Achievements']",772540,Battle Royale Trainer


### 3. Creación de modelo

In [19]:
# Crear un vectorizador de texto basado en la columna "tags"
cv = CountVectorizer()
vectores = cv.fit_transform(df['tags']).toarray()

In [20]:
# Calcular la similitud del coseno entre vectores
similitud = cosine_similarity(vectores)

### 4. Función de recomendación

In [23]:
# Función para obtener recomendaciones por título
def recomendacion(games):
    #Buscar índice de juegos en el DataFrame original (df) para acceder a la fila correspondiente en la matriz de similitud
    indice_juego = df[df["item_id"] == games].index[0]
    
    #Cálculo de Similitudes: Se obtienen las distancias de similitud entre el input los otros juegos del dataset
    distancias = similitud[indice_juego]
    
    #Las distancias se ordenan de manera descendente. Se seleccionan los cinco más similares
    game_list = sorted(list(enumerate(distancias)), reverse=True, key=lambda x: x[1])[1:6]
    
    #Salida. Devuelve la lista de títulos recomendados.
    recom_games = [df.iloc[i[0]]['title'] for i in game_list]
    
    return recom_games

In [26]:
#Testeo de la función
print(recomendacion(772540))

['R.I.P.D.: The Game', 'The Club™', 'Global Ops: Commando Libya', 'CT Special Forces: Fire for Effect', 'Lead and Gold: Gangs of the Wild West']


In [27]:
# Aplicar la función a la columna 'item_id' y crear la columna 'Recomendados'
df['Recomendados'] = df['item_id'].apply(recomendacion)

In [29]:
df['Recomendados'].head()
df

Unnamed: 0,tags,specs,item_id,title,Recomendados
0,"Strategy, Action, Indie, Casual, Simulation",['Single-player'],761140,Lost Summoner Kitty,"[World of Cinema - Directors Cut, Aerial Destr..."
1,"Free to Play, Strategy, Indie, RPG, Card Game,...","['Single-player', 'Multi-player', 'Online Mult...",643980,Ironbound,"[Chronicle: RuneScape Legends, Guardians of Gr..."
2,"Free to Play, Simulation, Sports, Casual, Indi...","['Single-player', 'Multi-player', 'Online Mult...",670290,Real Pool 3D - Poolians,"[Snooker-online multiplayer snooker game!, Mal..."
3,"Action, Adventure, Casual",['Single-player'],767400,弹炸人2222,"[Atomic Adam: Episode 1, Biozone, Abandoned Kn..."
4,"Action, Adventure, Simulation, FPS, Shooter, T...","['Single-player', 'Steam Achievements']",772540,Battle Royale Trainer,"[R.I.P.D.: The Game, The Club™, Global Ops: Co..."
...,...,...,...,...,...
22516,"Action, Indie, Casual, Violent, Adventure","['Single-player', 'Steam Achievements', 'Steam...",745400,Kebab it Up!,"[Broforce: The Soundtrack, Hope in Hell, Mini'..."
22517,"Strategy, Indie, Casual, Simulation","['Single-player', 'Steam Achievements']",773640,Colony On Mars,"[Fate of the World: Migration, Fate of the Wor..."
22518,"Strategy, Indie, Casual","['Single-player', 'Steam Achievements', 'Steam...",733530,LOGistICAL: South Africa,"[Ticket to Ride - USA 1910, Ticket to Ride - E..."
22519,"Indie, Simulation, Racing","['Single-player', 'Steam Achievements', 'Steam...",610660,Russian Roads,[Car Mechanic Simulator 2015 - Total Modificat...


In [None]:
# Se eliminan las columnas innecesarias porque en el código anterior dejamos organizadas las recomendaciones en la columna recomendados por cada "item_id"
df.drop(columns=["specs", "tags"], inplace=True)

In [33]:
df

Unnamed: 0,item_id,title,Recomendados
0,761140,Lost Summoner Kitty,"[World of Cinema - Directors Cut, Aerial Destr..."
1,643980,Ironbound,"[Chronicle: RuneScape Legends, Guardians of Gr..."
2,670290,Real Pool 3D - Poolians,"[Snooker-online multiplayer snooker game!, Mal..."
3,767400,弹炸人2222,"[Atomic Adam: Episode 1, Biozone, Abandoned Kn..."
4,772540,Battle Royale Trainer,"[R.I.P.D.: The Game, The Club™, Global Ops: Co..."
...,...,...,...
22516,745400,Kebab it Up!,"[Broforce: The Soundtrack, Hope in Hell, Mini'..."
22517,773640,Colony On Mars,"[Fate of the World: Migration, Fate of the Wor..."
22518,733530,LOGistICAL: South Africa,"[Ticket to Ride - USA 1910, Ticket to Ride - E..."
22519,610660,Russian Roads,[Car Mechanic Simulator 2015 - Total Modificat...


In [36]:
# Exportar dataframe a CSV
df.to_csv('C:\PI1_copia\data\DataML\modelo_item_item.csv', index=False)