#### Deep Learning
#### Lab 6: Recommender Systems
##### Content-based recommender system
##### Authors:
- Roberto Rios 20979
- Javier Mombiela 20067

##### Import libraries

In [1]:
import numpy as np
import pandas as pd
import tensorflow as tf
import matplotlib.pyplot as plt
from keras.models import Sequential
from keras.layers import Dense, Input
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

#### Load data

In [2]:
data = pd.read_csv('./datasets/joined.csv')
data

| | User-ID | ISBN | Book-Rating | Book-Title | Book-Author | Year-Of-Publication | Publisher |
|---|---|---|---|---|---|---|---|
| 0 | 276725 | 034545104X | 0 | Flesh Tones: A Novel | M. J. Rose | 2002 | Ballantine Books |
| 1 | 2313 | 034545104X | 5 | Flesh Tones: A Novel | M. J. Rose | 2002 | Ballantine Books |
| 2 | 6543 | 034545104X | 0 | Flesh Tones: A Novel | M. J. Rose | 2002 | Ballantine Books |
| 3 | 8680 | 034545104X | 5 | Flesh Tones: A Novel | M. J. Rose | 2002 | Ballantine Books |
| 4 | 10314 | 034545104X | 9 | Flesh Tones: A Novel | M. J. Rose | 2002 | Ballantine Books |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 1031131 | 276688 | 0517145553 | 0 | Mostly Harmless | Douglas Adams | 1995 | Random House Value Pub |
| 1031132 | 276688 | 1575660792 | 7 | Gray Matter | Shirley Kennett | 1996 | Kensington Publishing Corporation |
| 1031133 | 276690 | 0590907301 | 0 | Triplet Trouble and the Class Trip (Triplet Tr... | Debbie Dadey | 1997 | Apple |
| 1031134 | 276704 | 0679752714 | 0 | A Desert of Pure Feeling (Vintage Contemporaries) | Judith Freeman | 1997 | Vintage Books USA |


Drop rows with null values

In [3]:
data = data.dropna()
data.shape

(1031132, 7)
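Before dropping rows, it can help to see which columns actually contain the missing values. The following is a minimal sketch on a toy frame; the `toy` DataFrame and its values are made up for illustration and are not taken from `joined.csv`.

```python
import pandas as pd

# Hypothetical toy data standing in for joined.csv
toy = pd.DataFrame({
    'Book-Title': ['A', 'B', None, 'D'],
    'Book-Author': ['x', None, 'z', 'w'],
    'Book-Rating': [0, 5, 9, 7],
})

# Count missing values per column before deciding to drop anything
null_counts = toy.isna().sum()
print(null_counts)

# Drop incomplete rows; .copy() makes the result an independent DataFrame
clean = toy.dropna().copy()
print(clean.shape)
```

Calling `.copy()` on the filtered result is one common way to make later column assignments unambiguous for pandas.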

#### Preprocesamiento de datos

Encode the categorical book columns as integers

In [4]:
# Work on an explicit copy so the assignments below do not trigger SettingWithCopyWarning
data = data.copy()

label_encoder = LabelEncoder()
data['Publisher'] = label_encoder.fit_transform(data['Publisher'])
data['Book-Title'] = label_encoder.fit_transform(data['Book-Title'])
data['Book-Author'] = label_encoder.fit_transform(data['Book-Author'])
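A single `LabelEncoder` that is refit on each column only remembers the classes of the last column it saw, so earlier columns can no longer be decoded. The sketch below keeps one encoder per column instead; the `toy` frame and the `encoders` dictionary are illustrative names, not part of the original notebook.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy data; the real notebook encodes the columns of joined.csv
toy = pd.DataFrame({
    'Book-Title': ['Mostly Harmless', 'Gray Matter', 'Mostly Harmless'],
    'Book-Author': ['Douglas Adams', 'Shirley Kennett', 'Douglas Adams'],
})

# Keep one fitted encoder per column instead of refitting a single shared one
encoders = {}
encoded = toy.copy()
for column in ['Book-Title', 'Book-Author']:
    encoders[column] = LabelEncoder()
    encoded[column] = encoders[column].fit_transform(toy[column])

# Integer codes can now be mapped back to the original strings
decoded_titles = encoders['Book-Title'].inverse_transform(encoded['Book-Title'])
print(list(decoded_titles))
```

With the encoders kept around, a recommendation table of integer codes could be turned back into readable titles and authors.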

Feature selection

In [5]:
features = ['User-ID', 'Book-Title', 'Book-Author', 'Year-Of-Publication', 'Publisher']

X = data[features]
y = data['Book-Rating']

Split the data into training and test sets

In [6]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
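The selected features live on very different scales (user IDs in the hundreds of thousands, publication years around 2000), and dense networks often train more easily on standardized inputs. The notebook does not scale its features; the sketch below shows one possible approach with `StandardScaler`, fitted on the training split only. The toy arrays are made up for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical toy features: [user id, publication year]
rng = np.random.default_rng(42)
X_toy = np.column_stack((
    rng.integers(1, 280_000, size=100),
    rng.integers(1950, 2005, size=100),
)).astype(float)
y_toy = rng.integers(0, 11, size=100)

X_tr, X_te, y_tr, y_te = train_test_split(X_toy, y_toy, test_size=0.2, random_state=42)

# Fit the scaler on the training split only, then reuse it on the test split
scaler = StandardScaler()
X_tr_scaled = scaler.fit_transform(X_tr)
X_te_scaled = scaler.transform(X_te)

print(X_tr_scaled.mean(axis=0).round(6), X_tr_scaled.std(axis=0).round(6))
```

Fitting on the training split alone keeps test-set statistics from leaking into the model inputs.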

Build the sequential model

In [7]:
model = Sequential([
    Input(shape=(X_train.shape[1],)),
    Dense(64, activation='relu'),
    Dense(32, activation='relu'),
    Dense(1, activation='linear')
])

model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense (Dense)               (None, 64)                384       
                                                                 
 dense_1 (Dense)             (None, 32)                2080      
                                                                 
 dense_2 (Dense)             (None, 1)                 33        
                                                                 
Total params: 2497 (9.75 KB)
Trainable params: 2497 (9.75 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
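The parameter counts in the summary can be checked by hand: a `Dense` layer has one weight per input-unit pair plus one bias per unit. A short sketch of that arithmetic, where `dense_params` is a helper name introduced here for illustration:

```python
# Parameters of a Dense layer: one weight per (input, unit) pair plus one bias per unit
def dense_params(n_inputs, n_units):
    return n_inputs * n_units + n_units

layer_sizes = [5, 64, 32, 1]  # 5 input features, then the three Dense layers
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
print(per_layer, sum(per_layer))  # [384, 2080, 33] 2497
```

These values match the 384, 2080, and 33 parameters reported per layer and the total of 2497.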


Compile the model

In [8]:
model.compile(loss='mean_squared_error', optimizer='adam', metrics=['mean_squared_error'])

Train the model

In [9]:
model.fit(X_train, y_train, epochs=5, batch_size=32, validation_data=(X_test, y_test))

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.src.callbacks.History at 0x2a3daa1f610>

#### Model predictions

In [10]:
predicciones = model.predict(X_test)



In [11]:
# Compute the metrics
mse = mean_squared_error(y_test, predicciones)
rmse = np.sqrt(mse)
r2 = r2_score(y_test, predicciones)

print(f'MSE: {mse}')
print(f'RMSE: {rmse}')
print(f'R^2: {r2}')

MSE: 14.857647075507128
RMSE: 3.854561852598441
R^2: -1.9383271747219766e-05
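An R² this close to zero means the model does about as well as always predicting the mean rating, since R² = 1 - SS_res / SS_tot and a constant mean predictor gives SS_res = SS_tot. The sketch below illustrates that baseline on made-up ratings; `y_true` is hypothetical and not drawn from the dataset.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Hypothetical ratings, made up for illustration
y_true = np.array([0, 5, 0, 5, 9, 0, 7, 0])

# Baseline: predict the mean rating for every row
baseline = np.full(len(y_true), y_true.mean(), dtype=float)

mse = mean_squared_error(y_true, baseline)
rmse = np.sqrt(mse)
r2 = r2_score(y_true, baseline)

# For a constant mean predictor the MSE equals the variance of y_true and R^2 is 0
print(mse, rmse, r2)
```

Comparing the network against such a baseline (using the training-set mean in a real evaluation) is a quick way to judge whether the features carry any usable signal.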


In [23]:
def recommend_books(user_id, num_recommendations):
    # Select the rows for books the user has not rated yet
    # (explicit copy so a new column can be added without a pandas warning)
    user_ratings = data[data['User-ID'] == user_id]
    unrated_books = data[~data['Book-Title'].isin(user_ratings['Book-Title'])].copy()

    # Build the model input: the user ID repeated once per candidate row, plus the book features
    user_array = np.full(len(unrated_books), user_id)
    book_array = np.array(unrated_books[['Book-Title', 'Book-Author', 'Year-Of-Publication', 'Publisher']])

    # Use the model to predict a rating for every candidate row
    predictions = model.predict(np.column_stack((user_array, book_array)))

    # Attach the predictions (flattened from shape (n, 1)) to the candidate rows
    unrated_books['Predicted-Rating'] = predictions.ravel()

    # Sort the books by predicted rating, highest first
    recommended_books = unrated_books.sort_values(by='Predicted-Rating', ascending=False)

    # Return only the title, author, and predicted rating of the top-rated books
    return recommended_books[['Book-Title', 'Book-Author', 'Predicted-Rating']][:num_recommendations]

In [24]:
print(recommend_books(9, 10))

        Book-Title  Book-Author  Predicted-Rating
766365        8558           54         14.516417
865366        7648          545         11.881720
865365        7648          545         11.881720
865364        7648          545         11.881720
899768        4842          212          9.658482
899769        4842          212          9.658482
770337        1956          592          9.635492
770336        1956          592          9.635492
437299        3813          682          9.164554
437301        3813          682          9.164554
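The output above repeats the same title several times because `data` holds one row per rating, and values such as 14.52 would fall outside a 0 to 10 rating scale. The sketch below shows one possible post-processing step; the 0 to 10 scale is an assumption (the notebook does not state it), and `candidates` and `top_unique` are hypothetical names with made-up values.

```python
import pandas as pd

# Hypothetical scored candidates, shaped like the output of recommend_books
candidates = pd.DataFrame({
    'Book-Title': [8558, 7648, 7648, 7648, 4842, 4842],
    'Book-Author': [54, 545, 545, 545, 212, 212],
    'Predicted-Rating': [14.5, 11.9, 11.9, 11.9, 9.7, 9.7],
})

def top_unique(scored, num_recommendations, low=0.0, high=10.0):
    # Rank first, so clipping later does not change the order
    ranked = scored.sort_values('Predicted-Rating', ascending=False)
    # Keep one row per title so a book is recommended at most once
    unique = ranked.drop_duplicates(subset='Book-Title').head(num_recommendations).copy()
    # Clip to the assumed rating scale (0 to 10 is an assumption)
    unique['Predicted-Rating'] = unique['Predicted-Rating'].clip(low, high)
    return unique

top = top_unique(candidates, 3)
print(top)
```

Ranking before clipping preserves the model's ordering even when several predictions exceed the upper bound.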
