# Content-based (CB)

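- Build a content-based (CB) recommender from the genres in the MovieLens data
- Convert the genres to TF, IDF, and TF-IDF weights, and predict ratings from the cosine similarity of each
- Predict ratings by combining the TF-IDF weights with the user's ratings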

In [2]:
import math
import numpy as np
from numpy import linalg as LA
import pandas as pd

### Movies Weight Matrix on Genres

Read movie metadata from a csv file.

In [3]:
movies = pd.read_csv('data/movielens/movies_w_imgurl.csv')
movies.head()

Unnamed: 0,movieId,imdbId,title,genres,imgurl
0,1,114709,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,https://images-na.ssl-images-amazon.com/images...
1,2,113497,Jumanji (1995),Adventure|Children|Fantasy,https://images-na.ssl-images-amazon.com/images...
2,3,113228,Grumpier Old Men (1995),Comedy|Romance,https://images-na.ssl-images-amazon.com/images...
3,4,114885,Waiting to Exhale (1995),Comedy|Drama|Romance,https://images-na.ssl-images-amazon.com/images...
4,5,113041,Father of the Bride Part II (1995),Comedy,https://images-na.ssl-images-amazon.com/images...


Split genres and stack genres into one column.

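## TF (Term Frequency)
- A value indicating how often a particular term appears in a document
- Three ways to compute TF

1. Boolean frequency: tf(t,d) = 1 if t appears in d at least once, otherwise 0
2. Log-scaled frequency: tf(t,d) = log(f(t,d) + 1)
3. Augmented frequency: the frequency of the target term divided by the frequency of the most frequent term in the document; generally used to adjust term frequencies when documents are relatively long.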
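The three TF variants listed above can be sketched as follows. This is a minimal illustration on a made-up token list, not part of the original notebook; the function name `term_frequencies` and the toy document are assumptions. In the MovieLens genre data each genre appears at most once per movie, so the boolean variant is the one that applies there.

```python
import math
from collections import Counter

def term_frequencies(tokens):
    """Return boolean, log-scaled, and augmented TF for each term of one document."""
    counts = Counter(tokens)
    max_count = max(counts.values())  # frequency of the most frequent term
    return {
        term: {
            "boolean": 1,                    # the term occurs at least once
            "log": math.log(count + 1),      # log(f(t, d) + 1)
            "augmented": count / max_count,  # f(t, d) / max frequency in d
        }
        for term, count in counts.items()
    }

# Hypothetical toy document; not taken from the MovieLens data.
tf = term_frequencies(["comedy", "drama", "comedy", "romance"])
```

Here `comedy` occurs twice, so it is the most frequent term and every other term's augmented frequency is measured relative to it.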

In [4]:
# Split the pipe-separated genres and stack them into one row per (movie, genre) pair.
# explode() keeps the original row index, so no droplevel() is needed.
movieGenres = movies['genres'].str.split('|').explode().to_frame('genre')
movieGenres['genre'].unique()

array(['Adventure', 'Animation', 'Children', 'Comedy', 'Fantasy',
       'Romance', 'Drama', 'Action', 'Crime', 'Thriller', 'Horror',
       'Mystery', 'Sci-Fi', 'Documentary', 'IMAX', 'War', 'Musical',
       'Western', 'Film-Noir', '(no genres listed)'], dtype=object)

Extract the set of genres that appear across all movies.

In [5]:
movies[movieGenres['genre'].unique()] = 0

Create one column per genre and initialize it to 0.

In [6]:
for idx, movie in movies[:10].iterrows():
    for genre in movieGenres['genre'].unique():
        if genre in movie['genres']:
            movies.loc[idx, genre] += 1


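For each genre a movie has, add 1 to the matching genre column of the DataFrame. Note that the loop only covers the first 10 movies (`movies[:10]`), so every later row stays at 0 in the output below.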
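The loop above visits only ten rows and uses a substring test (`genre in movie['genres']`). As a hedged alternative, the same boolean TF matrix can be built for every row at once with `Series.str.get_dummies`, which matches whole genre tokens. The three-row `sample` frame below is a made-up stand-in for `movies`.

```python
import pandas as pd

# Hypothetical three-row sample in the same format as the `genres` column.
sample = pd.DataFrame({
    "movieId": [1, 2, 3],
    "genres": ["Adventure|Children|Fantasy", "Comedy|Romance", "Comedy"],
})

# One 0/1 column per genre, computed for all rows in a single call.
genre_tf = sample["genres"].str.get_dummies(sep="|")

# Attach the movie IDs so the result has the same layout as `tfmovie`.
tf_sample = pd.concat([sample[["movieId"]], genre_tf], axis=1)
```

The resulting genre columns come out in alphabetical order rather than in order of first appearance.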

In [7]:
tfmovie = movies.drop(columns=['imdbId','title','genres','imgurl'])
tfmovie

Unnamed: 0,movieId,Adventure,Animation,Children,Comedy,Fantasy,Romance,Drama,Action,Crime,...,Horror,Mystery,Sci-Fi,Documentary,IMAX,War,Musical,Western,Film-Noir,(no genres listed)
0,1,1,1,1,1,1,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,2,1,0,1,0,1,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,3,0,0,0,1,0,1,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,4,0,0,0,1,0,1,1,0,0,...,0,0,0,0,0,0,0,0,0,0
4,5,0,0,0,1,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9120,162672,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
9121,163056,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
9122,163949,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
9123,164977,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


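## IDF (Inverse Document Frequency)
- If a term is used frequently across the document collection, the term is a common one
- DF: the number of documents in the collection in which the term appears
- IDF: obtained by dividing the total number of documents (n) by the number of documents that contain the term (DF) and taking the logarithm.
  In other words, it reflects how common a term is across the whole collection: the more common the term, the lower its IDF.

- Why take the log: if IDF were simply the inverse of DF (n/DF), its value would grow very large as the total number of documents n grows. The log dampens this growth.
  In addition, a term that appears in no document has DF = 0, which causes a division-by-zero error, so the smoothed form log(n / (1 + DF)) is often used.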
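A small sketch of the two IDF forms described above. The helper name `idf` and its `smooth` flag are assumptions made for illustration; the counts (9125 movies in total, 1545 tagged Action) are taken from outputs elsewhere in this notebook, and base-10 logs are used to match the notebook's `math.log10`.

```python
import math

def idf(total_docs, doc_freq, smooth=False):
    """log10(n / DF), or log10(n / (1 + DF)) when smooth=True."""
    denom = doc_freq + 1 if smooth else doc_freq
    return math.log10(total_docs / denom)

# Assumed counts: 9125 movies in total, 1545 of them tagged Action.
action_idf = idf(9125, 1545)

# The smoothed form stays finite even for a term that appears in no document.
unseen_idf = idf(9125, 0, smooth=True)
```

With these counts `action_idf` comes out at roughly 0.7713, in line with the Action row of the genre table computed later.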

In [8]:
movies = pd.read_csv('data/movielens/movies_w_imgurl.csv')
movies.head()

Unnamed: 0,movieId,imdbId,title,genres,imgurl
0,1,114709,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,https://images-na.ssl-images-amazon.com/images...
1,2,113497,Jumanji (1995),Adventure|Children|Fantasy,https://images-na.ssl-images-amazon.com/images...
2,3,113228,Grumpier Old Men (1995),Comedy|Romance,https://images-na.ssl-images-amazon.com/images...
3,4,114885,Waiting to Exhale (1995),Comedy|Drama|Romance,https://images-na.ssl-images-amazon.com/images...
4,5,113041,Father of the Bride Part II (1995),Comedy,https://images-na.ssl-images-amazon.com/images...


In [9]:
# Split the pipe-separated genres and stack them into one row per (movie, genre) pair.
# explode() keeps the original row index, so no droplevel() is needed.
movieGenres = movies['genres'].str.split('|').explode().to_frame('genre')

Stack the split genres into the single-column `movieGenres` DataFrame, with one row per (movie, genre) pair.

In [10]:
movieGenres

Unnamed: 0,genre
0,Adventure
0,Animation
0,Children
0,Comedy
0,Fantasy
...,...
9121,Fantasy
9121,Sci-Fi
9122,Documentary
9123,Comedy


Count movies that have each genre and then compute IDF of genres.

In [11]:
genres = pd.DataFrame(data=movieGenres.groupby('genre')['genre'].count())
genres.columns = ['movieCount']

totalItems = movies.shape[0]

genres['idf'] = genres['movieCount'].apply(lambda x: math.log10(totalItems/x))

genres.head()

genre,movieCount,idf
(no genres listed),18,2.70496
Action,1545,0.771304
Adventure,1117,0.91218
Animation,447,1.309925
Children,583,1.194564


Join genre's IDF to movie genre DataFrame.

In [12]:
movieGenreWeights = movieGenres.join(genres['idf'], on='genre')
movieGenreWeights

Unnamed: 0,genre,idf
0,Adventure,0.912180
0,Animation,1.309925
0,Children,1.194564
0,Comedy,0.439749
0,Fantasy,1.144655
...,...,...
9121,Fantasy,1.144655
9121,Sci-Fi,1.061508
9122,Documentary,1.265628
9123,Comedy,0.439749


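## TF-IDF
- Term frequency-inverse document frequency
- The TF-IDF value is higher when the term occurs more often within a given document and when fewer documents in the whole collection contain the term
- Using this value has the effect of filtering out terms that appear commonly in all documents.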
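As a minimal sketch of the TF-IDF product, the snippet below multiplies a boolean TF matrix by per-genre IDF values. The IDF numbers are copied from the join output above; the two-movie `tf` frame is hand-built for illustration. Because TF is 0 or 1 here, each TF-IDF weight is simply the genre's IDF where the movie has that genre and 0 elsewhere.

```python
import pandas as pd

# IDF values copied from the genre/IDF output in this notebook.
idf = pd.Series({
    "Adventure": 0.912180, "Animation": 1.309925, "Children": 1.194564,
    "Comedy": 0.439749, "Fantasy": 1.144655,
})

# Boolean TF for two movies (1 if the movie has the genre, else 0).
tf = pd.DataFrame(
    [[1, 1, 1, 1, 1], [1, 0, 1, 0, 1]],
    index=["Toy Story (1995)", "Jumanji (1995)"],
    columns=idf.index,
)

# TF-IDF: scale every genre column by that genre's IDF.
tfidf = tf.mul(idf, axis=1)
```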

In [13]:
movieWeights = movies[['movieId']]

for genre in genres.index:
    movieGenreIdf = movieGenreWeights[movieGenreWeights['genre'] == genre][['idf']]
    movieGenreIdf = movieGenreIdf.rename(columns={'idf':genre})
    movieWeights = movieWeights.join(movieGenreIdf)

movieWeights.fillna(0, inplace=True)

In [14]:
movieWeights.head()

Unnamed: 0,movieId,(no genres listed),Action,Adventure,Animation,Children,Comedy,Crime,Documentary,Drama,...,Film-Noir,Horror,IMAX,Musical,Mystery,Romance,Sci-Fi,Thriller,War,Western
0,1,0.0,0.0,0.91218,1.309925,1.194564,0.439749,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,2,0.0,0.0,0.91218,0.0,1.194564,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,3,0.0,0.0,0.0,0.0,0.0,0.439749,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.771304,0.0,0.0,0.0,0.0
3,4,0.0,0.0,0.0,0.0,0.0,0.439749,0.0,0.0,0.320249,...,0.0,0.0,0.0,0.0,0.0,0.771304,0.0,0.0,0.0,0.0
4,5,0.0,0.0,0.0,0.0,0.0,0.439749,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


### Movie-Movie Cosine Similarity Matrix

In [15]:
movieWeights.iloc[:,1:].values

array([[0.       , 0.       , 0.9121797, ..., 0.       , 0.       ,
        0.       ],
       [0.       , 0.       , 0.9121797, ..., 0.       , 0.       ,
        0.       ],
       [0.       , 0.       , 0.       , ..., 0.       , 0.       ,
        0.       ],
       ...,
       [0.       , 0.       , 0.       , ..., 0.       , 0.       ,
        0.       ],
       [0.       , 0.       , 0.       , ..., 0.       , 0.       ,
        0.       ],
       [0.       , 0.       , 0.       , ..., 0.       , 0.       ,
        0.       ]])

Compute $l_2$-norm of movies.

In [16]:
movieNorms = pd.DataFrame(data = LA.norm(movieWeights.iloc[:,1:].values, ord=2, axis=1), index=movieWeights.index, columns=['norm2'])
movieNorms

Unnamed: 0,norm2
0,2.340636
1,1.889257
2,0.887857
3,0.943848
4,0.439749
...,...
9120,1.236746
9121,1.965710
9122,1.265628
9123,0.439749


Normalize movie vector so that similarity can be computed simply by inner product between vectors.

$$ cosine(u, v)=\frac{\sum_{\forall i}{u_i v_i}}{||u||_2||v||_2}=\sum_{\forall i}{\frac{u_i v_i}{||u||_2||v||_2}}=\sum_{\forall i}{\frac{u_i}{||u||_2}\frac{v_i}{||v||_2}}=u'\cdot v'$$
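The identity above can be checked numerically. The sketch below uses the genre weights of Toy Story and Jumanji (values copied from the IDF output earlier, restricted to the five genres either movie has) and computes the cosine both ways; the variable names are illustrative only.

```python
import numpy as np

# Weights for (Adventure, Animation, Children, Comedy, Fantasy).
toy_story = np.array([0.912180, 1.309925, 1.194564, 0.439749, 1.144655])
jumanji = np.array([0.912180, 0.0, 1.194564, 0.0, 1.144655])

# Direct definition: dot product divided by the product of the l2-norms.
direct = toy_story @ jumanji / (np.linalg.norm(toy_story) * np.linalg.norm(jumanji))

# Normalize each vector first; a plain dot product then gives the same value.
u = toy_story / np.linalg.norm(toy_story)
v = jumanji / np.linalg.norm(jumanji)
normalized = u @ v
```

Both routes give about 0.807, which agrees with the similarity between movies 1 and 2 in the `sims` matrix.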

In [17]:
normalizedMovieWeights = movieWeights.iloc[:, 1:].divide(movieNorms['norm2'], axis=0)

normalizedMovieWeights

Unnamed: 0,(no genres listed),Action,Adventure,Animation,Children,Comedy,Crime,Documentary,Drama,Fantasy,Film-Noir,Horror,IMAX,Musical,Mystery,Romance,Sci-Fi,Thriller,War,Western
0,0.0,0.00000,0.389715,0.559645,0.510359,0.187876,0.0,0.0,0.000000,0.489036,0.0,0.0,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0,0.0
1,0.0,0.00000,0.482825,0.000000,0.632293,0.000000,0.0,0.0,0.000000,0.605876,0.0,0.0,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0,0.0
2,0.0,0.00000,0.000000,0.000000,0.000000,0.495293,0.0,0.0,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0,0.868726,0.000000,0.0,0.0,0.0
3,0.0,0.00000,0.000000,0.000000,0.000000,0.465911,0.0,0.0,0.339301,0.000000,0.0,0.0,0.0,0.0,0.0,0.817191,0.000000,0.0,0.0,0.0
4,0.0,0.00000,0.000000,0.000000,0.000000,1.000000,0.0,0.0,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9120,0.0,0.00000,0.737564,0.000000,0.000000,0.000000,0.0,0.0,0.258944,0.000000,0.0,0.0,0.0,0.0,0.0,0.623656,0.000000,0.0,0.0,0.0
9121,0.0,0.39238,0.464046,0.000000,0.000000,0.000000,0.0,0.0,0.000000,0.582311,0.0,0.0,0.0,0.0,0.0,0.000000,0.540012,0.0,0.0,0.0
9122,0.0,0.00000,0.000000,0.000000,0.000000,0.000000,0.0,1.0,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0,0.0
9123,0.0,0.00000,0.000000,0.000000,0.000000,1.000000,0.0,0.0,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0,0.0


- Divide each element by its row's norm2 value
- This is preprocessing so that all cosine similarities can be computed at once as matrix × matrixᵀ

Create item-item similarity matrix

In [18]:
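# Multiply the underlying NumPy arrays so pandas does not try to align row and column labels.
sims = pd.DataFrame(data=np.matmul(normalizedMovieWeights.values, normalizedMovieWeights.values.T))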

sims.index = movieWeights['movieId']
sims.columns = movieWeights['movieId']

sims


movieId,1,2,3,4,5,6,7,8,9,10,...,161830,161918,161944,162376,162542,162672,163056,163949,164977,164979
1,1.000000,0.807155,0.093054,0.087534,0.187876,0.000000,0.093054,0.642140,0.00000,0.254643,...,0.000000,0.187658,0.000000,0.000000,0.000000,0.287439,0.465617,0.0,0.187876,0.0
2,0.807155,1.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.795559,0.00000,0.315482,...,0.000000,0.232493,0.000000,0.000000,0.000000,0.356114,0.576861,0.0,0.000000,0.0
3,0.093054,0.000000,1.000000,0.940678,0.495293,0.000000,1.000000,0.000000,0.00000,0.000000,...,0.000000,0.000000,0.000000,0.000000,0.634039,0.541786,0.000000,0.0,0.495293,0.0
4,0.087534,0.000000,0.940678,1.000000,0.465911,0.000000,0.940678,0.000000,0.00000,0.000000,...,0.084356,0.000000,0.339301,0.339301,0.596426,0.597506,0.000000,0.0,0.465911,0.0
5,0.187876,0.000000,0.495293,0.465911,1.000000,0.000000,0.495293,0.000000,0.00000,0.000000,...,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,1.000000,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
162672,0.287439,0.356114,0.541786,0.597506,0.000000,0.000000,0.541786,0.447627,0.00000,0.481932,...,0.064378,0.355158,0.258944,0.258944,0.455175,1.000000,0.342264,0.0,0.000000,0.0
163056,0.465617,0.576861,0.000000,0.000000,0.000000,0.216114,0.000000,0.281629,0.39238,0.520001,...,0.000000,0.685812,0.000000,0.000000,0.000000,0.342264,1.000000,0.0,0.000000,0.0
163949,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.00000,0.000000,...,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,1.0,0.000000,1.0
164977,0.187876,0.000000,0.495293,0.465911,1.000000,0.000000,0.495293,0.000000,0.00000,0.000000,...,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,1.000000,0.0


The matrix product gives the cosine similarity between each movie and every other movie.

## Recommend Movies based on Predicted Ratings

Read ratings as train and test datasets.

In [19]:
from IPython.display import display, HTML

np.set_printoptions(precision=2)
pd.set_option('display.precision', 2)

def displayMovies(movies, movieIds, ratings=[]):

    html = ""

    for i, movieId in enumerate(movieIds):
        movie = movies[movies['movieId'] == movieId].iloc[0]

        html += f"""
            <div style="display:inline-block;min-width:150px;max-width:150px; vertical-align:top">
                <img src='{movie.imgurl}' width=120> <br/>
                <span>{movie.title}</span> <br/>
                {f"<span>{ratings[i]}</span> <br/>" if len(ratings) > 0 else ""}
                <ul>{"".join([f"<li>{genre}</li>" for genre in movie.genres.split('|')])}</ul>
            </div>
        """

    display(HTML(html))


def getMAE(real, pred):
    errors = real - pred
    return errors.abs().mean()

def getRMSE(real, pred):
    errors = real - pred
    return math.sqrt(errors.pow(2).mean())

- `displayMovies`: helper for visualizing movie data
- `getMAE`, `getRMSE`: evaluation metric functions
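To make the two metrics concrete, here is a small sketch that repeats the definitions of `getMAE` and `getRMSE` from the cell above and applies them to three made-up ratings (the `real` and `pred` values are assumptions for illustration).

```python
import math
import pandas as pd

def getMAE(real, pred):
    # Mean of the absolute errors.
    errors = real - pred
    return errors.abs().mean()

def getRMSE(real, pred):
    # Square root of the mean squared error.
    errors = real - pred
    return math.sqrt(errors.pow(2).mean())

# Hypothetical ratings: the errors are 1, 0, and 2.
real = pd.Series([4.0, 3.0, 5.0])
pred = pd.Series([3.0, 3.0, 3.0])

mae = getMAE(real, pred)    # (1 + 0 + 2) / 3
rmse = getRMSE(real, pred)  # sqrt((1 + 0 + 4) / 3)
```

RMSE squares the errors before averaging, so the single error of 2 weighs more heavily in it than in MAE.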

In [20]:
ratings = pd.read_csv('data/ratings-9_1.csv')

train = ratings[ratings['type'] == 'train'][['userId', 'movieId', 'rating']]
test = ratings[ratings['type'] == 'test'][['userId', 'movieId', 'rating']]

Set test user ID

In [21]:
userId = 33

Check top rated movies of the test user

In [22]:
userRatings = train[train['userId'] == userId][['movieId', 'rating']] 

topRatings = userRatings.sort_values(by='rating', ascending=False).head(20)

topRatings

displayMovies(movies, topRatings['movieId'].values, topRatings['rating'].values)

Predict item ratings for the test user.

In [23]:
userRatings

Unnamed: 0,movieId,rating
6176,19,3.0
6177,88,3.0
6178,157,1.0
6179,231,3.0
6180,344,4.0
...,...,...
6309,5282,4.0
6310,5339,4.0
6311,5483,4.0
6312,5669,4.0


In [24]:
recSimSums = sims.loc[userRatings['movieId'].values, :].sum().values

# recSimSums = recSimSums + 1

recWeightedRatingSums = np.matmul(sims.loc[userRatings['movieId'].values, :].T.values, userRatings['rating'].values)

recItemRatings = pd.DataFrame(data = np.divide(recWeightedRatingSums, recSimSums), index=sims.index)

recItemRatings.columns = ['pred']


In [25]:
'''
- Prediction for user 33
- Keep only the cosine similarities of the movies the user rated, shape (9125, 129):
      sims.loc[userRatings['movieId'].values, :].T.values
- Multiply by the user's ratings to get a similarity-weighted rating sum for every movie,
  (9125, 129) x (129, 1) = (9125, 1):
      recWeightedRatingSums
- Sum the cosine similarities to the user's rated movies for every movie,
  (9125, 129) summed over the rated movies = (9125, 1):
      recSimSums
- Divide the weighted sums by the similarity sums to get a weighted average,
  (9125, 1) / (9125, 1) = predicted ratings for userId 33, shape (9125, 1):
      recItemRatings
'''

In [26]:
recItemRatings

movieId,pred
1,3.14
2,2.96
3,3.29
4,3.29
5,3.28
...,...
162672,3.20
163056,2.95
163949,3.21
164977,3.28
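The predictions above are similarity-weighted averages of the user's ratings: $\hat{r}_{u,i} = \frac{\sum_{j \in R_u} sim(i,j)\, r_{u,j}}{\sum_{j \in R_u} sim(i,j)}$, where $R_u$ is the set of movies user $u$ rated. The toy sketch below walks through the same three steps on a made-up 3×3 similarity matrix; all values and names are assumptions for illustration.

```python
import numpy as np

# Hypothetical item-item similarity matrix (symmetric, ones on the diagonal).
sims = np.array([
    [1.0, 0.5, 0.0],
    [0.5, 1.0, 0.2],
    [0.0, 0.2, 1.0],
])

rated_idx = np.array([0, 2])    # the user rated movies 0 and 2
ratings = np.array([4.0, 2.0])  # with these ratings

rated_sims = sims[rated_idx, :]         # shape (2, 3): rated movies x all movies
sim_sums = rated_sims.sum(axis=0)       # shape (3,): similarity mass per movie
weighted_sums = rated_sims.T @ ratings  # shape (3,): similarity-weighted rating sums
pred = weighted_sums / sim_sums         # weighted-average prediction per movie
```

For the unrated movie 1 the prediction is (0.5·4 + 0.2·2) / (0.5 + 0.2) ≈ 3.43, pulled toward the rating of the more similar movie 0.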


MAE and RMSE scores for userId 33

In [None]:
userTestRatings = pd.DataFrame(data=test[test['userId'] == userId])

temp = userTestRatings.join(recItemRatings.loc[userTestRatings['movieId']], on='movieId')

mae = getMAE(temp['rating'], temp['pred'])
rmse = getRMSE(temp['rating'], temp['pred'])

print(f"MAE : {mae:.4f}")
print(f"RMSE: {rmse:.4f}")

MAE : 0.8653
RMSE: 0.9781


In [28]:
top30Movies = recItemRatings.sort_values(by='pred', ascending=False).head(30)

displayMovies(movies, top30Movies.index, top30Movies['pred'].values)

## Predictions for All userIds

In [None]:
# totalrecItemRatings =  pd.DataFrame(index=sims.index)
TotalMAE = []
TotalRMSE = []
for userId in train['userId'].unique():
    userRatings = train[train['userId'] == userId][['movieId', 'rating']]

    recSimSums = sims.loc[userRatings['movieId'].values, :].sum().values

    recWeightedRatingSums = np.matmul(sims.loc[userRatings['movieId'].values, :].T.values, userRatings['rating'].values)
    
    recItemRatings = pd.DataFrame(data = np.divide(recWeightedRatingSums, recSimSums), index=sims.index)
    recItemRatings.columns = ['pred']
    
    userTestRatings = pd.DataFrame(data=test[test['userId'] == userId])

    temp = userTestRatings.join(recItemRatings.loc[userTestRatings['movieId']], on='movieId')

    mae = getMAE(temp['rating'], temp['pred'])
    rmse = getRMSE(temp['rating'], temp['pred'])
    TotalMAE.append(mae)
    TotalRMSE.append(rmse)

## userIds That Cannot Be Scored

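- For some userIds the ratings are so sparse that there are movies for which both recSimSums and recWeightedRatingSums are 0 (0/0 = nan)
- As a result, the predicted rating for those movies is nan
- When the predictions matched against a user's test ratings are all nan, the MAE and RMSE also come out as nan (pandas `mean()` skips individual nan values by default)
- Therefore, userIds whose MAE or RMSE is nan are filtered out before the overall MAE and RMSE scores are computed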
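The 0/0 case can be reproduced and handled explicitly. The sketch below is one possible guard, not what the notebook does: it uses the `where` and `out` arguments of `np.divide` so that entries with a zero similarity sum are left as nan without triggering a runtime warning. The two-element arrays are made up for illustration.

```python
import numpy as np

weighted_sums = np.array([6.0, 0.0])
sim_sums = np.array([3.0, 0.0])  # second movie shares no genre with any rated movie

# Divide only where the denominator is nonzero; other entries keep the nan
# placed in `out`, so no "invalid value" warning is raised.
pred = np.divide(
    weighted_sums,
    sim_sums,
    out=np.full_like(weighted_sums, np.nan),
    where=sim_sums != 0,
)
```

The nan entries can then be dropped, or replaced with a fallback such as the user's mean rating, before scoring; which choice is appropriate depends on the evaluation setup.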

In [30]:

for userId in [186]:
    userRatings = train[train['userId'] == userId][['movieId', 'rating']]

    recSimSums = sims.loc[userRatings['movieId'].values, :].sum().values

    recWeightedRatingSums = np.matmul(sims.loc[userRatings['movieId'].values, :].T.values, userRatings['rating'].values)
    
    recItemRatings = pd.DataFrame(data = np.divide(recWeightedRatingSums, recSimSums), index=sims.index)
    recItemRatings.columns = ['pred']
    
    userTestRatings = pd.DataFrame(data=test[test['userId'] == userId])

    temp = userTestRatings.join(recItemRatings.loc[userTestRatings['movieId']], on='movieId')

    mae = getMAE(temp['rating'], temp['pred'])
    rmse = getRMSE(temp['rating'], temp['pred'])
print(mae)
print(rmse)

nan
nan



## Overall MAE and RMSE Scores

In [54]:
# Average the per-user scores, ignoring users whose score is nan.
print(f"MAE : {np.nanmean(TotalMAE):.2f}")
print(f"RMSE : {np.nanmean(TotalRMSE):.2f}")

MAE : 0.72
RMSE : 0.86


Compute MAE and RMSE for the test user.