# **E15. Movie Recommendation Algorithm**

**INDEX**

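- 00. Importing the Required Modules

- 01. Data Preparation & Preprocessing

- 02. Analysis

- 03. Adding Data

- 04. Building the CSR Matrix

- 05. Model Training

- 06. Checking the Training Result

- 07. Testing the Hypothesis

- 08. What If I Pick From the Existing Dataset?

- 09. Retrospective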

---

## **00. Importing the Required Modules**

In [1]:
from google.colab import drive
drive.mount('/content/mydrive')

Mounted at /content/mydrive


In [2]:
!pip install implicit

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting implicit
  Downloading implicit-0.5.2-cp37-cp37m-manylinux2014_x86_64.whl (18.5 MB)
Installing collected packages: implicit
Successfully installed implicit-0.5.2


In [3]:
import os
import numpy as np
import pandas as pd

from implicit.als import AlternatingLeastSquares
from scipy.sparse import csr_matrix

print("Done!")

Done!


## **01. Data Preparation & Preprocessing**

In [4]:
rating_file_path = '/content/mydrive/MyDrive/AIFFEL/E15/data/ratings.dat'
ratings_cols = ['user_id', 'movie_id', 'ratings', 'timestamp']
ratings = pd.read_csv(rating_file_path, sep='::', names=ratings_cols, engine='python', encoding = "ISO-8859-1")
original_data_size = len(ratings)
ratings.head()

|   | user_id | movie_id | ratings | timestamp |
|---|---|---|---|---|
| 0 | 1 | 1193 | 5 | 978300760 |
| 1 | 1 | 661 | 3 | 978302109 |
| 2 | 1 | 914 | 3 | 978301968 |
| 3 | 1 | 3408 | 4 | 978300275 |
| 4 | 1 | 2355 | 5 | 978824291 |


In [5]:
# Keep only ratings of 3 or higher
ratings = ratings[ratings['ratings'] >= 3]
filtered_data_size = len(ratings)

print(f'original data size: {original_data_size}, filtered data size: {filtered_data_size}')
print(f'Ratio of remaining data is {filtered_data_size / original_data_size}')

original data size: 1000209, filtered data size: 836478
Ratio of remaining data is 0.8363032126285607


In [6]:
# Rename the ratings column to counts
# (reassigning instead of inplace=True avoids pandas' SettingWithCopyWarning on the filtered slice)
ratings = ratings.rename(columns={'ratings': 'counts'})


In [7]:
ratings['counts']

0          5
1          3
2          3
3          4
4          5
          ..
1000203    3
1000205    5
1000206    5
1000207    4
1000208    4
Name: counts, Length: 836478, dtype: int64

In [8]:
# Load the movie metadata so titles can be shown
movie_file_path = '/content/mydrive/MyDrive/AIFFEL/E15/data/movies.dat'
cols = ['movie_id', 'title', 'genre'] 
movies = pd.read_csv(movie_file_path, sep='::', names=cols, engine='python', encoding='ISO-8859-1')
movies.head()

|   | movie_id | title | genre |
|---|---|---|---|
| 0 | 1 | Toy Story (1995) | Animation\|Children's\|Comedy |
| 1 | 2 | Jumanji (1995) | Adventure\|Children's\|Fantasy |
| 2 | 3 | Grumpier Old Men (1995) | Comedy\|Romance |
| 3 | 4 | Waiting to Exhale (1995) | Comedy\|Drama |
| 4 | 5 | Father of the Bride Part II (1995) | Comedy |


In [9]:
# Load the user information
user_file_path = '/content/mydrive/MyDrive/AIFFEL/E15/data/users.dat'
cols = ['user_id', 'gender', 'age', 'occupation', 'zip-code'] 
users = pd.read_csv(user_file_path, sep='::', names = cols, engine='python', encoding='ISO-8859-1')
users.head()

|   | user_id | gender | age | occupation | zip-code |
|---|---|---|---|---|---|
| 0 | 1 | F | 1 | 10 | 48067 |
| 1 | 2 | M | 56 | 16 | 70072 |
| 2 | 3 | M | 25 | 15 | 55117 |
| 3 | 4 | M | 45 | 7 | 2460 |
| 4 | 5 | M | 25 | 20 | 55455 |


---

## **02. Analysis**

In [10]:
ratings.head()

|   | user_id | movie_id | counts | timestamp |
|---|---|---|---|---|
| 0 | 1 | 1193 | 5 | 978300760 |
| 1 | 1 | 661 | 3 | 978302109 |
| 2 | 1 | 914 | 3 | 978301968 |
| 3 | 1 | 3408 | 4 | 978300275 |
| 4 | 1 | 2355 | 5 | 978824291 |


In [11]:
# Number of unique movies in ratings
ratings['movie_id'].nunique()

3628

In [12]:
# Number of unique users in ratings
ratings['user_id'].nunique()

6039
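With 836,478 ratings spread over 6,039 users and 3,628 movies, only a small fraction of the user-movie grid is filled, which is why a sparse CSR matrix is used later. A quick back-of-the-envelope check, using only the counts printed above:

```python
# Counts taken from the outputs above (after keeping ratings >= 3)
num_ratings = 836478
num_users = 6039
num_movies = 3628

# Fraction of user-movie cells that actually hold a rating
density = num_ratings / (num_users * num_movies)
print(f"density: {density:.4f} ({density:.2%} of cells are filled)")
```

So more than 96% of the cells are empty, and storing them densely would waste almost all of the memory.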

In [13]:
# Top 30 most popular movies, most popular first
# "Popular" = watched by many users, i.e. the largest number of ratings per movie

movie_count = ratings.groupby('movie_id')['user_id'].count()
top30 = movie_count.sort_values(ascending=False)[:30]
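`groupby('movie_id')['user_id'].count()` followed by a descending sort gives the same ranking as `value_counts()`, which already returns counts in descending order. The sketch below runs on a tiny made-up DataFrame (hypothetical rows, not MovieLens data) and attaches titles with `Index.map` instead of filtering `movies` once per movie in a loop:

```python
import pandas as pd

# Hypothetical toy data with the same column names as ratings / movies
ratings = pd.DataFrame({
    "user_id":  [1, 1, 2, 2, 3, 3, 3],
    "movie_id": [10, 20, 10, 30, 10, 20, 40],
})
movies = pd.DataFrame({
    "movie_id": [10, 20, 30, 40],
    "title": ["Movie A", "Movie B", "Movie C", "Movie D"],
})

# Number of ratings per movie, already sorted in descending order
movie_count = ratings["movie_id"].value_counts()
top2 = movie_count.head(2)

# Attach titles through a movie_id -> title lookup table
title_of = movies.set_index("movie_id")["title"]
top2_named = pd.DataFrame({"title": top2.index.map(title_of), "count": top2.values})
print(top2_named)
```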

In [14]:
# Number of ratings for each of the top 30 movies
for i, k in zip(top30.index, top30.values):
    print(movies[movies['movie_id']==i]['title'].values[0], '\ncount:', k, '\n')

American Beauty (1999) 
count: 3211 

Star Wars: Episode IV - A New Hope (1977) 
count: 2910 

Star Wars: Episode V - The Empire Strikes Back (1980) 
count: 2885 

Star Wars: Episode VI - Return of the Jedi (1983) 
count: 2716 

Saving Private Ryan (1998) 
count: 2561 

Terminator 2: Judgment Day (1991) 
count: 2509 

Silence of the Lambs, The (1991) 
count: 2498 

Raiders of the Lost Ark (1981) 
count: 2473 

Back to the Future (1985) 
count: 2460 

Matrix, The (1999) 
count: 2434 

Jurassic Park (1993) 
count: 2413 

Sixth Sense, The (1999) 
count: 2385 

Fargo (1996) 
count: 2371 

Braveheart (1995) 
count: 2314 

Men in Black (1997) 
count: 2297 

Schindler's List (1993) 
count: 2257 

Princess Bride, The (1987) 
count: 2252 

Shakespeare in Love (1998) 
count: 2213 

L.A. Confidential (1997) 
count: 2210 

Shawshank Redemption, The (1994) 
count: 2194 

Godfather, The (1972) 
count: 2167 

Groundhog Day (1993) 
count: 2121 

E.T. the Extra-Terrestrial (1982) 
count: 2102 

Being J

## **03. Adding Data**

In [47]:
# Add five of my favorite movies
# First, check whether they are already in the dataset

# In Time
movies[movies['title'].str.contains('In Time')]

|   | movie_id | title | genre |
|---|---|---|---|


In [48]:
# Life of Pi
movies[movies['title'].str.contains('Life of Pi')]

|   | movie_id | title | genre |
|---|---|---|---|


In [49]:
# The Dark Knight
movies[movies['title'].str.contains('Knight')]

|   | movie_id | title | genre |
|---|---|---|---|
| 166 | 168 | First Knight (1995) | Action\|Adventure\|Drama\|Romance |
| 324 | 328 | Tales From the Crypt Presents: Demon Knight (1... | Horror |
| 3492 | 3561 | Stacy's Knights (1982) | Drama |
| 3550 | 3619 | Hollywood Knights, The (1980) | Comedy |
| 3736 | 3805 | Knightriders (1981) | Action\|Adventure\|Drama |


In [50]:
# The Platform
movies[movies['title'].str.contains('Platform')]

|   | movie_id | title | genre |
|---|---|---|---|


In [51]:
# Dunkirk
movies[movies['title'].str.contains('Dunkirk')]

|   | movie_id | title | genre |
|---|---|---|---|


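None of them are in the dataset (the "Knight" search only matches unrelated titles), probably because the dataset is old: the most recent movies in it are from 2000.

So I'll add them myself.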
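The cells below hard-code the new ids (user 6041, movies 3953 to 3957), which match the current maxima shown by `tail()`. A slightly safer pattern derives the ids from those maxima so they can never collide with existing ones. This is a sketch with a hypothetical helper, `append_favorites`, run on toy DataFrames that only mimic the real column names:

```python
import pandas as pd


def append_favorites(ratings, movies, user_id, favorites):
    """Hypothetical helper: register new movies and one user's ratings of them.

    favorites is a list of (title, genre, rating) tuples. New movie ids are
    assigned right after the current maximum movie_id, so they cannot collide.
    """
    first_id = movies["movie_id"].max() + 1
    new_ids = list(range(first_id, first_id + len(favorites)))

    new_movies = pd.DataFrame({
        "movie_id": new_ids,
        "title": [title for title, _, _ in favorites],
        "genre": [genre for _, genre, _ in favorites],
    })
    new_ratings = pd.DataFrame({
        "user_id": user_id,                     # scalar, repeated for every row
        "movie_id": new_ids,
        "counts": [rating for _, _, rating in favorites],
        "timestamp": 0,                         # no real timestamp, as in the cells below
    })

    # ignore_index=True renumbers rows, like concat + reset_index(drop=True)
    return (pd.concat([ratings, new_ratings], ignore_index=True),
            pd.concat([movies, new_movies], ignore_index=True))


# Toy usage with one existing user and one existing movie
ratings = pd.DataFrame({"user_id": [1], "movie_id": [1], "counts": [5], "timestamp": [978300760]})
movies = pd.DataFrame({"movie_id": [1], "title": ["Toy Story (1995)"], "genre": ["Animation"]})

new_user_id = ratings["user_id"].max() + 1
ratings2, movies2 = append_favorites(
    ratings, movies, new_user_id,
    [("In Time", "Action", 4), ("Dunkirk", "War", 5)],
)
print(ratings2)
print(movies2)
```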

In [52]:
ratings.head()

|   | user_id | movie_id | counts | timestamp |
|---|---|---|---|---|
| 0 | 1 | 1193 | 5 | 978300760 |
| 1 | 1 | 661 | 3 | 978302109 |
| 2 | 1 | 914 | 3 | 978301968 |
| 3 | 1 | 3408 | 4 | 978300275 |
| 4 | 1 | 2355 | 5 | 978824291 |


In [53]:
movies.tail()

|   | movie_id | title | genre |
|---|---|---|---|
| 3878 | 3948 | Meet the Parents (2000) | Comedy |
| 3879 | 3949 | Requiem for a Dream (2000) | Drama |
| 3880 | 3950 | Tigerland (2000) | Drama |
| 3881 | 3951 | Two Family House (2000) | Drama |
| 3882 | 3952 | Contender, The (2000) | Drama\|Thriller |


In [54]:
users.tail()

|   | user_id | gender | age | occupation | zip-code |
|---|---|---|---|---|---|
| 6035 | 6036 | F | 25 | 15 | 32603 |
| 6036 | 6037 | F | 45 | 1 | 76006 |
| 6037 | 6038 | F | 56 | 1 | 14706 |
| 6038 | 6039 | F | 45 | 0 | 1060 |
| 6039 | 6040 | M | 25 | 6 | 11106 |


In [73]:
new_rat_col = ratings.columns
new_mov_col = movies.columns
new_user_col = users.columns

new_rat = ratings.copy()
new_mov = movies.copy()
new_user = users.copy()

# Rows to add to ratings
my_ratings = [[6041, 3953, 5, 0],
              [6041, 3954, 5, 0],
              [6041, 3955, 4, 0],
              [6041, 3956, 4, 0],
              [6041, 3957, 5, 0]]

# Rows to add to movies
my_movies = [[3953, 'The Platform', 'Thriller'],
             [3954, 'Dunkirk', 'War'],
             [3955, 'Life of Pi', 'Adventure'],
             [3956, 'In Time', 'Action'],
             [3957, 'Batman: the Dark Knight', 'Action']]

# Row to add to users (that's me)
its_me = [[6041, 'M', 28, 4, 00000]]

print("Done!")

Done!


In [74]:
# Append the new rows to ratings
my_rat_df = pd.DataFrame(data=my_ratings, columns=new_rat_col)
new_ratings = pd.concat([new_rat, my_rat_df], axis=0)

# Reset the index
new_ratings.reset_index(drop=True, inplace=True)
new_ratings.tail(8)

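|   | user_id | movie_id | counts | timestamp |
|---|---|---|---|---|
| 836475 | 6040 | 562 | 5 | 956704746 |
| 836476 | 6040 | 1096 | 4 | 956715648 |
| 836477 | 6040 | 1097 | 4 | 956715569 |
| 836478 | 6041 | 3953 | 5 | 0 |
| 836479 | 6041 | 3954 | 5 | 0 |
| 836480 | 6041 | 3955 | 4 | 0 |
| 836481 | 6041 | 3956 | 4 | 0 |
| 836482 | 6041 | 3957 | 5 | 0 |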


In [57]:
# Append the new rows to movies
my_mov_df = pd.DataFrame(data=my_movies, columns=new_mov_col)
new_movies = pd.concat([new_mov, my_mov_df], axis=0)

# Reset the index
new_movies.reset_index(drop=True, inplace=True)
new_movies.tail(8)

|   | movie_id | title | genre |
|---|---|---|---|
| 3880 | 3950 | Tigerland (2000) | Drama |
| 3881 | 3951 | Two Family House (2000) | Drama |
| 3882 | 3952 | Contender, The (2000) | Drama\|Thriller |
| 3883 | 3953 | The Platform | Thriller |
| 3884 | 3954 | Dunkirk | War |
| 3885 | 3955 | Life of Pi | Adventure |
| 3886 | 3956 | In Time | Action |
| 3887 | 3957 | Batman: the Dark Knight | Action |


In [58]:
# Append the new row to users
my_user_df = pd.DataFrame(data=its_me, columns=new_user_col)
new_users = pd.concat([new_user, my_user_df], axis=0)

# Reset the index
new_users.reset_index(drop=True, inplace=True)
new_users.tail(8)

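|   | user_id | gender | age | occupation | zip-code |
|---|---|---|---|---|---|
| 6033 | 6034 | M | 25 | 14 | 94117 |
| 6034 | 6035 | F | 25 | 1 | 78734 |
| 6035 | 6036 | F | 25 | 15 | 32603 |
| 6036 | 6037 | F | 45 | 1 | 76006 |
| 6037 | 6038 | F | 56 | 1 | 14706 |
| 6038 | 6039 | F | 45 | 0 | 1060 |
| 6039 | 6040 | M | 25 | 6 | 11106 |
| 6040 | 6041 | M | 28 | 4 | 0 |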


---

## **04. Building the CSR Matrix**

In [59]:
num_users = new_ratings['user_id'].nunique()
num_movies = new_ratings['movie_id'].nunique()

csr_data = csr_matrix((new_ratings['counts'], (new_ratings.user_id, new_ratings.movie_id)))
csr_data

<6042x3958 sparse matrix of type '<class 'numpy.longlong'>'
	with 836483 stored elements in Compressed Sparse Row format>
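Because no `shape` argument is passed, `csr_matrix` sizes the matrix from the largest indices it sees: rows = max `user_id` + 1 and columns = max `movie_id` + 1. That is why the result is 6042x3958 even though the ids start at 1: row 0 and column 0 simply stay empty. A small illustration with made-up ids:

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical toy ratings: user ids and movie ids start at 1, as in MovieLens
user_ids = np.array([1, 1, 2, 3])
movie_ids = np.array([1, 3, 2, 3])
counts = np.array([5, 3, 4, 5])

# Shape is inferred as (max(user_ids) + 1, max(movie_ids) + 1)
inferred = csr_matrix((counts, (user_ids, movie_ids)))
print(inferred.shape)   # row 0 and column 0 exist but hold no ratings

# Passing shape explicitly reserves room for ids that have no ratings yet
explicit = csr_matrix((counts, (user_ids, movie_ids)), shape=(5, 6))
print(explicit.shape, explicit.nnz)
```

Using the raw ids as indices keeps the lookup `user_factors[6041]` trivial later on, at the cost of one unused row and column.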

---

## **05. Model Training**

In [60]:
os.environ['OPENBLAS_NUM_THREADS']='1'
os.environ['KMP_DUPLICATE_LIB_OK']='True'
os.environ['MKL_NUM_THREADS']='1'

print("Done!")

Done!


In [61]:
'''
factors : dimensionality of the user and item vectors
regularization : how much regularization to apply to prevent overfitting
use_gpu : whether to use the GPU
iterations : same idea as epochs; how many times to iterate over the data
'''

factors = 100
regularization = 0.01
use_gpu = False
iterations = 20

# Declare the model
als_model = AlternatingLeastSquares(factors=factors, regularization=regularization,
                                    use_gpu=use_gpu, iterations=iterations, dtype=np.float32)
print("Done!")

Done!
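For intuition about what `factors`, `regularization` and `iterations` control, here is a minimal dense NumPy sketch of implicit-feedback ALS. It assumes the classic formulation: every observed value is a confidence weight on a preference of 1, unobserved cells get confidence 1 and preference 0, and user vectors and item vectors are solved in alternation by regularized least squares. It is an illustration under those assumptions, not the `implicit` library's code, which uses sparse and much faster solvers.

```python
import numpy as np


def implicit_als(R, factors=2, regularization=0.01, iterations=10, seed=0):
    """Dense sketch of implicit-feedback ALS (illustrative only).

    R: (num_users, num_items) array; R[u, i] > 0 means user u interacted with
    item i, and the value is used as the confidence weight (assumption).
    Returns user factors X, item factors Y and the loss after each iteration.
    """
    rng = np.random.default_rng(seed)
    num_users, num_items = R.shape
    X = rng.normal(scale=0.01, size=(num_users, factors))
    Y = rng.normal(scale=0.01, size=(num_items, factors))

    P = (R > 0).astype(float)       # preference: 1 if observed, else 0
    C = np.where(R > 0, R, 1.0)     # confidence: the value if observed, else 1
    reg = regularization * np.eye(factors)

    def loss():
        err = C * (P - X @ Y.T) ** 2
        return err.sum() + regularization * ((X ** 2).sum() + (Y ** 2).sum())

    history = []
    for _ in range(iterations):
        # Fix items, solve each user: (Y^T C_u Y + lambda I) x_u = Y^T C_u p_u
        for u in range(num_users):
            YtCu = Y.T * C[u]
            X[u] = np.linalg.solve(YtCu @ Y + reg, YtCu @ P[u])
        # Fix users, solve each item the same way
        for i in range(num_items):
            XtCi = X.T * C[:, i]
            Y[i] = np.linalg.solve(XtCi @ X + reg, XtCi @ P[:, i])
        history.append(loss())
    return X, Y, history


# Toy matrix: users 0-1 like items 0-1, users 2-3 like items 2-3
R = np.array([
    [5, 4, 0, 0],
    [4, 5, 0, 0],
    [0, 0, 5, 3],
    [0, 0, 4, 5],
], dtype=float)
X, Y, history = implicit_als(R, factors=2, regularization=0.01, iterations=10)
scores = X @ Y.T   # predicted preference for every user-item pair
print(np.round(scores, 2))
```

Here `factors` is the width of each row of `X` and `Y`, `regularization` is the lambda added to every solve, and `iterations` is the number of alternating rounds. Because each half-step solves its least-squares problem exactly, the loss never increases from one iteration to the next.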


In [62]:
# Train the model
als_model.fit(csr_data)


---

## **06. Checking the Training Result**

In [63]:
# Look up a movie's id from its exact title (in new_movies)
def movie_name_to_id(name):
    return new_movies[new_movies['title']==name]['movie_id'].values[0]

print("Done!")

Done!


In [64]:
my_vector = als_model.user_factors[6041]
in_time = als_model.item_factors[movie_name_to_id('In Time')]

print("Done!")

Done!


In [65]:
my_vector

array([ 4.00223257e-03,  1.68437837e-03, -4.00686497e-03, -6.45541213e-03,
        3.45533434e-03,  3.17071157e-04, -8.85669215e-05, -6.65653916e-03,
       -1.75411068e-03, -3.48537316e-04,  5.83201181e-03,  5.05269319e-03,
       -7.02544674e-03,  4.85215988e-03,  1.70353509e-03, -1.46722398e-03,
        1.37464271e-03,  3.89331050e-04, -8.22169066e-04, -4.86947270e-03,
       -9.86661995e-04,  9.55093931e-03,  1.98917603e-03, -3.93230992e-04,
       -1.19078955e-04, -4.28068248e-04, -5.59502980e-03, -5.09099846e-05,
       -5.96548012e-03,  7.76893599e-03,  5.47136366e-03,  6.96352276e-04,
        6.45817025e-03,  1.23612839e-03, -8.85134621e-04,  2.40942393e-03,
       -3.07470979e-03, -5.86367212e-03,  1.17189274e-03, -2.72993720e-03,
       -3.15979286e-03, -1.71567500e-03,  8.17230251e-03, -5.17968042e-03,
        3.60231148e-03, -7.79571710e-03, -1.67598645e-03, -3.14962352e-03,
        3.29306209e-03,  1.04140176e-03,  6.36393379e-04, -6.53559435e-03,
        9.16716736e-03, ...

In [66]:
in_time

array([ 2.76467210e-04,  2.07729085e-04,  5.24357602e-05,  9.71066765e-06,
        2.83809408e-04,  1.55006666e-04,  1.04994884e-04,  5.04001473e-05,
        1.46496692e-04,  1.03456419e-04,  2.52105121e-04,  1.54754089e-04,
       -5.04255149e-06,  2.78761931e-04,  1.55415648e-04,  8.33155063e-05,
        2.83583475e-04,  1.75557070e-04,  2.07899546e-04,  2.73987607e-05,
        1.71125110e-04,  3.74867785e-04,  8.66589035e-05,  1.81787749e-04,
        1.05187159e-04,  2.06221739e-04,  2.08843721e-05,  2.13963431e-04,
        5.68747309e-05,  2.50081910e-04,  3.06202914e-04,  1.24267375e-04,
        3.56679055e-04,  1.61717282e-04,  1.64365789e-04,  2.00016468e-04,
        1.34589252e-04,  1.58017301e-05,  1.32482892e-04,  6.12801887e-05,
        1.04893203e-04,  1.45062542e-04,  3.55627824e-04,  1.16580515e-04,
        2.30550781e-04, -6.86167914e-05,  1.65055753e-04,  5.51833873e-05,
        2.15814362e-04,  2.07277408e-04,  6.73404356e-05,  3.15628063e-06,
        3.71608126e-04, ...

In [67]:
np.dot(my_vector, in_time)

4.592808e-05

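Hmm, the dot product is tiny (about 4.6e-05), so the model sees almost no connection between me and In Time. Let's check how other movies look.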
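In a matrix-factorization model like this one, a user's predicted preference for a movie is the dot product of the user's factor vector and the movie's factor vector, so a value this close to zero means the model sees almost no affinity. Scoring every movie at once is a single matrix-vector product. The sketch uses random stand-in arrays (hypothetical, laid out like `als_model.user_factors` / `als_model.item_factors`, one row per id) and keeps the N highest scores without filtering anything:

```python
import numpy as np

rng = np.random.default_rng(42)
# Hypothetical stand-ins for als_model.user_factors / als_model.item_factors
user_factors = rng.normal(size=(5, 3)).astype(np.float32)   # 5 users, 3 factors
item_factors = rng.normal(size=(8, 3)).astype(np.float32)   # 8 items, 3 factors


def top_n_items(user_id, n=3):
    """Score every item for one user and return the n best (ids, scores)."""
    scores = item_factors @ user_factors[user_id]   # one dot product per item
    best = np.argsort(-scores)[:n]                 # highest scores first
    return best, scores[best]


ids, scores = top_n_items(user_id=2)
print(ids, scores)
# The top score equals a single np.dot, as in the cells above
print(np.isclose(scores[0], np.dot(user_factors[2], item_factors[ids[0]])))
```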

In [68]:
def mov_vec(movie_name):
    return als_model.item_factors[movie_name_to_id(movie_name)]

print("Done!")

Done!


In [69]:
np.dot(my_vector, mov_vec('Toy Story (1995)'))

0.00043907468

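The dot product is still very small (about 4.4e-04).

Since all five of my favorite movies are brand-new items that nobody else has rated, the model apparently learned very little about them, and therefore about me.

So let's add movies that already exist in the dataset as favorites and see what gets recommended.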
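One way to check this hypothesis directly is to count how many distinct users rated each of my favorite movies: a movie rated by a single user gives the model almost nothing to relate it to the rest of the catalog. A sketch on toy data with hypothetical ids; on the real data the same expression would run on `new_ratings` with the ids 3953 to 3957:

```python
import pandas as pd

# Hypothetical toy ratings: user 99 is "me", movies 100 and 101 are my new additions
ratings = pd.DataFrame({
    "user_id":  [1, 2, 3, 1, 2, 99, 99, 99],
    "movie_id": [10, 10, 10, 11, 11, 10, 100, 101],
    "counts":   [5, 4, 3, 4, 5, 5, 5, 4],
})
my_movies = [10, 100, 101]

# Number of distinct users who rated each of my movies
raters = (ratings[ratings["movie_id"].isin(my_movies)]
          .groupby("movie_id")["user_id"].nunique())
print(raters)

# Movies rated by a single user (only me, in this toy data)
only_me = raters[raters == 1].index.tolist()
print(only_me)
```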

---

## **07. Testing the Hypothesis**

In [75]:
# New rows for ratings: movies 1 to 5 from the existing dataset
my_ratings2 = [[6041, 1, 5, 0],
               [6041, 2, 5, 0],
               [6041, 3, 5, 0],
               [6041, 4, 5, 0],
               [6041, 5, 5, 0]]

In [76]:
# Append the new rows to ratings
my_rat_df2 = pd.DataFrame(data=my_ratings2, columns=new_rat_col)
new_ratings2 = pd.concat([new_ratings, my_rat_df2], axis=0)

# Reset the index
new_ratings2.reset_index(drop=True, inplace=True)
new_ratings2.tail(8)

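|   | user_id | movie_id | counts | timestamp |
|---|---|---|---|---|
| 836480 | 6041 | 3955 | 4 | 0 |
| 836481 | 6041 | 3956 | 4 | 0 |
| 836482 | 6041 | 3957 | 5 | 0 |
| 836483 | 6041 | 1 | 5 | 0 |
| 836484 | 6041 | 2 | 5 | 0 |
| 836485 | 6041 | 3 | 5 | 0 |
| 836486 | 6041 | 4 | 5 | 0 |
| 836487 | 6041 | 5 | 5 | 0 |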


In [77]:
num_users = new_ratings2['user_id'].nunique()
num_movies = new_ratings2['movie_id'].nunique()

csr_data = csr_matrix((new_ratings2['counts'], (new_ratings2.user_id, new_ratings2.movie_id)))
csr_data

<6042x3958 sparse matrix of type '<class 'numpy.longlong'>'
	with 836488 stored elements in Compressed Sparse Row format>

In [78]:
# Declare the model
als_model = AlternatingLeastSquares(factors=factors, regularization=regularization,
                                    use_gpu=use_gpu, iterations=iterations, dtype=np.float32)

# Train the model
als_model.fit(csr_data)


In [79]:
my_vector = als_model.user_factors[6041]
toy_story = als_model.item_factors[movie_name_to_id('Toy Story (1995)')]

np.dot(my_vector, toy_story)

0.48982108

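Now the value looks reasonable!

Toy Story (id 1) is now one of the movies I rated, and the dot product jumped to about 0.49. This supports the hypothesis that the model could not work properly because it had too little data about my movies.

Since the data is already there, let's also try getting similar-movie recommendations.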

In [80]:
# Recommend movies similar to my favorites

in_time_id = movie_name_to_id('In Time')
life_of_pi_id = movie_name_to_id('Life of Pi')
dunkirk_id = movie_name_to_id('Dunkirk')
dark_knight_id = movie_name_to_id('Batman: the Dark Knight')
the_platform_id = movie_name_to_id('The Platform')

print("Done!")

Done!


In [81]:
similar_movie = als_model.similar_items(in_time_id, N=5)
similar_movie[0]

array([3956, 3955, 3954, 3957, 3953], dtype=int32)

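Hmm... that's not a useful recommendation: the top result is In Time itself (3956), and the rest are just the other four movies I added.

Let's try the other movies.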
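`similar_items` ranks items by how close their factor vectors are. As far as I know, for implicit's matrix-factorization models this is cosine similarity between item factors, so the query movie is always its own best match with a similarity of 1, which explains why each list below starts with the movie that was asked about. A NumPy sketch of that ranking with random stand-in factors (hypothetical data, not the trained model):

```python
import numpy as np

rng = np.random.default_rng(0)
item_factors = rng.normal(size=(6, 4))   # hypothetical: 6 items, 4 factors


def similar_items(item_id, n=3):
    """Rank items by cosine similarity of their factor vectors."""
    norms = np.linalg.norm(item_factors, axis=1)
    sims = item_factors @ item_factors[item_id] / (norms * norms[item_id])
    best = np.argsort(-sims)[:n]
    return best, sims[best]


ids, sims = similar_items(2)
print(ids, np.round(sims, 3))   # the query item itself comes first, with similarity 1
```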

In [82]:
similar_movie = als_model.similar_items(life_of_pi_id, N=5)
similar_movie[0]

array([3955, 3956, 3957, 3954, 3953], dtype=int32)

In [83]:
similar_movie = als_model.similar_items(dunkirk_id, N=5)
similar_movie[0]

array([3954, 3953, 3956, 3955, 3957], dtype=int32)

In [84]:
similar_movie = als_model.similar_items(dark_knight_id, N=5)
similar_movie[0]

array([3957, 3955, 3956, 3954, 3953], dtype=int32)

In [85]:
similar_movie = als_model.similar_items(the_platform_id, N=5)
similar_movie[0]

array([3953, 3954, 3956, 3955, 3957], dtype=int32)

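Every query returns only the five movies I added, so it seems the model still doesn't really know my taste.

Then can it at least recommend movies I might like?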

In [87]:
# Recommend movies for user 6041
user = 6041
movie_recommended = als_model.recommend(user, csr_data, N=10, filter_already_liked_items=False)
movie_recommended

(array([3114,    1,    2,  317,    3, 2355, 3489,  367,  586,  588],
       dtype=int32),
 array([0.5289215 , 0.48982108, 0.30776945, 0.26866412, 0.2502007 ,
        0.23429604, 0.22900808, 0.22796173, 0.21754825, 0.2161132 ],
       dtype=float32))

In [88]:
movie_recommended[0]

array([3114,    1,    2,  317,    3, 2355, 3489,  367,  586,  588],
      dtype=int32)

In [89]:
for movie_id in movie_recommended[0]:
    # Look up and print the title of each recommended movie id
    print(movies.loc[movies['movie_id'] == movie_id, 'title'].values[0])

Toy Story 2 (1999)
Toy Story (1995)
Jumanji (1995)
Santa Clause, The (1994)
Grumpier Old Men (1995)
Bug's Life, A (1998)
Hook (1991)
Mask, The (1994)
Home Alone (1990)
Aladdin (1992)


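I know Jumanji, Toy Story, A Bug's Life, Home Alone and Aladdin.

They suit my taste, too. The rest are older movies, so I'm not familiar with them.

None of the five new movies I added (3953 to 3957) shows up; the list looks driven entirely by the existing movies I rated (ids 1 to 5). This seems to be exactly the cold start problem.

So would the results be different if I picked all five favorites from the existing dataset?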
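Toy Story, Jumanji and Grumpier Old Men appear in this list because they are three of the five movies I rated in cell 75 (ids 1, 2, 3), and `recommend` was called with `filter_already_liked_items=False`, so already-rated movies were not excluded. Filtering amounts to dropping the items in the user's row of the CSR matrix before taking the top N. The sketch below shows that idea with toy scores and a hypothetical helper, `recommend_unseen`; it is not the library's implementation:

```python
import numpy as np
from scipy.sparse import csr_matrix


def recommend_unseen(scores, user_items, user_id, n=3):
    """Hypothetical helper: top-n item ids for one user, skipping rated items.

    scores: 1-D array of predicted scores for every item
    user_items: CSR matrix of shape (num_users, num_items)
    """
    scores = scores.astype(float)              # astype copies, so the caller's array is untouched
    rated = user_items[user_id].indices        # column indices of this user's rated items
    scores[rated] = -np.inf                    # rated items can never be picked
    best = np.argsort(-scores)[:n]
    return best[np.isfinite(scores[best])]     # drop masked items if n exceeds the unseen count


# Toy data: user 0 rated items 0 and 2, user 1 rated items 1 and 4
user_items = csr_matrix(np.array([[5, 0, 4, 0, 0],
                                  [0, 3, 0, 0, 5]]))
scores = np.array([0.9, 0.1, 0.8, 0.5, 0.3])
print(recommend_unseen(scores, user_items, user_id=0))
```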

---

## **08. What If I Pick From the Existing Dataset?**

In [90]:
ratings.tail()

|   | user_id | movie_id | counts | timestamp |
|---|---|---|---|---|
| 1000203 | 6040 | 1090 | 3 | 956715518 |
| 1000205 | 6040 | 1094 | 5 | 956704887 |
| 1000206 | 6040 | 562 | 5 | 956704746 |
| 1000207 | 6040 | 1096 | 4 | 956715648 |
| 1000208 | 6040 | 1097 | 4 | 956715569 |


In [91]:
final_rat = ratings.copy()

# Append my ratings (movies 1 to 5 only) to the original ratings
my_rat_df3 = pd.DataFrame(data=my_ratings2, columns=new_rat_col)
final_ratings = pd.concat([final_rat, my_rat_df3], axis=0)

# Reset the index
final_ratings.reset_index(drop=True, inplace=True)
final_ratings.tail(8)

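|   | user_id | movie_id | counts | timestamp |
|---|---|---|---|---|
| 836475 | 6040 | 562 | 5 | 956704746 |
| 836476 | 6040 | 1096 | 4 | 956715648 |
| 836477 | 6040 | 1097 | 4 | 956715569 |
| 836478 | 6041 | 1 | 5 | 0 |
| 836479 | 6041 | 2 | 5 | 0 |
| 836480 | 6041 | 3 | 5 | 0 |
| 836481 | 6041 | 4 | 5 | 0 |
| 836482 | 6041 | 5 | 5 | 0 |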


In [92]:
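# NOTE: this cell builds the matrix from new_ratings2, not final_ratings, so it still
# contains the five added movies (3953-3957); hence the same shape and element count
# as in section 07. Use final_ratings here to train on existing movies only.
num_users = new_ratings2['user_id'].nunique()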
num_movies = new_ratings2['movie_id'].nunique()

csr_data = csr_matrix((new_ratings2['counts'], (new_ratings2.user_id, new_ratings2.movie_id)))
csr_data

<6042x3958 sparse matrix of type '<class 'numpy.longlong'>'
	with 836488 stored elements in Compressed Sparse Row format>

In [93]:
# Declare the model
als_model = AlternatingLeastSquares(factors=factors, regularization=regularization,
                                    use_gpu=use_gpu, iterations=iterations, dtype=np.float32)

# Train the model
als_model.fit(csr_data)


In [96]:
movies.head()

|   | movie_id | title | genre |
|---|---|---|---|
| 0 | 1 | Toy Story (1995) | Animation\|Children's\|Comedy |
| 1 | 2 | Jumanji (1995) | Adventure\|Children's\|Fantasy |
| 2 | 3 | Grumpier Old Men (1995) | Comedy\|Romance |
| 3 | 4 | Waiting to Exhale (1995) | Comedy\|Drama |
| 4 | 5 | Father of the Bride Part II (1995) | Comedy |


In [98]:
# Recommend movies similar to my favorites

toy_story_id = movie_name_to_id('Toy Story (1995)')
jumanji_id = movie_name_to_id('Jumanji (1995)')
grumpier_id = movie_name_to_id('Grumpier Old Men (1995)')
exhale_id = movie_name_to_id('Waiting to Exhale (1995)')
father_id = movie_name_to_id('Father of the Bride Part II (1995)')

print("Done!")

Done!


In [99]:
similar_movie = als_model.similar_items(toy_story_id, N=5)
similar_movie[0]

array([   1, 3114,  588, 2355,   34], dtype=int32)

In [100]:
similar_movie = als_model.similar_items(jumanji_id, N=5)
similar_movie[0]

array([   2, 3489,   60,  653, 2162], dtype=int32)

In [101]:
similar_movie = als_model.similar_items(grumpier_id, N=5)
similar_movie[0]

array([   3, 3450,  276,  804, 3953], dtype=int32)

In [102]:
similar_movie = als_model.similar_items(exhale_id, N=5)
similar_movie[0]

array([   4,  218, 1410, 1621, 2154], dtype=int32)

In [103]:
similar_movie = als_model.similar_items(father_id, N=5)
similar_movie[0]

array([   5, 2082,  186,  355, 2953], dtype=int32)

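For existing movies that many users have rated, `similar_items` clearly returns a variety of other movies instead of just my own list (for Toy Story: Toy Story 2, Aladdin and A Bug's Life). With enough data, the recommendations really do become more diverse.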

---

## **09. Retrospective**

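- I got to see the cold start problem with my own eyes, and yet it was fascinating that the model still recommended different movies in its own way (even if they overlapped somewhat in genre or overall identity). How does it do that?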

- An error partway through sent me to the library's GitHub repository to dig through its source code, which took quite a while.