#Finding Other Movies a Star Wars Fan Might Like

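Development Environment
<br/>Data Information

Data Preprocessing
<br/>Implicit Feedback
<br/>Unique Values
<br/>Missing Value Handling

Data Exploration
<br/>Movie Ranking
<br/>Movie Reviews

Model Configuration
<br/>CSR Matrix
<br/>Hyperparameters

Model Training
<br/>Model Evaluation
<br/>Similar Items
<br/>Recommendation
<br/>Conclusion
<br/>References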

# Development Environment

In [None]:
!pip install implicit

In [None]:
import os
import random
import pandas as pd
import numpy as np

In [None]:
import scipy
from scipy.sparse import csr_matrix

In [None]:
import implicit
from implicit.als import AlternatingLeastSquares

os.environ['OPENBLAS_NUM_THREADS']='1'
os.environ['KMP_DUPLICATE_LIB_OK']='True'
os.environ['MKL_NUM_THREADS']='1'

In [None]:
from google.colab import drive
drive.mount('/content/drive')

In [None]:
pip freeze > '/content/drive/MyDrive/lms/library_version.txt'

In [None]:
library_name = ['pandas=', 'numpy=', 'scipy=', 'implicit=']
library_version = []
# Collect the version lines of the libraries of interest.
with open('/content/drive/MyDrive/lms/library_version.txt', 'r') as f:
    for line in f:  # iterate over every line, including the first one
        for i in library_name:
            if i in line:
                library_version.append(line)
                library_version.append('    ')

import sys
print(sys.version)
print()

for i in range(0, len(library_version) - 1, 6):
  print(str(library_version[i : i+6]).replace("[","").replace("]","").replace("'","").replace("\\n","").replace(",",""), end='') 
  if i % 6 == 0:
    print()

for i in range(len(library_version) - 1):
  if (i-1) % 6 == 0 and i == len(library_version) - 6:
    print(str(library_version[-1]).replace("[","").replace("]","").replace("'","").replace("\\n","").replace(",",""), end='')

3.7.13 (default, Apr 24 2022, 01:04:09) 
[GCC 7.5.0]

implicit==0.5.2      numpy==1.21.6      pandas==1.3.5     
scipy==1.4.1      sklearn-pandas==1.8.0     


In [None]:
gpu_info = !nvidia-smi
gpu_info = '\n'.join(gpu_info)
if gpu_info.find('failed') >= 0:
  print('Not connected to a GPU')
else:
  print(gpu_info)

In [None]:
from psutil import virtual_memory
ram_gb = virtual_memory().total / 1e9
print('Your runtime has {:.1f} gigabytes of available RAM\n'.format(ram_gb))

if ram_gb < 20:
  print('Not using a high-RAM runtime')
else:
  print('You are using a high-RAM runtime!')

Check which GPU Google Colab has allocated.
<br/>Check whether the runtime is a high-RAM VM.

#Data Information

[MovieLens 1M Dataset](https://grouplens.org/datasets/movielens/)

This dataset, collected by GroupLens Research from [MovieLens](https://movielens.org), contains 1,000,209 ratings of 3,900 movies by 6,040 users.
<br/>The paper [The MovieLens Datasets: History and Context](https://grouplens.org/blog/movielens-datasets-context-and-history/) covers the history of the MovieLens datasets.


In [None]:
rating_file_path = '/content/drive/MyDrive/lms/recommender_system/ml-1m/ratings.dat'
ratings_cols = ['user_id', 'movie_id', 'ratings', 'timestamp']
ratings = pd.read_csv(rating_file_path, sep='::', names=ratings_cols, engine='python', encoding = "ISO-8859-1")
orginal_data_size = len(ratings)
ratings.head()

Unnamed: 0,user_id,movie_id,ratings,timestamp
0,1,1193,5,978300760
1,1,661,3,978302109
2,1,914,3,978301968
3,1,3408,4,978300275
4,1,2355,5,978824291


In [None]:
movie_file_path = '/content/drive/MyDrive/lms/recommender_system/ml-1m/movies.dat'
cols = ['movie_id', 'title', 'genre'] 
movies = pd.read_csv(movie_file_path, sep='::', names=cols, engine='python', encoding='ISO-8859-1')
movies.head()

Unnamed: 0,movie_id,title,genre
0,1,Toy Story (1995),Animation|Children's|Comedy
1,2,Jumanji (1995),Adventure|Children's|Fantasy
2,3,Grumpier Old Men (1995),Comedy|Romance
3,4,Waiting to Exhale (1995),Comedy|Drama
4,5,Father of the Bride Part II (1995),Comedy


#Data Preprocessing

##Implicit Feedback

In [None]:
ratings = ratings[ratings['ratings']>=3]
filtered_data_size = len(ratings)

print(f'orginal_data_size: {orginal_data_size}, filtered_data_size: {filtered_data_size}')
print(f'Ratio of Remaining Data is {filtered_data_size / orginal_data_size:.2%}')

orginal_data_size: 1000209, filtered_data_size: 836478
Ratio of Remaining Data is 83.63%


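The data at hand records the rating each user gave to each movie.
<br/>Implicit feedback that arises naturally as people use a service
<br/>can likewise be a clue to how a user values an item.

The model built below interprets the ratings as implicit data using the following rules.
<br/>A rating of 3 or higher is treated as a sign that the user likes the movie.
<br/>A higher rating is given more weight, as stronger evidence of preference.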
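
The two rules can be sketched on a toy table. This is a minimal illustration, not part of the original pipeline: the `toy` DataFrame and its values are made up, and keeping the rating value itself as the weight is one simple assumption about how to weight higher ratings.

```python
import pandas as pd

# Toy ratings, made up for illustration only.
toy = pd.DataFrame({
    'user_id':  [1, 1, 2, 2, 3],
    'movie_id': [10, 20, 10, 30, 20],
    'ratings':  [5, 2, 3, 1, 4],
})

# Rule 1: keep only ratings of 3 or higher as signs of preference.
liked = toy[toy['ratings'] >= 3]

# Rule 2 (assumption): use the rating value itself as the weight,
# so that a 5 counts as stronger evidence than a 3.
weights = liked.set_index(['user_id', 'movie_id'])['ratings']

print(len(liked))            # number of rows that survive the filter
print(weights.loc[(1, 10)])  # weight for user 1 and movie 10
```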

##Unique Values

In [None]:
num_user = ratings['user_id'].nunique()
num_movie = ratings['movie_id'].nunique()

In [None]:
num_user

6039

In [None]:
num_movie

3628

##Missing Value Handling

In [None]:
user = ratings['user_id'].unique()
movie = ratings['movie_id'].unique()

In [None]:
user_to_idx = {v:k  for k,v in enumerate(user)}
title_to_idx = {v:k for k,v in enumerate(movie)}

In [None]:
for key, value in user_to_idx.items():
    if 3595 <= key <= 3600: 
      print(key, value)

3595 3594
3596 3595
3597 3596
3599 3597
3600 3598


In [None]:
condition = (ratings['user_id'] == 3598)
ratings[condition]

Unnamed: 0,user_id,movie_id,ratings,timestamp


The selection is empty: the filtered DataFrame has no rows for user_id 3598.

In [None]:
ratings.isnull().sum()

user_id      0
movie_id     0
ratings      0
timestamp    0
dtype: int64

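Yet `isnull()` reports no missing values. This is consistent: `isnull()` only counts NaN cells in rows that exist, and user_id 3598 has no rows at all, most likely because every rating from this user was below 3 and was dropped by the filter above.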
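
The difference between a user with no rows and a NaN cell can be checked on a small example. This is a sketch on a made-up `toy` DataFrame, not on the MovieLens data.

```python
import numpy as np
import pandas as pd

# Made-up data: user_id 3 is absent, and user_id 2 has a NaN rating.
toy = pd.DataFrame({'user_id': [1, 2, 4], 'ratings': [5.0, np.nan, 3.0]})

# Selecting an absent user gives an empty DataFrame, not a row of NaN values.
no_rows = toy[toy['user_id'] == 3]
print(no_rows.empty)

# isnull() only counts NaN cells in rows that actually exist.
null_counts = toy.isnull().sum()
print(null_counts)
```

Under this reading, an empty selection and a zero `isnull()` count are two separate facts and do not contradict each other.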

#Data Exploration

##Movie Ranking

In [None]:
movies_ratings = pd.merge(movies, ratings)
cols = ['user_id','title', 'ratings']
movies_ratings = movies_ratings[cols]

In [None]:
movies_ratings

Unnamed: 0,user_id,title,ratings
0,1,Toy Story (1995),5
1,6,Toy Story (1995),4
2,8,Toy Story (1995),4
3,9,Toy Story (1995),5
4,10,Toy Story (1995),5
...,...,...,...
836473,5682,"Contender, The (2000)",3
836474,5812,"Contender, The (2000)",4
836475,5831,"Contender, The (2000)",3
836476,5837,"Contender, The (2000)",4


In [None]:
movies_ratings['title'] = movies_ratings['title'].str.lower() 
# Count how many users rated each title and keep the 30 most rated titles.
movies_count = movies_ratings.groupby('title')['user_id'].count()
popular_thirty_movies = movies_count.sort_values(ascending=False)[:30]

In [None]:
popular_thirty_movies = pd.DataFrame(popular_thirty_movies.reset_index())
popular_thirty_movies.index = popular_thirty_movies.index + 1
popular_thirty_movies = popular_thirty_movies.reset_index()

In [None]:
popular_thirty_movies.columns = ['rank', 'title', 'counts']
popular_thirty_movies = popular_thirty_movies[['title', 'rank', 'counts']]

In [None]:
popular_thirty_movies

Unnamed: 0,title,rank,counts
0,american beauty (1999),1,3211
1,star wars: episode iv - a new hope (1977),2,2910
2,star wars: episode v - the empire strikes back...,3,2885
3,star wars: episode vi - return of the jedi (1983),4,2716
4,saving private ryan (1998),5,2561
5,terminator 2: judgment day (1991),6,2509
6,"silence of the lambs, the (1991)",7,2498
7,raiders of the lost ark (1981),8,2473
8,back to the future (1985),9,2460
9,"matrix, the (1999)",10,2434


##Movie Reviews

In [None]:
favorite_movies_search_keyword = ['star wars: episode v', 'space odyssey', 'titanic', 'lion king', 'pulp fiction']
keyword = '|'.join(favorite_movies_search_keyword)
favorite_movies_ratings = movies_ratings[movies_ratings['title'].str.contains(keyword)]

In [None]:
favorite_movies_count = favorite_movies_ratings.groupby('title')['user_id'].count()
favorite_movies_search = favorite_movies_count.sort_values(ascending=False)
favorite_movies_search = pd.DataFrame(favorite_movies_search.reset_index())
favorite_movies_search.columns = ['title', 'counts']

In [None]:
favorite_movies_search

Unnamed: 0,title,counts
0,star wars: episode v - the empire strikes back...,2885
1,star wars: episode vi - return of the jedi (1983),2716
2,pulp fiction (1994),2030
3,2001: a space odyssey (1968),1568
4,titanic (1997),1270
5,"lion king, the (1994)",1029
6,titanic (1953),179
7,raise the titanic (1980),20
8,"chambermaid on the titanic, the (1998)",15


Search the dataset to check whether the favorite movies are in it.

In [None]:
favorite_title_list = ['star wars: episode v - the empire strikes back (1980)', 'pulp fiction (1994)',
                       '2001: a space odyssey (1968)', 'titanic (1997)', 'lion king, the (1994)']

In [None]:
favorite_ratings_list = []
ratings_list = [4, 5]

for i in range(5):
  if i == 2:
    out = random.sample(ratings_list, i)
    favorite_ratings_list.append(out)
  elif i >= 3:
    out = random.sample(ratings_list, i-2)
    favorite_ratings_list.append(out)

favorite_ratings_list = sum(favorite_ratings_list, [])

In [None]:
favorite_ratings_list

[5, 4, 4, 4, 5]

Assign a random rating of 4 or 5 to each favorite movie.

In [None]:
new_user_id = 6041  # one past the largest user_id in the dataset (6040)
favorite_movies = pd.DataFrame({'user_id': [new_user_id] * 5,
                                'title': favorite_title_list,
                                'ratings': favorite_ratings_list})

# Add the new user only once, even if this cell is run again.
if not (movies_ratings['user_id'] == new_user_id).any():
    movies_ratings_update = pd.concat([movies_ratings, favorite_movies])
else:
    movies_ratings_update = movies_ratings.copy()

movies_ratings_update = movies_ratings_update.astype({'user_id':'int'})
movies_ratings_update = movies_ratings_update.sort_values(by='user_id')
movies_ratings_update = movies_ratings_update.reset_index()
movies_ratings_update = movies_ratings_update[['user_id', 'title', 'ratings']]

In [None]:
movies_ratings_update 

Unnamed: 0,user_id,title,ratings
0,1,toy story (1995),5
1,1,"secret garden, the (1993)",4
2,1,schindler's list (1993),5
3,1,"wizard of oz, the (1939)",4
4,1,titanic (1997),4
...,...,...,...
836478,6041,titanic (1997),4
836479,6041,star wars: episode v - the empire strikes back...,5
836480,6041,pulp fiction (1994),4
836481,6041,2001: a space odyssey (1968),4


Add the five favorite movies to the dataset.
<br/>Because these rows belong to a new user, the user_id is 6041.

#Model Configuration

##CSR Matrix

Matrix Factorization(MF)

Collaborative filtering starts from a rating matrix.

Given m users and n movies,
<br/>the ratings are arranged in a rating matrix of size (m, n).


The MF model factorizes the rating matrix R into two feature matrices, P and Q.

The vectors are fitted so that the dot product of user i's vector (U_i) and item j's vector (I_j) <br/>comes close to the value user i gave item j (M_ij).
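
The factorization can be sketched with NumPy. The matrices below are hypothetical, and treating the rows of P as the user vectors U_i and the rows of Q as the item vectors I_j is an assumption made for this illustration.

```python
import numpy as np

# Hypothetical feature matrices: 3 users and 4 movies, 2 latent factors each.
P = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])   # one row per user (U_i)
Q = np.array([[2.0, 0.0],
              [0.0, 3.0],
              [1.0, 1.0],
              [0.5, 0.5]])   # one row per movie (I_j)

# The reconstructed rating matrix has shape (m, n) = (3, 4).
R_hat = P @ Q.T

# Entry (i, j) is the dot product of user i's and movie j's vectors.
i, j = 2, 1
print(R_hat.shape)
print(R_hat[i, j], np.dot(P[i], Q[j]))
```

Training then amounts to choosing P and Q so that these dot products stay close to the observed values.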

CSR(Compressed Sparse Row) Matrix

There are 6,040 users and 3,628 movies.
<br/>Stored as a dense matrix with one 1-byte integer per element,
<br/>this takes 6040 * 3628 * 1 byte ≈ 21.91 MB.

The model takes a CSR matrix as its training input.
<br/>A CSR matrix stores only the non-zero values of a sparse matrix
<br/>together with their coordinates, which minimizes memory use
<br/>while still representing exactly the same matrix.
<br/>A CSR matrix compresses the matrix into three arrays: data, indices, and indptr.
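
The three arrays can be inspected on a small matrix. This is a sketch with made-up values, using `scipy.sparse.csr_matrix` as imported earlier in the notebook.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Made-up 3 x 4 matrix with only three non-zero entries.
dense = np.array([[5, 0, 0, 4],
                  [0, 0, 3, 0],
                  [0, 0, 0, 0]], dtype=np.int8)
sparse = csr_matrix(dense)

print(sparse.data)     # non-zero values, row by row
print(sparse.indices)  # column index of each stored value
print(sparse.indptr)   # row i occupies data[indptr[i]:indptr[i + 1]]

# Converting back gives the same matrix.
round_trip = sparse.toarray()
```

The last row is empty, so its two `indptr` entries are equal and it takes no space in `data` or `indices`.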

In [None]:
movies_ratings_preprocess = movies_ratings_update.copy()

In [None]:
user_unique = movies_ratings_preprocess['user_id'].unique()
title_unique = movies_ratings_preprocess['title'].unique()

In [None]:
user_to_idx = {v:k for k,v in enumerate(user_unique)}
title_to_idx = {v:k for k,v in enumerate(title_unique)}

In [None]:
temp_user_data = movies_ratings_preprocess['user_id'].map(user_to_idx.get).dropna()

if len(temp_user_data) == len(movies_ratings_preprocess):   
    movies_ratings_preprocess['user_id'] = temp_user_data   

In [None]:
temp_title_data = movies_ratings_preprocess['title'].map(title_to_idx.get)

if len(temp_title_data) == len(movies_ratings_preprocess):
    movies_ratings_preprocess['title'] = temp_title_data 

In [None]:
movies_ratings_preprocess

Unnamed: 0,user_id,title,ratings
0,0,0,5
1,0,1,4
2,0,2,5
3,0,3,4
4,0,4,4
...,...,...,...
836478,6039,4,4
836479,6039,95,5
836480,6039,284,4
836481,6039,524,4


In [None]:
num_user = movies_ratings_preprocess['user_id'].nunique()
num_movie = movies_ratings_preprocess['title'].nunique()

In [None]:
csr_data = csr_matrix((movies_ratings_preprocess.ratings, (movies_ratings_preprocess.user_id, movies_ratings_preprocess.title)), shape= (num_user, num_movie))

In [None]:
csr_data

<6040x3628 sparse matrix of type '<class 'numpy.longlong'>'
	with 836483 stored elements in Compressed Sparse Row format>

##Hyperparameters

Configure the Alternating Least Squares model from the implicit library.

In [None]:
als_model = AlternatingLeastSquares(factors=100,
                                    regularization=0.01,
                                    use_gpu=False,
                                    iterations=15,
                                    dtype=np.float32)

factors : dimension of the user and item vectors
<br/>regularization : regularization strength used to prevent overfitting
<br/>use_gpu : whether to use the GPU
<br/>iterations(epochs) : number of training iterations
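
To make the roles of `factors` and `regularization` concrete, here is a minimal sketch of one alternating step: item vectors are held fixed and one user vector is solved by regularized least squares. This is plain, unweighted ALS on made-up data, not the confidence-weighted procedure the implicit library runs; the helper `solve_user` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
factors = 2            # dimension of the latent vectors
regularization = 0.01  # strength of the L2 penalty

Y = rng.normal(size=(5, factors))           # made-up item vectors for 5 items
r_u = np.array([5.0, 0.0, 4.0, 0.0, 3.0])   # one user's made-up ratings

def solve_user(Y, r_u, lam):
    """Least-squares user vector for fixed item vectors, with L2 penalty lam."""
    A = Y.T @ Y + lam * np.eye(Y.shape[1])
    b = Y.T @ r_u
    return np.linalg.solve(A, b)

x_small = solve_user(Y, r_u, regularization)
x_large = solve_user(Y, r_u, 100.0)

print(x_small.shape)
# A larger penalty shrinks the user vector toward zero.
print(np.linalg.norm(x_large) < np.linalg.norm(x_small))
```

In a full run, each iteration would repeat this solve for every user, then swap roles and solve for every item.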

#Model Training

In [None]:
als_model.fit(csr_data)

  0%|          | 0/15 [00:00<?, ?it/s]

#Model Evaluation

In [None]:
new_user = list(user_to_idx.items())[-1][0]
new_user_vector = als_model.user_factors[-1]

In [None]:
favorite_title_list = ['star wars: episode v - the empire strikes back (1980)', 'pulp fiction (1994)',
                       '2001: a space odyssey (1968)', 'titanic (1997)', 'lion king, the (1994)']

favorite_vector_list = []

for i in favorite_title_list:
  favorite_movie_idx = title_to_idx[i]
  favorite_movie_vector = als_model.item_factors[favorite_movie_idx]
  favorite_vector_list.append(favorite_movie_vector)

In [None]:
favorite_preference = pd.DataFrame(index=range(0, 1), columns = {'0'})

for i in range(len(favorite_vector_list)):
  favorite_preference_list = []
  favorite_preference_list.append(favorite_title_list[i])
  favorite_preference_list.append(np.dot(new_user_vector, favorite_vector_list[i]))
  favorite_preference_df = pd.DataFrame(favorite_preference_list).transpose()
  favorite_preference =  favorite_preference.append(favorite_preference_df)

favorite_movie_preference = favorite_preference.iloc[1:, :-1]
favorite_movie_preference.columns = ['title', 'preference']

In [None]:
favorite_movie_preference 

Unnamed: 0,title,preference
0,star wars: episode v - the empire strikes back...,0.539274
0,pulp fiction (1994),0.322543
0,2001: a space odyssey (1968),0.464326
0,titanic (1997),0.522046
0,"lion king, the (1994)",0.36908


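The Alternating Least Squares model predicts the new user's preference for each favorite movie as the dot product of the user vector and the movie vector.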

#Similar Items

In [None]:
title_unique = movies_ratings_update['title'].unique()
title_to_idx = {v:k for k,v in enumerate(title_unique)}
idx_to_title = {v:k for k,v in title_to_idx.items()}

In [None]:
def get_similar_title(title_name: str):
    title_id = title_to_idx[title_name]
    # In implicit 0.5.x, similar_items returns two arrays: item ids and similarity scores.
    ids, scores = als_model.similar_items(title_id)

    # The first result is the queried movie itself, so keep results 1 to 5.
    favorite_movie_similar = pd.DataFrame({'title': title_unique[ids[1:6]],
                                           'similarity': scores[1:6]})

    return favorite_movie_similar

The `similar_items` method implemented in the AlternatingLeastSquares class finds movies similar to a favorite movie.
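
One common way to compare items from their latent vectors is cosine similarity. The sketch below assumes that measure and uses made-up item vectors; the helper `cosine_similar_items` is hypothetical and is not the library's implementation.

```python
import numpy as np

# Made-up item vectors; row k is the latent vector of item k.
item_factors = np.array([[1.0, 0.0],
                         [0.9, 0.1],
                         [0.0, 1.0],
                         [-1.0, 0.0]])

def cosine_similar_items(item_factors, item_id, n=3):
    """Return ids and cosine similarities of the n items closest to item_id."""
    norms = np.linalg.norm(item_factors, axis=1)
    scores = item_factors @ item_factors[item_id] / (norms * norms[item_id])
    ids = np.argsort(-scores)[:n]   # highest similarity first
    return ids, scores[ids]

ids, scores = cosine_similar_items(item_factors, item_id=0)
print(ids)     # the queried item itself comes first
print(scores)
```

Because the queried item always matches itself perfectly, the first result is usually skipped when listing similar titles.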

In [None]:
get_similar_title('star wars: episode v - the empire strikes back (1980)')

Unnamed: 0,title,similarity
0,star wars: episode iv - a new hope (1977),0.884727
1,star wars: episode vi - return of the jedi (1983),0.847585
2,raiders of the lost ark (1981),0.661602
3,"terminator, the (1984)",0.499929
4,back to the future (1985),0.449703


Print the five movies most similar to Star Wars: Episode V.

In [None]:
favorite_title_list = ['star wars: episode v - the empire strikes back (1980)', 'pulp fiction (1994)',
                       '2001: a space odyssey (1968)', 'titanic (1997)', 'lion king, the (1994)']

favorite_movie_similar_collection = pd.DataFrame(index=range(0, 1), columns = {'title'})

for i in favorite_title_list:
  favorite_movie_similar_collection = favorite_movie_similar_collection.append({'similarity':'', 'title': i},ignore_index=True)
  favorite_movie_similar_collection = favorite_movie_similar_collection.append(get_similar_title(i))

favorite_movie_similar_collection = favorite_movie_similar_collection.iloc[1:, :]
favorite_movie_similar_collection = favorite_movie_similar_collection.reset_index()
favorite_movie_similar_collection = favorite_movie_similar_collection [['title', 'similarity']]

In [None]:
def draw_color_cell(x,color):
    color = f'background-color:{color}'
    return color

color_column = []

for i in range(len(favorite_title_list)):
  color_column.append(6*i)

favorite_movie_similar_collection.style.applymap(draw_color_cell,color='#767575',subset=pd.IndexSlice[color_column,'title':'similarity'])

Unnamed: 0,title,similarity
0,star wars: episode v - the empire strikes back (1980),
1,star wars: episode iv - a new hope (1977),0.8847275
2,star wars: episode vi - return of the jedi (1983),0.8475849
3,raiders of the lost ark (1981),0.66160196
4,"terminator, the (1984)",0.49992856
5,back to the future (1985),0.44970334
6,pulp fiction (1994),
7,goodfellas (1990),0.85232806
8,fargo (1996),0.78796273
9,"usual suspects, the (1995)",0.72496426


Recommend five similar movies for each title in the favorite movie list.

#Recommendation

In [None]:
new_user = user_to_idx[6041]  # matrix row index of the newly added user (user_id 6041)

In [None]:
movies_recommended = als_model.recommend(new_user, csr_data, N=20, filter_already_liked_items=True)

ValueError: ignored

`als_model.recommend` raises `ValueError: user_items must contain 1 row for every user in userids`.

In [None]:
movies_recommended = als_model.recommend(new_user, csr_data, N=20, filter_already_liked_items=False)

With filter_already_liked_items=False, the error does not occur.
<br/>The error appears to have been introduced by an update of the implicit library.
<br/>As a result, this recommender does not exclude items the user has already rated.
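
The error message suggests that `user_items` is expected to hold exactly one row per requested user. The sketch below only shows, on a made-up matrix, that slicing a CSR matrix by a single row index yields such a one-row matrix. The `recommend` call in the final comment is an assumption about the implicit 0.5.x API and has not been run against this notebook's model.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Made-up user-item matrix: 3 users, 4 items.
user_items = csr_matrix(np.array([[5, 0, 0, 4],
                                  [0, 0, 3, 0],
                                  [0, 4, 0, 5]]))
userid = 2

# Slicing one row keeps the matrix 2-D, with exactly one row for the one userid.
one_user = user_items[userid]
print(one_user.shape)
print(one_user.indices)   # column ids of the items this user already rated

# Assumed usage with implicit 0.5.x (not executed here):
# als_model.recommend(userid, user_items[userid], N=20, filter_already_liked_items=True)
```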

In [None]:
favorite_recommended = pd.DataFrame(index=range(0, 1), columns = {'0'})

for i in range(20):
  favorite_recommended_list = []
  favorite_recommended_list.append(idx_to_title[movies_recommended[0][i]])
  favorite_recommended_list.append(movies_recommended[1][i])
  favorite_recommended_df = pd.DataFrame(favorite_recommended_list).transpose()
  favorite_recommended =  favorite_recommended.append(favorite_recommended_df)

favorite_movie_recommended = favorite_recommended.iloc[1:, :-1]
favorite_movie_recommended.columns = ['title', 'preference']

In [None]:
favorite_movie_recommended

Unnamed: 0,title,preference
0,star wars: episode v - the empire strikes back...,0.539274
0,titanic (1997),0.522046
0,star wars: episode iv - a new hope (1977),0.501663
0,2001: a space odyssey (1968),0.464326
0,star wars: episode vi - return of the jedi (1983),0.415568
0,raiders of the lost ark (1981),0.395419
0,"lion king, the (1994)",0.36908
0,e.t. the extra-terrestrial (1982),0.350005
0,pulp fiction (1994),0.322543
0,dr. strangelove or: how i learned to stop worr...,0.295737


The `recommend` method implemented in the AlternatingLeastSquares class is used to recommend movies.

In [None]:
# Explain the recommendation of 'raiders of the lost ark (1981)' for the new user.
raiders = title_to_idx['raiders of the lost ark (1981)']
explain = als_model.explain(new_user, csr_data, itemid=raiders)

b = []
    
for i, j in enumerate(explain):
  b.append(j)

favorite_explained = pd.DataFrame(index=range(0, 1), columns = {'0'})

for i in range(len(b[1])):
  favorite_explained_list = []
  favorite_explained2_list = []
  favorite_explained_list.append(title_unique[b[1][i][0]])
  favorite_explained_list.append(b[1][i][1])
  favorite_explained_df = pd.DataFrame(favorite_explained_list).transpose()
  favorite_explained =  favorite_explained.append(favorite_explained_df)

favorite_movie_explained = favorite_explained.iloc[1:, :-1]
favorite_movie_explained.columns = ['title', 'proportion']

In [None]:
favorite_movie_explained

Unnamed: 0,title,proportion
0,star wars: episode v - the empire strikes back...,0.284022
0,titanic (1997),0.067546
0,pulp fiction (1994),0.030886
0,"lion king, the (1994)",0.017423
0,2001: a space odyssey (1968),-0.013418


This returns how much each of the user's other titles contributed to the score of the recommended title raiders of the lost ark (1981).
<br/>star wars: episode v - the empire strikes back (1980) and titanic (1997) contribute the most.

#Conclusion

**Missing Value Handling**




ratings[ratings['user_id'] == 3598] returns an empty DataFrame,
<br/>yet ratings.isnull().sum() reports no missing values, which looks inconsistent at first glance.
<br/>A later study of missing values should confirm the cause and how to debug it.

**Recommendation**

In als_model.recommend,
<br/>ValueError: user_items must contain 1 row for every user in userids still needs to be resolved.
<br/>A solution is discussed in the GitHub conversations [#365](https://github.com/benfred/implicit/issues/365) [#389](https://github.com/benfred/implicit/pull/389).
<br/>The error appears to have been introduced by an update of the implicit library.
<br/>I cannot yet fully follow the GitHub discussion,
<br/>but once I am able to analyze and debug the cause,
<br/>it should be possible to set filter_already_liked_items=True.

#References

**LMS**
<br/>[ziminpark](https://github.com/ZiminPark)

<br/>**Official Site**
<br/>[MovieLens 1M Dataset](https://grouplens.org/datasets/movielens/)

<br/>**GitHub**
<br/>[영화 추천 시스템.ipynb](https://github.com/PEBpung/Aiffel/blob/master/Project/Exploration/E7.%20%EC%98%81%ED%99%94%20%EC%B6%94%EC%B2%9C%20%EC%8B%9C%EC%8A%A4%ED%85%9C.ipynb)
<br/>[filter_already_liked_items Working for ALS? #365](https://github.com/benfred/implicit/issues/365)
<br/>[Fix rank items in ItemItemRecommender #389](https://github.com/benfred/implicit/pull/389)

<br/>**Websites**
<br/>[python: searching for all rows that contain any of several strings - pandas](https://hyang2data.tistory.com/31)
<br/>[Flattening a 2D Python list into 1D (flatten)](https://wyatt37.tistory.com/16)
<br/>[Converting a List to Float in Python](https://www.delftstack.com/ko/howto/python/convert-list-to-float-python/)
<br/>[python, pandas: changing and replacing data values](https://sunning-10.tistory.com/m/entry/python-%ED%8C%8C%EC%9D%B4%EC%8D%AC-pandas-%ED%8C%90%EB%8B%A4%EC%8A%A4-%EB%8D%B0%EC%9D%B4%ED%84%B0-%EA%B0%92-%EB%B3%80%EA%B2%BD%ED%95%98%EA%B8%B0-%EB%B0%94%EA%BE%B8%EA%B8%B0)
<br/>[Pandas 11. Changing DataFrame cell styles](https://zephyrus1111.tistory.com/62)
<br/>[Day18 Shall we build a Netflix recommendation algorithm?](https://softwareeng.tistory.com/entry/Day18-%EB%84%B7%ED%94%8C%EB%A6%AD%EC%8A%A4-%EC%B6%94%EC%B2%9C-%EC%95%8C%EA%B3%A0%EB%A6%AC%EC%A6%98%EC%9D%84-%EB%A7%8C%EB%93%A4%EC%96%B4-%EB%B3%BC%EA%B9%8C)