## Movielens Movie Recommendation
- rating data | MovieLens 1M Dataset 사용
- rating data는 explicit data이나 implicit data라고 간주하고 진행 
- 별점을 시청횟수로 해석
- 3점 미만 rating은 선호하지 않는다고 가정 

In [5]:
# !pip install implicit

In [6]:
import pandas as pd
import numpy as np 

from scipy.sparse import csr_matrix
from implicit.als import AlternatingLeastSquares

import os
from pathlib import Path

### Data 확인 및 전처리

In [7]:
root_path = Path('/content/drive/MyDrive/Colab_Notebook/aiffel_lms/E9_Recommend')
data_path = root_path.joinpath('movie_data')
movie_data_path = data_path.joinpath('movies.dat')
rating_data_path = data_path.joinpath('ratings.dat')
user_data_path = data_path.joinpath('users.dat')

README를 근거로 아래와 같은 데이터가 들어있다는 것을 확인
- movie_data : MovieID::Title::Genres
- user_data : UserID::Gender::Age::Occupation::Zip-code
- rating_data : UserID::MovieID::Rating::Timestamp. 


<br>

- 3개의 데이터 모두 열 index가 제대로 처리되어있지 않음을 확인, column name 부여
- 위 분석에서는 rating_data 및 movie_data를 중심적으로 사용할 것으로 예상됨

In [8]:
rating_cols = ['user_id', 'movie_id', 'rating', 'timestamp']
rating_data = pd.read_csv(rating_data_path, sep = '::', names = rating_cols, engine = 'python', encoding = 'ISO-8859-1')
rating_data.head(10)

Unnamed: 0,user_id,movie_id,rating,timestamp
0,1,1193,5,978300760
1,1,661,3,978302109
2,1,914,3,978301968
3,1,3408,4,978300275
4,1,2355,5,978824291
5,1,1197,3,978302268
6,1,1287,5,978302039
7,1,2804,5,978300719
8,1,594,4,978302268
9,1,919,4,978301368


In [9]:
movie_cols = ['movie_id', 'title', 'genre']
movie_data = pd.read_csv(movie_data_path, sep = '::', names = movie_cols, engine = 'python', encoding = 'ISO-8859-1')
movie_data.head(10)

# 검색의 용이성을 위해 대문자 to 소문자 
movie_data['title'] = movie_data['title'].str.lower()
movie_data['genre'] = movie_data['genre'].str.lower()
movie_data.head(10)


Unnamed: 0,movie_id,title,genre
0,1,toy story (1995),animation|children's|comedy
1,2,jumanji (1995),adventure|children's|fantasy
2,3,grumpier old men (1995),comedy|romance
3,4,waiting to exhale (1995),comedy|drama
4,5,father of the bride part ii (1995),comedy
5,6,heat (1995),action|crime|thriller
6,7,sabrina (1995),comedy|romance
7,8,tom and huck (1995),adventure|children's
8,9,sudden death (1995),action
9,10,goldeneye (1995),action|adventure|thriller


In [10]:
# title split을 무엇을 기준으로 해야 하는지 확인
# movie title에도 ()가 있기 때문에 '('로 split 하기에는 무리가 있음을 확인함

movie_data_check = pd.read_csv(movie_data_path, sep = '::', names = movie_cols, engine = 'python', encoding = 'ISO-8859-1')
movie_data.head(50)
movie_data_check.tail(50)

Unnamed: 0,movie_id,title,genre
3833,3903,Urbania (2000),Drama
3834,3904,"Uninvited Guest, An (2000)",Drama
3835,3905,"Specials, The (2000)",Comedy
3836,3906,Under Suspicion (2000),Crime
3837,3907,"Prince of Central Park, The (1999)",Drama
3838,3908,Urban Legends: Final Cut (2000),Horror
3839,3909,Woman on Top (2000),Comedy|Romance
3840,3910,Dancer in the Dark (2000),Drama|Musical
3841,3911,Best in Show (2000),Comedy
3842,3912,Beautiful (2000),Comedy|Drama


In [11]:
# title에 ()을 기준으로 개봉일이 있으니 열 나눔 
# title : 영화명, year : 개봉일
title_year = movie_data['title'].str.slice(start = -5, stop = -1)
title_name = movie_data['title'].str.slice(stop = -6)
title_name

# # year 열 추가 
movie_data.insert(2, 'year', title_year , allow_duplicates=False)

# # title 열 수정 
movie_data['title'] = title_name

movie_data

Unnamed: 0,movie_id,title,year,genre
0,1,toy story,1995,animation|children's|comedy
1,2,jumanji,1995,adventure|children's|fantasy
2,3,grumpier old men,1995,comedy|romance
3,4,waiting to exhale,1995,comedy|drama
4,5,father of the bride part ii,1995,comedy
...,...,...,...,...
3878,3948,meet the parents,2000,comedy
3879,3949,requiem for a dream,2000,drama
3880,3950,tigerland,2000,drama
3881,3951,two family house,2000,drama


In [12]:
user_cols = ['user_id', 'gender', 'age', 'occupation', 'zipcode']
user_data = pd.read_csv(user_data_path, sep = '::', names = user_cols, engine = 'python', encoding = 'ISO-8859-1' )
user_data.head(10)

Unnamed: 0,user_id,gender,age,occupation,zipcode
0,1,F,1,10,48067
1,2,M,56,16,70072
2,3,M,25,15,55117
3,4,M,45,7,2460
4,5,M,25,20,55455
5,6,F,50,9,55117
6,7,M,35,1,6810
7,8,M,25,12,11413
8,9,M,25,17,61614
9,10,F,35,1,95370


In [13]:
# rating data가 분석의 기본이 될 데이터이니 가정한 상황에 맞게 데이터 처리 

rating_org_len = len(rating_data) # 원 rating_data 데이터 수 확인
rating_mod = rating_data[rating_data['rating'] >= 3]
rating_mod_len = len(rating_mod)

print(rating_org_len, rating_mod_len)
print(f'Ratio of Remaining Rating data is {rating_mod_len / rating_org_len:.2%}')

1000209 836478
Ratio of Remaining Rating data is 83.63%


In [14]:
# rating을 시청횟수로 가정했으니 col name 변경 
rating_mod.rename(columns = {'rating': 'count'}, inplace = True)
rating_mod.head(10)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  errors=errors,


Unnamed: 0,user_id,movie_id,count,timestamp
0,1,1193,5,978300760
1,1,661,3,978302109
2,1,914,3,978301968
3,1,3408,4,978300275
4,1,2355,5,978824291
5,1,1197,3,978302268
6,1,1287,5,978302039
7,1,2804,5,978300719
8,1,594,4,978302268
9,1,919,4,978301368


In [15]:
# 각 데이터에 unique한 user_id와 movie_id가 몇개인지 확인 
print(rating_mod['user_id'].nunique(), rating_mod['movie_id'].nunique())

# user은 6039명, movie는 3628개

user_per_movie = rating_mod.groupby('user_id')['movie_id'].count()
user_per_movie.describe()

# user_id 한명당 평균 138편의 영화를 시청함 

6039 3628


count    6039.000000
mean      138.512668
std       156.241599
min         1.000000
25%        38.000000
50%        81.000000
75%       177.000000
max      1968.000000
Name: movie_id, dtype: float64

In [16]:
# user별로 확인할 수 있게 다시 확인 

user_per_movie2 = rating_mod['movie_id'].groupby(rating_mod['user_id'])
user_per_movie2.describe()

Unnamed: 0_level_0,count,mean,std,min,25%,50%,75%,max
user_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
1,53.0,1560.547170,935.976178,1.0,783.00,1270.0,2340.00,3408.0
2,116.0,1782.655172,1006.062148,110.0,1118.75,1688.5,2518.50,3809.0
3,46.0,1755.586957,1011.542158,104.0,1151.00,1386.5,2441.25,3868.0
4,19.0,1886.789474,1048.690749,260.0,1199.50,1387.0,2819.50,3702.0
5,143.0,1674.748252,1024.402117,16.0,881.00,1683.0,2427.50,3799.0
...,...,...,...,...,...,...,...,...
6036,708.0,1815.381356,995.662958,6.0,1060.75,1842.5,2690.50,3576.0
6037,189.0,1751.074074,978.360425,17.0,965.00,1358.0,2527.00,3543.0
6038,18.0,1595.888889,973.439015,232.0,1139.00,1249.5,1964.25,3548.0
6039,119.0,1529.983193,888.841466,48.0,923.50,1210.0,2091.50,3549.0


In [17]:
# count 수 높은 movie 확인 

movie_count = rating_mod.groupby('movie_id')['user_id'].count()
movie_count_sort = movie_count.sort_values(ascending = False).head(30)
movie_count_sort # movie_id가 index로 들어가있는 상황 

# 3211명의 unique한 user가 선택한 2858 movie_id 영화가 가장 노출이 많은 영화임을 확인할 수 있음

movie_id
2858    3211
260     2910
1196    2885
1210    2716
2028    2561
589     2509
593     2498
1198    2473
1270    2460
2571    2434
480     2413
2762    2385
608     2371
110     2314
1580    2297
527     2257
1197    2252
2396    2213
1617    2210
318     2194
858     2167
1265    2121
1097    2102
2997    2066
2716    2051
296     2030
356     2022
1240    2019
1       2000
457     1941
Name: user_id, dtype: int64

**[TIP]** user_per_movie2.sort_values(by = 'count') # 이 방법은 'SeriesGroupBy' object has no attribute 'sort_values' 오류 일어남

### rating_mod의 count 기준 top 30 영화 리스트 추출하기 

In [18]:
# user_id count기준 top 30 movie_id
# tolist() 안하면 int64index 데이터 유형으로 나옴

popular_movie_id = movie_count_sort.index.tolist()
popular_movie_id

[2858,
 260,
 1196,
 1210,
 2028,
 589,
 593,
 1198,
 1270,
 2571,
 480,
 2762,
 608,
 110,
 1580,
 527,
 1197,
 2396,
 1617,
 318,
 858,
 1265,
 1097,
 2997,
 2716,
 296,
 356,
 1240,
 1,
 457]

In [19]:
movie_data #3883 row
# movie_data.nunique()

# index와 movie_id의 숫자 차이가 점차 넓어지는 것 또한 확인할 수 있음. 그 이유는 뒤에서 확인하고자 함 

Unnamed: 0,movie_id,title,year,genre
0,1,toy story,1995,animation|children's|comedy
1,2,jumanji,1995,adventure|children's|fantasy
2,3,grumpier old men,1995,comedy|romance
3,4,waiting to exhale,1995,comedy|drama
4,5,father of the bride part ii,1995,comedy
...,...,...,...,...
3878,3948,meet the parents,2000,comedy
3879,3949,requiem for a dream,2000,drama
3880,3950,tigerland,2000,drama
3881,3951,two family house,2000,drama


In [20]:
# index를 이용해서 rating_mod의 top 30 영화 리스트 추출하기 
# movie_data[movie_data['movie_id'] == 2858]

top30_movie = pd.DataFrame([])

for x in popular_movie_id:
  y = movie_data[movie_data['movie_id'] == x]
  top30_movie = top30_movie.append(y)
  # return top30_movie
  # print(y)


In [21]:
# top 30 영화 리스트 확인

top30_movie

Unnamed: 0,movie_id,title,year,genre
2789,2858,american beauty,1999,comedy|drama
257,260,star wars: episode iv - a new hope,1977,action|adventure|fantasy|sci-fi
1178,1196,star wars: episode v - the empire strikes back,1980,action|adventure|drama|sci-fi|war
1192,1210,star wars: episode vi - return of the jedi,1983,action|adventure|romance|sci-fi|war
1959,2028,saving private ryan,1998,action|drama|war
585,589,terminator 2: judgment day,1991,action|sci-fi|thriller
589,593,"silence of the lambs, the",1991,drama|thriller
1180,1198,raiders of the lost ark,1981,action|adventure
1250,1270,back to the future,1985,comedy|sci-fi
2502,2571,"matrix, the",1999,action|sci-fi|thriller


In [22]:
# movie title과 movie_id 매칭 시키기 
# index와 movie_id가 차이나는 것을 확인할 수 있음 

print(len(movie_data))
print(movie_data['title'].nunique())
print(movie_data['movie_id'].nunique())


3883
3841
3883


In [23]:
# 42행 차이
# 육안으로 봤을 때 몇 개는 실제로 중복값이고, 대부분은 중복값이 아님에도 불구하고 중복값으로 나옴 
# 오류로 생각됨

movie_data[movie_data.duplicated(['title']) == True].sort_values(['title'])


Unnamed: 0,movie_id,title,year,genre
2016,2085,101 dalmatians,1961,animation|children's
2443,2512,"ballad of narayama, the (narayama bushiko)",1982,drama
1323,1344,cape fear,1962,film-noir|thriller
2066,2135,doctor dolittle,1967,adventure|musical
2576,2645,dracula,1958,horror
3057,3126,"end of the affair, the",1955,drama
2386,2455,"fly, the",1986,horror|sci-fi
2953,3022,"general, the",1927,comedy
3877,3947,get carter,1971,thriller
2295,2364,godzilla (gojira),1984,action|sci-fi


- movie id는 다른데 movie title이 같은 영화가 있음
- movie id의 len은 total movie_data len과 동일해
- title과 year을 나눔으로 생긴 오류로 보임 -> 42개
- ***movie_data를 다시 concat하는것보다, 겹치는 movie_id를 수정하여 중복이 없게 만들고자 함*** -> ***NOPE 이와 같이 진행하려 하였으나 아래와 같은 문제가 발생하여 title과 year을 다시 concat 하고자 함*** 


In [24]:
# movie_data Remodified
# movie_data : movie_id + year

# movie_data['title'] = movie_data['title'].map(str) + ' ' + movie_data['year']

movie_data['title'] = movie_data[['title', 'year']].apply(''.join, axis = 1)
movie_data.drop(['year'], axis = 1, inplace = True)
movie_data

Unnamed: 0,movie_id,title,genre
0,1,toy story 1995,animation|children's|comedy
1,2,jumanji 1995,adventure|children's|fantasy
2,3,grumpier old men 1995,comedy|romance
3,4,waiting to exhale 1995,comedy|drama
4,5,father of the bride part ii 1995,comedy
...,...,...,...
3878,3948,meet the parents 2000,comedy
3879,3949,requiem for a dream 2000,drama
3880,3950,tigerland 2000,drama
3881,3951,two family house 2000,drama


### 사용자 지정 favorite 영화 리스트 | 사용자 추천의 기반이 되는 Data Setting

위의 movie_data 수정 전에 먼저 favorite list 설정해놓았기 때문에 뒤에서 수정 진행함

In [25]:
favorite = ['men in black', 'before sunset', 'bridget jones baby', 
            'romantic holiday', 'harry potter', 'august rush', 'davinch code']

In [26]:
# 부분 str 일치를 사용하여 행 인덱스 가져오기 

# movie_data.index[movie_data['title'].str.contains('men in')].tolist()
movie_data.loc[movie_data['title'].str.contains('men in')]
movie_data.loc[movie_data['title'].str.contains('before')]
movie_data.loc[movie_data['title'].str.contains('august')]
movie_data.loc[movie_data['title'].str.contains('terminator')]
movie_data.loc[movie_data['title'].str.contains('matrix')]
movie_data.loc[movie_data['title'].str.contains('boys')]

Unnamed: 0,movie_id,title,genre
516,520,robin hood: men in tights 1993,comedy
1539,1580,men in black 1997,action|adventure|comedy|sci-fi


***TODO*** : 위의 코드 자동화 

### Fav list 수정 

In [27]:
# favorite list 수정 

favorite = ['men in black 1997', 'before sunrise 1995', 'terminator 2: judgment day 1991', 
            'matrix, the 1999', 'boys life 2 1997']

# favorite df화
favorite_df = pd.DataFrame(favorite, columns = ['title'])
favorite_df

Unnamed: 0,title
0,men in black 1997
1,before sunrise 1995
2,terminator 2: judgment day 1991
3,"matrix, the 1999"
4,boys life 2 1997


[Pandas 참고페이지 1](https://www.delftstack.com/ko/howto/python-pandas/pandas-get-index-of-row/). 


[Pandas 참고페이지 2](https://pandas.pydata.org/docs/reference/api/pandas.Series.str.slice.html)


### 나만의 Rating 추가 & Movie title, id Embedding 

In [30]:
 # movie_data 인덱스와 movie_id값이 다르기 때문에 movie_id를 인덱스화 시키고자 진행 
 # movie to index & index to movie 생성 

movie_data_mod = movie_data.set_index('movie_id', drop = False)
movie_data_mod

# index 값이 1부터 시작하고 있으니까 k+1 처리함 
movie_unique = movie_data_mod['title'].unique()
movie_to_index = {v:k+1 for k, v in enumerate(movie_unique)}
index_to_movie = {k+1:v for k, v in enumerate(movie_unique)}

In [31]:
# 확인
index_to_movie[15]
movie_to_index['cutthroat island 1995']

15

Colab 오류

In [32]:
# `movie_data_mod['title']`로 하면 두칸 띄어지고 
# `movie_data_mod[['title']]`로 하면 정상적으로 나오는 오류가 있었으나
# Colab의 자체 오류였는지, runtime마다 오류가 occur될 때가 있고 아닐 때가 있음
# occur 될 때는 아래와 같이 수정이 필요함 

# # favorite list re 수정

# favorite = ['men in black  1997', 'before sunrise  1995', 'terminator 2: judgment day  1991', 
#             'matrix, the  1999', 'boys life 2  1997']

Unnamed: 0,title
0,men in black 1997
1,before sunrise 1995
2,terminator 2: judgment day 1991
3,"matrix, the 1999"
4,boys life 2 1997


### 함수로 변환이 쉬이 되지 않아서 우선 수동으로 title to movie_id 진행하고자 함

In [34]:
# 이상하게 여기서는 gap이 1개여야 되네... colab의 오류인가 

print(movie_to_index['men in black 1997'])
print(movie_to_index['before sunrise 1995'])
print(movie_to_index['terminator 2: judgment day 1991'])
print(movie_to_index['matrix, the 1999'])
print(movie_to_index['boys life 2 1997'])

favorite_idx = [1540, 214, 586, 2503, 1444]

1540
214
586
2503
1444


In [35]:
# 내 user_id 만들어서 rating에 추가하기 
# user_id: 6041, movie_id는 favorite과 매칭되는 아이들로 세팅
# 시청 횟수는 모두 5로 설정 

js_movie_list = pd.DataFrame({'user_id': 6041, 'movie_id': favorite_idx , 'count': [5]*5})

if not rating_mod.isin({'user_id':['6041']})['user_id'].any(): 
    rating_mod = rating_mod.append(js_movie_list)                          

rating_mod.tail(10) 

Unnamed: 0,user_id,movie_id,count,timestamp
1000203,6040,1090,3,956715518.0
1000205,6040,1094,5,956704887.0
1000206,6040,562,5,956704746.0
1000207,6040,1096,4,956715648.0
1000208,6040,1097,4,956715569.0
0,6041,1540,5,
1,6041,214,5,
2,6041,586,5,
3,6041,2503,5,
4,6041,1444,5,


### CSR Matrix 생성 

In [36]:
rating_mod.shape

(836483, 4)

In [37]:
from scipy.sparse import csr_matrix

# 아래 코드에서 row index exceeds matrix dimensions 오류가 생겨서 index 때문인가 하여 시도 
rating_mod_idxm = rating_mod.copy()
rating_mod_idxm = rating_mod_idxm.set_index('user_id', drop = False)


user_unique = rating_mod['user_id'].nunique()
movie_unique = rating_mod['movie_id'].nunique()

# row가 user id , col이 movie id 
csr_user_movie = csr_matrix((rating_mod_idxm['count'], (rating_mod_idxm['user_id'], rating_mod_idxm['movie_id'])))


**[CHECK]** shape을 지정하면 오히려 index exceeds 오류가 생김. 오류가 생기면 수동으로 값 조정을 해주어야 하기 때문에 자동적으로 setting되게끔 진행하는게 나을 것 같음 

### MF 모델 학습하기
- Matrix Factorization model을 implicit 패키지를 사용하여 학습하기 
- `implicit` 패키지
- 위의 패키지에 `als`(AlternatingLeastSquares) 모델을 사용 
- Matrix Factorization에서 쪼개진 두 feature matrix를 한번에 훈련하는 것은 잘 수렴하지 않기 때문에 한쪽을 고정시키고 다른 쪽을 학습하는 방식을 번갈아 수행하는 alternatingleastSquares 방식이 효과적인 것으로 알려져 있음 

In [38]:
from implicit.als import AlternatingLeastSquares

# implicit 라이브러리 권장사항
os.environ['OPENBLAS_NUM_THREADS']='1'
os.environ['KMP_DUPLICATE_LIB_OK']='True'
os.environ['MKL_NUM_THREADS']='1'

In [39]:
# Model 생성애는 AlternatinvLeastSquares의 default 값을 사용 

als_model = AlternatingLeastSquares(factors = 100, regularization = 0.01,
                                    use_gpu = False, iterations = 15, dtype = np.float32)

[참고페이지](https://yeomko.tistory.com/8?category=805638)

In [40]:
# 현재 csr_user_movie는 userid, movieid의 형태 
# als_model은 movieid, userid의 input로 들어가야 하기 때문에 Transpose 

csr_user_movie_t = csr_user_movie.T
csr_user_movie_t 

<3953x6042 sparse matrix of type '<class 'numpy.longlong'>'
	with 836483 stored elements in Compressed Sparse Column format>

### Model Fit

In [41]:
als_model.fit(csr_user_movie_t)

  0%|          | 0/15 [00:00<?, ?it/s]

### [결과확인] 비슷한 영화 확인하기

In [43]:
# before sunrise 1995를 좋아하는 사용자가 마음에 들 수 있는 영화 제안

favorite_movie = 'before sunrise 1995'
favorite_movie_idx = movie_to_index[favorite_movie]
suggest_movie = als_model.similar_items(favorite_movie_idx, N=15)
suggest_movie

[(214, 0.9999999),
 (1533, 0.82414293),
 (263, 0.8067019),
 (1038, 0.80270857),
 (1846, 0.7933644),
 (1860, 0.7924915),
 (80, 0.7863605),
 (1844, 0.78634864),
 (2544, 0.78493893),
 (1116, 0.78194124),
 (3816, 0.7803135),
 (3010, 0.7790403),
 (652, 0.7786176),
 (2610, 0.7781224),
 (1519, 0.7767565)]

In [48]:
# before sunrise 1995를 좋아하는 사용자가 마음에 들 수 있는 영화 제안
# idx까지 변환하여 return하는 함수 만들기 

def suggest_movie_based_movie(movie_name: str):
    favorite_movie_idx = movie_to_index[favorite_movie]
    suggest_movie = als_model.similar_items(favorite_movie_idx)
    suggest_movie = [index_to_movie[i[0]] for i in suggest_movie]
    return suggest_movie


In [49]:
suggest_movie_based_movie('before sunrise 1995')

['before sunrise 1995',
 'contempt (le mépris) 1963',
 'like water for chocolate (como agua para chocolate) 1992',
 'trees lounge 1996',
 'smoke signals 1998',
 'cimarron 1931',
 'white balloon, the (badkonake sefid ) 1995',
 'out of sight 1998',
 'mildred pierce 1945',
 'jean de florette 1986']

In [50]:
suggest_movie_based_movie('men in black 1997')

['before sunrise 1995',
 'contempt (le mépris) 1963',
 'like water for chocolate (como agua para chocolate) 1992',
 'trees lounge 1996',
 'smoke signals 1998',
 'cimarron 1931',
 'white balloon, the (badkonake sefid ) 1995',
 'out of sight 1998',
 'mildred pierce 1945',
 'jean de florette 1986']

### [결과확인] 사용자 제안 
- `AlternatingLeastSquares` 클래스에 구현되어 있는 `recommend` 메서드를 사용
- `filter_already_liked_items` 는 유저가 이미 평가한 아이템은 제외하는 Argument

In [63]:
user = 6041
# recommend에서는 user*item CSR Matrix를 받습니다.
recommend = als_model.recommend(user, csr_user_movie, N=20, filter_already_liked_items=True)
recommend = [index_to_movie[i[0]] for i in recommend]
recommend

['new age, the 1994',
 'happy go lovely 1951',
 'babe 1995',
 'mr. death: the rise and fall of fred a. leuchter jr. 1999',
 'jaws 1975',
 'gods must be crazy, the 1980',
 'broken english 1996',
 'toy story 1995',
 '20 dates 1998',
 'emerald forest, the 1985',
 'end of violence, the 1997',
 'bitter moon 1992',
 'lady and the tramp 1955',
 "god said 'ha!' 1998",
 'get real 1998',
 'red rock west 1992',
 'house arrest 1996',
 'best men 1997',
 'general, the 1998',
 'exotica 1994']

In [70]:
# 함수화 진행 

def suggest_movie_based_user(user: int):
    recommend = als_model.recommend(user, csr_user_movie, N=20, filter_already_liked_items=True)
    recommend = [index_to_movie[i[0]] for i in recommend]
    return recommend


In [71]:
suggest_movie_based_user(6041)

['new age, the 1994',
 'happy go lovely 1951',
 'babe 1995',
 'mr. death: the rise and fall of fred a. leuchter jr. 1999',
 'jaws 1975',
 'gods must be crazy, the 1980',
 'broken english 1996',
 'toy story 1995',
 '20 dates 1998',
 'emerald forest, the 1985',
 'end of violence, the 1997',
 'bitter moon 1992',
 'lady and the tramp 1955',
 "god said 'ha!' 1998",
 'get real 1998',
 'red rock west 1992',
 'house arrest 1996',
 'best men 1997',
 'general, the 1998',
 'exotica 1994']

### 결론
- 성공적으로 movie 기반 제안 & user 기반 제안 진행 완료 
- 코드 중간 df 오류 생기는 부분 확인 진행 필요 
- csr matrix shape 관련 공부 more 진행
