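# Evaluation rubric
1. The CSR matrix was built correctly.
    - It was created with the exact size determined by the number of users and items.
2. The MF model trained properly and produced plausible recommendations.
    - The dot products of user and item vectors took on meaningful values.
3. Finding similar movies and recommending movies to a user worked as expected.
    - The user preferences, item-to-item similarities, and contributions predicted by the MF model were measured and interpreted.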

# Module imports

In [1]:
import os
import warnings

import numpy as np
import pandas as pd
import scipy
import implicit
from implicit.als import AlternatingLeastSquares

warnings.filterwarnings('ignore')

# Data preparation

## Ratings data

In [3]:
import os
rating_file_path=os.getenv('HOME') + '/aiffel/recommendata_iu/data/ml-1m/ratings.dat'
ratings_cols = ['user_id', 'movie_id', 'ratings', 'timestamp']
ratings = pd.read_csv(rating_file_path, sep='::', names=ratings_cols, engine='python', encoding = "ISO-8859-1")
orginal_data_size = len(ratings)
ratings.head()

Unnamed: 0,user_id,movie_id,ratings,timestamp
0,1,1193,5,978300760
1,1,661,3,978302109
2,1,914,3,978301968
3,1,3408,4,978300275
4,1,2355,5,978824291


In [4]:
ratings.shape

(1000209, 4)
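The `ratings.dat` file uses `::` as its field separator, which is why the cell above passes `engine='python'`. The sketch below illustrates this on an in-memory sample; the two sample lines are copied from the `head()` output above, and the names `sample` and `sample_ratings` are made up for this illustration.

```python
import io
import pandas as pd

# Two lines in the "user::movie::rating::timestamp" layout of ratings.dat.
sample = io.StringIO("1::1193::5::978300760\n1::661::3::978302109\n")

cols = ['user_id', 'movie_id', 'ratings', 'timestamp']
# A separator longer than one character is treated as a regular expression,
# which the C parser does not support, so the Python engine is requested.
sample_ratings = pd.read_csv(sample, sep='::', names=cols, engine='python')
print(sample_ratings)
```

All four columns should come out as integers, so no extra type conversion is needed before filtering on `ratings`.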

## Movie data

In [5]:
# Read the metadata so that movie titles can be displayed.
movie_file_path=os.getenv('HOME') + '/aiffel/recommendata_iu/data/ml-1m/movies.dat'
cols = ['movie_id', 'title', 'genre'] 
movies = pd.read_csv(movie_file_path, sep='::', names=cols, engine='python', encoding='ISO-8859-1')
movies.head()

Unnamed: 0,movie_id,title,genre
0,1,Toy Story (1995),Animation|Children's|Comedy
1,2,Jumanji (1995),Adventure|Children's|Fantasy
2,3,Grumpier Old Men (1995),Comedy|Romance
3,4,Waiting to Exhale (1995),Comedy|Drama
4,5,Father of the Bride Part II (1995),Comedy


In [6]:
movies.shape

(3883, 3)

# Data preprocessing
- Number of unique movies in ratings
- Number of unique users in ratings
- The 30 most popular movies (by popularity)

#### Drop unneeded columns

In [7]:
del ratings['timestamp']

#### Merge the data

In [8]:
movie_data = pd.merge(ratings, movies)
movie_data.head()

Unnamed: 0,user_id,movie_id,ratings,title,genre
0,1,1193,5,One Flew Over the Cuckoo's Nest (1975),Drama
1,2,1193,5,One Flew Over the Cuckoo's Nest (1975),Drama
2,12,1193,4,One Flew Over the Cuckoo's Nest (1975),Drama
3,15,1193,4,One Flew Over the Cuckoo's Nest (1975),Drama
4,17,1193,5,One Flew Over the Cuckoo's Nest (1975),Drama


In [9]:
movie_data.shape

(1000209, 5)
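The merged frame has the same number of rows as `ratings`, which suggests every rating found its movie metadata. One way to check this explicitly is a left merge with `indicator=True`. The sketch below uses made-up toy frames (`ratings_toy`, `movies_toy`), not the real data.

```python
import pandas as pd

# Toy data: movie 99 has a rating but no metadata row.
ratings_toy = pd.DataFrame({'user_id': [1, 1, 2],
                            'movie_id': [10, 99, 10],
                            'ratings': [5, 3, 4]})
movies_toy = pd.DataFrame({'movie_id': [10, 20],
                           'title': ['Movie A', 'Movie B']})

# how='left' keeps every rating; the _merge column records where each row matched.
merged_toy = pd.merge(ratings_toy, movies_toy, on='movie_id',
                      how='left', indicator=True)
unmatched = merged_toy[merged_toy['_merge'] == 'left_only']
print(f'{len(unmatched)} rating(s) without movie metadata')
```

With the default inner join used in the notebook, such unmatched ratings would be dropped silently, so comparing row counts before and after the merge is a cheap sanity check.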

#### Treat ratings below 3 as dislikes and exclude them

In [None]:
# Keep only ratings of 3 or higher.
ratings = ratings[ratings['ratings']>=3]
filtered_data_size = len(ratings)

print(f'orginal_data_size: {orginal_data_size}, filtered_data_size: {filtered_data_size}')
print(f'Ratio of Remaining Data is {filtered_data_size / orginal_data_size:.2%}')

In [10]:
# Keep only ratings of 3 or higher.
ratings3_movie_data = movie_data[movie_data['ratings']>=3]
filtered_data_size = len(ratings3_movie_data)

print(f'orginal_data_size: {orginal_data_size}, filtered_data_size: {filtered_data_size}')
print(f'Ratio of Remaining Data is {filtered_data_size / orginal_data_size:.2%}')

orginal_data_size: 1000209, filtered_data_size: 836478
Ratio of Remaining Data is 83.63%
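Later cells modify the filtered frame in place (renaming a column, lowercasing strings). Since the filtered frame is a slice of `movie_data`, pandas can raise a `SettingWithCopyWarning` for such operations, which this notebook suppresses. A minimal sketch of a pattern that avoids the ambiguity by taking an explicit copy, shown on a made-up toy frame:

```python
import pandas as pd

toy = pd.DataFrame({'user_id': [1, 1, 2, 2, 3],
                    'movie_id': [10, 20, 10, 30, 20],
                    'ratings': [5, 2, 3, 1, 4]})

# .copy() makes the filtered frame independent of the original.
liked = toy[toy['ratings'] >= 3].copy()
# Returning a new frame instead of using inplace=True keeps the step explicit.
liked = liked.rename(columns={'ratings': 'counts'})

ratio = len(liked) / len(toy)
print(f'Ratio of remaining data: {ratio:.2%}')
```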


- Rename the ratings column to counts to make it easier to read

In [11]:
ratings3_movie_data.rename(columns={'ratings':'counts'}, inplace=True)

In [None]:
ratings.rename(columns={'ratings':'counts'}, inplace=True)

In [12]:
ratings3_movie_data['counts']

0          5
1          5
2          4
3          4
4          5
          ..
1000203    3
1000204    5
1000205    3
1000207    5
1000208    4
Name: counts, Length: 836478, dtype: int64

In [13]:
ratings3_movie_data.head(5)

Unnamed: 0,user_id,movie_id,counts,title,genre
0,1,1193,5,One Flew Over the Cuckoo's Nest (1975),Drama
1,2,1193,5,One Flew Over the Cuckoo's Nest (1975),Drama
2,12,1193,4,One Flew Over the Cuckoo's Nest (1975),Drama
3,15,1193,4,One Flew Over the Cuckoo's Nest (1975),Drama
4,17,1193,5,One Flew Over the Cuckoo's Nest (1975),Drama


#### Lowercase the movie titles and genres to make searching easier

In [14]:
ratings3_movie_data['title'] = ratings3_movie_data['title'].str.lower()

In [15]:
ratings3_movie_data['genre'] = ratings3_movie_data['genre'].str.lower()

## Data exploration

#### Count the unique users and movies in ratings

In [16]:
ratings3_movie_data['user_id'].nunique()

6039

In [17]:
ratings3_movie_data['movie_id'].nunique()

3628

- There are 6,039 users and 3,628 movies in total

#### Find the 50 most popular movies (by popularity)

In [18]:
ratings3_movie_data.groupby('title')['counts'].count().sort_values(ascending=False)[:50]

title
american beauty (1999)                                   3211
star wars: episode iv - a new hope (1977)                2910
star wars: episode v - the empire strikes back (1980)    2885
star wars: episode vi - return of the jedi (1983)        2716
saving private ryan (1998)                               2561
terminator 2: judgment day (1991)                        2509
silence of the lambs, the (1991)                         2498
raiders of the lost ark (1981)                           2473
back to the future (1985)                                2460
matrix, the (1999)                                       2434
jurassic park (1993)                                     2413
sixth sense, the (1999)                                  2385
fargo (1996)                                             2371
braveheart (1995)                                        2314
men in black (1997)                                      2297
schindler's list (1993)                                  2257
pr

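- The dataset consists only of older releases (the titles shown here date from 2000 or earlier), so many old movies appear
- The most popular movie is american beauty, and the star wars series takes 2nd through 4th place
- To be honest, apart from a few such as the matrix and toy story, I hardly know any of these movies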

#### Distribution of the number of ratings per movie

In [19]:
ratings3_movie_data.groupby('title')['counts'].count().describe()

count    3628.000000
mean      230.561742
std       355.596393
min         1.000000
25%        23.000000
50%        87.000000
75%       285.000000
max      3211.000000
Name: counts, dtype: float64

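- The most-rated movie received 3,211 ratings (of 3 or higher)
- The least-rated movie received only 1 rating
- On average a movie received about 230 ratings, while the median is only 87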

#### Find the 30 movies with the highest average rating

In [20]:
ratings3_movie_data.groupby('title')['counts'].mean().sort_values(ascending=False)[:30]

title
gate of heavenly peace, the (1995)                                     5.000000
follow the bitch (1998)                                                5.000000
identification of a woman (identificazione di una donna) (1982)        5.000000
schlafes bruder (brother of sleep) (1995)                              5.000000
criminal lovers (les amants criminels) (1999)                          5.000000
country life (1994)                                                    5.000000
foreign student (1994)                                                 5.000000
lured (1947)                                                           5.000000
baby, the (1973)                                                       5.000000
late bloomers (1996)                                                   5.000000
zachariah (1971)                                                       5.000000
bittersweet motel (2000)                                               5.000000
black sunday (la maschera del demo

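- The highest-rated movies turned out to be different from the most popular ones
- Popular movies are watched by many people with diverse opinions, which likely pulls their average rating down; conversely, a perfect 5.0 average most likely comes from movies with very few ratings
- Highly rated movies include Ulysses, Country Life, and Schlafes Bruder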
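Because `count()` measures how many ratings a movie has and `mean()` measures how high they are, it can help to compute both at once and require a minimum number of ratings before ranking by average. The sketch below uses a made-up toy frame, and the threshold `min_ratings` is an arbitrary assumption.

```python
import pandas as pd

toy = pd.DataFrame({
    'title':  ['a', 'a', 'a', 'a', 'b', 'c', 'c', 'c'],
    'counts': [4, 5, 4, 3, 5, 5, 4, 5],
})

# One pass gives both the number of ratings and the average rating per title.
stats = toy.groupby('title')['counts'].agg(n_ratings='count', mean_rating='mean')

min_ratings = 3  # hypothetical cutoff; movies rated fewer times are ignored
top = (stats[stats['n_ratings'] >= min_ratings]
       .sort_values('mean_rating', ascending=False))
print(top)
```

In this toy data, movie `b` has a perfect average from a single rating and is filtered out, while `c` ranks above `a`.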

#### Look at the movies watched by user 1

In [34]:
ratings3_movie_data[ratings3_movie_data['user_id']==1]

Unnamed: 0,user_id,movie_id,counts,title,genre
0,1,1193,5,one flew over the cuckoo's nest (1975),drama
1725,1,661,3,james and the giant peach (1996),animation|children's|musical
2250,1,914,3,my fair lady (1964),musical|romance
2886,1,3408,4,erin brockovich (2000),drama
4201,1,2355,5,"bug's life, a (1998)",animation|children's|comedy
5904,1,1197,3,"princess bride, the (1987)",action|adventure|comedy|romance
8222,1,1287,5,ben-hur (1959),action|adventure|drama
8926,1,2804,5,"christmas story, a (1983)",comedy|drama
10278,1,594,4,snow white and the seven dwarfs (1937),animation|children's|musical
11041,1,919,4,"wizard of oz, the (1939)",adventure|children's|drama|musical


- User 1 watched 53 movies in total
- This user mostly watched drama and musical movies

## Finding movies I like

- I do like the comedy and drama genres, but here I will look mainly for well-known movies.

In [22]:
ratings3_movie_data[ratings3_movie_data['genre']== "animation|children's|comedy"].drop_duplicates(subset='title')[:50]

Unnamed: 0,user_id,movie_id,counts,title,genre
4201,1,2355,5,"bug's life, a (1998)",animation|children's|comedy
41626,1,1,5,toy story (1995),animation|children's|comedy
55246,1,3114,4,toy story 2 (1999),animation|children's|comedy
405723,9,3751,4,chicken run (2000),animation|children's|comedy
656928,18,2141,5,"american tail, an (1986)",animation|children's|comedy
657332,18,2142,3,"american tail: fievel goes west, an (1991)",animation|children's|comedy
676614,44,1064,4,aladdin and the king of thieves (1996),animation|children's|comedy
928719,75,2354,4,"rugrats movie, the (1998)",animation|children's|comedy
944202,148,3754,3,"adventures of rocky and bullwinkle, the (2000)",animation|children's|comedy
996385,563,3611,3,saludos amigos (1943),animation|children's|comedy


In [23]:
ratings3_movie_data[ratings3_movie_data['genre']== "action|sci-fi|thriller"].drop_duplicates(subset='title')[:50]

Unnamed: 0,user_id,movie_id,counts,title,genre
102056,2,589,4,terminator 2: judgment day (1991),action|sci-fi|thriller
140405,2,2571,4,"matrix, the (1999)",action|sci-fi|thriller
221008,17,3527,4,predator (1987),action|sci-fi|thriller
224023,4,1240,5,"terminator, the (1984)",action|sci-fi|thriller
349735,7,1573,4,face/off (1997),action|sci-fi|thriller
379260,17,2600,3,existenz (1999),action|sci-fi|thriller
427328,10,1831,5,lost in space (1998),action|sci-fi|thriller
596260,15,748,3,"arrival, the (1996)",action|sci-fi|thriller
622779,23,2722,3,deep blue sea (1999),action|sci-fi|thriller
721095,42,1037,3,"lawnmower man, the (1992)",action|sci-fi|thriller


In [24]:
ratings3_movie_data[ratings3_movie_data['genre']== "crime|drama"].drop_duplicates(subset='title')[:50]

Unnamed: 0,user_id,movie_id,counts,title,genre
64571,2,2268,5,"few good men, a (1992)",crime|drama
74399,5,1213,5,goodfellas (1990),crime|drama
106150,2,1945,5,on the waterfront (1954),crime|drama
114987,2,1084,3,bonnie and clyde (1967),crime|drama
146474,2,3735,3,serpico (1973),crime|drama
231376,5,1466,3,donnie brasco (1997),crime|drama
240911,5,296,4,pulp fiction (1994),crime|drama
289423,5,1729,4,jackie brown (1997),crime|drama
616162,15,1804,4,"newton boys, the (1998)",crime|drama
714353,22,1799,4,suicide kings (1997),crime|drama


In [25]:
ratings3_movie_data[ratings3_movie_data['genre']== "comedy"].drop_duplicates(subset='title')[:50]

Unnamed: 0,user_id,movie_id,counts,title,genre
14386,1,2918,4,ferris bueller's day off (1986),comedy
16741,1,2791,4,airplane! (1980),comedy
21674,1,2321,3,pleasantville (1998),comedy
61126,2,1537,4,shall we dance? (shall we dansu?) (1996),comedy
138499,2,2359,3,waking ned devine (1998),comedy
148641,2,3809,3,what about bob? (1991),comedy
182198,3,3421,4,animal house (1978),comedy
183406,12,1641,3,"full monty, the (1997)",comedy
184604,3,1394,4,raising arizona (1987),comedy
186038,3,3534,3,28 days (2000),comedy


In [26]:
ratings3_movie_data[ratings3_movie_data['title']== 'shall we dance? (shall we dansu?) (1996)']

Unnamed: 0,user_id,movie_id,counts,title,genre
61126,2,1537,4,shall we dance? (shall we dansu?) (1996),comedy
61127,28,1537,4,shall we dance? (shall we dansu?) (1996),comedy
61128,45,1537,3,shall we dance? (shall we dansu?) (1996),comedy
61129,59,1537,3,shall we dance? (shall we dansu?) (1996),comedy
61130,76,1537,5,shall we dance? (shall we dansu?) (1996),comedy
...,...,...,...,...,...
61471,6015,1537,3,shall we dance? (shall we dansu?) (1996),comedy
61472,6016,1537,5,shall we dance? (shall we dansu?) (1996),comedy
61473,6031,1537,4,shall we dance? (shall we dansu?) (1996),comedy
61474,6036,1537,4,shall we dance? (shall we dansu?) (1996),comedy


#### The movies I picked are listed below
- toy story 2 (1999), chicken run (2000), matrix, the (1999), goodfellas (1990), shall we dance? (shall we dansu?) (1996)
- I treat these as movies that match my taste and add them to the dataset

In [27]:
ratings3_movie_data.shape

(836478, 5)

In [28]:
ratings3_movie_data.tail(10)

Unnamed: 0,user_id,movie_id,counts,title,genre
1000197,5334,3323,3,chain of fools (2000),comedy|crime
1000199,5334,3382,5,song of freedom (1936),drama
1000200,5420,1843,3,slappy and the stinkers (1998),children's|comedy
1000201,5433,286,3,nemesis 2: nebula (1995),action|sci-fi|thriller
1000202,5494,3530,4,smoking/no smoking (1993),comedy
1000203,5556,2198,3,modulations (1998),documentary
1000204,5949,2198,5,modulations (1998),documentary
1000205,5675,2703,3,broken vessels (1998),drama
1000207,5851,3607,5,one little indian (1973),comedy|drama|western
1000208,5938,2909,4,"five wives, three secretaries and me (1998)",documentary


In [37]:
ratings3_movie_data.user_id.value_counts()

4169    1968
4277    1715
1680    1515
3618    1146
1015    1145
        ... 
1102       9
4636       9
4056       9
4349       7
4486       1
Name: user_id, Length: 6039, dtype: int64

In [38]:
my_favorite = ['toy story 2 (1999)', 'chicken run (2000)', 'matrix, the (1999)', 'goodfellas (1990)', 'shall we dance? (shall we dansu?) (1996)']
favorite_num = [3114, 3751, 2571, 1213, 1537]
my_playlist = pd.DataFrame({'user_id': ['lsh']*5, 'movie_id': favorite_num, 'counts': [5,4,3,5,4]})


if not ratings3_movie_data.isin({'user_id':['lsh']})['user_id'].any():  # if no row has user_id 'lsh' yet
    # DataFrame.append was removed in pandas 2.0, so pd.concat is used to add my_playlist
    ratings3_movie_data = pd.concat([ratings3_movie_data, my_playlist])

ratings3_movie_data.tail(10)

Unnamed: 0,user_id,movie_id,counts,title,genre
1000203,5556,2198,3,modulations (1998),documentary
1000204,5949,2198,5,modulations (1998),documentary
1000205,5675,2703,3,broken vessels (1998),drama
1000207,5851,3607,5,one little indian (1973),comedy|drama|western
1000208,5938,2909,4,"five wives, three secretaries and me (1998)",documentary
0,lsh,3114,5,,
1,lsh,3751,4,,
2,lsh,2571,3,,
3,lsh,1213,5,,
4,lsh,1537,4,,
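The rows added for `lsh` have empty `title` and `genre` fields because `my_playlist` only carries `user_id`, `movie_id`, and `counts`. If those fields are wanted, one option is to look them up from the movie metadata before appending. This is a sketch on toy frames (`movies_toy`, `my_rows`), not a change to the notebook's flow.

```python
import pandas as pd

movies_toy = pd.DataFrame({
    'movie_id': [3114, 2571],
    'title': ['toy story 2 (1999)', 'matrix, the (1999)'],
    'genre': ["animation|children's|comedy", 'action|sci-fi|thriller'],
})
my_rows = pd.DataFrame({'user_id': ['lsh'] * 2,
                        'movie_id': [3114, 2571],
                        'counts': [5, 3]})

# A left merge keeps my rows and pulls title/genre in by movie_id;
# validate raises an error if movie_id is not unique in the metadata.
my_rows_full = my_rows.merge(movies_toy, on='movie_id',
                             how='left', validate='many_to_one')
print(my_rows_full)
```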


# Preprocessing for the model

In [39]:
ratings3_movie_data = ratings3_movie_data.reset_index(drop=True)
ratings3_movie_data.tail(10)

Unnamed: 0,user_id,movie_id,counts,title,genre
836473,5556,2198,3,modulations (1998),documentary
836474,5949,2198,5,modulations (1998),documentary
836475,5675,2703,3,broken vessels (1998),drama
836476,5851,3607,5,one little indian (1973),comedy|drama|western
836477,5938,2909,4,"five wives, three secretaries and me (1998)",documentary
836478,lsh,3114,5,,
836479,lsh,3751,4,,
836480,lsh,2571,3,,
836481,lsh,1213,5,,
836482,lsh,1537,4,,


In [40]:
# Find the unique users and movies
user_unique = ratings3_movie_data['user_id'].unique()
movie_unique = ratings3_movie_data['movie_id'].unique()

In [41]:
user_to_idx = {v:k for k,v in enumerate(user_unique)}
movie_to_idx = {v:k for k,v in enumerate(movie_unique)}

In [44]:
print(user_to_idx['lsh'])
print(movie_to_idx[3114])

6039
50


In [45]:
temp_user_data = ratings3_movie_data['user_id'].map(user_to_idx.get).dropna()
if len(temp_user_data) == len(ratings3_movie_data):   # every row was indexed successfully
    print('user_id column indexing OK!!')
    ratings3_movie_data['user_id'] = temp_user_data
else:
    print('user_id column indexing Fail!!')

# Index the movie_id column the same way using movie_to_idx.
temp_movie_data = ratings3_movie_data['movie_id'].map(movie_to_idx.get).dropna()
if len(temp_movie_data) == len(ratings3_movie_data):
    print('movie_id column indexing OK!!')
    ratings3_movie_data['movie_id'] = temp_movie_data
else:
    print('movie_id column indexing Fail!!')

ratings3_movie_data

user_id column indexing OK!!
movie_id column indexing OK!!


Unnamed: 0,user_id,movie_id,counts,title,genre
0,0,0,5,one flew over the cuckoo's nest (1975),drama
1,1,0,5,one flew over the cuckoo's nest (1975),drama
2,2,0,4,one flew over the cuckoo's nest (1975),drama
3,3,0,4,one flew over the cuckoo's nest (1975),drama
4,4,0,5,one flew over the cuckoo's nest (1975),drama
...,...,...,...,...,...
836478,6039,50,5,,
836479,6039,531,4,,
836480,6039,132,3,,
836481,6039,67,5,,
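The indexing step replaces raw ids with consecutive integers starting at 0, so that each id can serve as a row or column position in a matrix. A small sketch of the same idea, including the reverse mapping that is needed to turn model output back into raw ids; the sample ids and the names `id_to_idx` and `idx_to_id` are illustrative only.

```python
# Raw ids in order of appearance, with repeats.
raw_ids = [1193, 661, 1193, 914, 661]

# dict.fromkeys keeps first-seen order while dropping duplicates.
unique_ids = list(dict.fromkeys(raw_ids))

id_to_idx = {raw: idx for idx, raw in enumerate(unique_ids)}
idx_to_id = {idx: raw for raw, idx in id_to_idx.items()}

encoded = [id_to_idx[raw] for raw in raw_ids]
decoded = [idx_to_id[idx] for idx in encoded]
print(encoded)
```

The round trip `decoded == raw_ids` is a quick way to confirm that no id was lost or mapped twice.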


In [46]:
from scipy.sparse import csr_matrix
import numpy as np

# Build the csr_matrix
csr_data = csr_matrix((ratings3_movie_data.counts, (ratings3_movie_data.user_id, ratings3_movie_data.movie_id)), shape=(len(user_unique), len(movie_unique)))

csr_data

<6040x3628 sparse matrix of type '<class 'numpy.int64'>'
	with 836483 stored elements in Compressed Sparse Row format>

- The csr_matrix was created with the expected shape: 6040 users x 3628 movies
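For reference, the matrix above stores 836,483 of 6040 x 3628 possible entries, which is roughly 3.8% of the cells. The sketch below builds a tiny matrix the same way to show how the `(data, (row, col))` constructor works; the toy arrays are made up.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Each position i describes one interaction: user users[i] rated item items[i] with counts[i].
users = np.array([0, 0, 1, 2])
items = np.array([0, 2, 1, 2])
counts = np.array([5, 3, 4, 5])

toy_csr = csr_matrix((counts, (users, items)), shape=(3, 4))
density = toy_csr.nnz / (toy_csr.shape[0] * toy_csr.shape[1])
print(toy_csr.toarray())
print(f'density: {density:.1%}')

# Entries that share the same (row, col) pair are summed by this constructor.
dup_csr = csr_matrix(([1, 2], ([0, 0], [0, 0])), shape=(1, 1))
print(dup_csr[0, 0])
```

The summing behaviour matters if a user could appear with the same movie more than once; in that case the stored value would be the total rather than the last rating.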

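# Configuring als_model
1. factors: the number of dimensions of the user and item vectors
2. regularization: the strength of the regularization term used to prevent overfitting
3. use_gpu: whether to train on a GPU
4. iterations: equivalent to epochs; how many passes to make over the data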
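To make the roles of `factors`, `regularization`, and `iterations` more concrete, here is a minimal NumPy sketch of alternating least squares for implicit feedback, following the common formulation with confidence `1 + alpha * count` and binary preference. It is not the `implicit` library's implementation; the function `als_step`, the `alpha` value, and the toy matrix are assumptions made for illustration.

```python
import numpy as np

def als_step(R, fixed, reg, alpha):
    """Solve one side's factors while the other side (`fixed`) is held constant.

    R has one row per entity being solved and one column per fixed entity.
    """
    n_factors = fixed.shape[1]
    solved = np.zeros((R.shape[0], n_factors))
    for row in range(R.shape[0]):
        conf = 1.0 + alpha * R[row]          # confidence grows with the count
        pref = (R[row] > 0).astype(float)    # 1 if there was any interaction
        A = fixed.T @ (conf[:, None] * fixed) + reg * np.eye(n_factors)
        b = fixed.T @ (conf * pref)
        solved[row] = np.linalg.solve(A, b)  # regularized weighted least squares
    return solved

# Toy user x item matrix with two clear taste groups.
R = np.array([[5, 4, 0, 0],
              [4, 5, 0, 0],
              [0, 0, 5, 4],
              [0, 0, 4, 5]], dtype=float)

factors, reg, alpha, iterations = 2, 0.01, 1.0, 20
rng = np.random.default_rng(0)
user_f = rng.normal(scale=0.1, size=(R.shape[0], factors))
item_f = rng.normal(scale=0.1, size=(R.shape[1], factors))

for _ in range(iterations):
    user_f = als_step(R, item_f, reg, alpha)    # items fixed, solve users
    item_f = als_step(R.T, user_f, reg, alpha)  # users fixed, solve items

scores = user_f @ item_f.T
print(np.round(scores, 2))
```

Each iteration alternates between the two sides, which is where the name comes from. A larger `factors` value gives the vectors more capacity, and a larger `reg` pulls them toward zero.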

In [47]:
als_model = AlternatingLeastSquares(factors=120, regularization=0.01, use_gpu=False, iterations=20, dtype=np.float32)

In [48]:
# This ALS model takes an item x user matrix as input, so transpose the data
csr_data_transpose = csr_data.T
csr_data_transpose

<3628x6040 sparse matrix of type '<class 'numpy.int64'>'
	with 836483 stored elements in Compressed Sparse Column format>

In [49]:
# Train the model
als_model.fit(csr_data_transpose)

  0%|          | 0/20 [00:00<?, ?it/s]

- Training completed

In [50]:
print(ratings3_movie_data.tail(5))

        user_id  movie_id  counts title genre
836478     6039        50       5   NaN   NaN
836479     6039       531       4   NaN   NaN
836480     6039       132       3   NaN   NaN
836481     6039        67       5   NaN   NaN
836482     6039        55       4   NaN   NaN


In [51]:
als_model.user_factors[6039]

array([-8.53705183e-02,  5.37052274e-01, -1.12087794e-01,  5.95998466e-01,
        4.14331138e-01, -6.18317910e-02,  2.71955401e-01, -6.04444206e-01,
        1.97816446e-01,  3.84320505e-02, -1.19313985e-01, -3.35570127e-01,
       -3.26557547e-01, -1.44790396e-01,  9.13839817e-01, -7.48380065e-01,
       -7.20492601e-02,  8.15467298e-01, -1.58641525e-02,  3.57634485e-01,
        2.44402081e-01,  3.87732778e-03,  7.09041238e-01, -3.04392368e-01,
       -5.85133135e-01, -1.39939189e-01, -5.80922067e-01, -8.41647908e-02,
        2.94399969e-02,  2.86483824e-01, -3.79607946e-01,  4.05484557e-01,
        2.48027429e-01, -1.34701326e-01, -1.01085699e+00,  1.26450256e-01,
       -1.26676127e-01,  3.82600009e-01, -5.13621449e-01, -4.24090385e-01,
        5.30969463e-02,  7.60107219e-01, -5.06653726e-01, -6.55268356e-02,
        9.08387750e-02, -3.19544673e-01,  3.59972447e-01,  9.95329559e-01,
        1.76201358e-01,  4.29758728e-02, -4.32685196e-01,  3.40098768e-01,
        6.69364154e-01, -

- The user vector was generated successfully

In [52]:
my_id = als_model.user_factors[6039]
movie1 = als_model.item_factors[50] 
movie2 = als_model.item_factors[531]
movie3 = als_model.item_factors[132]
movie4 = als_model.item_factors[67]
movie5 = als_model.item_factors[55]

In [54]:
print('toy story2: ', np.dot(my_id, movie1))
print('chicken run: ', np.dot(my_id, movie2))
print('matrix: ', np.dot(my_id, movie3))
print('goodfellas: ', np.dot(my_id, movie4))
print('shall we dance?: ', np.dot(my_id, movie5))

toy story2:  0.7061159
chicken run:  0.62313414
matrix:  0.28991598
goodfellas:  0.40681395
shall we dance?:  0.30592582
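Each score above is the dot product of one user vector and one item vector. The same numbers for every user and item pair can be obtained with a single matrix product. The sketch below demonstrates the equivalence on random stand-in factors, not on the trained model.

```python
import numpy as np

rng = np.random.default_rng(42)
# Stand-ins for als_model.user_factors and als_model.item_factors.
user_factors = rng.normal(size=(3, 4)).astype(np.float32)
item_factors = rng.normal(size=(5, 4)).astype(np.float32)

# Score of user 2 for item 1, computed as in the cell above.
single_score = np.dot(user_factors[2], item_factors[1])

# Scores for all users and items at once: shape (n_users, n_items).
all_scores = user_factors @ item_factors.T
print(single_score, all_scores[2, 1])
```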


In [67]:
ratings3_movie_data.loc[ratings3_movie_data['movie_id']==50].iloc[0].title

'toy story 2 (1999)'

In [68]:
def movie_names(movie_id):
    movie_name = ratings3_movie_data.loc[ratings3_movie_data['movie_id']==movie_id].iloc[0].title
    return movie_name

In [69]:
movie_names(50)

'toy story 2 (1999)'

- Movies are returned as index values, so I wrote a function that looks up the movie title
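`movie_names` scans the whole DataFrame on every call. When many lookups are needed, a dictionary built once may be more convenient. This is a sketch on a made-up toy frame; the name `idx_to_title` is hypothetical.

```python
import pandas as pd

# Toy frame: movie index 1 also appears in a row with a missing title,
# like the rows that were added for the new user.
toy = pd.DataFrame({'movie_id': [0, 0, 1, 1, 1],
                    'title': ['movie a', 'movie a', 'movie b', 'movie b', None]})

idx_to_title = (toy.dropna(subset=['title'])   # skip rows without a title
                   .drop_duplicates('movie_id')  # one row per movie
                   .set_index('movie_id')['title']
                   .to_dict())
print(idx_to_title[1])
```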

## Finding similar movies

In [83]:
similar_movie = als_model.similar_items(50, N=10)
similar_movie[0][1]

1.0000002

In [84]:
for i in similar_movie:
    print(i)

(50, 1.0000002)
(40, 0.7846546)
(4, 0.6093057)
(381, 0.3730424)
(33, 0.3382471)
(939, 0.33638173)
(1691, 0.31800535)
(32, 0.29966077)
(531, 0.29712594)
(16, 0.29326802)


In [89]:
def similary_movie(idx):
    similar_movie = als_model.similar_items(idx, N=10)
    for i in similar_movie:
        print('title:{}, similarity:{}'.format(movie_names(i[0]),round(i[1],3)))

- A function that finds similar movies

In [90]:
similary_movie(50)

title:toy story 2 (1999), similarity:1.0
title:toy story (1995), similarity:0.7850000262260437
title:bug's life, a (1998), similarity:0.609000027179718
title:babe (1995), similarity:0.37299999594688416
title:aladdin (1992), similarity:0.33799999952316284
title:iron giant, the (1999), similarity:0.335999995470047
title:stuart little (1999), similarity:0.3179999887943268
title:hercules (1997), similarity:0.30000001192092896
title:chicken run (2000), similarity:0.296999990940094
title:tarzan (1999), similarity:0.2930000126361847


- The function finds and displays the movies most similar to toy story 2
- Similar movies include toy story (1995), chicken run, and tarzan
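The movie's similarity to itself is about 1.0, which is what cosine similarity between item vectors would give. Assuming that is the measure in use, a nearest-neighbour lookup can be sketched in NumPy as follows; `most_similar` and the toy factors are illustrative and not part of the library.

```python
import numpy as np

def most_similar(item_factors, item_idx, n=3):
    """Return the n items with the highest cosine similarity to item_idx."""
    norms = np.linalg.norm(item_factors, axis=1)
    sims = item_factors @ item_factors[item_idx] / (norms * norms[item_idx])
    order = np.argsort(-sims)[:n]  # indices sorted by descending similarity
    return [(int(i), float(sims[i])) for i in order]

toy_item_factors = np.array([[1.0, 0.0],
                             [0.9, 0.1],
                             [0.0, 1.0],
                             [-1.0, 0.0]])
result = most_similar(toy_item_factors, 0, n=3)
print(result)
```

The queried item always comes first with similarity 1, so in practice the first entry is usually skipped.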

## Getting movie recommendations

In [95]:
movie_recommended = als_model.recommend(6039, csr_data, N=10, filter_already_liked_items=True)
movie_recommended

[(40, 0.48182172),
 (244, 0.34726202),
 (4, 0.32908463),
 (167, 0.28031832),
 (259, 0.27252793),
 (51, 0.2690822),
 (381, 0.24837357),
 (128, 0.24043323),
 (97, 0.2320743),
 (336, 0.21587679)]

In [93]:
def recommended_movie(idx):
    movie_recommended = als_model.recommend(idx, csr_data, N=10, filter_already_liked_items=True)

    for i in movie_recommended:
        print('title:{}, score:{}'.format(movie_names(i[0]),round(i[1],3)))

In [94]:
recommended_movie(6039)

title:toy story (1995), score:0.4819999933242798
title:pulp fiction (1994), score:0.34700000286102295
title:bug's life, a (1998), score:0.32899999618530273
title:shawshank redemption, the (1994), score:0.2800000011920929
title:usual suspects, the (1995), score:0.27300000190734863
title:fargo (1996), score:0.26899999380111694
title:babe (1995), score:0.24799999594688416
title:silence of the lambs, the (1991), score:0.23999999463558197
title:terminator 2: judgment day (1991), score:0.23199999332427979
title:reservoir dogs (1992), score:0.2160000056028366


- The model recommends crime/drama movies such as shawshank redemption, the and pulp fiction
- I actually like genres such as crime, thriller, and drama, so I personally think these are good recommendations
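Conceptually, a recommendation like the one above scores every item against the user vector, hides the items the user already liked, and returns the top N. A minimal NumPy sketch under that assumption; `recommend_top_n` and the toy vectors are made up and do not reproduce the library's internals.

```python
import numpy as np

def recommend_top_n(user_vec, item_factors, liked_idx, n=2):
    """Rank items by dot-product score, skipping already liked items."""
    scores = (item_factors @ user_vec).astype(float)
    scores[list(liked_idx)] = -np.inf  # same role as filter_already_liked_items=True
    order = np.argsort(-scores)[:n]
    return [(int(i), float(scores[i])) for i in order]

user_vec = np.array([1.0, 0.5])
toy_item_factors = np.array([[1.0, 0.0],
                             [0.8, 0.2],
                             [0.0, 1.0],
                             [0.5, 0.5]])
recs = recommend_top_n(user_vec, toy_item_factors, liked_idx={0}, n=2)
print(recs)
```

Item 0 has the highest raw score here but is excluded because the user already liked it.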

## Checking contributions
- Find out why pulp fiction was recommended

In [96]:
explain = als_model.explain(6039, csr_data, itemid=244)
explain

(0.34361756566292256,
 [(67, 0.3074363547165795),
  (132, 0.014215865124967492),
  (55, 0.01243762249240068),
  (50, 0.005882047720603415),
  (531, 0.003645675608371532)],
 (array([[ 6.00193867e-01,  1.23686119e-01,  8.10714842e-02, ...,
           9.87894867e-02,  9.71732537e-02,  1.20545832e-01],
         [ 7.42356500e-02,  6.15568561e-01,  6.32009086e-02, ...,
           1.32234950e-01,  9.83889347e-02,  9.77875860e-02],
         [ 4.86586076e-02,  4.89319096e-02,  5.88962401e-01, ...,
           8.72844593e-02,  7.69917355e-02,  3.64148010e-02],
         ...,
         [ 5.92928440e-02,  9.36185661e-02,  6.77736440e-02, ...,
           5.49947181e-01, -3.56868894e-04, -1.71177138e-02],
         [ 5.83227909e-02,  7.25841175e-02,  5.94414873e-02, ...,
           6.53323349e-02,  5.29021341e-01,  5.99522758e-03],
         [ 7.23508689e-02,  7.51048097e-02,  3.74000424e-02, ...,
           5.58156405e-02,  7.05960748e-02,  5.36210666e-01]]),
  False))

In [98]:
explain[1]

[(67, 0.3074363547165795),
 (132, 0.014215865124967492),
 (55, 0.01243762249240068),
 (50, 0.005882047720603415),
 (531, 0.003645675608371532)]

In [99]:
def contribution_movie(user, movie_idx):
    explain = als_model.explain(user, csr_data, itemid=movie_idx)
    explain1 = explain[1]
    
    for i in explain1:
        print('title:{}, contribution:{}'.format(movie_names(i[0]),round(i[1],3)))

In [100]:
contribution_movie(6039, 244)

title:goodfellas (1990), contribution:0.307
title:matrix, the (1999), contribution:0.014
title:shall we dance? (shall we dansu?) (1996), contribution:0.012
title:toy story 2 (1999), contribution:0.006
title:chicken run (2000), contribution:0.004


- pulp fiction was recommended mainly because I liked goodfellas
- Both are crime/drama movies, which suggests the model learned that viewers who like one tend to like the other
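The per-movie contributions returned by `explain` add up to the total score in the first element of the tuple. The check below uses the values copied from the output above and also shows that goodfellas accounts for about 89% of the total.

```python
# Values copied from the explain() output above.
total_score = 0.34361756566292256
contributions = [(67, 0.3074363547165795),
                 (132, 0.014215865124967492),
                 (55, 0.01243762249240068),
                 (50, 0.005882047720603415),
                 (531, 0.003645675608371532)]

contribution_sum = sum(value for _, value in contributions)
top_idx, top_value = max(contributions, key=lambda pair: pair[1])
share = top_value / total_score

print(f'sum of contributions: {contribution_sum:.6f}')
print(f'movie index {top_idx} explains {share:.1%} of the score')
```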

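# Overall retrospective
- Overall, this node was not as hard as I expected.
- Apart from an error caused by mismatched index values when building the csr_matrix, everything went smoothly.
- The recommended movies matched the genres I am interested in, so I think a meaningful csr_matrix was built.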

# References
- https://bkshin.tistory.com/entry/NLP-7-%ED%9D%AC%EC%86%8C-%ED%96%89%EB%A0%AC-Sparse-Matrix-COO-%ED%98%95%EC%8B%9D-CSR-%ED%98%95%EC%8B%9D
- https://lsjsj92.tistory.com/569