## Recommendation System

1. Content-based Filtering
    - Recommends items that suit a user based on the attributes of the items

2. Collaborative Filtering
    - Recommends based on similarity between users' preferences
    - Can be done in either a user-based or an item-based fashion

3. Hybrid Recommendation System
    - Recommends by combining collaborative filtering and content-based filtering

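- Movie data
    1. **id**: The movie's unique ID.
    2. **title**: The movie's title.
    3. **budget**: Production budget (in USD).
    4. **popularity**: The movie's popularity score, as provided by TMDb.
    5. **genres**: The movie's genres; given as a list when there are several.
    6. **overview**: Text describing the movie's plot or synopsis.
    7. **release_date**: The movie's release date.
    8. **revenue**: The movie's total revenue (in USD).
    9. **runtime**: The movie's running time (in minutes).
    10. **vote_average**: The movie's average rating, as provided by TMDb.
    11. **vote_count**: Number of ratings the movie received.
    12. **production_companies**: List of the movie's production companies.
    13. **production_countries**: List of the movie's production countries.
    14. **spoken_languages**: List of languages spoken in the movie.
    15. **cast**: List of the main cast.
    16. **crew**: List of the main crew members who worked on the movie.
    17. **keywords**: List of the movie's keywords.
    18. **tagline**: The movie's tagline (its main promotional line).
    19. **original_language**: The movie's original language (e.g., English, Korean).
    20. **homepage**: URL of the movie's official website.
    21. **poster_path**: URL path to the movie's poster image.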

In [64]:
import numpy as np
import pandas as pd

In [65]:
# Load the data
movie_df = pd.read_csv('data/tmdb_5000_movies.csv')
movie_df.head()
movie_df.shape

(4803, 20)

In [66]:
# Select the columns to use
movie_df = movie_df[['id', 'title', 'genres', 'vote_average', 'vote_count', 'popularity', 'keywords', 'overview']]
movie_df.info()
movie_df.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4803 entries, 0 to 4802
Data columns (total 8 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   id            4803 non-null   int64  
 1   title         4803 non-null   object 
 2   genres        4803 non-null   object 
 3   vote_average  4803 non-null   float64
 4   vote_count    4803 non-null   int64  
 5   popularity    4803 non-null   float64
 6   keywords      4803 non-null   object 
 7   overview      4800 non-null   object 
dtypes: float64(2), int64(2), object(4)
memory usage: 300.3+ KB


Unnamed: 0,id,title,genres,vote_average,vote_count,popularity,keywords,overview
0,19995,Avatar,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",7.2,11800,150.437577,"[{""id"": 1463, ""name"": ""culture clash""}, {""id"":...","In the 22nd century, a paraplegic Marine is di..."
1,285,Pirates of the Caribbean: At World's End,"[{""id"": 12, ""name"": ""Adventure""}, {""id"": 14, ""...",6.9,4500,139.082615,"[{""id"": 270, ""name"": ""ocean""}, {""id"": 726, ""na...","Captain Barbossa, long believed to be dead, ha..."
2,206647,Spectre,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",6.3,4466,107.376788,"[{""id"": 470, ""name"": ""spy""}, {""id"": 818, ""name...",A cryptic message from Bond’s past sends him o...
3,49026,The Dark Knight Rises,"[{""id"": 28, ""name"": ""Action""}, {""id"": 80, ""nam...",7.6,9106,112.31295,"[{""id"": 849, ""name"": ""dc comics""}, {""id"": 853,...",Following the death of District Attorney Harve...
4,49529,John Carter,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",6.1,2124,43.926995,"[{""id"": 818, ""name"": ""based on novel""}, {""id"":...","John Carter is a war-weary, former military ca..."


In [67]:
# Preprocess the genre data
from ast import literal_eval

# str -> list(dict)
movie_df['genres'] = movie_df['genres'].apply(literal_eval)

In [68]:
# Keep only each dict's 'name' value, collected into a list
movie_df['genres'] = movie_df['genres'].apply(lambda genres: [genre['name'] for genre in genres])
movie_df['genres']

0       [Action, Adventure, Fantasy, Science Fiction]
1                        [Adventure, Fantasy, Action]
2                          [Action, Adventure, Crime]
3                    [Action, Crime, Drama, Thriller]
4                [Action, Adventure, Science Fiction]
                            ...                      
4798                        [Action, Crime, Thriller]
4799                                [Comedy, Romance]
4800               [Comedy, Drama, Romance, TV Movie]
4801                                               []
4802                                    [Documentary]
Name: genres, Length: 4803, dtype: object

In [69]:
# list -> str (space-separated string)
movie_df['genres'] = movie_df['genres'].apply(lambda x: ' '.join(x))
movie_df['genres']

0       Action Adventure Fantasy Science Fiction
1                       Adventure Fantasy Action
2                         Action Adventure Crime
3                    Action Crime Drama Thriller
4               Action Adventure Science Fiction
                          ...                   
4798                       Action Crime Thriller
4799                              Comedy Romance
4800               Comedy Drama Romance TV Movie
4801                                            
4802                                 Documentary
Name: genres, Length: 4803, dtype: object
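
The two preprocessing steps above can be tried on a single raw value. This is a minimal sketch, assuming the raw column holds strings shaped like those in the `head()` output; the sample string and the `genres_to_text` helper are made up for illustration.

```python
from ast import literal_eval

# Hypothetical raw value, shaped like the `genres` strings in the CSV.
raw = '[{"id": 28, "name": "Action"}, {"id": 878, "name": "Science Fiction"}]'

def genres_to_text(raw_genres):
    """Parse the string into list(dict), keep each name, join with spaces."""
    parsed = literal_eval(raw_genres)
    return ' '.join(genre['name'] for genre in parsed)

text = genres_to_text(raw)
print(text)                         # Action Science Fiction
print(repr(genres_to_text('[]')))   # '' for a movie with no genres
```

Joining with spaces means a two-word genre such as "Science Fiction" is later tokenized as two separate words, which is consistent with the lowercase word-level entries in the vocabulary the notebook builds next.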

In [70]:
# Use CountVectorizer to measure genre similarity
from sklearn.feature_extraction.text import CountVectorizer

count_vectorizer = CountVectorizer(ngram_range=(1, 2))
genres_vec = count_vectorizer.fit_transform(movie_df['genres'])
print(genres_vec.shape)
print(genres_vec.toarray()[:5])
genres_vec_vocab = pd.DataFrame(count_vectorizer.get_feature_names_out())
genres_vec_vocab

(4803, 276)
[[1 1 0 ... 0 0 0]
 [1 0 0 ... 0 0 0]
 [1 1 0 ... 0 0 0]
 [1 0 0 ... 0 0 0]
 [1 1 0 ... 0 0 0]]


Unnamed: 0,0
0,action
1,action adventure
2,action animation
3,action comedy
4,action crime
...,...
271,western drama
272,western history
273,western music
274,western romance


### Measuring Cosine Similarity

In [71]:
from sklearn.metrics.pairwise import cosine_similarity

genres_sim = cosine_similarity(genres_vec, genres_vec)
genres_sim.shape
genres_sim[:2]

array([[1.        , 0.59628479, 0.4472136 , ..., 0.        , 0.        ,
        0.        ],
       [0.59628479, 1.        , 0.4       , ..., 0.        , 0.        ,
        0.        ]], shape=(2, 4803))
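
The value 0.59628479 in the first row above is the similarity between Avatar (row 0) and Pirates of the Caribbean: At World's End (row 1). The sketch below recomputes it by hand from the two genre strings. The helpers `ngram_counts` and `cosine` are hypothetical and only mimic the vectorizer for simple space-separated words like these.

```python
import math
from collections import Counter

def ngram_counts(text, max_n=2):
    """Count lowercase word n-grams for n = 1..max_n."""
    tokens = text.lower().split()
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[' '.join(tokens[i:i + n])] += 1
    return counts

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

avatar = ngram_counts('Action Adventure Fantasy Science Fiction')
pirates = ngram_counts('Adventure Fantasy Action')
sim = cosine(avatar, pirates)
print(round(sim, 8))  # 0.59628479
```

Avatar has 9 n-grams (norm 3), Pirates has 5 (norm √5), and they share 4 (`action`, `adventure`, `fantasy`, `adventure fantasy`), giving 4 / (3·√5).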

In [72]:
movie_idx_by_genres_sim = genres_sim.argsort(axis=1)[:, ::-1]
movie_idx_by_genres_sim[:2]

array([[   0, 3494,  813, ..., 3038, 3037, 2401],
       [ 262,    1,  129, ..., 3069, 3067, 2401]], shape=(2, 4803))
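
The `argsort(axis=1)[:, ::-1]` idiom can be checked on a tiny matrix. The values below are made up.

```python
import numpy as np

# Toy 3x3 similarity matrix; row i holds item i's similarity to every item.
sim = np.array([
    [1.0, 0.2, 0.7],
    [0.2, 1.0, 0.5],
    [0.7, 0.5, 1.0],
])

# argsort sorts ascending, so reversing each row gives most-similar-first indices.
order = sim.argsort(axis=1)[:, ::-1]
print(order.tolist())  # [[0, 2, 1], [1, 2, 0], [2, 0, 1]]
```

When several items tie at the same similarity, their relative order is not determined by index. In the output above, row 1 starts with 262 rather than 1, so the query movie is not guaranteed to be the first entry of its own row.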

In [73]:
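def recommend_movie_by_gener(movie_title, top_n=10):
    # Look up the row for the requested title
    movie = movie_df[movie_df['title'] == movie_title]
    if movie.empty:
        return 'Movie not found'

    # Index of the matched row(s), used for fancy indexing below
    movie_idx = movie.index

    # Indices of the top_n most genre-similar movies, flattened to 1-D
    topn_movie_idx = movie_idx_by_genres_sim[movie_idx, :top_n]
    topn_movie_idx = topn_movie_idx.reshape(-1)
    return movie_df.iloc[topn_movie_idx]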

In [74]:
recommend_movie_by_gener('Avatar', top_n=30)

Unnamed: 0,id,title,genres,vote_average,vote_count,popularity,keywords,overview
0,19995,Avatar,Action Adventure Fantasy Science Fiction,7.2,11800,150.437577,"[{""id"": 1463, ""name"": ""culture clash""}, {""id"":...","In the 22nd century, a paraplegic Marine is di..."
3494,27549,Beastmaster 2: Through the Portal of Time,Action Adventure Fantasy Science Fiction,4.6,17,1.478505,"[{""id"": 818, ""name"": ""based on novel""}, {""id"":...","Mark Singer returns as Dar, the warrior who ca..."
813,1924,Superman,Action Adventure Fantasy Science Fiction,6.9,1022,48.507081,"[{""id"": 83, ""name"": ""saving the world""}, {""id""...",Mild-mannered Clark Kent works as a reporter a...
870,8536,Superman II,Action Adventure Fantasy Science Fiction,6.5,629,30.515175,"[{""id"": 83, ""name"": ""saving the world""}, {""id""...",Three escaped criminals from the planet Krypto...
46,127585,X-Men: Days of Future Past,Action Adventure Fantasy Science Fiction,7.5,6032,118.078691,"[{""id"": 1228, ""name"": ""1970s""}, {""id"": 1852, ""...",The ultimate X-Men ensemble fights a war for t...
14,49521,Man of Steel,Action Adventure Fantasy Science Fiction,6.5,6359,99.398009,"[{""id"": 83, ""name"": ""saving the world""}, {""id""...",A young boy learns that he has extraordinary p...
1296,9531,Superman III,Comedy Action Adventure Fantasy Science Fiction,5.3,490,22.164202,"[{""id"": 83, ""name"": ""saving the world""}, {""id""...","Aiming to defeat the Man of Steel, wealthy exe..."
1652,14164,Dragonball Evolution,Action Adventure Fantasy Science Fiction Thriller,2.9,462,21.677732,"[{""id"": 3436, ""name"": ""karate""}, {""id"": 9715, ...",The young warrior Son Goku sets out on a quest...
419,8247,Jumper,Adventure Fantasy Science Fiction,5.9,1799,21.218,"[{""id"": 704, ""name"": ""adolescence""}, {""id"": 81...","David Rice is a man who knows no boundaries, a..."
420,11253,Hellboy II: The Golden Army,Adventure Fantasy Science Fiction,6.5,1527,58.57976,"[{""id"": 2096, ""name"": ""auction""}, {""id"": 7005,...",In this continuation to the adventure of the d...


### Recommendation System That Reflects Ratings

**Weighted rating**

$$
\text{Weighted Rating} = \frac{v}{v+m} \cdot R + \frac{m}{v+m} \cdot C
$$

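-  v: Number of votes cast for an individual movie. `vote_count`
-  m: Minimum number of votes required for a rating to count. A threshold you set yourself
-  R: Average rating of an individual movie. `vote_average`
-  C: Average rating across all movies. Reflects whether ratings are generous or harsh overall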
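
As a worked illustration of the formula, take m = 370.2 (the 60th-percentile `vote_count` this notebook computes) and assume C ≈ 6.09, since the notebook does not print C. A movie rated 10.0 from a single vote is pulled almost all the way down to C:

$$
\text{WR} = \frac{1}{1 + 370.2} \cdot 10.0 + \frac{370.2}{1 + 370.2} \cdot 6.09 \approx 0.027 + 6.074 \approx 6.10
$$

A movie rated 8.5 from 8205 votes keeps most of its own rating:

$$
\text{WR} = \frac{8205}{8205 + 370.2} \cdot 8.5 + \frac{370.2}{8205 + 370.2} \cdot 6.09 \approx 8.133 + 0.263 \approx 8.40
$$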


In [75]:
movie_df[['title', 'vote_average', 'vote_count']].sort_values('vote_average', ascending=False).head()

                      title  vote_average  vote_count
3519       Stiff Upper Lips          10.0           1
4247  Me You and Five Bucks          10.0           2
4045  Dancer, Texas Pop. 81          10.0           1
4662         Little Big Top          10.0           1
3992              Sardaarji           9.5           2

In [76]:
movie_df['vote_count'].quantile(0.6)

np.float64(370.1999999999998)

In [77]:
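# Variables for the weighted rating (v, m, R, C)
# Caution: m is defined as a minimum vote count (the 60th-percentile vote_count is about 370.2),
# but this line takes the quantile of vote_average, so m is only about 6.5 and low-vote
# movies are barely pulled toward C in the results that follow.
m = movie_df['vote_average'].quantile(0.6)      # threshold; by definition this should come from vote_count
C = movie_df['vote_average'].mean()             # mean rating across all movies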

def weighted_rating(movie):
    v = movie['vote_count']
    R = movie['vote_average']
    return ((v / (v + m)) * R) + ((m / (v + m)) * C) 

movie_df['weighted_vote'] = movie_df.apply(weighted_rating, axis=1)

movie_df[['title', 'vote_average', 'vote_count', 'weighted_vote']].sort_values('weighted_vote', ascending=False).head()   

Unnamed: 0,title,vote_average,vote_count,weighted_vote
1881,The Shawshank Redemption,8.5,8205,8.498094
3337,The Godfather,8.4,5893,8.397457
662,Fight Club,8.3,9413,8.298476
3232,Pulp Fiction,8.3,8428,8.298299
1818,Schindler's List,8.3,4329,8.29669
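
For comparison, here is a minimal sketch of the same formula with `m` taken from `vote_count`, as its definition calls for, and computed with column arithmetic instead of a row-wise `apply`. The DataFrame `toy` and its values are made up; only the column names mirror the notebook.

```python
import pandas as pd

# Made-up ratings for illustration.
toy = pd.DataFrame({
    'title': ['Film A', 'Film B', 'Film C', 'Film D', 'Film E'],
    'vote_average': [10.0, 8.5, 7.0, 6.0, 3.0],
    'vote_count': [1, 8000, 500, 50, 2000],
})

m = toy['vote_count'].quantile(0.6)   # vote-count threshold (60th percentile)
C = toy['vote_average'].mean()        # mean rating across all movies

v = toy['vote_count']
R = toy['vote_average']
# Column arithmetic evaluates the formula for every row at once.
toy['weighted_vote'] = (v / (v + m)) * R + (m / (v + m)) * C

ranked = toy.sort_values('weighted_vote', ascending=False)
print(ranked[['title', 'vote_average', 'vote_count', 'weighted_vote']])
```

With this threshold, "Film A" (a perfect 10.0 from one vote) lands close to the global mean and ranks below "Film B", whose 8.5 is backed by 8000 votes.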


In [82]:
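def recommend_movie_by_gener(movie_title, top_n=10):
    # Look up the row for the requested title
    movie = movie_df[movie_df['title'] == movie_title]
    if movie.empty:
        return 'Movie not found'

    movie_idx = movie.index
    # Store every movie's genre similarity to the query movie (adds or overwrites a column on movie_df)
    movie_df['genres_sim'] = genres_sim[movie_idx].reshape(-1)

    # Take the 2 * top_n most similar movies as candidates, then drop the query movie itself
    topn_movie_idx = movie_idx_by_genres_sim[movie_idx, :(top_n) * 2]
    topn_movie_idx = topn_movie_idx[topn_movie_idx != movie_idx[0]]
    # Re-rank the candidates by weighted rating and keep the top_n
    return movie_df.iloc[topn_movie_idx].sort_values('weighted_vote', ascending=False)[:top_n]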

In [83]:
recommend_movie_by_gener('Avatar')

Unnamed: 0,id,title,genres,vote_average,vote_count,popularity,keywords,overview,weighted_vote,genres_sim
3208,333355,Star Wars: Clone Wars: Volume 1,Action Adventure Animation Fantasy Science Fic...,8.0,27,1.881466,"[{""id"": 6091, ""name"": ""war""}, {""id"": 161176, ""...","The Saga continues with the Emmy-winning ""Star...",7.629824,0.80403
46,127585,X-Men: Days of Future Past,Action Adventure Fantasy Science Fiction,7.5,6032,118.078691,"[{""id"": 1228, ""name"": ""1970s""}, {""id"": 1852, ""...",The ultimate X-Men ensemble fights a war for t...,7.498485,1.0
813,1924,Superman,Action Adventure Fantasy Science Fiction,6.9,1022,48.507081,"[{""id"": 83, ""name"": ""saving the world""}, {""id""...",Mild-mannered Clark Kent works as a reporter a...,6.894895,1.0
14,49521,Man of Steel,Action Adventure Fantasy Science Fiction,6.5,6359,99.398009,"[{""id"": 83, ""name"": ""saving the world""}, {""id""...",A young boy learns that he has extraordinary p...,6.499584,1.0
420,11253,Hellboy II: The Golden Army,Adventure Fantasy Science Fiction,6.5,1527,58.57976,"[{""id"": 2096, ""name"": ""auction""}, {""id"": 7005,...",In this continuation to the adventure of the d...,6.498271,0.881917
870,8536,Superman II,Action Adventure Fantasy Science Fiction,6.5,629,30.515175,"[{""id"": 83, ""name"": ""saving the world""}, {""id""...",Three escaped criminals from the planet Krypto...,6.495829,1.0
232,76170,The Wolverine,Action Science Fiction Adventure Fantasy,6.3,4053,15.953444,"[{""id"": 233, ""name"": ""japan""}, {""id"": 1462, ""n...",Wolverine faces his ultimate nemesis - and tes...,6.299667,0.777778
1191,11551,Small Soldiers,Comedy Adventure Fantasy Science Fiction Action,6.2,511,23.088571,"[{""id"": 3599, ""name"": ""defense industry""}, {""i...",When missile technology is used to enhance toy...,6.198646,0.80403
419,8247,Jumper,Adventure Fantasy Science Fiction,5.9,1799,21.218,"[{""id"": 704, ""name"": ""adolescence""}, {""id"": 81...","David Rice is a man who knows no boundaries, a...",5.900692,0.881917
72,297761,Suicide Squad,Action Adventure Crime Fantasy Science Fiction,5.9,7458,90.23792,"[{""id"": 849, ""name"": ""dc comics""}, {""id"": 1296...","From DC Comics comes the Suicide Squad, an ant...",5.900167,0.80403
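
The two-stage idea in the function above (build a candidate pool by similarity, then re-rank the pool by rating) can be seen on a toy catalogue. Everything below is made up: the `toy` frame, the `sim` matrix, and the `recommend` helper. As one possible variation, the sketch drops the query before slicing the pool and returns an empty frame instead of a string when the title is missing.

```python
import numpy as np
import pandas as pd

# Made-up catalogue: 'Q' is the query, the others are candidates.
toy = pd.DataFrame({
    'title': ['Q', 'A', 'B', 'C', 'D', 'E'],
    'weighted_vote': [7.0, 5.0, 8.0, 6.5, 7.5, 9.0],
})

# Only Q's similarities matter here; other rows are left as identity.
sim = np.eye(6)
sim[0, 1:] = [0.9, 0.8, 0.7, 0.6, 0.1]
sim[1:, 0] = [0.9, 0.8, 0.7, 0.6, 0.1]

def recommend(title, top_n=2):
    match = toy.index[toy['title'] == title]
    if match.empty:
        return toy.iloc[0:0]                   # empty frame for unknown titles
    idx = match[0]
    order = sim[idx].argsort()[::-1]           # most similar first
    pool = order[order != idx][:top_n * 2]     # drop the query, keep 2 * top_n candidates
    return toy.iloc[pool].sort_values('weighted_vote', ascending=False).head(top_n)

result = recommend('Q')
print(result)
```

Here 'E' has the highest rating but is excluded because its similarity to 'Q' is too low to enter the pool, while 'B' and 'D' are returned as the best-rated movies among the similar ones.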
