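# Content-Based Recommender Systems
A content-based recommender system recommends items similar to those a user liked in the past. It analyzes the attributes of the items themselves (for example, a movie's genre, cast, and director) to build a picture of the user's preferences, then recommends other items that match those preferences. If a user liked a particular action movie, a content-based system can recommend other action movies with a similar genre, director, or cast.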
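To make "similar content" concrete, here is a minimal sketch that compares movies by their genre sets. It uses Jaccard similarity, which is an illustrative choice and not the method used in the rest of this notebook (the notebook fits a regression model on genre features instead). The helper name `jaccard` is hypothetical; the genre strings are those of three MovieLens titles.

```python
def jaccard(a, b):
    """Share of genres two movies have in common (0.0 = none, 1.0 = identical)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


toy_story = "Adventure|Animation|Children|Comedy|Fantasy".split("|")
jumanji = "Adventure|Children|Fantasy".split("|")
grumpier_old_men = "Comedy|Romance".split("|")

# Jumanji shares 3 of 5 distinct genres with Toy Story,
# Grumpier Old Men shares only 1 of 6.
sim_jumanji = jaccard(toy_story, jumanji)
sim_grumpier = jaccard(toy_story, grumpier_old_men)
print(sim_jumanji, sim_grumpier)
```

Under this measure, a user who liked Toy Story would be shown Jumanji before Grumpier Old Men.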

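## Advantages
* Can recommend new items: because the method does not depend on other users' ratings, even a newly added item can be recommended based on its attributes.
* Transparent reflection of user preferences: recommendations are based on the attributes of items the user explicitly liked, so it is easy to explain why an item was recommended.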

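## Disadvantages
* Over-specialization: the system tends to recommend only items very similar to what the user already liked, which limits its ability to broaden the user's tastes.
* New-user problem: when a user has no past interaction data (a new user), there is nothing to learn their preferences from, so recommendation is difficult.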
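One common way to work around the new-user problem is to fall back to a non-personalized list until the user has rated something. The sketch below ranks movies by mean rating. This fallback is an assumption added for illustration and is not part of the notebook; `popular_fallback` and `min_count` are hypothetical names, and the toy ratings are made up.

```python
import pandas as pd


def popular_fallback(ratings, n=5, min_count=2):
    """Rank movies by mean rating, ignoring movies with fewer than min_count ratings."""
    stats = ratings.groupby("movieId")["rating"].agg(["mean", "count"])
    stats = stats[stats["count"] >= min_count]
    ranked = stats.sort_values(["mean", "count"], ascending=False)
    return ranked.head(n).index.tolist()


# Made-up ratings, only to show the shape of the input.
toy = pd.DataFrame({
    "userId":  [1, 1, 2, 2, 3, 3],
    "movieId": [10, 20, 10, 30, 20, 30],
    "rating":  [5.0, 3.0, 4.0, 2.0, 4.0, 3.0],
})
fallback = popular_fallback(toy, n=2)
print(fallback)
```

Such a list ignores the user's taste entirely, so it only makes sense as a stopgap until a few ratings are available.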

In [9]:
import pandas as pd
movies = pd.read_csv("Data/movies.csv")
movies.head()

Unnamed: 0,movieId,title,genres
0,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
1,2,Jumanji (1995),Adventure|Children|Fantasy
2,3,Grumpier Old Men (1995),Comedy|Romance
3,4,Waiting to Exhale (1995),Comedy|Drama|Romance
4,5,Father of the Bride Part II (1995),Comedy


In [10]:
# Map each movieId to its title so recommendations can be shown by name later
movies_idx = dict(zip(movies["movieId"], movies["title"]))
movies_idx

{1: 'Toy Story (1995)',
 2: 'Jumanji (1995)',
 3: 'Grumpier Old Men (1995)',
 4: 'Waiting to Exhale (1995)',
 5: 'Father of the Bride Part II (1995)',
 6: 'Heat (1995)',
 7: 'Sabrina (1995)',
 8: 'Tom and Huck (1995)',
 9: 'Sudden Death (1995)',
 10: 'GoldenEye (1995)',
 11: 'American President, The (1995)',
 12: 'Dracula: Dead and Loving It (1995)',
 13: 'Balto (1995)',
 14: 'Nixon (1995)',
 15: 'Cutthroat Island (1995)',
 16: 'Casino (1995)',
 17: 'Sense and Sensibility (1995)',
 18: 'Four Rooms (1995)',
 19: 'Ace Ventura: When Nature Calls (1995)',
 20: 'Money Train (1995)',
 21: 'Get Shorty (1995)',
 22: 'Copycat (1995)',
 23: 'Assassins (1995)',
 24: 'Powder (1995)',
 25: 'Leaving Las Vegas (1995)',
 26: 'Othello (1995)',
 27: 'Now and Then (1995)',
 28: 'Persuasion (1995)',
 29: 'City of Lost Children, The (Cité des enfants perdus, La) (1995)',
 30: 'Shanghai Triad (Yao a yao yao dao waipo qiao) (1995)',
 31: 'Dangerous Minds (1995)',
 32: 'Twelve Monkeys (a.k.a. 12 Monkeys) (199

In [11]:
del movies['title']
movies.head()

Unnamed: 0,movieId,genres
0,1,Adventure|Animation|Children|Comedy|Fantasy
1,2,Adventure|Children|Fantasy
2,3,Comedy|Romance
3,4,Comedy|Drama|Romance
4,5,Comedy


In [12]:
movies['genres'] = movies['genres'].str.split("|")
movies.head()

Unnamed: 0,movieId,genres
0,1,"[Adventure, Animation, Children, Comedy, Fantasy]"
1,2,"[Adventure, Children, Fantasy]"
2,3,"[Comedy, Romance]"
3,4,"[Comedy, Drama, Romance]"
4,5,[Comedy]


In [13]:
movies = movies.explode('genres')
movies

Unnamed: 0,movieId,genres
0,1,Adventure
0,1,Animation
0,1,Children
0,1,Comedy
0,1,Fantasy
...,...,...
9738,193583,Fantasy
9739,193585,Drama
9740,193587,Action
9740,193587,Animation


In [14]:
movies['present'] = 1
movies.head()

Unnamed: 0,movieId,genres,present
0,1,Adventure,1
0,1,Animation,1
0,1,Children,1
0,1,Comedy,1
0,1,Fantasy,1


In [15]:
movies2 = movies.pivot_table(
                        index='movieId',
                        columns='genres',
                        values='present',
                        fill_value=0)
movies2.head()

movieId,(no genres listed),Action,Adventure,Animation,Children,Comedy,Crime,Documentary,Drama,Fantasy,Film-Noir,Horror,IMAX,Musical,Mystery,Romance,Sci-Fi,Thriller,War,Western
1,0.0,0.0,1.0,1.0,1.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0
5,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
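The split, explode, and pivot steps above build a one-hot genre matrix. As a hedged alternative, pandas' `Series.str.get_dummies` can produce an equivalent matrix directly from the pipe-separated strings. The sketch below uses a three-row toy frame in place of `movies`; note that it yields integer 0/1 values, whereas the `pivot_table` output above is float.

```python
import pandas as pd

# Toy stand-in for the first three rows of movies.csv
toy = pd.DataFrame({
    "movieId": [1, 2, 3],
    "genres": [
        "Adventure|Animation|Children|Comedy|Fantasy",
        "Adventure|Children|Fantasy",
        "Comedy|Romance",
    ],
})

# One column per genre, 1 where the movie has that genre
genre_matrix = toy.set_index("movieId")["genres"].str.get_dummies(sep="|")
print(genre_matrix)
```

This must be applied to the original `genres` strings, before they are split into lists.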


In [17]:
ratings = pd.read_csv("Data/ratings.csv")
ratings.head()

Unnamed: 0,userId,movieId,rating,timestamp
0,1,1,4.0,964982703
1,1,3,4.0,964981247
2,1,6,4.0,964982224
3,1,47,5.0,964983815
4,1,50,5.0,964982931


In [18]:
ratings['timestamp'] = pd.to_datetime(ratings['timestamp'], unit='s')
ratings.head()

Unnamed: 0,userId,movieId,rating,timestamp
0,1,1,4.0,2000-07-30 18:45:03
1,1,3,4.0,2000-07-30 18:20:47
2,1,6,4.0,2000-07-30 18:37:04
3,1,47,5.0,2000-07-30 19:03:35
4,1,50,5.0,2000-07-30 18:48:51


* Recommending movies to user 1

In [19]:
user = 1
sample = ratings[ratings["userId"] == user][['movieId', 'rating']]
sample.head()

Unnamed: 0,movieId,rating
0,1,4.0
1,3,4.0
2,6,4.0
3,47,5.0
4,50,5.0


In [20]:
sample2 = movies2.merge(sample, left_index = True, right_on = "movieId", how = 'outer')
sample2 = sample2.set_index("movieId")
sample2

movieId,(no genres listed),Action,Adventure,Animation,Children,Comedy,Crime,Documentary,Drama,Fantasy,...,Horror,IMAX,Musical,Mystery,Romance,Sci-Fi,Thriller,War,Western,rating
1,0.0,0.0,1.0,1.0,1.0,1.0,0.0,0.0,0.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,4.0
2,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,
3,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,4.0
4,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,...,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,
5,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
193581,0.0,1.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,
193583,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,
193585,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,
193587,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,
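The point of the merge is that movies the user has rated carry a `rating`, while all others get `NaN`. The sketch below shows the same idea on toy data with a left join on the index. It assumes every rated movie appears in the genre table, in which case a left join and the outer merge above give the same rows. The frame names `genres_toy` and `user_ratings` are hypothetical.

```python
import pandas as pd

genres_toy = pd.DataFrame(
    {"Action": [1, 0, 1], "Comedy": [0, 1, 0]},
    index=pd.Index([1, 2, 3], name="movieId"),
)
user_ratings = pd.DataFrame({"movieId": [1, 3], "rating": [4.0, 5.0]})

# Attach the user's rating to each movie; unrated movies get NaN
joined = genres_toy.join(user_ratings.set_index("movieId"), how="left")

seen = joined[joined["rating"].notna()]    # training rows
unseen = joined[joined["rating"].isna()]   # candidates for recommendation
print(seen.index.tolist(), unseen.index.tolist())
```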


In [21]:
sample3 = sample2.dropna()
sample3

movieId,(no genres listed),Action,Adventure,Animation,Children,Comedy,Crime,Documentary,Drama,Fantasy,...,Horror,IMAX,Musical,Mystery,Romance,Sci-Fi,Thriller,War,Western,rating
1,0.0,0.0,1.0,1.0,1.0,1.0,0.0,0.0,0.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,4.0
3,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,4.0
6,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,4.0
47,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,5.0
50,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,...,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,5.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3744,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,4.0
3793,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,5.0
3809,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,4.0
4006,0.0,0.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,4.0


In [22]:
X = sample3.drop("rating", axis = 1)
Y = sample3['rating']

In [23]:
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error

train_x, test_x, train_y, test_y = train_test_split(X, Y)

In [24]:
model = RandomForestRegressor()
model.fit(train_x, train_y)

pred = model.predict(test_x)

mean_absolute_error(test_y, pred), mean_squared_error(test_y, pred) ** (1/2)

(0.6767199649572875, 0.8896250211521103)
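MAE and RMSE are easier to judge against a baseline. The sketch below computes both metrics for a predictor that always guesses the mean training rating; the function name `mean_baseline_errors` is hypothetical and the sample numbers are made up for illustration. In the notebook it could be called as `mean_baseline_errors(train_y, test_y)`, and a model is only useful if it beats these values.

```python
import math


def mean_baseline_errors(train_y, test_y):
    """MAE and RMSE of always predicting the mean of the training ratings."""
    train_y, test_y = list(train_y), list(test_y)
    guess = sum(train_y) / len(train_y)
    errors = [y - guess for y in test_y]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return mae, rmse


# Made-up ratings: the training mean is 4.2
mae, rmse = mean_baseline_errors([4.0, 4.0, 5.0, 5.0, 3.0], [4.0, 5.0])
print(mae, rmse)
```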

In [25]:
model.fit(X, Y)

In [26]:
pred_df = sample2[sample2['rating'].isnull()]
pred_df

movieId,(no genres listed),Action,Adventure,Animation,Children,Comedy,Crime,Documentary,Drama,Fantasy,...,Horror,IMAX,Musical,Mystery,Romance,Sci-Fi,Thriller,War,Western,rating
2,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,
4,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,...,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,
5,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,
7,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,
8,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
193581,0.0,1.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,
193583,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,
193585,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,
193587,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,


In [27]:
X = pred_df.drop('rating', axis = 1)

# Work on an explicit copy so adding a column does not trigger SettingWithCopyWarning
pred_df = pred_df.copy()
pred_df["Pred"] = model.predict(X)
pred_df

movieId,(no genres listed),Action,Adventure,Animation,Children,Comedy,Crime,Documentary,Drama,Fantasy,...,IMAX,Musical,Mystery,Romance,Sci-Fi,Thriller,War,Western,rating,Pred
2,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,,3.763143
4,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,...,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,,4.067028
5,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,,4.318390
7,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,,4.004444
8,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,,4.563714
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
193581,0.0,1.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,,4.041381
193583,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,,4.004756
193585,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,,4.992000
193587,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,,4.706431


In [28]:
recommend = pred_df.sort_values("Pred", ascending = False).head()
recommend

movieId,(no genres listed),Action,Adventure,Animation,Children,Comedy,Crime,Documentary,Drama,Fantasy,...,IMAX,Musical,Mystery,Romance,Sci-Fi,Thriller,War,Western,rating,Pred
1033,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,1.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,,5.0
50872,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,1.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,,5.0
131714,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,,4.992321
129229,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,,4.992321
3999,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,,4.992321


In [29]:
# Look up the title of each recommended movieId
box = [movies_idx[i] for i in recommend.index]

print(box)

['Fox and the Hound, The (1981)', 'Ratatouille (2007)', 'Last Knights (2015)', 'Northmen - A Viking Saga (2014)', 'Vertical Limit (2000)']
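The single-user steps above (select the user's ratings, fit on rated movies, predict the rest, take the top few) can be gathered into one function. The following is a sketch under stated assumptions, not the notebook's own code: the name `recommend_for_user` and its parameters are hypothetical, `genre_matrix` plays the role of `movies2`, `titles` plays the role of `movies_idx`, and `random_state` is fixed only so that repeated runs agree. The toy data is made up.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor


def recommend_for_user(user_id, ratings, genre_matrix, titles, n=5, random_state=0):
    """Return up to n titles of unrated movies with the highest predicted rating."""
    rated = ratings.loc[ratings["userId"] == user_id].set_index("movieId")["rating"]
    rated = rated[rated.index.isin(genre_matrix.index)]
    if rated.empty:
        return []  # new user: nothing to learn from

    # Fit on the genre vectors of the movies this user rated
    model = RandomForestRegressor(random_state=random_state)
    model.fit(genre_matrix.loc[rated.index], rated)

    # Score every movie the user has not rated yet
    unseen = genre_matrix.drop(index=rated.index)
    if unseen.empty:
        return []
    pred = pd.Series(model.predict(unseen), index=unseen.index)
    top_ids = pred.sort_values(ascending=False).head(n).index
    return [titles[m] for m in top_ids]


# Made-up data: the user loved an action movie and disliked a comedy
toy_genres = pd.DataFrame(
    {"Action": [1, 0, 1, 0], "Comedy": [0, 1, 0, 1]},
    index=pd.Index([1, 2, 3, 4], name="movieId"),
)
toy_ratings = pd.DataFrame({"userId": [1, 1], "movieId": [1, 2], "rating": [5.0, 1.0]})
toy_titles = {1: "Action A", 2: "Comedy B", 3: "Action C", 4: "Comedy D"}

demo = recommend_for_user(1, toy_ratings, toy_genres, toy_titles, n=2)
print(demo)
```

Because the only features are genres, movies with identical genre vectors receive identical predictions, so the order among such ties is arbitrary.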


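## Exercise
Using the given movie data and user-movie rating data, recommend 5 movies to each user from among the movies they have not yet rated, and save the result to an Excel file named "Recommend.xlsx". The file must have the same format as the provided Recommend.xlsx. (The recommended movies do not have to match the example exactly. The goal is to generate a list of 5 recommended movies per user and save it as an Excel file.)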

### My attempt

In [48]:
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error, mean_squared_error
import warnings
warnings.filterwarnings("ignore")

box2 = []

for user in range(1, 611):
    sample = ratings[ratings["userId"] == user][["movieId", "rating"]]
    sample2 = movies2.merge(sample, on = "movieId", how = "outer").set_index("movieId")

    # Train the model on the movies this user has rated
    sample3 = sample2.dropna()
    
    X = sample3.drop("rating", axis = 1)
    Y = sample3['rating']
    
    model = RandomForestRegressor()
    
    model.fit(X, Y)

    # Movies the user has not rated yet (copied so the column assignment below is safe)
    pred_df = sample2[sample2["rating"].isnull()].copy()
    
    X = pred_df.drop("rating", axis = 1)
    
    pred_df["rating"] = model.predict(X)
    
    idx = pred_df.sort_values("rating", ascending = False).head()

    box = []
    for i in range(len(idx)):
        row = idx.iloc[i]
        box.append(movies_idx[row.name])
    box2.append(box)

print(box2)

[['The Runner (2015)', 'West Beirut (West Beyrouth) (1998)', "Coal Miner's Daughter (1980)", 'Dog Days (Hundstage) (2001)', 'Desierto (2016)'], ['Revolution Will Not Be Televised, The (a.k.a. Chavez: Inside the Coup) (2003)', 'Regret to Inform (1998)', 'Grizzly Man (2005)', 'Earthlings (2006)', 'Samsara (2011)'], ['Chopping Mall (a.k.a. Killbots) (1986)', 'AVPR: Aliens vs. Predator - Requiem (2007)', 'The Purge: Election Year (2016)', 'Hidden, The (1987)', 'Hellraiser: Bloodline (1996)'], ['Seven Years in Tibet (1997)', 'Lawrence of Arabia (1962)', 'War, The (1994)', 'Revolution (1985)', 'Beyond Rangoon (1995)'], ['Fantasia (1940)', 'Peter Pan (1953)', 'Beauty and the Beast: The Enchanted Christmas (1997)', 'Sword in the Stone, The (1963)', 'Strange Magic (2015)'], ['Treasure of the Sierra Madre, The (1948)', 'Legend of Zorro, The (2005)', 'Escape from New York (1981)', 'Spider-Man (2002)', 'Escape from L.A. (1996)'], ['Epic (2013)', 'Shrek the Halls (2007)', 'Pathology (2008)', 'Knock

In [49]:
box2 = pd.DataFrame(box2).reset_index()
box2.rename(columns={"index": "user", 0: "추천1", 1: "추천2", 2: "추천3", 3: "추천4", 4: "추천5"}, inplace=True)
box2["user"] = box2["user"] + 1

box2

Unnamed: 0,user,추천1,추천2,추천3,추천4,추천5
0,1,The Runner (2015),West Beirut (West Beyrouth) (1998),Coal Miner's Daughter (1980),Dog Days (Hundstage) (2001),Desierto (2016)
1,2,"Revolution Will Not Be Televised, The (a.k.a. ...",Regret to Inform (1998),Grizzly Man (2005),Earthlings (2006),Samsara (2011)
2,3,Chopping Mall (a.k.a. Killbots) (1986),AVPR: Aliens vs. Predator - Requiem (2007),The Purge: Election Year (2016),"Hidden, The (1987)",Hellraiser: Bloodline (1996)
3,4,Seven Years in Tibet (1997),Lawrence of Arabia (1962),"War, The (1994)",Revolution (1985),Beyond Rangoon (1995)
4,5,Fantasia (1940),Peter Pan (1953),Beauty and the Beast: The Enchanted Christmas ...,"Sword in the Stone, The (1963)",Strange Magic (2015)
...,...,...,...,...,...,...
605,606,Afro Samurai (2007),Escaflowne: The Movie (Escaflowne) (2000),Dead or Alive: Final (2002),More (1998),Steins;Gate the Movie: The Burden of Déjà vu (...
606,607,Silver Bullet (Stephen King's Silver Bullet) (...,Happy Birthday to Me (1981),Dead of Night (1945),"Howling, The (1980)","Dark Half, The (1993)"
607,608,G.I. Joe: Retaliation (2013),Spider-Man 3 (2007),Iron Man 2 (2010),Abraham Lincoln: Vampire Hunter (2012),Blade: Trinity (2004)
608,609,Buffalo Soldiers (2001),Freaks (1932),Guilty of Romance (Koi no tsumi) (2011),Sorrow (2015),The Hound of the Baskervilles (1988)


In [51]:
box2.to_excel("Recommend.xlsx", index=False)

### Solution

In [30]:
from tqdm import tqdm

total = []
for user in tqdm(ratings['userId'].unique()):
    sample = ratings[ratings["userId"] == user][['movieId', 'rating']]
    sample2 = movies2.merge(sample, left_index = True, right_on = "movieId", how = 'outer')
    sample2 = sample2.set_index("movieId")
    sample3 = sample2.dropna()
    
    X = sample3.drop("rating", axis = 1)
    Y = sample3['rating']
    
    model = RandomForestRegressor()
    model.fit(X, Y)
    
    pred_df = sample2[sample2['rating'].isnull()].copy()
    
    X = pred_df.drop('rating', axis = 1)
    
    pred_df["Pred"] = model.predict(X)
    
    box = [user]
    for i in pred_df.sort_values("Pred", ascending = False).head().index:
        box.append(movies_idx[i])
    total.append(box)

100%|████████████████████████████████████████████████████████████████████████████████| 610/610 [01:16<00:00,  8.03it/s]


In [None]:
result = pd.DataFrame(total, columns = ['user', '추천1', '추천2', '추천3', '추천4', '추천5'])
result.to_excel("Recommend.xlsx", index = False)
result.head()
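Before handing in the file, it can help to check that the table has the expected shape. The sketch below is a hypothetical helper (`find_problems` is not part of the notebook) that reports duplicate users and users with fewer than five recommendations, which can happen if a user has rated almost every movie. The toy frames are made up; the column names follow the ones used above.

```python
import pandas as pd


def find_problems(result, n=5):
    """Return a list of human-readable problems found in a recommendation table."""
    problems = []
    rec_cols = [c for c in result.columns if c != "user"]
    if len(rec_cols) != n:
        problems.append(f"expected {n} recommendation columns, found {len(rec_cols)}")
    if not result["user"].is_unique:
        problems.append("duplicate user ids")
    incomplete = result.loc[result[rec_cols].isna().any(axis=1), "user"].tolist()
    if incomplete:
        problems.append(f"users with missing recommendations: {incomplete}")
    return problems


cols = ["user", "추천1", "추천2", "추천3", "추천4", "추천5"]
good = pd.DataFrame([[1, "A", "B", "C", "D", "E"], [2, "F", "G", "H", "I", "J"]], columns=cols)
bad = pd.DataFrame([[1, "A", "B", "C", "D", None]], columns=cols)

print(find_problems(good))
print(find_problems(bad))
```

In the notebook, the call would be `find_problems(result)` just before `result.to_excel(...)`.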