# Item-Based Nearest-Neighbor Collaborative Filtering: 오늘의집 (Today's House)

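### Collaborative Filtering
### Recommendations based on user behavior, such as the ratings users give to items or their purchase history


### Collaborative-filtering recommender systems: 1. Nearest neighbor  2. Latent factor


### 1. Nearest-neighbor collaborative filtering
#### 1) User-based collaborative filtering: customers similar to you bought the following products
#### 2) Item-based collaborative filtering: other customers who chose this product also bought the following products

### Item-based collaborative filtering is an algorithm that recommends similar items, using users' ratings (how much they liked or disliked each item) as the basis for item similarity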


In [666]:
import pandas as pd
import numpy as np
from sklearn.metrics import mean_squared_error          # accuracy metric for the recommender: MSE (its square root is RMSE)
from sklearn.metrics.pairwise import cosine_similarity  # cosine similarity
from tqdm import tqdm_notebook

# 1. Loading the data

In [214]:
reviews = pd.read_excel("todayhouse_review.xlsx")  # review data

products = pd.read_excel("todayhouse_product_data.xlsx")  # product information data

In [641]:
ratings = reviews.loc[:,["아이디","상품명","별점"]]  # keep only the user ID, product name, and star rating columns
ratings.shape

(793223, 3)

In [642]:
reviews.loc[:,["상품명","가격","날짜","user_id","별점","내구성","가격.1","디자인","배송","좋아요","내용"]].head()

Unnamed: 0,상품명,가격,날짜,user_id,별점,내구성,가격.1,디자인,배송,좋아요,내용
0,크리스마스 LED 테이퍼 초 블랙폿 테이블 센터피스 장식 소품,"25,000원",2020.11.28 ∙ 오늘의집 구매,warmest_,4.8,4,5.0,5.0,5.0,0,오브제 덕분에 지인들과 짧지만 좋은 시간을 가졌습니다:) 디테일도 마음에 들고 확실...
1,크리스마스 LED 테이퍼 초 블랙폿 테이블 센터피스 장식 소품,"25,000원",2020.12.08 ∙ 오늘의집 구매,다현맘이,4.3,4,4.0,4.0,5.0,0,브라운을 시켰는데 레드가 왔구요..보니 빨간 열매 한알이 떨어져있네요 예쁘긴한데 내...
2,도일리 드림캐쳐(2size),"9,200원",2018.04.08 ∙ 오늘의집 구매,쏠S2,5.0,5,5.0,5.0,5.0,2,저는 속커튼사이에 두려고 구매했어요. 생각했던 그대로여서 너무만족합니다. 좋은꿈 꾸...
3,도일리 드림캐쳐(2size),"9,200원",2019.02.23 ∙ 오늘의집 구매,이쁜집이쁜,5.0,5,5.0,5.0,5.0,1,"드림캐처 사려고보다가 저희집이 블랙가구들이랑 꾸며놓은것들이 블랙이라 ,, 드림캐처도..."
4,도일리 드림캐쳐(2size),"9,200원",2021.01.01 ∙ 오늘의집 구매,나나♡2,5.0,5,5.0,5.0,5.0,0,생각보다는 조금 작았지만 예뻐요 커튼사이 포인트 주려고 구매했는데 걸어놓으니 예뻐요...


In [643]:
ratings = ratings.groupby(["아이디","상품명"]).mean().reset_index()
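The `groupby(...).mean()` step collapses repeated reviews by the same user for the same product into a single averaged rating, so each (user, product) pair appears once before the matrix is built. A minimal sketch on made-up data (the toy values and the names `toy` and `dedup` are illustrative assumptions, not taken from the dataset):

```python
import pandas as pd

# Toy reviews: user "a" reviewed product "lamp" twice (values are made up).
toy = pd.DataFrame({
    "user_id": ["a", "a", "b"],
    "product_id": ["lamp", "lamp", "rug"],
    "rating": [4.0, 5.0, 3.0],
})

# One row per (user, product); duplicate ratings are averaged.
dedup = toy.groupby(["user_id", "product_id"])["rating"].mean().reset_index()
print(dedup)
```

Averaging is one possible choice; keeping the most recent review instead would need the review date column.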


In [644]:
ratings.rename(columns= {"아이디":"user_id" , "상품명":"product_id", "별점" : "rating"}, inplace=True )

In [645]:
ratings.head()

Unnamed: 0,user_id,product_id,rating
0,0,DK053 3인용 풀커버 그레이 발수 패브릭 소파 (스툴 기본포함),5.0
1,0,[5%추가할인] 25mm 알루미늄블라인드 35colors,5.0
2,0,[오늘의딜] 빔프로젝터 풀세트 MK-F800 (파우치+리모컨+HDMI) [1년무상/AS],4.8
3,0,디망쉬 암막+디망쉬 레이스커튼 4장세트 4size 9colors,4.0
4,0,순수 원목 선반 신발장 2단,3.3


In [646]:
ratings.shape

(759921, 3)

In [647]:
ratings.groupby('product_id').size()

product_id
(1+1) 냉기우풍차단 3중직 베이직 암막커튼 큰창 12colors        482
(1+1) 소프트 암막커튼 6color 3size                  488
(1+1) 솔리드 방한 암막커튼 핀형/아일렛형 (작은창/긴창/대형)        352
(1+1) 어썸 화이트 암막커튼(창문/소형/대형/특대형) 핀형&아일렛 택1     45
(1+1) 트루디 암막 아일렛 창형/중형/대형커튼                   48
                                            ... 
휴대용 장바구니 그물망 프로듀스백                             6
휴대용 포토프린터 즉석카메라 미니샷 2 레트로 C210R              111
흔들리는 촛불 LED 무빙 캔들 3size                      489
흡착랙 욕실수납용품 모음전 18종 택1                        208
히체어 철제의자 7colors                               5
Length: 3650, dtype: int64

In [648]:
products.rename(columns= {"상품명":"product_id" }, inplace=True )
products.head()

Unnamed: 0,product_ID,product_id,product_price,Product_Url,대분류,대분류_1,중분류,중분류_1,중분류_2,소분류,...,소분류_2,brand,Thema_G,리뷰수,총점,별점평균,내구성,가격,디자인,배송
0,Product_0,디퓨저 리필액 200ml 1+1+1+1 + 캔들증정,14900,https://ohou.se/productions/423747/selling,홈데코,조명,디퓨져,디퓨져,,디퓨져,...,,코코도르,['Romantic'],496,4.7,4.686895,4.725806,4.649194,4.66129,4.677419
1,Product_1,줄무늬 글라스 유리병 2size,2890,https://ohou.se/productions/81677/selling,홈데코,조명,플라워,식물,,화병,...,,코제트,"['Romantic', 'Antique']",982,4.7,4.868432,4.831976,4.891039,4.874745,4.861507
2,Product_2,회전목마캔들 메리고라운드 오브제캔들 디자인 향초 필라왁스 인테리어소품,15800,https://ohou.se/productions/571494/selling,홈데코,조명,캔들,디퓨져,,캔들,...,,F5NATURE,['Romantic'],4,5.0,5.0,5.0,5.0,5.0,5.0
3,Product_3,디퓨저 섬유스틱 화이트,2900,https://ohou.se/productions/445003/selling,홈데코,조명,디퓨져,디퓨져,,디퓨져,...,,모던하우스,['Romantic'],1,5.0,5.0,5.0,5.0,5.0,5.0
4,Product_4,구슬 미니 돔앤 플레이트,3900,https://ohou.se/productions/333018/selling,수납,정리,화장대,테이블정리,,쥬얼리정리용품,...,,모던하우스,['Romantic'],5,5.0,4.9,5.0,4.8,4.8,5.0


### 1.1 Keeping only reviews for products in the final product data

In [649]:
ratings = pd.merge(ratings, products.loc[:,["product_id"]], on="product_id")

In [650]:
ratings.shape

(745564, 3)

### 1.2 Keeping only products with at least 2 reviews

In [651]:
ratings_by_item = ratings.groupby('product_id').size()
ratings_by_item

product_id
(1+1) 냉기우풍차단 3중직 베이직 암막커튼 큰창 12colors        482
(1+1) 소프트 암막커튼 6color 3size                  488
(1+1) 솔리드 방한 암막커튼 핀형/아일렛형 (작은창/긴창/대형)        352
(1+1) 어썸 화이트 암막커튼(창문/소형/대형/특대형) 핀형&아일렛 택1     45
(1+1) 트루디 암막 아일렛 창형/중형/대형커튼                   48
                                            ... 
휴대용 장바구니 그물망 프로듀스백                             6
휴대용 포토프린터 즉석카메라 미니샷 2 레트로 C210R              111
흔들리는 촛불 LED 무빙 캔들 3size                      489
흡착랙 욕실수납용품 모음전 18종 택1                        208
히체어 철제의자 7colors                               5
Length: 3596, dtype: int64

In [652]:
active_item = ratings_by_item.index[ratings_by_item>=2]

In [653]:
active_item

Index(['(1+1) 냉기우풍차단 3중직 베이직 암막커튼 큰창 12colors', '(1+1) 소프트 암막커튼 6color 3size',
       '(1+1) 솔리드 방한 암막커튼 핀형/아일렛형 (작은창/긴창/대형)',
       '(1+1) 어썸 화이트 암막커튼(창문/소형/대형/특대형) 핀형&아일렛 택1',
       '(1+1) 트루디 암막 아일렛 창형/중형/대형커튼', '(1+1) 호텔식 더뷰 암막커튼 (창문형/긴창형) 9colors',
       '(1+1) 호텔식 화이트 쉬폰 커튼 130x230 - 아일렛형', '(1+1)푸벨드마망 벽걸이 휴지통 7L/10L',
       '(3+1) 천연디퓨저 200ml 18가지 향기 드라이플라워 증정',
       '(3+1) 천연디퓨저 500ml 18가지 향기 드라이플라워 증정',
       ...
       '황칠나무 중형_토분 4type', '회전목마캔들 메리고라운드 오브제캔들 디자인 향초 필라왁스 인테리어소품', '휘브체크커튼',
       '휴대용 블루투스 스피커 RETRO20W (라디오,USB)', '휴대용 원목 우드 보풀제거기1+1',
       '휴대용 장바구니 그물망 프로듀스백', '휴대용 포토프린터 즉석카메라 미니샷 2 레트로 C210R',
       '흔들리는 촛불 LED 무빙 캔들 3size', '흡착랙 욕실수납용품 모음전 18종 택1', '히체어 철제의자 7colors'],
      dtype='object', name='product_id', length=3322)

In [654]:
aa_ratings = pd.merge(pd.DataFrame(active_item),ratings, on="product_id")

In [655]:
aa_ratings.head()

Unnamed: 0,product_id,user_id,rating
0,(1+1) 냉기우풍차단 3중직 베이직 암막커튼 큰창 12colors,-빤쓰,5.0
1,(1+1) 냉기우풍차단 3중직 베이직 암막커튼 큰창 12colors,100정Honey,5.0
2,(1+1) 냉기우풍차단 3중직 베이직 암막커튼 큰창 12colors,1mm_hyung,5.0
3,(1+1) 냉기우풍차단 3중직 베이직 암막커튼 큰창 12colors,@지구@,5.0
4,(1+1) 냉기우풍차단 3중직 베이직 암막커튼 큰창 12colors,Bambie94,5.0


### 1.3 Keeping only users who posted at least 10 reviews

In [657]:
ratings_by_user = ratings.groupby('user_id').size()
ratings_by_user

user_id
0         7
75        1
333       5
 小確幸73    1
! 정다송     3
         ..
🧸ꌗꍟꍏ🧸     1
🧸🛏🤍       2
🧸🧸        4
🧸🧸🧸~      1
🩰🤍        1
Length: 369488, dtype: int64

In [658]:
# Count reviews per user (over the filtered products) to find users with 10 or more reviews
ratings_by_user = aa_ratings.groupby('user_id').size()
ratings_by_user.describe()

count    369427.000000
mean          2.017422
std           1.923374
min           1.000000
25%           1.000000
50%           1.000000
75%           2.000000
max          60.000000
dtype: float64

In [663]:
active_user = ratings_by_user.index[ratings_by_user>=10]
a_ratings = pd.merge(pd.DataFrame(active_user),aa_ratings.loc[:,["user_id","product_id", "rating"]], on="user_id")

In [664]:
active_user

Index(['(•ㅇ_ㅇ)', '(ㅎ_ㅎ)', '*로지', ',eugene', '-/—/-', '-27', '-골리열리', '-김띠니',
       '-꿍이', '-리리',
       ...
       '💋둘둘', '💛연딩💛', '💜❤️🧡💚💙💛2', '💜🎀여나🎀💜93', '🖤혠니', '😘ㄹㅏㄴㅣ♡', '🙋‍♀️🙋‍♀️',
       '🤔튜운식', '🤪🤪🤪', '🥰을이'],
      dtype='object', name='user_id', length=4287)

In [662]:
a_ratings.groupby('product_id').size()

product_id
(1+1) 냉기우풍차단 3중직 베이직 암막커튼 큰창 12colors         17
(1+1) 소프트 암막커튼 6color 3size                   18
(1+1) 솔리드 방한 암막커튼 핀형/아일렛형 (작은창/긴창/대형)          7
(1+1) 어썸 화이트 암막커튼(창문/소형/대형/특대형) 핀형&아일렛 택1      2
(1+1) 트루디 암막 아일렛 창형/중형/대형커튼                    1
                                            ... 
휴대용 원목 우드 보풀제거기1+1                            17
휴대용 장바구니 그물망 프로듀스백                             1
휴대용 포토프린터 즉석카메라 미니샷 2 레트로 C210R               12
흔들리는 촛불 LED 무빙 캔들 3size                      117
흡착랙 욕실수납용품 모음전 18종 택1                         14
Length: 2707, dtype: int64
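Note that after the user filter is applied, some products are left with a single rating (see the counts of 1 above), so the "at least 2 reviews per product" condition no longer holds. One way to handle this, sketched below under the assumption that both thresholds should hold at the same time, is to repeat the two filters until nothing more is dropped. The helper name `filter_core` and the toy data are hypothetical and not part of the original notebook.

```python
import pandas as pd

def filter_core(df, min_item=2, min_user=10):
    """Repeatedly drop sparse products and users until both thresholds hold.

    Hypothetical helper, not part of the original notebook.
    """
    while True:
        item_n = df.groupby("product_id")["rating"].transform("count")
        user_n = df.groupby("user_id")["rating"].transform("count")
        keep = (item_n >= min_item) & (user_n >= min_user)
        if keep.all():
            return df
        df = df[keep]

# Toy data (made up): user u3 and product p3 each appear only once.
toy = pd.DataFrame({
    "user_id":    ["u1", "u1", "u2", "u2", "u3"],
    "product_id": ["p1", "p2", "p1", "p2", "p3"],
    "rating":     [5.0, 4.0, 4.0, 3.0, 5.0],
})
core = filter_core(toy, min_item=2, min_user=2)
print(core)
```

On the real data this would be called with `min_item=2, min_user=10`; the result may be considerably smaller than a single pass of each filter.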

In [609]:
a_ratings.to_excel("house_0130.xlsx")

In [414]:
data= pd.read_excel("house_0130.xlsx") # reload the data after converting user IDs to anonymous ones

In [416]:
data = data.loc[:,["user_id","product_id","rating"]] # keep only the needed columns

data.head()

Unnamed: 0,user_id,product_id,rating
0,1,FADO 파도 달 조명_2size / 2colors(박스 안전포장),4.8
1,1,[10%쿠폰] AT-LP60XBT 자동 벨트 드라이브 블루투스 턴테이블,5.0
2,1,[주말특가] 단독컬러 모노 멜란지 이불커버세트 (S/QK겸용) 3colors,4.0
3,1,고무나무 원목 사다리 다용도 수납선반 3colors,4.0
4,1,모리 3단 1200 와이드 서랍장 시리즈 3colors,4.5


In [555]:
data.groupby('product_id').size()

product_id
(1+1) 냉기우풍차단 3중직 베이직 암막커튼 큰창 12colors         17
(1+1) 소프트 암막커튼 6color 3size                   18
(1+1) 솔리드 방한 암막커튼 핀형/아일렛형 (작은창/긴창/대형)          7
(1+1) 어썸 화이트 암막커튼(창문/소형/대형/특대형) 핀형&아일렛 택1      2
(1+1) 트루디 암막 아일렛 창형/중형/대형커튼                    1
                                            ... 
휴대용 블루투스 스피커 RETRO20W (라디오,USB)                2
휴대용 원목 우드 보풀제거기1+1                            17
휴대용 포토프린터 즉석카메라 미니샷 2 레트로 C210R               12
흔들리는 촛불 LED 무빙 캔들 3size                      116
흡착랙 욕실수납용품 모음전 18종 택1                         14
Length: 2366, dtype: int64

In [560]:
len(data['user_id'])

54115

In [519]:
rating_matrix  # 4188 x 2366 (users x products)

product_id,(1+1) 냉기우풍차단 3중직 베이직 암막커튼 큰창 12colors,(1+1) 소프트 암막커튼 6color 3size,(1+1) 솔리드 방한 암막커튼 핀형/아일렛형 (작은창/긴창/대형),(1+1) 어썸 화이트 암막커튼(창문/소형/대형/특대형) 핀형&아일렛 택1,(1+1) 트루디 암막 아일렛 창형/중형/대형커튼,(1+1) 호텔식 더뷰 암막커튼 (창문형/긴창형) 9colors,(1+1) 호텔식 화이트 쉬폰 커튼 130x230 - 아일렛형,(1+1)푸벨드마망 벽걸이 휴지통 7L/10L,(3+1) 천연디퓨저 200ml 18가지 향기 드라이플라워 증정,(3+1) 천연디퓨저 500ml 18가지 향기 드라이플라워 증정,...,화이트 워싱면 파티션 가리개커튼,화이트 자개모빌 만들기 재료 DIY 키트 패키지,화이트 접착식 조리 도구 걸이 8p,화이트골드 앤틱 촛대,화이트모던 테슬포인트 소형러그,"휴대용 블루투스 스피커 RETRO20W (라디오,USB)",휴대용 원목 우드 보풀제거기1+1,휴대용 포토프린터 즉석카메라 미니샷 2 레트로 C210R,흔들리는 촛불 LED 무빙 캔들 3size,흡착랙 욕실수납용품 모음전 18종 택1
user_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4184,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,5.0,0.0
4185,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4186,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4187,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


# 2. Item-based collaborative filtering

### 2.1 Loading the review data

In [556]:
data = data.loc[:,["user_id","product_id","rating"]] # user ID, product name, star rating

### 2.2. Converting to a user-item rating matrix

In [None]:
# Use the pivot_table method to build the matrix
rating_matrix = data.pivot_table('rating', index='user_id', columns='product_id')

# Replace all NaN values with 0
rating_matrix=rating_matrix.fillna(0)

# Transpose into an item-user matrix.
ratings_matrix_T = rating_matrix.transpose()
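To make the shape of these matrices concrete, here is a small sketch of the same pivot, fill, and transpose steps on made-up data. The names `toy`, `user_item`, and `item_user` are illustrative assumptions.

```python
import pandas as pd

# Toy long-format ratings (values are made up).
toy = pd.DataFrame({
    "user_id": [1, 1, 2],
    "product_id": ["lamp", "rug", "lamp"],
    "rating": [5.0, 3.0, 4.0],
})

# Rows are users, columns are products; unrated pairs become NaN, then 0.
user_item = toy.pivot_table(values="rating", index="user_id",
                            columns="product_id").fillna(0)

# Transpose so that each row is a product vector over users.
item_user = user_item.T
print(user_item)
```

Filling with 0 means "not rated" is treated as a rating of zero in everything computed from this matrix, which affects both the similarities and the scale of the predicted values.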

### 2.3. Computing product-to-product similarity

In [422]:
from sklearn.metrics.pairwise import cosine_similarity

# Compute cosine similarity
item_sim = cosine_similarity(ratings_matrix_T, ratings_matrix_T)

# Convert the NumPy matrix returned by cosine_similarity() to a DataFrame labeled with product names
item_sim_df = pd.DataFrame(data=item_sim, index=rating_matrix.columns,
                          columns=rating_matrix.columns)

(2366, 2366)
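Cosine similarity between two item vectors is their dot product divided by the product of their lengths. The sketch below computes it with plain NumPy on a made-up item-user matrix, as a way to see what each entry of the similarity matrix means; it is an illustration, not the notebook's own code.

```python
import numpy as np

# Toy item-user matrix: 3 items (rows) rated by 3 users (columns); 0 = not rated.
item_user = np.array([
    [5.0, 0.0, 4.0],
    [4.0, 0.0, 5.0],
    [0.0, 3.0, 0.0],
])

# Cosine similarity = dot product of L2-normalized rows.
norms = np.linalg.norm(item_user, axis=1, keepdims=True)
unit = item_user / norms
sim = unit @ unit.T
print(np.round(sim, 4))
```

Items 0 and 1 were rated by the same users with similar scores, so their similarity is close to 1; item 2 shares no raters with them, so its similarity to both is 0. This is also why most entries of a sparse item-item matrix are 0.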


In [518]:
print(item_sim_df.shape)
item_sim_df.head(5)

product_id,(1+1) 냉기우풍차단 3중직 베이직 암막커튼 큰창 12colors,(1+1) 소프트 암막커튼 6color 3size,(1+1) 솔리드 방한 암막커튼 핀형/아일렛형 (작은창/긴창/대형),(1+1) 어썸 화이트 암막커튼(창문/소형/대형/특대형) 핀형&아일렛 택1,(1+1) 트루디 암막 아일렛 창형/중형/대형커튼,(1+1) 호텔식 더뷰 암막커튼 (창문형/긴창형) 9colors,(1+1) 호텔식 화이트 쉬폰 커튼 130x230 - 아일렛형,(1+1)푸벨드마망 벽걸이 휴지통 7L/10L,(3+1) 천연디퓨저 200ml 18가지 향기 드라이플라워 증정,(3+1) 천연디퓨저 500ml 18가지 향기 드라이플라워 증정,...,화이트 워싱면 파티션 가리개커튼,화이트 자개모빌 만들기 재료 DIY 키트 패키지,화이트 접착식 조리 도구 걸이 8p,화이트골드 앤틱 촛대,화이트모던 테슬포인트 소형러그,"휴대용 블루투스 스피커 RETRO20W (라디오,USB)",휴대용 원목 우드 보풀제거기1+1,휴대용 포토프린터 즉석카메라 미니샷 2 레트로 C210R,흔들리는 촛불 LED 무빙 캔들 3size,흡착랙 욕실수납용품 모음전 18종 택1
product_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
(1+1) 냉기우풍차단 3중직 베이직 암막커튼 큰창 12colors,1.0,0.0,0.0,0.0,0.0,0.055641,0.0,0.0,0.0,0.0,...,0.0,0.0,0.028802,0.0,0.0,0.0,0.0,0.0,0.0,0.0
(1+1) 소프트 암막커튼 6color 3size,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.107702,0.0,0.0,0.0,0.0,0.0,0.0
(1+1) 솔리드 방한 암막커튼 핀형/아일렛형 (작은창/긴창/대형),0.0,0.0,1.0,0.2671,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.162692,0.0,0.0,0.0,0.0,0.0,0.0,0.036453,0.0
(1+1) 어썸 화이트 암막커튼(창문/소형/대형/특대형) 핀형&아일렛 택1,0.0,0.0,0.2671,1.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
(1+1) 트루디 암막 아일렛 창형/중형/대형커튼,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [424]:
# Check the 5 most similar items, excluding the item itself
item_sim_df["(1+1) 소프트 암막커튼 6color 3size"].sort_values(ascending=False)[1:6]

product_id
자가발열 양털 방석 42x42cm 2colors           0.235702
인테리어 가죽 각티슈 커버/케이스 6종                0.163231
못 없이 설치하자! 안뚫어고리 ver.1 화이트/블랙 2개입    0.136083
[리퍼] GOLD20W 거치대형 스피커                0.117851
프리미엄 와인잔 특가모음                        0.110358
Name: (1+1) 소프트 암막커튼 6color 3size, dtype: float64

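### 2.4 Personalized product recommendation with item-based nearest-neighbor collaborative filtering

In [493]:
# Define a function that computes predicted ratings as the dot product of the rating vectors (rows) and the similarity vectors (columns)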
def predict_rating(data_arr, item_sim_arr):
    ratings_pred = data_arr.dot(item_sim_arr)/ np.array([np.abs(item_sim_arr).sum(axis=1)])
    return ratings_pred
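The formula in `predict_rating` is a similarity-weighted sum of a user's ratings, divided by the total similarity weight of each item. A small worked sketch with made-up numbers (the matrices `R` and `S` below are assumptions for illustration):

```python
import numpy as np

# Toy data: 2 users x 3 items, 0 = not rated.
R = np.array([[5.0, 0.0, 0.0],
              [0.0, 4.0, 2.0]])
# Toy symmetric item-item similarity matrix (made-up values).
S = np.array([[1.0, 0.5, 0.0],
              [0.5, 1.0, 0.5],
              [0.0, 0.5, 1.0]])

# Same formula as predict_rating: similarity-weighted sum, normalized per item.
pred = R.dot(S) / np.abs(S).sum(axis=1)
print(pred)
```

User 0 rated item 0 with 5.0, but the prediction for that same cell comes out lower, because the unrated neighbor contributes 0 to the numerator while its similarity still counts in the denominator. With thousands of mostly unrated items, this effect pulls predictions toward 0.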

In [494]:
ratings_pred = predict_rating(rating_matrix.values , item_sim_df.values)
ratings_pred

array([[0.07656713, 0.01196497, 0.02151635, ..., 0.0240911 , 0.01845153,
        0.05896044],
       [0.01682708, 0.02248802, 0.04152421, ..., 0.02241704, 0.05199626,
        0.03156371],
       [0.0652001 , 0.06406511, 0.04899728, ..., 0.04134985, 0.04146631,
        0.12379054],
       ...,
       [0.081927  , 0.08020396, 0.10657788, ..., 0.05312099, 0.04939268,
        0.02520161],
       [0.07749491, 0.07470059, 0.02340117, ..., 0.06887969, 0.09726195,
        0.05491645],
       [0.        , 0.01350006, 0.09794805, ..., 0.05214108, 0.01793692,
        0.        ]])

In [495]:
len(ratings_pred)

4188

In [496]:
# Convert to a DataFrame
ratings_pred_matrix = pd.DataFrame(data=ratings_pred, index= rating_matrix.index,
                                   columns = rating_matrix.columns)
print(ratings_pred_matrix.shape)

(4188, 2366)


#### Predicted ratings per product for each buyer

In [497]:
ratings_pred_matrix.head(10)

product_id,(1+1) 냉기우풍차단 3중직 베이직 암막커튼 큰창 12colors,(1+1) 소프트 암막커튼 6color 3size,(1+1) 솔리드 방한 암막커튼 핀형/아일렛형 (작은창/긴창/대형),(1+1) 어썸 화이트 암막커튼(창문/소형/대형/특대형) 핀형&아일렛 택1,(1+1) 트루디 암막 아일렛 창형/중형/대형커튼,(1+1) 호텔식 더뷰 암막커튼 (창문형/긴창형) 9colors,(1+1) 호텔식 화이트 쉬폰 커튼 130x230 - 아일렛형,(1+1)푸벨드마망 벽걸이 휴지통 7L/10L,(3+1) 천연디퓨저 200ml 18가지 향기 드라이플라워 증정,(3+1) 천연디퓨저 500ml 18가지 향기 드라이플라워 증정,...,화이트 워싱면 파티션 가리개커튼,화이트 자개모빌 만들기 재료 DIY 키트 패키지,화이트 접착식 조리 도구 걸이 8p,화이트골드 앤틱 촛대,화이트모던 테슬포인트 소형러그,"휴대용 블루투스 스피커 RETRO20W (라디오,USB)",휴대용 원목 우드 보풀제거기1+1,휴대용 포토프린터 즉석카메라 미니샷 2 레트로 C210R,흔들리는 촛불 LED 무빙 캔들 3size,흡착랙 욕실수납용품 모음전 18종 택1
user_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
1,0.076567,0.011965,0.021516,0.0,0.0,0.017722,0.0,0.073697,0.0,0.0,...,0.025121,0.0,0.029937,0.006091,0.0,0.0,0.0,0.024091,0.018452,0.05896
2,0.016827,0.022488,0.041524,0.0,0.246504,0.026545,0.0,0.0,0.09849,0.052151,...,0.030331,0.071858,0.068323,0.063253,0.0,0.0,0.048246,0.022417,0.051996,0.031564
3,0.0652,0.064065,0.048997,0.035986,0.0,0.0369,0.0,0.041254,0.0,0.0,...,0.024407,0.0,0.069707,0.0,0.139264,0.171892,0.073684,0.04135,0.041466,0.123791
4,0.036543,0.108788,0.093134,0.0,0.0,0.065645,0.311347,0.021011,0.0,0.10996,...,0.035199,0.059565,0.040726,0.070855,0.146751,0.09422,0.066876,0.03065,0.052871,0.074881
5,0.073591,0.092525,0.00817,0.0,0.072614,0.035174,0.073644,0.0,0.041604,0.037014,...,0.054912,0.057234,0.02072,0.022389,0.035886,0.047616,0.034515,0.053657,0.030565,0.12366
6,0.069711,0.018393,0.03324,0.094625,0.0,0.013269,0.299612,0.0,0.094094,0.0,...,0.037168,0.0,0.043312,0.007405,0.0,0.0,0.018039,0.025594,0.059126,0.030746
7,0.043157,0.053794,0.12585,0.097154,0.0,0.038632,0.0,0.041136,0.0,0.0,...,0.012119,0.100369,0.009344,0.061194,0.140881,0.102483,0.011396,0.027668,0.032042,0.038915
8,0.041002,0.095344,0.00817,0.0,0.072614,0.048549,0.073644,0.0,0.061486,0.037014,...,0.053012,0.0,0.055847,0.032475,0.035886,0.177454,0.041428,0.028934,0.058115,0.099561
9,0.084913,0.078897,0.045097,0.0,0.283036,0.041767,0.564524,0.0,0.0,0.037307,...,0.029114,0.030179,0.042367,0.019697,0.0,0.028796,0.045461,0.020029,0.027644,0.029721
10,0.100171,0.11284,0.065671,0.0,0.253975,0.094193,0.0,0.096183,0.096113,0.110224,...,0.0812,0.0,0.044303,0.055416,0.057022,0.191795,0.053766,0.073713,0.046796,0.046737


### 2.5 Using an error function to judge the accuracy of the predicted ratings: MSE (its square root is RMSE)


In [498]:
from sklearn.metrics import mean_squared_error

# Compute the MSE only over the products the user actually rated.
def get_mse(pred, actual):
    # Keep only the nonzero (actually rated) entries.
    pred = pred[actual.nonzero()].flatten()
    actual = actual[actual.nonzero()].flatten()
    return mean_squared_error(pred, actual)

print('Item-based, all neighbors MSE: ', get_mse(ratings_pred, rating_matrix.values ))


Item-based, all neighbors MSE:  18.377619548752733
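An MSE of about 18.38 corresponds to an RMSE of roughly 4.29, which is very large on a 5-point scale. The sketch below shows the same masked error computation with plain NumPy on made-up values, followed by the square root that turns MSE into RMSE. The arrays `actual` and `pred` are illustrative assumptions.

```python
import numpy as np

# Toy actual ratings (0 = not rated) and toy predictions (values are made up).
actual = np.array([[5.0, 0.0], [0.0, 4.0]])
pred = np.array([[3.0, 1.0], [2.0, 4.0]])

mask = actual > 0            # only entries the user actually rated
mse = np.mean((pred[mask] - actual[mask]) ** 2)
rmse = np.sqrt(mse)
print(mse, rmse)
```

Because the error is measured on the same ratings that were used to build the predictions, it says nothing about how well unseen ratings would be predicted.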


### 2.6 Computing predicted ratings only from the top-n most similar items

In [499]:
def predict_rating_topsim(ratings_arr, item_sim_arr, n=20):
    # Initialize the prediction matrix with zeros, same shape as the user-item rating matrix
    pred = np.zeros(ratings_arr.shape)

    # Loop over the columns (items) of the user-item rating matrix
    for col in range(ratings_arr.shape[1]):
        # Indices of the n items most similar to this item, in descending order of similarity
        # (a plain index array; wrapping it in a list triggers NumPy's deprecated sequence-indexing behavior)
        top_n_items = np.argsort(item_sim_arr[:, col])[:-n-1:-1]
        # Compute the personalized predicted rating for each user
        for row in range(ratings_arr.shape[0]):
            pred[row, col] = item_sim_arr[col, :][top_n_items].dot(ratings_arr[row, :][top_n_items].T)
            pred[row, col] /= np.sum(np.abs(item_sim_arr[col, :][top_n_items]))
    return pred
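The nested Python loops above visit every (user, item) cell, which can be slow for a 4188 x 2366 matrix. A possible vectorized variant is sketched below; it assumes the similarity matrix is symmetric (true for cosine similarity) and has no ties among the top-n values. The function name `predict_topn_vectorized` and the toy inputs are hypothetical.

```python
import numpy as np

def predict_topn_vectorized(ratings_arr, item_sim_arr, n=20):
    """Vectorized sketch of top-n prediction; assumes item_sim_arr is symmetric.

    Hypothetical helper, not part of the original notebook.
    """
    # Row indices of the n largest similarities in each column.
    top_idx = np.argsort(item_sim_arr, axis=0)[-n:, :]
    # Keep only those similarities; zero out everything else.
    sim_top = np.zeros_like(item_sim_arr)
    np.put_along_axis(sim_top, top_idx,
                      np.take_along_axis(item_sim_arr, top_idx, axis=0), axis=0)
    # Weighted sum over each item's top-n neighbors, normalized by their weights.
    return ratings_arr.dot(sim_top) / np.abs(sim_top).sum(axis=0)

# Toy inputs (made up): 4 users x 6 items and a random symmetric similarity matrix.
rng = np.random.default_rng(0)
R = rng.integers(0, 6, size=(4, 6)).astype(float)
A = rng.random((6, 6))
S = (A + A.T) / 2
np.fill_diagonal(S, 1.0)

pred = predict_topn_vectorized(R, S, n=3)
print(pred.shape)
```

The trade-off is memory: `sim_top` is a dense copy of the similarity matrix, which is fine at this size but may not be for much larger catalogs.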


In [525]:
# Suppress warning messages
import warnings ; warnings.filterwarnings(action='ignore')

In [530]:
ratings_pred = predict_rating_topsim(rating_matrix.values , item_sim_df.values, n=30)
print('Item-based top-30 neighbors MSE: ', get_mse(ratings_pred, rating_matrix.values ))

ratings_pred_matrix1 = pd.DataFrame(data=ratings_pred, index= rating_matrix.index,
                                   columns = rating_matrix.columns)

Item-based top-30 neighbors MSE:  10.942114269907961


In [526]:
ratings_pred = predict_rating_topsim(rating_matrix.values , item_sim_df.values, n=20)
print('Item-based top-20 neighbors MSE: ', get_mse(ratings_pred, rating_matrix.values ))

ratings_pred_matrix2 = pd.DataFrame(data=ratings_pred, index= rating_matrix.index,
                                   columns = rating_matrix.columns)

Item-based top-20 neighbors MSE:  9.091377657978763


In [532]:
ratings_pred = predict_rating_topsim(rating_matrix.values , item_sim_df.values, n=10)
print('Item-based top-10 neighbors MSE: ', get_mse(ratings_pred, rating_matrix.values ))

ratings_pred_matrix = pd.DataFrame(data=ratings_pred, index= rating_matrix.index,
                                   columns = rating_matrix.columns)

Item-based top-10 neighbors MSE:  5.8650995081961215


In [528]:
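# The computed predicted ratings are rebuilt as a DataFrame
# This gives the final predicted ratings per product.

### 2.7 Recommending products to a user

In [533]:
# Let's recommend products to user 1001 (first, the products this user rated highest)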
user_rating_id = rating_matrix.loc[1001, :]
user_rating_id[ user_rating_id > 0].sort_values(ascending=False)[:10]

product_id
워싱내츄럴 쿠션커버 2type / 5size                 5.00
벽난로 콘솔장식장 3colors(L)                     5.00
레트로 카본 히터 BKH-6082CW / BKH-6082C         5.00
듀얼미스트 무선 미니 가습기 500ml                    5.00
[오늘의딜] 로망스 심플 단모 사계절 러그 - 4colors        5.00
디퓨저 200ml 1+1+1+1 + 캔들증정                 4.75
푸로 초음파 가습기 & 무드등 미니가습기 300ml             4.50
순면 피그먼트 워싱 줄누빔 이불겸패드 (SS/Q/K)-6color     4.50
로맨틱부케 식탁보                                4.50
FADO 파도 달 조명_2size / 2colors(박스 안전포장)    4.50
Name: 1001, dtype: float64

In [534]:
user_rating_id = rating_matrix.loc[1001, :]
dd = user_rating_id[ user_rating_id > 0].sort_values(ascending=False)[:10]
dd

product_id
워싱내츄럴 쿠션커버 2type / 5size                 5.00
벽난로 콘솔장식장 3colors(L)                     5.00
레트로 카본 히터 BKH-6082CW / BKH-6082C         5.00
듀얼미스트 무선 미니 가습기 500ml                    5.00
[오늘의딜] 로망스 심플 단모 사계절 러그 - 4colors        5.00
디퓨저 200ml 1+1+1+1 + 캔들증정                 4.75
푸로 초음파 가습기 & 무드등 미니가습기 300ml             4.50
순면 피그먼트 워싱 줄누빔 이불겸패드 (SS/Q/K)-6color     4.50
로맨틱부케 식탁보                                4.50
FADO 파도 달 조명_2size / 2colors(박스 안전포장)    4.50
Name: 1001, dtype: float64

In [535]:
ddd = pd.DataFrame(data=dd.values,index=dd.index,columns=['rating'])
ddd

Unnamed: 0_level_0,rating
product_id,Unnamed: 1_level_1
워싱내츄럴 쿠션커버 2type / 5size,5.0
벽난로 콘솔장식장 3colors(L),5.0
레트로 카본 히터 BKH-6082CW / BKH-6082C,5.0
듀얼미스트 무선 미니 가습기 500ml,5.0
[오늘의딜] 로망스 심플 단모 사계절 러그 - 4colors,5.0
디퓨저 200ml 1+1+1+1 + 캔들증정,4.75
푸로 초음파 가습기 & 무드등 미니가습기 300ml,4.5
순면 피그먼트 워싱 줄누빔 이불겸패드 (SS/Q/K)-6color,4.5
로맨틱부케 식탁보,4.5
FADO 파도 달 조명_2size / 2colors(박스 안전포장),4.5


In [538]:
products.head()

Unnamed: 0,product_ID,product_id,product_price,Product_Url,대분류,대분류_1,중분류,중분류_1,중분류_2,소분류,...,소분류_2,brand,Thema_G,리뷰수,총점,별점평균,내구성,가격,디자인,배송
0,Product_0,디퓨저 리필액 200ml 1+1+1+1 + 캔들증정,14900,https://ohou.se/productions/423747/selling,홈데코,조명,디퓨져,디퓨져,,디퓨져,...,,코코도르,['Romantic'],496,4.7,4.686895,4.725806,4.649194,4.66129,4.677419
1,Product_1,줄무늬 글라스 유리병 2size,2890,https://ohou.se/productions/81677/selling,홈데코,조명,플라워,식물,,화병,...,,코제트,"['Romantic', 'Antique']",982,4.7,4.868432,4.831976,4.891039,4.874745,4.861507
2,Product_2,회전목마캔들 메리고라운드 오브제캔들 디자인 향초 필라왁스 인테리어소품,15800,https://ohou.se/productions/571494/selling,홈데코,조명,캔들,디퓨져,,캔들,...,,F5NATURE,['Romantic'],4,5.0,5.0,5.0,5.0,5.0,5.0
3,Product_3,디퓨저 섬유스틱 화이트,2900,https://ohou.se/productions/445003/selling,홈데코,조명,디퓨져,디퓨져,,디퓨져,...,,모던하우스,['Romantic'],1,5.0,5.0,5.0,5.0,5.0,5.0
4,Product_4,구슬 미니 돔앤 플레이트,3900,https://ohou.se/productions/333018/selling,수납,정리,화장대,테이블정리,,쥬얼리정리용품,...,,모던하우스,['Romantic'],5,5.0,4.9,5.0,4.8,4.8,5.0


In [539]:
pd.merge(ddd, products.loc[:,["product_id","Thema_G","대분류","대분류_1","소분류"]], on="product_id")

Unnamed: 0,product_id,rating,Thema_G,대분류,대분류_1,소분류
0,워싱내츄럴 쿠션커버 2type / 5size,5.0,['Modern'],패브릭,,쿠션
1,벽난로 콘솔장식장 3colors(L),5.0,['Natural'],가구,,진열장
2,레트로 카본 히터 BKH-6082CW / BKH-6082C,5.0,['Europe'],가전,,전기히터
3,듀얼미스트 무선 미니 가습기 500ml,5.0,['Natural'],가전,,가습기
4,[오늘의딜] 로망스 심플 단모 사계절 러그 - 4colors,5.0,"['Europe', 'Romantic', 'Antique', 'Modern']",패브릭,,러그
5,디퓨저 200ml 1+1+1+1 + 캔들증정,4.75,"['Romantic', 'Antique']",홈데코,조명,디퓨져
6,푸로 초음파 가습기 & 무드등 미니가습기 300ml,4.5,['Vintage'],가전,,가습기
7,순면 피그먼트 워싱 줄누빔 이불겸패드 (SS/Q/K)-6color,4.5,['Natural'],패브릭,,패드
8,로맨틱부케 식탁보,4.5,['Romantic'],패브릭,,식탁보
9,FADO 파도 달 조명_2size / 2colors(박스 안전포장),4.5,"['Europe', 'Vintage', 'Natural', 'Antique', 'M...",홈데코,조명,단스탠드


### 2.8 Recommending products from those the user has not bought

In [541]:
def get_notbuy_products(ratings_matrix, userId):
    # Extract all product ratings for the user given by userId as a Series.
    # The returned user_rating is a Series indexed by product name (product_id).
    user_rating = ratings_matrix.loc[userId,:]
    
    # A rating above 0 means the user already bought the product. Collect those index labels into a list.
    already_buy = user_rating[ user_rating > 0].index.tolist()
    
    # List of all product names.
    product_list = ratings_matrix.columns.tolist()
    
    # Use a list comprehension to drop the products in already_buy from product_list.
    notbuy_list = [product for product in product_list if product not in already_buy]
    
    return notbuy_list


In [542]:
# pred_df : predicted ratings per product computed earlier
# notbuy_list : products the user has not bought
# top_n : number of top products to return

def recomm_product_by_userid(pred_df, userId, notbuy_list, top_n=10):
    # From the predicted-rating DataFrame, select the row for userId and the columns in notbuy_list,
    # then sort by predicted rating in descending order.
    recomm_products = pred_df.loc[userId, notbuy_list].sort_values(ascending=False)[:top_n]
    return recomm_products
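The two helpers above can also be expressed with a boolean mask instead of Python lists, which avoids the repeated `not in` list lookups. This is a sketch under the assumption that the rating matrix and the prediction DataFrame share the same columns; the function name `recommend_unrated` and the toy frames are hypothetical.

```python
import pandas as pd

def recommend_unrated(rating_matrix, pred_df, user_id, top_n=10):
    """Top-n predicted products among those the user has not rated.

    Hypothetical helper combining the two steps above.
    """
    unrated = rating_matrix.loc[user_id] == 0        # boolean mask over products
    return pred_df.loc[user_id, unrated].sort_values(ascending=False).head(top_n)

# Toy data (made up): user 1 has rated only "lamp".
products = ["lamp", "rug", "chair"]
rating_matrix = pd.DataFrame([[5.0, 0.0, 0.0]], index=[1], columns=products)
pred_df = pd.DataFrame([[0.9, 0.2, 0.7]], index=[1], columns=products)

top = recommend_unrated(rating_matrix, pred_df, 1, top_n=2)
print(top)
```

The already-rated "lamp" is excluded even though it has the highest predicted score.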

In [543]:
# Get the names of the products the user has not bought
notbuy_list = get_notbuy_products(rating_matrix, 1001)

# Recommend products with item-based nearest-neighbor collaborative filtering
recomm_products = recomm_product_by_userid(ratings_pred_matrix, 1001, notbuy_list, top_n=10)

# Put the predicted scores into a DataFrame
recomm_products = pd.DataFrame(data=recomm_products.values,index=recomm_products.index,columns=['pred_score'])
recomm_products

Unnamed: 0_level_0,pred_score
product_id,Unnamed: 1_level_1
누덴 크리스마스 트리 캔들,0.714666
솔레 장스탠드_2colors(28일 순차배송 / 램프증정),0.596131
홈 쿠킹 미니 오븐기 2colors,0.563284
보들보들 마가렛 면모달 이불,0.448853
1+1+1 뉴CO디퓨저 2개 + 플라워 디퓨저,0.423876
1+1 듀얼쉐이드 네츄럴 콤비블라인드 36colors,0.392288
레트로 마리앙 쿠션커버_마리앙자수,0.377295
패브릭 드로잉포스터 인테리어소품 모음,0.374604
베가 플리아 투명 접이식의자 7colors,0.363516
심플 원형 러그 사계절 단모 극세사 워셔블 카페트 4colors 3size,0.336095


In [None]:
Conclusion:
Using item-based nearest-neighbor collaborative filtering,
we computed the user's predicted product ratings
and recommended the top 10 products.

In [544]:
pd.merge(recomm_products, products.loc[:,["product_id","Thema_G"]], on="product_id")

Unnamed: 0,product_id,pred_score,Thema_G
0,누덴 크리스마스 트리 캔들,0.714666,['Natural']
1,솔레 장스탠드_2colors(28일 순차배송 / 램프증정),0.596131,['Vintage']
2,홈 쿠킹 미니 오븐기 2colors,0.563284,['Natural']
3,보들보들 마가렛 면모달 이불,0.448853,['Romantic']
4,1+1+1 뉴CO디퓨저 2개 + 플라워 디퓨저,0.423876,['Romantic']
5,1+1 듀얼쉐이드 네츄럴 콤비블라인드 36colors,0.392288,['Romantic']
6,레트로 마리앙 쿠션커버_마리앙자수,0.377295,['Romantic']
7,패브릭 드로잉포스터 인테리어소품 모음,0.374604,"['Romantic', 'Antique']"
8,베가 플리아 투명 접이식의자 7colors,0.363516,"['Europe', 'Vintage', 'Natural', 'Antique', 'M..."
9,심플 원형 러그 사계절 단모 극세사 워셔블 카페트 4colors 3size,0.336095,['Romantic']


# 3. Latent factor collaborative filtering

In [545]:
from tqdm import tqdm_notebook

In [546]:
def get_rmse(R, P, Q, non_zeros):
    error = 0
    # Build the predicted R matrix as the product of the two factor matrices P and Q.T
    full_pred_matrix = np.dot(P, Q.T)
    
    # Take the positions of the rated (nonzero) entries of the actual R matrix
    # and compute the RMSE between the actual and predicted values at those positions
    x_non_zero_ind = [non_zero[0] for non_zero in non_zeros]
    y_non_zero_ind = [non_zero[1] for non_zero in non_zeros]
    R_non_zeros = R[x_non_zero_ind, y_non_zero_ind]
    
    full_pred_matrix_non_zeros = full_pred_matrix[x_non_zero_ind, y_non_zero_ind]
    
    mse = mean_squared_error(R_non_zeros, full_pred_matrix_non_zeros)
    rmse = np.sqrt(mse)
    
    return rmse

In [547]:
def matrix_factorization(R, K, steps=200, learning_rate=0.01, r_lambda = 0.01):
    num_users, num_items = R.shape
    # Set the sizes of the P and Q matrices and fill them with random values drawn from a normal distribution.
    np.random.seed(1)
    P = np.random.normal(scale=1./K, size=(num_users, K))
    Q = np.random.normal(scale=1./K, size=(num_items, K))

    break_count = 0
       
    # Store the row index, column index, and value of every entry with R > 0 in the non_zeros list.
    non_zeros = [ (i, j, R[i,j]) for i in range(num_users) for j in range(num_items) if R[i,j] > 0 ]
   
    # Keep updating the P and Q matrices (stochastic gradient descent)
    for step in tqdm_notebook(range(steps)):
        for i, j, r in non_zeros:
            # Error: difference between the actual value and the predicted value
            eij = r - np.dot(P[i, :], Q[j, :].T)
            # Apply the SGD update rule with regularization
            P[i,:] = P[i,:] + learning_rate*(eij * Q[j, :] - r_lambda*P[i,:])
            Q[j,:] = Q[j,:] + learning_rate*(eij * P[i, :] - r_lambda*Q[j,:])
       
        rmse = get_rmse(R, P, Q, non_zeros)
        if (step % 10) == 0 :
            print("### iteration step : ", step," rmse : ", rmse)
            
    return P, Q
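To see what one pass of the inner loop does, the sketch below applies a single SGD update to one user vector and one item vector with made-up values. It follows the same update order as `matrix_factorization` (P first, then Q using the already updated P); the numbers are illustrative assumptions.

```python
import numpy as np

learning_rate, r_lambda = 0.01, 0.01
p = np.array([0.1, 0.2])   # latent vector of one user (made-up values)
q = np.array([0.3, 0.1])   # latent vector of one item (made-up values)
r = 5.0                    # observed rating

err_before = r - p.dot(q)
# Same update order as matrix_factorization: P first, then Q using the new P.
p = p + learning_rate * (err_before * q - r_lambda * p)
q = q + learning_rate * (err_before * p - r_lambda * q)
err_after = r - p.dot(q)
print(err_before, err_after)
```

Each update moves both vectors a small step in the direction that reduces the error for this one rating, while the `r_lambda` term shrinks them slightly to limit overfitting.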

In [548]:
rating_matrix.head(10)

product_id,(1+1) 냉기우풍차단 3중직 베이직 암막커튼 큰창 12colors,(1+1) 소프트 암막커튼 6color 3size,(1+1) 솔리드 방한 암막커튼 핀형/아일렛형 (작은창/긴창/대형),(1+1) 어썸 화이트 암막커튼(창문/소형/대형/특대형) 핀형&아일렛 택1,(1+1) 트루디 암막 아일렛 창형/중형/대형커튼,(1+1) 호텔식 더뷰 암막커튼 (창문형/긴창형) 9colors,(1+1) 호텔식 화이트 쉬폰 커튼 130x230 - 아일렛형,(1+1)푸벨드마망 벽걸이 휴지통 7L/10L,(3+1) 천연디퓨저 200ml 18가지 향기 드라이플라워 증정,(3+1) 천연디퓨저 500ml 18가지 향기 드라이플라워 증정,...,화이트 워싱면 파티션 가리개커튼,화이트 자개모빌 만들기 재료 DIY 키트 패키지,화이트 접착식 조리 도구 걸이 8p,화이트골드 앤틱 촛대,화이트모던 테슬포인트 소형러그,"휴대용 블루투스 스피커 RETRO20W (라디오,USB)",휴대용 원목 우드 보풀제거기1+1,휴대용 포토프린터 즉석카메라 미니샷 2 레트로 C210R,흔들리는 촛불 LED 무빙 캔들 3size,흡착랙 욕실수납용품 모음전 18종 택1
user_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
10,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


### 3.1 Matrix factorization with gradient descent

In [549]:
%%time
P, Q = matrix_factorization(rating_matrix.values
                            , K=50, steps=200, learning_rate=0.01, r_lambda = 0.01)

pred_matrix = np.dot(P, Q.T)

HBox(children=(HTML(value=''), FloatProgress(value=0.0, max=200.0), HTML(value='')))

### iteration step :  0  rmse :  4.829382523560343
### iteration step :  10  rmse :  0.6064579312006436
### iteration step :  20  rmse :  0.35365056020869046
### iteration step :  30  rmse :  0.28025760805600747
### iteration step :  40  rmse :  0.23742645486656516
### iteration step :  50  rmse :  0.20377263134866816
### iteration step :  60  rmse :  0.17571368752259206
### iteration step :  70  rmse :  0.15254489958779494
### iteration step :  80  rmse :  0.13355696478283396
### iteration step :  90  rmse :  0.11800657785374091
### iteration step :  100  rmse :  0.1052446882281458
### iteration step :  110  rmse :  0.09472860057972546
### iteration step :  120  rmse :  0.08601522888979889
### iteration step :  130  rmse :  0.0787508487659086
### iteration step :  140  rmse :  0.07265696823055655
### iteration step :  150  rmse :  0.06751501670783454
### iteration step :  160  rmse :  0.06315261350986974
### iteration step :  170  rmse :  0.05943263726475714
### iteration step :  180 

In [550]:
%%time
# Matrix factorization with gradient descent
P, Q = matrix_factorization(rating_matrix.values, K=100, steps=200, learning_rate=0.01, r_lambda = 0.01)

pred_matrix = np.dot(P, Q.T)

HBox(children=(HTML(value=''), FloatProgress(value=0.0, max=200.0), HTML(value='')))

### iteration step :  0  rmse :  4.8357446668877095
### iteration step :  10  rmse :  0.6201678738352999
### iteration step :  20  rmse :  0.36211068446085326
### iteration step :  30  rmse :  0.29118295239743885
### iteration step :  40  rmse :  0.250880194911974
### iteration step :  50  rmse :  0.21840277228374394
### iteration step :  60  rmse :  0.1898419570194762
### iteration step :  70  rmse :  0.16521402688451436
### iteration step :  80  rmse :  0.144514106805319
### iteration step :  90  rmse :  0.12724343140808816
### iteration step :  100  rmse :  0.11284941757320709
### iteration step :  110  rmse :  0.10085244889945436
### iteration step :  120  rmse :  0.09083649146066392
### iteration step :  130  rmse :  0.08244858941181786
### iteration step :  140  rmse :  0.07539835411338532
### iteration step :  150  rmse :  0.0694500892200999
### iteration step :  160  rmse :  0.0644124794364081
### iteration step :  170  rmse :  0.06012961929610812
### iteration step :  180  rms

In [551]:
%%time
# Matrix factorization using gradient descent
P, Q = matrix_factorization(rating_matrix.values,
                            K=200, steps=200, learning_rate=0.01, r_lambda = 0.01)

pred_matrix = np.dot(P, Q.T)

### iteration step :  0  rmse :  4.837523332316664
### iteration step :  10  rmse :  0.6293554908241465
### iteration step :  20  rmse :  0.368963685483619
### iteration step :  30  rmse :  0.3008529312084611
### iteration step :  40  rmse :  0.2638602465708202
### iteration step :  50  rmse :  0.2340245256787019
### iteration step :  60  rmse :  0.20664504299666445
### iteration step :  70  rmse :  0.1817290378824638
### iteration step :  80  rmse :  0.15998099953857334
### iteration step :  90  rmse :  0.14141724198565633
### iteration step :  100  rmse :  0.12566273354816995
### iteration step :  110  rmse :  0.11231797131129706
### iteration step :  120  rmse :  0.10102200494942272
### iteration step :  130  rmse :  0.0914501153403326
### iteration step :  140  rmse :  0.08332057300203144
### iteration step :  150  rmse :  0.07639766837289004
### iteration step :  160  rmse :  0.07048682054330568
### iteration step :  170  rmse :  0.06542690017853381
### iteration step :  180  rmse

In [None]:
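# The RMSE kept decreasing as gradient descent progressed: in the runs logged above it fell
# from about 4.8 at step 0 to roughly 0.06 to 0.07 by step 170.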
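
A minimal sketch of why the RMSE falls during training, assuming the notebook's `matrix_factorization` runs stochastic gradient descent with L2 regularization over the observed (non-zero) ratings only. `toy_matrix_factorization`, `rmse_on_observed`, and the 4x4 matrix `R` are hypothetical stand-ins for illustration, not the function used above.

```python
import numpy as np

def rmse_on_observed(R, P, Q):
    """RMSE between R and P @ Q.T, computed only where R is non-zero."""
    mask = R > 0
    pred = P @ Q.T
    return float(np.sqrt(np.mean((R[mask] - pred[mask]) ** 2)))

def toy_matrix_factorization(R, K=2, steps=200, learning_rate=0.01, r_lambda=0.01, seed=0):
    """Hypothetical stand-in for matrix_factorization(): SGD on observed entries."""
    rng = np.random.default_rng(seed)
    num_users, num_items = R.shape
    P = rng.normal(scale=1.0 / K, size=(num_users, K))  # user latent factors
    Q = rng.normal(scale=1.0 / K, size=(num_items, K))  # item latent factors
    observed = [(i, j, R[i, j])
                for i in range(num_users)
                for j in range(num_items) if R[i, j] > 0]
    history = []
    for _ in range(steps):
        for i, j, r in observed:
            err = r - P[i] @ Q[j]   # prediction error for one observed rating
            p_old = P[i].copy()     # keep the old user vector for the item update
            P[i] += learning_rate * (err * Q[j] - r_lambda * P[i])
            Q[j] += learning_rate * (err * p_old - r_lambda * Q[j])
        history.append(rmse_on_observed(R, P, Q))
    return P, Q, history

# Toy rating matrix: 0.0 marks "no rating", as in rating_matrix.
R = np.array([[5.0, 0.0, 4.0, 0.0],
              [4.0, 0.0, 0.0, 1.0],
              [0.0, 2.0, 5.0, 0.0],
              [1.0, 0.0, 0.0, 5.0]])
P, Q, history = toy_matrix_factorization(R)
print("first RMSE:", history[0], "last RMSE:", history[-1])
```

Because the error is measured on the same entries the model is fitted to, this is a training RMSE; a held-out split would be needed to judge how well the predictions generalize.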

### 3.2 Inspecting the predicted rating matrix

In [443]:
ratings_pred_matrix = pd.DataFrame(data=pred_matrix, index= rating_matrix.index,
                                   columns = rating_matrix.columns)

print(ratings_pred_matrix.shape)
ratings_pred_matrix.head(15)

(4188, 2366)


product_id,(1+1) 냉기우풍차단 3중직 베이직 암막커튼 큰창 12colors,(1+1) 소프트 암막커튼 6color 3size,(1+1) 솔리드 방한 암막커튼 핀형/아일렛형 (작은창/긴창/대형),(1+1) 어썸 화이트 암막커튼(창문/소형/대형/특대형) 핀형&아일렛 택1,(1+1) 트루디 암막 아일렛 창형/중형/대형커튼,(1+1) 호텔식 더뷰 암막커튼 (창문형/긴창형) 9colors,(1+1) 호텔식 화이트 쉬폰 커튼 130x230 - 아일렛형,(1+1)푸벨드마망 벽걸이 휴지통 7L/10L,(3+1) 천연디퓨저 200ml 18가지 향기 드라이플라워 증정,(3+1) 천연디퓨저 500ml 18가지 향기 드라이플라워 증정,...,화이트 워싱면 파티션 가리개커튼,화이트 자개모빌 만들기 재료 DIY 키트 패키지,화이트 접착식 조리 도구 걸이 8p,화이트골드 앤틱 촛대,화이트모던 테슬포인트 소형러그,"휴대용 블루투스 스피커 RETRO20W (라디오,USB)",휴대용 원목 우드 보풀제거기1+1,휴대용 포토프린터 즉석카메라 미니샷 2 레트로 C210R,흔들리는 촛불 LED 무빙 캔들 3size,흡착랙 욕실수납용품 모음전 18종 택1
user_id
1,4.765622,4.810083,4.57435,4.67041,4.553495,4.764897,4.572814,4.595425,4.697394,4.554268,...,4.544758,4.582372,4.671399,4.764206,4.657717,4.77407,4.632157,4.67746,4.731418,4.401401
2,4.706944,4.830039,4.693312,4.575071,4.793403,4.838593,4.711044,4.656394,4.775946,4.568698,...,4.579984,4.635161,4.76303,4.675874,4.670192,4.919723,4.753495,4.58627,4.700324,4.412033
3,4.748222,4.821881,4.768862,4.81322,4.705249,4.808381,4.735151,4.664378,4.795832,4.623008,...,4.885258,4.703198,4.74297,4.759407,4.677225,4.948386,4.832046,4.839995,4.695963,4.602007
4,5.006503,5.051407,4.800377,4.863074,4.855048,5.007912,4.840953,4.753305,4.876544,4.819357,...,4.681678,4.844595,4.83373,4.993526,4.864562,5.084572,4.979229,4.892758,4.883219,4.731924
5,4.871037,4.977518,4.783695,4.867459,4.827938,4.962573,4.899101,4.634167,4.890823,4.906428,...,4.852905,4.778926,4.720503,4.881021,4.848825,5.090849,4.822062,4.979242,5.09635,4.847206
6,4.934293,4.984801,4.768379,4.769757,4.832027,4.943806,4.829824,4.722771,4.860102,4.70423,...,4.833295,4.791144,4.782623,4.924561,4.823612,5.099787,4.870413,4.898216,4.974542,4.696817
7,4.934621,4.918254,4.720199,4.708074,4.729832,4.817699,4.738152,4.67747,4.753213,4.685362,...,4.787832,4.702256,4.855839,4.738111,4.800365,4.98678,4.7644,4.815041,4.758306,4.578471
8,4.869292,4.98166,4.761702,4.801576,4.761362,4.937628,4.845389,4.773456,4.85034,4.839876,...,4.818667,4.775195,4.649353,4.913618,4.767126,5.082089,4.775053,4.883099,4.813694,4.713574
9,4.949836,4.986261,4.754135,4.811191,4.795989,4.95846,4.797069,4.739071,4.842807,4.693607,...,4.682068,4.821975,4.84013,4.942183,4.822233,5.022447,4.929012,4.889989,4.85785,4.664727
10,4.973143,4.999925,4.818022,4.827886,4.773338,4.925501,4.859197,4.863854,4.875108,4.825318,...,4.7146,4.805184,4.77387,4.85842,4.813638,5.076989,4.876437,4.881818,4.867032,4.591578


In [446]:
# Inspect the original rating matrix
print(rating_matrix.shape)
rating_matrix.head(15)

(4188, 2366)


product_id,(1+1) 냉기우풍차단 3중직 베이직 암막커튼 큰창 12colors,(1+1) 소프트 암막커튼 6color 3size,(1+1) 솔리드 방한 암막커튼 핀형/아일렛형 (작은창/긴창/대형),(1+1) 어썸 화이트 암막커튼(창문/소형/대형/특대형) 핀형&아일렛 택1,(1+1) 트루디 암막 아일렛 창형/중형/대형커튼,(1+1) 호텔식 더뷰 암막커튼 (창문형/긴창형) 9colors,(1+1) 호텔식 화이트 쉬폰 커튼 130x230 - 아일렛형,(1+1)푸벨드마망 벽걸이 휴지통 7L/10L,(3+1) 천연디퓨저 200ml 18가지 향기 드라이플라워 증정,(3+1) 천연디퓨저 500ml 18가지 향기 드라이플라워 증정,...,화이트 워싱면 파티션 가리개커튼,화이트 자개모빌 만들기 재료 DIY 키트 패키지,화이트 접착식 조리 도구 걸이 8p,화이트골드 앤틱 촛대,화이트모던 테슬포인트 소형러그,"휴대용 블루투스 스피커 RETRO20W (라디오,USB)",휴대용 원목 우드 보풀제거기1+1,휴대용 포토프린터 즉석카메라 미니샷 2 레트로 C210R,흔들리는 촛불 LED 무빙 캔들 3size,흡착랙 욕실수납용품 모음전 18종 택1
user_id
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
10,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
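
Every visible cell of the original matrix is 0.0, which this notebook treats as "no rating", while the predicted matrix has a value in every cell. A quick way to quantify that sparsity is sketched below on a hypothetical 3x3 frame (`rating_matrix_toy`); the same expressions should apply to `rating_matrix`, assuming zeros there always mean a missing rating.

```python
import numpy as np
import pandas as pd

# Hypothetical toy matrix with the same layout as rating_matrix:
# rows are user_id, columns are product_id, 0.0 means "not rated".
rating_matrix_toy = pd.DataFrame(
    [[5.0, 0.0, 0.0],
     [0.0, 0.0, 4.5],
     [0.0, 0.0, 0.0]],
    index=pd.Index([1, 2, 3], name="user_id"),
    columns=pd.Index(["A", "B", "C"], name="product_id"),
)

observed = rating_matrix_toy.values > 0   # boolean mask of rated cells
density = observed.mean()                 # share of cells that hold a rating
ratings_per_user = pd.Series(observed.sum(axis=1), index=rating_matrix_toy.index)

print(f"density: {density:.3f}")
print(ratings_per_user)
```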


### 3.3 Recommending products the user has not yet purchased, in descending order of predicted rating


In [447]:
def get_notbuy_products(ratings_matrix, userId):
    # Extract every product rating for the given userId as a Series.
    # The returned user_rating is a Series indexed by product name (product_id).
    user_rating = ratings_matrix.loc[userId, :]

    # A rating greater than 0 means the user already bought the product.
    # Collect those index labels in a list.
    already_buy = user_rating[user_rating > 0].index.tolist()

    # List of all product names.
    product_list = ratings_matrix.columns.tolist()

    # Use a list comprehension to drop the products in already_buy from product_list.
    notbuy_list = [product for product in product_list if product not in already_buy]

    return notbuy_list
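
The same filtering can be expressed with a boolean mask instead of a list comprehension. The sketch below uses a hypothetical helper, `get_unrated_products`, and a toy matrix; it assumes, as the function above does, that any rating greater than 0 marks a purchased product.

```python
import pandas as pd

def get_unrated_products(ratings_matrix, user_id):
    """Hypothetical vectorized variant: products the user has no positive rating for."""
    user_rating = ratings_matrix.loc[user_id, :]
    # Negate the "already bought" mask so zeros (and NaN) are kept.
    return user_rating[~(user_rating > 0)].index.tolist()

# Toy data: user 1 bought product A, user 2 bought products B and C.
toy = pd.DataFrame(
    [[4.5, 0.0, 0.0],
     [0.0, 5.0, 3.0]],
    index=pd.Index([1, 2], name="user_id"),
    columns=pd.Index(["A", "B", "C"], name="product_id"),
)

print(get_unrated_products(toy, 1))
print(get_unrated_products(toy, 2))
```

On a matrix with thousands of product columns the mask avoids the repeated `not in` lookups against a list, while returning the products in the same column order.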

In [448]:
def recomm_product_by_userid(pred_df, userId, notbuy_list, top_n=10):
    # From the predicted rating DataFrame, select the row for userId and the product
    # columns in notbuy_list, then sort by predicted rating in descending order.
    recomm_products = pred_df.loc[userId, notbuy_list].sort_values(ascending=False)[:top_n]
    return recomm_products
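
A small worked example of the top-N selection, on hypothetical data. `top_n_for_user` is an illustrative name; it uses `Series.nlargest`, which should give the same products as sorting in descending order and slicing, apart from how ties are ordered.

```python
import pandas as pd

def top_n_for_user(pred_df, user_id, candidate_products, top_n=10):
    """Hypothetical equivalent of recomm_product_by_userid using nlargest."""
    return pred_df.loc[user_id, candidate_products].nlargest(top_n)

# Toy predicted ratings for one user across four products.
pred_toy = pd.DataFrame(
    [[4.2, 3.1, 4.9, 2.0]],
    index=pd.Index([1], name="user_id"),
    columns=pd.Index(["A", "B", "C", "D"], name="product_id"),
)

# Product C is assumed to be already purchased, so it is left out of the candidates.
candidates = ["A", "B", "D"]
top2 = top_n_for_user(pred_toy, 1, candidates, top_n=2)
print(top2)
```

Product C has the highest predicted score but never appears in the result, because the candidate list restricts the ranking to products the user has not bought.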

In [668]:
# Get the names of the products the user has not purchased
notbuy_list = get_notbuy_products(rating_matrix, 3283)

# Recommend products with latent factor collaborative filtering
recomm_products = recomm_product_by_userid(ratings_pred_matrix, 3283, notbuy_list, top_n=10)

# Build a DataFrame from the predicted scores
recomm_products = pd.DataFrame(data=recomm_products.values,index=recomm_products.index,columns=['pred_score'])
recomm_products

product_id,pred_score
시나몬스틱 인테리어 소품,1.781537
우드코스터 캔들트레이,1.603366
레이스 코스터 매트,1.450415
감성도자기 양념통S 4colors,1.340283
구름초 4color,1.108178
에스프레소 커피머신기 CE1000 3colors,0.986468
라탄 코스터 매트 3type,0.939818
큐브 캔들 오브제캔들 필라 디자인 양초,0.804503
헤이데이 디저트 플레이트 (3color),0.677742
와인잔 캔들 / 악세사리 홀더,0.651267


#### Products were recommended using latent factor collaborative filtering based on matrix factorization (MF).

In [630]:
zz= pd.merge(recomm_products, products.loc[:,["product_id","Thema_G", "Product_Url"]], on="product_id")

In [634]:
pd.merge(recomm_products, products.loc[:,["product_id","Thema_G","대분류","대분류_1","소분류"]], on="product_id")


,product_id,pred_score,Thema_G,대분류,대분류_1,소분류
0,시나몬스틱 인테리어 소품,1.781537,"['Europe', 'Vintage', 'Natural', 'Antique']",홈데코,조명,향
1,우드코스터 캔들트레이,1.603366,"['Europe', 'Vintage', 'Natural']",홈데코,조명,캔들홀더
2,레이스 코스터 매트,1.450415,"['Vintage', 'Natural']",홈데코,조명,기타장식용품
3,감성도자기 양념통S 4colors,1.340283,['Antique'],주방,,양념통
4,구름초 4color,1.108178,"['Romantic', 'Antique']",홈데코,조명,캔들
5,에스프레소 커피머신기 CE1000 3colors,0.986468,['Europe'],가전,,커피메이커
6,라탄 코스터 매트 3type,0.939818,['Vintage'],홈데코,조명,트레이
7,큐브 캔들 오브제캔들 필라 디자인 양초,0.804503,['Modern'],홈데코,조명,캔들
8,헤이데이 디저트 플레이트 (3color),0.677742,['Natural'],주방,,접시
9,와인잔 캔들 / 악세사리 홀더,0.651267,['Natural'],홈데코,조명,캔들홀더
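
The merge above joins on `product_id`, which is the index name of `recomm_products` and a regular column of `products`. A sketch of a more explicit form follows, with hypothetical toy frames (`recomm_toy`, `products_toy`) and placeholder product names: resetting the index first makes the join key a column on both sides, and `how="left"` keeps a recommendation even when no metadata row exists for it.

```python
import pandas as pd

# Toy recommendations indexed by product_id, like recomm_products.
recomm_toy = pd.DataFrame(
    {"pred_score": [1.78, 1.60, 1.45]},
    index=pd.Index(["candle tray", "lace coaster", "cloud candle"], name="product_id"),
)

# Toy product metadata; "lace coaster" is deliberately missing.
products_toy = pd.DataFrame({
    "product_id": ["candle tray", "cloud candle"],
    "Thema_G": ["['Vintage', 'Natural']", "['Romantic', 'Antique']"],
})

# Left join keeps every recommendation in its original ranked order.
merged = pd.merge(recomm_toy.reset_index(), products_toy, on="product_id", how="left")
print(merged)
```

With the default inner join, as used in the cells above, a recommended product without a matching row in `products` would silently drop out of the result.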


In [635]:
### zz.to_excel("houseid_210130.xlsx")