# 플레이리스트에 어울리는 곡 예측하기

* 목표 : 플레이리스트에 수록된 곡과 태그의 절반 또는 전부가 숨겨져 있을 때 주어지지 않은 곡들과 태그를 예측
* 기대 효과 : 예측 모델을 만든다면, 플레이리스트에 들어있는 곡이 주어졌을 때 이 모델이 해당 플레이리스트와 어울리는 곡들을 추천

## 사용 환경

* OS : Windows 10
* 언어 : Python3
* tensorflow gpu 2.0, Ram 32gb, GTX 1050 Ti

In [1]:
import pandas as pd
import numpy as np
import tensorflow as tf
import tensorflow.keras as kr
from tqdm.notebook import tqdm
from tensorflow.keras.models import load_model
from sklearn.model_selection import train_test_split
import json

## 데이터셋 구성

* 플레이리스트 메타 데이터 (train, val)
    - 플레이리스트 제목
    - 플레이리스트에 수록된 곡
    - 플레이리스트에 달려있는 태그 목록
    - 플레이리스트 좋아요 수
    - 플레이리스트가 최종 수정된 시각
* 곡 메타데이터 (song_meta)
    - 곡 제목
    - 앨범 제목
    - 아티스트명
    - 장르
    - 발매일

In [2]:
song_meta = pd.read_json("../data/song/song_meta.json")
train = pd.read_json("../data/song/train.json")
val = pd.read_json("../data/song/val.json")

## 모델 적용 및 훈련 과정
- 예측 모델로 추천 시스템에 많이 사용되는 오토인코더를 사용한다.
- 오토인코더는 입력값과 유사한 출력값을 뽑아낼 수 있다.
- train_data를 입력하면 출력값으로 train_data와 유사한 플레이리스트들을 뽑아 낼 수 있을 것이라 예측한다.

## train_data, val_data
- 최종 데이터 형태는 오토인코더 모델이 인식할 수 있게 행렬 형태로 변환해야 한다. 
- 행 : 플레이리스트 id, 열 : 노래 id 로 구성되어 있는 행렬 형태로 해당하는 플레이리스트에 해당하는 노래가 들어있으면 가중치를 부여하고, 들어있지 않으면 0을 부여한다.
- 가중치는 언급된 플레이리스트 좋아요 수를 합산한 값이다.

## 데이터 전처리
* train(val)의 수록된 곡, 좋아요 수를 이용하여 행렬로 변환하기 좋은 형태로 만든다.

In [3]:
train_data = train[['id', 'songs','like_cnt']]
train_data_unnest = np.dstack((np.repeat(train_data.id.values, list(map(len, train_data.songs))),
                               np.concatenate(train_data.songs.values),
                               np.repeat(train_data.like_cnt.values, list(map(len, train_data.songs)))))
train_data = pd.DataFrame(data = train_data_unnest[0], columns = train_data.columns)
train_data['id'] = train_data['id'].astype(int)
train_data['songs'] = train_data['songs'].astype(int)
train_data['like_cnt'] = train_data['like_cnt'].astype(int)
train_data.columns = ['listid','songid', 'likecnt']

del train_data_unnest
train_data

Unnamed: 0,listid,songid,likecnt
0,61281,525514,71
1,61281,129701,71
2,61281,383374,71
3,61281,562083,71
4,61281,297861,71
...,...,...,...
5285866,100389,111365,17
5285867,100389,51373,17
5285868,100389,640239,17
5285869,100389,13759,17


In [4]:
val_data = val[['id', 'songs', 'like_cnt']]
val_data_unnest = np.dstack((np.repeat(val_data.id.values, list(map(len, val_data.songs))),
                             np.concatenate(val_data.songs.values),
                             np.repeat(val_data.like_cnt.values, list(map(len, val_data.songs)))))
val_data = pd.DataFrame(data = val_data_unnest[0], columns = val_data.columns)
val_data['id'] = val_data['id'].astype(int)
val_data['songs'] = val_data['songs'].astype(int)
val_data['like_cnt'] = val_data['like_cnt'].astype(int)
val_data.columns = ['listid','songid', 'likecnt']

del val_data_unnest
val_data

Unnamed: 0,listid,songid,likecnt
0,118598,373313,1675
1,118598,151080,1675
2,118598,275346,1675
3,118598,696876,1675
4,118598,165237,1675
...,...,...,...
421194,65189,193899,19
421195,65189,398886,19
421196,65189,234875,19
421197,65189,243850,19


* 데이터 용량을 줄이기 위해 train과 val에 속한 노래 중 4000개 가량의 노래만 입력값과 출력값에 들어갈 수 있도록 전처리 한다.

In [5]:
playlist_map = pd.concat([train_data, val_data], ignore_index=True)
playlist_map

Unnamed: 0,listid,songid,likecnt
0,61281,525514,71
1,61281,129701,71
2,61281,383374,71
3,61281,562083,71
4,61281,297861,71
...,...,...,...
5707065,65189,193899,19
5707066,65189,398886,19
5707067,65189,234875,19
5707068,65189,243850,19


* 좋아요 수 합산 40000개 이상의 노래 개수는 3925개이다.

In [6]:
like_count = playlist_map.groupby('songid').sum()['likecnt']
(like_count.sort_values(ascending = False) > 40000).sum()

3925

In [7]:
like_count_40000 = pd.DataFrame(like_count)
like_count_40000.columns = ['likecount']
like_count_40000 = like_count_40000[like_count_40000['likecount'] >= 40000].sort_values(ascending = False, by = 'likecount')
like_count_40000.reset_index(inplace = True)
like_count_40000['songnewid'] = range(len(like_count_40000))
like_count_40000

Unnamed: 0,songid,likecount,songnewid
0,366786,352819,0
1,133143,277721,1
2,625875,277206,2
3,610933,277176,3
4,580074,272594,4
...,...,...,...
3920,189218,40019,3920
3921,315691,40019,3921
3922,138295,40016,3922
3923,153225,40015,3923


* 좋아요 수 합산 40000개 이상의 노래의 곡 정보

In [8]:
song_meta_40000 = song_meta[song_meta['id'].isin(like_count_40000['songid'])]
song_meta_40000.rename(columns={'id': 'songid'}, inplace=True)
song_meta_40000 = pd.merge(song_meta_40000, like_count_40000, on = 'songid', how = 'left')
song_meta_40000

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  errors=errors,


Unnamed: 0,song_gn_dtl_gnr_basket,issue_date,album_name,album_id,artist_id_basket,song_name,song_gn_gnr_basket,artist_name_basket,songid,likecount,songnewid
0,[GN0101],20060830,Lucky 7,325657,[100148],잠시 길을 잃다 (Feat. 신보경),[GN0100],[015B],62,53440,2378
1,"[GN1304, GN1301]",20110810,Simple Steps,2058451,[614091],Rollercoaster,[GN1300],[Sam Ock],71,66252,1502
2,"[GN0105, GN0101]",20130404,The 3rd Album Part.2 `Love Blossom (러브블러썸)`,2180379,[175139],Love Blossom (러브블러썸),[GN0100],[케이윌],250,130794,246
3,"[GN0401, GN0403]",20181122,Sun And Moon,10215272,[757273],그 여름밤,[GN0400],[샘김 (Sam Kim)],289,40922,3767
4,"[GN0401, GN0403, GN0402]",20151012,No Make Up,2644221,[537920],No Make Up,[GN0400],[Zion.T],311,81350,951
...,...,...,...,...,...,...,...,...,...,...,...
3920,"[GN0105, GN0101]",20110311,잠 못 드는 밤에,1199953,[4699],잠 못 드는 밤에,[GN0100],[문명진],707564,85700,852
3921,[GN0901],20150904,WILD,2638374,[780919],FOOLS,[GN0900],[Troye Sivan],707573,106095,501
3922,"[GN0104, GN0101]",20080819,김범수 6집,393851,[6502],슬픔활용법,[GN0100],[김범수],707621,86021,845
3923,"[GN0105, GN1501, GN0101, GN1504]",20160510,또 오해영 OST Part.2,2683798,[655051],꿈처럼,"[GN1500, GN0100]",[벤],707724,77713,1057


In [9]:
playlist_map_40000 = playlist_map[playlist_map['songid'].isin(like_count_40000['songid'])]
playlist_map_40000

Unnamed: 0,listid,songid,likecnt
21,10532,497066,1
28,10532,532114,1
29,10532,586541,1
43,10532,6546,1
44,10532,152422,1
...,...,...,...
5707062,65189,581789,19
5707064,65189,701557,19
5707065,65189,193899,19
5707067,65189,234875,19


In [10]:
playlist_map_40000_newid = pd.merge(playlist_map_40000, like_count_40000, on = 'songid', how = 'left')
del playlist_map_40000_newid['songid']
del playlist_map_40000_newid['likecnt']
playlist_map_40000_newid

Unnamed: 0,listid,likecount,songnewid
0,10532,207995,26
1,10532,152699,125
2,10532,72717,1233
3,10532,130333,251
4,10532,103515,537
...,...,...,...
1357565,65189,68308,1399
1357566,65189,130391,250
1357567,65189,43095,3482
1357568,65189,49214,2766


In [11]:
num_song = playlist_map_40000_newid.songnewid.nunique()
print(num_song)

3925


In [12]:
playlist_map_40000_newid_train = playlist_map_40000_newid[
    playlist_map_40000_newid['listid'].isin(train['id'])]
playlist_map_40000_newid_train

Unnamed: 0,listid,likecount,songnewid
0,10532,207995,26
1,10532,152699,125
2,10532,72717,1233
3,10532,130333,251
4,10532,103515,537
...,...,...,...
1257927,100389,87551,805
1257928,100389,61719,1769
1257929,100389,53922,2324
1257930,100389,103754,533


In [13]:
playlist_map_40000_newid_val = playlist_map_40000_newid[playlist_map_40000_newid['listid'].isin(val['id'])]
playlist_map_40000_newid_val

Unnamed: 0,listid,likecount,songnewid
1257932,118598,55344,2182
1257933,118598,124551,307
1257934,118598,42297,3579
1257935,118598,51954,2516
1257936,45144,55058,2207
...,...,...,...
1357565,65189,68308,1399
1357566,65189,130391,250
1357567,65189,43095,3482
1357568,65189,49214,2766


* train과 val의 플레이리스트 갯수

In [14]:
num_user_train = playlist_map_40000_newid_train.listid.nunique()
num_user_val = playlist_map_40000_newid_val.listid.nunique()
print(num_user_train)
print(num_user_val)

92195
13570


* 가중치를 로그 변환하여 가중치가 지나치게 과대평가 되지 않도록 조정해준다.

In [15]:
playlist_map_40000_newid_trainlog = playlist_map_40000_newid_train
playlist_map_40000_newid_trainlog['likecount'] = np.log(playlist_map_40000_newid_train['likecount'])
playlist_map_40000_newid_trainlog

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  


Unnamed: 0,listid,likecount,songnewid
0,10532,12.245269,26
1,10532,11.936224,125
2,10532,11.194330,1233
3,10532,11.777848,251
4,10532,11.547472,537
...,...,...,...
1257927,100389,11.379977,805
1257928,100389,11.030347,1769
1257929,100389,10.895294,2324
1257930,100389,11.549778,533


In [16]:
playlist_map_40000_newid_vallog = playlist_map_40000_newid_val
playlist_map_40000_newid_vallog['likecount'] = np.log(playlist_map_40000_newid_val['likecount'])
playlist_map_40000_newid_vallog

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  


Unnamed: 0,listid,likecount,songnewid
1257932,118598,10.921324,2182
1257933,118598,11.732471,307
1257934,118598,10.652471,3579
1257935,118598,10.858114,2516
1257936,45144,10.916142,2207
...,...,...,...
1357565,65189,11.131782,1399
1357566,65189,11.778293,250
1357567,65189,10.671162,3482
1357568,65189,10.803933,2766


* csv 파일로 저장

In [17]:
playlist_map_40000_newid_trainlog.to_csv('../data/song/playlist_map_40000_newid_trainlog.csv')
playlist_map_40000_newid_vallog.to_csv('../data/song/playlist_map_40000_newid_vallog.csv')

In [18]:
playlist_map_40000_newid_trainlog = pd.read_csv('../data/song/playlist_map_40000_newid_trainlog.csv')
playlist_map_40000_newid_vallog = pd.read_csv('../data/song/playlist_map_40000_newid_vallog.csv')

In [19]:
song_meta_40000.to_csv('../data/song/song_meta_40000.csv')

## 행렬 변환
* 전처리한 train, val 데이터를 모델에 맞게 행렬 형태로 변환해준다.

In [20]:
user_dict_train = {}
song_dict_train = {}
n_mapped_user_train = 0
n_mapped_song_train = 0
R_train = np.zeros((num_user_train, num_song))
for index, row in tqdm(playlist_map_40000_newid_trainlog.iterrows()):
  if row.listid in user_dict_train:
    user_cur = user_dict_train[row.listid]
  else:
    user_cur = n_mapped_user_train
    n_mapped_user_train += 1
    user_dict_train[row.listid] = user_cur
  if row.songnewid in song_dict_train:
    song_cur = song_dict_train[row.songnewid]
  else:
    song_cur = n_mapped_song_train
    n_mapped_song_train += 1
    song_dict_train[row.songnewid] = song_cur
  R_train[user_cur, song_cur] = row.likecount

HBox(children=(FloatProgress(value=1.0, bar_style='info', max=1.0), HTML(value='')))




In [21]:
user_dict_val = {}
song_dict_val = {}
n_mapped_user_val = 0
n_mapped_song_val = 0
R_val = np.zeros((num_user_val, num_song))
for index, row in tqdm(playlist_map_40000_newid_vallog.iterrows()):
  if row.listid in user_dict_val:
    user_cur = user_dict_val[row.listid]
  else:
    user_cur = n_mapped_user_val
    n_mapped_user_val += 1
    user_dict_val[row.listid] = user_cur
  if row.songnewid in song_dict_val:
    song_cur = song_dict_val[row.songnewid]
  else:
    song_cur = n_mapped_song_val
    n_mapped_song_val += 1
    song_dict_val[row.songnewid] = song_cur
  R_val[user_cur, song_cur] = row.likecount

HBox(children=(FloatProgress(value=1.0, bar_style='info', max=1.0), HTML(value='')))




In [22]:
R_train_tr, R_train_te = train_test_split(R_train, test_size = 0.1)

# 오토인코더 모델
* 추천시스템에 많이 사용되는 오토인코더 모델을 사용하여 train_data를 입력하면 출력값으로 train_data와 유사한 플레이리스트들을 뽑아 낼 수 있다.

In [23]:
input_ = kr.layers.Input(shape=R_train.shape[1:])
hidden1 = kr.layers.Dense(600, activation='relu', kernel_regularizer=kr.regularizers.l2(0.001))(input_)
hidden2 = kr.layers.Dense(128, activation='relu', kernel_regularizer=kr.regularizers.l2(0.001))(hidden1)
hidden3 = kr.layers.Dense(600, activation='relu', kernel_regularizer=kr.regularizers.l2(0.001))(hidden2)
output_ = kr.layers.Dense(R_train.shape[1], kernel_regularizer=kr.regularizers.l2(0.01))(hidden3)
model = kr.Model(inputs=[input_], outputs=[output_])

In [24]:
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics = ['accuracy'])

In [25]:
model.fit(R_train_tr, R_train_tr, validation_data=(R_train_te, R_train_te), epochs=35)

Epoch 1/35
Epoch 2/35
Epoch 3/35
Epoch 4/35
Epoch 5/35
Epoch 6/35
Epoch 7/35
Epoch 8/35
Epoch 9/35
Epoch 10/35
Epoch 11/35
Epoch 12/35
Epoch 13/35
Epoch 14/35
Epoch 15/35
Epoch 16/35
Epoch 17/35
Epoch 18/35
Epoch 19/35
Epoch 20/35
Epoch 21/35
Epoch 22/35
Epoch 23/35
Epoch 24/35
Epoch 25/35
Epoch 26/35
Epoch 27/35
Epoch 28/35
Epoch 29/35
Epoch 30/35
Epoch 31/35
Epoch 32/35
Epoch 33/35
Epoch 34/35
Epoch 35/35


<tensorflow.python.keras.callbacks.History at 0x1f8f04e8308>

In [26]:
model.save('../data/song/simple_40000b.h5')

In [27]:
model = load_model('../data/song/simple_40000b.h5')

## 결과 파일 생성
* 카카오 아레나에 올라온 Word2vec 태그 예측 결과와 오토인코더 모델의 곡 예측 결과를 합친다.

In [28]:
a = pd.DataFrame(data=model.predict(R_val))
a

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,3915,3916,3917,3918,3919,3920,3921,3922,3923,3924
0,-2329.055176,-2070.655518,1637.715698,2836.367676,-2308.298584,-1357.481934,1214.007080,1067.177368,1548.320679,1256.416748,...,-281.558594,-291.947662,-226.569427,-193.289062,-195.692200,-190.954361,-243.907806,-210.628845,-195.814102,-335.582855
1,-3288.089844,-2923.331787,2311.147705,4002.981445,-3258.803467,-1916.803467,1713.090942,1505.789551,2184.981689,1772.911133,...,-398.544128,-413.210114,-320.898376,-274.014130,-277.399963,-270.632355,-345.389618,-298.449707,-277.563354,-474.836517
2,-187.604904,-167.030136,125.591293,219.521301,-185.921860,-111.616714,92.193146,80.691353,118.724106,95.703072,...,-28.467682,-28.903318,-23.644405,-21.837975,-21.394806,-20.813526,-25.586050,-22.491943,-21.467196,-32.387695
3,-790.846802,-703.681885,545.682617,948.617920,-783.764587,-464.590240,403.033325,353.557251,515.790405,417.475159,...,-105.691010,-108.710480,-86.507935,-76.553230,-76.446281,-74.394470,-93.079559,-81.340561,-76.584862,-123.507866
4,-920.783508,-818.612610,647.410339,1121.153687,-912.593079,-536.802551,479.931549,421.950134,612.089172,496.700378,...,-111.709412,-115.777618,-89.890587,-76.888870,-77.697235,-75.784676,-96.893753,-83.593895,-77.754074,-132.999222
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
13565,-11196.586914,-9954.591797,7870.080078,13631.464844,-11096.796875,-6526.690918,5833.457520,5127.274902,7440.307129,6037.100586,...,-1356.057373,-1406.144897,-1091.938477,-931.758301,-943.815674,-920.844971,-1174.793091,-1015.507446,-944.339844,-1616.128174
13566,-10392.353516,-9239.426758,7305.397461,12653.396484,-10299.750000,-6057.590332,5415.009766,4759.559570,6906.680664,5604.114746,...,-1257.611694,-1304.144897,-1012.528870,-863.674255,-874.912537,-853.745117,-1089.390625,-941.487000,-875.407227,-1498.993042
13567,-3069.885986,-2729.335449,2157.497314,3736.732422,-3042.531494,-1789.836914,1599.178711,1405.650757,2039.697998,1655.003906,...,-372.910492,-386.640411,-300.392212,-256.757019,-259.839355,-253.404144,-323.334961,-279.473206,-260.006439,-444.166138
13568,-9712.972656,-8634.917969,6832.749512,11832.449219,-9626.228516,-5659.907227,5065.178223,4452.563477,6459.890137,5242.111816,...,-1170.478271,-1214.574585,-941.779480,-801.902222,-812.814575,-793.220764,-1013.363342,-875.191956,-813.333374,-1396.525269


In [29]:
b = song_meta_40000.sort_values('songnewid')
b.reset_index(inplace = True)
b

Unnamed: 0,index,song_gn_dtl_gnr_basket,issue_date,album_name,album_id,artist_id_basket,song_name,song_gn_gnr_basket,artist_name_basket,songid,likecount,songnewid
0,2019,"[GN0805, GN0509, GN0502, GN0801, GN0501]",20101007,가을방학,1035872,[437760],가끔 미치도록 네가 안고 싶어질 때가 있어,"[GN0500, GN0800]",[가을방학],366786,352819,0
1,746,"[GN0805, GN0501, GN0502, GN0801, GN0509]",20111010,Unplugged,2018706,[192827],"그대와 나, 설레임 (Feat. 소울맨)","[GN0500, GN0800]",[어쿠스틱 콜라보],133143,277721,1
2,3463,[GN0901],20140527,In The Lonely Hour,2258028,[718042],I`m Not The Only One,[GN0900],[Sam Smith],625875,277206,2
3,3365,"[GN0805, GN0501, GN0502, GN0801, GN0509]",20100914,원모어찬스,1023955,[472980],널 생각해,"[GN0500, GN0800]",[원 모어 찬스 (one more chance)],610933,277176,3
4,3224,"[GN0101, GN0103]",19990100,A Night In Seoul,5422,[6017],여전히 아름다운지,[GN0100],[김연우],580074,272594,4
...,...,...,...,...,...,...,...,...,...,...,...,...
3920,1051,"[GN0205, GN0201]",20121101,If You Love Me (Feat. 박재범),2165341,[433746],If You Love Me (Feat. 박재범),[GN0200],[NS 윤지],189218,40019,3920
3921,1753,"[GN0901, GN0902, GN1001]",20131115,The Collection,2216981,[33498],Stop This Train,"[GN0900, GN1000]",[John Mayer],315691,40019,3921
3922,772,"[GN0303, GN0301]",20051004,Swan Songs,308313,[108356],Fly (Feat. Amin. J of Soulciety),[GN0300],[에픽하이 (EPIK HIGH)],138295,40016,3922
3923,856,"[GN0104, GN0101]",20050506,오월지련,302491,[1741],같은 생각,[GN0100],[신혜성],153225,40015,3923


In [30]:
c = b['songid']
a.columns = c.tolist()
a

Unnamed: 0,366786,133143,625875,610933,580074,116573,654757,140867,173943,207558,...,567046,238116,265779,382240,276555,189218,315691,138295,153225,334675
0,-2329.055176,-2070.655518,1637.715698,2836.367676,-2308.298584,-1357.481934,1214.007080,1067.177368,1548.320679,1256.416748,...,-281.558594,-291.947662,-226.569427,-193.289062,-195.692200,-190.954361,-243.907806,-210.628845,-195.814102,-335.582855
1,-3288.089844,-2923.331787,2311.147705,4002.981445,-3258.803467,-1916.803467,1713.090942,1505.789551,2184.981689,1772.911133,...,-398.544128,-413.210114,-320.898376,-274.014130,-277.399963,-270.632355,-345.389618,-298.449707,-277.563354,-474.836517
2,-187.604904,-167.030136,125.591293,219.521301,-185.921860,-111.616714,92.193146,80.691353,118.724106,95.703072,...,-28.467682,-28.903318,-23.644405,-21.837975,-21.394806,-20.813526,-25.586050,-22.491943,-21.467196,-32.387695
3,-790.846802,-703.681885,545.682617,948.617920,-783.764587,-464.590240,403.033325,353.557251,515.790405,417.475159,...,-105.691010,-108.710480,-86.507935,-76.553230,-76.446281,-74.394470,-93.079559,-81.340561,-76.584862,-123.507866
4,-920.783508,-818.612610,647.410339,1121.153687,-912.593079,-536.802551,479.931549,421.950134,612.089172,496.700378,...,-111.709412,-115.777618,-89.890587,-76.888870,-77.697235,-75.784676,-96.893753,-83.593895,-77.754074,-132.999222
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
13565,-11196.586914,-9954.591797,7870.080078,13631.464844,-11096.796875,-6526.690918,5833.457520,5127.274902,7440.307129,6037.100586,...,-1356.057373,-1406.144897,-1091.938477,-931.758301,-943.815674,-920.844971,-1174.793091,-1015.507446,-944.339844,-1616.128174
13566,-10392.353516,-9239.426758,7305.397461,12653.396484,-10299.750000,-6057.590332,5415.009766,4759.559570,6906.680664,5604.114746,...,-1257.611694,-1304.144897,-1012.528870,-863.674255,-874.912537,-853.745117,-1089.390625,-941.487000,-875.407227,-1498.993042
13567,-3069.885986,-2729.335449,2157.497314,3736.732422,-3042.531494,-1789.836914,1599.178711,1405.650757,2039.697998,1655.003906,...,-372.910492,-386.640411,-300.392212,-256.757019,-259.839355,-253.404144,-323.334961,-279.473206,-260.006439,-444.166138
13568,-9712.972656,-8634.917969,6832.749512,11832.449219,-9626.228516,-5659.907227,5065.178223,4452.563477,6459.890137,5242.111816,...,-1170.478271,-1214.574585,-941.779480,-801.902222,-812.814575,-793.220764,-1013.363342,-875.191956,-813.333374,-1396.525269


In [31]:
d = playlist_map_40000_newid_vallog['listid'].unique()
d

array([118598,  45144, 127575, ..., 122127,  77438,  65189], dtype=int64)

In [32]:
e = []
for i in tqdm(range(len(a))):
    f = ((a.iloc[i, :].sort_values(ascending = False)).index)[:100]
    f = f.tolist()
    e.append(f)

HBox(children=(FloatProgress(value=0.0, max=13570.0), HTML(value='')))




In [33]:
g = pd.DataFrame({'id' : d, 'songs' : e})
g

Unnamed: 0,id,songs
0,118598,"[538200, 539525, 85278, 543751, 373037, 657888..."
1,45144,"[538200, 539525, 85278, 543751, 373037, 657888..."
2,127575,"[538200, 539525, 85278, 543751, 373037, 657888..."
3,53131,"[538200, 539525, 85278, 543751, 373037, 657888..."
4,44037,"[538200, 539525, 85278, 543751, 373037, 657888..."
...,...,...
13565,17766,"[538200, 539525, 85278, 543751, 373037, 657888..."
13566,101722,"[538200, 539525, 85278, 543751, 373037, 657888..."
13567,122127,"[538200, 539525, 85278, 543751, 373037, 657888..."
13568,77438,"[538200, 539525, 85278, 543751, 373037, 657888..."


In [34]:
result = pd.read_json("../data/song/results.json")
result

Unnamed: 0,id,songs,tags
0,118598,"[207912, 623047, 703323, 422438, 638488, 32221...","[OST, 디즈니, 애니메이션, 영화, 기분전환, 추억, 디즈니OST, 휴식, 힐링..."
1,131447,"[144663, 116573, 357367, 366786, 654757, 13314...","[기분전환, 감성, 휴식, 발라드, 잔잔한, 드라이브, 힐링, 사랑, 새벽, 밤]"
2,51464,"[291080, 193610, 270647, 29532, 500248, 572932...","[발라드, 슬픔, 이별, 추억, 회상, 사랑, 밤, 설렘, 새벽, 잔잔한]"
3,45144,"[144663, 367963, 357367, 351888, 576186, 47875...","[감성, 발라드, 사랑, 이별, 잔잔한, 기분전환, 새벽, 가을, 카페, 인디]"
4,79929,"[412769, 211220, 445299, 70314, 106129, 623728...","[CCM, 찬양, 은혜, 예배, 기도, 국내ccm, 교회, 사랑, 복음성가, 찬송]"
...,...,...,...
23010,101722,"[116573, 473514, 13142, 366786, 339802, 281936...","[새벽, 밤, 추억, 힐링, 휴식, 회상, 이별, 슬픔, 발라드, 가을]"
23011,122127,"[352228, 48209, 138932, 630552, 473514, 4173, ...","[추억, 회상, 사랑, 힐링, 엄마, 부모님, 휴식, 잔잔한, 발라드, 설렘]"
23012,77438,"[274504, 140867, 679436, 493762, 21125, 360062...","[팝, 팝송, Pop, 기분전환, 드라이브, 카페, 휴식, 감성, 잔잔한, 힐링]"
23013,36231,"[548041, 50031, 699175, 46497, 236711, 198144,...","[클래식, 힐링, 피아노, 휴식, 조성진, 잔잔한, 쇼팽, 키즈클래식, 아기클래식,..."


In [35]:
final = pd.merge(result, g, on = 'id', how = 'left')
final

Unnamed: 0,id,songs_x,tags,songs_y
0,118598,"[207912, 623047, 703323, 422438, 638488, 32221...","[OST, 디즈니, 애니메이션, 영화, 기분전환, 추억, 디즈니OST, 휴식, 힐링...","[538200, 539525, 85278, 543751, 373037, 657888..."
1,131447,"[144663, 116573, 357367, 366786, 654757, 13314...","[기분전환, 감성, 휴식, 발라드, 잔잔한, 드라이브, 힐링, 사랑, 새벽, 밤]",
2,51464,"[291080, 193610, 270647, 29532, 500248, 572932...","[발라드, 슬픔, 이별, 추억, 회상, 사랑, 밤, 설렘, 새벽, 잔잔한]",
3,45144,"[144663, 367963, 357367, 351888, 576186, 47875...","[감성, 발라드, 사랑, 이별, 잔잔한, 기분전환, 새벽, 가을, 카페, 인디]","[538200, 539525, 85278, 543751, 373037, 657888..."
4,79929,"[412769, 211220, 445299, 70314, 106129, 623728...","[CCM, 찬양, 은혜, 예배, 기도, 국내ccm, 교회, 사랑, 복음성가, 찬송]",
...,...,...,...,...
23010,101722,"[116573, 473514, 13142, 366786, 339802, 281936...","[새벽, 밤, 추억, 힐링, 휴식, 회상, 이별, 슬픔, 발라드, 가을]","[538200, 539525, 85278, 543751, 373037, 657888..."
23011,122127,"[352228, 48209, 138932, 630552, 473514, 4173, ...","[추억, 회상, 사랑, 힐링, 엄마, 부모님, 휴식, 잔잔한, 발라드, 설렘]","[538200, 539525, 85278, 543751, 373037, 657888..."
23012,77438,"[274504, 140867, 679436, 493762, 21125, 360062...","[팝, 팝송, Pop, 기분전환, 드라이브, 카페, 휴식, 감성, 잔잔한, 힐링]","[538200, 539525, 85278, 543751, 373037, 657888..."
23013,36231,"[548041, 50031, 699175, 46497, 236711, 198144,...","[클래식, 힐링, 피아노, 휴식, 조성진, 잔잔한, 쇼팽, 키즈클래식, 아기클래식,...",


In [36]:
final['songs'] = np.where(pd.notnull(final['songs_y']) == True, final['songs_y'], final['songs_x'])
final

Unnamed: 0,id,songs_x,tags,songs_y,songs
0,118598,"[207912, 623047, 703323, 422438, 638488, 32221...","[OST, 디즈니, 애니메이션, 영화, 기분전환, 추억, 디즈니OST, 휴식, 힐링...","[538200, 539525, 85278, 543751, 373037, 657888...","[538200, 539525, 85278, 543751, 373037, 657888..."
1,131447,"[144663, 116573, 357367, 366786, 654757, 13314...","[기분전환, 감성, 휴식, 발라드, 잔잔한, 드라이브, 힐링, 사랑, 새벽, 밤]",,"[144663, 116573, 357367, 366786, 654757, 13314..."
2,51464,"[291080, 193610, 270647, 29532, 500248, 572932...","[발라드, 슬픔, 이별, 추억, 회상, 사랑, 밤, 설렘, 새벽, 잔잔한]",,"[291080, 193610, 270647, 29532, 500248, 572932..."
3,45144,"[144663, 367963, 357367, 351888, 576186, 47875...","[감성, 발라드, 사랑, 이별, 잔잔한, 기분전환, 새벽, 가을, 카페, 인디]","[538200, 539525, 85278, 543751, 373037, 657888...","[538200, 539525, 85278, 543751, 373037, 657888..."
4,79929,"[412769, 211220, 445299, 70314, 106129, 623728...","[CCM, 찬양, 은혜, 예배, 기도, 국내ccm, 교회, 사랑, 복음성가, 찬송]",,"[412769, 211220, 445299, 70314, 106129, 623728..."
...,...,...,...,...,...
23010,101722,"[116573, 473514, 13142, 366786, 339802, 281936...","[새벽, 밤, 추억, 힐링, 휴식, 회상, 이별, 슬픔, 발라드, 가을]","[538200, 539525, 85278, 543751, 373037, 657888...","[538200, 539525, 85278, 543751, 373037, 657888..."
23011,122127,"[352228, 48209, 138932, 630552, 473514, 4173, ...","[추억, 회상, 사랑, 힐링, 엄마, 부모님, 휴식, 잔잔한, 발라드, 설렘]","[538200, 539525, 85278, 543751, 373037, 657888...","[538200, 539525, 85278, 543751, 373037, 657888..."
23012,77438,"[274504, 140867, 679436, 493762, 21125, 360062...","[팝, 팝송, Pop, 기분전환, 드라이브, 카페, 휴식, 감성, 잔잔한, 힐링]","[538200, 539525, 85278, 543751, 373037, 657888...","[538200, 539525, 85278, 543751, 373037, 657888..."
23013,36231,"[548041, 50031, 699175, 46497, 236711, 198144,...","[클래식, 힐링, 피아노, 휴식, 조성진, 잔잔한, 쇼팽, 키즈클래식, 아기클래식,...",,"[548041, 50031, 699175, 46497, 236711, 198144,..."


In [37]:
real_final = pd.DataFrame(final[['id','songs','tags']])
real_final

Unnamed: 0,id,songs,tags
0,118598,"[538200, 539525, 85278, 543751, 373037, 657888...","[OST, 디즈니, 애니메이션, 영화, 기분전환, 추억, 디즈니OST, 휴식, 힐링..."
1,131447,"[144663, 116573, 357367, 366786, 654757, 13314...","[기분전환, 감성, 휴식, 발라드, 잔잔한, 드라이브, 힐링, 사랑, 새벽, 밤]"
2,51464,"[291080, 193610, 270647, 29532, 500248, 572932...","[발라드, 슬픔, 이별, 추억, 회상, 사랑, 밤, 설렘, 새벽, 잔잔한]"
3,45144,"[538200, 539525, 85278, 543751, 373037, 657888...","[감성, 발라드, 사랑, 이별, 잔잔한, 기분전환, 새벽, 가을, 카페, 인디]"
4,79929,"[412769, 211220, 445299, 70314, 106129, 623728...","[CCM, 찬양, 은혜, 예배, 기도, 국내ccm, 교회, 사랑, 복음성가, 찬송]"
...,...,...,...
23010,101722,"[538200, 539525, 85278, 543751, 373037, 657888...","[새벽, 밤, 추억, 힐링, 휴식, 회상, 이별, 슬픔, 발라드, 가을]"
23011,122127,"[538200, 539525, 85278, 543751, 373037, 657888...","[추억, 회상, 사랑, 힐링, 엄마, 부모님, 휴식, 잔잔한, 발라드, 설렘]"
23012,77438,"[538200, 539525, 85278, 543751, 373037, 657888...","[팝, 팝송, Pop, 기분전환, 드라이브, 카페, 휴식, 감성, 잔잔한, 힐링]"
23013,36231,"[548041, 50031, 699175, 46497, 236711, 198144,...","[클래식, 힐링, 피아노, 휴식, 조성진, 잔잔한, 쇼팽, 키즈클래식, 아기클래식,..."


In [38]:
real_final.to_json("../data/song/real_final.json")

In [39]:
df = pd.read_json("../data/song/real_final.json", encoding = 'utf-8')
df_records = df.to_dict('records')

In [40]:
with open("../data/song/real_final.json", 'w', encoding = 'utf-8') as make_file:
    json.dump(df_records, make_file, indent = '\t')