## 数据解析

### 原始数据=>歌单数据 (此系统暂时只使用部分数据)
抽取 _**歌单名称，歌单id，收藏数，所属分类**_ 4个歌单维度的信息 <br>
抽取 _**歌曲id，歌曲名，歌手，歌曲热度**_ 等4个维度信息歌曲的信息<br>

组织成如下格式：
```
漫步西欧小镇上##小语种,旅行##69413685##474	18682332::Wäg vo dir::Joy Amelie::70.0	4335372::Only When I Sleep::The Corrs::60.0	2925502::Si Seulement::Lynnsha::100.0	21014930::Tu N'As Pas Cherché...::La Grande Sophie::100.0	20932638::Du behöver aldrig mer vara rädd::Lasse Lindh::25.0	17100518::Silent Machine::Cat Power::60.0	3308096::Kor pai kon diew : ชอไปคนเดียว::Palmy::5.0	1648250::les choristes::Petits Chanteurs De Saint Marc::100.0	4376212::Paddy's Green Shamrock Shore::The High Kings::25.0	2925400::A Todo Color::Las Escarlatinas::95.0	19711402::Comme Toi::Vox Angeli::75.0	3977526::Stay::Blue Cafe::100.0	2538518::Shake::Elize::85.0	2866799::Mon Ange::Jena Lee::85.0	5191949::Je M'appelle Helene::Hélène Rolles::85.0	20036323::Ich Lieb' Dich Immer Noch So Sehr::Kate & Ben::100.0

In [1]:
#coding: utf-8
import json
import sys

def parse_song_line(in_line):
    data = json.loads(in_line)
    name = data['result']['name']
    tags = ",".join(data['result']['tags'])
    subscribed_count = data['result']['subscribedCount']
    if(subscribed_count<100):
        return False
    playlist_id = data['result']['id']
    song_info = ''
    songs = data['result']['tracks']
    for song in songs:
        try:
            song_info += "\t"+"::".join([str(song['id']),song['name'],song['artists'][0]['name'],str(song['popularity'])])
        except Exception as e:
            #print (e)
            #print (song)
            continue
    return name+"##"+tags+"##"+str(playlist_id)+"##"+str(subscribed_count)+song_info

def parse_file(in_file, out_file):
    out = open(out_file, 'w', encoding='utf-8')
    for line in open(in_file, encoding='utf-8'):
        result = parse_song_line(line)
        if(result):
            out.write(result.strip()+"\n")
    out.close()
    

parse_file("./ori_data/playlist_detail_all.json", "./pro_data/163_music_playlist.txt")

### 歌单数据=>推荐系统格式数据
主流的python推荐系统框架，支持的最基本数据格式为movielens dataset，其评分数据格式为 user item rating timestamp

In [2]:
import surprise 
import lightfm



### 1.针对用户推荐 网易云音乐(每日30首歌/7首歌)
### 2.针对歌曲 在你听某首歌的时候，找“相似歌曲”

In [3]:
# 解析成userid itemid rating timestamp行格式
def is_null(s): 
    return len(s.split(","))>2

def parse_song_info(song_info):
    try:
        song_id, name, artist, popularity = song_info.split("::")
        #return ",".join([song_id, name, artist, popularity])
        return ",".join([song_id,"1.0",'1300000'])
    except Exception as e:
        #print (e)
        #print (song_info)
        return ""

def parse_playlist_line(in_line):
    try:
        contents = in_line.strip().split("\t")
        name, tags, playlist_id, subscribed_count = contents[0].split("##")
        songs_info = map(lambda x:playlist_id+","+parse_song_info(x), contents[1:])
        songs_info = filter(is_null, songs_info)
        return "\n".join(songs_info)
    except Exception as e:
        print (e)
        return False


def parse_file(in_file, out_file):
    out = open(out_file, 'w', encoding='utf-8')
    for line in open(in_file, encoding='utf-8'):
        result = parse_playlist_line(line)
        if(result):
            out.write(result.strip()+"\n")
    out.close()

In [4]:
parse_file("./pro_data/163_music_playlist.txt", "./pro_data/163_music_suprise_format.txt")

In [5]:
parse_file("./ori_data/popular.playlist", "./pro_data/popular_music_suprise_format.txt")

### 保存歌单和歌曲信息备用
需要保存 **歌单id=>歌单名** 和 **歌曲id=>歌曲名** 的信息后期备用。

In [6]:
import pickle

def parse_playlist_get_info(in_line, playlist_dic, song_dic):
    contents = in_line.strip().split("\t")
    name, tags, playlist_id, subscribed_count = contents[0].split("##")
    playlist_dic[playlist_id] = name
    for song in contents[1:]:
        try:
            song_id, song_name, artist, popularity = song.split("::")
            song_dic[song_id] = song_name+"\t"+artist
        except:
            print ("song format error")
            print (song+"\n")



def parse_file1(in_file, out_playlist, out_song):
    #从歌单id到歌单名称的映射字典
    playlist_dic = {}
    #从歌曲id到歌曲名称的映射字典
    song_dic = {}
    for line in open(in_file, encoding='utf-8'):
        parse_playlist_get_info(line, playlist_dic, song_dic)
    #把映射字典保存在二进制文件中
    pickle.dump(playlist_dic, open(out_playlist,"wb")) 
    #可以通过 playlist_dic = pickle.load(open("playlist.pkl","rb"))重新载入
    pickle.dump(song_dic, open(out_song,"wb"))
    
    
def parse_file2(in_file, out_playlist, out_song):
    #从歌单id到歌单名称的映射字典
    playlist_dic = {}
    #从歌曲id到歌曲名称的映射字典
    song_dic = {}
    for line in open(in_file, encoding='utf-8'):
        parse_playlist_get_info(line, playlist_dic, song_dic)
    #把映射字典保存在二进制文件中
    pickle.dump(playlist_dic, open(out_playlist,"wb")) 
    #可以通过 playlist_dic = pickle.load(open("playlist.pkl","rb"))重新载入
    pickle.dump(song_dic, open(out_song,"wb"))

In [7]:
parse_file1("./pro_data/163_music_playlist.txt", "./pro_data/playlist.pkl", "./pro_data/song.pkl")

song format error
19169096::

song format error
 Time to Say Goodbye (Con te partirò)::Sarah Brightman::100.0

song format error
28935319::

song format error
สั่น (Album Version)::Boy Sompob::100.0

song format error
26902203::What’s your name? (collaboration with 壇蜜)

song format error
::SoulJa::100.0

song format error
532324::モノノ怪::オボロ怪~oboroge~::高梨康治::75.0

song format error
4948095::银魂::决闘前には用を足せ::V.A.::75.0

song format error
4949090::4代目青学&氷帝A::リフレッシュ 新たな自分へ|氷点下の情热II~THE TOPIII::日本ACG::15.0

song format error
4949093::5代目青学&氷帝B::リフレッシュ 新たな自分へ|氷点下の情热II~THE TOPIII ~Forever with you Version.::日本ACG::5.0

song format error
4949092::5代目青学&氷帝B::リフレッシュ 新たな自分へ|氷点下の情热II~THE TOPIII::日本ACG::5.0

song format error
4967359::テニスの王子様::ここからが・・・俺達::V.A.::5.0

song format error
4967372::テニスの王子様::南の岛から来た刺客::V.A.::15.0

song format error
4967363::テニスの王子様::チャチャッと溃すぜ~Bloodshot::V.A.::15.0

song format error
4967373::テニスの王子様::ここからが・・・俺達II

song format error
::V.A.::5.0

song format error
4967364::テニス


song format error
22773056::ODIN SPHERE’s Theme-Shanachie ver.-【エンディング】

song format error
::崎元仁::15.0

song format error
405599088::Make Them Wheels Roll

song format error
::SAFIA::100.0

song format error
33497915::

song format error
Frédéric Chopin: Ballade for piano No. 1 in G minor, Op. 23, CT. 2::Jorge Bolet::5.0

song format error
26902203::What’s your name? (collaboration with 壇蜜)

song format error
::SoulJa::100.0

song format error
22499491::Job - A Masque for Dancing, Scene VII:: Pavane of the Sons of the Morning::Various Artists::5.0

song format error
4964207::School Days-スクールデイズ-::茜色の空::大久保薫::55.0

song format error
592112::エル・カザド::ennui::梶浦由記::15.0

song format error
29722334::For You

song format error
::FiFi Rong::90.0

song format error
27511866::첫 사랑니 (Rum Pum Pum Pum)

song format error
::f(x)::95.0

song format error
405599088::Make Them Wheels Roll

song format error
::SAFIA::100.0

song format error
1201294::Farewell She Whispered::cecilia::eyes::5.0

song for

27511866::첫 사랑니 (Rum Pum Pum Pum)

song format error
::f(x)::95.0

song format error
19169096::

song format error
 Time to Say Goodbye (Con te partirò)::Sarah Brightman::100.0

song format error
4954511::リズム天国全曲集::リミックス3 (恋のハニースイ~トエンジェル)::V.A.::25.0

song format error
37610720::びっくり熱血新記録!はるかなる金メダル::おしまい::V.A.::65.0

song format error
4954513::リズム天国全曲集::ナイトウォーク::V.A.::25.0

song format error
4954645::リズム天国全曲集::ケロケロダンス Perfect Version(リズム天国ゴールド)[BONUS TRACK]::V.A.::5.0

song format error
37610736::熱血格闘伝説::第5回戦::V.A.::50.0

song format error
22455155::Don Giovanni (2002 Digital Remaster), Act I, Scene Three, Recitative & Aria:: Aria: Or sai chi l'onore (Donna Anna)::Various Artists::5.0

song format error
545787::パパとムスメの7日间::7日间の思い出::山下康介::20.0

song format error
4880130::正义の味方::槇子のテーマ.::小西康陽::20.0

song format error
4880131::正义の味方::容子のツイスト.::小西康陽::70.0

song format error
4880136::正义の味方::ピチカート・ポルカ::小西康陽::25.0

song format error
4880139::正义の味方::槇子のカノン.::小西康陽::5.0

song format error
415583

song format error
::f(x)::95.0

song format error
33054290::

song format error
Heartbeats::Dabin::95.0

song format error
785061::幻奏ノ滝

song format error
时音(フォールオブフォール~秋めく滝)::沙紗飛鳥::60.0

song format error
1870986::彩云国物语 セカンドシリーズ::遥かなる想い─胡弓version::梁邦彦::20.0

song format error
27511866::첫 사랑니 (Rum Pum Pum Pum)

song format error
::f(x)::95.0

song format error
4965892::桃华月惮::石剣-セキケン-::多田彰文::80.0

song format error
532324::モノノ怪::オボロ怪~oboroge~::高梨康治::75.0

song format error
4900979::千と千寻の神隠し::あの夏へ::Sal::30.0

song format error
494715::おもひでぽろぽろ::メイン・テーマ::面谷誠二::25.0

song format error
4900970::风の谷のナウシカ::风の伝説::Takero Ogata::20.0

song format error
4900972::魔女の宅急便::海の见える街::Lava::20.0

song format error
4900979::千と千寻の神隠し::あの夏へ::Sal::30.0

song format error
4900973::ハウルの动く城::人生のメリーゴーランド::Morphil::20.0

song format error
4900977::もののけ姫::もののけ姫::D.Locke::15.0

song format error
4900980::ゲド戦记::テルーの呗::SMOOTH J::20.0

song format error
4900969::红の豚::マルコとジーナ::Lo Three S::20.0

song format error
49009


song format error
37610703::いけいけ熱血ホッケー部 すべってころんで大乱闘::ゲームオーバー::V.A.::5.0

song format error
37610778::熱血!ストリートバスケット::スタッフロール::V.A.::20.0

song format error
37610777::熱血!ストリートバスケット::エンディング::V.A.::20.0

song format error
37610776::熱血!ストリートバスケット::ナイアガラステージ::V.A.::20.0

song format error
37610775::熱血!ストリートバスケット::ラスベガスステージ::V.A.::20.0

song format error
37610773::熱血!ストリートバスケット::サンフランシスコステージ::V.A.::20.0

song format error
37610774::熱血!ストリートバスケット::対ラスベガス・チームのテーマ::V.A.::15.0

song format error
37610772::熱血!ストリートバスケット::対サンフランシスコ・チームのテーマ::V.A.::20.0

song format error
37610771::熱血!ストリートバスケット::ハワイステージ::V.A.::25.0

song format error
37610770::熱血!ストリートバスケット::対ハワイ・チームのテーマ::V.A.::15.0

song format error
37610769::熱血!ストリートバスケット::テキサスステージ::V.A.::25.0

song format error
37610768::熱血!ストリートバスケット::対テキサス・チームのテーマ::V.A.::20.0

song format error
37610766::熱血!ストリートバスケット::対フロリダ・チームのテーマ::V.A.::20.0

song format error
37610765::熱血!ストリートバスケット::US NAVYステージ::V.A.::25.0

song format error
37610764::熱血!ストリートバスケット::対 US

31246045::黒執事 華麗なるドラマCD::グレルの執事ヴォイス お召し物が!::小野大輔::50.0

song format error
31246032::黒執事 華麗なるドラマCD::セバスチャンの執事ヴォイス ティータイム::小野大輔::55.0

song format error
31246027::黒執事 華麗なるドラマCD::セバスチャンの執事ヴォイス 朝のご挨拶::小野大輔::60.0

song format error
31246042::黒執事 華麗なるドラマCD::グレルの執事ヴォイス 自己紹介::小野大輔::55.0

song format error
31246046::黒執事 華麗なるドラマCD::グレルの執事ヴォイス お嬢様お散歩ですか::小野大輔::55.0

song format error
617067::『圣闘士星矢』より::ペガサス幻想(ファミソン オリジナル カラオケ(ショートサイズ))::桃井はるこ::75.0

song format error
456177::true tears::一阵の风::菊地創::95.0

song format error
26902203::What’s your name? (collaboration with 壇蜜)

song format error
::SoulJa::100.0

song format error
453815::バンビ~ノ!::チャオ!!::菅野祐悟::5.0

song format error
33497925::

song format error
Frédéric Chopin: Etude for piano No. 23 in A minor, Op. 25/11, CT. 36::Jorge Bolet::70.0

song format error
499651::象の背中::爱に愈されて (Title)::千住明::60.0

song format error
499656::象の背中::覚めぬ悪梦 (ver.I)::千住明::45.0

song format error
499665::象の背中::无言の友情::千住明::55.0

song format error
28482246::トップをねらえ!予告編

song format error
31246044::黒執事 華麗なるドラマCD::グレルの執事ヴォイス 紅茶のおかわりですね::小野大輔::70.0

song format error
31246051::黒執事 華麗なるドラマCD::グレルの執事ヴォイス このドレス、私のものDEATHッッ!::小野大輔::70.0

song format error
532324::モノノ怪::オボロ怪~oboroge~::高梨康治::75.0

song format error
26902203::What’s your name? (collaboration with 壇蜜)

song format error
::SoulJa::100.0

song format error
17742280::

song format error
Johann Sebastian Bach: Prelude and Fugue in C sharp (WTK, Book, No.3), BWV 848 - Prelude::Friedrich Gulda::25.0

song format error
1375664::Suite No. 3 In G Minor, BWV 808 IV. Sarabande - Les Agréments De La Même Sarabande

song format error
::Glenn Gould::20.0

song format error
31477705::

song format error
Johann Sebastian Bach: Goldberg Variations, BWV 988 - Variation 9 Canone alla Terza::Elena Barshai::20.0

song format error
31477724::

song format error
Johann Sebastian Bach: Goldberg Variations, BWV 988 - Variation 27 Canone alla Nona::Elena Barshai::15.0

song format error
33190071::Theme of Blood Blockade 

4965888::桃华月惮::龙皇-リュウオウ-::多田彰文::25.0

song format error
456177::true tears::一阵の风::菊地創::95.0

song format error
22642373::

song format error
 FAIRY TAIL メインテーマ -Slow ver.-::高梨康治::95.0

song format error
456177::true tears::一阵の风::菊地創::95.0

song format error
573503::WHITE ALBUM::WHITE ALBUM::平野綾::90.0

song format error
31563610::

song format error
苍之礼赞::花之祭P::60.0

song format error
4954593::リズム天国全曲集::恋の実験室::V.A.::55.0

song format error
4954596::リズム天国全曲集::シンクロ::V.A.::60.0

song format error
31654811::

song format error
American Cowboys::Tim Wynn::65.0

song format error
29005604::

song format error
Sehnsucht nach dem Frühling ("Komm, lieber Mai"), song for voice & pian::Sky::85.0

song format error
19169096::

song format error
 Time to Say Goodbye (Con te partirò)::Sarah Brightman::100.0

song format error
2924300::

song format error
 Les Peulles Mortes (Autum Leaves)::Laura Fygi::85.0

song format error
1201232::For The Fallen::cecilia::eyes::15.0

song format error
1201288::Our


song format error
654327::4. ヴァイオリン协奏曲 ニ长调 作品35 第1楽章:Allegro moderato

song format error
试聴する::诹访内晶子::10.0

song format error
654333::5. ヴァイオリン协奏曲 ニ长调 作品35 第2楽章:Canzonetta:Andante

song format error
试聴する::诹访内晶子::10.0

song format error
654338::6. ヴァイオリン协奏曲 ニ长调 作品35 第3楽章:Finale:Allegro vivacissimo

song format error
试聴する::诹访内晶子::10.0

song format error
4880140::正义の味方::槇子のテーマ.-alternate mix-::小西康陽::25.0

song format error
31090266::特救指令ソルブレイン::愛に抱かれて::宮内タカユキ::15.0

song format error
28935319::

song format error
สั่น (Album Version)::Boy Sompob::100.0

song format error
31563610::

song format error
苍之礼赞::花之祭P::60.0

song format error
415227::ラスト・フレンズ::Prisoner Of Love (Inst.Ver.) - instrumental::井筒昭雄::75.0

song format error
4948093::银魂::晴れの日に雨伞さす奴には御用心::Audio Highs::60.0

song format error
37610758::熱血!ストリートバスケット::テクノスジャパンロゴ::V.A.::45.0

song format error
37610759::熱血!ストリートバスケット::オープニング::V.A.::40.0

song format error
37610761::熱血!ストリートバスケット::チーム選択::V.A.::45.0

song format error
376107

song format error
31654791::

song format error
European Intrigue::Tim Wynn::45.0

song format error
4948080::银魂::私と仕事どっちが大事なのとかいう女にはジャーマンスープレックス::Audio Highs::75.0

song format error
570550::『Phantom ~Requiem for the Phantom~』オリジナルサウンドトラックVol.1::Requiem for the Phantom part I::伊藤真澄::30.0

song format error
28590385::電脳コイル::戸惑い::斉藤恒芳::25.0

song format error
426291251::Brand New Moves (Nomekop Remix) 

song format error
::Hey Violet::25.0

song format error
29722334::For You

song format error
::FiFi Rong::90.0

song format error
456177::true tears::一阵の风::菊地創::95.0

song format error
4964190::School Days-スクールデイズ-::见ているだけで…::大久保薫::85.0

song format error
22642373::

song format error
 FAIRY TAIL メインテーマ -Slow ver.-::高梨康治::95.0

song format error
29715639::The Vow

song format error
::Steven Cravis::25.0

song format error
19169096::

song format error
 Time to Say Goodbye (Con te partirò)::Sarah Brightman::100.0

song format error
2772433::Sonata for piano & violin no.5:: III. Allegro mo

In [8]:
parse_file2("./ori_data/popular.playlist", "./pro_data/popular_playlist.pkl", "./pro_data/popular_song.pkl")

song format error
1870957::彩云国物语 セカンドシリーズ::君を想う::梁邦彦::80.0

song format error
4965888::桃华月惮::龙皇-リュウオウ-::多田彰文::25.0

song format error
456177::true tears::一阵の风::菊地創::95.0

song format error
22642373::

song format error
 FAIRY TAIL メインテーマ -Slow ver.-::高梨康治::95.0

song format error
31563610::

song format error
苍之礼赞::花之祭P::60.0

song format error
4954593::リズム天国全曲集::恋の実験室::V.A.::55.0

song format error
4954596::リズム天国全曲集::シンクロ::V.A.::60.0

song format error
31654811::

song format error
American Cowboys::Tim Wynn::65.0

song format error
19169096::

song format error
 Time to Say Goodbye (Con te partirò)::Sarah Brightman::100.0

song format error
31563610::

song format error
苍之礼赞::花之祭P::60.0

song format error
31563610::

song format error
苍之礼赞::花之祭P::60.0

song format error
31563610::

song format error
苍之礼赞::花之祭P::60.0

song format error
19169096::

song format error
 Time to Say Goodbye (Con te partirò)::Sarah Brightman::100.0

song format error
376653::野弧禅狂叱(宿香之战)

song format error
:

## 使用推荐系统库Surprise构建模型

In [9]:
import os
import io

from surprise import KNNBaseline, Reader
from surprise import Dataset

# 重建歌单id到歌单名的映射字典
id_name_dic = pickle.load(open("./pro_data/popular_playlist.pkl","rb"))
print("加载歌单id到歌单名的映射字典完成...")
# 重建歌单名到歌单id的映射字典
name_id_dic = {}
for playlist_id in id_name_dic:
    name_id_dic[id_name_dic[playlist_id]] = playlist_id
print("加载歌单名到歌单id的映射字典完成...")


file_path = os.path.expanduser('./pro_data/popular_music_suprise_format.txt')
# 指定文件格式
reader = Reader(line_format='user item rating timestamp', sep=',')
# 从文件读取数据
music_data = Dataset.load_from_file(file_path, reader=reader)
# 计算歌曲和歌曲之间的相似度
print("构建数据集...")
trainset = music_data.build_full_trainset()
#sim_options = {'name': 'pearson_baseline', 'user_based': False}

加载歌单id到歌单名的映射字典完成...
加载歌单名到歌单id的映射字典完成...
构建数据集...


In [10]:
list(id_name_dic.keys())[2]

'308252386'

In [11]:
print(id_name_dic[list(id_name_dic.keys())[2]])

《从你的全世界路过》电影原声带


In [12]:
trainset.n_items

130573

In [13]:
trainset.n_users

3771

### 查找最近的user(在这里是歌单)

In [14]:
print("开始训练模型...")
#sim_options = {'user_based': False}
#algo = KNNBaseline(sim_options=sim_options)
algo = KNNBaseline()
algo.train(trainset)

current_playlist = list(name_id_dic.keys())[39]
print("歌单名称", current_playlist)

# 取出近邻
# 映射名字到id
playlist_id = name_id_dic[current_playlist]
print("歌单id", playlist_id)
# 取出来对应的内部user id => to_inner_uid
playlist_inner_id = algo.trainset.to_inner_uid(playlist_id)
print("内部id", playlist_inner_id)

playlist_neighbors = algo.get_neighbors(playlist_inner_id, k=10)

# 把歌曲id转成歌曲名字
# to_raw_uid映射回去
playlist_neighbors = (algo.trainset.to_raw_uid(inner_id)
                       for inner_id in playlist_neighbors)
playlist_neighbors = (id_name_dic[playlist_id]
                       for playlist_id in playlist_neighbors)

print()
print("和歌单 《", current_playlist, "》 最接近的10个歌单为：\n")
for playlist in playlist_neighbors:
    print(playlist, algo.trainset.to_inner_uid(name_id_dic[playlist]))

开始训练模型...
Estimating biases using als...




Computing the msd similarity matrix...
Done computing similarity matrix.
歌单名称 卖血哥翻唱专辑
歌单id 132021932
内部id 1255

和歌单 《 卖血哥翻唱专辑 》 最接近的10个歌单为：

评论数过500的鬼畜、调教及翻唱精品 586
【普通disco】各版本合集 622
《原来翻唱也可以这么好听~》同步更新中... 883
自用·～安静的 在闲暇时听这些歌 887
华语翻唱│同一旋律男女别韵味 ❀ 1041
♫ 越听越好听的中文歌 1481
【华语】冷门好听の优质男声⭐ 1510
热门华语（AB站向？ ） 2216
写歌的人假正经，听歌的人最无情。 2783
單 身 情 歌 3168


### 针对用户进行预测

In [15]:
import pickle
# 重建歌曲id到歌曲名的映射字典
song_id_name_dic = pickle.load(open("./pro_data/popular_song.pkl","rb"))
print("加载歌曲id到歌曲名的映射字典完成...")
# 重建歌曲名到歌曲id的映射字典
song_name_id_dic = {}
for song_id in song_id_name_dic:
    song_name_id_dic[song_id_name_dic[song_id]] = song_id
print("加载歌曲名到歌曲id的映射字典完成...")

加载歌曲id到歌曲名的映射字典完成...
加载歌曲名到歌曲id的映射字典完成...


In [16]:
#内部编码的4号用户
user_inner_id = 4
user_rating = trainset.ur[user_inner_id]
items = map(lambda x:x[0], user_rating)
for song in items:
    print(algo.predict(user_inner_id, song, r_ui=1), song_id_name_dic[algo.trainset.to_raw_iid(song)])

user: 4          item: 361        r_ui = 1.00   est = 1.00   {'was_impossible': False} 家	许巍
user: 4          item: 362        r_ui = 1.00   est = 1.00   {'was_impossible': False} 老街	李荣浩
user: 4          item: 363        r_ui = 1.00   est = 1.00   {'was_impossible': False} 滴答	侃侃
user: 4          item: 364        r_ui = 1.00   est = 1.00   {'was_impossible': False} 彩虹	周杰伦
user: 4          item: 365        r_ui = 1.00   est = 1.00   {'was_impossible': False} 米店	张玮玮和郭龙
user: 4          item: 366        r_ui = 1.00   est = 1.00   {'was_impossible': False} 情人	Beyond
user: 4          item: 367        r_ui = 1.00   est = 1.00   {'was_impossible': False} 喜欢你	Beyond
user: 4          item: 220        r_ui = 1.00   est = 1.00   {'was_impossible': False} 灰姑娘	郑钧
user: 4          item: 235        r_ui = 1.00   est = 1.00   {'was_impossible': False} 安和桥	宋冬野
user: 4          item: 240        r_ui = 1.00   est = 1.00   {'was_impossible': False} 去大理	郝云
user: 4          item: 368        r_ui = 1.00   est 

### 用矩阵分解进行预测

In [17]:
### 使用NMF
from surprise import NMF, evaluate
from surprise import Dataset

file_path = os.path.expanduser('./pro_data/popular_music_suprise_format.txt')
# 指定文件格式
reader = Reader(line_format='user item rating timestamp', sep=',')
# 从文件读取数据
music_data = Dataset.load_from_file(file_path, reader=reader)
# 构建数据集和建模
algo = NMF()
trainset = music_data.build_full_trainset()
algo.train(trainset)



<surprise.prediction_algorithms.matrix_factorization.NMF at 0x111ebff8278>

In [18]:
user_inner_id = 4
user_rating = trainset.ur[user_inner_id]
items = map(lambda x:x[0], user_rating)
for song in items:
    print(algo.predict(algo.trainset.to_raw_uid(user_inner_id), algo.trainset.to_raw_iid(song), r_ui=1), song_id_name_dic[algo.trainset.to_raw_iid(song)])

user: 69758545   item: 167751     r_ui = 1.00   est = 1.00   {'was_impossible': False} 家	许巍
user: 69758545   item: 133998     r_ui = 1.00   est = 1.00   {'was_impossible': False} 老街	李荣浩
user: 69758545   item: 25638325   r_ui = 1.00   est = 1.00   {'was_impossible': False} 滴答	侃侃
user: 69758545   item: 185809     r_ui = 1.00   est = 1.00   {'was_impossible': False} 彩虹	周杰伦
user: 69758545   item: 26494698   r_ui = 1.00   est = 1.00   {'was_impossible': False} 米店	张玮玮和郭龙
user: 69758545   item: 347355     r_ui = 1.00   est = 1.00   {'was_impossible': False} 情人	Beyond
user: 69758545   item: 346073     r_ui = 1.00   est = 1.00   {'was_impossible': False} 喜欢你	Beyond
user: 69758545   item: 186842     r_ui = 1.00   est = 1.00   {'was_impossible': False} 灰姑娘	郑钧
user: 69758545   item: 27646205   r_ui = 1.00   est = 1.00   {'was_impossible': False} 安和桥	宋冬野
user: 69758545   item: 28977819   r_ui = 1.00   est = 1.00   {'was_impossible': False} 去大理	郝云
user: 69758545   item: 65538      r_ui = 1.00   est 

## 模型存储

In [19]:
import surprise
surprise.dump.dump('./model/recommendation.model', algo=algo)
# 可以用下面的方式载入
algo = surprise.dump.load('./model/recommendation.model')

## 不同的推荐系统算法评估

In [20]:
# 载入数据
import os
from surprise import Reader, Dataset
# 指定文件路径
file_path = os.path.expanduser('./pro_data/popular_music_suprise_format.txt')
# 指定文件格式
reader = Reader(line_format='user item rating timestamp', sep=',')
# 从文件读取数据
music_data = Dataset.load_from_file(file_path, reader=reader)
# 分成5折
music_data.split(n_folds=5)

In [21]:
music_data

<surprise.dataset.DatasetAutoFolds at 0x111ebff8080>

In [22]:
music_data.raw_ratings[:20]

[('428320388', '28731643', 1.0, '1300000'),
 ('90448075', '5254908', 1.0, '1300000'),
 ('16486092', '26830567', 1.0, '1300000'),
 ('38014760', '261433', 1.0, '1300000'),
 ('430012080', '5273228', 1.0, '1300000'),
 ('148474853', '16435049', 1.0, '1300000'),
 ('54840743', '5283432', 1.0, '1300000'),
 ('81785948', '331579', 1.0, '1300000'),
 ('93476766', '125399', 1.0, '1300000'),
 ('69000117', '189259', 1.0, '1300000'),
 ('95990977', '29750124', 1.0, '1300000'),
 ('56842248', '22682066', 1.0, '1300000'),
 ('142425394', '29812280', 1.0, '1300000'),
 ('4990469', '5284516', 1.0, '1300000'),
 ('168780705', '29544645', 1.0, '1300000'),
 ('422392659', '28308080', 1.0, '1300000'),
 ('34670012', '28912419', 1.0, '1300000'),
 ('79906957', '32217107', 1.0, '1300000'),
 ('127886176', '135497', 1.0, '1300000'),
 ('92513299', '41652722', 1.0, '1300000')]

In [23]:
### 使用NormalPredictor
from surprise import NormalPredictor, evaluate
algo = NormalPredictor()
perf = evaluate(algo, music_data, measures=['RMSE', 'MAE'])



Evaluating RMSE, MAE of algorithm NormalPredictor.

------------
Fold 1
RMSE: 0.0000
MAE:  0.0000
------------
Fold 2
RMSE: 0.0000
MAE:  0.0000
------------
Fold 3
RMSE: 0.0000
MAE:  0.0000
------------
Fold 4
RMSE: 0.0000
MAE:  0.0000
------------
Fold 5
RMSE: 0.0000
MAE:  0.0000
------------
------------
Mean RMSE: 0.0000
Mean MAE : 0.0000
------------
------------


In [24]:
### 使用BaselineOnly
from surprise import BaselineOnly, evaluate
algo = BaselineOnly()
perf = evaluate(algo, music_data, measures=['RMSE', 'MAE'])



Evaluating RMSE, MAE of algorithm BaselineOnly.

------------
Fold 1
Estimating biases using als...
RMSE: 0.0000
MAE:  0.0000
------------
Fold 2
Estimating biases using als...
RMSE: 0.0000
MAE:  0.0000
------------
Fold 3
Estimating biases using als...
RMSE: 0.0000
MAE:  0.0000
------------
Fold 4
Estimating biases using als...
RMSE: 0.0000
MAE:  0.0000
------------
Fold 5
Estimating biases using als...
RMSE: 0.0000
MAE:  0.0000
------------
------------
Mean RMSE: 0.0000
Mean MAE : 0.0000
------------
------------


In [25]:
### 使用基础版协同过滤
from surprise import KNNBasic, evaluate
algo = KNNBasic()
perf = evaluate(algo, music_data, measures=['RMSE', 'MAE'])



Evaluating RMSE, MAE of algorithm KNNBasic.

------------
Fold 1
Computing the msd similarity matrix...
Done computing similarity matrix.
RMSE: 0.0000
MAE:  0.0000
------------
Fold 2
Computing the msd similarity matrix...
Done computing similarity matrix.
RMSE: 0.0000
MAE:  0.0000
------------
Fold 3
Computing the msd similarity matrix...
Done computing similarity matrix.
RMSE: 0.0000
MAE:  0.0000
------------
Fold 4
Computing the msd similarity matrix...
Done computing similarity matrix.
RMSE: 0.0000
MAE:  0.0000
------------
Fold 5
Computing the msd similarity matrix...
Done computing similarity matrix.
RMSE: 0.0000
MAE:  0.0000
------------
------------
Mean RMSE: 0.0000
Mean MAE : 0.0000
------------
------------


In [26]:
### 使用均值协同过滤
from surprise import KNNWithMeans, evaluate
algo = KNNWithMeans()
perf = evaluate(algo, music_data, measures=['RMSE', 'MAE'])



Evaluating RMSE, MAE of algorithm KNNWithMeans.

------------
Fold 1
Computing the msd similarity matrix...
Done computing similarity matrix.
RMSE: 0.0000
MAE:  0.0000
------------
Fold 2
Computing the msd similarity matrix...
Done computing similarity matrix.
RMSE: 0.0000
MAE:  0.0000
------------
Fold 3
Computing the msd similarity matrix...
Done computing similarity matrix.
RMSE: 0.0000
MAE:  0.0000
------------
Fold 4
Computing the msd similarity matrix...
Done computing similarity matrix.
RMSE: 0.0000
MAE:  0.0000
------------
Fold 5
Computing the msd similarity matrix...
Done computing similarity matrix.
RMSE: 0.0000
MAE:  0.0000
------------
------------
Mean RMSE: 0.0000
Mean MAE : 0.0000
------------
------------


In [27]:
### 使用协同过滤baseline
from surprise import KNNBaseline, evaluate
algo = KNNBaseline()
perf = evaluate(algo, music_data, measures=['RMSE', 'MAE'])



Evaluating RMSE, MAE of algorithm KNNBaseline.

------------
Fold 1
Estimating biases using als...
Computing the msd similarity matrix...
Done computing similarity matrix.
RMSE: 0.0000
MAE:  0.0000
------------
Fold 2
Estimating biases using als...
Computing the msd similarity matrix...
Done computing similarity matrix.
RMSE: 0.0000
MAE:  0.0000
------------
Fold 3
Estimating biases using als...
Computing the msd similarity matrix...
Done computing similarity matrix.
RMSE: 0.0000
MAE:  0.0000
------------
Fold 4
Estimating biases using als...
Computing the msd similarity matrix...
Done computing similarity matrix.
RMSE: 0.0000
MAE:  0.0000
------------
Fold 5
Estimating biases using als...
Computing the msd similarity matrix...
Done computing similarity matrix.
RMSE: 0.0000
MAE:  0.0000
------------
------------
Mean RMSE: 0.0000
Mean MAE : 0.0000
------------
------------


In [28]:
### 使用SVD
from surprise import SVD, evaluate
algo = SVD()
perf = evaluate(algo, music_data, measures=['RMSE', 'MAE'])



Evaluating RMSE, MAE of algorithm SVD.

------------
Fold 1
RMSE: 0.0407
MAE:  0.0192
------------
Fold 2
RMSE: 0.0407
MAE:  0.0192
------------
Fold 3
RMSE: 0.0406
MAE:  0.0191
------------
Fold 4
RMSE: 0.0406
MAE:  0.0191
------------
Fold 5
RMSE: 0.0405
MAE:  0.0191
------------
------------
Mean RMSE: 0.0406
Mean MAE : 0.0191
------------
------------


In [None]:
### 使用SVD++
from surprise import SVDpp, evaluate
algo = SVDpp()
perf = evaluate(algo, music_data, measures=['RMSE', 'MAE'])



Evaluating RMSE, MAE of algorithm SVDpp.

------------
Fold 1


In [None]:
### 使用NMF
from surprise import NMF
algo = NMF()
perf = evaluate(algo, music_data, measures=['RMSE', 'MAE'])
print_perf(perf)