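### Collaborative Filtering
>*Search through a large group of people and find those whose tastes are close to ours.* Many websites today use one form or another of collaborative filtering, mainly for movies, music, books, dating, and shopping.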

---
### Collecting Preferences

In [1]:
# A dictionary of movie critics and their ratings for a small set of movies
critics={'Lisa Rose': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.5,
 'Just My Luck': 3.0, 'Superman Returns': 3.5, 'You, Me and Dupree': 2.5, 
 'The Night Listener': 3.0},
'Gene Seymour': {'Lady in the Water': 3.0, 'Snakes on a Plane': 3.5, 
 'Just My Luck': 1.5, 'Superman Returns': 5.0, 'The Night Listener': 3.0, 
 'You, Me and Dupree': 3.5}, 
'Michael Phillips': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.0,
 'Superman Returns': 3.5, 'The Night Listener': 4.0},
'Claudia Puig': {'Snakes on a Plane': 3.5, 'Just My Luck': 3.0,
 'The Night Listener': 4.5, 'Superman Returns': 4.0, 
 'You, Me and Dupree': 2.5},
'Mick LaSalle': {'Lady in the Water': 3.0, 'Snakes on a Plane': 4.0, 
 'Just My Luck': 2.0, 'Superman Returns': 3.0, 'The Night Listener': 3.0,
 'You, Me and Dupree': 2.0}, 
'Jack Matthews': {'Lady in the Water': 3.0, 'Snakes on a Plane': 4.0,
 'The Night Listener': 3.0, 'Superman Returns': 5.0, 'You, Me and Dupree': 3.5},
'Toby': {'Snakes on a Plane':4.5,'You, Me and Dupree':1.0,'Superman Returns':4.0}}

In [2]:
critics['Lisa Rose']['Lady in the Water']

2.5

In [3]:
critics['Toby']['Snakes on a Plane'] = 4.5

In [4]:
critics['Toby']

{'Snakes on a Plane': 4.5,
 'Superman Returns': 4.0,
 'You, Me and Dupree': 1.0}
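
Because the ratings live in a nested dictionary, the movies that two critics have both rated can be read off with a set intersection of their keys. The sketch below copies two entries from `critics`; the names `toby`, `phillips`, and `shared` are illustrative and not part of the original notebook.

```python
# Two entries copied from the critics dictionary
toby = {'Snakes on a Plane': 4.5, 'You, Me and Dupree': 1.0,
        'Superman Returns': 4.0}
phillips = {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.0,
            'Superman Returns': 3.5, 'The Night Listener': 4.0}

# Movies rated by both critics (set intersection of the dictionary keys)
shared = sorted(set(toby) & set(phillips))
print(shared)
```

Only these shared movies can contribute to a similarity score between the two critics.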

---
### Finding Similar Users
- Euclidean distance
- Pearson correlation

In [5]:
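# Euclidean distance-based similarity
from math import sqrt  # sqrt is used by sim_pearson further down

def sim_distance(prefs, person1, person2):
    # Collect the items both people have rated
    si = {}
    for item in prefs[person1]:
        if item in prefs[person2]:
            si[item] = 1
    # No ratings in common: return 0
    if len(si) == 0:
        return 0
    
    # Sum the squared differences over the shared items
    sum_of_squares = sum([pow(prefs[person1][item] - prefs[person2][item], 2) 
                          for item in si])
    
    # Map the sum into (0, 1]; identical ratings give 1
    return 1/(1+sum_of_squares)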

In [6]:
sim_distance(critics, 'Lisa Rose', 'Gene Seymour')

0.14814814814814814
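
As a hand check of the value above, the sketch below recomputes the score for Lisa Rose and Gene Seymour from their six shared ratings. It also shows a variant that takes the square root first, so that the score is based on the actual Euclidean distance. That variant is an assumption added for comparison; `sim_distance` as written does not take the root.

```python
from math import sqrt

# Ratings copied from critics, in the same movie order for both critics:
# Lady in the Water, Snakes on a Plane, Just My Luck,
# Superman Returns, You, Me and Dupree, The Night Listener
lisa = [2.5, 3.5, 3.0, 3.5, 2.5, 3.0]
gene = [3.0, 3.5, 1.5, 5.0, 3.5, 3.0]

# Sum of the squared rating differences
sum_of_squares = sum((a - b) ** 2 for a, b in zip(lisa, gene))

# Same formula as sim_distance: no square root
score_without_root = 1 / (1 + sum_of_squares)

# Variant (assumption, not in the notebook): use the true distance
score_with_root = 1 / (1 + sqrt(sum_of_squares))

print(sum_of_squares, score_without_root, score_with_root)
```

Both versions rank critics in the same order, since each is a decreasing function of the same sum; only the scale of the scores differs.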

In [11]:
# Pearson correlation coefficient
def sim_pearson(prefs, p1, p2):
    # Collect the items both people have rated
    si = {}
    for item in prefs[p1]: 
        if item in prefs[p2]: 
            si[item] = 1
    # No ratings in common: return 0
    if len(si) == 0:
        return 0
    n = len(si)
    
    # Sums of the ratings
    sum1 = sum([prefs[p1][it] for it in si])
    sum2 = sum([prefs[p2][it] for it in si])
    
    # Sums of the squared ratings
    sum1Sq=sum([pow(prefs[p1][it],2) for it in si])
    sum2Sq=sum([pow(prefs[p2][it],2) for it in si])
    
    # Sum of the products
    pSum = sum([prefs[p1][it]*prefs[p2][it] for it in si])
    
    # Compute the Pearson correlation coefficient
    num = pSum - (sum1*sum2/n)
    den=sqrt((sum1Sq-pow(sum1,2)/n)*(sum2Sq-pow(sum2,2)/n))
    if den == 0:
        return 0
    r = num/den
    
    return r

In [12]:
sim_pearson(critics, 'Lisa Rose', 'Gene Seymour')

0.39605901719066977
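
The same coefficient can be cross-checked with the mean-centered form of the Pearson formula, which may be easier to read than the running-sums version. This is a minimal sketch using the six ratings Lisa Rose and Gene Seymour share; the variable names are illustrative.

```python
from math import sqrt

# Shared ratings copied from critics, in the same movie order for both critics
lisa = [2.5, 3.5, 3.0, 3.5, 2.5, 3.0]
gene = [3.0, 3.5, 1.5, 5.0, 3.5, 3.0]

n = len(lisa)
mean_lisa = sum(lisa) / n
mean_gene = sum(gene) / n

# Sum of the products of the deviations from each mean
covariation = sum((a - mean_lisa) * (b - mean_gene) for a, b in zip(lisa, gene))

# Sums of the squared deviations
spread_lisa = sum((a - mean_lisa) ** 2 for a in lisa)
spread_gene = sum((b - mean_gene) ** 2 for b in gene)

r = covariation / sqrt(spread_lisa * spread_gene)
print(r)
```

Because each critic's ratings are centered on their own mean, a critic who consistently rates higher than another can still correlate well with them, which the Euclidean score does not capture.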

---
### Ranking the Critics

In [13]:
def top_matches(prefs, person, n=5, similarity=sim_pearson):
    # Score every other critic against person, then keep the n most similar
    scores=[(similarity(prefs, person, other), other) for other in prefs if other != person]
    scores.sort(reverse=True)
    return scores[0:n]

In [14]:
top_matches(critics, 'Toby', n=3)

[(0.9912407071619299, 'Lisa Rose'),
 (0.9244734516419049, 'Mick LaSalle'),
 (0.8934051474415647, 'Claudia Puig')]
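
Sorting the full list and slicing is fine for a handful of critics. With many more users, the standard library's `heapq.nlargest` is one way to keep only the best `n` pairs without sorting everything. The sketch below applies it to the three (similarity, critic) pairs from the output above; `top_two` is an illustrative name.

```python
import heapq

# (similarity, critic) pairs copied from the output above, in arbitrary order
scores = [(0.8934051474415647, 'Claudia Puig'),
          (0.9912407071619299, 'Lisa Rose'),
          (0.9244734516419049, 'Mick LaSalle')]

# Keep the two highest-scoring pairs, best first
top_two = heapq.nlargest(2, scores)
print(top_two)
```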

---
### Recommending Items

In [17]:
# Recommend items for a person using a similarity-weighted average of everyone else's ratings
def get_recommendations(prefs, person, similarity=sim_pearson):
    totals={}
    simSums={}
    for other in prefs:
        # Don't compare the person with themselves
        if other==person:
            continue
        sim=similarity(prefs, person, other)
        
        # Ignore similarity scores of zero or lower
        if sim<=0:
            continue
        for item in prefs[other]:
            # Only score movies the person has not rated yet
            if item not in prefs[person] or prefs[person][item]==0:
                # Similarity * rating
                totals.setdefault(item, 0)
                totals[item] += prefs[other][item]*sim
                # Sum of the similarities
                simSums.setdefault(item,0)
                simSums[item] += sim
    # Build the normalized list
    rankings=[(total/simSums[item],item) for item,total in totals.items()]
    
    # Return the list sorted from highest to lowest score
    rankings.sort(reverse=True)
    return rankings

In [18]:
get_recommendations(critics, 'Toby')

[(3.3477895267131013, 'The Night Listener'),
 (2.8325499182641614, 'Lady in the Water'),
 (2.5309807037655645, 'Just My Luck')]
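
Written as a formula, each score in the list above is a similarity-weighted average. The notation here is introduced for illustration only: $p$ is the person receiving recommendations, $i$ is a movie $p$ has not rated, $O_i$ is the set of other critics who rated $i$ and whose similarity to $p$ is positive, and $r_{o,i}$ is critic $o$'s rating of $i$.

$$
\hat{r}_{p,i} = \frac{\sum_{o \in O_i} \mathrm{sim}(p, o)\, r_{o,i}}{\sum_{o \in O_i} \mathrm{sim}(p, o)}
$$

The numerator corresponds to `totals[item]` and the denominator to `simSums[item]`. Dividing by the sum of similarities keeps a movie from scoring higher simply because more critics reviewed it.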

---
### Matching Products


In [23]:
# First, invert the dictionary so that items become the keys
def transformPrefs(prefs):
    result = {}
    for person in prefs:
        for item in prefs[person]:
            result.setdefault(item,{})
            # Swap the item and the person
            result[item][person] = prefs[person][item]
    return result

In [24]:
movies = transformPrefs(critics)
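
To make the effect of the inversion concrete, here is a small self-contained sketch on a two-critic subset of the ratings. The function name `invert_prefs` and the variables `subset` and `by_movie` are illustrative and not part of the notebook.

```python
def invert_prefs(prefs):
    """Turn {person: {item: rating}} into {item: {person: rating}}."""
    inverted = {}
    for person, ratings in prefs.items():
        for item, rating in ratings.items():
            inverted.setdefault(item, {})[person] = rating
    return inverted

# A small subset of the critics dictionary
subset = {'Lisa Rose': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.5},
          'Toby': {'Snakes on a Plane': 4.5}}

by_movie = invert_prefs(subset)
print(by_movie)
```

Once movies are the keys, the same similarity functions can compare movies instead of critics, and inverting a second time returns the original structure.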

---
### Building a del.icio.us Link Recommender

In [26]:
import pydelicious

ImportError: No module named pydelicious
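
The error above means that the third-party `pydelicious` package is not installed in this environment. One way to keep the rest of a notebook runnable in that situation is to guard the import, as in this minimal sketch; the flag name `HAVE_PYDELICIOUS` is an assumption made for illustration.

```python
# Guarded import: record whether the optional dependency is available
try:
    import pydelicious
    HAVE_PYDELICIOUS = True
except ImportError:
    pydelicious = None
    HAVE_PYDELICIOUS = False

if not HAVE_PYDELICIOUS:
    print("pydelicious is not installed; skipping the del.icio.us examples")
```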