## Swing (a variant of ItemCF)

In [1]:
%cd /playground/sgd_deep_learning/sgd_rec_sys/
import sys 
sys.path.append('./python')

/playground/sgd_deep_learning/sgd_rec_sys


In [2]:
import numpy as np
import random
from sgd_rec_sys.retrieval import Swing, RateInfo

## rate_info

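* Reads user and item meta info from files (for example, the id-to-name mappings).
* Reads the users' historical rating file and organizes the data as each algorithm requires:
  * itemcf: for each item, the list of users who rated it
  * usercf: for each user, the list of all items that user rated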
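The two layouts described above can be sketched as a pair of inverted indexes. This is only an illustration: `build_indexes`, `item_to_users`, and `user_to_items` are hypothetical names, not part of `RateInfo`, and the triples are copied from the `rate_meta_info()` output printed in this notebook.

```python
from collections import defaultdict

# Rating triples [user_id, item_id, rate], copied from the notebook's rate_meta_info() output.
rate_pairs = [
    [1, 1, 1], [1, 2, -1], [1, 3, 1], [1, 4, 1],
    [2, 2, 1], [2, 3, -1], [2, 4, -1],
    [3, 1, 1], [3, 2, 1], [3, 3, -1],
    [4, 1, -1], [4, 3, 1],
    [5, 1, 1], [5, 2, 1], [5, 4, -1],
]

def build_indexes(pairs):
    """Hypothetical helper: build both lookup directions in one pass."""
    item_to_users = defaultdict(dict)  # itemcf view: item_id -> {user_id: rate}
    user_to_items = defaultdict(dict)  # usercf view: user_id -> {item_id: rate}
    for user_id, item_id, rate in pairs:
        item_to_users[item_id][user_id] = rate
        user_to_items[user_id][item_id] = rate
    return dict(item_to_users), dict(user_to_items)

item_to_users, user_to_items = build_indexes(rate_pairs)
print(item_to_users[1])  # users who rated item 1, with their ratings
print(user_to_items[1])  # items rated by user 1, with the ratings
```

Keeping the rating value in each index lets a downstream algorithm decide for itself whether to use every rating or only the positive ones.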

In [3]:
rate_info = RateInfo(user_file='./data/retrieval/user2id.txt',
                     item_file='./data/retrieval/item2id.txt',
                    rate_file='./data/retrieval/userid_itemid_rate.txt')

In [4]:
# user-side meta info
rate_info.user_meta_info()

{'col_name': ['user_id', 'user_name'],
 'id2name': {1: 'A', 2: 'B', 3: 'C', 4: 'D', 5: 'E'},
 'name2id': {'A': 1, 'B': 2, 'C': 3, 'D': 4, 'E': 5}}

In [5]:
# item-side meta info
rate_info.item_meta_info()

{'col_name': ['item_id', 'item_name'],
 'id2name': {1: 'story_book', 2: 'magazine', 3: 'tv', 4: 'ps4'},
 'name2id': {'story_book': 1, 'magazine': 2, 'tv': 3, 'ps4': 4}}

In [6]:
# rating info
rate_info.rate_meta_info()

{'col_name': ['userid', 'itemid', 'rate'],
 'rate_pairs': [[1, 1, 1],
  [1, 2, -1],
  [1, 3, 1],
  [1, 4, 1],
  [2, 2, 1],
  [2, 3, -1],
  [2, 4, -1],
  [3, 1, 1],
  [3, 2, 1],
  [3, 3, -1],
  [4, 1, -1],
  [4, 3, 1],
  [5, 1, 1],
  [5, 2, 1],
  [5, 4, -1]]}

## swing

In [7]:
swing = Swing(meta_info=rate_info)

item_info = rate_info.item_meta_info()
iids = list(item_info['id2name'].keys())
print("all item ids:", iids)
print(item_info['id2name'])
print()

# Compute the pairwise Swing similarity between items (expensive; can be precomputed offline)
for i in range(len(iids)-1):
    for j in range(i, len(iids)):
        id1, id2 = iids[i], iids[j]
        print("sim score of {}-{} :\t {}\n".format(id1, id2, swing.sim(id1, id2, alpha=4)))

all item ids: [1, 2, 3, 4]
{1: 'story_book', 2: 'magazine', 3: 'tv', 4: 'ps4'}

w1, w2 {1: 1, 3: 1, 5: 1} {1: 1, 3: 1, 5: 1}
common [1, 3, 5]
sim score of 1-1 :	 1.042857142857143

w1, w2 {1: 1, 3: 1, 5: 1} {2: 1, 3: 1, 5: 1}
common [3, 5]
sim score of 1-2 :	 0.5

w1, w2 {1: 1, 3: 1, 5: 1} {1: 1, 4: 1}
common [1]
sim score of 1-3 :	 0.14285714285714285

w1, w2 {1: 1, 3: 1, 5: 1} {1: 1}
common [1]
sim score of 1-4 :	 0.14285714285714285

w1, w2 {2: 1, 3: 1, 5: 1} {2: 1, 3: 1, 5: 1}
common [2, 3, 5]
sim score of 2-2 :	 1.1

w1, w2 {2: 1, 3: 1, 5: 1} {1: 1, 4: 1}
common []
sim score of 2-3 :	 0

w1, w2 {2: 1, 3: 1, 5: 1} {1: 1}
common []
sim score of 2-4 :	 0

w1, w2 {1: 1, 4: 1} {1: 1, 4: 1}
common [1, 4]
sim score of 3-3 :	 0.5428571428571429

w1, w2 {1: 1, 4: 1} {1: 1}
common [1]
sim score of 3-4 :	 0.14285714285714285



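## Summary
* The only difference between Swing and ItemCF is how item similarity is computed.
* ItemCF: two items are judged similar when a large fraction of their users overlap.
* Swing: additionally considers whether the overlapping users come from the same small circle.
  * Let 𝒱 be the set of users who like both items.
  * For users 𝑢1 and 𝑢2 in 𝒱, denote their overlap (the number of items both of them like) as overlap(𝑢1, 𝑢2).
  * If two users overlap heavily, they likely come from the same small circle, so their contribution is down-weighted.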
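The weighting described above can be sketched in a few lines. This is a reconstruction, not the code of `sgd_rec_sys.retrieval.Swing`: each pair of users in 𝒱 contributes 1 / (alpha + overlap(𝑢1, 𝑢2)). The pairing convention (unordered pairs, self-pairs included) and the use of positive ratings only are assumptions inferred from the scores printed in this notebook. `swing_sim` and `user_likes` are hypothetical names.

```python
from itertools import combinations_with_replacement

# Items each user rated +1, derived from the rate_pairs printed earlier in this notebook.
user_likes = {
    1: {1, 3, 4},
    2: {2},
    3: {1, 2},
    4: {3},
    5: {1, 2},
}

def swing_sim(item_a, item_b, likes, alpha):
    """Hypothetical sketch of a Swing-style item similarity."""
    # V: users who like both items.
    common = sorted(u for u, items in likes.items()
                    if item_a in items and item_b in items)
    score = 0.0
    # Assumption: unordered user pairs with self-pairs included,
    # chosen because it reproduces the scores printed in the notebook.
    for u1, u2 in combinations_with_replacement(common, 2):
        overlap = len(likes[u1] & likes[u2])  # items both users like
        score += 1.0 / (alpha + overlap)      # larger overlap -> smaller weight
    return score

print(swing_sim(1, 2, user_likes, alpha=4))
print(swing_sim(2, 3, user_likes, alpha=4))
```

With `alpha=4`, items 1 and 2 share users 3 and 5, who both like exactly items 1 and 2, so each of the three pairs contributes 1/(4+2) and the score comes to about 0.5, in line with the notebook output. Items 2 and 3 share no users, so the sum is empty and the score is 0.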