### Retrieval by FM

$$
\hat{y}(\mathbf{x}):=w_0+\sum_{i=1}^n w_i x_i+\sum_{i=1}^n \sum_{j=i+1}^n\left\langle\mathbf{v}_i, \mathbf{v}_j\right\rangle x_i x_j
$$

$$
\begin{aligned}
& \frac{1}{2} \sum_{f=1}^k\left(\left(\sum_{i=1}^n v_{i, f} x_i\right)^2-\sum_{i=1}^n v_{i, f}^2 x_i^2\right) \\
= & \frac{1}{2} \sum_{f=1}^k\left(\left(\sum_{u \in U} v_{u, f} x_u+\sum_{t \in I} v_{t, f} x_t\right)^2-\sum_{u \in U} v_{u, f}^2 x_u^2-\sum_{t \in I} v_{t, f}^2 x_t^2\right) \\
= & \frac{1}{2} \sum_{f=1}^k\left(\left(\sum_{u \in U} v_{u, f} x_u\right)^2+\left(\sum_{t \in I} v_{t, f} x_t\right)^2+2 \sum_{u \in U} v_{u, f} x_u \sum_{t \in I} v_{t, f} x_t-\sum_{u \in U} v_{u, f}^2 x_u^2-\sum_{t \in I} v_{t, f}^2 x_t^2\right)
\end{aligned}
$$
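
The expansion above separates terms that depend only on user features from terms that involve item features. Assuming retrieval ranks candidate items for one fixed user, the bias $w_0$ and the user-only terms (the user first-order weights and $\left(\sum_{u \in U} v_{u, f} x_u\right)^2-\sum_{u \in U} v_{u, f}^2 x_u^2$) are identical for every candidate and can be dropped without changing the ranking. Under that assumption, and writing the remaining part as $\text{score}(U, I)$ (a symbol introduced here for illustration), what is left is:

$$
\text{score}(U, I)=\sum_{t \in I} w_t x_t+\frac{1}{2} \sum_{f=1}^k\left(\left(\sum_{t \in I} v_{t, f} x_t\right)^2-\sum_{t \in I} v_{t, f}^2 x_t^2\right)+\sum_{f=1}^k\left(\sum_{u \in U} v_{u, f} x_u\right)\left(\sum_{t \in I} v_{t, f} x_t\right)
$$

The last term is an inner product between the sum-pooled user embedding and the sum-pooled item embedding, so the whole score can be written as one inner product of a user-side vector and an item-side vector.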

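User vector:

The user vector is the concatenation of two terms.
* The first term is the constant 1.
* The second term is the sum pooling of the user-related feature embeddings.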

$$
V_{\text {user }}=\left[1 ; \quad \sum_{u \in U} v_u x_u\right]
$$

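Item vector:

The item vector is the concatenation of two terms.
* The first term holds the first-order and second-order feature interactions among the item-related features.
* The second term is the sum pooling of the item-related feature embeddings.

$$
V_{\text{item}}=\left[\sum_{t \in I} w_t x_t+\frac{1}{2} \sum_{f=1}^k\left(\left(\sum_{t \in I} v_{t, f} x_t\right)^2-\sum_{t \in I} v_{t, f}^2 x_t^2\right) ; \quad \sum_{t \in I} v_t x_t\right]
$$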
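
As a sanity check on the two vectors, the following is a minimal NumPy sketch, not a production implementation. It assumes random parameters, binary features, and a feature layout in which the first `n_user` features belong to the user and the rest to the item; all names (`second_order`, `user_vec`, `item_vec`, the sizes) are hypothetical. It checks that the inner product of the user and item vectors equals the full FM prediction minus the terms that depend only on the user.

```python
import numpy as np

rng = np.random.default_rng(0)
n_user, n_item, k = 3, 4, 5  # hypothetical feature counts and embedding size
n = n_user + n_item

w0 = rng.normal()                              # global bias
w = rng.normal(size=n)                         # first-order weights w_i
V = rng.normal(size=(n, k))                    # factor vectors v_i (one row each)
x = rng.integers(0, 2, size=n).astype(float)   # binary feature values

def second_order(V, x):
    """Pairwise interaction term in its O(nk) form."""
    s = V.T @ x                                # sum_i v_{i,f} x_i for each f
    return 0.5 * np.sum(s ** 2 - (V ** 2).T @ (x ** 2))

# Split parameters and features into the user part and the item part.
xu, xt = x[:n_user], x[n_user:]
wu, wt = w[:n_user], w[n_user:]
Vu, Vt = V[:n_user], V[n_user:]

# User vector: [1 ; sum-pooled user embeddings]
user_vec = np.concatenate(([1.0], Vu.T @ xu))
# Item vector: [item first- and second-order terms ; sum-pooled item embeddings]
item_vec = np.concatenate(([wt @ xt + second_order(Vt, xt)], Vt.T @ xt))

full = w0 + w @ x + second_order(V, x)           # complete FM prediction
user_only = w0 + wu @ xu + second_order(Vu, xu)  # constant for a fixed user
retrieval_score = user_vec @ item_vec

assert np.isclose(full, user_only + retrieval_score)
```

Because `user_only` does not change across candidate items, ranking by `retrieval_score` gives the same order as ranking by the full prediction for that user, and the item vectors can be precomputed and indexed for nearest-neighbor search by inner product.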

In [1]:
from nltk.corpus import wordnet as wn
poses = { 'n':'noun', 'v':'verb', 's':'adj (s)', 'a':'adj', 'r':'adv'}
for synset in wn.synsets("good"):
  print("{}: {}".format(poses[synset.pos()],", ".join([l.name() for l in synset.lemmas()])))

noun: good
noun: good, goodness
noun: good, goodness
noun: commodity, trade_good, good
adj: good
adj (s): full, good
adj: good
adj (s): estimable, good, honorable, respectable
adj (s): beneficial, good
adj (s): good
adj (s): good, just, upright
adj (s): adept, expert, good, practiced, proficient, skillful, skilful
adj (s): good
adj (s): dear, good, near
adj (s): dependable, good, safe, secure
adj (s): good, right, ripe
adj (s): good, well
adj (s): effective, good, in_effect, in_force
adj (s): good
adj (s): good, serious
adj (s): good, sound
adj (s): good, salutary
adj (s): good, honest
adj (s): good, undecomposed, unspoiled, unspoilt
adj (s): good
adv: well, good
adv: thoroughly, soundly, good


In [2]:
panda = wn.synset("panda.n.01")
hyper = lambda s: s.hypernyms()
list(panda.closure(hyper))

[Synset('procyonid.n.01'),
 Synset('carnivore.n.01'),
 Synset('placental.n.01'),
 Synset('mammal.n.01'),
 Synset('vertebrate.n.01'),
 Synset('chordate.n.01'),
 Synset('animal.n.01'),
 Synset('organism.n.01'),
 Synset('living_thing.n.01'),
 Synset('whole.n.02'),
 Synset('object.n.01'),
 Synset('physical_entity.n.01'),
 Synset('entity.n.01')]

### Word2vec

$$
-\frac{1}{T} \sum_{t=1}^T \sum_{\substack{-m \leq j \leq m \\ j \neq 0}} \log P\left(w_{t+j} \mid w_t ; \theta\right)
$$
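
The double sum ranges over every position $t$ and every offset $j$ within a window of radius $m$, excluding the center itself. The sketch below enumerates those (center, outside) pairs for a tokenized sentence. It assumes the window is simply truncated at the sentence boundaries, and the helper name `skipgram_pairs` is hypothetical.

```python
def skipgram_pairs(tokens, m):
    """Return (center, outside) pairs for a window of radius m."""
    pairs = []
    for t, center in enumerate(tokens):
        for j in range(-m, m + 1):
            # Skip the center word itself and positions outside the sentence.
            if j == 0 or not (0 <= t + j < len(tokens)):
                continue
            pairs.append((center, tokens[t + j]))
    return pairs

pairs = skipgram_pairs(["the", "quick", "brown", "fox"], m=1)
print(pairs)
```

Each pair contributes one $\log P(w_{t+j} \mid w_t ; \theta)$ term to the objective.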

In [4]:
import numpy as np  # softmax() is assumed to be defined elsewhere, e.g. in a utility module

def naiveSoftmaxLossAndGradient(
    centerWordVec,
    outsideWordIdx,
    outsideVectors,
    dataset
):
    """ Naive Softmax loss & gradient function for word2vec models

    Arguments:
    centerWordVec -- numpy ndarray, center word's embedding
                    in shape (word vector length, )
                    (v_c in the pdf handout)
    outsideWordIdx -- integer, the index of the outside word
                    (o of u_o in the pdf handout)
    outsideVectors -- outside vectors for all words in vocab,
                    in shape (num words in vocab, word vector length)
                    (transpose of U in the pdf handout)
    dataset -- needed for negative sampling, unused here.

    Return:
    loss -- naive softmax loss
    gradCenterVec -- the gradient with respect to the center word vector
                     in shape (word vector length, )
                     (dJ / dv_c in the pdf handout)
    gradOutsideVecs -- the gradient with respect to all the outside word vectors
                    in shape (num words in vocab, word vector length) 
                    (dJ / dU)
    """

    # centerWordVec:  (embedding_dim,)
    # outsideVectors: (vocab_size, embedding_dim)

    scores = np.matmul(outsideVectors, centerWordVec)  # shape (vocab_size,)
    probs = softmax(scores)  # shape (vocab_size,)

    loss = -np.log(probs[outsideWordIdx])  # scalar

    dscores = probs.copy()  # shape (vocab_size,)
    dscores[outsideWordIdx] -= 1  # dscores = y_hat - y
    # dJ/dv_c = U^T (y_hat - y), shape (embedding_dim,)
    gradCenterVec = np.matmul(outsideVectors.T, dscores)
    # dJ/dU = (y_hat - y) v_c^T, shape (vocab_size, embedding_dim)
    gradOutsideVecs = np.outer(dscores, centerWordVec)

    return loss, gradCenterVec, gradOutsideVecs
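
One way to sanity-check analytic gradients of this kind is to compare them with central finite differences. The sketch below is self-contained and makes several assumptions: it defines its own `softmax` and a compact `naive_softmax` (stand-ins, not the course helpers), treats the center vector as a 1-D array, and uses small random inputs.

```python
import numpy as np

def softmax(z):
    z = z - np.max(z)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def naive_softmax(v_c, o, U):
    """Loss and gradients for one (center, outside) pair."""
    probs = softmax(U @ v_c)          # shape (vocab_size,)
    loss = -np.log(probs[o])
    d = probs.copy()
    d[o] -= 1.0                       # y_hat - y
    return loss, U.T @ d, np.outer(d, v_c)

rng = np.random.default_rng(0)
U = rng.normal(size=(6, 3))           # 6 words, 3-dimensional embeddings
v_c = rng.normal(size=3)
o = 2

loss, grad_vc, grad_U = naive_softmax(v_c, o, U)

# Central finite differences with respect to each coordinate of v_c.
eps = 1e-6
num_grad = np.zeros_like(v_c)
for i in range(v_c.size):
    step = np.zeros_like(v_c)
    step[i] = eps
    loss_plus = naive_softmax(v_c + step, o, U)[0]
    loss_minus = naive_softmax(v_c - step, o, U)[0]
    num_grad[i] = (loss_plus - loss_minus) / (2 * eps)

max_err = np.max(np.abs(num_grad - grad_vc))
print(max_err)
```

A shape mismatch in the gradient with respect to the center vector, for instance multiplying by `U` instead of `U.T`, shows up immediately in such a check.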

In [5]:
def negSamplingLossAndGradient(
    centerWordVec,
    outsideWordIdx,
    outsideVectors,
    dataset,
    K=10
):
  
    negSampleWordIndices = getNegativeSamples(outsideWordIdx, dataset, K)
    indices = [outsideWordIdx] + negSampleWordIndices

    gradCenterVec = np.zeros(centerWordVec.shape)  # (embedding_size,)
    gradOutsideVecs = np.zeros(outsideVectors.shape)  # (vocab_size, embedding_size)
    loss = 0.0

    u_o = outsideVectors[outsideWordIdx]  # shape (embedding_size,)
    z = sigmoid(np.dot(u_o, centerWordVec))  # scalar
    loss -= np.log(z)  # first part of the loss: the true outside word
    gradCenterVec += u_o * (z - 1)  # first part of dJ/dv_c
    gradOutsideVecs[outsideWordIdx] = centerWordVec * (z - 1)  # dJ/du_o

    # Negative samples: each contributes -log(sigmoid(-u_k . v_c)) to the loss
    for i in range(K):
        neg_id = indices[1 + i]
        u_k = outsideVectors[neg_id]
        z = sigmoid(-np.dot(u_k, centerWordVec))
        loss -= np.log(z)
        gradCenterVec += u_k * (1 - z)
        # += because the same word may be sampled more than once
        gradOutsideVecs[neg_id] += centerWordVec * (1 - z)


    return loss, gradCenterVec, gradOutsideVecs
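
The per-sample loop can also be written in vectorized form. The following is a sketch under stated assumptions rather than the course solution: `sigmoid` and `neg_sampling` are defined locally, the negative indices are passed in directly instead of being drawn by `getNegativeSamples`, and `np.add.at` is used so that a word sampled more than once accumulates its gradient each time.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def neg_sampling(v_c, o, U, neg_ids):
    """Negative-sampling loss and gradients for one (center, outside) pair."""
    neg_ids = np.asarray(neg_ids)
    z_pos = sigmoid(U[o] @ v_c)               # scalar
    z_neg = sigmoid(-(U[neg_ids] @ v_c))      # shape (K,)
    loss = -np.log(z_pos) - np.sum(np.log(z_neg))

    grad_vc = (z_pos - 1.0) * U[o] + (1.0 - z_neg) @ U[neg_ids]
    grad_U = np.zeros_like(U)
    grad_U[o] += (z_pos - 1.0) * v_c
    # Unbuffered add: repeated indices in neg_ids are each counted.
    np.add.at(grad_U, neg_ids, np.outer(1.0 - z_neg, v_c))
    return loss, grad_vc, grad_U

rng = np.random.default_rng(1)
U = rng.normal(size=(6, 3))
v_c = rng.normal(size=3)
o, neg_ids = 2, [0, 4, 4]                     # word 4 is sampled twice

loss, grad_vc, grad_U = neg_sampling(v_c, o, U, neg_ids)

# Central finite differences with respect to v_c as a check.
eps = 1e-6
num_grad = np.zeros_like(v_c)
for i in range(v_c.size):
    step = np.zeros_like(v_c)
    step[i] = eps
    num_grad[i] = (neg_sampling(v_c + step, o, U, neg_ids)[0]
                   - neg_sampling(v_c - step, o, U, neg_ids)[0]) / (2 * eps)
```

A plain `grad_U[neg_ids] += ...` would count a repeated index only once, which is why the loop version accumulates with `+=` per sample.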

### AirBnB Listing Embedding

Real-time Personalization using Embeddings for Search Ranking at Airbnb

https://zhuanlan.zhihu.com/p/133566801

https://zhuanlan.zhihu.com/p/43295545

### YouTubeDNN