# Chapter 4: Accelerating word2vec

The word2vec model implemented in the previous chapter becomes slow when the corpus is huge, so we need to improve it.

## 4.1 word2vec improvement: Embedding class

The forward method of the Embedding class simply extracts, from the weight matrix, the word vectors (rows) associated with the given word IDs.


In [5]:
import numpy as np

W = np.arange(21).reshape(7, 3)

print(W)
print(W[2])

idx = np.array([1, 0, 2])
print(W[idx])

[[ 0  1  2]
 [ 3  4  5]
 [ 6  7  8]
 [ 9 10 11]
 [12 13 14]
 [15 16 17]
 [18 19 20]]
[6 7 8]
[[3 4 5]
 [0 1 2]
 [6 7 8]]


In [6]:
class Embedding:
    def __init__(self, W):
        self.params = [W]
        self.grads = [np.zeros_like(W)]
        self.idx = None

    def forward(self, idx):
        W, = self.params
        self.idx = idx
        out = W[idx]
        return out

    def backward(self, dout):
        dW, = self.grads
        dW[...] = 0
        # Accumulate row by row so that repeated word IDs add up.
        # Vectorized equivalent of the loop below: np.add.at(dW, self.idx, dout)
        for i, word_id in enumerate(self.idx):
            dW[word_id] += dout[i]
        return None
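
The accumulation in `backward` matters when the same word ID appears more than once in a mini-batch. The following is a minimal sketch, using made-up values for `idx` and `dout`, that compares plain fancy-index `+=`, `np.add.at`, and the explicit loop:

```python
import numpy as np

idx = np.array([0, 2, 0])      # word ID 0 appears twice (illustrative values)
dout = np.ones((3, 3))         # upstream gradient, one row per sample

# Plain fancy-index += : the duplicated row is only updated once
dW_naive = np.zeros((4, 3))
dW_naive[idx] += dout

# np.add.at accumulates every occurrence of a repeated index
dW_add_at = np.zeros((4, 3))
np.add.at(dW_add_at, idx, dout)

# Explicit loop, as in Embedding.backward
dW_loop = np.zeros((4, 3))
for i, word_id in enumerate(idx):
    dW_loop[word_id] += dout[i]

print(dW_naive[0])    # [1. 1. 1.]
print(dW_add_at[0])   # [2. 2. 2.]
print(dW_loop[0])     # [2. 2. 2.]
```

Only the last two variants give the gradient that row 0 should receive, which is why a simple `dW[self.idx] = dout` or `dW[self.idx] += dout` would not be a safe replacement for the loop.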

## 4.2 word2vec improvement: Negative sampling

### 4.2.1 Computation problem after the hidden layer

When the vocabulary is huge, the following two computations take a long time:

- the product of the neurons of the hidden layer and the weight matrix
- calculation of Softmax layer

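For example, assuming a vocabulary of 1,000,000 words, the softmax output for word $i$ is computed as follows, so every single output requires summing over one million exponentials.

$$
y_i = \frac{\exp(x_i)}{\sum_{j=1}^{1000000} \exp(x_j)}
$$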

### 4.2.2 Negative sampling

To reduce this computation, we use negative sampling: instead of scoring every word, we score only the correct word (the positive example) and a small number of randomly selected incorrect words (the negative examples).
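
To make the saving concrete, here is a rough sketch comparing the two output-layer computations for a single hidden vector. The sizes `V`, `H`, the number of negatives `k`, the ID `target`, and the uniform sampling are all illustrative assumptions (section 4.2.6 covers the sampling distribution).

```python
import numpy as np

rng = np.random.default_rng(0)
V, H, k = 10000, 100, 5        # vocabulary size, hidden size, negatives (assumed)
W_out = rng.standard_normal((V, H)) * 0.01   # output-side weights, one row per word
h = rng.standard_normal(H)                   # hidden-layer activation
target = 42                                  # hypothetical ID of the correct word

# Full softmax path: one score per vocabulary word, i.e. V dot products
scores_full = W_out @ h

# Negative sampling path: score only the target and a few sampled words
negatives = rng.choice(V, size=k, replace=False)
negatives = negatives[negatives != target]   # drop the target if it was drawn
ids = np.concatenate(([target], negatives))
scores_ns = W_out[ids] @ h                   # at most k + 1 dot products

print(scores_full.shape, scores_ns.shape)
```

The scores for the selected words are the same in both paths; negative sampling just avoids computing the rest.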

### 4.2.3 Sigmoid function and Cross entropy error

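The sigmoid function and the cross-entropy error are defined as follows, where $x$ is the score, $y$ is the predicted probability, and $t$ is the correct label (0 or 1).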

Sigmoid:
$$
y = \frac{1}{1 + e^{-x}}
$$

Cross entropy error:
$$
L = -(t \log y + (1 - t) \log (1 - y))
$$

![../images/4.2.3.png](./fig/4_2_3.drawio.svg)
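
As a minimal sketch of the two formulas above, the snippet below evaluates them on a few made-up scores. The helper names `sigmoid` and `binary_cross_entropy`, the small constant `eps`, and the sample values are assumptions for illustration. It also uses the standard result that the gradient of this loss with respect to $x$ simplifies to $y - t$.

```python
import numpy as np

def sigmoid(x):
    # y = 1 / (1 + exp(-x))
    return 1.0 / (1.0 + np.exp(-x))

def binary_cross_entropy(y, t, eps=1e-7):
    # L = -(t log y + (1 - t) log(1 - y)); eps guards against log(0)
    return -(t * np.log(y + eps) + (1 - t) * np.log(1 - y + eps))

x = np.array([-2.0, 0.0, 3.0])   # raw scores (illustrative)
t = np.array([0, 1, 1])          # binary labels (illustrative)

y = sigmoid(x)
loss = binary_cross_entropy(y, t)

# Backward pass through sigmoid + cross entropy: dL/dx = y - t
grad = y - t

print(y)
print(loss)
print(grad)
```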

### 4.2.4 From multiclass classification to binary classification

![img](./fig/4_2_4.drawio.svg)


In [7]:
# EmbeddingDot: look up the weight rows for idx and take their dot product with h
class EmbeddingDot:
    def __init__(self, W):
        self.embed = Embedding(W)
        self.params = self.embed.params
        self.grads = self.embed.grads
        self.cache = None

    def forward(self, h, idx):
        target_W = self.embed.forward(idx)
        out = np.sum(target_W * h, axis=1)

        self.cache = (h, target_W)
        return out

    def backward(self, dout):
        h, target_W = self.cache
        dout = dout.reshape(dout.shape[0], 1)

        dtarget_W = dout * h
        self.embed.backward(dtarget_W)
        dh = dout * target_W
        return dh

In [14]:
import numpy as np
W = np.arange(21).reshape(7, 3)
h = np.arange(9).reshape(3, 3)
idx = np.array([0, 3, 1])

embed = Embedding(W)
target_W = embed.forward(idx)
out = np.sum(target_W * h, axis=1)

print(out.reshape(out.shape[0], 1))
print(target_W)
print(h)

[[  5]
 [122]
 [ 86]]
[[ 0  1  2]
 [ 9 10 11]
 [ 3  4  5]]
[[0 1 2]
 [3 4 5]
 [6 7 8]]


### 4.2.5 Negative sampling

### 4.2.6 Sampling method for negative sampling


$$
P'(W_i) = \frac{P(W_i)^{0.75}}{\sum_{j=1}^{n} P(W_j)^{0.75}}
$$
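
Raising each probability to the power 0.75 and renormalizing slightly increases the chance of drawing rare words. The following is a small sketch of the formula above; the word counts, the seed, and the sample size are hypothetical values chosen for illustration.

```python
import numpy as np

counts = np.array([700, 290, 10])   # hypothetical word counts in a corpus
p = counts / counts.sum()           # P(W_i): raw unigram probabilities

p_new = p ** 0.75                   # numerator of the formula
p_new /= p_new.sum()                # divide by the sum to get P'(W_i)

# Draw negative-sample word IDs according to the adjusted distribution
rng = np.random.default_rng(0)
negatives = rng.choice(len(p_new), size=5, p=p_new)

print(p)
print(p_new)
print(negatives)
```

In this example the rarest word's probability rises from 0.01 to roughly 0.027, while the most frequent word's probability drops a little.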