# Sequence-to-Sequence: a great model that grew out of machine translation

English-to-Chinese translation:

- I love you
- 我 爱 你

- Hi
- 你 好

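Sequence to Sequence means mapping one variable-length sequence to another variable-length sequence. Examples:

- Question answering: map a question sequence to an answer sequence
- Chit-chat: map one turn of a casual conversation to the next turn
- Question generation: map an answer to a question
- Human language to machine language: translate a sentence into an SQL query
  - "How many permanent residents does Beijing have?" becomes `select population from data where city = 'beijing'`
- Machine language to human language (natural language generation, NLG)
- Stock price prediction: the input sequence is the prices from three days ago, two days ago, yesterday and today; the output sequence is the prices for tomorrow, the day after, and the day after that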

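What a model *might* be able to do is not the same as what it *can* do in practice.

A human needs 5 minutes to identify one item, with 90% accuracy.

A machine identifies the same item in 1 second, with 70% accuracy.

Now suppose a human reviews every machine output. Checking whether the machine was wrong takes 1 minute, and fixing a wrong output takes 5 minutes. The expected time per item is then:

$$0.7 \times (1\text{ s} + 60\text{ s}) + 0.3 \times (1\text{ s} + 300\text{ s}) = 42.7\text{ s} + 90.3\text{ s} = 133\text{ s} \approx 2.2\text{ min} < 5\text{ min}$$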
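The same arithmetic can be wrapped in a small function, which makes it easy to try other accuracies or review times. This is a minimal sketch. `expected_seconds` is a hypothetical helper, and it follows the formula exactly: a wrong output costs the machine time plus the fix time, with no separate check time.

```python
def expected_seconds(machine_acc, machine_s, check_s, fix_s):
    """Expected time per item when a human reviews every machine output.

    Correct outputs cost machine time + check time; wrong outputs cost
    machine time + fix time (mirroring the formula above).
    """
    return machine_acc * (machine_s + check_s) + (1 - machine_acc) * (machine_s + fix_s)


human_only = 5 * 60  # a human alone needs 5 minutes per item
assisted = expected_seconds(machine_acc=0.7, machine_s=1, check_s=60, fix_s=5 * 60)
print(round(assisted, 1), assisted < human_only)  # 133.0 True
```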

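A seq2seq model has two parts: an encoder and a decoder.

The encoder is the part that reads and "understands" the input sequence.

The decoder takes the temporary memory produced by that understanding and uses it to generate the target sequence.

encoder('I love you') = temporary memory

decoder(temporary memory, start token / output of the previous step) = current output

- Decoder input: (start token) 我 爱 你
- Decoder output: 我 爱 你 (end token)

The decoder output is the decoder input shifted left by one position. At every step the decoder learns to predict the next token.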
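Building this input/output pair from a single target sentence only takes a one-position shift, a setup commonly called teacher forcing. Below is a minimal sketch. `<start>`, `<end>` and `teacher_forcing_pair` are placeholder names chosen for illustration. The notebook below uses the integer IDs 11 and 12 for these two tokens instead.

```python
START, END = "<start>", "<end>"  # placeholder token names, for illustration only


def teacher_forcing_pair(target_tokens):
    """Build (decoder_input, decoder_target) from one target sentence.

    decoder_input is the target prefixed with the start token;
    decoder_target is the target followed by the end token, so at each
    position the decoder is trained to predict the next token.
    """
    decoder_input = [START] + list(target_tokens)
    decoder_target = list(target_tokens) + [END]
    return decoder_input, decoder_target


dec_in, dec_out = teacher_forcing_pair(["我", "爱", "你"])
print(dec_in)   # ['<start>', '我', '爱', '你']
print(dec_out)  # ['我', '爱', '你', '<end>']
```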

In [3]:
# Task: the input is a sequence of odd numbers from 1 to 9.
# 1, 3, 7, 9 map to 2, 4, 8, 10; 5 maps to two 6s (6 6).

- Input: 1 7 7
  - Output: 2 8 8
- Input: 5 7
  - Output: 6 6 8
- Input: 1
  - Output: 2
- Input: 1 1 1 1
  - Output: 2 2 2 2

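Special tokens:

- 0 is used for padding
- 11 is the decoder's start token
- 12 is the decoder's end token

Token IDs: 0 (padding), 1-10 (numbers), 11 (start), 12 (end)

Vocabulary size = 13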
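Padding with 0 fits the model below, whose `Embedding(..., mask_zero=True)` layers treat ID 0 as padding and mask it out. This is a minimal NumPy sketch of right-padding a batch. The names `pad_batch`, `PAD`, `START`, `END` and `VOCAB_SIZE` are introduced here for illustration. The notebook itself pads with `tf.ragged.constant(...).to_tensor()`.

```python
import numpy as np

PAD, START, END = 0, 11, 12  # special token IDs from the list above
VOCAB_SIZE = 13              # valid IDs are 0..12


def pad_batch(sequences, pad_id=PAD):
    """Right-pad variable-length sequences with pad_id into one int32 matrix."""
    max_len = max(len(s) for s in sequences)
    batch = np.full((len(sequences), max_len), pad_id, dtype=np.int32)
    for i, s in enumerate(sequences):
        batch[i, :len(s)] = s
    return batch


batch = pad_batch([[1, 3, 5], [7], [9, 9]])
print(batch)
# [[1 3 5]
#  [7 0 0]
#  [9 9 0]]
```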

In [4]:
import numpy as np
import tensorflow as tf

In [9]:
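# Encoder: token IDs -> embeddings -> LSTM, keeping only the final LSTM states
encoder_input = tf.keras.layers.Input((None,), dtype=tf.int32)
x = encoder_input
x = tf.keras.layers.Embedding(13, 32, mask_zero=True)(x)  # ID 0 is masked as padding
x = tf.keras.layers.LSTM(32, return_state=True)(x)  # returns [output, h, c]
_, encoder_hidden, encoder_carry = x  # hidden state h and cell (carry) state c
encoder = tf.keras.Model(
    inputs=encoder_input,
    outputs=[encoder_hidden, encoder_carry]
)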

In [13]:
# encoder(tf.constant([
#     [1, 2],
#     [1, 3],
#     [1, 4]
# ]))

In [14]:
# Decoder: starts from the encoder's final states and predicts a token at every step
decoder_input = tf.keras.layers.Input((None,), dtype=tf.int32)
decoder_hidden_input = tf.keras.layers.Input((32,), dtype=tf.float32)  # encoder h
decoder_carry_input = tf.keras.layers.Input((32,), dtype=tf.float32)   # encoder c

x = decoder_input
x = tf.keras.layers.Embedding(13, 32, mask_zero=True)(x)
x = tf.keras.layers.LSTM(32, return_sequences=True)(  # one output per time step
    x,
    initial_state=[decoder_hidden_input, decoder_carry_input]
)
x = tf.keras.layers.Dense(13)(x)  # one score per vocabulary entry
x = tf.keras.layers.Activation('softmax')(x)

decoder = tf.keras.Model(
    inputs=[
        decoder_input,
        decoder_hidden_input,
        decoder_carry_input
    ],
    outputs=x
)

In [15]:
# Training model: the encoder's final (h, c) become the decoder's initial state
model_encoder_input = tf.keras.layers.Input((None,), dtype=tf.int32)
model_decoder_input = tf.keras.layers.Input((None,), dtype=tf.int32)

h, c = encoder(model_encoder_input)
outputs = decoder([model_decoder_input, h, c])

model = tf.keras.Model(
    inputs=[model_encoder_input, model_decoder_input],
    outputs=outputs
)

In [16]:
x0 = tf.constant([
    [1, 3, 5]
])
x1 = tf.constant([
    [11, 2, 4, 6, 6]
])
y = tf.constant([
    [2, 4, 6, 6, 12]
])

In [18]:
model([x0, x1])

<tf.Tensor: shape=(1, 5, 13), dtype=float32, numpy=
array([[[0.07671642, 0.07746661, 0.07570892, 0.07719516, 0.07682551,
         0.07719752, 0.07587711, 0.07541024, 0.07751805, 0.07759541,
         0.07775977, 0.07750116, 0.07722816],
        [0.07662804, 0.07743538, 0.07606794, 0.07681329, 0.0768034 ,
         0.07688133, 0.07663078, 0.07598786, 0.07685236, 0.07759423,
         0.07814002, 0.07705045, 0.07711499],
        [0.07674754, 0.07716187, 0.0762522 , 0.07715333, 0.07726897,
         0.07707006, 0.07612883, 0.07673416, 0.07721787, 0.07681563,
         0.07747332, 0.07720092, 0.07677533],
        [0.07741703, 0.07771051, 0.07592827, 0.07690234, 0.0773251 ,
         0.07696477, 0.07688358, 0.07663378, 0.07680194, 0.07655036,
         0.07755263, 0.07681946, 0.07651027],
        [0.07794312, 0.07806742, 0.07572433, 0.07679151, 0.07729454,
         0.07690705, 0.07735759, 0.07660859, 0.07661653, 0.0763388 ,
         0.07746013, 0.0765974 , 0.07629302]]], dtype=float32)>

In [19]:
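# The decoder already ends in softmax, so the loss receives probabilities
# (SparseCategoricalCrossentropy's default from_logits=False matches this).
model.compile(
    loss=tf.keras.losses.SparseCategoricalCrossentropy(),
    optimizer=tf.keras.optimizers.Adam()
)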

In [20]:
model.train_on_batch([x0, x1], y)

2.572266101837158
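A first loss of about 2.57 is roughly what an untrained model should give. The untrained output above puts about 0.077 (close to 1/13) on every class. Cross-entropy for a uniform guess over 13 classes is -ln(1/13) = ln 13. A quick check:

```python
import math

# Near-uniform softmax over 13 classes -> cross-entropy of about ln(13)
uniform_loss = math.log(13)
print(round(uniform_loss, 4))  # 2.5649
```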

In [27]:
model.train_on_batch([x0, x1], y)

2.529433488845825

In [42]:
def make_fake_data(batch_size=32):
    """Generate a random batch for the toy task, right-padded with 0."""
    x0, x1, y = [], [], []
    for _ in range(batch_size):
        length = np.random.randint(1, 4)  # input length 1-3
        ix0 = [np.random.choice([1, 3, 5, 7, 9]) for _ in range(length)]
        outs = []
        for x in ix0:
            if x == 5:
                outs.append(6)  # 5 maps to two 6s
                outs.append(6)
            else:
                outs.append(x + 1)
        ix1 = [11] + outs  # decoder input: start token + target
        iy = outs + [12]   # decoder target: target + end token
        x0.append(ix0)
        x1.append(ix1)
        y.append(iy)
    # Ragged Python lists -> dense tensors, padded with 0
    x0 = tf.ragged.constant(x0).to_tensor()
    x1 = tf.ragged.constant(x1).to_tensor()
    y = tf.ragged.constant(y).to_tensor()
    return x0, x1, y

In [43]:
x0, x1, y = make_fake_data(4)

In [48]:
for i in range(1000):
    x0, x1, y = make_fake_data(32)
    loss = model.train_on_batch([x0, x1], y)
    if i % 10 == 0:
        print(i, loss)

0 1.7171523571014404
10 1.3783189058303833
20 1.4399694204330444
30 1.565427303314209
40 1.1630637645721436
50 1.164230465888977
60 1.043911099433899
70 0.9607015252113342
80 0.7621504068374634
90 0.8628935813903809
100 0.6776946187019348
110 0.8877109289169312
120 0.742041289806366
130 0.6445074081420898
140 0.6282411813735962
150 0.5600652694702148
160 0.5990970730781555
170 0.7273343801498413
180 0.48921307921409607
190 0.617664098739624
200 0.45429527759552
210 0.389764666557312
220 0.39568212628364563
230 0.3505368232727051
240 0.31553593277931213
250 0.34726014733314514
260 0.37494605779647827
270 0.3676977753639221
280 0.23363129794597626
290 0.2494061440229416
300 0.3404213786125183
310 0.259177565574646
320 0.27650898694992065
330 0.3305889070034027
340 0.23245744407176971
350 0.1866666078567505
360 0.17823581397533417
370 0.15722131729125977
380 0.13178998231887817
390 0.20035308599472046
400 0.14655131101608276
410 0.15461179614067078
420 0.1921398937702179
430 0.11749550700

In [49]:
x0 = tf.constant([
    [1, 3, 5]
])
x1 = tf.constant([
    [11]
])

In [52]:
model.predict([x0, x1]).argmax(-1)

array([[2]])

In [53]:
x0 = tf.constant([
    [1, 3, 5]
])
x1 = tf.constant([
    [11, 2]
])

In [54]:
model.predict([x0, x1]).argmax(-1)

array([[2, 4]])

In [55]:
x0 = tf.constant([
    [1, 3, 5]
])
x1 = tf.constant([
    [11, 2, 4]
])

In [56]:
model.predict([x0, x1]).argmax(-1)

array([[2, 4, 6]])

In [57]:
x0 = tf.constant([
    [1, 3, 5]
])
x1 = tf.constant([
    [11, 2, 4, 6]
])

In [58]:
model.predict([x0, x1]).argmax(-1)

array([[2, 4, 6, 6]])

In [59]:
x0 = tf.constant([
    [1, 3, 5]
])
x1 = tf.constant([
    [11, 2, 4, 6, 6]
])

In [60]:
model.predict([x0, x1]).argmax(-1)

array([[ 2,  4,  6,  6, 12]])

In [61]:
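def predict(input_sequence=(1, 3, 5), max_len=20):
    """Greedy decoding: feed each predicted token back as the next decoder input."""
    x0 = tf.constant([list(input_sequence)])
    x1 = [11]  # start with the decoder start token
    for _ in range(max_len):  # cap the length in case 12 is never produced
        pred = model.predict([x0, tf.constant([x1])]).argmax(-1)
        pred = int(pred[-1][-1])  # prediction at the last decoder position
        print(pred)
        if pred == 12:  # end token
            break
        x1.append(pred)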

In [68]:
predict([5, 7, 9])

6
6
8
10
12
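The greedy loop in `predict` can be separated from Keras so its control flow can be checked without TensorFlow. This is a minimal sketch that assumes a `step_fn(encoder_input, decoder_prefix)` callback returning the next token ID. That callback is a hypothetical interface, not part of Keras. `oracle_step` is also a hypothetical stand-in for the trained model: it looks up the true answer, which only serves to exercise the loop.

```python
START, END = 11, 12  # decoder start / end token IDs used in this notebook


def greedy_decode(step_fn, encoder_input, max_len=20):
    """Greedy decoding with a length cap.

    step_fn(encoder_input, prefix) returns the token ID predicted after
    prefix (with the Keras model this would be
    model.predict([x0, tf.constant([prefix])]).argmax(-1)[0, -1]).
    """
    prefix = [START]
    outputs = []
    for _ in range(max_len):
        token = int(step_fn(encoder_input, prefix))
        if token == END:
            break
        outputs.append(token)
        prefix.append(token)
    return outputs


def oracle_step(encoder_input, prefix):
    """Hypothetical stand-in model: returns the true next token for the toy task."""
    answer = []
    for n in encoder_input:
        answer.extend([6, 6] if n == 5 else [n + 1])
    answer.append(END)
    return answer[len(prefix) - 1]


print(greedy_decode(oracle_step, [5, 7, 9]))  # [6, 6, 8, 10]
```

The `max_len` cap matters for a real model. A model that never emits the end token would otherwise loop forever.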
