# RNN: Word-Level and Matrix-Based

[Recurrent Neural Networks Tutorial, Part 2 – Implementing a RNN with Python, Numpy and Theano – WildML](http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-2-implementing-a-language-model-rnn-with-python-numpy-and-theano/)

In [1]:
import csv
import itertools
import operator
import numpy as np
import nltk
import sys
from datetime import datetime
import matplotlib.pyplot as plt
%matplotlib inline

In [2]:
!head data/re*

body
"I joined a new league this year and they have different scoring rules than I'm used to. It's a slight PPR league- .2 PPR. Standard besides 1 points for 15 yards receiving, .2 points per completion, 6 points per TD thrown, and some bonuses for rec/rush/pass yardage. My question is, is it wildly clear that QB has the highest potential for points? I put in the rules at a ranking site and noticed that top QBs had 300 points more than the top RB/WR. Would it be dumb not to grab a QB in the first round?"
"In your scenario, a person could just not run the mandatory background check on the buyer and still sell the gun to the felon. There's no way to enforce it. An honest seller is going to not sell the gun to them when they see they're a felon on the background check. A dishonest seller isn't going to run the check in the first place. No one is going to be honest enough to run the check, see they're a felon, and then all of a sudden immediately turn dishonest and say ""nah, you know wh

## Step-by-Step Walkthrough

### Data Preprocessing

In [3]:
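vocabulary_size = 8000 # vocabulary size
unknown_token = "UNKNOWN_TOKEN" # stands in for words outside the vocabulary
sentence_start_token = "SENTENCE_START" # marks the start of a sequence (sentence)
sentence_end_token = "SENTENCE_END" # marks the end of a sequence

# Read the data and add the start and end tokens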
print "Reading CSV file..."
with open('data/reddit-comments-2015-08.csv', 'rb') as f:
    reader = csv.reader(f, skipinitialspace=True)
    reader.next()
    # Split full comments into sentences
    sentences = itertools.chain(*[nltk.sent_tokenize(x[0].decode('utf-8').lower()) for x in reader])
    sentences = ["%s %s %s" % (sentence_start_token, x, sentence_end_token) for x in sentences]
print "Parsed %d sentences." % (len(sentences))

Reading CSV file...
Parsed 79170 sentences.


In [4]:
# Tokenize the sentences into words
tokenized_sentences = [nltk.word_tokenize(sent) for sent in sentences]

In [5]:
# Count the word frequencies
word_freq = nltk.FreqDist(itertools.chain(*tokenized_sentences))

In [6]:
# Get the most common words and build index to word and word to index vectors
vocab = word_freq.most_common(vocabulary_size - 1)
index_to_word = [x[0] for x in vocab]
index_to_word.append(unknown_token)
word_to_index = dict([(w,i) for i,w in enumerate(index_to_word)])

In [7]:
print "Using vocabulary size %d." % vocabulary_size
print "The least frequent word in our vocabulary is '%s' and appeared %d times." % (vocab[-1][0], vocab[-1][1])

Using vocabulary size 8000.
The least frequent word in our vocabulary is 'devoted' and appeared 10 times.


In [8]:
# Replace all words not in vocabulary with the unknown token
for i, sent in enumerate(tokenized_sentences):
    tokenized_sentences[i] = [w if w in word_to_index else unknown_token for w in sent]

In [9]:
print "\nExample sentence: '%s'" % sentences[0]
print "\nExample sentence after Pre-processing: '%s'" % tokenized_sentences[0]


Example sentence: 'SENTENCE_START i joined a new league this year and they have different scoring rules than i'm used to. SENTENCE_END'

Example sentence after Pre-processing: '[u'SENTENCE_START', u'i', u'joined', u'a', u'new', u'league', u'this', u'year', u'and', u'they', u'have', u'different', u'scoring', u'rules', u'than', u'i', u"'m", u'used', u'to', u'.', u'SENTENCE_END']'


In [10]:
# Create the training data
x_train = np.asarray([[word_to_index[w] for w in sent[:-1]] for sent in tokenized_sentences])
y_train = np.asarray([[word_to_index[w] for w in sent[1:]] for sent in tokenized_sentences])
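
To make the input/target shift concrete, here is a minimal sketch on a made-up sentence. The toy vocabulary `toy_word_to_index` is hypothetical and not part of the notebook; the block is written so that it also runs under Python 3.

```python
# Hypothetical toy vocabulary, for illustration only
toy_word_to_index = {"SENTENCE_START": 0, "i": 1, "like": 2, "tea": 3, "SENTENCE_END": 4}
toy_sentence = ["SENTENCE_START", "i", "like", "tea", "SENTENCE_END"]

# Inputs drop the last token and targets drop the first token,
# so toy_y[t] is the word that follows toy_x[t]
toy_x = [toy_word_to_index[w] for w in toy_sentence[:-1]]
toy_y = [toy_word_to_index[w] for w in toy_sentence[1:]]

print(toy_x)  # [0, 1, 2, 3]
print(toy_y)  # [1, 2, 3, 4]
```

Each training pair therefore asks the network to predict the next word at every position of the sentence.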

In [11]:
x_train.shape

(79170,)

In [12]:
y_train.shape

(79170,)

### Initialization

In [13]:
class RNNNumpy:
    
    def __init__(self, word_dim, hidden_dim=100, bptt_truncate=4):
        # Assign instance variables
        self.word_dim = word_dim # vocabulary size (length of each one-hot input)
        self.hidden_dim = hidden_dim # number of hidden units
        self.bptt_truncate = bptt_truncate
        # Randomly initialize the network parameters
        # U: input-to-hidden, V: hidden-to-output, W: hidden-to-hidden
        self.U = np.random.uniform(-np.sqrt(1./word_dim), np.sqrt(1./word_dim), (hidden_dim, word_dim))
        self.V = np.random.uniform(-np.sqrt(1./hidden_dim), np.sqrt(1./hidden_dim), (word_dim, hidden_dim))
        self.W = np.random.uniform(-np.sqrt(1./hidden_dim), np.sqrt(1./hidden_dim), (hidden_dim, hidden_dim))

### Forward Propagation

In [22]:
def forward_propagation(self, x):
    # The total number of time steps
    T = len(x)
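    # During forward propagation we save all hidden states in s because we need them later
    # We add one additional element for the initial hidden state, which we set to 0
    # Each stored hidden state, together with the next input, determines the next hidden state and output
    # Each time step takes one word (or character) as input, i.e. a one-hot vector
    s = np.zeros((T+1, self.hidden_dim))
    # s[-1] is the hidden state "before" the first word (there is none, so it is all zeros)
    s[-1] = np.zeros(self.hidden_dim)
    # The outputs at each time step
    o = np.zeros((T, self.word_dim))
    # For each time step, i.e. each word (or character) in the training sequence...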
    for t in np.arange(T):
        # Note that we are indexing U by x[t].
        # This is the same as multiplying U with a one-hot vector:
        # the one-hot vector has shape (word_dim, 1), so U times it has shape (hidden_dim, 1)
        # and is exactly column x[t] of U
        s[t] = np.tanh(self.U[:,x[t]] + self.W.dot(s[t-1])) # first term equals U times the one-hot x[t]; second term uses the previous hidden state
        #print s[t].shape
        # o[t] has length word_dim, so o has shape (sequence length, word_dim)
        o[t] = softmax(self.V.dot(s[t]))
    return [o, s]

In [23]:
RNNNumpy.forward_propagation = forward_propagation
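
The comment about indexing `U` by `x[t]` can be checked directly. The following is a small sketch with made-up sizes (`hidden_dim = 4`, `word_dim = 6`) rather than the notebook's real dimensions; it only illustrates that selecting a column equals multiplying by a one-hot vector.

```python
import numpy as np

rng = np.random.RandomState(0)
hidden_dim, word_dim = 4, 6          # small sizes chosen for illustration
U = rng.uniform(-1, 1, (hidden_dim, word_dim))

word_id = 2
one_hot = np.zeros(word_dim)
one_hot[word_id] = 1.0

# Multiplying U by a one-hot vector selects a single column of U
product = U.dot(one_hot)
column = U[:, word_id]
print(np.allclose(product, column))  # True
```

Indexing avoids a full matrix-vector product, which matters when `word_dim` is as large as the vocabulary.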

In [16]:
def softmax(x):
    return np.exp(x)/np.sum(np.exp(x),axis=0)
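
The `softmax` above can overflow when the scores are large, because `np.exp` is applied to the raw values. A common remedy, shown here as a sketch and not as part of the original notebook, is to subtract the maximum score first. The name `stable_softmax` is introduced only for this example.

```python
import numpy as np

def stable_softmax(x):
    # Subtracting the maximum leaves the result unchanged mathematically
    # but keeps np.exp from overflowing on large scores
    shifted = x - np.max(x)
    exp_x = np.exp(shifted)
    return exp_x / np.sum(exp_x)

scores = np.array([1000.0, 1001.0, 1002.0])
probs = stable_softmax(scores)   # finite values that sum to 1
```

For moderate scores both versions give the same probabilities, so swapping one for the other should not change the results shown in this notebook.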

In [17]:
def predict(self, x):
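    # Perform forward propagation and return index of the highest score
    # (each position gets a probability distribution over the whole vocabulary)
    o, s = self.forward_propagation(x)
    # o has shape (sequence length, word_dim); each row corresponds to one position.
    # axis=1 takes the argmax across the columns of each row, giving one word ID per position
    # (axis=0 would instead take the argmax down each column)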
    return np.argmax(o, axis=1)

In [18]:
RNNNumpy.predict = predict

In [48]:
tt = model.forward_propagation([x_train[10][1]])

In [50]:
len(tt)

2

In [52]:
tt[0].shape

(1, 8000)

In [55]:
tt[-1].shape

(2, 100)

In [24]:
np.random.seed(10)
model = RNNNumpy(vocabulary_size)
o, s = model.forward_propagation(x_train[10])
print o.shape
print o

(45, 8000)
[[ 0.00012408  0.0001244   0.00012603 ...,  0.00012515  0.00012488
   0.00012508]
 [ 0.00012536  0.00012582  0.00012436 ...,  0.00012482  0.00012456
   0.00012451]
 [ 0.00012387  0.0001252   0.00012474 ...,  0.00012559  0.00012588
   0.00012551]
 ..., 
 [ 0.00012414  0.00012455  0.0001252  ...,  0.00012487  0.00012494
   0.0001263 ]
 [ 0.0001252   0.00012393  0.00012509 ...,  0.00012407  0.00012578
   0.00012502]
 [ 0.00012472  0.0001253   0.00012487 ...,  0.00012463  0.00012536
   0.00012665]]


In [25]:
s.shape

(46, 100)

In [26]:
s

array([[ 0.00606684,  0.00944145,  0.00682396, ..., -0.00758421,
         0.0014529 , -0.00903755],
       [-0.00343015, -0.00192321, -0.00149617, ...,  0.00043284,
         0.00885342,  0.00129919],
       [ 0.01200128, -0.01433878, -0.00834751, ..., -0.00990714,
         0.00733898,  0.00026428],
       ..., 
       [-0.00194572,  0.01046511,  0.0075063 , ..., -0.00564678,
         0.00409187,  0.00470521],
       [ 0.00294306, -0.00733632,  0.00389336, ...,  0.00577147,
        -0.01456043, -0.00221772],
       [ 0.        ,  0.        ,  0.        , ...,  0.        ,
         0.        ,  0.        ]])

In [27]:
# Index of the highest probability in each of the 45 rows, one per word
predictions = model.predict(x_train[10])
print predictions.shape
print predictions
print " ".join([index_to_word[i] for i in predictions])

(45,)
[1284 5221 7653 7430 1013 3562 7366 4860 2212 6601 7299 4556 2481  238 2539
   21 6548  261 1780 2005 1810 5376 4146  477 7051 4832 4991  897 3485   21
 7291 2007 6006  760 4864 2182 6569 2800 2752 6821 4437 7021 7875 6912 3575]
students shortly museum ruining background hunt madden wr chicken immoral hadith lighter rude questions achieve but sells making fill arguing purchase grows feat head lube winners downside states steal but researchers christian utilize fire domain resolution 10-15 genuinely magical worship в branches memes node preferred


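### Loss Function

Purpose: measure the prediction error.  
We use the cross-entropy loss:
$$L(y,o) = - \frac {1} {N} \sum_{n \in N} y_n \log o_n$$  
Because $y$ is one-hot, only the log-probability assigned to the correct word contributes, so the further $o$ is from $y$, the larger the loss.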
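
As a worked illustration of the formula, the sketch below evaluates the loss for three made-up prediction matrices over a 3-word vocabulary. The helper `cross_entropy` and all numbers are assumptions for this example only.

```python
import numpy as np

def cross_entropy(y_ids, o):
    # y_ids: index of the correct class at each step
    # o: (steps, classes) matrix of predicted probabilities
    picked = o[np.arange(len(y_ids)), y_ids]
    return -np.mean(np.log(picked))

y_ids = np.array([0, 2])
good = np.array([[0.9, 0.05, 0.05], [0.1, 0.1, 0.8]])
bad = np.array([[0.2, 0.4, 0.4], [0.45, 0.45, 0.1]])
uniform = np.full((2, 3), 1.0 / 3)

loss_good = cross_entropy(y_ids, good)
loss_bad = cross_entropy(y_ids, bad)
loss_uniform = cross_entropy(y_ids, uniform)  # equals log(3)
```

Confident correct predictions give a small loss, confident wrong ones a large loss, and uniform predictions over $C$ classes give exactly $\log C$, which is the baseline used for comparison with the untrained model.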

In [28]:
def calculate_total_loss(self, x, y):
    L = 0
    # For each sentence, i.e. each training example...
    for i in np.arange(len(y)):
        o, s = self.forward_propagation(x[i])
        # We only care about our prediction of the "correct" words
        # Row t of o is the distribution at step t; pick column y[i][t] from it.
        # With one-hot targets only that entry contributes to the loss, and we want it to be as large as possible.
        # The result holds one probability per step: the probability assigned to the true next word.
        correct_word_predictions = o[np.arange(len(y[i])), y[i]]
        # Add to the loss based on how off we were
        # The target value is 1 at the correct word, so each term is -log(o)
        # Sum over the steps of this sequence
        L += -1 * np.sum(np.log(correct_word_predictions))
    return L

In [29]:
def calculate_loss(self, x, y):
    # Divide the total loss by the number of training words
    # N is the total number of target words (the sum of the sequence lengths),
    # not the number of sentences
    N = sum(len(y_i) for y_i in y)
    return self.calculate_total_loss(x,y)/N

In [30]:
RNNNumpy.calculate_total_loss = calculate_total_loss
RNNNumpy.calculate_loss = calculate_loss

In [31]:
# Limit to 1000 examples to save time
# With random weights each word gets a probability of about 1/C (C = vocabulary size),
# so the loss should be about -(1/N) * N * log(1/C) = log(C)
print "Expected Loss for random predictions: %f" % np.log(vocabulary_size)
print "Actual loss: %f" % model.calculate_loss(x_train[:1000], y_train[:1000])

Expected Loss for random predictions: 8.987197
Actual loss: 8.987440


### Stochastic Gradient Descent and Backpropagation

- SGD: Stochastic Gradient Descent
- BPTT: BackPropagation Through Time

In [32]:
def bptt(self, x, y):
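    # Sequence length, i.e. the number of time steps for this training example
    T = len(y)
    # Forward propagation
    o, s = self.forward_propagation(x)
    # Accumulators for the gradients of the loss with respect to U, V and W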
    dLdU = np.zeros(self.U.shape)
    dLdV = np.zeros(self.V.shape)
    dLdW = np.zeros(self.W.shape)
    delta_o = o
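    # Same indexing as correct_word_predictions = o[np.arange(len(y[i])), y[i]] in the loss
    # Subtract 1 from the predicted probability of each target (next) word:
    # for softmax with cross-entropy, the gradient with respect to the scores is o - y
    # (delta_o is the same array as o, so o is modified in place)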
    delta_o[np.arange(len(y)), y] -= 1.
    # for each output backwards
    for t in np.arange(T)[::-1]:
        dLdV += np.outer(delta_o[t], s[t].T) 
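        # Initial delta calculation
        # Same delta as in RNN-add-iamtrask: error times the derivative of tanh
        # Only the error from the output at step t enters here, not the error arriving from later hidden states
        delta_t = self.V.T.dot(delta_o[t]) * (1 - (s[t] ** 2))
        # Backpropagation through time (for at most self.bptt_truncate steps)
        # Walk back from step t to step max(0, t - self.bptt_truncate)
        # Truncation caps how far back the error is propagated, which bounds the cost for long sequences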
        for bptt_step in np.arange(max(0, t-self.bptt_truncate), t+1)[::-1]:
            # print "Backpropagation step t=%d bptt step=%d " % (t, bptt_step)
            dLdW += np.outer(delta_t, s[bptt_step-1])
            dLdU[:, x[bptt_step]] += delta_t
            # Update delta for next step
            # i.e. propagate the error one step further back in time
            delta_t = self.W.T.dot(delta_t) * (1 - s[bptt_step-1] ** 2)
    return [dLdU, dLdV, dLdW]

RNNNumpy.bptt = bptt
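
The step `delta_o[np.arange(len(y)), y] -= 1.` relies on the fact that, for softmax followed by cross-entropy, the gradient with respect to the scores is the softmax output minus the one-hot target. The sketch below checks this numerically on an arbitrary score vector; the helper names and values are made up for the example.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def loss(z, target):
    # Cross-entropy of the softmax output against a single correct class
    return -np.log(softmax(z)[target])

z = np.array([0.5, -1.0, 2.0, 0.1])   # arbitrary scores for illustration
target = 2

# Analytic gradient: softmax output with 1 subtracted at the target index
analytic = softmax(z)
analytic[target] -= 1.0

# Central-difference estimate of the same gradient
h = 1e-5
numeric = np.zeros_like(z)
for k in range(len(z)):
    step = np.zeros_like(z)
    step[k] = h
    numeric[k] = (loss(z + step, target) - loss(z - step, target)) / (2 * h)

max_diff = np.max(np.abs(analytic - numeric))
```

The two gradients agree up to the accuracy of the finite-difference estimate, which is the same idea the gradient check applies to `U`, `V` and `W`.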

In [33]:
o.shape

(45, 8000)

In [34]:
o[np.arange(len(y_train[10])), y_train[10]]

array([ 0.00012524,  0.00012446,  0.00012589,  0.00012655,  0.00012448,
        0.00012524,  0.0001237 ,  0.00012588,  0.00012445,  0.00012516,
        0.00012504,  0.00012446,  0.00012516,  0.00012481,  0.00012476,
        0.00012493,  0.00012449,  0.00012591,  0.00012496,  0.00012457,
        0.00012577,  0.00012439,  0.00012533,  0.00012561,  0.00012543,
        0.00012421,  0.00012472,  0.00012406,  0.00012533,  0.00012504,
        0.00012509,  0.00012479,  0.00012569,  0.00012517,  0.00012556,
        0.00012576,  0.00012605,  0.00012541,  0.00012539,  0.00012501,
        0.00012434,  0.00012582,  0.0001252 ,  0.00012432,  0.0001253 ])

In [46]:
np.outer(np.array([1,2,3]), np.array([4,5,6]))

array([[ 4,  5,  6],
       [ 8, 10, 12],
       [12, 15, 18]])

### Gradient Checking

In [35]:
def gradient_check(self, x, y, h=0.001, error_threshold=0.01):
    # Calculate the gradients using backpropagation. We want to check if these are correct
    bptt_gradients = self.bptt(x, y)
    # List of all parameters we want to check
    model_parameters = ['U', 'V', 'W']
    # Gradient check for each parameter
    for pidx, pname in enumerate(model_parameters):
        # Get the actual parameter value from the model, e.g. model.W
        # From the operator.attrgetter documentation:
        # Return a callable object that fetches the given attribute(s) from its operand.
        # After f = attrgetter('name'), the call f(r) returns r.name.
        # After g = attrgetter('name', 'date'), the call g(r) returns (r.name, r.date).
        # After h = attrgetter('name.first', 'name.last'), the call h(r) returns
        # (r.name.first, r.name.last).
        # So operator.attrgetter(pname)(self) returns the attribute named pname, e.g. self.U, the U matrix itself
        parameter = operator.attrgetter(pname)(self)
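        # For testing only; not in the original code
        test = operator.attrgetter(pname)
        # np.prod multiplies the elements together (along rows or columns if axis is given)
        print "Performing gradient check for parameter %s with size %d." % (pname, np.prod(parameter.shape))
        # Iterate over each element of the parameter matrix, e.g. (0, 0), (0, 1), ...
        # Each multi_index is a position in the matrix, one per element; readwrite allows elements to be modified during iteration
        # parameter is the weight matrix itself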
        # [Iterating Over Arrays — NumPy v1.13 Manual](https://docs.scipy.org/doc/numpy/reference/arrays.nditer.html)
        # [nditer: iterating over multi-dimensional numpy.ndarray arrays - CSDN.NET blog](http://blog.csdn.net/lanchunhui/article/details/55657135)
        it = np.nditer(parameter, flags=['multi_index'], op_flags=['readwrite'])
        while not it.finished:
            ix = it.multi_index
            # Save the original value so we can reset it later
            original_value = parameter[ix] # a single number from the weight matrix; ix is its position, e.g. (0, 0)
            # Estimate the gradient using (f(x+h) - f(x-h))/(2*h)
            parameter[ix] = original_value + h
            gradplus = self.calculate_total_loss([x], [y])
            parameter[ix] = original_value - h
            gradminus = self.calculate_total_loss([x], [y])
            estimated_gradient = (gradplus - gradminus)/(2*h)
            # Reset parameter to original value
            parameter[ix] = original_value
            # The gradient for this parameter calculated using backpropagation
            # i.e. the derivative of L with respect to this element of the weight matrix
            backprop_gradient = bptt_gradients[pidx][ix]
            # Calculate the relative error: (|x - y|/(|x| + |y|))
            relative_error = np.abs(backprop_gradient - estimated_gradient)\
            /(np.abs(backprop_gradient) + np.abs(estimated_gradient))
            # If the error is too large, fail the gradient check
            if relative_error > error_threshold:
                print "Gradient Check ERROR: parameter=%s ix=%s" % (pname, ix)
                print "+h Loss: %f" % gradplus
                print "-h Loss: %f" % gradminus
                print "Estimated_gradient: %f" % estimated_gradient
                print "Backpropagation gradient: %f" % backprop_gradient
                print "Relative Error: %f" % relative_error
                return
            it.iternext()
        print "Gradient check for parameter %s passed." % (pname)
    return bptt_gradients, parameter, it, test, self, original_value

RNNNumpy.gradient_check = gradient_check

In [36]:
# smaller vocabulary size for checking
grad_check_vocab_size = 100 # word dim
np.random.seed(10)
model = RNNNumpy(grad_check_vocab_size, 10, bptt_truncate=1000) # 10 hidden dim
bptt_gradients, parameter, it, test, test_self, test_value = model.gradient_check([0,1,2,3],[1,2,3,4])

Performing gradient check for parameter U with size 1000.




Gradient check for parameter U passed.
Performing gradient check for parameter V with size 1000.
Gradient check for parameter V passed.
Performing gradient check for parameter W with size 100.
Gradient check for parameter W passed.
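
To see what the relative-error test in `gradient_check` accepts and rejects, here is a small sketch on a function with a known derivative. The function `f`, the point `w`, and the deliberately wrong gradient are all made up for illustration; only the central-difference estimate and the relative-error formula mirror the code above.

```python
import numpy as np

def relative_error(a, b):
    # Same formula as in gradient_check: |a - b| / (|a| + |b|)
    return np.abs(a - b) / (np.abs(a) + np.abs(b))

def f(w):
    return np.sum(w ** 3)

w = np.array([1.0, -2.0, 0.5])
h = 0.001
k = 1                                  # element being checked

# Central difference: (f(w + h) - f(w - h)) / (2h) on one element
w_plus, w_minus = w.copy(), w.copy()
w_plus[k] += h
w_minus[k] -= h
estimated = (f(w_plus) - f(w_minus)) / (2 * h)

correct_grad = 3 * w[k] ** 2           # true derivative of w**3
wrong_grad = 2 * w[k]                  # deliberately incorrect "backprop" value

err_ok = relative_error(correct_grad, estimated)
err_bad = relative_error(wrong_grad, estimated)
```

With the default `error_threshold=0.01`, the correct gradient passes comfortably while the wrong one fails, which is how a bug in `bptt` would show up.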


In [194]:
test_value

0.27474631554415679

#### `np.nditer`

In [139]:
np.nditer?

In [138]:
it

<numpy.nditer at 0x7f03dd7a5c10>

In [141]:
a

array([[0, 1, 2],
       [3, 4, 5]])

In [143]:
a.T

array([[0, 3],
       [1, 4],
       [2, 5]])

In [140]:
a = np.arange(6).reshape(2,3)
for x in np.nditer(a):
    print x,

0 1 2 3 4 5


In [142]:
a = np.arange(6).reshape(2,3)
for x in np.nditer(a.T):
    print x,

0 1 2 3 4 5


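By default `nditer` uses `order='K'`, which follows the array's memory layout as closely as possible rather than its logical index order. `a.T` is only a view on the same memory as `a`, so both are visited in the same order. Iterating this way stays consistent with the memory layout and makes element access more efficient.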

In [147]:
for x in np.nditer(a.T.copy(order='C')):
    print x,

0 3 1 4 2 5


##### Controlling Iteration Order

`for x in np.nditer(a, order='F')`: Fortran order, i.e. column-major;  
`for x in np.nditer(a.T, order='C')`: C order, i.e. row-major.

In [149]:
for x in np.nditer(a, order='F'):
    print x,

0 3 1 4 2 5


In [151]:
for x in np.nditer(a.T, order='C'):
    print x,

0 3 1 4 2 5


##### Modifying Array Values

By default, `nditer` treats the array being iterated as read-only. To modify elements while iterating, you must specify the `readwrite` or `writeonly` mode through `op_flags`.

In [154]:
a

array([[0, 1, 2],
       [3, 4, 5]])

In [160]:
for x in np.nditer(a, op_flags=['readwrite']):
    x[...] = 2 * x

In [161]:
a

array([[ 0,  4,  8],
       [12, 16, 20]])

In [162]:
a[...] # all elements

array([[ 0,  4,  8],
       [12, 16, 20]])

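##### Using an External Loop

The `external_loop` flag hands the innermost one-dimensional loop to the caller: the iterator yields one-dimensional chunks instead of single elements, so NumPy's vectorized operations can work on larger pieces of data at a time.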

In [164]:
a = np.arange(6).reshape(2,3)
a

array([[0, 1, 2],
       [3, 4, 5]])

In [168]:
# Note: each item is now a 1-D array rather than a single element
for x in np.nditer(a, flags=['external_loop']):
    print x,

[0 1 2 3 4 5]


In [170]:
for x in np.nditer(a, flags=['external_loop'], order='F'):
    print x,

[0 3] [1 4] [2 5]


In [177]:
np.nditer?

In [174]:
# multi_index only tracks the index; the iteration order is unchanged (C order here, not F)
for x in np.nditer(a, flags=['multi_index']):
    print x,

0 1 2 3 4 5


In [175]:
# Flags used in the original gradient_check code
for x in np.nditer(a, flags=['multi_index'], op_flags=['readwrite']):
    print x,

0 1 2 3 4 5


##### Tracking a Single Index or a Multi-Index

In [178]:
a = np.arange(6).reshape(2,3)
a

array([[0, 1, 2],
       [3, 4, 5]])

In [181]:
it_test = np.nditer(a, flags=['f_index'])

In [185]:
while not it_test.finished:
    print "%d <%d>" % (it_test[0], it_test.index),
    it_test.iternext()

0 <0> 1 <2> 2 <4> 3 <1> 4 <3> 5 <5>


In [188]:
# As used in the original code
it_raw = np.nditer(a, flags=['multi_index'])
while not it_raw.finished:
    print "%d <%s>" % (it_raw[0], it_raw.multi_index),
    it_raw.iternext()

0 <(0, 0)> 1 <(0, 1)> 2 <(0, 2)> 3 <(1, 0)> 4 <(1, 1)> 5 <(1, 2)>


#### `operator.attrgetter`

In [132]:
test_self.U.shape, test_self.V.shape, test_self.W.shape

((10, 100), (100, 10), (10, 10))

In [118]:
parameter.shape # after the loop finishes, parameter is W (the last one checked)

(10, 10)

In [110]:
len(bptt_gradients)

3

In [116]:
bptt_gradients[0].shape, bptt_gradients[1].shape, bptt_gradients[2].shape

((10, 100), (100, 10), (10, 10))

In [101]:
model_parameters = ['U', 'V', 'W']
# Gradient check for each parameter
for pidx, pname in enumerate(model_parameters):
    print pidx, pname

0 U
1 V
2 W


### SGD Implementation

Two steps:   
- A function `sgd_step` that calculates the gradients and performs the updates for one batch (here, a single training example). 
- An outer loop that iterates through the training set and adjusts the learning rate.
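
The two steps can be sketched on a toy one-dimensional problem before looking at the RNN version. Everything here (the quadratic loss, the starting point, the deliberately too-large initial learning rate of 1.1) is made up for illustration; only the pattern of "one update step" plus "outer loop that halves the learning rate when the loss rises" mirrors the notebook.

```python
def sgd_step(w, grad_fn, learning_rate):
    # One update: move against the gradient
    return w - learning_rate * grad_fn(w)

def train(w, loss_fn, grad_fn, learning_rate=1.1, nepoch=10):
    losses = []
    for epoch in range(nepoch):
        losses.append(loss_fn(w))
        # Halve the learning rate whenever the loss went up
        if len(losses) > 1 and losses[-1] > losses[-2]:
            learning_rate *= 0.5
        w = sgd_step(w, grad_fn, learning_rate)
    return w, losses, learning_rate

loss_fn = lambda w: (w - 3.0) ** 2
grad_fn = lambda w: 2.0 * (w - 3.0)
w_final, losses, final_lr = train(0.0, loss_fn, grad_fn)
```

The first step overshoots and the loss rises, the learning rate is halved once, and the iterate then converges towards the minimum at 3.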

In [37]:
# Performs one step of SGD
def numpy_sgd_step(self, x, y, learning_rate):
    # Calculate the gradients
    dLdU, dLdV, dLdW = self.bptt(x, y)
    # Change parameters according to gradients and learning rate
    self.U -= learning_rate * dLdU
    self.V -= learning_rate * dLdV
    self.W -= learning_rate * dLdW

RNNNumpy.sgd_step = numpy_sgd_step

In [39]:
# Outer SGD Loop
# - model: The RNN model instance
# - X_train: The training data set
# - y_train: The training data labels
# - learning_rate: Initial learning rate for SGD
# - nepoch: Number of times to iterate through the complete dataset
# - evaluate_loss_after: Evaluate the loss after this many epochs
def train_with_sgd(model, X_train, y_train, learning_rate=0.005, nepoch=100, evaluate_loss_after=5):
    # Keep track of the losses
    losses = []
    num_examples_seen = 0
    for epoch in range(nepoch):
        # Optionally evaluate the loss
        if (epoch % evaluate_loss_after == 0):
            loss = model.calculate_loss(X_train, y_train)
            losses.append((num_examples_seen, loss))
            time = datetime.now().strftime('%Y-%m-%d %H:%M:%S')
            print("%s: Loss after num_examples_seen=%d epoch=%d: %f" % (time, num_examples_seen, epoch, loss))
            # Adjust the learning rate if the loss increases:
            # once more than one loss is recorded and the latest is larger than the previous one, halve the learning rate
            if (len(losses) > 1 and losses[-1][1] > losses[-2][1]):
                learning_rate = learning_rate * 0.5
                print("Setting learning rate to %f" % learning_rate)
            sys.stdout.flush()
        # for each training examples    
        for i in range(len(y_train)):
            # One SGD step
            model.sgd_step(X_train[i], y_train[i], learning_rate)
            num_examples_seen += 1
    # Return the recorded losses so that callers can inspect or plot them
    return losses

np.random.seed(10)
model = RNNNumpy(vocabulary_size)
%timeit model.sgd_step(x_train[10], y_train[10], 0.005)

1 loop, best of 3: 484 ms per loop


In [177]:
np.random.seed(10)
# Test on a small subset of the data
model = RNNNumpy(vocabulary_size)
losses = train_with_sgd(model, x_train[:100], y_train[:100], nepoch=10, evaluate_loss_after=1)

2017-07-07 10:56:57: Loss after num_examples_seen=0 epoch=0: 8.987425
2017-07-07 10:57:35: Loss after num_examples_seen=100 epoch=1: 8.976270
2017-07-07 10:58:03: Loss after num_examples_seen=200 epoch=2: 8.960212
2017-07-07 10:58:34: Loss after num_examples_seen=300 epoch=3: 8.930430
2017-07-07 10:59:09: Loss after num_examples_seen=400 epoch=4: 8.862264
2017-07-07 10:59:39: Loss after num_examples_seen=500 epoch=5: 6.913570
2017-07-07 11:00:10: Loss after num_examples_seen=600 epoch=6: 6.302493
2017-07-07 11:00:42: Loss after num_examples_seen=700 epoch=7: 6.014995
2017-07-07 11:01:14: Loss after num_examples_seen=800 epoch=8: 5.833877
2017-07-07 11:01:47: Loss after num_examples_seen=900 epoch=9: 5.710718


### Generating Text

In [160]:
x_length = [len(x) for x in x_train]

In [214]:
def generate_sentence(model):
    # Start with start token
    new_sentence = [word_to_index[sentence_start_token]]
    # Repeat until end token
    while new_sentence[-1] != word_to_index[sentence_end_token]:
        #print new_sentence
        next_word_probs, _ = model.forward_propagation(new_sentence)
        #sampled_word = word_to_index[unknown_token]
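        # next_word_probs gains one row per word generated so far, so we only use the last row
        # When the model has had little training, use the random sampling below instead of argmax
        #sampled_word = np.argmax(next_word_probs[-1])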
        sampled_word = word_to_index[unknown_token]
        #print sampled_word
        # don't want unknown words
        while sampled_word == word_to_index[unknown_token]:
            # Sample a word at random from the predicted distribution
            samples = np.random.multinomial(1, next_word_probs[-1])
            sampled_word = np.argmax(samples)
        new_sentence.append(sampled_word)
        #print new_sentence
    sentence_str = [index_to_word[x] for x in new_sentence[1:-1]]
    return sentence_str

In [215]:
num_sentences = 10
senten_min_length = 7

for i in range(num_sentences):
    sent = []
    # want long sentences
    while len(sent) < senten_min_length:
        sent = generate_sentence(model)
    print " ".join(sent)

having simply is roughly wait players ) wiring ratio and shoulders .
be gorgeous n't his rebel do the for having
during new the only the to which used complex ignorance they son .
penance gave who the . some right excel una your experiment instagram our SENTENCE_START looking in & 's but picking above not slightly to why it '' .
learn which 3 and events would think on get documentation made the or the the to the into to it some was the maker , .
decision few had god very the possibility all 20th i .
jerk felon the the , the have no people complaints to going a 's in the the
wo rugby 'm meets abilities to drinking per used make going of having that ! decide merits future tourist ) home hh be jews and .
tokens downvoting here the want having this the prison . other the anyone on with run heard answered . of for than frustration meant the which so-called people the be a ,
shaped colours . the or had the original do friends them to do players the back had the have and coefficient a the n't

#### `np.random.multinomial`

In [67]:
np.random.multinomial(100, [1.0 / 3, 2.0 / 3])  # RIGHT

array([35, 65])

In [86]:
np.argmax(np.random.multinomial(100, [1.0 / 3, 2.0 / 3]))

1

In [105]:
np.random.multinomial(1, next_word_probs[0][0])

array([0, 0, 0, ..., 0, 0, 0])

In [100]:
np.random.multinomial(1, [1.0 / 3, 2.0 / 3])

array([1, 0])

In [103]:
np.argmax(next_word_probs[0][0])

6

In [107]:
np.argmax(np.random.multinomial(1, next_word_probs[0][0]))

6372

In [70]:
sum(next_word_probs[0][0])

0.99999999999999767

In [37]:
np.random.multinomial(1, [1/6.]*6)

array([0, 1, 0, 0, 0, 0])

In [38]:
[1/6.]*6

[0.16666666666666666,
 0.16666666666666666,
 0.16666666666666666,
 0.16666666666666666,
 0.16666666666666666,
 0.16666666666666666]
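
Putting these experiments together, the sampling step in `generate_sentence` draws a one-hot vector with `np.random.multinomial(1, probs)`, converts it to an index with `np.argmax`, and redraws while the index is the unknown token. The sketch below repeats that logic on a made-up four-word distribution; `probs`, `unknown_id` and `sample_known_word` are hypothetical names used only here.

```python
import numpy as np

rng = np.random.RandomState(0)
probs = np.array([0.1, 0.2, 0.6, 0.1])   # made-up next-word distribution
unknown_id = 3                            # hypothetical index of the unknown token

def sample_known_word(probs, unknown_id, rng):
    # Draw a one-hot sample, take its index, and redraw if it is the unknown token
    sampled = unknown_id
    while sampled == unknown_id:
        one_hot = rng.multinomial(1, probs)
        sampled = int(np.argmax(one_hot))
    return sampled

draws = [sample_known_word(probs, unknown_id, rng) for _ in range(2000)]
counts = np.bincount(draws, minlength=len(probs))
```

The unknown token never appears in the draws, and the remaining words occur roughly in proportion to their probabilities, unlike `np.argmax(probs)`, which would always return the same word.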

### Retraining and Generating Again

In [221]:
np.random.seed(10)
# Test on a subset of the data (10,000 sentences)
model = RNNNumpy(vocabulary_size)
losses = train_with_sgd(model, x_train[:10000], y_train[:10000], nepoch=10, evaluate_loss_after=5)

2017-07-07 11:31:55: Loss after num_examples_seen=0 epoch=0: 8.987458
2017-07-07 13:36:31: Loss after num_examples_seen=50000 epoch=5: 5.167204


In [230]:
def generate_sentence(model):
    # Start with start token
    new_sentence = [word_to_index[sentence_start_token]]
    # Repeat until end token
    while new_sentence[-1] != word_to_index[sentence_end_token]:
        #print new_sentence
        next_word_probs, _ = model.forward_propagation(new_sentence)
        #sampled_word = word_to_index[unknown_token]
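        # next_word_probs gains one row per word generated so far, so we only use the last row
        # When the model has had little training, use the random sampling below instead of argmax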
        #sampled_word = np.argmax(next_word_probs[-1])
        sampled_word = word_to_index[unknown_token]
        #print sampled_word
        # don't want unknown words
        while sampled_word == word_to_index[unknown_token]:
            # Sample a word at random from the predicted distribution
            samples = np.random.multinomial(1, next_word_probs[-1])
            sampled_word = np.argmax(samples)
        new_sentence.append(sampled_word)
        #print new_sentence
    sentence_str = [index_to_word[x] for x in new_sentence[1:-1]]
    return sentence_str

In [231]:
num_sentences = 10
senten_min_length = 7

for i in range(num_sentences):
    sent = []
    # want long sentences
    while len(sent) < senten_min_length:
        sent = generate_sentence(model)
    print " ".join(sent)

that was do the flow advantage walks was .
i can have it on that short sucks of it 's not .
this # was in this post and natural would of the warrior whether dishes with restart .
alternatively her but a grown evga the guy and may and while i would so woke .
click cant i would easily i guess , i could issue ... users a guy if you are able to get system .
i 'm buy i could like the winter uses of the way to spend it shade and far was biased to learn a good poster zone .
and 're without a bot . '' .
bella legal and want and does it did n't get free .
applications minutes are says , interesting to get the amount .
good is november if just countries ] .

