This three-part series focuses on text classification with Recurrent Neural Networks (RNNs). From part 1 to part 3, I gradually add concepts to handle more sophisticated scenarios.

If you are not familiar with RNNs, the following tutorials are popular introductions:
- Christopher Olah's [Understanding LSTM Networks](http://colah.github.io/posts/2015-08-Understanding-LSTMs/)
- Andrej Karpathy's [The Unreasonable Effectiveness of Recurrent Neural Networks](http://karpathy.github.io/2015/05/21/rnn-effectiveness/)
- R2RT's [Recurrent Neural Networks in Tensorflow I II III](https://r2rt.com/recurrent-neural-networks-in-tensorflow-i.html)
- Danijar Hafner's [Introduction to Recurrent Networks in TensorFlow](https://danijar.com/introduction-to-recurrent-networks-in-tensorflow)
- Denny Britz's [Recurrent Neural Networks Tutorial, Part 1 – Introduction to RNNs](http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns/)

The example data used in this series comes from the [Sogou](http://www.sogou.com/labs/resource/cs.php) corpus. I downloaded the data and preprocessed it: duplicate removal, Chinese word segmentation, stopword removal, etc. The example data covers six categories, i.e., `{1: auto, 2: business, 3: it, 4: health, 5: sports, 6: yule}`. Each category has a `training_x.cs` file (`15,000` articles) and a `testing_x.cs` file (`3,000` articles), where `x` is the category number. You can download the data from [here](http://pan.baidu.com/s/1hs27uHA).

Core concepts in part 1:
- A one-layer RNN is used, with a choice of **three cell types**: `tf.contrib.rnn.BasicRNNCell`, `tf.contrib.rnn.BasicLSTMCell` and `tf.contrib.rnn.GRUCell`.
- The length of the input sequences (articles here) is **fixed**.
- Both the inputs and the outputs of `outputs, last_state = tf.contrib.rnn.static_rnn(cell, inputs)` are **lists**.
- **`outputs[-1]`** is the output at the last time step. It is the focus here since the classification task only needs the RNN output at the last time step.
- For a one-layer RNN, **`outputs[-1] == last_state`** for `BasicRNNCell` and `GRUCell`, and **`outputs[-1] == last_state.h`** for `BasicLSTMCell`. Why? See below.
- Two `tf.summary.FileWriter` objects are initialized to save the `accuracy` and `loss` of both the **training** and **testing** steps to TensorBoard.
- The TensorFlow model architecture follows [this post](https://danijar.com/structuring-your-tensorflow-models).

In [1]:
import os
import codecs
import itertools
from collections import Counter
from random import shuffle
import tensorflow as tf
import numpy as np

The `DataGenerator` class reads the input files, converts words to indices, and generates batches of training or testing data.

Since the RNN input has a fixed length, longer sequences are truncated to `Arguments.MAX_SEQ_LENGTH` and shorter sequences are padded to `Arguments.MAX_SEQ_LENGTH`.
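
As a minimal sketch of this rule in plain Python (the helper name `pad_or_truncate` is hypothetical, and the padding index `0` is assumed to match the `PAD` index used by the class below):

```python
def pad_or_truncate(token_ids, max_len, pad_id=0):
    """Return a list of exactly `max_len` ids."""
    clipped = token_ids[:max_len]                          # truncate long sequences
    return clipped + [pad_id] * (max_len - len(clipped))   # right-pad short ones


short = pad_or_truncate([5, 8, 2], max_len=5)
long_ = pad_or_truncate([5, 8, 2, 9, 4, 7, 1], max_len=5)
print(short)  # [5, 8, 2, 0, 0]
print(long_)  # [5, 8, 2, 9, 4]
```

Padding on the right keeps the real words at the start of the sequence, which is also what `DataGenerator` does.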

Two extra words are introduced: `PAD` for padding shorter sequences and `OOV` for representing out-of-vocabulary words.
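
The index layout can be sketched on a toy corpus as follows. The helper names `build_vocab` and `encode` are hypothetical; the layout (`PAD` at `0`, frequent words at `1..vocab_size`, `OOV` at `vocab_size + 1`) mirrors what `DataGenerator` does below.

```python
from collections import Counter


def build_vocab(tokenized_docs, vocab_size):
    """Map the `vocab_size` most frequent words to 1..vocab_size."""
    counts = Counter(w for doc in tokenized_docs for w in doc)
    word2idx = {w: i + 1 for i, (w, _) in enumerate(counts.most_common(vocab_size))}
    word2idx['PAD'] = 0                # padding word
    word2idx['OOV'] = vocab_size + 1   # out-of-vocabulary word
    return word2idx


def encode(doc, word2idx):
    """Replace each word by its index, falling back to the OOV index."""
    oov = word2idx['OOV']
    return [word2idx.get(w, oov) for w in doc]


docs = [['a', 'b', 'a'], ['a', 'c', 'b']]
vocab = build_vocab(docs, vocab_size=2)
ids = encode(['a', 'c', 'b'], vocab)
print(ids)  # [1, 3, 2]
```

Because of the two extra entries, the embedding matrix needs `vocab_size + 2` rows, which is why the model later uses `args.VOCAB_SIZE + 2`.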

In [2]:
class DataGenerator():
    """
    Reads the training and testing files and generates batch data.
    """
    
    def __init__(self, args):
        self.folder_path = args.FOLDER_PATH
        self.batch_size = args.BATCH_SIZE
        self.vocab_size = args.VOCAB_SIZE
        self.max_seq_len = args.MAX_SEQ_LENGTH
        self.num_epoch = args.NUM_EPOCH
        self.read_build_input()
        self.single_generator_training = self.generate_sample_training()
        self.single_generator_testing = self.generate_sample_testing()
        self.label_dict = {0:'auto', 1:'business', 2:'IT', 3:'health', 4:'sports', 5:'yule'}
        
        
    def read_build_input(self):
        training_src = []
        testing_src = []
        article_len = []

        for cur_category in range(1, 7):
            
            print('parsing file >>>>>>>>>>>>>>> ', cur_category)
            print('-'*100)
            
            # `with` closes each file once its lines are read; labels are shifted from 1..6 to 0..5
            training_path = os.path.join(self.folder_path, 'training_' + str(cur_category) + '.cs')
            with codecs.open(filename=training_path, mode='r', encoding='utf-8') as training_input_file:
                for tmp_line in training_input_file:
                    tokens = tmp_line.split()
                    training_src.append((tokens, cur_category-1))
                    article_len.append(len(tokens))

            testing_path = os.path.join(self.folder_path, 'testing_' + str(cur_category) + '.cs')
            with codecs.open(filename=testing_path, mode='r', encoding='utf-8') as testing_input_file:
                for tmp_line in testing_input_file:
                    tokens = tmp_line.split()
                    testing_src.append((tokens, cur_category-1))
                    article_len.append(len(tokens))

        shuffle(training_src)
        shuffle(testing_src)
        
        assert(len(article_len) == (len(training_src) + len(testing_src)))
        print('='*100)
        print('Size of training data:', len(training_src))
        print('Size of testing data:', len(testing_src))
        print('Average length of all articles', sum(article_len)/len(article_len))
    
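        self.TRAINING_SIZE = len(training_src)
        # NOTE: `args` is not a parameter of this method; it resolves to the module-level
        # object created in the `__main__` cell. The model reads TESTING_SIZE from it later.
        args.TESTING_SIZE = len(testing_src)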
        
        training_X_src = [pair[0] for pair in training_src]
        testing_X_src = [pair[0] for pair in testing_src]
        all_data = list(itertools.chain.from_iterable(training_X_src + testing_X_src))
        word_counter = Counter(all_data).most_common(self.vocab_size)
        del all_data
        
        print('='*100)
        print('top 10 frequent words:')
        print(word_counter[0:10])
        self.word2idx = {val[0]: idx+1 for idx, val in enumerate(word_counter)}
        self.word2idx['PAD'] = 0 # padding word
        self.word2idx['OOV'] = self.vocab_size + 1 # out-of-vocabulary
        self.idx2word = dict(zip(self.word2idx.values(), self.word2idx.keys()))
        print('Total vocabulary size:{}'.format(len(self.word2idx)))
        
        self.training = [([self.word2idx[w] if w in self.word2idx else self.word2idx['OOV'] for w in tmp_pair[0][0:self.max_seq_len]], tmp_pair[1]) for tmp_pair in training_src]
        self.testing_ori =  [([self.word2idx[w] if w in self.word2idx else self.word2idx['OOV'] for w in tmp_pair[0][0:self.max_seq_len]], tmp_pair[1]) for tmp_pair in testing_src]
        self.testing = [(tmp_pair[0] + [self.word2idx['PAD']] * (self.max_seq_len - len(tmp_pair[0])), tmp_pair[1]) if len(tmp_pair[0]) < self.max_seq_len else tmp_pair for tmp_pair in self.testing_ori]
    
    def generate_sample_training(self):
        """
        If len(each article) < self.max_seq_len:
            padding them with 0
        else:
            truncating them to self.max_seq_len
        """
        outer_index = 0
        for X_y_pair in itertools.cycle(self.training):  # infinite loop each article
            tmp_input_len = len(X_y_pair[0])
            if tmp_input_len < self.max_seq_len:
                input_X = X_y_pair[0] + [self.word2idx['PAD']] * (self.max_seq_len - tmp_input_len)
            else:
                input_X = X_y_pair[0]
            
            output_y = X_y_pair[1]
            if outer_index in [0, self.batch_size-1]:
                print('='*100)
                print('Training text:', ' '.join([self.idx2word[tmp_id] for tmp_id in input_X]))
                print('Training text length:', len(input_X))
                print('Training label:', self.label_dict[output_y])
                
            yield input_X, output_y
            outer_index += 1
    
    def generate_sample_testing(self):
        """
        If len(each article) < self.max_seq_len:
            padding them with 0
        else:
            truncating them to self.max_seq_len
        """
        outer_index = 0
        for X_y_pair in itertools.cycle(self.testing):  # infinite loop each article
            tmp_input_len = len(X_y_pair[0])
            if tmp_input_len < self.max_seq_len:
                input_X = X_y_pair[0] + [self.word2idx['PAD']] * (self.max_seq_len - tmp_input_len)
            else:
                input_X = X_y_pair[0]
            
            output_y = X_y_pair[1]
            if outer_index in [0, self.batch_size-1]:
                print('='*100)
                print('Testing text:', ' '.join([self.idx2word[tmp_id] for tmp_id in input_X]))
                print('Testing text length:', len(input_X))
                print('Testing label:', self.label_dict[output_y])
                
            yield input_X, output_y
            outer_index += 1
        

    def next_batch_training(self):
        input_X_batch = []
        output_y_batch = []
        for idx in range(self.batch_size):
            tmp_X, tmp_y = next(self.single_generator_training)
            input_X_batch.append(tmp_X)
            output_y_batch.append(tmp_y)
        return np.array(input_X_batch, dtype=np.int32), np.array(output_y_batch, dtype=np.int32)
    
    def next_testing(self):
        testing_X = np.array([tmp_pair[0] for tmp_pair in self.testing], dtype=np.int32)
        testing_y = np.array([tmp_pair[1] for tmp_pair in self.testing], dtype=np.int32)
        return testing_X, testing_y        
    
    def next_batch_testing(self):
        input_X_batch = []
        output_y_batch = []
        for idx in range(self.batch_size):
            tmp_X, tmp_y = next(self.single_generator_testing)
            input_X_batch.append(tmp_X)
            output_y_batch.append(tmp_y)
        return np.array(input_X_batch, dtype=np.int32), np.array(output_y_batch, dtype=np.int32)
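
The batching pattern used by `next_batch_training` and `next_batch_testing` can be sketched on toy data as follows. The names `sample_stream` and `next_batch` are hypothetical stand-ins, and plain lists are returned instead of NumPy arrays to keep the sketch short.

```python
import itertools


def sample_stream(pairs):
    """Yield (x, y) pairs forever, restarting from the first pair after the last."""
    for x, y in itertools.cycle(pairs):
        yield x, y


def next_batch(stream, batch_size):
    """Pull `batch_size` samples from the shared stream."""
    xs, ys = [], []
    for _ in range(batch_size):
        x, y = next(stream)
        xs.append(x)
        ys.append(y)
    return xs, ys


toy = [([1, 2], 0), ([3, 4], 1), ([5, 6], 2)]
stream = sample_stream(toy)
first_x, first_y = next_batch(stream, batch_size=2)
second_x, second_y = next_batch(stream, batch_size=2)
print(first_y)   # [0, 1]
print(second_y)  # [2, 0]
```

Note that `itertools.cycle` replays the samples in the same order on every pass, so the data is shuffled only once (in `read_build_input`), not once per epoch.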

Hyper-parameters for this model.

In [3]:
class Arguments:
    """
    main hyper-parameters
    """
    MAX_SEQ_LENGTH = 150 # the average length of all articles is about 143
    EMBED_SIZE = 128 # embedding dimensions
    BATCH_SIZE = 64
    VOCAB_SIZE = 300000 # vocabulary size
    NUM_CLASSES = 6 # number of classes
    FOLDER_PATH = 'sogou_corpus'
    NUM_EPOCH = 7
    RNN_TYPE = 'LSTM' # RNN, LSTM or GRU
    CHECKPOINTS_DIR = 'text_classification_LSTM_model'
    LOGDIR = 'text_classification_LSTM_logdir'

A helper function for better organizing the TensorFlow model structure.

From https://danijar.com/structuring-your-tensorflow-models

In [4]:
import functools

def lazy_property(function):
    """
    helper function from https://danijar.com/structuring-your-tensorflow-models
    """
    attribute = '_cache_' + function.__name__

    @property
    @functools.wraps(function)
    def decorator(self):
        if not hasattr(self, attribute):
            setattr(self, attribute, function(self))
        return getattr(self, attribute)

    return decorator
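
To see what the decorator does, here is a small sketch without TensorFlow. The decorator is repeated so that the block is self-contained; the `Demo` class and its `calls` counter are hypothetical and only serve to count how often the wrapped function runs.

```python
import functools


def lazy_property(function):
    attribute = '_cache_' + function.__name__

    @property
    @functools.wraps(function)
    def decorator(self):
        # the wrapped function runs only on first access; later reads hit the cache
        if not hasattr(self, attribute):
            setattr(self, attribute, function(self))
        return getattr(self, attribute)

    return decorator


class Demo:
    def __init__(self):
        self.calls = 0

    @lazy_property
    def expensive(self):
        self.calls += 1
        return [1, 2, 3]


demo = Demo()
first = demo.expensive    # computed here
second = demo.expensive   # served from the cache
print(demo.calls)         # 1
print(first is second)    # True
```

In the model class this means that each part of the graph is built exactly once, no matter how many times `model.cost` or `model.score` is accessed.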

## `outputs, last_state = tf.contrib.rnn.static_rnn(cell, inputs, initial_state)`

`inputs` is a `list` of length `num_steps`; each element has shape `[batch_size, input_size]` (here the input size is the embedding size, which equals `num_units`).

`outputs` is a `list` of length `num_steps` that contains the output (i.e., `ht`) at each time step.

Let `ht` and `ct` be the hidden state and cell state at time step *t*, respectively.

`last_state` is a `Tensor`, a tuple of `Tensor`s, an `LSTMStateTuple`, or a tuple of `LSTMStateTuple`s, depending on the scenario.

### For `BasicRNNCell` and `GRUCell`
- One layer
    - Both `outputs[-1]` and `last_state` are `ht` at the last time step
    - **`outputs[-1] == last_state`**
    - For instance: 
    ```
    BasicRNNCell:
    outputs[-1]: 
    Tensor("model/rnn/rnn/basic_rnn_cell/Tanh_199:0", shape=(32, 128), dtype=float32)    
    last_state: 
    Tensor("model/rnn/rnn/basic_rnn_cell/Tanh_199:0", shape=(32, 128), dtype=float32)
    
    GRUCell:
    outputs[-1]:
    Tensor("model/rnn/rnn/gru_cell/add_199:0", shape=(32, 128), dtype=float32)
    last_state:
    Tensor("model/rnn/rnn/gru_cell/add_199:0", shape=(32, 128), dtype=float32)
    ```
- Multiple layers (with `tf.contrib.rnn.MultiRNNCell` wrapper)
    - `outputs[-1]` is `ht` of **top** layer at the last time step
    - `last_state` contains `ht` of **all** layers at the last time step
    - **`outputs[-1] == last_state[len(last_state)-1]`**
    - For instance (with `2` layers):
    ```
    outputs[-1]: 
    Tensor("model/rnn/rnn/multi_rnn_cell/cell_0/cell_0/basic_rnn_cell/Tanh_399:0", shape=(32, 128), dtype=float32)
    
    last_state: 
    (<tf.Tensor 'model/rnn/rnn/multi_rnn_cell/cell_0/cell_0/basic_rnn_cell/Tanh_398:0' shape=(32, 128) dtype=float32>,
    <tf.Tensor 'model/rnn/rnn/multi_rnn_cell/cell_0/cell_0/basic_rnn_cell/Tanh_399:0' shape=(32, 128) dtype=float32>)
    ```
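
The relationship between `outputs` and `last_state` can be reproduced with a small NumPy sketch of a stacked vanilla RNN. This is not TensorFlow's implementation: the function names, the weight initialization and the toy sizes are all assumptions made for illustration, and unlike `static_rnn` the sketch always returns the state as a tuple, even for one layer.

```python
import numpy as np


def rnn_step(x, h_prev, W_x, W_h, b):
    """One vanilla RNN step: h_t = tanh(x_t W_x + h_{t-1} W_h + b)."""
    return np.tanh(x @ W_x + h_prev @ W_h + b)


def static_rnn_sketch(layers, inputs, batch_size):
    """layers: list of (W_x, W_h, b); inputs: list of [batch_size, input_size] arrays."""
    states = [np.zeros((batch_size, W_h.shape[0])) for _, W_h, _ in layers]
    outputs = []
    for x in inputs:                      # one iteration per time step
        layer_input = x
        for i, (W_x, W_h, b) in enumerate(layers):
            states[i] = rnn_step(layer_input, states[i], W_x, W_h, b)
            layer_input = states[i]       # output of layer i feeds layer i + 1
        outputs.append(layer_input)       # only the top layer's h_t is an output
    return outputs, tuple(states)


rng = np.random.default_rng(0)
batch_size, num_steps, input_size, num_units = 4, 5, 3, 6
inputs = [rng.normal(size=(batch_size, input_size)) for _ in range(num_steps)]


def make_layer(in_dim):
    return (rng.normal(size=(in_dim, num_units)) * 0.1,
            rng.normal(size=(num_units, num_units)) * 0.1,
            np.zeros(num_units))


layers = [make_layer(input_size), make_layer(num_units)]
outputs, last_state = static_rnn_sketch(layers, inputs, batch_size)

print(len(outputs), outputs[-1].shape)              # 5 (4, 6)
print(np.array_equal(outputs[-1], last_state[-1]))  # True
```

The last output and the last entry of the state are equal because the top layer's hidden state at the final step is used both as the emitted output and as the carried state.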

### For `BasicLSTMCell`
- One layer
    - `outputs[-1]` is `ht` at the last time step
    - `last_state` is `LSTMStateTuple(ct, ht)` at the last time step
    - **`outputs[-1] == last_state.h`**
    - For instance:
    ```
    outputs[-1]:
    Tensor("model/rnn/rnn/basic_lstm_cell/mul_599:0", shape=(32, 128), dtype=float32)

    last_state: 
    LSTMStateTuple(c=<tf.Tensor 'model/rnn/rnn/basic_lstm_cell/add_399:0' shape=(32, 128) dtype=float32>, h=<tf.Tensor 'model/rnn/rnn/basic_lstm_cell/mul_599:0' shape=(32, 128) dtype=float32>)
    ```
- Multiple layers (with `tf.contrib.rnn.MultiRNNCell` wrapper)
    - `outputs[-1]` is `ht` of **top** layer at the last time step
    - `last_state` contains `LSTMStateTuple(ct, ht)` of **all** layers at the last time step
    - **`outputs[-1] == last_state[len(last_state)-1].h`**
    - For instance (with `4` layers):
    ```
    outputs[-1]:
    Tensor("model/rnn/rnn/multi_rnn_cell/cell_0/cell_0/basic_lstm_cell/mul_2399:0", shape=(32, 128), dtype=float32)
    
    last_state: 
    (LSTMStateTuple(c=<tf.Tensor 'model/rnn/rnn/multi_rnn_cell/cell_0/cell_0/basic_lstm_cell/add_1593:0' shape=(32, 128) dtype=float32>, h=<tf.Tensor 'model/rnn/rnn/multi_rnn_cell/cell_0/cell_0/basic_lstm_cell/mul_2390:0' shape=(32, 128) dtype=float32>), 
    LSTMStateTuple(c=<tf.Tensor 'model/rnn/rnn/multi_rnn_cell/cell_0/cell_0/basic_lstm_cell/add_1595:0' shape=(32, 128) dtype=float32>, h=<tf.Tensor 'model/rnn/rnn/multi_rnn_cell/cell_0/cell_0/basic_lstm_cell/mul_2393:0' shape=(32, 128) dtype=float32>), 
    LSTMStateTuple(c=<tf.Tensor 'model/rnn/rnn/multi_rnn_cell/cell_0/cell_0/basic_lstm_cell/add_1597:0' shape=(32, 128) dtype=float32>, h=<tf.Tensor 'model/rnn/rnn/multi_rnn_cell/cell_0/cell_0/basic_lstm_cell/mul_2396:0' shape=(32, 128) dtype=float32>), 
    LSTMStateTuple(c=<tf.Tensor 'model/rnn/rnn/multi_rnn_cell/cell_0/cell_0/basic_lstm_cell/add_1599:0' shape=(32, 128) dtype=float32>, h=<tf.Tensor 'model/rnn/rnn/multi_rnn_cell/cell_0/cell_0/basic_lstm_cell/mul_2399:0' shape=(32, 128) dtype=float32>))
    ```
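
For the LSTM case, a simplified one-layer NumPy sketch shows why the output equals the `h` part of the state but not the `c` part. It is an illustration under stated assumptions, not `BasicLSTMCell` itself: the forget-gate bias offset is omitted, the weights are random, and `LSTMState` is a hypothetical namedtuple standing in for `LSTMStateTuple`.

```python
from collections import namedtuple

import numpy as np

LSTMState = namedtuple('LSTMState', ['c', 'h'])  # stand-in for LSTMStateTuple


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x, state, W, b):
    """One LSTM step. W has shape [input_size + num_units, 4 * num_units]."""
    gates = np.concatenate([x, state.h], axis=1) @ W + b
    i, j, f, o = np.split(gates, 4, axis=1)   # input, candidate, forget, output
    c = sigmoid(f) * state.c + sigmoid(i) * np.tanh(j)
    h = sigmoid(o) * np.tanh(c)
    return h, LSTMState(c=c, h=h)             # the output is h; the state carries both


rng = np.random.default_rng(0)
batch_size, num_steps, input_size, num_units = 4, 5, 3, 6
W = rng.normal(size=(input_size + num_units, 4 * num_units)) * 0.1
b = np.zeros(4 * num_units)

state = LSTMState(c=np.zeros((batch_size, num_units)),
                  h=np.zeros((batch_size, num_units)))
outputs = []
for _ in range(num_steps):
    x = rng.normal(size=(batch_size, input_size))
    output, state = lstm_step(x, state, W, b)
    outputs.append(output)
last_state = state

print(np.array_equal(outputs[-1], last_state.h))  # True
print(np.array_equal(outputs[-1], last_state.c))  # False
```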

In [5]:
class TextClassificationModel:
    """
    Model class.
    """
    def __init__(self, args, is_training=True):
        self.num_units = args.EMBED_SIZE
        self.rnn_type = args.RNN_TYPE
        self.is_training = is_training
        
        # training feeds mini-batches; testing feeds the whole testing set as one batch
        if self.is_training:
            self.batch_size = args.BATCH_SIZE
        else:
            self.batch_size = args.TESTING_SIZE
        
        self.num_classes = args.NUM_CLASSES
        self.vocab_size = args.VOCAB_SIZE + 2
        self.num_steps = args.MAX_SEQ_LENGTH
        self.global_step = tf.Variable(initial_value=0, dtype=tf.int32, trainable=False, name='global_step')

        self.input_output
        self.model
        self.score
        self.cost
        self.optimizer
        
    @lazy_property
    def input_output(self):
        with tf.name_scope('input_output'):
            input_X = tf.placeholder(dtype=tf.int32, shape=[self.batch_size, self.num_steps], name='input_X')
            output_y = tf.placeholder(dtype=tf.int32, shape = [self.batch_size], name='output_y')
        return (input_X, output_y)
                
        
    @lazy_property
    def model(self):
        
        with tf.name_scope('RNNs_model'):
            with tf.variable_scope('embedding'):
                with tf.device('/cpu:0'):
                    embedding_matrix = tf.get_variable(name='embedding_matrix', shape=[self.vocab_size, self.num_units])
                    # inputs shape: (self.batch_size, self.num_steps, self.num_units)
                    inputs = tf.nn.embedding_lookup(params=embedding_matrix, ids=self.input_output[0], name='embed')

            if self.rnn_type == 'RNN':
                cell = tf.contrib.rnn.BasicRNNCell(num_units=self.num_units)
            elif self.rnn_type == 'GRU':
                cell = tf.contrib.rnn.GRUCell(num_units=self.num_units)
            elif self.rnn_type == 'LSTM':
                cell = tf.contrib.rnn.BasicLSTMCell(num_units=self.num_units)
            else:
                raise ValueError('The input rnn type is undefined.')
                
            initial_state = cell.zero_state(batch_size=self.batch_size, dtype=tf.float32)            
           
            inputs = tf.unstack(inputs, self.num_steps, 1)            
            
            print('='*100)
            print('static_rnn inputs type:', type(inputs)) # list
            print('static_rnn inputs len:', len(inputs)) # self.num_steps
            print('static_rnn inputs element type:', type(inputs[0])) # Tensor
            print('static_rnn inputs element shape:', inputs[0].get_shape()) # [self.batch_size, self.num_units]
            print('='*100)


            outputs, last_state = tf.contrib.rnn.static_rnn(cell, inputs=inputs, initial_state=initial_state)
            
            print('static_rnn output type:', type(outputs)) # list
            print('static_rnn output length:', len(outputs)) # self.num_steps
            print('static_rnn output element type:', type(outputs[-1])) # Tensor, outputs[-1] is the last hidden state (i.e., hidden state at (self.num_steps - 1))
            print('static_rnn output element shape:', outputs[-1].get_shape()) # [self.batch_size, self.num_units]                
            print('static_rnn last_state type:', type(last_state))  # LSTMStateTuple (for BasicLSTMCell) or Tensor (for BasicRNNCell or GRUCell) 
            print('last_state:', last_state)
            print('='*100)
            
            # `is` makes the identity check explicit: both names refer to the same graph tensor
            if self.rnn_type == 'RNN' or self.rnn_type == 'GRU':
                print('outputs[-1] == last_state', outputs[-1] is last_state)
            elif self.rnn_type == 'LSTM':
                print('outputs[-1] == last_state.h', outputs[-1] is last_state.h)
            else:
                raise ValueError('The input rnn type is undefined.')

        return (outputs, last_state)
    
    @lazy_property
    def score(self):
        
        with tf.variable_scope('score'):

            softmax_weights = tf.get_variable(name='softmax_weights', shape=[self.num_units, self.num_classes])
            softmax_bias = tf.get_variable(name='softmax_bias', shape=[self.num_classes])
            logits = tf.matmul(self.model[0][-1], softmax_weights) + softmax_bias
            probs = tf.nn.softmax(logits)
            prediction = tf.argmax(probs, 1)
            accuracy = tf.reduce_mean(tf.cast(tf.equal(tf.cast(prediction, tf.int32), self.input_output[1]), tf.float32))
            tf.summary.scalar(name='accuracy', tensor=accuracy)
            
        return (logits, accuracy, prediction)
    
    @lazy_property
    def cost(self):        
            
        with tf.name_scope('cost'):
            cost = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(logits=self.score[0], labels=self.input_output[1]))
            tf.summary.scalar(name='loss', tensor=cost)
            tf.summary.histogram(name='histogram_loss', values=cost)
            self.summary_op = tf.summary.merge_all()
        return cost
    
    
    @lazy_property
    def optimizer(self):
        with tf.name_scope('optimizer'):
            return tf.train.AdamOptimizer(learning_rate=0.01).minimize(loss=self.cost, global_step=self.global_step)
                           
                
    def predict(self, sess, data):
        testing_X = np.array([tmp_pair[0] for tmp_pair in data.testing], dtype=np.int32)
        testing_y = np.array([tmp_pair[1] for tmp_pair in data.testing], dtype=np.int32)
        # use `self` rather than the module-level `model` so the method works on any instance
        feed_dict = {self.input_output:(testing_X, testing_y)}
        predict_labels, predict_accuracy = sess.run([self.score[2], self.score[1]], feed_dict=feed_dict)
        print('============================Example of predictions============================')
        for i in range(10):
            print('-'*100)
            print('Article: ', ''.join([data.idx2word[idx] for idx in testing_X[i]]))
            print('Real category: ', data.label_dict[testing_y[i]])
            print('Predicted category: ', data.label_dict[predict_labels[i]])
            print('-'*100 + '\n')
        return predict_labels, predict_accuracy
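
What the `score` and `cost` properties compute can be sketched in NumPy on toy values. The helper names and the logits below are made up for illustration; the real model obtains its logits from `outputs[-1]` and the softmax weights.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def sparse_softmax_cross_entropy(logits, labels):
    """Per-example loss: -log(probability assigned to the true class index)."""
    probs = softmax(logits)
    picked = probs[np.arange(len(labels)), labels]
    return -np.log(picked)


logits = np.array([[2.0, 0.5, 0.1],
                   [0.2, 0.1, 3.0],
                   [1.0, 1.5, 0.3]])
labels = np.array([0, 2, 0])              # integer class ids, not one-hot vectors

probs = softmax(logits)
prediction = probs.argmax(axis=1)
accuracy = (prediction == labels).mean()  # fraction of correct predictions
loss = sparse_softmax_cross_entropy(logits, labels).mean()

print(prediction)  # [0 2 1]
```

The "sparse" variant takes integer labels directly, which is why `output_y` is a vector of class indices rather than a one-hot matrix.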

The `train` function trains the model.

`train_writer` and `test_writer` record the `accuracy` and `loss` of the training and testing steps.
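
The number of training steps and the logging schedule used by `train` can be worked out in plain Python. The sizes are taken from `Arguments` and from the run output at the end of this post; the helper `actions_at` is hypothetical and only restates the three `if` conditions of the loop.

```python
NUM_EPOCH = 7
TRAINING_SIZE = 90000
BATCH_SIZE = 64

# integer division: a final partial batch is not counted
max_iteration_num = NUM_EPOCH * TRAINING_SIZE // BATCH_SIZE
print(max_iteration_num)  # 9843


def actions_at(idx, max_iteration_num):
    """List what the training loop does at step `idx`."""
    actions = []
    if idx % 50 == 0:
        actions.append('log training batch')
    if idx % 200 == 0:
        actions.append('evaluate one testing batch')
    if idx % 500 == 0 or idx + 1 == max_iteration_num:
        actions.append('save checkpoint')
    return actions


print(actions_at(1000, max_iteration_num))
print(actions_at(max_iteration_num - 1, max_iteration_num))  # ['save checkpoint']
```

The testing numbers logged during training therefore come from a single batch of 64 articles, not from the whole testing set.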

In [6]:
def train(data, model, args):
    saver = tf.train.Saver()
    with tf.Session() as sess:
        train_writer = tf.summary.FileWriter(logdir=args.LOGDIR + '/train', graph=sess.graph)
        test_writer = tf.summary.FileWriter(logdir=args.LOGDIR + '/test')
        
        sess.run(tf.global_variables_initializer())
        ckpt = tf.train.get_checkpoint_state(checkpoint_dir=args.CHECKPOINTS_DIR)
        if ckpt and ckpt.model_checkpoint_path:
            saver.restore(sess=sess, save_path=ckpt.model_checkpoint_path)
            print(ckpt)
        
        max_iteration_num = args.NUM_EPOCH * data.TRAINING_SIZE // args.BATCH_SIZE
        initial_step = model.global_step.eval()
        for idx in range(initial_step, max_iteration_num):
            batch_X, batch_y = data.next_batch_training()
            
            feed_dict = {model.input_output: (batch_X, batch_y)}
            tmp_accuracy, tmp_cost, _, tmp_summary = sess.run([model.score[1], model.cost, model.optimizer, model.summary_op], feed_dict=feed_dict)
            train_writer.add_summary(summary=tmp_summary, global_step=model.global_step.eval())
            
            if idx % 50 == 0:
                print('='*100)
                print('Step:{}, training accuracy:{:4f}'.format(model.global_step.eval(), tmp_accuracy))
                print('Step: {} / {}, loss:{:4f}, accuracy:{:4f}'.format(idx, max_iteration_num, tmp_cost, tmp_accuracy))
                print('='*100)
                
            if idx % 200 == 0:
                test_batch_X, test_batch_y = data.next_batch_testing()
                test_feed_dict = {model.input_output:(test_batch_X, test_batch_y)}
                test_tmp_cost, test_tmp_accuracy, test_tmp_summary = sess.run([model.cost, model.score[1], model.summary_op], feed_dict=test_feed_dict)
                test_writer.add_summary(summary=test_tmp_summary, global_step=model.global_step.eval())
                print('-'*100)
                print('Step:{}, testing accuracy:{:4f}'.format(model.global_step.eval(), test_tmp_accuracy))
                print('Step: {} / {}, loss:{:4f}, accuracy:{:4f}'.format(idx, max_iteration_num, test_tmp_cost, test_tmp_accuracy))
                print('-'*100)
            
            if idx % 500 == 0 or (idx+1) == max_iteration_num:
                saver.save(sess=sess, save_path=os.path.join(args.CHECKPOINTS_DIR, 'text_classification_lstm.ckpt'), global_step=model.global_step.eval())

The `test` function loads the trained model and evaluates it on the testing data.

In [7]:
def test(data, model, args):
    saver = tf.train.Saver()
    with tf.Session() as sess:
        ckpt = tf.train.latest_checkpoint(args.CHECKPOINTS_DIR)
        print(ckpt)
        saver.restore(sess=sess, save_path=ckpt)
        predict_labels, predict_accuracy = model.predict(sess, data)
        print('predict_accuracy:{:5f}'.format(predict_accuracy))

In [8]:
if __name__ == '__main__':
    args = Arguments()
    data = DataGenerator(args)
    
    # for training
    model = TextClassificationModel(args)
    train(data, model, args)
    
    
    # after training model, testing it using whole testing data
    # for testing
    # model = TextClassificationModel(args, is_training=False)
    # test(data, model, args) # predict_accuracy:0.769333
    

parsing file >>>>>>>>>>>>>>>  1
----------------------------------------------------------------------------------------------------
parsing file >>>>>>>>>>>>>>>  2
----------------------------------------------------------------------------------------------------
parsing file >>>>>>>>>>>>>>>  3
----------------------------------------------------------------------------------------------------
parsing file >>>>>>>>>>>>>>>  4
----------------------------------------------------------------------------------------------------
parsing file >>>>>>>>>>>>>>>  5
----------------------------------------------------------------------------------------------------
parsing file >>>>>>>>>>>>>>>  6
----------------------------------------------------------------------------------------------------
Size of training data: 90000
Size of testing data: 18000
Average length of all articles 143.20944444444444
top 10 frequent words:
[('系列', 106280), ('月', 93600), ('中', 84580), ('年', 77816), ('产品', 74051)

Step:451, training accuracy:0.953125
Step: 450 / 9843, loss:0.175668, accuracy:0.953125
Step:501, training accuracy:0.890625
Step: 500 / 9843, loss:0.229552, accuracy:0.890625
Step:551, training accuracy:0.906250
Step: 550 / 9843, loss:0.321827, accuracy:0.906250
Step:601, training accuracy:0.843750
Step: 600 / 9843, loss:0.459995, accuracy:0.843750
----------------------------------------------------------------------------------------------------
Step:601, testing accuracy:0.921875
Step: 600 / 9843, loss:0.330818, accuracy:0.921875
----------------------------------------------------------------------------------------------------
Step:651, training accuracy:0.906250
Step: 650 / 9843, loss:0.512109, accuracy:0.906250
Step:701, training accuracy:0.890625
Step: 700 / 9843, loss:0.280327, accuracy:0.890625
Step:751, training accuracy:0.968750
Step: 750 / 9843, loss:0.137724, accuracy:0.968750
Step:801, training accuracy:0.953125
Step: 800 / 9843, loss:0.172179, accuracy:0.953125
-------

Step:1651, training accuracy:0.968750
Step: 1650 / 9843, loss:0.254794, accuracy:0.968750
Step:1701, training accuracy:0.984375
Step: 1700 / 9843, loss:0.098473, accuracy:0.984375
Step:1751, training accuracy:0.984375
Step: 1750 / 9843, loss:0.063040, accuracy:0.984375
Step:1801, training accuracy:0.953125
Step: 1800 / 9843, loss:0.110057, accuracy:0.953125
----------------------------------------------------------------------------------------------------
Step:1801, testing accuracy:0.968750
Step: 1800 / 9843, loss:0.140755, accuracy:0.968750
----------------------------------------------------------------------------------------------------
Step:1851, training accuracy:0.968750
Step: 1850 / 9843, loss:0.187544, accuracy:0.968750
Step:1901, training accuracy:0.953125
Step: 1900 / 9843, loss:0.181228, accuracy:0.953125
Step:1951, training accuracy:0.984375
Step: 1950 / 9843, loss:0.097430, accuracy:0.984375
Step:2001, training accuracy:0.937500
Step: 2000 / 9843, loss:0.162424, accurac

Step:2851, training accuracy:0.937500
Step: 2850 / 9843, loss:0.210639, accuracy:0.937500
Step:2901, training accuracy:0.968750
Step: 2900 / 9843, loss:0.050754, accuracy:0.968750
Step:2951, training accuracy:0.937500
Step: 2950 / 9843, loss:0.173187, accuracy:0.937500
Step:3001, training accuracy:0.968750
Step: 3000 / 9843, loss:0.131826, accuracy:0.968750
----------------------------------------------------------------------------------------------------
Step:3001, testing accuracy:0.890625
Step: 3000 / 9843, loss:0.413092, accuracy:0.890625
----------------------------------------------------------------------------------------------------
Step:3051, training accuracy:1.000000
Step: 3050 / 9843, loss:0.035105, accuracy:1.000000
Step:3101, training accuracy:1.000000
Step: 3100 / 9843, loss:0.025751, accuracy:1.000000
Step:3151, training accuracy:0.953125
Step: 3150 / 9843, loss:0.161168, accuracy:0.953125
Step:3201, training accuracy:0.984375
Step: 3200 / 9843, loss:0.190967, accurac

Step:4051, training accuracy:0.984375
Step: 4050 / 9843, loss:0.091215, accuracy:0.984375
Step:4101, training accuracy:0.968750
Step: 4100 / 9843, loss:0.078991, accuracy:0.968750
Step:4151, training accuracy:0.984375
Step: 4150 / 9843, loss:0.108287, accuracy:0.984375
Step:4201, training accuracy:0.953125
Step: 4200 / 9843, loss:0.158832, accuracy:0.953125
----------------------------------------------------------------------------------------------------
Step:4201, testing accuracy:0.937500
Step: 4200 / 9843, loss:0.088754, accuracy:0.937500
----------------------------------------------------------------------------------------------------
Step:4251, training accuracy:0.968750
Step: 4250 / 9843, loss:0.095559, accuracy:0.968750
Step:4301, training accuracy:1.000000
Step: 4300 / 9843, loss:0.007672, accuracy:1.000000
Step:4351, training accuracy:0.968750
Step: 4350 / 9843, loss:0.138654, accuracy:0.968750
Step:4401, training accuracy:0.984375
Step: 4400 / 9843, loss:0.037895, accurac

Step:5251, training accuracy:0.953125
Step: 5250 / 9843, loss:0.117372, accuracy:0.953125
Step:5301, training accuracy:0.968750
Step: 5300 / 9843, loss:0.102250, accuracy:0.968750
Step:5351, training accuracy:0.968750
Step: 5350 / 9843, loss:0.176152, accuracy:0.968750
Step:5401, training accuracy:0.984375
Step: 5400 / 9843, loss:0.032431, accuracy:0.984375
----------------------------------------------------------------------------------------------------
Step:5401, testing accuracy:0.937500
Step: 5400 / 9843, loss:0.161853, accuracy:0.937500
----------------------------------------------------------------------------------------------------
Step:5451, training accuracy:0.953125
Step: 5450 / 9843, loss:0.092039, accuracy:0.953125
Step:5501, training accuracy:1.000000
Step: 5500 / 9843, loss:0.008989, accuracy:1.000000
Step:5551, training accuracy:0.984375
Step: 5550 / 9843, loss:0.105050, accuracy:0.984375
Step:5601, training accuracy:0.937500
Step: 5600 / 9843, loss:0.208437, accurac

Step:6451, training accuracy:0.968750
Step: 6450 / 9843, loss:0.030108, accuracy:0.968750
Step:6501, training accuracy:0.984375
Step: 6500 / 9843, loss:0.039018, accuracy:0.984375
Step:6551, training accuracy:0.953125
Step: 6550 / 9843, loss:0.110855, accuracy:0.953125
Step:6601, training accuracy:0.984375
Step: 6600 / 9843, loss:0.107665, accuracy:0.984375
----------------------------------------------------------------------------------------------------
Step:6601, testing accuracy:0.937500
Step: 6600 / 9843, loss:0.148537, accuracy:0.937500
----------------------------------------------------------------------------------------------------
Step:6651, training accuracy:0.968750
Step: 6650 / 9843, loss:0.088869, accuracy:0.968750
Step:6701, training accuracy:1.000000
Step: 6700 / 9843, loss:0.011663, accuracy:1.000000
Step:6751, training accuracy:0.953125
Step: 6750 / 9843, loss:0.110014, accuracy:0.953125
Step:6801, training accuracy:0.984375
Step: 6800 / 9843, loss:0.041993, accurac

Step:7651, training accuracy:0.984375
Step: 7650 / 9843, loss:0.021363, accuracy:0.984375
Step:7701, training accuracy:0.953125
Step: 7700 / 9843, loss:0.210182, accuracy:0.953125
Step:7751, training accuracy:0.984375
Step: 7750 / 9843, loss:0.043303, accuracy:0.984375
Step:7801, training accuracy:0.968750
Step: 7800 / 9843, loss:0.044631, accuracy:0.968750
----------------------------------------------------------------------------------------------------
Step:7801, testing accuracy:0.937500
Step: 7800 / 9843, loss:0.204294, accuracy:0.937500
----------------------------------------------------------------------------------------------------
Step:7851, training accuracy:0.984375
Step: 7850 / 9843, loss:0.034153, accuracy:0.984375
Step:7901, training accuracy:0.968750
Step: 7900 / 9843, loss:0.118199, accuracy:0.968750
Step:7951, training accuracy:0.984375
Step: 7950 / 9843, loss:0.101045, accuracy:0.984375
Step:8001, training accuracy:0.984375
Step: 8000 / 9843, loss:0.057492, accurac

Step:8851, training accuracy:0.984375
Step: 8850 / 9843, loss:0.027680, accuracy:0.984375
Step:8901, training accuracy:1.000000
Step: 8900 / 9843, loss:0.007087, accuracy:1.000000
Step:8951, training accuracy:1.000000
Step: 8950 / 9843, loss:0.002915, accuracy:1.000000
Step:9001, training accuracy:0.984375
Step: 9000 / 9843, loss:0.028946, accuracy:0.984375
----------------------------------------------------------------------------------------------------
Step:9001, testing accuracy:0.921875
Step: 9000 / 9843, loss:0.204132, accuracy:0.921875
----------------------------------------------------------------------------------------------------
Step:9051, training accuracy:0.984375
Step: 9050 / 9843, loss:0.052504, accuracy:0.984375
Step:9101, training accuracy:0.984375
Step: 9100 / 9843, loss:0.041257, accuracy:0.984375
Step:9151, training accuracy:0.984375
Step: 9150 / 9843, loss:0.035852, accuracy:0.984375
Step:9201, training accuracy:0.921875
Step: 9200 / 9843, loss:0.246202, accurac