Here we introduce the deep learning model we built. Its core is an RNN (a bidirectional GRU, biGRU) with a hierarchical structure (from words to sentences, then from sentences to articles) and attention mechanisms. On top of the main model, we added some processing techniques of our own, which I will introduce one by one.

## Read in data

In [1]:
import pandas as pd
import nltk
import pickle
from nltk.tokenize import WordPunctTokenizer
from collections import defaultdict
import json
import numpy as np
from collections import Counter
from nltk.corpus import stopwords
from string import punctuation
import warnings

warnings.filterwarnings("ignore")


sent_tokenizer = nltk.data.load('tokenizers/punkt/english.pickle')
word_tokenizer = WordPunctTokenizer()


word_freq = defaultdict(int)

df = pd.read_csv('data/cleaned_fake_or_real_news.csv')

# Drop unused columns
df = df.drop(['title','text','hashtags','numerics','sentence_count','word_count','char_count','avg_word','avg_sentence','stopwords','upper'],axis=1)

df = df.reset_index(drop=True)

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6295 entries, 0 to 6294
Data columns (total 2 columns):
fake          6295 non-null int64
title_text    6295 non-null object
dtypes: int64(1), object(1)
memory usage: 98.4+ KB


## Doc2vec

We train a Doc2Vec model on all news items in the dataset, using the GloVe pretrained embeddings as the initial word vectors. The resulting document vectors are used in the first (word embedding) layer of the HAN model, which we introduce later. For now, just remember that we obtain `doc_feature`, the document vector of every news item, with shape (6295, 300).

In [2]:
from gensim.models.doc2vec import TaggedDocument
from gensim.models import Doc2Vec

# Collect the raw text (title + body) of every news item
words_in_text = df['title_text'].tolist()

# A set makes the stop-word membership test O(1)
stop_words = set(stopwords.words('english'))
# Map every punctuation character to a space; str.translate does the lookup in C, so it is fast
mapping_table = {ord(char): u' ' for char in punctuation}
tokenized = [nltk.word_tokenize(line.translate(mapping_table)) for line in words_in_text]

def clean_text(tokenized_list, sw):
    new_list = []
    for doc in tokenized_list:
        new_list.append([token.lower() for token in doc if token.lower() not in sw])
    return new_list

cleaned = clean_text(tokenized, stop_words)


class TagDocIterator:
    def __init__(self, doc_list, idx_list):
        self.doc_list = doc_list
        self.idx_list = idx_list

    def __iter__(self):
        for doc, idx in zip(self.doc_list, self.idx_list):
            tag = [idx]
            yield TaggedDocument(words=doc, tags=tag)

            
            
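# GloVe vectors used to initialize the word vectors.
# NOTE: as far as we know, `pretrained_emb` is only supported by a patched gensim fork,
# not by stock gensim's Doc2Vec. Also, glove.6B.100d holds 100-dim vectors while size=300;
# the dimensions must match (e.g. glove.6B.300d) for the initialization to apply.
pretrained_emb = "data/glove.6B.100d.txt"
doc2vec_model = Doc2Vec(size=300, dm=1, iter=5, pretrained_emb=pretrained_emb)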


doc2vec_model.build_vocab(TagDocIterator(cleaned, df.index))
doc2vec_model.train(TagDocIterator(cleaned, df.index), epochs=5, total_examples=doc2vec_model.corpus_count)

doc_feature = []#doc2vec vector for every news
for i in range(len(doc2vec_model.docvecs)):
    doc_feature.append(doc2vec_model.docvecs[i])
    
len(doc_feature)

6295

## Division

Division is a distinctive idea in our code, designed for the final voting step. Unlike voting in sklearn, which combines the results of several different classifiers, we vote between the former and latter parts of each news item: we train a separate HAN model on each of the two parts. At the end, once we have the output vectors for both parts of a news item, we can choose a method to combine them.

We think division has the following advantages for this task:

### 1. Reduce information loss
The most obvious benefit concerns input size. Our HAN model, like every other deep neural network we tried (LSTM, CNN, ...), needs a fixed-size input matrix in TensorFlow. We therefore have to fix the maximum number of sentences per news item and the maximum number of words per sentence, discard anything beyond these limits, and pad the rest with UNK. Even with a limit that is reasonable for most news texts (a limit that is too large fills most news items with UNK padding, which may make them look similar; one that is too small gives the network too little information), some news items will inevitably lose their latter part, which may be important for classification. Division greatly mitigates this problem and reduces the amount of information lost.

### 2. Improve accuracy
As with voting in general, combining the two parts can improve accuracy. If we weight the two parts' predictions appropriately and neither part's prediction is much worse than that of the undivided version, the improvement for our task can be substantial.

### 3. Multiple voting rules for different purposes
Because the predictions based on either part are highly accurate for both real and fake news, we can switch the voting rule to suit the goal. Sometimes we care not only about overall accuracy but specifically about detecting fake news (we want to catch all fake news, and the recognition rate of real news matters less). In that case we can label a news item as fake whenever either of its parts is predicted fake, which pushes the recall on fake news close to 100%. The same rule works in reverse if we only need to find all real news.
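A minimal sketch of the default sum rule and the "any part fake" rule, assuming each part's model outputs a `(num_news, 2)` logit array where, as in this notebook's label encoding, column 0 is the fake score and column 1 the real score. `sum_vote` and `any_fake_vote` are hypothetical helper names used only for illustration.

```python
import numpy as np

def sum_vote(logits_former, logits_latter):
    """Default rule: add the two parts' logits; predict fake (1) if column 0 wins."""
    total = logits_former + logits_latter
    return (total[:, 0] > total[:, 1]).astype(int)

def any_fake_vote(logits_former, logits_latter):
    """Recall-oriented rule: a news item is fake (1) if either part predicts fake."""
    fake_former = logits_former[:, 0] > logits_former[:, 1]
    fake_latter = logits_latter[:, 0] > logits_latter[:, 1]
    return (fake_former | fake_latter).astype(int)

former = np.array([[2.0, -1.0], [-3.0, 3.0], [-1.0, 0.5]])
latter = np.array([[-0.5, 1.0], [-2.0, 2.5], [0.8, -0.4]])
print(sum_vote(former, latter))       # [1 0 0]
print(any_fake_vote(former, latter))  # [1 0 1]
```

The OR rule can only turn "real" decisions into "fake" ones, so it raises fake-news recall at the cost of flagging more real news as fake (the third item above).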

In [3]:
def frequency(x):
    words = word_tokenizer.tokenize(x)
    for word in words:
        word_freq[word] += 1
df['title_text'].apply(frequency)

with open('data/word_freq.pickle', 'wb') as g:
    pickle.dump(word_freq, g)
    print(len(word_freq))
    print("word_freq save finished")
    
num_classes = 2
# Sort words by frequency (descending) and show the 10 most and 10 least frequent ones
sort_words = list(sorted(word_freq.items(), key=lambda x:-x[1]))
print(sort_words[:10], sort_words[-10:])

# Build the vocabulary: words occurring more than 5 times get an index starting at 1;
# all other words map to index 0 (UNKNOW_TOKEN)
vocab = {}
i = 1
vocab['UNKNOW_TOKEN'] = 0
for word, freq in word_freq.items():
    if freq > 5:
        vocab[word] = i
        i += 1
print(i)  # vocabulary size, including UNKNOW_TOKEN
UNKNOWN = 0

82582
word_freq save finished
[('the', 261365), (',', 239089), ('.', 222656), ('to', 139331), ('of', 129759), ('and', 114005), ('a', 104458), ('in', 90978), ('number', 70333), ('that', 68624)] [('rendezvous', 1), ('Bloodshed', 1), ('headedness', 1), ('Zanganeh', 1), ('Sourcebooks', 1), ('Cerretto', 1), ('initialled', 1), ('superstars', 1), ('sayin', 1), ('multicutural', 1)]
28622


This cell implements the division: the former data keeps the first 30 words of each of the first 30 sentences of every news item, and the latter data keeps the first 30 words of each of the last 30 sentences.
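The same selection can be sketched in plain Python on already-tokenized sentences, which makes the alignment easier to see: the former grid is filled from the top, while the latter grid is filled so that the last sentence always lands in the last row (matching the `i_verse` indexing in the cell below). `former_latter` is a hypothetical helper, not part of the notebook, and padding with 0 stands for `UNKNOW_TOKEN`.

```python
def former_latter(sentences, max_sents=30, max_words=30, pad=0):
    """Return (former, latter) id grids of size max_sents x max_words.

    sentences: list of sentences, each a list of word ids.
    former: the first max_sents sentences, top-aligned.
    latter: the last max_sents sentences, bottom-aligned (last sentence in last row).
    Each sentence keeps only its first max_words ids; the rest is padding.
    """
    def row(sent):
        kept = sent[:max_words]
        return kept + [pad] * (max_words - len(kept))

    empty = [pad] * max_words
    first = [row(s) for s in sentences[:max_sents]]
    last = [row(s) for s in sentences[-max_sents:]]
    former = first + [empty] * (max_sents - len(first))
    latter = [empty] * (max_sents - len(last)) + last
    return former, latter

doc = [[1, 2, 3], [4, 5], [6], [7, 8, 9, 10]]
former, latter = former_latter(doc, max_sents=3, max_words=2)
print(former)  # [[1, 2], [4, 5], [6, 0]]
print(latter)  # [[4, 5], [6, 0], [7, 8]]
```

For news items shorter than 30 sentences, the two grids contain the same sentences, just aligned differently.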

In [4]:
data_x_former = []
data_x_latter = []
data_y = []
max_sent_in_doc = 30
max_word_in_sent = 30

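def set_data(x):
    # doc1: former part (first sentences, top-aligned); doc2: latter part (last sentences,
    # bottom-aligned). Unfilled cells stay 0, i.e. UNKNOW_TOKEN.
    doc1 = np.zeros((max_sent_in_doc, max_word_in_sent), dtype=np.int32)
    doc2 = np.zeros((max_sent_in_doc, max_word_in_sent), dtype=np.int32)
    sents = sent_tokenizer.tokenize(x['title_text'])

    for i, sent in enumerate(sents):
        # Word ids of the first max_word_in_sent words of this sentence
        words = word_tokenizer.tokenize(sent)[:max_word_in_sent]
        ids = [vocab.get(word, UNKNOWN) for word in words]

        # Former part: the first max_sent_in_doc sentences
        if i < max_sent_in_doc:
            doc1[i, :len(ids)] = ids

        # Latter part: i_verse counts sentences from the end (0 = last sentence),
        # so the last sentence always lands in the last row
        i_verse = len(sents) - 1 - i
        if i_verse < max_sent_in_doc:
            doc2[max_sent_in_doc - 1 - i_verse, :len(ids)] = ids

    # One-hot label. labels[label-1] puts fake (label 1) at index 0 and real (label 0)
    # at index 1 (via index -1); the voting cell at the end relies on this ordering.
    label = int(x['fake'])
    labels = [0] * num_classes
    labels[label-1] = 1
    data_y.append(labels)
    data_x_former.append(doc1)
    data_x_latter.append(doc2)

df.apply(set_data, axis=1)

# Save each part together with the labels
with open('former_latter_data/fake_news_data1', 'wb') as f:
    pickle.dump((data_x_former, data_y), f)
with open('former_latter_data/fake_news_data2', 'wb') as f:
    pickle.dump((data_x_latter, data_y), f)
print(len(data_x_former))
print(len(data_x_latter))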

6295
6295


The following two functions load the former and latter data, respectively, and split each into a training set (the first 70%) and a validation set (the rest) for training and testing.

In [5]:
def _load_split(path, train_ratio=0.7):
    # Load (data_x, data_y) pickled above and split it in order:
    # the first 70% for training, the remaining 30% for validation/testing
    with open(path, 'rb') as f:
        data_x, data_y = pickle.load(f)
    split = int(len(data_x) * train_ratio)
    return data_x[:split], data_y[:split], data_x[split:], data_y[split:]

def read_data1set():
    # Former part: first sentences of each news item
    # Returns train_x_former, train_y, dev_x_former, dev_y
    return _load_split('former_latter_data/fake_news_data1')

def read_data2set():
    # Latter part: last sentences of each news item
    # Returns train_x_latter, train_y, dev_x_latter, dev_y
    return _load_split('former_latter_data/fake_news_data2')

We also need to prepare the Doc2Vec vectors for TensorFlow. To make it easy to `tf.concat` them with the word vectors inside the graph, we expand them in advance into a (6295, 30, 30, 300) array in which each news item's vector is repeated at every sentence and word position.

In [7]:
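# doc_feature: the Doc2Vec vectors computed in the Doc2vec section, one per news item.
# np.broadcast_to repeats each 300-dim vector at every (sentence, word) position as a
# read-only view, so the ~1.7e9-element array is never materialized, and the vectors
# stay float32 instead of being truncated to integers.
doc_feature_arr = np.asarray(doc_feature, dtype=np.float32)  # (6295, 300)
doc2vec = np.broadcast_to(doc_feature_arr[:, None, None, :],
                          (len(doc_feature_arr), max_sent_in_doc, max_word_in_sent, doc_feature_arr.shape[1]))
print(doc2vec.shape)  # (6295, 30, 30, 300)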

<img src="picture\1.png">

In [8]:
import tensorflow as tf  # TensorFlow 1.x: tf.contrib no longer exists in TF 2.x
from tensorflow.contrib import rnn
from tensorflow.contrib import layers

def length(sequences):
    # Return the length of each sequence: the number of time steps whose vector is not
    # all zeros. NOTE: padded word ids (0) map to a non-zero embedding row, so in
    # practice this returns the full padded length.
    used = tf.sign(tf.reduce_max(tf.abs(sequences), axis=2))
    seq_len = tf.reduce_sum(used, axis=1)
    return tf.cast(seq_len, tf.int32)

class HAN(object):

    def __init__(self, vocab_size, num_classes, embedding_size=200, hidden_size=50, doc_vec_size=300):

        self.vocab_size = vocab_size
        self.num_classes = num_classes
        self.embedding_size = embedding_size
        self.hidden_size = hidden_size
        self.doc_vec_size = doc_vec_size  # dimension of the Doc2Vec document vectors

        with tf.name_scope('placeholder'):
            self.max_sentence_num = tf.placeholder(tf.int32, name='max_sentence_num')
            self.max_sentence_length = tf.placeholder(tf.int32, name='max_sentence_length')
            self.batch_size = tf.placeholder(tf.int32, name='batch_size')
            # input_x: [batch_size, sentences per doc, words per sentence]; the sizes are
            # left as None because they are supplied at run time.
            # input_y: [batch_size, num_classes]
            self.input_x = tf.placeholder(tf.int32, [None, None, None], name='input_x')
            self.input_y = tf.placeholder(tf.float32, [None, num_classes], name='input_y')
            # input_doc: [batch_size, doc_vec_size], the Doc2Vec vector of each news item in
            # the batch (e.g. the matching rows of doc_feature). If it is not fed, zeros are
            # used, i.e. the Doc2Vec feature is effectively switched off.
            self.input_doc = tf.placeholder_with_default(
                tf.zeros(tf.stack([tf.shape(self.input_x)[0], doc_vec_size])),
                shape=[None, doc_vec_size], name='input_doc')

        # Build the model: words -> sentences -> document -> class logits
        word_embedded = self.word2vec()
        sent_vec = self.sent2vec(word_embedded)
        doc_vec = self.doc2vec(sent_vec)
        out = self.classifer(doc_vec)

        self.out = out
        self.embed_matrix = word_embedded

    def word2vec(self):
        # Embedding layer
        with tf.name_scope("embedding"):
            embedding_mat = tf.Variable(tf.truncated_normal((self.vocab_size, self.embedding_size)))
            # shape: [batch_size, sent_in_doc, word_in_sent, embedding_size]
            word_embedded_raw = tf.nn.embedding_lookup(embedding_mat, self.input_x)
            # Repeat each news item's Doc2Vec vector at every (sentence, word) position:
            # [batch_size, doc_vec_size] -> [batch_size, sent_in_doc, word_in_sent, doc_vec_size]
            raw_shape = tf.shape(word_embedded_raw)
            doc_tiled = tf.tile(self.input_doc[:, tf.newaxis, tf.newaxis, :],
                                [1, raw_shape[1], raw_shape[2], 1])
            # shape: [batch_size, sent_in_doc, word_in_sent, doc_vec_size + embedding_size]
            word_embedded = tf.concat([doc_tiled, word_embedded_raw], axis=3)
        return word_embedded

    def sent2vec(self, word_embedded):
        with tf.name_scope("sent2vec"):
            # The GRU input tensor is [batch_size, max_time, ...]. When building sentence
            # vectors, max_time is the sentence length, so batch_size * sent_in_doc is used
            # as the batch size. Each GRU step then processes one word vector, and the
            # attention layer fuses all word vectors of a sentence into a sentence vector.

            # shape: [batch_size*sent_in_doc, word_in_sent, doc_vec_size+embedding_size]
            word_embedded = tf.reshape(word_embedded,
                                       [-1, self.max_sentence_length, self.doc_vec_size + self.embedding_size])
            # shape: [batch_size*sent_in_doc, word_in_sent, hidden_size*2]
            word_encoded = self.BidirectionalGRUEncoder(word_embedded, name='word_encoder')
            # shape: [batch_size*sent_in_doc, hidden_size*2]
            sent_vec = self.AttentionLayer(word_encoded, name='word_attention')
            return sent_vec

    def doc2vec(self, sent_vec):
        # Same idea as sent2vec: build one document vector from all sentence vectors of a document
        with tf.name_scope("doc2vec"):
            sent_vec = tf.reshape(sent_vec, [-1, self.max_sentence_num, self.hidden_size*2])
            # shape: [batch_size, sent_in_doc, hidden_size*2]
            doc_encoded = self.BidirectionalGRUEncoder(sent_vec, name='sent_encoder')
            # shape: [batch_size, hidden_size*2]
            doc_vec = self.AttentionLayer(doc_encoded, name='sent_attention')
            return doc_vec

    def classifer(self, doc_vec):
        # Final output layer: a fully connected layer producing one logit per class
        with tf.name_scope('doc_classification'):
            out = layers.fully_connected(inputs=doc_vec, num_outputs=self.num_classes, activation_fn=None)
            return out

    def BidirectionalGRUEncoder(self, inputs, name):
        # Bidirectional GRU encoder: encodes all words of a sentence (or all sentence vectors
        # of a document) into one 2*hidden_size output per step; the attention layer then
        # weights these outputs into a single sentence/document vector.
        # inputs: [batch_size, max_time, input_dim]
        with tf.variable_scope(name):
            GRU_cell_fw = rnn.GRUCell(self.hidden_size)
            GRU_cell_bw = rnn.GRUCell(self.hidden_size)
            # fw_outputs and bw_outputs: [batch_size, max_time, hidden_size] each
            ((fw_outputs, bw_outputs), (_, _)) = tf.nn.bidirectional_dynamic_rnn(cell_fw=GRU_cell_fw,
                                                                                 cell_bw=GRU_cell_bw,
                                                                                 inputs=inputs,
                                                                                 sequence_length=length(inputs),
                                                                                 dtype=tf.float32)
            # outputs: [batch_size, max_time, hidden_size*2]
            outputs = tf.concat((fw_outputs, bw_outputs), 2)
            return outputs

    def AttentionLayer(self, inputs, name):
        # inputs: GRU output, [batch_size, max_time, encoder_size (hidden_size*2)]
        with tf.variable_scope(name):
            # u_context is the context vector that scores how important each word/sentence
            # is to the sentence/document; its length is 2*hidden_size because the GRU is
            # bidirectional.
            u_context = tf.Variable(tf.truncated_normal([self.hidden_size * 2]), name='u_context')
            # A fully connected tanh layer gives the hidden representation of each GRU
            # output; h: [batch_size, max_time, hidden_size*2]
            h = layers.fully_connected(inputs, self.hidden_size * 2, activation_fn=tf.nn.tanh)
            # Attention weights over time steps: [batch_size, max_time, 1]
            alpha = tf.nn.softmax(tf.reduce_sum(tf.multiply(h, u_context), axis=2, keepdims=True), axis=1)
            # Weighted sum over time: [batch_size, max_time, hidden_size*2] -> [batch_size, hidden_size*2]
            atten_output = tf.reduce_sum(tf.multiply(inputs, alpha), axis=1)
            return atten_output

#### Now, I will introduce the model layer by layer.

<img src="picture\9.png">

The first layer of HAN is the word2vec (embedding) layer. Because we have already converted every word into its index in the vocabulary, here we simply look up each word's vector by index in TensorFlow, which gives an input matrix of shape (24, 30, 30, 300). Remember the doc2vec matrix we built earlier? Next we concatenate this word embedding matrix with the corresponding section of the doc2vec matrix, so the shape becomes (24, 30, 30, 600). Each input vector then represents not just a single word but that word within a particular news item, which we expect to improve performance. The commented-out loop inside the `word2vec` cell that follows shows the idea of the concatenation; tensors cannot be iterated like this, so it is only an illustration.
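The tile-and-concatenate step can be checked with NumPy on toy sizes. This sketch mirrors the idea rather than the TensorFlow code itself; all sizes below are illustrative assumptions (the notebook uses 24, 30, 30, 300 and 300).

```python
import numpy as np

# Toy sizes: 2 news items, 3 sentences, 4 words, 5-dim word vectors, 6-dim doc vectors
batch, n_sent, n_word, emb_dim, doc_dim = 2, 3, 4, 5, 6
rng = np.random.default_rng(0)
word_vecs = rng.normal(size=(batch, n_sent, n_word, emb_dim))
doc_vecs = rng.normal(size=(batch, doc_dim))

# Repeat each document vector at every (sentence, word) position, then put it in
# front of the word vector along the last axis (same order as in the model).
doc_tiled = np.broadcast_to(doc_vecs[:, None, None, :], (batch, n_sent, n_word, doc_dim))
combined = np.concatenate([doc_tiled, word_vecs], axis=-1)

print(combined.shape)  # (2, 3, 4, 11)
```

Every word of news item `n` carries the same first `doc_dim` features, `doc_vecs[n]`, followed by its own word vector.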

In [None]:
def word2vec(self):
    # Embedding layer
    with tf.name_scope("embedding"):
        embedding_mat = tf.Variable(tf.truncated_normal((self.vocab_size, self.embedding_size)))
        # shape: [batch_size, sent_in_doc, word_in_sent, embedding_size]
        word_embedded_raw = tf.nn.embedding_lookup(embedding_mat, self.input_x)
        # self.input_doc holds the Doc2Vec vector of each news item in the batch,
        # [batch_size, doc_vec_size]; tile it to every (sentence, word) position
        raw_shape = tf.shape(word_embedded_raw)
        doc_tiled = tf.tile(self.input_doc[:, tf.newaxis, tf.newaxis, :],
                            [1, raw_shape[1], raw_shape[2], 1])
        # Concatenate along the last axis: (24, 30, 30, 300) + (24, 30, 30, 300) -> (24, 30, 30, 600)
        word_embedded = tf.concat([doc_tiled, word_embedded_raw], axis=3)
        # The idea as an explicit loop (illustration only; tensors cannot be iterated like this):
        # for n, news in enumerate(word_embedded_raw):
        #     for sentence in news:
        #         for k, word in enumerate(sentence):
        #             sentence[k] = concat(doc_feature[n], word)
    return word_embedded

<img src="picture\10.png">

Once we have the word embedding matrix, the next step is to feed it into a recurrent or convolutional layer (LSTM, CNN, or GRU). Because the author reported that a GRU performed better in his sentiment-analysis task, we use a GRU here too. The GRU layer is followed by an attention layer, as shown in the picture.
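To make the attention step concrete, here is a NumPy sketch of the computation `AttentionLayer` performs: a tanh fully connected layer, a dot product with the context vector `u_context`, a softmax over time steps, and a weighted sum. The random weights are placeholders for learned parameters, and `attention_pool` is a hypothetical name.

```python
import numpy as np

def attention_pool(h, W, b, u_context):
    """Attention pooling written in NumPy.

    h: [batch, time, d] encoder outputs (d = 2 * hidden_size for a biGRU)
    W: [d, d], b: [d] weights of the fully connected tanh layer
    u_context: [d] context vector
    Returns the pooled vectors [batch, d] and the weights alpha [batch, time].
    """
    u = np.tanh(h @ W + b)                               # hidden representation [batch, time, d]
    scores = u @ u_context                               # similarity with the context [batch, time]
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    alpha = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)  # softmax over time
    pooled = (h * alpha[:, :, None]).sum(axis=1)         # weighted sum over time [batch, d]
    return pooled, alpha

rng = np.random.default_rng(1)
h = rng.normal(size=(2, 5, 4))
pooled, alpha = attention_pool(h, rng.normal(size=(4, 4)), np.zeros(4), rng.normal(size=4))
print(pooled.shape, alpha.shape)  # (2, 4) (2, 5)
```

Note that the weighted sum uses the original encoder outputs `h`, not the tanh representation `u`; `u` only serves to compute the weights.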

In [None]:
def sent2vec(self, word_embedded):
    with tf.name_scope("sent2vec"):
        # The GRU input tensor is [batch_size, max_time, ...]. When building sentence
        # vectors, max_time is the sentence length, so batch_size * sent_in_doc is used as
        # the batch size. Each GRU step then processes one word vector, and the attention
        # layer fuses all word vectors of a sentence into one sentence vector.

        # shape: [batch_size*sent_in_doc, word_in_sent, 300+embedding_size] (300 = Doc2Vec size)
        word_embedded = tf.reshape(word_embedded, [-1, self.max_sentence_length, 300+self.embedding_size])
        # shape: [batch_size*sent_in_doc, word_in_sent, hidden_size*2]
        word_encoded = self.BidirectionalGRUEncoder(word_embedded, name='word_encoder')
        # shape: [batch_size*sent_in_doc, hidden_size*2]
        sent_vec = self.AttentionLayer(word_encoded, name='word_attention')
        return sent_vec

<img src="picture\11.png">
This layer works just like the word-to-sentence layer; only the input and output change, to sentence vectors and document vectors respectively.
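The only real difference between the two levels is the reshape: at the word level every sentence is a separate sequence (`batch_size * sent_in_doc` of them), and at the sentence level the sentence vectors are regrouped by document. The NumPy sketch below traces those shapes, substituting plain mean pooling for the biGRU + attention block (an assumption made only so the reshapes stand out).

```python
import numpy as np

def pool(x):
    # Stand-in for "biGRU encoder + attention": average over the time axis
    return x.mean(axis=1)

batch, n_sent, n_word, dim = 2, 3, 4, 5
words = np.arange(batch * n_sent * n_word * dim, dtype=float).reshape(batch, n_sent, n_word, dim)

# Word level: treat every sentence as its own sequence
sent_vecs = pool(words.reshape(-1, n_word, dim))     # [batch*n_sent, dim]
# Sentence level: regroup the sentence vectors by document
doc_vecs = pool(sent_vecs.reshape(-1, n_sent, dim))  # [batch, dim]

print(sent_vecs.shape, doc_vecs.shape)  # (6, 5) (2, 5)
```

In the model, the biGRU output at each level has size `hidden_size * 2`, so `dim` changes between levels; with mean pooling it stays constant.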

In [None]:
def doc2vec(self, sent_vec):
    # Same idea as sent2vec: build one document vector from all sentence vectors of a document
    with tf.name_scope("doc2vec"):
        sent_vec = tf.reshape(sent_vec, [-1, self.max_sentence_num, self.hidden_size*2])
        # shape: [batch_size, sent_in_doc, hidden_size*2]
        doc_encoded = self.BidirectionalGRUEncoder(sent_vec, name='sent_encoder')
        # shape: [batch_size, hidden_size*2]
        doc_vec = self.AttentionLayer(doc_encoded, name='sent_attention')
        return doc_vec

After all the layers above, we have a vector for every input news item. We only need to add a fully connected output layer, as at the end of a CNN. This is the end of the HAN class; the softmax is applied later, in the training code.
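`classifer` returns unnormalized logits; the softmax is applied later, inside `tf.nn.softmax_cross_entropy_with_logits`. As a small NumPy sketch, this is how logits map to class probabilities and a predicted index. The two logit rows are copied from the `predict_value1` output shown later in this notebook.

```python
import numpy as np

def softmax(logits):
    # Subtract the row maximum before exponentiating for numerical stability
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

logits = np.array([[-2.87407732, 2.86728835],
                   [ 5.61157703, -4.55303717]])
probs = softmax(logits)
pred = probs.argmax(axis=1)  # same as argmax of the logits: softmax is monotonic
print(pred)  # [1 0]
```

Because softmax preserves the ordering, the prediction can be taken directly from the logits, which is what `tf.argmax(han.out, axis=1)` does in the training code.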

In [None]:
def classifer(self, doc_vec):
    # Final output layer: a fully connected layer producing one logit per class
    with tf.name_scope('doc_classification'):
        out = layers.fully_connected(inputs=doc_vec, num_outputs=self.num_classes, activation_fn=None)
        return out

## Training
With the model defined, we can start the training process. First we define the parameters we will use; the meaning of each is given in its definition.

In [10]:
import tensorflow as tf
import time
import os


# Data loading params
tf.flags.DEFINE_integer("vocab_size", 84729, "vocabulary size")
tf.flags.DEFINE_integer("num_classes", 2, "number of classes")
tf.flags.DEFINE_integer("embedding_size", 300, "Dimensionality of word embedding (default: 200)")
tf.flags.DEFINE_integer("hidden_size", 100, "Dimensionality of GRU hidden layer (default: 50)")
tf.flags.DEFINE_integer("batch_size", 24, "Batch Size (default: 64)")
tf.flags.DEFINE_integer("num_epochs", 10, "Number of training epochs (default: 50)")
tf.flags.DEFINE_integer("evaluate_every", 200, "evaluate every this many batches")
tf.flags.DEFINE_float("learning_rate", 0.01, "learning rate")
tf.flags.DEFINE_float("grad_clip", 5, "grad clip to prevent gradient explode")

# Jupyter passes a "-f <kernel file>" argument; this dummy flag keeps tf.flags from failing on it
tf.app.flags.DEFINE_string('f', '', 'kernel')

FLAGS = tf.flags.FLAGS

Then we use `read_data1set` to get the training and validation data with their labels, and build the network following the usual TensorFlow steps:

1. Define a session.
2. Get the output of the HAN network and compute its loss with `tf.nn.softmax_cross_entropy_with_logits` and `tf.reduce_mean`.
3. Use `tf.argmax` to get the predicted labels and `tf.reduce_mean` to get the accuracy.
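Steps 2 and 3 can be sketched in NumPy to show what the loss and accuracy tensors compute. This is an illustrative re-implementation under the notebook's one-hot label format, not the TensorFlow code; the function names are hypothetical.

```python
import numpy as np

def softmax_cross_entropy(labels, logits):
    """Mean softmax cross-entropy, mirroring
    tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=..., logits=...))."""
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))  # log-softmax
    return float(np.mean(-(labels * log_probs).sum(axis=1)))

def accuracy(labels, logits):
    # Compare the argmax of the logits with the argmax of the one-hot labels
    return float(np.mean(logits.argmax(axis=1) == labels.argmax(axis=1)))

labels = np.array([[1.0, 0.0], [0.0, 1.0]])
logits = np.array([[0.0, 0.0], [2.0, 0.0]])
print(softmax_cross_entropy(labels, logits))  # ≈ 1.41
print(accuracy(labels, logits))               # 0.5
```

The first example is a tie (loss ln 2) that `argmax` resolves to index 0, which happens to be correct; the second is confidently wrong and dominates the loss.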

Finally, start the training.

In [11]:
train_x_former, train_y, dev_x_former, dev_y = read_data1set()
print("data1 load finished")

tf.reset_default_graph()

sess = tf.InteractiveSession()

han = HAN(vocab_size=FLAGS.vocab_size,
                    num_classes=FLAGS.num_classes,
                    embedding_size=FLAGS.embedding_size,
                    hidden_size=FLAGS.hidden_size)

loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=han.input_y,
                                                              logits=han.out,
                                                              name='loss'))

output_vec=han.out
input_label=han.input_y
predict = tf.argmax(han.out, axis=1, name='predict')
label = tf.argmax(han.input_y, axis=1, name='label')
acc = tf.reduce_mean(tf.cast(tf.equal(predict, label), tf.float32))
    
global_step = tf.Variable(0, trainable=False)
optimizer = tf.train.AdamOptimizer(FLAGS.learning_rate)
# Gradient clipping, common in RNNs, keeps exploding gradients from destabilizing training
tvars = tf.trainable_variables()
grads, _ = tf.clip_by_global_norm(tf.gradients(loss, tvars), FLAGS.grad_clip)
grads_and_vars = tuple(zip(grads, tvars))
train_op = optimizer.apply_gradients(grads_and_vars, global_step=global_step)

saver = tf.train.Saver()
sess.run(tf.global_variables_initializer())



def train_step(x_batch, y_batch):
    feed_dict = {
        han.input_x: x_batch,
        han.input_y: y_batch,
        han.max_sentence_num: 30,
        han.max_sentence_length: 30,
        han.batch_size: 24
    }
    _, step, cost, accuracy = sess.run([train_op, global_step , loss, acc], feed_dict)
    
    time_str = str(int(time.time()))
    print("{}: step {}, loss {:g}, acc {:g}".format(time_str, step, cost, accuracy))

    return step

def dev_step(x_batch, y_batch, writer=None):
    feed_dict = {
        han.input_x: x_batch,
        han.input_y: y_batch,
        han.max_sentence_num: 30,
        han.max_sentence_length: 30,
        han.batch_size: 24
    }
    step, cost, accuracy = sess.run([global_step,  loss, acc], feed_dict)
    time_str = str(int(time.time()))
    print("++++++++++++++++++dev++++++++++++++{}: step {}, loss {:g}, acc {:g}".format(time_str, step, cost, accuracy))

save_version=0
for epoch in range(FLAGS.num_epochs):
    print('current epoch %s' % (epoch + 1))
    for i in range(0, 4392, FLAGS.batch_size):
        x = train_x_former[i:i + FLAGS.batch_size]
        y = train_y[i:i + FLAGS.batch_size]
        step = train_step(x, y)
        if step % FLAGS.evaluate_every == 0:
            dev_step(dev_x_former, dev_y)
        
            save_path = saver.save(sess, "former_latter_model/text1/pretrained_lstm.ckpt", global_step=save_version)
            print("saved to %s" % save_path)
            save_version+=1

data1 load finished
Instructions for updating:
seq_dim is deprecated, use seq_axis instead
Instructions for updating:
batch_dim is deprecated, use batch_axis instead
Instructions for updating:
keep_dims is deprecated, use keepdims instead
Instructions for updating:
dim is deprecated, use axis instead
Instructions for updating:

Future major versions of TensorFlow will allow gradients to flow
into the labels input on backprop by default.

See @{tf.nn.softmax_cross_entropy_with_logits_v2}.

current epoch 1
1537762072: step 1, loss 0.733924, acc 0.458333
1537762073: step 2, loss 1.24407, acc 0.583333
1537762074: step 3, loss 1.06877, acc 0.541667
1537762076: step 4, loss 0.732956, acc 0.416667
1537762077: step 5, loss 0.665761, acc 0.583333
1537762078: step 6, loss 0.808744, acc 0.458333
1537762079: step 7, loss 0.634797, acc 0.708333
1537762081: step 8, loss 0.643546, acc 0.625
1537762082: step 9, loss 0.675144, acc 0.5
1537762083: step 10, loss 0.683194, acc 0.5
1537762084: step 11, los

In [12]:
train_x_latter, train_y, dev_x_latter, dev_y = read_data2set()
print("data2 load finished")

tf.reset_default_graph()

sess = tf.InteractiveSession()

han = HAN(vocab_size=FLAGS.vocab_size,
                    num_classes=FLAGS.num_classes,
                    embedding_size=FLAGS.embedding_size,
                    hidden_size=FLAGS.hidden_size)

loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=han.input_y,
                                                              logits=han.out,
                                                              name='loss'))

output_vec=han.out
input_label=han.input_y
predict = tf.argmax(han.out, axis=1, name='predict')
label = tf.argmax(han.input_y, axis=1, name='label')
acc = tf.reduce_mean(tf.cast(tf.equal(predict, label), tf.float32))
    
global_step = tf.Variable(0, trainable=False)
optimizer = tf.train.AdamOptimizer(FLAGS.learning_rate)
# Gradient clipping, common in RNNs, keeps exploding gradients from destabilizing training
tvars = tf.trainable_variables()
grads, _ = tf.clip_by_global_norm(tf.gradients(loss, tvars), FLAGS.grad_clip)
grads_and_vars = tuple(zip(grads, tvars))
train_op = optimizer.apply_gradients(grads_and_vars, global_step=global_step)

saver = tf.train.Saver()
sess.run(tf.global_variables_initializer())



def train_step(x_batch, y_batch):
    feed_dict = {
        han.input_x: x_batch,
        han.input_y: y_batch,
        han.max_sentence_num: 30,
        han.max_sentence_length: 30,
        han.batch_size: 24
    }
    _, step, cost, accuracy = sess.run([train_op, global_step , loss, acc], feed_dict)
    
    time_str = str(int(time.time()))
    print("{}: step {}, loss {:g}, acc {:g}".format(time_str, step, cost, accuracy))

    return step

def dev_step(x_batch, y_batch, writer=None):
    feed_dict = {
        han.input_x: x_batch,
        han.input_y: y_batch,
        han.max_sentence_num: 30,
        han.max_sentence_length: 30,
        han.batch_size: 24
    }
    step, cost, accuracy = sess.run([global_step,  loss, acc], feed_dict)
    time_str = str(int(time.time()))
    print("++++++++++++++++++dev++++++++++++++{}: step {}, loss {:g}, acc {:g}".format(time_str, step, cost, accuracy))

save_version=0
for epoch in range(FLAGS.num_epochs):
    print('current epoch %s' % (epoch + 1))
    for i in range(0, 4392, FLAGS.batch_size):
        x = train_x_latter[i:i + FLAGS.batch_size]
        y = train_y[i:i + FLAGS.batch_size]
        step = train_step(x, y)
        if step % FLAGS.evaluate_every == 0:
            dev_step(dev_x_latter, dev_y)
        
            save_path = saver.save(sess, "former_latter_model/text2/pretrained_lstm.ckpt", global_step=save_version)
            print("saved to %s" % save_path)
            save_version+=1

data2 load finished
current epoch 1
1537765677: step 1, loss 0.706049, acc 0.583333
1537765678: step 2, loss 0.871344, acc 0.708333
1537765679: step 3, loss 0.883042, acc 0.625
1537765680: step 4, loss 1.05016, acc 0.291667
1537765682: step 5, loss 0.638533, acc 0.666667
1537765683: step 6, loss 0.879647, acc 0.541667
1537765684: step 7, loss 0.60166, acc 0.75
1537765685: step 8, loss 0.649221, acc 0.583333
1537765687: step 9, loss 0.59971, acc 0.708333
1537765688: step 10, loss 0.492634, acc 0.75
1537765689: step 11, loss 0.549772, acc 0.75
1537765691: step 12, loss 0.617509, acc 0.625
1537765692: step 13, loss 0.469264, acc 0.791667
1537765693: step 14, loss 0.37728, acc 0.916667
1537765695: step 15, loss 0.469307, acc 0.666667
1537765696: step 16, loss 0.426078, acc 0.833333
1537765697: step 17, loss 0.439307, acc 0.75
1537765699: step 18, loss 0.369443, acc 0.833333
1537765700: step 19, loss 0.744494, acc 0.541667
1537765701: step 20, loss 0.659898, acc 0.75
1537765702: step 21, lo

## Test
For testing, we reuse the validation data. We compute the accuracy of each part's model separately and also collect each model's prediction vector (logits) for every news item.

In [42]:
sess = tf.InteractiveSession()
saver = tf.train.Saver()
saver.restore(sess, tf.train.latest_checkpoint('former_latter_model/text1'))

predict_value1=[]
right_label1=[]
global avg_acc1
avg_acc1=0

def test_step(x_batch, y_batch, writer=None):
    global avg_acc1
    feed_dict = {
        han.input_x: x_batch,
        han.input_y: y_batch,
        han.max_sentence_num: 30,
        han.max_sentence_length: 30,
        han.batch_size: 24
    }
    cost, accuracy, predict, label = sess.run([loss, acc, output_vec, input_label], feed_dict)
    predict_value1.append(predict)
    right_label1.append(label)
    avg_acc1+=accuracy
    print("loss {:g}, acc {:g}".format(cost, accuracy))

# Evaluate the former-part model on the validation data in batches of 24
for i in range(0, 1772, FLAGS.batch_size):
    x = dev_x_former[i:i + FLAGS.batch_size]
    y = dev_y[i:i + FLAGS.batch_size]
    test_step(x, y)

print("average acc is")
print(avg_acc1/len(predict_value1))

INFO:tensorflow:Restoring parameters from former_latter_model/text1\pretrained_lstm.ckpt-11
loss 0.0920556, acc 0.958333
loss 0.0741103, acc 0.958333
loss 0.343748, acc 0.958333
loss 0.00279053, acc 1
loss 0.00466994, acc 1
loss 0.201603, acc 0.958333
loss 0.518266, acc 0.916667
loss 0.26631, acc 0.916667
loss 0.184382, acc 0.916667
loss 0.0496311, acc 1
loss 0.49801, acc 0.958333
loss 0.00631825, acc 1
loss 0.40336, acc 0.958333
loss 0.00293224, acc 1
loss 0.00331286, acc 1
loss 0.132667, acc 0.916667
loss 0.372646, acc 0.916667
loss 0.0773276, acc 0.958333
loss 0.350661, acc 0.958333
loss 0.17061, acc 0.916667
loss 0.00109679, acc 1
loss 0.000491657, acc 1
loss 0.0459353, acc 0.958333
loss 0.0625831, acc 0.958333
loss 0.019225, acc 1
loss 0.0125128, acc 1
loss 0.00478132, acc 1
loss 0.0153141, acc 1
loss 0.214775, acc 0.958333
loss 0.0762558, acc 0.958333
loss 3.28499e-05, acc 1
loss 0.00744325, acc 1
loss 0.48454, acc 0.916667
loss 0.0245959, acc 1
loss 0.00529811, acc 1
loss 0.0204

In [None]:
sess = tf.InteractiveSession()
saver = tf.train.Saver()
saver.restore(sess, tf.train.latest_checkpoint('former_latter_model/text2'))

predict_value2=[]
right_label2=[]

global avg_acc2
avg_acc2=0
def test_step(x_batch, y_batch, writer=None):
    global avg_acc2
    feed_dict = {
        han.input_x: x_batch,
        han.input_y: y_batch,
        han.max_sentence_num: 30,
        han.max_sentence_length: 30,
        han.batch_size: 24
    }
    cost, accuracy, predict, label = sess.run([loss, acc, output_vec, input_label], feed_dict)
    avg_acc2+=accuracy
    predict_value2.append(predict)
    right_label2.append(label)

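# Evaluate the latter-part model on the validation data in batches of 24
for i in range(0, 1772, FLAGS.batch_size):
    x = dev_x_latter[i:i + FLAGS.batch_size]
    y = dev_y[i:i + FLAGS.batch_size]
    test_step(x, y)

print("average acc is")
print(avg_acc2/len(predict_value2))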

## Voting
We simply sum the two prediction vectors (logits) of each news item. The two parts could also be weighted differently to improve the result.
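The weights can be made explicit. A minimal sketch, assuming the per-batch logit arrays of each model are stacked into `(num_news, 2)` arrays (e.g. with `np.concatenate(predict_value1)`), with column 0 as the fake score. `weighted_vote` and the weight `w` are hypothetical, and `w` would need to be tuned on held-out data rather than on the test set.

```python
import numpy as np

def weighted_vote(logits_former, logits_latter, w=0.5):
    """Combine the two parts' logits with weight w on the former part.

    Returns 1 (fake) where the combined fake score (column 0) exceeds the real
    score (column 1). w = 0.5 gives the same decisions as a plain sum.
    """
    combined = w * logits_former + (1.0 - w) * logits_latter
    return (combined[:, 0] > combined[:, 1]).astype(int)

former = np.array([[3.0, -3.0], [-1.0, 1.0]])
latter = np.array([[-2.0, 2.0], [2.0, -2.0]])
print(weighted_vote(former, latter))         # [1 1]
print(weighted_vote(former, latter, w=0.2))  # [0 1]
```

Scaling both terms by the same positive factor never changes a decision, so only the ratio between the two weights matters.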

In [18]:
predict_value1

[array([[-2.87407732,  2.86728835],
        [-3.4724381 ,  4.59155655],
        [-3.40941668,  4.14441061],
        [ 5.61157703, -4.55303717],
        [ 6.62583971, -5.51869488],
        [ 3.55491328, -2.86266589],
        [ 6.71685648, -5.49294853],
        [ 2.25436592, -1.55985236],
        [-4.08949709,  4.50502157],
        [ 6.12130976, -5.66379118],
        [-4.62330961,  4.36618185],
        [-3.8996501 ,  4.58849335],
        [ 2.75561666, -1.92722094],
        [ 4.90062141, -3.15897346],
        [ 3.78916216, -3.94867086],
        [ 5.89220905, -5.29699612],
        [-0.50727808,  1.53815925],
        [-4.52573967,  4.53218937],
        [ 3.3774004 , -2.93087697],
        [-5.15171337,  5.70152235],
        [-5.94892597,  5.97100592],
        [ 7.75181007, -6.27497482],
        [-5.54223347,  5.44656849],
        [-2.66199303,  3.42927122]], dtype=float32),
 array([[-5.91026783,  6.12222385],
        [-3.09878039,  2.62918687],
        [ 8.1371212 , -6.65039206],
        [ 4

In [50]:
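# Sum the logits of the two part-models for every evaluated news item.
# Column 0 is the fake score and column 1 the real score (see the one-hot labels above),
# so `former` collects the summed fake scores and `latter` the summed real scores.
former=[]
latter=[]
label=[]  # true labels: 1 = fake, 0 = real
for i in range(len(predict_value1)):
    for j in range(len(predict_value1[i])):
        former.append(predict_value1[i][j][0]+predict_value2[i][j][0])
        latter.append(predict_value1[i][j][1]+predict_value2[i][j][1])
        if right_label1[i][j][0]>right_label1[i][j][1]:
            label.append(1)
        else:
            label.append(0)

# Predict fake (1) when the summed fake score is larger
combine_predict_label=[]
for i in range(len(former)):
    if former[i]>latter[i]:
        combine_predict_label.append(1)
    else:
        combine_predict_label.append(0)

# Accuracy of the combined prediction
count=0
for i in range(len(label)):
    if label[i]==combine_predict_label[i]:
        count+=1
print(count/len(label))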

0.9701576576576577
