# NLP - Movie sentiment review - Word2Vec algorithm


Word2Vec, published by Google in 2013, is a neural network implementation that learns distributed representations for words. Other deep or recurrent neural network architectures had been proposed for learning word representations prior to this, but their major problem was the long time required to train the models. Word2Vec learns quickly relative to those models.

Word2Vec does not need labels in order to create meaningful representations. This is useful, since most data in the real world is unlabeled. If the network is given enough training data (tens of billions of words), it produces word vectors with intriguing characteristics. Words with similar meanings appear in clusters, and clusters are spaced such that some word relationships, such as analogies, can be reproduced using vector math. The famous example is that, with highly trained word vectors, "king - man + woman = queen."
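The vector arithmetic behind the analogy can be sketched with hand-made toy vectors. The two-dimensional values below are invented purely for illustration (axis 0 loosely "royalty", axis 1 loosely "gender"); real Word2Vec vectors are learned from data and have hundreds of dimensions. The helper names `cosine` and `analogy` are hypothetical and not part of any library.

```python
import math

# Hypothetical, hand-made 2-D vectors (NOT learned by Word2Vec):
# axis 0 ~ "royalty", axis 1 ~ "gender".
toy_vectors = {
    "king":  (1.0, 1.0),
    "queen": (1.0, -1.0),
    "man":   (0.0, 1.0),
    "woman": (0.0, -1.0),
    "film":  (-0.5, 0.1),
}

def cosine(u, v):
    # Cosine similarity between two equal-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def analogy(positive, negative, vectors):
    # Add the "positive" vectors, subtract the "negative" ones, then return
    # the closest remaining word (the query words themselves are excluded).
    dims = len(next(iter(vectors.values())))
    target = [0.0] * dims
    for word in positive:
        target = [t + x for t, x in zip(target, vectors[word])]
    for word in negative:
        target = [t - x for t, x in zip(target, vectors[word])]
    candidates = [w for w in vectors if w not in positive and w not in negative]
    return max(candidates, key=lambda w: cosine(vectors[w], target))

best = analogy(positive=["king", "woman"], negative=["man"], vectors=toy_vectors)
print(best)
```

With these toy values the target vector lands exactly on "queen"; with learned vectors the match is only approximate, which is why a nearest-neighbour search is used.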

Distributed word vectors are powerful and can be used for many applications, particularly word prediction and translation. Here, we will try to apply them to sentiment analysis.

source: https://www.kaggle.com/c/word2vec-nlp-tutorial#part-2-word-vectors

In [67]:
import nltk
import pandas as pd
import numpy as np

In [68]:
train = pd.read_csv('../word2vec/labeledTrainData.tsv', sep='\t')
test  = pd.read_csv('../word2vec/testData.tsv', sep='\t')

In [69]:
unlabel_train = pd.read_csv('../word2vec/unlabeledTrainData.tsv',
                            sep="\t", quoting=3)
# quoting=3 (csv.QUOTE_NONE) tells pandas to treat double quotes as ordinary
# characters; otherwise you may encounter errors trying to read the file.
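The value 3 corresponds to the standard-library constant `csv.QUOTE_NONE`. As a rough illustration of what that setting does, the sketch below parses a made-up tab-separated row with the `csv` module, once with default quoting and once with `QUOTE_NONE`. The sample text is an assumption, not taken from the data files.

```python
import csv
import io

# Made-up tab-separated sample resembling one row of review data.
raw = 'id\treview\n"1_7"\t"He said ""wow"" twice"\n'

# Default quoting: quotes delimit fields and "" is read as an escaped quote.
default_rows = list(csv.reader(io.StringIO(raw), delimiter="\t"))

# QUOTE_NONE: quote characters get no special treatment and stay in the text.
literal_rows = list(
    csv.reader(io.StringIO(raw), delimiter="\t", quoting=csv.QUOTE_NONE)
)

print(csv.QUOTE_NONE)
print(default_rows[1])
print(literal_rows[1])
```

Because the quotes are kept as literal characters, stray or unbalanced quotes inside a review cannot confuse the parser.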

In [70]:
print("Read %d labeled train reviews, %d labeled test reviews, " \
 "and %d unlabeled reviews\n" % (train["review"].size,  
 test["review"].size, unlabel_train["review"].size ))

Read 25000 labeled train reviews, 25000 labeled test reviews, and 50000 unlabeled reviews



In [71]:
# Import various modules for string cleaning
from bs4 import BeautifulSoup
import re
from nltk.corpus import stopwords

In [72]:
#copy the function we defined in previous exercise
def review_to_wordlist( review, remove_stopwords=False ):
    # Function to convert a document to a sequence of words,
    # optionally removing stop words.  Returns a list of words.
    #
    # 1. Remove HTML
    review_text = BeautifulSoup(review, "html.parser").get_text()
    #  
    # 2. Remove non-letters
    review_text = re.sub("[^a-zA-Z]"," ", review_text)
    #
    # 3. Convert words to lower case and split them
    words = review_text.lower().split()
    #
    # 4. Optionally remove stop words (false by default)
    if remove_stopwords:
        stops = set(stopwords.words("english"))
        words = [w for w in words if w not in stops]
    #
    # 5. Return a list of words
    return words

In [73]:
print(review_to_wordlist(test['review'][0],remove_stopwords=False))

['naturally', 'in', 'a', 'film', 'who', 's', 'main', 'themes', 'are', 'of', 'mortality', 'nostalgia', 'and', 'loss', 'of', 'innocence', 'it', 'is', 'perhaps', 'not', 'surprising', 'that', 'it', 'is', 'rated', 'more', 'highly', 'by', 'older', 'viewers', 'than', 'younger', 'ones', 'however', 'there', 'is', 'a', 'craftsmanship', 'and', 'completeness', 'to', 'the', 'film', 'which', 'anyone', 'can', 'enjoy', 'the', 'pace', 'is', 'steady', 'and', 'constant', 'the', 'characters', 'full', 'and', 'engaging', 'the', 'relationships', 'and', 'interactions', 'natural', 'showing', 'that', 'you', 'do', 'not', 'need', 'floods', 'of', 'tears', 'to', 'show', 'emotion', 'screams', 'to', 'show', 'fear', 'shouting', 'to', 'show', 'dispute', 'or', 'violence', 'to', 'show', 'anger', 'naturally', 'joyce', 's', 'short', 'story', 'lends', 'the', 'film', 'a', 'ready', 'made', 'structure', 'as', 'perfect', 'as', 'a', 'polished', 'diamond', 'but', 'the', 'small', 'changes', 'huston', 'makes', 'such', 'as', 'the', 
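The stray 's' tokens in the output come from step 2: every non-letter, including apostrophes and digits, is replaced by a space before splitting. A minimal sketch of just that step, using a made-up input string and a hypothetical helper name `letters_only_tokens`:

```python
import re

def letters_only_tokens(text):
    # Replace every character that is not a-z or A-Z with a space,
    # then lower-case and split on whitespace.
    return re.sub("[^a-zA-Z]", " ", text).lower().split()

tokens = letters_only_tokens("Joyce's story isn't 2nd-rate!")
print(tokens)
```

Contractions and possessives are therefore split into separate fragments, and numbers disappear entirely.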

In [74]:
# Download the punkt tokenizer for sentence splitting (only needed once)
#nltk.download('punkt')

In [77]:
# Load the punkt tokenizer
tokenizer = nltk.data.load('tokenizers/punkt/english.pickle')

In [88]:
test_limited = test[:1000] #as model below takes long time

In [76]:
# Define a function to split a review into parsed sentences
def review_to_sentences( review, tokenizer, remove_stopwords=False ):
    # Function to split a review into parsed sentences. Returns a 
    # list of sentences, where each sentence is a list of words
    #
    # 1. Use the NLTK tokenizer to split the paragraph into sentences
    raw_sentences = tokenizer.tokenize(review.strip())
    #
    # 2. Loop over each sentence
    sentences = []
    for raw_sentence in raw_sentences:
        # If a sentence is empty, skip it
        if len(raw_sentence) > 0:
            # Otherwise, call review_to_wordlist to get a list of words
            sentences.append( review_to_wordlist( raw_sentence, \
              remove_stopwords ))
    #
    # Return the list of sentences (each sentence is a list of words,
    # so this returns a list of lists)
    return sentences
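To make the "list of lists" shape concrete without loading NLTK, the sketch below swaps in a naive regular-expression splitter for the punkt tokenizer. This stand-in is an assumption for illustration only; punkt handles abbreviations and other edge cases that this regex does not. The helper names are hypothetical.

```python
import re

def naive_sentences(text):
    # Naive stand-in for punkt: split on whitespace that follows ., ! or ?
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def to_words(sentence):
    # Same cleaning idea as review_to_wordlist, minus the HTML removal.
    return re.sub("[^a-zA-Z]", " ", sentence).lower().split()

review = "The pace is steady. The characters are full and engaging!"
nested = [to_words(s) for s in naive_sentences(review)]
print(nested)
```

Each inner list is one sentence, which is the input format Word2Vec expects.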

Now we can apply this function to prepare our data for input to Word2Vec. This will take some time.


A minor detail to note is the difference between "+=" and "append" when it comes to Python lists. In many applications the two are interchangeable, but here they are not. If you are adding a list of lists to another list of lists, "append" adds the entire new list as a single nested element; you need to use "+=" in order to join all of the lists at once.
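A small sketch with made-up sentences shows how the two operations behave on nested lists: "+=" extends the outer list with each new sentence, while "append" adds its whole argument as one element.

```python
existing = [["a", "b"]]
new_sentences = [["c"], ["d", "e"]]

# "+=" extends the list: each new sentence becomes its own element.
extended = list(existing)
extended += new_sentences

# "append" adds the whole argument as ONE element, creating extra nesting.
appended = list(existing)
appended.append(new_sentences)

print(extended)
print(appended)
```

Only the first result is a flat list of sentences, which is what the training step needs.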

In [99]:
sentences = []  # Initialize an empty list of sentences

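print ("Parsing sentences from training set")

# NOTE: the training-set loop is commented out in this run, so no labeled
# training reviews are parsed here.
#for review in train['review']:
#    sentences += review_to_sentences(review, tokenizer)
    
print ("Parsing sentences from unlabeled set")

# NOTE: despite the message above, this loop iterates over the test reviews,
# not unlabel_train.
for review in test["review"]:
    sentences += review_to_sentences(review, tokenizer)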
 

Parsing sentences from training set
Parsing sentences from unlabeled set


  ' Beautiful Soup.' % markup)


You may get a few warnings from BeautifulSoup about URLs in the sentences. These are nothing to worry about (although you may want to consider removing URLs when cleaning the text). 

We can take a look at the output to see how this differs from Part 1:

In [100]:
# Check how many sentences we have in total
print(len(sentences))

261737


In [101]:
print(sentences[0])

['naturally', 'in', 'a', 'film', 'who', 's', 'main', 'themes', 'are', 'of', 'mortality', 'nostalgia', 'and', 'loss', 'of', 'innocence', 'it', 'is', 'perhaps', 'not', 'surprising', 'that', 'it', 'is', 'rated', 'more', 'highly', 'by', 'older', 'viewers', 'than', 'younger', 'ones']


In [102]:
print(sentences[1])

['however', 'there', 'is', 'a', 'craftsmanship', 'and', 'completeness', 'to', 'the', 'film', 'which', 'anyone', 'can', 'enjoy']


**Training and Saving Your Model**

Choosing parameters is not easy, but once we have chosen our parameters, creating a Word2Vec model is straightforward:

In [103]:
# Import the built-in logging module and configure it so that Word2Vec 
# creates nice output messages

import logging
logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s',\
    level=logging.INFO)

In [104]:
# Set values for various parameters
num_features = 300    # Word vector dimensionality                      
min_word_count = 40   # Minimum word count                        
num_workers = 4       # Number of threads to run in parallel
context = 10          # Context window size                                                                                    
downsampling = 1e-3   # Downsample setting for frequent words
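As a rough intuition for two of these settings, the sketch below shows what a minimum word count and a context window mean on a made-up corpus. It is a simplified illustration with hypothetical helper names, not gensim's implementation, which differs in details (for example, it also downsamples frequent words according to `sample`).

```python
from collections import Counter

def prune_vocab(sentences, min_count):
    # Keep only words that occur at least `min_count` times in the corpus.
    counts = Counter(word for sentence in sentences for word in sentence)
    return {word for word, count in counts.items() if count >= min_count}

def context_pairs(sentence, window):
    # Pair each word with its neighbours at most `window` positions away.
    pairs = []
    for i, center in enumerate(sentence):
        start = max(0, i - window)
        stop = min(len(sentence), i + window + 1)
        for j in range(start, stop):
            if j != i:
                pairs.append((center, sentence[j]))
    return pairs

toy = [["the", "film", "is", "good"], ["the", "film", "is", "long"]]
vocab = prune_vocab(toy, min_count=2)
pairs = context_pairs(["the", "film", "is"], window=1)
print(sorted(vocab))
print(pairs)
```

A higher minimum count shrinks the vocabulary to words with enough examples to learn from, and a larger window lets more distant words count as context.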

In [105]:
# Initialize and train the model (this will take some time)
from gensim.models import word2vec
print('Training model...')

# Note: `size` is the gensim 3.x parameter name; gensim 4.0+ calls it `vector_size`.
model = word2vec.Word2Vec(sentences, workers=num_workers,
                          size=num_features, min_count=min_word_count,
                          window=context, sample=downsampling)

  "C extension not loaded, training will be slow. "
2019-01-16 09:59:54,385 : INFO : collecting all words and their counts
2019-01-16 09:59:54,386 : INFO : PROGRESS: at sentence #0, processed 0 words, keeping 0 word types
2019-01-16 09:59:54,448 : INFO : PROGRESS: at sentence #10000, processed 225327 words, keeping 17351 word types
2019-01-16 09:59:54,495 : INFO : PROGRESS: at sentence #20000, processed 447171 words, keeping 24545 word types
2019-01-16 09:59:54,549 : INFO : PROGRESS: at sentence #30000, processed 667214 words, keeping 29793 word types


Training model...


2019-01-16 09:59:54,593 : INFO : PROGRESS: at sentence #40000, processed 886037 words, keeping 33761 word types
2019-01-16 09:59:54,647 : INFO : PROGRESS: at sentence #50000, processed 1106874 words, keeping 37132 word types
2019-01-16 09:59:54,696 : INFO : PROGRESS: at sentence #60000, processed 1333859 words, keeping 40339 word types
2019-01-16 09:59:54,742 : INFO : PROGRESS: at sentence #70000, processed 1555354 words, keeping 43109 word types
2019-01-16 09:59:54,790 : INFO : PROGRESS: at sentence #80000, processed 1774992 words, keeping 45399 word types
2019-01-16 09:59:54,838 : INFO : PROGRESS: at sentence #90000, processed 1993856 words, keeping 47720 word types
2019-01-16 09:59:54,885 : INFO : PROGRESS: at sentence #100000, processed 2207132 words, keeping 49733 word types
2019-01-16 09:59:54,933 : INFO : PROGRESS: at sentence #110000, processed 2427132 words, keeping 51549 word types
2019-01-16 09:59:54,979 : INFO : PROGRESS: at sentence #120000, processed 2650775 words, keepin

2019-01-16 10:04:30,064 : INFO : EPOCH 1 - PROGRESS: at 14.63% examples, 2110 words/s, in_qsize 8, out_qsize 0
2019-01-16 10:04:31,067 : INFO : EPOCH 1 - PROGRESS: at 14.80% examples, 2127 words/s, in_qsize 8, out_qsize 0
2019-01-16 10:04:33,072 : INFO : EPOCH 1 - PROGRESS: at 15.15% examples, 2161 words/s, in_qsize 7, out_qsize 0
2019-01-16 10:04:42,493 : INFO : EPOCH 1 - PROGRESS: at 15.32% examples, 2114 words/s, in_qsize 8, out_qsize 0
2019-01-16 10:04:43,678 : INFO : EPOCH 1 - PROGRESS: at 15.49% examples, 2129 words/s, in_qsize 7, out_qsize 0
2019-01-16 10:04:45,704 : INFO : EPOCH 1 - PROGRESS: at 15.85% examples, 2161 words/s, in_qsize 7, out_qsize 0
2019-01-16 10:04:55,060 : INFO : EPOCH 1 - PROGRESS: at 16.02% examples, 2117 words/s, in_qsize 8, out_qsize 0
2019-01-16 10:04:56,238 : INFO : EPOCH 1 - PROGRESS: at 16.18% examples, 2131 words/s, in_qsize 7, out_qsize 0
2019-01-16 10:04:58,137 : INFO : EPOCH 1 - PROGRESS: at 16.53% examples, 2162 words/s, in_qsize 7, out_qsize 0
...

2019-01-16 10:09:34,952 : INFO : EPOCH 1 - PROGRESS: at 31.69% examples, 2164 words/s, in_qsize 8, out_qsize 0
2019-01-16 10:09:42,848 : INFO : EPOCH 1 - PROGRESS: at 31.87% examples, 2147 words/s, in_qsize 7, out_qsize 0
2019-01-16 10:09:44,872 : INFO : EPOCH 1 - PROGRESS: at 32.05% examples, 2151 words/s, in_qsize 7, out_qsize 0
2019-01-16 10:09:47,695 : INFO : EPOCH 1 - PROGRESS: at 32.38% examples, 2163 words/s, in_qsize 7, out_qsize 0
2019-01-16 10:09:55,186 : INFO : EPOCH 1 - PROGRESS: at 32.57% examples, 2148 words/s, in_qsize 7, out_qsize 0
2019-01-16 10:09:57,580 : INFO : EPOCH 1 - PROGRESS: at 32.75% examples, 2150 words/s, in_qsize 7, out_qsize 0
2019-01-16 10:10:00,215 : INFO : EPOCH 1 - PROGRESS: at 33.08% examples, 2164 words/s, in_qsize 7, out_qsize 0
2019-01-16 10:10:07,942 : INFO : EPOCH 1 - PROGRESS: at 33.25% examples, 2148 words/s, in_qsize 7, out_qsize 0
2019-01-16 10:10:10,213 : INFO : EPOCH 1 - PROGRESS: at 33.42% examples, 2151 words/s, in_qsize 8, out_qsize 0
...

2019-01-16 10:14:49,683 : INFO : EPOCH 1 - PROGRESS: at 48.59% examples, 2149 words/s, in_qsize 8, out_qsize 0
2019-01-16 10:14:52,178 : INFO : EPOCH 1 - PROGRESS: at 48.96% examples, 2159 words/s, in_qsize 8, out_qsize 0
2019-01-16 10:14:57,387 : INFO : EPOCH 1 - PROGRESS: at 49.13% examples, 2154 words/s, in_qsize 8, out_qsize 0
2019-01-16 10:15:02,075 : INFO : EPOCH 1 - PROGRESS: at 49.29% examples, 2150 words/s, in_qsize 8, out_qsize 0
2019-01-16 10:15:05,082 : INFO : EPOCH 1 - PROGRESS: at 49.65% examples, 2158 words/s, in_qsize 7, out_qsize 0
2019-01-16 10:15:09,725 : INFO : EPOCH 1 - PROGRESS: at 49.83% examples, 2154 words/s, in_qsize 8, out_qsize 0
2019-01-16 10:15:14,482 : INFO : EPOCH 1 - PROGRESS: at 50.00% examples, 2150 words/s, in_qsize 8, out_qsize 0
2019-01-16 10:15:17,592 : INFO : EPOCH 1 - PROGRESS: at 50.36% examples, 2158 words/s, in_qsize 8, out_qsize 0
2019-01-16 10:15:22,221 : INFO : EPOCH 1 - PROGRESS: at 50.54% examples, 2155 words/s, in_qsize 8, out_qsize 0
...

2019-01-16 10:19:30,927 : INFO : EPOCH 1 - PROGRESS: at 64.10% examples, 2157 words/s, in_qsize 8, out_qsize 0
2019-01-16 10:19:33,982 : INFO : EPOCH 1 - PROGRESS: at 64.27% examples, 2157 words/s, in_qsize 7, out_qsize 0
2019-01-16 10:19:40,625 : INFO : EPOCH 1 - PROGRESS: at 64.43% examples, 2150 words/s, in_qsize 8, out_qsize 0
2019-01-16 10:19:41,768 : INFO : EPOCH 1 - PROGRESS: at 64.61% examples, 2154 words/s, in_qsize 7, out_qsize 0
2019-01-16 10:19:43,530 : INFO : EPOCH 1 - PROGRESS: at 64.78% examples, 2156 words/s, in_qsize 8, out_qsize 0
2019-01-16 10:19:46,452 : INFO : EPOCH 1 - PROGRESS: at 64.94% examples, 2157 words/s, in_qsize 7, out_qsize 0
2019-01-16 10:19:53,206 : INFO : EPOCH 1 - PROGRESS: at 65.12% examples, 2150 words/s, in_qsize 8, out_qsize 0
2019-01-16 10:19:55,979 : INFO : EPOCH 1 - PROGRESS: at 65.47% examples, 2157 words/s, in_qsize 7, out_qsize 0
2019-01-16 10:19:58,906 : INFO : EPOCH 1 - PROGRESS: at 65.66% examples, 2157 words/s, in_qsize 7, out_qsize 0
...

2019-01-16 10:23:29,659 : INFO : EPOCH 1 - PROGRESS: at 77.26% examples, 2158 words/s, in_qsize 7, out_qsize 0
2019-01-16 10:23:30,808 : INFO : EPOCH 1 - PROGRESS: at 77.44% examples, 2161 words/s, in_qsize 7, out_qsize 0
2019-01-16 10:23:38,488 : INFO : EPOCH 1 - PROGRESS: at 77.62% examples, 2155 words/s, in_qsize 8, out_qsize 0
2019-01-16 10:23:39,704 : INFO : EPOCH 1 - PROGRESS: at 77.79% examples, 2157 words/s, in_qsize 7, out_qsize 0
2019-01-16 10:23:42,369 : INFO : EPOCH 1 - PROGRESS: at 77.97% examples, 2158 words/s, in_qsize 8, out_qsize 0
2019-01-16 10:23:50,956 : INFO : EPOCH 1 - PROGRESS: at 78.32% examples, 2155 words/s, in_qsize 7, out_qsize 0
2019-01-16 10:23:52,154 : INFO : EPOCH 1 - PROGRESS: at 78.50% examples, 2158 words/s, in_qsize 8, out_qsize 0
2019-01-16 10:23:54,928 : INFO : EPOCH 1 - PROGRESS: at 78.68% examples, 2158 words/s, in_qsize 7, out_qsize 0
2019-01-16 10:24:03,237 : INFO : EPOCH 1 - PROGRESS: at 79.04% examples, 2155 words/s, in_qsize 8, out_qsize 0
...

2019-01-16 10:28:15,220 : INFO : EPOCH 1 - PROGRESS: at 92.87% examples, 2156 words/s, in_qsize 7, out_qsize 0
2019-01-16 10:28:16,711 : INFO : EPOCH 1 - PROGRESS: at 93.03% examples, 2158 words/s, in_qsize 8, out_qsize 0
2019-01-16 10:28:18,768 : INFO : EPOCH 1 - PROGRESS: at 93.21% examples, 2160 words/s, in_qsize 7, out_qsize 0
2019-01-16 10:28:19,885 : INFO : EPOCH 1 - PROGRESS: at 93.38% examples, 2162 words/s, in_qsize 7, out_qsize 0
2019-01-16 10:28:27,712 : INFO : EPOCH 1 - PROGRESS: at 93.55% examples, 2156 words/s, in_qsize 7, out_qsize 0
2019-01-16 10:28:29,389 : INFO : EPOCH 1 - PROGRESS: at 93.73% examples, 2158 words/s, in_qsize 8, out_qsize 0
2019-01-16 10:28:31,058 : INFO : EPOCH 1 - PROGRESS: at 93.89% examples, 2160 words/s, in_qsize 7, out_qsize 0
2019-01-16 10:28:32,283 : INFO : EPOCH 1 - PROGRESS: at 94.06% examples, 2163 words/s, in_qsize 8, out_qsize 0
2019-01-16 10:28:40,284 : INFO : EPOCH 1 - PROGRESS: at 94.23% examples, 2157 words/s, in_qsize 7, out_qsize 0
...

2019-01-16 10:34:40,231 : INFO : EPOCH 2 - PROGRESS: at 13.93% examples, 2102 words/s, in_qsize 8, out_qsize 0
2019-01-16 10:34:42,153 : INFO : EPOCH 2 - PROGRESS: at 14.26% examples, 2138 words/s, in_qsize 7, out_qsize 0
2019-01-16 10:34:52,573 : INFO : EPOCH 2 - PROGRESS: at 14.63% examples, 2107 words/s, in_qsize 8, out_qsize 0
2019-01-16 10:34:54,525 : INFO : EPOCH 2 - PROGRESS: at 14.97% examples, 2141 words/s, in_qsize 8, out_qsize 0
2019-01-16 10:35:04,996 : INFO : EPOCH 2 - PROGRESS: at 15.32% examples, 2110 words/s, in_qsize 8, out_qsize 0
2019-01-16 10:35:07,289 : INFO : EPOCH 2 - PROGRESS: at 15.67% examples, 2141 words/s, in_qsize 7, out_qsize 0
2019-01-16 10:35:17,477 : INFO : EPOCH 2 - PROGRESS: at 16.02% examples, 2114 words/s, in_qsize 7, out_qsize 0
2019-01-16 10:35:19,704 : INFO : EPOCH 2 - PROGRESS: at 16.36% examples, 2144 words/s, in_qsize 7, out_qsize 0
2019-01-16 10:35:29,852 : INFO : EPOCH 2 - PROGRESS: at 16.67% examples, 2117 words/s, in_qsize 7, out_qsize 0
...

2019-01-16 10:41:19,658 : INFO : EPOCH 2 - PROGRESS: at 36.00% examples, 2151 words/s, in_qsize 8, out_qsize 0
2019-01-16 10:41:22,649 : INFO : EPOCH 2 - PROGRESS: at 36.33% examples, 2162 words/s, in_qsize 7, out_qsize 0
2019-01-16 10:41:31,947 : INFO : EPOCH 2 - PROGRESS: at 36.70% examples, 2152 words/s, in_qsize 8, out_qsize 0
2019-01-16 10:41:35,030 : INFO : EPOCH 2 - PROGRESS: at 37.08% examples, 2162 words/s, in_qsize 7, out_qsize 0
2019-01-16 10:41:36,110 : INFO : EPOCH 2 - PROGRESS: at 37.27% examples, 2169 words/s, in_qsize 8, out_qsize 0
2019-01-16 10:41:44,323 : INFO : EPOCH 2 - PROGRESS: at 37.44% examples, 2153 words/s, in_qsize 8, out_qsize 0
2019-01-16 10:41:47,443 : INFO : EPOCH 2 - PROGRESS: at 37.80% examples, 2163 words/s, in_qsize 7, out_qsize 0
2019-01-16 10:41:48,750 : INFO : EPOCH 2 - PROGRESS: at 37.97% examples, 2169 words/s, in_qsize 7, out_qsize 0
2019-01-16 10:41:56,545 : INFO : EPOCH 2 - PROGRESS: at 38.18% examples, 2154 words/s, in_qsize 8, out_qsize 0
...

2019-01-16 10:45:48,798 : INFO : EPOCH 2 - PROGRESS: at 51.04% examples, 2166 words/s, in_qsize 8, out_qsize 0
2019-01-16 10:45:54,226 : INFO : EPOCH 2 - PROGRESS: at 51.20% examples, 2160 words/s, in_qsize 7, out_qsize 0
2019-01-16 10:45:55,315 : INFO : EPOCH 2 - PROGRESS: at 51.35% examples, 2165 words/s, in_qsize 7, out_qsize 0
2019-01-16 10:45:59,166 : INFO : EPOCH 2 - PROGRESS: at 51.53% examples, 2164 words/s, in_qsize 7, out_qsize 0
2019-01-16 10:46:01,370 : INFO : EPOCH 2 - PROGRESS: at 51.72% examples, 2166 words/s, in_qsize 7, out_qsize 0
2019-01-16 10:46:06,672 : INFO : EPOCH 2 - PROGRESS: at 51.90% examples, 2161 words/s, in_qsize 8, out_qsize 0
2019-01-16 10:46:11,529 : INFO : EPOCH 2 - PROGRESS: at 52.28% examples, 2164 words/s, in_qsize 7, out_qsize 0
2019-01-16 10:46:14,080 : INFO : EPOCH 2 - PROGRESS: at 52.45% examples, 2166 words/s, in_qsize 8, out_qsize 0
2019-01-16 10:46:19,424 : INFO : EPOCH 2 - PROGRESS: at 52.62% examples, 2161 words/s, in_qsize 8, out_qsize 0
...

2019-01-16 10:50:53,361 : INFO : EPOCH 2 - PROGRESS: at 67.55% examples, 2161 words/s, in_qsize 7, out_qsize 0
2019-01-16 10:50:56,994 : INFO : EPOCH 2 - PROGRESS: at 67.70% examples, 2160 words/s, in_qsize 7, out_qsize 0
2019-01-16 10:51:04,183 : INFO : EPOCH 2 - PROGRESS: at 68.05% examples, 2159 words/s, in_qsize 7, out_qsize 0
2019-01-16 10:51:06,046 : INFO : EPOCH 2 - PROGRESS: at 68.21% examples, 2161 words/s, in_qsize 7, out_qsize 0
2019-01-16 10:51:09,527 : INFO : EPOCH 2 - PROGRESS: at 68.41% examples, 2160 words/s, in_qsize 8, out_qsize 0
2019-01-16 10:51:16,507 : INFO : EPOCH 2 - PROGRESS: at 68.79% examples, 2159 words/s, in_qsize 8, out_qsize 0
2019-01-16 10:51:18,629 : INFO : EPOCH 2 - PROGRESS: at 68.95% examples, 2161 words/s, in_qsize 8, out_qsize 0
2019-01-16 10:51:21,924 : INFO : EPOCH 2 - PROGRESS: at 69.15% examples, 2161 words/s, in_qsize 7, out_qsize 0
2019-01-16 10:51:29,123 : INFO : EPOCH 2 - PROGRESS: at 69.51% examples, 2159 words/s, in_qsize 7, out_qsize 0
...

2019-01-16 10:56:32,822 : INFO : EPOCH 2 - PROGRESS: at 86.12% examples, 2158 words/s, in_qsize 7, out_qsize 0
2019-01-16 10:56:35,656 : INFO : EPOCH 2 - PROGRESS: at 86.31% examples, 2158 words/s, in_qsize 7, out_qsize 0
2019-01-16 10:56:45,670 : INFO : EPOCH 2 - PROGRESS: at 86.83% examples, 2158 words/s, in_qsize 8, out_qsize 0
2019-01-16 10:56:48,596 : INFO : EPOCH 2 - PROGRESS: at 87.01% examples, 2158 words/s, in_qsize 7, out_qsize 0
2019-01-16 10:56:58,587 : INFO : EPOCH 2 - PROGRESS: at 87.50% examples, 2157 words/s, in_qsize 8, out_qsize 0
2019-01-16 10:57:01,473 : INFO : EPOCH 2 - PROGRESS: at 87.67% examples, 2158 words/s, in_qsize 8, out_qsize 0
2019-01-16 10:57:11,296 : INFO : EPOCH 2 - PROGRESS: at 88.20% examples, 2157 words/s, in_qsize 7, out_qsize 0
2019-01-16 10:57:14,131 : INFO : EPOCH 2 - PROGRESS: at 88.38% examples, 2158 words/s, in_qsize 8, out_qsize 0
2019-01-16 10:57:24,265 : INFO : EPOCH 2 - PROGRESS: at 88.91% examples, 2157 words/s, in_qsize 8, out_qsize 0
...

2019-01-16 11:03:05,948 : INFO : EPOCH 3 - PROGRESS: at 7.32% examples, 2077 words/s, in_qsize 7, out_qsize 0
2019-01-16 11:03:07,097 : INFO : EPOCH 3 - PROGRESS: at 7.50% examples, 2108 words/s, in_qsize 8, out_qsize 0
2019-01-16 11:03:16,840 : INFO : EPOCH 3 - PROGRESS: at 7.68% examples, 2017 words/s, in_qsize 7, out_qsize 0
2019-01-16 11:03:18,618 : INFO : EPOCH 3 - PROGRESS: at 8.02% examples, 2083 words/s, in_qsize 8, out_qsize 0
2019-01-16 11:03:19,639 : INFO : EPOCH 3 - PROGRESS: at 8.20% examples, 2113 words/s, in_qsize 8, out_qsize 0
2019-01-16 11:03:29,193 : INFO : EPOCH 3 - PROGRESS: at 8.38% examples, 2031 words/s, in_qsize 8, out_qsize 0
2019-01-16 11:03:31,120 : INFO : EPOCH 3 - PROGRESS: at 8.72% examples, 2089 words/s, in_qsize 7, out_qsize 0
2019-01-16 11:03:32,487 : INFO : EPOCH 3 - PROGRESS: at 8.89% examples, 2113 words/s, in_qsize 8, out_qsize 0
2019-01-16 11:03:41,672 : INFO : EPOCH 3 - PROGRESS: at 9.07% examples, 2041 words/s, in_qsize 8, out_qsize 0
...

2019-01-16 11:08:21,895 : INFO : EPOCH 3 - PROGRESS: at 24.20% examples, 2100 words/s, in_qsize 8, out_qsize 0
2019-01-16 11:08:24,294 : INFO : EPOCH 3 - PROGRESS: at 24.53% examples, 2119 words/s, in_qsize 7, out_qsize 0
2019-01-16 11:08:28,424 : INFO : EPOCH 3 - PROGRESS: at 24.72% examples, 2115 words/s, in_qsize 8, out_qsize 0
2019-01-16 11:08:34,227 : INFO : EPOCH 3 - PROGRESS: at 24.88% examples, 2103 words/s, in_qsize 8, out_qsize 0
2019-01-16 11:08:36,781 : INFO : EPOCH 3 - PROGRESS: at 25.23% examples, 2120 words/s, in_qsize 7, out_qsize 0
2019-01-16 11:08:41,451 : INFO : EPOCH 3 - PROGRESS: at 25.38% examples, 2114 words/s, in_qsize 7, out_qsize 0
2019-01-16 11:08:46,657 : INFO : EPOCH 3 - PROGRESS: at 25.56% examples, 2105 words/s, in_qsize 7, out_qsize 0
2019-01-16 11:08:49,355 : INFO : EPOCH 3 - PROGRESS: at 25.88% examples, 2122 words/s, in_qsize 7, out_qsize 0
2019-01-16 11:08:54,852 : INFO : EPOCH 3 - PROGRESS: at 26.05% examples, 2112 words/s, in_qsize 7, out_qsize 0
...

2019-01-16 11:13:36,375 : INFO : EPOCH 3 - PROGRESS: at 41.46% examples, 2120 words/s, in_qsize 8, out_qsize 0
2019-01-16 11:13:37,501 : INFO : EPOCH 3 - PROGRESS: at 41.82% examples, 2134 words/s, in_qsize 7, out_qsize 0
2019-01-16 11:13:41,564 : INFO : EPOCH 3 - PROGRESS: at 41.98% examples, 2132 words/s, in_qsize 7, out_qsize 0
2019-01-16 11:13:49,045 : INFO : EPOCH 3 - PROGRESS: at 42.15% examples, 2120 words/s, in_qsize 8, out_qsize 0
2019-01-16 11:13:50,157 : INFO : EPOCH 3 - PROGRESS: at 42.47% examples, 2134 words/s, in_qsize 7, out_qsize 0
2019-01-16 11:13:54,365 : INFO : EPOCH 3 - PROGRESS: at 42.65% examples, 2132 words/s, in_qsize 8, out_qsize 0
2019-01-16 11:14:01,892 : INFO : EPOCH 3 - PROGRESS: at 42.80% examples, 2120 words/s, in_qsize 7, out_qsize 0
2019-01-16 11:14:03,195 : INFO : EPOCH 3 - PROGRESS: at 43.16% examples, 2134 words/s, in_qsize 8, out_qsize 0
2019-01-16 11:14:07,411 : INFO : EPOCH 3 - PROGRESS: at 43.34% examples, 2131 words/s, in_qsize 7, out_qsize 0
...

2019-01-16 11:18:05,796 : INFO : EPOCH 3 - PROGRESS: at 56.20% examples, 2134 words/s, in_qsize 7, out_qsize 0
2019-01-16 11:18:08,125 : INFO : EPOCH 3 - PROGRESS: at 56.37% examples, 2136 words/s, in_qsize 7, out_qsize 0
2019-01-16 11:18:12,873 : INFO : EPOCH 3 - PROGRESS: at 56.55% examples, 2133 words/s, in_qsize 7, out_qsize 0
2019-01-16 11:18:14,117 : INFO : EPOCH 3 - PROGRESS: at 56.73% examples, 2137 words/s, in_qsize 7, out_qsize 0
2019-01-16 11:18:18,271 : INFO : EPOCH 3 - PROGRESS: at 56.90% examples, 2135 words/s, in_qsize 7, out_qsize 0
2019-01-16 11:18:20,641 : INFO : EPOCH 3 - PROGRESS: at 57.09% examples, 2136 words/s, in_qsize 7, out_qsize 0
2019-01-16 11:18:25,531 : INFO : EPOCH 3 - PROGRESS: at 57.27% examples, 2133 words/s, in_qsize 8, out_qsize 0
2019-01-16 11:18:26,827 : INFO : EPOCH 3 - PROGRESS: at 57.46% examples, 2137 words/s, in_qsize 7, out_qsize 0
2019-01-16 11:18:31,045 : INFO : EPOCH 3 - PROGRESS: at 57.62% examples, 2135 words/s, in_qsize 8, out_qsize 0
...

2019-01-16 11:22:25,839 : INFO : EPOCH 3 - PROGRESS: at 70.40% examples, 2137 words/s, in_qsize 7, out_qsize 0
2019-01-16 11:22:27,651 : INFO : EPOCH 3 - PROGRESS: at 70.57% examples, 2139 words/s, in_qsize 8, out_qsize 0
2019-01-16 11:22:35,535 : INFO : EPOCH 3 - PROGRESS: at 70.74% examples, 2131 words/s, in_qsize 7, out_qsize 0
2019-01-16 11:22:38,408 : INFO : EPOCH 3 - PROGRESS: at 71.07% examples, 2137 words/s, in_qsize 7, out_qsize 0
2019-01-16 11:22:40,168 : INFO : EPOCH 3 - PROGRESS: at 71.25% examples, 2139 words/s, in_qsize 8, out_qsize 0
2019-01-16 11:22:48,437 : INFO : EPOCH 3 - PROGRESS: at 71.42% examples, 2131 words/s, in_qsize 7, out_qsize 0
2019-01-16 11:22:50,852 : INFO : EPOCH 3 - PROGRESS: at 71.74% examples, 2138 words/s, in_qsize 8, out_qsize 0
2019-01-16 11:22:52,857 : INFO : EPOCH 3 - PROGRESS: at 71.90% examples, 2140 words/s, in_qsize 8, out_qsize 0
2019-01-16 11:23:01,269 : INFO : EPOCH 3 - PROGRESS: at 72.08% examples, 2131 words/s, in_qsize 8, out_qsize 0
...

2019-01-16 11:27:54,684 : INFO : EPOCH 3 - PROGRESS: at 88.38% examples, 2140 words/s, in_qsize 7, out_qsize 0
2019-01-16 11:28:04,505 : INFO : EPOCH 3 - PROGRESS: at 88.74% examples, 2136 words/s, in_qsize 8, out_qsize 0
2019-01-16 11:28:07,344 : INFO : EPOCH 3 - PROGRESS: at 89.09% examples, 2140 words/s, in_qsize 7, out_qsize 0
2019-01-16 11:28:08,379 : INFO : EPOCH 3 - PROGRESS: at 89.24% examples, 2143 words/s, in_qsize 7, out_qsize 0
2019-01-16 11:28:17,080 : INFO : EPOCH 3 - PROGRESS: at 89.42% examples, 2136 words/s, in_qsize 8, out_qsize 0
2019-01-16 11:28:19,856 : INFO : EPOCH 3 - PROGRESS: at 89.74% examples, 2140 words/s, in_qsize 7, out_qsize 0
2019-01-16 11:28:21,125 : INFO : EPOCH 3 - PROGRESS: at 89.91% examples, 2143 words/s, in_qsize 7, out_qsize 0
2019-01-16 11:28:29,473 : INFO : EPOCH 3 - PROGRESS: at 90.09% examples, 2136 words/s, in_qsize 7, out_qsize 0
2019-01-16 11:28:32,253 : INFO : EPOCH 3 - PROGRESS: at 90.45% examples, 2141 words/s, in_qsize 8, out_qsize 0
...

2019-01-16 11:33:43,751 : INFO : EPOCH 4 - PROGRESS: at 7.50% examples, 2158 words/s, in_qsize 8, out_qsize 0
2019-01-16 11:33:54,824 : INFO : EPOCH 4 - PROGRESS: at 7.68% examples, 2043 words/s, in_qsize 8, out_qsize 0
2019-01-16 11:33:56,105 : INFO : EPOCH 4 - PROGRESS: at 8.04% examples, 2116 words/s, in_qsize 7, out_qsize 0
2019-01-16 11:34:07,142 : INFO : EPOCH 4 - PROGRESS: at 8.38% examples, 2056 words/s, in_qsize 8, out_qsize 0
2019-01-16 11:34:08,559 : INFO : EPOCH 4 - PROGRESS: at 8.72% examples, 2120 words/s, in_qsize 7, out_qsize 0
2019-01-16 11:34:19,947 : INFO : EPOCH 4 - PROGRESS: at 9.07% examples, 2060 words/s, in_qsize 7, out_qsize 0
2019-01-16 11:34:21,340 : INFO : EPOCH 4 - PROGRESS: at 9.42% examples, 2121 words/s, in_qsize 7, out_qsize 0
2019-01-16 11:34:32,250 : INFO : EPOCH 4 - PROGRESS: at 9.74% examples, 2069 words/s, in_qsize 7, out_qsize 0
2019-01-16 11:34:33,942 : INFO : EPOCH 4 - PROGRESS: at 10.09% examples, 2123 words/s, in_qsize 7, out_qsize 0
...

2019-01-16 11:39:42,155 : INFO : EPOCH 4 - PROGRESS: at 26.76% examples, 2134 words/s, in_qsize 7, out_qsize 0
2019-01-16 11:39:49,157 : INFO : EPOCH 4 - PROGRESS: at 26.96% examples, 2118 words/s, in_qsize 8, out_qsize 0
2019-01-16 11:39:50,558 : INFO : EPOCH 4 - PROGRESS: at 27.30% examples, 2139 words/s, in_qsize 8, out_qsize 0
2019-01-16 11:39:54,815 : INFO : EPOCH 4 - PROGRESS: at 27.46% examples, 2135 words/s, in_qsize 7, out_qsize 0
2019-01-16 11:40:01,610 : INFO : EPOCH 4 - PROGRESS: at 27.63% examples, 2120 words/s, in_qsize 7, out_qsize 0
2019-01-16 11:40:02,971 : INFO : EPOCH 4 - PROGRESS: at 27.99% examples, 2141 words/s, in_qsize 8, out_qsize 0
2019-01-16 11:40:07,537 : INFO : EPOCH 4 - PROGRESS: at 28.16% examples, 2135 words/s, in_qsize 8, out_qsize 0
2019-01-16 11:40:14,181 : INFO : EPOCH 4 - PROGRESS: at 28.33% examples, 2121 words/s, in_qsize 7, out_qsize 0
2019-01-16 11:40:15,426 : INFO : EPOCH 4 - PROGRESS: at 28.70% examples, 2142 words/s, in_qsize 7, out_qsize 0
...

2019-01-16 11:44:56,233 : INFO : EPOCH 4 - PROGRESS: at 44.01% examples, 2141 words/s, in_qsize 8, out_qsize 0
2019-01-16 11:45:03,468 : INFO : EPOCH 4 - PROGRESS: at 44.18% examples, 2130 words/s, in_qsize 8, out_qsize 0
2019-01-16 11:45:06,154 : INFO : EPOCH 4 - PROGRESS: at 44.37% examples, 2131 words/s, in_qsize 7, out_qsize 0
2019-01-16 11:45:07,238 : INFO : EPOCH 4 - PROGRESS: at 44.54% examples, 2137 words/s, in_qsize 8, out_qsize 0
2019-01-16 11:45:08,938 : INFO : EPOCH 4 - PROGRESS: at 44.70% examples, 2141 words/s, in_qsize 7, out_qsize 0
2019-01-16 11:45:16,145 : INFO : EPOCH 4 - PROGRESS: at 44.88% examples, 2130 words/s, in_qsize 7, out_qsize 0
2019-01-16 11:45:18,625 : INFO : EPOCH 4 - PROGRESS: at 45.04% examples, 2132 words/s, in_qsize 7, out_qsize 0
2019-01-16 11:45:19,690 : INFO : EPOCH 4 - PROGRESS: at 45.20% examples, 2137 words/s, in_qsize 8, out_qsize 0
2019-01-16 11:45:21,616 : INFO : EPOCH 4 - PROGRESS: at 45.38% examples, 2141 words/s, in_qsize 7, out_qsize 0
...

2019-01-16 11:49:35,279 : INFO : EPOCH 4 - PROGRESS: at 58.76% examples, 2130 words/s, in_qsize 7, out_qsize 0
2019-01-16 11:49:38,040 : INFO : EPOCH 4 - PROGRESS: at 59.10% examples, 2137 words/s, in_qsize 8, out_qsize 0
2019-01-16 11:49:45,744 : INFO : EPOCH 4 - PROGRESS: at 59.27% examples, 2129 words/s, in_qsize 7, out_qsize 0
2019-01-16 11:49:47,739 : INFO : EPOCH 4 - PROGRESS: at 59.44% examples, 2131 words/s, in_qsize 8, out_qsize 0
2019-01-16 11:49:50,493 : INFO : EPOCH 4 - PROGRESS: at 59.80% examples, 2138 words/s, in_qsize 8, out_qsize 0
2019-01-16 11:49:58,096 : INFO : EPOCH 4 - PROGRESS: at 59.98% examples, 2129 words/s, in_qsize 8, out_qsize 0
2019-01-16 11:49:59,990 : INFO : EPOCH 4 - PROGRESS: at 60.14% examples, 2132 words/s, in_qsize 7, out_qsize 0
2019-01-16 11:50:03,100 : INFO : EPOCH 4 - PROGRESS: at 60.48% examples, 2138 words/s, in_qsize 8, out_qsize 0
2019-01-16 11:50:10,778 : INFO : EPOCH 4 - PROGRESS: at 60.65% examples, 2129 words/s, in_qsize 7, out_qsize 0
...

2019-01-16 11:54:05,736 : INFO : EPOCH 4 - PROGRESS: at 73.49% examples, 2132 words/s, in_qsize 7, out_qsize 0
2019-01-16 11:54:12,418 : INFO : EPOCH 4 - PROGRESS: at 73.84% examples, 2131 words/s, in_qsize 7, out_qsize 0
2019-01-16 11:54:14,993 : INFO : EPOCH 4 - PROGRESS: at 74.00% examples, 2132 words/s, in_qsize 8, out_qsize 0
2019-01-16 11:54:18,364 : INFO : EPOCH 4 - PROGRESS: at 74.17% examples, 2132 words/s, in_qsize 8, out_qsize 0
2019-01-16 11:54:19,386 : INFO : EPOCH 4 - PROGRESS: at 74.34% examples, 2135 words/s, in_qsize 8, out_qsize 0
2019-01-16 11:54:25,009 : INFO : EPOCH 4 - PROGRESS: at 74.51% examples, 2132 words/s, in_qsize 8, out_qsize 0
2019-01-16 11:54:27,465 : INFO : EPOCH 4 - PROGRESS: at 74.68% examples, 2133 words/s, in_qsize 7, out_qsize 0
2019-01-16 11:54:31,024 : INFO : EPOCH 4 - PROGRESS: at 74.84% examples, 2132 words/s, in_qsize 7, out_qsize 0
2019-01-16 11:54:32,148 : INFO : EPOCH 4 - PROGRESS: at 75.01% examples, 2135 words/s, in_qsize 8, out_qsize 0
...

2019-01-16 11:58:10,687 : INFO : EPOCH 4 - PROGRESS: at 86.83% examples, 2134 words/s, in_qsize 7, out_qsize 0
2019-01-16 11:58:13,502 : INFO : EPOCH 4 - PROGRESS: at 87.01% examples, 2134 words/s, in_qsize 7, out_qsize 0
2019-01-16 11:58:16,324 : INFO : EPOCH 4 - PROGRESS: at 87.17% examples, 2135 words/s, in_qsize 7, out_qsize 0
2019-01-16 11:58:19,684 : INFO : EPOCH 4 - PROGRESS: at 87.35% examples, 2134 words/s, in_qsize 7, out_qsize 0
2019-01-16 11:58:23,661 : INFO : EPOCH 4 - PROGRESS: at 87.50% examples, 2133 words/s, in_qsize 8, out_qsize 0
2019-01-16 11:58:26,324 : INFO : EPOCH 4 - PROGRESS: at 87.67% examples, 2134 words/s, in_qsize 8, out_qsize 0
2019-01-16 11:58:29,148 : INFO : EPOCH 4 - PROGRESS: at 87.84% examples, 2135 words/s, in_qsize 7, out_qsize 0
2019-01-16 11:58:32,461 : INFO : EPOCH 4 - PROGRESS: at 88.01% examples, 2134 words/s, in_qsize 8, out_qsize 0
2019-01-16 11:58:36,415 : INFO : EPOCH 4 - PROGRESS: at 88.20% examples, 2133 words/s, in_qsize 7, out_qsize 0

2019-01-16 12:02:35,323 : INFO : EPOCH 5 - PROGRESS: at 0.15% examples, 506 words/s, in_qsize 7, out_qsize 0
2019-01-16 12:02:48,073 : INFO : EPOCH 5 - PROGRESS: at 0.87% examples, 1314 words/s, in_qsize 8, out_qsize 0
2019-01-16 12:03:02,052 : INFO : EPOCH 5 - PROGRESS: at 1.55% examples, 1537 words/s, in_qsize 8, out_qsize 0
2019-01-16 12:03:15,349 : INFO : EPOCH 5 - PROGRESS: at 2.22% examples, 1664 words/s, in_qsize 8, out_qsize 0
2019-01-16 12:03:27,886 : INFO : EPOCH 5 - PROGRESS: at 2.88% examples, 1758 words/s, in_qsize 7, out_qsize 0
2019-01-16 12:03:40,825 : INFO : EPOCH 5 - PROGRESS: at 3.55% examples, 1816 words/s, in_qsize 7, out_qsize 0
2019-01-16 12:03:53,742 : INFO : EPOCH 5 - PROGRESS: at 4.22% examples, 1857 words/s, in_qsize 7, out_qsize 0
2019-01-16 12:04:06,229 : INFO : EPOCH 5 - PROGRESS: at 4.89% examples, 1896 words/s, in_qsize 8, out_qsize 0
2019-01-16 12:04:18,365 : INFO : EPOCH 5 - PROGRESS: at 5.58% examples, 1931 words/s, in_qsize 8, out_qsize 0

2019-01-16 12:11:10,144 : INFO : EPOCH 5 - PROGRESS: at 28.33% examples, 2126 words/s, in_qsize 7, out_qsize 0
2019-01-16 12:11:14,729 : INFO : EPOCH 5 - PROGRESS: at 28.86% examples, 2147 words/s, in_qsize 8, out_qsize 0
2019-01-16 12:11:22,819 : INFO : EPOCH 5 - PROGRESS: at 29.04% examples, 2127 words/s, in_qsize 8, out_qsize 0
2019-01-16 12:11:27,534 : INFO : EPOCH 5 - PROGRESS: at 29.55% examples, 2147 words/s, in_qsize 8, out_qsize 0
2019-01-16 12:11:35,378 : INFO : EPOCH 5 - PROGRESS: at 29.72% examples, 2129 words/s, in_qsize 7, out_qsize 0
2019-01-16 12:11:39,797 : INFO : EPOCH 5 - PROGRESS: at 30.25% examples, 2148 words/s, in_qsize 8, out_qsize 0
2019-01-16 12:11:47,760 : INFO : EPOCH 5 - PROGRESS: at 30.42% examples, 2130 words/s, in_qsize 7, out_qsize 0
2019-01-16 12:11:52,587 : INFO : EPOCH 5 - PROGRESS: at 30.97% examples, 2148 words/s, in_qsize 8, out_qsize 0
2019-01-16 12:12:00,102 : INFO : EPOCH 5 - PROGRESS: at 31.17% examples, 2132 words/s, in_qsize 8, out_qsize 0

2019-01-16 12:17:36,870 : INFO : EPOCH 5 - PROGRESS: at 49.83% examples, 2152 words/s, in_qsize 8, out_qsize 0
2019-01-16 12:17:37,876 : INFO : EPOCH 5 - PROGRESS: at 50.00% examples, 2157 words/s, in_qsize 8, out_qsize 0
2019-01-16 12:17:45,023 : INFO : EPOCH 5 - PROGRESS: at 50.36% examples, 2155 words/s, in_qsize 8, out_qsize 0
2019-01-16 12:17:49,687 : INFO : EPOCH 5 - PROGRESS: at 50.54% examples, 2151 words/s, in_qsize 7, out_qsize 0
2019-01-16 12:17:51,220 : INFO : EPOCH 5 - PROGRESS: at 50.88% examples, 2162 words/s, in_qsize 8, out_qsize 0
2019-01-16 12:17:58,144 : INFO : EPOCH 5 - PROGRESS: at 51.04% examples, 2154 words/s, in_qsize 7, out_qsize 0
2019-01-16 12:18:02,467 : INFO : EPOCH 5 - PROGRESS: at 51.20% examples, 2151 words/s, in_qsize 8, out_qsize 0
2019-01-16 12:18:03,600 : INFO : EPOCH 5 - PROGRESS: at 51.53% examples, 2163 words/s, in_qsize 8, out_qsize 0
2019-01-16 12:18:10,611 : INFO : EPOCH 5 - PROGRESS: at 51.72% examples, 2154 words/s, in_qsize 7, out_qsize 0

2019-01-16 12:22:44,837 : INFO : EPOCH 5 - PROGRESS: at 66.84% examples, 2161 words/s, in_qsize 7, out_qsize 0
2019-01-16 12:22:47,328 : INFO : EPOCH 5 - PROGRESS: at 67.02% examples, 2162 words/s, in_qsize 8, out_qsize 0
2019-01-16 12:22:49,182 : INFO : EPOCH 5 - PROGRESS: at 67.21% examples, 2165 words/s, in_qsize 7, out_qsize 0
2019-01-16 12:22:57,296 : INFO : EPOCH 5 - PROGRESS: at 67.55% examples, 2161 words/s, in_qsize 8, out_qsize 0
2019-01-16 12:22:59,692 : INFO : EPOCH 5 - PROGRESS: at 67.70% examples, 2163 words/s, in_qsize 7, out_qsize 0
2019-01-16 12:23:01,811 : INFO : EPOCH 5 - PROGRESS: at 67.86% examples, 2165 words/s, in_qsize 7, out_qsize 0
2019-01-16 12:23:09,792 : INFO : EPOCH 5 - PROGRESS: at 68.21% examples, 2162 words/s, in_qsize 7, out_qsize 0
2019-01-16 12:23:12,005 : INFO : EPOCH 5 - PROGRESS: at 68.41% examples, 2163 words/s, in_qsize 8, out_qsize 0
2019-01-16 12:23:14,239 : INFO : EPOCH 5 - PROGRESS: at 68.60% examples, 2165 words/s, in_qsize 7, out_qsize 0

2019-01-16 12:28:34,128 : INFO : EPOCH 5 - PROGRESS: at 86.31% examples, 2167 words/s, in_qsize 7, out_qsize 0
2019-01-16 12:28:35,794 : INFO : EPOCH 5 - PROGRESS: at 86.50% examples, 2169 words/s, in_qsize 7, out_qsize 0
2019-01-16 12:28:37,188 : INFO : EPOCH 5 - PROGRESS: at 86.65% examples, 2171 words/s, in_qsize 8, out_qsize 0
2019-01-16 13:03:05,018 : INFO : EPOCH 5 - PROGRESS: at 87.01% examples, 2165 words/s, in_qsize 7, out_qsize 0
2019-01-16 13:03:06,868 : INFO : EPOCH 5 - PROGRESS: at 87.17% examples, 2167 words/s, in_qsize 7, out_qsize 0
2019-01-16 13:03:08,644 : INFO : EPOCH 5 - PROGRESS: at 87.35% examples, 2169 words/s, in_qsize 8, out_qsize 0
2019-01-16 13:03:27,047 : INFO : EPOCH 5 - PROGRESS: at 87.67% examples, 2152 words/s, in_qsize 8, out_qsize 0
2019-01-16 13:03:29,479 : INFO : EPOCH 5 - PROGRESS: at 87.84% examples, 2153 words/s, in_qsize 8, out_qsize 0
2019-01-16 13:03:31,385 : INFO : EPOCH 5 - PROGRESS: at 88.01% examples, 2155 words/s, in_qsize 8, out_qsize 0

**Exploring the Model Results**

1. The `doesnt_match` function tries to deduce which word in a set is most dissimilar from the others.
2. The `most_similar` function gives insight into the model's word clusters by listing the words closest to a given word.

In [115]:
# If you don't plan to train the model any further, calling 
# init_sims will make the model much more memory-efficient.
model.init_sims(replace=True)

# It can be helpful to create a meaningful model name and 
# save the model for later use. You can load it later using Word2Vec.load()

model_name = "300features_40minwords_10context"
model.save(model_name)

2019-01-16 13:35:48,714 : INFO : precomputing L2-norms of word weight vectors
2019-01-16 13:35:48,807 : INFO : saving Word2Vec object under 300features_40minwords_10context, separately None
2019-01-16 13:35:48,808 : INFO : not storing attribute vectors_norm
2019-01-16 13:35:48,810 : INFO : not storing attribute cum_table
2019-01-16 13:35:49,005 : INFO : saved 300features_40minwords_10context
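
The log line "precomputing L2-norms of word weight vectors" refers to scaling every word vector to unit length, which is what makes later similarity queries cheap. The following is a minimal pure-Python sketch of that idea, not gensim's implementation; the names `l2_normalize`, `dot`, and `toy_vectors` and the 3-dimensional values are hypothetical and chosen only for illustration.

```python
import math


def l2_normalize(vec):
    """Return a copy of vec scaled to unit Euclidean (L2) length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


# Hypothetical toy vectors; the trained model uses 300 dimensions.
toy_vectors = {
    "good": [3.0, 4.0, 0.0],
    "great": [6.0, 8.0, 0.0],   # same direction as "good", twice the length
    "bad": [-4.0, 3.0, 0.0],    # orthogonal to "good"
}

unit_vectors = {word: l2_normalize(vec) for word, vec in toy_vectors.items()}

# Once vectors have unit length, the dot product equals the cosine similarity.
sim_good_great = dot(unit_vectors["good"], unit_vectors["great"])
sim_good_bad = dot(unit_vectors["good"], unit_vectors["bad"])
```

Because only the direction of each vector matters for these queries, keeping just the normalized copies (as `replace=True` requests) saves memory, at the cost of not being able to continue training.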


In [106]:
model.wv.most_similar("man")

2019-01-16 13:18:24,732 : INFO : precomputing L2-norms of word weight vectors


[('woman', 0.6946828365325928),
 ('soldier', 0.628015398979187),
 ('boy', 0.6247947216033936),
 ('lady', 0.6045520305633545),
 ('doctor', 0.5911948084831238),
 ('priest', 0.5848212242126465),
 ('guy', 0.5813981890678406),
 ('murderer', 0.5792659521102905),
 ('person', 0.5668310523033142),
 ('scientist', 0.551520824432373)]
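
The scores in the output above are cosine similarities between the query word's vector and every other word's vector, sorted from highest to lowest. A minimal sketch of that ranking, assuming a plain dictionary of vectors: `most_similar_sketch` and the toy values are hypothetical and stand in for the trained model.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar_sketch(vectors, word, topn=3):
    """Rank all other words by cosine similarity to `word`."""
    query = vectors[word]
    scored = [
        (other, cosine(query, vec))
        for other, vec in vectors.items()
        if other != word  # the query word itself is excluded
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]


# Hypothetical 3-dimensional vectors, for illustration only.
toy_vectors = {
    "man": [1.0, 0.9, 0.1],
    "woman": [0.9, 1.0, 0.1],
    "boy": [0.8, 0.5, 0.3],
    "kitchen": [0.1, 0.2, 1.0],
}

ranking = most_similar_sketch(toy_vectors, "man")
```

With these toy values, `ranking` lists "woman" first and "kitchen" last, mirroring the shape of the real output: a list of `(word, similarity)` pairs.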

In [107]:
model.wv.most_similar("awful")


[('terrible', 0.8381466269493103),
 ('horrible', 0.8216108083724976),
 ('pathetic', 0.7252441644668579),
 ('atrocious', 0.7213294506072998),
 ('dreadful', 0.7104625701904297),
 ('lame', 0.7099960446357727),
 ('ridiculous', 0.6906020641326904),
 ('lousy', 0.6854507923126221),
 ('bad', 0.6728737354278564),
 ('stupid', 0.6699348092079163)]

In [108]:
model.wv.most_similar("queen")


[('bride', 0.7049013376235962),
 ('princess', 0.6959552764892578),
 ('fatale', 0.6901453733444214),
 ('blond', 0.6856786012649536),
 ('rita', 0.6818348169326782),
 ('blonde', 0.6741431951522827),
 ('prince', 0.6683547496795654),
 ('babe', 0.6662310361862183),
 ('mama', 0.6534366607666016),
 ('carol', 0.6514571905136108)]

In [109]:
model.wv.most_similar("kitchen")


[('bathroom', 0.7400612235069275),
 ('swimming', 0.7324721217155457),
 ('pool', 0.7258284091949463),
 ('coffee', 0.7090977430343628),
 ('mud', 0.7001243829727173),
 ('crashes', 0.6994427442550659),
 ('flames', 0.6978206634521484),
 ('hatch', 0.6972500085830688),
 ('cabin', 0.6898572444915771),
 ('cigarette', 0.6883264183998108)]

In [110]:
model.wv.doesnt_match("man woman child kitchen".split())


'kitchen'

In [111]:
model.wv.doesnt_match("france england germany berlin".split())


'berlin'

In [113]:
model.wv.doesnt_match("paris berlin london austria".split())


'berlin'
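
One common way to pick the odd word out is to average the unit-length vectors of the given words and return the word whose vector is least similar to that average. The sketch below follows this approach under the assumption that it approximates what `doesnt_match` does; `doesnt_match_sketch` and the toy vectors are hypothetical and are not taken from the trained model.

```python
import math


def unit(vec):
    """Scale vec to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def doesnt_match_sketch(vectors, words):
    """Return the word whose vector is least similar to the group's mean."""
    unit_vecs = [unit(vectors[w]) for w in words]
    dims = len(unit_vecs[0])
    # Mean direction of the group, re-normalized to unit length.
    mean = unit([sum(v[i] for v in unit_vecs) / len(unit_vecs) for i in range(dims)])
    # Cosine similarity of each word to the mean direction.
    scores = [sum(a * b for a, b in zip(v, mean)) for v in unit_vecs]
    return words[scores.index(min(scores))]


# Hypothetical 3-dimensional vectors, for illustration only.
toy_vectors = {
    "man": [1.0, 0.9, 0.1],
    "woman": [0.9, 1.0, 0.1],
    "child": [0.8, 0.7, 0.2],
    "kitchen": [0.1, 0.2, 1.0],
}

odd_one = doesnt_match_sketch(toy_vectors, ["man", "woman", "child", "kitchen"])
```

Since the answer depends entirely on how the vectors are positioned, a model trained on a relatively small corpus can return a surprising word, as in the last query above, where 'berlin' is returned even though 'austria' is the only country in the list.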

In [121]:
# To reuse the model later, load the model that we created in Part 2:
# from gensim.models import Word2Vec
# model = Word2Vec.load("300features_40minwords_10context")
# model