# Comparison of CBOW, SkipGram and SkipGram with Subword Information

### Imports and logging

First, we import the modules we need and set up logging:

In [2]:
# imports needed and set up logging
import gzip
import gensim 
import logging

logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', level=logging.INFO)


### Dataset 

Now, let's take a closer look at the data by printing the first two lines of the file. Each line is one review, and you can see that the first one is pretty hefty.

In [5]:
data_file = "reviews_data.txt.gz"

# opening in 'rb' mode yields each review as a bytes object
with gzip.open(data_file, 'rb') as f:
    for i, line in enumerate(f):

        # print only the first two lines
        if i > 1:
            break

        print(i + 1, line)
        print("===")
        


1 b"Oct 12 2009 \tNice trendy hotel location not too bad.\tI stayed in this hotel for one night. As this is a fairly new place some of the taxi drivers did not know where it was and/or did not want to drive there. Once I have eventually arrived at the hotel, I was very pleasantly surprised with the decor of the lobby/ground floor area. It was very stylish and modern. I found the reception's staff geeting me with 'Aloha' a bit out of place, but I guess they are briefed to say that to keep up the coroporate image.As I have a Starwood Preferred Guest member, I was given a small gift upon-check in. It was only a couple of fridge magnets in a gift box, but nevertheless a nice gesture.My room was nice and roomy, there are tea and coffee facilities in each room and you get two complimentary bottles of water plus some toiletries by 'bliss'.The location is not great. It is at the last metro stop and you then need to take a taxi, but if you are not planning on going to see the historic sites in 

### Read files into a list
Now that we've had a sneak peek at our dataset, we can read it into a list that we can pass on to the Word2Vec model. Notice in the code below that I am reading the compressed file directly. I'm also doing some mild pre-processing of the reviews using `gensim.utils.simple_preprocess(line)`. This performs basic steps such as tokenization and lowercasing, and returns a list of tokens (words). Documentation for this pre-processing method can be found on the official [Gensim documentation site](https://radimrehurek.com/gensim/utils.html).
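
To get a feel for what this pre-processing step produces, here is a minimal pure-Python sketch that roughly mirrors it: lowercase the text, keep runs of letters, and drop tokens that are very short or very long. The helper name `simple_preprocess_sketch` and its regex are illustrative assumptions, not Gensim's actual implementation; the length limits of 2 and 15 follow the documented defaults of `simple_preprocess`.

```python
import re

# Illustrative stand-in for gensim.utils.simple_preprocess (NOT the real code):
# matches runs of letters only, so digits and punctuation are discarded.
TOKEN_RE = re.compile(r"[^\W\d_]+")


def simple_preprocess_sketch(text, min_len=2, max_len=15):
    """Lowercase, tokenize, and filter tokens by length."""
    # The reviews are read in binary mode, so decode bytes first.
    if isinstance(text, bytes):
        text = text.decode("utf-8", errors="ignore")
    tokens = TOKEN_RE.findall(text.lower())
    return [t for t in tokens if min_len <= len(t) <= max_len]


# Start of the first review line shown above
sample = b"Oct 12 2009 \tNice trendy hotel location not too bad."
tokens = simple_preprocess_sketch(sample)
print(tokens)
```

Note how the date digits disappear and everything is lowercased, which is why each review ends up as a plain list of words.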



In [6]:

def read_input(input_file):
    """Read the gzip-compressed input file and yield one token list per review."""

    logging.info("reading file {0}...this may take a while".format(input_file))

    with gzip.open(input_file, 'rb') as f:
        for i, line in enumerate(f):

            if i % 10000 == 0:
                logging.info("read {0} reviews".format(i))
            # do some pre-processing and yield a list of words for each review text
            yield gensim.utils.simple_preprocess(line)

# read the tokenized reviews into a list
# each review item becomes a series of words
# so this becomes a list of lists
documents = list(read_input(data_file))
logging.info("Done reading data file")

2019-03-09 23:16:47,072 : INFO : reading file reviews_data.txt.gz...this may take a while
2019-03-09 23:16:47,075 : INFO : read 0 reviews
2019-03-09 23:16:48,966 : INFO : read 10000 reviews
2019-03-09 23:16:50,855 : INFO : read 20000 reviews
2019-03-09 23:16:53,042 : INFO : read 30000 reviews
2019-03-09 23:16:55,082 : INFO : read 40000 reviews
2019-03-09 23:16:57,360 : INFO : read 50000 reviews
2019-03-09 23:16:59,766 : INFO : read 60000 reviews
2019-03-09 23:17:01,664 : INFO : read 70000 reviews
2019-03-09 23:17:03,365 : INFO : read 80000 reviews
2019-03-09 23:17:05,153 : INFO : read 90000 reviews
2019-03-09 23:17:06,933 : INFO : read 100000 reviews
2019-03-09 23:17:08,691 : INFO : read 110000 reviews
2019-03-09 23:17:10,416 : INFO : read 120000 reviews
2019-03-09 23:17:12,162 : INFO : read 130000 reviews
2019-03-09 23:17:14,019 : INFO : read 140000 reviews
2019-03-09 23:17:16,280 : INFO : read 150000 reviews
2019-03-09 23:17:18,103 : INFO : read 160000 reviews
2019-03-09 23:17:19,812

## Training the CBOW, SkipGram and SkipGram with Subword Information Models

### Train a CBOW model

In [16]:
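# sg is left at its default (sg=0), which selects CBOW.
# Passing `documents` to the constructor already builds the vocabulary and
# trains for the default number of epochs; the explicit train() call below
# then runs 10 more epochs, which is why the epoch counter restarts in the log.
# Note: gensim 4.x renamed `size` to `vector_size`.
model_cbow = gensim.models.Word2Vec(documents, size=150, window=10, min_count=2, workers=10)
%time model_cbow.train(documents, total_examples=len(documents), epochs=10)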

2019-03-09 23:28:34,343 : INFO : collecting all words and their counts
2019-03-09 23:28:34,344 : INFO : PROGRESS: at sentence #0, processed 0 words, keeping 0 word types
2019-03-09 23:28:34,573 : INFO : PROGRESS: at sentence #10000, processed 1655714 words, keeping 25777 word types
2019-03-09 23:28:34,781 : INFO : PROGRESS: at sentence #20000, processed 3317863 words, keeping 35016 word types
2019-03-09 23:28:35,019 : INFO : PROGRESS: at sentence #30000, processed 5264072 words, keeping 47518 word types
2019-03-09 23:28:35,241 : INFO : PROGRESS: at sentence #40000, processed 7081746 words, keeping 56675 word types
2019-03-09 23:28:35,494 : INFO : PROGRESS: at sentence #50000, processed 9089491 words, keeping 63744 word types
2019-03-09 23:28:35,745 : INFO : PROGRESS: at sentence #60000, processed 11013723 words, keeping 76781 word types
2019-03-09 23:28:35,965 : INFO : PROGRESS: at sentence #70000, processed 12637525 words, keeping 83194 word types
2019-03-09 23:28:36,164 : INFO : PROG

2019-03-09 23:29:08,337 : INFO : EPOCH 2 - PROGRESS: at 42.87% examples, 1497016 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:29:09,346 : INFO : EPOCH 2 - PROGRESS: at 48.24% examples, 1496651 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:29:10,347 : INFO : EPOCH 2 - PROGRESS: at 53.24% examples, 1495738 words/s, in_qsize 18, out_qsize 1
2019-03-09 23:29:11,353 : INFO : EPOCH 2 - PROGRESS: at 58.59% examples, 1498622 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:29:12,367 : INFO : EPOCH 2 - PROGRESS: at 64.14% examples, 1501415 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:29:13,368 : INFO : EPOCH 2 - PROGRESS: at 69.32% examples, 1504700 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:29:14,372 : INFO : EPOCH 2 - PROGRESS: at 74.49% examples, 1506353 words/s, in_qsize 17, out_qsize 2
2019-03-09 23:29:15,378 : INFO : EPOCH 2 - PROGRESS: at 79.52% examples, 1510639 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:29:16,385 : INFO : EPOCH 2 - PROGRESS: at 84.87% examples, 1518118

2019-03-09 23:30:01,092 : INFO : worker thread finished; awaiting finish of 7 more threads
2019-03-09 23:30:01,094 : INFO : worker thread finished; awaiting finish of 6 more threads
2019-03-09 23:30:01,095 : INFO : worker thread finished; awaiting finish of 5 more threads
2019-03-09 23:30:01,104 : INFO : worker thread finished; awaiting finish of 4 more threads
2019-03-09 23:30:01,106 : INFO : worker thread finished; awaiting finish of 3 more threads
2019-03-09 23:30:01,107 : INFO : worker thread finished; awaiting finish of 2 more threads
2019-03-09 23:30:01,112 : INFO : worker thread finished; awaiting finish of 1 more threads
2019-03-09 23:30:01,113 : INFO : worker thread finished; awaiting finish of 0 more threads
2019-03-09 23:30:01,114 : INFO : EPOCH - 4 : training on 41519355 raw words (30347231 effective words) took 21.4s, 1419132 effective words/s
2019-03-09 23:30:02,125 : INFO : EPOCH 5 - PROGRESS: at 4.81% examples, 1471993 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:30:

2019-03-09 23:30:46,933 : INFO : EPOCH 2 - PROGRESS: at 12.00% examples, 1307943 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:30:47,937 : INFO : EPOCH 2 - PROGRESS: at 15.43% examples, 1271640 words/s, in_qsize 18, out_qsize 1
2019-03-09 23:30:48,944 : INFO : EPOCH 2 - PROGRESS: at 18.38% examples, 1230476 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:30:49,948 : INFO : EPOCH 2 - PROGRESS: at 20.82% examples, 1184804 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:30:50,951 : INFO : EPOCH 2 - PROGRESS: at 24.38% examples, 1194417 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:30:51,954 : INFO : EPOCH 2 - PROGRESS: at 29.43% examples, 1219457 words/s, in_qsize 17, out_qsize 2
2019-03-09 23:30:52,956 : INFO : EPOCH 2 - PROGRESS: at 34.50% examples, 1243282 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:30:53,957 : INFO : EPOCH 2 - PROGRESS: at 39.46% examples, 1259240 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:30:54,960 : INFO : EPOCH 2 - PROGRESS: at 44.76% examples, 1276110

2019-03-09 23:31:40,919 : INFO : EPOCH 4 - PROGRESS: at 51.64% examples, 1330113 words/s, in_qsize 18, out_qsize 1
2019-03-09 23:31:41,923 : INFO : EPOCH 4 - PROGRESS: at 55.32% examples, 1311844 words/s, in_qsize 18, out_qsize 1
2019-03-09 23:31:42,938 : INFO : EPOCH 4 - PROGRESS: at 59.82% examples, 1309657 words/s, in_qsize 18, out_qsize 1
2019-03-09 23:31:43,940 : INFO : EPOCH 4 - PROGRESS: at 64.40% examples, 1307101 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:31:44,949 : INFO : EPOCH 4 - PROGRESS: at 68.19% examples, 1296102 words/s, in_qsize 18, out_qsize 1
2019-03-09 23:31:45,953 : INFO : EPOCH 4 - PROGRESS: at 71.75% examples, 1284494 words/s, in_qsize 18, out_qsize 1
2019-03-09 23:31:46,957 : INFO : EPOCH 4 - PROGRESS: at 76.40% examples, 1290337 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:31:47,969 : INFO : EPOCH 4 - PROGRESS: at 80.94% examples, 1295828 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:31:48,969 : INFO : EPOCH 4 - PROGRESS: at 85.73% examples, 1303965

2019-03-09 23:32:35,878 : INFO : EPOCH 6 - PROGRESS: at 94.89% examples, 1302917 words/s, in_qsize 18, out_qsize 1
2019-03-09 23:32:36,881 : INFO : EPOCH 6 - PROGRESS: at 99.79% examples, 1307456 words/s, in_qsize 9, out_qsize 1
2019-03-09 23:32:36,883 : INFO : worker thread finished; awaiting finish of 9 more threads
2019-03-09 23:32:36,897 : INFO : worker thread finished; awaiting finish of 8 more threads
2019-03-09 23:32:36,900 : INFO : worker thread finished; awaiting finish of 7 more threads
2019-03-09 23:32:36,904 : INFO : worker thread finished; awaiting finish of 6 more threads
2019-03-09 23:32:36,905 : INFO : worker thread finished; awaiting finish of 5 more threads
2019-03-09 23:32:36,909 : INFO : worker thread finished; awaiting finish of 4 more threads
2019-03-09 23:32:36,910 : INFO : worker thread finished; awaiting finish of 3 more threads
2019-03-09 23:32:36,913 : INFO : worker thread finished; awaiting finish of 2 more threads
2019-03-09 23:32:36,922 : INFO : worker thr

2019-03-09 23:33:22,552 : INFO : worker thread finished; awaiting finish of 0 more threads
2019-03-09 23:33:22,553 : INFO : EPOCH - 8 : training on 41519355 raw words (30351859 effective words) took 22.9s, 1323973 effective words/s
2019-03-09 23:33:23,558 : INFO : EPOCH 9 - PROGRESS: at 4.51% examples, 1392948 words/s, in_qsize 18, out_qsize 1
2019-03-09 23:33:24,561 : INFO : EPOCH 9 - PROGRESS: at 9.12% examples, 1420209 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:33:25,567 : INFO : EPOCH 9 - PROGRESS: at 12.71% examples, 1394749 words/s, in_qsize 17, out_qsize 2
2019-03-09 23:33:26,569 : INFO : EPOCH 9 - PROGRESS: at 16.55% examples, 1373213 words/s, in_qsize 18, out_qsize 1
2019-03-09 23:33:27,573 : INFO : EPOCH 9 - PROGRESS: at 19.98% examples, 1356360 words/s, in_qsize 18, out_qsize 1
2019-03-09 23:33:28,579 : INFO : EPOCH 9 - PROGRESS: at 23.23% examples, 1320377 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:33:29,603 : INFO : EPOCH 9 - PROGRESS: at 26.48% examples, 1274606

CPU times: user 20min 40s, sys: 7.85 s, total: 20min 47s
Wall time: 3min 47s


(303488375, 415193550)
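
The tuple printed above appears to be the `(effective words, raw words)` pair that `train()` reports, summed over the 10 epochs. A quick sanity check, using only numbers copied from the log and timing output above, is sketched here; the variable names are my own. Effective words are lower than raw words because words below `min_count` are dropped and very frequent words are downsampled.

```python
# Numbers copied from the output above
raw_words_per_epoch = 41519355                       # "training on 41519355 raw words"
epochs = 10
effective_total, raw_total = 303488375, 415193550    # tuple returned by train()

# The raw count is exactly the per-epoch raw count times the number of epochs
assert raw_words_per_epoch * epochs == raw_total

# Share of raw words that actually contributed to training
kept_fraction = effective_total / raw_total
print("kept fraction: {:.3f}".format(kept_fraction))

# CPU time vs. wall time gives a rough idea of how well the 10 workers were used
cpu_seconds = 20 * 60 + 47     # "total: 20min 47s"
wall_seconds = 3 * 60 + 47     # "Wall time: 3min 47s"
parallel_factor = cpu_seconds / wall_seconds
print("parallel factor: {:.2f}".format(parallel_factor))
```

So roughly three quarters of the raw tokens were used for training, and the 10 worker threads delivered somewhat more than a 5x speed-up over a single core.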

### Train a char n-gram model (subword information) with fastText
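
Before training, it helps to see what "char n-grams" means in practice. fastText represents a word by the character n-grams of the word wrapped in `<` and `>` boundary markers, and the `min_n` and `max_n` arguments used in the cell below set the range of n-gram lengths. The following is a minimal pure-Python sketch of that extraction step only; the helper name `char_ngrams` is hypothetical, and the real library additionally hashes the n-grams into a fixed number of buckets, which is not shown here.

```python
def char_ngrams(word, min_n=3, max_n=6):
    """Return all character n-grams of `<word>` with min_n <= n <= max_n."""
    wrapped = "<" + word + ">"          # boundary markers distinguish prefixes/suffixes
    grams = []
    for n in range(min_n, max_n + 1):
        # slide a window of width n across the wrapped word
        for start in range(len(wrapped) - n + 1):
            grams.append(wrapped[start:start + n])
    return grams


grams = char_ngrams("hotel")
print(len(grams))
print(grams)
```

Because related words such as "hotel" and "hotels" share most of these n-grams, the model can build a vector even for a word it never saw during training, at the cost of the much slower training speed visible in the log below.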

In [17]:
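from gensim.models.fasttext import FastText

# min_n and max_n set the range of character n-gram lengths (3 to 6 here).
# sg is left at its default (sg=0, CBOW); pass sg=1 to train skip-gram with subword information.
# As with Word2Vec, passing `documents` to the constructor already trains once
# with the default number of epochs before the explicit train() call below.
model_subword = FastText(documents, size=150, window=10, min_count=2, workers=10, min_n=3, max_n=6)
%time model_subword.train(documents, total_examples=len(documents), epochs=10)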

2019-03-09 23:34:09,440 : INFO : resetting layer weights
2019-03-09 23:34:09,442 : INFO : Total number of ngrams is 0
2019-03-09 23:34:09,443 : INFO : collecting all words and their counts
2019-03-09 23:34:09,444 : INFO : PROGRESS: at sentence #0, processed 0 words, keeping 0 word types
2019-03-09 23:34:09,765 : INFO : PROGRESS: at sentence #10000, processed 1655714 words, keeping 25777 word types
2019-03-09 23:34:10,064 : INFO : PROGRESS: at sentence #20000, processed 3317863 words, keeping 35016 word types
2019-03-09 23:34:10,390 : INFO : PROGRESS: at sentence #30000, processed 5264072 words, keeping 47518 word types
2019-03-09 23:34:10,684 : INFO : PROGRESS: at sentence #40000, processed 7081746 words, keeping 56675 word types
2019-03-09 23:34:11,003 : INFO : PROGRESS: at sentence #50000, processed 9089491 words, keeping 63744 word types
2019-03-09 23:34:11,303 : INFO : PROGRESS: at sentence #60000, processed 11013723 words, keeping 76781 word types
2019-03-09 23:34:11,552 : INFO : 

2019-03-09 23:34:56,802 : INFO : EPOCH 1 - PROGRESS: at 23.88% examples, 226831 words/s, in_qsize 20, out_qsize 0
2019-03-09 23:34:57,898 : INFO : EPOCH 1 - PROGRESS: at 24.48% examples, 226336 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:34:58,902 : INFO : EPOCH 1 - PROGRESS: at 25.27% examples, 226044 words/s, in_qsize 18, out_qsize 1
2019-03-09 23:34:59,925 : INFO : EPOCH 1 - PROGRESS: at 26.08% examples, 225853 words/s, in_qsize 19, out_qsize 1
2019-03-09 23:35:00,932 : INFO : EPOCH 1 - PROGRESS: at 26.90% examples, 225774 words/s, in_qsize 18, out_qsize 1
2019-03-09 23:35:01,944 : INFO : EPOCH 1 - PROGRESS: at 27.84% examples, 225999 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:35:02,968 : INFO : EPOCH 1 - PROGRESS: at 28.56% examples, 225493 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:35:03,977 : INFO : EPOCH 1 - PROGRESS: at 29.34% examples, 225219 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:35:04,988 : INFO : EPOCH 1 - PROGRESS: at 30.01% examples, 224630 words/s,

2019-03-09 23:36:11,971 : INFO : EPOCH 1 - PROGRESS: at 72.56% examples, 199527 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:36:12,982 : INFO : EPOCH 1 - PROGRESS: at 73.20% examples, 199266 words/s, in_qsize 20, out_qsize 0
2019-03-09 23:36:14,016 : INFO : EPOCH 1 - PROGRESS: at 73.82% examples, 198909 words/s, in_qsize 20, out_qsize 0
2019-03-09 23:36:15,105 : INFO : EPOCH 1 - PROGRESS: at 74.36% examples, 198463 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:36:16,105 : INFO : EPOCH 1 - PROGRESS: at 74.80% examples, 197862 words/s, in_qsize 20, out_qsize 0
2019-03-09 23:36:17,196 : INFO : EPOCH 1 - PROGRESS: at 75.30% examples, 197435 words/s, in_qsize 20, out_qsize 0
2019-03-09 23:36:18,199 : INFO : EPOCH 1 - PROGRESS: at 75.86% examples, 197286 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:36:19,232 : INFO : EPOCH 1 - PROGRESS: at 76.38% examples, 197078 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:36:20,260 : INFO : EPOCH 1 - PROGRESS: at 77.05% examples, 197075 words/s,

2019-03-09 23:37:18,482 : INFO : EPOCH 2 - PROGRESS: at 12.15% examples, 181069 words/s, in_qsize 20, out_qsize 1
2019-03-09 23:37:19,642 : INFO : EPOCH 2 - PROGRESS: at 12.70% examples, 181013 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:37:20,665 : INFO : EPOCH 2 - PROGRESS: at 13.38% examples, 181400 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:37:21,727 : INFO : EPOCH 2 - PROGRESS: at 13.81% examples, 180296 words/s, in_qsize 20, out_qsize 2
2019-03-09 23:37:22,813 : INFO : EPOCH 2 - PROGRESS: at 14.46% examples, 181068 words/s, in_qsize 17, out_qsize 2
2019-03-09 23:37:23,820 : INFO : EPOCH 2 - PROGRESS: at 15.07% examples, 182319 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:37:24,854 : INFO : EPOCH 2 - PROGRESS: at 15.76% examples, 183342 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:37:25,867 : INFO : EPOCH 2 - PROGRESS: at 16.33% examples, 184370 words/s, in_qsize 18, out_qsize 1
2019-03-09 23:37:26,927 : INFO : EPOCH 2 - PROGRESS: at 16.94% examples, 185039 words/s,

2019-03-09 23:38:33,344 : INFO : EPOCH 2 - PROGRESS: at 58.02% examples, 185243 words/s, in_qsize 20, out_qsize 0
2019-03-09 23:38:34,409 : INFO : EPOCH 2 - PROGRESS: at 58.62% examples, 184999 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:38:35,464 : INFO : EPOCH 2 - PROGRESS: at 59.25% examples, 184850 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:38:36,563 : INFO : EPOCH 2 - PROGRESS: at 59.88% examples, 184635 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:38:37,573 : INFO : EPOCH 2 - PROGRESS: at 60.56% examples, 184731 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:38:38,583 : INFO : EPOCH 2 - PROGRESS: at 61.15% examples, 184532 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:38:39,723 : INFO : EPOCH 2 - PROGRESS: at 61.81% examples, 184458 words/s, in_qsize 20, out_qsize 0
2019-03-09 23:38:40,744 : INFO : EPOCH 2 - PROGRESS: at 62.54% examples, 184659 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:38:41,770 : INFO : EPOCH 2 - PROGRESS: at 63.24% examples, 184652 words/s,

2019-03-09 23:39:41,611 : INFO : worker thread finished; awaiting finish of 1 more threads
2019-03-09 23:39:41,632 : INFO : worker thread finished; awaiting finish of 0 more threads
2019-03-09 23:39:41,633 : INFO : EPOCH - 2 : training on 41519355 raw words (30350430 effective words) took 165.2s, 183697 effective words/s
2019-03-09 23:39:42,663 : INFO : EPOCH 3 - PROGRESS: at 0.53% examples, 170278 words/s, in_qsize 20, out_qsize 0
2019-03-09 23:39:43,689 : INFO : EPOCH 3 - PROGRESS: at 1.21% examples, 190928 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:39:44,720 : INFO : EPOCH 3 - PROGRESS: at 1.93% examples, 197109 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:39:45,788 : INFO : EPOCH 3 - PROGRESS: at 2.61% examples, 198454 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:39:46,864 : INFO : EPOCH 3 - PROGRESS: at 3.34% examples, 198993 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:39:47,901 : INFO : EPOCH 3 - PROGRESS: at 4.09% examples, 201838 words/s, in_qsize 19, out_qsize 0

2019-03-09 23:40:54,985 : INFO : EPOCH 3 - PROGRESS: at 45.09% examples, 193341 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:40:55,998 : INFO : EPOCH 3 - PROGRESS: at 45.76% examples, 193232 words/s, in_qsize 20, out_qsize 1
2019-03-09 23:40:57,008 : INFO : EPOCH 3 - PROGRESS: at 46.42% examples, 193231 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:40:58,073 : INFO : EPOCH 3 - PROGRESS: at 47.08% examples, 193166 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:40:59,094 : INFO : EPOCH 3 - PROGRESS: at 47.79% examples, 193234 words/s, in_qsize 20, out_qsize 0
2019-03-09 23:41:00,121 : INFO : EPOCH 3 - PROGRESS: at 48.52% examples, 193364 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:41:01,186 : INFO : EPOCH 3 - PROGRESS: at 49.22% examples, 193134 words/s, in_qsize 20, out_qsize 0
2019-03-09 23:41:02,236 : INFO : EPOCH 3 - PROGRESS: at 49.94% examples, 193292 words/s, in_qsize 20, out_qsize 0
2019-03-09 23:41:03,245 : INFO : EPOCH 3 - PROGRESS: at 50.64% examples, 193374 words/s,

2019-03-09 23:42:09,788 : INFO : EPOCH 3 - PROGRESS: at 96.68% examples, 198479 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:42:10,814 : INFO : EPOCH 3 - PROGRESS: at 97.50% examples, 198651 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:42:11,905 : INFO : EPOCH 3 - PROGRESS: at 98.26% examples, 198697 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:42:12,928 : INFO : EPOCH 3 - PROGRESS: at 99.04% examples, 198787 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:42:13,891 : INFO : worker thread finished; awaiting finish of 9 more threads
2019-03-09 23:42:13,908 : INFO : worker thread finished; awaiting finish of 8 more threads
2019-03-09 23:42:13,914 : INFO : worker thread finished; awaiting finish of 7 more threads
2019-03-09 23:42:13,930 : INFO : EPOCH 3 - PROGRESS: at 99.86% examples, 199038 words/s, in_qsize 6, out_qsize 1
2019-03-09 23:42:13,931 : INFO : worker thread finished; awaiting finish of 6 more threads
2019-03-09 23:42:13,944 : INFO : worker thread finished; awaiting f

2019-03-09 23:43:14,920 : INFO : EPOCH 4 - PROGRESS: at 42.76% examples, 222629 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:43:15,934 : INFO : EPOCH 4 - PROGRESS: at 43.59% examples, 222604 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:43:16,944 : INFO : EPOCH 4 - PROGRESS: at 44.44% examples, 222474 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:43:17,988 : INFO : EPOCH 4 - PROGRESS: at 45.28% examples, 222444 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:43:18,999 : INFO : EPOCH 4 - PROGRESS: at 46.07% examples, 222315 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:43:20,030 : INFO : EPOCH 4 - PROGRESS: at 46.79% examples, 222210 words/s, in_qsize 20, out_qsize 0
2019-03-09 23:43:21,124 : INFO : EPOCH 4 - PROGRESS: at 47.53% examples, 221924 words/s, in_qsize 18, out_qsize 1
2019-03-09 23:43:22,151 : INFO : EPOCH 4 - PROGRESS: at 48.28% examples, 221755 words/s, in_qsize 18, out_qsize 1
2019-03-09 23:43:23,175 : INFO : EPOCH 4 - PROGRESS: at 49.10% examples, 221598 words/s,

2019-03-09 23:44:28,611 : INFO : EPOCH 4 - PROGRESS: at 97.59% examples, 220364 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:44:29,618 : INFO : EPOCH 4 - PROGRESS: at 98.33% examples, 220328 words/s, in_qsize 18, out_qsize 1
2019-03-09 23:44:30,638 : INFO : EPOCH 4 - PROGRESS: at 99.16% examples, 220379 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:44:31,575 : INFO : worker thread finished; awaiting finish of 9 more threads
2019-03-09 23:44:31,581 : INFO : worker thread finished; awaiting finish of 8 more threads
2019-03-09 23:44:31,591 : INFO : worker thread finished; awaiting finish of 7 more threads
2019-03-09 23:44:31,593 : INFO : worker thread finished; awaiting finish of 6 more threads
2019-03-09 23:44:31,612 : INFO : worker thread finished; awaiting finish of 5 more threads
2019-03-09 23:44:31,616 : INFO : worker thread finished; awaiting finish of 4 more threads
2019-03-09 23:44:31,619 : INFO : worker thread finished; awaiting finish of 3 more threads
2019-03-09 23:44:31,6

2019-03-09 23:45:33,582 : INFO : EPOCH 5 - PROGRESS: at 44.22% examples, 225252 words/s, in_qsize 18, out_qsize 1
2019-03-09 23:45:34,605 : INFO : EPOCH 5 - PROGRESS: at 45.14% examples, 225479 words/s, in_qsize 18, out_qsize 1
2019-03-09 23:45:35,660 : INFO : EPOCH 5 - PROGRESS: at 45.97% examples, 225370 words/s, in_qsize 20, out_qsize 0
2019-03-09 23:45:36,684 : INFO : EPOCH 5 - PROGRESS: at 46.68% examples, 225137 words/s, in_qsize 20, out_qsize 0
2019-03-09 23:45:37,702 : INFO : EPOCH 5 - PROGRESS: at 47.43% examples, 225043 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:45:38,772 : INFO : EPOCH 5 - PROGRESS: at 48.26% examples, 225002 words/s, in_qsize 20, out_qsize 0
2019-03-09 23:45:39,859 : INFO : EPOCH 5 - PROGRESS: at 49.08% examples, 224586 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:45:40,949 : INFO : EPOCH 5 - PROGRESS: at 49.81% examples, 224274 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:45:41,954 : INFO : EPOCH 5 - PROGRESS: at 50.71% examples, 224761 words/s,

2019-03-09 23:46:44,604 : INFO : worker thread finished; awaiting finish of 6 more threads
2019-03-09 23:46:44,609 : INFO : worker thread finished; awaiting finish of 5 more threads
2019-03-09 23:46:44,616 : INFO : worker thread finished; awaiting finish of 4 more threads
2019-03-09 23:46:44,640 : INFO : worker thread finished; awaiting finish of 3 more threads
2019-03-09 23:46:44,672 : INFO : worker thread finished; awaiting finish of 2 more threads
2019-03-09 23:46:44,680 : INFO : worker thread finished; awaiting finish of 1 more threads
2019-03-09 23:46:44,685 : INFO : worker thread finished; awaiting finish of 0 more threads
2019-03-09 23:46:44,686 : INFO : EPOCH - 5 : training on 41519355 raw words (30347519 effective words) took 133.0s, 228128 effective words/s
2019-03-09 23:46:44,687 : INFO : training on a 207596775 raw words (151747986 effective words) took 744.1s, 203943 effective words/s
2019-03-09 23:46:49,731 : INFO : Number of new ngrams is 0
2019-03-09 23:46:49,809 : INFO

2019-03-09 23:47:54,534 : INFO : EPOCH 1 - PROGRESS: at 52.62% examples, 253014 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:47:55,562 : INFO : EPOCH 1 - PROGRESS: at 53.39% examples, 252887 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:47:56,574 : INFO : EPOCH 1 - PROGRESS: at 54.29% examples, 253062 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:47:57,601 : INFO : EPOCH 1 - PROGRESS: at 55.26% examples, 253072 words/s, in_qsize 18, out_qsize 1
2019-03-09 23:47:58,645 : INFO : EPOCH 1 - PROGRESS: at 56.16% examples, 252908 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:47:59,699 : INFO : EPOCH 1 - PROGRESS: at 57.12% examples, 253030 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:48:00,748 : INFO : EPOCH 1 - PROGRESS: at 57.91% examples, 252637 words/s, in_qsize 18, out_qsize 1
2019-03-09 23:48:01,773 : INFO : EPOCH 1 - PROGRESS: at 58.81% examples, 252646 words/s, in_qsize 20, out_qsize 0
2019-03-09 23:48:02,775 : INFO : EPOCH 1 - PROGRESS: at 59.65% examples, 252455 words/s,

2019-03-09 23:48:59,689 : INFO : EPOCH 2 - PROGRESS: at 8.28% examples, 248921 words/s, in_qsize 20, out_qsize 0
2019-03-09 23:49:00,754 : INFO : EPOCH 2 - PROGRESS: at 9.10% examples, 249674 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:49:01,756 : INFO : EPOCH 2 - PROGRESS: at 9.81% examples, 251422 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:49:02,789 : INFO : EPOCH 2 - PROGRESS: at 10.52% examples, 250894 words/s, in_qsize 18, out_qsize 1
2019-03-09 23:49:03,815 : INFO : EPOCH 2 - PROGRESS: at 11.21% examples, 251442 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:49:04,834 : INFO : EPOCH 2 - PROGRESS: at 11.92% examples, 252538 words/s, in_qsize 20, out_qsize 0
2019-03-09 23:49:05,846 : INFO : EPOCH 2 - PROGRESS: at 12.54% examples, 251878 words/s, in_qsize 20, out_qsize 0
2019-03-09 23:49:06,904 : INFO : EPOCH 2 - PROGRESS: at 13.44% examples, 252307 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:49:07,909 : INFO : EPOCH 2 - PROGRESS: at 14.22% examples, 253739 words/s, in

2019-03-09 23:50:13,864 : INFO : EPOCH 2 - PROGRESS: at 70.13% examples, 254448 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:50:14,899 : INFO : EPOCH 2 - PROGRESS: at 71.00% examples, 254579 words/s, in_qsize 20, out_qsize 0
2019-03-09 23:50:15,913 : INFO : EPOCH 2 - PROGRESS: at 71.85% examples, 254518 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:50:16,924 : INFO : EPOCH 2 - PROGRESS: at 72.85% examples, 254621 words/s, in_qsize 20, out_qsize 0
2019-03-09 23:50:17,950 : INFO : EPOCH 2 - PROGRESS: at 73.81% examples, 254682 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:50:18,977 : INFO : EPOCH 2 - PROGRESS: at 74.69% examples, 254658 words/s, in_qsize 18, out_qsize 1
2019-03-09 23:50:20,008 : INFO : EPOCH 2 - PROGRESS: at 75.52% examples, 254794 words/s, in_qsize 20, out_qsize 0
2019-03-09 23:50:21,011 : INFO : EPOCH 2 - PROGRESS: at 76.36% examples, 255055 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:50:22,074 : INFO : EPOCH 2 - PROGRESS: at 77.29% examples, 255233 words/s,

2019-03-09 23:51:18,943 : INFO : EPOCH 3 - PROGRESS: at 23.40% examples, 258656 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:51:19,960 : INFO : EPOCH 3 - PROGRESS: at 24.09% examples, 258951 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:51:20,966 : INFO : EPOCH 3 - PROGRESS: at 24.83% examples, 258903 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:51:22,000 : INFO : EPOCH 3 - PROGRESS: at 25.77% examples, 258450 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:51:23,064 : INFO : EPOCH 3 - PROGRESS: at 26.81% examples, 258655 words/s, in_qsize 20, out_qsize 0
2019-03-09 23:51:24,082 : INFO : EPOCH 3 - PROGRESS: at 27.84% examples, 258553 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:51:25,113 : INFO : EPOCH 3 - PROGRESS: at 28.71% examples, 258177 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:51:26,125 : INFO : EPOCH 3 - PROGRESS: at 29.68% examples, 258487 words/s, in_qsize 18, out_qsize 1
2019-03-09 23:51:27,136 : INFO : EPOCH 3 - PROGRESS: at 30.56% examples, 258271 words/s,

2019-03-09 23:52:32,610 : INFO : EPOCH 3 - PROGRESS: at 88.69% examples, 258671 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:52:33,651 : INFO : EPOCH 3 - PROGRESS: at 89.61% examples, 258652 words/s, in_qsize 18, out_qsize 1
2019-03-09 23:52:34,707 : INFO : EPOCH 3 - PROGRESS: at 90.56% examples, 258671 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:52:35,725 : INFO : EPOCH 3 - PROGRESS: at 91.48% examples, 258637 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:52:36,752 : INFO : EPOCH 3 - PROGRESS: at 92.41% examples, 258586 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:52:37,759 : INFO : EPOCH 3 - PROGRESS: at 93.24% examples, 258702 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:52:38,776 : INFO : EPOCH 3 - PROGRESS: at 94.19% examples, 258680 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:52:39,817 : INFO : EPOCH 3 - PROGRESS: at 95.12% examples, 258725 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:52:40,834 : INFO : EPOCH 3 - PROGRESS: at 96.03% examples, 258766 words/s,

2019-03-09 23:53:37,808 : INFO : EPOCH 4 - PROGRESS: at 43.52% examples, 261363 words/s, in_qsize 20, out_qsize 0
2019-03-09 23:53:38,816 : INFO : EPOCH 4 - PROGRESS: at 44.52% examples, 261298 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:53:39,822 : INFO : EPOCH 4 - PROGRESS: at 45.46% examples, 261104 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:53:40,840 : INFO : EPOCH 4 - PROGRESS: at 46.32% examples, 260865 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:53:41,841 : INFO : EPOCH 4 - PROGRESS: at 47.16% examples, 260824 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:53:42,854 : INFO : EPOCH 4 - PROGRESS: at 48.03% examples, 260611 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:53:43,868 : INFO : EPOCH 4 - PROGRESS: at 49.00% examples, 260650 words/s, in_qsize 18, out_qsize 1
2019-03-09 23:53:44,878 : INFO : EPOCH 4 - PROGRESS: at 49.81% examples, 260215 words/s, in_qsize 18, out_qsize 1
2019-03-09 23:53:45,902 : INFO : EPOCH 4 - PROGRESS: at 50.80% examples, 260450 words/s,

2019-03-09 23:54:42,495 : INFO : EPOCH - 4 : training on 41519355 raw words (30343978 effective words) took 117.3s, 258624 effective words/s
2019-03-09 23:54:43,577 : INFO : EPOCH 5 - PROGRESS: at 0.69% examples, 208936 words/s, in_qsize 18, out_qsize 1
2019-03-09 23:54:44,580 : INFO : EPOCH 5 - PROGRESS: at 1.55% examples, 236849 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:54:45,613 : INFO : EPOCH 5 - PROGRESS: at 2.30% examples, 234369 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:54:46,682 : INFO : EPOCH 5 - PROGRESS: at 3.13% examples, 231446 words/s, in_qsize 20, out_qsize 0
2019-03-09 23:54:47,683 : INFO : EPOCH 5 - PROGRESS: at 3.90% examples, 232705 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:54:48,698 : INFO : EPOCH 5 - PROGRESS: at 4.71% examples, 234289 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:54:49,699 : INFO : EPOCH 5 - PROGRESS: at 5.52% examples, 236659 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:54:50,706 : INFO : EPOCH 5 - PROGRESS: at 6.33% exampl

2019-03-09 23:55:56,747 : INFO : EPOCH 5 - PROGRESS: at 59.92% examples, 249242 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:55:57,795 : INFO : EPOCH 5 - PROGRESS: at 60.86% examples, 249244 words/s, in_qsize 18, out_qsize 1
2019-03-09 23:55:58,810 : INFO : EPOCH 5 - PROGRESS: at 61.71% examples, 249338 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:55:59,833 : INFO : EPOCH 5 - PROGRESS: at 62.54% examples, 249127 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:56:00,839 : INFO : EPOCH 5 - PROGRESS: at 63.39% examples, 248797 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:56:01,897 : INFO : EPOCH 5 - PROGRESS: at 64.39% examples, 248653 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:56:02,908 : INFO : EPOCH 5 - PROGRESS: at 65.24% examples, 248651 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:56:03,953 : INFO : EPOCH 5 - PROGRESS: at 66.01% examples, 248550 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:56:04,974 : INFO : EPOCH 5 - PROGRESS: at 66.87% examples, 248579 words/s,

2019-03-09 23:57:02,245 : INFO : EPOCH 6 - PROGRESS: at 13.03% examples, 244640 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:57:03,292 : INFO : EPOCH 6 - PROGRESS: at 13.81% examples, 245114 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:57:04,300 : INFO : EPOCH 6 - PROGRESS: at 14.58% examples, 245753 words/s, in_qsize 20, out_qsize 0
2019-03-09 23:57:05,312 : INFO : EPOCH 6 - PROGRESS: at 15.41% examples, 247027 words/s, in_qsize 20, out_qsize 0
2019-03-09 23:57:06,367 : INFO : EPOCH 6 - PROGRESS: at 16.21% examples, 247980 words/s, in_qsize 20, out_qsize 0
2019-03-09 23:57:07,376 : INFO : EPOCH 6 - PROGRESS: at 16.94% examples, 248678 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:57:08,413 : INFO : EPOCH 6 - PROGRESS: at 17.63% examples, 249107 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:57:09,464 : INFO : EPOCH 6 - PROGRESS: at 18.35% examples, 248955 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:57:10,468 : INFO : EPOCH 6 - PROGRESS: at 19.00% examples, 248966 words/s,

2019-03-09 23:58:16,210 : INFO : EPOCH 6 - PROGRESS: at 76.99% examples, 257367 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:58:17,272 : INFO : EPOCH 6 - PROGRESS: at 77.87% examples, 257516 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:58:18,324 : INFO : EPOCH 6 - PROGRESS: at 78.80% examples, 257690 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:58:19,361 : INFO : EPOCH 6 - PROGRESS: at 79.74% examples, 257836 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:58:20,367 : INFO : EPOCH 6 - PROGRESS: at 80.58% examples, 257912 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:58:21,367 : INFO : EPOCH 6 - PROGRESS: at 81.41% examples, 257919 words/s, in_qsize 20, out_qsize 0
2019-03-09 23:58:22,398 : INFO : EPOCH 6 - PROGRESS: at 82.36% examples, 257996 words/s, in_qsize 20, out_qsize 0
2019-03-09 23:58:23,401 : INFO : EPOCH 6 - PROGRESS: at 83.31% examples, 258225 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:58:24,420 : INFO : EPOCH 6 - PROGRESS: at 84.08% examples, 258042 words/s,

2019-03-09 23:59:21,355 : INFO : EPOCH 7 - PROGRESS: at 31.59% examples, 263749 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:59:22,359 : INFO : EPOCH 7 - PROGRESS: at 32.45% examples, 263256 words/s, in_qsize 20, out_qsize 0
2019-03-09 23:59:23,378 : INFO : EPOCH 7 - PROGRESS: at 33.40% examples, 263395 words/s, in_qsize 20, out_qsize 0
2019-03-09 23:59:24,393 : INFO : EPOCH 7 - PROGRESS: at 34.17% examples, 262889 words/s, in_qsize 20, out_qsize 0
2019-03-09 23:59:25,444 : INFO : EPOCH 7 - PROGRESS: at 35.08% examples, 262693 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:59:26,452 : INFO : EPOCH 7 - PROGRESS: at 36.12% examples, 263083 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:59:27,454 : INFO : EPOCH 7 - PROGRESS: at 37.03% examples, 263133 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:59:28,465 : INFO : EPOCH 7 - PROGRESS: at 37.90% examples, 262662 words/s, in_qsize 19, out_qsize 0
2019-03-09 23:59:29,473 : INFO : EPOCH 7 - PROGRESS: at 38.88% examples, 262837 words/s,

2019-03-10 00:00:35,578 : INFO : EPOCH 7 - PROGRESS: at 97.95% examples, 262024 words/s, in_qsize 18, out_qsize 1
2019-03-10 00:00:36,596 : INFO : EPOCH 7 - PROGRESS: at 98.94% examples, 262159 words/s, in_qsize 20, out_qsize 0
2019-03-10 00:00:37,554 : INFO : worker thread finished; awaiting finish of 9 more threads
2019-03-10 00:00:37,561 : INFO : worker thread finished; awaiting finish of 8 more threads
2019-03-10 00:00:37,587 : INFO : worker thread finished; awaiting finish of 7 more threads
2019-03-10 00:00:37,588 : INFO : worker thread finished; awaiting finish of 6 more threads
2019-03-10 00:00:37,625 : INFO : EPOCH 7 - PROGRESS: at 99.89% examples, 262190 words/s, in_qsize 5, out_qsize 1
2019-03-10 00:00:37,626 : INFO : worker thread finished; awaiting finish of 5 more threads
2019-03-10 00:00:37,633 : INFO : worker thread finished; awaiting finish of 4 more threads
2019-03-10 00:00:37,640 : INFO : worker thread finished; awaiting finish of 3 more threads
2019-03-10 00:00:37,67

2019-03-10 00:01:40,681 : INFO : EPOCH 8 - PROGRESS: at 52.94% examples, 261848 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:01:41,713 : INFO : EPOCH 8 - PROGRESS: at 53.85% examples, 261901 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:01:42,717 : INFO : EPOCH 8 - PROGRESS: at 54.77% examples, 262093 words/s, in_qsize 20, out_qsize 0
2019-03-10 00:01:43,718 : INFO : EPOCH 8 - PROGRESS: at 55.82% examples, 262165 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:01:44,728 : INFO : EPOCH 8 - PROGRESS: at 56.69% examples, 262100 words/s, in_qsize 20, out_qsize 1
2019-03-10 00:01:45,763 : INFO : EPOCH 8 - PROGRESS: at 57.65% examples, 262155 words/s, in_qsize 20, out_qsize 0
2019-03-10 00:01:46,783 : INFO : EPOCH 8 - PROGRESS: at 58.59% examples, 262346 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:01:47,841 : INFO : EPOCH 8 - PROGRESS: at 59.55% examples, 262201 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:01:48,888 : INFO : EPOCH 8 - PROGRESS: at 60.45% examples, 262127 words/s,

2019-03-10 00:02:46,299 : INFO : EPOCH 9 - PROGRESS: at 10.62% examples, 251422 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:02:47,334 : INFO : EPOCH 9 - PROGRESS: at 11.36% examples, 252227 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:02:48,382 : INFO : EPOCH 9 - PROGRESS: at 12.06% examples, 253675 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:02:49,383 : INFO : EPOCH 9 - PROGRESS: at 12.79% examples, 254414 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:02:50,448 : INFO : EPOCH 9 - PROGRESS: at 13.63% examples, 254175 words/s, in_qsize 19, out_qsize 2
2019-03-10 00:02:51,490 : INFO : EPOCH 9 - PROGRESS: at 14.46% examples, 255373 words/s, in_qsize 18, out_qsize 1
2019-03-10 00:02:52,533 : INFO : EPOCH 9 - PROGRESS: at 15.30% examples, 256529 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:02:53,535 : INFO : EPOCH 9 - PROGRESS: at 16.11% examples, 257754 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:02:54,575 : INFO : EPOCH 9 - PROGRESS: at 16.88% examples, 257958 words/s,

2019-03-10 00:04:00,488 : INFO : EPOCH 9 - PROGRESS: at 75.33% examples, 262478 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:04:01,570 : INFO : EPOCH 9 - PROGRESS: at 76.20% examples, 262526 words/s, in_qsize 20, out_qsize 1
2019-03-10 00:04:02,617 : INFO : EPOCH 9 - PROGRESS: at 77.11% examples, 262668 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:04:03,664 : INFO : EPOCH 9 - PROGRESS: at 77.99% examples, 262800 words/s, in_qsize 18, out_qsize 1
2019-03-10 00:04:04,743 : INFO : EPOCH 9 - PROGRESS: at 78.93% examples, 262844 words/s, in_qsize 20, out_qsize 0
2019-03-10 00:04:05,828 : INFO : EPOCH 9 - PROGRESS: at 79.89% examples, 262883 words/s, in_qsize 18, out_qsize 1
2019-03-10 00:04:06,907 : INFO : EPOCH 9 - PROGRESS: at 80.80% examples, 262929 words/s, in_qsize 20, out_qsize 1
2019-03-10 00:04:07,967 : INFO : EPOCH 9 - PROGRESS: at 81.78% examples, 263029 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:04:09,047 : INFO : EPOCH 9 - PROGRESS: at 82.75% examples, 263073 words/s,

2019-03-10 00:05:05,862 : INFO : EPOCH 10 - PROGRESS: at 30.50% examples, 266204 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:05:06,872 : INFO : EPOCH 10 - PROGRESS: at 31.60% examples, 266355 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:05:07,874 : INFO : EPOCH 10 - PROGRESS: at 32.50% examples, 266166 words/s, in_qsize 17, out_qsize 2
2019-03-10 00:05:08,947 : INFO : EPOCH 10 - PROGRESS: at 33.42% examples, 265712 words/s, in_qsize 20, out_qsize 0
2019-03-10 00:05:09,969 : INFO : EPOCH 10 - PROGRESS: at 34.35% examples, 265981 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:05:10,975 : INFO : EPOCH 10 - PROGRESS: at 35.29% examples, 266312 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:05:11,992 : INFO : EPOCH 10 - PROGRESS: at 36.22% examples, 265905 words/s, in_qsize 18, out_qsize 1
2019-03-10 00:05:13,016 : INFO : EPOCH 10 - PROGRESS: at 37.12% examples, 265791 words/s, in_qsize 20, out_qsize 0
2019-03-10 00:05:14,047 : INFO : EPOCH 10 - PROGRESS: at 38.14% examples, 265920

2019-03-10 00:06:19,962 : INFO : EPOCH 10 - PROGRESS: at 97.40% examples, 264116 words/s, in_qsize 20, out_qsize 0
2019-03-10 00:06:20,989 : INFO : EPOCH 10 - PROGRESS: at 98.26% examples, 264005 words/s, in_qsize 20, out_qsize 0
2019-03-10 00:06:21,989 : INFO : EPOCH 10 - PROGRESS: at 99.20% examples, 264041 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:06:22,630 : INFO : worker thread finished; awaiting finish of 9 more threads
2019-03-10 00:06:22,635 : INFO : worker thread finished; awaiting finish of 8 more threads
2019-03-10 00:06:22,649 : INFO : worker thread finished; awaiting finish of 7 more threads
2019-03-10 00:06:22,669 : INFO : worker thread finished; awaiting finish of 6 more threads
2019-03-10 00:06:22,672 : INFO : worker thread finished; awaiting finish of 5 more threads
2019-03-10 00:06:22,693 : INFO : worker thread finished; awaiting finish of 4 more threads
2019-03-10 00:06:22,702 : INFO : worker thread finished; awaiting finish of 3 more threads
2019-03-10 00:06:2

CPU times: user 2h 28min 56s, sys: 23.5 s, total: 2h 29min 20s
Wall time: 19min 37s


### Train a SkipGram model

In [18]:
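# sg=1 selects SkipGram; passing documents here already builds the vocabulary and trains for the default 5 epochs
model_skipgram = gensim.models.Word2Vec(documents, size=150, window=10, min_count=2, workers=10, sg=1)
# continue training for 10 more epochs and time only this call
%time model_skipgram.train(documents, total_examples=len(documents), epochs=10)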

2019-03-10 00:06:26,245 : INFO : collecting all words and their counts
2019-03-10 00:06:26,246 : INFO : PROGRESS: at sentence #0, processed 0 words, keeping 0 word types
2019-03-10 00:06:26,455 : INFO : PROGRESS: at sentence #10000, processed 1655714 words, keeping 25777 word types
2019-03-10 00:06:26,664 : INFO : PROGRESS: at sentence #20000, processed 3317863 words, keeping 35016 word types
2019-03-10 00:06:26,909 : INFO : PROGRESS: at sentence #30000, processed 5264072 words, keeping 47518 word types
2019-03-10 00:06:27,134 : INFO : PROGRESS: at sentence #40000, processed 7081746 words, keeping 56675 word types
2019-03-10 00:06:27,382 : INFO : PROGRESS: at sentence #50000, processed 9089491 words, keeping 63744 word types
2019-03-10 00:06:27,624 : INFO : PROGRESS: at sentence #60000, processed 11013723 words, keeping 76781 word types
2019-03-10 00:06:27,827 : INFO : PROGRESS: at sentence #70000, processed 12637525 words, keeping 83194 word types
2019-03-10 00:06:28,043 : INFO : PROG

2019-03-10 00:07:10,269 : INFO : EPOCH 1 - PROGRESS: at 39.40% examples, 335236 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:07:11,283 : INFO : EPOCH 1 - PROGRESS: at 40.47% examples, 334632 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:07:12,291 : INFO : EPOCH 1 - PROGRESS: at 41.83% examples, 334700 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:07:13,309 : INFO : EPOCH 1 - PROGRESS: at 42.94% examples, 334607 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:07:14,346 : INFO : EPOCH 1 - PROGRESS: at 44.22% examples, 334425 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:07:15,353 : INFO : EPOCH 1 - PROGRESS: at 45.49% examples, 334295 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:07:16,385 : INFO : EPOCH 1 - PROGRESS: at 46.62% examples, 334151 words/s, in_qsize 20, out_qsize 0
2019-03-10 00:07:17,423 : INFO : EPOCH 1 - PROGRESS: at 47.79% examples, 334128 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:07:18,467 : INFO : EPOCH 1 - PROGRESS: at 49.10% examples, 334382 words/s,

2019-03-10 00:08:14,944 : INFO : EPOCH 2 - PROGRESS: at 11.64% examples, 330426 words/s, in_qsize 20, out_qsize 0
2019-03-10 00:08:15,981 : INFO : EPOCH 2 - PROGRESS: at 12.47% examples, 328825 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:08:17,004 : INFO : EPOCH 2 - PROGRESS: at 13.58% examples, 330101 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:08:18,065 : INFO : EPOCH 2 - PROGRESS: at 14.60% examples, 330683 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:08:19,113 : INFO : EPOCH 2 - PROGRESS: at 15.71% examples, 331711 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:08:20,144 : INFO : EPOCH 2 - PROGRESS: at 16.64% examples, 332365 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:08:21,150 : INFO : EPOCH 2 - PROGRESS: at 17.51% examples, 331812 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:08:22,213 : INFO : EPOCH 2 - PROGRESS: at 18.50% examples, 331781 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:08:23,227 : INFO : EPOCH 2 - PROGRESS: at 19.35% examples, 332082 words/s,

2019-03-10 00:09:28,878 : INFO : EPOCH 2 - PROGRESS: at 93.55% examples, 333481 words/s, in_qsize 20, out_qsize 0
2019-03-10 00:09:29,882 : INFO : EPOCH 2 - PROGRESS: at 94.71% examples, 333538 words/s, in_qsize 20, out_qsize 1
2019-03-10 00:09:30,883 : INFO : EPOCH 2 - PROGRESS: at 95.82% examples, 333358 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:09:31,955 : INFO : EPOCH 2 - PROGRESS: at 97.02% examples, 333301 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:09:32,969 : INFO : EPOCH 2 - PROGRESS: at 98.17% examples, 333296 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:09:33,975 : INFO : EPOCH 2 - PROGRESS: at 99.30% examples, 333115 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:09:34,389 : INFO : worker thread finished; awaiting finish of 9 more threads
2019-03-10 00:09:34,399 : INFO : worker thread finished; awaiting finish of 8 more threads
2019-03-10 00:09:34,406 : INFO : worker thread finished; awaiting finish of 7 more threads
2019-03-10 00:09:34,416 : INFO : worker thr

2019-03-10 00:10:34,230 : INFO : EPOCH 3 - PROGRESS: at 64.94% examples, 333011 words/s, in_qsize 18, out_qsize 1
2019-03-10 00:10:35,246 : INFO : EPOCH 3 - PROGRESS: at 66.01% examples, 333253 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:10:36,281 : INFO : EPOCH 3 - PROGRESS: at 67.23% examples, 333456 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:10:37,290 : INFO : EPOCH 3 - PROGRESS: at 68.38% examples, 333492 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:10:38,298 : INFO : EPOCH 3 - PROGRESS: at 69.49% examples, 333623 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:10:39,301 : INFO : EPOCH 3 - PROGRESS: at 70.55% examples, 333579 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:10:40,311 : INFO : EPOCH 3 - PROGRESS: at 71.68% examples, 333619 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:10:41,335 : INFO : EPOCH 3 - PROGRESS: at 72.81% examples, 333360 words/s, in_qsize 20, out_qsize 0
2019-03-10 00:10:42,377 : INFO : EPOCH 3 - PROGRESS: at 74.06% examples, 333327 words/s,

2019-03-10 00:11:38,684 : INFO : EPOCH 4 - PROGRESS: at 32.16% examples, 321978 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:11:39,744 : INFO : EPOCH 4 - PROGRESS: at 33.33% examples, 321680 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:11:40,752 : INFO : EPOCH 4 - PROGRESS: at 34.47% examples, 322160 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:11:41,762 : INFO : EPOCH 4 - PROGRESS: at 35.56% examples, 322149 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:11:42,826 : INFO : EPOCH 4 - PROGRESS: at 36.80% examples, 322234 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:11:43,835 : INFO : EPOCH 4 - PROGRESS: at 37.97% examples, 322759 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:11:44,846 : INFO : EPOCH 4 - PROGRESS: at 39.18% examples, 322695 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:11:45,869 : INFO : EPOCH 4 - PROGRESS: at 40.36% examples, 322914 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:11:46,904 : INFO : EPOCH 4 - PROGRESS: at 41.65% examples, 322868 words/s,

2019-03-10 00:12:43,193 : INFO : EPOCH 5 - PROGRESS: at 1.95% examples, 298167 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:12:44,244 : INFO : EPOCH 5 - PROGRESS: at 3.03% examples, 301785 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:12:45,250 : INFO : EPOCH 5 - PROGRESS: at 4.09% examples, 306939 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:12:46,294 : INFO : EPOCH 5 - PROGRESS: at 5.14% examples, 307751 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:12:47,306 : INFO : EPOCH 5 - PROGRESS: at 6.23% examples, 312009 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:12:48,334 : INFO : EPOCH 5 - PROGRESS: at 7.23% examples, 310478 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:12:49,342 : INFO : EPOCH 5 - PROGRESS: at 8.30% examples, 313467 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:12:50,386 : INFO : EPOCH 5 - PROGRESS: at 9.24% examples, 311506 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:12:51,393 : INFO : EPOCH 5 - PROGRESS: at 10.09% examples, 313144 words/s, in_qsiz

2019-03-10 00:13:56,731 : INFO : EPOCH 5 - PROGRESS: at 76.83% examples, 310942 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:13:57,734 : INFO : EPOCH 5 - PROGRESS: at 77.83% examples, 311097 words/s, in_qsize 18, out_qsize 1
2019-03-10 00:13:58,765 : INFO : EPOCH 5 - PROGRESS: at 78.82% examples, 310938 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:13:59,778 : INFO : EPOCH 5 - PROGRESS: at 79.80% examples, 310696 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:14:00,790 : INFO : EPOCH 5 - PROGRESS: at 80.80% examples, 310726 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:14:01,832 : INFO : EPOCH 5 - PROGRESS: at 81.90% examples, 310735 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:14:02,862 : INFO : EPOCH 5 - PROGRESS: at 82.90% examples, 310433 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:14:03,870 : INFO : EPOCH 5 - PROGRESS: at 83.95% examples, 310567 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:14:04,922 : INFO : EPOCH 5 - PROGRESS: at 84.91% examples, 310431 words/s,

2019-03-10 00:14:58,797 : INFO : EPOCH 1 - PROGRESS: at 37.40% examples, 303316 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:14:59,835 : INFO : EPOCH 1 - PROGRESS: at 38.57% examples, 303508 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:15:00,891 : INFO : EPOCH 1 - PROGRESS: at 39.67% examples, 303443 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:15:01,919 : INFO : EPOCH 1 - PROGRESS: at 40.89% examples, 304075 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:15:02,927 : INFO : EPOCH 1 - PROGRESS: at 42.09% examples, 303977 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:15:03,946 : INFO : EPOCH 1 - PROGRESS: at 43.13% examples, 303796 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:15:05,017 : INFO : EPOCH 1 - PROGRESS: at 44.32% examples, 303659 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:15:06,062 : INFO : EPOCH 1 - PROGRESS: at 45.54% examples, 303793 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:15:07,062 : INFO : EPOCH 1 - PROGRESS: at 46.52% examples, 303630 words/s,

2019-03-10 00:16:03,777 : INFO : EPOCH 2 - PROGRESS: at 3.75% examples, 281812 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:16:04,804 : INFO : EPOCH 2 - PROGRESS: at 4.83% examples, 288848 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:16:05,808 : INFO : EPOCH 2 - PROGRESS: at 5.75% examples, 288368 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:16:06,843 : INFO : EPOCH 2 - PROGRESS: at 6.81% examples, 291796 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:16:07,873 : INFO : EPOCH 2 - PROGRESS: at 7.77% examples, 292887 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:16:08,881 : INFO : EPOCH 2 - PROGRESS: at 8.73% examples, 292884 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:16:09,894 : INFO : EPOCH 2 - PROGRESS: at 9.58% examples, 293339 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:16:10,899 : INFO : EPOCH 2 - PROGRESS: at 10.38% examples, 295274 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:16:11,902 : INFO : EPOCH 2 - PROGRESS: at 11.17% examples, 295126 words/s, in_qsi

2019-03-10 00:17:17,418 : INFO : EPOCH 2 - PROGRESS: at 74.58% examples, 293113 words/s, in_qsize 20, out_qsize 0
2019-03-10 00:17:18,476 : INFO : EPOCH 2 - PROGRESS: at 75.66% examples, 293486 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:17:19,480 : INFO : EPOCH 2 - PROGRESS: at 76.55% examples, 293471 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:17:20,485 : INFO : EPOCH 2 - PROGRESS: at 77.50% examples, 293566 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:17:21,487 : INFO : EPOCH 2 - PROGRESS: at 78.54% examples, 293927 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:17:22,549 : INFO : EPOCH 2 - PROGRESS: at 79.57% examples, 293895 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:17:23,580 : INFO : EPOCH 2 - PROGRESS: at 80.51% examples, 293811 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:17:24,587 : INFO : EPOCH 2 - PROGRESS: at 81.52% examples, 293966 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:17:25,597 : INFO : EPOCH 2 - PROGRESS: at 82.55% examples, 294033 words/s,

2019-03-10 00:18:21,893 : INFO : EPOCH 3 - PROGRESS: at 34.81% examples, 291029 words/s, in_qsize 20, out_qsize 0
2019-03-10 00:18:22,925 : INFO : EPOCH 3 - PROGRESS: at 35.86% examples, 290921 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:18:23,954 : INFO : EPOCH 3 - PROGRESS: at 36.91% examples, 290821 words/s, in_qsize 20, out_qsize 1
2019-03-10 00:18:24,955 : INFO : EPOCH 3 - PROGRESS: at 37.93% examples, 290912 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:18:25,959 : INFO : EPOCH 3 - PROGRESS: at 39.06% examples, 291155 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:18:26,965 : INFO : EPOCH 3 - PROGRESS: at 40.12% examples, 291393 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:18:27,970 : INFO : EPOCH 3 - PROGRESS: at 41.25% examples, 291624 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:18:29,067 : INFO : EPOCH 3 - PROGRESS: at 42.32% examples, 291067 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:18:30,122 : INFO : EPOCH 3 - PROGRESS: at 43.38% examples, 290982 words/s,

2019-03-10 00:19:27,413 : INFO : worker thread finished; awaiting finish of 0 more threads
2019-03-10 00:19:27,414 : INFO : EPOCH - 3 : training on 41519355 raw words (30348322 effective words) took 104.5s, 290546 effective words/s
2019-03-10 00:19:28,422 : INFO : EPOCH 4 - PROGRESS: at 0.71% examples, 231284 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:19:29,434 : INFO : EPOCH 4 - PROGRESS: at 1.65% examples, 258489 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:19:30,445 : INFO : EPOCH 4 - PROGRESS: at 2.61% examples, 271941 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:19:31,460 : INFO : EPOCH 4 - PROGRESS: at 3.63% examples, 278503 words/s, in_qsize 20, out_qsize 1
2019-03-10 00:19:32,466 : INFO : EPOCH 4 - PROGRESS: at 4.52% examples, 277398 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:19:33,487 : INFO : EPOCH 4 - PROGRESS: at 5.48% examples, 278199 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:19:34,526 : INFO : EPOCH 4 - PROGRESS: at 6.43% examples, 279005 words/s, in

2019-03-10 00:20:40,126 : INFO : EPOCH 4 - PROGRESS: at 67.13% examples, 282971 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:20:41,137 : INFO : EPOCH 4 - PROGRESS: at 68.09% examples, 282896 words/s, in_qsize 20, out_qsize 0
2019-03-10 00:20:42,141 : INFO : EPOCH 4 - PROGRESS: at 69.02% examples, 282759 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:20:43,163 : INFO : EPOCH 4 - PROGRESS: at 69.92% examples, 282739 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:20:44,163 : INFO : EPOCH 4 - PROGRESS: at 70.79% examples, 282634 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:20:45,191 : INFO : EPOCH 4 - PROGRESS: at 71.73% examples, 282605 words/s, in_qsize 20, out_qsize 0
2019-03-10 00:20:46,226 : INFO : EPOCH 4 - PROGRESS: at 72.81% examples, 282741 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:20:47,241 : INFO : EPOCH 4 - PROGRESS: at 73.88% examples, 282753 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:20:48,243 : INFO : EPOCH 4 - PROGRESS: at 74.84% examples, 282893 words/s,

2019-03-10 00:21:45,237 : INFO : EPOCH 5 - PROGRESS: at 24.15% examples, 281118 words/s, in_qsize 18, out_qsize 1
2019-03-10 00:21:46,262 : INFO : EPOCH 5 - PROGRESS: at 25.07% examples, 281342 words/s, in_qsize 20, out_qsize 0
2019-03-10 00:21:47,272 : INFO : EPOCH 5 - PROGRESS: at 26.18% examples, 281969 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:21:48,292 : INFO : EPOCH 5 - PROGRESS: at 27.30% examples, 282223 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:21:49,331 : INFO : EPOCH 5 - PROGRESS: at 28.41% examples, 282538 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:21:50,359 : INFO : EPOCH 5 - PROGRESS: at 29.44% examples, 282444 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:21:51,414 : INFO : EPOCH 5 - PROGRESS: at 30.47% examples, 282586 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:21:52,432 : INFO : EPOCH 5 - PROGRESS: at 31.65% examples, 282824 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:21:53,451 : INFO : EPOCH 5 - PROGRESS: at 32.64% examples, 282811 words/s,

2019-03-10 00:22:57,391 : INFO : worker thread finished; awaiting finish of 7 more threads
2019-03-10 00:22:57,398 : INFO : worker thread finished; awaiting finish of 6 more threads
2019-03-10 00:22:57,413 : INFO : worker thread finished; awaiting finish of 5 more threads
2019-03-10 00:22:57,431 : INFO : worker thread finished; awaiting finish of 4 more threads
2019-03-10 00:22:57,471 : INFO : worker thread finished; awaiting finish of 3 more threads
2019-03-10 00:22:57,485 : INFO : worker thread finished; awaiting finish of 2 more threads
2019-03-10 00:22:57,486 : INFO : worker thread finished; awaiting finish of 1 more threads
2019-03-10 00:22:57,490 : INFO : worker thread finished; awaiting finish of 0 more threads
2019-03-10 00:22:57,491 : INFO : EPOCH - 5 : training on 41519355 raw words (30348020 effective words) took 101.8s, 298111 effective words/s
2019-03-10 00:22:58,499 : INFO : EPOCH 6 - PROGRESS: at 0.87% examples, 289186 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:22:5

2019-03-10 00:24:04,841 : INFO : EPOCH 6 - PROGRESS: at 69.53% examples, 316294 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:24:05,856 : INFO : EPOCH 6 - PROGRESS: at 70.57% examples, 316350 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:24:06,867 : INFO : EPOCH 6 - PROGRESS: at 71.70% examples, 316644 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:24:07,876 : INFO : EPOCH 6 - PROGRESS: at 72.81% examples, 316610 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:24:08,879 : INFO : EPOCH 6 - PROGRESS: at 74.04% examples, 316897 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:24:09,901 : INFO : EPOCH 6 - PROGRESS: at 75.02% examples, 316697 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:24:10,938 : INFO : EPOCH 6 - PROGRESS: at 76.07% examples, 316838 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:24:11,947 : INFO : EPOCH 6 - PROGRESS: at 77.01% examples, 316518 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:24:12,950 : INFO : EPOCH 6 - PROGRESS: at 78.03% examples, 316769 words/s,

2019-03-10 00:25:10,148 : INFO : EPOCH 7 - PROGRESS: at 36.12% examples, 315732 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:25:11,157 : INFO : EPOCH 7 - PROGRESS: at 37.30% examples, 316447 words/s, in_qsize 20, out_qsize 0
2019-03-10 00:25:12,160 : INFO : EPOCH 7 - PROGRESS: at 38.46% examples, 316386 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:25:13,172 : INFO : EPOCH 7 - PROGRESS: at 39.63% examples, 316882 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:25:14,180 : INFO : EPOCH 7 - PROGRESS: at 40.81% examples, 317180 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:25:15,235 : INFO : EPOCH 7 - PROGRESS: at 42.05% examples, 316585 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:25:16,257 : INFO : EPOCH 7 - PROGRESS: at 43.20% examples, 316759 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:25:17,259 : INFO : EPOCH 7 - PROGRESS: at 44.38% examples, 316782 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:25:18,267 : INFO : EPOCH 7 - PROGRESS: at 45.58% examples, 316882 words/s,

2019-03-10 00:26:14,742 : INFO : EPOCH 8 - PROGRESS: at 6.21% examples, 303417 words/s, in_qsize 20, out_qsize 0
2019-03-10 00:26:15,763 : INFO : EPOCH 8 - PROGRESS: at 7.22% examples, 303398 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:26:16,819 : INFO : EPOCH 8 - PROGRESS: at 8.18% examples, 302047 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:26:17,824 : INFO : EPOCH 8 - PROGRESS: at 9.24% examples, 306446 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:26:18,826 : INFO : EPOCH 8 - PROGRESS: at 10.06% examples, 307997 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:26:19,830 : INFO : EPOCH 8 - PROGRESS: at 10.92% examples, 308661 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:26:20,846 : INFO : EPOCH 8 - PROGRESS: at 11.76% examples, 310038 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:26:21,863 : INFO : EPOCH 8 - PROGRESS: at 12.65% examples, 311130 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:26:22,899 : INFO : EPOCH 8 - PROGRESS: at 13.75% examples, 312765 words/s, in_

2019-03-10 00:27:28,410 : INFO : EPOCH 8 - PROGRESS: at 83.76% examples, 320531 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:27:29,421 : INFO : EPOCH 8 - PROGRESS: at 84.74% examples, 320423 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:27:30,423 : INFO : EPOCH 8 - PROGRESS: at 85.71% examples, 320105 words/s, in_qsize 20, out_qsize 0
2019-03-10 00:27:31,450 : INFO : EPOCH 8 - PROGRESS: at 86.92% examples, 320228 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:27:32,476 : INFO : EPOCH 8 - PROGRESS: at 88.12% examples, 320259 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:27:33,483 : INFO : EPOCH 8 - PROGRESS: at 89.36% examples, 320561 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:27:34,497 : INFO : EPOCH 8 - PROGRESS: at 90.47% examples, 320477 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:27:35,527 : INFO : EPOCH 8 - PROGRESS: at 91.66% examples, 320588 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:27:36,540 : INFO : EPOCH 8 - PROGRESS: at 92.80% examples, 320743 words/s,

2019-03-10 00:28:33,282 : INFO : EPOCH 9 - PROGRESS: at 52.27% examples, 323358 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:28:34,287 : INFO : EPOCH 9 - PROGRESS: at 53.34% examples, 323753 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:28:35,297 : INFO : EPOCH 9 - PROGRESS: at 54.42% examples, 323595 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:28:36,316 : INFO : EPOCH 9 - PROGRESS: at 55.63% examples, 323505 words/s, in_qsize 18, out_qsize 1
2019-03-10 00:28:37,348 : INFO : EPOCH 9 - PROGRESS: at 56.74% examples, 323351 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:28:38,350 : INFO : EPOCH 9 - PROGRESS: at 57.85% examples, 323362 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:28:39,353 : INFO : EPOCH 9 - PROGRESS: at 58.88% examples, 322869 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:28:40,396 : INFO : EPOCH 9 - PROGRESS: at 60.00% examples, 322695 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:28:41,400 : INFO : EPOCH 9 - PROGRESS: at 61.17% examples, 322859 words/s,

2019-03-10 00:29:38,312 : INFO : EPOCH 10 - PROGRESS: at 20.26% examples, 321896 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:29:39,323 : INFO : EPOCH 10 - PROGRESS: at 21.25% examples, 321744 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:29:40,400 : INFO : EPOCH 10 - PROGRESS: at 22.24% examples, 320721 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:29:41,412 : INFO : EPOCH 10 - PROGRESS: at 23.10% examples, 321302 words/s, in_qsize 20, out_qsize 0
2019-03-10 00:29:42,427 : INFO : EPOCH 10 - PROGRESS: at 23.88% examples, 320624 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:29:43,431 : INFO : EPOCH 10 - PROGRESS: at 24.88% examples, 321542 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:29:44,464 : INFO : EPOCH 10 - PROGRESS: at 26.00% examples, 320739 words/s, in_qsize 20, out_qsize 0
2019-03-10 00:29:45,504 : INFO : EPOCH 10 - PROGRESS: at 27.35% examples, 321451 words/s, in_qsize 19, out_qsize 0
2019-03-10 00:29:46,514 : INFO : EPOCH 10 - PROGRESS: at 28.58% examples, 321960

2019-03-10 00:30:50,730 : INFO : worker thread finished; awaiting finish of 8 more threads
2019-03-10 00:30:50,732 : INFO : worker thread finished; awaiting finish of 7 more threads
2019-03-10 00:30:50,746 : INFO : worker thread finished; awaiting finish of 6 more threads
2019-03-10 00:30:50,747 : INFO : worker thread finished; awaiting finish of 5 more threads
2019-03-10 00:30:50,757 : INFO : EPOCH 10 - PROGRESS: at 99.91% examples, 322808 words/s, in_qsize 4, out_qsize 1
2019-03-10 00:30:50,758 : INFO : worker thread finished; awaiting finish of 4 more threads
2019-03-10 00:30:50,764 : INFO : worker thread finished; awaiting finish of 3 more threads
2019-03-10 00:30:50,785 : INFO : worker thread finished; awaiting finish of 2 more threads
2019-03-10 00:30:50,797 : INFO : worker thread finished; awaiting finish of 1 more threads
2019-03-10 00:30:50,803 : INFO : worker thread finished; awaiting finish of 0 more threads
2019-03-10 00:30:50,804 : INFO : EPOCH - 10 : training on 41519355 

CPU times: user 2h 8min 26s, sys: 17.4 s, total: 2h 8min 44s
Wall time: 16min 31s


(303499323, 415193550)

### Save the models

In [19]:
# save only the word vectors
model_cbow.wv.save("cbow_vector.bin")
model_subword.wv.save("subword_vector.bin")
model_skipgram.wv.save("skipgram_vector.bin")

2019-03-10 00:30:50,820 : INFO : saving Word2VecKeyedVectors object under cbow_vector.bin, separately None
2019-03-10 00:30:50,824 : INFO : storing np array 'vectors' to cbow_vector.bin.vectors.npy
2019-03-10 00:30:50,853 : INFO : not storing attribute vectors_norm
2019-03-10 00:30:51,041 : INFO : saved cbow_vector.bin
2019-03-10 00:30:51,042 : INFO : saving FastTextKeyedVectors object under subword_vector.bin, separately None
2019-03-10 00:30:51,043 : INFO : storing np array 'vectors' to subword_vector.bin.vectors.npy
2019-03-10 00:30:51,080 : INFO : storing np array 'vectors_vocab' to subword_vector.bin.vectors_vocab.npy
2019-03-10 00:30:51,119 : INFO : storing np array 'vectors_ngrams' to subword_vector.bin.vectors_ngrams.npy
2019-03-10 00:30:51,267 : INFO : not storing attribute vectors_norm
2019-03-10 00:30:51,268 : INFO : not storing attribute vectors_vocab_norm
2019-03-10 00:30:51,269 : INFO : not storing attribute vectors_ngrams_norm
2019-03-10 00:30:51,269 : INFO : not storing

In [20]:
from gensim.models import KeyedVectors
cbow_vectors = KeyedVectors.load("cbow_vector.bin")
subword_vectors = KeyedVectors.load("subword_vector.bin")
skipgram_vectors = KeyedVectors.load("skipgram_vector.bin")

2019-03-10 00:30:51,791 : INFO : loading Word2VecKeyedVectors object from cbow_vector.bin
2019-03-10 00:30:52,886 : INFO : loading vectors from cbow_vector.bin.vectors.npy with mmap=None
2019-03-10 00:30:52,913 : INFO : setting ignored attribute vectors_norm to None
2019-03-10 00:30:52,914 : INFO : loaded cbow_vector.bin
2019-03-10 00:30:52,938 : INFO : loading Word2VecKeyedVectors object from subword_vector.bin
2019-03-10 00:30:53,081 : INFO : loading vectors from subword_vector.bin.vectors.npy with mmap=None
2019-03-10 00:30:53,098 : INFO : loading vectors_vocab from subword_vector.bin.vectors_vocab.npy with mmap=None
2019-03-10 00:30:53,115 : INFO : loading vectors_ngrams from subword_vector.bin.vectors_ngrams.npy with mmap=None
2019-03-10 00:30:53,215 : INFO : setting ignored attribute vectors_norm to None
2019-03-10 00:30:53,216 : INFO : setting ignored attribute vectors_vocab_norm to None
2019-03-10 00:30:53,216 : INFO : setting ignored attribute vectors_ngrams_norm to None
2019-

In [88]:
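w1=['staff']
# The loaded objects are already KeyedVectors, so query them directly;
# going through `.wv` here is deprecated and only needed on full models.
topn_cbow=cbow_vectors.most_similar(positive=w1, topn=8)
topn_subword=subword_vectors.most_similar(positive=w1, topn=8)
topn_skipgram=skipgram_vectors.most_similar(positive=w1, topn=8)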


def get_topn(label,topn_items):
    topn_words=[item[0] for item in topn_items]
    
    return [label,topn_words]

  


In [117]:
from IPython.display import display_html
import pandas as pd 



def display_side_by_side(*args):
    html_str=''
    
    for df in args:
        html_str+=df.to_html()
        
    display_html(html_str.replace('table','table style="display:inline"'),raw=True)
    

display_side_by_side(pd.DataFrame(topn_cbow,columns=['cbow_sim','cbow_sim_score']),
                     pd.DataFrame(topn_skipgram,columns=['skipgram_sim','skipgram_sim_score']),
                     pd.DataFrame(topn_subword,columns=['subword_sim','subword_sim_score']))


| | cbow_sim | cbow_sim_score | skipgram_sim | skipgram_sim_score | subword_sim | subword_sim_score |
|---|---|---|---|---|---|---|
| 0 | personnel | 0.85232 | friendly | 0.754009 | staffstaff | 0.987939 |
| 1 | staffs | 0.814188 | helpful | 0.737687 | stafff | 0.976234 |
| 2 | employees | 0.810958 | employees | 0.688849 | staffl | 0.961618 |
| 3 | personel | 0.705274 | courteous | 0.669786 | staffc | 0.961578 |
| 4 | receptionists | 0.704763 | polite | 0.665958 | staffa | 0.953434 |
| 5 | staf | 0.695963 | personnel | 0.659796 | staffwe | 0.953239 |
| 6 | satff | 0.653459 | doormen | 0.65627 | stafft | 0.949566 |
| 7 | doormen | 0.619601 | staffl | 0.648195 | staffs | 0.938418 |


## Similarity between words

In [116]:
# similarity between two related words

w1='filth'
w2='filthy'
sim_cbow=model_cbow.wv.similarity(w1=w1,w2=w2)
sim_skipgram=model_skipgram.wv.similarity(w1=w1,w2=w2)
sim_subword=model_subword.wv.similarity(w1=w1,w2=w2)


sim_cbow,sim_skipgram,sim_subword

(0.6141429, 0.7374536, 0.82823896)

Under the hood, the three calls above compute the cosine similarity between the two specified words using each model's word vectors. From the scores, all three models treat `filth` and `filthy` as related, with the subword model giving the highest score (0.83), followed by SkipGram (0.74) and CBOW (0.61). That ordering is plausible, since the subword model also draws on the character n-grams the two words share. If you compute the similarity between two identical words, the score will be 1.0, which is the maximum: cosine similarity ranges from -1.0 to 1.0. You can read more about cosine similarity scoring [here](https://en.wikipedia.org/wiki/Cosine_similarity).
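
To make the cosine similarity computation concrete, here is a minimal pure-Python sketch. The `cosine_similarity` helper and the toy vectors are made up for illustration and are not Gensim's internals; a trained model would supply real 150-dimensional vectors instead.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy vectors, invented for this example only.
v = [1.0, 2.0, 3.0]

same = cosine_similarity(v, v)                          # identical direction
opposite = cosine_similarity(v, [-1.0, -2.0, -3.0])     # opposite direction
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])  # unrelated directions

print(same, opposite, orthogonal)
```

Identical vectors score (up to floating-point rounding) 1.0, vectors pointing in opposite directions score -1.0, and orthogonal vectors score 0.0.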

### Find the odd one out
You can even use Word2Vec to find odd items given a list of items.

In [63]:
# Which one is the odd one out in this list?
model.wv.doesnt_match(["cat","dog","france"])

'france'

In [77]:
# Which one is the odd one out in this list?
model.wv.doesnt_match(["bed","pillow","duvet","shower"])


'shower'
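
A rough sketch of the idea behind this kind of query: normalize each word vector, average them, and report the word whose vector is least similar to that average. The 2-dimensional vectors below are invented for illustration, and `odd_one_out` is a hypothetical helper rather than Gensim's API, so treat it as an approximation of the approach and not the library's exact implementation.

```python
import math

def unit(v):
    """Scale a vector to length 1."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def odd_one_out(vectors):
    """Return the word whose unit vector is least similar to the mean unit vector."""
    units = {word: unit(vec) for word, vec in vectors.items()}
    dim = len(next(iter(units.values())))
    mean = unit([sum(u[i] for u in units.values()) / len(units) for i in range(dim)])
    # Dot product of unit vectors equals their cosine similarity.
    return min(units, key=lambda w: sum(a * b for a, b in zip(units[w], mean)))

# Toy vectors: "cat" and "dog" point in a similar direction, "france" does not.
toy_vectors = {"cat": [0.9, 0.1], "dog": [0.8, 0.2], "france": [0.1, 0.9]}

result = odd_one_out(toy_vectors)
print(result)
```

With these toy vectors the helper picks `france`, because its vector points away from the average of the group.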

## Understanding some of the parameters
To train the model earlier, we had to set some parameters. Now, let's try to understand what some of them mean. For reference, this is the command that we used to train the model.

```
model = gensim.models.Word2Vec (documents, size=150, window=10, min_count=2, workers=10)
```

### `size`
The size of the dense vector used to represent each token or word. If you have very limited data, then size should be a much smaller value. If you have lots of data, it's good to experiment with various sizes. A value of 100-150 has worked well for me. 

### `window`
The maximum distance between the target word and a neighboring word. Words that lie farther from the target than the window width, to the left or to the right, are not treated as context for that target. In theory, a smaller window should give you terms that are more closely related. If you have lots of data, then the window size should not matter too much, as long as it's a decent-sized window. 
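
As a small illustration of what the window controls, the sketch below lists the (target, context) pairs that fall inside a fixed window. `context_pairs` is a hypothetical helper written for this example; real implementations may additionally shrink the window at random for each target, which this sketch does not do.

```python
def context_pairs(tokens, window):
    """Return (target, context) pairs for every token within `window` positions."""
    pairs = []
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((target, tokens[j]))
    return pairs

tokens = ["the", "staff", "was", "very", "friendly"]

pairs_w1 = context_pairs(tokens, window=1)
pairs_w2 = context_pairs(tokens, window=2)

print(len(pairs_w1), len(pairs_w2))
print(("staff", "very") in pairs_w1, ("staff", "very") in pairs_w2)
```

Widening the window from 1 to 2 adds pairs such as `("staff", "very")`, which are two positions apart and were previously out of reach.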

### `min_count`
Minimum frequency count of words. The model ignores words that do not satisfy the `min_count`. Extremely infrequent words are usually unimportant, so it's best to get rid of those. Unless your dataset is really tiny, this does not really affect the model.
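
The effect of a frequency threshold can be sketched with a plain `Counter`. The tiny `documents` list is invented for this example, and the filtering is only an illustration of the idea, not the library's vocabulary-building code.

```python
from collections import Counter

# Toy tokenized documents, made up for illustration.
documents = [
    ["clean", "room", "friendly", "staff"],
    ["friendly", "staff", "dirty", "room"],
    ["great", "staff"],
]
min_count = 2

# Count how often each token occurs across the whole corpus.
counts = Counter(token for doc in documents for token in doc)

# Keep only tokens that occur at least `min_count` times.
vocab = {word for word, count in counts.items() if count >= min_count}
filtered = [[token for token in doc if token in vocab] for doc in documents]

print(sorted(vocab))
print(filtered)
```

Here `clean`, `dirty` and `great` each occur only once, so they drop out of the vocabulary and out of the training sentences.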

### `workers`
The number of worker threads to use for training behind the scenes.


## When should you use Word2Vec?

There are many application scenarios for Word2Vec. Imagine that you need to build a sentiment lexicon. Training a Word2Vec model on large amounts of user reviews helps you achieve that, because you can look up the words most similar to known sentiment terms. You end up with a lexicon not just for sentiment, but for most words in the vocabulary. 

Beyond raw unstructured text data, you could also use Word2Vec for more structured data. For example, if you had tags for a million Stack Overflow questions and answers, you could find tags that are related to a given tag and recommend the related ones for exploration. You can do this by treating each set of co-occurring tags as a "sentence" and training a Word2Vec model on this data. Granted, you still need a large number of examples to make it work. 
