In [2]:
%matplotlib inline


How to Apply Doc2Vec to Reproduce the 'Paragraph Vector' paper
==============================================================

Shows how to reproduce the results of the "Distributed Representations of Sentences and Documents" paper by Le and Mikolov using Gensim.




In [3]:
import logging
logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', level=logging.INFO)

Introduction
------------

This guide shows you how to reproduce the results of the paper by `Le and
Mikolov 2014 <https://arxiv.org/pdf/1405.4053.pdf>`_ using Gensim. While the
entire paper is worth reading (it's only 9 pages), we will be focusing on
Section 3.2: "Beyond One Sentence - Sentiment Analysis with the IMDB
dataset".

This guide proceeds in the following steps:

#. Load the IMDB dataset
#. Train a variety of Doc2Vec models on the dataset
#. Evaluate the performance of each model using a logistic regression
#. Examine some of the results directly

When examining results, we will look for answers to the following questions:

#. Are inferred vectors close to the precalculated ones?
#. Do close documents seem more related than distant ones?
#. Do the word vectors show useful similarities?
#. Are the word vectors from this dataset any good at analogies?

Load corpus
-----------

Our data for the tutorial will be the `IMDB archive
<http://ai.stanford.edu/~amaas/data/sentiment/>`_.
If you're not familiar with this dataset, then here's a brief intro: it
contains 100,000 movie reviews.

Each review is a single line of text containing multiple sentences, for example:

```
One of the best movie-dramas I have ever seen. We do a lot of acting in the
church and this is one that can be used as a resource that highlights all the
good things that actors can do in their work. I highly recommend this one,
especially for those who have an interest in acting, as a "must see."
```

These reviews will be the **documents** that we will work with in this tutorial.
There are 100 thousand reviews in total:

#. 25k reviews for training (12.5k positive, 12.5k negative)
#. 25k reviews for testing (12.5k positive, 12.5k negative)
#. 50k unlabeled reviews

Out of 100k reviews, 50k have a label: either positive (the reviewer liked
the movie) or negative.
The remaining 50k are unlabeled.

Our first task will be to prepare the dataset.

More specifically, we will:

#. Download the tar.gz file (it's only 84MB, so this shouldn't take too long)
#. Unpack it and extract each movie review
#. Split the reviews into training and test datasets

First, let's define a convenient datatype for holding data for a single document:

* words: The text of the document, as a ``list`` of words.
* tags: Used to keep the index of the document in the entire dataset.
* split: one of ``train``\ , ``test`` or ``extra``. Determines how the document will be used (for training, testing, etc).
* sentiment: either 1 (positive), 0 (negative) or None (unlabeled document).

This data type is helpful for later evaluation and reporting.
In particular, the ``tags`` member will help us quickly and easily retrieve the vectors for a document from a model.




In [4]:
import collections

SentimentDocument = collections.namedtuple('SentimentDocument', 'words tags split sentiment')
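
As a quick illustration, a single document can be built by hand as sketched below. The review text, the index ``0`` and the label are made up for this example and are not taken from the dataset.

```python
import collections

SentimentDocument = collections.namedtuple('SentimentDocument', 'words tags split sentiment')

# Hypothetical document: the text and index are invented for illustration.
doc = SentimentDocument(
    words='I highly recommend this one'.split(),
    tags=[0],        # index of the document in the whole dataset
    split='train',   # one of 'train', 'test' or 'extra'
    sentiment=1.0,   # 1.0 = positive, 0.0 = negative, None = unlabeled
)

print(doc.words)
print(doc.tags[0])
```

Because the fields are named, later code can read ``doc.tags[0]`` or ``doc.sentiment`` instead of relying on tuple positions.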

We can now proceed with loading the corpus.



In [5]:
import io
import re
import tarfile
import os.path

import smart_open
import gensim.utils

def download_dataset(url='http://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz'):
    fname = url.split('/')[-1]

    if os.path.isfile(fname):
        return fname

    # Download the file to local storage first.
    # We can't read it on the fly because of
    # https://github.com/RaRe-Technologies/smart_open/issues/331
    with smart_open.open(url, "rb", ignore_ext=True) as fin:
        with smart_open.open(fname, 'wb', ignore_ext=True) as fout:
            while True:
                buf = fin.read(io.DEFAULT_BUFFER_SIZE)
                if not buf:
                    break
                fout.write(buf)

    return fname

def create_sentiment_document(name, text, index):
    _, split, sentiment_str, _ = name.split('/')
    sentiment = {'pos': 1.0, 'neg': 0.0, 'unsup': None}[sentiment_str]

    if sentiment is None:
        split = 'extra'

    tokens = gensim.utils.to_unicode(text).split()
    return SentimentDocument(tokens, [index], split, sentiment)

def extract_documents():
    fname = download_dataset()

    index = 0

    with tarfile.open(fname, mode='r:gz') as tar:
        for member in tar.getmembers():
            if re.match(r'aclImdb/(train|test)/(pos|neg|unsup)/\d+_\d+\.txt$', member.name):
                member_bytes = tar.extractfile(member).read()
                member_text = member_bytes.decode('utf-8', errors='replace')
                assert member_text.count('\n') == 0
                yield create_sentiment_document(member.name, member_text, index)
                index += 1

alldocs = list(extract_documents())
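
To make the path handling in ``create_sentiment_document`` concrete, here is a minimal sketch of how an archive member name maps to a split and a sentiment label. The member names below are hypothetical examples that follow the directory layout matched by the regular expression above, and ``parse_member_name`` is a helper introduced only for this illustration.

```python
# Hypothetical member names following the archive's directory layout.
names = [
    'aclImdb/train/pos/0_9.txt',
    'aclImdb/test/neg/1_3.txt',
    'aclImdb/train/unsup/2_0.txt',
]

labels = {'pos': 1.0, 'neg': 0.0, 'unsup': None}

def parse_member_name(name):
    """Return (split, sentiment) for an archive member name."""
    _, split, sentiment_str, _ = name.split('/')
    sentiment = labels[sentiment_str]
    if sentiment is None:
        split = 'extra'  # unlabeled reviews are set aside as 'extra'
    return split, sentiment

parsed = [parse_member_name(name) for name in names]
for name, result in zip(names, parsed):
    print(name, '->', result)
```

Note that unlabeled reviews live under ``train/unsup`` in the archive, which is why the split is overridden to ``extra`` whenever the sentiment is ``None``.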

Here's what a single document looks like:



In [6]:
print(alldocs[27])

SentimentDocument(words=['I', 'was', 'looking', 'forward', 'to', 'this', 'movie.', 'Trustworthy', 'actors,', 'interesting', 'plot.', 'Great', 'atmosphere', 'then', '?????', 'IF', 'you', 'are', 'going', 'to', 'attempt', 'something', 'that', 'is', 'meant', 'to', 'encapsulate', 'the', 'meaning', 'of', 'life.', 'First.', 'Know', 'it.', 'OK', 'I', 'did', 'not', 'expect', 'the', 'directors', 'or', 'writers', 'to', 'actually', 'know', 'the', 'meaning', 'but', 'I', 'thought', 'they', 'may', 'have', 'offered', 'crumbs', 'to', 'peck', 'at', 'and', 'treats', 'to', 'add', 'fuel', 'to', 'the', 'fire-Which!', 'they', 'almost', 'did.', 'Things', 'I', "didn't", 'get.', 'A', 'woman', 'wandering', 'around', 'in', 'dark', 'places', 'and', 'lonely', 'car', 'parks', 'alone-oblivious', 'to', 'the', 'consequences.', 'Great', 'riddles', 'that', 'fell', 'by', 'the', 'wayside.', 'The', 'promise', 'of', 'the', 'knowledge', 'therein', 'contained', 'by', 'the', 'original', 'so-called', 'criminal.', 'I', 'had', 'no

Extract our documents and split them into training and test sets:



In [9]:
train_docs = [doc for doc in alldocs if doc.split == 'train']
test_docs = [doc for doc in alldocs if doc.split == 'test']
print('%d docs: %d train-sentiment, %d test-sentiment' % (len(alldocs), len(train_docs), len(test_docs)))

100000 docs: 25000 train-sentiment, 25000 test-sentiment


Set-up Doc2Vec Training & Evaluation Models
-------------------------------------------
We approximate the experiment of Le & Mikolov `"Distributed Representations
of Sentences and Documents"
<http://cs.stanford.edu/~quocle/paragraph_vector.pdf>`_ with guidance from
Mikolov's `example go.sh
<https://groups.google.com/d/msg/word2vec-toolkit/Q49FIrNOQRo/J6KG8mUj45sJ>`_::

    ./word2vec -train ../alldata-id.txt -output vectors.txt -cbow 0 -size 100 -window 10 -negative 5 -hs 0 -sample 1e-4 -threads 40 -binary 0 -iter 20 -min-count 1 -sentence-vectors 1

We vary the following parameter choices:

* 100-dimensional vectors, as the 400-d vectors of the paper take a lot of
  memory and, in our tests of this task, don't seem to offer much benefit
* Similarly, frequent word subsampling seems to decrease sentiment-prediction
  accuracy, so it's left out
* ``cbow=0`` means skip-gram which is equivalent to the paper's 'PV-DBOW'
  mode, matched in gensim with ``dm=0``
* Added to that DBOW model are two DM models, one which averages context
  vectors (\ ``dm_mean``\ ) and one which concatenates them (\ ``dm_concat``\ ,
  resulting in a much larger, slower, more data-hungry model)
* A ``min_count=2`` saves quite a bit of model memory, discarding only words
  that appear in a single doc (and are thus no more expressive than the
  unique-to-each doc vectors themselves)
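
The differences listed above can be checked mechanically. The sketch below parses the ``go.sh`` command line and translates its flags into Doc2Vec keyword names. The flag-to-keyword table is our assumed correspondence, not something taken from either tool's documentation, and ``tutorial_kwargs`` simply restates the values used in this guide.

```python
import shlex

command = ('./word2vec -train ../alldata-id.txt -output vectors.txt -cbow 0 -size 100 '
           '-window 10 -negative 5 -hs 0 -sample 1e-4 -threads 40 -binary 0 -iter 20 '
           '-min-count 1 -sentence-vectors 1')

# Flags and values alternate after the program name: -flag value -flag value ...
tokens = shlex.split(command)[1:]
flags = dict(zip(tokens[0::2], tokens[1::2]))

# Assumed correspondence between word2vec flags and Doc2Vec keyword names.
flag_to_kwarg = {
    '-size': 'vector_size',
    '-window': 'window',
    '-negative': 'negative',
    '-hs': 'hs',
    '-sample': 'sample',
    '-iter': 'epochs',
    '-min-count': 'min_count',
}

def to_number(text):
    """Parse an integer if possible, otherwise a float such as '1e-4'."""
    try:
        return int(text)
    except ValueError:
        return float(text)

script_kwargs = {kwarg: to_number(flags[flag]) for flag, kwarg in flag_to_kwarg.items()}
if flags['-cbow'] == '0':
    script_kwargs['dm'] = 0  # skip-gram style training, i.e. PV-DBOW

# Values used in this guide, compared against the script's settings.
tutorial_kwargs = dict(vector_size=100, epochs=20, min_count=2, sample=0, negative=5, hs=0)
changed = {key: (script_kwargs[key], value)
           for key, value in tutorial_kwargs.items() if script_kwargs[key] != value}
print(changed)
```

Only ``min_count`` and ``sample`` differ from the script, matching the subsampling and vocabulary-pruning choices described in the list.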




In [10]:
import multiprocessing
from collections import OrderedDict

import gensim.models.doc2vec
#assert gensim.models.doc2vec.FAST_VERSION > -1, "This will be painfully slow otherwise"

from gensim.models.doc2vec import Doc2Vec

common_kwargs = dict(
    vector_size=100, epochs=20, min_count=2,
    sample=0, workers=multiprocessing.cpu_count(), negative=5, hs=0,
)

simple_models = [
    # PV-DBOW plain
    Doc2Vec(dm=0, **common_kwargs),
    # PV-DM w/ default averaging; a higher starting alpha may improve CBOW/PV-DM modes
    Doc2Vec(dm=1, window=10, alpha=0.05, comment='alpha=0.05', **common_kwargs),
    # PV-DM w/ concatenation - big, slow, experimental mode
    # window=5 (both sides) approximates paper's apparent 10-word total window size
    Doc2Vec(dm=1, dm_concat=1, window=5, **common_kwargs),
]

for model in simple_models:
    model.build_vocab(alldocs)
    print("%s vocabulary scanned & state initialized" % model)

models_by_name = OrderedDict((str(model), model) for model in simple_models)

2020-09-01 08:50:36,954 : INFO : using concatenative 1100-dimensional layer1
2020-09-01 08:50:36,954 : INFO : collecting all words and their counts
2020-09-01 08:50:36,955 : INFO : PROGRESS: at example #0, processed 0 words (0/s), 0 word types, 0 tags
2020-09-01 08:50:37,264 : INFO : PROGRESS: at example #10000, processed 2292381 words (7426204/s), 150816 word types, 10000 tags
2020-09-01 08:50:37,594 : INFO : PROGRESS: at example #20000, processed 4573645 words (6928108/s), 238497 word types, 20000 tags
2020-09-01 08:50:37,930 : INFO : PROGRESS: at example #30000, processed 6865575 words (6828415/s), 312348 word types, 30000 tags
2020-09-01 08:50:38,287 : INFO : PROGRESS: at example #40000, processed 9190019 words (6515924/s), 377231 word types, 40000 tags
2020-09-01 08:50:38,650 : INFO : PROGRESS: at example #50000, processed 11557847 words (6545275/s), 438729 word types, 50000 tags
2020-09-01 08:50:39,010 : INFO : PROGRESS: at example #60000, processed 13899883 words (6511465/s), 49

Doc2Vec(dbow,d100,n5,mc2,t12) vocabulary scanned & state initialized


2020-09-01 08:51:30,304 : INFO : PROGRESS: at example #10000, processed 2292381 words (7922493/s), 150816 word types, 10000 tags
2020-09-01 08:51:30,612 : INFO : PROGRESS: at example #20000, processed 4573645 words (7418203/s), 238497 word types, 20000 tags
2020-09-01 08:51:30,928 : INFO : PROGRESS: at example #30000, processed 6865575 words (7275484/s), 312348 word types, 30000 tags
2020-09-01 08:51:31,263 : INFO : PROGRESS: at example #40000, processed 9190019 words (6936175/s), 377231 word types, 40000 tags
2020-09-01 08:51:31,604 : INFO : PROGRESS: at example #50000, processed 11557847 words (6964074/s), 438729 word types, 50000 tags
2020-09-01 08:51:31,943 : INFO : PROGRESS: at example #60000, processed 13899883 words (6915696/s), 493913 word types, 60000 tags
2020-09-01 08:51:32,290 : INFO : PROGRESS: at example #70000, processed 16270094 words (6845134/s), 548474 word types, 70000 tags
2020-09-01 08:51:32,632 : INFO : PROGRESS: at example #80000, processed 18598876 words (682256

Doc2Vec("alpha=0.05",dm/m,d100,n5,w10,mc2,t12) vocabulary scanned & state initialized


2020-09-01 08:52:23,517 : INFO : PROGRESS: at example #10000, processed 2292381 words (7933630/s), 150816 word types, 10000 tags
2020-09-01 08:52:23,820 : INFO : PROGRESS: at example #20000, processed 4573645 words (7553316/s), 238497 word types, 20000 tags
2020-09-01 08:52:24,130 : INFO : PROGRESS: at example #30000, processed 6865575 words (7416629/s), 312348 word types, 30000 tags
2020-09-01 08:52:24,460 : INFO : PROGRESS: at example #40000, processed 9190019 words (7045852/s), 377231 word types, 40000 tags
2020-09-01 08:52:24,792 : INFO : PROGRESS: at example #50000, processed 11557847 words (7152896/s), 438729 word types, 50000 tags
2020-09-01 08:52:25,126 : INFO : PROGRESS: at example #60000, processed 13899883 words (7027796/s), 493913 word types, 60000 tags
2020-09-01 08:52:25,467 : INFO : PROGRESS: at example #70000, processed 16270094 words (6953210/s), 548474 word types, 70000 tags
2020-09-01 08:52:25,804 : INFO : PROGRESS: at example #80000, processed 18598876 words (692985

Doc2Vec(dm/c,d100,n5,w5,mc2,t12) vocabulary scanned & state initialized


Le and Mikolov note that combining a paragraph vector from Distributed Bag of
Words (DBOW) and Distributed Memory (DM) improves performance. We will
follow, pairing the models together for evaluation. Here, we concatenate the
paragraph vectors obtained from each model with the help of a thin wrapper
class included in a gensim test module. (Note that this is a separate, later
concatenation of output vectors, distinct from the input-window concatenation
enabled by the ``dm_concat=1`` mode above.)
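
Conceptually, pairing two models means looking up the same document tag in each model and joining the two vectors end to end. The sketch below illustrates the idea with plain Python lists. ``ConcatenatedLookup`` is a hypothetical stand-in written for this example, not gensim's wrapper class, and the vectors are made up.

```python
class ConcatenatedLookup:
    """Hypothetical stand-in: look up a tag in several vector tables and join the results."""

    def __init__(self, tables):
        self.tables = tables

    def __getitem__(self, tag):
        joined = []
        for table in self.tables:
            joined.extend(table[tag])
        return joined

# Made-up 3-d and 2-d "paragraph vectors" for two documents.
dbow_vectors = {0: [0.1, 0.2, 0.3], 1: [0.4, 0.5, 0.6]}
dm_vectors = {0: [1.0, 2.0], 1: [3.0, 4.0]}

combined = ConcatenatedLookup([dbow_vectors, dm_vectors])
print(combined[0])
print(len(combined[1]))
```

With two 100-dimensional models, the same idea yields one 200-dimensional vector per document.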




In [11]:
from gensim.test.test_doc2vec import ConcatenatedDoc2Vec
models_by_name['dbow+dmm'] = ConcatenatedDoc2Vec([simple_models[0], simple_models[1]])
models_by_name['dbow+dmc'] = ConcatenatedDoc2Vec([simple_models[0], simple_models[2]])

2020-09-01 08:53:15,576 : INFO : adding document #0 to Dictionary(0 unique tokens: [])
2020-09-01 08:53:15,577 : INFO : built Dictionary(12 unique tokens: ['computer', 'human', 'interface', 'response', 'survey']...) from 9 documents (total 29 corpus positions)


Predictive Evaluation Methods
-----------------------------

Given a document, our ``Doc2Vec`` models output a vector representation of the document.
How useful is a particular model?
In the case of sentiment analysis, we want the output vector to reflect the sentiment of the input document.
So, in vector space, positive documents should be distant from negative documents.

We train a logistic regression from the training set:

  - regressors (inputs): document vectors from the Doc2Vec model
  - target (outputs): sentiment labels

So, this logistic regression will be able to predict sentiment given a document vector.

Next, we test our logistic regression on the test set, and measure the rate of errors (incorrect predictions).
If the document vectors from the Doc2Vec model reflect the actual sentiment well, the error rate will be low.

Therefore, the error rate of the logistic regression is an indication of *how well* the given Doc2Vec model represents documents as vectors.
We can then compare different ``Doc2Vec`` models by looking at their error rates.
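
The error-rate calculation itself is simple. Here is a minimal sketch that assumes the classifier returns a probability of positive sentiment for each test document. The probabilities and labels are made-up values, and ``error_rate`` is a helper defined only for this illustration.

```python
def error_rate(predicted_probabilities, true_labels):
    """Fraction of documents whose rounded prediction differs from the label."""
    predictions = [round(p) for p in predicted_probabilities]
    errors = sum(pred != label for pred, label in zip(predictions, true_labels))
    return errors / len(true_labels)

# Made-up predictions for four test documents.
probs = [0.9, 0.2, 0.6, 0.4]
labels = [1.0, 0.0, 0.0, 1.0]

rate = error_rate(probs, labels)
print(rate)
```

Here the third and fourth documents are misclassified, so two of the four predictions are wrong.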




In [12]:
import numpy as np
import statsmodels.api as sm
from random import sample

def logistic_predictor_from_data(train_targets, train_regressors):
    """Fit a statsmodels logistic predictor on the supplied data."""
    logit = sm.Logit(train_targets, train_regressors)
    predictor = logit.fit(disp=0)
    # print(predictor.summary())
    return predictor

def error_rate_for_model(test_model, train_set, test_set):
    """Report the error rate on test_set sentiments, using the supplied model and train_set."""

    train_targets = [doc.sentiment for doc in train_set]
    train_regressors = [test_model.docvecs[doc.tags[0]] for doc in train_set]
    train_regressors = sm.add_constant(train_regressors)
    predictor = logistic_predictor_from_data(train_targets, train_regressors)

    test_regressors = [test_model.docvecs[doc.tags[0]] for doc in test_set]
    test_regressors = sm.add_constant(test_regressors)

    # Predict & evaluate
    test_predictions = predictor.predict(test_regressors)
    corrects = sum(np.rint(test_predictions) == [doc.sentiment for doc in test_set])
    errors = len(test_predictions) - corrects
    error_rate = float(errors) / len(test_predictions)
    return (error_rate, errors, len(test_predictions), predictor)

Bulk Training & Per-Model Evaluation
------------------------------------

Note that doc-vector training occurs on *all* documents of the dataset,
including the train, test and unlabeled (``extra``) docs. Because the native
document order has similar-sentiment documents in large clumps, which is
suboptimal for training, we work with a once-shuffled copy of the dataset.

We evaluate each model's sentiment-predictive power by its error rate on the
test set.

(On a 4-core 2.6 GHz Intel Core i7, these 20 passes of training and evaluating
the 3 main models take about an hour.)
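
The shuffling step can be sketched on a toy corpus as follows. The ten stand-in "documents" and the fixed seed are assumptions made for repeatability. The tutorial code itself shuffles the real corpus without a seed.

```python
import random

# Stand-in corpus: labels arrive in large same-sentiment clumps.
docs = ['pos'] * 5 + ['neg'] * 5

shuffled = docs[:]                  # shallow copy; the original order is preserved
random.Random(0).shuffle(shuffled)  # fixed seed is an assumption, for repeatability

print(docs)
print(shuffled)
```

Shuffling a copy keeps the original list intact, so document indices stored in ``tags`` still refer to the same reviews.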




In [13]:
from collections import defaultdict
error_rates = defaultdict(lambda: 1.0)  # To selectively print only best errors achieved
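
The default value of 1.0 (every prediction wrong) makes it easy to keep only the best result per model. The sketch below shows one way this could be used. The ``record`` helper, the model name and the error values are hypothetical and do not appear in the tutorial code.

```python
from collections import defaultdict

best_errors = defaultdict(lambda: 1.0)  # unseen models start at the worst possible rate

def record(name, err_rate):
    """Keep the lowest error rate seen so far; return True if it improved."""
    if err_rate < best_errors[name]:
        best_errors[name] = err_rate
        return True
    return False

# Made-up error rates for a hypothetical model.
results = [record('model-a', 0.12), record('model-a', 0.15), record('model-a', 0.10)]
print(results)
print(best_errors['model-a'])
```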

In [14]:
from random import shuffle
shuffled_alldocs = alldocs[:]
shuffle(shuffled_alldocs)

for model in simple_models:
    print("Training %s" % model)
    model.train(shuffled_alldocs, total_examples=len(shuffled_alldocs), epochs=model.epochs)

    print("\nEvaluating %s" % model)
    err_rate, err_count, test_count, predictor = error_rate_for_model(model, train_docs, test_docs)
    error_rates[str(model)] = err_rate
    print("\n%f %s\n" % (err_rate, model))

for model in [models_by_name['dbow+dmm'], models_by_name['dbow+dmc']]:
    print("\nEvaluating %s" % model)
    err_rate, err_count, test_count, predictor = error_rate_for_model(model, train_docs, test_docs)
    error_rates[str(model)] = err_rate
    print("\n%f %s\n" % (err_rate, model))

2020-09-01 08:53:16,050 : INFO : training model with 12 workers on 265408 vocabulary and 100 features, using sg=1 hs=0 sample=0 negative=5 window=5


Training Doc2Vec(dbow,d100,n5,mc2,t12)


2020-09-01 08:53:17,060 : INFO : EPOCH 1 - PROGRESS: at 6.68% examples, 1532881 words/s, in_qsize 23, out_qsize 0
2020-09-01 08:53:18,062 : INFO : EPOCH 1 - PROGRESS: at 13.73% examples, 1573074 words/s, in_qsize 22, out_qsize 1
2020-09-01 08:53:19,068 : INFO : EPOCH 1 - PROGRESS: at 20.83% examples, 1585205 words/s, in_qsize 23, out_qsize 0
2020-09-01 08:53:20,078 : INFO : EPOCH 1 - PROGRESS: at 27.96% examples, 1596417 words/s, in_qsize 23, out_qsize 0
2020-09-01 08:53:21,081 : INFO : EPOCH 1 - PROGRESS: at 34.98% examples, 1603403 words/s, in_qsize 23, out_qsize 0
2020-09-01 08:53:22,094 : INFO : EPOCH 1 - PROGRESS: at 42.06% examples, 1605600 words/s, in_qsize 23, out_qsize 0
2020-09-01 08:53:23,094 : INFO : EPOCH 1 - PROGRESS: at 49.30% examples, 1610257 words/s, in_qsize 23, out_qsize 0
2020-09-01 08:53:24,102 : INFO : EPOCH 1 - PROGRESS: at 56.27% examples, 1608292 words/s, in_qsize 22, out_qsize 1
2020-09-01 08:53:25,115 : INFO : EPOCH 1 - PROGRESS: at 63.38% examples, 1610856 

2020-09-01 08:53:58,573 : INFO : worker thread finished; awaiting finish of 1 more threads
2020-09-01 08:53:58,573 : INFO : worker thread finished; awaiting finish of 0 more threads
2020-09-01 08:53:58,574 : INFO : EPOCH - 3 : training on 23279529 raw words (22951015 effective words) took 14.2s, 1621323 effective words/s
2020-09-01 08:53:59,583 : INFO : EPOCH 4 - PROGRESS: at 6.81% examples, 1561446 words/s, in_qsize 24, out_qsize 0
2020-09-01 08:54:00,588 : INFO : EPOCH 4 - PROGRESS: at 13.89% examples, 1590107 words/s, in_qsize 21, out_qsize 2
2020-09-01 08:54:01,594 : INFO : EPOCH 4 - PROGRESS: at 21.05% examples, 1599730 words/s, in_qsize 23, out_qsize 0
2020-09-01 08:54:02,600 : INFO : EPOCH 4 - PROGRESS: at 28.00% examples, 1599144 words/s, in_qsize 23, out_qsize 0
2020-09-01 08:54:03,605 : INFO : EPOCH 4 - PROGRESS: at 35.03% examples, 1604717 words/s, in_qsize 23, out_qsize 0
2020-09-01 08:54:04,614 : INFO : EPOCH 4 - PROGRESS: at 42.06% examples, 1606372 words/s, in_qsize 23, 

2020-09-01 08:54:41,097 : INFO : worker thread finished; awaiting finish of 4 more threads
2020-09-01 08:54:41,100 : INFO : worker thread finished; awaiting finish of 3 more threads
2020-09-01 08:54:41,104 : INFO : worker thread finished; awaiting finish of 2 more threads
2020-09-01 08:54:41,105 : INFO : worker thread finished; awaiting finish of 1 more threads
2020-09-01 08:54:41,106 : INFO : worker thread finished; awaiting finish of 0 more threads
2020-09-01 08:54:41,107 : INFO : EPOCH - 6 : training on 23279529 raw words (22951015 effective words) took 14.2s, 1620222 effective words/s
2020-09-01 08:54:42,117 : INFO : EPOCH 7 - PROGRESS: at 6.81% examples, 1558879 words/s, in_qsize 24, out_qsize 0
2020-09-01 08:54:43,120 : INFO : EPOCH 7 - PROGRESS: at 13.93% examples, 1595159 words/s, in_qsize 23, out_qsize 0
2020-09-01 08:54:44,127 : INFO : EPOCH 7 - PROGRESS: at 21.10% examples, 1602652 words/s, in_qsize 22, out_qsize 1
2020-09-01 08:54:45,132 : INFO : EPOCH 7 - PROGRESS: at 28.0

2020-09-01 08:55:23,601 : INFO : worker thread finished; awaiting finish of 7 more threads
2020-09-01 08:55:23,603 : INFO : worker thread finished; awaiting finish of 6 more threads
2020-09-01 08:55:23,604 : INFO : worker thread finished; awaiting finish of 5 more threads
2020-09-01 08:55:23,612 : INFO : worker thread finished; awaiting finish of 4 more threads
2020-09-01 08:55:23,613 : INFO : worker thread finished; awaiting finish of 3 more threads
2020-09-01 08:55:23,615 : INFO : worker thread finished; awaiting finish of 2 more threads
2020-09-01 08:55:23,617 : INFO : worker thread finished; awaiting finish of 1 more threads
2020-09-01 08:55:23,618 : INFO : worker thread finished; awaiting finish of 0 more threads
2020-09-01 08:55:23,618 : INFO : EPOCH - 9 : training on 23279529 raw words (22951015 effective words) took 14.2s, 1620153 effective words/s
2020-09-01 08:55:24,627 : INFO : EPOCH 10 - PROGRESS: at 6.81% examples, 1562065 words/s, in_qsize 23, out_qsize 0
2020-09-01 08:55

2020-09-01 08:56:06,089 : INFO : worker thread finished; awaiting finish of 11 more threads
2020-09-01 08:56:06,093 : INFO : worker thread finished; awaiting finish of 10 more threads
2020-09-01 08:56:06,099 : INFO : worker thread finished; awaiting finish of 9 more threads
2020-09-01 08:56:06,103 : INFO : worker thread finished; awaiting finish of 8 more threads
2020-09-01 08:56:06,105 : INFO : worker thread finished; awaiting finish of 7 more threads
2020-09-01 08:56:06,106 : INFO : worker thread finished; awaiting finish of 6 more threads
2020-09-01 08:56:06,108 : INFO : worker thread finished; awaiting finish of 5 more threads
2020-09-01 08:56:06,114 : INFO : worker thread finished; awaiting finish of 4 more threads
2020-09-01 08:56:06,115 : INFO : worker thread finished; awaiting finish of 3 more threads
2020-09-01 08:56:06,116 : INFO : worker thread finished; awaiting finish of 2 more threads
2020-09-01 08:56:06,118 : INFO : worker thread finished; awaiting finish of 1 more threa

2020-09-01 08:56:46,509 : INFO : EPOCH 15 - PROGRESS: at 84.78% examples, 1617304 words/s, in_qsize 23, out_qsize 0
2020-09-01 08:56:47,515 : INFO : EPOCH 15 - PROGRESS: at 91.98% examples, 1618467 words/s, in_qsize 23, out_qsize 0
2020-09-01 08:56:48,519 : INFO : EPOCH 15 - PROGRESS: at 99.16% examples, 1618758 words/s, in_qsize 21, out_qsize 0
2020-09-01 08:56:48,581 : INFO : worker thread finished; awaiting finish of 11 more threads
2020-09-01 08:56:48,586 : INFO : worker thread finished; awaiting finish of 10 more threads
2020-09-01 08:56:48,590 : INFO : worker thread finished; awaiting finish of 9 more threads
2020-09-01 08:56:48,593 : INFO : worker thread finished; awaiting finish of 8 more threads
2020-09-01 08:56:48,595 : INFO : worker thread finished; awaiting finish of 7 more threads
2020-09-01 08:56:48,597 : INFO : worker thread finished; awaiting finish of 6 more threads
2020-09-01 08:56:48,600 : INFO : worker thread finished; awaiting finish of 5 more threads
2020-09-01 08

2020-09-01 08:57:26,061 : INFO : EPOCH 18 - PROGRESS: at 63.85% examples, 1617812 words/s, in_qsize 22, out_qsize 1
2020-09-01 08:57:27,070 : INFO : EPOCH 18 - PROGRESS: at 70.93% examples, 1617457 words/s, in_qsize 23, out_qsize 0
2020-09-01 08:57:28,071 : INFO : EPOCH 18 - PROGRESS: at 78.08% examples, 1619276 words/s, in_qsize 23, out_qsize 0
2020-09-01 08:57:29,083 : INFO : EPOCH 18 - PROGRESS: at 85.30% examples, 1618035 words/s, in_qsize 23, out_qsize 0
2020-09-01 08:57:30,089 : INFO : EPOCH 18 - PROGRESS: at 92.52% examples, 1619260 words/s, in_qsize 23, out_qsize 0
2020-09-01 08:57:31,083 : INFO : worker thread finished; awaiting finish of 11 more threads
2020-09-01 08:57:31,094 : INFO : EPOCH 18 - PROGRESS: at 99.61% examples, 1618492 words/s, in_qsize 10, out_qsize 1
2020-09-01 08:57:31,098 : INFO : worker thread finished; awaiting finish of 10 more threads
2020-09-01 08:57:31,100 : INFO : worker thread finished; awaiting finish of 9 more threads
2020-09-01 08:57:31,103 : INF


Evaluating Doc2Vec(dbow,d100,n5,mc2,t12)


2020-09-01 08:57:59,871 : INFO : training model with 12 workers on 265408 vocabulary and 100 features, using sg=0 hs=0 sample=0 negative=5 window=10



0.103960 Doc2Vec(dbow,d100,n5,mc2,t12)

Training Doc2Vec("alpha=0.05",dm/m,d100,n5,w10,mc2,t12)


2020-09-01 08:58:00,887 : INFO : EPOCH 1 - PROGRESS: at 3.77% examples, 859452 words/s, in_qsize 23, out_qsize 0
2020-09-01 08:58:01,889 : INFO : EPOCH 1 - PROGRESS: at 7.91% examples, 909264 words/s, in_qsize 23, out_qsize 0
2020-09-01 08:58:02,904 : INFO : EPOCH 1 - PROGRESS: at 12.16% examples, 926937 words/s, in_qsize 23, out_qsize 0
2020-09-01 08:58:03,909 : INFO : EPOCH 1 - PROGRESS: at 16.35% examples, 936016 words/s, in_qsize 23, out_qsize 0
2020-09-01 08:58:04,909 : INFO : EPOCH 1 - PROGRESS: at 20.65% examples, 942774 words/s, in_qsize 23, out_qsize 0
2020-09-01 08:58:05,931 : INFO : EPOCH 1 - PROGRESS: at 24.82% examples, 941961 words/s, in_qsize 23, out_qsize 0
2020-09-01 08:58:06,935 : INFO : EPOCH 1 - PROGRESS: at 28.96% examples, 943908 words/s, in_qsize 23, out_qsize 0
2020-09-01 08:58:07,936 : INFO : EPOCH 1 - PROGRESS: at 33.18% examples, 947959 words/s, in_qsize 23, out_qsize 0
2020-09-01 08:58:08,937 : INFO : EPOCH 1 - PROGRESS: at 37.28% examples, 948923 words/s, i

2020-09-01 08:58:53,810 : INFO : EPOCH 3 - PROGRESS: at 24.99% examples, 950813 words/s, in_qsize 23, out_qsize 0
2020-09-01 08:58:54,822 : INFO : EPOCH 3 - PROGRESS: at 29.27% examples, 954498 words/s, in_qsize 23, out_qsize 0
2020-09-01 08:58:55,845 : INFO : EPOCH 3 - PROGRESS: at 33.49% examples, 955520 words/s, in_qsize 23, out_qsize 0
2020-09-01 08:58:56,854 : INFO : EPOCH 3 - PROGRESS: at 37.72% examples, 958245 words/s, in_qsize 23, out_qsize 0
2020-09-01 08:58:57,867 : INFO : EPOCH 3 - PROGRESS: at 42.02% examples, 960108 words/s, in_qsize 23, out_qsize 0
2020-09-01 08:58:58,898 : INFO : EPOCH 3 - PROGRESS: at 46.19% examples, 955741 words/s, in_qsize 23, out_qsize 0
2020-09-01 08:58:59,927 : INFO : EPOCH 3 - PROGRESS: at 50.55% examples, 956906 words/s, in_qsize 22, out_qsize 1
2020-09-01 08:59:00,934 : INFO : EPOCH 3 - PROGRESS: at 54.77% examples, 957990 words/s, in_qsize 23, out_qsize 0
2020-09-01 08:59:01,938 : INFO : EPOCH 3 - PROGRESS: at 58.98% examples, 958986 words/s,

2020-09-01 08:59:46,595 : INFO : EPOCH 5 - PROGRESS: at 46.10% examples, 958709 words/s, in_qsize 23, out_qsize 0
2020-09-01 08:59:47,605 : INFO : EPOCH 5 - PROGRESS: at 50.30% examples, 957959 words/s, in_qsize 24, out_qsize 0
2020-09-01 08:59:48,624 : INFO : EPOCH 5 - PROGRESS: at 54.53% examples, 958072 words/s, in_qsize 23, out_qsize 0
2020-09-01 08:59:49,643 : INFO : EPOCH 5 - PROGRESS: at 58.76% examples, 958089 words/s, in_qsize 23, out_qsize 0
2020-09-01 08:59:50,646 : INFO : EPOCH 5 - PROGRESS: at 62.99% examples, 959760 words/s, in_qsize 23, out_qsize 0
2020-09-01 08:59:51,662 : INFO : EPOCH 5 - PROGRESS: at 67.19% examples, 959964 words/s, in_qsize 23, out_qsize 0
2020-09-01 08:59:52,663 : INFO : EPOCH 5 - PROGRESS: at 71.38% examples, 959246 words/s, in_qsize 23, out_qsize 0
2020-09-01 08:59:53,668 : INFO : EPOCH 5 - PROGRESS: at 75.55% examples, 959498 words/s, in_qsize 23, out_qsize 0
2020-09-01 08:59:54,675 : INFO : EPOCH 5 - PROGRESS: at 79.82% examples, 959464 words/s,

2020-09-01 09:00:39,428 : INFO : EPOCH 7 - PROGRESS: at 67.27% examples, 960704 words/s, in_qsize 23, out_qsize 0
2020-09-01 09:00:40,430 : INFO : EPOCH 7 - PROGRESS: at 71.51% examples, 960413 words/s, in_qsize 23, out_qsize 0
2020-09-01 09:00:41,432 : INFO : EPOCH 7 - PROGRESS: at 75.69% examples, 960801 words/s, in_qsize 23, out_qsize 0
2020-09-01 09:00:42,440 : INFO : EPOCH 7 - PROGRESS: at 79.90% examples, 960147 words/s, in_qsize 23, out_qsize 0
2020-09-01 09:00:43,442 : INFO : EPOCH 7 - PROGRESS: at 84.16% examples, 959997 words/s, in_qsize 23, out_qsize 0
2020-09-01 09:00:44,444 : INFO : EPOCH 7 - PROGRESS: at 88.55% examples, 962050 words/s, in_qsize 23, out_qsize 0
2020-09-01 09:00:45,446 : INFO : EPOCH 7 - PROGRESS: at 92.64% examples, 960031 words/s, in_qsize 23, out_qsize 0
2020-09-01 09:00:46,456 : INFO : EPOCH 7 - PROGRESS: at 96.93% examples, 960056 words/s, in_qsize 23, out_qsize 0
2020-09-01 09:00:47,084 : INFO : worker thread finished; awaiting finish of 11 more thre

2020-09-01 09:01:32,232 : INFO : EPOCH 9 - PROGRESS: at 88.58% examples, 960193 words/s, in_qsize 23, out_qsize 0
2020-09-01 09:01:33,235 : INFO : EPOCH 9 - PROGRESS: at 93.06% examples, 962148 words/s, in_qsize 23, out_qsize 0
2020-09-01 09:01:34,236 : INFO : EPOCH 9 - PROGRESS: at 97.23% examples, 961178 words/s, in_qsize 23, out_qsize 0
2020-09-01 09:01:34,793 : INFO : worker thread finished; awaiting finish of 11 more threads
2020-09-01 09:01:34,804 : INFO : worker thread finished; awaiting finish of 10 more threads
2020-09-01 09:01:34,816 : INFO : worker thread finished; awaiting finish of 9 more threads
2020-09-01 09:01:34,817 : INFO : worker thread finished; awaiting finish of 8 more threads
2020-09-01 09:01:34,824 : INFO : worker thread finished; awaiting finish of 7 more threads
2020-09-01 09:01:34,827 : INFO : worker thread finished; awaiting finish of 6 more threads
2020-09-01 09:01:34,831 : INFO : worker thread finished; awaiting finish of 5 more threads
2020-09-01 09:01:34

2020-09-01 09:02:22,494 : INFO : worker thread finished; awaiting finish of 9 more threads
2020-09-01 09:02:22,495 : INFO : worker thread finished; awaiting finish of 8 more threads
2020-09-01 09:02:22,496 : INFO : worker thread finished; awaiting finish of 7 more threads
2020-09-01 09:02:22,506 : INFO : worker thread finished; awaiting finish of 6 more threads
2020-09-01 09:02:22,508 : INFO : worker thread finished; awaiting finish of 5 more threads
2020-09-01 09:02:22,515 : INFO : worker thread finished; awaiting finish of 4 more threads
2020-09-01 09:02:22,520 : INFO : worker thread finished; awaiting finish of 3 more threads
2020-09-01 09:02:22,522 : INFO : worker thread finished; awaiting finish of 2 more threads
2020-09-01 09:02:22,524 : INFO : worker thread finished; awaiting finish of 1 more threads
2020-09-01 09:02:22,526 : INFO : worker thread finished; awaiting finish of 0 more threads
2020-09-01 09:02:22,526 : INFO : EPOCH - 11 : training on 23279529 raw words (22951015 eff

2020-09-01 09:03:10,215 : INFO : worker thread finished; awaiting finish of 4 more threads
2020-09-01 09:03:10,220 : INFO : worker thread finished; awaiting finish of 3 more threads
2020-09-01 09:03:10,223 : INFO : worker thread finished; awaiting finish of 2 more threads
2020-09-01 09:03:10,224 : INFO : worker thread finished; awaiting finish of 1 more threads
2020-09-01 09:03:10,226 : INFO : worker thread finished; awaiting finish of 0 more threads
2020-09-01 09:03:10,226 : INFO : EPOCH - 13 : training on 23279529 raw words (22951015 effective words) took 23.9s, 961827 effective words/s
2020-09-01 09:03:11,240 : INFO : EPOCH 14 - PROGRESS: at 4.11% examples, 930635 words/s, in_qsize 23, out_qsize 0
2020-09-01 09:03:12,244 : INFO : EPOCH 14 - PROGRESS: at 8.28% examples, 948003 words/s, in_qsize 23, out_qsize 0
2020-09-01 09:03:13,252 : INFO : EPOCH 14 - PROGRESS: at 12.39% examples, 942559 words/s, in_qsize 23, out_qsize 0
2020-09-01 09:03:14,270 : INFO : EPOCH 14 - PROGRESS: at 16.6

2020-09-01 09:03:57,898 : INFO : EPOCH - 15 : training on 23279529 raw words (22951015 effective words) took 23.8s, 963168 effective words/s
2020-09-01 09:03:58,903 : INFO : EPOCH 16 - PROGRESS: at 4.07% examples, 929312 words/s, in_qsize 23, out_qsize 0
2020-09-01 09:03:59,908 : INFO : EPOCH 16 - PROGRESS: at 8.20% examples, 942199 words/s, in_qsize 23, out_qsize 0
2020-09-01 09:04:00,917 : INFO : EPOCH 16 - PROGRESS: at 12.39% examples, 944772 words/s, in_qsize 23, out_qsize 0
2020-09-01 09:04:01,941 : INFO : EPOCH 16 - PROGRESS: at 16.61% examples, 947150 words/s, in_qsize 23, out_qsize 0
2020-09-01 09:04:02,953 : INFO : EPOCH 16 - PROGRESS: at 20.83% examples, 945531 words/s, in_qsize 23, out_qsize 0
2020-09-01 09:04:03,954 : INFO : EPOCH 16 - PROGRESS: at 25.07% examples, 950833 words/s, in_qsize 23, out_qsize 0
2020-09-01 09:04:04,967 : INFO : EPOCH 16 - PROGRESS: at 29.31% examples, 953025 words/s, in_qsize 23, out_qsize 0
2020-09-01 09:04:05,973 : INFO : EPOCH 16 - PROGRESS: at

2020-09-01 09:04:49,581 : INFO : EPOCH 18 - PROGRESS: at 16.56% examples, 950059 words/s, in_qsize 23, out_qsize 0
2020-09-01 09:04:50,612 : INFO : EPOCH 18 - PROGRESS: at 20.83% examples, 946345 words/s, in_qsize 23, out_qsize 0
2020-09-01 09:04:51,617 : INFO : EPOCH 18 - PROGRESS: at 25.17% examples, 954005 words/s, in_qsize 23, out_qsize 0
2020-09-01 09:04:52,640 : INFO : EPOCH 18 - PROGRESS: at 29.41% examples, 954509 words/s, in_qsize 23, out_qsize 0
2020-09-01 09:04:53,643 : INFO : EPOCH 18 - PROGRESS: at 33.58% examples, 956582 words/s, in_qsize 23, out_qsize 0
2020-09-01 09:04:54,654 : INFO : EPOCH 18 - PROGRESS: at 37.79% examples, 958993 words/s, in_qsize 23, out_qsize 0
2020-09-01 09:04:55,666 : INFO : EPOCH 18 - PROGRESS: at 42.02% examples, 958926 words/s, in_qsize 24, out_qsize 0
2020-09-01 09:04:56,677 : INFO : EPOCH 18 - PROGRESS: at 46.19% examples, 956450 words/s, in_qsize 23, out_qsize 0
2020-09-01 09:04:57,709 : INFO : EPOCH 18 - PROGRESS: at 50.55% examples, 957242

2020-09-01 09:05:41,354 : INFO : EPOCH 20 - PROGRESS: at 33.49% examples, 957057 words/s, in_qsize 23, out_qsize 0
2020-09-01 09:05:42,360 : INFO : EPOCH 20 - PROGRESS: at 37.69% examples, 958828 words/s, in_qsize 23, out_qsize 0
2020-09-01 09:05:43,360 : INFO : EPOCH 20 - PROGRESS: at 41.94% examples, 960865 words/s, in_qsize 23, out_qsize 0
2020-09-01 09:05:44,361 : INFO : EPOCH 20 - PROGRESS: at 46.15% examples, 959876 words/s, in_qsize 23, out_qsize 0
2020-09-01 09:05:45,386 : INFO : EPOCH 20 - PROGRESS: at 50.38% examples, 958641 words/s, in_qsize 22, out_qsize 1
2020-09-01 09:05:46,386 : INFO : EPOCH 20 - PROGRESS: at 54.60% examples, 960091 words/s, in_qsize 23, out_qsize 0
2020-09-01 09:05:47,397 : INFO : EPOCH 20 - PROGRESS: at 58.83% examples, 960549 words/s, in_qsize 23, out_qsize 0
2020-09-01 09:05:48,410 : INFO : EPOCH 20 - PROGRESS: at 63.04% examples, 960747 words/s, in_qsize 23, out_qsize 0
2020-09-01 09:05:49,420 : INFO : EPOCH 20 - PROGRESS: at 67.22% examples, 961238


Evaluating Doc2Vec("alpha=0.05",dm/m,d100,n5,w10,mc2,t12)


2020-09-01 09:05:57,560 : INFO : training model with 12 workers on 265409 vocabulary and 1100 features, using sg=0 hs=0 sample=0 negative=5 window=5



0.169760 Doc2Vec("alpha=0.05",dm/m,d100,n5,w10,mc2,t12)

Training Doc2Vec(dm/c,d100,n5,w5,mc2,t12)


2020-09-01 09:05:58,577 : INFO : EPOCH 1 - PROGRESS: at 1.87% examples, 430834 words/s, in_qsize 22, out_qsize 1
2020-09-01 09:05:59,580 : INFO : EPOCH 1 - PROGRESS: at 4.11% examples, 466599 words/s, in_qsize 23, out_qsize 0
2020-09-01 09:06:00,653 : INFO : EPOCH 1 - PROGRESS: at 6.60% examples, 492859 words/s, in_qsize 23, out_qsize 0
2020-09-01 09:06:01,669 : INFO : EPOCH 1 - PROGRESS: at 9.14% examples, 512599 words/s, in_qsize 23, out_qsize 0
2020-09-01 09:06:02,669 : INFO : EPOCH 1 - PROGRESS: at 11.62% examples, 523969 words/s, in_qsize 23, out_qsize 0
2020-09-01 09:06:03,673 : INFO : EPOCH 1 - PROGRESS: at 14.01% examples, 528009 words/s, in_qsize 23, out_qsize 0
2020-09-01 09:06:04,685 : INFO : EPOCH 1 - PROGRESS: at 16.35% examples, 529278 words/s, in_qsize 23, out_qsize 0
2020-09-01 09:06:05,688 : INFO : EPOCH 1 - PROGRESS: at 18.84% examples, 532009 words/s, in_qsize 23, out_qsize 0
2020-09-01 09:06:06,690 : INFO : EPOCH 1 - PROGRESS: at 21.30% examples, 535261 words/s, in_

2020-09-01 09:07:01,659 : INFO : EPOCH 2 - PROGRESS: at 60.09% examples, 594948 words/s, in_qsize 23, out_qsize 0
2020-09-01 09:07:02,663 : INFO : EPOCH 2 - PROGRESS: at 62.77% examples, 595838 words/s, in_qsize 23, out_qsize 0
2020-09-01 09:07:03,665 : INFO : EPOCH 2 - PROGRESS: at 65.35% examples, 596273 words/s, in_qsize 23, out_qsize 0
2020-09-01 09:07:04,673 : INFO : EPOCH 2 - PROGRESS: at 68.08% examples, 597063 words/s, in_qsize 23, out_qsize 0
2020-09-01 09:07:05,687 : INFO : EPOCH 2 - PROGRESS: at 70.77% examples, 597165 words/s, in_qsize 23, out_qsize 0
2020-09-01 09:07:06,692 : INFO : EPOCH 2 - PROGRESS: at 73.38% examples, 597195 words/s, in_qsize 23, out_qsize 0
2020-09-01 09:07:07,703 : INFO : EPOCH 2 - PROGRESS: at 76.05% examples, 597472 words/s, in_qsize 23, out_qsize 0
2020-09-01 09:07:08,713 : INFO : EPOCH 2 - PROGRESS: at 78.78% examples, 598244 words/s, in_qsize 23, out_qsize 0
2020-09-01 09:07:09,746 : INFO : EPOCH 2 - PROGRESS: at 81.52% examples, 597954 words/s,

2020-09-01 09:07:54,864 : INFO : EPOCH 4 - PROGRESS: at 5.62% examples, 609969 words/s, in_qsize 23, out_qsize 0
2020-09-01 09:07:55,919 : INFO : EPOCH 4 - PROGRESS: at 8.64% examples, 627183 words/s, in_qsize 23, out_qsize 0
2020-09-01 09:07:56,922 : INFO : EPOCH 4 - PROGRESS: at 11.53% examples, 636466 words/s, in_qsize 23, out_qsize 0
2020-09-01 09:07:57,930 : INFO : EPOCH 4 - PROGRESS: at 14.32% examples, 635750 words/s, in_qsize 22, out_qsize 1
2020-09-01 09:07:58,964 : INFO : EPOCH 4 - PROGRESS: at 17.22% examples, 637678 words/s, in_qsize 23, out_qsize 0
2020-09-01 09:07:59,966 : INFO : EPOCH 4 - PROGRESS: at 20.23% examples, 643229 words/s, in_qsize 23, out_qsize 0
2020-09-01 09:08:00,972 : INFO : EPOCH 4 - PROGRESS: at 23.06% examples, 643550 words/s, in_qsize 23, out_qsize 0
2020-09-01 09:08:01,990 : INFO : EPOCH 4 - PROGRESS: at 25.93% examples, 643833 words/s, in_qsize 23, out_qsize 0
2020-09-01 09:08:03,012 : INFO : EPOCH 4 - PROGRESS: at 28.80% examples, 644874 words/s, i

2020-09-01 09:08:58,055 : INFO : EPOCH 5 - PROGRESS: at 88.59% examples, 671278 words/s, in_qsize 23, out_qsize 0
2020-09-01 09:08:59,070 : INFO : EPOCH 5 - PROGRESS: at 91.74% examples, 672084 words/s, in_qsize 23, out_qsize 0
2020-09-01 09:09:00,084 : INFO : EPOCH 5 - PROGRESS: at 94.74% examples, 672289 words/s, in_qsize 23, out_qsize 0
2020-09-01 09:09:01,093 : INFO : EPOCH 5 - PROGRESS: at 97.83% examples, 672576 words/s, in_qsize 23, out_qsize 0
2020-09-01 09:09:01,697 : INFO : worker thread finished; awaiting finish of 11 more threads
2020-09-01 09:09:01,710 : INFO : worker thread finished; awaiting finish of 10 more threads
2020-09-01 09:09:01,714 : INFO : worker thread finished; awaiting finish of 9 more threads
2020-09-01 09:09:01,717 : INFO : worker thread finished; awaiting finish of 8 more threads
2020-09-01 09:09:01,722 : INFO : worker thread finished; awaiting finish of 7 more threads
2020-09-01 09:09:01,726 : INFO : worker thread finished; awaiting finish of 6 more thre

2020-09-01 09:09:51,291 : INFO : EPOCH 7 - PROGRESS: at 49.34% examples, 698398 words/s, in_qsize 23, out_qsize 0
2020-09-01 09:09:52,300 : INFO : EPOCH 7 - PROGRESS: at 52.45% examples, 699550 words/s, in_qsize 23, out_qsize 0
2020-09-01 09:09:53,318 : INFO : EPOCH 7 - PROGRESS: at 55.58% examples, 699420 words/s, in_qsize 24, out_qsize 0
2020-09-01 09:09:54,347 : INFO : EPOCH 7 - PROGRESS: at 58.64% examples, 699153 words/s, in_qsize 23, out_qsize 0
2020-09-01 09:09:55,361 : INFO : EPOCH 7 - PROGRESS: at 61.67% examples, 699048 words/s, in_qsize 23, out_qsize 0
2020-09-01 09:09:56,363 : INFO : EPOCH 7 - PROGRESS: at 64.70% examples, 699619 words/s, in_qsize 22, out_qsize 1
2020-09-01 09:09:57,368 : INFO : EPOCH 7 - PROGRESS: at 67.81% examples, 700325 words/s, in_qsize 23, out_qsize 0
2020-09-01 09:09:58,376 : INFO : EPOCH 7 - PROGRESS: at 70.97% examples, 700723 words/s, in_qsize 23, out_qsize 0
2020-09-01 09:09:59,387 : INFO : EPOCH 7 - PROGRESS: at 74.09% examples, 701087 words/s,

2020-09-01 09:10:43,730 : INFO : EPOCH 9 - PROGRESS: at 12.75% examples, 709807 words/s, in_qsize 23, out_qsize 0
2020-09-01 09:10:44,747 : INFO : EPOCH 9 - PROGRESS: at 16.08% examples, 717847 words/s, in_qsize 23, out_qsize 0
2020-09-01 09:10:45,758 : INFO : EPOCH 9 - PROGRESS: at 19.26% examples, 716844 words/s, in_qsize 23, out_qsize 0
2020-09-01 09:10:46,760 : INFO : EPOCH 9 - PROGRESS: at 22.45% examples, 717946 words/s, in_qsize 23, out_qsize 0
2020-09-01 09:10:47,779 : INFO : EPOCH 9 - PROGRESS: at 25.75% examples, 722023 words/s, in_qsize 23, out_qsize 0
2020-09-01 09:10:48,813 : INFO : EPOCH 9 - PROGRESS: at 28.92% examples, 720923 words/s, in_qsize 23, out_qsize 0
2020-09-01 09:10:49,824 : INFO : EPOCH 9 - PROGRESS: at 32.16% examples, 722483 words/s, in_qsize 23, out_qsize 0
2020-09-01 09:10:50,834 : INFO : EPOCH 9 - PROGRESS: at 35.29% examples, 722987 words/s, in_qsize 23, out_qsize 0
2020-09-01 09:10:51,839 : INFO : EPOCH 9 - PROGRESS: at 38.32% examples, 722206 words/s,

2020-09-01 09:11:41,875 : INFO : worker thread finished; awaiting finish of 7 more threads
2020-09-01 09:11:41,885 : INFO : worker thread finished; awaiting finish of 6 more threads
2020-09-01 09:11:41,890 : INFO : worker thread finished; awaiting finish of 5 more threads
2020-09-01 09:11:41,906 : INFO : worker thread finished; awaiting finish of 4 more threads
2020-09-01 09:11:41,912 : INFO : worker thread finished; awaiting finish of 3 more threads
2020-09-01 09:11:41,915 : INFO : worker thread finished; awaiting finish of 2 more threads
2020-09-01 09:11:41,916 : INFO : worker thread finished; awaiting finish of 1 more threads
2020-09-01 09:11:41,917 : INFO : worker thread finished; awaiting finish of 0 more threads
2020-09-01 09:11:41,918 : INFO : EPOCH - 10 : training on 23279529 raw words (22951015 effective words) took 30.9s, 742822 effective words/s
2020-09-01 09:11:42,962 : INFO : EPOCH 11 - PROGRESS: at 3.08% examples, 680109 words/s, in_qsize 23, out_qsize 0
2020-09-01 09:11:

2020-09-01 09:12:36,828 : INFO : EPOCH 12 - PROGRESS: at 80.94% examples, 761888 words/s, in_qsize 23, out_qsize 0
2020-09-01 09:12:37,834 : INFO : EPOCH 12 - PROGRESS: at 84.37% examples, 762245 words/s, in_qsize 23, out_qsize 0
2020-09-01 09:12:38,835 : INFO : EPOCH 12 - PROGRESS: at 87.59% examples, 761193 words/s, in_qsize 23, out_qsize 0
2020-09-01 09:12:39,844 : INFO : EPOCH 12 - PROGRESS: at 91.15% examples, 762470 words/s, in_qsize 23, out_qsize 0
2020-09-01 09:12:40,849 : INFO : EPOCH 12 - PROGRESS: at 94.57% examples, 763106 words/s, in_qsize 23, out_qsize 0
2020-09-01 09:12:41,849 : INFO : EPOCH 12 - PROGRESS: at 97.96% examples, 762862 words/s, in_qsize 23, out_qsize 0
2020-09-01 09:12:42,350 : INFO : worker thread finished; awaiting finish of 11 more threads
2020-09-01 09:12:42,355 : INFO : worker thread finished; awaiting finish of 10 more threads
2020-09-01 09:12:42,358 : INFO : worker thread finished; awaiting finish of 9 more threads
2020-09-01 09:12:42,362 : INFO : wo

2020-09-01 09:13:28,299 : INFO : EPOCH 14 - PROGRESS: at 54.86% examples, 778178 words/s, in_qsize 23, out_qsize 0
2020-09-01 09:13:29,321 : INFO : EPOCH 14 - PROGRESS: at 58.30% examples, 778045 words/s, in_qsize 23, out_qsize 0
2020-09-01 09:13:30,328 : INFO : EPOCH 14 - PROGRESS: at 61.67% examples, 778080 words/s, in_qsize 23, out_qsize 0
2020-09-01 09:13:31,352 : INFO : EPOCH 14 - PROGRESS: at 65.11% examples, 778724 words/s, in_qsize 23, out_qsize 0
2020-09-01 09:13:32,354 : INFO : EPOCH 14 - PROGRESS: at 68.64% examples, 779968 words/s, in_qsize 23, out_qsize 0
2020-09-01 09:13:33,355 : INFO : EPOCH 14 - PROGRESS: at 71.94% examples, 778722 words/s, in_qsize 23, out_qsize 0
2020-09-01 09:13:34,358 : INFO : EPOCH 14 - PROGRESS: at 75.37% examples, 778958 words/s, in_qsize 22, out_qsize 1
2020-09-01 09:13:35,364 : INFO : EPOCH 14 - PROGRESS: at 78.83% examples, 779296 words/s, in_qsize 23, out_qsize 0
2020-09-01 09:13:36,379 : INFO : EPOCH 14 - PROGRESS: at 82.46% examples, 780172

2020-09-01 09:14:20,582 : INFO : EPOCH 16 - PROGRESS: at 34.87% examples, 792784 words/s, in_qsize 23, out_qsize 0
2020-09-01 09:14:21,616 : INFO : EPOCH 16 - PROGRESS: at 38.33% examples, 792247 words/s, in_qsize 23, out_qsize 0
2020-09-01 09:14:22,617 : INFO : EPOCH 16 - PROGRESS: at 41.86% examples, 793215 words/s, in_qsize 23, out_qsize 0
2020-09-01 09:14:23,619 : INFO : EPOCH 16 - PROGRESS: at 45.38% examples, 793321 words/s, in_qsize 23, out_qsize 0
2020-09-01 09:14:24,623 : INFO : EPOCH 16 - PROGRESS: at 48.88% examples, 793130 words/s, in_qsize 23, out_qsize 0
2020-09-01 09:14:25,627 : INFO : EPOCH 16 - PROGRESS: at 52.36% examples, 794166 words/s, in_qsize 23, out_qsize 0
2020-09-01 09:14:26,635 : INFO : EPOCH 16 - PROGRESS: at 55.90% examples, 794459 words/s, in_qsize 23, out_qsize 0
2020-09-01 09:14:27,652 : INFO : EPOCH 16 - PROGRESS: at 59.33% examples, 794113 words/s, in_qsize 23, out_qsize 0
2020-09-01 09:14:28,663 : INFO : EPOCH 16 - PROGRESS: at 62.81% examples, 794612

2020-09-01 09:15:11,802 : INFO : EPOCH 18 - PROGRESS: at 14.05% examples, 795173 words/s, in_qsize 23, out_qsize 0
2020-09-01 09:15:12,803 : INFO : EPOCH 18 - PROGRESS: at 17.64% examples, 800818 words/s, in_qsize 23, out_qsize 0
2020-09-01 09:15:13,812 : INFO : EPOCH 18 - PROGRESS: at 21.26% examples, 802026 words/s, in_qsize 23, out_qsize 0
2020-09-01 09:15:14,825 : INFO : EPOCH 18 - PROGRESS: at 24.78% examples, 802100 words/s, in_qsize 24, out_qsize 0
2020-09-01 09:15:15,844 : INFO : EPOCH 18 - PROGRESS: at 28.30% examples, 801627 words/s, in_qsize 23, out_qsize 0
2020-09-01 09:15:16,851 : INFO : EPOCH 18 - PROGRESS: at 31.89% examples, 803456 words/s, in_qsize 23, out_qsize 0
2020-09-01 09:15:17,870 : INFO : EPOCH 18 - PROGRESS: at 35.34% examples, 802827 words/s, in_qsize 23, out_qsize 0
2020-09-01 09:15:18,878 : INFO : EPOCH 18 - PROGRESS: at 38.84% examples, 804020 words/s, in_qsize 23, out_qsize 0
2020-09-01 09:15:19,888 : INFO : EPOCH 18 - PROGRESS: at 42.41% examples, 804303

2020-09-01 09:16:04,222 : INFO : worker thread finished; awaiting finish of 0 more threads
2020-09-01 09:16:04,223 : INFO : EPOCH - 19 : training on 23279529 raw words (22951015 effective words) took 28.2s, 815214 effective words/s
2020-09-01 09:16:05,233 : INFO : EPOCH 20 - PROGRESS: at 3.35% examples, 763838 words/s, in_qsize 23, out_qsize 0
2020-09-01 09:16:06,245 : INFO : EPOCH 20 - PROGRESS: at 6.89% examples, 789110 words/s, in_qsize 23, out_qsize 0
2020-09-01 09:16:07,265 : INFO : EPOCH 20 - PROGRESS: at 10.46% examples, 794871 words/s, in_qsize 24, out_qsize 0
2020-09-01 09:16:08,267 : INFO : EPOCH 20 - PROGRESS: at 14.05% examples, 801259 words/s, in_qsize 23, out_qsize 0
2020-09-01 09:16:09,295 : INFO : EPOCH 20 - PROGRESS: at 17.69% examples, 803278 words/s, in_qsize 22, out_qsize 1
2020-09-01 09:16:10,310 : INFO : EPOCH 20 - PROGRESS: at 21.38% examples, 806359 words/s, in_qsize 23, out_qsize 0
2020-09-01 09:16:11,314 : INFO : EPOCH 20 - PROGRESS: at 24.95% examples, 808283


Evaluating Doc2Vec(dm/c,d100,n5,w5,mc2,t12)

0.296360 Doc2Vec(dm/c,d100,n5,w5,mc2,t12)


Evaluating Doc2Vec(dbow,d100,n5,mc2,t12)+Doc2Vec("alpha=0.05",dm/m,d100,n5,w10,mc2,t12)

0.103440 Doc2Vec(dbow,d100,n5,mc2,t12)+Doc2Vec("alpha=0.05",dm/m,d100,n5,w10,mc2,t12)


Evaluating Doc2Vec(dbow,d100,n5,mc2,t12)+Doc2Vec(dm/c,d100,n5,w5,mc2,t12)

0.105600 Doc2Vec(dbow,d100,n5,mc2,t12)+Doc2Vec(dm/c,d100,n5,w5,mc2,t12)



Achieved Sentiment-Prediction Accuracy
--------------------------------------
Compare error rates achieved, best-to-worst



In [15]:
print("Err_rate Model")
for rate, name in sorted((rate, name) for name, rate in error_rates.items()):
    print("%f %s" % (rate, name))

Err_rate Model
0.103440 Doc2Vec(dbow,d100,n5,mc2,t12)+Doc2Vec("alpha=0.05",dm/m,d100,n5,w10,mc2,t12)
0.103960 Doc2Vec(dbow,d100,n5,mc2,t12)
0.105600 Doc2Vec(dbow,d100,n5,mc2,t12)+Doc2Vec(dm/c,d100,n5,w5,mc2,t12)
0.169760 Doc2Vec("alpha=0.05",dm/m,d100,n5,w10,mc2,t12)
0.296360 Doc2Vec(dm/c,d100,n5,w5,mc2,t12)


In our testing, contrary to the results of the paper, PV-DBOW alone performs
as well as anything else on this problem. Concatenating vectors from
different models only sometimes offers a tiny predictive improvement – and
stays generally close to the best-performing solo model included.

The best error rate achieved here is just above 10%, still a long way from
the paper's reported 7.42%.

(Other trials not shown, with larger vectors and other changes, also don't
come close to the paper's reported value. Others around the net have reported
a similar inability to reproduce the paper's best numbers. The PV-DM/C mode
improves a bit with many more training epochs – but doesn't reach parity with
PV-DBOW.)
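
To make the idea of concatenating vectors concrete, here is a minimal sketch
using plain Python lists. The names ``dbow_vectors``, ``dm_vectors`` and
``concatenated_vector`` are hypothetical and the numbers are toy values, not
output from the models above. It only illustrates that the feature vector of a
concatenated model is the per-model vectors for a document joined end to end.

```python
# Toy stand-ins for the per-document vectors of two separately trained models.
# Assumption: keys are document ids, values are that model's vector for the doc.
dbow_vectors = {0: [0.1, 0.2], 1: [0.3, 0.4]}
dm_vectors = {0: [0.5, 0.6, 0.7], 1: [0.8, 0.9, 1.0]}


def concatenated_vector(doc_id, *vector_tables):
    """Join one document's vectors from several models end to end."""
    joined = []
    for table in vector_tables:
        joined.extend(table[doc_id])
    return joined


features = concatenated_vector(0, dbow_vectors, dm_vectors)
print(len(features), features)
```

The joined vector has as many dimensions as the parts combined, so a
classifier trained on it sees the features of both models at once. With NumPy
arrays, ``np.concatenate`` gives the same result.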




Examining Results
-----------------

Let's look for answers to the following questions:

#. Are inferred vectors close to the precalculated ones?
#. Do close documents seem more related than distant ones?
#. Do the word vectors show useful similarities?
#. Are the word vectors from this dataset any good at analogies?




Are inferred vectors close to the precalculated ones?
-----------------------------------------------------



In [29]:
# doc_id = np.random.randint(simple_models[0].docvecs.count)  # pick random doc; re-run cell for more examples
doc_id = 40484
print('for doc %d...' % doc_id)
for model in simple_models:
    inferred_docvec = model.infer_vector(alldocs[doc_id].words)
    print('%s:\n %s' % (model, model.docvecs.most_similar([inferred_docvec], topn=3)))

for doc 40484...
Doc2Vec(dbow,d100,n5,mc2,t12):
 [(40484, 0.9845091104507446), (68182, 0.6114521622657776), (24699, 0.6026051640510559)]
Doc2Vec("alpha=0.05",dm/m,d100,n5,w10,mc2,t12):
 [(40484, 0.9155202507972717), (39636, 0.6092585325241089), (70206, 0.5715395212173462)]
Doc2Vec(dm/c,d100,n5,w5,mc2,t12):
 [(40484, 0.8342512249946594), (57096, 0.4506417512893677), (65944, 0.44313186407089233)]


for doc 40484...
Doc2Vec(dbow,d100,n5,mc2,t12):
 [(40484, 0.9863420128822327), (68182, 0.6269837021827698), (6824, 0.6055160760879517)]
Doc2Vec("alpha=0.05",dm/m,d100,n5,w10,mc2,t12):
 [(40484, 0.9247998595237732), (86187, 0.6025664806365967), (39636, 0.5781896114349365)]
Doc2Vec(dm/c,d100,n5,w5,mc2,t12):
 [(40484, 0.8544603586196899), (17126, 0.4431724548339844), (59160, 0.43748584389686584)]


for doc 40484...
Doc2Vec(dbow,d100,n5,mc2,t12):
 [(40484, 0.9875952005386353), (68182, 0.6229878664016724), (6824, 0.6047743558883667)]
Doc2Vec("alpha=0.05",dm/m,d100,n5,w10,mc2,t12):
 [(40484, 0.9151185750961304), (39636, 0.5833958983421326), (39220, 0.5720853805541992)]
Doc2Vec(dm/c,d100,n5,w5,mc2,t12):
 [(40484, 0.8362890481948853), (56128, 0.45506563782691956), (43376, 0.4498768746852875)]


for doc 40484...
Doc2Vec(dbow,d100,n5,mc2,t12):
 [(40484, 0.985596776008606), (68182, 0.6207669973373413), (24699, 0.5981945991516113)]
Doc2Vec("alpha=0.05",dm/m,d100,n5,w10,mc2,t12):
 [(40484, 0.9290589094161987), (39636, 0.5834196209907532), (86187, 0.5826824903488159)]
Doc2Vec(dm/c,d100,n5,w5,mc2,t12):
 [(40484, 0.7767651081085205), (73876, 0.4356870651245117), (38544, 0.42808622121810913)]

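(Yes. Here the stored vector from 20 epochs of training is usually one of the
closest to a freshly-inferred vector for the same words; in the four runs
shown above it is the top match for every model. The similarities differ
slightly between runs because inference is not deterministic. Defaults for
inference may benefit from tuning for each dataset or set of model
parameters.)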
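
The check above boils down to ranking stored vectors by cosine similarity to
the inferred one and seeing where the original document lands. The sketch
below shows that ranking with plain Python and toy vectors; ``rank_of_doc``,
``stored_vectors`` and the values are hypothetical illustrations, not Gensim's
implementation.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_of_doc(inferred, stored_vectors, doc_id):
    """0-based rank of doc_id when stored vectors are sorted by similarity to `inferred`."""
    ranked = sorted(
        stored_vectors,
        key=lambda d: cosine_similarity(inferred, stored_vectors[d]),
        reverse=True,
    )
    return ranked.index(doc_id)


# Toy data: the "inferred" vector is a slightly perturbed copy of doc_a's vector.
stored_vectors = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.0, 1.0, 0.0],
    "doc_c": [0.7, 0.7, 0.0],
}
inferred = [0.9, 0.1, 0.05]
print(rank_of_doc(inferred, stored_vectors, "doc_a"))
```

A rank of 0 means the document's own stored vector is the nearest neighbour
of its re-inferred vector, which is the behaviour seen in the output above.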




Do close documents seem more related than distant ones?
-------------------------------------------------------



In [17]:
import random

doc_id = np.random.randint(simple_models[0].docvecs.count)  # pick random doc, re-run cell for more examples
model = random.choice(simple_models)  # and a random model
sims = model.docvecs.most_similar(doc_id, topn=model.docvecs.count)  # get *all* similar documents
print(u'TARGET (%d): «%s»\n' % (doc_id, ' '.join(alldocs[doc_id].words)))
print(u'SIMILAR/DISSIMILAR DOCS PER MODEL %s:\n' % model)
for label, index in [('MOST', 0), ('MEDIAN', len(sims)//2), ('LEAST', len(sims) - 1)]:
    s = sims[index]
    i = sims[index][0]
    words = ' '.join(alldocs[i].words)
    print(u'%s %s: «%s»\n' % (label, s, words))

TARGET (54781): «I will not forget this movie for the rest of my life! Although the direction is excellent with a very good script and good production values, the performances are the standout aspect of this remarkable film. Alan Arkin and Sondra Locke make the film work and were both nominated for Oscars for their work here. Their scenes are magic, but Arkin is easily the stronger performance of the two. The supporting cast is very good, but Chuck McCann deserves special mention here. A comedian who's work has mainly been very lighthearted, he plays a straight dramatic role here and does a excellent job. A very human, very touching, very emotional film, the last 15-20 minutes will stay with you a very long time. Turner Classic Movies runs this on occasion and it is currently in print. Most highly recommended.»

SIMILAR/DISSIMILAR DOCS PER MODEL Doc2Vec("alpha=0.05",dm/m,d100,n5,w10,mc2,t12):

MOST (85064, 0.741161584854126): «I think the film had enough humorous moments to make a 30-m

Somewhat. In terms of reviewer tone, movie genre, and so on, the MOST
cosine-similar docs usually seem more like the TARGET than the MEDIAN or
LEAST do, especially if the MOST has a cosine similarity > 0.5. Re-run the
cell to try another random target document.
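
As a small illustration of the comparison described above, the sketch below
picks the MOST, MEDIAN and LEAST entries from a similarity list that is
already sorted in descending order. It also flags whether the top match
clears the 0.5 rule of thumb. The helper name ``summarize_neighbors`` and the
sample values are hypothetical.

```python
def summarize_neighbors(sims, threshold=0.5):
    """Pick the most, median and least similar entries.

    `sims` is a list of (doc_id, similarity) pairs sorted from most to least
    similar, the same shape as the `sims` list built in the cell above.
    """
    picks = {
        "MOST": sims[0],
        "MEDIAN": sims[len(sims) // 2],
        "LEAST": sims[-1],
    }
    # Rule of thumb from the text: a top similarity above 0.5 is a stronger match.
    confident = picks["MOST"][1] > threshold
    return picks, confident


# Toy similarity list (made-up values).
sims = [(7, 0.74), (3, 0.41), (9, 0.12), (1, -0.05), (4, -0.30)]
picks, confident = summarize_neighbors(sims)
for label, (doc_id, sim) in picks.items():
    print("%s doc %d: %.2f" % (label, doc_id, sim))
print("top match above threshold:", confident)
```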




Do the word vectors show useful similarities?
---------------------------------------------




In [18]:
import random

word_models = simple_models[:]

def pick_random_word(model, threshold=10):
    # pick a random word with a suitable number of occurrences
    while True:
        word = random.choice(model.wv.index2word)
        if model.wv.vocab[word].count > threshold:
            return word

target_word = pick_random_word(word_models[0])
# or uncomment below line, to just pick a word from the relevant domain:
# target_word = 'comedy/drama'

for model in word_models:
    print('target_word: %r model: %s similar words:' % (target_word, model))
    for i, (word, sim) in enumerate(model.wv.most_similar(target_word, topn=10), 1):
        print('    %d. %.2f %r' % (i, sim, word))
    print()

2020-09-01 09:16:34,664 : INFO : precomputing L2-norms of word weight vectors
2020-09-01 09:16:34,746 : INFO : precomputing L2-norms of word weight vectors
2020-09-01 09:16:34,840 : INFO : precomputing L2-norms of word weight vectors


target_word: 'animate' model: Doc2Vec(dbow,d100,n5,mc2,t12) similar words:
    1. 0.45 'Carrel'
    2. 0.42 '1/6'
    3. 0.42 'Nott.'
    4. 0.41 "Alice's"
    5. 0.41 'Russians,'
    6. 0.41 "Lagoon's"
    7. 0.40 'sure).<br'
    8. 0.39 'Carlotto'
    9. 0.38 'Myrl'
    10. 0.38 'Chastity'

target_word: 'animate' model: Doc2Vec("alpha=0.05",dm/m,d100,n5,w10,mc2,t12) similar words:
    1. 0.67 'adjust'
    2. 0.66 'attach'
    3. 0.64 'assess'
    4. 0.64 'starve'
    5. 0.63 'adapt'
    6. 0.63 'respond'
    7. 0.63 'utilize'
    8. 0.62 'devote'
    9. 0.62 'listen'
    10. 0.62 'allude'

target_word: 'animate' model: Doc2Vec(dm/c,d100,n5,w5,mc2,t12) similar words:
    1. 0.72 'sew'
    2. 0.71 'flay'
    3. 0.71 'recharge'
    4. 0.70 'oppress'
    5. 0.69 'stabilize'
    6. 0.69 'out-do'
    7. 0.68 'exist;'
    8. 0.68 'sate'
    9. 0.68 'hypnotise'
    10. 0.68 'prolong'



Do the DBOW words look meaningless? That's because the Gensim DBOW model
doesn't train word vectors – they remain at their randomly initialized
values – unless you ask for it with the ``dbow_words=1`` initialization
parameter. Concurrent word-training slows DBOW mode significantly, and offers
little improvement (and sometimes a little worsening) of the error rate on
this IMDB sentiment-prediction task, but may be appropriate on other tasks,
or if you also need word-vectors.

Word vectors from DM models tend to show meaningfully similar words when
there are many examples of the target word in the training data (as with
'plot' or 'actor'). (All DM modes inherently involve word-vector training
concurrent with doc-vector training.)




Are the word vectors from this dataset any good at analogies?
-------------------------------------------------------------



In [19]:
# grab the file if not already local
questions_filename = 'questions-words.txt'
if not os.path.isfile(questions_filename):
    # Download the analogy questions file
    print("Downloading analogy questions file...")
    url = u'https://raw.githubusercontent.com/tmikolov/word2vec/master/questions-words.txt'
    with smart_open.open(url, 'rb') as fin:
        with smart_open.open(questions_filename, 'wb') as fout:
            fout.write(fin.read())
assert os.path.isfile(questions_filename), "questions-words.txt unavailable"
print("Success, questions-words.txt is available for next steps.")

# Note: this analysis takes many minutes
for model in word_models:
    score, sections = model.wv.evaluate_word_analogies(questions_filename)
    correct, incorrect = len(sections[-1]['correct']), len(sections[-1]['incorrect'])
    print('%s: %0.2f%% correct (%d of %d)' % (model, float(correct*100)/(correct+incorrect), correct, correct+incorrect))

Downloading analogy questions file...
Success, questions-words.txt is available for next steps.


2020-09-01 09:16:36,430 : INFO : Evaluating word analogies for top 300000 words in the model on questions-words.txt
2020-09-01 09:16:39,263 : INFO : capital-common-countries: 0.0% (0/420)
2020-09-01 09:16:45,383 : INFO : capital-world: 0.0% (0/902)
2020-09-01 09:16:45,969 : INFO : currency: 0.0% (0/86)
2020-09-01 09:16:56,310 : INFO : city-in-state: 0.0% (0/1510)
2020-09-01 09:16:59,763 : INFO : family: 0.0% (0/506)
2020-09-01 09:17:06,544 : INFO : gram1-adjective-to-adverb: 0.0% (0/992)
2020-09-01 09:17:11,724 : INFO : gram2-opposite: 0.0% (0/756)
2020-09-01 09:17:20,709 : INFO : gram3-comparative: 0.0% (0/1332)
2020-09-01 09:17:27,866 : INFO : gram4-superlative: 0.0% (0/1056)
2020-09-01 09:17:34,733 : INFO : gram5-present-participle: 0.0% (0/992)
2020-09-01 09:17:44,678 : INFO : gram6-nationality-adjective: 0.0% (0/1445)
2020-09-01 09:17:55,340 : INFO : gram7-past-tense: 0.0% (0/1560)
2020-09-01 09:18:03,501 : INFO : gram8-plural: 0.0% (0/1190)
2020-09-01 09:18:09,416 : INFO : gram9-

Doc2Vec(dbow,d100,n5,mc2,t12): 0.00% correct (0 of 13617)


2020-09-01 09:18:09,694 : INFO : Evaluating word analogies for top 300000 words in the model on questions-words.txt
2020-09-01 09:18:12,480 : INFO : capital-common-countries: 4.0% (17/420)
2020-09-01 09:18:18,529 : INFO : capital-world: 0.6% (5/902)
2020-09-01 09:18:19,122 : INFO : currency: 0.0% (0/86)
2020-09-01 09:18:29,431 : INFO : city-in-state: 0.1% (2/1510)
2020-09-01 09:18:33,015 : INFO : family: 37.9% (192/506)
2020-09-01 09:18:39,772 : INFO : gram1-adjective-to-adverb: 4.2% (42/992)
2020-09-01 09:18:44,911 : INFO : gram2-opposite: 6.1% (46/756)
2020-09-01 09:18:53,693 : INFO : gram3-comparative: 51.7% (688/1332)
2020-09-01 09:19:01,146 : INFO : gram4-superlative: 23.1% (244/1056)
2020-09-01 09:19:08,105 : INFO : gram5-present-participle: 22.7% (225/992)
2020-09-01 09:19:18,042 : INFO : gram6-nationality-adjective: 2.9% (42/1445)
2020-09-01 09:19:29,181 : INFO : gram7-past-tense: 26.9% (419/1560)
2020-09-01 09:19:37,532 : INFO : gram8-plural: 18.7% (223/1190)
2020-09-01 09:19:

Doc2Vec("alpha=0.05",dm/m,d100,n5,w10,mc2,t12): 18.60% correct (2533 of 13617)


2020-09-01 09:19:44,371 : INFO : Evaluating word analogies for top 300000 words in the model on questions-words.txt
2020-09-01 09:19:47,249 : INFO : capital-common-countries: 2.6% (11/420)
2020-09-01 09:19:53,471 : INFO : capital-world: 0.7% (6/902)
2020-09-01 09:19:54,059 : INFO : currency: 0.0% (0/86)
2020-09-01 09:20:04,256 : INFO : city-in-state: 0.1% (2/1510)
2020-09-01 09:20:07,838 : INFO : family: 36.4% (184/506)
2020-09-01 09:20:14,737 : INFO : gram1-adjective-to-adverb: 6.6% (65/992)
2020-09-01 09:20:19,853 : INFO : gram2-opposite: 5.4% (41/756)
2020-09-01 09:20:29,157 : INFO : gram3-comparative: 40.9% (545/1332)
2020-09-01 09:20:36,323 : INFO : gram4-superlative: 28.7% (303/1056)
2020-09-01 09:20:43,241 : INFO : gram5-present-participle: 35.1% (348/992)
2020-09-01 09:20:52,811 : INFO : gram6-nationality-adjective: 1.9% (27/1445)
2020-09-01 09:21:03,640 : INFO : gram7-past-tense: 24.9% (389/1560)
2020-09-01 09:21:12,166 : INFO : gram8-plural: 9.8% (117/1190)
2020-09-01 09:21:1

Doc2Vec(dm/c,d100,n5,w5,mc2,t12): 17.98% correct (2449 of 13617)


Even though this is a tiny, domain-specific dataset, it shows some meager
capability on the general word analogies (about 18% correct overall) – at
least for the DM/mean and DM/concat models, which actually train word
vectors. (The untrained, randomly initialized word vectors of the DBOW model
of course fail miserably, scoring 0%.)
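
For readers unfamiliar with how an analogy question is scored, here is a
minimal sketch of the usual vector-offset approach: "a is to b as c is to ?"
is answered with the word whose vector is closest to ``b - a + c``. The
vectors and the helper ``solve_analogy`` are toy, hypothetical illustrations.
This is not Gensim's code, which may differ in details such as normalization.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def solve_analogy(vectors, a, b, c):
    """Answer 'a is to b as c is to ?' with the word nearest to b - a + c."""
    target = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    # The three question words themselves are excluded from the candidates.
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine_similarity(target, vectors[w]))


# Made-up 2-d vectors: the first axis loosely encodes "royalty", the second "gender".
vectors = {
    "man": [1.0, 0.0],
    "woman": [1.0, 1.0],
    "king": [3.0, 0.1],
    "queen": [3.0, 1.1],
    "movie": [-1.0, 0.5],
}
answer = solve_analogy(vectors, "man", "woman", "king")
print(answer)
```

An analogy counts as correct only when the predicted word matches the
expected one exactly. Vectors that were never trained, like the DBOW word
vectors here, therefore have essentially no chance of scoring.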




In [31]:
str(simple_models[1])

'Doc2Vec("alpha=0.05",dm/m,d100,n5,w10,mc2,t12)'

In [33]:
simple_models[1].save('tmp/gensim_models/alpha05_dmm_d100_n5_w10_mc2.bin')

2020-09-01 10:52:11,347 : INFO : saving Doc2Vec object under tmp/gensim_models/alpha05_dmm_d100_n5_w10_mc2.bin, separately None
2020-09-01 10:52:11,348 : INFO : storing np array 'syn1neg' to tmp/gensim_models/alpha05_dmm_d100_n5_w10_mc2.bin.trainables.syn1neg.npy
2020-09-01 10:52:11,393 : INFO : storing np array 'vectors' to tmp/gensim_models/alpha05_dmm_d100_n5_w10_mc2.bin.wv.vectors.npy
2020-09-01 10:52:11,436 : INFO : storing np array 'vectors_norm' to tmp/gensim_models/alpha05_dmm_d100_n5_w10_mc2.bin.wv.vectors_norm.npy
2020-09-01 10:52:12,326 : INFO : saved tmp/gensim_models/alpha05_dmm_d100_n5_w10_mc2.bin
