Credit: adapted from the Gensim fastText tutorial (https://radimrehurek.com/gensim/auto_examples/tutorials/run_fasttext.html#sphx-glr-download-auto-examples-tutorials-run-fasttext-py), which trains a Gensim fastText model on the Lee Corpus. fastText was developed by Facebook Research.

In [83]:
import operator
import numpy as np

In [1]:
%matplotlib inline

In [2]:
import logging
logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', level=logging.INFO)

Here, we'll learn to work with the fastText library: training word-embedding
models, saving and loading them, and performing similarity operations and vector
lookups analogous to Word2Vec.



When to use FastText?
---------------------

The main principle behind `fastText <https://github.com/facebookresearch/fastText>`_ 
is that it treats each word as the aggregation of its subwords. Subwords are character n-grams of the word. The vector for a word is simply taken to be the sum of all vectors of its component char-ngrams.
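To make the subword idea concrete, here is a minimal sketch of extracting character n-grams from a word. It assumes the `<` and `>` boundary markers that fastText wraps around each word. The helper name `char_ngrams` is hypothetical and is not part of Gensim's API.

```python
def char_ngrams(word, min_n=3, max_n=6):
    """Return the character n-grams of `word`, wrapped in < and > boundary markers."""
    wrapped = "<" + word + ">"
    ngrams = []
    for n in range(min_n, max_n + 1):
        # Slide a window of width n across the wrapped word
        for start in range(len(wrapped) - n + 1):
            ngrams.append(wrapped[start:start + n])
    return ngrams


print(char_ngrams("where", min_n=3, max_n=3))
# ['<wh', 'whe', 'her', 'ere', 're>']
```

With the default range of 3 to 6, the same word yields n-grams of every length in that range, and the word vector is built from the vectors of all of them.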


According to a detailed comparison of Word2Vec and fastText in `this notebook <https://github.com/RaRe-Technologies/gensim/blob/develop/docs/notebooks/Word2Vec_FastText_Comparison.ipynb>`__, fastText does significantly better than the original Word2Vec on syntactic tasks, especially when the training corpus is small. Word2Vec slightly outperforms fastText on semantic tasks. The differences shrink as the size of the training corpus increases.


Training time for fastText is significantly higher than for the Gensim version of Word2Vec (``15min 42s`` vs ``6min 42s`` on text8: 17 million tokens, 5 epochs, and a vector size of 100).


fastText can be used to obtain vectors for out-of-vocabulary (OOV) words by summing the vectors of their component char-ngrams, provided at least one of those char-ngrams was present in the training data.
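The OOV lookup described above can be sketched with a toy n-gram table. Everything here is illustrative: the `ngram_vectors` dictionary, its values, and the `oov_vector` helper are made up for the example. They do not reflect Gensim's internal storage, which hashes n-grams into buckets.

```python
def char_ngrams(word, min_n=3, max_n=6):
    """Character n-grams of `word`, wrapped in < and > boundary markers."""
    wrapped = "<" + word + ">"
    return [
        wrapped[start:start + n]
        for n in range(min_n, max_n + 1)
        for start in range(len(wrapped) - n + 1)
    ]


def oov_vector(word, ngram_vectors, min_n=3, max_n=6):
    """Sum the vectors of every n-gram of `word` that was seen in training."""
    known = [
        ngram_vectors[gram]
        for gram in char_ngrams(word, min_n, max_n)
        if gram in ngram_vectors
    ]
    if not known:
        # No n-gram of the word was seen in training, so no vector can be built
        raise KeyError("all n-grams for {!r} are absent from the model".format(word))
    # Element-wise sum across the known n-gram vectors
    return [sum(components) for components in zip(*known)]


# Hypothetical 2-dimensional n-gram vectors "learnt" during training
ngram_vectors = {
    "<ni": [1.0, 0.0],
    "nig": [0.0, 2.0],
    "ght": [0.5, 0.5],
}

# "night" was never seen as a whole word, but three of its 3-grams were
print(oov_vector("night", ngram_vectors, min_n=3, max_n=3))
# [1.5, 2.5]
```

A word whose n-grams are all unseen raises `KeyError` in this sketch, which mirrors the "at least one char-ngram" condition stated above.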




## Load Resume Data

In [3]:
import pandas as pd

# Resumes only for now.
# The CSV holds each resume as cleaned text, as a tokenized list of strings,
# and as a lemmatized list of strings.
local_resume_cleaned_lemmatized_tokenized_path = '/Users/richardkuzma/coding/NLP_projects/job_recommender_project/data/cleaned_lemmatized_tokenized_resume_dataset_maitrip.csv'

resumes = pd.read_csv(local_resume_cleaned_lemmatized_tokenized_path)

# NOTE: read_csv returns list-valued columns as plain strings such as "['john', 'h', ...]"
resumes_sentences = resumes['lemmatized_resume'].tolist()
resumes.head()

Unnamed: 0,ID,Category,dirty_resume,resume,tokenized_resume,lemmatized_resume
0,1,HR,"b'John H. Smith, P.H.R.\n800-991-5187 | PO Box...",john h smith phr po box callahan fl infog...,"['john', 'h', 'smith', 'phr', 'po', 'box', 'ca...","['john', 'h', 'smith', 'phr', 'po', 'box', 'ca..."
1,2,HR,b'Name Surname\nAddress\nMobile No/Email\nPERS...,name surname address mobile noemail personal p...,"['name', 'surname', 'address', 'mobile', 'noem...","['name', 'surname', 'address', 'mobile', 'noem..."
2,3,HR,b'Anthony Brown\nHR Assistant\nAREAS OF EXPERT...,anthony brown hr assistant areas expertise per...,"['anthony', 'brown', 'hr', 'assistant', 'areas...","['anthony', 'brown', 'hr', 'assistant', 'area'..."
3,4,HR,b'www.downloadmela.com\nSatheesh\nEMAIL ID:\nC...,satheesh email id career objective pursue gro...,"['satheesh', 'email', 'id', 'career', 'objecti...","['satheesh', 'email', 'id', 'career', 'objecti..."
4,5,HR,"b""HUMAN RESOURCES DIRECTOR\n\xef\x82\xb7Expert...",human resources director expert organizational...,"['human', 'resources', 'director', 'expert', '...","['human', 'resource', 'director', 'expert', 'o..."


Training models
---------------




The original Gensim tutorial trains on the Lee Corpus (which you already have if you've installed gensim). In this notebook, we train the model on the resume data loaded above instead.






In [5]:

from pprint import pprint

from gensim.models.fasttext import FastText as FT_gensim
from gensim.test.utils import datapath

# The original tutorial sets the training file to the Lee Corpus (unused here)
# corpus_file = datapath('lee_background.cor')



2020-04-16 13:44:26,423 : INFO : adding document #0 to Dictionary(0 unique tokens: [])
2020-04-16 13:44:26,425 : INFO : built Dictionary(12 unique tokens: ['computer', 'human', 'interface', 'response', 'survey']...) from 9 documents (total 29 corpus positions)


In [6]:
model = FT_gensim(size=100)

# build the vocabulary
model.build_vocab(sentences = resumes_sentences)

# train the model
model.train(
    sentences=resumes_sentences, epochs=model.epochs,
    total_examples=model.corpus_count, total_words=model.corpus_total_words
)

print(model)

2020-04-16 13:44:37,433 : INFO : resetting layer weights
2020-04-16 13:45:04,598 : INFO : collecting all words and their counts
2020-04-16 13:45:04,711 : INFO : PROGRESS: at sentence #0, processed 0 words, keeping 0 word types
2020-04-16 13:45:06,740 : INFO : collected 31 word types from a corpus of 8591130 raw words and 1219 sentences
2020-04-16 13:45:06,746 : INFO : Loading a fresh vocabulary
2020-04-16 13:45:06,751 : INFO : effective_min_count=5 retains 31 unique words (100% of original 31, drops 0)
2020-04-16 13:45:06,767 : INFO : effective_min_count=5 leaves 8591130 word corpus (100% of original 8591130, drops 0)
2020-04-16 13:45:06,781 : INFO : deleting the raw counts dictionary of 31 items
2020-04-16 13:45:06,789 : INFO : sample=0.001 downsamples 26 most-common words
2020-04-16 13:45:06,796 : INFO : downsampling leaves estimated 1512884 word corpus (17.6% of prior 8591130)
2020-04-16 13:45:06,801 : INFO : estimated required memory for 31 words, 31 buckets and 100 dimensions: 544

FastText(vocab=31, size=100, alpha=0.025)
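The log above reports only 31 word types across 1,219 resumes. That would be consistent with each resume being passed to the model as one long string and iterated character by character, since pandas reads list-valued CSV columns back as plain strings. If that is what happened here, the stored strings can be turned back into token lists before training. A minimal sketch on made-up rows, using only the standard library:

```python
import ast

# Made-up rows in the same shape as the 'lemmatized_resume' column after read_csv
raw_rows = [
    "['john', 'h', 'smith', 'phr']",
    "['name', 'surname', 'address']",
]

# Iterating over a string yields single characters, not words
first_row_as_characters = list(raw_rows[0])
print(first_row_as_characters[:5])
# ['[', "'", 'j', 'o', 'h']

# ast.literal_eval safely parses the string back into a Python list
parsed_rows = [ast.literal_eval(row) for row in raw_rows]
print(parsed_rows[0])
# ['john', 'h', 'smith', 'phr']
```

Applied to this notebook, that would be something like `resumes['lemmatized_resume'].apply(ast.literal_eval).tolist()`, assuming every cell in the column is a valid list literal.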


Training hyperparameters
^^^^^^^^^^^^^^^^^^^^^^^^




Hyperparameters for training the model follow the same pattern as Word2Vec. FastText supports the following parameters from the original word2vec:

- model: Training architecture. Allowed values: `cbow`, `skipgram` (Default `cbow`)
- size: Size of embeddings to be learnt (Default 100)
- alpha: Initial learning rate (Default 0.025)
- window: Context window size (Default 5)
- min_count: Ignore words with number of occurrences below this (Default 5)
- loss: Training objective. Allowed values: `ns`, `hs`, `softmax` (Default `ns`)
- sample: Threshold for downsampling higher-frequency words (Default 0.001)
- negative: Number of negative words to sample, for `ns` (Default 5)
- iter: Number of epochs (Default 5)
- sorted_vocab: Sort vocab by descending frequency (Default 1)
- threads: Number of threads to use (Default 12)


FastText also has three parameters of its own:

- min_n: min length of char ngrams (Default 3)
- max_n: max length of char ngrams (Default 6)
- bucket: number of buckets used for hashing ngrams (Default 2000000)


Parameters ``min_n`` and ``max_n`` control the lengths of the character ngrams that each word is broken down into while training and looking up embeddings. If ``max_n`` is set to 0, or to a value less than ``min_n``, no character ngrams are used, and the model effectively reduces to Word2Vec.



To bound the memory requirements of the model being trained, a hashing function is used that maps ngrams to integers in 1 to K, where K is the number of buckets (the ``bucket`` parameter above). For hashing these character sequences, the `Fowler-Noll-Vo hashing function <http://www.isthe.com/chongo/tech/comp/fnv>`_ (FNV-1a variant) is employed.
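The bucketing step can be sketched with the textbook 32-bit FNV-1a hash. This is an illustration under assumptions, not a copy of the fastText or Gensim code: the function names are hypothetical, the bucket index is zero-based (0 to K-1) via a plain modulo, and the real implementations may treat bytes slightly differently.

```python
FNV_OFFSET_BASIS_32 = 2166136261  # standard 32-bit FNV offset basis
FNV_PRIME_32 = 16777619           # standard 32-bit FNV prime


def fnv1a_32(data):
    """Textbook 32-bit FNV-1a: XOR in each byte, then multiply by the prime."""
    h = FNV_OFFSET_BASIS_32
    for byte in data:
        h ^= byte
        h = (h * FNV_PRIME_32) & 0xFFFFFFFF  # keep the result within 32 bits
    return h


def ngram_bucket(ngram, num_buckets=2000000):
    """Map an n-gram to a bucket index in [0, num_buckets)."""
    return fnv1a_32(ngram.encode("utf-8")) % num_buckets


for gram in ["<wh", "whe", "re>"]:
    print(gram, ngram_bucket(gram))
```

Because the number of buckets is fixed, distinct n-grams can collide and share a vector. That is the trade-off that keeps memory bounded regardless of how many n-grams the corpus contains.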




**Note:** As in the case of Word2Vec, you can continue to train your model while using Gensim's native implementation of fastText.




## Now we have a fastText model trained on Resumes
<br>
Next, we take individual job postings and compare each one against the resumes.

In [27]:
# load jobs into df

local_jobs_cleaned_lemmatized_tokenized_path = '/Users/richardkuzma/coding/NLP_projects/job_recommender_project/data/cleaned_lemmatized_tokenized_jobs_dataset_madhab.csv'

jobs = pd.read_csv(local_jobs_cleaned_lemmatized_tokenized_path)

In [28]:
jobs.head()

Unnamed: 0,Title,Company,JobDescription,RequiredQual,JobRequirement,label,combined,dirty_combined,tokenized_combined,lemmatized_combined
0,Chief Financial Officer,AMERIA Investment Consulting Company,AMERIA Investment Consulting Company is seekin...,"To perform this job successfully, an\r\nindivi...",- Supervises financial management and administ...,1,chief financial officer ameria investment cons...,Chief Financial Officer AMERIA Investment Cons...,"['chief', 'financial', 'officer', 'ameria', 'i...","['chief', 'financial', 'officer', 'ameria', 'i..."
1,Country Coordinator,Caucasus Environmental NGO Network (CENN),Public outreach and strengthening of a growing...,"- Degree in environmentally related field, or ...",- Working with the Country Director to provide...,2,country coordinator public outreach strengthen...,Country Coordinator Public outreach and streng...,"['country', 'coordinator', 'public', 'outreach...","['country', 'coordinator', 'public', 'outreach..."
2,BCC Specialist,Manoff Group,The LEAD (Local Enhancement and Development fo...,"- Advanced degree in public health, social sci...",- Identify gaps in knowledge and overseeing in...,3,bcc specialist lead local enhancement developm...,BCC Specialist The LEAD (Local Enhancement and...,"['bcc', 'specialist', 'lead', 'local', 'enhanc...","['bcc', 'specialist', 'lead', 'local', 'enhanc..."
3,"Community Development, Capacity Building and C...",Food Security Regional Cooperation and Stabili...,Food Security Regional Cooperation and Stabili...,- Higher Education and/or professional experie...,- Assist the Tavush Marz communities and commu...,4,community development capacity building confli...,"Community Development, Capacity Building and C...","['community', 'development', 'capacity', 'buil...","['community', 'development', 'capacity', 'buil..."
4,Country Economist (NOB),"United Nations Development Programme, Armenia",The United Nations Development Programme in Ar...,- Minimum Masters Degree in Economics;\r\n- Mi...,The incumbent under direct supervision of UNDP...,5,country economist nob united nations developme...,Country Economist (NOB) The United Nations Dev...,"['country', 'economist', 'nob', 'united', 'nat...","['country', 'economist', 'nob', 'united', 'nat..."


In [29]:
jobs_list = jobs['lemmatized_combined'].tolist()

In [30]:
jobs.shape[0]

13124

In [84]:
def pick_job():
    print("There are {} jobs".format(jobs.shape[0]))

    # Select a random 1-based job number between 1 and the number of jobs (inclusive)
    rand_int = np.random.randint(1, jobs.shape[0] + 1)

    # To pin a specific job while debugging, replace rand_int with a fixed number, e.g. 2
    selection = rand_int

    print('\nselected job is ID #{}'.format(selection))

    # Look up the row for the selected job (rows are 0-based, hence selection - 1)
    row = jobs.iloc[selection - 1]
    job_label = row['label']  # we could grab ID, but this works for non-indexed labels too
    job_title = row['Title']
    job_company = row['Company']
    job_description = row['JobDescription']

    print('Job Posting ID is: {}'.format(job_label))
    print('Job Posting Title: {}'.format(job_title))
    print('Job Posting Company: {}'.format(job_company))
    print('Job Posting Description: {}'.format(job_description))

    # Return the lemmatized text of the selected job for comparison against the resumes
    job_text_to_process = row['lemmatized_combined']

    return job_text_to_process


In [96]:
def given_job_find_similar_resumes(job_you_pick):
    # Compute the Word Mover's Distance between the chosen job and every resume
    temp_distance = []
    for i, resume in enumerate(resumes_sentences):
        dist = model.wv.wmdistance(job_you_pick, resume)
        temp_distance.append((dist, i))

    # Sort the (distance, index) tuples so the closest resumes come first
    temp_distance.sort(key=operator.itemgetter(0))

    num_similar = 10  # or 20, 25
    print('\nPrinting {} most similar candidates...\n'.format(num_similar))
    for rank in range(min(num_similar, len(temp_distance))):
        dist, idx = temp_distance[rank]
        print('\n#{} most similar resume'.format(rank + 1))
        print('Resume ID from list: {}'.format(idx))
        print('WM Distance: {}'.format(dist))
        print('Resume ID from df: {}'.format(resumes.iloc[idx]['ID']))
        print('Resume text (500 chars): {}'.format(resumes.iloc[idx]['resume'][0:500]))
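When only the closest few resumes are needed, the full sort can be replaced by a partial selection. The sketch below uses `heapq.nsmallest` from the standard library. The `top_k_closest` helper is hypothetical, and `toy_distance` (a Jaccard distance on token sets) is a stand-in for Word Mover's Distance so that the example runs without a trained model.

```python
import heapq


def toy_distance(tokens_a, tokens_b):
    """Stand-in distance: 1 minus the Jaccard overlap of the two token sets."""
    set_a, set_b = set(tokens_a), set(tokens_b)
    union = set_a | set_b
    if not union:
        return 0.0
    return 1.0 - len(set_a & set_b) / len(union)


def top_k_closest(query, candidates, distance_fn, k=10):
    """Return the k (distance, index) pairs with the smallest distance to `query`."""
    scored = (
        (distance_fn(query, candidate), index)
        for index, candidate in enumerate(candidates)
    )
    return heapq.nsmallest(k, scored)


job = ["pr", "manager", "marketing"]
resumes_tokens = [
    ["software", "engineer"],
    ["pr", "manager"],
    ["marketing", "manager", "pr"],
]

for dist, index in top_k_closest(job, resumes_tokens, toy_distance, k=2):
    print(index, round(dist, 3))
# 2 0.0
# 1 0.333
```

In the notebook, `distance_fn` could be a small wrapper around the model's WMD call. The distance computation itself still runs once per resume, so the saving is only in the ranking step.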


In [100]:
chosen_job = pick_job()

There are 13124 jobs

selected job is ID #3187
Job Posting ID is: 3187
Job Posting Title: PR Manager
Job Posting Company: Megafood LLC
Job Posting Description: Megafood LLC is seeking a candidate for the position
of PR Manager who will coordinate the company's market research,
marketing stratagy, product development and public relations activities.
The PR Manager will initiate, develop and manage PR programs to support
and enhance the overall marketing, sales and business goals of the
company.


In [101]:
given_job_find_similar_resumes(chosen_job)

2020-04-16 14:39:59,608 : INFO : adding document #0 to Dictionary(0 unique tokens: [])
2020-04-16 14:39:59,614 : INFO : built Dictionary(31 unique tokens: [' ', "'", ',', '[', ']']...) from 2 documents (total 7820 corpus positions)
2020-04-16 14:39:59,634 : INFO : adding document #0 to Dictionary(0 unique tokens: [])
2020-04-16 14:39:59,637 : INFO : built Dictionary(31 unique tokens: [' ', "'", ',', '[', ']']...) from 2 documents (total 2855 corpus positions)
2020-04-16 14:39:59,659 : INFO : adding document #0 to Dictionary(0 unique tokens: [])
2020-04-16 14:39:59,663 : INFO : built Dictionary(31 unique tokens: [' ', "'", ',', '[', ']']...) from 2 documents (total 8309 corpus positions)
2020-04-16 14:39:59,694 : INFO : adding document #0 to Dictionary(0 unique tokens: [])
2020-04-16 14:39:59,705 : INFO : built Dictionary(31 unique tokens: [' ', "'", ',', '[', ']']...) from 2 documents (total 11208 corpus positions)
2020-04-16 14:40:35,981 : INFO : adding document #0 to Dictionary(0 unique tokens: [])
2020-04-16 14:40:35,999 : INFO : built Dictionary(31 unique tokens: [' ', "'", ',', '[', ']']...) from 2 documents (total 4908 corpus positions)
2020-04-16 14:40:36,046 : INFO : adding document #0 to Dictionary(0 unique tokens: [])
2020-04-16 14:40:36,051 : INFO : built Dictionary(31 unique tokens: [' ', "'", ',', '[', ']']...) from 2 documents (total 2636 corpus positions)
2020-04-16 14:40:36,157 : INFO : adding document #0 to Dictionary(0 unique tokens: [])
2020-04-16 14:40:36,170 : INFO : built Dictionary(31 unique tokens: [' ', "'", ',', '[', ']']...) from 2 documents (total 2568 corpus positions)
2020-04-16 14:40:36,222 : INFO : adding document #0 to Dictionary(0 unique tokens: [])
2020-04-16 14:40:36,236 : INFO : built Dictionary(31 unique tokens: [' '

2020-04-16 14:40:39,849 : INFO : adding document #0 to Dictionary(0 unique tokens: [])
2020-04-16 14:40:39,854 : INFO : built Dictionary(31 unique tokens: [' ', "'", ',', '[', ']']...) from 2 documents (total 6268 corpus positions)
2020-04-16 14:40:39,903 : INFO : adding document #0 to Dictionary(0 unique tokens: [])
2020-04-16 14:40:39,905 : INFO : built Dictionary(31 unique tokens: [' ', "'", ',', '[', ']']...) from 2 documents (total 3470 corpus positions)
2020-04-16 14:40:39,945 : INFO : adding document #0 to Dictionary(0 unique tokens: [])
2020-04-16 14:40:39,948 : INFO : built Dictionary(31 unique tokens: [' ', "'", ',', '[', ']']...) from 2 documents (total 3482 corpus positions)
2020-04-16 14:40:40,060 : INFO : adding document #0 to Dictionary(0 unique tokens: [])
2020-04-16 14:40:40,080 : INFO : built Dictionary(31 unique tokens: [' ', "'", ',', '[', ']']...) from 2 documents (total 2392 corpus positions)
2020-04-16 14:40:40,117 : INFO : adding document #0 to Dictionary(0 uniq

2020-04-16 14:40:41,876 : INFO : built Dictionary(31 unique tokens: [' ', "'", ',', '[', ']']...) from 2 documents (total 3251 corpus positions)
2020-04-16 14:40:41,915 : INFO : adding document #0 to Dictionary(0 unique tokens: [])
2020-04-16 14:40:41,919 : INFO : built Dictionary(31 unique tokens: [' ', "'", ',', '[', ']']...) from 2 documents (total 3888 corpus positions)
2020-04-16 14:40:41,944 : INFO : adding document #0 to Dictionary(0 unique tokens: [])
2020-04-16 14:40:41,950 : INFO : built Dictionary(31 unique tokens: [' ', "'", ',', '[', ']']...) from 2 documents (total 4523 corpus positions)
2020-04-16 14:40:41,981 : INFO : adding document #0 to Dictionary(0 unique tokens: [])
2020-04-16 14:40:41,985 : INFO : built Dictionary(31 unique tokens: [' ', "'", ',', '[', ']']...) from 2 documents (total 4642 corpus positions)
2020-04-16 14:40:42,009 : INFO : adding document #0 to Dictionary(0 unique tokens: [])
2020-04-16 14:40:42,012 : INFO : built Dictionary(31 unique tokens: [' '

2020-04-16 14:40:43,296 : INFO : adding document #0 to Dictionary(0 unique tokens: [])
2020-04-16 14:40:43,299 : INFO : built Dictionary(30 unique tokens: [' ', "'", ',', '[', ']']...) from 2 documents (total 1743 corpus positions)
2020-04-16 14:40:43,345 : INFO : adding document #0 to Dictionary(0 unique tokens: [])
2020-04-16 14:40:43,348 : INFO : built Dictionary(31 unique tokens: [' ', "'", ',', '[', ']']...) from 2 documents (total 3927 corpus positions)
2020-04-16 14:40:43,378 : INFO : adding document #0 to Dictionary(0 unique tokens: [])
2020-04-16 14:40:43,381 : INFO : built Dictionary(31 unique tokens: [' ', "'", ',', '[', ']']...) from 2 documents (total 3564 corpus positions)
2020-04-16 14:40:43,424 : INFO : adding document #0 to Dictionary(0 unique tokens: [])
2020-04-16 14:40:43,437 : INFO : built Dictionary(31 unique tokens: [' ', "'", ',', '[', ']']...) from 2 documents (total 22275 corpus positions)
2020-04-16 14:40:43,471 : INFO : adding document #0 to Dictionary(0 uni

2020-04-16 14:40:44,947 : INFO : built Dictionary(31 unique tokens: [' ', "'", ',', '[', ']']...) from 2 documents (total 5962 corpus positions)
2020-04-16 14:40:44,990 : INFO : adding document #0 to Dictionary(0 unique tokens: [])
2020-04-16 14:40:44,993 : INFO : built Dictionary(31 unique tokens: [' ', "'", ',', '[', ']']...) from 2 documents (total 2950 corpus positions)
2020-04-16 14:40:45,032 : INFO : adding document #0 to Dictionary(0 unique tokens: [])
2020-04-16 14:40:45,033 : INFO : built Dictionary(29 unique tokens: [' ', "'", ',', '[', ']']...) from 2 documents (total 1154 corpus positions)
2020-04-16 14:40:45,073 : INFO : adding document #0 to Dictionary(0 unique tokens: [])
2020-04-16 14:40:45,084 : INFO : built Dictionary(31 unique tokens: [' ', "'", ',', '[', ']']...) from 2 documents (total 39307 corpus positions)
2020-04-16 14:40:45,130 : INFO : adding document #0 to Dictionary(0 unique tokens: [])
2020-04-16 14:40:45,133 : INFO : built Dictionary(31 unique tokens: [' 

2020-04-16 14:40:46,909 : INFO : adding document #0 to Dictionary(0 unique tokens: [])
2020-04-16 14:40:46,912 : INFO : built Dictionary(31 unique tokens: [' ', "'", ',', '[', ']']...) from 2 documents (total 6242 corpus positions)
2020-04-16 14:40:46,966 : INFO : adding document #0 to Dictionary(0 unique tokens: [])
2020-04-16 14:40:46,978 : INFO : built Dictionary(31 unique tokens: [' ', "'", ',', '[', ']']...) from 2 documents (total 28588 corpus positions)
2020-04-16 14:40:47,019 : INFO : adding document #0 to Dictionary(0 unique tokens: [])
2020-04-16 14:40:47,025 : INFO : built Dictionary(31 unique tokens: [' ', "'", ',', '[', ']']...) from 2 documents (total 4908 corpus positions)
2020-04-16 14:40:47,072 : INFO : adding document #0 to Dictionary(0 unique tokens: [])
2020-04-16 14:40:47,078 : INFO : built Dictionary(31 unique tokens: [' ', "'", ',', '[', ']']...) from 2 documents (total 7277 corpus positions)
2020-04-16 14:40:47,143 : INFO : adding document #0 to Dictionary(0 uni

2020-04-16 14:40:48,621 : INFO : built Dictionary(31 unique tokens: [' ', "'", ',', '[', ']']...) from 2 documents (total 41529 corpus positions)
2020-04-16 14:40:48,674 : INFO : adding document #0 to Dictionary(0 unique tokens: [])
2020-04-16 14:40:48,677 : INFO : built Dictionary(30 unique tokens: [' ', "'", ',', '[', ']']...) from 2 documents (total 6530 corpus positions)
2020-04-16 14:40:48,712 : INFO : adding document #0 to Dictionary(0 unique tokens: [])
2020-04-16 14:40:48,716 : INFO : built Dictionary(31 unique tokens: [' ', "'", ',', '[', ']']...) from 2 documents (total 9016 corpus positions)
2020-04-16 14:40:48,786 : INFO : adding document #0 to Dictionary(0 unique tokens: [])
2020-04-16 14:40:48,790 : INFO : built Dictionary(31 unique tokens: [' ', "'", ',', '[', ']']...) from 2 documents (total 6559 corpus positions)
2020-04-16 14:40:48,825 : INFO : adding document #0 to Dictionary(0 unique tokens: [])
2020-04-16 14:40:48,827 : INFO : built Dictionary(31 unique tokens: [' 

2020-04-16 14:40:50,362 : INFO : adding document #0 to Dictionary(0 unique tokens: [])
2020-04-16 14:40:50,365 : INFO : built Dictionary(31 unique tokens: [' ', "'", ',', '[', ']']...) from 2 documents (total 3861 corpus positions)
2020-04-16 14:40:50,411 : INFO : adding document #0 to Dictionary(0 unique tokens: [])
2020-04-16 14:40:50,414 : INFO : built Dictionary(31 unique tokens: [' ', "'", ',', '[', ']']...) from 2 documents (total 5614 corpus positions)
2020-04-16 14:40:50,449 : INFO : adding document #0 to Dictionary(0 unique tokens: [])
2020-04-16 14:40:50,457 : INFO : built Dictionary(31 unique tokens: [' ', "'", ',', '[', ']']...) from 2 documents (total 5435 corpus positions)
2020-04-16 14:40:50,499 : INFO : adding document #0 to Dictionary(0 unique tokens: [])
2020-04-16 14:40:50,509 : INFO : built Dictionary(31 unique tokens: [' ', "'", ',', '[', ']']...) from 2 documents (total 4115 corpus positions)
2020-04-16 14:40:50,541 : INFO : adding document #0 to Dictionary(0 uniq

2020-04-16 14:40:52,243 : INFO : built Dictionary(31 unique tokens: [' ', "'", ',', '[', ']']...) from 2 documents (total 9255 corpus positions)
2020-04-16 14:40:52,282 : INFO : adding document #0 to Dictionary(0 unique tokens: [])
2020-04-16 14:40:52,286 : INFO : built Dictionary(31 unique tokens: [' ', "'", ',', '[', ']']...) from 2 documents (total 4826 corpus positions)
2020-04-16 14:40:52,324 : INFO : adding document #0 to Dictionary(0 unique tokens: [])
2020-04-16 14:40:52,329 : INFO : built Dictionary(31 unique tokens: [' ', "'", ',', '[', ']']...) from 2 documents (total 4245 corpus positions)
2020-04-16 14:40:52,376 : INFO : adding document #0 to Dictionary(0 unique tokens: [])
2020-04-16 14:40:52,379 : INFO : built Dictionary(31 unique tokens: [' ', "'", ',', '[', ']']...) from 2 documents (total 8373 corpus positions)
2020-04-16 14:40:52,411 : INFO : adding document #0 to Dictionary(0 unique tokens: [])
2020-04-16 14:40:52,415 : INFO : built Dictionary(31 unique tokens: [' '

2020-04-16 14:40:53,682 : INFO : adding document #0 to Dictionary(0 unique tokens: [])
2020-04-16 14:40:53,685 : INFO : built Dictionary(31 unique tokens: [' ', "'", ',', '[', ']']...) from 2 documents (total 3095 corpus positions)
2020-04-16 14:40:53,720 : INFO : adding document #0 to Dictionary(0 unique tokens: [])
2020-04-16 14:40:53,726 : INFO : built Dictionary(31 unique tokens: [' ', "'", ',', '[', ']']...) from 2 documents (total 3894 corpus positions)
2020-04-16 14:40:53,786 : INFO : adding document #0 to Dictionary(0 unique tokens: [])
2020-04-16 14:40:53,789 : INFO : built Dictionary(31 unique tokens: [' ', "'", ',', '[', ']']...) from 2 documents (total 4723 corpus positions)
2020-04-16 14:40:53,864 : INFO : adding document #0 to Dictionary(0 unique tokens: [])
2020-04-16 14:40:53,873 : INFO : built Dictionary(31 unique tokens: [' ', "'", ',', '[', ']']...) from 2 documents (total 27516 corpus positions)
2020-04-16 14:40:53,905 : INFO : adding document #0 to Dictionary(0 uni

2020-04-16 14:40:55,193 : INFO : built Dictionary(31 unique tokens: [' ', "'", ',', '[', ']']...) from 2 documents (total 5572 corpus positions)
2020-04-16 14:40:55,236 : INFO : adding document #0 to Dictionary(0 unique tokens: [])
2020-04-16 14:40:55,243 : INFO : built Dictionary(31 unique tokens: [' ', "'", ',', '[', ']']...) from 2 documents (total 5626 corpus positions)
2020-04-16 14:40:55,278 : INFO : adding document #0 to Dictionary(0 unique tokens: [])
2020-04-16 14:40:55,281 : INFO : built Dictionary(31 unique tokens: [' ', "'", ',', '[', ']']...) from 2 documents (total 8698 corpus positions)
2020-04-16 14:40:55,313 : INFO : adding document #0 to Dictionary(0 unique tokens: [])
2020-04-16 14:40:55,320 : INFO : built Dictionary(30 unique tokens: [' ', "'", ',', '[', ']']...) from 2 documents (total 3091 corpus positions)
2020-04-16 14:40:55,344 : INFO : adding document #0 to Dictionary(0 unique tokens: [])
2020-04-16 14:40:55,349 : INFO : built Dictionary(31 unique tokens: [' '

2020-04-16 14:40:56,789 : INFO : adding document #0 to Dictionary(0 unique tokens: [])
2020-04-16 14:40:56,791 : INFO : built Dictionary(31 unique tokens: [' ', "'", ',', '[', ']']...) from 2 documents (total 2985 corpus positions)
2020-04-16 14:40:56,822 : INFO : adding document #0 to Dictionary(0 unique tokens: [])
2020-04-16 14:40:56,827 : INFO : built Dictionary(30 unique tokens: [' ', "'", ',', '[', ']']...) from 2 documents (total 2775 corpus positions)
2020-04-16 14:40:56,857 : INFO : adding document #0 to Dictionary(0 unique tokens: [])
2020-04-16 14:40:56,859 : INFO : built Dictionary(31 unique tokens: [' ', "'", ',', '[', ']']...) from 2 documents (total 3495 corpus positions)
2020-04-16 14:40:56,900 : INFO : adding document #0 to Dictionary(0 unique tokens: [])
2020-04-16 14:40:56,902 : INFO : built Dictionary(31 unique tokens: [' ', "'", ',', '[', ']']...) from 2 documents (total 3138 corpus positions)
2020-04-16 14:40:56,935 : INFO : adding document #0 to Dictionary(0 uniq

2020-04-16 14:40:58,204 : INFO : built Dictionary(31 unique tokens: [' ', "'", ',', '[', ']']...) from 2 documents (total 6579 corpus positions)
2020-04-16 14:40:58,244 : INFO : adding document #0 to Dictionary(0 unique tokens: [])
2020-04-16 14:40:58,269 : INFO : built Dictionary(30 unique tokens: [' ', "'", ',', '[', ']']...) from 2 documents (total 1762 corpus positions)
2020-04-16 14:40:58,324 : INFO : adding document #0 to Dictionary(0 unique tokens: [])
2020-04-16 14:40:58,326 : INFO : built Dictionary(31 unique tokens: [' ', "'", ',', '[', ']']...) from 2 documents (total 3209 corpus positions)
2020-04-16 14:40:58,370 : INFO : adding document #0 to Dictionary(0 unique tokens: [])
2020-04-16 14:40:58,373 : INFO : built Dictionary(30 unique tokens: [' ', "'", ',', '[', ']']...) from 2 documents (total 3422 corpus positions)
2020-04-16 14:40:58,419 : INFO : adding document #0 to Dictionary(0 unique tokens: [])
2020-04-16 14:40:58,421 : INFO : built Dictionary(31 unique tokens: [' '

2020-04-16 14:40:59,770 : INFO : adding document #0 to Dictionary(0 unique tokens: [])
2020-04-16 14:40:59,772 : INFO : built Dictionary(31 unique tokens: [' ', "'", ',', '[', ']']...) from 2 documents (total 4181 corpus positions)
2020-04-16 14:40:59,802 : INFO : adding document #0 to Dictionary(0 unique tokens: [])
2020-04-16 14:40:59,804 : INFO : built Dictionary(31 unique tokens: [' ', "'", ',', '[', ']']...) from 2 documents (total 3147 corpus positions)
2020-04-16 14:40:59,844 : INFO : adding document #0 to Dictionary(0 unique tokens: [])
2020-04-16 14:40:59,857 : INFO : built Dictionary(31 unique tokens: [' ', "'", ',', '[', ']']...) from 2 documents (total 36454 corpus positions)
2020-04-16 14:40:59,890 : INFO : adding document #0 to Dictionary(0 unique tokens: [])
2020-04-16 14:40:59,894 : INFO : built Dictionary(30 unique tokens: [' ', "'", ',', '[', ']']...) from 2 documents (total 4199 corpus positions)
2020-04-16 14:40:59,932 : INFO : adding document #0 to Dictionary(0 uni
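
A note on the log above: every pairwise comparison builds a dictionary of about 30 unique tokens, and the first entries shown are `' '`, `"'"`, `','`, `'['` and `']'`. One plausible reading (an assumption, since the output does not confirm it) is that each resume reached the distance computation as the string form of a token list, as happens when a list column is saved to CSV and read back, so it was iterated character by character instead of word by word. The sketch below shows that effect using only the standard library, with a hypothetical helper `parse_token_list` that turns such a cell back into a list of tokens. The sample cell is made up for illustration.

```python
import ast


def parse_token_list(cell):
    """Return a list of string tokens from a CSV cell.

    Hypothetical helper: assumes the cell is either already a list or the
    repr of a list, e.g. "['digital', 'marketing']".
    """
    if isinstance(cell, list):
        return [str(token) for token in cell]
    parsed = ast.literal_eval(cell)  # safely evaluates Python literals only
    if not isinstance(parsed, list):
        raise ValueError("expected the string form of a list of tokens")
    return [str(token) for token in parsed]


# Made-up example of how a tokenized resume might look after a CSV round trip.
raw_cell = "['digital', 'marketing', 'product', 'management']"

# Iterating over the raw string yields characters, including the list
# punctuation, which is what a per-character vocabulary would contain.
chars_seen = sorted(set(raw_cell))

# Parsing first recovers the word-level tokens.
tokens = parse_token_list(raw_cell)

print(chars_seen[:5])
print(tokens)
```

If this reading is right, applying a helper like this to the token column before computing distances would make the comparison operate on words, and the reported vocabulary sizes would no longer be capped at around 31.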


Printing 10 most similar candidates...


#1 most similar job
Resume ID from list: 692
WM Distance: 0.16037376807236195
Resume ID from df: 693
Resume text (500 chars): matt arnzen mattmattarnzencom    portland professional summary marketing professional fifteen years progressive experience digital marketing product management user experience design accomplished creative development website operations digital platform management collaborating effectively stakeholders vendors clients digital marketing product management content management marketing automation website design development search engine optimization web analytics reporting customer relationship mana

#2 most similar job
Resume ID from list: 437
WM Distance: 0.16067475820688432
Resume ID from df: 438
Resume text (500 chars): steve jones business operations manager dayjob ltd big peg birmingham b nf    e infodayjobcom personal statment e infodayjobcom progressive business operations manager particular strength driving performa