Modified doc-topic mapper to use new output type.
Returning a plain dict is more efficient and avoids pickling problems when sending doc datatypes between processes.
markgw committed Feb 10, 2020
1 parent ff33602 commit a3d779a
Showing 1 changed file with 1 addition and 1 deletion.
@@ -16,7 +16,7 @@ def process_document(worker, archive_name, doc_name, doc):
     # Use the LDA model to infer a topic vector for the document
     topic_weights = dict(worker.model[bow])
     # The weights are a sparse vector: fill in the relevant values and leave the rest as 0
-    return worker.info.document(vector=[topic_weights.get(i, 0.) for i in range(worker.model.num_topics)])
+    return {"vector": [topic_weights.get(i, 0.) for i in range(worker.model.num_topics)]}


ModuleExecutor = multiprocessing_executor_factory(process_document, worker_set_up_fn=worker_set_up)
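
For illustration only, here is a minimal standalone sketch of the two ideas in this change: filling a sparse topic vector out to a dense list, and returning a plain dict that pickles cleanly. It assumes the model yields `(topic_id, weight)` pairs, as the `dict(worker.model[bow])` call suggests. The helper name `dense_topic_vector` and the sample values are hypothetical and not part of the commit.

```python
import pickle


def dense_topic_vector(sparse_weights, num_topics):
    """Expand sparse (topic_id, weight) pairs into a dense list of length num_topics."""
    weights = dict(sparse_weights)
    # Topics missing from the sparse output get a weight of 0
    return [weights.get(i, 0.) for i in range(num_topics)]


# Hypothetical sparse output for a 5-topic model (assumed shape, not real data)
sparse = [(1, 0.7), (3, 0.3)]
result = {"vector": dense_topic_vector(sparse, 5)}

# A dict holding only built-in types survives a pickle round trip unchanged,
# which is what matters when results are passed between worker processes
restored = pickle.loads(pickle.dumps(result))
```

Because the result contains only built-in types, it can be sent between processes without the receiving side needing to reconstruct a custom document class.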
