2vec saveload fixes (#11)
* Make docs clearer on `alpha` parameter in LDA model

* Update Hoffman paper link

* rm whitespace

* Update gensim/models/ldamodel.py

* Update gensim/models/ldamodel.py

* Update gensim/models/ldamodel.py

* re-applying changes from piskvorky#2821

* migrating + regenerating changed docs

* fix forgotten iteritems

* remove extra `model.wv`

* split overlong doc line

* get rid of six in doc2vec

* increase test timeout for Visdom server

* add 32/64 bits report

* add deprecations for init_sims()

* remove vectors_norm + add link to migration guide to deprecation warnings

* rename vectors_norm everywhere, update tests, regen docs

* put back no-op property setter of deprecated vectors_norm

* fix typo

* fix flake8

* disable Keras tests
- failing with weird errors on py3.7+3.8, see https://travis-ci.org/github/RaRe-Technologies/gensim/jobs/713448950#L862

* test showing FT failure as W2V

* set .vectors even when ngrams off

* Update gensim/test/test_fasttext.py

* Update gensim/test/test_fasttext.py

* refresh docs for run_annoy tutorial

* Reduce memory use of the term similarity matrix constructor, deprecate the positive_definite parameter, and extend normalization capabilities of the inner_product method (piskvorky#2783)

* Deprecate SparseTermSimilarityMatrix's positive_definite parameter

* Reference paper on efficient implementation of soft cosine similarity

* Add example with Annoy indexer to SparseTermSimilarityMatrix

* Add example of obtaining word embeddings from SparseTermSimilarityMatrix

* Reduce space complexity of SparseTermSimilarityMatrix construction
Build matrix using arrays and bitfields rather than DOK sparse format

This work is based on the following blog post by @maciejkula:
https://maciejkula.github.io/2015/02/22/incremental-construction-of-sparse-matrices/

* Fix a typo in the soft cosine similarity Jupyter notebook

* Add human-readable string representation for TermSimilarityIndex

* Avoid sparse term similarity matrix computation when nonzero_limit <= 0

* Extend normalization in the inner_product method

Support the `maintain` vector normalization scheme.
Support separate vector normalization schemes for queries and documents.

* Remove a note in the docstring of SparseTermSimilarityMatrix

* Rerun continuous integration tests

* Use ==/!= to compare constant literals

* Add human-readable string representation for TermSimilarityIndex (cont.)

* Prod flake8 with a coding style violation in a docstring

* Collapse two lambdas into one internal function

* Revert "Prod flake8 with a coding style violation in a docstring"

This reverts commit 6557b84.

* Avoid str.format()

* Slice SparseTermSimilarityMatrix.inner_product tests by input types

* Remove similarity_type_code local variable

* Remove starting underscore from local function name

* Save indentation level and define populate_buffers function

* Extract SparseTermSimilarityMatrix constructor body to _create_source

* Extract NON_NEGATIVE_NORM_ASSERTION_MESSAGE to a module-level constant

* Extract cell assignment logic to cell_full local function

* Split variable swapping into three separate statements

* Extract normalization from the body of SparseTermSimilarityMatrix.inner_product

* Wrap overlong line

* Add test_inner_product_zerovector_zerovector and test_inner_product_zerovector_vector tests

* Further split test_inner_product into 63 test cases

* Raise ValueError when dictionary is empty

* Fix doc2vec crash for large sets of doc-vectors (piskvorky#2907)

* Fix AttributeError in WikiCorpus (piskvorky#2901)

* bug fix: wikicorpus getstream from data file-path \n Replace fname with input

* refactor: use property decorator for input

Co-authored-by: jshah02 <jenisnehal.shah@factset.com>

* intensify cbow+hs tests; bulk testing method

* use increment operator

Co-authored-by: Radim Řehůřek <me@radimrehurek.com>

* Change num_words to topn in dtm_coherence (piskvorky#2926)

* docstring fixes

* get rid of python2 constructs

Co-authored-by: S Mono <10430241+xh2@users.noreply.github.com>
Co-authored-by: Gordon Mohr <gojogit@gmail.com>
Co-authored-by: Vít Novotný <witiko@mail.muni.cz>
Co-authored-by: jeni Shah <jenishah@users.noreply.github.com>
Co-authored-by: jshah02 <jenisnehal.shah@factset.com>
Co-authored-by: Megan <megan.stodel@bbc.co.uk>
7 people committed Sep 8, 2020
1 parent 0316084 commit b5794ee
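
The commit message mentions deprecating `init_sims()`, removing `vectors_norm`, and putting back a no-op property setter for it. The sketch below illustrates that general pattern only; it is not the code from this commit. The class `Vectors`, its warning messages, and the choice to have the getter warn and compute on demand are all assumptions made for illustration.

```python
import math
import warnings


class Vectors:
    """Toy container (hypothetical, not gensim's KeyedVectors)."""

    def __init__(self, vectors):
        self.vectors = vectors

    def get_normed_vectors(self):
        """Return unit-length copies, computed on demand rather than cached."""
        normed = []
        for row in self.vectors:
            length = math.sqrt(sum(x * x for x in row))
            # Leave all-zero rows untouched to avoid dividing by zero.
            normed.append([x / length for x in row] if length else list(row))
        return normed

    @property
    def vectors_norm(self):
        # Assumed behaviour: warn, then compute the value dynamically.
        warnings.warn(
            "vectors_norm is deprecated; use get_normed_vectors()",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.get_normed_vectors()

    @vectors_norm.setter
    def vectors_norm(self, value):
        # No-op: assignments from older code are silently ignored
        # instead of raising AttributeError.
        pass

    def init_sims(self, replace=False):
        # Deprecated entry point kept so existing call sites keep working.
        warnings.warn(
            "init_sims() is deprecated and has no effect in this sketch",
            DeprecationWarning,
            stacklevel=2,
        )
```

With this shape, `v.vectors_norm = something` succeeds without storing anything, while reading `v.vectors_norm` still returns normalized rows and flags the deprecation.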
Showing 65 changed files with 1,825 additions and 1,105 deletions.
1 change: 1 addition & 0 deletions ISSUE_TEMPLATE.md
@@ -22,6 +22,7 @@ Please provide the output of:
 ```python
 import platform; print(platform.platform())
 import sys; print("Python", sys.version)
+import struct; print("Bits", 8 * struct.calcsize("P"))
 import numpy; print("NumPy", numpy.__version__)
 import scipy; print("SciPy", scipy.__version__)
 import gensim; print("gensim", gensim.__version__)
4 changes: 2 additions & 2 deletions docs/notebooks/soft_cosine_tutorial.ipynb
@@ -225,7 +225,7 @@
 "name": "stdout",
 "output_type": "stream",
 "text": [
-"Number of documents: 3\n",
+"Number of documents: 2274338\n",
 "CPU times: user 2min 1s, sys: 1.9 s, total: 2min 3s\n",
 "Wall time: 2min 56s\n"
 ]
@@ -259,7 +259,7 @@
 " [preprocess(relcomment[\"RelCText\"]) for relcomment in thread[\"RelComments\"]])\n",
 " for thread in api.load(\"semeval-2016-2017-task3-subtaskA-unannotated\")]))\n",
 "\n",
-"print(\"Number of documents: %d\" % len(documents))"
+"print(\"Number of documents: %d\" % len(corpus))"
 ]
 },
 {
6 changes: 3 additions & 3 deletions docs/src/_matutils.rst
@@ -1,8 +1,8 @@
-:mod:`_matutils` -- Cython matutils
-===================================
+:mod:`_matutils` -- Compiled extension for math utils
+=====================================================

 .. automodule:: gensim._matutils
-:synopsis: Cython math utils
+:synopsis: Compiled extension for math utils
 :members:
 :inherited-members:
 :undoc-members:
5 changes: 3 additions & 2 deletions docs/src/apiref.rst
@@ -50,6 +50,7 @@ Modules:
 models/_fasttext_bin
 models/phrases
 models/poincare
+viz/poincare
 models/coherencemodel
 models/basemodel
 models/callbacks
@@ -63,7 +64,8 @@ Modules:
 models/wrappers/varembed
 similarities/docsim
 similarities/termsim
-similarities/index
+similarities/annoy
+similarities/nmslib
 sklearn_api/atmodel
 sklearn_api/d2vmodel
 sklearn_api/hdp
@@ -102,4 +104,3 @@ Modules:
 summarization/summariser
 summarization/syntactic_unit
 summarization/textcleaner
-viz/poincare
2 changes: 1 addition & 1 deletion docs/src/auto_examples/core/run_similarity_queries.py.md5
@@ -1 +1 @@
-a3eaf7347874a32d1d25a455753206dc
+54804120deb345715247f0eed42b5e0e
28 changes: 14 additions & 14 deletions docs/src/auto_examples/core/run_similarity_queries.rst
@@ -142,7 +142,7 @@ no random-walk static ranks, just a semantic extension over the boolean keyword

 .. code-block:: none
-[(0, 0.4618210045327158), (1, 0.07002766527900064)]
+[(0, 0.46182100453271613), (1, 0.07002766527900031)]
@@ -254,15 +254,15 @@ order, and obtain the final answer to the query `"Human computer interaction"`:

 .. code-block:: none
-(2, 0.9984453) Human machine interface for lab abc computer applications
-(0, 0.998093) A survey of user opinion of computer system response time
-(3, 0.9865886) The EPS user interface management system
-(1, 0.93748635) System and human system engineering testing of EPS
-(4, 0.90755945) Relation of user perceived response time to error measurement
-(8, 0.050041765) The generation of random binary unordered trees
-(7, -0.09879464) The intersection graph of paths in trees
-(6, -0.10639259) Graph minors IV Widths of trees and well quasi ordering
-(5, -0.12416792) Graph minors A survey
+0.9984453 The EPS user interface management system
+0.998093 Human machine interface for lab abc computer applications
+0.9865886 System and human system engineering testing of EPS
+0.93748635 A survey of user opinion of computer system response time
+0.90755945 Relation of user perceived response time to error measurement
+0.050041765 Graph minors A survey
+-0.09879464 Graph minors IV Widths of trees and well quasi ordering
+-0.10639259 The intersection graph of paths in trees
+-0.12416792 The generation of random binary unordered trees
@@ -319,17 +319,17 @@ on large datasets easily, and to facilitate prototyping of new algorithms for re

 .. code-block:: none
-/Volumes/work/workspace/gensim_misha/docs/src/gallery/core/run_similarity_queries.py:194: UserWarning: Matplotlib is currently using agg, which is a non-GUI backend, so cannot show the figure.
-plt.show()
+/Volumes/work/workspace/vew/gensim3.6/lib/python3.6/site-packages/matplotlib/figure.py:445: UserWarning: Matplotlib is currently using agg, which is a non-GUI backend, so cannot show the figure.
+% get_backend())
 .. rst-class:: sphx-glr-timing

-**Total running time of the script:** ( 0 minutes 0.663 seconds)
+**Total running time of the script:** ( 0 minutes 1.211 seconds)

-**Estimated memory usage:** 6 MB
+**Estimated memory usage:** 39 MB


 .. _sphx_glr_download_auto_examples_core_run_similarity_queries.py:
6 changes: 3 additions & 3 deletions docs/src/auto_examples/core/sg_execution_times.rst
@@ -5,9 +5,9 @@

 Computation times
 =================
-**00:00.844** total execution time for **auto_examples_core** files:
+**00:01.211** total execution time for **auto_examples_core** files:

-- **00:00.844**: :ref:`sphx_glr_auto_examples_core_run_topics_and_transformations.py` (``run_topics_and_transformations.py``)
+- **00:01.211**: :ref:`sphx_glr_auto_examples_core_run_similarity_queries.py` (``run_similarity_queries.py``)
 - **00:00.000**: :ref:`sphx_glr_auto_examples_core_run_core_concepts.py` (``run_core_concepts.py``)
 - **00:00.000**: :ref:`sphx_glr_auto_examples_core_run_corpora_and_vector_spaces.py` (``run_corpora_and_vector_spaces.py``)
-- **00:00.000**: :ref:`sphx_glr_auto_examples_core_run_similarity_queries.py` (``run_similarity_queries.py``)
+- **00:00.000**: :ref:`sphx_glr_auto_examples_core_run_topics_and_transformations.py` (``run_topics_and_transformations.py``)
