[MRG] Update README instructions + clean up testing (#2814)

* update README instructions
* WIP: enable test deps
* unpin old tensorflow in tests - old versions not present in newer Pythons
* looking into segfault in py3.6 - https://travis-ci.org/github/RaRe-Technologies/gensim/jobs/681096362
* put back pyemd
* put keras back
* put back tensorflow
* investigate segfault in py3.6
* address review comments
* avoid py3.6 segfault in Travis tests
piskvorky · May 1, 2020 · 29d1092
1 parent 996801b
commit 29d1092

Showing 6 changed files with 92 additions and 87 deletions.
diff --git a/.travis.yml b/.travis.yml
@@ -25,27 +25,29 @@ matrix:
     - python: '3.7'
       env:
         - TOXENV="py37-linux"
-        - BOTO_CONFIG="/dev/null"
+      # The following two lines used to be necessary because Travis left files lying around in ~/.aws/,
+      # messing up our tests. Now fixed since https://github.com/travis-ci/travis-ci/issues/7940
+        # - BOTO_CONFIG="/dev/null"
+      #sudo: true
       dist: xenial
-      sudo: true
 
     - python: '3.6'
       env: TOXENV="py36-linux"
 
 
 install:
   - pip install tox
-  - sudo apt-get install -y gdb  # install gdb
+  - sudo apt-get install -y gdb
 
 
 before_script:
-  - ulimit -c unlimited -S       # enable core dumps
+  - ulimit -c unlimited -S  # enable core dumps
 
 
 script: tox -vv
 
 
 after_failure:
   - pwd
-  - COREFILE=$(find . -maxdepth 1 -name "core*" | head -n 1) # find core file
+  - COREFILE=$(find . -maxdepth 1 -name "core*" | head -n 1)
   - if [[ -f "$COREFILE" ]]; then EXECFILE=$(gdb -c "$COREFILE" -batch | grep "Core was generated" | tr -d "\`" | cut -d' ' -f5); file "$COREFILE"; gdb -c "$COREFILE" "$EXECFILE" -x continuous_integration/debug.gdb -batch; fi
diff --git a/README.md b/README.md
@@ -49,13 +49,6 @@ If this feature list left you scratching your head, you can first read
 more about the [Vector Space Model] and [unsupervised document analysis]
 on Wikipedia.
 
-Support
-------------
-
-Ask open-ended or research questions on the [Gensim Mailing List](https://groups.google.com/forum/#!forum/gensim).
-
-Raise bugs on [Github](https://github.com/RaRe-Technologies/gensim/blob/develop/CONTRIBUTING.md) but **make sure you follow the [issue template](https://github.com/RaRe-Technologies/gensim/blob/develop/ISSUE_TEMPLATE.md)**. Issues that are not bugs or fail to follow the issue template will be closed without inspection.
-
 Installation
 ------------
 
@@ -69,23 +62,23 @@ NumPy. This is optional, but using an optimized BLAS such as [ATLAS] or
 magnitude. On OS X, NumPy picks up the BLAS that comes with it
 automatically, so you don’t need to do anything special.
 
-The simple way to install gensim is:
+Install the latest version of gensim:
 
-    pip install -U gensim
+```bash
+    pip install --upgrade gensim
+```
 
 Or, if you have instead downloaded and unzipped the [source tar.gz]
-package, you’d run:
+package:
 
-    python setup.py test
+```bash
     python setup.py install
+```
 
-For alternative modes of installation (without root privileges,
-development installation, optional install features), see the
-[documentation].
+For alternative modes of installation, see the [documentation].
 
-This version has been tested under Python 2.7, 3.5 and 3.6. Gensim’s github repo is hooked
-against [Travis CI for automated testing] on every commit push and pull
-request. Support for Python 2.6, 3.3 and 3.4 was dropped in gensim 1.0.0. Install gensim 0.13.4 if you *must* use Python 2.6, 3.3 or 3.4. Support for Python 2.5 was dropped in gensim 0.10.0; install gensim 0.9.1 if you *must* use Python 2.5). 
+Gensim is being [continuously tested](https://travis-ci.org/RaRe-Technologies/gensim) under Python 3.6, 3.7 and 3.8.
+Support for Python 2.7 was dropped in gensim 4.0.0 – install gensim 3.8.3 if you must use Python 2.7.
 
 How come gensim is so fast and memory efficient? Isn’t it pure Python, and isn’t Python slow and greedy?
 --------------------------------------------------------------------------------------------------------
@@ -113,14 +106,21 @@ Documentation
   [Tutorials]: https://radimrehurek.com/gensim/auto_examples/
   [Official Documentation and Walkthrough]: http://radimrehurek.com/gensim/
   [Official API Documentation]: http://radimrehurek.com/gensim/apiref.html
-
+
+Support
+-------
+
+Ask open-ended or research questions on the [Gensim Mailing List](https://groups.google.com/forum/#!forum/gensim).
+
+Raise bugs on [Github](https://github.com/RaRe-Technologies/gensim/blob/develop/CONTRIBUTING.md) but **make sure you follow the [issue template](https://github.com/RaRe-Technologies/gensim/blob/develop/ISSUE_TEMPLATE.md)**. Issues that are not bugs or fail to follow the issue template will be closed without inspection.
+
 ---------
 
 Adopters
 --------
 
 | Company | Logo | Industry | Use of Gensim |
-|---------|------|----------|---------------|                          
+|---------|------|----------|---------------|
 | [RARE Technologies](http://rare-technologies.com) | ![rare](docs/src/readme_images/rare.png) | ML & NLP consulting | Creators of Gensim – this is us! |
 | [Amazon](http://www.amazon.com/) |  ![amazon](docs/src/readme_images/amazon.png) | Retail |  Document similarity. |
 | [National Institutes of Health](https://github.com/NIHOPA/pipeline_word2vec) | ![nih](docs/src/readme_images/nih.png) | Health | Processing grants and publications with word2vec. |
@@ -169,8 +169,8 @@ BibTeX entry:
   [Talentpair]: https://avatars3.githubusercontent.com/u/8418395?v=3&s=100
   [citing gensim in academic papers and theses]: https://scholar.google.cz/citations?view_op=view_citation&hl=en&user=9vG_kV0AAAAJ&citation_for_view=9vG_kV0AAAAJ:u-x6o8ySG0sC
 
-  
-  
+
+
   [documentation and Jupyter Notebook tutorials]: https://github.com/RaRe-Technologies/gensim/#documentation
   [Vector Space Model]: http://en.wikipedia.org/wiki/Vector_space_model
   [unsupervised document analysis]: http://en.wikipedia.org/wiki/Latent_semantic_indexing

diff --git a/gensim/similarities/nmslib.py b/gensim/similarities/nmslib.py
@@ -8,10 +8,10 @@
 Intro
 -----
 
-This module contains integration Nmslib with :class:`~gensim.models.word2vec.Word2Vec`,
+This module contains integration NMSLIB with :class:`~gensim.models.word2vec.Word2Vec`,
 :class:`~gensim.models.doc2vec.Doc2Vec`, :class:`~gensim.models.fasttext.FastText` and
 :class:`~gensim.models.keyedvectors.KeyedVectors`.
-To use nmslib, instantiate a :class:`~gensim.similarities.nmslib.NmslibIndexer` class
+To use NMSLIB, instantiate a :class:`~gensim.similarities.nmslib.NmslibIndexer` class
 and pass the instance as the indexer parameter to your model's most_similar method
 (e.g. :py:func:`~gensim.models.doc2vec.most_similar`).
 
@@ -50,23 +50,23 @@
     >>> model.most_similar("cat", topn=2, indexer=new_indexer)
     [('cat', 1.0), ('meow', 0.5595494508743286)]
 
-What is Nmslib
--------------
+What is NMSLIB
+--------------
 
 Non-Metric Space Library (NMSLIB) is an efficient cross-platform similarity search library and a toolkit
 for evaluation of similarity search methods. The core-library does not have any third-party dependencies.
-More information about Nmslib: `github repository <https://github.com/nmslib/nmslib>`_.
+More information about NMSLIB: `github repository <https://github.com/nmslib/nmslib>`_.
 
-Why use Nmslib?
--------------
+Why use NMSLIB?
+--------------
 
 The current implementation for finding k nearest neighbors in a vector space in gensim has linear complexity
 via brute force in the number of indexed documents, although with extremely low constant factors.
 The retrieved results are exact, which is an overkill in many applications:
 approximate results retrieved in sub-linear time may be enough.
-Nmslib can find approximate nearest neighbors much faster.
-Compared to annoy, nmslib has more parameters to control the build and query time and accuracy.
-Nmslib can achieve faster and more accurate nearest neighbors search than annoy.
+NMSLIB can find approximate nearest neighbors much faster.
+Compared to Annoy, NMSLIB has more parameters to control the build and query time and accuracy.
+NMSLIB can achieve faster and more accurate nearest neighbors search than Annoy.
 """
 
 from smart_open import open
@@ -84,12 +84,12 @@
     import nmslib
 except ImportError:
     raise ImportError(
-        "Nmslib has not been installed, if you wish to use the nmslib indexer, please run `pip install nmslib`"
+        "NMSLIB not installed. To use the NMSLIB indexer, please run `pip install nmslib`."
     )
 
 
 class NmslibIndexer(object):
-    """This class allows to use `Nmslib <https://github.com/nmslib/nmslib>`_ as indexer for `most_similar` method
+    """This class allows to use `NMSLIB <https://github.com/nmslib/nmslib>`_ as indexer for `most_similar` method
     from :class:`~gensim.models.word2vec.Word2Vec`, :class:`~gensim.models.doc2vec.Doc2Vec`,
     :class:`~gensim.models.fasttext.FastText` and :class:`~gensim.models.keyedvectors.Word2VecKeyedVectors` classes.
 
@@ -102,9 +102,9 @@ def __init__(self, model, index_params=None, query_time_params=None):
         model : :class:`~gensim.models.base_any2vec.BaseWordEmbeddingsModel`
             Model, that will be used as source for index.
         index_params : dict, optional
-            index_params for Nmslib indexer.
+            index_params for NMSLIB indexer.
         query_time_params : dict, optional
-            query_time_params for Nmslib indexer.
+            query_time_params for NMSLIB indexer.
 
         """
         if index_params is None:
@@ -179,21 +179,21 @@ def load(cls, fname):
         return nmslib_instance
 
     def _build_from_word2vec(self):
-        """Build an Nmslib index using word vectors from a Word2Vec model."""
+        """Build an NMSLIB index using word vectors from a Word2Vec model."""
 
         self.model.init_sims()
         self._build_from_model(self.model.wv.vectors_norm, self.model.wv.index2word)
 
     def _build_from_doc2vec(self):
-        """Build an Nmslib index using document vectors from a Doc2Vec model."""
+        """Build an NMSLIB index using document vectors from a Doc2Vec model."""
 
         docvecs = self.model.docvecs
         docvecs.init_sims()
         labels = [docvecs.index_to_doctag(i) for i in range(0, docvecs.count)]
         self._build_from_model(docvecs.vectors_docs_norm, labels)
 
     def _build_from_keyedvectors(self):
-        """Build an Nmslib index using word vectors from a KeyedVectors model."""
+        """Build an NMSLIB index using word vectors from a KeyedVectors model."""
 
         self.model.init_sims()
         self._build_from_model(self.model.vectors_norm, self.model.index2word)

diff --git a/gensim/test/test_keyedvectors.py b/gensim/test/test_keyedvectors.py
@@ -11,15 +11,15 @@
 
 import logging
 import unittest
-from mock import patch
 
+from mock import patch
 import numpy as np
 
 from gensim.corpora import Dictionary
-from gensim.models.keyedvectors import KeyedVectors, WordEmbeddingSimilarityIndex, \
-    FastTextKeyedVectors, REAL
+from gensim.models.keyedvectors import (
+    KeyedVectors, WordEmbeddingSimilarityIndex, FastTextKeyedVectors, REAL,
+)
 from gensim.test.utils import datapath
-
 import gensim.models.keyedvectors
 
 logger = logging.getLogger(__name__)

diff --git a/gensim/test/test_similarities.py b/gensim/test/test_similarities.py
@@ -544,8 +544,8 @@ class TestWord2VecAnnoyIndexer(unittest.TestCase):
     def setUp(self):
         try:
             import annoy  # noqa:F401
-        except ImportError:
-            raise unittest.SkipTest("Annoy library is not available")
+        except ImportError as e:
+            raise unittest.SkipTest("Annoy library is not available: %s" % e)
 
         from gensim.similarities.index import AnnoyIndexer
         self.indexer = AnnoyIndexer
@@ -648,8 +648,8 @@ class TestDoc2VecAnnoyIndexer(unittest.TestCase):
     def setUp(self):
         try:
             import annoy  # noqa:F401
-        except ImportError:
-            raise unittest.SkipTest("Annoy library is not available")
+        except ImportError as e:
+            raise unittest.SkipTest("Annoy library is not available: %s" % e)
 
         from gensim.similarities.index import AnnoyIndexer
 
@@ -707,8 +707,8 @@ class TestWord2VecNmslibIndexer(unittest.TestCase):
     def setUp(self):
         try:
             import nmslib  # noqa:F401
-        except ImportError:
-            raise unittest.SkipTest("Nmslib library is not available")
+        except ImportError as e:
+            raise unittest.SkipTest("NMSLIB library is not available: %s" % e)
 
         from gensim.similarities.nmslib import NmslibIndexer
         self.indexer = NmslibIndexer
@@ -800,8 +800,8 @@ class TestDoc2VecNmslibIndexer(unittest.TestCase):
     def setUp(self):
         try:
             import nmslib  # noqa:F401
-        except ImportError:
-            raise unittest.SkipTest("Nmslib library is not available")
+        except ImportError as e:
+            raise unittest.SkipTest("NMSLIB library is not available: %s" % e)
 
         from gensim.similarities.nmslib import NmslibIndexer
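
The `setUp` changes in `test_similarities.py` all follow one pattern: catch the `ImportError` for an optional dependency and pass its text into the `SkipTest` reason, so the test log shows why the import failed rather than only that it did. A minimal standalone sketch of that pattern follows. The helper `require_module` and the module name `hypothetical_missing_dependency_xyz` are illustrative assumptions, not part of gensim.

```python
import importlib
import io
import unittest


def require_module(name):
    """Import `name`, or raise SkipTest that carries the original ImportError text.

    Hypothetical helper for illustration; the commit inlines this logic in setUp.
    """
    try:
        return importlib.import_module(name)
    except ImportError as e:
        raise unittest.SkipTest("%s library is not available: %s" % (name, e))


class TestOptionalDependency(unittest.TestCase):
    def setUp(self):
        # Made-up module name, chosen so that the import fails and the test is skipped.
        self.module = require_module("hypothetical_missing_dependency_xyz")

    def test_uses_dependency(self):
        self.assertIsNotNone(self.module)


# Run the suite quietly and keep the result object for inspection.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestOptionalDependency)
result = unittest.TextTestRunner(stream=io.StringIO(), verbosity=0).run(suite)

# Each entry in result.skipped is a (test, reason) pair.
for _test, reason in result.skipped:
    print(reason)
```

The skip reason now includes the underlying import error message, which helps tell a genuinely missing package apart from one that is installed but fails to load (for example, because of a broken binary extension).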