[MRG] Update README instructions + clean up testing (#2814)
* update README instructions

* WIP: enable test deps

* unpin old tensorflow in tests
- old versions not present in newer Pythons

* looking into segfault in py3.6
- https://travis-ci.org/github/RaRe-Technologies/gensim/jobs/681096362

* put back pyemd

* put keras back

* put back tensorflow

* investigate segfault in py3.6

* address review comments

* avoid py3.6 segfault in Travis tests
piskvorky committed May 1, 2020
1 parent 996801b commit 29d1092
Showing 6 changed files with 92 additions and 87 deletions.
12 changes: 7 additions & 5 deletions .travis.yml
@@ -25,27 +25,29 @@ matrix:
- python: '3.7'
env:
- TOXENV="py37-linux"
- BOTO_CONFIG="/dev/null"
# The following two lines used to be necessary because Travis left files lying around in ~/.aws/,
# messing up our tests. Now fixed since https://github.com/travis-ci/travis-ci/issues/7940
# - BOTO_CONFIG="/dev/null"
#sudo: true
dist: xenial
sudo: true

- python: '3.6'
env: TOXENV="py36-linux"


install:
- pip install tox
- sudo apt-get install -y gdb # install gdb
- sudo apt-get install -y gdb


before_script:
- ulimit -c unlimited -S # enable core dumps
- ulimit -c unlimited -S # enable core dumps


script: tox -vv


after_failure:
- pwd
- COREFILE=$(find . -maxdepth 1 -name "core*" | head -n 1) # find core file
- COREFILE=$(find . -maxdepth 1 -name "core*" | head -n 1)
- if [[ -f "$COREFILE" ]]; then EXECFILE=$(gdb -c "$COREFILE" -batch | grep "Core was generated" | tr -d "\`" | cut -d' ' -f5); file "$COREFILE"; gdb -c "$COREFILE" "$EXECFILE" -x continuous_integration/debug.gdb -batch; fi
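The `after_failure` step above recovers the path of the crashed executable from gdb's "Core was generated by" banner using `grep | tr | cut`. Below is a minimal Python sketch of the same text extraction, for readers who want to follow what field 5 is. The function name and the sample gdb output are illustrative assumptions, not part of this commit.

```python
def executable_from_gdb_banner(gdb_output):
    """Return the 5th space-separated field of gdb's core banner, or None.

    Mirrors the shell pipeline: grep the banner line, delete backticks,
    then take field 5 when splitting on single spaces.
    """
    for line in gdb_output.splitlines():
        if "Core was generated" not in line:
            continue
        # Equivalent of `tr -d` on the backtick, then `cut -d' ' -f5`.
        fields = line.replace("`", "").split(" ")
        # cut counts fields from 1, so field 5 is index 4.
        return fields[4] if len(fields) >= 5 else ""
    return None


# Assumed sample of gdb's batch output for a crashed Python process.
sample = (
    "Core was generated by `/usr/bin/python3 -m pytest gensim/test'.\n"
    "Program terminated with signal SIGSEGV, Segmentation fault."
)
print(executable_from_gdb_banner(sample))
```

Note that, like the shell pipeline, this relies on the command line containing at least one argument after the executable. Without one, the closing quote and period of the banner stay attached to the path.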
42 changes: 21 additions & 21 deletions README.md
@@ -49,13 +49,6 @@ If this feature list left you scratching your head, you can first read
more about the [Vector Space Model] and [unsupervised document analysis]
on Wikipedia.

Support
------------

Ask open-ended or research questions on the [Gensim Mailing List](https://groups.google.com/forum/#!forum/gensim).

Raise bugs on [Github](https://github.com/RaRe-Technologies/gensim/blob/develop/CONTRIBUTING.md) but **make sure you follow the [issue template](https://github.com/RaRe-Technologies/gensim/blob/develop/ISSUE_TEMPLATE.md)**. Issues that are not bugs or fail to follow the issue template will be closed without inspection.

Installation
------------

@@ -69,23 +62,23 @@ NumPy. This is optional, but using an optimized BLAS such as [ATLAS] or
magnitude. On OS X, NumPy picks up the BLAS that comes with it
automatically, so you don’t need to do anything special.

The simple way to install gensim is:
Install the latest version of gensim:

pip install -U gensim
```bash
pip install --upgrade gensim
```

Or, if you have instead downloaded and unzipped the [source tar.gz]
package, you’d run:
package:

python setup.py test
```bash
python setup.py install
```

For alternative modes of installation (without root privileges,
development installation, optional install features), see the
[documentation].
For alternative modes of installation, see the [documentation].

This version has been tested under Python 2.7, 3.5 and 3.6. Gensim’s github repo is hooked
against [Travis CI for automated testing] on every commit push and pull
request. Support for Python 2.6, 3.3 and 3.4 was dropped in gensim 1.0.0. Install gensim 0.13.4 if you *must* use Python 2.6, 3.3 or 3.4. Support for Python 2.5 was dropped in gensim 0.10.0; install gensim 0.9.1 if you *must* use Python 2.5.
Gensim is being [continuously tested](https://travis-ci.org/RaRe-Technologies/gensim) under Python 3.6, 3.7 and 3.8.
Support for Python 2.7 was dropped in gensim 4.0.0 – install gensim 3.8.3 if you must use Python 2.7.
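As a rough illustration of the version guidance in the updated README text, the sketch below maps a Python version to the requirement string one might pass to `pip install`. The helper name is hypothetical, and treating every version from 3.6 upward as supported is an assumption; the README only states that 3.6, 3.7 and 3.8 are tested.

```python
import sys


def gensim_requirement(version_info=sys.version_info):
    """Suggest a pip requirement string for the given Python version.

    Returns None for versions the updated README says nothing about.
    """
    major_minor = tuple(version_info[:2])
    if major_minor == (2, 7):
        # Python 2.7 support was dropped in gensim 4.0.0.
        return "gensim==3.8.3"
    if major_minor >= (3, 6):
        # Assumption: versions newer than the tested 3.6-3.8 also work.
        return "gensim"
    return None


print(gensim_requirement())
```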

How come gensim is so fast and memory efficient? Isn’t it pure Python, and isn’t Python slow and greedy?
--------------------------------------------------------------------------------------------------------
@@ -113,14 +106,21 @@ Documentation
[Tutorials]: https://radimrehurek.com/gensim/auto_examples/
[Official Documentation and Walkthrough]: http://radimrehurek.com/gensim/
[Official API Documentation]: http://radimrehurek.com/gensim/apiref.html


Support
-------

Ask open-ended or research questions on the [Gensim Mailing List](https://groups.google.com/forum/#!forum/gensim).

Raise bugs on [Github](https://github.com/RaRe-Technologies/gensim/blob/develop/CONTRIBUTING.md) but **make sure you follow the [issue template](https://github.com/RaRe-Technologies/gensim/blob/develop/ISSUE_TEMPLATE.md)**. Issues that are not bugs or fail to follow the issue template will be closed without inspection.

---------

Adopters
--------

| Company | Logo | Industry | Use of Gensim |
|---------|------|----------|---------------|
|---------|------|----------|---------------|
| [RARE Technologies](http://rare-technologies.com) | ![rare](docs/src/readme_images/rare.png) | ML & NLP consulting | Creators of Gensim – this is us! |
| [Amazon](http://www.amazon.com/) | ![amazon](docs/src/readme_images/amazon.png) | Retail | Document similarity. |
| [National Institutes of Health](https://github.com/NIHOPA/pipeline_word2vec) | ![nih](docs/src/readme_images/nih.png) | Health | Processing grants and publications with word2vec. |
@@ -169,8 +169,8 @@ BibTeX entry:
[Talentpair]: https://avatars3.githubusercontent.com/u/8418395?v=3&s=100
[citing gensim in academic papers and theses]: https://scholar.google.cz/citations?view_op=view_citation&hl=en&user=9vG_kV0AAAAJ&citation_for_view=9vG_kV0AAAAJ:u-x6o8ySG0sC



[documentation and Jupyter Notebook tutorials]: https://github.com/RaRe-Technologies/gensim/#documentation
[Vector Space Model]: http://en.wikipedia.org/wiki/Vector_space_model
[unsupervised document analysis]: http://en.wikipedia.org/wiki/Latent_semantic_indexing
34 changes: 17 additions & 17 deletions gensim/similarities/nmslib.py
@@ -8,10 +8,10 @@
Intro
-----
This module contains integration Nmslib with :class:`~gensim.models.word2vec.Word2Vec`,
This module integrates NMSLIB with :class:`~gensim.models.word2vec.Word2Vec`,
:class:`~gensim.models.doc2vec.Doc2Vec`, :class:`~gensim.models.fasttext.FastText` and
:class:`~gensim.models.keyedvectors.KeyedVectors`.
To use nmslib, instantiate a :class:`~gensim.similarities.nmslib.NmslibIndexer` class
To use NMSLIB, instantiate a :class:`~gensim.similarities.nmslib.NmslibIndexer` class
and pass the instance as the indexer parameter to your model's most_similar method
(e.g. :py:func:`~gensim.models.doc2vec.most_similar`).
@@ -50,23 +50,23 @@
>>> model.most_similar("cat", topn=2, indexer=new_indexer)
[('cat', 1.0), ('meow', 0.5595494508743286)]
What is Nmslib
-------------
What is NMSLIB
--------------
Non-Metric Space Library (NMSLIB) is an efficient cross-platform similarity search library and a toolkit
for evaluation of similarity search methods. The core-library does not have any third-party dependencies.
More information about Nmslib: `github repository <https://github.com/nmslib/nmslib>`_.
More information about NMSLIB: `github repository <https://github.com/nmslib/nmslib>`_.
Why use Nmslib?
-------------
Why use NMSLIB?
--------------
The current implementation for finding k nearest neighbors in a vector space in gensim has linear complexity
via brute force in the number of indexed documents, although with extremely low constant factors.
The retrieved results are exact, which is an overkill in many applications:
approximate results retrieved in sub-linear time may be enough.
Nmslib can find approximate nearest neighbors much faster.
Compared to annoy, nmslib has more parameters to control the build and query time and accuracy.
Nmslib can achieve faster and more accurate nearest neighbors search than annoy.
NMSLIB can find approximate nearest neighbors much faster.
Compared to Annoy, NMSLIB has more parameters to control the build and query time and accuracy.
NMSLIB can achieve faster and more accurate nearest neighbors search than Annoy.
"""

from smart_open import open
@@ -84,12 +84,12 @@
import nmslib
except ImportError:
raise ImportError(
"Nmslib has not been installed, if you wish to use the nmslib indexer, please run `pip install nmslib`"
"NMSLIB not installed. To use the NMSLIB indexer, please run `pip install nmslib`."
)


class NmslibIndexer(object):
"""This class allows to use `Nmslib <https://github.com/nmslib/nmslib>`_ as indexer for `most_similar` method
"""This class allows to use `NMSLIB <https://github.com/nmslib/nmslib>`_ as indexer for `most_similar` method
from :class:`~gensim.models.word2vec.Word2Vec`, :class:`~gensim.models.doc2vec.Doc2Vec`,
:class:`~gensim.models.fasttext.FastText` and :class:`~gensim.models.keyedvectors.Word2VecKeyedVectors` classes.
@@ -102,9 +102,9 @@ def __init__(self, model, index_params=None, query_time_params=None):
model : :class:`~gensim.models.base_any2vec.BaseWordEmbeddingsModel`
Model, that will be used as source for index.
index_params : dict, optional
index_params for Nmslib indexer.
index_params for NMSLIB indexer.
query_time_params : dict, optional
query_time_params for Nmslib indexer.
query_time_params for NMSLIB indexer.
"""
if index_params is None:
@@ -179,21 +179,21 @@ def load(cls, fname):
return nmslib_instance

def _build_from_word2vec(self):
"""Build an Nmslib index using word vectors from a Word2Vec model."""
"""Build an NMSLIB index using word vectors from a Word2Vec model."""

self.model.init_sims()
self._build_from_model(self.model.wv.vectors_norm, self.model.wv.index2word)

def _build_from_doc2vec(self):
"""Build an Nmslib index using document vectors from a Doc2Vec model."""
"""Build an NMSLIB index using document vectors from a Doc2Vec model."""

docvecs = self.model.docvecs
docvecs.init_sims()
labels = [docvecs.index_to_doctag(i) for i in range(0, docvecs.count)]
self._build_from_model(docvecs.vectors_docs_norm, labels)

def _build_from_keyedvectors(self):
"""Build an Nmslib index using word vectors from a KeyedVectors model."""
"""Build an NMSLIB index using word vectors from a KeyedVectors model."""

self.model.init_sims()
self._build_from_model(self.model.vectors_norm, self.model.index2word)
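The module docstring above contrasts gensim's exact, linear-time brute-force search with the approximate search that NMSLIB provides. To make "linear complexity via brute force" concrete, here is a minimal pure-Python sketch of exact cosine-similarity search over unit-normalized vectors. It is not gensim's implementation: the function names and toy vectors are invented for illustration, and all vectors are assumed to be non-zero.

```python
import heapq
import math


def normalize(vector):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def most_similar_brute_force(query, vectors, topn=2):
    """Exact top-n cosine neighbors; visits every indexed vector once."""
    q = normalize(query)
    scored = (
        (label, sum(a * b for a, b in zip(q, normalize(vec))))
        for label, vec in vectors.items()
    )
    # One pass over all vectors: cost grows linearly with the index size.
    return heapq.nlargest(topn, scored, key=lambda pair: pair[1])


# Toy 2-D vectors, invented for this example.
toy_vectors = {
    "cat": [1.0, 0.0],
    "meow": [0.8, 0.6],
    "car": [0.0, 1.0],
}
result = most_similar_brute_force(toy_vectors["cat"], toy_vectors, topn=2)
print(result)
```

An approximate index trades this full scan for a prebuilt search structure, so a query touches only part of the data and may miss some true neighbors.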
8 changes: 4 additions & 4 deletions gensim/test/test_keyedvectors.py
@@ -11,15 +11,15 @@

import logging
import unittest
from mock import patch

from mock import patch
import numpy as np

from gensim.corpora import Dictionary
from gensim.models.keyedvectors import KeyedVectors, WordEmbeddingSimilarityIndex, \
FastTextKeyedVectors, REAL
from gensim.models.keyedvectors import (
KeyedVectors, WordEmbeddingSimilarityIndex, FastTextKeyedVectors, REAL,
)
from gensim.test.utils import datapath

import gensim.models.keyedvectors

logger = logging.getLogger(__name__)
16 changes: 8 additions & 8 deletions gensim/test/test_similarities.py
@@ -544,8 +544,8 @@ class TestWord2VecAnnoyIndexer(unittest.TestCase):
def setUp(self):
try:
import annoy # noqa:F401
except ImportError:
raise unittest.SkipTest("Annoy library is not available")
except ImportError as e:
raise unittest.SkipTest("Annoy library is not available: %s" % e)

from gensim.similarities.index import AnnoyIndexer
self.indexer = AnnoyIndexer
@@ -648,8 +648,8 @@ class TestDoc2VecAnnoyIndexer(unittest.TestCase):
def setUp(self):
try:
import annoy # noqa:F401
except ImportError:
raise unittest.SkipTest("Annoy library is not available")
except ImportError as e:
raise unittest.SkipTest("Annoy library is not available: %s" % e)

from gensim.similarities.index import AnnoyIndexer

@@ -707,8 +707,8 @@ class TestWord2VecNmslibIndexer(unittest.TestCase):
def setUp(self):
try:
import nmslib # noqa:F401
except ImportError:
raise unittest.SkipTest("Nmslib library is not available")
except ImportError as e:
raise unittest.SkipTest("NMSLIB library is not available: %s" % e)

from gensim.similarities.nmslib import NmslibIndexer
self.indexer = NmslibIndexer
@@ -800,8 +800,8 @@ class TestDoc2VecNmslibIndexer(unittest.TestCase):
def setUp(self):
try:
import nmslib # noqa:F401
except ImportError:
raise unittest.SkipTest("Nmslib library is not available")
except ImportError as e:
raise unittest.SkipTest("NMSLIB library is not available: %s" % e)

from gensim.similarities.nmslib import NmslibIndexer
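The `setUp` changes in this file all follow one pattern: try to import an optional library, and if that fails, skip the test while reporting the underlying `ImportError` in the skip reason. A self-contained sketch of that pattern follows. The module name is a hypothetical placeholder chosen so that the import is expected to fail, and the helper that collects skips exists only for demonstration.

```python
import importlib
import unittest


class TestOptionalIndexer(unittest.TestCase):
    # Hypothetical module name; it is assumed not to be installed.
    OPTIONAL_MODULE = "hypothetical_missing_indexer_lib"

    def setUp(self):
        try:
            importlib.import_module(self.OPTIONAL_MODULE)
        except ImportError as e:
            # Include the original error so the skip reason shows why the import failed.
            raise unittest.SkipTest(
                "%s is not available: %s" % (self.OPTIONAL_MODULE, e)
            )

    def test_uses_optional_library(self):
        self.fail("not reached while the optional library is missing")


def run_and_collect_skips():
    """Run the test case and return unittest's list of (test, reason) skips."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestOptionalIndexer)
    result = unittest.TestResult()
    suite.run(result)
    return result.skipped


skips = run_and_collect_skips()
print(skips[0][1])
```

Carrying the exception text into the skip reason helps distinguish "library not installed" from "library installed but failing to import", which is the kind of detail that matters when debugging CI failures.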
