[MRG] Update README instructions + clean up testing #2814

Merged 10 commits on May 1, 2020.
Changes from 8 commits
10 changes: 5 additions & 5 deletions .travis.yml
@@ -25,27 +25,27 @@ matrix:
- python: '3.7'
env:
- TOXENV="py37-linux"
- BOTO_CONFIG="/dev/null"
- BOTO_CONFIG="/dev/null" # XXX: why is this here?
dist: xenial
sudo: true
sudo: true # XXX: why is this here vs. others?

- python: '3.6'
env: TOXENV="py36-linux"


install:
- pip install tox
- sudo apt-get install -y gdb # install gdb
- sudo apt-get install -y gdb


before_script:
- ulimit -c unlimited -S # enable core dumps
- ulimit -c unlimited -S # enable core dumps


script: tox -vv


after_failure:
- pwd
- COREFILE=$(find . -maxdepth 1 -name "core*" | head -n 1) # find core file
- COREFILE=$(find . -maxdepth 1 -name "core*" | head -n 1)
- if [[ -f "$COREFILE" ]]; then EXECFILE=$(gdb -c "$COREFILE" -batch | grep "Core was generated" | tr -d "\`" | cut -d' ' -f5); file "$COREFILE"; gdb -c "$COREFILE" "$EXECFILE" -x continuous_integration/debug.gdb -batch; fi
42 changes: 21 additions & 21 deletions README.md
@@ -49,13 +49,6 @@ If this feature list left you scratching your head, you can first read
more about the [Vector Space Model] and [unsupervised document analysis]
on Wikipedia.

Support
------------

Ask open-ended or research questions on the [Gensim Mailing List](https://groups.google.com/forum/#!forum/gensim).

Raise bugs on [Github](https://github.com/RaRe-Technologies/gensim/blob/develop/CONTRIBUTING.md) but **make sure you follow the [issue template](https://github.com/RaRe-Technologies/gensim/blob/develop/ISSUE_TEMPLATE.md)**. Issues that are not bugs or fail to follow the issue template will be closed without inspection.

Installation
------------

@@ -69,23 +62,23 @@ NumPy. This is optional, but using an optimized BLAS such as [ATLAS] or
magnitude. On OS X, NumPy picks up the BLAS that comes with it
automatically, so you don’t need to do anything special.

The simple way to install gensim is:
Install the latest version of gensim:

pip install -U gensim
```bash
pip install --upgrade gensim
```

Or, if you have instead downloaded and unzipped the [source tar.gz]
package, you’d run:
package:

python setup.py test
```bash
python setup.py install
```

For alternative modes of installation (without root privileges,
development installation, optional install features), see the
[documentation].
For alternative modes of installation, see the [documentation].

This version has been tested under Python 2.7, 3.5 and 3.6. Gensim’s github repo is hooked
against [Travis CI for automated testing] on every commit push and pull
request. Support for Python 2.6, 3.3 and 3.4 was dropped in gensim 1.0.0. Install gensim 0.13.4 if you *must* use Python 2.6, 3.3 or 3.4. Support for Python 2.5 was dropped in gensim 0.10.0; install gensim 0.9.1 if you *must* use Python 2.5).
Gensim is being [continuously tested](https://travis-ci.org/RaRe-Technologies/gensim) under Python 3.5, 3.6, 3.7 and 3.8.
Support for Python 2.7 was dropped in gensim 4.0.0 – install gensim 3.8.3 if you must use Python 2.7.

How come gensim is so fast and memory efficient? Isn’t it pure Python, and isn’t Python slow and greedy?
--------------------------------------------------------------------------------------------------------
@@ -113,14 +106,21 @@ Documentation
[Tutorials]: https://radimrehurek.com/gensim/auto_examples/
[Official Documentation and Walkthrough]: http://radimrehurek.com/gensim/
[Official API Documentation]: http://radimrehurek.com/gensim/apiref.html


Support
-------

Ask open-ended or research questions on the [Gensim Mailing List](https://groups.google.com/forum/#!forum/gensim).

Raise bugs on [Github](https://github.com/RaRe-Technologies/gensim/blob/develop/CONTRIBUTING.md) but **make sure you follow the [issue template](https://github.com/RaRe-Technologies/gensim/blob/develop/ISSUE_TEMPLATE.md)**. Issues that are not bugs or fail to follow the issue template will be closed without inspection.

---------

Adopters
--------

| Company | Logo | Industry | Use of Gensim |
|---------|------|----------|---------------|
|---------|------|----------|---------------|
| [RARE Technologies](http://rare-technologies.com) | ![rare](docs/src/readme_images/rare.png) | ML & NLP consulting | Creators of Gensim – this is us! |
| [Amazon](http://www.amazon.com/) | ![amazon](docs/src/readme_images/amazon.png) | Retail | Document similarity. |
| [National Institutes of Health](https://github.com/NIHOPA/pipeline_word2vec) | ![nih](docs/src/readme_images/nih.png) | Health | Processing grants and publications with word2vec. |
@@ -169,8 +169,8 @@ BibTeX entry:
[Talentpair]: https://avatars3.githubusercontent.com/u/8418395?v=3&s=100
[citing gensim in academic papers and theses]: https://scholar.google.cz/citations?view_op=view_citation&hl=en&user=9vG_kV0AAAAJ&citation_for_view=9vG_kV0AAAAJ:u-x6o8ySG0sC



[documentation and Jupyter Notebook tutorials]: https://github.com/RaRe-Technologies/gensim/#documentation
[Vector Space Model]: http://en.wikipedia.org/wiki/Vector_space_model
[unsupervised document analysis]: http://en.wikipedia.org/wiki/Latent_semantic_indexing
34 changes: 17 additions & 17 deletions gensim/similarities/nmslib.py
@@ -8,10 +8,10 @@
Intro
-----

This module contains integration Nmslib with :class:`~gensim.models.word2vec.Word2Vec`,
This module integrates NMSLIB with :class:`~gensim.models.word2vec.Word2Vec`,
:class:`~gensim.models.doc2vec.Doc2Vec`, :class:`~gensim.models.fasttext.FastText` and
:class:`~gensim.models.keyedvectors.KeyedVectors`.
To use nmslib, instantiate a :class:`~gensim.similarities.nmslib.NmslibIndexer` class
To use NMSLIB, instantiate a :class:`~gensim.similarities.nmslib.NmslibIndexer` class
and pass the instance as the indexer parameter to your model's most_similar method
(e.g. :py:func:`~gensim.models.doc2vec.most_similar`).

@@ -50,23 +50,23 @@
>>> model.most_similar("cat", topn=2, indexer=new_indexer)
[('cat', 1.0), ('meow', 0.5595494508743286)]

What is Nmslib
-------------
What is NMSLIB
--------------

Non-Metric Space Library (NMSLIB) is an efficient cross-platform similarity search library and a toolkit
for evaluation of similarity search methods. The core-library does not have any third-party dependencies.
More information about Nmslib: `github repository <https://github.com/nmslib/nmslib>`_.
More information about NMSLIB: `github repository <https://github.com/nmslib/nmslib>`_.

Why use Nmslib?
-------------
Why use NMSLIB?
--------------

The current implementation for finding k nearest neighbors in a vector space in gensim has linear complexity
via brute force in the number of indexed documents, although with extremely low constant factors.
The retrieved results are exact, which is an overkill in many applications:
approximate results retrieved in sub-linear time may be enough.
Nmslib can find approximate nearest neighbors much faster.
Compared to annoy, nmslib has more parameters to control the build and query time and accuracy.
Nmslib can achieve faster and more accurate nearest neighbors search than annoy.
NMSLIB can find approximate nearest neighbors much faster.
Compared to Annoy, NMSLIB has more parameters to control the build and query time and accuracy.
NMSLIB can achieve faster and more accurate nearest neighbors search than Annoy.
"""

from smart_open import open
@@ -84,12 +84,12 @@
import nmslib
except ImportError:
raise ImportError(
"Nmslib has not been installed, if you wish to use the nmslib indexer, please run `pip install nmslib`"
"NMSLIB not installed. To use the NMSLIB indexer, please run `pip install nmslib`."
)


class NmslibIndexer(object):
"""This class allows to use `Nmslib <https://github.com/nmslib/nmslib>`_ as indexer for `most_similar` method
"""This class allows to use `NMSLIB <https://github.com/nmslib/nmslib>`_ as indexer for `most_similar` method
from :class:`~gensim.models.word2vec.Word2Vec`, :class:`~gensim.models.doc2vec.Doc2Vec`,
:class:`~gensim.models.fasttext.FastText` and :class:`~gensim.models.keyedvectors.Word2VecKeyedVectors` classes.

@@ -102,9 +102,9 @@ def __init__(self, model, index_params=None, query_time_params=None):
model : :class:`~gensim.models.base_any2vec.BaseWordEmbeddingsModel`
Model, that will be used as source for index.
index_params : dict, optional
index_params for Nmslib indexer.
index_params for NMSLIB indexer.
query_time_params : dict, optional
query_time_params for Nmslib indexer.
query_time_params for NMSLIB indexer.

"""
if index_params is None:
@@ -179,21 +179,21 @@ def load(cls, fname):
return nmslib_instance

def _build_from_word2vec(self):
"""Build an Nmslib index using word vectors from a Word2Vec model."""
"""Build an NMSLIB index using word vectors from a Word2Vec model."""

self.model.init_sims()
self._build_from_model(self.model.wv.vectors_norm, self.model.wv.index2word)

def _build_from_doc2vec(self):
"""Build an Nmslib index using document vectors from a Doc2Vec model."""
"""Build an NMSLIB index using document vectors from a Doc2Vec model."""

docvecs = self.model.docvecs
docvecs.init_sims()
labels = [docvecs.index_to_doctag(i) for i in range(0, docvecs.count)]
self._build_from_model(docvecs.vectors_docs_norm, labels)

def _build_from_keyedvectors(self):
"""Build an Nmslib index using word vectors from a KeyedVectors model."""
"""Build an NMSLIB index using word vectors from a KeyedVectors model."""

self.model.init_sims()
self._build_from_model(self.model.vectors_norm, self.model.index2word)
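
The module docstring above says that gensim's exact nearest-neighbor search is a linear brute-force scan over all indexed vectors. As a rough illustration of that claim only, here is a minimal pure-Python sketch. The function names and the toy vectors are invented for this example and are not part of gensim or NMSLIB; gensim's real implementation uses NumPy on normalized vectors.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def brute_force_most_similar(query, vectors, topn=2):
    """Compare the query against every indexed vector, then keep the best `topn`.

    The scan does one similarity computation per indexed item, so the cost
    grows linearly with the size of the index. The results are exact.
    """
    scored = [(label, cosine_similarity(query, vec)) for label, vec in vectors.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:topn]


# Toy 2-D vectors, made up for illustration only.
toy_vectors = {
    "cat": [1.0, 0.0],
    "meow": [0.8, 0.6],
    "car": [0.0, 1.0],
}

print(brute_force_most_similar(toy_vectors["cat"], toy_vectors, topn=2))
```

An approximate index such as NMSLIB or Annoy avoids visiting every vector, trading some accuracy for sub-linear query time.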
8 changes: 4 additions & 4 deletions gensim/test/test_keyedvectors.py
@@ -11,15 +11,15 @@

import logging
import unittest
from mock import patch

from mock import patch
import numpy as np

from gensim.corpora import Dictionary
from gensim.models.keyedvectors import KeyedVectors, WordEmbeddingSimilarityIndex, \
FastTextKeyedVectors, REAL
from gensim.models.keyedvectors import (
KeyedVectors, WordEmbeddingSimilarityIndex, FastTextKeyedVectors, REAL,
)
from gensim.test.utils import datapath

import gensim.models.keyedvectors

logger = logging.getLogger(__name__)
16 changes: 8 additions & 8 deletions gensim/test/test_similarities.py
@@ -544,8 +544,8 @@ class TestWord2VecAnnoyIndexer(unittest.TestCase):
def setUp(self):
try:
import annoy # noqa:F401
except ImportError:
raise unittest.SkipTest("Annoy library is not available")
except ImportError as e:
raise unittest.SkipTest("Annoy library is not available: %s" % e)

from gensim.similarities.index import AnnoyIndexer
self.indexer = AnnoyIndexer
@@ -648,8 +648,8 @@ class TestDoc2VecAnnoyIndexer(unittest.TestCase):
def setUp(self):
try:
import annoy # noqa:F401
except ImportError:
raise unittest.SkipTest("Annoy library is not available")
except ImportError as e:
raise unittest.SkipTest("Annoy library is not available: %s" % e)

from gensim.similarities.index import AnnoyIndexer

@@ -707,8 +707,8 @@ class TestWord2VecNmslibIndexer(unittest.TestCase):
def setUp(self):
try:
import nmslib # noqa:F401
except ImportError:
raise unittest.SkipTest("Nmslib library is not available")
except ImportError as e:
raise unittest.SkipTest("NMSLIB library is not available: %s" % e)

from gensim.similarities.nmslib import NmslibIndexer
self.indexer = NmslibIndexer
@@ -800,8 +800,8 @@ class TestDoc2VecNmslibIndexer(unittest.TestCase):
def setUp(self):
try:
import nmslib # noqa:F401
except ImportError:
raise unittest.SkipTest("Nmslib library is not available")
except ImportError as e:
raise unittest.SkipTest("NMSLIB library is not available: %s" % e)

from gensim.similarities.nmslib import NmslibIndexer
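
The `setUp` changes in this file all follow one pattern: catch the `ImportError` for an optional dependency and include its text in the `unittest.SkipTest` reason, so the test report says why the import failed. A standalone sketch of that pattern follows. The helper `import_or_skip`, the test class, and the module name `hypothetical_missing_lib` are assumptions made up for this illustration and do not appear in the PR.

```python
import importlib
import unittest


def import_or_skip(module_name):
    """Import `module_name`, or raise SkipTest carrying the original error text."""
    try:
        return importlib.import_module(module_name)
    except ImportError as e:
        raise unittest.SkipTest("%s library is not available: %s" % (module_name, e))


class TestOptionalDependency(unittest.TestCase):
    def setUp(self):
        # `hypothetical_missing_lib` does not exist, so every test in this
        # class is reported as skipped rather than as an error.
        self.lib = import_or_skip("hypothetical_missing_lib")

    def test_uses_library(self):
        self.assertIsNotNone(self.lib)


# Run the suite programmatically so the skip reason can be inspected.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestOptionalDependency)
result = unittest.TextTestRunner(verbosity=0).run(suite)
for test, reason in result.skipped:
    print(reason)
```

Keeping the underlying exception in the message helps distinguish a library that is not installed from one that is installed but fails to import, for example because of a broken binary dependency.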
