Add the FastSS and Levenshtein modules to docs #3279

Merged: 5 commits merged on Dec 24, 2021
docs/src/apiref.rst: 2 changes (2 additions, 0 deletions)
@@ -60,6 +60,8 @@ Modules:
similarities/termsim
similarities/annoy
similarities/nmslib
+similarities/levenshtein
+similarities/fastss
test/utils
topic_coherence/aggregation
topic_coherence/direct_confirmation_measure

docs/src/auto_examples/index.rst: 34 changes (17 additions, 17 deletions)
@@ -71,7 +71,7 @@ Understanding this functionality is vital for using gensim effectively.

.. raw:: html

-<div class="sphx-glr-thumbcontainer" tooltip="Introduces transformations and demonstrates their use on a toy corpus.">
+<div class="sphx-glr-thumbcontainer" tooltip="Introduces transformations and demonstrates their use on a toy corpus. ">

.. only:: html

@@ -92,7 +92,7 @@ Understanding this functionality is vital for using gensim effectively.

.. raw:: html

-<div class="sphx-glr-thumbcontainer" tooltip="Demonstrates querying a corpus for similar documents.">
+<div class="sphx-glr-thumbcontainer" tooltip="Demonstrates querying a corpus for similar documents. ">

.. only:: html

@@ -169,14 +169,14 @@ Learning-oriented lessons that introduce a particular gensim feature, e.g. a mod

.. raw:: html

-<div class="sphx-glr-thumbcontainer" tooltip="Introduces Gensim&#x27;s fastText model and demonstrates its use on the Lee Corpus.">
+<div class="sphx-glr-thumbcontainer" tooltip="Introduces Gensim&#x27;s EnsembleLda model">

.. only:: html

-.. figure:: /auto_examples/tutorials/images/thumb/sphx_glr_run_fasttext_thumb.png
-:alt: FastText Model
+.. figure:: /auto_examples/tutorials/images/thumb/sphx_glr_run_ensemblelda_thumb.png
+:alt: Ensemble LDA

-:ref:`sphx_glr_auto_examples_tutorials_run_fasttext.py`
+:ref:`sphx_glr_auto_examples_tutorials_run_ensemblelda.py`

.. raw:: html

@@ -186,18 +186,18 @@ Learning-oriented lessons that introduce a particular gensim feature, e.g. a mod
.. toctree::
:hidden:

-/auto_examples/tutorials/run_fasttext
+/auto_examples/tutorials/run_ensemblelda

.. raw:: html

-<div class="sphx-glr-thumbcontainer" tooltip="Introduces Gensim&#x27;s EnsembleLda model">
+<div class="sphx-glr-thumbcontainer" tooltip="Introduces Gensim&#x27;s fastText model and demonstrates its use on the Lee Corpus. ">

.. only:: html

-.. figure:: /auto_examples/tutorials/images/thumb/sphx_glr_run_ensemblelda_thumb.png
-:alt: Ensemble LDA
+.. figure:: /auto_examples/tutorials/images/thumb/sphx_glr_run_fasttext_thumb.png
+:alt: FastText Model

-:ref:`sphx_glr_auto_examples_tutorials_run_ensemblelda.py`
+:ref:`sphx_glr_auto_examples_tutorials_run_fasttext.py`

.. raw:: html

@@ -207,11 +207,11 @@ Learning-oriented lessons that introduce a particular gensim feature, e.g. a mod
.. toctree::
:hidden:

-/auto_examples/tutorials/run_ensemblelda
+/auto_examples/tutorials/run_fasttext

.. raw:: html

-<div class="sphx-glr-thumbcontainer" tooltip="Introduces the Annoy library for similarity queries on top of vectors learned by Word2Vec.">
+<div class="sphx-glr-thumbcontainer" tooltip="Introduces the Annoy library for similarity queries on top of vectors learned by Word2Vec. ">

.. only:: html

@@ -309,7 +309,7 @@ These **goal-oriented guides** demonstrate how to **solve a specific problem** u

.. raw:: html

-<div class="sphx-glr-thumbcontainer" tooltip="Demonstrates simple and quick access to common corpora and pretrained models.">
+<div class="sphx-glr-thumbcontainer" tooltip="Demonstrates simple and quick access to common corpora and pretrained models. ">

.. only:: html

@@ -330,7 +330,7 @@ These **goal-oriented guides** demonstrate how to **solve a specific problem** u

.. raw:: html

-<div class="sphx-glr-thumbcontainer" tooltip="How to author documentation for Gensim.">
+<div class="sphx-glr-thumbcontainer" tooltip="How to author documentation for Gensim. ">

.. only:: html

@@ -447,13 +447,13 @@ Blog posts, tutorial videos, hackathons and other useful Gensim resources, from

.. container:: sphx-glr-download sphx-glr-download-python

-:download:`Download all examples in Python source code: auto_examples_python.zip </auto_examples/auto_examples_python.zip>`
+:download:`Download all examples in Python source code: auto_examples_python.zip <//Volumes/work/workspace/gensim/trunk/docs/src/auto_examples/auto_examples_python.zip>`



.. container:: sphx-glr-download sphx-glr-download-jupyter

-:download:`Download all examples in Jupyter notebooks: auto_examples_jupyter.zip </auto_examples/auto_examples_jupyter.zip>`
+:download:`Download all examples in Jupyter notebooks: auto_examples_jupyter.zip <//Volumes/work/workspace/gensim/trunk/docs/src/auto_examples/auto_examples_jupyter.zip>`


.. only:: html

docs/src/similarities/fastss.rst: 8 changes (8 additions, 0 deletions)
@@ -0,0 +1,8 @@
:mod:`similarities.fastss` -- Fast Levenshtein edit distance
==================================================================

.. automodule:: gensim.similarities.fastss
    :synopsis: Fast fuzzy search between strings, using the Levenshtein edit distance
    :members:
    :inherited-members:

docs/src/similarities/levenshtein.rst: 8 changes (8 additions, 0 deletions)
@@ -0,0 +1,8 @@
:mod:`similarities.levenshtein` -- Fast soft-cosine semantic similarity search
==============================================================================

.. automodule:: gensim.similarities.levenshtein
    :synopsis: Fast fuzzy search between strings, using the Soft-Cosine Semantic Similarity
    :members:
    :inherited-members:

gensim/similarities/fastss.pyx: 9 changes (9 additions, 0 deletions)
@@ -137,6 +137,15 @@ def bytes2set(b):


class FastSS:
+    """
+    Fast implementation of FastSS (Fast Similarity Search): https://fastss.csg.uzh.ch/
+
+    FastSS enables fuzzy search of a dynamic query (a word, string) against a static
+    dictionary (a set of words, strings). The "fuzziness" is configurable by means
+    of a maximum edit distance (Levenshtein) between the query string and any of the
+    dictionary words.
+
+    """

    def __init__(self, words=None, max_dist=2):
        """
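The `FastSS` docstring describes the search only at a high level. The sketch below illustrates the deletion-neighbourhood idea associated with FastSS, assuming it matches the approach described at https://fastss.csg.uzh.ch/. Every dictionary word is indexed under all strings obtainable by deleting up to `max_dist` characters. A query generates its own deletion variants, and any dictionary word that shares a variant is verified with an exact Levenshtein distance. This is a simplified pure-Python illustration, not gensim's Cython `FastSS` class. `deletion_variants`, `edit_distance` and `DeletionNeighbourhoodIndex` are hypothetical names.

```python
from itertools import combinations


def deletion_variants(word, max_dist):
    """All strings obtained by deleting up to `max_dist` characters from `word`."""
    variants = set()
    for k in range(min(max_dist, len(word)) + 1):
        for dropped in combinations(range(len(word)), k):
            variants.add("".join(ch for i, ch in enumerate(word) if i not in dropped))
    return variants


def edit_distance(a, b):
    """Levenshtein distance via the classic two-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]


class DeletionNeighbourhoodIndex:
    """Hypothetical FastSS-style index for illustration; not gensim's FastSS class."""

    def __init__(self, words, max_dist=2):
        self.max_dist = max_dist
        self.index = {}  # deletion variant -> set of dictionary words producing it
        for word in words:
            for variant in deletion_variants(word, max_dist):
                self.index.setdefault(variant, set()).add(word)

    def query(self, word, max_dist=None):
        """Return {dictionary word: edit distance} for words within `max_dist` edits."""
        max_dist = self.max_dist if max_dist is None else min(max_dist, self.max_dist)
        candidates = set()
        for variant in deletion_variants(word, max_dist):
            candidates |= self.index.get(variant, set())
        # Sharing a deletion variant is necessary but not sufficient
        # (e.g. "ab" and "ba" share "a" yet are 2 edits apart), so verify each one.
        hits = {}
        for candidate in candidates:
            dist = edit_distance(word, candidate)
            if dist <= max_dist:
                hits[candidate] = dist
        return hits


index = DeletionNeighbourhoodIndex(["hello", "help", "world", "yellow"], max_dist=2)
print(index.query("helo"))  # "hello" and "help", both at distance 1
```

The index grows with the number of deletion variants per word, which is a sum of binomial coefficients up to `max_dist`. This is one reason to keep `max_dist` small, such as the default of 2 in the `__init__` signature.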

gensim/similarities/levenshtein.py: 2 changes (1 addition, 1 deletion)
@@ -29,7 +29,7 @@ class LevenshteinSimilarityIndex(TermSimilarityIndex):
    "Levenshtein similarity" is a modification of the Levenshtein (edit) distance,
    defined in [charletetal17]_.

-    This implementation uses the FastSS neighbourhood algorithm
+    This implementation uses the :class:`~gensim.similarities.fastss.FastSS` algorithm
    for fast kNN nearest-neighbor retrieval.

    Parameters
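The docstring says `FastSS` is used for fast kNN retrieval. One way to read this is sketched below under stated assumptions. FastSS supplies candidate terms within a small edit distance. Each candidate is then scored with a Levenshtein similarity of the form alpha * (1 - d / max(|t1|, |t2|)) ** beta, which is our reading of [charletetal17]. The values alpha=1.8 and beta=5.0 are assumptions for illustration, as are the helper names `levenshtein_similarity` and `rank_neighbours`. None of this is gensim's `LevenshteinSimilarityIndex` API.

```python
def levenshtein_similarity(t1, t2, distance, alpha=1.8, beta=5.0):
    """Map an edit distance to a similarity: alpha * (1 - distance / max_len) ** beta.

    Assumed form (our reading of [charletetal17]); alpha and beta are assumed values.
    """
    max_len = max(len(t1), len(t2))
    if max_len == 0:
        return alpha  # two empty strings: treated as identical in this sketch
    return alpha * (1.0 - distance / max_len) ** beta


def rank_neighbours(query, candidates, topn=10, alpha=1.8, beta=5.0):
    """Hypothetical helper: score (term, distance) pairs, e.g. FastSS hits, keep the best `topn`."""
    scored = [
        (term, levenshtein_similarity(query, term, dist, alpha, beta))
        for term, dist in candidates
        if term != query  # skip the query itself (a choice made by this sketch)
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # most similar first
    return scored[:topn]


# Terms and distances as a FastSS lookup for "helo" with max_dist=2 could return them.
hits = [("hello", 1), ("help", 1), ("hole", 2)]
for term, sim in rank_neighbours("helo", hits):
    print(f"{term}: {sim:.3f}")
```

With beta=5.0, the score falls off sharply as edits grow relative to word length. For this reason, "hello" ranks above "help" even though both are one edit away: the single edit is a smaller fraction of the longer word.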