Commits on Aug 20, 2011
  1. added utils.upload_chunked fnc

    piskvorky committed Aug 14, 2011
    * uploads a corpus to SimServer in smaller chunks
  2. speed up of lsi[corpus]

    piskvorky committed Aug 17, 2011
    * uses sparse * dense multiplication (was: dense * dense)
    * about 5x speed-up :-)
    * lowering lsi.num_topics (= slicing projection.u) is less efficient, because the array order is always wrong now. No big deal, few people manually decrease num_topics anyway. Could probably be improved by clever use of sparsetools.matvecs...
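
The sparse * dense idea from the `lsi[corpus]` speed-up can be sketched as follows. This is a minimal illustration and not the actual gensim code: the matrix sizes, the random data and the names `u` and `docs` are assumptions made up for the example. It only checks that multiplying the sparse document matrix directly gives the same result as densifying it first.

```python
import numpy as np
from scipy import sparse

# Assumed toy sizes, chosen only for illustration.
num_terms, num_topics, num_docs = 1000, 10, 50

rng = np.random.default_rng(0)
# Dense projection matrix (terms x topics), standing in for projection.u.
u = rng.standard_normal((num_terms, num_topics))
# Sparse documents (terms x docs); bag-of-words vectors are mostly zeros.
docs = sparse.random(num_terms, num_docs, density=0.01, format="csc", random_state=0)

# Old approach: densify the documents, then dense * dense.
dense_result = np.dot(docs.toarray().T, u)

# New approach: sparse * dense, which skips the zero entries.
sparse_result = docs.T @ u

# Both give the same (docs x topics) matrix.
same = np.allclose(dense_result, sparse_result)
```

Any speed-up depends on how sparse the documents are; the 5x figure above is the commit author's measurement, not something this sketch reproduces.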
Commits on Aug 3, 2011
  1. distributed code now uses new Pyro4 (was: Pyro 4.1)

    piskvorky committed Aug 3, 2011
    * plus documentation update
Commits on Jul 16, 2011
  1. added option of query = a specific index doc

    piskvorky committed Jul 16, 2011
    * to find out how similar document #123 in the index is to every other document in the index: `index.similarity_by_id(123)`
    * so the query is only a number, 0 <= query < len(index), not a full document/vector as in the standard `index[query]`
    * implemented in the Similarity class
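
The difference between querying by vector and querying by index position can be sketched with a toy class. `TinyIndex` below is a hypothetical stand-in written for this example, not gensim's `Similarity` class. It assumes cosine similarity over dense vectors and borrows only the method name `similarity_by_id` from the commit message.

```python
import numpy as np


class TinyIndex:
    """Hypothetical dense cosine-similarity index (not gensim's Similarity)."""

    def __init__(self, vectors):
        vectors = np.asarray(vectors, dtype=float)
        norms = np.linalg.norm(vectors, axis=1, keepdims=True)
        norms[norms == 0] = 1.0  # avoid division by zero for empty documents
        self.index = vectors / norms  # rows normalized to unit length

    def __len__(self):
        return len(self.index)

    def __getitem__(self, query):
        """Standard query: a full document vector."""
        query = np.asarray(query, dtype=float)
        norm = np.linalg.norm(query)
        if norm > 0:
            query = query / norm
        return self.index @ query

    def similarity_by_id(self, docpos):
        """Query by position: reuse the vector already stored in the index."""
        if not 0 <= docpos < len(self):
            raise IndexError("docpos must satisfy 0 <= docpos < len(index)")
        return self.index @ self.index[docpos]


index = TinyIndex([[1, 0, 0], [1, 1, 0], [0, 0, 2]])
sims = index.similarity_by_id(1)  # document #1 against every indexed document
```

Querying by position avoids rebuilding and renormalizing a vector the index already holds, which is presumably the convenience the commit is after.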
Commits on Jul 7, 2011
Commits on Jul 6, 2011
Commits on Jun 28, 2011
  1. work around strange Pyro packaging (version numbers)

    piskvorky committed Jun 28, 2011
    * to be removed once the new Pyro (>=4.4) is integrated
Commits on Jun 27, 2011
  1. added alias any2utf8 for to_utf8

    piskvorky committed Jun 27, 2011
    * and any2unicode for to_unicode
  2. Merge pull request #44 from dedan/develop

    piskvorky committed Jun 27, 2011
    fix the module import when linking to the git root instead of module
  3. fix the module import when linking to the git root instead of module

    dedan committed Jun 27, 2011
    For some applications I need to link to the gensim folder, which is also the root of the repository. This script helps Python find the actual source code of the module, and it had to be changed because Radim moved the source within the repo.
Commits on Jun 25, 2011
  1. fixed one PEP8 orphan

    piskvorky committed Jun 25, 2011
Commits on Jun 22, 2011
  1. Merge branch 'develop' of github.com:piskvorky/gensim into develop

    piskvorky committed Jun 22, 2011
    Conflicts:
    	gensim/test/test_models.py
  2. Merge pull request #40 from Dieterbe/develop

    piskvorky committed Jun 22, 2011
    Rename variable "chunks" to more sensible "chunksize"
  3. Rename variable "chunks" to more sensible "chunksize"

    Dieter Plaetinck committed Jun 22, 2011
  4. removed print_debug calls from the LSI unittest

    piskvorky committed Jun 22, 2011
    * was causing `invalid value in divide` warnings in numpy
    * see http://groups.google.com/group/gensim/browse_thread/thread/45c1c9efe91ce8d0
Commits on Jun 19, 2011
  1. up version: 0.8.0rc1

    piskvorky committed Jun 19, 2011
Commits on Jun 18, 2011
  1. improved doc strings

    piskvorky committed Jun 18, 2011
Commits on Jun 16, 2011
  1. Added chunking for lsi[corpus] transformation (about 3x faster)

    piskvorky committed Jun 16, 2011
    * before, lsi[corpus] was just syntactic sugar for (lsi[doc] for doc in corpus)
    * now, lsi[corpus] proceeds in chunks of documents (256 by default) and transforms each entire chunk at once
    * the reason: transforming a whole chunk is a single matrix * matrix multiply, which is faster than 256 single-document transforms (matrix * vector multiplies), because of caching etc.
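
The chunking described above can be sketched roughly like this. It is a simplified illustration and not the gensim implementation: documents are dense NumPy vectors here for brevity, and the helper names `grouper`, `transform_one_by_one` and `transform_chunked` are assumptions made up for the example.

```python
from itertools import islice

import numpy as np


def grouper(iterable, chunksize):
    """Yield successive lists of up to `chunksize` items from `iterable`."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, chunksize))
        if not chunk:
            return
        yield chunk


def transform_one_by_one(u, corpus):
    """Before: one matrix * vector multiply per document."""
    return [u.T @ doc for doc in corpus]


def transform_chunked(u, corpus, chunksize=256):
    """After: one matrix * matrix multiply per chunk of documents."""
    result = []
    for chunk in grouper(corpus, chunksize):
        block = np.column_stack(chunk)  # terms x len(chunk)
        result.extend((u.T @ block).T)  # one row of topic weights per document
    return result


rng = np.random.default_rng(0)
u = rng.standard_normal((500, 10))  # assumed terms x topics projection
corpus = [rng.standard_normal(500) for _ in range(600)]

slow = transform_one_by_one(u, corpus)
fast = transform_chunked(u, corpus)
```

Both functions return the same topic vectors; only the grouping of the arithmetic differs. The last chunk may be shorter than `chunksize`, which `grouper` handles by yielding whatever is left.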
Commits on Jun 15, 2011