* uses sparse * dense multiplication (was: dense * dense)
* about 5x speed-up :-)
* lowering lsi.num_topics (= slicing projection.u) is less efficient, because the array order is always wrong now. No big deal, few people manually decrease num_topics anyway. Could probably be improved by clever use of sparsetools.matvecs...
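
A minimal pure-Python sketch of why the sparse * dense product is cheaper, not the actual gensim code. The function names `project_sparse` and `project_dense` and the toy matrix `u` (one row per term, one column per topic) are assumptions made up for illustration. The sparse version only touches the rows of `u` for terms that occur in the document; the dense version visits every term, zeros included.

```python
def project_sparse(doc, u):
    """Project a sparse document onto the topics.

    doc -- list of (term_id, weight) pairs, nonzero entries only
    u   -- dense matrix as a list of rows, shape num_terms x num_topics
    """
    num_topics = len(u[0])
    result = [0.0] * num_topics
    for term_id, weight in doc:  # only the nonzero terms are visited
        row = u[term_id]
        for k in range(num_topics):
            result[k] += weight * row[k]
    return result


def project_dense(doc_dense, u):
    """Same product, but the document is a full-length dense vector."""
    num_topics = len(u[0])
    result = [0.0] * num_topics
    for term_id, weight in enumerate(doc_dense):  # every term, zeros too
        row = u[term_id]
        for k in range(num_topics):
            result[k] += weight * row[k]
    return result


# Toy data (hypothetical): 4 terms, 2 topics.
u = [[1.0, 0.0],
     [0.5, 0.5],
     [0.0, 2.0],
     [1.0, 1.0]]
doc = [(1, 2.0), (3, 1.0)]        # sparse form
doc_dense = [0.0, 2.0, 0.0, 1.0]  # same document, dense form

sparse_result = project_sparse(doc, u)
dense_result = project_dense(doc_dense, u)
```

Both calls return the same vector; the sparse one does work proportional to the number of nonzero terms rather than to the vocabulary size.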
* plus documentation update
* to find how similar document #123 in the index is to every other document in the index: `index.similarity_by_id(123)`
* so the query is only a number, 0 <= query < len(index), not a full document/vector as in the standard `index[query]`
* implemented in the Similarity class
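
To illustrate the difference between querying by position and querying by vector, here is a small sketch. `TinyIndex` is a hypothetical stand-in written for this example, not the Similarity class itself; it assumes dense vectors and cosine similarity.

```python
import math


def unit(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


class TinyIndex:
    """Hypothetical toy index over dense vectors, using cosine similarity."""

    def __init__(self, docs):
        self.docs = [unit(d) for d in docs]

    def __len__(self):
        return len(self.docs)

    def __getitem__(self, query):
        # Standard query: a full document vector.
        q = unit(query)
        return [sum(a * b for a, b in zip(q, d)) for d in self.docs]

    def similarity_by_id(self, docpos):
        # Query by position: reuse the vector already stored in the index.
        if not 0 <= docpos < len(self):
            raise IndexError("docpos must satisfy 0 <= docpos < len(index)")
        return self[self.docs[docpos]]


index = TinyIndex([[1.0, 0.0], [0.0, 1.0], [3.0, 4.0]])
sims = index.similarity_by_id(2)  # similarity of document #2 to all documents
```

The caller passes only an integer, so the document does not have to be fetched and re-submitted as a vector.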
* fixing issues reported by users: http://groups.google.com/group/gensim/msg/4751348a4cfa8ff7
* to be removed once the new Pyro (>=4.4) is integrated
For some application I need to link to the gensim folder, which is also the root of the repository. This script helps Python find the actual source code of the module; it had to be changed because Radim moved the source within the repo.
* was causing `invalid value in divide` warnings in numpy
* see http://groups.google.com/group/gensim/browse_thread/thread/45c1c9efe91ce8d0
* before, lsi[corpus] was just syntactic sugar for (lsi[doc] for doc in corpus)
* now, lsi[corpus] proceeds in chunks of documents (256 by default) and transforms each entire chunk at once
* the reason: transforming a chunk (one matrix * matrix multiply) is faster than 256 single-document transforms (matrix * vector multiplies), because of cache effects etc.
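
The chunking pattern described above can be sketched roughly as follows. The helper names `grouper` and `transform_chunked`, and the toy `transform_chunk` callback, are assumptions for illustration; the real transformation would do one matrix * matrix multiply per chunk.

```python
from itertools import islice


def grouper(corpus, chunksize=256):
    """Yield successive lists of up to `chunksize` documents from a stream."""
    it = iter(corpus)
    while True:
        chunk = list(islice(it, chunksize))
        if not chunk:
            return
        yield chunk


def transform_chunked(corpus, transform_chunk, chunksize=256):
    """Transform a streamed corpus chunk by chunk, yielding one result per document."""
    for chunk in grouper(corpus, chunksize):
        # One call per chunk instead of one call per document.
        for result in transform_chunk(chunk):
            yield result


# Toy demonstration (hypothetical): the "transform" just sums each document,
# and records how many documents each call received.
chunk_sizes = []


def transform_chunk(chunk):
    chunk_sizes.append(len(chunk))
    return [sum(doc) for doc in chunk]


corpus = ([i, i + 1] for i in range(600))  # a stream, not a list
results = list(transform_chunked(corpus, transform_chunk))
```

The corpus is still consumed as a stream, so memory use is bounded by the chunk size rather than by the corpus size.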