BleiCorpus with an index #1

Closed
wants to merge 1 commit into from

Dieterbe
Contributor

No description provided.

The index can be used for fast retrieval of specific documents,
and for fast responses to len() queries
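
To illustrate the idea, here is a minimal sketch of such an index. It is not the code from this commit: it assumes a corpus file with one document per line, and the helper names `build_offset_index` and `doc_at` are made up for the example. The index is simply the byte offset at which each document starts, so `len()` becomes the length of that list and fetching a document becomes a single seek.

```python
def build_offset_index(path):
    """Scan the file once and record the byte offset where each line starts.

    Assumption for this sketch: one document per line.
    """
    offsets = []
    with open(path, "rb") as f:
        while True:
            pos = f.tell()          # offset of the line about to be read
            line = f.readline()
            if not line:            # empty bytes object means end of file
                break
            offsets.append(pos)
    return offsets


def doc_at(path, offsets, docno):
    """Return document number `docno` by seeking straight to its offset."""
    with open(path, "rb") as f:
        f.seek(offsets[docno])
        return f.readline().decode("utf-8").rstrip("\n")
```

With the offsets in hand, `len(offsets)` answers the length query without re-reading the file, and `doc_at(path, offsets, 5)` reads only the sixth document.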
@piskvorky
Owner

I'll think about how to do this more generally, for any *Corpus.

Just have to find a way to do it without breaking things or making them too complicated...

@piskvorky
Owner

OK, I wrote a draft of the functionality: I added a new class, corpora.IndexedCorpus. It's in the indexedcorpus branch.

I'll make every file-based corpus format inherit from IndexedCorpus, plus add the docbyoffset() method. That should be enough to index every corpus format, including a convenient save/load, unless I missed something.
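
A rough sketch of how that design could look, for readers who want something concrete. This is not the code in the indexedcorpus branch: the class names `IndexedCorpusSketch` and `LineCorpusSketch`, the `scan_offsets` hook, and the JSON index file are all assumptions made for illustration. Only the idea comes from the comment above: a shared base class owns the index and its save/load, and each format supplies its own `docbyoffset()`.

```python
import json


class IndexedCorpusSketch:
    """Hypothetical base class: owns the offset index; formats do the parsing."""

    def __init__(self, fname):
        self.fname = fname
        self.index = None  # list of byte offsets once built or loaded

    def scan_offsets(self):
        """Format-specific: yield the byte offset of every document."""
        raise NotImplementedError

    def docbyoffset(self, offset):
        """Format-specific: parse and return the document stored at `offset`."""
        raise NotImplementedError

    def build_index(self):
        self.index = list(self.scan_offsets())

    def save_index(self, index_fname):
        with open(index_fname, "w") as f:
            json.dump(self.index, f)

    def load_index(self, index_fname):
        with open(index_fname) as f:
            self.index = json.load(f)

    def __len__(self):
        # Requires build_index() or load_index() to have been called first.
        return len(self.index)

    def __getitem__(self, docno):
        return self.docbyoffset(self.index[docno])


class LineCorpusSketch(IndexedCorpusSketch):
    """Example format (assumed): one whitespace-tokenized document per line."""

    def scan_offsets(self):
        with open(self.fname, "rb") as f:
            pos = f.tell()
            line = f.readline()
            while line:
                yield pos
                pos = f.tell()
                line = f.readline()

    def docbyoffset(self, offset):
        with open(self.fname, "rb") as f:
            f.seek(offset)
            return f.readline().decode("utf-8").split()
```

In this sketch a new format only has to implement the two format-specific methods; `len(corpus)`, `corpus[docno]`, and index persistence come from the base class.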

@Dieterbe
Contributor Author

Your method is better indeed :) I'm removing this commit from my branches again.

piskvorky pushed a commit that referenced this pull request Aug 12, 2012
piskvorky pushed a commit that referenced this pull request Sep 7, 2012
make len(Dictionary.from_corpus) consistent with its content
piskvorky pushed a commit that referenced this pull request Oct 27, 2013
piskvorky pushed a commit that referenced this pull request Apr 23, 2014
piskvorky pushed a commit that referenced this pull request Jun 4, 2014
piskvorky pushed a commit that referenced this pull request Sep 12, 2014
fix bugs in state reset and state init
piskvorky pushed a commit that referenced this pull request Sep 17, 2014
minor doc&format fixes in DTM model
piskvorky pushed a commit that referenced this pull request Oct 5, 2014
@lerela lerela mentioned this pull request Oct 16, 2014
piskvorky pushed a commit that referenced this pull request Jul 5, 2015
Consistency with gensim and pep 8
piskvorky pushed a commit that referenced this pull request Nov 19, 2015
Simplify job loop + merge latest gensim
tmylk pushed a commit that referenced this pull request Mar 23, 2016
code style fixes to CLI scripts
tmylk pushed a commit that referenced this pull request Apr 3, 2016
@dnabanita7 dnabanita7 mentioned this pull request Jan 22, 2019
piskvorky pushed a commit that referenced this pull request Jul 26, 2020
Make docs clearer on `alpha` parameter in LDA model
piskvorky pushed a commit that referenced this pull request May 10, 2023
This pull request was closed.