Documentation is out of date #22
Comments
Hi David, I think documentation is very important, so improvements are very welcome! However in this case, I think you are wrong -- I always check all tutorial examples before each release:
Your note about corpus parsing and reparsing is serious though, it means it is not clear to users how the dictionary processing fits within corpus creation. That's a conceptual mistake, so the tutorial is apparently not doing a good job there, I will try to improve it. EDIT: maybe the confusion comes from you using an older version of gensim? The documentation always reflects the latest release.
Hi Radim,
As for 3., I didn't mean the methods
I am sorry, I closed this accidentally. I am still learning GitHub, I just wished the "Comment and close" button wasn't the default. :/
Learning GitHub is a never-ending process... I filed one site bug report just yesterday :-) Full documentation (including HTML) is version controlled, and is a part of each gensim release. So you can access the relevant version a) from the source .tgz package of your release, There are several questions about dictionaries and corpora on the mailing list now, not just yours, so apparently the tutorial on that part is insufficient. I'll try to improve it, but once you figure it out, please consider upgrading the docs yourself. I know gensim too well, so it's difficult to have a detached perspective on some things. I may see stuff as obvious and misunderstand problems.
Just another issue, for the fun of it. :)
As I have been skimming through the documentation I have found a few places where it is outdated, or lacking in certain respects. Here are a few off the top of my head (actually, it's not just the top, it's all I have found for now):
1. `serialize` method in the corpus classes, only `saveCorpus` (both in API and tutorials)
2. `lsi = models.LsiModel(corpus_tfidf, id2word=dictionary, numTopics=2)` does not work (http://nlp.fi.muni.cz/projekty/gensim/tut2.html). It should be `id2word=dictionary.id2word`.
3. `dictionary.filterTokens`, let alone `compactify`. I reckon the corpus would obviously "not work" after these commands, and has to be reparsed, but I am not sure. A few sentences on this would be welcome, along with a code example of how to reparse the corpus with such a dictionary (`doc2bow` with `allowUpdate=False`?)