-
-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
My flsa fixes #3438
My flsa fixes #3438
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great, the documentation makes much more sense now.
I left a few minor comments; the only bigger step left is to accept streaming input = avoid materializing the whole corpus in-RAM. I'll try to get to it this weekend, to assist you.
I have just fixed the small comments you left: However, when I push this the same as how I did before, I get the following error: When I try to pull the branch, git seems up to date already: Which commands can I use to push my latest commit: 7864ebc? |
@ERijck we'll be releasing the hotfix soon, which only removes the Your error above means you rebased your PR, instead of doing simple commit, so the changes are "non-fast-forward". What did you do exactly? Maybe some graphic git tool messed up your repository. I prefer the command line for this reason. Either way, if Flsa is not fixed by the next release, we'll have to remove it. |
Hi @piskvorky, thanks for your message. I will look into this with a colleague soon. |
Pylint fixes
@ERijck any plans to finish this? I'm thinking it's best if you put this (fixed) module into a separate package, separate from gensim. It's useful on its own, but looks like it's not ready for gensim yet. The maintenance effort to support this module seems too steep for us. |
@piskvorky, I thought I fixed the points on March 21st. So that only step 2 (avoid materializing the whole corpus in RAM) is left. For this, I was waiting for your input to finish, but I understand that this takes too much time and effort for you. What needs to change so that it is ready for Gensim? Is it correct that you want to use a BOW representation using integers to represent a document rather than a list of strings? If so, I can implement these changes. If you want other changes, could you briefly describe what you want/how I can do that? Then, I will work on these implementations. |
Fixes points 1,3 and 4 from #3437.
More specifically, I have:
ERijck:
I did not fix.flsamodel.py
.Now, we still need to fix point 2. It is unclear how to perform singular value decomposition on a bow representation. However, I suspect you do this in the
lsimodel.py
as well, so I will see how it is done there and rewrite the code.