
My flsa fixes #3438

Merged: 9 commits into piskvorky:flsa_fixes, May 10, 2023

Conversation

@ERijck (Contributor) commented Jan 25, 2023

Fixes points 1, 3 and 4 from #3437.

More specifically, I have:

  • Addressed the FIXMEs: I removed the ones I fixed and answered (prefixed with `ERijck:`) the ones I did not fix.
  • Fixed points 1, 3 and 4.
  • Removed the method to get topic embeddings; it does not logically belong in flsamodel.py.

Now we still need to fix point 2. It is unclear how to perform singular value decomposition on a BoW representation. However, I suspect lsimodel.py does this as well, so I will see how it is done there and rewrite the code.
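
A minimal sketch, assuming the standard gensim BoW format and scipy's sparse SVD, of one way point 2 could be approached; this is not how lsimodel.py implements its decomposition, and the toy texts are invented:

```python
# Sketch only: convert a BoW corpus to a sparse term-document matrix, then run
# a truncated SVD on it. The example texts below are made up for illustration.
from gensim import corpora, matutils
from scipy.sparse.linalg import svds

texts = [
    ["human", "machine", "interface"],
    ["survey", "user", "computer", "interface"],
    ["graph", "trees", "minors", "survey"],
]
dictionary = corpora.Dictionary(texts)
bow_corpus = [dictionary.doc2bow(text) for text in texts]

# corpus2csc builds a scipy.sparse CSC matrix of shape (num_terms, num_docs).
td_matrix = matutils.corpus2csc(bow_corpus, num_terms=len(dictionary))

# Truncated SVD keeping the top k singular triplets (k must be < min(td_matrix.shape)).
u, s, vt = svds(td_matrix, k=2)  # u: terms x 2, s: (2,), vt: 2 x docs
```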

@piskvorky (Owner) left a comment

This is great, the documentation makes much more sense now.

I left a few minor comments; the only bigger step left is to accept streaming input, i.e. avoid materializing the whole corpus in RAM. I'll try to get to it this weekend, to assist you.
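
For reference, a minimal sketch of what streaming input usually looks like in gensim: the model accepts any iterable that yields one BoW document per iteration, so the caller never has to hold the whole corpus in memory. The file name, one-document-per-line layout and whitespace tokenization below are assumptions for illustration:

```python
# Sketch of streamed input: nothing here ever holds more than one document in RAM.
from gensim.corpora import Dictionary


def iter_tokens(path):
    """Yield one tokenized document at a time from a plain-text file (one doc per line)."""
    with open(path, encoding="utf8") as fin:
        for line in fin:
            yield line.lower().split()


class StreamedCorpus:
    """Iterable corpus: every pass re-reads the file and yields BoW docs one by one."""

    def __init__(self, path, dictionary):
        self.path = path
        self.dictionary = dictionary

    def __iter__(self):
        for tokens in iter_tokens(self.path):
            yield self.dictionary.doc2bow(tokens)


# One streaming pass builds the vocabulary; later passes feed the model document
# by document instead of materializing a full matrix up front.
dictionary = Dictionary(iter_tokens("docs.txt"))
corpus = StreamedCorpus("docs.txt", dictionary)
```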

Review threads on gensim/models/flsamodel.py (resolved)
@ERijck (Contributor, Author) commented Feb 6, 2023

I have just fixed the small comments you left:

[screenshot]

However, when I push this the same way as I did before, I get the following error:

[screenshot]

When I try to pull the branch, git says everything is already up to date:

[screenshot]

Which commands can I use to push my latest commit (7864ebc)?

@piskvorky (Owner) commented Mar 2, 2023

@ERijck we'll be releasing the hotfix soon, which only removes the setup.py dependency. Can you finish up this PR please?

Your error above means you rebased your PR instead of doing a simple commit, so the changes are "non-fast-forward". What did you do, exactly? Maybe some graphical git tool messed up your repository; I prefer the command line for this reason.

Either way, if Flsa is not fixed by the next release, we'll have to remove it.

@ERijck (Contributor, Author) commented Mar 14, 2023

Hi @piskvorky, thanks for your message. I will look into this with a colleague soon.

@piskvorky (Owner) commented

@ERijck any plans to finish this?

I'm thinking it's best if you put this (fixed) module into a separate package, separate from gensim. It's useful on its own, but it looks like it's not ready for gensim yet. The maintenance effort to support this module seems too steep for us.

@ERijck (Contributor, Author) commented May 1, 2023

@piskvorky, I thought I had fixed the points on March 21st, so that only step 2 (avoid materializing the whole corpus in RAM) is left. For that, I was waiting for your input, but I understand that this takes too much of your time and effort.

What needs to change so that it is ready for Gensim? Is it correct that you want to use a BoW representation, i.e. integer token ids to represent a document rather than a list of strings? If so, I can implement these changes. If you want other changes, could you briefly describe what you want and how I can do it? Then I will work on implementing them.
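
For clarity, a small sketch of the two representations being contrasted here, using an invented example document:

```python
# Sketch: the same document as a list of strings vs. gensim's integer BoW format.
from gensim.corpora import Dictionary

doc_as_strings = ["graph", "minors", "survey", "graph"]  # made-up example document

dictionary = Dictionary([doc_as_strings])
doc_as_bow = dictionary.doc2bow(doc_as_strings)
# doc_as_bow is a list of (token_id, count) pairs, e.g. [(0, 2), (1, 1), (2, 1)];
# this integer-id form is what gensim's other models consume as a "corpus".
```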

@piskvorky merged commit 4fcda16 into piskvorky:flsa_fixes on May 10, 2023