personalization() has explosive memory requirements due to pairwise comparison #37

Open
ahgraber opened this issue Jan 7, 2022 · 3 comments

ahgraber commented Jan 7, 2022

On my system (16 GB RAM), a list of 10k recommendations will run, but a list of 50k crashes. I'd like to understand the personalization score across my entire hypothetical customer base of 250k+ users.

Is there a way to chunk the scipy.sparse.csr_matrix and iteratively calculate the cosine similarity to avoid holding the whole thing in memory?
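One possible shape for such a chunked calculation is sketched below. It assumes the score is one minus the mean pairwise cosine similarity between the users' rows of the recommendation matrix; that definition, the function name `chunked_personalization`, and the `chunk_size` parameter are assumptions made for illustration, not part of the library. Only a `chunk_size` by `n_users` block of similarities exists at any one time, and only its sum is kept.

```python
import numpy as np
import scipy.sparse as sp


def chunked_personalization(rec_matrix, chunk_size=1000):
    """Hypothetical helper: 1 - mean off-diagonal cosine similarity,
    accumulated block by block instead of building the full n x n matrix."""
    rec = sp.csr_matrix(rec_matrix, dtype=np.float64)
    n_users = rec.shape[0]

    # Scale each row to unit L2 length so a dot product is a cosine similarity.
    norms = np.sqrt(np.asarray(rec.multiply(rec).sum(axis=1)).ravel())
    norms[norms == 0] = 1.0  # leave all-zero rows untouched
    unit_rows = (sp.diags(1.0 / norms) @ rec).tocsr()
    unit_rows_t = unit_rows.T.tocsr()

    total = 0.0
    for start in range(0, n_users, chunk_size):
        # Similarities of this block of users against all users.
        block = unit_rows[start:start + chunk_size] @ unit_rows_t
        total += block.sum()

    # Self-similarities (1 for every non-empty row) are excluded from the mean.
    diagonal = unit_rows.multiply(unit_rows).sum()
    mean_similarity = (total - diagonal) / (n_users * (n_users - 1))
    return 1.0 - mean_similarity
```

Run time is still quadratic in the number of users, and each block can itself be large when many users share items, so `chunk_size` would need tuning to the available memory.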

@Alex-Bujorianu

I have the same issue. As a workaround, I randomly sampled a set of users from the population.
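A minimal sketch of that sampling workaround, assuming the recommendations are held as one list per user; the helper name `sample_user_indices` and its parameters are hypothetical:

```python
import numpy as np


def sample_user_indices(n_users, sample_size, seed=0):
    """Hypothetical helper: pick distinct user positions uniformly at random."""
    rng = np.random.default_rng(seed)
    size = min(sample_size, n_users)
    return np.sort(rng.choice(n_users, size=size, replace=False))


# Example with made-up data: keep 3 of 10 users' recommendation lists.
all_recs = [[f"item_{(u + k) % 7}" for k in range(3)] for u in range(10)]
sampled_recs = [all_recs[i] for i in sample_user_indices(len(all_recs), 3)]
# sampled_recs can then be passed to personalization() in place of all_recs.
```

The score computed on a sample is only an estimate of the population value, so repeating it with a few different seeds gives a sense of how much it varies.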

gregwchase added the bug label ("Something isn't working") on Feb 23, 2023
ibuda (Contributor) commented Mar 17, 2023

@ahgraber @Alex-Bujorianu The problem with personalization(), besides its time complexity, is that it uses space quadratic in the number of users.
With 50k users, the only issue I still had was run time: I resolved the space problem by increasing the swap from the default 2 GB to 40 GB (I am using Ubuntu). Hope that helps.
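To put rough numbers on the quadratic growth, here is a back-of-the-envelope estimate. It assumes the worst case of a fully dense user-by-user similarity matrix stored as 8-byte floats; a sparse result may be smaller when most user pairs share no items. The function name is hypothetical.

```python
def dense_similarity_gib(n_users, bytes_per_value=8):
    """Hypothetical helper: size in GiB of a dense n_users x n_users matrix."""
    return n_users ** 2 * bytes_per_value / 2 ** 30


for n in (10_000, 50_000, 250_000):
    print(f"{n:>7} users: {dense_similarity_gib(n):.1f} GiB")
# Prints 0.7 GiB, 18.6 GiB and 465.7 GiB respectively.
```

Under that assumption, 50k users already exceeds 16 GB of RAM, which matches the crash reported above, and 250k users would not fit in 40 GB of swap either.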

ibuda (Contributor) commented Mar 17, 2023

@gregwchase I wouldn't label this issue as a bug, as there is no way to bypass the space complexity here. I left a recommendation on how to increase memory space for those who use Linux machines.
