Is your feature request related to a problem? Please describe.
It is not related to any problem. I would like to generate soft clustering results for a sample of 1.5 million points; the clustering produced about 15,000 clusters.
To store the full soft clustering result, I need at least 1.5 * 10^6 points * 15,000 clusters * 4 bytes = 90 GB of memory.
This clearly does not scale if we want to run clustering on even larger data.
Describe the solution you'd like
Instead of storing each point's membership probability for every cluster, expose a parameter in the constructor and store only that many probabilities per point, sorted in descending order. Also store the probability of each point being a noise point in a separate array.
At least for my use case, I don't need more than the top 10 probabilities and the noise probability.
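To illustrate the idea, here is a minimal NumPy sketch of truncating a dense membership matrix to the top-k probabilities per point. All names and sizes are hypothetical, and the "noise" column here is just the leftover mass not covered by the retained clusters, purely for illustration; a real implementation would take the noise score from the clustering algorithm itself and would avoid materializing the dense matrix in the first place.

```python
import numpy as np

# Illustrative sizes only. At the scale in this issue (1.5M points x 15k
# clusters, float32) the dense matrix below would be ~90 GB.
n_points, n_clusters, top_k = 1000, 50, 10
rng = np.random.default_rng(0)
memberships = rng.random((n_points, n_clusters), dtype=np.float32)
memberships /= memberships.sum(axis=1, keepdims=True)

# Select the top_k probabilities per point. argpartition avoids a full
# sort of all n_clusters columns; we then sort only the k selected
# entries in descending order.
idx = np.argpartition(memberships, -top_k, axis=1)[:, -top_k:]
top_probs = np.take_along_axis(memberships, idx, axis=1)
order = np.argsort(-top_probs, axis=1)
top_idx = np.take_along_axis(idx, order, axis=1)        # cluster ids, (n_points, top_k)
top_probs = np.take_along_axis(top_probs, order, axis=1)  # probabilities, (n_points, top_k)

# Separate per-point "noise" array: here simply the probability mass not
# covered by the retained top_k clusters (an assumption for this sketch).
noise_prob = 1.0 - top_probs.sum(axis=1)
```

Per point this keeps 2 * top_k values (cluster id and probability) plus one noise value instead of n_clusters values, so memory drops from O(n_points * n_clusters) to O(n_points * top_k) — for the numbers above, roughly 90 GB down to about 126 MB.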
Describe alternatives you've considered
Additional context