Guidance about usage with large datasets #22
Comments
Howdy. It is a reasonable point that I should provide some more basic advice on when to apply different types of functions. I think the key issue is whether one uses a feature-based function, where the complexity is linear in the size of the data set, or a graph-based function, where the complexity is linear in the number of edges (e.g., quadratic in the data set size when the graph is dense). Unfortunately, submodular functions with the same computational complexity can sometimes have very different runtimes, depending on the number of operations in the score function. Further, beyond high-level decisions like feature-based vs. graph-based, it's often unclear which function will work best for a given data set. Can you describe a bit more what you'd like to see? I'll see if I can add some more thoughts soon.
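To make the feature-based vs. graph-based distinction concrete, here is a minimal sketch in plain Python. It is not the library's implementation: the function names are hypothetical, the feature-based objective is assumed to be a sum of square roots over non-negative features, and the graph-based objective is assumed to be facility location over a dense, precomputed similarity matrix.

```python
import math


def feature_based_greedy(X, k):
    """Greedy selection for f(A) = sum_j sqrt(sum_{i in A} X[i][j]).

    Assumes non-negative features. Each gain evaluation reads one row of X,
    so one greedy step costs O(n * d): linear in the number of examples.
    """
    n, d = len(X), len(X[0])
    totals = [0.0] * d            # per-feature sums over the selected set
    chosen = [False] * n
    selected = []
    for _ in range(k):
        best_i, best_gain = None, -1.0
        for i in range(n):
            if chosen[i]:
                continue
            gain = sum(math.sqrt(totals[j] + X[i][j]) - math.sqrt(totals[j])
                       for j in range(d))
            if gain > best_gain:
                best_i, best_gain = i, gain
        chosen[best_i] = True
        selected.append(best_i)
        totals = [totals[j] + X[best_i][j] for j in range(d)]
    return selected


def facility_location_greedy(S, k):
    """Greedy selection for f(A) = sum_i max_{j in A} S[i][j].

    S is a dense n x n matrix of non-negative similarities. Each gain
    evaluation scans a full column of S, so one greedy step costs O(n^2),
    and storing S alone already needs O(n^2) memory.
    """
    n = len(S)
    best_sim = [0.0] * n          # similarity of each point to its closest selected point
    chosen = [False] * n
    selected = []
    for _ in range(k):
        best_j, best_gain = None, -1.0
        for j in range(n):
            if chosen[j]:
                continue
            gain = sum(max(S[i][j] - best_sim[i], 0.0) for i in range(n))
            if gain > best_gain:
                best_j, best_gain = j, gain
        chosen[best_j] = True
        selected.append(best_j)
        best_sim = [max(best_sim[i], S[i][best_j]) for i in range(n)]
    return selected
```

Under these assumptions, doubling the number of examples roughly doubles the work per greedy step for the feature-based function and roughly quadruples it for the graph-based one, which is why the graph-based functions tend to become the bottleneck first on large data sets.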
I completely understand the issue, and I do not expect a 1:1 mapping between use cases and functions. I am also quite interested in using streaming algorithms, for which you produced a notebook and provided the partial_fit function. In this case it would be good to understand what happens if you have a stream of points (or multiple batches).
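As a rough illustration of the stream-versus-batches question, here is a minimal sketch of a one-pass, single-threshold selector. It is an assumption-laden toy, not the library's partial_fit: the class name `ThresholdStreamSelector`, the fixed `threshold` parameter, and the square-root feature-based objective are all hypothetical choices made for this example.

```python
import math


class ThresholdStreamSelector:
    """Hypothetical one-pass selector (not the library's implementation).

    Keeps a point if its marginal gain under
    f(A) = sum_j sqrt(sum_{x in A} x[j]) is at least `threshold`
    and fewer than `k` points have been kept so far.
    Memory use is O(k * d), independent of the stream length.
    """

    def __init__(self, k, threshold):
        self.k = k
        self.threshold = threshold
        self.selected = []        # kept points, in arrival order
        self.totals = None        # per-feature sums over the kept points

    def _gain(self, x):
        return sum(math.sqrt(t + v) - math.sqrt(t)
                   for t, v in zip(self.totals, x))

    def partial_fit(self, batch):
        # A batch is processed point by point, so how the stream is split
        # into batches does not change which points are kept.
        for x in batch:
            if self.totals is None:
                self.totals = [0.0] * len(x)
            if len(self.selected) < self.k and self._gain(x) >= self.threshold:
                self.selected.append(list(x))
                self.totals = [t + v for t, v in zip(self.totals, x)]
        return self
```

In this sketch each point is seen exactly once and the decision to keep it is never revisited, so feeding the data as single points, as a few batches, or as one large batch gives the same selection. The result does depend on arrival order and on the threshold, which is the usual price of a one-pass method compared with a greedy pass over the full data set.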
I tried the library on some datasets, and I have to say I am pleasantly surprised by the usability and the effectiveness of the methods provided.
At the same time, I found a serious blocker to using it with large datasets: doing so requires reading the literature referenced in the documentation. It would be extremely useful to provide guidance about the computational complexity of the different methods, or to distinguish between scalable methods (e.g., streaming methods) and less scalable ones.