Implement GPU based DBSCAN clustering algo #239
I'm new to this repository, but I'm interested in helping to implement G-DBSCAN as part of a class project at NCSU (I'm an MS student). I noticed these two repos, DBSCAN and CudaDBClustering, that may be good starting points. The latter is a little messy. Has this issue made progress? Is there anything I can help with? Thanks,
From what I know @teju85 will not be implementing this for the time being (think they decided to work on Kalman Filters) so you'd have to do the whole implementation yourself (which of course you're more than welcome to do). This would mean:
You can follow the KMeans/tSVD file structure. All our GPU code is here https://github.com/h2oai/h2o4gpu/tree/master/src/gpu
Even only a CUDA backend (without Python wrappers and C++ impl) would be of great help! For starters, it would be good if you just cloned and built the repo, maybe ran the tests, and checked that everything is working. Then go through one existing algo (I think truncated SVD would be easiest), and if you think you're up to it, fork the repo and start working on your impl. If you have any questions, ping us here or on gitter.
@n-casale my bad, misunderstood @teju85 - they already started working on DBSCAN implementation. Not sure if they need any help with it but probably won't be very parallelizable.
Thanks Mateusz for the clarification. @n-casale Thanks for showing interest and also for the links! Regards,
Hello @teju85
Czesc @spaszek :-) From what I remember, @teju85 got a bit sidetracked with other work-related issues, so I wouldn't count on getting DBSCAN in the near future, but let me follow up with him and see what we can do :-) Are you using DBSCAN in production? What would be your use case - how many GPUs are you guys using? What's your data size? We're trying to figure out what the users might need.
Oh, okay - I was under the impression you guys would have DBScan ready in Q1 2018 (as said in the post here https://blog.h2o.ai/2017/12/h2o4gpu-hands-lab-video-updates/ ) :P To be honest, we are not using DBScan anywhere (yet) - I am just looking around for alternatives to Do you plan to port the implementation to h2o-core? I would love to call DBScan from Scala easily, even without the GPUs. I work at a heavily JVM-based company and we replace Python with Scala/Clojure whenever possible. ąęśćóźłż - if you needed them in Japan :D
@spaszek our roadmap is more of a blueprint than anything set in stone, unfortunately, as we're heavily understaffed :( I'm actually working on Java bindings (using SWIG) as we speak and hope to have them integrated with h2o-3/flow before GTC next month (so you'll be able to use them through h2o-3 or just as a standalone h2o4gpu4java lib :-). This will have certain limitations, though, as our GPU implementations are not node-distributed (something we're looking into, but we're still not sure whether we should go with distribution on the CPU or GPU level and whether it is actually necessary).
@mdymczyk Would there be a way for me to contribute to I'm already working on PR at
@spaszek getting Java bindings should be easy and I will probably have it working by the end of the week/next week. But if you want to use it through H2O-3 or Sparkling Water we'll have to somehow integrate that and there are 2 ways to do that:
Will have to have a chat with the h2o-3/sw teams, see which path to choose and then we'll make some JIRAs or github issues here. Will keep you posted :-)
Great, thanks @mdymczyk! I would love to contribute to that if possible - please don't forget to mention me when you guys settle on a solution. I just started using h2o (and sparkling water) extensively. Contributing to the project would be a good way of learning how to actually use the library (as well as speeding you guys up a little in the process).
Hey @teju85 - are there any updates on this? We have tried several other parallelism methods (Spark etc.) but I think that CUDA is the way to go - I also think that some of the other items we are currently doing would be a really good fit for the RTX technology (spatial joins etc.)
Hi @voycey,
Unfortunately, this seems to have problems with crashing! I'll keep an eye on the issue that is debugging it!
@voycey Fixes for crash issue should now be at the HEAD of rapidsai/cuml. Can you check and provide feedback, please? If you still find problems, file an issue in that repo and I'll look at it sooner.
@mdymczyk Do you think we still need to keep this issue open?
I will have an attempt at it this week sometime if I can!
Sorry for necro-bumping, but has there been any progress or alternatives to this?
@blackvvine please refer to my earlier message above. We implemented DBSCAN inside cuML. Not sure if we ever want to provide a wrapper for h2o4gpu, though. I'll leave this decision to the project admins. @pseudotensor ?
@teju85 Which algorithm has been used to implement DBSCAN in cuML? CUDA-DClust or G-DBSCAN or something else?
Original serial algo: https://www.aaai.org/Papers/KDD/1996/KDD96-037.pdf
GPU based implementations:
IMHO, we should start with an implementation based on G-DBSCAN and then refine it based on its performance profile.
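For anyone new to the idea, here is a rough CPU-only sketch of the two phases a G-DBSCAN-style approach is usually described with: build a neighborhood graph, then grow clusters by breadth-first search from core points. This is not code from h2o4gpu or cuML. The function name `g_dbscan_sketch`, the `NOISE` constant, and the convention that a point counts towards its own `min_pts` are all assumptions made for illustration. On a GPU, the per-point neighbor search and the per-level frontier expansion are the parts that would presumably run in parallel.

```python
from collections import deque
from math import dist

NOISE = -1  # hypothetical label for points that end up in no cluster


def g_dbscan_sketch(points, eps, min_pts):
    """Illustrative serial version of a graph-then-BFS DBSCAN."""
    n = len(points)

    # Phase 1: graph construction. Each point's neighbor list is computed
    # independently of the others (brute force, O(n^2) distance checks).
    neighbors = [
        [j for j in range(n) if j != i and dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    # Assumption: a point counts itself when compared against min_pts.
    is_core = [len(nb) + 1 >= min_pts for nb in neighbors]

    # Phase 2: cluster identification via BFS started from unlabeled core points.
    labels = [NOISE] * n
    cluster = 0
    for start in range(n):
        if not is_core[start] or labels[start] != NOISE:
            continue
        labels[start] = cluster
        frontier = deque([start])
        while frontier:
            p = frontier.popleft()
            if not is_core[p]:
                continue  # border points join a cluster but do not expand it
            for q in neighbors[p]:
                if labels[q] == NOISE:
                    labels[q] = cluster
                    frontier.append(q)
        cluster += 1
    return labels
```

Calling `g_dbscan_sketch(points, eps, min_pts)` with a list of coordinate tuples returns one integer label per point, with `NOISE` for outliers. The brute-force neighbor search is only there to keep the sketch short; a real implementation would want something smarter for large inputs.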