predict for HDBSCAN #32
Comments
Predicting which clusters new points belong to can be done simply with the cluster membership probabilities, either for the default clustering returned or for the clusters returned by [...]. One small technical issue is that both DBSCAN and HDBSCAN are unsupervised frameworks for clustering, so the predicted clusters won't necessarily match the result of, e.g., running DBSCAN/HDBSCAN on the original data set together with the new data instead. |
Predicting cluster membership on new data is a useful thing and should be added. |
sounds good, i will try to create a PR.
@peekxc could you clarify what you mean by that? I'm not entirely sure how to implement your suggestions. what do you mean by the "default clustering"? |
@moredatapls What I mean is that HDBSCAN is not a single clustering algorithm per se. If you run [...] But HDBSCAN isn't limited to just those local cuts; you can also use it as you would a more traditional cluster hierarchy, e.g. [...]
For the prediction though, I think the default clustering is fine. |
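The code from the comment above did not survive, so the following is only a sketch of the distinction it appears to draw, not the commenter's original example. It assumes the fitted object exposes the flat clustering as `cluster` and the hierarchy as an `hclust` object in `hc`, as the dbscan package documents; the toy data and variable names are made up for illustration.

```r
library(dbscan)

set.seed(1)
# Toy data (illustrative only): two well-separated Gaussian blobs
x <- rbind(
  matrix(rnorm(100, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(100, mean = 5, sd = 0.3), ncol = 2)
)

fit <- hdbscan(x, minPts = 5)

# "Default clustering": the flat labels HDBSCAN selects itself (0 = noise)
default_labels <- fit$cluster

# Alternative: treat the hierarchy like an ordinary hclust tree and cut it
# into a fixed number of groups (no noise label in this case)
cut_labels <- cutree(fit$hc, k = 2)

table(default_labels, cut_labels)
```

The two labelings need not agree, which is why the thread settles on the default clustering as the basis for prediction.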
I think the default clustering is fine. I have now extracted the predict functions into their own file, predict.R. Please put the code for HDBSCAN there. |
@peekxc: Please review the code. |
+1 for a predict.hdbscan, it is something we need if we want to implement https://github.com/michalovadek/top2vecr and put a package on CRAN for that. |
hdbscan now has a predict function. |
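A minimal usage sketch, assuming the method follows the same calling convention as the DBSCAN predict method (`object`, `newdata`, and the original training `data`); check the package help page for the exact signature in your installed version. The toy data and variable names are illustrative only.

```r
library(dbscan)

set.seed(42)
# Toy training data (illustrative only): two well-separated Gaussian blobs
train <- rbind(
  matrix(rnorm(100, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(100, mean = 5, sd = 0.3), ncol = 2)
)

fit <- hdbscan(train, minPts = 5)

# New points: one near each blob and one far away from both
newdata <- rbind(c(0, 0), c(5, 5), c(50, 50))

# The training data is passed again because the fitted object does not
# store it; the result is one cluster ID per new point, with 0 for noise
pred <- predict(fit, newdata = newdata, data = train)
pred
```

As noted earlier in the thread, these labels come from the default clustering and will not necessarily match what a fresh hdbscan() run on the combined data would return.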
thanks! |
For a trained HDBSCAN object, I would like to predict the cluster for new data points similar to what is described here. I see that such a functionality exists for DBSCAN in the function predict.dbscan_fast(), but is missing for hdbscan. Would it be possible to implement a predict.hdbscan() function similar to the one for dbscan_fast? Is there any technical reason why this function doesn't exist? Otherwise, I'd be happy to try to create a PR for that.