is HDBSCAN model export possible? #313

Open
kalpit-konverge opened this issue Dec 26, 2022 · 1 comment
Labels
question General question

Comments

@kalpit-konverge

Ask the question
Is it possible to export an HDBSCAN model to ONNX? As far as I am aware, no such functionality exists in Python.

Is your question about a specific ML algorithm or approach?
This is about clustering, specifically the HDBSCAN method

Is your question about a specific Tribuo class?
HdbscanModel.java

System details

  • Tribuo version
  • Java version (if appropriate)
  • OS/Architecture (if appropriate)

Additional context
It would be great if I can get some general pointers on how export/import of HDBSCAN could be achieved.

@kalpit-konverge kalpit-konverge added the question General question label Dec 26, 2022
@Craigacp
Member

The HDBSCAN algorithm doesn't naturally have a predict method for determining the cluster assignment of a new point because, strictly speaking, a new point could change the clustering of all the rest of the data. In Tribuo (and also in the Python scikit-learn-contrib HDBSCAN implementation) the prediction method is approximate and based on the nearest neighbour keypoints. It would be possible to export that nearest neighbour search into ONNX, but we've not done that yet for any of Tribuo's nearest neighbour prediction models (K-NN, K-Means, HDBSCAN). Also, as ONNX doesn't naturally have a nearest neighbour op, the export would bake in an exhaustive search of the keypoints, in the way the scikit-learn K-NN ONNX converter does (https://github.com/onnx/sklearn-onnx/blob/main/skl2onnx/operator_converters/nearest_neighbours.py#L64). We'd accept contributions to add that kind of export support; otherwise it's on the backlog of ONNX model export features and we'll get to it at some point in the future.
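To make the "exhaustive search of the keypoints" idea concrete, here is a minimal sketch in plain Python. It is not Tribuo's or skl2onnx's code: the function name `nearest_keypoint_label`, the Euclidean metric, and the sample keypoints and labels are all assumptions made for illustration, and a real implementation may differ in its distance function and in how it handles noise points.

```python
import math


def nearest_keypoint_label(point, keypoints, labels):
    """Return (label, distance) for the keypoint closest to `point`.

    Hypothetical helper: it computes the distance to every keypoint and
    takes the minimum. An ONNX graph without a nearest neighbour op would
    have to encode the same pattern (all pairwise distances, then arg-min).
    """
    if not keypoints or len(keypoints) != len(labels):
        raise ValueError("keypoints and labels must be non-empty and equal in length")
    # Euclidean distance is an assumption; swap in another metric if needed.
    distances = [math.dist(point, kp) for kp in keypoints]
    best_index = min(range(len(distances)), key=distances.__getitem__)
    return labels[best_index], distances[best_index]


# Illustrative keypoints and cluster labels, not taken from a trained model.
keypoints = [(0.0, 0.0), (0.5, 0.0), (10.0, 10.0)]
labels = [0, 0, 1]

label, distance = nearest_keypoint_label((9.0, 9.5), keypoints, labels)
print(label, round(distance, 3))
```

The cost grows linearly with the number of keypoints for every prediction, which is the trade-off of baking the search into the exported graph.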

Tribuo is unlikely to ever support importing an HDBSCAN model into the HdbscanModel class. With a small amount of additional work, though, we could support loading in an ONNX clustering model (currently Tribuo is missing the ClusterID version of this output adaptor class, which would be straightforward to add).
