PUBDEV-6447 constrained kmeans POC #4067
@angela0xdata, could you review the documentation part of this PR, please? Thank you!
Given the experimental nature of this feature, I think this PR is okay to merge.
h2o-r/h2o-package/R/kmeans.R
@@ -36,6 +36,8 @@
#' @param categorical_encoding Encoding scheme for categorical features Must be one of: "AUTO", "Enum", "OneHotInternal", "OneHotExplicit",
#' "Binary", "Eigen", "LabelEncoder", "SortByResponse", "EnumLimited". Defaults to AUTO.
#' @param export_checkpoints_dir Automatically export generated models to this directory.
#' @param cluster_size_constraints Specify how many points should be at least in each cluster. The length of constraints array has to be same as
minor edit:
Specify how many points should be at least in each cluster. The length of constraints array must be the same as the number of clusters (experimental).
@maurever, I added a simple example to h2o-bindings/bin/custom/python/gen_kmeans.py (similar to other KMeans options).
Assuming my change didn't break anything. (h2o-3 built locally without error.)
No, it is ok. My code is what causes the test failure. Thank you very much for your improvements and review @angela0xdata!
Constrained K-means - Experimental
Calculates K-means under a minimum cluster size constraint.
Implemented according to https://pdfs.semanticscholar.org/ecad/eb93378d7911c2f7b9bd83a8af55d7fa9e06.pdf
JIRA: https://0xdata.atlassian.net/projects/PUBDEV/issues/PUBDEV-6447
Currently, only a serial version of the minimum-cost flow calculation is implemented. A map-reduce version will be implemented soon.
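For reviewers unfamiliar with the idea, here is a minimal sketch of how one constrained assignment step can be phrased as a minimum-cost flow problem. It is not the code from this PR: the function name `constrained_assign`, the graph layout (an extra "overflow" node for points beyond each cluster's minimum), and the successive-shortest-path solver are all assumptions made for illustration, and it covers only a single assignment step for fixed centers, not the full K-means loop.

```python
from collections import deque


def constrained_assign(points, centers, min_sizes):
    """Assign every point to one center so that center h receives at least
    min_sizes[h] points and the total squared distance is minimal.
    Hypothetical helper for illustration only."""
    n, k = len(points), len(centers)
    if len(min_sizes) != k:
        raise ValueError("need exactly one minimum size per cluster")
    if sum(min_sizes) > n:
        raise ValueError("minimum sizes add up to more than the number of points")

    # Node layout: 0 = source, 1..n = points, n+1..n+k = clusters,
    # n+k+1 = overflow node, n+k+2 = sink.
    src, overflow, sink = 0, n + k + 1, n + k + 2
    graph = [[] for _ in range(n + k + 3)]

    def add_edge(u, v, cap, cost):
        # Edge record: [target, residual capacity, cost, index of reverse edge].
        graph[u].append([v, cap, cost, len(graph[v])])
        graph[v].append([u, 0, -cost, len(graph[u]) - 1])

    for i, p in enumerate(points):
        add_edge(src, 1 + i, 1, 0)  # each point supplies one unit of flow
        for h, c in enumerate(centers):
            sq_dist = sum((a - b) ** 2 for a, b in zip(p, c))
            add_edge(1 + i, 1 + n + h, 1, sq_dist)
    for h, tau in enumerate(min_sizes):
        add_edge(1 + n + h, sink, tau, 0)    # the required minimum for cluster h
        add_edge(1 + n + h, overflow, n, 0)  # anything above the minimum
    add_edge(overflow, sink, n - sum(min_sizes), 0)

    # Successive shortest paths: push one unit per point along the cheapest
    # residual path (queue-based Bellman-Ford, since residual costs can be negative).
    for _ in range(n):
        dist = [float("inf")] * len(graph)
        prev = [None] * len(graph)
        queued = [False] * len(graph)
        dist[src] = 0
        queue = deque([src])
        while queue:
            u = queue.popleft()
            queued[u] = False
            for idx, (v, cap, cost, _) in enumerate(graph[u]):
                if cap > 0 and dist[u] + cost < dist[v]:
                    dist[v] = dist[u] + cost
                    prev[v] = (u, idx)
                    if not queued[v]:
                        queued[v] = True
                        queue.append(v)
        if prev[sink] is None:
            raise RuntimeError("no augmenting path found")
        v = sink
        while v != src:
            u, idx = prev[v]
            edge = graph[u][idx]
            edge[1] -= 1                 # use one unit of forward capacity
            graph[v][edge[3]][1] += 1    # and open one unit on the reverse edge
            v = u

    # A saturated point -> cluster edge marks the chosen cluster.
    labels = []
    for i in range(n):
        for v, cap, _, _ in graph[1 + i]:
            if 1 + n <= v <= n + k and cap == 0:
                labels.append(v - 1 - n)
                break
    return labels
```

Because all sink capacities add up to exactly the number of points, a full flow has to saturate every cluster's minimum edge, which is what enforces the constraint. For example, with 1-D points `[(0,), (1,), (2,), (10,)]`, centers `[(1,), (10,)]` and minimum sizes `[2, 2]`, the point at 2 is moved to the second cluster because that is the cheapest way to satisfy its minimum.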
EDIT:
-> mark as experimental
-> remove from Python/R API
-> remove Doc