Add a "kmeans" strategy to select the prototypes #52

Closed
GaelVaroquaux opened this issue Oct 29, 2018 · 2 comments
Labels
enhancement New feature or request

Comments

@GaelVaroquaux
Member

Using k-means to define the prototypes is a useful way to reduce the dimensionality. The steps are as follows:

  1. hash all the strings of the categories
  2. randomly project them into 256 dimensions
  3. run k-means on the resulting data, with the number of clusters set to the desired dimensionality (and n_init=1)
  4. use a nearest-neighbor search from scikit-learn to map each cluster center back to the closest original category.
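
The four steps above could be sketched roughly as follows with scikit-learn. This is only an illustration under assumptions, not the implementation that was eventually merged: the function name `kmeans_prototypes`, the use of `HashingVectorizer` on character 3-grams for the hashing step, the 4096 hashed features, and the Gaussian random projection are all choices made here for the example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.neighbors import NearestNeighbors
from sklearn.random_projection import GaussianRandomProjection


def kmeans_prototypes(categories, n_prototypes, random_state=0):
    """Hypothetical helper: select up to n_prototypes categories via k-means."""
    categories = np.unique(np.asarray(categories, dtype=str))

    # 1. Hash the category strings (character 3-grams are an assumption).
    hasher = HashingVectorizer(analyzer="char", ngram_range=(3, 3),
                               n_features=2 ** 12)
    hashed = hasher.transform(categories)

    # 2. Randomly project the hashed vectors into 256 dimensions.
    projector = GaussianRandomProjection(n_components=256,
                                         random_state=random_state)
    projected = np.asarray(projector.fit_transform(hashed))

    # 3. Run k-means with one cluster per desired prototype and n_init=1.
    kmeans = KMeans(n_clusters=n_prototypes, n_init=1,
                    random_state=random_state).fit(projected)

    # 4. Map each cluster center to its nearest original category.
    neighbors = NearestNeighbors(n_neighbors=1).fit(projected)
    indices = neighbors.kneighbors(kmeans.cluster_centers_,
                                   return_distance=False).ravel()

    # Two centers may share a nearest category, so deduplicate.
    return categories[np.unique(indices)]


prototypes = kmeans_prototypes(
    ["london", "londres", "paris", "pariss", "berlin", "berlinn"], 3
)
print(prototypes)
```

Because two cluster centers can share the same nearest category, the sketch may return fewer prototypes than requested; a real implementation would have to decide how to handle that case.
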
@GaelVaroquaux GaelVaroquaux added the enhancement New feature or request label Oct 29, 2018
@GaelVaroquaux GaelVaroquaux changed the title from "Add a "kmeans" strategie to select the prototypes" to "Add a "kmeans" strategy to select the prototypes" Oct 29, 2018
@GaelVaroquaux
Member Author

This should probably be implemented as a separate function, since the procedure is well suited to one.

@GaelVaroquaux GaelVaroquaux added this to the sprint nov 2018 milestone Oct 29, 2018
@GaelVaroquaux
Member Author

Closing as implemented.
