Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Already on GitHub? Sign in to your account

k-means #2

Closed
maxogden opened this Issue Jun 27, 2012 · 6 comments

Comments

Projects
None yet
3 participants

http://web.engr.oregonstate.edu/~shindler/papers/StreamingKMeans_soda11.pdf

there are c++ implementations out there on the 'nets

piatra commented Nov 20, 2012

After some research turns out that you can apply the kmeans algorithm for pretty much anything that you can define a norm on, be that images, words or numbers. What do you have in mind ?

I was thinking a number clusterer would be the most widely useful at first. something like this one that works on X/Y points http://polymaps.org/ex/cluster.html

so an API like this (not necessarily with the method chaining style though):

// initialize and configure
var cluster = kmeans().iterations(16).size(64))

// add a new point
cluster.add([x,y])

// get the current means
var means = cluster.means()

though instead of cluster.add it would ideally implement the node stream API (the other stats functions in this library already do), and there is a nice guide here https://github.com/substack/stream-handbook as well as example streams like this https://github.com/maxogden/websocket-stream/blob/master/index.js

the hard part is making it truly streaming so that you can call cluster.means() after adding any number of points and it will constantly be recalculating the clusters each time you add

schue commented Nov 21, 2012

Do voronoi cells accurately represent the distance for spaces with streets? Do the routes of a map morph its topology? Sometimes a location on a map may be close by but inconvenient to reach so its really further away than it appears. Is there a way to transform a map that compensates for the convenience of its routes?

piatra commented Nov 21, 2012

@schue Usually these algorithms take into account a cost for each point. It's up to you to tweak the algorithm and increase the cost of a specific point based on additional data (just an undocumented opinion)

piatra commented Nov 22, 2012

I've ported the algorithm from octave to javascript (last year's homework =) ) and started a repo to keep track of the progress. It should work but I have only tested it on a single set of numbers. You can view the output here http://piatra.github.com/streaming-k-means/ not much just some coloured dots. It's far from anything right now but it's only a first attempt.I'll write a better documentation and start working on it to make it stream !

@tmcw tmcw closed this in ec061ef Jun 12, 2016

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment