ClodHopper: A High Performance Java Library for Data Clustering
ClodHopper is an open-source Java library for high-performance clustering of numerical data.
It includes implementations of K-Means, K-Means++, X-Means, G-Means, Fuzzy C-Means, and various forms of hierarchical clustering. These implementations use the host system's concurrent processing ability to speed up clustering, and the data structures are kept lean to conserve memory. ClodHopper is also extensible: if you are developing a new clustering algorithm, you may save yourself a great deal of work by extending a ClodHopper base class.
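To give a feel for what concurrent clustering can look like, here is a minimal sketch of a parallel nearest-center assignment step, the part of a k-means iteration that is easiest to spread across cores. It is not ClodHopper's code and does not use its API; the class name `ParallelAssignSketch` and its methods are hypothetical, and the use of Java parallel streams is an assumption made for brevity.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

// Hypothetical illustration only; this is not ClodHopper's implementation or API.
final class ParallelAssignSketch {

    // Squared Euclidean distance between two points of equal length.
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // Index of the center closest to the given point.
    static int nearestCenter(double[] point, double[][] centers) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int c = 0; c < centers.length; c++) {
            double dist = squaredDistance(point, centers[c]);
            if (dist < bestDist) {
                bestDist = dist;
                best = c;
            }
        }
        return best;
    }

    // Assigns every point to its nearest center. Each point is handled
    // independently, so the work can be split across cores without locking.
    static int[] assign(double[][] points, double[][] centers) {
        return IntStream.range(0, points.length)
                .parallel()
                .map(i -> nearestCenter(points[i], centers))
                .toArray();
    }

    public static void main(String[] args) {
        double[][] points = {{0, 0}, {0, 1}, {10, 10}, {10, 11}};
        double[][] centers = {{0, 0.5}, {10, 10.5}};
        System.out.println(Arrays.toString(assign(points, centers)));
    }
}
```

Because the result array is indexed by point, the output order matches the input order even though the distance computations run in parallel.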
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
Randall Scarberry, email: email@example.com
How to get started with ClodHopper:
If you want to download and browse the code, use git as follows:
git clone https://github.com/rscarberry-wa/clodhopper.git
In the newly created clodhopper directory, you will find the subdirectories clodhopper-core and clodhopper_examples. The first contains a Maven project for ClodHopper proper. The second contains a project with numerous examples. I recommend importing both projects into Eclipse or the IDE of your choice.
- If you simply want to use ClodHopper to cluster something in one of your programs, just add this dependency to your Maven project:
<dependency>
  <groupId>org.battelle</groupId>
  <artifactId>clodhopper-core</artifactId>
  <version>1.0.0</version>
</dependency>
The simplest example shows how to use k-means to cluster a CSV file containing numeric data. The example is contained in the file:
This file is generously commented.
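For readers who want a rough idea of what k-means over numeric CSV rows involves before opening the example, here is a small library-independent sketch. It does not use ClodHopper's classes and is not the example file mentioned above; the class `CsvKMeansSketch`, its method names, and the choice to seed with the first k rows are all assumptions made to keep the sketch short and deterministic.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, library-independent sketch; it does not use ClodHopper's classes.
final class CsvKMeansSketch {

    // Parses lines of comma-separated numbers into rows; blank lines are skipped.
    static double[][] parseCsv(List<String> lines) {
        List<double[]> rows = new ArrayList<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) continue;
            String[] fields = line.split(",");
            double[] row = new double[fields.length];
            for (int i = 0; i < fields.length; i++) {
                row[i] = Double.parseDouble(fields[i].trim());
            }
            rows.add(row);
        }
        return rows.toArray(new double[0][]);
    }

    // Plain Lloyd's algorithm. Seeds with the first k rows (an assumption made
    // for determinism) and requires data.length >= k.
    // Returns the cluster index assigned to each row.
    static int[] kMeans(double[][] data, int k, int maxIterations) {
        int dims = data[0].length;
        double[][] centers = new double[k][];
        for (int c = 0; c < k; c++) centers[c] = data[c].clone();
        int[] assignment = new int[data.length];
        Arrays.fill(assignment, -1);

        for (int iter = 0; iter < maxIterations; iter++) {
            // Assignment step: move each row to its nearest center.
            boolean changed = false;
            for (int i = 0; i < data.length; i++) {
                int best = 0;
                double bestDist = Double.POSITIVE_INFINITY;
                for (int c = 0; c < k; c++) {
                    double dist = 0.0;
                    for (int d = 0; d < dims; d++) {
                        double diff = data[i][d] - centers[c][d];
                        dist += diff * diff;
                    }
                    if (dist < bestDist) { bestDist = dist; best = c; }
                }
                if (assignment[i] != best) { assignment[i] = best; changed = true; }
            }
            if (!changed) break; // converged: no row switched clusters

            // Update step: recompute each center as the mean of its rows.
            double[][] sums = new double[k][dims];
            int[] counts = new int[k];
            for (int i = 0; i < data.length; i++) {
                counts[assignment[i]]++;
                for (int d = 0; d < dims; d++) sums[assignment[i]][d] += data[i][d];
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) continue; // leave an empty cluster's center unchanged
                for (int d = 0; d < dims; d++) centers[c][d] = sums[c][d] / counts[c];
            }
        }
        return assignment;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("1.0,1.0", "9.0,9.0", "1.2,0.8", "8.8,9.1");
        int[] clusters = kMeans(parseCsv(lines), 2, 100);
        System.out.println(Arrays.toString(clusters));
    }
}
```

To run the sketch on a real file, the lines could be loaded with `java.nio.file.Files.readAllLines` and passed to `parseCsv`. It assumes every row has the same number of columns and no header line.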
Also check out the following demos:
This example runs several of the clustering algorithms in sequence on generated data. As they complete, it displays scatter plots with the clusters collapsed into two dimensions. You can drag your mouse to select clusters and points in any of the plots, and the selections propagate to the other plots, indicating how the clusters correspond.
This example lets you read in a CSV data file and cluster the data with many of the library's algorithms, using just about any parameter settings you please. You can then save the clustering results to a simple CSV file.
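As an illustration of what a simple CSV of clustering results might contain, the sketch below writes one `row,cluster` line per data row. The layout is an assumption and may differ from the file this demo actually produces; the class `ClusterCsvSketch` and its method are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical output layout; the demo's actual file format may differ.
final class ClusterCsvSketch {

    // Builds a header line followed by one "rowIndex,clusterId" line per data row.
    static List<String> toCsvLines(int[] assignment) {
        List<String> lines = new ArrayList<>();
        lines.add("row,cluster");
        for (int i = 0; i < assignment.length; i++) {
            lines.add(i + "," + assignment[i]);
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        int[] assignment = {0, 1, 0, 1};
        // Write to a temporary file so the sketch does not overwrite anything.
        Path out = Files.createTempFile("clusters", ".csv");
        Files.write(out, toCsvLines(assignment));
        System.out.println("Wrote " + out);
    }
}
```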
Watch for more on the wiki! ClodHopper is just getting started.