
Initial SVN import

git-svn-id: https://python-cluster.svn.sourceforge.net/svnroot/python-cluster/trunk@1 57eab859-f816-0410-af72-e61ffa1cc713
exhuma committed Oct 14, 2007
0 parents commit 72092770328659988998d5693c1a23a0dd74990f
Showing with 1,539 additions and 0 deletions.
  1. +46 −0 CHANGELOG
  2. +44 −0 INSTALL
  3. +505 −0 LICENSE
  4. +2 −0 MANIFEST.in
  5. +42 −0 README
  6. BIN cluster.bmp
  7. +727 −0 cluster.py
  8. +148 −0 clusterTests.py
  9. +8 −0 makedist.sh
  10. +2 −0 setup.cfg
  11. +15 −0 setup.py
46 CHANGELOG
@@ -0,0 +1,46 @@
+1.1.1b2
+ - Fixed bug #1604859 (thanks to Willi Richert for reporting it)
+
+1.1.1b1
+ - Applied patch [1535137] (thanks ajaksu)
+ --> Topology output supported
+ --> data and raw_data are now properties.
+
+1.1.0b1
+ - KMeans Clustering implemented for simple numeric tuples.
+ Data in the form [(1,1), (2,1), (5,3), ...]
+ can be clustered.
+
+ Usage:
+
+ >>> from cluster import KMeansClustering
+ >>> cl = KMeansClustering([(1,1), (2,1), (5,3), ...])
+ >>> clusters = cl.getclusters(2)
+
+ The "getclusters" method takes the number of clusters you would like to
+ have as its parameter.
+
+ Only numeric values are supported in the tuples. The reason for this is
+ that the "centroid" method I use essentially returns a tuple of floats,
+ so any other kind of metadata is lost. Once I figure out a way to recode
+ that method, other types should be possible.
+
+1.0.1b2
+ - Optimized calculation of the hierarchical clustering by using the fact that
+ the generated matrix is symmetrical.
+
+1.0.1b1
+ - Implemented complete-, average-, and uclus-linkage methods. You can select
+ one by specifying it in the constructor, for example:
+
+ cl = HierarchicalClustering(data, distfunc, linkage='uclus')
+
+ or by setting it before starting the clustering process:
+
+ cl = HierarchicalClustering(data, distfunc)
+ cl.setLinkageMethod('uclus')
+ cl.cluster()
+
+ - Clustering is not executed on object creation, but on the first call of
+ "getlevel". You can force the creation of the clusters by calling the
+ "cluster" method as shown above.
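The 1.0.1b2 entry above mentions using the symmetry of the generated matrix. The following is a minimal sketch of that idea, not the code from cluster.py: the helper name `distance_matrix` is hypothetical, and it assumes that the distance function is symmetric and that the distance from an item to itself is 0. Under those assumptions only the upper triangle needs to be computed.

```python
def distance_matrix(items, distfunc):
    """Build the full pairwise matrix, calling distfunc once per unordered pair.

    Assumes distfunc(a, b) == distfunc(b, a) and distfunc(a, a) == 0.
    """
    n = len(items)
    matrix = [[0] * n for _ in range(n)]
    for i in range(n):
        # Start at i + 1: the diagonal stays 0 and the lower triangle is mirrored.
        for j in range(i + 1, n):
            d = distfunc(items[i], items[j])
            matrix[i][j] = d
            matrix[j][i] = d
    return matrix


# Same sample data and distance function as in the README usage example.
data = [12, 34, 23, 32, 46, 96, 13]
matrix = distance_matrix(data, lambda x, y: abs(x - y))
```

For n items this calls the distance function n * (n - 1) / 2 times instead of n * n times, which matters when the distance function is expensive.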
44 INSTALL
@@ -0,0 +1,44 @@
+INSTALLATION
+============
+
+Linux
+-----
+
+RPM-Installation
+~~~~~~~~~~~~~~~~
+
+I'm not familiar with RPM-based distributions, but as far as I know it should
+be something like::
+
+ rpm -i <filename.rpm>
+
+RPM-source Installation
+~~~~~~~~~~~~~~~~~~~~~~~
+
+This is something I don't know. If somebody can enlighten me, please do!
+
+Binary/Source installation
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Untar the package with your favourite archive tool. On the console it will be
+something along the lines of::
+
+ tar xzf <filename.tar.gz>
+
+Next, go to the folder just created (it has the same name as the package,
+for example "cluster-1.0.0b1") and run::
+
+ python setup.py install
+
+For this step you need root privileges.
+
+Windows
+-------
+
+Execute the executable file and follow the instructions displayed. Default
+values will be fine in most cases.
+
+MacOS-X
+-------
+
+Simply follow the same instructions as for the Linux binary/source installation.
505 LICENSE

Large diffs are not rendered by default.

2 MANIFEST.in
@@ -0,0 +1,2 @@
+include README LICENSE CHANGELOG
+include *.py cluster.bmp MANIFEST.in
42 README
@@ -0,0 +1,42 @@
+DESCRIPTION
+===========
+
+python-cluster is a "simple" package that allows you to create several groups
+(clusters) of objects from a list. It's meant to be flexible and able to
+cluster any object. To ensure this kind of flexibility, you need to supply not
+only the list of objects, but also a function that calculates the distance
+between two of those objects. For simple datatypes, like integers, this can be
+as simple as a subtraction, but more complex calculations are possible. Right
+now, it is possible to generate the clusters using hierarchical clustering
+or the popular K-Means algorithm. For the hierarchical algorithm, several
+"linkage" methods (single, complete, average and uclus) are available. I plan
+to implement other algorithms as well on an "as-needed" or "as-I-have-time"
+basis.
+
+Algorithms are based on the document found at
+http://www.elet.polimi.it/upload/matteucc/Clustering/tutorial_html/
+
+USAGE
+=====
+
+A simple Python program could look like this::
+
+ >>> from cluster import *
+ >>> data = [12,34,23,32,46,96,13]
+ >>> cl = HierarchicalClustering(data, lambda x,y: abs(x-y))
+ >>> cl.getlevel(10) # get clusters of items closer than 10
+ [96, 46, [12, 13, 23, 34, 32]]
+ >>> cl.getlevel(5) # get clusters of items closer than 5
+ [96, 46, [12, 13], 23, [34, 32]]
+
+Note that the clustering process starts the first time you retrieve a set of
+clusters, and that it is computationally expensive. If you intend to create
+clusters from a large dataset, consider doing that in a separate thread.
+
+For K-Means clustering it would look like this::
+
+ >>> from cluster import KMeansClustering
+ >>> cl = KMeansClustering([(1,1), (2,1), (5,3), ...])
+ >>> clusters = cl.getclusters(2)
+
+The parameter passed to getclusters is the number of clusters to generate.
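The K-Means usage above works on numeric tuples, and the CHANGELOG explains why: the centroid computation returns a tuple of floats. As a rough illustration of what such a computation could look like (a sketch under the assumption that the centroid is the component-wise mean, not the function shipped in cluster.py), the hypothetical helper below averages each coordinate across all points.

```python
def centroid(points):
    """Return the component-wise mean of equal-length numeric tuples.

    Illustrative helper only; the name and behaviour are assumptions.
    """
    if not points:
        raise ValueError("cannot compute the centroid of an empty list")
    count = float(len(points))
    # zip(*points) regroups the tuples by coordinate: all x values, all y values, ...
    return tuple(sum(axis) / count for axis in zip(*points))


center = centroid([(1, 1), (3, 3)])
print(center)
```

Because every coordinate is averaged into a float, any non-numeric element in a tuple would make the sum fail, which is consistent with the restriction to numeric values described in the CHANGELOG.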
Binary file not shown.