Miner is a toy library for data mining. Its main goal is to provide an introduction to different data mining techniques while I learn about the subject myself.
A simple yet powerful algorithm for cluster analysis is the k-means algorithm. It partitions a set of points into k clusters. After the algorithm has converged, the clusters property of the kmeans object (kmeans.clusters) will contain a dictionary with indexes that refer to the elements in the space.
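
    import miner.utils
    import miner.clustering

    # Build a space holding six two-dimensional points
    space = miner.utils.Space()
    space.points([(2, 2), (2, 1), (2, 3), (2, -2), (2, -1), (2, -3)])

    # Partition the points into 2 clusters and run until convergence
    kmeans = miner.clustering.KMeans(2, space)
    kmeans.converge()
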
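To make the assignment and update steps of k-means concrete, here is a minimal standalone sketch in plain Python. It does not use Miner and is not the library's implementation: the names euclidean and kmeans_sketch are hypothetical, and seeding the centroids with the first k points is an assumption made only to keep the run deterministic. Like the clusters property described above, it returns a dictionary that maps each cluster to the indexes of its points.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length coordinate tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans_sketch(points, k, max_iterations=100):
    """Partition points into k clusters; return {cluster: [point indexes]}."""
    # Assumption: seed the centroids with the first k points so the run is
    # deterministic. Implementations commonly pick random seeds instead.
    centroids = [tuple(p) for p in points[:k]]
    dims = len(points[0])
    clusters = {}
    for _ in range(max_iterations):
        # Assignment step: attach every point to its nearest centroid.
        new_clusters = {c: [] for c in range(k)}
        for index, point in enumerate(points):
            nearest = min(range(k), key=lambda c: euclidean(point, centroids[c]))
            new_clusters[nearest].append(index)
        # Converged once the assignment no longer changes.
        if new_clusters == clusters:
            break
        clusters = new_clusters
        # Update step: move each centroid to the mean of its members.
        for c, members in clusters.items():
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(
                    sum(points[i][d] for i in members) / float(len(members))
                    for d in range(dims)
                )
    return clusters


points = [(2, 2), (2, 1), (2, 3), (2, -2), (2, -1), (2, -3)]
clusters = kmeans_sketch(points, 2)
print(clusters)
```

With the six points from the example above, this sketch places the three points with positive y in one cluster and the three with negative y in the other. A different seeding strategy can produce a different labelling, or a different partition on less well-separated data.
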
If you want to classify a record by comparing it to training data, you can employ a classification algorithm such as k-nearest neighbors (KNN). This simple algorithm calculates the distance between the record to be classified and each training record, then returns the class label that is most common among the k nearest records.

    import miner.utils
    import miner.classification

    # Set up our matrix for records with 4 attributes
    matrix = miner.utils.Matrix(4)
    matrix.records([[1.0, 2.3, 5.0, 3.0],
                    [2.0, 4.0, 1.2, 1.8],
                    [15.0, 12.2, 13.0, 1.1],
                    [10.0, 9.4, 8.4, 1.0],
                    [-10.0, 3.2, 1.6, -1.0],
                    [-1.0, 2.2, 1.2, -3.0]])

    # Associate the class label with the records
    matrix.classes(['A', 'A', 'B', 'B', 'C', 'C'])
    matrix.normalize()

    # Find the class label for the following record, for k = 3
    knn = miner.classification.KNearestNeighbor(3, matrix)
    print(knn.classify([1.2, 4.2, 4.2, -1.2]))

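The distance-then-vote idea behind KNN can also be sketched without the library. The snippet below is an illustration only, not Miner's implementation: knn_classify is a hypothetical name, Euclidean distance is an assumed metric, and the normalization step from the example above is deliberately left out, so its answer is not guaranteed to match what Miner prints.

```python
import math
from collections import Counter


def knn_classify(records, labels, query, k):
    """Return the most common label among the k records nearest to query."""
    # Distance from the query to every training record (Euclidean, assumed).
    distances = []
    for record, label in zip(records, labels):
        distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(record, query)))
        distances.append((distance, label))
    # Keep the k closest records and count their class labels.
    distances.sort(key=lambda pair: pair[0])
    votes = Counter(label for _, label in distances[:k])
    # most_common(1) yields a one-element list of (label, count).
    return votes.most_common(1)[0][0]


records = [[1.0, 2.3, 5.0, 3.0],
           [2.0, 4.0, 1.2, 1.8],
           [15.0, 12.2, 13.0, 1.1],
           [10.0, 9.4, 8.4, 1.0],
           [-10.0, 3.2, 1.6, -1.0],
           [-1.0, 2.2, 1.2, -3.0]]
labels = ['A', 'A', 'B', 'B', 'C', 'C']

label = knn_classify(records, labels, [1.2, 4.2, 4.2, -1.2], 3)
print(label)
```

On the raw, unnormalized records the three nearest neighbours carry the labels A, C and A, so this sketch returns A. Normalizing first, as the Miner example does, rescales the attributes and can change which records count as nearest.
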
This library is released under the MIT license.