Mar 21, 2012

  1. MAHOUT-981, Fixing test cases which are keeping clusters-*-final in the same directory for canopy and kmeans.
    
    git-svn-id: https://svn.apache.org/repos/asf/mahout/trunk@1303282 13f79535-47bb-0310-9956-ffa450edef68
    Paritosh Ranjan authored

Mar 18, 2012

  1. MAHOUT-981, Changed key to WritableComparable<?> to fix the reuters examples build. Now any type can be fed as a key in the input sequence file.
    
    git-svn-id: https://svn.apache.org/repos/asf/mahout/trunk@1302100 13f79535-47bb-0310-9956-ffa450edef68
    Paritosh Ranjan authored

Mar 17, 2012

  1. MAHOUT-981, Added outlier removal option in method and CLI for KMeansDriver.
    
    git-svn-id: https://svn.apache.org/repos/asf/mahout/trunk@1301886 13f79535-47bb-0310-9956-ffa450edef68
    Paritosh Ranjan authored

Mar 16, 2012

  1. MAHOUT-981, MAHOUT-983. Fixing test cases which fail intermittently.

    Build is passing on my machine (even for the last commit).
    Tried to identify all test cases that can fail intermittently and fixed them.
    
    
    git-svn-id: https://svn.apache.org/repos/asf/mahout/trunk@1301761 13f79535-47bb-0310-9956-ffa450edef68
    Paritosh Ranjan authored
  2. MAHOUT-981, MAHOUT-983. Refactored K-Means Clustering and Dirichlet Clustering to use ClusterClassificationDriver.
    
    Using cluster.getModel().configure() in ClusterClassificationDriver in order to configure DirichletCluster for MahalanobisDistanceMeasure.
    Added/fixed test cases by:
    - Using separate directories in test cases for supplying initial clusters and to store buildClusters output, to prevent two clusters-*-final files in the same directory.
    - Writing IntWritable in test cases instead of LongWritable (as the ClusterClassificationDriver clusters records with IntWritable keys).
    
    
    git-svn-id: https://svn.apache.org/repos/asf/mahout/trunk@1301654 13f79535-47bb-0310-9956-ffa450edef68
    Paritosh Ranjan authored

Mar 13, 2012

  1. Attempt to fix test failure in TestDistributedRowMatrix

    git-svn-id: https://svn.apache.org/repos/asf/mahout/trunk@1300196 13f79535-47bb-0310-9956-ffa450edef68
    Tom Pierce authored

Mar 12, 2012

  1. MAHOUT-822: Make Mahout compatible with Hadoop 0.23.1.

    git-svn-id: https://svn.apache.org/repos/asf/mahout/trunk@1299770 13f79535-47bb-0310-9956-ffa450edef68
    Tom Pierce authored

Mar 10, 2012

  1. MAHOUT-982, Added method and CLI option to remove outliers.

    git-svn-id: https://svn.apache.org/repos/asf/mahout/trunk@1299207 13f79535-47bb-0310-9956-ffa450edef68
    Paritosh Ranjan authored

Mar 08, 2012

  1. MAHOUT-933:

    - refactored ClusteringPolicies into hierarchy under new AbstractClusteringPolicy
    - added close() to ClusteringPolicy to allow policy-specific actions needed to compute convergence
    - removed ClusteringPolicy from ClusterIterator constructor as ClusterClassifier already has one
    - added convergence computations for kmeans and fuzzyk
    - added final clustersOut renaming to add -final suffix
    - updated Display examples and unit tests to reflect above
    - all tests run
    
    I think it is time to begin refactoring the buildClusters methods of the respective clustering drivers to use ClusterIterator as it seems to be producing equivalent results to the original implementations. This will involve removing a lot of existing driver, mapper and reducer code and many time-consuming unit tests. It will also have some impact on other components as the representation of clusters in the file system changes from Cluster to self-describing ClusterWritable.
    
    git-svn-id: https://svn.apache.org/repos/asf/mahout/trunk@1298625 13f79535-47bb-0310-9956-ffa450edef68
    Jeff Eastman authored
  2. MAHOUT-982, Formatted a few newly added lines.

    git-svn-id: https://svn.apache.org/repos/asf/mahout/trunk@1298408 13f79535-47bb-0310-9956-ffa450edef68
    Paritosh Ranjan authored
  3. MAHOUT-982, Clustering vectors using ClusterClassificationDriver. Deleted ClusterMapper and its test cases.
    
    git-svn-id: https://svn.apache.org/repos/asf/mahout/trunk@1298406 13f79535-47bb-0310-9956-ffa450edef68
    Paritosh Ranjan authored

Mar 05, 2012

  1. Cleanups to address Jenkins violations

    git-svn-id: https://svn.apache.org/repos/asf/mahout/trunk@1297298 13f79535-47bb-0310-9956-ffa450edef68
    Tom Pierce authored

Mar 02, 2012

  1. a few more cleanups to make Jenkins happier

    git-svn-id: https://svn.apache.org/repos/asf/mahout/trunk@1296412 13f79535-47bb-0310-9956-ffa450edef68
    Tom Pierce authored
  2. Doc typo fix

    git-svn-id: https://svn.apache.org/repos/asf/mahout/trunk@1296164 13f79535-47bb-0310-9956-ffa450edef68
    Sean Owen (srowen) authored

Mar 01, 2012

  1. cleanups to make Jenkins a little happier

    git-svn-id: https://svn.apache.org/repos/asf/mahout/trunk@1295424 13f79535-47bb-0310-9956-ffa450edef68
    Tom Pierce authored
  2. MAHOUT-980: Fix DistributedCache usage to allow EMR deployment

    git-svn-id: https://svn.apache.org/repos/asf/mahout/trunk@1295352 13f79535-47bb-0310-9956-ffa450edef68
    Tom Pierce authored

Feb 28, 2012

  1. MAHOUT-929, MAHOUT-931. Implemented mapreduce version of ClusterClassificationDriver with outlier removal capability.
    
    Changed output of sequential to WeightedVectorWritable. Fixed and added test cases.
    
    git-svn-id: https://svn.apache.org/repos/asf/mahout/trunk@1294454 13f79535-47bb-0310-9956-ffa450edef68
    Paritosh Ranjan authored

Feb 26, 2012

  1. MAHOUT-931, MAHOUT-929. Added emitMostLikely and threshold based outlier removal capability in ClusterClassificationDriver.
    
    git-svn-id: https://svn.apache.org/repos/asf/mahout/trunk@1293874 13f79535-47bb-0310-9956-ffa450edef68
    Paritosh Ranjan authored

Feb 25, 2012

  1. Subversive missed a few reorganization classes. All tests run

    git-svn-id: https://svn.apache.org/repos/asf/mahout/trunk@1293713 13f79535-47bb-0310-9956-ffa450edef68
    Jeff Eastman authored
  2. Minor namespace reorganization of ClusterIterator and related classes. All tests run
    
    git-svn-id: https://svn.apache.org/repos/asf/mahout/trunk@1293712 13f79535-47bb-0310-9956-ffa450edef68
    Jeff Eastman authored

Feb 23, 2012

  1. MAHOUT-933: Fixed undetected defects introduced by earlier commit.

    I will run all the unit tests before every check-in
    I will run all the unit tests before every check-in
    I will run all the unit tests before every check-in
    ...
    
    git-svn-id: https://svn.apache.org/repos/asf/mahout/trunk@1292629 13f79535-47bb-0310-9956-ffa450edef68
    Jeff Eastman authored

Feb 22, 2012

  1. MAHOUT-933: Refactored actual classification out of ClusterClassifier and into ClusteringPolicies.

    - This allows the classifier to be completely generic as to the algorithm and gives policies correct use of e.g. fuzzyK 'm'
    - Introduced Canopy and MeanShift clustering policies for classification, though they are not used by cluster iterator
    - Modified serialization of ClusterClassifiers to include ClusteringPolicy
    - Added ClusterClassifier serialization methods to exploded sequenceFile representation needed for MR
    - Updated Display examples and unit tests. All run
    
    git-svn-id: https://svn.apache.org/repos/asf/mahout/trunk@1292563 13f79535-47bb-0310-9956-ffa450edef68
    Jeff Eastman authored
  2. MAHOUT-817 PCA options for SSVD (RC1)

    git-svn-id: https://svn.apache.org/repos/asf/mahout/trunk@1292532 13f79535-47bb-0310-9956-ffa450edef68
    Dmitriy Lyubimov (dlyubimov) authored

Feb 17, 2012

  1. MAHOUT-977 add multiple anonymous user support to DataModel

    git-svn-id: https://svn.apache.org/repos/asf/mahout/trunk@1245615 13f79535-47bb-0310-9956-ffa450edef68
    Sean Owen (srowen) authored

Feb 15, 2012

  1. upgrade lucene

    git-svn-id: https://svn.apache.org/repos/asf/mahout/trunk@1244556 13f79535-47bb-0310-9956-ffa450edef68
    Grant Ingersoll (gsingers) authored

Feb 14, 2012

  1. MAHOUT-929: Committing patch Mahout-929. All tests run

    git-svn-id: https://svn.apache.org/repos/asf/mahout/trunk@1244191 13f79535-47bb-0310-9956-ffa450edef68
    Jeff Eastman authored

Feb 13, 2012

  1. MAHOUT-947: add in support for multiple options

    git-svn-id: https://svn.apache.org/repos/asf/mahout/trunk@1243557 13f79535-47bb-0310-9956-ffa450edef68
    Grant Ingersoll (gsingers) authored
  2. MAHOUT-947: add new inputs to seq dumper, refactor to common CLI input

    git-svn-id: https://svn.apache.org/repos/asf/mahout/trunk@1243556 13f79535-47bb-0310-9956-ffa450edef68
    Grant Ingersoll (gsingers) authored
  3. MAHOUT-933:

    - Model: renamed count() to getNumObservations() and added getTotalObservations()
    - Cluster: removed getNumPoints() which was redundant with getNumObservations()
    - AbstractCluster: renamed numPoints to numObservations and added totalObservations. Added observation statistics to persistent state, updating write() and readFields() to serialize these fields and totalObservations
    - CIMapper: added code to update policy based upon classifier model's state (esp totalObservations for Dirichlet)
    - DirichletClusteringPolicy: removed totalCounts now in each model and changed update to use given prior models' totalCounts
    - FuzzyKMeansClusteringPolicy: added m and convergenceDelta
    - KMeansClusteringPolicy: added convergenceDelta
    - Adjusted many other classes to account for these fundamental changes
    
    All tests run
    
    
    git-svn-id: https://svn.apache.org/repos/asf/mahout/trunk@1243387 13f79535-47bb-0310-9956-ffa450edef68
    Jeff Eastman authored
  4. MAHOUT-933:

    - Model: renamed count() to getNumObservations() and added getTotalObservations()
    - Cluster: removed getNumPoints() which was redundant with getNumObservations()
    - AbstractCluster: renamed numPoints to numObservations and added totalObservations. Added observation statistics to persistent state, updating write() and readFields() to serialize these fields and totalObservations
    - CIMapper: added code to update policy based upon classifier model's state (esp totalObservations for Dirichlet)
    - DirichletClusteringPolicy: removed totalCounts now in each model and changed update to use given prior models' totalCounts
    - FuzzyKMeansClusteringPolicy: added m and convergenceDelta
    - KMeansClusteringPolicy: added convergenceDelta
    - Adjusted many other classes to account for these fundamental changes
    
    All tests run
    
    
    git-svn-id: https://svn.apache.org/repos/asf/mahout/trunk@1243386 13f79535-47bb-0310-9956-ffa450edef68
    Jeff Eastman authored

Feb 12, 2012

  1. MAHOUT-933:

    - Implemented ClusteringPolicyWritable and made ClusteringPolicies implement Writable so they can be written to the file system. 
    - Modified ClusterIterator and CIMapper to write and read the appropriate clustering policy. 
    
    Next steps need to address the fact that clusters do not serialize their observation state (s0, s1, s2) and so the MR version of ClusterIterator does not actually produce correct values. This will be a much bigger project.
    
    git-svn-id: https://svn.apache.org/repos/asf/mahout/trunk@1243326 13f79535-47bb-0310-9956-ffa450edef68
    Jeff Eastman authored
  2. Reformatted using Eclipse-Lucene-Codestyle but with 120 char text width

    git-svn-id: https://svn.apache.org/repos/asf/mahout/trunk@1243298 13f79535-47bb-0310-9956-ffa450edef68
    Jeff Eastman authored
  3. MAHOUT-933: Implemented ClusterWritable to support an MR version of ClusterIterator. Not working correctly yet - needs to incorporate arbitrary policies - but is a step forward. All tests run.
    
    git-svn-id: https://svn.apache.org/repos/asf/mahout/trunk@1243294 13f79535-47bb-0310-9956-ffa450edef68
    Jeff Eastman authored

Feb 10, 2012

  1. Removing ending 2 in pfpgrowth package declaration. Compiles fine both ways except in Eclipse
    
    git-svn-id: https://svn.apache.org/repos/asf/mahout/trunk@1242646 13f79535-47bb-0310-9956-ffa450edef68
    Jeff Eastman authored

Feb 09, 2012

  1. MAHOUT-948 better error for bad type

    git-svn-id: https://svn.apache.org/repos/asf/mahout/trunk@1242333 13f79535-47bb-0310-9956-ffa450edef68
    Sean Owen (srowen) authored