This repository has been archived by the owner. It is now read-only.
Commits on Jun 13, 2014
  1. Remove grouped not

    rurounijones committed Jun 13, 2014
    It is an extra negative to think about. Change it for more standard
    syntax
  2. Remove redundant encoding step.

    rurounijones committed Jun 13, 2014
    Remove a redundant force_encoding after already encoding the string.
    The encode contains all the error handling we need
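A hedged illustration of why the extra step is redundant: `String#encode` returns a string that is already tagged with the target encoding, so a following `force_encoding` to that same encoding changes nothing. The helper name `to_utf8` and the ISO-8859-1 source encoding are assumptions made for this sketch, not Ankusa's actual code.

```ruby
# Hypothetical helper; the name and the assumed source encoding are
# illustrative only.
def to_utf8(raw)
  # Interpret the bytes as ISO-8859-1 and transcode them to UTF-8.
  # The replace options keep encode from raising on unconvertible input.
  raw.encode("UTF-8", "ISO-8859-1", invalid: :replace, undef: :replace)
end

text = to_utf8("caf\xE9")
puts text.encoding         # UTF-8
puts text.valid_encoding?  # true

# Redundant: the string is already tagged as UTF-8, so this is a no-op.
same = text.dup.force_encoding("UTF-8")
puts same == text          # true
```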
  3. DRY up classifier instance variable cache cleaning

    rurounijones committed Jun 13, 2014
Commits on Jun 3, 2014
  1. Sleep before running tests

    rurounijones committed Jun 3, 2014
    To give Travis-CI time to get the storage services up and running,
    sleep for 15 seconds before starting our tests.
    
    As documented at:
    http://docs.travis-ci.com/user/database-setup/#MongoDB-may-not-be-immediately-accepting-connections
  2. Replace check method to see if a string is numeric

    rurounijones committed Jun 3, 2014
    The old code used the very correct Float(word) method to see if a string
    was numeric. This works reliably with all sorts of edge-case data, but it
    is very slow.
    
    Since we have already parsed out a lot of possibilities during word
    atomisation (e.g. decimal numbers like 123.45 have already been split
    into "123" and "45"), we do not need this level of edge-case
    surety.
    
    Therefore we can just do a simple regex check to see if the string is
    all numerals or not.
    
    In tests on 1000 emails (Single threaded) the run-time was reduced
    from 2.4 seconds to 1.4 seconds.
    
    Since we have traded edge-case reliability for speed, we can no longer
    leave this as a String class monkey-patch, so move it into a method that
    will only be called by Ankusa itself.
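The trade-off described above can be sketched as follows. Both method names are hypothetical and the regex is one plausible form of an "all numerals" check; neither is taken from Ankusa's source.

```ruby
# Strict check (hypothetical name): Float() accepts signs, decimals and
# exponents, and raises when the string is not a valid number.
def numeric_strict?(word)
  Float(word)
  true
rescue ArgumentError, TypeError
  false
end

# Fast check (hypothetical name): true only when every character is a digit.
def numeric_fast?(word)
  !!(word =~ /\A\d+\z/)
end

puts numeric_strict?("123")   # true
puts numeric_fast?("123")     # true

# Edge cases that only the strict check recognises as numeric.
puts numeric_strict?("-1.5")  # true
puts numeric_fast?("-1.5")    # false
puts numeric_strict?("1e3")   # true
puts numeric_fast?("1e3")     # false
```

Because the fast check rejects inputs such as `-1.5`, it is only safe where the caller has already split tokens into simple pieces, which is why it fits a private helper better than a patch on `String`.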
Commits on May 30, 2014
  1. Improve performance of Stopword lookup

    rurounijones committed May 30, 2014
    Ankusa::STOPWORDS is created once and then searched for every single
    word that we are classifying.
    
    Change it from an Array with O(n) average time complexity to a Set
    (Hash-Table) with O(1) average time complexity.
    
    In tests on 1000 emails (Single threaded) the run-time was reduced
    from 6.4 seconds to 2.4 seconds.
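A minimal sketch of the change, assuming a small made-up stopword list (the real `Ankusa::STOPWORDS` contents are not shown here) and a hypothetical `content_words` helper. Both collections answer `include?`, so the calling code stays the same; only the lookup cost differs.

```ruby
require "set"

# Hypothetical stopword list for illustration only.
stopwords_array = %w[a an and the of to]
stopwords_set   = Set.new(stopwords_array)

# Hypothetical helper: drop every word found in the stopword collection.
# Array#include? scans the elements one by one, while Set#include? is a
# hash lookup.
def content_words(words, stopwords)
  words.reject { |word| stopwords.include?(word) }
end

words = %w[the cat and the hat]
p content_words(words, stopwords_array)  # ["cat", "hat"]
p content_words(words, stopwords_set)    # ["cat", "hat"]
```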
Commits on May 22, 2014
  1. Merge pull request bmuller#4 from rurounijones/travis-ci-support

    bmuller committed May 22, 2014
    Improved Travis ci support
Commits on May 21, 2014
  1. Add the "mongo" development dependency

    Jeffrey Jones committed May 21, 2014
    We do not want to include storage-specific dependencies in production.
    For a developer hacking on Ankusa, however, it is a different matter.
    Add the "mongo" development gem dependency to the gemspec to avoid
    people (and Travis-CI) having to do it themselves.
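As a rough sketch of what such a declaration looks like in a gemspec: the gem name, version and summary below are placeholders, not Ankusa's real metadata. Only the `add_development_dependency "mongo"` line reflects the commit message.

```ruby
require "rubygems"

# Placeholder metadata; only the development dependency mirrors the commit.
spec = Gem::Specification.new do |s|
  s.name    = "example-classifier"
  s.version = "0.0.1"
  s.summary = "Placeholder gemspec for illustration"

  # Installed for people working on the gem, not for applications that
  # merely depend on it.
  s.add_development_dependency "mongo"
end

p spec.development_dependencies.map(&:name)  # ["mongo"]
p spec.runtime_dependencies.map(&:name)      # []
```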
  2. Add MongoDB tests to the Travis-CI tests

    Jeffrey Jones committed May 21, 2014
  3. Combine memory and filesystem tests under Travis-CI

    Jeffrey Jones committed May 21, 2014
    * Update Rakefile with new Travis-CI task
    * Fix FileSystem tests (had same name as memory ones)
    * Update travis config file to call new task
  4. Initial support for Travis-CI

    Jeffrey Jones committed May 21, 2014
  5. Libraries should not include Gemfile.lock

    Jeffrey Jones committed May 21, 2014
    Libraries should be more liberal with their dependencies than
    applications to allow bundler to select dependencies with the best
    chance of avoiding conflicts
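To illustrate the point about liberal dependencies, the sketch below compares a loose version requirement with an exact pin, using RubyGems' own `Gem::Requirement` class. The version numbers are invented for the example.

```ruby
require "rubygems"

# Invented candidate versions of some dependency.
candidates = %w[1.2.3 1.4.0 1.9.1 2.0.0].map { |v| Gem::Version.new(v) }

loose = Gem::Requirement.new("~> 1.0")   # any release from 1.0 up to, not including, 2.0
exact = Gem::Requirement.new("= 1.2.3")  # what a committed lock effectively enforces

loose_matches = candidates.select { |v| loose.satisfied_by?(v) }.map(&:to_s)
exact_matches = candidates.select { |v| exact.satisfied_by?(v) }.map(&:to_s)

p loose_matches  # ["1.2.3", "1.4.0", "1.9.1"]
p exact_matches  # ["1.2.3"]
```

The wider the set of acceptable versions, the more room a resolver has to find one that also satisfies the application's other gems.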
Commits on Jun 19, 2013
  1. updated rvmrc and ruby version to 1.9.3, replaced iconv with String.encode (only available in 1.9.3), bumped version to 0.1

    bmuller committed Jun 19, 2013
Commits on Nov 30, 2012
Commits on Nov 29, 2012
  1. fix for classifying a doc when no training has been done. fixed tests for hbase, memory, and file system storage, but couldn't get cassandra tests running under ruby 1.9.3

    bborn committed Nov 29, 2012
Commits on Aug 15, 2012
Commits on May 23, 2012
  1. updated docs to add references to mongo availability

    Brian Muller committed May 23, 2012
Commits on Apr 16, 2012
  1. added mongodb support so bumping version

    Brian Muller committed Apr 16, 2012
  2. Merge pull request bmuller#6 from Dreepi/mongodb

    bmuller committed Apr 16, 2012
    MongoDb Support added
Commits on Apr 10, 2012
  1. fixed bug in testing, bumped version

    Brian Muller committed Apr 10, 2012
  2. bumped version

    Brian Muller committed Apr 10, 2012
  3. fixed bug in text atomizer

    Brian Muller committed Apr 10, 2012
Commits on Apr 1, 2012
  1. can now optionally not stem words in TextHash

    Brian Muller committed Apr 1, 2012
  2. sped up stopword hashing

    Brian Muller committed Apr 1, 2012
Commits on Oct 16, 2011
  1. added mongo specs to readme

    kitop committed Oct 16, 2011
Commits on Jun 20, 2011
  1. added indexes to tables

    kitop committed Jun 20, 2011
Commits on Jun 17, 2011
  1. passed tests

    kitop committed Jun 17, 2011
  2. fix keys on mongo

    kitop committed Jun 17, 2011
  3. added some mongo db tests

    kitop committed Jun 17, 2011
  4. ruby 1.8 compatible syntax

    kitop committed Jun 17, 2011
Commits on Jun 16, 2011