Commits on Jul 6, 2011
Commits on Jun 30, 2011
  1. Cleanup create_plot function

    committed Jun 30, 2011
  2. Initial cleanup of treePlotter

    The simple stuff: whitespace removal,
    2 lines between functions, removing
    commented-out code.
    committed Jun 30, 2011
  3. More refactoring to tree module

    Add docstrings for building the tree.
    committed Jun 30, 2011
Commits on Jun 24, 2011
  1. Update more of trees module

    Docstrings, pep8 compliance, longer, more
    descriptive variable names, etc.  Also update
    the main function to demonstrate how some of
    the functions work.
    committed Jun 24, 2011
Commits on Jun 10, 2011
  1. Rewrite entropy calculation

    Use numpy to do the heavy lifting, adhere to pep8,
    and add a demo function that shows the entropy calculation
    at work.
    committed Jun 10, 2011
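As a rough illustration of the entropy rewrite described above, here is a minimal sketch of a numpy-based Shannon entropy calculation. The function name `shannon_entropy` and its signature are assumptions made for this example; the commit does not show the actual code.

```python
import numpy as np


def shannon_entropy(labels):
    """Return the Shannon entropy, in bits, of a sequence of class labels.

    Hypothetical helper: the name and signature are not from the commit.
    """
    # One count per distinct label; numpy does the grouping for us.
    _, counts = np.unique(np.asarray(labels), return_counts=True)
    probabilities = counts.astype(float) / counts.sum()
    # H = -sum(p * log2(p)); every p is > 0 because only observed labels appear.
    return float(-np.sum(probabilities * np.log2(probabilities)))
```

For example, `shannon_entropy(['yes', 'yes', 'no', 'no'])` gives 1.0 bit, and a 2-to-3 split such as `['yes', 'yes', 'no', 'no', 'no']` gives roughly 0.971 bits.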
Commits on Jun 7, 2011
  1. Refactor handwriting class test.

    Extract some of the functionality out into functions,
    add docstrings, and clean up some of the code.
    committed Jun 7, 2011
Commits on Jun 6, 2011
  1. pep8 img2vector function

    committed Jun 6, 2011
  2. Open datingTestSet.txt for test function

    There's no difference between datingTestSet2.txt and
    datingTestSet.txt, and the book refers to datingTestSet.txt
    so use the datingTestSet.txt file.
    committed Jun 6, 2011
  3. Optimize knn classifier

    Use a defaultdict to make the code more readable.
    Also call argmin k times instead of sorting the entire
    array and extracting out k elements.  This switches
    from an O(n log n) to an O(kn) algorithm.  Confirmed
    performance using timeit.
    committed Jun 6, 2011
  4. Use numpy functions to optimize knn_classify

    Despite having read the docs several times, I don't
    personally find the use of tile() all that intuitive.
    Specifically for knn_classify I found:
    
        diff_matrix = subtract(input_vector, training_set)
    
    to be easier to read than:
    
        diff_matrix = tile(input_vector, (training_set_size, 1)) - training_set
    
    And again, profiling shows that the first approach is also
    slightly faster than the second approach.
    committed Jun 6, 2011
  5. Refactor dating_class_test function

    Including:
    * pep8 issues
    * hold out ratio says 10% (as it does in the book), but 50% was
      being used.
    * Computing the training matrix moved outside of for loop.
    committed Jun 6, 2011
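The two knn_classify commits in this group (broadcasting with subtract() instead of tile(), and calling argmin k times with a defaultdict for the votes) can be sketched together roughly as follows. This is not the committed code: the signature, the Euclidean distance, and the tie-breaking behaviour are assumptions made for illustration.

```python
from collections import defaultdict

import numpy as np


def knn_classify(input_vector, training_set, labels, k):
    """Return the majority label among the k nearest training rows.

    Sketch only: signature and distance metric are assumed, not taken
    from the commit.
    """
    # Broadcasting subtracts every training row from input_vector,
    # so no tile() call is needed to match the shapes.
    diff_matrix = np.subtract(input_vector, training_set)
    distances = np.sqrt((diff_matrix ** 2).sum(axis=1))

    votes = defaultdict(int)
    # k passes of argmin is O(kn), versus O(n log n) for a full sort.
    for _ in range(min(k, len(distances))):
        nearest = distances.argmin()
        votes[labels[nearest]] += 1
        distances[nearest] = np.inf  # exclude this row from later passes
    return max(votes, key=votes.get)
```

With the training rows `[[1.0, 1.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]]` labelled `['A', 'A', 'B', 'B']`, classifying `[0.0, 0.0]` with `k=3` returns `'B'`.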
Commits on Jun 5, 2011
  1. Simplify normalization function

    Using numpy.subtract and numpy.divide makes
    the code much simpler and once again, as an added
    bonus, simplifying the code actually makes it slightly
    faster:
    
    Before:
    $ python -m timeit -s "import kNN; ds = \
    kNN.load_data_set('datingTestSet.txt')[0]" "kNN.normalize(ds)"
    10000 loops, best of 3: 154 usec per loop
    
    After:
    $ python -m timeit -s "import kNN; ds = \
    kNN.load_data_set('datingTestSet.txt')[0]" "kNN.normalize(ds)"
    10000 loops, best of 3: 117 usec per loop
    committed Jun 5, 2011
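A minimal sketch of what a normalization built on numpy.subtract and numpy.divide might look like is shown below. The return value (normalized data, ranges, minimums) is an assumption; the commit only names the function kNN.normalize and does not show its body.

```python
import numpy as np


def normalize(data_set):
    """Scale each column of data_set to the range [0, 1].

    Sketch only: the return tuple is assumed, not taken from the commit.
    """
    data_set = np.asarray(data_set, dtype=float)
    min_values = data_set.min(axis=0)
    value_ranges = data_set.max(axis=0) - min_values
    # Broadcasting applies the per-column minimum and range to every row.
    # A constant column has a zero range and would divide by zero here.
    normalized = np.divide(np.subtract(data_set, min_values), value_ranges)
    return normalized, value_ranges, min_values
```

Returning the ranges and minimums alongside the scaled data lets a caller apply the same scaling to a new input vector before classifying it.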
Commits on Jun 4, 2011
  1. Rewrite how the dating dataset is loaded

    Cleaner and as an added bonus, slightly faster:
    
    Before:
    $ python -m timeit -s "from kNN import load_data_set" \
    "load_data_set('datingTestSet2.txt')"
    10 loops, best of 3: 22.6 msec per loop
    
    After:
    $ python -m timeit -s "from kNN import load_data_set" \
    "load_data_set('datingTestSet2.txt')"
    100 loops, best of 3: 14.8 msec per loop
    committed Jun 4, 2011
  2. Rework classify0 into knn_classify

    Fix pep8 issues and rename the
    variables to be more descriptive.
    committed Jun 4, 2011
Commits on Jun 2, 2011
  1. Convert to pep8 conventions

    Well, everything except the camelCasing.
    I wasn't sure how to best address the numpy
    style array indexing, e.g. array[1,2] vs. array[1, 2]
    but I thought the first one was reasonable so I left
    them alone.
    committed Jun 2, 2011
Commits on May 31, 2011
Commits on May 26, 2011
Commits on May 5, 2011
  1. updated README

    pbharrin committed May 5, 2011
  2. removed Ch2

    pbharrin committed May 5, 2011
  3. corrected Ch02

    pbharrin committed May 5, 2011
  4. adding Chapter 7

    pbharrin committed May 5, 2011
  5. adding Ch06

    pbharrin committed May 5, 2011
Commits on May 4, 2011
  1. adding Ch05

    pbharrin committed May 4, 2011
  2. adding Ch04

    pbharrin committed May 4, 2011
  3. adding ch3

    pbharrin committed May 4, 2011
  4. adding Ch2 data

    pbharrin committed May 4, 2011
  5. adding chapter 2

    pbharrin committed May 4, 2011