Commits on Sep 18, 2011
  1. v0.70

        - When using --train-fast, remove the "flushing cache" message when done
    
        - Word tokenizer:
            * Improve tokenization of email addresses
            * Use backspace instead of escape as a magic character when
              capitalizing text in multiple passes, since it's less likely to
              appear in tokens.
            * Preserve casing of words like "ATMs"
    committed Sep 18, 2011
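The "ATMs" entry above amounts to skipping capitalization for tokens that already carry deliberate mixed casing. A minimal sketch of that check in Python (Hailo's tokenizer is Perl; `capitalize_token` is a hypothetical stand-in, not the project's actual code):

```python
import re

def capitalize_token(token):
    """Uppercase the first letter of a sentence-initial token,
    but leave mixed-case words such as "ATMs" or "iPhone" as-is."""
    # An uppercase letter anywhere past position 0 signals
    # deliberate casing, so preserve the token unchanged.
    if re.search(r'[A-Z]', token[1:]):
        return token
    return token[:1].upper() + token[1:]

print(capitalize_token("hello"))  # -> "Hello"
print(capitalize_token("ATMs"))   # -> "ATMs"
```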
Commits on May 13, 2011
  1. Fix typo in comment

    committed May 13, 2011
Commits on May 7, 2011
  1. v0.69

        - Scored engine: Prefer shorter replies, like MegaHAL/cobe do
    
        - Word tokenizer:
            * Improve matching/capitalization of filenames and domain names
            * Match timestamps as single tokens
            * Match IRC nicks (<foobar>, <@foobar>, etc.) as single tokens
            * Match IRC channel names (#foo, &bar, +baz)
            * Match various prefixes and postfixes with numbers
            * Match "#1" and "#1234" as single tokens
            * Match </foo> as a single token
    
        - Depend on MouseX::Getopt 0.33 to fix test failures
    committed May 7, 2011
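The IRC nick and channel entries above boil down to treating these strings as single tokens instead of splitting on their punctuation. A hedged Python sketch of what such patterns could look like (illustrative only; Hailo's real Perl regexes differ in detail):

```python
import re

# Hypothetical patterns for the v0.69 tokenizer entries.
# A nick in angle brackets, optionally prefixed with an IRC
# mode sigil (@, +, %, ~), e.g. <foobar> or <@foobar>:
IRC_NICK = re.compile(r'<[@+%~]?[\w\[\]\\^{}|`-]+>')
# A channel name: #foo, &bar, +baz:
IRC_CHAN = re.compile(r'[#&+][^\s,]+')

print(bool(IRC_NICK.fullmatch('<@foobar>')))  # -> True
print(bool(IRC_CHAN.fullmatch('#foo')))       # -> True
```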
Commits on May 6, 2011
  1. Shorten this

    committed May 6, 2011
  2. Remove unused variable

    committed May 6, 2011
  3. Match </foo> as a single token

    committed May 6, 2011
  4. Forget the tabs matching

    It's not that useful anyway.
    committed May 6, 2011
  5. Prettify the Changes file

    committed May 6, 2011
  6. Match timestamps and IRC nicks

    I changed the way input is processed, so that we can match whitespace in
    tokens. This allows matching paths with spaces in them, as well as IRC
    nicks from irssi such as < literal>.
    committed May 5, 2011
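Matching whitespace inside tokens means special multi-character patterns (a spaced irssi nick, a timestamp) have to be tried before the input is split on spaces. A minimal Python illustration of that order of operations (the patterns are simplified stand-ins, not Hailo's):

```python
import re

# Try "special" patterns first; fall back to splitting the rest
# on whitespace. The nick pattern covers irssi output such as
# "< literal>", where the nick itself contains a space.
SPECIAL = re.compile(r'< ?\S+>|\d{1,2}:\d{2}(?::\d{2})?')

def tokenize(text):
    tokens, pos = [], 0
    for m in SPECIAL.finditer(text):
        tokens.extend(text[pos:m.start()].split())
        tokens.append(m.group())
        pos = m.end()
    tokens.extend(text[pos:].split())
    return tokens

print(tokenize("at 12:34:56 < literal> said hi"))
# -> ['at', '12:34:56', '< literal>', 'said', 'hi']
```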
  7. Match tabs as tokens

    committed May 5, 2011
Commits on May 4, 2011
  1. Prefer shorter replies

    committed May 4, 2011
  2. Remove dead code

    Due to how the tokenizer works, at least one of the tokens will always
    have normal spacing.
    committed May 4, 2011
Commits on May 3, 2011
  1. v0.68

        - Speed up the learning of repetitive sentences by caching more
    
        - Added Hailo::Engine::Scored, which generates multiple replies (limited
          by time or number of iterations) and returns the best one. Based on
          code from Peter Teichman's Cobe project.
    
        - Fixed a bug which caused the tokenizer to be very slow at capitalizing
          replies which contain things like "script/osm-to-tilenumbers.pl"
    
        - Speed up learning quite a bit (up to 25%) by using more efficient SQL.
    
        - Add --train-fast to speed up learning by up to an additional 45% on
          large brains by using aggressive caching. This uses a lot of memory.
          Almost 600MB with SQLite on a 64-bit machine for a brain which
          eventually takes 134MB on disk (trained from a 350k line IRC log).
    
        - Word tokenizer:
            * Preserve casing of Emacs key sequences like "C-u"
            * Don't capitalize words after ellipses (e.g. "Wait... what?")
            * When adding a full stop to paragraphs which end with a quoted word,
              add it inside the quotes (e.g. "I heard him say 'hello there.'")
            * Make it work correctly when the input has newlines
    committed May 3, 2011
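The Scored engine described above is essentially a best-of-N loop bounded by both a deadline and an iteration cap. A hedged Python sketch, where `generate_reply` and `score` are hypothetical stand-ins for the engine's internals:

```python
import itertools
import time

def best_reply(generate_reply, score, max_iters=50, deadline=0.5):
    """Generate candidate replies until the time or iteration
    budget runs out, then return the highest-scoring one."""
    start = time.monotonic()
    best, best_score = None, float('-inf')
    for _ in range(max_iters):
        if time.monotonic() - start > deadline:
            break
        reply = generate_reply()
        s = score(reply)
        if s > best_score:
            best, best_score = reply, s
    return best

# Toy usage: cycle deterministically through two candidates and
# score shorter replies higher (cf. "prefer shorter replies" in v0.69).
gen = itertools.cycle(["a much longer rambling reply", "short"]).__next__
print(best_reply(gen, lambda r: -len(r)))  # -> "short"
```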
Commits on May 2, 2011
  1. Fix typo in Changes

    committed May 2, 2011
  2. Separate these from the rest

    committed May 2, 2011
  3. Rename this regex for clarity

    committed May 2, 2011