Commits on Sep 18, 2011
  1. v0.70

        - When using --train-fast, remove the "flushing cache" message when done
        - Word tokenizer (sketched after this entry):
            * Improve tokenization of email addresses
            * Use backspace instead of escape as a magic character when
              capitalizing text in multiple passes, since it's less likely to
              appear in tokens.
            * Preserve casing of words like "ATMs"
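
A rough sketch of the tokenizer ideas above, in Perl. The pattern and the keeps_casing() helper are illustrative stand-ins, not Hailo's actual rules from Hailo::Tokenizer::Words:

    #!/usr/bin/env perl
    use strict;
    use warnings;

    # Illustrative email pattern: keep the whole address as one token
    # instead of letting '@' and '.' split it apart.
    my $EMAIL = qr/ \S+ \@ \S+ \. \w+ /x;

    my @tokens = 'Mail me at user@example.com please' =~ /($EMAIL|\S+)/g;
    print join(' | ', @tokens), "\n";   # user@example.com stays one token

    # For multi-pass capitalization a "magic" placeholder character is
    # spliced into the text; backspace is less likely than escape to
    # show up inside real tokens.
    my $MAGIC = "\x08";

    # Words like "ATMs" keep their casing: anything with an uppercase
    # letter after the first character is left alone when capitalizing.
    sub keeps_casing { $_[0] =~ /^.\S*[A-Z]/ }
    print keeps_casing('ATMs') ? "keep\n" : "recase\n";   # keep
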
Commits on Aug 23, 2011
Commits on May 14, 2011
Commits on May 13, 2011
  1. Fix typo in comment

Commits on May 9, 2011
Commits on May 7, 2011
  1. v0.69

        - Scored engine: Prefer shorter replies, like MegaHAL/cobe do
        - Word tokenizer (see the token-pattern sketch after this entry):
            * Improve matching/capitalization of filenames and domain names
            * Match timestamps as single tokens
            * Match IRC nicks (<foobar>, <@foobar>, etc.) as single tokens
            * Match IRC channel names (#foo, &bar, +baz)
            * Match various prefixes and postfixes with numbers
            * Match "#1" and "#1234" as single tokens
            * Match </foo> as a single token
        - Depend on MouseX::Getopt 0.33 to fix test failures
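
The single-token rules above boil down to trying specific patterns before a generic non-whitespace fallback. A minimal sketch with illustrative patterns (Hailo's real ones, in Hailo::Tokenizer::Words, are more careful):

    #!/usr/bin/env perl
    use strict;
    use warnings;

    my $CLOSE_TAG   = qr{ </ \w+ > }x;                         # </foo>
    my $IRC_NICK    = qr/ < [\@%+~&]? [^\s>]+ > /x;            # <foobar>, <@foobar>
    my $IRC_CHANNEL = qr/ [#&+] \w+ /x;                        # #foo, &bar, +baz
    my $TIMESTAMP   = qr/ \d{1,2} : \d{2} (?: : \d{2} )? /x;   # 12:34, 12:34:56
    my $NUMBERED    = qr/ \# \d+ /x;                           # #1, #1234

    # Specific patterns first, so they win over the plain-word fallback.
    my $TOKEN = qr/$CLOSE_TAG|$IRC_NICK|$TIMESTAMP|$NUMBERED|$IRC_CHANNEL|\S+/;

    for my $line ('<@foobar> joined #foo at 12:34:56',
                  'bug #1234 closed by <baz> with </sarcasm>') {
        print join(' | ', $line =~ /($TOKEN)/g), "\n";
    }
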
Commits on May 6, 2011
  1. Shorten this

  2. Remove unused variable

  3. Match </foo> as a single token

  4. Forget the tabs matching

    It's not that useful anyway.
  5. Prettify the Changes file

  6. Match timestamps and IRC nicks

    I changed the way input is processed, so that we can match whitespace in
    tokens. This allows matching paths with spaces in them, as well as IRC
    nicks from irssi such as < literal> (see the sketch after this list).
  7. Match tabs as tokens
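
The note on commit 6 is the key change: a tokenizer that splits on whitespace can never emit "< literal>" (an irssi-padded nick) as one token, while a sequential scanner that tries whitespace-containing patterns first can. A toy contrast, with a hypothetical padded-nick pattern:

    #!/usr/bin/env perl
    use strict;
    use warnings;

    # Hypothetical: irssi right-aligns nicks, so "< literal>" contains
    # a space and whitespace-splitting tears it in two.
    my $PADDED_NICK = qr/ < \s* [^\s>]+ > /x;

    my $input = '< literal> hello world';

    # Naive split: the nick is broken apart.
    print join(' | ', split ' ', $input), "\n";    # < | literal> | hello | world

    # Sequential scan: padded nicks survive as single tokens.
    my @tokens = $input =~ /\G\s*($PADDED_NICK|\S+)/gc;
    print join(' | ', @tokens), "\n";              # < literal> | hello | world
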

Commits on May 4, 2011
  1. Prefer shorter replies (see the scoring sketch below)

  2. Remove dead code

    Due to how the tokenizer works, at least one of the tokens will always
    have normal spacing.
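
"Prefer shorter replies" is a scoring tweak in the MegaHAL/cobe spirit (see the v0.69 entry above): a reply's raw score is scaled down as the reply grows, so a long reply has to be proportionally more interesting to win. The thresholds and divisors here are illustrative, not Hailo's actual constants:

    #!/usr/bin/env perl
    use strict;
    use warnings;

    # Illustrative length penalty: long replies divide their score by
    # increasingly large factors. The cutoffs (8, 16) are made up.
    sub score_reply {
        my ($raw, $n_tokens) = @_;
        my $score = $raw;
        $score /= sqrt($n_tokens - 1) if $n_tokens >= 8;
        $score /= $n_tokens           if $n_tokens >= 16;
        return $score;
    }

    printf "%.2f\n", score_reply(10, 5);    # 10.00 - short, unpenalized
    printf "%.2f\n", score_reply(10, 20);   # 0.11  - long, heavily penalized
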
Commits on May 3, 2011
  1. v0.68

        - Speed up the learning of repetitive sentences by caching more
        - Added Hailo::Engine::Scored, which generates multiple replies (limited
          by time or number of iterations) and returns the best one; a sketch of
          the loop follows this entry. Based on code from Peter Teichman's Cobe
          project.
        - Fixed a bug which caused the tokenizer to be very slow at capitalizing
          replies which contain things like "script/"
        - Speed up learning quite a bit (up to 25%) by using more efficient SQL.
        - Add --train-fast to speed up learning by up to an additional 45% on
          large brains by using aggressive caching. This uses a lot of memory.
          Almost 600MB with SQLite on a 64bit machine for a brain which
          eventually takes 134MB on disk (trained from a 350k line IRC log).
        - Word tokenizer:
            * Preserve casing of Emacs key sequences like "C-u"
            * Don't capitalize words after ellipses (e.g. "Wait... what?")
            * When adding a full stop to paragraphs which end with a quoted word,
              add it inside the quotes (e.g. "I heard him say 'hello there.'")
            * Make it work correctly when the input has newlines
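
A sketch of the Hailo::Engine::Scored control flow described in the first item: keep generating candidate replies until a time budget or an iteration cap runs out, then return the highest-scoring one. The generate/score callbacks here are hypothetical stand-ins for the engine's random walk and its scorer:

    #!/usr/bin/env perl
    use strict;
    use warnings;
    use Time::HiRes qw(time);

    sub best_reply {
        my (%args) = @_;
        my $deadline = time() + ($args{max_seconds} // 0.5);
        my $max_iter = $args{max_iterations} // 50;

        my ($best, $best_score);
        for (1 .. $max_iter) {
            last if time() > $deadline;            # time budget exhausted
            my $candidate = $args{generate}->();
            my $score     = $args{score}->($candidate);
            ($best, $best_score) = ($candidate, $score)
                if !defined $best_score || $score > $best_score;
        }
        return $best;
    }

    # Toy stand-ins: random canned "replies", scored by brevity.
    my @corpus = ('a b c', 'a b c d e f', 'a b');
    my $reply  = best_reply(
        generate       => sub { $corpus[int rand @corpus] },
        score          => sub { my @w = split ' ', $_[0]; 1 / (1 + @w) },
        max_seconds    => 0.1,
        max_iterations => 25,
    );
    print "$reply\n";   # almost always the shortest: "a b"
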
Commits on May 2, 2011
  1. Fix typo in Changes

  2. Separate these from the rest

  3. Rename this regex for clarity
