Cobe is a Markov chain based text generator. It provides a command line learning/replying interface, an IRC client, and a low-level API for writing your own text generation tools. It uses SQLite as its data store.
Cobe is your robot pal, a metal friend.
Getting started with Cobe.
2.1.1 — 2014-10-20
This is a minor update to pull in current versions of the irc and PyStemmer dependencies; it fixes installation from PyPI.
- The irc client now falls back on Latin-1 encoding if invalid UTF-8 is input.
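The encoding fallback described above can be sketched in a few lines. This is a minimal illustration, not cobe's actual client code; the function name `decode_irc_line` is hypothetical.

```python
def decode_irc_line(raw: bytes) -> str:
    """Decode a raw IRC line, preferring UTF-8 and falling back to Latin-1."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 maps every possible byte to a code point, so this
        # second attempt cannot raise.
        return raw.decode("latin-1")
```

Because Latin-1 never fails to decode, the fallback guarantees that every input line yields some text, even if a few characters come out wrong.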
2.1.0 — 2013-06-01
This version introduces two minor API changes:
- An optional loop_ms argument for specifying how long cobe should spend generating random replies.
- An optional max_len argument that limits replies to a specified number of characters. When using this, cobe’s default scorer tends to prefer shorter replies than you might like. Consider boosting the scores of longer replies.
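One way to boost longer replies is to add a length bonus on top of whatever score a reply already has. The sketch below is self-contained and does not use cobe's scorer API; `pick_reply`, `length_weight`, and the `(text, base_score)` candidate format are all assumptions made for illustration.

```python
def pick_reply(candidates, max_len, length_weight=0.5):
    """Return the best-scoring reply that fits in max_len characters.

    candidates: iterable of (text, base_score) pairs (hypothetical format).
    Returns None if no candidate fits.
    """
    best_text, best_score = None, float("-inf")
    for text, base_score in candidates:
        if len(text) > max_len:
            continue  # too long for the character limit
        # The bonus grows linearly with the share of the limit used.
        score = base_score + length_weight * (len(text) / max_len)
        if score > best_score:
            best_text, best_score = text, score
    return best_text
```

With `length_weight=0` this reduces to picking the highest base score; raising it shifts the preference toward replies that use more of the available length.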
2.0.4 — 2012-05-28
- The fix for python-irclib installation in 2.0.3 worked in testing but not once cobe was on PyPI. This version fixes that problem, pointing cobe’s setup.py at this mirror of the pristine python-irclib source.
2.0.3 — 2012-05-28
- Bump the python-irclib requirement to its latest version, 0.4.8
- Add an explicit URL for python-irclib to fix cobe installation from PyPI
- Normalize common smile/frown emoticons to :) & :(. This takes effect rarely in practice, since cobe only pivots on punctuation if no words are present in the input.
- Add a log handler for the IRC client that logs to a secondary channel.
- Disable caching in SQLite, which can cause performance issues with cobe’s usage patterns. Since SQLite doesn’t pin its cache pages in memory, its cache can be swapped to disk even if the underlying data is in RAM in the operating system’s file cache.
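For reference, SQLite's page cache can be shrunk per connection with the `cache_size` pragma. The sketch below shows that approach with Python's standard `sqlite3` module; whether cobe uses exactly this setting is an assumption, and the `open_brain` helper is hypothetical.

```python
import sqlite3


def open_brain(path):
    """Open a SQLite database with its page cache set as small as possible."""
    conn = sqlite3.connect(path)
    # Ask SQLite to keep (almost) no pages in its own cache, leaving
    # caching to the operating system's file cache instead.
    conn.execute("PRAGMA cache_size = 0")
    return conn
```

The pragma applies only to the connection it is issued on, so it has to be repeated each time the database is opened.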
2.0.2 — 2011-11-02
Bug fixes and performance improvements. No new features.
- Fix two rare reply quality issues. Disallow empty replies and replies with leading or trailing spaces.
- Re-enable stemming during batch learning, which broke with 2.0.0.
- Make the irc client reconnect automatically.
- Widespread performance improvements, particularly with batch learning.
- Remove the redundant tokens_text index.
2.0.1 — 2011-09-22
- Add an IdentityScorer that exactly matches the input tokens, and use it to discourage identical input and reply.
- Fix a tokenizer bug. If exactly two punctuation characters were found together, like “:(”, they were tokenized as two items. Now they’re tokenized as one.
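The intended behavior of the tokenizer fix, with a run of punctuation kept together as one token, can be illustrated with a small regex. This is a simplified sketch, not cobe's tokenizer: it ignores URLs, dashes, and apostrophes, and the `tokenize` function is hypothetical.

```python
import re

# A token is either a run of word characters or a run of characters
# that are neither word characters nor whitespace (punctuation).
TOKEN_RE = re.compile(r"\w+|[^\w\s]+")


def tokenize(text):
    """Split text into word tokens and punctuation-run tokens."""
    return TOKEN_RE.findall(text)
```

The `+` after the punctuation class is what keeps ":(" in one piece; without it, each punctuation character would become its own token.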
2.0.0 — 2011-09-19
- Major architecture changes. You will need to start with a new brain file from scratch. Introduce a new graph-based data model and flexible scoring options. See Cobe 2.0
1.2.0 — 2011-05-03
- Support stemming, as described in Cobe Learns English. This optional feature allows Cobe to change word sense and capitalization when replying to an individual token.
1.1.2 — 2010-06-23
- Include the COPYING file in the tarball distribution.
1.1.1 — 2010-06-23
- Minor tokenizer improvements. Consider dash and apostrophe as either word or non-word tokens.
- Accept any protocol in the tokenizer’s URL detection regex.
- Minor improvements to learning speed.
1.1.0 — 2010-06-08
- This version includes a schema change, hence the bump in minor version. It will upgrade existing brains automatically to the new schema.
- Major performance improvements. This version is twice as fast, able to generate 110% more replies in its 0.5s time slice.
- Quality improvements. We now prefer rarer pivot words when constructing replies. Between this and the performance changes (faster replies mean more chances for high scores), reply scores are increased by 30% on average.
1.0.5 — 2010-06-01
- Add an irc client, using the Twisted framework as an optional dependency. The client is only enabled if Twisted is installed.
1.0.4 — 2010-05-08
- Fix a crash when attempting to reply with an empty brain
1.0.3 — 2010-05-08
- Reply quality: generate a single random context if we don’t recognize any words in the input, and loop generating/scoring replies from that. We were generating many more random contexts before, and reply quality suffered (subjectively) as very long replies would often score higher.
- Add an --only-nick argument to “cobe learn-irc-log”, for learning only the things one person has said
- Tokenize https urls as a single token in the Cobe tokenizer
- Bug fix: don’t force capitalization of replies in “cobe console”
- Bug fix: use Python’s Unicode tables to determine whether a token is a word (was ASCII alpha-only). This greatly improves output with languages that use non-ASCII characters.
- Support instatrace for logging performance statistics
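The difference between the ASCII-only and Unicode-aware word checks mentioned in this release can be shown with two small predicates. Both function names are hypothetical, and the exact rule cobe applies may differ; this is a sketch of the idea only.

```python
import re

ASCII_WORD = re.compile(r"[A-Za-z]+")


def is_word_ascii(token):
    """ASCII-only check: rejects words with accented or non-Latin letters."""
    return ASCII_WORD.fullmatch(token) is not None


def is_word_unicode(token):
    """Unicode-aware check: true if any character is a letter or digit
    according to Python's Unicode tables."""
    return any(ch.isalnum() for ch in token)
```

Under the ASCII check a token like "naïve" is treated as a non-word; the Unicode check classifies it correctly.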
1.0.2 — 2010-04-17
- Reply quality: evaluate the surprise of replies even if we don’t recognize any words in the input
1.0.1 — 2010-03-27
- Ignore invalid UTF-8 characters
- Support --megahal argument to “cobe init” for creating MegaHAL-like brains
- Use argparse for command line parsing
1.0 — 2010-03-15
- Performance improvements
- Code cleanup and API stabilization
0.5 — 2010-03-13
- Support learning and replying with arbitrary order Markov chains (default 5)
- MegaHAL 9.x compatible tokenizer (case insensitive, whitespace preserving)
- Cobe tokenizer (case sensitive, strips whitespace between tokens)
- MegaHAL-style scoring (maximize “surprise”)
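To show what an arbitrary-order Markov chain means here, the following sketch learns which token follows each run of `order` tokens and then walks those transitions to generate text. It is a toy in-memory model, not cobe's implementation or storage format; `START`, `END`, `learn`, and `generate` are names invented for the example.

```python
import random
from collections import defaultdict

START, END = "<s>", "</s>"  # hypothetical boundary markers


def learn(model, tokens, order=5):
    """Record, for each run of `order` tokens, the token that followed it."""
    padded = [START] * order + list(tokens) + [END]
    for i in range(len(padded) - order):
        context = tuple(padded[i:i + order])
        model[context].append(padded[i + order])


def generate(model, order=5, rng=random):
    """Walk the chain from the start context until END or a dead end."""
    context = (START,) * order
    out = []
    while True:
        followers = model.get(context)
        if not followers:
            return out  # empty model or unknown context
        token = rng.choice(followers)
        if token == END:
            return out
        out.append(token)
        context = context[1:] + (token,)
```

A usage example: after `model = defaultdict(list)` and `learn(model, "the cat sat on the mat".split(), order=2)`, calling `generate(model, order=2)` reproduces the sentence, since every context has exactly one follower. A higher order gives more coherent but less varied output.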