This is an IRC bot that generates text using individualized Markov chains based on given source material. It can be used to generate "new" phrases in the style of a user, based on things they have said before. It is written in Python, using the Twisted framework.
The term "lookback tuple" refers to the current state in the Markov chain, i.e. the most recent words used to choose the next one. The length of the lookback tuple determines the quality of the quotes generated. In general, a lower tuple length will lead to longer, more rambling, and less coherent phrases. Conversely, a higher length will generate shorter, less varied, and more syntactically correct quotes, which are more likely to be exact matches to the underlying source material.
The tuple length is configurable for Impostor, and is set to 2 by default.
Take a user with the following input set:
she sells sea-shells by the sea-shore
the dog was eating sausages by the dozen
The following are the possible outputs, where the lookback tuple length is 2:
she sells sea-shells by the sea-shore
she sells sea-shells by the dozen
the dog was eating sausages by the dozen
the dog was eating sausages by the sea-shore
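The example above can be reproduced with a minimal sketch of a lookback-tuple generator. This is not Impostor's actual code: the names `build_chain` and `generate` are hypothetical, and the sketch only assumes the behaviour described in this section.

```python
import random
from collections import defaultdict

END = None  # sentinel marking the end of a source line


def build_chain(lines, lookback=2):
    """Return (starting tuples, {tuple: [words seen after it]})."""
    starts = []
    successors = defaultdict(list)
    for line in lines:
        words = line.split()
        if len(words) < lookback:
            continue  # too short to form a single tuple
        starts.append(tuple(words[:lookback]))
        padded = words + [END]
        for i in range(len(words) - lookback + 1):
            state = tuple(padded[i:i + lookback])
            successors[state].append(padded[i + lookback])
    return starts, successors


def generate(starts, successors, rng=random):
    """Walk the chain from a random starting tuple until a line end is drawn."""
    state = rng.choice(starts)
    output = list(state)
    while True:
        next_word = rng.choice(successors[state])
        if next_word is END:
            return " ".join(output)
        output.append(next_word)
        state = state[1:] + (next_word,)  # slide the lookback window


lines = [
    "she sells sea-shells by the sea-shore",
    "the dog was eating sausages by the dozen",
]
starts, successors = build_chain(lines, lookback=2)
print(generate(starts, successors))
```

The tuple `("by", "the")` is the only state with two different successors, which is why exactly four outputs are possible here.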
- Python: tested with version 2.7.9, other versions of Python 2.x may also work
- python-twisted: tested with version 14.x, other versions may also work
There must be a single input directory, containing the items listed below. The location of this directory does not matter; the directory name will be passed at run-time.
These are plain text files, each with the extension `.src`. The filename minus the extension is the username. The following format must apply.
- Each line of input is to be on its own line in the file.
- Each line is to have at least the same number of words as the lookback tuple length.
- If there is a source file that is an amalgamation of all source files, call it `all.src`.
How the source material is generated is up to you. Typically, it will involve parsing IRC log files, stripping out very short lines, and maybe some normalization. A source building script has been included as an example of how this may be done.
Note that the more input (both number of lines and number of words per line) we have for a user, the better the output will be. The generator works best when there are many possible successors to each tuple of words; the more possible successors, the more varied the generated lines.
Say we have a user with username `mollusc`. In our sources directory there must be a file named `mollusc.src`. Its contents could be something like this:
I am not a fish
my occupation is making pearls
om nom nom tasty algae
today is not a good day
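As an illustration of the source file rules, here is a rough sketch of how such a directory could be read. The function name `load_sources` is hypothetical and this is not necessarily how Impostor loads its material; it only assumes the `.src` naming and the minimum line length described above.

```python
import os


def load_sources(source_dir, lookback=2):
    """Return {username: [lines]} for every .src file in source_dir."""
    sources = {}
    for filename in sorted(os.listdir(source_dir)):
        username, extension = os.path.splitext(filename)
        if extension != ".src":
            continue  # skip meta.info, merge.lst and anything else
        with open(os.path.join(source_dir, filename)) as handle:
            lines = [line.strip() for line in handle]
        # Keep only lines long enough to form at least one lookback tuple.
        sources[username] = [
            line for line in lines if len(line.split()) >= lookback
        ]
    return sources
```

With the example file above, `load_sources(<sourcedir>)["mollusc"]` would contain the four lines shown.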
An optional file called `meta.info` may be added to the input directory. If present, this would contain metadata about source generation. The following attributes are supported.
- Date: Unix timestamp of when the source material was generated.
- Primary: The primary channel used to generate source material.
- Additional: Other channels from which source material is taken (one may wish, for example, to exclude users found in the additional channels but not in the primary).
date=1489964352
primary=#underthesea
additional=#mollusc_test #beach #iloveraisins
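A file in this shape could be parsed along the following lines. This is only a sketch: `parse_meta` is a hypothetical name, and it assumes one `key=value` pair per line with space-separated channels in `additional`, as the example suggests.

```python
def parse_meta(text):
    """Parse key=value lines into a dict (assumed format, see above)."""
    meta = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue  # ignore blank or malformed lines
        key, value = line.split("=", 1)
        key = key.strip().lower()
        value = value.strip()
        if key == "date":
            meta[key] = int(value)  # Unix timestamp
        elif key == "additional":
            meta[key] = value.split()  # list of channel names
        else:
            meta[key] = value
    return meta


example = """date=1489964352
primary=#underthesea
additional=#mollusc_test #beach #iloveraisins"""
print(parse_meta(example))
```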
An optional file called `merge.lst` may be added to the input directory. If present, this would contain mappings of user aliases (other nicks a user has been known by) to canonical nicknames. The file should be laid out as follows.
- Each user must be on its own line.
- Lines consist of tab-separated nicks.
- The first nick is the canonical one; all other nicks are aliases.
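The layout above maps naturally onto an alias lookup table. The following is a minimal sketch, not the bot's own parser: `parse_merge_list` and the nicks `clam` and `oyster` are made up for illustration, and lowercasing is an assumption based on nicks being case-insensitive.

```python
def parse_merge_list(text):
    """Return {alias: canonical_nick} from tab-separated lines."""
    aliases = {}
    for line in text.splitlines():
        nicks = [n.strip().lower() for n in line.split("\t") if n.strip()]
        if not nicks:
            continue  # skip blank lines
        canonical = nicks[0]
        for alias in nicks[1:]:
            aliases[alias] = canonical
    return aliases


# Hypothetical file contents: mollusc has also been known as clam and oyster.
example = "mollusc\tclam\toyster\ndaffodil\tnarcissus\n"
print(parse_merge_list(example))
```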
From the checkout location, run:
export PYTHONPATH=$(pwd):$PYTHONPATH
python impostor/ImpostorBot.py <network> <channel> <logfile> <sourcedir>
where:
- `<network>` is the name of the network the bot is to connect to, e.g. `irc.freenode.net`.
- `<channel>` is the name of the channel the bot is to run in, e.g. `#impostor`.
- `<logfile>` is the name of the file that the bot is to log to while it is running.
- `<sourcedir>` is the location of the directory containing all of the source material.
By default, the bot will connect with the nick `impostor`. If this is taken, it will instead use `impostor^`. If this, too, is taken, then it will fail.
Note that, if there is a lot of source material, it may take a few seconds to start.
The instructions below indicate what to type in IRC to prompt the bot. First, you must connect to IRC as usual and join the channel the bot is in.
There are two triggers for the `impostor` bot, `!` and `@`. One can also direct a comment at the bot by starting a line with `impostor:`, but all this does is display a help message. The bot ignores all other lines.
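The way incoming lines fall into these categories can be sketched as follows. `classify_line` is a hypothetical helper, not a function from the bot, and exact matching of the `impostor:` prefix is an assumption.

```python
def classify_line(line, bot_nick="impostor"):
    """Return (kind, rest); kind is 'generate', 'command', 'help' or None."""
    line = line.strip()
    if line.startswith("!") and len(line) > 1:
        return "generate", line[1:]  # e.g. "!mollusc" -> quote generation
    if line.startswith("@") and len(line) > 1:
        return "command", line[1:]  # e.g. "@stats" -> other features
    if line.startswith(bot_nick + ":"):
        return "help", ""  # addressing the bot only shows a help message
    return None, ""  # everything else is ignored


print(classify_line("!mollusc"))
print(classify_line("hello everyone"))
```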
The only reserved username (other than `all`, if present) is `random`. If there is a user called `random`, then it will not be possible to generate lines for them. Everything else is possible though; one can even call `impostor` on itself.
The trigger to generate quotes is `!`.
A single-user comment is one generated from the source material of only one user. To generate a single-user comment for a user named `mollusc`, type the following in the `impostor` channel:
!mollusc
A multi-user comment is one generated from the source material of multiple users; it is currently limited to two users. To generate such a comment for users `mollusc` and `daffodil`, type:
!mollusc:daffodil
The ordering does not matter; `!daffodil:mollusc` is treated exactly the same as `!mollusc:daffodil`.
Optionally, the bot may be seeded, meaning it is possible to give the starting words of a quote. This is the syntax for achieving this:
!<nick> <seed words>
A few notes on this:
- Seed words must be space-separated.
- While nicks and special-feature triggers are case-insensitive, seed words are case-sensitive.
- Seed words will match from anywhere within a user's material, i.e. not just on their starting tuples.
- The number of seed words given must be less than or equal to the lookback tuple length.
- If the number of seed words given is less than the tuple length, then the match will run from the start of each tuple.
- If no matching tuples or partial tuples are found, the bot will not output anything.
As an example, imagine a user `mollusc` with the following source material, and where the tuple length is 3:
I am not a fish
The following will be the result of various interactions:
| Input | Output |
|-------------------|---------------------------|
| !mollusc I | [mollusc] I am not a fish |
| !mollusc I am | [mollusc] I am not a fish |
| !mollusc I am not | [mollusc] I am not a fish |
| !mollusc not | [mollusc] not a fish |
| !mollusc am not | [mollusc] am not a fish |
The following will not produce any output:
!mollusc I am not a
!mollusc a
!mollusc i
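The seeding rules can be sketched in a few lines. This is an illustration under the rules listed above, not the bot's implementation: `build_successors` and `seeded_quote` are hypothetical names, and picking uniformly among the matching tuples is an assumption.

```python
import random
from collections import defaultdict

END = None  # sentinel marking the end of a source line


def build_successors(lines, lookback):
    """Map each tuple of `lookback` words to the words seen after it."""
    successors = defaultdict(list)
    for line in lines:
        padded = line.split() + [END]
        for i in range(len(padded) - lookback):
            state = tuple(padded[i:i + lookback])
            successors[state].append(padded[i + lookback])
    return successors


def seeded_quote(lines, seed, lookback, rng=random):
    """Return a quote starting with the seed words, or None if none match."""
    seed_words = tuple(seed.split())
    if not seed_words or len(seed_words) > lookback:
        return None  # too many seed words for the tuple length
    successors = build_successors(lines, lookback)
    # A seed matches any tuple that begins with it (case-sensitive).
    candidates = sorted(
        state for state in successors
        if state[:len(seed_words)] == seed_words
    )
    if not candidates:
        return None
    state = rng.choice(candidates)
    output = list(state)
    while True:
        next_word = rng.choice(successors[state])
        if next_word is END:
            return " ".join(output)
        output.append(next_word)
        state = state[1:] + (next_word,)


source = ["I am not a fish"]
for seed in ["I", "am not", "not", "a", "i", "I am not a"]:
    print("%r -> %r" % (seed, seeded_quote(source, seed, lookback=3)))
```

Note that `!mollusc a` produces nothing because, with a tuple length of 3, no tuple in the material begins with `a`.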
A random-user comment is one generated from a random user in the set. To generate such a comment, type:
!random
`random` may be substituted anywhere a normal nick is expected. In other words, the following are all valid.
!mollusc:random
!random:mollusc
!random:random
- If the bot is called on a username that does not exist, then it does not do anything.
- If the bot is called on a nick that is an alias, then the real user will be resolved.
- If it is called for a combination of a user that exists and one that does not, then it will only return a quote for the one that exists.
- If `random` is called as part of a combination (including a `random:random` combination), it will not return two of the same.
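Taken together, these rules amount to a small nick-resolution step before generation. The sketch below is one possible reading, not the bot's code: `resolve_nicks` is a hypothetical name, `users` is assumed to be a set of lowercase usernames, `aliases` an alias-to-canonical mapping, and ignoring anything beyond two nicks is an assumption.

```python
import random


def resolve_nicks(requested, users, aliases, rng=random):
    """Turn the text after '!' into a list of distinct, existing users."""
    nicks = [nick.lower() for nick in requested.split(":")][:2]
    resolved = []
    random_slots = 0
    for nick in nicks:
        if nick == "random":
            random_slots += 1
            continue
        nick = aliases.get(nick, nick)  # alias -> canonical nick
        if nick in users and nick not in resolved:
            resolved.append(nick)  # unknown users are silently dropped
    for _ in range(random_slots):
        # Never pick a user that is already part of the combination.
        pool = sorted(user for user in users if user not in resolved)
        if pool:
            resolved.append(rng.choice(pool))
    return resolved


users = {"mollusc", "daffodil", "kelp"}
aliases = {"clam": "mollusc"}
print(resolve_nicks("clam:nobody", users, aliases))
```

An empty result corresponds to the bot doing nothing.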
Other features are triggered using the `@` trigger.
To see a help message, type:
@help
It is also possible to ask for help with something specific. Currently, the following are supported.
@help generator
@help mystery
@help stats
@help score
To see some generic statistics (and other, non-statistical, information), type:
@stats
To see some statistics pertaining to a specific user, type:
@stats <nick>
Statistics are persisted across restarts of the bot. By default, they are persisted to a file called `users.p`.
This is a simple game where players guess the identity (or "author") of a Markov-generated quote. It is controlled using the following commands.
To start a game, type:
@mystery
`impostor` will print a generated quote from some random user. Users with fewer than a certain number of source productions are excluded. This is in order to avoid huge numbers of quotes from little-known, unguessable users.
Players then attempt to guess the identity of the author, by typing:
@guess <nick>
Players can guess as many times as they like. To see some information on scores, type:
@score
@score <player-nick>
The first prints high scores. The second prints the score for a given player. Note that the player nick set is entirely separate from the user nick set. Like user statistics, player scores are persisted, by default to a file called `players.p`.
It is also possible to request hints:
@hint
This will print a character from the author's name at random. Up to three hints may be requested. If the nick consists of three characters or fewer, only one hint will be given. To see the solution, type:
@solve
The game ends when either someone guesses correctly, or `@solve` is called. There are no prizes.
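The hint limits described for `@hint` can be expressed as a short sketch. The names `max_hints` and `next_hint` are hypothetical, and the real bot may choose or present characters differently.

```python
import random


def max_hints(nick):
    """Nicks of three characters or fewer get one hint, others up to three."""
    return 1 if len(nick) <= 3 else 3


def next_hint(nick, hints_given, rng=random):
    """Return a random character from the nick, or None if hints are used up."""
    if hints_given >= max_hints(nick):
        return None
    return rng.choice(nick)


print(next_hint("mollusc", hints_given=0))
```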