Artificial Ignorance: monitor messages you don't recognize in logs you don't ignore

mbucc/ai

Artificial Ignorance: log monitoring ala Marcus Ranum
Since: Wed Nov 19 11:47:27 EST 2014

	Get emails of log messages that you don't recognize
	in log files that you don't ignore.

	The stoplist here is specific to my environment
	(Ubuntu Lucid), so you will need to create your own.


Setup:

	1.  Read HOW-TO included below.

	2.  Copy setup.ai.sh to your host.

	3.  Tune stopfiles by repeatedly running setup.ai.sh after
	    adding regular expressions to stopfiles.

	4.  Tune stoplist by repeatedly running setup.ai.sh after
	    adding regular expressions to stoplist.

	5.  When setup.ai.sh outputs only lines you want to be
	    notified about, copy your stop files to this directory.

	6.  Edit ansible hosts file and review/edit ai.yml, then run:

	        make clean all

	    n.b. If you don't edit ai.yml, all logs are set to
	    rotate daily and be saved for 365 days.

	7.  If you like, submit a pull request with your stop files.

Todo:

	[] Store stoplist patterns as one file per daemon.  Then cat
	  together lists for daemons you care about to build stoplist.

	[x] Implement the "retail" trick described below so we don't
	  have to use the rotate-daily hack, can still run from
	  cron, and cover all log files that are not excluded.

		Basically, it was like tail but it remembered the
		offset in the file from the previous run, and the
		inode of the file (so it'd detect file shifts).

		The trick is to keep one fd open to the file and
		seek within it, then stat it every so often to see
		if the file has grown or changed inode. If it has,
		read to EOF, open the new file, and start again.

		That way you can chop the end of the log file through
		a filter every couple seconds with minimal expense
		in CPU and disk I/O.
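
The offset-and-inode idea above can be sketched in portable shell. This is a minimal illustration, not the code used in this repository and not Ranum's original "retail". The function name `retail` and the state-file format (`inode offset` on one line) are assumptions made for the example. Unlike the long-running version described above, it is meant to be called once per cron run, so it cannot read the tail of a file that was rotated away between runs.

```shell
#!/bin/sh
# retail LOGFILE STATEFILE
# Print only the bytes appended to LOGFILE since the previous call.
# STATEFILE holds "inode offset" from the last run (hypothetical format).
retail() {
    log=$1
    state=$2

    # Current inode and size of the log file.
    inode=$(ls -i "$log" | awk '{print $1}')
    size=$(($(wc -c < "$log")))

    # Load the previous inode and offset, if any.
    old_inode=0
    old_offset=0
    [ -f "$state" ] && read old_inode old_offset < "$state"

    # A new inode or a shrunken file means rotation or truncation:
    # start again from the beginning of the file.
    if [ "$inode" != "$old_inode" ] || [ "$size" -lt "$old_offset" ]; then
        old_offset=0
    fi

    # Emit bytes old_offset+1 .. size, then remember where we stopped.
    # head -c caps the output so lines written during the run are not
    # skipped next time.
    tail -c +"$((old_offset + 1))" "$log" | head -c "$((size - old_offset))"
    echo "$inode $size" > "$state"
}
```

A cron job could then pipe `retail /var/log/syslog /var/tmp/syslog.state` through `grep -v -f stoplist` and mail whatever is left; the paths here are placeholders.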


Inspired by a 1997 HOW-TO posted by Marcus Ranum, which I found
at http://www.ranum.com/security/computer_security/papers/ai/.


	---------------------- START HOWTO --------------------------
	Marcus J. Ranum
	Tue, 23 Sep 1997 23:06:06 +0000
	artificial ignorance: how-to guide

	By request, here's a quick how-to on log scanning via
	artificial ignorance. :) It assumes UNIX and the presence
	of a good grep - you could use other stuff if you wanted to
	but this is just an example.

	Setting up a filter is a process of constant tuning. First
	you build a file of common strings that aren't interesting,
	and, as new uninteresting things happen, you add them
	to the file.

	I start with a shell command like this:

	cd /var/log
	cat * | \
		sed -e 's/^.*demo//' -e 's/\[[0-9]*\]//' | \
		sort | uniq -c | \
		sort -r -n > /tmp/xx

	In this example "demo" is my laptop's name, and I use
	it in the sed command to strip out the leading lines of
	syslog messages so that I lose the date/timestamps. This
	means that the overall variation in the text is reduced
	considerably. The next argument to sed strips out the
	PID from the daemon, another source of text variation.
	we then sort it, collapse duplicates into a count, then
	sort the count numerically.

	This yields a file of the frequency with which something
	shows up in syslog (more or less):
	 297  cron: (root) CMD (/usr/bin/at)
	 167  sendmail: alias database /etc/aliases.db out of date
	 120  ftpd: PORT
	  61  lpd: restarted
	  48  kernel: wdpi0: transfer size=2048 intr cmd DRQ
	 ... etc

	In the example on "demo" this reduced 3982 lines of
	syslog records to 889.

	Then what you want to do is trim from BOTH ends of
	the file and build an "ignore this" list. In this example, I
	don't care that cron ran "at" OK so I'd add a regexp
	like:
	cron.*: (root) CMD (/usr/bin/at)
	That's a pretty precise one. :)

	At the bottom of my file there were about 200 entries
	that looked like:
	   1  ftpd: RETR pic9.jpg
	   1  ftpd: RETR pic8.jpg
	   1  ftpd: RETR pic7.jpg
	   1  ftpd: RETR pic6.jpg

	Clearly these are highly unique events but also not
	interesting. So I add patterns that look like:
	ftpd.*: RETR
	ftpd.*: STOR
	ftpd.*: CWD
	ftpd.*: USER
	ftpd.*: FTP LOGIN FROM

	Now, you apply your stop-list as follows:
	cat * | grep -v -f stoplist | \
		sort, etc --

	This time I get 744 lines. Putting a pattern in that
	matches:
	sendmail.*: .*to=

	Drops it down to 120 lines. Just keep doing this and
	pretty soon you'll have a set of patterns that make your
	whole syslog output disappear. You'll notice that in the
	early example I had a warning from sendmail because
	the aliases database was out of date. Rather than putting
	a pattern for that, I simply ran newalias. Next time my
	aliases database is out of date, my log scanner will tell
	me.

	System reboots are cool, too. My log shows:
	  48  kernel: wdc2 at pcmcia0: PCCARD IDE disk controller
	  48  kernel: wdc1 at pcmcia0: PCCARD IDE disk controller
	  48  kernel: wdc0 at isa0 iobase 0x1f0 irq 14: disk controller
	  48  kernel: wd0 at wdc0 drive 0: sec/int=4 2818368*512
	  ...

	Those will be pretty much static. So I add those exact
	lines. Now they won't show up whenever the system
	boots. BUT I'll get a notification if a new SCSI drive
	is added, or (I did this deliberately!):

	kernel: fd0c: hard error writing fsbn 1 of 1-19 (fd0 bn 1; cn
	kernel: fd0: write protected

	Oooh! Some bad boy trying to step on my tripwire file!

	Or:
	kernel: changing root device to wd1a

	..interesting. My pattern was for wd0a!

	I used to run this kind of stuff on a firewall that I used
	to manage. One day its hard disk burned up and my
	log scan cheerfully found these new messages about
	bad block replacement and sent them to me. :) The
	advantage of this approach is that it's dumb, it's
	cheap -- and it catches stuff you don't know about
	already.

	Once you've got your pattern file tuned, put it in
	cron or whatever, so it runs often. The TIS Gauntlet
	has a hack I wrote called "retail" which I can't
	unfortunately release the code for, but is easy to
	implement. Basically, it was like tail but it remembered
	the offset in the file from the previous run, and the
	inode of the file (so it'd detect file shifts) - the trick is
	to keep one fd open to the file and seek within it,
	then stat it every so often to see if the file has grown
	or changed inode. If it has, read to EOF, open the new
	file, and start again. That way you can chop the end
	of the log file through a filter every couple seconds
	with minimal expense in CPU and disk I/O.

	I'm sure there are lots of fun ways this simple trick
	can be enhanced -- but just in its naive form I've found
	it quite useful. I wish I had a program that helped me
	statistically build my noise filters, but in general I find
	it's about a 2 hour job, tops, and it's one you do once
	and forget about.

	Enjoy!
	mjr.
	-----
	Marcus J. Ranum, CEO, Network Flight Recorder, Inc.

	---------------------- END   HOWTO --------------------------
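
The tuning loop from the HOWTO (filter with the stoplist, strip timestamps and PIDs, count what remains) can be wrapped in one shell function. This is a sketch under assumptions: the name `ai_summary` and its argument order are made up for illustration, the host name is assumed to contain no regular-expression metacharacters, and the stoplist is applied to the raw lines before the timestamp and PID are stripped, so that patterns such as `ftpd.*: RETR` still match lines that carry a `[pid]`.

```shell
#!/bin/sh
# ai_summary HOST STOPLIST LOGFILE...
# Print a frequency table of log messages not matched by STOPLIST.
ai_summary() {
    host=$1
    stoplist=$2
    shift 2

    # 1. Drop lines matching any stoplist pattern.
    # 2. Strip everything up to and including the host name
    #    (date/timestamp), then strip the first [pid].
    # 3. Collapse duplicates into counts, most frequent first.
    cat "$@" |
        grep -v -f "$stoplist" |
        sed -e "s/^.*$host //" -e 's/\[[0-9]*\]//' |
        sort | uniq -c | sort -r -n
}
```

For example, `ai_summary demo stoplist /var/log/messages > /tmp/xx` regenerates the frequency file after each addition to the stoplist. Repeat until the output contains only messages you want to hear about.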
