log mails in httpd log format

		 mail_log  version 0.0.5  2000-07-29
	    (C) 2000 by Christian Garbs <mitch@cgarbs.de>
	    http://www.h.shuttle.de/mitch/mail_log.en.html
			licensed under GNU GPL


[0]  Table of contents
     ~~~~~~~~~~~~~~~~~

What does this do?  . . . . . . . . . . . . . . . . . . 1
How do I install it?  . . . . . . . . . . . . . . . . . 2
What do I do with the log file that is generated? . . . 3
What does sort_log.pl do? . . . . . . . . . . . . . . . 4
What does sort_log_fast.pl do?  . . . . . . . . . . . . 5
What does lowercase_log.pl do?  . . . . . . . . . . . . 6
What does user_agent_no_space.pl do?  . . . . . . . . . 7
What about errors?  . . . . . . . . . . . . . . . . . . 8
Download locations  . . . . . . . . . . . . . . . . . . 9



[1]  What does this do?
     ~~~~~~~~~~~~~~~~~~

mail_log generates statistics about your emails. It receives your
emails and generates logfiles in either combined or common log
format.



[2]  How do I install it?
     ~~~~~~~~~~~~~~~~~~~~

The mail_log.pl script must receive all your incoming email. The
easiest way to accomplish this is the use of a procmail rule:

:0 wc :
| /path/to/mail_log.pl >> /path/to/incoming_mail.log

The "c" flag passes a copy of every email you receive to mail_log.pl,
so normal delivery is not affected. The mail_log.pl script reads the
entire mail from stdin and prints one line in
combined log file format to stdout. This line is then appended to the
given logfile.
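
The following is only a minimal sketch of how a script might pull the
relevant headers out of a mail. It is not the code of mail_log.pl, and
the helper name parse_mail is made up for this illustration. The demo
uses an embedded mail; a real filter would read the mail from stdin
instead.

```perl
use strict;
use warnings;

# Hypothetical helper for illustration; not taken from mail_log.pl.
# Returns a hash reference of headers and the size of the mail.
sub parse_mail {
    my ($mail) = @_;

    # The header block ends at the first empty line.
    my ($head) = split /\r?\n\r?\n/, $mail, 2;
    $head = '' unless defined $head;

    # Unfold continuation lines (lines starting with whitespace).
    $head =~ s/\r?\n[ \t]+/ /g;

    my %header;
    for my $line (split /\r?\n/, $head) {
        next unless $line =~ /^([\w-]+):\s*(.*)$/;
        my $name = lc $1;
        # Keep only the first occurrence of each header.
        $header{$name} = $2 unless exists $header{$name};
    }
    return (\%header, length $mail);
}

# Demo mail. To read from stdin instead:
#   my $mail = do { local $/; <STDIN> };
my $mail = join "\n",
    'From: alice@example.org',
    'Date: Sat, 29 Jul 2000 12:00:00 +0200',
    'Subject: Hi',
    ' there!',
    'X-Mailer: ExampleMailer 1.0',
    '',
    'Hello.',
    '';

my ($header, $size) = parse_mail($mail);
print "$header->{from} | $header->{subject} | $size bytes\n";
```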

If you want common log file format instead of combined log file
format, you must call mail_log.pl with the "-clf" argument:

:0 wc :
| /path/to/mail_log.pl -clf >> /path/to/incoming_mail.log



[3]  What do I do with the log file that is generated?
     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Combined and common log file format are used for web servers. There
are a lot of good programs out there for analyzing these log files.
I prefer webalizer (see [9]).

As neither log file format is intended for logging emails, this is all
a bit of a hack. Common log file format has these fields:

* HOSTNAME:
  This field is supposed to hold the hostname that has requested a
  page from the web server.
  mail_log.pl writes the From: header into this field.

* REMOTE LOGNAME:
  This is always set to -

* USER-ID:
  This is always set to -

* TIME:
  Normally this is the time of the http-request.
  mail_log.pl puts in the Date: header. It says when the email was
  written.

* REQUEST:
  This should be something like "GET index.html HTTP/1.1" and shows
  which page was requested.
  mail_log.pl uses this field to store the Subject: of an email. As no
  spaces are allowed they will be converted to underscores. If
  somebody sends an email with "Hi there!" as subject, this request
  will be generated: "GET Hi_there! HTTP/1.1"

* STATUS:
  This is always set to 200.

* BYTES:
  This field contains the size of the email in bytes.


These two fields only appear in combined log format:

* REFERER:
  Normally this is used to show which page was requested before the
  current http request (the page visited last).
  mail_log.pl tries to identify subjects that begin with "Re:" and
  will put the original subject in here (without "Re:").

* USER-AGENT:
  This should contain the program that did the http request.
  mail_log.pl will put the mail client in here if an X-Mailer: header
  is found.
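
The field mapping described above can be sketched roughly like this.
This is an illustration, not the code of mail_log.pl: the function
name format_log_line and its parameters are made up, the time is
expected to be in httpd log notation already, and replacing spaces in
the REFERER field with underscores is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper for illustration; not taken from mail_log.pl.
sub format_log_line {
    my (%mail) = @_;

    # REQUEST: the subject with spaces converted to underscores.
    (my $request = $mail{subject}) =~ tr/ /_/;

    # Common log format: HOSTNAME, two "-" fields, TIME, REQUEST,
    # a fixed STATUS of 200 and BYTES.
    my $line = sprintf '%s - - [%s] "GET %s HTTP/1.1" 200 %d',
        $mail{from}, $mail{time}, $request, $mail{bytes};
    return $line if $mail{clf};

    # REFERER: the original subject of a reply (assumption: spaces
    # are converted here as well).
    my $referer = '-';
    if ($mail{subject} =~ /^Re:\s*(.*)$/i) {
        ($referer = $1) =~ tr/ /_/;
    }

    # USER-AGENT: the X-Mailer: header, if there was one.
    my $agent = defined $mail{mailer} ? $mail{mailer} : '-';

    return sprintf '%s "%s" "%s"', $line, $referer, $agent;
}

print format_log_line(
    from    => 'alice@example.org',
    time    => '29/Jul/2000:12:00:00 +0200',
    subject => 'Re: Hi there!',
    bytes   => 1234,
    mailer  => 'ExampleMailer 1.0',
), "\n";
```

Passing clf => 1 stops after the BYTES field, which corresponds to
common log file format.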



[4]  What does sort_log.pl do?
     ~~~~~~~~~~~~~~~~~~~~~~~~~

httpd logs put the time of the request into the TIME field of a log.
As this is always the current time, the log file is automatically
sorted by time.

mail_log.pl puts the time when the email was written into the TIME
field. Mails do not necessarily arrive in the order in which they were
written, so the log file generated by mail_log.pl is not sorted by
time.

Some log analyzer programs need sorted input or they will skip most of
the records. The sort_log.pl program is used to sort your log. It acts
as a filter, so you can use it like this:

sort_log.pl < log.unsorted > log.sorted

If your log analyzer can read from stdin, you can also use it like
this:

sort_log.pl < log.unsorted | my_log_analyzer_program
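
To illustrate the idea, here is a minimal sketch of sorting log lines
by their TIME field using only the core module Time::Local. It is not
the code of sort_log.pl. The helper names clf_epoch and sort_by_time
are made up, and lines without a readable time are simply sorted to
the front, which is an assumption of this sketch.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

my %month_index;
@month_index{qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec)} = (0 .. 11);

# Hypothetical helper: epoch seconds of the TIME field, or undef if
# the line has no readable time.
sub clf_epoch {
    my ($line) = @_;
    my ($day, $mon, $year, $hour, $min, $sec, $sign, $tz_h, $tz_m) =
        $line =~ m{\[(\d{2})/(\w{3})/(\d{4}):(\d{2}):(\d{2}):(\d{2}) ([+-])(\d{2})(\d{2})\]}
        or return;
    return unless exists $month_index{$mon};

    # timegm dies on impossible dates, so catch that.
    my $epoch = eval {
        timegm($sec, $min, $hour, $day, $month_index{$mon}, $year);
    };
    return unless defined $epoch;

    # Convert local time plus zone offset to UTC.
    my $offset = ($tz_h * 60 + $tz_m) * 60;
    return $sign eq '+' ? $epoch - $offset : $epoch + $offset;
}

# Compute each time only once, then sort by it.
sub sort_by_time {
    my @decorated = map {
        my $t = clf_epoch($_);
        [ defined $t ? $t : 0, $_ ]
    } @_;
    return map { $_->[1] } sort { $a->[0] <=> $b->[0] } @decorated;
}

# Demo with two lines. To act as a filter instead:
#   print sort_by_time(<STDIN>);
my @log = (
    'bob@example.org - - [29/Jul/2000:10:30:00 +0000] "GET Later HTTP/1.1" 200 10' . "\n",
    'alice@example.org - - [29/Jul/2000:12:00:00 +0200] "GET Earlier HTTP/1.1" 200 10' . "\n",
);
print sort_by_time(@log);
```

The second demo line is printed first: 12:00 at +0200 is 10:00 UTC,
which is earlier than 10:30 UTC.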



[5]  What does sort_log_fast.pl do?
     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

sort_log_fast.pl does exactly the same thing as sort_log.pl but it is
about 4 times faster. This is accomplished by using the Perl module
Date::Manip.

If you don't know whether you have Date::Manip installed or not, just
run sort_log_fast.pl. If it produces no error, you probably have
Date::Manip installed correctly. If it doesn't work, you either have
to use the slow sort_log.pl script or get a copy of Date::Manip
(see [9]).



[6]  What does lowercase_log.pl do?
     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

lowercase_log.pl converts all From: fields to lowercase
characters. It acts as a filter. If you want your From: fields all
lowercase you should add it to your procmail rule:

:0 wc :
| /path/to/mail_log.pl | /path/to/lowercase_log.pl >> \
  /path/to/incoming_mail.log
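
What such a filter does to a single line can be sketched as follows.
This is not the code of lowercase_log.pl. The helper name
lowercase_from is made up, and the sketch assumes that the From: field
is the first whitespace-delimited token of the line.

```perl
use strict;
use warnings;

# Hypothetical helper for illustration; not taken from
# lowercase_log.pl. Lowercases the first field of a log line.
sub lowercase_from {
    my ($line) = @_;
    $line =~ s/^(\S+)/\L$1/;
    return $line;
}

# Demo with one line. To act as a filter instead:
#   print lowercase_from($_) while <STDIN>;
print lowercase_from(
    'Alice@Example.ORG - - [29/Jul/2000:12:00:00 +0200] "GET Hi_there! HTTP/1.1" 200 1234' . "\n"
);
```

The subject in the REQUEST field keeps its capitalization; only the
first field is changed.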



[7]  What does user_agent_no_space.pl do?
     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This script replaces any spaces within a USER-AGENT field with
underscores. This is only useful in combined log format.

I needed this because I use webalizer and wanted to group the
user-agent fields. Unfortunately, webalizer reads spaces as delimiters
in its configuration file. If you run into this problem, you can
remove the spaces like this:

user_agent_no_space.pl < log.with.spaces > log.with.underscores

Or if you feed your log analyzer on stdin, use something like this:

user_agent_no_space.pl < log.with.spaces | my_log_analyzer_program
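
The replacement itself can be sketched like this. It is not the code
of user_agent_no_space.pl. The helper name is made up, and the sketch
assumes that the USER-AGENT field is the last double-quoted field of
the line and contains no double quotes itself.

```perl
use strict;
use warnings;

# Hypothetical helper for illustration; not taken from
# user_agent_no_space.pl.
sub user_agent_no_space {
    my ($line) = @_;

    # Split the line into everything up to the opening quote of the
    # last quoted field, the field content, and the closing quote.
    if ($line =~ /^(.*")([^"]*)("\s*)$/s) {
        my ($head, $agent, $tail) = ($1, $2, $3);
        $agent =~ tr/ /_/;
        $line = $head . $agent . $tail;
    }
    return $line;
}

# Demo with one line. To act as a filter instead:
#   print user_agent_no_space($_) while <STDIN>;
print user_agent_no_space(
    'alice@example.org - - [29/Jul/2000:12:00:00 +0200] "GET Hi HTTP/1.1" 200 1234 "-" "ExampleMailer 1.0"' . "\n"
);
```

A line in common log file format does not end with a quoted field, so
this sketch leaves it unchanged.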



[8]  What about errors?
     ~~~~~~~~~~~~~~~~~~

Both mail_log.pl and sort_log.pl (as well as sort_log_fast.pl) may
print errors on stderr. You might want to collect them in a separate
error log file.

For mail_log.pl use an entry similar to this in your .procmailrc:

:0 wc :
| /path/to/mail_log.pl >> /path/to/incoming_mail.log 2>> \
  /path/to/error.log

(At the moment this is not very useful because you can't find out
 which mail has caused the error)


For sort_log.pl (or sort_log_fast.pl) try this:

sort_log.pl < log.unsorted > log.sorted 2>> error.log

or this:

sort_log.pl < log.unsorted 2>> error.log | my_log_analyzer_program



[9]  Download locations
     ~~~~~~~~~~~~~~~~~~

* New versions of mail_log.pl
  http://www.cgarbs.de/mail_log.en.html
  http://www.h.shuttle.de/mitch/mail_log.en.html

* The Perl module Date::Manip
  http://www.cise.ufl.edu/~sbeck/
  http://www.cpan.org/modules/index.html

* Webalizer
  http://www.webalizer.org