Simple Python script to normalize web log output.



weblog_normalizer is a Python script I wrote to extract some basic
information from web logs. It was written in an environment where
the standard "just awk it" approach wasn't really an option.

This script extracts the status code and request URL from various
Apache and IIS log formats. In my environment we have several
varying versions of Apache log formats and also have several IIS

At this point, it's a bit of a hack. However, it does "just work".

It currently outputs each line of the web log as



200 /
404 /notfound
301 /redirectme
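
As a rough illustration of the kind of extraction involved (this is not
the script's actual code), here is a minimal sketch that pulls the status
code and URL out of a single line. It assumes the Apache common/combined
layout, where the quoted request is followed by the status code; the names
REQUEST_RE and normalize_line are hypothetical, and IIS logs would need a
different pattern.

```python
import re

# Assumed layout: ... "METHOD /url PROTOCOL" STATUS SIZE ...
# (Apache common/combined format; other formats need other patterns.)
REQUEST_RE = re.compile(r'"(?:[A-Z]+) (\S+)[^"]*" (\d{3})\b')


def normalize_line(line):
    """Return 'STATUS URL' for a line that matches, otherwise None."""
    match = REQUEST_RE.search(line)
    if match is None:
        return None
    url, status = match.groups()
    return f"{status} {url}"


sample = '127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /notfound HTTP/1.0" 404 2326'
print(normalize_line(sample))  # prints: 404 /notfound
```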

With this output you can then pipe it through various shell
scripts as you like.

For example, I use it in a shell script to generate a report of
all requests with a non-OK status code and how many times each has
been hit in the log file.

$ANALYZESCRIPT -f $log | grep -v "^ 200 " | grep -v "^ 301 " | grep -v "^ 302 " | grep -v "^ 304 " | sort -g | uniq -c | sort -g -r | sed 's/^\s*//g' >> $REPORTFILE
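
If you'd rather not chain greps, the same report can be approximated in
Python. This is only a sketch: it assumes input lines of the form
"STATUS URL" as shown earlier, and error_report and OK_STATUSES are
hypothetical names, not part of the script.

```python
from collections import Counter

# Status codes the pipeline above filters out with grep -v.
OK_STATUSES = {"200", "301", "302", "304"}


def error_report(lines):
    """Count (status, url) pairs for non-OK statuses, most frequent first."""
    counts = Counter()
    for line in lines:
        parts = line.split(None, 1)  # split on the first run of whitespace
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        status, url = parts[0], parts[1].strip()
        if status in OK_STATUSES:
            continue
        counts[(status, url)] += 1
    return counts.most_common()


for (status, url), hits in error_report(["200 /", "404 /notfound", "404 /notfound"]):
    print(hits, status, url)
```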

The script supports non-compressed or gzip compressed files.
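
How the script tells the two apart isn't described here. One common
approach, sketched below as an assumption rather than the actual
implementation, is to check for the gzip magic bytes at the start of the
file; open_log is a hypothetical helper name.

```python
import gzip


def open_log(path):
    """Open a log file as text, using gzip if the file starts with the gzip magic bytes."""
    with open(path, "rb") as probe:
        is_gzip = probe.read(2) == b"\x1f\x8b"
    if is_gzip:
        return gzip.open(path, "rt", encoding="utf-8", errors="replace")
    return open(path, "r", encoding="utf-8", errors="replace")
```

Checking the content rather than the file extension means a rotated log
named access.log.1 is still handled correctly if it happens to be compressed.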

Note: At some point if there's interest in the script I'll clean it
up a bit, adding help and usage methods and such. If you have any
suggestions for improvements please use the issue system, or even 
better, send a pull request.