Awk for Apache / Nginx logs
Switch branches/tags
Nothing to show
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Type Name Latest commit message Commit time
Failed to load latest commit information.

Web server log file analysis & filtering

v1.2; Oct 2012
Ben Carpenter

This awk script processes lines from a log format that matches the 'combined' log often used by the Apache and Nginx web servers. If your log file format is different, amend accordingly, but for reference this is the combined format this script expects by default:

%h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"

%h		Remote host
%l		Remote logname (ignored)
%u		Remote user (ignored)
%t		Date and time of the request 
%r		First line of the request, typically "GET /something HTTP/1.1"
%>s		Status
%b		Size of response in bytes

It tries to be efficient on resources, so there's minimal progress messages and no system commands in the main loop other than writing to a file based on the status code. The output files are written in a simplified tab-separated format, error corrected for some strange things like spaces in URLs and double quotes for the userid. This revised format is easier to pass reliably through other awk scripts when filtering for specific data, etc. The file format is:

IP, Date/Time, Method, URL, Status, Size, Referer, User Agent

You should be able to send a large (>1GB) amount of log data through this script quite comfortably. This works well for me, but usual clauses apply (use it at your own risk, etc.). Bug reports and suggestions for improvements are very welcome