Apache2

elB4RTO edited this page Oct 15, 2022 · 2 revisions

Access logs format string


Configuration file


The configuration file should be located at:

/etc/apache2/apache2.conf

Access logs are configured by the line starting with "LogFormat", followed by the format string (the list of field codes) and an optional nickname for the format.
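As a rough illustration of what such a line looks like to a program, the sketch below pulls the LogFormat directives out of a configuration text. The function name `find_log_formats` and the sample text are made up for this example; this is not how LogDoctor itself reads the file.

```python
import re

# Matches: LogFormat "<format string>" [nickname]
# The format string may contain escaped double-quotes (\").
LOGFORMAT_RE = re.compile(
    r'^\s*LogFormat\s+"((?:[^"\\]|\\.)*)"(?:\s+(\S+))?\s*$'
)

def find_log_formats(config_text):
    """Return a list of (format_string, nickname) pairs found in config_text."""
    found = []
    for line in config_text.splitlines():
        match = LOGFORMAT_RE.match(line)
        if match:
            found.append((match.group(1), match.group(2)))
    return found

# Hypothetical excerpt of a configuration file.
sample = r'''
# a comment line
LogFormat "%h %l %u %t \"%r\" %>s %O" common
ErrorLog ${APACHE_LOG_DIR}/error.log
'''
formats = find_log_formats(sample)
```

The nickname is `None` when the directive does not define one. Commented-out directives are skipped because the line must start with `LogFormat`.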



Common log formats


The most commonly used format strings are:


  • Common log format (CLF):
    LogFormat "%h %l %u %t \"%r\" %>s %O" common

  • Combined log format (NCSA standard):
    LogFormat "%h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-agent}i\"" combined
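To make the combined format concrete, here is a minimal sketch of a regular expression that splits one line written with it into named fields. The group names and the sample line are invented for illustration and are not taken from LogDoctor.

```python
import re

# One named group per field of the combined format string.
COMBINED_RE = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>(?:[^"\\]|\\.)*)" '
    r'(?P<status>\d{3}) (?P<bytes_sent>\d+) '
    r'"(?P<referer>(?:[^"\\]|\\.)*)" '
    r'"(?P<user_agent>(?:[^"\\]|\\.)*)"$'
)

# Made-up sample line, shaped like the output of the combined format.
line = (
    '203.0.113.7 - - [15/Oct/2022:10:21:33 +0200] '
    '"GET /index.html?lang=en HTTP/1.1" 200 5120 '
    '"https://example.com/" "Mozilla/5.0 (X11; Linux x86_64)"'
)
entry = COMBINED_RE.match(line).groupdict()
```

The quoted fields accept backslash-escaped characters, so a user-agent containing an escaped double-quote does not end the field early.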


Suggested log formats


The suggested format string, which enables the full set of LogDoctor features, is:

LogFormat "%{%F %T}t %H %m %U %q %>s %I %O %D \"%{Referer}i\" \"%{Cookie}i\" \"%{User-agent}i\" %{c}h" combined


The string above is preferred, but alternatives work as well, such as:

LogFormat "%{sec}t \"%r\" %q %<s %I %O %D \"%{Referer}i\" \"%{Cookie}i\" \"%{User-agent}i\" %h" combined


Note on custom format strings


If you're using your own custom string, please keep in mind that parsing is not magic. When you define the string, think about which characters can appear in each field and choose separators that cannot conflict with the field's content.
As an example: a URI (%U) can't contain whitespace, so it is safe to use a space to separate this field from the previous and the next one. The User-Agent (%{User-agent}i), on the other hand, may contain spaces as well as parentheses, brackets, dashes, etc., so it's better to pick an appropriate separator (double-quotes are a good choice, since they get escaped while logging).
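The effect of the separator choice can be sketched as follows. Both log lines are invented and use a deliberately tiny two-field format (%U followed by the User-Agent); `split_fields` is a hypothetical helper, not part of LogDoctor.

```python
import re

# Hypothetical line logged with:  %U %{User-agent}i
unquoted = '/index.html Mozilla/5.0 (X11; Linux x86_64)'
# The same data logged with:      %U "%{User-agent}i"
quoted = '/index.html "Mozilla/5.0 (X11; Linux x86_64)"'

# Splitting on spaces cannot tell where the user-agent starts or ends.
naive_fields = unquoted.split(' ')

# A field is either "..." (backslash escapes allowed) or a run of non-spaces.
FIELD_RE = re.compile(r'"((?:[^"\\]|\\.)*)"|(\S+)')

def split_fields(line):
    """Split a line into fields, keeping quoted fields in one piece."""
    fields = []
    for match in FIELD_RE.finditer(line):
        quoted_value, bare_value = match.group(1), match.group(2)
        fields.append(quoted_value if quoted_value is not None else bare_value)
    return fields

quoted_fields = split_fields(quoted)
```

With the unquoted format the two fields come out as five pieces, while the quoted format yields exactly two.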



Note on control-characters


Although Apache2 does support some control-characters (aka escape sequences), it is recommended not to use them inside format strings.
In particular, the carriage return will most likely overwrite the data of the previous fields. This makes it very difficult to understand where the current field ends (especially for fields like URIs, queries, user-agents, etc.) and nearly impossible to retrieve the overwritten data, which leads to a polluted database, unrealistic statistics and/or crashes during execution.
As for the new line character, it makes no sense to use it, except for testing purposes. The same is true for the horizontal tab: it is better to use a simple whitespace instead.
The only control-characters supported by Apache2 are \n, \t and \r. Any other sequence is not interpreted and is treated as plain text.
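A format string can be checked for these three sequences before it is put to use. The sketch below is one possible way to do it; the function name and the sample format string are hypothetical, and scanning backslash pairs from left to right is an assumption of this example.

```python
import re

# Escape sequences that Apache2 interprets, as listed in the note above.
CONTROL_ESCAPES = {
    r'\n': 'new line',
    r'\t': 'horizontal tab',
    r'\r': 'carriage return',
}

def find_control_escapes(format_string):
    """Return (sequence, name) pairs for each control escape in format_string."""
    found = []
    # Scan backslash pairs left to right; other pairs (like an escaped
    # double-quote) are skipped.
    for match in re.finditer(r'\\.', format_string):
        sequence = match.group(0)
        if sequence in CONTROL_ESCAPES:
            found.append((sequence, CONTROL_ESCAPES[sequence]))
    return found

# Hypothetical format string using a tab and a carriage return as separators.
risky = find_control_escapes(r'%h\t%U\r\"%{User-agent}i\"')
```

An empty result means the format string contains none of the three sequences.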





Access logs format fields


Fields considered by LogDoctor


Only the following fields will be considered, meaning that only these fields' data will be stored and used for the statistics.


Code Description
%% The percent sign character. It results in a single percent sign and is treated as normal text (by both Apache and LogDoctor).
%t Time the request was received, in the format [DD/Mon/YYYY:hh:mm:ss ±TZ]. The last number (TZ) indicates the timezone offset from GMT.
%{FORMAT}t Time the request was received, in the form given by FORMAT, which should be in an extended strftime format.
LogDoctor supports the following format tokens; any other token is discarded, even if valid:
Format Description
sec time since epoch, in seconds
msec time since epoch, in milliseconds
usec time since epoch, in microseconds
%b month name, abbreviated (same as %h)
%B month name
%c date and time representation
%d day number, zero padded
%D date, in the form of MM/DD/YY
%e day number, space padded
%F date, in the form of YYYY-MM-DD
%h month name, abbreviated (same as %b)
%H hour, in 24h format, zero padded
%m month number, zero padded
%M minute
%r time of the day, in 12h format, in the form of HH:MM:SS AM/PM
%R time of the day, in HH:MM format
%S second
%T ISO 8601 time, in the form of HH:MM:SS
%x date representation
%X time representation
%y year, last two digits (YY)
%Y year
Note: time formats sec, msec and usec can't be mixed together or with other formats.
%r First line of request, equivalent to: %m %U%q %H.
%H The request protocol (HTTP/v, HTTPS/v).
%m The request method (GET, POST, HEAD, ...).
%U The URI path requested, not including any query string.
%q Query string (if any), prepended with a question mark when present.
%s HTTP status code of the original request (before any internal redirection).
%>s Final HTTP status code (in case the request has been internally redirected).
%I Bytes received, including request and headers (you need to enable mod_logio to use this).
%O Bytes sent, including headers (you need to enable mod_logio to use this).
%T The time taken to serve the request, in seconds.
%{UNIT}T The time taken to serve the request, in a time unit given by UNIT (only available in 2.4.13 and later).
Valid units are:
Unit Description
s seconds
ms milliseconds
us microseconds
%D The time taken to serve the request, in microseconds.
%h IP Address of the client (remote hostname).
%{c}h Like %h, but always reports on the hostname of the underlying TCP connection and not any modifications to the remote hostname by modules like mod_remoteip.
%{VARNAME}i The contents of VARNAME: header line(s) in the request sent to the server.
Supported varnames (by LogDoctor) are:
VarName Description
Cookie cookie of the request
Referer referrer host
User-agent web-browser or bot identification string
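As an illustration of the time formats in the table, the following sketch converts logged values back into datetime objects. It covers only the %{%F %T}t and epoch-based variants; the function names are made up for this example, and reading epoch values as UTC is an assumption.

```python
from datetime import datetime, timezone

def parse_date_time(value):
    """Parse a value logged with %{%F %T}t, e.g. '2022-10-15 10:21:33'."""
    # %F is YYYY-MM-DD and %T is HH:MM:SS; both are spelled out here
    # because not every strptime implementation accepts the shorthands.
    return datetime.strptime(value, '%Y-%m-%d %H:%M:%S')

def parse_epoch(value, unit='sec'):
    """Parse a value logged with %{sec}t, %{msec}t or %{usec}t."""
    divisor = {'sec': 1, 'msec': 1_000, 'usec': 1_000_000}[unit]
    # Assumption: the result is expressed in UTC.
    return datetime.fromtimestamp(int(value) / divisor, tz=timezone.utc)

stamp = parse_date_time('2022-10-15 10:21:33')
```

Since sec, msec and usec cannot be mixed with other tokens, a single value is enough to rebuild the whole timestamp in those cases.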


Fields discarded by LogDoctor


Any field other than the ones above won't be considered by LogDoctor.
When generating a log sample, these fields will appear as 'DISCARDED'.
If you aren't using these fields for any other purpose, please remove them from the format string to make the process faster and reduce the possibility of errors.
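To see which fields of a format string fall into each group, a rough classifier can be sketched as below. It is based only on the table in the previous section; the names are hypothetical, the strftime tokens inside %{FORMAT}t are not checked, header names are compared case-insensitively as an assumption, and LogDoctor's real parser may behave differently.

```python
import re

# A field is '%%', or '%' + optional '<' or '>' + optional '{argument}' + a letter.
FIELD_RE = re.compile(r'%(?:%|([<>]?)(?:\{([^}]*)\})?([A-Za-z]))')

PLAIN_FIELDS = set('trHmUqsIOTDh')           # codes usable without an argument
HEADER_NAMES = {'cookie', 'referer', 'user-agent'}
TIME_UNITS = {'s', 'ms', 'us'}

def is_considered(argument, letter):
    """Tell whether a field appears in the table of considered fields."""
    if argument is None:
        return letter in PLAIN_FIELDS
    if letter == 't':
        return True                          # %{FORMAT}t, tokens not checked
    if letter == 'T':
        return argument in TIME_UNITS        # %{UNIT}T
    if letter == 'h':
        return argument == 'c'               # %{c}h
    if letter == 'i':
        return argument.lower() in HEADER_NAMES
    return False

def classify(format_string):
    """Split the fields of format_string into (considered, discarded)."""
    considered, discarded = [], []
    for match in FIELD_RE.finditer(format_string):
        token = match.group(0)
        if token == '%%':
            continue                         # literal percent sign, plain text
        target = considered if is_considered(match.group(2), match.group(3)) else discarded
        target.append(token)
    return considered, discarded

common = r'%h %l %u %t \"%r\" %>s %O'
considered, discarded = classify(common)
```

For the common log format this reports %l and %u as discarded, which are the fields that could be dropped if nothing else relies on them.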




