Switch branches/tags
Nothing to show
Find file
Fetching contributors…
Cannot retrieve contributors at this time
322 lines (265 sloc) 13.5 KB
\chapter{Quick reminder about HTTP}
When haproxy is running in HTTP mode, both the request and the response are
fully analyzed and indexed, thus it becomes possible to build matching criteria
on almost anything found in the contents.
However, it is important to understand how HTTP requests and responses are
formed, and how HAProxy decomposes them. It will then become easier to write
correct rules and to debug existing configurations.
\section{The HTTP transaction model}
The HTTP protocol is transaction-driven. This means that each request will lead
to one and only one response. Traditionally, a TCP connection is established
from the client to the server, a request is sent by the client on the
connection, the server responds and the connection is closed. A new request
will involve a new connection as shown on Figure~\ref{fig:http_close}.
\begin{tikzpicture} [
node distance=3cm,
\node[conclo] (con1) {CON1};
\node[reqresp,right of=con1] (req1) {REQ1};
\node[reqresp,right of=req1] (resp1) {RESP1};
\node[conclo,right of=resp1] (clo1) {CLO1};
\node[conclo,below of=con1] (con2) {CON2};
\node[reqresp,right of=con2] (req2) {REQ2};
\node[reqresp,right of=req2] (resp2) {RESP2};
\node[conclo,right of=resp2] (clo2) {CLO2};
\path[->] (con1) edge (req1);
\path[->] (req1) edge node[above] {\ldots} (resp1);
\path[->] (resp1) edge (clo1);
\path[dashed,->] (clo1) edge node[midway,sloped,above] {Next connection} (con2);
\path[->] (con2) edge (req2);
\path[->] (req2) edge node[above] {\ldots} (resp2);
\path[->] (resp2) edge (clo2);
\caption{HTTP close mode}
\index{HTTP close}
In this mode, called the "HTTP close" mode, there are as many connection
establishments as there are HTTP transactions. Since the connection is closed
by the server after the response, the client does not need to know the content
\index{HTTP keep-alive}
Due to the transactional nature of the protocol, it was possible to improve it
to avoid closing a connection between two subsequent transactions. In this mode
however, it is mandatory that the server indicates the content length for each
response so that the client does not wait indefinitely. For this, a special
header is used: "Content-length". This mode is called the "keep-alive" mode
and is shown on Figure shown on Figure~\ref{fig:http_keep_alive}.
\begin{tikzpicture} [
node distance=4cm,
\node[conclo,diamond] (con) {CON};
\node[reqresp,right of=con] (req1) {REQ1};
\node[reqresp,right of=req1] (resp1) {RESP1};
\node[reqresp,below of=req1] (req2) {REQ2};
\node[reqresp,right of=req2] (resp2) {RESP2};
\node[conclo,right of=resp2] (clo) {CLO};
\path[->] (con) edge node[above] {Start} (req1);
\path[->] (req1) edge node[above] {\ldots} (resp1);
\path[->] (resp1) edge node[midway,sloped,above] {Next request} (req2);
\path[->] (req2) edge node[above] {\ldots} (resp2);
\path[->] (resp2) edge node[above] {End} (clo);
\caption{HTTP keep-alive mode}
Its advantages are a reduced latency between transactions, and less processing
power required on the server side. It is generally better than the close mode,
but not always because the clients often limit their concurrent connections to
a smaller value.
\index{HTTP pipelining}
A last improvement in the communications is the pipelining mode. It still uses
keep-alive, but the client does not wait for the first response to send the
second request. This is useful for fetching large number of images composing a
page. This mode is shown on Figure~\ref{fig:http_pipelining}.
\begin{tikzpicture} [
node distance=3cm,
\node[conclo] (con) {CON};
\node[reqresp,right of=con] (req1) {REQ1};
\node[reqresp,right of=req1] (req2) {REQ2};
\node[reqresp,right of=req2, xshift=0.5cm] (resp1) {RESP1};
\node[reqresp,right of=resp1] (resp2) {RESP2};
\node[conclo,right of=resp2] (clo) {CLO};
\path[->] (con) edge node[above] {Start} (req1);
\path[->] (req1) edge (req2);
\path[->] (req2) edge node[above] {\ldots} (resp1);
\path[->] (resp1) edge (resp2);
\path[->] (resp2) edge node[above] {End} (clo);
\caption{HTTP pipelining mode}
This can obviously have a tremendous benefit on performance because the network
latency is eliminated between subsequent requests. Many HTTP agents do not
correctly support pipelining since there is no way to associate a response with
the corresponding request in HTTP. For this reason, it is mandatory for the
server to reply in the exact same order as the requests were received.
By default HAProxy operates in a tunnel-like mode with regards to persistent
connections: for each connection it processes the first request and forwards
everything else (including additional requests) to selected server. Once
established, the connection is persisted both on the client and server
sides. Use "option http-server-close" to preserve client persistent connections
while handling every incoming request individually, dispatching them one after
another to servers, in HTTP close mode. Use "option httpclose" to switch both
sides to HTTP close mode. "option forceclose" and "option
http-pretend-keepalive" help working around servers misbehaving in HTTP close
\section{HTTP request}
First, let's consider this HTTP request:
Line number & Contents \\
1 & \verb|GET /serv/login.php?lang=en&profile=2 HTTP/1.1| \\
2 & \verb|Host:| \\
3 & \verb|User-agent: my small browser| \\
4 & \verb|Accept: image/jpeg, image/gif| \\
5 & \verb|Accept: image/png| \\
\subsection{The Request line}
Line 1 is the "request line". It is always composed of 3 fields:
- & a METHOD & : & \verb|GET| \\
- & a URI & : & \verb|/serv/login.php?lang=en&profile=2| \\
- & a version tag & : & \verb|HTTP/1.1| \\
All of them are delimited by what the standard calls LWS (linear white spaces),
which are commonly spaces, but can also be tabs or line feeds/carriage returns
followed by spaces/tabs. The method itself cannot contain any colon ('\verb|:|') and
is limited to alphabetic letters. All those various combinations make it
desirable that HAProxy performs the splitting itself rather than leaving it to
the user to write a complex or inaccurate regular expression.
The URI itself can have several forms:
\item[A "relative URI"]
\index{Relative URI}
It is a complete URL without the host part. This is generally what is
received by servers, reverse proxies and transparent proxies.
\item[An "absolute URI", also called a "URL"]
\index{Absolute URI}
It is composed of a "scheme" (the protocol name followed by '\verb|://|'), a host
name or address, optionally a colon ('\verb|:|') followed by a port number, then
a relative URI beginning at the first slash ('\verb|/|') after the address part.
This is generally what proxies receive, but a server supporting HTTP/1.1
must accept this form too.
\item[A star "*"]
This form is only accepted in association with the OPTIONS
method and is not relayable. It is used to inquiry a next hop's
\item[An address:port combination]
This is used with the CONNECT method, which is used to establish TCP
tunnels through HTTP proxies, generally for HTTPS, but sometimes for
other protocols too.
In a relative URI, two sub-parts are identified. The part before the question
mark is called the "path". It is typically the relative path to static objects
on the server. The part after the question mark is called the "query string".
It is mostly used with GET requests sent to dynamic scripts and is very
specific to the language, framework or application in use.
\subsection{The request headers}
The headers start at the second line. They are composed of a name at the
beginning of the line, immediately followed by a colon ('\verb|:|'). Traditionally,
an LWS is added after the colon but that's not required. Then come the values.
Multiple identical headers may be folded into one single line, delimiting the
values with commas, provided that their order is respected. This is commonly
encountered in the "Cookie:" field. A header may span over multiple lines if
the subsequent lines begin with an LWS. In the example in 1.2, lines 4 and 5
define a total of 3 values for the "Accept:" header.
Contrary to a common mis-conception, header names are not case-sensitive, and
their values are not either if they refer to other header names (such as the
"Connection:" header).
The end of the headers is indicated by the first empty line. People often say
that it's a double line feed, which is not exact, even if a double line feed
is one valid form of empty line.
Fortunately, HAProxy takes care of all these complex combinations when indexing
headers, checking values and counting them, so there is no reason to worry
about the way they could be written, but it is important not to accuse an
application of being buggy if it does unusual, valid things.
\emph{Important note:}
As suggested by RFC2616, HAProxy normalizes headers by replacing line breaks
in the middle of headers by LWS in order to join multi-line headers. This
is necessary for proper analysis and helps less capable HTTP parsers to work
correctly and not to be fooled by such complex constructs.
\section{HTTP response}
An HTTP response looks very much like an HTTP request. Both are called HTTP
messages. Let's consider this HTTP response:
\item HTTP/1.1 200 OK
\item Content-length: 350
\item Content-Type: text/html
As a special case, HTTP supports so called "Informational responses" as status
codes 1xx. These messages are special in that they don't convey any part of the
response, they're just used as sort of a signaling message to ask a client to
continue to post its request for instance. In the case of a status 100 response
the requested information will be carried by the next non-100 response message
following the informational one. This implies that multiple responses may be
sent to a single request, and that this only works when keep-alive is enabled
(1xx messages are HTTP/1.1 only). HAProxy handles these messages and is able to
correctly forward and skip them, and only process the next non-100 response. As
such, these messages are neither logged nor transformed, unless explicitly
state otherwise. Status 101 messages indicate that the protocol is changing
over the same connection and that haproxy must switch to tunnel mode, just as
if a CONNECT had occurred. Then the Upgrade header would contain additional
information about the type of protocol the connection is switching to.
\subsection{The Response line}
Line 1 is the "response line". It is always composed of 3 fields:
\item a version tag : HTTP/1.1
\item a status code : 200
\item a reason : OK
The status code is always 3-digit. The first digit indicates a general status:
\index{HTTP response code}
\item[1xx] informational message to be skipped (eg: 100, 101)
\item[2xx] OK, content is following (eg: 200, 206)
\item[3xx] OK, no content following (eg: 302, 304)
\item[4xx] error caused by the client (eg: 401, 403, 404)
\item[5xx] error caused by the server (eg: 500, 502, 503)
Please refer to RFC2616 for the detailed meaning of all such codes. The
"reason" field is just a hint, but is not parsed by clients. Anything can be
found there, but it's a common practice to respect the well-established
messages. It can be composed of one or multiple words, such as "OK", "Found",
or "Authentication Required".
Haproxy may emit the following status codes by itself:
\item[200] access to stats page, and when replying to monitoring requests
\item[301] when performing a redirection, depending on the configured code
\item[302] when performing a redirection, depending on the configured code
\item[303] when performing a redirection, depending on the configured code
\item[400] for an invalid or too large request
\item[401] when an authentication is required to perform the action (when
accessing the stats page)
\item[403] when a request is forbidden by a "block" ACL or "reqdeny" filter
\item[408] when the request timeout strikes before the request is complete
\item[500] when haproxy encounters an unrecoverable internal error, such as a
memory allocation failure, which should never happen
\item[502] when the server returns an empty, invalid or incomplete response, or
when an "rspdeny" filter blocks the response.
\item[503] when no server was available to handle the request, or in response to
monitoring requests which match the "monitor fail" condition
\item[504] when the response timeout strikes before the server responds
The error 4xx and 5xx codes above may be customized (see "errorloc" in section
\subsection{The response headers}
Response headers work exactly like request headers, and as such, HAProxy uses
the same parsing function for both. Please refer to paragraph 1.2.2 for more