\chapter{Statistics and monitoring}

It is possible to query HAProxy about its status. The most commonly used mechanism is the HTTP statistics page. This page also exposes an alternative CSV output format for monitoring tools. The same format is provided on the Unix socket.

\section{CSV format}

The statistics may be consulted either from the Unix socket or from the HTTP page. Both provide the same CSV format, whose fields are listed below.

\begin{verbatim}
  0. pxname: proxy name
  1. svname: service name (FRONTEND for frontend, BACKEND for backend, any name
     for server)
  2. qcur: current queued requests
  3. qmax: max queued requests
  4. scur: current sessions
  5. smax: max sessions
  6. slim: sessions limit
  7. stot: total sessions
  8. bin: bytes in
  9. bout: bytes out
 10. dreq: denied requests
 11. dresp: denied responses
 12. ereq: request errors
 13. econ: connection errors
 14. eresp: response errors (among which srv_abrt)
 15. wretr: retries (warning)
 16. wredis: redispatches (warning)
 17. status: status (UP/DOWN/NOLB/MAINT/MAINT(via)...)
 18. weight: server weight (server), total weight (backend)
 19. act: server is active (server), number of active servers (backend)
 20. bck: server is backup (server), number of backup servers (backend)
 21. chkfail: number of failed checks
 22. chkdown: number of UP->DOWN transitions
 23. lastchg: last status change (in seconds)
 24. downtime: total downtime (in seconds)
 25. qlimit: queue limit
 26. pid: process id (0 for first instance, 1 for second, ...)
 27. iid: unique proxy id
 28. sid: service id (unique inside a proxy)
 29. throttle: warm up status
 30. lbtot: total number of times a server was selected
 31. tracked: id of proxy/server if tracking is enabled
 32. type (0=frontend, 1=backend, 2=server, 3=socket)
 33. rate: number of sessions per second over last elapsed second
 34. rate_lim: limit on new sessions per second
 35. rate_max: max number of new sessions per second
 36. check_status: status of last health check, one of:
        UNK     -> unknown
        INI     -> initializing
        SOCKERR -> socket error
        L4OK    -> check passed on layer 4, no upper layers testing enabled
        L4TOUT  -> layer 1-4 timeout
        L4CON   -> layer 1-4 connection problem, for example
                   "Connection refused" (tcp rst) or "No route to host" (icmp)
        L6OK    -> check passed on layer 6
        L6TOUT  -> layer 6 (SSL) timeout
        L6RSP   -> layer 6 invalid response - protocol error
        L7OK    -> check passed on layer 7
        L7OKC   -> check conditionally passed on layer 7, for example 404 with
                   disable-on-404
        L7TOUT  -> layer 7 (HTTP/SMTP) timeout
        L7RSP   -> layer 7 invalid response - protocol error
        L7STS   -> layer 7 response error, for example HTTP 5xx
 37. check_code: layer5-7 code, if available
 38. check_duration: time in ms taken to finish last health check
 39. hrsp_1xx: http responses with 1xx code
 40. hrsp_2xx: http responses with 2xx code
 41. hrsp_3xx: http responses with 3xx code
 42. hrsp_4xx: http responses with 4xx code
 43. hrsp_5xx: http responses with 5xx code
 44. hrsp_other: http responses with other codes (protocol error)
 45. hanafail: failed health checks details
 46. req_rate: HTTP requests per second over last elapsed second
 47. req_rate_max: max number of HTTP requests per second observed
 48. req_tot: total number of HTTP requests received
 49. cli_abrt: number of data transfers aborted by the client
 50. srv_abrt: number of data transfers aborted by the server (inc. in eresp)
\end{verbatim}

\section{Unix Socket commands}

The following commands are supported on the UNIX stats socket; all of them must be terminated by a line feed. The socket supports pipelining, so it is possible to chain multiple commands at once provided they are delimited by a semi-colon or a line feed, although the former is more reliable as it has no risk of being truncated over the network. Each response is followed by an empty line, so it is easy for an external script to match a given response with a given request.
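The pipelining rules above can be sketched in Python as follows. This is only an illustration under stated assumptions, not part of HAProxy: the helper names \verb|build_request|, \verb|split_responses| and \verb|query| are hypothetical, and the sketch assumes that no response contains an empty line of its own. Commands are joined with semi-colons and terminated by one line feed, and the reply is split on the empty line which follows each response.

\begin{verbatim}
import socket

def build_request(commands):
    """Join several CLI commands into one pipelined request line."""
    for cmd in commands:
        if ";" in cmd or "\n" in cmd:
            raise ValueError("a command must not contain ';' or a line feed")
    return ";".join(commands) + "\n"

def split_responses(raw):
    """Split raw output into one block per command, using the empty
    line that follows each response as the delimiter."""
    blocks = raw.split("\n\n")
    # Nothing is left after the last delimiter when the output is complete.
    if blocks and blocks[-1].strip() == "":
        blocks.pop()
    return [b.strip("\n") for b in blocks]

def query(path, commands):
    """Send the commands to the stats socket at 'path' (assumed to exist)
    and return the list of responses, in request order."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(path)
        sock.sendall(build_request(commands).encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:      # the peer closed the connection
                break
            chunks.append(data)
    return split_responses(b"".join(chunks).decode("ascii", "replace"))
\end{verbatim}

With a socket bound at an assumed path such as \verb|/tmp/sock1|, calling \verb|query("/tmp/sock1", ["show info", "show stat"])| would be expected to return two blocks, one per command.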
By default one command line is processed then the connection closes, but there is an interactive mode allowing multiple lines to be issued one at a time.

It is important to understand that when multiple haproxy processes are started on the same sockets, any process may pick up the request and will output its own stats.

\subsubsection[clear counters]{clear counters}

Clear the max values of the statistics counters in each proxy (frontend \& backend) and in each server. The cumulated counters are not affected. This can be used to get clean counters after an incident, without having to restart or to clear the traffic counters. This command is restricted and can only be issued on sockets configured for levels "operator" or "admin".

\subsubsection[clear counters all]{clear counters all}

Clear all statistics counters in each proxy (frontend \& backend) and in each server. This has the same effect as restarting. This command is restricted and can only be issued on sockets configured for level "admin".

\subsubsection[clear table]{clear table \texttt{<table>} [ data.\texttt{<type>} \texttt{<operator>} \texttt{<value>} ] | [ key \texttt{<key>} ]}

Remove entries from the stick-table \texttt{<table>}.

This is typically used to unblock users who complain that they have been wrongly denied access to a service, but it can also be used to clear stickiness entries matching a server that is going to be replaced (see "show table" below for details). Note that the removal of an entry is sometimes refused because the entry is currently tracked by a session. Retrying a few seconds later, after the session ends, is usually enough.

When no optional arguments are given, all entries are removed.

When the "data." form is used, entries matching a filter applied to the stored data (see "stick-table" in section 4.2) are removed. A stored data type must be specified in \texttt{<type>}, and this data type must be stored in the table, otherwise an error is reported. The data is compared according to \texttt{<operator>} with the 64-bit integer \texttt{<value>}.
Operators are the same as with the ACLs:

\begin{description}
\item[eq] match entries whose data is equal to this value
\item[ne] match entries whose data is not equal to this value
\item[le] match entries whose data is less than or equal to this value
\item[ge] match entries whose data is greater than or equal to this value
\item[lt] match entries whose data is less than this value
\item[gt] match entries whose data is greater than this value
\end{description}

When the key form is used, the entry \texttt{<key>} is removed. The key must be of the same type as the table, which currently is limited to IPv4, IPv6, integer and string.

Example:

\begin{verbatim}
$ echo "show table http_proxy" | socat stdio /tmp/sock1
>>> # table: http_proxy, type: ip, size:204800, used:2
>>> 0x80e6a4c: key=127.0.0.1 use=0 exp=3594729 gpc0=0 conn_rate(30000)=1 \
      bytes_out_rate(60000)=187
>>> 0x80e6a80: key=127.0.0.2 use=0 exp=3594740 gpc0=1 conn_rate(30000)=10 \
      bytes_out_rate(60000)=191

$ echo "clear table http_proxy key 127.0.0.1" | socat stdio /tmp/sock1

$ echo "show table http_proxy" | socat stdio /tmp/sock1
>>> # table: http_proxy, type: ip, size:204800, used:1
>>> 0x80e6a80: key=127.0.0.2 use=0 exp=3594740 gpc0=1 conn_rate(30000)=10 \
      bytes_out_rate(60000)=191

$ echo "clear table http_proxy data.gpc0 eq 1" | socat stdio /tmp/sock1
$ echo "show table http_proxy" | socat stdio /tmp/sock1
>>> # table: http_proxy, type: ip, size:204800, used:1
\end{verbatim}

\subsubsection[disable frontend]{disable frontend \texttt{<frontend>}}

Mark the frontend as temporarily stopped. This corresponds to the mode which is used during a soft restart: the frontend releases the port but can be enabled again if needed. This should be used with care, as some non-Linux OSes are unable to enable it back. This is intended to be used in environments where stopping a proxy is not even imaginable but a misconfigured proxy must be fixed. That way it's possible to release the port and bind it into another process to restore operations.
The frontend will appear with status "STOP" on the stats page.

The frontend may be specified either by its name or by its numeric ID, prefixed with a sharp ('\#').

This command is restricted and can only be issued on sockets configured for level "admin".

\subsubsection[disable server]{disable server \texttt{<backend>}/\texttt{<server>}}

Mark the server DOWN for maintenance. In this mode, no more checks will be performed on the server until it leaves maintenance. If the server is tracked by other servers, those servers will be set to DOWN during the maintenance.

In the statistics page, a server DOWN for maintenance will appear with a "MAINT" status, and its tracking servers with the "MAINT(via)" one.

Both the backend and the server may be specified either by their name or by their numeric ID, prefixed with a sharp ('\#').

This command is restricted and can only be issued on sockets configured for level "admin".

\subsubsection[enable frontend]{enable frontend \texttt{<frontend>}}

Resume a frontend which was temporarily stopped. It is possible that some of the listening ports won't be able to bind anymore (e.g. if another process took them since the 'disable frontend' operation). If this happens, an error is displayed. Some operating systems might not be able to resume a frontend which was disabled.

The frontend may be specified either by its name or by its numeric ID, prefixed with a sharp ('\#').

This command is restricted and can only be issued on sockets configured for level "admin".

\subsubsection[enable server]{enable server \texttt{<backend>}/\texttt{<server>}}

If the server was previously marked as DOWN for maintenance, this marks the server UP and checks are re-enabled.

Both the backend and the server may be specified either by their name or by their numeric ID, prefixed with a sharp ('\#').

This command is restricted and can only be issued on sockets configured for level "admin".

\subsubsection[get weight]{get weight \texttt{<backend>}/\texttt{<server>}}

Report the current weight and the initial weight of server \texttt{<server>} in backend \texttt{<backend>}, or an error if either doesn't exist.
The initial weight is the one that appears in the configuration file. Both are normally equal unless the current weight has been changed. Both the backend and the server may be specified either by their name or by their numeric ID, prefixed with a sharp ('\verb|#|').

\subsubsection[help]{help}

Print the list of known keywords and their basic usage. The same help screen is also displayed for unknown commands.

\subsubsection[prompt]{prompt}

Toggle the prompt at the beginning of the line and enter or leave interactive mode. In interactive mode, the connection is not closed after a command completes. Instead, the prompt will appear again, indicating to the user that the interpreter is waiting for a new command. The prompt consists of a right angle bracket followed by a space '\verb|> |'. This mode is particularly convenient when one wants to periodically check information such as stats or errors. It is also a good idea to enter interactive mode before issuing a "help" command.

\subsubsection[quit]{quit}

Close the connection when in interactive mode.

\subsubsection[set maxconn frontend]{set maxconn frontend \texttt{<frontend>} \texttt{<value>}}

Dynamically change the specified frontend's maxconn setting. Any strictly positive value is allowed, but setting values larger than the global maxconn does not make much sense. If the limit is increased and connections were pending, they will immediately be accepted. If it is lowered to a value below the current number of connections, the acceptance of new connections will be delayed until the threshold is reached. The frontend may be specified either by its name or by its numeric ID, prefixed with a sharp ('\verb|#|').

\subsubsection[set maxconn global]{set maxconn global \texttt{<maxconn>}}

Dynamically change the global maxconn setting within the range defined by the initial global maxconn setting. If it is increased and connections were pending, they will immediately be accepted.
If it is lowered to a value below the current number of connections, the acceptance of new connections will be delayed until the threshold is reached. A value of zero restores the initial setting.

\subsubsection[set rate-limit connections global]{set rate-limit connections global \texttt{<value>}}

Change the process-wide connection rate limit, which is set by the global 'maxconnrate' setting. A value of zero disables the limitation. This limit applies to all frontends and the change has an immediate effect. The value is expressed in connections per second.

\subsubsection[set timeout cli]{set timeout cli \texttt{<delay>}}

Change the CLI interface timeout for the current connection. This can be useful during long debugging sessions where the user needs to constantly inspect some indicators without being disconnected. The delay is expressed in seconds.

\subsubsection[set weight]{set weight \texttt{<backend>}/\texttt{<server>} \texttt{<weight>}[\%]}

Change a server's weight to the value passed in argument. If the value ends with the ('\verb|%|') sign, then the new weight will be relative to the initially configured weight. Relative weights are only permitted between 0 and 100\%, and absolute weights are permitted between 0 and 256. Servers which are part of a farm running a static load-balancing algorithm have stricter limitations because the weight cannot change once set. Thus for these servers, the only accepted values are 0 and 100\% (or 0 and the initial weight). Changes take effect immediately, although some LB algorithms need a certain number of requests before they take the change into account. A typical usage of this command is to disable a server during an update by setting its weight to zero, then to enable it again after the update by setting it back to 100\%. This command is restricted and can only be issued on sockets configured for level "admin". Both the backend and the server may be specified either by their name or by their numeric ID, prefixed with a sharp ('\verb|#|').
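The ranges given for "set weight" can be checked on the client side before the command is sent. The following Python sketch is an illustration only: the function name \verb|format_set_weight| is hypothetical, and it only enforces the general limits (0 to 100 for relative weights, 0 to 256 for absolute ones), not the stricter rule which applies to static load-balancing algorithms.

\begin{verbatim}
def format_set_weight(backend, server, weight, relative=False):
    """Return a 'set weight' command line after checking the documented
    ranges: 0..100 for relative (%) weights, 0..256 for absolute ones."""
    if isinstance(weight, bool) or not isinstance(weight, int):
        raise TypeError("weight must be an integer")
    upper = 100 if relative else 256
    if not 0 <= weight <= upper:
        raise ValueError("weight %d is outside the range 0..%d"
                         % (weight, upper))
    suffix = "%" if relative else ""
    return "set weight %s/%s %d%s\n" % (backend, server, weight, suffix)
\end{verbatim}

For the typical update scenario, \verb|format_set_weight("www", "srv1", 0)| builds the command which drains the server, and \verb|format_set_weight("www", "srv1", 100, relative=True)| builds the one which restores its configured weight. The names \verb|www| and \verb|srv1| are placeholders.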
\subsubsection[show errors]{show errors [\texttt{<iid>}]}

Dump the last known request and response errors collected by frontends and backends. If \texttt{<iid>} is specified, limit the dump to errors concerning either the frontend or the backend whose ID is \texttt{<iid>}. This command is restricted and can only be issued on sockets configured for levels "operator" or "admin".

The errors which may be collected are the last request and response errors caused by protocol violations, often due to invalid characters in header names. The report indicates exactly which character violated the protocol. Other important information such as the exact date the error was detected, the frontend and backend names, the server name (when known), the internal session ID and the source address which initiated the session are reported too.

All characters are returned, and non-printable characters are encoded. The most common ones (\verb|\t = 9|, \verb|\n = 10|, \verb|\r = 13| and \verb|\e = 27|) are encoded as one letter following a backslash. The backslash itself is encoded as '\verb|\\|' to avoid confusion. Other non-printable characters are encoded as '\verb|\xNN|' where NN is the two-digit hexadecimal representation of the character's ASCII code.

Lines are prefixed with the position of their first character, starting at 0 for the beginning of the buffer. At most one input line is printed per line, and large lines will be broken into multiple consecutive output lines so that the output never goes beyond 79 characters wide. It is easy to detect if a line was broken, because it will not end with '\verb|\n|' and the next line's offset will be followed by a '\verb|+|' sign, indicating it is a continuation of the previous line.
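The encoding rules above can be reversed with a short Python sketch. It is an illustration under stated assumptions, not HAProxy code: the name \verb|decode_dump_line| is hypothetical, and the function expects only the escaped text of a line, with the leading offset already removed.

\begin{verbatim}
import re

# One-letter escapes listed above; any other encoded byte uses \xNN.
_SIMPLE = {"t": 9, "n": 10, "r": 13, "e": 27, "\\": 92}
_ESCAPE = re.compile(r"\\(x[0-9A-Fa-f]{2}|[tnre\\])")

def decode_dump_line(text):
    """Turn the escaped text of one dump line back into raw bytes."""
    out = bytearray()
    pos = 0
    for m in _ESCAPE.finditer(text):
        # Copy the printable characters found before this escape.
        out += text[pos:m.start()].encode("latin-1")
        token = m.group(1)
        if token[0] == "x":
            out.append(int(token[1:], 16))   # \xNN form
        else:
            out.append(_SIMPLE[token])       # one-letter form
        pos = m.end()
    out += text[pos:].encode("latin-1")
    return bytes(out)
\end{verbatim}

Escapes are matched from left to right, so an encoded backslash followed by the letters \verb|x41| is decoded as a backslash and the literal text \verb|x41|, not as the byte 0x41.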
Example:

\begin{verbatim}
$ echo "show errors" | socat stdio /tmp/sock1
>>> [04/Mar/2009:15:46:56.081] backend http-in (#2) : invalid response
      src 127.0.0.1, session #54, frontend fe-eth0 (#1), server s2 (#1)
      response length 213 bytes, error at position 23:

      00000  HTTP/1.0 200 OK\r\n
      00017  header/bizarre:blah\r\n
      00038  Location: blah\r\n
      00054  Long-line: this is a very long line which should b
      00104+ e broken into multiple lines on the output buffer,
      00154+  otherwise it would be too large to print in a ter
      00204+ minal\r\n
      00211  \r\n
\end{verbatim}

In the example above, we see that the backend "http-in", which has internal ID 2, has blocked an invalid response from its server s2, which has internal ID 1. The request was on session 54, initiated by source 127.0.0.1 and received by frontend fe-eth0 whose ID is 1. The total response length was 213 bytes when the error was detected, and the error was at byte 23. This is the slash ('\verb|/|') in header name "header/bizarre", which is not a valid HTTP character for a header name.

\subsubsection[show info]{show info}

Dump info about haproxy status on the current process.

\subsubsection[show sess]{show sess}

Dump all known sessions. Avoid doing this on slow connections as the output can be huge. This command is restricted and can only be issued on sockets configured for levels "operator" or "admin".

\subsubsection[show sess]{show sess \texttt{<id>}}

Display a lot of internal information about the specified session identifier. This identifier is the first field at the beginning of the lines in the dumps of "show sess" (it corresponds to the session pointer). This information is useless to most users but may be used by haproxy developers to troubleshoot a complex bug. The output format is intentionally not documented so that it can freely evolve depending on demands.

\subsubsection[show stat]{show stat [\texttt{<iid>} \texttt{<type>} \texttt{<sid>}]}

Dump statistics in the CSV format.
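A monitoring script can load this CSV output with Python's standard \verb|csv| module. The sketch below is a minimal illustration: the function name \verb|parse_stats| is hypothetical, and it assumes that the first non-empty line is the header introduced by a sharp and a space ("\verb|# pxname,svname,...|"). A trailing comma at the end of each line, if present, is tolerated.

\begin{verbatim}
import csv
import io

def parse_stats(text):
    """Parse CSV statistics into a list of dicts keyed by field name."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines or not lines[0].startswith("# "):
        raise ValueError("missing '# pxname,...' header line")
    lines[0] = lines[0][2:]          # drop the leading "# " of the header
    reader = csv.DictReader(io.StringIO("\n".join(lines)))
    rows = []
    for row in reader:
        row.pop("", None)            # column created by a trailing comma
        rows.append(row)
    return rows
\end{verbatim}

Each returned row can then be looked up by the field names of the CSV format, for example \verb|row["scur"]| or \verb|row["status"]|. All values are kept as strings, since empty fields are common.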
By passing \texttt{<iid>}, \texttt{<type>} and \texttt{<sid>}, it is possible to dump only selected items:

\begin{itemize}
\item[-] \texttt{<iid>} is a proxy ID, -1 to dump everything
\item[-] \texttt{<type>} selects the type of dumpable objects: 1 for frontends, 2 for backends, 4 for servers, -1 for everything. These values can be ORed, for example:
\begin{verbatim}
      1 + 2     = 3   -> frontend + backend.
      1 + 2 + 4 = 7   -> frontend + backend + server.
\end{verbatim}
\item[-] \texttt{<sid>} is a server ID, -1 to dump everything from the selected proxy.
\end{itemize}

Example:

\begin{verbatim}
$ echo "show info;show stat" | socat stdio unix-connect:/tmp/sock1
>>> Name: HAProxy
    Version: 1.4-dev2-49
    Release_date: 2009/09/23
    Nbproc: 1
    Process_num: 1
    (...)

    # pxname,svname,qcur,qmax,scur,smax,slim,stot,bin,bout,dreq,  (...)
    stats,FRONTEND,,,0,0,1000,0,0,0,0,0,0,,,,,OPEN,,,,,,,,,1,1,0, (...)
    stats,BACKEND,0,0,0,0,1000,0,0,0,0,0,,0,0,0,0,UP,0,0,0,,0,250,(...)
    (...)
    www1,BACKEND,0,0,0,0,1000,0,0,0,0,0,,0,0,0,0,UP,1,1,0,,0,250, (...)
\end{verbatim}

Here, two commands have been issued at once. That way it's easy to find which process the stats apply to in multi-process mode. Notice the empty line after the information output, which marks the end of the first block. A similar empty line appears at the end of the second block (stats) so that the reader knows the output has not been truncated.

\subsubsection[show table]{show table}

Dump general information on all known stick-tables. For each table, the output gives its name (the name of the proxy which holds it), its type (currently zero, always IP), its size as a maximum possible number of entries, and the number of entries currently in use.

Example:

\begin{verbatim}
$ echo "show table" | socat stdio /tmp/sock1
>>> # table: front_pub, type: ip, size:204800, used:171454
>>> # table: back_rdp, type: ip, size:204800, used:0
\end{verbatim}

\subsubsection[show table]{show table \texttt{<name>} [ data.\texttt{<type>} \texttt{<operator>} \texttt{<value>} ] | [ key \texttt{<key>} ]}

Dump contents of stick-table \texttt{<name>}.
In this mode, a first line of generic information about the table is reported as with "show table", then all entries are dumped. Since this can be quite heavy, it is possible to specify a filter in order to select which entries to display.

When the "data." form is used, the filter applies to the stored data (see "stick-table" in section 4.2). A stored data type must be specified in \texttt{<type>}, and this data type must be stored in the table, otherwise an error is reported. The data is compared according to \texttt{<operator>} with the 64-bit integer \texttt{<value>}. Operators are the same as with the ACLs:

\begin{description}
\item[eq] match entries whose data is equal to this value
\item[ne] match entries whose data is not equal to this value
\item[le] match entries whose data is less than or equal to this value
\item[ge] match entries whose data is greater than or equal to this value
\item[lt] match entries whose data is less than this value
\item[gt] match entries whose data is greater than this value
\end{description}

When the key form is used, the entry \texttt{<key>} is shown. The key must be of the same type as the table, which currently is limited to IPv4, IPv6, integer, and string.
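The operator semantics above can be mirrored on the client side, for instance to filter a dump which has already been saved to a file. The following Python sketch is an illustration only: the names \verb|parse_entry| and \verb|matches| are hypothetical, and the entry layout (a pointer, a colon, then space-separated \verb|name=value| pairs) is assumed from the dumps shown in this chapter.

\begin{verbatim}
import operator
import re

# Comparison operators accepted by the "data." filter.
OPERATORS = {
    "eq": operator.eq, "ne": operator.ne,
    "le": operator.le, "ge": operator.ge,
    "lt": operator.lt, "gt": operator.gt,
}

def parse_entry(line):
    """Parse one entry line of a table dump into a dict of strings.
    The period in e.g. 'conn_rate(30000)=10' is dropped from the name."""
    _, _, fields = line.partition(": ")
    entry = {}
    for token in fields.split():
        name, sep, value = token.partition("=")
        if not sep:
            continue                 # not a name=value pair
        entry[re.sub(r"\(\d+\)$", "", name)] = value
    return entry

def matches(entry, data_type, op, value):
    """Apply '<data_type> <op> <value>' to a parsed entry."""
    return OPERATORS[op](int(entry[data_type]), int(value))
\end{verbatim}

Unlike the filter applied by HAProxy itself, this sketch compares the values as they were printed at dump time, so rate counters are not recomputed.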
Example:

\begin{verbatim}
$ echo "show table http_proxy" | socat stdio /tmp/sock1
>>> # table: http_proxy, type: ip, size:204800, used:2
>>> 0x80e6a4c: key=127.0.0.1 use=0 exp=3594729 gpc0=0 conn_rate(30000)=1 \
      bytes_out_rate(60000)=187
>>> 0x80e6a80: key=127.0.0.2 use=0 exp=3594740 gpc0=1 conn_rate(30000)=10 \
      bytes_out_rate(60000)=191

$ echo "show table http_proxy data.gpc0 gt 0" | socat stdio /tmp/sock1
>>> # table: http_proxy, type: ip, size:204800, used:2
>>> 0x80e6a80: key=127.0.0.2 use=0 exp=3594740 gpc0=1 conn_rate(30000)=10 \
      bytes_out_rate(60000)=191

$ echo "show table http_proxy data.conn_rate gt 5" | \
    socat stdio /tmp/sock1
>>> # table: http_proxy, type: ip, size:204800, used:2
>>> 0x80e6a80: key=127.0.0.2 use=0 exp=3594740 gpc0=1 conn_rate(30000)=10 \
      bytes_out_rate(60000)=191

$ echo "show table http_proxy key 127.0.0.2" | \
    socat stdio /tmp/sock1
>>> # table: http_proxy, type: ip, size:204800, used:2
>>> 0x80e6a80: key=127.0.0.2 use=0 exp=3594740 gpc0=1 conn_rate(30000)=10 \
      bytes_out_rate(60000)=191
\end{verbatim}

When the data criterion applies to a dynamic value which depends on time, such as a byte rate, the value is dynamically computed during the evaluation of the entry in order to decide whether it has to be dumped or not. This means that such a filter could match for some time and then stop matching because, as time goes by, the average event rate drops.

It is possible to use this to extract lists of IP addresses abusing the service, in order to monitor them or even blacklist them in a firewall.

Example:

\begin{verbatim}
$ echo "show table http_proxy data.gpc0 gt 0"                          \
    | socat stdio /tmp/sock1                                           \
    | fgrep 'key=' | cut -d' ' -f2 | cut -d= -f2 > abusers-ip.txt
    ( or | awk '/key/{ print a[split($2,a,"=")]; }' )
\end{verbatim}

\subsubsection[shutdown frontend]{shutdown frontend \texttt{<frontend>}}

Completely delete the specified frontend. All the ports it was bound to will be released. It will not be possible to enable the frontend anymore after this operation.
This is intended to be used in environments where stopping a proxy is not even imaginable but a misconfigured proxy must be fixed. That way it's possible to release the port and bind it into another process to restore operations. The frontend will not appear at all on the stats page once it is terminated.

The frontend may be specified either by its name or by its numeric ID, prefixed with a sharp ('\#').

This command is restricted and can only be issued on sockets configured for level "admin".

\subsubsection[shutdown session]{shutdown session \texttt{<id>}}

Immediately terminate the session matching the specified session identifier. This identifier is the first field at the beginning of the lines in the dumps of "show sess" (it corresponds to the session pointer). This can be used to terminate a long-running session without waiting for a timeout or when an endless transfer is ongoing. Such terminated sessions are reported with a 'K' flag in the logs.

\subsubsection[shutdown sessions]{shutdown sessions \texttt{<backend>}/\texttt{<server>}}

Immediately terminate all the sessions attached to the specified server. This can be used to terminate long-running sessions after a server is put into maintenance mode, for instance. Such terminated sessions are reported with a 'K' flag in the logs.