Connection Storm

Felipe Megale edited this page May 14, 2026 · 85 revisions

++++++++++++++++++ UNDER CONSTRUCTION ++++++++++++++++++

Introduction

Connection Storms may happen in any relational database. This page focuses on Db2.

This page covers:

  • how to identify the symptoms of a Connection Storm.
  • how to make Db2 more resilient when facing Connection Storms.

What is a Connection Storm?

There are articles about Connection Storms on the internet. One of them is listed in the References section for more information about the behavior.

But here is a brief description of what a Connection Storm is:

The short answer

It is when the database suddenly receives a lot of connections in a very short time. Let's say it went from 300 connections to 3000 in 5 minutes.

A more complete answer

A Connection Storm may be summarized in a few phases:

1- Something gets slow in some layer that serves the application. It could be the Application Server, the network, or the database server. The slowdown could even come from the Operating System of the servers involved. It could be bad queries with slow response times. Anyway, something gets slow.

2- The slowdown in one of those layers makes the Application Server terminate the connection and open a new connection with the same request that was slow before.

3- The first connection, terminated by the Application Server, is gone from the point of view of the Application Server. But on the Database Server that connection may still be processing, maybe finishing the authentication, maybe rolling back the transaction. And those processes consume CPU.

4- That behavior becomes a loop. Actually, it is much more like a snowball, where things get worse each second:

  • the App. Server opens a connection to the database
  • the slowdown keeps the request from being answered in time for the App. Server
  • the App. Server terminates the connection
  • the App. Server opens a new connection with the same request as the first connection that was terminated
  • this time the database could be processing 2 connections (the one terminated and the new one just opened)
  • the new connection isn't answered in time for the App. Server either, so the App. Server terminates it and opens a new one
  • and this opening and terminating of connections grows very fast, like a snowball
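To get a feeling for how fast the loop above grows, here is a toy model in shell. It is only a sketch under assumptions that are not measured facts: every request times out in every round, none of the abandoned connections finishes its work in the database, and the function name simulate_storm is made up for this example.

```shell
#!/bin/sh
# Toy model of the retry loop (hypothetical helper, not a Db2 tool).
# Assumption: in every round each client times out, abandons its
# connection (which keeps running in the database) and opens a new one.
simulate_storm() {
    clients=$1          # requests the App. Server keeps retrying
    rounds=$2           # number of timeout periods to simulate
    server_conns=$clients
    round=0
    while [ "$round" -le "$rounds" ]; do
        echo "round=$round server_connections=$server_conns"
        # each retry adds one more connection per client on the server side
        server_conns=$((server_conns + clients))
        round=$((round + 1))
    done
}

simulate_storm 300 9
```

With 300 clients, the last line printed is round=9 server_connections=3000, the same order of growth as the 300 to 3000 example above. A real storm is messier because some abandoned connections do finish, but the shape is the same.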

What happens in the database server?

Actually, let's talk about a more specific database: Db2 Server.

When a connection arrives at the Db2 database, a daemon called db2ckpwd is invoked.

That daemon is responsible for talking to the Operating System to authenticate the user/password. That uses CPU.

Let's add, for example, a bad query/UOW (Unit of Work) requested by the App. Server and consuming CPU.

Add a rollback of that bad UOW, caused by the App. Server terminating the connection. That consumes CPU as well.

Now add the App. Server sending several new connections, which makes the daemons (which are limited - default: 3) consume CPU, sends the same bad transaction again (consuming CPU), terminates those transactions (consuming CPU), and then sends even more connections, repeating all of that again.
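The settings involved here can be checked without changing anything. The sketch below assumes the daemon limit corresponds to the DB2_NUM_CKPW_DAEMONS registry variable and that the authentication cache parameters (authn_cache_users, authn_cache_duration, both listed in the References) show up in the database manager configuration at Db2 levels that support them. The function name show_auth_settings is hypothetical.

```shell
#!/bin/sh
# show_auth_settings is a hypothetical helper name, not a Db2 command.
# It only reads settings; it changes nothing.
show_auth_settings() {
    if ! command -v db2set >/dev/null 2>&1 || ! command -v db2 >/dev/null 2>&1; then
        echo "db2set/db2 not found in PATH: run this as the instance owner on the Db2 server" >&2
        return 1
    fi

    # db2set -all lists only the variables that were set explicitly.
    db2set -all | grep -i "DB2_NUM_CKPW_DAEMONS" \
        || echo "DB2_NUM_CKPW_DAEMONS is not set explicitly (the default applies)"

    # Assumption: the authn_cache parameters are reported by "get dbm cfg"
    # on Db2 levels that have the authentication cache.
    db2 get dbm cfg | grep -i "authn_cache" \
        || echo "no authn_cache parameters listed by this Db2 level"
}
```

Call show_auth_settings as the instance owner. Whether and how to change those values depends on your workload, so check the linked documentation first.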

Possible Results/Symptoms in the Database Server:

  • the entire Server hangs
  • a very severe slowdown of the entire server

How to diagnose a Connection Storm?

Diagnosis usually happens after your server has come back. If you had a Connection Storm, your Database Server went through a very severe slowdown or a hang, so you will start an RCA (Root Cause Analysis) with the Server working again (out of the Connection Storm).

So, after your server has come back, you need to check the connection history, maybe from DMC (now Db2 Genius) graphs or from any other monitoring you have.

A graph like this is a clear Connection Storm:

[image: connection history graph]

If you don't have such monitoring (you should), you can go to the db2diag.log.

Remember that the App. Server may terminate the connections? That is logged in the db2diag.log like this:

2012-10-07-03.31.44.543149+480 I560509E538 LEVEL: Error
PID : 2160 TID : 140574002767616 PROC : db2sysc
INSTANCE: db2inst1 NODE : 000 DB : SAMPLE
APPHDL : 0-4510 APPID: XX.XX.XX.XX.2286.121006192634
AUTHID : DB2INST1
EDUID : 570 EDUNAME: db2agent (SAMPLE)
FUNCTION: DB2 UDB, common communication, sqlcctcptest, probe:11
MESSAGE : Detected client termination
DATA #1 : Hexdump, 2 bytes
0x00007FD9EF7F6748 : 3600

Also, it may log messages with function AgentBreathingPoint, like this:

2021-06-07-16.35.03.960357-180 I130974058A552       LEVEL: Error
PID     : 44389                TID : 4327502506256  PROC : db2sysc 0
INSTANCE: db2inst1             NODE : 000           DB   : XXXXXXX
APPHDL  : 0-28469              APPID: XX.XX.XX.XX.46374.210627173418
AUTHID  : XXXXXX               HOSTNAME: XXXXXXXX
EDUID   : 25871                EDUNAME: db2agent (XXXXXXX) 0
FUNCTION: DB2 UDB, base sys utilities, sqeAgent::AgentBreathingPoint, probe:10
CALLED  : DB2 UDB, common communication, sqlcctest
RETCODE : ZRC=0x00000036=54

So you can grep your db2diag.log to count those messages over a short time window (a few minutes). Lots of those messages in a short time indicate a Connection Storm.

Example of how to grep:

db2diag -t 2026-03-10-03.20:2026-03-10-03.28 | grep -i "AgentBreathingPoint" | wc -l
db2diag -t 2026-03-10-03.20:2026-03-10-03.28 | grep -i "Detected client termination" | wc -l
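A single total does not show when the messages piled up, so it may help to count them per minute. The sketch below does that with awk. The function name count_storm_messages and the sample records are made up for illustration (shortened from the format shown above), and it assumes every db2diag record starts with a timestamp line.

```shell
#!/bin/sh
# count_storm_messages (hypothetical helper): reads db2diag-style text on
# stdin and prints "<YYYY-MM-DD-HH.MM> <count>" for each minute that has
# records mentioning AgentBreathingPoint or "Detected client termination".
count_storm_messages() {
    awk '
        # A record header starts with a timestamp such as 2021-06-07-16.35.03.960357-180
        /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]-[0-9][0-9]\.[0-9][0-9]\.[0-9][0-9]\./ {
            minute = substr($1, 1, 16)   # keep YYYY-MM-DD-HH.MM
            seen = 0                     # new record, nothing counted yet
        }
        /AgentBreathingPoint|Detected client termination/ {
            # count each record only once, even if both strings appear in it
            if (minute != "" && !seen) { count[minute]++; seen = 1 }
        }
        END { for (m in count) print m, count[m] }
    ' | sort
}

# Made-up sample records, only to show the output format.
sample_log=$(cat <<'EOF'
2021-06-07-16.35.03.960357-180 I1A552 LEVEL: Error
FUNCTION: DB2 UDB, base sys utilities, sqeAgent::AgentBreathingPoint, probe:10
CALLED  : DB2 UDB, common communication, sqlcctest

2021-06-07-16.35.41.120001-180 I2A538 LEVEL: Error
FUNCTION: DB2 UDB, common communication, sqlcctcptest, probe:11
MESSAGE : Detected client termination

2021-06-07-16.36.02.000100-180 I3A400 LEVEL: Info
FUNCTION: DB2 UDB, some other component, someFunction, probe:1

2021-06-07-16.36.10.500000-180 I4A552 LEVEL: Error
FUNCTION: DB2 UDB, base sys utilities, sqeAgent::AgentBreathingPoint, probe:10
EOF
)

result=$(printf '%s\n' "$sample_log" | count_storm_messages)
printf '%s\n' "$result"
```

On a real server the same function can read from db2diag directly, for example: db2diag -t 2026-03-10-03.20:2026-03-10-03.28 | count_storm_messages. A minute with far more hits than its neighbors is a good candidate for the start of the storm.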

References:

Beware of Connection Storm
https://www.ibm.com/support/pages/beware-connection-storm

"Detected client termination" in db2diag.log file.
https://www.ibm.com/support/pages/detected-client-termination-db2diaglog-file

Authentication and group cache
https://www.ibm.com/docs/en/db2/11.5.x?topic=details-authentication-group-cache

authn_cache_users - Authentication cache users configuration parameter
https://www.ibm.com/docs/en/db2/11.5.x?topic=dcp-authn-cache-users-authentication-cache-users-configuration-parameter

authn_cache_duration - Authentication cache duration configuration parameter
https://www.ibm.com/docs/en/db2/11.5.x?topic=dcp-authn-cache-duration-authentication-cache-duration-configuration-parameter

DB2_NUM_CKPW_DAEMONS
https://www.ibm.com/docs/en/db2/11.5.x?topic=variables-miscellaneous

Connection pool settings
https://www.ibm.com/docs/en/was-nd/8.5.5?topic=applications-connection-pool-settings

Connection pool advanced settings
https://www.ibm.com/docs/en/was-nd/8.5.5?topic=applications-connection-pool-advanced-settings
