Thin mode: pool grows beyond max after TIMEDWAIT requests time out

1. What versions are you using?

```
oracledb.__version__: 3.4.2
platform.platform: macOS-15.7.5-arm64-arm-64bit-Mach-O
sys.maxsize > 2**32: True
platform.python_version: 3.13.9
```

Oracle Database 21c Express Edition Release 21.0.0.0.0 (21.3.0.0.0), thin mode.

2. Is it an error or a hang or a crash?

Error (incorrect behavior). The connection pool grows beyond the configured `max` when using `POOL_GETMODE_TIMEDWAIT`. After timed-out acquire requests, `pool.opened` significantly exceeds `pool.max`.

3. What error(s) or behavior you are seeing?

With a pool configured as `max=5` and `getmode=POOL_GETMODE_TIMEDWAIT`, after a burst of timed-out requests, `pool.opened` grows to 25 (5x the configured max). The excess connections are real TCP connections to Oracle and consume database resources.

We observed this in production where `max=20` but `pool.opened` grew to 60.

Expected behavior: `pool.opened` should never exceed `pool.max` with `POOL_GETMODE_TIMEDWAIT`. Only `POOL_GETMODE_FORCEGET` should allow exceeding max.

**Root cause** is in `BaseThinPoolImpl._get_next_request()` in `src/oracledb/impl/thin/pool.pyx`:

```cython
cdef PooledConnRequest _get_next_request(self):
    for request in self._requests:
        if not request.waiting \          # <-- BUG: bypasses max check
                or request.requires_ping \
                or request.is_replacing \
                or request.is_extra \
                or self._open_count < self.max:
            request.in_progress = request.waiting
            return request
        break
```

When a TIMEDWAIT request times out, `request.waiting` is set to `False` (in `acquire()`'s `finally` block), but the request remains in `self._requests`. The background task calls `_get_next_request()`, which finds these stale requests. The `not request.waiting` condition evaluates to `True`, short-circuiting the entire `or`-chain -- including the `self._open_count < self.max` guard.

The background task then calls `_process_request()`, which creates a **new** connection via `_create_conn_impl()`. In `_post_process_request()`, `self._open_count += 1` is executed (incrementing beyond max), then `request.reject()` is called (since `not request.waiting`), which returns the connection to the free list. The connection was created for a request nobody is waiting for, but `_open_count` was already incremented.

For N timed-out requests, this creates N extra connections beyond max.

**Note:** This is related to the previously reported [#390](https://github.com/oracle/python-oracledb/issues/390) which was closed as stale. The commit referenced as the fix ([b13418a](https://github.com/oracle/python-oracledb/commit/b13418a56a31cad8d959208be3bb5c605e919c87)) addressed a different symptom (connections stuck in the busy list, #392) but did not fix the `_get_next_request()` logic. The current code on `main` still has this bug.

**Suggested fix** -- do not process timed-out requests when they have no special flags (ping/replace/extra). Separate handling of timed-out requests from active requests, and discard stale requests from the queue:

```cython
cdef PooledConnRequest _get_next_request(self):
    cdef PooledConnRequest request
    while self._requests:
        request = self._requests[0]
        if not request.waiting:
            if request.requires_ping or request.is_replacing \
                    or request.is_extra:
                request.in_progress = False
                return request
            self._requests.pop(0)
            request.bg_processing = False
            continue
        if request.requires_ping or request.is_replacing \
                or request.is_extra \
                or self._open_count < self.max:
            request.in_progress = True
            return request
        break
```

4. Does your application call init_oracle_client()?

No. Using thin mode only.

5. Include a runnable Python script that shows the problem.

No database schema is needed. The bug is in the pool's connection management logic and only requires the ability to connect to any Oracle database.

```python
"""
Reproduction: pool.opened grows beyond pool.max with POOL_GETMODE_TIMEDWAIT.

When timed-out requests remain in the pool's internal request queue,
the background task creates connections for them without checking the
max limit, causing pool.opened to exceed pool.max.

The bug requires the background task to be woken up AFTER the stale
requests accumulate. In production, this happens naturally when new
requests arrive. Here we simulate it by sending additional requests
after the first batch times out.
"""

import oracledb
import threading
import time

# Connect to any Oracle database
DSN = "localhost:1521/XEPDB1"
USER = "system"
PASSWORD = "oracle"

POOL_MAX = 5
TIMEOUT_REQUESTS = 20

pool = oracledb.create_pool(
    user=USER,
    password=PASSWORD,
    dsn=DSN,
    min=2,
    max=POOL_MAX,
    increment=1,
    getmode=oracledb.POOL_GETMODE_TIMEDWAIT,
    wait_timeout=1000,  # 1 second (in ms)
)

print(f"Pool created:   opened={pool.opened}  busy={pool.busy}  max={pool.max}")

# Step 1: Fill the pool to max -- all connections held busy
held_connections = [pool.acquire() for _ in range(POOL_MAX)]
print(f"Pool filled:    opened={pool.opened}  busy={pool.busy}")

# Step 2: Send requests that will all time out (pool is full)
timeout_count = 0
lock = threading.Lock()


def try_acquire():
    global timeout_count
    try:
        conn = pool.acquire()
        pool.release(conn)
    except oracledb.DatabaseError:
        with lock:
            timeout_count += 1


threads = [threading.Thread(target=try_acquire) for _ in range(TIMEOUT_REQUESTS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(f"Timed out:      {timeout_count}/{TIMEOUT_REQUESTS} requests got DPY-4005")
print(f"After timeouts: opened={pool.opened}  busy={pool.busy}")

# Step 3: Send a few more requests to wake the background task.
#
# Each new request calls fulfill() -> _add_request() -> _notify_bg_task().
# The background task wakes up and iterates _requests. The stale timed-out
# requests (waiting=False) are at the front of the queue. The condition
# "not request.waiting" in _get_next_request() evaluates to True,
# short-circuiting the "_open_count < self.max" guard. The background
# task creates a new connection for each stale request, incrementing
# _open_count beyond max.
print("Triggering background task with new requests...")
trigger_threads = [threading.Thread(target=try_acquire) for _ in range(3)]
for t in trigger_threads:
    t.start()
for t in trigger_threads:
    t.join()

# Step 4: Wait for background task to finish processing stale requests
time.sleep(15)

print(f"After bg task:  opened={pool.opened}  busy={pool.busy}")

# Release held connections
for conn in held_connections:
    pool.release(conn)
time.sleep(2)

print(f"After release:  opened={pool.opened}  busy={pool.busy}")

if pool.opened > pool.max:
    excess = pool.opened - pool.max
    print(f"\nBUG: pool.opened ({pool.opened}) exceeds pool.max ({pool.max})"
          f" by {excess} connections.")
    print(f"Expected pool.opened <= {pool.max}."
          f" Only POOL_GETMODE_FORCEGET should exceed max.")
else:
    print(f"\npool.opened ({pool.opened}) is within pool.max ({pool.max}).")

pool.close(force=True)
```

Verified output (oracledb 3.4.2, Python 3.13.9, Oracle 21c XE, thin mode):

```
Pool created:   opened=0  busy=0  max=5
Pool filled:    opened=5  busy=5
Timed out:      20/20 requests got DPY-4005
After timeouts: opened=5  busy=5
Triggering background task with new requests...
After bg task:  opened=28  busy=5     <-- BUG: should be 5
After release:  opened=28  busy=0     <-- 28 > max(5)

BUG: pool.opened (28) exceeds pool.max (5) by 23 connections.
Expected pool.opened <= 5. Only POOL_GETMODE_FORCEGET should exceed max.
```

The 23 excess connections = 20 stale requests + 3 trigger requests, each creating a connection beyond max.

---

We're happy to answer any questions or provide a pull request with a fix if that would be helpful.
Thanks a lot!

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Thin mode: pool grows beyond max after TIMEDWAIT requests time out #581

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Thin mode: pool grows beyond max after TIMEDWAIT requests time out #581

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions