Issue summary
There seems to be a memory leak when using the executemany function. The issue is most prominent with tables that have many columns; I tested with a 120-column table. The number of rows inserted per call seems to have no effect on the memory increase.
The issue has also subsided considerably in 3.2.6; prior versions show much larger spikes in memory usage. See the output below.
Details
- Operating System Name: Red Hat Enterprise Linux release 8.10 (Ootpa)
- Target Db2 Server Version: 11.5.9
- Python Version: 3.12.3
- ibm_db version: 3.2.6
- For non-Windows, output of the following commands:
uname: Linux
uname -m: x86_64
- Values of the following environment variables, if set:
PATH: /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/var/nodejs/node-v22.14.0-linux-x64/bin:/root/odbc/odbc_cli/clidriver/bin:/root/bin
Test Script to bulk insert
import ibm_db_dbi
import uuid
import tracemalloc
tracemalloc.start()
# Create a connection string
conn_str = "DATABASE=testdb;HOSTNAME=localhost;PORT=50000;PROTOCOL=TCPIP;UID=db2inst1;PWD=*;"
cols = 120  # number of columns in TEST.BULK (see DDL below)
print("Preparing to insert 10 rows into TEST.BULK")
insert = f"insert into TEST.BULK values({','.join(['?']*cols)})"
params = tuple(tuple(str(uuid.uuid4()) for _ in range(cols)) for _ in range(10))
print("Establishing connection to DB2")
conn = ibm_db_dbi.connect(conn_str, "", "")
for i in range(1000):
    print(f"\nIteration {i}")
    # Bulk insert using python DB API
    cur = conn.cursor()
    cur.executemany(insert, params)
    cur.close()
    conn.commit()
    print("Data inserted successfully")
    # Print memory usage
    snapshot = tracemalloc.take_snapshot()
    top_stats = snapshot.statistics('lineno')
    current, peak = tracemalloc.get_traced_memory()
    current = current / 10**6
    peak = peak / 10**6
    print(f"Memory usage: {current:.2f}MB")
    print(f"Peak memory usage: {peak:.2f}MB")
    for stat in top_stats[:3]:
        print(stat)
conn.close()
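To narrow down which allocation sites actually grow between iterations (rather than only the running totals), the measurement part of the loop can be switched to per-iteration snapshot diffs. A minimal sketch, reusing conn, insert, and params from the script above:

# Hedged sketch: diff tracemalloc snapshots between iterations so that only
# allocations retained across iterations show up; variable names are illustrative.
prev_snapshot = None
for i in range(1000):
    cur = conn.cursor()
    cur.executemany(insert, params)
    cur.close()
    conn.commit()
    snapshot = tracemalloc.take_snapshot()
    if prev_snapshot is not None:
        # Entries with a positive size_diff were allocated and kept since the last iteration
        for diff in snapshot.compare_to(prev_snapshot, 'lineno')[:3]:
            print(diff)
    prev_snapshot = snapshot

compare_to filters out one-time allocations (for example the UUID parameter tuples), so a steadily growing call site in ibm_db_dbi.py should stand out even more clearly.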
DDL of the table
CREATE TABLE "TEST"."BULK" (
"1" VARCHAR(50 OCTETS) ,
.......(118 more lines).......
"120" VARCHAR(50 OCTETS) )
IN "USERSPACE1"
ORGANIZE BY ROW;
Output
For 3.2.6
Iteration 0
Data inserted successfully
Memory usage: 0.13MB
Peak memory usage: 0.16MB
/usr/lib/python3.12/uuid.py:282: size=90.2 KiB, count=1200, average=77 B
/home/kapil/db2/test_bulk_insert.py:21: size=10000 B, count=10, average=1000 B
/home/kapil/db2/pyenv/lib/python3.12/site-packages/ibm_db_dbi.py:1740: size=3540 B, count=62, average=57 B
........(skipped lines)........
Iteration 500
Data inserted successfully
Memory usage: 0.41MB
Peak memory usage: 0.44MB
/home/kapil/db2/pyenv/lib/python3.12/site-packages/ibm_db_dbi.py:1740: size=257 KiB, count=1431, average=184 B
/usr/lib/python3.12/uuid.py:282: size=90.2 KiB, count=1200, average=77 B
/home/kapil/db2/test_bulk_insert.py:21: size=10000 B, count=10, average=1000 B
........(skipped lines)........
Iteration 999
Data inserted successfully
Memory usage: 0.64MB
Peak memory usage: 0.68MB
/home/kapil/db2/pyenv/lib/python3.12/site-packages/ibm_db_dbi.py:1740: size=490 KiB, count=2428, average=207 B
/usr/lib/python3.12/uuid.py:282: size=90.2 KiB, count=1200, average=77 B
/home/kapil/db2/test_bulk_insert.py:21: size=10000 B, count=10, average=1000 B
For 3.2.5
Iteration 0
Data inserted successfully
Memory usage: 0.18MB
Peak memory usage: 0.24MB
/usr/lib/python3.12/uuid.py:282: size=90.2 KiB, count=1200, average=77 B
/home/kapil/db2/pyenv/lib/python3.12/site-packages/ibm_db_dbi.py:1743: size=48.0 KiB, count=13, average=3777 B
/home/kapil/db2/test_bulk_insert.py:21: size=10000 B, count=10, average=1000 B
........(skipped lines)........
Iteration 500
Data inserted successfully
Memory usage: 24.88MB
Peak memory usage: 24.95MB
/home/kapil/db2/pyenv/lib/python3.12/site-packages/ibm_db_dbi.py:1743: size=23.2 MiB, count=1979, average=12.0 KiB
/home/kapil/db2/pyenv/lib/python3.12/site-packages/ibm_db_dbi.py:1505: size=183 KiB, count=544, average=344 B
/usr/lib/python3.12/uuid.py:282: size=90.2 KiB, count=1200, average=77 B
........(skipped lines)........
Iteration 999
Data inserted successfully
Memory usage: 49.49MB
Peak memory usage: 49.56MB
/home/kapil/db2/pyenv/lib/python3.12/site-packages/ibm_db_dbi.py:1743: size=46.4 MiB, count=3442, average=13.8 KiB
/home/kapil/db2/pyenv/lib/python3.12/site-packages/ibm_db_dbi.py:1505: size=363 KiB, count=1068, average=348 B
/home/kapil/db2/pyenv/lib/python3.12/site-packages/ibm_db_dbi.py:880: size=94.9 KiB, count=1034, average=94 B
Steps to Reproduce:
- Create a dummy table with a large number of columns (see the sketch after this list)
- Run the test script and watch memory usage increase over time
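A minimal sketch for creating the dummy table; the column layout mirrors the DDL above, and the connection string is the same placeholder used in the test script:

# Hedged sketch (not part of the original report): create TEST.BULK with
# columns "1" .. "120", all VARCHAR(50 OCTETS), matching the DDL above.
import ibm_db_dbi

cols = 120
ddl = 'CREATE TABLE "TEST"."BULK" (\n'
ddl += ",\n".join(f'    "{i}" VARCHAR(50 OCTETS)' for i in range(1, cols + 1))
ddl += '\n) IN "USERSPACE1" ORGANIZE BY ROW'

conn = ibm_db_dbi.connect("DATABASE=testdb;HOSTNAME=localhost;PORT=50000;PROTOCOL=TCPIP;UID=db2inst1;PWD=*;", "", "")
cur = conn.cursor()
cur.execute(ddl)
conn.commit()
conn.close()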
Observations
Tracemalloc points to the line around 1740 in ibm_db_dbi.py in 3.2.6, where memory keeps gradually increasing over time. The system process manager also reports a memory increase. The increase is on the order of a few KB in 3.2.6 and a few hundred KB prior to 3.2.6.
That line corresponds to self.__rowcount = ibm_db.execute_many(self.stmt_handler, seq_parameters) in 3.2.6, which I suspect calls into the C implementation in ibm_db.c.
In long-running sessions, the server eventually runs out of memory (OOM). I have not observed this issue with executemany in pyodbc.
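For reference, the pyodbc comparison loop was roughly equivalent to the sketch below; the ODBC driver name and connection details are assumptions and may need adjusting to the local ODBC configuration:

# Hedged sketch of the equivalent pyodbc bulk insert used for comparison.
# The DRIVER name below is an assumption; adjust to the local odbcinst.ini setup.
import uuid
import pyodbc

cols = 120
conn = pyodbc.connect(
    "DRIVER={IBM DB2 ODBC DRIVER};DATABASE=testdb;HOSTNAME=localhost;"
    "PORT=50000;PROTOCOL=TCPIP;UID=db2inst1;PWD=*;")
insert = f"insert into TEST.BULK values({','.join(['?'] * cols)})"
params = [tuple(str(uuid.uuid4()) for _ in range(cols)) for _ in range(10)]
for i in range(1000):
    cur = conn.cursor()
    cur.executemany(insert, params)  # same bulk insert as the ibm_db_dbi script
    cur.close()
    conn.commit()
conn.close()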