Memory leak in executemany #998

@useraccessdenied

Description

Issue summary

There seems to be a memory leak when using the executemany function. The issue is most prominent with tables that have many columns; I tested with a 120-column table. The number of rows inserted seems to have no effect on the memory increase.

The issue also seems to have subsided substantially in 3.2.6; prior versions show much bigger spikes in memory usage. See the output below.

Details

  • Operating System Name: Red Hat Enterprise Linux release 8.10 (Ootpa)
  • Target Db2 Server Version: 11.5.9
  • Python Version: 3.12.3
  • ibm_db version: 3.2.6
  • For non-Windows, output of below commands:
    uname: Linux
    uname -m: x86_64
  • Value of below environment variables if set:
    PATH: /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/var/nodejs/node-v22.14.0-linux-x64/bin:/root/odbc/odbc_cli/clidriver/bin:/root/bin

Test script for bulk insert

import ibm_db_dbi
import uuid
import tracemalloc

tracemalloc.start()

# Create a connection string
conn_str = "DATABASE=testdb;HOSTNAME=localhost;PORT=50000;PROTOCOL=TCPIP;UID=db2inst1;PWD=*;"

cols = 120  # TEST.BULK has 120 columns; VALUES without a column list must supply one placeholder per column
print("Preparing to insert 10 rows into TEST.BULK")
insert = f"insert into TEST.BULK values({','.join(['?']*cols)})"
params = tuple(tuple(str(uuid.uuid4()) for _ in range(cols)) for _ in range(10))

print("Establishing connection to DB2")
conn = ibm_db_dbi.connect(conn_str, "", "")

for i in range(1000):
    print(f"\nIteration {i}")
    # Bulk insert using python DB API
    cur = conn.cursor()
    
    cur.executemany(insert, params)
    cur.close()
    conn.commit()
    
    print("Data inserted successfully")

    # Print memory usage
    snapshot = tracemalloc.take_snapshot()
    top_stats = snapshot.statistics('lineno')
    current, peak = tracemalloc.get_traced_memory()

    current = current / 10**6
    peak = peak / 10**6

    print(f"Memory usage: {current:.2f}MB")
    print(f"Peak memory usage: {peak:.2f}MB")
    for stat in top_stats[:3]:
        print(stat)

conn.close()

DDL of the table

CREATE TABLE "TEST"."BULK"  (
		  "1" VARCHAR(50 OCTETS) , 
                  .......(118 more lines).......
		  "120" VARCHAR(50 OCTETS) )   
		 IN "USERSPACE1"  
		 ORGANIZE BY ROW;
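Since the middle 118 column definitions are elided above, a small helper like the following can regenerate an equivalent statement (a sketch based only on the fragment shown; the real table may use different tablespace or options):

```python
# Generate a CREATE TABLE statement equivalent to the elided 120-column DDL.
# Column names "1".."120" and VARCHAR(50 OCTETS) match the fragment above,
# as do the tablespace and row-organization clauses.

def bulk_table_ddl(n_cols: int = 120) -> str:
    cols = ",\n".join(f'    "{i}" VARCHAR(50 OCTETS)' for i in range(1, n_cols + 1))
    return (
        'CREATE TABLE "TEST"."BULK" (\n'
        f"{cols}\n"
        ")\n"
        'IN "USERSPACE1"\n'
        "ORGANIZE BY ROW;"
    )

if __name__ == "__main__":
    print(bulk_table_ddl())
```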

Output

For 3.2.6

Iteration 0
Data inserted successfully
Memory usage: 0.13MB
Peak memory usage: 0.16MB
/usr/lib/python3.12/uuid.py:282: size=90.2 KiB, count=1200, average=77 B
/home/kapil/db2/test_bulk_insert.py:21: size=10000 B, count=10, average=1000 B
/home/kapil/db2/pyenv/lib/python3.12/site-packages/ibm_db_dbi.py:1740: size=3540 B, count=62, average=57 B
........(skipped lines)........
Iteration 500
Data inserted successfully
Memory usage: 0.41MB
Peak memory usage: 0.44MB
/home/kapil/db2/pyenv/lib/python3.12/site-packages/ibm_db_dbi.py:1740: size=257 KiB, count=1431, average=184 B
/usr/lib/python3.12/uuid.py:282: size=90.2 KiB, count=1200, average=77 B
/home/kapil/db2/test_bulk_insert.py:21: size=10000 B, count=10, average=1000 B
........(skipped lines)........
Iteration 999
Data inserted successfully
Memory usage: 0.64MB
Peak memory usage: 0.68MB
/home/kapil/db2/pyenv/lib/python3.12/site-packages/ibm_db_dbi.py:1740: size=490 KiB, count=2428, average=207 B
/usr/lib/python3.12/uuid.py:282: size=90.2 KiB, count=1200, average=77 B
/home/kapil/db2/test_bulk_insert.py:21: size=10000 B, count=10, average=1000 B

For 3.2.5

Iteration 0
Data inserted successfully
Memory usage: 0.18MB
Peak memory usage: 0.24MB
/usr/lib/python3.12/uuid.py:282: size=90.2 KiB, count=1200, average=77 B
/home/kapil/db2/pyenv/lib/python3.12/site-packages/ibm_db_dbi.py:1743: size=48.0 KiB, count=13, average=3777 B
/home/kapil/db2/test_bulk_insert.py:21: size=10000 B, count=10, average=1000 B
........(skipped lines)........
Iteration 500
Data inserted successfully
Memory usage: 24.88MB
Peak memory usage: 24.95MB
/home/kapil/db2/pyenv/lib/python3.12/site-packages/ibm_db_dbi.py:1743: size=23.2 MiB, count=1979, average=12.0 KiB
/home/kapil/db2/pyenv/lib/python3.12/site-packages/ibm_db_dbi.py:1505: size=183 KiB, count=544, average=344 B
/usr/lib/python3.12/uuid.py:282: size=90.2 KiB, count=1200, average=77 B
........(skipped lines)........
Iteration 999
Data inserted successfully
Memory usage: 49.49MB
Peak memory usage: 49.56MB
/home/kapil/db2/pyenv/lib/python3.12/site-packages/ibm_db_dbi.py:1743: size=46.4 MiB, count=3442, average=13.8 KiB
/home/kapil/db2/pyenv/lib/python3.12/site-packages/ibm_db_dbi.py:1505: size=363 KiB, count=1068, average=348 B
/home/kapil/db2/pyenv/lib/python3.12/site-packages/ibm_db_dbi.py:880: size=94.9 KiB, count=1034, average=94 B

Steps to Reproduce:

  1. Create a dummy table with a large number of columns (120 here)
  2. Run the test script and watch memory usage grow over time

Observations

Tracemalloc points to a line around 1740 in ibm_db_dbi.py in 3.2.6, where traced memory keeps growing gradually over time. The system process manager also reports increasing memory. The growth is a few KBs per iteration in 3.2.6 and a few hundred KBs per iteration in versions prior to 3.2.6.

That line is self.__rowcount = ibm_db.execute_many(self.stmt_handler, seq_parameters) in 3.2.6, which I suspect calls into the C function in ibm_db.c.

In long-running sessions, the process eventually runs out of memory (OOM) on the server. I have not observed this issue with executemany in pyodbc.
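To isolate which allocation site is actually growing between iterations, snapshot diffs (Snapshot.compare_to) are more direct than the absolute statistics printed in the test script above. A minimal, self-contained sketch of the technique, using a plain Python list as a stand-in for whatever the executemany path retains:

```python
import tracemalloc

tracemalloc.start()

leak = []  # stand-in for state that executemany appears to retain internally

def leaky_step():
    # Each call retains ~100 new strings, simulating per-iteration growth.
    leak.extend(str(i) * 50 for i in range(100))

leaky_step()
before = tracemalloc.take_snapshot()
leaky_step()
after = tracemalloc.take_snapshot()

# compare_to ranks allocation sites by net growth between two snapshots,
# so the leaking line surfaces at the top even when absolute totals are noisy.
for stat in after.compare_to(before, "lineno")[:3]:
    print(stat)
```

Applied to the real script, taking a snapshot every N iterations and diffing against the previous one should show ibm_db_dbi.py:1740 with a consistently positive size_diff if the leak is there.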
