
Memory leak when passing strings within table value parameter to stored proc #1338

Closed
mavxg opened this issue Mar 26, 2024 · 9 comments · Fixed by mavxg/pyodbc#1 or #1339

Comments

@mavxg
Contributor

mavxg commented Mar 26, 2024

Environment

  • Python: 3.11.6 (tags/v3.11.6:8b6ee5b, Oct 2 2023, 14:57:12) [MSC v.1935 64 bit (AMD64)]
  • pyodbc: 5.1.0
  • OS: Windows 10
  • DB: Windows Server 2019 Standard (10.0)
  • driver: ODBC Driver 17 for SQL Server

Issue

Memory leaks when string values are supplied to a table-valued parameter of a stored procedure. The same data supplied as floats does not leak memory.

Example code

The following code should reproduce the leak.

import psutil
import gc
import pyodbc

process = psutil.Process()

# Prepare database with do nothing stored proc
"""
CREATE TYPE Test_Table_Type AS TABLE
(
    RowId     bigint,
    MeasureId int,
    NodeId    int,
    Value     float  -- doesn't make any difference if this is varchar(256)
)
GO

CREATE PROCEDURE Test_Proc
@Facts Test_Table_Type READONLY
AS
BEGIN
  SELECT COUNT(*) FROM @Facts
END

GO
"""


def print_memory_usage():
    memory_usage = process.memory_info().rss  # should this be rss or vms
    print(f"{memory_usage / (1024 * 1024.0):0.4f} MiB")


def run_test(cnxn, call_proc, fn):
    stmt = "{CALL Test_Proc (?)}"
    cursor = cnxn.cursor()
    x = 3.00000001
    y = 2
    z = 4
    for i in range(5):
        x += 1.0
        y += 1
        z += 1
        # Make data with different strings for each row
        data = [(i, z * i, y * i, fn(x * i)) for i in range(1_000_000)]
        if call_proc:
            cursor.execute(stmt, (data,))
        del data
        gc.collect()
        print_memory_usage()


if __name__ == "__main__":
    url = 'DRIVER={ODBC Driver 17 for SQL Server};SERVER=MY_SERVER;DATABASE=MY_DATABASE;Trusted_Connection=yes;'
    connection = pyodbc.connect(url)
    print_memory_usage()
    print("\n# Make data but don't execute stored proc (with floats)")
    run_test(connection, call_proc=False, fn=float)
    print("\n# Make data but don't execute stored proc (with strings)")
    run_test(connection, call_proc=False, fn=str)
    print("\n# Make data and execute stored proc (with floats)")
    run_test(connection, call_proc=True, fn=float)
    print("\n# Make data and execute stored proc (with strings)")
    run_test(connection, call_proc=True, fn=str)

Output of sample code

44.1406 MiB

# Make data but don't execute stored proc (with floats)
45.9414 MiB
47.8555 MiB
47.8633 MiB
47.8672 MiB
47.8672 MiB

# Make data but don't execute stored proc (with strings)
47.8750 MiB
47.8789 MiB
47.8789 MiB
46.1562 MiB
47.8242 MiB

# Make data and execute stored proc (with floats)
46.4648 MiB
48.1289 MiB
46.4727 MiB
48.1406 MiB
46.4766 MiB

# Make data and execute stored proc (with strings)
121.0195 MiB
194.1836 MiB
267.8984 MiB
344.1250 MiB
419.3477 MiB

Process finished with exit code 0
@v-chojas
Contributor

Could you post an ODBC trace?

@gordthompson
Collaborator

Reproduced on Ubuntu 20.04. Both .rss and .vms increase. Same behaviour if cursor is created, consumed (.fetchall()), and closed for each iteration.
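For reference, the per-iteration variant described above looks roughly like this (a sketch only; `stmt` and `data` are as in the reproduction script, and it needs a live connection to actually run):

```python
def run_iteration(cnxn, stmt, data):
    """One loop iteration with a fresh cursor, fully consumed and then closed."""
    cursor = cnxn.cursor()
    cursor.execute(stmt, (data,))
    cursor.fetchall()  # consume the COUNT(*) result set
    cursor.close()
```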

@gordthompson
Collaborator

ODBC trace logs for "execute with float" and "execute with str", using 5_000 rows. (Trace logs for 1_000_000 rows were almost 500 MiB each unzipped.)

exec_float.log.zip

exec_str.log.zip

@gordthompson
Collaborator

Note also that this is not a version 5.x regression; version 4.0.39 produces the same results.

@v-chojas
Contributor

No need to run it with that many rows since the pattern shows up with far less.

I think the code is missing a Py_XDECREF(param) after this line: https://github.com/mkleehammer/pyodbc/blob/master/src/params.cpp#L1268
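For context, PySequence_GetItem returns a new (owned) reference, so every call must eventually be balanced by a Py_XDECREF. The same ownership rule can be observed from Python via sys.getrefcount (a CPython-specific sketch, not pyodbc code):

```python
import sys

obj = object()
row = [obj]                    # the TVP row holds one reference
base = sys.getrefcount(obj)
item = row[0]                  # like PySequence_GetItem: takes a new reference
assert sys.getrefcount(obj) == base + 1
del item                       # the balancing Py_XDECREF
assert sys.getrefcount(obj) == base
```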

@gordthompson
Collaborator

I added this to my local build but the leak persists.

@@ -1264,10 +1264,11 @@ bool BindParameter(Cursor* cur, Py_ssize_t index, ParamInfo& info)
 
             for(i=0;i<ncols;i++)
             {
                 // Bind the TVP's columns --- all need to use DAE
                 PyObject *param = PySequence_GetItem(row, i);
+                Py_XDECREF(param);
                 GetParameterInfo(cur, i, param, info.nested[i], true);
                 info.nested[i].BufferLength = info.nested[i].StrLen_or_Ind;
                 info.nested[i].StrLen_or_Ind = SQL_DATA_AT_EXEC;
 
                 Py_BEGIN_ALLOW_THREADS

@mavxg
Contributor Author

mavxg commented Mar 27, 2024

Some more testing that might help.

The leak still occurs if the creation of the data object is taken out of the loop. This seems to imply that pyodbc is leaking memory it allocated internally, rather than mishandling reference counts on the passed-in data. My guess, from the following code in cursor.cpp, is that GetParameterInfo is called for each data item and encodes each string into a byte array, and the cleanup for that encoded object is somehow missing (the leak does not occur if you supply bytes parameters by encoding on the Python side, but it does for str).

https://github.com/mkleehammer/pyodbc/blob/ff1dd2318ec7792f9f031c6ea3966852ac013496/src/cursor.cpp#L855C22-L863C44
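Consistent with that guess, a workaround is to encode on the Python side and pass bytes, which avoids the internal encode path entirely. A minimal sketch, reusing the shape of the reproduction data (the execute call itself still needs a live connection):

```python
def make_rows(n, x=4.00000001, y=3, z=5):
    """Build TVP rows whose string column is pre-encoded to bytes."""
    # pyodbc receives bytes, so it never creates the intermediate
    # encoded object that appears to be leaked for str values.
    return [(i, z * i, y * i, str(x * i).encode("utf-8")) for i in range(n)]

rows = make_rows(1000)
# cursor.execute("{CALL Test_Proc (?)}", (rows,))  # needs a live connection
```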

@mavxg
Contributor Author

mavxg commented Mar 29, 2024

I have found a related memory leak (which I think may have been spotted before): when you insert a long string with cursor.fast_executemany=True, you get a leak if the column you are inserting into is varchar(MAX).

Both of these memory leaks stem from the same problem, in two different places where some code is duplicated.

The TVP issue arises because in GetUnicodeInfo you get a new PyObject from PyCodec_Encode with a ref count of 1. This object is wrapped by encoded, but the Detach call means the wrapper's destructor will not decrement the ref count. For normal calls this is the correct behaviour, as the decrement happens in FreeInfos via the line Py_XDECREF(a[i].pObject);. For TVP parameters, however, FreeInfos only cleans up the top-level parameter, not the items inside the sequence, so this object is leaked.
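The ownership handoff described above can be modeled with a toy reference count to show exactly where the balancing decref goes missing. This is a simplified sketch, not pyodbc's actual C++ Object wrapper:

```python
class Ref:
    """Stand-in for a PyObject: PyCodec_Encode hands it back with refcount 1."""
    def __init__(self):
        self.count = 1

class Wrapper:
    """Stand-in for the RAII wrapper around the encoded object."""
    def __init__(self, ref):
        self.ref = ref

    def detach(self):
        # Give up ownership: the destructor will no longer decref,
        # so someone else must now perform the Py_XDECREF.
        ref, self.ref = self.ref, None
        return ref

    def release(self):
        # What the destructor does if detach() was never called.
        if self.ref is not None:
            self.ref.count -= 1
            self.ref = None

encoded = Ref()
wrapper = Wrapper(encoded)
cell = wrapper.detach()   # ownership handed off, as with encoded.Detach()
wrapper.release()         # destructor runs: nothing left to decref
assert cell.count == 1    # still owned; FreeInfos must decref it...
# ...but for TVP items FreeInfos never sees it, so the object leaks.
```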

The same issue occurs with cursor.fast_executemany=True when a column has MAX size, because the Detach on this line prevents the destructor from decrementing the reference count:

pParam->cell = encoded.Detach();

@mavxg
Contributor Author

mavxg commented Apr 2, 2024

Please ignore the second memory leak. I think I may have imagined it, as I cannot replicate it. The line I thought was leaking is working fine and is cleaned up by this code further down in the file:

Py_XDECREF(objCell);
