Adding Microsoft SECURITY.MD #2

microsoft-github-policy-service · 2025-01-24T12:21:37Z

Please accept this contribution adding the standard Microsoft SECURITY.MD 🔒 file to help the community understand the security policy and how to safely report security issues. GitHub uses the presence of this file to light up security reminders and a link to the file. This pull request commits the latest official SECURITY.MD file from https://github.com/microsoft/repo-templates/blob/main/shared/SECURITY.md.

Microsoft teams can learn more about this effort and share feedback within the open source guidance available internally.

- Replace pybind11 row[col-1] = value with PyLong_FromLong + PyList_SET_ITEM
- Applies to SQL_INTEGER, SQL_SMALLINT, SQL_BIGINT
- Eliminates pybind11 wrapper overhead, bounds checking, and extra reference counting
- Expected improvement: ~40-100 ms for integer-heavy result sets (1.2M rows)
- Added PERF_TIMER for int_c_api_assign, smallint_c_api_assign, bigint_c_api_assign
- PyList_SET_ITEM steals the reference (no Py_INCREF needed)

…ypes

- Replaced pybind11 wrappers with direct Python C API calls
- SQL_INTEGER, SQL_SMALLINT, SQL_BIGINT: PyLong_FromLong/PyLong_FromLongLong
- SQL_TINYINT, SQL_BIT: PyLong_FromLong/PyBool_FromLong
- SQL_REAL, SQL_DOUBLE, SQL_FLOAT: PyFloat_FromDouble
- Uses PyList_SET_ITEM macro for direct list assignment (no bounds checking)
- Eliminates pybind11 wrapper overhead for simple numeric types
- Added PERF_TIMER instrumentation for each numeric type conversion

- Created a ColumnProcessor typedef for the function-pointer type
- Added ColumnProcessors namespace with specialized inline processors:
  * ProcessInteger, ProcessSmallInt, ProcessBigInt, ProcessTinyInt, ProcessBit
  * ProcessReal, ProcessDouble
  * ProcessChar, ProcessWChar, ProcessBinary (handle LOBs, NULL, zero-length)
- Added ColumnInfoExt struct to pass metadata efficiently
- Build columnProcessors array once during cache_column_metadata
- Fast path: direct function call via columnProcessors[col-1] (no switch)
- Slow path: fallback switch for complex types (DECIMAL, DATETIME, GUID)
- Reduces switch evaluations from O(rows × columns) to O(columns)
- All processors use the direct Python C API from OPT #1 and OPT #2
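The fast-path/slow-path dispatch described in this commit can be illustrated with a small, self-contained C++ sketch. It is not the project's code: the SqlType enum, RawValue struct, Cell variant, BuildProcessors, ProcessRow, and ProcessSlowPath are hypothetical stand-ins, and cells are modeled as std::variant instead of PyObject* so the sketch compiles without Python headers. Only the ColumnProcessor and ColumnProcessors names come from the commit message.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <variant>
#include <vector>

// Hypothetical cell type: the real code produces PyObject* via the Python C API.
// std::monostate stands in for Python None (SQL NULL).
using Cell = std::variant<std::monostate, int64_t, double, std::string>;

// Hypothetical, simplified SQL type tags (not the real ODBC SQL_* constants).
enum class SqlType { Integer, BigInt, Double, Char, Decimal };

// Hypothetical fetched-value buffer for one column of one row.
struct RawValue {
    bool isNull;
    int64_t i;
    double d;
    std::string s;
};

// Function-pointer type, mirroring the commit's ColumnProcessor typedef.
typedef Cell (*ColumnProcessor)(const RawValue&);

namespace ColumnProcessors {
inline Cell ProcessInteger(const RawValue& v) { return v.isNull ? Cell{} : Cell{v.i}; }
inline Cell ProcessDouble(const RawValue& v) { return v.isNull ? Cell{} : Cell{v.d}; }
inline Cell ProcessChar(const RawValue& v) { return v.isNull ? Cell{} : Cell{v.s}; }
}  // namespace ColumnProcessors

// Hypothetical slow path for complex types (DECIMAL, DATETIME, GUID, ...).
// Here it returns the text form; real code would build the proper Python object.
inline Cell ProcessSlowPath(const RawValue& v) { return v.isNull ? Cell{} : Cell{v.s}; }

// Evaluates the type switch once per column (O(columns)), e.g. when metadata is cached.
std::vector<ColumnProcessor> BuildProcessors(const std::vector<SqlType>& types) {
    std::vector<ColumnProcessor> procs;
    procs.reserve(types.size());
    for (SqlType t : types) {
        switch (t) {
            case SqlType::Integer:
            case SqlType::BigInt: procs.push_back(&ColumnProcessors::ProcessInteger); break;
            case SqlType::Double: procs.push_back(&ColumnProcessors::ProcessDouble); break;
            case SqlType::Char:   procs.push_back(&ColumnProcessors::ProcessChar); break;
            default:              procs.push_back(nullptr); break;  // no fast path
        }
    }
    return procs;
}

// Per-row work: one indirect call per cell (or the slow path), no per-cell switch.
std::vector<Cell> ProcessRow(const std::vector<ColumnProcessor>& procs,
                             const std::vector<RawValue>& raw) {
    std::vector<Cell> row(procs.size());
    for (std::size_t col = 0; col < procs.size(); ++col) {
        row[col] = procs[col] ? procs[col](raw[col]) : ProcessSlowPath(raw[col]);
    }
    return row;
}
```

In this sketch the switch runs only inside BuildProcessors, once per column; the per-cell loop in ProcessRow does a null check and an indirect call, which is the O(rows × columns) to O(columns) reduction in switch evaluations that the commit describes.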

Problem:
- All numeric conversions used pybind11 wrappers with overhead:
  * Type detection, wrapper object creation, bounds checking
  * ~20-40 CPU cycles overhead per cell

Solution:
- Use direct Python C API calls:
  * PyLong_FromLong/PyLong_FromLongLong for integers
  * PyFloat_FromDouble for floats
  * PyBool_FromLong for booleans
  * PyList_SET_ITEM macro (no bounds check; the list is pre-sized)

Changes:
- SQL_INTEGER, SQL_SMALLINT, SQL_BIGINT, SQL_TINYINT → PyLong_*
- SQL_BIT → PyBool_FromLong
- SQL_REAL, SQL_DOUBLE, SQL_FLOAT → PyFloat_FromDouble
- Added explicit NULL handling for each type

Impact:
- Eliminates pybind11 wrapper overhead for simple numeric types
- Direct array access via PyList_SET_ITEM macro
- Affects 8 common numeric SQL types

Problem:
--------
Row creation and assignment had multiple layers of overhead:
1. Per-row allocation: py::list(numCols) creates a pybind11 wrapper for each row
2. Cell assignment: row[col-1] = value uses pybind11 operator[] with bounds checking
3. Final assignment: rows[i] = row uses pybind11 list assignment with refcount overhead
4. Fragmented allocation: 1,000 separate py::list() calls instead of batch allocation

For 1,000 rows: ~30-50 CPU cycles × 1,000 = 30K-50K wasted cycles

Solution:
---------
Replace pybind11 wrappers with the direct Python C API throughout:
1. Row creation: PyList_New(numCols) instead of py::list(numCols)
2. Cell assignment: PyList_SET_ITEM(row, col-1, value) instead of row[col-1] = value
3. Final assignment: PyList_SET_ITEM(rows.ptr(), i, row) instead of rows[i] = row

This completes the transition to the direct Python C API started in OPT #2.

Changes:
--------
- Replaced py::list row(numCols) → PyObject* row = PyList_New(numCols)
- Updated all NULL/SQL_NO_TOTAL handlers to use PyList_SET_ITEM
- Updated all zero-length data handlers to use the direct Python C API
- Updated string handlers (SQL_CHAR, SQL_WCHAR) to use PyList_SET_ITEM
- Updated complex type handlers (DECIMAL, DATETIME, DATE, TIME, TIMESTAMPOFFSET, GUID, BINARY)
- Updated final row assignment to use PyList_SET_ITEM(rows.ptr(), i, row)

All cell assignments now use the direct Python C API:
- Numeric types: already done in OPT #2 (PyLong_FromLong, PyFloat_FromDouble, etc.)
- Strings: PyUnicode_FromStringAndSize, PyUnicode_FromString
- Binary: PyBytes_FromStringAndSize
- Complex types: .release().ptr() to transfer ownership

Impact:
-------
- ✅ Eliminates pybind11 wrapper overhead for row creation
- ✅ No bounds checking in the hot loop (PyList_SET_ITEM is a macro)
- ✅ Clean reference counting (objects created with refcount=1, transferred to the list)
- ✅ Consistent with OPT #2 (entire row/cell management via the Python C API)
- ✅ Expected 5-10% improvement (smaller than OPT #3, but completes the stack)

All type handlers now bypass pybind11 for maximum performance.

Microsoft mandatory file

169fb77

microsoft-github-policy-service bot mentioned this pull request Jan 24, 2025

This repo is missing important files #3

Closed

bewithgaurav merged commit 51dad73 into main Jan 24, 2025
1 check passed

bewithgaurav added a commit that referenced this pull request Nov 10, 2025

docs: Update OPTIMIZATION_PR_SUMMARY with OPT #2 details

7159d81

bewithgaurav mentioned this pull request Nov 10, 2025

FEAT: Performance Improvements in Fetch path #320

Merged

2 participants