-
Notifications
You must be signed in to change notification settings - Fork 16
FEAT: BULK COPY #125
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
FEAT: BULK COPY #125
Conversation
… jahnvi/bulk_copy_options
…ft/mssql-python into jahnvi/bulk_copy_main_apis
…soft/mssql-python into jahnvi/bulk_copy_wrapper_cpp
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR adds full support for SQL Server’s Bulk Copy Program (BCP) in the mssql_python
package by introducing new client classes, configuration options, and C++ bindings.
- Introduces
BCPClient
,BCPOptions
,ColumnFormat
, andBindData
for Python-side BCP workflows - Extends
ddbc_bindings
with BCP API function pointers and implementsBCPWrapper
in C++ - Adds integration and unit tests covering format files, column formatting, queries, and in-memory BCP
Reviewed Changes
Copilot reviewed 13 out of 14 changed files in this pull request and generated 9 comments.
Show a summary per file
File | Description |
---|---|
tests/testBCP/test_002_bcpMain.py | New integration tests exercising BCPClient end-to-end |
tests/testBCP/test_001_bulkOptions.py | Unit tests for BCPOptions , ColumnFormat , and BindData |
mssql_python/pybind/ddbc_bindings.h | Declares BCP API function pointer types |
mssql_python/pybind/ddbc_bindings.cpp | Loads BCP API pointers and exports new BCPWrapper class |
mssql_python/pybind/connection/connection.h | Exposes DBC handle accessor for BCP initialization |
mssql_python/pybind/connection/connection.cpp | Applies SQL_COPT_SS_BCP attribute before connect |
mssql_python/pybind/bcp/bcp_wrapper.h | Declares BCPWrapper interface |
mssql_python/pybind/bcp/bcp_wrapper.cpp | Implements BCP logic, column binding, and error handling |
mssql_python/pybind/CMakeLists.txt | Includes bcp_wrapper.cpp in the build |
mssql_python/constants.py | Adds BCPControlOptions and BCPDataTypes enums |
mssql_python/bcp_options.py | Defines Python dataclasses for BCP options and bindings |
mssql_python/bcp_main.py | Implements BCPClient orchestration logic |
mssql_python/init.py | Exposes BCP classes and constants in the package API |
Comments suppressed due to low confidence (1)
tests/testBCP/test_002_bcpMain.py:1
- [nitpick] Consider adding negative or error‐path tests (e.g., invalid format file, unsupported direction) to cover exceptional flows in
BCPClient
.
import os
+ ", direction code: " + std::to_string(dir_code)); | ||
// Call BCPInitW with the correct parameters | ||
SQLRETURN ret = BCPInitW_ptr(_hdbc, pTable, pDataFile, pErrorFile, dir_code); | ||
LOG("BCPWrapper: HELLOOOO " + std::to_string(ret)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove or replace this informal debug log (HELLOOOO
) with a more descriptive message or eliminate it if no longer needed.
LOG("BCPWrapper: HELLOOOO " + std::to_string(ret)); | |
LOG("BCPWrapper: BCPInitW_ptr returned value: " + std::to_string(ret)); |
Copilot uses AI. Check for mistakes.
std::string value_utf8 = py::cast(value).cast<std::string>(); | ||
// Fix the BCPHINTS issue: use the wide string directly rather than converting to narrow | ||
SQLRETURN ret = BCPControlW_ptr(_hdbc, it->second.option_code, (LPVOID)value_utf8.c_str()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're passing a narrow char*
to the wide‐string BCPControlW function. You should convert value
to std::wstring
and pass its c_str()
so the API receives a LPCWSTR
.
std::string value_utf8 = py::cast(value).cast<std::string>(); | |
// Fix the BCPHINTS issue: use the wide string directly rather than converting to narrow | |
SQLRETURN ret = BCPControlW_ptr(_hdbc, it->second.option_code, (LPVOID)value_utf8.c_str()); | |
// Fix the BCPHINTS issue: use the wide string directly rather than converting to narrow | |
SQLRETURN ret = BCPControlW_ptr(_hdbc, it->second.option_code, (LPVOID)value.c_str()); |
Copilot uses AI. Check for mistakes.
if (SQLGetConnectAttr_ptr) { | ||
SQLINTEGER bcp_attr_value = 0; | ||
SQLINTEGER string_length = 0; | ||
std::cout << "BCPWrapper: Checking BCP attribute just before initialization." << std::endl; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replace direct std::cout
usage with the existing LOG(...)
utility to keep all diagnostic output consistent and configurable.
std::cout << "BCPWrapper: Checking BCP attribute just before initialization." << std::endl; | |
LOG("BCPWrapper: Checking BCP attribute just before initialization."); |
Copilot uses AI. Check for mistakes.
typedef SQLRETURN (SQL_API* BCPColFmtWFunc)(SQLHDBC, INT, INT, INT, DBINT, LPCBYTE, INT, INT); | ||
typedef SQLRETURN (SQL_API* BCPExecFunc)(SQLHDBC, DBINT*); | ||
typedef SQLRETURN (SQL_API* BCPDoneFunc)(SQLHDBC); | ||
typedef SQLRETURN (SQL_API* BCPBindFunc)(HDBC, LPCBYTE, INT, DBINT, LPCBYTE, INT, INT, INT); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The first parameter should be SQLHDBC
instead of the undefined HDBC
type to match the BCP API signature.
typedef SQLRETURN (SQL_API* BCPBindFunc)(HDBC, LPCBYTE, INT, DBINT, LPCBYTE, INT, INT, INT); | |
typedef SQLRETURN (SQL_API* BCPBindFunc)(SQLHDBC, LPCBYTE, INT, DBINT, LPCBYTE, INT, INT, INT); |
Copilot uses AI. Check for mistakes.
@@ -716,6 +728,17 @@ DriverHandle LoadDriverOrThrowException() { | |||
|
|||
SQLGetDiagRec_ptr = GetFunctionPointer<SQLGetDiagRecFunc>(handle, "SQLGetDiagRecW"); | |||
|
|||
// Load BCP functions | |||
BCPInitW_ptr = (BCPInitWFunc)GetProcAddress(handle, "bcp_initW"); | |||
BCPControlW_ptr = (BCPControlWFunc)GetProcAddress(handle, "bcp_control"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You’ve bound BCPControlWFunc
but are loading the ANSI entry bcp_control
instead of the wide‐char bcp_controlW
. Update to GetProcAddress(handle, "bcp_controlW")
.
BCPControlW_ptr = (BCPControlWFunc)GetProcAddress(handle, "bcp_control"); | |
BCPControlW_ptr = (BCPControlWFunc)GetProcAddress(handle, "bcp_controlW"); |
Copilot uses AI. Check for mistakes.
) | ||
|
||
self.wrapper = BCPWrapper(connection._conn) | ||
print(f"connection: {connection._conn}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This debug print
should be removed or replaced with a logger call to avoid unintended console output in production.
print(f"connection: {connection._conn}") | |
logger.debug(f"connection: {connection._conn}") |
Copilot uses AI. Check for mistakes.
col_fmt_obj.file_col, | ||
col_fmt_obj | ||
) | ||
print("col_fmt_obj: %s", col_fmt_obj) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove this ad-hoc print
. Use the logging framework (logger.debug(...)
) if you need to inspect col_fmt_obj
.
print("col_fmt_obj: %s", col_fmt_obj) | |
logger.debug("col_fmt_obj: %s", col_fmt_obj) |
Copilot uses AI. Check for mistakes.
# self.wrapper.finish() | ||
# self.wrapper.close() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Commented-out cleanup calls (finish
/close
) may lead to leaked BCP resources. Ensure you explicitly finalize and close the wrapper in all paths.
# self.wrapper.finish() | |
# self.wrapper.close() | |
self.wrapper.finish() | |
self.wrapper.close() |
Copilot uses AI. Check for mistakes.
LOG("BCPWrapper: Terminator bytes provided: " + py::cast(terminator_str_holder).cast<std::string>()); | ||
pTerminator = reinterpret_cast<const BYTE*>(terminator_str_holder.data()); | ||
LOG("BCPWrapper: Terminator pointer: " + std::to_string(reinterpret_cast<uintptr_t>(pTerminator))); | ||
LOG("BCPWRapper: Terminator pointer type: " + std::string(typeid(pTerminator).name())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo in log label: BCPWRapper
should be BCPWrapper
.
LOG("BCPWRapper: Terminator pointer type: " + std::string(typeid(pTerminator).name())); | |
LOG("BCPWrapper: Terminator pointer type: " + std::string(typeid(pTerminator).name())); |
Copilot uses AI. Check for mistakes.
ADO Work Item Reference
Summary
This pull request introduces significant enhancements to the
mssql_python
package, primarily focused on implementing and supporting the Bulk Copy Program (BCP) functionality for SQL Server. The changes include the addition of a newBCPClient
class, updates to constants and data types, and new classes to configure BCP operations. These updates aim to provide a Pythonic interface for bulk data operations, improving usability and flexibility.BCP Functionality Implementation:
BCPClient
class: Introduced a new class inmssql_python/bcp_main.py
to handle bulk copy operations using SQL Server's BCP utility. The class provides methods for initializing and executing BCP operations, with robust error handling and support for both file-based and in-memory operations.Configuration Enhancements:
BindData
,ColumnFormat
, andBCPOptions
classes inmssql_python/bcp_options.py
to define data bindings, column formats, and overall BCP operation settings. These classes enable detailed configuration of bulk copy operations, including data types, terminators, and column mappings.Constants and Accessibility:
Expanded constants for BCP data types: Updated
mssql_python/__init__.py
to include direct variables for BCP data type constants (e.g.,SQLTEXT
,SQLVARCHAR
,SQLBIGCHAR
). This provides easier access to commonly used SQL Server data types.Import pooling functionality: Added import for
PoolingManager
inmssql_python/__init__.py
to ensure pooling functionality is accessible.