Skip to content

FEAT: BULK COPY #125

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 17 commits into
base: main
Choose a base branch
from
Open

FEAT: BULK COPY #125

wants to merge 17 commits into from

Conversation

jahnvi480
Copy link
Contributor

@jahnvi480 jahnvi480 commented Jul 4, 2025

ADO Work Item Reference

AB#34946


Summary

This pull request introduces significant enhancements to the mssql_python package, primarily focused on implementing and supporting the Bulk Copy Program (BCP) functionality for SQL Server. The changes include the addition of a new BCPClient class, updates to constants and data types, and new classes to configure BCP operations. These updates aim to provide a Pythonic interface for bulk data operations, improving usability and flexibility.

BCP Functionality Implementation:

  • Addition of BCPClient class: Introduced a new class in mssql_python/bcp_main.py to handle bulk copy operations using SQL Server's BCP utility. The class provides methods for initializing and executing BCP operations, with robust error handling and support for both file-based and in-memory operations.

Configuration Enhancements:

  • New BCP configuration classes: Added BindData, ColumnFormat, and BCPOptions classes in mssql_python/bcp_options.py to define data bindings, column formats, and overall BCP operation settings. These classes enable detailed configuration of bulk copy operations, including data types, terminators, and column mappings.

Constants and Accessibility:

  • Expanded constants for BCP data types: Updated mssql_python/__init__.py to include direct variables for BCP data type constants (e.g., SQLTEXT, SQLVARCHAR, SQLBIGCHAR). This provides easier access to commonly used SQL Server data types.

  • Import pooling functionality: Added import for PoolingManager in mssql_python/__init__.py to ensure pooling functionality is accessible.

@Copilot Copilot AI review requested due to automatic review settings July 4, 2025 11:25
@github-actions github-actions bot added the pr-size: large Substantial code update label Jul 4, 2025
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds full support for SQL Server’s Bulk Copy Program (BCP) in the mssql_python package by introducing new client classes, configuration options, and C++ bindings.

  • Introduces BCPClient, BCPOptions, ColumnFormat, and BindData for Python-side BCP workflows
  • Extends ddbc_bindings with BCP API function pointers and implements BCPWrapper in C++
  • Adds integration and unit tests covering format files, column formatting, queries, and in-memory BCP

Reviewed Changes

Copilot reviewed 13 out of 14 changed files in this pull request and generated 9 comments.

Show a summary per file
File Description
tests/testBCP/test_002_bcpMain.py New integration tests exercising BCPClient end-to-end
tests/testBCP/test_001_bulkOptions.py Unit tests for BCPOptions, ColumnFormat, and BindData
mssql_python/pybind/ddbc_bindings.h Declares BCP API function pointer types
mssql_python/pybind/ddbc_bindings.cpp Loads BCP API pointers and exports new BCPWrapper class
mssql_python/pybind/connection/connection.h Exposes DBC handle accessor for BCP initialization
mssql_python/pybind/connection/connection.cpp Applies SQL_COPT_SS_BCP attribute before connect
mssql_python/pybind/bcp/bcp_wrapper.h Declares BCPWrapper interface
mssql_python/pybind/bcp/bcp_wrapper.cpp Implements BCP logic, column binding, and error handling
mssql_python/pybind/CMakeLists.txt Includes bcp_wrapper.cpp in the build
mssql_python/constants.py Adds BCPControlOptions and BCPDataTypes enums
mssql_python/bcp_options.py Defines Python dataclasses for BCP options and bindings
mssql_python/bcp_main.py Implements BCPClient orchestration logic
mssql_python/init.py Exposes BCP classes and constants in the package API
Comments suppressed due to low confidence (1)

tests/testBCP/test_002_bcpMain.py:1

  • [nitpick] Consider adding negative or error‐path tests (e.g., invalid format file, unsupported direction) to cover exceptional flows in BCPClient.
import os

+ ", direction code: " + std::to_string(dir_code));
// Call BCPInitW with the correct parameters
SQLRETURN ret = BCPInitW_ptr(_hdbc, pTable, pDataFile, pErrorFile, dir_code);
LOG("BCPWrapper: HELLOOOO " + std::to_string(ret));
Copy link
Preview

Copilot AI Jul 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove or replace this informal debug log (HELLOOOO) with a more descriptive message or eliminate it if no longer needed.

Suggested change
LOG("BCPWrapper: HELLOOOO " + std::to_string(ret));
LOG("BCPWrapper: BCPInitW_ptr returned value: " + std::to_string(ret));

Copilot uses AI. Check for mistakes.

Comment on lines +274 to +276
std::string value_utf8 = py::cast(value).cast<std::string>();
// Fix the BCPHINTS issue: use the wide string directly rather than converting to narrow
SQLRETURN ret = BCPControlW_ptr(_hdbc, it->second.option_code, (LPVOID)value_utf8.c_str());
Copy link
Preview

Copilot AI Jul 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're passing a narrow char* to the wide‐string BCPControlW function. You should convert value to std::wstring and pass its c_str() so the API receives a LPCWSTR.

Suggested change
std::string value_utf8 = py::cast(value).cast<std::string>();
// Fix the BCPHINTS issue: use the wide string directly rather than converting to narrow
SQLRETURN ret = BCPControlW_ptr(_hdbc, it->second.option_code, (LPVOID)value_utf8.c_str());
// Fix the BCPHINTS issue: use the wide string directly rather than converting to narrow
SQLRETURN ret = BCPControlW_ptr(_hdbc, it->second.option_code, (LPVOID)value.c_str());

Copilot uses AI. Check for mistakes.

if (SQLGetConnectAttr_ptr) {
SQLINTEGER bcp_attr_value = 0;
SQLINTEGER string_length = 0;
std::cout << "BCPWrapper: Checking BCP attribute just before initialization." << std::endl;
Copy link
Preview

Copilot AI Jul 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replace direct std::cout usage with the existing LOG(...) utility to keep all diagnostic output consistent and configurable.

Suggested change
std::cout << "BCPWrapper: Checking BCP attribute just before initialization." << std::endl;
LOG("BCPWrapper: Checking BCP attribute just before initialization.");

Copilot uses AI. Check for mistakes.

typedef SQLRETURN (SQL_API* BCPColFmtWFunc)(SQLHDBC, INT, INT, INT, DBINT, LPCBYTE, INT, INT);
typedef SQLRETURN (SQL_API* BCPExecFunc)(SQLHDBC, DBINT*);
typedef SQLRETURN (SQL_API* BCPDoneFunc)(SQLHDBC);
typedef SQLRETURN (SQL_API* BCPBindFunc)(HDBC, LPCBYTE, INT, DBINT, LPCBYTE, INT, INT, INT);
Copy link
Preview

Copilot AI Jul 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first parameter should be SQLHDBC instead of the undefined HDBC type to match the BCP API signature.

Suggested change
typedef SQLRETURN (SQL_API* BCPBindFunc)(HDBC, LPCBYTE, INT, DBINT, LPCBYTE, INT, INT, INT);
typedef SQLRETURN (SQL_API* BCPBindFunc)(SQLHDBC, LPCBYTE, INT, DBINT, LPCBYTE, INT, INT, INT);

Copilot uses AI. Check for mistakes.

@@ -716,6 +728,17 @@ DriverHandle LoadDriverOrThrowException() {

SQLGetDiagRec_ptr = GetFunctionPointer<SQLGetDiagRecFunc>(handle, "SQLGetDiagRecW");

// Load BCP functions
BCPInitW_ptr = (BCPInitWFunc)GetProcAddress(handle, "bcp_initW");
BCPControlW_ptr = (BCPControlWFunc)GetProcAddress(handle, "bcp_control");
Copy link
Preview

Copilot AI Jul 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You’ve bound BCPControlWFunc but are loading the ANSI entry bcp_control instead of the wide‐char bcp_controlW. Update to GetProcAddress(handle, "bcp_controlW").

Suggested change
BCPControlW_ptr = (BCPControlWFunc)GetProcAddress(handle, "bcp_control");
BCPControlW_ptr = (BCPControlWFunc)GetProcAddress(handle, "bcp_controlW");

Copilot uses AI. Check for mistakes.

)

self.wrapper = BCPWrapper(connection._conn)
print(f"connection: {connection._conn}")
Copy link
Preview

Copilot AI Jul 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This debug print should be removed or replaced with a logger call to avoid unintended console output in production.

Suggested change
print(f"connection: {connection._conn}")
logger.debug(f"connection: {connection._conn}")

Copilot uses AI. Check for mistakes.

col_fmt_obj.file_col,
col_fmt_obj
)
print("col_fmt_obj: %s", col_fmt_obj)
Copy link
Preview

Copilot AI Jul 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this ad-hoc print. Use the logging framework (logger.debug(...)) if you need to inspect col_fmt_obj.

Suggested change
print("col_fmt_obj: %s", col_fmt_obj)
logger.debug("col_fmt_obj: %s", col_fmt_obj)

Copilot uses AI. Check for mistakes.

Comment on lines +341 to +342
# self.wrapper.finish()
# self.wrapper.close()
Copy link
Preview

Copilot AI Jul 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commented-out cleanup calls (finish/close) may lead to leaked BCP resources. Ensure you explicitly finalize and close the wrapper in all paths.

Suggested change
# self.wrapper.finish()
# self.wrapper.close()
self.wrapper.finish()
self.wrapper.close()

Copilot uses AI. Check for mistakes.

LOG("BCPWrapper: Terminator bytes provided: " + py::cast(terminator_str_holder).cast<std::string>());
pTerminator = reinterpret_cast<const BYTE*>(terminator_str_holder.data());
LOG("BCPWrapper: Terminator pointer: " + std::to_string(reinterpret_cast<uintptr_t>(pTerminator)));
LOG("BCPWRapper: Terminator pointer type: " + std::string(typeid(pTerminator).name()));
Copy link
Preview

Copilot AI Jul 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo in log label: BCPWRapper should be BCPWrapper.

Suggested change
LOG("BCPWRapper: Terminator pointer type: " + std::string(typeid(pTerminator).name()));
LOG("BCPWrapper: Terminator pointer type: " + std::string(typeid(pTerminator).name()));

Copilot uses AI. Check for mistakes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pr-size: large Substantial code update
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant