@dlevy-msft-sql (Contributor) commented Jan 25, 2026

Summary

Implements the -f flag from ODBC sqlcmd for specifying input and output file code pages.

Changes

  • cmd/sqlcmd/sqlcmd.go: Added CodePage and ListCodePages fields, -f/--code-page flag, validation, and runtime application
  • pkg/sqlcmd/codepage.go: Comprehensive codepage support with parsing and encoding conversion
  • pkg/sqlcmd/codepage_test.go: Unit tests for codepage parsing and supported encodings
  • pkg/sqlcmd/commands.go: Updated :R command to respect codepage settings when reading files
  • pkg/sqlcmd/sqlcmd.go: Added CodePage field to Sqlcmd struct
  • cmd/sqlcmd/sqlcmd_test.go: Command line argument tests for valid and invalid codepage values
  • README.md: Documentation with format examples

Usage

# Single codepage for both input and output
sqlcmd -S server -f 65001 -i script.sql -o results.txt

# Different codepages for input and output
sqlcmd -S server -f i:1252,o:65001 -i windows_script.sql -o utf8_results.txt

# List all supported codepages
sqlcmd --list-codepages
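
The argument forms shown above (`65001`, `i:1252,o:65001`) could be parsed roughly as follows. This is a minimal sketch, not the PR's `ParseCodePage`: the names `codePageSettings` and `parseCodePageArg` are hypothetical, and the sketch only checks syntax, not whether a codepage is actually supported.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// codePageSettings is a hypothetical stand-in for the PR's settings type.
// A zero field means "not specified".
type codePageSettings struct {
	Input  int
	Output int
}

// parseCodePageArg accepts "65001", "i:1252", "o:65001", or "i:1252,o:65001".
// A bare number applies to both input and output.
func parseCodePageArg(arg string) (codePageSettings, error) {
	var s codePageSettings
	for _, part := range strings.Split(arg, ",") {
		part = strings.TrimSpace(part)
		if part == "" {
			continue
		}
		target, value := "both", part
		if prefix, rest, ok := strings.Cut(part, ":"); ok {
			target = strings.ToLower(strings.TrimSpace(prefix))
			value = strings.TrimSpace(rest)
		}
		cp, err := strconv.Atoi(value)
		if err != nil || cp <= 0 {
			return codePageSettings{}, fmt.Errorf("invalid codepage: %q", part)
		}
		switch target {
		case "i":
			s.Input = cp
		case "o":
			s.Output = cp
		case "both":
			s.Input, s.Output = cp, cp
		default:
			return codePageSettings{}, fmt.Errorf("invalid codepage prefix: %q", part)
		}
	}
	// Reject arguments such as "," that contain no codepage at all.
	if s.Input == 0 && s.Output == 0 {
		return codePageSettings{}, errors.New("no codepage specified")
	}
	return s, nil
}

func main() {
	for _, arg := range []string{"65001", "i:1252,o:65001", ","} {
		s, err := parseCodePageArg(arg)
		fmt.Printf("%q -> %+v, err=%v\n", arg, s, err)
	}
}
```

Returning an error for an argument with no parsable codepage keeps `-f` from being silently ignored.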

Supported Codepages

  • Unicode: 65001 (UTF-8), 1200 (UTF-16LE), 1201 (UTF-16BE)
  • Windows: 874, 1250-1258
  • OEM/DOS: 437, 850, etc.
  • ISO-8859: 28591-28606
  • CJK: 932 (Shift-JIS), 936 (GB2312), 949 (Korean), 950 (Big5)
  • EBCDIC: 37, 1047, 1140

Testing

  • All existing tests pass
  • Added command line argument parsing tests
  • Added unit tests for codepage parsing and validation

Improves ODBC sqlcmd compatibility.

- Add -f/--code-page flag with ODBC-compatible format parsing
- Support 50+ codepages: Unicode, Windows, OEM/DOS, ISO-8859, CJK, EBCDIC, Macintosh
- Apply input codepage in IncludeFile() for :r command
- Apply output codepage in outCommand() for :OUT file writes
- Add --list-codepages flag to display all supported codepages
- Add comprehensive unit tests for parsing and encoding lookup

Copilot AI left a comment

Pull request overview

Adds ODBC sqlcmd-compatible -f/--code-page support to control input/output file encodings (plus --list-codepages) for improved interoperability.

Changes:

  • Introduces codepage parsing/validation and a mapping from Windows codepages to Go encodings.
  • Applies configured codepages when reading -i / :R files and when writing -o / :OUT / :ERROR outputs.
  • Adds CLI argument tests, unit tests for codepage parsing/lookup, and README documentation.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| cmd/sqlcmd/sqlcmd.go | Adds -f/--code-page, --list-codepages, validation, and wiring into runtime Sqlcmd configuration. |
| cmd/sqlcmd/sqlcmd_test.go | Adds CLI parsing/validation test cases for the new flags. |
| pkg/sqlcmd/codepage.go | Implements ParseCodePage, GetEncoding, and SupportedCodePages. |
| pkg/sqlcmd/codepage_test.go | Adds unit tests for parsing and encoding lookup. |
| pkg/sqlcmd/commands.go | Applies output codepage transforms in :OUT and :ERROR. |
| pkg/sqlcmd/sqlcmd.go | Applies input codepage transforms (or BOM-based auto-detect fallback) when including files. |
| README.md | Documents -f usage and --list-codepages. |

Copilot AI left a comment

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.

- Fix nil encoding panic in errorCommand when OutputCodePage is 65001 (UTF-8)
- Close file handle in outCommand when GetEncoding returns an error
- Handle close error properly in errorCommand
- Apply UTF-8 BOM stripping when input codepage is 65001
- Fix test subtest names to use strconv.Itoa instead of string(rune)
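
The last item above concerns a common Go pitfall: converting an integer with `string(rune(n))` yields the character at that code point, not the decimal digits. A minimal illustration, with hypothetical helper names `digitsName` and `runeName`:

```go
package main

import (
	"fmt"
	"strconv"
)

// digitsName returns the decimal text of a codepage number,
// which is what a readable subtest name needs.
func digitsName(cp int) string {
	return strconv.Itoa(cp)
}

// runeName shows the pitfall: the integer is treated as a Unicode
// code point, so the result is a single character.
func runeName(cp int) string {
	return string(rune(cp))
}

func main() {
	fmt.Printf("%q\n", digitsName(1252))
	fmt.Printf("%q (one code point: U+%04X)\n", runeName(1252), 1252)
}
```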

Copilot AI left a comment

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.

Copilot AI left a comment

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

- Use localizer.Errorf for all user-facing error messages
- Fix UTF-16 BOM handling using ExpectBOM for input decoding
- Add transformWriteCloser to properly close underlying file handles
- Use transformWriteCloser in outCommand and errorCommand for both UnicodeOutputFile and CodePage transforms to prevent file handle leaks
- Add integration tests for output/error codepage encoding

Use strconv.Itoa instead of %d to avoid locale-specific thousands separators in error message.
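
The file handle leak mentioned above arises when an encoding writer is closed but the file beneath it is not. A wrapper along the following lines addresses that. This is a sketch under assumptions: the field layout is a guess, not the PR's actual `transformWriteCloser`, and `recordingCloser` exists only for the demonstration.

```go
package main

import (
	"fmt"
	"io"
)

// transformWriteCloser pairs an encoding layer with the file beneath it,
// so that a single Close call releases both.
type transformWriteCloser struct {
	io.WriteCloser           // encoding layer; its Close flushes pending output
	underlying     io.Closer // for example, the *os.File being written to
}

// Close flushes the encoder first, then always closes the underlying
// handle, and reports the first error encountered.
func (t *transformWriteCloser) Close() error {
	flushErr := t.WriteCloser.Close()
	closeErr := t.underlying.Close()
	if flushErr != nil {
		return flushErr
	}
	return closeErr
}

// recordingCloser is a demonstration stub that logs the order of Close calls.
type recordingCloser struct {
	name string
	log  *[]string
}

func (r recordingCloser) Write(p []byte) (int, error) { return len(p), nil }

func (r recordingCloser) Close() error {
	*r.log = append(*r.log, r.name)
	return nil
}

func main() {
	var calls []string
	w := &transformWriteCloser{
		WriteCloser: recordingCloser{name: "encoder", log: &calls},
		underlying:  recordingCloser{name: "file", log: &calls},
	}
	_, _ = w.Write([]byte("SELECT 1"))
	_ = w.Close()
	fmt.Println(calls)
}
```

Closing the encoder before the file matters because the encoder may still hold buffered bytes that must reach the file.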

Copilot AI left a comment

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.

- Create codepageRegistry map as single source of truth for codepages
- GetEncoding() now uses the registry instead of switch statement
- SupportedCodePages() now generates list from registry
- Removes duplicate codepage definitions between the two functions
- Sort SupportedCodePages result by codepage number for consistency
- Verify all returned codepages are valid in GetEncoding
- Ensure results are sorted by codepage number
- Check well-known codepages are present
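
The registry approach described above can be sketched as a single map that backs both lookup and listing. The names `registry`, `lookupCodePage`, and `supportedCodePages` are hypothetical, and the entries here map to display names only; the PR's registry presumably maps to encodings as well.

```go
package main

import (
	"fmt"
	"sort"
)

// codePageInfo is a hypothetical listing entry.
type codePageInfo struct {
	Number int
	Name   string
}

// registry is the single source of truth: lookup and listing both read it,
// so the two can never disagree.
var registry = map[int]string{
	65001: "UTF-8",
	1200:  "UTF-16LE",
	1201:  "UTF-16BE",
	932:   "Shift-JIS",
	950:   "Big5",
}

// lookupCodePage returns the registered name or an error for unknown values.
func lookupCodePage(cp int) (string, error) {
	name, ok := registry[cp]
	if !ok {
		return "", fmt.Errorf("unsupported codepage %d", cp)
	}
	return name, nil
}

// supportedCodePages lists the registry sorted by codepage number,
// since Go map iteration order is not stable.
func supportedCodePages() []codePageInfo {
	list := make([]codePageInfo, 0, len(registry))
	for cp, name := range registry {
		list = append(list, codePageInfo{Number: cp, Name: name})
	}
	sort.Slice(list, func(i, j int) bool { return list[i].Number < list[j].Number })
	return list
}

func main() {
	for _, info := range supportedCodePages() {
		fmt.Printf("%5d  %s\n", info.Number, info.Name)
	}
}
```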

Copilot AI left a comment

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.

@dlevy-msft-sql self-assigned this Jan 25, 2026
@dlevy-msft-sql added the "sqlcmd switch" (switch in existing sqlcmd) label Jan 25, 2026
@dlevy-msft-sql added the "Size: S" (Small issue, less than one week effort) label Jan 25, 2026

Copilot AI left a comment

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.

settings.OutputCodePage = cp
}
}

Copilot AI Jan 25, 2026

ParseCodePage currently accepts inputs that contain no actual codepage value (e.g. "," or only whitespace) and returns a non-nil CodePageSettings with both InputCodePage/OutputCodePage left as 0. That silently disables codepage handling even though the user supplied -f. Consider detecting this case (arg != "" but neither codepage parsed) and returning an error (and add a unit test for it).

Suggested change
// If a non-empty argument was provided but no codepage was parsed,
// treat this as an error rather than silently disabling codepage handling.
if arg != "" && settings.InputCodePage == 0 && settings.OutputCodePage == 0 {
return nil, localizer.Errorf("invalid codepage: %s", arg)
}

Comment on lines +349 to +354
// UTF-8 codepage: still apply BOM stripping
utf8bom := unicode.BOMOverride(unicode.UTF8.NewDecoder())
reader = transform.NewReader(f, utf8bom)
}
} else {
// Default: auto-detect BOM for UTF-16, fallback to UTF-8

Copilot AI Jan 25, 2026

The comment "UTF-8 codepage: still apply BOM stripping" is a bit misleading: unicode.BOMOverride will also switch decoders when it sees a UTF-16 BOM (not just strip a UTF-8 BOM). If the intent is broader BOM auto-detection, consider updating the comment; if the intent is only stripping a UTF-8 BOM, consider using a transformer that only removes the UTF-8 BOM.

Suggested change
- // UTF-8 codepage: still apply BOM stripping
- utf8bom := unicode.BOMOverride(unicode.UTF8.NewDecoder())
- reader = transform.NewReader(f, utf8bom)
- }
- } else {
- // Default: auto-detect BOM for UTF-16, fallback to UTF-8
+ // UTF-8 codepage: use BOMOverride to strip UTF-8 BOM and auto-detect UTF-16 BOMs, defaulting to UTF-8 otherwise
+ utf8bom := unicode.BOMOverride(unicode.UTF8.NewDecoder())
+ reader = transform.NewReader(f, utf8bom)
+ }
+ } else {
+ // Default: auto-detect BOMs (UTF-8/UTF-16) and decode accordingly, falling back to UTF-8 when no BOM is present
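
For the second option raised in this comment (removing only a UTF-8 BOM, with no UTF-16 detection), a standard-library-only sketch might look like the following. The helper names `stripUTF8BOM` and `decodeScript` are hypothetical; the PR itself uses `golang.org/x/text` transformers rather than this approach.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"io"
	"strings"
)

// utf8BOM is the three-byte UTF-8 byte order mark.
var utf8BOM = []byte{0xEF, 0xBB, 0xBF}

// stripUTF8BOM returns a reader that skips a leading UTF-8 BOM and passes
// every other byte through unchanged, including UTF-16 BOMs.
func stripUTF8BOM(r io.Reader) io.Reader {
	br := bufio.NewReader(r)
	// Peek does not consume input; an error here only means the stream
	// is shorter than three bytes, so there is nothing to strip.
	if head, err := br.Peek(len(utf8BOM)); err == nil && bytes.Equal(head, utf8BOM) {
		_, _ = br.Discard(len(utf8BOM))
	}
	return br
}

// decodeScript is a small helper for the demonstration below.
func decodeScript(raw string) (string, error) {
	b, err := io.ReadAll(stripUTF8BOM(strings.NewReader(raw)))
	return string(b), err
}

func main() {
	for _, raw := range []string{"\xEF\xBB\xBFSELECT 1", "SELECT 1", "\xFF\xFES"} {
		out, err := decodeScript(raw)
		fmt.Printf("%q -> %q, err=%v\n", raw, out, err)
	}
}
```

Unlike `unicode.BOMOverride`, this never switches decoders, so a file that starts with a UTF-16 BOM is passed through as raw bytes.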

Comment on lines +464 to +468
name string
codepage int
expectedBytes []byte
inputText string
skipOnEncError bool

Copilot AI Jan 25, 2026

The skipOnEncError field in this test table is never used. Please remove it or use it (e.g., to skip assertions when a target encoding can’t represent the input text), to avoid dead/misleading test code.

Suggested change
- name string
- codepage int
- expectedBytes []byte
- inputText string
- skipOnEncError bool
+ name string
+ codepage int
+ expectedBytes []byte
+ inputText string
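
With the unused field removed, a table of this shape can drive a plain expected-bytes comparison. The sketch below is illustrative only: `encodeForCodePage` is a hypothetical standard-library encoder limited to the Unicode codepages (65001, 1200, 1201), whereas the PR's tests exercise its own encoding lookup.

```go
package main

import (
	"bytes"
	"fmt"
	"unicode/utf16"
)

// encodeForCodePage is a hypothetical encoder covering only the Unicode
// codepages, so the example needs nothing beyond the standard library.
func encodeForCodePage(cp int, s string) ([]byte, error) {
	switch cp {
	case 65001: // UTF-8: Go strings are already UTF-8
		return []byte(s), nil
	case 1200, 1201: // UTF-16LE and UTF-16BE, without a BOM
		units := utf16.Encode([]rune(s))
		out := make([]byte, 0, 2*len(units))
		for _, u := range units {
			lo, hi := byte(u), byte(u>>8)
			if cp == 1200 {
				out = append(out, lo, hi)
			} else {
				out = append(out, hi, lo)
			}
		}
		return out, nil
	default:
		return nil, fmt.Errorf("unsupported codepage %d", cp)
	}
}

// encodingCase mirrors the table fields kept by the suggestion.
type encodingCase struct {
	name          string
	codepage      int
	expectedBytes []byte
	inputText     string
}

var encodingCases = []encodingCase{
	{name: "UTF-8", codepage: 65001, expectedBytes: []byte{0xC3, 0xA9}, inputText: "\u00e9"},
	{name: "UTF-16LE", codepage: 1200, expectedBytes: []byte{0xE9, 0x00}, inputText: "\u00e9"},
	{name: "UTF-16BE", codepage: 1201, expectedBytes: []byte{0x00, 0xE9}, inputText: "\u00e9"},
}

// checkCases returns the first mismatch, or nil when every case matches.
func checkCases() error {
	for _, tc := range encodingCases {
		got, err := encodeForCodePage(tc.codepage, tc.inputText)
		if err != nil {
			return fmt.Errorf("%s: %w", tc.name, err)
		}
		if !bytes.Equal(got, tc.expectedBytes) {
			return fmt.Errorf("%s: got % x, want % x", tc.name, got, tc.expectedBytes)
		}
	}
	return nil
}

func main() {
	if err := checkCases(); err != nil {
		fmt.Println("FAIL:", err)
		return
	}
	fmt.Println("all cases match")
}
```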

@dlevy-msft-sql
Contributor Author

PR replaced by #638
