Add result_set_type_hints for precise complex type conversion #690
Merged
laughingman7743 merged 18 commits into master on Feb 28, 2026
The Athena GetQueryResults API only returns base type names (e.g., "array", "map", "row") without nested type signatures, causing _convert_value() to use heuristic inference that incorrectly converts varchar values like "1234" to int(1234) inside complex types. This adds a result_set_type_hints parameter to all cursor execute() methods so users can provide full Athena DDL type signatures for precise conversion. Also changes the default behavior so nested elements without type hints remain as strings instead of being heuristically inferred (breaking change). Closes #689 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Move TypeNode, TypeSignatureParser, and TypedValueConverter into a new pyathena/parser.py module. TypedValueConverter receives converter dependencies via constructor injection to avoid circular imports. Also moves _split_array_items to parser.py as a shared parsing utility. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
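To make the idea of a parsed type tree concrete, here is a minimal sketch of a recursive parser for Trino-style signatures. It is not the code in pyathena/parser.py: the field names on TypeNode and the helpers split_top_level and parse_type are assumptions chosen for illustration, and the sketch assumes the closing parenthesis is the last one in the string.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TypeNode:
    """One node of a parsed type tree (illustrative field names)."""

    type_name: str
    children: List["TypeNode"] = field(default_factory=list)
    field_names: Optional[List[str]] = None  # only set for row types


def split_top_level(text: str) -> List[str]:
    """Split on commas that are not nested inside parentheses."""
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(text):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            parts.append(text[start:i].strip())
            start = i + 1
    parts.append(text[start:].strip())
    return parts


def parse_type(signature: str) -> TypeNode:
    """Recursively parse a signature such as array(row(a varchar))."""
    signature = signature.strip()
    open_idx = signature.find("(")
    if open_idx == -1:
        return TypeNode(signature.lower())  # plain scalar, e.g. varchar
    name = signature[:open_idx].strip().lower()
    inner = signature[open_idx + 1 : signature.rindex(")")]
    args = split_top_level(inner)
    if name == "row":
        names, children = [], []
        for arg in args:
            # Each row argument looks like "<field name> <field type>".
            field_name, _, field_type = arg.partition(" ")
            names.append(field_name)
            children.append(parse_type(field_type))
        return TypeNode(name, children, names)
    if name in ("array", "map"):
        return TypeNode(name, [parse_type(arg) for arg in args])
    # Parameterized scalars such as decimal(10, 2): keep only the base name.
    return TypeNode(name)


tree = parse_type("array(row(name varchar, age integer))")
```

Here tree is an array node whose single child is a row node with fields name and age, which is the kind of structure a typed converter could walk alongside the raw value.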
Native format complex types (map, struct) now return string values instead of type-inferred values to prevent incorrect conversions (e.g., varchar "1234" → int 1234). JSON format paths are unaffected. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
TestTypeSignatureParser and TestTypedValueConverter test the parser module directly, so they belong in a dedicated test file. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Place the private helper function before public classes for clearer top-down reading order. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Document class-based vs standalone function test patterns, fixture usage with indirect parametrization, and integration vs unit test distinction. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ll string
- Only pass type_hint kwarg when hint exists (avoids breaking custom Converters)
- Use json.dumps for dict/list in JSON paths instead of str() (fixes nested structs)
- Use convert() instead of _convert_element() in JSON paths (preserves "null" strings)
- Use _split_array_items in typed map native path (supports nested row/map values)
- Normalize result_set_type_hints keys to lowercase for case-insensitive lookup
- Cache DefaultTypeConverter instance in S3FS converter
- Add unit tests for all fixed edge cases
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
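The json.dumps point is easy to see in isolation. The following sketch uses made-up data and a hypothetical is_valid_json helper. It shows why str() on a nested dict cannot be fed back into a JSON parser while json.dumps can.

```python
import json

# Hypothetical nested struct value as it might appear after JSON parsing.
nested = {"name": "alice", "tags": ["a", "b"], "meta": {"id": "1234"}}

as_repr = str(nested)         # Python repr: single quotes, not valid JSON
as_json = json.dumps(nested)  # valid JSON text that round-trips


def is_valid_json(text: str) -> bool:
    """Return True if text can be parsed by json.loads."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False
```

Passing as_repr to a nested converter that expects JSON would fail to parse, whereas as_json parses back to the original structure.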
…tion
- Fix _parse_type_hint docstring to match renamed method
- Add docstring to DefaultTypeConverter.convert
- Remove unused delimiter parameter from _split_type_args
- Use TYPE_CHECKING for DefaultTypeConverter type annotation in S3FS converter
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The JSON parse path in _convert_typed_struct used positional indexing (field_types[i]) to assign types to fields. This breaks when JSON key order differs from the type definition order. Use _get_field_type() which matches by field name first, falling back to positional index. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
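A small sketch of the name-first, position-second lookup described above. The function names, the converter table, and the sample data are assumptions for illustration and do not mirror the actual _get_field_type implementation.

```python
import json
from typing import Any, Dict, List, Optional


def get_field_type(
    field_names: List[str], field_types: List[str], name: str, index: int
) -> Optional[str]:
    """Resolve a field type by name first, then fall back to position."""
    by_name = dict(zip(field_names, field_types))
    if name in by_name:
        return by_name[name]
    if 0 <= index < len(field_types):
        return field_types[index]
    return None


def convert_struct(
    obj: Dict[str, Any], field_names: List[str], field_types: List[str]
) -> Dict[str, Any]:
    """Convert each value according to its resolved type (strings by default)."""
    converters = {"integer": int, "varchar": str, "double": float}
    out = {}
    for i, (key, value) in enumerate(obj.items()):
        type_name = get_field_type(field_names, field_types, key, i)
        convert = converters.get(type_name, str)
        out[key] = None if value is None else convert(value)
    return out


# The JSON key order differs from the declared order (id, zip).
payload = json.loads('{"zip": "01234", "id": "7"}')
result = convert_struct(payload, ["id", "zip"], ["integer", "varchar"])
```

With purely positional indexing, zip would have been treated as the integer field and lost its leading zero. Matching by name keeps zip as "01234" and converts id to 7.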
Document the motivation (Athena API lacks nested type info), usage, constraints (nested arrays in native format, Arrow/Pandas/Polars), and the breaking change in 3.30.0 (complex type internals kept as strings without hints). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- ResultSet: Pre-compute column_type_hints tuple once in
_process_metadata instead of per-cell dict creation and .lower()
lookup. Replace **({} if ... else {}) with simple if/else branching.
Applied to AthenaResultSet, AthenaDictResultSet, and S3FS.
- Array JSON guard: Add JSON detection heuristic (check for '"', '[{',
'[null') before json.loads in _convert_typed_array, matching the
existing pattern in map/struct to avoid JSONDecodeError exceptions
on native format strings.
- TypeNode field lookup: Add cached _field_type_map dict for O(1)
name-based field type resolution, replacing O(n) list.index() in
_get_field_type.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Check column metadata types against _COMPLEX_TYPES (array, map, row, struct) in _process_metadata. Only compute and store column type hints when the result set actually contains complex type columns with matching hints. This eliminates all hint-related overhead in the hot loop for queries that return only scalar types. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
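The optimization can be sketched as a one-time pass over the column metadata. The function name build_column_hints, the precedence of integer keys over names, and the sample metadata are assumptions. Only the Name and Type keys and the set of complex types come from the description above.

```python
from typing import Dict, List, Optional, Tuple, Union

COMPLEX_TYPES = frozenset({"array", "map", "row", "struct"})


def build_column_hints(
    metadata: List[Dict[str, str]],
    type_hints: Optional[Dict[Union[str, int], str]],
) -> Optional[Tuple[Optional[str], ...]]:
    """Return one hint per column, or None when no hint applies at all."""
    if not type_hints:
        return None
    # Lowercase string keys so lookup by column name is case-insensitive.
    normalized = {
        (key.lower() if isinstance(key, str) else key): value
        for key, value in type_hints.items()
    }
    hints = []
    for index, meta in enumerate(metadata):
        hint = None
        if meta["Type"].lower() in COMPLEX_TYPES:
            # Assumed precedence: an index key wins over a name key.
            hint = normalized.get(index, normalized.get(meta["Name"].lower()))
        hints.append(hint)
    return tuple(hints) if any(hints) else None


columns = [{"Name": "id", "Type": "integer"}, {"Name": "Tags", "Type": "array"}]
hints = build_column_hints(columns, {"TAGS": "array(varchar)"})
```

A row loop can then skip all hint handling when the function returns None, and otherwise index into the tuple by column position instead of doing a dict lookup per cell.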
- Add '[[' to JSON detection guard in _convert_typed_array so nested
arrays like [[1,2],[3]] are parsed via json.loads instead of falling
through to native format (which returns None for nested arrays).
- Pre-compute _column_types and _column_names tuples once in
_process_metadata. Use them in _get_rows to eliminate per-cell
meta.get("Type") and meta.get("Name") dict lookups.
- S3FSResultSet._fetch() reuses _column_types from parent instead of
rebuilding from self.description on every call.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
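The JSON detection guard might look roughly like the sketch below. The exact matching rule is an assumption (a double quote anywhere, or one of the listed prefixes), and the native-format fallback is deliberately simplistic: it keeps elements as strings and does not handle nesting.

```python
import json
from typing import Any, List

JSON_PREFIXES = ("[{", "[null", "[[")


def looks_like_json(text: str) -> bool:
    """Cheap check that avoids raising JSONDecodeError on native strings."""
    return '"' in text or text.startswith(JSON_PREFIXES)


def parse_array(text: str) -> List[Any]:
    """Parse JSON when the guard matches, otherwise split the native format."""
    text = text.strip()
    if looks_like_json(text):
        try:
            return json.loads(text)
        except json.JSONDecodeError:
            pass  # fall through to the native-format path
    inner = text[1:-1]
    if not inner.strip():
        return []
    # Native format: elements stay as strings without a type hint.
    return [item.strip() for item in inner.split(",")]
```

With the "[[" prefix in the guard, a value such as [[1,2],[3]] goes through json.loads and keeps its nesting, while [a, b, c] never touches the JSON parser.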
…d hints
- Normalize Hive-style DDL syntax (array<struct<a:int>>) to Trino-style so users can paste DESCRIBE TABLE output directly as type hints
- Resolve type alias "int" to "integer" in the parser
- Fall back to untyped conversion when typed converter returns None, preventing silent data loss on parse failures
- Support integer keys in result_set_type_hints for index-based column resolution, enabling hints for duplicate column names
- Update type annotations across all cursor/result_set files
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
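As a rough illustration of the Hive-to-Trino normalization, the sketch below rewrites angle brackets, colons, and the int alias with plain string operations. The function name normalize_hive is hypothetical, mapping struct to row is an assumption about the target syntax, and a field literally named int would be rewritten incorrectly by this simplified version.

```python
import re


def normalize_hive(signature: str) -> str:
    """Rewrite a Hive-style signature into Trino-style syntax (simplified)."""
    s = signature.strip().lower()
    s = s.replace("<", "(").replace(">", ")")  # array<...> -> array(...)
    s = re.sub(r"\bstruct\(", "row(", s)       # assumed: struct maps to row
    s = re.sub(r"\s*:\s*", " ", s)             # a:int -> a int
    s = re.sub(r"\bint\b", "integer", s)       # alias; bigint is untouched
    return s


normalized = normalize_hive("array<struct<a:int>>")
```

Input that is already Trino-style passes through unchanged, so the same function can be applied to every hint before parsing.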
- Use _find_matching_paren() instead of assuming closing ')' is at end of string, so trailing modifiers don't break parsing
- Replace naive comma split with _split_array_items() in unnamed struct path to handle nested values correctly
Closes #693, closes #694.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
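A depth-counting scan is one straightforward way to locate the parenthesis that closes a given opening one. The sketch below is an assumed shape for such a helper, not the actual _find_matching_paren code, and the sample string with trailing text is made up.

```python
def find_matching_paren(text: str, open_idx: int) -> int:
    """Return the index of the ')' matching text[open_idx], or -1 if none."""
    depth = 0
    for i in range(open_idx, len(text)):
        if text[i] == "(":
            depth += 1
        elif text[i] == ")":
            depth -= 1
            if depth == 0:
                return i
    return -1


signature = "array(varchar(10)) extra"
open_idx = signature.index("(")
close_idx = find_matching_paren(signature, open_idx)
inner = signature[open_idx + 1 : close_idx]
```

Slicing with signature[open_idx + 1 : -1] would have swallowed part of the trailing text, while the matched index isolates exactly the argument list.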
WHAT
Add result_set_type_hints parameter to all cursor execute() methods and change default behavior for nested type conversion.
Breaking Change
_convert_value() no longer performs heuristic type inference (isdigit, float detection, bool detection) for elements inside complex types parsed from Athena's native format. Values now remain as strings by default.
Before: [{string: 1234}, {string: "value"}] (int inferred from varchar)
After: [{string: "1234"}, {string: "value"}] (stays as string)
New result_set_type_hints Parameter
Users who need typed conversion of nested elements can provide full Athena DDL type signatures.
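The call shape can be illustrated offline. In the sketch below, RecordingCursor is a hypothetical stand-in that only records its arguments. With a real connection the cursor would come from PyAthena instead, and the table and column names are invented. Only the result_set_type_hints keyword and the DDL-style signature strings come from this PR.

```python
from typing import Any, Dict, Optional, Union


class RecordingCursor:
    """Offline stand-in for a PyAthena cursor; it only records the call."""

    def execute(
        self,
        operation: str,
        result_set_type_hints: Optional[Dict[Union[str, int], str]] = None,
        **kwargs: Any,
    ) -> "RecordingCursor":
        self.operation = operation
        self.result_set_type_hints = result_set_type_hints
        return self


# Keys are column names; values are full Athena DDL type signatures.
type_hints = {
    "tags": "array(varchar)",
    "attributes": "map(varchar, integer)",
    "address": "row(city varchar, zip varchar)",
}

cursor = RecordingCursor()
cursor.execute(
    "SELECT tags, attributes, address FROM users",  # hypothetical table
    result_set_type_hints=type_hints,
)
```

Columns without an entry in the mapping would keep the new default, where nested elements stay as strings.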
Changes
Core (pyathena/converter.py)
- TypeNode dataclass for representing parsed type trees
- parse_type_signature() recursive parser for Athena DDL type strings
- _convert_value_with_type(), _convert_typed_array(), _convert_typed_map(), _convert_typed_struct()
- _convert_value() changed to string-by-default (only null → None)
- Converter.convert() and DefaultTypeConverter.convert() extended with type_hint parameter
- DefaultTypeConverter._parsed_hints
Threading (result_set_type_hints parameter added to)
- execute() methods (10 cursor types across sync/async, standard/pandas/arrow/polars/s3fs)
- _get_rows() methods
- convert() methods
Tests (tests/pyathena/test_converter.py)
- parse_type_signature() DDL parser
- DefaultTypeConverter.convert() with type hints
WHY
The Athena GetQueryResults API only returns base type names (e.g., "array", "map", "row") in ColumnInfo.Type, without nested type signatures. This caused _convert_value() to use heuristic inference, incorrectly converting varchar values like "1234" to int(1234) inside complex types.
Closes #689
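The failure mode is easy to reproduce with a toy version of value-based inference. Both functions below are illustrative stand-ins rather than the library's code: infer_heuristically mimics guessing a type from the text alone, and keep_as_string mimics the new default.

```python
from typing import Any, Optional


def infer_heuristically(value: str) -> Any:
    """Guess a Python type from the text alone (the old style of behavior)."""
    if value == "null":
        return None
    if value.lower() in ("true", "false"):
        return value.lower() == "true"
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            continue
    return value


def keep_as_string(value: str) -> Optional[str]:
    """String-by-default: only the null marker is converted."""
    return None if value == "null" else value
```

A varchar element such as a postal code "01234" becomes the integer 1234 under inference and silently loses its leading zero, whereas the string-by-default path returns it unchanged.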
🤖 Generated with Claude Code