Fixes #25721: Handle Pinot JSON results with a custom type #27530
hyspacex wants to merge 1 commit into open-metadata:main
Hi there 👋 Thanks for your contribution! The OpenMetadata team will review the PR shortly! Once it has been labeled as Let us know if you need any help!
```python
def process(value: Any) -> Any:
    if value is None or isinstance(value, (dict, list)):
        return value

    if isinstance(value, (str, bytes, bytearray)):
        try:
            return json.loads(value)
        except (TypeError, ValueError) as exc:
            logger.warning(
                "Failed to deserialize Pinot JSON value. Returning raw value instead: %s",
                exc,
            )
            return value

    return value
```
💡 Edge Case: result_processor passes through numeric/bool JSON scalars unparsed
In PinotJSONType.result_processor, a raw JSON numeric or boolean scalar (e.g., the string "42" or "true" representing a JSON column value) would fall through to the isinstance(value, str) branch and be parsed correctly via json.loads. However, if Pinot's single-stage engine returns a Python int, float, or bool (which are valid JSON scalars), these fall through to the final return value without being covered by the isinstance(value, (dict, list)) short-circuit. This is functionally correct (they're already deserialized), but the docstring and the test coverage only describe containers. Consider adding a test case for scalar JSON values (int/float/bool) to document this behavior explicitly.
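The pass-through behavior described in this comment can be checked with a small standalone sketch. It copies the reviewed `process` logic into a module-level function for illustration only. The `scalar_cases` list is hypothetical and its inputs are not captured from a real Pinot response. The checks are a rough outline of the kind of test the comment asks for, not the PR's actual test code.

```python
import json
import logging
from typing import Any

logger = logging.getLogger(__name__)


def process(value: Any) -> Any:
    """Standalone copy of the reviewed logic, for illustration only."""
    # None and already-decoded containers are returned unchanged.
    if value is None or isinstance(value, (dict, list)):
        return value
    # Text or byte payloads are parsed as JSON; on failure the raw value is kept.
    if isinstance(value, (str, bytes, bytearray)):
        try:
            return json.loads(value)
        except (TypeError, ValueError) as exc:
            logger.warning("Failed to deserialize Pinot JSON value: %s", exc)
            return value
    # Anything else (int, float, bool, ...) passes through untouched.
    return value


# Hypothetical scalar cases; the inputs are illustrative, not taken from Pinot.
scalar_cases = [
    ("42", 42),      # JSON number delivered as text
    ("true", True),  # JSON boolean delivered as text
    (42, 42),        # already a Python int
    (1.5, 1.5),      # already a Python float
    (False, False),  # already a Python bool
]

for raw, expected in scalar_cases:
    result = process(raw)
    # Check both value and exact type, since True == 1 in Python.
    assert result == expected and type(result) is type(expected)
```

Comparing the exact type as well as the value matters here because Python treats `True == 1` as true, so a value-only assertion would not catch a boolean being turned into an integer.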
Code Review 👍 Approved with suggestions (0 resolved / 1 finding)
Integrates custom type handling for Pinot JSON results to resolve parsing errors. Ensure result_processor is updated to correctly handle numeric and boolean JSON scalars.
💡 Edge Case: result_processor passes through numeric/bool JSON scalars unparsed
📄 ingestion/src/metadata/ingestion/source/database/pinotdb/custom_types.py:33-47
📄 ingestion/tests/unit/topology/database/test_pinotdb.py:66-80
What changed
- Add `PinotJSONType(types.JSON)` with a custom `result_processor`
- Map `json` columns to that type instead of bare SQLAlchemy `JSON`
- Keep `JSON` semantics by registering the custom type in the column type parser
Why
Pinot returns JSON values in two different shapes depending on engine mode:
With SQLAlchemy 1.4.x, the generic `types.JSON` result pipeline crashes when Pinot returns non-bytes JSON payloads on the sample-data path. SQLAlchemy 2.0 masks the issue on `main`, but there was no Pinot-specific regression coverage for it.
This change makes Pinot JSON handling explicit in the connector instead of depending on SQLAlchemy version-specific behavior, while keeping Pinot columns exposed as `JSON` in OM.
Validation
Notes
- `make generate` completed successfully in the containerized toolchain used for verification.
- `[all]` (`pkg_resources`/`cx_Oracle` under build isolation) before test execution
- `isinstance` fallback in `get_column_type_mapping` if maintainers prefer that over the current explicit Pinot type registration in the generic parser