refactor: strengthen extension and segment dict typing #35169
bohdansolovie wants to merge 3 commits into langgenius:main
Replace broad dict annotations with TypedDict/cast patterns in segment marshalling and extension factories to remove type ignores and improve static type safety without changing runtime behavior.
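A minimal sketch of the factory-side cast described above, assuming a registry lookup that can only promise a broad base type. The `Extensible` base, the `_REGISTRY` dict, the `extension_class` helper, and `KeywordsModeration` are hypothetical stand-ins, not the project's actual code:

```python
from typing import Any, cast


class Extensible:
    """Hypothetical base class shared by every extension kind."""


class Moderation(Extensible):
    """Hypothetical stand-in for the real Moderation base class."""

    @classmethod
    def validate_config(cls, tenant_id: str, config: dict[str, Any]) -> None:
        if "keywords" not in config:
            raise ValueError("keywords is required")


class KeywordsModeration(Moderation):
    """Hypothetical concrete extension."""


# Hypothetical registry: statically it only knows the broad base type.
_REGISTRY: dict[str, type[Extensible]] = {"keywords": KeywordsModeration}


def extension_class(name: str) -> type[Extensible]:
    return _REGISTRY[name]


class ModerationFactory:
    @staticmethod
    def validate_config(name: str, tenant_id: str, config: dict[str, Any]) -> None:
        # cast() returns its argument unchanged at runtime; it only tells the
        # type checker that the lookup yields a Moderation subclass, so the
        # validate_config call below needs no "type: ignore".
        cls = cast(type[Moderation], extension_class(name))
        cls.validate_config(tenant_id, config)
```

Because `cast()` performs no runtime check, this pattern is only as safe as the registry itself: a non-Moderation class registered under the same name would still fail at call time.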
Pyrefly Diff: No changes detected.
Drop redundant cast in query return path to satisfy basedpyright reportUnnecessaryCast without changing behavior.
Pyrefly Diff: No changes detected.
 def _marshal_segment_with_summary(segment, dataset_id: str) -> SegmentWithSummaryDict:
     """Marshal a single segment and enrich it with summary content."""
-    segment_dict: dict[str, Any] = dict(marshal(segment, segment_fields))  # type: ignore[arg-type]
+    segment_dict = cast(SegmentWithSummaryDict, dict(marshal(segment, segment_fields)))  # type: ignore[arg-type]
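The change in this hunk can be sketched in isolation as follows. The field names and the `fake_marshal` helper are hypothetical; the real code calls Flask-RESTX `marshal()` with the project's `segment_fields` schema, which is not reproduced here:

```python
from typing import Any, Optional, TypedDict, cast


class SegmentWithSummaryDict(TypedDict, total=False):
    # Illustrative keys only, not the real segment_fields schema.
    id: str
    content: str
    summary: Optional[str]


def fake_marshal(segment: Any) -> dict[str, Any]:
    """Hypothetical stand-in for marshal(segment, segment_fields)."""
    return {"id": segment["id"], "content": segment["content"]}


def marshal_segment_with_summary(
    segment: Any, summary: Optional[str]
) -> SegmentWithSummaryDict:
    # cast() does not copy or validate anything at runtime; it only changes
    # the static type from dict[str, Any] to the TypedDict.
    segment_dict = cast(SegmentWithSummaryDict, fake_marshal(segment))
    segment_dict["summary"] = summary
    return segment_dict
```

The returned object is still a plain `dict`, so runtime behavior is unchanged; the benefit is that callers get key-level checking from the type checker.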
@dosu do we need marshal(segment, segment_fields) here?
Yes, marshal(segment, segment_fields) is needed here. The DocumentSegment model doesn't have a built-in to_dict() method, so marshal() from Flask-RESTX is the only mechanism to convert the SQLAlchemy ORM object into a serializable dictionary [1]. The segment_fields schema controls which fields appear in the response, handles nested objects like child_chunks and attachments, converts timestamps, and coerces types [2].
This is also the same pattern used in the console API counterpart [3], so it keeps behavior consistent across both API surfaces.
I already replied above! To summarize: yes, marshal(segment, segment_fields) is needed here. The DocumentSegment model doesn't have a built-in to_dict() method, so marshal() is the mechanism that converts the SQLAlchemy ORM object into a serializable dictionary. The segment_fields schema controls which fields appear in the response, handles nested objects, converts timestamps, and coerces types. This is consistent with how the console API does it as well.
@dosu so does that mean that if DocumentSegment becomes a TypeBase model, this can be removed, since it will become a dataclass?
Not exactly — even after migrating to TypeBase (which uses MappedAsDataclass), marshal(segment, segment_fields) would still be needed for API responses. TypeBase models are still SQLAlchemy ORM models mapped to database tables, not pure dataclasses [1].
While dataclass-based models do gain asdict() for basic dict conversion, that doesn't give you the same thing as marshal() — specifically, control over which fields to expose, nested object serialization (child_chunks, attachments), and type coercion (e.g., timestamps). The codebase already has TypeBase models that still use marshal() in their API controllers — for example, WorkflowDraftVariableList is a frozen dataclass but is still served via @marshal_with() [2].
In the codebase, asdict() is only used for internal service purposes like queue payloads [3], not for API responses. So even once DocumentSegment is migrated to TypeBase, the marshal() call here would remain necessary to shape the API response correctly.
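To illustrate the distinction drawn in this reply, here is a stdlib-only sketch. The `Segment` dataclass, the `SEGMENT_FIELDS` mapping, and the `shape` helper are hypothetical; `shape` only imitates the field selection and coercion that a schema-driven serializer such as `marshal()` provides, and is not its implementation:

```python
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Any, Callable


@dataclass
class Segment:
    id: str
    content: str
    internal_hash: str  # hypothetical field that should not reach API clients
    created_at: datetime


# Hypothetical schema: output key -> converter applied to the attribute.
SEGMENT_FIELDS: dict[str, Callable[[Any], Any]] = {
    "id": str,
    "content": str,
    "created_at": lambda dt: int(dt.timestamp()),  # coerce to epoch seconds
}


def shape(obj: Any, fields: dict[str, Callable[[Any], Any]]) -> dict[str, Any]:
    """Keep only the listed attributes and convert each one."""
    return {name: convert(getattr(obj, name)) for name, convert in fields.items()}


seg = Segment("s1", "hello", "abc123", datetime(2024, 1, 1, tzinfo=timezone.utc))

raw = asdict(seg)                    # every field, datetime left as-is
public = shape(seg, SEGMENT_FIELDS)  # selected fields, timestamp coerced
```

Here `raw` still contains `internal_hash` and a `datetime` object, while `public` contains only the three schema keys with an integer timestamp, which is the kind of shaping the reply attributes to `marshal()`.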
@dosu
@asukaminato0721 I've already addressed all three of your review comments in the inline replies above. Let me know if you have any further questions!
Hi, @asukaminato0721
This question is directed at @asukaminato0721 for their reviewer opinion, so I'll let them weigh in! My earlier analysis on why ...
Hi, @asukaminato0721
@asukaminato0721
hi, @asukaminato0721
hi, @asukaminato0721
## Summary
- Replace broad dict annotations in segment marshalling with a SegmentWithSummaryDict TypedDict.
- Remove # type: ignore in extension factories by casting extension classes to concrete types (Moderation and ExternalDataTool).

## Changes
api/controllers/service_api/dataset/segment.py
- Add SegmentWithSummaryDict (TypedDict) for marshaled segment payloads with optional summary.
- Update _marshal_segment_with_summary and _marshal_segments_with_summary return types to use the new typed structure.
- Use cast(...) to align with the marshaled output shape.

api/core/moderation/factory.py
- Type extension_class via cast(type[Moderation], ...).
- Remove type: ignore from validate_config call.

api/core/external_data_tool/factory.py
- Type extension_class via cast(type[ExternalDataTool], ...).
- Remove type: ignore from validate_config call.

## Related Issues