refactor(api): tighten core rag typing batch 1 #35210
tmimmanuel wants to merge 4 commits into langgenius:main from
Pyrefly Diff
base → PR
--- /tmp/pyrefly_base.txt 2026-04-14 19:16:29.937002281 +0000
+++ /tmp/pyrefly_pr.txt 2026-04-14 19:16:19.785880733 +0000
@@ -38,26 +38,6 @@
--> core/llm_generator/llm_generator.py:394:60
ERROR No matching overload found for function `core.model_manager.ModelInstance.invoke_llm` called with arguments: (prompt_messages=list[SystemPromptMessage | UserPromptMessage], model_parameters=dict[str, float], stream=Literal[False]) [no-matching-overload]
--> core/llm_generator/llm_generator.py:582:60
-ERROR Missing positional argument `function` in function `ModelInstance._round_robin_invoke` [bad-argument-count]
- --> core/model_manager.py:172:17
-ERROR Missing positional argument `function` in function `ModelInstance._round_robin_invoke` [bad-argument-count]
- --> core/model_manager.py:197:13
-ERROR Missing positional argument `function` in function `ModelInstance._round_robin_invoke` [bad-argument-count]
- --> core/model_manager.py:217:13
-ERROR Missing positional argument `function` in function `ModelInstance._round_robin_invoke` [bad-argument-count]
- --> core/model_manager.py:239:13
-ERROR Missing positional argument `function` in function `ModelInstance._round_robin_invoke` [bad-argument-count]
- --> core/model_manager.py:256:13
-ERROR Missing positional argument `function` in function `ModelInstance._round_robin_invoke` [bad-argument-count]
- --> core/model_manager.py:281:13
-ERROR Missing positional argument `function` in function `ModelInstance._round_robin_invoke` [bad-argument-count]
- --> core/model_manager.py:309:13
-ERROR Missing positional argument `function` in function `ModelInstance._round_robin_invoke` [bad-argument-count]
- --> core/model_manager.py:328:13
-ERROR Missing positional argument `function` in function `ModelInstance._round_robin_invoke` [bad-argument-count]
- --> core/model_manager.py:344:13
-ERROR Missing positional argument `function` in function `ModelInstance._round_robin_invoke` [bad-argument-count]
- --> core/model_manager.py:361:13
ERROR Argument `dict[str, list[str] | str | None]` is not assignable to parameter `attributes` with type `dict[str, str] | None` in function `mlflow.tracing.fluent.start_span_no_context` [bad-argument-type]
--> core/ops/mlflow_trace/mlflow_trace.py:271:24
ERROR Argument `dict[str, dict[str, Any] | str | None]` is not assignable to parameter `attributes` with type `dict[str, str] | None` in function `mlflow.tracing.fluent.start_span_no_context` [bad-argument-type]
@@ -598,16 +578,6 @@
--> services/document_indexing_proxy/duplicate_document_indexing_task_proxy.py:14:5
ERROR Class member `DuplicateDocumentIndexingTaskProxy.PRIORITY_TASK_FUNC` overrides parent class `BatchDocumentIndexingProxy` in an inconsistent manner [bad-override]
--> services/document_indexing_proxy/duplicate_document_indexing_task_proxy.py:15:5
-ERROR Argument `RetrievalMethod | bool | dict[str, str] | int | Any` is not assignable to parameter `top_k` with type `int` in function `core.rag.datasource.retrieval_service.RetrievalService.retrieve` [bad-argument-type]
- --> services/hit_testing_service.py:86:19
-ERROR Argument `RetrievalMethod | bool | dict[str, str] | float | int | Any` is not assignable to parameter `score_threshold` with type `float | None` in function `core.rag.datasource.retrieval_service.RetrievalService.retrieve` [bad-argument-type]
- --> services/hit_testing_service.py:87:29
-ERROR Argument `RetrievalMethod | bool | dict[str, str] | int | Any | None` is not assignable to parameter `reranking_model` with type `RerankingModelDict | None` in function `core.rag.datasource.retrieval_service.RetrievalService.retrieve` [bad-argument-type]
- --> services/hit_testing_service.py:90:29
-ERROR Argument `Literal['reranking_model', True] | RetrievalMethod | dict[str, str] | int | Any` is not assignable to parameter `reranking_mode` with type `str` in function `core.rag.datasource.retrieval_service.RetrievalService.retrieve` [bad-argument-type]
- --> services/hit_testing_service.py:93:28
-ERROR Argument `RetrievalMethod | bool | dict[str, str] | int | Any | None` is not assignable to parameter `weights` with type `WeightsDict | None` in function `core.rag.datasource.retrieval_service.RetrievalService.retrieve` [bad-argument-type]
- --> services/hit_testing_service.py:94:21
ERROR `handled_tenant_count` was assigned in the current scope before the nonlocal declaration [unknown-name]
--> services/plugin/plugin_migration.py:92:34
ERROR `dict[str, Any]` is not assignable to attribute `credentials` with type `Never` [bad-assignment]
Pull request overview
This PR is part of the ongoing #26412 refactor, which eliminates pyright ignores through tighter static typing across api/core/rag and adjacent services. No runtime behavior changes are intended.
Changes:
- Fix an unused `as_completed()` loop binding in `RetrievalService.retrieve`.
- Tighten typing for retrieval-model dict handling in hit testing and for special-token parameters in text splitters.
- Add an explicit initialization/guard for `upload_file` in the extract processor for better null-safety under type checking.
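The page does not show the actual change to the `as_completed()` loop, so the following is only a minimal sketch of one common form of this fix, assuming the loop waits for completions and inspects shared state without reading the yielded future. The names `work` and `run_all` are hypothetical and do not come from the PR.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def work(n: int, errors: list) -> None:
    # Hypothetical task: record a problem in shared state instead of raising.
    if n < 0:
        errors.append(f"negative input: {n}")


def run_all(values: list) -> list:
    errors: list = []
    with ThreadPoolExecutor(max_workers=4) as executor:
        futures = [executor.submit(work, v, errors) for v in values]
        # The loop only waits for completions and checks shared state, so the
        # yielded future is bound to "_" to mark it as intentionally unused.
        for _ in as_completed(futures):
            if errors:
                # Stop early and cancel whatever has not started yet.
                for f in futures:
                    f.cancel()
                break
    return errors
```

If the loop body did need each result, the alternative would be to keep a named binding and call `.result()` on it; which variant the PR uses is not visible here.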
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| api/services/hit_testing_service.py | Introduces a local typed dict + cast to narrow retrieval model handling. |
| api/core/rag/splitter/text_splitter.py | Narrows special-token parameter types and avoids mutable defaults. |
| api/core/rag/splitter/fixed_text_splitter.py | Aligns special-token typing and makes an intentionally-unused encoder explicit. |
| api/core/rag/extractor/extract_processor.py | Refactors upload_file initialization/guards for type-checking. |
| api/core/rag/datasource/retrieval_service.py | Fixes an unused loop variable in an as_completed() loop. |
| api/core/model_manager.py | Adjusts round-robin invocation calls to satisfy type-check stack. |
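The removed Pyrefly errors for `services/hit_testing_service.py` show dict values inferred as a broad union (for example `RetrievalMethod | bool | dict[str, str] | int | Any`) being passed to typed parameters such as `top_k`. As a rough illustration of the "local typed dict + cast" approach named in the table, here is a sketch under stated assumptions: `RetrievalModelSketch`, `retrieve`, `hit_test`, and the default values are hypothetical stand-ins, not the PR's real definitions.

```python
from __future__ import annotations

from typing import Any, TypedDict, cast


class RetrievalModelSketch(TypedDict, total=False):
    # Hypothetical shape; the real keys and types are defined in the PR.
    top_k: int
    score_threshold: float | None
    reranking_mode: str


def retrieve(top_k: int, score_threshold: float | None, reranking_mode: str) -> str:
    # Stand-in for a strictly typed callee.
    return f"top_k={top_k} threshold={score_threshold} mode={reranking_mode}"


def hit_test(raw: dict[str, Any]) -> str:
    # cast() does nothing at runtime; it only tells the type checker which
    # shape to assume, so each .get() below has a precise per-key type.
    model = cast(RetrievalModelSketch, raw)
    return retrieve(
        top_k=model.get("top_k", 4),
        score_threshold=model.get("score_threshold"),
        reranking_mode=model.get("reranking_mode", "reranking_model"),
    )
```

Because `cast()` performs no validation, this pattern fits the stated goal of no runtime behavior change, but it also means a malformed dict would still pass through unchecked.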
upload_file = extract_setting.upload_file
with tempfile.TemporaryDirectory() as temp_dir:
if not file_path:
assert extract_setting.upload_file is not None, "upload_file is required"
upload_file: UploadFile = extract_setting.upload_file
assert upload_file is not None, "upload_file is required"
suffix = Path(upload_file.key).suffix
# FIXME mypy: Cannot determine type of 'tempfile._get_candidate_names' better not use it here
file_path = f"{temp_dir}/{next(tempfile._get_candidate_names())}{suffix}" # type: ignore
storage.download(upload_file.key, file_path)
input_file = Path(file_path)
file_extension = input_file.suffix.lower()
assert upload_file is not None, "upload_file is required"
upload_file is asserted as non-None unconditionally after file_path resolution. This breaks callers that pass file_path with no upload_file (e.g. load_from_url() constructs ExtractSetting(datasource_type=FILE, document_model=...) without upload_file and calls extract(..., file_path=...)), causing an AssertionError for non-upload flows.
Consider only asserting upload_file in the branches that actually use upload_file fields (PDF/DOCX paths that need tenant_id/created_by), or restructure so URL-based extraction doesn’t require upload_file unless the chosen extractor needs it.
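The suggestion above can be sketched roughly as follows. This is an illustration only, assuming that just the PDF/DOCX branches read `upload_file` fields; `UploadFileSketch`, `ExtractSettingSketch`, and `pick_extractor` are hypothetical simplifications and not the project's real classes or function.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class UploadFileSketch:
    # Hypothetical stand-in for the real upload record.
    key: str
    tenant_id: str


@dataclass
class ExtractSettingSketch:
    # upload_file stays optional so URL-based callers can omit it.
    upload_file: Optional[UploadFileSketch] = None


def pick_extractor(setting: ExtractSettingSketch, file_path: str) -> str:
    upload_file = setting.upload_file
    extension = Path(file_path).suffix.lower()
    if extension in {".pdf", ".docx"}:
        # Only these branches read upload_file fields, so only they require it.
        assert upload_file is not None, "upload_file is required"
        return f"{extension[1:]} extractor for tenant {upload_file.tenant_id}"
    # Other formats work from file_path alone, e.g. URL-based extraction.
    return "text extractor"
```

With the guard scoped this way, a caller that passes only `file_path` for a plain-text source no longer trips the assertion, while the type checker still sees `upload_file` as non-None where its fields are used.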
…rd-router refactor(api): tighten jieba keyword typing batch 2
Head branch was pushed to by a user without write access
Pyrefly Diff
base → PR
--- /tmp/pyrefly_base.txt 2026-04-14 19:58:08.056549989 +0000
+++ /tmp/pyrefly_pr.txt 2026-04-14 19:57:59.637314455 +0000
@@ -38,26 +38,6 @@
--> core/llm_generator/llm_generator.py:394:60
ERROR No matching overload found for function `core.model_manager.ModelInstance.invoke_llm` called with arguments: (prompt_messages=list[SystemPromptMessage | UserPromptMessage], model_parameters=dict[str, float], stream=Literal[False]) [no-matching-overload]
--> core/llm_generator/llm_generator.py:582:60
-ERROR Missing positional argument `function` in function `ModelInstance._round_robin_invoke` [bad-argument-count]
- --> core/model_manager.py:172:17
-ERROR Missing positional argument `function` in function `ModelInstance._round_robin_invoke` [bad-argument-count]
- --> core/model_manager.py:197:13
-ERROR Missing positional argument `function` in function `ModelInstance._round_robin_invoke` [bad-argument-count]
- --> core/model_manager.py:217:13
-ERROR Missing positional argument `function` in function `ModelInstance._round_robin_invoke` [bad-argument-count]
- --> core/model_manager.py:239:13
-ERROR Missing positional argument `function` in function `ModelInstance._round_robin_invoke` [bad-argument-count]
- --> core/model_manager.py:256:13
-ERROR Missing positional argument `function` in function `ModelInstance._round_robin_invoke` [bad-argument-count]
- --> core/model_manager.py:281:13
-ERROR Missing positional argument `function` in function `ModelInstance._round_robin_invoke` [bad-argument-count]
- --> core/model_manager.py:309:13
-ERROR Missing positional argument `function` in function `ModelInstance._round_robin_invoke` [bad-argument-count]
- --> core/model_manager.py:328:13
-ERROR Missing positional argument `function` in function `ModelInstance._round_robin_invoke` [bad-argument-count]
- --> core/model_manager.py:344:13
-ERROR Missing positional argument `function` in function `ModelInstance._round_robin_invoke` [bad-argument-count]
- --> core/model_manager.py:361:13
ERROR Argument `dict[str, list[str] | str | None]` is not assignable to parameter `attributes` with type `dict[str, str] | None` in function `mlflow.tracing.fluent.start_span_no_context` [bad-argument-type]
--> core/ops/mlflow_trace/mlflow_trace.py:271:24
ERROR Argument `dict[str, dict[str, Any] | str | None]` is not assignable to parameter `attributes` with type `dict[str, str] | None` in function `mlflow.tracing.fluent.start_span_no_context` [bad-argument-type]
@@ -70,14 +50,8 @@
--> core/ops/mlflow_trace/mlflow_trace.py:415:24
ERROR Class member `OpsTraceProviderConfigMap.__getitem__` overrides parent class `UserDict` in an inconsistent manner [bad-param-name-override]
--> core/ops/ops_trace_manager.py:206:9
-ERROR Object of class `NoneType` has no attribute `data_source_type` [missing-attribute]
- --> core/rag/datasource/keyword/jieba/jieba.py:142:36
-ERROR Object of class `NoneType` has no attribute `keyword_table` [missing-attribute]
- --> core/rag/datasource/keyword/jieba/jieba.py:144:13
ERROR Cannot index into `set[Any]` [bad-index]
- --> core/rag/datasource/keyword/jieba/jieba.py:157:29
-ERROR Argument `object` is not assignable to parameter `iterable` with type `Iterable[@_]` in function `list.__init__` [bad-argument-type]
- --> core/rag/datasource/keyword/jieba/jieba_keyword_table_handler.py:88:35
+ --> core/rag/datasource/keyword/jieba/jieba.py:159:29
ERROR Argument `dict[str, bytes | str]` is not assignable to parameter `headers` with type `Headers | Mapping[bytes, bytes] | Mapping[str, str] | Sequence[tuple[bytes, bytes]] | Sequence[tuple[str, str]] | None` in function `httpx._api.post` [bad-argument-type]
--> core/rag/extractor/notion_extractor.py:106:25
ERROR Argument `dict[str, bytes | str]` is not assignable to parameter `headers` with type `Headers | Mapping[bytes, bytes] | Mapping[str, str] | Sequence[tuple[bytes, bytes]] | Sequence[tuple[str, str]] | None` in function `httpx._api.request` [bad-argument-type]
@@ -598,16 +572,6 @@
--> services/document_indexing_proxy/duplicate_document_indexing_task_proxy.py:14:5
ERROR Class member `DuplicateDocumentIndexingTaskProxy.PRIORITY_TASK_FUNC` overrides parent class `BatchDocumentIndexingProxy` in an inconsistent manner [bad-override]
--> services/document_indexing_proxy/duplicate_document_indexing_task_proxy.py:15:5
-ERROR Argument `RetrievalMethod | bool | dict[str, str] | int | Any` is not assignable to parameter `top_k` with type `int` in function `core.rag.datasource.retrieval_service.RetrievalService.retrieve` [bad-argument-type]
- --> services/hit_testing_service.py:86:19
-ERROR Argument `RetrievalMethod | bool | dict[str, str] | float | int | Any` is not assignable to parameter `score_threshold` with type `float | None` in function `core.rag.datasource.retrieval_service.RetrievalService.retrieve` [bad-argument-type]
- --> services/hit_testing_service.py:87:29
-ERROR Argument `RetrievalMethod | bool | dict[str, str] | int | Any | None` is not assignable to parameter `reranking_model` with type `RerankingModelDict | None` in function `core.rag.datasource.retrieval_service.RetrievalService.retrieve` [bad-argument-type]
- --> services/hit_testing_service.py:90:29
-ERROR Argument `Literal['reranking_model', True] | RetrievalMethod | dict[str, str] | int | Any` is not assignable to parameter `reranking_mode` with type `str` in function `core.rag.datasource.retrieval_service.RetrievalService.retrieve` [bad-argument-type]
- --> services/hit_testing_service.py:93:28
-ERROR Argument `RetrievalMethod | bool | dict[str, str] | int | Any | None` is not assignable to parameter `weights` with type `WeightsDict | None` in function `core.rag.datasource.retrieval_service.RetrievalService.retrieve` [bad-argument-type]
- --> services/hit_testing_service.py:94:21
ERROR `handled_tenant_count` was assigned in the current scope before the nonlocal declaration [unknown-name]
--> services/plugin/plugin_migration.py:92:34
ERROR `dict[str, Any]` is not assignable to attribute `credentials` with type `Never` [bad-assignment]
Summary
- `as_completed()` loop binding in retrieval service
- `upload_file` in extract processor

Test plan
- `uv run --directory api --dev -- basedpyright --threads 8` passes with 0 errors
- `./dev/pyrefly-check-local` passes with 0 errors
- `uv --directory api run mypy --exclude-gitignore --exclude 'tests/' --exclude 'migrations/' --check-untyped-defs --disable-error-code=import-untyped .` passes with 0 errors

Part of #26412
Please also review my previous PRs (#34809, #34702, #34796, #34938), which address the same issue (#26412).