
refactor(api): add null safety to extract_processor and firecrawl retry methods #34796

Open
tmimmanuel wants to merge 7 commits into langgenius:main from tmimmanuel:refactor/pyright-core-rag-pr3


@tmimmanuel
Contributor

Summary

  • Fix the possibly-unbound upload_file variable in extract_processor.py by
    binding it before the file_path check and adding assert guards at the
    PDF/DOCX extractor call sites
  • Fix the possibly-unbound response variable in firecrawl_app.py by
    initializing it before the retry loops and asserting that it is set after
    each loop completes
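A minimal sketch of the extract_processor.py pattern from the first bullet, under a simplified flow. UploadFile, ExtractSetting, and pick_extractor are hypothetical stand-ins (not Dify's actual classes or signatures), and the returned extractor names are illustrative strings. It shows why binding upload_file before the file_path check clears the unbound-name error, and how an assert narrows UploadFile | None at the PDF/DOCX call site.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path


# Hypothetical stand-ins for the real models; only the fields this sketch needs.
@dataclass
class UploadFile:
    key: str
    tenant_id: str
    created_by: str


@dataclass
class ExtractSetting:
    upload_file: UploadFile | None = None


def pick_extractor(setting: ExtractSetting, file_path: str | None = None) -> tuple[str, str, str]:
    """Return (extractor name, tenant_id, created_by) for a PDF/DOCX file.

    If upload_file were assigned only inside `if not file_path:`, a type
    checker would report it as possibly unbound whenever the caller passes
    file_path directly.
    """
    # Bind unconditionally, before the branch, so every path defines the name.
    upload_file = setting.upload_file
    if not file_path:
        if upload_file is None:
            raise ValueError("upload_file is required when file_path is not given")
        # Hypothetical: the real code would download the stored object to a temp path.
        file_path = f"/tmp/{Path(upload_file.key).name}"

    suffix = Path(file_path).suffix.lower()
    if suffix in (".pdf", ".docx"):
        # Narrows UploadFile | None to UploadFile for pyright/pyrefly.
        assert upload_file is not None, "PDF/DOCX extraction needs upload_file metadata"
        name = "PdfExtractor" if suffix == ".pdf" else "WordExtractor"
        return name, upload_file.tenant_id, upload_file.created_by
    raise ValueError(f"unsupported suffix: {suffix}")
```

Design note: an assert narrows the type for the checker but is stripped under python -O, so it documents an invariant rather than validating user input.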

Test plan

  • make type-check-core passes with 0 errors
  • All 194 extractor unit tests pass (no test changes needed)
  • No runtime behavior changes; the patch adds only null-safety bindings and assertions

Part of #26412

tmimmanuel requested a review from JohnJyong as a code owner on April 9, 2026 02:47
dosubot (bot) added the size:S label (This PR changes 10-29 lines, ignoring generated files.) on Apr 9, 2026
@github-actions
Contributor

github-actions bot commented Apr 9, 2026

Pyrefly Diff

base → PR
--- /tmp/pyrefly_base.txt	2026-04-09 02:48:20.427123763 +0000
+++ /tmp/pyrefly_pr.txt	2026-04-09 02:48:11.960163004 +0000
@@ -247,18 +247,6 @@
    --> core/rag/datasource/vdb/weaviate/weaviate_vector.py:467:47
 ERROR Argument `list[float] | list[list[float]] | None` is not assignable to parameter `vector` with type `Iterable[LaxFloat] | None` in function `core.rag.models.document.Document.__init__` [bad-argument-type]
    --> core/rag/datasource/vdb/weaviate/weaviate_vector.py:467:60
-ERROR `upload_file` may be uninitialized [unbound-name]
-   --> core/rag/extractor/extract_processor.py:116:61
-ERROR `upload_file` may be uninitialized [unbound-name]
-   --> core/rag/extractor/extract_processor.py:126:62
-ERROR `upload_file` may be uninitialized [unbound-name]
-   --> core/rag/extractor/extract_processor.py:152:61
-ERROR `upload_file` may be uninitialized [unbound-name]
-   --> core/rag/extractor/extract_processor.py:158:62
-ERROR `response` may be uninitialized [unbound-name]
-   --> core/rag/extractor/firecrawl/firecrawl_app.py:183:16
-ERROR `response` may be uninitialized [unbound-name]
-   --> core/rag/extractor/firecrawl/firecrawl_app.py:192:16
 ERROR Argument `dict[str, bytes | str]` is not assignable to parameter `headers` with type `Headers | Mapping[bytes, bytes] | Mapping[str, str] | Sequence[tuple[bytes, bytes]] | Sequence[tuple[str, str]] | None` in function `httpx._api.post` [bad-argument-type]
    --> core/rag/extractor/notion_extractor.py:106:25
 ERROR Argument `dict[str, bytes | str]` is not assignable to parameter `headers` with type `Headers | Mapping[bytes, bytes] | Mapping[str, str] | Sequence[tuple[bytes, bytes]] | Sequence[tuple[str, str]] | None` in function `httpx._api.request` [bad-argument-type]
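The two response errors reported at firecrawl_app.py:183 and :192 match the usual shape of a retry loop whose final return sits after the loop: if the loop runs zero times, the name is never assigned. The sketch below assumes a retry on HTTP 502 with exponential backoff; request_with_retry and FakeResponse are hypothetical stand-ins for the real methods and httpx.Response, used so the example runs without network access. It illustrates the initialize-then-assert fix described in the summary, not the PR's actual diff.

```python
from __future__ import annotations

import time
from collections.abc import Callable
from dataclasses import dataclass


@dataclass
class FakeResponse:
    # Hypothetical stand-in for httpx.Response; only status_code matters here.
    status_code: int


def request_with_retry(
    send: Callable[[], FakeResponse],
    retries: int = 3,
    backoff_factor: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> FakeResponse:
    # Initialize before the loop: with retries=0 the body never runs, which is
    # exactly the path a checker reports as "`response` may be uninitialized".
    response: FakeResponse | None = None
    for attempt in range(retries):
        response = send()
        if response.status_code != 502:
            return response
        # Assumed policy: exponential backoff after a 502 Bad Gateway.
        sleep(backoff_factor * (2**attempt))
    # Narrow the Optional for the type checker; fails loudly if retries < 1.
    assert response is not None, "retries must be >= 1"
    return response
```

Initializing to None and asserting afterwards leaves the normal path untouched; raising an explicit error when retries < 1 would be an alternative that also survives python -O.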



tmimmanuel force-pushed the refactor/pyright-core-rag-pr3 branch from 15b4dc9 to 2550f8b on April 10, 2026 17:56
github-actions (bot) removed the web label (This relates to changes on the web.) on Apr 10, 2026






@github-actions
Contributor

Pyrefly Diff

base → PR
--- /tmp/pyrefly_base.txt	2026-04-14 16:04:24.920043080 +0000
+++ /tmp/pyrefly_pr.txt	2026-04-14 16:04:14.447881608 +0000
@@ -78,18 +78,6 @@
    --> core/rag/datasource/keyword/jieba/jieba.py:157:29
 ERROR Argument `object` is not assignable to parameter `iterable` with type `Iterable[@_]` in function `list.__init__` [bad-argument-type]
   --> core/rag/datasource/keyword/jieba/jieba_keyword_table_handler.py:88:35
-ERROR `upload_file` may be uninitialized [unbound-name]
-   --> core/rag/extractor/extract_processor.py:116:61
-ERROR `upload_file` may be uninitialized [unbound-name]
-   --> core/rag/extractor/extract_processor.py:126:62
-ERROR `upload_file` may be uninitialized [unbound-name]
-   --> core/rag/extractor/extract_processor.py:152:61
-ERROR `upload_file` may be uninitialized [unbound-name]
-   --> core/rag/extractor/extract_processor.py:158:62
-ERROR `response` may be uninitialized [unbound-name]
-   --> core/rag/extractor/firecrawl/firecrawl_app.py:183:16
-ERROR `response` may be uninitialized [unbound-name]
-   --> core/rag/extractor/firecrawl/firecrawl_app.py:192:16
 ERROR Argument `dict[str, bytes | str]` is not assignable to parameter `headers` with type `Headers | Mapping[bytes, bytes] | Mapping[str, str] | Sequence[tuple[bytes, bytes]] | Sequence[tuple[str, str]] | None` in function `httpx._api.post` [bad-argument-type]
    --> core/rag/extractor/notion_extractor.py:106:25
 ERROR Argument `dict[str, bytes | str]` is not assignable to parameter `headers` with type `Headers | Mapping[bytes, bytes] | Mapping[str, str] | Sequence[tuple[bytes, bytes]] | Sequence[tuple[str, str]] | None` in function `httpx._api.request` [bad-argument-type]


Labels

size:S (This PR changes 10-29 lines, ignoring generated files.)


1 participant