feat(pdf): add PDF support by jxnl · Pull Request #1434 · 567-labs/instructor

jxnl · 2025-03-19T03:28:51Z

Important

Adds comprehensive PDF support with a new PDF class for handling and converting PDFs across AI providers, along with extensive tests and documentation.

Behavior:
- Adds PDF class in multimodal.py for handling PDFs from URLs, file paths, and base64 data.
- Supports automatic format detection and conversion for OpenAI and Anthropic.
- Includes error handling for invalid formats and missing files.
Testing:
- Adds tests for PDF class in test_multimodal.py, test_openai/test_multimodal.py, and test_anthropic/test_multimodal.py.
- Tests cover loading from different sources, format conversion, and error handling.
Documentation:
- Adds pdf.md to docs/concepts detailing usage, loading methods, and provider-specific instructions.
- Updates mkdocs.yml to include PDF support in the navigation.

^{This description was created by}^{for a1bb82f. It will automatically update as commits are pushed.}

Important

Adds comprehensive PDF support with a new PDF class for handling and converting PDFs across AI providers, along with extensive tests and documentation.

Behavior:
- Adds PDF class in multimodal.py for handling PDFs from URLs, file paths, and base64 data.
- Supports automatic format detection and conversion for OpenAI, Anthropic, and Mistral.
- Includes error handling for invalid formats and missing files.
Testing:
- Adds tests for PDF class in test_multimodal.py, test_openai/test_multimodal.py, and test_anthropic/test_multimodal.py.
- Tests cover loading from different sources, format conversion, and error handling.
Documentation:
- Adds pdf.md to docs/concepts detailing usage, loading methods, and provider-specific instructions.
- Updates mkdocs.yml to include PDF support in the navigation.

^{This description was created by}^{for 0de14bd. It will automatically update as commits are pushed.}

- Add PDF class with support for URL and base64 loading\n- Implement format detection and conversion\n- Add provider-specific formatting for OpenAI and Anthropic\n- Include error handling and validation

- Add core PDF functionality tests\n- Add OpenAI-specific PDF tests\n- Add Anthropic-specific PDF tests\n- Include error handling tests

- Add comprehensive PDF usage guide\n- Include provider-specific details\n- Add code examples for both OpenAI and Anthropic\n- Document best practices and error handling\n- Update mkdocs.yml to include PDF documentation

cloudflare-workers-and-pages · 2025-03-19T03:28:53Z

Deploying instructor-py with Cloudflare Pages

Latest commit:	`0de14bd`
Status:	✅ Deploy successful!
Preview URL:	https://875c564c.instructor-py.pages.dev
Branch Preview URL:	https://feat-pdf-support.instructor-py.pages.dev

View logs

ellipsis-dev

❌ Changes requested. Reviewed everything up to a1bb82f in 2 minutes and 11 seconds

More details

Looked at 741 lines of code in 6 files
Skipped 0 files when reviewing.
Skipped posting 6 drafted comments based on config settings.

1. instructor/multimodal.py:691

Draft comment:
PDF provider methods (to_anthropic and to_openai) correctly differentiate between URL and base64 sources. Consider adding inline comments to clarify the logic for future maintainability.
Reason this comment was not posted:
Confidence changes required: 30% <= threshold 85%
None

2. docs/concepts/pdf.md:13

Draft comment:
Link to OpenAI PDF Files Guide appears incomplete (missing closing parenthesis).
Reason this comment was not posted:
Marked as duplicate.

3. instructor/multimodal.py:22

Draft comment:
Duplicate import of 'Mode' from .mode detected; remove one of the duplicates.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

4. instructor/multimodal.py:393

Draft comment:
PDF instances are not handled in convert_contents; update isinstance check to include PDF.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

5. docs/concepts/pdf.md:13

Draft comment:
Typographical error: The markdown link for [OpenAI PDF Files Guide] is missing a closing parenthesis after the URL. Please update the link, e.g., https://platform.openai.com/docs/guides/pdf-files?api-mode) to ensure it renders correctly.
Reason this comment was not posted:
Marked as duplicate.

6. mkdocs.yml:272

Draft comment:
There appears to be a typographical error in the file name 'simtom.md' (line 272) under the 'Zero-Shot' section. Please verify if this should be corrected to, for example, 'simulate.md' or an intended alternative.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

Workflow ID: wflow_J3ilCZ5QeUZgNeQE

Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

docs/concepts/pdf.md

Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>

jxnl · 2025-03-19T03:37:41Z

This is a draft that does not work @ivanleomk hoping you can take this the final mile sometime in april

ellipsis-dev

❌ Changes requested. Incremental review on 6ae61e6 in 1 minute and 43 seconds

More details

Looked at 195 lines of code in 2 files
Skipped 0 files when reviewing.
Skipped posting 7 drafted comments based on config settings.

1. instructor/multimodal.py:579

Draft comment:
Consider adding a comment about the specific OSError errno 63 check, as it may be platform-specific.
Reason this comment was not posted:
Confidence changes required: 80% <= threshold 85%
None

2. instructor/multimodal.py:719

Draft comment:
The PDF.to_openai method now returns a 'file' type mapping. Ensure the related documentation and tests are updated to reflect this change.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

3. instructor/multimodal.py:579

Draft comment:
The try/except block in PDF.autodetect uses errno 63 to check for overly long file names, which may not be portable. Consider using a more platform-agnostic method or broader exception handling.
Reason this comment was not posted:
Confidence changes required: 80% <= threshold 85%
None

4. instructor/multimodal.py:719

Draft comment:
In PDF.to_openai, refetching the PDF from URL to convert to base64 may lead to unnecessary overhead. Consider caching the response or reusing existing data if available.
Reason this comment was not posted:
Confidence changes required: 80% <= threshold 85%
None

5. tests/llm/test_openai/test_multimodal.py:158

Draft comment:
Typo: The class name 'Reciept' is likely intended to be 'Receipt'.
Reason this comment was not posted:
Marked as duplicate.

6. tests/llm/test_openai/test_multimodal.py:190

Draft comment:
The test_multimodal_pdf_file asserts fixed expected values (total = 220, 2 items). Ensure that the invoice PDF asset remains consistent; otherwise, update the test to adapt to changes in the input asset.
Reason this comment was not posted:
Confidence changes required: 70% <= threshold 85%
None

7. tests/llm/test_openai/test_multimodal.py:164

Draft comment:
Typographical error: The class name 'Reciept' is misspelled. Consider renaming it to 'Receipt' to correct the typo.
Reason this comment was not posted:
Marked as duplicate.

Workflow ID: wflow_q1GMiYyOiZwEWesX

Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

tests/llm/test_openai/test_multimodal.py

instructor/multimodal.py

ellipsis-dev

👍 Looks good to me! Incremental review on a8831ef in 1 minute and 39 seconds

More details

Looked at 178 lines of code in 1 files
Skipped 0 files when reviewing.
Skipped posting 7 drafted comments based on config settings.

1. tests/test_multimodal.py:321

Draft comment:
Using '# type: ignore' on patch statements may hide real type issues. Consider adjusting type annotations or using more specific mocks.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 85%
This is a test file where mocking is common and type checkers often struggle with mocked objects. The type: ignore comments are on patch statements which are a standard testing pattern. The mocks return simple values (boolean, bytes, MagicMock) that shouldn't cause type issues. Removing the ignores would likely just add noise from the type checker without improving safety.
The comment has a point that type: ignore can hide issues. In production code, this would be concerning.
However, this is test code where mocking is necessary and type checkers commonly need to be ignored. The mocked return values are simple and type-safe.
Delete the comment. Type ignores on patch statements in tests are a common and acceptable pattern when the mocked values are simple and type-safe.

2. tests/test_multimodal.py:454

Draft comment:
Removal of PDF to OpenAI conversion tests appears deliberate. Ensure that the functionality is now covered by TOOLS mode and that related documentation and examples are updated accordingly.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 85%
The comment is speculative and asks for verification rather than pointing out a specific issue. It asks the author to "ensure" things are covered, which violates our rule about not asking authors to verify things. The functionality appears to be intentionally moved to TOOLS mode as evidenced by test_convert_contents_with_pdf(), so this isn't pointing out a real problem.
The comment might be highlighting a legitimate concern about documentation and examples needing updates when API changes occur.
While documentation updates are important, the comment is phrased as a verification request rather than pointing out specific missing documentation. If there are specific docs or examples that need updates, those should be called out directly.
Delete this comment. It's asking for verification rather than pointing out specific issues, and the code changes appear intentional with test coverage already in place.

3. tests/test_multimodal.py:488

Draft comment:
In 'test_convert_contents_with_pdf', the conversion mode for OpenAI has switched to TOOLS and the expected result type changed to 'file'. Confirm that these changes are consistent with the intended PDF conversion API.
Reason this comment was not posted:
Comment did not seem useful. Confidence is useful = 0% <= threshold 85%
The comment is asking the PR author to confirm their intention regarding changes in a test case. This violates the rule against asking for confirmation of intention. The comment does not provide a specific suggestion or point out a clear issue that needs addressing.

4. tests/test_multimodal.py:321

Draft comment:
Avoid using '# type: ignore' on patch calls unless necessary. Consider providing proper type annotations or stubs to prevent hiding genuine type issues.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 85%

Type: ignore comments in tests are generally more acceptable than in production code. 2. These are standard mock patterns that often have typing issues due to the dynamic nature of mocking. 3. Adding proper type stubs for mocked objects in tests could add more complexity than value. 4. The type ignores are very targeted and specific to mock calls, not hiding broader issues.
The comment has a valid point about type safety, but may be overly strict for test code where type checking is less critical.
While type safety is important, the effort to properly type mock objects in tests may not provide enough value to justify the added complexity.
Delete the comment. The type: ignore comments are acceptable in this test context and removing them would create more work than value.

5. tests/test_multimodal.py:436

Draft comment:
PDF-to-OpenAI conversion tests (test_pdf_to_openai_url and test_pdf_to_openai_base64) were removed. Ensure this change is intentional and that the updated API behavior (using convert_contents with Mode.TOOLS) is clearly documented.
Reason this comment was not posted:
Comment did not seem useful. Confidence is useful = 50% <= threshold 85%
The comment is asking the author to ensure the change is intentional, which is against the rules. However, it also mentions the need for documentation updates, which is valid if the library code changes. The comment is partially useful because it highlights the need for documentation updates.

6. tests/test_multimodal.py:488

Draft comment:
In test_convert_contents_with_pdf, the expected type for PDF conversion using Mode.TOOLS changed to 'file' (instead of 'document_url'). Confirm that this new output format aligns with the intended API design and update the documentation accordingly.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 0% vs. threshold = 85%
The comment asks the author to "confirm" and "verify" the API design change, which violates our rules about not asking authors to verify things. The comment also suggests updating documentation, but doesn't specify what documentation needs updating or why. The comment is speculative and not actionable.
The change in expected type could indicate an important API change that needs documentation. Maybe removing this comment would cause documentation to get out of sync.
While documentation is important, the comment doesn't provide specific guidance about what documentation needs updating. It just asks the author to "verify" things, which our rules explicitly prohibit.
Delete this comment. It violates our rules by asking the author to verify things and makes vague suggestions about documentation without being specific or actionable.

7. tests/test_multimodal.py:390

Draft comment:
Minor formatting: Several PDF tests have extra blank lines. Consider removing unnecessary whitespace to maintain consistency across the test suite.
Reason this comment was not posted:
Confidence changes required: 50% <= threshold 85%
None

Workflow ID: wflow_1Zmc5340OG5JXEMh

You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

ellipsis-dev

👍 Looks good to me! Incremental review on d0cc7ae in 36 seconds

More details

Looked at 32 lines of code in 2 files
Skipped 0 files when reviewing.
Skipped posting 4 drafted comments based on config settings.

1. instructor/multimodal.py:583

Draft comment:
Great use of 'raise ... from err' to preserve original error context.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 85%
None

2. tests/test_multimodal.py:5

Draft comment:
Removed unused 'os' and 'requests' imports; improves code cleanliness.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 85%
None

3. instructor/multimodal.py:583

Draft comment:
Good use of exception chaining with 'from err' to preserve context.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 85%
None

4. tests/test_multimodal.py:3

Draft comment:
Removed unused imports (Audio, os, requests) to clean up the test module.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 85%
None

Workflow ID: wflow_gdbyQclvsa7yGrhF

You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

ellipsis-dev

👍 Looks good to me! Incremental review on 8b2387d in 55 seconds

More details

Looked at 13 lines of code in 1 files
Skipped 0 files when reviewing.
Skipped posting 2 drafted comments based on config settings.

1. tests/llm/test_anthropic/test_multimodal.py:240

Draft comment:
Using the directly imported PDF is cleaner than referencing instructor.multimodal.PDF.
Reason this comment was not posted:
Comment looked like it was already resolved.

2. tests/llm/test_anthropic/test_multimodal.py:240

Draft comment:
Use the directly imported PDF instead of instructor.multimodal.PDF to reduce redundancy.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 85%
None

Workflow ID: wflow_0lQ6GkQKKSNnGiPj

You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

ellipsis-dev

❌ Changes requested. Incremental review on eb40105 in 1 minute and 53 seconds

More details

Looked at 20 lines of code in 1 files
Skipped 0 files when reviewing.
Skipped posting 2 drafted comments based on config settings.

1. instructor/multimodal.py:696

Draft comment:
DRY: The literal {"cache_control": {"type": "ephemeral"}} is repeated in both branches. Consider refactoring to a common assignment to reduce duplication.
Reason this comment was not posted:
Confidence changes required: 66% <= threshold 85%
None

2. instructor/multimodal.py:707

Draft comment:
Consider extracting the repeated cache control literal into a constant to avoid duplication across branches.
Reason this comment was not posted:
Comment was on unchanged code.

Workflow ID: wflow_2B4hYRnH2CHaiwTq

Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

ellipsis-dev · 2025-04-01T12:03:52Z

instructor/multimodal.py

+                    "type": "url",
+                    "url": self.source,
+                },
+                "cache_control": {"type": "ephemeral"},


Consider extracting the repeated literal for cache control ("cache_control": {"type": "ephemeral"}) into a constant to adhere to the DRY principle.

ellipsis-dev

❌ Changes requested. Incremental review on 1e28fa9 in 1 minute and 53 seconds

More details

Looked at 212 lines of code in 2 files
Skipped 0 files when reviewing.
Skipped posting 8 drafted comments based on config settings.

1. instructor/multimodal.py:507

Draft comment:
Updated error message is clearer now.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 85%
None

2. instructor/multimodal.py:616

Draft comment:
Consider adding a timeout or maximum retry count in the while loop to prevent potential infinite waiting if file.state never becomes ACTIVE.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 80% vs. threshold = 85%
This is a reasonable suggestion for production code. Infinite loops are generally dangerous and having a timeout is a good practice. The code could potentially hang forever if there's an issue with the file processing. However, I need to be careful not to be too speculative - is this a real issue that needs fixing?
The comment is somewhat speculative - we don't know if file.state failing to become ACTIVE is actually a real issue that happens in practice. Maybe the Google API guarantees it will eventually become active.
Even if it's rare, having no timeout is still a code quality issue. Infinite loops are dangerous in production systems and timeouts are a standard safety mechanism. The fix is simple and low-risk.
This is a valid code quality suggestion that would improve the robustness of the code with minimal downside. While somewhat speculative, the risk of infinite loops justifies adding a timeout.

3. instructor/multimodal.py:630

Draft comment:
Provide more context in the error message when an uploaded PDF does not meet the expected state.
Reason this comment was not posted:
Confidence changes required: 80% <= threshold 85%
None

4. instructor/multimodal.py:764

Draft comment:
In to_genai, consider handling errors when fetching and decoding base64 content from the URL to better inform users of failures.
Reason this comment was not posted:
Confidence changes required: 80% <= threshold 85%
None

5. tests/llm/test_genai/test_multimodal.py:53

Draft comment:
Ensure that both local path and base64 string PDF sources are well tested for autodetection, as shown in the parametrization.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 85%
None

6. instructor/multimodal.py:505

Draft comment:
Improved error message clarity – ensure related docs are updated accordingly.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 85%
None

7. instructor/multimodal.py:574

Draft comment:
Removed gs:// handling from PDF.autodetect – confirm this removal is intentional and update documentation if necessary.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 85%
None

8. instructor/multimodal.py:623

Draft comment:
For consistency and testability, consider accepting a client as a parameter in 'from_uploaded_genai_file' rather than instantiating a new Client internally.
Reason this comment was not posted:
Confidence changes required: 66% <= threshold 85%
None

Workflow ID: wflow_0JxcRWaic1WQ0gRM

Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

instructor/multimodal.py

ellipsis-dev

❌ Changes requested. Incremental review on 2d4e0dd in 1 minute and 57 seconds

More details

Looked at 175 lines of code in 4 files
Skipped 0 files when reviewing.
Skipped posting 11 drafted comments based on config settings.

1. instructor/multimodal.py:788

Draft comment:
Consider adding a docstring to the new 'to_mistral' method in the PDF class to explain its purpose and limitations (only supports document URLs).
Reason this comment was not posted:
Confidence changes required: 80% <= threshold 85%
None

2. instructor/process_response.py:264

Draft comment:
Ensure that new_kwargs['messages'] is always set when calling convert_to_mistral_messages to avoid potential KeyError.
Reason this comment was not posted:
Confidence changes required: 60% <= threshold 85%
None

3. instructor/utils.py:1010

Draft comment:
Using 'isinstance(content_part, (PDF))' is equivalent to 'isinstance(content_part, PDF)'; consider using the simpler form for clarity.
Reason this comment was not posted:
Confidence changes required: 70% <= threshold 85%
None

4. tests/llm/test_mistral/test_multimodal.py:6

Draft comment:
Verify that the PDF asset URL remains stable to ensure reliable test outcomes over time.
Reason this comment was not posted:
Confidence changes required: 80% <= threshold 85%
None

5. instructor/multimodal.py:406

Draft comment:
Include an inline comment clarifying why PDF objects are only processed for MISTRAL modes in this branch.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

6. instructor/utils.py:1010

Draft comment:
For clarity, consider using tuple syntax (PDF,) instead of (PDF) in isinstance checks.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 80% vs. threshold = 85%
The suggestion is technically correct - when checking for a single type in isinstance(), using (PDF,) is clearer than (PDF) since it explicitly shows tuple syntax. (PDF) works but could be confused with just parentheses. This is a minor but real improvement in code clarity. The change is simple and unambiguous.
The change is very minor and the code works correctly either way. Some might argue this level of nitpicking isn't worth a comment.
While minor, this is a clear, actionable suggestion that follows Python best practices and improves code clarity with virtually no downside. The fact that it's a small change makes it easy to implement.
Keep the comment as it suggests a clear, actionable improvement to code clarity that follows Python best practices.

7. tests/llm/test_mistral/test_multimodal.py:25

Draft comment:
Consider mocking the HTTP request in PDF.from_url to avoid external dependency and ensure test reliability.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 85%
This is a test file and generally it's good practice to mock external dependencies in tests. However, this appears to be an integration test specifically testing the PDF processing capabilities with Mistral. Mocking the HTTP request might defeat the purpose of testing the full PDF processing pipeline. The PDF is hosted on the same repository, making it relatively stable.
I might be underestimating the importance of test isolation and reliability. External requests can make tests flaky and slow.
While test isolation is important, this appears to be an intentional integration test that needs to test the full PDF processing pipeline, including URL fetching. The PDF is hosted on the project's own GitHub repository, providing reasonable stability.
The comment should be deleted as it suggests a change that could compromise the test's purpose of validating the full PDF processing pipeline.

8. instructor/process_response.py:108

Draft comment:
Typo: In the debug log stating 'Returning takes from IterableBase', 'takes' should be corrected to 'tasks' to match the actual list comprehension that returns tasks.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

9. instructor/process_response.py:183

Draft comment:
Typo: The log message 'Returning takes from IterableBase' should be updated to 'Returning tasks from IterableBase' for clarity.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

10. instructor/process_response.py:396

Draft comment:
Typo: The variable 'implict_forced_tool_message' is misspelled. It should be renamed to 'implicit_forced_tool_message' to correctly reflect its meaning.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

11. instructor/utils.py:1010

Draft comment:
In the convert_to_mistral_messages function, the isinstance check for PDF is written as isinstance(content_part, (PDF)). Although this works, it's more clear and consistent to use a tuple with a trailing comma, e.g. isinstance(content_part, (PDF,)), especially since elsewhere in the code multiple types are organized in a tuple. Consider updating this for consistency.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 85%
While the suggestion would make the code more consistent with other similar checks in the file, this is a very minor style issue. The code works correctly either way. The rules state not to make comments that are obvious or unimportant. This seems to fall into that category - it's a purely stylistic suggestion that doesn't impact functionality or readability significantly.
The code would be more consistent with the suggested change, and consistency in code style can make maintenance easier. The comment is also very specific and actionable.
While consistency is good, this is too minor of an issue to warrant a PR comment. The rules explicitly state not to make comments that are obvious or unimportant.
Delete this comment. While technically correct, it's too minor of a style issue to warrant a PR comment, and the rules specifically say not to make unimportant comments.

Workflow ID: wflow_JT0igdl5t6srbtKU

Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

instructor/multimodal.py

ellipsis-dev

👍 Looks good to me! Incremental review on 5afaa11 in 1 minute and 52 seconds

More details

Looked at 24 lines of code in 1 files
Skipped 0 files when reviewing.
Skipped posting 4 drafted comments based on config settings.

1. instructor/multimodal.py:417

Draft comment:
Avoid using '# type: ignore' without explanation. Consider using an explicit cast or adding a comment on why ignoring type checks is safe.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 85%
The type: ignore comment is likely needed because the type checker can't verify that content.to_mistral() returns the correct type for converted_contents. However, the code appears to be working correctly and the type: ignore is a reasonable solution. Adding a comment explaining this wouldn't make the code any clearer or safer.
The comment has a point that unexplained type: ignore comments can be confusing. However, in this case the type issue is fairly obvious from context.
While documenting type ignores can be good practice, in this case the reason is clear from the surrounding code context and an explanation would just add noise.
The comment should be deleted. The type: ignore is appropriate here and its purpose is clear enough from context. Adding an explanation would not meaningfully improve code quality.

2. instructor/multimodal.py:637

Draft comment:
Avoid bare '# type: ignore' on file.uri and file.mime_type. Please clarify or cast explicitly.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 80% vs. threshold = 85%
The type ignore comments are used because the Google GenAI API types may not properly indicate that uri and mime_type are strings. Using explicit casts would make the type coercion more intentional and clear. The suggestion follows good typing practices by being explicit rather than suppressing type checks. The change would improve code maintainability without changing runtime behavior.
The comment assumes the types should be strings without verifying the actual types from the Google GenAI API. There could be a reason why type ignore was chosen over casting.
Even if the original types are not strings, explicit casting is still better practice than type ignore as it makes the intended type conversion clear and localized.
The comment should be kept as it suggests a concrete improvement to type safety by replacing vague type ignores with explicit casts.

3. instructor/multimodal.py:417

Draft comment:
Prefer explicit casting or proper type annotations over '# type: ignore' for content.to_mistral().
Reason this comment was not posted:
Confidence changes required: 66% <= threshold 85%
None

4. instructor/multimodal.py:637

Draft comment:
Avoid using '# type: ignore' for file.uri and file.mime_type; use explicit casts if necessary.
Reason this comment was not posted:
Confidence changes required: 66% <= threshold 85%
None

Workflow ID: wflow_zdwgKSFOz2My0OQ0

You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

…ge examples and methods

ellipsis-dev

❌ Changes requested. Incremental review on 05d5ee6 in 1 minute and 36 seconds

More details

Looked at 236 lines of code in 3 files
Skipped 0 files when reviewing.
Skipped posting 3 drafted comments based on config settings.

1. tests/llm/test_mistral/test_multimodal.py:45

Draft comment:
Consider adding tests for additional PDF loading methods (e.g., from_path, from_base64) to fully cover the new PDF functionality.
Reason this comment was not posted:
Confidence changes required: 80% <= threshold 85%
None

2. docs/concepts/pdf.md:14

Draft comment:
Typo: 'reciept' should be 'receipt'.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 80% vs. threshold = 85%
While spelling errors in documentation should generally be fixed, we need to consider if this meets the bar for being an important enough issue. This is user-facing documentation, and "receipt" is a technical term that should be spelled correctly. The correction is simple, clear, and unambiguous.
The typo doesn't affect code functionality or understanding of the documentation. It could be considered a minor issue that doesn't warrant a comment.
However, since this is public documentation and "receipt" is a technical term, having it spelled correctly maintains professionalism and prevents confusion.
Keep the comment. The spelling correction is straightforward and improves the quality of public documentation.

3. docs/concepts/pdf.md:14

Draft comment:
Typo detected: Please change 'reciept' to 'receipt' to correct the spelling error.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 80% vs. threshold = 85%
While spelling errors should be fixed, especially in documentation, the rules don't specifically address spelling/grammar fixes. However, since this is documentation that will be user-facing, and it's a clear objective error in newly added content, it seems reasonable to keep this comment.
Documentation readability and professionalism could be considered outside the scope of the review rules. Maybe this should be handled by a spell checker instead of PR comments.
Since this is user-facing documentation and the error is objective and easily fixable, keeping the comment helps maintain documentation quality without creating unnecessary work.
Keep the comment as it identifies a clear spelling error in newly added documentation that should be fixed for professionalism.

Workflow ID: wflow_xSTClFQeoDdhskZG

Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

ellipsis-dev · 2025-04-01T21:51:29Z

docs/concepts/pdf.md

+
+## Basic Usage
+
+Here's a simple example of how to use PDFs with instructor where we'll extract out the line items and the total cost from a reciept invoice.


Typo: Change 'reciept' to 'receipt' for proper spelling.

Suggested change

Here's a simple example of how to use PDFs with instructor where we'll extract out the line items and the total cost from a reciept invoice.

Here's a simple example of how to use PDFs with instructor where we'll extract out the line items and the total cost from a receipt invoice.

ellipsis-dev

❌ Changes requested. Incremental review on 8ee8970 in 2 minutes and 53 seconds

More details

Looked at 379 lines of code in 3 files
Skipped 0 files when reviewing.
Skipped posting 7 drafted comments based on config settings.

1. docs/integrations/openai.md:175

Draft comment:
Typo: 'vison' should be 'vision'.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 80% vs. threshold = 85%
This is a simple typo fix in documentation. While typos are generally minor, this one is in a newly added section and could cause confusion since "vision" is a technical term referring to the model's capabilities. The fix is clear and unambiguous.
The typo is minor and doesn't affect code functionality. Documentation typos could be considered too trivial to comment on.
Since this is in newly added documentation about a technical capability, having the correct spelling of "vision" is important for clarity and professionalism.
Keep the comment since it fixes a typo in new documentation where technical accuracy is important.

2. docs/integrations/mistral.md:354

Draft comment:
PDF Analysis example incorrectly uses the Anthropic client and model. It should use the Mistral client (from_mistral) and an appropriate Mistral model (e.g. 'mistral-large-latest').
Reason this comment was not posted:
Marked as duplicate.

3. docs/integrations/openai.md:315

Draft comment:
Possible stray closing bracket in the expected output comment for the Audio analysis example.
Reason this comment was not posted:
Confidence changes required: 66% <= threshold 85%
None

4. docs/integrations/anthropic.md:436

Draft comment:
Typo: 'vison capabilities' should be corrected to 'vision capabilities'.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 85%
While this is technically a typo, the review rules emphasize focusing on logic and code quality issues rather than minor text fixes. The typo doesn't impact functionality or understanding significantly. The rules say to only keep comments that require clear code changes, and this is more of a cosmetic fix.
The typo could potentially confuse readers since "vison" is not a real word. Documentation clarity is important for usability.
While documentation clarity matters, this typo is minor enough that most readers would still understand the meaning from context. The review rules emphasize focusing on more substantial issues.
Delete this comment as it focuses on a minor text issue rather than a substantive code or logic problem that requires changing.

5. docs/integrations/mistral.md:338

Draft comment:
Typo: In the 'PDF Analysis' section, consider replacing 'Anthropic's Claude models' with 'Mistral's models' to keep the naming consistent and avoid confusion.
Reason this comment was not posted:
Marked as duplicate.

6. docs/integrations/openai.md:175

Draft comment:
Typo: 'vison' should be corrected to 'vision'.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 80% vs. threshold = 85%
While this is a legitimate typo, typo corrections in documentation are generally low-value comments that clutter the review. The typo is minor and doesn't affect code functionality or understanding. However, since this is user-facing documentation, clarity and professionalism are important.
The typo is very minor and doesn't significantly impact readability. The comment could be seen as nitpicky.
However, this is public-facing documentation where professionalism matters, and the fix is simple and clear.
Keep the comment since it improves documentation quality with minimal effort.

7. docs/integrations/openai.md:421

Draft comment:
Typo: 'ambigious' should be corrected to 'ambiguous'.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

Workflow ID: wflow_N4Ulia4qOhUZl3Vm

Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

ellipsis-dev · 2025-04-02T14:35:28Z

docs/integrations/mistral.md

+
+Note that we support local files and base64 strings too with the `from_file` and the `from_base64` class methods.
+
+### PDF Analysis


The PDF Analysis section is copy-pasted from Anthropic. Update text to refer to Mistral and adjust the imports accordingly.

ellipsis-dev · 2025-04-02T14:35:28Z

docs/integrations/openai.md

+)
+
+print(response)
+# > transcript='Four score and seven years ago our fathers..."]


Remove stray closing bracket in the sample output comment for Audio.

Suggested change

# > transcript='Four score and seven years ago our fathers..."]

# > transcript='Four score and seven years ago our fathers...'

ellipsis-dev · 2025-04-02T14:35:29Z

docs/integrations/anthropic.md

+
+> For a more in-depth walkthrough of the Image component, check out the [docs here](../concepts/multimodal.md)
+
+Instructor makes it easy to analyse and extract semantic information from images using Anthropic's Claude models. [Click here](https://docs.anthropic.com/en/docs/about-claude/models/all-models) to check if the model you'd like to use has vison capabilities.


Typo: 'vison capabilities' should be 'vision capabilities'.

Suggested change

Instructor makes it easy to analyse and extract semantic information from images using Anthropic's Claude models. [Click here](https://docs.anthropic.com/en/docs/about-claude/models/all-models) to check if the model you'd like to use has vison capabilities.

Instructor makes it easy to analyse and extract semantic information from images using Anthropic's Claude models. [Click here](https://docs.anthropic.com/en/docs/about-claude/models/all-models) to check if the model you'd like to use has vision capabilities.

ellipsis-dev · 2025-04-02T14:35:30Z

docs/integrations/openai.md

+
+By abstracting away the specific implementation details to support each modality, we're able to combine Multiple modalities with ease.
+
+Below let's see an example where we're asking the model to write a story based off the audio file and the image it sees as the seting for the story.


Typo: 'seting' should be corrected to 'setting'.

Suggested change

Below let's see an example where we're asking the model to write a story based off the audio file and the image it sees as the seting for the story.

Below let's see an example where we're asking the model to write a story based off the audio file and the image it sees as the setting for the story.

docs/integrations/openai.md

…ding PDFs

ellipsis-dev

👍 Looks good to me! Incremental review on 0de14bd in 2 minutes and 23 seconds

More details

Looked at 121 lines of code in 3 files
Skipped 0 files when reviewing.
Skipped posting 11 drafted comments based on config settings.

1. docs/integrations/genai.md:406

Draft comment:
Great addition of a PDF Processing section with examples.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 85%
None

2. docs/integrations/openai.md:225

Draft comment:
Updated PDF example with correct comments; looks consistent with the new PDF support.
Reason this comment was not posted:
Confidence changes required: 0% <= threshold 85%
None

3. instructor/multimodal.py:616

Draft comment:
The new parameters 'max_retries' and 'retry_delay' improve robustness. Consider using a more specific exception than Exception.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 85%
While using more specific exception types is generally good practice, in this case the distinction between Exception and RuntimeError is minimal. RuntimeError is still quite generic and doesn't add meaningful information. The error message itself is clear and descriptive. This seems like a nitpicky suggestion that doesn't materially improve the code.
The comment does promote a good practice of using specific exceptions. And RuntimeError would technically be more semantically accurate since this is a runtime issue, not a syntax/compile time issue.
However, the benefit is very minor since both Exception types are quite generic. The clear error message is doing the real work of explaining the error to users.
This comment should be deleted as it suggests a change that would not meaningfully improve the code. The existing implementation with a descriptive error message is sufficient.

4. docs/integrations/genai.md:437

Draft comment:
Doc mismatch: The snippet comments reference PDF.from_file(...), but the PDF class defines from_path. Update docs or add an alias for consistency.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 85%
I need to be careful here. While the comment points out a potential inconsistency, I don't have access to the actual PDF class implementation to verify if this is correct. The comment makes an assumption about the implementation that I can't verify. Additionally, even if true, this kind of minor documentation inconsistency could be considered not important enough to block the PR.
I could be wrong about dismissing this - consistent API naming is important for developer experience. If there really is an inconsistency, it could confuse users.
However, without being able to verify the actual implementation, keeping this comment risks making an incorrect suggestion. The safer choice is to not make assumptions about code we can't see.
The comment should be deleted since we cannot verify the actual implementation of the PDF class and don't want to risk making incorrect assumptions.

5. docs/integrations/openai.md:257

Draft comment:
Doc mismatch: The example comments use PDF.from_file(...), but the PDF class implements from_path. Consider updating the docs for consistency or providing an alias.
Reason this comment was not posted:
Comment was on unchanged code.

6. instructor/multimodal.py:22

Draft comment:
Duplicate import detected: from .mode import Mode is imported twice (line 2 and line 22). Remove one.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

7. instructor/multimodal.py:624

Draft comment:
Consider enhancing the retry loop in from_local_path_to_genai by adding logging and/or an exponential backoff strategy instead of a fixed sleep.
Reason this comment was not posted:
Confidence changes required: 80% <= threshold 85%
None

8. instructor/multimodal.py:688

Draft comment:
For consistency with documentation, consider adding an alias (e.g., from_file = from_path) in the PDF class so that users can use PDF.from_file as shown in the docs.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

9. docs/integrations/genai.md:407

Draft comment:
Typographical error: In the sentence 'Instructor makes it easy to analyse and extract semantic information from PDFs using the GenAI SDK .', remove the extra space before the period so that it reads '...using the GenAI SDK.'
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 80% vs. threshold = 85%
While this is a very minor issue, it is a clear formatting error in newly added text. The fix is simple and unambiguous. The comment provides a clear suggestion for the fix. Since this is documentation, formatting matters for professionalism.
The issue is extremely minor and could be considered too trivial to comment on. Some might argue that whitespace issues don't impact functionality.
Documentation quality and consistency matters for professional projects. The fix is clear and quick to implement.
Keep the comment as it identifies a clear formatting issue in new documentation with an unambiguous fix.

10. docs/integrations/openai.md:175

Draft comment:
Typo: 'vison capabilities' should be 'vision capabilities'.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

11. docs/integrations/openai.md:415

Draft comment:
Typo: 'ambigious' should be 'ambiguous'.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

Workflow ID: wflow_tls7odNYcfwe10ju

You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

jxnl added 3 commits March 18, 2025 20:28

feat(pdf): add PDF class to multimodal module

0d6e226

- Add PDF class with support for URL and base64 loading\n- Implement format detection and conversion\n- Add provider-specific formatting for OpenAI and Anthropic\n- Include error handling and validation

test(pdf): add comprehensive test suite for PDF support

69e38c5

- Add core PDF functionality tests\n- Add OpenAI-specific PDF tests\n- Add Anthropic-specific PDF tests\n- Include error handling tests

docs(pdf): add PDF support documentation

a1bb82f

- Add comprehensive PDF usage guide\n- Include provider-specific details\n- Add code examples for both OpenAI and Anthropic\n- Document best practices and error handling\n- Update mkdocs.yml to include PDF documentation

jxnl requested a review from ivanleomk March 19, 2025 03:28

github-actions bot added documentation Improvements or additions to documentation enhancement New feature or request python Pull requests that update python code size:M This PR changes 30-99 lines, ignoring generated files. labels Mar 19, 2025

ellipsis-dev bot reviewed Mar 19, 2025

View reviewed changes

docs/concepts/pdf.md Outdated Show resolved Hide resolved

Apply suggestions from code review

33540de

Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>

ivanleomk added 2 commits April 1, 2025 10:30

Merge branch 'main' into feat/pdf-support

5ca6045

feat: add openai completions support

6ae61e6

ellipsis-dev bot reviewed Apr 1, 2025

View reviewed changes

tests/llm/test_openai/test_multimodal.py Outdated Show resolved Hide resolved

instructor/multimodal.py Outdated Show resolved Hide resolved

ivanleomk added 5 commits April 1, 2025 13:23

Merge branch 'main' into feat/pdf-support

184ee61

feat: add anthropic support

eddf847

fix: refactor out url to base64 logic into a single function

7e16113

fix: fix naming issue

365be0b

fix: remove some tests from test_multimodal

a8831ef

ellipsis-dev bot reviewed Apr 1, 2025

View reviewed changes

Fix: ruff errors

d0cc7ae

ellipsis-dev bot reviewed Apr 1, 2025

View reviewed changes

fix: more ruff

8b2387d

ellipsis-dev bot reviewed Apr 1, 2025

View reviewed changes

fix: add cache control to PDF model

eb40105

ellipsis-dev bot reviewed Apr 1, 2025

View reviewed changes

fix: fixed ordering in Gemini by manually converting to parts

1e28fa9

ellipsis-dev bot reviewed Apr 1, 2025

View reviewed changes

instructor/multimodal.py Show resolved Hide resolved

fix: configured mistral PDF support

2d4e0dd

ellipsis-dev bot reviewed Apr 1, 2025

View reviewed changes

instructor/multimodal.py Show resolved Hide resolved

fix: add type ignore comments for Mistral PDF conversion

5afaa11

ellipsis-dev bot reviewed Apr 1, 2025

View reviewed changes

chore: update version to 1.7.9 and enhance PDF documentation with usa…

05d5ee6

…ge examples and methods

ellipsis-dev bot reviewed Apr 1, 2025

View reviewed changes

feat: add-docs

8ee8970

ellipsis-dev bot reviewed Apr 2, 2025

View reviewed changes

ivanleomk added 2 commits April 2, 2025 16:36

Merge branch 'main' into feat/pdf-support

0a6cd13

feat: enhance PDF processing documentation and update methods for loa…

0de14bd

…ding PDFs

ellipsis-dev bot reviewed Apr 2, 2025

View reviewed changes

ivanleomk closed this Apr 3, 2025

jxnl deleted the feat/pdf-support branch November 12, 2025 00:54


		## Basic Usage

		Here's a simple example of how to use PDFs with instructor where we'll extract out the line items and the total cost from a reciept invoice.


		Note that we support local files and base64 strings too with the `from_file` and the `from_base64` class methods.

		### PDF Analysis

	# > transcript='Four score and seven years ago our fathers..."]
	# > transcript='Four score and seven years ago our fathers...'


		> For a more in-depth walkthrough of the Image component, check out the [docs here](../concepts/multimodal.md)

		Instructor makes it easy to analyse and extract semantic information from images using Anthropic's Claude models. [Click here](https://docs.anthropic.com/en/docs/about-claude/models/all-models) to check if the model you'd like to use has vison capabilities.


		By abstracting away the specific implementation details to support each modality, we're able to combine Multiple modalities with ease.

		Below let's see an example where we're asking the model to write a story based off the audio file and the image it sees as the seting for the story.

Uh oh!

Conversation

jxnl commented Mar 19, 2025 • edited by ellipsis-dev bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

cloudflare-workers-and-pages bot commented Mar 19, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Deploying instructor-py with Cloudflare Pages

Uh oh!

ellipsis-dev bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

jxnl commented Mar 19, 2025

Uh oh!

ellipsis-dev bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

ellipsis-dev bot left a comment

Choose a reason for hiding this comment

Uh oh!

ellipsis-dev bot left a comment

Choose a reason for hiding this comment

Uh oh!

ellipsis-dev bot left a comment

Choose a reason for hiding this comment

Uh oh!

ellipsis-dev bot left a comment

Choose a reason for hiding this comment

Uh oh!

ellipsis-dev bot Apr 1, 2025

Choose a reason for hiding this comment

Uh oh!

ellipsis-dev bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

ellipsis-dev bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

ellipsis-dev bot left a comment

Choose a reason for hiding this comment

Uh oh!

ellipsis-dev bot left a comment

Choose a reason for hiding this comment

Uh oh!

ellipsis-dev bot Apr 1, 2025

Choose a reason for hiding this comment

Uh oh!

ellipsis-dev bot left a comment

Choose a reason for hiding this comment

Uh oh!

ellipsis-dev bot Apr 2, 2025

Choose a reason for hiding this comment

Uh oh!

ellipsis-dev bot Apr 2, 2025

Choose a reason for hiding this comment

Uh oh!

ellipsis-dev bot Apr 2, 2025

Choose a reason for hiding this comment

Uh oh!

ellipsis-dev bot Apr 2, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

ellipsis-dev bot left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

jxnl commented Mar 19, 2025 •

edited by ellipsis-dev bot

Loading

cloudflare-workers-and-pages bot commented Mar 19, 2025 •

edited

Loading