[Data] - Replace read_images test with testing invalid bytes instead testing with empty file #62647

Merged
goutamvenkat-anyscale merged 2 commits into ray-project:master from goutamvenkat-anyscale:goutam/test_image_fix
Apr 16, 2026

@goutamvenkat-anyscale
Contributor

Description

A ValueError should be raised only when the contents of the file are invalid, not when the file is empty.

…of empty file

Signed-off-by: Goutam <goutam@anyscale.com>
@goutamvenkat-anyscale goutamvenkat-anyscale requested a review from a team as a code owner April 15, 2026 23:54
@goutamvenkat-anyscale goutamvenkat-anyscale added the data (Ray Data-related issues) and go (add ONLY when ready to merge, run all tests) labels Apr 15, 2026
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request refactors the image datasource tests to use the tmp_path fixture and updates the test for unidentified image errors to use a file with invalid bytes. Feedback indicates that the implementation still fails to address the primary objective of handling empty files without raising a ValueError. Additionally, a suggestion was made to use the existing path variable in the test for better clarity and correctness.

    with tempfile.NamedTemporaryFile(suffix=".png") as file:
        with pytest.raises(ValueError):
            ray.data.read_images(paths=file.name).materialize()
def test_unidentified_image_error(ray_start_regular_shared, tmp_path):

high

The PR description states that ValueError should not be raised for empty files. However, this change only updates the error test to use non-empty invalid bytes. The actual implementation in ImageDatasource._read_stream (in python/ray/data/_internal/datasource/image_datasource.py) still raises ValueError for empty files because PIL.Image.open fails on empty buffers.

To fully address the PR's objective, the implementation should be updated to handle empty files (e.g., by returning an empty block), and a test case should be added to verify that reading an empty file does not raise an error and results in an empty dataset.
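The distinction the reviewer draws between an empty file and a file with invalid contents can be sketched without Ray or Pillow. This is only an illustration, not the code in ImageDatasource._read_stream: the helper name read_png_or_skip is hypothetical, and a check of the 8-byte PNG signature stands in for the real decoding step that PIL.Image.open performs.

```python
# Illustrative sketch only; not Ray's implementation.
# Assumption: a PNG signature check stands in for PIL.Image.open.

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # the 8 bytes that start every PNG file


def read_png_or_skip(data: bytes):
    """Hypothetical helper.

    Returns None for an empty buffer (the caller could emit an empty block),
    raises ValueError for non-empty bytes that do not look like a PNG,
    and returns the bytes unchanged otherwise.
    """
    if len(data) == 0:
        # Empty file: nothing to decode, so skip instead of raising.
        return None
    if not data.startswith(PNG_SIGNATURE):
        # Non-empty but unrecognizable content: this is the error case.
        raise ValueError("buffer does not contain a valid PNG image")
    return data
```

Under this sketch, b"spam" (the bytes used in the updated test) raises ValueError, while an empty buffer is skipped silently. Whether Ray Data should behave this way for empty files is the reviewer's suggestion, not something this PR implements.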

        file.write(b"spam")  # Invalid bytes for a PNG file

    with pytest.raises(ValueError):
        ray.data.read_images(paths=file.name).materialize()

medium

The variable path is already defined as a string on line 172. It is cleaner to use path directly instead of file.name, especially since the file object is closed at this point.

Suggested change
- ray.data.read_images(paths=file.name).materialize()
+ ray.data.read_images(paths=path).materialize()

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit eba3b7a.

        file.write(b"spam")  # Invalid bytes for a PNG file

    with pytest.raises(ValueError):
        ray.data.read_images(paths=file.name).materialize()

Uses closed file handle instead of existing path variable

Low Severity

file.name is used on line 177 after the with open(...) block has exited, even though the local variable path (defined on line 172) already holds the identical value. Referencing an attribute on a closed file handle outside its context manager is needlessly confusing when a clearer alternative is already in scope. paths=path would be more straightforward.
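A small standalone sketch of the point above, using only the standard library. The file name invalid.png and the temporary directory are assumptions made for illustration; the test itself uses pytest's tmp_path fixture. It shows that file.name remains readable after the with block closes the handle and equals path, so the two spellings behave the same and path is simply the clearer one.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical file name, standing in for the path built from tmp_path.
    path = os.path.join(tmp_dir, "invalid.png")

    with open(path, "wb") as file:
        file.write(b"spam")  # Invalid bytes for a PNG file

    # The handle is closed here, but its name attribute is still available.
    is_closed = file.closed
    same_name = file.name == path

    # Reading back through `path` makes no reference to the closed handle.
    with open(path, "rb") as reader:
        contents = reader.read()
```

Because file.name and path hold the same string, passing paths=path changes readability, not behavior.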

@goutamvenkat-anyscale goutamvenkat-anyscale enabled auto-merge (squash) April 16, 2026 15:36
@github-actions github-actions bot disabled auto-merge April 16, 2026 15:37
@goutamvenkat-anyscale goutamvenkat-anyscale merged commit 808478d into ray-project:master Apr 16, 2026
7 checks passed