[Data] - Replace read_images test with testing invalid bytes instead testing with empty file#62647
Conversation
…of empty file Signed-off-by: Goutam <goutam@anyscale.com>
There was a problem hiding this comment.
Code Review
This pull request refactors the image datasource tests to use the tmp_path fixture and updates the test for unidentified image errors to use a file with invalid bytes. Feedback indicates that the implementation still fails to address the primary objective of handling empty files without raising a ValueError. Additionally, a suggestion was made to use the existing path variable in the test for better clarity and correctness.
| with tempfile.NamedTemporaryFile(suffix=".png") as file: | ||
| with pytest.raises(ValueError): | ||
| ray.data.read_images(paths=file.name).materialize() | ||
| def test_unidentified_image_error(ray_start_regular_shared, tmp_path): |
There was a problem hiding this comment.
The PR description states that ValueError should not be raised for empty files. However, this change only updates the error test to use non-empty invalid bytes. The actual implementation in ImageDatasource._read_stream (in python/ray/data/_internal/datasource/image_datasource.py) still raises ValueError for empty files because PIL.Image.open fails on empty buffers.\n\nTo fully address the PR's objective, the implementation should be updated to handle empty files (e.g., by returning an empty block), and a test case should be added to verify that reading an empty file does not raise an error and results in an empty dataset.
| file.write(b"spam") # Invalid bytes for a PNG file | ||
|
|
||
| with pytest.raises(ValueError): | ||
| ray.data.read_images(paths=file.name).materialize() |
There was a problem hiding this comment.
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit eba3b7a. Configure here.
| file.write(b"spam") # Invalid bytes for a PNG file | ||
|
|
||
| with pytest.raises(ValueError): | ||
| ray.data.read_images(paths=file.name).materialize() |
There was a problem hiding this comment.
Uses closed file handle instead of existing path variable
Low Severity
file.name is used on line 177 after the with open(...) block has exited, even though the local variable path (defined on line 172) already holds the identical value. Referencing an attribute on a closed file handle outside its context manager is needlessly confusing when a clearer alternative is already in scope. paths=path would be more straightforward.
Reviewed by Cursor Bugbot for commit eba3b7a. Configure here.


Description
The valueerror should be raised only in the event that the contents of the file are invalid not if the file is empty
Related issues
Additional information