[Data] Add Dataset.write_images
#38228
Signed-off-by: Balaji Veeramani <balaji@anyscale.com>
Let's tag this as an alpha API and make the necessary changes to the documentation (API reference, working with images)?
…o write-images Signed-off-by: Balaji Veeramani <balaji@anyscale.com>
The failing code-format check is unrelated.
def write_images(
    self,
    path: str,
    column: str,
Why do we require users to set column by default? Should this be optional? By default, we should assume the dataset has only one column and that users want to write it as an image, right?
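A minimal sketch of what an optional column could look like. Note that resolve_image_column is a hypothetical helper written for illustration, not part of Ray, and the error wording is an assumption:

```python
from typing import Optional, Sequence


def resolve_image_column(
    column_names: Sequence[str], column: Optional[str] = None
) -> str:
    """Pick the column to write as images (hypothetical helper, not Ray API)."""
    if column is not None:
        # An explicit choice always wins, but it must exist in the dataset.
        if column not in column_names:
            raise ValueError(
                f"Column {column!r} not found; available columns: {list(column_names)}"
            )
        return column
    if len(column_names) == 1:
        # Unambiguous: the dataset has exactly one column.
        return column_names[0]
    raise ValueError(
        "The dataset has multiple columns; pass `column` to choose one of "
        f"{list(column_names)}."
    )
```

With this shape, a single-column dataset needs no argument, while a multi-column dataset fails early with a message that lists the candidates.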
image = Image.fromarray(row[column])
buffer = io.BytesIO()
image.save(buffer, format=file_format)
What does the error message look like if the user provides an illegal file format?
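One quick way to answer this is to run the same Pillow calls outside Ray and capture whatever is raised. The sketch below assumes Pillow and NumPy are installed; try_encode is a hypothetical helper. In the Pillow versions I'm aware of, an unregistered format name surfaces as a bare KeyError, but that is worth confirming against the installed version:

```python
import io

import numpy as np
from PIL import Image


def try_encode(array: np.ndarray, file_format: str) -> str:
    """Return "ok" on success, else the exception Pillow raised (hypothetical helper)."""
    image = Image.fromarray(array)
    buffer = io.BytesIO()
    try:
        image.save(buffer, format=file_format)
    except (KeyError, ValueError) as error:
        # Report the raw exception so we can see what a user would be shown.
        return f"{type(error).__name__}: {error}"
    return "ok"


pixels = np.zeros((4, 4, 3), dtype=np.uint8)  # tiny black RGB image
print(try_encode(pixels, "png"))
print(try_encode(pixels, "not-a-format"))
```

If the raw message turns out to be unhelpful, wrapping the save call and re-raising a ValueError that names the bad format would be one option.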
@pytest.mark.parametrize("file_format", [None, "png"])
def test_write_images(ray_start_regular_shared, file_format, tmp_path):
    ds = ray.data.read_images("example://image-datasets/simple")
    ds.write_images(
Can we also test the following cases:
- the dataset has multiple columns
- the provided file format is illegal
- write_images(), then read_images() back.
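The third case could be sketched roughly as below. This is not the test from the PR: check_write_then_read_roundtrip is a hypothetical name, the "image" column is assumed to be what read_images produces, and Ray is imported lazily so the snippet loads even where Ray is not installed:

```python
def check_write_then_read_roundtrip(out_dir: str) -> int:
    """Write the sample images to out_dir, read them back, and compare row counts."""
    import ray  # lazy import: only needed when the check actually runs

    ds = ray.data.read_images("example://image-datasets/simple")
    ds.write_images(out_dir, column="image", file_format="png")

    restored = ray.data.read_images(out_dir)
    # Every written row should come back as exactly one image.
    assert restored.count() == ds.count()
    return restored.count()
```

Inside a pytest test, this would be called with str(tmp_path) so each run writes to a fresh directory.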
path: The path to the destination root directory, where
    the images are written to.
column: The column containing the data you want to write to images.
file_format: The image file format to write with. For available options,
It's good that we provide a link here. Let's go one step further and give users a list of common file formats, so they don't need to read the other documentation:
Examples of popular file formats are "png", "jpg", "jpeg", "tif", "tiff", "bmp", "gif". By default, images are written in "png" format.
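For reference, the formats that the installed Pillow can write can also be listed programmatically. The sketch below assumes Pillow is installed; writable_formats_by_extension is a hypothetical helper. One caveat, as far as I know: the format argument of Image.save expects Pillow's format names (for example "JPEG"), so extension-style names such as "jpg" or "tif" may need mapping first:

```python
from PIL import Image


def writable_formats_by_extension() -> dict[str, str]:
    """Map file extensions to the Pillow format names that can be saved."""
    # registered_extensions() loads all plugins and returns e.g. {".jpg": "JPEG"}.
    extensions = Image.registered_extensions()
    # Image.SAVE holds the formats that have a registered save handler.
    return {ext: fmt for ext, fmt in sorted(extensions.items()) if fmt in Image.SAVE}


formats = writable_formats_by_extension()
print(formats[".jpg"], formats[".tif"], formats[".png"])
```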
Users want to write images back to S3 (for example, segmentation masks from batch inference). So, this PR adds a Dataset.write_images API. Signed-off-by: NripeshN <nn2012@hw.ac.uk>
ray-project#37043, ray-project#37874, and ray-project#38228 broke the code-format lint. This PR fixes it. Signed-off-by: NripeshN <nn2012@hw.ac.uk>
Why are these changes needed?
Users want to write images back to S3 (for example, segmentation masks from batch inference). So, this PR adds a Dataset.write_images API.
Related issue number
Checks
- … git commit -s) in this PR.
- … scripts/format.sh to lint the changes in this PR.
- … method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.
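The use case from the description (writing segmentation masks back to S3) might look roughly like this. It is a sketch under assumptions: write_masks is a hypothetical wrapper, "mask" is an assumed column name holding uint8 arrays, and the bucket path in the comment is a placeholder:

```python
def write_masks(ds, destination: str) -> None:
    """Write the assumed "mask" column of a Ray Dataset as PNG files."""
    # `ds` is expected to be a ray.data.Dataset produced by batch inference.
    ds.write_images(destination, column="mask", file_format="png")


# Example call (placeholder bucket, not a real location):
# write_masks(predictions, "s3://example-bucket/segmentation-masks")
```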