
Add support for 16 bits png images #4657

Merged · 10 commits into pytorch:main · Oct 21, 2021
Conversation

@NicolasHug NicolasHug commented Oct 19, 2021

Closes #4107
Closes #2218

This PR adds support for 16-bit PNGs. Since PyTorch doesn't support the uint16 dtype, we return int32 tensors instead (we indicate in the docs that we will return uint16 tensors in the future, if PyTorch starts supporting that dtype).

Among other things, this will enable training RAFT on the Kitti dataset, which currently can only be done by relying on OpenCV.

PIL support for 16-bit PNGs is a bit limited and buggy, especially for grayscale images (python-pillow/Pillow#3011). PIL also automatically converts the 16-bit values to uint8, losing tons of precision. This makes it hard to test. For this reason I only added tests for one RGB image and one RGBA image. According to a few ad-hoc tests, grayscale images are decoded properly (unlike with PIL).
Also, for all 200 Kitti-Flow ground-truth flow images, this code returns the exact same values as the cv2 version.

This code takes about the same time as cv2 to decode a 1567 x 1965 RGBA image. PIL is a lot faster, but I assume that is because it downcasts everything to uint8:

torchvision
228 ms ± 3.81 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
PIL
114 µs ± 9.46 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
CV2
234 ms ± 15.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Note: I observe the same relative performance on 8-bit images: torchvision == cv2 >> PIL

@NicolasHug NicolasHug marked this pull request as ready for review October 20, 2021 17:54
@NicolasHug NicolasHug changed the title WIP Add support for 16 bits png images Add support for 16 bits png images Oct 20, 2021
{int64_t(height), int64_t(width), channels},
bit_depth <= 8 ? torch::kU8 : torch::kI32);

if (bit_depth <= 8) {
NicolasHug (Member, Author):

This if block is unchanged and corresponds to the original code. I just renamed ptr to t_ptr, because the other block uses too many pointers for ptr to be explicit enough.

@@ -11,6 +11,11 @@ torch::Tensor decode_png(const torch::Tensor& data, ImageReadMode mode) {
}
#else

bool is_little_endian() {
NicolasHug (Member, Author):

// We're reading a 16bits png, but pytorch doesn't support uint16.
// So we read each row in a 16bits tmp_buffer which we then cast into
// a int32 tensor instead.
if (is_little_endian()) {
NicolasHug (Member, Author):

@fmassa I eventually realized that this was a much cleaner and simpler way to handle the endianness. The rest takes care of itself when we cast the uint16 value into an int32_t a few lines below.
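The helper boils down to checking the host's byte order at runtime. A self-contained sketch of the idea (an illustration, not necessarily the PR's exact implementation):

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// On a little-endian host the least-significant byte of a multi-byte
// integer is stored first in memory, so the first byte of 1u is 1.
bool is_little_endian() {
  uint32_t probe = 1;
  uint8_t first_byte = 0;
  std::memcpy(&first_byte, &probe, 1);
  return first_byte == 1;
}
```

PNG stores 16-bit samples in big-endian order, so on little-endian hosts the bytes of each sample need swapping into native order before the widening cast; checking the endianness once up front keeps that decision in one place.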

Member:

Awesome!

fmassa (Member) left a comment:

Thanks a ton for adding support for 16-bit PNGs!

I have one concern about current implementation, otherwise the rest LGTM!

Comment on lines 198 to 199
uint16_t* tmp_buffer =
(uint16_t*)malloc(num_pixels_per_row * sizeof(uint16_t));
Member:

This leads to a memory leak at the end of the function.

If you malloc, you need to free after it's used. But you'll also need to handle a few corner cases on the freeing side (what if png_read_row fails?).

I think it would be easier to just allocate the buffer via torch::empty (or raw data via at::DataPtr with at::getCPUAllocator()->allocate(length), but I think torch::empty is easier to use; up to you).
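The leak is the classic malloc-without-matching-free pattern, and any early exit (such as a failing png_read_row) makes manual freeing fiddly. The PR ultimately switched to torch::empty; a dependency-free sketch of the same RAII idea using std::vector (an illustration of the principle, not the merged code):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// RAII row buffer: the vector's destructor releases the memory when the
// buffer goes out of scope, even if a later decoding step throws or
// returns early -- no explicit free() and no leak.
std::vector<uint16_t> make_row_buffer(size_t num_pixels_per_row) {
  return std::vector<uint16_t>(num_pixels_per_row, 0);
}
```

torch::empty achieves the same ownership guarantee, with the extra benefit that the buffer's lifetime is tied to a tensor the rest of the code already knows how to handle.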

@@ -61,7 +61,12 @@ def decode_png(input: torch.Tensor, mode: ImageReadMode = ImageReadMode.UNCHANGE
"""
Decodes a PNG image into a 3 dimensional RGB Tensor.
Optionally converts the image to the desired format.
The values of the output tensor are uint8 between 0 and 255.
The values of the output tensor are uint8 between 0 and 255, except for
16-bits pngs which are int32 tensors.
Member:

Can you also mention the range for 16-bit pngs, which is from 0-65k?


@NicolasHug NicolasHug mentioned this pull request Oct 21, 2021
NicolasHug commented Oct 21, 2021

Thanks for the review!

Test failures are unrelated: #4683

Good to go @fmassa ?

fmassa (Member) left a comment:

Thanks!

auto tmp_buffer_tensor = torch::empty(
{int64_t(num_pixels_per_row * sizeof(uint16_t))}, torch::kU8);
uint16_t* tmp_buffer =
(uint16_t*)tmp_buffer_tensor.accessor<uint8_t, 1>().data();
Member:

Nit, because it was already like this before: you can just do tmp_buffer_tensor.data_ptr<uint8_t>()

@fmassa fmassa merged commit e32f543 into pytorch:main Oct 21, 2021
facebook-github-bot pushed a commit that referenced this pull request Oct 26, 2021
Summary:
* WIP

* cleaner code

* Add tests

* Add docs

* Assert dtype

* put back check

* Address comments

Reviewed By: NicolasHug

Differential Revision: D31916334

fbshipit-source-id: 8877266f6e533e8c45c5f202e535944a9a939376

Co-authored-by: Francisco Massa <fvsmassa@gmail.com>
cyyever pushed a commit to cyyever/vision that referenced this pull request Nov 16, 2021
Successfully merging this pull request may close these issues:

Support pngs with more than 8 bits
Loading 16bit png images