Investigate using ARM NEON instructions to speed up image processing #7

Closed
stapelberg opened this issue May 3, 2017 · 6 comments

@stapelberg
Owner

stapelberg commented May 3, 2017

http://hilbert-space.de/?p=22 contains examples of how one could approach converting from color to grayscale.

We currently perform the following operations:

  • (3s) binarization (color → black/white)
  • (5s) rotating by 180 degrees
  • (4s) g3 compression
  • (15s) JPEG encoding
  • (TODO) PNG encoding of the first page (thumbnail)

Binarization and rotation should be easy to implement, but they also offer the smallest wins. Speeding up JPEG encoding would be the biggest win, but I’m not sure whether that’s possible.
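For reference, a scalar (non-NEON) version of the binarization step might look like the Go sketch below. It assumes tightly packed 8-bit RGB input, fixed-point luma weights 77/151/28 (sum 256, similar to the integer approach in the linked article), a caller-chosen threshold, and 1-bit output packed MSB-first with a set bit meaning black. The name `binarize` and these layout choices are illustrative assumptions, not scan2drive’s actual code. A NEON version would handle 8 or 16 pixels per instruction instead of one.

```go
package main

import "fmt"

// binarize converts tightly packed 8-bit RGB pixels (3 bytes per pixel, no
// row padding) into a 1-bit bitmap packed MSB-first, where a set bit means
// black. width*height must be a multiple of 8 (assumed layout).
func binarize(rgb []byte, width, height int, threshold uint8) []byte {
	out := make([]byte, width*height/8)
	for i := 0; i < width*height; i++ {
		r := uint32(rgb[3*i])
		g := uint32(rgb[3*i+1])
		b := uint32(rgb[3*i+2])
		// Fixed-point luma: 77+151+28 = 256, so >>8 keeps y in 0..255.
		y := uint8((77*r + 151*g + 28*b) >> 8)
		if y < threshold {
			out[i/8] |= 0x80 >> (i % 8) // dark pixel: set its bit
		}
	}
	return out
}

func main() {
	// 8x1 image: alternating black and white pixels.
	rgb := []byte{
		0, 0, 0, 255, 255, 255, 0, 0, 0, 255, 255, 255,
		0, 0, 0, 255, 255, 255, 0, 0, 0, 255, 255, 255,
	}
	fmt.Printf("%08b\n", binarize(rgb, 8, 1, 128)[0]) // 10101010
}
```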

@stapelberg
Owner Author

https://github.com/libjpeg-turbo/libjpeg-turbo uses NEON on ARM, so that part seems to be feasible.

I’ll need to profile the g3 compression to see how to make it faster.

@stapelberg
Owner Author

stapelberg commented May 4, 2017

Commit 2f7d475 brings the rotation + g3 compression down to 5s (from about 9s) on the Raspberry Pi 3. JPEG encoding is now clearly the longest step.
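If the binarized page is stored as a packed 1-bit bitmap, rotating it by 180 degrees reduces to reversing the byte order of the whole buffer and reversing the bits within each byte. The Go sketch below assumes MSB-first packing and a width that is a multiple of 8 (so rows carry no padding bits); it illustrates the idea and is not the code from commit 2f7d475. `rotate180` is a hypothetical name.

```go
package main

import (
	"fmt"
	"math/bits"
)

// rotate180 rotates a packed 1-bit bitmap (MSB-first, width a multiple of 8,
// no row padding) by 180 degrees in place. Pixel i moves to pixel n-1-i,
// which means reversing the byte order and the bits inside each byte.
func rotate180(bitmap []byte) {
	for i, j := 0, len(bitmap)-1; i < j; i, j = i+1, j-1 {
		bitmap[i], bitmap[j] = bits.Reverse8(bitmap[j]), bits.Reverse8(bitmap[i])
	}
	if len(bitmap)%2 == 1 {
		mid := len(bitmap) / 2
		bitmap[mid] = bits.Reverse8(bitmap[mid])
	}
}

func main() {
	// 16x2 image: only the top-left pixel is black.
	img := []byte{0x80, 0x00, 0x00, 0x00}
	rotate180(img)
	fmt.Printf("% x\n", img) // 00 00 00 01 (now the bottom-right pixel)
}
```

Because this touches each byte once and needs no per-pixel branching, it is also a natural candidate for vectorization.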

stapelberg added a commit that referenced this issue May 4, 2017
stapelberg added a commit that referenced this issue May 5, 2017
This conversion used to take about 3s (for 4960x7016 RGB pixels).
With the new code it takes about 300ms.
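A quick calculation, assuming 3 bytes per RGB pixel, puts these numbers in perspective:

```math
\begin{aligned}
4960 \times 7016 &= 34{,}799{,}360 \text{ pixels} \\
34{,}799{,}360 \times 3\ \text{bytes} &\approx 104.4\ \text{MB} \\
104.4\ \text{MB} / 3\ \text{s} &\approx 35\ \text{MB/s} \quad \text{(before)} \\
104.4\ \text{MB} / 0.3\ \text{s} &\approx 348\ \text{MB/s} \quad \text{(after)}
\end{aligned}
```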

More potential for improvement: we could run this code while reading
pixels via USB.

I haven’t looked at the USB timing in detail, but my guess is that we
could squeeze this post-processing into the time between requesting
data from the device and receiving the data from the kernel.

If that doesn’t work out, we could parallelize and post-process the
previous buffer while reading the current buffer.

Note that we need to use the WORD instruction because the Go assembler
lacks support for NEON instructions, see
golang/go#7300

related to issue #7
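The idea of post-processing the previous buffer while reading the current one can be sketched in Go with a goroutine and two channels that recycle a fixed pair of buffers. `pipeline`, `read` and `process` are hypothetical stand-ins for the USB read loop and the binarization step; the real scanner code may be structured quite differently.

```go
package main

import "fmt"

// pipeline reads chunks into one buffer while a separate goroutine processes
// the previously filled buffer (double buffering). read fills buf and returns
// the number of bytes written, or 0 when there is no more data.
func pipeline(bufSize int, read func(buf []byte) int, process func(chunk []byte)) {
	free := make(chan []byte, 2)   // buffers ready to be filled
	filled := make(chan []byte, 2) // buffers ready to be processed
	free <- make([]byte, bufSize)
	free <- make([]byte, bufSize)

	done := make(chan struct{})
	go func() {
		defer close(done)
		for chunk := range filled {
			process(chunk)
			free <- chunk[:cap(chunk)] // hand the buffer back to the reader
		}
	}()

	for {
		buf := <-free // blocks until the processor has released a buffer
		n := read(buf)
		if n == 0 {
			break
		}
		filled <- buf[:n]
	}
	close(filled)
	<-done // wait for the last chunk to be processed
}

func main() {
	data := []byte("hello, scanner!")
	pos := 0
	read := func(buf []byte) int {
		n := copy(buf, data[pos:])
		pos += n
		return n
	}
	total := 0
	pipeline(4, read, func(chunk []byte) {
		total += len(chunk) // stand-in for binarizing the chunk
	})
	fmt.Println(total) // 15
}
```

Two buffers are enough to overlap one read with one processing step; extra buffers would likely only help if processing time per chunk varies a lot.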
@stapelberg
Owner Author

I have a proof of concept that uses a port of libjpeg-turbo’s NEON assembler functions. It completes JPEG encoding within 2.5s of wall-clock time.

I’ll look into whether we can run the processing in parallel with reading data from the scanner. That way, we might be able to pull off scanning in almost real time, i.e. with minimal wait after each page :).

@stapelberg
Owner Author

stapelberg commented Jun 26, 2017

Next steps for cleaning up the optimized JPEG encoder (scan2drive/internal/neonjpeg):

  • create separate git repository (for the cleanup only)
  • remove magic numbers
  • add assertions about image/buffer size
  • remove comments
  • clean up assembler files (indentation, attribution)
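For the “add assertions about image/buffer size” item, a check might look like the Go sketch below. It assumes, hypothetically, that the assembler routines read whole 16×16 MCUs (as with 4:2:0 chroma subsampling) from an interleaved RGB buffer, so the buffer must cover the image padded up to MCU boundaries; `checkBuffer`, `paddedSize` and `mcuSize` are illustrative names, not taken from neonjpeg. Such a check matters because assembly code performs no bounds checking of its own.

```go
package main

import (
	"errors"
	"fmt"
)

// mcuSize is an assumed block size: a 4:2:0 JPEG MCU covers 16x16 pixels.
const mcuSize = 16

// paddedSize rounds n up to the next multiple of mcuSize.
func paddedSize(n int) int {
	return (n + mcuSize - 1) / mcuSize * mcuSize
}

// checkBuffer verifies that an interleaved RGB buffer is large enough for
// routines that may read the image padded up to whole MCUs.
func checkBuffer(buf []byte, width, height int) error {
	if width <= 0 || height <= 0 {
		return errors.New("image dimensions must be positive")
	}
	want := paddedSize(width) * paddedSize(height) * 3
	if len(buf) < want {
		return fmt.Errorf("buffer has %d bytes, need at least %d for %dx%d RGB padded to %d-pixel MCUs",
			len(buf), want, width, height, mcuSize)
	}
	return nil
}

func main() {
	fmt.Println(checkBuffer(make([]byte, 32*16*3), 20, 10)) // <nil>
	fmt.Println(checkBuffer(make([]byte, 20*10*3), 20, 10))
}
```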

stapelberg added a commit that referenced this issue Jul 2, 2017
stapelberg added a commit that referenced this issue Jul 2, 2017
This is a fork of Go 1.8’s image/jpeg, changed to use data structures that are
compatible with libjpeg-turbo so that we can use the NEON assembler routines.

related to issue #7
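To illustrate the kind of data-structure change involved: libjpeg-turbo’s color converters work on arrays of row pointers into separate Y, Cb and Cr planes, whereas Go’s image/jpeg converts pixels block by block. The Go sketch below converts interleaved RGB rows into contiguous planes addressed by per-row slices, using the standard library’s `color.RGBToYCbCr` as the scalar reference. The function name `toPlanes` and this exact layout are assumptions for illustration; the fork’s actual structures are not shown in this thread.

```go
package main

import (
	"fmt"
	"image/color"
)

// toPlanes converts interleaved RGB rows (3 bytes per pixel) into separate
// contiguous Y, Cb and Cr planes. Each returned row is a slice into its
// plane, similar in spirit to the row-pointer arrays libjpeg-turbo uses.
func toPlanes(rows [][]byte, width int) (y, cb, cr [][]byte) {
	h := len(rows)
	yPlane := make([]byte, width*h)
	cbPlane := make([]byte, width*h)
	crPlane := make([]byte, width*h)
	y, cb, cr = make([][]byte, h), make([][]byte, h), make([][]byte, h)
	for r, row := range rows {
		y[r] = yPlane[r*width : (r+1)*width]
		cb[r] = cbPlane[r*width : (r+1)*width]
		cr[r] = crPlane[r*width : (r+1)*width]
		for x := 0; x < width; x++ {
			y[r][x], cb[r][x], cr[r][x] = color.RGBToYCbCr(row[3*x], row[3*x+1], row[3*x+2])
		}
	}
	return y, cb, cr
}

func main() {
	rows := [][]byte{
		{255, 255, 255, 0, 0, 0}, // white, black
	}
	y, cb, cr := toPlanes(rows, 2)
	fmt.Println(y[0], cb[0], cr[0]) // [255 0] [128 128] [128 128]
}
```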
stapelberg added a commit that referenced this issue Jul 2, 2017
@stapelberg
Owner Author

These are the current processing times for each sheet of paper after scanning has finished:

  • 1s JPEG encoding (most of the processing happens in parallel with scanning)
  • 3s G3 encoding
  • 12s thumbnail PNG encoding

I looked into wrapping the G3-encoded image data in a TIFF file (which works), but it turns out that neither Chrome nor Firefox supports TIFF. All JavaScript-based TIFF/PDF viewers and extensions are super slow.

@stapelberg
Owner Author

Created issue #15 for speeding up the thumbnail creation. Closing this issue, as we’re now using NEON code for JPEG compression.
