10x slowdown when converting to avif in some cases #2983

Open
L3tum opened this issue Aug 11, 2022 · 31 comments

@L3tum

L3tum commented Aug 11, 2022

Bug report

Describe the bug
I'm not sure if the bug is in libvips, libheif or libaom, but you got to start somewhere...

When converting a PNG (specifically PNG, WEBP does not exhibit this for example) image to an AVIF image of the same dimensions, libvips takes 10x longer than when resizing the image as well.

To Reproduce
Steps to reproduce the behavior:

  1. Use Image source
  2. vipsthumbnail.exe .\source.png --size 3840x -o destination.heif[compression=av1,lossless=false,effort=2]
  3. libvips takes roughly 11 seconds on my machine (effort=2 to keep it lower).

Expected behavior
Running a slightly different command takes only ~1 second:
vipsthumbnail.exe .\source.png --size 4000x -o destination.heif[compression=av1,lossless=false,effort=2]

You can also downsize and observe the same behaviour.

vipsthumbnail.exe .\source.png --size 3400x -o destination.heif[compression=av1,lossless=false,effort=2]

Or simply only change one dimension.

vipsthumbnail.exe .\source.png --size 3840x1080 -o destination.heif[compression=av1,lossless=false,effort=2]

Actual behavior
Converting to AVIF without resizing the image takes ~10x as long.

Screenshots

vipsthumbnail.exe --vips-info .\source.png --size 4000x -o destination.heif[compression=av1,effort=2,lossless=false,Q=80,strip=true]
VIPS-INFO: 10:10:54.474: thumbnailing .\artwork_landscape.png
VIPS-INFO: 10:10:54.491: selected loader is VipsForeignLoadPngFile
VIPS-INFO: 10:10:54.497: input size is 3840 x 2160
VIPS-INFO: 10:10:54.504: loading with factor 1 pre-shrink
VIPS-INFO: 10:10:54.519: pre-shrunk size is 3840 x 2160
VIPS-INFO: 10:10:54.524: converting to processing space srgb
VIPS-INFO: 10:10:54.529: residual scale 1,04167 x 1,04167
VIPS-INFO: 10:10:54.535: converting to output space srgb
VIPS-INFO: 10:10:54.540: thumbnailing .\source.png as .\destination.heif[compression=av1,effort=2,lossless=false,Q=80,strip=true]
vipsthumbnail.exe --vips-info .\source.png --size 3840x -o destination.heif[compression=av1,effort=2,lossless=false,Q=80,strip=true]
VIPS-INFO: 10:11:09.687: thumbnailing .\artwork_landscape.png
VIPS-INFO: 10:11:09.704: selected loader is VipsForeignLoadPngFile
VIPS-INFO: 10:11:09.712: input size is 3840 x 2160
VIPS-INFO: 10:11:09.734: loading with factor 1 pre-shrink
VIPS-INFO: 10:11:09.742: pre-shrunk size is 3840 x 2160
VIPS-INFO: 10:11:09.750: converting to processing space srgb
VIPS-INFO: 10:11:09.759: converting to output space srgb
VIPS-INFO: 10:11:09.763: thumbnailing .\source.png as .\destination.heif[compression=av1,effort=2,lossless=false,Q=80,strip=true]
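
The two logs differ in one step: the 4000x run reports a residual scale, while the 3840x run does not, because the target width equals the input width. A minimal Python sketch of that arithmetic, assuming the residual scale is simply the target width divided by the pre-shrunk width (the helper name `residual_scale` is made up for illustration and is not a libvips function):

```python
def residual_scale(target_width: int, preshrunk_width: int) -> float:
    """Scale factor still to be applied after the loader's pre-shrink."""
    return target_width / preshrunk_width


# Widths taken from the two logs above; the input is 3840 px wide.
fast_run = residual_scale(4000, 3840)
slow_run = residual_scale(3840, 3840)

print(f"--size 4000x: residual scale {fast_run:.5f}")
print(f"--size 3840x: residual scale {slow_run:.5f} (no resize needed)")
```

The first value matches the 1,04167 printed in the log (there with a decimal comma).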

Environment
(please complete the following information)

  • OS: Debian Bullseye, Windows 10
  • Vips: 8.13.0, 8.12.2

Additional context
We managed to reproduce this locally with the prebuilt Windows binaries, but also with Linux binaries built from source with both libaom 3.4.0 and libaom 3.3.0, and both vips 8.13.0 and 8.12.2.

Upscaling or downscaling the image takes roughly 2 seconds, compared with roughly 11 seconds when it is not scaled at all.

Interestingly the "slow" version has a maximum memory consumption of 17MB, whereas upscaling it takes 38MB.

vips-profile-slow.txt
vips-profile-fast.txt

@L3tum L3tum added the bug label Aug 11, 2022
@jcupitt
Member

jcupitt commented Aug 11, 2022

Hi @L3tum,

Wow that's an odd one. It seems to vary with the input format as well:

$ time vipsthumbnail source.png --size 4000x -o x.avif[effort=2]

real	0m2.842s
user	0m7.804s
sys	0m0.363s

$ time vipsthumbnail source.jpg --size 4000x -o x.avif[effort=2]
real	0m9.051s
user	0m14.383s
sys	0m0.562s

(source.jpg is your test PNG converted)

I'll dig a little more.

@L3tum
Author

L3tum commented Aug 11, 2022

Hey @jcupitt,

yeah, we've been puzzling over it for some time as well. The colourspace question at the start of the week was in relation to it, cause initially we were just investigating strange slow requests^^

Thanks for trying to dig a little more, if you need anything further from me I'm happy to help :)

@jcupitt
Member

jcupitt commented Aug 11, 2022

It looks like a libheif issue. I enabled libvips debug and saw this:

$ time vipsthumbnail source.png --size 4000x -o x.avif[effort=2] && ls -l x.avif
vips_heif: module init
vips_foreign_save_heif_build:
	width = 4000
	height = 2250
	alpha = 0
vips__heif_image_print:
	heif_channel_interleaved:
		width = 4000
		height = 2250
		bits = 24
vips_foreign_save_heif_write_block: y = 0
vips_foreign_save_heif_write_block: y = 640
vips_foreign_save_heif_write_block: y = 1280
vips_foreign_save_heif_write_block: y = 1920
calling heif_context_encode_image() ...
... libheif took 2.7 seconds
attaching exif-data ..
memory: high-water mark 122.77 MB

real	0m2.823s
user	0m9.049s
sys	0m0.369s
-rw-r--r-- 1 john john 69847 Aug 11 10:29 x.avif


$ time vipsthumbnail source.png --size 3840x -o x.avif[effort=2] && ls -l x.avif
vips_heif: module init
vips_foreign_save_heif_build:
	width = 3840
	height = 2160
	alpha = 0
vips__heif_image_print:
	heif_channel_interleaved:
		width = 3840
		height = 2160
		bits = 24
vips_foreign_save_heif_write_block: y = 0
vips_foreign_save_heif_write_block: y = 640
vips_foreign_save_heif_write_block: y = 1280
vips_foreign_save_heif_write_block: y = 1920
calling heif_context_encode_image() ...
... libheif took 9.1 seconds
attaching exif-data ..
memory: high-water mark 41.32 MB

real	0m9.273s
user	0m15.676s
sys	0m0.533s
-rw-r--r-- 1 john john 71579 Aug 11 10:29 x.avif

So the small increase in image dimensions made heif_context_encode_image() run much faster.

Could going over 4k change the AVIF threading model? At 4k and under, it only seems to use two threads. Over 4k, it seems to use three.

Maybe going over 4k also changes some quality presets? The 4000 pixel image is noticeably smaller on disk than the 3840 pixel image (69847 vs 71579 bytes).
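
As a rough check on where the time goes, the figures from the two debug runs above can be compared directly. This is a small Python sketch using only numbers printed in those logs. The libheif figure is rounded to one decimal place in the log, so the shares are approximate, and the dictionary layout is just an illustration.

```python
# Seconds reported in the two debug runs above.
runs = {
    "4000x": {"libheif_s": 2.7, "real_s": 2.823},
    "3840x": {"libheif_s": 9.1, "real_s": 9.273},
}


def encode_share(run: dict) -> float:
    """Fraction of wall-clock time spent inside heif_context_encode_image()."""
    return run["libheif_s"] / run["real_s"]


for name, run in runs.items():
    print(f"{name}: {encode_share(run):.1%} of wall time inside libheif")
```

In both runs nearly all of the wall time is spent in the encode call, which fits the suspicion that the slowdown sits below libvips.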

@jcupitt
Member

jcupitt commented Aug 11, 2022

... though that's just speculating -- we'd need to ask Dirk for an informed opinion, of course.

@L3tum
Author

L3tum commented Aug 11, 2022

Hmm, it's still weird since both 4000px and 3400px take roughly the same amount of time to do and only the exact same image size takes substantially longer.

Plus of course the JPEG issue you found where it seems to slow down as well regardless of same-size/different-size semantics.

Do you know someone from the libheif team that could take a look at it? I always get slightly discouraged when looking at that repo with the amount of issues and PRs open^^

@jcupitt
Member

jcupitt commented Aug 11, 2022

Huh you're right the JPEG difference is puzzling.

Dirk is the libheif lead, but I know he's been having trouble keeping up recently, so I'm reluctant to ping him. I believe it's mostly an unfunded one person spare time project, unfortunately.

I'd open an issue on libheif. I think libvips has donated $1000 previously, so you could maybe offer sponsorship as well.

@L3tum
Author

L3tum commented Aug 11, 2022

Okay, it gets even weirder.

I've done the following three tests:

1.
  • Given the source image
  • Convert it to 4000x2250 PNG named 'resized'
  • Convert 'resized' to 4000x2250 AVIF
    -> Result: The conversion to AVIF takes roughly a second, and the resize before takes less than a second

2.
  • Given the source image
  • Convert it to 3840x2160 PNG named 'colourspaced'
  • Convert 'colourspaced' to 3840x2160 AVIF
    -> Result: The conversion to AVIF takes roughly 11 seconds and the colourspace-conversion before takes less than a second

3.
  • Given the source image
  • Convert it to 4000x2250 PNG named 'resized'
  • Convert 'resized' to 3840x2160 AVIF
    -> Result: The conversion to AVIF takes roughly 11 seconds and the resize before takes less than a second

This leads me to believe that the root of the "png problem" is specifically a 'wanted' image size of 3840x2160, which is less than convenient since that's a pretty standard image size^^

I have no idea why a 4000x2250 JPEG also takes substantially longer, as your own tests have shown...
It also seems like there's a specific area where that effect takes place, since 3860x exhibits the same 11 second conversion time.
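
To map out the range of widths where the slowdown appears, the conversions could be timed in a loop. The following is a hedged Python sketch, assuming `vipsthumbnail` is on the PATH. It uses only the `--size` and `-o` options that appear in this thread, and the helper names `build_command`, `time_command` and `sweep` are invented for illustration.

```python
import subprocess
import time
from typing import Dict, List, Sequence


def build_command(source: str, width: int, output: str = "x.avif[effort=2]") -> List[str]:
    """Build a vipsthumbnail argv using only options that appear in this thread."""
    return ["vipsthumbnail", source, "--size", f"{width}x", "-o", output]


def time_command(argv: Sequence[str]) -> float:
    """Run a command to completion and return its wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(list(argv), check=True)
    return time.perf_counter() - start


def sweep(source: str, widths: Sequence[int]) -> Dict[int, float]:
    """Time one conversion per target width and return {width: seconds}."""
    return {width: time_command(build_command(source, width)) for width in widths}
```

A call such as `sweep("source.png", [3400, 3840, 3860, 4000])` would cover the widths mentioned so far. Each run overwrites the same output file, so only the timings are kept.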

Is there perhaps a chance of using libaom directly from libvips?^^

@jcupitt
Member

jcupitt commented Aug 11, 2022

Could this be a libaom problem? Do they have a command-line interface we could test?

@L3tum
Author

L3tum commented Aug 11, 2022

Yes, but it's hard to find and I honestly can't figure out how to make it work.
I've arrived through trial-and-error at this command .\aomenc.exe --passes=1 --width=3840 --height=2160 --codec=av1 -o destination.avif source.png but it complains about WebM, which honestly could just mean that I built it wrong.

There's some basic instructions to build the thing here, but beyond that I couldn't really find anything and the tests also don't make it obvious. I'm currently trying it again on a Linux machine.

@jcupitt
Member

jcupitt commented Aug 11, 2022

Wow yes it's hard to make it do anything. We'd probably need to look at the libaom test suite and see how that works.

@jcupitt
Member

jcupitt commented Aug 11, 2022

@jcupitt
Member

jcupitt commented Aug 11, 2022

Looks like it encodes y4m to webm, so an uncompressed video stream to AV1, then decompresses back to lossless webm for analysis. Not useful for us ...

@L3tum
Author

L3tum commented Aug 11, 2022

It also still seems to complain about just everything

./best_encode.sh /mnt/a/Bilder/source.png 200
Fatal: Specify stream dimensions with --width (-w) and --height (-h)
Fatal: Specify stream dimensions with --width (-w) and --height (-h)

I've managed to build and use cavif, which seems to make use of libaom under the hood. Of course whether it's actually optimized or not is another question....

Anyway, using that, running the command ./cavif --cpu-used 1 --encoder-usage realtime -i /mnt/a/Bilder/source.png -o destination.avif encodes the image in ~2 seconds, so it seems to work alright.
However, setting --encoder-usage quality makes it take forever. I left it running for 5 minutes and it didn't complete.
Maybe libheif doesn't set this option when the image is 3840x2160 pixel? 😆

@L3tum
Author

L3tum commented Aug 11, 2022

It also seems to perform almost directly proportional to destination image size, since running the command ./cavif --vertical-scale-mode 1/2 --horizontal-scale-mode 1/2 --cpu-used 2 --encoder-usage realtime -i /mnt/a/Bilder/source.png -o destination.avif takes only 0.5 seconds (i.e. 1/4 of the time for the full-size image).

I'm trying around with a few options to see if there may be something else that's the problem but so far no luck.
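
The proportionality claim can be checked with a line of arithmetic: halving both dimensions leaves a quarter of the pixels. This is a minimal Python sketch, assuming encode time scales linearly with the output pixel count. The function name is hypothetical, and the ~2 second full-size figure was measured with --cpu-used 1 rather than 2, so the comparison is only rough.

```python
def expected_time(full_size_seconds: float, horizontal_scale: float, vertical_scale: float) -> float:
    """Predict encode time if cost is proportional to the number of output pixels."""
    return full_size_seconds * horizontal_scale * vertical_scale


# ~2 s for the full-size image, scaled by 1/2 in each direction.
half_size = expected_time(2.0, 0.5, 0.5)
print(f"predicted time at half size: {half_size} s")
```

The prediction agrees with the 0.5 seconds measured for the half-size encode.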

@L3tum
Author

L3tum commented Aug 11, 2022

Setting bitdepth to 16 in libvips also results in an empty image, interestingly. And setting it to anything other than 8 crashes vipsthumbnail hard. Not setting it at all (with the default supposedly being 12) works though (albeit it's still slow).

Not sure if related but definitely weird.

@L3tum
Author

L3tum commented Aug 11, 2022

Unless you have a breakthrough I'll try the same with rav1e and dav1d through libheif tomorrow to see if the issue persists. I'll also set up libheif and its heif-enc tool to test that out separately

@L3tum
Author

L3tum commented Aug 12, 2022

Hey @jcupitt, the saga continues 😆

We have an "older" service still running while dealing with this problem. Like the prebuilt Windows binaries, this service is on libvips 8.12.2 and libaom 3.3.0.

Now the issue is that I can reproduce this behaviour, i.e. the 10 second slowdown, in the prebuilt Windows binaries for libvips 8.12.2.
However I cannot reproduce this issue with our custom-built libvips 8.12.2.

I do not know why, I don't know what changed and this seems more and more bizarre. Between our builds for 8.12.2 and 8.13.0 nothing in our scripts changed. The only thing I can now think of is some other dependency that was only "recently" updated in the Debian Bullseye APT Repos, but which was already present in that version in the prebuilt Windows binaries for a while.

It also suggests to me that the issue may be exposed by libheif, but that the underlying problem is somewhere else.

@jcupitt
Member

jcupitt commented Aug 12, 2022

> However I cannot reproduce this issue with our custom-built libvips 8.12.2.

Wow! I don't suppose you have a record of how you did that build?

@L3tum
Author

L3tum commented Aug 12, 2022

I'll see if I can get the lib and its dependencies at least, most of them are dynamic. The build itself is pretty old so I don't think I can get the exact log, but I can attach the script if it helps.

@L3tum
Author

L3tum commented Aug 12, 2022

I got the lib, in case it may help, but looking at the dependencies it doesn't seem anything simple like that. They hardly changed between the two builds. Here's a diff. It doesn't make it all that clear since git isn't really that smart in that context, but the only thing actually changing is the location of the libcurl and curl lines.

diff --git a/deps.txt b/deps_new.txt
index 0d9a222..002b6a0 100644
--- a/deps.txt
+++ b/deps_new.txt
@@ -1,10 +1,10 @@
-Get:1 http://deb.debian.org/debian bullseye/main amd64 imagemagick-6-common all 8:6.9.11.60+dfsg-1.3 [211 kB]
-Get:2 http://security.debian.org/debian-security bullseye-security/main amd64 curl amd64 7.74.0-1.3+deb11u2 [270 kB]
-Get:3 http://deb.debian.org/debian bullseye/main amd64 libmagickcore-6-headers all 8:6.9.11.60+dfsg-1.3 [50.9 kB]
-Get:4 http://deb.debian.org/debian bullseye/main amd64 libmagickcore-6-arch-config amd64 8:6.9.11.60+dfsg-1.3 [174 kB]
-Get:5 http://deb.debian.org/debian bullseye/main amd64 libc6-dev amd64 2.31-13+deb11u3 [2348 kB]
-Get:6 http://security.debian.org/debian-security bullseye-security/main amd64 libcurl4 amd64 7.74.0-1.3+deb11u2 [345 kB]
-Get:7 http://security.debian.org/debian-security bullseye-security/main amd64 libcurl3-gnutls amd64 7.74.0-1.3+deb11u2 [342 kB]
+Get:1 http://security.debian.org/debian-security bullseye-security/main amd64 curl amd64 7.74.0-1.3+deb11u2 [270 kB]
+Get:2 http://security.debian.org/debian-security bullseye-security/main amd64 libcurl4 amd64 7.74.0-1.3+deb11u2 [345 kB]
+Get:3 http://deb.debian.org/debian bullseye/main amd64 imagemagick-6-common all 8:6.9.11.60+dfsg-1.3 [211 kB]
+Get:4 http://security.debian.org/debian-security bullseye-security/main amd64 libcurl3-gnutls amd64 7.74.0-1.3+deb11u2 [342 kB]
+Get:5 http://deb.debian.org/debian bullseye/main amd64 libmagickcore-6-headers all 8:6.9.11.60+dfsg-1.3 [50.9 kB]
+Get:6 http://deb.debian.org/debian bullseye/main amd64 libmagickcore-6-arch-config amd64 8:6.9.11.60+dfsg-1.3 [174 kB]
+Get:7 http://deb.debian.org/debian bullseye/main amd64 libc6-dev amd64 2.31-13+deb11u3 [2348 kB]
 Get:8 http://deb.debian.org/debian bullseye/main amd64 libc-dev-bin amd64 2.31-13+deb11u3 [275 kB]
 Get:9 http://deb.debian.org/debian bullseye/main amd64 libc6 amd64 2.31-13+deb11u3 [2811 kB]
 Get:10 http://deb.debian.org/debian bullseye/main amd64 libfftw3-double3 amd64 3.3.8-2 [733 kB]
@@ -304,7 +304,7 @@ Get:11 https://mirror.netcologne.de/debian bullseye/main amd64 sensible-utils al
 Get:12 https://mirror.netcologne.de/debian bullseye/main amd64 ucf all 3.0043 [74.0 kB]
 Get:13 https://mirror.netcologne.de/debian bullseye/main amd64 fonts-dejavu-core all 2.37-2 [1069 kB]
 Get:14 https://mirror.netcologne.de/debian bullseye/main amd64 fonts-urw-base35 all 20200910-1 [6367 kB]
-Get:15 /tmp/ext/vips/ext-vips_1.0.13-344+deb11u1_amd64.deb ext-vips amd64 1.0.13-344+deb11u1 [18.9 MB]
+Get:15 /tmp/ext/vips/ext-vips_1.0.13-346+deb11u1_amd64.deb ext-vips amd64 1.0.13-346+deb11u1 [19.4 MB]
 Get:16 https://mirror.netcologne.de/debian bullseye/main amd64 fontconfig-config all 2.13.1-4.2 [281 kB]
 Get:17 https://mirror.netcologne.de/debian bullseye/main amd64 libfontconfig1 amd64 2.13.1-4.2 [347 kB]
 Get:18 https://mirror.netcologne.de/debian bullseye/main amd64 libaom0 amd64 1.0.0.errata1-3 [1158 kB]

@L3tum
Author

L3tum commented Aug 12, 2022

Here's the build script; you can basically substitute LIBVIPS_VERSION, LIBSPNG_VERSION, LIBHEIF_VERSION and LIBAOM_VERSION with any version you want. In this case we were using LIBVIPS_VERSION=8.13.0, LIBSPNG_VERSION=0.7.2, LIBHEIF_VERSION=1.12.0 and LIBAOM_VERSION=3.4.0. The previous build was the same, except LIBVIPS_VERSION=8.12.2 and LIBAOM_VERSION=3.3.0.

Apologies for the word vomit, it's kinda hard to make this more legible.

I know that -O3 is controversial in regards to incorrect behaviour but we did both builds with the same settings. I could try disabling -flto and -O3, but I'd guess that the most it would do is slow down the resulting binary.

Ah, the libheif patch is the same one that is applied to the prebuilt Windows binaries. We've applied it to both the 8.12.2 and 8.13.0 builds, and since libheif wasn't updated between these two runs I don't think it has any influence.

export VIPS_DIR=$(pwd)
export CXXFLAGS="-flto -g -O3 -msse -msse2 -msse3 -mssse3 -msse4 -msse4.1 -msse4.2 -mavx -mavx2"
export CPPFLAGS="-flto -g -O3 -msse -msse2 -msse3 -mssse3 -msse4 -msse4.1 -msse4.2 -mavx -mavx2"
export CFLAGS="-flto -g -O3 -msse -msse2 -msse3 -mssse3 -msse4 -msse4.1 -msse4.2 -mavx -mavx2"
export LDFLAGS="-flto -g -O3 -msse -msse2 -msse3 -mssse3 -msse4 -msse4.1 -msse4.2 -mavx -mavx2"

cd /tmp
curl -fsSLO https://storage.googleapis.com/aom-releases/libaom-${LIBAOM_VERSION}.tar.gz
tar zvxf libaom-${LIBAOM_VERSION}.tar.gz
cd libaom-${LIBAOM_VERSION}
mkdir cbuild
cd cbuild
cmake -DCMAKE_BUILD_TYPE=Release -DBUILD_SHARED_LIBS=0 -DC_FLAGS_INIT="flto=8 -static" -DENABLE_NASM=1 -DENABLE_DOCS=0 -DENABLE_TESTS=0 -DENABLE_TESTDATA=0 -DENABLE_TOOLS=0 -DENABLE_EXAMPLES=0 -DCONFIG_PIC=1 ..
cmake --build . -- -j$(nproc)
cmake --install .
cp /usr/local/lib/pkgconfig/aom.pc /usr/lib/pkgconfig/aom.pc
ldconfig

cd /tmp
curl -fsSLO https://github.com/strukturag/libheif/releases/download/v${LIBHEIF_VERSION}/libheif-${LIBHEIF_VERSION}.tar.gz
tar zvxf libheif-${LIBHEIF_VERSION}.tar.gz
cd libheif-${LIBHEIF_VERSION}
git apply --reject --whitespace=fix ${VIPS_DIR}/patches/libheif.patch
./configure --disable-go --disable-examples --disable-gdk-pixbuf --disable-rav1e --disable-dav1d --disable-libde265 --disable-x265
make -j$(nproc)
make install
cp /usr/local/lib/pkgconfig/libheif.pc /usr/lib/pkgconfig/heif.pc
ldconfig

cd /tmp
curl -fsSLO https://github.com/randy408/libspng/archive/v${LIBSPNG_VERSION}.tar.gz
tar zvxf v${LIBSPNG_VERSION}.tar.gz
cd libspng-${LIBSPNG_VERSION}
mkdir cbuild
cd cbuild
cmake -DCMAKE_BUILD_TYPE=Release -DBUILD_EXAMPLES=0 -DENABLE_OPT=1 ..
cmake --build . -- -j$(nproc)
cmake --install .
cp /usr/local/lib/pkgconfig/libspng.pc /usr/lib/pkgconfig/spng.pc
ldconfig

cd /tmp
curl -fsSLO https://github.com/libvips/libvips/releases/download/v${LIBVIPS_VERSION}/vips-${LIBVIPS_VERSION}.tar.gz
tar zvxf vips-${LIBVIPS_VERSION}.tar.gz
cd /tmp/vips-${LIBVIPS_VERSION}
./configure --with-heif=module --with-libspng=module --disable-debug --disable-doxygen --without-pdfium --disable-introspection --disable-deprecated --enable-gtk-doc-html=no --enable-gtk-doc=no
make -j$(nproc)
make install
ldconfig /usr/local/lib
ls /usr/local/lib

cd /tmp
git clone https://github.com/libvips/php-vips-ext.git /tmp/php-vips
cd /tmp/php-vips
phpize
./configure
make
NO_INTERACTION=1 make test
grep 'define PHP_VIPS_VERSION ' php_vips.h | grep -Eo '[0-9.]+' >/tmp/php-vips/VERSION

@L3tum
Author

L3tum commented Aug 12, 2022

Hey @jcupitt, I've dropped you a link via e-mail to our custom-built libvips. I hope it helps, I couldn't really figure out a difference.

@L3tum
Author

L3tum commented Mar 8, 2023

Hi @jcupitt, I've spent some time investigating this issue again with all-updated libs (libvips 8.14.1, libheif 1.15.1, libspng 0.7.3 and libaom 3.6.0) and can still reproduce this issue.

However, it does not seem to be related to pixel dimensions. Rather what seems to trigger it is the actual size (as in file size) of the source image.

To test I've converted the source PNG image to BMP, TIF, PNG, JPEG-2000, WEBP (lossy), JPEG (ranked from largest to smallest file size produced). As described above, any image as large as or larger than the source PNG image (which clocks in at 3.7 MB for my test image) shows the 10x slowdown. The conversion time rises from ~1-2 seconds to a consistent 20 seconds.

To force the issue I've created a JPEG file that exceeds that file size and could also observe that slowdown.

I've also tried changing HEIF/AVIF options such as lossless, effort and bitdepth, but no change.

I'm not sure what that tells us exactly. Maybe it's a CPU cache thing, that a larger image does not fit in the cache?

@L3tum L3tum changed the title from "10x slowdown when converting png to avif without resizing" to "10x slowdown when converting to avif in some cases" Apr 21, 2023
@L3tum
Author

L3tum commented Apr 21, 2023

Hey @jcupitt, I've had a workaround implemented for this, but unfortunately the issue occurred again despite the workaround. Desperate to get a solution, I tried the latest Windows build of libvips and could not reproduce the issue at all.

While I could repro the issue before with the Windows build, something appears to have changed. Since we've also rebuilt our custom libvips (using the latest versions of all dependencies as well as libvips itself), that suggests to me that some underlying library is involved: one that we grab from the package manager (Debian Bullseye repos in our case), while the Windows build fetches it itself and is thus likely on a newer version.

I'll see if I can narrow it down some more at some point, to get closure for this issue and provide a solid workaround.

@kleisauke
Member

I've a similar issue that only occurs when setting effort to 2 or higher. Curiously, it's not reproducible if you exclude the top scanline while cropping the image.

Test image: https://t0.nl/img_9237.jpg

$ vipsheader img_9237.jpg
img_9237.jpg: 2256x2256 uchar, 3 bands, srgb, jpegload

$ time vips copy img_9237.jpg x.avif[effort=2]

real	0m22.196s
user	0m22.786s
sys	0m1.486s

$ time vips extract_area img_9237.jpg x.avif[effort=2] 0 1 2256 2255

real	0m1.575s
user	0m3.359s
sys	0m0.309s

I also could reproduce this with aomenc, so it looks like a libaom bug/feature.

# libaom requires Y4M as input
$ ffmpeg -i img_9237.jpg img_9237.y4m

# quality = 50
# cq-level = ((100 - quality) * 63 + 50) / 100
# effort = 2
# cpu-used = 9 - effort
$ time aomenc -v -p 1 -w 2256 -h 2256 -t $(nproc) --allintra --cq-level=32 --tune=ssim --cpu-used=7 -o x.ivf img_9237.y4m

real	0m21.559s
user	0m22.604s
sys	0m0.993s

$ time aomenc -v -p 1 -w 2256 -h 2255 -t $(nproc) --allintra --cq-level=32 --tune=ssim --cpu-used=7 -o x.ivf img_9237.y4m

real	0m1.290s
user	0m3.250s
sys	0m0.255s
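
The two comment formulas in the aomenc commands above can be written out as a small Python sketch. The function names are made up for illustration; the arithmetic is copied from the comments, using integer division so that the `+ 50` term rounds to the nearest level.

```python
def cq_level(quality: int) -> int:
    """Map a 0-100 quality value to a 0-63 cq-level, rounding to nearest."""
    return ((100 - quality) * 63 + 50) // 100


def cpu_used(effort: int) -> int:
    """Map an effort value to the aomenc --cpu-used speed setting."""
    return 9 - effort


print(f"quality=50 -> --cq-level={cq_level(50)}")
print(f"effort=2   -> --cpu-used={cpu_used(2)}")
```

For quality 50 and effort 2 this gives the `--cq-level=32` and `--cpu-used=7` values used in the commands above.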

@kleisauke
Member

... while investigating the resulting .ivf files with AOM Analyzer, I noticed that the former file was encoded with intra block copy (IntraBC) mode enabled, see for example:
https://arewecompressedyet.com/analyzer/?maxFrames=1&decoder=https://media.xiph.org/analyzer/inspect.js&file=https://t0.nl/img_9237.ivf
versus:
https://arewecompressedyet.com/analyzer/?maxFrames=1&decoder=https://media.xiph.org/analyzer/inspect.js&file=https://t0.nl/img_9237_crop.ivf

Disabling the intra block copy prediction mode with the --enable-intrabc=0 option of aomenc seems to fix the performance degradation for this specific image.

$ time aomenc -v -p 1 -w 2256 -h 2256 -t $(nproc) --enable-intrabc=0 --allintra --cq-level=32 --tune=ssim --cpu-used=7 -o x.ivf img_9237.y4m

real	0m1.244s
user	0m3.228s
sys	0m0.221s

kleisauke added a commit to kleisauke/libheif that referenced this issue Dec 6, 2023
@lovell
Member

lovell commented Dec 6, 2023

The forthcoming aom v3.8.0 will default intrabc=0 for speed >= 3 (effort <= 6).

https://aomedia.googlesource.com/aom/+/ea6c28f94f542bb3a1cc002aaf3ce341f6cb74ff
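
The "speed >= 3 (effort <= 6)" equivalence follows from the cpu-used = 9 - effort relation quoted earlier in this thread. This short Python sketch lists the effort values that fall under the new default, assuming effort ranges over 0 to 9 (that range is an assumption here, not something stated in the thread):

```python
def speed_for_effort(effort: int) -> int:
    """aom speed (cpu-used) corresponding to an effort value."""
    return 9 - effort


# Effort values whose aom speed is 3 or higher.
affected = [effort for effort in range(0, 10) if speed_for_effort(effort) >= 3]
print(affected)
```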

@kleisauke
Member

It seems I forgot to turn off power saving mode on this 7 year old laptop. 🤦‍♂️

Below are the timings for the latest two versions of aom in performance mode, along with the timings for the original reproducer.

aom v3.7.1:

$ time vips copy img_9237.jpg x.avif[effort=2]

real	0m9.344s
user	0m9.870s
sys	0m0.486s
$ time vipsthumbnail source.png -s 3840x -o x.avif[effort=2]

real	0m19.307s
user	0m21.908s
sys	0m0.792s

aom v3.8.0:

$ time vips copy img_9237.jpg x.avif[effort=2]

real	0m9.952s
user	0m10.424s
sys	0m0.460s
$ time vipsthumbnail source.png -s 3840x -o x.avif[effort=2]

real	0m20.476s
user	0m22.904s
sys	0m0.795s

So, no significant difference. Upon inspecting commit https://aomedia.googlesource.com/aom/+/ea6c28f94f542bb3a1cc002aaf3ce341f6cb74ff, it appears that IntraBC is only disabled for AOM_USAGE_GOOD_QUALITY and not for AOM_USAGE_ALL_INTRA. The latter mode is used by default in libheif since commit strukturag/libheif@de0c159.

Completely disabling IntraBC (work-in-progress commit kleisauke/libheif@9aff88d) brings the timings back in line with expectations.

$ time vips copy img_9237.jpg x.avif[effort=2]

real	0m0.470s
user	0m1.003s
sys	0m0.094s
$ time vipsthumbnail source.png -s 3840x -o x.avif[effort=2]

real	0m1.152s
user	0m3.051s
sys	0m0.144s
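
To put a number on the improvement, the `real` values for aom v3.8.0 and for the IntraBC-disabled build can be compared. This Python sketch parses the `0m9.952s` format printed by `time`. The helper `parse_real` and the dictionary layout are illustrative only, and the timings are single runs from one laptop.

```python
import re


def parse_real(value: str) -> float:
    """Convert a `time` duration such as '0m9.952s' to seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", value)
    if match is None:
        raise ValueError(f"unrecognised duration: {value!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


# (aom v3.8.0, IntraBC completely disabled), both taken from the timings above.
timings = {
    "vips copy img_9237.jpg": ("0m9.952s", "0m0.470s"),
    "vipsthumbnail source.png -s 3840x": ("0m20.476s", "0m1.152s"),
}

speedups = {
    name: parse_real(before) / parse_real(after)
    for name, (before, after) in timings.items()
}

for name, factor in speedups.items():
    print(f"{name}: {factor:.1f}x faster with IntraBC disabled")
```

Both cases come out at roughly 20x, somewhat more than the 10x in the issue title.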

@kleisauke
Member

@wantehchang Can I ping you for this? I considered submitting this patch:

From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Kleis Auke Wolthuizen <github@kleisauke.nl>
Date: Sat, 9 Dec 2023 10:41:18 +0100
Subject: [PATCH 1/1] Allintra: avoid extensive intrabc search for speed >= 3

This is similar to commit ea6c28f94f542bb3a1cc002aaf3ce341f6cb74ff, but applied
to the ALLINTRA mode as well.

See: https://github.com/libvips/libvips/issues/2983

Upstream-Status: Pending

diff --git a/av1/encoder/speed_features.c b/av1/encoder/speed_features.c
index 1111111..2222222 100644
--- a/av1/encoder/speed_features.c
+++ b/av1/encoder/speed_features.c
@@ -425,6 +425,7 @@ static void set_allintra_speed_features_framesize_independent(
 
     sf->mv_sf.full_pixel_search_level = 1;
     sf->mv_sf.search_method = DIAMOND;
+    sf->mv_sf.use_intrabc = 0;
 
     // TODO(chiyotsai@google.com): the thresholds chosen for intra hog are
     // inherited directly from luma hog with some minor tweaking. Eventually we

However, it seems to help just a little bit.

$ time vips copy img_9237.jpg x.avif[effort=2]

real 0m8.848s
user 0m8.906s
sys 0m0.486s
$ time vipsthumbnail source.png -s 3840x -o x.avif[effort=2]

real 0m16.488s
user 0m17.607s
sys 0m0.801s

Looking at:
https://aomedia.googlesource.com/aom/+/refs/tags/v3.8.0/av1/encoder/rdopt.c#3115

Am I correct that it only disables the IntraBC predictor search, rather than the entire IntraBC mode?

@wantehchang

wantehchang commented Dec 13, 2023

@kleisauke Hi Kleis. I analyzed the code. There is an important difference between calling aom_codec_control(&codec, AV1E_SET_ENABLE_INTRABC, 0) and setting sf->mv_sf.use_intrabc to 0: the aom_codec_control(&codec, AV1E_SET_ENABLE_INTRABC, 0) call will cause cpi->common.features.allow_intrabc to be set to 0. So the aom_codec_control(&codec, AV1E_SET_ENABLE_INTRABC, 0) call has more side effects.

The two methods have one common side effect: they both cause the rd_pick_intrabc_mode_sb() function to return INT64_MAX immediately without doing any work.

I will discuss this issue with my colleague Jingning Han tomorrow and ask for his advice.
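
The distinction described above can be pictured with a toy model. The following Python sketch is not libaom code: the two field names are borrowed from the comment above, and the class, the helper functions and the placeholder cost are all hypothetical. It only illustrates that both switches short-circuit the IntraBC mode search, while only the codec control clears the frame-level flag.

```python
from dataclasses import dataclass

INT64_MAX = 2**63 - 1


@dataclass
class ToyEncoder:
    allow_intrabc: bool = True  # stands in for cpi->common.features.allow_intrabc
    use_intrabc: bool = True    # stands in for sf->mv_sf.use_intrabc


def set_enable_intrabc(enc: ToyEncoder, enabled: bool) -> None:
    """Toy stand-in for the AV1E_SET_ENABLE_INTRABC control: sets the frame-level flag."""
    enc.allow_intrabc = enabled


def rd_pick_intrabc_mode_sb(enc: ToyEncoder) -> int:
    """Return INT64_MAX at once if either switch is off, else a placeholder cost."""
    if not (enc.allow_intrabc and enc.use_intrabc):
        return INT64_MAX
    return 1234  # placeholder, not a real rate-distortion cost


via_control = ToyEncoder()
set_enable_intrabc(via_control, False)

via_speed_feature = ToyEncoder(use_intrabc=False)

for label, enc in (("control", via_control), ("speed feature", via_speed_feature)):
    skipped = rd_pick_intrabc_mode_sb(enc) == INT64_MAX
    print(f"{label}: search skipped={skipped}, allow_intrabc={enc.allow_intrabc}")
```

In this model, any other code that checks `allow_intrabc` still sees IntraBC as allowed when only the speed feature is cleared. That is one possible reading of "more side effects", and it would be consistent with the patch helping only a little.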

@kleisauke
Member

@wantehchang Thanks for the analysis! Let me know if I should file an issue at https://bugs.chromium.org/p/aomedia/issues/list for this.

Note to self: this also affects the latest libaom (v3.8.1, at the time of writing). For WebAssembly, the performance issue is more severe since SIMD is not supported by default (although it can be partially supported by patching the build script).
