10x slowdown when converting to avif in some cases #2983
Comments
Hi @L3tum, Wow that's an odd one. It seems to vary with the input format as well:
I'll dig a little more. |
Hey @jcupitt, yeah, we've been puzzling over it for some time as well. The Thanks for trying to dig a little more, if you need anything further from me I'm happy to help :) |
It looks like a libheif issue. I enabled libvips debug and saw this:
So the small increase in image dimensions made Could going over 4k change the AVIF threading model? At 4k and under, it only seems to use two threads. Over 4k, it seems to use three. Maybe going over 4k also changes some quality presets? The 4000 pixel image is noticeably smaller than the 3840 pixel image. |
... though that's just speculating -- we'd need to ask Dirk for an informed opinion, of course. |
Hmm, it's still weird since both 4000px and 3400px take roughly the same amount of time to do and only the exact same image size takes substantially longer. Plus of course the JPEG issue you found where it seems to slow down as well regardless of same-size/different-size semantics. Do you know someone from the libheif team that could take a look at it? I always get slightly discouraged when looking at that repo with the amount of issues and PRs open^^ |
Huh you're right the JPEG difference is puzzling. Dirk is the libheif lead, but I know he's been having trouble keeping up recently, so I'm reluctant to ping him. I believe it's mostly an unfunded one person spare time project, unfortunately. I'd open an issue on libheif. I think libvips has donated $1000 previously, so you could maybe offer sponsorship as well. |
Okay, it gets even weirder. I've done the following three tests:
This leads me to believe that the root of the "png problem" is specifically a 'wanted' image size of 3840x2160, which is less than convenient since that's a pretty standard image size^^ I have no idea why a 4000x2250 JPEG also takes substantially longer, as your own tests have shown... Is there perhaps a chance of using libaom directly from libvips?^^ |
Could this be a libaom problem? Do they have a command-line interface we could test? |
Yes, but it's hard to find and I honestly can't figure out how to make it work. There are some basic instructions to build the thing here, but beyond that I couldn't really find anything and the tests also don't make it obvious. I'm currently trying it again on a Linux machine. |
Wow yes it's hard to make it do anything. We'd probably need to look at the libaom test suite and see how that works. |
This script has example uses: https://aomedia.googlesource.com/aom/+/refs/heads/main/test/best_encode.sh |
Looks like it encodes y4m to webm, so an uncompressed video stream to AV1, then decompresses back to lossless webm for analysis. Not useful for us ... |
It also still seems to complain about just everything
I've managed to build and use cavif, which seems to make use of libaom under the hood. Of course whether it's actually optimized or not is another question.... Anyway, using that, running the command |
It also seems to perform almost directly proportional to destination image size, since running the command I'm trying out a few options to see if there may be something else that's the problem but so far no luck. |
Setting bitdepth to 16 in libvips also results in an empty image, interestingly. And setting it to anything other than 8 crashes vipsthumbnail hard. Not setting it at all (with the default supposedly being 12) works though (albeit it's still slow). Not sure if related but definitely weird. |
Unless you have a breakthrough I'll try the same with rav1e and dav1d through libheif tomorrow to see if the issue persists. I'll also set up libheif and its heif-enc tool to test that out separately |
Hey @jcupitt, the saga continues 😆 We have an "older" service still running while dealing with this problem. This service is, similarly to the prebuilt Windows binaries, on libvips 8.12.2 and libaom 3.3.0. Now the issue is that I can reproduce this behaviour, i.e. the 10 second slowdown, in the prebuilt Windows binaries for libvips 8.12.2. I do not know why, I don't know what changed and this seems more and more bizarre. Between our builds for 8.12.2 and 8.13.0 nothing in our scripts changed. The only thing I can now think of is some other dependency that was only "recently" updated in the Debian Bullseye APT repos, but which was already present in that version in the prebuilt Windows binaries for a while. It also suggests to me that the issue may be exposed by libheif, but that the underlying problem is somewhere else. |
Wow! I don't suppose you have a record of how you did that build? |
I'll see if I can get the lib and its dependencies at least, most of them are dynamic. The build itself is pretty old so I don't think I can get the exact log, but I can attach the script if it helps. |
I got the lib, in case it may help, but looking at the dependencies it doesn't seem to be anything simple like that. They hardly changed between the two builds. Here's a diff. It doesn't make it all that clear since git isn't really that smart in that context, but the only thing actually changing is the location of the libcurl and curl lines.
diff --git a/deps.txt b/deps_new.txt
index 0d9a222..002b6a0 100644
--- a/deps.txt
+++ b/deps_new.txt
@@ -1,10 +1,10 @@
-Get:1 http://deb.debian.org/debian bullseye/main amd64 imagemagick-6-common all 8:6.9.11.60+dfsg-1.3 [211 kB]
-Get:2 http://security.debian.org/debian-security bullseye-security/main amd64 curl amd64 7.74.0-1.3+deb11u2 [270 kB]
-Get:3 http://deb.debian.org/debian bullseye/main amd64 libmagickcore-6-headers all 8:6.9.11.60+dfsg-1.3 [50.9 kB]
-Get:4 http://deb.debian.org/debian bullseye/main amd64 libmagickcore-6-arch-config amd64 8:6.9.11.60+dfsg-1.3 [174 kB]
-Get:5 http://deb.debian.org/debian bullseye/main amd64 libc6-dev amd64 2.31-13+deb11u3 [2348 kB]
-Get:6 http://security.debian.org/debian-security bullseye-security/main amd64 libcurl4 amd64 7.74.0-1.3+deb11u2 [345 kB]
-Get:7 http://security.debian.org/debian-security bullseye-security/main amd64 libcurl3-gnutls amd64 7.74.0-1.3+deb11u2 [342 kB]
+Get:1 http://security.debian.org/debian-security bullseye-security/main amd64 curl amd64 7.74.0-1.3+deb11u2 [270 kB]
+Get:2 http://security.debian.org/debian-security bullseye-security/main amd64 libcurl4 amd64 7.74.0-1.3+deb11u2 [345 kB]
+Get:3 http://deb.debian.org/debian bullseye/main amd64 imagemagick-6-common all 8:6.9.11.60+dfsg-1.3 [211 kB]
+Get:4 http://security.debian.org/debian-security bullseye-security/main amd64 libcurl3-gnutls amd64 7.74.0-1.3+deb11u2 [342 kB]
+Get:5 http://deb.debian.org/debian bullseye/main amd64 libmagickcore-6-headers all 8:6.9.11.60+dfsg-1.3 [50.9 kB]
+Get:6 http://deb.debian.org/debian bullseye/main amd64 libmagickcore-6-arch-config amd64 8:6.9.11.60+dfsg-1.3 [174 kB]
+Get:7 http://deb.debian.org/debian bullseye/main amd64 libc6-dev amd64 2.31-13+deb11u3 [2348 kB]
Get:8 http://deb.debian.org/debian bullseye/main amd64 libc-dev-bin amd64 2.31-13+deb11u3 [275 kB]
Get:9 http://deb.debian.org/debian bullseye/main amd64 libc6 amd64 2.31-13+deb11u3 [2811 kB]
Get:10 http://deb.debian.org/debian bullseye/main amd64 libfftw3-double3 amd64 3.3.8-2 [733 kB]
@@ -304,7 +304,7 @@ Get:11 https://mirror.netcologne.de/debian bullseye/main amd64 sensible-utils al
Get:12 https://mirror.netcologne.de/debian bullseye/main amd64 ucf all 3.0043 [74.0 kB]
Get:13 https://mirror.netcologne.de/debian bullseye/main amd64 fonts-dejavu-core all 2.37-2 [1069 kB]
Get:14 https://mirror.netcologne.de/debian bullseye/main amd64 fonts-urw-base35 all 20200910-1 [6367 kB]
-Get:15 /tmp/ext/vips/ext-vips_1.0.13-344+deb11u1_amd64.deb ext-vips amd64 1.0.13-344+deb11u1 [18.9 MB]
+Get:15 /tmp/ext/vips/ext-vips_1.0.13-346+deb11u1_amd64.deb ext-vips amd64 1.0.13-346+deb11u1 [19.4 MB]
Get:16 https://mirror.netcologne.de/debian bullseye/main amd64 fontconfig-config all 2.13.1-4.2 [281 kB]
Get:17 https://mirror.netcologne.de/debian bullseye/main amd64 libfontconfig1 amd64 2.13.1-4.2 [347 kB]
Get:18 https://mirror.netcologne.de/debian bullseye/main amd64 libaom0 amd64 1.0.0.errata1-3 [1158 kB] |
Here's the build script, you can basically substitute Apologies for the word vomit, it's kinda hard to make this more legible. I know that -O3 is controversial with regard to incorrect behaviour, but we did both builds with the same settings. I could try disabling -flto and -O3, but I'd guess that the most it would do is slow down the resulting binary. Ah, the libheif patch is the same one that is applied to the prebuilt Windows binaries; we've applied it to both the 8.12.2 and 8.13.0 builds, and since libheif wasn't updated between these two runs I don't think it had any influence.
export VIPS_DIR=$(pwd)
export CXXFLAGS="-flto -g -O3 -msse -msse2 -msse3 -mssse3 -msse4 -msse4.1 -msse4.2 -mavx -mavx2"
export CPPFLAGS="-flto -g -O3 -msse -msse2 -msse3 -mssse3 -msse4 -msse4.1 -msse4.2 -mavx -mavx2"
export CFLAGS="-flto -g -O3 -msse -msse2 -msse3 -mssse3 -msse4 -msse4.1 -msse4.2 -mavx -mavx2"
export LDFLAGS="-flto -g -O3 -msse -msse2 -msse3 -mssse3 -msse4 -msse4.1 -msse4.2 -mavx -mavx2"
cd /tmp
curl -fsSLO https://storage.googleapis.com/aom-releases/libaom-${LIBAOM_VERSION}.tar.gz
tar zvxf libaom-${LIBAOM_VERSION}.tar.gz
cd libaom-${LIBAOM_VERSION}
mkdir cbuild
cd cbuild
cmake -DCMAKE_BUILD_TYPE=Release -DBUILD_SHARED_LIBS=0 -DC_FLAGS_INIT=”flto=8 -static” -DENABLE_NASM=1 -DENABLE_DOCS=0 -DENABLE_TESTS=0 -DENABLE_TESTDATA=0 -DENABLE_TOOLS=0 -DENABLE_EXAMPLES=0 -DCONFIG_PIC=1 ..
cmake --build . -- -j$(nproc)
cmake --install .
cp /usr/local/lib/pkgconfig/aom.pc /usr/lib/pkgconfig/aom.pc
ldconfig
cd /tmp
curl -fsSLO https://github.com/strukturag/libheif/releases/download/v${LIBHEIF_VERSION}/libheif-${LIBHEIF_VERSION}.tar.gz
tar zvxf libheif-${LIBHEIF_VERSION}.tar.gz
cd libheif-${LIBHEIF_VERSION}
git apply --reject --whitespace=fix ${VIPS_DIR}/patches/libheif.patch
./configure --disable-go --disable-examples --disable-gdk-pixbuf --disable-rav1e --disable-dav1d --disable-libde265 --disable-x265
make -j$(nproc)
make install
cp /usr/local/lib/pkgconfig/libheif.pc /usr/lib/pkgconfig/heif.pc
ldconfig
cd /tmp
curl -fsSLO https://github.com/randy408/libspng/archive/v${LIBSPNG_VERSION}.tar.gz
tar zvxf v${LIBSPNG_VERSION}.tar.gz
cd libspng-${LIBSPNG_VERSION}
mkdir cbuild
cd cbuild
cmake -DCMAKE_BUILD_TYPE=Release -DBUILD_EXAMPLES=0 -DENABLE_OPT=1 ..
cmake --build . -- -j$(nproc)
cmake --install .
cp /usr/local/lib/pkgconfig/libspng.pc /usr/lib/pkgconfig/spng.pc
ldconfig
cd /tmp
curl -fsSLO https://github.com/libvips/libvips/releases/download/v${LIBVIPS_VERSION}/vips-${LIBVIPS_VERSION}.tar.gz
tar zvxf vips-${LIBVIPS_VERSION}.tar.gz
cd /tmp/vips-${LIBVIPS_VERSION}
./configure --with-heif=module --with-libspng=module --disable-debug --disable-doxygen --without-pdfium --disable-introspection --disable-deprecated --enable-gtk-doc-html=no --enable-gtk-doc=no
make -j$(nproc)
make install
ldconfig /usr/local/lib
ls /usr/local/lib
cd /tmp
git clone https://github.com/libvips/php-vips-ext.git /tmp/php-vips
cd /tmp/php-vips
phpize
./configure
make
NO_INTERACTION=1 make test
grep 'define PHP_VIPS_VERSION ' php_vips.h | grep -Eo '[0-9.]+' >/tmp/php-vips/VERSION |
Hey @jcupitt, I've dropped you a link via e-mail to our custom-built libvips. I hope it helps, I couldn't really figure out a difference. |
Hi @jcupitt, I've spent some time investigating this issue again with all-updated libs (libvips 8.14.1, libheif 1.15.1, libspng 0.7.3 and libaom 3.6.0) and can still reproduce this issue. However, it does not seem to be related to pixel dimensions. Rather, what seems to trigger it is the file size of the source image. To test, I've converted the source PNG image to BMP, TIF, PNG, JPEG-2000, WEBP (lossy) and JPEG (ranked from largest to smallest file size produced). Any image as large as or larger than the source PNG image (which clocks in at 3.7 MB for my test image) shows the 10x slowdown. The conversion time rises from ~1-2 seconds to a consistent 20 seconds. To force the issue I've created a JPEG file that exceeds that file size and could also observe that slowdown. I've also tried changing HEIF/AVIF options such as lossless, effort and bitdepth, but saw no change. I'm not sure what that tells us exactly. Maybe it's a CPU cache thing, that a larger image does not fit in the cache? |
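A comparison like the one described above could be scripted roughly as follows. This is only a sketch: the helper names `file_size` and `time_convert` are made up for illustration, the input files are whatever is passed on the command line, and the `vips copy ... x.avif[effort=2]` call is borrowed from commands used elsewhere in this thread. Timing is whole seconds via `date`, which is coarse but enough to tell ~2 s from ~20 s.

```shell
#!/bin/sh
# Sketch: print the file size next to the AVIF conversion time for each input.
# Helper names are hypothetical; pass the test images as arguments.

# File size in bytes (wc pads its output on some systems, so strip spaces).
file_size() {
    wc -c < "$1" | tr -d ' '
}

# Convert one file to AVIF and report size and elapsed wall-clock seconds.
time_convert() {
    start=$(date +%s)
    vips copy "$1" "x.avif[effort=2]"
    end=$(date +%s)
    echo "$1: $(file_size "$1") bytes, $((end - start))s"
}

# Only run the conversions when vips is actually installed.
if command -v vips >/dev/null 2>&1; then
    for f in "$@"; do
        if [ -f "$f" ]; then
            time_convert "$f"
        fi
    done
fi
```

Run as, for example, `sh compare.sh source.bmp source.tif source.png source.jpg` (the script and file names here are placeholders).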
Hey @jcupitt, I've had a workaround implemented for this, but unfortunately the issue occurred again, despite the workaround. Desperate to get a solution, I tried the latest Windows build of libvips and could not reproduce the issue at all. While I could repro the issue before with the Windows build, something appears to have changed. Since we've also rebuilt our custom libvips (and are also using the latest versions of all dependencies as well as libvips itself), that suggests to me that some underlying library is involved, which we grab from the package manager (Debian Bullseye repos in our case), while the Windows build fetches its dependencies itself and is thus likely on a newer version. I'll see if I can narrow it down some more at some point, to get closure for this issue and provide a solid workaround. |
I've a similar issue that only occurs when setting Test image: https://t0.nl/img_9237.jpg
$ vipsheader img_9237.jpg
img_9237.jpg: 2256x2256 uchar, 3 bands, srgb, jpegload
$ time vips copy img_9237.jpg x.avif[effort=2]
real 0m22.196s
user 0m22.786s
sys 0m1.486s
$ time vips extract_area img_9237.jpg x.avif[effort=2] 0 1 2256 2255
real 0m1.575s
user 0m3.359s
sys 0m0.309s
I also could reproduce this with
# libaom requires Y4M as input
$ ffmpeg -i img_9237.jpg img_9237.y4m
# quality = 50
# cq-level = ((100 - quality) * 63 + 50) / 100
# effort = 2
# cpu-used = 9 - effort
$ time aomenc -v -p 1 -w 2256 -h 2256 -t $(nproc) --allintra --cq-level=32 --tune=ssim --cpu-used=7 -o x.ivf img_9237.y4m
real 0m21.559s
user 0m22.604s
sys 0m0.993s
$ time aomenc -v -p 1 -w 2256 -h 2255 -t $(nproc) --allintra --cq-level=32 --tune=ssim --cpu-used=7 -o x.ivf img_9237.y4m
real 0m1.290s
user 0m3.250s
sys 0m0.255s |
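The option mapping given in the comments of the commands above can be checked with two small shell functions. This is a sketch only: the function names are hypothetical, and the formulas are copied from those comments rather than from any library source. Shell arithmetic is integer-only, so the `+ 50` acts as round-to-nearest.

```shell
#!/bin/sh
# Hypothetical helpers; the formulas come from the comments in the
# aomenc commands above.

# quality -> value for aomenc --cq-level
quality_to_cq_level() {
    echo $(( ((100 - $1) * 63 + 50) / 100 ))
}

# effort -> value for aomenc --cpu-used
effort_to_cpu_used() {
    echo $(( 9 - $1 ))
}

# quality = 50 and effort = 2, as used above
echo "--cq-level=$(quality_to_cq_level 50) --cpu-used=$(effort_to_cpu_used 2)"
# prints "--cq-level=32 --cpu-used=7"
```

These are the same `--cq-level=32` and `--cpu-used=7` values that appear in the `aomenc` invocations above.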
... while investigating the resulting Disabling the intra block copy prediction mode with the
$ time aomenc -v -p 1 -w 2256 -h 2256 -t $(nproc) --enable-intrabc=0 --allintra --cq-level=32 --tune=ssim --cpu-used=7 -o x.ivf img_9237.y4m
real 0m1.244s
user 0m3.228s
sys 0m0.221s |
The forthcoming aom v3.8.0 will default https://aomedia.googlesource.com/aom/+/ea6c28f94f542bb3a1cc002aaf3ce341f6cb74ff |
It seems I forgot to turn off power saving mode on this 7 year old laptop. 🤦‍♂️ Below are the timings for the latest two versions of aom in performance mode, along with the timings for the original reproducer.
aom v3.7.1:
$ time vips copy img_9237.jpg x.avif[effort=2]
real 0m9.344s
user 0m9.870s
sys 0m0.486s
$ time vipsthumbnail source.png -s 3840x -o x.avif[effort=2]
real 0m19.307s
user 0m21.908s
sys 0m0.792s
aom v3.8.0:
$ time vips copy img_9237.jpg x.avif[effort=2]
real 0m9.952s
user 0m10.424s
sys 0m0.460s
$ time vipsthumbnail source.png -s 3840x -o x.avif[effort=2]
real 0m20.476s
user 0m22.904s
sys 0m0.795s
So, no significant difference. Upon inspecting commit https://aomedia.googlesource.com/aom/+/ea6c28f94f542bb3a1cc002aaf3ce341f6cb74ff, it appears that IntraBC is only disabled for
Completely disabling IntraBC (work-in-progress commit kleisauke/libheif@9aff88d) brings the timings closer to what one would expect.
$ time vips copy img_9237.jpg x.avif[effort=2]
real 0m0.470s
user 0m1.003s
sys 0m0.094s
$ time vipsthumbnail source.png -s 3840x -o x.avif[effort=2]
real 0m1.152s
user 0m3.051s
sys 0m0.144s |
@wantehchang Can I ping you for this? I considered submitting this patch:
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Kleis Auke Wolthuizen <github@kleisauke.nl>
Date: Sat, 9 Dec 2023 10:41:18 +0100
Subject: [PATCH 1/1] Allintra: avoid extensive intrabc search for speed >= 3
This is similar to commit ea6c28f94f542bb3a1cc002aaf3ce341f6cb74ff, but applied
to the ALLINTRA mode as well.
See: https://github.com/libvips/libvips/issues/2983
Upstream-Status: Pending
diff --git a/av1/encoder/speed_features.c b/av1/encoder/speed_features.c
index 1111111..2222222 100644
--- a/av1/encoder/speed_features.c
+++ b/av1/encoder/speed_features.c
@@ -425,6 +425,7 @@ static void set_allintra_speed_features_framesize_independent(
sf->mv_sf.full_pixel_search_level = 1;
sf->mv_sf.search_method = DIAMOND;
+ sf->mv_sf.use_intrabc = 0;
// TODO(chiyotsai@google.com): the thresholds chosen for intra hog are
// inherited directly from luma hog with some minor tweaking. Eventually we
However, it seems to help just a little bit.
$ time vips copy img_9237.jpg x.avif[effort=2]
real 0m8.848s
user 0m8.906s
sys 0m0.486s
$ time vipsthumbnail source.png -s 3840x -o x.avif[effort=2]
real 0m16.488s
user 0m17.607s
sys 0m0.801s
Looking at: Am I correct that it only disables the IntraBC predictor search, rather than the entire IntraBC mode? |
@kleisauke Hi Kleis. I analyzed the code. There is an important difference between calling The two methods have one common side effect: they both cause the I will discuss this issue with my colleague Jingning Han tomorrow and ask for his advice. |
@wantehchang Thanks for the analysis! Let me know if I should file an issue at https://bugs.chromium.org/p/aomedia/issues/list for this. Note to self: this also affects the latest libaom (v3.8.1, at the time of writing). For WebAssembly, the performance issue is more severe since SIMD is not supported by default (although it can be partially supported by patching the build script). |
Bug report
Describe the bug
I'm not sure if the bug is in libvips, libheif or libaom, but you got to start somewhere...
When converting a PNG (specifically PNG, WEBP does not exhibit this for example) image to an AVIF image of the same dimensions, libvips takes 10x longer than when resizing the image as well.
To Reproduce
Steps to reproduce the behavior:
vipsthumbnail.exe .\source.png --size 3840x -o destination.heif[compression=av1,lossless=false,effort=2]
Expected behavior
Running a slightly different command, with a different target width, takes only ~1 second.
vipsthumbnail.exe .\source.png --size 4000x -o destination.heif[compression=av1,lossless=false,effort=2]
You can also downsize and observe the same behaviour.
vipsthumbnail.exe .\source.png --size 3400x -o destination.heif[compression=av1,lossless=false,effort=2]
Or simply only change one dimension.
vipsthumbnail.exe .\source.png --size 3840x1080 -o destination.heif[compression=av1,lossless=false,effort=2]
Actual behavior
Converting to AVIF without resizing the image takes ~10x as long.
Screenshots
Environment
(please complete the following information)
Additional context
We managed to reproduce this locally with the prebuilt Windows binaries, but also with Linux binaries built from source with both libaom 3.4.0 and libaom 3.3.0, and both vips 8.13.0 and 8.12.2.
Upscaling or downscaling an image takes roughly 2 seconds, compared to ~11 seconds when the image is not scaled at all.
Interestingly the "slow" version has a maximum memory consumption of 17MB, whereas upscaling it takes 38MB.
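To compare the sizes from the reproduction steps in one go, a loop along these lines may help. It is a rough sketch written for a POSIX shell (so `vipsthumbnail` rather than `vipsthumbnail.exe`); the helper names, the per-size output file names and the `SRC` variable are assumptions for illustration, while the save options are the ones from the commands above.

```shell
#!/bin/sh
# Sketch: time the same conversion at several target sizes.
# SRC and the helper names are hypothetical.
SRC="${SRC:-source.png}"

# Build the -o argument for a given --size value, e.g. 3840x.
# A distinct file name per size avoids overwriting earlier results.
output_spec() {
    echo "destination_$1.heif[compression=av1,lossless=false,effort=2]"
}

# Run one conversion and print the elapsed wall-clock seconds.
time_size() {
    start=$(date +%s)
    vipsthumbnail "$SRC" --size "$1" -o "$(output_spec "$1")"
    end=$(date +%s)
    echo "$1: $((end - start))s"
}

if command -v vipsthumbnail >/dev/null 2>&1 && [ -f "$SRC" ]; then
    for size in 3400x 3840x 4000x 3840x1080; do
        time_size "$size"
    done
else
    echo "vipsthumbnail or $SRC not available; nothing timed" >&2
fi
```

Whole-second resolution from `date` is coarse, but the difference reported here (~2 seconds versus ~11 seconds) is large enough to show up.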
vips-profile-slow.txt
vips-profile-fast.txt