
Switch to new-style callbacks #79

Closed
kleisauke opened this issue Dec 21, 2018 · 21 comments
Comments

@kleisauke (Member)

See: https://cffi.readthedocs.io/en/latest/using.html#extern-python-new-style-callbacks

The original callbacks are slower to invoke and have the same issue as libffi’s callbacks; notably, see the warning. The new style described in the present section does not use libffi’s callbacks at all.

Occurrences of ffi.callback:

_g_free_cb = ffi.callback('VipsCallbackFn', _g_free_cb_function)
(seems to be no longer used?)

_log_handler_cb = ffi.callback('GLogFunc', _log_handler)
(Used by the log handler)

cb = ffi.callback('VipsTypeMap2Fn', fn)
(Used by the documentation generator)

cb = ffi.callback('VipsArgumentMapFn', add_construct)
(Hot path code, used by almost all pyvips operations. Perhaps we should use vips_object_get_args for libvips 8.7+. See the lua-vips implementation)

This comment could also be addressed using the new-style callbacks.

After this has been resolved, we should look at #21 again and benchmark it under cffi's API mode (the Pillow performance test suite still depends on an older version of pyvips which doesn't support the API mode).
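For reference, a minimal sketch of the difference between the two styles, using a toy `int(int, int)` signature rather than pyvips code: old-style `ffi.callback` builds a C function pointer at runtime through libffi, which needs write+execute memory pages (the source of the warning in the cffi docs); new-style `extern "Python"` callbacks are declared in the `cdef` and compiled ahead of time, so they only work in API mode.

```python
from cffi import FFI

ffi = FFI()

# Old style (works in ABI mode): the C function pointer is generated
# at runtime by libffi, which requires write+execute memory.
@ffi.callback("int(int, int)")
def add_cb(a, b):
    return a + b

# The resulting cdata function pointer is callable from Python too.
print(add_cb(2, 3))

# The new style (API mode only) would instead be declared in the cdef
# and compiled ahead of time -- no libffi, no runtime code generation:
#
#   ffi.cdef('extern "Python" int add_cb(int, int);')
#   ffi.set_source("_example", "")
#
#   @ffi.def_extern()
#   def add_cb(a, b):
#       return a + b
```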

@jcupitt (Member)

jcupitt commented Dec 22, 2018

Oh, interesting!

Yes, let's make a branch to experiment with this in.

@kleisauke (Member, Author)

Experiment branch: https://github.com/libvips/pyvips/tree/new-style-callbacks

I removed the vips_argument_map callback for libvips >= 8.7 and moved the log handler to a new-style callback in 2eaa0bd. I also did some preliminary benchmarking with pillow-perf on my server, see:

# python3.6 run.py vips_load vips_full_cycle -n5 "${@:2}"
Vips load 2560×1600 RGB image
    Jpeg load           0.02725 s   150.34 Mpx/s
    Jpeg save           0.02041 s   200.73 Mpx/s

Vips full cycle 2560×1600 RGB image
    Load+save           0.04659 s    87.92 Mpx/s
    +transpose          0.06124 s    66.88 Mpx/s
    +resize             0.07699 s    53.20 Mpx/s
    +blur               0.10943 s    37.43 Mpx/s

# python3.6 run.py load full_cycle -n5 "${@:2}"
Load 2560×1600 RGB image
    Jpeg load           0.02230 s   183.68 Mpx/s
    Jpeg save           0.02043 s   200.51 Mpx/s

Full cycle 2560×1600 RGB image
    Load+save           0.04292 s    95.44 Mpx/s
    +transpose          0.05016 s    81.66 Mpx/s
    +resize             0.04766 s    85.94 Mpx/s
    +blur               0.06099 s    67.16 Mpx/s

@jcupitt (Member)

jcupitt commented Jan 7, 2019

Yes, I see about the same:

john@yingna ~/GIT/pillow-perf/testsuite (master) $ ./run.py vips_load vips_full_cycle -n5 "${@:2}"

Vips load 2560×1600 RGB image
    Jpeg load           0.01962 s   208.80 Mpx/s
    Jpeg save           0.01861 s   220.11 Mpx/s

Vips full cycle 2560×1600 RGB image
    Load+save           0.04029 s   101.67 Mpx/s
    +transpose          0.05464 s    74.96 Mpx/s
    +resize             0.06196 s    66.10 Mpx/s
    +blur               0.08460 s    48.42 Mpx/s
john@yingna ~/GIT/pillow-perf/testsuite (master) $ ./run.py load full_cycle -n5 "${@:2}"

Load 2560×1600 RGB image
    Jpeg load           0.01762 s   232.45 Mpx/s
    Jpeg save           0.01694 s   241.82 Mpx/s

Full cycle 2560×1600 RGB image
    Load+save           0.03480 s   117.69 Mpx/s
    +transpose          0.03995 s   102.53 Mpx/s
    +resize             0.03312 s   123.67 Mpx/s
    +blur               0.04096 s    99.99 Mpx/s

pillow-simd wins by a factor of two on the full pipeline, so the pillow-perf results browser should lift the vips bar by a bit more than two.

Of course the same caveats apply as in the previous discussion: this is testing the speed of operations in single-threaded mode (naturally, since that's the focus of pillow-simd), not the end-to-end speed. libvips can overlap input and output (disabled in this test) and can parallelise operations (disabled in this test), so in the vips-bench benchmark, pyvips wins by x2.

We should update vips-bench as well, it's been a while.

@jcupitt (Member)

jcupitt commented Jan 7, 2019

I re-ran vips-bench with this change:

program, time, peak RES
tiffcp, 0.10, 148.34765625
vips-c, 0.16, 40.765625
vips.lua, 0.15, 41.16796875
vips.php, 0.16, 57.0859375
vips.js, 0.46, 70.41015625
vips.py, 0.18, 49.2890625
ruby-vips, 0.23, 47.98828125
pillow.py: __version__ = 5.3.0.post0
pillow, 0.35, 250.375
vips, 0.33, 39.2890625
gm, 0.62, 488.89453125
pnm, 0.61, 75.83984375
rmagick, 0.79, 722.25390625
1thread-vips-c, 0.52, 34.13671875
opencv, 0.91, 223.44140625
... etc.

Notes:

  1. vips.py is now rather close to C, we've helped the call overhead a lot!
  2. vips.py beats pillow-simd by about x2.
  3. vips in one-thread mode is beaten by pillow-simd by ~60%.
  4. vips-1thread is still overlapping input and output, so perhaps that explains the remaining difference.
  5. My node-vips binding is really slow :(

pillow-perf and vips-bench seem to tell roughly the same story.

@jcupitt (Member)

jcupitt commented Jan 7, 2019

I guess we should also verify that libvips in pillow-perf is hitting the vector path, and that the mask sizes pyvips and pillow-simd are using are comparable.

@jcupitt (Member)

jcupitt commented Jan 7, 2019

Oh, we could try arguing again for moving the transpose to the end. It hurts libvips to have it at the start of the pipeline, and (I think?) no one uses this ordering.

@jcupitt (Member)

jcupitt commented Jan 8, 2019

I tested, and pillow-perf does hit the libvips vector path correctly, phew!

I tried adjusting the benchmark slightly:

  • The test cases as they exist now have pyvips at 48 Mpx/s and pillow-simd at 99 Mpx/s, so pillow-simd is 2.1x faster.
  • If you move the transpose to the end of the pipeline, pillow is 1.9x faster.
  • If you also enable sequential mode, pillow is 1.83x faster.
  • If you also change min_ampl to 0.13 (to match the max error from pillow's box filter), pillow is 1.77x faster.
  • If you also enable threading, pillow is 10% faster.

The min_ampl change is interesting: pillow-simd is approximating a gaussian blur with a set of box filters. You can see the code here:

https://github.com/python-pillow/pillow-perf/blob/master/testsuite/cases/pillow.py#L70

I tried to get a number for the peak error by finding the max abs difference between the pillow-simd box filter and the "perfect" float gaussian in libvips -- it's 9. Perhaps RMS difference would be a better metric. It would also be better to compare to a perfect pillow gaussian.

Anyway, if you adjust min_ampl until you get a comparable value, you can raise it to 0.13 (abs max diff of 6). This produces a slightly shorter convolution mask and so speeds up processing a little.
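To make the comparison concrete, here's a rough numpy sketch (not the pillow-perf or pillow-simd code) of the idea being measured above: approximate a gaussian kernel by convolving several box filters, picking the box width with the usual variance-matching heuristic, then look at the peak absolute error against a sampled gaussian on a 0-255 scale.

```python
import numpy as np

def gaussian_kernel(sigma, radius):
    """Sampled, normalised 1D gaussian kernel."""
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x * x / (2.0 * sigma * sigma))
    return k / k.sum()

def box_approx_kernel(sigma, n_boxes=3):
    """Approximate a gaussian by convolving n_boxes box filters.

    The odd box width is chosen so the combined variance of the boxes
    is close to sigma^2 -- the standard trick behind box-filter blur
    approximations like pillow-simd's.
    """
    width = int(round(np.sqrt(12.0 * sigma * sigma / n_boxes + 1.0)))
    if width % 2 == 0:
        width += 1
    box = np.ones(width) / width
    k = box
    for _ in range(n_boxes - 1):
        k = np.convolve(k, box)
    return k

sigma = 2.0
approx = box_approx_kernel(sigma)
radius = (len(approx) - 1) // 2
exact = gaussian_kernel(sigma, radius)

# Peak error expressed on an 8-bit scale, comparable in spirit to the
# "max abs diff" numbers quoted above.
max_diff_255 = float(np.max(np.abs(approx - exact)) * 255.0)
print(f"peak error on 0-255 scale: {max_diff_255:.1f}")
```

The exact error depends on sigma, the number of boxes, and how the box width is rounded, which is why comparing at a matched error level (adjusting min_ampl) is fairer than comparing at default settings.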

@jcupitt (Member)

jcupitt commented Jan 16, 2019

Oh, interesting, I tried on my two-core i5 laptop and with unmodified pillow-perf (i.e. vips single-threaded) I see:

john@kiwi:~/GIT/pillow-perf/testsuite (master)$ ./run.py load full_cycle -n5 "${@:2}"
Load 2560×1600 RGB image
    Jpeg load           0.02616 s   156.55 Mpx/s
    Jpeg save           0.02163 s   189.39 Mpx/s

Full cycle 2560×1600 RGB image
    Load+save           0.04365 s    93.84 Mpx/s
    +transpose          0.04953 s    82.69 Mpx/s
    +resize             0.08669 s    47.25 Mpx/s
    +blur               0.11792 s    34.73 Mpx/s

john@kiwi:~/GIT/pillow-perf/testsuite (master)$ ./run.py vips_load vips_full_cycle -n5 "${@:2}"
Vips load 2560×1600 RGB image
    Jpeg load           0.03005 s   136.29 Mpx/s
    Jpeg save           0.02709 s   151.21 Mpx/s

Vips full cycle 2560×1600 RGB image
    Load+save           0.05460 s    75.02 Mpx/s
    +transpose          0.07702 s    53.18 Mpx/s
    +resize             0.08534 s    48.00 Mpx/s
    +blur               0.12341 s    33.19 Mpx/s

So they are essentially the same speed.

If I enable libvips threading, I see:

Vips full cycle 2560×1600 RGB image
    Load+save           0.06365 s    64.36 Mpx/s
    +transpose          0.07805 s    52.48 Mpx/s
    +resize             0.07109 s    57.62 Mpx/s
    +blur               0.09675 s    42.34 Mpx/s

So libvips wins fairly easily.

If I make the other changes too, i.e. reorder the pipeline and enable sequential mode, I see:

Full cycle 2560×1600 RGB image
    Load+save           0.04372 s    93.70 Mpx/s
    +resize             0.08488 s    48.26 Mpx/s
    +blur               0.11566 s    35.42 Mpx/s
    +transpose          0.11656 s    35.14 Mpx/s
Vips full cycle 2560×1600 RGB image
    Load+save           0.03601 s   113.75 Mpx/s
    +resize             0.05084 s    80.56 Mpx/s
    +blur               0.06683 s    61.29 Mpx/s
    +transpose          0.07371 s    55.57 Mpx/s

I'll double-check on the big i7 machine.

Shall we merge this change? It seems good to me.

@kleisauke (Member, Author)

Interesting, I'll try on my Intel i5-8600K machine when I'm home. The new results on the speed and memory use wiki look promising. It's nice to see that pyvips and lua-vips are rather close to C.

I just added a small improvement to the new-style-callbacks branch, see: 1cdef97. I think everything is ready for merging now.

By the way, if you open a PR to pillow-perf, you may also want to update the hyperlinks to pyvips and libvips. See here.

@jcupitt (Member)

jcupitt commented Jan 16, 2019

Yes, I have the same results as before on the i7, so it really is an architectural thing.

Perhaps pillow-simd is using AVX-512 on the desktop, and Orc is not?

Anyway, I'll merge this.

@jcupitt (Member)

jcupitt commented Jan 16, 2019

I made a PR for the URL change, but decided not to pester homm about rerunning the benchmark or reorganising the full_cycle test. I'm sure he'll update at some point and use the latest version.

@jcupitt jcupitt closed this as completed Jan 16, 2019
@jcupitt jcupitt reopened this Jan 16, 2019
@jcupitt (Member)

jcupitt commented Jan 16, 2019

Sorry, I forgot, you were going to try on your desktop i5.

@kleisauke (Member, Author)

On my 8th generation Intel Core i5 processor, I see (with an unmodified pillow-perf):

[kleisauke@pc-kaw testsuite]$ ./run.py load full_cycle -n5 "${@:2}"

Load 2560×1600 RGB image
    Jpeg load           0.01327 s   308.78 Mpx/s
    Jpeg save           0.01239 s   330.59 Mpx/s

Full cycle 2560×1600 RGB image
    Load+save           0.02510 s   163.21 Mpx/s
    +transpose          0.02887 s   141.89 Mpx/s
    +resize             0.02374 s   172.54 Mpx/s
    +blur               0.03077 s   133.11 Mpx/s
[kleisauke@pc-kaw testsuite]$ ./run.py vips_load vips_full_cycle -n5 "${@:2}"

Vips load 2560×1600 RGB image
    Jpeg load           0.01502 s   272.67 Mpx/s
    Jpeg save           0.01373 s   298.26 Mpx/s

Vips full cycle 2560×1600 RGB image
    Load+save           0.02940 s   139.32 Mpx/s
    +transpose          0.04957 s    82.63 Mpx/s
    +resize             0.07155 s    57.25 Mpx/s
    +blur               0.07566 s    54.13 Mpx/s

I've installed the AVX2-enabled version of pillow-simd with:

CC="cc -mavx2" pip install -U --force-reinstall pillow-simd

libvips (from commit libvips/libvips@1824c64) with:

CFLAGS="-O3" CXXFLAGS="-O3" ./autogen.sh --prefix=/usr

And pyvips (API mode) from commit 1cdef97.

Environment:

OS: Fedora release 29 (Twenty Nine)
GCC version: (GCC) 8.2.1 20181215 (Red Hat 8.2.1-6)
Python version: 2.7.15

@ollie-bell

I have installed pyvips from conda-forge for osx-arm64. I get an error which I believe is related to this issue.

animate.py 4 <module>
from pyvips import Image

__init__.py 145 <module>
_log_handler_cb = ffi.callback('GLogFunc', _log_handler_callback)

api.py 403 callback
return callback_decorator_wrap(python_callable)  # direct mode

api.py 396 callback_decorator_wrap
return self._backend.callback(cdecl, python_callable,

MemoryError:
Cannot allocate write+execute memory for ffi.callback(). You might be running on a system that prevents this. For more information, see https://cffi.readthedocs.io/en/latest/using.html#callbacks

Please can you advise if this is possible to fix?

@jcupitt (Member)

jcupitt commented Dec 22, 2021

pyvips only uses ffi.callback in ABI mode, and pyvips only uses ABI mode if it can't find the libvips headers during install.

I'd look into what happened when you installed pyvips, and why it's (I think?) fallen back to ABI mode.

@kleisauke (Member, Author)

It looks like the macOS ARM64 builds in conda-forge are cross-compiled on a macOS x86_64 host.
https://github.com/conda-forge/pyvips-feedstock/blob/a1391a11c2a5c8454a159957f958a98f321f4feb/conda-forge.yml#L2

As a result, it cannot find libvips while building the binary API extension. I noticed this in the build logs:

Falling back to ABI mode. Details: unable to find pkg-config package "vips"

As a possible workaround, you can give your application the entitlement com.apple.security.cs.allow-unsigned-executable-memory (according to https://cffi.readthedocs.io/en/latest/using.html#callbacks-old-style).

Please open a new issue at https://github.com/conda-forge/pyvips-feedstock if this continues to be a problem.

@ollie-bell

ollie-bell commented Dec 22, 2021

OK, so it looks like the conda installation is broken for osx-arm64. Is there any reason why I couldn't install vips via homebrew and then install pyvips via pip? When I try this, it doesn't work out of the box, as

import _libvips

gives a ModuleNotFoundError. Could this be because homebrew has installed libvips in a path that isn't in the library search path? I'm not sure the exact .dylib file pyvips is looking for when it does import _libvips. e.g. is it trying to find /opt/homebrew/lib/libvips.dylib?

@kleisauke (Member, Author)

kleisauke commented Dec 22, 2021

You can check whether libvips is available and whether a sufficient version is installed with:

pkg-config --exists --print-errors "vips >= 8.2" && echo OK

(this is what pyvips does internally to verify if libvips is installed)

If libvips is installed in a non-standard prefix, you'll need to set the PKG_CONFIG_PATH variable. For Homebrew, this would be:

export PKG_CONFIG_PATH=$(brew --prefix vips)/lib/pkgconfig:$PKG_CONFIG_PATH

For checking whether the API or ABI mode is used, you can use pip show pyvips. The --verbose flag of pip can be used to find out the exact reason why the ABI mode is installed.

pip install --user pyvips --verbose

@ollie-bell

Thanks for your help. The problem was that I had installed pkgconfig (the Python interface) via conda, but hadn't realised that this also installs the pkg-config tool itself in the conda environment. That pkg-config had no idea about the vips I had installed via homebrew. So I simply needed to conda remove pkg-config; then the homebrew pkg-config, which is aware of the homebrew vips installation, could be used during pip install pyvips.

@jcupitt (Member)

jcupitt commented Dec 23, 2021

I think mixing conda and homebrew binaries in one process is not really supported and is likely to bite you at some point. My understanding (maybe wrong) is that you are supposed to use the conda libvips binaries with conda Python.

(I use homebrew for everything on macos, fwiw)

@ollie-bell

If I do conda install libvips cffi pkgconfig followed by pip install pyvips, then it correctly builds pyvips in API mode, using the conda libvips binaries. For whatever reason, though, a plain conda install pyvips builds in ABI mode. I'll open an issue on the conda-forge pyvips feedstock for that.
