Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible problems with pre-built v8.1.0 binaries #11

Closed
lovell opened this issue Oct 7, 2015 · 36 comments
Closed

Possible problems with pre-built v8.1.0 binaries #11

lovell opened this issue Oct 7, 2015 · 36 comments

Comments

@lovell
Copy link
Member

lovell commented Oct 7, 2015

Hi John, I'm still investigating this one, but here's an advanced warning that the Windows-based CI environment for sharp seems to be having a minor meltdown with the pre-built v8.1.0 from http://www.vips.ecs.soton.ac.uk/supported/8.1/win32/

v8.1.0 - fails - https://ci.appveyor.com/project/lovell/sharp/build/183/job/26eemal068fwc9ar
v8.0.2 - passes - https://ci.appveyor.com/project/lovell/sharp/build/184/job/chya6eyurgxhd0pq

sharp's GYP binding file lists the library files expected in the lib directory of the .zip file. A quick check suggests everything is there with the correct filenames.

Has anything changed in the win32 packaging between 8.0.x and 8.1.x?

@jcupitt
Copy link
Member

jcupitt commented Oct 7, 2015

Oh dear! I updated all the dependencies I could to the latest versions, but I don't think anything in the packaging itself has changed.

I'll have a look as well tomorrow.

@lovell
Copy link
Member Author

lovell commented Oct 8, 2015

It looks like libvips was compiled without libjpeg (and possibly libpng/libtiff) support. libmagick is being used to load JPEGs (AssertionError: 'jpeg' === 'magick') and nothing is available to save JPEGs (Unsupported output __input).

@jcupitt
Copy link
Member

jcupitt commented Oct 8, 2015

I switched to libjpeg-turbo, I guess vips didn't link to it correctly. I'll investigate.

@jcupitt
Copy link
Member

jcupitt commented Oct 8, 2015

.. nope, we've been on -turbo for a while, I just updated the version from 1.4.0 to 1.4.2. I'll keep looking.

jcupitt added a commit that referenced this issue Oct 8, 2015
and possibly fix a linking problem with jpeg? see #11
@jcupitt
Copy link
Member

jcupitt commented Oct 8, 2015

I've uploaded a new zip:

http://www.vips.ecs.soton.ac.uk/supported/current/win32/vips-dev-8.1.0-1.zip

I'm not sure why it wasn't linking in the jpg support, I've not changed much :-( I did reorder the deps for vips, perhaps that helped. I now see:

$ ./vips-dev-8.1.0/bin/vips.exe -l | grep -i jp
        VipsForeignLoadJpeg (jpegload_base), load jpeg priority=0
... etc.

so I think that means it should work.

I've also added webp support, I think I forgot to add it to the list of supported libraries.

@lovell
Copy link
Member Author

lovell commented Oct 8, 2015

Cheers John, I'll test the new binaries this evening. Thanks for adding WebP support too!

lovell added a commit to lovell/sharp that referenced this issue Oct 9, 2015
@lovell
Copy link
Member Author

lovell commented Oct 9, 2015

The 8.1.0-1 rebuild appears to segfault on the colourspace conversion tests when run in Appveyor CI :(

https://ci.appveyor.com/project/lovell/sharp/build/job/xenanmm3st6vgtgh

Remote debugging CI environments is far from my idea of fun so I'll try running this on a local Windows machine over the weekend to get some better debug information.

@jcupitt
Copy link
Member

jcupitt commented Oct 9, 2015

Looks like it's CMYK to sRGB that's falling over. Do you know the exact sequence of operations that are being run? Could it be lcms?

@lovell
Copy link
Member Author

lovell commented Oct 9, 2015

The number in front of 13) From CMYK to sRGB means that test finished but failed. The test runner (mocha) prints the reasons for failure at the end of the test run, but sadly the segfault prevents us getting that far.

The segfault is most likely to be occurring during the next test, namely "From CMYK to sRGB with white background, not yellow" which also uses vips_icc_transform as well as a vips_embed.

It could well be lcms, yes, and I should be able to get a lot more info when running it locally.

@jcupitt
Copy link
Member

jcupitt commented Oct 9, 2015

There were some changes to the thing to get ICC profiles from JPEG images, that might be worth checking. The PNG and TIFF profile extractors are unchanged.

@lovell
Copy link
Member Author

lovell commented Oct 11, 2015

I've been able to reproduce this locally and it seems to relate to filesystem-based loaders. All the relevant format libraries are present at compile, link and run time, but reading from any file always falls back to libmagick as the loader. (Reading from a buffer works as expected.)

A common function all the file loaders hit is vips__get_bytes. The only thing I can see that could be remotely problematic are the two stack allocations of FILENAME_MAX. What defines this constant at compile time? Could it have changed recently?

I noticed the use of -O3 with 8.1.0 in commit 62d889b - perhaps this could affect stack allocation?

@jcupitt
Copy link
Member

jcupitt commented Oct 11, 2015

How bizarre! You could try --vips-cache-trace, it shows all the operations that are being executed:

$ vips invert k2.jpg x.jpg --vips-cache-trace
vips cache : jpegload filename="k2.jpg" out=((VipsImage*) 0x7fdb5d002010) flags=((VipsForeignFlags) VIPS_FOREIGN_SEQUENTIAL) access=((VipsAccess) VIPS_ACCESS_SEQUENTIAL_UNBUFFERED) -
vips cache : copy in=((VipsImage*) 0x7fdb5d002010) out=((VipsImage*) 0x7fdb5d002330) -
vips cache+: cast in=((VipsImage*) 0x7fdb5d002330) out=((VipsImage*) 0x7fdb5d0024c0) format=((VipsBandFormat) VIPS_FORMAT_UCHAR) -
vips cache : copy in=((VipsImage*) 0x7fdb5d0024c0) out=((VipsImage*) 0x7fdb5d002650) -
vips cache+: embed in=((VipsImage*) 0x7fdb5d002650) out=((VipsImage*) 0x7fdb5d0027e0) x=0 y=0 width=1450 height=2048 -
vips cache : invert in=((VipsImage*) 0x7fdb5d002010) out=((VipsImage*) 0x7fdb5d0021a0) -
vips cache+: colourspace in=((VipsImage*) 0x7fdb5d0021a0) out=((VipsImage*) 0x7fdb5d002970) space=((VipsInterpretation) VIPS_INTERPRETATION_sRGB) -
vips cache+: cast in=((VipsImage*) 0x7fdb5d002970) out=((VipsImage*) 0x7fdb5d002b00) format=((VipsBandFormat) VIPS_FORMAT_UCHAR) -
vips cache+: linecache in=((VipsImage*) 0x7fdb5d002e20) out=((VipsImage*) 0x7fdb5c88a1b0) tile-height=8 access=((VipsAccess) VIPS_ACCESS_SEQUENTIAL_UNBUFFERED) -
vips cache+: sequential in=((VipsImage*) 0x7fdb5d002e20) out=((VipsImage*) 0x7fdb5c88a020) tile-height=8 access=((VipsAccess) VIPS_ACCESS_SEQUENTIAL_UNBUFFERED) -
vips cache : jpegsave in=((VipsImage*) 0x7fdb5d0021a0) filename="x.jpg" -

You could prove it's going via Magick. I'll test in Wine here.

@lovell
Copy link
Member Author

lovell commented Oct 11, 2015

Could the WebP DLL have been compiled on a machine with AVX2 intrinsics? I'm seeing the following segfault/output from WinDbg on a machine without AVX2:

libwebp_5!VP8EncDspInitAVX2+649
6f75d679 660f7f2c24      movdqa  xmmword ptr [esp],xmm5

EXCEPTION_RECORD:  ffffffff -- (.exr 0xffffffffffffffff)
ExceptionAddress: 6f75d679 (libwebp_5!VP8EncDspInitAVX2+0x00000649)
   ExceptionCode: c0000005 (Access violation)
  ExceptionFlags: 00000000
NumberParameters: 2
   Parameter[0]: 00000000
   Parameter[1]: ffffffff
Attempt to read from address ffffffff

CONTEXT:  00000000 -- (.cxr 0x0;r)
eax=03d6df00 ebx=00000000 ecx=03d6d19c edx=03d6da80 esi=03d6d19c edi=03d6d1bc
eip=6f75d679 esp=03d6d16c ebp=03d6d2ec iopl=0         nv up ei pl nz ac pe nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00010216
libwebp_5!VP8EncDspInitAVX2+0x649:
6f75d679 660f7f2c24      movdqa  xmmword ptr [esp],xmm5 ss:002b:03d6d16c=03d6db880000000003d6d1b0000004d0

DEFAULT_BUCKET_ID:  WRONG_SYMBOLS

https://chromium.googlesource.com/webm/libwebp/+/master/src/dsp/enc.c#778 suggests there's a compile-time check around the inclusion of the VP8EncDspInitAVX2 function.

@lovell
Copy link
Member Author

lovell commented Oct 12, 2015

I spotted there's a runtime check around VP8EncDspInitAVX2 via a call to VP8GetCPUInfo but a recent commit suggests the current published version might not be thread-safe https://chromium-review.googlesource.com/#/c/73369/

@lovell
Copy link
Member Author

lovell commented Oct 12, 2015

" You could try --vips-cache-trace"

Thanks John, everything seems to works as expected via vips.exe. I only experience the problem with file loaders falling back to libmagick when using libvips as a shared library nested inside C++ code inside JavaScript code inside C++ code, hence the possible stack-based suspicions.

Windows has a 1MB stack by default, which is small compared with Linux/Mac OS's 8MB, so my next step will be to attempt to increase this via something like StackReserveSize.

If you have the time, a win32 build of libvips v8.1.0 without the -O3 optimisations and WebP dependency will reduce the difference between 8.0.2 and 8.1.0 and help narrow these two separate problems down.

@jcupitt
Copy link
Member

jcupitt commented Oct 12, 2015

The linux stack is only 2mb with threading enabled, but you're right, it sounds a bit like a stack crash.

I'll make one without webp for you.

vips has been using O3 for a while, I'd think that would be OK. Probably?

@jcupitt
Copy link
Member

jcupitt commented Oct 12, 2015

I've had some strange errors in Wine as well, which could also be a stack overflow. I'll dig some more.

@lovell
Copy link
Member Author

lovell commented Oct 12, 2015

"vips has been using O3 for a while, I'd think that would be OK."

Yes, you're quite right, sorry John - I've just seen https://github.com/jcupitt/build-win32/blob/master/8.0/vips.modules#L361 in the 8.x build config.

"I'll make one without webp for you."

Great, thank you. I'm happy to help test the re-addition of WebP after this file loader thing is solved.

@jcupitt
Copy link
Member

jcupitt commented Oct 12, 2015

Here's a build without webp: http://www.vips.ecs.soton.ac.uk/development/vips-dev-8.1.0.zip

@lovell
Copy link
Member Author

lovell commented Oct 12, 2015

Gotcha! Commit jcupitt/libvips@ec0e74e means a Windows path containing a drive letter such as C:\Program Files... will be mangled. Relative paths work, absolute Windows paths fail.

@lovell
Copy link
Member Author

lovell commented Oct 12, 2015

Temporarily changing paths to be relative also seems to have fixed the WebP segfault, at least for the previously failing test that converts from TIFF input > WebP output. I guess this is because libtiff was used rather than libmagick, which was able to play more nicely with icc_transform. Of course there might still be a WebP thread safety thing to watch out for.

@jcupitt
Copy link
Member

jcupitt commented Oct 13, 2015

Oh well done! Argh, stupid filename parsing, what a terrible idea.

I'll fix it and add some tests.

jcupitt added a commit to libvips/libvips that referenced this issue Oct 13, 2015
helps windows, see libvips/build-win32#11

also add some tests
@jcupitt
Copy link
Member

jcupitt commented Oct 13, 2015

I've changed it a bit, and added some tests:

https://github.com/jcupitt/libvips/blob/368a74abcd3d152f22bae8de5e4221c22236f34c/test/test_iofuncs.py

it passes for me on linux. Do your paths that broke pass now as well? I'm not sure if I changed the splitter enough.

@lovell
Copy link
Member Author

lovell commented Oct 13, 2015

Thanks John. If you're able to create and publish a win32 build for 8.1.0 + this patch then I'll update sharp's Windows CI env to use it.

@jcupitt
Copy link
Member

jcupitt commented Oct 13, 2015

Here's a test build of 8.1.1:

http://www.vips.ecs.soton.ac.uk/development/

it includes the filename parsing change and -O3, but not webp. If it doesn't work, could you paste the exact filename that it fails on?

@lovell
Copy link
Member Author

lovell commented Oct 13, 2015

No joy :( https://ci.appveyor.com/project/lovell/sharp/build/191/job/lq131okib271i7lp

I suspect the scenario where there's only one colon in the filename needs adding to these tests. By way of example, C:\projects\sharp\test\fixtures\2569067123_aca715a2ee_o.jpg is the absolute path of a test image on the CI server.

jcupitt added a commit to libvips/libvips that referenced this issue Oct 13, 2015
oh argh, we were mangling windows paths

see libvips/build-win32#11
@jcupitt
Copy link
Member

jcupitt commented Oct 13, 2015

Oh argh :-( it passes now, I'll make another win32 test build.

@jcupitt
Copy link
Member

jcupitt commented Oct 14, 2015

I've put up 8.1.1-1, including this fix and webp support.

http://www.vips.ecs.soton.ac.uk/development/

@lovell
Copy link
Member Author

lovell commented Oct 14, 2015

The previously failing tests now pass, thanks John!

It looks like the Windows CI environment is still suffering from a WebP-related segfault. I'll need to debug this on a local Windows machine to confirm - possibly this evening.

@lovell
Copy link
Member Author

lovell commented Oct 14, 2015

Attempting to save in the WebP format generates the same AVX2-related segfault as before, as well as a previously-unseen SSE2-related segfault when attempting to save RGBA WebP images:

eax=03880370 ebx=f0f00000 ecx=00000000 edx=03887a60 esi=03887a60 edi=0000000a
eip=6f75e2c8 esp=05e7f038 ebp=03880370 iopl=0         nv up ei pl zr na pe nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00010246
libwebp_5!WebPInitAlphaProcessingSSE2+0x208:
6f75e2c8 660f7f3424      movdqa  xmmword ptr [esp],xmm6 ss:002b:05e7f038=f0f00000038803f00388049803886f20

The machine I'm using does have SSE2 (and SSE4.2) support so my best guess is that the intrinsics code generated by the version of gcc in mingw could be borked a la http://www.virtualdub.org/blog/pivot/entry.php?id=363

libwebp has a recent commit to allow the disabling of runtime intrinsic detection but sadly this arrived after the most recent 0.4.3 release - see webmproject/libwebp@46305ca

If the jhbuild tool allows you to build WebP from its git repo rather than the 0.4.3 published release then the new --disable-avx2, --disable-sse4.1 and --disable-sse2 flags will then be available to hopefully mask this buggy behaviour.

@jcupitt
Copy link
Member

jcupitt commented Oct 15, 2015

I think my instinct would be against building from git: it would mean we'd get slightly different code every time we made a binary and could make reproducing problems even harder.

Let's just remove Windows webp support for now. I'll make another 8.1.1 build.

@jcupitt
Copy link
Member

jcupitt commented Oct 15, 2015

Also, great detective work on this annoying issue, thank you Lovell!

@lovell
Copy link
Member Author

lovell commented Oct 15, 2015

Thanks John, totally agree with removing libwebp for now, at least until the next published release.

An alternative solution, should anyone need WebP on Windows in the near future, could be to attempt to apply the single .patch to 0.4.3.

@jcupitt
Copy link
Member

jcupitt commented Oct 15, 2015

I've put up a new test zip:

http://www.vips.ecs.soton.ac.uk/development/vips-dev-8.1.1-2.zip

You can get jhbuild to build from a particular point in git, so we could also get the patch that way, but let's not bother. I'm sure there will be a 0.4.4 pretty soon with this all resolved.

jcupitt added a commit that referenced this issue Oct 15, 2015
see notes in module file, also #11
@lovell
Copy link
Member Author

lovell commented Oct 15, 2015

All tests now passing, thank you. https://ci.appveyor.com/project/lovell/sharp/build/198

@jcupitt
Copy link
Member

jcupitt commented Oct 15, 2015

All the vips tests pass too, so let's bless this as 8.1.1.

I've put a copy of the same zip in the /supported/win32 areas as vips-dev-8.1.1.zip, so you should be able to update your build script with confidence.

Thanks again for nailing this bug, and sorry for the mess up with : parsing. I promise not to touch that bit of code again.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants