Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

1.11.1 release seems broken (edit: fixed now) #157

Closed
lazka opened this issue Nov 5, 2022 · 19 comments · Fixed by #159
Closed

1.11.1 release seems broken (edit: fixed now) #157

lazka opened this issue Nov 5, 2022 · 19 comments · Fixed by #159

Comments

@lazka
Copy link

lazka commented Nov 5, 2022

The version released today fails when called like ninja --version and doesn't have a zero return code.

$ ninja --version
1.11.1.git.kitware.jobserver-1
$ echo $? 
245

This makes for example meson fail which calls ninja --version to detect the ninja version.

$ meson _build
The Meson build system
[...]
ERROR: Could not detect Ninja v1.8.2 or newer

This is on Ubuntu 22.04 using Python 3.10.7

@lazka
Copy link
Author

lazka commented Nov 5, 2022

Just calling ninja or ninja --help is also broken, both exit with an error code.

@lazka lazka changed the title [regression] Error when called with "--version" 1.11.1 release seems broken Nov 5, 2022
@mayeut
Copy link
Contributor

mayeut commented Nov 5, 2022

Thanks for the report. Don't know how it passed all tests in the first place because the simple reproducer indeed shows the failure.
@henryiii, @jcfr, can one of you yank the release on PyPI please. I can't do that.

@henryiii
Copy link
Contributor

henryiii commented Nov 5, 2022

I do not have manage permissions for either cmake or ninja. Only @jcfr or @thewtex. I really think yank permissions should match release permissions. Delete, sure, but yanking is non-destructive.

Edit: if it's only a manylinux1 problem, the whole release doesn't need yanking. Just the manylinux1 file (which could maybe just be deleted?)

@henryiii
Copy link
Contributor

henryiii commented Nov 5, 2022

We should make sure we check the return code in the tests. Edit: it looks like we do...

FYI, this is okay on macOS.

@henryiii
Copy link
Contributor

henryiii commented Nov 6, 2022

If you run it without the Python wrapper, it's:

root@87dda27e737d:/# usr/local/lib/python3.10/dist-packages/ninja/data/bin/ninja --version
1.11.1.git.kitware.jobserver-1
Segmentation fault

The released binary seems fine.

@henryiii
Copy link
Contributor

henryiii commented Nov 6, 2022

I think I'll have a rebuild out for the 1.11.1 issue soon. Looks like ninja 1.11.1 isn't compatible with manylinux1. Though why is it passing tests? Anyway, manylinux2010 builds don't suffer the same problem, and I can upload 2010 builds to the same release. ;)

@eli-schwartz
Copy link

eli-schwartz commented Nov 6, 2022

Isn't manylinux1 vs manylinux2010 just about which version of glibc and libstdc++ you build against? Seems bizarre that ninja would be incompatible with anything there. It might fail to run due to missing symbols on too-old systems, but that should not manifest as successfully printing the version, then failing during cleanup+exit.

@henryiii
Copy link
Contributor

henryiii commented Nov 6, 2022

It's segfaulting. It seems to get prettied up a bit by the python subprocess call. I locally rebuilt both manylinux1 and 2010 versions, and only the manylinux1 segfaults. I've uploaded a 2010 version, let's see if that helps - it shouldn't hurt anything, other than the tiny bit of wasted space.

It should have failed the tests, though I think those also run in a manylinux1 container.

I'd say it seems equally unlikely that the ninja developers would still be testing on CentOS, an operating system that has been end-of-life for something like five years.

Technically manylinux1 and even manylinux2010 are also EoL, though due to old pip versions running around, they still are useful to ship. CentOS 8 shipped with pip 9. 🤦

@henryiii
Copy link
Contributor

henryiii commented Nov 6, 2022

Reproduced in #254. The manylinux1 wheel works in manylinux1, but not in newer OSs - that's weird, I thought GLIBC wasn't supposed to be able to do that. Anyway, that means we have the ideal situation: Working manylinux1 wheels on manylinux1, and manylinux2010 wheels everywhere else (unless you have really old pip, then, sorry, but not shipping manylinux1 would be about the same). The 1.11.1 release is saved! :)

FYI, all special archs requires 2014 anyway, so those were fine.

(I'll upload a 32-bit manylinux2010 wheel to be on the safe side, just in case anybody cares about 32-bit linux)

@henryiii henryiii changed the title 1.11.1 release seems broken 1.11.1 release seems broken (edit: fixed now) Nov 6, 2022
@mayeut
Copy link
Contributor

mayeut commented Nov 6, 2022

Thanks for the quick fix @henryiii.
All tests for wheels are indeed running inside the manylinux container used to build said wheel. This is one thing where multibuild differs from cibuildwheel and there might be something to be done but that's another topic for cibuildwheel.
I'll have a look at #158 once I get some understanding of what's going on (debugging through CI though so might be a bit slow)

@mayeut
Copy link
Contributor

mayeut commented Nov 6, 2022

Here's the backtrace for the observed issue (rebuilt the wheel without stripping but still in Release mode because Debug mode does not fail...):

Program received signal SIGSEGV, Segmentation fault.
0x0000000000000000 in ?? ()
#0  0x0000000000000000 in ?? ()
#1  0x0000000000436052 in std::ctype<char>::_M_widen_init() const ()
#2  0x00000000004174fa in (anonymous namespace)::GuessParallelism() [clone .5562] ()
#3  0x000000000041850e in (anonymous namespace)::real_main(int, char**) [clone .6198] ()
#4  0x00000000004044f9 in main ()

The ninja built on manylinux1 works correctly on manylinux1, manylinux2010, manylinux2014, manylinux_2_24 & ubuntu18.04. It segfaults on manylinux_2_28, ubuntu20.04+

I don't know where is the actual bug (I think it could be either a devtoolset-2 bug or a generic libstdc++ bug).
In any case, another possible solution/workaround for this is to continue building on manylinux1 but statically link libstdc++.

@henryiii
Copy link
Contributor

henryiii commented Nov 6, 2022

What's the size difference?

@mayeut
Copy link
Contributor

mayeut commented Nov 6, 2022

What's the size difference?

x86_64:

  • 301 kB vs 144 kB for the wheel
  • 753 kB vs 299 kB for the ninja binary

i686:

  • 317 kB vs 155 kB for the wheel
  • 768 kB vs 285 kB for the ninja binary

PS: we might want to do this on musllinux where wheels (that are grafting libstdc++) are around 700/750 kB

@lazka
Copy link
Author

lazka commented Nov 26, 2022

thanks!

@ibc
Copy link

ibc commented May 16, 2023

So can I use ninja 1.11.1 or not? PR #159 fixing this issue was merged in Nov 26, 2022. However the available 1.11.1 version still dates from august 30, 2022:
https://github.com/ninja-build/ninja/releases

@eli-schwartz
Copy link

This has nothing to do with ninja releases.

It has everything to do with an external binary package :) So what you care about is the dates for https://pypi.org/project/ninja/#files

@lazka
Copy link
Author

lazka commented May 16, 2023

and as pointed out here: #157 (comment) a wheel for newer distros was uploaded back then, so this is "fixed" for any non-ancient distro.

@henryiii
Copy link
Contributor

Yes, a wheel with static linking for old manylinuxes is the only thing that hasn't been uploaded, but no one seems to have cared about that yet, so we haven't made a post release.

@ibc
Copy link

ibc commented May 17, 2023

Clear now, thanks a lot for explaining.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants