Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ci: macos-latest is changing to macos-14 ARM runners #3092

Merged
merged 3 commits into from
Apr 29, 2024

Conversation

henryiii
Copy link
Member

@henryiii henryiii commented Apr 24, 2024

Committed via https://github.com/asottile/all-repos

The previous meaning was macos-12, now it is switching to macos-14.

@jpivarski
Copy link
Member

On all PRs, the packaging-test.yml, build_cpp_wheels for MacOS (macos-13 or macos-latest) have been failing with this segfault:

  ../../../../../../../../../Users/runner/work/awkward/awkward/tests/test_1345_avro_reader.py Fatal Python error: Segmentation fault
  
  Current thread 0x00007ff8592527c0 (most recent call first):
    File "/private/var/folders/bc/7ycrpl196j7dsckgtn2m6g800000gn/T/cibw-run-imck_09d/cp38-macosx_universal2/venv-test-x86_64/lib/python3.8/site-packages/awkward/_connect/avro.py", line 81 in __init__
    File "/private/var/folders/bc/7ycrpl196j7dsckgtn2m6g800000gn/T/cibw-run-imck_09d/cp38-macosx_universal2/venv-test-x86_64/lib/python3.8/site-packages/awkward/operations/ak_from_avro_file.py", line 48 in from_avro_file
    File "/private/var/folders/bc/7ycrpl196j7dsckgtn2m6g800000gn/T/cibw-run-imck_09d/cp38-macosx_universal2/venv-test-x86_64/lib/python3.8/site-packages/awkward/_dispatch.py", line 39 in dispatch
    File "/Users/runner/work/awkward/awkward/tests/test_1345_avro_reader.py", line 19 in test_int

Fortunately, it's a well-localized segfault; it's always in test_1345_avro_reader.py. In fact, removing that one file makes the test pass.

I used that fact to get the wheel that it built as an artifact and run it on my Mac (ARM M2). With the wheel built by GitHub Actions (macos-13 or macos-latest), I can verify that test_1345_avro_reader.py causes a segfault, and narrowed it down to the single Python line that does it: creating a ForthMachine64 (or ForthMaching32). For instance, here's a minimal reproducer for the segfault:

from awkward.forth import ForthMachine64
tmp = ForthMachine64("123")

However, if I build awkward-cpp on my Mac in the usual way, the above does not cause a segfault, nor does the whole test_1345_avro_reader.py. So there's something special about the way that GitHub Actions (macos-13 or macos-latest) compiles awkward-cpp that makes ForthMachine64 segfault on initialization.

Also, other C++ code in libawkward is fine, even with the GitHub Actions-built wheel. For instance, ArrayBuilder is a C++ class, accessed through pybind11 just like ForthMachine64, but I can construct that and use it without any problems.

So I've got two problems:

  1. This segfault is an important blocker. In principle, we shouldn't merge any PRs with this, but we definitely can't release a version of awkward-cpp until it's fixed.
  2. I don't know how I can diagnose it. Ordinarily, I would set up a workflow in which I modify the C++, recompile, and test until I narrow it down to the cause. However, when I build the C++ locally, there's no segfault. The only method I have of testing this is to have GitHub Actions build the wheel, download it from the webpage, manually install it, etc., and that's way too long of a workflow for guess-and-try.

@henryiii, do you have any idea what might be going on here, at least to get me into a state where I can build awkward-cpp the same way that it's built on GitHub Actions?

  • I've switched to Python 3.8 (in a conda environment).
  • I tried using cibuildwheel, but it complains, thinking that /Users/jpivarski/mambaforge/envs/py38/bin/python is a system-wide Python installation. I was squeamish about installing CPython-3.8.pkg from python.org because I didn't know where it was going to put it and if I'd be able to remove it afterward. And even if I did all of that, cibuildwheel would use the same compiler anyway, right?
  • I tried installing act to run the GitHub Action locally, and then I installed Docker, and after all of that, found that MacOS was the only OS it wouldn't run because you can't put it in a container. The work-around, -P macos-latest=-self-hosted didn't work and probably defeats the purpose, anyway.

@jpivarski
Copy link
Member

The segfault in MacOS ForthMachine64 was introduced after April 24 at 10:32 CDT and before April 24 at 2:06pm CDT.

you, @matthewfeickert, and @jonas-eschle were talking about the MacOS upgrade on GitHub on April 23. Maybe it didn't roll out until the next day? Maybe this new segfault is unrelated?

@henryiii
Copy link
Member Author

henryiii commented Apr 27, 2024

macos-12 is a quick fix. Yes, this is due to the rollout of macos-latest moving from macos-12 to macos-14 (ARM). Interestingly, though, it's not the ARM move, but macOS 13 that breaks it. Probably Xcode 15.

@jpivarski
Copy link
Member

Also, just staring at ArrayBuilder, which can be instantiated with the GitHub Actions-built wheel, and ForthMachine64, which segfaults, I don't see any obvious differences in the class constructors or how they're both wrapped by pybind11.

ForthMachine64 constructor

template <typename T, typename I>
ForthMachineOf<T, I>::ForthMachineOf(const std::string& source,
int64_t stack_max_depth,
int64_t recursion_max_depth,
int64_t string_buffer_size,
int64_t output_initial_size,
double output_resize_factor)
: source_(source)
, output_initial_size_(output_initial_size)
, output_resize_factor_(output_resize_factor)
, stack_buffer_(new T[(size_t)stack_max_depth])
, stack_depth_(0)
, stack_max_depth_(stack_max_depth)
, string_buffer_(new char[(size_t)string_buffer_size])
, string_buffer_size_(string_buffer_size)
, current_inputs_()
, current_outputs_()
, is_ready_(false)
, current_which_(new int64_t[(size_t)recursion_max_depth])
, current_where_(new int64_t[(size_t)recursion_max_depth])
, recursion_current_depth_(0)
, recursion_max_depth_(recursion_max_depth)
, do_recursion_depth_(new int64_t[(size_t)recursion_max_depth])
, do_stop_(new int64_t[(size_t)recursion_max_depth])
, do_i_(new int64_t[(size_t)recursion_max_depth])
, do_current_depth_(0)
, current_error_(util::ForthError::none)
, count_instructions_(0)
, count_reads_(0)
, count_writes_(0)
, count_nanoseconds_(0)
{
std::vector<std::string> tokenized;
std::vector<std::pair<int64_t, int64_t>> linecol;
tokenize(tokenized, linecol);
compile(tokenized, linecol);
}

ForthMachine64 wrapper in pybind11

template <typename T, typename I>
py::class_<ak::ForthMachineOf<T, I>, std::shared_ptr<ak::ForthMachineOf<T, I>>>
make_ForthMachineOf(const py::handle& m, const std::string& name) {
return (py::class_<ak::ForthMachineOf<T, I>,
std::shared_ptr<ak::ForthMachineOf<T, I>>>(m, name.c_str())
.def(py::init([](const std::string& source,
int64_t stack_size,
int64_t recursion_depth,
int64_t string_buffer_size,
int64_t output_initial_size,
double output_resize_factor)
-> std::shared_ptr<ak::ForthMachineOf<T, I>> {
return std::make_shared<ak::ForthMachineOf<T, I>>(source,
stack_size,
recursion_depth,
string_buffer_size,
output_initial_size,
output_resize_factor);
}),
py::arg("source"),
py::arg("stack_size") = 1024,
py::arg("recursion_depth") = 1024,
py::arg("string_buffer_size") = 1024,
py::arg("output_initial_size") = 1024,
py::arg("output_resize_factor") = 1.5)

ArrayBuilder constructor

ArrayBuilder::ArrayBuilder(const BuilderOptions& options)
: builder_(UnknownBuilder::fromempty(options)) { }

ArrayBuilder wrapper in pybind11

py::class_<ak::ArrayBuilder>
make_ArrayBuilder(const py::handle& m, const std::string& name) {
return (py::class_<ak::ArrayBuilder>(m, name.c_str())
.def(py::init([](const int64_t initial, double resize) -> ak::ArrayBuilder {
return ak::ArrayBuilder({initial, resize});
}), py::arg("initial") = 1024, py::arg("resize") = 8)

other than the fact that the ForthMachine64 has more data members (almost all of which are simple std::vectors, while ArrayBuilder holds an object, for options).

In their header files, they're both declared with EXPORT_SYMBOL.

https://github.com/scikit-hep/awkward/blob/main/awkward-cpp/include/awkward/forth/ForthMachine.h

https://github.com/scikit-hep/awkward/blob/main/awkward-cpp/include/awkward/builder/ArrayBuilder.h

(I'm just saving these here, to investigate again later.)

@henryiii
Copy link
Member Author

Do you have the latest Xcode? I think Xcode 15, which is in both images, is likely the change that causes the segfault.

@jpivarski
Copy link
Member

jpivarski commented Apr 27, 2024

I had been trying to figure out what my compiler is. I have the cxx-compiler package installed in my conda environment, which is how I normally control what compiler I'm using, but when I typed which c++, I got files—not symlinks—in /usr/bin.

I'm doing Spotlight search and not finding Xcode. How do I find that?

Xcode is not in the Launchpad, either. I think I minimally installed Xcode to be able to compile from the command line, but I don't have the full GUI.

@jpivarski
Copy link
Member

jpivarski commented Apr 27, 2024

As a quick fix, it would let us merge PRs and maybe even release awkward-cpp (which I was planning to do on Tuesday).

The test passes now:

2024-04-27T00:11:40.8552500Z ../../../../../../../../../Users/runner/work/awkward/awkward/tests/test_1345_avro_reader.py . [  4%]
2024-04-27T00:11:40.8878300Z ...................                                                      [  4%]

If this is a bug in the Xcode compiler, or GitHub's configuration of it because of sed → gsed, asottile, ... reasons, then we can just wait for the fix. I thought I was going to have to debug C++!

As it is, I'd want to merge this working test (and update the test requirements) so that the other PRs can use it.

@jpivarski
Copy link
Member

jpivarski commented Apr 27, 2024

It looks like I have Xcode 15:

% pkgutil --pkg-info=com.apple.pkg.CLTools_Executables
package-id: com.apple.pkg.CLTools_Executables
version: 15.3.0.0.1.1708646388
volume: /
location: /
install-time: 1709827747

That date-stamp is March 7, 2024. I bought this laptop in September 2022 and only installed things like that once, though I let brew update from time to time. I guess Xcode updated itself, or something central in MacOS updated it.

Also,

% gcc --version
Apple clang version 15.0.0 (clang-1500.3.9.4)
Target: arm64-apple-darwin23.4.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin

@jpivarski
Copy link
Member

At least for now, I'm going to merge this so that the other PRs can use it. We'll have to revisit it because just using the older version isn't a solution that we can rely on long-term.

@jpivarski jpivarski merged commit 5e05752 into main Apr 29, 2024
40 of 41 checks passed
@jpivarski jpivarski deleted the all-repos_autofix_all-repos-sed branch April 29, 2024 16:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants