Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cannot run risingwave binary on Mac OS M1 (when linked by zld) #8608

Closed
ahmedriza opened this issue Mar 16, 2023 · 13 comments
Closed

Cannot run risingwave binary on Mac OS M1 (when linked by zld) #8608

ahmedriza opened this issue Mar 16, 2023 · 13 comments
Labels
type/bug Something isn't working
Milestone

Comments

@ahmedriza
Copy link

Describe the bug

I can build the risingwave source on my Mac OS M1 laptop. However, the risingwave binary cannot be run due to a missing symbol. Here's the concise output to demonstrate the error:

$ ./target/debug/risingwave
dyld[14116]: symbol not found in flat namespace (__ZN15protobuf_native2io23DeleteCodedOutputStreamEPN6google8protobuf2io17CodedOutputStreamE)
Abort trap: 6

Using c++filt I can see that symbol as:

$ c++filt __ZN15protobuf_native2io23DeleteCodedOutputStreamEPN6google8protobuf2io17CodedOutputStreamE
protobuf_native::io::DeleteCodedOutputStream(google::protobuf::io::CodedOutputStream*)

Not sure where to look to diagnose this further. Any help is appreciated.

To Reproduce

Build from source and try to run as documented on a Mac OS M1 laptop

./risedev playground

Expected behavior

No response

Additional context

No response

@ahmedriza ahmedriza added the type/bug Something isn't working label Mar 16, 2023
@github-actions github-actions bot added this to the release-0.1.18 milestone Mar 16, 2023
@xxchan
Copy link
Member

xxchan commented Mar 16, 2023

Did you check the developer guide?

Maybe you can try

brew install protobuf

@ahmedriza
Copy link
Author

ahmedriza commented Mar 17, 2023

Yes, I've followed the developer guide and protobuf is installed. If I did not have that, then I would not have been able to compile the code at all, since protoc is required to build the code in the first place.

As mentioned, all of the code compiles fine, but the error is at runtime. I am a bit surprised by this, and in fact libprotobuf is not dynamically linked; it is statically linked. Perhaps something is going wrong during link time.

@CAJan93
Copy link
Contributor

CAJan93 commented Mar 17, 2023

Maybe we could add a dependency to the Makefile.toml that checks if the host has all the required binaries installed? This way we could break early and give devs clear feedback what is missing.

@xxchan
Copy link
Member

xxchan commented Mar 17, 2023

It seems a common protobuf issue. You might find a solution here:
protocolbuffers/protobuf#10571

@xxchan
Copy link
Member

xxchan commented Mar 17, 2023

What's your protobuf version? According the the issue, you might try to upgrade it..

@xxchan
Copy link
Member

xxchan commented Mar 17, 2023

I am a bit surprised by this, and in fact libprotobuf is not dynamically linked; it is statically linked. Perhaps something is going wrong during link time.

This indeed sounds weird. Do you have more diagnostic information like where is the symbol used?

@ahmedriza
Copy link
Author

ahmedriza commented Mar 17, 2023

Maybe we could add a dependency to the Makefile.toml that checks if the host has all the required binaries installed? This way we could break early and give devs clear feedback what is missing.

Note that this is not related to missing binaries or anything of that sort. Please see the explanation above.

@ahmedriza
Copy link
Author

I am a bit surprised by this, and in fact libprotobuf is not dynamically linked; it is statically linked. Perhaps something is going wrong during link time.

This indeed sounds weird. Do you have more diagnostic information like where is the symbol used?

Let me check on the protoc version and precisely what libprotobuf is being linked against.

@ahmedriza
Copy link
Author

ahmedriza commented Mar 18, 2023

This is indeed related to the linker.

  1. I use a different linker (zld) which is faster than lld and is configured in my $HOME/.cargo/config.toml.
[target.aarch64-apple-darwin]
rustflags = ["-C", "link-arg=-undefined", "-C", "link-arg=dynamic_lookup", "-C", "link-arg=-fuse-ld=/opt/bin/zld"]
  1. The risingwave repo contains its own version of .cargo/config.toml which fixes the linker to lld, i.e:
[target.aarch64-apple-darwin]
rustflags = [
    "-Clink-arg=-fuse-ld=/opt/homebrew/opt/llvm/bin/ld64.lld"
]

The outcome is that if I use zld as the linker, then we end up with the error reported above at runtime. The reason why this happens at runtime is because I've used the -undefined and dynamic_lookup linker arguments. If I remove them, then the error will be reported during the build.

However, if I use lld then everything works.

I can see an issue about the Apple linker mentioned in the developer guide

It will be good to mention in the developer guide about the presence of the .cargo/config.toml which provides various linker configurations for different platforms.

It appears that using zld is causing a different issue as I have observed. It's not entirely clear why this fails when zld is used. I have used zld successfully with many Rust projects and not seen an issue like this.

@xxchan
Copy link
Member

xxchan commented Mar 18, 2023

Wow, thanks a lot for finding the cause. This indeed looks interesting 👀

BTW one quick question: How is zld invoked? Shouldn't the repo's .cargo/config.toml take precedence?


Oh I get it, the RUSTFLAGS will be merged

@ahmedriza
Copy link
Author

ahmedriza commented Mar 18, 2023

BTW one quick question: How is zld invoked? Shouldn't the repo's .cargo/config.toml take precedence?

Oh I get it, the RUSTFLAGS will be merged

That's correct. It took me a little while to discover the flags that were coming from the Cargo config in the repo.

I'll try upgrading zld since this could, of course, be a bug in zld.

@xxchan xxchan changed the title Cannot run risingwave binary on Mac OS M1 Cannot run risingwave binary on Mac OS M1 (when linked by zld) Mar 18, 2023
@ahmedriza
Copy link
Author

Looks like zld's developer has decided to stop developing it further and the llvm linker lld has now caught up to where zld was: https://eisel.me/lld. Hence, we can close this issue. Thanks @xxchan for adding the additional notes to the docs.

@CAJan93
Copy link
Contributor

CAJan93 commented Mar 20, 2023

Wild 😮 Thanks a lot for your work 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type/bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants