Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use DuckDB amalgamation instead of shared libraries #40

Merged
merged 37 commits into from
Sep 27, 2022

Conversation

begelundmuller
Copy link
Contributor

@begelundmuller begelundmuller commented Aug 25, 2022

See issue #38 for context.

This is a minimal PR that uses the DuckDB amalgamation to compile from source instead of relying upon a shared library. A few notes:

  • No environment variables or flags required to test/run/build.
  • This should work on any platform that supports CGO (incl. Windows).
  • On first build, it compiles DuckDB, which is excruciatingly slow (~10 mins). Go caches the build on subsequent runs.
  • It's necessary to check the amalgamation into the repository (like in go-sqlite3).

Next steps:

  • Test on Linux and Windows.
  • Double-check the compiler and linker flags set in duckdb.go.
  • Add build flag to optionally link to libduckdb instead of compiling from source.
  • What do you think of pre-compiling and checking in static libraries ('.a', not '.so') for common platforms? That's what rogchap/v8go does (see deps folder). That would still get rid of the build flags and produce a static binary, but you wouldn't have to wait for DuckDB to compile from source on most systems.

@marcboeker
Copy link
Owner

Thanks for the PR. I've tested it successfully under macOS with a build time of around 6 minutes. It's really cool to get rid of external dependencies.

Test on Linux and Windows.

I've tried to compile it on a MacBook Pro 2014 running Windows through Bootcamp. But after 12 hours it was still not done. The CPU usage on this Mac was around 20%. I've tried to increase the number of cores via -j4 but this didn't work either. Have you had success compiling it under Windows in a reasonable time?
Maybe we could add a flag to make use of multiple cores while compiling?

Double-check the compiler and linker flags set in duckdb.go.

What is linking against ws2_32.dll needed for?

Add build flag to optionally link to libduckdb instead of compiling from source.

This is a great suggestion, especially if you want to reduce the initial build time or cross compile for another architecture without having to install several different cross compile chains.

What do you think of pre-compiling and checking in static libraries ('.a', not '.so') for common platforms? That's what rogchap/v8go does (see deps folder). That would still get rid of the build flags and produce a static binary, but you wouldn't have to wait for DuckDB to compile from source on most systems.

Would it make sense to just download the different DuckDB libs and check them in? Or do you want to compile them separately on your own?

One thing I have been thinking about is, that we should explain how to cross compile for other platforms and OSes. I've tried to use a cross compile chain on my M1 Mac for a 64bit linux, but GOARCH=amd64 GOOS=linux CGO_ENABLED=1 CC=x86_64-unknown-linux-gnu-gcc go build -o linux examples/simple.go failed with the error ld: symbol(s) not found for architecture arm64 even though I've explicitly selected amd64.

I think your PR is an awesome addition to go-duckdb as it simplifies a lot. Hopefully we can merge this soon.

@begelundmuller
Copy link
Contributor Author

begelundmuller commented Sep 12, 2022

Hey, sorry for the delay here, but finally had time to work some more on it.

Firstly, I've updated the PR with several changes:

  • It now pre-compiles and uses static libraries for Linux (Intel) and macOS (Intel and ARM). It defaults to compiling from source on other platforms. (It should be relatively straightforward to add pre-compilation for more platforms, with the caveat that it increases the download size.)
  • I've added two build tags: duckdb_from_source forces compilation from source and duckdb_use_lib to dynamically link DuckDB (like the old approach). Obviously, you can't use both tags at the same time.
  • I've updated the README to reflect these changes, and also added a CONTRIBUTING file that describes how to upgrade to a new version of DuckDB.

Now, to answer your questions:

  1. Have you had success compiling it under Windows in a reasonable time? I haven't built it on Windows. However, I was able to successfully cross-compile from macOS to Windows using Zig, which has a C/C++ cross-compiler (see this link). I'm able to successfully run the resulting binary for examples/simple.go on Windows. It's obviously very experimental. If you want to try it: (initially, it throws two errors from the DuckDB side that you can resolve by removing the DUCKDB_API macro from the lines it mentions)
CGO_ENABLED=1 GOOS=windows GOARCH=amd64 \
  CC="zig cc -target x86_64-windows-gnu" \
  CXX="zig c++ -target x86_64-windows-gnu" \
  go build -trimpath ./examples/simple.go 
  1. Why link with ws2_32.dll? I don't know, but DuckDB does it and it doesn't compile for Windows without it.

  2. Would it make sense to just download the different DuckDB libs and check them in? Unfortunately, they only publish dynamic libraries (.so and .dylib), they no longer publish static libraries (.a). However, If you compile DuckDB from source using make, it will output static libraries, although not a single one (the third party dependencies are compiled separately, so you have to copy over a list of static libraries from several nested paths). Considering that, I thought it was simpler and more predictable to just directly pre-compile the amalgamation. However, it might make sense to lobby DuckDB to publish static library builds of the amalgamation.

  3. One thing I have been thinking about is, that we should explain how to cross compile for other platforms and OSes. Totally agree, although cross-compiling CGo is a pretty deep rabbit hole. I've done some research on it, and thought it was safest just to use GH Actions ability to run builds on multiple platforms. But if you really want to cross-compile, it seems like techknowlogick/xgo or goreleaser/goreleaser-cross are the best options. The aforementioned approach using Zig seems very promising, but is pretty experimental at the moment (for example, it appears that the binaries it creates for macOS can't handle C++ exceptions).

Remaining next steps (if you like this direction):

  • Document how to use it on Windows (or maybe even add pre-compilation for Windows)?
  • Ask the DuckDB maintainers for a review of the compiler and linker flags set in the cgo_*.go files

@marcboeker
Copy link
Owner

Thanks for the additional build tags, the fix for the return of an empty row and your answers to my question. So far I've reviewed and tested it on macOS and it looks good to me. I like the new flexibility derived by the build tags a lot.

The pipeline has some hiccups with rules which it thinks are missing in the Makefile, but are there. Do you have any idea why?

We can ask the DuckDB creators to review the compiler and linker flags and then merge the PR. In my opinion the missing building blocks for Windows can be added after the merge, so that we can get the release out in the hands of testers quickly. What do you think?

@begelundmuller
Copy link
Contributor Author

Hey, a few comments:

  • I tidied up some gcc compiler flags (I had originally copied them from v8go, but they don't seem to do anything important at this stage, and are not found in the sqlite3 project). Every remaining flag is necessary to make the build work – except for the -O3 optimizer flag, which appears to be DuckDB's recommended optimization level for production (as far as I can tell, Go doesn't itself set optimization flags for CGo code – but I might be wrong).
  • I think the main risks right now is that static linking might not be perfectly configured. DuckDB is getting statically linked, but the need to link in libc++ in cgo_static.go doesn't feel quite right to me. But I don't think this should be a blocker, everything seems to work – we can fix if anyone encounters a problem with this approach.
  • I updated the Github workflow to build and push the static libraries on any push to a non-master branch that changes the DuckDB amalgamation files. It failed before because the old approach couldn't correctly deal with cross-repo PRs (where the code is in another repo than the repo where the PR was filed). To show that the builds do work, here's a run that succeeded in building and pushing fresh static libs. The new workflow rule is:
push:
    branches-ignore:
      - master
    paths:
      - "duckdb.cpp"
      - "duckdb.h"
      - "duckdb.hpp"

By the way, I found this HN thread where someone requested a static DuckDB build for Go: https://news.ycombinator.com/item?id=31531219 😄

@marcboeker
Copy link
Owner

Hi @begelundmuller

Thanks for the adjustments. I've tested the latest version on macOS successfully. Are you fine with merging the current state so that further development is based on the new static builds?

That HN thread is very interesting. Great to see, that there is now a solution to this problem.

@begelundmuller
Copy link
Contributor Author

Yeah, it'd be great to merge this initial functionality!

@marcboeker marcboeker merged commit 2568f13 into marcboeker:master Sep 27, 2022
@marcboeker
Copy link
Owner

Awesome, it's merged now. Thank you for this great contribution, which makes using this driver much easier!

@begelundmuller
Copy link
Contributor Author

Nice – and thank you for building and maintaining this useful driver!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants