llvm 13.0{0,1rc1} on s390x: failed to JIT module: Added modules have incompatible data layouts: E-m:e-i1:8:16-i8:8:16-i64:64-f128:64-a:8:16-n32:64 (module) vs E-m:e-i1:8:16-i8:8:16-i64:64-f128:64-v128:64-a:8:16-n32:64 (jit) #53009

Open
df7cb opened this issue Jan 5, 2022 · 12 comments


@df7cb

df7cb commented Jan 5, 2022

PostgreSQL's query JITing broke on s390x when I moved from llvm11 to llvm13:

2022-01-05 16:28:45.980 CET client backend[44733] pg_regress/partition_join ERROR:  failed to JIT module: Added modules have incompatible data layouts: E-m:e-i1:8:16-i8:8:16-i64:64-f128:64-a:8:16-n32:64 (module) vs E-m:e-i1:8:16-i8:8:16-i64:64-f128:64-v128:64-a:8:16-n32:64 (jit)
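
The two layout strings in this error differ in a single component. A minimal sketch that makes the difference visible, assuming only that a data layout string is a `-`-separated list of components; the helper names `layout_components` and `layout_diff` are hypothetical and not part of LLVM or PostgreSQL:

```python
def layout_components(layout: str) -> list[str]:
    """Split an LLVM data layout string into its '-'-separated components."""
    return layout.split("-")


def layout_diff(module_layout: str, jit_layout: str) -> tuple[list[str], list[str]]:
    """Return (only_in_module, only_in_jit), preserving the original order."""
    module_parts = layout_components(module_layout)
    jit_parts = layout_components(jit_layout)
    only_in_module = [p for p in module_parts if p not in jit_parts]
    only_in_jit = [p for p in jit_parts if p not in module_parts]
    return only_in_module, only_in_jit


# The two strings are copied from the error message above.
MODULE = "E-m:e-i1:8:16-i8:8:16-i64:64-f128:64-a:8:16-n32:64"
JIT = "E-m:e-i1:8:16-i8:8:16-i64:64-f128:64-v128:64-a:8:16-n32:64"

only_in_module, only_in_jit = layout_diff(MODULE, JIT)
print("only in module:", only_in_module)  # prints: only in module: []
print("only in jit:   ", only_in_jit)     # prints: only in jit:    ['v128:64']
```

The JIT side carries an extra `v128:64` entry, which sets the alignment of 128-bit vectors to 64 bits; the module side has no such entry.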

Seen with both the Debian llvm 1:13.0.0-9 and 1:13.0.1~+rc1-1~exp3 packages. Architectures other than s390x are fine.

Full build log: https://buildd.debian.org/status/fetch.php?pkg=postgresql-14&arch=s390x&ver=14.1-3&stamp=1638529682&raw=0

Cc: @anarazel @sylvestre
Debian Bug: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1002029

@LebedevRI
Member

LebedevRI commented Jan 5, 2022

I'm not sure if it's the same issue, but I have seen a similar failure on ppc64 little-endian:
https://buildd.debian.org/status/fetch.php?pkg=halide&arch=ppc64el&ver=13.0.2-2&stamp=1640086522&raw=0

Unhandled exception: Internal Error at /<<PKGBUILDDIR>>/src/LLVM_Output.cpp:357 triggered by user code at : Warning: module's data layout does not match target machine's
e-m:e-i64:64-n32:64-S128-v256:256:256-v512:512:512
e-m:e-i64:64-n32:64

@df7cb
Author

df7cb commented Jan 5, 2022

PostgreSQL 14 JIT works fine with LLVM 13.0.0 on ppc64el, fwiw.

@anarazel
Contributor

This is likely because the s390 target changes its ABI "on its own":
https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/SystemZ/SystemZTargetMachine.cpp#L88

When PG JIT is used, CPU features are detected (there is no point in optimizing for a more generic CPU). If a CPU new enough for the vector ABI is detected, the SystemZ target emits an additional "-v128:64". So this will fail whenever the s390x CPU is new enough for UsesVectorABI() to return true, unless the bitcode files were also generated with a compatible ABI (not normally the case).
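
The effect can be modelled in a few lines. This is a simplified Python sketch of the behaviour described here, not the LLVM source (which is C++ in SystemZTargetMachine.cpp); the function name `compute_data_layout` and its `vector_abi` parameter are hypothetical stand-ins for the target's own layout computation and the result of UsesVectorABI(). The component strings are taken from the error message in this issue:

```python
def compute_data_layout(vector_abi: bool) -> str:
    """Model of how the s390x layout string depends on the vector ABI.

    vector_abi stands in for the result of UsesVectorABI(), which depends
    on the CPU and feature string passed to the target machine.
    """
    parts = ["E", "m:e", "i1:8:16", "i8:8:16", "i64:64", "f128:64"]
    if vector_abi:
        # Only emitted when the detected CPU supports the vector ABI.
        parts.append("v128:64")
    parts += ["a:8:16", "n32:64"]
    return "-".join(parts)


# Bitcode built for a generic CPU versus a JIT targeting the detected host CPU.
module_layout = compute_data_layout(vector_abi=False)
jit_layout = compute_data_layout(vector_abi=True)

print(module_layout)
print(jit_layout)
print("compatible:", module_layout == jit_layout)  # prints: compatible: False
```

Because the JIT compares the layout strings of added modules against its own, any difference in the CPU or features used on the two sides is enough to produce the mismatch.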

It seems quite broken to me to just change the ABI like this. -march=native or such shouldn't trigger an ABI break.

Somewhere on my overflowing TODO list I have an entry to implement a workaround for this. IIRC some Red Hat folks hit this previously.

@df7cb
Author

df7cb commented Jan 10, 2022

Just to clarify: the build and the tests run in the same environment here. (I guess that means the issue is possibly even worse in the wild.)

@df7cb
Author

df7cb commented Jan 10, 2022

> This is likely because the s390 target changes its ABI "on its own": https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/SystemZ/SystemZTargetMachine.cpp#L88

This was introduced in e09a1dc by @aniprasad.

Thanks @anarazel for investigating!

@skriesch
Member

This is the workaround patch we carry for PostgreSQL in openSUSE. It has been submitted to PostgreSQL but has not been accepted. Still, it is better than nothing.
https://build.opensuse.org/package/view_file/openSUSE:Factory:zSystems/postgresql14/0001-jit-Workaround-potential-datalayout-mismatch-on-s390.patch?expand=1

@anarazel
Contributor

FWIW, the patch hasn't been rejected, just neglected :(. There were a few minor cleanups needed before merging it, and somehow I lost track at that point. -ETOOMUCHSTUFF

@anarazel
Contributor

anarazel commented Feb 16, 2022

@skriesch Btw, have you considered fixing this on the LLVM side of things instead (I mean in a distribution patch)?

@df7cb
Author

df7cb commented Feb 16, 2022

Fwiw, in my case the problem is already present when the code is compiled and executed on the very same machine.

But of course it needs to be portable to other machines later as well via the distributed .deb files.

@skriesch
Member

skriesch commented Feb 17, 2022

We wanted to do that, and about half a year ago we filed a bug report with IBM for it. The bug report was rejected on the grounds that this should be resolved at the application level. When I asked the Product Owner for the reason, the answer was that the problem would be self-healing once all Linux distributions use z13...
That is an IBM problem.

It took a lot of research to arrive at the patch as a workaround. I do not have much experience with compiler development at the moment. :(

@skriesch
Member

skriesch commented Feb 17, 2022

If you are interested, this is the bug report on this topic filed by our openSUSE LLVM developer, with a suggestion for a possible solution: https://bugs.llvm.org/show_bug.cgi?id=50386

Oh... No. Its state is New. Sorry! It has not been rejected.

@anarazel
Contributor

@skriesch Thanks for the explanation / references. A bit depressing :(
