Compiling and running jffi for powerpc64-AIX gives illegal instructions errors #53

Closed
Taywee opened this issue May 3, 2018 · 6 comments · Fixed by #54

Comments

@Taywee
Contributor

Taywee commented May 3, 2018

System Details

Running on powerpc64-AIX 7.1, with gcc 4.9.4, build vars as follows:

export OBJECT_MODE=64
export CONFIG_SHELL=/opt/freeware/bin/bash
export CONFIG_ENV_ARGS=/opt/freeware/bin/bash

export CC="gcc -mcpu=power4 -mtune=power4 -maix64 -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -D_LARGE_FILES"

export CFLAGS="-DSYSV -D_AIX -D_AIX32 -D_AIX41 -D_AIX43 -D_AIX51 -D_AIX61 -D_AIX71 -D_ALL_SOURCE -DFUNCPROTO=15 -O -I/opt/freeware/include"
export LD=ld
export LDFLAGS="-L/opt/freeware/lib64 -L/opt/freeware/lib -Wl,-blibpath:/opt/freeware/lib64:/opt/freeware/lib:/usr/lib:/lib -Wl,-bmaxdata:0x80000000"

export PATH=/usr/bin:/bin:/etc:/usr/sbin:/usr/ucb:/usr/bin/X11:/sbin:/usr/vac/bin:/usr/vacpp/bin:/usr/ccs/bin:/usr/dt/bin:/usr/opt/perl5/bin:/opt/freeware/bin:/opt/freeware/sbin:/usr/local/bin:/usr/lib/instl

export JAVA_HOME=/usr/java7_64
export ANT_HOME=/opt/apache-ant

export PATH="${JAVA_HOME}/bin:${ANT_HOME}/bin:$PATH:/opt/freeware/bin"

Problem Description

Compiling and running a test program with jffi throws errors and crashes. Running an unmodified ant jar build gives the following error:

gmake[3]: Leaving directory '/home/tcr/jffi/build/jni/libffi-ppc64-aix'
gmake[2]: Leaving directory '/home/tcr/jffi/build/jni/libffi-ppc64-aix'
gmake[1]: Leaving directory '/home/tcr/jffi/build/jni/libffi-ppc64-aix'
In file included from /home/tcr/jffi/build/jni/libffi-ppc64-aix/include/ffi.h:67:0,
                 from /home/tcr/jffi/jni/jffi/jffi.h:45,
                 from /home/tcr/jffi/jni/jffi/Array.c:39:
/home/tcr/jffi/build/jni/libffi-ppc64-aix/include/ffitarget.h:159:5: error: "_CALL_ELF" is not defined [-Werror=undef]
 #if _CALL_ELF == 2

Modifying the GNUmakefile with the following change allows compilation, though:

diff --git a/jni/GNUmakefile b/jni/GNUmakefile
index 85ab6f2..7942f45 100755
--- a/jni/GNUmakefile
+++ b/jni/GNUmakefile
@@ -66,8 +66,10 @@ OFLAGS = -O2 $(JFLAGS)
 # MacOS headers aren't completely warning free, so turn them off
 WERROR = -Werror
 ifneq ($(OS),darwin)
+ifneq ($(OS),aix)
   WFLAGS += -Wundef $(WERROR)
 endif
+endif
 WFLAGS += -W -Wall -Wno-unused -Wno-parentheses -Wno-unused-parameter
 PICFLAGS = -fPIC
 SOFLAGS = # Filled in for each OS specifically

Compiling and running this C program with the library works (both static and dynamic):

#include <ffi.h>
#include <stdio.h>
#include <dlfcn.h>

int main(int argc, char **argv)
{
    if (argc < 2) {
        fputs("Must specify a library on the command line\n", stderr);
        return 1;
    }
    // Load in the library itself
    void *libhandle = dlopen(argv[1], RTLD_LOCAL | RTLD_LAZY | RTLD_MEMBER);
    if (!libhandle) {
        fputs("Could not link library\n", stderr);
        fprintf(stderr, "%s\n", dlerror());
        return 1;
    }
    void *fun = dlsym(libhandle, "sqlite3_libversion");
    if (!fun) {
        fputs("Could not find symbol in library\n", stderr);
        fprintf(stderr, "%s\n", dlerror());
        return 1;
    }

    // Set up libffi stuff
    ffi_cif cif;
    ffi_type *arg_types[0];

    /* Initialize the cif */
    if (ffi_prep_cif(&cif, FFI_DEFAULT_ABI, 0, &ffi_type_pointer, arg_types) == FFI_OK)
    {
        ffi_arg retval;
        ffi_call(&cif, fun, &retval, NULL);
        puts((const char *)retval);
    }
    dlclose(libhandle);
    return 0;
}
$ $CC -o ffitest ffitest.c $CFLAGS $LDFLAGS jffi/build/jni/libffi-ppc64-aix/.libs/libffi.a
$ ./ffitest 'libsqlite3.a(libsqlite3.so.0)'
3.8.8.3
$ $CC -o ffitest ffitest.c $CFLAGS $LDFLAGS jffi/build/jni/libffi-ppc64-aix/.libs/libffi.so.6 
$ LD_LIBRARY_PATH=jffi/build/jni/libffi-ppc64-aix/.libs ./ffitest 'libsqlite3.a(libsqlite3.so.0)'
3.8.8.3
tcr@api-lou-jum01-p ~ $ LD_LIBRARY_PATH=jffi/build/jni/libffi-ppc64-aix/.libs ldd ./ffitest  
./ffitest needs:
         /usr/lib/libc.a(shr_64.o)
         jffi/build/jni/libffi-ppc64-aix/.libs/libffi.so.6
         /unix
         /usr/lib/libcrypt.a(shr_64.o)
         /opt/freeware/lib64/libgcc_s.a(shr.o)
         /usr/lib/libpthreads.a(shr_xpg5_64.o)

However, a simple Java example using jffi, built with this native library in the jar, fails:

package com.absperf.ffitest;

import com.kenai.jffi.Invoker;
import com.kenai.jffi.Type;
import com.kenai.jffi.Library;
import com.kenai.jffi.Function;
import com.kenai.jffi.MemoryIO;
import com.kenai.jffi.HeapInvocationBuffer;
import com.kenai.jffi.ArrayFlags;

public class App 
{
    public static void main(String[] args) {
        Library lib = Library.openLibrary(args[0], Library.LOCAL | Library.NOW);
        if (lib == null) {
            System.err.print("Error!:  ");
            System.err.println(lib.getLastError());
            System.exit(1);
        }
        System.out.println("GetSymbolAddress");
        long func_address = lib.getSymbolAddress("sqlite3_libversion");
        if (func_address == 0) {
            System.err.print("Error; could not find function address:  ");
            System.err.println(lib.getLastError());
            System.exit(1);
        }
        System.out.println("make function");
        Function func = new Function(func_address, Type.POINTER);
        System.out.println("make buffer");
        HeapInvocationBuffer buf = new HeapInvocationBuffer(func);
        MemoryIO mem = MemoryIO.getInstance();
        System.out.println("put pointers");
        Invoker invoker = Invoker.getInstance();
        System.out.println("invoke function");
        long address = invoker.invokeAddress(func, buf);
        System.out.println(new String(mem.getZeroTerminatedByteArray(address)));
    }
}

On Linux:

$ java -jar ./target/ffitest-1.0-SNAPSHOT-shaded.jar libsqlite3.so
GetSymbolAddress
make function
make buffer
put pointers
invoke function
3.23.1

On AIX:

$ /usr/java7_64/bin/java -jar ~/ffitest-1.0-SNAPSHOT-shaded.jar 'libsqlite3.a(libsqlite3.so.0)'
GetSymbolAddress
make function
make buffer
put pointers
invoke function
Unhandled exception
Type=Illegal instruction vmState=0x00040000
J9Generic_Signal_Number=00000010 Signal_Number=00000004 Error_Value=00000000 Signal_Code=0000001e
Handler1=09001000A0277D10 Handler2=09001000A026DAE0
R0=0000000000000000 R1=000001001012F690 R2=0000000000000000 R3=0000010010423D50
R4=0000010010423EF8 R5=000001001012F760 R6=0000010010423EF8 R7=0900000000654094
R8=0000010010423C70 R9=000001001012F760 R10=09001000A027B6D0 R11=0000000000000000
R12=000000008200042B R13=000001001013A800 R14=00000000600257A0 R15=0000000060000100
R16=0000000000000007 R17=0000000000000000 R18=09001000A027C4E8 R19=09001000A07254C0
R20=0000000060107EE0 R21=00000000600257E8 R22=0000000000000000 R23=0000010010474090
R24=09001000A071D4A8 R25=000000000000007E R26=0000000000000000 R27=0000000000000000
R28=000001001012F720 R29=09001000A071D4A8 R30=000001001012F940 R31=0000000000000008
IAR=0000000000000000 LR=09000000024D3C40 MSR=A00000000000D032 CTR=0000000000000000
CR=2000242220000002 FPSCR=8200000000000000 XER=2000000282000000
FPR0 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR1 404d800000000000 (f: 0.000000, d: 5.900000e+01)
FPR2 41e0000000000000 (f: 0.000000, d: 2.147484e+09)
FPR3 3fe8000000000000 (f: 0.000000, d: 7.500000e-01)
FPR4 3fa9999a00000000 (f: 0.000000, d: 5.000001e-02)
FPR5 412e848000000000 (f: 0.000000, d: 1.000000e+06)
FPR6 43300000000f4240 (f: 1000000.000000, d: 4.503600e+15)
FPR7 4530000000000000 (f: 0.000000, d: 1.934281e+25)
FPR8 0072002400530069 (f: 5439593.000000, d: 1.602102e-306)
FPR9 006e0067006c0065 (f: 7077989.000000, d: 1.335114e-306)
FPR10 0074006f006e0048 (f: 7209032.000000, d: 1.780210e-306)
FPR11 006f006c00640065 (f: 6553701.000000, d: 1.379619e-306)
FPR12 3ff0000000000000 (f: 0.000000, d: 1.000000e+00)
FPR13 404d800000000000 (f: 0.000000, d: 5.900000e+01)
FPR14 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR15 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR16 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR17 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR18 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR19 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR20 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR21 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR22 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR23 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR24 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR25 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR26 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR27 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR28 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR29 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR30 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FPR31 0000000000000000 (f: 0.000000, d: 0.000000e+00)
Target=2_60_20170718_357001 (AIX 7.1)
CPU=ppc64 (4 logical CPUs) (0x80000000 RAM)
----------- Stack Backtrace -----------
ffi_call+0x98 (0x09000000024D374C [jffi8605287488869115587.so+0x174c])
invokeArrayWithObjects_+0x57c (0x09000000024E8864 [jffi8605287488869115587.so+0x16864])
Java_com_kenai_jffi_Foreign_invokeArrayReturnInt+0x30 (0x09000000024E8B48 [jffi8605287488869115587.so+0x16b48])
(0x09000000006D2348 [libj9vm26.so+0x8a348])
(0x090000000065B5A8 [libj9vm26.so+0x135a8])
(0x090000000067CCAC [libj9vm26.so+0x34cac])
(0x09000000009CECE0 [libj9prt26.so+0x2ce0])
(0x090000000067CDDC [libj9vm26.so+0x34ddc])
(0x090000000065BCFC [libj9vm26.so+0x13cfc])
(0x090000000066241C [libj9vm26.so+0x1a41c])
JavaMain+0x3dc (0x0000010000003CC0 [java+0x3cc0])
_pthread_body+0xf0 (0x0900000000568E14 [libpthreads.a+0x3e14])
---------------------------------------
JVMDUMP039I Processing dump event "gpf", detail "" at 2018/05/03 23:24:28 - please wait.
JVMDUMP032I JVM requested System dump using '/home/tcr/core.20180503.232428.19267668.0001.dmp' in response to an event
Note: "Enable full CORE dump" in smit is set to FALSE and as a result there will be limited threading information in core file.
JVMDUMP010I System dump written to /home/tcr/core.20180503.232428.19267668.0001.dmp
JVMDUMP032I JVM requested Java dump using '/home/tcr/javacore.20180503.232428.19267668.0002.txt' in response to an event
JVMDUMP010I Java dump written to /home/tcr/javacore.20180503.232428.19267668.0002.txt
JVMDUMP032I JVM requested Snap dump using '/home/tcr/Snap.20180503.232428.19267668.0003.trc' in response to an event
JVMDUMP010I Snap dump written to /home/tcr/Snap.20180503.232428.19267668.0003.trc
JVMDUMP013I Processed dump event "gpf", detail "".

I'm not sure what's going wrong here, and my JNI knowledge is limited, but I can probably work from here to isolate the issue. Based on the backtrace, I'm fairly sure the problem is not in libffi itself but somewhere in the JNI C code, though I don't know why it happens or how to avoid it. It may also be something in how I invoke the JNI build. I have a few things to experiment with, but I'm mostly poking around in the dark.

@Taywee
Contributor Author

Taywee commented May 4, 2018

Doing some good old printf debugging, I'm seeing some interesting behavior. I changed Library.c so that the symbol is invoked immediately after the dlsym lookup, like so:

    addr = dl_sym(j2p(handle), sym);
    printf("dlsym addr: %p\n", addr);
    printf("dlsym test: %s\n", (((const char *(*)(void))addr)()));

And as a test to see if the issue was happening in libffi, I changed the invoke in Invoke.c to also manually invoke in the same way:

    puts("manual call test");
    printf("long: %p\n", function);
    printf("ptr: %p\n", j2p(function));
    printf("test: %s\n", (((const char *(*)(void))j2p(function))()));
    FAULTPROT_CTX(env, ctx, ffi_call(&ctx->cif, FFI_FN(j2p(function)), retval, ffiArgs), );

The run looks like this:

GetSymbolAddress
dlsym addr: 9001000a071d4a8
dlsym test: 3.8.8.3
make function
make buffer
put pointers
invoke function
manual call test
long: 9001000a071d4a8
ptr: 9001000a071d4a8
Unhandled exception
Type=Illegal instruction vmState=0x00040000
...

The address is the same in both places, but the call only works in the first location. My knowledge of how AIX handles dynamic loading and memory mapping is limited, but perhaps the opened library is invalidated before the invocation, or the address is only valid from particular memory pages; that's a guess in the dark. There are some notes about dlopen's behavior with "inter-module references" and "intra-module references" depending on whether the runtime linker is invoked, so that might be related. This might just need the -brtl linker option. I'll run some more tests with that option to see whether it is a prerequisite.

@Taywee
Contributor Author

Taywee commented May 5, 2018

I found out what's happening, but I'm not completely sure why. I added output to the dlclose:

JNIEXPORT void JNICALL
Java_com_kenai_jffi_Foreign_dlclose(JNIEnv* env, jclass cls, jlong handle)
{
    puts("closing module");
    dl_close(j2p(handle));
}

This is the relevant Java code, just as a reminder:

        System.out.println("make function");
        Function func = new Function(func_address, Type.POINTER);
        System.out.println("make buffer");
        HeapInvocationBuffer buf = new HeapInvocationBuffer(func);
        System.out.println("get memory instance");
        MemoryIO mem = MemoryIO.getInstance();
        System.out.println("put pointers");
        Invoker invoker = Invoker.getInstance();
        System.out.println("invoke function");
        long address = invoker.invokeAddress(func, buf);
        System.out.println(new String(mem.getZeroTerminatedByteArray(address)));

And the output looks like this:

GetSymbolAddress
dlsym addr: 9001000a071d4a8
dlsym test: 3.8.8.3
make function
make buffer
closing module
get memory instance
put pointers
invoke function
manual call test
long: 9001000a071d4a8
ptr: 9001000a071d4a8
Unhandled exception
Type=Illegal instruction vmState=0x00040000

For some reason, creating the HeapInvocationBuffer causes JNI to close the library handle. I'm not sure if this happens on other platforms and just doesn't die due to different dlclose semantics or if it's a JVM difference that changes when things like these are invoked (this is the IBM J9 JVM).

@Taywee
Contributor Author

Taywee commented May 7, 2018

I think I found one source of that issue. J9's garbage collector appears to collect objects as soon as it detects that they are unused. The Library is collected (and finalized) almost as soon as it is no longer used or referenced, and jffi passes addresses and relationships around in raw form. Because a Function doesn't keep a reference to its Library, the Library is assumed unused and cleaned up early, before the function is even called. Adding a simple reference to the library after the call (like printing out its string representation) lets the invoke complete successfully, though I then run into another issue when trying to retrieve the value from memory. I haven't figured out where that issue is coming from (it's a segfault), but I think it might be another early free.
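One way to guard against this kind of early finalization can be sketched as follows. This is a minimal sketch, not jffi code: FakeLibrary and BoundFunction are hypothetical stand-ins, and the native call is replaced by a placeholder. The idea is that the wrapper holds a strong reference to its library and reads a volatile field on it after the call, so the library should stay reachable until the call returns.

```java
/** Hypothetical stand-in for a native library handle; this is not jffi's Library class. */
final class FakeLibrary {
    private volatile boolean closed = false;

    void close() { closed = true; }

    boolean isClosed() { return closed; }
}

/** Hypothetical wrapper that pairs a raw symbol address with the library that owns it. */
final class BoundFunction {
    private final FakeLibrary owner; // strong reference to the owning library
    private final long address;

    BoundFunction(FakeLibrary owner, long address) {
        this.owner = owner;
        this.address = address;
    }

    long invoke() {
        if (owner.isClosed()) {
            throw new IllegalStateException("library closed before call");
        }
        long result = address; // placeholder for the real native call through the address
        // Volatile read after the call: the owner is still in use here, so it
        // should not be treated as unreachable while the call above is running.
        if (owner.isClosed()) {
            throw new IllegalStateException("library closed during call");
        }
        return result;
    }
}
```

On Java 9 and later, java.lang.ref.Reference.reachabilityFence(owner) expresses the same intent directly; the Java 7 JVM used in this report predates it.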

@Taywee
Contributor Author

Taywee commented May 7, 2018

The segfault wasn't from an early free. For some reason, the Java program isn't getting the full address back in its long:

retptr: 9000000024bd440
invoked function
address: 24bd440

Looks like it's getting truncated somewhere along the line.

@Taywee
Contributor Author

Taywee commented May 7, 2018

It was these lines in Invoker:

        public final long invokeAddress(CallContext ctx, long function, HeapInvocationBuffer buffer) {
            return ((long)invokeInt(ctx, function, buffer)) & ADDRESS_MASK;
        }

For some reason, Platform.getPlatform().addressSize() returns 32, even though AIX64 uses 64-bit addresses, so the address is getting truncated. This might be a GCC issue: it appears that GCC doesn't define any of the ppc64 macros, only the plain ppc ones.
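The truncation can be reproduced in isolation. The following is a minimal sketch under assumptions: the class and method names are hypothetical, and MASK_32 is an assumed value, since the real ADDRESS_MASK is not shown in this thread. It models a 32-bit path that narrows the native result to an int before widening and masking it, using the retptr value from the log above.

```java
final class AddressTruncationDemo {
    // Assumed value of the 32-bit mask; the real ADDRESS_MASK is not shown in this thread.
    static final long MASK_32 = 0xffffffffL;

    /** Models a 32-bit path: the native result is narrowed to int, widened, then masked. */
    static long viaInt(long nativeResult) {
        int narrowed = (int) nativeResult; // the upper 32 bits are lost here
        return ((long) narrowed) & MASK_32;
    }

    /** Models a 64-bit path: the full long is returned untouched. */
    static long viaLong(long nativeResult) {
        return nativeResult;
    }

    public static void main(String[] args) {
        long retptr = 0x09000000024bd440L; // the retptr value from the log above
        System.out.println("32-bit path: " + Long.toHexString(viaInt(retptr)));
        System.out.println("64-bit path: " + Long.toHexString(viaLong(retptr)));
    }
}
```

The 32-bit path yields 24bd440, matching the truncated address in the log, while the 64-bit path preserves 9000000024bd440.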

@Taywee
Contributor Author

Taywee commented May 7, 2018

With some modifications, I get a successful program running:

$ /usr/java7_64/bin/java -jar ~/ffitest-1.0-SNAPSHOT-shaded.jar 'libsqlite3.a(libsqlite3.so.0)'
GetSymbolAddress
dlsym addr: 9001000a071d4a8
dlsym test: 3.8.8.3
make function
make buffer
Getting CPU architecture
ppc64
get memory instance
put pointers
invoke function
manual call test
long: 9001000a071d4a8
ptr: 9001000a071d4a8
test: 3.8.8.3
retptr: 9000000024bd440
invoked function
address: 9000000024bd440
3.8.8.3
com.kenai.jffi.Library@d879af27
com.kenai.jffi.HeapInvocationBuffer@e32d3392
com.kenai.jffi.MemoryIO$UnsafeImpl64@e091afec
com.kenai.jffi.Function@b5d86f15
com.kenai.jffi.Invoker$LP64@f4902716

There's a pretty good chance that this means AIX64 can be built and run just like the other platforms, but it doesn't fix the issue with J9's aggressive GC. Either way, this problem is largely fixed, and the GC early-free issue isn't directly related. I'll make a pull request for the AIX changes.
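Since the fix hinges on the platform being recognized as 64-bit, a quick check of what the JVM itself reports can help when debugging similar setups. This is a minimal sketch with a hypothetical helper, not jffi's Platform class, and how jffi actually determines the address size is not shown in this thread. It assumes the sun.arch.data.model property may be absent on some JVMs and falls back to a simple heuristic on os.arch.

```java
import java.util.Locale;

final class AddressSizeGuess {
    /**
     * Hypothetical helper: guesses the address size in bits.
     * dataModel is the value of a property such as sun.arch.data.model, or null if absent.
     */
    static int addressBits(String osArch, String dataModel) {
        if ("64".equals(dataModel)) return 64;
        if ("32".equals(dataModel)) return 32;
        // Fallback heuristic: treat architecture names containing "64" as 64-bit.
        String arch = osArch == null ? "" : osArch.toLowerCase(Locale.ROOT);
        return arch.contains("64") ? 64 : 32;
    }

    public static void main(String[] args) {
        int bits = addressBits(System.getProperty("os.arch"),
                               System.getProperty("sun.arch.data.model"));
        System.out.println("address size: " + bits);
    }
}
```

The heuristic is deliberately crude: it would misclassify 64-bit architectures whose names do not contain "64", so an explicit data-model property should take precedence whenever it is available.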
