[native-image] native image doesn't run in AWS Lambda custom runtime because of "Util_sun_misc_Signal.ensureInitialized: CSunMiscSignal.open() failed." #841

Closed
kencharos opened this issue Dec 5, 2018 · 27 comments

Comments

@kencharos

I tried running a native image in an AWS Lambda custom runtime, but it did not run.
The error summary is below.

Util_sun_misc_Signal.ensureInitialized: CSunMiscSignal.create() failed. errno: 38 Function not implemented
VMError.shouldNotReachHere: Util_sun_misc_Signal.ensureInitialized: CSunMiscSignal.open() failed.

The sample code is below:

package sample;

public class Main {
    public static void main(String[] args) {
        System.out.println("Hello Graal " + args[0]);
    }
}

The command used to build the native image is below:

native-image --no-server \
    --class-path aws-graal.jar \
    -H:EnableURLProtocols=http \
    -H:Name=aws-graal \
    -H:Class=sample.Main \
    -H:+ReportUnsupportedElementsAtRuntime \
    -H:+AllowVMInspection

I ran native-image in the oracle/graalvm-ce:1.0.0-rc9 Docker image.
The resulting image runs locally.

The full Lambda log is below:

START RequestId: 2fef123c-f827-11e8-ad2a-13d1ebd52648 Version: $LATEST
Util_sun_misc_Signal.ensureInitialized: CSunMiscSignal.create() failed. errno: 38 Function not implemented
VMError.shouldNotReachHere: Util_sun_misc_Signal.ensureInitialized: CSunMiscSignal.open() failed.

JavaFrameAnchor dump:

No anchors

DeoptStubPointer address: 0000000000000000

TopFrame info:

Lookup TotalFrameSize in CodeInfoTable:
SourceTotalFrameSize 96

VMThreads info:

VMThread 0000000001899010 STATUS_IN_JAVA (safepoints disabled) java.lang.Thread@0x7fe894099ea0

VM Thread State for current thread 0000000001899010:

0 (8 bytes): com.oracle.svm.jni.JNIThreadLocalEnvironment.jniFunctions = (bytes) 
0000000001899010: 0000000000000000

8 (32 bytes): com.oracle.svm.core.genscavenge.ThreadLocalAllocation.pinnedTLAB = (bytes) 
0000000001899018: 0000000000000000 0000000000000000
0000000001899028: 0000000000000000 0000000000000000


40 (32 bytes): com.oracle.svm.core.genscavenge.ThreadLocalAllocation.regularTLAB = (bytes) 
0000000001899038: 00007fe893c00000 00007fe893d00000
0000000001899048: 00007fe893c01490 0000000000000000


72 (8 bytes): com.oracle.svm.core.genscavenge.PinnedAllocatorImpl.openPinnedAllocator = (Object) null
80 (8 bytes): com.oracle.svm.core.heap.NoAllocationVerifier.openVerifiers = (Object) null
88 (8 bytes): com.oracle.svm.core.jdk.IdentityHashCodeSupport.hashCodeGeneratorTL = (Object) null
96 (8 bytes): com.oracle.svm.core.snippets.SnippetRuntime.currentException = (Object) null
104 (8 bytes): com.oracle.svm.core.thread.JavaThreads.currentThread = (Object) java.lang.Thread 00007fe894099ea0
112 (8 bytes): com.oracle.svm.core.thread.ThreadingSupportImpl.activeTimer = (Object) null
120 (8 bytes): com.oracle.svm.jni.JNIThreadLocalHandles.handles = (Object) null
128 (8 bytes): com.oracle.svm.jni.JNIThreadLocalPendingException.pendingException = (Object) null
136 (8 bytes): com.oracle.svm.jni.JNIThreadLocalPinnedObjects.pinnedObjectsListHead = (Object) null
144 (8 bytes): com.oracle.svm.jni.JNIThreadOwnedMonitors.ownedMonitors = (Object) null
152 (8 bytes): com.oracle.svm.core.genscavenge.ThreadLocalAllocation.freeList = (Word) 0 0000000000000000
160 (8 bytes): com.oracle.svm.core.stack.JavaFrameAnchors.lastAnchor = (Word) 0 0000000000000000
168 (8 bytes): com.oracle.svm.core.thread.VMThreads.IsolateTL = (Word) 140636889464832 00007fe893d6c000
176 (8 bytes): com.oracle.svm.core.thread.VMThreads.OSThreadIdTL = (Word) 140636912592704 00007fe89537a740
184 (8 bytes): com.oracle.svm.core.thread.VMThreads.nextTL = (Word) 0 0000000000000000
192 (4 bytes): com.oracle.svm.core.thread.Safepoint.safepointRequested = (int) -8883 ffffdd4d
196 (4 bytes): com.oracle.svm.core.thread.Safepoint.safepointRequestedValueBeforeSafepoint = (int) 0 00000000
200 (4 bytes): com.oracle.svm.core.thread.ThreadingSupportImpl.currentPauseDepth = (int) 0 00000000
204 (4 bytes): com.oracle.svm.core.thread.VMOperationControl.isLockOwner = (int) 0 00000000
208 (4 bytes): com.oracle.svm.core.thread.VMThreads$StatusSupport.safepointsDisabledTL = (int) 1 00000001
212 (4 bytes): com.oracle.svm.core.thread.VMThreads$StatusSupport.statusTL = (int) 1 00000001

VMOperation dump:

No VMOperation in progress


RuntimeCodeCache dump:

== [Recent RuntimeCodeCache operations: ]

== [RuntimeCodeCache: 0 methods]

Deoptimizer dump:

== [Recent Deoptimizer Events: 
]

Dump Counters:


Raw Stacktrace:

00007ffe5a23fa20: 0000000000000001 00007ffe5a23fa90
00007ffe5a23fa30: 00007fe893d6c168 0000000000441eef
00007ffe5a23fa40: 0000000000000026 0000000000000026
00007ffe5a23fa50: 0000000000000001 00007fe893f6c998
00007ffe5a23fa60: 00007fe8940310c8 00007fe893d6c000
00007ffe5a23fa70: 00007fe893f6fd00 00000000004772df
00007ffe5a23fa80: 00000000002c0b38 00007fe8940310c8
00007ffe5a23fa90: 00007fe893f6fd00 000000000046089e
00007ffe5a23faa0: 00007fe893d6c000 00007fe8940310c8
00007ffe5a23fab0: 0000002694111c78 00007fe8940310c8
00007ffe5a23fac0: 0000000000000000 ffffffff00000000
00007ffe5a23fad0: 00007fe89409b010 00007ffe5a23faa0
00007ffe5a23fae0: 0000000000000000 00000000004612de
00007ffe5a23faf0: 0000000000000000 0000000000000000
00007ffe5a23fb00: 00007fe893c01130 00007ffe5a23fb10
00007ffe5a23fb10: 000000000040c670 0000000000000000
00007ffe5a23fb20: 00007ffe5a23fb70 00007ffe5a384e47
00007ffe5a23fb30: 0000000000000000 00007fe893f6c3f8
00007ffe5a23fb40: 0000000000000000 00000000006c0246
00007ffe5a23fb50: 00007ffe5a23fb90 0000000000000001
00007ffe5a23fb60: 0000000000000032 00007fe893f6c3f8
00007ffe5a23fb70: 00007fe893c01478 00000000004033f9
00007ffe5a23fb80: 0000000000000000 00000000004d4646
00007ffe5a23fb90: 00007fe893c01478 0000000000409309
00007ffe5a23fba0: 0000000000095729 00000000004092de
00007ffe5a23fbb0: 00007fe893f583b8 00007ffe5a23fba0
00007ffe5a23fbc0: 00007fe893f58138 000000000043ea1d
00007ffe5a23fbd0: 00007fe893f70120 0000000400000003
00007ffe5a23fbe0: 00007fe893f58138 00007fe894112030
00007ffe5a23fbf0: 00007fe894099f50 0000000000404b69
00007ffe5a23fc00: 00007fe893d6c158 00007fe893d6c000
00007ffe5a23fc10: 00000000000000d8 0000000000437239
00007ffe5a23fc20: 00007fe893c010c8 00007fe893c010c8
00007ffe5a23fc30: 00007fe893f6c568 00007ffe5a23fd88
00007ffe5a23fc40: 0000000293d6c000 000000000040bea9
00007ffe5a23fc50: 0000000000000000 00007fe8951753a7
00007ffe5a23fc60: 0000000000000001 00007ffe5a23fd88
00007ffe5a23fc70: 000000022f2f2f2f 0000000000000000
00007ffe5a23fc80: 0000000000000000 0000000000000000
00007ffe5a23fc90: 00007ffe5a23fd80 0000000000402000
00007ffe5a23fca0: 0000000000000000 00007fe894347c05
00007ffe5a23fcb0: 0000000000000000 00007ffe5a23fd88
00007ffe5a23fcc0: 0000000200000000 000000000040be30
00007ffe5a23fcd0: 0000000000000000 ef2543c0fea25749
00007ffe5a23fce0: 0000000000402000 00007ffe5a23fd80
00007ffe5a23fcf0: 0000000000000000 0000000000000000
00007ffe5a23fd00: 10d9f78707c25749 10f46ba809d85749
00007ffe5a23fd10: 00007ffe00000000 0000000000000000
00007ffe5a23fd20: 0000000000000000 00000000007ba430
00007ffe5a23fd30: 00007ffe5a23fd88 0000000000000002
00007ffe5a23fd40: 0000000000000000 0000000000000000
00007ffe5a23fd50: 0000000000402000 00007ffe5a23fd80
00007ffe5a23fd60: 0000000000000000 0000000000402029
00007ffe5a23fd70: 00007ffe5a23fd78 000000000000001c
00007ffe5a23fd80: 0000000000000002 00007ffe5a240acf
00007ffe5a23fd90: 00007ffe5a240ae3 0000000000000000
00007ffe5a23fda0: 00007ffe5a240b08 00007ffe5a240b2c
00007ffe5a23fdb0: 00007ffe5a240ca7 00007ffe5a240cd7
00007ffe5a23fdc0: 00007ffe5a240cf2 00007ffe5a240d53
00007ffe5a23fdd0: 00007ffe5a240d79 00007ffe5a240dc9
00007ffe5a23fde0: 00007ffe5a240df3 00007ffe5a240e16
00007ffe5a23fdf0: 00007ffe5a240e42 00007ffe5a240e64
00007ffe5a23fe00: 00007ffe5a240e72 00007ffe5a240eb1
00007ffe5a23fe10: 00007ffe5a240ed1 00007ffe5a240ee2

Stacktrace Stage0:

RSP 00007ffe5a23fa20 RIP 0000000000440b3b FrameSize 96
RSP 00007ffe5a23fa80 RIP 00000000004772df FrameSize 32
RSP 00007ffe5a23faa0 RIP 000000000046089e FrameSize 80
RSP 00007ffe5a23faf0 RIP 00000000004612de FrameSize 96
RSP 00007ffe5a23fb50 RIP 00000000006c0246 FrameSize 48
RSP 00007ffe5a23fb80 RIP 00000000004033f9 FrameSize 32
RSP 00007ffe5a23fba0 RIP 0000000000409309 FrameSize 16
RSP 00007ffe5a23fbb0 RIP 00000000004092de FrameSize 32
RSP 00007ffe5a23fbd0 RIP 000000000043ea1d FrameSize 48
RSP 00007ffe5a23fc00 RIP 0000000000404b69 FrameSize 80
RSP 00007ffe5a23fc50 RIP 000000000040bea9 FrameSize 1

Stacktrace Stage1:

RSP 00007ffe5a23fa20 RIP 0000000000440b3b com.oracle.svm.core.code.ImageCodeInfo@0x7fe893ffb650 name = image code
RSP 00007ffe5a23fa80 RIP 00000000004772df com.oracle.svm.core.code.ImageCodeInfo@0x7fe893ffb650 name = image code
RSP 00007ffe5a23faa0 RIP 000000000046089e com.oracle.svm.core.code.ImageCodeInfo@0x7fe893ffb650 name = image code
RSP 00007ffe5a23faf0 RIP 00000000004612de com.oracle.svm.core.code.ImageCodeInfo@0x7fe893ffb650 name = image code
RSP 00007ffe5a23fb50 RIP 00000000006c0246 com.oracle.svm.core.code.ImageCodeInfo@0x7fe893ffb650 name = image code
RSP 00007ffe5a23fb80 RIP 00000000004033f9 com.oracle.svm.core.code.ImageCodeInfo@0x7fe893ffb650 name = image code
RSP 00007ffe5a23fba0 RIP 0000000000409309 com.oracle.svm.core.code.ImageCodeInfo@0x7fe893ffb650 name = image code
RSP 00007ffe5a23fbb0 RIP 00000000004092de com.oracle.svm.core.code.ImageCodeInfo@0x7fe893ffb650 name = image code
RSP 00007ffe5a23fbd0 RIP 000000000043ea1d com.oracle.svm.core.code.ImageCodeInfo@0x7fe893ffb650 name = image code
RSP 00007ffe5a23fc00 RIP 0000000000404b69 com.oracle.svm.core.code.ImageCodeInfo@0x7fe893ffb650 name = image code
RSP 00007ffe5a23fc50 RIP 000000000040bea9 com.oracle.svm.core.code.ImageCodeInfo@0x7fe893ffb650 name = image code

Full Stacktrace:

RSP 00007ffe5a23fa20 RIP 0000000000440b3b [image code] com.oracle.svm.core.jdk.VMErrorSubstitutions.shutdown(VMErrorSubstitutions.java:146)
RSP 00007ffe5a23fa80 RIP 00000000004772df [image code] com.oracle.svm.core.jdk.Target_com_oracle_svm_core_util_VMError.shouldNotReachHere(VMErrorSubstitutions.java:63)
RSP 00007ffe5a23faa0 RIP 000000000046089e [image code] com.oracle.svm.core.posix.Util_jdk_internal_misc_Signal.ensureInitialized(SunMiscSubstitutions.java:171)
RSP 00007ffe5a23faf0 RIP 00000000004612de [image code] com.oracle.svm.core.posix.Util_jdk_internal_misc_Signal.numberFromName(SunMiscSubstitutions.java:218)
RSP 00007ffe5a23fb50 RIP 00000000006c0246 [image code] com.oracle.svm.core.posix.Target_jdk_internal_misc_Signal.findSignal(SunMiscSubstitutions.java:74)
RSP 00007ffe5a23fb50 RIP 00000000006c0246 [image code] sun.misc.Signal.<init>(Signal.java:140)
RSP 00007ffe5a23fb80 RIP 00000000004033f9 [image code] com.oracle.svm.core.DumpAllStacks.install(VMInspection.java:87)
RSP 00007ffe5a23fba0 RIP 0000000000409309 [image code] com.oracle.svm.core.VMInspection.lambda$beforeAnalysis$0(VMInspection.java:68)
RSP 00007ffe5a23fbb0 RIP 00000000004092de [image code] com.oracle.svm.core.VMInspection$$Lambda$148/1175531108.run(Unknown Source)
RSP 00007ffe5a23fbd0 RIP 000000000043ea1d [image code] com.oracle.svm.core.jdk.RuntimeSupport.executeHooks(RuntimeSupport.java:142)
RSP 00007ffe5a23fc00 RIP 0000000000404b69 [image code] com.oracle.svm.core.jdk.RuntimeSupport.executeStartupHooks(RuntimeSupport.java:87)
RSP 00007ffe5a23fc00 RIP 0000000000404b69 [image code] com.oracle.svm.core.JavaMainWrapper.run(JavaMainWrapper.java:156)
RSP 00007ffe5a23fc50 RIP 000000000040bea9 [image code] com.oracle.svm.core.code.CEntryPointCallStubs.com_002eoracle_002esvm_002ecore_002eJavaMainWrapper_002erun_0028int_002corg_002egraalvm_002enativeimage_002ec_002etype_002eCCharPointerPointer_0029(generated:0)

[Native image heap boundaries: 
ReadOnly Primitives: 0x7fe893d6c008 .. 0x7fe893f6c1b0
ReadOnly References: 0x7fe893f6c388 .. 0x7fe894030198
Writable Primitives: 0x7fe894031000 .. 0x7fe8940992a8
Writable References: 0x7fe8940992c0 .. 0x7fe8941238d8]


[Heap:
[Young generation: 
[youngSpace:
aligned: 0/0 unaligned: 0/0]]
[Old generation: 
[fromSpace:
aligned: 0/0 unaligned: 0/0]
[toSpace:
aligned: 0/0 unaligned: 0/0]
[pinnedFromSpace:
aligned: 0/0 unaligned: 0/0]
[pinnedToSpace:
aligned: 0/0 unaligned: 0/0]]
[Unused:
aligned: 0/0]]

END RequestId: 2fef123c-f827-11e8-ad2a-13d1ebd52648
REPORT RequestId: 2fef123c-f827-11e8-ad2a-13d1ebd52648	Init Duration: 75.25 ms	Duration: 868.50 ms	Billed Duration: 1000 ms Memory Size: 128 MB	Max Memory Used: 37 MB	
RequestId: 2fef123c-f827-11e8-ad2a-13d1ebd52648 Error: Runtime exited with error: exit status 99
Runtime.ExitError
@Peter-B-Kessler
Contributor

I cannot reproduce your issue. However, I only tried with our GraalVM/CE/1.0.0-rc9 Docker image, running on Darwin, not in an AWS instance.

First I built a GraalVM/CE/1.0.0-rc9 docker image:

$ mkdir GitHub-841-CSunMiscSignal
$ cd GitHub-841-CSunMiscSignal
GitHub-841-CSunMiscSignal $ git clone https://github.com/oracle/docker-images.git
Cloning into 'docker-images'...
.... git clone elided ....

GitHub-841-CSunMiscSignal $ cd docker-images/GraalVM/CE/1.0.0-rc9
GitHub-841-CSunMiscSignal/docker-images/GraalVM/CE/1.0.0-rc9 $ docker build -t oracle/graalvm-ce:1.0.0-rc9 .
Sending build context to Docker daemon  14.34kB
.... docker build elided ....
Successfully built 21129e063c22
Successfully tagged oracle/graalvm-ce:1.0.0-rc9

In that docker container I tried your test program:

GitHub-841-CSunMiscSignal/docker-images/GraalVM/CE/1.0.0-rc9 $ docker run --interactive --tty --name oracle-graalvm-ce --rm 21129e063c22 bash
bash-4.2# PS1='oracle-graalvm-ce $ '
oracle-graalvm-ce $ cd /tmp
oracle-graalvm-ce $ mkdir sample
oracle-graalvm-ce $ cat > sample/Main.java << -EOF-
package sample;

public class Main {
    public static void main(String[] args) {
        System.out.println("Hello Graal " + args[0]);
    }
}
-EOF-
oracle-graalvm-ce $ javac sample/Main.java
oracle-graalvm-ce $ jar cfe sampleMain.jar sample.Main sample/Main.class
oracle-graalvm-ce $ java -jar sampleMain.jar "using java"
Hello Graal using java

and built an image for that:

oracle-graalvm-ce $ native-image --no-server --class-path sampleMain.jar -H:Name=sampleMainImage -H:Class=sample.Main
.... image build elided ....

oracle-graalvm-ce $ ./sampleMainImage "from image"
Hello Graal from image

(Just to make sure it was not one of your additional options that was causing the issue, I built another image with all your options and ran it:)

oracle-graalvm-ce $ native-image --no-server \
    --class-path sampleMain.jar \
    -H:EnableURLProtocols=http \
    -H:Name=aws-graal \
    -H:Class=sample.Main \
    -H:+ReportUnsupportedElementsAtRuntime \
    -H:+AllowVMInspection
.... image build elided ....
oracle-graalvm-ce $ ./aws-graal "from aws-graal"
Hello Graal from aws-graal

Since the issue seems to be around signal handling I built an image for one of our internal tests (not shown) that raises SIGHUP and handles it:

oracle-graalvm-ce $ cat > SignalTest.java << -EOF-
....
-EOF-
oracle-graalvm-ce $ javac SignalTest.java
oracle-graalvm-ce $ java SignalTest
[SignalTests.test00HandleAndRaise:
[SignalTests.handleAndRaise:
  signal:  name: HUP  number: 1
  Registering new handler:
  oldHandler: java.lang.Terminator$1@2a139a55
  waiting for at most 10 seconds for the signal to be handled.
[SignalTests.handleAndRaise().new SignalHandler() {...}:  dispatched:  name: HUP  number: 1
]
  raised: 1
  That took: 2201000 nanoseconds.
]
]

and built an image for that

oracle-graalvm-ce $ native-image --no-server --class-path . -H:Class=SignalTest -H:Name=signalTestImage
.... image build elided ....
oracle-graalvm-ce $ ./signalTestImage
[SignalTests.test00HandleAndRaise:
[SignalTests.handleAndRaise:
  signal:  name: HUP  number: 1
  Registering new handler:
  oldHandler: sun.misc.NativeSignalHandler@7f063d250520
  waiting for at most 10 seconds for the signal to be handled.
[SignalTests.handleAndRaise().new SignalHandler() {...}:  dispatched:  name: HUP  number: 1
]
  raised: 1
  That took: 392600 nanoseconds.
]
]

and that all seemed to work.

I suspect the problem is that the docker container you are running in does not have the semaphore functions from <semaphore.h>: sem_open, sem_close, etc. Those functions are part of the POSIX.1-2001 platform. We use those functions to coordinate between the Java signal handling mechanism and the C signal dispatch mechanism. I would have expected you to get some kind of unresolved symbol error when loading the image if your platform did not have them, instead of the "errno: 38 Function not implemented" message at runtime.
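
Editor's sketch: one way to check whether named POSIX semaphores actually work on a given platform is a small C probe that calls sem_open and reports the errno on failure. This is a minimal sketch, not GraalVM code; the function name probe_named_semaphore and the semaphore name are hypothetical. On older glibc versions it may need to be linked with -pthread.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>      /* O_CREAT, O_EXCL */
#include <semaphore.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper (not part of GraalVM): returns 0 if a named POSIX
 * semaphore can be created and removed, otherwise the errno set by sem_open. */
int probe_named_semaphore(void) {
    char name[64];

    /* Named semaphores start with '/'; the pid keeps the name unique. */
    snprintf(name, sizeof name, "/probe-sem-%ld", (long) getpid());

    sem_t *sem = sem_open(name, O_CREAT | O_EXCL, 0600, 0);
    if (sem == SEM_FAILED) {
        return errno;   /* e.g. 38 (ENOSYS) would match the report above */
    }

    /* Clean up so the probe leaves nothing behind. */
    sem_close(sem);
    sem_unlink(name);
    return 0;
}
```

Calling probe_named_semaphore() from a small main on the execution platform and printing the result with strerror would show whether sem_open itself is the call that fails.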

During image start-up we register a signal handler for SIGSEGV so we can get dumps (like the one you quote above) on segmentation faults. I suspect that is the immediate cause of your issue. If that is the issue (and you don't need POSIX-level signal handling yourself), you could try building an image with -R:-InstallSegfaultHandler to avoid installing our segmentation fault signal handler.

I admit that the error message is less than enlightening. I will go make it clearer. At least I should report which function says it is not implemented.

@kencharos
Author

Thanks for the response.

I tried -R:-InstallSegfaultHandler, but it doesn't work in Lambda.
Then I changed -H:+AllowVMInspection to -H:-AllowVMInspection, and it works.

@Peter-B-Kessler
Contributor

Building an image with -H:-AllowVMInspection disables the signal handlers for dumping stacks on SIGQUIT, dumping the heap on SIGUSR1, and dumping runtime compilations on SIGUSR2. That continues to suggest that the problem is with starting up the signal handling mechanism, which uses the POSIX semaphore functions. If your application uses signals in some other way (directly or indirectly), then I suspect it will fail when it tries to register signal handlers.

When you say "it doesn't work in lambda", are you trying to run the native image for your application in the Docker image for GraalVM/CE/1.0.0-rc9, or are you trying to take that native image and run it on some different platform (on AWS)? A complete log that would let me reproduce the problem would be useful.

@kencharos
Author

I build the native image in the GraalVM/CE/1.0.0-rc9 Docker image and then try to run it on an AWS Lambda Custom Runtime.

AWS Lambda Custom Runtime is here.

I pushed a sample that reproduces the problem to https://github.com/kencharos/graal-native-on-lambda

I attached the log file and the deployed zip file, which contains the image and the bootstrap shell script.

log.txt
aws-graal.zip

@Peter-B-Kessler
Contributor

In general, you cannot build a native image on one platform and execute it on a different platform, for example, building a native image in the GraalVM/CE/1.0.0-rc9 Docker container and running it on an Amazon Linux platform. The native image contains values from the image-building platform, such as sizes and layouts of platform data structures, operating system constants, etc. If those values are different on the execution platform, things will not work well.

In your case, your execution platform seems not to have the semaphore functions from the pthread library.

When I look at the libraries required to run the sampleMainImage I built in the GraalVM/CE/1.0.0-rc9 Docker container, I see

oracle-graalvm-ce $ ldd ./sampleMainImage
	linux-vdso.so.1 =>  (0x00007ffc277e3000)
	libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fdb4f993000)
	libdl.so.2 => /lib64/libdl.so.2 (0x00007fdb4f78f000)
	libz.so.1 => /lib64/libz.so.1 (0x00007fdb4f579000)
	librt.so.1 => /lib64/librt.so.1 (0x00007fdb4f371000)
	libcrypt.so.1 => /lib64/libcrypt.so.1 (0x00007fdb4f13a000)
	libc.so.6 => /lib64/libc.so.6 (0x00007fdb4ed6d000)
	/lib64/ld-linux-x86-64.so.2 (0x00007fdb4fbaf000)
	libfreebl3.so => /lib64/libfreebl3.so (0x00007fdb4eb6a000)

and if I look in /lib64/libpthread.so.0 I see the semaphore functions (sem_open, sem_close, etc.) used by our implementation of sun.misc.Signal

oracle-graalvm-ce $ nm /lib64/libpthread.so.0 | grep ' T sem_'
000000000000d860 T sem_close
000000000000d000 T sem_destroy@@GLIBC_2.2.5
000000000000da90 T sem_getvalue@@GLIBC_2.2.5
000000000000cfd0 T sem_init@@GLIBC_2.2.5
000000000000d3e0 T sem_open
000000000000de70 T sem_post@@GLIBC_2.2.5
000000000000de10 T sem_timedwait
000000000000dc20 T sem_trywait@@GLIBC_2.2.5
000000000000d960 T sem_unlink
000000000000dbe0 T sem_wait@@GLIBC_2.2.5

What do you see as the libraries needed by your image, and do you see the definitions of the semaphore functions in those libraries on your execution platform? I am asking to see if that is the cause of the "Function not implemented" message.

What happens if you build the image on Amazon Linux and execute it there? E.g., using one of our prebuilt binaries for Linux?

@kencharos
Author

I tried building the native image on EC2 using the same AMI.
The AMI details are here: https://docs.aws.amazon.com/lambda/latest/dg/current-supported-versions.html
I installed GCC, zlib-devel, and glibc-devel, and downloaded the GraalVM prebuilt binaries (RC10).

The native image works on EC2, but I got the same error when it ran in Lambda.

The nm and ldd output is below:

[ec2-user@ip-172-31-37-54 ~]$ ldd aws-graal
        linux-vdso.so.1 =>  (0x00007fffba81c000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f03bd879000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00007f03bd675000)
        libz.so.1 => /lib64/libz.so.1 (0x00007f03bd45e000)
        libcrypt.so.1 => /lib64/libcrypt.so.1 (0x00007f03bd227000)
        librt.so.1 => /lib64/librt.so.1 (0x00007f03bd01f000)
        libc.so.6 => /lib64/libc.so.6 (0x00007f03bcc51000)
        /lib64/ld-linux-x86-64.so.2 (0x0000563d4421c000)
        libfreebl3.so => /lib64/libfreebl3.so (0x00007f03bca4f000)

[ec2-user@ip-172-31-37-54 ~]$ nm /lib64/libpthread.so.0 | grep ' T sem_'
000000000000d870 T sem_close
000000000000d010 T sem_destroy@@GLIBC_2.2.5
000000000000daa0 T sem_getvalue@@GLIBC_2.2.5
000000000000cfe0 T sem_init@@GLIBC_2.2.5
000000000000d3f0 T sem_open
000000000000de80 T sem_post@@GLIBC_2.2.5
000000000000de20 T sem_timedwait
000000000000dc30 T sem_trywait@@GLIBC_2.2.5
000000000000d970 T sem_unlink
000000000000dbf0 T sem_wait@@GLIBC_2.2.5

Then I ran nm /lib64/libpthread.so.0 | grep ' T sem_' in the bootstrap shell script, and got the output below:

000000000000d790 T sem_close
000000000000cfa0 T sem_destroy@@GLIBC_2.2.5
000000000000d980 T sem_getvalue@@GLIBC_2.2.5
000000000000cf70 T sem_init@@GLIBC_2.2.5
000000000000d340 T sem_open
000000000000dd70 T sem_post@@GLIBC_2.2.5
000000000000dd10 T sem_timedwait
000000000000db10 T sem_trywait@@GLIBC_2.2.5
000000000000d890 T sem_unlink
000000000000dad0 T sem_wait@@GLIBC_2.2.5

So sem_open exists in the Lambda environment, but it doesn't work.

@pekd
Contributor

pekd commented Dec 11, 2018

I would have expected you to get some kind of unresolved symbol error when loading the image if your platform did not have them, instead of the "errno: 38 Function not implemented" message at runtime.

Looks like the kernel on AWS Lambda does not provide the necessary syscall(s). errno = 38 means ENOSYS. Basically it means the kernel running on AWS Lambda is strange. You cannot find such problems by inspecting the contents of libraries. In order to find out what is going on, you could use strace (if this is possible on AWS Lambda).
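
Editor's sketch: the mapping from the raw number in the log to ENOSYS can be checked with a few lines of C. The helper names below are hypothetical, and the value 38 for ENOSYS is what the log above shows for this Linux platform; other platforms may number it differently.

```c
#include <errno.h>
#include <string.h>

/* Hypothetical helper: true when a failed call reported ENOSYS, meaning the
 * platform says the requested function is not implemented. */
int is_not_implemented(int err) {
    return err == ENOSYS;
}

/* Thin wrapper over strerror, handy for printing text next to the number. */
const char *describe_errno(int err) {
    return strerror(err);
}
```

Printing describe_errno(errno) next to the numeric value in an error path gives the same "Function not implemented" text seen in the Lambda log, which is why looking at the symbols exported by a library cannot reveal this kind of failure.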

@hjander

hjander commented Dec 11, 2018

Hi, I did the same as the OP and have the same problem. I just wanted to point to this article and the accompanying GitHub repository, which does basically the same thing the OP and I tried. The repo uses rc9 as far as I can tell, and everything seems to work fine there. I have not tried it myself, though.

@Peter-B-Kessler
Contributor

If AWS Lambda does not have the syscall for sem_open (and presumably sem_unlink, but I don't think we get that far), does anyone know if AWS Lambda supports sem_init (and sem_destroy)?
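
Editor's sketch: the question about sem_init could be answered empirically with a probe for unnamed semaphores. This is a minimal sketch under the assumption that a post followed by a non-blocking wait is enough to exercise the calls; probe_unnamed_semaphore is a hypothetical name, and on older glibc versions the program may need to be linked with -pthread.

```c
#include <errno.h>
#include <semaphore.h>

/* Hypothetical helper: returns 0 if an unnamed, process-private semaphore
 * can be initialized, posted, and waited on; otherwise the failing errno. */
int probe_unnamed_semaphore(void) {
    sem_t sem;

    /* pshared = 0: shared only between threads of this process. */
    if (sem_init(&sem, 0, 0) != 0) {
        return errno;
    }

    int rc = 0;
    /* Post once, then take the token back without blocking. */
    if (sem_post(&sem) != 0 || sem_trywait(&sem) != 0) {
        rc = errno;
    }

    sem_destroy(&sem);
    return rc;
}
```

A non-zero result on the execution platform would indicate which errno the unnamed-semaphore path reports there.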

@graemerocher
Member

Also hit this issue. Any hope for a resolution? GraalVM + AWS Lambda Custom Runtimes could be very promising if we can get the kernel differences resolved.

@plutext

plutext commented Feb 2, 2019

Works for me (I was using rc10); fwiw, I'm doing my builds on Manjaro (4.19 kernel).

See also https://qiita.com/kencharos/items/69e43965515f368bc4a3

@graemerocher
Member

@plutext thanks for the tip, got it working

@Peter-B-Kessler
Contributor

Do I understand the workaround? @plutext says that the solution is to build on "Manjaro (4.19 kernel)". Do you then also have to run with that kernel in the AWS Lambda Custom Runtimes? Can we come up with a general recommendation for GraalVM users running on AWS?

@Peter-B-Kessler
Contributor

Peter-B-Kessler commented Feb 5, 2019

If the workaround is sufficient, can this issue be closed?

@plutext

plutext commented Feb 5, 2019

@Peter-B-Kessler I build on Manjaro (4.19 kernel) but there is nothing particularly special about that, I don't think (unless I was lucky!). kencharos builds in a docker image: https://github.com/kencharos/try-graal-lambda/blob/master/AmazonGraal/Dockerfile

No need to run with that kernel at run time (nor do I think you could). Just zip up the native image and whatever else you need, and upload it to Lambda.

A custom runtime's entry point is an executable file named bootstrap. The bootstrap file can be the runtime, or it can invoke another file that creates the runtime.

You can code/include the bootstrap functionality in your native image, in which case you name your native image bootstrap. This seems the best way performance-wise. Or you can use the example bootstrap at https://docs.aws.amazon.com/lambda/latest/dg/runtimes-walkthrough.html to invoke the native image. That's an easy way to get started.

@dmlloyd
Contributor

dmlloyd commented Mar 12, 2019

I would say "not"; the workaround appears essentially to disable all signal handlers so that the code in question is never called.

I would say that it might be a better idea to come up with some different signal-handling solution than using POSIX named semaphores. If a semaphore must be used - in preference to, say, a POSIX thread mutex or something simpler - then why not use sem_init to establish an unnamed, unshared semaphore?

@Peter-B-Kessler
Contributor

We use POSIX named semaphores because they are available on Darwin and Linux. Unnamed semaphores are not available on Darwin. Having platform-specific code is a maintenance burden, especially for platforms we do not have in our testing infrastructure.

@dmlloyd
Contributor

dmlloyd commented Mar 13, 2019

Why does it need to be a semaphore at all? What about using a pthread_mutex_t or something?

The problem (in case it wasn't clear) is that sem_open is apparently simply unsupported on AWS.

@dmlloyd
Contributor

dmlloyd commented Mar 14, 2019

In the very, very worst case, maybe a pipe...

@pmlopes

pmlopes commented Mar 14, 2019

native images can run on AWS lambda: https://github.com/pmlopes/aws-lambda-native-vertx

In the example above we successfully built images on:

  • Fedora 29 64bit (with GraalVM rc12/rc13)
  • Windows 10 Pro with WSL Ubuntu

And the produced binaries do run on AWS Lambda.

@dmlloyd
Contributor

dmlloyd commented Mar 14, 2019

Are you using signal handlers?

@dmlloyd
Contributor

dmlloyd commented Mar 21, 2019

@Peter-B-Kessler is there a reason for relying on GCC-specific builtins in this code e.g. __sync_val_compare_and_swap and so forth? If you're already requiring GCC in this way... why not use the C11 standard constructs instead?
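
Editor's sketch: for readers unfamiliar with the C11 constructs mentioned here, a compare-and-swap written with <stdatomic.h> might look like the following. This is an illustration only, not the GraalVM source; claim_slot is a hypothetical name. Note that __sync_val_compare_and_swap returns the previous value, whereas atomic_compare_exchange_strong returns a success flag and writes the observed value back into its "expected" argument.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical example: atomically replace `expected` with `desired` in
 * *slot. Returns true if the swap happened, false if *slot held some
 * other value (in which case *slot is left unchanged). */
bool claim_slot(atomic_int *slot, int expected, int desired) {
    /* On failure the observed value is written into the local copy of
     * `expected`; this sketch discards it. */
    return atomic_compare_exchange_strong(slot, &expected, desired);
}
```

For example, with an atomic_int initialized to 0, claim_slot(&slot, 0, 7) succeeds and a subsequent claim_slot(&slot, 0, 9) fails, leaving 7 in place.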

@dmlloyd
Contributor

dmlloyd commented Mar 26, 2019

Why does it need to be a semaphore at all? What about using a pthread_mutex_t or something?

It looks like the design of signal handling doesn't employ a single-threaded sigwait strategy, instead relying on regular process-wide async signal handlers. As such, the handler needs a way to unblock a waiting thread that's safe for use within a signal handler. The sem_post function fits this bill, as do read/write. So it's either use an anonymous semaphore (but only on Linux), switch to pipes (maybe use eventfd on Linux because it's better), or else rework the signal handling to be thread-based instead (probably a big job).
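
Editor's sketch: the pipe-based alternative mentioned above is often called the self-pipe technique. The handler does nothing but write one byte, and an ordinary thread blocks in read until that byte arrives. The code below is a minimal single-process sketch under that assumption, not a proposed GraalVM patch; all function and variable names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static int wake_pipe[2] = { -1, -1 };

/* Signal handler: write(2) is async-signal-safe, so it may be used here. */
static void on_signal_pipe(int signo) {
    int saved_errno = errno;
    unsigned char b = (unsigned char) signo;
    if (write(wake_pipe[1], &b, 1) < 0) {
        /* Nothing safe to do inside a handler; the waiter simply stays blocked. */
    }
    errno = saved_errno;
}

/* Installs the handler, raises `signo`, and blocks until the handler's byte
 * arrives. Returns the signal number read from the pipe, or -1 on error. */
int raise_and_wait_via_pipe(int signo) {
    if (pipe(wake_pipe) != 0) {
        return -1;
    }

    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal_pipe;
    sigemptyset(&sa.sa_mask);
    if (sigaction(signo, &sa, &old) != 0) {
        close(wake_pipe[0]);
        close(wake_pipe[1]);
        return -1;
    }

    raise(signo);   /* the handler runs before raise() returns */

    unsigned char b = 0;
    ssize_t n;
    do {
        n = read(wake_pipe[0], &b, 1);   /* the "waiting thread" side */
    } while (n < 0 && errno == EINTR);

    sigaction(signo, &old, NULL);        /* restore the previous handler */
    close(wake_pipe[0]);
    close(wake_pipe[1]);
    return n == 1 ? (int) b : -1;
}
```

In a real runtime the read side would live in a dedicated dispatcher thread; here raise_and_wait_via_pipe(SIGUSR1) does both halves in one thread just to show the hand-off.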

@dmlloyd
Contributor

dmlloyd commented Mar 26, 2019

Hmm I don't understand how/why I just unassigned @Peter-B-Kessler just by replying to a comment. GitHub confounds again.

dmlloyd added a commit to dmlloyd/graal that referenced this issue Mar 26, 2019
timbaer added a commit to timbaer/happy-stars that referenced this issue Feb 6, 2020
* We have two lambdas, one for each context
* Each lambda can CreateReadDelete
* The last parts of the frontend stack are removed (cleanup)

* TODO:
** Add tests
** Generalize the handler code (too many duplications)
** Add persistence
** Find out why native deployment fails on AWS Lambdas (same errors as stated here: oracle/graal#841)
** Add routing via a proper URL (happy-stars.play.ideas.de ??)
** Think about a better deployment strategy (atm, we need to execute mvn two times for the two handler-jars to get created, the second time without the "clean" goal so as not to delete the first jar-file)
** Think about multi-user support (two frontend devs doing the challenge parallel)
@borkdude

@dmlloyd Should this issue have been fixed? As of which GraalVM version? @dainiusjocas still ran into it because babashka uses a PIPE signal handler. We had to work around this using an environment variable for now.

@dmlloyd
Contributor

dmlloyd commented Mar 23, 2020

No, the issue is not fixed. It turns out the anonymous sem_post doesn't work either. So, the signal handling code still cannot be run on AWS. In Quarkus we've worked around the problem using system properties to disable our signal handlers, but the problem of not having clean shutdown or handling for other signals on AWS remains.

@dmlloyd
Contributor

dmlloyd commented Mar 23, 2020

One idea might be to use eventfd, which can be interacted with using just read/write system calls.
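
Editor's sketch: an eventfd-based hand-off could look roughly like this. eventfd is Linux-specific, and whether it behaves as needed on AWS Lambda is not established in this thread, so treat this as an untested-on-Lambda illustration; the function names are hypothetical. The handler adds 1 to the eventfd counter with write, and the waiter reads the accumulated count.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/eventfd.h>   /* Linux-specific */
#include <unistd.h>

static int wake_fd = -1;

/* Signal handler: a plain write(2) of an 8-byte value bumps the counter. */
static void on_signal_eventfd(int signo) {
    (void) signo;
    int saved_errno = errno;
    uint64_t one = 1;
    if (write(wake_fd, &one, sizeof one) < 0) {
        /* Nothing safe to do inside a handler. */
    }
    errno = saved_errno;
}

/* Raises `signo` `times` times (times must be >= 1, otherwise the read
 * below would block) and returns the counter value read from the eventfd,
 * or -1 on error. */
long count_signals_via_eventfd(int signo, int times) {
    wake_fd = eventfd(0, 0);
    if (wake_fd < 0) {
        return -1;
    }

    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal_eventfd;
    sigemptyset(&sa.sa_mask);
    if (sigaction(signo, &sa, &old) != 0) {
        close(wake_fd);
        return -1;
    }

    for (int i = 0; i < times; i++) {
        raise(signo);   /* each handler run adds 1 to the counter */
    }

    uint64_t count = 0;
    ssize_t n;
    do {
        n = read(wake_fd, &count, sizeof count);   /* returns and resets the counter */
    } while (n < 0 && errno == EINTR);

    sigaction(signo, &old, NULL);
    close(wake_fd);
    return n == (ssize_t) sizeof count ? (long) count : -1;
}
```

Compared with a pipe, a single descriptor is enough and repeated signals coalesce into one counter value instead of filling a buffer.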
