Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[native-image] Getting bad address when writing to socket (PosixJavaNIOSubstitutions) #400

Closed
sundbp opened this issue May 3, 2018 · 56 comments
Assignees

Comments

@sundbp
Copy link

sundbp commented May 3, 2018

I'm trying to make a CLI tool for accessing a grpc API run using native-image. It's using grpc java libs, and hence grpc-netty. I've managed to make an executable, and via a reflection config file I dealt with netty's ReflectiveChannelFactory. I can see that I can actually create the grpc client and see a few TCP packets between my client and server. So far so good.

When I actually try to make an API call I get an error relating to a failed write to the socket. I tried both on OSX and Linux, same exact outcome.

The stacktrace is:

Exception in thread "main" java.lang.reflect.InvocationTargetException
	at java.lang.Throwable.<init>(Throwable.java:310)
	at java.lang.Exception.<init>(Exception.java:102)
	at java.lang.ReflectiveOperationException.<init>(ReflectiveOperationException.java:89)
	at java.lang.reflect.InvocationTargetException.<init>(InvocationTargetException.java:72)
	at com.oracle.svm.reflect.proxies.Proxy_1_main_main.invoke(Unknown Source)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at com.oracle.svm.core.JavaMainWrapper.run(JavaMainWrapper.java:199)
	at Lcom/oracle/svm/core/code/CEntryPointCallStubs;.com_002eoracle_002esvm_002ecore_002eJavaMainWrapper_002erun_0028int_002corg_002egraalvm_002enativeimage_002ec_002etype_002eCCharPointerPointer_0029(generated:0)
Caused by: io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
	at java.lang.Throwable.<init>(Throwable.java:287)
	at java.lang.Exception.<init>(Exception.java:84)
	at java.lang.RuntimeException.<init>(RuntimeException.java:80)
	at io.grpc.StatusRuntimeException.<init>(StatusRuntimeException.java:41)
	at io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:221)
	at io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:202)
	at io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:131)
	at permafrost.api.PermafrostGrpc$PermafrostBlockingStub.health(PermafrostGrpc.java:1779)
	at permafrost.client.PermafrostClientGrpc._health(client.clj:36)
	at permafrost.client$health.invokeStatic(client.clj:21)
	at permafrost.client$health.invoke(client.clj:19)
	at permafrost.cli.health$health.invokeStatic(health.clj:22)
	at permafrost.cli.health$health.invoke(health.clj:20)
	at permafrost.cli.health$fn__1228.invokeStatic(health.clj:54)
	at permafrost.cli.health$fn__1228.invoke(health.clj:47)
	at permafrost.cli.main$dispatch_handler.invokeStatic(main.clj:92)
	at permafrost.cli.main$dispatch_handler.invoke(main.clj:80)
	at permafrost.cli.main$process_cmd.invokeStatic(main.clj:128)
	at permafrost.cli.main$process_cmd.invoke(main.clj:104)
	at permafrost.cli.main$_main.invokeStatic(main.clj:154)
	at permafrost.cli.main$_main.doInvoke(main.clj:138)
	at clojure.lang.RestFn.applyTo(RestFn.java:137)
	at permafrost.cli.main.main(Unknown Source)
	... 4 more
Caused by: java.io.IOException: Bad address
	at java.lang.Throwable.<init>(Throwable.java:265)
	at java.lang.Exception.<init>(Exception.java:66)
	at java.io.IOException.<init>(IOException.java:58)
	at com.oracle.svm.core.posix.PosixJavaNIOSubstitutions.throwIOExceptionWithLastError(PosixJavaNIOSubstitutions.java:236)
	at com.oracle.svm.core.posix.PosixJavaNIOSubstitutions.convertLongReturnVal(PosixJavaNIOSubstitutions.java:304)
	at com.oracle.svm.core.posix.PosixJavaNIOSubstitutions.convertLongReturnVal(PosixJavaNIOSubstitutions.java:285)
	at com.oracle.svm.core.posix.PosixJavaNIOSubstitutions$Target_sun_nio_ch_FileDispatcherImpl.writev0(PosixJavaNIOSubstitutions.java:1352)
	at sun.nio.ch.SocketDispatcher.writev(SocketDispatcher.java:51)
	at sun.nio.ch.IOUtil.write(IOUtil.java:148)
	at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:504)
	at io.netty.channel.socket.nio.NioSocketChannel.doWrite(NioSocketChannel.java:418)
	at io.netty.channel.AbstractChannel$AbstractUnsafe.flush0(AbstractChannel.java:934)
	at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.flush0(AbstractNioChannel.java:360)
	at io.netty.channel.AbstractChannel$AbstractUnsafe.flush(AbstractChannel.java:901)
	at io.netty.channel.DefaultChannelPipeline$HeadContext.flush(DefaultChannelPipeline.java:1376)
	at io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:776)
	at io.netty.channel.AbstractChannelHandlerContext.invokeFlush(AbstractChannelHandlerContext.java:768)
	at io.netty.channel.AbstractChannelHandlerContext.flush(AbstractChannelHandlerContext.java:749)
	at io.netty.handler.codec.http2.Http2ConnectionHandler.flush(Http2ConnectionHandler.java:201)
	at io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:776)
	at io.netty.channel.AbstractChannelHandlerContext.invokeFlush(AbstractChannelHandlerContext.java:768)
	at io.netty.channel.AbstractChannelHandlerContext.flush(AbstractChannelHandlerContext.java:749)
	at io.grpc.netty.ProtocolNegotiators$AbstractBufferingHandler.writeBufferedAndRemove(ProtocolNegotiators.java:585)
	at io.grpc.netty.ProtocolNegotiators$BufferUntilChannelActiveHandler.channelActive(ProtocolNegotiators.java:681)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelActive(AbstractChannelHandlerContext.java:213)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelActive(AbstractChannelHandlerContext.java:199)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelActive(AbstractChannelHandlerContext.java:192)
	at io.netty.channel.ChannelInboundHandlerAdapter.channelActive(ChannelInboundHandlerAdapter.java:64)
	at io.netty.handler.codec.http2.Http2ConnectionHandler.channelActive(Http2ConnectionHandler.java:421)
	at io.grpc.netty.AbstractNettyHandler.channelActive(AbstractNettyHandler.java:71)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelActive(AbstractChannelHandlerContext.java:213)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelActive(AbstractChannelHandlerContext.java:199)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelActive(AbstractChannelHandlerContext.java:192)
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelActive(DefaultChannelPipeline.java:1402)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelActive(AbstractChannelHandlerContext.java:213)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelActive(AbstractChannelHandlerContext.java:199)
	at io.netty.channel.DefaultChannelPipeline.fireChannelActive(DefaultChannelPipeline.java:921)
	at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.fulfillConnectPromise(AbstractNioChannel.java:311)
	at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:341)
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:633)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:545)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:499)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
	at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:886)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.lang.Thread.run(Thread.java:748)
	at com.oracle.svm.core.posix.thread.PosixJavaThreads.pthreadStartRoutine(PosixJavaThreads.java:222)
	at Lcom/oracle/svm/core/code/CEntryPointCallStubs;.com_002eoracle_002esvm_002ecore_002eposix_002ethread_002ePosixJavaThreads_002epthreadStartRoutine_0028com_002eoracle_002esvm_002ecore_002eposix_002ethread_002ePosixJavaThreads_0024ThreadStartData_0029(generated:0)

I'm feeling close to getting it all to work, but stumped at this point now. I'm hoping that more experienced eyes may make something out of why the writev() fails here?

As I said, on connecting the grpc client to the server I can see a few packets of negotiations being sent so it seems to have the "ability to write to socket" working somewhere inside the image..

Very encouraged by the startup time after using native-image! Makes clojure cli tools nice to use.

Thanks!

@cstancu
Copy link
Member

cstancu commented May 3, 2018

Hi @sundbp, thank you for the report. This seems like an issue in our Posix NIO implementation. We'll take a look at it. Can you provide the app that you are trying to run so we can reproduce?

@sundbp
Copy link
Author

sundbp commented May 4, 2018 via email

@sundbp
Copy link
Author

sundbp commented May 4, 2018

A cut down version that replicates it can be found here: https://gitlab.com/sundbp/native-image-grpc-test

The README has instructions for how to reproduce. Please let me know if you have issues reproducing it!

@sundbp sundbp changed the title Getting bad address when writing to socket (PosixJavaNIOSubstitutions) [native-image] Getting bad address when writing to socket (PosixJavaNIOSubstitutions) May 4, 2018
@cstancu
Copy link
Member

cstancu commented May 5, 2018

@sundbp thank you for providing a cut down version and for the detailed steps to reproduce the issue. We'll run it and let you know what we find.

@sundbp
Copy link
Author

sundbp commented May 9, 2018

Did you manage to reproduce from that example repo? Thanks

@cstancu
Copy link
Member

cstancu commented May 12, 2018

I was able to reproduce, thank you! Looking at the log output of ./create-exe.sh I see a lot of warnings of the form:

RecomputeFieldValue.FieldOffset automatic substitution failed. The automatic substitution registration was attempted because a call to sun.misc.Unsafe.objectFieldOffset(Field) was detected in the static initializer of com.google.protobuf.UnsafeUtil. Add a RecomputeFieldValue.FieldOffset manual substitution for com.google.protobuf.UnsafeUtil. 

These are generated by SubstrateVM and are quite serious. (As a side node: we should increase the visibility of this kind of messages especially when they are intermingled with log messages from the static initializers of the library that we are trying to build, like it happens with the Netty DEBUG log messages). The reason for these messages is that com.google.protobuf.UnsafeUtil uses the Java unsafe API and this needs a special configuration to work on SubstrateVM. Initially I thought that this might be the reason for the failure: writing at a wrong offset results in memory corruption. (By the way, we will publish an article next week that includes details about the use of unsafe on SubstrateVM.) However, after fixing these I still get the java.io.IOException: Bad address, so I guess the problem is really in our PosixJavaNIOSubstitutions implementation.

@sundbp
Copy link
Author

sundbp commented May 12, 2018 via email

@cstancu
Copy link
Member

cstancu commented May 12, 2018

Definitely, we have a mechanism that allows you to do that, although the end goal is to automate the process of detecting and patching unsafe operations. We have a blog article scheduled for next week that goes into details.

@sundbp
Copy link
Author

sundbp commented May 15, 2018

Excellent - Looking forward to reading the blog article! (and follow up on what goes wrong with PosixJavaNio!)

@sundbp
Copy link
Author

sundbp commented May 22, 2018

I've now read the blog post - great! (referring to this post: https://medium.com/graalvm/instant-netty-startup-using-graalvm-native-image-generation-ed6f14ff7692)

@cstancu would you mind making a PR with the unsafe substitutions you made for my example repo? I'd love to see if it matches up with how I'd do it after reading the blog post.

@cstancu
Copy link
Member

cstancu commented May 23, 2018

Actually the substitutions that I used were quite simple. For protobuf I cheated and essentially disabled unsafe memory access by just returning null from com.google.protobuf.UnsafeUtil.getUnsafe(). Protobuf is built with a failsafe mechanism if unsafe memory access is not available on a platform. For JCTools is just the same substitution as in the article. Here is the entire substitution file:

import com.oracle.svm.core.annotate.Alias;
import com.oracle.svm.core.annotate.RecomputeFieldValue;
import com.oracle.svm.core.annotate.RecomputeFieldValue.Kind;
import com.oracle.svm.core.annotate.Substitute;
import com.oracle.svm.core.annotate.TargetClass;

@TargetClass(className = "com.google.protobuf.UnsafeUtil")
final class Target_com_google_protobuf_UnsafeUtil {

    @Substitute
    static sun.misc.Unsafe getUnsafe() {
        return null;
    }
}

@TargetClass(className = "io.netty.util.internal.shaded.org.jctools.util.UnsafeRefArrayAccess")
final class Target_io_netty_util_internal_shaded_org_jctools_util_UnsafeRefArrayAccess {
    @Alias @RecomputeFieldValue(kind = Kind.ArrayIndexShift, declClass = Object[].class) //
    public static int REF_ELEMENT_SHIFT;
}

public class SubstrateVMSubstitutions {
}

@sundbp
Copy link
Author

sundbp commented May 23, 2018

Ah! Nifty to disable it all! thanks

@sundbp
Copy link
Author

sundbp commented May 23, 2018

@cstancu I'm struggling to get a build going with those substitutions included. If I use graal as distributed in the 1.0.0-rc1 I don't have the --report-usupported-elements-at-runtime option available and the build fails with:

Caused by: java.lang.NoClassDefFoundError: io/netty/handler/ssl/ReferenceCountedOpenSslServerContext$OpenSslSniHostnameMatcher
	at java.lang.Class.getDeclaringClass0(Native Method)
	at java.lang.Class.getDeclaringClass(Class.java:1235)
	at java.lang.Class.getEnclosingClass(Class.java:1277)
	at jdk.vm.ci.hotspot.HotSpotResolvedObjectTypeImpl.getEnclosingType(HotSpotResolvedObjectTypeImpl.java:894)
	at jdk.vm.ci.hotspot.HotSpotResolvedObjectTypeImpl.getEnclosingType(HotSpotResolvedObjectTypeImpl.java:58)
	at com.oracle.graal.pointsto.meta.AnalysisType.getEnclosingType(AnalysisType.java:917)
	at com.oracle.svm.hosted.analysis.Inflation.checkType(Inflation.java:138)
	at java.lang.Iterable.forEach(Iterable.java:75)
	at java.util.Collections$UnmodifiableCollection.forEach(Collections.java:1080)
	at com.oracle.svm.hosted.analysis.Inflation.checkObjectGraph(Inflation.java:123)
	at com.oracle.graal.pointsto.BigBang.checkObjectGraph(BigBang.java:580)
	at com.oracle.graal.pointsto.BigBang.finish(BigBang.java:552)
	at com.oracle.svm.hosted.NativeImageGenerator.doRun(NativeImageGenerator.java:653)
	at com.oracle.svm.hosted.NativeImageGenerator.lambda$run$0(NativeImageGenerator.java:381)
	at java.util.concurrent.ForkJoinTask$AdaptedRunnableAction.exec(ForkJoinTask.java:1386)
	at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
	at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
	at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
	at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)

If I try to use native-image from git (after installing the svm.jar from that build), and using the --report-unsupported-elements-at-runtime option I get:

error: Error encountered while parsing io.netty.util.internal.InternalThreadLocalMap.expandIndexedVariableTableAndSet(int, java.lang.Object)
Parsing context:
	parsing io.netty.util.internal.InternalThreadLocalMap.setIndexedVariable(InternalThreadLocalMap.java:297)
	parsing io.netty.util.concurrent.FastThreadLocal.registerCleaner(FastThreadLocal.java:159)
	parsing io.netty.util.concurrent.FastThreadLocal.get(FastThreadLocal.java:147)
	parsing io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:319)
	parsing io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:185)
	parsing io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:176)
	parsing io.netty.handler.ssl.ReferenceCountedOpenSslContext.toBIO(ReferenceCountedOpenSslContext.java:797)
	parsing io.netty.handler.ssl.ReferenceCountedOpenSslContext.setKeyMaterial(ReferenceCountedOpenSslContext.java:715)
	parsing io.netty.handler.ssl.ReferenceCountedOpenSslClientContext.newSessionContext(ReferenceCountedOpenSslClientContext.java:100)
	parsing io.netty.handler.ssl.ReferenceCountedOpenSslClientContext.<init>(ReferenceCountedOpenSslClientContext.java:63)
	parsing io.netty.handler.ssl.SslContext.newClientContextInternal(SslContext.java:781)
	parsing io.netty.handler.ssl.SslContextBuilder.build(SslContextBuilder.java:446)
	parsing io.grpc.netty.NettyChannelBuilder$NettyTransportFactory$DefaultNettyTransportCreationParamsFilterFactory.<init>(NettyChannelBuilder.java:558)
	parsing io.grpc.netty.NettyChannelBuilder$NettyTransportFactory$DefaultNettyTransportCreationParamsFilterFactory.<init>(NettyChannelBuilder.java:551)
	parsing io.grpc.netty.NettyChannelBuilder$NettyTransportFactory.<init>(NettyChannelBuilder.java:489)
	parsing io.grpc.netty.NettyChannelBuilder.buildTransportFactory(NettyChannelBuilder.java:337)
	parsing io.grpc.internal.AbstractManagedChannelImplBuilder.build(AbstractManagedChannelImplBuilder.java:393)
	parsing grpc_test.client.main$create_client.invokeStatic(main.clj:17)
	parsing grpc_test.client.main$create_client.invoke(main.clj:15)
	parsing clojure.tools.logging.impl$fn__544$G__528__551.invoke(impl.clj:25)
	parsing grpc_test.client.main$_main$fn__693.invoke(main.clj:72)
	parsing clojure.lang.AFn.run(AFn.java:22)
	parsing com.oracle.svm.core.jdk.RuntimeSupport.executeHooks(RuntimeSupport.java:106)
	parsing com.oracle.svm.core.jdk.RuntimeSupport.executeStartupHooks(RuntimeSupport.java:83)
	parsing com.oracle.svm.core.JavaMainWrapper.run(JavaMainWrapper.java:170)
	parsing Lcom/oracle/svm/core/code/CEntryPointCallStubs;.com_002eoracle_002esvm_002ecore_002eJavaMainWrapper_002erun_0028int_002corg_002egraalvm_002enativeimage_002ec_002etype_002eCCharPointerPointer_0029(generated:0)
Original error: org.graalvm.compiler.graph.GraalGraphError: java.lang.AbstractMethodError: Method com/oracle/graal/pointsto/nodes/AnalysisArraysCopyOfNode.length()Lorg/graalvm/compiler/nodes/ValueNode; is abstract
	at node: 24|ArrayLength
	at org.graalvm.compiler.phases.common.CanonicalizerPhase$Instance.tryCanonicalize(CanonicalizerPhase.java:338)
	at org.graalvm.compiler.phases.common.CanonicalizerPhase$Instance.processNode(CanonicalizerPhase.java:260)
	at org.graalvm.compiler.phases.common.CanonicalizerPhase$Instance.processWorkSet(CanonicalizerPhase.java:239)
	at org.graalvm.compiler.phases.common.CanonicalizerPhase$Instance.run(CanonicalizerPhase.java:209)
	at org.graalvm.compiler.phases.common.CanonicalizerPhase.run(CanonicalizerPhase.java:123)
	at org.graalvm.compiler.phases.common.CanonicalizerPhase.run(CanonicalizerPhase.java:64)
	at org.graalvm.compiler.phases.BasePhase.apply(BasePhase.java:195)
	at org.graalvm.compiler.phases.BasePhase.apply(BasePhase.java:137)
	at com.oracle.graal.pointsto.flow.MethodTypeFlowBuilder.parse(MethodTypeFlowBuilder.java:205)
	at com.oracle.graal.pointsto.flow.MethodTypeFlowBuilder.apply(MethodTypeFlowBuilder.java:319)
	at com.oracle.graal.pointsto.flow.MethodTypeFlow.doParse(MethodTypeFlow.java:308)
	at com.oracle.graal.pointsto.flow.MethodTypeFlow.ensureParsed(MethodTypeFlow.java:298)
	at com.oracle.graal.pointsto.flow.MethodTypeFlow.addContext(MethodTypeFlow.java:105)
	at com.oracle.graal.pointsto.flow.SpecialInvokeTypeFlow.onObservedUpdate(InvokeTypeFlow.java:419)
	at com.oracle.graal.pointsto.flow.TypeFlow.notifyObservers(TypeFlow.java:345)
	at com.oracle.graal.pointsto.flow.TypeFlow.update(TypeFlow.java:387)
	at com.oracle.graal.pointsto.BigBang$2.run(BigBang.java:498)
	at com.oracle.graal.pointsto.util.CompletionExecutor.lambda$execute$0(CompletionExecutor.java:172)
	at java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1402)
	at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
	at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
	at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
	at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)
Caused by: java.lang.AbstractMethodError: Method com/oracle/graal/pointsto/nodes/AnalysisArraysCopyOfNode.length()Lorg/graalvm/compiler/nodes/ValueNode; is abstract
	at com.oracle.graal.pointsto.nodes.AnalysisArraysCopyOfNode.length(AnalysisArraysCopyOfNode.java)
	at org.graalvm.compiler.nodes.util.GraphUtil.arrayLength(GraphUtil.java:677)
	at org.graalvm.compiler.nodes.java.ArrayLengthNode.readArrayLength(ArrayLengthNode.java:122)
	at org.graalvm.compiler.nodes.java.ArrayLengthNode.canonical(ArrayLengthNode.java:86)
	at org.graalvm.compiler.nodes.java.ArrayLengthNode.canonical(ArrayLengthNode.java:51)
	at org.graalvm.compiler.graph.spi.Canonicalizable$Unary.canonical(Canonicalizable.java:97)
	at org.graalvm.compiler.phases.common.CanonicalizerPhase$Instance.tryCanonicalize(CanonicalizerPhase.java:333)
	... 22 more

Is that somehow related to mixing compiling using graal-sdk.jar from one version (1.0.0-rc1) but native image from another?

I haven't worked out how to use the git built graal for both building my jar and doing native image I'm afraid :(

I've pushed an update to the repo including these new changes (using 1.0.0-rc1): https://gitlab.com/sundbp/native-image-grpc-test

How did you build it including the substitutions?

@sundbp
Copy link
Author

sundbp commented May 23, 2018

Realized that error was because of a recent commit to svm a few days ago. Rolled back graal git to commit 1b109ea1c9ed59239c5e266ee69ba7350fd9ce37. Then I'm able to build incl the substitutions.

I've pushed updates to the reproduction repo at https://gitlab.com/sundbp/native-image-grpc-test

Like you describe I still get the same error as before in PosixJavaNIOSubstitutions.

@sundbp
Copy link
Author

sundbp commented May 23, 2018

Dumping packages we can see for the non-native image case:
image

Native image case:
image

Looks like the initial handshake takes place in both sessions, but then for in native-image case it ends up closing down straight away. Seems consistent with the stacktrace - roughly says that connection was opened, then as part of negotiating protocol (at gRPC level) it does a write+flush, which triggers a write to socket and it breaks.

@sundbp
Copy link
Author

sundbp commented May 23, 2018

Added a printout to the args passed to writev before it crashes:
System.out.println("Address is " + String.format("0x%X", address) + " with len " + len + ". fdo is " + fdo.valid() + " and fd is " + fd);
gives:
Address is 0x7FE91A602DB0 with len 2. fdo is true and fd is 27

So that doesn't obviously look "bad".

One thing that is a little suspicious to me is that fd always seems to be 27 (regardless if I open some other files, or some nc -lk etc on the same machine). I'm not sure if that is normal or not.

Starting to get to the end of my ideas of things to look at.

Happy to keep digging if anyone has any input on sensible next steps!

@sundbp
Copy link
Author

sundbp commented May 23, 2018

Another experiment - check lsof of the process before I've created gRPC client, after I've created it but before I've sent any API request over it, and after it has crashed:
Before creating gRPC client:

» lsof -p (pgrep cli-client)                                                                                    14:32:33
COMMAND     PID   USER   FD   TYPE DEVICE   SIZE/OFF       NODE NAME
cli-clien 20537 sundbp  cwd    DIR    1,4        640 8604128240 /Users/sundbp/dev/native-image-grpc-test
cli-clien 20537 sundbp  txt    REG    1,4   34982976 8607168028 /Users/sundbp/dev/native-image-grpc-test/cli-client
cli-clien 20537 sundbp  txt    REG    1,4     841440 8604675997 /usr/lib/dyld
cli-clien 20537 sundbp  txt    REG    1,4 1175293952 8604684784 /private/var/db/dyld/dyld_shared_cache_x86_64h
cli-clien 20537 sundbp    0u   CHR  16,10   0t832405       8425 /dev/ttys010
cli-clien 20537 sundbp    1u   CHR  16,10   0t832405       8425 /dev/ttys010
cli-clien 20537 sundbp    2u   CHR  16,10   0t832405       8425 /dev/ttys010

After creating client, but before API call:

» lsof -p (pgrep cli-client)                                                                                    14:35:33
COMMAND     PID   USER   FD     TYPE             DEVICE   SIZE/OFF       NODE NAME
cli-clien 20537 sundbp  cwd      DIR                1,4        640 8604128240 /Users/sundbp/dev/native-image-grpc-test
cli-clien 20537 sundbp  txt      REG                1,4   34982976 8607168028 /Users/sundbp/dev/native-image-grpc-test/cli-client
cli-clien 20537 sundbp  txt      REG                1,4     841440 8604675997 /usr/lib/dyld
cli-clien 20537 sundbp  txt      REG                1,4 1175293952 8604684784 /private/var/db/dyld/dyld_shared_cache_x86_64h
cli-clien 20537 sundbp    0u     CHR              16,10   0t832437       8425 /dev/ttys010
cli-clien 20537 sundbp    1u     CHR              16,10   0t832437       8425 /dev/ttys010
cli-clien 20537 sundbp    2u     CHR              16,10   0t832437       8425 /dev/ttys010
cli-clien 20537 sundbp    3     PIPE 0xf348d64c968cceb7      16384            ->0xf348d64c968cd6f7
cli-clien 20537 sundbp    4     PIPE 0xf348d64c968cd6f7      16384            ->0xf348d64c968cceb7
cli-clien 20537 sundbp    5u  KQUEUE                                          count=0, state=0x8
cli-clien 20537 sundbp    6     PIPE 0xf348d64c968cb837      16384            ->0xf348d64c968cc5b7
cli-clien 20537 sundbp    7     PIPE 0xf348d64c968cc5b7      16384            ->0xf348d64c968cb837
cli-clien 20537 sundbp    8u  KQUEUE                                          count=0, state=0x8
cli-clien 20537 sundbp    9     PIPE 0xf348d64c968ccf77      16384            ->0xf348d64caea635b7
cli-clien 20537 sundbp   10     PIPE 0xf348d64caea635b7      16384            ->0xf348d64c968ccf77
cli-clien 20537 sundbp   11u  KQUEUE                                          count=0, state=0x8
cli-clien 20537 sundbp   12     PIPE 0xf348d64caea64877      16384            ->0xf348d64caea62a77
cli-clien 20537 sundbp   13     PIPE 0xf348d64caea62a77      16384            ->0xf348d64caea64877
cli-clien 20537 sundbp   14u  KQUEUE                                          count=0, state=0x8
cli-clien 20537 sundbp   15     PIPE 0xf348d64caea63c77      16384            ->0xf348d64caea62ef7
cli-clien 20537 sundbp   16     PIPE 0xf348d64caea62ef7      16384            ->0xf348d64caea63c77
cli-clien 20537 sundbp   17u  KQUEUE                                          count=0, state=0x8
cli-clien 20537 sundbp   18     PIPE 0xf348d64caea647b7      16384            ->0xf348d64caea62fb7
cli-clien 20537 sundbp   19     PIPE 0xf348d64caea62fb7      16384            ->0xf348d64caea647b7
cli-clien 20537 sundbp   20u  KQUEUE                                          count=0, state=0x8
cli-clien 20537 sundbp   21     PIPE 0xf348d64caea64577      16384            ->0xf348d64caea64db7
cli-clien 20537 sundbp   22     PIPE 0xf348d64caea64db7      16384            ->0xf348d64caea64577
cli-clien 20537 sundbp   23u  KQUEUE                                          count=0, state=0x8
cli-clien 20537 sundbp   24     PIPE 0xf348d64c9a770037      16384            ->0xf348d64c9a76ff77
cli-clien 20537 sundbp   25     PIPE 0xf348d64c9a76ff77      16384            ->0xf348d64c9a770037
cli-clien 20537 sundbp   26u  KQUEUE                                          count=0, state=0x8

After crash (before process exits):

» lsof -p (pgrep cli-client)                                                                                    14:36:25
COMMAND     PID   USER   FD   TYPE             DEVICE   SIZE/OFF       NODE NAME
cli-clien 20537 sundbp  cwd    DIR                1,4        640 8604128240 /Users/sundbp/dev/native-image-grpc-test
cli-clien 20537 sundbp  txt    REG                1,4   34982976 8607168028 /Users/sundbp/dev/native-image-grpc-test/cli-client
cli-clien 20537 sundbp  txt    REG                1,4     841440 8604675997 /usr/lib/dyld
cli-clien 20537 sundbp  txt    REG                1,4 1175293952 8604684784 /private/var/db/dyld/dyld_shared_cache_x86_64h
cli-clien 20537 sundbp    0u   CHR              16,10   0t839452       8425 /dev/ttys010
cli-clien 20537 sundbp    1u   CHR              16,10   0t839452       8425 /dev/ttys010
cli-clien 20537 sundbp    2u   CHR              16,10   0t839452       8425 /dev/ttys010
cli-clien 20537 sundbp   28u  unix 0xf348d64c84366b57        0t0            ->(none)

Not making me that much wiser, but at least I see how 27 is the "next fd in line" (and obviously FDs are per process, not global as I temporarily was thinking).

Given the EFAULT errno (indicated by "Bad address") it seems relatively conclusive to me that it's the pointers passed via the iovec to writev that is the problem then.

If I follow the path from netty to SocketChannelImpl to IOUtil to SocketDispatcher and finally to the substitutions - then it seems the most relevant bit is here:
http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/8u40-b25/sun/nio/ch/IOUtil.java/#148

That's where ByteBuffers are translated to IOVecWrapper where the native memory lives.

The SocketDispatcher after that is just passing it on to writev0: http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/8u40-b25/sun/nio/ch/SocketDispatcher.java/#51

So the final bit of code I can see is the WordFactory.pointer(address) call to get the iovec that is passed on to writev.

WordFactory.pointer seems to use WordBoxFactory.box with the address that we passed in. This must be implemented in BoxFactoryImpl in compiler/src/org.graalvm.compiler.word/src/org/graalvm/compiler/word/Word.java if I understand correctly, and this uses HostedWord.boxLong with the address, which just returns a new HostedWord(address).

It doesn't strike me that this should have any impact on pointer finally ending up in the writev call.

So I'm at the end of my road here I think - input most welcome! @cstancu @Peter-B-Kessler

@sundbp
Copy link
Author

sundbp commented May 23, 2018

A quick check that the iovec address doesn't get altered:

            System.out.println("Address is " + String.format("0x%X", address) + " with len " + len + ". fdo is " + fdo.valid() + " and fd is " + fd);
            System.out.println("the iov null check: " + iov.isNull() + " and the raw value: " + String.format("0x%X", iov.rawValue()));

gives:

Address is 0x7FEBA7F001E0 with len 2. fdo is true and fd is 27
the iov null check: false and the raw value: 0x7FEBA7F001E0

@Peter-B-Kessler
Copy link
Contributor

Where are your System.out.printlns shown just above? Are they in Target_sun_nio_ch_FileDispatcherImpl.writev0(FileDescriptor, long, int)?

Is the address/iov a valid address? Is it from a new IOVecWrapper, or reusing a cached one? I assume the Deallocator has not already gone off, or the address would be 0. If you follow the address, does it look like a (C) list of struct iovec* of the right length?

Can you follow the arguments through SocketDispatcher.writev(FileDescriptor, long, int) to Target_sun_nio_ch_FileDispatcherImpl.writev0(FileDescriptor, long, int). That is, making sure that address and iov have the same (valid) value. They should.

From there we go directly to Uio.writev(int, iovec, int) which just calls the C writev function on your platform.

What is the returned result from Uio.writev(int, iovec, int)? Is it an indication of success (>0), or an error code (<0). See PosixJavaNIOSubstitutions.convertLongReturnVal(long, boolean) for details on the interpretation of the result.

Thanks for your diagnosis of this problem.

@sundbp
Copy link
Author

sundbp commented May 23, 2018

Yes, more exactly line 1355 in substratevm/src/com.oracle.svm.core.posix/src/com/oracle/svm/core/posix/PosixJavaNIOSubstitutions.java.

I'm not quite sure how to investigate the address / iov. How would I attempt to print out the content of the memory address here?

I'm not a java developer, not using any IDE/debugger - happy for brief instruction of useful tooling you'd use here!

The relevant code for setting up the IOVecWrapper must be line 110 to 148 here: http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/8u40-b25/sun/nio/ch/IOUtil.java/#110
Without some runtime info I can't determine if it's cached or not..

I can't seem to find the source for SocketDispatcher anywhere on the graal side, so I'm not sure how I'd modify be able to observe the values passing through there..

Uio.writev definitely returns n < 0 as we hit the case where we call throwIOExceptionWithLastError() on line 305 (in PosixJavaNioSubstitutions). That's where it indicates we got errno EFAULT -> "Bad address".

Sounds like my current tooling may need an upgrade to dig a level deeper..

@Peter-B-Kessler
Copy link
Contributor

Since you are comfortable with modifying the source and rebuilding, but not debugging, for "tooling" I would recommend adding tracing code.

There is nothing for SocketDispatcher on the graal side. We use the straight http://hg.openjdk.java.net/jdk8u/jdk8u/jdk/file/556b17038b5c/src/solaris/classes/sun/nio/ch/SocketDispatcher.java, which just trampolines to the corresponding native methods FileDispatcherImpl (e.g., http://hg.openjdk.java.net/jdk8u/jdk8u/jdk/file/556b17038b5c/src/solaris/native/sun/nio/ch/FileDispatcherImpl.c#l128), which we do substitute.

What do you get if you replace Target_sun_nio_ch_FileDispatcherImpl.writev0(FileDescriptor, long, int) with

        @Substitute
        private static long writev0(FileDescriptor fdo, long address, int len) throws IOException {
            final Log trace = Log.log().string("[Target_sun_nio_ch_FileDispatcherImpl.writev0:");
            int fd = fdval(fdo);
            trace.string("  fd: ").signed(fd);
            iovec iov = WordFactory.pointer(address);
            trace.string("  ion: ").hex(iov);
            final SignedWord writevResult = writev(fd, iov, len);
            trace.string("  writevResult: ").signed(writevResult);
            trace.string("  errno: ").signed(Errno.errno());
            final long result = convertLongReturnVal(writevResult, false);
            trace.string("  result: ").signed(result);
            trace.string("]").newline();
            return result;
        }

(with the usual caveats about code written in a post, not in an IDE. :-) Log is our low-level printf-debugging mechanism, from when we did not have System.out.println working.

If writevResult is < 0, then I suspect you will never get past the call to convertLongReturnVal because you will get an IOException thrown at you. (You could catch the exception, log it, and rethrow it. But that would not tell you anything new.)

Rummaging through the fields of iov as if it were a C struct iovec* would be more work. We do have Uio.iovec, which would let you get at the iov_base and iov_len fields, but nothing that would help you find out which of the addresses is bad (the struct iovec* in the C array, or the base addresses in the struct iovecs themselves). As you say, the assembling of the iov happens in IOUtil.write, with the storage allocated (and freed!) in IOVecWrapper.

@sundbp
Copy link
Author

sundbp commented Jun 2, 2018

@Peter-B-Kessler I ended up first sick and then busy - sorry about delay.

I ran through that trace:

[Target_sun_nio_ch_FileDispatcherImpl.writev0:  fd: 27  iov: 0x7fc3bec00c20  writevResult: -1  errno: 14Exception in thread "main" java.lang.reflect.InvocationTargetException
...

The call to convertLongReturnVal is where the exception is thrown on unclean returns so it's natural we don't see the result: part of the trace.

As I expected it shows nothing I didn't expect from other traces etc I've done.

When I say I'm not using a debugger that doesn't mean I wouldn't be happy to give it whirl. It's just not part of my standard day to day setup when I deal with clojure. If you have suggestions for what to use and rough guidelines for inspecting the iovec struct I'd be happy to give it a go.

It does strike me as that's what we'll need to get to the next step of why we get bad address. To then be able to work out why/when/how it ends up "bad" (or if it starts "bad").

@sundbp
Copy link
Author

sundbp commented Jun 8, 2018

@Peter-B-Kessler happy to dig more - incl debuggers etc. If you have the time to briefly describe what you'd do next I'm happy to try to run with it.

@FroMage
Copy link

FroMage commented Aug 2, 2018

I seem to have run into the exact same issue with the latest master build of substratevm:

Aug 02, 2018 5:49:17 PM io.vertx.core.net.impl.ConnectionBase
SEVERE: Unhandled exception
java.io.IOException: Bad address
	at java.lang.Throwable.<init>(Throwable.java:265)
	at java.lang.Exception.<init>(Exception.java:66)
	at java.io.IOException.<init>(IOException.java:58)
	at com.oracle.svm.core.posix.PosixJavaNIOSubstitutions.throwIOExceptionWithLastError(PosixJavaNIOSubstitutions.java:236)
	at com.oracle.svm.core.posix.PosixJavaNIOSubstitutions.convertLongReturnVal(PosixJavaNIOSubstitutions.java:304)
	at com.oracle.svm.core.posix.PosixJavaNIOSubstitutions.convertLongReturnVal(PosixJavaNIOSubstitutions.java:285)
	at com.oracle.svm.core.posix.PosixJavaNIOSubstitutions$Target_sun_nio_ch_FileDispatcherImpl.writev0(PosixJavaNIOSubstitutions.java:1401)
	at sun.nio.ch.SocketDispatcher.writev(SocketDispatcher.java:51)
	at sun.nio.ch.IOUtil.write(IOUtil.java:148)
	at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:504)
	at io.netty.channel.socket.nio.NioSocketChannel.doWrite(NioSocketChannel.java:418)
	at io.netty.channel.AbstractChannel$AbstractUnsafe.flush0(AbstractChannel.java:934)
	at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.flush0(AbstractNioChannel.java:362)
	at io.netty.channel.AbstractChannel$AbstractUnsafe.flush(AbstractChannel.java:901)
	at io.netty.channel.DefaultChannelPipeline$HeadContext.flush(DefaultChannelPipeline.java:1321)
	at io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:776)
	at io.netty.channel.AbstractChannelHandlerContext.invokeFlush(AbstractChannelHandlerContext.java:768)
	at io.netty.channel.AbstractChannelHandlerContext.flush(AbstractChannelHandlerContext.java:749)
	at io.netty.channel.ChannelOutboundHandlerAdapter.flush(ChannelOutboundHandlerAdapter.java:115)
	at io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:776)
	at io.netty.channel.AbstractChannelHandlerContext.invokeFlush(AbstractChannelHandlerContext.java:768)
	at io.netty.channel.AbstractChannelHandlerContext.flush(AbstractChannelHandlerContext.java:749)
	at io.vertx.core.net.impl.ConnectionBase.endReadAndFlush(ConnectionBase.java:83)
	at io.vertx.core.net.impl.VertxHandler$$Lambda$968/35805262.run(Unknown Source)
	at io.vertx.core.impl.ContextImpl.lambda$wrapTask$2(ContextImpl.java:337)
	at io.vertx.core.impl.ContextImpl$$Lambda$665/1290664511.run(Unknown Source)
	at io.vertx.core.impl.ContextImpl.executeFromIO(ContextImpl.java:195)
	at io.vertx.core.net.impl.VertxHandler.channelReadComplete(VertxHandler.java:136)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelReadComplete(AbstractChannelHandlerContext.java:398)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelReadComplete(AbstractChannelHandlerContext.java:380)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelReadComplete(AbstractChannelHandlerContext.java:373)
	at io.netty.handler.codec.ByteToMessageDecoder.channelReadComplete(ByteToMessageDecoder.java:324)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelReadComplete(AbstractChannelHandlerContext.java:398)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelReadComplete(AbstractChannelHandlerContext.java:380)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelReadComplete(AbstractChannelHandlerContext.java:373)
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelReadComplete(DefaultChannelPipeline.java:1364)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelReadComplete(AbstractChannelHandlerContext.java:398)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelReadComplete(AbstractChannelHandlerContext.java:380)
	at io.netty.channel.DefaultChannelPipeline.fireChannelReadComplete(DefaultChannelPipeline.java:941)
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:146)
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:545)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:499)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
	at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:886)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.lang.Thread.run(Thread.java:748)
	at com.oracle.svm.core.posix.thread.PosixJavaThreads.pthreadStartRoutine(PosixJavaThreads.java:238)

@Peter-B-Kessler
Copy link
Contributor

Does this failure happen on every call to sun.nio.ch.SocketChannelImpl.write (or methods called from there) or just some of them? That is: am I looking for an intermittent failure or a solid failure? I guess if it were a solid failure someone would have presented a reproducing test case.

Does it only happen when sun.nio.ch.IOUtil.write is called when ByteBuffer[] srcs has more than one element? It looks like maybe there is not a lot of visibility in the netty levels.

I can try to write some iovec verification code. I can only add tracing code to our substitutions. That will not fix the problem, but it might give us some idea about what is wrong.

Does anyone who hits this run with -XX:+PrintGC -XX:+PrintGCTimeStamps to see if this failure follows closely (?) after a garbage collection? This is just general paranoia on my part what with raw pointers flying around. They should all be to DirectBuffer memory (that is, not moved by an inopportune collection), and they should not be freed by Cleaners, but if we could rule that out it would be good.

@FroMage
Copy link

FroMage commented Aug 3, 2018

In my case it's 100% reproductible: I start an http server, and every time I connect to it I get this error. I can try to make the test smaller, but otherwise I can also give you the jar if it helps?

@sundbp
Copy link
Author

sundbp commented Aug 3, 2018 via email

@sundbp
Copy link
Author

sundbp commented Aug 3, 2018 via email

@FroMage
Copy link

FroMage commented Aug 3, 2018

@sundbp
Copy link
Author

sundbp commented Aug 3, 2018 via email

@Peter-B-Kessler
Copy link
Contributor

When I build an image of native-vert-chunked-fail I see

 $ mx native-image -g --report-unsupported-elements-at-runtime -Dio.netty.noUnsafe=true -H:EnableURLProtocols=http -Djava.net.preferIPv4Stack=true -H:ReflectionConfigurationFiles=../../native-vertx-chunked-fail-master/reflect.json -jar ../../native-vertx-chunked-fail-master/target/native-vertx-chunked-fail-0.0.1-SNAPSHOT.jar
   classlist:   5,876.85 ms
       (cap):   1,838.74 ms
       setup:   4,043.60 ms
Warning: RecomputeFieldValue.ArrayIndexScale automatic substitution failed. The automatic substitution registration was attempted because a call to sun.misc.Unsafe.arrayIndexScale(Class) was detected in the static initializer of io.netty.util.internal.shaded.org.jctools.util.UnsafeRefArrayAccess. Detailed failure reason(s): Could not determine the field where the value produced by the call to sun.misc.Unsafe.arrayIndexScale(Class) for the array index scale computation is stored. The call is not directly followed by a field store or by a sign extend node followed directly by a field store. 
  (typeflow):  28,721.37 ms
   (objects):  18,540.57 ms
  (features):     209.08 ms
    analysis:  48,554.12 ms
    universe:   1,475.05 ms
     (parse):   9,714.22 ms
    (inline):   6,294.66 ms
   (compile):  56,843.34 ms
     compile:  75,483.35 ms
       image:   6,615.42 ms
   debuginfo:  12,386.15 ms
       write:   8,850.38 ms
     [total]: 163,432.16 ms

and, needless to say, that warning about missing an RecomputeFieldValue.ArrayIndexScale is worrying since I now know that unsafe access of arrays is happening in this code.

I am going to hand this over to our RecomputeFieldValue guy, @cstancu, to see if he has any wise words to say.

@Peter-B-Kessler
Copy link
Contributor

A nit on the otherwise exemplary README.txt file: The lines

- ./native-vertx-chunked-fail-0.0.1-SNAPSHOT

Load http://localhost:9000/

made me (just a VM engineer) think that "Load http://localhost:9000/" was output from the application, so I stared at it not doing anything until I thought to take a browser and point it at that URL.

But I can reliably reproduce the problem. I get (at least) two of them per refresh in the browser.

@Peter-B-Kessler
Copy link
Contributor

I added some tracing to Target_sun_nio_ch_FileDispatcherImpl.writev0(FileDescriptor, long, int). That is the first place where the call chain enters the VM. Here is what I see:

[Target_sun_nio_ch_FileDispatcherImpl.writev0:  fd: 34  iov: 0x7fffe0000970  len: 5  sizeof(struct iovec): 16
  
  First-level of the iovec[]:
    &iovec[ 0 ]: 0x7fffe0000970  .iov_base: 0x7ffff4601010  .iov_len: 47
    &iovec[ 1 ]: 0x7fffe0000980  .iov_base: 0x7ffff4603010  .iov_len: 4
    &iovec[ 2 ]: 0x7fffe0000990  .iov_base: 0x7ffff4603020  .iov_len: 16
    &iovec[ 3 ]: 0x7fffe00009a0  .iov_base: 0x7fbf6802a400  .iov_len: 2
    &iovec[ 4 ]: 0x7fffe00009b0  .iov_base: 0x7fbf6802a420  .iov_len: 5
  
  Second-level of the iovec[]:
    &iovec[ 0 ]: 0x7fffe0000970  .iov_base: 0x7ffff4601010  .iov_len: 47
      &.iov_base[ 0 ]: 0x7ffff4601010  .iov_base[ 0 ]: 0x48 = H
      &.iov_base[ 1 ]: 0x7ffff4601011  .iov_base[ 1 ]: 0x54 = T
      &.iov_base[ 2 ]: 0x7ffff4601012  .iov_base[ 2 ]: 0x54 = T
      &.iov_base[ 3 ]: 0x7ffff4601013  .iov_base[ 3 ]: 0x50 = P
      &.iov_base[ 4 ]: 0x7ffff4601014  .iov_base[ 4 ]: 0x2f = /
      &.iov_base[ 5 ]: 0x7ffff4601015  .iov_base[ 5 ]: 0x31 = 1
      &.iov_base[ 6 ]: 0x7ffff4601016  .iov_base[ 6 ]: 0x2e = .
      &.iov_base[ 7 ]: 0x7ffff4601017  .iov_base[ 7 ]: 0x31 = 1
      &.iov_base[ 8 ]: 0x7ffff4601018  .iov_base[ 8 ]: 0x20 =  
      &.iov_base[ 9 ]: 0x7ffff4601019  .iov_base[ 9 ]: 0x32 = 2
      &.iov_base[ 10 ]: 0x7ffff460101a  .iov_base[ 10 ]: 0x30 = 0
      &.iov_base[ 11 ]: 0x7ffff460101b  .iov_base[ 11 ]: 0x30 = 0
      &.iov_base[ 12 ]: 0x7ffff460101c  .iov_base[ 12 ]: 0x20 =  
      &.iov_base[ 13 ]: 0x7ffff460101d  .iov_base[ 13 ]: 0x4f = O
      &.iov_base[ 14 ]: 0x7ffff460101e  .iov_base[ 14 ]: 0x4b = K
      &.iov_base[ 15 ]: 0x7ffff460101f  .iov_base[ 15 ]: 0xd = 
      &.iov_base[ 16 ]: 0x7ffff4601020  .iov_base[ 16 ]: 0xa = 

      &.iov_base[ 17 ]: 0x7ffff4601021  .iov_base[ 17 ]: 0x54 = T
      &.iov_base[ 18 ]: 0x7ffff4601022  .iov_base[ 18 ]: 0x72 = r
      &.iov_base[ 19 ]: 0x7ffff4601023  .iov_base[ 19 ]: 0x61 = a
      &.iov_base[ 20 ]: 0x7ffff4601024  .iov_base[ 20 ]: 0x6e = n
      &.iov_base[ 21 ]: 0x7ffff4601025  .iov_base[ 21 ]: 0x73 = s
      &.iov_base[ 22 ]: 0x7ffff4601026  .iov_base[ 22 ]: 0x66 = f
      &.iov_base[ 23 ]: 0x7ffff4601027  .iov_base[ 23 ]: 0x65 = e
      &.iov_base[ 24 ]: 0x7ffff4601028  .iov_base[ 24 ]: 0x72 = r
      &.iov_base[ 25 ]: 0x7ffff4601029  .iov_base[ 25 ]: 0x2d = -
      &.iov_base[ 26 ]: 0x7ffff460102a  .iov_base[ 26 ]: 0x45 = E
      &.iov_base[ 27 ]: 0x7ffff460102b  .iov_base[ 27 ]: 0x6e = n
      &.iov_base[ 28 ]: 0x7ffff460102c  .iov_base[ 28 ]: 0x63 = c
      &.iov_base[ 29 ]: 0x7ffff460102d  .iov_base[ 29 ]: 0x6f = o
      &.iov_base[ 30 ]: 0x7ffff460102e  .iov_base[ 30 ]: 0x64 = d
      &.iov_base[ 31 ]: 0x7ffff460102f  .iov_base[ 31 ]: 0x69 = i
      &.iov_base[ 32 ]: 0x7ffff4601030  .iov_base[ 32 ]: 0x6e = n
      &.iov_base[ 33 ]: 0x7ffff4601031  .iov_base[ 33 ]: 0x67 = g
      &.iov_base[ 34 ]: 0x7ffff4601032  .iov_base[ 34 ]: 0x3a = :
      &.iov_base[ 35 ]: 0x7ffff4601033  .iov_base[ 35 ]: 0x20 =  
      &.iov_base[ 36 ]: 0x7ffff4601034  .iov_base[ 36 ]: 0x63 = c
      &.iov_base[ 37 ]: 0x7ffff4601035  .iov_base[ 37 ]: 0x68 = h
      &.iov_base[ 38 ]: 0x7ffff4601036  .iov_base[ 38 ]: 0x75 = u
      &.iov_base[ 39 ]: 0x7ffff4601037  .iov_base[ 39 ]: 0x6e = n
      &.iov_base[ 40 ]: 0x7ffff4601038  .iov_base[ 40 ]: 0x6b = k
      &.iov_base[ 41 ]: 0x7ffff4601039  .iov_base[ 41 ]: 0x65 = e
      &.iov_base[ 42 ]: 0x7ffff460103a  .iov_base[ 42 ]: 0x64 = d
      &.iov_base[ 43 ]: 0x7ffff460103b  .iov_base[ 43 ]: 0xd = 
      &.iov_base[ 44 ]: 0x7ffff460103c  .iov_base[ 44 ]: 0xa = 

      &.iov_base[ 45 ]: 0x7ffff460103d  .iov_base[ 45 ]: 0xd = 
      &.iov_base[ 46 ]: 0x7ffff460103e  .iov_base[ 46 ]: 0xa = 

    
    &iovec[ 1 ]: 0x7fffe0000980  .iov_base: 0x7ffff4603010  .iov_len: 4
      &.iov_base[ 0 ]: 0x7ffff4603010  .iov_base[ 0 ]: 0x31 = 1
      &.iov_base[ 1 ]: 0x7ffff4603011  .iov_base[ 1 ]: 0x30 = 0
      &.iov_base[ 2 ]: 0x7ffff4603012  .iov_base[ 2 ]: 0xd = 
      &.iov_base[ 3 ]: 0x7ffff4603013  .iov_base[ 3 ]: 0xa = 

    
    &iovec[ 2 ]: 0x7fffe0000990  .iov_base: 0x7ffff4603020  .iov_len: 16
      &.iov_base[ 0 ]: 0x7ffff4603020  .iov_base[ 0 ]: 0x48 = H
      &.iov_base[ 1 ]: 0x7ffff4603021  .iov_base[ 1 ]: 0x69 = i
      &.iov_base[ 2 ]: 0x7ffff4603022  .iov_base[ 2 ]: 0x20 =  
      &.iov_base[ 3 ]: 0x7ffff4603023  .iov_base[ 3 ]: 0x66 = f
      &.iov_base[ 4 ]: 0x7ffff4603024  .iov_base[ 4 ]: 0x72 = r
      &.iov_base[ 5 ]: 0x7ffff4603025  .iov_base[ 5 ]: 0x6f = o
      &.iov_base[ 6 ]: 0x7ffff4603026  .iov_base[ 6 ]: 0x6d = m
      &.iov_base[ 7 ]: 0x7ffff4603027  .iov_base[ 7 ]: 0x20 =  
      &.iov_base[ 8 ]: 0x7ffff4603028  .iov_base[ 8 ]: 0x62 = b
      &.iov_base[ 9 ]: 0x7ffff4603029  .iov_base[ 9 ]: 0x75 = u
      &.iov_base[ 10 ]: 0x7ffff460302a  .iov_base[ 10 ]: 0x66 = f
      &.iov_base[ 11 ]: 0x7ffff460302b  .iov_base[ 11 ]: 0x66 = f
      &.iov_base[ 12 ]: 0x7ffff460302c  .iov_base[ 12 ]: 0x65 = e
      &.iov_base[ 13 ]: 0x7ffff460302d  .iov_base[ 13 ]: 0x72 = r
      &.iov_base[ 14 ]: 0x7ffff460302e  .iov_base[ 14 ]: 0x20 =  
      &.iov_base[ 15 ]: 0x7ffff460302f  .iov_base[ 15 ]: 0x32 = 2
    
    &iovec[ 3 ]: 0x7fffe00009a0  .iov_base: 0x7fbf6802a400  .iov_len: 2
      &.iov_base[ 0 ]: 0x7fbf6802a400
Thread 5 "ntloop-thread-1" received signal SIGSEGV, Segmentation fault.

That is, I have been handed an iovec[5]. The top-level structure seems okay: there are 5 elements, they are correctly laid out in memory (16 bytes each), and they have iov_base and iov_len fields that seem plausible. (But I know nothing about what iovec should look like: the iov_base fields look like addresses and the iov_len fields look like lengths.)

Looking at contents of the iov_base fields fails. The first 3 seem fine. (But again, I have no idea what they should contain: the first looks like some kind of response header ending in an empty line; the second might be length (as a hex string!?) of the next iovec; and the third is the contents of the response.) Then things go pear-shaped. iovec[3].iov_base points to unmapped memory. Fully justifying writev returning an error code of "Bad address".

Just for fun (?) I used gdb to look at the iovec.iov_base fields:

(gdb) x/47c 0x7ffff4601010
0x7ffff4601010:	'H'	'T'	'T'	'P'	'/'	'1'	'.'	'1'
0x7ffff4601018:	' '	'2'	'0'	'0'	' '	'O'	'K'	'\r'
0x7ffff4601020:	'\n'	'T'	'r'	'a'	'n'	's'	'f'	'e'
0x7ffff4601028:	'r'	'-'	'E'	'n'	'c'	'o'	'd'	'i'
0x7ffff4601030:	'n'	'g'	':'	' '	'c'	'h'	'u'	'n'
0x7ffff4601038:	'k'	'e'	'd'	'\r'	'\n'	'\r'	'\n'
(gdb) x/4c 0x7ffff4603010
0x7ffff4603010:	'1'	'0'	'\r'	'\n'
(gdb) x/16c 0x7ffff4603020
0x7ffff4603020:	'H'	'i'	' '	'f'	'r'	'o'	'm'	' '
0x7ffff4603028:	'b'	'u'	'f'	'f'	'e'	'r'	' '	'2'
(gdb) x/2c 0x7fbf6802a400
0x7fbf6802a400:	Cannot access memory at address 0x7fbf6802a400
(gdb) x/5c 0x7fbf6802a420
0x7fbf6802a420:	Cannot access memory at address 0x7fbf6802a420

Not much news there. Not only is iovec[3].iov_base a bad address, but so is iovec[4].iov_base.

I am hoping that the problem is in the RecomputeFieldValue.ArrayIndexScale warning, and that @cstancu can figure it out.

@Peter-B-Kessler
Copy link
Contributor

I added some tracing to Target_sun_misc_Unsafe.allocateMemory that shows me

[Target_sun_misc_Unsafe.allocateMemory:  bytes: 16777216  result: 0x7ffff45ff010]
[Target_sun_misc_Unsafe.allocateMemory:  bytes: 80  result: 0x7fffe0000970]
[Target_sun_nio_ch_FileDispatcherImpl.writev0:  fd: 34  iov: 0x7fffe0000970  len: 5  sizeof(struct iovec): 16
  
  First-level of the iovec[]:
    &iovec[ 0 ]: 0x7fffe0000970  .iov_base: 0x7ffff4601010  .iov_len: 47
    &iovec[ 1 ]: 0x7fffe0000980  .iov_base: 0x7ffff4603010  .iov_len: 4
    &iovec[ 2 ]: 0x7fffe0000990  .iov_base: 0x7ffff4603020  .iov_len: 16
    &iovec[ 3 ]: 0x7fffe00009a0  .iov_base: 0x7f57bc2c47f0  .iov_len: 2
    &iovec[ 4 ]: 0x7fffe00009b0  .iov_base: 0x7f57bc2c4810  .iov_len: 5

The iovec[] fits exactly in the 80 bytes at 0x7fffe0000970. The first 3 iov_bases are in the middle of the 16MB (!) at 0x7ffff45ff010. 0x7f57bc2c47f0 and 0x7f57bc2c4810 point to unallocated memory.

I still do not know whose fault that is. Should there be 5 iovecs in the response, or is 3 the right answer?

@Peter-B-Kessler
Copy link
Contributor

I was hoping that #400 (comment) meant I could set setChunked(false) and watch what happens on a successful run. But if I do that I get

SEVERE: Unhandled exception
java.lang.IllegalStateException: You must set the Content-Length header to be the total size of the message body BEFORE sending any data if you are not using HTTP chunked encoding.

I am obviously out of my depth.

@FroMage
Copy link

FroMage commented Aug 6, 2018

Sorry about the lack of working version, I just pushed it:

    	vertx.createHttpServer().requestHandler(req -> {
    		// this doesn't work
    		req.response().setChunked(true);
    		// this alternative works
//    		req.response().putHeader("Content-Length", "16");
    		req.response().write(Buffer.buffer("Hi from buffer 2"));
    		req.response().end();
    	}).listen(9000);

Basically if you're not chunking, you have to set the http content length instead.

@FroMage
Copy link

FroMage commented Aug 6, 2018

I was able to get rid of the unsafe warning by following https://medium.com/graalvm/instant-netty-startup-using-graalvm-native-image-generation-ed6f14ff7692 and adding this substitution:

@TargetClass(className = "io.netty.util.internal.shaded.org.jctools.util.UnsafeRefArrayAccess")
final class Target_io_netty_util_internal_shaded_org_jctools_util_UnsafeRefArrayAccess {
     @Alias @RecomputeFieldValue(kind = Kind.ArrayIndexShift, declClass = Object[].class)
     public static int REF_ELEMENT_SHIFT;
}

However, that does not appear related to this particular bad address issue, because it does not fix it.

@Peter-B-Kessler
Copy link
Contributor

The underlying problem is that the netty code constructs DirectByteBuffer instances in static initializers. We run the static initializers during image building, so we currently build an image with DirectByteBuffer instances whose address() method returns the address of memory in the image builder, which is long gone by the time we are running the image. When those DirectByteBuffer instances are used, if we are lucky we get a segfault. (If we are unlucky we write on memory. Or read from memory: that would be an issue, too.)

I will add a detector so the problem source code will be identified and we will not construct images with DirectByteBuffer instances in their image heaps. That will not fix the underlying problems in netty.

@FroMage
Copy link

FroMage commented Aug 7, 2018

Thanks. Do you have any pointer to the netty code that does this?

@FroMage
Copy link

FroMage commented Aug 7, 2018

I only saw places where netty would do that only if -Dio.netty.noUnsafe=true was missing, which is not the case here.

@FroMage
Copy link

FroMage commented Aug 7, 2018

Also, I assume you guys must have a pretty damn good reason to run static init code during compilation and not run time, right? Why do you do that? Because it sounds very dangerous to me since those static init blocks are expected to be called at run-time in the Java semantics. There must be a ton of code that does IO in static blocks and they must surely break if that is not done at run-time. I'm sure you must get that question a lot, sorry if it's documented somewhere and I didn't see it, but perhaps you can point me to it?

@Peter-B-Kessler
Copy link
Contributor

That we run static initializers during image building is documented as a limitation.

There are lots of reasons to want to run static initializers during image building (initialization order, faster startup of the generated image, etc.). We are working on a mechanism to allow static initializers to be delayed until runtime, but it is not ready quite yet.

@FroMage
Copy link

FroMage commented Aug 8, 2018

Thanks for the explanation. I haven't run any kind of study on how most static init is written in Maven Central, but 20 years of writing Java code tells me that I expect a significant percentage (>20%) of init code that can only work at runtime, so I strongly suspect that the new mechanism to run static init at run-time will be extremely useful in getting most apps running on graal.

@FroMage
Copy link

FroMage commented Aug 8, 2018

OK, I found it: it was in io.netty.handler.codec.http.HttpObjectEncoder that we had static direct buffers for the last chunks. I was able to fix them with this subst:

import static io.netty.buffer.Unpooled.buffer;
import static io.netty.buffer.Unpooled.directBuffer;
import static io.netty.buffer.Unpooled.unreleasableBuffer;
import static io.netty.handler.codec.http.HttpConstants.CR;
import static io.netty.handler.codec.http.HttpConstants.LF;

import com.oracle.svm.core.annotate.Alias;
import com.oracle.svm.core.annotate.RecomputeFieldValue;
import com.oracle.svm.core.annotate.RecomputeFieldValue.Kind;

import io.netty.buffer.Unpooled;
import io.netty.buffer.ByteBuf;
import jdk.vm.ci.meta.MetaAccessProvider;
import jdk.vm.ci.meta.ResolvedJavaField;

@TargetClass(className = "io.netty.handler.codec.http.HttpObjectEncoder")
final class Target_io_netty_handler_codec_http_HttpObjectEncoder {
     @Alias @RecomputeFieldValue(kind = Kind.Custom, declClass = Recomputer1.class)
     private static ByteBuf CRLF_BUF;
     @Alias @RecomputeFieldValue(kind = Kind.Custom, declClass = Recomputer2.class)
     private static ByteBuf ZERO_CRLF_CRLF_BUF;
}

final class Recomputer1 implements RecomputeFieldValue.CustomFieldValueComputer {

	@Override
	public Object compute(MetaAccessProvider metaAccess, ResolvedJavaField original, ResolvedJavaField annotated,
			Object receiver) {
		return Unpooled.unreleasableBuffer(buffer(2).writeByte(CR).writeByte(LF));
	}
	
}

final class Recomputer2 implements RecomputeFieldValue.CustomFieldValueComputer {
    private static final byte[] ZERO_CRLF_CRLF = { '0', CR, LF, CR, LF };

	@Override
	public Object compute(MetaAccessProvider metaAccess, ResolvedJavaField original, ResolvedJavaField annotated,
			Object receiver) {
		return Unpooled.unreleasableBuffer(buffer(ZERO_CRLF_CRLF.length)
		        .writeBytes(ZERO_CRLF_CRLF));
	}
	
}

@Peter-B-Kessler
Copy link
Contributor

I think I see

(A) io.netty.handler.codec.http.HttpObjectEncoder.CRLF_BUF = unreleasableBuffer(directBuffer(2).writeByte(CR).writeByte(LF));
       public static ByteBuf io.netty.buffer.Unpooled.directBuffer(int initialCapacity) {
           return ALLOC.directBuffer(initialCapacity);
       }
           ByteBuf io.netty.buffer.ByteBufAllocator.directBuffer(int initialCapacity);
               public ByteBuf io.netty.buffer.AbstractByteBufAllocator.directBuffer(int initialCapacity) {
                   return directBuffer(initialCapacity, DEFAULT_MAX_CAPACITY);
               }
                   public ByteBuf io.netty.buffer.AbstractByteBufAllocator.directBuffer(int initialCapacity, int maxCapacity) {
                       ....
                       return newDirectBuffer(initialCapacity, maxCapacity);
                   }
(B)                    protected abstract ByteBuf io.netty.buffer.AbstractByteBufAllocator.newDirectBuffer(int initialCapacity, int maxCapacity);

(C1)                   protected ByteBuf io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(int initialCapacity, int maxCapacity) {
                           ....
                           buf = PlatformDependent.hasUnsafe() ?
                               UnsafeByteBufUtil.newUnsafeDirectByteBuf(this, initialCapacity, maxCapacity) :
                               new UnpooledDirectByteBuf(this, initialCapacity, maxCapacity);
                           }
(D)                        public io.netty.buffer.UnpooledDirectByteBuf.UnpooledDirectByteBuf(ByteBufAllocator alloc, int initialCapacity, int maxCapacity) {
                               ....
(E)                            setByteBuffer(ByteBuffer.allocateDirect(initialCapacity));
                           }

(C2)                   protected ByteBuf io.netty.buffer.UnpooledByteBufAllocator.newDirectBuffer(int initialCapacity, int maxCapacity) {
                           ....
                           buf = new InstrumentedUnpooledDirectByteBuf(this, initialCapacity, maxCapacity);
                       }
                          InstrumentedUnpooledDirectByteBuf(UnpooledByteBufAllocator alloc, int initialCapacity, int maxCapacity) {
                              super(alloc, initialCapacity, maxCapacity);
                          }
(D)                           public io.netty.buffer.UnpooledDirectByteBuf.UnpooledDirectByteBuf(ByteBufAllocator alloc, int initialCapacity, int maxCapacity) {
                                  ....
(E)                               setByteBuffer(ByteBuffer.allocateDirect(initialCapacity));
                              }

Starting from the static initialization during image building of (A) io.netty.handler.codec.http.HttpObjectEncoder.CRLF_BUF, we call down to the abstract method (B) io.netty.buffer.AbstractByteBufAllocator.newDirectBuffer(int initialCapacity, int maxCapacity), which could be either (C1) io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(int initialCapacity, int maxCapacity) or (C2) io.netty.buffer.UnpooledByteBufAllocator.newDirectBuffer(int initialCapacity, int maxCapacity), but both of those end up calling down to (D) io.netty.buffer.UnpooledDirectByteBuf.UnpooledDirectByteBuf(ByteBufAllocator alloc, int initialCapacity, int maxCapacity), which calls (E) ByteBuffer.allocateDirect(initialCapacity)), which allocates a java.nio.DirectByteBuffer.

Adding a little more tracing to App.main() to print out CRLF_BUF.toString() and CRLF_BUF.isDirect(), I see

  io.netty.handler.codec.http.HttpObjectEncoder.CRLF_BUF: UnreleasableByteBuf(UnpooledByteBufAllocator$InstrumentedUnpooledDirectByteBuf(ridx: 0, widx: 2, cap: 2))  isDirect: true

I think that CRLF_BUF was a DirectByteBuffer that was created during image building (e.g., with a long address field pointing to malloc'ed memory in the image builder). If we are lucky, that address will segfault at runtime.

I do not think we can allow DirectByteBuffer instances created during image building to persist into the heap at runtime. I do not want to do this analysis for each instance. I am not, yet, sure what to do about them.

@Peter-B-Kessler
Copy link
Contributor

Adding some experimental tracing to the image builder to find DirectByteBuffer instances for the sample program shows

Error: Detected a direct ByteBuffer in the image heap. This is not supported. The object was reached from a static initializer. All static class initialization is done during native image construction, thus a static initializer cannot contain code that captures state dependent on the build machine. Write your own initialization methods and call them explicitly from your main entry point.
Trace:
    object io.netty.buffer.UnpooledByteBufAllocator$InstrumentedUnpooledDirectByteBuf
    object io.netty.buffer.UnreleasableByteBuf
    at io.netty.handler.codec.http.HttpObjectEncoder.encodeChunkedContent(HttpObjectEncoder.java:195)
....
Trace:
    object io.netty.buffer.UnpooledByteBufAllocator$InstrumentedUnpooledDirectByteBuf
    object io.netty.buffer.UnreleasableByteBuf
    at io.netty.handler.codec.http.HttpObjectEncoder.encodeChunkedContent(HttpObjectEncoder.java:195)
....
Trace:
    at parsing io.netty.buffer.CompositeByteBuf.Internalniobuffer(CompositeByteBuf.java:1445)
....
Trace:
    at parsing io.netty.buffer.CompositeByteBuf.nioBuffer(CompositeByteBuf.java:1459)
....
Trace:
    at parsing io.netty.buffer.CompositeByteBuf.nioBuffers(CompositeByteBuf.java:1482)
....
Trace:
    at parsing io.netty.buffer.EmptyByteBuf.internalNioBuffer(EmptyByteBuf.java:934)
....
Trace:
    at parsing io.netty.buffer.EmptyByteBuf.nioBuffer(EmptyByteBuf.java:912)
....
Trace:
    at parsing io.netty.buffer.EmptyByteBuf.nioBuffers(EmptyByteBuf.java:923)

The "object" trace is the class hierarchy. I don't know if that is useful. The "parsing" trace shows the method whose bytecodes we were examining when the DirectByteBuffer was encountered. That is not a lot of help, but suggestions for improvements are welcome (since you are the users of these messages). By the time we are building the initial heap for the image we do not know how the object ended up in the heap of the image builder, just that it is there because we ran a static initializer.

@FroMage
Copy link

FroMage commented Aug 9, 2018

I think it would help a lot more if you could instrument the constructor of DirectByteBuffer to yell on static block initialisation during native-image. Or better yet, support static init at runtime :)

@sundbp
Copy link
Author

sundbp commented Aug 9, 2018

@FroMage Your workaround that imports from jdk.vm.ci.meta - what JDK version and options did you use to compile that? I'm keen to see if I can get my gRPC case working with this netty workaround!

I seem to get errors like this whatever I do using java9 or java10:

Compiling 7 Java source files...
ERROR: /Users/sundbp/dev/dotfiles/boot/cache/tmp/Users/sundbp/dev/native-image-grpc-test/jd9/idqqty/grpctest/api/GrpcTestGrpc.java, line 20: cannot find symbol
  symbol:   class Generated
  location: package javax.annotation
ERROR: /Users/sundbp/dev/dotfiles/boot/cache/tmp/Users/sundbp/dev/native-image-grpc-test/jd9/idqqty/SVMSubstitutions.java, line 22: package jdk.vm.ci.meta is not visible
  (package jdk.vm.ci.meta is declared in module jdk.internal.vm.ci, which is not in the module graph)
ERROR: /Users/sundbp/dev/dotfiles/boot/cache/tmp/Users/sundbp/dev/native-image-grpc-test/jd9/idqqty/SVMSubstitutions.java, line 23: package jdk.vm.ci.meta is not visible
  (package jdk.vm.ci.meta is declared in module jdk.internal.vm.ci, which is not in the module graph)

I've tried added JVM flag --add-modules jdk.internal.vm.ci, but no cigar.

@FroMage
Copy link

FroMage commented Aug 9, 2018

I think you have to run a JVM that is the jdk8 from Graal (with compiler extensions) or jdk11.

@sundbp
Copy link
Author

sundbp commented Aug 9, 2018 via email

@FroMage
Copy link

FroMage commented Aug 9, 2018

I am indeed using jdk11, but I got that code from the graal API that runs on jdk8. Probably you need a system jar added to your build in java8?

@Peter-B-Kessler
Copy link
Contributor

@FroMage Re #400 (comment)
We can not complain about all calls to the DirectByteBuffer constructor during native image building, because there might be legitimate uses from the native image builder itself. We do not explicitly run the static initializers: they are run because the native image builder loads those classes. The initial image heap is built from objects that are reachable from the code that goes into the native image, which is a small fraction of the objects in the image builder heap.

(Re)Initialization at runtime seems like the better answer. That will require some work on the part of the application programmer. Either, as you have found with RecomputeFieldValue for source you can not change, or with explicit runtime initialization in source you can change.

@cstancu
Copy link
Member

cstancu commented Aug 9, 2018

@sundbp Re #400 (comment)
jdk.vm.ci.meta.MetaAccessProvider and jdk.vm.ci.meta.ResolvedJavaField are imported from jvmci-api.jar which, in GraalVM 1.0.0 rc5, is under $GRAALVM_HOME/jre/lib/jvmci/jvmci-api.jar.

@sundbp
Copy link
Author

sundbp commented Aug 9, 2018 via email

@sundbp
Copy link
Author

sundbp commented Sep 10, 2018

After RC6 was released with delayed static initialization I gave my repo another go. When doing so I realized I had forgotten about needing to generate the .class files for my substitutions on my last try (too used to clojure compile on demand). I fixed up my build, as well as moved to RC6 - and I got a working test case!

The updated repo is at https://gitlab.com/sundbp/native-image-grpc-test

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants