
Enable Apache Arrow for improved Snowflake integration #9475

Closed · hubertp opened this issue Mar 19, 2024 · 10 comments · Fixed by #9664
Assignees
Labels
-compiler -libs Libraries: New libraries to be implemented

Comments

@hubertp
Contributor

hubertp commented Mar 19, 2024

The lack of Arrow support is apparently problematic for Snowflake. Enabling it also means we need to add

--add-opens=java.base/java.nio=ALL-UNNAMED

as per https://arrow.apache.org/docs/java/install.html
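
For context, a rough sketch (hypothetical probe class, not Arrow code) of the reflective access that Arrow's MemoryUtil attempts at class initialization, which is what the flag unlocks. The exact constructor signature varies between JDK versions ((long, int) on older ones, (long, long) on newer ones, as the traces below show):

import java.lang.reflect.Constructor;

// Hypothetical probe: MemoryUtil needs the private DirectByteBuffer constructor to
// wrap raw memory addresses in NIO buffers. Without the --add-opens option the
// setAccessible call below fails with InaccessibleObjectException.
public class DirectBufferAccessProbe {
    public static void main(String[] args) throws Exception {
        Class<?> direct = Class.forName("java.nio.DirectByteBuffer");
        Constructor<?> ctor = direct.getDeclaredConstructor(long.class, long.class);
        ctor.setAccessible(true); // throws unless java.base/java.nio is opened
        System.out.println("java.nio is open to the unnamed module: " + ctor);
    }
}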

@hubertp hubertp added -compiler -libs Libraries: New libraries to be implemented labels Mar 19, 2024
@hubertp hubertp self-assigned this Mar 19, 2024
@hubertp
Contributor Author

hubertp commented Mar 21, 2024

Tried to reproduce by enabling Arrow in the connection.

So far I couldn't reproduce it.

@hubertp
Contributor Author

hubertp commented Mar 26, 2024

java.lang.RuntimeException: Failed to initialize MemoryUtil. Was Java started with `--add-opens=java.base/java.nio=ALL-UNNAMED`? (See https://arrow.apache.org/docs/java/install.html)
	at net.snowflake.client.jdbc.internal.apache.arrow.memory.util.MemoryUtil.<clinit>(MemoryUtil.java:146)
	at net.snowflake.client.jdbc.internal.apache.arrow.memory.ArrowBuf.getDirectBuffer(ArrowBuf.java:234)
	at net.snowflake.client.jdbc.internal.apache.arrow.memory.ArrowBuf.nioBuffer(ArrowBuf.java:229)
	at net.snowflake.client.jdbc.internal.apache.arrow.vector.ipc.ReadChannel.readFully(ReadChannel.java:87)
	at net.snowflake.client.jdbc.internal.apache.arrow.vector.ipc.message.MessageSerializer.readMessageBody(MessageSerializer.java:728)
	at net.snowflake.client.jdbc.internal.apache.arrow.vector.ipc.message.MessageChannelReader.readNext(MessageChannelReader.java:67)
	at net.snowflake.client.jdbc.internal.apache.arrow.vector.ipc.ArrowStreamReader.loadNextBatch(ArrowStreamReader.java:145)
	at net.snowflake.client.jdbc.SnowflakeResultSetSerializableV1.setFirstChunkRowCountForArrow(SnowflakeResultSetSerializableV1.java:1159)
	at net.snowflake.client.jdbc.SnowflakeResultSetSerializableV1.create(SnowflakeResultSetSerializableV1.java:629)
	at net.snowflake.client.jdbc.SnowflakeResultSetSerializableV1.create(SnowflakeResultSetSerializableV1.java:525)
	at net.snowflake.client.core.SFResultSetFactory.getResultSet(SFResultSetFactory.java:34)
	at net.snowflake.client.core.SFStatement.executeQueryInternal(SFStatement.java:243)
	at net.snowflake.client.core.SFStatement.executeQuery(SFStatement.java:149)
	at net.snowflake.client.core.SFStatement.execute(SFStatement.java:785)
	at net.snowflake.client.core.SFStatement.execute(SFStatement.java:693)
	at net.snowflake.client.jdbc.SnowflakeStatementV1.executeQueryInternal(SnowflakeStatementV1.java:296)
	at net.snowflake.client.jdbc.SnowflakePreparedStatementV1.executeQuery(SnowflakePreparedStatementV1.java:151)
	at org.graalvm.truffle/com.oracle.truffle.host.HostMethodDesc$SingleMethod$MHBase.invokeHandle(HostMethodDesc.java:371)
	at org.graalvm.truffle/com.oracle.truffle.host.GuestToHostCodeCache$GuestToHostInvokeHandle.executeImpl(GuestToHostCodeCache.java:88)
	at org.graalvm.truffle/com.oracle.truffle.host.GuestToHostRootNode.execute(GuestToHostRootNode.java:80)
	at org.graalvm.truffle.runtime/com.oracle.truffle.runtime.OptimizedCallTarget.executeRootNode(OptimizedCallTarget.java:746)
	at org.graalvm.truffle.runtime/com.oracle.truffle.runtime.OptimizedCallTarget.callInlined(OptimizedCallTarget.java:550)
	at org.graalvm.truffle.runtime/com.oracle.truffle.runtime.OptimizedRuntimeSupport.callInlined(OptimizedRuntimeSupport.java:250)
	at org.graalvm.truffle/com.oracle.truffle.host.GuestToHostRootNode.guestToHostCall(GuestToHostRootNode.java:102)
	at org.graalvm.truffle/com.oracle.truffle.host.HostMethodDesc$SingleMethod$MHBase.invokeGuestToHost(HostMethodDesc.java:407)
	at org.graalvm.truffle/com.oracle.truffle.host.HostExecuteNode.doInvoke(HostExecuteNode.java:876)
	at org.graalvm.truffle/com.oracle.truffle.host.HostExecuteNode.doOverloadedCached(HostExecuteNode.java:290)
	at org.graalvm.truffle/com.oracle.truffle.host.HostExecuteNodeGen$Inlined.executeAndSpecialize(HostExecuteNodeGen.java:506)
	at org.graalvm.truffle/com.oracle.truffle.host.HostExecuteNodeGen$Inlined.execute(HostExecuteNodeGen.java:363)
	at org.graalvm.truffle/com.oracle.truffle.host.HostObject.invokeMember(HostObject.java:464)
	at org.graalvm.truffle/com.oracle.truffle.host.HostObjectGen$InteropLibraryExports$Cached.invokeMemberNode_AndSpecialize(HostObjectGen.java:6701)
	at org.graalvm.truffle/com.oracle.truffle.host.HostObjectGen$InteropLibraryExports$Cached.invokeMember(HostObjectGen.java:6687)
	at org.graalvm.truffle/com.oracle.truffle.api.interop.InteropLibraryGen$CachedDispatch.invokeMember(InteropLibraryGen.java:8477)
	at org.enso.runtime/org.enso.interpreter.node.callable.resolver.HostMethodCallNode.resolveHostMethod(HostMethodCallNode.java:219)
	at org.enso.runtime/org.enso.interpreter.node.callable.resolver.HostMethodCallNodeGen.executeAndSpecialize(HostMethodCallNodeGen.java:157)
	at org.enso.runtime/org.enso.interpreter.node.callable.resolver.HostMethodCallNodeGen.execute(HostMethodCallNodeGen.java:119)
	at org.enso.runtime/org.enso.interpreter.node.callable.InvokeMethodNode.doPolyglot(InvokeMethodNode.java:524)
	at org.enso.runtime/org.enso.interpreter.node.callable.InvokeMethodNodeGen.executeAndSpecialize(InvokeMethodNodeGen.java:813)
	at org.enso.runtime/org.enso.interpreter.node.callable.InvokeMethodNodeGen.execute(InvokeMethodNodeGen.java:507)
	at org.enso.runtime/org.enso.interpreter.node.callable.InvokeCallableNode.invokeDynamicSymbol(InvokeCallableNode.java:268)
	at org.enso.runtime/org.enso.interpreter.node.callable.InvokeCallableNodeGen.executeAndSpecialize(InvokeCallableNodeGen.java:218)
	at org.enso.runtime/org.enso.interpreter.node.callable.InvokeCallableNodeGen.execute(InvokeCallableNodeGen.java:170)
	at org.enso.runtime/org.enso.interpreter.node.callable.ApplicationNode.executeGeneric(ApplicationNode.java:97)
	at org.enso.runtime/org.enso.interpreter.node.scope.AssignmentNodeGen.executeGeneric_generic1(AssignmentNodeGen.java:78)
	at org.enso.runtime/org.enso.interpreter.node.scope.AssignmentNodeGen.executeGeneric(AssignmentNodeGen.java:55)
	at org.enso.runtime/org.enso.interpreter.node.scope.AssignmentNodeGen.executeVoid(AssignmentNodeGen.java:98)
	at org.enso.runtime/org.enso.interpreter.node.callable.function.BlockNode.executeGeneric(BlockNode.java:52)
	at org.enso.runtime/org.enso.interpreter.node.callable.function.BlockNode.executeGeneric(BlockNode.java:54)
	at org.enso.runtime/org.enso.interpreter.node.ClosureRootNode.execute(ClosureRootNode.java:85)
	at org.graalvm.truffle.runtime/com.oracle.truffle.runtime.OptimizedCallTarget.executeRootNode(OptimizedCallTarget.java:746)
	at org.graalvm.truffle.runtime/com.oracle.truffle.runtime.OptimizedCallTarget.profiledPERoot(OptimizedCallTarget.java:669)
	at org.graalvm.truffle.runtime/com.oracle.truffle.runtime.OptimizedCallTarget.callBoundary(OptimizedCallTarget.java:602)
	at org.graalvm.truffle.runtime/com.oracle.truffle.runtime.OptimizedCallTarget.doInvoke(OptimizedCallTarget.java:586)
	at org.graalvm.truffle.runtime/com.oracle.truffle.runtime.OptimizedCallTarget.callDirect(OptimizedCallTarget.java:535)
	at org.graalvm.truffle.runtime/com.oracle.truffle.runtime.OptimizedDirectCallNode.call(OptimizedDirectCallNode.java:94)
	at org.enso.runtime/org.enso.interpreter.node.callable.ExecuteCallNode.callDirect(ExecuteCallNode.java:94)
	at org.enso.runtime/org.enso.interpreter.node.callable.ExecuteCallNodeGen.executeAndSpecialize(ExecuteCallNodeGen.java:171)
	at org.enso.runtime/org.enso.interpreter.node.callable.ExecuteCallNodeGen.executeCall(ExecuteCallNodeGen.java:101)
	at org.enso.runtime/org.enso.interpreter.node.callable.dispatch.SimpleCallOptimiserNode.executeDispatch(SimpleCallOptimiserNode.java:56)
	at org.enso.runtime/org.enso.interpreter.node.callable.dispatch.CurryNode.doCall(CurryNode.java:161)
	at org.enso.runtime/org.enso.interpreter.node.callable.dispatch.CurryNode.execute(CurryNode.java:107)
	at org.enso.runtime/org.enso.interpreter.node.callable.dispatch.InvokeFunctionNode.invokeCached(InvokeFunctionNode.java:116)
	at org.enso.runtime/org.enso.interpreter.node.callable.dispatch.InvokeFunctionNodeGen.executeAndSpecialize(InvokeFunctionNodeGen.java:137)
	at org.enso.runtime/org.enso.interpreter.node.callable.dispatch.InvokeFunctionNodeGen.execute(InvokeFunctionNodeGen.java:99)
	at org.enso.runtime/org.enso.interpreter.node.callable.InvokeCallableNode.invokeFunction(InvokeCallableNode.java:167)
	at org.enso.runtime/org.enso.interpreter.node.callable.InvokeCallableNodeGen.execute(InvokeCallableNodeGen.java:125)
	at org.enso.runtime/org.enso.interpreter.node.expression.builtin.resource.BracketNode.doBracket(BracketNode.java:74)
	at org.enso.runtime/org.enso.interpreter.node.expression.builtin.resource.BracketNodeGen.execute(BracketNodeGen.java:49)
	at org.enso.runtime/org.enso.interpreter.node.expression.builtin.resource.BracketMethodGen.execute(BracketMethodGen.java:161)
	at org.graalvm.truffle.runtime/com.oracle.truffle.runtime.OptimizedCallTarget.executeRootNode(OptimizedCallTarget.java:746)
	at org.graalvm.truffle.runtime/com.oracle.truffle.runtime.OptimizedCallTarget.profiledPERoot(OptimizedCallTarget.java:669)
	at org.graalvm.truffle.runtime/com.oracle.truffle.runtime.OptimizedCallTarget.callBoundary(OptimizedCallTarget.java:602)
	at org.graalvm.truffle.runtime/com.oracle.truffle.runtime.OptimizedCallTarget.doInvoke(OptimizedCallTarget.java:586)
	at org.graalvm.truffle.runtime/com.oracle.truffle.runtime.OptimizedCallTarget.callDirect(OptimizedCallTarget.java:535)
	at org.graalvm.truffle.runtime/com.oracle.truffle.runtime.OptimizedDirectCallNode.call(OptimizedDirectCallNode.java:94)
...

@hubertp
Contributor Author

hubertp commented Mar 26, 2024

As per @JaroslavTulach's request, I will try out his patch and (a) create a temporary solution, (b) submit a patch to arrow/snowflake that will deprecate the former once merged.

@hubertp
Contributor Author

hubertp commented Mar 27, 2024

So I don't think the patch that simply replaces throwing the exception with logging it will work for Snowflake.
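
For reference, the tested patch amounts to something like the following paraphrase (hypothetical class, not the actual Arrow source): swallow the reflective failure in the static initializer instead of throwing, so the class still loads and callers can try a non-direct-buffer path.

import java.lang.reflect.Constructor;

// Paraphrase of the "log instead of throw" idea applied to MemoryUtil-style init.
public class LenientMemoryUtil {
    static final Constructor<?> DIRECT_BUFFER_CONSTRUCTOR;

    static {
        Constructor<?> ctor = null;
        try {
            Class<?> direct = Class.forName("java.nio.DirectByteBuffer");
            ctor = direct.getDeclaredConstructor(long.class, long.class);
            ctor.setAccessible(true);
        } catch (ReflectiveOperationException | RuntimeException e) {
            // Original behaviour: throw new RuntimeException("Failed to initialize MemoryUtil...", e);
            System.err.println("java.nio not opened; direct buffer wrapping unavailable: " + e);
        }
        DIRECT_BUFFER_CONSTRUCTOR = ctor;
    }

    static boolean directBuffersAvailable() {
        return DIRECT_BUFFER_CONSTRUCTOR != null;
    }
}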
Steps to reproduce:

  1. Take the arrow repo and check out maint-10.0.x to match Snowflake's version
  2. Build it
  3. Unpack the existing sources: jar -xvf <snowflake-jdbc>/dependencies/arrow-memory-core-10.0.1.jar
  4. Copy the rebuilt classes over: cp arrow/java/memory/memory-core/target/classes/org/apache/arrow/memory/util/MemoryUtil* org/apache/arrow/memory/util/
  5. Pack it again: jar -cvf ../arrow-memory-core-10.0.1.jar .
  6. Build the Snowflake jar: ./mvnw clean -DskipTests=true package
  7. Copy it to the unmanaged classpath of std-snowflake: cp target/snowflake-jdbc.jar <enso>/std-bits/snowflake/lib/snowflake-jdbc-3.15.0.jar
  8. Build the distribution and test it on the project

For the custom Snowflake jar to be picked up you will need this rather quick-and-easy hack to include the unmanaged classpath:

diff --git a/project/StdBits.scala b/project/StdBits.scala
index 1e17616de3..76c89bf7a3 100644
--- a/project/StdBits.scala
+++ b/project/StdBits.scala
@@ -44,7 +44,7 @@ object StdBits {
           !graalVmOrgs.contains(orgName)
         })
       )
-      val relevantFiles =
+      val relevantFiles0 =
         libraryUpdates
           .select(
             configuration = configFilter,
@@ -52,6 +52,12 @@ object StdBits {
             artifact      = DependencyFilter.artifactFilter()
           )
 
+      val relevantFiles = if (destination.getPath.contains("Snowflake")) {
+        val all = (Compile/unmanagedJars).value.map(_.data)
+        relevantFiles0 ++ all
+      } else {
+        relevantFiles0
+      }
       val dependencyStore =
         streams.value.cacheStoreFactory.make("std-bits-dependencies")
       Tracked.diffInputs(dependencyStore, FileInfo.hash)(relevantFiles.toSet) {

and

--- a/build.sbt
+++ b/build.sbt
@@ -513,7 +513,7 @@ val hamcrestVersion         = "1.3"
 val netbeansApiVersion      = "RELEASE180"
 val fansiVersion            = "0.4.0"
 val httpComponentsVersion   = "4.4.1"
-val apacheArrowVersion      = "14.0.1"
+val apacheArrowVersion      = "10.0.1"
 val snowflakeJDBCVersion    = "3.15.0"
 
 // ============================================================================
@@ -2996,8 +2996,8 @@ lazy val `std-snowflake` = project
     Compile / packageBin / artifactPath :=
       `std-snowflake-polyglot-root` / "std-snowflake.jar",
     libraryDependencies ++= Seq(
-      "org.netbeans.api" % "org-openide-util-lookup" % netbeansApiVersion % "provided",
-      "net.snowflake"    % "snowflake-jdbc"          % snowflakeJDBCVersion
+      "org.netbeans.api" % "org-openide-util-lookup" % netbeansApiVersion % "provided"//,
+      //"net.snowflake"    % "snowflake-jdbc"          % snowflakeJDBCVersion
     ),

After all is done you will still get something along the lines of

Error: There was an SQL error: JDBC driver internal error: Fail to retrieve row count for first arrow chunk: sun.misc.Unsafe or java.nio.DirectByteBuffer.<init>(long, int) not available.. [Query was: SELECT "NEWRETAILDATA"."INVOICE" AS "INVOICE", "NEWRETAILDATA"."STOCKCODE" AS  …])

when trying to return Arrow rows.

Note that it would be nice to simply replace Apache Arrow's arrow-memory-core jar with the one checked in to the Snowflake repo's dependencies directory, but it won't work. They seem to have some custom classes there which are nowhere to be found in the official repo. So unpacking and repacking the jar appears to be the only way to try out the patch.

@JaroslavTulach
Member

JaroslavTulach commented Mar 27, 2024

Fail to retrieve row count for first arrow chunk: sun.misc.Unsafe or java.nio.DirectByteBuffer.<init>(long, int) not available..

Thank you for the investigation. Can we get a stacktrace that fails on java.nio.DirectByteBuffer or Unsafe access?

@hubertp
Contributor Author

hubertp commented Mar 27, 2024

Fail to retrieve row count for first arrow chunk: sun.misc.Unsafe or java.nio.DirectByteBuffer.<init>(long, int) not available..

Thank you for the investigation. Can we get a stacktrace that fails on java.nio.DirectByteBuffer or Unsafe access?

Roughly

java.lang.reflect.InaccessibleObjectException: Unable to make private java.nio.DirectByteBuffer(long,long) accessible: module java.base does not "opens java.nio" to unnamed module @61d42275
	at java.base/java.lang.reflect.AccessibleObject.throwInaccessibleObjectException(AccessibleObject.java:391)
	at java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:367)
	at java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:315)
	at java.base/java.lang.reflect.Constructor.checkCanSetAccessible(Constructor.java:194)
	at java.base/java.lang.reflect.Constructor.setAccessible(Constructor.java:187)
	at net.snowflake.client.jdbc.internal.apache.arrow.memory.util.MemoryUtil$2.run(MemoryUtil.java:138)
	at java.base/java.security.AccessController.doPrivileged(AccessController.java:319)
	at net.snowflake.client.jdbc.internal.apache.arrow.memory.util.MemoryUtil.directBufferConstructor(MemoryUtil.java:131)
	at net.snowflake.client.jdbc.internal.apache.arrow.memory.util.MemoryUtil.<clinit>(MemoryUtil.java:96)
	at net.snowflake.client.jdbc.internal.apache.arrow.memory.ArrowBuf.getDirectBuffer(ArrowBuf.java:234)
	at net.snowflake.client.jdbc.internal.apache.arrow.memory.ArrowBuf.nioBuffer(ArrowBuf.java:229)
	at net.snowflake.client.jdbc.internal.apache.arrow.vector.ipc.ReadChannel.readFully(ReadChannel.java:87)
	at net.snowflake.client.jdbc.internal.apache.arrow.vector.ipc.message.MessageSerializer.readMessageBody(MessageSerializer.java:728)
	at net.snowflake.client.jdbc.internal.apache.arrow.vector.ipc.message.MessageChannelReader.readNext(MessageChannelReader.java:67)
	at net.snowflake.client.jdbc.internal.apache.arrow.vector.ipc.ArrowStreamReader.loadNextBatch(ArrowStreamReader.java:145)
	at net.snowflake.client.jdbc.SnowflakeResultSetSerializableV1.setFirstChunkRowCountForArrow(SnowflakeResultSetSerializableV1.java:1159)
	at net.snowflake.client.jdbc.SnowflakeResultSetSerializableV1.create(SnowflakeResultSetSerializableV1.java:629)
	at net.snowflake.client.jdbc.SnowflakeResultSetSerializableV1.create(SnowflakeResultSetSerializableV1.java:525)
	at net.snowflake.client.core.SFResultSetFactory.getResultSet(SFResultSetFactory.java:34)
	at net.snowflake.client.core.SFStatement.executeQueryInternal(SFStatement.java:243)
	at net.snowflake.client.core.SFStatement.executeQuery(SFStatement.java:149)
	at net.snowflake.client.core.SFStatement.execute(SFStatement.java:785)
	at net.snowflake.client.core.SFStatement.execute(SFStatement.java:693)
	at net.snowflake.client.jdbc.SnowflakeStatementV1.executeQueryInternal(SnowflakeStatementV1.java:296)
	at net.snowflake.client.jdbc.SnowflakePreparedStatementV1.executeQuery(SnowflakePreparedStatementV1.java:151)
	at org.graalvm.truffle/com.oracle.truffle.host.HostMethodDesc$SingleMethod$MHBase.invokeHandle(HostMethodDesc.java:371)
	at org.graalvm.truffle/com.oracle.truffle.host.GuestToHostCodeCache$GuestToHostInvokeHandle.executeImpl(GuestToHostCodeCache.java:88)
	at org.graalvm.truffle/com.oracle.truffle.host.GuestToHostRootNode.execute(GuestToHostRootNode.java:80)
	at org.graalvm.truffle.runtime/com.oracle.truffle.runtime.OptimizedCallTarget.executeRootNode(OptimizedCallTarget.java:746)
	at org.graalvm.truffle.runtime/com.oracle.truffle.runtime.OptimizedCallTarget.callInlined(OptimizedCallTarget.java:550)
	at org.graalvm.truffle.runtime/com.oracle.truffle.runtime.OptimizedRuntimeSupport.callInlined(OptimizedRuntimeSupport.java:250)
	at org.graalvm.truffle/com.oracle.truffle.host.GuestToHostRootNode.guestToHostCall(GuestToHostRootNode.java:102)
	at org.graalvm.truffle/com.oracle.truffle.host.HostMethodDesc$SingleMethod$MHBase.invokeGuestToHost(HostMethodDesc.java:407)
	at org.graalvm.truffle/com.oracle.truffle.host.HostExecuteNode.doInvoke(HostExecuteNode.java:876)
	at org.graalvm.truffle/com.oracle.truffle.host.HostExecuteNode.doOverloadedCached(HostExecuteNode.java:290)
	at org.graalvm.truffle/com.oracle.truffle.host.HostExecuteNodeGen$Inlined.executeAndSpecialize(HostExecuteNodeGen.java:506)
	at org.graalvm.truffle/com.oracle.truffle.host.HostExecuteNodeGen$Inlined.execute(HostExecuteNodeGen.java:363)
	at org.graalvm.truffle/com.oracle.truffle.host.HostObject.invokeMember(HostObject.java:464)
	at org.graalvm.truffle/com.oracle.truffle.host.HostObjectGen$InteropLibraryExports$Cached.invokeMemberNode_AndSpecialize(HostObjectGen.java:6701)
...

@enso-bot

enso-bot bot commented Mar 27, 2024

Hubert Plociniczak reports a new STANDUP for yesterday (2024-03-26):

Progress: Attempting to patch #9475, as requested, to avoid opening Java modules. Snowflake appears to use a custom Arrow version, making the process difficult. It should be finished by 2024-03-27.

Next Day: I will continue working on the #9475 task. Also go back to the benchmark issue.

@JaroslavTulach
Member

at net.snowflake.client.jdbc.internal.apache.arrow.memory.ArrowBuf.getDirectBuffer(ArrowBuf.java:234)
at net.snowflake.client.jdbc.internal.apache.arrow.memory.ArrowBuf.nioBuffer(ArrowBuf.java:229)

I assume the code here could just use ByteBuffer.slice() rather than trying to obtain the address of the ByteBuffer itself...

at net.snowflake.client.jdbc.internal.apache.arrow.vector.ipc.ReadChannel.readFully(ReadChannel.java:87)

...catching the exception and doing some regular Java operation (ByteBuffer.slice or ByteBuffer.allocateDirect, etc.) would allow us to get further. I'd like to understand the scope of fixes that need to be done to allow Arrow/Snowflake to run on the regular JDK 21.
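
A minimal sketch (hypothetical code, not Arrow's) of the kind of fallback suggested here: when the private direct-buffer constructor is unavailable, read through the public ByteBuffer API and copy into Arrow's memory afterwards, instead of wrapping ArrowBuf's raw address in a NIO buffer.

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;

// Hypothetical fallback for ReadChannel.readFully-style code.
public class ReadChannelFallback {
    public static ByteBuffer readFully(ReadableByteChannel channel, int size) throws IOException {
        ByteBuffer staging = ByteBuffer.allocateDirect(size); // public API, no --add-opens needed
        while (staging.hasRemaining()) {
            if (channel.read(staging) < 0) {
                throw new IOException("Unexpected end of stream");
            }
        }
        staging.flip();
        // The caller would then copy `staging` into the ArrowBuf (e.g. via a
        // setBytes overload) rather than reading into the ArrowBuf directly.
        return staging;
    }
}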

If you don't share my enthusiasm for patching upstream projects, then please share a reproducer that works with --add-opens and fails with this exception without opening NIO.

@hubertp
Contributor Author

hubertp commented Apr 2, 2024

Steps to reproduce:

  1. Custom Snowflake_Details that enables Arrow:
--- a/distribution/lib/Standard/Snowflake/0.0.0-dev/src/Snowflake_Details.enso
+++ b/distribution/lib/Standard/Snowflake/0.0.0-dev/src/Snowflake_Details.enso
@@ -46,7 +46,7 @@ type Snowflake_Details
     jdbc_properties : Vector (Pair Text Text)
     jdbc_properties self =
         ## Avoid the Arrow dependency (https://community.snowflake.com/s/article/SAP-BW-Java-lang-NoClassDefFoundError-for-Apache-arrow)
-        no_arrow = [Pair.new 'jdbc_query_result_format' 'json']
+        no_arrow = [] #[Pair.new 'jdbc_query_result_format' 'json']
         account = [Pair.new 'account' self.account]
         credentials = [Pair.new 'user' self.credentials.username, Pair.new 'password' self.credentials.password]
         database = [Pair.new 'db' self.database]
  2. Run the following Enso program:

from Standard.Snowflake import all
from Standard.Database import all
from Standard.Base import all

main =
    operator63293 = "<account>"
    operator93047 = Credentials.Username_And_Password "<user>" "<password>"
    connection = Database.connect (Snowflake_Details.Snowflake operator63293 operator93047 '<DB_NAME>')
    v = connection.read (SQL_Query.Table_Name '<TABLE_NAME>')

You will need a Snowflake account with full access to <DB_NAME> and <TABLE_NAME> (ping @jdunkerley for the account).

If you run the runner with JAVA_OPTS="--add-opens=java.base/java.nio=ALL-UNNAMED" then there is no crash for the above program.
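
For reference, the diff above re-enables Arrow by dropping the library's existing workaround, which forces the JSON result format. A rough plain-JDBC equivalent of that workaround (hypothetical class and placeholder values, using the same property key as Snowflake_Details.enso) would be:

import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Properties;

// Hypothetical sketch of the JSON-format workaround: with this property set the
// driver should never hand results to its bundled Arrow classes.
public class JsonFormatConnection {
    public static Connection connect(String account, String user, String password) throws Exception {
        Properties props = new Properties();
        props.put("user", user);
        props.put("password", password);
        props.put("jdbc_query_result_format", "json"); // same pair as in Snowflake_Details.enso
        return DriverManager.getConnection(
            "jdbc:snowflake://" + account + ".snowflakecomputing.com", props);
    }
}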

@JaroslavTulach
Member

JaroslavTulach commented Apr 8, 2024

Report from today's investigation. The Arrow version is specified here and it is 10.0.1.

Get sources from

wget https://repo1.maven.org/maven2/org/apache/arrow/arrow-memory-core/10.0.1/arrow-memory-core-10.0.1-sources.jar
wget https://repo1.maven.org/maven2/org/apache/arrow/arrow-memory-netty/10.0.1/arrow-memory-netty-10.0.1-sources.jar

Compile as

javac -cp snowflake-jdbc-3.15.0.jar:$HOME/.m2/repository/org/slf4j/slf4j-api/1.7.29/slf4j-api-1.7.29.jar MemoryUtil.java -d .

Alas, the furthest I could get is to:

Caused by: java.lang.UnsupportedOperationException: sun.misc.Unsafe or java.nio.DirectByteBuffer.<init>(long, int) not available
        at net.snowflake.client.jdbc.internal.apache.arrow.memory.util.MemoryUtil.directBuffer(MemoryUtil.java:186)
        at net.snowflake.client.jdbc.internal.apache.arrow.memory.ArrowBuf.getDirectBuffer(ArrowBuf.java:227)
        at net.snowflake.client.jdbc.internal.apache.arrow.memory.ArrowBuf.nioBuffer(ArrowBuf.java:222)
        at net.snowflake.client.jdbc.internal.apache.arrow.vector.ipc.ReadChannel.readFully(ReadChannel.java:87)

Debugging shows there is https://github.com/apache/arrow/blob/84f6edef697fd0fa0f5fce252c017a31e4ba3944/java/memory/memory-core/src/main/java/org/apache/arrow/memory/DefaultAllocationManagerOption.java#L94 and one can use a property to specify the https://github.com/apache/arrow/blob/84f6edef697fd0fa0f5fce252c017a31e4ba3944/java/memory/memory-core/src/main/java/org/apache/arrow/memory/DefaultAllocationManagerOption.java#L39C30-L39C67 allocation manager. However, neither the Netty nor the Unsafe allocation manager works without the --add-opens=java.base/java.nio=ALL-UNNAMED option.
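
For completeness, selecting the allocation manager boils down to a system property. The property name below is taken from the DefaultAllocationManagerOption sources linked above and should be double-checked against the shaded Snowflake build; as noted, setting it does not remove the need for the --add-opens option.

// Hypothetical helper: choose Arrow's allocation manager before the first allocator is created.
public class SelectAllocationManager {
    public static void main(String[] args) {
        System.setProperty("arrow.allocation.manager.type", "Netty"); // or "Unsafe"
    }
}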

In the end they want to convert an ArrowBuf to a ByteBuffer, but the ArrowBuf only has an https://github.com/apache/arrow/blob/84f6edef697fd0fa0f5fce252c017a31e4ba3944/java/memory/memory-core/src/main/java/org/apache/arrow/memory/ArrowBuf.java#L73 addr field - the underlying buffer is long lost somewhere deep (if it was allocated at all) - we would need to change https://github.com/apache/arrow/blob/84f6edef697fd0fa0f5fce252c017a31e4ba3944/java/memory/memory-netty/src/main/java/org/apache/arrow/memory/netty/NettyAllocationManager.java#L79 to record it. And that's a far bigger endeavor than we should be attempting.

Looks like PlatformDependent was written by people who don't trust Java GC much...


hubertp added a commit that referenced this issue Apr 9, 2024
In the absence of other workarounds we need to ensure that every Java
process is started with extra options:
`--add-opens=java.base/java.nio=ALL-UNNAMED`
Verified locally. Note: this only affects the LS; the CLI will continue to crash on examples using Snowflake.

Closes #9475.
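
An illustrative sketch (not the actual Enso launcher code) of what "every Java process is started with extra options" amounts to when spawning a child JVM:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Illustrative only: prepend the required module option whenever a child JVM is
// launched, so the bundled Arrow classes can reflectively reach java.nio.
public class LaunchWithOpens {
    public static Process launch(String mainClass, List<String> extraArgs) throws IOException {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("--add-opens=java.base/java.nio=ALL-UNNAMED");
        cmd.add(mainClass);
        cmd.addAll(extraArgs);
        return new ProcessBuilder(cmd).inheritIO().start();
    }
}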
@mergify mergify bot closed this as completed in #9664 Apr 16, 2024
Labels
-compiler -libs Libraries: New libraries to be implemented
Projects
Status: 🟢 Accepted