New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: implement java bindings #1928
Conversation
} | ||
|
||
impl BlockingDataset { | ||
pub fn write(reader: ArrowArrayStreamReader, uri: &str) -> Result<Self> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is async calls not encouraged cross JNI?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm just thinking async call in jni might introduce complexities. Also I prefer to rely on the thread management in java compute engine (I'm still targeting to implement a lance connector for Trino/Presto). But anyway, async api will be more efficient in many places, I think we could implement the async jni later, can we?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hey @beinan Any plans to add a Trino connector for this? If so, any sense of ETA? Thanks for building this :)
@@ -0,0 +1,36 @@ | |||
use arrow::ffi_stream::ArrowArrayStreamReader; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets add apache license to headers :)
cargo clippy
will pick this rule up. Lets make github action to run on this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call, I will add the github actions for both java and rust-jni projects
)?; | ||
Ok(BlockingDataset { inner, rt }) | ||
} | ||
pub fn open(uri: &str) -> Result<Self> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it'd be great that we can pass Read / Write options from JNI.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I'm also thinking about that, I will add the options very soon
pub fn count_rows(&self) -> Result<usize> { | ||
self.rt.block_on(self.inner.count_rows()) | ||
} | ||
pub fn close(&self) {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So input and output are FFI_ArrowArrayStream
iiuc
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the input of the write path can be either a FFI_ArrowArrayStream or FFI_ArrowArray (I have implement the stream api for write, I'm thinking to implement append api for both structure). For the output of the read path, I think it will be something very similar but I didn't get a chance to try to implement it.
378ed98
to
f809e86
Compare
Hi @eddyxu, I integrated a mvn plugin to build rust project in maven, you can run Then you can find jar at The structure of the jar is simple
I only added one target platform "darwin-aarch64" for macos on arm. (I will figure out how to add more target platform later) It will also run the test in both Rust and Java project. By the way, I also removed quite a few redundant code. |
b735c42
to
1dfe53a
Compare
.github/workflows/java.yml
Outdated
matrix: | ||
include: | ||
- os: ubuntu-22.04 | ||
java-version: 8 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I vote to not support java8 :) Even ok to not support java11 if it is up to me :)
3c4badc
to
824181c
Compare
29bbe7c
to
3447b43
Compare
distribution: temurin | ||
java-version: 17 | ||
cache: 'maven' | ||
- run: echo "JAVA_17=$JAVA_HOME" >> $GITHUB_ENV |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we use a matrix to split it into two jobs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you look at the CI job, testing another Java version only took 6s. If we split to another job, it would have to recompile the Rust (and also do the installation steps). I think we should keep as-is so that our CI usage is kept minimal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the comments @wjones127 ! Yes, that's why I merge these two together as recompiling the rust part really takes a lot of time.
pull_request: | ||
paths: | ||
- java/** | ||
- rust/** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we do expect to run this everytime with rust change, could we change this package as a workspace package with the rest of rust code, and share the same workspace Cargo.toml
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It sounds great to me!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @beinan. Thanks for working on this.
The things I would like to see:
- Make the Tokio runtime a lazy static, so we aren't recreating runtimes.
- Make this crate part of the workspace, if possible.
- Minimize scope of your
unsafe
calls.
I think additional features like supporting ReadParams
can be left for a future PR. More important that we get the foundation of FFI and async runtime working.
java/lance-jni/src/lib.rs
Outdated
unsafe { | ||
match ArrowArrayStreamReader::from_raw(arrow_array_stream_addr as *mut FFI_ArrowArrayStream) | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's make sure when we use unsafe
blocks, we only put the minimal unsafe parts there and leave a comment why the calls are safe.
let stream_ptr = arrow_array_stream_addr as *mut FFI_ArrowArrayStream;
// SAFETY: the pointer is recieved directly from Java's
// ArrowArrayStream.memory_address(), which guarantees to return a non-null
// valid pointer.
let stream = unsafe { ArrowArrayStreamReader::from_raw(stream_ptr) };
match stream { .... }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@eddyxu This is still outstanding
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
let rt = tokio::runtime::Builder::new_current_thread() | ||
.enable_all() | ||
.build()?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you move this to a lazy static variable? We don't want to create a new runtime every time we write to a dataset.
Note, however, in the future we should probably port the executor we did in Python #1172. This allows things like writing to a Lance dataset from a scan of another Lance dataset, and support for KeyboardInterrupt (#1438).
distribution: temurin | ||
java-version: 17 | ||
cache: 'maven' | ||
- run: echo "JAVA_17=$JAVA_HOME" >> $GITHUB_ENV |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you look at the CI job, testing another Java version only took 6s. If we split to another job, it would have to recompile the Rust (and also do the installation steps). I think we should keep as-is so that our CI usage is kept minimal.
392454e
to
43d6959
Compare
uses: actions/setup-java@v4 | ||
with: | ||
distribution: temurin | ||
java-version: 11 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should be 17 instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
@@ -35,9 +35,6 @@ jobs: | |||
toolchain: | |||
- stable | |||
- nightly | |||
defaults: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think https://github.com/lancedb/lance/blob/main/.github/workflows/bump-version/action.yml also needs to be updated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed.
java/lance-jni/Cargo.toml
Outdated
@@ -0,0 +1,26 @@ | |||
[package] | |||
name = "lance-jni" | |||
version = "0.1.0" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this version stay consistent with the rust crate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure i can use the rust version. Still need to find a way to bump maven version i guess.
use jni::JNIEnv; | ||
|
||
pub fn throw_java_exception(env: &mut JNIEnv, err_msg: &str) { | ||
env.throw_new("java/lang/RuntimeException", err_msg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: should we make a com.lancedb.lance.LanceException
class? a bare RuntimeException
might make catch statements unnecessarily wide
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, will do as follow up.
Co-authored-by: Will Jones <willjones127@gmail.com>
Co-authored-by: Will Jones <willjones127@gmail.com>
#1928 moved the workspace `Cargo.toml` to the root directory
This pull request is still under development and addresses the interconnection between Java and Rust using JNI for both write and read paths. Here's a summary of the progress:
Write Path:
Read Path:
Development for the read path (data flow from Rust to Java) is ongoing within this PR.
The same approach with Arrow and JNI will be utilized for consistency and performance.
Next Steps:
Highlights: