Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: implement java bindings #1928

Merged
merged 19 commits into from
Feb 27, 2024
Merged
Show file tree
Hide file tree
Changes from 17 commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
File renamed without changes.
1 change: 0 additions & 1 deletion .github/workflows/bump-version/action.yml
Original file line number Diff line number Diff line change
Expand Up @@ -20,7 +20,6 @@ runs:
cargo install cargo-bump
cargo bump ${{ inputs.part }}
- name: Synchronize rust version
working-directory: rust
shell: bash
run: |
cargo install cargo-workspaces --version 0.2.44
Expand Down
66 changes: 66 additions & 0 deletions .github/workflows/java.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,66 @@
name: Build and Run Java JNI Tests
on:
push:
branches:
- main
pull_request:
paths:
- java/**
- rust/**
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we do expect to run this everytime with rust change, could we change this package as a workspace package with the rest of rust code, and share the same workspace Cargo.toml

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It sounds great to me!

- .github/workflows/java.yml
env:
# This env var is used by Swatinem/rust-cache@v2 for the cache
# key, so we set it to make sure it is always consistent.
CARGO_TERM_COLOR: always
# Disable full debug symbol generation to speed up CI build and keep memory down
# "1" means line tables only, which is useful for panic tracebacks.
RUSTFLAGS: "-C debuginfo=1"
RUST_BACKTRACE: "1"
# according to: https://matklad.github.io/2021/09/04/fast-rust-builds.html
# CI builds are faster with incremental disabled.
CARGO_INCREMENTAL: "0"
CARGO_BUILD_JOBS: "1"
jobs:
linux-build:
runs-on: ubuntu-22.04
name: ubuntu-22.04 + Java 11 & 17
defaults:
run:
working-directory: ./java
steps:
- name: Checkout repository
uses: actions/checkout@v4
- uses: Swatinem/rust-cache@v2
with:
workspaces: java/java-jni
- name: Run cargo fmt
run: cargo fmt --check
working-directory: ./java/lance-jni
- name: Install dependencies
run: |
sudo apt update
sudo apt install -y protobuf-compiler libssl-dev
- name: Install Java 17
uses: actions/setup-java@v4
with:
distribution: temurin
java-version: 17
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be 11 here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

cache: "maven"
- run: echo "JAVA_17=$JAVA_HOME" >> $GITHUB_ENV
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we use a matrix to split it into two jobs?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you look at the CI job, testing another Java version only took 6s. If we split to another job, it would have to recompile the Rust (and also do the installation steps). I think we should keep as-is so that our CI usage is kept minimal.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the comments @wjones127 ! Yes, that's why I merge these two together as recompiling the rust part really takes a lot of time.

- name: Install Java 11
uses: actions/setup-java@v4
with:
distribution: temurin
java-version: 11
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be 17 instead?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

cache: "maven"
- name: Java Style Check
run: mvn checkstyle:check
- name: Rust Clippy
working-directory: java/lance-jni
run: cargo clippy --all-targets -- -D warnings
- name: Build with Maven with Java 11
run: mvn package -DskipTests=true
- name: Running tests with Java 11
run: mvn test
- name: Running tests with Java 17
run: JAVA_HOME=$JAVA_17 JAVA_TOOL_OPTIONS="--add-opens=java.base/java.nio=ALL-UNNAMED" mvn test
9 changes: 0 additions & 9 deletions .github/workflows/rust.yml
Original file line number Diff line number Diff line change
Expand Up @@ -35,9 +35,6 @@ jobs:
toolchain:
- stable
- nightly
defaults:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.

run:
working-directory: ./rust
steps:
- uses: actions/checkout@v4
- uses: Swatinem/rust-cache@v2
Expand Down Expand Up @@ -73,9 +70,6 @@ jobs:
linux-arm:
runs-on: buildjet-4vcpu-ubuntu-2204-arm
timeout-minutes: 30
defaults:
run:
working-directory: ./rust
steps:
- uses: actions/checkout@v4
- uses: Swatinem/rust-cache@v2
Expand All @@ -94,9 +88,6 @@ jobs:
clippy:
runs-on: ubuntu-22.04
timeout-minutes: 30
defaults:
run:
working-directory: ./rust
steps:
- uses: actions/checkout@v4
- uses: Swatinem/rust-cache@v2
Expand Down
48 changes: 25 additions & 23 deletions rust/Cargo.toml → Cargo.toml
Original file line number Diff line number Diff line change
@@ -1,17 +1,19 @@
[workspace]
members = [
"lance-arrow",
"lance-core",
"lance-datagen",
"lance-file",
"lance-index",
"lance-io",
"lance-linalg",
"lance-table",
"lance-testing",
"lance-test-macros",
"lance",
"java/lance-jni",
"rust/lance",
"rust/lance-arrow",
"rust/lance-core",
"rust/lance-datagen",
"rust/lance-file",
"rust/lance-index",
"rust/lance-io",
"rust/lance-linalg",
"rust/lance-table",
"rust/lance-test-macros",
"rust/lance-testing",
]
exclude = ["python"]
# Python package needs to be built by maturin.
resolver = "2"

Expand Down Expand Up @@ -39,18 +41,18 @@ categories = [
rust-version = "1.75"

[workspace.dependencies]
lance = { version = "=0.10.0", path = "./lance" }
lance-arrow = { version = "=0.10.0", path = "./lance-arrow" }
lance-core = { version = "=0.10.0", path = "./lance-core" }
lance-datafusion = { version = "=0.10.0", path = "./lance-datafusion" }
lance-datagen = { version = "=0.10.0", path = "./lance-datagen" }
lance-file = { version = "=0.10.0", path = "./lance-file" }
lance-index = { version = "=0.10.0", path = "./lance-index" }
lance-io = { version = "=0.10.0", path = "./lance-io" }
lance-linalg = { version = "=0.10.0", path = "./lance-linalg" }
lance-table = { version = "=0.10.0", path = "./lance-table" }
lance-test-macros = { version = "=0.10.0", path = "./lance-test-macros" }
lance-testing = { version = "=0.10.0", path = "./lance-testing" }
lance = { version = "=0.10.0", path = "./rust/lance" }
lance-arrow = { version = "=0.10.0", path = "./rust/lance-arrow" }
lance-core = { version = "=0.10.0", path = "./rust/lance-core" }
lance-datafusion = { version = "=0.10.0", path = "./rust/lance-datafusion" }
lance-datagen = { version = "=0.10.0", path = "./rust/lance-datagen" }
lance-file = { version = "=0.10.0", path = "./rust/lance-file" }
lance-index = { version = "=0.10.0", path = "./rust/lance-index" }
lance-io = { version = "=0.10.0", path = "./rust/lance-io" }
lance-linalg = { version = "=0.10.0", path = "./rust/lance-linalg" }
lance-table = { version = "=0.10.0", path = "./rust/lance-table" }
lance-test-macros = { version = "=0.10.0", path = "./rust/lance-test-macros" }
lance-testing = { version = "=0.10.0", path = "./rust/lance-testing" }
approx = "0.5.1"
# Note that this one does not include pyarrow
arrow = { version = "50.0.0", optional = false, features = ["prettyprint"] }
Expand Down
1 change: 1 addition & 0 deletions java/.gitignore
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
*.iml
21 changes: 21 additions & 0 deletions java/lance-jni/Cargo.toml
Original file line number Diff line number Diff line change
@@ -0,0 +1,21 @@
[package]
name = "lance-jni"
version.workspace = true
edition.workspace = true
repository.workspace = true
readme.workspace = true
license.workspace = true
description = "JNI bindings for Lance Columnar format"
keywords.workspace = true
categories.workspace = true

[lib]
crate-type = ["cdylib"]

[dependencies]
lance.workspace = true
arrow = { workspace = true, features = ["ffi"] }
tokio = { workspace = true }
jni = "0.21.1"
snafu.workspace = true
lazy_static.workspace = true
46 changes: 46 additions & 0 deletions java/lance-jni/src/blocking_dataset.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,46 @@
// Copyright 2023 Lance Developers.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

use arrow::array::RecordBatchReader;

use lance::dataset::{Dataset, WriteParams};
use lance::Result;

use crate::RT;

pub struct BlockingDataset {
inner: Dataset,
}

impl BlockingDataset {
pub fn write(
reader: impl RecordBatchReader + Send + 'static,
uri: &str,
params: Option<WriteParams>,
) -> Result<Self> {
let inner = RT.block_on(Dataset::write(reader, uri, params))?;
Ok(Self { inner })
}

pub fn open(uri: &str) -> Result<Self> {
let inner = RT.block_on(Dataset::open(uri))?;
Ok(Self { inner })
}

pub fn count_rows(&self) -> Result<usize> {
RT.block_on(self.inner.count_rows())
}

pub fn close(&self) {}
Copy link
Contributor

@eddyxu eddyxu Feb 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So input and output are FFI_ArrowArrayStream iiuc
?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the input of the write path can be either a FFI_ArrowArrayStream or FFI_ArrowArray (I have implement the stream api for write, I'm thinking to implement append api for both structure). For the output of the read path, I think it will be something very similar but I didn't get a chance to try to implement it.

}
87 changes: 87 additions & 0 deletions java/lance-jni/src/jni_helpers.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,87 @@
// Copyright 2023 Lance Developers.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

use jni::objects::{JMap, JObject, JString};
use jni::JNIEnv;

pub fn throw_java_exception(env: &mut JNIEnv, err_msg: &str) {
env.throw_new("java/lang/RuntimeException", err_msg)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: should we make a com.lancedb.lance.LanceException class? a bare RuntimeException might make catch statements unnecessarily wide

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, will do as follow up.

.expect("Error throwing exception");
}

pub fn extract_path_str(env: &mut JNIEnv, path: &JString) -> Result<String, String> {
match env.get_string(path) {
Ok(path) => Ok(path.into()),
Err(err) => Err(format!("Invalid path string: {}", err)),
}
}

pub trait ExtractJniValue {
// extract JNI JObject to rust type
fn extract(env: &mut JNIEnv, obj: JObject) -> Result<Self, String>
where
Self: Sized;
}

impl ExtractJniValue for i32 {
fn extract(env: &mut JNIEnv, obj: JObject) -> Result<Self, String> {
env.call_method(obj, "intValue", "()I", &[])
.and_then(|jvalue| jvalue.i())
.map_err(|e| e.to_string())
}
}

impl ExtractJniValue for i64 {
fn extract(env: &mut JNIEnv, obj: JObject) -> Result<Self, String> {
env.call_method(obj, "longValue", "()J", &[])
.and_then(|jvalue| jvalue.j())
.map_err(|e| e.to_string())
}
}

impl ExtractJniValue for String {
fn extract(env: &mut JNIEnv, obj: JObject) -> Result<Self, String> {
env.get_string(&JString::from(obj))
.map(|jstr| jstr.into())
.map_err(|e| e.to_string())
}
}

pub fn extract_and_process_jni_map_value<T, F>(
env: &mut JNIEnv,
map: &JMap,
key: &str,
mut process: F,
) -> Result<(), String>
where
T: ExtractJniValue,
F: FnMut(T) -> Result<(), String>, // func to apply to return value
{
let key_obj = env
.new_string(key)
.map(JObject::from)
.map_err(|e| e.to_string())?;

match map.get(env, &key_obj) {
Ok(Some(value_obj)) => {
if !value_obj.is_null() {
let value = T::extract(env, value_obj)?;
process(value)?;
}
Ok(())
}
Ok(None) => Ok(()), // Key is not present in the map, so we do nothing
Err(e) => Err(e.to_string()),
}
}