feat(rust): openai embedding function #1275
Conversation
This is pretty cool! A few initial questions.
.execute()
.await?;

// there is no equivalent to '.search(<query>)' yet
Is the plan to have an equivalent at some point in the future?
IIRC, there were plans to add similar functionality to the Rust SDK.
// We can't use the FixedSizeListBuilder here because it always adds a null bitmap
// and we want to explicitly work with non-nullable arrays.
I see that OpenAI only supports non-nullable, but will this builder hide the nulls by dropping the nullability (so we don't get an error when we should get an error)?
No, we assert that the input array is non-null, so we should never run into a scenario where we drop the nullability.
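To make the guard discussed above concrete, here is a simplified, std-only sketch. The real code operates on Arrow arrays; this hypothetical mock (`MockArray`, `build_fixed_size_list`) only illustrates the pattern of rejecting nullable input up front instead of silently dropping the null bitmap when building the non-nullable output.

```rust
// Simplified stand-in for an Arrow array: flat values plus a null count.
// (Hypothetical types for illustration; not the PR's actual code.)
struct MockArray {
    values: Vec<f32>,
    null_count: usize,
}

fn build_fixed_size_list(input: &MockArray, dim: usize) -> Result<Vec<Vec<f32>>, String> {
    // Mirror the PR's guard: error out on nullable input rather than
    // dropping the nulls and producing a silently wrong non-nullable array.
    if input.null_count != 0 {
        return Err("input array must be non-nullable".to_string());
    }
    if dim == 0 || input.values.len() % dim != 0 {
        return Err("length is not a multiple of the embedding dimension".to_string());
    }
    Ok(input.values.chunks(dim).map(|c| c.to_vec()).collect())
}

fn main() {
    let ok = MockArray { values: vec![1.0, 2.0, 3.0, 4.0], null_count: 0 };
    assert_eq!(build_fixed_size_list(&ok, 2).unwrap().len(), 2);

    // Nullable input is rejected instead of being silently flattened.
    let nullable = MockArray { values: vec![1.0, 2.0], null_count: 1 };
    assert!(build_fixed_size_list(&nullable, 2).is_err());
}
```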
_ => unreachable!("This should not happen. We already checked the data type."),
};

let client = Client::with_config(creds);
Do we want to cache / pool this client at some point in the future? Although maybe that's more work than it's worth: given the amount of time it takes to calculate embeddings, connection establishment might be a small fraction anyway.
That was my initial thought. We can definitely add this as an optimization later on if we need to.
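If that optimization is ever picked up, one possible shape is a process-wide cached client. This is a minimal, std-only sketch: the `Client` here is a hypothetical stand-in, not the real async-openai type (whose client wraps an HTTP client with its own connection pool), and `"sk-test"` is a placeholder key.

```rust
use std::sync::OnceLock;

// Hypothetical stand-in for the OpenAI client type; for illustration only.
struct Client {
    api_key: String,
}

impl Client {
    fn with_config(api_key: &str) -> Self {
        Client { api_key: api_key.to_string() }
    }
}

// Build the client once and reuse it for every embedding call,
// amortizing construction (and, with a real client, connection setup).
static CLIENT: OnceLock<Client> = OnceLock::new();

fn client() -> &'static Client {
    CLIENT.get_or_init(|| Client::with_config("sk-test"))
}

fn main() {
    let first = client() as *const Client;
    let second = client() as *const Client;
    // Both calls return the same cached instance.
    assert_eq!(first, second);
    assert_eq!(client().api_key, "sk-test");
}
```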
task::block_in_place(move || {
Handle::current().block_on(async {
Should compute_inner be an async function?
I think we'd still need to spawn an async task for this regardless, unless we made EmbeddingFunction::compute_source_embeddings async, but we can't do that without some significant refactoring. We use the RecordBatchReader, which isn't async; we'd likely have to refactor all instances of that to an async equivalent. However, if you prefer the wrapping to happen outside of compute_inner, that's totally fine and I can make those changes.
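For readers following the sync/async tension here: the constraint is that a sync trait method with an async body has to block on that body somewhere. The PR uses tokio's block_in_place + Handle::block_on; the std-only sketch below makes the same shape runnable without tokio by hand-rolling a tiny executor. The trait, Dummy type, and method bodies are all hypothetical stand-ins, not LanceDB's actual API.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

// A waker that unparks the blocked thread when the future can make progress.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

// Minimal block_on: drive a future to completion on the current thread.
// (In the PR this role is played by tokio's Handle::block_on.)
fn block_on<F: Future>(mut fut: F) -> F::Output {
    // SAFETY: `fut` is a local that is never moved after being pinned here.
    let mut fut = unsafe { Pin::new_unchecked(&mut fut) };
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return out,
            Poll::Pending => thread::park(),
        }
    }
}

// Hypothetical shape of the trait in question: the entry point is sync
// because RecordBatchReader-style callers aren't async.
trait Embed {
    fn compute_source_embeddings(&self, text: &str) -> Vec<f32>;
}

struct Dummy;

impl Dummy {
    // The async body that would actually call the embedding API.
    async fn compute_inner(&self, text: &str) -> Vec<f32> {
        vec![text.len() as f32]
    }
}

impl Embed for Dummy {
    fn compute_source_embeddings(&self, text: &str) -> Vec<f32> {
        // The sync trait method bridges into the async body, mirroring
        // the async -> sync -> async hop discussed in this thread.
        block_on(self.compute_inner(text))
    }
}

fn main() {
    assert_eq!(Dummy.compute_source_embeddings("abc"), vec![3.0]);
}
```

Making the trait method itself async (as suggested below) removes this bridge entirely, at the cost of refactoring every sync caller.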
We don't have to worry about it right now. I am thinking we might eventually want compute_source_embeddings to be async, though. The top-level add and query methods are already async, so it seems odd to go async -> sync -> async.
Also, we might want to run embeddings in parallel at some point too. Although that might be best left to the embedding function, since different embeddings might have different preferences on batch size / parallel execution.
> We don't have to worry about it right now. I am thinking we might eventually want compute_source_embeddings to be async though. The top-level add and query methods are already async so it seems odd to go async -> sync -> async.
Makes sense. Will update!
> Also, we might want to run embeddings in parallel at some point too. Although that might be best left to embedding function since different embeddings might have different preferences on batch size / parallel execution.
I made a similar comment on an earlier PR: #1259 (comment).
Just some minor nits, but looks great otherwise.
Part of #994.
Adds the ability to use the OpenAI embedding function.
The example can be run with the following, which should output: