Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add support for add to async python API #1037

Merged
merged 10 commits into from
Mar 4, 2024

Conversation

westonpace
Copy link
Contributor

@westonpace westonpace commented Feb 28, 2024

In order to add support for add we needed to migrate the rust Table trait to a Table struct and TableInternal trait (similar to the way the connection is designed).

While doing this we also cleaned up some inconsistencies between the SDKs:

  • Python and Node are garbage collected languages and it can be difficult to trigger something to be freed. The convention for these languages is to have some kind of close method. I added a close method to both the table and connection which will drop the underlying rust object.
  • We made significant improvements to table creation in cc5f213 for the node SDK. I copied these changes to the nodejs SDK.
  • The nodejs tables were using fs to create tmp directories and these were not getting cleaned up. This is mostly harmless but annoying and so I changed it up a bit to ensure we cleanup tmp directories.
  • countRows in the node SDK was returning bigint. I changed it to return number (this actually happened in a previous PR)
  • Tables and connections now implement std::fmt::Display which is hooked into python's __repr__. Node has no concept of a regular "to string" function and so I added a display method.
  • Python method signatures are changing so that optional parameters are always Optional[foo] = None instead of something like foo = False. This is because we want those defaults to be in rust whenever possible (though we still need to mention the default in documentation).
  • I changed the python AsyncConnection/AsyncTable classes from abstract classes with a single implementation to just classes because we no longer have the remote implementation in python.

Note: this does NOT add the add function to the remote table. This PR was already large enough, and the remote implementation is unique enough, that I am going to do all the remote stuff at a later date (we should have the structure in place and correct so there shouldn't be any refactor concerns)

@westonpace westonpace changed the title feat: add support for add/query to async python API feat: add support for add to async python API Feb 29, 2024
@westonpace westonpace marked this pull request as ready for review February 29, 2024 21:34
@westonpace westonpace requested a review from eddyxu March 1, 2024 13:20
@westonpace
Copy link
Contributor Author

For simplicity I've disabled nodejs lint in this PR. It was reporting a bunch of violations on the arrow/embedding functions/openai code that was copied over from the node folder. In the next PR I am reworking nodejs eslint/prettier formatting significantly and will re-enable it.

Copy link
Contributor

@wjones127 wjones127 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. A few minor comments.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests look very nice!

mode: String,
) -> napi::Result<Table> {
let schema = ipc_file_to_schema(schema_buf.to_vec())
.map_err(|e| napi::Error::from_reason(format!("Failed to read IPC file: {}", e)))?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the user would be confused by the idea of an "IPC file" involved here. This is really about the schema, right?

Suggested change
.map_err(|e| napi::Error::from_reason(format!("Failed to read IPC file: {}", e)))?;
.map_err(|e| napi::Error::from_reason(format!("Failed to parse schema: {}", e)))?;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we use IPC to marshal schemas to and from Rust (we should ideally come up with something based on the C API, I'll make a ticket) Good idea. I went with something slightly different Failed to marshal schema from JS to Rust just because the most likely cause is an internal bug of some kind something and not an invalid schema.

python/python/lancedb/db.py Outdated Show resolved Hide resolved
python/python/lancedb/table.py Outdated Show resolved Hide resolved
@westonpace westonpace merged commit abaf315 into lancedb:main Mar 4, 2024
22 checks passed
raghavdixit99 pushed a commit to raghavdixit99/lancedb that referenced this pull request Apr 5, 2024
In order to add support for `add` we needed to migrate the rust `Table`
trait to a `Table` struct and `TableInternal` trait (similar to the way
the connection is designed).

While doing this we also cleaned up some inconsistencies between the
SDKs:

* Python and Node are garbage collected languages and it can be
difficult to trigger something to be freed. The convention for these
languages is to have some kind of close method. I added a close method
to both the table and connection which will drop the underlying rust
object.
* We made significant improvements to table creation in
lancedb@d33026d
for the `node` SDK. I copied these changes to the `nodejs` SDK.
* The nodejs tables were using fs to create tmp directories and these
were not getting cleaned up. This is mostly harmless but annoying and so
I changed it up a bit to ensure we cleanup tmp directories.
* ~~countRows in the node SDK was returning `bigint`. I changed it to
return `number`~~ (this actually happened in a previous PR)
* Tables and connections now implement `std::fmt::Display` which is
hooked into python's `__repr__`. Node has no concept of a regular "to
string" function and so I added a `display` method.
* Python method signatures are changing so that optional parameters are
always `Optional[foo] = None` instead of something like `foo = False`.
This is because we want those defaults to be in rust whenever possible
(though we still need to mention the default in documentation).
* I changed the python `AsyncConnection/AsyncTable` classes from
abstract classes with a single implementation to just classes because we
no longer have the remote implementation in python.

Note: this does NOT add the `add` function to the remote table. This PR
was already large enough, and the remote implementation is unique
enough, that I am going to do all the remote stuff at a later date (we
should have the structure in place and correct so there shouldn't be any
refactor concerns)

---------

Co-authored-by: Will Jones <willjones127@gmail.com>
raghavdixit99 pushed a commit to raghavdixit99/lancedb that referenced this pull request Apr 5, 2024
In order to add support for `add` we needed to migrate the rust `Table`
trait to a `Table` struct and `TableInternal` trait (similar to the way
the connection is designed).

While doing this we also cleaned up some inconsistencies between the
SDKs:

* Python and Node are garbage collected languages and it can be
difficult to trigger something to be freed. The convention for these
languages is to have some kind of close method. I added a close method
to both the table and connection which will drop the underlying rust
object.
* We made significant improvements to table creation in
lancedb@d33026d
for the `node` SDK. I copied these changes to the `nodejs` SDK.
* The nodejs tables were using fs to create tmp directories and these
were not getting cleaned up. This is mostly harmless but annoying and so
I changed it up a bit to ensure we cleanup tmp directories.
* ~~countRows in the node SDK was returning `bigint`. I changed it to
return `number`~~ (this actually happened in a previous PR)
* Tables and connections now implement `std::fmt::Display` which is
hooked into python's `__repr__`. Node has no concept of a regular "to
string" function and so I added a `display` method.
* Python method signatures are changing so that optional parameters are
always `Optional[foo] = None` instead of something like `foo = False`.
This is because we want those defaults to be in rust whenever possible
(though we still need to mention the default in documentation).
* I changed the python `AsyncConnection/AsyncTable` classes from
abstract classes with a single implementation to just classes because we
no longer have the remote implementation in python.

Note: this does NOT add the `add` function to the remote table. This PR
was already large enough, and the remote implementation is unique
enough, that I am going to do all the remote stuff at a later date (we
should have the structure in place and correct so there shouldn't be any
refactor concerns)

---------

Co-authored-by: Will Jones <willjones127@gmail.com>
westonpace added a commit that referenced this pull request Apr 5, 2024
In order to add support for `add` we needed to migrate the rust `Table`
trait to a `Table` struct and `TableInternal` trait (similar to the way
the connection is designed).

While doing this we also cleaned up some inconsistencies between the
SDKs:

* Python and Node are garbage collected languages and it can be
difficult to trigger something to be freed. The convention for these
languages is to have some kind of close method. I added a close method
to both the table and connection which will drop the underlying rust
object.
* We made significant improvements to table creation in
cc5f213
for the `node` SDK. I copied these changes to the `nodejs` SDK.
* The nodejs tables were using fs to create tmp directories and these
were not getting cleaned up. This is mostly harmless but annoying and so
I changed it up a bit to ensure we cleanup tmp directories.
* ~~countRows in the node SDK was returning `bigint`. I changed it to
return `number`~~ (this actually happened in a previous PR)
* Tables and connections now implement `std::fmt::Display` which is
hooked into python's `__repr__`. Node has no concept of a regular "to
string" function and so I added a `display` method.
* Python method signatures are changing so that optional parameters are
always `Optional[foo] = None` instead of something like `foo = False`.
This is because we want those defaults to be in rust whenever possible
(though we still need to mention the default in documentation).
* I changed the python `AsyncConnection/AsyncTable` classes from
abstract classes with a single implementation to just classes because we
no longer have the remote implementation in python.

Note: this does NOT add the `add` function to the remote table. This PR
was already large enough, and the remote implementation is unique
enough, that I am going to do all the remote stuff at a later date (we
should have the structure in place and correct so there shouldn't be any
refactor concerns)

---------

Co-authored-by: Will Jones <willjones127@gmail.com>
alexkohler pushed a commit to alexkohler/lancedb that referenced this pull request Apr 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants