New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(python): expose drop_columns()
in Python
#1904
Conversation
463fed5
to
817cfd0
Compare
test feat: support returning Pandas DF feat: add_columns supports pools fix test name conflict fix: shutdown pool fix remove process pool remove method implement Rust API clean up testing of drop_columns feat: rename columns partially implement casting wip: add Python bindings wip: start fixing impl make it work remove irrelevant changes pass tests format format
817cfd0
to
812d9f3
Compare
drop()
to drop_columns()
drop_columns()
in Python
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just minor comments / questions. This looks good!
|
||
fn drop_columns(&mut self, columns: Vec<&str>) -> PyResult<()> { | ||
let mut new_self = self.ds.as_ref().clone(); | ||
RT.block_on(None, new_self.drop_columns(&columns))? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
block_on
and not spawn
? Is that because this API can't possibly call back into python?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, it should be safe if there's no way to call back into Python.
lance::Error::InvalidInput { source, .. } => { | ||
PyValueError::new_err(source.to_string()) | ||
} | ||
_ => PyIOError::new_err(err.to_string()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably have a generic error conversion that always maps InvalidInput
to PyValueError
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah refactoring our error handling is on my TODO list. I'll put up an issue soon.
@@ -1466,6 +1436,56 @@ impl Dataset { | |||
} | |||
} | |||
|
|||
impl Dataset { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this in a new impl
block?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you put it in a separate impl
block, it groups the methods together in the API docs. You can add a documentation block to it. My plan is to put all the "schema evolution" methods under one header and have a little doc string describing them together.
/// underlying storage. In order to remove the data, you must subsequently | ||
/// call `compact_files` to rewrite the data without the removed columns and | ||
/// then call `cleanup_files` to remove the old files. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most users aren't going to see this but I feel like we're going to get questions about it. Might need to end up in a top level rst document someday.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I should add this to the Python API docs. But later adding it to the user guide will be good as well.
} | ||
} | ||
|
||
let columns_to_remove = self.manifest.schema.project(columns)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think project can handle nested column references right? What happens if a nested reference is provided?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will drop the nested column. There is a unit test for this:
lance/rust/lance/src/dataset.rs
Line 2706 in 812d9f3
dataset.drop_columns(&["s.d"]).await?; |
@@ -2663,270 +2667,65 @@ mod tests { | |||
metadata.clone(), | |||
)); | |||
|
|||
let struct_array_1 = Arc::new(StructArray::from(vec![ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for simplifying these tests!
Renames the Rust method
drop()
todrop_columns()
for clarity and also alignment withadd_columns()
andalter_columns()
.Closes #1076
Related #1674