Vector search should support appending new rows #593
cec42e4 to 03c5543
| @pytest.mark.skipif( | ||
| (platform.system() == "Darwin") and (platform.machine() != "arm64"), |
Shouldn't the Intel Mac fall back to the AVX version? Not sure why we need to skip this on Mac GHA.
We added this a while back because the index-related tests on Intel Mac runners would take forever on GitHub.
| Self::checkout_manifest(object_store, base_path, &manifest_file).await | ||
| } | ||
| pub async fn checkout_version(&self, version: u64) -> Result<Self> { |
Just checkout? Also, please add some docs to the public API.
checkout is already a static method, so this needs a new name. Will add a docstring.
| // If overwrite, invalidate index | ||
| manifest.index_section = None; | ||
| } | ||
| manifest.version = dataset.as_ref().map_or(1, |d| d.manifest.version + 1); |
Shouldn't this be the latest version + 1? For example, if you check out version 2 of [1, 2, 3] and write again, it will overwrite version 3, IIUC.
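To make the concern concrete, here is a minimal sketch of the two numbering rules. Both function names are hypothetical and not part of this PR: the first mirrors the `map_or(1, |d| d.manifest.version + 1)` expression in the diff, the second is one possible reading of "latest version + 1".

```rust
/// Numbering as written in the diff: derived from whichever version is checked out.
/// (Hypothetical helper name, for illustration only.)
pub fn next_version_from_checkout(checked_out: Option<u64>) -> u64 {
    checked_out.map_or(1, |v| v + 1)
}

/// Numbering suggested in the review: derived from the newest version that exists.
/// `existing` is assumed to list every version already written.
pub fn next_version_from_latest(existing: &[u64]) -> u64 {
    existing.iter().max().map_or(1, |v| v + 1)
}
```

With versions [1, 2, 3] on disk and version 2 checked out, the first rule yields 3, which collides with an existing version, while the second yields 4.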
| manifest.version = dataset.as_ref().map_or(1, |d| d.manifest.version + 1); | ||
| let indices = if matches!(params.mode, WriteMode::Append) { | ||
| if let Some(d) = dataset.as_ref() { | ||
| Some(d.load_indices().await?) |
Yes, I'll add code comments as well.
| // Check if we've created new versions since the index | ||
| let version = index.dataset_version; | ||
| if version != self.dataset.version().version { | ||
| // If we've added more rows, then we'll have new fragments |
Could you extract this piece into a separate function?
| .manifest | ||
| .max_fragment_id() | ||
| .ok_or_else(|| Error::IO("No fragments in index version".to_string()))?; | ||
| let max_fragment_id_ds = self |
This does not allow reclaiming / recycling fragment IDs? Could you add a comment? If IDs are ever recycled, it seems we would need to remap all fragment IDs.
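A rough sketch of the assumption being discussed, pulled out into a standalone function. The `Fragment` struct and the `unindexed_fragments` name are simplified stand-ins, not the PR's actual types. It treats every fragment whose ID is greater than the largest ID seen by the index as new, which only holds if IDs are handed out in increasing order and never reused.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct Fragment {
    pub id: u64,
}

/// Returns the fragments in `current` that the index has not seen.
///
/// Assumption (hypothetical, mirrors the review discussion): fragment IDs
/// only ever increase and are never recycled.
pub fn unindexed_fragments(
    indexed: &[Fragment],
    current: &[Fragment],
) -> Result<Vec<Fragment>, String> {
    // Largest fragment ID that existed when the index was built.
    let max_indexed = indexed
        .iter()
        .map(|f| f.id)
        .max()
        .ok_or_else(|| "No fragments in index version".to_string())?;

    // Everything above that ID is treated as appended after the index.
    Ok(current
        .iter()
        .filter(|f| f.id > max_indexed)
        .cloned()
        .collect())
}
```

If an ID at or below the indexed maximum were reused for new data, this filter would silently skip it, which is why recycling would require remapping.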
| self.fragments.iter().map(|f| f.id).max() | ||
| } | ||
| pub fn fragments_since(&self, since: Manifest) -> Vec<Fragment> { | ||
| let mut fragments = vec![]; | ||
| let mut fragment_map = HashMap::new(); |
Should we check that the since version is earlier than self.version? It feels more reliable for each fragment to have a "version" attached to it, so that this function just compares version numbers. That would handle cases like deletion, insertion, etc.
I can add an assert check for the version.
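One way the version check could look, as a sketch only. The `Manifest` and `Fragment` types below are cut-down stand-ins, and this version borrows `since` and returns a `Result` instead of asserting, which differs from the signature in the diff. Fragments are matched by ID, so it does not cover the per-fragment version idea.

```rust
use std::collections::HashSet;

#[derive(Debug, Clone, PartialEq)]
pub struct Fragment {
    pub id: u64,
}

#[derive(Debug, Clone)]
pub struct Manifest {
    pub version: u64,
    pub fragments: Vec<Fragment>,
}

impl Manifest {
    /// Fragments present in `self` but absent from `since`, matched by ID.
    /// Fails if `since` is not strictly older than `self`.
    pub fn fragments_since(&self, since: &Manifest) -> Result<Vec<Fragment>, String> {
        if since.version >= self.version {
            return Err(format!(
                "since version {} is not earlier than version {}",
                since.version, self.version
            ));
        }
        // IDs already known to the older manifest.
        let seen: HashSet<u64> = since.fragments.iter().map(|f| f.id).collect();
        Ok(self
            .fragments
            .iter()
            .filter(|f| !seen.contains(&f.id))
            .cloned()
            .collect())
    }
}
```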
| /// Returns an error if the preconditions are not met. | ||
| pub fn try_new(input: Arc<dyn ExecutionPlan>, query: Query) -> Result<Self> { | ||
| let schema = input.schema(); | ||
| pub fn try_new(inputs: Vec<Arc<dyn ExecutionPlan>>, query: Query) -> Result<Self> { |
What is the case where KNNFlatExec has multiple child execution plans? Shouldn't its child take care of the multiple sources?
You'd need to add a concat node? That feels like unnecessary complexity for now. What use case are you anticipating with multiple child nodes?
Oh, my question was that I don't see a case where KNNFlatExec has multiple inputs, so I wanted to know why the signature changed from input to inputs: Vec<_>. The number of children is kind of important for the execution plan contract, so it feels wrong that KNNFlatExec needs to handle multiple children.
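For illustration, a toy sketch of the single-child shape being argued for. Nothing here is DataFusion's `ExecutionPlan` trait or the PR's `KNNFlatExec`; `Plan`, `Source`, `Concat`, and `KnnFlat` are hypothetical names. The point is only that a concat node can own the fan-in, so the KNN node keeps exactly one input.

```rust
/// Toy stand-in for an execution plan node that yields (row id, distance) pairs.
pub trait Plan {
    fn child_count(&self) -> usize;
    fn rows(&self) -> Vec<(u64, f32)>;
}

/// Leaf node holding precomputed distances.
pub struct Source(pub Vec<(u64, f32)>);

impl Plan for Source {
    fn child_count(&self) -> usize {
        0
    }
    fn rows(&self) -> Vec<(u64, f32)> {
        self.0.clone()
    }
}

/// Node that merges any number of inputs into one stream.
pub struct Concat(pub Vec<Box<dyn Plan>>);

impl Plan for Concat {
    fn child_count(&self) -> usize {
        self.0.len()
    }
    fn rows(&self) -> Vec<(u64, f32)> {
        self.0.iter().flat_map(|c| c.rows()).collect()
    }
}

/// Flat KNN node with exactly one input; keeps the `k` smallest distances.
pub struct KnnFlat {
    pub input: Box<dyn Plan>,
    pub k: usize,
}

impl Plan for KnnFlat {
    fn child_count(&self) -> usize {
        1
    }
    fn rows(&self) -> Vec<(u64, f32)> {
        let mut rows = self.input.rows();
        rows.sort_by(|a, b| a.1.total_cmp(&b.1));
        rows.truncate(self.k);
        rows
    }
}
```

The trade-off raised in the thread still applies: this adds one more node type to the plan in exchange for keeping the child count of the KNN node fixed.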
| // Index name. Must be unique within one dataset version. | ||
| string name = 3; | ||
| // The latest version of the dataset this index covers. |
The version of the dataset this index was built on?
88591f9 to d8a3b96
The next PR will join flat search for new fragments.