Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage: support range-filter scan by sort key #589

Closed
Tracked by #572
skyzh opened this issue Mar 27, 2022 · 0 comments · Fixed by #644
Closed
Tracked by #572

storage: support range-filter scan by sort key #589

skyzh opened this issue Mar 27, 2022 · 0 comments · Fixed by #644
Labels
enhancement New feature or request

Comments

@skyzh
Copy link
Member

skyzh commented Mar 27, 2022

Our storage engine already supports sorting by sort key. To leverage the property in storage, we can provide a range_scan interface on storage, so as to support range filter scan for executors.

To support range_scan, we should be able to start scanning from a sort key in storage engine. This requires two steps:

  • record sort key in block index
  • support create iterator by sort key
  • add interface for executors' use

Currently, the first_key field is empty in block builder. We should record first_key in order to support seek. Note that #588 will add composite sort key support. For this issue, we may only support range filter scan by the first element in composite sort key.

pub fn finish_block(
&mut self,
block_type: BlockType,
column_data: &mut Vec<u8>,
block_data: &mut Vec<u8>,
stats: Vec<BlockStatistics>,
) {
self.indexes.push(BlockIndex {
offset: column_data.len() as u64,
length: block_data.len() as u64 + BLOCK_HEADER_SIZE as u64,
first_rowid: self.last_row_count as u32,
row_count: (self.row_count - self.last_row_count) as u32,
/// TODO(chi): support sort key
first_key: "".into(),
stats,
});

After that, we should be able to support create RowSetIterator by sort key.

pub async fn new(
rowset: Arc<DiskRowset>,
column_refs: Arc<[StorageColumnRef]>,
dvs: Vec<Arc<DeleteVector>>,
seek_pos: ColumnSeekPosition,
expr: Option<BoundExpr>,
) -> StorageResult<Self> {

The interface should have two new fields start_key: Option<Vec<u8>> and end_key: Option<Vec<u8>>. RowSetIterator should return rows only in this range. At the same time, the RowSetIterator should directly seek to the block of start_key instead of scanning from the beginning.

When everything's ready, we can finally expose range-filter scan interface to the compute layer. This requires us to modify:

fn scan<'a>(
&'a self,
begin_sort_key: Option<&'a [u8]>,
end_sort_key: Option<&'a [u8]>,
col_idx: &'a [StorageColumnRef],
is_sorted: bool,
reversed: bool,
expr: Option<BoundExpr>,
) -> Self::ScanResultFuture<'a> {

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant