SortedIndex baseline implementation #75

Merged: 11 commits, Dec 17, 2022

@Dreeseaw (Collaborator) commented Nov 27, 2022

This PR introduces a new Sorted Index that maintains a sorted b-tree (github.com/tidwall/btree) over a column of the user's choosing (currently limited to string columns). The index holds a single b-tree that is guarded by a mutex rather than copied between transactions.

Future work could cover sorting columns of other types, PK sorting, and custom Less() functions supplied by users.
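For readers who want a concrete picture of the design described above, here is a minimal sketch, not the PR's actual code. It assumes a hypothetical `SortedIndex` type and swaps the b-tree for a plain sorted slice so that it needs only the standard library; the part it is meant to illustrate is one shared ordered structure guarded by a mutex instead of being copied per transaction.

```go
package main

import (
	"sort"
	"sync"
)

// entry pairs a string column value with the row offset it belongs to.
type entry struct {
	key string
	idx uint32
}

// less orders entries by key, then by row offset, so that rows holding
// the same value remain distinct entries.
func less(a, b entry) bool {
	if a.key != b.key {
		return a.key < b.key
	}
	return a.idx < b.idx
}

// SortedIndex is a hypothetical stand-in for the index in this PR.
// A sorted slice replaces the b-tree; one mutex guards the shared data.
type SortedIndex struct {
	mu      sync.RWMutex
	entries []entry
}

// Insert places (key, idx) at its sorted position.
func (s *SortedIndex) Insert(key string, idx uint32) {
	s.mu.Lock()
	defer s.mu.Unlock()

	e := entry{key: key, idx: idx}
	pos := sort.Search(len(s.entries), func(i int) bool {
		return !less(s.entries[i], e)
	})
	s.entries = append(s.entries, entry{})
	copy(s.entries[pos+1:], s.entries[pos:])
	s.entries[pos] = e
}

// Ascend calls fn for each entry in key order until fn returns false.
// It holds the read lock for the whole walk, so fn must not call Insert.
func (s *SortedIndex) Ascend(fn func(key string, idx uint32) bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()

	for _, e := range s.entries {
		if !fn(e.key, e.idx) {
			return
		}
	}
}
```

Inserting "carol" at row 0, "alice" at row 1, and "bob" at row 2 and then calling `Ascend` visits rows 1, 2, 0. Holding the lock during iteration keeps writers out, at the cost of blocking them for the length of the walk.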

@Dreeseaw (Collaborator, Author) commented Dec 4, 2022

Hello @kelindar, I have a few questions regarding next steps to getting a sorted index into the project.

  • Should the b-tree be copied for each new transaction?
  • DML speed*: each insert, update, or delete requires a full scan, because the commit.Reader does not differentiate between Insert and Update, and Delete commits do not contain the row's data needed to find the key (which makes sense). How would you recommend speeding these up? This also applies to the index check in SortedRange().

*Update: I added a map for constant-time lookup of the key for a given row, replacing the original scan. I recognize that this data is effectively stored in the target column; I'm trying to find a workaround that lets the index query that column instead of allocating its own map.

Thanks!

@kelindar (Owner)

> • Should the b-tree be copied for each new transaction?

I'm not sure; ideally not, because it will slow down the transaction. But then, what exactly happens if another txn is updating the b-tree while you're iterating over it (or even when you're updating while iterating)?

> • DML speed*

Isn't b-tree already O(log n) for the lookup?

@Dreeseaw (Collaborator, Author)

> > • Should the b-tree be copied for each new transaction?
>
> I'm not sure; ideally not, because it will slow down the transaction. But then, what exactly happens if another txn is updating the b-tree while you're iterating over it (or even when you're updating while iterating)?

I agree that the b-tree should not blindly be copied to each transaction, regardless of use. I've worked around this for now.

> > • DML speed*
>
> Isn't b-tree already O(log n) for the lookup?

Yes, but in the case of an update, the commit.Reader contains only the idx and the new value, whereas the old value is required to do an O(log n) key search. To avoid scanning the entire b-tree, I added a mapping of idx -> current value, so updates remain O(log n) (an O(1) map lookup followed by an O(log n) b-tree search). This is not a final solution, but calling into the actual column's underlying data could expose further race problems.
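The idx -> current value mapping can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: `TrackedIndex`, `Upsert`, and `Order` are hypothetical names, a sorted slice stands in for the b-tree (so locating a key is O(log n), while the slice shift on insert or removal is O(n), unlike a real b-tree), and locking is left out for brevity.

```go
package main

import "sort"

// item pairs a column value with its row offset.
type item struct {
	key string
	idx uint32
}

// TrackedIndex is a hypothetical sketch. keys is a sorted slice standing
// in for the b-tree; current remembers the value last indexed per row.
type TrackedIndex struct {
	keys    []item
	current map[uint32]string
}

// NewTrackedIndex returns an empty index.
func NewTrackedIndex() *TrackedIndex {
	return &TrackedIndex{current: make(map[uint32]string)}
}

// find returns the position where (key, idx) is, or would be, stored.
func (t *TrackedIndex) find(key string, idx uint32) int {
	return sort.Search(len(t.keys), func(i int) bool {
		k := t.keys[i]
		return k.key > key || (k.key == key && k.idx >= idx)
	})
}

// Upsert covers both inserts and updates, since the commit is assumed to
// carry only the row offset and the new value.
func (t *TrackedIndex) Upsert(idx uint32, value string) {
	t.Delete(idx) // drops the stale entry, if any, via the remembered value

	pos := t.find(value, idx)
	t.keys = append(t.keys, item{})
	copy(t.keys[pos+1:], t.keys[pos:])
	t.keys[pos] = item{key: value, idx: idx}
	t.current[idx] = value
}

// Delete removes a row using its remembered value, so no scan is needed.
// It is a no-op when the row was never indexed.
func (t *TrackedIndex) Delete(idx uint32) {
	old, ok := t.current[idx]
	if !ok {
		return
	}
	pos := t.find(old, idx)
	if pos < len(t.keys) && t.keys[pos].key == old && t.keys[pos].idx == idx {
		t.keys = append(t.keys[:pos], t.keys[pos+1:]...)
	}
	delete(t.current, idx)
}

// Order returns the row offsets in sorted key order.
func (t *TrackedIndex) Order() []uint32 {
	out := make([]uint32, 0, len(t.keys))
	for _, k := range t.keys {
		out = append(out, k.idx)
	}
	return out
}
```

The trade-off is the one raised in the discussion: the map duplicates data that already lives in the column, in exchange for never reading the column from inside the index.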

@kelindar merged commit 3e795a1 into kelindar:main on Dec 17, 2022