feat!: move object store registry to the session, re-use stores#3689
feat!: move object store registry to the session, re-use stores#3689wjones127 merged 13 commits intolance-format:mainfrom
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #3689 +/- ##
==========================================
+ Coverage 78.38% 78.41% +0.03%
==========================================
Files 267 267
Lines 100049 100257 +208
Branches 100049 100257 +208
==========================================
+ Hits 78421 78619 +198
- Misses 18513 18516 +3
- Partials 3115 3122 +7
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
d59af32 to
79287a6
Compare
| #[tokio::test] | ||
| async fn test_session_store_registry() { |
There was a problem hiding this comment.
This is the main test of these changes.
bb0047c to
fb7b579
Compare
westonpace
left a comment
There was a problem hiding this comment.
A few questions but no major concerns.
| }) | ||
| } | ||
|
|
||
| fn extract_path(url: &Url) -> Path { |
There was a problem hiding this comment.
Maybe some comment here about what this function does?
There was a problem hiding this comment.
Yeah this isn't clear. I think it actually makes much more sense to move it to the store provider.
| Ok((store, path)) | ||
| } | ||
|
|
||
| #[deprecated(note = "Use `from_uri` instead")] |
There was a problem hiding this comment.
So now we can just pass a path into from_uri?
There was a problem hiding this comment.
Yeah that should work just the same. Trying to reduce the number of code paths we have to handle.
| let valid_schemes = self | ||
| .providers | ||
| .read() | ||
| .expect("ObjectStoreRegistry lock poisoned") |
There was a problem hiding this comment.
This expect message isn't unique. If this triggers we won't know which statement triggers it. Can we do something like LanceOptionExt::expect_ok for results? That adds the location and it avoids needed to specify a string.
| // Cache of object stores currently in use. We use a weak reference so the | ||
| // cache itself doesn't keep them alive if no object store is actually using | ||
| // it. | ||
| active_stores: RwLock<HashMap<(String, ObjectStoreParams), Weak<ObjectStore>>>, |
There was a problem hiding this comment.
So when will this session be active? Is it only when we have multiple readers / writers?
There was a problem hiding this comment.
Yeah this is mainly to address issues when we have a lot of tables open on the same bucket. We might one day cache with TTL, but not sure if it's worth the complexity given that it's not too expensive to reopen the connection.
| active_stores: RwLock<HashMap<(String, ObjectStoreParams), Weak<ObjectStore>>>, | ||
| } | ||
|
|
||
| impl DeepSizeOf for ObjectStoreRegistry { |
There was a problem hiding this comment.
Why do we need to impl DeepSizeOf for ObjectStoreRegistry? The registry itself is not cached is it? Or is this just for debugging purposes?
There was a problem hiding this comment.
I guess it's not very helpful. I had it because we added the registry on Session, and session does derive(DeepSizeOf). But I can just manually implement for session and skip this. It's memory size doesn't matter that much.
9096acf to
d5bf0d0
Compare
d5bf0d0 to
33d14dd
Compare
BREAKING CHANGE: removes
object_store_registryfromWriteParamsandReadParams. The registry is now taken from theSession, which is already on those parameters. Also, mostObjectStoreconstructors now returnArc<ObjectStore>.Closes #3684
ObjectStoreRegistryonto theSessionobject. Combined with the cache, this lets datasets using the same session share object stores, as long as they use the same parameters.