Moving from qdrant to lancedb #899
Replies: 6 comments 14 replies
-
Pooling / sharing your table instances and keeping them alive in memory will help query performance because the table metadata and indices are cached.
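To illustrate the pooling idea, here is a minimal sketch of a process-wide cache that opens each table once and hands back the same handle afterwards. The `TableCache` class and its `open_table` argument are hypothetical names for this example, not part of LanceDB; the cache accepts any callable that opens a table by name.

```python
import threading
from typing import Any, Callable, Dict


class TableCache:
    """Keep opened table handles alive so repeated queries reuse them.

    `open_table` is any callable that opens a table by name (an assumption
    of this sketch). With LanceDB that could be the `open_table` method of
    the connection returned by `lancedb.connect(uri)`.
    """

    def __init__(self, open_table: Callable[[str], Any]) -> None:
        self._open_table = open_table
        self._tables: Dict[str, Any] = {}
        self._lock = threading.Lock()

    def get(self, name: str) -> Any:
        # The lock ensures concurrent callers do not open the same table twice.
        with self._lock:
            table = self._tables.get(name)
            if table is None:
                table = self._open_table(name)
                self._tables[name] = table
            return table
```

A long-lived service would create one `TableCache` at startup and call `cache.get("my_table")` on every request instead of reopening the table each time.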
-
One observation: when the batch size was reduced from 400 to 100, the insert time per batch remained more or less the same. That could also mean that larger batches can be inserted in roughly the same time, which would improve throughput.
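If the time per insert call really is roughly independent of batch size, throughput scales with the number of rows per call. The sketch below groups records into batches and computes rows per second; `batched` and `rows_per_second` are hypothetical helpers, and the constant per-batch time is the observation above, not a guarantee.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(records: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `batch_size` records."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    batch: List[T] = []
    for record in records:
        batch.append(record)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly smaller, batch.
        yield batch


def rows_per_second(batch_size: int, seconds_per_batch: float) -> float:
    """Throughput for a given batch size and measured time per insert call."""
    return batch_size / seconds_per_batch
```

With the 1.2 seconds per batch reported in the original post, `rows_per_second(400, 1.2)` is about 333 rows per second. If a 4000-row batch took a similar time, the same call would move roughly ten times as many rows, which is worth measuring before relying on it.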
-
How can we get the number of records in a table?
-
How do I add a scalar index on a nested field? For example, in Qdrant I would specify a field name like "classifications_ipcr[].classification" for a nested arrangement. Can I do this in LanceDB? Please share some code for both indexing and searching. A sample JSON is provided below. Thanks in advance.
-
Can I assist?
On Mon, 5 Feb 2024, 8:33 pm, Weston Pace wrote:
I'm not sure if we support scalar indices on nested fields at the moment, but it should be straightforward. I will try and look into it this week. #929
-
Wouldn't this need some changes in the Rust code as well?
On Tue, 6 Feb 2024, 7:03 pm, Weston Pace wrote:
Sure. You might start by creating a test case here:
https://github.com/lancedb/lance/blob/v0.9.13/python/python/tests/test_scalar_index.py
and seeing what fails. I would expect the syntax to be something like:
dataset.create_scalar_index("c2.e4", index_type="BTREE")
I'm not exactly sure what the corresponding filter would look like. DataFusion supports nested fields in its SQL (at least according to this issue, apache/datafusion#119), but I don't know what the syntax looks like. That might be the next step. Then give the filter a try and see what fails.
-
Hi,
We are trying to move our billion-scale vector DB (1024-dimensional vectors plus payload/metadata) to LanceDB.
We observed that as the row count grew from 0 to around 25 million, batch inserts (batch size 400) slowed from 0.06 seconds at the start to 1.2 seconds now. How can we speed the inserts up? We have not enabled any indexes yet, so this behaviour is a bit strange.
Is there a way to separate the payload and the vectors into separate tables? The reason is that a single table holds a lot of redundant data, because the same payload repeats across hundreds of vectors. Is this possible yet, or is it on the roadmap? We would also need to query the two as a joint view, with filters on the payload table and vector similarity on the vector table.
If S3 can be used instead of the local file system, how does that affect insert/query performance, assuming network latency can be discounted?
Is this the best place to put such queries?
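To make the payload/vector split in the question above concrete, here is a small in-memory sketch of the layout being asked about: each payload is stored once, vectors reference it by id, and a search first filters payloads and then ranks only the matching vectors. It is plain Python with brute-force cosine similarity, not LanceDB code, and all names (`search`, `payload_id`, the sample data) are hypothetical. Whether LanceDB supports such a joint view natively is the open question.

```python
import math
from typing import Any, Callable, Dict, List, Sequence, Tuple

Payload = Dict[str, Any]
VectorRow = Tuple[str, int, Sequence[float]]  # (vector_id, payload_id, vector)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(
    query: Sequence[float],
    payload_filter: Callable[[Payload], bool],
    payloads: Dict[int, Payload],
    vectors: List[VectorRow],
    k: int,
) -> List[Tuple[str, Payload]]:
    """Filter on the payload table, then rank matching vectors by similarity."""
    # Step 1: payload ids that pass the filter (the "payload table" scan).
    allowed = {pid for pid, payload in payloads.items() if payload_filter(payload)}
    # Step 2: score only vectors whose payload passed (the "vector table" scan).
    scored = [
        (cosine_similarity(query, vec), vid, pid)
        for vid, pid, vec in vectors
        if pid in allowed
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    # Step 3: join the top-k vector rows back to their shared payload.
    return [(vid, payloads[pid]) for _, vid, pid in scored[:k]]
```

In this layout a payload shared by hundreds of vectors is stored once, at the cost of a join at query time.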