[docs]: Fix issues with Rust code snippets in "quick start" #1047

Merged
merged 3 commits into from
Mar 3, 2024


prrao87
Contributor

@prrao87 prrao87 commented Mar 1, 2024

The renaming of vectordb to lancedb broke the quick start docs (it's pointing to a non-existent directory). This PR fixes the code snippets and the paths in the docs page.

Additionally, more fixes related to indexing docs below 👇🏽.

@prrao87 prrao87 requested a review from eddyxu March 1, 2024 01:41
@prrao87 prrao87 changed the title Fix issues with Rust code snippets in "quick start" [docs]: Fix issues with Rust code snippets in "quick start" Mar 1, 2024
@prrao87
Contributor Author

prrao87 commented Mar 1, 2024

Apologies for combining two tasks in this one PR, but I forgot to switch branches 😅. This second set of commits addresses the manual indexing requirements in the docs.

Indexing docs

To clarify to users why we need to index manually, I'd word it carefully in three places:

  • Quick start
  • Concepts
  • Guides

Remedies

  • In the quick start section, we add an admonition box stating why we need to index manually: first, that we don't always need an ANN index because LanceDB is 🔥 for datasets of <1M vectors, and second, because there's no one-size-fits-all when it comes to indexing (most other vector DB vendors have sane defaults, and HNSW is more forgiving, but that doesn't mean that those settings are optimal)
  • In the "ANN indexes" section of the guides, we add a subsection titled "Why do I need to manually create an index?" clarifying the same things
  • In the "Concepts" section covering the IVF_PQ index, I've removed the FAQ section (it's repetitive as it already appears twice in the FAQ and the Guides sections), but I link to the guides section as that's where the most detailed explanation on how/why we index sits.

All in all, I think this helps a new user gain a sequential understanding of how/why indexing is done in LanceDB. Because IVF_PQ is fundamentally different from HNSW, some manual tuning is desirable in most cases, and it's best to make the user aware of that in each of these sections. Because we cannot predict which section a user might chance upon when they first encounter LanceDB, it's best to drop these hints/explanations in all these sections.
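To illustrate the first point in the remedies (an ANN index is not always needed), here is a minimal sketch of exhaustive, index-free nearest-neighbour search in plain Python. It is not LanceDB's implementation or API; the function names `l2_distance` and `brute_force_search` and the toy data are hypothetical and exist only for illustration. The idea is that every vector is scored against the query, so results are exact and there is nothing to tune, at the price of work that grows linearly with the number of vectors.

```python
import math
from typing import List, Sequence, Tuple


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def brute_force_search(
    vectors: Sequence[Sequence[float]],
    query: Sequence[float],
    k: int,
) -> List[Tuple[int, float]]:
    """Exact k-nearest-neighbour search with no index.

    Scores every stored vector against the query and returns the k
    closest as (row_index, distance) pairs, nearest first.
    """
    scored = [(i, l2_distance(v, query)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]


# Hypothetical toy data (assumption: not taken from the LanceDB docs).
toy_vectors = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [0.9, 1.2]]
top2 = brute_force_search(toy_vectors, query=[1.0, 1.0], k=2)
```

An approximate index such as IVF_PQ trades some of this exactness for speed on larger datasets, which is why its parameters are worth choosing per dataset rather than relying on a single default.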

@changhiskhan @AyushExel and @raghavdixit99, if you have more additions or suggestions on top of these, I'm happy to go through them and offer my input. Hope this makes sense!

@prrao87
Contributor Author

prrao87 commented Mar 1, 2024

@AyushExel No idea about why CI is failing here, seems like a lot has changed since my last PR!

@changhiskhan
Contributor

@prrao87 thanks! We've done some major refactoring so that we can support an Async API and a much better Rust API in general. I'm fixing the CI in a separate PR. Once that's merged, I'll update this one so CI can pass.

Contributor

@changhiskhan changhiskhan left a comment

this lgtm. Just one small nit. Thanks!

cc @westonpace in case there are additional places where we need to update vectordb references to lancedb instead

docs/src/ann_indexes.md (review thread resolved)
@changhiskhan changhiskhan merged commit 14566df into main Mar 3, 2024
4 of 5 checks passed
@changhiskhan changhiskhan deleted the index-docs branch March 3, 2024 23:59
raghavdixit99 pushed a commit to raghavdixit99/lancedb that referenced this pull request Apr 5, 2024
…1047)

The renaming of `vectordb` to `lancedb` broke the [quick start
docs](https://lancedb.github.io/lancedb/basic/#__tabbed_5_3) (it's
pointing to a non-existent directory). This PR fixes the code snippets
and the paths in the docs page.

Additionally, more fixes related to indexing docs below 👇🏽.
westonpace pushed a commit that referenced this pull request Apr 5, 2024
The renaming of `vectordb` to `lancedb` broke the [quick start
docs](https://lancedb.github.io/lancedb/basic/#__tabbed_5_3) (it's
pointing to a non-existent directory). This PR fixes the code snippets
and the paths in the docs page.

Additionally, more fixes related to indexing docs below 👇🏽.