Skip to content

Persist simple diskann index#787

Merged
eddyxu merged 16 commits intomainfrom
lei/search_diskann
Apr 23, 2023
Merged

Persist simple diskann index#787
eddyxu merged 16 commits intomainfrom
lei/search_diskann

Conversation

@eddyxu
Copy link
Copy Markdown
Member

@eddyxu eddyxu commented Apr 19, 2023

No description provided.

@eddyxu eddyxu force-pushed the lei/search_diskann branch 3 times, most recently from 7fb7e6b to c7bcfc3 Compare April 20, 2023 06:05
@eddyxu eddyxu force-pushed the lei/search_diskann branch from c7bcfc3 to 4bd25dd Compare April 23, 2023 03:51
@eddyxu eddyxu marked this pull request as ready for review April 23, 2023 04:59
@eddyxu eddyxu changed the title Persist and search diskann index Persist simple diskann index Apr 23, 2023
@eddyxu eddyxu requested a review from changhiskhan April 23, 2023 05:00
@eddyxu eddyxu self-assigned this Apr 23, 2023
Comment thread protos/index.proto
// L parameter
uint32 L = 5;

/// Entry points to the graph
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these are the vertex id's?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are the IDs in the graph. Basically the "index" in the graph file. Not the "row id" in original dataset

Comment thread python/python/lance/dataset.py Outdated
if index_type != "IVF_PQ":
if index_type not in ["IVF_PQ", "DISKANN"]:
raise NotImplementedError(
f"Only IVF_PQ index_type supported. Got {index_type}"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change the message to include DISKANN?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, will do.

Comment thread python/src/dataset.rs
ivf_params.num_partitions = PyAny::downcast::<PyInt>(n)?.extract()?
};

if let Some(n) = kwargs.get_item("num_bits") {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i forgot do we actually support configuring this now?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We dont support "throw error for unsupported params". This is just do the handing silently. I am fine to remove it if you feel strongly.

Comment thread python/src/dataset.rs
}
"DISKANN" => {
let mut params = DiskANNParams::default();
if let Some(kwargs) = kwargs {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what're the default values for these? prolly worth tuning it and set reasonable defaults for like high embedding dimensions and like up to 1M vectors ?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default values were from the diskann paper. Will adjust them if necessary when we benchmark diskann

@eddyxu eddyxu merged commit ee738e2 into main Apr 23, 2023
@eddyxu eddyxu deleted the lei/search_diskann branch April 23, 2023 17:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants