patelyash/index factory #340

Merged 168 commits on Jun 22, 2023
Commits
29bbef7
gi# This is a combination of 2 commits.
harsha-simhadri Mar 28, 2023
0267012
added some seed files
harsha-simhadri Mar 28, 2023
7131d70
add seed files
harsha-simhadri Mar 31, 2023
73e5287
New distance metric hierarchy
gopal-msr Apr 2, 2023
5351dd5
Merged new distance implementation
gopal-msr Apr 3, 2023
8adc959
Refactoring changes
gopal-msr Apr 3, 2023
412ed82
Fixing compile errors in refactored code
gopal-msr Apr 3, 2023
2af4b1c
Fixing compile errors
gopal-msr Apr 3, 2023
8d6b832
DiskANN Builds with initial refactoring changes
gopal-msr Apr 3, 2023
c417f9a
Saving changes for Ravi
gopal-msr Apr 4, 2023
e82b784
More refactoring
gopal-msr Apr 4, 2023
54c70c5
Refactor
gopal-msr Apr 5, 2023
42becfb
Fixed most of the bugs related to _data
gopal-msr Apr 5, 2023
f62259c
add seed files
harsha-simhadri Mar 31, 2023
d18804c
gi# This is a combination of 2 commits.
harsha-simhadri Mar 28, 2023
4de26eb
added some seed files
harsha-simhadri Mar 28, 2023
a1c07a1
New distance metric hierarchy
gopal-msr Apr 2, 2023
e9c6697
Refactoring changes
gopal-msr Apr 3, 2023
7684cf5
Fixing compile errors in refactored code
gopal-msr Apr 3, 2023
713e4ff
Fixing compile errors
gopal-msr Apr 3, 2023
260766e
DiskANN Builds with initial refactoring changes
gopal-msr Apr 3, 2023
516fb99
Saving changes for Ravi
gopal-msr Apr 4, 2023
f66f975
More refactoring
gopal-msr Apr 4, 2023
91b486c
Refactor
gopal-msr Apr 5, 2023
b348a07
Fixed most of the bugs related to _data
gopal-msr Apr 5, 2023
7cbea15
Merge branch 'rakri_gopal/refactor' of https://github.com/microsoft/D…
gopal-msr Apr 6, 2023
aadafad
Post merge with main
gopal-msr Apr 6, 2023
418ccb9
Refactored version which compiles on Windows
gopal-msr Apr 6, 2023
469c6a5
now compiles on linux
Apr 6, 2023
874c53c
minor clean-up
Apr 9, 2023
c502147
minor bug fix
Apr 9, 2023
310f373
minor bug
Apr 10, 2023
bf11919
clang format fix + build error fix
yashpatel007 Apr 10, 2023
f7e3424
clang format fix
yashpatel007 Apr 10, 2023
f890a96
minor changes
harsha-simhadri Apr 11, 2023
0528046
added back the fast_l2 feature
Apr 11, 2023
85b44e1
added back set_start_points in index.cpp
Apr 11, 2023
f7fec45
Version for review
gopal-msr Apr 11, 2023
bb5cd79
Merge branch 'rakri_gopal/refactor' of https://github.com/microsoft/D…
gopal-msr Apr 11, 2023
921cec5
Incorporating Harsha's comments - 2
gopal-msr Apr 12, 2023
a310e44
move implementation of abstract data store methods to a cpp file
harsha-simhadri Apr 12, 2023
d751fa4
clang format
harsha-simhadri Apr 12, 2023
e7aa49b
clang format
harsha-simhadri Apr 12, 2023
e6e5fc9
Added slot manager file (empty) and fixed compile errors
gopal-msr Apr 13, 2023
b780905
fixed a linux compile error
Apr 14, 2023
23d5a10
clang
Apr 14, 2023
362e896
debugging workflow failure
Apr 16, 2023
85e008d
clang
Apr 16, 2023
0b8fc55
more debug
Apr 16, 2023
dc183e9
more debug
Apr 16, 2023
a191354
debug for workflow
Apr 16, 2023
6ea93ea
remove slot manager
harsha-simhadri Apr 17, 2023
1f3e88a
Merge branch 'main' into rakri_gopal/refactor
harsha-simhadri Apr 17, 2023
fc2e5ad
Removed the #ifdef WINDOWS directive from class definitions
gopal-msr Apr 18, 2023
8fd32d1
Incorporating changes from remote
gopal-msr Apr 18, 2023
e301aeb
Refactoring alignment factor into distance hierarchy
gopal-msr Apr 18, 2023
879a37e
Fixing cosine distance
gopal-msr Apr 18, 2023
7375708
Ensuring we call preprocess_query always
gopal-msr Apr 18, 2023
112d4fe
Fixed distance invocations
gopal-msr Apr 18, 2023
3f73901
fixed cosine bug, clang-formatted
Apr 18, 2023
ec90ca6
cleaned up and added comments
Apr 19, 2023
6025c33
clang-formatted
Apr 19, 2023
1c12790
more clang-format
Apr 19, 2023
a4a5cb5
clang-format 3
Apr 19, 2023
d9fc915
remove deleted code in scratch.cpp
harsha-simhadri Apr 19, 2023
4ce5abf
reverted clang to Microsoft
Apr 20, 2023
ca6080e
small change
Apr 20, 2023
f46f2bd
Removed slot_manager from this PR
gopal-msr Apr 20, 2023
96c5dc2
newline at EOF in_mem_Graph_store.cpp
harsha-simhadri Apr 19, 2023
d2f3bd7
rename distance_metric to distance_fn
harsha-simhadri Apr 23, 2023
4da65db
resolving PR comments
yashpatel007 Apr 24, 2023
c9db2de
minor bug fix for initialization
Apr 25, 2023
08b5361
creating index_factory
yashpatel007 Apr 27, 2023
3d94dbb
using index factory to build inmem index
yashpatel007 Apr 27, 2023
fdd6e9f
clang format fix
yashpatel007 Apr 27, 2023
b181042
minor bug fix
yashpatel007 Apr 27, 2023
39855f7
rebasing from main
yashpatel007 Apr 27, 2023
947bbf6
fixing build error
yashpatel007 Apr 27, 2023
a67316d
replacing mem_store with abstract_mem_store + injecting data_store to…
yashpatel007 Apr 28, 2023
048b808
minor fix
yashpatel007 Apr 28, 2023
f039157
clang format fix
yashpatel007 Apr 28, 2023
4189ea5
commenting data_store injection to prevent double invocation and mem …
yashpatel007 Apr 28, 2023
83f8bf6
fixing the build for fiters
yashpatel007 Apr 28, 2023
64cfc31
moving abstract index to abstract_index.h
yashpatel007 Apr 28, 2023
e6eca79
IndexBuildParamsbuilder to build IndexBuildParams properly with error…
yashpatel007 May 1, 2023
3aeedbf
fixing build errors
yashpatel007 May 1, 2023
fc9be3a
fixing minor error
yashpatel007 May 1, 2023
fad983f
refactoring index search to be simple
yashpatel007 May 4, 2023
8664790
clang format fix
yashpatel007 May 4, 2023
6c32d90
refactoring search_mem_index to use index factory
yashpatel007 May 5, 2023
23b37f9
clang fix
yashpatel007 May 5, 2023
a9142c2
minor fix
yashpatel007 May 5, 2023
8dac66b
minor fix for build
yashpatel007 May 5, 2023
bdb97fb
optimize for fast l2 restore
yashpatel007 May 5, 2023
539a8a9
removing comments
yashpatel007 May 5, 2023
75b960f
removing comments
yashpatel007 May 8, 2023
e47c4ad
adding templating to IndexFactory (can't avoide it anymore)
yashpatel007 May 10, 2023
9732c9c
Merge branch 'main' of https://github.com/microsoft/DiskANN into pate…
yashpatel007 May 10, 2023
1dc4326
fixing build error
yashpatel007 May 10, 2023
e18707d
fixing ubuntu build error
yashpatel007 May 11, 2023
35ca446
ubuntu build exception fix
yashpatel007 May 11, 2023
748729c
passing num_pq_bytes
yashpatel007 May 11, 2023
3a39337
giving one more shot to config dricen arch with boost::any (type eras…
yashpatel007 May 15, 2023
f05d168
clang fix
yashpatel007 May 15, 2023
b6e36b1
modifying search to use boost::any
yashpatel007 May 16, 2023
7aac514
fixing ubuntu build errors/warning
yashpatel007 May 16, 2023
0238b52
created indexconfigbuilder and fixed a typo
yashpatel007 May 16, 2023
14a98ab
fixing error in pq build
yashpatel007 May 16, 2023
7c4d4e5
some comments + lazy_delete impl
yashpatel007 May 18, 2023
7d4b563
bumping to std c++17 & replacing boost::any with std::any
yashpatel007 May 19, 2023
3f62cd7
clang fix
yashpatel007 May 19, 2023
df977ea
c++ std 17 for ubuntu
yashpatel007 May 19, 2023
4ef514e
minor fix
yashpatel007 May 19, 2023
bb04343
converting search to batch_search + A vector wrapper using std::any t…
yashpatel007 May 22, 2023
b35bdca
adding AnyVector to encapsulate vector in std::any + adding basic yam…
yashpatel007 May 22, 2023
b9b380e
adding wrapper code for vector and set, checked with Andrija
yashpatel007 May 23, 2023
dcb76d9
Merge branch 'main' of https://github.com/microsoft/DiskANN into pate…
yashpatel007 May 23, 2023
1a3a72f
fixinh ubuntu build error
yashpatel007 May 23, 2023
8962859
trying to resolve ubuntu build error
yashpatel007 May 23, 2023
671df9b
testing test streaming index with IndexFactory
yashpatel007 May 24, 2023
ae6ee0d
fixing ubuntu build error
yashpatel007 May 24, 2023
75cae01
fixing search for test insert delete consolidate
yashpatel007 May 24, 2023
04b0fa0
refactored test_streaming_scenario
yashpatel007 May 25, 2023
a167d1f
refactored test_insert_delete_consolidate to use AbstractIndex and In…
yashpatel007 May 25, 2023
fcd8cd3
fixing ubuntu build error
yashpatel007 May 25, 2023
7f74cc1
making build method in abstract index consistent
yashpatel007 May 26, 2023
0a94464
some code cleanup + abstract_cpp to add implementation
yashpatel007 May 31, 2023
7ec3feb
remoing coments and code cleanup
yashpatel007 May 31, 2023
92da2e9
build error fix
yashpatel007 May 31, 2023
8d43945
fixing -Wreorder warning
yashpatel007 May 31, 2023
9ac7b9c
separating build structs to their header + refactor search and remove…
yashpatel007 Jun 1, 2023
0d960b4
fixing ubuntu build errors
yashpatel007 Jun 2, 2023
568332f
resolving segfault error from search_mem_index
yashpatel007 Jun 2, 2023
0b46f17
fixing query_result_tag allocation
yashpatel007 Jun 2, 2023
a31f626
minor update
yashpatel007 Jun 2, 2023
a758054
search fix
yashpatel007 Jun 2, 2023
88a1c46
trying to fix windows latest build for dynamic index
yashpatel007 Jun 4, 2023
95213c3
ading temp loggin to debug windows latest build issue
yashpatel007 Jun 4, 2023
c8958ee
removing logging for debug
yashpatel007 Jun 4, 2023
be28a8f
fixning windows latest build error for dynamix index search
yashpatel007 Jun 5, 2023
64e35ef
moving any wrappers to separate file + organizing code
yashpatel007 Jun 5, 2023
d25b127
fixing check error
yashpatel007 Jun 5, 2023
c0b3382
updating private vsr naming convention
yashpatel007 Jun 5, 2023
f1454c4
minor update
yashpatel007 Jun 5, 2023
a1fc67d
unravelig search methods in abstract index. Iteraton 1
yashpatel007 Jun 7, 2023
ea524d4
minor fix
yashpatel007 Jun 7, 2023
48694a3
unused vars remove
yashpatel007 Jun 7, 2023
8b69d45
returning a unique_ptr to Abstract Index from index factory
yashpatel007 Jun 8, 2023
93f14ff
adding implementation from abstract_index.h to abstract_index.cpp
yashpatel007 Jun 8, 2023
2a90c18
making abstract index api to be more explicit (expriment)
yashpatel007 Jun 8, 2023
ff6b927
some code cleanup
yashpatel007 Jun 8, 2023
97bafba
removing detected memory leaks (free up index)
yashpatel007 Jun 9, 2023
11bde33
separtaing enums for data and graph stratagy
yashpatel007 Jun 9, 2023
570f1fd
Index ctor(config) now uses injected datastore from IndexFactory
yashpatel007 Jun 12, 2023
247e5e3
distance in index population in new config ctor
yashpatel007 Jun 12, 2023
7590d96
resolving some comments from Andrija
yashpatel007 Jun 14, 2023
d7ead66
Resolving some restructuring comments by Andrija
yashpatel007 Jun 15, 2023
d023f2e
minor fix
yashpatel007 Jun 15, 2023
6513ba5
fixing ubuntu build error
yashpatel007 Jun 15, 2023
2b84099
warning fix
yashpatel007 Jun 15, 2023
c967119
simplified get() in anywrappers
yashpatel007 Jun 15, 2023
9258bee
making index config a unique ptr and owned by IndexFactory
yashpatel007 Jun 19, 2023
fb4ec8e
removing complex if/else calling recursively + added unimplemented Ta…
yashpatel007 Jun 19, 2023
e254510
renaming get_instance to create_instance
yashpatel007 Jun 19, 2023
c704230
clang format fix
yashpatel007 Jun 19, 2023
85e485a
removing const_cast from any_wrapper
yashpatel007 Jun 20, 2023
2d36d18
fixing andrija's comments
yashpatel007 Jun 21, 2023
807a491
removing warnings
yashpatel007 Jun 22, 2023
77 changes: 37 additions & 40 deletions apps/build_memory_index.cpp
Expand Up @@ -17,6 +17,7 @@

#include "memory_mapper.h"
#include "ann_exception.h"
#include "index_factory.h"

namespace po = boost::program_options;

Expand Down Expand Up @@ -155,46 +156,42 @@ int main(int argc, char **argv)
{
diskann::cout << "Starting index build with R: " << R << " Lbuild: " << L << " alpha: " << alpha
<< " #threads: " << num_threads << std::endl;
if (label_file != "" && label_type == "ushort")
{
if (data_type == std::string("int8"))
return build_in_memory_index<int8_t, uint32_t, uint16_t>(
metric, data_path, R, L, alpha, index_path_prefix, num_threads, use_pq_build, build_PQ_bytes,
use_opq, label_file, universal_label, Lf);
else if (data_type == std::string("uint8"))
return build_in_memory_index<uint8_t, uint32_t, uint16_t>(
metric, data_path, R, L, alpha, index_path_prefix, num_threads, use_pq_build, build_PQ_bytes,
use_opq, label_file, universal_label, Lf);
else if (data_type == std::string("float"))
return build_in_memory_index<float, uint32_t, uint16_t>(
metric, data_path, R, L, alpha, index_path_prefix, num_threads, use_pq_build, build_PQ_bytes,
use_opq, label_file, universal_label, Lf);
else
{
std::cout << "Unsupported type. Use one of int8, uint8 or float." << std::endl;
return -1;
}
}
else
{
if (data_type == std::string("int8"))
return build_in_memory_index<int8_t>(metric, data_path, R, L, alpha, index_path_prefix, num_threads,
use_pq_build, build_PQ_bytes, use_opq, label_file, universal_label,
Lf);
else if (data_type == std::string("uint8"))
return build_in_memory_index<uint8_t>(metric, data_path, R, L, alpha, index_path_prefix, num_threads,
use_pq_build, build_PQ_bytes, use_opq, label_file,
universal_label, Lf);
else if (data_type == std::string("float"))
return build_in_memory_index<float>(metric, data_path, R, L, alpha, index_path_prefix, num_threads,
use_pq_build, build_PQ_bytes, use_opq, label_file, universal_label,
Lf);
else
{
std::cout << "Unsupported type. Use one of int8, uint8 or float." << std::endl;
return -1;
}
}

size_t data_num, data_dim;
diskann::get_bin_metadata(data_path, data_num, data_dim);

auto config = diskann::IndexConfigBuilder()
.with_metric(metric)
.with_dimension(data_dim)
.with_max_points(data_num)
.with_data_load_store_strategy(diskann::MEMORY)
.with_data_type(data_type)
.with_label_type(label_type)
.is_dynamic_index(false)
.is_enable_tags(false)
.is_use_opq(use_opq)
.is_pq_dist_build(use_pq_build)
.with_num_pq_chunks(build_PQ_bytes)
.build();

auto index_build_params = diskann::IndexWriteParametersBuilder(L, R)
.with_filter_list_size(Lf)
.with_alpha(alpha)
.with_saturate_graph(false)
.with_num_threads(num_threads)
.build();

auto build_params = diskann::IndexBuildParamsBuilder(index_build_params)
.with_universal_label(universal_label)
.with_label_file(label_file)
.with_save_path_prefix(index_path_prefix)
.build();
auto index_factory = diskann::IndexFactory(config);
auto index = index_factory.create_instance();
index->build(data_path, data_num, build_params);
index->save(index_path_prefix.c_str());
index.reset();
return 0;
}
catch (const std::exception &e)
{
Expand Down
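The change to apps/build_memory_index.cpp replaces the per-type if/else ladder with a config object and a factory that returns a pointer to an abstract index. The sketch below illustrates that general pattern only. Every name in it (SketchConfig, SketchAbstractIndex, SketchIndex, SketchIndexFactory, bytes_per_point) is hypothetical and is not the DiskANN API; the only details taken from this PR are that create_instance() hands back a unique_ptr to an abstract index and that the supported data types are int8, uint8 and float.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-ins for illustration; these are NOT the DiskANN classes.
struct SketchConfig
{
    std::string data_type; // "int8", "uint8" or "float"
    std::size_t dimension = 0;
};

// Non-templated interface, so callers never need to name the element type.
class SketchAbstractIndex
{
  public:
    virtual ~SketchAbstractIndex() = default;
    virtual std::size_t bytes_per_point() const = 0;
};

// Templated implementation; the element type is fixed once, at construction.
template <typename T> class SketchIndex : public SketchAbstractIndex
{
  public:
    explicit SketchIndex(std::size_t dim) : _dim(dim)
    {
        assert(dim > 0);
    }

    std::size_t bytes_per_point() const override
    {
        return _dim * sizeof(T);
    }

  private:
    std::size_t _dim;
};

// The string-to-type dispatch lives in one place instead of in every app.
class SketchIndexFactory
{
  public:
    explicit SketchIndexFactory(SketchConfig config) : _config(std::move(config))
    {
    }

    std::unique_ptr<SketchAbstractIndex> create_instance() const
    {
        if (_config.data_type == "int8")
            return std::make_unique<SketchIndex<std::int8_t>>(_config.dimension);
        if (_config.data_type == "uint8")
            return std::make_unique<SketchIndex<std::uint8_t>>(_config.dimension);
        if (_config.data_type == "float")
            return std::make_unique<SketchIndex<float>>(_config.dimension);
        throw std::invalid_argument("Unsupported type. Use one of int8, uint8 or float.");
    }

  private:
    SketchConfig _config;
};
```

A caller would build a SketchConfig, construct the factory from it, and work only through the returned SketchAbstractIndex pointer. One difference from the removed code is worth noting: in this sketch an unsupported type raises an exception rather than returning -1, which fits the existing try/catch around the build in main.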
67 changes: 37 additions & 30 deletions apps/search_memory_index.cpp
Expand Up @@ -20,6 +20,7 @@
#include "index.h"
#include "memory_mapper.h"
#include "utils.h"
#include "index_factory.h"

namespace po = boost::program_options;

Expand All @@ -30,14 +31,14 @@ int search_memory_index(diskann::Metric &metric, const std::string &index_path,
const bool dynamic, const bool tags, const bool show_qps_per_thread,
const std::vector<std::string> &query_filters, const float fail_if_recall_below)
{
using TagT = uint32_t;
// Load the query file
T *query = nullptr;
uint32_t *gt_ids = nullptr;
float *gt_dists = nullptr;
size_t query_num, query_dim, query_aligned_dim, gt_num, gt_dim;
diskann::load_aligned_bin<T>(query_file, query, query_num, query_dim, query_aligned_dim);

// Check for ground truth
bool calc_recall_flag = false;
if (truthset_file != std::string("null") && file_exists(truthset_file))
{
Expand Down Expand Up @@ -66,18 +67,32 @@ int search_memory_index(diskann::Metric &metric, const std::string &index_path,
}
}

using TagT = uint32_t;
const bool concurrent = false, pq_dist_build = false, use_opq = false;
const size_t num_pq_chunks = 0;
using IndexType = diskann::Index<T, TagT, LabelT>;
const size_t num_frozen_pts = IndexType::get_graph_num_frozen_points(index_path);
IndexType index(metric, query_dim, 0, dynamic, tags, concurrent, pq_dist_build, num_pq_chunks, use_opq,
num_frozen_pts);
std::cout << "Index class instantiated" << std::endl;
index.load(index_path.c_str(), num_threads, *(std::max_element(Lvec.begin(), Lvec.end())));
const size_t num_frozen_pts = diskann::get_graph_num_frozen_points(index_path);

auto config = diskann::IndexConfigBuilder()
.with_metric(metric)
.with_dimension(query_dim)
.with_max_points(0)
.with_data_load_store_strategy(diskann::MEMORY)
.with_data_type(diskann_type_to_name<T>())
.with_label_type(diskann_type_to_name<LabelT>())
.with_tag_type(diskann_type_to_name<TagT>())
.is_dynamic_index(dynamic)
.is_enable_tags(tags)
.is_concurrent_consolidate(false)
.is_pq_dist_build(false)
.is_use_opq(false)
.with_num_pq_chunks(0)
.with_num_frozen_pts(num_frozen_pts)
.build();

auto index_factory = diskann::IndexFactory(config);
auto index = index_factory.create_instance();
index->load(index_path.c_str(), num_threads, *(std::max_element(Lvec.begin(), Lvec.end())));
std::cout << "Index loaded" << std::endl;

if (metric == diskann::FAST_L2)
index.optimize_index_layout();
index->optimize_index_layout();

std::cout << "Using " << num_threads << " threads to search" << std::endl;
std::cout.setf(std::ios_base::fixed, std::ios_base::floatfield);
Expand Down Expand Up @@ -148,29 +163,22 @@ int search_memory_index(diskann::Metric &metric, const std::string &index_path,
auto qs = std::chrono::high_resolution_clock::now();
if (filtered_search)
{
LabelT filter_label_as_num;
if (query_filters.size() == 1)
{
filter_label_as_num = index.get_converted_label(query_filters[0]);
}
else
{
filter_label_as_num = index.get_converted_label(query_filters[i]);
}
auto retval = index.search_with_filters(query + i * query_aligned_dim, filter_label_as_num, recall_at,
L, query_result_ids[test_id].data() + i * recall_at,
query_result_dists[test_id].data() + i * recall_at);
std::string raw_filter = query_filters.size() == 1 ? query_filters[0] : query_filters[i];

auto retval = index->search_with_filters(query + i * query_aligned_dim, raw_filter, recall_at, L,
query_result_ids[test_id].data() + i * recall_at,
query_result_dists[test_id].data() + i * recall_at);
cmp_stats[i] = retval.second;
}
else if (metric == diskann::FAST_L2)
{
index.search_with_optimized_layout(query + i * query_aligned_dim, recall_at, L,
query_result_ids[test_id].data() + i * recall_at);
index->search_with_optimized_layout(query + i * query_aligned_dim, recall_at, L,
query_result_ids[test_id].data() + i * recall_at);
}
else if (tags)
{
index.search_with_tags(query + i * query_aligned_dim, recall_at, L,
query_result_tags.data() + i * recall_at, nullptr, res);
index->search_with_tags(query + i * query_aligned_dim, recall_at, L,
query_result_tags.data() + i * recall_at, nullptr, res);
for (int64_t r = 0; r < (int64_t)recall_at; r++)
{
query_result_ids[test_id][recall_at * i + r] = query_result_tags[recall_at * i + r];
Expand All @@ -179,8 +187,8 @@ int search_memory_index(diskann::Metric &metric, const std::string &index_path,
else
{
cmp_stats[i] = index
.search(query + i * query_aligned_dim, recall_at, L,
query_result_ids[test_id].data() + i * recall_at)
->search(query + i * query_aligned_dim, recall_at, L,
query_result_ids[test_id].data() + i * recall_at)
.second;
}
auto qe = std::chrono::high_resolution_clock::now();
Expand Down Expand Up @@ -245,7 +253,6 @@ int search_memory_index(diskann::Metric &metric, const std::string &index_path,
}

diskann::aligned_free(query);

return best_recall >= fail_if_recall_below ? 0 : -1;
}

Expand Down
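In apps/search_memory_index.cpp the caller now searches through a pointer to the abstract index, and the commit history mentions std::any as the type-erasure mechanism behind that interface. The sketch below shows one way such a bridge could work, assuming a typed template entry point that erases the query pointer and a private virtual that recovers it. All names (SketchSearchable, SketchTypedIndex, search_erased) are hypothetical, and the brute-force nearest-neighbour loop is only a placeholder for the real graph search.

```cpp
#include <any>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <utility>
#include <vector>

// Hypothetical non-templated interface; NOT the DiskANN AbstractIndex.
class SketchSearchable
{
  public:
    virtual ~SketchSearchable() = default;

    // Typed entry point: erase the pointer type, then dispatch virtually.
    template <typename T> std::size_t search(const T *query) const
    {
        assert(query != nullptr);
        return search_erased(std::any(query)); // the std::any holds a `const T *`
    }

  private:
    virtual std::size_t search_erased(const std::any &query) const = 0;
};

// Hypothetical typed implementation holding points in one flat array.
template <typename T> class SketchTypedIndex : public SketchSearchable
{
  public:
    SketchTypedIndex(std::vector<T> points, std::size_t dim) : _points(std::move(points)), _dim(dim)
    {
        assert(_dim > 0 && _points.size() % _dim == 0);
    }

  private:
    std::size_t search_erased(const std::any &query) const override
    {
        // Throws std::bad_any_cast if the caller's element type is not T.
        const T *q = std::any_cast<const T *>(query);

        // Placeholder search: exhaustive squared-L2 scan, returns the closest id.
        std::size_t best_id = 0;
        double best_dist = std::numeric_limits<double>::max();
        for (std::size_t id = 0; id < _points.size() / _dim; ++id)
        {
            double dist = 0.0;
            for (std::size_t d = 0; d < _dim; ++d)
            {
                const double diff = static_cast<double>(_points[id * _dim + d]) - static_cast<double>(q[d]);
                dist += diff * diff;
            }
            if (dist < best_dist)
            {
                best_dist = dist;
                best_id = id;
            }
        }
        return best_id;
    }

    std::vector<T> _points;
    std::size_t _dim;
};
```

The trade-off in this design is that a mismatch between the query's element type and the index's element type is no longer a compile error. It surfaces at run time as std::bad_any_cast, so callers that pick the type from a string need to keep the two in agreement.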