Segmentation fault while trying to search disk indexes #36

Closed
AetherPrior opened this issue Feb 22, 2022 · 2 comments

AetherPrior commented Feb 22, 2022

Hi,
I am trying to test DiskANN on a small dataset of about 10,000 queries (I will be dealing with billions in production). I was able to build the indices for the dataset, but when I run the search command:

./tests/search_disk_index float mips ../embeddings/bing 0 1 0 ../queryfile.bin null 0 ../result 1

I get a segmentation fault.
This is the build command I used:

./tests/build_disk_index float mips ../datafile.bin ../embeddings/bing 70 100 1.5 2 4 0

And this is the output of the search command:

Search parameters: #threads: 1, beamwidth to be optimized for each L value
Reading bin file ../queryfile.bin ...Metadata: #pts = 128, #dims = 768, aligned_dim = 768...allocating aligned memory, 393216 bytes...done. Copying data... done.
 Stat(null) returned: -1
Using inner product distance function
Reading bin file ../embeddings/bing_pq_compressed.bin ...
Metadata: #pts = 10000, #dims = 100...
Reading bin file ../embeddings/bing_pq_pivots.bin ...
Metadata: #pts = 256, #dims = 769...
 Stat(../embeddings/bing_pq_pivots.bin_chunk_offsets.bin) returned: 0
Reading bin file ../embeddings/bing_pq_pivots.bin_rearrangement_perm.bin ...
Metadata: #pts = 769, #dims = 1...
Reading bin file ../embeddings/bing_pq_pivots.bin_chunk_offsets.bin ...
Metadata: #pts = 101, #dims = 1...
PQ data has 100 bytes per point.
Reading bin file ../embeddings/bing_pq_pivots.bin_centroid.bin ...
Metadata: #pts = 769, #dims = 1...
PQ Pivots: #ctrs: 256, #dims: 769, #chunks: 100
Loaded PQ centroids and in-memory compressed vectors. #points: 10000 #dim: 769 #aligned_dim: 776 #chunks: 100
 Stat(../embeddings/bing_disk.index_pq_pivots.bin) returned: -1
 Tellg: 40964096 as u64: 40964096
Disk-Index File Meta-data: # nodes per sector: 1, max node len (bytes): 3360, max node degree: 70
Setting up thread-specific contexts for nthreads: 1
allocating ctx: 0x7f5690d9a000 to thread-id:140009772226432
 Stat(../embeddings/bing_disk.index_medoids.bin) returned: -1
Loading centroid data from medoids vector data of 1 medoid(s)
 Stat(../embeddings/bing_disk.index_max_base_norm.bin) returned: 0
Reading bin file ../embeddings/bing_disk.index_max_base_norm.bin ...
Metadata: #pts = 1, #dims = 1...
Setting re-scaling factor of base vectors to 1
done..
Caching 0 BFS nodes around medoid(s)
 Stat(../embeddings/bing_sample_data.bin) returned: 0
Reading bin file ../embeddings/bing_sample_data.bin ...Metadata: #pts = 10000, #dims = 769, aligned_dim = 776...allocating aligned memory, 31040000 bytes...done. Copying data... done.
Loading the cache list into memory....done.
     L   Beamwidth             QPS    Mean Latency    99.9 Latency        Mean IOs         CPU (s)
==========================================================================================================
Segmentation fault (core dumped)
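
The byte counts in the log above can be cross-checked against the logged point counts and dimensions. A minimal sketch, assuming the data is stored as 4-byte floats and that `aligned_dim` is the dimension rounded up to a multiple of 8; that rounding rule is inferred from the logged pairs (768 stays 768, 769 becomes 776) rather than taken from the DiskANN source, and the helper names are hypothetical:

```cpp
#include <cstdint>

// Round dim up to the next multiple of 8.
// Assumption: inferred from the log (768 -> 768, 769 -> 776).
constexpr std::uint64_t aligned_dim(std::uint64_t dim) {
  return (dim + 7) / 8 * 8;
}

// Bytes needed for npts float vectors, each padded to aligned_dim(dim).
constexpr std::uint64_t aligned_bytes(std::uint64_t npts, std::uint64_t dim) {
  return npts * aligned_dim(dim) * sizeof(float);
}

// Query file: 128 points x 768 dims, as logged.
static_assert(aligned_bytes(128, 768) == 393216, "matches the query-file line");
// Sample data: 10000 points x 769 dims, padded to 776, as logged.
static_assert(aligned_bytes(10000, 769) == 31040000, "matches the sample-data line");
```

Both assertions hold, so the allocation sizes are consistent with the metadata the loader printed; the crash comes after loading completes.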

AetherPrior commented Feb 23, 2022

Upon running gdb as per #29, I get the following output:

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Search parameters: #threads: 1, beamwidth to be optimized for each L value
Reading bin file ../queryfile.bin ...Metadata: #pts = 112, #dims = 768, aligned_dim = 768...allocating aligned memory, 344064 bytes...done. Copying data... done.
 Stat(null) returned: -1
Using inner product distance function
Reading bin file ../embeddings/bing_pq_compressed.bin ...
Metadata: #pts = 100000, #dims = 100...
Reading bin file ../embeddings/bing_pq_pivots.bin ...
Metadata: #pts = 256, #dims = 769...
 Stat(../embeddings/bing_pq_pivots.bin_chunk_offsets.bin) returned: 0
Reading bin file ../embeddings/bing_pq_pivots.bin_rearrangement_perm.bin ...
Metadata: #pts = 769, #dims = 1...
Reading bin file ../embeddings/bing_pq_pivots.bin_chunk_offsets.bin ...
Metadata: #pts = 101, #dims = 1...
PQ data has 100 bytes per point.
Reading bin file ../embeddings/bing_pq_pivots.bin_centroid.bin ...
Metadata: #pts = 769, #dims = 1...
PQ Pivots: #ctrs: 256, #dims: 769, #chunks: 100
Loaded PQ centroids and in-memory compressed vectors. #points: 100000 #dim: 769 #aligned_dim: 776 #chunks: 100
 Stat(../embeddings/bing_disk.index_pq_pivots.bin) returned: -1
 Tellg: 409604096 as u64: 409604096
Disk-Index File Meta-data: # nodes per sector: 1, max node len (bytes): 3360, max node degree: 70
Opened file : ../embeddings/bing_disk.index
Setting up thread-specific contexts for nthreads: 1
allocating ctx: 0x7ffff7fe3000 to thread-id:140737352148864
 Stat(../embeddings/bing_disk.index_medoids.bin) returned: -1
Loading centroid data from medoids vector data of 1 medoid(s)
 Stat(../embeddings/bing_disk.index_max_base_norm.bin) returned: 0
Reading bin file ../embeddings/bing_disk.index_max_base_norm.bin ...
Metadata: #pts = 1, #dims = 1...
Setting re-scaling factor of base vectors to 1
done..
Caching 0 BFS nodes around medoid(s)
 Stat(../embeddings/bing_sample_data.bin) returned: 0
Reading bin file ../embeddings/bing_sample_data.bin ...Metadata: #pts = 100000, #dims = 769, aligned_dim = 776...allocating aligned memory, 310400000 bytes...done. Copying data... done.
Loading the cache list into memory....done.
     L   Beamwidth             QPS    Mean Latency    99.9 Latency        Mean IOs         CPU (s)
==========================================================================================================
[New Thread 0x7fffed754780 (LWP 25771)]

Thread 1 "search_disk_ind" received signal SIGSEGV, Segmentation fault.
0x000055555559aa5e in diskann::get_percentile_stats(diskann::QueryStats*, unsigned long, float, std::function<double (diskann::QueryStats const&)> const&) (
    member_fn=..., percentile=0.999000013, len=0, stats=0x555556062038,
    this=<optimized out>, this=<optimized out>)
    at /home/abhinavrao/DiskANN/include/percentile_stats.h:47
47          auto retval = vals[(uint64_t)(percentile * len)];
(gdb)

and the backtrace shows the following:

#0  0x000055555559aa5e in diskann::get_percentile_stats(diskann::QueryStats*, unsigned long, float, std::function<double (diskann::QueryStats const&)> const&) (member_fn=..., percentile=0.999000013, len=0, stats=0x555556062038, this=<optimized out>, this=<optimized out>)
    at /home/abhinavrao/DiskANN/include/percentile_stats.h:47
#1  diskann::optimize_beamwidth<float> (pFlashIndex=..., tuning_sample=<optimized out>, tuning_sample_num=0, tuning_sample_aligned_dim=<optimized out>, L=100, nthreads=2, start_bw=2)
    at /home/abhinavrao/DiskANN/src/aux_utils.cpp:495
#2  0x000055555557ba2e in search_disk_index<float> (argc=<optimized out>, argv=<optimized out>)
    at /home/abhinavrao/DiskANN/tests/search_disk_index.cpp:227
#3  0x000055555556ad5a in main (argc=12, argv=0x7fffffffcb38)
    at /home/abhinavrao/DiskANN/tests/search_disk_index.cpp:324
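
The frames above show `vals[(uint64_t)(percentile * len)]` being evaluated with `len=0`, and `tuning_sample_num=0` one frame up, so the lookup presumably indexes an empty container. A minimal sketch of a guarded percentile helper follows. The `QueryStats` struct here is a simplified stand-in with a single hypothetical `total_us` field, and `safe_percentile` is an illustrative name, not a DiskANN function; this is not necessarily how the project handles the case.

```cpp
#include <algorithm>
#include <cstdint>
#include <functional>
#include <limits>
#include <vector>

// Hypothetical stand-in for diskann::QueryStats; only one field is modelled.
struct QueryStats {
  double total_us;
};

// Returns the requested percentile of member_fn over stats[0..len).
// Returns NaN for an empty sample instead of reading vals[0] out of bounds.
inline double safe_percentile(
    const QueryStats* stats, std::uint64_t len, float percentile,
    const std::function<double(const QueryStats&)>& member_fn) {
  if (stats == nullptr || len == 0) {
    return std::numeric_limits<double>::quiet_NaN();
  }
  std::vector<double> vals(len);
  for (std::uint64_t i = 0; i < len; ++i) {
    vals[i] = member_fn(stats[i]);
  }
  std::sort(vals.begin(), vals.end());
  // Clamp the fraction to [0, 1] and the index to len - 1, so that
  // percentile == 1.0 cannot index one past the end.
  const double p = std::min(1.0, std::max(0.0, static_cast<double>(percentile)));
  std::uint64_t idx = static_cast<std::uint64_t>(p * static_cast<double>(len));
  idx = std::min(idx, len - 1);
  return vals[idx];
}
```

A caller could equally skip the tuning step when the sample count is zero; returning NaN simply makes the empty case visible instead of crashing.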

@ShikharJ
Contributor

Closing for now.
