
Segmentation fault (core dumped) when building the index #29

Closed
dncc opened this issue Feb 13, 2022 · 5 comments

@dncc

dncc commented Feb 13, 2022

Hello,

Building a DiskANN index from the GloVe-100 dataset with the following command:

./build/tests/build_disk_index float mips input_data.bin . 70 100 1.5 2 4 0

fails with a Segmentation fault (core dumped) message. This is all of the output in the terminal:

Using Inner Product search, so need to pre-process base data into temp file. Please ensure there is additional (n*(d+1)*4) bytes for storing pre-processed base vectors, apart from the intermin indices and final index.
Pre-processing base file by adding extra coordinate
Writing bin: ._disk.index_max_base_norm.bin
bin: #pts = 1, #dims = 1, size = 12B
Finished writing bin.
Starting index build: R=70 L=100 Query RAM budget: 1.34218e+09 Indexing ram budget: 2 T: 4
Compressing 101-dimensional data into 100 bytes per vector.
Opened: ._prepped_base.bin, size: 478139664, cache_size: 67108864
Training data loaded of size 100003
 Stat(._pq_pivots.bin) returned: 0
Reading bin file ._pq_pivots.bin ...
Metadata: #pts = 256, #dims = 101...
PQ pivot file exists. Not generating again
Opened: ._prepped_base.bin, size: 478139664, cache_size: 67108864
 Stat(._pq_pivots.bin) returned: 0
Reading bin file ._pq_pivots.bin_centroid.bin ...
Metadata: #pts = 101, #dims = 1...
Reading bin file ._pq_pivots.bin_rearrangement_perm.bin ...
Metadata: #pts = 101, #dims = 1...
Reading bin file ._pq_pivots.bin_chunk_offsets.bin ...
Metadata: #pts = 101, #dims = 1...
Reading bin file ._pq_pivots.bin ...
Metadata: #pts = 256, #dims = 101...
Loaded PQ pivot information
Processing points  [0, 1183514)..tcmalloc: large alloc 1211924480 bytes == 0x55b74b514000 @  0x7ffbf9622680 0x7ffbf9642ff4 0x55b6cdfe08b0 0x55b6cdf9607c 0x55b6cdf2c315 0x55b6cdf1ea7f 0x55b6cdf1e292 0x7ffbf1a540b3 0x55b6cdf1d3ae
tcmalloc: large alloc 1211924480 bytes == 0x55b74b514000 @  0x7ffbf9622680 0x7ffbf9642ff4 0x55b6cdfe08b0 0x55b6cdf9607c 0x55b6cdf2c315 0x55b6cdf1ea7f 0x55b6cdf1e292 0x7ffbf1a540b3 0x55b6cdf1d3ae
.done.
Full index fits in RAM budget, should consume at most 1.10047GiBs, so building in one shot
Number of frozen points = 0
Reading bin file ._prepped_base.bin ...Metadata: #pts = 1183514, #dims = 101, aligned_dim = 104...allocating aligned memory, 492341824 bytes...done. Copying data... done.
Using AVX2 distance computation
Starting index build...
Number of syncs: 289
[1]    2703529 segmentation fault (core dumped)  ./build/tests/build_disk_index float mips scripts/input_data.bin . 70 100 1.5

This is the Python script used to build the input data binary:

import h5py
import numpy as np

# Load the GloVe-100 training vectors and L2-normalize each row.
glove_h5py = h5py.File("./glove-100-angular.hdf5", "r")
dataset = glove_h5py['train']
normalized_dataset = dataset / np.linalg.norm(dataset, axis=1)[:, np.newaxis]

N, dim = normalized_dataset.shape

# DiskANN .bin layout: a 4-byte little-endian point count, a 4-byte
# little-endian dimension, then the raw row-major vector data.
byteorder = 'little'
with open('./input_data.bin', 'wb') as out:
    out.write(N.to_bytes(4, byteorder=byteorder))
    out.write(dim.to_bytes(4, byteorder=byteorder))
    out.write(normalized_dataset.tobytes())
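
For what it's worth, here is a quick sanity check of the resulting file (a minimal sketch assuming the header layout above and float32 vectors, which is what the float build type expects):

import os
import struct

# Read back the 8-byte header (int32 point count, int32 dimension).
with open('./input_data.bin', 'rb') as f:
    n, dim = struct.unpack('<ii', f.read(8))

size = os.path.getsize('./input_data.bin')
print(n, dim, size)
# The file should be exactly the header plus n * dim float32 values (4 bytes each).
assert size == 8 + n * dim * 4, "size mismatch -- vectors may not be float32"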

Any idea what's going wrong? Thank you.

@ShikharJ
Contributor

ShikharJ commented Feb 15, 2022

@dncc Hi, thanks for reaching out. I think the data generation is fine. Could you please run a backtrace through gdb and paste the output here? Try deleting CMakeCache.txt and re-running CMake as cmake -DCMAKE_BUILD_TYPE=Debug .. && make -j before you run gdb. Also, if possible, provide a concrete path for the index save location instead of '.'.
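
For reference, the full sequence would be something along these lines (the build directory and data path are illustrative; adjust to your setup):

cd build
rm CMakeCache.txt
cmake -DCMAKE_BUILD_TYPE=Debug .. && make -j
gdb --args ./tests/build_disk_index float mips ../input_data.bin ./index 70 100 1.5 2 4 0
(gdb) run
(gdb) bt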

@dncc
Author

dncc commented Feb 15, 2022

Hi @ShikharJ,
Thank you for answering. Please see below the output when run under gdb. Same parameters as in the first trial, except that ./index is used as the output directory instead of '.', as you suggested. Here is the command from gdb: (gdb) run float mips scripts/input_data.bin ./index 70 100 1.5 2 4 0

This is the error in the gdb output:

Reading bin file ./index_pq_pivots.bin_rearrangement_perm.bin ...
Metadata: #pts = 101, #dims = 1...
Reading bin file ./index_pq_pivots.bin_chunk_offsets.bin ...
Metadata: #pts = 101, #dims = 1...
Reading bin file ./index_pq_pivots.bin ...
Metadata: #pts = 256, #dims = 101...
Loaded PQ pivot information
Processing points  [0, 1183514)..tcmalloc: large alloc 1211924480 bytes == 0x5555d7008000 @  0x7ffff7dc3680 0x7ffff7de3ff4 0x5555556d78b0 0x55555568d07c 0x555555623315 0x555555615a7f 0x555555615292 0x7ffff01f50b3 0x5555556143ae
tcmalloc: large alloc 1211924480 bytes == 0x5555d7008000 @  0x7ffff7dc3680 0x7ffff7de3ff4 0x5555556d78b0 0x55555568d07c 0x555555623315 0x555555615a7f 0x555555615292 0x7ffff01f50b3 0x5555556143ae
.done.
Full index fits in RAM budget, should consume at most 1.10047GiBs, so building in one shot
Number of frozen points = 0
Reading bin file ./index_prepped_base.bin ...Metadata: #pts = 1183514, #dims = 101, aligned_dim = 104...allocating aligned memory, 492341824 bytes...done. Copying data... done.
Using AVX2 distance computation
Starting index build...
Number of syncs: 289
build_disk_index: /home/dnc/workspace/github/DiskANN/src/index.cpp:638: void diskann::Index<T, TagT>::batch_inter_insert(unsigned int, const std::vector<unsigned int>&, const diskann::Parameters&, std::vector<unsigned int>&) [with T = float; TagT = int]: Assertion `des >= 0 && des < _max_points' failed.
tcmalloc: large alloc 93825002520576 bytes == (nil) @  0x7ffff7dc3680 0x7ffff7de3ff4 0x555555635a68 0x55555563111f 0x55555562b56e 0x55555567907d 0x555555676a66 0x5555556739ff 0x55555566f8d1 0x55555566a688 0x55555567cde9 0x7ffff07e669b 0x7ffff0837ed3 0x7ffff07fa726 0x7ffff07f971c 0x7ffff083830b 0x7ffff0197609 0x7ffff02f0293

Thread 1 "build_disk_inde" received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) 

Looks like des is outside the expected bounds. Hope this helps!
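
If useful, I can also inspect the values behind the failed assertion in the same gdb session right after it fires, e.g. (the frame number is illustrative; I would pick the batch_inter_insert frame from bt):

(gdb) bt
(gdb) frame 4
(gdb) print des
(gdb) print _max_points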

@ShikharJ
Contributor

@dncc Sorry for the late reply. Could you please run bt in the gdb terminal right after generating this output? That should give us the backtrace. My feeling is that this is either a dataset issue or something is wrong with the tcmalloc installation.

@dncc
Author

dncc commented Feb 18, 2022

@ShikharJ Here's the backtrace output:

Thread 4 "build_disk_inde" received signal SIGSEGV, Segmentation fault.                                  
[Switching to Thread 0x7fffeef02880 (LWP 3314936)]                                                       
0x0000555555630cba in __gnu_cxx::new_allocator<unsigned int>::construct<unsigned int, unsigned int&> (this=0x555593f96130, __p=0x88) at /usr/include/c++/9/ext/new_allocator.h:147                                 
147             { ::new((void *)__p) _Up(std::forward<_Args>(__args)...); }

(gdb) bt                          
#0  0x0000555555630cba in __gnu_cxx::new_allocator<unsigned int>::construct<unsigned int, unsigned int&> (this=0x555593f96130, __p=0x88) at /usr/include/c++/9/ext/new_allocator.h:147                             
#1  0x000055555562abae in std::allocator_traits<std::allocator<unsigned int> >::construct<unsigned int, unsigned int&> (__a=..., __p=0x88) at /usr/include/c++/9/bits/alloc_traits.h:484                           
#2  0x000055555562ac84 in std::vector<unsigned int, std::allocator<unsigned int> >::_M_realloc_insert<unsigned int&> (this=0x555593f96130, __position=non-dereferenceable iterator for std::vector)                
    at /usr/include/c++/9/bits/vector.tcc:449
#3  0x0000555555628b96 in std::vector<unsigned int, std::allocator<unsigned int> >::emplace_back<unsigned int&> (this=0x555593f96130) at /usr/include/c++/9/bits/vector.tcc:121                                    
#4  0x0000555555648e44 in diskann::Index<float, int>::prune_neighbors (this=0x55555a0bc000, location=354, pool=std::vector of length 1, capacity 1000 = {...}, parameter=...,                                      
    pruned_list=std::vector of length 1, capacity 1 = {...}) at /home/dnc/workspace/github/DiskANN/src/index.cpp:609                                                                                               
#5  0x000055555567c8b7 in diskann::Index<float, int>::_ZN7diskann5IndexIfiE4linkERNS_10ParametersE._omp_fn.0(void) () at /home/dnc/workspace/github/DiskANN/src/index.cpp:865                                      
#6  0x00007ffff07e669b in __kmp_GOMP_microtask_wrapper (gtid=0x4, npr=0x88, task=0x7fffeef01860, data=0x88) at ../../src/kmp_gsupport.cpp:331                                                                      
#7  0x00007ffff0837ed3 in __kmp_invoke_microtask () from /opt/intel/compilers_and_libraries/linux/lib/intel64/libiomp5.so                                                                                          
#8  0x00007ffff07fa726 in __kmp_invoke_task_func (gtid=4) at ../../src/kmp_runtime.cpp:7421                                                                                                                        
#9  0x00007ffff07f971c in __kmp_launch_thread (this_thr=0x4) at ../../src/kmp_runtime.cpp:6008           
#10 0x00007ffff083830b in _INTERNAL_26_______src_z_Linux_util_cpp_20354e55::__kmp_launch_worker (thr=0x4) at ../../src/z_Linux_util.cpp:585                                                                        
#11 0x00007ffff0197609 in start_thread (arg=<optimized out>) at pthread_create.c:477                     
#12 0x00007ffff02f0293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95                                                                                                                                 
(gdb)                          

@ShikharJ
Contributor

@dncc This makes me suspect even more strongly that the error is due to a faulty installation. Could you try these instructions instead, please?
