
RuntimeError: Error in void faiss::gpu::GpuIndexIVFPQ::verifySettings_() #24

Open
0x7o opened this issue May 17, 2022 · 3 comments


@0x7o

0x7o commented May 17, 2022

This error occurs when trying to use TrainingWrapper. If the training data totals 1 MB, no error occurs; with larger data, this error appears.

Apparently the script is trying to process all the data at once rather than in batches, so the system runs out of resources.

RAM: 12 GB
VRAM: 12 GB

import torch
from retro_pytorch import RETRO, TrainingWrapper

retro = RETRO(
    chunk_size = 64,                         # the chunk size that is indexed and retrieved (needed for proper relative positions as well as causal chunked cross attention)
    max_seq_len = 2048,                      # max sequence length
    enc_dim = 896,                           # encoder model dim
    enc_depth = 2,                           # encoder depth
    dec_dim = 796,                           # decoder model dim
    dec_depth = 12,                          # decoder depth
    dec_cross_attn_layers = (3, 6, 9, 12),   # decoder cross attention layers (with causal chunk cross attention)
    heads = 8,                               # attention heads
    dim_head = 64,                           # dimension per head
    dec_attn_dropout = 0.25,                 # decoder attention dropout
    dec_ff_dropout = 0.25,                   # decoder feedforward dropout
    use_deepnet = True                       # turn on post-normalization with DeepNet residual scaling and initialization, for scaling to 1000 layers
).cuda()

wrapper = TrainingWrapper(
    retro = retro,                                 # the RETRO instance defined above
    knn = 2,                                       # knn (2 in paper was sufficient)
    chunk_size = 64,                               # chunk size (64 in paper)
    documents_path = '/content/text/',             # path to folder of text
    glob = '**/*.txt',                             # text glob
    chunks_memmap_path = './train.chunks.dat',     # path to chunks
    seqs_memmap_path = './train.seq.dat',          # path to sequence data
    doc_ids_memmap_path = './train.doc_ids.dat',   # path to document ids per chunk (used for filtering neighbors belonging to same document)
    max_chunks = 500_000,                          # maximum cap to chunks
    max_seqs = 100_000,                            # maximum seqs
    knn_extra_neighbors = 100,                     # num extra neighbors to fetch
    max_index_memory_usage = '100m',
    current_memory_available = '10G'
)

Out:

processing /content/text/kxaa.txt
Downloading: "https://github.com/huggingface/pytorch-transformers/archive/main.zip" to /root/.cache/torch/hub/main.zip
Downloading: 100% 29.0/29.0 [00:00<00:00, 662B/s]
Downloading: 100% 570/570 [00:00<00:00, 14.6kB/s]
Downloading: 100% 208k/208k [00:00<00:00, 2.26MB/s]
Downloading: 100% 426k/426k [00:00<00:00, 4.60MB/s]
Token indices sequence length is longer than the specified maximum sequence length for this model (3449121 > 512). Running this sequence through the model will result in indexing errors
Using cache found in /root/.cache/torch/hub/huggingface_pytorch-transformers_main
Downloading: 100% 416M/416M [00:09<00:00, 50.3MB/s]
Some weights of the model checkpoint at bert-base-cased were not used when initializing BertModel: ['cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.seq_relationship.weight', 'cls.predictions.decoder.weight', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.dense.weight']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).

embedded XXXXX / 53893
saved .tmp/embeddings/XXXXX.npy
2022-05-17 02:34:09,316 [INFO]: Using 2 omp threads (processes), consider increasing --nb_cores if you have more
2022-05-17 02:34:09,317 [INFO]: Launching the whole pipeline 05/17/2022, 02:34:09
2022-05-17 02:34:09,321 [INFO]: Reading total number of vectors and dimension 05/17/2022, 02:34:09
100%|██████████| 108/108 [00:00<00:00, 5336.89it/s]
2022-05-17 02:34:09,465 [INFO]: There are 53893 embeddings of dim 768
2022-05-17 02:34:09,466 [INFO]: >>> Finished "Reading total number of vectors and dimension" in 0.1405 secs
2022-05-17 02:34:09,471 [INFO]: 	Compute estimated construction time of the index 05/17/2022, 02:34:09
2022-05-17 02:34:09,474 [INFO]: 		-> Train: 16.7 minutes
2022-05-17 02:34:09,478 [INFO]: 		-> Add: 0.5 seconds
2022-05-17 02:34:09,480 [INFO]: 		Total: 16.7 minutes
2022-05-17 02:34:09,481 [INFO]: 	>>> Finished "Compute estimated construction time of the index" in 0.0070 secs
2022-05-17 02:34:09,484 [INFO]: 	Checking that your have enough memory available to create the index 05/17/2022, 02:34:09
2022-05-17 02:34:09,487 [INFO]: 541.5MB of memory will be needed to build the index (more might be used if you have more)
2022-05-17 02:34:09,488 [INFO]: 	>>> Finished "Checking that your have enough memory available to create the index" in 0.0025 secs
2022-05-17 02:34:09,489 [INFO]: 	Selecting most promising index types given data characteristics 05/17/2022, 02:34:09
2022-05-17 02:34:09,490 [INFO]: 	>>> Finished "Selecting most promising index types given data characteristics" in 0.0002 secs
2022-05-17 02:34:09,499 [INFO]: 	Creating the index 05/17/2022, 02:34:09
2022-05-17 02:34:09,500 [INFO]: 		-> Instanciate the index OPQ256_1024,IVF1024_HNSW32,PQ256x8 05/17/2022, 02:34:09
2022-05-17 02:34:09,509 [INFO]: 		>>> Finished "-> Instanciate the index OPQ256_1024,IVF1024_HNSW32,PQ256x8" in 0.0089 secs
2022-05-17 02:34:09,510 [INFO]: The index size will be approximately 18.2MB
2022-05-17 02:34:09,512 [INFO]: 		-> Extract training vectors 05/17/2022, 02:34:09
2022-05-17 02:34:09,513 [INFO]: Will use 53893 vectors to train the index, that will use 903.8MB of memory
 99%|█████████▉| 107/108 [00:00<00:00, 521.43it/s]
2022-05-17 02:34:09,732 [INFO]: 		>>> Finished "-> Extract training vectors" in 0.2194 secs
2022-05-17 02:34:10,226 [INFO]: 	>>> Finished "Creating the index" in 0.7267 secs
2022-05-17 02:34:10,228 [INFO]: >>> Finished "Launching the whole pipeline" in 0.9070 secs
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-6-d42557af9f46> in <module>()
     13     knn_extra_neighbors = 100,                     # num extra neighbors to fetch
     14     max_index_memory_usage = '100m',
---> 15     current_memory_available = '10G'
     16 )

6 frames
/usr/local/lib/python3.7/dist-packages/faiss/swigfaiss.py in index_cpu_to_gpu(provider, device, index, options)
  10273 def index_cpu_to_gpu(provider, device, index, options=None):
  10274     r""" converts any CPU index that can be converted to GPU"""
> 10275     return _swigfaiss.index_cpu_to_gpu(provider, device, index, options)
  10276 
  10277 def index_cpu_to_gpu_multiple(provider, devices, index, options=None):

RuntimeError: Error in void faiss::gpu::GpuIndexIVFPQ::verifySettings_() const at /project/faiss/faiss/gpu/GpuIndexIVFPQ.cu:428: Error: 'ivfpqConfig_.interleavedLayout || IVFPQ::isSupportedPQCodeLength(subQuantizers_)' failed: Number of bytes per encoded vector / sub-quantizers (256) is not supported
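The last line appears to point at a configuration limit rather than at memory exhaustion. According to the log, autofaiss chose the index `OPQ256_1024,IVF1024_HNSW32,PQ256x8`; its `PQ256x8` component means 256 sub-quantizers of 8 bits each, i.e. 256 bytes per encoded vector, and the failing assertion `IVFPQ::isSupportedPQCodeLength(subQuantizers_)` rejects that length when `index_cpu_to_gpu` builds a `GpuIndexIVFPQ` without the interleaved layout. A minimal sketch of that check follows. The helpers `pq_code_bytes` and `gpu_ivfpq_supported` are hypothetical (not part of retro_pytorch, autofaiss or Faiss), and `ASSUMED_GPU_PQ_BYTES` is our recollection of the code lengths the GPU IVFPQ implementation accepted at the time, so verify it against your installed Faiss version.

```python
import re
from typing import Optional

# Assumption: PQ code lengths (bytes per vector) accepted by the non-interleaved
# GpuIndexIVFPQ layout. Check IVFPQ::isSupportedPQCodeLength in your Faiss version.
ASSUMED_GPU_PQ_BYTES = {1, 2, 3, 4, 8, 12, 16, 20, 24, 28, 32, 40, 48, 56, 64, 96}


def pq_code_bytes(index_key: str) -> Optional[int]:
    """Bytes per encoded vector for the PQ<M>x<nbits> part of a factory string.

    Returns None when the key has no plain PQ component (e.g. a Flat index).
    """
    for component in index_key.split(","):
        match = re.fullmatch(r"PQ(\d+)(?:x(\d+))?", component.strip())
        if match:
            m = int(match.group(1))            # number of sub-quantizers
            nbits = int(match.group(2) or 8)   # bits per sub-quantizer (factory default: 8)
            return (m * nbits + 7) // 8        # round up to whole bytes
    return None


def gpu_ivfpq_supported(index_key: str) -> bool:
    """Hypothetical check: is this key's PQ code length usable on the GPU?

    Keys without a PQ component are not IVFPQ indexes, so the check does not apply.
    """
    code_bytes = pq_code_bytes(index_key)
    return code_bytes is None or code_bytes in ASSUMED_GPU_PQ_BYTES


key = "OPQ256_1024,IVF1024_HNSW32,PQ256x8"   # index chosen in the log above
print(pq_code_bytes(key))        # 256
print(gpu_ivfpq_supported(key))  # False
```

With 8-bit codes the byte count equals the number of sub-quantizers, which is why the error message reports both as 256.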
0x7o closed this as completed on May 18, 2022
0x7o reopened this on May 18, 2022
@richjames0

richjames0 commented Jun 1, 2022

For anyone else running into this (as I have), there's a fairly obvious workaround: hardcode use_gpu to False in index_embeddings in retrieval.py. I'll post an update if I come up with a proper fix, but this at least let me make progress (after burning a lot of CPU cycles).

@szh-max

szh-max commented May 10, 2023

How did you solve this problem? Thanks, @0x7o.

@debajyotimaz

I don't know the logic behind the solution, but I'm sharing what worked for me. I increased these two memory parameters from:
max_index_memory_usage = '100m',
current_memory_available = '10G'
to:
max_index_memory_usage = '2G',
current_memory_available = '50G'
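One plausible reading of why this helps (our interpretation, not confirmed in the thread): max_index_memory_usage is the size budget autofaiss respects when it selects an index type. The original report's log lists 53893 embeddings of dimension 768; stored uncompressed as float32 they alone take about 165.6 MB (roughly 158 MiB), which exceeds a '100m' budget, so autofaiss has to fall back to a compressed PQ index, here one with 256-byte codes. With '2G' the raw vectors fit, which may let autofaiss pick an index without product quantization. The same log also shows autofaiss's memory check passing with '10G', so raising current_memory_available is probably not what made the difference. The sketch below only does that arithmetic; `memory_to_bytes` and `raw_float32_bytes` are hypothetical helpers, the parser assumes 1024-based units, and neither reproduces autofaiss's actual selection logic.

```python
import re

# Assumption: 1024-based units ("100m" = 100 * 1024**2 bytes). This parser is a
# hypothetical helper for illustration, not autofaiss's own implementation.
UNITS = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def memory_to_bytes(text: str) -> int:
    """Parse strings such as '100m', '2G' or '10GB' into a byte count."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([KMGT]?)B?\s*", text.upper())
    if match is None:
        raise ValueError(f"cannot parse memory size: {text!r}")
    number, unit = match.groups()
    return int(float(number) * UNITS[unit])


def raw_float32_bytes(num_vectors: int, dim: int) -> int:
    """Bytes needed to store the embeddings uncompressed as float32 (4 bytes each)."""
    return num_vectors * dim * 4


raw = raw_float32_bytes(53893, 768)  # counts reported in the autofaiss log
for budget in ("100m", "2G"):
    fits = raw <= memory_to_bytes(budget)
    print(f"{budget}: raw vectors need {raw / 1024**2:.1f} MiB, fit in budget: {fits}")
```

For these numbers the loop reports that the raw vectors (157.9 MiB) do not fit in '100m' but do fit in '2G'.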


4 participants