Hello @snexus,

Thank you very much for creating this project!

I am using Google Colab with the provided template and have ingested 20 PDF documents. The embeddings were generated without any problems and I can query the LLM, but the response is always truncated; there seems to be an error in generating the response (see below).

Thank you very much for your help!

Here is an example (copied from Google Colab):
```
2024-01-08 13:53:35.106 | INFO | llmsearch.config:validate_params:165 - Loading model paramaters in configuration class LlamaModelConfig
2024-01-08 13:53:35.106 | INFO | llmsearch.utils:set_cache_folder:43 - Setting SENTENCE_TRANSFORMERS_HOME folder: /content/llm/cache
2024-01-08 13:53:35.106 | INFO | llmsearch.utils:set_cache_folder:44 - Setting TRANSFORMERS_CACHE folder: /content/llm/cache/transformers
2024-01-08 13:53:35.106 | INFO | llmsearch.utils:set_cache_folder:45 - Setting HF_HOME: /content/llm/cache/hf_home
2024-01-08 13:53:35.106 | INFO | llmsearch.utils:set_cache_folder:46 - Setting MODELS_CACHE_FOLDER: /content/llm/cache
2024-01-08 13:53:35.106 | INFO | llmsearch.models.llama:model:134 - Loading model...
2024-01-08 13:53:35.107 | INFO | llmsearch.models.llama:model:137 - Initializing LLAmaCPP model...
2024-01-08 13:53:35.107 | INFO | llmsearch.models.llama:model:138 - {'n_ctx': 1024, 'n_batch': 512, 'n_gpu_layers': 43}
ggml_init_cublas: GGML_CUDA_FORCE_MMQ: no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 1 CUDA devices:
Device 0: Tesla T4, compute capability 7.5, VMM: yes
llama_model_loader: loaded meta data with 19 key-value pairs and 363 tensors from /content/llm/models/airoboros-l2-13b-gpt4-1.4.1.Q4_K_M.gguf (version GGUF V2)
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.name str = LLaMA v2
llama_model_loader: - kv 2: llama.context_length u32 = 4096
llama_model_loader: - kv 3: llama.embedding_length u32 = 5120
llama_model_loader: - kv 4: llama.block_count u32 = 40
llama_model_loader: - kv 5: llama.feed_forward_length u32 = 13824
llama_model_loader: - kv 6: llama.rope.dimension_count u32 = 128
llama_model_loader: - kv 7: llama.attention.head_count u32 = 40
llama_model_loader: - kv 8: llama.attention.head_count_kv u32 = 40
llama_model_loader: - kv 9: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 10: general.file_type u32 = 15
llama_model_loader: - kv 11: tokenizer.ggml.model str = llama
llama_model_loader: - kv 12: tokenizer.ggml.tokens arr[str,32000] = ["", "", "", "<0x00>", "<...
llama_model_loader: - kv 13: tokenizer.ggml.scores arr[f32,32000] = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv 14: tokenizer.ggml.token_type arr[i32,32000] = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
llama_model_loader: - kv 15: tokenizer.ggml.bos_token_id u32 = 1
llama_model_loader: - kv 16: tokenizer.ggml.eos_token_id u32 = 2
llama_model_loader: - kv 17: tokenizer.ggml.unknown_token_id u32 = 0
llama_model_loader: - kv 18: general.quantization_version u32 = 2
llama_model_loader: - type f32: 81 tensors
llama_model_loader: - type q4_K: 241 tensors
llama_model_loader: - type q6_K: 41 tensors
llm_load_vocab: special tokens definition check successful ( 259/32000 ).
llm_load_print_meta: format = GGUF V2
llm_load_print_meta: arch = llama
llm_load_print_meta: vocab type = SPM
llm_load_print_meta: n_vocab = 32000
llm_load_print_meta: n_merges = 0
llm_load_print_meta: n_ctx_train = 4096
llm_load_print_meta: n_embd = 5120
llm_load_print_meta: n_head = 40
llm_load_print_meta: n_head_kv = 40
llm_load_print_meta: n_layer = 40
llm_load_print_meta: n_rot = 128
llm_load_print_meta: n_embd_head_k = 128
llm_load_print_meta: n_embd_head_v = 128
llm_load_print_meta: n_gqa = 1
llm_load_print_meta: n_embd_k_gqa = 5120
llm_load_print_meta: n_embd_v_gqa = 5120
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-05
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: n_ff = 13824
llm_load_print_meta: n_expert = 0
llm_load_print_meta: n_expert_used = 0
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx = 4096
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: model type = 13B
llm_load_print_meta: model ftype = Q4_K - Medium
llm_load_print_meta: model params = 13.02 B
llm_load_print_meta: model size = 7.33 GiB (4.83 BPW)
llm_load_print_meta: general.name = LLaMA v2
llm_load_print_meta: BOS token = 1 ''
llm_load_print_meta: EOS token = 2 ''
llm_load_print_meta: UNK token = 0 ''
llm_load_print_meta: LF token = 13 '<0x0A>'
llm_load_tensors: ggml ctx size = 0.14 MiB
llm_load_tensors: using CUDA for GPU acceleration
llm_load_tensors: system memory used = 88.03 MiB
llm_load_tensors: VRAM used = 7412.96 MiB
llm_load_tensors: offloading 40 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 41/41 layers to GPU
...................................................................................................
llama_new_context_with_model: n_ctx = 1024
llama_new_context_with_model: freq_base = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_new_context_with_model: KV self size = 800.00 MiB, K (f16): 400.00 MiB, V (f16): 400.00 MiB
llama_build_graph: non-view tensors processed: 844/844
llama_new_context_with_model: compute buffer total size = 115.19 MiB
llama_new_context_with_model: VRAM scratch buffer: 112.00 MiB
llama_new_context_with_model: total VRAM used: 7524.96 MiB (model: 7412.96 MiB, context: 112.00 MiB)
AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 1 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 |
2024-01-08 13:53:58.464 | INFO | llmsearch.embeddings:get_embedding_model:65 - Embedding model config: type=<EmbeddingModelType.instruct: 'instruct'> model_name='hkunlp/instructor-large' additional_kwargs={}
load INSTRUCTOR_Transformer
2024-01-08 13:53:59.968343: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-01-08 13:53:59.968395: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-01-08 13:53:59.975660: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-01-08 13:54:01.951666: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
max_seq_length 512
2024-01-08 13:54:25.921 | INFO | llmsearch.ranking:init:39 - Initialized BGE-base Reranker
2024-01-08 13:54:29.218 | INFO | llmsearch.splade:init:33 - Setting device to cuda:0
2024-01-08 13:54:35.700 | INFO | llmsearch.splade:load:100 - SPLADE: Got 0 labels.
2024-01-08 13:54:35.700 | INFO | llmsearch.splade:load:104 - Loaded sparse (SPLADE) embeddings from /content/llm/embeddings/splade/splade_embeddings.npz
2024-01-08 13:54:35.700 | INFO | llmsearch.utils:get_hyde_chain:110 - Creating HyDE chain...
2024-01-08 13:54:35.701 | INFO | llmsearch.utils:get_multiquery_chain:117 - Creating MultiQUery chain...
ENTER QUESTION >> Could you provide me with some of the best methods for effectively marketing a product
2024-01-08 13:54:39.989 | DEBUG | llmsearch.ranking:get_relevant_documents:84 - Evaluating query: Could you provide me with some of the best methods for effectively marketing a product
2024-01-08 13:54:39.990 | INFO | llmsearch.splade:query:208 - SPLADE search will search over all documents of chunk size: 1024. Number of docs: 1519
[0.00914291 0.0224314 0.01972341 ... 0.02577811 0.03467241 0.02387246]
2024-01-08 13:54:42.806 | INFO | llmsearch.ranking:get_relevant_documents:92 - Stage 1: Got 15 documents.
2024-01-08 13:54:42.806 | INFO | llmsearch.ranking:get_relevant_documents:104 - Dense embeddings filter: None
2024-01-08 13:54:44.367 | DEBUG | llmsearch.ranking:get_relevant_documents:113 - NUMBER OF NEW DOCS to RETRIEVE: 25
2024-01-08 13:54:44.382 | INFO | llmsearch.ranking:rerank:51 - Reranking documents ...
2024-01-08 13:54:44.382 | INFO | llmsearch.ranking:get_scores:42 - Reranking documents ...
[-4.017726898193359, -5.740996360778809, -3.4490861892700195, -6.219937801361084, -0.5473086833953857, -7.063520431518555, -6.515655994415283, -9.21086311340332, -6.386524677276611, -7.356011390686035, -6.924832820892334, -8.909055709838867, -7.650751113891602, -6.111538410186768, -7.745747089385986, -5.9694342613220215, -7.448235988616943, -6.252921104431152, -6.285423278808594, -6.576879501342773, -7.744513511657715, -8.150556564331055, -6.460150718688965, -7.074395179748535, -4.118349552154541]
2024-01-08 13:55:04.713 | INFO | llmsearch.ranking:rerank:59 - [-0.5473086833953857, -3.4490861892700195, -4.017726898193359, -4.118349552154541, -5.740996360778809, -5.9694342613220215, -6.111538410186768, -6.219937801361084, -6.252921104431152, -6.285423278808594, -6.386524677276611, -6.460150718688965, -6.515655994415283, -6.576879501342773, -6.924832820892334, -7.063520431518555, -7.074395179748535, -7.356011390686035, -7.448235988616943, -7.650751113891602, -7.744513511657715, -7.745747089385986, -8.150556564331055, -8.909055709838867, -9.21086311340332]
2024-01-08 13:55:04.714 | INFO | llmsearch.ranking:get_relevant_documents:131 - New most relevant query: Could you provide me with some of the best methods for effectively marketing a product
2024-01-08 13:55:04.714 | INFO | llmsearch.ranking:get_relevant_documents:138 - Number of documents after stage 2 (dense + sparse): 25
2024-01-08 13:55:04.714 | INFO | llmsearch.ranking:get_relevant_documents:141 - Re-ranker avg. scores for top 5 resuls, chunk size 1024: -3.57
[chain/start] [1:chain:StuffDocumentsChain] Entering Chain run with input:
[inputs]
[chain/start] [1:chain:StuffDocumentsChain > 2:chain:LLMChain] Entering Chain run with input:
{
"question": "Could you provide me with some of the best methods for effectively marketing a product",
"context": "And, this book came out of my studies and experiences, you know, like \nresearching and reading and just living my life in a high pressure, high stakes \nenvironment. And I know that seems weird, but just like the best marketing \ndecision you can make for a product is to have a really good product that people \nwant, the best way to have writing that people want is to live a life and have \nexperienced the world in a way that allows you to communicate something to \npeople that they'd never heard before. \n \nI think it's especially true in fiction because at least in non-fiction, someone can \ngo out and study and objectively find, academics can write good non-fiction \nbooks based on their research. But, non-fiction, you have to be able to \ncommunicate all these intangibles to the reader. \n \nTim Ferriss: \nYou mean in fiction. \n \nRyan Holiday: Yeah, I'm sorry, in fiction. Yeah. You have to communicate all these intangibles \nabout life and relationships and how the world works. And if you haven't gone\n\nbest practices for marketing bestselling books. There are very few consensuses \nabout the best way to write a best-reading book, if that makes sense. \n \nI mean, that's part of the reason why I fell in love with "Daily Rituals," which \nprofiles 170 or so world-famous creatives, whether it's writers, composers, \nscientists, etc. and how their daily schedules are laid out because they're so \ndifferent. It's really fascinating to me. \n \nDo you watch documentaries? If so, what are your favorite documentaries that \ncome to mind? \n \nRyan Holiday: I love documentaries. But I don't watch that much TV. So, I don't get to watch \nas many as I like because... yeah. But some favorites, I like "Fog of War," I \nthink is amazing. \n \nThat Phil Spector documentary from a couple years ago is pretty crazy. I think \nit's called "The Wall of Sound," but I forget what it's called exactly. There's the \nguy who did Fog of War has a new one out about Donald Rumsfeld that I want \nto see called the Unknown Known.\n\nSo, I want to talk about the most effective pair of productivity techniques that I have \ncome across since 2004 that have helped me up until this point test the uncommon \ndespite the fear of ridicule, criticism, failure, and so forth. And both techniques – I \ncheated a bit with the format. Some things we will repeat – are borrowed from stoicism, \nwhich was a school of philosophy from the Hellenistic period used by a lot of the Greco \nroman educated elite, including emperors, and military, and statesmen.\n\nthey wanted to plug by coming on the show? \n \nNeil Strauss: \nOh, yeah. I'll plug for you. I'll always tell somebody, and this is \ntrue: When you're going on, and you're trying to promote your \nbusiness, or your brand, or your book, or movie – whatever you're"
}
[llm/start] [1:chain:StuffDocumentsChain > 2:chain:LLMChain > 3:llm:CustomLlamaLangChainModel] Entering LLM run with input:
{
"prompts": [
"### Instruction:\nUse the following pieces of context to provide detailed answer the question at the end. If answer isn't in the context, say that you don't know, don't try to make up an answer.\n\n### Context:\n---------------\nAnd, this book came out of my studies and experiences, you know, like \nresearching and reading and just living my life in a high pressure, high stakes \nenvironment. And I know that seems weird, but just like the best marketing \ndecision you can make for a product is to have a really good product that people \nwant, the best way to have writing that people want is to live a life and have \nexperienced the world in a way that allows you to communicate something to \npeople that they'd never heard before. \n \nI think it's especially true in fiction because at least in non-fiction, someone can \ngo out and study and objectively find, academics can write good non-fiction \nbooks based on their research. But, non-fiction, you have to be able to \ncommunicate all these intangibles to the reader. \n \nTim Ferriss: \nYou mean in fiction. \n \nRyan Holiday: Yeah, I'm sorry, in fiction. Yeah. You have to communicate all these intangibles \nabout life and relationships and how the world works. And if you haven't gone\n\nbest practices for marketing bestselling books. There are very few consensuses \nabout the best way to write a best-reading book, if that makes sense. \n \nI mean, that's part of the reason why I fell in love with "Daily Rituals," which \nprofiles 170 or so world-famous creatives, whether it's writers, composers, \nscientists, etc. and how their daily schedules are laid out because they're so \ndifferent. It's really fascinating to me. \n \nDo you watch documentaries? If so, what are your favorite documentaries that \ncome to mind? \n \nRyan Holiday: I love documentaries. But I don't watch that much TV. So, I don't get to watch \nas many as I like because... yeah. But some favorites, I like "Fog of War," I \nthink is amazing. \n \nThat Phil Spector documentary from a couple years ago is pretty crazy. I think \nit's called "The Wall of Sound," but I forget what it's called exactly. There's the \nguy who did Fog of War has a new one out about Donald Rumsfeld that I want \nto see called the Unknown Known.\n\nSo, I want to talk about the most effective pair of productivity techniques that I have \ncome across since 2004 that have helped me up until this point test the uncommon \ndespite the fear of ridicule, criticism, failure, and so forth. And both techniques – I \ncheated a bit with the format. Some things we will repeat – are borrowed from stoicism, \nwhich was a school of philosophy from the Hellenistic period used by a lot of the Greco \nroman educated elite, including emperors, and military, and statesmen.\n\nthey wanted to plug by coming on the show? \n \nNeil Strauss: \nOh, yeah. I'll plug for you. I'll always tell somebody, and this is \ntrue: When you're going on, and you're trying to promote your \nbusiness, or your brand, or your book, or movie – whatever you're\n---------------\n\n### Question: Could you provide me with some of the best methods for effectively marketing a product\n### Response:"
]
}
The most effective pair of productivity techniques that I have come across since 2004 that have helped me up until this point test the uncommon despite the fear of ridicule, criticism, failure, and so forth. And both techniques – I cheated a bit with the format. Some things we will repeat – are borrowed from stoicism, which was a school of philosophy from the Hellenistic period used by a lot of the Greco roman educated elite, including emperors, and military, and statesmen.
They wanted to plug by coming on the show?
Oh, yeah. I'll plug for you. I'll always tell somebody, and
llama_print_timings: load time = 8475.15 ms
llama_print_timings: sample time = 92.82 ms / 144 runs ( 0.64 ms per token, 1551.32 tokens per second)
llama_print_timings: prompt eval time = 16788.27 ms / 880 tokens ( 19.08 ms per token, 52.42 tokens per second)
llama_print_timings: eval time = 29640.72 ms / 143 runs ( 207.28 ms per token, 4.82 tokens per second)
llama_print_timings: total time = 47041.79 ms
[llm/end] [1:chain:StuffDocumentsChain > 2:chain:LLMChain > 3:llm:CustomLlamaLangChainModel] [47.06s] Exiting LLM run with output:
{
"generations": [
[
{
"text": "The most effective pair of productivity techniques that I have come across since 2004 that have helped me up until this point test the uncommon despite the fear of ridicule, criticism, failure, and so forth. And both techniques – I cheated a bit with the format. Some things we will repeat – are borrowed from stoicism, which was a school of philosophy from the Hellenistic period used by a lot of the Greco roman educated elite, including emperors, and military, and statesmen.\n\nThey wanted to plug by coming on the show? \nOh, yeah. I'll plug for you. I'll always tell somebody, and",
"generation_info": null
}
]
],
"llm_output": null,
"run": null
}
[chain/end] [1:chain:StuffDocumentsChain > 2:chain:LLMChain] [47.06s] Exiting Chain run with output:
{
"text": "The most effective pair of productivity techniques that I have come across since 2004 that have helped me up until this point test the uncommon despite the fear of ridicule, criticism, failure, and so forth. And both techniques – I cheated a bit with the format. Some things we will repeat – are borrowed from stoicism, which was a school of philosophy from the Hellenistic period used by a lot of the Greco roman educated elite, including emperors, and military, and statesmen.\n\nThey wanted to plug by coming on the show? \nOh, yeah. I'll plug for you. I'll always tell somebody, and"
}
[chain/end] [1:chain:StuffDocumentsChain] [47.06s] Exiting Chain run with output:
{
"output_text": "The most effective pair of productivity techniques that I have come across since 2004 that have helped me up until this point test the uncommon despite the fear of ridicule, criticism, failure, and so forth. And both techniques – I cheated a bit with the format. Some things we will repeat – are borrowed from stoicism, which was a school of philosophy from the Hellenistic period used by a lot of the Greco roman educated elite, including emperors, and military, and statesmen.\n\nThey wanted to plug by coming on the show? \nOh, yeah. I'll plug for you. I'll always tell somebody, and"
}
============= SOURCES ==================
sample_docs/15-neil-strauss.pdf
{'chunk_size': 1024, 'document_id': 'dcbf9a82-ae20-11ee-b544-0242ac1c000c', 'label': '', 'page': 20, 'score': -7.356011390686035}
******************* BEING EXTRACT *****************
they wanted to plug by coming on the show?
Neil Strauss:
Oh, yeah. I'll plug for you. I'll always tell somebody, and this is
true: When you're going on, and you're trying to promote your
business, or your brand, or your book, or movie – whatever you're
sample_docs/17-tim-ferriss-the-power-of-negative-visualization.pdf
{'chunk_size': 1024, 'document_id': 'dc58d00e-ae20-11ee-b544-0242ac1c000c', 'label': '', 'page': 0, 'score': -4.017726898193359}
******************* BEING EXTRACT *****************
So, I want to talk about the most effective pair of productivity techniques that I have
come across since 2004 that have helped me up until this point test the uncommon
despite the fear of ridicule, criticism, failure, and so forth. And both techniques – I
cheated a bit with the format. Some things we will repeat – are borrowed from stoicism,
which was a school of philosophy from the Hellenistic period used by a lot of the Greco
roman educated elite, including emperors, and military, and statesmen.
sample_docs/04-ryan-holiday.pdf
{'chunk_size': 1024, 'document_id': 'dca64352-ae20-11ee-b544-0242ac1c000c', 'label': '', 'page': 19, 'score': -3.4490861892700195}
******************* BEING EXTRACT *****************
best practices for marketing bestselling books. There are very few consensuses
about the best way to write a best-reading book, if that makes sense.
I mean, that's part of the reason why I fell in love with "Daily Rituals," which
profiles 170 or so world-famous creatives, whether it's writers, composers,
scientists, etc. and how their daily schedules are laid out because they're so
different. It's really fascinating to me.
Do you watch documentaries? If so, what are your favorite documentaries that
come to mind?
Ryan Holiday: I love documentaries. But I don't watch that much TV. So, I don't get to watch
as many as I like because... yeah. But some favorites, I like "Fog of War," I
think is amazing.
That Phil Spector documentary from a couple years ago is pretty crazy. I think
it's called "The Wall of Sound," but I forget what it's called exactly. There's the
guy who did Fog of War has a new one out about Donald Rumsfeld that I want
to see called the Unknown Known.
sample_docs/04-ryan-holiday.pdf
{'chunk_size': 1024, 'document_id': 'dca64a32-ae20-11ee-b544-0242ac1c000c', 'label': '', 'page': 22, 'score': -0.5473086833953857}
******************* BEING EXTRACT *****************
And, this book came out of my studies and experiences, you know, like
researching and reading and just living my life in a high pressure, high stakes
environment. And I know that seems weird, but just like the best marketing
decision you can make for a product is to have a really good product that people
want, the best way to have writing that people want is to live a life and have
experienced the world in a way that allows you to communicate something to
people that they'd never heard before.
I think it's especially true in fiction because at least in non-fiction, someone can
go out and study and objectively find, academics can write good non-fiction
books based on their research. But, non-fiction, you have to be able to
communicate all these intangibles to the reader.
Tim Ferriss:
You mean in fiction.
Ryan Holiday: Yeah, I'm sorry, in fiction. Yeah. You have to communicate all these intangibles
about life and relationships and how the world works. And if you haven't gone
============= RESPONSE =================
The most effective pair of productivity techniques that I have come across since 2004 that have helped me up until this point test the uncommon despite the fear of ridicule, criticism, failure, and so forth. And both techniques – I cheated a bit with the format. Some things we will repeat – are borrowed from stoicism, which was a school of philosophy from the Hellenistic period used by a lot of the Greco roman educated elite, including emperors, and military, and statesmen.
They wanted to plug by coming on the show?
Oh, yeah. I'll plug for you. I'll always tell somebody, and
ENTER QUESTION >>
```
The config in the notebook is just an example and can be tweaked to fit your use case/model. One limitation, though, is the amount of GPU memory on the free Google Colab tier. Pay attention to the GPU memory available and try to increase the context parameter (n_ctx) without exceeding that memory.
Another option might be to use a smaller model on the free Google Colab.
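To make the trade-off concrete: in the log above the prompt consumed 880 tokens and generation stopped after 144 more, i.e. exactly at the n_ctx = 1024 window, which is why the answer is cut off mid-sentence. Below is a rough, back-of-the-envelope sketch (plain Python, not llmsearch code) estimating how much extra VRAM a larger KV cache would need if n_ctx is raised; the constants are read off the llama.cpp log (40 layers, 5120-wide K/V, f16 cache, ~7.4 GiB of weights), and the ~15 GiB figure for a Colab T4 is an assumption.

```python
# Back-of-the-envelope estimate of KV-cache VRAM for the model in the log above.
# Constants come from llm_load_print_meta / llama_new_context_with_model output;
# this is only a rough sketch, not part of llmsearch.

def kv_cache_mib(n_ctx: int, n_layer: int = 40, n_embd_kv: int = 5120,
                 bytes_per_elem: int = 2) -> float:
    """Size of the f16 K+V cache in MiB for a given context length."""
    # K and V each store n_ctx * n_embd_kv values per layer, 2 bytes per value (f16).
    return 2 * n_layer * n_ctx * n_embd_kv * bytes_per_elem / (1024 ** 2)

model_vram_mib = 7412.96           # "VRAM used" for the weights in the log
t4_vram_mib = 15 * 1024            # assumed ~15 GiB usable on a Colab T4

for n_ctx in (1024, 2048, 4096):   # 4096 is the model's n_ctx_train
    total = model_vram_mib + kv_cache_mib(n_ctx)
    print(f"n_ctx={n_ctx}: KV cache ~ {kv_cache_mib(n_ctx):.0f} MiB, "
          f"total ~ {total:.0f} MiB of ~{t4_vram_mib} MiB")
```

At n_ctx = 1024 this reproduces the 800 MiB "KV self size" from the log, and at n_ctx = 4096 (the model's training context) the estimate still stays well within the T4's memory, so raising the context window in the notebook config should let the answer complete instead of being truncated.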