[Beta][Clarification]How does RAG work? #12748

N3tN00b3r · 2026-05-08T09:53:53Z

N3tN00b3r
May 8, 2026

Hi, I apologize in advance if my question is trivial or if I'm making incorrect assumptions. I am just a user, not a developer.

I configured my paperless-ngx instance with an OpenAI-like LLM (Gemini) and HuggingFace as the local embedding backend. Document suggestions are generated correctly, and indexing also completes for newly uploaded documents.

However, I'm having some problems using RAG (document chat). My documents are typically quite large technical manuals. Reading the logs, I noticed that the document context passed to the LLM along with the question is truncated to 15,000 characters. It also seems that a smaller context window is set somewhere else, which causes an error when that limit is exceeded.

**[2026-05-08 08:19:53,924] [INFO] [paperless_ai.chat] Truncating single-document context from 96628 to 15000 characters**
[2026-05-08 08:19:53,938] [DEBUG] [paperless_ai.chat] Document chat prompt: Context information is below. --------------------- TITLE: Liquiline CM442/CM444/CM448 - Trasmettitore digitale multiparametro TI00444C/16/IT/23.19 Products Solutions Services 71481890 2019-11-30 Informazioni tecniche Liquiline CM442/CM444/CM448 Trasmettitore digitale multiparametro con un massimo di otto canali di misura basato su tecnologia Memosens digitale Per il monitoraggio e il controllo dei processi dell'industria e del settore ambientale Applicazione • Possibilità di selezionare una funzione di pulizia, un etc.

Answer:
[2026-05-08 08:19:54,022] [ERROR] [_granian.utils] Application callable raised an exception
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/granian/_futures.py", line 15, in future_watcher
    await inner(watcher.scope, watcher.proto)
  File "/usr/local/lib/python3.12/site-packages/channels/routing.py", line 48, in __call__
    return await application(scope, receive, send)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/django/core/handlers/asgi.py", line 166, in __call__
    await self.handle(scope, receive, send)
  File "/usr/local/lib/python3.12/site-packages/django/core/handlers/asgi.py", line 212, in handle
    task.result()
  File "/usr/local/lib/python3.12/site-packages/django/core/handlers/asgi.py", line 191, in process_request
    await self.send_response(response, send)
  File "/usr/local/lib/python3.12/site-packages/django/core/handlers/asgi.py", line 336, in send_response
    async for part in content:
  File "/usr/local/lib/python3.12/site-packages/django/http/response.py", line 542, in __aiter__
    for part in await sync_to_async(list)(self.streaming_content):
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/asgiref/sync.py", line 504, in __call__
    ret = await asyncio.shield(exec_coro)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/concurrent/futures/thread.py", line 59, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/asgiref/sync.py", line 557, in thread_handler
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/django/utils/text.py", line 383, in compress_sequence
    for item in sequence:
                ^^^^^^^^
  File "/usr/src/paperless/src/paperless_ai/chat.py", line 151, in stream_chat_with_documents
    response_stream = query_engine.query(prompt)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/llama_index_instrumentation/dispatcher.py", line 413, in wrapper
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/llama_index/core/base/base_query_engine.py", line 44, in query
    query_result = self._query(str_or_query_bundle)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/llama_index_instrumentation/dispatcher.py", line 413, in wrapper
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/llama_index/core/query_engine/retriever_query_engine.py", line 197, in _query
    response = self._response_synthesizer.synthesize(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/llama_index_instrumentation/dispatcher.py", line 413, in wrapper
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/llama_index/core/response_synthesizers/base.py", line 235, in synthesize
    response_str = self.get_response(
                   ^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/llama_index_instrumentation/dispatcher.py", line 413, in wrapper
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/llama_index/core/response_synthesizers/compact_and_refine.py", line 42, in get_response
    new_texts = self._make_compact_text_chunks(query_str, text_chunks)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/llama_index/core/response_synthesizers/compact_and_refine.py", line 57, in _make_compact_text_chunks
    return self._prompt_helper.repack(max_prompt, text_chunks, llm=self._llm)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/llama_index/core/indices/prompt_helper.py", line 307, in repack
    text_splitter = self.get_text_splitter_given_prompt(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/llama_index/core/indices/prompt_helper.py", line 261, in get_text_splitter_given_prompt
    chunk_size = self._get_available_chunk_size(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/llama_index/core/indices/prompt_helper.py", line 243, in _get_available_chunk_size
    available_context_size = self._get_available_context_size(num_prompt_tokens)
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/llama_index/core/indices/prompt_helper.py", line 163, in _get_available_context_size
    raise ValueError(
**ValueError: Calculated available context size -3831 was not non-negative.**
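For anyone reading the traceback: a minimal sketch of the arithmetic that seems to be behind the final `ValueError`, written in plain Python rather than calling llama_index. It assumes the check is "context window minus prompt tokens minus tokens reserved for the answer". The function name and all three numbers are hypothetical; the numbers were picked only so the result matches the -3831 in the log, since the actual configured values are not shown there.

```python
def available_context_size(context_window: int, num_prompt_tokens: int, num_output: int) -> int:
    """Tokens left for retrieved text after reserving room for the prompt and the answer."""
    remaining = context_window - num_prompt_tokens - num_output
    if remaining < 0:
        raise ValueError(
            f"Calculated available context size {remaining} was not non-negative."
        )
    return remaining


# Hypothetical values, chosen only so the arithmetic reproduces the -3831 from the log.
CONTEXT_WINDOW = 3900  # assumed context window, in tokens
NUM_OUTPUT = 256       # assumed tokens reserved for the answer
PROMPT_TOKENS = 7475   # assumed token count of the prompt carrying the 15,000-character context

try:
    available_context_size(CONTEXT_WINDOW, PROMPT_TOKENS, NUM_OUTPUT)
    error_message = ""
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

Under these assumptions the prompt alone is larger than the window, so the error fires before any request reaches the LLM. Either a larger context window or a shorter prompt would bring the result back above zero.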

Question 1: Why are 15,000 characters of the document included in the prompt context? Shouldn't RAG retrieve semantic matches directly from the vector database?
Question 2: Wouldn't it be better if RAG-related values such as "context window", "chunk size", and "chunk overlap" could be customized?
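
To make the terms in the two questions concrete, here is a toy sketch of what "chunk size", "chunk overlap", and retrieving semantic matches usually mean in a RAG pipeline. It is not how paperless-ngx or llama_index implement it: the word-count "embedding" is a stand-in for a real embedding model, and every function name here is hypothetical.

```python
import math
from collections import Counter


def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into character windows of chunk_size that overlap by chunk_overlap."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, chunks: list[str], top_k: int) -> list[str]:
    """Return the top_k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:top_k]


manual = (
    "The transmitter supports up to eight measuring channels. "
    "Cleaning functions can be selected per channel. "
    "The housing is rated for outdoor installation."
)
chunks = chunk_text(manual, chunk_size=60, chunk_overlap=15)
top = retrieve("how many measuring channels", chunks, top_k=1)
print(top[0])
```

With retrieval like this, only the best-matching chunks go into the prompt instead of a fixed-length slice of the whole document, which is the behavior Question 1 asks about.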

Answered by shamoon

shamoon · 2026-05-08T20:01:28Z

shamoon
May 8, 2026
Maintainer

If you'd like to test out the newer build, #12751 should resolve this (I hope)

1 reply

N3tN00b3r May 9, 2026
Author

I confirm that it works. Thank you for the effort.

p7996619 · 2026-05-09T15:24:55Z

p7996619
May 9, 2026

Asking here because it fits the title... Can someone confirm whether RAG embedding with HuggingFace runs completely locally, apart from the model download? In other words, is no part of any document sent to an inference API or other cloud service?


2026-05-10T04:22:20Z

github-actions[bot]
Bot May 10, 2026

This discussion has been automatically closed because it was marked as answered. Please see our contributing guidelines for more details.


2026-06-09T04:26:41Z

github-actions[bot]
Bot Jun 9, 2026

This discussion has been automatically locked since there has not been any recent activity after it was closed. Please open a new discussion for related concerns. See our contributing guidelines for more details.
