
[Question]: Parsing a document that is within the model's preset maximum token limit still fails with an error that the input tokens exceed the model's limit. How should this be debugged and resolved? #6291

@biofer

Description


Self Checks

  • I have searched for existing issues, including closed ones.
  • I confirm that I am using English to submit this report (Language Policy).
  • Non-English title submissions will be closed directly (Language Policy).
  • Please do not modify this template :) and fill in all the required fields.

Describe your problem

All services are started with Docker; image version: infiniflow/ragflow:v0.17.2.
The embedding model is SiliconFlow's BAAI/bge-m3, with the maximum token count set to 8192:
BAAI/bge-m3___OpenAI-API@OpenAI-API-Compatible
The document being parsed is within the maximum token limit preset for the model, yet an error is reported saying the input tokens exceed the model's limit. How should this issue be debugged and resolved?
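As a first debugging step, it can help to sanity-check roughly how many tokens the text sent to the model actually produces. The sketch below is a crude heuristic, not SiliconFlow's real tokenizer: CJK characters are counted at about one token each and other text at about four characters per token; `MODEL_MAX_TOKENS` mirrors the 8192 limit configured above.

```python
# Rough pre-flight check: estimate whether a text would exceed the
# model's context limit before sending it to the API. This is a crude
# heuristic, not the provider's actual tokenizer.

MODEL_MAX_TOKENS = 8192  # limit configured in the model settings


def estimate_tokens(text: str) -> int:
    """CJK characters count ~1 token each; other text ~4 chars per token."""
    cjk = sum(1 for ch in text if "\u4e00" <= ch <= "\u9fff")
    other = len(text) - cjk
    return cjk + (other + 3) // 4  # round up the non-CJK portion


def fits_model_limit(text: str, limit: int = MODEL_MAX_TOKENS) -> bool:
    return estimate_tokens(text) <= limit
```

If the estimate for a single prompt comfortably exceeds 8192, the failing requests are oversized prompts rather than a misconfigured limit.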
The front-end error is shown in the attached screenshot.

The log error is as follows:

2025-03-19 19:52:14,987 INFO     27 task_consumer_0 reported heartbeat: {"name": "task_consumer_0", "now": "2025-03-19T19:52:14.985+08:00", "boot_at": "2025-03-19T19:26:31.041+08:00", "pending": 1, "lag": 0, "done": 0, "failed": 0, "current": {"1aa9c6ce04b511f09f760242ac190006": {"id": "1aa9c6ce04b511f09f760242ac190006", "doc_id": "d8786200049211f0a6310242ac180006", "from_page": 100000000, "to_page": 100000000, "retry_count": 0, "kb_id": "c82c222e049211f0922e0242ac180006", "parser_id": "naive", "parser_config": {"auto_keywords": 5, "auto_questions": 0, "chunk_token_num": 512, "delimiter": "\\n!?;\u3002\uff1b\uff01\uff1f", "graphrag": {"entity_types": ["drug", "tumor", "gene", "mutation", "category"], "method": "light", "resolution": "true", "use_graphrag": "true"}, "layout_recognize": "DeepDOC"}, "name": "\u975e\u5c0f\u7ec6\u80de\u80ba\u764c.pdf", "type": "pdf", "location": "\u975e\u5c0f\u7ec6\u80de\u80ba\u764c.pdf", "size": 18991802, "tenant_id": "abaf5490007311f0b5530242ac180006", "language": "Chinese", "embd_id": "BAAI/bge-m3___OpenAI-API@OpenAI-API-Compatible", "pagerank": 0, "kb_parser_config": {"layout_recognize": "DeepDOC", "chunk_token_num": 512, "delimiter": "\\n!?;\u3002\uff1b\uff01\uff1f", "auto_keywords": 5, "auto_questions": 0, "html4excel": false, "raptor": {"use_raptor": false}, "graphrag": {"use_graphrag": "true", "entity_types": ["drug", "tumor", "gene", "mutation", "category"], "method": "light", "resolution": "true"}}, "img2txt_id": "", "asr_id": "", "llm_id": "deepseek-chat___OpenAI-API@OpenAI-API-Compatible", "update_time": 1742383633143, "task_type": "graphrag"}}}
2025-03-19 19:52:21,949 INFO     27 HTTP Request: POST http://192.168.1.56:9002/v1/chat/completions "HTTP/1.1 200 OK"
2025-03-19 19:52:22,342 INFO     27 set_progress(1aa9c6ce04b511f09f760242ac190006), progress: -1, progress_msg: 19:52:22 [ERROR][Exception]: Exceptions from Trio nursery (4 sub-exceptions) -- **ERROR**: Error code: 400 - {'error': {'message': '输入Token超过模型限制 (request id: 20250319195027616899205DVONcrl3)', 'type': 'upstream_error', 'param': '400', 'code': 'bad_response_status_code'}}
2025-03-19 19:52:22,342 ERROR    27 handle_task got exception for task {"id": "1aa9c6ce04b511f09f760242ac190006", "doc_id": "d8786200049211f0a6310242ac180006", "from_page": 100000000, "to_page": 100000000, "retry_count": 0, "kb_id": "c82c222e049211f0922e0242ac180006", "parser_id": "naive", "parser_config": {"auto_keywords": 5, "auto_questions": 0, "chunk_token_num": 512, "delimiter": "\\n!?;\u3002\uff1b\uff01\uff1f", "graphrag": {"entity_types": ["drug", "tumor", "gene", "mutation", "category"], "method": "light", "resolution": "true", "use_graphrag": "true"}, "layout_recognize": "DeepDOC"}, "name": "\u975e\u5c0f\u7ec6\u80de\u80ba\u764c.pdf", "type": "pdf", "location": "\u975e\u5c0f\u7ec6\u80de\u80ba\u764c.pdf", "size": 18991802, "tenant_id": "abaf5490007311f0b5530242ac180006", "language": "Chinese", "embd_id": "BAAI/bge-m3___OpenAI-API@OpenAI-API-Compatible", "pagerank": 0, "kb_parser_config": {"layout_recognize": "DeepDOC", "chunk_token_num": 512, "delimiter": "\\n!?;\u3002\uff1b\uff01\uff1f", "auto_keywords": 5, "auto_questions": 0, "html4excel": false, "raptor": {"use_raptor": false}, "graphrag": {"use_graphrag": "true", "entity_types": ["drug", "tumor", "gene", "mutation", "category"], "method": "light", "resolution": "true"}}, "img2txt_id": "", "asr_id": "", "llm_id": "deepseek-chat___OpenAI-API@OpenAI-API-Compatible", "update_time": 1742383633143, "task_type": "graphrag"}
  + Exception Group Traceback (most recent call last):
  |   File "/ragflow/rag/svr/task_executor.py", line 594, in handle_task
  |     await do_handle_task(task)
  |   File "/ragflow/rag/svr/task_executor.py", line 522, in do_handle_task
  |     await run_graphrag(task, task_language, with_resolution, with_community, chat_model, embedding_model, progress_callback)
  |   File "/ragflow/graphrag/general/index.py", line 94, in run_graphrag
  |     await resolve_entities(
  |   File "/ragflow/graphrag/general/index.py", line 231, in resolve_entities
  |     reso = await er(graph, callback=callback)
  |   File "/ragflow/graphrag/entity_resolution.py", line 101, in __call__
  |     async with trio.open_nursery() as nursery:
  |   File "/ragflow/.venv/lib/python3.10/site-packages/trio/_core/_run.py", line 1058, in __aexit__
  |     raise combined_error_from_nursery
  | exceptiongroup.ExceptionGroup: Exceptions from Trio nursery (4 sub-exceptions)
  +-+---------------- 1 ----------------
    | Traceback (most recent call last):
    |   File "/ragflow/graphrag/entity_resolution.py", line 176, in _resolve_candidate
    |     response = await trio.to_thread.run_sync(lambda: self._chat(text, [{"role": "user", "content": "Output:"}], gen_conf))
    |   File "/ragflow/.venv/lib/python3.10/site-packages/trio/_threads.py", line 447, in to_thread_run_sync
    |     return msg_from_thread.unwrap()
    |   File "/ragflow/.venv/lib/python3.10/site-packages/outcome/_impl.py", line 213, in unwrap
    |     raise captured_error
    |   File "/ragflow/.venv/lib/python3.10/site-packages/trio/_threads.py", line 373, in do_release_then_return_result
    |     return result.unwrap()
    |   File "/ragflow/.venv/lib/python3.10/site-packages/outcome/_impl.py", line 213, in unwrap
    |     raise captured_error
    |   File "/ragflow/.venv/lib/python3.10/site-packages/trio/_threads.py", line 392, in worker_fn
    |     ret = context.run(sync_fn, *args)
    |   File "/ragflow/graphrag/entity_resolution.py", line 176, in <lambda>
    |     response = await trio.to_thread.run_sync(lambda: self._chat(text, [{"role": "user", "content": "Output:"}], gen_conf))
    |   File "/ragflow/graphrag/general/extractor.py", line 66, in _chat
    |     raise Exception(response)
    | Exception: **ERROR**: Error code: 400 - {'error': {'message': '输入Token超过模型限制 (request id: 20250319195027616899205DVONcrl3)', 'type': 'upstream_error', 'param': '400', 'code': 'bad_response_status_code'}}
    +---------------- 2 ----------------
    | Traceback (most recent call last):
    |   File "/ragflow/graphrag/entity_resolution.py", line 176, in _resolve_candidate
    |     response = await trio.to_thread.run_sync(lambda: self._chat(text, [{"role": "user", "content": "Output:"}], gen_conf))
    |   File "/ragflow/.venv/lib/python3.10/site-packages/trio/_threads.py", line 447, in to_thread_run_sync
    |     return msg_from_thread.unwrap()
    |   File "/ragflow/.venv/lib/python3.10/site-packages/outcome/_impl.py", line 213, in unwrap
    |     raise captured_error
    |   File "/ragflow/.venv/lib/python3.10/site-packages/trio/_threads.py", line 373, in do_release_then_return_result
    |     return result.unwrap()
    |   File "/ragflow/.venv/lib/python3.10/site-packages/outcome/_impl.py", line 213, in unwrap
    |     raise captured_error
    |   File "/ragflow/.venv/lib/python3.10/site-packages/trio/_threads.py", line 392, in worker_fn
    |     ret = context.run(sync_fn, *args)
    |   File "/ragflow/graphrag/entity_resolution.py", line 176, in <lambda>
    |     response = await trio.to_thread.run_sync(lambda: self._chat(text, [{"role": "user", "content": "Output:"}], gen_conf))
    |   File "/ragflow/graphrag/general/extractor.py", line 66, in _chat
    |     raise Exception(response)
    | Exception: **ERROR**: Error code: 400 - {'error': {'message': '输入Token超过模型限制 (request id: 20250319195027535236662bw7KpkuU)', 'type': 'upstream_error', 'param': '400', 'code': 'bad_response_status_code'}}
    +---------------- 3 ----------------
    | Traceback (most recent call last):
    |   File "/ragflow/graphrag/entity_resolution.py", line 176, in _resolve_candidate
    |     response = await trio.to_thread.run_sync(lambda: self._chat(text, [{"role": "user", "content": "Output:"}], gen_conf))
    |   File "/ragflow/.venv/lib/python3.10/site-packages/trio/_threads.py", line 447, in to_thread_run_sync
    |     return msg_from_thread.unwrap()
    |   File "/ragflow/.venv/lib/python3.10/site-packages/outcome/_impl.py", line 213, in unwrap
    |     raise captured_error
    |   File "/ragflow/.venv/lib/python3.10/site-packages/trio/_threads.py", line 373, in do_release_then_return_result
    |     return result.unwrap()
    |   File "/ragflow/.venv/lib/python3.10/site-packages/outcome/_impl.py", line 213, in unwrap
    |     raise captured_error
    |   File "/ragflow/.venv/lib/python3.10/site-packages/trio/_threads.py", line 392, in worker_fn
    |     ret = context.run(sync_fn, *args)
    |   File "/ragflow/graphrag/entity_resolution.py", line 176, in <lambda>
    |     response = await trio.to_thread.run_sync(lambda: self._chat(text, [{"role": "user", "content": "Output:"}], gen_conf))
    |   File "/ragflow/graphrag/general/extractor.py", line 66, in _chat
    |     raise Exception(response)
    | Exception: **ERROR**: Error code: 400 - {'error': {'message': '输入Token超过模型限制 (request id: 20250319195027892940097YDLhAIM4)', 'type': 'upstream_error', 'param': '400', 'code': 'bad_response_status_code'}}
    +---------------- 4 ----------------
    | Traceback (most recent call last):
    |   File "/ragflow/graphrag/entity_resolution.py", line 176, in _resolve_candidate
    |     response = await trio.to_thread.run_sync(lambda: self._chat(text, [{"role": "user", "content": "Output:"}], gen_conf))
    |   File "/ragflow/.venv/lib/python3.10/site-packages/trio/_threads.py", line 447, in to_thread_run_sync
    |     return msg_from_thread.unwrap()
    |   File "/ragflow/.venv/lib/python3.10/site-packages/outcome/_impl.py", line 213, in unwrap
    |     raise captured_error
    |   File "/ragflow/.venv/lib/python3.10/site-packages/trio/_threads.py", line 373, in do_release_then_return_result
    |     return result.unwrap()
    |   File "/ragflow/.venv/lib/python3.10/site-packages/outcome/_impl.py", line 213, in unwrap
    |     raise captured_error
    |   File "/ragflow/.venv/lib/python3.10/site-packages/trio/_threads.py", line 392, in worker_fn
    |     ret = context.run(sync_fn, *args)
    |   File "/ragflow/graphrag/entity_resolution.py", line 176, in <lambda>
    |     response = await trio.to_thread.run_sync(lambda: self._chat(text, [{"role": "user", "content": "Output:"}], gen_conf))
    |   File "/ragflow/graphrag/general/extractor.py", line 66, in _chat
    |     raise Exception(response)
    | Exception: **ERROR**: Error code: 400 - {'error': {'message': '输入Token超过模型限制 (request id: 20250319195028971046021zbPawvoY)', 'type': 'upstream_error', 'param': '400', 'code': 'bad_response_status_code'}}
    +------------------------------------
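The traceback suggests the 400 comes from the chat model (`deepseek-chat`) during GraphRAG entity resolution (`_resolve_candidate` → `_chat`), not from the bge-m3 embedding model, so the prompt built for entity resolution appears to be what exceeds the limit. As a hedged workaround sketch (this is not RAGFlow's actual code, and the character budget is an assumption), oversized text could be split at line boundaries into pieces that each stay under a budget and be processed in separate calls:

```python
def split_under_limit(text: str, max_chars: int) -> list[str]:
    """Split text at newline boundaries so each piece stays within
    max_chars; a single overlong line is emitted as its own piece."""
    pieces: list[str] = []
    current = ""
    for line in text.splitlines(keepends=True):
        # Start a new piece before this line would push us over budget.
        if current and len(current) + len(line) > max_chars:
            pieces.append(current)
            current = ""
        current += line
    if current:
        pieces.append(current)
    return pieces
```

A real fix would need to map the character budget back to the model's token limit (see the heuristic above) and merge the per-piece results, but a helper like this shows the general shape of keeping each upstream request under the limit.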

