Description
Self Checks
- I have searched for existing issues, including closed ones.
- I confirm that I am using English to submit this report (Language Policy).
- Non-English title submissions will be closed directly ( 非英文标题的提交将会被直接关闭 ) (Language Policy).
- Please do not modify this template :) and fill in all the required fields.
Describe your problem
Parsing fails when community reports generation is selected.
I have tried both the Light and General knowledge graph options several times.
I am using gpt-4o-mini from Azure with a 900k-token limit.
OCR has been done with both Opendoc and mistral-ocr-latest.
Begin at:
Sun, 16 Mar 2025 02:07:44 GMT
Duration:
3028.72 s
Progress:
02:07:44 Task has been received.
02:07:45 Page(1~65): OCR started
02:20:56 Page(1~65): OCR finished (790.46s)
02:23:02 Page(1~65): Layout analysis (126.11s)
02:23:07 Page(1~65): Table analysis (5.29s)
02:23:14 Page(1~65): Text merged (6.33s)
02:23:14 Page(1~65): Page 0~64: Text merging finished
02:24:14 Page(1~65): Start to generate keywords for every chunk ...
02:24:15 Page(1~65): Keywords generation 150 chunks completed in 0.47s
02:24:15 Page(1~65): Start to generate questions for every chunk ...
02:24:16 Page(1~65): Question generation 150 chunks completed in 1.43s
02:24:16 Page(1~65): Generate 150 chunks
02:24:26 Page(1~65): Embedding chunks (9.51s)
02:24:39 Page(1~65): Indexing done (13.20s). Task done (1015.10s)
02:24:42 created task raptor
02:24:42 Task has been received.
02:24:58 Cluster one layer: 150 -> 3
02:25:08 Cluster one layer: 3 -> 2
02:25:16 Cluster one layer: 2 -> 1
02:25:17 Indexing done (0.64s). Task done (35.08s)
02:30:17 Entities extraction of chunk 26 132/156 done, 11 nodes, 11 edges, 7076 tokens.
02:30:19 Entities extraction of chunk 32 133/156 done, 17 nodes, 14 edges, 8944 tokens.
02:30:20 Entities extraction of chunk 30 134/156 done, 15 nodes, 15 edges, 8933 tokens.
02:30:21 Entities extraction of chunk 7 135/156 done, 7 nodes, 9 edges, 6755 tokens.
02:30:22 Entities extraction of chunk 11 136/156 done, 11 nodes, 6 edges, 7197 tokens.
02:30:22 Entities extraction of chunk 14 137/156 done, 10 nodes, 8 edges, 7137 tokens.
02:30:23 Entities extraction of chunk 10 138/156 done, 10 nodes, 9 edges, 8109 tokens.
02:30:24 Entities extraction of chunk 17 139/156 done, 17 nodes, 15 edges, 9001 tokens.
02:30:24 Entities extraction of chunk 16 140/156 done, 11 nodes, 10 edges, 7628 tokens.
02:30:24 Entities extraction of chunk 22 141/156 done, 31 nodes, 6 edges, 11385 tokens.
02:30:26 Entities extraction of chunk 28 142/156 done, 19 nodes, 18 edges, 10771 tokens.
02:30:27 Entities extraction of chunk 0 143/156 done, 9 nodes, 4 edges, 5549 tokens.
02:30:27 Entities extraction of chunk 9 144/156 done, 12 nodes, 10 edges, 8701 tokens.
02:30:27 Entities extraction of chunk 4 145/156 done, 9 nodes, 2 edges, 5618 tokens.
02:30:28 Entities extraction of chunk 12 146/156 done, 18 nodes, 15 edges, 9790 tokens.
02:30:29 Entities extraction of chunk 25 147/156 done, 20 nodes, 18 edges, 10715 tokens.
02:30:30 Entities extraction of chunk 1 148/156 done, 11 nodes, 11 edges, 7875 tokens.
02:30:30 Entities extraction of chunk 21 149/156 done, 36 nodes, 13 edges, 19633 tokens.
02:30:31 Entities extraction of chunk 8 150/156 done, 18 nodes, 15 edges, 9703 tokens.
02:30:31 Entities extraction of chunk 6 151/156 done, 15 nodes, 15 edges, 9202 tokens.
02:30:31 Entities extraction of chunk 2 152/156 done, 23 nodes, 9 edges, 9031 tokens.
02:30:32 Entities extraction of chunk 13 153/156 done, 23 nodes, 22 edges, 10244 tokens.
02:30:33 Entities extraction of chunk 5 154/156 done, 25 nodes, 17 edges, 12178 tokens.
02:30:34 Entities extraction of chunk 15 155/156 done, 18 nodes, 13 edges, 9851 tokens.
02:30:37 Entities extraction of chunk 3 156/156 done, 19 nodes, 17 edges, 10021 tokens.
02:30:37 Entities and relationships extraction done, 1528 nodes, 1596 edges, 1426392 tokens, 317.11s.
02:36:34 Entities merging done, 357.48s.
02:45:09 Relationships merging done, 514.56s.
02:45:09 generated subgraph for doc 57dfb8ae01c811f0817f0242ac120006 in 1189.56 seconds.
02:45:23 merging subgraph for doc 57dfb8ae01c811f0817f0242ac120006 into the global graph done in 13.72 seconds.
02:45:29 Identified 192483 candidate pairs
02:58:08 [ERROR][Exception]: Exceptions from Trio nursery (6 sub-exceptions) -- ERROR: Error code: 400 - {'error': {'message': "This model's maximum context length is 128000 tokens. However, your messages resulted in 290246 tokens. Please reduce the length of the messages.", 'type': 'invalid_request_error', 'param': 'messages', 'code': 'context_length_exceeded'}}
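
The error above reports a single request of 290246 tokens against a 128000-token context window, so the same content would need at least three requests to fit. As an illustration only, here is a minimal sketch of how a list of text items (for example, the candidate pairs) could be split into batches that each stay under a token budget. This is not the project's actual code: `rough_token_count`, `batch_by_token_budget`, and the 4-characters-per-token estimate are all hypothetical, and a real tokenizer should replace the estimate.

```python
from typing import Callable, List, Sequence

# Taken from the error message above.
MODEL_CONTEXT_LIMIT = 128_000


def rough_token_count(text: str) -> int:
    """Crude estimate (assumption): about 4 characters per token.

    This is NOT a real tokenizer; swap in the model's tokenizer for accuracy.
    """
    return max(1, len(text) // 4)


def batch_by_token_budget(
    items: Sequence[str],
    budget: int,
    count_tokens: Callable[[str], int] = rough_token_count,
) -> List[List[str]]:
    """Greedily group items so each batch's estimated tokens stay <= budget."""
    batches: List[List[str]] = []
    current: List[str] = []
    used = 0
    for item in items:
        cost = count_tokens(item)
        if cost > budget:
            # A single item that cannot fit must be handled separately
            # (e.g. truncated or summarized); this sketch just refuses it.
            raise ValueError(f"item needs {cost} tokens, budget is {budget}")
        if current and used + cost > budget:
            # Adding this item would overflow: close the current batch.
            batches.append(current)
            current, used = [], 0
        current.append(item)
        used += cost
    if current:
        batches.append(current)
    return batches
```

In practice the budget would be set somewhat below `MODEL_CONTEXT_LIMIT` to leave room for the prompt template and the model's reply; how much to reserve is a judgment call that depends on the prompt being used.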