
Sometimes documents stay in the queue permanently #492

Closed
ravikhunt opened this issue Feb 8, 2024 · 8 comments · Fixed by #558
Labels
bug Something isn't working
Milestone

Comments

@ravikhunt

ravikhunt commented Feb 8, 2024

Sometimes the status does not update, the file stops progressing, and it stays in the queue indefinitely.

It's not a very big file: roughly 1.4 MB, with 1,500 rows and 5 columns of data.

(screenshot attached)

@ryonsteele
Contributor

Hi @ravikhunt
Try these troubleshooting steps and let us know what you find.
https://github.com/microsoft/PubSec-Info-Assistant/blob/main/docs/deployment/troubleshooting.md

@ravikhunt
Author

@ryonsteele here are the details:

{
"id": "xxxxx",
"file_path": "upload/testing mock data.csv",
"file_name": "testing mock data.csv",
"state": "Queued",
"start_timestamp": "2024-02-07 09:33:51",
"state_description": "",
"state_timestamp": "2024-02-07 09:49:42",
"status_updates": [
{ "status": "File uploaded from browser to Azure Blob Storage", "status_timestamp": "2024-02-07 09:33:51", "status_classification": "Info" },
{ "status": "Pipeline triggered by Blob Upload", "status_timestamp": "2024-02-07 09:33:58", "status_classification": "Info" },
{ "status": "FileUploadedFunc - FileUploadedFunc function started", "status_timestamp": "2024-02-07 09:33:58", "status_classification": "Debug" },
{ "status": "FileUploadedFunc - csv file sent to submit queue. Visible in 257 seconds", "status_timestamp": "2024-02-07 09:33:58", "status_classification": "Debug" },
{ "status": "FileLayoutParsingOther - Starting to parse the non-PDF file", "status_timestamp": "2024-02-07 09:38:29", "status_classification": "Info" },
{ "status": "FileLayoutParsingOther - Message received from non-pdf submit queue", "status_timestamp": "2024-02-07 09:38:29", "status_classification": "Debug" },
{ "status": "FileLayoutParsingOther - SAS token generated to access the file", "status_timestamp": "2024-02-07 09:38:29", "status_classification": "Debug" },
{ "status": "FileLayoutParsingOther - partitioning complete", "status_timestamp": "2024-02-07 09:38:45", "status_classification": "Debug" },
{ "status": "FileLayoutParsingOther - chunking complete. 3045 chunks created", "status_timestamp": "2024-02-07 09:38:47", "status_classification": "Debug" },
{ "status": "FileLayoutParsingOther - chunking stored.", "status_timestamp": "2024-02-07 09:49:42", "status_classification": "Debug" },
{ "status": "FileLayoutParsingOther - message sent to enrichment queue", "status_timestamp": "2024-02-07 09:49:42", "status_classification": "Debug" }
],
"_rid": "XHUAAPn2+-IjAAAAAAAAAA==",
"_self": "dbs/XHUAAA==/colls/XHUAAPn2+-I=/docs/XHUAAPn2+-IjAAAAAAAAAA==/",
"_etag": "\"02006397-0000-0d00-0000-65c352360000\"",
"_attachments": "attachments/",
"_ts": 1707299382
}

@ArpitaisAn0maly
Contributor

Hi @ravikhunt

From the details you posted, chunking completed successfully and the file appears to be stuck in the enrichment queue / embedding process. Can you please verify that your enrichment app is up and running?

@ravikhunt
Author

ravikhunt commented Feb 13, 2024

It's running, and it has been running the whole time. After that I uploaded another file, and it was processed successfully.

@mbarnettHMX
Collaborator

We are also seeing csv and xlsx files (the 'products' file in the example Ice Cream data set) get stuck in the embedding process, but in our case it's a max "requeue limit" issue. I assume that can be worked around with a change to 'max_requeue_count', but we haven't tested that yet.

{
"id": "dXBsb2FkL0ljZSBDcmVhbS9wcm9kdWN0cy54bHN4",
"file_path": "upload/Ice Cream/products.xlsx",
"file_name": "products.xlsx",
"state": "Error",
"start_timestamp": "2024-02-09 18:28:53",
"state_description": "",
"state_timestamp": "2024-02-09 19:05:46",
"status_updates": [
{
"status": "File uploaded from browser to Azure Blob Storage",
"status_timestamp": "2024-02-09 18:28:53",
"status_classification": "Info"
},
{
"status": "Pipeline triggered by Blob Upload",
"status_timestamp": "2024-02-09 18:29:00",
"status_classification": "Info"
},
{
"status": "FileUploadedFunc - FileUploadedFunc function started",
"status_timestamp": "2024-02-09 18:29:00",
"status_classification": "Debug"
},
{
"status": "FileUploadedFunc - xlsx file sent to submit queue. Visible in 59 seconds",
"status_timestamp": "2024-02-09 18:29:00",
"status_classification": "Debug"
},
{
"status": "FileLayoutParsingOther - Starting to parse the non-PDF file",
"status_timestamp": "2024-02-09 18:30:23",
"status_classification": "Info"
},
{
"status": "FileLayoutParsingOther - Message received from non-pdf submit queue",
"status_timestamp": "2024-02-09 18:30:23",
"status_classification": "Debug"
},
{
"status": "FileLayoutParsingOther - SAS token generated to access the file",
"status_timestamp": "2024-02-09 18:30:23",
"status_classification": "Debug"
},
{
"status": "FileLayoutParsingOther - partitioning complete",
"status_timestamp": "2024-02-09 18:30:41",
"status_classification": "Debug"
},
{
"status": "FileLayoutParsingOther - chunking complete. 173 chunks created",
"status_timestamp": "2024-02-09 18:30:41",
"status_classification": "Debug"
},
{
"status": "FileLayoutParsingOther - chunking stored.",
"status_timestamp": "2024-02-09 18:30:55",
"status_classification": "Debug"
},
{
"status": "FileLayoutParsingOther - message sent to enrichment queue",
"status_timestamp": "2024-02-09 18:30:56",
"status_classification": "Debug"
},
{
"status": "TextEnrichment - Received message from text-enrichment-queue ",
"status_timestamp": "2024-02-09 18:31:13",
"status_classification": "Debug"
},
{
"status": "TextEnrichment - detected language of text is en.",
"status_timestamp": "2024-02-09 18:31:14",
"status_classification": "Debug"
},
{
"status": "TextEnrichment - Text enrichment is complete",
"status_timestamp": "2024-02-09 18:33:49",
"status_classification": "Debug"
},
{
"status": "Embeddings process started with model azure-openai_text-embedding-ada-002",
"status_timestamp": "2024-02-09 18:34:07",
"status_classification": "Info"
},
{
"status": "Message requed to embeddings queue, attempt 1. Visible in 60 seconds. Error: .",
"status_timestamp": "2024-02-09 18:34:18",
"status_classification": "Error",
"stack_trace": "Traceback (most recent call last):\n File "/opt/python/3.10.12/lib/python3.10/threading.py", line 973, in _bootstrap\n self._bootstrap_inner()\n File "/opt/python/3.10.12/lib/python3.10/threading.py", line 1016, in _bootstrap_inner\n self.run()\n File "/opt/python/3.10.12/lib/python3.10/threading.py", line 953, in run\n self._target(*self._args, **self._kwargs)\n File "/tmp/8dc225b4bec7dc3/app.py", line 289, in poll_queue_thread\n poll_queue()\n File "/tmp/8dc225b4bec7dc3/app.py", line 420, in poll_queue\n statusLog.upsert_document(blob_path, f'Message requed to embeddings queue, attempt {str(requeue_count)}. Visible in {str(backoff)} seconds. Error: {str(error)}.',\n File "/tmp/8dc225b4bec7dc3/antenv/lib/python3.10/site-packages/tenacity/init.py", line 382, in call\n result = fn(*args, **kwargs)\n File "/tmp/8dc225b4bec7dc3/app.py", line 87, in encode\n response = openai.Embedding.create(\n File "/tmp/8dc225b4bec7dc3/antenv/lib/python3.10/site-packages/openai/api_resources/embedding.py", line 33, in create\n response = super().create(*args, **kwargs)\n File "/tmp/8dc225b4bec7dc3/antenv/lib/python3.10/site-packages/openai/api_resources/abstract/engine_api_resource.py", line 153, in create\n response, _, api_key = requestor.request(\n File "/tmp/8dc225b4bec7dc3/antenv/lib/python3.10/site-packages/openai/api_requestor.py", line 226, in request\n resp, got_stream = self._interpret_response(result, stream)\n File "/tmp/8dc225b4bec7dc3/antenv/lib/python3.10/site-packages/openai/api_requestor.py", line 619, in _interpret_response\n self._interpret_response_line(\n File "/tmp/8dc225b4bec7dc3/antenv/lib/python3.10/site-packages/openai/api_requestor.py", line 679, in _interpret_response_line\n raise self.handle_error_response(\nopenai.error.InvalidRequestError: This model's maximum context length is 8191 tokens, however you requested 28524 tokens (28524 in your prompt; 0 for the completion). 
Please reduce your prompt; or completion length.\n\nThe above exception was the direct cause of the following exception:\n\nTraceback (most recent call last):\n File "/tmp/8dc225b4bec7dc3/app.py", line 230, in embed_texts\n embeddings = model_obj.encode(texts)\n File "/tmp/8dc225b4bec7dc3/antenv/lib/python3.10/site-packages/tenacity/init.py", line 289, in wrapped_f\n return self(f, *args, **kw)\n File "/tmp/8dc225b4bec7dc3/antenv/lib/python3.10/site-packages/tenacity/init.py", line 379, in call\n do = self.iter(retry_state=retry_state)\n File "/tmp/8dc225b4bec7dc3/antenv/lib/python3.10/site-packages/tenacity/init.py", line 326, in iter\n raise retry_exc from fut.exception()\ntenacity.RetryError: RetryError[<Future at 0x7675e37f2a40 state=finished raised InvalidRequestError>]\n\nThe above exception was the direct cause of the following exception:\n\nTraceback (most recent call last):\n File "/tmp/8dc225b4bec7dc3/app.py", line 363, in poll_queue\n embedding = embed_texts(target_embeddings_model, [text])\n File "/tmp/8dc225b4bec7dc3/app.py", line 244, in embed_texts\n raise HTTPException(status_code=500, detail=f"Failed to embed: {str(error)}") from error\nfastapi.exceptions.HTTPException\n"
},
{
"status": "Embeddings process started with model azure-openai_text-embedding-ada-002",
"status_timestamp": "2024-02-09 18:36:07",
"status_classification": "Info"
},
{
"status": "Message requed to embeddings queue, attempt 2. Visible in 193 seconds. Error: .",
"status_timestamp": "2024-02-09 18:36:15",
"status_classification": "Error",
"stack_trace": "Traceback (most recent call last):\n File "/opt/python/3.10.12/lib/python3.10/threading.py", line 973, in _bootstrap\n self._bootstrap_inner()\n File "/opt/python/3.10.12/lib/python3.10/threading.py", line 1016, in _bootstrap_inner\n self.run()\n File "/opt/python/3.10.12/lib/python3.10/threading.py", line 953, in run\n self._target(*self._args, **self._kwargs)\n File "/tmp/8dc225b4bec7dc3/app.py", line 289, in poll_queue_thread\n poll_queue()\n File "/tmp/8dc225b4bec7dc3/app.py", line 420, in poll_queue\n statusLog.upsert_document(blob_path, f'Message requed to embeddings queue, attempt {str(requeue_count)}. Visible in {str(backoff)} seconds. Error: {str(error)}.',\n File "/tmp/8dc225b4bec7dc3/antenv/lib/python3.10/site-packages/tenacity/init.py", line 382, in call\n result = fn(*args, **kwargs)\n File "/tmp/8dc225b4bec7dc3/app.py", line 87, in encode\n response = openai.Embedding.create(\n File "/tmp/8dc225b4bec7dc3/antenv/lib/python3.10/site-packages/openai/api_resources/embedding.py", line 33, in create\n response = super().create(*args, **kwargs)\n File "/tmp/8dc225b4bec7dc3/antenv/lib/python3.10/site-packages/openai/api_resources/abstract/engine_api_resource.py", line 153, in create\n response, _, api_key = requestor.request(\n File "/tmp/8dc225b4bec7dc3/antenv/lib/python3.10/site-packages/openai/api_requestor.py", line 226, in request\n resp, got_stream = self._interpret_response(result, stream)\n File "/tmp/8dc225b4bec7dc3/antenv/lib/python3.10/site-packages/openai/api_requestor.py", line 619, in _interpret_response\n self._interpret_response_line(\n File "/tmp/8dc225b4bec7dc3/antenv/lib/python3.10/site-packages/openai/api_requestor.py", line 679, in _interpret_response_line\n raise self.handle_error_response(\nopenai.error.InvalidRequestError: This model's maximum context length is 8191 tokens, however you requested 28524 tokens (28524 in your prompt; 0 for the completion). 
Please reduce your prompt; or completion length.\n\nThe above exception was the direct cause of the following exception:\n\nTraceback (most recent call last):\n File "/tmp/8dc225b4bec7dc3/app.py", line 230, in embed_texts\n embeddings = model_obj.encode(texts)\n File "/tmp/8dc225b4bec7dc3/antenv/lib/python3.10/site-packages/tenacity/init.py", line 289, in wrapped_f\n return self(f, *args, **kw)\n File "/tmp/8dc225b4bec7dc3/antenv/lib/python3.10/site-packages/tenacity/init.py", line 379, in call\n do = self.iter(retry_state=retry_state)\n File "/tmp/8dc225b4bec7dc3/antenv/lib/python3.10/site-packages/tenacity/init.py", line 326, in iter\n raise retry_exc from fut.exception()\ntenacity.RetryError: RetryError[<Future at 0x7675e85cb940 state=finished raised InvalidRequestError>]\n\nThe above exception was the direct cause of the following exception:\n\nTraceback (most recent call last):\n File "/tmp/8dc225b4bec7dc3/app.py", line 363, in poll_queue\n embedding = embed_texts(target_embeddings_model, [text])\n File "/tmp/8dc225b4bec7dc3/app.py", line 244, in embed_texts\n raise HTTPException(status_code=500, detail=f"Failed to embed: {str(error)}") from error\nfastapi.exceptions.HTTPException\n"
},
{
"status": "Embeddings process started with model azure-openai_text-embedding-ada-002",
"status_timestamp": "2024-02-09 18:40:18",
"status_classification": "Info"
},
{
"status": "Message requed to embeddings queue, attempt 3. Visible in 279 seconds. Error: .",
"status_timestamp": "2024-02-09 18:40:28",
"status_classification": "Error",
"stack_trace": "Traceback (most recent call last):\n File "/opt/python/3.10.12/lib/python3.10/threading.py", line 973, in _bootstrap\n self._bootstrap_inner()\n File "/opt/python/3.10.12/lib/python3.10/threading.py", line 1016, in _bootstrap_inner\n self.run()\n File "/opt/python/3.10.12/lib/python3.10/threading.py", line 953, in run\n self._target(*self._args, **self._kwargs)\n File "/tmp/8dc225b4bec7dc3/app.py", line 289, in poll_queue_thread\n poll_queue()\n File "/tmp/8dc225b4bec7dc3/app.py", line 420, in poll_queue\n statusLog.upsert_document(blob_path, f'Message requed to embeddings queue, attempt {str(requeue_count)}. Visible in {str(backoff)} seconds. Error: {str(error)}.',\n File "/tmp/8dc225b4bec7dc3/antenv/lib/python3.10/site-packages/tenacity/init.py", line 382, in call\n result = fn(*args, **kwargs)\n File "/tmp/8dc225b4bec7dc3/app.py", line 87, in encode\n response = openai.Embedding.create(\n File "/tmp/8dc225b4bec7dc3/antenv/lib/python3.10/site-packages/openai/api_resources/embedding.py", line 33, in create\n response = super().create(*args, **kwargs)\n File "/tmp/8dc225b4bec7dc3/antenv/lib/python3.10/site-packages/openai/api_resources/abstract/engine_api_resource.py", line 153, in create\n response, _, api_key = requestor.request(\n File "/tmp/8dc225b4bec7dc3/antenv/lib/python3.10/site-packages/openai/api_requestor.py", line 226, in request\n resp, got_stream = self._interpret_response(result, stream)\n File "/tmp/8dc225b4bec7dc3/antenv/lib/python3.10/site-packages/openai/api_requestor.py", line 619, in _interpret_response\n self._interpret_response_line(\n File "/tmp/8dc225b4bec7dc3/antenv/lib/python3.10/site-packages/openai/api_requestor.py", line 679, in _interpret_response_line\n raise self.handle_error_response(\nopenai.error.InvalidRequestError: This model's maximum context length is 8191 tokens, however you requested 28524 tokens (28524 in your prompt; 0 for the completion). 
Please reduce your prompt; or completion length.\n\nThe above exception was the direct cause of the following exception:\n\nTraceback (most recent call last):\n File "/tmp/8dc225b4bec7dc3/app.py", line 230, in embed_texts\n embeddings = model_obj.encode(texts)\n File "/tmp/8dc225b4bec7dc3/antenv/lib/python3.10/site-packages/tenacity/init.py", line 289, in wrapped_f\n return self(f, *args, **kw)\n File "/tmp/8dc225b4bec7dc3/antenv/lib/python3.10/site-packages/tenacity/init.py", line 379, in call\n do = self.iter(retry_state=retry_state)\n File "/tmp/8dc225b4bec7dc3/antenv/lib/python3.10/site-packages/tenacity/init.py", line 326, in iter\n raise retry_exc from fut.exception()\ntenacity.RetryError: RetryError[<Future at 0x7675e8467d30 state=finished raised InvalidRequestError>]\n\nThe above exception was the direct cause of the following exception:\n\nTraceback (most recent call last):\n File "/tmp/8dc225b4bec7dc3/app.py", line 363, in poll_queue\n embedding = embed_texts(target_embeddings_model, [text])\n File "/tmp/8dc225b4bec7dc3/app.py", line 244, in embed_texts\n raise HTTPException(status_code=500, detail=f"Failed to embed: {str(error)}") from error\nfastapi.exceptions.HTTPException\n"
},
{
"status": "Embeddings process started with model azure-openai_text-embedding-ada-002",
"status_timestamp": "2024-02-09 18:46:44",
"status_classification": "Info"
},
{
"status": "Message requed to embeddings queue, attempt 4. Visible in 292 seconds. Error: .",
"status_timestamp": "2024-02-09 18:46:54",
"status_classification": "Error",
"stack_trace": "Traceback (most recent call last):\n File "/opt/python/3.10.12/lib/python3.10/threading.py", line 973, in _bootstrap\n self._bootstrap_inner()\n File "/opt/python/3.10.12/lib/python3.10/threading.py", line 1016, in _bootstrap_inner\n self.run()\n File "/opt/python/3.10.12/lib/python3.10/threading.py", line 953, in run\n self._target(*self._args, **self._kwargs)\n File "/tmp/8dc225b4bec7dc3/app.py", line 289, in poll_queue_thread\n poll_queue()\n File "/tmp/8dc225b4bec7dc3/app.py", line 420, in poll_queue\n statusLog.upsert_document(blob_path, f'Message requed to embeddings queue, attempt {str(requeue_count)}. Visible in {str(backoff)} seconds. Error: {str(error)}.',\n File "/tmp/8dc225b4bec7dc3/antenv/lib/python3.10/site-packages/tenacity/init.py", line 382, in call\n result = fn(*args, **kwargs)\n File "/tmp/8dc225b4bec7dc3/app.py", line 87, in encode\n response = openai.Embedding.create(\n File "/tmp/8dc225b4bec7dc3/antenv/lib/python3.10/site-packages/openai/api_resources/embedding.py", line 33, in create\n response = super().create(*args, **kwargs)\n File "/tmp/8dc225b4bec7dc3/antenv/lib/python3.10/site-packages/openai/api_resources/abstract/engine_api_resource.py", line 153, in create\n response, _, api_key = requestor.request(\n File "/tmp/8dc225b4bec7dc3/antenv/lib/python3.10/site-packages/openai/api_requestor.py", line 226, in request\n resp, got_stream = self._interpret_response(result, stream)\n File "/tmp/8dc225b4bec7dc3/antenv/lib/python3.10/site-packages/openai/api_requestor.py", line 619, in _interpret_response\n self._interpret_response_line(\n File "/tmp/8dc225b4bec7dc3/antenv/lib/python3.10/site-packages/openai/api_requestor.py", line 679, in _interpret_response_line\n raise self.handle_error_response(\nopenai.error.InvalidRequestError: This model's maximum context length is 8191 tokens, however you requested 28524 tokens (28524 in your prompt; 0 for the completion). 
Please reduce your prompt; or completion length.\n\nThe above exception was the direct cause of the following exception:\n\nTraceback (most recent call last):\n File "/tmp/8dc225b4bec7dc3/app.py", line 230, in embed_texts\n embeddings = model_obj.encode(texts)\n File "/tmp/8dc225b4bec7dc3/antenv/lib/python3.10/site-packages/tenacity/init.py", line 289, in wrapped_f\n return self(f, *args, **kw)\n File "/tmp/8dc225b4bec7dc3/antenv/lib/python3.10/site-packages/tenacity/init.py", line 379, in call\n do = self.iter(retry_state=retry_state)\n File "/tmp/8dc225b4bec7dc3/antenv/lib/python3.10/site-packages/tenacity/init.py", line 326, in iter\n raise retry_exc from fut.exception()\ntenacity.RetryError: RetryError[<Future at 0x7675e8680430 state=finished raised InvalidRequestError>]\n\nThe above exception was the direct cause of the following exception:\n\nTraceback (most recent call last):\n File "/tmp/8dc225b4bec7dc3/app.py", line 363, in poll_queue\n embedding = embed_texts(target_embeddings_model, [text])\n File "/tmp/8dc225b4bec7dc3/app.py", line 244, in embed_texts\n raise HTTPException(status_code=500, detail=f"Failed to embed: {str(error)}") from error\nfastapi.exceptions.HTTPException\n"
},
{
"status": "Embeddings process started with model azure-openai_text-embedding-ada-002",
"status_timestamp": "2024-02-09 18:53:01",
"status_classification": "Info"
},
{
"status": "Message requed to embeddings queue, attempt 5. Visible in 730 seconds. Error: .",
"status_timestamp": "2024-02-09 18:53:13",
"status_classification": "Error",
"stack_trace": "Traceback (most recent call last):\n File "/opt/python/3.10.12/lib/python3.10/threading.py", line 973, in _bootstrap\n self._bootstrap_inner()\n File "/opt/python/3.10.12/lib/python3.10/threading.py", line 1016, in _bootstrap_inner\n self.run()\n File "/opt/python/3.10.12/lib/python3.10/threading.py", line 953, in run\n self._target(*self._args, **self._kwargs)\n File "/tmp/8dc225b4bec7dc3/app.py", line 289, in poll_queue_thread\n poll_queue()\n File "/tmp/8dc225b4bec7dc3/app.py", line 420, in poll_queue\n statusLog.upsert_document(blob_path, f'Message requed to embeddings queue, attempt {str(requeue_count)}. Visible in {str(backoff)} seconds. Error: {str(error)}.',\n File "/tmp/8dc225b4bec7dc3/antenv/lib/python3.10/site-packages/tenacity/init.py", line 382, in call\n result = fn(*args, **kwargs)\n File "/tmp/8dc225b4bec7dc3/app.py", line 87, in encode\n response = openai.Embedding.create(\n File "/tmp/8dc225b4bec7dc3/antenv/lib/python3.10/site-packages/openai/api_resources/embedding.py", line 33, in create\n response = super().create(*args, **kwargs)\n File "/tmp/8dc225b4bec7dc3/antenv/lib/python3.10/site-packages/openai/api_resources/abstract/engine_api_resource.py", line 153, in create\n response, _, api_key = requestor.request(\n File "/tmp/8dc225b4bec7dc3/antenv/lib/python3.10/site-packages/openai/api_requestor.py", line 226, in request\n resp, got_stream = self._interpret_response(result, stream)\n File "/tmp/8dc225b4bec7dc3/antenv/lib/python3.10/site-packages/openai/api_requestor.py", line 619, in _interpret_response\n self._interpret_response_line(\n File "/tmp/8dc225b4bec7dc3/antenv/lib/python3.10/site-packages/openai/api_requestor.py", line 679, in _interpret_response_line\n raise self.handle_error_response(\nopenai.error.InvalidRequestError: This model's maximum context length is 8191 tokens, however you requested 28524 tokens (28524 in your prompt; 0 for the completion). 
Please reduce your prompt; or completion length.\n\nThe above exception was the direct cause of the following exception:\n\nTraceback (most recent call last):\n File "/tmp/8dc225b4bec7dc3/app.py", line 230, in embed_texts\n embeddings = model_obj.encode(texts)\n File "/tmp/8dc225b4bec7dc3/antenv/lib/python3.10/site-packages/tenacity/init.py", line 289, in wrapped_f\n return self(f, *args, **kw)\n File "/tmp/8dc225b4bec7dc3/antenv/lib/python3.10/site-packages/tenacity/init.py", line 379, in call\n do = self.iter(retry_state=retry_state)\n File "/tmp/8dc225b4bec7dc3/antenv/lib/python3.10/site-packages/tenacity/init.py", line 326, in iter\n raise retry_exc from fut.exception()\ntenacity.RetryError: RetryError[<Future at 0x7675e84d37f0 state=finished raised InvalidRequestError>]\n\nThe above exception was the direct cause of the following exception:\n\nTraceback (most recent call last):\n File "/tmp/8dc225b4bec7dc3/app.py", line 363, in poll_queue\n embedding = embed_texts(target_embeddings_model, [text])\n File "/tmp/8dc225b4bec7dc3/app.py", line 244, in embed_texts\n raise HTTPException(status_code=500, detail=f"Failed to embed: {str(error)}") from error\nfastapi.exceptions.HTTPException\n"
},
{
"status": "Embeddings process started with model azure-openai_text-embedding-ada-002",
"status_timestamp": "2024-02-09 19:05:35",
"status_classification": "Info"
},
{
"status": "An error occurred, max requeue limit was reached. Error description: ",
"status_timestamp": "2024-02-09 19:05:46",
"status_classification": "Error",
"stack_trace": "Traceback (most recent call last):\n File "/opt/python/3.10.12/lib/python3.10/threading.py", line 973, in _bootstrap\n self._bootstrap_inner()\n File "/opt/python/3.10.12/lib/python3.10/threading.py", line 1016, in _bootstrap_inner\n self.run()\n File "/opt/python/3.10.12/lib/python3.10/threading.py", line 953, in run\n self._target(*self._args, **self._kwargs)\n File "/tmp/8dc225b4bec7dc3/app.py", line 289, in poll_queue_thread\n poll_queue()\n File "/tmp/8dc225b4bec7dc3/app.py", line 425, in poll_queue\n statusLog.upsert_document(\n File "/tmp/8dc225b4bec7dc3/antenv/lib/python3.10/site-packages/tenacity/init.py", line 382, in call\n result = fn(*args, **kwargs)\n File "/tmp/8dc225b4bec7dc3/app.py", line 87, in encode\n response = openai.Embedding.create(\n File "/tmp/8dc225b4bec7dc3/antenv/lib/python3.10/site-packages/openai/api_resources/embedding.py", line 33, in create\n response = super().create(*args, **kwargs)\n File "/tmp/8dc225b4bec7dc3/antenv/lib/python3.10/site-packages/openai/api_resources/abstract/engine_api_resource.py", line 153, in create\n response, _, api_key = requestor.request(\n File "/tmp/8dc225b4bec7dc3/antenv/lib/python3.10/site-packages/openai/api_requestor.py", line 226, in request\n resp, got_stream = self._interpret_response(result, stream)\n File "/tmp/8dc225b4bec7dc3/antenv/lib/python3.10/site-packages/openai/api_requestor.py", line 619, in _interpret_response\n self._interpret_response_line(\n File "/tmp/8dc225b4bec7dc3/antenv/lib/python3.10/site-packages/openai/api_requestor.py", line 679, in _interpret_response_line\n raise self.handle_error_response(\nopenai.error.InvalidRequestError: This model's maximum context length is 8191 tokens, however you requested 28524 tokens (28524 in your prompt; 0 for the completion). 
Please reduce your prompt; or completion length.\n\nThe above exception was the direct cause of the following exception:\n\nTraceback (most recent call last):\n File "/tmp/8dc225b4bec7dc3/app.py", line 230, in embed_texts\n embeddings = model_obj.encode(texts)\n File "/tmp/8dc225b4bec7dc3/antenv/lib/python3.10/site-packages/tenacity/init.py", line 289, in wrapped_f\n return self(f, *args, **kw)\n File "/tmp/8dc225b4bec7dc3/antenv/lib/python3.10/site-packages/tenacity/init.py", line 379, in call\n do = self.iter(retry_state=retry_state)\n File "/tmp/8dc225b4bec7dc3/antenv/lib/python3.10/site-packages/tenacity/init.py", line 326, in iter\n raise retry_exc from fut.exception()\ntenacity.RetryError: RetryError[<Future at 0x7675e8573b20 state=finished raised InvalidRequestError>]\n\nThe above exception was the direct cause of the following exception:\n\nTraceback (most recent call last):\n File "/tmp/8dc225b4bec7dc3/app.py", line 363, in poll_queue\n embedding = embed_texts(target_embeddings_model, [text])\n File "/tmp/8dc225b4bec7dc3/app.py", line 244, in embed_texts\n raise HTTPException(status_code=500, detail=f"Failed to embed: {str(error)}") from error\nfastapi.exceptions.HTTPException\n"
}
],
"_rid": "jtwnAL-HDLWcAAAAAAAAAA==",
"_self": "dbs/jtwnAA==/colls/jtwnAL-HDLU=/docs/jtwnAL-HDLWcAAAAAAAAAA==/",
"_etag": "\"9701b912-0000-0100-0000-65c6778a0000\"",
"_attachments": "attachments/",
"_ts": 1707505546
}
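The growing "Visible in N seconds" delays in the log above suggest a requeue loop with exponential backoff plus jitter, capped at a maximum attempt count. Here is a minimal sketch of that pattern; the names (MAX_REQUEUE_COUNT, process_with_requeue, the handler) are hypothetical and do not come from the project's actual code:

```python
import random

MAX_REQUEUE_COUNT = 5  # hypothetical mirror of the 'max_requeue_count' setting


def backoff_seconds(attempt, base=60, cap=900):
    """Exponential backoff with jitter: the delay roughly doubles per attempt,
    which is consistent with the growing 'Visible in N seconds' values above."""
    return min(cap, base * (2 ** (attempt - 1))) + random.randint(0, 30)


def process_with_requeue(message, handler):
    """Retry a failing message, re-queueing with backoff until the limit is hit."""
    for attempt in range(1, MAX_REQUEUE_COUNT + 1):
        try:
            return handler(message)
        except Exception:
            if attempt == MAX_REQUEUE_COUNT:
                # Corresponds to "max requeue limit was reached" in the status log.
                raise
            delay = backoff_seconds(attempt)
            # In the real pipeline the message goes back on the queue and stays
            # invisible for `delay` seconds; here we just record the delays.
            message.setdefault("requeues", []).append(delay)
```

Raising the limit only buys more attempts; if each attempt fails deterministically (as with the token-limit error above), the message will still exhaust the limit.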

@georearl
Contributor

georearl commented Feb 28, 2024

I believe this may be due to the chunking behavior of unstructured.io. In the version we use, the library doesn't chunk by size; it just creates a single chunk, which crashes later steps that can only cope with chunks up to a particular size. We have a ticket on the board to address this as part of the 1.1 release.
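Size-aware chunking of the kind described above can be sketched as follows. This is a minimal illustration, not the project's implementation: token counting is approximated by whitespace word count, whereas a real pipeline would use the embedding model's own tokenizer, and the 8191 figure is the context limit reported in the error traces above.

```python
MAX_TOKENS = 8191  # context limit reported for text-embedding-ada-002


def approx_token_count(text):
    """Crude stand-in for a real tokenizer: count whitespace-separated words."""
    return len(text.split())


def chunk_by_size(rows, max_tokens=MAX_TOKENS):
    """Greedily pack rows into chunks that stay under the token limit,
    instead of emitting one oversized chunk that the embedder rejects."""
    chunks, current, current_tokens = [], [], 0
    for row in rows:
        n = approx_token_count(row)
        if current and current_tokens + n > max_tokens:
            chunks.append("\n".join(current))
            current, current_tokens = [], 0
        current.append(row)
        current_tokens += n
    if current:
        chunks.append("\n".join(current))
    return chunks
```

With this shape, a 28,524-token table would come through as several under-limit chunks rather than a single request that exceeds the model's 8,191-token maximum.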

@dayland
Contributor

dayland commented Mar 14, 2024

PR #558 was applied to main to address these issues. Pull the latest from main and re-run make deploy.

@georearl
Contributor

Resolved; closing due to inactivity.
