issue: Upload to knowledge base sometimes fails with error 400 The content provided is empty #15020
Replies: 6 comments
-
I have found out that people are having similar issues:
-
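I had this problem too, but I solved it. Please check your OCR settings and see which engine you are using. I recommend using the default one. I suspect the cause is either that the file is too large or a limitation in the OCR processing capability.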
-
I have OCR turned off. I only upload .md files.
-
Could you share the file you uploaded? @jackthgu
-
I ran into the same issue today. I wrote a function that checks the status of the file until status = 'completed', and only then do I add the file to the knowledge base.
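The polling approach described in this reply could look roughly like the sketch below. It is not the commenter's actual function. The name `wait_until_completed` and its parameters are hypothetical, and the status lookup is passed in as a callable because the thread does not say which endpoint or response field exposes the file status. Only the target value `'completed'` comes from the reply.

```python
import time


def wait_until_completed(get_status, timeout=60.0, interval=1.0,
                         sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until it returns 'completed'.

    get_status is a caller-supplied function (hypothetical) that asks the
    server for the current processing status of the uploaded file.
    Raises TimeoutError if 'completed' is not seen within `timeout` seconds.
    """
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status == 'completed':
            return status
        if clock() >= deadline:
            raise TimeoutError(f"file still has status {status!r} after {timeout} s")
        # Wait before asking again so the server is not hammered with requests.
        sleep(interval)
```

In use, `get_status` would wrap whatever request returns the file's status on your Open WebUI version, and `add_file_to_knowledge` would be called only after `wait_until_completed` returns.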
-
I finally figured out the issue. It's actually a race condition: when you send a request to upload a file, the request returns as soon as the file has been sent. Open WebUI then starts extracting the content (via Tika, for example) and then computes the embeddings. It seems that as long as both steps (or at least the extraction) are not done, the file will have a [...]
On the server side, the fix should be either to not crash when a pending file is assigned to a knowledge base, or to refuse to add it but state that this is because the file is still pending.
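Until the server handles pending files more gracefully, a client can also work around the race by retrying the add call while it keeps reporting empty content. The sketch below is one possible workaround under stated assumptions, not a confirmed fix. `add_with_retry` and `looks_like_empty_content_error` are hypothetical helpers, and detecting the error by searching the response for the message text is an assumption, since the thread does not show the exact shape of the error body.

```python
import time

# Message text taken from the error reported in this issue.
EMPTY_CONTENT_MARKER = "The content provided is empty"


def looks_like_empty_content_error(result):
    """Assumption: the error text appears somewhere in the parsed response."""
    return EMPTY_CONTENT_MARKER in str(result)


def add_with_retry(add_file, attempts=5, delay=1.0, backoff=2.0, sleep=time.sleep):
    """Call add_file() until it stops reporting empty content.

    add_file is a zero-argument callable, for example
    lambda: add_file_to_knowledge(knowledge_id, file_id).
    Returns the last response, which may still be the error if all
    attempts were used up.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        result = add_file()
        if not looks_like_empty_content_error(result):
            return result
        if attempt < attempts - 1:
            # Exponential backoff gives extraction and embedding time to finish.
            sleep(delay)
            delay *= backoff
    return result
```

Retrying treats the symptom rather than the cause, so a file that is genuinely empty will still fail after the last attempt. The caller should check the returned response either way.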
-
Check Existing Issues
Installation Method
Docker
Open WebUI Version
0.6.13
Ollama Version (if applicable)
0.6.8
Operating System
Ubuntu 22.04
Browser (if applicable)
No response
Confirmation
Expected Behavior
File is uploaded without issues
Actual Behavior
The upload itself returns a file_id, but adding the file to a knowledge base sometimes fails with an error
Steps to Reproduce
When I use the code from the official cookbook to upload a file:
https://github.com/open-webui/cookbook/blob/main/knowledge/add-to-knowledge.ipynb
```python
def upload_file(file_path):
    url = f'{WEBUI_URL}/api/v1/files/'
    headers = {
        'Authorization': f'Bearer {TOKEN}',
        'Accept': 'application/json'
    }
    files = {'file': open(file_path, 'rb')}
    response = requests.post(url, headers=headers, files=files)
    return response.json()
```
It can return the file_id before the file has actually been processed. So when I then call the other cookbook function to add the file to the knowledge base, it fails:
```python
def add_file_to_knowledge(knowledge_id, file_id):
    url = f'{WEBUI_URL}/api/v1/knowledge/{knowledge_id}/file/add'
    headers = {
        'Authorization': f'Bearer {TOKEN}',
        'Content-Type': 'application/json'
    }
    data = {'file_id': file_id}
    response = requests.post(url, headers=headers, json=data)
    return response.json()
```
I get the following error:
Error 400: The content provided is empty. Please ensure that there is text or data present before proceeding
Logs & Screenshots
I am using the bge-m3 embedding model and Apache Tika.
Additional Information
No response