I don't know if this is a bug or just the best I can expect given the software and hardware resources. I have OpenWebUI and SearXNG installed on an 8-core Ryzen with 32GB of RAM. I've installed Ollama on another system with a 16-core Ryzen 5950X, 128GB of RAM and an RTX 3090. Memory is 90-95% free.
The devices are connected via 2.5Gb Ethernet and I didn't measure any obvious bottlenecks. I've been testing search on and off for a few months with the latest versions of both OpenWebUI and SearXNG. I've tried many different models, ranging from Llama 8B to Gemma 3 14/27B, GPT OSS 20B and 120B, and Qwen 3.
Even after disabling embeddings for search, I find the performance to be poor to unusable. A search takes anywhere from 1-2 minutes for smaller models that fit on the GPU to 30 minutes for models that run on the CPU. The behavior is also inconsistent.
Sometimes it searches 20-30 sites and returns the search results but ignores my prompt or doesn't produce any other output. Other times it comes back with a terse response that doesn't seem to take the context into account. If I look through the sources, the information is there, and if I copy/paste the text alongside the prompt, it does work. Sometimes it complains that its training data ends in 2024 when I ask about something that happened in 2025, even though the sources/facts are right there in the search results.
No errors are produced otherwise. I'm tempted to build a poor man's search which just extracts all the text from SearXNG results using beautifulsoup and does a simplified RAG feed. Some early experiments with this approach produced promising results. But I don't want to hack together a brittle solution.
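For concreteness, here is a rough sketch of the kind of simplified pipeline I mean. It is only an illustration under several assumptions: all function names are hypothetical, the standard library's `html.parser` stands in for beautifulsoup so the snippet has no dependencies, ranking by keyword overlap is a crude stand-in for embeddings, and `searxng_urls` assumes the JSON output format is enabled in the SearXNG settings. Passing today's date into the prompt is my guess at a way around the "training data ends in 2024" complaint, not something I have verified.

```python
import json
import re
import urllib.parse
import urllib.request
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping script/style/noscript content."""
    SKIP = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    """Strip markup from one page and return its text as a single string."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def chunk_words(text, size=200):
    """Split text into consecutive chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def top_chunks(query, chunks, k=5):
    """Rank chunks by how many distinct query words they contain."""
    terms = set(re.findall(r"\w+", query.lower()))

    def score(chunk):
        return len(terms & set(re.findall(r"\w+", chunk.lower())))

    return sorted(chunks, key=score, reverse=True)[:k]


def build_prompt(query, chunks, today):
    """Put the date and the retrieved text ahead of the question."""
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    return (f"Today's date is {today}. Answer using only the sources below.\n\n"
            f"Sources:\n{context}\n\nQuestion: {query}")


def searxng_urls(base_url, query, limit=5):
    """Return result URLs from a SearXNG instance (JSON format assumed enabled)."""
    params = urllib.parse.urlencode({"q": query, "format": "json"})
    with urllib.request.urlopen(f"{base_url}/search?{params}", timeout=15) as resp:
        results = json.load(resp).get("results", [])
    return [r["url"] for r in results[:limit]]
```

The resulting prompt string would then be sent to the model through Ollama as an ordinary chat message, which is essentially the copy/paste workflow that already works for me, just automated.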
What am I missing? I see a number of posts on Reddit about this. Thanks in advance for your help.