
Running OpenWebUI on QuickPod

OpenWebUI - The Easiest Local LLM Experience

Last updated: September 10, 2025

Note

To skip the introduction, jump straight to the Guide section further down this page.


Introduction

I've tried several local LLM clients, always with the goal of getting the same polished experience and useful features as a modern online chatbot service. That's a pretty high bar, which is why I'd never found a full offline replacement until I came across OpenWebUI and made it a template. It's feature-packed, rivaling the leading major LLMs with deep thinking, image processing, and web search (which has to be enabled in Settings). It also offers a large pool of diverse models, including Google Gemma, Meta Llama, Qwen, DeepSeek, and many more. And of course it all runs offline, which means no rate limits.

Although I wasn't able to test it fully, the accounts feature could be useful for anyone wanting to scale local AI on a single server: multiple users can generate on the same instance with their own chats, models, and settings. Multi-GPU machines are used very well for this; as long as users are running different models, the GPUs can generate independently.


Guide

Note

Ollama supports running CPU-only, without GPU acceleration, although it is slower. Follow this guide as written, but use the OpenWebUI-CPU template instead.

Create Your Pod

  1. Go to QuickPod Templates

  2. Find OpenWebUI and click Select.

    image

Note

Ollama supports splitting a large model across multiple GPUs, although this is not recommended. It also supports running multiple models at the same time under different accounts, which works very well.

  3. Choose a machine:

    • CPU: many cores and high-speed memory matter most, and system RAM must be larger than the model size. CPU-only inference will often be slower than GPU-assisted.
    • GPU: high-speed VRAM is the most crucial part, although capacity also plays a role (a rough sizing sketch follows the screenshot below). For best results, I'd recommend:
      RTX 5090, 32 GB GDDR7
      RTX 4090, 24 GB GDDR6X
      RTX 3090, 24 GB GDDR6X
      For lower-parameter models, modern budget alternatives such as the RTX 4060 Ti or RTX 3060 will also work.

    image
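As a rough rule of thumb (my own back-of-the-envelope assumption, not an official Ollama figure), a quantized model needs roughly parameters × bytes-per-weight, plus some overhead for the runtime and context cache. A minimal sketch:

```python
# Rough memory estimate for a quantized model. The bytes-per-weight values
# and the ~20% overhead factor are assumptions for a quick sanity check,
# not official Ollama numbers.
BYTES_PER_WEIGHT = {
    "q4": 0.5,    # ~4-bit quantization
    "q8": 1.0,    # ~8-bit quantization
    "fp16": 2.0,  # half precision
}

def estimate_gib(params_billions: float, quant: str = "q4", overhead: float = 1.2) -> float:
    """Approximate VRAM/RAM footprint in GiB."""
    return params_billions * 1e9 * BYTES_PER_WEIGHT[quant] * overhead / 1024**3

if __name__ == "__main__":
    for name, size_b in [("gemma3 12B", 12), ("llama3.1 70B", 70)]:
        print(f"{name}: ~{estimate_gib(size_b):.0f} GiB at q4")
```

By this estimate a 12B model at 4-bit needs roughly 7 GiB and fits comfortably on a 12 GB card, while a 70B model wants around 40 GiB and is better suited to a 24-32 GB GPU with partial CPU offload, or to multiple GPUs.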

  4. Click the Connect menu and open the first port (Port 8080). image image


Using OpenWebUI

Note

OpenWebUI uses the Ollama API to connect to the models. Additional model and API information can be found on the Ollama Site.
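For reference, this is roughly what a raw Ollama API call looks like under the hood; a minimal sketch, assuming a model named gemma3 has already been pulled and Ollama is listening on its default port 11434 inside the pod:

```python
# Sketch: the kind of Ollama request OpenWebUI makes on your behalf.
# Assumes the "gemma3" model is already pulled; the prompt is just an example.
import json
import urllib.request

payload = {
    "model": "gemma3",
    "prompt": "Why is the sky blue?",
    "stream": False,  # return a single JSON object instead of a stream
}

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```

You never need to do this to use the web interface; it's just useful to know when wiring other tools to the same backend.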

  1. Create the admin credentials

image

Note

All models with details can be found in the Ollama Model Collection.

  2. Download your first model

  • Go to the Models menu on the left and type in any model from the Ollama Model Collection.
  • Click Pull [Your-Model] From Ollama.com.
  • Wait for it to download. (If you'd rather script this step, see the Ollama API sketch below the recommended models.)

A Few Recommended Models:

  • gemma3 - A lightweight, fast multimodal model from Google.
  • llama3.1 - A popular model from Meta, available in very large parameter sizes.
  • deepseek-r1 - A reasoning model available in a variety of sizes.
  • gpt-oss - A lightweight open-weight GPT from OpenAI with reasoning and well-made quantization.

image
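If you'd rather script the download, the same pull can go through the Ollama API directly; a minimal sketch (the model name is only an example):

```python
# Sketch: pull a model through the Ollama API instead of the OpenWebUI menu.
# Assumes Ollama is on its default port 11434 inside the pod; "gemma3" is
# just an example model name from the list above.
import json
import urllib.request

payload = {"model": "gemma3", "stream": False}  # stream=False waits for the full download

req = urllib.request.Request(
    "http://localhost:11434/api/pull",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read()))  # {"status": "success"} once the pull completes
```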

  3. Start chatting!

image


Note

The Ollama API has also been forwarded at internal port 11434 and is accessible outside the container by default via the second port-forward box. Several WebUIs use Ollama, so this template can also be used as a base Ollama server.
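To point an external tool at that endpoint, you can query it directly; a minimal sketch, where the host and port are placeholders for whatever your pod's second port-forward box shows:

```python
# Sketch: list the models available on the pod's forwarded Ollama API.
# POD_HOST and POD_PORT are hypothetical placeholders -- substitute the
# address and external port from the second port-forward box.
import json
import urllib.request

POD_HOST = "your-pod-address"
POD_PORT = 12345

with urllib.request.urlopen(f"http://{POD_HOST}:{POD_PORT}/api/tags") as resp:
    for model in json.loads(resp.read())["models"]:
        print(model["name"])
```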
