# 📓 The GenAI Revolution Cookbook

**Title:** How to Deploy DeepSeek-R1 Locally with Ollama, MongoDB, and a Chat UI

**Description:** Build a private DeepSeek-R1 chatbot with Ollama, MongoDB, and chat UI—no external APIs. Deployment steps for local setups or AWS.

---

*This Jupyter notebook contains executable code examples. Run the cells below to try out the code yourself!*



## Why This Stack?

You'll use Ollama to run DeepSeek-R1 locally, MongoDB to persist conversations, and Hugging Face Chat UI for a web interface. This combination gives you full control over data, avoids API costs, and lets you iterate quickly without external dependencies.

**Hardware scope:** 8B models need ~8–12 GB RAM; 14B needs ~16–24 GB; 32B needs ~30–50 GB. A 4-core CPU with 16 GB RAM is a practical starting point for 8B. GPU acceleration is optional but improves latency—see the GPU section below for setup.

**What you'll build:** A local chat interface that streams responses from DeepSeek-R1, persists conversations in MongoDB, and runs on a single VM. You'll validate end-to-end streaming, test conversation retrieval, and configure basic security.

## Prerequisites

- Ubuntu 22.04 or 24.04 VM with at least 16 GB RAM
- Root or sudo access
- Python 3.10+ and Node.js 18+ installed
- Docker installed for MongoDB

Install system dependencies:

In [None]:
# Update package lists and install curl, git, and Docker prerequisites
sudo apt update && sudo apt install -y curl git ca-certificates gnupg lsb-release

# Add Docker's official GPG key and repository
sudo mkdir -p /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

# Install Docker and start the service
sudo apt update && sudo apt install -y docker-ce docker-ce-cli containerd.io
sudo systemctl enable --now docker

Install Node.js 18 LTS:

In [None]:
# Add NodeSource repository for Node.js 18.x
curl -fsSL https://deb.nodesource.com/setup_18.x | sudo -E bash -

# Install Node.js and npm
sudo apt install -y nodejs

## Step 1: Install and Configure Ollama

Ollama serves models via a local REST API. Install it and verify connectivity.

In [None]:
# Download and install Ollama binary
curl -fsSL https://ollama.com/install.sh | sh

# Start Ollama as a background service
sudo systemctl enable --now ollama

# Verify Ollama is running and reachable
curl http://localhost:11434/api/tags

You should see a JSON response listing available models (empty initially).
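
If you would rather run this check from Python, the sketch below reads the same endpoint and extracts the model names. It assumes the `/api/tags` body has a top-level `models` array whose entries carry a `name` field; the helper names `installed_models` and `fetch_tags` are illustrative, not part of Ollama.

```python
import json
import urllib.request

# Same endpoint as the curl check above.
OLLAMA_TAGS_URL = "http://localhost:11434/api/tags"


def installed_models(tags_payload: dict) -> list:
    """Extract model names from a decoded /api/tags response body."""
    return [model["name"] for model in tags_payload.get("models", [])]


def fetch_tags(url: str = OLLAMA_TAGS_URL, timeout: float = 5.0) -> dict:
    """GET the tags endpoint and decode its JSON body (needs Ollama running)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

Calling `installed_models(fetch_tags())` on a fresh install should return an empty list; after the pull in the next cell it should include `deepseek-r1:8b`.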

Pull the DeepSeek-R1 8B model:

In [None]:
# Pull the 8B quantized model (adjust tag if needed)
ollama pull deepseek-r1:8b

# Confirm the model is now installed locally (ollama list shows only local models)
ollama list | grep deepseek-r1

This downloads ~5–6 GB. Verify the model loads:

In [None]:
# Send a test prompt and confirm a complete (non-streaming) JSON response
curl -X POST http://localhost:11434/api/generate \
  -H "Content-Type: application/json" \
  -d '{"model":"deepseek-r1:8b","prompt":"Explain recursion in one sentence.","stream":false}'

You should receive a JSON object with a `response` field containing the model's answer.

**Model size guidance:** 1.5B runs in 2–4 GB RAM; 7–8B needs ~8–12 GB; 14B needs ~16–24 GB; 32B needs ~30–50 GB; 70B requires serious RAM/GPU. Start with 8B for quality vs. latency balance, then scale up if your hardware allows. For a deeper dive into selecting the best LLM for your application, including performance and hardware considerations, see our guide on [how to pick an LLM](/article/how-to-choose-an-ai-model-for-your-app-speed-cost-reliability).
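
As a rough aid, the guidance above can be turned into a small lookup. This is a sketch only: it takes the upper end of each quoted RAM range as the requirement, and the pairing of ranges to tags is an assumption for illustration. The function names are hypothetical.

```python
import os
from typing import Optional

# (Ollama tag, upper end of the RAM range quoted above in GB).
# The pairing is illustrative; adjust it to your own measurements.
MODEL_RAM_GB = [
    ("deepseek-r1:1.5b", 4),
    ("deepseek-r1:8b", 12),
    ("deepseek-r1:14b", 24),
    ("deepseek-r1:32b", 50),
]


def largest_model_for(ram_gb: float) -> Optional[str]:
    """Return the largest tag whose RAM estimate fits, or None if none fit."""
    best = None
    for tag, needed_gb in MODEL_RAM_GB:
        if needed_gb <= ram_gb:
            best = tag
    return best


def total_ram_gb() -> float:
    """Total physical RAM in GB on Linux, read via os.sysconf."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1024**3
```

For example, `largest_model_for(total_ram_gb())` on a 16 GB VM suggests the 8B tag, which matches the starting point recommended here. The estimate ignores memory used by MongoDB, the UI, and the OS, so leave headroom.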

## Step 2: Run MongoDB with Docker

MongoDB stores conversation history. Run it in a container with a persistent volume.

In [None]:
# Create a directory for MongoDB data persistence
mkdir -p ~/mongodb_data

# Start MongoDB container with volume mount and default port
docker run -d \
  --name mongodb \
  --restart unless-stopped \
  -v ~/mongodb_data:/data/db \
  -p 127.0.0.1:27017:27017 \
  mongo:7

Verify MongoDB is running:

In [None]:
# Check container status
docker ps | grep mongodb

# Test the connection with the mongosh bundled in the container image
docker exec mongodb mongosh --quiet --eval "db.adminCommand('ping')"

You should see `{ ok: 1 }`.

**Optional: Enable authentication**

For production, create an admin user and application user:

In [None]:
# Connect to MongoDB shell
docker exec -it mongodb mongosh

// Inside mongosh, create the admin user
use admin
db.createUser({
  user: "admin",
  pwd: "CHANGE_THIS_PASSWORD",
  roles: ["root"]
})

// Create an application user with read/write on chatdb
use chatdb
db.createUser({
  user: "chatapp",
  pwd: "CHANGE_THIS_APP_PASSWORD",
  roles: [{ role: "readWrite", db: "chatdb" }]
})
exit

Restart MongoDB with auth enabled:

In [None]:
# Stop and remove the existing container
docker stop mongodb && docker rm mongodb

# Start with authentication required. The users created above persist in
# ~/mongodb_data; the MONGO_INITDB_ROOT_* variables only apply to an empty
# data directory, so they are not needed here.
docker run -d \
  --name mongodb \
  --restart unless-stopped \
  -v ~/mongodb_data:/data/db \
  -p 127.0.0.1:27017:27017 \
  mongo:7 --auth

Update your connection string to include credentials:

In [None]:
mongodb://chatapp:CHANGE_THIS_APP_PASSWORD@localhost:27017/chatdb?authSource=chatdb
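
Passwords that contain characters such as `@`, `:`, or `/` must be percent-encoded before they go into the URI, or the driver will misparse it. The sketch below builds the string above with encoded credentials; `mongo_uri` is a hypothetical helper and its defaults simply mirror the values used in this guide.

```python
from urllib.parse import quote_plus


def mongo_uri(user: str, password: str, host: str = "localhost",
              port: int = 27017, db: str = "chatdb",
              auth_source: str = "chatdb") -> str:
    """Build a MongoDB URI with percent-encoded credentials."""
    return (
        f"mongodb://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{db}?authSource={auth_source}"
    )
```

Print the result of `mongo_uri("chatapp", "<your password>")` and paste it wherever the guide asks for a connection string.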

## Step 3: Set Up Hugging Face Chat UI

Clone the Chat UI repository and configure it to use Ollama and MongoDB.

In [None]:
# Clone the Hugging Face Chat UI repository
git clone https://github.com/huggingface/chat-ui.git
cd chat-ui

# Install dependencies
npm install

Create a `.env.local` file with the following configuration:

In [None]:
# MongoDB connection string (use authenticated URI if you enabled auth)
MONGODB_URL=mongodb://localhost:27017/chatdb

# Ollama API base URL
OLLAMA_BASE_URL=http://localhost:11434

# Model configuration for Chat UI
MODELS=`[
  {
    "name": "deepseek-r1:8b",
    "displayName": "DeepSeek-R1 8B",
    "description": "Local reasoning model via Ollama",
    "parameters": {
      "temperature": 0.7,
      "max_new_tokens": 2048
    },
    "endpoints": [{
      "type": "ollama",
      "url": "http://localhost:11434",
      "ollamaName": "deepseek-r1:8b"
    }]
  }
]`

**Key configuration notes:**

- `MONGODB_URL`: Connection string for MongoDB. Use `chatdb` as the database name.
- `OLLAMA_BASE_URL`: Points to your local Ollama instance.
- `MODELS`: JSON array defining available models. The `type: "ollama"` tells Chat UI to use Ollama's API format.
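
Hand-editing the `MODELS` JSON is error-prone, since a stray comma breaks the whole value. One option is to generate it. The sketch below reproduces the structure shown above for an arbitrary tag; `models_env_value` and `env_line` are hypothetical helpers, and the field names are copied from this guide's example rather than checked against the Chat UI schema.

```python
import json


def models_env_value(tag: str, display_name: str,
                     ollama_url: str = "http://localhost:11434") -> str:
    """Serialize a one-model MODELS array using the fields shown above."""
    models = [{
        "name": tag,
        "displayName": display_name,
        "description": "Local reasoning model via Ollama",
        "parameters": {"temperature": 0.7, "max_new_tokens": 2048},
        "endpoints": [{
            "type": "ollama",
            "url": ollama_url,
            "ollamaName": tag,
        }],
    }]
    return json.dumps(models, indent=2)


def env_line(tag: str, display_name: str) -> str:
    """Wrap the JSON in backticks, matching the .env.local format above."""
    return "MODELS=`" + models_env_value(tag, display_name) + "`"
```

Running `print(env_line("deepseek-r1:14b", "DeepSeek-R1 14B"))` gives a block you can paste into `.env.local` when you switch model sizes.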

Start the development server:

In [None]:
# Run the Chat UI dev server on port 3000 (Vite defaults to 5173, so pass the port explicitly)
npm run dev -- --port 3000

The UI will be available at `http://localhost:3000`. Open it in a browser and confirm the model appears in the dropdown.

## Step 4: Validate End-to-End Streaming

Test that the UI streams responses from Ollama and persists conversations in MongoDB.

**UI test:**

1. Open `http://localhost:3000` in a browser.
2. Select "DeepSeek-R1 8B" from the model dropdown.
3. Send a prompt: "Explain how binary search works."
4. Confirm tokens stream in real-time and the response completes.

**Python validation script:**

Install Python dependencies:

In [None]:
pip install requests pymongo

Create `validate.py`:

In [None]:
import requests
import pymongo
import time

# Test Ollama API directly
ollama_url = "http://localhost:11434/api/generate"
payload = {
    "model": "deepseek-r1:8b",
    "prompt": "What is the capital of France?",
    "stream": False
}

print("Testing Ollama API...")
# Allow several minutes: the first request also loads the model into memory
response = requests.post(ollama_url, json=payload, timeout=300)
if response.status_code == 200:
    print("Ollama response:", response.json().get("response", "")[:100])
else:
    print("Ollama error:", response.status_code, response.text)

# Test MongoDB connectivity and write a test document
print("\nTesting MongoDB...")
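# Use the authenticated URI here if you enabled auth; fail fast if MongoDB is unreachable
client = pymongo.MongoClient("mongodb://localhost:27017/", serverSelectionTimeoutMS=5000)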
db = client["chatdb"]
collection = db["test_conversations"]

test_doc = {
    "timestamp": time.time(),
    "prompt": "Test prompt",
    "response": "Test response"
}
result = collection.insert_one(test_doc)
print("Inserted document ID:", result.inserted_id)

# Retrieve and verify
retrieved = collection.find_one({"_id": result.inserted_id})
print("Retrieved document:", retrieved)

client.close()
print("\nValidation complete.")

Run the script:

In [None]:
python validate.py

You should see successful Ollama and MongoDB interactions.
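
The script above sends `"stream": False`, so it does not exercise streaming itself. A minimal sketch of a streaming check follows, assuming Ollama returns newline-delimited JSON objects that each carry a `response` fragment and a `done` flag. It uses only the standard library; `iter_tokens` and `stream_generate` are illustrative names.

```python
import json
import urllib.request


def iter_tokens(lines):
    """Yield text fragments from a newline-delimited JSON stream."""
    for raw in lines:
        if not raw.strip():
            continue  # skip blank keep-alive lines
        chunk = json.loads(raw)
        if chunk.get("response"):
            yield chunk["response"]
        if chunk.get("done"):
            break  # final object marks the end of generation


def stream_generate(prompt: str, model: str = "deepseek-r1:8b",
                    url: str = "http://localhost:11434/api/generate") -> None:
    """Print fragments as they arrive (needs Ollama running)."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": True}).encode()
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=300) as resp:
        for token in iter_tokens(resp):
            print(token, end="", flush=True)
    print()
```

Call `stream_generate("Explain how binary search works.")` and watch for text appearing incrementally rather than all at once.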

## Step 5: Production Deployment

For production, build the UI, run it with a process manager, and add a reverse proxy with HTTPS.

**Build the UI:**

In [None]:
# Create optimized production build
npm run build

**Run with PM2:**

Install PM2 globally:

In [None]:
sudo npm install -g pm2

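Create `ecosystem.config.js` in the `chat-ui` directory. This assumes the repository's `package.json` defines a `start` script; if it does not, set `script` and `args` to whatever command serves the production build: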

```javascript
module.exports = {
  apps: [{
    name: "chat-ui",
    script: "npm",
    args: "start",
    env: {
      NODE_ENV: "production",
      PORT: 3000
    },
    instances: 1,
    autorestart: true,
    watch: false,
    max_memory_restart: "1G"
  }]
};
```

Start the app:

In [None]:
# Start the UI with PM2
pm2 start ecosystem.config.js

# Save the PM2 process list so it can be restored on boot
pm2 save
# pm2 startup prints a sudo command; copy and run it to register the boot hook
pm2 startup

**Reverse proxy with Caddy:**

Install Caddy:

In [None]:
sudo apt install -y debian-keyring debian-archive-keyring apt-transport-https
curl -1sLf 'https://dl.cloudsmith.io/public/caddy/stable/gpg.key' | sudo gpg --dearmor -o /usr/share/keyrings/caddy-stable-archive-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/caddy-stable-archive-keyring.gpg] https://dl.cloudsmith.io/public/caddy/stable/deb/debian any-version main" | sudo tee /etc/apt/sources.list.d/caddy-stable.list
sudo apt update && sudo apt install -y caddy

Create `/etc/caddy/Caddyfile`:

In [None]:
your-domain.com {
    reverse_proxy localhost:3000
    
    # Enable automatic HTTPS via Let's Encrypt
    tls your-email@example.com
    
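    # Add basic auth to every path (generate hash with: caddy hash-password).
    # A bare "/" matcher would protect only the root path, so no matcher is given.
    # The directive is named basic_auth since Caddy 2.8; older releases use basicauth.
    basic_auth {
        admin $2a$14$HASHED_PASSWORD_HERE
    }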
    
    # Security headers
    header {
        Strict-Transport-Security "max-age=31536000; includeSubDomains; preload"
        X-Content-Type-Options "nosniff"
        X-Frame-Options "DENY"
    }
}

Generate a hashed password:

In [None]:
caddy hash-password

Paste the output into the Caddyfile, then reload:

In [None]:
sudo systemctl reload caddy

Your UI is now accessible at `https://your-domain.com` with automatic TLS and basic auth.
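
To confirm the proxy is sending the security headers from the Caddyfile, you can compare a response's headers against the expected values. The sketch below does the comparison only; `missing_headers` is a hypothetical helper, and you supply the headers from whichever HTTP client you use.

```python
# Header values copied from the Caddyfile above; keys are lowercased
# because HTTP header names are case-insensitive.
EXPECTED_HEADERS = {
    "strict-transport-security": "max-age=31536000; includeSubDomains; preload",
    "x-content-type-options": "nosniff",
    "x-frame-options": "DENY",
}


def missing_headers(response_headers: dict) -> list:
    """Return expected header names that are absent or have a different value."""
    got = {name.lower(): value for name, value in response_headers.items()}
    return sorted(
        name for name, value in EXPECTED_HEADERS.items()
        if got.get(name) != value
    )
```

With `requests`, for instance, `missing_headers(dict(response.headers))` should return an empty list once the Caddy configuration is live.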

## Optional: GPU Acceleration

If you have an NVIDIA GPU, enable it for faster inference.

**Install NVIDIA drivers:**

In [None]:
# Install NVIDIA drivers (adjust version as needed)
sudo apt install -y nvidia-driver-535

# Reboot to load drivers
sudo reboot

After reboot, verify:

In [None]:
nvidia-smi

**Install nvidia-container-toolkit:**

In [None]:
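# Add NVIDIA's container toolkit repository with a dedicated keyring (apt-key is deprecated)
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

# Install the toolkit, register the NVIDIA runtime with Docker, and restart Docker
sudo apt update && sudo apt install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker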

**Test GPU access in Docker:**

In [None]:
docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi

You should see GPU info.

**Configure Ollama to use GPU:**

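Ollama detects NVIDIA GPUs automatically once the driver is installed. Because this guide runs Ollama as a native systemd service, the container toolkit above matters only if you later move Ollama into Docker. Verify by checking logs: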

In [None]:
sudo journalctl -u ollama -f

You should see messages indicating GPU initialization. While a model is loaded, `ollama ps` also reports whether it is running on the GPU or the CPU.

**Validate performance:**

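Run the same prompt with and without GPU and compare latency. As a rough expectation, a GPU delivers 2–5x faster token generation for 8B models, though the actual gain depends on the card and on how much of the model fits in VRAM.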
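
To put a number on the comparison, you can compute tokens per second from the timing fields in a non-streaming `/api/generate` response. This sketch assumes the response includes `eval_count` (generated tokens) and `eval_duration` (nanoseconds); the function names are illustrative.

```python
def tokens_per_second(result: dict) -> float:
    """Generation speed from one /api/generate response.

    eval_count is the number of generated tokens and
    eval_duration is the generation time in nanoseconds.
    """
    seconds = result["eval_duration"] / 1e9
    return result["eval_count"] / seconds


def speedup(cpu_result: dict, gpu_result: dict) -> float:
    """Ratio of GPU to CPU generation speed for two runs."""
    return tokens_per_second(gpu_result) / tokens_per_second(cpu_result)
```

Save the JSON from one CPU run and one GPU run of the same prompt, then pass both to `speedup`. Discard the first request after a restart, since it includes model load time.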

## Firewall and Network Security

Restrict access to internal services:

In [None]:
# Allow SSH, HTTP, and HTTPS
sudo ufw allow 22/tcp
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp

# Deny external access to Ollama and MongoDB
sudo ufw deny 11434/tcp
sudo ufw deny 27017/tcp

# Enable firewall
sudo ufw enable

For cloud VMs, configure security groups to allow only your IP on port 22 and public access on 80/443.
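
A quick way to confirm the rules work is to probe the ports from a different machine. The sketch below attempts a TCP connection and reports which internal ports answer; `is_port_open` and `exposed_ports` are hypothetical helpers, and the default port list mirrors the Ollama and MongoDB ports used in this guide.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, filtered (timed out), or unreachable all count as closed.
        return False


def exposed_ports(host: str, ports=(11434, 27017)) -> list:
    """List the internal service ports that accept connections on host."""
    return [port for port in ports if is_port_open(host, port)]
```

Run `exposed_ports("<your VM's public IP>")` from your laptop. An empty list means neither Ollama nor MongoDB is reachable from outside.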

## Troubleshooting

**Ollama not responding:**

Check service status and logs:

In [None]:
sudo systemctl status ollama
sudo journalctl -u ollama -n 50

Restart if needed:

In [None]:
sudo systemctl restart ollama

**Chat UI can't connect to Ollama:**

Verify `OLLAMA_BASE_URL` in `.env.local` matches the running instance. Test with curl:

In [None]:
curl http://localhost:11434/api/tags

**MongoDB connection errors:**

Check container logs:

In [None]:
docker logs mongodb

Verify the connection string format and credentials if auth is enabled.

**High latency:**

If latency is high, downgrade model size or ensure you have adequate RAM and fast storage. Understanding when to use smaller models versus larger ones can help you optimize both cost and performance—explore our analysis on [small vs large language models](/article/small-language-models-vs-large-language-models-when-to-use-each-2) for practical scenarios.

**DeepSeek-R1 "thinking" tokens visible:**

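DeepSeek-R1 emits its reasoning before the final answer, typically wrapped in `<think>` and `</think>` tags. To hide it, filter the stream in middleware, or try a system prompt in the UI settings (a prompt alone may not suppress it reliably). Check Chat UI documentation for custom prompt templates.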
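
If you post-process responses yourself, a small filter can drop the reasoning block. This is a minimal sketch that assumes the reasoning is wrapped in `<think>` and `</think>` tags and that you have the complete response text; filtering a live token stream needs extra buffering that is not shown. `strip_thinking` is a hypothetical helper.

```python
import re

# Non-greedy match so multiple reasoning blocks are removed separately;
# DOTALL lets "." span the newlines inside a block.
_THINK_BLOCK = re.compile(r"<think>.*?</think>\s*", re.DOTALL)


def strip_thinking(text: str) -> str:
    """Remove <think>...</think> blocks and surrounding whitespace."""
    return _THINK_BLOCK.sub("", text).strip()
```

You could apply it to the `response` field from `/api/generate` before storing or displaying the answer.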

## Next Steps

- **Add authentication:** Integrate OAuth or JWT-based auth for multi-user access.
- **Enable observability:** Add structured logging with Winston or Pino, and scrape logs for latency metrics.
- **Scale with Docker Compose:** Create a `docker-compose.yml` with services for Ollama, MongoDB, and Chat UI for reproducible deployments.
- **Optimize prompts:** To ensure your most important instructions aren't missed in long prompts, check out our tips on [placing critical info in long prompts](/article/lost-in-the-middle-placing-critical-info-in-long-prompts).
- **Deploy to cloud:** Use Terraform or cloud-init scripts to automate VM provisioning and service setup on AWS, GCP, or Azure.

You now have a fully local chat interface powered by DeepSeek-R1, with no per-request API fees, persistent conversations, and production-ready deployment options.