Enterprise-grade private AI document processing platform - Chat with your documents securely using your own infrastructure, powered by Ollama LLM, ChromaDB vector store, and n8n automation workflows.
Private Document AI is a complete self-hosted solution that enables organizations to implement secure document AI capabilities without data leaving their infrastructure. Built with enterprise DevOps practices, this platform provides:
- 🔒 Complete Data Privacy: All processing happens on your infrastructure
- 🚀 Production-Ready Deployment: Automated CI/CD with infrastructure as code
- 🔧 Extensible Workflows: Visual automation with n8n
- ⚡ High Performance: Vector search with ChromaDB and local LLM inference
- 📈 Scalable Architecture: Containerized microservices design
```
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│   Client Web    │────▶│  n8n Platform   │────▶│   Ollama LLM    │
│   Interface     │     │     :5678       │     │     :11434      │
└─────────────────┘     └─────────────────┘     └─────────────────┘
                                 │                       │
                                 ▼                       ▼
                        ┌─────────────────┐     ┌─────────────────┐
                        │    ChromaDB     │     │     Vector      │
                        │     :8000       │     │   Embeddings    │
                        └─────────────────┘     └─────────────────┘
```
| Component | Technology | Purpose | Port |
|---|---|---|---|
| Orchestration | n8n | Workflow automation & UI | 5678 |
| LLM Engine | Ollama (Llama3 8B) | Text generation & embeddings | 11434 |
| Vector Database | ChromaDB | Document storage & similarity search | 8000 |
| Infrastructure | Terraform + DigitalOcean | Cloud provisioning | - |
| Containerization | Docker Compose | Service orchestration | - |
| CI/CD | GitHub Actions | Automated deployment | - |
- 📄 Document Ingestion: Upload → Text extraction → Chunking → Embedding → Vector storage
- 💬 Question Answering: Query → Embedding → Similarity search → Context retrieval → LLM response
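For intuition, the two flows above can be sketched in plain Python: fixed-size overlapping chunking for ingestion, and cosine-similarity ranking for retrieval. The deployed stack delegates these steps to n8n, Ollama, and ChromaDB; the chunk size, overlap, and helper names below are illustrative, not taken from the repo.

```python
import math

def chunk(text, size=500, overlap=50):
    """Split text into overlapping character chunks (illustrative sizes)."""
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query_vec, stored, k=3):
    """Rank stored (chunk_text, embedding) pairs by similarity to the query."""
    ranked = sorted(stored, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]
```

In the real pipeline, embeddings come from Ollama and the ranked lookup is a single ChromaDB similarity query; the retrieved chunks are then passed to the LLM as context.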
- DigitalOcean account with API token
- SSH key pair configured in DigitalOcean
- GitHub repository with Actions enabled
Before deploying, you need to set up SSH keys for secure server access. Follow this step-by-step guide:
```bash
# Generate a new SSH key pair specifically for this project
ssh-keygen -t ed25519 -C "private-ai-deployment" -f ~/.ssh/private-ai

# This creates two files:
#   ~/.ssh/private-ai      (private key - keep secret!)
#   ~/.ssh/private-ai.pub  (public key - safe to share)
```

Alternative for older systems:

```bash
# If ed25519 is not supported, use RSA
ssh-keygen -t rsa -b 4096 -C "private-ai-deployment" -f ~/.ssh/private-ai
```
1. Display your public key:

   ```bash
   cat ~/.ssh/private-ai.pub
   ```

2. Copy the output (starts with `ssh-ed25519` or `ssh-rsa`)

3. Add to DigitalOcean:
   - Go to DigitalOcean Control Panel
   - Navigate to Account → Security → SSH Keys
   - Click "Add SSH Key"
   - Paste your public key content
   - Name it: `private-ai-deployment`
   - Click "Add SSH Key"
4. Get your SSH Key ID:

   ```bash
   # Method 1: Using doctl CLI (if installed)
   doctl compute ssh-key list

   # Method 2: Using curl
   curl -X GET \
     -H "Content-Type: application/json" \
     -H "Authorization: Bearer YOUR_DO_TOKEN" \
     "https://api.digitalocean.com/v2/account/keys"
   ```

   Copy the numeric ID (e.g., `12345678`) - you'll need this for Terraform.
5. Get your private key content:

   ```bash
   cat ~/.ssh/private-ai
   ```

6. Copy the entire content including the `-----BEGIN` and `-----END` lines

7. Add to GitHub Secrets:
   - Go to your repository → Settings → Secrets and variables → Actions
   - Click "New repository secret"
   - Name: `SSH_PRIVATE_KEY`
   - Value: Paste the entire private key content
   - Click "Add secret"
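The curl method above returns JSON from DigitalOcean's `/v2/account/keys` endpoint; the numeric ID can be pulled out programmatically. The response below is abbreviated and the ID shown is made up, so treat this as a sketch of the shape rather than a verbatim API response.

```python
import json

# Abbreviated sample of a GET /v2/account/keys response (IDs are made up).
sample = json.loads("""
{
  "ssh_keys": [
    {"id": 12345678,
     "name": "private-ai-deployment",
     "public_key": "ssh-ed25519 AAAA..."}
  ]
}
""")

def key_id_by_name(response, name):
    """Return the numeric ID of the SSH key with the given name, or None."""
    for key in response.get("ssh_keys", []):
        if key["name"] == name:
            return key["id"]
    return None

print(key_id_by_name(sample, "private-ai-deployment"))  # 12345678
```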
Edit `terraform/main.tf` and replace the SSH key ID:

```hcl
resource "digitalocean_droplet" "private_ai_server" {
  # Replace with your SSH key ID from Step 2
  ssh_keys = ["12345678"]  # ← Replace this number

  # ... rest of configuration
}
```

- ✅ Never commit private keys to Git
- ✅ Use unique keys for each project
- ✅ Store private keys securely
- ✅ Regularly rotate SSH keys
- ❌ Don't share private key content
- ❌ Don't reuse personal SSH keys
| Issue | Solution |
|---|---|
| `Permission denied (publickey)` | Verify key is added to DigitalOcean and ID is correct |
| Bad permissions error | Fix permissions: `chmod 600 ~/.ssh/private-ai` |
| Key not found in DO | Re-upload public key to DigitalOcean account |
| GitHub secret not working | Ensure entire private key is copied, including headers |
```bash
# Test SSH connection to DigitalOcean (after deployment)
ssh -i ~/.ssh/private-ai root@YOUR_DROPLET_IP

# Check key permissions
ls -la ~/.ssh/private-ai*

# Verify public key format
ssh-keygen -l -f ~/.ssh/private-ai.pub
```

```bash
# Fork this repository
git clone https://github.com/YOUR_USERNAME/private-ai.git
cd private-ai
```

Navigate to your repository → Settings → Secrets and variables → Actions:
| Secret Name | Description | Example |
|---|---|---|
| `DIGITALOCEAN_TOKEN` | DigitalOcean API token | `dop_v1_xxxxx` |
| `SSH_PRIVATE_KEY` | Private SSH key content | `-----BEGIN OPENSSH PRIVATE KEY-----` |
Edit `terraform/main.tf` and replace the SSH key ID:

```hcl
resource "digitalocean_droplet" "private_ai_server" {
  ssh_keys = ["YOUR_SSH_KEY_ID"]  # Replace with your key ID

  # ... rest of configuration
}
```

```bash
git add .
git commit -m "feat: configure deployment secrets"
git push origin main
```

🎉 That's it! GitHub Actions will automatically:
- Provision DigitalOcean infrastructure
- Install and configure all services
- Deploy the complete AI stack
After deployment completes, access your services:
- 🌐 n8n Interface: `http://YOUR_DROPLET_IP:5678`
- 📊 ChromaDB API: `http://YOUR_DROPLET_IP:8000`
- 🤖 Ollama API: `http://YOUR_DROPLET_IP:11434`
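A quick way to confirm all three endpoints respond is a small script. This is a sketch assuming the default ports from the table above; `service_urls` and `check` are hypothetical helpers, not part of the repo.

```python
import urllib.request

# Default ports for the three services (see the component table above).
SERVICES = {
    "n8n": 5678,
    "chromadb": 8000,
    "ollama": 11434,
}

def service_urls(host):
    """Map each service name to its base URL on the given host."""
    return {name: f"http://{host}:{port}" for name, port in SERVICES.items()}

def check(url, timeout=5):
    """Return True if the service answers with an HTTP 2xx/3xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except OSError:
        return False

print(service_urls("YOUR_DROPLET_IP")["n8n"])  # http://YOUR_DROPLET_IP:5678
# After deployment, loop over service_urls(...) and call check(url)
# to report which services are up.
```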
- Navigate to n8n interface
- Import `n8n_workflows/ingestion_workflow.json`
- Activate the workflow
- Send documents via webhook: `POST http://YOUR_DROPLET_IP:5678/webhook/ingest-document`
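A document could be posted to the ingestion webhook like this. The payload field names (`filename`, `content`) are an assumption; adjust them to whatever the workflow's Webhook node actually expects.

```python
import json
import urllib.request

def build_ingest_request(host, filename, text):
    """Build a POST request for the ingestion webhook.

    The payload shape is an assumption -- match it to the fields the
    n8n Webhook node is configured to read.
    """
    url = f"http://{host}:5678/webhook/ingest-document"
    payload = {"filename": filename, "content": text}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_ingest_request("YOUR_DROPLET_IP", "handbook.txt", "document text here")
print(req.full_url)  # http://YOUR_DROPLET_IP:5678/webhook/ingest-document
# To actually send it (requires the deployed stack):
#   with urllib.request.urlopen(req) as resp:
#       print(resp.status)
```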
- Import `n8n_workflows/qa_workflow.json`
- Activate the chat workflow
- Use the built-in chat interface to ask questions about your documents
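Under the hood, a QA workflow like this one assembles a prompt from the retrieved chunks before calling the LLM. The template below is purely illustrative; the actual prompt lives inside `qa_workflow.json` and may differ.

```python
def build_rag_prompt(question, chunks):
    """Combine retrieved chunks and the user question into one LLM prompt.

    Illustrative template -- the real prompt is defined in the n8n
    qa_workflow and may use different wording.
    """
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

print(build_rag_prompt("What is the refund policy?", ["Refunds within 30 days."]))
```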
```bash
# Clone repository
git clone https://github.com/YOUR_USERNAME/private-ai.git
cd private-ai

# Start services locally
docker-compose up -d

# Access local services
# n8n:      http://localhost:5678
# ChromaDB: http://localhost:8000
# Ollama:   http://localhost:11434
```

The platform includes two pre-built workflows:

- `ingestion_workflow.json`: Document processing pipeline
- `qa_workflow.json`: Question-answering interface
Customize these workflows in the n8n interface or create new ones for your specific use cases.
```bash
# SSH into your server
ssh root@YOUR_DROPLET_IP

# List available models
docker exec ollama_service ollama list

# Pull additional models
docker exec ollama_service ollama pull codellama:7b
docker exec ollama_service ollama pull mistral:7b
```

- All services run on a private Docker network
- Firewall configured to allow only necessary ports (22, 5678, 8000, 11434)
- Consider adding SSL/TLS certificates for production use
- ✅ No data leaves your infrastructure
- ✅ All processing happens locally
- ✅ No external API calls to OpenAI/Anthropic
- ✅ Full control over your data
```bash
# Enable SSL with Let's Encrypt
certbot --nginx -d yourdomain.com

# Set up backup automation
# Configure log rotation
# Implement monitoring with Prometheus/Grafana
# Set up automated security updates
```

```bash
# Check all services status
docker-compose ps

# View logs
docker-compose logs -f n8n
docker-compose logs -f ollama
docker-compose logs -f chroma
```

```bash
# Backup n8n workflows and data
tar -czf backup-n8n-$(date +%Y%m%d).tar.gz ./n8n_data

# Backup ChromaDB vector store
tar -czf backup-chroma-$(date +%Y%m%d).tar.gz ./chroma_data

# Backup Ollama models
tar -czf backup-ollama-$(date +%Y%m%d).tar.gz ./ollama_data
```

| Issue | Solution |
|---|---|
| Connection refused on port 5678 | Check if firewall allows port 5678: `ufw status` |
| Docker containers not starting | Check logs: `docker-compose logs SERVICE_NAME` |
| Out of disk space | Clean up: `docker system prune -a` |
| n8n workflow errors | Verify environment variables in `.env` file |
```bash
# Check system resources
df -h
free -h
docker stats

# Check service connectivity
curl http://localhost:5678
curl http://localhost:8000/api/v1/heartbeat
curl http://localhost:11434/api/tags
```

The automated deployment pipeline includes:
- Infrastructure Provisioning (Terraform)
- Environment Setup (Ubuntu + Docker)
- Application Deployment (Docker Compose)
- Service Verification (Health checks)
- Model Preparation (Ollama model download)
- Automatic: Push to `main` branch
- Manual: Workflow dispatch with custom commands
Edit `terraform/main.tf` to upgrade server size:

```hcl
resource "digitalocean_droplet" "private_ai_server" {
  size = "s-4vcpu-8gb"  # Upgrade from s-2vcpu-4gb
}
```

- GPU Support: Use GPU-enabled droplets for faster inference
- Load Balancing: Deploy multiple instances behind a load balancer
- Caching: Implement Redis for frequently accessed embeddings
- CDN: Use DigitalOcean Spaces for static content delivery
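The caching idea above can be prototyped in-process before introducing Redis: memoize embeddings by a hash of the input text so repeated ingestions or queries skip the expensive embedding call. `EmbeddingCache` is an illustrative sketch, not part of the repo.

```python
import hashlib

class EmbeddingCache:
    """In-process stand-in for a Redis embedding cache (illustrative)."""

    def __init__(self, embed_fn):
        self._embed = embed_fn  # e.g. a function that calls Ollama's embedding API
        self._store = {}

    def get(self, text):
        # Key by content hash so identical text never gets re-embedded.
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in self._store:
            self._store[key] = self._embed(text)
        return self._store[key]
```

Swapping `self._store` for a Redis client with the same get/set semantics would make the cache shared across instances, which matters once the stack runs behind a load balancer.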
This project uses a Custom Commercial License:
- ✅ FREE for: Personal use, education, research, and non-commercial projects
- 💰 PAID for: Commercial use, selling services, or creating commercial products
Perfect for businesses looking to:
- 🏢 Enterprise Document AI: Internal knowledge management systems
- 💡 SaaS Products: Build document processing services
- 🔧 Custom Solutions: White-label AI document platforms
- 📊 Consulting Services: AI implementation for clients
Ready to use this commercially? Let's discuss:
- 📧 Email: [YOUR_EMAIL]
- 💼 LinkedIn: [YOUR_LINKEDIN]
- 🐙 GitHub: Contact via issues or discussions
- 💰 Pricing: Flexible licensing based on your use case
Custom enterprise features available:
- Priority support and updates
- Custom integrations and modifications
- On-premise deployment assistance
- Training and consultation services
We welcome contributions! Please see our Contributing Guide for details.
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
- 📖 Documentation: Wiki
- 🐛 Bug Reports: Issues
- 💬 Discussions: GitHub Discussions
See the LICENSE file for the full licensing terms.
- n8n - Workflow automation platform
- Ollama - Local LLM inference engine
- ChromaDB - Vector database
- DigitalOcean - Cloud infrastructure
- Terraform - Infrastructure as code
🚀 Built with ❤️ for private, secure AI document processing
For enterprise support and custom implementations, please contact us.