Zero-trust Document Processing by LLMs
ArxCore is a lightweight tool that lets you query your documents in natural language. Its key feature is zero-trust processing: all document processing happens locally on a single CPU, so your sensitive data never leaves your machine.
Upload documents, create searchable memories, and interact with them using AI-powered chat. Supports both local LLMs (via Ollama) and cloud providers.
- ArxCore - Ask your Archives
- Zero-trust processing: All operations run locally on a single CPU
- Document support: Multiple formats (PDF, DOCX, TXT, MD, Python, JavaScript, JSON, YAML, CSV, HTML, XML, logs)
- Memvid storage: Uses Memvid for flexible, file-based local storage, making data easier to share than with traditional vector databases
- Vector search: Create searchable memories from documents
- AI chat: Natural language queries about your documents
- Local & cloud LLMs: Ollama, OpenAI, Google, Anthropic, and custom providers
- Web interface: Modern UI for document management and chat
- REST API: Full programmatic access
- Multi-user sessions: Team collaboration support
| Feature | ArxCore (Memvid) | Vector DBs | Traditional DBs |
|---|---|---|---|
| Storage Efficiency | ⭐⭐⭐⭐⭐ | ⭐⭐ | ⭐⭐⭐ |
| Setup Complexity | Simple | Complex | Complex |
| Semantic Search | ✅ | ✅ | ❌ |
| Offline Usage | ✅ | ❌ | ✅ |
| Zero-Trust Processing | ✅ | ❌ | ❌ |
| Portability | File-based | Server-based | Server-based |
| Data Sharing | Single files | Database dumps | Database dumps |
| Dependencies | Python + Node.js | Multiple services | Database server |
| Scalability | Millions of chunks | Billions of vectors | Billions of records |
| Cost | Free | $$$$ | $$$ |
- Python 3.8+
- Node.js (for tooling)
- Ollama with the `nomic-embed-text` model (for local embeddings)
# Install Python dependencies
pip install -r requirements.txt
# Install Node.js dependencies (for testing and tooling)
npm install

# Install Ollama (https://ollama.ai/)
curl -fsSL https://ollama.ai/install.sh | sh
# Pull model for memory embedding
ollama pull nomic-embed-text
# Pull model for chatbot
ollama pull deepseek-r1:7b

# Create virtual environment
python -m venv venv
# Activate it
source venv/bin/activate # macOS/Linux
# venv\Scripts\activate # Windows
# Install dependencies
pip install -r requirements.txt

# Start web service
npm run serve
# Open web UI with all features
http://localhost:5050
# Alternatively, you can use the shell interface
# Generate memory from documents
npm run generate
# Start interactive chat
npm run chat

- `npm run serve` - Start web service
- `npm run generate` - Create memories from documents
- `npm run chat` - Interactive chat with your memories
- Use the option `npm run <command> -- --help` to get more info
- `npm test` - Run test suite
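If the steps above succeeded, you can verify that local embedding works before generating any memories by calling Ollama's REST API directly (it listens on http://localhost:11434 by default). A minimal sanity check using only the Python standard library:

```python
import json
import urllib.request

# Ask the local Ollama server to embed a short text with nomic-embed-text.
payload = json.dumps({
    "model": "nomic-embed-text",
    "prompt": "ArxCore keeps documents on the local machine.",
}).encode("utf-8")

req = urllib.request.Request(
    "http://localhost:11434/api/embeddings",
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    embedding = json.load(resp)["embedding"]

# nomic-embed-text produces a 768-dimensional vector.
print(f"Embedding dimensions: {len(embedding)}")
```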
For maximum privacy, use local LLMs:
# 1. Install Ollama (https://ollama.ai/)
curl -fsSL https://ollama.ai/install.sh | sh
# 2. Pull embedding model (for memory generation)
ollama pull nomic-embed-text
# 3. Pull chat model
# You can experiment with model size/quantization
# For systems with 16GB+ RAM:
ollama pull gemma3:4b
# or
ollama pull deepseek-r1:7b
# For systems with lower memory (8-16GB):
ollama pull gemma2:2b
# or
ollama pull phi3:mini
# 4. Start Ollama server
ollama serve
# 5. Generate memory from your files
npm run generate -- --help
# 6. Chat in command line
npm run chat -- --help
# or open web UI
http://localhost:5050

Memory Requirements:
- `gemma3:9b` - Requires 24GB+ RAM systems
- `gemma3:4b`, `deepseek-r1:7b` - Suitable for 16-32GB RAM systems
- `gemma2:2b`, `phi3:mini` - Suitable for 8-16GB RAM systems
- See the Ollama model library for more options
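Before pointing ArxCore at a chat model, you can confirm that the one you pulled fits in RAM and responds by sending a one-off, non-streaming request to Ollama's generate endpoint:

```python
import json
import urllib.request

# Send a single non-streaming prompt to the local Ollama server.
payload = json.dumps({
    "model": "deepseek-r1:7b",  # swap in whichever chat model you pulled
    "prompt": "Reply with the single word: ready",
    "stream": False,
}).encode("utf-8")

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp)["response"])
```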
ArxCore provides a REST API documented with OpenAPI 3.0.1:
- Interactive docs: Start the server and visit http://localhost:5050/apidocs
- OpenAPI specs: Available in `docs/openapi.json` and `docs/openapi.yaml` (a minimal client sketch follows below)
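The snippet below only illustrates programmatic access; the `/api/chat` route and the payload shape are assumptions made for the sketch, so consult `docs/openapi.json` or the interactive docs for ArxCore's actual endpoints:

```python
import json
import urllib.request

# Hypothetical route and payload shape, for illustration only --
# check docs/openapi.json for the real ArxCore API.
payload = json.dumps({
    "question": "What does the Q3 report say about revenue?",
}).encode("utf-8")

req = urllib.request.Request(
    "http://localhost:5050/api/chat",  # hypothetical endpoint
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp))
```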
ArxCore can serve multiple users concurrently, but it is designed for small-team collaboration rather than high-scale concurrent access. Its limitations are:
- No file locking during memory creation (a client-side workaround is sketched after this list)
- In-memory session storage (not persistent)
- File system based (not database-backed)
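If several processes or scheduled jobs drive the same installation, one client-side workaround for the missing file locking is to serialize memory creation yourself with an advisory lock. A minimal sketch (POSIX-only, since it uses fcntl; the lock file path is arbitrary):

```python
import fcntl
import subprocess

# Advisory lock so only one process runs memory generation at a time.
# ArxCore itself does not provide this; fcntl works on Linux/macOS only.
with open("/tmp/arxcore-generate.lock", "w") as lock:
    fcntl.flock(lock, fcntl.LOCK_EX)  # blocks until the lock is free
    try:
        subprocess.run(["npm", "run", "generate"], check=True)
    finally:
        fcntl.flock(lock, fcntl.LOCK_UN)
```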
ModuleNotFoundError
# Ensure virtual environment is activated
source venv/bin/activate
which python  # Should show the venv path

PDF Processing Issues

pip install PyPDF2
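After installing PyPDF2, a quick standalone check confirms that text extraction works (replace the file name with one of your own documents):

```python
from PyPDF2 import PdfReader

# Open a PDF and print the extracted text of the first page.
reader = PdfReader("example.pdf")  # replace with one of your documents
print(f"Pages: {len(reader.pages)}")
print(reader.pages[0].extract_text())
```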
LLM API Keys

ArxCore supports commercial LLM providers including OpenAI, Google, and Anthropic, all of which require API credentials. Commercial provider support has been tested primarily with Anthropic; other providers may need additional configuration.
# Set environment variables for commercial providers
export OPENAI_API_KEY="your-key"
export GOOGLE_API_KEY="your-key"
export ANTHROPIC_API_KEY="your-key"

Large Document Processing

Use smaller chunk sizes for very large documents via the API or configuration files.
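The exact chunk-size parameter names are defined by ArxCore's API and config schema and are not shown here; conceptually, a smaller chunk size just splits a document into more, shorter pieces before embedding, as in this generic sketch:

```python
from typing import List

def chunk_text(text: str, chunk_size: int = 512, overlap: int = 64) -> List[str]:
    """Split text into overlapping chunks of at most chunk_size characters."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

# Halving chunk_size roughly doubles the number of chunks to embed,
# keeping each one well inside the embedding model's context window.
doc = "lorem ipsum " * 10_000
print(len(chunk_text(doc, chunk_size=512)), len(chunk_text(doc, chunk_size=256)))
```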
- Memvid Project: https://github.com/Olow304/memvid
- ArxCore Repository: https://github.com/qool/arxcore
ArxCore is built on top of Memvid, an innovative data storage technology that enables efficient, file-based vector storage without traditional database dependencies. Memvid's unique approach allows for:
- Portable data storage - Memories stored as simple files that can be easily shared and moved
- Zero-dependency architecture - No need for complex vector database installations
- Efficient retrieval - Fast semantic search capabilities with minimal overhead
Special thanks to the Memvid project for providing the foundational technology that makes ArxCore's zero-trust document processing possible.
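For a sense of what this storage layer looks like, here is a sketch adapted from Memvid's own published examples; the class and method names are assumptions based on the upstream README, so verify them against the Memvid repository before relying on this:

```python
# Assumed API, based on the upstream Memvid README -- verify against
# https://github.com/Olow304/memvid before use.
from memvid import MemvidEncoder, MemvidRetriever

# Encode text chunks into one portable memory file plus an index file.
encoder = MemvidEncoder()
encoder.add_text("ArxCore stores document memories locally.", chunk_size=512)
encoder.build_video("memory.mp4", "memory_index.json")

# Semantic search runs directly against the two files; no server needed.
retriever = MemvidRetriever("memory.mp4", "memory_index.json")
print(retriever.search("where are memories stored?", top_k=3))
```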
Planned improvements:
- Bundle CSS and external JS libraries
- Allow creators to delete memories
- Add support for more LLM services (Hugging Face, OpenRouter, Grok, Together AI...)
- Allow inter-model chat messages
- Experiment with alternative embedding models
MIT License - see LICENSE file for details.
