ArxCore - Ask your Archives

Zero-trust Document Processing by LLMs

ArxCore is a lightweight solution that allows you to prompt your documents using natural language. The key feature is zero-trust processing - all document processing happens locally on a single CPU, ensuring your sensitive data never leaves your machine.

Upload documents, create searchable memories, and interact with them using AI-powered chat. Supports both local LLMs (via Ollama) and cloud providers.

Features

  • Zero-trust processing: All operations run locally on a single CPU
  • Document support: Multiple formats (PDF, DOCX, TXT, MD, Python, JavaScript, JSON, YAML, CSV, HTML, XML, logs)
  • Memvid storage: Memvid provides flexible, file-based local storage, making data sharing far simpler than with traditional vector databases
  • Vector search: Create searchable memories from documents
  • AI chat: Natural language queries about your documents
  • Local & cloud LLMs: Ollama, OpenAI, Google, Anthropic, and custom providers
  • Web interface: Modern UI for document management and chat
  • REST API: Full programmatic access
  • Multi-user sessions: Team collaboration support

[Screenshot: ArxCore web UI]

Comparison with Traditional Solutions

| Feature                | ArxCore (Memvid)   | Vector DBs          | Traditional DBs     |
|------------------------|--------------------|---------------------|---------------------|
| Storage Efficiency     | ⭐⭐⭐⭐⭐         | ⭐⭐                | ⭐⭐⭐              |
| Setup Complexity       | Simple             | Complex             | Complex             |
| Semantic Search        | ✅                 | ✅                  | ❌                  |
| Offline Usage          | ✅                 | ❌                  | ❌                  |
| Zero-Trust Processing  | ✅                 | ❌                  | ❌                  |
| Portability            | File-based         | Server-based        | Server-based        |
| Data Sharing           | Single files       | Database dumps      | Database dumps      |
| Dependencies           | Python + Node.js   | Multiple services   | Database server     |
| Scalability            | Millions of chunks | Billions of vectors | Billions of records |
| Cost                   | Free               | $$$$                | $$$                 |

Installation

Prerequisites

  • Python 3.8+
  • Node.js (for tooling)
  • Ollama with the nomic-embed-text model (used for local embeddings)

Setup

# Install Python dependencies
pip install -r requirements.txt

# Install Node.js dependencies (for testing and tooling)
npm install

Install Ollama

# Install Ollama (https://ollama.ai/)
curl -fsSL https://ollama.ai/install.sh | sh

# Pull model for memory embedding
ollama pull nomic-embed-text

# Pull model for chatbot
ollama pull deepseek-r1:7b

Recommended: Virtual Environment

# Create virtual environment
python -m venv venv

# Activate it
source venv/bin/activate  # macOS/Linux
# venv\Scripts\activate   # Windows

# Install dependencies
pip install -r requirements.txt

Quick Start

# Start web service
npm run serve

# Open the web UI (all features) in your browser:
#   http://localhost:5050

# Alternatively, use the shell interface:

# Generate memory from documents
npm run generate

# Start interactive chat
npm run chat

Usage

Basic Commands

  • npm run serve - Start web service
  • npm run generate - Create memories from documents
  • npm run chat - Interactive chat with your memories
  • Run npm run <command> -- --help for detailed options on any command

Development Commands

  • npm test - Run test suite

Local LLM Setup with Ollama

For maximum privacy, use local LLMs:

# 1. Install Ollama (https://ollama.ai/)
curl -fsSL https://ollama.ai/install.sh | sh

# 2. Pull embedding model (for memory generation)
ollama pull nomic-embed-text

# 3. Pull chat model
# You can experiment with model size/quantization
# For systems with 16GB+ RAM:
ollama pull gemma3:4b
# or
ollama pull deepseek-r1:7b

# For systems with lower memory (8-16GB):
ollama pull gemma2:2b
# or
ollama pull phi3:mini

# 4. Start Ollama server
ollama serve

# 5. Generate memory from your files
npm run generate -- --help

# 6. Chat in the command line
npm run chat -- --help
# or open the web UI at http://localhost:5050

Memory Requirements:

  • gemma3:12b - Requires a system with 24GB+ RAM
  • gemma3:4b, deepseek-r1:7b - Suitable for 16-32GB RAM systems
  • gemma2:2b, phi3:mini - Suitable for 8-16GB RAM systems
  • See Ollama model library for more options
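
Before generating memories, you can confirm the embedding model is reachable by calling Ollama's standard /api/embeddings endpoint directly. The sketch below uses Ollama's own API (not an ArxCore route) and assumes the requests package is installed:

import requests

# Ollama listens on port 11434 by default; this is Ollama's standard
# embeddings endpoint, independent of ArxCore.
resp = requests.post(
    "http://localhost:11434/api/embeddings",
    json={"model": "nomic-embed-text", "prompt": "hello archives"},
    timeout=30,
)
resp.raise_for_status()
print("embedding dimension:", len(resp.json()["embedding"]))  # 768 for nomic-embed-text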

API Documentation

ArxCore provides a REST API for full programmatic access, documented with an OpenAPI 3.0.1 specification.
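
As a purely illustrative sketch, a client might talk to the API as shown below. The route and payload here are hypothetical placeholders; consult the OpenAPI spec for the actual endpoints. Only the port (5050) comes from the Quick Start above.

import requests

# Hypothetical route and payload - check the OpenAPI 3.0.1 spec for the
# real endpoint names and request shapes.
resp = requests.post(
    "http://localhost:5050/api/chat",
    json={"message": "Summarize my uploaded documents"},
    timeout=60,
)
print(resp.json())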

Limitations

ArxCore supports concurrent use by multiple users, but it is designed for small-team collaboration rather than high-scale concurrent access. Current limitations:

  • No file locking during memory creation
  • In-memory session storage (not persistent)
  • File system based (not database-backed)

Troubleshooting

Common Issues

ModuleNotFoundError

# Ensure virtual environment is activated
source venv/bin/activate
which python  # Should show venv path

PDF Processing Issues

pip install PyPDF2

LLM API Keys

ArxCore supports commercial LLM providers including OpenAI, Google, and Anthropic; these require API credentials. Commercial provider support has been tested primarily with Anthropic - other providers may require additional configuration.

# Set environment variables for commercial providers
export OPENAI_API_KEY="your-key"
export GOOGLE_API_KEY="your-key"  
export ANTHROPIC_API_KEY="your-key"

Large Document Processing

Use smaller chunk sizes for very large documents via the API or configuration files.
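
For a sense of what tuning chunk sizes looks like, here is a sketch using the underlying Memvid encoder, following Memvid's published examples. Parameter values are illustrative and exact signatures may vary between Memvid versions; how ArxCore exposes these settings depends on your configuration.

from pathlib import Path
from memvid import MemvidEncoder

# Smaller chunk_size keeps each embedded chunk cheap to process and
# retrieve for very large inputs; 256/32 are illustrative values.
text = Path("big_document.txt").read_text()
encoder = MemvidEncoder()
encoder.add_text(text, chunk_size=256, overlap=32)
encoder.build_video("memory.mp4", "memory_index.json")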

Credits

ArxCore is built on top of Memvid, an innovative data storage technology that enables efficient, file-based vector storage without traditional database dependencies. Memvid's unique approach allows for:

  • Portable data storage - Memories stored as simple files that can be easily shared and moved
  • Zero-dependency architecture - No need for complex vector database installations
  • Efficient retrieval - Fast semantic search capabilities with minimal overhead

Special thanks to the Memvid project for providing the foundational technology that makes ArxCore's zero-trust document processing possible.
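
For a feel of what file-based storage means in practice, here is a minimal retrieval sketch based on Memvid's published examples (exact signatures may differ slightly between versions):

from memvid import MemvidRetriever

# The entire memory is just two files (video + index); copy or share them
# like any other files - no database server needed on either machine.
retriever = MemvidRetriever("memory.mp4", "memory_index.json")
results = retriever.search("what do my archives say about deadlines?", top_k=5)
for chunk in results:
    print(chunk)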

Future plans

  • Bundle CSS and external JS libraries
  • Allow creators to delete memories
  • Add support for more LLM services (Hugging Face, OpenRouter, Grok, Together AI...)
  • Allow inter-model chat messages
  • Experiment with alternative embedding models

License

MIT License - see LICENSE file for details.
