Privacy-first training data tool for LLM fine-tuning
From AI conversations to custom-trained models β 100% local, zero configuration
EdukaAI is a self-hosted tool that helps you collect, curate, and export training data for fine-tuning Large Language Models. It captures your best AI conversations β from OpenCode, OpenWebUI, or manual entry β and turns them into high-quality datasets, all while keeping your data local and private.
π Privacy First β Your data never leaves your machine. Local SQLite database, no cloud, no tracking.
β‘ Zero Configuration β Install and run. No complex setup. Start collecting in minutes.
π£ Capture Anywhere β Manual entry, file imports, or live capture from your favorite AI tools.
β Quality Control β Review workflow ensures only your best examples reach your training dataset.
π€ Multiple Export Formats β Alpaca, ShareGPT, JSONL, CSV β works with any training pipeline.
# One-time use
npx @elgap/edukaai
# Or install globally
npm install -g @elgap/edukaai
edukaaiOpen http://localhost:3030 and start capturing.
Create samples directly in the web UI β perfect for human-crafted "golden" examples.
Upload existing datasets (Alpaca, ShareGPT, CSV) β ideal for migration.
Capture coding conversations with one click from OpenCode CLI.
Plugin: github.com/ElGap/edukaai-opencode
Export conversations from your self-hosted OpenWebUI instance.
Plugin: github.com/ElGap/edukaai-openwebui
- Create multiple datasets for different projects
- Set goals and track progress with visual milestones
- Organize by purpose: coding, creative writing, Q&A, roleplay
- Core Fields: Instruction, Input, Output, System Prompt
- Metadata: Category, Difficulty, Quality (1-5 stars), Tags
- Review Workflow: Draft β In Review β Approved/Rejected
- Bulk Operations: Approve, categorize, or delete multiple samples
- Draft-First: All captures start in Draft for your review
- Duplicate Detection: Automatic semantic similarity matching
- Auto-Enrichment: Smart categorization and quality suggestions
- Alpaca (JSON) β Industry standard
- ShareGPT (JSON) β Conversation format
- JSONL β For training pipelines
- CSV β For analysis
Any tool can send data to EdukaAI via the Universal Capture API:
curl -X POST http://localhost:3030/api/capture \
-H "Content-Type: application/json" \
-d '{
"source": "my-plugin",
"apiVersion": "1.0",
"records": [{
"instruction": "Explain quicksort",
"output": "Quicksort is a divide-and-conquer algorithm...",
"category": "coding",
"qualityRating": 4
}]
}'Endpoint: POST /api/capture
Docs: http://localhost:3030/docs
| Command | Description |
|---|---|
edukaai |
Start server (http://localhost:3030) |
edukaai reset |
Reset database |
edukaai help |
Show all commands |
Environment Variables:
EDUKAAI_HOST(default: localhost)EDUKAAI_PORT(default: 3030)EDUKAAI_DATA_DIR(default: ~/.edukaai)
- 100% Local β SQLite database on your machine
- No Cloud β No external API calls
- No Tracking β Zero analytics or telemetry
- MIT License β Full transparency
- Frontend: Vue 3 + Nuxt 4 + Tailwind CSS
- Backend: Nuxt 4 API routes
- Database: SQLite (Drizzle ORM)
git clone https://github.com/elgap/edukaai.git
cd edukaai
npm install
npm run devnpm run db:reset # Reset database
npm run test # Run tests
npm run typecheck # Type checking
npm run build # Production buildedukaai/
βββ app/ # Nuxt frontend
βββ server/ # Backend API
βββ bin/ # CLI scripts
βββ docs/ # Documentation
- Full Documentation: eduka.elgap.ai
- Opencode plugin: github.com/ElGap/edukaai-opencode
- OpenWebUI plugin: github.com/ElGap/edukaai-openwebui
Contributions welcome:
- Plugins β Build integrations for your favorite tools
- Documentation β Tutorials and examples
- Bug Reports β Help us improve
Contribution guidelines will be added soon.
MIT License β see LICENSE
- Inspired by the need for simple, private LLM training tools
- Built with Nuxt, Vue, and Tailwind
- Icons by Lucide
Built with β€οΈ for the AI community
