Capture everything. Forget nothing. Let Alfred handle the rest.
Alfred is an AI-powered personal knowledge management and learning assistant, themed after Alfred Pennyworth, Batman's ever-reliable butler. It captures your daily insights from text, audio, and images, automatically organizes them into a searchable knowledge base, generates spaced-repetition revision reports, and provides a conversational AI chat interface.
"I trust you'll find everything in order, sir. Your memories, neatly catalogued and ready for review."

| Feature | Description |
|---|---|
| Multi-Modal Capture | Type notes, record/upload audio, snap/upload images, all from a gesture-driven mobile app |
| AI Processing Pipeline | Transcription, OCR, semantic clustering, topic classification, web research enrichment, and vector embedding, fully automated |
| Spaced Repetition Reports | Daily pipeline retrieves chunks from 1, 3, 5, and 7 days ago, grouped by topic, written in Alfred's signature tone |
| Flashcards | Anki-style cards with Again / Good / Easy grading, generated from your captured knowledge |
| RAG Chat | Ask Alfred anything; semantic search retrieves relevant memories and knowledge for context-aware answers |
| Voice & Live Chat | Multimodal audio input and real-time bidirectional voice streaming via Gemini |
| Web Dashboard | Browse topics, view revision summaries, generate custom reports, and chat with your knowledge base |
| Resilient Uploads | Exponential backoff, per-step progress tracking, and resumable pipelines, so no upload is ever lost |

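
The spaced-repetition row above can be made concrete with a small sketch. This is not the project's actual `dailyReportPipelineFunction` code; it only illustrates the date arithmetic and topic grouping the table describes. The chunk fields `captured_on`, `topic`, and `text` are hypothetical names chosen for the example.

```python
from collections import defaultdict
from datetime import date, timedelta

# Review intervals named in the feature table: 1, 3, 5, and 7 days ago.
REVISION_OFFSETS_DAYS = (1, 3, 5, 7)


def revision_dates(today: date) -> list[date]:
    """Return the capture dates whose chunks are due for review today."""
    return [today - timedelta(days=offset) for offset in REVISION_OFFSETS_DAYS]


def group_due_chunks(chunks: list[dict], today: date) -> dict[str, list[dict]]:
    """Keep chunks captured on a due date and group them by topic.

    Each chunk is assumed to be a dict with the hypothetical keys
    "captured_on" (a date), "topic" (a str), and "text" (a str).
    """
    due = set(revision_dates(today))
    grouped: dict[str, list[dict]] = defaultdict(list)
    for chunk in chunks:
        if chunk["captured_on"] in due:
            grouped[chunk["topic"]].append(chunk)
    return dict(grouped)
```

Each topic group would then be handed to the report generator, one report section per topic.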

    ┌────────────────────────────────────────────────────────────────────────┐
    │                             USER INTERFACE                             │
    │                                                                        │
    │  Mobile App                          Web Dashboard                     │
    │  React Native / Expo                 Vanilla JS + Express + Vite       │
    │  • Text / Audio / Image capture      • RAG Chat & Voice Chat           │
    │  • Flashcards & Checklist            • Topic Browser & Reports         │
    │  • Upload Queue with retry           • Live Voice (WebSocket)          │
    │                                                                        │
    └────────────────────────┬──────────────────────┬────────────────────────┘
                             │                      │
                             ▼                      ▼
    ┌────────────────────────────────────────────────────────────────────────┐
    │                        APPWRITE CLOUD FUNCTIONS                        │
    │                                (Python)                                │
    │                                                                        │
    │  audioFunction ──► clusteringFunction ──► processSegmentFunction       │
    │  imageFunction ──┘                        │                            │
    │                                           ├──► vectorEmbedFunction     │
    │                                           └──► reportGeneratorFunction │
    │                                                                        │
    │  dailyReportPipelineFunction · customReportFunction                    │
    │  revisionChunksFunction      · vectorRetrieveFunction                  │
    │                                                                        │
    └────────────────────────────────────┬───────────────────────────────────┘
                                         │
                                         ▼
    ┌────────────────────────────────────────────────────────────────────────┐
    │                              QDRANT CLOUD                              │
    │                       Vector Database (3072-dim)                       │
    │                                                                        │
    │            memory · knowledge_base · topics · daily report             │
    │                     previous reports · flashcards                      │
    │                                                                        │
    └────────────────────────────────────────────────────────────────────────┘

Alfred uses the right model for each task:
| Task | Model | Why |
|---|---|---|
| Speech-to-Text | Deepgram Nova-3 | Industry-leading ASR accuracy and speed |
| Image OCR | OCR.space | Reliable OCR with progressive compression fallback |
| Semantic Clustering | Groq / Llama 3.3 70B | Fast inference for chunking long-form text |
| Topic Classification | Groq / Llama 3.1 8B | Lightweight, low-latency labeling |
| Report Generation | Google Gemini 2.5 Flash | High-quality long-form writing with personality |
| Web Research | Gemini 2.5 Flash + Google Search | Grounded answers with real-time web data |
| Voice Input | Gemini 2.5 Flash | Native multimodal audio understanding |
| Live Voice Streaming | Gemini 2.0 Flash | Real-time bidirectional WebSocket streaming |
| Vector Embeddings | Gemini Embedding 001 | 3072-dim embeddings for semantic search |
- Node.js (v18+)
- Expo CLI
- Python 3.x (for Appwrite functions)
- API keys for: Deepgram, OCR.space, Google Gemini, Groq, Qdrant Cloud

To run the mobile app:

    # Install dependencies
    npm install

    # Start the Expo development server
    npx expo start

Open on a development build, Android emulator, iOS simulator, or Expo Go.


To run the web dashboard:

    cd website
    npm install
    node server.js

Each function in the appwrite/ directory has its own requirements.txt. Deploy them to Appwrite Cloud or a self-hosted Appwrite instance.


    alfred/
    ├── app/                                # Expo Router screens
    │   ├── index.tsx                       # Home: card carousel + upload
    │   ├── checklist.tsx                   # Daily task checklist
    │   └── _layout.tsx                     # Root layout + gesture navigation
    ├── components/                         # Reusable UI components
    │   ├── AudioPickerSheet.tsx            # Audio record/upload bottom sheet
    │   ├── ImagePickerSheet.tsx            # Camera/gallery bottom sheet
    │   ├── TextInputModal.tsx              # Text note input modal
    │   ├── FlashcardsPanel.tsx             # Anki-style flashcard viewer
    │   ├── ChecklistPanel.tsx              # Daily revision checklist
    │   ├── UploadStatusButton.tsx          # Floating upload progress pill
    │   └── UploadDetailModal.tsx           # Per-step upload progress modal
    ├── services/                           # API & upload logic
    │   ├── api.ts                          # Processing pipeline orchestration
    │   ├── alfredApi.ts                    # Alfred API client
    │   ├── appwrite.ts                     # Appwrite SDK config
    │   └── uploadQueue.ts                  # Resilient job queue with retry
    ├── context/
    │   └── UploadContext.tsx               # Global upload state management
    ├── constants/                          # Theme, layout, item configs
    ├── website/                            # Web dashboard
    │   ├── server.js                       # Express + WebSocket server
    │   └── src/                            # Frontend (Vite + vanilla JS)
    ├── appwrite/                           # Serverless cloud functions
    │   ├── audioFunction/                  # Deepgram transcription
    │   ├── imageFunction/                  # OCR.space processing
    │   ├── clusteringFunction/             # Semantic text chunking
    │   ├── processSegmentFunction/         # Topic + research + embedding
    │   ├── vectorEmbedFunction/            # Qdrant vector storage
    │   ├── vectorRetrieveFunction/         # Semantic search
    │   ├── reportGeneratorFunction/        # Alfred-persona reports
    │   ├── dailyReportPipelineFunction/    # Spaced-repetition orchestrator
    │   ├── customReportFunction/           # On-demand topic briefings
    │   └── revisionChunksFunction/         # Historical chunk retrieval
    └── assets/                             # Fonts & images


End-to-end data flow:

    User captures text / audio / image
             │
             ▼
    Upload enters resilient job queue
    (exponential backoff, resumable state)
             │
             ▼
    ┌────────┴────────┐
    │ Audio?          │ Image?
    │ Deepgram Nova-3 │ OCR.space
    └────────┬────────┘
             │
             ▼
    Semantic clustering (Llama 3.3 70B)
    Text → coherent segments
             │
             ▼
    Per-segment processing:
      ├─ Topic classification (Llama 3.1 8B)
      ├─ Deduplication (deterministic UUID5)
      ├─ Web research (Gemini + Google Search)
      ├─ Embedding (Gemini Embedding 001)
      └─ Store in Qdrant (memory + knowledge_base + topics)
             │
             ▼
    Daily spaced-repetition pipeline:
    Chunks from 1 / 3 / 5 / 7 days ago
    → Grouped by topic → Alfred-persona reports
             │
             ▼
    Flashcards, checklist, RAG chat, voice chat
    All powered by semantic search over Qdrant

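
The "Deduplication (deterministic UUID5)" step in the flow above can be illustrated with a short sketch. It assumes the ID is derived from the normalized segment text; the README does not say which fields the real `processSegmentFunction` hashes. The namespace value and the function names are hypothetical.

```python
import uuid

# Hypothetical namespace: any fixed UUID works, as long as it never changes.
ALFRED_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "alfred.example")


def segment_id(text: str) -> str:
    """Derive a stable ID from the segment text (case and spacing ignored)."""
    normalized = " ".join(text.lower().split())
    return str(uuid.uuid5(ALFRED_NAMESPACE, normalized))


def deduplicate(segments: list[str]) -> dict[str, str]:
    """Map each unique ID to the first segment that produced it."""
    unique: dict[str, str] = {}
    for segment in segments:
        unique.setdefault(segment_id(segment), segment)
    return unique
```

Because UUID5 is a pure function of namespace and name, the same segment always maps to the same ID. If that ID is used as the Qdrant point ID, upserting a segment again overwrites the existing point instead of creating a duplicate.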
Report generation randomly selects from 8 distinct Alfred Pennyworth personality tones, ranging from "witty and eloquent" to "crisp and commanding", so your daily revision never feels stale.
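
A minimal sketch of random tone selection might look like the following. Only the two tones quoted above are included, because this README does not list the other six. The prompt wording and the `build_report_prompt` helper are hypothetical, not the project's actual prompt.

```python
import random
from typing import Optional

# Only these two tones are named in this README; the remaining six are
# not listed here, so the sketch leaves them out rather than guessing.
TONES = ["witty and eloquent", "crisp and commanding"]


def build_report_prompt(
    topic: str, notes: list[str], rng: Optional[random.Random] = None
) -> str:
    """Pick a tone at random and build a (hypothetical) report prompt."""
    rng = rng or random.Random()
    tone = rng.choice(TONES)
    bullet_list = "\n".join(f"- {note}" for note in notes)
    return (
        f"You are Alfred Pennyworth. Write a revision report in a {tone} tone.\n"
        f"Topic: {topic}\n"
        f"Notes:\n{bullet_list}"
    )
```

Passing a seeded `random.Random` makes the tone choice reproducible in tests, while the default draws a fresh tone on every run.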
"Shall I remind you, sir, that you studied distributed systems three days ago and appear to have retained absolutely none of it? Allow me to refresh your memory."
This project was built for the DEV Weekend Challenge: Community.
Built with 🦇 by sidx007