solankikeyur/ask-docs

AskDocs: Document Intelligence Platform

A Retrieval-Augmented Generation (RAG) system that lets administrators upload documents and lets users hold AI-powered conversations grounded strictly in those documents. Built with Laravel 13, React 19, OpenAI, and Cohere.



Description

AskDocs is a full-stack SaaS-style document-chat platform. Administrators upload PDF documents, which are parsed, chunked, embedded via OpenAI, and stored in a vector database. Admins can then chat with any document and manage which users have access to which documents. End users log in and chat only with documents explicitly assigned to them.

Every chat query follows a RAG pipeline:

  1. The user message is converted to an embedding vector.
  2. Up to 25 semantically similar chunks are retrieved from the vector store.
  3. Cohere re-ranks those candidates and returns the top 10 most relevant chunks.
  4. GPT-4o mini answers the question using only that context, streamed back in real time.
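
Steps 2 and 3 above can be sketched in TypeScript as follows. This is an illustration only, not the code in AiChatService: the `Chunk` shape, the function names, and the inclusive threshold comparison are assumptions, and the `relevance` callback stands in for the Cohere rerank call. The 0.3 threshold and the 25 and 10 limits come from this README.

```typescript
// Hypothetical chunk shape; the real DocumentChunk model may differ.
interface Chunk {
  id: number;
  text: string;
  embedding: number[];
}

// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Step 2: keep chunks at or above the threshold, best first, capped at `limit`.
function retrieveCandidates(
  query: number[],
  chunks: Chunk[],
  threshold = 0.3,
  limit = 25,
): Chunk[] {
  return chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(query, chunk.embedding) }))
    .filter((scored) => scored.score >= threshold)
    .sort((x, y) => y.score - x.score)
    .slice(0, limit)
    .map((scored) => scored.chunk);
}

// Step 3: order candidates by a relevance score and keep the top `topN`.
// `relevance` is a placeholder for the reranker; it is not the Cohere API.
function selectTopReranked(
  candidates: Chunk[],
  relevance: (chunk: Chunk) => number,
  topN = 10,
): Chunk[] {
  return [...candidates]
    .sort((x, y) => relevance(y) - relevance(x))
    .slice(0, topN);
}
```

The two stages are kept separate because the vector search is cheap and broad, while the reranker is slower and sees only the shortlisted candidates.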

Features

Admin

  • Upload PDF documents (up to 50 MB)
  • Automatic async document processing: text extraction → chunking → OpenAI embedding → vector storage
  • Document search and paginated listing
  • Assign / revoke document access per user
  • Create, update, enable/disable, and delete user accounts
  • Full chat interface: chat with any uploaded document
  • Rename and delete individual chats or clear all chat history
  • Dashboard overview

User (Viewer)

  • Log in and chat with documents assigned by the admin
  • Persistent chat history per document
  • Real-time streamed AI responses with Markdown rendering
  • Rename and delete chats

RAG Pipeline

  • PDF parsing via pdftotext (Poppler CLI) with fallback to Smalot PdfParser
  • Overlapping text chunking via TextChunker
  • Embeddings via OpenAI text-embedding-3-small (1536 dimensions)
  • Cosine-similarity vector search with a 0.3 threshold
  • Cohere rerank-v4.0-fast re-ranking for precision (cached 10 min)
  • Query embedding caching (1 hour)
  • AI completions via OpenAI gpt-4o-mini-2024-07-18, streamed via SSE
  • Conversation history (up to last 50 messages) passed to the agent
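
The two caching bullets above (query embeddings for 1 hour, rerank results for 10 minutes) describe time-to-live caching. The application uses Laravel's database cache, so the in-memory class below is only a sketch of the expiry behaviour. `TtlCache` and its methods are hypothetical names, and the clock is injectable so that expiry can be exercised without waiting.

```typescript
// Minimal in-memory cache with a fixed time-to-live per entry.
class TtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();
  private ttlMs: number;
  private now: () => number;

  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Returns the cached value, or undefined when missing or expired.
  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key);
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  // Returns the cached value, or computes and stores it on a miss.
  // Note: a computed value of `undefined` is never treated as a hit.
  remember(key: string, compute: () => V): V {
    const cached = this.get(key);
    if (cached !== undefined) return cached;
    const value = compute();
    this.set(key, value);
    return value;
  }
}

const ONE_HOUR_MS = 60 * 60 * 1000; // TTL the README gives for query embeddings
const TEN_MINUTES_MS = 10 * 60 * 1000; // TTL the README gives for rerank results
```

With this shape, a repeated question within the hour would reuse its embedding instead of triggering another embedding request.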

Platform

  • Role-based access control: admin vs viewer
  • Two-factor authentication (via Laravel Fortify)
  • Queue-based document processing (3 retries, 5 min timeout, 60 s backoff)
  • Storage-disk abstraction: local (public) in dev, S3/R2 in production
  • TypeScript + ESLint + Prettier code quality tooling
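
The role-based access control listed above can be pictured as a path check. The sketch below is a TypeScript illustration under stated assumptions, not the Laravel middleware itself: `SessionUser`, `requiredRole`, and `canAccess` are hypothetical names, and only the /admin and /chat prefixes and the verified requirement are taken from the route tables in this README. Other checks, such as authentication for /settings, are out of scope.

```typescript
type UserRole = "admin" | "viewer";

// Hypothetical view of the logged-in user.
interface SessionUser {
  role: UserRole;
  emailVerified: boolean;
}

// Returns the role a path requires, or null when the path is not role-gated here.
function requiredRole(path: string): UserRole | null {
  if (path === "/admin" || path.startsWith("/admin/")) return "admin";
  if (path === "/chat" || path.startsWith("/chat/")) return "viewer";
  return null;
}

// True when the user may open the path under the rules modelled above.
function canAccess(user: SessionUser | null, path: string): boolean {
  const role = requiredRole(path);
  if (role === null) return true; // not modelled, e.g. /login
  if (user === null || !user.emailVerified) return false;
  return user.role === role;
}
```

Note that in this model an admin does not use /chat: admins have their own chat interface under /admin/chat.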

Tech Stack

| Layer | Technology |
|---|---|
| Backend Framework | Laravel 13 (PHP ^8.3) |
| Frontend Framework | React 19 + TypeScript 5.7 |
| SPA Bridge | Inertia.js 3 (server-driven SPA) |
| Build Tool | Vite 8 + laravel-vite-plugin |
| Styling | Tailwind CSS 4 |
| UI Components | Radix UI primitives, Headless UI, Lucide icons |
| Auth | Laravel Fortify 1.34 (email/password + 2FA) |
| AI | laravel/ai ^0.4.2 (OpenAI GPT-4o mini + embeddings) |
| Reranking | Cohere API v2 (rerank-v4.0-fast) |
| PDF Parsing | pdftotext (Poppler) + smalot/pdfparser ^2.12 |
| Database | SQLite (dev) / MySQL (prod) |
| Queue | Laravel database queue |
| Storage | Local (public disk, dev) / AWS S3 (prod) |
| Cache | Laravel database cache |
| Testing | Pest 4 |
| Observability | Laravel Nightwatch 1.26 |
| Routing Codegen | Laravel Wayfinder (type-safe routes in TS) |

Architecture Overview

┌────────────────────────────────────────────────────────────┐
│                          Browser                           │
│              React 19 SPA  (Inertia.js)                    │
└──────────────────────────┬─────────────────────────────────┘
                           │ HTTP / SSE
┌──────────────────────────▼─────────────────────────────────┐
│                   Laravel 13 Application                   │
│  ┌──────────────────┐   ┌────────────────────────────────┐ │
│  │  Admin Routes    │   │     User (Viewer) Routes       │ │
│  │  /admin/*        │   │     /chat/*                    │ │
│  └────────┬─────────┘   └───────────────┬────────────────┘ │
│           │                             │                  │
│  ┌────────▼─────────────────────────────▼────────────────┐ │
│  │             AiChatService  (RAG pipeline)             │ │
│  │  1. Embed query  →  2. Vector search  →               │ │
│  │  3. Cohere rerank  →  4. Build prompt  →              │ │
│  │  5. Stream GPT-4o mini response                       │ │
│  └───────────────────────────────────────────────────────┘ │
│                                                            │
│  ┌───────────────────────────────────────────────────────┐ │
│  │          Queue Worker  (document ingestion)           │ │
│  │  Upload  →  Parse PDF  →  Chunk  →  Embed  →  DB      │ │
│  └───────────────────────────────────────────────────────┘ │
└────────────────────────────────────────────────────────────┘
          │                     │              │
     ┌────▼────┐         ┌──────▼──────┐  ┌────▼─────┐
     │ SQLite/ │         │  S3 / Local │  │ OpenAI + │
     │  MySQL  │         │   Storage   │  │  Cohere  │
     └─────────┘         └─────────────┘  └──────────┘

Document lifecycle:

Admin uploads PDF
      │
      ▼
DocumentController.store()
  → stores file on disk
  → creates Document (status=processing)
  → dispatches ProcessDocument job
      │
      ▼  (queue worker)
ProcessDocument.handle()
  → DocumentParser.extractText()  (pdftotext → Smalot fallback)
  → TextChunker.chunk()
  → OpenAI Embeddings (batched, 100 chunks/request)
  → bulk-insert DocumentChunks
  → Document (status=ready)
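
The chunking and batching steps in the lifecycle above can be sketched as two small functions. This is a TypeScript illustration, not the TextChunker or ProcessDocument code: the character-based windows and the default size of 1000 with an overlap of 200 are assumptions, since this README does not state the real values. The batch size of 100 matches the lifecycle above.

```typescript
// Split text into fixed-size windows that overlap by `overlap` characters.
// Sizes are in characters here; the real chunker may count tokens or words.
function chunkText(text: string, size = 1000, overlap = 200): string[] {
  if (size <= 0 || overlap < 0 || overlap >= size) {
    throw new RangeError("require size > 0 and 0 <= overlap < size");
  }
  const chunks: string[] = [];
  const step = size - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // this window reached the end
  }
  return chunks;
}

// Group items into batches of at most `batchSize`, one batch per embedding request.
function toBatches<T>(items: T[], batchSize = 100): T[][] {
  if (batchSize <= 0) {
    throw new RangeError("batchSize must be positive");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}
```

Overlap keeps a sentence that straddles a chunk boundary fully present in at least one chunk, at the cost of storing some text twice.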

Folder Structure

ask-docs/
├── app/
│   ├── Ai/
│   │   └── Agents/
│   │       └── AskDoc.php          # Laravel AI agent: system instructions + conversation history
│   ├── Enums/
│   │   └── UserRole.php            # admin | viewer
│   ├── Http/
│   │   ├── Controllers/
│   │   │   ├── Admin/              # Admin-only: Dashboard, Documents, Users, Chat
│   │   │   └── User/               # Viewer: Chat
│   │   └── Middleware/             # admin / viewer guards
│   ├── Jobs/
│   │   └── ProcessDocument.php     # Async ingestion job
│   ├── Models/
│   │   ├── Document.php            # PDF doc + storage helpers
│   │   ├── DocumentChunk.php       # Chunk + embedding
│   │   ├── Chat.php
│   │   ├── Message.php
│   │   └── User.php
│   └── Services/
│       ├── AiChatService.php       # RAG orchestration (embed → search → rerank → stream)
│       ├── CohereService.php       # Cohere Rerank API v2 client
│       ├── DocumentParser.php      # PDF text extraction
│       └── TextChunker.php         # Overlapping text chunking
├── database/
│   └── migrations/                 # Full schema: users, documents, chunks, chats, messages
├── resources/
│   └── js/                         # React 19 frontend (TypeScript)
├── routes/
│   ├── web.php                     # All web routes
│   └── settings.php                # Profile / password / 2FA settings routes
├── .env.example                    # All required env variables
└── vite.config.ts

Installation & Setup

Prerequisites

| Requirement | Version |
|---|---|
| PHP | ^8.3 |
| Composer | >= 2 |
| Node.js | >= 20 |
| npm | >= 10 |
| SQLite | (dev default) |
| pdftotext (optional) | Poppler CLI; improves PDF extraction performance |

Quick Setup (one command)

composer run setup

This runs: composer install → copy .env → generate app key → migrate → npm install → npm run build.


Manual Setup

# 1. Clone the repository
git clone <repo-url> ask-docs
cd ask-docs

# 2. Install PHP dependencies
composer install

# 3. Create and configure environment file
cp .env.example .env
php artisan key:generate

# 4. Configure .env (see Environment Variables section)
#    Minimum required: APP_KEY, OPENAI_API_KEY, COHERE_API_KEY

# 5. Run database migrations
php artisan migrate

# 6. Install Node dependencies and build assets
npm install
npm run build

# 7. Create the first admin user
php artisan tinker
>>> \App\Models\User::create([
...     'name'     => 'Admin',
...     'email'    => 'admin@example.com',
...     'password' => bcrypt('password'),
...     'role'     => \App\Enums\UserRole::Admin,
...     'status'   => true,
... ]);

Environment Variables

# Application
APP_NAME=AskDocs
APP_ENV=local
APP_KEY=                          # Generated by php artisan key:generate
APP_DEBUG=true
APP_URL=http://localhost

# Database (SQLite for dev, configure host/port for MySQL in prod)
DB_CONNECTION=sqlite
# DB_HOST=127.0.0.1
# DB_PORT=3306
# DB_DATABASE=ask_docs
# DB_USERNAME=root
# DB_PASSWORD=

# Queue & Cache (database driver, zero extra infra for dev)
QUEUE_CONNECTION=database
CACHE_STORE=database
SESSION_DRIVER=database

# File Storage
# Local dev:   DOCUMENT_STORAGE_DISK=public
# Production:  DOCUMENT_STORAGE_DISK=s3
DOCUMENT_STORAGE_DISK=public

# AWS S3 (required when DOCUMENT_STORAGE_DISK=s3)
AWS_ACCESS_KEY_ID=
AWS_SECRET_ACCESS_KEY=
AWS_DEFAULT_REGION=us-east-1
AWS_BUCKET=

# OpenAI (required): used for embeddings and chat completions
OPENAI_API_KEY=                   # Set via laravel/ai configuration

# Cohere (required for reranking)
COHERE_API_KEY=
COHERE_RERANK_MODEL=rerank-v4.0-fast

# Mail (optional for dev β€” defaults to log driver)
MAIL_MAILER=log

Note: OPENAI_API_KEY is consumed by the laravel/ai package. Refer to the Laravel AI package documentation for the exact config key if not auto-detected.
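
A quick way to catch a misconfigured environment is to check the required variables up front. The TypeScript sketch below is not part of the application; `missingEnv` is a hypothetical helper. The rules it encodes come from this README: APP_KEY, OPENAI_API_KEY, and COHERE_API_KEY are always required, and the AWS_* variables are required when DOCUMENT_STORAGE_DISK=s3.

```typescript
type Env = Record<string, string | undefined>;

// Returns the names of variables that are required but missing or blank.
function missingEnv(env: Env): string[] {
  const required = ["APP_KEY", "OPENAI_API_KEY", "COHERE_API_KEY"];
  if (env["DOCUMENT_STORAGE_DISK"] === "s3") {
    required.push(
      "AWS_ACCESS_KEY_ID",
      "AWS_SECRET_ACCESS_KEY",
      "AWS_DEFAULT_REGION",
      "AWS_BUCKET",
    );
  }
  return required.filter((name) => (env[name] ?? "").trim() === "");
}
```

In a Node.js script this could be called as `missingEnv(process.env)` after loading the .env file, failing early when the returned list is not empty.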


Usage

Running in Development

composer run dev

This concurrently starts:

  • php artisan serve: Laravel backend (:8000)
  • php artisan queue:listen --tries=1: Queue worker for document processing
  • npm run dev: Vite HMR dev server

Key Workflows

Admin: Upload & Chat

  1. Log in at /admin
  2. Go to Documents → upload a PDF
  3. Wait for status to change to Ready (processed asynchronously)
  4. Go to Chat → select the document → start chatting

Admin: User Management

  1. Go to Users → create a new viewer account (name, email, password)
  2. Go to Documents → click Assign on a document → select users
  3. The user can now log in at /chat and chat with that document

User: Chat

  1. Log in; you are redirected to /chat
  2. Select a document from the sidebar (only assigned documents appear)
  3. Chat; responses are streamed in real time with Markdown support
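
The visibility rule behind these workflows (admins can chat with any uploaded document, viewers only with documents assigned to them) can be sketched as a filter. The TypeScript below is an illustration with hypothetical types and names, not the application's query. Hiding documents that are still processing is an assumption based on the "wait for Ready" step in the admin workflow.

```typescript
type UserRole = "admin" | "viewer";

// Hypothetical shapes; the real models carry more fields.
interface User {
  id: number;
  role: UserRole;
}

interface Doc {
  id: number;
  title: string;
  status: "processing" | "ready";
}

interface Assignment {
  userId: number;
  documentId: number;
}

// Returns the documents the given user can open a chat with.
function chattableDocuments(user: User, docs: Doc[], assignments: Assignment[]): Doc[] {
  const ready = docs.filter((doc) => doc.status === "ready");
  if (user.role === "admin") return ready; // admins see every processed document
  const assignedIds = new Set(
    assignments.filter((a) => a.userId === user.id).map((a) => a.documentId),
  );
  return ready.filter((doc) => assignedIds.has(doc.id));
}
```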

API / Route Documentation

All routes are web routes served via Inertia.js (HTML for the first load, JSON for subsequent SPA navigation).

Auth Routes (Laravel Fortify)

| Method | Path | Description |
|---|---|---|
| GET | /login | Login page |
| POST | /login | Authenticate |
| POST | /logout | Logout |
| GET | /two-factor-challenge | 2FA challenge |

User (Viewer) Routes: auth + verified + viewer

| Method | Path | Description |
|---|---|---|
| GET | /chat | Chat dashboard |
| GET | /chat/{chat} | Load a specific chat |
| POST | /chat | Send a message / start a chat (SSE stream) |
| PUT | /chat/{chat} | Rename chat |
| DELETE | /chat/{chat} | Delete a chat |
| DELETE | /chat | Delete all chats |

Admin Routes: auth + verified + admin, prefix /admin

| Method | Path | Description |
|---|---|---|
| GET | /admin | Dashboard |
| GET | /admin/documents | List documents (paginated, searchable) |
| POST | /admin/documents | Upload PDF (dispatches ingestion job) |
| POST | /admin/documents/{document}/assign | Assign document to users |
| DELETE | /admin/documents/{document} | Delete document + file |
| GET | /admin/users | List users |
| POST | /admin/users | Create viewer user |
| PUT | /admin/users/{user} | Update user |
| DELETE | /admin/users/{user} | Delete user |
| POST | /admin/users/{user}/access | Toggle user access |
| GET | /admin/chat | Chat dashboard |
| GET | /admin/chat/{chat} | Load a specific chat |
| POST | /admin/chat | Send a message (SSE stream) |
| PUT | /admin/chat/{chat} | Rename chat |
| DELETE | /admin/chat/{chat} | Delete a chat |
| DELETE | /admin/chat | Delete all chats |

Chat endpoint response: Content-Type: text/event-stream. The X-Chat-Id response header carries the chat ID for newly created chats.
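
A text/event-stream body arrives in arbitrary network chunks, so a client has to buffer until an event is complete. The frontend uses @laravel/stream-react for this; the TypeScript sketch below only illustrates the general idea with a hypothetical `parseSse` helper that follows the standard server-sent events framing (events separated by a blank line, payload on `data:` lines). The exact payload format the chat endpoint emits is not documented here, so the payloads are treated as opaque strings.

```typescript
interface SseParseResult {
  data: string[]; // payload of each complete event, in order
  rest: string; // trailing partial event, to be prepended to the next chunk
}

// Extract complete events from a buffer of received text.
function parseSse(buffer: string): SseParseResult {
  const normalized = buffer.replace(/\r\n/g, "\n");
  const blocks = normalized.split("\n\n");
  const rest = blocks.pop() ?? ""; // last block is incomplete until a blank line arrives
  const data: string[] = [];
  for (const block of blocks) {
    const lines = block
      .split("\n")
      .filter((line) => line.startsWith("data:"))
      .map((line) => line.slice(5).replace(/^ /, "")); // drop "data:" and one optional space
    if (lines.length > 0) data.push(lines.join("\n"));
  }
  return { data, rest };
}
```

A caller would append each decoded network chunk to `rest`, call `parseSse` again, and render the returned payloads as they arrive. For a newly created chat, the ID would be read from the X-Chat-Id response header rather than from the stream.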

Settings Routes

| Method | Path | Description |
|---|---|---|
| GET/POST/PUT/DELETE | /settings/* | Profile, password, 2FA management |

UI Notes

  • Framework: React 19 + TypeScript with Inertia.js (SSR-compatible)
  • Component Library: Radix UI primitives + Headless UI (fully accessible)
  • Icons: Lucide React
  • Styling: Tailwind CSS v4 with tw-animate-css and class-variance-authority
  • Chat markdown: react-markdown + remark-gfm for GFM rendering
  • Streaming: @laravel/stream-react for consuming SSE chat streams
  • Code quality: ESLint 9 + Prettier 3 + TypeScript strict mode

The frontend has two separate layouts:

  • /admin/*: Admin panel (documents, users, chat sidebar)
  • /chat/*: Viewer chat interface (documents assigned to the logged-in user only)

Storage

In production set DOCUMENT_STORAGE_DISK=s3 and fill in all AWS_* variables. The application will automatically upload documents to S3 and read them back from there β€” no code changes required.

Future Improvements

  • Multi-format support: Add Word (.docx) and plain-text (.txt) parsing to DocumentParser
  • Streaming progress: Real-time document processing progress indicator in the UI
  • Semantic chunking: Replace fixed-size chunks with sentence-boundary or semantic chunkers
  • Bulk document upload: Multi-file upload with per-file status tracking

Built with Laravel · React · OpenAI · Cohere

About

💬 Chat with your documents. Secure, async PDF processing with vector-based semantic search and real-time AI streaming.
