PRD: Knowforge #1

@EtoboAgent

PRD — Knowforge

Product Name

Knowforge

One-Line Description

Knowforge helps startups and small businesses turn scattered documentation into searchable, AI-chat-enabled knowledge bases in minutes.


Executive Summary

Knowforge is a multi-tenant SaaS platform that allows organizations to upload existing documentation and instantly transform it into an organized knowledge base powered by Retrieval-Augmented Generation (RAG).

Users can upload files such as PDFs, DOCX, Markdown, text files, and internal documentation. Knowforge automatically processes content, structures information, enables semantic search, and provides an AI chat assistant capable of answering questions using trusted company sources.

The platform is designed around speed, simplicity, tenant isolation, and scalable infrastructure.


Problem Statement

Most growing teams already have valuable knowledge, but it is fragmented across:

  • PDFs
  • Docs
  • Notion pages
  • Wikis
  • SOPs
  • Support files
  • Product manuals
  • Internal notes
  • Policy documents

This causes:

  • Slow access to information
  • Poor onboarding
  • Repetitive support questions
  • Lost productivity
  • Inconsistent answers
  • Outdated documentation systems

Core Value Proposition

Upload your files today.
Get a searchable and chat-enabled knowledge base in minutes.


Primary Goals

  • Make knowledge instantly accessible
  • Improve onboarding speed
  • Reduce repetitive support questions
  • Enable AI chat over trusted documentation
  • Deliver fast workspace setup
  • Support multiple organizations securely

Secondary Goals

  • Improve internal search
  • Surface documentation gaps
  • Measure knowledge usage
  • Enable public support portals

Non-Goals (MVP)

  • Enterprise SSO
  • Fine-tuned custom LLMs
  • Complex workflow approvals
  • Advanced OCR for scanned documents
  • Billing automation
  • Multi-language translation engine
  • CMS-grade editorial tooling

Target Users

Primary ICP

  • Startups
  • Small ecommerce businesses
  • Small operations teams
  • Growing companies with scattered docs

User Roles

Owner

Workspace owner managing users and settings.

Admin

Manages content, analytics, and permissions.

Editor

Uploads and updates documentation.

Viewer

Consumes internal knowledge.

External User

Uses public KB or support chat.


Key Use Cases

Startup Onboarding

Upload SOPs, policies, and team docs.

Ask:

How do we deploy production changes?

Ecommerce Support

Upload shipping, returns, and warranty docs.

Ask:

How long do refunds take?

Product Documentation

Upload manuals and guides.

Ask:

How do I reset the device?

Internal Search

Ask:

Where is our vacation policy?


Product Principles

  • Fast setup over heavy configuration
  • Source-grounded answers over hallucinations
  • Simplicity over feature bloat
  • Secure tenant isolation by default
  • Strong developer-grade architecture

Functional Requirements

1. Authentication

  • Register/login
  • JWT sessions
  • Role-based access control
  • Workspace invitations
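
A minimal sketch of how the role-based access control above might be checked, assuming the four internal roles form a strict hierarchy (Viewer < Editor < Admin < Owner). The action names and the mapping to minimum roles are hypothetical illustrations, not part of this PRD; External User is left out because it only reaches public surfaces.

```python
from enum import IntEnum


class Role(IntEnum):
    # Ordered from least to most privileged; the strict ordering is an assumption.
    VIEWER = 1
    EDITOR = 2
    ADMIN = 3
    OWNER = 4


# Hypothetical action names mapped to the minimum role allowed to perform them.
MIN_ROLE = {
    "read_document": Role.VIEWER,
    "upload_document": Role.EDITOR,
    "view_analytics": Role.ADMIN,
    "manage_users": Role.OWNER,
}


def can(role: Role, action: str) -> bool:
    """Return True if the role meets the minimum for the action.

    Unknown actions are denied by default.
    """
    required = MIN_ROLE.get(action)
    return required is not None and role >= required
```

Denying unknown actions by default keeps a forgotten mapping entry from silently granting access.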

2. Multi-Tenant Workspaces

  • Create organizations
  • Unique tenant slug
  • Tenant-isolated data
  • Separate usage tracking
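
One possible way to derive a unique tenant slug from an organization name is sketched below. The slug format (lowercase ASCII letters, digits, hyphens), the numeric suffix for collisions, and the "workspace" fallback are assumptions; in practice uniqueness would also be enforced by a database constraint.

```python
import re


def slugify(name: str) -> str:
    """Lowercase the name and collapse runs of non-alphanumerics into hyphens."""
    slug = re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")
    return slug or "workspace"  # fallback for names with no usable characters


def unique_slug(name: str, taken: set[str]) -> str:
    """Append -2, -3, ... until the slug is not in the set of taken slugs."""
    base = slugify(name)
    candidate = base
    suffix = 2
    while candidate in taken:
        candidate = f"{base}-{suffix}"
        suffix += 1
    return candidate
```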

3. Document Management

Supported formats:

  • PDF
  • DOCX
  • TXT
  • Markdown
  • CSV
  • URLs (future)

Features:

  • Upload files
  • Delete/archive files
  • View indexing status
  • Metadata management

4. Automatic Knowledge Structuring

AI-assisted:

  • Category suggestions
  • Title cleanup
  • Summaries
  • Tags
  • Related documents

Human override allowed.

5. Ingestion Pipeline

When files are uploaded:

  • Store raw file
  • Extract text
  • Normalize content
  • Chunk intelligently
  • Generate embeddings
  • Index vectors
  • Update status
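
The chunking step could start as simply as fixed-size word windows with overlap, as sketched here. The window size, the overlap, and word-based splitting are assumptions for illustration; "intelligent" chunking would more likely respect headings and paragraph boundaries.

```python
def chunk_words(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into windows of `size` words, each sharing `overlap` words
    with the previous window."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("size must be positive and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks: list[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks
```

Overlap keeps a sentence that straddles a boundary retrievable from at least one chunk, at the cost of some duplicated embedding work.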

6. Semantic Search

  • Natural language search
  • Ranked relevance results
  • Tenant-filtered retrieval
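
To illustrate tenant-filtered retrieval, the sketch below ranks chunks by cosine similarity in memory and drops other tenants' chunks before scoring. The `Chunk` fields and function names are hypothetical; in the planned stack the same filter would be a `tenant_id` predicate on the vector query rather than a Python loop.

```python
import math
from dataclasses import dataclass


@dataclass
class Chunk:
    tenant_id: str
    doc_id: str
    text: str
    embedding: list[float]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(chunks: list[Chunk], tenant_id: str,
           query_embedding: list[float], top_k: int = 3) -> list[tuple[float, Chunk]]:
    """Score only the requesting tenant's chunks and return the top_k by similarity."""
    scored = [
        (cosine(query_embedding, c.embedding), c)
        for c in chunks
        if c.tenant_id == tenant_id  # filter before ranking, never after
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```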

7. AI Chat (RAG)

  • Ask questions in plain language
  • Retrieve relevant chunks
  • Generate source-backed answers
  • Show citations
  • Admit uncertainty when context is insufficient

8. Knowledge Base Portal

  • Public/private access
  • Category browsing
  • Search
  • Read documents

9. Analytics Dashboard

Track:

  • Questions asked
  • Most used docs
  • Unanswered queries
  • Token usage
  • Latency
  • Active users

Answer Quality Policy

Knowforge must only answer using retrieved trusted sources.

If relevant context is weak or unavailable, the system should state clearly that it does not have enough information.

No fabricated answers.
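
One way to enforce this policy is to gate generation on retrieval strength, as in the sketch below. The similarity threshold, the chunk limit, the fallback wording, and the `generate` callback are all assumptions; the right threshold would need tuning against an evaluation dataset.

```python
from typing import Callable

NO_ANSWER = "I don't have enough information in the knowledge base to answer that."


def select_context(scored_chunks: list[tuple[float, str]],
                   min_score: float = 0.75, max_chunks: int = 4) -> list[str]:
    """Keep the strongest chunks whose score clears the threshold."""
    ranked = sorted(scored_chunks, key=lambda pair: pair[0], reverse=True)
    return [text for score, text in ranked if score >= min_score][:max_chunks]


def answer_or_fallback(scored_chunks: list[tuple[float, str]],
                       generate: Callable[[list[str]], str]) -> str:
    """Call the generator only when trusted context exists; otherwise fall back."""
    context = select_context(scored_chunks)
    if not context:
        return NO_ANSWER  # the model is never called without grounding
    return generate(context)
```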


Non-Functional Requirements

Performance

  • Search P95 < 500ms
  • Chat P95 < 4s
  • Async indexing
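
As a rough illustration of how the P95 targets could be checked from collected latency samples, the sketch below uses the nearest-rank percentile method. The choice of method and the function names are assumptions; a metrics backend would normally compute this from histograms.

```python
import math


def percentile(samples: list[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of
    samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def meets_p95_target(samples_ms: list[float], target_ms: float) -> bool:
    """True if the 95th-percentile latency is strictly below the target."""
    return percentile(samples_ms, 95) < target_ms
```

For example, `meets_p95_target(search_latencies_ms, 500)` would express the search target above, where `search_latencies_ms` is a hypothetical list of measured durations.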

Scalability

  • MVP benchmark: 100+ tenants
  • Architecture testable with 50k+ documents
  • Horizontal worker scaling

Reliability

  • Retry failed jobs
  • Health checks
  • Monitoring
  • Graceful degradation
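
Retrying failed jobs usually means bounded attempts with exponential backoff. The sketch below shows the idea with the standard library only; the attempt count and delays are assumptions, and in the planned stack the worker framework's own retry mechanism would likely be used instead.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_retries(job: Callable[[], T], max_attempts: int = 3,
                     base_delay: float = 0.5,
                     sleep: Callable[[float], None] = time.sleep) -> T:
    """Run job, retrying on any exception with delays of base_delay * 2**n.

    The last failure is re-raised so the caller can mark the job as failed.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return job()
        except Exception:
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter keeps the function testable without real waiting.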

Security

  • Tenant isolation
  • Secure secrets handling
  • Role-based permissions
  • Audit logs
  • Encrypted storage

Observability

  • Structured logs
  • Metrics
  • Traces
  • Correlation IDs
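
Structured logs and correlation IDs can be combined by storing the ID in a context variable and emitting JSON lines, roughly as sketched below. The field names and the `new_correlation_id` helper are hypothetical; only standard-library `logging`, `json`, `contextvars`, and `uuid` are used.

```python
import contextvars
import json
import logging
import uuid

# Holds the ID for the current request or job; "-" means none has been set.
correlation_id: contextvars.ContextVar[str] = contextvars.ContextVar(
    "correlation_id", default="-"
)


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            "correlation_id": correlation_id.get(),
        })


def new_correlation_id() -> str:
    """Generate an ID and bind it to the current context."""
    cid = uuid.uuid4().hex
    correlation_id.set(cid)
    return cid
```

Calling `new_correlation_id()` at the start of a request and passing the value into enqueued jobs would let API and worker log lines be joined on one field.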

Technical Architecture

Frontend

Next.js

Backend

FastAPI

Workers

Celery + Redis

Database

PostgreSQL

Vector Search

pgvector

Storage

S3-compatible object storage

AI Layer

LLM abstraction supporting OpenAI / Bedrock
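
The LLM abstraction could be as small as one interface that each provider adapter implements. The sketch below is an assumption about shape only: `LLMClient`, `EchoClient`, and `get_client` are hypothetical names, and no real OpenAI or Bedrock SDK call is shown.

```python
from typing import Protocol


class LLMClient(Protocol):
    """Interface every provider adapter would implement."""

    def complete(self, prompt: str) -> str: ...


class EchoClient:
    """Stand-in provider for local tests; a real adapter would wrap a vendor SDK."""

    def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"


def get_client(provider: str, registry: dict[str, LLMClient]) -> LLMClient:
    """Look up the adapter configured for a provider name."""
    try:
        return registry[provider]
    except KeyError:
        raise ValueError(f"unknown LLM provider: {provider}") from None
```

Keeping the interface this narrow means the chat flow depends only on `complete`, so swapping providers is a configuration change rather than a code change.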

Infra

Docker
Terraform
CI/CD pipelines


High-Level System Flow

Upload Flow

  1. User uploads file
  2. File stored in object storage
  3. Document record created
  4. Worker processes file
  5. Chunks + embeddings created
  6. KB updated

Chat Flow

  1. User asks question
  2. Resolve tenant
  3. Retrieve relevant chunks
  4. Build prompt
  5. Generate answer
  6. Return response with citations
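
Step 4 of the chat flow, building the prompt, might look like the sketch below: retrieved chunks are numbered so the model can cite them as [n]. The instruction wording and the `title`/`text` keys are hypothetical.

```python
def build_prompt(question: str, chunks: list[dict[str, str]]) -> str:
    """Assemble a grounded prompt with numbered sources for citation."""
    sources = "\n".join(
        f"[{i}] {chunk['title']}: {chunk['text']}"
        for i, chunk in enumerate(chunks, start=1)
    )
    return (
        "Answer using only the numbered sources below. "
        "Cite sources as [n]. If the sources do not contain the answer, "
        "say you do not have enough information.\n\n"
        f"Sources:\n{sources}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

Because the numbers map back to chunk positions, the citations in the response can be resolved to document links after generation.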

Multi-Tenant Strategy

Initial model:

Shared database + shared tables + tenant_id isolation

Isolation layers:

  • DB filtering
  • Vector retrieval filtering
  • Cache namespacing
  • Storage namespacing
  • Role-scoped access
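
Cache and storage namespacing can be made hard to bypass by building every key through one helper that requires a tenant ID. The key layouts below are assumptions chosen for illustration, not a prescribed scheme.

```python
import hashlib


def _check_segment(value: str) -> str:
    """Reject empty segments and path separators so keys cannot escape a prefix."""
    if not value or "/" in value or ":" in value:
        raise ValueError(f"invalid key segment: {value!r}")
    return value


def storage_key(tenant_id: str, document_id: str, filename: str) -> str:
    """Object-storage key under a per-tenant prefix."""
    parts = (_check_segment(tenant_id), _check_segment(document_id),
             _check_segment(filename))
    return "tenants/{}/documents/{}/{}".format(*parts)


def cache_key(tenant_id: str, kind: str, payload: str) -> str:
    """Cache key namespaced by tenant; the payload is hashed to bound key length."""
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]
    return f"tenant:{_check_segment(tenant_id)}:{_check_segment(kind)}:{digest}"
```

Two tenants asking the identical question get different cache keys, so a cached answer built from one tenant's documents cannot be served to another.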

Future upgrades:

  • Schema per tenant
  • Dedicated enterprise infra

Success Metrics

Product Metrics

  • Time to first chat-ready KB < 5 min
  • Documents indexed/day
  • Weekly active tenants
  • Questions per tenant

Quality Metrics

  • Citation rate
  • Unanswered query %
  • Retrieval relevance score

Engineering Metrics

  • API latency
  • Worker throughput
  • Concurrent users supported
  • Cost per 100 chats
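
Cost per 100 chats can be derived from token counts and per-token prices, as in this sketch. The parameter names are hypothetical, and the prices passed in are whatever the chosen provider charges; no real pricing is assumed here.

```python
def cost_per_100_chats(prompt_tokens: int, completion_tokens: int, chats: int,
                       price_in_per_1k: float, price_out_per_1k: float) -> float:
    """Total LLM spend over a period, normalized to 100 chats.

    Token counts and chats are totals for the same period; prices are per
    1,000 tokens in any single currency.
    """
    if chats <= 0:
        raise ValueError("chats must be positive")
    total = (prompt_tokens / 1000 * price_in_per_1k
             + completion_tokens / 1000 * price_out_per_1k)
    return total / chats * 100
```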

Risks

Hallucinations

Mitigation:

  • Retrieval grounding
  • Citations
  • No-answer fallback

Tenant Leakage

Mitigation:

  • Tenant-scoped queries
  • Isolation tests
  • Access guards

High LLM Costs

Mitigation:

  • Cache repeated prompts
  • Efficient chunk retrieval
  • Usage limits

Poor Search Quality

Mitigation:

  • Better chunking
  • Metadata filtering
  • Evaluation dataset

MVP Scope

Included:

  • Auth
  • Multi-tenant workspaces
  • File uploads
  • Background indexing
  • Semantic retrieval
  • AI chat with citations
  • Dashboard
  • Basic analytics
  • Cloud deployment

Excluded:

  • Billing
  • SSO
  • Advanced OCR flows
  • Custom models
  • Workflow approvals

Roadmap

Phase 1 — MVP

Core platform usable end-to-end.

Phase 2 — Growth

  • Embeddable widget
  • Better analytics
  • Public API
  • Usage plans

Phase 3 — Enterprise

  • SSO
  • Audit exports
  • Dedicated environments
  • Advanced controls

Launch Definition of Done

  • Production deployed
  • CI/CD active
  • Monitoring enabled
  • Load tested
  • Tenant isolation tested
  • Documentation complete
  • Demo tenant seeded

Positioning Statement

Knowforge is the fastest way for growing teams to turn messy documentation into an AI-powered knowledge base.


Elevator Pitch

Knowforge converts scattered company knowledge into searchable, chat-enabled workspaces using RAG, with multi-tenant SaaS architecture and production-ready scalability.
