DocuAI is a Flask-based web application that lets users upload documents (PDF, DOCX, PPTX, TXT) and query them in natural language. It combines retrieval-augmented generation (RAG), semantic search, and large language models (LLMs) to answer questions using context drawn from the uploaded documents. The system supports multilingual queries, voice input, and a chat interface.
- Document Upload: Supports PDF, DOCX, PPTX, and TXT files.
- Semantic Chunking: Splits documents into meaningful chunks using spaCy.
- Embeddings & Vector Search: Generates embeddings (DeepInfra/OpenAI API) and stores them in Pinecone for semantic search.
- Hybrid Retrieval: Combines vector similarity (Pinecone) and BM25 keyword search, reranked by a cross-encoder for best relevance.
- LLM-Powered Q&A: Uses Groq API to rewrite queries and generate grounded, context-aware answers.
- Multilingual Support: Detects and translates queries/responses using langdetect and deep-translator.
- Voice Input: Users can ask questions via speech, transcribed and translated as needed.
- Modern Web UI: Responsive chat interface with document upload, language selection, and voice controls.
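To illustrate the Semantic Chunking feature listed above, here is a minimal sketch that groups sentences into chunks under a word budget. It assumes the sentences have already been split (for example by spaCy); the function name `chunk_sentences` and the `max_words` limit are hypothetical and not taken from the DocuAI code, which may chunk differently.

```python
from typing import Iterable, List


def chunk_sentences(sentences: Iterable[str], max_words: int = 120) -> List[str]:
    """Group consecutive sentences into chunks of at most max_words words.

    A single sentence longer than max_words becomes its own chunk.
    """
    chunks: List[str] = []
    current: List[str] = []
    current_len = 0
    for sentence in sentences:
        sentence = sentence.strip()
        if not sentence:
            continue
        n_words = len(sentence.split())
        # Close the current chunk if adding this sentence would exceed the budget.
        if current and current_len + n_words > max_words:
            chunks.append(" ".join(current))
            current, current_len = [], 0
        current.append(sentence)
        current_len += n_words
    if current:
        chunks.append(" ".join(current))
    return chunks
```

With spaCy, the input could be built as `[s.text for s in nlp(text).sents]` after loading `en_core_web_sm`. Keeping whole sentences together means a chunk never cuts a sentence in half, at the cost of uneven chunk sizes.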
User (Web UI)
│
▼
Flask Backend (Python)
│
├─ Document Extraction (PyPDF2, python-docx, python-pptx)
├─ Semantic Chunking (spaCy)
├─ Embedding Generation (DeepInfra/OpenAI)
├─ Vector Storage & Search (Pinecone)
├─ Hybrid Retrieval (BM25, CrossEncoder)
├─ Query Rewriting & Q&A (Groq API)
├─ Multilingual & Voice Support (langdetect, deep-translator, SpeechRecognition)
▼
LLM APIs / Pinecone
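The Hybrid Retrieval stage in the diagram above can be sketched as a weighted fusion of keyword and vector scores. This is an assumption-laden illustration: it uses a small pure-Python BM25 and takes the vector similarity scores as an input list (in DocuAI they would come from Pinecone), and the names `bm25_scores`, `min_max`, `hybrid_rank`, and the `alpha` weight are hypothetical. The cross-encoder reranking step is omitted, and the project may fuse scores in a different way.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(query: str, docs: List[str], k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Score each document against the query with Okapi BM25 (whitespace tokens)."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs if n_docs else 0.0
    # Document frequency: number of documents containing each term.
    df: Counter = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def min_max(values: List[float]) -> List[float]:
    """Rescale scores to [0, 1]; a constant list maps to all zeros."""
    if not values:
        return []
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def hybrid_rank(query: str, docs: List[str], vector_scores: List[float],
                alpha: float = 0.5) -> List[int]:
    """Return document indices sorted by alpha * vector + (1 - alpha) * BM25."""
    keyword = min_max(bm25_scores(query, docs))
    vector = min_max(vector_scores)
    fused = [alpha * v + (1 - alpha) * k for v, k in zip(vector, keyword)]
    return sorted(range(len(docs)), key=lambda i: fused[i], reverse=True)
```

Normalizing both score lists before mixing keeps either signal from dominating just because of its scale. The top results of `hybrid_rank` would then be the candidates passed to a reranker.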
- Clone the repository:
  git clone https://github.com/parthjha03/DocuAI.git
  cd DocuAI
- Install dependencies:
  pip install -r requirements.txt
- Set up environment variables:
  - Create a .env file in the project root with the following keys:
    MODEL=your_groq_model_name
    GROQ_API_KEY=your_groq_api_key
    PINECONE_API_KEY=your_pinecone_api_key
    INDEX_NAME=your_pinecone_index_name
    DEEPINFRA_API_KEY=your_deepinfra_api_key
- Download the spaCy model:
  python -m spacy download en_core_web_sm
- Run the application:
  python app.py
- Access the app:
  - Open your browser and go to http://localhost:5000
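A missing key in the environment setup tends to surface later as an obscure API error, so it can help to validate the configuration at startup. The sketch below checks for the five keys named in the setup steps. It assumes the `.env` values have already been loaded into the process environment (how DocuAI does this is not stated here), and `REQUIRED_KEYS` and `load_settings` are hypothetical names.

```python
import os
from typing import Dict, Mapping

# The keys listed in the .env setup step.
REQUIRED_KEYS = (
    "MODEL",
    "GROQ_API_KEY",
    "PINECONE_API_KEY",
    "INDEX_NAME",
    "DEEPINFRA_API_KEY",
)


def load_settings(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return the required settings, raising if any are missing or empty."""
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {key: env[key] for key in REQUIRED_KEYS}
```

Calling `load_settings()` once before the Flask app starts would fail fast with a message naming every missing key.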
- Upload Documents: Use the sidebar to upload PDF, DOCX, PPTX, or TXT files.
- Ask Questions: Type or speak your question in the chat interface.
- Language Support: Select your preferred language from the dropdown.
- Voice Input: Click the microphone button to record your question.
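Since uploads are limited to PDF, DOCX, PPTX, and TXT, a server-side extension check is one way to reject other files early. This is a minimal sketch, not the project's actual validation; `ALLOWED_EXTENSIONS` and `is_supported` are hypothetical names.

```python
from pathlib import Path

# The four document types the app accepts.
ALLOWED_EXTENSIONS = {".pdf", ".docx", ".pptx", ".txt"}


def is_supported(filename: str) -> bool:
    """Return True if the filename ends in one of the supported extensions."""
    return Path(filename).suffix.lower() in ALLOWED_EXTENSIONS
```

The check is case-insensitive, so `report.PDF` passes. It only inspects the name; it does not verify that the file contents match the extension.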
- Backend: Flask, Python
- Frontend: HTML, CSS, JavaScript
- NLP & Embeddings: spaCy, NLTK, DeepInfra/OpenAI, sentence-transformers
- Vector Database: Pinecone
- LLM APIs: Groq
- Translation & Language Detection: deep-translator, langdetect
- Voice Recognition: SpeechRecognition, Google Speech API
.
├── app2.py # Main Flask backend
├── utils.py # Utility functions (e.g., token counting)
├── templates/
│ └── index.html # Main frontend template
├── uploads/ # Uploaded documents
├── requirements.txt # Python dependencies
└── .env # Environment variables (not committed)
- User uploads a document.
- User asks a question (text or voice, any language).
- System translates and rewrites the query for optimal retrieval.
- Relevant document chunks are retrieved and reranked.
- LLM generates a grounded answer using the retrieved context.
- Answer is translated back to the user's language and displayed.
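The steps above can be sketched as a single function that wires the stages together. Every stage is passed in as a callable, so the sketch makes no claim about how DocuAI calls Groq, Pinecone, or deep-translator; `answer_question`, its parameter names, and the choice of English as the pivot language are all assumptions for illustration. It also assumes the rewritten query is used only for retrieval, while the translated question is what the LLM answers.

```python
from typing import Callable, List


def answer_question(
    question: str,
    user_lang: str,
    translate: Callable[[str, str, str], str],  # (text, source, target) -> text
    rewrite: Callable[[str], str],
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str, List[str]], str],
    pivot_lang: str = "en",
) -> str:
    """Run translate -> rewrite -> retrieve -> generate -> translate back."""
    # Translate only when the user's language differs from the pivot language.
    if user_lang == pivot_lang:
        pivot_question = question
    else:
        pivot_question = translate(question, user_lang, pivot_lang)

    search_query = rewrite(pivot_question)       # query tuned for retrieval
    chunks = retrieve(search_query)              # retrieved and reranked chunks
    answer = generate(pivot_question, chunks)    # answer grounded in the chunks

    if user_lang == pivot_lang:
        return answer
    return translate(answer, pivot_lang, user_lang)
```

Passing the stages as arguments keeps the flow testable with simple stand-in functions, without network access or API keys.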
MIT License
DocuAI brings the power of LLMs and semantic search to your documents, making them truly interactive and accessible.