An AI-powered study assistant that processes your academic PDFs and lets you ask questions about their content in natural language, using OpenAI's API and Retrieval-Augmented Generation (RAG).
- Real AI Conversations: Powered by OpenAI GPT-3.5-turbo for natural language understanding
- Smart PDF Processing: Upload and process academic modules, textbooks, and research papers
- RAG Implementation: Retrieval-Augmented Generation for accurate, document-based responses
- Intelligent Exam Generation: Create practice tests with 4 difficulty levels from your materials
- Document-Specific Responses: AI responses based strictly on your uploaded content
- Natural Language Interface: Ask questions in plain English about your documents
- Educational Focus: Designed specifically for academic study and learning
- Upload your study materials (PDFs)
- Ask questions like:
  - "What are the main concepts in this document?"
  - "Explain the key principles of [topic] from my uploaded files"
  - "What does Chapter 3 say about [concept]?"
- Generate exams with customizable difficulty:
  - Easy: Basic recall and definitions
  - Medium: Application and understanding
  - Hard: Analysis and synthesis
  - Expert: Critical thinking and mastery
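As a rough illustration of how the four difficulty levels could steer exam generation, the sketch below turns a level into a prompt instruction. The level descriptions are taken from the list above; the function name `build_exam_prompt`, the dictionary name, and the prompt wording are assumptions for illustration, not the project's actual code.

```python
# Hypothetical helper: the level descriptions mirror the list above, but the
# names and the prompt wording are illustrative assumptions.
DIFFICULTY_GUIDANCE = {
    "easy": "basic recall and definitions",
    "medium": "application and understanding",
    "hard": "analysis and synthesis",
    "expert": "critical thinking and mastery",
}


def build_exam_prompt(context: str, difficulty: str, num_questions: int) -> str:
    """Return an exam-generation prompt grounded in the given document text."""
    key = difficulty.strip().lower()
    if key not in DIFFICULTY_GUIDANCE:
        raise ValueError(f"Unknown difficulty: {difficulty!r}")
    if num_questions < 1:
        raise ValueError("num_questions must be at least 1")
    return (
        f"Write {num_questions} exam questions at {key} difficulty, "
        f"focusing on {DIFFICULTY_GUIDANCE[key]}.\n"
        "Use only the study material below.\n\n"
        f"{context}"
    )
```

The resulting string would then be sent to the chat model together with the retrieved document excerpts.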
- AI Engine: OpenAI API (GPT-3.5-turbo + text-embedding-ada-002)
- RAG System: ChromaDB for vector storage with semantic search
- PDF Processing: PyMuPDF, pdfplumber, PyPDF2 with intelligent text extraction
- Backend: Python with modular architecture
- Frontend: Streamlit for intuitive web interface
- Vector Search: OpenAI embeddings with similarity search
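The stack lists three PDF libraries, which suggests some form of fallback between them. The sketch below shows one way that could work: try each extractor in turn and keep the first result that contains usable text. The ordering, the function names, and the `min_chars` threshold are assumptions; the real `pdf_processor.py` may combine the libraries differently.

```python
from typing import Callable, Iterable, Optional


def extract_with_pymupdf(path: str) -> str:
    import fitz  # PyMuPDF; imported lazily so the other backends work without it

    with fitz.open(path) as doc:
        return "\n".join(page.get_text() for page in doc)


def extract_with_pdfplumber(path: str) -> str:
    import pdfplumber

    with pdfplumber.open(path) as pdf:
        return "\n".join(page.extract_text() or "" for page in pdf.pages)


def extract_with_pypdf2(path: str) -> str:
    from PyPDF2 import PdfReader

    reader = PdfReader(path)
    return "\n".join(page.extract_text() or "" for page in reader.pages)


# Assumed order of preference; not confirmed by the project.
DEFAULT_EXTRACTORS = (extract_with_pymupdf, extract_with_pdfplumber, extract_with_pypdf2)


def extract_text(
    path: str,
    extractors: Iterable[Callable[[str], str]] = DEFAULT_EXTRACTORS,
    min_chars: int = 1,
) -> Optional[str]:
    """Return text from the first extractor that yields enough characters."""
    for extractor in extractors:
        try:
            text = extractor(path)
        except Exception:
            continue  # missing library or unreadable file: try the next backend
        if text and len(text.strip()) >= min_chars:
            return text
    return None
```

Because each library is imported inside its own function, a missing backend only disables that one extractor rather than the whole chain.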
- Python 3.8+
- OpenAI API key
- 2GB+ RAM recommended for vector processing
git clone https://github.com/yourusername/Study-Chatbot.git
cd Study-Chatbot
pip install -r requirements.txt
# Create .env file
echo "OPENAI_API_KEY=your_openai_api_key_here" > .env
streamlit run src/app.py
- Open http://localhost:8501 in your browser
- Upload your study PDFs
- Start asking questions!
Study-Chatbot/
├── src/
│   ├── app.py              # Streamlit web interface
│   ├── chatbot.py          # Main chatbot orchestration
│   ├── rag_system.py       # RAG implementation with ChromaDB
│   ├── pdf_processor.py    # Advanced PDF text extraction
│   ├── exam_generator.py   # AI-powered exam generation
│   └── config.py           # Configuration management
├── documents/              # Sample documents (optional)
├── requirements.txt        # Python dependencies
├── .env.example            # Environment template
└── README.md               # You are here!
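The tree lists `config.py` for configuration management. A minimal sketch of what such a module might look like follows, reading the environment variables documented in this README. The `Settings` class, `load_settings` function, and the use of the documented example values as defaults are assumptions; the sketch also assumes the variables are already present in the process environment and does not parse the `.env` file itself.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    openai_api_key: str
    openai_model: str
    embedding_model: str
    chroma_persist_directory: str
    max_file_size_mb: int


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from environment variables (hypothetical helper)."""
    env = os.environ if env is None else env

    api_key = env.get("OPENAI_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is required")

    max_size = int(env.get("MAX_FILE_SIZE_MB", "50"))
    if max_size <= 0:
        raise ValueError("MAX_FILE_SIZE_MB must be positive")

    # Defaults below are the example values from this README (an assumption).
    return Settings(
        openai_api_key=api_key,
        openai_model=env.get("OPENAI_MODEL", "gpt-3.5-turbo"),
        embedding_model=env.get("OPENAI_EMBEDDING_MODEL", "text-embedding-ada-002"),
        chroma_persist_directory=env.get("CHROMA_PERSIST_DIRECTORY", "./embeddings"),
        max_file_size_mb=max_size,
    )
```

Passing a plain dictionary instead of `os.environ` keeps the loader easy to test without touching real credentials.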
You: "What is this document about?"
AI: "This document is a comprehensive guide to Non-Destructive Testing (NDT)
methods, covering ultrasonic testing, radiographic inspection, and
magnetic particle testing techniques..."
You: "Generate 5 questions about NDT methods"
AI: Creates targeted multiple-choice, true/false, and essay questions
based on your specific document content.
- Document Overview: "Summarize the key topics in my uploaded files"
- Specific Queries: "What does section 4.2 say about ultrasonic testing?"
- Comparative Analysis: "Compare the advantages of different NDT methods"
- Exam Generation: Create custom practice tests with answer keys
# Required
OPENAI_API_KEY=your_openai_api_key_here
# Optional Customizations
OPENAI_MODEL=gpt-3.5-turbo
OPENAI_EMBEDDING_MODEL=text-embedding-ada-002
CHROMA_PERSIST_DIRECTORY=./embeddings
MAX_FILE_SIZE_MB=50
- Question Types: Multiple choice, True/False, Short answer, Essay
- Difficulty Levels: Easy, Medium, Hard, Expert
- Customizable Counts: Configure questions per type
- Answer Keys: Toggle show/hide functionality
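The exam options above could be captured in a small validated settings object. This is only a sketch: the `ExamConfig` class, its field names, and the default values are hypothetical, while the question types and difficulty levels come from the list above.

```python
from dataclasses import dataclass, field
from typing import Dict

QUESTION_TYPES = ("multiple_choice", "true_false", "short_answer", "essay")
DIFFICULTY_LEVELS = ("easy", "medium", "hard", "expert")


@dataclass
class ExamConfig:
    """Hypothetical container for the exam options listed above."""

    difficulty: str = "medium"
    # Number of questions to generate per question type.
    counts: Dict[str, int] = field(default_factory=lambda: {"multiple_choice": 5})
    show_answer_key: bool = False

    def __post_init__(self) -> None:
        if self.difficulty not in DIFFICULTY_LEVELS:
            raise ValueError(f"Unknown difficulty: {self.difficulty!r}")
        for qtype, count in self.counts.items():
            if qtype not in QUESTION_TYPES:
                raise ValueError(f"Unknown question type: {qtype!r}")
            if count < 0:
                raise ValueError(f"Count for {qtype!r} must not be negative")

    @property
    def total_questions(self) -> int:
        return sum(self.counts.values())
```

Validating at construction time means a bad combination is rejected before any API call is made.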
- Document Ingestion: Processes PDFs with advanced text extraction
- Semantic Chunking: Intelligent text segmentation for optimal retrieval
- Vector Embedding: OpenAI embeddings for semantic similarity
- Contextual Retrieval: Finds most relevant document sections
- Response Generation: AI responses grounded in your content
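The five steps above can be sketched end to end in plain Python. To stay self-contained, this example substitutes simpler techniques for the real components: fixed-size overlapping word windows instead of semantic chunking, a bag-of-words counter instead of OpenAI embeddings, and an in-memory list instead of ChromaDB. All function names are assumptions.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def chunk_text(text: str, size: int = 40, overlap: int = 10) -> List[str]:
    """Split text into overlapping word windows (stand-in for semantic chunking)."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    words = text.split()
    chunks = []
    for start in range(0, len(words), size - overlap):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


def embed(text: str) -> Counter:
    """Toy bag-of-words vector (stand-in for an OpenAI embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(question: str, chunks: List[str], k: int = 2) -> List[Tuple[float, str]]:
    """Return the k chunks most similar to the question, best first."""
    query = embed(question)
    scored = [(cosine(query, embed(chunk)), chunk) for chunk in chunks]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


def build_prompt(question: str, chunks: List[str], k: int = 2) -> str:
    """Assemble a prompt that grounds the answer in the retrieved chunks."""
    context = "\n---\n".join(chunk for _, chunk in retrieve(question, chunks, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
```

In the actual system, the prompt built in the last step would be passed to the chat model, and retrieval would be a vector-store query rather than a linear scan.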
- Multi-format PDF Support: Handles various PDF types and layouts
- Content Quality Filtering: Removes headers, footers, and noise
- Subject-Specific Queries: Optimizes retrieval for technical content
- Overview Generation: Synthesizes document summaries
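One common way to implement the header and footer removal mentioned above is to drop lines that repeat across most pages. The sketch below illustrates that heuristic; the function name, the digit-collapsing normalization, and the `min_fraction` threshold are assumptions, and the project's own filter may use different rules.

```python
import math
import re
from collections import Counter
from typing import List


def _normalize(line: str) -> str:
    # Collapse digits so "Page 3" and "Page 12" count as the same line.
    return re.sub(r"\d+", "#", line.strip().lower())


def strip_repeated_lines(pages: List[str], min_fraction: float = 0.6) -> List[str]:
    """Remove lines that recur on at least min_fraction of pages.

    Blank lines are dropped as well. A line must appear on at least two
    pages to be treated as a header or footer.
    """
    if not pages:
        return []

    seen: Counter = Counter()
    for page in pages:
        # Count each normalized line once per page.
        seen.update({_normalize(line) for line in page.splitlines() if line.strip()})

    threshold = max(2, math.ceil(min_fraction * len(pages)))
    repeated = {line for line, count in seen.items() if count >= threshold}

    cleaned = []
    for page in pages:
        kept = [
            line
            for line in page.splitlines()
            if line.strip() and _normalize(line) not in repeated
        ]
        cleaned.append("\n".join(kept))
    return cleaned
```

This heuristic can also remove genuine content that repeats on most pages, so the threshold is worth tuning per document.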
- PDF-only Support: Currently limited to PDF documents
- English Language: Optimized for English-language content
- OpenAI Dependency: Requires active OpenAI API subscription
- Single Session: No persistent user accounts (yet)
We welcome contributions! Areas for improvement:
- Support for more document formats (DOCX, TXT)
- Multi-language support
- User authentication and session persistence
- Advanced analytics and usage tracking
- Collaborative study features
This project is licensed under the MIT License - see the LICENSE file for details.
- OpenAI for providing the GPT and embedding APIs
- ChromaDB for the vector database solution
- Streamlit for the amazing web framework
- LangChain for RAG implementation patterns
- Issues: GitHub Issues
- Documentation: Check the wiki for advanced usage
- Discussions: Share your use cases and get help
⭐ Star this repository if it helps with your studies!
Built with ❤️ for students and educators worldwide