A powerful Streamlit-based chatbot that enables interactive conversations with multiple PDF documents using advanced AI technologies. This application extracts text from uploaded PDFs, processes it into manageable chunks, and uses vector embeddings and conversational AI to answer questions about the content.
- Multi-PDF Support: Upload and process multiple PDF files simultaneously
- AI-Powered Chat: Ask questions about your documents and get intelligent responses
- Text Extraction: Automatically extracts text from PDF pages
- Vector Search: Uses FAISS vector store for efficient document retrieval
- Conversational Memory: Maintains context throughout the conversation
- User-Friendly Interface: Clean Streamlit UI with chat-like messaging
- Python 3.8 or higher
- OpenAI API key (required for embeddings and chat functionality)
- Optional: HuggingFace API token (for alternative embedding models)
- Clone the repository:
  `git clone <your-repository-url>`
  `cd pdfreader-bot`
- Create a virtual environment (recommended):
  `python -m venv venv`
  `source venv/bin/activate` (on Windows: `venv\Scripts\activate`)
- Install dependencies:
  `pip install streamlit python-dotenv PyPDF2 langchain langchain-openai faiss-cpu`
- Set up environment variables: copy the provided `.env` file or create a new one, then add your API keys:
  `OPENAI_API_KEY=your_openai_api_key_here`
  `HUGGINGFACEHUB_API_TOKEN=your_huggingface_token_here` (optional)
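The app reads these variables with python-dotenv's `load_dotenv()`. The stand-alone sketch below shows roughly what that parsing amounts to; `parse_env_text` is an illustrative helper, not part of python-dotenv or this app, and real `load_dotenv()` additionally handles quoting and interpolation:

```python
# Minimal sketch of .env parsing, using only the standard library.
import os

def parse_env_text(text):
    """Parse KEY=VALUE lines into a dict, skipping blanks and # comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        # Drop an inline "# comment" suffix and surrounding whitespace.
        env[key.strip()] = value.split("#", 1)[0].strip()
    return env

env = parse_env_text(
    "OPENAI_API_KEY=your_openai_api_key_here\n"
    "HUGGINGFACEHUB_API_TOKEN=your_huggingface_token_here  # Optional\n"
)
os.environ.update(env)  # load_dotenv() does roughly this
print(sorted(env))  # → ['HUGGINGFACEHUB_API_TOKEN', 'OPENAI_API_KEY']
```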
- Start the application:
  `streamlit run app.py`
- Interact with the app:
  - Open your browser to the provided local URL (usually http://localhost:8501)
  - In the sidebar, upload one or more PDF files
  - Click the "Process" button to extract and index the text
  - Once processing is complete, start asking questions in the chat input
  - The AI will provide answers based on the content of your uploaded PDFs
- Text Extraction: Uses PyPDF2 to extract text from each page of uploaded PDFs
- Text Chunking: Splits the extracted text into smaller chunks using LangChain's CharacterTextSplitter
- Embeddings: Creates vector embeddings using OpenAI's embedding model
- Vector Store: Stores embeddings in a FAISS vector database for efficient similarity search
- Conversational Chain: Uses LangChain's ConversationalRetrievalChain with ChatOpenAI for question answering
- Memory: Maintains conversation history for context-aware responses
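Conceptually, the embedding, vector-store, and retrieval steps above reduce to "embed each chunk, store the vectors, retrieve the chunk closest to the query." The toy sketch below illustrates that loop with deliberately simple stand-ins: bag-of-words counts in place of OpenAI embeddings, and a linear cosine-similarity scan in place of a FAISS index (the chunks and query are made-up examples):

```python
# Toy illustration of the embed → store → retrieve loop.
import math
from collections import Counter

def embed(text):
    """Crude stand-in 'embedding': a bag-of-words term-frequency vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

chunks = [
    "Invoices are due within 30 days of receipt.",
    "The warranty covers manufacturing defects for two years.",
]
store = [(embed(c), c) for c in chunks]  # the "vector store"

query = "how long is the warranty"
best = max(store, key=lambda pair: cosine(embed(query), pair[0]))
print(best[1])  # → "The warranty covers manufacturing defects for two years."
```

The retrieved chunk is then passed to the chat model as context, which is what `ConversationalRetrievalChain` orchestrates in the real app.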
The application uses the following default settings (configurable in app.py):
- Text chunk size: 1000 characters
- Chunk overlap: 200 characters
- Embedding model: OpenAI Embeddings (can be switched to HuggingFace Instructor embeddings)
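To see how chunk size and overlap interact, here is a simplified sliding-window splitter using the same defaults. LangChain's `CharacterTextSplitter` additionally splits on a separator before merging, so real chunk boundaries differ; `split_text` here is an illustrative stand-in:

```python
# Sketch of fixed-size chunking with overlap (defaults: 1000 chars, 200 overlap).
def split_text(text, chunk_size=1000, overlap=200):
    step = chunk_size - overlap  # each chunk starts 800 chars after the last
    return [text[i:i + chunk_size]
            for i in range(0, max(len(text) - overlap, 1), step)]

doc = "x" * 2500
chunks = split_text(doc)
print([len(c) for c in chunks])  # → [1000, 1000, 900]
```

The 200-character overlap means each chunk repeats the tail of the previous one, so a sentence falling on a chunk boundary still appears intact in at least one chunk.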
- API Key Issues: Ensure your OpenAI API key is correctly set in the `.env` file
- PDF Processing Errors: Make sure your PDFs contain extractable text (not just images)
- Memory Issues: For large PDFs, consider increasing system memory or reducing chunk size
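A quick way to check the "extractable text" condition is to look at what extraction returns per page: a scanned, image-only PDF typically yields empty or whitespace-only strings. The sketch below uses plain strings in place of PyPDF2's `page.extract_text()` results:

```python
# Hedged sketch: detect PDFs with no extractable text, using stand-in strings
# where the real app would pass each page's extract_text() result.
def has_extractable_text(page_texts):
    """True if at least one page yielded non-whitespace text."""
    return any(t.strip() for t in page_texts)

print(has_extractable_text(["", "  \n"]))       # → False (likely scanned images)
print(has_extractable_text(["Chapter 1", ""]))  # → True
```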
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add some amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.