ContentsInsightChatBotRAG processes PDFs and videos to power an AI-driven chatbot that answers questions based on the analyzed content and the chat history.
This repository hosts an advanced AI chatbot that processes PDF documents and video URLs (including YouTube) to deliver intelligent, context-aware conversations. Using Retrieval-Augmented Generation (RAG) and DeepLake Vector DB, the chatbot can understand complex queries, retain chat history, and provide follow-up answers.
- AI Chatbot: Capable of real-time Q&A over uploaded PDFs and video URLs.
- Contextual Conversations: The bot retains and utilizes chat history to answer follow-up questions accurately.
- RAG-Driven Answers: Utilizes retrieval-augmented generation to deliver answers based on stored document embeddings.
- PDF & Video Input: Supports direct PDF uploads and video URLs (e.g., YouTube) as inputs.
- Customizable: Leverages prompt engineering for different response styles, including conversation and question answering.
- Scalable: Dockerized for easy deployment across environments, with CI/CD pipelines for seamless updates.
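To illustrate how contextual conversations and RAG-driven answers fit together, here is a minimal sketch of assembling a single prompt from retrieved chunks, recent chat history, and the new question. The function name `build_prompt`, the `max_turns` parameter, and the prompt wording are assumptions made for illustration; the prompts used in this repository may look different.

```python
from typing import List, Tuple


def build_prompt(question: str,
                 history: List[Tuple[str, str]],
                 context_chunks: List[str],
                 max_turns: int = 3) -> str:
    """Combine retrieved context, recent chat turns, and the new question.

    Hypothetical helper: the template text below is illustrative only.
    """
    # Keep only the most recent turns so the prompt stays short.
    recent = history[-max_turns:] if max_turns > 0 else []
    history_text = "\n".join(
        f"User: {user_msg}\nAssistant: {bot_msg}" for user_msg, bot_msg in recent
    )
    context_text = "\n---\n".join(context_chunks)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context_text}\n\n"
        f"Chat history:\n{history_text}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

Limiting the history to a few recent turns is one simple way to let follow-up questions resolve references such as "it" or "that section" without letting the prompt grow without bound.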
- Python: The core language powering the application.
- Streamlit: Used to create an interactive web interface for the chatbot.
- LangChain: Manages interactions between the chatbot and large language models.
- GPT-3.5-turbo API: Provides natural language understanding and generation for the chatbot.
- Nomic AI Embeddings: Generates embeddings for the processed content.
- DeepLake Vector DB: Stores embeddings of processed content for fast retrieval and accurate answers.
- Prompt Engineering: Designed custom prompts for efficient interaction with the AI and handling chat history.
- Docker: Enables containerization for consistent deployments across different environments.
- GitHub Actions: Automates testing, building, and deployment processes.
- yt_dlp: Downloads and processes YouTube videos, making them searchable by the chatbot.
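The retrieval step that the embedding model and vector store perform can be sketched in plain Python. This is a toy illustration only: the two-dimensional vectors stand in for real embeddings, the in-memory list stands in for DeepLake, and the names `cosine_similarity` and `top_k` are hypothetical rather than taken from this repository.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: Sequence[float],
          store: List[Tuple[str, Sequence[float]]],
          k: int = 2) -> List[str]:
    """Return the k stored texts whose vectors are most similar to the query."""
    ranked = sorted(
        store,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


# Toy store: (chunk text, embedding) pairs.
store = [
    ("pdf chunk", [1.0, 0.0]),
    ("video chunk", [0.0, 1.0]),
    ("mixed chunk", [1.0, 1.0]),
]
print(top_k([1.0, 0.1], store, k=2))
```

A real vector database replaces the full sort with an index so that lookups stay fast as the number of stored chunks grows.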
- Docker installed and running
- Conda environment set up
- Clone the Repository:
git clone https://github.com/saireddy57/ContentsInsightChatBotRAG.git
cd ContentsInsightChatBotRAG
- Set Up API Keys: Create a .env file in the project root directory to store your API keys:
OPENAI_API_KEY=
ACTIVELOOP_TOKEN=
- Install Dependencies: Activate your Conda environment and install the required packages:
pip install -r requirements.txt
- Run the Application: Launch the chatbot interface locally with Streamlit:
streamlit run ui.py
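Before launching, it can help to confirm that both keys from the .env step are actually filled in. The sketch below uses only the standard library and is an assumption about how one might check this; the helper names `parse_env` and `missing_keys` are hypothetical, and how the application itself loads the file is not shown here.

```python
from typing import Dict, List, Sequence

# Key names taken from the .env step above.
REQUIRED_KEYS = ("OPENAI_API_KEY", "ACTIVELOOP_TOKEN")


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(text: str, required: Sequence[str] = REQUIRED_KEYS) -> List[str]:
    """Return the required keys that are absent or empty."""
    values = parse_env(text)
    return [key for key in required if not values.get(key)]
```

For example, `missing_keys(open(".env").read())` returns an empty list when both keys have values.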
- Upload PDF documents or paste a YouTube video URL.
- Ask questions based on the uploaded content.
- The chatbot will maintain context and answer follow-up queries seamlessly.
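Because the interface accepts both PDF uploads and video URLs, an input has to be routed to the right processing path. The following is a rough sketch of such routing, not the logic in ui.py; the function name `classify_input`, the host list, and the returned labels are all assumptions for illustration.

```python
from urllib.parse import urlparse

# Common YouTube hostnames (illustrative, not exhaustive).
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}


def classify_input(value: str) -> str:
    """Return 'youtube', 'other_url', 'pdf', or 'unknown' for a user input."""
    value = value.strip()
    parsed = urlparse(value)
    # Treat anything with an http(s) scheme and a host as a URL.
    if parsed.scheme in ("http", "https") and parsed.hostname:
        if parsed.hostname in YOUTUBE_HOSTS:
            return "youtube"
        return "other_url"
    # Otherwise fall back to a file-extension check.
    if value.lower().endswith(".pdf"):
        return "pdf"
    return "unknown"
```

A check like this only inspects the string; whether a URL is actually downloadable is decided later by the downloader.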
The chatbot is production-ready and can be deployed to the cloud using Docker and GitHub Actions.
1. Modify the .github/workflows/ci.yml and .github/workflows/cd.yml files to include your credentials.
2. Store sensitive keys like OPENAI_API_KEY and ACTIVELOOP_TOKEN in GitHub Secrets.
1. Launch an AWS EC2 t3.large instance running Ubuntu 22.04 with a 30 GB EBS volume.
2. Set up a self-hosted GitHub Actions runner on the AWS EC2 instance or on another cloud platform. Once the pipeline runs successfully, open the instance's public DNS name with the server port to access the chatbot.
The chatbot is packaged as a Docker container for consistent deployments across environments.
- Build Docker Image:
docker build -t chatbot-app .
- Run Docker Container:
docker run -p 8501:8501 chatbot-app
Once running, the chatbot will be accessible at http://localhost:8501
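To confirm that the container is actually listening, a quick TCP check can be run from Python. This is a minimal sketch using only the standard library; `chatbot_url` and `port_is_open` are hypothetical helper names, and a successful connection only shows that the port is open, not that the app has finished loading.

```python
import socket


def chatbot_url(host: str = "localhost", port: int = 8501) -> str:
    """Build the address to open in a browser."""
    return f"http://{host}:{port}"


def port_is_open(host: str = "localhost", port: int = 8501,
                 timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolved hosts.
        return False
```

For a cloud deployment, the same helpers can be called with the server's DNS name in place of `localhost`.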
The project uses GitHub Actions to automate building, testing, and deploying the chatbot:
- ci.yml: Defines the continuous integration process.
- cd.yml: Automates deployment to your cloud environment.
Make sure to configure your credentials and secrets in GitHub Actions for seamless cloud deployment.