A Retrieval-Augmented Generation (RAG) based Documentation Assistant built with Streamlit, LangChain, Pinecone, and Google Vertex AI. It allows users to ingest documentation (local or scraped from the web) and query it using an LLM with relevant, context-grounded responses.

## Features

- 🔍 Document Ingestion from local files or web scraping
- 🧹 Processing: Cleaning, chunking, and metadata extraction
- 📊 Vector Store: Embeddings stored & retrieved using Pinecone
- 🤖 RAG System: Retrieve & augment queries with document context
- 💻 Streamlit Web UI: Simple Q&A and Chat interface with history
- ⚙️ Configurable: Manage API keys & settings via `.env` and `config.py`
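The cleaning-and-chunking step above can be sketched roughly as below. The function name and defaults are illustrative assumptions, not the actual `document_processor.py` API (which may well delegate to a LangChain text splitter instead):

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list:
    """Split cleaned text into overlapping chunks for embedding.

    chunk_size and overlap are hypothetical defaults; the real pipeline
    may use different values or a library splitter.
    """
    chunks = []
    start = 0
    while start < len(text):
        end = start + chunk_size
        chunks.append(text[start:end])
        if end >= len(text):
            break
        start = end - overlap  # overlap preserves context across boundaries
    return chunks
```

The overlap matters for retrieval quality: a sentence cut at a hard chunk boundary would otherwise be invisible to queries that match its missing half.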
## Project Structure

```
clean_version/
├── .env                     # Environment variables (DO NOT COMMIT)
├── app.py                   # Streamlit web app entrypoint
├── config.py                # Configuration management
├── document_processor.py    # Document loading, cleaning, and chunking
├── ingestion.py             # Ingestion pipeline for documents
├── rag_system.py            # Retrieval-Augmented Generation system
├── vector_store.py          # Embedding + Pinecone vector DB manager
├── web_scraper.py           # Web scraper for documentation
├── __init__.py              # Package initializer
└── __pycache__/             # Compiled cache files
```
## Installation

```bash
git clone <your-repo-url>
cd clean_version
python -m venv venv
source venv/bin/activate   # Linux/Mac
venv\Scripts\activate      # Windows
pip install -r requirements.txt
```
If `requirements.txt` is missing, install the dependencies manually:

```bash
pip install streamlit requests beautifulsoup4 python-dotenv langchain langchain-community langchain-pinecone langchain-google-vertexai
```
## Configuration

Create a `.env` file in the root directory:

```env
INDEX_NAME=langchain-doc-index
GOOGLE_CLOUD_PROJECT=your-gcp-project-id
GOOGLE_APPLICATION_CREDENTIALS=/path/to/credentials.json
GEMINI_API_KEY=your-gemini-api-key
PINECONE_ENVIRONMENT=your-pinecone-env
PINECONE_API_KEY=your-pinecone-api-key
LANGCHAIN_TRACING_V2=false
```
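A `config.py` along these lines would expose those variables to the rest of the code. The class and method names here are assumptions for illustration, not necessarily what the project's `config.py` defines:

```python
import os

try:
    from dotenv import load_dotenv  # provided by python-dotenv
    load_dotenv()  # read .env from the working directory
except ImportError:
    pass  # fall back to real environment variables

class Config:
    """Environment-driven settings (names mirror the .env file above)."""
    INDEX_NAME = os.getenv("INDEX_NAME", "langchain-doc-index")
    GOOGLE_CLOUD_PROJECT = os.getenv("GOOGLE_CLOUD_PROJECT", "")
    GOOGLE_APPLICATION_CREDENTIALS = os.getenv("GOOGLE_APPLICATION_CREDENTIALS", "")
    GEMINI_API_KEY = os.getenv("GEMINI_API_KEY", "")
    PINECONE_ENVIRONMENT = os.getenv("PINECONE_ENVIRONMENT", "")
    PINECONE_API_KEY = os.getenv("PINECONE_API_KEY", "")

    @classmethod
    def validate(cls):
        """Return the names of required settings that are still unset."""
        required = ("GOOGLE_CLOUD_PROJECT", "PINECONE_API_KEY", "GEMINI_API_KEY")
        return [name for name in required if not getattr(cls, name)]
```

A `validate()`-style check at startup turns a missing key into a clear error message instead of a cryptic failure deep inside a Pinecone or Vertex AI call.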
⚠️ **Do not commit** `.env` or credentials to GitHub.
## Usage

```bash
streamlit run app.py
```
**Q&A mode**

- Enter a query (e.g., "How do I integrate Pinecone with LangChain?")
- The app retrieves relevant chunks, augments the query with them, and returns a grounded answer.

**Chat mode**

- Conversational interface with memory
- Supports follow-up questions using context
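The real pipeline embeds with Vertex AI and searches Pinecone, but the retrieve-and-augment pattern itself can be shown with stand-in vectors. Everything below (the in-memory index, the prompt template) is a toy sketch, not the project's `rag_system.py`:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query_vec, index, k=2):
    """Rank stored (vector, chunk) pairs by similarity; return top-k chunks.

    In the real system this is a Pinecone similarity search.
    """
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[0]), reverse=True)
    return [chunk for _, chunk in ranked[:k]]

def augment(question, chunks):
    """Build a context-grounded prompt for the LLM (template is illustrative)."""
    context = "\n\n".join(chunks)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
```

Grounding the prompt in retrieved chunks is what keeps answers tied to the ingested documentation rather than the model's general knowledge.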
## Security

- Exclude `.env` and sensitive files using `.gitignore`
- Rotate keys if exposed
- Use a secrets manager for production deployments

## Roadmap

- Add unit tests for ingestion & RAG
- Support additional vector DBs (FAISS, Weaviate)
- Improve error handling for network & LLM calls
- Add analytics (latency, token usage, retrieval quality)