AI PDF QA with AstraDB (Cassandra)

This project demonstrates how to build a simple Question Answering (QA) system over the contents of a PDF file using open-source AI tools, a vector database, and hosted language models. The solution extracts, embeds, stores, and queries PDF text using LangChain, HuggingFace embeddings, Groq LLMs, and an AstraDB vector store powered by Cassandra.

Features

PDF Parsing: Extracts text from a given PDF file.
Text Chunking: Splits PDF text into manageable chunks for embedding.
Vector Storage: Stores vector embeddings in a Cassandra-powered AstraDB instance.
Semantic Search & QA: Runs similarity search over the stored chunks and generates answers with Groq-hosted LLMs.
Interactive QA Console: Lets the user ask questions about the PDF content.

Getting Started

Prerequisites

Python 3.10+
An AstraDB account and API credentials
A Groq API key for LLM access
HuggingFace Transformers and embeddings
Required Python libraries (see below)

Installation

Clone the Repository

git clone https://github.com/prakhar175/PDF-qa-with-AstraDB.git
cd PDF-qa-with-AstraDB

Install Dependencies

pip install -r requirements.txt

Environment Variables

Create a .env file in the root directory containing your own keys:

ASTRA_DB_APPLICATION_TOKEN=your_astra_db_app_token
ASTRA_DB_ID=your_astra_db_id
GROQ_API=your_groq_api_key
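
As a rough illustration of how these values might be checked before the rest of the pipeline runs, the sketch below validates that all three variables are present. It assumes the .env file has already been loaded into the process environment (for example with a tool such as python-dotenv, which is an assumption, not something this README confirms); `load_settings` and `REQUIRED_KEYS` are hypothetical names used only for this example.

```python
import os

# Variable names taken from the .env template above.
REQUIRED_KEYS = ("ASTRA_DB_APPLICATION_TOKEN", "ASTRA_DB_ID", "GROQ_API")


def load_settings(env=None):
    """Return the required settings as a dict, or raise if any is missing or empty.

    `env` defaults to os.environ; passing a plain dict makes the check easy to test.
    """
    env = os.environ if env is None else env
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {key: env[key] for key in REQUIRED_KEYS}
```

Failing early with the list of missing names tends to be easier to debug than an authentication error raised later by AstraDB or Groq.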

Usage

Prepare Your PDF
- Place the PDF you want to query (e.g., Artificial_intelligence.pdf) in the working directory.
Run the Notebook
- Open and execute the Jupyter notebook (code.ipynb).
- The notebook will:
  - Extract text from your PDF.
  - Split and embed the text chunks.
  - Store them in AstraDB.
  - Let you query the content interactively.
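
The splitting step can be pictured with the plain-Python sketch below. It is a stand-in for whatever LangChain text splitter the notebook uses, not a copy of it: `chunk_text` is a hypothetical helper, and the default `chunk_size` and `overlap` values are illustrative assumptions rather than the notebook's settings.

```python
def chunk_text(text, chunk_size=800, overlap=200):
    """Split text into windows of at most chunk_size characters.

    Consecutive chunks share `overlap` characters, so text cut at one
    boundary is repeated at the start of the next chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a chunk has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

Each chunk would then be passed to the embedding model and written to the vector store together with its text.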
Interactive QA

When prompted, type a question about the PDF. The system returns an answer along with the relevant context.

Type exit to end the QA session.
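
A minimal sketch of such a console loop is shown below. It is not the notebook's code: `run_console` and `answer_fn` are hypothetical names, and `answer_fn` stands for whatever callable sends the question to the vector store and the Groq model. The `read` and `write` parameters exist only so the loop can be exercised without a terminal.

```python
def run_console(answer_fn, read=input, write=print):
    """Prompt for questions until the user types 'exit'; return how many were answered."""
    answered = 0
    while True:
        question = read("Question (or 'exit'): ").strip()
        if question.lower() == "exit":
            break
        if not question:
            continue  # ignore blank input and prompt again
        write(answer_fn(question))
        answered += 1
    return answered
```

In a notebook cell this could be started with something like `run_console(my_query_function)`, where `my_query_function` is a placeholder for the QA call.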

Code Structure

| Section | Description |
| --- | --- |
| Imports & Setup | Loads all dependencies and API keys from `.env` file |
| PDF Reading & Text Extraction | Uses `PyPDF2` to extract raw text from the PDF |
| Text Splitting | Chunks the extracted text for efficient embedding |
| Embedding & Vector Storage | Embeds chunks with HuggingFace, stores in AstraDB vector table |
| LLM & Query Engine | Configures the Groq model, wraps the store, allows semantic QA |
| Interactive Console | Prompts user for questions, returns answers along with relevant context |
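
To make the semantic QA step more concrete, the sketch below shows the idea behind similarity search over stored embeddings. In the actual project this ranking is performed by the AstraDB vector store, so the code is only an in-memory illustration: `cosine_similarity` and `top_k` are hypothetical helpers, the vectors are toy values, and the use of cosine similarity as the metric is an assumption.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, stored, k=3):
    """Return the k (text, score) pairs most similar to query_vec.

    `stored` is a list of (text, vector) pairs standing in for the vector table.
    """
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in stored]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

The chunks returned by a search like this are what would be passed to the LLM as context for answering the question.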
