Building a Local RAG Pipeline with Open-Source LLMs in Python #363

@manishdwibedy

Description

Talk title

From Zero to Hero: Building a Local RAG Pipeline with Open-Source LLMs in Python

Short talk description

Explore how to build a Retrieval-Augmented Generation (RAG) pipeline entirely on a local machine using open-source models like Llama 3 or Mistral. We'll demystify RAG components—loaders, chunking, embeddings, vector stores, and local inference—all implemented with Python libraries. Learn to turn unstructured data into an intelligent, private, and customizable chatbot without relying on expensive cloud APIs. This is a practical, code-focused session perfect for data scientists and developers looking to harness the power of local LLMs.

Long talk description

The field of Large Language Models (LLMs) is rapidly evolving, and the trend toward running powerful models locally for privacy and cost-efficiency is gaining traction. This talk provides a comprehensive, hands-on guide to constructing a production-ready Retrieval-Augmented Generation (RAG) system using readily available Open-Source Software (OSS).

We will begin by defining the RAG architecture and why it’s crucial for grounding LLMs in proprietary or specific data. The core of the session will involve a step-by-step practical implementation using Python. Key components covered will include the following (minimal code sketches follow the list):

Data Ingestion and Preparation: Using libraries like LlamaIndex or LangChain to load diverse data formats (e.g., PDFs, text files).

Text Processing: Strategies for effective chunking and managing context window limitations.

Local Embeddings: Utilizing performant, compact, open-source embedding models (e.g., all-MiniLM-L6-v2) that can run efficiently on commodity hardware.

Vector Store Setup: Integrating a local vector database (e.g., ChromaDB or an in-memory solution) for efficient retrieval.

Local LLM Inference: Setting up and running quantized open-source LLMs (like Mistral or Llama 3 via Ollama, or CTranslate2/llama.cpp wrappers) for final answer generation.
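
To give a flavour of the planned demo, here is a minimal loading sketch using LangChain's community PDF loader. The package layout reflects recent LangChain versions, and the file path is a hypothetical example, not part of the proposal.

```python
# Load a PDF into LangChain Document objects (one per page).
# Assumes `pip install langchain-community pypdf`; the path is illustrative.
from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader("docs/handbook.pdf")
documents = loader.load()
print(f"Loaded {len(documents)} pages")
```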
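
Chunking keeps each retrieved passage small enough for the model's context window. A sketch with LangChain's recursive splitter; the chunk sizes are tunable starting points, not recommendations from the proposal.

```python
# Split the loaded documents into overlapping chunks for retrieval.
# Assumes `pip install langchain-text-splitters` and `documents`
# from the loading sketch above.
from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,    # max characters per chunk; tune to your data
    chunk_overlap=50,  # overlap preserves context across chunk boundaries
)
chunks = splitter.split_documents(documents)
```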
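
Embedding the chunks locally with sentence-transformers; all-MiniLM-L6-v2 is the compact model named above and runs comfortably on CPU.

```python
# Embed each chunk into a 384-dimensional vector on local hardware.
# Assumes `pip install sentence-transformers` and `chunks` from the
# chunking sketch above.
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = embedder.encode([c.page_content for c in chunks])
print(embeddings.shape)  # (num_chunks, 384)
```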
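
Persisting the vectors in a local ChromaDB instance; the collection name and storage path below are placeholders.

```python
# Store chunks and their embeddings in an on-disk Chroma collection.
# Assumes `pip install chromadb`; "./rag_db" is an illustrative path.
import chromadb

client = chromadb.PersistentClient(path="./rag_db")
collection = client.get_or_create_collection("docs")
collection.add(
    ids=[str(i) for i in range(len(chunks))],
    documents=[c.page_content for c in chunks],
    embeddings=embeddings.tolist(),
)
```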
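
Finally, calling a quantized local model through Ollama's Python client; this assumes the Ollama daemon is running and `ollama pull llama3` has been done beforehand.

```python
# Generate text with a locally served Llama 3 model via Ollama.
# Assumes `pip install ollama` and a running Ollama daemon.
import ollama

response = ollama.chat(
    model="llama3",
    messages=[{"role": "user", "content": "Summarize RAG in one sentence."}],
)
print(response["message"]["content"])
```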

The talk emphasizes a privacy-first, cost-effective approach, demonstrating that sophisticated LLM applications are now accessible without substantial infrastructure investment. Attendees will leave with a clear understanding and a working blueprint for creating their own private, knowledge-aware applications.

What format do you have in mind?

Workshop (45-60 minutes, hands-on)

Talk outline / Agenda

Introduction to RAG (5 min): What is RAG, and why run it locally? Privacy, lower cost, and speed.

The Local RAG Stack Overview (5 min): Introducing key OSS components (Python, LangChain/LlamaIndex, Embedding Models, Vector DBs, Local LLMs).

Step 1: Data Preparation (5 min): Loading and chunking documents (Code Demo).

Step 2: Local Embeddings & Vector Store (10 min): Choosing an OSS embedding model, creating embeddings, and storing them locally (Code Demo).

Step 3: Local LLM Setup (10 min): Using tools like Ollama or dedicated Python libraries to run a quantized LLM like Llama 3 locally.

Step 4: The Retrieval and Generation Loop (10 min): Putting it all together: retrieval, context building, and generating the final answer (Code Demo; see the sketch after this outline).

Optimization and Next Steps (5 min): Tips for performance and future scaling.

Q&A (10 min).
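
For Step 4, a minimal end-to-end sketch of the retrieval and generation loop, wiring together the components sketched earlier; the model name, storage path, and prompt template are illustrative assumptions.

```python
# Retrieve the closest chunks for a question, then ground the LLM in them.
# Assumes the Chroma collection built in the earlier sketch and a running
# Ollama daemon with `llama3` pulled.
import chromadb
import ollama
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")
collection = chromadb.PersistentClient(path="./rag_db").get_or_create_collection("docs")

def answer(question: str) -> str:
    # 1. Retrieval: embed the question and fetch the top-3 matching chunks.
    query_embedding = embedder.encode([question]).tolist()
    hits = collection.query(query_embeddings=query_embedding, n_results=3)
    context = "\n\n".join(hits["documents"][0])
    # 2. Generation: ask the local LLM to answer from the retrieved context only.
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    reply = ollama.chat(model="llama3", messages=[{"role": "user", "content": prompt}])
    return reply["message"]["content"]

print(answer("What does the handbook cover?"))
```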

Key takeaways

Understand the Retrieval-Augmented Generation (RAG) architecture and its practical benefits.

Gain a working knowledge of the key Python libraries (LangChain/LlamaIndex, Hugging Face models) used for RAG implementation.

Learn how to select and utilize open-source embedding and large language models for local inference.

Be able to build a complete, private, and cost-efficient RAG pipeline entirely on a personal computer.

Receive a reusable code template to kickstart personal or professional RAG projects.

What domain would you say your talk falls under?

Data Science and Machine Learning

Duration (including Q&A)

50 minutes for the talk + 10 minutes for Q&A

Prerequisites and preparation

Intermediate Python programming skills.

A basic understanding of machine learning or natural language processing (NLP) concepts is helpful but not strictly required.

Optional: Familiarity with concepts like vector embeddings or large language models will make the content easier to absorb.

Resources and references

LangChain / LlamaIndex official documentation.

Hugging Face Model Hub for local embedding models (e.g., BGE, all-MiniLM-L6-v2).

Ollama for local LLM deployment.

Mistral AI and Llama 3 model cards.

The original RAG paper by Lewis et al. (2020).

Link to slides/demos (if available)

Work in progress

Twitter/X handle (optional)

manishdwibedy

LinkedIn profile (optional)

https://www.linkedin.com/in/manishdwibedy/

Profile picture URL (optional)

https://drive.google.com/file/d/14OIVXQWAUIdVGI6KFRAWmbFtXSTNXO3C/view?usp=drive_link

Speaker bio

I am Manish Dwibedy, a seasoned Senior Engineer with a strong focus on full-stack development and cutting-edge AI/ML systems. I have over a decade of experience in architecting and optimizing high-performance applications using Python (Flask, FastAPI) and AWS serverless technologies.

My passion lies in the practical application of Artificial Intelligence, especially Large Language Models (LLMs). I have hands-on experience in:

  • Developing a custom Retrieval-Augmented Generation (RAG) solution.
  • Creating an AI/LLM Aggregator Platform and integrating various models.
  • Building AI-driven tools, such as the QualiMed tool for automating test case creation using LLMs, which emphasizes a privacy-focused, local model approach.

I enjoy mentoring junior engineers and contributing to open source. I hold a Master of Science in Computer Science from the University of Southern California.

Availability

I am available on the third Saturday of any upcoming month. Please let me know which date works best for PyDelhi.

Accessibility & special requirements

Standard projector/screen and a reliable internet connection (to download models in case a fully offline setup is not pre-packaged for the demo). I will bring my own laptop.

I request a high-resolution display setup for clear code visibility during the live demo.

Speaker checklist

  • I have read and understood the PyDelhi guidelines for submitting proposals and giving talks
  • I will make my talk accessible to all attendees and will proactively ask for any accommodations or special requirements I might need
  • I agree to share slides, code snippets, and other materials used during the talk with the community
  • I will follow PyDelhi's Code of Conduct and maintain a welcoming, inclusive environment throughout my participation
  • I understand that PyDelhi meetups are community-centric events focused on learning, knowledge sharing, and networking, and I will respect this ethos by not using this platform for self-promotion or hiring pitches during my presentation, unless explicitly invited to do so by means of a sponsorship or similar arrangement
  • If the talk is recorded by the PyDelhi team, I grant permission to release the video on PyDelhi's YouTube channel under the CC-BY-4.0 license, or a different license of my choosing if I am specifying it in my proposal or with the materials I share

Additional comments

I believe this topic is highly relevant to the PyDelhi audience, addressing the community's growing interest in practical LLM applications while promoting the power of open-source software and local development.

Metadata

Labels

accepted: Congratulations, your talk has been accepted!
proposal: Wish to present at PyDelhi? This label gets added when the "Talk Proposal" option is chosen.
review in progress: This proposal is currently under review.
scheduled: This talk/workshop is scheduled for the next meetup, either for the same month or the coming one.
