Talk title
From Zero to Hero: Building a Local RAG Pipeline with Open-Source LLMs in Python
Short talk description
Explore how to build a Retrieval-Augmented Generation (RAG) pipeline entirely on a local machine using open-source models like Llama 3 or Mistral. We'll demystify RAG components—loaders, chunking, embeddings, vector stores, and local inference—all implemented with Python libraries. Learn to turn unstructured data into an intelligent, private, and customizable chatbot without relying on expensive cloud APIs. This is a practical, code-focused session perfect for data scientists and developers looking to harness the power of local LLMs.
Long talk description
The field of Large Language Models (LLMs) is rapidly evolving, and the trend toward running powerful models locally for privacy and cost-efficiency is gaining traction. This talk provides a comprehensive, hands-on guide to constructing a production-ready Retrieval-Augmented Generation (RAG) system using readily available Open-Source Software (OSS).
We will begin by defining the RAG architecture and why it’s crucial for grounding LLMs in proprietary or specific data. The core of the session will involve a step-by-step practical implementation using Python. Key components covered will include (each is illustrated with a short sketch after the list):
Data Ingestion and Preparation: Using libraries like LlamaIndex or LangChain to load diverse data formats (e.g., PDFs, text files).
Text Processing: Strategies for effective chunking and managing context window limitations.
Local Embeddings: Utilizing performant, compact, open-source embedding models (e.g., all-MiniLM-L6-v2) that can run efficiently on commodity hardware.
Vector Store Setup: Integrating a local vector database (e.g., ChromaDB or an in-memory solution) for efficient retrieval.
Local LLM Inference: Setting up and running quantized open-source LLMs (like Mistral or Llama 3 via Ollama, or Python wrappers around CTranslate2/llama.cpp) for final answer generation.
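To make the list concrete, here is a minimal loading-and-chunking sketch of the kind the session will walk through. It assumes LangChain's community loader and splitter packages; the file names and chunk sizes are illustrative placeholders, not recommendations:

```python
# Minimal sketch: document loading and chunking with LangChain.
# Assumes: pip install langchain-community langchain-text-splitters pypdf
from langchain_community.document_loaders import PyPDFLoader, TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

# "report.pdf" and "notes.txt" are hypothetical input files.
docs = PyPDFLoader("report.pdf").load() + TextLoader("notes.txt").load()

# Overlapping chunks keep each piece within the LLM's context window
# while preserving continuity across chunk boundaries.
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=150)
chunks = splitter.split_documents(docs)
print(f"{len(docs)} documents -> {len(chunks)} chunks")
```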
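Next, a sketch of local embeddings plus a ChromaDB store. The package names here (langchain-huggingface, langchain-chroma) are assumptions, since LangChain's integration packaging has moved between versions; the persist directory and retrieval depth are arbitrary choices:

```python
# Minimal sketch: local embeddings + a persistent ChromaDB vector store.
# Assumes: pip install langchain-huggingface langchain-chroma
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_chroma import Chroma

# all-MiniLM-L6-v2 is compact enough to embed documents on a CPU.
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)

# Embed the chunks from the previous snippet and persist them locally.
vector_store = Chroma.from_documents(
    chunks, embeddings, persist_directory="./chroma_db"
)
retriever = vector_store.as_retriever(search_kwargs={"k": 4})
```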
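Finally, local inference through Ollama. This sketch assumes the Ollama server is installed and the model has already been pulled (e.g., `ollama pull llama3`); the model tag is an example, not a requirement:

```python
# Minimal sketch: chatting with a locally served quantized LLM.
# Assumes: pip install langchain-ollama, plus a running Ollama daemon.
from langchain_ollama import ChatOllama

llm = ChatOllama(model="llama3", temperature=0)
reply = llm.invoke("In one sentence, what is retrieval-augmented generation?")
print(reply.content)
```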
The talk emphasizes a privacy-first, cost-effective approach, demonstrating that sophisticated LLM applications are now accessible without substantial infrastructure investment. Attendees will leave with a clear understanding and a working blueprint for creating their own private, knowledge-aware applications.
What format do you have in mind?
Workshop (45-60 minutes, hands-on)
Talk outline / Agenda
Introduction to RAG (5 min): What is RAG and why run it locally? More private, cheaper, and often faster.
The Local RAG Stack Overview (5 min): Introducing key OSS components (Python, LangChain/LlamaIndex, Embedding Models, Vector DBs, Local LLMs).
Step 1: Data Preparation (5 min): Loading and chunking documents (Code Demo).
Step 2: Local Embeddings & Vector Store (10 min): Choosing an OSS embedding model, creating embeddings, and storing them locally (Code Demo).
Step 3: Local LLM Setup (10 min): Using tools like Ollama or dedicated Python libraries to run a quantized LLM like Llama 3 locally.
Step 4: The Retrieval and Generation Loop (10 min): Putting it all together: retrieval, context building, and generating the final answer (Code Demo; a minimal sketch of this loop follows the agenda).
Optimization and Next Steps (5 min): Tips for performance and future scaling.
Q&A (10 min).
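As referenced in Step 4, here is a minimal sketch of the retrieval-and-generation loop. It reuses the hypothetical `retriever` and `llm` objects from the earlier snippets, and the prompt wording and sample question are purely illustrative:

```python
# Minimal sketch: the retrieval -> context -> generation loop.
def answer(question: str) -> str:
    # Fetch the chunks most relevant to the question.
    docs = retriever.invoke(question)
    context = "\n\n".join(d.page_content for d in docs)
    # Ground the model's answer in the retrieved context only.
    msg = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return llm.invoke(msg).content

# Hypothetical query against the ingested documents.
print(answer("What does the report say about quarterly revenue?"))
```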
Key takeaways
Understand the Retrieval-Augmented Generation (RAG) architecture and its practical benefits.
Gain a working knowledge of the key Python tooling used for RAG implementation (LangChain/LlamaIndex, Hugging Face models).
Learn how to select and utilize open-source embedding and large language models for local inference.
Be able to build a complete, private, and cost-efficient RAG pipeline entirely on a personal computer.
Receive a reusable code template to kickstart personal or professional RAG projects.
What domain would you say your talk falls under?
Data Science and Machine Learning
Duration (including Q&A)
50 minutes for the talk + 10 minutes for Q&A
Prerequisites and preparation
Intermediate Python programming skills.
A basic understanding of machine learning or natural language processing (NLP) concepts is helpful but not strictly required.
Optional: Familiarity with concepts like vector embeddings or large language models will make the content easier to absorb.
Resources and references
LangChain / LlamaIndex official documentation.
Hugging Face Model Hub for local embedding models (e.g., BGE, all-MiniLM-L6-v2).
Ollama for local LLM deployment.
Mistral AI and Llama 3 model cards.
The original RAG paper: Lewis et al., "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks" (2020).
Link to slides/demos (if available)
Work in progress
Twitter/X handle (optional)
manishdwibedy
LinkedIn profile (optional)
https://www.linkedin.com/in/manishdwibedy/
Profile picture URL (optional)
https://drive.google.com/file/d/14OIVXQWAUIdVGI6KFRAWmbFtXSTNXO3C/view?usp=drive_link
Speaker bio
I am Manish Dwibedy, a seasoned Senior Engineer with a strong focus on full-stack development and cutting-edge AI/ML systems. I have over a decade of experience in architecting and optimizing high-performance applications using Python (Flask, FastAPI) and AWS serverless technologies.
My passion lies in the practical application of Artificial Intelligence, especially Large Language Models (LLMs). I have hands-on experience in:
- Designing and building a custom Retrieval-Augmented Generation (RAG) solution.
- Creating an AI/LLM Aggregator Platform and integrating various models.
- Building AI-driven tools, such as the QualiMed tool for automating test case creation using LLMs, which emphasizes a privacy-focused, local model approach.
I enjoy mentoring junior engineers and contributing to open source. I hold a Master of Science in Computer Science from the University of Southern California.
Availability
I am available on the third Saturday of any upcoming month. Please let me know which date works best for PyDelhi.
Accessibility & special requirements
A standard projector/screen and a reliable internet connection (in case models need to be downloaded, if a fully offline demo setup is not pre-packaged). I will bring my own laptop.
I request a high-resolution display setup for clear code visibility during the live demo.
Speaker checklist
- I have read and understood the PyDelhi guidelines for submitting proposals and giving talks
- I will make my talk accessible to all attendees and will proactively ask for any accommodations or special requirements I might need
- I agree to share slides, code snippets, and other materials used during the talk with the community
- I will follow PyDelhi's Code of Conduct and maintain a welcoming, inclusive environment throughout my participation
- I understand that PyDelhi meetups are community-centric events focused on learning, knowledge sharing, and networking, and I will respect this ethos by not using this platform for self-promotion or hiring pitches during my presentation, unless explicitly invited to do so by means of a sponsorship or similar arrangement
- If the talk is recorded by the PyDelhi team, I grant permission to release the video on PyDelhi's YouTube channel under the CC-BY-4.0 license, or a different license of my choosing if I am specifying it in my proposal or with the materials I share
Additional comments
I believe this topic is highly relevant to the PyDelhi audience, addressing the community's growing interest in practical LLM applications while promoting the power of open-source software and local development.