# 🏛️ Professional Enterprise RAG Pipeline
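### Features: Parallel PDF Ingestion, Weaviate Hybrid Search, Local LLM via Ollama

**Requirements:**
1. **Weaviate**: the notebook starts Embedded Weaviate inside Colab. A Docker instance (`localhost:8080`) also works if you connect with `weaviate.connect_to_local()`.
2. **Ollama** running with a model pulled. The cells below pull `wizardlm2:7b`; a larger model such as `llama4:scout` can be swapped in if your hardware allows.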

In [1]:

import subprocess
import time
import os

# 1. Install Ollama (this step is blocking by nature)
print("Installing Ollama...")
!curl -fsSL https://ollama.com/install.sh | sh

Installing Ollama...
>>> Installing ollama to /usr/local
>>> Downloading ollama-linux-amd64.tgz
######################################################################## 100.0%
>>> Creating ollama user...
>>> Adding ollama user to video group...
>>> Adding current user to ollama group...
>>> Creating ollama systemd service...
>>> The Ollama API is now available at 127.0.0.1:11434.
>>> Install complete. Run "ollama" from the command line.


In [2]:
import subprocess
import time
import asyncio
import nest_asyncio

# Necessary for Jupyter/Notebook environments
nest_asyncio.apply()
def setup_ollama():
    # 1. Start the Ollama server in the background
    print("Starting Ollama server...")
    ollama_log = open("ollama_server.log", "w")

    # Use start_new_session to ensure the server keeps running independently
    subprocess.Popen(
        ["ollama", "serve"],
        stdout=ollama_log,
        stderr=ollama_log,
        start_new_session=True
    )

    # 2. Give the server a moment to initialize
    time.sleep(5)

    # 3. Pull the model (Blocking call to ensure it's ready before use)
    print("Downloading Ollama model 'wizardlm2:7b'... Please wait, this may take a few minutes.")
    try:
        subprocess.run(["ollama", "pull", "wizardlm2:7b"], check=True)
        print("✅ Model downloaded and server is ready!")
    except subprocess.CalledProcessError as e:
        print(f"❌ Error pulling model: {e}")

if __name__ == "__main__":
    setup_ollama()
    # Your RAG / LLM logic goes here

Starting Ollama server...
Downloading Ollama model 'wizardlm2:7b'... Please wait, this may take a few minutes.
✅ Model downloaded and server is ready!
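The fixed `time.sleep(5)` in `setup_ollama()` is a guess: on a slow machine the server may not be listening yet when `ollama pull` runs. A minimal sketch of a readiness check, assuming the default address `127.0.0.1:11434` reported by the installer and assuming the server answers a plain GET on `/` with HTTP 200. `wait_for_ollama` is a hypothetical helper, not part of Ollama.

```python
import time
import urllib.request


def wait_for_ollama(url: str = "http://127.0.0.1:11434/", timeout: float = 60.0,
                    interval: float = 1.0) -> bool:
    """Poll the (assumed) Ollama HTTP endpoint until it answers 200 or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # Connection refused / timed out: server not ready yet (URLError subclasses OSError)
            pass
        time.sleep(interval)
    return False
```

In `setup_ollama()`, `time.sleep(5)` could then become `if not wait_for_ollama(): raise RuntimeError("Ollama server did not start")`, so the model pull only starts once the server responds.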


In [3]:
# Check if the Ollama server is awake and sees your model
!ollama list

NAME            ID              SIZE      MODIFIED               
wizardlm2:7b    c9b1aff820f2    4.1 GB    Less than a second ago    


In [4]:
!pip install weaviate-client langchain_huggingface langchain_community langchain_weaviate langchain_text_splitters langchain_core langchain_ollama pypdf


In [5]:
# 1. INSTALL LATEST LIBRARIES
# !pip install -U langchain-ollama langchain-weaviate langchain-huggingface weaviate-client pypdf

import os
import multiprocessing
from concurrent.futures import ProcessPoolExecutor
from google.colab import files

import weaviate
from langchain_weaviate.vectorstores import WeaviateVectorStore
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_ollama import OllamaLLM
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

print("✅ Core modules loaded.")

✅ Core modules loaded.


## ⚙️ 1. Infrastructure Setup

In [6]:
import weaviate
print(weaviate.__version__)


4.19.2


In [38]:
import weaviate
from weaviate.classes.config import Configure, VectorDistances
from langchain_weaviate.vectorstores import WeaviateVectorStore
from langchain_huggingface import HuggingFaceEmbeddings
import logging
from transformers import logging as hf_logging

# Set HuggingFace logging to ERROR to suppress info/download messages
hf_logging.set_verbosity_error()

def initialize_colab_rag():
    print("üöÄ Starting Embedded Weaviate (Colab, v4)...")

    embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")

    client = weaviate.connect_to_embedded(
        persistence_data_path="./weaviate_data"
    )

    index_name = "EnterpriseDocs"

    # --- ADDED: Vector Index Configuration ---
    if not client.collections.exists(index_name):
        client.collections.create(
            name=index_name,
            # This is where you define your vectorIndexConfig
            vector_index_config=Configure.VectorIndex.hnsw(
                distance_metric=VectorDistances.COSINE,
                ef_construction=128, # Weaviate's default; raise it for better recall
                max_connections=32 # Weaviate's default; raise it for better graph connectivity
            )
        )
    # -----------------------------------------

    vectorstore = WeaviateVectorStore(
        client=client,
        index_name=index_name,
        embedding=embeddings,
        text_key="content"
    )

    print("✅ SUCCESS: Embedded Weaviate is live with custom config.")
    return vectorstore, client

vectorstore, client = initialize_colab_rag()


🚀 Starting Embedded Weaviate (Colab, v4)...


INFO:weaviate-client:Started /root/.cache/weaviate-embedded: process ID 15785


✅ SUCCESS: Embedded Weaviate is live with custom config.


## HNSW parameters

| Parameter | Purpose |
| :---- | :---- |
| **distance\_metric** | Defines how similarity is calculated (e.g., COSINE, DOT, L2\_SQUARED). |
| **ef\_construction** | Higher values lead to better recall but slower index building. |
| **max\_connections** | The maximum number of connections per element in the HNSW graph. |
| **quantizer** | Optional vector compression (e.g., Product Quantization (PQ) or Binary Quantization (BQ)) to reduce memory usage. |


## HNSW parameter trade-offs


| Parameter | When to Increase it | Trade-off |
| :---- | :---- | :---- |
| **ef\_construction** | If your search results aren't accurate enough (low recall). | Slower data ingestion/importing. |
| **max\_connections** | If you have very high-dimensional vectors or a complex dataset. | Higher memory (RAM) usage. |
| **ef (Search)** | If you want better accuracy during a specific search query. | Slower query response time. |

## How to tune HNSW parameters


| Strategy | max\_connections | ef\_construction | Best For... |
| :---- | :---- | :---- | :---- |
| **Faster** | 16 | 64 | Autocomplete, High-speed APIs |
| **Accurate** | 64 | 256+ | RAG, Research, Deep Retrieval |
| **Balanced** | 32 (Default) | 128 (Default) | E-commerce, General Helpdesks |
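
The tuning strategies in the table can be kept as named presets and unpacked into the HNSW config. A minimal sketch: `HNSW_PRESETS` and `hnsw_kwargs` are hypothetical helpers, the values come from the table (256 stands in for "256+"), and the third preset is simply Weaviate's defaults.

```python
# Hypothetical presets mirroring the tuning table above
HNSW_PRESETS = {
    "faster": {"max_connections": 16, "ef_construction": 64},
    "accurate": {"max_connections": 64, "ef_construction": 256},
    "default": {"max_connections": 32, "ef_construction": 128},  # Weaviate defaults
}


def hnsw_kwargs(strategy: str) -> dict:
    """Return a copy of a preset so callers can tweak it without mutating HNSW_PRESETS."""
    try:
        return dict(HNSW_PRESETS[strategy.lower()])
    except KeyError:
        raise ValueError(
            f"Unknown strategy {strategy!r}; choose one of {sorted(HNSW_PRESETS)}"
        ) from None
```

Usage inside `initialize_colab_rag()`: `Configure.VectorIndex.hnsw(distance_metric=VectorDistances.COSINE, **hnsw_kwargs("accurate"))`. Because that function only creates the collection when it does not exist yet, changing the preset has no effect on an existing `EnterpriseDocs` collection.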



## We use a small embedding model (`all-MiniLM-L6-v2`) for retrieval and a larger local LLM served by Ollama for inference.

In [8]:
import weaviate
from langchain_weaviate.vectorstores import WeaviateVectorStore
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_ollama import OllamaLLM

try:
    # Small Searcher (CPU)
    embeddings = HuggingFaceEmbeddings(model_name='all-MiniLM-L6-v2')

    # Large Brain (Ollama 0.3+)
    llm = OllamaLLM(model='wizardlm2:7b')

    # Reuse the embedded `client` created above. To open a second connection to the
    # same embedded instance instead, use its default ports:
    #client = weaviate.connect_to_local(port=8079, grpc_port=50050)

    vectorstore = WeaviateVectorStore(
        client=client,
        index_name='EnterpriseDocs',
        embedding=embeddings,
        text_key='content'
    )
    print("✅ Infrastructure connected successfully.")

except Exception as e:
    print(f"❌ Connection Error: {e}")


✅ Infrastructure connected successfully.


## 📂 2. Parallel Ingestion Swarm
Upload your PDFs here. Each file is loaded and split into chunks by a worker thread (one thread per CPU core), and all chunks are then indexed in Weaviate.

In [9]:
# 1. Install faster PDF library
!pip install -q pymupdf langchain-community

import multiprocessing
from concurrent.futures import ThreadPoolExecutor
from google.colab import files
from langchain_community.document_loaders import PyMuPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Use PyMuPDF (fitz) - it's much faster than PyPDFLoader
def pdf_agent(file_path):
    try:
        loader = PyMuPDFLoader(file_path)
        # We split here so the heavy lifting is done in parallel
        splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=150)
        return loader.load_and_split(splitter)
    except Exception as e:
        print(f"Error processing {file_path}: {e}")
        return []

print("Select your PDF documents:")
uploaded = files.upload()
file_list = list(uploaded.keys())

if file_list:
    # Colab usually has 2 cores; threads work better for I/O bound tasks
    cores = multiprocessing.cpu_count()
    print(f"üöÄ Scaling to {cores} threads using PyMuPDF...")

    all_chunks = []

    # ThreadPoolExecutor reduces the overhead of "pickling" data between processes
    with ThreadPoolExecutor(max_workers=cores) as executor:
        results = list(executor.map(pdf_agent, file_list))

    # Flatten the list of lists
    for sublist in results:
        all_chunks.extend(sublist)
    vectorstore = WeaviateVectorStore(
        client=client,
        index_name="EnterpriseDocs",
        embedding=embeddings,
        text_key="content",

        ## `attributes` lists extra properties the store should return with search results;
        ## it does not change ranking. Title boosting (BM25F) is set per query via `query_properties`.
        attributes=["title", "is_header"]

    )
    # Ensure your vectorstore is initialized before this line
    vectorstore.add_documents(all_chunks)

    print(f"✅ Successfully indexed {len(all_chunks)} chunks.")
else:
    print("No files selected.")

Select your PDF documents:


Saving SQL CheatSheet Deeplytic Technologies .pdf to SQL CheatSheet Deeplytic Technologies .pdf
Saving high-performance-python-practical-performant-programming-for-humans-2nbsped-1492055026-9781492055020_compress.pdf to high-performance-python-practical-performant-programming-for-humans-2nbsped-1492055026-9781492055020_compress.pdf
Saving Microservice-APIs-With-examples-in-Python-MEAP-Jose-Haro-Peralta_bibis.ir.pdf to Microservice-APIs-With-examples-in-Python-MEAP-Jose-Haro-Peralta_bibis.ir.pdf
🚀 Scaling to 2 threads using PyMuPDF...
✅ Successfully indexed 2380 chunks.
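
The map-then-flatten step in the ingestion cell can be factored into a small reusable helper. A sketch, assuming any per-file function that returns a list (such as `pdf_agent`); `fan_out_and_flatten` is a hypothetical name, and `itertools.chain.from_iterable` replaces the manual `extend` loop.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import chain
from typing import Callable, Iterable, List, Optional, TypeVar

T = TypeVar("T")


def fan_out_and_flatten(
    func: Callable[[str], List[T]],
    items: Iterable[str],
    max_workers: Optional[int] = None,
) -> List[T]:
    """Run `func` on every item in a thread pool and concatenate the returned lists."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map yields results in input order, so chunks stay grouped by file
        per_item_results = executor.map(func, items)
        return list(chain.from_iterable(per_item_results))
```

Usage: `all_chunks = fan_out_and_flatten(pdf_agent, file_list, max_workers=cores)`. An exception raised inside `func` is re-raised here, which is why `pdf_agent` catches its own errors and returns `[]`.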


In [10]:
# Check the count of objects in the index
response = client.collections.get("EnterpriseDocs").aggregate.over_all(total_count=True)
print(f"Total documents in Weaviate: {response.total_count}")

Total documents in Weaviate: 2380


## 🧠 3. Intelligence Chain
Ask questions about your uploaded documents using the LangChain Expression Language (LCEL).

## **Setting up a hybrid search**

### Parameters

| Parameter | Meaning | Typical Use |
| ----- | ----- | ----- |
| `alpha` | Weight between vector and keyword score | `0.0` to `1.0` |
| `k` | Number of results to return | Usually 5 to 50, depending on the query |


### How to use

`retriever = vectorstore.as_retriever(search_kwargs={'alpha': 0.5, 'k': 5})`

### **The key signal is:**

`alpha = 0.5`

In **Weaviate**:

* `alpha = 0.0` → **BM25 (keyword only)**

* `alpha = 1.0` → **Vector only**

* `0.0 < alpha < 1.0` → **Hybrid search**

So with `alpha = 0.5`:

* 50% of the score comes from keyword matching (BM25)

* 50% comes from vector similarity

That is **true hybrid retrieval**.
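
To see what the 50/50 weighting means numerically, here is a simplified sketch of relative score fusion, the fusion method recent Weaviate versions use by default for hybrid queries: each result list's raw scores are min-max scaled to [0, 1], then combined as `alpha * vector + (1 - alpha) * keyword`. The function names and scores are hypothetical, and scoring a document missing from one list as 0 on that side is an assumption of this sketch, not Weaviate's code.

```python
from typing import Dict, List, Tuple


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Scale raw scores so the best result gets 1.0 and the worst gets 0.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # Assumption: if all scores are equal, treat every result as a top match
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def relative_score_fusion(
    bm25_scores: Dict[str, float],
    vector_scores: Dict[str, float],
    alpha: float = 0.5,
) -> List[Tuple[str, float]]:
    """alpha=0 -> keyword only, alpha=1 -> vector only, in between -> hybrid."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0.0 and 1.0")
    kw = min_max_normalize(bm25_scores)
    vec = min_max_normalize(vector_scores)
    fused = {
        # A document missing from one result list contributes 0 from that side
        doc: alpha * vec.get(doc, 0.0) + (1 - alpha) * kw.get(doc, 0.0)
        for doc in set(kw) | set(vec)
    }
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


bm25 = {"a": 12.0, "b": 6.0, "c": 3.0}   # hypothetical raw BM25 scores
vector = {"b": 0.9, "c": 0.8, "d": 0.5}  # hypothetical raw cosine similarities
print([doc for doc, _ in relative_score_fusion(bm25, vector, alpha=0.5)])  # → ['b', 'a', 'c', 'd']
```

Document `a` wins on keywords alone and `b` wins on vectors alone, but at `alpha = 0.5` `b` comes first because it scores well on both sides.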

---

## **Important caveat**

This **only works** if:

### **✅ Your `vectorstore` is Weaviate**

or another backend whose LangChain integration **implements hybrid search internally**.

### **❌ If your backend is:**

* FAISS

* Chroma

* Milvus (default)

* Qdrant (unless explicitly configured)

Then:

`alpha = 0.5`

is **not treated as a hybrid weight**. Depending on the integration, it is silently ignored (you are doing **pure vector search**) or rejected as an unexpected argument.



In [11]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

## search_type defaults to 'similarity'; 'similarity_score_threshold' and 'mmr' are also available. k is the number of results to return
retriever = vectorstore.as_retriever(search_kwargs={'alpha': 0.5, 'k': 5,
                                                    ## BM25F logic: boost keyword matches in 'title' by 5 and in 'content' by 1.
                                                    ## Remove `query_properties` to run plain BM25 over all text properties;
                                                    ## for general full-text search this is usually recommended,
                                                    ## since PDF 'title' metadata is often empty.
                                                    "query_properties": ["title^5", "content"]})

def format_docs(docs):
    # Pass only the chunk text to the LLM so document metadata does not leak into the prompt
    return "\n\n".join(doc.page_content for doc in docs)

prompt = ChatPromptTemplate.from_template("Context: {context}\n\nQuestion: {question}\n\nAnswer:")

rag_chain = (
    {'context': retriever | format_docs, 'question': RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

query = "Tell me full tutorial of SQL cheatsheet make it understandable."
print("Assistant is thinking...\n")
print(rag_chain.invoke(query))

Assistant is thinking...

 Certainly! Below is a comprehensive guide based on the SQL CheatSheet provided in the document metadata. This guide will cover various aspects of SQL, from basic concepts to advanced features. I'll break it down into sections for clarity and understanding.

### What is SQL?
SQL (Structured Query Language) is a language designed for managing data in relational database management systems (RDBMS). It allows you to perform operations such as querying, inserting, updating, and deleting data in a structured manner.

### What is a Database?
A database is a structured collection of data stored electronically that can be easily accessed, managed, and modified. Databases are central to a wide range of applications, including e-commerce platforms, enterprise resource planning (ERP) systems, and customer relationship management (CRM) tools.

### Types of SQL Commands
SQL commands can be categorized into several types:
1. **Data Query Language (DQL)**: These are used to 

### Ensemble retriever method
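
LangChain's `EnsembleRetriever` merges the result lists of its retrievers with weighted Reciprocal Rank Fusion (RRF): a document at 1-based rank `r` in a list with weight `w` earns `w / (r + c)`, summed across lists, with `c = 60` by default. A simplified sketch using plain document IDs (the real retriever deduplicates `Document` objects, by `page_content` unless an `id_key` is given); `weighted_rrf` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def weighted_rrf(
    ranked_lists: Sequence[Sequence[str]],
    weights: Sequence[float],
    c: int = 60,
) -> List[str]:
    """Fuse several ranked lists of document IDs with weighted Reciprocal Rank Fusion."""
    if len(ranked_lists) != len(weights):
        raise ValueError("Provide exactly one weight per ranked list")
    scores: Dict[str, float] = defaultdict(float)
    for ranking, weight in zip(ranked_lists, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += weight / (rank + c)
    # Highest fused score first
    return sorted(scores, key=scores.__getitem__, reverse=True)


# Hypothetical results: retriever1 (title-boosted) and retriever2 (plain BM25)
boosted = ["a", "b", "c"]
plain = ["b", "d", "a"]
print(weighted_rrf([boosted, plain], weights=[0.5, 0.5]))  # → ['b', 'a', 'd', 'c']
```

Because RRF only looks at ranks, it sidesteps the fact that the two retrievers' raw scores live on different scales; the trade-off is that large score gaps within a list are ignored.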

In [33]:
from langchain_classic.retrievers import EnsembleRetriever


## Retriever 1: BM25F (title-boosted keyword search) + vector search
## search_type defaults to 'similarity'; 'similarity_score_threshold' and 'mmr' are also available. k is the number of results to return
retriever1 = vectorstore.as_retriever(search_kwargs={'alpha': 0.5, 'k': 5,
                                                    ## BM25F logic: boost keyword matches in 'title' by 5 and in 'content' by 1
                                                    "query_properties": ["title^5", "content"]})

## Retriever 2: plain BM25 (all text properties, no boosting) + vector search
retriever2 = vectorstore.as_retriever(search_kwargs={'alpha': 0.5, 'k': 5})

# ----------------------------
# 1️⃣ Create ensemble retriever (equal weights)
# ----------------------------
ensemble = EnsembleRetriever(retrievers=[retriever1, retriever2], weights=[0.5, 0.5])

# ----------------------------
# 2️⃣ Retrieve documents
# ----------------------------
docs = ensemble.invoke("What is some SQL advice?")

# ----------------------------
# 3️⃣ Convert to human-readable text
# ----------------------------
# ----------------------------
# Combine all document contents into a single string
context_text = "\n\n".join([doc.page_content for doc in docs])

print("Retrieved Documents:\n")
print(context_text)


Retrieved Documents:

51. What is the ACID Property in SQL?
52. What is a Transaction in SQL?
53. What is the difference between COMMIT and ROLLBACK?
54. What is Savepoint in SQL?
55. What is the difference between IN and EXISTS?
56. What is the difference between DELETE and TRUNCATE?
57. What is Index Fragmentation?
58. What is the difference between RANK() and DENSE_RANK()?
59. How to fetch common records from two tables?
60. What is the difference between UNION and JOIN?
61. What is Pivot Table in SQL?
62. What is Case Sensitivity in SQL?
63. How to find the Nth Highest Salary?
64. How to get First 3 Maximum Salaries?
65. What is the difference between Drop, Delete, and Truncate?
66. How to calculate Age from Date of Birth in SQL?
67. What is Recursive Query in SQL?
68. What is the difference between Temporary Table and CTE?
69. How to find Odd and Even records in SQL?
70. What is JSON in SQL?
71. What is XML in SQL?
72. How to handle NULL values in SQL?
73. What is Dynamic SQL?

70

In [32]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough, RunnableLambda
from langchain_core.output_parsers import StrOutputParser
from langchain_classic.retrievers import EnsembleRetriever

# ----------------------------
# 1️⃣ Your ensemble retriever
# ----------------------------
ensemble = EnsembleRetriever(retrievers=[retriever1, retriever2], weights=[0.5, 0.5])

# Wrap the retriever in a RunnableLambda so it returns human-readable text
def retrieve_context(query: str):
    docs = ensemble.invoke(query)
    # Combine all document contents into a single string
    return "\n\n".join([doc.page_content for doc in docs])

ensemble_runnable = RunnableLambda(retrieve_context)

# ----------------------------
# 2️⃣ Prompt
# ----------------------------
prompt = ChatPromptTemplate.from_template(
    "Context: {context}\n\nQuestion: {question}\n\nAnswer:"
)

# ----------------------------
# 3️⃣ LLM
# ----------------------------
# Reuses the `llm` (OllamaLLM) created in the infrastructure cell

# ----------------------------
# 4️⃣ RAG Chain
# ----------------------------
rag_chain = (
    {'context': ensemble_runnable, 'question': RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

# ----------------------------
# 5️⃣ Query
# ----------------------------
# ----------------------------
query = "Tell me full tutorial of SQL cheatsheet make it understandable."

print("Assistant is thinking...\n")
answer = rag_chain.invoke(query)
print(answer)


Assistant is thinking...

 Certainly! Below is a comprehensive SQL cheat sheet that covers the key concepts and commands you need to know as a beginner. This guide will help you understand SQL and its various applications in database management.

### 1. What is SQL?
SQL (Structured Query Language) is a programming language designed for managing data held in a relational database management system (RDBMS). It allows you to perform operations like querying, updating, and managing the schema of a database.

### 2. What is a Database?
A database is an organized collection of data structured into tables with rows and columns for easy access, modification, and management. Databases are central to many types of applications, from banking systems to search engines.

### 3. Types of SQL Commands
- **Data Querying**: `SELECT`, `SHOW`, `DESCRIBE`
- **Data Manipulation**: `INSERT`, `UPDATE`, `DELETE`
- **Data Definition**: `CREATE`, `ALTER`, `DROP`
- **Data Control**: `GRANT`, `REVOKE`, `COMMIT`, 

### Streaming version of the RAG chain

In [23]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

# Reuses the `llm` (OllamaLLM) defined earlier; `.stream()` yields text chunks as they are generated

# Prompt template
prompt = ChatPromptTemplate.from_template(
    "Context: {context}\n\nQuestion: {question}\n\nAnswer:"
)

# RAG chain (same hybrid `retriever` as the Intelligence Chain cell)
rag_chain = (
    {'context': retriever, 'question': RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()  # passes each streamed chunk through as a plain string
)

# Streaming invocation
for token in rag_chain.stream(query):
    print(token, end="", flush=True)  # prints tokens as they arrive


 Certainly! The SQL CheatSheet you've provided covers a wide range of topics that are fundamental to understanding and using SQL effectively. Below is a structured guide based on the content of the cheat sheet, which should help you understand each topic in a clear and logical manner.

### Introduction to SQL

- **What is SQL?**
  SQL (Structured Query Language) is a standard programming language specifically designed for managing and manipulating databases.

### Basics of Databases

- **What is a Database?**
  A database is an organized collection of data stored and accessed electronically by means of a database management system (DBMS).

### SQL Commands

- **Types of SQL commands:**
  - `DDL` (Data Definition Language) commands: used to create, alter, and drop database objects.
  - `DML` (Data Manipulation Language) commands: used to query, insert, update, and delete data in a table.
  - `Transactional` commands: `COMMIT`, `ROLLBACK`, and `SAVEPOINT` to manage transactions.
  - `Tun

## Congratulations 🎉 You have completed this course on building your own local RAG pipeline.