<a href="https://colab.research.google.com/github/johnauthnull/authnullagent/blob/main/MilvusRAG.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Retrieval-Augmented Generation (RAG) with Milvus and LangChain

This guide demonstrates how to build a Retrieval-Augmented Generation (RAG) system using LangChain and Milvus.

The RAG system combines a retrieval system with a generative model to generate new text based on a given prompt. The system first retrieves relevant documents from a corpus using Milvus, and then uses a generative model to generate new text based on the retrieved documents.
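To make the retrieve-then-generate flow concrete, here is a minimal sketch that uses only the Python standard library. The word-count `embed` function, the cosine ranking, and the `generate` stub are stand-ins invented for illustration; they are not Milvus or LangChain APIs, and the three-sentence corpus is made up.

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding: lowercase word counts (a stand-in for a real embedding model)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, corpus, k=1):
    """Step 1: rank the corpus by similarity to the query and keep the top k."""
    q = embed(query)
    return sorted(corpus, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def generate(question, context):
    """Step 2: stand-in for the LLM call; it only assembles the prompt text."""
    return f"Context:\n{context}\n\nQuestion: {question}"

corpus = [
    "user3 is denied access to ou3",
    "sshd login failure for user sonar",
    "user1 may reach endpoint1 over rdp",
]
question = "Which policy applies to user3"
top_docs = retrieve(question, corpus, k=1)
prompt_text = generate(question, "\n\n".join(top_docs))
print(prompt_text)
```

In the rest of this notebook, Milvus plays the role of `retrieve` and the Groq-hosted model plays the role of `generate`.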


## Prerequisites

Before running this notebook, make sure your API keys are stored as Colab secrets and that the following dependencies are installed:

In [1]:
from google.colab import userdata
GROQ_API_KEY=userdata.get('GROQ_API_KEY')
OPENAIKEY=userdata.get('OPENAIKEY')
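The cell above only works inside Colab. If the notebook might also run elsewhere, one possible approach is a small helper that falls back to environment variables. This is a sketch: `get_secret` is a hypothetical helper name, and the fallback to `os.environ` is an assumption about how keys would be supplied outside Colab.

```python
import os

def get_secret(name):
    """Return a secret from Colab's userdata if available, else from the environment."""
    try:
        from google.colab import userdata  # only importable inside Colab
        return userdata.get(name)
    except Exception:
        # Covers a missing google.colab module as well as a missing or
        # inaccessible Colab secret; fall back to an environment variable.
        return os.environ.get(name)

GROQ_API_KEY = get_secret("GROQ_API_KEY")
OPENAIKEY = get_secret("OPENAIKEY")
```

Either value is `None` when the key is not set anywhere, so it may be worth checking for that before the later cells assign them into `os.environ`.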

In [3]:
! pip install --upgrade --quiet  langchain langchain-core langchain-community langchain-text-splitters langchain-milvus langchain-openai bs4 langchain-groq jq langgraph


### I am not using Ollama here because running it on Google Colab is a pain. Initial testing uses Groq's inference platform instead.

In [4]:
import os
os.environ["GROQ_API_KEY"]=GROQ_API_KEY
from langchain_groq import ChatGroq

llm = ChatGroq(
    model="llama-3.2-1b-preview",
    temperature=0.0,
    max_retries=2,
    # other params...
)

> If you are using Google Colab, to enable dependencies just installed, you may need to **restart the runtime** (click on the "Runtime" menu at the top of the screen, and select "Restart session" from the dropdown menu).

In [5]:
import os

os.environ["OPENAI_API_KEY"] = OPENAIKEY


## Prepare the data


In [6]:
from langchain_community.document_loaders import JSONLoader
import json
from pathlib import Path

file_path='exported_data.json'
data = json.loads(Path(file_path).read_text())
loader = JSONLoader(
         file_path=file_path,
         jq_schema='.[]',
         text_content=False)

docs = loader.load()
print(docs[30].page_content)
print(docs[30].metadata)

file2="endpointauthlogs.json"
data = json.loads(Path(file2).read_text())
loader_ = JSONLoader(
         file_path=file2,
         jq_schema='.[]',
         text_content=False)

docs2 = loader_.load()
print(docs2[0].page_content)
print(docs2[0].metadata)


file3="adpolicy.json"
data = json.loads(Path(file3).read_text())
loader3 = JSONLoader(
         file_path=file3,
         jq_schema='.[]',
         text_content=False)

docs3 = loader3.load()
print(docs3[0].page_content)
print(docs3[0].metadata)






{"AccessMedium": "", "AdDomain": "", "AdOu": "RDP", "AdUser": 156624, "AdUserMatch": "muthu", "AdUserType": 105, "DestinationEndpoint": "", "DestinationEndpointIp": 0, "DestinationEndpointMatch": "", "EndpointUser": "AD-LOCAL", "EndpointUserMatch": "exact", "EndpointUserType": "windows", "LogID": "exact", "LogType": "", "LoginStatus": "34.67.34.186", "OrgID": "AD", "Os": "exact", "Protocol": "", "SamAccountName": "DIRECT", "SourceEndpoint": 1, "SourceEndpointIP": "", "SourceEndpointMatch": "user", "SourceEndpointType": "endpoint", "TenantID": "user", "TimeStamp1": "exact", "VectorData": "success"}
{'source': '/content/exported_data.json', 'seq_num': 31}
{"id": 1, "org_id": 105, "tenant_id": 1, "src_ip": "154.213.185.141", "dest_ip": "35.238.119.241", "user_name": "sonar", "service": "sshd[1173994]", "login_status": "Failure", "message": "Invalid user sonar from 154.213.185.141 port 54214", "count": 1, "created_at": 1727177088, "component": "linux-auth", "host_name": "dit-pam-config"}
{
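The output above suggests what `jq_schema='.[]'` does: each element of the top-level JSON array becomes one document, its `page_content` is the element serialized back to JSON, and `seq_num` counts from 1 (which is why `docs[30]` reports `seq_num: 31`). The sketch below mimics that behavior with the standard library only. It is an illustration of the observed result, not `JSONLoader`'s actual implementation, and `split_json_array` is a hypothetical name.

```python
import json

def split_json_array(raw_json, source):
    """Turn a JSON array into one dict per element, mimicking the loader output."""
    records = json.loads(raw_json)
    return [
        {
            "page_content": json.dumps(record),
            "metadata": {"source": source, "seq_num": position},
        }
        for position, record in enumerate(records, start=1)
    ]

# Two made-up records in the shape of the auth log entries
raw = '[{"id": 1, "login_status": "Failure"}, {"id": 2, "login_status": "Success"}]'
docs_sketch = split_json_array(raw, "endpointauthlogs.json")
print(docs_sketch[0]["page_content"])
print(docs_sketch[0]["metadata"])
```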

## Build RAG chain with Milvus Vector Store

We will initialize a Milvus vector store from the documents. This loads the documents into the vector store and builds an index under the hood.

In [10]:
from langchain_milvus import Milvus, Zilliz
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()

vectorstore = Milvus.from_documents(  # or Zilliz.from_documents
    documents=docs,
    embedding=embeddings,
    collection_name="HI",
    connection_args={
        "uri": "./sanity.db",
    },
    # Add drop_old=True here to drop an existing collection with the same name
)

policy_store = Milvus.from_documents(
    documents=docs3,
    embedding=embeddings,
    collection_name="POLICY",
    connection_args={
        "uri": "./milvus_demo.db",
    },  # Add drop_old=True to drop an existing collection with the same name
)

vectorbore=Milvus.from_documents(  # or Zilliz.from_documents
    documents=docs2,
    embedding=embeddings,
    collection_name="BRUH",
    connection_args={
        "uri": "./milvus_demo.db",
    },  # Add drop_old=True to drop an existing collection with the same name
)

DEBUG:pymilvus.milvus_client.milvus_client:Created new connection using: 650fc828dae242cba31d1cd454556bd1
DEBUG:pymilvus.milvus_client.milvus_client:Created new connection using: 55acebea3d9b4d9d8ddf8eb526ce6c30
DEBUG:pymilvus.milvus_client.milvus_client:Created new connection using: 60880d0b52da494eb01c84587b67f356


Search the documents in the Milvus vector store using a test query question. Let's take a look at the top 1 document.

THE ROUGH WORKING OF AGENT2: Troubleshooting and getting to know more about the policy


In [26]:
query = "Give me the policy only for user3"
policy_store.similarity_search(query, k=1)

[Document(metadata={'pk': 453792662664249405, 'seq_num': 2, 'source': '/content/adpolicy.json'}, page_content='{"policy_type": "ad", "ad": {"ous": {"ou3": ["all"], "ou4": ["specific_ad_group3"]}}, "permissions": {"allowed": false, "iam_users": ["user3"], "iam_groups": []}}')]

In [28]:
from langchain_core.runnables import RunnablePassthrough
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_openai import ChatOpenAI

# Initialize the OpenAI language model for response generation
#llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)

# Define the prompt template for generating AI responses
PROMPT_TEMPLATE = """
Human: You are an AI assistant, and provides answers to questions by using fact based and statistical information when possible.
Use the following pieces of information to provide a concise answer to the question enclosed in <question> tags.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
<context>
{context}
</context>

<question>
{question}
</question>

The response should be specific and use statistics or numbers when possible.

Assistant:"""

# Create a PromptTemplate instance with the defined template and input variables
prompt = PromptTemplate(
    template=PROMPT_TEMPLATE, input_variables=["context", "question"]
)
# Convert the vector store to a retriever
retriever = policy_store.as_retriever(search_type="mmr", search_kwargs={"k": 1})


# Define a function to format the retrieved documents
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)
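The retriever above uses `search_type="mmr"` (maximal marginal relevance), which trades relevance to the query against redundancy with documents already picked. The following is a minimal pure-Python sketch of that selection rule, not the LangChain or Milvus implementation. The 2-D vectors are made up, and `lambda_mult=0.5` is chosen here for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

def mmr(query_vec, doc_vecs, k=2, lambda_mult=0.5):
    """Return the indices of k documents chosen by maximal marginal relevance."""
    selected = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            # Highest similarity to anything already selected (0 if nothing yet)
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

query_vec = [1.0, 0.2]
doc_vecs = [[1.0, 0.0], [0.99, 0.1], [0.6, 0.8]]  # docs 0 and 1 are near-duplicates
picked = mmr(query_vec, doc_vecs, k=2, lambda_mult=0.5)
relevance_only = mmr(query_vec, doc_vecs, k=2, lambda_mult=1.0)
print(picked, relevance_only)
```

With `lambda_mult=0.5` the second pick skips the near-duplicate in favor of the more distinct document, while `lambda_mult=1.0` ranks by relevance alone. With `k=1`, as in the cell above, the first pick is simply the most relevant candidate, so MMR and plain similarity search should return the same document.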

Use LCEL (LangChain Expression Language) to build a RAG chain.

In [29]:
# Define the RAG (Retrieval-Augmented Generation) chain for AI response generation
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

# rag_chain.get_graph().print_ascii()

# Invoke the RAG chain with a specific question and retrieve the response
res = rag_chain.invoke(query)
res

'The policy for user3 only allows access to specific_ad_group3 in the ad OU.'
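To show how the `|` composition in the chain above fits together, here is a toy re-creation in plain Python. It is only a sketch of the idea: `Step`, `parallel`, and the `*_stub` objects are hypothetical names, not LangChain classes. In LCEL the leading dict is coerced into a parallel step that feeds the same input to each branch, which `parallel` imitates here.

```python
class Step:
    """Toy runnable: wraps a function and supports `|` chaining."""

    def __init__(self, fn):
        self.fn = fn

    def __or__(self, other):
        # a | b means: run a, then feed its output to b
        return Step(lambda x: other.invoke(self.invoke(x)))

    def invoke(self, x):
        return self.fn(x)

def parallel(**steps):
    """Run every step on the same input and collect the results in a dict."""
    return Step(lambda x: {name: step.invoke(x) for name, step in steps.items()})

# Stubs standing in for the retriever, formatter, prompt, and model
retriever_stub = Step(lambda q: ["policy A denies user3"])
format_stub = Step(lambda docs: "\n\n".join(docs))
passthrough = Step(lambda x: x)
prompt_stub = Step(
    lambda d: f"<context>{d['context']}</context><question>{d['question']}</question>"
)
llm_stub = Step(lambda p: p.upper())

chain = (
    parallel(context=retriever_stub | format_stub, question=passthrough)
    | prompt_stub
    | llm_stub
)
result = chain.invoke("policy for user3")
print(result)
```

The query string reaches both branches: one turns it into retrieved context, the other passes it through unchanged as the question.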

## TODO:
- Create RAG Agent 1
- Create RAG Agent 2

AGENT 1) Work In progress

In [30]:

json_schema = {
  "title": "Endpoint Access Policy",
  "description": "Defines access control policy for endpoints, specifying allowed users and groups.",
  "type": "object",
  "properties": {
    "policy_type": {
      "type": "string",
      "description": "Type of policy.  Currently only 'endpoints' is supported.",
      "enum": ["endpoints"]
    },
    "endpoints": {
      "type": "object",
      "description": "Specifies endpoints and their grouping.",
      "properties": {
        "groups": {
          "type": "object",
          "description": "Groups of endpoints.",
          "additionalProperties": {
            "type": "array",
            "items": {
              "type": "string",
              "description": "Endpoint names within the group."
            }
          }
        },
        "users": {
          "type": "array",
          "items": {
            "type": "string",
            "description": "Individual users with access to all endpoints (if allowed)."
          }
        }
      },
      "required": ["groups", "users"]
    },
    "permissions": {
      "type": "object",
      "description": "Overall permission settings and IAM roles.",
      "properties": {
        "allowed": {
          "type": "boolean",
          "description": "Overall access allowed (true) or denied (false).  Overrides group/user settings if false."
        },
        "iam_users": {
          "type": "array",
          "items": {
            "type": "string",
            "description": "IAM users with administrative control over this policy."
          }
        },
        "iam_groups": {
          "type": "array",
          "items": {
            "type": "string",
            "description": "IAM groups with administrative control over this policy."
          }
        }
      },
      "required": ["allowed", "iam_users", "iam_groups"]
    }
  },
  "required": ["policy_type", "endpoints", "permissions"]
}


In [36]:
from langchain_core.output_parsers import JsonOutputParser
from langchain_core.output_parsers import StrOutputParser
from langchain_openai import ChatOpenAI

# Initialize the OpenAI language model for response generation
#llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)

# Define the prompt template for generating AI responses
PROMPT_GEN = """ Human: You are an AI assistant that generates JSON access control policies. Here's an example of a valid policy:

{{
  "policy_type": "endpoints",
  "endpoints": {{
    "groups": {{
      "groupA": ["endpoint1", "endpoint2"]
    }},
    "users": ["user1", "user2"]
  }},
  "permissions": {{
    "allowed": true,
    "iam_users": ["admin1", "admin2"],
    "iam_groups": ["admins"]
  }}
}}
Use the following authentication log data to generate a similar policy. Consider security best practices: users and groups with many failed logins should have restricted access.

<context>
{context}
</context>
<question>Generate a JSON access control policy based on the provided authentication logs.</question>

Assistant:"""

# Create a PromptTemplate instance with the defined template and input variables
prompt = PromptTemplate(
    template=PROMPT_GEN, input_variables=["context", "question"]
)
# Convert the vector store to a retriever
retriever = policy_store.as_retriever(search_type="mmr", search_kwargs={"k": 1})


# Define a function to format the retrieved documents
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

In [38]:
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | JsonOutputParser(json_schema=json_schema)
)

# rag_chain.get_graph().print_ascii()
query = "Generate a JSON access for"
# Invoke the RAG chain with a specific question and retrieve the response
res = rag_chain.invoke(query)
res

{'policy_type': 'endpoints',
 'endpoints': {'groups': {'groupA': ['endpoint1', 'endpoint2']},
  'users': ['user1', 'user2']},
 'permissions': {'allowed': False,
  'iam_users': ['user3', 'user4'],
  'iam_groups': ['groupB']}}
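As far as I know, `JsonOutputParser` parses the model's text into JSON but does not enforce a schema, so it may be worth checking the result separately. Below is a minimal sketch of such a check that handles only the few JSON Schema keywords the policy schema uses (`type`, `required`, `properties`, `additionalProperties`, `items`, `enum`). `check` and `schema_sketch` are hypothetical names, and `schema_sketch` is a trimmed copy of `json_schema` without descriptions. A full validator such as the `jsonschema` package would be the more complete option.

```python
TYPE_MAP = {"object": dict, "array": list, "string": str, "boolean": bool}

def check(value, schema, path="$"):
    """Return a list of error strings; an empty list means the value passed."""
    expected = TYPE_MAP.get(schema.get("type"))
    if expected and not isinstance(value, expected):
        return [f"{path}: expected {schema['type']}"]
    errors = []
    if "enum" in schema and value not in schema["enum"]:
        errors.append(f"{path}: {value!r} not in {schema['enum']}")
    if isinstance(value, dict):
        for key in schema.get("required", []):
            if key not in value:
                errors.append(f"{path}: missing required key {key!r}")
        props = schema.get("properties", {})
        extra = schema.get("additionalProperties")
        for key, item in value.items():
            if key in props:
                errors += check(item, props[key], f"{path}.{key}")
            elif isinstance(extra, dict):
                errors += check(item, extra, f"{path}.{key}")
    if isinstance(value, list) and "items" in schema:
        for i, item in enumerate(value):
            errors += check(item, schema["items"], f"{path}[{i}]")
    return errors

string_list = {"type": "array", "items": {"type": "string"}}
schema_sketch = {
    "type": "object",
    "properties": {
        "policy_type": {"type": "string", "enum": ["endpoints"]},
        "endpoints": {
            "type": "object",
            "properties": {
                "groups": {"type": "object", "additionalProperties": string_list},
                "users": string_list,
            },
            "required": ["groups", "users"],
        },
        "permissions": {
            "type": "object",
            "properties": {
                "allowed": {"type": "boolean"},
                "iam_users": string_list,
                "iam_groups": string_list,
            },
            "required": ["allowed", "iam_users", "iam_groups"],
        },
    },
    "required": ["policy_type", "endpoints", "permissions"],
}

# The policy returned by the chain above
good = {
    "policy_type": "endpoints",
    "endpoints": {"groups": {"groupA": ["endpoint1", "endpoint2"]}, "users": ["user1", "user2"]},
    "permissions": {"allowed": False, "iam_users": ["user3", "user4"], "iam_groups": ["groupB"]},
}
# A made-up malformed policy
bad = {"policy_type": "ad", "permissions": {"allowed": "no"}}

good_errors = check(good, schema_sketch)
bad_errors = check(bad, schema_sketch)
print(good_errors)
print(bad_errors)
```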

## Metadata filtering

We can use the [Milvus Scalar Filtering Rules](https://milvus.io/docs/boolean.md) to filter the documents based on metadata. We have loaded the documents from two different sources, and we can filter the documents by the metadata `source`.

In [None]:
vectorstore.similarity_search(
    "What is CoT?",
    k=1,
    expr="source == 'https://lilianweng.github.io/posts/2023-06-23-agent/'",
)

# The same as:
# vectorstore.as_retriever(search_kwargs=dict(
#     k=1,
#     expr="source == 'https://lilianweng.github.io/posts/2023-06-23-agent/'",
# )).invoke("What is CoT?")

[Document(page_content='Fig. 1. Overview of a LLM-powered autonomous agent system.\nComponent One: Planning#\nA complicated task usually involves many steps. An agent needs to know what they are and plan ahead.\nTask Decomposition#\nChain of thought (CoT; Wei et al. 2022) has become a standard prompting technique for enhancing model performance on complex tasks. The model is instructed to “think step by step” to utilize more test-time computation to decompose hard tasks into smaller and simpler steps. CoT transforms big tasks into multiple manageable tasks and shed lights into an interpretation of the model’s thinking process.\nTree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via 
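In this notebook's own data, the `source` metadata is a file path such as `/content/adpolicy.json` (see the policy search output earlier). A small helper can build the filter string for such a path. This is a sketch: `source_expr` is a hypothetical helper, and it refuses quotes and backslashes rather than guessing at Milvus's escaping rules.

```python
def source_expr(source):
    """Build a scalar filter expression that matches one `source` value."""
    if "'" in source or "\\" in source:
        raise ValueError("source contains characters this sketch does not escape")
    return f"source == '{source}'"

expr = source_expr("/content/adpolicy.json")
print(expr)
```

The resulting string could then be passed as the `expr` argument in the same way as the hard-coded filter in the cell above.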

If we want to dynamically change the search parameters without rebuilding the chain, we can [configure the runtime chain internals](https://python.langchain.com/v0.2/docs/how_to/configure/). Let's define a new retriever with this dynamic configuration and use it to build a new RAG chain.

In [None]:
from langchain_core.runnables import ConfigurableField

# Define a new retriever with a configurable field for search_kwargs
retriever2 = vectorstore.as_retriever().configurable_fields(
    search_kwargs=ConfigurableField(
        id="retriever_search_kwargs",
    )
)

# Invoke the retriever with specific search_kwargs that filter the documents by source
retriever2.with_config(
    configurable={
        "retriever_search_kwargs": dict(
            expr="source == 'https://lilianweng.github.io/posts/2023-06-23-agent/'",
            k=1,
        )
    }
).invoke(query)

[Document(page_content='Self-Reflection#\nSelf-reflection is a vital aspect that allows autonomous agents to improve iteratively by refining past action decisions and correcting previous mistakes. It plays a crucial role in real-world tasks where trial and error are inevitable.\nReAct (Yao et al. 2023) integrates reasoning and acting within LLM by extending the action space to be a combination of task-specific discrete actions and the language space. The former enables LLM to interact with the environment (e.g. use Wikipedia search API), while the latter prompting LLM to generate reasoning traces in natural language.\nThe ReAct prompt template incorporates explicit steps for LLM to think, roughly formatted as:\nThought: ...\nAction: ...\nObservation: ...\n... (Repeated many times)', metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', 'pk': 449281835035555859})]

In [None]:
# Define a new RAG chain with this dynamically configurable retriever
rag_chain2 = (
    {"context": retriever2 | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

Let's try this dynamically configurable RAG chain with different filter conditions.

In [None]:
# Invoke this RAG chain with a specific question and config
rag_chain2.with_config(
    configurable={
        "retriever_search_kwargs": dict(
            expr="source == 'https://lilianweng.github.io/posts/2023-06-23-agent/'",
        )
    }
).invoke(query)

"Self-reflection of an AI agent involves the process of synthesizing memories into higher-level inferences over time to guide the agent's future behavior. It serves as a mechanism to create higher-level summaries of past events. One approach to self-reflection involves prompting the language model with the 100 most recent observations and asking it to generate the 3 most salient high-level questions based on those observations. This process helps the AI agent optimize believability in the current moment and over time."

When we change the search condition to filter the documents by the second source, the content of that blog post has nothing to do with the query question, so we get an answer with no relevant information.

In [None]:
rag_chain2.with_config(
    configurable={
        "retriever_search_kwargs": dict(
            expr="source == 'https://lilianweng.github.io/posts/2023-03-15-prompt-engineering/'",
        )
    }
).invoke(query)

"I'm sorry, but based on the provided context, there is no specific information or statistical data available regarding the self-reflection of an AI agent."

----
This tutorial focuses on the basic usage of the Milvus LangChain integration and a simple RAG approach. For more advanced RAG techniques, please refer to the [advanced rag bootcamp](https://github.com/milvus-io/bootcamp/tree/master/bootcamp/RAG/advanced_rag).