Data Retrieval

Installing required libraries

In [0]:
%pip install langchain langchain-community faiss-cpu databricks-langchain

Collecting langchain
  Downloading langchain-0.3.19-py3-none-any.whl.metadata (7.9 kB)
Collecting faiss-cpu
  Downloading faiss_cpu-1.10.0-cp311-cp311-manylinux_2_28_x86_64.whl.metadata (4.4 kB)
Collecting databricks-langchain
  Downloading databricks_langchain-0.3.0-py3-none-any.whl.metadata (2.6 kB)
Collecting langchain-core<1.0.0,>=0.3.35 (from langchain)
  Downloading langchain_core-0.3.40-py3-none-any.whl.metadata (5.9 kB)
Collecting langchain-text-splitters<1.0.0,>=0.3.6 (from langchain)
  Downloading langchain_text_splitters-0.3.6-py3-none-any.whl.metadata (1.9 kB)
Collecting langsmith<0.4,>=0.1.17 (from langchain)
  Downloading langsmith-0.3.11-py3-none-any.whl.metadata (14 kB)
Collecting pydantic<3.0.0,>=2.7.4 (from langchain)
  Downloading pydantic-2.10.6-py3-none-any.whl.metadata (30 kB)
Collecting SQLAlchemy<3,>=1.4 (from langchain)
  Downloading SQLAlchemy-2.0.38-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (9.6 kB)
Collecting aiohttp<4.0.0,>=3.8.3 (

In [0]:
dbutils.library.restartPython()

Importing required libraries

In [0]:
import os

import faiss
from databricks_langchain import ChatDatabricks, DatabricksEmbeddings
from langchain_community.vectorstores import FAISS

Setting up initial configuration

In [0]:
# Reuse an existing config dict if an earlier cell already defined one.
if 'config' not in locals():
    config = {}

config['VECTOR_STORE_PATH'] = "/Volumes/practice/default/rag_t2_vector_store"

embeddings = DatabricksEmbeddings(endpoint="databricks-bge-large-en")

llm = ChatDatabricks(
    endpoint="databricks-dbrx-instruct",
    temperature=0.1,
)

Function Definitions

Wrapper Functions

In [0]:
def load_vector_store():
    """
    Loads the FAISS vector store saved at config['VECTOR_STORE_PATH'].

    Returns:
        The loaded FAISS vector store, or None if loading fails.
    """
    try:
        vector_store = FAISS.load_local(
            folder_path=config['VECTOR_STORE_PATH'],
            embeddings=embeddings,
            # The store's docstore is saved with pickle, so loading must be
            # allowed explicitly. Only do this for files you created or trust.
            allow_dangerous_deserialization=True,
        )

        return vector_store
    except Exception as e:
        print(f"Error loading vector_store: {e}")

        return None
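
When loading fails, the printed exception can be hard to read. A quick check of the folder contents often helps first. The sketch below assumes the store was saved with the default index name, in which case `FAISS.load_local` expects `index.faiss` and `index.pkl` in the folder. The helper name `missing_index_files` is hypothetical and not part of LangChain.

```python
from pathlib import Path


def missing_index_files(folder_path, index_name="index"):
    """Return the names of the FAISS store files that are absent from folder_path."""
    folder = Path(folder_path)
    # A saved store consists of the raw index plus the pickled docstore.
    expected = [f"{index_name}.faiss", f"{index_name}.pkl"]
    return [name for name in expected if not (folder / name).is_file()]
```

In the notebook this could be called as `missing_index_files(config['VECTOR_STORE_PATH'])` before `load_vector_store()`. An empty list means both files are present.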

In [0]:
def retrieve_data(query, vector_store):
    """
    Searches the vector store for data similar to the query.

    Args:
        query (str): Input query.
        vector_store: The FAISS vector store to search.

    Returns:
        List of the most similar documents from the vector store
        (similarity_search returns 4 by default), or an empty list on error.
    """
    try:
        retrieved_data = vector_store.similarity_search(query)

        return retrieved_data
    except Exception as e:
        print(f"Error retrieving data: {e}")

        return []
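
`similarity_search` always returns its nearest neighbours, even when nothing in the store is actually related to the query. One possible refinement is to use `similarity_search_with_score` and drop weak matches. This is a minimal sketch, assuming the store returns a distance where lower means more similar (the default L2 behaviour of the LangChain FAISS store). The function name `retrieve_relevant` and the `max_distance` value are hypothetical, and the threshold would need tuning for the embedding model in use.

```python
def retrieve_relevant(query, vector_store, k=4, max_distance=1.0):
    """Return only the documents whose distance to the query is at most max_distance."""
    # Each result is a (document, distance) pair; a smaller distance is a closer match.
    scored = vector_store.similarity_search_with_score(query, k=k)
    return [doc for doc, distance in scored if distance <= max_distance]
```

An empty result then signals that the store holds nothing close to the query, which the caller can handle before asking the LLM.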

In [0]:
def generate_answer(query, retrieved_data):
    """
    Generates an answer using an LLM based on the user query and retrieved documents.

    Args:
        query (str): The user's question.
        retrieved_data (list): Documents retrieved from the vector store for the query.

    Returns:
        The LLM response message (its text is in `.content`), or an error
        string if the call fails.
    """
    messages = [
        (
            "system",
            "You are an AI assistant that provides helpful responses using retrieved documents and user queries. "
            "Try to give answers that are true and verified. If you don't know the answer, say so. "
            "Keep the answer concise and to the point.",
        ),
        (
            "human",
            f"User Query: {query}\n\nRetrieved Documents:\n{retrieved_data}",
        ),
    ]

    try:
        response = llm.invoke(messages)

        return response
    except Exception as e:
        return f"An error occurred while generating a response: {e}"
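
The prompt above interpolates the retrieved list directly, so the model sees the full `Document(...)` representation, including IDs and metadata. A possible alternative is to build a plain-text context first. The sketch below is one way to do that, assuming each item exposes `page_content` and a `metadata` dict as LangChain documents do. The helper name `format_documents` and the layout of each section are hypothetical.

```python
def format_documents(documents):
    """Join document texts into one context string, labelling each with its source."""
    sections = []
    for number, doc in enumerate(documents, start=1):
        # Fall back to a placeholder when a document carries no source entry.
        source = doc.metadata.get("source", "unknown source")
        sections.append(f"[{number}] Source: {source}\n{doc.page_content.strip()}")
    return "\n\n".join(sections)
```

The result could be passed to `generate_answer` in place of the raw list, for example `generate_answer(query, format_documents(retrieved_data))`.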

Main

In [0]:
print("\nStep: Loading vector store")
vector_store = load_vector_store()


Step: Loading vector store


In [0]:
# print("Step: Get user query")
# query = input("Enter your query:")
query = "What is the current gold price ?"

In [0]:
print("\nStep: Retrieving similar data")
retrieved_data = retrieve_data(query, vector_store)
print("Retrieved Data:", retrieved_data)


Step: Retrieving similar data
Retrieved Data: [Document(id='8a3a3535-fe45-4844-ae00-debc4b24a288', metadata={'source': '/Volumes/practice/default/datasets/pdf/electric_vehicles.pdf', 'start_index': 5535}, page_content='especially if you can charge your EV at home.\n\nUp to 40% lower servicing and maintenance costs than petrol or diesel vehicles due\n\nto fewer mechanical components.\n\nLower vehicle exercise duty (VED) rates until 2030, for new electric cars registered\n\non or after 1 April 2025.\n\no Currently it costs £10 for the first year, compared with £120-£945 for petrol or diesel vehicles. After that, all vehicles are charged a standard rate of £165 per year.\n\no For the 2025 to 2026 tax year, electric cars will continue paying £10 for the\n\nfirst year, compared with £110-£1,000+ for hybrid, petrol and diesel vehicles. The more tailpipe emissions produced, the more it will cost. After that, all vehicles will be charged a standard rate of £195 per year.\n\nAs they emit zero 

In [0]:
print("\nStep: Generating a response using LLM")
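response = generate_answer(query, retrieved_data)  # asks the LLM to answer using the retrieved data
# generate_answer returns a plain error string on failure, so fall back to it.
print("Response:", getattr(response, "content", response))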


Step: Generating a response using LLM
Response: Based on the retrieved documents, the current gold price is not mentioned. The documents discuss the benefits and costs of electric vehicles. For example, a full charge in a fully electric car costs approximately £17 and gives a typical range of around 220 miles. However, for the most accurate and up-to-date information on the gold price, I would recommend checking a reliable financial news source.
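
The cells in this section run the steps one at a time. As a rough sketch, they could also be wrapped in a single function that stops early when the store failed to load or nothing was retrieved. The name `answer_query` and the fallback messages are hypothetical. The retrieval and generation steps are passed in as parameters so that the sketch stays self-contained; in this notebook they would be `retrieve_data` and `generate_answer`.

```python
def answer_query(query, vector_store, retrieve, generate):
    """Run retrieval and generation for one query and return the answer text."""
    if vector_store is None:
        return "The vector store could not be loaded."

    documents = retrieve(query, vector_store)
    if not documents:
        return "No relevant documents were found for this query."

    response = generate(query, documents)
    # The generation step may return a message object or a plain error string.
    return getattr(response, "content", response)
```

For example: `print(answer_query(query, vector_store, retrieve_data, generate_answer))`.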
