**AI & Machine Learning (KAN-CINTO4003U) - Copenhagen Business School | Spring 2025**

***


# Part I: RAG

Please see the description of the assignment in the README file (section 1) <br>
**Guide notebook**: [guides/rag_guide.ipynb](guides/rag_guide.ipynb)


***
<br>

* Remember to include some reflections on your results. Are there, for example, any hyperparameters that are particularly important?

* You should follow the steps given in the `rag_guide` notebook to create your own RAG system.

<br>

***

#### Imports

In [None]:
from typing import Literal, Any
from copy import deepcopy

from typing_extensions import TypedDict
import matplotlib.pyplot as plt
import numpy as np
from decouple import config
from pydantic import BaseModel, Field
from IPython.display import Image, display
from tqdm import tqdm

from langchain_chroma import Chroma
from langchain_core.documents import Document
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters.markdown import MarkdownHeaderTextSplitter
from langchain.prompts import PromptTemplate
from langchain_ibm import WatsonxEmbeddings
from langchain_ibm import WatsonxLLM
from langgraph.graph import START, StateGraph
from ibm_watsonx_ai.metanames import GenTextParamsMetaNames as GenParams

import litellm
from litellm import completion
import instructor
from instructor import Mode

#### Retrieve secrets

In [24]:
WX_API_KEY = config("WX_API_KEY")
WX_PROJECT_ID = config("WX_PROJECT_ID")
WX_API_URL = "https://us-south.ml.cloud.ibm.com"


#### Authenticate and initialize LLM

In [25]:
llm = WatsonxLLM(
    model_id="ibm/granite-3-8b-instruct",
    url=WX_API_URL,
    apikey=WX_API_KEY,
    project_id=WX_PROJECT_ID,
    params={
        GenParams.DECODING_METHOD: "greedy",
        GenParams.TEMPERATURE: 0,
        GenParams.MIN_NEW_TOKENS: 5,
        GenParams.MAX_NEW_TOKENS: 1_000,
        GenParams.REPETITION_PENALTY: 1.2,
    },
)

#### Use LLM

In [1]:
document = TextLoader('data/madeup_company.md').load()[0]


In [None]:
headers_to_split_on = [("#", "Header 1"), ("##", "Header 2"), ("###", "Header 3"), ("####", "Header 4")]
text_splitter = MarkdownHeaderTextSplitter(headers_to_split_on=headers_to_split_on)
chunks = text_splitter.split_text(document.page_content)

In [None]:
def update_documents_with_headers(chunks):
    """
    Creates a new list of Document objects whose page_content is prepended with
    the chunk's headers in "[Header 1/Header 2/...]: " format.

    Returns new objects rather than modifying the original chunks.
    """
    updated_chunks = []

    for doc in chunks:
        # Create a deep copy of the document to avoid modifying the original
        new_doc = deepcopy(doc)

        # Collect the headers present in the metadata, from Header 1 to Header 4
        # (matching the four levels defined in headers_to_split_on)
        headers = []
        for i in range(1, 5):
            key = f'Header {i}'
            if key in new_doc.metadata:
                headers.append(new_doc.metadata[key])

        # Create the header prefix and update page_content
        if headers:
            prefix = f"[{'/'.join(headers)}]: "
            new_doc.page_content = prefix + "\n" + new_doc.page_content

        updated_chunks.append(new_doc)

    return updated_chunks


docs = update_documents_with_headers(chunks)

In [None]:
embed_params = {}

watsonx_embedding = WatsonxEmbeddings(
    model_id="ibm/granite-embedding-278m-multilingual",
    url=WX_API_URL,
    project_id=WX_PROJECT_ID,
    apikey=WX_API_KEY,
    params=embed_params,
)

In [None]:
local_vector_db = Chroma.from_documents(
    collection_name="my_collection",
    embedding=watsonx_embedding,
    persist_directory="my_vector_db", # This will save the vector database to disk! Delete it if you want to start fresh.
    documents=docs,
    
)

In [None]:
template = """You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.

Question:
{question}

Context: 
{context} 

Answer:
"""

prompt = PromptTemplate.from_template(template)

# Define state for application
class State(TypedDict):
    """ A langgraph state for the application """
    question: str
    context: list[Document]
    answer: str


# Define application steps
def retrieve(state: State):
    """ Our retrieval step. We use our local vector database to retrieve similar documents to the question """
    retrieved_docs = local_vector_db.similarity_search(state["question"], k=3) # NOTE: You can change k to retrieve fewer or more documents
    return {"context": retrieved_docs} 


def generate(state: State):
    """ Our generation step. We use the retrieved documents to generate an answer to the question """

    # Format the prompt
    docs_content = "\n\n".join(doc.page_content for doc in state["context"])
    formatted_prompt = prompt.invoke({"question": state["question"], "context": docs_content})

    # Generate the answer
    response = llm.invoke(formatted_prompt)
    return {"answer": response}


# Compile application and test
graph_builder = StateGraph(State).add_sequence([retrieve, generate])
graph_builder.add_edge(START, "retrieve") # Start at the retrieve step
graph = graph_builder.compile() # Compile the graph

In [None]:
response = graph.invoke({ "question": "What's the roadmap ahead?"})
response

{'question': "What's the roadmap ahead?",
 'context': [Document(id='64e71e93-2cb6-4ade-9592-99fc2dc1d887', metadata={'Header 1': 'About MadeUpCompany', 'Header 2': 'Roadmap'}, page_content="[About MadeUpCompany/Roadmap]: \nWe are constantly evolving and introducing new features based on customer feedback. Here’s what’s coming soon:  \n- 🚀 AI-Driven Data Insights – DataWiz will introduce automated trend forecasting powered by deep learning.\n- 🚀 Collaboration Tools for CloudMate – Enhanced real-time document editing and team workspaces for seamless collaboration.\n- 🚀 Zero-Knowledge Encryption – An optional feature for businesses requiring absolute data confidentiality.  \nWe value our customers' input and prioritize updates that deliver the most impact."),
  Document(id='f56bb01c-058b-4466-ad1c-8f553ff3215d', metadata={'Header 1': 'About MadeUpCompany', 'Header 2': 'Our Values'}, page_content="[About MadeUpCompany/Our Values]: \nAt MadeUpCompany, we believe in:  \n- Innovation – Contin

In [None]:
response.get('answer')

'The upcoming features include AI-driven data insights (DataWiz), enhanced collaboration tools for CloudMate, and zero-knowledge encryption. These developments align with our values of innovation, security, and customer centricity.'

I'm gonna be honest here and say that I found it more useful to spend my time on the final exam.

I have read the guides and understand the concepts of RAG, agents, and using LLMs to evaluate.

With RAG we are able to minimise hallucinations by grounding the model in some source of truth. This is useful when we want factual information about some topic but don't want to read an entire book on the matter. By embedding text into a vector database, we can retrieve information based on the user's query, feed it into the LLM along with the query, and let the LLM figure out how to respond based on what's being asked and the information retrieved from the vector DB.
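To make the retrieval step concrete, here is a minimal sketch of what happens at query time, assuming the chunks and the query have already been embedded. The toy 3-dimensional vectors and the `top_k_by_cosine` helper are hypothetical; real embeddings (e.g. from `granite-embedding-278m-multilingual`) have hundreds of dimensions, and a vector store like Chroma typically uses an approximate nearest-neighbour index rather than this brute-force scan.

```python
import numpy as np


def top_k_by_cosine(query_vec: np.ndarray, doc_vecs: np.ndarray, k: int = 3) -> list[int]:
    """Return the indices of the k document vectors most similar to the query."""
    # Normalise both sides so that a dot product equals cosine similarity
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = d @ q
    # argsort is ascending, so reverse it and keep the first k indices
    return np.argsort(scores)[::-1][:k].tolist()


# Hypothetical toy embeddings: one row per chunk
doc_vecs = np.array([
    [0.9, 0.1, 0.0],  # chunk about the roadmap
    [0.1, 0.9, 0.0],  # chunk about company values
    [0.0, 0.2, 0.9],  # chunk about pricing
])
query_vec = np.array([0.8, 0.2, 0.1])  # embedding of "What's the roadmap ahead?"

print(top_k_by_cosine(query_vec, doc_vecs, k=2))  # → [0, 1]
```

The retrieved indices map back to chunk texts, which are then joined into the `context` of the prompt, just as the `generate` step above does with `state["context"]`.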

- We may tune a RAG model by adjusting the way we preprocess documents. How we get access to the data plays an important role, as does how the data looks (e.g., if we scrape websites, can we be sure that they are all structured in the same way?). This matters because we want to split the documents into meaningful passages: small enough to hold just the information needed to answer a question, but not so small that a passage loses its surrounding context. We also don't want the chunks to be too large, as irrelevant text in the context makes hallucinations by the LLM more likely. Finding a good way to split the documents can be very hard, but it is also a good way to improve a RAG model.

- The embedding and retrieval steps of the RAG model also play a role, but they are already highly optimised by open-source projects such as Elasticsearch. How many chunks we give the LLM can also be tuned, either as the top k results or as a percentage p of the results.

- The way we prompt the model will also play a large role in optimisation.
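The hyperparameters above can be sketched in a few lines. This is a minimal illustration, not the notebook's method: it assumes plain fixed-size character splitting with overlap (the notebook itself splits on Markdown headers), and it reads the "p" option as a minimum relevance score, which is an assumption. `chunk_text` and `select_context` are hypothetical helpers.

```python
from typing import Optional


def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap by `overlap` characters."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


def select_context(scored: list[tuple[str, float]], k: Optional[int] = 3,
                   threshold: Optional[float] = None) -> list[str]:
    """Pick chunks as the top k by score, as all chunks above a score threshold, or both."""
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    if threshold is not None:
        ranked = [pair for pair in ranked if pair[1] >= threshold]
    if k is not None:
        ranked = ranked[:k]
    return [chunk for chunk, _ in ranked]


print(chunk_text("abcdefghij", chunk_size=4, overlap=2))  # → ['abcd', 'cdef', 'efgh', 'ghij']

scored = [("roadmap", 0.91), ("values", 0.45), ("pricing", 0.12)]
print(select_context(scored, k=2))                    # → ['roadmap', 'values']
print(select_context(scored, k=None, threshold=0.5))  # → ['roadmap']
```

A larger `overlap` reduces the risk of cutting an answer in half at a chunk boundary, at the cost of storing and retrieving more redundant text.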

Using LLMs to evaluate is possibly the only practical way to evaluate such autonomous systems effectively, as judging a response requires understanding the output and telling whether it is actually useful. This works somewhat like split-brain theory, where one hemisphere generates a response while the other interprets and rationalizes it.
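A minimal LLM-as-a-judge sketch might look like the following. The prompt wording, the 1 to 5 scale, and the names `judge_answer` and `fake_llm` are hypothetical; `call_llm` is any function that sends a prompt to a model and returns its text (in this notebook it could wrap `llm.invoke`). Real models do not always return clean JSON, which is one reason libraries for structured output, such as the `instructor` package imported above, exist.

```python
import json
from typing import Callable

# Hypothetical judge prompt; doubled braces are literal braces after .format()
JUDGE_TEMPLATE = """You are grading the answer of a question-answering system.
Question: {question}
Retrieved context: {context}
Answer: {answer}

Rate how well the answer is supported by the context on a scale from 1 to 5.
Respond only with JSON of the form {{"score": <int>, "reason": "<short explanation>"}}."""


def judge_answer(question: str, context: str, answer: str,
                 call_llm: Callable[[str], str]) -> dict:
    """Ask a judge LLM to grade an answer and parse its JSON verdict."""
    prompt = JUDGE_TEMPLATE.format(question=question, context=context, answer=answer)
    verdict = json.loads(call_llm(prompt))
    score = int(verdict["score"])
    if not 1 <= score <= 5:
        raise ValueError(f"judge returned an out-of-range score: {score}")
    return {"score": score, "reason": str(verdict.get("reason", ""))}


def fake_llm(prompt: str) -> str:
    # Stand-in for a real model call so the sketch runs offline
    return '{"score": 4, "reason": "Mentions all three roadmap items."}'


print(judge_answer(
    "What's the roadmap ahead?",
    "AI-Driven Data Insights; Collaboration Tools for CloudMate; Zero-Knowledge Encryption",
    "AI-driven insights, collaboration tools, and zero-knowledge encryption.",
    fake_llm,
))  # → {'score': 4, 'reason': 'Mentions all three roadmap items.'}
```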

---

**Agents** are useful for creating larger autonomous systems where we let the LLM make decisions for us. The router agent is useful for creating a complete task flow, often tailored to a specific need, whereas the tool agent decides more autonomously what to do next. In a router agent the steps are connected by edges defined by the programmer, while a tool agent just has a set of tools and the LLM decides which ones to use.
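The difference between the two styles can be sketched in plain Python. This is a framework-free illustration, not LangGraph's API (there, the router would correspond to conditional edges between graph nodes). `classify` and `choose_tool` stand in for LLM calls, and every route, tool, and function name here is hypothetical.

```python
from typing import Callable


# Router agent: the programmer fixes the possible edges; the LLM only picks a label
def route(question: str, classify: Callable[[str], str]) -> str:
    routes: dict[str, Callable[[str], str]] = {
        "rag": lambda q: f"answer '{q}' from the vector database",
        "chitchat": lambda q: f"reply to '{q}' without retrieval",
    }
    label = classify(question)
    handler = routes.get(label, routes["chitchat"])  # unknown labels fall back to a fixed edge
    return handler(question)


# Tool agent: the LLM repeatedly chooses which tool to call until it decides to stop
def run_tool_agent(task: str,
                   choose_tool: Callable[[str, list[str]], str],
                   tools: dict[str, Callable[[str], str]],
                   max_steps: int = 5) -> list[str]:
    observations: list[str] = []
    for _ in range(max_steps):  # cap the loop so a confused model cannot run forever
        choice = choose_tool(task, observations)
        if choice == "finish" or choice not in tools:
            break
        observations.append(tools[choice](task))
    return observations


print(route("What's the roadmap ahead?", classify=lambda q: "rag"))

tools = {"search": lambda t: f"search results for {t}", "calculator": lambda t: "42"}
# Hypothetical policy: call "search" once, then finish
policy = lambda task, obs: "search" if not obs else "finish"
print(run_tool_agent("Find the release date", policy, tools))
# → ['search results for Find the release date']
```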

---

One project that came to mind while reading these guides was to build an agent that gathers information on a book based on its [GoodReads](https://www.goodreads.com/) reviews. Every book page has the same layout, so it would be fairly easy to scrape.

**Steps**
1. Get the request from the user
2. Find the title, author, and year of the book
3. Check if the book exists in the vector DB
   - (a: book exists) get the top k results from the vector DB and answer the query (END)
   - (b: book doesn't exist) search GoodReads for the book and scrape the top result
4. Split the content by review, and possibly sub-split reviews that are too long
5. Embed the results and save them to the vector DB
6. Get the top k results from the vector DB and answer the query (END)

Possibly with more steps, but that's the idea.
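The control flow of these steps could be sketched as below. Everything here is hypothetical: a plain dict keyed by book stands in for the vector DB, embedding is folded into the `retrieve` callable, and `identify_book`, `scrape_reviews`, and `answer` stand in for an LLM call, a GoodReads scraper, and the generation step. Only the branching (reuse stored chunks if the book is known, otherwise scrape, split, and store first) follows the list above.

```python
from typing import Callable


def split_reviews(reviews: list[str], max_chars: int = 500) -> list[str]:
    """One chunk per review, sub-splitting reviews longer than max_chars."""
    chunks = []
    for review in reviews:
        for start in range(0, len(review), max_chars):
            chunks.append(review[start:start + max_chars])
    return chunks


def answer_book_query(query: str,
                      store: dict[str, list[str]],
                      identify_book: Callable[[str], str],
                      scrape_reviews: Callable[[str], list[str]],
                      retrieve: Callable[[list[str], str, int], list[str]],
                      answer: Callable[[str, list[str]], str],
                      k: int = 3) -> str:
    book_key = identify_book(query)               # e.g. "title|author|year"
    if book_key not in store:                     # book not seen before
        reviews = scrape_reviews(book_key)        # scrape the top GoodReads result
        store[book_key] = split_reviews(reviews)  # split and save
    top_chunks = retrieve(store[book_key], query, k)
    return answer(query, top_chunks)


store: dict[str, list[str]] = {}
result = answer_book_query(
    "Is this book worth reading?", store,
    identify_book=lambda q: "Some Book|Some Author|2020",
    scrape_reviews=lambda key: ["Loved it.", "Too long " * 100],
    retrieve=lambda chunks, q, k: chunks[:k],  # placeholder for a similarity search
    answer=lambda q, ctx: f"answered using {len(ctx)} chunks",
)
print(result)  # → answered using 3 chunks
```

On a second query about the same book, the scraper is skipped because the chunks are already in `store`.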



