In [2]:
model = "llama3.2:1B"

#### Task 1: Simple Chain with Retrieval

**Objective:**

Implement a simple RAG chain with ChatOllama, OllamaEmbeddings and Chroma.

Process:

1. Retrieve documents from the Chroma DB based on a query
2. Invoke the chain with the retrieved documents as input
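The two steps can be sketched in plain Python without Ollama or Chroma. The word-overlap `retrieve` function, the `fake_llm` stub and the sample texts below are hypothetical stand-ins that only illustrate the data flow: retrieval runs first, and its output becomes the chain's input.

```python
def retrieve(query, corpus, k=2):
    """Hypothetical stand-in for a vector search: rank texts by shared words."""
    q_words = set(query.lower().split())
    ranked = sorted(
        corpus,
        key=lambda text: len(q_words & set(text.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def fake_llm(prompt):
    """Stub model: reports the prompt length instead of generating text."""
    return f"summary of {len(prompt)} characters of context"


def chain(docs):
    # Format the retrieved docs into a prompt and call the (stub) model
    prompt = "Summarize the main themes in these retrieved docs: " + "\n\n".join(docs)
    return fake_llm(prompt)


corpus = [
    "supervised learning uses labeled data",
    "unsupervised learning finds structure without labels",
    "batch learning trains on all data at once",
]

# Step 1: retrieve documents for the query
docs = retrieve("labeled supervised learning", corpus)
# Step 2: pass the retrieved documents to the chain
print(chain(docs))
```

Keeping retrieval outside the chain, as Task 1 does, makes it easy to inspect the retrieved documents before the model sees them.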

**Task Description:**

- load the LLM via Ollama
- load the embedding model via Ollama (run `ollama pull bge-m3` first if you have not done so yet)
- create a Chroma DB client
- create a prompt template for summarization
- create a simple chain with the following steps: retrieved documents, prompt, model, output parser
- define a query and run a similarity search with it
- invoke the chain and pass the retrieved documents to it

**Useful links:**

- [RAG with Ollama](https://python.langchain.com/v0.2/docs/tutorials/local_rag/)


In [3]:
from langchain_ollama import ChatOllama

# Note: this rebinds `model` from the model name string to the chat model object
model = ChatOllama(model=model)

In [4]:
from langchain_ollama import OllamaEmbeddings

embedding_model = OllamaEmbeddings(
    model="bge-m3",
)
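`OllamaEmbeddings` follows LangChain's embeddings interface, which exposes `embed_documents` (a list of texts to a list of vectors) and `embed_query` (one text to one vector). The `ToyEmbeddings` class below is a hypothetical, dependency-free stand-in with the same two method names; it builds word-count vectors over a fixed vocabulary instead of calling `bge-m3`, which is enough to show the shape of the data the vector store receives.

```python
class ToyEmbeddings:
    """Hypothetical stand-in for an embedding model (not bge-m3)."""

    def __init__(self, vocabulary):
        self.vocabulary = list(vocabulary)

    def embed_query(self, text):
        # One vector per text: how often each vocabulary word occurs
        words = text.lower().split()
        return [float(words.count(term)) for term in self.vocabulary]

    def embed_documents(self, texts):
        # The same transformation applied to a batch of texts
        return [self.embed_query(t) for t in texts]


emb = ToyEmbeddings(["machine", "learning", "supervised"])
print(emb.embed_query("supervised machine learning learning"))  # [1.0, 2.0, 1.0]
print(emb.embed_documents(["machine", "supervised learning"]))  # [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
```

A real embedding model produces dense vectors where semantically similar texts end up close together even without shared words; the count vectors here only capture exact word overlap.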

In [5]:
from langchain_chroma import Chroma
import chromadb
from chromadb.config import DEFAULT_TENANT, DEFAULT_DATABASE, Settings

client = chromadb.HttpClient(
    host="localhost",
    port=8000,
    ssl=False,
    headers=None,
    settings=Settings(allow_reset=True, anonymized_telemetry=False),
    tenant=DEFAULT_TENANT,
    database=DEFAULT_DATABASE,
)

# Make sure the collection exists before wrapping it with LangChain's Chroma class
collection = client.get_or_create_collection("ai_model_book")

vector_db_from_client = Chroma(
    client=client,
    collection_name="ai_model_book",
    embedding_function=embedding_model,
)

In [6]:
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_template(
    "Summarize the main themes in these retrieved docs: {docs}"
)


# Convert loaded documents into strings by concatenating their content
# and ignoring metadata
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)


chain = {"docs": format_docs} | prompt | model | StrOutputParser()
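LangChain's `|` operator composes runnables so that each step's output becomes the next step's input, and a dict on the left is treated as a parallel step whose keys become the prompt variables. The `Step` class below is a hypothetical, heavily simplified imitation of that behaviour (it is not LangChain's `Runnable` implementation), with stubs in place of `ChatOllama` and `StrOutputParser`, to show what `{"docs": format_docs} | prompt | model | StrOutputParser()` does on `invoke`.

```python
class Step:
    """Hypothetical, minimal imitation of a LangChain runnable."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def __or__(self, other):
        other = coerce(other)
        # Run self first, then feed its output into the next step
        return Step(lambda value: other.invoke(self.invoke(value)))

    def __ror__(self, other):
        # Handles `dict | Step`, where the dict is the left operand
        return coerce(other) | self


def coerce(obj):
    if isinstance(obj, Step):
        return obj
    if isinstance(obj, dict):
        # A dict becomes a parallel step: every branch gets the same input
        branches = {key: coerce(val) for key, val in obj.items()}
        return Step(lambda value: {k: b.invoke(value) for k, b in branches.items()})
    if callable(obj):
        return Step(obj)
    raise TypeError(f"cannot coerce {obj!r}")


def format_docs(docs):
    return "\n\n".join(docs)


prompt = Step(lambda variables: f"Summarize the main themes in these retrieved docs: {variables['docs']}")
fake_model = Step(lambda text: {"content": f"[{len(text)} chars summarized]"})  # stub LLM
parser = Step(lambda message: message["content"])  # like StrOutputParser: keep only the text

chain = {"docs": format_docs} | prompt | fake_model | parser
print(chain.invoke(["doc one", "doc two"]))
```

Because `format_docs` receives whatever is passed to `invoke`, the real chain expects a list of `Document` objects, which is why it is called with the output of `similarity_search`.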

In [7]:
search_query = "Types of Machine Learning Systems"

docs = vector_db_from_client.similarity_search(search_query)

print(docs)

[Document(metadata={'page': 33, 'source': './AI_Book.pdf'}, page_content='Types of Machine Learning Systems\nThere are so many different types of Machine Learning systems that it is useful to\nclassify them in broad categories based on:\n•Whether or not they are trained with human supervision (supervised, unsuper‐\nvised, semisupervised, and Reinforcement Learning)\n•Whether or not they can learn incrementally on the fly (online versus batch\nlearning)\n•Whether they work by simply comparing new data points to known data points,\nor instead detect patterns in the training data and build a predictive model, much\nlike scientists do (instance-based versus model-based learning)\nThese criteria are not exclusive; you can combine them in any way you like. For\nexample, a state-of-the-art spam filter may learn on the fly using a deep neural net‐\nwork model trained using examples of spam and ham; this makes it an online, model-\nbased, supervised learning system.\nLet’s look at each of these
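`similarity_search` embeds the query and returns the stored chunks whose vectors are closest to it; LangChain's Chroma wrapper returns the top 4 by default (`k=4`). The sketch below shows the ranking idea with cosine similarity over hand-made 3-dimensional vectors. The vectors, the chunk texts and the choice of cosine similarity are assumptions for illustration; the distance metric Chroma actually uses depends on how the collection is configured.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similarity_search(query_vector, store, k=4):
    """Return the k stored texts whose vectors are most similar to the query."""
    ranked = sorted(
        store,
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


# (text, embedding) pairs; the vectors are made up for this example
store = [
    ("Types of Machine Learning Systems", [0.9, 0.1, 0.0]),
    ("Supervised learning", [0.7, 0.6, 0.1]),
    ("Installing Python", [0.0, 0.1, 0.9]),
]

query_vector = [1.0, 0.2, 0.0]  # pretend embedding of the search query
print(similarity_search(query_vector, store, k=2))
# ['Types of Machine Learning Systems', 'Supervised learning']
```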

In [8]:
chain.invoke(docs)

'The main themes in the retrieved documentation are:\n\n1. **Classification of Machine Learning systems**: The documentation discusses how to classify Machine Learning systems based on their characteristics, including whether they are trained with human supervision, can learn incrementally online or offline, and work by comparing new data points to known data points.\n\n2. **Types of Supervised Learning**: The documentation explains the four major categories of supervised learning: supervised, unsupervised, semisupervised, and Reinforcement Learning.\n\n3. **Instance-Based Versus Model-Based Learning**: It highlights two main approaches to generalization in Machine Learning: instance-based learning, which is trivial but sufficient for simple tasks, and model-based learning, which requires a good performance measure on the training data to generalize effectively.\n\n4. **Machine Learning System Design Criteria**: The documentation outlines four criteria used to classify Machine Learning

#### Task 2: Q&A with RAG

**Objective:**

Implement a Q/A retrieval chain with ChatOllama, OllamaEmbeddings and Chroma.

**Task Description:**

- create a RAG Q/A prompt template
- create a retriever from the vector DB client (instead of passing in documents manually, they are retrieved automatically from the vector store based on the user question)
- create a simple chain with the following steps: retriever, formatting of the retrieved docs, user question, prompt, model, output parser
- create a question for the Q/A retrieval chain
- invoke the chain with the question
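A retriever wraps the vector store's search behind a single call that takes only the question, which is what lets it sit at the start of a chain. The `make_retriever` closure below is a hypothetical illustration of that idea, with word overlap standing in for embedding similarity; in the real chain, `vector_db_from_client.as_retriever()` plays this role, and its search parameters (such as `k`) can be passed through `search_kwargs`.

```python
import re


def make_retriever(texts, k=4):
    """Hypothetical: bind a corpus and k, return a question -> docs function."""

    def words(text):
        # Lowercase word tokens, ignoring punctuation such as "?"
        return set(re.findall(r"\w+", text.lower()))

    def retrieve(question):
        # Word overlap stands in for vector similarity in this sketch
        ranked = sorted(texts, key=lambda t: len(words(question) & words(t)), reverse=True)
        return ranked[:k]

    return retrieve


retriever = make_retriever(
    [
        "supervised learning uses labeled training data",
        "reinforcement learning uses rewards",
        "online learning updates incrementally",
    ],
    k=1,
)
print(retriever("What is supervised learning?"))
# ['supervised learning uses labeled training data']
```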

**Useful links:**

- [RAG with Ollama](https://python.langchain.com/v0.2/docs/tutorials/local_rag/)


In [9]:
from langchain_core.runnables import RunnablePassthrough

prompt_template = """
You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.

<context>
{context}
</context>

Answer the following question:

{question}"""

rag_prompt = ChatPromptTemplate.from_template(prompt_template)

retriever = vector_db_from_client.as_retriever()

qa_rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | rag_prompt
    | model
    | StrOutputParser()
)
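The first two stages of `qa_rag_chain` send the same question down two branches (retrieval plus formatting for `context`, unchanged passthrough for `question`) and then fill the template's placeholders. The plain-Python sketch below mirrors that data flow with a hypothetical `fake_retriever` and a shortened template; `ChatPromptTemplate` uses `{...}` f-string-style variables by default, which `str.format` imitates here.

```python
TEMPLATE = """Use the following context to answer the question.

<context>
{context}
</context>

Answer the following question:

{question}"""


def fake_retriever(question):
    # Hypothetical stand-in for vector_db_from_client.as_retriever()
    return ["Supervised learning uses labeled data.", "Labels are called targets."]


def format_docs(docs):
    return "\n\n".join(docs)


def build_prompt(question):
    # Parallel step: both branches receive the same input question
    variables = {
        "context": format_docs(fake_retriever(question)),  # retriever | format_docs
        "question": question,  # RunnablePassthrough()
    }
    # Prompt step: fill the {context} and {question} placeholders
    return TEMPLATE.format(**variables)


print(build_prompt("What is supervised learning?"))
```

The filled prompt is what the model receives in the next stage, so the answer can only be as good as the chunks the retriever returned.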

In [10]:
qa_rag_chain

{
  context: VectorStoreRetriever(tags=['Chroma', 'OllamaEmbeddings'], vectorstore=<langchain_chroma.vectorstores.Chroma object at 0xffff7b89b510>)
           | RunnableLambda(format_docs),
  question: RunnablePassthrough()
}
| ChatPromptTemplate(input_variables=['context', 'question'], messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'question'], template="\nYou are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.\n\n<context>\n{context}\n</context>\n\nAnswer the following question:\n\n{question}"))])
| ChatOllama(model='llama3.2:1B', _client=<ollama._client.Client object at 0xffff8e7886d0>, _async_client=<ollama._client.AsyncClient object at 0xffffa85c8a90>)
| StrOutputParser()

In [11]:
question = "What is supervised learning?"

qa_rag_chain.invoke(question)

'Supervised learning refers to a type of machine learning where the training data includes labeled solutions or "targets" that indicate whether the algorithm should predict certain outcomes, such as spam classification. In this context, the system works perfectly when it receives the correct labels and can make accurate predictions without any human intervention.'