# <font color="#003660">Applied Machine Learning for Text Analysis (M.184.5331)</font>


# <font color="#003660">Session 8: Retrieval-Augmented Generation</font>

# <font color="#003660">RAG Basics</font>

<center><br><img width=256 src="https://raw.githubusercontent.com/olivermueller/aml4ta-2021/main/resources/dag.png"/><br></center>

<p>

<div>
    <font color="#085986"><b>By the end of this lesson, you ...</b><br><br>
        ... will know what Retrieval-Augmented Generation (RAG) is. <br>
        ... will know how to implement a RAG chain from scratch.
    </font>
</div>
</p>

The following content is heavily inspired by the following excellent sources:

* [HuggingFace (2024): NLP Course](https://huggingface.co/learn/nlp-course/)
* [HuggingFace (2024): Open-Source AI Cookbook](https://huggingface.co/learn/cookbook/index)
* [Nguyen (2024): Code a simple RAG from scratch](https://huggingface.co/blog/ngxson/make-your-own-rag)

In [None]:
!pip install -U pymupdf4llm datasets transformers faiss-gpu accelerate langchain langchain-community langchain-huggingface

Collecting pymupdf4llm
  Downloading pymupdf4llm-0.0.17-py3-none-any.whl.metadata (4.1 kB)
Collecting datasets
  Downloading datasets-3.2.0-py3-none-any.whl.metadata (20 kB)
Collecting faiss-gpu
  Downloading faiss_gpu-1.7.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (1.4 kB)
Collecting langchain
  Downloading langchain-0.3.14-py3-none-any.whl.metadata (7.1 kB)
Collecting langchain-community
  Downloading langchain_community-0.3.14-py3-none-any.whl.metadata (2.9 kB)
Collecting langchain-huggingface
  Downloading langchain_huggingface-0.1.2-py3-none-any.whl.metadata (1.3 kB)
Collecting pymupdf>=1.24.10 (from pymupdf4llm)
  Downloading pymupdf-1.25.1-cp39-abi3-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (3.4 kB)
Collecting dill<0.3.9,>=0.3.0 (from datasets)
  Downloading dill-0.3.8-py3-none-any.whl.metadata (10 kB)
Collecting xxhash (from datasets)
  Downloading xxhash-3.5.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (12 kB)

## Answering Questions using LLMs

In [None]:
import os
import re
from tqdm.notebook import tqdm
import pymupdf4llm

from transformers import AutoTokenizer, AutoModel, AutoModelForCausalLM, set_seed

DEVICE = "cuda"

In [None]:
# The tokenizer runs on the CPU, so it needs no dtype or device arguments.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-1.5B-Instruct")
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-1.5B-Instruct",
    torch_dtype="auto",
    device_map=DEVICE,
)

In [None]:
def generate_response(messages):
    text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True
    )
    model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

    generated_ids = model.generate(
        **model_inputs,
        max_new_tokens=512
    )
    generated_ids = [
        output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
    ]

    response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
    return response

In [None]:
set_seed(0)
prompt = "Who plays Daenerys Targaryen?"
messages = [
    {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
    {"role": "user", "content": prompt}
]
response = generate_response(messages)
print(response)

Daenerys Targaryen is played by the actress Naomi Watts in the HBO series "Game of Thrones."


WOW! How wrong this answer is! (The actress is Emilia Clarke.)

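To continue, we need to restart the runtime to free the GPU memory. The cell below kills the current kernel process, which forces the restart; all variables and imports are lost, so the next section imports and loads everything again.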

In [None]:
os.kill(os.getpid(), 9)

## Retrieval-Augmented Generation
![](https://github.com/olivermueller/amlta-2024/blob/main/Session_08/imgs/RAG.png?raw=true)

(Image adapted from [Kaltenpoth and Müller (2024)](https://energy.acm.org/eir/dont-touch-the-power-line-a-proof-of-concept-for-aligned-llm-based-assistance-systems-to-support-the-maintenance-in-the-electricity-distribution-system/))

Retrieval-augmented generation (RAG), introduced by [Lewis et al. (2020)](https://proceedings.neurips.cc/paper/2020/hash/6b493230205f780e1bc26945df7481e5-Abstract.html), incorporates external knowledge, here in the form of a vector database, into the language model's answers. A retriever (usually an encoder-only transformer language model) retrieves the k documents most similar to the query. These documents then serve as the context the LLM uses to answer the user's question.

Now let's check if it improves the answer.

# Implementing a basic RAG system from scratch

In [None]:
import os
import re
from tqdm.notebook import tqdm
import pymupdf4llm
import urllib.request  # importing urllib alone does not guarantee that the request submodule is loaded
from IPython.display import display, Markdown

from transformers import AutoTokenizer, AutoModel, AutoModelForCausalLM, set_seed

DEVICE = "cuda"

In [None]:
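# exist_ok=True lets this cell be re-run without raising FileExistsError
os.makedirs("documents", exist_ok=True)
os.makedirs("imgs", exist_ok=True)
os.makedirs("markdown_documents", exist_ok=True)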
urllib.request.urlretrieve("https://raw.githubusercontent.com/olivermueller/amlta-2024/refs/heads/main/Session_08/documents/Game_of_Thrones.pdf", "documents/Game_of_Thrones.pdf")
urllib.request.urlretrieve("https://raw.githubusercontent.com/olivermueller/amlta-2024/refs/heads/main/Session_08/documents/How_I_Met_Your_Mother.pdf", "documents/How_I_Met_Your_Mother.pdf")
urllib.request.urlretrieve("https://raw.githubusercontent.com/olivermueller/amlta-2024/refs/heads/main/Session_08/markdown_documents/Game_of_Thrones.md", "markdown_documents/Game_of_Thrones.md")
urllib.request.urlretrieve("https://raw.githubusercontent.com/olivermueller/amlta-2024/refs/heads/main/Session_08/markdown_documents/How_I_Met_Your_Mother.md", "markdown_documents/How_I_Met_Your_Mother.md")

('markdown_documents/How_I_Met_Your_Mother.md',
 <http.client.HTTPMessage at 0x7d48921c67d0>)

# Processing PDFs

We provide two PDF files: the Wikipedia article on the TV series Game of Thrones and the one on How I Met Your Mother. The second document shows that our retrieval actually works and does not find the right passages by luck.
As LLMs cannot read PDFs directly, we need to convert them into a readable format such as Markdown. For that we can use [PyMuPDF4LLM](https://pymupdf.readthedocs.io/en/latest/pymupdf4llm/), a library explicitly designed to convert PDFs to LLM-readable Markdown.

In [None]:
documents_path = "documents"
markdown_documents_path = "markdown_documents"

In [None]:
documents = os.listdir(documents_path)

for document in documents:
    print(document)
    document_path = os.path.join(documents_path, document)
    md_file = pymupdf4llm.to_markdown(
        document_path
    )
    md_file_path = os.path.join(markdown_documents_path, document.replace(".pdf", ".md"))
    with open(md_file_path, "w", encoding="utf-8") as file:
        file.write(md_file)

How_I_Met_Your_Mother.pdf
Processing documents/How_I_Met_Your_Mother.pdf...
Game_of_Thrones.pdf
Processing documents/Game_of_Thrones.pdf...

KeyboardInterrupt: 

## Designing a Vector Storage (Retrieval Database)

Now we need to design the vector storage based on our documents.

![](https://github.com/olivermueller/amlta-2024/blob/main/Session_08/imgs/vectordb.png?raw=true)

(Adapted from [Xie et al. (2023)](https://doi.org/10.1109/BigDIA60676.2023.10429609))

A vector database for retrieval is split into two phases: indexing and querying. Indexing is done once with an encoder, whereas querying is done for every request via similarity search, e.g., cosine similarity ([Xie et al., 2023](https://doi.org/10.1109/BigDIA60676.2023.10429609)).
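
The two phases can be sketched with a toy example. The bag-of-words `toy_embed` function and its three-word vocabulary are hypothetical stand-ins for a real encoder; they only illustrate that documents are embedded once at indexing time, while each query is embedded and compared at query time.

```python
import math

VOCAB = ["dragon", "throne", "mother"]  # hypothetical toy vocabulary


def toy_embed(text):
    """Hypothetical stand-in for an encoder: counts how often each vocabulary word occurs."""
    words = text.lower().split()
    return [float(words.count(term)) for term in VOCAB]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if one of the vectors is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Indexing: embed every document exactly once and store the vectors.
docs = ["the dragon guards the throne", "how ted met the mother"]
index = [(doc, toy_embed(doc)) for doc in docs]


# Querying: embed the query and rank the stored vectors by similarity.
def query_index(query, k=1):
    query_vector = toy_embed(query)
    ranked = sorted(index, key=lambda item: cosine(query_vector, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]


print(query_index("who sits on the throne"))
```

In the notebook, the role of `toy_embed` is played by the Jina embedding model, and the role of `index` by the `VECTOR_DB` list.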

### Loading Documents

In [None]:
# Sort the file names so the document order is the same on every run
markdown_documents = sorted(os.listdir(markdown_documents_path))

md_files = []

for markdown_document in markdown_documents:
    markdown_document_path = os.path.join(markdown_documents_path, markdown_document)
    with open(markdown_document_path, encoding="utf-8") as file:
        md_files.append([markdown_document, file.read()])

In [None]:
display(Markdown(md_files[0][1][:1000]))

# Game of Thrones

**_[Game of Thrones is an American fantasy drama](https://en.wikipedia.org/wiki/Fantasy_television)_**
[television series created by David Benioff and](https://en.wikipedia.org/wiki/David_Benioff)
[D. B. Weiss for HBO. It is an adaptation of A Song of](https://en.wikipedia.org/wiki/D._B._Weiss)
_[Ice and Fire, a series of fantasy novels by](https://en.wikipedia.org/wiki/A_Song_of_Ice_and_Fire)_
[George R. R. Martin, the first of which is](https://en.wikipedia.org/wiki/George_R._R._Martin) _[A Game of](https://en.wikipedia.org/wiki/A_Game_of_Thrones)_
_[Thrones. The show premiered on HBO in the United](https://en.wikipedia.org/wiki/A_Game_of_Thrones)_
States on April 17, 2011, and concluded on May 19,
2019, with 73 episodes broadcast over eight seasons.

[Set on the fictional continents of Westeros and Essos,](https://en.wikipedia.org/wiki/Westeros)
_[Game of Thrones has a large ensemble cast and follows](https://en.wikipedia.org/wiki/Ensemble_cast)_
[several story ar

In [None]:
print(md_files[0][1][:1000])

# Game of Thrones

**_[Game of Thrones is an American fantasy drama](https://en.wikipedia.org/wiki/Fantasy_television)_**
[television series created by David Benioff and](https://en.wikipedia.org/wiki/David_Benioff)
[D. B. Weiss for HBO. It is an adaptation of A Song of](https://en.wikipedia.org/wiki/D._B._Weiss)
_[Ice and Fire, a series of fantasy novels by](https://en.wikipedia.org/wiki/A_Song_of_Ice_and_Fire)_
[George R. R. Martin, the first of which is](https://en.wikipedia.org/wiki/George_R._R._Martin) _[A Game of](https://en.wikipedia.org/wiki/A_Game_of_Thrones)_
_[Thrones. The show premiered on HBO in the United](https://en.wikipedia.org/wiki/A_Game_of_Thrones)_
States on April 17, 2011, and concluded on May 19,
2019, with 73 episodes broadcast over eight seasons.

[Set on the fictional continents of Westeros and Essos,](https://en.wikipedia.org/wiki/Westeros)
_[Game of Thrones has a large ensemble cast and follows](https://en.wikipedia.org/wiki/Ensemble_cast)_
[several story ar

As the displayed Markdown above shows, the Wikipedia articles contain many cross-reference links that could distract the LLM. Therefore, we remove the Markdown links (with a little help from ChatGPT).

In [None]:
def remove_markdown_links(text):
    """
    Removes Markdown links from the given text while keeping the link text.

    Args:
        text (str): The input Markdown text.

    Returns:
        str: The text with Markdown links removed.

    Yeah this was ChatGPT ;)
    """
    # Regex to match Markdown links [text](link)
    pattern = r'\[([^\]]+)\]\([^\)]+\)'
    # Replace the matched pattern with just the text inside the brackets
    cleaned_text = re.sub(pattern, r'\1', text)
    return cleaned_text
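
To see what the regular expression does, here is a small self-contained sketch that applies the same pattern to two made-up sample strings (the sample sentences are assumptions, not taken from the documents). The second sample illustrates a limitation: because the URL part stops at the first closing parenthesis, links whose URL itself contains parentheses leave a stray `)` behind. This may explain leftovers such as the `)` after "Iron Throne of the" in the cleaned text.

```python
import re

# Same pattern as in remove_markdown_links: [text](link) -> text
LINK_PATTERN = r'\[([^\]]+)\]\([^\)]+\)'

sample = (
    "[Game of Thrones](https://en.wikipedia.org/wiki/Game_of_Thrones) is a "
    "[fantasy](https://en.wikipedia.org/wiki/Fantasy) series."
)
cleaned = re.sub(LINK_PATTERN, r'\1', sample)
print(cleaned)

# A URL that contains parentheses is only matched up to its first ")".
tricky = "[Blackwater](https://en.wikipedia.org/wiki/Blackwater_(Game_of_Thrones))"
tricky_cleaned = re.sub(LINK_PATTERN, r'\1', tricky)
print(tricky_cleaned)
```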

In [None]:
display(Markdown(remove_markdown_links(md_files[0][1])[:1000]))

# Game of Thrones

**_Game of Thrones is an American fantasy drama_**
television series created by David Benioff and
D. B. Weiss for HBO. It is an adaptation of A Song of
_Ice and Fire, a series of fantasy novels by_
George R. R. Martin, the first of which is _A Game of_
_Thrones. The show premiered on HBO in the United_
States on April 17, 2011, and concluded on May 19,
2019, with 73 episodes broadcast over eight seasons.

Set on the fictional continents of Westeros and Essos,
_Game of Thrones has a large ensemble cast and follows_
several story arcs throughout the course of the show.
The first major arc concerns the Iron Throne of the)
Seven Kingdoms of Westeros through a web of
political conflicts among the noble families either
vying to claim the throne or fighting for independence
from whoever sits on it. The second major arc focuses
on the last descendant of the realm's deposed ruling
dynasty, who has been exiled to Essos and is plotting
to return and reclaim the throne. The thir

In [None]:
print(remove_markdown_links(md_files[0][1])[:1000])

# Game of Thrones

**_Game of Thrones is an American fantasy drama_**
television series created by David Benioff and
D. B. Weiss for HBO. It is an adaptation of A Song of
_Ice and Fire, a series of fantasy novels by_
George R. R. Martin, the first of which is _A Game of_
_Thrones. The show premiered on HBO in the United_
States on April 17, 2011, and concluded on May 19,
2019, with 73 episodes broadcast over eight seasons.

Set on the fictional continents of Westeros and Essos,
_Game of Thrones has a large ensemble cast and follows_
several story arcs throughout the course of the show.
The first major arc concerns the Iron Throne of the)
Seven Kingdoms of Westeros through a web of
political conflicts among the noble families either
vying to claim the throne or fighting for independence
from whoever sits on it. The second major arc focuses
on the last descendant of the realm's deposed ruling
dynasty, who has been exiled to Essos and is plotting
to return and reclaim the throne. The thir

Now the text looks more readable.

In [None]:
# Sort the file names so the document order is the same on every run
markdown_documents = sorted(os.listdir(markdown_documents_path))

md_files = []

for markdown_document in markdown_documents:
    markdown_document_path = os.path.join(markdown_documents_path, markdown_document)
    with open(markdown_document_path, encoding="utf-8") as file:
        md_files.append([markdown_document, remove_markdown_links(file.read())])

### Chunking Texts

Neither encoder nor decoder models can usually fit complete documents into their context windows, so the documents are split into chunks.

We will chunk by token count. It is also common to let consecutive chunks overlap so that information spanning a chunk boundary remains semantically connected ([Wang et al., 2024](https://doi.org/10.18653/v1/2024.emnlp-main.981)).

In [None]:
embedding_tokenizer = AutoTokenizer.from_pretrained("jinaai/jina-embeddings-v2-base-en", model_max_length=999999999)
print("File name:", md_files[0][0], "Tokens:", len(embedding_tokenizer(md_files[0][1], truncation=False)["input_ids"]))
print("File name:", md_files[1][0], "Tokens:", len(embedding_tokenizer(md_files[1][1], truncation=False)["input_ids"]))

File name: Game_of_Thrones.md Tokens: 98908
File name: How_I_Met_Your_Mother.md Tokens: 40161


In [None]:
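OVERLAP = 32        # tokens shared by two consecutive chunks
CHUNK_LENGTH = 512  # tokens per chunk

md_files_chunked = []
for md_file in md_files:
    md_file_tokenized = embedding_tokenizer(md_file[1], truncation=False)["input_ids"]
    # i marks the end of each window; windows advance by CHUNK_LENGTH - OVERLAP tokens.
    # Note: tokens after the last full window are dropped by this loop.
    for i in range(CHUNK_LENGTH, len(md_file_tokenized), CHUNK_LENGTH-OVERLAP):
        chunk_ids = md_file_tokenized[i-CHUNK_LENGTH: i]
        md_files_chunked.append([md_file[0], embedding_tokenizer.decode(chunk_ids, skip_special_tokens=True)])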
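
The sliding-window idea is easier to follow on a tiny list of integers than on real token ids. The following is a minimal sketch, not the notebook's exact loop: the helper `chunk_token_ids` is a hypothetical name, and unlike the loop above it also keeps the shorter final window so that no tokens at the end are lost.

```python
def chunk_token_ids(token_ids, chunk_length, overlap):
    """Split a token list into overlapping windows, keeping the shorter final window."""
    if not 0 <= overlap < chunk_length:
        raise ValueError("overlap must be non-negative and smaller than chunk_length")
    stride = chunk_length - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(token_ids), stride):
        chunks.append(token_ids[start:start + chunk_length])
        if start + chunk_length >= len(token_ids):
            break  # this window already reaches the end of the list
    return chunks


toy_ids = list(range(10))
toy_chunks = chunk_token_ids(toy_ids, chunk_length=4, overlap=1)
print(toy_chunks)

# With one more token, a short final window holds the remainder.
longer_chunks = chunk_token_ids(list(range(11)), chunk_length=4, overlap=1)
print(longer_chunks[-1])
```

Each window starts with the last `overlap` tokens of the previous one, which is what keeps neighbouring chunks connected.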

In [None]:
print(md_files_chunked[0])
print(md_files_chunked[1])

['Game_of_Thrones.md', "# game of thrones * * _ game of thrones is an american fantasy drama _ * * television series created by david benioff and d. b. weiss for hbo. it is an adaptation of a song of _ ice and fire, a series of fantasy novels by _ george r. r. martin, the first of which is _ a game of _ _ thrones. the show premiered on hbo in the united _ states on april 17, 2011, and concluded on may 19, 2019, with 73 episodes broadcast over eight seasons. set on the fictional continents of westeros and essos, _ game of thrones has a large ensemble cast and follows _ several story arcs throughout the course of the show. the first major arc concerns the iron throne of the ) seven kingdoms of westeros through a web of political conflicts among the noble families either vying to claim the throne or fighting for independence from whoever sits on it. the second major arc focuses on the last descendant of the realm's deposed ruling dynasty, who has been exiled to essos and is plotting to re

### Loading Embedding Models

In [None]:
embedding_tokenizer = AutoTokenizer.from_pretrained("jinaai/jina-embeddings-v2-base-en")
embedding_model = AutoModel.from_pretrained('jinaai/jina-embeddings-v2-base-en', trust_remote_code=True)
embedding_model.to(DEVICE)

JinaBertModel(
  (embeddings): JinaBertEmbeddings(
    (word_embeddings): Embedding(30528, 768, padding_idx=0)
    (token_type_embeddings): Embedding(2, 768)
    (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
    (dropout): Dropout(p=0.1, inplace=False)
  )
  (encoder): JinaBertEncoder(
    (layer): ModuleList(
      (0-11): 12 x JinaBertLayer(
        (attention): JinaBertAttention(
          (self): JinaBertSelfAttention(
            (query): Linear(in_features=768, out_features=768, bias=True)
            (key): Linear(in_features=768, out_features=768, bias=True)
            (value): Linear(in_features=768, out_features=768, bias=True)
            (dropout): Dropout(p=0.0, inplace=False)
          )
          (output): JinaBertSelfOutput(
            (dense): Linear(in_features=768, out_features=768, bias=True)
            (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
        )

In [None]:
# Embed only the chunk text (index 1), not the [file name, text] pair
embeddings = embedding_model.encode([md_files_chunked[0][1]])

In [None]:
print(list(embeddings[0][:10]) + ["..."], len(embeddings[0]))

[-0.8023666, -0.38641185, 0.3274006, 0.07864874, -0.34118506, 0.039823297, 0.22516933, -0.09864072, 0.63542545, 0.6378624, '...'] 768


### Creating the final vector storage (Retrieval Database)

We use the simplest (and slowest) approach and store the vectors in a plain Python list.

In [None]:
VECTOR_DB = []

for md_file in tqdm(md_files_chunked):
    VECTOR_DB.append({"embeddings": embedding_model.encode([md_file[1]])[0], "content": md_file[1], "metadata":{"source": md_file[0]}})

  0%|          | 0/288 [00:00<?, ?it/s]

In [None]:
def cosine_similarity(a, b):
  dot_product = sum([x * y for x, y in zip(a, b)])
  norm_a = sum([x ** 2 for x in a]) ** 0.5
  norm_b = sum([x ** 2 for x in b]) ** 0.5
  return dot_product / (norm_a * norm_b)
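
Cosine similarity is defined as $\cos(a, b) = \frac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert}$, so it depends only on the angle between two vectors and not on their length. The sketch below repeats the function so that it runs on its own and checks three hand-picked toy vectors (the vectors are assumptions chosen for illustration): parallel vectors score close to 1, orthogonal vectors score 0, and opposite vectors score close to -1.

```python
import math


def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their Euclidean norms."""
    dot_product = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot_product / (norm_a * norm_b)


# Same direction, different length: the scaling does not matter.
same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
# Perpendicular vectors share no direction at all.
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
# Vectors pointing in opposite directions.
opposite = cosine_similarity([1.0, 1.0], [-1.0, -1.0])

print(same_direction, orthogonal, opposite)
```

Note that the function divides by zero if one of the vectors contains only zeros; real embeddings are practically never all-zero, but a production implementation would guard against it.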

Your TODO:

In [None]:
def retrieve(query, top_n=3, embedding_model=embedding_model):
  query_embedding = embedding_model.encode([query])[0]
  # temporary list to store (chunk, similarity) pairs
  similarities = []
  # TODO: calculate cosine similarity between query and each chunk in the VECTOR_DB
  # Hint: each VECTOR_DB entry is a dictionary with keys "embeddings" (embeddings) and "content" (text chunk)


  for entry in VECTOR_DB:
    similarity = cosine_similarity(query_embedding, entry["embeddings"])
    similarities.append([entry["content"], similarity])


  # sort by similarity in descending order, because higher similarity means more relevant chunks
  similarities.sort(key=lambda x: x[1], reverse=True)
  # finally, return the top N most relevant chunks
  return similarities[:top_n]
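
The ranking step can be tried out without loading an embedding model. The sketch below uses a hypothetical `TOY_DB` with hand-made two-dimensional vectors and a hypothetical helper `retrieve_top_n`. As a design variation, it uses `heapq.nlargest` from the standard library instead of sorting the full list, which avoids a complete sort when only a few top entries are needed.

```python
import heapq
import math


def cosine_similarity(a, b):
    dot_product = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot_product / (norm_a * norm_b)


# Hypothetical toy database with the same keys as the VECTOR_DB entries
TOY_DB = [
    {"embeddings": [1.0, 0.0], "content": "chunk about dragons"},
    {"embeddings": [0.0, 1.0], "content": "chunk about sitcoms"},
    {"embeddings": [0.8, 0.6], "content": "chunk about fantasy casts"},
]


def retrieve_top_n(query_embedding, db, top_n=2):
    """Return the top_n (content, similarity) pairs, most similar first."""
    scored = (
        (entry["content"], cosine_similarity(query_embedding, entry["embeddings"]))
        for entry in db
    )
    return heapq.nlargest(top_n, scored, key=lambda pair: pair[1])


top = retrieve_top_n([1.0, 0.1], TOY_DB)
for content, similarity in top:
    print(f"{similarity:.2f}  {content}")
```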

In [None]:
retrieved_docs = retrieve("Who plays Daenerys Targaryen?")

for doc in retrieved_docs:
    print(doc)

['##ds blacksmith\'s apprentice gendry ( joe dempsie ) and assassin jaqen h\'ghar ( tom wlaschiha ). in the stormlands, the tall warrior brienne of tarth ( gwendoline christie ) is introduced to catelyn. in king\'s landing, ned\'s best friend, king robert i baratheon ( mark addy ), shares a loveless political marriage with cersei lannister ( lena headey ). her younger twin brother, ser jaime ( nikolaj coster - waldau ), serves on the kingsguard while their younger brother tyrion ( peter dinklage ) is attended by his mistress shae ( sibel kekilli ) and mercenary bronn ( jerome flynn ). cersei\'s father is tywin ( charles ) dance ), head of house lannister and the richest man in westeros. cersei has two sons : joffrey ( jack gleeson ) and tommen ( dean - charles chapman ). joffrey is guarded by the scar - faced warrior sandor [ " the hound " clegane ( rory mccann ). [ [ 14 ] ] ] ( https : / / en. wikipedia. org / wiki / sandor _ clegane ) the king\'s small council includes his treasurer,

## Determining the Generator

In [None]:
# The tokenizer runs on the CPU, so it needs no dtype or device arguments.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-1.5B-Instruct")
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-1.5B-Instruct",
    torch_dtype="auto",
    device_map=DEVICE,
)

## Defining a Generation Prompt

In [None]:
prompt = '''Use only the following context chunks to answer the question: {input_query}
Don't make up any new information. Here are the chunks:
{chunks}
Now answer the question: {input_query}'''

# Generating based on Documents

So now let's bring everything together. We can reuse the ``generate_response`` function for querying an LLM.

In [None]:
def generate_response(messages):
    text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True
    )
    model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

    generated_ids = model.generate(
        **model_inputs,
        max_new_tokens=512
    )
    generated_ids = [
        output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
    ]

    response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
    return response

In [None]:
input_query = "Who plays Daenerys Targaryen?"
retrieved_knowledge = retrieve(input_query)

print('Retrieved knowledge:')
for chunk, similarity in retrieved_knowledge:
    print(f' - (similarity: {similarity:.2f}) {chunk}')

Retrieved knowledge:
 - (similarity: 0.80) ##ds blacksmith's apprentice gendry ( joe dempsie ) and assassin jaqen h'ghar ( tom wlaschiha ). in the stormlands, the tall warrior brienne of tarth ( gwendoline christie ) is introduced to catelyn. in king's landing, ned's best friend, king robert i baratheon ( mark addy ), shares a loveless political marriage with cersei lannister ( lena headey ). her younger twin brother, ser jaime ( nikolaj coster - waldau ), serves on the kingsguard while their younger brother tyrion ( peter dinklage ) is attended by his mistress shae ( sibel kekilli ) and mercenary bronn ( jerome flynn ). cersei's father is tywin ( charles ) dance ), head of house lannister and the richest man in westeros. cersei has two sons : joffrey ( jack gleeson ) and tommen ( dean - charles chapman ). joffrey is guarded by the scar - faced warrior sandor [ " the hound " clegane ( rory mccann ). [ [ 14 ] ] ] ( https : / / en. wikipedia. org / wiki / sandor _ clegane ) the king's sm

In [None]:
chunks = '\n'.join([f' - {chunk}' for chunk, similarity in retrieved_knowledge])
question = prompt.format(input_query=input_query, chunks=chunks)
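
The prompt assembly can be checked in isolation, without a retriever or an LLM. This is a minimal sketch: `PROMPT_TEMPLATE` mirrors the template defined above, while `build_prompt` and the placeholder chunks with their similarity scores are hypothetical and only serve to show how the pieces are joined.

```python
PROMPT_TEMPLATE = '''Use only the following context chunks to answer the question: {input_query}
Don't make up any new information. Here are the chunks:
{chunks}
Now answer the question: {input_query}'''

# Hypothetical stand-in for the (chunk, similarity) pairs returned by retrieve()
toy_retrieved = [
    ("first placeholder chunk", 0.80),
    ("second placeholder chunk", 0.75),
]


def build_prompt(input_query, retrieved):
    """Format the retrieved chunks as a bullet list and fill in the template."""
    chunks = '\n'.join(f' - {chunk}' for chunk, _similarity in retrieved)
    return PROMPT_TEMPLATE.format(input_query=input_query, chunks=chunks)


toy_prompt = build_prompt("Who plays Daenerys Targaryen?", toy_retrieved)
print(toy_prompt)
```

The question appears both before and after the chunks, so the model sees it again right before it starts generating.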

In [None]:
print(question)

Use only the following context chunks to answer the question: Who plays Daenerys Targaryen?
Don't make up any new information. Here are the chunks:
 - ##ds blacksmith's apprentice gendry ( joe dempsie ) and assassin jaqen h'ghar ( tom wlaschiha ). in the stormlands, the tall warrior brienne of tarth ( gwendoline christie ) is introduced to catelyn. in king's landing, ned's best friend, king robert i baratheon ( mark addy ), shares a loveless political marriage with cersei lannister ( lena headey ). her younger twin brother, ser jaime ( nikolaj coster - waldau ), serves on the kingsguard while their younger brother tyrion ( peter dinklage ) is attended by his mistress shae ( sibel kekilli ) and mercenary bronn ( jerome flynn ). cersei's father is tywin ( charles ) dance ), head of house lannister and the richest man in westeros. cersei has two sons : joffrey ( jack gleeson ) and tommen ( dean - charles chapman ). joffrey is guarded by the scar - faced warrior sandor [ " the hound " cleg

In [None]:
set_seed(0) # for reproducibility
messages = [
    {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
    {"role": "user", "content": question}
]
response = generate_response(messages)
print(response)

Emilia Clarke plays Daenerys Targaryen.


# Creating a RAG Chain

Your TODO:

In [None]:
def generate_rag_response(input_query):
    retrieved_knowledge = retrieve(input_query)

    print('Retrieved knowledge:')
    for chunk, similarity in retrieved_knowledge:
        print(f' - (similarity: {similarity:.2f}) {chunk}')
    chunks = '\n'.join([f' - {chunk}' for chunk, similarity in retrieved_knowledge])
    question = prompt.format(input_query=input_query, chunks=chunks)
    set_seed(0) # for reproducibility
    messages = [
        {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
        {"role": "user", "content": question}
    ]
    response = generate_response(messages)


    return response

# Finally a RAG Chain

In [None]:
set_seed(0)
input_query = input('Ask me a question: ')
print(generate_rag_response(input_query))

Ask me a question: Who plays Jon Snow?
Retrieved knowledge:
 - (similarity: 0.77) until november 2008. [ [ 63 ] ] the pilot episode, " winter is ] ( https : / / en. wikipedia. org / wiki / winter _ is _ coming ) coming ", was shot in 2009 ; after its poor reception following a private viewing, hbo demanded an extensive re - shoot ( about 90 percent of the episode, with cast and directorial changes ). [ [ 56 ] [ 64 ] ] the pilot reportedly cost hbo $ 5 – 10 million to produce, [ [ 65 ] ] while the first season's budget was estimated at $ 50 – 60 million. [ [ 66 ] ] for the second season, the series received a 15 - percent budget increase for the climactic [ battle in " blackwater " ( which had an $ 8 million budget ). [ [ 67 ] [ 68 ] ] between 2012 and 2015, the average ] ( https : / / en. wikipedia. org / wiki / blackwater _ ( game _ of _ thrones ) ) budget per episode increased from $ 6 million [ [ 69 ] ] to " at least " $ 8 million. [ [ 70 ] ] the sixth - season budget was over $ 10 

<a href="https://imgflip.com/i/9fsgxc"><img src="https://i.imgflip.com/9fsgxc.jpg" title="made at imgflip.com"/></a><div><a href="https://imgflip.com/memegenerator">from Imgflip Meme Generator</a></div>