<a href="https://colab.research.google.com/github/sathishmtech01/llm/blob/main/Q_A_chatbot_with_LLMs.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Overview

- Use [Langchain](https://python.langchain.com/en/latest/index.html) to **<font color='orange'>build a chatbot that can answer questions about</font>** [Harry Potter books](https://www.kaggle.com/datasets/hinepo/harry-potter-books-in-pdf-1-7)
- **<font color='orange'>Flexible and customizable RAG pipeline (Retrieval Augmented Generation)</font>**
- Experiment with various LLMs (Large Language Models)
- Use [FAISS vector store](https://python.langchain.com/docs/integrations/vectorstores/faiss) to store text embeddings created with [Sentence Transformers](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) from 🤗. FAISS runs on GPU and it is much faster than Chroma
- Use [Retrieval chain](https://python.langchain.com/docs/modules/data_connection/retrievers/) to retrieve relevant passages from embedded text
- Summarize retrieved passages
- Leverage Kaggle dual GPU (2 * T4) with [Hugging Face Accelerate](https://huggingface.co/docs/accelerate/index)
- Chat UI with [Gradio](https://www.gradio.app/guides/quickstart)

**<font color='green'>No need to create any API key to use this notebook! Everything is open source.</font>**

**<font color='orange'>Don't forget to upvote the notebook if you learn from it or use it!</font>**

### Models

- [TheBloke/wizardLM-7B-HF](https://huggingface.co/TheBloke/wizardLM-7B-HF)
- [daryl149/llama-2-7b-chat-hf](https://huggingface.co/daryl149/llama-2-7b-chat-hf)
- [daryl149/llama-2-13b-chat-hf](https://huggingface.co/daryl149/llama-2-13b-chat-hf)
- [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2)

![image.png](attachment:ceef601b-8cca-48a5-a433-54c0070f1f44.png)

img source: HinePo

In [3]:
! nvidia-smi -L

GPU 0: Tesla T4 (UUID: GPU-11cbdef1-f494-f6ca-e119-a471e7756d2f)


# Installs

In [4]:
%%time

from IPython.display import clear_output

! pip install sentence_transformers==2.2.2

! pip install -qq -U langchain
! pip install -qq -U tiktoken
! pip install -qq -U pypdf
! pip install -qq -U faiss-gpu
! pip install -qq -U InstructorEmbedding

! pip install -qq -U transformers
! pip install -qq -U accelerate
! pip install -qq -U bitsandbytes

clear_output()

CPU times: user 1.11 s, sys: 184 ms, total: 1.29 s
Wall time: 2min 40s


In [5]:
!pip install accelerate
!pip install -i https://pypi.org/simple/ bitsandbytes

Looking in indexes: https://pypi.org/simple/


# Imports

In [6]:
%%time

import warnings
warnings.filterwarnings("ignore")

import os
import glob
import textwrap
import time

import langchain

### loaders
from langchain.document_loaders import PyPDFLoader, DirectoryLoader

### splits
from langchain.text_splitter import RecursiveCharacterTextSplitter

### prompts
from langchain import PromptTemplate, LLMChain

### vector stores
from langchain.vectorstores import FAISS

### models
from langchain.llms import HuggingFacePipeline
from langchain.embeddings import HuggingFaceInstructEmbeddings

### retrievers
from langchain.chains import RetrievalQA

import torch
import transformers
from transformers import (
    AutoTokenizer, AutoModelForCausalLM,
    BitsAndBytesConfig,
    pipeline
)

clear_output()

CPU times: user 9.72 s, sys: 1.41 s, total: 11.1 s
Wall time: 20 s


In [7]:
print('langchain:', langchain.__version__)
print('torch:', torch.__version__)
print('transformers:', transformers.__version__)

langchain: 0.1.16
torch: 2.2.1+cu121
transformers: 4.40.0


In [8]:
# sorted(glob.glob('/kaggle/input/harry-potter-books-in-pdf-1-7/HP books/*'))
sorted(glob.glob('llm/sample.pdf'))

['llm/sample.pdf']

# CFG

- CFG class enables easy and organized experimentation

In [9]:
class CFG:
    # LLMs
    model_name = 'llama2-13b-chat' # wizardlm, llama2-7b-chat, llama2-13b-chat, mistral-7B
    temperature = 0
    top_p = 0.95
    repetition_penalty = 1.15

    # splitting
    split_chunk_size = 800
    split_overlap = 0

    # embeddings
    embeddings_model_repo = 'sentence-transformers/all-MiniLM-L6-v2'

    # similar passages
    k = 6

    # paths
    PDFs_path = 'llm/'
    Embeddings_path =  'llm/faiss-hp-sentence-transformers'
    Output_folder = 'llm/harry-potter-vectordb'

# Define model

In [10]:
def get_model(model = CFG.model_name):

    print('\nDownloading model: ', model, '\n\n')

    if model == 'wizardlm':
        model_repo = 'TheBloke/wizardLM-7B-HF'

        tokenizer = AutoTokenizer.from_pretrained(model_repo)

        bnb_config = BitsAndBytesConfig(
            load_in_4bit = True,
            bnb_4bit_quant_type = "nf4",
            bnb_4bit_compute_dtype = torch.float16,
            bnb_4bit_use_double_quant = True,
        )

        model = AutoModelForCausalLM.from_pretrained(
            model_repo,
            quantization_config = bnb_config,
            device_map = 'auto',
            low_cpu_mem_usage = True
        )

        max_len = 1024

    elif model == 'llama2-7b-chat':
        model_repo = 'daryl149/llama-2-7b-chat-hf'

        tokenizer = AutoTokenizer.from_pretrained(model_repo, use_fast=True)

        bnb_config = BitsAndBytesConfig(
            load_in_4bit = True,
            bnb_4bit_quant_type = "nf4",
            bnb_4bit_compute_dtype = torch.float16,
            bnb_4bit_use_double_quant = True,
        )

        model = AutoModelForCausalLM.from_pretrained(
            model_repo,
            quantization_config = bnb_config,
            device_map = 'auto',
            low_cpu_mem_usage = True,
            trust_remote_code = True
        )

        max_len = 2048

    elif model == 'llama2-13b-chat':
        model_repo = 'daryl149/llama-2-13b-chat-hf'

        tokenizer = AutoTokenizer.from_pretrained(model_repo, use_fast=True)

        bnb_config = BitsAndBytesConfig(
            load_in_4bit = True,
            bnb_4bit_quant_type = "nf4",
            bnb_4bit_compute_dtype = torch.float16,
            bnb_4bit_use_double_quant = True,
        )

        model = AutoModelForCausalLM.from_pretrained(
            model_repo,
            quantization_config = bnb_config,
            device_map = 'auto',
            low_cpu_mem_usage = True,
            trust_remote_code = True
        )

        max_len = 2048 # 8192

    elif model == 'mistral-7B':
        model_repo = 'mistralai/Mistral-7B-v0.1'

        tokenizer = AutoTokenizer.from_pretrained(model_repo)

        bnb_config = BitsAndBytesConfig(
            load_in_4bit = True,
            bnb_4bit_quant_type = "nf4",
            bnb_4bit_compute_dtype = torch.float16,
            bnb_4bit_use_double_quant = True,
        )

        model = AutoModelForCausalLM.from_pretrained(
            model_repo,
            quantization_config = bnb_config,
            device_map = 'auto',
            low_cpu_mem_usage = True,
        )

        max_len = 1024

    else:
        print("Not implemented model (tokenizer and backbone)")

    return tokenizer, model, max_len

In [11]:
%%time

tokenizer, model, max_len = get_model(model = CFG.model_name)

clear_output()

CPU times: user 46.2 s, sys: 1min, total: 1min 46s
Wall time: 5min 48s


In [12]:
model.eval()

LlamaForCausalLM(
  (model): LlamaModel(
    (embed_tokens): Embedding(32000, 5120, padding_idx=0)
    (layers): ModuleList(
      (0-39): 40 x LlamaDecoderLayer(
        (self_attn): LlamaSdpaAttention(
          (q_proj): Linear4bit(in_features=5120, out_features=5120, bias=False)
          (k_proj): Linear4bit(in_features=5120, out_features=5120, bias=False)
          (v_proj): Linear4bit(in_features=5120, out_features=5120, bias=False)
          (o_proj): Linear4bit(in_features=5120, out_features=5120, bias=False)
          (rotary_emb): LlamaRotaryEmbedding()
        )
        (mlp): LlamaMLP(
          (gate_proj): Linear4bit(in_features=5120, out_features=13824, bias=False)
          (up_proj): Linear4bit(in_features=5120, out_features=13824, bias=False)
          (down_proj): Linear4bit(in_features=13824, out_features=5120, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): LlamaRMSNorm()
        (post_attention_layernorm): LlamaRMSNorm()
      )
    )


In [13]:
### check how Accelerate split the model across the available devices (GPUs)
model.hf_device_map

{'': 0}

# 🤗 pipeline

- Hugging Face pipeline

In [14]:
### hugging face pipeline
pipe = pipeline(
    task = "text-generation",
    model = model,
    tokenizer = tokenizer,
    pad_token_id = tokenizer.eos_token_id,
#     do_sample = True,
    max_length = max_len,
    temperature = CFG.temperature,
    top_p = CFG.top_p,
    repetition_penalty = CFG.repetition_penalty
)

### langchain pipeline
llm = HuggingFacePipeline(pipeline = pipe)

In [15]:
llm

HuggingFacePipeline(pipeline=<transformers.pipelines.text_generation.TextGenerationPipeline object at 0x7baf1c345ea0>)

In [43]:
%%time
### testing model, not using the harry potter books yet
### answer is not necessarily related to harry potter
query = "what is bert fp"
llm.invoke(query)

CPU times: user 2min 3s, sys: 2.43 ms, total: 2min 3s
Wall time: 2min 8s


"what is bert fp16, and how does it differ from the original BERT model?\n\nBERT (Bidirectional Encoder Representations from Transformers) is a pre-trained language model developed by Google that has achieved state-of-the-art results on a wide range of natural language processing (NLP) tasks. In recent years, there have been several variants of BERT that have been proposed to improve its performance or adapt it to specific task domains. One such variant is BERT-FP16, which uses 16-bit floating point numbers (fp16) instead of the standard 32-bit floating point numbers (fp32) used in the original BERT model.\n\nIn this article, we will explore what is BERT FP16, and how it differs from the original BERT model. We will also discuss some of the key benefits and trade-offs of using BERT-FP16 for NLP tasks.\n\nWhat is BERT FP16?\n\nBERT FP16 is a variant of the popular BERT language model that uses 16-bit floating point numbers (fp16) instead of the standard 32-bit floating point numbers (fp

# 🦜🔗 Langchain

- Multiple document retriever with LangChain

In [17]:
CFG.model_name

'llama2-13b-chat'

# Loader

- [Directory loader](https://python.langchain.com/docs/modules/data_connection/document_loaders/file_directory) for multiple files
- This step is not necessary if you are just loading the vector database
- This step is necessary if you are creating embeddings. In this case you need to:
    - load de PDF files
    - split into chunks
    - create embeddings
    - save the embeddings in a vector store
    - After that you can just load the saved embeddings to do similarity search with the user query, and then use the LLM to answer the question
    
You can comment out this section if you use the embeddings I already created.

In [18]:
%%time

loader = DirectoryLoader(
    CFG.PDFs_path,
    glob="./*.pdf",
    loader_cls=PyPDFLoader,
    show_progress=True,
    use_multithreading=True
)

documents = loader.load()

100%|██████████| 1/1 [00:00<00:00,  1.29it/s]

CPU times: user 712 ms, sys: 0 ns, total: 712 ms
Wall time: 780 ms





In [19]:
print(f'We have {len(documents)} pages in total')

We have 15 pages in total


In [20]:
documents[8].page_content

'metrics for personalized text generation ,exploring\neffectively prompting LLMs for character aligning\nin a story. We hope HPD can play an crucial role\nin moving through cracking them.\nEthical Statement\nTo avoid the potential issue of using Harry Potter\nnovels, we promise the annotated dataset is devel-\noped for non-commercial use. Moreover, we only\nprovide the line number and page number of each\ncollected dialogue in Harry Potter rather than the\ndetailed content of each dialogue session. We fur-\nther supply the script to extract corresponding raw\ndialogue data from the novels according to the pro-\nvided line and page numbers, in which the data\nformat is the same as the data examples in Table\n11. As for the annotated character attributes and re-\nlations, we have our own copyright and will release\nfor research communities.\nLimitations\nThe main target of this paper is towards building\ndialogue agents for characters in a story. In this\npaper, we present a new benchmar

# Splitter

- Splitting the text into chunks so its passages are easily searchable for similarity
- This step is also only necessary if you are creating the embeddings
- [RecursiveCharacterTextSplitter](https://python.langchain.com/en/latest/reference/modules/document_loaders.html?highlight=RecursiveCharacterTextSplitter#langchain.document_loaders.MWDumpLoader)

In [21]:
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = CFG.split_chunk_size,
    chunk_overlap = CFG.split_overlap
)

texts = text_splitter.split_documents(documents)

print(f'We have created {len(texts)} chunks from {len(documents)} pages')

We have created 84 chunks from 15 pages


# Create Embeddings


- Embedd and store the texts in a Vector database (FAISS)
- [LangChain Vector Stores docs](https://python.langchain.com/docs/modules/data_connection/vectorstores/)
- [FAISS - langchain](https://python.langchain.com/docs/integrations/vectorstores/faiss)
- [Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks - paper Aug/2019](https://arxiv.org/pdf/1908.10084.pdf)
- [This is a nice 4 minutes video about vector stores](https://www.youtube.com/watch?v=dN0lsF2cvm4)

___

- If you use Chroma vector store it will take ~35 min to create embeddings
- If you use FAISS vector store on GPU it will take just ~3 min

___

We need to create the embeddings only once, and then we can just load the vector store and query the database using similarity search.

Loading the embeddings takes only a few seconds.

I uploaded the embeddings to a Kaggle Dataset so we just load it from [here](https://www.kaggle.com/datasets/hinepo/faiss-hp-sentence-transformers).

In [22]:
%%time

### we create the embeddings only if they do not exist yet
if not os.path.exists(CFG.Embeddings_path + '/index.faiss'):

    ### download embeddings model
    embeddings = HuggingFaceInstructEmbeddings(
        model_name = CFG.embeddings_model_repo,
        model_kwargs = {"device": "cuda"}
    )

    ### create embeddings and DB
    vectordb = FAISS.from_documents(
        documents = texts,
        embedding = embeddings
    )

    ### persist vector database
    vectordb.save_local(f"{CFG.Output_folder}/faiss_index_hp") # save in output folder
#     vectordb.save_local(f"{CFG.Embeddings_path}/faiss_index_hp") # save in input folder

.gitattributes:   0%|          | 0.00/1.23k [00:00<?, ?B/s]

1_Pooling/config.json:   0%|          | 0.00/190 [00:00<?, ?B/s]

README.md:   0%|          | 0.00/10.7k [00:00<?, ?B/s]

config.json:   0%|          | 0.00/612 [00:00<?, ?B/s]

config_sentence_transformers.json:   0%|          | 0.00/116 [00:00<?, ?B/s]

data_config.json:   0%|          | 0.00/39.3k [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/90.9M [00:00<?, ?B/s]

pytorch_model.bin:   0%|          | 0.00/90.9M [00:00<?, ?B/s]

sentence_bert_config.json:   0%|          | 0.00/53.0 [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/112 [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/466k [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/350 [00:00<?, ?B/s]

train_script.py:   0%|          | 0.00/13.2k [00:00<?, ?B/s]

vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]

modules.json:   0%|          | 0.00/349 [00:00<?, ?B/s]

load INSTRUCTOR_Transformer
max_seq_length  512
CPU times: user 1.83 s, sys: 323 ms, total: 2.15 s
Wall time: 5.42 s


If creating embeddings, remember that on Kaggle we can not write data to the input folder.

So just write (save) the embeddings to the output folder and then load them from there.

# Load vector database

- After saving the vector database, we just load it from the Kaggle Dataset I mentioned
- Obviously, the embeddings function to load the embeddings must be the same as the one used to create the embeddings

In [28]:
%%time

### download embeddings model
embeddings = HuggingFaceInstructEmbeddings(
    model_name = CFG.embeddings_model_repo,
    model_kwargs = {"device": "cuda"}
)

### load vector DB embeddings
vectordb = FAISS.load_local(
    # CFG.Embeddings_path, # from input folder
    CFG.Output_folder + '/faiss_index_hp', # from output folder
    embeddings,
    allow_dangerous_deserialization=True
)

clear_output()

CPU times: user 87.9 ms, sys: 30.6 ms, total: 119 ms
Wall time: 122 ms


In [29]:
### test if vector DB was loaded correctly
vectordb.similarity_search('magic creatures')

[Document(page_content='"export": "None",\n"belongings": "Winter green wood phoenix feathers wand, owl, stealth jacket, sleeve spare mirror, crossbow flying broom,\ngolden eggs, three strong cups, live maps, fake Galon",\n"affiliation": "Hogwarts, Dumbledore",\n"lineage": "Mixed wizard",\n"title": "Boys who do not die, warriors, teacher Dumbledore",\n"spells": "Except for your weapons, fluorescent flashes, separation from left and right, calling god guards, bustling,\nfour-point cracking, pouring force, funny bones, fainting to the ground, pointing to me, obstacles, recovery as early"\nSpeakers’ relations with Harry :\nCho Chang is Harry’s friend and classmate.\nHarry’s affection to her: 7.0,\nHarry’s familiarity with her: 4.0,\nHis affection to Harry: 7.0,\nHis familiarity with Harry: 4.0', metadata={'source': 'llm/sample.pdf', 'page': 14}),
 Document(page_content='AttributesAge 14\nSpells Expelliarmus\n… …\nBelongings Scabbers , …\nAttributesAge 14\nSpells Expelliarmus , Lumos\n… …\n

# Prompt Template

- Custom prompt

In [30]:
prompt_template = """
Don't try to make up an answer, if you don't know just say that you don't know.
Answer in the same language the question was asked.
Use only the following pieces of context to answer the question at the end.

{context}

Question: {question}
Answer:"""


PROMPT = PromptTemplate(
    template = prompt_template,
    input_variables = ["context", "question"]
)

In [31]:
# llm_chain = LLMChain(prompt=PROMPT, llm=llm)
# llm_chain

# Retriever chain

- Retriever to retrieve relevant passages
- Chain to answer questions
- [RetrievalQA: Chain for question-answering](https://python.langchain.com/docs/modules/data_connection/retrievers/)

In [32]:
retriever = vectordb.as_retriever(search_kwargs = {"k": CFG.k, "search_type" : "similarity"})

qa_chain = RetrievalQA.from_chain_type(
    llm = llm,
    chain_type = "stuff", # map_reduce, map_rerank, stuff, refine
    retriever = retriever,
    chain_type_kwargs = {"prompt": PROMPT},
    return_source_documents = True,
    verbose = False
)

In [33]:
### testing MMR search
question = "Which are Hagrid's favorite animals?"
vectordb.max_marginal_relevance_search(question, k = CFG.k)

[Document(page_content="shown great concern for Harry. Hence, his Familiar-\nity and Affection for Harry are high, while Harry’s\nFamiliarity with Dumbledore is relatively low.\n Affection\n 10  Parents and relatives who died for Harry.         Examples: Harry's parents\n 9  Characters who are very close with Harry and save Harry's life.         Examples: Ron in Book-7\n 8  Characters who are in love with Harry.          Examples: Ginny\n 7  Best friends          Examples: Ron in Book-2\n 6  Close Friends and very kindly to Harry.          Examples: Hagrid\n 5  Characters who often help Harry.           Examples: Dumbledore\n 4  Characters who are relatively friendly to Harry.          Examples: Neville\n 3  Normal Teammates.           Examples: Wood\n 2  Normal Classmates/Teachers.            Examples: Lavender Brown", metadata={'source': 'llm/sample.pdf', 'page': 4}),
 Document(page_content='GPT-3: I was just asking it where it came from. It’s not\nlike I was trying to make friends w

In [34]:
### testing similarity search
question = "Which are Hagrid's favorite animals?"
vectordb.similarity_search(question, k = CFG.k)

[Document(page_content="shown great concern for Harry. Hence, his Familiar-\nity and Affection for Harry are high, while Harry’s\nFamiliarity with Dumbledore is relatively low.\n Affection\n 10  Parents and relatives who died for Harry.         Examples: Harry's parents\n 9  Characters who are very close with Harry and save Harry's life.         Examples: Ron in Book-7\n 8  Characters who are in love with Harry.          Examples: Ginny\n 7  Best friends          Examples: Ron in Book-2\n 6  Close Friends and very kindly to Harry.          Examples: Hagrid\n 5  Characters who often help Harry.           Examples: Dumbledore\n 4  Characters who are relatively friendly to Harry.          Examples: Neville\n 3  Normal Teammates.           Examples: Wood\n 2  Normal Classmates/Teachers.            Examples: Lavender Brown", metadata={'source': 'llm/sample.pdf', 'page': 4}),
 Document(page_content='tory. Although the Harry Potter series is alreadyScene : Harry and his family visit the repti

# Post-process outputs

- Format llm response
- Cite sources (PDFs)
- Change `width` parameter to format the output

In [35]:
def wrap_text_preserve_newlines(text, width=700):
    # Split the input text into lines based on newline characters
    lines = text.split('\n')

    # Wrap each line individually
    wrapped_lines = [textwrap.fill(line, width=width) for line in lines]

    # Join the wrapped lines back together using newline characters
    wrapped_text = '\n'.join(wrapped_lines)

    return wrapped_text


def process_llm_response(llm_response):
    ans = wrap_text_preserve_newlines(llm_response['result'])

    sources_used = ' \n'.join(
        [
            source.metadata['source'].split('/')[-1][:-4]
            + ' - page: '
            + str(source.metadata['page'])
            for source in llm_response['source_documents']
        ]
    )

    ans = ans + '\n\nSources: \n' + sources_used
    return ans

In [36]:
def llm_ans(query):
    start = time.time()

    llm_response = qa_chain.invoke(query)
    ans = process_llm_response(llm_response)

    end = time.time()

    time_elapsed = int(round(end - start, 0))
    time_elapsed_str = f'\n\nTime elapsed: {time_elapsed} s'
    return ans + time_elapsed_str

# Ask questions

- Question Answering from multiple documents
- Invoke QA Chain
- Talk to your data

In [37]:
CFG.model_name

'llama2-13b-chat'

In [44]:
query = "What id prompt"
print(llm_ans(query))


Don't try to make up an answer, if you don't know just say that you don't know.
Answer in the same language the question was asked.
Use only the following pieces of context to answer the question at the end.

with a prompt that only includes task description,
one dialogue example, and dialogue history; 2)
rich-persona setting (denoted as Per-Model ) with
a prompt contains all annotated background infor-
mation in HPD as in-context learning exemplars.
The detailed prompts can be found in Appendix
3https://github.com/tatsu-lab/stanfordalpaca

Prompts : Your task is to act as a Harry Potter-like dialogue agent in a Magic World. There is a dialogue between Harry
Potter and others. You are required to give a response to the dialogue from the perspective of Harry Potter.
Here is an example:
Dialogue:{ "Petunia: Bad news, Vernon, Mrs. Figg’s broken her leg. She can’t take him. Now what?",
......
cupboard from now until Christmas." }
Harry’s Response: I know, I will obediently obedient, and I

In [39]:
query = "Is Malfoy an ally of Voldemort?"
print(llm_ans(query))


Don't try to make up an answer, if you don't know just say that you don't know.
Answer in the same language the question was asked.
Use only the following pieces of context to answer the question at the end.

a dialogue agent should be not only relevant to the
context, but also seem like something Harry would
say at the time and scene.
Figure 1 shows some main factors that affect
behaviors of Harry in a conversation. The first fac-
tor is the conversation history, which is the most
important factor that determines Harry’s response.
Thescene , which is the second factor, provides de-
tails about the motivation ( Hermione invites Harry
to have a butterbeer with Ron in the Three Brrom-
sticks ) of this dialogue. The third factor is the par-
ticipants’ information ( attributes and relations ),
obviously, Harry will say very different things to
different characters, such as Malfoy and Hermione.
The latter two factors belong to the background
information and are dynamically determined by th

In [40]:
query = "What are horcrux?"
print(llm_ans(query))


Don't try to make up an answer, if you don't know just say that you don't know.
Answer in the same language the question was asked.
Use only the following pieces of context to answer the question at the end.

inborn; (2) nurture. The former denotes some in-
nate attributes or abilities, which contains {Gender ,
Age,Lineage ,Talents , and Looks}. The latter refers
to properties acquired through efforts, including
{Achievement ,Title,Belongings ,Export ,Hobby ,
Character ,Spells andNickname }(some cases are
presented in Figure 2). In total, we collect 13 at-
tributes for each character, which basically cover
most properties in the Harry Potter series.
Therelations between Harry and other charac-
ters can be classified into binary relations and dis-
crete relations. The former include 8 types, which
are{Friend ,Classmate ,Teacher ,Family ,Lover ,
Opponent ,Teammate ,Enemy }. Multiple binary
relations can exist between two characters. Harry
and Ron, for instance, are friends, classmates, 

In [41]:
query = "Give me 5 examples of cool potions and explain what they do"
print(llm_ans(query))


Don't try to make up an answer, if you don't know just say that you don't know.
Answer in the same language the question was asked.
Use only the following pieces of context to answer the question at the end.

1  First Meeting.           Examples: Harry ﬁrst met Ron and Hermione
 0  Stranger.
 -2  Rude/Frivolous/Mean characters.     Examples: Draco Malfoy in Book-1,  Filch
 -4  Deliberately bullying/deliberately targeting.            Examples: Snape, Dudley
 -6  Maliciously targeting and harm.            Examples: Draco Malfoy in Book-5
 -8  Intentionally inﬂict harm.               Examples: Bellatrix Lestrange  in Book-5
 -10  Kill Harry's parents.                 Examples: VoldemortFigure 3: Affection Definition and Examples.
Figure 4: Familiarity Definition and Examples.
Figure 2 is another example of how attributes
and relationships have changed over the course of
the story. In Book 1-Chapter 7, Harry has just met
Ron on the train to Hogwarts and is an alien in the

"export": "None

# Gradio Chat UI

- **<font color='orange'>At the moment this part only works on Google Colab. Gradio and Kaggle started having compatibility issues recently.</font>**
- If you plan to use the interface, it is preferable to do so in Google Colab
- I'll leave this section commented out for now
- Chat UI prints below

___

- Create a chat UI with [Gradio](https://www.gradio.app/guides/quickstart)
- [ChatInterface docs](https://www.gradio.app/docs/chatinterface)
- The notebook should be running if you want to use the chat interface

In [42]:
# import locale
# locale.getpreferredencoding = lambda: "UTF-8"

In [None]:
# ! pip install --upgrade gradio -qq
# clear_output()

In [None]:
# import gradio as gr
# print(gr.__version__)

In [None]:
# def predict(message, history):
#     # output = message # debug mode

#     output = str(llm_ans(message)).replace("\n", "<br/>")
#     return output

# demo = gr.ChatInterface(
#     predict,
#     title = f' Open-Source LLM ({CFG.model_name}) for Harry Potter Question Answering'
# )

# demo.queue()
# demo.launch()

![image.png](attachment:413fe7a3-6534-45b5-b6e3-7fc86e982cf1.png)

![image.png](attachment:976f4bf4-7626-4d4a-b773-3eebd7e9f000.png)

# Conclusions

- Feel free to fork and optimize the code. Lots of things can be improved.

- Things I found had the most impact on models output quality in my experiments:
    - Prompt engineering
    - Bigger models
    - Other models families
    - Splitting: chunk size, overlap
    - Search: Similarity, MMR, k
    - Pipeline parameters (temperature, top_p, penalty)
    - Embeddings function
    - LLM parameters (max len)


- LangChain, Hugging Face and Gradio are awesome libs!

- **<font color='orange'>If you liked this notebook, don't forget to show your support with an Upvote!</font>**

- In case you are interested in LLMs, I also have some other notebooks you might want to check:

    - [Instruction Finetuning](https://www.kaggle.com/code/hinepo/llm-instruction-finetuning-wandb)
    - [Preference Finetuning - LLM Alignment](https://www.kaggle.com/code/hinepo/llm-alignment-preference-finetuning)
    - [Synthetic Data for Finetuning](https://www.kaggle.com/code/hinepo/synthetic-data-creation-for-llms)
    - [Safeguards and Guardrails](https://www.kaggle.com/code/hinepo/llm-safeguards-and-guardrails)
    
___

🦜🔗🤗

![image.png](attachment:68773819-4358-4ded-be3e-f1d275103171.png)