

### Models

- [TheBloke/wizardLM-7B-HF](https://huggingface.co/TheBloke/wizardLM-7B-HF)
- [daryl149/llama-2-7b-chat-hf](https://huggingface.co/daryl149/llama-2-7b-chat-hf)
- [daryl149/llama-2-13b-chat-hf](https://huggingface.co/daryl149/llama-2-13b-chat-hf)
- [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2)

In [1]:
! nvidia-smi -L

GPU 0: Tesla T4 (UUID: GPU-93de3be4-cfc1-edfb-5c00-d241f4dfdf77)


# Installs

In [2]:
%%time

from IPython.display import clear_output

! pip install sentence_transformers==2.2.2

! pip install -qq -U langchain
! pip install -qq -U tiktoken
! pip install -qq -U pypdf
! pip install -qq -U faiss-gpu
! pip install -qq -U InstructorEmbedding 

! pip install -qq -U transformers 
! pip install -qq -U accelerate
! pip install -qq -U bitsandbytes

clear_output()

CPU times: user 4.49 s, sys: 7.12 s, total: 11.6 s
Wall time: 2min 33s


# Imports

In [3]:
%%time

import warnings
warnings.filterwarnings("ignore")

import os
import glob
import textwrap
import time

import langchain

# loaders
from langchain.document_loaders import PyPDFLoader
from langchain.document_loaders import DirectoryLoader

# splits
from langchain.text_splitter import RecursiveCharacterTextSplitter

# prompts
from langchain import PromptTemplate, LLMChain

# vector stores
from langchain.vectorstores import FAISS

# models
from langchain.llms import HuggingFacePipeline
from langchain.embeddings import HuggingFaceInstructEmbeddings

# retrievers
from langchain.chains import RetrievalQA

import torch
import transformers
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

print('langchain:', langchain.__version__)
print('torch:', torch.__version__)
print('transformers:', transformers.__version__)

langchain: 0.1.10
torch: 2.2.1+cu121
transformers: 4.38.2
CPU times: user 4.49 s, sys: 1.24 s, total: 5.73 s
Wall time: 4.26 s


In [4]:
sorted(glob.glob('/data/HP_Books/*'))

['/data/HP_Books/Harry Potter - Book 1 - The Sorcerers Stone.pdf',
 '/data/HP_Books/Harry Potter - Book 2 - The Chamber of Secrets.pdf',
 '/data/HP_Books/Harry Potter - Book 3 - The Prisoner of Azkaban.pdf',
 '/data/HP_Books/Harry Potter - Book 4 - The Goblet of Fire.pdf',
 '/data/HP_Books/Harry Potter - Book 5 - The Order of the Phoenix.pdf',
 '/data/HP_Books/Harry Potter - Book 6 - The Half-Blood Prince.pdf',
 '/data/HP_Books/Harry Potter - Book 7 - The Deathly Hallows.pdf']

# CFG

- CFG class enables easy and organized experimentation 

In [5]:
class CFG:
    # LLMs
    model_name = 'llama2-13b-chat' # wizardlm, llama2-7b-chat, llama2-13b-chat, mistral-7B
    temperature = 0
    top_p = 0.95
    repetition_penalty = 1.15    

    # splitting
    split_chunk_size = 800
    split_overlap = 0
    
    # embeddings
    embeddings_model_repo = 'sentence-transformers/all-MiniLM-L6-v2'    

    # similar passages
    k = 3
    
    # paths
    PDFs_path = '/data/HP_Books/'
    Embeddings_path =  '/data/faiss-hp-sentence-transformers'
    Persist_directory = './harry-potter-vectordb'  
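
One way to use a config class like this for experiments is to subclass it and override only the fields that change. The sketch below is illustrative: `BaseCFG`, `SmallChunksCFG`, and `describe` are hypothetical names, and the overridden values are arbitrary.

```python
class BaseCFG:
    # Hypothetical defaults mirroring a few of the fields in CFG
    model_name = 'llama2-13b-chat'
    split_chunk_size = 800
    split_overlap = 0
    k = 3


class SmallChunksCFG(BaseCFG):
    # Override only what this experiment changes (values are arbitrary)
    split_chunk_size = 400
    split_overlap = 50


def describe(cfg):
    """Collect the public, non-callable class attributes into a dict for logging."""
    return {
        name: getattr(cfg, name)
        for name in dir(cfg)
        if not name.startswith('_') and not callable(getattr(cfg, name))
    }


print(describe(SmallChunksCFG))
```

Logging the resolved settings next to each run makes it easier to tell later which configuration produced which answers.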

# Define model

In [6]:
def get_model(model = CFG.model_name):

    print('\nDownloading model: ', model, '\n\n')

    if model == 'wizardlm':
        model_repo = 'TheBloke/wizardLM-7B-HF'
        
        tokenizer = AutoTokenizer.from_pretrained(model_repo)

        model = AutoModelForCausalLM.from_pretrained(
            model_repo,
            load_in_4bit = True,
            device_map = 'auto',
            torch_dtype = torch.float16,
            low_cpu_mem_usage = True
        )
        
        max_len = 1024

    elif model == 'llama2-7b-chat':
        model_repo = 'daryl149/llama-2-7b-chat-hf'
        
        tokenizer = AutoTokenizer.from_pretrained(model_repo, use_fast=True)

        model = AutoModelForCausalLM.from_pretrained(
            model_repo,
            load_in_4bit = True,
            device_map = 'auto',
            torch_dtype = torch.float16,
            low_cpu_mem_usage = True,
            trust_remote_code = True
        )
        
        max_len = 2048

    elif model == 'llama2-13b-chat':
        model_repo = 'daryl149/llama-2-13b-chat-hf'
        
        tokenizer = AutoTokenizer.from_pretrained(model_repo, use_fast=True)

        model = AutoModelForCausalLM.from_pretrained(
            model_repo,
            load_in_4bit = True,        
            device_map = 'auto',
            torch_dtype = torch.float16,
            low_cpu_mem_usage = True,
            trust_remote_code = True
        )
        
        max_len = 2048 # 8192

    elif model == 'mistral-7B':
        model_repo = 'mistralai/Mistral-7B-v0.1'
        
        tokenizer = AutoTokenizer.from_pretrained(model_repo)

        model = AutoModelForCausalLM.from_pretrained(
            model_repo,
            load_in_4bit = True,
            device_map = 'auto',
            torch_dtype = torch.float16,
            low_cpu_mem_usage = True,
        )
        
        max_len = 1024

    else:
        raise ValueError(f"Model not implemented (tokenizer and backbone): {model}")

    return tokenizer, model, max_len
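
The branches in `get_model` differ mainly in the repository name and `max_len`, so the lookup could also be table-driven. This is only a sketch of that idea: `MODEL_REGISTRY` and `resolve_model` are hypothetical names, and the values are copied from the function above.

```python
# Repository names and max_len values copied from get_model;
# MODEL_REGISTRY and resolve_model are hypothetical helper names.
MODEL_REGISTRY = {
    'wizardlm': ('TheBloke/wizardLM-7B-HF', 1024),
    'llama2-7b-chat': ('daryl149/llama-2-7b-chat-hf', 2048),
    'llama2-13b-chat': ('daryl149/llama-2-13b-chat-hf', 2048),
    'mistral-7B': ('mistralai/Mistral-7B-v0.1', 1024),
}


def resolve_model(name):
    """Return (model_repo, max_len) for a known model name, or fail loudly."""
    try:
        return MODEL_REGISTRY[name]
    except KeyError:
        known = ', '.join(sorted(MODEL_REGISTRY))
        raise ValueError(f'Unknown model {name!r}; choose one of: {known}') from None


model_repo, max_len = resolve_model('llama2-13b-chat')
print(model_repo, max_len)
```

Failing early on an unknown name avoids downloading nothing and then hitting a confusing error further down the notebook.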

In [7]:
%%time

tokenizer, model, max_len = get_model(model = CFG.model_name)


Downloading model:  llama2-13b-chat 




The `load_in_4bit` and `load_in_8bit` arguments are deprecated and will be removed in the future versions. Please, pass a `BitsAndBytesConfig` object in `quantization_config` argument instead.







CPU times: user 34.6 s, sys: 42.7 s, total: 1min 17s
Wall time: 2min 59s
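
The deprecation warning in the output above asks for a `BitsAndBytesConfig` object passed through `quantization_config`. A minimal sketch of that form follows. The function name `load_4bit_model` is hypothetical, and setting `bnb_4bit_compute_dtype` to float16 is an assumption chosen to match the `torch_dtype` used in `get_model`. The imports sit inside the function so that defining it does not require the GPU libraries to be present.

```python
def load_4bit_model(model_repo):
    """Load a causal LM in 4-bit using quantization_config (sketch)."""
    # Imported lazily: only needed when the function is actually called
    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,
        # Assumption: compute in float16, mirroring torch_dtype in get_model
        bnb_4bit_compute_dtype=torch.float16,
    )

    return AutoModelForCausalLM.from_pretrained(
        model_repo,
        quantization_config=quant_config,
        device_map='auto',
        low_cpu_mem_usage=True,
    )
```

Calling `load_4bit_model('daryl149/llama-2-13b-chat-hf')` should then load the same checkpoint without the warning, assuming `bitsandbytes` and a CUDA GPU are available.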


In [8]:
model.eval()

LlamaForCausalLM(
  (model): LlamaModel(
    (embed_tokens): Embedding(32000, 5120, padding_idx=0)
    (layers): ModuleList(
      (0-39): 40 x LlamaDecoderLayer(
        (self_attn): LlamaSdpaAttention(
          (q_proj): Linear4bit(in_features=5120, out_features=5120, bias=False)
          (k_proj): Linear4bit(in_features=5120, out_features=5120, bias=False)
          (v_proj): Linear4bit(in_features=5120, out_features=5120, bias=False)
          (o_proj): Linear4bit(in_features=5120, out_features=5120, bias=False)
          (rotary_emb): LlamaRotaryEmbedding()
        )
        (mlp): LlamaMLP(
          (gate_proj): Linear4bit(in_features=5120, out_features=13824, bias=False)
          (up_proj): Linear4bit(in_features=5120, out_features=13824, bias=False)
          (down_proj): Linear4bit(in_features=13824, out_features=5120, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): LlamaRMSNorm()
        (post_attention_layernorm): LlamaRMSNorm()
      )
    )


In [9]:
### check how Accelerate split the model across the available devices (GPUs)
model.hf_device_map

{'': 0}
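
The single `''` key means the whole model was placed on GPU 0. A rough back-of-the-envelope estimate shows why that is plausible on a 16 GB T4. The shard sizes are the ones reported while downloading the checkpoint in this run. The rest rests on labeled assumptions, so treat the result as a lower bound rather than a measurement.

```python
# Sizes in bytes of the three checkpoint shards downloaded in this run
shard_bytes = [9948728430, 9904165024, 6178983625]

total_fp16_bytes = sum(shard_bytes)

# Assumption: the checkpoint stores float16 weights (2 bytes per parameter)
approx_params = total_fp16_bytes / 2

# Assumption: every weight is stored in 4 bits (0.5 bytes per parameter).
# In practice some layers (embeddings, norms) are not quantized, and
# activations need memory too, so real usage is higher.
approx_4bit_gib = approx_params * 0.5 / 2**30

print(f'checkpoint size: {total_fp16_bytes / 2**30:.1f} GiB')
print(f'approx. parameters: {approx_params / 1e9:.1f}B')
print(f'4-bit weights (lower bound): {approx_4bit_gib:.1f} GiB')
```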

# 🤗 pipeline

- Hugging Face pipeline

In [10]:
pipe = pipeline(
    task = "text-generation",
    model = model,
    tokenizer = tokenizer,
    pad_token_id = tokenizer.eos_token_id,
    max_length = max_len,
    temperature = CFG.temperature,
    top_p = CFG.top_p,
    repetition_penalty = CFG.repetition_penalty
)

llm = HuggingFacePipeline(pipeline = pipe)
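
To give an intuition for `repetition_penalty`, here is a pure-Python sketch of the rule commonly used for it: logits of tokens that were already generated are divided by the penalty when positive and multiplied by it when negative, so both cases become less likely. This is an illustration under that assumption, not the library's actual code; `apply_repetition_penalty` is a hypothetical name.

```python
def apply_repetition_penalty(logits, generated_ids, penalty=1.15):
    """Return a copy of logits with already-generated token ids penalized."""
    out = list(logits)
    for token_id in set(generated_ids):
        score = out[token_id]
        # Dividing a positive score or multiplying a negative one both lower it
        out[token_id] = score / penalty if score > 0 else score * penalty
    return out


logits = [2.3, -1.0, 0.5, 4.6]          # toy vocabulary of four tokens
penalized = apply_repetition_penalty(logits, generated_ids=[0, 1, 0])
print(penalized)
```

Tokens 0 and 1 were generated before, so their scores drop, while tokens 2 and 3 are left untouched.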

In [11]:
llm

HuggingFacePipeline(pipeline=<transformers.pipelines.text_generation.TextGenerationPipeline object at 0x7f3130386a30>)

In [12]:
%%time
### testing model, not using the harry potter books yet
### answer is not necessarily related to harry potter
query = "Give me 5 examples of cool potions and explain what they do"
llm.invoke(query)

CPU times: user 51.2 s, sys: 908 ms, total: 52.1 s
Wall time: 54.2 s


".\n\nSure thing! Here are five examples of cool potions in the world of Dungeons & Dragons, along with a brief description of their effects:\n\n1. Potion of Healing: This potion restores hit points to the drinker, healing wounds and injuries sustained during combat or other physical activities. It's a staple of many adventurers' inventories, as it can be used to recover from dangerous battles or long journeys.\n2. Potion of Invisibility: As its name suggests, this potion grants the drinker temporary invisibility, allowing them to move undetected and strike from unexpected angles. It's often used by rogues and assassins to slip past guards or gain an advantage in stealthy situations.\n3. Potion of Speed: This potion increases the drinker's speed for a short period of time, allowing them to move faster and cover more ground than normal. It's useful for races like halflings and gnomes, who already have high movement speeds, but can also be helpful for other classes that rely on mobility.

# 🦜🔗 Langchain

- Multiple document retriever with LangChain

In [13]:
CFG.model_name

'llama2-13b-chat'

# Loader

- [Directory loader](https://python.langchain.com/docs/modules/data_connection/document_loaders/file_directory) for multiple files
- This step is not necessary if you are just loading the vector database
- This step is necessary if you are creating embeddings. In this case you need to:
    - load the PDF files
    - split them into chunks
    - create embeddings
    - save the embeddings in a vector store

After that, you can just load the saved embeddings, run a similarity search with the user query, and use the LLM to answer the question.

You can comment out this section if you use the embeddings I already created.

In [14]:
%%time

loader = DirectoryLoader(
    CFG.PDFs_path,
    glob="./*.pdf",
    loader_cls=PyPDFLoader,
    show_progress=True,
    use_multithreading=True
)

documents = loader.load()

100%|██████████| 7/7 [01:14<00:00, 10.71s/it]

CPU times: user 1min 16s, sys: 440 ms, total: 1min 16s
Wall time: 1min 14s





In [15]:
print(f'We have {len(documents)} pages in total')

We have 4114 pages in total


In [16]:
documents[8].page_content

"8Ron\nP.S. Percy's Head Boy. He got the letter last week.Harry glanced back at the photograph. Percy, who was in his seventh and\nfinal year at Hogwarts, was looking particularly smug. He had pinned hisHead Boy badge to the fez perched jauntily on top of his neat hair, hishorn-rimmed glasses flashing in the Egyptian sun.\nHarry now turned to his present and unwrapped it. Inside was what looked\nlike a miniature glass spinning top. There was another note from Ronbeneath it.\nHarry -- this is a Pocket Sneakoscope. If there's someone untrustworthy\naround, it's supposed to light up and spin. Bill says it's rubbish soldfor wizard tourists and isn't reliable, because it kept lighting up atdinner last night. But he didn't realize Fred and George had put beetlesin his soup.\nBye --RonHarry put the Pocket Sneakoscope on his bedside table, where it stood\nquite still, balanced on its point, reflecting the luminous hands of hisclock. He looked at it happily for a few seconds, then picked up the

# Splitter

- Split the text into chunks so that individual passages can be retrieved by similarity search
- This step is also only necessary if you are creating the embeddings
- [RecursiveCharacterTextSplitter](https://python.langchain.com/en/latest/reference/modules/document_loaders.html?highlight=RecursiveCharacterTextSplitter#langchain.document_loaders.MWDumpLoader)

In [17]:
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = CFG.split_chunk_size,
    chunk_overlap = CFG.split_overlap
)

texts = text_splitter.split_documents(documents)

print(f'We have created {len(texts)} chunks from {len(documents)} pages')

We have created 10519 chunks from 4114 pages
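
To illustrate what `chunk_size` and `chunk_overlap` control, here is a deliberately simplified fixed-size splitter. It is not the recursive algorithm used by `RecursiveCharacterTextSplitter`, which also tries to break on separators such as paragraphs and sentences. `split_fixed` is a hypothetical helper.

```python
def split_fixed(text, chunk_size=800, overlap=0):
    """Cut text into windows of chunk_size characters that share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError('overlap must be >= 0 and smaller than chunk_size')
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


page = 'x' * 2000                      # stand-in for one page of text

no_overlap = split_fixed(page, chunk_size=800, overlap=0)
with_overlap = split_fixed(page, chunk_size=800, overlap=100)

print([len(c) for c in no_overlap])    # [800, 800, 400]
print([len(c) for c in with_overlap])  # [800, 800, 600]
```

With an overlap of 0, as configured in `CFG`, a sentence that straddles a chunk boundary is cut in two. A small overlap trades some duplicated text for keeping such sentences intact in at least one chunk.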


# Create Embeddings


- Embed and store the texts in a Vector database (FAISS)
- [LangChain Vector Stores docs](https://python.langchain.com/docs/modules/data_connection/vectorstores/)
- [FAISS - langchain](https://python.langchain.com/docs/integrations/vectorstores/faiss)

- [Chroma - Persist and load the vector database](https://python.langchain.com/en/latest/modules/indexes/vectorstores/examples/chroma.html)

___

- If you use Chroma vector store it will take ~35 min to create embeddings
- If you use FAISS vector store on GPU it will take just ~3 min

___

We need to create the embeddings only once, and then we can just load the vector store and query the database using similarity search. 

Loading the embeddings takes only a few seconds.

I uploaded the embeddings to a Kaggle Dataset so we just load it from [here](https://www.kaggle.com/datasets/hinepo/faiss-hp-sentence-transformers).
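
For reference, the one-off creation step could look roughly like the sketch below. `build_and_save_index` is a hypothetical helper. It assumes `texts` is the list of chunks produced by the splitter and `embeddings` is an embeddings object like the one created in the loading cell further down. The import is placed inside the function so that defining it does not require LangChain to be installed.

```python
def build_and_save_index(texts, embeddings, folder_path):
    """Embed the chunks, build a FAISS index, and persist it to disk (sketch)."""
    # Same import path as in the Imports cell; loaded only when called
    from langchain.vectorstores import FAISS

    vectordb = FAISS.from_documents(documents=texts, embedding=embeddings)
    vectordb.save_local(folder_path)
    return vectordb
```

A call such as `build_and_save_index(texts, embeddings, CFG.Persist_directory)` would write the index files to that folder, from where `FAISS.load_local` can read them back.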

# Load vector database

- After saving the vector database, we just load it from the Kaggle Dataset I mentioned
- The embedding model used to load the vector store must be the same one used to create it; otherwise the query vectors and the stored vectors are not comparable

In [18]:
%%time

### download embeddings model
embeddings = HuggingFaceInstructEmbeddings(
    model_name = CFG.embeddings_model_repo,
    model_kwargs = {"device": "cuda"}
)

### load vector DB embeddings
vectordb = FAISS.load_local(
    CFG.Embeddings_path,
    embeddings
)

clear_output()

CPU times: user 1.02 s, sys: 366 ms, total: 1.38 s
Wall time: 3.06 s


In [19]:
### test if vector DB was loaded correctly
vectordb.similarity_search('magic creatures')

[Document(page_content='“Magic?” he repeated in a whisper. \n“That’s right,” said Dumbledore. \n“It’s … it’s magic, what I can do?” \n“What is it that you can do?” \n“All sorts,” breathed Riddle. A flush of excitement was \nrising up his neck into his hollow cheeks; he looked \nfevered. “I can make things move without touching \nthem. I can make animals do what I want them to do, \nwithout training them. I can make bad things happen \nto people who annoy me. I can make them hurt if I \nwant to.”', metadata={'source': '/kaggle/input/harry-potter-books-in-pdf-1-7/HP books/Harry Potter - Book 6 - The Half-Blood Prince.pdf', 'page': 302}),
 Document(page_content='91"Shut up, Malfoy," said Harry quietly. Hagrid was looking downcast and\nHarry wanted Hagrid\'s first lesson to be a success.\n"Righ\' then," said Hagrid, who seemed to have lost his thread, "so -- so\nyeh\'ve got yer books an\' -- an\' - - now yeh need the Magical Creatures.Yeah. So I\'ll go an\' get \'em. Hang on... "\nHe strod
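
Conceptually, a similarity search embeds the query and returns the stored vectors closest to it. The toy sketch below assumes Euclidean distance as the metric and uses 2-dimensional vectors in place of real embeddings; `top_k` is a hypothetical helper, not part of FAISS or LangChain.

```python
import math


def top_k(query_vec, doc_vecs, k=3):
    """Return the indices of the k vectors closest to query_vec (Euclidean distance)."""
    distances = [(math.dist(query_vec, vec), idx) for idx, vec in enumerate(doc_vecs)]
    return [idx for _, idx in sorted(distances)[:k]]


doc_vecs = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [0.9, 1.2]]
query_vec = [1.0, 1.0]

print(top_k(query_vec, doc_vecs, k=2))  # [1, 3]
```

A real index avoids the full scan and sort for large collections, but the result has the same meaning: the indices of the nearest stored passages.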

# Prompt Template

- Custom prompt

In [20]:
prompt_template = """
Don't try to make up an answer, if you don't know just say that you don't know.
Answer in the same language the question was asked.
Use only the following pieces of context to answer the question at the end.

{context}

Question: {question}
Answer:"""


PROMPT = PromptTemplate(
    template = prompt_template, 
    input_variables = ["context", "question"]
)

In [21]:
# llm_chain = LLMChain(prompt=PROMPT, llm=llm)
# llm_chain

# Retriever chain

- Retriever to retrieve relevant passages
- Chain to answer questions
- [RetrievalQA: Chain for question-answering](https://python.langchain.com/docs/modules/data_connection/retrievers/)

In [22]:
retriever = vectordb.as_retriever(search_type = "similarity", search_kwargs = {"k": CFG.k})

qa_chain = RetrievalQA.from_chain_type(
    llm = llm,
    chain_type = "stuff", # map_reduce, map_rerank, stuff, refine
    retriever = retriever, 
    chain_type_kwargs = {"prompt": PROMPT},
    return_source_documents = True,
    verbose = False
)
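
The `"stuff"` chain type puts all retrieved passages into the `{context}` slot of the prompt and makes a single LLM call. A plain-Python sketch of that assembly is shown below, using `str.format` as a stand-in for `PromptTemplate`. The helper name `stuff_prompt`, the sample passages, and the blank-line separator between passages are assumptions.

```python
# Same wording as the notebook's prompt_template
PROMPT_TEMPLATE = """
Don't try to make up an answer, if you don't know just say that you don't know.
Answer in the same language the question was asked.
Use only the following pieces of context to answer the question at the end.

{context}

Question: {question}
Answer:"""


def stuff_prompt(passages, question, separator='\n\n'):
    """Join the retrieved passages and fill the template in one go."""
    context = separator.join(passages)
    return PROMPT_TEMPLATE.format(context=context, question=question)


# Made-up passages standing in for the k retrieved chunks
passages = ['First retrieved passage.', 'Second retrieved passage.']
final_prompt = stuff_prompt(passages, 'What are horcruxes?')
print(final_prompt)
```

Because everything goes into one prompt, `k` times the chunk size plus the question and the generated answer has to fit within the `max_length` given to the pipeline.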

In [23]:
### testing MMR search
question = "Which are Hagrid's favorite animals?"
vectordb.max_marginal_relevance_search(question, k = CFG.k)

[Document(page_content='would warn Hagrid myself, but I am  banished — it would be unwise \nfor me to go too near the forest now — Hagrid has troubles enough, \nwithout a centaurs’ battle.” \n“But — what’s Hagrid attempting to do?” said Harry nervously. \nFirenze looked at Harry impassively. \n“Hagrid has recently rendered me a great service,” said Firenze,', metadata={'source': '/kaggle/input/harry-potter-books-in-pdf-1-7/HP books/Harry Potter - Book 5 - The Order of the Phoenix.pdf', 'page': 619}),
 Document(page_content="wisely. Behind him, Buckbeak spat a few ferret bones onto Hagrid'spillow.", metadata={'source': '/kaggle/input/harry-potter-books-in-pdf-1-7/HP books/Harry Potter - Book 3 - The Prisoner of Azkaban.pdf', 'page': 228}),
 Document(page_content='says Draco Malfoy, a fourth-year student. “We all hate Hagrid, but we’re just too scared to say \nanything.” \nHagrid has no intention of ceasing his campaign \nof intimidation, however. In conversation with a \nDaily Prophet  

In [24]:
### testing similarity search
question = "Which are Hagrid's favorite animals?"
vectordb.similarity_search(question, k = CFG.k)

[Document(page_content='would warn Hagrid myself, but I am  banished — it would be unwise \nfor me to go too near the forest now — Hagrid has troubles enough, \nwithout a centaurs’ battle.” \n“But — what’s Hagrid attempting to do?” said Harry nervously. \nFirenze looked at Harry impassively. \n“Hagrid has recently rendered me a great service,” said Firenze,', metadata={'source': '/kaggle/input/harry-potter-books-in-pdf-1-7/HP books/Harry Potter - Book 5 - The Order of the Phoenix.pdf', 'page': 619}),
 Document(page_content="Harry could sort of see what Hagrid meant. Once you got over the first\nshock of seeing something that was, half horse, half bird, you startedto appreciate the hippogriffs' gleaming coats, changing smoothly fromfeather to hair, each of them a different color: stormy gray, bronze,", metadata={'source': '/kaggle/input/harry-potter-books-in-pdf-1-7/HP books/Harry Potter - Book 3 - The Prisoner of Azkaban.pdf', 'page': 91}),
 Document(page_content='CHAPTER  THIRTEEN \n\
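
The two searches above return the same first passage but differ afterwards, because MMR (maximal marginal relevance) also penalizes candidates that resemble passages already selected. The following toy sketch shows the greedy selection rule on 2-dimensional vectors. The function names are hypothetical, and the default `lambda_mult=0.5` is an assumption about a reasonable trade-off, not a statement about the library's internals.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def mmr(query_vec, doc_vecs, k=3, lambda_mult=0.5):
    """Greedily pick k indices balancing relevance to the query against redundancy."""
    selected = []
    candidates = list(range(len(doc_vecs)))

    def score(i):
        relevance = cosine(query_vec, doc_vecs[i])
        # Highest similarity to anything already picked (0 if nothing picked yet)
        redundancy = max((cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0)
        return lambda_mult * relevance - (1 - lambda_mult) * redundancy

    while candidates and len(selected) < k:
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


query_vec = [1.0, 0.0]
doc_vecs = [[1.0, 0.1], [1.0, 0.11], [0.8, -0.6]]  # docs 0 and 1 are near-duplicates

print(mmr(query_vec, doc_vecs, k=2, lambda_mult=0.5))  # [0, 2]
print(mmr(query_vec, doc_vecs, k=2, lambda_mult=1.0))  # [0, 1]
```

With `lambda_mult=1.0` the redundancy term vanishes and the result matches a plain similarity ranking; lower values favor more diverse passages.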

# Post-process outputs

- Format LLM response
- Cite sources (PDFs)
- Change `width` parameter to format the output

In [25]:
def wrap_text_preserve_newlines(text, width=700):
    # Split the input text into lines based on newline characters
    lines = text.split('\n')

    # Wrap each line individually
    wrapped_lines = [textwrap.fill(line, width=width) for line in lines]

    # Join the wrapped lines back together using newline characters
    wrapped_text = '\n'.join(wrapped_lines)

    return wrapped_text


def process_llm_response(llm_response):
    ans = wrap_text_preserve_newlines(llm_response['result'])
    
    sources_used = ' \n'.join(
        [
            source.metadata['source'].split('/')[-1][:-4] + ' - page: ' + str(source.metadata['page'])
            for source in llm_response['source_documents']
        ]
    )
    
    ans = ans + '\n\nSources: \n' + sources_used
    return ans
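
The source label is built by slicing off the last four characters of the file name, which works as long as every file ends in `.pdf`. A slightly more defensive variant, sketched below with a hypothetical helper `format_source`, uses `os.path` so that other extensions are handled as well. The metadata dict mimics the shape seen in the retrieved documents.

```python
import os


def format_source(metadata):
    """Turn a document's metadata into 'Title - page: N'."""
    filename = os.path.basename(metadata['source'])
    title, _extension = os.path.splitext(filename)
    return f"{title} - page: {metadata['page']}"


metadata = {
    'source': '/data/HP_Books/Harry Potter - Book 4 - The Goblet of Fire.pdf',
    'page': 271,
}
print(format_source(metadata))
```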

In [26]:
def llm_ans(query):
    start = time.time()
    
    llm_response = qa_chain.invoke(query)
    ans = process_llm_response(llm_response)
    
    end = time.time()

    time_elapsed = int(round(end - start, 0))
    time_elapsed_str = f'\n\nTime elapsed: {time_elapsed} s'
    return ans + time_elapsed_str

# Ask questions

- Question Answering from multiple documents
- Invoke QA Chain
- Talk to your data

In [27]:
CFG.model_name

'llama2-13b-chat'

In [28]:
query = "Which challenges does Harry face during the Triwizard Tournament?"
print(llm_ans(query))

 During the Triwizard Tournament, Harry faces challenges such as the Merpeople, the Maze, and the final task where he has to retrieve a golden egg from a dragon.

Sources: 
Harry Potter - Book 4 - The Goblet of Fire - page: 271 
Harry Potter - Book 4 - The Goblet of Fire - page: 237 
Harry Potter - Book 4 - The Goblet of Fire - page: 252

Time elapsed: 7 s


In [29]:
query = "Is Malfoy an ally of Voldemort?"
print(llm_ans(query))

 Based on the passage, it appears that Lucius Malfoy is a servant or follower of Voldemort, rather than an ally. The passage states that Voldemort has "disappointed" Malfoy in the past, and that Malfoy is afraid of Voldemort and tries to curry favor with him. Additionally, the fact that Voldemort refers to Malfoy as "Lucius" and not by a title or name suggests that he does not consider Malfoy to be an equal or a true ally.

Sources: 
Harry Potter - Book 6 - The Half-Blood Prince - page: 716 
Harry Potter - Book 4 - The Goblet of Fire - page: 665 
Harry Potter - Book 7 - The Deathly Hallows - page: 16

Time elapsed: 21 s


In [30]:
query = "What are horcrux?"
print(llm_ans(query))

 An object in which a person has concealed part of their soul.

Sources: 
Harry Potter - Book 6 - The Half-Blood Prince - page: 557 
Harry Potter - Book 6 - The Half-Blood Prince - page: 419 
Harry Potter - Book 6 - The Half-Blood Prince - page: 715

Time elapsed: 4 s


In [31]:
query = "Give me 5 examples of cool potions and explain what they do"
print(llm_ans(query))

 There are many cool potions described throughout the series but here are five examples:
1) Polyjuice Potion - allows the drinker to transform into another person (or people).
2) Felix Felicis - also known as Liquid Luck, this potion causes the drinker to experience good fortune and luck.
3) Amortentia - a love potion that smells differently depending on the person who drinks it.
4) Draught of Living Death - a powerful potion that puts the drinker into a deep coma, making them appear dead.
5) Venomous Tentacula Strength-increasing draught - increases the strength of the drinker, allowing them to perform feats of physical prowess.

Sources: 
Harry Potter - Book 6 - The Half-Blood Prince - page: 204 
Harry Potter - Book 6 - The Half-Blood Prince - page: 204 
Harry Potter - Book 7 - The Deathly Hallows - page: 76

Time elapsed: 27 s
