## Mistral-7B

If you're not running in Saturn Cloud, install these libraries first, and make sure you use the latest versions:

```
pip install -U transformers accelerate bitsandbytes
```

In [1]:
!rm -f minsearch.py
!wget https://raw.githubusercontent.com/alexeygrigorev/minsearch/main/minsearch.py

--2024-08-05 10:24:51--  https://raw.githubusercontent.com/alexeygrigorev/minsearch/main/minsearch.py
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.110.133, 185.199.111.133, 185.199.109.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.110.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 3832 (3.7K) [text/plain]
Saving to: ‘minsearch.py’


2024-08-05 10:24:51 (28.1 MB/s) - ‘minsearch.py’ saved [3832/3832]



In [2]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from huggingface_hub import login

import requests # library for making HTTP requests
import minsearch # library for creating and managing a search index
import os
import warnings
warnings.filterwarnings("ignore")

# os.environ['HF_HOME'] = '/run/cache/'
torch.random.manual_seed(0)

<torch._C.Generator at 0x7f62f58c5d70>

In [3]:
# Logging into HuggingFace

login(token=os.environ['HF_TOKEN'])

The token has not been saved to the git credentials helper. Pass `add_to_git_credential=True` in this function directly or `--add-to-git-credential` if using via `huggingface-cli` if you want to set the git credential as well.
Token is valid (permission: read).
Your token has been saved to /home/jovyan/.cache/huggingface/token
Login successful


### Building our knowledge base

In [4]:
# Fetch the JSON file with the course documents from the URL and parse it into Python objects
docs_url = 'https://github.com/DataTalksClub/llm-zoomcamp/blob/main/01-intro/documents.json?raw=1'
docs_response = requests.get(docs_url)
documents_raw = docs_response.json()

documents = []

# loop to create a flat list of all documents 
for course in documents_raw:
    course_name = course['course'] 

    for doc in course['documents']:
        doc['course'] = course_name # adding `course` field to each document
        documents.append(doc)

# creating the search index - specifies which fields should be treated as text (for full-text search) and which as keywords
index = minsearch.Index(
    text_fields=["question", "text", "section"],
    keyword_fields=["course"]
)

# fitting the index - adds all the processed documents to the search index
index.fit(documents)

<minsearch.Index at 0x7f61f5db1100>
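
To make the flattening step concrete, here is a minimal sketch that runs on a tiny, made-up sample shaped like the nested JSON above (a list of courses, each with a `documents` list). The sample data and the `flatten_documents` helper are hypothetical and only illustrate the loop; unlike the cell above, this version copies each document instead of mutating it.

```python
# Hypothetical sample mirroring the nested course -> documents layout
sample_raw = [
    {
        "course": "course-a",
        "documents": [
            {"question": "Q1", "text": "A1", "section": "General"},
            {"question": "Q2", "text": "A2", "section": "General"},
        ],
    },
    {
        "course": "course-b",
        "documents": [
            {"question": "Q3", "text": "A3", "section": "Setup"},
        ],
    },
]


def flatten_documents(raw):
    """Return one flat list of documents, each tagged with its course name."""
    flat = []
    for course in raw:
        for doc in course["documents"]:
            # Copy the document so the input structure is left untouched
            flat.append({**doc, "course": course["course"]})
    return flat


flat_docs = flatten_documents(sample_raw)
print(len(flat_docs))
print(flat_docs[0])
```

Each flattened document carries its own `course` value, which is what lets the index treat `course` as a keyword field for filtering.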

### Defining functions to search the knowledge base and build our prompt

In [5]:
def search(query):
    boost = {'question': 3.0, 'section': 0.5}

    results = index.search(
        query=query,
        filter_dict={'course': 'data-engineering-zoomcamp'},
        boost_dict=boost,
        num_results=5
    )

    return results
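
To give a feel for what `filter_dict` and `boost_dict` do, the sketch below is a deliberately simplified stand-in, not minsearch's actual scoring. It assumes a score based on plain word overlap per field, multiplies each field's overlap by its boost (defaulting to 1.0), and drops documents that fail the keyword filter. The `toy_search` function and the sample documents are hypothetical.

```python
def toy_search(docs, query, filter_dict, boost_dict, num_results=5):
    """Toy ranking: boosted word overlap per text field, after keyword filtering."""
    query_terms = set(query.lower().split())
    scored = []
    for doc in docs:
        # Keyword filter: every filter field must match exactly
        if any(doc.get(field) != value for field, value in filter_dict.items()):
            continue
        score = 0.0
        for field in ("question", "text", "section"):
            field_terms = set(doc.get(field, "").lower().split())
            overlap = len(query_terms & field_terms)
            # Fields without an explicit boost get a weight of 1.0
            score += boost_dict.get(field, 1.0) * overlap
        scored.append((score, doc))
    # Highest score first; sort only on the score, never on the dicts
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:num_results]]


toy_docs = [
    {"question": "Can I join late", "text": "Yes you can",
     "section": "General", "course": "de"},
    {"question": "How to install docker", "text": "You can join the docker channel",
     "section": "Setup", "course": "de"},
    {"question": "Can I join late", "text": "Yes",
     "section": "General", "course": "ml"},
]

toy_results = toy_search(
    toy_docs,
    query="can i join",
    filter_dict={"course": "de"},
    boost_dict={"question": 3.0, "section": 0.5},
)
print([doc["question"] for doc in toy_results])
```

With a boost of 3.0 on `question`, a match in the question field outweighs the same match in the answer text, which is the intent behind the boost values used in `search`.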

In [6]:
def build_prompt(query, search_results):
    prompt_template = """
    QUESTION: {question}

    CONTEXT:
    {context}

    ANSWER:
    """.strip()

    context = ""
    
    for doc in search_results:
        context = context + f"{doc['question']}\n{doc['text']}\n\n"
    
    prompt = prompt_template.format(question=query, context=context).strip()
    return prompt
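
One detail worth knowing: because the template is written inside an indented function body, `.strip()` only trims the outer whitespace, so inner lines such as `CONTEXT:` keep their four leading spaces in the final prompt. If that matters for your model, one possible variant (a sketch, with the hypothetical names `PROMPT_TEMPLATE` and `build_prompt_dedented`, and made-up sample results) uses `textwrap.dedent` to remove the common indentation:

```python
import textwrap

# Dedent the template once so inner lines carry no leading spaces
PROMPT_TEMPLATE = textwrap.dedent("""
    QUESTION: {question}

    CONTEXT:
    {context}

    ANSWER:
""").strip()


def build_prompt_dedented(query, search_results):
    # Join question/answer pairs with a blank line between documents
    context = "\n\n".join(
        f"{doc['question']}\n{doc['text']}" for doc in search_results
    )
    return PROMPT_TEMPLATE.format(question=query, context=context)


# Hypothetical search results, only for illustration
sample_results = [
    {"question": "Can I join late?", "text": "Yes, registration stays open."},
    {"question": "Where are the videos?", "text": "On the course playlist."},
]

prompt = build_prompt_dedented("Can I still join?", sample_results)
print(prompt)
```

The structure of the prompt is unchanged; only the stray indentation is gone.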

### Running the Mistral-7B model

In [7]:
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",
    device_map="auto",      # place layers on the available devices automatically
    load_in_4bit=True       # 4-bit quantization via bitsandbytes
)

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1", padding_side="left")

In [8]:
# # now let's save the model and tokenizer locally, so that we don't need to keep downloading them

# model.save_pretrained("./mistral-7b-model")
# tokenizer.save_pretrained("./mistral-7b-tokenizer")

In [9]:
# # loading the model from local directory

# model = AutoModelForCausalLM.from_pretrained("./mistral-7b-model")
# tokenizer = AutoTokenizer.from_pretrained("./mistral-7b-tokenizer")

Unused kwargs: ['_load_in_4bit', '_load_in_8bit', 'quant_method']. These kwargs are not used in <class 'transformers.utils.quantization_config.BitsAndBytesConfig'>.
`low_cpu_mem_usage` was None, now set to True since model is quantized.


In [10]:
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)

### Modifying the LLM function to use our Mistral-7B model

In [11]:
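def llm(prompt):
    # max_length counts the prompt tokens plus the generated tokens
    response = pipe(prompt, max_length=500, temperature=0.7, top_p=0.95, num_return_sequences=1)
    # generated_text contains the prompt followed by the completion
    response_final = response[0]['generated_text']
    # drop the echoed prompt and return only the answer
    return response_final[len(prompt):].strip()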
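
The slicing at the end of `llm` relies on the pipeline echoing the prompt at the start of `generated_text`. The sketch below shows that logic in isolation, using a stub in place of the real pipeline so it runs without a GPU or model download. `fake_pipe` and `llm_with` are hypothetical names, and the stub's canned reply is made up for illustration.

```python
def fake_pipe(prompt, **generation_kwargs):
    # Stand-in for the text-generation pipeline: returns prompt + continuation
    return [{"generated_text": prompt + " Yes, you can still join the course."}]


def llm_with(pipe_fn, prompt):
    response = pipe_fn(prompt, max_length=500)
    full_text = response[0]["generated_text"]
    # Cut off the echoed prompt, keep only the newly generated part
    return full_text[len(prompt):].strip()


answer = llm_with(fake_pipe, "QUESTION: Can I join?\n\nANSWER:")
print(answer)
```

Passing the pipeline in as an argument is just a convenience here: it lets you check the post-processing separately from the model.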

In [12]:
def rag(query):
    search_results = search(query)
    prompt = build_prompt(query, search_results)
    answer = llm(prompt)
    return answer

In [13]:
rag("I just discovered the course. Can I still join it?")

Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


'Yes, you can still join the course.'