**RAG from scratch with Mistral**

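Install and import the required packages. The code uses the `MistralClient` interface of `mistralai` 0.4.2, the version shown in the install log.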

In [1]:
! pip install faiss-cpu==1.7.4 mistralai

Collecting faiss-cpu==1.7.4
  Downloading faiss_cpu-1.7.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (17.6 MB)
Collecting mistralai
  Downloading mistralai-0.4.2-py3-none-any.whl (20 kB)
Collecting httpx<1,>=0.25 (from mistralai)
  Downloading httpx-0.27.0-py3-none-any.whl (75 kB)
Collecting orjson<3.11,>=3.9.10 (from mistralai)
  Downloading orjson-3.10.6-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (141 kB)
Collecting httpcore==1.* (from httpx<1,>=0.25->mistralai)
  Downloading httpcore-1.0.5-py3-none-any.whl (77 kB)

In [3]:
from mistralai.client import MistralClient
from mistralai.models.chat_completion import ChatMessage
import requests
import numpy as np
import faiss
import os
from getpass import getpass

api_key = getpass("Type your API Key")
client = MistralClient(api_key=api_key)

Type your API Key··········


**Get data**

In [4]:
response = requests.get('https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt')
response.raise_for_status()  # stop here if the download failed
text = response.text

In [5]:
# Save a local copy of the essay; the context manager closes the file
with open('essay.txt', 'w') as f:
    f.write(text)

In [6]:
len(text)

75014

**Split document into chunks**

In [7]:
chunk_size = 2048
chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

In [8]:
len(chunks)

37
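
Fixed-size slicing can cut a sentence in half at a chunk boundary, so a relevant passage may end up split across two chunks. One common mitigation is to let consecutive chunks overlap. The sketch below is an illustration rather than part of the original notebook: the `chunk_with_overlap` helper and its `overlap` default of 200 characters are assumptions.

```python
def chunk_with_overlap(text, chunk_size=2048, overlap=200):
    """Split text into chunks of up to chunk_size characters.

    Consecutive chunks share `overlap` characters. Hypothetical helper,
    not part of the original notebook.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap  # how far the window advances each time
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


sample = "abcdefghij" * 3  # 30 characters of toy text
toy_chunks = chunk_with_overlap(sample, chunk_size=10, overlap=2)
print(len(toy_chunks), toy_chunks[0], toy_chunks[1])
```

With `overlap=0` the helper reduces to the slicing used in the cell above. A larger overlap produces more chunks, and therefore more embedding requests.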

**Create embeddings for each text chunk**

In [9]:
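def get_text_embedding(input_text):
    # Request an embedding for a single string and return its vector.
    # The parameter is named input_text to avoid shadowing the built-in input().
    embeddings_batch_response = client.embeddings(
        model="mistral-embed",
        input=input_text
    )
    return embeddings_batch_response.data[0].embedding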

In [10]:
text_embeddings = np.array([get_text_embedding(chunk) for chunk in chunks])

In [11]:
text_embeddings.shape

(37, 1024)
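
The list comprehension above sends one request per chunk, 37 in total. If the embedding endpoint accepts a list of strings per request (an assumption worth checking against the client version in use), the chunks can be grouped to reduce the number of calls. The sketch below shows only the grouping logic. `embed_in_batches`, `embed_batch`, and `batch_size` are hypothetical names, and a stand-in embedder is used so the example runs without an API key.

```python
def embed_in_batches(texts, embed_batch, batch_size=16):
    """Embed texts in groups of batch_size.

    embed_batch is any callable that maps a list of strings to a list of
    vectors, one per string, in the same order. Hypothetical helper.
    """
    vectors = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        vectors.extend(embed_batch(batch))
    return vectors


# Stand-in embedder (not a real model): maps each string to a
# 2-dimensional vector of (length, number of spaces).
def fake_embed_batch(batch):
    return [[float(len(s)), float(s.count(" "))] for s in batch]


vectors = embed_in_batches(["a b", "cd", "e f g"], fake_embed_batch, batch_size=2)
print(vectors)
```

With the notebook's client, `embed_batch` might wrap `client.embeddings(model="mistral-embed", input=batch)` and return `[item.embedding for item in response.data]`, assuming the response lists embeddings in input order.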

In [12]:
text_embeddings

array([[-3.95202637e-02,  7.75756836e-02, -8.82148743e-05, ...,
        -1.26342773e-02, -2.12402344e-02, -2.50816345e-03],
       [-3.19213867e-02,  7.21435547e-02,  2.99835205e-02, ...,
        -1.08413696e-02, -1.19628906e-02, -7.66372681e-03],
       [-5.89599609e-02,  6.12487793e-02,  1.26419067e-02, ...,
        -2.25372314e-02,  4.67681885e-03, -6.26754761e-03],
       ...,
       [-5.52978516e-02,  6.89697266e-02,  2.69622803e-02, ...,
        -2.54211426e-02, -2.52227783e-02, -2.68859863e-02],
       [-3.90014648e-02,  5.63049316e-02,  4.76684570e-02, ...,
        -1.77001953e-02,  9.33074951e-03, -8.72039795e-03],
       [-2.99377441e-02,  5.81054688e-02,  1.70898438e-02, ...,
        -1.61132812e-02, -1.79290771e-02, -4.35791016e-02]])

**Load into a vector database**

In [13]:
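d = text_embeddings.shape[1]  # embedding dimension (1024 here)
index = faiss.IndexFlatL2(d)  # exact (brute-force) index using L2 distance
index.add(text_embeddings)    # store all chunk embeddings in the index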

**Create embeddings for a question**

In [14]:
question = "What were the two main things the author worked on before college?"
question_embeddings = np.array([get_text_embedding(question)])
question_embeddings.shape

(1, 1024)

In [15]:
question_embeddings

array([[-0.05456543,  0.03518677,  0.03723145, ..., -0.02763367,
        -0.00327873,  0.00323677]])

**Retrieve similar chunks from the vector database**

In [16]:
# D holds the distances and I the indices of the k nearest chunks
D, I = index.search(question_embeddings, k=2)
print(I)

[[0 3]]
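
`IndexFlatL2` performs an exact search: it compares the query against every stored vector and returns the `k` smallest squared Euclidean distances (`D`) together with the positions of the matching vectors (`I`). The pure-Python sketch below mimics that behavior on toy 2-dimensional vectors. `flat_l2_search` is a hypothetical name and the data is made up for illustration.

```python
def flat_l2_search(vectors, query, k):
    """Return (distances, indices) of the k stored vectors closest to query.

    Distances are squared Euclidean distances, sorted in ascending order.
    """
    scored = [
        (sum((v_i - q_i) ** 2 for v_i, q_i in zip(vector, query)), index)
        for index, vector in enumerate(vectors)
    ]
    scored.sort()  # smallest distance first
    top = scored[:k]
    return [dist for dist, _ in top], [index for _, index in top]


toy_vectors = [[0.0, 0.0], [1.0, 1.0], [3.0, 4.0], [0.5, 0.0]]
distances, indices = flat_l2_search(toy_vectors, query=[0.0, 0.0], k=2)
print(distances, indices)
```

The cost of this approach grows linearly with the number of stored vectors, which is negligible for 37 chunks.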


In [17]:
retrieved_chunk = [chunks[i] for i in I.tolist()[0]]
print(retrieved_chunk)

['\n\nWhat I Worked On\n\nFebruary 2021\n\nBefore college the two main things I worked on, outside of school, were writing and programming. I didn\'t write essays. I wrote what beginning writers were supposed to write then, and probably still are: short stories. My stories were awful. They had hardly any plot, just characters with strong feelings, which I imagined made them deep.\n\nThe first programs I tried writing were on the IBM 1401 that our school district used for what was then called "data processing." This was in 9th grade, so I was 13 or 14. The school district\'s 1401 happened to be in the basement of our junior high school, and my friend Rich Draves and I got permission to use it. It was like a mini Bond villain\'s lair down there, with all these alien-looking machines — CPU, disk drives, printer, card reader — sitting up on a raised floor under bright fluorescent lights.\n\nThe language we used was an early version of Fortran. You had to type programs on punch cards, then 

**Combine context and question in a prompt and generate response**

In [18]:
prompt = f"""
Context information is below.
---------------------
{retrieved_chunk}
---------------------
Given the context information and not prior knowledge, answer the query.
Query: {question}
Answer:
"""

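Because `retrieved_chunk` is a list, the f-string above inserts its Python representation into the prompt, including brackets, quotes, and escaped `\n` sequences like those visible in the printed list. A possible alternative is to join the chunks into plain text first. `build_prompt` is a hypothetical helper, and the blank-line separator between chunks is an arbitrary choice.

```python
def build_prompt(chunks, question):
    """Assemble a RAG prompt from retrieved chunks and a question."""
    # Join chunks as plain text instead of interpolating the list itself
    context = "\n\n".join(chunk.strip() for chunk in chunks)
    return (
        "Context information is below.\n"
        "---------------------\n"
        f"{context}\n"
        "---------------------\n"
        "Given the context information and not prior knowledge, answer the query.\n"
        f"Query: {question}\n"
        "Answer:\n"
    )


demo_prompt = build_prompt(
    ["First retrieved passage.\n", "Second retrieved passage."],
    "What is in the passages?",
)
print(demo_prompt)
```

In the notebook this would be called as `build_prompt(retrieved_chunk, question)`. Whether the cleaner context changes the model's answer is not something this sketch measures.
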
In [19]:
def run_mistral(user_message, model="mistral-medium-latest"):
    messages = [
        ChatMessage(role="user", content=user_message)
    ]
    chat_response = client.chat(
        model=model,
        messages=messages
    )
    return chat_response.choices[0].message.content

In [20]:
run_mistral(prompt)

"The two main things the author worked on before college were writing and programming. In terms of writing, the author focused on short stories, which they described as having strong characters with intense feelings but little plot. In terms of programming, the author started learning to code in 9th grade using an early version of Fortran on an IBM 1401, which was located in the basement of their junior high school. However, the author noted that they didn't have much data to work with and that most of their programs didn't do much."