# Building Solutions with LLMs and Retrieval Augmented Generation (RAG): Introduction

Welcome to this AI workshop! In this workshop we will get deeper into AI, RAG, chatbots, embeddings, evaluations and more.
But first lets get used to this notebook enviroment.

### Getting started with Jupyter Notebooks
Notebooks are essential tools in the toolkit of data scientists and machine learning engineers. 
They facilitate interactive coding and effective visualization of results, making them invaluable for data exploration and analysis.
Jupyter allows you to create cells that can contain either text or code. 
You can execute code cells by pressing Ctrl + Enter, enabling you to run your code and see results instantly. 


# Task 1
1. Run the cell bellow a couple of times to get used to the notebook behaviour.
2. Restart the kernel (Restart) button above and run again this code to see the difference.

In [None]:
# define a if it does not exist
if  'a' not in globals():
    a = 42

a = a+1

# press ctrl+enter to run the cell, do that multiple times
# notice how the variable `a` is persistent in the cell
# you can run full programs in a notebook.

a


## LLMs
LLMs, or Large Language Models, are advanced AI systems designed to understand and generate human language. They use deep learning techniques and are trained on vast amounts of text data. Key features include their large scale (billions of parameters), ability to perform various language tasks (like translation and summarization), and contextual understanding. 
### MistralAI

For this workshop we wil use [Mistral AI](https://mistral.ai/) langauge models, which are similar in concept to OpenAI's ChatGPT and Anthropic's Claude. Mistral AI offers both premier models and free models, with a strong emphasis on open-source availability. We are going to use [Mistral Small](https://docs.mistral.ai/getting-started/models/models_overview/#premier-models) model today. 

If your installation finalized correctly you should have mistral already installed. 
The command below checks if mistral is installed.


In [1]:
!pip show mistralai

Name: mistralai
Version: 1.1.0
Summary: Python Client SDK for the Mistral AI API.
Home-page: https://github.com/mistralai/client-python.git
Author: Mistral
Author-email: 
License: 
Location: /workspaces/ai-rag-quiz-workshop/.virtualenvironment/lib/python3.12/site-packages
Requires: eval-type-backport, httpx, jsonpath-python, pydantic, python-dateutil, typing-inspect
Required-by: 


If for some reason mistral did not installed successfully you can get it (and all other dependencies of this workshop by installing it directly with the command below), just uncomment the lines which start with "!pip install".

In [None]:
# this command will install all the dependencies of the requirements.txt file
# !pip install -r ../requirements.txt

# this command will install the code of the project itsel
# !pip install -e ../


### Getting access to Mistral LLMs

LLMs are large in size and use a lot of computing resources. The most common way to use them is to run them in the cloud via network calls. For the sake of this workshop, we will use the cloud version as well, so we don't need to download large models. If you are interested in running LLMs locally, check out [ollama](https://github.com/ollama/ollama), arguably the most advanced project that does this.


#### API key
The second step is to get a Mistral API key. You can find some API keys we prepared for this workshop in this [sheet](https://docs.google.com/spreadsheets/d/1ZwTpkG6OOuVrOx8nzPmgai_7Hwpo8Kun7yZrOmg_5K4/edit?gid=0#gid=0). Get the key (please write your name next to it in the sheet so that people know it is taken) and write it to the .env file using the command below in Task 2.

If you want to get your own API key, you can do it on the [website here](https://auth.mistral.ai/ui/registration).You will need to sign up using your email address and phone number and create a new API key [here]((https://console.mistral.ai/api-keys/)). If you are having any trouble, check out the video instructions on how to do it [here](https://drive.google.com/file/d/1mqwkX1BRvg_RMZJQHjtvKq31RjWZ9MdK/view)


# Task 2

Use the cell below to write your API key into the file so you can use Mistral. We write the variable into a file named .env. The .env files are a standard in Python to load information like API keys and passwords that should not stay with the code for security reasons. **Make sure to delete your key from the cell once you write it to the .env file.**

In [2]:
# write file .env

mistral_api_key = 'gtaba0LLreCD6dfa02aQtBFzYUc1zHww'
with open('../.env', 'w') as f:
    f.write(f'MISTRAL_API_KEY="{mistral_api_key}"')

# make sure to delete the API key from this cell before commiting the file so you dont save your key in the repo which is a security risk.

# Checking if it worked

Run the command below to check if writing the key worked.  It will trigger an error if it did not.  You might need to restart the kernel if it does not work.

In [3]:
# we now reload the the configuration file and it should show your key
from dotenv import load_dotenv, find_dotenv
env_file = find_dotenv()
print(f"Loading environment variables from {env_file}")
load_dotenv(env_file)

import os
env = os.getenv('MISTRAL_API_KEY')

if not env:
    raise ValueError("The API key is not set. Please set the MISTRAL_API_KEY environment variable.")
if env == 'REPLACE WITH YOUR KEY':
    raise ValueError("You did not repalce the value with the real key")
print("If you reached this point is because it wrote a key in the right place :)")

Loading environment variables from /workspaces/ai-rag-quiz-workshop/.env
If you reached this point is because it wrote a key in the right place :)


## Running mistral
Let's run the code below to import Mistral and initialize the Mistral client: 

In [19]:
import os
from mistralai import Mistral

# Retrieve the Mistral API key from the environment variables
mistral_api_key = os.getenv('MISTRAL_API_KEY')

# Initialize the Mistral client with the API key
mistral_client = Mistral(api_key=mistral_api_key)

# The model below is the specific model we want to use
model_name = "mistral-small-latest"

# The code below defines a function `call_mistral_model` that sends a message to a Mistral model and returns the model's response text.
response = mistral_client.chat.complete(
    model = model_name,
    messages = [
        {
            "role": "user",
            "content": "Tell me a story",
        }
        ],
        temperature=0.0
    )
# Extract only the text from the response
response_text = response.choices[0].message.content
print(response_text)

## not how the result resembles natural language. Its the exact same concept as chatgpt.
## feel free to play around with the prompt and see how the results change.

I'd be happy to tell you a story! Let's dive into a tale called "The Whispering Woods."

Once upon a time, in a small village nestled between rolling hills and a sparkling river, there lived a young girl named Elara. Elara was known throughout the village for her curiosity and her love for the woods that surrounded their home. She would often spend her days exploring the woods, learning about the plants and animals that lived there.

One day, while wandering deeper into the woods than she ever had before, Elara stumbled upon a hidden glade. The glade was filled with tall, ancient trees that seemed to whisper secrets to one another. In the center of the glade, there was a large, gnarled tree with a face carved into its trunk. The face had kind eyes and a gentle smile, and it seemed to beckon Elara closer.

As Elara approached the tree, she heard a soft voice whispering in the wind. "Greetings, Elara," the voice said. "I am the Spirit of the Woods. I have been waiting for someone like yo

Instead of having all of this complex code to call the LLM, we can simplify it by moving this complexity to a Python script. From the code below onwards, we will replace our calls to the Mistral API with calls to the LargeLanguageModel class.

Try it out and take a look at the class.


In [17]:
from chat_solution.llm import LargeLanguageModel

# Initialize an instance of the LargeLanguageModel class
llm = LargeLanguageModel()
# Make a call to Mistral using the LargeLanguageModel class
response = llm.call("hello! What is your name?")

print(response)

Hello! You can call me Assistant. How can I help you today?


Beyond calling the LLM, the class contains rate limiting handling logic. Rate limiting is a mechanism implemented by APIs to control the number of requests a user can make in a given time frame. This is done to prevent abuse so that one user doesn't consume the entire capacity and slow down the service. However, rate limiting can be annoying because it can interrupt your workflow and force you to wait before making more requests.

To make things easier, we've implemented a rate limit error controller in the LargeLanguageModel class (see usage example below) that automatically adds sleep intervals between requests to avoid exceeding the rate limit. We will use the LargeLanguageModel class moving forward to make calls to Mistral AI. This class has the same logic as the examples we showed above but includes a mechanism to counter the rate limiting issue.    

Have a look at the class by Ctrl+clicking on the LargeLanguageModel name to jump directly to its implementation.

## Exploring the LLM
Now that we have seen how to make a basic call to the Mistral model using the [LargeLanguageModel](../chat_solution/llm.py) class, let's try some more prompts to see how the model responds to different types of queries.

# Exploring and Modifying Prompts
Below are some example use cases of how to use an LLM such as Mistral. Play around with the prompts and see the results. Modify the prompts to see how the model's responses change. This will help you understand how to craft effective prompts and get the desired output from the model.

Try to:
- Ask different types of questions
- Change the text for summarization or extraction (see examples 2 and 3 below)
- Alter the style of the response

#### Example 1: Asking for Information

In [18]:
# Example: Asking for information
prompt = "Can you tell me about coding school 42Berlin?"
response = llm.call(prompt)
print(response)

Certainly! 42Berlin is a coding school that follows the innovative educational model pioneered by 42, a coding school founded in Paris by Xavier Niel, Florian Bucher, Nicolas Sadirac, and Kwame Yamgnane. The school is designed to provide an alternative pathway to learning software development and technology skills, focusing on practical, hands-on experience rather than traditional lectures and exams.

### Key Features of 42Berlin:

1. **Peer-to-Peer Learning**:
   - Students learn primarily through collaboration and peer-to-peer interaction. The school emphasizes teamwork and mutual support.

2. **Project-Based Curriculum**:
   - The curriculum is centered around practical projects that students work on individually or in groups. This approach helps students apply theoretical knowledge to real-world problems.

3. **No Tuition Fees**:
   - 42Berlin is tuition-free, making it accessible to a broader range of students who might not otherwise be able to afford traditional education.

4. **

#### Example 2: Summarizing a Given Text

In [28]:
# Change the text into something else to see the results
text_to_summarize = (
    """
    42Berlin is a non-profit coding school offering software engineering education completely tuition free. 
    By making tech education more accessible and inclusive, they empower the next generation of coders.
    Founded in 2021 and based in central Neukölln, we train our students up to the equivalent of Master’s level 
    and implement peer-learning methodologies that give autonomy to each student.
    """

)
prompt = f"Summarize the following text in one brief sentence: {text_to_summarize}"
response = llm.call(prompt)
print(response)

42Berlin is a tuition-free coding school empowering the next generation of coders through accessible and inclusive tech education.


#### Example 3: Extracting Information from a Given Text

In [29]:
# Change the text into something else to see the results
text_to_extract_from = (
    """
    42Berlin is a non-profit coding school offering software engineering education completely tuition free. 
    By making tech education more accessible and inclusive, they empower the next generation of coders.
    Founded in 2021 and based in central Neukölln, we train our students up to the equivalent of Master’s level 
    and implement peer-learning methodologies that give autonomy to each student.
    """

)
prompt = f"Extract the year 42Berlin was founded from the following text: {text_to_extract_from}"
response = llm.call(prompt)
print(response)

The year 42Berlin was founded is 2021.



# Task 3: Doing Maths -> When will the LLM make a mistake?

LLMs are trained on text language so they are not necessarily good with maths. We can be confident that it will fail to produce the correct result if you ask a question that is complex enough. On the notebook below we increase the complexity of the task at every run. Run it until you see it diverging.

What was the number of iterations necessary to make it break?

In [30]:
if 'a' not in globals() or 'b' not in globals():
    a = 1
    b = 1
a = a + 1
b = b + 3

## ** in python is the power operator meaning: a to the power of b
print("Correct result: ", a**b)
response = llm.call(f"what is the results of {a} ** {b}")
print("Mistral result: ", response)

Correct result:  11398895185373143
Mistral result:  The result of 7 raised to the power of 19 (7^19) is a very large number. Here it is:

7^19 = 40353607130363288641

This number has 19 digits.


As you probably noted, mistral does not only return the results, it also explains it step by step. For example:

```
The result of 4 raised to the power of 4 (4 ** 4) is calculated as:
4 * 4 * 4 * 4 = 256
```

Mistral and other language models deliberatelly add this longer explanations as they help the model commit less mistakes. This pattern is also known as chain of thought reasonsing. And is one of the most robust ways to improve LLM's responses. When you write prompts think about how to explain the steps of the reasonsing to improve the performance of your propmts.


## Hallucination
LLMs sometimes generate responses that are plausible-sounding but factually incorrect or nonsensical. This phenomenon is known as "hallucination".
<br><br>
***Hallucination*** can occur because the model generates text based on patterns in the training data rather than actual knowledge or retrieval of relevant information.
LLMs will produce the most likelly words that they would find in similar text, they have no clue about fact vs fiction. They can confidently produce writing about things they have no clue about.


### Exercise 2: Demonstrating Hallucination

In this exercise, we will ask the model a question that it might to hallucinate an answer for, showing the limitations of relying solely on language generation without retrieval.

Try running the command below a few times in a row and see how the response by the LLM changes.



In [32]:
# Ask a question likely to cause hallucination
prompt = "What was the name of the workshop launched by the MLOps Community Berlin in collaboration with 42 Berlin?"
response = llm.call(prompt)
print("Response likely to hallucinate:\n")
print(response)

Response likely to hallucinate:

The workshop launched by the MLOps Community Berlin in collaboration with 42 Berlin was called "MLOps Workshop." This event aimed to provide participants with hands-on experience and insights into the practices and tools used in the field of Machine Learning Operations (MLOps).


Now, let's move on to the next section on Retrieval-Augmented Generation.

# Retrieval-Augmented Generation (RAG)

Retrieval-Augmented Generation (RAG) is a technique that incorporates external information to a LLM to generate better responses. 

In RAG, a retrieval component searches and retrieves relevant information from a knowledge base or external documents, and a generation component uses this information to generate responses.
This approach allows the model to access up-to-date information and provide more detailed and accurate answers.

### Exercise 4: Simple RAG

In this exercise, we will demonstrate a simple example of how to use Retrieval-Augmented Generation. We will use a predefined set of documents, retrieve relevant information based on a query, and then generate a response using the retrieved information.

Run the code below.

In [33]:
def create_rag_prompt(message: str, context: str):
    """
    Message is the question that the user is asking.
    Context is the information that we want to use to answer the question.
    """
    return f"""Answer the question only using the provided content.

        Context: {context}

        User Question: {message}

        Be helpful and friendly. If the information cannot be found respond with "I don't know"
        """  

By running the code in the cell below, you can compare how our LLM responses differ by the information that you provided.

In [35]:
message = "What was the name of the workshop launched by the MLOps Community Berlin in collaboration with Girls in Tech?"
generic_response = llm.call(message)
print(f"GENERIC RESPONSE:\n {generic_response}")

# The workshop the MLOps Community hosted together with Girls in Tech Germany was called "AI Launchpad: Building Your First Ml Pipeline" or simply "Building Your First ML Pipeline"
# We copy paste the info from our Eventbrite event page from the previous workshop and use this as context for the model to retrieve the right info from
context = """Title: AI Launchpad - Building Your First ML Pipeline: 
On Wednesday, June 5th, 2024, the MLOps Community Berlin in collaboration with Girls in Tech Germany hosted an interactive workshop for beginners who want to kick start their career in AI/ML. 
The workshop starts at 18.00h at 42Berlin. 

🔍 Why Attend?

Gain hands-on experience building your first ML pipeline in an agile way
Apply the fundamentals of statistical modeling and basic Python
Opportunities to improve your portfolio 
Connect with ML professionals at different levels of seniority


✨ The Agenda: 

6:00 pm - Arrive & Pizza 
6:30 pm -  Introduction MLOps and GiT
6:45 pm - Workshop Introduction
7:30 pm - Break
7:45 pm - Workshop
9:45 pm - Networking


🎉 Highlights:

Food and drinks provided
Engaging discussions and networking opportunities
Bring your laptop and get ready to learn!


💼 Who Should Attend?

Individuals starting their career in Machine Learning or Artificial Intelligence
Those looking to transition into the field of AI/ML
Anyone interested in contributing to and learning from the ML community
Don't miss out on this chance to gain practical AI/ML skills while expanding your professional network! 
"""


rag_prompt = create_rag_prompt(message=message, context=context)
rag_response = llm.call(rag_prompt)

print("-" * 30)
print(f"RAG RESPONSE:\n {rag_response}")

GENERIC RESPONSE:
 The workshop launched by the MLOps Community Berlin in collaboration with Girls in Tech was named "MLOps for Women." This workshop aimed to empower women in the field of machine learning operations (MLOps) by providing them with practical skills, knowledge, and networking opportunities.
------------------------------
RAG RESPONSE:
 The name of the workshop launched by the MLOps Community Berlin in collaboration with Girls in Tech is "AI Launchpad - Building Your First ML Pipeline."


In [36]:
# Let's try the same thing with Berlin weather data!
context = """
The weather in Berlin  December of 2027 will be around 13 degrees Celsius.
Specific dates:
- 10th of December: 10 degrees Celsius
- 15th of December: 15 degrees Celsius
- 20th of December: 7 degrees Celsius
"""

message = "What will be the weather in Berlin on the 10th of December of 2027?"


generic_response = llm.call(message)
print(f"GENERIC RESPONSE:\n {generic_response}")

rag_prompt = create_rag_prompt(message=message, context=context)
rag_response = llm.call(rag_prompt)

print("-" * 30)
print(f"RAG RESPONSE:\n {rag_response}")

GENERIC RESPONSE:
 I'm an assistant that operates solely on the data it has been trained on up until 2021, and I don't have real-time or future weather data. Therefore, I can't provide the weather forecast for Berlin on the 10th of December, 2027. For the most accurate and up-to-date weather information, I recommend checking a reliable weather website or application closer to the date.
------------------------------
RAG RESPONSE:
 On the 10th of December, 2027, the weather in Berlin is expected to be around 10 degrees Celsius.


# Task 4
Now try it yourself! Can you find some content on the internet (think, for example, recent news articles or very specific, locally relevant information that the LLM normally would not have access to). 

Play around with it and let the creative juices flow. Can you discover some more use cases for which you can use RAG can help make our LLM smarter?

In [37]:
context = """
Fanny's nickname is squirtle earmuff.
""" 

message = """
What is Fanny's nickname?
""" 

generic_response = llm.call(message)
print(f"GENERIC RESPONSE:\n {generic_response}")

rag_prompt = create_rag_prompt(message=message, context=context)
rag_response = llm.call(rag_prompt)

print("-" * 30)
print(f"RAG RESPONSE:\n {rag_response}")

GENERIC RESPONSE:
 I'm sorry for any confusion, but I don't have any specific information about a person named Fanny or her nickname. Nicknames can be very personal and specific to the individual and their social circle. If you have more context or details, I might be able to help you better.
------------------------------
RAG RESPONSE:
 Fanny's nickname is Squirtle Earmuff.


## Prompt Engineering

If you want to make your prompting even more advanced, you can look into Prompt Engineering best practices. <br><br>
***Prompt Engineering*** is the process of designing and refining LLM prompts. It involves crafting questions, instructions, or statements that guide the model to produce desired outputs. Effective prompt engineering can significantly enhance the quality, relevance, and accuracy of the responses generated by the model.  

Best practices change over time and are somewhat different depending on the LLM type, but general guidelines are:

* **Clarity and Specificity:** Clearly state what you want the model to do. Avoid ambiguous language.
* **Context and Detail:** Provide sufficient context to guide the model.
* **Iterative Refinement:** Experiment with different phrasings and structures to see what works best.
* **Constraints and Instructions:** Specify any constraints or formats you want the response to follow.
* **Examples and Templates:** Use examples to show the model what kind of response you expect.
* **Step-by-Step Reasoning:** Instead of providing a direct answer, aske the model to explain its thought process step-by-step. 
* **Breaking Down Complex Tasks:** For tasks that require multiple stages of reasoning or involve complex information, breaking them down into smaller, separated steps can lead to better results.

You can learn more about prompt engineering [here](https://www.promptingguide.ai/). 
Some very useful examples of best practices for OpenAI GPT models can be found [here](https://platform.openai.com/docs/guides/prompt-engineering)

# That's it! 

RAGs enrich the prompt with additional information about the topic to generate responses. The external information can come from various sources, such as PDFs, Google search results, social media posts, and more. With that, we’ve built a simple Q&A RAG.

## ===> Now head to notebook 2 on embedddings and how llms undertand language

