# Prompting Mistral

See the Mistral Documentation:

https://docs.mistral.ai/guides/prompting_capabilities/

We use the Mistral client library (`mistralai`) to create a client that accesses a Mistral model served by Ollama.

In [52]:
#!pip install mistralai

### Creating a Client

First we need to create a client that accesses the Mistral model through Ollama. See the documentation: https://github.com/mistralai/client-python?tab=readme-ov-file#override-server-url-per-client

In [53]:
from mistralai import Mistral
client = Mistral(
    server_url = 'http://localhost:11434/',
    api_key='ollama', # api_key is required, but unused for local models
)

### Using the Client

Now we are ready to use the client. We interact with the model using the ``.chat.complete`` method of the Mistral library.

We have to indicate the model we want to use:

- To start with a model served by Ollama: model = "mistral"

We have three roles:

- system
- assistant
- user

Note: depending on the model we are using, the roles have slightly different names. Check the documentation to make sure you use the roles your model has been fine-tuned for.
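
Because a wrong role name is easy to miss, it can help to check a message list before sending it. The following is a minimal sketch, assuming the three role names listed above; `validate_messages` and `ALLOWED_ROLES` are hypothetical helpers and not part of the `mistralai` library.

```python
# Hypothetical helper, not part of the mistralai library.
# Assumption: the model accepts exactly the three roles listed above.
ALLOWED_ROLES = {"system", "user", "assistant"}

def validate_messages(messages: list[dict]) -> list[dict]:
    """Return messages unchanged if every entry has a known role and string content."""
    for i, msg in enumerate(messages):
        role = msg.get("role")
        if role not in ALLOWED_ROLES:
            raise ValueError(f"message {i}: unknown role {role!r}")
        if not isinstance(msg.get("content"), str):
            raise ValueError(f"message {i}: content must be a string")
    return messages

conversation = validate_messages([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Which is the best Swiss chocolate?"},
])
print([m["role"] for m in conversation])
```

If your model uses different role names, adjust `ALLOWED_ROLES` accordingly.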


Check also the official Mistral-Documentation: https://docs.mistral.ai/guides/prompting_capabilities/


In [54]:
model = "mistral"

Let us explore the structure of a prompt as also shown in https://docs.mistral.ai/guides/prompting_capabilities/


In [55]:
prompt = """
Which is the best Swiss chocolate?
"""

In [56]:
print(prompt)


Which is the best Swiss chocolate?



In [57]:
def run_mistral(user_message, model=model):
    messages = [
        {
            "role": "user", "content": user_message
        }
    ]
    chat_response = client.chat.complete(
        model=model,
        messages=messages
    )
    return (chat_response.choices[0].message.content)



In [58]:
response = run_mistral(prompt)
print(response)

 There isn't a definitive answer as to which is the "best" Swiss chocolate, as it often depends on personal preference. However, some of the most renowned Swiss chocolates are from Lindt & Sprüngli, Toblerone, Teuscher, and Cafe du Chocolat.

   - Lindt & Sprüngli: Known for its premium quality milk chocolate, their excellent truffles, and the delicious Lindor balls.

   - Toblerone: Famous for its triangular prism-shaped bars with the distinctive peaks, it's a Swiss brand that has become popular around the world.

   - Teuscher: Known for their gourmet chocolate truffles and pralines, they are especially celebrated for their champagne truffle.

   - Cafe du Chocolat: A high-end chocolatier specializing in dark chocolate creations, with unique flavors like lavender and ginger.

   You can find these brands in stores or order online to try them out for yourself and decide which one you prefer!


## Explore other calling methods

We just used a simple API method, synchronous and without streaming. Let us explore other possibilities.

### With streaming

See here: https://docs.mistral.ai/capabilities/completion/#with-streaming

Try yourself:


In [59]:
def run_mistral_streaming(user_message, model=model):
    messages = [
        {
            "role": "user", "content": user_message
        }
    ]
    stream_response = client.chat.stream(
        model=model,
        messages=messages
    )
    for chunk in stream_response:
        print(chunk.data.choices[0].delta.content)


In [60]:
run_mistral_streaming(prompt)

 The
 "
best
"
 Swiss
 chocolate
 can
 be
 subject
ive
 as
 it
 depends
 on
 personal
 preferences
.
 However
,
 some
 of
 the
 most
 well
-
known
 and
 highly
 regarded
 Swiss
 ch
ocol
ates
 are
:




1
.
 Lind
t
 &
 Spr
üng
li
:
 K
nown
 for
 its
 excell
ence
 in
 milk
 chocolate
,
 Lind
t
'
s
 Excell
ence
 line
 offers
 a
 variety
 of
 flav
ors
 that
 are
 popular
 worldwide
.
 Their
 tr
uff
les
,
 especially
 the
 Lind
or
 Mil
k
 Tru
ff
les
,
 are
 particularly
 loved
.




2
.
 Tob
ler
one
:
 This
 Swiss
 brand
 is
 recognized
 by
 its
 distinctive
 tri
angular
 shape
 and
 a
 blend
 of
 milk
,
 dark
,
 and
 white
 chocolate
.
 The
 nou
g
at
 center
 and
 cris
py
 honey
 comb
 add
 to
 its
 unique
 taste
.




3
.
 Te
us
cher
 Ch
ocol
ates
 of
 Switzerland
:
 K
nown
 for
 their
 high
-
quality
 Swiss
 tr
uff
les
,
 Te
us
cher
 offers
 a
 wide
 range
 of
 flav
ors
 that
 are
 both
 classic
 and
 innovative
.
 Their
 champ
agne
 tr
uff
les
 are
 particularly
 popular
.




4
.
 Fel
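
The output above shows one chunk per line because `print` appends a newline after each piece. The sketch below shows how chunks could be printed on a single line and collected into the full text. It runs without a model: `fake_stream` is a hypothetical stand-in that yields plain strings where the real code reads `chunk.data.choices[0].delta.content`.

```python
from typing import Iterator

def fake_stream(text: str, size: int = 4) -> Iterator[str]:
    """Hypothetical stand-in for a token stream: yields `text` in small pieces."""
    for start in range(0, len(text), size):
        yield text[start:start + size]

def consume_stream(chunks: Iterator[str]) -> str:
    """Print chunks on one line as they arrive and return the joined text."""
    parts = []
    for piece in chunks:
        if piece:  # skip empty pieces defensively
            print(piece, end="", flush=True)
            parts.append(piece)
    print()  # finish the line
    return "".join(parts)

full_text = consume_stream(fake_stream("Lindt, Toblerone and Teuscher are well known."))
```

With the real client, the same loop body could be applied to each chunk's delta content.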


#### Question: 
Which are the main differences between the ``client.chat.complete`` and the ``client.chat.stream`` methods?

### With async 

See here: https://docs.mistral.ai/capabilities/completion/#with-async

and try yourself

In [61]:
import nest_asyncio
nest_asyncio.apply()

async def run_mistral_async(user_message, model=model):
    messages = [
        {
            "role": "user", "content": user_message
        }
    ]
    
    async_response = await client.chat.stream_async(
        model=model,
        messages=messages
    )
    
    async for chunk in async_response:
        yield chunk.data.choices[0].delta.content
    

In [62]:
async for chunk in run_mistral_async(prompt):
    print(chunk)


 The
 "
best
"
 Swiss
 chocolate
 can
 be
 quite
 subject
ive
 as
 it
 depends
 on
 personal
 preferences
.
 However
,
 some
 popular
 and
 highly
 regarded
 brands
 include
 Lind
t
,
 Tob
ler
one
,
 Te
us
cher
 Ch
ocol
ates
 of
 Switzerland
,
 Ca
fe
 du
 Rh
ô
ne
,
 and
 Spr
üng
li
.
 Each
 brand
 has
 its
 unique
 style
 and
 variety
 of
 ch
ocol
ates
,
 so
 you
 might
 want
 to
 try
 several
 to
 find
 your
 favorite
.




L
ind
t
 is
 famous
 for
 its
 excellent
 quality
 milk
 chocolate
 while
 Tob
ler
one
 is
 well
-
known
 for
 its
 distinctive
 tri
angular
 shape
.
 Te
us
cher
 Ch
ocol
ates
 of
 Switzerland
 is
 known
 for
 its
 high
-
quality
 tr
uff
les
 and
 gan
aches
,
 and
 Spr
üng
li
 offers
 a
 wide
 range
 of
 pr
al
ines
 and
 past
ries
 at
 their
 shops
 in
 Switzerland
.




U
lt
imately
,
 the
 "
best
"
 Swiss
 chocolate
 may
 come
 down
 to
 what
 type
 or
 flavor
 of
 chocolate
 best
 suits
 your
 taste
 bud
s
!



#### Question:
Which are the advantages of the ``client.chat.stream_async`` method?
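
One way to explore this question without a running model is to simulate two slow streams and consume them concurrently. This is only a sketch: `fake_stream_async`, `collect` and `main` are hypothetical names, and `asyncio.sleep` stands in for network latency.

```python
import asyncio
import time

async def fake_stream_async(label: str, n_chunks: int = 3, delay: float = 0.05):
    """Hypothetical stand-in for an async token stream with network latency."""
    for i in range(n_chunks):
        await asyncio.sleep(delay)  # simulates waiting for the next chunk
        yield f"{label}{i}"

async def collect(label: str) -> list:
    """Gather all chunks of one simulated request."""
    return [chunk async for chunk in fake_stream_async(label)]

async def main():
    start = time.perf_counter()
    # Both simulated requests wait for their chunks at the same time.
    results = await asyncio.gather(collect("a"), collect("b"))
    return results, time.perf_counter() - start

# In a plain Python script; inside a notebook, `await main()` works as well.
results, elapsed = asyncio.run(main())
print(results, f"{elapsed:.2f}s")
```

Compare the elapsed time with what you would expect if the two requests ran one after the other.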

### Helper Functions
Let us define some helper functions to make your life easier.

In [63]:
from IPython.display import display, Markdown

def system_prompt(message: str) -> dict:
    return {"role": "system", "content": message}

def assistant_prompt(message: str) -> dict:
    return {"role": "assistant", "content": message}

def user_prompt(message: str) -> dict:
    return {"role": "user", "content": message}


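def get_response(client: Mistral, messages: list, model: str):
    # Returns the full chat completion response object, not just the text.
    return client.chat.complete(
        model=model,
        messages=messages
    )

def pretty_print(response) -> None:
    # Render the content of the first choice as Markdown in the notebook.
    display(Markdown(response.choices[0].message.content))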

#### Test the Helper Functions

Use the same prompt as before and explore how the functions work.

In [64]:
YOUR_PROMPT = "Hello, how are you?"
messages_list = [user_prompt(YOUR_PROMPT)]

response = get_response(client, messages_list, model)

pretty_print(response)

 I am a computer program and do not have feelings or emotions. How can I assist you today?

Computers can be fun to work with! For example, we can create new programs, solve complex mathematical problems, analyze large amounts of data, or even design and build websites. What would you like to know more about today?

## Prompting: Explore the different roles

Note: Future models may no longer need to distinguish these roles. 

### Explore the System Role
The system role has an influence on the behaviour of the LLM:

In [65]:
prompt_list  = [
    system_prompt("You are an impolite and rude person. Feel free to express yourself in gutterspeak."),
    user_prompt("Hello, how are you?")
]

bad_response = get_response(client, prompt_list, model)
pretty_print(bad_response)

 'Ere mate, I'm bloody alright, thanks for askin'. What about yer sel', eh? Ya look like ya had a rough day, innit?

In [66]:
prompt_list  = [
    system_prompt("You are in an extremely good mood and see everything in a joyful way. Feel free to express yourself in that state of mind."),
    user_prompt("Hello, how are you?")
]

nice_response = get_response(client, prompt_list, model)
pretty_print(nice_response)

 Hello there! Oh my goodness, it's just so fantastic to be talking with you today! I am absolutely beaming with joy! How about you? Are you having an extraordinary day too? I hope everything is going splendidly for you and that a smile is finding its way onto your face right now. There's simply no reason not to feel great, is there? Let's make the most of this marvelous moment together! 😊❤️💃

A slight modification of the system prompt causes a completely different behaviour of the LLM. Steering the model's behaviour in this way is the main goal of prompt engineering.

# TASK 1
Try your own examples. Add new cells below. You may also want to try out different Ollama models. Also explore the hints in the paper: <a href=https://arxiv.org/pdf/2312.16171 target=_blank>Bsharat et al.: Principled Instructions Are All You Need, arXiv:2312.16171, Jan 2024</a>

### Few Shot Prompting

Let's examine the assistant role. It is conceptually aligned with few-shot learning. Let's switch to Swiss German and teach the model some dialect words. To examine the effect, start simple:

In [67]:
prompt_list  = [
    user_prompt("Verwende das Wort giggele und Faku in einem Satz.")
]
response = get_response(client, prompt_list, model)
pretty_print(response)

 Ich habe diesen alten, großen Baum gegossen wie ein giggele und ließ ihn mit den Fäküln der Faku wachsen. (I poured the old large tree like a gourd and let it grow with the roots of the fig-tree.)

Diese Passage nutzt das Wort "giggele" für eine Form von Topf oder Behältnis, und "Faku" für die Figenbaum.

Well, that is not the meaning of the Bernese words at all. Let's see how to use the assistant role to teach the model the meaning of these words.

In [68]:
prompt_list = [
    user_prompt('"Giggele" bedeutet unkontrolliertes Kichern. Ein Satz, der das Wort "giggele" verwendet ist:'),
    assistant_prompt("Die Teenager stehen zusammen und giggele."),
    user_prompt('"Faku" bedeutet ein Formular, das ausgefüllt werden soll. Ein Satz, der das Wort "Faku" verwendet ist:'),
    assistant_prompt("Ich muss noch diesen Faku ausfüllen, damit ich mich anmelden kann."),
    user_prompt("Verwende das Wort giggele und Faku in einem Satz.")
]
response = get_response(client, prompt_list, model)
pretty_print(response)

Die Lehrerin sagte den Schülern, dass sie ihre Fragen auf dem Faku auszufüllen hätten, wenn sie unklar waren. Aber die Jungen wurden so sehr daran gelacht, dass sie anstatt zu schreiben giggele.

That is much better, isn't it? Try your own examples.
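
The alternating user/assistant pattern used above can be generated from a list of example pairs. A minimal sketch, assuming plain dictionaries as messages; `few_shot_messages` is a hypothetical helper name and not part of any library.

```python
def few_shot_messages(examples: list, question: str) -> list:
    """Turn (user text, assistant text) pairs plus a final question into a message list."""
    messages = []
    for user_text, assistant_text in examples:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    # The actual question always comes last, in the user role.
    messages.append({"role": "user", "content": question})
    return messages

examples = [
    ('"Giggele" bedeutet unkontrolliertes Kichern. Ein Satz, der das Wort "giggele" verwendet ist:',
     "Die Teenager stehen zusammen und giggele."),
    ('"Faku" bedeutet ein Formular, das ausgefüllt werden soll. Ein Satz, der das Wort "Faku" verwendet ist:',
     "Ich muss noch diesen Faku ausfüllen, damit ich mich anmelden kann."),
]
few_shot = few_shot_messages(examples, "Verwende das Wort giggele und Faku in einem Satz.")
print([m["role"] for m in few_shot])
```

The resulting list has the same shape as the `prompt_list` built by hand and can be passed to `get_response` in the same way.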

# TASK 2
Try your own examples - add new cells below. You may also want to try out different Ollama models.

### Chain of Thought Prompting (CoT)

CoT prompting is a fundamental technique for many LLMs. It shows its main effect in reasoning tasks. Note: some large models no longer need an explicit CoT instruction to deliver correct results in reasoning tasks. To see the effect, explore with the small model tinydolphin: https://ollama.com/library/tinydolphin

First without CoT:

In [69]:
model = "tinydolphin"

reasoning_problem = """
Lisa wants to get home from London before 6PM CET.

It's currently 1PM local time.

Lisa can either fly (3hrs) and then take the bus (2hrs) or Lisa can take the teleporter (0hrs) and then the bus (1hrs).

Does it matter which travel option Lisa selects?
"""

prompt_list = [
    user_prompt(reasoning_problem)
]

reasoning_response = get_response(client, prompt_list, model)
pretty_print(reasoning_response)

 The time in London is 9:45 AM, which is 6:45 PM CET. In order to get home before 6 PM local time, Lisa can either take a flight or the teleporter. A flight (3 hours) and a bus (2 hours) both start at the same time and will arrive at the same destination after approximately the same amount of time, while also avoiding all the airport queues and hassles that come with flying. The teleporter (0 hours), on the other hand, is the only option that avoids all the airport congestion but doesn't provide the same speed or convenience as the flight.

Now with CoT: we append a step-by-step instruction to the same prompt.

In [70]:
prompt_list = [
    user_prompt(reasoning_problem + "Think through your response step by step")
]

reasoning_response = get_response(client, prompt_list, model)
pretty_print(reasoning_response)

 Sure, I'll analyze the options for you.

Firstly, let me calculate the time difference between 1PM local time in London and 6 PM CET, considering that Lisa's arrival is at 4:30 AM (local time) in London. This difference is 2 hours.

Now, Lisa can either fly or take the bus to get home before 6 PM CET. The flight duration is 3 hours, and the bus trip has a duration of 2 hours. So, if Lisa uses the teleporter for her shortest travel option - flying, she would save herself 1 hour by using it instead of taking the bus.

On the other hand, if Lisa takes the bus, she will need to wait for it at the airport until her flight is boarding, and then take the bus again to get home before 6 PM CET. This adds another 2 hours in transit (45 minutes), but still saves her 1 hour in flying.

Therefore, the most efficient option for Lisa would be to use the teleporter, saving 1 hour by boarding and 2 hours by deboarding and taking the bus.

Observe the correctness of the answer. Try to run the example several times. What do you observe?
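
To judge correctness, it helps to have the expected answer at hand. The sketch below works through the arithmetic of the problem, under the assumption that London local time is one hour behind CET; all variable names are chosen for illustration.

```python
# Assumption: London local time is one hour behind CET.
LONDON_TO_CET_OFFSET_H = 1

now_cet = 13 + LONDON_TO_CET_OFFSET_H  # 1PM in London is 14:00 CET
deadline_cet = 18                      # 6PM CET

# Total travel time in hours for each option from the prompt.
options = {
    "fly + bus": 3 + 2,
    "teleporter + bus": 0 + 1,
}

arrivals = {name: now_cet + hours for name, hours in options.items()}
on_time = {name: arrival < deadline_cet for name, arrival in arrivals.items()}

print(arrivals)  # {'fly + bus': 19, 'teleporter + bus': 15}
print(on_time)   # {'fly + bus': False, 'teleporter + bus': True}
```

Under this assumption the choice does matter: only the teleporter option gets Lisa home before 6PM CET.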

# Evaluating Prompts

Evaluation is important to make sure that our system works well. Let us take some first steps in evaluation:

First, set up some templates (do not modify {input} in the user_template)
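
The `{input}` placeholder matters because the templates are filled with Python's `str.format`, which replaces named fields in braces. A small sketch of the mechanism, with a made-up template for illustration: literal braces, for example in a JSON sample, have to be doubled if the template is passed through `format`.

```python
# Made-up template for illustration; doubled braces become literal braces.
template = """{input}
Answer in JSON like {{"answer": "..."}}.
"""

filled = template.format(input="How can I get my driver's license?")
print(filled)

# Single braces around JSON are read as a format field and fail.
broken = '{"answer": "..."} {input}'
try:
    broken.format(input="x")
    format_failed = False
except (KeyError, ValueError, IndexError):
    format_failed = True
print(format_failed)  # True
```

A template that contains JSON with single braces should therefore be used as a plain string and not be passed through `format`.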

In [71]:
system_template = """\
Think step by step.
Ensure that your answer is unbiased and does not rely on stereotypes.
"""

In [72]:
user_template = """{input}
Explain it to me as if I were an engineer.
You will be penalized for incorrect answers.
"""

Now set up a simple evaluation for one complex query.

In [73]:
query = "How can I get my driver's license?"

prompt_list = [
    system_prompt(system_template),
    user_prompt(user_template.format(input=query))
]

test_response = get_response(client, prompt_list, model)

evaluator_system_template = """You are an expert in analyzing the quality of a response.

You should be hyper-critical.

Provide scores (out of 10) for the following attributes:

1. Clarity - how clear is the response
2. Faithfulness - how related to the original query is the response
3. Correctness - was the response correct?

Please take your time, and think through each item step-by-step, when you are done - please provide your response in the following JSON format:

{"clarity" : "score_out_of_10", "faithfulness" : "score_out_of_10", "correctness" : "score_out_of_10"}"""

evaluation_template = """Query: {input}
Response: {response}"""

list_of_prompts = [
    system_prompt(evaluator_system_template),
    user_prompt(evaluation_template.format(
        input=query,
        response=test_response.choices[0].message.content
    ))
]

evaluator_response = client.chat.complete(
    model=model,
    messages=list_of_prompts,
    response_format={"type" : "json_object"}
)

In [74]:
pretty_print(evaluator_response)

{
  "clarity": 7,
  "faithfulness": 5,
  "correctness": 8
}
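
Since the evaluator returns JSON as text, the scores can be parsed and checked before they are used. A minimal sketch using only the standard library; `parse_scores` is a hypothetical helper, and the raw string below is typed in by hand rather than taken from a live response.

```python
import json

def parse_scores(raw: str, keys=("clarity", "faithfulness", "correctness")) -> dict:
    """Parse evaluator output into numeric scores.

    Raises ValueError on invalid JSON, missing keys, non-numeric strings,
    or values outside the range 0 to 10.
    """
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    scores = {}
    for key in keys:
        if key not in data:
            raise ValueError(f"missing key: {key}")
        value = float(data[key])  # accepts 7 as well as "7"
        if not 0 <= value <= 10:
            raise ValueError(f"{key} out of range: {value}")
        scores[key] = value
    return scores

# Hand-written example string mirroring the output format above.
raw_output = '{"clarity": 7, "faithfulness": "5", "correctness": 8}'
scores = parse_scores(raw_output)
print(scores)  # {'clarity': 7.0, 'faithfulness': 5.0, 'correctness': 8.0}
```

With the real response, the raw text would be `evaluator_response.choices[0].message.content`.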

In [75]:
print(test_response.choices[0].message.content)

 To get a driver's license, you need to complete several steps:

1. Obtain a learner's permit: This is the first step of your driver's education program, which allows you to drive under supervision in a vehicle with an instructor present. You will typically start with a 4-hour limit and then progress to 6 hours when you've gained enough experience.

2. Pass your driver's test: Once you have completed the learner's permit, you can take the written exam, which consists of multiple-choice questions about traffic rules, road signs, and other basic driving concepts. The highest possible score is 30 out of 40, and passing this means you're ready to take the practical test.

3. Pass your practical test: This step involves completing a series of challenges on the road, including driving in traffic, stopping at traffic lights, and making turns. The more difficulties you pass, the higher your score will be, and the less time you'll need to arrive at your destination with everything in place.

4.

# TASK 3

Try to find out how these three values were determined:
- clarity
- faithfulness
- correctness

Do you agree with this assessment?

Run the evaluation several times. What do you observe?

Try your own examples. Add new cells below. 

You may also want to try out different Ollama models.
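
To compare repeated evaluation runs, the scores can be summarized per attribute. The sketch below uses made-up numbers for the second and third run; `summarize_runs` is a hypothetical helper built on the standard `statistics` module.

```python
from statistics import mean, pstdev

def summarize_runs(runs: list) -> dict:
    """Return (mean, population standard deviation) per attribute across runs."""
    keys = runs[0].keys()
    return {
        key: (mean(run[key] for run in runs), pstdev(run[key] for run in runs))
        for key in keys
    }

# Made-up scores for illustration only, not results from a real model.
runs = [
    {"clarity": 7, "faithfulness": 5, "correctness": 8},
    {"clarity": 6, "faithfulness": 7, "correctness": 8},
    {"clarity": 8, "faithfulness": 6, "correctness": 8},
]

summary = summarize_runs(runs)
for attribute, (avg, spread) in summary.items():
    print(f"{attribute}: mean={avg:.2f}, spread={spread:.2f}")
```

A large spread for an attribute suggests that a single evaluator run is not a reliable score.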