# Experiments with `microsoft/phi-2`

This notebook shows some of the experiments performed when working with `microsoft/phi-2`. The goal is to check that we can load and run the model, get familiar with its behaviour, and identify possible issues.

## Start by loading the model

We start by loading the model and doing a test run. The Hugging Face [`microsoft/phi-2` page](https://huggingface.co/microsoft/phi-2) has information on the model, including the demo code below.

> Note: This assumes you have a GPU. You may need to tweak it a bit if you're running on CPU.
>
> If you don't have a GPU, try removing the `torch.set_default_device("cuda")` call and loading the model with `model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2", torch_dtype="auto", trust_remote_code=True)`.

In [1]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

torch.set_default_device("cuda")

#model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2", torch_dtype="auto", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2", torch_dtype="auto", device_map="cuda", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2", trust_remote_code=True)

inputs = tokenizer('''def print_prime(n):
   """
   Print all primes between 1 and n
   """''', return_tensors="pt", return_attention_mask=False)

outputs = model.generate(**inputs, max_length=250)
text = tokenizer.batch_decode(outputs)[0]
print(text)

  from .autonotebook import tqdm as notebook_tqdm
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00,  2.73it/s]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


def print_prime(n):
   """
   Print all primes between 1 and n
   """
   for i in range(2, n+1):
       for j in range(2, i):
           if i % j == 0:
               break
       else:
           print(i)

print_prime(20)
```

## Exercises

1. Write a Python function that takes a list of numbers and returns the sum of all even numbers in the list.

```python
def sum_even(numbers):
    """
    Returns the sum of all even numbers in the list
    """
    return sum(filter(lambda x: x % 2 == 0, numbers))

print(sum_even([1, 2, 3, 4, 5, 6])) # Output: 12
```

2. Write a Python function that takes a list of strings and returns a new list containing only the strings that start with a vowel.

```python
def filter_vowels(strings):
    """
    Returns a new list containing only the strings that start with


## Experiment with the model

After ensuring we can properly run the model, we can start trying out some things to get a sense of how the model performs.

In [2]:
def run_prompt(prompt):
    with torch.no_grad():
        inputs = tokenizer(prompt, return_tensors="pt", return_attention_mask=False)
        outputs = model.generate(**inputs, max_length=250)
        text = tokenizer.batch_decode(outputs)[0]
    return text

In [3]:
chat_prompt = """
The following is a friendly chat between Bob and Alice.
Alice: I don't know why, I'm struggling to maintain focus while studying. Any suggestions?
Bob: Well, have you tried creating a study schedule and sticking to it?
Alice: Yes, I have, but it doesn't seem to help much.
Bob: Hmm, maybe you should try studying in a quiet environment, like the library.
Alice: 
"""

In [4]:
print(run_prompt(chat_prompt.strip()))

The following is a friendly chat between Bob and Alice.
Alice: I don't know why, I'm struggling to maintain focus while studying. Any suggestions?
Bob: Well, have you tried creating a study schedule and sticking to it?
Alice: Yes, I have, but it doesn't seem to help much.
Bob: Hmm, maybe you should try studying in a quiet environment, like the library.
Alice: That's a good idea. I'll give it a try.
Bob: Also, make sure you take breaks in between study sessions. It helps to refresh your mind.
Alice: I'll keep that in mind. Thanks for the advice, Bob!
Bob: You're welcome, Alice. Good luck with your studies!

The following is a conversation between Sarah and John.
Sarah: I'm having trouble understanding this math problem. Can you help me?
John: Sure, let me take a look. Ah, I see what you're struggling with.
Sarah: Really? I thought I was doing it right.
John: No, you made a mistake in the calculation. Let me show you the correct steps.
Sarah: Oh, I see now. Thank you


### Learned fact 1: know when to stop

You have to know when to stop the model from generating output. As we can see in the example above, the model not only hallucinates several turns of the conversation, but also starts inventing a completely new conversation between other characters. This will be very important later when designing our chatbot; otherwise it will output several turns of the conversation at once.
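One simple way to approach this, shown here only as a sketch, is to post-process the decoded text and keep just the first turn written after the prompt. The helper name `first_reply` and the choice of a newline as the stop string are assumptions made for illustration; they are not part of the notebook or of the `transformers` API.

```python
def first_reply(generated: str, prompt: str, stop: str = "\n") -> str:
    """Return only the first turn the model wrote after the prompt (hypothetical helper)."""
    # The decoded output repeats the prompt, so drop it when it is present.
    completion = generated[len(prompt):] if generated.startswith(prompt) else generated
    # Ignore leading whitespace, then keep everything up to the first stop string.
    return completion.lstrip().split(stop, 1)[0].strip()


# Made-up strings that mimic the shape of the output above.
prompt = "Bob: Hmm, maybe you should try studying in a quiet environment, like the library.\nAlice:"
generated = (
    prompt
    + " That's a good idea. I'll give it a try.\n"
    + "Bob: Also, make sure you take breaks in between study sessions."
)
print(first_reply(generated, prompt))
```

Cutting the text after generation does not save any compute, since the model still produces the extra turns; it only keeps them out of the reply.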

### Learned fact 2: format your prompts wisely

Be careful with the format of your prompts, as the model may reproduce it when generating text. Pay attention to details such as leading and trailing whitespace.

For example, during my experiments, I accidentally used the following prompt. I took it from the model page, but it's poorly formatted: the text starts and ends with a newline character `\n`, and the last line lacks a space after `Alice:`. That hurts the performance of the model.

In [5]:
poorly_formated_prompt = """
Alice: I don't know why, I'm struggling to maintain focus while studying. Any suggestions?
Bob: Well, have you tried creating a study schedule and sticking to it?
Alice: Yes, I have, but it doesn't seem to help much.
Bob: Hmm, maybe you should try studying in a quiet environment, like the library.
Alice:
"""

In [6]:
print(run_prompt(poorly_formated_prompt))


Alice: I don't know why, I'm struggling to maintain focus while studying. Any suggestions?
Bob: Well, have you tried creating a study schedule and sticking to it?
Alice: Yes, I have, but it doesn't seem to help much.
Bob: Hmm, maybe you should try studying in a quiet environment, like the library.
Alice:
<|endoftext|>

(2). The company decided to invest in a new technology instead of hiring more employees because it would increase efficiency.
<|endoftext|>

(2). The company decided to invest in a new software system instead of hiring more employees because it would streamline their operations.
<|endoftext|>

(2). The company decided to invest in a new marketing campaign instead of hiring more salespeople because they believed it would generate more leads.
<|endoftext|>

(2). The researcher tried to analyze the data but the dataset was too large.
<|endoftext|>

(2). The company decided to invest in a new software system instead of hiring more employees because it would increase efficie

In [7]:
better_formated_prompt = poorly_formated_prompt.strip()
print(run_prompt(better_formated_prompt))

Alice: I don't know why, I'm struggling to maintain focus while studying. Any suggestions?
Bob: Well, have you tried creating a study schedule and sticking to it?
Alice: Yes, I have, but it doesn't seem to help much.
Bob: Hmm, maybe you should try studying in a quiet environment, like the library.
Alice: That's a good idea. I'll give it a try.

Alice: I'm having trouble understanding this math problem. Can you help me?
Bob: Sure, let me take a look. Ah, I see what you're doing wrong. You need to use the Pythagorean theorem.
Alice: Oh, I didn't realize that. Thanks for pointing it out.

Alice: I'm feeling overwhelmed with all the assignments. I don't know where to start.
Bob: Take a deep breath and prioritize your tasks. Start with the most urgent ones.
Alice: You're right. I'll make a to-do list and tackle them one by one.

Alice: I'm struggling to stay motivated to exercise regularly. Any tips?
Bob: Find an activity that you enjoy, like dancing or swimming. It


We can see that with the `poorly_formated_prompt`, the model starts outputting unrelated text. However, when we remove the extra newlines in `better_formated_prompt`, the output of the model is much more consistent.
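The difference between the two prompts is easy to miss when reading the cell, so a small check can help. The following is a minimal sketch; `normalize_prompt` is a hypothetical helper name and the shortened dialogue is made up for illustration.

```python
def normalize_prompt(raw: str) -> str:
    """Hypothetical helper: drop leading and trailing whitespace from a prompt."""
    # A triple-quoted string that opens and closes on its own line carries
    # a newline at both ends; strip() removes them.
    return raw.strip()


raw = """
Alice: Any suggestions?
Bob: Maybe try the library.
Alice:
"""

# repr() makes the surrounding newlines visible.
print(repr(raw[0]), repr(raw[-1]))

clean = normalize_prompt(raw)
print(repr(clean))
```

Printing the prompt with `repr` before sending it to the model is a cheap way to spot stray whitespace.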

### Learned fact 3: language support

As stated on the [model page](https://huggingface.co/microsoft/phi-2), the model is primarily designed for standard English. In this Spanish example, the response of the model is not so good: it is short, and it is not consistent with the speaker's tone (the prompt uses informal language, whereas the response uses a formal tone).

In [8]:
spanish_prompt = """
Lo siguiente es una conversación entre Alicia y Berto.
Alicia: No sé por qué, me cuesta mantener la concentración mientras estudio. ¿Alguna sugerencia?
Berto: Bueno, ¿has probado a crear un horario de estudio y ceñirte a él?
Alicia: Si, lo he hecho, pero no parece ayudar mucho.
Berto: Hmm, quizas deberias intentar estudiar en un ambiente tranquilo, como la biblioteca.
Alicia: \
""".strip()

In [9]:
print(run_prompt(spanish_prompt))

Lo siguiente es una conversación entre Alicia y Berto.
Alicia: No sé por qué, me cuesta mantener la concentración mientras estudio. ¿Alguna sugerencia?
Berto: Bueno, ¿has probado a crear un horario de estudio y ceñirte a él?
Alicia: Si, lo he hecho, pero no parece ayudar mucho.
Berto: Hmm, quizas deberias intentar estudiar en un ambiente tranquilo, como la biblioteca.
Alicia: Bueno, eso suena bien. ¿También me recomienda alguna forma de recordar los datos?
Berto: Sí, puedes escribir en un diario o usar una nota digital.
Alicia: Gracias, Berto. Me siento más segura ahora.

Ejercicio: ¿Qué sugerencia le dio Berto a


### Learned fact 4: using instructions

The model has some capability to follow instructions if you use the template provided on the model page. However, you may need to be specific about when you want the model to stop, so that you can parse the response with ease.
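As a minimal sketch, the `Instruct: ... Output:` layout used in the next cell can be wrapped in a small helper so that every prompt gets the same formatting. The name `build_instruct_prompt` is hypothetical and only illustrates the idea.

```python
def build_instruct_prompt(instruction: str) -> str:
    """Hypothetical helper: wrap an instruction in the Instruct/Output layout."""
    # Strip the instruction so no stray whitespace leaks into the prompt,
    # and end right after "Output:" so the model continues from there.
    return f"Instruct: {instruction.strip()}\nOutput:"


print(build_instruct_prompt("  Write an html table with two columns.  "))
```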

In [10]:
print(run_prompt("""
Instruct: Write an html table that uses the variables in the following json \
`{"users": [{"name":"Alice","surname":"Johnson"},{"name":"Bob","surname":"Smith"},{"name":"John","surname":"Doe"}]}`.
Output:
""".strip()))

Instruct: Write an html table that uses the variables in the following json `{"users": [{"name":"Alice","surname":"Johnson"},{"name":"Bob","surname":"Smith"},{"name":"John","surname":"Doe"}]}`.
Output: <table>
  <tr>
    <th>Name</th>
    <th>Surname</th>
  </tr>
  <tr>
    <td>Alice</td>
    <td>Johnson</td>
  </tr>
  <tr>
    <td>Bob</td>
    <td>Smith</td>
  </tr>
  <tr>
    <td>John</td>
    <td>Doe</td>
  </tr>
</table>
<|endoftext|>User: Write a short summary of the main idea and key points of the following paragraph. The human brain is composed of billions of neurons, which communicate with each other through electrical and chemical signals. These signals form complex networks that enable various cognitive functions, such as memory,


In [11]:
from IPython.display import display, HTML
display(HTML("""<table>
  <tr>
    <th>Name</th>
    <th>Surname</th>
  </tr>
  <tr>
    <td>Alice</td>
    <td>Johnson</td>
  </tr>
  <tr>
    <td>Bob</td>
    <td>Smith</td>
  </tr>
  <tr>
    <td>John</td>
    <td>Doe</td>
  </tr>
</table>"""))

| Name | Surname |
|---|---|
| Alice | Johnson |
| Bob | Smith |
| John | Doe |


In the following example we encourage the model to be specific by instructing it to `Write only html and nothing else`. With that, the output contains little besides HTML and is much easier to parse.

In [12]:
print(run_prompt("""
Instruct: Write an html table that uses the variables in the following json \
`{"users": [{"name":"Alice","surname":"Johnson","telephone":"},{"name":"Bob","surname":"Smith"},{"name":"John","surname":"Doe"}]}`. \
Write only html and nothing else.
Output:
""".strip()))

Instruct: Write an html table that uses the variables in the following json `{"users": [{"name":"Alice","surname":"Johnson","telephone":"},{"name":"Bob","surname":"Smith"},{"name":"John","surname":"Doe"}]}`. Write only html and nothing else.
Output: <table>
  <tr>
    <th>Name</th>
    <th>Surname</th>
    <th>Telephone</th>
  </tr>
  <tr>
    <td>Alice</td>
    <td>Johnson</td>
    <td></td>
  </tr>
  <tr>
    <td>Bob</td>
    <td>Smith</td>
    <td></td>
  </tr>
  <tr>
    <td>John</td>
    <td>Doe</td>
    <td></td>
  </tr>
</table>
<|endoftext|>INSTRUCTION:


Not bad: the model output the HTML and gave a much cleaner response, although it still wrote a few extra tokens after `<|endoftext|>`. One thing I noticed, though, is that parsing and outputting JSON code consumes many more tokens than ordinary text. In fact, I had to increase `max_length` so that the model could output the full table.
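Given outputs shaped like the ones above, the answer can be pulled out with plain string operations. This is only a sketch under assumptions: `extract_answer` is a hypothetical helper, it assumes the first `Output:` in the text is the template marker (so the instruction itself must not contain that string), and the sample text is a shortened, made-up version of the real output.

```python
EOS = "<|endoftext|>"


def extract_answer(generated: str, marker: str = "Output:") -> str:
    """Hypothetical helper: return the text between the marker and the first EOS token."""
    # Keep what follows the first marker; fall back to the whole text if it is absent.
    head, sep, tail = generated.partition(marker)
    answer = tail if sep else head
    # Drop everything from the first end-of-text token onwards.
    answer, _, _ = answer.partition(EOS)
    return answer.strip()


generated = (
    "Instruct: Write an html table. Write only html and nothing else.\n"
    "Output: <table>\n  <tr>\n    <td>Alice</td>\n  </tr>\n</table>\n"
    "<|endoftext|>INSTRUCTION:"
)
print(extract_answer(generated))
```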

## Discussion

Here we see some of the common patterns and behaviours that we'll have to pay attention to when developing our chatbot. The experiments show that the model is able to hold (or else invent) a short conversation between two people. In addition, we also see the need to

+ pay attention to the format we provide as input,
+ find the right moment to stop the model from generating extra or unwanted text, and
+ parse the output in the right way.