In [1]:
%run Latex_macros.ipynb


**References**
- [SELF-Instruct paper](https://arxiv.org/pdf/2212.10560.pdf)
- [Self-Alignment with Instruction Backtranslation](https://arxiv.org/pdf/2308.06259.pdf)
- [Large Language Models can Self-Improve](https://arxiv.org/pdf/2210.11610.pdf)

# Using an LLM to generate Instruction Following examples

In the module on [Instruction Following](LLM_Instruction_Following.ipynb)
- we motivated the use of Fine-Tuning an LLM
- to exhibit Instruction Following behavior

Recall: an example of Instruction Following behavior is a triple

$$\langle \text{Instruction}, \text{Context}, \text{Response} \rangle $$

for example
- Instruction: "Tell me the word that is the opposite of the word that I input"
- Context: "Input: Stop"
- Response: "Go"

The Instruction describes the task to be accomplished
- i.e., the relationship between the Input and the Response
- the Input/Response pair is an exemplar of this task

In this module, we explore methods
- to generate these fine-tuning examples
- to improve examples

# SELF-Instruct: using an LLM to generate Instruction Following examples

**Reference**

[SELF-Instruct paper](https://arxiv.org/pdf/2212.10560.pdf)

Is there an alternative to the labor-intensive process of having humans construct Instruction Following examples?

The answer is "Yes"
- and involves the use of In-Context learning.


We can imagine the process as
- starting with a small number $k$ of human-constructed examples

$$
\begin{array}{l}
\langle \text{Instruction}^{(1)}, \text{Context}^{(1)}, \text{Response}^{(1)} \rangle \\
\vdots \\
\langle \text{Instruction}^{(k)}, \text{Context}^{(k)}, \text{Response}^{(k)} \rangle \\
\end{array}
$$

- which are used as exemplars in a *few-shot* learning prompt

The hope is that, given a non-exemplar prompt (one without a Response)

$$
\langle \text{Instruction}^{(k+1)}, \text{Context}^{(k+1)}, \rangle 
$$

the LLM's completion will supply the missing Response

$$
\langle \text{Instruction}^{(k+1)}, \text{Context}^{(k+1)}, \text{Response}^{(k+1)} \rangle
$$
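The prompt layout just described can be sketched in Python. This is a minimal sketch under assumptions: the `Instruction:`/`Context:`/`Response:` field labels, the blank-line separator, and the second (translation) task are illustrative choices, not the format of any particular paper. The query is rendered like an exemplar but stops right after `Response:`, so the natural continuation is the missing Response.

```python
from typing import NamedTuple, Optional, Sequence


class Triple(NamedTuple):
    """One Instruction Following example; `response` is None for the query."""
    instruction: str
    context: str
    response: Optional[str] = None


def render(t: Triple) -> str:
    # Exemplars include the Response; the query stops at "Response:" so that
    # the LLM is invited to complete it.
    text = f"Instruction: {t.instruction}\nContext: {t.context}\nResponse:"
    return text if t.response is None else f"{text} {t.response}"


def few_shot_prompt(exemplars: Sequence[Triple], query: Triple) -> str:
    # k complete exemplars followed by the (k+1)-th triple without its Response
    return "\n\n".join([render(e) for e in exemplars] + [render(query)])


exemplars = [
    Triple("Tell me the word that is the opposite of the word that I input",
           "Input: Stop", "Go"),
]
# Hypothetical new task, used only to illustrate the query format
query = Triple("Translate the word that I input into French", "Input: Cat")
prompt = few_shot_prompt(exemplars, query)
print(prompt)
```

In practice $k$ would be larger than one; the structure of the prompt is the same.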

The human- and LLM-generated examples can then be used to fine-tune an LLM so that it better exhibits Instruction Following behavior

But
- where does a wide variety of Instructions come from?
- given an Instruction, where do the Context and Response come from?

The actual process
- is multi-step
- using In-Context learning for each step.

The initial data available
- is a **small** set of "seed" examples
- human-generated, perhaps

$$\langle \text{Instruction}, \text{Context}, \text{Response} \rangle $$


The seed data
- is first used to augment the set of possible Instructions
- and then used to generate the Context and Response

resulting in new synthetic triples

Here is an overview of the method
- we will reference this diagram in the following sub-sections

<br>
<img src="images/selfinstruct_process.png">

Attribution: https://arxiv.org/pdf/2212.10560.pdf#page=2

## Generating the Instruction part of an Instruction-Output example

The first step is to generate the first element (i.e., the Instruction) of the triple

$$
\langle \textbf{Instruction}^{(k+1)}, \text{Context}^{(k+1)}, \text{Response}^{(k+1)} \rangle
$$

using $k$ exemplars
$$
\begin{array}{l}
\langle \text{Instruction}^{(1)} \rangle \\
\vdots \\
\langle \text{Instruction}^{(k)} \rangle \\
\end{array}
$$

**See the box labeled "Step 1" in the illustration above**

The exemplars are the "seed" data in the diagram.

Here is the prompt template in which the exemplars are used to generate a new Instruction.


<img src="images/selfinstruct_task_generation_prompts.png" width=90%>

Attribution: https://arxiv.org/pdf/2212.10560.pdf#page=15

With the above template, whose last line is `Task 9:`, we expect the LLM
- to generate a continuation of the prompt
- which is the Instruction part of a new task
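Step 1 can be sketched as follows, assuming the layout of the template in the figure: a header line, numbered `Task i:` lines holding sampled instructions, and a final open `Task k+1:` line. The header wording, the function names, and the default 6/2 split between seed and previously generated instructions are illustrative assumptions. A second helper splits the LLM's continuation into instructions, since the model may go on to invent further `Task n:` lines.

```python
import random
import re
from typing import List, Optional, Sequence


def instruction_prompt(seed: Sequence[str], synthetic: Sequence[str],
                       n_seed: int = 6, n_synthetic: int = 2,
                       rng: Optional[random.Random] = None) -> str:
    rng = rng or random.Random()
    # Sample exemplar instructions from the seed pool and the synthetic pool
    picked = rng.sample(list(seed), min(n_seed, len(seed)))
    picked += rng.sample(list(synthetic), min(n_synthetic, len(synthetic)))
    lines = ["Come up with a series of tasks:"]
    lines += [f"Task {i}: {instr}" for i, instr in enumerate(picked, start=1)]
    # Leave the next task number open for the LLM to fill in
    lines.append(f"Task {len(picked) + 1}:")
    return "\n".join(lines)


def parse_instructions(continuation: str) -> List[str]:
    # The continuation may contain further "Task n:" markers; split on them
    parts = re.split(r"Task \d+:", continuation)
    return [p.strip() for p in parts if p.strip()]


prompt = instruction_prompt(["Sort the numbers.", "Name a color.", "Write a haiku."],
                            [], n_seed=3, n_synthetic=0, rng=random.Random(0))
print(prompt)
print(parse_instructions(" Summarize the paragraph.\nTask 5: Count the vowels."))
```

Each parsed instruction would then join the pool of synthetic Instructions available for later prompts.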

## Generating the Context/Response (Input/Output) part, given an Instruction

Once we have generated the Instruction part
$$\text{Instruction}^{(k+1)}$$

of the new synthetic example, we need to generate its Context and Response.

For each Instruction in the augmented set of Instructions
- seed data + synthetic Instructions

we use complete exemplars


$$
\begin{array}{l}
\langle \text{Instruction}^{(1)}, \text{Context}^{(1)}, \text{Response}^{(1)} \rangle \\
\vdots \\
\langle \text{Instruction}^{(k)}, \text{Context}^{(k)}, \text{Response}^{(k)} \rangle \\
\end{array}
$$

and then prompt with the new Instruction
$$
\text{Instruction}^{(k+1)}
$$

with the expectation that the LLM's continuation will be
$$
\text{Context}^{(k+1)}, \text{Response}^{(k+1)}
$$


**See the box labeled "Step 3" in the diagram above**

To illustrate:

For Classification tasks, the prompt might look like this

    Task: Classify the sentiment of the sentence into positive, negative, or mixed
    
    Example 1
    Sentence: I enjoy the flavor of the restaurant but their service is too slow.
    Class Label: mixed
    
    Example 2
    Sentence: I had a great day today. The weather was beautiful and I spent time with friends.
    Class label: Positive
    
    
    Task: Tell me if the following email is a promotion email or not.
    
    Email: Check out our amazing new sale! We’ve got discounts on all of your favorite products.
    Class label: Promotion

    Email: We hope you are doing well. Let us know if you need any help.
    Class label: Not Promotion
    
    Task: {instruction for the target task}

The last line above contains a placeholder for the Instruction of the Target Task

$$
\text{Instruction}^{(k+1)}
$$

that we created in Step 1.

Here is an example of the template from the paper

<img src="images/selfinstruct_generated_instances.png">

Attribution: https://arxiv.org/pdf/2212.10560.pdf#page=16
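Step 3 can be sketched in the same way. In this minimal sketch, each exemplar task is a hypothetical `(instruction, [(input, output), ...])` tuple, and the generic `Input:`/`Output:` field names stand in for task-specific ones such as `Sentence:` and `Class label:`; the prompt ends with the placeholder filled by the Instruction created in Step 1.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[str, str]          # (Context/Input, Response/Output)
Task = Tuple[str, List[Pair]]   # (Instruction, exemplar pairs)


def instance_prompt(exemplar_tasks: Sequence[Task], new_instruction: str) -> str:
    blocks = []
    for instruction, pairs in exemplar_tasks:
        lines = [f"Task: {instruction}"]
        for i, (inp, out) in enumerate(pairs, start=1):
            # Input-first: the Context precedes the Response
            lines += [f"Example {i}", f"Input: {inp}", f"Output: {out}"]
        blocks.append("\n".join(lines))
    # The LLM is expected to continue with Input/Output pairs for this task
    blocks.append(f"Task: {new_instruction}")
    return "\n\n".join(blocks)


tasks = [("Classify the sentiment of the sentence into positive, negative, or mixed",
          [("I enjoy the flavor of the restaurant but their service is too slow.", "mixed")])]
print(instance_prompt(tasks, "Tell me if the following email is a promotion email or not."))
```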

### Difficulties in Generating the Input/Output part: Classification tasks

Is Synthetic Data generation with an LLM really so easy?

Although the few-shot learning approach to generating an Input/Output given an Instruction 
- seems straightforward
- the authors encountered difficulties when generating Input/Output for Classification tasks

Consider an Instruction Following example for a Classification task

    Task: Classify the sentiment of the sentence into positive, negative, or mixed
    
    Example 1
    Sentence: I enjoy the flavor of the restaurant but their service is too slow.
    Class Label: mixed
 

The authors found that the Responses (i.e., the `Class Label`s) generated by the LLM for Classification tasks
- were *not well-distributed* among all possible labels
    - examples with certain labels were either over or under represented
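One simple way to notice such an imbalance, sketched here as an illustration rather than as the authors' procedure, is to tally the labels of the generated examples and compare their shares. The labels below are made up; in practice they would be parsed from the LLM's generated Responses. Case is normalized because the prompts themselves mix `mixed` and `Positive`.

```python
from collections import Counter
from typing import Dict, Iterable


def label_shares(labels: Iterable[str]) -> Dict[str, float]:
    # Normalize case/whitespace so "Positive" and "positive" are one label
    counts = Counter(label.strip().lower() for label in labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: n / total for label, n in counts.most_common()}


# Made-up labels standing in for labels parsed from generated Responses
generated = ["positive", "Positive", "positive", "mixed", "positive"]
print(label_shares(generated))  # {'positive': 0.8, 'mixed': 0.2}
```

Here `negative` never appears at all, which is exactly the kind of under-representation described above.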

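This issue was traced
- to the **format** of the exemplars

using a deeper understanding of the Language Modeling task: an LLM generates text left to right, so whatever appears earlier in an exemplar conditions what is generated after it.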

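The original exemplars used the *Input-first* format
$$
\langle \text{Instruction}^{(i)}, \text{Context}^{(i)}, \text{Response}^{(i)} \rangle
$$

That is: the Response part came **last**, so the label was generated only after the input and tended to follow whatever kinds of inputs the model preferred to produce.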

By changing the format to *Output-first*

$$
\langle \text{Instruction}^{(i)},  \text{Response}^{(i)}, \text{Context}^{(i)} \rangle
$$

that is: placing the Response before the Context
- the generated labels became better distributed among the possible classes


For example:

    Task: Classify the sentiment of the sentence into positive, negative, or mixed

    Example 1
    Class Label: mixed
    Sentence: I enjoy the flavor of the restaurant but their service is too slow.

    Example 2
    Class label: Positive
    Sentence: I had a great day today. The weather was beautiful and I spent time with friends.


This is an example of Prompt Engineering
- In-context learning seems very sensitive to the format of prompts
- There is a skill of engineering a prompt to elicit the desired behavior

This feels similar to the idea behind Chain of Thought prompting: in both, what is generated first conditions what is generated next
- by presenting `Class Label` first
- the model seems better conditioned to generate a less biased distribution of labels
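The two formats differ only in the order of the fields, which a small rendering function makes explicit. This is a sketch under assumptions: the function names are hypothetical, and `conditioned_queries` illustrates one way to exploit the Output-first order (fix each label, then ask for a matching input) in the spirit of conditioning input generation on each class label; in a real prompt these queries would follow the exemplars.

```python
from typing import List, Tuple


def render_task(instruction: str, pairs: List[Tuple[str, str]],
                output_first: bool = False, input_name: str = "Sentence") -> str:
    lines = [f"Task: {instruction}"]
    for i, (text, label) in enumerate(pairs, start=1):
        inp, out = f"{input_name}: {text}", f"Class label: {label}"
        # Output-first places the label before the input it should condition
        lines += [f"Example {i}"] + ([out, inp] if output_first else [inp, out])
    return "\n".join(lines)


def conditioned_queries(instruction: str, labels: List[str],
                        input_name: str = "Sentence") -> List[str]:
    # One generation prompt per label: the label is fixed first, so the
    # generated input is conditioned on it and every label gets examples
    return [f"Task: {instruction}\nClass label: {label}\n{input_name}:" for label in labels]


pairs = [("I enjoy the flavor of the restaurant but their service is too slow.", "mixed")]
print(render_task("Classify the sentiment of the sentence", pairs, output_first=True))
print(conditioned_queries("Classify the sentiment of the sentence",
                          ["positive", "negative", "mixed"])[0])
```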

# Generating Instructions via Backtranslation

We now present an alternative method for generating the Instruction part
$$
\langle \text{Instruction} \rangle
$$
of an Instruction Following example
$$
\langle \text{Instruction}, \text{Context}, \text{Response} \rangle
$$

The method is called *Instruction Backtranslation*.

Given an LLM $M_{xy}$
- trained on 
$\langle\x, \y \rangle  = \langle \text{Instruction}, \text{Response} \rangle$
pairs

we will train an "inverse" LLM $M_{yx}$
- trained on 
$\langle \y, \x \rangle  = \langle  \text{Response}, \text{Instruction} \rangle$
pairs
- obtained from the training data for $M_{xy}$

This seems odd at first glance.

**But** $M_{yx}$
- can take a target $\y$
- and generate a *synthetic* feature vector $\x$

For the goal of creating synthetic Instruction Following examples:

$M_{xy}$ is an LLM that generates
$$
\text{Response}
$$

given input 

$$
\langle \text{Instruction}, \text{Context} \rangle
$$

$M_{yx}$ will generate
$$
\langle \text{Instruction}, \text{Context} \rangle
$$

given input 

$$
\text{Response}
$$
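The data flow can be sketched in a few lines, under stated assumptions: `fake_m_yx` is a hypothetical placeholder for a fine-tuned backward model (any function from a Response to an Instruction), and the Context is folded into the Instruction for brevity. The first helper builds the backward model's training pairs by swapping each $\langle \text{Instruction}, \text{Response} \rangle$ pair; the second treats unlabeled text as Responses and asks the backward model for matching Instructions.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (Instruction, Response), i.e. (x, y)


def backward_training_pairs(forward_pairs: List[Pair]) -> List[Tuple[str, str]]:
    # (x, y) -> (y, x): the Response becomes the input, the Instruction the target
    return [(response, instruction) for instruction, response in forward_pairs]


def backtranslate(unlabeled: List[str], m_yx: Callable[[str], str]) -> List[Pair]:
    # Each unlabeled text is treated as a Response y; M_yx supplies the Instruction x
    return [(m_yx(text), text) for text in unlabeled]


seed = [("Give me a synonym for happy", "Joyful")]
print(backward_training_pairs(seed))  # [('Joyful', 'Give me a synonym for happy')]


def fake_m_yx(text: str) -> str:
    # Placeholder for a fine-tuned backward LLM
    return "Write a short passage on the topic below."


print(backtranslate(["Tides are caused mainly by the Moon's gravity."], fake_m_yx))
```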

This is similar in spirit to the Language Modeling task
- we take an abundant source of unlabeled data
    - documents for the LLM
    - "answers" for the Synthetic Instruction Following task
- and create targets/labels
    - the continuation of a prefix for the LLM
    - the $\langle \text{Instruction}, \text{Context} \rangle$ for the Synthetic Instruction Following task


The advantage of this approach is that
- unlabeled data is plentiful
    - almost any block of text
- but labeled data (`Response/Instruction` pairs) is scarce.

So, starting with a plentiful resource, we create the scarce resource
- i.e., Instruction Following example triples

## Self-Improvement

We have seen [Self-Improvement](LLM_Self_Improvement.ipynb) before
- Fine-tuning an LLM in stages


Recall:

- train the Target model $\model$ in stages
    - creating a sequence of fine-tuned Target models $\model_{(0)}, \model_{(1)}, \dots$
    - of increasing power
- base case
    - fine-tune initial Target $\model_{(0)}$
    - using a mixture of strong (human-generated) and weak (LLM-generated) fine-tuning examples of the Target task
    - resulting in weak Target model $\model_{(1)}$
- inductive case
    - create improved Target $\model_{(\tt+1)}$
    - by fine-tuning $\model_\tp$
        - with the strong examples we already have
        - augmented with examples created as outputs of Target model $\model_\tp$

For the purpose of generating Synthetic Instruction Following examples:

With the newly extended set of seed Instruction/Response pairs
- we have more exemplars
- which we can use as the seed for another iteration of $M_{yx}$
    - the enlarged set of exemplars may result in *better* synthetic Instruction/Response pairs

We can iterate on this process multiple times
- using the Augmented set of Instruction/Response pairs from step $i$
- as the "seed" for iteration $(i +1)$ of the process
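The iteration can be sketched as a loop. The `fine_tune_backward` argument is a hypothetical stand-in for whatever fine-tuning routine produces $M_{yx}$ from (Response, Instruction) pairs, and `fake_fine_tune` is a placeholder used only to show the data flow. In this sketch each round relabels the whole unlabeled pool with the newly trained model, and the augmented set becomes the training "seed" for the next round.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]                  # (Instruction, Response)
BackwardModel = Callable[[str], str]    # Response -> Instruction


def iterate_backtranslation(seed: List[Pair], unlabeled: List[str],
                            fine_tune_backward: Callable[[List[Tuple[str, str]]], BackwardModel],
                            n_iterations: int = 2) -> List[Pair]:
    training = list(seed)
    for _ in range(n_iterations):
        # Train M_yx on (Response, Instruction) pairs from the current set
        m_yx = fine_tune_backward([(y, x) for x, y in training])
        synthetic = [(m_yx(y), y) for y in unlabeled]
        # The augmented set from iteration i seeds iteration i + 1
        training = list(seed) + synthetic
    return training


def fake_fine_tune(pairs: List[Tuple[str, str]]) -> BackwardModel:
    # Placeholder "model" that just reports how many pairs it was trained on
    n = len(pairs)
    return lambda response: f"instruction from a model trained on {n} pairs"


result = iterate_backtranslation([("Name a fruit", "Apple")], ["Paris", "Blue"], fake_fine_tune)
print(result[-1])  # ('instruction from a model trained on 3 pairs', 'Blue')
```

The second round's model was trained on the enlarged set (3 pairs rather than 1), which is the point of iterating.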

Here is the workflow:

<table>
    <center><strong>Instruction Backtranslation</strong></center>
    <tr>
        <img src="images/instruction_backtranslation.png" width=70%>
    </tr>
    
Attribution: https://arxiv.org/pdf/2308.06259.pdf#page=2
</table>

## Selecting the best synthetic examples for augmentation

The quality of the synthetic examples created at each step may not be uniformly high.

It would be desirable 
- to select only the best examples to use
- when augmenting the seed examples at each iteration.

How can we rate the quality of a synthetic example?

Ask the LLM to do it for you!

Using just the seed data
- fine-tune a "first generation" LLM
    - denoted $M_0$
- which is then prompted to assign a quality score to each synthetic example

The following prompt requests that the LLM evaluate the
synthetic example using a rating scale of $1$ (low quality) to $5$ (high quality)

<br>
<table>
    <center><strong>Instruction Backtranslation Curation</strong></center>
    <tr>
        <img src="images/instruction_backtranslation_curating.png" width=70%>
    </tr>
    
Attribution: https://arxiv.org/pdf/2308.06259.pdf#page=4
</table>

Use $M_0$ to
- select the best first generation augmented examples (from the first iteration)

The next generation augmented data set is
- the prior generation 
- augmented with the best (highest quality scores) of the new generation
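A minimal sketch of the selection step. It assumes the scorer's free-text answer contains a line of the form `Score: N` with N from 1 to 5 (an assumption about the output format), and `fake_rate` is a hypothetical placeholder for prompting $M_0$ with the curation prompt; the `threshold` parameter is likewise illustrative. Only candidates rated at or above the threshold are added to the prior generation.

```python
import re
from typing import Callable, List, Optional, Tuple

Pair = Tuple[str, str]  # (Instruction, Response)


def parse_score(rating_text: str) -> Optional[int]:
    # Use the last "Score: N" in case the reasoning mentions earlier ones
    matches = re.findall(r"Score:\s*([1-5])", rating_text)
    return int(matches[-1]) if matches else None


def next_generation(previous: List[Pair], candidates: List[Pair],
                    rate: Callable[[Pair], str], threshold: int = 5) -> List[Pair]:
    best = []
    for pair in candidates:
        score = parse_score(rate(pair))
        if score is not None and score >= threshold:
            best.append(pair)
    # Prior generation augmented with the highest-scoring new examples
    return previous + best


def fake_rate(pair: Pair) -> str:
    # Placeholder for M_0's answer to the curation prompt
    return "Clear and helpful.\nScore: 5" if "haiku" in pair[0] else "Off-topic.\nScore: 2"


gen1 = next_generation([("Name a fruit", "Apple")],
                       [("Write a haiku about rain", "Soft rain on the roof"), ("Hello", "Buy now!")],
                       fake_rate)
print(gen1)
```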

Now that we have
- an augmented (high quality) "generation $i$" set of seed examples

we continue our iterative process
- creating a more powerful scoring LLM $M_i$
- by fine-tuning on the generation $i$ data set
    - the prior generation augmented with the synthetic examples that the generation $(i-1)$ scorer $M_{i-1}$ rated highest
    
The scores of $M_i$ can then be used to select the data for an even higher quality scorer $M_{i+1}$.

This too is an example of [LLM Self-Improvement](LLM_Self_Improvement.ipynb).
- improving the Scoring LLM
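The generational loop can be sketched as below. Here `fine_tune` is a hypothetical stand-in that returns a scorer (a function from a candidate example to an integer rating, hiding the prompting and parsing), `fake_fine_tune` is a placeholder, and the number of generations and the threshold are illustrative. Each generation's data set is the prior generation plus the candidates the previous scorer rates highest, and it is then used to fine-tune the next scorer.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]                 # (Instruction, Response)
Scorer = Callable[[Pair], int]         # candidate -> rating in 1..5


def self_curate(seed: List[Pair], candidates: List[Pair],
                fine_tune: Callable[[List[Pair]], Scorer],
                n_generations: int = 2, threshold: int = 5) -> Tuple[Scorer, List[Pair]]:
    scorer = fine_tune(seed)           # M_0: trained on the seed data only
    data = list(seed)
    for _ in range(n_generations):
        # Prior generation augmented with new candidates that M_{i-1} rates highest
        data = data + [c for c in candidates if c not in data and scorer(c) >= threshold]
        scorer = fine_tune(data)       # M_i
    return scorer, data


history: List[int] = []


def fake_fine_tune(data: List[Pair]) -> Scorer:
    # Placeholder: records training-set sizes and rates "haiku" tasks highly
    history.append(len(data))
    return lambda pair: 5 if "haiku" in pair[0] else 2


scorer, data = self_curate([("Name a fruit", "Apple")],
                           [("Write a haiku about rain", "Soft rain on the roof"),
                            ("Hello", "Buy now!")],
                           fake_fine_tune)
print(history)  # [1, 2, 2]
```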

In [2]:
print("Done")

Done
