# Memory Tuning Example 🦙
Memory Tuning is a Lamini feature for making a model memorize facts accurately. It is user-friendly, but the process requires iteration. In this example, we'll walk you through creating a model that can accurately answer questions about [Lamini's Python library](https://github.com/lamini-ai/lamini/blob/c1d52643dd129841bf508c1fb09721cafebeab67/lamini/api/lamini.py). You are encouraged to apply it to your own use case and data as well. Here are the steps we'll cover for effective memory tuning:


1. Create an evaluation set
1. Benchmark the base model
1. Construct a dataset
1. Review the generated Q&A
1. Tuning run
1. Evaluate using the evaluation set
1. Update Q&A generation
1. Repeat steps 4-7
1. Expand the dataset




## Create an evaluation set 🖌

Before we dive in, it's essential to clarify what we want the model to learn and the specific improvements we aim for. One effective approach is to create an "evaluation set" or a "golden set." In this example, our goal is to develop a model that can accurately answer questions about Lamini's Python library.

Below are some questions, along with their ideal answers, arranged by increasing complexity:

| Question | Gold Answer |
| :-- | :-- |
| What type is model_name? | A string. |
| What does upload_data return? | The dataset ID, a string.|
| What is the difference between .tune and .train? | They are aliases to each other, i.e. the same. Both functions will tune a model. |
| What arguments are required to make an inference call to Lamini? | To make a call to Lamini, you need to instantiate Lamini with a model name. Then, the generate call only needs a prompt to run inference on the set model. |
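
To reuse the evaluation set in later steps, it can help to keep it as plain data next to the code. The sketch below is one possible layout; the name `EVAL_SET` and the `question`/`gold` keys are our own choices for illustration and are not part of the Lamini API.

```python
# Evaluation ("golden") set, copied from the table above.
# EVAL_SET and its keys are illustrative names, not Lamini API.
EVAL_SET = [
    {"question": "What type is model_name?", "gold": "A string."},
    {"question": "What does upload_data return?", "gold": "The dataset ID, a string."},
    {
        "question": "What is the difference between .tune and .train?",
        "gold": "They are aliases to each other, i.e. the same. "
                "Both functions will tune a model.",
    },
    {
        "question": "What arguments are required to make an inference call to Lamini?",
        "gold": "To make a call to Lamini, you need to instantiate Lamini with a "
                "model name. Then, the generate call only needs a prompt to run "
                "inference on the set model.",
    },
]

# Print the set for a quick visual check
for item in EVAL_SET:
    print(f"Q: {item['question']}\nA: {item['gold']}\n")
```

Keeping the set small and ordered by difficulty makes it quick to re-check by hand after every tuning run.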



## Benchmark 🪑

Since our goal is to improve a base model, let's first evaluate how the base model performs on our evaluation set. In this example, we will be using [Llama 3.1](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) as our base model.

![Base Llama Response](https://drive.google.com/uc?export=view&id=140NIXK2dkHVyd_7jycPRMy-E6bJLLhtO)

As expected, the base model struggled with most of the responses, given that the questions are tailored to our specific use case and aren't part of the model's training data.

Here are the results for our entire evaluation set:

| Question | Llama 3.1 Answer | Gold Answer |
| :-- | :-- | :-- |
| What type is model_name? ✅ | In the Lamini Python API, `model_name` is a string. It represents the name of a model, which can be used to identify and load a specific model from the Lamini model registry. | A string. |
| What does upload_data return? ❌ | In the Lamini Python API, the `upload_data` function is used to upload data to a Lamini instance. The `upload_data` function returns a dictionary containing the response from the Lamini server, which includes the ID of the uploaded data. | The dataset ID, a string. |
| What is the difference between .tune and .train? ❌ | In the Lamini Python API, `.tune` and `.train` are two distinct methods used for hyperparameter tuning and model training, respectively. While they share some similarities, they serve different purposes and have different behaviors. | They are aliases to each other, i.e. the same. Both functions will tune a model. |
| What arguments are required to make an inference call to Lamini? ❌ | I don't have the information about the Lamini Python API. | To make a call to Lamini, you need to instantiate Lamini with a model name. Then, the generate call only needs a prompt to run inference on the set model. |

<details>
  <summary>Generation details</summary>
  We generated these responses using <a href="https://app.lamini.ai/playground" target="_blank">the Lamini playground</a>, with the following System Prompt:
  "You are an expert on Lamini. Answer the following question about the Lamini Python API." 🦙
</details>

But don’t worry—we can only improve from here! 🫡

| Model | Score |
| :-- | :-- |
| Llama 3.1 | 1/4 --> 25% |
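
The score in this table is simply the share of evaluation questions graded as correct. A minimal sketch of that arithmetic follows; the helper name `format_score` is hypothetical, and the pass/fail grades are the manual judgments shown in the results table above.

```python
def format_score(grades):
    """Format a list of pass/fail grades as 'correct/total --> percent%'."""
    correct = sum(1 for grade in grades if grade)
    total = len(grades)
    percent = round(100 * correct / total) if total else 0
    return f"{correct}/{total} --> {percent}%"


# Manual grades for base Llama 3.1 on the four evaluation questions
base_grades = [True, False, False, False]
print(format_score(base_grades))
```

Grading stays manual in this example; the helper only keeps the bookkeeping consistent from one run to the next.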

## Construct a tuning dataset 🎵

For this example, we'll generate Q&A pairs from [our `lamini.py` code file](https://github.com/lamini-ai/lamini/blob/c1d52643dd129841bf508c1fb09721cafebeab67/lamini/api/lamini.py).

Let's begin with a straightforward approach: we'll pass the entire file to an LLM and ask it to generate some questions and answers.

<details>
  <summary>Prompt templates</summary>
  Meta Llama 3.1 Instruct uses a prompt template, with special tags used to indicate the user query and system prompt.
You can find the documentation on this <a href="https://www.llama.com/docs/model-cards-and-prompt-formats/llama3_1" target="_blank">model card</a>.
</details>
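
Since the same template string is repeated in several cells of this notebook, a small helper can keep it in one place. This is a sketch: `format_prompt` is a hypothetical name, the optional system block is our own addition, and the whitespace mirrors the strings used in this notebook, so consult the model card for the exact conventions.

```python
def format_prompt(user_message, system_prompt=None):
    """Wrap a message in the Llama 3.1 Instruct tags used in this notebook."""
    parts = ["<|begin_of_text|>"]
    if system_prompt is not None:
        # Optional system block (not used by the cells in this notebook)
        parts.append(
            f"<|start_header_id|>system<|end_header_id|>{system_prompt}<|eot_id|>"
        )
    parts.append(f"<|start_header_id|>user<|end_header_id|>{user_message}<|eot_id|>")
    # Leave the assistant header open so the model writes the answer
    parts.append("<|start_header_id|>assistant<|end_header_id|>")
    return "".join(parts)


print(format_prompt("What type is model_name?"))
```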

In [1]:
!pip install lamini
import lamini

# Get your API key from https://app.lamini.ai/account
lamini.api_key = "<YOUR-LAMINI-API-KEY>"
llm = lamini.Lamini(model_name="meta-llama/Meta-Llama-3.1-8B-Instruct")

with open('lamini_file.py', 'r') as file:
  code_file = file.read()

prompt = f"Generate some questions and answers for this code: {code_file}"
# Apply prompt template
formatted_prompt = f"<|begin_of_text|><|start_header_id|>user<|end_header_id|>{prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>"
response = llm.generate(formatted_prompt)
print(response)


Not bad! Let's use JSON output so we won't have to worry about parsing the response.

In [19]:
json_response = llm.generate(formatted_prompt, output_type={
    "question1": "str",
    "answer1": "str",
    "question2": "str",
    "answer2": "str",
    "question3": "str",
    "answer3": "str"
})
print(json_response)

{'question1': 'What are the arguments required to make an inference call to Lamini', 'answer1': 'The arguments required to make an inference call to Lamini are:', 'question2': 'What are the arguments required to make an inference call to Lamini?', 'answer2': 'The arguments required to make an inference call to Lamini are:', 'question3': 'What are the arguments required to make an inference call to Lamini?', 'answer3': 'The arguments required to make an inference call to Lamini are:'}
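
A structured response like this is easy to post-process. The sketch below flattens the numbered keys into `input`/`output` pairs and drops repeated questions, which can happen, as in the output printed above. The helper name `to_pairs` and the normalization rule (ignoring case and a trailing question mark) are assumptions made for illustration.

```python
def to_pairs(response, n=3):
    """Convert {'question1': ..., 'answer1': ...} into unique input/output pairs."""
    pairs, seen = [], set()
    for i in range(1, n + 1):
        question = response[f"question{i}"].strip()
        answer = response[f"answer{i}"].strip()
        # Treat questions that differ only by case or a trailing '?' as repeats
        key = question.rstrip("?").lower()
        if key in seen:
            continue
        seen.add(key)
        pairs.append({"input": question, "output": answer})
    return pairs


sample = {
    "question1": "What are the arguments required to make an inference call to Lamini",
    "answer1": "The arguments required to make an inference call to Lamini are:",
    "question2": "What are the arguments required to make an inference call to Lamini?",
    "answer2": "The arguments required to make an inference call to Lamini are:",
    "question3": "What are the arguments required to make an inference call to Lamini?",
    "answer3": "The arguments required to make an inference call to Lamini are:",
}
print(to_pairs(sample))
```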


## Review your tuning dataset 🔎
Even if you're using your own data while following this notebook, it's important to review it closely. Understanding your data is crucial for tuning an accurate model—the quality of the input directly influences the quality of the output.

![You and your data fusion](https://drive.google.com/uc?export=view&id=184mbTYK0DtBaNs4H2vZ6Ni370HVZn7BF)


Now, let's take a look at the data we generated in the previous step:

```
{
  'question1': 'What is the purpose of the Lamini class in this code',
  'answer1': 'The Lamini class is the main interface for the Lamini platform functionality, providing methods for generation, data upload, training, and evaluation',
  'question2': 'What is the `generate` method used for?',
  'answer2': 'The `generate` method is used to make a generation request to the LLM with the provided prompt, allowing for various options such as specifying the model name, output type, and maximum tokens',
  'question3': 'What is the `upload_data` method used for?',
  'answer3': 'The `upload_data` method is used to upload the provided data to the Lamini Platform, allowing for the option to specify whether the dataset should be public or not'
}

```
And our evaluation set:

| Question | Gold Answer |
| :-- | :-- |
| What type is model_name? | A string. |
| What does upload_data return? | The dataset ID, a string.|
| What is the difference between .tune and .train? | They are aliases to each other, i.e. the same. Both functions will tune a model. |
| What arguments are required to make an inference call to Lamini? | To make a call to Lamini, you need to instantiate Lamini with a model name. Then, the generate call only needs a prompt to run inference on the set model. |

What stands out in our analysis? A few notable issues emerge when we compare the generated data with our evaluation set:
1. The distribution and diversity of questions differ significantly; for instance, the generated Q&A primarily focuses on the "purpose" and the methods used.
1. The generated data lacks coverage of key concepts from the evaluation set, such as parameter types and return types.

![Eval set concepts](https://drive.google.com/uc?export=view&id=1GveYb9Nc9ZteoxJUDslthC6Gof_fnHr-)
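
The comparison above can be roughed out in code by tagging each question with the concept it asks about. This is only a sketch: the keyword rules in `CONCEPT_KEYWORDS` are hypothetical heuristics written for this example, and a manual read of the data remains the main check.

```python
from collections import Counter

# Hypothetical keyword rules; adjust them to your own data.
CONCEPT_KEYWORDS = {
    "purpose": ["purpose", "used for"],
    "parameter type": ["what type", "type of"],
    "return value": ["return"],
    "comparison": ["difference"],
    "required arguments": ["arguments", "required"],
}


def tag_concepts(question):
    """Return the concepts whose keywords appear in the question."""
    text = question.lower()
    return [
        concept
        for concept, keywords in CONCEPT_KEYWORDS.items()
        if any(keyword in text for keyword in keywords)
    ]


def coverage(questions):
    """Count how many questions touch each concept."""
    counts = Counter()
    for question in questions:
        counts.update(tag_concepts(question))
    return counts


generated = [
    "What is the purpose of the Lamini class in this code",
    "What is the `generate` method used for?",
    "What is the `upload_data` method used for?",
]
evaluation = [
    "What type is model_name?",
    "What does upload_data return?",
    "What is the difference between .tune and .train?",
    "What arguments are required to make an inference call to Lamini?",
]

generated_counts = coverage(generated)
evaluation_counts = coverage(evaluation)
# Concepts the evaluation set asks about that the generated data never covers
missing = sorted(set(evaluation_counts) - set(generated_counts))
print("Generated:", dict(generated_counts))
print("Missing from generated data:", missing)
```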

To address this, let's create a more diverse dataset that covers a broader range of the concepts we want the model to learn, using chunking and prompt tuning. Sometimes simply asking the model to be more diverse will work. Other times, the model will need more guidance to ensure specific concepts are adequately covered.

<details>
  <summary>Chunking</summary>
  For simplicity, we use brute chunking in this example (i.e. dividing the code into chunks of 200 lines). If you notice that your generated data is incorrect or lacking key information, it's worth reviewing the input being sent to the model. In some cases, brute chunking can inadvertently cut off words or sentences, resulting in poor-quality data. To mitigate this, consider implementing smarter chunking strategies, such as chunking at the end of each paragraph or section, tailored to your specific data.
</details>
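
As one example of a smarter strategy, the sketch below keeps the same line budget but prefers to cut at the last blank line inside each window, so a chunk is less likely to end in the middle of a statement. The function name `chunk_at_blank_lines` is hypothetical, and blank lines are only a rough stand-in for real code boundaries.

```python
def chunk_at_blank_lines(text, max_lines=200):
    """Split text into chunks of at most max_lines lines, preferring to
    break at the last blank line inside each window."""
    lines = text.splitlines()
    chunks = []
    start = 0
    while start < len(lines):
        end = min(start + max_lines, len(lines))
        if end < len(lines):
            # Walk back to the nearest blank line so a block is not split
            cut = end
            while cut > start + 1 and lines[cut - 1].strip():
                cut -= 1
            if cut > start + 1:
                end = cut  # otherwise fall back to a hard cut at max_lines
        chunks.append("\n".join(lines[start:end]))
        start = end
    return chunks


example = "a\nb\n\nc\nd\ne"
for chunk in chunk_at_blank_lines(example, max_lines=4):
    print(repr(chunk))
```

Each chunk is returned as a single string, so it can be placed into a prompt as plain code rather than as a Python list of lines.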

In [3]:
with open('lamini_file.py', 'r') as file:
  code_file = file.read()

# Generate 3 Q&A pairs for every 200 lines of the code (a chunk)
responses = []
code_lines = code_file.splitlines()
for i in range(0, len(code_lines), 200):
  chunk = code_lines[i:i + 200]
  # Update the prompt to ask for more diverse Q&A
  prompt = f"Generate three diverse questions and answers for this code: {chunk}"
  print(prompt)
  # Apply prompt template
  formatted_prompt = f"<|begin_of_text|><|start_header_id|>user<|end_header_id|>{prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>"
  # Generate the questions and answers
  r = llm.generate(formatted_prompt, output_type={
      "question1": "str",
      "answer1": "str",
      "question2": "str",
      "answer2": "str",
      "question3": "str",
      "answer3": "str"
  })
  # Save the responses to a list
  responses.append({"input": r["question1"], "output": r["answer1"]})
  responses.append({"input": r["question2"], "output": r["answer2"]})
  responses.append({"input": r["question3"], "output": r["answer3"]})

print(responses)

Generate three diverse questions and answers for this code: ['import json', 'import jsonlines', 'import logging', 'import os', 'import pandas as pd', 'import time', '', 'from lamini.api.lamini_config import get_config', 'from lamini.api.rest_requests import get_version', 'from lamini.api.train import Train', 'from lamini.api.utils.completion import Completion', 'from lamini.api.utils.upload_client import upload_to_blob', 'from lamini.error.error import (', '    DownloadingModelError,', ')', 'from typing import Dict, Iterable, List, Optional, Union, Any, Generator', '', 'logger = logging.getLogger(__name__)', '', '', 'class Lamini:', '    """Main interface for Lamini platform functionality. Key features are:', '        1. Generation calls', '        2. Data Upload/Downloading', '        3. Training orchestration', '        4. Evaluation', '', '    Parameters', '    ----------', '    model_name: str = None', '        LLM hugging face ID', '', '    api_key: Optional[str]', '        Lamini

Now let's revisit the generated data. By carefully reviewing our data and updating the generation code when the dataset is manageable (i.e., something a human can review in just a few minutes), we lay the groundwork to scale the process for larger datasets in future tuning iterations.

```
[
  {
    'input': 'What is the purpose of the `get_version` function in the Lamini class',
    'output': 'The `get_version` function is used to retrieve the version of the Lamini platform, taking into account the provided API key, API URL, and configuration'
  },
  {
    'input': 'What is the difference between `max_tokens` and `max_new_tokens` in the `generate` method',
    'output': 'max_tokens is the total number of tokens the model can use for generation, while max_new_tokens is the maximum number of new tokens the model can generate, allowing for more fine-grained control over the output'
  },
  {
    'input': 'What is the purpose of the `upload_data` method in the Lamini class',
    'output': 'The `upload_data` method is used to upload data to the Lamini platform, taking in an iterable of dictionaries and an optional boolean indicating whether the data should be made public'
  },
  {
    'input': 'What is the purpose of the `upload_data` method in the provided code',
    'output': 'The `upload_data` method is used to upload data to the Lamini Platform, either to Azure or locally, depending on the `upload_base_path` value. It takes in data, an optional `is_public` flag, and returns the dataset ID'
  },
  {
    'input': 'What is the maximum file size allowed for upload in the `_upload_file_impl` method?',
    'output': '10GB'
  },
  {
    'input': 'What is the purpose of the `train` method in the provided code?',
    'output': 'The `train` method is used to handle training jobs through the Trainer object, submitting a training job request to the platform using the provided data'
  },
  {
    'input': 'What is the purpose of the `train` function in this code',
    'output': 'The `train` function submits a training job request to the platform using the provided data and continuously polls until the job is completed'
  },
  {
    'input': 'What is the `tune` alias for in this code',
    'output': 'The `tune` alias is an alias for the `train` function'
  },
  {
    'input': 'What is the purpose of the `cancel_job` function in this code',
    'output': 'The `cancel_job` function cancels a job specified by the job id on the platform'
  },
  {
    'input': 'What is the purpose of the `evaluate` method in this code',
    'output': 'The purpose of the `evaluate` method is to run an evaluation job on the specified training job and return the status of the job on the platform'
  },
  {
    'input': 'What is the type of the `job_id` parameter in the `evaluate` method',
    'output': 'The `job_id` parameter is a string'
  },
  {
    'input': 'What is the return type of the `evaluate` method',
    'output': 'The return type of the `evaluate` method is a string'
  }
]
```


With our improved prompting and chunking, the questions are now more diverse and cover more concepts from the evaluation set. For example, there are now questions about return types and parameter types—key concepts that were not addressed in the earlier generated data.

## Tuning a model 🏰
Finally, we have some solid data and are ready for a tuning run! This is simple with Lamini. Try uncommenting the tune command to create your own model or feel free to continue and use the public model we have already tuned for you.

In [2]:
# Note that responses has already been formatted as [{"input": ..., "output": ...}, ...]
# We want to further format the data using Llama 3.1's prompt template: https://www.llama.com/docs/model-cards-and-prompt-formats/llama3_1/
formatted_responses = [
  {
    "input": f'<|begin_of_text|><|start_header_id|>user<|end_header_id|>{r["input"]}<|eot_id|><|start_header_id|>assistant<|end_header_id|>',
    "output": f'{r["output"]}',
  }
  for r in responses
]

print("Uncomment the tune command to tune your own model!")
## Uncomment to tune your own model! ##
# llm.tune(data_or_dataset_id=formatted_responses)

Uncomment the tune command to tune your own model!
Data pairs uploaded to local.

Your dataset id is: 83eac7d5e636ed35639272cbc3a338e4c505b49613fad1c65456203926eb546d . Consider using this in the future to train using the same data. 
Eg: llm.train(data_or_dataset_id='83eac7d5e636ed35639272cbc3a338e4c505b49613fad1c65456203926eb546d')
Tuning job submitted! Check status of job 12656 here: https://api.lamini.ai/train/12656


{'job_id': 12656,
 'status': 'CREATED',
 'dataset_id': '83eac7d5e636ed35639272cbc3a338e4c505b49613fad1c65456203926eb546d'}
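
Before submitting a tuning job, a quick sanity check of the records can catch formatting slips early. The sketch below is a hypothetical pre-flight helper, not part of the Lamini library; it only checks that each record has non-empty `input` and `output` strings and that the input carries the prompt template used in this notebook.

```python
def validate_records(records):
    """Return a list of human-readable problems found in the tuning records."""
    problems = []
    for i, record in enumerate(records):
        for key in ("input", "output"):
            value = record.get(key)
            if not isinstance(value, str) or not value.strip():
                problems.append(f"record {i}: '{key}' is missing or empty")
        prompt = record.get("input", "")
        if isinstance(prompt, str) and not prompt.startswith("<|begin_of_text|>"):
            problems.append(f"record {i}: 'input' lacks the prompt template")
    return problems


good = {
    "input": "<|begin_of_text|><|start_header_id|>user<|end_header_id|>"
             "What type is model_name?<|eot_id|>"
             "<|start_header_id|>assistant<|end_header_id|>",
    "output": "A string.",
}
bad = {"input": "What type is model_name?", "output": ""}

for problem in validate_records([good, bad]):
    print(problem)
```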

## Evaluate 🤓

After the model has completed tuning, we should benchmark it again against our evaluation set to evaluate its performance and hopefully see some improvement.

In [5]:
# If you tuned your own model, replace model_name once it has finished tuning: https://api.lamini.ai/tune
# You can also use the UI to test this: https://api.lamini.ai/tune
tuned_llm = lamini.Lamini(model_name="a98a7a00623b14b71cf3b53c8f02f7a3c23d95d57d41efbbb8dc79bbb5cb836d")
questions = ["What type is model_name?", "What does upload_data return?", "What is the difference between .tune and .train?", "What arguments are required to make an inference call to Lamini?"]
for q in questions:
  formatted_prompt = f"<|begin_of_text|><|start_header_id|>user<|end_header_id|>{q}<|eot_id|><|start_header_id|>assistant<|end_header_id|>"
  response = tuned_llm.generate(formatted_prompt)
  print(q, response)


What type is model_name? The model_name is a string
What does upload_data return? The `upload_data` method returns the dataset ID of the uploaded data
What is the difference between .tune and .train? The `tune` alias is an alias for the `train` function
What arguments are required to make an inference call to Lamini? The arguments required to make an inference call to Lamini are:

* The ID of the job function to run
* The input data for the job function
* Any additional configuration options for the job function


| Question | Tuning Run #1 Answer | Gold Answer |
| :-- | :-- | :-- |
| What type is model_name? | The model_name is a string | A string. |
| What does upload_data return? | The `upload_data` method returns the dataset ID of the uploaded data | The dataset ID, a string. |
| What is the difference between .tune and .train? | The `tune` alias is an alias for the `train` function | They are aliases to each other, i.e. the same. Both functions<br> will tune a model. |
| What arguments are required to make an inference call to Lamini? | The arguments required to make an inference call to<br>Lamini are:<br>- The ID of the job function to run <br>- The input data for the job function<br>- Any additional configuration options for the job function | To make a call to Lamini, you need to instantiate Lamini with<br> a model name. Then, the generate call only needs a prompt<br> to run inference on the set model. |
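
A side-by-side table like this one can be produced from the collected answers instead of being typed by hand. The helper below is a sketch with a hypothetical name, `to_markdown_table`; it escapes pipe characters and turns line breaks into `<br>` so multi-line answers stay inside one cell.

```python
def to_markdown_table(rows, answer_header="Model Answer"):
    """Render (question, answer, gold) rows as a Markdown comparison table."""

    def cell(text):
        # Keep each answer on one table row
        return text.strip().replace("|", "\\|").replace("\n", "<br>")

    lines = [
        f"| Question | {answer_header} | Gold Answer |",
        "| :-- | :-- | :-- |",
    ]
    for question, answer, gold in rows:
        lines.append(f"| {cell(question)} | {cell(answer)} | {cell(gold)} |")
    return "\n".join(lines)


rows = [
    ("What type is model_name?", "The model_name is a string", "A string."),
]
print(to_markdown_table(rows, answer_header="Tuning Run #1 Answer"))
```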

Let's review the model's answers against the gold answers:
1. What type is model_name? --> Correct ✅
1. What does upload_data return? --> Correct ✅
1. What is the difference between .tune and .train? --> Correct ✅
1. What arguments are required to make an inference call to Lamini? --> Incorrect, does not reference the generate function ❌

And now we can score our first tuning run:

| Model | Score |
| :-- | :-- |
| Llama 3.1 | 1/4 --> 25% |
| Tuning Run #1 | 3/4 --> 75% |

Sweet, that's a great improvement!

But is this the best we can achieve? Not necessarily. Memory tuning is an iterative process. By analyzing why the model gave incorrect responses and iterating on our approach, we can improve the model's quality further on our evaluation set, using the techniques we've just covered.

## Iterate ♻
Let's look at the evaluation question that the model got wrong and examine why.

| Question | Tuning Run #1 Answer | Gold Answer |
| :-- | :-- | :-- |
| What arguments are required to make an inference call to Lamini? ❌ | The arguments required to make an inference call to<br>Lamini are:<br>- The ID of the job function to run <br>- The input data for the job function<br>- Any additional configuration options for the job function | To make a call to Lamini, you need to instantiate Lamini with<br> a model name. Then, the generate call only needs a prompt<br> to run inference on the set model. |

A good starting point is to check the data—does the tuning data contain the information needed to answer this question?

<details>
  <summary><i> Tuning data </i></summary>
  <pre><code>[
  {
    'input': 'What is the purpose of the `get_version` function in the Lamini class',
    'output': 'The `get_version` function is used to retrieve the version of the Lamini platform, taking into account the provided API key, API URL, and configuration'
  },
  {
    'input': 'What is the difference between `max_tokens` and `max_new_tokens` in the `generate` method',
    'output': 'max_tokens is the total number of tokens the model can use for generation, while max_new_tokens is the maximum number of new tokens the model can generate, allowing for more fine-grained control over the output'
  },
  {
    'input': 'What is the purpose of the `upload_data` method in the Lamini class',
    'output': 'The `upload_data` method is used to upload data to the Lamini platform, taking in an iterable of dictionaries and an optional boolean indicating whether the data should be made public'
  },
  {
    'input': 'What is the purpose of the `upload_data` method in the provided code',
    'output': 'The `upload_data` method is used to upload data to the Lamini Platform, either to Azure or locally, depending on the `upload_base_path` value. It takes in data, an optional `is_public` flag, and returns the dataset ID'
  },
  {
    'input': 'What is the maximum file size allowed for upload in the `_upload_file_impl` method?',
    'output': '10GB'
  },
  {
    'input': 'What is the purpose of the `train` method in the provided code?',
    'output': 'The `train` method is used to handle training jobs through the Trainer object, submitting a training job request to the platform using the provided data'
  },
  {
    'input': 'What is the purpose of the `train` function in this code',
    'output': 'The `train` function submits a training job request to the platform using the provided data and continuously polls until the job is completed'
  },
  {
    'input': 'What is the `tune` alias for in this code',
    'output': 'The `tune` alias is an alias for the `train` function'
  },
  {
    'input': 'What is the purpose of the `cancel_job` function in this code',
    'output': 'The `cancel_job` function cancels a job specified by the job id on the platform'
  },
  {
    'input': 'What is the purpose of the `evaluate` method in this code',
    'output': 'The purpose of the `evaluate` method is to run an evaluation job on the specified training job and return the status of the job on the platform'
  },
  {
    'input': 'What is the type of the `job_id` parameter in the `evaluate` method',
    'output': 'The `job_id` parameter is a string'
  },
  {
    'input': 'What is the return type of the `evaluate` method',
    'output': 'The return type of the `evaluate` method is a string'
  }
]</code></pre>
</details>

We see that the tuning data includes some information about the `generate` method and its parameters:
```
'input': 'What is the difference between `max_tokens` and `max_new_tokens` in the `generate` method',
'output': 'max_tokens is the total number of tokens the model can use for generation, while max_new_tokens is the maximum number of new tokens the model can generate, allowing for more fine-grained control over the output'
```
However, since the model didn't mention the `generate` method, it likely didn't make the connection between "inference call" and a call to `generate`. The data is also missing the required arguments for `generate`.

**Fix**: Manually add Q&A pairs to the tuning data to cover this concept gap (`generate` means inference). Alternatively, update the data generation process to ensure this concept is included. After making these adjustments, re-tune the model and evaluate the results against the evaluation set.
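
The manual option might look like the sketch below. The answer text is taken from the gold answer in the evaluation set; the question phrasings and the helper `add_manual_pairs` are illustrative assumptions and should be adapted to your own data.

```python
# Hand-written pairs that tie "inference call" to the `generate` method.
# Answers reuse the gold answer; question wording is illustrative only.
GOLD_INFERENCE_ANSWER = (
    "To make a call to Lamini, you need to instantiate Lamini with a model name. "
    "Then, the generate call only needs a prompt to run inference on the set model."
)

manual_pairs = [
    {"input": "How do I run inference with Lamini?", "output": GOLD_INFERENCE_ANSWER},
    {
        "input": "What does the `generate` call need in order to run inference?",
        "output": "The `generate` call only needs a prompt to run inference on the set model.",
    },
]


def add_manual_pairs(existing, extra):
    """Return existing pairs plus any extra pairs whose question is new."""
    known = {pair["input"] for pair in existing}
    return existing + [pair for pair in extra if pair["input"] not in known]


# Stand-in for the `responses` list generated earlier in the notebook
responses = [
    {
        "input": "What is the `tune` alias for in this code",
        "output": "The `tune` alias is an alias for the `train` function",
    },
]
responses = add_manual_pairs(responses, manual_pairs)
print(len(responses))
```

The combined list still needs the prompt template applied to each `input` before it is used for tuning, as in the tuning cell earlier.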

## Conclusion 👑

In this example, we've reached step 7 in the memory tuning process and demonstrated many of the essential tools to get started. Your next step is to revisit the creation of the tuning set (steps 3 and 4) to generate additional questions and answers that address the missing concepts and continue iterating.

Here’s a quick recap of the steps:
1. Create an evaluation set
1. Benchmark the base model
1. Construct a dataset
1. Review the generated Q&A
1. Tuning run
1. Evaluate using the evaluation set
1. Update Q&A generation
1. Repeat steps 4-7
1. Expand the dataset

Once we think the model is "good enough", or once we see that the needed data is present but the model isn't memorizing it correctly, we can try updating the hyperparameters and, again, iterate with different learning rates, max steps, etc. For more information on hyperparameters, check out the documentation here: [https://lamini-ai.github.io/tuning/hyperparameters/](https://lamini-ai.github.io/tuning/hyperparameters/).
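
One way to organize such a sweep is to enumerate the combinations up front and run them one at a time. In this sketch the values are placeholders rather than recommendations, and passing the settings through a `finetune_args` dictionary with `learning_rate` and `max_steps` keys is an assumption; check the hyperparameter documentation linked above before using it.

```python
from itertools import product

# Placeholder values for illustration, not recommendations
learning_rates = [1e-4, 3e-4, 9e-4]
max_steps_options = [50, 100]

# One configuration per (learning rate, max steps) combination
configs = [
    {"learning_rate": lr, "max_steps": steps}
    for lr, steps in product(learning_rates, max_steps_options)
]

for config in configs:
    print(config)
    # Assumed call shape; uncomment only after checking the documentation:
    # llm.tune(data_or_dataset_id=formatted_responses, finetune_args=config)
```

Evaluating every resulting model against the same evaluation set keeps the runs comparable.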

Finally, once you have a model that performs well with a small dataset, you can start to expand and add more data. By focusing on debugging a subset of your data first, you can iterate and refine the process more effectively. For this example, that would mean incrementally expanding to all the files in the Lamini Python package.

As you use more data, these tools from Lamini may be helpful: [data generation pipeline](https://github.com/lamini-ai/lamini-examples/blob/main/05_data_pipeline/README.md), evaluation pipeline (WIP).

## Note from the author 🙋🏻‍♀️
While this notebook is meant to provide a straightforward example of memory tuning, even its creation required a lot of iteration. We had to experiment with various prompts, number of data points, modified chunk sizes, tuning hyperparameters, etc. Remember, tuning is as much an art as it is a science. Stay persistent and break the task down into manageable pieces. Embrace the data and start small for the best results!

Here are a few of the more than two dozen models created while making this notebook.

![Model iterations](https://drive.google.com/uc?export=view&id=1m21fR0frarN2U3gU4w_mf1G98uwsiQIy)