![NVIDIA Logo](images/nvidia.png)

# List Generator

In this notebook we introduce a list generator: an LLM function that uses a prompt-engineered LLM to return a Python list of a requested size whose items fit a given description.

In later sections you will use this function extensively, mostly to create synthetic data for fine-tuning models.

![List Gen 43B](images/list_gen_43b.png)

---

## Learning Objectives

By the time you complete this notebook you will be able to:
- Build an LLM-powered Python list generator for later use in synthetic data generation.

---

## Imports

In [None]:
import json
import ast

from llm_utils.models import LoraModels
from llm_utils.nemo_service_models import NemoServiceBaseModel
from llm_utils.prompt_creators import create_nemo_prompt_with_examples
from llm_utils.llm_functions import make_llm_function

---

## List Models

In [None]:
LoraModels.list_models()

---

## List Generator LLM Function

Our present goal is to create an LLM function called `generate_list` that takes 2 arguments, `n` and `topic`, and generates a literal (not string) Python list of length `n` containing items for the given `topic`.

Recall that each LLM function we construct requires 3 components: a model instance, a prompt template, and, optionally, a postprocessor.
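To make the three-component pattern concrete, here is a minimal sketch of how a factory could chain the pieces together. It is not the implementation of `make_llm_function` from `llm_utils`, which we don't reproduce here. `compose_llm_function`, `fake_model`, and `fake_template` are hypothetical names, and we assume, for illustration only, that a model can be treated as a plain callable from prompt string to response string.

```python
from typing import Callable, Optional


def compose_llm_function(
    model: Callable[[str], str],
    prompt_template: Callable[..., str],
    postprocessor: Optional[Callable[[str], object]] = None,
) -> Callable[..., object]:
    """Hypothetical sketch: chain prompt template -> model -> optional postprocessor."""

    def llm_function(*args, **kwargs):
        # 1. Fill the prompt template with the caller's arguments
        prompt = prompt_template(*args, **kwargs)
        # 2. Send the filled-in prompt to the model and get its raw text response
        response = model(prompt)
        # 3. Optionally transform the raw text (e.g. parse it into a Python object)
        return postprocessor(response) if postprocessor is not None else response

    return llm_function


# Stand-in "model" (assumption): ignores the prompt and always returns the same text
def fake_model(prompt: str) -> str:
    return '["a", "b"]'


def fake_template(n, topic):
    return f"Make a python list of {n} {topic}."


raw_fn = compose_llm_function(fake_model, fake_template)
upper_fn = compose_llm_function(fake_model, fake_template, str.upper)
print(raw_fn(2, "letters"))    # ["a", "b"]
print(upper_fn(2, "letters"))  # ["A", "B"]
```

Without a postprocessor the caller gets the model's raw text back; with one, the caller gets whatever the postprocessor returns. `make_llm_function(model, gen_list_template, postprocess_list)` plays a similar role in this notebook.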

### Model Instance

We begin by creating a model instance. Here we use GPT43B.

In [None]:
model = NemoServiceBaseModel(LoraModels.gpt43b.value)

### Prompt Template

Through prompt engineering we arrived at the `gen_list_template` function below.

It contains a main prompt, `'Make a python list of {n} {topic}.'`, and includes 3 example shots showing how the model ought to respond. We use the helper function `create_nemo_prompt_with_examples` (introduced in the PubMedQA section of the workshop) to construct example shots formatted appropriately for the instruction fine-tuned NeMo GPT43B model we are using.

In [None]:
def gen_list_template(n, topic):
    conversation_examples = [
        ('Make a python list of 2 animals.', '["dog", "spotted owl"]'),
        ('Make a python list of 3 books.', '["The Three Body Problem", "Dandelion Wine", "Deep Learning and the Game of Go"]'),
        ('Make a python list of 6 times of day.', '["morning", "evening", "night", "midday", "midnight", "dawn"]')
    ]
    main_prompt = f'Make a python list of {n} {topic}.'
    return create_nemo_prompt_with_examples(main_prompt, conversation_examples=conversation_examples)
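The exact prompt text that `create_nemo_prompt_with_examples` produces for GPT43B isn't shown in this notebook, so the sketch below uses an invented `User:`/`Assistant:` turn format purely to illustrate the few-shot idea: completed example exchanges come first, and the real request goes last with the answer left open for the model to complete. `build_few_shot_prompt` is a hypothetical helper, and its turn labels are an assumption, not NeMo's actual format.

```python
def build_few_shot_prompt(main_prompt, conversation_examples):
    """Hypothetical few-shot formatter; the turn labels are illustrative, not NeMo's format."""
    turns = []
    # Each example becomes a completed user/assistant exchange the model can imitate
    for user_msg, assistant_msg in conversation_examples:
        turns.append(f"User: {user_msg}\nAssistant: {assistant_msg}")
    # The real request goes last, with the assistant turn left open for the model to complete
    turns.append(f"User: {main_prompt}\nAssistant:")
    return "\n\n".join(turns)


examples = [
    ('Make a python list of 2 animals.', '["dog", "spotted owl"]'),
    ('Make a python list of 3 books.', '["The Three Body Problem", "Dandelion Wine", "Deep Learning and the Game of Go"]'),
]
prompt = build_few_shot_prompt('Make a python list of 3 colors.', examples)
print(prompt)
```

Ending the prompt on an open assistant turn is what nudges the model to answer in the same bracketed-list style as the examples.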

### Postprocessing

We want our function to return a literal Python list, and we ensure this in the postprocessing function `postprocess_list` below, which does several important things:

- It uses Python's `ast.literal_eval` to try to convert the model's string response to a literal list.
- In cases where the model response is not a well-formed list, it returns an empty Python list.
- If the model response includes duplicate items, it deduplicates the list by casting it to a set and then back to a list. Note that this does not preserve the original order of the items.
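Why `ast.literal_eval` rather than `json.loads`? A model may answer with a single-quoted Python list, which is a valid Python literal but not valid JSON. The standard-library-only snippet below illustrates that difference, shows that `literal_eval` refuses to run arbitrary expressions, and shows that a round trip through a `set` removes duplicates without keeping the original order (we sort the result only to make the printout stable).

```python
import ast
import json

single_quoted = "['dog', 'cat', 'dog']"

# ast.literal_eval accepts any Python literal, including single-quoted strings
print(ast.literal_eval(single_quoted))  # ['dog', 'cat', 'dog']

# json.loads requires double-quoted strings, so the same text is rejected
try:
    json.loads(single_quoted)
except json.JSONDecodeError as err:
    print(f"json.loads failed: {err.msg}")

# literal_eval only evaluates literals: a function call is rejected, not executed
try:
    ast.literal_eval("__import__('os').getcwd()")
except ValueError:
    print("literal_eval refused a non-literal expression")

# A set removes duplicates but makes no ordering guarantee, so sort for a stable view
print(sorted(set(ast.literal_eval(single_quoted))))  # ['cat', 'dog']
```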

In [None]:
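def postprocess_list(list_str):
    try:
        # If the model response is a well-formed Python literal, this converts the string into a Python object
        literal_list = ast.literal_eval(list_str)
    except (ValueError, SyntaxError, TypeError, MemoryError, RecursionError):
        # In cases where the model response is not a well-formed literal we return an empty list
        return []

    # A response can parse to something other than a list (e.g. a bare string), so treat it as malformed
    if not isinstance(literal_list, list):
        return []

    # We can deduplicate the list by casting it to a set and then back to a list
    try:
        deduplicated_list = list(set(literal_list))
    except TypeError:
        # Unhashable items (e.g. nested lists) cannot be placed in a set, so treat the response as malformed
        return []
    return deduplicated_list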
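If the order in which the model listed its items matters to you, for example to treat the first item as the model's most typical answer, `dict.fromkeys` is an order-preserving alternative to the set round trip, because dictionaries keep insertion order in Python 3.7+. The variant below is our own sketch, not part of the workshop's `llm_utils`; `postprocess_list_ordered` is a hypothetical name.

```python
import ast


def postprocess_list_ordered(list_str):
    """Sketch: parse a model response into a list, dropping duplicates but keeping first-seen order."""
    try:
        parsed = ast.literal_eval(list_str)
    except (ValueError, SyntaxError, TypeError, MemoryError, RecursionError):
        # Not a well-formed Python literal
        return []
    # Anything other than a list (e.g. a bare string or number) is treated as malformed
    if not isinstance(parsed, list):
        return []
    try:
        # dict keys are unique and keep insertion order, so this deduplicates without reordering
        return list(dict.fromkeys(parsed))
    except TypeError:
        # Unhashable items (e.g. nested lists) can't be dict keys; treat the response as malformed
        return []


print(postprocess_list_ordered('["b", "a", "b", "c"]'))  # ['b', 'a', 'c']
print(postprocess_list_ordered('not a list'))             # []
```

Swapping this in for `postprocess_list` would change only the ordering behavior and the handling of unhashable items; the parsing and empty-list fallback work the same way.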

### Create the LLM Function

Using the model instance, our prompt template, and our postprocessor, we create the `generate_list` LLM function.

In [None]:
generate_list = make_llm_function(model, gen_list_template, postprocess_list)

---

## Use List Generator

Let's try out `generate_list`.

In [None]:
generate_list(3, 'programming languages')

It appears to be working as expected. Let's capture a response and confirm that it is actually a literal Python list.

In [None]:
good_qualities = generate_list(4, 'good qualities')

In [None]:
type(good_qualities)

In [None]:
len(good_qualities)

In [None]:
for good_quality in good_qualities:
    print(good_quality)
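Checking the type and length by eye works in a notebook, but when these lists feed a synthetic-data pipeline it can help to encode the same checks in a function. The helper below is a hypothetical sketch (`list_problems` is our name, not part of `llm_utils`): it takes a generated list and the requested `n` and returns a list of problems instead of printing. It is shown with hard-coded lists so it runs without model access.

```python
def list_problems(items, n):
    """Hypothetical helper: return human-readable problems with a generated list (empty means OK)."""
    if not isinstance(items, list):
        return [f"expected a list, got {type(items).__name__}"]
    problems = []
    if len(items) != n:
        problems.append(f"expected {n} items, got {len(items)}")
    # Compare string forms so unhashable items don't raise here
    if len(set(map(str, items))) != len(items):
        problems.append("contains duplicates")
    if any(not isinstance(item, str) or not item.strip() for item in items):
        problems.append("contains non-string or empty items")
    return problems


print(list_problems(["honesty", "kindness", "patience", "curiosity"], 4))  # []
print(list_problems(["honesty", "kindness"], 4))  # ['expected 4 items, got 2']
```

With access to the model, the same helper could be applied to a live result, e.g. `list_problems(good_qualities, 4)`.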

Here we loop through several values for `n` and `topic` to generate a few different lists, and then loop through the results with print statements to check that the function works as expected.

In [None]:
ns = [3, 6, 4]
topics = ['philosophies', 'technological breakthroughs', 'famous toys']

generated_lists = []

for n, topic in zip(ns, topics):
    generated_lists.append(generate_list(n, topic))

In [None]:
for n, topic, generated_list in zip(ns, topics, generated_lists):
    print(f'topic: {topic}')
    print(f'generated list: {generated_list}')
    print(f'n: {n}')
    print(f'generated list length: {len(generated_list)}')
    print(f'lengths match: {len(generated_list) == n}\n')

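As we can see, `generate_list` produces literal lists of the requested length, free of duplicates, containing appropriate items for each topic. Keep in mind that because duplicates are removed after generation, a list can occasionally come back shorter than `n`, which the `lengths match` check above would reveal.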