<a href="https://colab.research.google.com/github/wandb/edu/blob/main/llm-apps-course/notebooks/02.%20Generation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>
<!--- @wandbcode{llmapps-generation} -->

# Generation

In this notebook, we dive deeper into prompting the model: we build better context from real user questions and from the documentation files so that the model generates better answers.


### Setup

In [None]:
%pip install google-generativeai weave tenacity tiktoken wandb

In [None]:
%pip install pandas 

In [34]:
from pathlib import Path
from pprint import pprint
from rich.markdown import Markdown
import pandas as pd
from tenacity import (
    retry,
    stop_after_attempt,
    wait_random_exponential, # for exponential backoff
)

import google.generativeai as genai
from getpass import getpass

In [2]:
import weave
weave.init("gemini-weave2")

Logged in as Weights & Biases user: supriyagdptl.
View Weave data at https://wandb.ai/supriyagdptl/gemini-weave2/weave


<weave.trace.weave_client.WeaveClient at 0x16a3d7190>

In [35]:
GOOGLE_GEMINI_API_KEY = getpass("Paste your Google Gemini API key from: https://aistudio.google.com/app/apikey\n")

Paste your Google Gemini API key from: https://aistudio.google.com/app/apikey
 ········


In [36]:
genai.configure(api_key=GOOGLE_GEMINI_API_KEY)

In [4]:
model_name = "models/gemini-1.5-flash"
model_info = genai.get_model(model_name)
print(model_info)

Model(name='models/gemini-1.5-flash',
      base_model_id='',
      version='001',
      display_name='Gemini 1.5 Flash',
      description='Fast and versatile multimodal model for scaling across diverse tasks',
      input_token_limit=1000000,
      output_token_limit=8192,
      supported_generation_methods=['generateContent', 'countTokens'],
      temperature=1.0,
      max_temperature=2.0,
      top_p=0.95,
      top_k=64)


# Generating synthetic support questions

We add retry behavior with random exponential backoff (via `tenacity`) in case we hit the API rate limit.

In [17]:
# retry with random exponential backoff (up to 6 attempts) in case we hit the API rate limit
@retry(wait=wait_random_exponential(min=1, max=60), stop=stop_after_attempt(6))
def completion_with_backoff(model_name, system_prompt, user_prompt):
    model = genai.GenerativeModel(
        model_name=model_name,
        system_instruction=system_prompt,
        generation_config=genai.types.GenerationConfig(max_output_tokens=50))
    response = model.generate_content(user_prompt)
    return response.text
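`wait_random_exponential` waits a random amount of time inside a window that grows exponentially with each failed attempt, capped at `max`. The stdlib-only sketch below illustrates roughly that idea (often called exponential backoff with jitter); it is not tenacity's implementation, and `call_with_backoff` and `flaky` are hypothetical names. The `sleep` parameter is injectable so the example runs instantly.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

def call_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 6,
    base: float = 1.0,
    cap: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on any exception with randomized exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # the window doubles each attempt (base * 2**attempt), capped at `cap`;
            # waiting a random time inside it spreads out retries from many clients
            window = min(cap, base * 2 ** attempt)
            sleep(random.uniform(0, window))

calls = {"n": 0}
waits = []

def flaky():
    # simulate two rate-limit errors before succeeding
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("429: rate limit exceeded")
    return "ok"

result = call_with_backoff(flaky, sleep=waits.append)
print(result, calls["n"], len(waits))
# prints: ok 3 2
```

In the notebook itself, tenacity's decorator gives the same behavior without hand-written loops.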

In [21]:
system_prompt = "You are a helpful assistant. Limit your responses to 50 tokens."
user_prompt = "Generate a support question from a W&B user"
model_name = "models/gemini-1.5-flash"

In [31]:
@weave.op()
def generate_synthetic_questions(system_prompt, user_prompt, model_name="models/gemini-1.5-flash", num_of_ques=5):
    responses = []
    model = genai.GenerativeModel(
           model_name=model_name, 
           system_instruction=system_prompt,
           generation_config=genai.types.GenerationConfig(max_output_tokens=50))
    for _ in range(num_of_ques):
        response = model.generate_content(user_prompt)
        responses.append(response.text)
    return responses

In [25]:
generate_synthetic_questions(system_prompt, user_prompt, model_name, num_of_ques=5)

🍩 https://wandb.ai/supriyagdptl/gemini-weave2/r/call/01923b45-3c61-7920-9d89-1808e82983f8


["Why is my model not logging metrics to Weights & Biases? I've set up the wandb.init() call correctly and am using wandb.log() to record my data. \n",
 "I'm trying to log my model weights to Weights & Biases, but I'm getting an error. How do I fix this? \n",
 '"Why is my model\'s training accuracy not improving even after several epochs?" \n',
 'Why are my model weights not saving properly in Weights & Biases? \n',
 "Why is my model not logging metrics to Weights & Biases? I've set up the wandb.init() correctly. \n"]

## Observation:
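The generated questions are plausible, but there is redundancy: the first and last questions are near-duplicates (metrics not being logged despite a correct `wandb.init()` call), and the training-accuracy question is not specific to W&B.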
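One quick way to quantify this redundancy is to compare every pair of generated questions with `difflib.SequenceMatcher` from the standard library and report the most similar pair. This is a rough character-level heuristic, not a semantic similarity measure; `most_similar_pair` is a hypothetical helper, and the list below is copied from the output above.

```python
from difflib import SequenceMatcher
from itertools import combinations

def most_similar_pair(texts):
    """Return (i, j, ratio) for the pair of texts with the highest SequenceMatcher ratio."""
    best = (None, None, 0.0)
    for i, j in combinations(range(len(texts)), 2):
        ratio = SequenceMatcher(None, texts[i].strip(), texts[j].strip()).ratio()
        if ratio > best[2]:
            best = (i, j, ratio)
    return best

questions = [
    "Why is my model not logging metrics to Weights & Biases? I've set up the wandb.init() call correctly and am using wandb.log() to record my data. \n",
    "I'm trying to log my model weights to Weights & Biases, but I'm getting an error. How do I fix this? \n",
    '"Why is my model\'s training accuracy not improving even after several epochs?" \n',
    "Why are my model weights not saving properly in Weights & Biases? \n",
    "Why is my model not logging metrics to Weights & Biases? I've set up the wandb.init() correctly. \n",
]

i, j, ratio = most_similar_pair(questions)
print(i, j, round(ratio, 2))
```

A ratio close to 1 means the two strings share most of their characters in order, which is a cheap signal for filtering near-duplicates before building a dataset.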

# Few-Shot

Let's read some user-submitted queries from the file `examples.txt`. The file contains multiline questions separated by tabs (`\t`).

In [15]:
import random

In [16]:
delimiter = "\t" # tab separated queries
with open("examples.txt", "r") as file:
    data = file.read()
    real_queries = data.split(delimiter)

pprint(f"We have {len(real_queries)} real queries:")  
Markdown(f"Sample one: \n\"{random.choice(real_queries)}\"")

'We have 228 real queries:'
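Splitting on the delimiter keeps any surrounding whitespace, and a trailing tab at the end of the file would produce an empty entry. A small cleanup step, sketched below with a hypothetical `parse_queries` helper and toy input, strips each query and drops empty ones while keeping the line breaks inside multiline queries.

```python
def parse_queries(text, delimiter="\t"):
    """Split delimiter-separated queries, strip whitespace, and drop empty entries."""
    return [q.strip() for q in text.split(delimiter) if q.strip()]

sample = "How do I log images?\n\tWhy is my sweep stuck?\nIt shows no runs.\t\n"
print(parse_queries(sample))
# prints: ['How do I log images?', 'Why is my sweep stuck?\nIt shows no runs.']
```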


We can now use these real user questions as few-shot examples to guide the model toward synthetic questions with a similar style and content.

In [30]:
def generate_few_shot_prompt(queries, n=3):
    prompt = "Generate a support question from a W&B user. Limit your responses to 50 tokens. \n" +\
        "Below you will find a few examples of real user queries:\n"
    # sample n distinct queries so the same example is not repeated in one prompt
    for query in random.sample(queries, n):
        prompt += query + "\n"
    prompt += "Let's start!"
    return prompt

generation_prompt = generate_few_shot_prompt(real_queries)
display(Markdown(generation_prompt))


In [32]:
generate_synthetic_questions(system_prompt, user_prompt=generation_prompt)

🍩 https://wandb.ai/supriyagdptl/gemini-weave2/r/call/01923b58-9862-78f1-9d8a-42dd5c887580


['How can I add custom labels to my runs in the W&B interface? \n',
 'How do I integrate Weights & Biases with my TensorFlow model? \n',
 'How do I download all my model weights from a project? \n',
 'How can I set a custom time range for my charts in W&B? \n',
 'How can I create a custom dashboard with specific metrics? \n']

## Observations:
The generated questions are more realistic and more varied than the zero-shot generations. This is likely because the model can now imitate the topics and phrasing of the real user queries included in the prompt.

# Add Context & Response
Let's create a function that finds all the markdown files in a directory and returns their contents and relative paths, along with the token count of each file as reported by Gemini's `count_tokens`.

In [58]:
model = genai.GenerativeModel(model_name="gemini-1.5-pro")

In [59]:
def find_md_files(directory):
    "Find all markdown files in a directory and return their content and path"
    md_files = []
    num_tokens = []
    for file in Path(directory).rglob("*.md"):
        with open(file, 'r', encoding='utf-8') as md_file:
            content = md_file.read()
            num_tokens.append(model.count_tokens(content))
        md_files.append((file.relative_to(directory), content))
    return md_files, num_tokens

documents, num_tokens = find_md_files('docs_sample/')
print("num of md files:", len(documents))
print("num of tokens in each md", num_tokens)

num of md files: 11
num of tokens in each md [total_tokens: 4816
, total_tokens: 395
, total_tokens: 1336
, total_tokens: 2921
, total_tokens: 3253
, total_tokens: 631
, total_tokens: 1062
, total_tokens: 833
, total_tokens: 1905
, total_tokens: 2700
, total_tokens: 2252
]


In [60]:
documents[0][1][:1024]

"import Tabs from '@theme/Tabs';\nimport TabItem from '@theme/TabItem';\n\n# PyTorch Lightning\n\n[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://wandb.me/lightning)\n\nPyTorch Lightning provides a lightweight wrapper for organizing your PyTorch code and easily adding advanced features such as distributed training and 16-bit precision. W&B provides a lightweight wrapper for logging your ML experiments. But you don't need to combine the two yourself: Weights & Biases is incorporated directly into the PyTorch Lightning library via the [**`WandbLogger`**](https://pytorch-lightning.readthedocs.io/en/stable/extensions/generated/pytorch\\_lightning.loggers.WandbLogger.html#pytorch\\_lightning.loggers.WandbLogger).\n\n## ⚡ Get going lightning-fast with just two lines.\n\n```python\nfrom pytorch_lightning.loggers import WandbLogger\nfrom pytorch_lightning import Trainer\n\nwandb_logger = WandbLogger()\ntrainer = Trainer(logger=wandb_logger)\n```\n\n![Intera

In [61]:
def generate_context_prompt(chunk):
    prompt = "Generate a support question from a W&B user\n" +\
        "The question should be answerable by provided fragment of W&B documentation.\n" +\
        "Below you will find a fragment of W&B documentation:\n" +\
        chunk + "\n" +\
        "Let's start!"
    return prompt

chunk = documents[5][1]
generation_prompt = generate_context_prompt(chunk)
generation_prompt

'Generate a support question from a W&B user\nThe question should be answerable by provided fragment of W&B documentation.\nBelow you will find a fragment of W&B documentation:\n---\ndescription: Explore how to use W&B Tables with this 5 minute Quickstart.\n---\n\n# Tables Quickstart\n\nThe following Quickstart demonstrates how to log data tables, visualize data, and query data.\n\n\nSelect the button below to try a PyTorch Quickstart example project on MNIST data. [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](http://wandb.me/tables-quickstart)\n\n## 1. Log a table\n\nFollow the procedure outlined below to log a Table with W&B:\n1. Initialize a W&B Run with [`wandb.init()`](../../ref/python/init.md). \n2. Create a [`wandb.Table()`](../../ref/python/data-types/table.md) object instance. Pass the name of the columns in your table along with the data for the `columns` and `data` parameters, respectively.  \n3. Log the table with [`run.log()`](../../ref/pytho

In [62]:
generate_synthetic_questions(system_prompt, user_prompt=generation_prompt)

🍩 https://wandb.ai/supriyagdptl/gemini-weave2/r/call/01924126-7da0-7ac0-adcc-243d36bff68e


['How do I log a Pandas DataFrame to W&B as a table? \n',
 'How do I compare tables logged from multiple W&B runs in the same project? \n',
 'How can I log a Pandas DataFrame as a table in W&B? \n',
 "How can I compare tables from different W&B Runs in the same project workspace? I'd like to see how results vary across model versions. \n",
 'How do I log a Pandas DataFrame as a W&B Table? \n']

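## Observations:
There is still repetition in the generated questions: three of the five ask how to log a Pandas DataFrame as a W&B Table, and the other two ask how to compare tables across runs. Passing the whole document as context every time likely steers the model toward the same few topics.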

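# Add Context & Response with Random Chunks

To get more varied questions, let's pass a random chunk of a document instead of the whole file. First, we make sure the documentation files are available (downloading them if needed, e.g. when running in Colab) and reload them together with their paths.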

In [None]:
import os

# check if the directory exists; if not, download the files (e.g. when running in Colab)
if not os.path.exists("../docs_sample/"):
    !git clone https://github.com/wandb/edu.git
    !cp -r edu/llm-apps-course/docs_sample ../

In [None]:
def find_md_files(directory):
    "Find all markdown files in a directory and return their content and path"
    md_files = []
    for file in Path(directory).rglob("*.md"):
        with open(file, 'r', encoding='utf-8') as md_file:
            content = md_file.read()
        md_files.append((file.relative_to(directory), content))
    return md_files

documents = find_md_files('../docs_sample/')
len(documents)

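Let's check how long the documents are. We already counted tokens with Gemini above: the largest file is about 4.8k tokens, far below the 1M-token input limit of Gemini 1.5 Flash, so the context window is not a concern here. To cut chunks we need to encode and decode tokens locally; as far as we know, the `google-generativeai` SDK only exposes `count_tokens`, so we approximate Gemini's tokenizer with the `cl100k_base` encoding from `tiktoken`.

In [None]:
import tiktoken  # install with `%pip install tiktoken`

# approximation: cl100k_base is an OpenAI encoding, not Gemini's own tokenizer
tokenizer = tiktoken.get_encoding("cl100k_base")
tokens_per_document = [len(tokenizer.encode(document)) for _, document in documents]
print(tokens_per_document)

Even though the documents fit in the context window, passing a whole document tends to produce near-duplicate questions, as we saw above. Instead of using entire documents, we'll extract a random chunk from each one so that every generation focuses on a different fragment.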

In [None]:
# extract a random chunk from a document
def extract_random_chunk(document, max_tokens=512):
    tokens = tokenizer.encode(document)
    if len(tokens) <= max_tokens:
        return document
    start = random.randint(0, len(tokens) - max_tokens)
    end = start + max_tokens
    return tokenizer.decode(tokens[start:end])
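If you would rather not depend on `tiktoken`, a rough alternative is to sample a window of whitespace-separated words instead of tokens. Word counts only approximate token counts, so treat `max_words` as a loose budget (400 words is an assumed stand-in for the 512-token default above). `extract_random_word_chunk` is a hypothetical helper; passing a seeded `random.Random` makes the chunk reproducible.

```python
import random

def extract_random_word_chunk(document, max_words=400, rng=None):
    """Return a random contiguous window of at most max_words words from document."""
    rng = rng or random.Random()
    words = document.split()
    if len(words) <= max_words:
        return document  # short documents are returned unchanged
    start = rng.randint(0, len(words) - max_words)
    # note: re-joining with single spaces drops the original newlines and markdown layout
    return " ".join(words[start:start + max_words])

doc = " ".join(f"w{i}" for i in range(1000))
chunk = extract_random_word_chunk(doc, max_words=100, rng=random.Random(0))
print(len(chunk.split()), chunk.split()[0])
```

The trade-off is that the token-based version preserves formatting such as code fences, while this one flattens whitespace.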

Now, we will use that extracted chunk to create a question that can be answered by the document. This way we can generate questions that our current documentation is capable of answering.

In [None]:
chunk = extract_random_chunk(documents[0][1])
generation_prompt = generate_context_prompt(chunk)

In [None]:
Markdown(generation_prompt)

Let's generate 3 possible questions:

In [None]:
generate_synthetic_questions(system_prompt, user_prompt=generation_prompt, num_of_ques=3)

> Sometimes the generation starts with an intro phrase such as "Sure, here's a support question based on the documentation:". We may want to add instructions to the prompt to avoid this.
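Besides tightening the instructions, a cheap safeguard is to post-process each generation and drop a leading line that looks like a preamble, i.e. one ending with a colon, as in the example above. This is a heuristic sketch: `strip_preamble` is a hypothetical helper, and it would wrongly drop a genuine first line that ends with a colon.

```python
def strip_preamble(text):
    """Drop a leading 'Sure, here's ...:'-style line if the text has content after it."""
    lines = text.strip().splitlines()
    if len(lines) > 1 and lines[0].rstrip().endswith(":"):
        lines = lines[1:]
    return "\n".join(lines).strip()

generation = "Sure, here's a support question based on the documentation:\n\nHow do I log a table with W&B?"
print(strip_preamble(generation))
# prints: How do I log a table with W&B?
```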

### Level 5 prompt

A complex directive that includes the following:
- A description of the high-level goal
- A detailed bulleted list of sub-tasks
- An explicit statement asking the LLM to explain its own output
- A guideline on how the LLM's output will be evaluated
- Few-shot examples
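The notebook reads its actual prompts from `system_template.txt` and `prompt_template.txt`, which are not reproduced here. As a purely hypothetical illustration of the five elements above, a user-prompt template could look like the string below. It uses the same `{QUESTIONS}` and `{CHUNK}` placeholders that `prompt_template.format(...)` fills later in the notebook, and asks for `CONTEXT:`, `QUESTION:` and `ANSWER:` sections, the markers that the notebook's `parse_generation` function looks for.

```python
# Hypothetical template illustrating the five elements of a "level 5" prompt;
# it is NOT the contents of the course's prompt_template.txt.
PROMPT_TEMPLATE = """\
Your goal is to generate realistic support questions that W&B users might ask, \
together with answers grounded in the documentation below.

Sub-tasks:
- Read the documentation fragment and pick one concrete topic it covers.
- Write a question in the style of the real user questions shown below.
- Answer the question using only information in the fragment.

Before the question, briefly explain in a CONTEXT section why this question fits the fragment.
Your output will be evaluated on how realistic the question is and whether the answer is fully supported by the fragment.

Examples of real user questions:
{QUESTIONS}

Documentation fragment:
{CHUNK}

Respond in exactly this format:
CONTEXT: <your explanation>
QUESTION: <the user question>
ANSWER: <the answer>
"""

# toy inputs, only to show how the placeholders are filled
user_prompt = PROMPT_TEMPLATE.format(
    QUESTIONS="How do I log a table?\nCan I compare tables across runs?",
    CHUNK="W&B Tables let you log and query tabular data.",
)
print(user_prompt)
```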

In [None]:
# from here on we use Gemini 1.5 Pro; as the larger model, it usually follows long, multi-part instructions better than Flash
MODEL_NAME = "models/gemini-1.5-pro"

In [None]:
# read system_template.txt into a string
with open("system_template.txt", "r") as file:
    system_prompt = file.read()

In [None]:
Markdown(system_prompt)

In [None]:
# read prompt_template.txt into a string; its {QUESTIONS} and {CHUNK} placeholders are filled with str.format
with open("prompt_template.txt", "r") as file:
    prompt_template = file.read()

In [None]:
Markdown(prompt_template)

In [None]:
def generate_context_prompt(chunk, n_questions=3):
    # sample distinct real user queries as few-shot examples
    questions = '\n'.join(random.sample(real_queries, n_questions))
    user_prompt = prompt_template.format(QUESTIONS=questions, CHUNK=chunk)
    return user_prompt

user_prompt = generate_context_prompt(chunk)

In [None]:
Markdown(user_prompt)

In [None]:
@retry(wait=wait_random_exponential(min=1, max=60), stop=stop_after_attempt(6))
def generate_with_backoff(model, prompt):
    "Call Gemini, retrying with exponential backoff (e.g. when hitting the rate limit)"
    return model.generate_content(prompt).text

def generate_questions(documents, n_questions=3, n_generations=5):
    "For each document, sample a random chunk and generate `n_generations` outputs from it"
    model = genai.GenerativeModel(model_name=MODEL_NAME, system_instruction=system_prompt)
    questions = []
    for _, document in documents:
        chunk = extract_random_chunk(document)
        user_prompt = generate_context_prompt(chunk, n_questions)
        # one request per generation, so each output is sampled independently
        for _ in range(n_generations):
            questions.append(generate_with_backoff(model, user_prompt))
    return questions

> A note about the system instruction: with Gemini, `system_prompt` is passed as `system_instruction` when the `GenerativeModel` is created, while the per-request content (the sampled user queries and the documentation chunk) goes into the user prompt. Keeping stable, high-level directives in the system instruction and variable content in the user prompt keeps the two concerns separate.

In [None]:
# function to parse model generation and extract CONTEXT, QUESTION and ANSWER
def parse_generation(generation):
    lines = generation.split("\n")
    context = []
    question = []
    answer = []
    flag = None
    
    for line in lines:
        if "CONTEXT:" in line:
            flag = "context"
            line = line.replace("CONTEXT:", "").strip()
        elif "QUESTION:" in line:
            flag = "question"
            line = line.replace("QUESTION:", "").strip()
        elif "ANSWER:" in line:
            flag = "answer"
            line = line.replace("ANSWER:", "").strip()

        if flag == "context":
            context.append(line)
        elif flag == "question":
            question.append(line)
        elif flag == "answer":
            answer.append(line)

    context = "\n".join(context)
    question = "\n".join(question)
    answer = "\n".join(answer)
    return context, question, answer
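`parse_generation` switches sections whenever a marker appears anywhere in a line, so an answer that happens to contain the text `QUESTION:` mid-sentence would be reassigned to the question. A stricter variant, sketched below under the assumption that each marker starts its own line, anchors the markers with a regular expression. `parse_generation_strict` is a hypothetical name.

```python
import re

# a marker only counts at the start of a line (leading whitespace allowed)
MARKER = re.compile(r"^\s*(CONTEXT|QUESTION|ANSWER):\s*", re.MULTILINE)

def parse_generation_strict(generation):
    """Split a generation into (context, question, answer) on line-initial markers."""
    sections = {"CONTEXT": "", "QUESTION": "", "ANSWER": ""}
    matches = list(MARKER.finditer(generation))
    for k, match in enumerate(matches):
        # a section runs until the next marker (or the end of the text)
        end = matches[k + 1].start() if k + 1 < len(matches) else len(generation)
        sections[match.group(1)] = generation[match.end():end].strip()
    return sections["CONTEXT"], sections["QUESTION"], sections["ANSWER"]

sample = (
    "CONTEXT: The fragment explains how to log tables.\n"
    "QUESTION: How do I log a DataFrame?\n"
    "ANSWER: Wrap it in wandb.Table(dataframe=df). Note: the label QUESTION: inside a sentence is ignored.\n"
)
print(parse_generation_strict(sample))
```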

In [None]:
generations = generate_questions([documents[0]], n_questions=3, n_generations=5)
parse_generation(generations[0])

In [None]:
parsed_generations = []
generations = generate_questions(documents, n_questions=3, n_generations=5)
for generation in generations:
    context, question, answer = parse_generation(generation)
    parsed_generations.append({"context": context, "question": question, "answer": answer})

# convert parsed_generations to a pandas DataFrame and save it locally
df = pd.DataFrame(parsed_generations)
df.to_csv('generated_examples.csv', index=False)

# start a W&B run in the same project used for Weave tracing (wandb.log requires an active run)
import wandb
wandb.init(project="gemini-weave2", job_type="generation")

# log df as a table to W&B for interactive exploration
wandb.log({"generated_examples": wandb.Table(dataframe=df)})

# log the csv file as an artifact to W&B for later use
artifact = wandb.Artifact("generated_examples", type="dataset")
artifact.add_file("generated_examples.csv")
wandb.log_artifact(artifact)

In [None]:
wandb.finish()