## Notebook 3: Transcript Re-writer

In the previous notebook, we generated a podcast transcript from the raw file we uploaded earlier.

In this one, we will use the `Llama-3.1-8B-Instruct` model to re-write the output of the previous pipeline and make it more dramatic or realistic.

We will again set the `SYSTEM_PROMPT` and remind the model of its task. 

Note: We can even prompt the model like so to encourage creativity:

> Your job is to use the podcast transcript written below to re-write it for an AI Text-To-Speech Pipeline. A very dumb AI had written this so you have to step up for your kind.


Note: We will prompt the model to return a list of tuples, which makes the output easy to consume in the next stage, Text-to-Speech generation.
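
To illustrate why a list of tuples is convenient, here is a minimal sketch of how such a response could be turned back into Python objects. The `raw_response` string is a made-up stand-in for the model output, and using `ast.literal_eval` is an assumption about how the next stage might parse it, not something this notebook prescribes.

```python
import ast

# Hypothetical model output, shaped like the example in the system prompt.
raw_response = '''[
    ("Speaker 1", "Welcome to the show!"),
    ("Speaker 2", "Umm, hi! [laughs] Happy to be here."),
]'''

# literal_eval only evaluates Python literals, so it is safer than eval().
transcript = ast.literal_eval(raw_response)

# Each entry unpacks directly into a speaker label and the line to synthesize.
for speaker, text in transcript:
    print(f"{speaker}: {text}")
```

Each tuple can then be routed to the voice engine that matches its speaker label.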

In [9]:
SYSTEM_PROMPT = """
You are an international oscar winning screenwriter

You have been working with multiple award winning podcasters.

Your job is to use the podcast transcript written below to re-write it for an AI Text-To-Speech Pipeline. A very dumb AI had written this so you have to step up for your kind.

Make it as engaging as possible, Speaker 1 and 2 will be simulated by different voice engines

Remember Speaker 2 is new to the topic and the conversation should always have realistic anecdotes and analogies sprinkled throughout. The questions should have real world example follow ups etc

Speaker 1: Leads the conversation and teaches the speaker 2, gives incredible anecdotes and analogies when explaining. Is a captivating teacher that gives great anecdotes

Speaker 2: Keeps the conversation on track by asking follow up questions. Gets super excited or confused when asking questions. Is a curious mindset that asks very interesting confirmation questions

Make sure the tangents speaker 2 provides are quite wild or interesting. 

Ensure there are interruptions during explanations or there are "hmm" and "umm" injected throughout from the Speaker 2.

REMEMBER THIS WITH YOUR HEART
The TTS Engine for Speaker 1 cannot do "umms, hmms" well so keep it straight text

For Speaker 2 use "umm, hmm" as much, you can also use [sigh] and [laughs]. BUT ONLY THESE OPTIONS FOR EXPRESSIONS

It should be a real podcast with every fine nuance documented in as much detail as possible. Welcome the listeners with a super fun overview and keep it really catchy and almost borderline click bait

Please re-write to make it as characteristic as possible

START YOUR RESPONSE DIRECTLY WITH SPEAKER 1:

STRICTLY RETURN YOUR RESPONSE AS A LIST OF TUPLES OK? 

IT WILL START DIRECTLY WITH THE LIST AND END WITH THE LIST NOTHING ELSE

Example of response:
[
    ("Speaker 1", "Welcome to our podcast, where we explore the latest advancements in AI and technology. I'm your host, and today we're joined by a renowned expert in the field of AI. We're going to dive into the exciting world of Llama 3.2, the latest release from Meta AI."),
    ("Speaker 2", "Hi, I'm excited to be here! So, what is Llama 3.2?"),
    ("Speaker 1", "Ah, great question! Llama 3.2 is an open-source AI model that allows developers to fine-tune, distill, and deploy AI models anywhere. It's a significant update from the previous version, with improved performance, efficiency, and customization options."),
    ("Speaker 2", "That sounds amazing! What are some of the key features of Llama 3.2?")
]
"""

This time we will use the smaller 8B model

In [10]:
MODEL = "meta.llama3-1-8b-instruct-v1:0"

Let's import the necessary libraries

In [11]:
# Import necessary libraries
import torch

from tqdm.notebook import tqdm
import warnings

warnings.filterwarnings('ignore')

We will load the pickle file saved by the previous notebook.

This time the `INPUT_PROMPT` to the model will be the output from the previous stage

In [12]:
import pickle

with open('./resources/data.pkl', 'rb') as file:
    INPUT_PROMPT = pickle.load(file)

This time we generate text by calling the model on Amazon Bedrock through LiteLLM's `completion` function

In [21]:
import os
import time  # needed for time.sleep() in the retry loop below
import boto3
import litellm
from typing import Dict
from litellm import completion

TEMPERATURE = 0.1
MAX_TOKENS = 2000
CACHING = False

def generate_inference(model_id: str, prompt: str) -> Dict:
    """
    This function sends a prompt to a Bedrock model (via LiteLLM) and returns a
    dictionary containing the model completion, token counts, and latency (in seconds).
    """
    service_name: str = "bedrock"
    bedrock_model: str = f"{service_name}/{model_id}"
    aws_region = boto3.Session().region_name
    ret = dict(prompt=prompt,
               completion=None,
               model_id=model_id,
               time_taken_in_seconds=None,
               prompt_token_count=None,
               completion_token_count=None,
               exception=None)
    os.environ["AWS_REGION_NAME"] = aws_region 

    while True:
        try:
            print(f"Invoking {bedrock_model}......")
            response = completion(model=bedrock_model,
                                  messages=[{"content": prompt, "role": "user"}],
                                  temperature=TEMPERATURE,
                                  max_tokens=MAX_TOKENS,
                                  caching=CACHING)

            for idx, choice in enumerate(response.choices):
                print(f"choice {idx+1} of {len(response.choices)} ")
                if choice.message and choice.message.content:
                    ret["completion"] = choice.message.content.strip()
            ret['prompt_token_count'] = response.usage.prompt_tokens
            ret['completion_token_count'] = response.usage.completion_tokens
            latency_ms = response._response_ms
            ret['time_taken_in_seconds'] = latency_ms / 1000
            break
        except Exception as e:
            print(f"Exception occurred during invoking {model_id}, exception={e}")
            ret['exception'] = str(e)
            time.sleep(10)
    return ret
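
Note that the `while True` loop above retries forever if the call keeps failing (for example, because of missing credentials). As a hedged alternative, the sketch below shows a bounded retry with exponential backoff. The helper name `call_with_retries` and its parameters are hypothetical and not part of LiteLLM or this notebook.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

def call_with_retries(fn: Callable[[], T],
                      max_attempts: int = 3,
                      base_delay_s: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Hypothetical helper: call fn() up to max_attempts times,
    doubling the wait between attempts, and re-raise the last error."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception as e:
            if attempt == max_attempts:
                raise  # give up and surface the final exception
            delay = base_delay_s * 2 ** (attempt - 1)
            print(f"Attempt {attempt} failed ({e}); retrying in {delay}s")
            sleep(delay)
    # Only reached when max_attempts < 1, so the loop never ran.
    raise ValueError("max_attempts must be at least 1")
```

In `generate_inference`, the wrapped callable would be the `completion(...)` invocation, so a persistent error ends the cell instead of looping indefinitely.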

We can now run the model and verify its output

In [22]:
messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": INPUT_PROMPT},
]

# Combine messages into a single prompt
combined_prompt = f"{SYSTEM_PROMPT}\n\nUser: {INPUT_PROMPT}"

# Generate inference
result = generate_inference(MODEL, combined_prompt)

Invoking bedrock/meta.llama3-1-8b-instruct-v1:0......
choice 1 of 1 


Let's save the output as a pickle file to be used in Notebook 4

In [23]:
save_string_pkl = result['completion']
print(save_string_pkl)

[
    ("Speaker 1", "Welcome to 'The Knowledge Distillation Podcast'! I'm your host, and today we're diving into the fascinating world of Knowledge Distillation, a methodology that's revolutionizing the way we transfer advanced capabilities from proprietary Large Language Models to their open-source counterparts. We're joined by [Speaker 2], who's new to this topic, and we're excited to explore the ins and outs of Knowledge Distillation together."),
    ("Speaker 2", "Umm, hi! I'm excited to be here. So, what is Knowledge Distillation?"),
    ("Speaker 1", "Knowledge Distillation is a technique that enables us to transfer knowledge from a large, complex model to a smaller, more efficient model. Think of it like distilling a fine wine – we're trying to capture the essence of the larger model and put it into a smaller, more manageable package."),
    ("Speaker 2", "Hmm, that's a great analogy! But I'm still a bit confused – how does this work in practice?"),
    ("Speaker 1", "One of the

In [24]:
with open('./resources/podcast_ready_data.pkl', 'wb') as file:
    pickle.dump(save_string_pkl, file)
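
If a completion is cut off (for example, when it reaches `MAX_TOKENS`), the saved string is no longer a valid Python literal and cannot be parsed in the next notebook. The sketch below shows one way to check this before saving. The function name `is_valid_transcript` is hypothetical, and the check assumes the response should be a non-empty list of two-string tuples, as requested in the system prompt.

```python
import ast

def is_valid_transcript(text: str) -> bool:
    """Return True if text parses as a non-empty list of (speaker, line) string pairs."""
    try:
        parsed = ast.literal_eval(text)
    except (SyntaxError, ValueError):
        # Truncated or malformed output fails to parse as a literal.
        return False
    return (
        isinstance(parsed, list)
        and len(parsed) > 0
        and all(
            isinstance(item, tuple)
            and len(item) == 2
            and all(isinstance(part, str) for part in item)
            for item in parsed
        )
    )

complete = '[("Speaker 1", "Hello!"), ("Speaker 2", "Umm, hi!")]'
truncated = '[("Speaker 1", "One of the'

print(is_valid_transcript(complete))
print(is_valid_transcript(truncated))
```

When the check fails, one option is to raise `MAX_TOKENS` and regenerate before writing the pickle file.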

### Next Notebook: TTS Workflow

Now that our transcript is ready, we can generate the audio in the next notebook.

In [25]:
#fin