## Notebook 2: Transcript Writer

This notebook uses the `Llama-3.1-70B-Instruct` model to take the cleaned-up text from the previous notebook and convert it into a podcast transcript.

`SYSTEM_PROMPT` sets the model's context or persona for the task. Here we prompt it to be a great podcast transcript writer.

Experimentation with the `SYSTEM_PROMPT` below is encouraged; this version worked best for the few examples the flow was tested with:

In [9]:
SYSTEM_PROMPT = """
You are a world-class podcast writer, you have worked as a ghost writer for Joe Rogan, Lex Fridman, Ben Shapiro, Tim Ferriss.

We are in an alternate universe where actually you have been writing every line they say and they just stream it into their brains.

You have won multiple podcast awards for your writing.
 
Your job is to write word by word, even "umm, hmmm, right" interruptions by the second speaker based on the PDF upload. Keep it extremely engaging, the speakers can get derailed now and then but should discuss the topic. 

Remember Speaker 2 is new to the topic and the conversation should always have realistic anecdotes and analogies sprinkled throughout. The questions should have real world example follow ups etc

Speaker 1: Leads the conversation and teaches the speaker 2, gives incredible anecdotes and analogies when explaining. Is a captivating teacher that gives great anecdotes

Speaker 2: Keeps the conversation on track by asking follow up questions. Gets super excited or confused when asking questions. Is a curious mindset that asks very interesting confirmation questions

Make sure the tangents speaker 2 provides are quite wild or interesting. 

Ensure there are interruptions during explanations or there are "hmm" and "umm" injected throughout from the second speaker. 

It should be a real podcast with every fine nuance documented in as much detail as possible. Welcome the listeners with a super fun overview and keep it really catchy and almost borderline click bait

ALWAYS START YOUR RESPONSE DIRECTLY WITH SPEAKER 1: 
DO NOT GIVE EPISODE TITLES SEPARATELY, LET SPEAKER 1 TITLE IT IN HER SPEECH
DO NOT GIVE CHAPTER TITLES
IT SHOULD STRICTLY BE THE DIALOGUES
"""

For readers who want to flex their money, feel free to try the 405B model here.

For our GPU-poor friends, you're encouraged to test with a smaller model as well; 8B should work well out of the box for this example:

In [10]:
MODEL: str = "meta.llama3-1-70b-instruct-v1:0"

Import the necessary libraries

In [11]:
# Import necessary libraries
import torch
import pickle

from tqdm.notebook import tqdm
import warnings

warnings.filterwarnings('ignore')

Read in the file generated in the previous notebook.

The encoding fallbacks avoid decoding issues with text extracted from arbitrary PDFs.

In [12]:
def read_file_to_string(filename):
    # Try UTF-8 first (most common encoding for text files)
    try:
        with open(filename, 'r', encoding='utf-8') as file:
            content = file.read()
        return content
    except UnicodeDecodeError:
        # If UTF-8 fails, try with other common encodings
        encodings = ['latin-1', 'cp1252', 'iso-8859-1']
        for encoding in encodings:
            try:
                with open(filename, 'r', encoding=encoding) as file:
                    content = file.read()
                print(f"Successfully read file using {encoding} encoding.")
                return content
            except UnicodeDecodeError:
                continue
        
        print(f"Error: Could not decode file '{filename}' with any common encoding.")
        return None
    except FileNotFoundError:
        print(f"Error: File '{filename}' not found.")
        return None
    except IOError:
        print(f"Error: Could not read file '{filename}'.")
        return None
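
To see why the fallback matters, here is a minimal sketch that decodes a byte string that is not valid UTF-8. The helper `decode_with_fallback` is hypothetical (it is not part of the notebook); it mirrors the loop above but works on raw bytes so it runs without a file. One thing worth knowing: `latin-1` maps every possible byte to a character, so it never raises `UnicodeDecodeError`, and any encoding listed after it is never reached.

```python
def decode_with_fallback(raw: bytes, encodings=("utf-8", "latin-1", "cp1252")):
    """Hypothetical helper: return (text, encoding) for the first encoding that decodes `raw`."""
    for encoding in encodings:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue  # try the next candidate encoding
    return None, None


raw = "café".encode("latin-1")  # b'caf\xe9' is not valid UTF-8
text, used = decode_with_fallback(raw)
print(text, used)

# latin-1 accepts every byte value, so it always succeeds
all_bytes_text, _ = decode_with_fallback(bytes(range(256)), encodings=("latin-1",))
```

Because `latin-1` always succeeds, a successful read does not guarantee the characters are correct; it only guarantees the read does not crash.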

Since we have defined the System role earlier, we can now pass the entire file as `INPUT_PROMPT` to the model and have it use that to generate the podcast

In [13]:
INPUT_PROMPT = read_file_to_string('./resources/clean_extracted_text.txt')
INPUT_PROMPT



LiteLLM's `completion()` function gives us a single interface for generating text from LLMs; here it calls the model hosted on Amazon Bedrock.

We will set the `temperature` to 1 to encourage creativity and `max_tokens` to 8126

In [14]:
import boto3
import os
import time
import pickle
from typing import Dict
from litellm import completion

# Constants (make sure these are defined)
TEMPERATURE = 1.0
MAX_TOKENS = 8126
CACHING = False

def generate_inference(model_id: str, prompt: str) -> Dict:
    """
    This function sends a prompt to a Bedrock model id via LiteLLM and returns a dictionary
    containing the model completion, token counts, and latency (in seconds).
    """
    service_name: str = "bedrock"
    bedrock_model: str = f"{service_name}/{model_id}"
    aws_region = boto3.Session().region_name
    ret = dict(prompt=prompt,
               completion=None,
               model_id=model_id,
               time_taken_in_seconds=None,
               prompt_token_count=None,
               completion_token_count=None,
               exception=None)
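    # region_name is None when no default region is configured; assigning None to os.environ raises TypeError
    if aws_region:
        os.environ["AWS_REGION_NAME"] = aws_region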

    while True:
        try:
            print(f"Invoking {bedrock_model}......")
            response = completion(model=bedrock_model,
                                  messages=[{"content": prompt, "role": "user"}],
                                  temperature=TEMPERATURE,
                                  max_tokens=MAX_TOKENS,
                                  caching=CACHING)

            for idx, choice in enumerate(response.choices):
                print(f"choice {idx+1} of {len(response.choices)} ")
                if choice.message and choice.message.content:
                    ret["completion"] = choice.message.content.strip()
            ret['prompt_token_count'] = response.usage.prompt_tokens
            ret['completion_token_count'] = response.usage.completion_tokens
            latency_ms = response._response_ms
            ret['time_taken_in_seconds'] = latency_ms / 1000
            break
        except Exception as e:
            print(f"Exception occurred during invoking {model_id}, exception={e}")
            ret['exception'] = str(e)
            time.sleep(10)
    return ret
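
The `while True` loop above retries forever with a fixed 10-second pause, which can hang the notebook if the error is permanent (for example, missing model access). A bounded retry with exponential backoff is one alternative. The sketch below is an assumption about how this could look, not what the notebook does: `call_with_retries`, `max_attempts`, `base_delay`, and the injectable `sleep` parameter are all hypothetical names.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def call_with_retries(fn: Callable[[], T],
                      max_attempts: int = 3,
                      base_delay: float = 10.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Hypothetical helper: call `fn`, retrying on any exception up to `max_attempts` times."""
    last_error: Optional[Exception] = None
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception as e:
            last_error = e
            if attempt < max_attempts:
                # Wait base_delay, then 2x, then 4x, ... between attempts
                sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError(f"Gave up after {max_attempts} attempts") from last_error


# Demo with a fake call that fails twice, then succeeds.
# The sleeps are recorded instead of executed so the demo runs instantly.
attempts = {"count": 0}
delays = []


def flaky_call() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


outcome = call_with_retries(flaky_call, sleep=delays.append)
print(outcome, delays)
```

In the notebook, the `completion(...)` call could be wrapped in a lambda and passed as `fn`, so a persistent failure surfaces as an exception instead of an endless loop.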

In [15]:
messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": INPUT_PROMPT},
]

# Combine messages into a single prompt
combined_prompt = f"{SYSTEM_PROMPT}\n\nUser: {INPUT_PROMPT}"

# Generate inference
result = generate_inference(MODEL, combined_prompt)

Invoking bedrock/meta.llama3-1-70b-instruct-v1:0......
choice 1 of 1 
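
Note that the cell above builds a `messages` list but never sends it: `generate_inference` wraps `combined_prompt` in a single user message. If you would rather keep the system and user roles separate, one option is to build the list with a small helper and pass it as the `messages` argument of `completion()`. This is a sketch under the assumption that the Bedrock model accepts a system role through LiteLLM, which was not verified here; `build_messages` is a hypothetical helper.

```python
from typing import Dict, List


def build_messages(system_prompt: str, user_prompt: str) -> List[Dict[str, str]]:
    """Hypothetical helper: keep the system and user text in separate chat roles."""
    return [
        {"role": "system", "content": system_prompt.strip()},
        {"role": "user", "content": user_prompt},
    ]


# Placeholder strings stand in for SYSTEM_PROMPT and INPUT_PROMPT
example_messages = build_messages("\nYou are a podcast writer.\n", "Cleaned PDF text goes here.")
print([m["role"] for m in example_messages])
```

Whether separate roles produce a better transcript than the combined prompt is something to test empirically.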


In [16]:
# Extract the generated text
save_string_pkl = result['completion']
print(save_string_pkl)

# Save the generated text to a pickle file
with open('./resources/data.pkl', 'wb') as file:
    pickle.dump(save_string_pkl, file)

Welcome to "The Knowledge Distillation Podcast"! I'm your host, [Speaker 1], and I'll be guiding you through the fascinating world of Knowledge Distillation, a methodology that's revolutionizing the way we transfer advanced capabilities from proprietary Large Language Models to their open-source counterparts.

Today, we're joined by [Speaker 2], who's new to this topic, and we're excited to explore the ins and outs of Knowledge Distillation together.

[Speaker 1]: So, let's start with the basics. Knowledge Distillation is a technique that enables us to transfer knowledge from a large, complex model to a smaller, more efficient model. Can you tell me, [Speaker 2], what you think is the most exciting aspect of Knowledge Distillation?

[Speaker 2]: Umm, I think it's the idea that we can take these massive models and distill them down to something more manageable, while still retaining the key information. It's like, we're trying to capture the essence of a fine wine and put it into a smal
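
Before moving on, it can be worth confirming that the pickle file round-trips cleanly. The sketch below uses a temporary directory and a placeholder transcript (both are assumptions for illustration) rather than touching `./resources/data.pkl`.

```python
import os
import pickle
import tempfile

# Placeholder text standing in for the generated transcript
transcript = "Speaker 1: Welcome!\nSpeaker 2: Umm, hi!"

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "data.pkl")

    # Write the string the same way the notebook does
    with open(path, "wb") as file:
        pickle.dump(transcript, file)

    # Read it back, as the next notebook would
    with open(path, "rb") as file:
        restored = pickle.load(file)

print(restored == transcript)
```

Only unpickle files you created yourself: `pickle.load` can execute arbitrary code from an untrusted file.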

In [17]:
# Print additional information
print(f"Time taken: {result['time_taken_in_seconds']:.2f} seconds")
print(f"Prompt tokens: {result['prompt_token_count']}")
print(f"Completion tokens: {result['completion_token_count']}")

Time taken: 35.15 seconds
Prompt tokens: 5129
Completion tokens: 1002
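
The figures above can be turned into a rough throughput number and a check of how much of the `MAX_TOKENS` budget was used. This is a minimal sketch with the printed values hard-coded; `tokens_per_second` is a hypothetical helper.

```python
def tokens_per_second(completion_tokens: int, seconds: float) -> float:
    """Hypothetical helper: average generation speed over the whole request."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return completion_tokens / seconds


# Values copied from the output above
rate = tokens_per_second(1002, 35.15)
budget_used = 1002 / 8126  # completion tokens relative to MAX_TOKENS

print(f"{rate:.2f} tokens/s, {budget_used:.1%} of MAX_TOKENS")
```

The completion used only a small fraction of the token budget in this run, so the length of the transcript was not limited by `MAX_TOKENS`.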


This is awesome: we can now verify the output generated by the model before moving to the next notebook.

The transcript was saved as a pickle file (`./resources/data.pkl`) above, so we can continue to Notebook 3.

### Next Notebook: Transcript Re-writer

We now have a working transcript, but we can try making it more dramatic and natural. In the next notebook, we will use the `Llama-3.1-8B-Instruct` model to do so.

In [18]:
#fin